Data set examples

This section presents several concrete data sets and shows how to integrate censored data, covariates, etc.

Data sets with continuous outputs

  • Theophylline data set: continuous outputs are taken into account along with categorical and continuous covariates (sex and weight respectively). Censored data are also handled.
  • Tobramycin data set: continuous PK outputs are taken into account, along with categorical and continuous covariates.
  • HIV data set: two continuous censored outputs are considered. No dose is used in the data set, and the treatment type is considered as a categorical covariate.
  • Veralipride data set: continuous output exhibiting a double peak phenomenon, for which absorption variability is by far the most probable physiological explanation.
  • Remifentanil data set: Remifentanil is an opioid analgesic drug with a rapid onset and rapid recovery time. Remifentanil concentrations measured in 65 healthy adults are provided.

Data sets with discrete count outputs

  • Epilepsy attacks data set: count outputs are taken into account along with categorical and continuous covariates. The data arose from a clinical trial of 59 epileptics who were randomized to receive either the anti-epileptic drug progabide or a placebo, as an adjuvant to standard chemotherapy. Patients attended four successive post-randomisation clinic visits, where the number of seizures that occurred over the previous 2 weeks was reported.
  • Crohn’s Disease Adverse Events data set: data from a study of the adverse events of a drug in 117 patients affected by Crohn’s disease (a chronic inflammatory disease of the intestines). In addition to the response variable (the number of adverse events), 7 explanatory variables were recorded for each patient.

Data sets with discrete categorical outputs

  • Respiratory status data set: the respiratory status of 111 patients under placebo or treatment is categorized as “poor” or “good” once per month for 5 months.
  • Inpatient multidimensional psychiatric data set: categorical output with a categorical covariate (treatment) over 6 weeks. These data are from the National Institute of Mental Health Schizophrenia Collaborative Study and are available here. Patients were randomized to receive one of four treatments: either placebo or one of three different anti-psychotic drugs. The primary outcome is item 79 on the Inpatient Multidimensional Psychiatric Scale.
  • Zylkene data set: the putative effects of a tryptic bovine αs1-casein hydrolysate on anxiety disorders in cats were investigated in 24 cats. The score is a global score of emotional state.

Data sets with time-to-event outputs

  • PBC data set: Primary biliary cirrhosis of the liver (PBC) is a rare but fatal chronic liver disease of unknown cause, with a prevalence of about 50 cases per million population. Between January 1974 and May 1984, the Mayo Clinic conducted a double-blinded randomized trial in PBC, comparing the drug D-penicillamine (DPCA) with a placebo.
  • Oropharynx data set: this data set comes from part of a large clinical trial carried out by the Radiation Therapy Oncology Group in the United States. One objective of the study was to compare two treatment policies with respect to patient survival.
  • Veterans’ Administration Lung Cancer data set: in this study conducted by the US Veterans Administration, time to death was recorded for 137 male patients with advanced inoperable lung cancer, who were given either a standard therapy or a test chemotherapy.
  • NCCTG lung cancer data set: the North Central Cancer Treatment Group (NCCTG) data set records the survival (time-to-event output) of 228 patients with advanced lung cancer, together with assessments of the patients’ performance status made both by the physician and by the patients themselves.
  • Cardiovascular data set: a subset of the fields was selected to model the differential length of stay for patients entering the hospital to receive one of two standard cardiovascular procedures: CABG and PTCA. The data set contains 3589 individuals.

Joint data sets

  • Warfarin data set: Warfarin is an anticoagulant normally used in the prevention of thrombosis and thromboembolism. Plasma warfarin concentrations and Prothrombin Complex Response were measured in thirty normal subjects after a single loading dose. Both measurements are continuous.
  • Remifentanil data set: Remifentanil is an opioid analgesic drug with a rapid onset and rapid recovery time. Both remifentanil concentrations and EEG measurements are provided for 65 healthy adults. Both measurements are continuous.
  • PSA and survival data set: PSA kinetics and survival data for 400 men with metastatic Castration-Resistant Prostate Cancer (mCRPC) treated with docetaxel and prednisone, the first-line reference chemotherapy, which constituted the control arm of a phase 3 clinical trial. In this context of advanced disease, the incidence of death is high, and PSA kinetics are closely monitored after treatment initiation to rapidly detect a breakthrough in PSA and propose rescue strategies.