Primary Biliary Cirrhosis data set

Primary Biliary Cirrhosis is a rare but fatal chronic liver disease of unknown cause, with a prevalence of about 50 cases per million population. The primary pathologic event appears to be the destruction of interlobular bile ducts, which may be mediated by immunologic mechanisms.
Between January 1974 and May 1984, the Mayo Clinic conducted a double-blinded randomized trial in primary biliary cirrhosis of the liver (PBC), comparing the drug D-penicillamine (DPCA) with a placebo. While the trial was open for patient registration, 424 patients who met the eligibility criteria were seen at the Clinic. In 312 of the 424 cases, both the treating physician and the patient agreed to participate in the randomized trial. The date of randomization and a large number of clinical, biochemical, serologic, and histologic parameters were recorded for each of the 312 clinical trial patients. The data from the trial were analyzed in 1986 for presentation in the clinical literature. For that analysis, disease and survival status as of July 1986 were recorded for as many patients as possible. By that date, 125 of the 312 patients had died, with only 11 deaths not attributable to PBC. Eight patients had been lost to follow-up, and 19 had undergone liver transplantation.

This data set comes from Counting Processes and Survival Analysis by T. Fleming & D. Harrington (1991), published by John Wiley & Sons. The following figure shows the survival curve and the mean number of events with respect to time. Note that this figure was generated using Datxplore.

This data set contains many covariates:

id       = case number
futime   = number of days between registration and the earlier of death,
           transplantation, or study analysis time in July, 1986
status   = 0=alive, 1=liver transplant, 2=dead
drug     = 1= D-penicillamine, 2=placebo
age      = age in days
sex      = 0=male, 1=female
ascites  = presence of ascites: 0=no 1=yes
hepato   = presence of hepatomegaly: 0=no 1=yes
spiders  = presence of spiders: 0=no 1=yes
edema    = presence of edema: 0 = no edema and no diuretic therapy for edema;
           0.5 = edema present without diuretics, or edema resolved by diuretics;
           1 = edema despite diuretic therapy
bili     = serum bilirubin in mg/dl
chol     = serum cholesterol in mg/dl
albumin  = albumin in gm/dl
copper   = urine copper in ug/day
alk_phos = alkaline phosphatase in U/liter
sgot     = SGOT in U/ml
trig     = triglycerides in mg/dl
platelet = platelets per cubic ml/1000
protime  = prothrombin time in seconds
stage    = histologic stage of disease
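
As a reading aid, the coded covariates above can be translated into labels with a small lookup. This is only a sketch in Python: the dictionaries copy the codes listed above, while the function name `decode_record` and the example record are hypothetical and are not taken from the actual data file.

```python
# Code tables copied from the covariate list above.
STATUS = {0: "alive", 1: "liver transplant", 2: "dead"}
DRUG = {1: "D-penicillamine", 2: "placebo"}
SEX = {0: "male", 1: "female"}
EDEMA = {
    0.0: "no edema and no diuretic therapy for edema",
    0.5: "edema present without diuretics, or edema resolved by diuretics",
    1.0: "edema despite diuretic therapy",
}


def decode_record(record):
    """Return a copy of `record` with coded fields replaced by labels.

    `record` is assumed to be a dict keyed by the column names listed above;
    fields other than status, drug, sex and edema are passed through unchanged.
    """
    decoded = dict(record)
    decoded["status"] = STATUS[record["status"]]
    decoded["drug"] = DRUG[record["drug"]]
    decoded["sex"] = SEX[record["sex"]]
    decoded["edema"] = EDEMA[float(record["edema"])]
    return decoded


# Hypothetical record, made up for illustration only.
example = {"id": 99, "status": 1, "drug": 2, "sex": 0, "edema": 0.5}
decoded_example = decode_record(example)
print(decoded_example)
```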

The following two figures show the survival curve with respect to the treatment. Note that these figures were generated using Datxplore.

Simplified data set

The data set for subjects 1 and 2 can be defined as follows:

ID;TIME;Y;TRT;AGE;SEX;
1;0;0;1;58.7652;1;
1;400;1;1;58.7652;1;
2;0;0;1;56.4463;1;
2;4500;0;1;56.4463;1;

For each subject, one must indicate the start time of the observation period with Y=0 (data lines 1 and 3 for subjects 1 and 2, respectively), followed by either the time of the event (Y=1) or, if no event has occurred, the time of the end of the observation period (Y=0). In this simplified data set, subject 1 had an event at time 400, which gives a line where Y=1. In contrast, no event occurred for subject 2, so Y is set to 0 at the end of the observation period (TIME=4500).
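
The two-lines-per-subject layout described above can be produced programmatically. The following Python sketch is illustrative only: the function name `to_simplified_rows` and the input dictionary keys (`time`, `event`, `trt`, `age`, `sex`) are assumptions made for this example, and the age is assumed to be already expressed in years, as in the simplified data set.

```python
def to_simplified_rows(subjects):
    """Build the simplified data set as a list of text lines.

    Each subject yields two lines: the start of the observation period
    (TIME=0, Y=0) and the event time (Y=1) or the end of observation (Y=0).
    """
    lines = ["ID;TIME;Y;TRT;AGE;SEX;"]
    for s in subjects:
        covariates = f"{s['trt']};{s['age']:.4f};{s['sex']};"
        # First line: start of the observation period, always Y=0.
        lines.append(f"{s['id']};0;0;{covariates}")
        # Second line: Y=1 if an event occurred at `time`, otherwise Y=0.
        y = 1 if s["event"] else 0
        lines.append(f"{s['id']};{s['time']};{y};{covariates}")
    return lines


# The two subjects shown above, written in the assumed input form.
subjects = [
    {"id": 1, "time": 400, "event": True, "trt": 1, "age": 58.7652, "sex": 1},
    {"id": 2, "time": 4500, "event": False, "trt": 1, "age": 56.4463, "sex": 1},
]
simplified = to_simplified_rows(subjects)
print("\n".join(simplified))
```

Running the sketch on these two subjects reproduces the five lines of the simplified data set, including the header.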