PBC data set
Primary biliary cirrhosis (PBC) is a rare but fatal chronic liver disease of unknown cause, with a prevalence of about 50 cases per million population. The primary pathologic event appears to be the destruction of interlobular bile ducts, which may be mediated by immunologic mechanisms.
Between January 1974 and May 1984, the Mayo Clinic conducted a double-blind randomized trial in primary biliary cirrhosis of the liver (PBC), comparing the drug D-penicillamine (DPCA) with a placebo. Of the patients seen at the Clinic while the trial was open for registration, 424 met the eligibility criteria. In 312 of these 424 cases, both the treating physician and the patient agreed to participate in the randomized trial. The date of randomization and a large number of clinical, biochemical, serologic, and histologic parameters were recorded for each of the 312 trial patients. The data were analyzed in 1986 for presentation in the clinical literature; for that analysis, disease and survival status as of July 1986 were recorded for as many patients as possible. By that date, 125 of the 312 patients had died, only 11 of them from causes not attributable to PBC. Eight patients had been lost to follow-up, and 19 had undergone liver transplantation.
The data set used here comes from Counting Processes and Survival Analysis by T. Fleming and D. Harrington (1991), published by John Wiley & Sons. The data set can be seen here, and the corresponding Datxplore project here (note that both files must be in the same folder to be correctly linked).
The following figure shows the survival curve and the mean number of events over time. Note that this figure was generated using Datxplore.
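For readers who want to reproduce a survival curve of this kind outside Datxplore, the sketch below computes the standard Kaplan-Meier estimate from (time, event) pairs. It is an illustration under stated assumptions, not Datxplore's own algorithm: `kaplan_meier` is a hypothetical helper, an event value of 1 means the event was observed and 0 means the subject was censored, and censored subjects tied with an event time are kept in the risk set at that time (the usual convention).

```python
from collections import Counter


def kaplan_meier(times, events):
    """Hypothetical helper: return [(t, S(t))] at each distinct event time.

    times  -- follow-up time for each subject (e.g. futime in days)
    events -- 1 if the event was observed at that time, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    # Number of observed events at each time.
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    # Number of subjects leaving the risk set (event or censoring) at each time.
    exits = Counter(times)
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving time t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Censored subjects tied at t are removed only after the event step.
        at_risk -= exits[t]
    return curve
```

For example, with made-up toy data (not PBC records), `kaplan_meier([400, 4500, 1000, 1000, 2000], [1, 0, 1, 0, 1])` steps down to 0.8, about 0.6 and about 0.3 at times 400, 1000 and 2000; the estimate only changes at times where at least one event occurs.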
Many covariates are available in this data set:
- id: case number
- futime: number of days between registration and the earlier of death, transplantation, or study analysis time in July 1986
- status: 0 = alive, 1 = liver transplant, 2 = dead
- drug: 1 = D-penicillamine, 2 = placebo
- age: age in days
- sex: 0 = male, 1 = female
- ascites: presence of ascites (0 = no, 1 = yes)
- hepato: presence of hepatomegaly (0 = no, 1 = yes)
- spiders: presence of spiders (0 = no, 1 = yes)
- edema: presence of edema (0 = no edema and no diuretic therapy for edema; 0.5 = edema present without diuretics, or edema resolved by diuretics; 1 = edema despite diuretic therapy)
- bili: serum bilirubin in mg/dl
- chol: serum cholesterol in mg/dl
- albumin: albumin in gm/dl
- copper: urine copper in ug/day
- alk_phos: alkaline phosphatase in U/liter
- sgot: SGOT in U/ml
- trig: triglycerides in mg/dl
- platelet: platelets per cubic ml/1000
- protime: prothrombin time in seconds
- stage: histologic stage of disease
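Because several covariates are stored as numeric codes, it can help to decode them before exploring the data. The sketch below is hypothetical: the `decode` helper, the one-dict-per-subject layout, the 365.25-day year used to convert age, and the choice of death (status = 2) as the event of interest, with transplant treated as censoring, are all assumptions rather than part of the data set's definition.

```python
# Hypothetical lookup tables built from the coding described above.
STATUS = {0: "alive", 1: "liver transplant", 2: "dead"}
DRUG = {1: "D-penicillamine", 2: "placebo"}
SEX = {0: "male", 1: "female"}
YESNO = {0: "no", 1: "yes"}
EDEMA = {
    0.0: "no edema and no diuretic therapy",
    0.5: "edema without diuretics, or resolved by diuretics",
    1.0: "edema despite diuretic therapy",
}
DAYS_PER_YEAR = 365.25  # assumed conversion factor for age in days


def decode(record):
    """Return a copy of one subject's record with coded fields replaced by labels."""
    out = dict(record)  # leave the input record unchanged
    out["status"] = STATUS[record["status"]]
    out["drug"] = DRUG[record["drug"]]
    out["sex"] = SEX[record["sex"]]
    for key in ("ascites", "hepato", "spiders"):
        out[key] = YESNO[record[key]]
    out["edema"] = EDEMA[float(record["edema"])]
    out["age_years"] = record["age"] / DAYS_PER_YEAR
    # Assumption: death is the event of interest; transplant and
    # alive-at-analysis are both treated as censored observations.
    out["death_event"] = 1 if record["status"] == 2 else 0
    return out
```

For instance, a made-up record with `status=2`, `drug=2` and `age=18262.5` decodes to `"dead"`, `"placebo"` and an age of 50.0 years, with `death_event` set to 1.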
The two following figures show the survival curve and the mean number of events over time for two groups: subjects younger than 52.3 years, and all other subjects. Note that these figures were generated using Datxplore.
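The grouping used for these figures can be reproduced with a short sketch. It assumes each subject is a dict with an `age` field; `split_by_age` is a hypothetical helper, 365.25 days per year is an assumed conversion factor for ages stored in days, and placing a subject aged exactly 52.3 years in the older group is an assumption, since the text only defines the younger group as "younger than 52.3 years".

```python
DAYS_PER_YEAR = 365.25  # assumed conversion factor


def split_by_age(records, threshold_years=52.3, age_in_days=True):
    """Hypothetical helper: split records into (younger, older) lists.

    A subject goes to the younger group if its age is strictly below
    threshold_years; everyone else (including ties) goes to the older group.
    """
    younger, older = [], []
    for rec in records:
        age = rec["age"] / DAYS_PER_YEAR if age_in_days else rec["age"]
        (younger if age < threshold_years else older).append(rec)
    return younger, older
```

Each group can then be passed separately to whatever survival estimator is used, so that one curve is obtained per age group.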
Simplified PBC data set
The data set for subjects 1 and 2 can be defined as follows:
ID;TIME;Y;TRT;AGE;SEX
1;0;0;1;58.7652;1
1;400;1;1;58.7652;1
2;0;0;1;56.4463;1
2;4500;0;1;56.4463;1
The start of the observation period must be indicated with Y=0 (lines 1 and 3 for subjects 1 and 2, respectively). It is followed by a line giving either the time of the event (Y=1) or, if no event occurred, the time at which the observation period ended (Y=0). In this simplified data set, subject 1 had an event at time 400, which gives a line with Y=1. In contrast, no event occurred for subject 2, so Y is set to 0 at the end of the observation (TIME=4500).
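The two-line rule described above can be sketched in Python as follows. The helper names `to_rows` and `to_text`, and the per-subject dict layout with `time`, `event`, `trt`, `age` and `sex` fields, are hypothetical; the output uses `;` as the separator and reproduces the rows for subjects 1 and 2 shown above, written without a trailing semicolon on each line.

```python
import csv
import io

HEADER = ["ID", "TIME", "Y", "TRT", "AGE", "SEX"]


def to_rows(subjects):
    """Expand one record per subject into two data lines:
    a start line (TIME=0, Y=0) and an end line at the subject's last time,
    with Y=1 if an event occurred there and Y=0 if observation simply ended."""
    rows = []
    for s in subjects:
        covariates = [s["trt"], s["age"], s["sex"]]
        rows.append([s["id"], 0, 0, *covariates])
        rows.append([s["id"], s["time"], 1 if s["event"] else 0, *covariates])
    return rows


def to_text(rows):
    """Serialize the header and rows as semicolon-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()


subjects = [
    {"id": 1, "time": 400, "event": True, "trt": 1, "age": 58.7652, "sex": 1},
    {"id": 2, "time": 4500, "event": False, "trt": 1, "age": 56.4463, "sex": 1},
]
print(to_text(to_rows(subjects)), end="")
```

Keeping the covariates on both lines of each subject mirrors the example above, where TRT, AGE and SEX are repeated on the start line and on the end line.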