Description of column-types used to define responses

Y: response

The Y column-type can be used for continuous, categorical, count or time-to-event data.
When there is no EVID or MDV column (see hereunder), a line is considered a response-line if it contains a value and either there is no dose-column (i.e. no column with column-type AMT) or the dose-column contains the string ‘.’ or a 0. As a consequence, when there are null values in both the dose-column and the response-column, the line is considered a response-line. The following table sums up the different situations:

Line consideration for all dose and response cases

Notice that when both a non-null amount and a measurement are defined on the same line, the choice was made to favor the measurement. To work around this without any EVID column, the user should provide two distinct lines: a dose-line and a response-line. For instance, in the following data set

TIME ID AMT Y
12.1 John 1.1 12.6

the line is considered a response-line: a measurement of 12.6 is set at time 12.1 and no dose is added. It is of course possible to specify a response and a dose at the same time, but the line must be duplicated, as in the following data set

TIME ID AMT Y
12.1 Tom . 12.6
12.1 Tom 1.1 .

In that case, the first line is again considered a response-line, and a measurement of 12.6 is set at time 12.1. The second line is considered a dose-line, with an amount of 1.1 at time 12.1.
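The rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the actual implementation: the function name classify_line is hypothetical, fields are taken as raw strings, a "null" dose is assumed to mean the string "." or a numeric 0, and a missing response is assumed to mean the string ".".

```python
def classify_line(amt, y):
    """Classify one data-set line when there is no EVID or MDV column.

    amt: raw string from the dose-column, or None if there is no such column.
    y:   raw string from the response-column.
    Assumption: a null dose is "." or 0; a missing response is ".".
    """
    dose_is_null = amt is None or amt == "." or float(amt) == 0.0
    has_measurement = y != "."
    # The measurement is favored whenever it is present, and a line with
    # no dose information is treated as a response-line as well.
    if dose_is_null or has_measurement:
        return "response"
    return "dose"
```

With the two data sets above, classify_line("1.1", "12.6") gives "response" (the dose is ignored), while classify_line("1.1", ".") gives "dose".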

For continuous data:

For continuous data, the time and value of each observation are given for each subject, as in the following example:

ID TIME AMT Y
1 0 50 .
1 0.5 . 1.1
1 1 . 10.2
1 1.5 . 8.5
1 2 . 6.3
1 2.5 . 5.5

See for example the theophylline data set, the warfarin data set, and the HIV data set for more practical examples of data sets with continuous outputs.

For categorical data:

In case of categorical data, the observations at each time point can only take values in a fixed and finite set of nominal categories. In the data set, the output categories must be coded as integers, as in the following example:

ID TIME Y
1 0.5 3
1 1 0
1 1.5 2
1 2 2
1 2.5 3

See for example the warfarin data set for a more practical example of a joint continuous and categorical data set.

For count data:

Count data can take only non-negative integer values that come from counting something, e.g., the number of trials required for completing a given task. The task can for instance be repeated several times and the individual's performance followed. In the following data set:

ID TIME Y
1 0 10
1 24 6
1 48 5
1 72 2

10 trials are necessary the first day (t=0), 6 the second day (t=24), etc.

Count data can also represent the number of events happening in regularly spaced intervals, e.g., the number of seizures every week. If the time intervals are not regular, the data may be considered as interval-censored repeated time-to-event data, or the interval length can be given as a regressor to be used to define the probability distribution in the model.

For (repeated) time-to-event data:

In this case, the observations are the “times at which events occur”. An event may be one-off (e.g., death) or repeated (e.g., epileptic seizures, mechanical incidents, strikes). In addition, an event can be exactly observed, interval censored or right censored.

For single events exactly observed:

One must indicate the start time of the observation period with Y=0, and the time of the event (Y=1) or the time of the end of the observation period if no event has occurred (Y=0). In the following example:

ID TIME Y
1 0 0
1 34 1
2 0 0
2 80 0

the observation period lasts from the starting time t=0 to the final time t=80. For individual 1, the event is observed at t=34. For individual 2, no event is observed during the period, so the line at the final time (t=80) records that no event has occurred. See the PBC data set and the Oropharynx data set for practical examples of time-to-event data sets.

For repeated events exactly observed:

One must indicate the start time of the observation period (Y=0), the end time (Y=0) and the time of each event (Y=1). In the following example:

ID TIME Y
1 0 0
1 34 1
1 76 1
1 80 0
2 0 0
2 80 0

Again, the observation period lasts from the starting time t=0 to the final time t=80. For individual 1, two events are observed at t=34 and t=76, and for individual 2, no event is observed during the period.
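As an illustration of how such lines can be read, here is a minimal Python sketch. It is not part of the software: the function name summarize_events is hypothetical, and it assumes that the rows of one individual are given as (time, y) pairs sorted by time, with the first and last lines delimiting the observation period.

```python
def summarize_events(rows):
    """Summarize exactly observed repeated events for one individual.

    rows: list of (time, y) pairs sorted by time (assumed layout).
    Returns (start_time, end_time, list_of_event_times).
    """
    start = rows[0][0]   # first line (Y=0) opens the observation period
    end = rows[-1][0]    # last line closes the observation period
    # Every line after the first with Y=1 marks an event at that time.
    events = [t for t, y in rows[1:] if y == 1]
    return start, end, events
```

For individual 1 above, summarize_events([(0, 0), (34, 1), (76, 1), (80, 0)]) returns the period bounds 0 and 80 together with the event times 34 and 76.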

For single events interval censored:

When the exact time of the event is not known, but only an interval can be given, the start time of this interval is given with Y=0, and the end time with Y=1. As before, the start time of the observation period must be given with Y=0. In the following example:

ID TIME Y
1 0 0
1 32 0
1 35 1

we only know that the event has happened between t=32 and t=35.

For repeated events interval censored:

In this case, we do not know the exact event times, but only the number of events that occurred for each individual in each interval of time. The column-type Y can now take values greater than 1, if several events occurred during an interval. In the following example:

ID TIME Y
1 0 0
1 32 0
1 35 1
1 50 1
1 56 0
1 78 2
1 80 1

No event occurred between t=0 and t=32, 1 event occurred between t=32 and t=35, 1 between t=35 and t=50, none between t=50 and t=56, 2 between t=56 and t=78 and finally 1 between t=78 and t=80.
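The reading of the table above can be sketched in Python. This is a hedged illustration only: the function name interval_counts is hypothetical, and the rows of one individual are assumed to be (time, y) pairs sorted by time, the first line marking the start of the observation period.

```python
def interval_counts(rows):
    """Turn interval-censored repeated-event lines into (start, end, count).

    rows: list of (time, y) pairs sorted by time (assumed layout).
    The Y value on each line counts the events that occurred since the
    previous line; the Y value of the first line only opens the period.
    """
    return [(t0, t1, n) for (t0, _), (t1, n) in zip(rows, rows[1:])]
```

Applied to the data set above, the third tuple returned is (35, 50, 1), i.e. one event between t=35 and t=50.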

Format restrictions (an exception will be thrown otherwise):

  • A data set shall not contain more than one column with column-type Y.
  • The response-column shall contain either a double value or the string “.”.
  • If there is a non null double value in dose-column, there must be a non null double value in the response-column.

Warning

  • If a subject or a subject/occasion has no observation, a warning message is displayed listing the subjects or subject/occasions that have no measurement.

YTYPE: response type

If observations are recorded on several quantities (several concentrations, effects, etc.), the column-type YTYPE allows names to be assigned to the observations of the column-type Y, for mapping with the quantities output by the model. Notice that on a dose-line, the value in the YTYPE column is not read, so the user can set any value (‘.’, the same value as for a concentration, …).
Entries in the column-type YTYPE can be strings or integers; however, we strongly recommend using only alphanumeric characters. The underscore “_” character is allowed in the strings of your data set. The mapping of the YTYPE to the model output (in the OUTPUT block of the Mlxtran model file) follows alphabetical order (and not name matching). In the following data set:

TIME DOSE Y Y_TYPE
0 . 12 conc
5 . 6 conc
10 . 4 effect
15 . 3 effect
20 . 2.1 conc
25 . 2 conc

with the following OUTPUT block in the Mlxtran model file:

OUTPUT:
output = {E, Cc}

the observations tagged with “conc” will be mapped to the first output “E”, and those tagged with “effect” will be mapped to the second output “Cc”, because in alphabetical order “conc” comes before “effect”. To avoid confusion, we recommend using integers in the YTYPE column-type, with “1” corresponding to the first output, “2” to the second, etc. If you have 10 or more types of observations, notice that in alphabetical order “10” comes before “2”.
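The alphabetical mapping can be illustrated with a short Python sketch. It is an approximation under stated assumptions: the function name map_ytype_to_outputs is hypothetical, and Python's default string sort (by code point) is assumed to behave like the alphabetical order used here, which may not hold for mixed case or special characters.

```python
def map_ytype_to_outputs(ytypes, outputs):
    """Pair sorted YTYPE names with model outputs, position by position.

    ytypes:  YTYPE entries of the response-lines, as strings.
    outputs: names listed in the OUTPUT block, in order.
    Assumption: Python's string sort stands in for alphabetical order.
    """
    names = sorted(set(ytypes))
    if len(names) != len(outputs):
        raise ValueError("number of observation types and outputs differ")
    return dict(zip(names, outputs))
```

With the data set above, "conc" is paired with "E" and "effect" with "Cc". With the tags "1", "2" and "10", the sorted order is "1", "10", "2", so "10" is paired with the second output.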
If you use strings, note that “.” is not considered as a repetition of the previous line but as the name of a response. For instance, the following data set creates three different types of responses: “type1”, “.”, and “type2”:

TIME DOSE Y Y_TYPE
0 . 12 type1
5 . 6 type1
10 . 4 .
15 . 3 .
20 . 2.1 type2
25 . 2 type2

Format restrictions (an exception will be thrown otherwise):

  • A data set shall not contain more than one column with column-type YTYPE.

CENS: censored response

  • CENS = 1 means that the value in the response-column (y_{obs}, the content of the column with column-type Y) is an upper limit: the true observation y satisfies y<y_{obs}.
  • CENS = 0 means that the value in the response-column corresponds to a valid observation (no interval associated).
  • CENS = -1 means that the value in the response-column (y_{obs}) is a lower bound: the true observation y satisfies y>y_{obs}.

Format restrictions (an exception will be thrown otherwise):

  • A data set shall not contain more than one column with column-type CENS.
  • There are only three possible values : -1, 0, and 1.
  • String “.” is interpreted as 0.
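The restrictions above can be sketched as a small Python check. This is an illustration, not the actual parser: the function name parse_cens is hypothetical, and it assumes that entries are written as plain integers (a form such as "1.0" is rejected here, which may be stricter than the real behavior).

```python
def parse_cens(field):
    """Convert one raw CENS entry to -1, 0 or 1.

    field: raw string read from the CENS column.
    Assumption: entries are written as integers; "." is read as 0.
    """
    if field == ".":
        return 0
    value = int(field)  # raises ValueError for non-integer text
    if value not in (-1, 0, 1):
        raise ValueError(f"invalid CENS value: {field!r}")
    return value
```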

LIMIT: limit for censored values

When the column LIMIT contains a value and CENS is different from 0, the value in the LIMIT column is interpreted as the second bound of the observation interval. For CENS = 1, this implies that y\in [y_{limit}, y_{obs}]; for CENS = -1, the bounds are swapped and y\in [y_{obs}, y_{limit}].
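The combination of CENS and LIMIT can be sketched in Python as follows. This is a hedged illustration: the function name censoring_interval is hypothetical, a missing LIMIT is represented by None, and an absent second bound is assumed to be unbounded (minus or plus infinity).

```python
import math

def censoring_interval(y_obs, cens, limit=None):
    """Return (low, high) bounds for the true observation y.

    y_obs: value in the response-column.
    cens:  -1, 0 or 1, as in the CENS column.
    limit: value in the LIMIT column, or None if absent (assumption).
    """
    if cens == 0:
        return (y_obs, y_obs)  # valid observation, no interval
    if cens == 1:
        # y_obs is an upper limit; LIMIT, if given, is the lower bound.
        return (-math.inf if limit is None else limit, y_obs)
    if cens == -1:
        # y_obs is a lower bound; LIMIT, if given, is the upper bound.
        return (y_obs, math.inf if limit is None else limit)
    raise ValueError("CENS must be -1, 0 or 1")
```

For instance, censoring_interval(0.5, 1, 0.0) returns (0.0, 0.5), and censoring_interval(5.0, -1, 6.0) returns (5.0, 6.0).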

Format restrictions (an exception will be thrown otherwise):

  • A data set shall not contain more than one column with column-type LIMIT.
  • A data set shall not contain any column with column-type LIMIT if no column with column-type CENS is present.
  • Column LIMIT shall contain either a string that can be converted to a double or “.”.

Example of censored data definition

The proposed example illustrates upper and lower bounds on a classical data set for a classical PK model (first-order absorption and linear elimination). From the measurement point of view:

  • There is a lower bound at 0.5, as the sensor is not able to measure lower concentrations; this corresponds to the CENS=1 case. Moreover, the concentration cannot be lower than 0, thus LIMIT=0.
  • There is an upper bound at 5, as the sensor is not able to measure higher concentrations; this corresponds to the CENS=-1 case. Moreover, from the experimental/modeler point of view, the concentration cannot be higher than 6, thus LIMIT=6.

The measurement is represented in the following figure

[Figure: kaVCl]

The measurement corresponds to the blue stars, the real values when censoring arises are in red and green. The corresponding data set is

ID Time Y CENS LIMIT 
1  0  0.5 1 0 
1  1  0.5 1 0 
1  2  4.7 0 0 
1  3  5.0 -1 6 
1  4  5.0 -1 6 
1  5  4.5 0 0 
1  6  3.8 0 0 
* * * * * 
1 15  0.6 0 0 
1 16  0.5 0 0 
1 17  0.5 1 0 
1 18  0.5 1 0 
* *   *   * *  

The mathematical handling of censored data is described here.