
# Version 2018R1

This documentation specifies the format of datasets for the MonolixSuite 2018R1. It details:

• General structure: structure of a dataset for population modeling
• Format rules: rules to format your experimental data, using the available column types
• Examples: examples of real datasets with typical features (continuous data, discrete data, time-to-event data, censored data, data with several types of measurements, …)
• Nonmem differences: key differences with the Nonmem format

## Dataset for population modeling

The datasets considered here are dedicated to population modeling. The population approach describes phenomena observed in each of a set of individuals, as well as the variability between individuals. The data is thus individual data, and is often longitudinal (over time). For each subject, the dataset contains measurements, dose regimen, covariates, etc., i.e. all collected information.

## General format

The data must be in long format, i.e. each row is one time point for one subject. For each row, the individual's ID, observations, dose amount, covariates, etc. are recorded in different columns. The column headers in the dataset are free, but the columns must be tagged using the available column types when defining the data in the applications of the MonolixSuite, so that the application knows how to interpret the data. The column types are very similar to, and compatible with, the structure used by the Nonmem software.

The file extension should be .txt or .csv. A header line is needed, and the data must be separated by a tab “\t”, a comma “,”, a semicolon “;” or a space “ ”.

### Data set structure

For each subject, the data set contains measurements, dose regimen, covariates, etc., i.e. all collected information. The data must be in the long format, i.e. each line corresponds to one individual and one time point. Different types of information (dose, observation, covariate, etc.) are recorded in different columns, which must be tagged with a column type (see below). The column types are very similar to, and compatible with, the structure used by the Nonmem software (the differences are listed here).

Depending on the information it contains, each line will be considered as (with exception of the header line):

• dose-line: line that contains information about the dose regimen (and possibly also about covariates and regression variables)
• response-line: line that contains an observation (and possibly also information about covariates and regression variables)
• both dose and response-line: line that contains information about both the dose regimen and an observation (and possibly also about covariates and regression variables)

Note that in the MonolixSuite2016R1, a line could not be both a dose-line and a response-line. Two lines were necessary to define a dose and a measurement occurring at the same time.

### Description of column-types

The first line of the data set must be a header line, defining the names of the columns. The column names are completely free. In the MonolixSuite applications, when defining the data, the user will be asked to assign each column to a column-type (see here for an example of this step). The column type indicates to the application how to interpret the information in that column. The available column types are given below:

Column-types used for all types of lines:

Column-types used for response-lines:

Column-types used for dose-lines:

### ID: subject identifier

The column is used to identify the different subjects and is mandatory. Its content is totally free (integers, doubles, strings…), but we recommend using integers for better readability.

#### Examples

• The string ‘.’ will not be interpreted as a repetition of the previous line. As a consequence a data set of the form
ID * *
John * *
John * *
Mike * *
. * *


contains 3 different subjects: ‘John’, ‘Mike’ and ‘.’.

• Contrary to NONMEM, the lines corresponding to the same subject do not need to be next to each other. Thus, the following file contains 2 subjects with IDs “1” and “2”.
ID * *
1 * *
1 * *
2 * *
2 * *
1 * *

• The IDs are not sorted in lexicographical order but by order of appearance in the data set.
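The two examples above can be reproduced with a minimal Python sketch that collects subject identifiers in order of first appearance. The function name `subject_ids` and the row representation (tuples whose first field is the ID) are assumptions made for this illustration; they are not part of the MonolixSuite.

```python
def subject_ids(rows):
    """Return the distinct IDs in order of first appearance (hypothetical helper)."""
    # A dict keeps insertion order, so duplicates are dropped without sorting.
    return list(dict.fromkeys(row[0] for row in rows))


# Lines of the same subject do not need to be contiguous.
numeric_rows = [("1",), ("1",), ("2",), ("2",), ("1",)]
print(subject_ids(numeric_rows))

# "." is an ordinary identifier, not a repetition of the previous line.
named_rows = [("John",), ("John",), ("Mike",), (".",)]
print(subject_ids(named_rows))
```

The first call lists two subjects and the second lists three, the last one being the subject named “.”.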

#### Format restrictions

• A data set shall contain one and only one column ID.
• The ID must be defined for all lines.

### OCCASION: occasion identifiers

Occasions define different periods of time within individuals. Occasions may be used (but do not have to be) to define inter-occasion (intra-patient) variability. The MonolixSuite allows the definition of several columns with the column-type OCCASION, which can be used to define several levels of inter-occasion variability. The OCCASION columns can contain only integers (neither necessarily starting at one, nor necessarily consecutive), which represent occasion identifiers. All time points belonging to one occasion must be in one block (i.e. not interrupted by time points of another occasion). When switching from one occasion to the next one, time can restart at the initial value or continue. If different occasions contain time points that overlap, a washout will automatically be added.

#### Examples and typical situations

• Cross-over study: In that case, data are collected for each patient during two independent treatment periods, and the time definitions of the periods overlap (e.g. both periods start at 0). A column-type OCCASION can be used to identify the periods. See here for an example.
• Occasions with washout (due to EVENT ID = 4): In that case, there is no overlap between the periods. The time is increasing but the dynamical system (i.e. the compartments) is reset when the second period starts. In particular, EVENT ID = 4 indicates that the system is reset (washout), for example when a new dose is administered. See here for an example.
• Occasions with washout (due to overlapping times): In that case, the time is increasing and the overlap between two time points of two different occasions creates a washout. If the washout is not desired, one of the two times can be offset by a small value to avoid the overlap.
• Occasions without washout: In that case, there is no overlap between the periods. The time is increasing and we want to differentiate periods in terms of occasions without any reset of the dynamical system. In the example defined here, multiple doses are administered to each patient and each period of time between successive doses is defined as a different occasion via the column-type OCCASION.

However, the following situation, which would aim at defining the same occasion index to all morning doses, is not allowed:

#### How can occasions appear while no OCCASION column is defined?

Occasions can be generated even if no OCCASION column is defined in the data set. These occasions become visible through a button that appears in the Monolix interface, which allows inter-occasion variability to be added to the model. Occasions are automatically created if there is an EVENT ID column with a value 4 on a record that is not the first record of the individual. Within an individual, each EVENT ID = 4 creates a new occasion. Inter-occasion variability can be considered for the automatically created OCCevid occasions, but it does not have to be.

The following data sets are equivalent.

ID TIME  Y  OCC
1   0    0   1
1   1    2   1
1   2    2   1
1   0    0   2
1   4    1   2
1   5    2   2

ID TIME  Y  EVID
1   0    0   0
1   1    2   0
1   2    2   0
1   0    0   4
1   4    1   0
1   5    2   0


Remark: In MonolixSuite versions prior to 2018R1, occasions were also generated by SS=1.
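The rule for automatically created occasions can be sketched as follows. This is only an illustration of the rule as stated above, not the MonolixSuite implementation; the function name `occasions_from_evid` and the choice to number occasions from 1 are assumptions.

```python
def occasions_from_evid(evids):
    """Assign an occasion index to each record of ONE individual.

    A new occasion starts at every EVID == 4 that is not the first record.
    Numbering from 1 is an assumption made for this sketch.
    """
    occasion = 1
    result = []
    for index, evid in enumerate(evids):
        if evid == 4 and index > 0:
            occasion += 1  # the EVID = 4 record opens the next occasion
        result.append(occasion)
    return result


# EVID column of the second data set above.
print(occasions_from_evid([0, 0, 0, 4, 0, 0]))
```

Applied to the EVID column of the second data set, the sketch returns the OCC column of the first one.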

#### Frequently asked questions on occasions in the data set

• Do all the individuals need to share the same sequence of occasions? No, the number of occasions and the times defining the occasions can differ from one individual to another.
• Do the occasion indices need to start at one for each individual? No.
• Do the occasion indices need to be consecutive for each individual? No.
• Is there any limit in terms of number of occasions? No.
• Is it possible to have several levels of occasions? Yes, occasions can be defined on several levels, see an example here.

#### Format restrictions

• The OCCASION columns should contain only integers.
• If the OCCASION column-type is used, the OCCASION must be defined for all lines.

### TIME: data time stamp

The TIME column defines the time at which dose and observation events occurred. When no DATE/DAT1/DAT2/DAT3 column is present, the time represents the time elapsed. When a DATE/DAT1/DAT2/DAT3 column is present, it represents the time of the day. The time can be defined using a double, or a clock format hh:mm or hh:mm:ss. Negative time values are allowed. When the double format is used, and no DATE/DAT1/DAT2/DAT3 column is present, the time has no predefined units. In all other cases, the time units are hours.

When a subject has times in the clock format, all times are converted into relative hours, as in the following example:

| TIME | Reconstructed time |
| --- | --- |
| 10:00 | 10 |
| 10:30 | 10.5 |
| 14:00 | 14 |
| 08:59 | 8.983333 |

When there is no column-type TIME, the column-type DATE is used to time-stamp data.
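The clock-format conversion shown above amounts to expressing hh:mm or hh:mm:ss as decimal hours. A minimal Python sketch follows; the function name `clock_to_hours` is hypothetical and the error handling is an assumption, since the exact behaviour of the MonolixSuite on malformed values is not described here.

```python
def clock_to_hours(text):
    """Convert 'hh:mm' or 'hh:mm:ss' into decimal hours (hypothetical helper)."""
    parts = text.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected hh:mm or hh:mm:ss, got {text!r}")
    hours = int(parts[0])
    minutes = int(parts[1])
    seconds = int(parts[2]) if len(parts) == 3 else 0
    return hours + minutes / 60 + seconds / 3600


for clock in ("10:00", "10:30", "14:00", "08:59"):
    print(clock, round(clock_to_hours(clock), 6))
```

The printed values match the reconstructed times of the example.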

#### Format restrictions

• A data set shall not contain more than one column with the column-type TIME.
• If the TIME column-type is used, the TIME must be defined for all lines.
• The string “.” will not be interpreted as a repetition of the previous line; it is therefore non-compliant with the formats listed above.

### DATE/DAT1/DAT2/DAT3: date information

The DATE column-type can be used to indicate the date of the dose or observation event. It is usually used in combination with the TIME column-type, which in that case indicates the time of the day. To accommodate the different date formats, several column types are possible:

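Format and associated date column name:

| Column type | Order of day, month and year | Accepted formats |
| --- | --- | --- |
| DATE | month, day, year | mm/dd/yy or mm/dd/yyyy, mm-dd-yy or mm-dd-yyyy |
| DAT1 | day, month, year | dd/mm/yy or dd/mm/yyyy, dd-mm-yy or dd-mm-yyyy |
| DAT2 | year, month, day | yy/mm/dd or yyyy/mm/dd, yy-mm-dd or yyyy-mm-dd |
| DAT3 | year, day, month | yy/dd/mm or yyyy/dd/mm, yy-dd-mm or yyyy-dd-mm |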

By default, when the year is coded with two digits, it is interpreted as 20xx.
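To make the field order of each date column type concrete, here is a minimal Python sketch of a parser. It is written under stated assumptions: the name `parse_date` and the `FIELD_ORDER` mapping are hypothetical, the field orders follow the formats listed above, and a two-digit year is detected from the length of the year field and mapped to 20xx.

```python
from datetime import date

# Field order assumed for each date column type, following the formats listed above.
FIELD_ORDER = {
    "DATE": ("month", "day", "year"),
    "DAT1": ("day", "month", "year"),
    "DAT2": ("year", "month", "day"),
    "DAT3": ("year", "day", "month"),
}


def parse_date(text, column_type="DATE"):
    """Parse one date entry according to its column type (hypothetical helper)."""
    if "/" in text and "-" in text:
        raise ValueError("mixed delimiters are not allowed")
    separator = "/" if "/" in text else "-"
    parts = text.split(separator)
    if len(parts) != 3 or any(part == "" for part in parts):
        raise ValueError(f"incomplete date: {text!r}")
    fields = dict(zip(FIELD_ORDER[column_type], parts))
    year = int(fields["year"])
    if len(fields["year"]) == 2:
        year += 2000  # two-digit years are interpreted as 20xx
    return date(year, int(fields["month"]), int(fields["day"]))


print(parse_date("03/15/18", "DATE"))
print(parse_date("15-03-2018", "DAT1"))
```

Python's own `%y` directive is deliberately not used here, because it maps some two-digit years to 19xx, which would not match the 20xx rule stated above.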

#### Format restrictions

• A data set shall not contain more than one column-type DATE / DAT1 / DAT2 / DAT3.
• Year, day, and month shall be integers.
• The separator must be “/” or “-”.
• The character “.” will not be interpreted as a repetition of the previous line; like any non-compliance with the formats listed above, it will throw an exception.
• All the lines must be filled in correctly with the same delimiter, according to the specified date format: i.e., no empty year, no empty month, no empty day, no mix of delimiters.

#### Timestamp summary

As can be seen, there are several ways to define the timestamp of the data set, depending on whether there is a TIME column and whether there is a DATE column.

| | TIME column present | TIME column not present |
| --- | --- | --- |
| DATE column present | DATE column is considered to represent the day and the TIME column the hour within this day | DATE column is considered to represent the time |
| DATE column not present | TIME column is considered to represent the time (no specific units) | First regression-column will be used to timestamp data |

#### FAQ

• My data is not “over time”, what should I do? You can arbitrarily set the time of each observation to 0.
• What happens if neither TIME nor DATE is defined? We strongly encourage the user to explicitly define the TIME column-type. However, if there is neither TIME nor DATE column-types, the first regression-column (i.e. first column with column-type REGRESSION) will be used to timestamp data. Moreover, if there is no TIME, no DATE and no REGRESSION column-type, an arbitrary time is computed.

### OBSERVATION: response

The OBSERVATION column-type (formerly Y) can be used to record continuous, categorical, count or time-to-event data.

When there is no EVID or MDV column (“forcing” the usage of the measurement or the dose), the observation is taken into account if the value is not ‘.’.

Remarks

• If there is a non-null dose and a value in the response-column, the line is considered as both a dose and a response. It was formerly considered as a response.
• In MonolixSuite versions prior to 2018R1, when both a non-null amount and a measurement were defined on the same line, the measurement was favored. This is no longer the case. However, providing two distinct lines, one dose-line and one response-line, is still possible and recommended.

#### For continuous data:

The value represents what has been measured (e.g. concentrations) and can be any double value.

Examples:

• Basic example:
ID TIME AMT   Y
1   0   50     .
1 0.5    .   1.1
1   1    .   9.2
1 1.5    .   8.5
1   2    .   6.3
1 2.5    .   5.5

#### For categorical data:

In case of categorical data, the observations at each time point can only take values in a fixed and finite set of nominal categories. In the data set, the output categories must be coded as consecutive integers.

Examples:

• Basic example:
ID TIME Y
1 0.5   3
1   1   0
1 1.5   2
1   2   2
1 2.5   3

#### For count data:

Count data can take only non-negative integer values that come from counting something, e.g., the number of trials required for completing a given task. The task can for instance be repeated several times and the individual's performance followed.

Count data can also represent the number of events happening in regularly spaced intervals, e.g. the number of seizures every week. If the time intervals are not regular, the data may be considered as repeated interval-censored time-to-event data, or the interval length can be given as a regressor to be used to define the probability distribution in the model.

Examples:

• Basic example: in the data set below, 10 trials are necessary the first day (t=0), 6 the second day (t=24), etc.
ID TIME  Y
1   0   10
1  24    6
1  48    5
1  72    2

#### For (repeated) time-to-event data:

In this case, the observations are the “times at which events occur”. An event may be one-off (e.g., death) or repeated (e.g., epileptic seizures, mechanical incidents, strikes). In addition, an event can be exactly observed, interval censored or right censored. The figure below summarizes the different situations:

##### For single events exactly observed:

One must indicate the start time of the observation period with Y=0, and the time of the event (Y=1) or the time of the end of the observation period if no event has occurred (Y=0).

Examples:

• Basic example: in the following dataset, the observation period lasts from the starting time t=0 to the final time t=80. For individual 1, the event is observed at t=34, and for individual 2, no event is observed during the period. Thus the line at the final time (t=80) records that no event occurred.
ID TIME Y
1   0   0
1  34   1
2   0   0
2  80   0

##### For repeated events exactly observed:

One must indicate the start time of the observation period (Y=0), the end time (Y=0) and the time of each event (Y=1).

Examples:

• Basic example: below, the observation period lasts from the starting time t=0 to the final time t=80. For individual 1, two events are observed at t=34 and t=76, and for individual 2, no event is observed during the period.
ID TIME Y
1   0   0
1  34   1
1  76   1
1  80   0
2   0   0
2  80   0

##### For single events interval censored:

When the exact time of the event is not known, but only an interval can be given, the start time of this interval is given with Y=0, and the end time with Y=1. As before, the start time of the observation period must be given with Y=0.

Examples:

• Basic example: we only know that the event has happened between t=32 and t=35.
ID TIME Y
1   0   0
1  32   0
1  35   1

##### For repeated events interval censored:

In this case, we do not know the exact event times, but only the number of events that occurred for each individual in each interval of time. The observation column (Y in the example below) can now take values greater than 1, if several events occurred during an interval.

Examples:

ID TIME Y
1 0 0
1 32 0
1 35 1
1 50 1
1 56 0
1 78 2
1 80 1

No event occurred between t=0 and t=32, 1 event occurred between t=32 and t=35, 1 between t=35 and t=50, none between t=50 and t=56, 2 between t=56 and t=78 and finally 1 between t=78 and t=80.
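The reading of the data set above can be sketched in a few lines of Python: each row after the first closes an interval that starts at the previous time, and its Y value is the number of events in that interval. The function name `events_per_interval` is hypothetical and the sketch handles a single individual only.

```python
def events_per_interval(times, counts):
    """Return (start, end, number_of_events) for each interval of ONE individual.

    The first row only marks the start of the observation period, so its
    count is not attached to any interval.
    """
    return [
        (times[i - 1], times[i], counts[i])
        for i in range(1, len(times))
    ]


times = [0, 32, 35, 50, 56, 78, 80]
counts = [0, 0, 1, 1, 0, 2, 1]
for start, end, n_events in events_per_interval(times, counts):
    print(f"{n_events} event(s) between t={start} and t={end}")
```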

#### Format restrictions

• A data set shall not contain more than one column with column-type OBSERVATION.
• Response-column shall contain double value or string “.”.
• If there is a non null double value in dose-column, there must be a non null double value in the response-column.

#### Warnings

• If a subject or a subject/occasion has no observations, a warning message is displayed, listing the subjects or subject/occasions that have no measurements.

#### FAQ

• My data is not “over time”, what should I do? You can arbitrarily set the time of each observation to 0.

### OBSERVATION ID: response type (former YTYPE)

If observations are recorded on several quantities (several concentrations, effects, etc.), the column-type OBSERVATION ID makes it possible to assign names to the observations of the OBSERVATION column, for mapping with the quantities output by the model. Notice that for a dose-line, the value in the OBSERVATION ID column is not read, so the user can set any value (‘.’, the same value as for a concentration, …).
Entries in the OBSERVATION ID column (formerly YTYPE) can be strings or integers; however, we strongly recommend using only alphanumeric characters. The underscore “_” character is allowed in the strings of your data set. The mapping of the OBSERVATION ID to the model output (in the OUTPUT block of the Mlxtran model file) is done following alphabetical order (and not name matching). In the following data set:

TIME DOSE Y Y_TYPE
0 . 12 conc
5 . 6 conc
10 . 4 effect
15 . 3 effect
20 . 2.1 conc
25 . 2 conc

with the following OUTPUT block in the Mlxtran model file:

OUTPUT:
output = {E, Cc}

the observations tagged with “conc” will be mapped to the first output “E”, and those tagged with “effect” will be mapped to the second output “Cc”, because in alphabetical order “conc” comes before “effect”. To avoid confusion, we recommend using integers in the OBSERVATION ID column-type, with “1” corresponding to the first output, “2” to the second, etc. If you have more than 10 types of observations, notice that in alphabetical order “10” comes before “2”.
If you use strings, note that “.” is not considered as a repetition of the previous line but as the name of a response. For instance, the following data set creates three different types of responses: “type1”, “.”, and “type2”:
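The consequence of mapping by order rather than by name can be illustrated with a short Python sketch. It assumes that “alphabetical order” behaves like Python's default string sort, which may differ from the MonolixSuite in edge cases (upper/lower case, special characters); the function name `map_observation_ids` is hypothetical.

```python
def map_observation_ids(observation_ids, outputs):
    """Pair the sorted distinct observation ids with the model outputs, in order."""
    ordered_ids = sorted(set(observation_ids))  # string sort, not name matching
    return dict(zip(ordered_ids, outputs))


ids = ["conc", "conc", "effect", "effect", "conc", "conc"]
print(map_observation_ids(ids, ["E", "Cc"]))

# With string sorting, "10" comes before "2".
print(sorted(["1", "2", "10"]))
```

In the first call, “conc” ends up paired with “E” even though the names suggest otherwise, which is why integer identifiers matching the output order are the safer choice.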

TIME DOSE Y Y_TYPE
0 . 12 type1
5 . 6 type1
10 . 4 .
15 . 3 .
20 . 2.1 type2
25 . 2 type2


#### Format restrictions (an exception will be thrown otherwise):

• A data set shall not contain more than one column with column-type OBSERVATION ID.

### CENSORED: censored response

• CENSORED = 1 means that the value in response-column ($y_{obs}$, the content of the column with column-type OBSERVATION) is an upper limit: the true observation y verifies $y<y_{obs}$.
• CENSORED = 0 means the value in response-column corresponds to a valid observation (no interval associated).
• CENSORED = -1 means that the value in response-column ($y_{obs}$) is a lower bound, true observation y verifies $y>y_{obs}$.

#### Format restrictions (an exception will be thrown otherwise):

• A data set shall not contain more than one column with column-type CENSORED.
• There are only three possible values : -1, 0, and 1.
• String “.” is interpreted as 0.

### LIMIT: limit for censored values

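When the LIMIT column contains a value and CENS is different from 0, the value in the LIMIT column is interpreted as the second bound of the observation interval. For CENS = 1, it implies that $y\in [y_{limit}, y_{obs}]$. For CENS = -1, as in the example given further down, the limit is the upper bound, i.e. $y\in [y_{obs}, y_{limit}]$.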
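The combination of the CENSORED and LIMIT columns can be summarized by the interval in which the true observation lies. The Python sketch below is an illustration only: the function name `observation_interval` is hypothetical, an absent LIMIT is represented by `None` and treated as an unbounded side, and the CENS = -1 case with a LIMIT follows the example given later in this document.

```python
import math


def observation_interval(y_obs, cens, limit=None):
    """Return (lower, upper) bounds for the true observation (hypothetical helper)."""
    if cens == 0:
        return (y_obs, y_obs)  # valid observation, no interval
    if cens == 1:
        # y_obs is an upper limit; LIMIT, when given, is the lower bound.
        lower = -math.inf if limit is None else limit
        return (lower, y_obs)
    if cens == -1:
        # y_obs is a lower limit; LIMIT, when given, is the upper bound.
        upper = math.inf if limit is None else limit
        return (y_obs, upper)
    raise ValueError("CENSORED must be -1, 0 or 1")


print(observation_interval(0.5, 1, 0))
print(observation_interval(5.0, -1, 6))
print(observation_interval(4.7, 0))
```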

#### Format restrictions (an exception will be thrown otherwise):

• A data set shall not contain more than one column with column-type LIMIT.
• A data set shall not contain any column with column-type LIMIT if no column with column-type CENS is present.
• Column LIMIT shall contain either a string that can be converted to a double or “.”.

### Example of censored data definition

The proposed example illustrates the case of upper and lower bounds on a typical data set for a classical PK model (first-order absorption and linear elimination). From the measurement point of view:

• There is a lower bound at 0.5, as the sensor is not able to measure lower concentrations; this corresponds to the CENS=1 case. Moreover, the concentration cannot be lower than 0, thus LIMIT=0.
• There is an upper bound at 5, as the sensor is not able to measure higher concentrations; this corresponds to the CENS=-1 case. Moreover, from the experimental/modeler point of view, the concentration cannot be higher than 6, thus LIMIT=6.

The measurement is represented in the following figure

The measurement corresponds to the blue stars, the real values when censoring arises are in red and green. The corresponding data set is

ID Time Y CENS LIMIT
1  0  0.5 1 0
1  1  0.5 1 0
1  2  4.7 0 0
1  3  5.0 -1 6
1  4  5.0 -1 6
1  5  4.5 0 0
1  6  3.8 0 0
* * * * *
1 15  0.6 0 0
1 16  0.5 0 0
1 17  0.5 1 0
1 18  0.5 1 0
* *   *   * *


The mathematical handling of censored data is described here.

### AMOUNT: dose amount

The content of column AMOUNT will be called the dose-column. It shall contain either a double value or the string “.”. When there is no EVID or MDV column, a line whose dose-column contains a double value different from 0 is considered as a dose-line (i.e. a line containing dose information).

Remarks

• If there is a non-null dose and a value in the response-column, the line is considered as both a dose and a response. It was formerly considered as a response.
• In MonolixSuite versions prior to 2018R1, when both a non-null amount and a measurement were defined on the same line, the measurement was favored. This is no longer the case. However, providing two distinct lines, one dose-line and one response-line, is still possible and recommended.

#### Format restrictions (an exception will be thrown otherwise):

• A data set shall not contain more than one column-type AMOUNT.
• AMOUNT column shall either contain a double value or string “.”.

The goal of the ADM column is to define several types of administration (e.g. oral administration, intravenous, …). The integer in the ADM column works like a flag, which can be used in the model file to link the dose information of the data set to a specific administration route in the model. For instance, with the following data set:

ID TIME AMT ADM Y
John 0 10 1 .
Eric 0 20 2 .

and the following PK block in the Mlxtran model file:

PK:
iv(type=1)
oral(type=2, ka)

the subject John will receive a dose of 10 via a bolus iv, while subject Eric will receive a dose of 20 orally with first-order rate constant ka. The identifier in the ADM column should match the “type=” field of the macro. We recommend using ADM to define the type of dose only, and setting ADM=“.” for response-lines (in this case, the string “.” will not be interpreted as a repetition of the previous line).

Moreover, for response-lines, the ADM column can also carry the type of response (as YTYPE does). Thus, if there are several outputs and several administration routes, it is possible to set all the information in the ADM column. The possible combinations of YTYPE and ADM are summarized in the following table:

| Type of line \ Case | YTYPE off / ADM off | YTYPE on / ADM off | YTYPE off / ADM on | YTYPE on / ADM on |
| --- | --- | --- | --- | --- |
| Response line | Only one output | Defined using YTYPE | Defined using ADM | Defined using YTYPE |
| Dose line | Only one administration route (type = 1) | Only one administration route (type = 1) | Defined using ADM | Defined using ADM |

Notice that, for readability and better understanding, we strongly recommend to

• use ADM to define the type of dose only, and set ADM=“.” for response-lines
• use YTYPE to define the type of output, and set YTYPE=“.” or the first value for dose-lines

#### Format restrictions (an exception will be thrown otherwise):

• For dose-lines, the column shall contain only positive integers. For response-lines strings or integers are allowed.
• A data set shall not contain more than one ADM column-type.

### RATE, TINF: rate and infusion duration

These columns define the rate (RATE column-type) or duration (TINF column-type) of doses administered as infusions. The column content is meaningful only for dose-lines. The rate and duration information is transferred to the model via the use of the iv macro. If a RATE is defined, the duration of the infusion will be AMOUNT/RATE. If a TINF is defined, the rate will be AMOUNT/TINF.
We strongly recommend using small duration values (less than 10) so that they can be handled efficiently with analytical solutions. Indeed, if the duration is too long, the calculation of the exponential may produce NaN. Two workarounds:
– Either rescale your time so that the durations are reasonable with respect to your time scale
– If this is not possible, you may use ODEs instead of analytical solutions.
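The two relations between amount, rate and duration can be written as a small Python sketch. The function names are hypothetical; returning `None` for a bolus follows the restrictions listed below (a value of “.” or 0, represented here by `None` or 0, or a negative value used with the iv macro).

```python
def infusion_duration(amount, rate):
    """Duration implied by a RATE value, or None for a bolus."""
    if rate is None or rate <= 0:
        return None  # ".", 0 or a negative value: bolus, no infusion
    return amount / rate


def infusion_rate(amount, tinf):
    """Rate implied by a TINF value, or None for a bolus."""
    if tinf is None or tinf <= 0:
        return None  # ".", 0 or a negative value: bolus, no infusion
    return amount / tinf


print(infusion_duration(100, 50))  # amount 100 at rate 50
print(infusion_rate(100, 4))       # amount 100 over a duration of 4
```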

#### Format restrictions (an exception will be thrown otherwise):

• A data set shall not contain more than one column with column-type RATE or TINF.
• “.” or 0 means a bolus dose, without any infusion rate or time.
• Values can be any double value.
• If a negative value is used in combination with the iv macro, the administration will be a bolus.

Steady-state is used to specify that any transitory effect is over and that the system response is now a periodic function of doses. To do this, a fixed number of doses (by default 5) is added before the dose entered with the STEADY STATE flag set to true (so 6 doses in total, by default). The period between doses is set to the INTERDOSE INTERVAL. The number of doses can be changed in the data frame and will be saved in the project file.

Remark:

• In versions prior to MonolixSuite2018R1, the number of added doses could be changed in the preferences.xmlx file, located in <home>/lixoft/monolix/config in the user folder. The number of doses was defined in the line <dosesToAddForSteadyState value="5"/>, and could for instance be changed to <dosesToAddForSteadyState value="20"/>.

On the following example:

ID TIME AMT SS II EVID Y
Tom 0 10 1 2 1 .

5 doses are applied, at times -10, -8, -6, -4, -2 in addition to the dose at time = 0. The above data set is thus equivalent to:

ID TIME AMT SS II EVID Y
Tom -10 10 0 0 1 .
Tom  -8 10 0 0 1 .
Tom  -6 10 0 0 1 .
Tom  -4 10 0 0 1 .
Tom  -2 10 0 0 1 .
Tom   0 10 0 0 1 .
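The equivalence between the two data sets above can be sketched in Python as follows. This is an illustration, not the MonolixSuite implementation: the function name `expand_steady_state` is hypothetical, the default of 5 added doses comes from the text, and the wash-out applied to the first added dose is not modeled.

```python
def expand_steady_state(time, amount, interval, n_added=5):
    """Return (time, amount) pairs equivalent to one steady-state dose line."""
    # Added doses are placed backwards in time, one interdose interval apart.
    added = [(time - k * interval, amount) for k in range(n_added, 0, -1)]
    return added + [(time, amount)]


# Dose of 10 at time 0 with SS = 1 and II = 2.
for dose_time, dose_amount in expand_steady_state(0, 10, 2):
    print(dose_time, dose_amount)
```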


The first added dose will have a wash-out; for clarity, an EVID column has therefore been included in the previous example. It is of course possible to specify a steady-state even if there is no EVID column in the data set. However, an II column is mandatory to specify the period between the five added doses needed to reach steady-state. The absence of this column will throw an exception (see below for the complete list of exceptions).

Depending on the value of STEADY STATE, a wash-out is done or not for the first dose. Thus, if

• STEADY STATE = 1, a washout is performed
• STEADY STATE = 2 or STEADY STATE = 3, a wash-out is not performed

A data set can contain a mix of steady-state and non-steady-state doses. To prevent doses and measurements from colliding, if a normal dose or a measurement is present before a steady-state dose, the addition of doses stops so as not to affect the result, and a warning is thrown. No occasion is generated.

The following data set, with a normal dose at t=0 and a steady-state dose at t=10 with an interdose-interval of 3.5, will lead to this kind of output:

ID TIME Y AMT SS II
1 0 . 10 0 0
1 0 10 . . .
1 1 6 . . .
1 2 3.5 . . .
1 10 . 10 1 3.5
1 11 9 . . .
1 12 6 . . .
1 13 3 . . .
1 14 2 . . .

There will be only 2 additional doses, as otherwise they would overlap the previous measurements. In addition, there is a wash-out (shown in purple) to start the steady-state definition. Moreover, if SS=1 is replaced by SS=2 on line 5, the same 2 doses are added but there is no wash-out, as can be seen in the following figure:

ID TIME Y AMT SS II
1 0 . 10 0 0
1 0 10 . . .
1 1 6 . . .
1 2 3.5 . . .
1 10 . 10 2 3.5
1 11 9 . . .
1 12 6 . . .
1 13 3 . . .
1 14 2 . . .
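Why only 2 doses are added in these data sets can be checked with a short Python sketch. It assumes that doses are added backwards from the steady-state dose and that adding stops as soon as a dose would fall at or before the previous dose or measurement; the exact boundary rule used by the MonolixSuite is not stated here, and the function name `added_dose_times` is hypothetical.

```python
def added_dose_times(ss_time, interval, previous_event_time, n_added=5):
    """Times of the doses added before a steady-state dose (sketch only)."""
    times = []
    for k in range(1, n_added + 1):
        candidate = ss_time - k * interval
        if candidate <= previous_event_time:
            break  # would collide with an earlier dose or measurement
        times.append(candidate)
    return sorted(times)


# Steady-state dose at t=10 with II=3.5; the last earlier record is at t=2.
print(added_dose_times(10, 3.5, previous_event_time=2))
```

The third candidate time, -0.5, would fall before the measurement at t=2, so only the doses at t=3 and t=6.5 are kept.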


Remarks

• In versions prior to MonolixSuite2018R1, STEADY STATE = 2 and STEADY STATE = 3 were not managed.
• In versions prior to MonolixSuite2018R1, to prevent doses from colliding, if a normal dose was present before a steady-state dose, a new occasion was created for the steady-state dose. Thus, for the previous example, two occasions would have been created.

#### Format restrictions (an exception will be thrown otherwise):

• A data set shall not contain more than one column with column-type STEADY STATE.
• A data set shall not contain more than one column with column-type INTERDOSE INTERVAL.
• When a data set contains a column with column-type STEADY STATE, there must be a column with column-type INTERDOSE INTERVAL.
• The column is meaningful only for dose-lines. Its format shall be (for all lines, including response-lines for which STEADY STATE information is not applicable):
  • STEADY STATE shall be either 0 or 1 (‘.’ will be replaced by 0).
  • II shall contain a double value and it shall be positive (or null):
    • when STEADY STATE = 0, the value shall be null.
    • when STEADY STATE = 1, the value shall be strictly positive.

Additional dose lines (ADDL) are a useful shortcut to specify dose regimens with repetitive treatments. ADDL is the number of times the dose shall be repeated and column II contains the dose repetition interval. For instance, to specify a dose of 10 every 12 hours during 3 days, it is possible to write:

ID TIME AMT
Tom 0 10
Tom 12 10
Tom 24 10
Tom 36 10
Tom 48 10
Tom 60 10
Tom 72 10


but ADDL and II (interdose-interval) can also be used to specify the same information in a single line

ID TIME AMT ADDL II
Tom 0 10 6 12


Notice that in the proposed example, ADDL must be set to 6 to have 6 additional administrations (7 doses in total). This is very useful for periodic treatments. Two important remarks concerning regression values:

• If there is a regression-column (i.e. a column with column-type REGRESSOR), its value will also be repeated for added doses, even though this value has not been specified but obtained via interpolation.
• When regression values are defined after the first added dose, warnings are generated. Indeed, these values will not be repeated and can possibly interfere with the regression values automatically added at dose times. The warning is thus generated so that the user can confirm that the data make sense.
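The ADDL shorthand described above can be expanded into explicit dose lines with a few lines of Python. The function name `expand_addl` is hypothetical, and the repetition of regression values mentioned in the remarks is not modeled in this sketch.

```python
def expand_addl(time, amount, addl, interval):
    """Return the (time, amount) dose lines implied by one ADDL/II dose line."""
    # The original dose plus ADDL repetitions, spaced by the interdose interval.
    return [(time + k * interval, amount) for k in range(addl + 1)]


# Dose of 10 at time 0 with ADDL = 6 and II = 12.
for dose_time, dose_amount in expand_addl(0, 10, 6, 12):
    print("Tom", dose_time, dose_amount)
```

The printed lines reproduce the seven explicit dose lines of the first data set.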

#### Format restrictions (an exception will be thrown otherwise):

• ADDL shall only contain positive (or null) integers or “.” (which will be replaced by 0).
• When there is an ADDL column there must be an INTERDOSE INTERVAL (interdose interval) column to indicate the inter dose timing.
• For dose-lines with ADDL strictly positive, the INTERDOSE INTERVAL value must be strictly positive.

### CONTINUOUS COVARIATE: continuous covariate

A data set can contain one or several columns with column-type CONTINUOUS COVARIATE. There must be one covariate value defined per subject-occasion; otherwise an exception is thrown. The string “.” can be used to prevent multiple definitions of a covariate for a subject-occasion, as it is interpreted as an absence of definition. Therefore, we encourage the user to define the covariate either on each line or, more simply and for readability, on the first line of each subject (even though the covariate does not necessarily have to be defined at the first occurrence of the subject-occasion in the data set).

#### Format restrictions (an exception will be thrown otherwise):

• Continuous covariate columns shall contain either strings that can be converted to double or “.”.
• The covariate must be defined at least once per subject-occasion.
• The covariate must remain the same for all the lines within the same subject-occasion.
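These rules can be illustrated with a short Python sketch that resolves one covariate value per subject-occasion. It is an assumption-laden illustration rather than the actual MonolixSuite parser; the function name and the (ID, OCC, COV) tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple


def resolve_continuous_covariate(rows: List[Tuple[str, str, str]]) -> Dict[Tuple[str, str], float]:
    """rows holds (ID, OCC, COV) strings; returns one value per subject-occasion."""
    values: Dict[Tuple[str, str], float] = {}
    seen = set()
    for subject, occ, cov in rows:
        key = (subject, occ)
        seen.add(key)
        if cov == ".":
            continue  # "." means "no definition on this line"
        value = float(cov)  # raises ValueError if the string is not convertible
        if key in values and values[key] != value:
            raise ValueError(f"covariate changes within subject-occasion {key}")
        values[key] = value
    missing = seen - set(values)
    if missing:
        raise ValueError(f"covariate never defined for {sorted(missing)}")
    return values


rows = [("Tom", "1", "70.5"), ("Tom", "1", "."), ("Tom", "2", "."), ("Tom", "2", "72")]
print(resolve_continuous_covariate(rows))
```

Note that in this sketch the value may appear on any line of the subject-occasion, not necessarily the first one.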

### CATEGORICAL COVARIATE: categorical covariate

It is possible to have in a data set one or several columns with column-type CATEGORICAL COVARIATE. Any string can be entered in a CAT column, and “.” has no special meaning. We strongly recommend using only alphanumeric characters and the underscore “_” character in the strings of the CAT columns. In the MonolixSuite 2016R1, special characters such as spaces “ ”, stars “*”, parentheses “(”, brackets “[”, dashes “-”, dots “.” and slashes “/” are not supported; support for them is restored in the MonolixSuite 2018R1.

Moreover, contrary to continuous covariates, the following data set generates an error, because “.” is read as a regular category and the covariate therefore changes within the subject-occasion:

ID OCC CAT
Tom 1 M
Tom 1 .


#### Format restrictions (an exception will be thrown otherwise):

• The categorical covariate must be the same for all the lines with the same subject-occasion.
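The difference with continuous covariates can be made concrete with a minimal Python check. It is only a sketch under the rules stated in this section; the helper name `check_categorical_covariate` is hypothetical.

```python
from typing import Dict, List, Tuple


def check_categorical_covariate(rows: List[Tuple[str, str, str]]) -> Dict[Tuple[str, str], str]:
    """rows holds (ID, OCC, CAT) strings; "." is treated like any other category."""
    categories: Dict[Tuple[str, str], str] = {}
    for subject, occ, cat in rows:
        key = (subject, occ)
        # setdefault stores the first category seen and returns the stored one afterwards.
        if categories.setdefault(key, cat) != cat:
            raise ValueError(f"categorical covariate changes within subject-occasion {key}")
    return categories


try:
    check_categorical_covariate([("Tom", "1", "M"), ("Tom", "1", ".")])
except ValueError as error:
    print("rejected:", error)
```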

### REGRESSOR: regression value

It is possible to have in a data set one or several columns with column-type REGRESSOR (the column named X in the example below). Within a given subject-occasion, the string “.” is interpolated (using nearest-neighbor interpolation) on dose-lines only (N.B.: if there is an EVID column, dose-lines correspond to EVID = 1 or EVID = 4). On measurement lines, no interpolation is performed: if no regressor is defined on such a line, it is replaced by NaN. Consider the following data set example:

ID TIME X AMT Y EVID
Tom 0 . 1 . 1
Tom 5 1 . 12 0
Tom 10 . . 10 0
Tom 15 12 1.5 . 1
Tom 20 -6 . 8 0
Tom 25 . 0.2 . 4
Tom 30 . . 0.1 0


The evolution of X with respect to time is defined by the following figure.

Thus, X is set to

• X(0) = 1 (it is a dose-line, so nearest-neighbor interpolation is performed; here the nearest sample is on a response-line).
• X(5) = 1 (read directly from the input file).
• X(10) = NaN (the regressor is undefined in the input file and, since it is not a dose-line, no interpolation is performed).
• X(15) = 12 (read directly from the input file).
• X(20) = -6 (read directly from the input file).
• X(25) = -6 (it is a dose-line, so nearest-neighbor interpolation is performed; here the nearest sample is on a response-line).
• X(30) = NaN (the regressor is undefined in the input file and, since it is not a dose-line, no interpolation is performed).

To provide valid information between times 10 and 15, for example X = 1.5, the line at time 10 should contain a regressor value along with the measurement value:

 Tom 10 1.5 . 10 0


Notice that if a line has MDV = 1, its regression value is still taken into account.
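The interpolation rule described above can be reproduced with a few lines of Python. This is a sketch of the stated behavior, not the MonolixSuite code: the function name is hypothetical, and the tie-breaking choice (the earliest sample wins when two samples are equally close) is an assumption that the text does not specify.

```python
import math
from typing import List, Optional, Tuple


def fill_regressor(lines: List[Tuple[float, Optional[float], bool]]) -> List[float]:
    """lines holds (time, x or None for ".", is_dose_line) for one subject-occasion."""
    known = [(t, x) for t, x, _ in lines if x is not None]
    if not known:
        raise ValueError("each subject-occasion needs at least one regressor value")
    filled = []
    for t, x, is_dose in lines:
        if x is not None:
            filled.append(x)                      # read directly from the file
        elif is_dose:
            nearest = min(known, key=lambda sample: abs(sample[0] - t))
            filled.append(nearest[1])             # nearest-neighbor interpolation
        else:
            filled.append(math.nan)               # response-line: no interpolation
    return filled


# Example data set above; dose-lines are those with EVID = 1 or EVID = 4.
tom = [(0, None, True), (5, 1, False), (10, None, False), (15, 12, True),
       (20, -6, False), (25, None, True), (30, None, False)]
x_values = fill_regressor(tom)
print(x_values)
```

The result matches the list above: 1, 1, NaN, 12, -6, -6, NaN.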

#### Format restrictions (an exception will be thrown otherwise):

• The regression-columns (i.e. columns with column-type REGRESSOR) shall contain either strings that can be converted to double or “.”.
• Each subject-occasion must contain at least one non “.” value (otherwise it is impossible to interpolate values).
• When there are several lines with the same time, the value of the regressor column must be the same.

### EVID: event identification data item.

EVID corresponds to the identification of an event. It is an integer between 0 and 4. It helps to define the type of line.

• EVID = 0: observation event, the line is a response-line.
• EVID = 1: dose event, the line is a dose-line.
• EVID = 2: other event. UNUSED (exception thrown). To define times for model predictions without corresponding observations, use MDV=2.
• EVID = 3: reset event. UNUSED (exception thrown).
• EVID = 4: reset + dose event, indicates a wash-out (i.e reset to initial values) immediately followed by a dose.

#### Format restrictions (an exception will be thrown otherwise):

• A data set shall not contain more than one column with column-type EVID.
• EVID shall contain an integer in [0, 4].
• When a line is tagged EVID = 0, the observation contained in column Y shall be convertible to a double value.
• When a line is tagged EVID = 1 or EVID = 4, the value in the dose-column (i.e. the content of the column with column-type AMT) shall be convertible to a double value.
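As an illustration of these EVID rules, the following Python sketch classifies one line and applies the restrictions listed above. The function name `classify_line` and the returned labels are hypothetical and chosen only for the example.

```python
def classify_line(evid: str, y: str, amt: str) -> str:
    """Return the line type implied by EVID, raising ValueError on invalid content."""
    code = int(evid)  # raises ValueError for non-integer strings such as "."
    if code not in (0, 1, 4):
        # 2 and 3 are unused, anything outside [0, 4] is invalid.
        raise ValueError(f"unsupported EVID value: {code}")
    if code == 0:
        float(y)      # the observation must be convertible to a double
        return "response"
    float(amt)        # the dose amount must be convertible to a double
    return "dose" if code == 1 else "reset+dose"


print(classify_line("0", "12", "."))    # response-line
print(classify_line("4", ".", "0.2"))   # wash-out followed by a dose
```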

### MDV: missing dependent variable.

The MDV column-type makes it possible to tag lines for which the information in the Y column-type is missing. Most of the time, this column is not necessary.

• MDV=0: when a line is tagged MDV = 0 AND if it contains a string convertible to a double value in response-column (the column with column-type Y), then the value in the Y column is taken into account. Values in dose-column (the column with column-type AMT) will not be taken into account.
• MDV=1: when a line is tagged MDV = 1 then the value in column Y will not be taken into account. The value in dose-column, if present, will be taken into account.
• MDV=2: when a line is tagged MDV = 2 then the value in the response-column is not taken into account. The value in dose-column, if present, will be taken into account. The time, covariates, regressors, etc will be taken into account to output a prediction at that time point.

If there are both MDV and EVID columns, the EVID column takes priority.
The MDV column is useful to ignore specific response-lines, for instance if the observation is obviously wrong. If an MDV column is added to the dataset, the response-lines to ignore should have MDV=1, and the dose-lines should also have MDV=1 (otherwise the dose will be ignored). MDV=2 makes it possible to define times at which model predictions should be output, even if there is no corresponding observation.

When there are multiple MDV columns, a synthetic MDV value is computed as:

• if MDV = 0 in all columns, then resulting synthetic MDV equals 0.
• if MDV = 1 in at least one column and the others equal 0, then the resulting synthetic MDV equals 1.
• if MDV = 2 in at least one column and the others equal 0, then the resulting synthetic MDV equals 2.
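The combination rule can be written as a small Python function. This is a sketch of the three cases listed above; the function name is hypothetical, and since the text does not say what happens when 1 and 2 appear on the same line, the sketch raises an error in that case as an explicit assumption.

```python
from typing import Iterable


def synthetic_mdv(values: Iterable[int]) -> int:
    """Combine the MDV values found on one line across several MDV columns."""
    flags = set(values)
    if not flags or not flags <= {0, 1, 2}:
        raise ValueError("MDV values must be integers in [0, 2]")
    if flags == {0}:
        return 0
    if 1 in flags and 2 in flags:
        # Not covered by the rules above: treated as an error in this sketch.
        raise ValueError("mix of MDV = 1 and MDV = 2 is not covered by the stated rules")
    return 1 if 1 in flags else 2


print(synthetic_mdv([0, 0]), synthetic_mdv([0, 1]), synthetic_mdv([2, 0, 0]))
```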

#### Format restrictions (an exception will be thrown otherwise):

• MDV shall contain only integers belonging to interval [0, 2].
• When MDV=0, the value in the Y column should be convertible to a double value, otherwise an exception will be thrown.

### Character definition

We recommend to use only alphanumeric characters and the underscore “_” character in the strings of your data set.

Unfortunately, in the Monolix2016R1 suite, special characters such as spaces ” “, stars “*”, parentheses “(“, brackets “[“, dashes “-“, dots “.” and slashes “/” are not supported in:

• The strings in CAT column.

Please be aware that if your data set includes unsupported characters, the error will only be detected and displayed when loading a saved project (and not when creating and saving the project).

This feature is back in MonolixSuite2018R1.

### On the use of “.”

The “.” can be used in almost all the lines of the data set but has several meanings depending on the context. The following table summarizes its use.

Table columns: Type of column / Not allowed / Considered as a regular string / Considered as / Not considered

ID: X
OCCASION: X
TIME: X
DATE/DAT1/DAT2/DAT3: X
OBSERVATION: On a response line / On a dose line
YTYPE: On a response line / On a dose line (not read)
CENSORED: 0
LIMIT: -Inf if CENS = 1, +Inf if CENS = -1
AMOUNT: On a dose line / On a response line (not read)
ADM: On a dose line / On a response line
STEADY STATE: 0
ADDL: 0
INTERDOSE INTERVAL: 0
CONTINUOUS COVARIATE: Previously defined value of the COV (in the ID/OCC)
CATEGORICAL COVARIATE: X
REGRESSOR: Interpolation on a dose line, NaN on a response line
EVID: X
MDV: X

### 3. Data set examples

This section presents several data sets to illustrate concrete cases and show how to integrate censored data, covariates, …

### Data sets with continuous outputs

• Theophylline data set: continuous outputs are taken into account along with categorical and continuous covariates (sex and weight respectively). Moreover, censored data are also managed.
• Tobramycin data set: continuous PK outputs are taken into account, along with categorical and continuous covariates.
• HIV data set: two continuous censored outputs are considered. No dose is used in the data set, and the treatment type is considered as a categorical covariate.
• Veralipride data set: continuous output showing a double peak phenomenon, for which variability in absorption is by far the most probable physiological explanation.
• Remifentanil data set: Remifentanil is an opioid analgesic drug with a rapid onset and rapid recovery time. Remifentanil concentrations measured in 65 healthy adults are provided.

### Data sets with discrete count outputs

• Epilepsy attacks data set: count outputs are taken into account along with categorical and continuous covariates. The data arose from a clinical trial of 59 epileptics who were randomized to receive either the anti-epileptic drug progabide or a placebo, as an adjuvant to standard chemotherapy. Patients attended four successive post-randomisation clinic visits, where the number of seizures that occurred over the previous 2 weeks was reported.
• Crohn’s Disease Adverse Events data set: Data set issued from a study of the adverse events of a drug on 117 patients affected by Crohn’s disease (a chronic inflammatory disease of the intestines). In addition to the response variable number of adverse events, 7 explanatory variables were recorded for each patient.

### Data sets with discrete categorical outputs

• Respiratory status data set: the respiratory status of 111 patients under placebo or treatment is categorized as “poor” or “good” once per month during 5 months.
• Inpatient multidimensional psychiatric data set: categorical output with a categorical covariate (treatment) during 6 weeks. These data are from the National Institute of Mental Health Schizophrenia Collaborative Study and are available here. Patients were randomized to receive one of four medications, either placebo or one of three different anti-psychotic drugs. The primary outcome is item 79 of the Inpatient Multidimensional Psychiatric Scale.
• Zylkene data set: The putative effects of a tryptic bovine αs1-casein hydrolysate on anxious disorders in cats were investigated using this data set over 24 cats. The score is a global score of emotional state.

### Data sets with time-to-event outputs

• PBC data set: PBC is a rare but fatal chronic liver disease of unknown cause, with a prevalence of about 50-cases-per-million population. Between January, 1974 and May, 1984, the Mayo Clinic conducted a double-blinded randomized trial in primary biliary cirrhosis of the liver (PBC), comparing the drug D-penicillamine (DPCA) with a placebo.
• Oropharynx data set: The following data set provides the data for a part of a large clinical trial carried out by the Radiation Therapy Oncology Group in the United States. One objective of the study was to compare the two treatment policies with respect to patient survival.
• Veterans’ Administration Lung Cancer data set: In this study conducted by the US Veterans Administration, time to death was recorded for 137 male patients with advanced inoperable lung cancer, who were given either a standard therapy or a test chemotherapy.
• NCCTG lung cancer data set: The North Central Cancer Treatment Group (NCCTG) data set records the survival (time-to-event output) of 228 patients with advanced lung cancer, together with assessments of the patients’ performance status measured both by the physician and by the patients themselves.
• Cardiovascular data set:  A subset of the fields was selected to model the differential length of stay for patients entering the hospital to receive one of two standard cardiovascular procedures: CABG and PTCA. The data set contains 3589 individuals.

### Joint data sets

• Warfarin data set: Warfarin is an anticoagulant normally used in the prevention of thrombosis and thromboembolism.  Plasma warfarin concentrations and Prothrombin Complex Response in thirty normal subjects after a single loading dose are measured. Both measurements are continuous.
• Remifentanil data set: Remifentanil is an opioid analgesic drug with a rapid onset and rapid recovery time. Both remifentanil concentration and EEG measurements are provided for 65 healthy adults. Both measurements are continuous.
• PSA and survival data set: PSA kinetics and survival data for 400 men with metastatic Castration-Resistant Prostate Cancer (mCRPC) treated with docetaxel and prednisone, the first-line reference chemotherapy, which constituted the control arm of a phase 3 clinical trial. In this context of advanced disease, the incidence of death is high and the PSA kinetics is closely monitored after treatment initiation to rapidly detect a breakthrough in PSA and propose rescue strategies.

## Theophylline data set

The data considered here are courtesy of Dr. Robert A. Upton of the University of California, San Francisco. Theophylline is a methylxanthine drug used in therapy for respiratory diseases such as chronic obstructive pulmonary disease (COPD) and asthma under a variety of brand names. Theophylline was administered orally to 12 subjects whose serum concentrations were measured at 11 times over the next 25 hours. This is an example of a laboratory pharmacokinetic study characterized by many observations on a moderate number of individuals. The data set can be seen here, and the corresponding Datxplore project here (notice that both files should be in the same folder to be correctly linked). The concentration over time for each subject is shown in the following figure (this figure was generated using Datxplore).

The purpose of this page is to show the construction, the definition and the use of such a data set in Datxplore and Monolix. For the sake of simplicity, we look at only one subject (corresponding to ID 1).

## Simplified data set

The data set for subject 1 is as follows:

ID AMT TIME    CONC    WEIGHT  SEX
1   4.02    0   .   79.6    M
1   .   0.25    2.84    79.6    M
1   .   0.57    6.57    79.6    M
1   .   1.12    10.5    79.6    M
1   .   2.02    9.66    79.6    M
1   .   3.82    8.58    79.6    M
1   .   5.1     8.36    79.6    M
1   .   7.03    7.47    79.6    M
1   .   9.05    6.89    79.6    M
1   .   12.12   5.94    79.6    M
1   .   24.37   3.28    79.6    M

#### Interpretation

One can see the following columns

Several points can be noticed.

1. The first line corresponds to a dose, while the other ones are measurements. This explains the dot in the CONC column for the first line and the dots in the AMT column for the other ones.
2. The covariate columns (the continuous WEIGHT and the categorical SEX) are constant for the individual. Even though it is not necessary, we encourage the user to fill these columns on every line for readability and ease of use.
3. Finally, notice that no initial washout is needed at the beginning as by default, the null initial condition is used for parameter estimation.
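To make the reading of this extract concrete, here is a short Python sketch that separates the dose-line from the response-lines using only the position of the dots. It is an illustration, not what Datxplore or Monolix do internally; the variable names are arbitrary.

```python
extract = """ID AMT TIME CONC WEIGHT SEX
1 4.02 0 . 79.6 M
1 . 0.25 2.84 79.6 M
1 . 0.57 6.57 79.6 M
1 . 1.12 10.5 79.6 M
1 . 2.02 9.66 79.6 M
1 . 3.82 8.58 79.6 M
1 . 5.1 8.36 79.6 M
1 . 7.03 7.47 79.6 M
1 . 9.05 6.89 79.6 M
1 . 12.12 5.94 79.6 M
1 . 24.37 3.28 79.6 M"""

# Split on any whitespace, so tabs or spaces both work as separators.
header, *rows = [line.split() for line in extract.splitlines()]
records = [dict(zip(header, row)) for row in rows]

# A line with a numeric AMT is a dose-line; a line with a numeric CONC is a response-line.
doses = [(float(r["TIME"]), float(r["AMT"])) for r in records if r["AMT"] != "."]
observations = [(float(r["TIME"]), float(r["CONC"])) for r in records if r["CONC"] != "."]

print(doses)
print(len(observations), "observations")
```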

### 3.1.2. Tobramycin data set

This data set has been originally published in:

Aarons, L., Vozeh, S., Wenk, M., Weiss, P. H., & Follath, F. (1989). Population pharmacokinetics of tobramycin. British journal of clinical pharmacology, 28(3), 305-314.

Tobramycin is an antimicrobial agent of the aminoglycoside family, which is used, among other indications, against severe gram-negative infections. Because tobramycin does not pass the gastro-intestinal tract, it is usually administered intravenously as intermittent bolus doses or short infusions. Tobramycin is a drug with a narrow therapeutic index.
Tobramycin bolus doses ranging from 20 to 140 mg were administered every 8 hours to 97 patients (45 females, 52 males) during 1 to 21 days (for most patients, during ~6 days). Age, weight (kg), sex and creatinine clearance (mL/min) were available as covariates. The tobramycin concentration (mg/L) was measured 1 to 9 times per patient (322 measurements in total), most of the time between 2 and 6 h post-dose. This sparse data set is presented in the figure below.

Below is an extract of the data set:

The columns have the following meaning:

Several points can be noticed:

1. The first four lines correspond to doses, while the other ones are measurements, as indicated by the EVID column. The MDV column is not necessary. The zeros of the DOSE and CP columns could have been replaced by dots “.”.
2. The covariate columns (WT, SEX and CLCR) are filled with the same value for each individual. Covariates must be constant within subjects (or subject-occasions when occasions are defined).

## HIV data set

In the COPHAR II-ANRS 134 trial, an open prospective non-randomized interventional study, 115 HIV-infected adult patients started an antiviral therapy. 48 patients were treated with indinavir (and ritonavir as a booster) (treatment A), 38 with lopinavir (and ritonavir as a booster) (treatment B), and 35 with nelfinavir (treatment C). Patients were followed for one year after treatment initiation.

Viral load and CD4 cell count were measured at screening, at inclusion and at weeks 2 (or 4), 8, 16, 24, 36, and 48. Plasma HIV-1 RNA was measured with the Roche Monitor assay, with a limit of quantification of 50 copies/ml. The results of this trial are reported in Duval et al. (2009). The data set can be seen here, and the corresponding Datxplore project here (notice that both files should be in the same folder to be correctly linked).

On the two following figures, one can see the two outputs with respect to time for all subjects, split by treatment. The red circles correspond to censored data.

Notice, that these figures were generated using Datxplore.

## Simplified HIV data set

The data set for subject 2 can be defined as follows

ID TIME    Y_NCENS Y   CENS    YTYPE   TREATMENT
2   -2.43   4.9443  4.9443  0   1    A
2   -2.43   249 249 0   2    A
2   0   4.5245  4.5245  0   1    A
2   2   2.3546  2.3546  0   1    A
2   2   266 266 0   2    A
2   4.29    268 268 0   2    A
2   8   2.5585  2.5585  0   1    A
2   8   34  34  0   2    A
2   16  352 352 0   2    A
2   24  1.7981  2   1   1    A
2   24  385 385 0   2    A
2   32  348 348 0   2    A
2   43  415 415 0   2    A

#### Interpretation

One can see the following columns

Several points can be noticed.

1. There is no dose in the data set.
2. There is only a categorical covariate defining the treatment.
3. In the presented case, the two measurements are not necessarily taken at the same time. Indeed, this is not required for data export using Datxplore, nor for parameter estimation using Monolix. Moreover, measurements at negative times are possible.
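The role of the YTYPE and CENS columns can be sketched in Python on a few lines of the extract. The interval reading of censored values follows the table in the section on the use of “.” (with no LIMIT column, the limit is -Inf for CENS = 1 and +Inf for CENS = -1); the variable names and the interval representation are assumptions made for this illustration.

```python
import math

extract = """ID TIME Y_NCENS Y CENS YTYPE TREATMENT
2 8 2.5585 2.5585 0 1 A
2 8 34 34 0 2 A
2 24 1.7981 2 1 1 A
2 24 385 385 0 2 A"""

header, *rows = [line.split() for line in extract.splitlines()]

by_type = {}
for row in rows:
    record = dict(zip(header, row))
    y, cens = float(record["Y"]), int(record["CENS"])
    if cens == 1:
        interval = (-math.inf, y)   # true value lies at or below Y
    elif cens == -1:
        interval = (y, math.inf)    # true value lies at or above Y
    else:
        interval = (y, y)           # exact observation
    by_type.setdefault(record["YTYPE"], []).append((float(record["TIME"]), interval))

for ytype, series in by_type.items():
    print("YTYPE", ytype, series)
```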

### 3.1.4. Veralipride data set

This data set has been originally published in:

Plusquellec, Y, Campistron, G, Staveris, S, Barre, J, Jung, L, Tillement, JP, Houin, G (1987). A double-peak phenomenon in the pharmacokinetics of veralipride after oral administration: a double-site model for drug absorption. J Pharmacokinet Biopharm, 15, 3:225-39.

Veralipride is a benzamide neuroleptic medicine indicated in the treatment of vasomotor symptoms associated with the menopause.
In this dataset, 100 mg doses of veralipride were given to 12 healthy volunteers by oral solution. Individual plasma concentrations of veralipride (ng/ml) were observed at 16 time points during 24h (time is measured in h) after the administration. Doses were given in the morning after an overnight fast, and subjects fasted up to 4 hr after drug administration in each case.

This data set is displayed on the figure below. For some individuals, as the one highlighted on the figure, a double peak in plasma concentrations was observed after oral administration of the solution.

This double peak phenomenon is not systematically noticeable, as can be seen on the next figure.

Below is an extract of the data set file:

The columns are:

### 3.1.5. Remifentanil data set

This data set has been originally published in:

Minto et al. (1997). Influence of age and gender on the pharmacokinetics and pharmacodynamics of remifentanil. I. Model development. Anesthesiology.

Remifentanil is an opioid analgesic drug with a rapid onset and rapid recovery time. It is used for sedation as well as combined with other medications for use in general anesthesia. It is given in adults via continuous IV infusion.

65 healthy adults received a remifentanil IV infusion at a constant infusion rate between 1 and 8 µg·kg⁻¹·min⁻¹ for 4 to 20 minutes. The data set contains remifentanil administration characteristics (time and rate of infusion), dense measurements of remifentanil blood concentration during and after the infusion (PK data), as well as dense electroencephalogram measurements (PD data) recording the depth of anesthesia. In addition, a list of covariates is available: age, gender, and lean body mass (LBM). Moreover, a variable TINFCAT classifies the patients in several categories with similar infusion time.

One can see on the following figure the remifentanil concentrations over time split in two groups (female and male). On each figure, the subjects with age lower than 50 are in blue while the ones with an age over 50 are in green.

On the following figure, one can see the electroencephalogram measurements with respect to time for all subjects.

Below is an extract of the data set file:

The columns have the following meaning:

### 3.2.1. Epilepsy attacks data set

This data set has been originally published in:

Leppik, IE. et al. (1985) A double-blind crossover evaluation of progabide in partial seizures. Neurology 35, 285.

The data arose from a clinical trial of 59 epileptics who were randomized to receive either the anti-epileptic drug progabide or a placebo, as an adjuvant to standard chemotherapy. The hope was that progabide would help to reduce the number of seizures experienced by patients. Patients attended four successive post-randomisation clinic visits, where the number of seizures that occurred over the previous 2 weeks was reported. At baseline, information on the age of the patient and the 8-week pre-randomisation seizure count was recorded.

Below is an extract of the data set:

The columns have the following meaning:

Several points can be noticed:

1. There are several seizure counts for each individual, so the time column indicates which period each count relates to.
2. ID and TIME columns are mandatory. Thus, if there is only one count measurement per individual, an additional TIME column should be added (filled with 0 for example).
3. The covariates columns (treatment, base and age) are filled with the same value for each individual. Covariates must be constant within subjects (or subject-occasions when occasions are defined).

Moreover, we can split by the covariate treatment and thus see the impact of the treatment

It seems that the subjects receiving the treatment have a lower seizure rate. We can also display the data grouped rather than as a spaghetti plot, as in the following figure.

Using that, we have a better understanding of the seizure_rate, and it seems that the treatment is effective.

### 3.2.2. Crohn's Disease Adverse Events data set

Data set issued from a study of the adverse events of a drug on 117 patients affected by Crohn’s disease (a chronic inflammatory disease of the intestines). In addition to the response variable AE (number of adverse events), 7 explanatory variables were recorded for each patient: BMI (body mass index), HEIGHT, COUNTRY (one of the two countries where the patient lives), SEX, AGE, WEIGHT, and TREAT (the drug taken by the patient in factor form: placebo, d1, d2).

Below is an extract of the data set:

The definition of the columns is the following:

We can see on the following figure the number of adverse events on that period providing a global evaluation of the number of adverse events over the population.

One can split by the categorical covariate TREAT and see whether this covariate has an impact on the number of adverse events. As we can see on the following figure, the drugs seem effective, as the number of adverse events decreases when a drug is used.

One can also stratify by the other covariate to have a first idea of the dependencies before the statistical population analysis using Monolix.

### 3.3.1. Respiratory status data set

In this data set, 111 patients were administered a placebo or an active treatment. At randomization and at four visits during the treatment, their respiratory status was determined as being “poor” or “good”, which constitutes the categorical output. Covariates such as center, sex and age were also recorded. The goal was to evaluate the effect of the treatment on the respiratory status.
This data set has been originally published in:

Davis, C. S. (1991). Semi-parametric and non-parametric methods for the analysis of repeated measurements with applications to clinical trials. Statistics in Medicine, 10(12), 1959–80

Below we show a snapshot of the data set:

In MonolixSuite, the output categories must be coded as integers. Thus, we have created the column statusInteger where the respiratory status is coded as 0 for “poor” and 1 for “good”. For individual 1 on placebo, the respiratory status is poor at randomization and remains so during the 4 months. For individual 12 on treatment, the respiratory status is poor at randomization and improves to good during the first three months before deteriorating again to poor at month 4.

The definition of the columns is the following:

The representation of statusInteger with respect to time is proposed on the following figure

Several points can be noticed:

1. The categories must be coded as integers.
2. There are several respiratory status measurements for each individual; the month column indicates the time at which each measurement was made.
3. ID and TIME columns are mandatory. Thus, even when there is only one measurement per individual, an additional TIME column should be added (filled with 0 for example).
4. Covariates must be constant within subjects (or subject-occasions when occasions are defined).
5. In this example, two categories are present (“good” and “poor”), but any number of categories is possible.
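The integer coding described above can be sketched as follows in Python. The mapping 0 for “poor” and 1 for “good” is the one used for the statusInteger column; the dictionary and function names are hypothetical.

```python
from typing import List

# Coding used for the statusInteger column: 0 for "poor", 1 for "good".
STATUS_CODES = {"poor": 0, "good": 1}


def encode_status(statuses: List[str]) -> List[int]:
    """Convert textual categories to the integer codes expected as observations."""
    return [STATUS_CODES[status] for status in statuses]  # KeyError on unknown labels


# Individual 12 as described above: poor at randomization, good for three months, poor at month 4.
individual_12 = encode_status(["poor", "good", "good", "good", "poor"])
print(individual_12)
```

With more than two categories, the same approach applies with a larger mapping.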

When loading this data set into Datxplore, one can easily visualize the number of individuals with “poor” (coded as 0, in dark blue) or “good” (coded as 1, in light blue) respiratory status over time in the case of placebo (left) or active treatment (right):

Based on this figure, the treatment seems effective at first sight. We can additionally look at the other covariates and their impact on the output. One can split by sex, as can be seen on the following figure.

In that case, the sex covariate does not seem to have much influence on the output.

### 3.3.2. Inpatient multidimensional psychiatric data set

This data set has been originally published in:

Hedeker D. and Gibbons R.D. (1996) A computer program for mixed-effects ordinal regression analysis. Computer Methods and Programs in Biomedicine 49, 157-176.

These data are from the National Institute of Mental Health Schizophrenia Collaborative Study and are available here. Patients were randomized to receive one of four medications, either placebo or one of three different anti-psychotic drugs. The protocol indicated that subjects were then to be evaluated at weeks 0, 1, 3 and 6 to assess severity of illness; additionally, some measurements were made at weeks 2, 4, and 5. The primary outcome is item 79 of the Inpatient Multidimensional Psychiatric Scale, which indicates severity of illness. We will analyze imps79o, an ordinally scaled version of the original variable imps79, which has the following interpretation:

IMPS IMPSo
1 and 2 1 (not ill or borderline)
3 and 4 2 (mildly or moderately ill)
5 3 (markedly)
6 and 7 4 (severely or most extremely ill)
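The recoding in the table above can be expressed as a simple lookup. This Python sketch assumes that the original scores are the integers 1 to 7 listed in the table; the names `IMPS_TO_ORDINAL` and `imps_to_ordinal` are hypothetical.

```python
# Lookup built from the table above (original score -> ordinal category).
IMPS_TO_ORDINAL = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 4, 7: 4}


def imps_to_ordinal(imps: int) -> int:
    """Map an original IMPS score (assumed integer, 1 to 7) to its ordinal category."""
    if imps not in IMPS_TO_ORDINAL:
        raise ValueError(f"score outside the table: {imps}")
    return IMPS_TO_ORDINAL[imps]


print([imps_to_ordinal(score) for score in range(1, 8)])
```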

The predictor variable of interest is TxDrug, a dummy-coded variable indicating treatment with drug or placebo. Below is an extract of the data set:

The columns have the following meaning:

• ID: patient identification number. It is the identifier (ID).
• IMPS: original variable imps79. This  column is ignored as we use an ordinally scaled version of this variable.
• IMPSo: ordinally scaled version of imps79. It is the measurement (Y).
• TxDrug indicates if there is a treatment or the placebo. This is a categorical covariate (CAT).
• Week: measurement period. It is the time (TIME).

One can see on the following figure the evolution of the IMPSo with respect to time on both treated patients and patients with placebo.

### 3.3.3. Zylkene data set

This data set has been originally published in:

C. Beata, J. Cordel, N. Marlois (2007). Effect of alpha-casozepine (Zylkene) on Anxiety in Cats, Journal of Veterinary Behavior: Clinical Applications in Research, Vol. 2., Issue 2, pp. 40-46.

The putative effects of a tryptic bovine αs1-casein hydrolysate on anxious disorders in cats were investigated. This product is known as alpha-casozepine and patented under the name of Zylkene (Ingredia, Arras, France). Within veterinary practices, 34 cats were recruited by certified behaviorist surgeons. This 56-day trial against placebo showed a statistically significant positive effect of this product in the management of anxious disorders such as social phobias in cats. The global score, as well as the different items (fear of strangers, contact with familiars, general fears, fear-related aggressions, autonomic disorders), were all significantly improved by the use of this natural decapeptide.

Below we show a snapshot of the data set:

Note that in the MonolixSuite the output categories must be coded as integers. In this case, the observation is a score between 1 and 25. The definition of the columns is the following:

We can see on the following figure the evolution of all SCORE with respect to time. We see that, when the time increases, the number of cats with a higher score increases too.

To see the impact of the treatment, we can split by the covariate TRT and compare the two groups, as on the following figure (placebo on the left and treatment on the right). The treatment seems effective, as the SCORE is better with the treatment.

We can do the same with the GENDER categorical covariate by creating two groups: the females (merging F and NeuteredF) and the males (NeuteredM). Contrary to the treatment, GENDER does not seem to impact the SCORE.

## PBC data set

PBC is a rare but fatal chronic liver disease of unknown cause, with a prevalence of about 50-cases-per-million population. The primary pathologic event appears to be the destruction of interlobular bile ducts, which may be mediated by immunologic mechanisms.
Between January, 1974 and May, 1984, the Mayo Clinic conducted a double-blinded randomized trial in primary biliary cirrhosis of the liver (PBC), comparing the drug D-penicillamine (DPCA) with a placebo. There were 424 patients who met the eligibility criteria seen at the Clinic while the trial was open for patient registration. Both the treating physician and the patient agreed to participate in the randomized trial in 312 of the 424 cases. The date of randomization and a large number of clinical, biochemical, serologic, and histologic parameters were recorded for each of the 312 clinical trial patients. The data from the trial were analyzed in 1986 for presentation in the clinical literature. For that analysis, disease and survival status as of July, 1986, were recorded for as many patients as possible. By that date, 125 of the 312 patients had died, with only 11 not attributable to PBC. Eight patients had been lost to follow up, and 19 had undergone liver transplantation.

The considered data set comes from Counting Processes and Survival Analysis by T. Fleming & D. Harrington, (1991), published by John Wiley & Sons. The data set can be seen here, and the corresponding Datxplore project here (notice that both file should be in the same folder to be correctly linked).

On the following figure, one could see the survival curve and the mean number of events with respect to time. Notice, that this figure was generated using Datxplore.

This data set contains many covariates:

id       = case number
futime   = number of days between registration and the earlier of death,
transplantation, or study analysis time in July, 1986
status   = 0=alive, 1=liver transplant, 2=dead
drug     = 1= D-penicillamine, 2=placebo
age      = age in days
sex      = 0=male, 1=female
ascites  = presence of ascites: 0=no 1=yes
hepato   = presence of hepatomegaly 0=no 1=yes
spiders  = presence of spiders 0=no 1=yes
edema    = presence of edema 0=no edema and no diuretic therapy for edema;
.5 = edema present without diuretics, or edema resolved by diuretics;
1 = edema despite diuretic therapy
bili     = serum bilirubin in mg/dl
chol     = serum cholesterol in mg/dl
albumin  = albumin in gm/dl
copper   = urine copper in ug/day
alk_phos = alkaline phosphatase in U/liter
sgot     = SGOT in U/ml
trig     = triglycerides in mg/dl
platelet = platelets per cubic ml/1000
protime  = prothrombin time in seconds
stage    = histologic stage of disease

The two following figures show the survival curve and the mean number of events with respect to time for two groups: subjects younger than 52.3 years and all other subjects. These figures were generated using Datxplore.

## Simplified PBC data set

The data set for subjects 1 and 2 can be defined as follows:

ID;TIME;Y;TRT;AGE;SEX;
1;0;0;1;58.7652;1;
1;400;1;1;58.7652;1;
2;0;0;1;56.4463;1;
2;4500;0;1;56.4463;1;


The start time of the observation period must be indicated with Y=0 (first and third data lines, for subjects 1 and 2 respectively), followed by either the time of the event (Y=1) or the time of the end of the observation period if no event has occurred (Y=0). In this simplified data set, subject 1 had an event at time 400, leading to a line where Y=1. In contrast, no event occurred for subject 2, so Y is set to 0 at the end of the observation period (TIME=4500).
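As an illustration of the convention described above, here is a minimal Python sketch that reads the simplified rows and reports, for each subject, the start of the observation period, the last recorded time and whether an event was observed. The function name `summarize_tte` is hypothetical, and the sketch assumes at most one event per subject; it is not part of the MonolixSuite.

```python
import csv
import io

# Simplified PBC rows copied from the example above.
RAW = """ID;TIME;Y;TRT;AGE;SEX;
1;0;0;1;58.7652;1;
1;400;1;1;58.7652;1;
2;0;0;1;56.4463;1;
2;4500;0;1;56.4463;1;
"""


def summarize_tte(text, delimiter=";"):
    """Return {id: (start_time, end_time, event_observed)} for single-event data."""
    rows = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    by_id = {}
    for row in rows:
        by_id.setdefault(row["ID"], []).append((float(row["TIME"]), int(row["Y"])))
    summary = {}
    for subject, records in by_id.items():
        records.sort()  # order the records of each subject by time
        start_time, start_flag = records[0]
        if start_flag != 0:
            raise ValueError(f"subject {subject}: first record must have Y=0")
        end_time, end_flag = records[-1]
        # Y=1 on the last record means an event, Y=0 means end of observation
        summary[subject] = (start_time, end_time, end_flag == 1)
    return summary


print(summarize_tte(RAW))
# {'1': (0.0, 400.0, True), '2': (0.0, 4500.0, False)}
```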

## Oropharynx data set

The following data set provides the data for a part of a large clinical trial carried out by the Radiation Therapy Oncology Group in the United States. The full study included patients with squamous carcinoma of 15 sites in the mouth and throat, with 16 participating institutions, though only data on three sites in the oropharynx reported by the six largest institutions are considered here. Patients entering the study were randomly assigned to one of two treatment groups, radiation therapy alone or radiation therapy together with a chemotherapeutic agent. One objective of the study was to  compare the two treatment policies with respect to patient survival. Approximately 30% of the survival times are censored owing primarily to patients surviving to the time of analysis. Some patients were lost to follow-up because the patient moved or transferred to an institution not participating in the study, though these cases were relatively rare.

The data set considered here comes from The Statistical Analysis of Failure Time Data, by JD Kalbfleisch & RL Prentice (1980), published by John Wiley & Sons. The data set can be seen here, and the corresponding Datxplore project here (note that both files should be in the same folder to be correctly linked).

The following figure shows the survival curve and the mean number of events with respect to time. This figure was generated using Datxplore.

This study included measurements of many covariates which would be expected to relate to survival experience. Six such variables are given in the data (sex, T staging, N staging, age, general condition, and grade). The site of the primary tumor and possible differences between participating institutions require consideration as well.

CASE          Case Number
INST          Participating Institution
SEX           1=male, 2=female
TX            Treatment: 1=standard, 2=test
GRADE         1=well differentiated, 2=moderately differentiated,
3=poorly differentiated,  9=missing
AGE           In years at time of diagnosis
COND          Condition: 1=no disability, 2=restricted work, 3=requires assistance
with self care, 4=bed confined,  9=missing
SITE          1=faucial arch, 2=tonsillar fossa, 3=posterior pillar,
4=pharyngeal tongue, 5=posterior wall
T_STAGE       1=primary tumor measuring 2 cm or less in largest diameter,
2=primary tumor measuring 2 cm to 4 cm in largest diameter with
minimal infiltration in depth, 3=primary tumor measuring more
than 4 cm, 4=massive invasive tumor
N_STAGE       0=no clinical evidence of node metastases, 1=single positive
node 3 cm or less in diameter, not fixed, 2=single positive
node more than 3 cm in diameter, not fixed, 3=multiple
positive nodes or fixed positive nodes
ENTRY_DT      Date of study entry: Day of year and year, dddyy
TIME          Survival time in days from day of diagnosis

The two following figures show the survival curve and the mean number of events with respect to time for two groups: subjects younger than 55 years and all other subjects. These figures were generated using Datxplore.

## Simplified Oropharynx data set

The data set for subjects 47 and 48 can be defined as follows:

ID;INST;SEX;TRT;GRADE;AGE;COND;SITE;T_STAGE;N_STAGE;ENTRY_DT;Y;Time
47;4;1;2;2;49;3;1;4;3;5669;0;0
47;4;1;2;2;49;3;1;4;3;5669;1;74
48;3;1;1;1;44;1;1;3;1;2769;0;0
48;3;1;1;1;44;1;1;3;1;2769;0;1609


The start time of the observation period must be indicated with Y=0 (first and third data lines, for subjects 47 and 48 respectively), followed by either the time of the event (Y=1) or the time of the end of the observation period if no event has occurred (Y=0). In this simplified data set, subject 47 had an event at time 74, leading to a line where Y=1. In contrast, no event occurred for subject 48, so Y is set to 0 at the end of the observation period (Time=1609).
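The ENTRY_DT column of this data set stores the date of study entry as dddyy (day of year followed by a two-digit year). The following Python sketch shows one possible way to decode such values. The helper name `decode_entry_date` is hypothetical, and two assumptions are made: leading zeros of the day of year were dropped when the value was stored as a number, and all years belong to the 1900s.

```python
from datetime import date, timedelta


def decode_entry_date(code, century=1900):
    """Decode a dddyy value (day of year, two-digit year) into a date.

    Assumptions: leading zeros were dropped from the stored number,
    and every year belongs to the given century.
    """
    text = str(code).zfill(5)          # restore the five-character dddyy form
    day_of_year = int(text[:3])
    year = century + int(text[3:])
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


print(decode_entry_date(5669))  # subject 47
print(decode_entry_date(2769))  # subject 48
```

Under these assumptions, 5669 is read as day 56 of 1969 and 2769 as day 27 of 1969.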

### 3.4.3. Veterans’ Administration Lung Cancer data set

In a study conducted by the US Veterans Administration, male patients with advanced inoperable lung cancer were given either a standard therapy or a test chemotherapy. The study followed 137 patients, of whom 9 left the study before death. Various covariates were also documented for each patient.
The primary goal of the study was to assess if the test chemotherapy is beneficial. Secondary goals included the analysis of covariates as prognostic variables.
This data set was published in JD Kalbfleisch and RL Prentice (1980), The Statistical Analysis of Failure Time Data. Wiley, New York. The data set can be downloaded here.

A snapshot of the data set is shown below:

The TIME and Y columns are interpreted in the following way: the observation period for individual 1 starts at time 0 and the event occurs at time 72 (i.e. 72 days after enrollment). For individual 10, the start time is also 0, and no event has occurred by the end of the observation period at time 100.

The structure of the data file is the following:

• ID: ID of the patient
• TIME: time of start of the observation period (if Y=0, first occurrence), death (if Y=1) or censoring (if Y=0, second occurrence)
• Y: 0 to indicate the start of the observation period or censoring and 1 to indicate death
• trt: treatment type, categorical covariate
• celltype: histological type of the tumor, categorical covariate
• karno: Karnofsky performance score that describes the patient's overall status at the beginning of the study, continuous covariate
• diagtime: time between diagnosis and start of the study (in months), continuous covariate
• age: age of the patient (in years), continuous covariate
• priortherapy: indicates if the patient has received another therapy before the current one, categorical covariate

Using Datxplore, one can visualize the Kaplan-Meier curve. The censored data are indicated by red crosses.
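For readers who want to see how a Kaplan-Meier curve is obtained from TIME and Y values, the following Python sketch implements the standard product-limit estimator. It is an illustration only, not the Datxplore implementation; the function name `kaplan_meier` and the four-subject toy data are hypothetical and are not taken from the Veterans' Administration file.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct event time.

    durations: time from the start of observation to death or censoring
    events:    1 if death was observed, 0 if the record is censored
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        time = pairs[i][0]
        deaths = 0
        leaving = 0
        # gather all subjects whose record ends at this time
        while i < len(pairs) and pairs[i][0] == time:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((time, survival))
        # deaths and censored subjects both leave the risk set
        at_risk -= leaving
    return curve


# Hypothetical toy data: three deaths and one censored subject (time 100).
for time, surv in kaplan_meier([72, 100, 30, 50], [1, 0, 1, 1]):
    print(time, round(surv, 3))
```

Censored subjects do not create a step in the curve, but they reduce the number of subjects at risk for later event times.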

### 3.4.4. NCCTG lung cancer data set

The North Central Cancer Treatment Group (NCCTG) data set records the survival of patients with advanced lung cancer, together with assessments of the patients' performance status made both by the physician and by the patients themselves. The goal of the study was to determine whether the patients' self-assessment could provide prognostic information complementary to the physician’s assessment. The data set contains 228 patients, including 63 patients who are right censored (patients who left the study before their death).

This data set was originally presented and analyzed in Loprinzi et al. (1994). Prospective evaluation of prognostic variables from patient-completed questionnaires. North Central Cancer Treatment Group. Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology, 12(3), 601–607. The data set can be downloaded here.

A snapshot of the data set is displayed below:

The observation period for individual 1 starts at time 0 and the death event is observed at time 306. In contrast, individual 3 left the study at time 1010, before his death.

The meaning of the columns is the following:

• ID: ID of the patient
• TIME: time of start of the observation period (if Y=0, first occurrence), death (if Y=1) or censoring (if Y=0, second occurrence)
• Y: 0 to indicate the start of the observation period or censoring and 1 to indicate death
• age: age of the patient (years)
• sex: sex of the patient (F for female, M for male)
• ecogPH: ECOG (Eastern Cooperative Oncology Group) performance status assessed by the physician, on a scale from 0 (fully active) to 5 (dead). For information on the scale, click here.
• karnoPH: Karnofsky performance status, assessed by the physician, on a scale from 0 (dead) to 100 (completely healthy). More details about the scale can be found here.
• karnoPAT: Karnofsky performance status, assessed by the patient

Using Datxplore, one can visualize the Kaplan-Meier curve. The censored data are indicated by red crosses.

### 3.4.5. Cardiovascular data set

Data come from the 1991 Arizona cardiovascular patient files. A subset of the fields was selected to model the differential length of stay for patients entering the hospital to receive one of two standard cardiovascular procedures: CABG and PTCA. CABG is the standard acronym for Coronary Artery Bypass Graft, where the flow of blood in a diseased or blocked coronary artery or vein has been grafted to bypass the diseased sections. PTCA, or Percutaneous Transluminal Coronary Angioplasty, is a method of placing a balloon in a blocked coronary artery to open it to blood flow. It is a much less severe method of treatment for those having coronary blockage, with a corresponding reduction in risk.

Below we show a snapshot of the data set:

The definition of the columns is the following:

The Kaplan-Meier curve with respect to time is presented below.

## Warfarin data set

This data set was originally published in:

O’Reilly (1968). Studies on coumarin anticoagulant drugs. Initiation of warfarin therapy without a loading dose. Circulation 1968, 38:169-177.

Warfarin is an anticoagulant normally used in the prevention of thrombosis and thromboembolism, the formation of blood clots in the blood vessels and their migration elsewhere in the body, respectively. The data set provides a set of plasma warfarin concentrations and Prothrombin Complex Response measurements in thirty normal subjects after a single loading dose. A single large loading dose of warfarin sodium, 1.5 mg/kg of body weight, was administered orally to all subjects. Measurements were made every 12 or 24 h.
The two following figures show the concentration and the effect with respect to time for all subjects.

The data set for subject 1 can be defined as follows:

id time    amt dv  dvid    wt  age sex
1   0   100 .   1   66.7    50  1
1   0   .   100 2   66.7    50  1
1   24  .   9.2 1   66.7    50  1
1   24  .   49  2   66.7    50  1
1   36  .   8.5 1   66.7    50  1
1   36  .   32  2   66.7    50  1
1   48  .   6.4 1   66.7    50  1
1   48  .   26  2   66.7    50  1
1   72  .   4.8 1   66.7    50  1
1   72  .   22  2   66.7    50  1
1   96  .   3.1 1   66.7    50  1
1   96  .   28  2   66.7    50  1
1   120 .   2.5 1   66.7    50  1
1   120 .   33  2   66.7    50  1

#### Interpretation

One can see the following columns

Several points can be noticed.

1. The first line corresponds to a dose, while the other ones are measurements. This explains the dot in the dv column for the first line and the dots in the amt column for the other ones.
2. The covariate columns (the continuous covariates wt and age and the categorical covariate sex) are filled with the same values on every line. Even though this is not necessary, we encourage the user to fill the columns for readability and ease of use.
3. In the presented case, PK and PD measurements are made at the same times. This is not required for data exploration using Datxplore, nor for parameter estimation using Monolix.
4. Finally, notice that no initial washout is needed at the beginning since, by default, the null initial condition is used for parameter estimation.
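To make the reading of the dots concrete, the following Python sketch separates the first rows of the excerpt above into dose records and observation records grouped by the value of the dvid column. The function name `split_records` is hypothetical and the parsing rule (a dot means "no value") is a simplification used for this illustration only.

```python
# First rows of the warfarin excerpt shown above (whitespace-separated).
RAW = """id time amt dv dvid wt age sex
1 0 100 . 1 66.7 50 1
1 0 . 100 2 66.7 50 1
1 24 . 9.2 1 66.7 50 1
1 24 . 49 2 66.7 50 1
"""


def split_records(text):
    """Return (doses, observations) where observations are grouped by dvid."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    doses = []
    observations = {}
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        time = float(row["time"])
        if row["amt"] != ".":   # a value in amt marks a dose line
            doses.append((time, float(row["amt"])))
        if row["dv"] != ".":    # a value in dv marks a measurement line
            observations.setdefault(row["dvid"], []).append((time, float(row["dv"])))
    return doses, observations


doses, observations = split_records(RAW)
print(doses)
print(observations)
```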

Interestingly, one can display the effect with respect to the concentration to get an idea of how to model the interaction between the PK and the PD parts.

The response therefore does not seem to be direct. Note that, when the observation times are not the same for the PK and the PD, interpolation is used to produce this kind of plot. One can also focus on a particular individual, as in the following figure.

Notice that a red arrow is also displayed to indicate the direction of time.

### 3.5.2. PSA and survival data set

This data set was originally published in:

Desmée, S, Mentré, F, Veyrat-Follet, C, Sébastien, B, Guedj, J (2017). Using the SAEM algorithm for mechanistic joint models characterizing the relationship between nonlinear PSA kinetics and survival in prostate cancer patients. Biometrics, 73, 1:305-312.

It contains PSA kinetics and survival data for 400 men with metastatic Castration-Resistant Prostate Cancer (mCRPC) treated with docetaxel and prednisone, the first-line reference chemotherapy, which constituted the control arm of a phase 3 clinical trial. In this context of advanced disease, the incidence of death is high and the PSA kinetics is closely monitored after treatment initiation to rapidly detect a breakthrough in PSA and propose rescue strategies. In more detail:

• PSA kinetics: 6627 PSA measurements were collected, among which roughly 20% were pretreatment, 60% on treatment, and 20% posttreatment. 2.5% are below the limit of quantification (LoQ), set at 0.1 ng.ml−1.
• survival: 286 patients died (71.5%), leading to a median survival of 656 days.

PSA measurements are displayed on the figure below. Since the data density is too high to see the individual profiles clearly, the next figure shows a selection of six individuals. It reveals the shape followed by most individuals: a decrease of PSA concentration after treatment initiation at time zero, followed by an increase some time later due to resistance.

The following figure shows the Kaplan-Meier curves for the survival data split into two groups by the median time of end of treatment.

Below is an extract of the data set file:

The columns are:

### 4.1. FAQ

#### Evolution

• What has changed in this version compared to the previous version (2016R1)? All the changes can be seen here.
• Can I get the data set documentation associated with the previous version? Of course, you can download it here.

• Which file formats are supported? Text and comma-separated values files are allowed. The file extension should preferably be .txt or .csv.
• Should I have a header line? Yes, having a header line is mandatory.
• Are there restrictions on header names? No, there is no restriction on the names or on their length. However, some characters are not allowed, as in the rest of the data file (see here).
• Which column types are mandatory? The ID, TIME and Y column-types are mandatory. All others are optional.
• Which column-types are possible? The complete list of supported column-types can be found here.
• Which separators are allowed? The supported separators are comma (“,”), semicolon (“;”), space (” “), and tab (“\t”).
• Which characters are allowed in strings? The list of allowed characters can be found here.
• What does “.” mean? The “.” can be used in almost all the lines of the data set but has several meanings depending on the context. A summary can be found here.
• How can I ignore certain response-lines of my data set? Use MDV=1 for that.
• Can I specify time in hours or in days? Yes, all the possible formats are defined here.
• Can the data be split into several files (for instance one file for dosing and one for observations)? No. All the data must be grouped into a single file.
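As a quick pre-check before loading a file, one can look at which of the four supported separators appears in the header line. The Python sketch below uses a simple counting heuristic; the function name `guess_separator` is hypothetical and this is not how the MonolixSuite itself detects the separator.

```python
# The four separators listed in the FAQ above.
ALLOWED_SEPARATORS = [",", ";", "\t", " "]


def guess_separator(header_line):
    """Pick the allowed separator that occurs most often in the header line."""
    counts = {sep: header_line.count(sep) for sep in ALLOWED_SEPARATORS}
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        raise ValueError("no supported separator found in header line")
    return best


print(repr(guess_separator("ID;TIME;Y;TRT;AGE;SEX;")))
print(repr(guess_separator("id\ttime\tamt\tdv")))
```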

#### Questions about format difference with NONMEM

• What are the differences between Monolix and Nonmem in terms of data set? The few differences are listed here.
• What is the equivalent of the NONMEM CMT column? Depending on the usage of the CMT column, it can correspond to the YTYPE column-type, the ADM column-type or to both. All differences between NONMEM and the MonolixSuite are listed here.

#### Questions about subjects and occasions

• Must all lines corresponding to the same individual be grouped? No, this is not necessary. All lines with the same ID will be assigned to the same individual, whatever their order or grouping.
• How can I define occasions? For that, you can use the OCC column-type as explained here.

• Must the times be in ascending order? For a given individual, the times do not need to be in order. The sorting will be done automatically.
• Can I specify time in hours or in days? Yes, all the possible formats are defined here.
• Can time have negative values? Yes.
• For time-to-event data, do I have to indicate the start time? Yes, it must be explicitly stated, for instance with TIME=0 and Y=0. Guidelines for data set formatting for time-to-event data are given here.

#### Questions about responses and observations

• Are non-continuous data types (such as count, time-to-event and categorical data) supported? Yes. Examples of data sets for non-continuous data types are presented here.
• Which value should I enter in the Y column-type for BLQ values? In the Y column-type, give the limit of quantification (LOQ). To mark the observations as being BLQ, use the CENS column-type. To indicate a censoring interval, use the LIMIT column-type (in addition to the CENS and Y columns).
• Can my data set contain different types of observations? Yes, use the YTYPE column to define to which type of data the line corresponds. An example data set with different types of observations is presented here.
• What happens if I define both a dose and a response in the same line? Depending on the values, the line can be treated as a dose or as a response. To see all the configurations, see here.
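The BLQ rule given in the list above (put the LOQ in the Y column and flag the line with CENS) can be sketched in Python as follows. The function name `encode_blq` is hypothetical, and the sketch assumes the convention CENS=1 for a below-LOQ observation and CENS=0 otherwise; check the CENS column-type description for the exact values expected.

```python
def encode_blq(values, loq):
    """Return (Y, CENS) pairs; None marks a measurement reported only as BLQ."""
    rows = []
    for value in values:
        if value is None or value < loq:
            rows.append((loq, 1))    # Y holds the LOQ, CENS=1 flags the line (assumed convention)
        else:
            rows.append((value, 0))  # regular observation
    return rows


print(encode_blq([2.5, 0.4, None, 0.05], loq=0.1))
# [(2.5, 0), (0.4, 0), (0.1, 1), (0.1, 1)]
```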

• For dose-lines, should I specify the compartment into which the dose is introduced? No. In the MonolixSuite, the matching between the data (dose and observation lines) and the model (administrations and predictions) is done using identifiers, not based on compartment numbers. To assign a dose to a specific administration of the model (oral or iv macros for classical PK models, depot macro for more complex ODEs), the column ADM is used. The identifier in the ADM column should match the “type=” field of the macro.
• If I have several outputs, should I duplicate the dosing information? No.

• Can I have time-varying covariates? Continuous (COV) and categorical (CAT) covariates must be constant within a subject-occasion. If a continuous covariate varies with time, the first value declared will be used for the entire subject-occasion. To use time-varying values, a continuous covariate can instead be tagged as a regressor (column-type X).

#### Questions about controls and events

• How can I ignore certain response-lines of my data set? Use MDV=1 for that.
• Are the MDV and EVID columns necessary? These columns are not mandatory and most of the time not necessary.
• Which values are allowed for EVID? EVID can take the values EVID=0 (observation), EVID=1 (dose) and EVID=4 (reset followed by a dose). EVID=2 and EVID=3 are not supported.
• How can I define a time at which I wish to output the predictions, even if I have no observation? Use MDV=2 for this purpose.

### 4.2. Translating your dataset from NONMEM format to the Monolix Suite format

The required formats for the data set in NONMEM and in the Monolix Suite are very similar. Usually only a few changes (if any) are required to go from one format to the other.

### General formatting

• Column names: in the Monolix Suite column names are not restricted in length, and not restricted to uppercase format. Yet, only alphanumeric and the underscore “_” characters are allowed. Special characters such as spaces ” “, stars “*”, parentheses or brackets “(“, dashes “-“, slashes “/” are not supported.
• Header line: no need to start the header line with the “#” character in the Monolix Suite, the column headers line will be recognized automatically.
• Number of columns: there is no limit on the number of columns in the Monolix Suite.
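The header rules above can be applied with a small script when converting a NONMEM file. The Python sketch below drops a leading “#” and replaces every character that is not alphanumeric or an underscore. The function name `clean_header` and the choice of “_” as replacement character are assumptions made for this illustration.

```python
import re


def clean_header(header_line, separator=","):
    """Drop a leading '#' and replace unsupported characters with '_'."""
    names = header_line.lstrip("#").split(separator)
    return [re.sub(r"[^0-9A-Za-z_]", "_", name.strip()) for name in names]


print(clean_header("#ID,TIME (h),DV-1,WT/kg"))
# ['ID', 'TIME__h_', 'DV_1', 'WT_kg']
```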

### Dose column-types

• SS column: SS=2 and SS=3 are not supported in the Monolix Suite. When a data set contains a column with column-type SS, there must be a column with column-type II. If SS=1, then the value in the II column must be strictly positive. In case of steady-state, steady-state formulas are not used. Instead, additional doses (5 by default) are added before the SS dose to reach steady-state.
• RATE and TINF: in case of an infusion, in the Monolix Suite, it is possible to define either the rate (RATE column-type) or the duration (TINF column-type). The rate and the duration are related to each other via the amount: TINF=AMT/RATE. Negative values in the RATE column-type result in a bolus, when used in combination with the iv macro (and models from the library with iv). When used in combination with the oral macro (and models from the library with oral0 or oral1), the RATE column is ignored if the value is negative and an error is triggered if the value is positive. If the infusion duration is defined in the model (parameter or fixed value), the RATE column is not necessary (in contrast to NONMEM, where RATE=-1 and RATE=-2 are used).
• CMT column: in NONMEM, for observation-lines, CMT specifies the compartment from which the predicted value of the observation is obtained. For dose-lines, CMT specifies the compartment into which the dose is introduced. In the MonolixSuite, the matching between the data (dose and observation lines) and the model (administrations and predictions) is done using identifiers, not based on compartment numbers. To assign a dose to a specific administration of the model (oral or iv macros for classical PK models, depot macro for more complex ODEs), the column ADM is used. The identifier in the ADM column should match the “type=” field of the macro. To assign an observation to a prediction, the column YTYPE is used. Observation lines with YTYPE=1 will be assigned to the first output (output = {…} statement in the model file), lines with YTYPE=2 to the second output, etc. Note that the default values for the administration type in administration macros or in the pkmodel macro is type=1. Similarly, in case of a single output, YTYPE=1 by default (while in NONMEM, the central compartment may have number 1 or 2). In the ADM column-type, negative values are not allowed. Turning off compartments should instead be defined in the model file.
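The steady-state handling described in the SS item above can be pictured as an expansion of one SS dose into a series of doses. The Python sketch below assumes that the additional doses have the same amount as the SS dose and are spaced by the interdose interval II; the function name `steady_state_doses` is hypothetical and this is an illustration of the idea, not the MonolixSuite implementation.

```python
def steady_state_doses(dose_time, amount, interdose_interval, n_extra=5):
    """Return (time, amount) pairs: n_extra earlier doses plus the SS dose itself."""
    if interdose_interval <= 0:
        raise ValueError("II must be strictly positive when SS=1")
    # earlier doses at dose_time - n_extra*II, ..., dose_time - II
    times = [dose_time - k * interdose_interval for k in range(n_extra, 0, -1)]
    times.append(dose_time)
    return [(t, amount) for t in times]


print(steady_state_doses(dose_time=0, amount=100, interdose_interval=12))
# [(-60, 100), (-48, 100), (-36, 100), (-24, 100), (-12, 100), (0, 100)]
```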

### Control and event column-types

• EVID column-type: in the Monolix Suite, the EVID column is not mandatory, since dose-events (EVID=1) and response-events (EVID=0) are automatically recognized. Note that the Monolix Suite does not recognize EVID=2 (“Other event”) and EVID=3 (“Reset event”), but recognizes EVID=4 (which corresponds to a reset to initial values immediately followed by a dose). EVID=4 creates a new occasion for the individual. In NONMEM, EVID=2 is sometimes used to define a time point at which one would like to predict a concentration, without having an observation. In the Monolix Suite, this is done using MDV=2 (see below).
• MDV column-type: the MDV column is not mandatory in the Monolix Suite. Dose-lines and observation-lines will be recognized automatically. Yet the MDV column can be useful to force a response-line to be ignored (MDV=1). Several MDV columns are allowed; in this case a synthetic MDV value is computed. In the Monolix Suite, MDV can in addition take the value MDV=2, which makes it possible to define a time point (and possibly a regressor value) at which a prediction is output without a corresponding observation. In Monolix, the time points tagged MDV=2 will for instance appear in the table “fulltimes.txt”, which is output when selecting “All times” in the “Outputs to save” window.

### Response column-types

• Censored data: in the Monolix Suite, censored data should be tagged in the data set using additional columns with CENS (mark as censored observation) and if necessary LIMIT (give other interval boundary) column-types. The LOQ value is indicated in the Y column. Censored data are then automatically taken into account in the likelihood in a rigorous statistical way. If only the CENS column is used, the method in the MonolixSuite is equivalent to the so-called M3 method. When both CENS and LIMIT are used, the method is equivalent to M4.

### Subject identification column-types

• ID column-type: in NONMEM all lines related to a single individual must be in one block, which is not the case in the Monolix Suite. If the ID column contains the following IDs: [1,1,1,2,2,1,1], NONMEM will consider that the dataset comprises three individuals with IDs 1 (with 3 observations), 2 (with 2 observations) and 1_1 (with 2 observations). In the Monolix Suite, two individuals are considered, with IDs 1 (with 5 observations) and 2 (with 2 observations).
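The difference between the two readings of the ID column can be reproduced with a few lines of Python. This is only an illustration of the grouping logic described above, using the same list of IDs.

```python
from itertools import groupby

ids = [1, 1, 1, 2, 2, 1, 1]

# NONMEM-like reading: every contiguous block of identical IDs is one individual.
blocks = [(key, len(list(group))) for key, group in groupby(ids)]

# MonolixSuite-like reading: all lines sharing an ID belong to one individual.
counts = {}
for subject in ids:
    counts[subject] = counts.get(subject, 0) + 1

print(blocks)  # [(1, 3), (2, 2), (1, 2)]
print(counts)  # {1: 5, 2: 2}
```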

### Time column-types

• TIME column-type: values in the time column can be negative in the Monolix Suite.

### Covariates and regression column-types

• Covariates: in the Monolix Suite, columns corresponding to continuous covariates must be set to the COV column-type, and categorical covariates to the CAT column-type.
• Regression variables: in the Monolix Suite, regression variables must be set to the X column-type. If several regression variables are used, their order must be the same in the dataset and in the “input” field of the model file.

### Unsupported column-types

• The PCMT, CONT, CALL, MRG_, RAW_, RPT_, L1, and L2 column-types are not supported in the MonolixSuite.