Population modeling data set description

Data set structure

The data set contains, for each subject, the measurements, dose regimen, covariates, etc., i.e. all collected information. The data must be in long format, i.e. each line corresponds to one individual and one time point. Different types of information (dose, observation, covariate, etc.) are recorded in different columns, which must be tagged with a column type (see below). The column types are very similar to, and compatible with, the structure used by the Nonmem software (the differences are listed here).
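As an illustration of the long format, the following Python sketch reads a small, made-up data set and groups its lines by individual. The column names (ID, TIME, AMT, Y, WEIGHT), the values, and the use of "." for an empty cell are assumptions chosen for the example, not requirements stated on this page.

```python
import csv
import io
from collections import defaultdict

# Hypothetical long-format data set: one line per individual and time point.
# Column names, values and the "." marker for empty cells are invented here.
RAW = """ID,TIME,AMT,Y,WEIGHT
1,0,100,.,70
1,1,.,8.2,70
1,4,.,5.1,70
2,0,150,.,82
2,2,.,9.7,82
"""

def group_by_subject(text):
    """Return {subject id: list of row dicts}, preserving line order."""
    subjects = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        subjects[row["ID"]].append(row)
    return dict(subjects)

subjects = group_by_subject(RAW)
for subject_id, rows in subjects.items():
    times = [float(r["TIME"]) for r in rows]
    print(subject_id, times)
```

Each subject contributes several lines, one per time point, and the covariate (WEIGHT in this example) is simply repeated on every line of that subject.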

Description of line-types

Depending on the information it contains, each line (except the header line) is treated as one of the following:

dose-line: line that contains information about the dose regimen (and possibly also covariates and regression variables)
response-line: line that contains an observation (and possibly also covariates and regression variables)
both dose-line and response-line: line that contains both information about the dose regimen and an observation (and possibly also covariates and regression variables)

The EVENT ID column-type can be used to force each line to be a dose-line or a response-line. Without an EVENT ID column, the content of the AMOUNT, OBSERVATION and IGNORED OBSERVATION columns is used to classify lines as dose-lines, response-lines or both. A table summarizing all cases can be found here.
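The simplest case (no EVENT ID column and no IGNORED OBSERVATION column) can be sketched in Python as follows. This is a simplified illustration, not a reproduction of the summary table mentioned above: the function name classify_line, the "." marker for an empty cell, and the treatment of a zero dose as "no dose" are all assumptions.

```python
MISSING = {"", "."}  # assumed markers for an empty cell

def has_dose(amount):
    """True if the AMOUNT cell holds a non-null (here: non-zero) dose."""
    amount = amount.strip()
    return amount not in MISSING and float(amount) != 0.0

def has_observation(observation):
    """True if the OBSERVATION cell holds a value."""
    return observation.strip() not in MISSING

def classify_line(amount, observation):
    """Classify one data line from its AMOUNT and OBSERVATION cells only."""
    dose, obs = has_dose(amount), has_observation(observation)
    if dose and obs:
        return "dose and response"
    if dose:
        return "dose"
    if obs:
        return "response"
    return "unclassified"  # case not covered by this sketch

for cells in [("100", "."), (".", "8.2"), ("100", "8.2")]:
    print(cells, "->", classify_line(*cells))
```

Handling of the EVENT ID and IGNORED OBSERVATION columns is deliberately left out, since the exact rules are given only in the summary table.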

Changes with respect to the MonolixSuite2016R1 version:

In MonolixSuite2016R1, a line could not be both a dose-line and a response-line: two lines were necessary to define a dose and a measurement occurring at the same time. In the MonolixSuite2018R1 version, a line with a non-null dose and a value in the response column is considered both a dose-line and a response-line. It was formerly considered a response-line.

Description of column-types

The first line of the data set must be a header line defining the names of the columns. Column names can be chosen freely. In the MonolixSuite applications, when defining the data, the user is asked to assign each column to a column-type (see here for an example of this step). The column-type tells the application how to interpret the information in that column. The available column-types are given below:

Column-types used for all types of lines:

ID (mandatory): identifier of the individual
OCCASION (formerly OCC): identifier (index) of the occasion
TIME: time of the dose or observation record
DATE/DAT1/DAT2/DAT3: date of the dose or observation record, to be used in combination with the TIME column
EVENT ID (formerly EVID): identifier to indicate if the line is a dose-line or a response-line
IGNORED OBSERVATION (formerly MDV): identifier to ignore the OBSERVATION information of that line
IGNORED LINE (from 2019 version): identifier to ignore all the information of that line
CONTINUOUS COVARIATE (formerly COV): continuous covariate (which can take values on a continuous scale)
CATEGORICAL COVARIATE (formerly CAT): categorical covariate (which can only take a finite number of values)
REGRESSOR (formerly X): defines a regression variable, i.e. a variable that can be used in the structural model (used e.g. for time-varying covariates)
IGNORE: ignores the information of that column for all lines

Column-types used for response-lines:

OBSERVATION (mandatory, formerly Y): records the measurement/observation for continuous, count, categorical or time-to-event data
OBSERVATION ID (formerly YTYPE): identifier for the observation type (to distinguish different types of observations, e.g. PK and PD)
CENSORING (formerly CENS): marks censored data, below the lower limit or above the upper limit of quantification
LIMIT: upper or lower boundary for the censoring interval in case of CENSORING column

Column-types used for dose-lines:

AMOUNT (formerly AMT): dose amount
ADMINISTRATION ID (formerly ADM): identifier for the type of dose (given via different routes for instance)
INFUSION RATE (formerly RATE): rate of the dose administration (used in particular for infusions)
INFUSION DURATION (formerly TINF): duration of the dose administration (used in particular for infusions)
ADDITIONAL DOSES (formerly ADDL): number of doses to add in addition to the defined dose, at intervals INTERDOSE INTERVAL
INTERDOSE INTERVAL (formerly II): interdose interval for doses added using the ADDITIONAL DOSES or STEADY STATE column-types
STEADY STATE (formerly SS): marks that steady state has been reached; a predefined number of doses is added before the actual dose, at intervals of INTERDOSE INTERVAL, in order to reach steady state
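To make the ADDITIONAL DOSES and INTERDOSE INTERVAL column-types concrete, the following Python sketch expands one dose-line into the explicit dosing times it stands for. The function name expand_doses and its list-of-tuples output are hypothetical. The sketch covers only ADDITIONAL DOSES and does not attempt STEADY STATE, because the predefined number of doses is not given here.

```python
def expand_doses(time, amount, additional_doses=0, interdose_interval=0.0):
    """Return (time, amount) pairs for the defined dose plus the added doses."""
    if additional_doses < 0:
        raise ValueError("additional_doses must be non-negative")
    if additional_doses > 0 and interdose_interval <= 0:
        raise ValueError("interdose_interval must be positive when doses are added")
    # k = 0 is the dose defined on the line; k = 1..additional_doses are added.
    return [
        (time + k * interdose_interval, amount)
        for k in range(additional_doses + 1)
    ]

schedule = expand_doses(time=0.0, amount=100.0,
                        additional_doses=3, interdose_interval=12.0)
print(schedule)
```

With a dose at time 0, three additional doses and an interval of 12, the sketch yields four doses at times 0, 12, 24 and 36.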