Data set documentation

Version 2023

This documentation specifies the format of datasets for the MonolixSuite starting in 2018. It details:

  • General structure: structure of a dataset for population modeling
  • Format rules: rules to format your experimental data, using the available column types
  • Examples: examples of real datasets with typical features (continuous data, discrete data, time-to-event data, censored data, data with several types of measurements, …)
  • Nonmem differences: key differences with the Nonmem format

©Lixoft

Dataset for population modeling

The considered datasets are dedicated to population modeling. The population approach describes phenomena observed in each of a set of individuals and the variability between individuals. The data is thus individual data, and is often longitudinal (over time).  For each subject, the dataset contains measurements, dose regimen, covariates etc … i.e. all collected information.

General format

The data must be in long format, i.e each row is one time point per subject. For each row, the individuals ID, observations, dose amount, covariates, etc are recorded in different columns. The column headers in the dataset are free, but the columns must be tagged using the available column types when defining the data in the applications of the MonolixSuite, such that the application knows how to interpret the data. The column types are very similar and compatible with the structure used by the Nonmem software.

The file extension should be .txt or .csv, a header line is needed and the data must be separated by tab “\t”, comma “,”,  semicolon “;” or a space ” “.