{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Input data structure\n",
    "Here we will describe the structure of the files required to create and run an INCA model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import pathlib\n",
    "import ast\n",
    "import incawrapper\n",
    "from incawrapper import utils"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The INCAWrapper does not read any files directly, but takes `pandas` dataframes as input. Thus, users can create these dataframes using their preferred methods. At the end of this guide, we show how to write and read .csv or Excel files so that they are imported as correctly formatted dataframes.\n",
    "\n",
    "The data is validated using the Pandera Python package. The validation covers both column names and data types, so column names must be specified exactly as shown in these examples. However, the dataframes are allowed to contain more columns than the ones required, though these extra columns are not passed to INCA or to the INCA results object. To view the exact validation criteria for a specific data type, inspect the data schemas found in the dataschema module.\n",
    "\n",
    "As an example, we will show how to inspect the `ReactionsSchema`, which is further described in the section **Reactions data**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'rxn_id': <Schema Column(name=rxn_id, type=DataType(str))>,\n",
       " 'rxn_eqn': <Schema Column(name=rxn_eqn, type=DataType(str))>}"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "incawrapper.ReactionsSchema.columns"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reactions data\n",
    "This is where the reactions and the atom maps of the model are defined. INCA requires the reactions to be defined with arrows: `->` for irreversible reactions and `<->` for reversible reactions. There are two different syntaxes for the atom maps. The first and simplest is to use letters only, e.g. `abc`. If more fine-grained control is required, INCA also supports a syntax that specifies the individual atoms more explicitly: `C1:a C2:b C3:c`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column name</th>\n",
       "      <th>dtype</th>\n",
       "      <th>required</th>\n",
       "      <th>nullable</th>\n",
       "      <th>description</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>rxn_id</td>\n",
       "      <td>str</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>The unique id of the reaction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>rxn_eqn</td>\n",
       "      <td>str</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>The reaction equation with atom map. Allowed reaction arrows: -&gt;, &lt;-&gt;.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "utils.present_schema_overview(incawrapper.ReactionsSchema)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>rxn_id</th>\n",
       "      <th>rxn_eqn</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>R1</td>\n",
       "      <td>A (abc) -&gt; B (ab) + D (c)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>R2</td>\n",
       "      <td>B (C1:a C1:b) &lt;-&gt; C (C1:b C2:a)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>R3</td>\n",
       "      <td>C -&gt; D</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  rxn_id                          rxn_eqn\n",
       "0     R1        A (abc) -> B (ab) + D (c)\n",
       "1     R2  B (C1:a C1:b) <-> C (C1:b C2:a)\n",
       "2     R3                           C -> D"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model_reactions_example = pd.DataFrame(\n",
    "    {\n",
    "        \"rxn_id\": [\"R1\", \"R2\", \"R3\"],\n",
    "        \"rxn_eqn\": [\"A (abc) -> B (ab) + D (c)\", \"B (C1:a C1:b) <-> C (C1:b C2:a)\", \"C -> D\"],\n",
    "    }\n",
    ")\n",
    "model_reactions_example.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that the two atom mapping syntaxes can be mixed within one model, and that the model may contain reactions without an atom map."
   ]
  },
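  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of the arrow convention used in `rxn_eqn`, the sketch below classifies an equation string as reversible or irreversible. The helper `reaction_direction` is hypothetical and written only for this guide; it is not part of incawrapper and it does not check the atom map.\n",
    "\n",
    "```python\n",
    "def reaction_direction(rxn_eqn):\n",
    "    # Hypothetical helper for illustration; not part of incawrapper.\n",
    "    # '<->' must be tested first because it also contains '->'.\n",
    "    if '<->' in rxn_eqn:\n",
    "        return 'reversible'\n",
    "    if '->' in rxn_eqn:\n",
    "        return 'irreversible'\n",
    "    raise ValueError('no allowed reaction arrow found in: ' + rxn_eqn)\n",
    "\n",
    "\n",
    "# The three equations from the reactions example above.\n",
    "example_equations = [\n",
    "    'A (abc) -> B (ab) + D (c)',\n",
    "    'B (C1:a C1:b) <-> C (C1:b C2:a)',\n",
    "    'C -> D',\n",
    "]\n",
    "for eqn in example_equations:\n",
    "    print(reaction_direction(eqn), '|', eqn)\n",
    "```\n",
    "\n",
    "A check like this can catch a mistyped arrow (for example `=>`) before the dataframe is handed to the wrapper."
   ]
  },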
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tracer data\n",
    "The tracer data specifies the labelled compounds added to each experiment. The dataframe has one row for each combination of experiment, tracer, and labelling group (more on labelling groups later). For most users, it will be sufficient to consider each row as one experiment-tracer combination. The tracer dataframe has the following required columns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column name</th>\n",
       "      <th>dtype</th>\n",
       "      <th>required</th>\n",
       "      <th>nullable</th>\n",
       "      <th>description</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>experiment_id</td>\n",
       "      <td>str</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>ID of the experiment. Must be a valid MATLAB variable name, legal characters are a-z, A-Z, 0-9, and the underscore character.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>tracer_id</td>\n",
       "      <td>str</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>The unique id of the tracer compound.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>met_id</td>\n",
       "      <td>str</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>The metabolite id of the labelled compound.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>atom_ids</td>\n",
       "      <td>object</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>Ids of the labelled atoms in the labelled atom group (equivalent to columns of the same name in the INCA GUI)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>atom_mdv</td>\n",
       "      <td>object</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>mass/isotopomer distribution vector of the \\nlabelled atom group (equivalent to columns of the same name in the INCA GUI). The simplest way to use this column is to specify the purity of \\nthe labelling group. This is done supplying a list two numbers, e.g. [0.01, 0.99].</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>enrichment</td>\n",
       "      <td>float64</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>mass/isotopomer distribution vector of the labelled atom \\ngroup (equivalent to columns of the same name in the INCA GUI). The simplest way to use this column is to specify the purity of the labelling \\ngroup. This is done supplying a list two numbers, e.g. `[0.5, 0.95]` specifies 95% of the compound will be fully labelled in this labelling \\ngroup. If different atom positions has different purity create a different labelling group for each position. For further description please \\nrefer to the INCA manual. Currently, the incawrapper only supports `atom_mdv` of length 2 for each labelling group.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "utils.present_schema_overview(incawrapper.TracerSchema)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look at a simple and common example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>experiment_id</th>\n",
       "      <th>met_id</th>\n",
       "      <th>tracer_id</th>\n",
       "      <th>atom_ids</th>\n",
       "      <th>atom_mdv</th>\n",
       "      <th>enrichment</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>exp1</td>\n",
       "      <td>glc</td>\n",
       "      <td>[1,2-13C]glucose</td>\n",
       "      <td>[1, 2]</td>\n",
       "      <td>[0.02, 0.98]</td>\n",
       "      <td>0.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>exp1</td>\n",
       "      <td>glc</td>\n",
       "      <td>[U-13C]glucose</td>\n",
       "      <td>[1, 2, 3, 4, 5, 6]</td>\n",
       "      <td>[0.001, 0.999]</td>\n",
       "      <td>0.8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  experiment_id met_id         tracer_id            atom_ids        atom_mdv  \\\n",
       "0          exp1    glc  [1,2-13C]glucose              [1, 2]    [0.02, 0.98]   \n",
       "1          exp1    glc    [U-13C]glucose  [1, 2, 3, 4, 5, 6]  [0.001, 0.999]   \n",
       "\n",
       "   enrichment  \n",
       "0         0.2  \n",
       "1         0.8  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "simple_tracer = pd.DataFrame(\n",
    "    {\n",
    "        \"experiment_id\": [\"exp1\",\"exp1\"],\n",
    "        \"met_id\": [\"glc\", \"glc\"],\n",
    "        \"tracer_id\": [\"[1,2-13C]glucose\",\"[U-13C]glucose\"],\n",
    "        \"atom_ids\" : [[1,2], [1,2,3,4,5,6]],\n",
    "        \"atom_mdv\" : [[0.02, 0.98], [0.001, 0.999]],\n",
    "        \"enrichment\" : [0.2, 0.8],\n",
    "    }\n",
    ")\n",
    "simple_tracer.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The specification above describes a single experiment where a mixture of two types of labelled glucose was used. The enrichment specifies that the labelled medium contained 20% [1,2-13C]glucose and 80% [U-13C]glucose. The purity of each tracer is given by `atom_mdv`: the [1,2-13C]glucose was 98% pure, i.e. 98% of this 20% of the glucose is labelled at carbons 1 and 2, while the remaining 2% carries no label (natural isotope abundance is not considered here).\n",
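    "\n",
    "The arithmetic behind this reading of `enrichment` and `atom_mdv` can be sketched in plain Python. This is only an illustration with the values of the [1,2-13C]glucose row copied in by hand; the variable names are chosen here and are not part of the incawrapper API.\n",
    "\n",
    "```python\n",
    "# Values copied from the [1,2-13C]glucose row of the tracer dataframe above.\n",
    "enrichment = 0.2         # share of the glucose supplied as this tracer\n",
    "atom_mdv = [0.02, 0.98]  # [unlabelled, labelled] fractions of the labelling group\n",
    "\n",
    "# Share of all glucose that carries the label at carbons 1 and 2.\n",
    "labelled_share = enrichment * atom_mdv[1]\n",
    "# Share of all glucose that is tracer material without the label.\n",
    "unlabelled_share = enrichment * atom_mdv[0]\n",
    "\n",
    "print(round(labelled_share, 3), round(unlabelled_share, 3))\n",
    "```\n",
    "\n",
    "With these values, about 19.6% of the total glucose carries the label at carbons 1 and 2, and about 0.4% is tracer material without the label.\n",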
    "\n",
    "### Parallel isotopomer labelling experiment\n",
    "A set of parallel isotopomer labelling experiments, i.e. different experiments that should be fitted simultaneously, is specified by giving each experiment its own `experiment_id`. In the following example, we specify the experiment from above in parallel with an experiment conducted with 100% [1-13C]glucose."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>experiment_id</th>\n",
       "      <th>met_id</th>\n",
       "      <th>tracer_id</th>\n",
       "      <th>atom_ids</th>\n",
       "      <th>atom_mdv</th>\n",
       "      <th>enrichment</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>exp1</td>\n",
       "      <td>glc</td>\n",
       "      <td>[1,2-13C]glucose</td>\n",
       "      <td>[1, 2]</td>\n",
       "      <td>[0.02, 0.98]</td>\n",
       "      <td>0.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>exp1</td>\n",
       "      <td>glc</td>\n",
       "      <td>[U-13C]glucose</td>\n",
       "      <td>[1, 2, 3, 4, 5, 6]</td>\n",
       "      <td>[0.001, 0.999]</td>\n",
       "      <td>0.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>exp2</td>\n",
       "      <td>glc</td>\n",
       "      <td>[1-13C]glucose</td>\n",
       "      <td>[1]</td>\n",
       "      <td>[0.05, 0.95]</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  experiment_id met_id         tracer_id            atom_ids        atom_mdv  \\\n",
       "0          exp1    glc  [1,2-13C]glucose              [1, 2]    [0.02, 0.98]   \n",
       "1          exp1    glc    [U-13C]glucose  [1, 2, 3, 4, 5, 6]  [0.001, 0.999]   \n",
       "2          exp2    glc    [1-13C]glucose                 [1]    [0.05, 0.95]   \n",
       "\n",
       "   enrichment  \n",
       "0         0.2  \n",
       "1         0.8  \n",
       "2         1.0  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "parallel_experiment_tracer = pd.DataFrame(\n",
    "    {\n",
    "        \"experiment_id\": [\"exp1\",\"exp1\",\"exp2\"],\n",
    "        \"met_id\": [\"glc\", \"glc\", \"glc\"],\n",
    "        \"tracer_id\": [\"[1,2-13C]glucose\",\"[U-13C]glucose\",\"[1-13C]glucose\"],\n",
    "        \"atom_ids\" : [[1,2], [1,2,3,4,5,6],[1]],\n",
    "        \"atom_mdv\" : [[0.02, 0.98], [0.001, 0.999],[0.05, 0.95]],\n",
    "        \"enrichment\" : [0.2, 0.8,1],\n",
    "    }\n",
    ")\n",
    "parallel_experiment_tracer.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using labelling groups\n",
    "Rows that describe different labelling groups of the same tracer must share the same `tracer_id`. In the following, we specify that we used a [1,2-13C]glucose tracer where 98% is labelled at carbon atom 1 and 95% is labelled at carbon atom 2. In this case, the `enrichment` has to be the same for each labelling group."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>experiment_id</th>\n",
       "      <th>met_id</th>\n",
       "      <th>tracer_id</th>\n",
       "      <th>atom_ids</th>\n",
       "      <th>atom_mdv</th>\n",
       "      <th>enrichment</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>exp1</td>\n",
       "      <td>glc</td>\n",
       "      <td>[1,2-13C]glucose</td>\n",
       "      <td>[1]</td>\n",
       "      <td>[0.02, 0.98]</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>exp1</td>\n",
       "      <td>glc</td>\n",
       "      <td>[1,2-13C]glucose</td>\n",
       "      <td>[2]</td>\n",
       "      <td>[0.05, 0.95]</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  experiment_id met_id         tracer_id atom_ids      atom_mdv  enrichment\n",
       "0          exp1    glc  [1,2-13C]glucose      [1]  [0.02, 0.98]           1\n",
       "1          exp1    glc  [1,2-13C]glucose      [2]  [0.05, 0.95]           1"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "two_labelling_groups_tracer = pd.DataFrame(\n",
    "    {\n",
    "        \"experiment_id\": [\"exp1\",\"exp1\"],\n",
    "        \"met_id\": [\"glc\", \"glc\"],\n",
    "        \"tracer_id\": [\"[1,2-13C]glucose\",\"[1,2-13C]glucose\"],\n",
    "        \"atom_ids\" : [[1], [2]],\n",
    "        \"atom_mdv\" : [[0.02, 0.98], [0.05, 0.95]],\n",
    "        \"enrichment\" : [1, 1],\n",
    "    }\n",
    ")\n",
    "two_labelling_groups_tracer.head()"
   ]
  },
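  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The schema overview states that `experiment_id` must be a valid MATLAB variable name. A minimal sketch of a pre-check is shown below. The helper `is_valid_matlab_name` is hypothetical and not part of incawrapper; besides the allowed characters listed in the schema, it assumes MATLAB's rule that a name must begin with a letter, and it ignores MATLAB's name length limit and reserved keywords.\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "# Hypothetical helper, not part of incawrapper.\n",
    "# A letter first, then any number of letters, digits or underscores.\n",
    "_MATLAB_NAME = re.compile('[A-Za-z][A-Za-z0-9_]*')\n",
    "\n",
    "\n",
    "def is_valid_matlab_name(name):\n",
    "    # fullmatch requires the whole string to follow the pattern.\n",
    "    return _MATLAB_NAME.fullmatch(name) is not None\n",
    "\n",
    "\n",
    "for candidate in ['exp1', 'exp_2', '1st_exp', 'exp-3']:\n",
    "    print(candidate, is_valid_matlab_name(candidate))\n",
    "```\n",
    "\n",
    "Running such a check on the `experiment_id` column before building the model can surface IDs with hyphens, spaces, or a leading digit early."
   ]
  },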
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Flux measurement data\n",
    "Flux measurements are typically uptake or secretion rates, which do not require labelling. Therefore, this data is also quite simple to define. The required columns are the following:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column name</th>\n",
       "      <th>dtype</th>\n",
       "      <th>required</th>\n",
       "      <th>nullable</th>\n",
       "      <th>description</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>experiment_id</td>\n",
       "      <td>str</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>ID of the experiment. Must be a valid MATLAB variable name, legal characters are a-z, A-Z, 0-9, and the underscore character.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>rxn_id</td>\n",
       "      <td>str</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>The unique id of the reaction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>flux</td>\n",
       "      <td>float64</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>Measured/estimated rate typically in mmol/gDW/h</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>flux_std_error</td>\n",
       "      <td>float64</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>Standard error of the measured/estimated rate</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "utils.present_schema_overview(incawrapper.FluxMeasurementsSchema)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The units have to be consistent for all measurements, because INCA assumes that all rates have the same units. Notice that it is not possible to supply a time point for the rate estimates. This is because INCA supports steady-state and isotopically non-stationary labelling analysis. Both of these methods assume that all rates are constant over the duration of the experiment, and only the isotopomer distribution vectors are allowed to change over time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>experiment_id</th>\n",
       "      <th>rxn_id</th>\n",
       "      <th>flux</th>\n",
       "      <th>flux_std_error</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>exp1</td>\n",
       "      <td>R1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>exp1</td>\n",
       "      <td>R2</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>exp1</td>\n",
       "      <td>R3</td>\n",
       "      <td>3.0</td>\n",
       "      <td>0.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>exp2</td>\n",
       "      <td>R1</td>\n",
       "      <td>1.2</td>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>exp2</td>\n",
       "      <td>R2</td>\n",
       "      <td>1.8</td>\n",
       "      <td>0.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  experiment_id rxn_id  flux  flux_std_error\n",
       "0          exp1     R1   1.0             0.1\n",
       "1          exp1     R2   2.0             0.5\n",
       "2          exp1     R3   3.0             0.2\n",
       "3          exp2     R1   1.2             0.1\n",
       "4          exp2     R2   1.8             0.5"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "flux_measurements_example = pd.DataFrame(\n",
    "    {\n",
    "        \"experiment_id\": [\"exp1\", \"exp1\", \"exp1\", \"exp2\", \"exp2\", \"exp2\"],\n",
    "        \"rxn_id\": [\"R1\", \"R2\", \"R3\", \"R1\", \"R2\", \"R3\"],\n",
    "        \"flux\": [1.0, 2.0, 3.0, 1.2, 1.8, 2.8],\n",
    "        \"flux_std_error\": [0.1, 0.5, 0.2, 0.1, 0.5, 0.2],\n",
    "    }\n",
    ")\n",
    "flux_measurements_example.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Mass spectrometry measurements\n",
    "Mass spectrometry measurements are given as isotopomer distribution vectors together with the measurement standard error. These can be corrected for natural abundance or not, but by default INCA performs natural abundance correction, so this needs to be turned off in the options if it is not required (See XX). The required columns for MS measurements are:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column name</th>\n",
       "      <th>dtype</th>\n",
       "      <th>required</th>\n",
       "      <th>nullable</th>\n",
       "      <th>description</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>experiment_id</td>\n",
       "      <td>str</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>ID of the experiment. Must be a valid MATLAB variable name, legal characters are a-z, A-Z, 0-9, and the underscore character.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>met_id</td>\n",
       "      <td>str</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>Metabolite ID of metabolite which is directly measured or from which the fragment is derived through a derivatization method.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>ms_id</td>\n",
       "      <td>str</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>ID of the measured ms fragment - often multiple fragment can be measured from the same metabolite</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>measurement_replicate</td>\n",
       "      <td>int64</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>Replicate number of the measurement of the same fragment in the same experiment. \\n\"In most cases, the data will only have one measurement per fragment per experiment.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>labelled_atom_ids</td>\n",
       "      <td>object</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>List of atom ids of the labelled atoms in the metabolite.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>unlabelled_atoms</td>\n",
       "      <td>str</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>The molecular formula of the all atoms that cannot be labelled through \\nthe introduced labels in the tracers. This typically includes non-carbon elements of the fragment and all elements originating from derivatization agent. \\nINCA uses the unlabelled atoms to correct for natural abundance.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>mass_isotope</td>\n",
       "      <td>int64</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>The mass isotopomer of the fragment.\\nE.g. M0, M+1, etc. Specified as an integer. It is allowed to have gaps in the isotopmer of a given fragment, e.g. 0, 2, 3. In this case the intensity and \\nstd error of missing isotopomers are filled with NaN before inserted in INCA.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>intensity</td>\n",
       "      <td>float64</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>The measured intensity of the fragment mass isotope.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>intensity_std_error</td>\n",
       "      <td>float64</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>The standard error of the measured intensity of the fragment mass isotope.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>time</td>\n",
       "      <td>float64</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>Time point of measurement only relevant for isotopically non-stationary MFA analysis</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "utils.present_schema_overview(incawrapper.MSMeasurementsSchema)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>experiment_id</th>\n",
       "      <th>met_id</th>\n",
       "      <th>ms_id</th>\n",
       "      <th>measurement_replicate</th>\n",
       "      <th>labelled_atom_ids</th>\n",
       "      <th>unlabelled_atoms</th>\n",
       "      <th>mass_isotope</th>\n",
       "      <th>intensity</th>\n",
       "      <th>intensity_std_error</th>\n",
       "      <th>time</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>exp1</td>\n",
       "      <td>tyr</td>\n",
       "      <td>tyr_302</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2]</td>\n",
       "      <td>C12H32O2NSi2</td>\n",
       "      <td>0</td>\n",
       "      <td>0.1000</td>\n",
       "      <td>0.0100</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>exp1</td>\n",
       "      <td>tyr</td>\n",
       "      <td>tyr_302</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2]</td>\n",
       "      <td>C12H32O2NSi2</td>\n",
       "      <td>1</td>\n",
       "      <td>0.2000</td>\n",
       "      <td>0.0200</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>exp1</td>\n",
       "      <td>tyr</td>\n",
       "      <td>tyr_302</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2]</td>\n",
       "      <td>C12H32O2NSi2</td>\n",
       "      <td>2</td>\n",
       "      <td>0.4000</td>\n",
       "      <td>0.0400</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_390</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2, 3]</td>\n",
       "      <td>C14H40O3NSi3</td>\n",
       "      <td>0</td>\n",
       "      <td>0.2000</td>\n",
       "      <td>0.0200</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_390</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2, 3]</td>\n",
       "      <td>C14H40O3NSi3</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0001</td>\n",
       "      <td>0.0001</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_390</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2, 3]</td>\n",
       "      <td>C14H40O3NSi3</td>\n",
       "      <td>2</td>\n",
       "      <td>0.0030</td>\n",
       "      <td>0.0030</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_390</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2, 3]</td>\n",
       "      <td>C14H40O3NSi3</td>\n",
       "      <td>3</td>\n",
       "      <td>0.3000</td>\n",
       "      <td>0.0300</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_362</td>\n",
       "      <td>1</td>\n",
       "      <td>[2, 3]</td>\n",
       "      <td>C14H40O2NSi3</td>\n",
       "      <td>0</td>\n",
       "      <td>0.4000</td>\n",
       "      <td>0.0400</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_362</td>\n",
       "      <td>1</td>\n",
       "      <td>[2, 3]</td>\n",
       "      <td>C14H40O2NSi3</td>\n",
       "      <td>1</td>\n",
       "      <td>0.1000</td>\n",
       "      <td>0.0100</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_362</td>\n",
       "      <td>1</td>\n",
       "      <td>[2, 3]</td>\n",
       "      <td>C14H40O2NSi3</td>\n",
       "      <td>2</td>\n",
       "      <td>0.5000</td>\n",
       "      <td>0.0500</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  experiment_id met_id    ms_id  measurement_replicate labelled_atom_ids  \\\n",
       "0          exp1    tyr  tyr_302                      1            [1, 2]   \n",
       "1          exp1    tyr  tyr_302                      1            [1, 2]   \n",
       "2          exp1    tyr  tyr_302                      1            [1, 2]   \n",
       "3          exp1    ser  ser_390                      1         [1, 2, 3]   \n",
       "4          exp1    ser  ser_390                      1         [1, 2, 3]   \n",
       "5          exp1    ser  ser_390                      1         [1, 2, 3]   \n",
       "6          exp1    ser  ser_390                      1         [1, 2, 3]   \n",
       "7          exp1    ser  ser_362                      1            [2, 3]   \n",
       "8          exp1    ser  ser_362                      1            [2, 3]   \n",
       "9          exp1    ser  ser_362                      1            [2, 3]   \n",
       "\n",
       "  unlabelled_atoms  mass_isotope  intensity  intensity_std_error  time  \n",
       "0     C12H32O2NSi2             0     0.1000               0.0100     0  \n",
       "1     C12H32O2NSi2             1     0.2000               0.0200     0  \n",
       "2     C12H32O2NSi2             2     0.4000               0.0400     0  \n",
       "3     C14H40O3NSi3             0     0.2000               0.0200     0  \n",
       "4     C14H40O3NSi3             1     0.0001               0.0001     0  \n",
       "5     C14H40O3NSi3             2     0.0030               0.0030     0  \n",
       "6     C14H40O3NSi3             3     0.3000               0.0300     0  \n",
       "7     C14H40O2NSi3             0     0.4000               0.0400     0  \n",
       "8     C14H40O2NSi3             1     0.1000               0.0100     0  \n",
       "9     C14H40O2NSi3             2     0.5000               0.0500     0  "
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "n_isotopes = [3, 4, 3]\n",
    "ms_measurements_example = pd.DataFrame(\n",
    "    {\n",
    "        \"experiment_id\": np.repeat([\"exp1\"], sum(n_isotopes)),\n",
    "        \"met_id\": np.repeat([\"tyr\", \"ser\", \"ser\"], n_isotopes), \n",
    "        \"ms_id\": np.repeat([\"tyr_302\", \"ser_390\", \"ser_362\"], n_isotopes),\n",
    "        \"measurement_replicate\": np.repeat([1], sum(n_isotopes)),\n",
    "        \"labelled_atom_ids\": np.repeat(np.array([[1,2], [1,2,3], [2,3]], dtype=object), n_isotopes),\n",
    "        \"unlabelled_atoms\": np.repeat([\"C12H32O2NSi2\", \"C14H40O3NSi3\", \"C14H40O2NSi3\"], n_isotopes),\n",
    "        \"mass_isotope\": [0, 1, 2, 0, 1, 2, 3, 0, 1, 2],\n",
    "        \"intensity\": [0.1, 0.2, 0.4, 0.2, 0.0001, 0.003, 0.3, 0.4, 0.1, 0.5],\n",
    "        \"intensity_std_error\": [0.01, 0.02, 0.04, 0.02, 0.0001, 0.003, 0.03, 0.04, 0.01, 0.05],\n",
    "        \"time\": np.repeat([0], sum(n_isotopes)),\n",
    "    }\n",
    ")\n",
    "ms_measurements_example"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Note about formatting when reading CSV and Excel files\n",
    "Some of the data inputs require a cell to contain a Python list. This can cause issues when reading the data from .csv or Excel files. To work around this, write the list in Python syntax as a string in the .csv or Excel file. After reading the file with pandas, evaluate the strings in those columns, which converts them into Python lists.\n",
    "\n",
    "### CSV files\n",
    "For .csv files we can use the `converters` argument of the `pd.read_csv` function. Let's use the MS data as an example. The .csv file looks as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "experiment_id,met_id,ms_id,measurement_replicate,labelled_atom_ids,unlabelled_atoms,mass_isotope,intensity,intensity_std_error,time\n",
      "exp1,tyr,tyr_302,1,\"[1, 2]\",C12H32O2NSi2,0,0.1000,0.0100,0\n",
      "exp1,tyr,tyr_302,1,\"[1, 2]\",C12H32O2NSi2,1,0.2000,0.0200,0\n",
      "exp1,tyr,tyr_302,1,\"[1, 2]\",C12H32O2NSi2,2,0.4000,0.0400,0\n",
      "exp1,ser,ser_390,1,\"[1, 2, 3]\",C14H40O3NSi3,0,0.2000,0.0200,0\n",
      "exp1,ser,ser_390,1,\"[1, 2, 3]\",C14H40O3NSi3,1,0.0001,0.0001,0\n",
      "exp1,ser,ser_390,1,\"[1, 2, 3]\",C14H40O3NSi3,2,0.0030,0.0030,0\n",
      "exp1,ser,ser_390,1,\"[1, 2, 3]\",C14H40O3NSi3,3,0.3000,0.0300,0\n",
      "exp1,ser,ser_362,1,\"[2, 3]\",C14H40O2NSi3,0,0.4000,0.0400,0\n",
      "exp1,ser,ser_362,1,\"[2, 3]\",C14H40O2NSi3,1,0.1000,0.0100,0\n",
      "exp1,ser,ser_362,1,\"[2, 3]\",C14H40O2NSi3,2,0.5000,0.0500,0\n"
     ]
    }
   ],
   "source": [
    "csv_illtration_file = pathlib.Path(\"./examples/Literature data/simple model/ms_measurement_csv_input_example.csv\")\n",
    "with open(csv_illtration_file, \"r\") as f:\n",
    "    print(f.read())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that each list definition (`[]`) is enclosed in double quotes. Importantly, when a list contains strings, those strings must additionally be enclosed in single quotes, e.g. `\"['C3','C4']\"`. When this .csv file is read with `pd.read_csv`, using `ast.literal_eval` as the converter for the list column, the list strings are parsed into Python lists."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>experiment_id</th>\n",
       "      <th>met_id</th>\n",
       "      <th>ms_id</th>\n",
       "      <th>measurement_replicate</th>\n",
       "      <th>labelled_atom_ids</th>\n",
       "      <th>unlabelled_atoms</th>\n",
       "      <th>mass_isotope</th>\n",
       "      <th>intensity</th>\n",
       "      <th>intensity_std_error</th>\n",
       "      <th>time</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>exp1</td>\n",
       "      <td>tyr</td>\n",
       "      <td>tyr_302</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2]</td>\n",
       "      <td>C12H32O2NSi2</td>\n",
       "      <td>0</td>\n",
       "      <td>0.1000</td>\n",
       "      <td>0.0100</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>exp1</td>\n",
       "      <td>tyr</td>\n",
       "      <td>tyr_302</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2]</td>\n",
       "      <td>C12H32O2NSi2</td>\n",
       "      <td>1</td>\n",
       "      <td>0.2000</td>\n",
       "      <td>0.0200</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>exp1</td>\n",
       "      <td>tyr</td>\n",
       "      <td>tyr_302</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2]</td>\n",
       "      <td>C12H32O2NSi2</td>\n",
       "      <td>2</td>\n",
       "      <td>0.4000</td>\n",
       "      <td>0.0400</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_390</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2, 3]</td>\n",
       "      <td>C14H40O3NSi3</td>\n",
       "      <td>0</td>\n",
       "      <td>0.2000</td>\n",
       "      <td>0.0200</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_390</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2, 3]</td>\n",
       "      <td>C14H40O3NSi3</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0001</td>\n",
       "      <td>0.0001</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  experiment_id met_id    ms_id  measurement_replicate labelled_atom_ids  \\\n",
       "0          exp1    tyr  tyr_302                      1            [1, 2]   \n",
       "1          exp1    tyr  tyr_302                      1            [1, 2]   \n",
       "2          exp1    tyr  tyr_302                      1            [1, 2]   \n",
       "3          exp1    ser  ser_390                      1         [1, 2, 3]   \n",
       "4          exp1    ser  ser_390                      1         [1, 2, 3]   \n",
       "\n",
       "  unlabelled_atoms  mass_isotope  intensity  intensity_std_error  time  \n",
       "0     C12H32O2NSi2             0     0.1000               0.0100     0  \n",
       "1     C12H32O2NSi2             1     0.2000               0.0200     0  \n",
       "2     C12H32O2NSi2             2     0.4000               0.0400     0  \n",
       "3     C14H40O3NSi3             0     0.2000               0.0200     0  \n",
       "4     C14H40O3NSi3             1     0.0001               0.0001     0  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from_csv = pd.read_csv(csv_illtration_file, converters={\"labelled_atom_ids\": ast.literal_eval})\n",
    "from_csv.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can verify that the entries of the `labelled_atom_ids` column are Python lists, here by checking the first entry."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "list"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(from_csv[\"labelled_atom_ids\"][0])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also check that the dataframe passes the schema validation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>experiment_id</th>\n",
       "      <th>met_id</th>\n",
       "      <th>ms_id</th>\n",
       "      <th>measurement_replicate</th>\n",
       "      <th>labelled_atom_ids</th>\n",
       "      <th>unlabelled_atoms</th>\n",
       "      <th>mass_isotope</th>\n",
       "      <th>intensity</th>\n",
       "      <th>intensity_std_error</th>\n",
       "      <th>time</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>exp1</td>\n",
       "      <td>tyr</td>\n",
       "      <td>tyr_302</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2]</td>\n",
       "      <td>C12H32O2NSi2</td>\n",
       "      <td>0</td>\n",
       "      <td>0.1000</td>\n",
       "      <td>0.0100</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>exp1</td>\n",
       "      <td>tyr</td>\n",
       "      <td>tyr_302</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2]</td>\n",
       "      <td>C12H32O2NSi2</td>\n",
       "      <td>1</td>\n",
       "      <td>0.2000</td>\n",
       "      <td>0.0200</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>exp1</td>\n",
       "      <td>tyr</td>\n",
       "      <td>tyr_302</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2]</td>\n",
       "      <td>C12H32O2NSi2</td>\n",
       "      <td>2</td>\n",
       "      <td>0.4000</td>\n",
       "      <td>0.0400</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_390</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2, 3]</td>\n",
       "      <td>C14H40O3NSi3</td>\n",
       "      <td>0</td>\n",
       "      <td>0.2000</td>\n",
       "      <td>0.0200</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_390</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2, 3]</td>\n",
       "      <td>C14H40O3NSi3</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0001</td>\n",
       "      <td>0.0001</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_390</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2, 3]</td>\n",
       "      <td>C14H40O3NSi3</td>\n",
       "      <td>2</td>\n",
       "      <td>0.0030</td>\n",
       "      <td>0.0030</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_390</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2, 3]</td>\n",
       "      <td>C14H40O3NSi3</td>\n",
       "      <td>3</td>\n",
       "      <td>0.3000</td>\n",
       "      <td>0.0300</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_362</td>\n",
       "      <td>1</td>\n",
       "      <td>[2, 3]</td>\n",
       "      <td>C14H40O2NSi3</td>\n",
       "      <td>0</td>\n",
       "      <td>0.4000</td>\n",
       "      <td>0.0400</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_362</td>\n",
       "      <td>1</td>\n",
       "      <td>[2, 3]</td>\n",
       "      <td>C14H40O2NSi3</td>\n",
       "      <td>1</td>\n",
       "      <td>0.1000</td>\n",
       "      <td>0.0100</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_362</td>\n",
       "      <td>1</td>\n",
       "      <td>[2, 3]</td>\n",
       "      <td>C14H40O2NSi3</td>\n",
       "      <td>2</td>\n",
       "      <td>0.5000</td>\n",
       "      <td>0.0500</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  experiment_id met_id    ms_id  measurement_replicate labelled_atom_ids  \\\n",
       "0          exp1    tyr  tyr_302                      1            [1, 2]   \n",
       "1          exp1    tyr  tyr_302                      1            [1, 2]   \n",
       "2          exp1    tyr  tyr_302                      1            [1, 2]   \n",
       "3          exp1    ser  ser_390                      1         [1, 2, 3]   \n",
       "4          exp1    ser  ser_390                      1         [1, 2, 3]   \n",
       "5          exp1    ser  ser_390                      1         [1, 2, 3]   \n",
       "6          exp1    ser  ser_390                      1         [1, 2, 3]   \n",
       "7          exp1    ser  ser_362                      1            [2, 3]   \n",
       "8          exp1    ser  ser_362                      1            [2, 3]   \n",
       "9          exp1    ser  ser_362                      1            [2, 3]   \n",
       "\n",
       "  unlabelled_atoms  mass_isotope  intensity  intensity_std_error  time  \n",
       "0     C12H32O2NSi2             0     0.1000               0.0100   0.0  \n",
       "1     C12H32O2NSi2             1     0.2000               0.0200   0.0  \n",
       "2     C12H32O2NSi2             2     0.4000               0.0400   0.0  \n",
       "3     C14H40O3NSi3             0     0.2000               0.0200   0.0  \n",
       "4     C14H40O3NSi3             1     0.0001               0.0001   0.0  \n",
       "5     C14H40O3NSi3             2     0.0030               0.0030   0.0  \n",
       "6     C14H40O3NSi3             3     0.3000               0.0300   0.0  \n",
       "7     C14H40O2NSi3             0     0.4000               0.0400   0.0  \n",
       "8     C14H40O2NSi3             1     0.1000               0.0100   0.0  \n",
       "9     C14H40O2NSi3             2     0.5000               0.0500   0.0  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "incawrapper.MSMeasurementsSchema.validate(from_csv)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Excel files\n",
    "The Excel file can be written more straightforwardly, without double quotes around the lists. For example, a cell in `labelled_atom_ids` can be `['C3','C4']`. Notice that the atom ids are still wrapped in single quotes because they are defined using the C notation.\n",
    "\n",
    "Reading from an Excel file is slightly more verbose. Here we first read the file and then use `ast.literal_eval()` to convert the strings to Python lists."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "excel_illtration_file = pathlib.Path(\"./examples/Literature data/simple model/ms_measurement_csv_input_example.xlsx\")\n",
    "from_excel = pd.read_excel(excel_illtration_file)\n",
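    "# Convert the list strings to Python lists (Series.apply avoids DataFrame.applymap, which is deprecated in recent pandas)\n",
    "from_excel['labelled_atom_ids'] = from_excel['labelled_atom_ids'].apply(ast.literal_eval)"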
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, we see that the data read from Excel passes the schema validation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>experiment_id</th>\n",
       "      <th>met_id</th>\n",
       "      <th>ms_id</th>\n",
       "      <th>measurement_replicate</th>\n",
       "      <th>labelled_atom_ids</th>\n",
       "      <th>unlabelled_atoms</th>\n",
       "      <th>mass_isotope</th>\n",
       "      <th>intensity</th>\n",
       "      <th>intensity_std_error</th>\n",
       "      <th>time</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>exp1</td>\n",
       "      <td>tyr</td>\n",
       "      <td>tyr_302</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2]</td>\n",
       "      <td>C12H32O2NSi2</td>\n",
       "      <td>0</td>\n",
       "      <td>0.1000</td>\n",
       "      <td>0.0100</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>exp1</td>\n",
       "      <td>tyr</td>\n",
       "      <td>tyr_302</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2]</td>\n",
       "      <td>C12H32O2NSi2</td>\n",
       "      <td>1</td>\n",
       "      <td>0.2000</td>\n",
       "      <td>0.0200</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>exp1</td>\n",
       "      <td>tyr</td>\n",
       "      <td>tyr_302</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2]</td>\n",
       "      <td>C12H32O2NSi2</td>\n",
       "      <td>2</td>\n",
       "      <td>0.4000</td>\n",
       "      <td>0.0400</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_390</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2, 3]</td>\n",
       "      <td>C14H40O3NSi3</td>\n",
       "      <td>0</td>\n",
       "      <td>0.2000</td>\n",
       "      <td>0.0200</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_390</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2, 3]</td>\n",
       "      <td>C14H40O3NSi3</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0001</td>\n",
       "      <td>0.0001</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_390</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2, 3]</td>\n",
       "      <td>C14H40O3NSi3</td>\n",
       "      <td>2</td>\n",
       "      <td>0.0030</td>\n",
       "      <td>0.0030</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_390</td>\n",
       "      <td>1</td>\n",
       "      <td>[1, 2, 3]</td>\n",
       "      <td>C14H40O3NSi3</td>\n",
       "      <td>3</td>\n",
       "      <td>0.3000</td>\n",
       "      <td>0.0300</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_362</td>\n",
       "      <td>1</td>\n",
       "      <td>[2, 3]</td>\n",
       "      <td>C14H40O2NSi3</td>\n",
       "      <td>0</td>\n",
       "      <td>0.4000</td>\n",
       "      <td>0.0400</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_362</td>\n",
       "      <td>1</td>\n",
       "      <td>[2, 3]</td>\n",
       "      <td>C14H40O2NSi3</td>\n",
       "      <td>1</td>\n",
       "      <td>0.1000</td>\n",
       "      <td>0.0100</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>exp1</td>\n",
       "      <td>ser</td>\n",
       "      <td>ser_362</td>\n",
       "      <td>1</td>\n",
       "      <td>[2, 3]</td>\n",
       "      <td>C14H40O2NSi3</td>\n",
       "      <td>2</td>\n",
       "      <td>0.5000</td>\n",
       "      <td>0.0500</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  experiment_id met_id    ms_id  measurement_replicate labelled_atom_ids  \\\n",
       "0          exp1    tyr  tyr_302                      1            [1, 2]   \n",
       "1          exp1    tyr  tyr_302                      1            [1, 2]   \n",
       "2          exp1    tyr  tyr_302                      1            [1, 2]   \n",
       "3          exp1    ser  ser_390                      1         [1, 2, 3]   \n",
       "4          exp1    ser  ser_390                      1         [1, 2, 3]   \n",
       "5          exp1    ser  ser_390                      1         [1, 2, 3]   \n",
       "6          exp1    ser  ser_390                      1         [1, 2, 3]   \n",
       "7          exp1    ser  ser_362                      1            [2, 3]   \n",
       "8          exp1    ser  ser_362                      1            [2, 3]   \n",
       "9          exp1    ser  ser_362                      1            [2, 3]   \n",
       "\n",
       "  unlabelled_atoms  mass_isotope  intensity  intensity_std_error  time  \n",
       "0     C12H32O2NSi2             0     0.1000               0.0100   0.0  \n",
       "1     C12H32O2NSi2             1     0.2000               0.0200   0.0  \n",
       "2     C12H32O2NSi2             2     0.4000               0.0400   0.0  \n",
       "3     C14H40O3NSi3             0     0.2000               0.0200   0.0  \n",
       "4     C14H40O3NSi3             1     0.0001               0.0001   0.0  \n",
       "5     C14H40O3NSi3             2     0.0030               0.0030   0.0  \n",
       "6     C14H40O3NSi3             3     0.3000               0.0300   0.0  \n",
       "7     C14H40O2NSi3             0     0.4000               0.0400   0.0  \n",
       "8     C14H40O2NSi3             1     0.1000               0.0100   0.0  \n",
       "9     C14H40O2NSi3             2     0.5000               0.0500   0.0  "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "incawrapper.MSMeasurementsSchema.validate(from_excel)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have shown the expected structure of the input data, you can move on to the next section to read about how to use the INCAWrapper."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "bfair-testing",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.8"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "820c70ec08a0eb018d8ec3c5d089748cbb2d1e243fbedf0a7cbeb7c8948c3b84"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}