5. Confidence intervals through Monte Carlo sampling

INCA can obtain confidence interval using Monte Carlo sampling for a set of estimated fluxes. In this notebook, we show how to use the this functionality with the INCAWrapper. This guide will focus on how to run Monte Carlo sampling and load the results. Please refer to other guides for a more detailed description of how to load the data and setup the INCA script.

[1]:
import pandas as pd
import pathlib
import incawrapper
import ast
import pandas as pd
import dotenv

Again, we will use the simple toy model, that we also used in other guides. First, we load the data.

[2]:
data_folder = pathlib.Path("./examples/Literature data/simple model")
tracers_data = pd.read_csv(data_folder / "tracers.csv",
   converters={'atom_mdv':ast.literal_eval, 'atom_ids':ast.literal_eval} # a trick to read lists from csv
)
reactions_data = pd.read_csv(data_folder / "reactions.csv")
flux_data = pd.read_csv(data_folder / "flux_measurements.csv")
ms_data = pd.read_csv(data_folder / "ms_measurements.csv",
   converters={'labelled_atom_ids': ast.literal_eval} # a trick to read lists from csv
)

To run the Monte Carlo sampling simply set the run_montecarlo argument to True in the define_runner() function. The settings for the Monte Carlo algorithm is defined in the options block using the define_options() function. Please refer to the INCA documentation to see which options are available. The Monte Carlo algorithm require a estimate algorithm to be run first. Therefore, we also set the run_estimate=True. Finally the run_simuation=True to be able to open the estimate in the INCA GUI.

NB. currently it is not possible to open the results of the Monte Carlo sampling in the INCA GUI.

The results of the Monte Carlo sampling will be saved to a separate .mat file. This file will named <output_file>_mc.mat, where <output_file> is the output file given to the define_runner() function.

[6]:
output_file = pathlib.Path("./examples//Literature data/simple model/simple_model_mc_tutorial.mat")
script = incawrapper.create_inca_script_from_data(reactions_data, tracers_data, flux_data, ms_data, experiment_ids=["exp1"])
script.add_to_block("options", incawrapper.define_options(fit_starts=5,sim_na=False))
script.add_to_block("runner", incawrapper.define_runner(output_file, run_estimate=True, run_simulation=True, run_montecarlo=True))

On larger models running monte carlo sampling can be computational heavy and thus take a while. Therefore, INCA produces a file called dump.mat which contains the temporary results. If the process is interrupted the results are store in dump files. By default the run_inca function runs INCA in a temporary directory, thus if the process is interrupted the dump file with be destroyed. Therefore, we recommend to specify an execution directory, when running monte carlo sampling to keep the results even is the process is interrupted.

[7]:
inca_directory = pathlib.Path(dotenv.get_key(dotenv.find_dotenv(), "INCA_base_directory"))
incawrapper.run_inca(script, INCA_base_directory=inca_directory, execution_directory=data_folder)
INCA script saved to examples/Literature data/simple model/inca_script.m.
Starting MATLAB engine...

ms_exp1 = 1x1 msdata object

fields: atoms  id  [idvs]  more  on  state

F1


m = 1x1 model object

fields: [expts]  [mets]  notes  [options]  [rates]  [states]

        5 reactions (6 fluxes)
        6 states (3 balanced, 1 source, 2 sink and 0 unbalanced)
        6 metabolites
        1 experiments


                                         Directional
 Iteration      Residual     Step-size    derivative        Lambda
     0       9.99215e+11
     1           2710.13             1     -1.07e+07      0.550165
     2           1036.74             1      1.87e+03      0.183388
     3           232.556             1          -130      0.178407
     4           139.288             1         -11.8     0.0594691
     5           136.533             1        -0.278      0.019823

 Optimization terminated: Unconstrained optimum found.
 Parameters converged to within tolerance.

                                         Directional
 Iteration      Residual     Step-size    derivative        Lambda
     0       9.99618e+11
     1        8.2702e+08             1     -2.23e+10       842.504
     2       3.45244e+07             1     -4.58e+07       280.835
     3        9.2106e+06         0.663     -9.66e+06       93.6116
     4       1.47034e+06             1     -1.58e+06       93.6116
     5           64453.2             1     -1.65e+05       64.0253
     6              4432             1     -7.58e+03       21.3418
     7           2344.37             1      1.16e+04       7.11392
     8           5177.44         0.578          -690       6.01035
     9           5069.71         0.728          -241       12.0207
    10           1094.97             1      1.33e+03       48.0828
    11           189.275             1           341       47.7252
    12           136.773             1         -2.45       15.9084
    13           136.494             1       -0.0206        5.3028

 Optimization terminated: Unconstrained optimum found.
 Parameters converged to within tolerance.

                                         Directional
 Iteration      Residual     Step-size    derivative        Lambda
     0       9.99371e+11
     1       9.45394e+11        0.0274     -9.72e+11       8.05144
     2            149383             1     -3.02e+04       8.05144
     3           6548.04             1      -1.6e+04       2.68381
     4           1047.67             1          -160      0.894604
     5              2494             1         3e+03      0.298201
     6           2485.01             1      2.99e+03      0.596403
     7           2432.09             1      2.94e+03       2.38561
     8           2011.95             1      2.51e+03       19.0849
     9           789.893             1           885       152.679
    10           157.342             1           159        161.51
    11           136.506             1        -0.411       53.8368

 Optimization terminated: Unconstrained optimum found.
 Parameters converged to within tolerance.

                                         Directional
 Iteration      Residual     Step-size    derivative        Lambda
     0       9.99241e+11
     1       9.98837e+11      0.000202     -9.99e+11      0.212698
     2           7500.53             1     -6.29e+05      0.212698
     3           5782.11             1          -717     0.0708994
     4           5048.38             1           290     0.0236331
     5           5000.02             1        -0.992    0.00787771

 Optimization terminated: Constrained optimum found.
 Parameters converged to within tolerance.

                                         Directional
 Iteration      Residual     Step-size    derivative        Lambda
     0        9.9957e+11
     1       9.19397e+11         0.041     -9.58e+11       3.57529
     2           4550.67             1     -5.56e+03       3.57529
     3           5919.76          0.96           151       1.19176
     4           5919.34         0.971           142       2.38352
     5           5665.25             1      5.63e+03        9.5341
     6            1367.9             1      1.43e+03       76.2728
     7            1010.2             1           916       25.4243
     8           169.481             1           240       27.2478
     9            136.62             1         -1.28       9.08258
    10           136.492             1      -0.00981       3.02753

 Optimization terminated: Unconstrained optimum found.
 Parameters converged to within tolerance.

                                         Directional
 Iteration      Residual     Step-size    derivative        Lambda
     0           136.492
     1           136.491             1     -0.000109     0.0003591

 Optimization terminated: Unconstrained optimum found.
 Parameters converged to within tolerance.

        Estimation completed in 10.8600 seconds.

        Preprocessing time:     7.1200 s

        Computation time:       3.6000 s

        Postprocessing time:    0.1400 s

Warning: Network is ill-conditioned.

        Simulation completed in 0.1100 seconds.

        Preprocessing time:     0.0900 s

        Computation time:       0.0100 s

        Postprocessing time:    0.0100 s


 Iteration    Error Norm
     1        Initial
     2        0.01068525
     3        0.00143113
     4        0.00394698
     5        0.00525153
     6        0.00144298
     7        0.00311287
     8        0.00270034

        Monte Carlo simulation completed in 9.7400 seconds.

        Preprocessing time:     0.1100 s

        Computation time:       9.6300 s
--- 21.35728120803833 seconds -

The INCA run above produces two files.

  • simple_model_mc_tutorial.mat - This regular output file which also can be opened in the INCA GUI. It contains the results of flux estimation.

  • simple_model_mc_tutorial_mc.mat - The Monte Carlo results. This contain the Monte Carlo samples and the resulting confidence intervals of all the parameters.

We can now load the results of both the flux estimation and the Monte Carlo sampling by setting the argument load_mc_data=True. This expects the Monte Carlo results file to be named <output_file>_mc.mat, so don’t renamed it.

[9]:
res = incawrapper.INCAResults(output_file, load_mc_data=True)

We now have a INCAResults object which contain all the results from the INCA run. It contains the regular fitdata object.

[10]:
res.fitdata.fitted_parameters.head()
[10]:
type id eqn val std lb ub unit free alf chi2s cont cor cov vals base
0 Net flux R1 A -> B 10.000000 0.000010 [] [] [] 0 0.05 [] 1 [1.0, 8.947213030262141e-05, 2.001822930102039... [1.0000002128446317e-10, 6.084410059115086e-11... [] {'id': []}
1 Net flux R2 net B <-> D 6.084423 0.068003 [] [] [] 1 0.05 [] 1 [8.947210479326169e-05, 1.0, 0.92843189384833,... [6.08440832439161e-11, 0.004624462937176305, 0... [] {'id': []}
2 Exch flux R2 exch B <-> D 6.622329 0.330811 [] [] [] 1 0.05 [] 1 [2.001819888669558e-05, 0.9284318796575584, 0.... [6.622242337828155e-11, 0.020886261902995223, ... [] {'id': []}
3 Net flux R3 B -> C + E 1.957789 0.034002 [] [] [] 1 0.05 [] 1 [5.757937359024585e-05, -0.9999999891879296, -... [1.9577969020273533e-11, -0.002312231438166102... [] {'id': []}
4 Net flux R4 B + C -> D + E + E 1.957789 0.034002 [] [] [] 0 0.05 [] 1 [5.757937359024585e-05, -0.9999999891879296, -... [1.9577969020273533e-11, -0.002312231438166102... [] {'id': []}

But also a INCAMonteCarloResults object in the .mc attribute. This object contains the confidence intervals and the samples.

[11]:
res.mc.ci
[11]:
R1 R2 net R2 exch R3 R4 R5 exp1 F1 exp1_F1_0_0_1 norm
lb 9.999979 5.952222 5.998825 1.891729 1.891729 7.976103 0.951393
ub 10.000019 6.216543 7.276211 2.023882 2.023882 8.108272 1.014831
[12]:
res.mc.samples.head()
[12]:
R1 R2 net R2 exch R3 R4 R5 exp1 F1 exp1_F1_0_0_1 norm
0 10.000003 5.997718 6.259179 2.001142 2.001142 7.998861 0.976538
1 10.000001 6.127566 6.807506 1.936218 1.936218 8.063784 0.995646
2 10.000001 6.137113 6.855429 1.931444 1.931444 8.068557 1.003207
3 9.999994 6.066098 6.722543 1.966948 1.966948 8.033046 0.966709
4 10.000002 6.049678 6.577879 1.975162 1.975162 8.024840 0.982117

Each row in the .samples dataframe is a sample from the Monte Carlo sampling algorithm. It is easy to visualize the samples of one parameter (e.g. reaction) as a histogram using matplotlib.

[13]:
import matplotlib.pyplot as plt
plt.hist(res.mc.samples["R2 net"], bins=50)
plt.show()
_images/monte_carlo_sampling_17_0.png

Alternatively, one can inspect the correlations using the pandas.plotting.scatter_matrix(), note we recommend not plotting all parameters at the same time as this can make huge plots.

[14]:
g = pd.plotting.scatter_matrix(res.mc.samples.filter(['R1', 'R2 net', 'R2 exch', 'R3']), figsize=(10,10))
plt.tight_layout()
plt.show()
_images/monte_carlo_sampling_19_0.png
[ ]: