5. Confidence intervals through Monte Carlo sampling¶
INCA can obtain confidence interval using Monte Carlo sampling for a set of estimated fluxes. In this notebook, we show how to use the this functionality with the INCAWrapper. This guide will focus on how to run Monte Carlo sampling and load the results. Please refer to other guides for a more detailed description of how to load the data and setup the INCA script.
[1]:
import pandas as pd
import pathlib
import incawrapper
import ast
import pandas as pd
import dotenv
Again, we will use the simple toy model, that we also used in other guides. First, we load the data.
[2]:
data_folder = pathlib.Path("./examples/Literature data/simple model")
tracers_data = pd.read_csv(data_folder / "tracers.csv",
converters={'atom_mdv':ast.literal_eval, 'atom_ids':ast.literal_eval} # a trick to read lists from csv
)
reactions_data = pd.read_csv(data_folder / "reactions.csv")
flux_data = pd.read_csv(data_folder / "flux_measurements.csv")
ms_data = pd.read_csv(data_folder / "ms_measurements.csv",
converters={'labelled_atom_ids': ast.literal_eval} # a trick to read lists from csv
)
To run the Monte Carlo sampling simply set the run_montecarlo
argument to True in the define_runner() function. The settings for the Monte Carlo algorithm is defined in the options block using the define_options()
function. Please refer to the INCA documentation to see which options are available. The Monte Carlo algorithm require a estimate algorithm to be run first. Therefore, we also set the run_estimate=True
. Finally the run_simuation=True
to be able to open the estimate in
the INCA GUI.
NB. currently it is not possible to open the results of the Monte Carlo sampling in the INCA GUI.
The results of the Monte Carlo sampling will be saved to a separate .mat
file. This file will named <output_file>_mc.mat
, where <output_file>
is the output file given to the define_runner()
function.
[6]:
output_file = pathlib.Path("./examples//Literature data/simple model/simple_model_mc_tutorial.mat")
script = incawrapper.create_inca_script_from_data(reactions_data, tracers_data, flux_data, ms_data, experiment_ids=["exp1"])
script.add_to_block("options", incawrapper.define_options(fit_starts=5,sim_na=False))
script.add_to_block("runner", incawrapper.define_runner(output_file, run_estimate=True, run_simulation=True, run_montecarlo=True))
On larger models running monte carlo sampling can be computational heavy and thus take a while. Therefore, INCA produces a file called dump.mat
which contains the temporary results. If the process is interrupted the results are store in dump files. By default the run_inca function runs INCA in a temporary directory, thus if the process is interrupted the dump file with be destroyed. Therefore, we recommend to specify an execution directory, when running monte carlo sampling to keep the
results even is the process is interrupted.
[7]:
inca_directory = pathlib.Path(dotenv.get_key(dotenv.find_dotenv(), "INCA_base_directory"))
incawrapper.run_inca(script, INCA_base_directory=inca_directory, execution_directory=data_folder)
INCA script saved to examples/Literature data/simple model/inca_script.m.
Starting MATLAB engine...
ms_exp1 = 1x1 msdata object
fields: atoms id [idvs] more on state
F1
m = 1x1 model object
fields: [expts] [mets] notes [options] [rates] [states]
5 reactions (6 fluxes)
6 states (3 balanced, 1 source, 2 sink and 0 unbalanced)
6 metabolites
1 experiments
Directional
Iteration Residual Step-size derivative Lambda
0 9.99215e+11
1 2710.13 1 -1.07e+07 0.550165
2 1036.74 1 1.87e+03 0.183388
3 232.556 1 -130 0.178407
4 139.288 1 -11.8 0.0594691
5 136.533 1 -0.278 0.019823
Optimization terminated: Unconstrained optimum found.
Parameters converged to within tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99618e+11
1 8.2702e+08 1 -2.23e+10 842.504
2 3.45244e+07 1 -4.58e+07 280.835
3 9.2106e+06 0.663 -9.66e+06 93.6116
4 1.47034e+06 1 -1.58e+06 93.6116
5 64453.2 1 -1.65e+05 64.0253
6 4432 1 -7.58e+03 21.3418
7 2344.37 1 1.16e+04 7.11392
8 5177.44 0.578 -690 6.01035
9 5069.71 0.728 -241 12.0207
10 1094.97 1 1.33e+03 48.0828
11 189.275 1 341 47.7252
12 136.773 1 -2.45 15.9084
13 136.494 1 -0.0206 5.3028
Optimization terminated: Unconstrained optimum found.
Parameters converged to within tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99371e+11
1 9.45394e+11 0.0274 -9.72e+11 8.05144
2 149383 1 -3.02e+04 8.05144
3 6548.04 1 -1.6e+04 2.68381
4 1047.67 1 -160 0.894604
5 2494 1 3e+03 0.298201
6 2485.01 1 2.99e+03 0.596403
7 2432.09 1 2.94e+03 2.38561
8 2011.95 1 2.51e+03 19.0849
9 789.893 1 885 152.679
10 157.342 1 159 161.51
11 136.506 1 -0.411 53.8368
Optimization terminated: Unconstrained optimum found.
Parameters converged to within tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99241e+11
1 9.98837e+11 0.000202 -9.99e+11 0.212698
2 7500.53 1 -6.29e+05 0.212698
3 5782.11 1 -717 0.0708994
4 5048.38 1 290 0.0236331
5 5000.02 1 -0.992 0.00787771
Optimization terminated: Constrained optimum found.
Parameters converged to within tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.9957e+11
1 9.19397e+11 0.041 -9.58e+11 3.57529
2 4550.67 1 -5.56e+03 3.57529
3 5919.76 0.96 151 1.19176
4 5919.34 0.971 142 2.38352
5 5665.25 1 5.63e+03 9.5341
6 1367.9 1 1.43e+03 76.2728
7 1010.2 1 916 25.4243
8 169.481 1 240 27.2478
9 136.62 1 -1.28 9.08258
10 136.492 1 -0.00981 3.02753
Optimization terminated: Unconstrained optimum found.
Parameters converged to within tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 136.492
1 136.491 1 -0.000109 0.0003591
Optimization terminated: Unconstrained optimum found.
Parameters converged to within tolerance.
Estimation completed in 10.8600 seconds.
Preprocessing time: 7.1200 s
Computation time: 3.6000 s
Postprocessing time: 0.1400 s
Warning: Network is ill-conditioned.
Simulation completed in 0.1100 seconds.
Preprocessing time: 0.0900 s
Computation time: 0.0100 s
Postprocessing time: 0.0100 s
Iteration Error Norm
1 Initial
2 0.01068525
3 0.00143113
4 0.00394698
5 0.00525153
6 0.00144298
7 0.00311287
8 0.00270034
Monte Carlo simulation completed in 9.7400 seconds.
Preprocessing time: 0.1100 s
Computation time: 9.6300 s
--- 21.35728120803833 seconds -
The INCA run above produces two files.
simple_model_mc_tutorial.mat
- This regular output file which also can be opened in the INCA GUI. It contains the results of flux estimation.simple_model_mc_tutorial_mc.mat
- The Monte Carlo results. This contain the Monte Carlo samples and the resulting confidence intervals of all the parameters.
We can now load the results of both the flux estimation and the Monte Carlo sampling by setting the argument load_mc_data=True
. This expects the Monte Carlo results file to be named <output_file>_mc.mat
, so don’t renamed it.
[9]:
res = incawrapper.INCAResults(output_file, load_mc_data=True)
We now have a INCAResults
object which contain all the results from the INCA run. It contains the regular fitdata object.
[10]:
res.fitdata.fitted_parameters.head()
[10]:
type | id | eqn | val | std | lb | ub | unit | free | alf | chi2s | cont | cor | cov | vals | base | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Net flux | R1 | A -> B | 10.000000 | 0.000010 | [] | [] | [] | 0 | 0.05 | [] | 1 | [1.0, 8.947213030262141e-05, 2.001822930102039... | [1.0000002128446317e-10, 6.084410059115086e-11... | [] | {'id': []} |
1 | Net flux | R2 net | B <-> D | 6.084423 | 0.068003 | [] | [] | [] | 1 | 0.05 | [] | 1 | [8.947210479326169e-05, 1.0, 0.92843189384833,... | [6.08440832439161e-11, 0.004624462937176305, 0... | [] | {'id': []} |
2 | Exch flux | R2 exch | B <-> D | 6.622329 | 0.330811 | [] | [] | [] | 1 | 0.05 | [] | 1 | [2.001819888669558e-05, 0.9284318796575584, 0.... | [6.622242337828155e-11, 0.020886261902995223, ... | [] | {'id': []} |
3 | Net flux | R3 | B -> C + E | 1.957789 | 0.034002 | [] | [] | [] | 1 | 0.05 | [] | 1 | [5.757937359024585e-05, -0.9999999891879296, -... | [1.9577969020273533e-11, -0.002312231438166102... | [] | {'id': []} |
4 | Net flux | R4 | B + C -> D + E + E | 1.957789 | 0.034002 | [] | [] | [] | 0 | 0.05 | [] | 1 | [5.757937359024585e-05, -0.9999999891879296, -... | [1.9577969020273533e-11, -0.002312231438166102... | [] | {'id': []} |
But also a INCAMonteCarloResults
object in the .mc
attribute. This object contains the confidence intervals and the samples.
[11]:
res.mc.ci
[11]:
R1 | R2 net | R2 exch | R3 | R4 | R5 | exp1 F1 exp1_F1_0_0_1 norm | |
---|---|---|---|---|---|---|---|
lb | 9.999979 | 5.952222 | 5.998825 | 1.891729 | 1.891729 | 7.976103 | 0.951393 |
ub | 10.000019 | 6.216543 | 7.276211 | 2.023882 | 2.023882 | 8.108272 | 1.014831 |
[12]:
res.mc.samples.head()
[12]:
R1 | R2 net | R2 exch | R3 | R4 | R5 | exp1 F1 exp1_F1_0_0_1 norm | |
---|---|---|---|---|---|---|---|
0 | 10.000003 | 5.997718 | 6.259179 | 2.001142 | 2.001142 | 7.998861 | 0.976538 |
1 | 10.000001 | 6.127566 | 6.807506 | 1.936218 | 1.936218 | 8.063784 | 0.995646 |
2 | 10.000001 | 6.137113 | 6.855429 | 1.931444 | 1.931444 | 8.068557 | 1.003207 |
3 | 9.999994 | 6.066098 | 6.722543 | 1.966948 | 1.966948 | 8.033046 | 0.966709 |
4 | 10.000002 | 6.049678 | 6.577879 | 1.975162 | 1.975162 | 8.024840 | 0.982117 |
Each row in the .samples
dataframe is a sample from the Monte Carlo sampling algorithm. It is easy to visualize the samples of one parameter (e.g. reaction) as a histogram using matplotlib.
[13]:
import matplotlib.pyplot as plt
plt.hist(res.mc.samples["R2 net"], bins=50)
plt.show()
Alternatively, one can inspect the correlations using the pandas.plotting.scatter_matrix()
, note we recommend not plotting all parameters at the same time as this can make huge plots.
[14]:
g = pd.plotting.scatter_matrix(res.mc.samples.filter(['R1', 'R2 net', 'R2 exch', 'R3']), figsize=(10,10))
plt.tight_layout()
plt.show()
[ ]: