8. Low level API¶
Most users will use the create_script_from_data()
to create the INCAScript, but if one wants more control over the script it is good to known the how the underlying api works.
[26]:
import pandas as pd
import pathlib
from incawrapper import (
define_flux_measurements,
define_ms_data,
define_experiment,
define_model,
define_tracers,
define_options,
define_runner,
define_reactions,
INCAScript,
run_inca
)
import ast
data_folder = pathlib.Path("./examples/Literature data/simple model")
To illustrate how to use the INCAWrapper, we will use a small toy model with 5 reactions original used as a test model in [1,2]. As described in the Input data notebook, the INCAWrapper takes inputs in the form of pandas dataframes, which has to obey specific structure schema’s. Let’s have a look at the reactions data of our simple model.
[27]:
reactions_data = pd.read_csv(data_folder / "reactions.csv")
reactions_data.head()
[27]:
model | rxn_id | rxn_eqn | |
---|---|---|---|
0 | simple_model | R1 | A (abc) -> B (abc) |
1 | simple_model | R2 | B (abc) <-> D (abc) |
2 | simple_model | R3 | B (abc) -> C (bc) + E (a) |
3 | simple_model | R4 | B (abc) + C (de) -> D (bcd) + E (a) + E (e) |
4 | simple_model | R5 | D (abc) -> F (abc) |
We see that the simple toy model consist of 5 reactions each defined with an atom map and has a unique identifier. Lets move on to the tracer data.
[28]:
tracers_data = pd.read_csv(data_folder / "tracers.csv", converters={'atom_ids': ast.literal_eval, 'atom_mdv':ast.literal_eval}) # remove id, add prurity
tracers_data.head()
[28]:
experiment_id | met_id | tracer_id | atom_ids | ratio | atom_mdv | enrichment | |
---|---|---|---|---|---|---|---|
0 | exp1 | A | [2-13C]A | [2] | 1.0 | [0, 1] | 1 |
1 | exp2 | A | [1,2-13C]A | [1, 2] | 0.5 | [0.05, 0.95] | 1 |
Our data set contains two experiments each carried out with a different labelled substrate of metabolite A. In this data set we had measurements of exchange fluxes and ms data of some metabolites. Notice that we use the converters argument to properly read some the data (see XX for more information).
[29]:
flux_data = pd.read_csv(data_folder / "flux_measurements.csv")
ms_data = pd.read_csv(data_folder / "ms_measurements.csv",
converters={'labelled_atom_ids': ast.literal_eval, 'idv': ast.literal_eval, 'idv_std_error': ast.literal_eval}
)
flux_data.head()
[29]:
experiment_id | rxn_id | flux | flux_std_error | |
---|---|---|---|---|
0 | exp1 | R1 | 10.0 | 0.00001 |
[30]:
ms_data.head()
[30]:
experiment_id | met_id | ms_id | measurement_replicate | labelled_atom_ids | unlabelled_atoms | mass_isotope | intensity | intensity_std_error | time | |
---|---|---|---|---|---|---|---|---|---|---|
0 | exp1 | F | F1 | 1 | [1, 2, 3] | NaN | 0 | 0.0001 | 0.000002 | 0 |
1 | exp1 | F | F1 | 1 | [1, 2, 3] | NaN | 1 | 0.8008 | 0.016016 | 0 |
2 | exp1 | F | F1 | 1 | [1, 2, 3] | NaN | 2 | 0.1983 | 0.003966 | 0 |
3 | exp1 | F | F1 | 1 | [1, 2, 3] | NaN | 3 | 0.0009 | 0.000018 | 0 |
4 | exp2 | F | F1 | 1 | [1, 2, 3] | NaN | 0 | 0.0002 | 0.000002 | 0 |
8.1. Generate the MATLAB script¶
To prepare a model and data for 13C-MFA in INCA, we need to write a matlab script, which specifies to the model, the tracers and the measured data. The script is constructed incrementally, line by line, within an INCAScript object. When the INCAWrapper executes INCA, it essentially runs this script within the MATLAB environment.
To ensure the orderly execution of the MATLAB code, the structure of the INCAScript object is organized into discrete code blocks. Throughout this workflow, users progressively populate these code blocks one at a time until the script is fully formed. The central mechanism for this procedure involves employing the .add_to_block() method of the INCAScript object, in combination with a function that generates the corresponding MATLAB code string.
Now, let’s delve into the process of defining the model within the INCAScript. To achieve this, we make use of a script-writing function named define_reactions(). This function operates on a dataframe detailing the reactions and outputs a MATLAB code string that effectively defines the reactions within an INCA model.
[31]:
print(define_reactions(reactions_data))
% Create reactions
r = [...
reaction('A (abc) -> B (abc)', 'id', 'R1'),...
reaction('B (abc) <-> D (abc)', 'id', 'R2'),...
reaction('B (abc) -> C (bc) + E (a)', 'id', 'R3'),...
reaction('B (abc) + C (de) -> D (bcd) + E (a) + E (e)', 'id', 'R4'),...
reaction('D (abc) -> F (abc)', 'id', 'R5'),...
];
To add the reaction definition to the INCAScript
we use the .add_to_block()
method.
[32]:
script = INCAScript()
script.add_to_block('reactions', define_reactions(reactions_data))
We can view the reactions block in the INCAScript
.
[16]:
print(script.blocks['reactions'])
% REACTION BLOCK
% Create reactions
r = [...
reaction('A (abc) -> B (abc)', 'id', 'R1'),...
reaction('B (abc) <-> D (abc)', 'id', 'R2'),...
reaction('B (abc) -> C (bc) + E (a)', 'id', 'R3'),...
reaction('B (abc) + C (de) -> D (bcd) + E (a) + E (e)', 'id', 'R4'),...
reaction('D (abc) -> F (abc)', 'id', 'R5'),...
];
There are similar script writing functions to generate the other parts of the Matlab script. The experimental data is added to the INCAScript
on a per experiment basis. Thus, the functions which adds experimental data also takes a argument for the experiment id. One example is the define_tracers
:
[17]:
print(define_tracers(tracers_data, 'exp1'))
% define tracers used in exp1
t_exp1 = tracer({...
'[2-13C]A: A @ 2',...
});
t_exp1.frac = [1.0 ];
t_exp1.atoms.it(:,1) = [0,1];
Because the .add_to_block()
appends to the code block is best practice to include the INCAScript
instantiation and the .add_to_block()
calls within the same Jupyter-notebook cell. This will avoid adding the same code multiple times, when working a Jupyter-notebook file. In the following we add tracers, flux measurements, and ms measurements from the experiment with experiment id exp1.
[18]:
script = INCAScript()
script.add_to_block('reactions', define_reactions(reactions_data))
script.add_to_block('tracers', define_tracers(tracers_data, 'exp1'))
script.add_to_block('fluxes', define_flux_measurements(flux_data, 'exp1'))
script.add_to_block('ms_fragments', define_ms_data(ms_data, 'exp1'))
script.add_to_block('experiments', define_experiment('exp1', measurement_types=['data_flx', 'data_ms']))
Notice that that addition of data is a two step procedure. First, the data is added to the script and second the data is added to the experiment using the define_experiment()
function. This function takes the experiment id and a list of what measurement types are associated with this experiment.
We can view the script that we have build by simply printing the script object.
[ ]:
print(script)
clear functions
% REACTION BLOCK
% Create reactions
r = [...
reaction('A (abc) -> B (abc)', ['id'], ['R1']),...
reaction('B (abc) <-> D (abc)', ['id'], ['R2']),...
reaction('B (abc) -> C (bc) + E (a)', ['id'], ['R3']),...
reaction('B (abc) + C (de) -> D (bcd) + E (a) + E (e)', ['id'], ['R4']),...
reaction('D (abc) -> F (abc)', ['id'], ['R5']),...
];
% TRACERS BLOCK
% define tracers used in exp1
t_exp1 = tracer({...
'[1-13C]A: A @ 1',...
});
t_exp1.frac = [1.0 ];
t_exp1.atoms.it(:,1) = [0,1];
% FLUXES BLOCK
% define flux measurements for experiment exp1
f_exp1 = [...
data('R1', 'val', 10.0, 'std', 1e-05),...
];
% MS_FRAGMENTS BLOCK
% define mass spectrometry measurements for experiment exp1
ms_exp1 = [...
msdata('F1: F @ 1 2 3', 'more', 'C3H5O1'),...
];
% define mass spectrometry measurements for experiment exp1
ms_exp1{'F1'}.idvs = idv([[0.01;0.8;0.1;0.0009]], 'id', {'exp1_F1_0_0_1'}, 'std', [[0.0003;0.003;0.0008;0.001]], 'time', 0.0)
% EXPERIMENTAL_DATA BLOCK
e_exp1 = experiment(t_exp1, 'id', 'exp1', 'data_flx', f_exp1, 'data_ms', ms_exp1);
% MODEL BLOCK
% OPTIONS BLOCK
m.rates.flx.val = mod2stoich(m); % make sure the fluxes are feasible
% RUNNER BLOCK
We see that the three last blocks model, options and runner, have not been populated yet. To populate the model block we need to define what experiments should be included in the model. In the options block, we can changes the options/settings which influence how INCA is run, for example we can increase the number of restarts during the flux estimation procedure and turn off the natural abundance correction. Finally, the runner defines what algorithms INCA should run on the model. In this example we will run the estimation and the simulation algorithm.
[ ]:
# same as previous code
script = INCAScript()
script.add_to_block('reactions', define_reactions(reactions_data))
script.add_to_block('tracers', define_tracers(tracers_data, 'exp1'))
script.add_to_block('fluxes', define_flux_measurements(flux_data, 'exp1'))
script.add_to_block('ms_fragments', define_ms_data(ms_data, 'exp1'))
script.add_to_block('experiments', define_experiment('exp1', ['data_flx', 'data_ms']))
# new code to define model, options and runner
script.add_to_block('model', define_model(['exp1']))
script.add_to_block('options', define_options(fit_starts=20, sim_na=False))
script.add_to_block('runner', define_runner("/path/to/output/file.mat", run_estimate=True, run_simulation=True))
[ ]:
print(script)
clear functions
% REACTION BLOCK
% Create reactions
r = [...
reaction('A (abc) -> B (abc)', ['id'], ['R1']),...
reaction('B (abc) <-> D (abc)', ['id'], ['R2']),...
reaction('B (abc) -> C (bc) + E (a)', ['id'], ['R3']),...
reaction('B (abc) + C (de) -> D (bcd) + E (a) + E (e)', ['id'], ['R4']),...
reaction('D (abc) -> F (abc)', ['id'], ['R5']),...
];
% TRACERS BLOCK
% define tracers used in exp1
t_exp1 = tracer({...
'[1-13C]A: A @ 1',...
});
t_exp1.frac = [1.0 ];
t_exp1.atoms.it(:,1) = [0,1];
% FLUXES BLOCK
% define flux measurements for experiment exp1
f_exp1 = [...
data('R1', 'val', 10.0, 'std', 1e-05),...
];
% MS_FRAGMENTS BLOCK
% define mass spectrometry measurements for experiment exp1
ms_exp1 = [...
msdata('F1: F @ 1 2 3', 'more', 'C3H5O1'),...
];
% define mass spectrometry measurements for experiment exp1
ms_exp1{'F1'}.idvs = idv([[0.01;0.8;0.1;0.0009]], 'id', {'exp1_F1_0_0_1'}, 'std', [[0.0003;0.003;0.0008;0.001]], 'time', 0.0)
% EXPERIMENTAL_DATA BLOCK
e_exp1 = experiment(t_exp1, 'id', 'exp1', 'data_flx', f_exp1, 'data_ms', ms_exp1);
% MODEL BLOCK
m = model(r, 'expts', [e_exp1]);
% OPTIONS BLOCK
m.options = option('fit_starts', 20, 'sim_na', false)
m.rates.flx.val = mod2stoich(m); % make sure the fluxes are feasible
% RUNNER BLOCK
f = estimate(m);
s=simulate(m);
filename = '/path/to/output/file.mat';
save(filename, 'f', 's', 'm');
[33]:
# replace fake path with actual path, this is just to hide my actual path from being displayed in the docs.
# It is not necessary in the actual workflow
script.blocks['runner'] = script.blocks['runner'].replace("'/path/to/output/file.mat'", "'" + str((data_folder / 'simple_model_inca_from_low_level_api.mat').resolve()) + "'")
Now we are ready to run the flux estimation algorithm in INCA.
[ ]:
import dotenv
inca_directory = pathlib.Path(dotenv.get_key(dotenv.find_dotenv(), "INCA_base_directory")) # Simply replace this with the path to your INCA installation
run_inca(script, INCA_base_directory=inca_directory)
INCA script saved to /var/folders/z6/mxpxh4k56tv0h0ff41vmx7gdwtlpvp/T/tmpvymjjsts/inca_script.m.
Starting MATLAB engine...
ms_exp1 = 1x1 msdata object
fields: atoms id [idvs] more on state
F1
m = 1x1 model object
fields: [expts] [mets] notes [options] [rates] [states]
5 reactions (6 fluxes)
6 states (3 balanced, 1 source, 2 sink and 0 unbalanced)
6 metabolites
1 experiments
Directional
Iteration Residual Step-size derivative Lambda
0 9.99461e+11
1 9.97934e+11 0.000764 -9.99e+11 1.60177
2 1.42897e+06 1 -3.06e+06 1.60177
3 80454.8 1 -1.92e+05 0.533923
4 8606.5 1 -4.85e+03 0.177974
5 8153.72 1 -3.28 0.0593248
Optimization terminated: Constrained optimum found.
Parameters converged to within tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99276e+11
1 9.96908e+11 0.00119 -9.98e+11 2.99791
2 2.82654e+06 1 -2.71e+06 2.99791
3 206220 1 -4.07e+05 0.999304
4 10942.2 1 -1.83e+04 0.364917
5 8154.58 1 -47.9 0.121639
6 8153.7 1 -0.00032 0.0405463
Optimization terminated: Constrained optimum found.
Norm of gradient less than tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99496e+11
1 9.97013e+11 0.00124 -9.98e+11 8.25557
2 7.49812e+11 0.133 -8.65e+11 8.25557
3 1.72508e+06 1 -3.54e+04 8.25557
4 198278 1 -2.51e+05 2.75186
5 31932.4 1 -2.07e+04 1.35509
6 10888.2 0.68 748 0.451696
7 9267.69 1 -0.00839 0.451696
Optimization terminated: Constrained optimum found.
Norm of gradient less than tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99592e+11
1 9.97293e+11 0.00115 -9.98e+11 0.256206
2 9263.85 1 -1.4e+05 0.256206
3 8153.7 1 2.63 0.0854019
Optimization terminated: Constrained optimum found.
Parameters converged to within tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99754e+11
1 9.9861e+11 0.000573 -9.98e+11 3.17144
2 3.05221e+06 1 -1.37e+07 3.17144
3 201540 1 -4.43e+05 1.05715
4 10824.2 1 -1.77e+04 0.353927
5 8154.51 1 -45 0.117976
6 8153.7 1 -0.000282 0.0393252
Optimization terminated: Constrained optimum found.
Norm of gradient less than tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99757e+11
1 9.98946e+11 0.000406 -9.99e+11 4.83394
2 1.82069e+06 1 -3.57e+07 4.83394
3 18373.2 1 -1.19e+05 1.61131
4 8164.33 1 -309 0.537104
5 8153.7 1 -0.0134 0.179035
Optimization terminated: Constrained optimum found.
Parameters converged to within tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99103e+11
1 9.97131e+11 0.000987 -9.98e+11 0.592389
2 203195 1 -5.81e+05 0.592389
3 10931.3 1 -1.8e+04 0.197463
4 8154.57 1 -47.6 0.065821
5 8153.7 1 -0.000303 0.0219403
Optimization terminated: Constrained optimum found.
Norm of gradient less than tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99644e+11
1 9.97232e+11 0.00121 -9.95e+11 13.1605
2 9.83399e+11 0.00696 -9.9e+11 13.1605
3 2.42944e+11 0.503 -4.89e+11 13.1605
4 2.13269e+06 1 -12.5 13.1605
5 1.10561e+06 1 -3.68e+05 4.38683
6 117851 1 -1.58e+05 2.40399
7 28352.2 1 -9.65e+03 1.02639
8 10233.5 0.747 -41.7 0.342129
9 9267.69 1 -0.00379 0.342129
Optimization terminated: Constrained optimum found.
Norm of gradient less than tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99422e+11
1 9.97065e+11 0.00118 -9.97e+11 6.62877
2 9.50368e+11 0.0237 -9.73e+11 6.62877
3 2.43406e+11 0.494 -4.81e+11 6.62877
4 2.13269e+06 1 -6.35 6.62877
5 790061 1 -4.06e+05 2.20959
6 82008.8 1 -1.1e+05 1.41421
7 27252.5 1 -5.16e+03 0.531876
8 9614.73 0.839 594 0.177292
9 9267.69 1 -0.000705 0.177292
Optimization terminated: Constrained optimum found.
Norm of gradient less than tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.98733e+11
1 9.98153e+11 0.00029 -9.98e+11 2.63162
2 2.18933e+06 1 -6.31e+06 2.63162
3 142395 1 -3.08e+05 0.877207
4 9549.51 1 -1.1e+04 0.292402
5 8153.92 1 -17.3 0.0974675
6 8153.7 1 -4.39e-05 0.0324892
Optimization terminated: Constrained optimum found.
Norm of gradient less than tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99412e+11
1 9.96285e+11 0.00157 -9.98e+11 7.87281
2 3.80862e+11 0.382 -6.16e+11 7.87281
3 925902 1 -1.39e+05 7.87281
4 96902.3 1 -1.31e+05 2.62427
5 27664.8 1 -6.98e+03 1.05019
6 9905.18 0.79 147 0.350062
7 9267.69 1 -0.00256 0.350062
Optimization terminated: Constrained optimum found.
Norm of gradient less than tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99643e+11
1 9.9789e+11 0.000878 -9.98e+11 6.90612
2 3.57725e+11 0.401 -5.97e+11 6.90612
3 847758 1 -1.41e+05 6.90612
4 88226.1 1 -1.19e+05 2.30204
5 27416.3 1 -5.91e+03 0.890441
6 9740.78 0.816 361 0.296814
7 9267.69 1 -0.00161 0.296814
Optimization terminated: Constrained optimum found.
Norm of gradient less than tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99477e+11
1 9.97624e+11 0.000928 -9.99e+11 0.759894
2 631059 1 -1.1e+06 0.759894
3 27938.5 1 -7.5e+04 0.253298
4 8193.37 1 -808 0.0844327
5 8153.7 1 -0.0873 0.0281442
Optimization terminated: Constrained optimum found.
Parameters converged to within tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99804e+11
1 9.98903e+11 0.000451 -9.99e+11 1.5183
2 1.22163e+06 1 -1.01e+07 1.5183
3 64389.6 1 -1.61e+05 0.506101
4 8438.17 1 -3.45e+03 0.1687
5 8153.71 1 -1.64 0.0562334
Optimization terminated: Constrained optimum found.
Parameters converged to within tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99437e+11
1 9.96992e+11 0.00122 -9.98e+11 3.72851
2 3.68594e+06 1 -3.8e+06 3.72851
3 287827 1 -5.43e+05 1.24284
4 13241.8 1 -2.82e+04 0.48519
5 8156.57 1 -115 0.16173
6 8153.7 1 -0.00182 0.05391
Optimization terminated: Constrained optimum found.
Norm of gradient less than tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99732e+11
1 9.98354e+11 0.00069 -9.98e+11 6.9614
2 7.60754e+11 0.127 -8.71e+11 6.9614
3 1.59821e+06 1 -2.86e+04 6.9614
4 180930 1 -2.32e+05 2.32047
5 31050.2 1 -1.82e+04 1.11703
6 10798.5 0.688 494 0.372343
7 9267.69 1 -0.00654 0.372343
Optimization terminated: Constrained optimum found.
Norm of gradient less than tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99245e+11
1 9.96412e+11 0.00142 -9.98e+11 1.30251
2 1.40943e+06 1 -8.68e+05 1.30251
3 79523.6 1 -1.9e+05 0.434169
4 8595.83 1 -4.77e+03 0.144723
5 8153.72 1 -3.16 0.048241
Optimization terminated: Constrained optimum found.
Parameters converged to within tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99414e+11
1 1.8588e+06 1 -5.85e+07 2.52112
2 99555.8 1 -2.53e+05 0.840374
3 8849.47 1 -6.64e+03 0.280125
4 8153.75 1 -6.2 0.0933749
Optimization terminated: Constrained optimum found.
Parameters converged to within tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99534e+11
1 9.98138e+11 0.000698 -9.99e+11 0.446625
2 216198 1 -1.04e+06 0.446625
3 11271.7 1 -1.96e+04 0.148875
4 8154.8 1 -56.4 0.049625
5 8153.7 1 -0.000418 0.0165417
Optimization terminated: Constrained optimum found.
Norm of gradient less than tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 9.99578e+11
1 9.98277e+11 0.000651 -9.99e+11 4.39459
2 2.9496e+06 1 -1.24e+07 4.39459
3 183948 1 -4.25e+05 1.46486
4 10407.4 1 -1.57e+04 0.488288
5 8154.28 1 -35.1 0.162763
6 8153.7 1 -0.000181 0.0542543
Optimization terminated: Constrained optimum found.
Norm of gradient less than tolerance.
Directional
Iteration Residual Step-size derivative Lambda
0 8153.7
1 8153.7 1 -1.83e-15 0.000211752
Optimization terminated: Constrained optimum found.
Norm of gradient less than tolerance.
Estimation completed in 11.4100 seconds.
Preprocessing time: 7.3100 s
Computation time: 3.9600 s
Postprocessing time: 0.1400 s
Warning: Network is ill-conditioned.
Simulation completed in 0.1000 seconds.
Preprocessing time: 0.0900 s
Computation time: 0.0100 s
Postprocessing time: 0.0000 s
--- 21.26518416404724 seconds -
INCA has now done flux estimation on our model and saved it to a .mat file, which was specified in the INCAScript
(here simple_model_inca.mat). This file can be open using the INCAResults
workflow described in a later notebook or directly in the INCA GUI using “Open fluxmap”.
8.1.1. Note about opening models in INCA GUI¶
There are multiple ways to open models in the INCA GUI: “Open model”, “Open fluxmap”. Both methods opens the .mat file with is generated by run_inca()
. “Open model” will open the model with any associated experiments and data, but it will not include the results of flux estimation. To get the results of the estimate, continuation or monte carlo algorithms use the “Open fluxmap” instead. This will load both the model and the results of the algorithms which were applied. One thing to know is
that “Open fluxmap” in the INCA GUI fails if there is no simulation in the .mat file. Thus, to be able to open the results in the INCA GUI it is required to set run_simulation=True
in the define_runner()
.
8.2. References¶
[1] M. R. Antoniewicz, J. K. Kelleher, and G. Stephanopoulos, “Determination of confidence intervals of metabolic fluxes estimated from stable isotope measurements,” Metabolic Engineering, vol. 8, no. 4, pp. 324–337, Jul. 2006, doi: 10.1016/j.ymben.2006.01.004.
[2] M. R. Antoniewicz, J. K. Kelleher, and G. Stephanopoulos, “Elementary metabolite units (EMU): A novel framework for modeling isotopic distributions,” Metabolic Engineering, vol. 9, no. 1, pp. 68–86, Jan. 2007, doi: 10.1016/j.ymben.2006.09.001.