The Pennsylvania State University

The Graduate School

College of Engineering

NONLINEAR OPTIMIZATION TECHNIQUES FOR GENOME-SCALE FLUX

ELUCIDATION AND KINETIC PARAMETERIZATION

A Dissertation in

Chemical Engineering

by

Saratram Gopalakrishnan

© 2019 Saratram Gopalakrishnan

Submitted in Partial Fulfillment

of the Requirements

for the Degree of

Doctor of Philosophy

August 2019

The dissertation of Saratram Gopalakrishnan was reviewed and approved* by the

following:

Costas D. Maranas

Donald B. Broughton Professor of Chemical Engineering

Dissertation Advisor

Chair of Committee

Phillip Savage

Walter L. Robb Family Department Head Chair of CHE

Kristen Fichthorn

Merrell Fenske Professor of Chemical Engineering

Professor of Physics

Andrew Patterson

Tombros Early Career Professor

Associate Professor of Molecular Toxicology

Associate Professor of Biochemistry & Molecular Biology

*Signatures are on file in the Graduate School

ABSTRACT

Modeling metabolism elucidates the relationship between the genetic state, the

environment, and the phenotype of an organism, which provides insights into its biological

objectives and informs metabolic engineering strategies. Steady-state metabolism is

typically modeled using stoichiometric frameworks based on Flux Balance Analysis (FBA)

which cannot capture the effect of intracellular metabolite concentrations, enzyme

abundances, and regulatory effects. This often results in over-prediction of fluxes or

prediction of metabolic states that are not physiologically relevant. A first step towards the

construction of predictive metabolic models is the inclusion of kinetic descriptions for

metabolic reactions that enable the model to faithfully capture the influence of metabolite

concentrations. Since kinetic information generally cannot be imported from databases

such as BRENDA due to paucity of organism-specific kinetic descriptions or differences

in assay conditions, in vivo kinetic parameters must be estimated using metabolic flux and

intracellular concentration data. This thesis details the development of nonlinear

regression-based tools to first elucidate genome-scale metabolic fluxes for all intracellular

reactions using 13C-Metabolic Flux Analysis (13C-MFA) and then use this fluxomic data

to construct a large-scale kinetic model of metabolism that recapitulates the effects of

single gene-deletions.

Metabolic models used in 13C-MFA generally include a limited number of reactions

primarily from central metabolism. They typically omit degradation pathways, complete

cofactor balances, and atom transition contributions for reactions outside central

metabolism. Scaling up 13C-MFA to the genome-scale first requires the construction of a

genome-scale carbon mapping model that accurately traces the path of all carbon atoms

through the various intracellular reactions. Two mapping models, imEco726 and imSyn617,

are constructed for E. coli and Synechocystis PCC 6803, respectively. imEco726 is

deployed for steady-state flux elucidation in E. coli to reveal the expansion in flux ranges

relative to a core metabolic model due to the inclusion of redundant carbon paths and

elucidate the loss of information arising from projecting fluxes elucidated from core

models onto expanded models for subsequent analyses such as strain design and kinetic

modeling. imSyn617 is deployed for flux elucidation in Synechocystis using transient

labeling data to uncover the role of novel bifurcated pathway topologies central to

maximizing the routing of carbons towards growth.

Finally, K-FIT, a decomposition-based approach for estimating kinetic parameters given

steady-state fluxomic data, is introduced. K-FIT offers orders of magnitude improvements

in CPU time over meta-heuristic based approaches. The speed-up is mostly due to the

efficient identification of steady-state fluxes using a fixed-point iteration scheme that

iterates between two linear sub-problems, thereby largely bypassing the computationally

expensive numerical integration steps. The applicability of this approach to large-scale

models is demonstrated by parameterizing an expanded kinetic model for E. coli (307

reactions and 258 metabolites) using fluxomic data for six mutants to explain the role of

flux rerouting through energy metabolism to meet biosynthetic ATP and NADPH

demands. The speed-up afforded by K-FIT is transformational as it enables follow-up

robustness of inference analyses and optimal design of experiments that can inform

metabolic engineering strategies.

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

ACKNOWLEDGEMENTS

Chapter 1

Introduction

1.1. Modeling metabolism

1.2. Requirements for constructing predictive models of metabolism

1.3. Flux elucidation using 13C-MFA

1.4. Construction of kinetic models of metabolism

1.5. Aim and outline of the thesis

Chapter 2

13C Metabolic flux analysis at the genome-scale

2.1. Introduction

2.2. Methods

2.2.1. Genome-scale atom mapping model

2.2.2. Flux estimation procedure

2.2.3. Confidence intervals

2.3. Results

2.3.1. Active EMU network

2.3.2. Flux identifiability and statistical validity of the model

2.3.3. Flux and range estimation at the genome-scale

2.4. Discussion

Chapter 3

Elucidation of photoautotrophic carbon flux topology in Synechocystis PCC 6803 using genome-scale carbon mapping models

3.1. Introduction

3.2. Methods

3.2.1. Construction of imSyn617

3.2.2. Algorithmic procedure for flux estimation based on least-squares minimization

3.3. Results

3.3.1. New carbon paths covered by mapping model imSyn617

3.3.2. Comparison of elucidated fluxes between using imSyn617 and core mapping models

3.3.3. New insights on carbon paths gained using imSyn617

3.4. Discussion

Chapter 4

K-FIT: An accelerated kinetic parameterization algorithm using steady-state fluxomic data

4.1. Introduction

4.2. Methods

4.2.1. Kinetic parameterization using K-FIT

4.2.2. Construction of the expanded kinetic model for E. coli, k-ecoli307

4.3. Results

4.3.1. The K-FIT algorithm

4.3.2. Benchmarking K-FIT against Ensemble Modeling

4.3.3. Parameterization of a kinetic model (k-ecoli307) for E. coli with near-genome-scale coverage

4.4. Discussion

Chapter 5

Summary and future work

5.1. Summary

5.2. Completed and ongoing research

5.3. Future directions

Appendix A

Flux elucidation at isotopic steady-state

A.1. Predicting labeling patterns

A.2. Least-squares NLP

A.3. Implementation

A.4. Estimation of confidence intervals

Appendix B

Flux elucidation procedure for isotopic instationary MFA

B.1. Least-squares NLP for flux and pool size estimation

B.2. Dynamic EMU balances and simulation of labeling distributions

B.3. An improved algorithm for simulating labeling dynamics and sensitivities

Appendix C

Mathematical description of K-FIT

C.1. Overview of elementary step decomposition

C.2. Nonlinear least-squares regression-based procedure for kinetic parameterization

C.3. K-SOLVE: Anchoring kinetic parameters to the WT flux distributions

C.4. SSF-Evaluator: Evaluation of steady-state fluxes for the mutant networks using the kinetic parameter assignments of K-SOLVE

C.4.1. Fixed-point iteration (FPI)

C.4.2. Newton’s method for accelerating convergence

C.4.3. Richardson’s Extrapolation when 𝑱 becomes singular

C.4.4. Integration of FPI, Newton’s method, and semi-implicit integration into a single pipeline

C.5. NLP problem K-FIT

C.6. K-UPDATE procedure that checks for convergence and updates kinetic parameters using the approximate gradient and Hessian of 𝜙

C.7. Algorithmic description of K-FIT

References

LIST OF FIGURES

Figure 1.1: A toy reaction network example for MFA.

Figure 1.2: Isotopomers, cumomers, and EMUs for metabolite A.

Figure 2.1: Comparison of prediction of experimentally observed amino acid MS data by the core model and the GSM model

Figure 2.2: Comparison of fluxes elucidated using 2-13C-glucose with the core model and GSM model...

Figure 2.3: Resolution of energy metabolism in core model and GSM model.

Figure 2.4: Loss of information flux ranges are estimated using FVA with core model-based MFA derived flux ranges as constraints...

Figure 2.5: Flux distribution comparison for core model and GSM model using 5-13C glucose tracer...

Figure 3.1: Representation of central metabolism in Synechocystis

Figure 3.2: Carbon incorporation paths and conserved moiety cycling in imSyn617

Figure 3.3: Recapitulation of experimentally observed labeling distributions

Figure 3.4: Flux ranges for central metabolism in Synechocystis

Figure 3.5: Bifurcated topology in the photorespiratory pathway and the TCA cycle

Figure 3.6: Recapitulation of labeling dynamics of CBB intermediates

Figure 3.7: Carbon positional shifts in upper glycolysis of Synechocystis

Figure 3.8: F-Test on the oxidative pentose phosphate pathway

Figure 3.9: F-Test on Transaldolase

Figure 4.1: Overview of the core loop of the K-FIT algorithm

Figure 4.2: Flux distribution through central metabolism in k-ecoli307

Figure 4.3: Uncertainty in estimation of Michaelis-Menten parameters in k-ecoli307

Figure 4.4: Overview of the K-FIT algorithm showing the flow of information between various components

Figure 4.5: Test models used for benchmarking the performance of K-FIT against GA-based EM procedure

Figure 4.6: Uncertainty in estimation of kinetic parameters and WT enzyme fractions

Figure B.1: Flux balance for EMU M23

LIST OF TABLES

Table 1.1: Reaction stoichiometry and atom mapping for toy network.

Table 2.1: 𝜒² degrees of freedom for the core model and the genome-scale model.

Table 2.2: Additional suggested MS measurements for resolving various alternate routes.

Table 4.1: Comparison of product yields predicted by k-ecoli307 against experimental yields.

Table B.1: Four types of reaction classes impacting EMU balances.

Table C.1: List of elementary steps describing the catalytic mechanism and regulation of enzyme activity.

Table C.2: Elementary step decomposition for various reactions.

ACKNOWLEDGEMENTS

Completing the requirements for a PhD degree requires the professional and personal contribution

of many people. First and foremost, I would like to extend my most sincere gratitude to my advisor

Dr. Costas Maranas, who guided me through every step of the way and taught me to perform high-quality

research and communicate the work in a coherent manner. The long-term goals of this

project have been achievable largely due to his broad and deep scientific knowledge and expert

student advising style. His strong emphasis on communication of research in addition to his

guidance on application of correct research methodology has helped shape me into the researcher I

am today. Words cannot express the crucial role played by the constant encouragement and

unwavering faith from my parents Gopal and Kala in the successful completion of my PhD. My

brother Saran deserves a special mention for providing an outsider’s perspective to scientific

research and bringing to my attention the importance of recording failed experiments. I would also

like to thank Rajib Saha, Akhil Kumar, Anupam Chowdhury, Ali Khodayari, Satyakam Dash, Ratul

Chowdhury, Shyam Srinivasan, John Hendry, Thomas Mueller, and the other members of Dr.

Maranas’ research group for all the engaging discussions, research and otherwise, on an almost

daily basis. Finally, I would like to thank Achyut, Gaurav Kumar, Sandeep, and Arpan Sircar for

providing a weekly reminder that a world outside of research also exists.

Chapter 1

Introduction

1.1. Modeling metabolism

Metabolism is a complex network of biochemical reactions that fuels growth and

homeostasis in all organisms and determines their phenotypes. By leveraging pathways of

enzyme-catalyzed reactions, metabolism enables the production of antibiotics,

nutraceuticals, and biopolymers at ambient conditions (temperature and pressure) that

would typically be economically unviable using traditional chemical processes.

Quantification of metabolism provides insights into driving forces behind cellular

physiology and is required to study the pathophysiology of non-infectious diseases,

identify efficient intervention strategies to aid drug discovery, and suggest engineering

strategies to increase the production of high-value chemicals. Owing to its complexity, it

is desirable to study metabolism with the aid of predictive mathematical models which

provide insights into pathway usage. Furthermore, with advancements in gene-editing

technologies, the emphasis falls on predictive models to inform decisions on metabolic

engineering and accelerate build-design-test cycles for the construction of engineered

organisms capable of carrying out specialized functions.

Generally, metabolism is modeled at metabolic steady-state conditions where the

concentrations of intracellular metabolites, enzymes, and the other cellular components are

unchanging. If 𝐼 = {1,2, … ,𝑀} is the set of metabolites, 𝐽 = {1,2, … ,𝑁} is the set of

reactions in a metabolic model, 𝑣𝑗 is the metabolic flux (reaction rate per cell) through

reaction 𝑗 ∈ 𝐽, and 𝑆𝑖𝑗 is the stoichiometric coefficient for metabolite 𝑖 ∈ 𝐼 in reaction 𝑗 ∈

𝐽, conservation of mass across any metabolite 𝑖 at pseudo-steady-state is defined as:

\[ \sum_{j=1}^{N} S_{ij} v_j = 0, \qquad \forall\, i \in I \]

Since the number of metabolites is typically less than the number of reactions in the

metabolic model, this linear system is underdetermined, and the above equality defines the set of all feasible

metabolic flux distributions attainable by the model. In reality, 𝑣𝑗 is a function of

enzyme concentration and metabolite concentrations that represents the kinetic rate law for

the enzyme-catalyzed reaction 𝑗. However, since all concentrations are unchanging at

metabolic steady-state, concentrations and kinetic constants are lumped together into the

quantity 𝑣𝑗 in the stoichiometric framework.
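As a minimal illustration of this mass balance, the sketch below builds the stoichiometric matrix for a small hypothetical network (three metabolites and five reactions; it is not the network of Figure 1.1) and checks whether a hand-picked flux vector satisfies the steady-state condition. The reaction definitions, flux values, and variable names are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical toy network (an assumption for illustration, not Figure 1.1):
#   v1: -> A        v2: A -> B      v3: A -> C
#   v4: B -> C      v5: C ->
# Rows are metabolites (A, B, C); columns are reactions (v1..v5).
S = np.array([
    [1, -1, -1,  0,  0],   # A
    [0,  1,  0, -1,  0],   # B
    [0,  0,  1,  1, -1],   # C
], dtype=float)

# Candidate flux distribution (arbitrary units), chosen by hand.
v = np.array([10.0, 6.0, 4.0, 6.0, 10.0])

# Steady-state mass balance: every entry of S @ v must vanish.
residual = S @ v
is_steady_state = np.allclose(residual, 0.0)

# Number of independent fluxes left undetermined by the mass balances.
n_metabolites, n_reactions = S.shape
degrees_of_freedom = n_reactions - np.linalg.matrix_rank(S)

print("balanced:", is_steady_state)
print("free fluxes:", degrees_of_freedom)
```

In this toy case the three balances leave two of the five fluxes free, which is the underdetermination that the stoichiometric framework must resolve with additional information.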

Flux prediction in the stoichiometric framework is generally performed using Flux Balance

Analysis (FBA) (Varma and Palsson, 1994) which elucidates fluxes by maximizing a

“biological objective” such as growth rate. The corresponding flux ranges are elucidated

using Flux Variability Analysis (FVA) (Mahadevan and Schilling, 2003) to account for the

fact that FBA solves an underdetermined system of linear algebraic equations which can

have alternate solutions. Large flux ranges reported by FVA have motivated the

development of more data-driven approaches such as 13C-Metabolic Flux Analysis (13C-

MFA) which traces metabolism using a stable-isotope tracer (usually 13C) and elucidates

fluxes by leveraging the property that different pathways rearrange the carbon backbone of

intracellular metabolites in different ways. Although 13C-MFA affords higher accuracy

in flux elucidation, it requires as much as two minutes to elucidate all fluxes in central

metabolism with high precision. In comparison, FBA predicts fluxes by solving a linear

programming (LP) problem and is able to report fluxes for the same metabolic network

within a fraction of a second, albeit with very low precision. Although 13C-MFA is able

to provide more meaningful insights, it must be noted that it is an analysis tool and has no

predictive capabilities whatsoever. For a standard model organism such as E. coli, FBA

predicts that the maximum biomass yield is 93 gDW/mol-glucose which is 17% higher

than the actual biomass yield of 79.2 gDW/mol-glucose for the wild-type (WT) strain of

E. coli grown in M9 minimal media under aerobic conditions with glucose as the sole

carbon source (Feist et al., 2007). This is because FBA cannot capture acetate secretion by

E. coli which is driven by regulation of enzyme activities and reaction kinetics.
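The LP structure of FBA and FVA can be sketched on a small hypothetical network using `scipy.optimize.linprog`. The network, the flux bounds, and the use of the export flux as a stand-in for a biological objective are all assumptions chosen for illustration; this is not the E. coli model discussed above, nor the implementation used in this thesis.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (illustrative assumption):
#   v1: -> A   v2: A -> B   v3: A -> C   v4: B -> C   v5: C ->
S = np.array([
    [1, -1, -1,  0,  0],   # A
    [0,  1,  0, -1,  0],   # B
    [0,  0,  1,  1, -1],   # C
], dtype=float)
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000), (0, 1000)]  # uptake capped at 10
objective_index = 4  # export flux v5 plays the role of the objective


def fba(S, bounds, objective_index):
    """Maximize one flux subject to S v = 0 and the flux bounds."""
    c = np.zeros(S.shape[1])
    c[objective_index] = -1.0  # linprog minimizes, so negate to maximize
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x, -res.fun


def fva(S, bounds, objective_index, optimum, tol=1e-9):
    """Min/max of every flux while holding the objective at its optimum."""
    fva_bounds = list(bounds)
    fva_bounds[objective_index] = (optimum - tol, bounds[objective_index][1])
    b_eq = np.zeros(S.shape[0])
    ranges = []
    for j in range(S.shape[1]):
        c = np.zeros(S.shape[1])
        c[j] = 1.0
        lo = linprog(c, A_eq=S, b_eq=b_eq, bounds=fva_bounds, method="highs")
        hi = linprog(-c, A_eq=S, b_eq=b_eq, bounds=fva_bounds, method="highs")
        if not (lo.success and hi.success):
            raise RuntimeError("FVA subproblem failed")
        ranges.append((lo.fun, -hi.fun))
    return ranges


fluxes, optimum = fba(S, bounds, objective_index)
ranges = fva(S, bounds, objective_index, optimum)
for j, (vmin, vmax) in enumerate(ranges, start=1):
    print(f"v{j}: [{vmin:.2f}, {vmax:.2f}]")
```

In this toy network the objective value is fixed by the uptake bound, yet the two parallel branches from A to C can each carry anywhere between zero and the full uptake flux, which is the kind of wide FVA range that motivates 13C-MFA.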

Several strain design algorithms have been developed within the stoichiometric framework, such

as OptKnock (Burgard et al., 2003), RobustKnock (Tepper and Shlomi, 2010), BiMOMA

(Kim et al., 2011), and OptForce (Ranganathan et al., 2010). These approaches have

successfully aided the construction of glutamate and succinate overproducing strains in E.

coli (Kim et al., 2011), hydrogen overproduction in Clostridium acetobutylicum and

Methylobacterium extorquens (Pharkya et al., 2004), glycerol overproduction in

Saccharomyces cerevisiae (Patil et al., 2005), fatty acid production in E. coli (Ranganathan

et al., 2012; Xu et al., 2011) and overproduction of flavonoid precursor shikimate in

Saccharomyces cerevisiae (Suastegui et al., 2017). However, these frameworks are unable

to identify interventions targeting regulatory interactions, such as the ptsG knockout for succinate

overproduction in E. coli (Chowdhury et al., 2015b) and engineering the transcription

regulator FapR for malonyl-CoA overproduction in E. coli (Xu et al., 2014). Limitations

with stoichiometric frameworks stem from the limited ability to capture the effects of

metabolomic fluctuations, proteomic fluctuations, enzyme saturation, and allosteric

regulation of enzyme activity (Chowdhury et al., 2015a; Saa and Nielsen, 2017).

Furthermore, stoichiometric frameworks afford limited support for integration of

transcriptomic, proteomic, and metabolomic datasets (Machado and Herrgard, 2014; Tian

and Reed, 2018) as changes in gene expression do not translate to proportional changes in

metabolic fluxes and the lack of kinetic descriptions precludes the investigation of model

dynamics, limiting assessment to metabolic steady-states only.

1.2. Requirements for constructing predictive models of metabolism

In response to these limitations, it is of interest to construct models of metabolism that can

explain and support the integration of metabolomic, proteomic, and transcriptomic data in

addition to fluxomic data. These facets of metabolism are captured by kinetic models that

relate fluxes to both metabolite concentrations and enzyme abundances. However, unlike

stoichiometric models that are constructed using genome annotations, biomass

composition measurements, and experimental yield measurements, construction of a

kinetic model is substantially more demanding in terms of data requirements and

appropriate kinetic parameter identification. The key requirements for parameterizing a

kinetic model include (i) precise metabolic flux distributions (generally obtained using

13C-MFA) in multiple genetic and/or environmentally perturbed conditions, (ii) an

appropriate rate law formalism that relates fluxes to metabolite concentrations, and (iii) an

efficient procedure for identifying the optimal kinetic parameters that recapitulate the

available experimental data. These requirements are further elaborated in the following

subsections.

1.3. Flux elucidation using 13C-MFA

The objective of 13C-MFA is to identify, by nonlinear least-squares regression, a flux distribution that recapitulates the

labeling distributions of intracellular metabolites measured using NMR spectroscopy or

mass spectrometry. The primary requirements for

performing 13C-MFA at the genome-scale are (i) the availability of a well-curated genome-scale metabolic (GSM)

model, (ii) availability of a curated atom mapping model, (iii) availability of partial

positional labeling information, and (iv) a simulation platform for predicting intracellular

metabolite labeling distributions given flux distributions. For well-studied model

organisms such as E. coli, curated GSM models that faithfully capture the dispensability

of the reactions in the model and the yields of biomass and metabolic byproducts are

available and can be reliably used for flux elucidation. Such models do not contain non-

native pathways and accurately predict the functionality of existing metabolic pathways.

This is typically quantified using the sensitivity and specificity metrics that represent the

fraction of correctly predicted viable and lethal mutants, respectively, given a set of gene

knockout data. Specificity is lowered when the model fails to recapitulate the essentiality

of a reaction due to the presence of alternate pathways providing the same

functionality, and sensitivity is lowered when the model incorrectly identifies a

non-essential gene as essential (Zomorrodi and Maranas, 2010) due to missing reactions in the

model. Generally, specificity can be improved by accounting for condition-specific

expression using frameworks such as R-GPRs (Nazem-Bokaee et al., 2016) and sensitivity

can be improved by looking for alternate genes capable of performing the same function

using a bidirectional BLAST search (Zomorrodi and Maranas, 2010). Models with low

sensitivity are often more difficult to resolve as the issue stems from incomplete and/or

incorrect gene annotation as well as uncharacterized side reactions catalyzed by already

annotated enzymes. Models for well-characterized organisms such as E. coli and

Synechocystis PCC 6803 typically have sensitivity and specificity above 80% and can be safely

used for 13C-MFA. On the other hand, models for less characterized organisms such as Clostridium

thermocellum (Xiong et al., 2018) must be deployed with caution for flux elucidation using

13C-MFA.
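The two metrics, as defined above, can be computed from paired observations and model predictions of mutant viability. The sketch below uses a made-up set of ten knockout outcomes; the data and the function name `sensitivity_specificity` are hypothetical and serve only to make the definitions concrete.

```python
def sensitivity_specificity(observed_viable, predicted_viable):
    """Return (sensitivity, specificity) for paired knockout outcomes.

    sensitivity: fraction of experimentally viable mutants predicted viable
    specificity: fraction of experimentally lethal mutants predicted lethal
    """
    if len(observed_viable) != len(predicted_viable):
        raise ValueError("observed and predicted lists must have equal length")

    pairs = list(zip(observed_viable, predicted_viable))
    viable_predictions = [pred for obs, pred in pairs if obs]
    lethal_predictions = [pred for obs, pred in pairs if not obs]
    if not viable_predictions or not lethal_predictions:
        raise ValueError("need at least one viable and one lethal mutant")

    sensitivity = sum(viable_predictions) / len(viable_predictions)
    specificity = sum(not pred for pred in lethal_predictions) / len(lethal_predictions)
    return sensitivity, specificity


# Hypothetical knockout data: True = mutant grows, False = mutant is lethal.
observed = [True, True, True, True, True, True, False, False, False, False]
predicted = [True, True, True, True, True, False, False, False, False, True]

sensitivity, specificity = sensitivity_specificity(observed, predicted)
print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
```

Here one viable mutant is wrongly predicted lethal (as would happen with a missing reaction) and one lethal mutant is wrongly predicted viable (as would happen with a spurious alternate pathway), which lowers sensitivity and specificity, respectively.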

Atom mapping information for central metabolism is largely conserved across species

and is readily available. Atom mapping for peripheral metabolism can be obtained

from online databases such as MetaCyc (Caspi et al., 2014), KEGG (Tanabe and Kanehisa,

2012) and MetRxn (Kumar et al., 2012). When information is unavailable from databases,

automated mapping algorithms such as MCS (Chen et al., 2013), PMCD (Jochum et al.,

1980), EC (Morgan, 1965), MWED (Latendresse et al., 2012), and CLCA (Kumar and

Maranas, 2014) may be used to infer plausible mappings. Care must be taken to account

for organism-specific pathways and promiscuity of enzyme activity. For example, S.

cerevisiae contains yeast-specific pathways such as the α-aminoadipate pathway for lysine

biosynthesis (Xu et al., 2006) and Synechococcus elongatus UTEX 2973 contains the

phosphoketolase pathway which must be included in their respective atom mapping

models. In addition, characterization of promiscuity of enzyme activity has added novel

metabolic reactions such as the riboneogenesis pathway (Clasquin et al., 2011) for which

atom mapping remains poorly established. For such reactions, mapping algorithms based

on graph theory are available (Latendresse et al., 2012). In particular, the recent CLCA

algorithm has been shown to be faster and more accurate in generating reaction atom maps

than previous algorithms due to the constraints imposed by the chemical and

stereochemical properties of reactions (Kumar and Maranas, 2014). Complex chemical entities

and incorrect determination of alternate reaction maps necessitate manual inspection of

the generated maps. Computational mapping algorithms generally rely on

SMILES notation (Latendresse et al., 2012) or graph invariance numbers (Weininger et al.,

1989), which often differ considerably from IUPAC numbering schemes. The limited

availability of inter-nomenclature conversion tools further complicates the inspection and

correction of data, often requiring additional visual support provided in MetaCyc

(Latendresse et al., 2012) and MetRxn databases (Kumar et al., 2012). Atom mapping is

represented using a string of identifiers such that the position of the atom identifier on the

reactants side maps to the position of the identifier on the products side. For the example

network shown in Figure 1.1, the atom mapping for all the relevant reactions is shown in

Table 1.1.
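The identifier-string representation can be illustrated with a minimal Python sketch. The function name `apply_atom_mapping` and the 0/1 encoding of labels are assumptions made here for illustration only; the mappings themselves are taken from reactions 𝑣2 and 𝑣4 of Table 1.1.

```python
def apply_atom_mapping(reactant_map, product_map, reactant_labels):
    """Propagate a labeling pattern through a one-reactant, one-product mapping.

    reactant_map / product_map: identifier strings such as "abc" and "cba".
    reactant_labels: tuple of 0/1 flags, one per reactant atom (1 = labeled).
    """
    # Record which label each atom identifier carries on the reactant side.
    label_of = dict(zip(reactant_map, reactant_labels))
    # Read the identifiers in product order to obtain the product labeling.
    return tuple(label_of[atom] for atom in product_map)


# v2 in Table 1.1: B(abc) -> D(cba) reverses the carbon backbone.
print(apply_atom_mapping("abc", "cba", (1, 0, 0)))  # (0, 0, 1)
# v4 in Table 1.1: B(abc) -> D(abc) preserves atom positions.
print(apply_atom_mapping("abc", "abc", (1, 0, 0)))  # (1, 0, 0)
```

A label on the first carbon of 𝐵 therefore ends up on the third carbon of 𝐷 through 𝑣2 but stays on the first carbon through 𝑣4, which is what allows labeling data to distinguish the two routes.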

Once the stoichiometric model and the corresponding atom mapping model are available,

the next step in flux elucidation is the setting up of a framework to predict labeling

distributions when fluxes are known. A number of frameworks are available for this

purpose that operate on the concept of isotopomers. Isotopomers are the exhaustive set of

all possible configurations of labeled and unlabeled atoms for a given metabolite. For the

three-carbon metabolite 𝐴, the set and description of isotopomers is shown in Figure 1.2a.

For a molecule containing 𝑛 possibly labeled atoms, 2^𝑛 isotopomers can be defined.

Therefore, for the three-carbon metabolite 𝐴, eight isotopomers exist. The [2^𝑛 × 1] vector

of fractional abundances of the isotopomers is known as an isotopomer distribution vector

(IDV). Since stable isotopes of atoms cannot be created or destroyed in a biochemical

reaction network, conservation of mass across all isotopomers can be enforced using a

system of balance equations. This system of nonlinear algebraic equations forms the

isotopomer framework for flux elucidation (Schmidt et al., 1997) from which IDVs can be

calculated either by numerical integration or by solving the system of algebraic equations

using Newton’s method.
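The exponential growth in the number of isotopomers can be made concrete with a short Python sketch. The function name `enumerate_isotopomers` and the uniform IDV are hypothetical and serve only to illustrate the 2^𝑛 count and the vector layout.

```python
from itertools import product


def enumerate_isotopomers(n_atoms):
    """Return all 2**n labeling patterns as tuples of 0 (unlabeled) / 1 (labeled)."""
    return list(product((0, 1), repeat=n_atoms))


isotopomers = enumerate_isotopomers(3)
print(len(isotopomers))  # 8

# A hypothetical IDV: every isotopomer equally abundant, fractions sum to one.
idv = [1.0 / len(isotopomers)] * len(isotopomers)
```

For a six-carbon metabolite the same call returns 64 patterns, which illustrates why balances written in isotopomer space grow quickly with molecule size.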

It is important to note that experimental measurements using NMR spectroscopy or Mass

Spectrometry (MS) do not usually provide information on positional enrichment of atoms.

Instead, they detect the total number of labeled atoms per molecule based on mass shifts

arising from incorporation of heavier isotopes. Isotopomers that differ in the number of

labeled atoms only are termed mass isotopomers. For a molecule containing 𝑛 atoms, 𝑛 +

1 mass isotopomers can exist. The [(𝑛 + 1) × 1] vector containing the fractional

abundance of mass isotopomers is known as the mass isotopomer distribution vector

(MDV), which is assembled from isotopomers for metabolite 𝐴 as shown in Figure 1.2a.

Once IDVs are computed in the isotopomer framework, the corresponding MDVs for the

metabolites whose labeling distribution is quantified by NMR or MS are assembled.

Finally, fluxes are elucidated that minimize the deviation between predicted and measured

MDVs.
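The assembly of an MDV from an IDV amounts to summing the isotopomer fractions that share the same number of labeled atoms. The sketch below assumes the isotopomer ordering produced by `itertools.product`; the function name `idv_to_mdv` and the example abundances are hypothetical.

```python
from itertools import product


def idv_to_mdv(idv, n_atoms):
    """Collapse a length-2**n IDV into a length-(n + 1) MDV."""
    patterns = list(product((0, 1), repeat=n_atoms))
    mdv = [0.0] * (n_atoms + 1)
    for pattern, fraction in zip(patterns, idv):
        # The mass shift equals the number of labeled atoms in the pattern.
        mdv[sum(pattern)] += fraction
    return mdv


# Hypothetical IDV for a three-carbon metabolite: 50% unlabeled (000),
# 30% labeled at the first carbon only (100), 20% fully labeled (111).
idv = [0.5, 0.0, 0.0, 0.0, 0.3, 0.0, 0.0, 0.2]
mdv = idv_to_mdv(idv, 3)  # approximately [0.5, 0.3, 0.0, 0.2] for M+0 .. M+3
```

The mapping is many-to-one, so an MDV cannot in general be inverted back to a unique IDV.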

Since the number of isotopomers scales exponentially with the number of atoms, the isotopomer framework is often

intractable for large models, particularly since a system of nonlinear equations must be

solved repeatedly to compute IDVs and MDVs. To partially alleviate this difficulty, the

concept of a cumulative isotopomer, or cumomer, was proposed (Wiechert et al., 1999).

The cumomers for metabolite 𝐴 are shown in Figure 1.2b. Note that the number of

isotopomers is equal to the number of cumomers for any metabolite and a linear mapping

can be established between isotopomers and cumomers. As such, recasting the isotopomer

balances in the cumomer space does not reduce the number of unknown variables.

However, recasting the equations in the cumomer space allows the equations to be

decomposed into a cascaded system of equations based on the fact that cumomer weights

are directionally coupled. This means that, for a given cumomer weight, a mass balance

equation only depends on cumomers of the same weight or less weight, but never depends

on a cumomer of higher weight. This reduces the cumomer framework to a system of linear

equations in unknown cumomers of a specified weight when fluxes are specified. The

ability to solve for cumomers as a cascaded system of linear algebraic equations

dramatically lowers the computation time required to solve for MDVs thereby allowing

networks as large as a core metabolic model to be analyzed using the cumomer framework

(Wiechert et al., 1997) while also opening up the possibility of performing local statistical

analyses on inferred fluxes in the form of confidence intervals.
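The linear mapping between isotopomers and cumomers can be sketched as follows, taking a cumomer to be the summed abundance of all isotopomers that are labeled at a given set of positions, regardless of the state of the remaining positions. The function name `idv_to_cumomers`, the dictionary layout keyed by position tuples, and the example IDV are assumptions made for illustration.

```python
from itertools import combinations, product


def idv_to_cumomers(idv, n_atoms):
    """Map an IDV to cumomer fractions keyed by the tuple of labeled positions.

    The weight of a cumomer is the number of positions in its key.
    """
    patterns = list(product((0, 1), repeat=n_atoms))
    cumomers = {}
    for weight in range(n_atoms + 1):
        for positions in combinations(range(n_atoms), weight):
            # Sum every isotopomer labeled at all required positions.
            cumomers[positions] = sum(
                fraction
                for pattern, fraction in zip(patterns, idv)
                if all(pattern[p] == 1 for p in positions)
            )
    return cumomers


# Hypothetical IDV: 50% unlabeled, 30% labeled at carbon 1 only, 20% fully labeled.
idv = [0.5, 0.0, 0.0, 0.0, 0.3, 0.0, 0.0, 0.2]
cumomers = idv_to_cumomers(idv, 3)
print(len(cumomers))  # 8, the same as the number of isotopomers
```

The weight-0 cumomer (empty key) always equals one, and cumomers of weight one are simply positional enrichments, which is why the cascade can be solved from low to high weight.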

While the computational advancement with the cumomer framework is noteworthy, the

main limitation of the isotopomer framework in terms of scalability and applicability to

more descriptive models of metabolism remains unaddressed by the cumomer framework.

In response to this limitation, the elementary metabolite units (EMU) framework

(Antoniewicz et al., 2007) was introduced. The EMU method introduces two major

modifications to the cumomer framework to improve tractability. First, EMUs group

together isotopomers so that they now represent MDVs for the specified EMU. This allows

EMUs to operate in the MDV space directly as opposed to the isotopomer space. As an

example, the grouping of isotopomers to EMUs is shown in Figure 1.2c. Following this,

the EMU method employs a depth-first search algorithm to identify the minimum number

of EMU balances required to simulate the MDV of a measured metabolite fragment, further

reducing the number of relevant balance equations to be solved. These improvements are

implemented while retaining the same cascaded problem structure in the cumomer

framework. Overall, this contributes to a 95% reduction in the number of balance equations

for a central metabolic model for E. coli that also contains amino acid pathways. Although

the EMU framework remains tractable and provides a substantial speed-up in flux

elucidation for a central metabolic model, tractability at the genome scale has never been

demonstrated. It is therefore unclear whether 13C-assisted flux elucidation can even

be applied to larger metabolic models directly in their current form.
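Working directly in MDV space is convenient because, in the EMU formulation of Antoniewicz et al. (2007), the MDV of an EMU formed by joining two precursor EMUs is the convolution of the precursor MDVs. The sketch below applies this to reaction 𝑣7 of Table 1.1, 𝐶(𝑎𝑏) + 𝐹(𝑐) → 𝐷(𝑎𝑏𝑐). The function name `convolve_mdvs` and the numerical MDVs are hypothetical.

```python
def convolve_mdvs(mdv_x, mdv_y):
    """MDV of the EMU obtained by condensing two independent precursor EMUs."""
    out = [0.0] * (len(mdv_x) + len(mdv_y) - 1)
    for i, x in enumerate(mdv_x):
        for j, y in enumerate(mdv_y):
            # i labeled atoms from one precursor plus j from the other.
            out[i + j] += x * y
    return out


mdv_c12 = [0.7, 0.2, 0.1]  # hypothetical MDV of the two-carbon EMU of C
mdv_f1 = [0.9, 0.1]        # hypothetical MDV of the one-carbon EMU of F
mdv_d123 = convolve_mdvs(mdv_c12, mdv_f1)
# mdv_d123 is approximately [0.63, 0.25, 0.11, 0.01]
```

The product MDV has four entries (M+0 through M+3) rather than the eight isotopomer fractions of 𝐷, which gives a sense of where the reduction in variables comes from.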

1.4. Construction of kinetic models of metabolism

Kinetic parameters are identified by solving a nonlinear least-squares regression problem

that recapitulates experimentally measured temporal concentration profiles (Jahan et al.,

2016) or steady-state fluxes and metabolite concentrations (Khodayari et al., 2014) in

response to genetic and environmental perturbations. Steady-state fluxes are generally

elucidated using 13C-MFA when available. Otherwise, the WT flux distribution is sampled

using FBA and kinetic parameters are identified that recapitulate yields of products such

as biomass, ethanol, and acetate under multiple mutant conditions (Dash et al., 2017).

Flexibility exists in the choice of modeling framework relating fluxes to metabolite

concentrations and selection of optimization method used for estimation of kinetic

parameters. Bottom-up approaches are avoided due to the lack of organism-specific

information and kinetic data generated at conditions different than in vivo growth

conditions, leading to haphazard data integration and construction of models that are either

unstable or have poor predictive capabilities. Limitations with bottom-up approaches have

motivated the development of various data-driven kinetic parameterization frameworks

such as ORACLE (Miskovic and Hatzimanikatis, 2010) which expresses rate laws using a

log-linear formalism (Hatzimanikatis and Bailey, 1997), MASS models (Jamshidi and

Palsson, 2008) expressing fluxes using mass-action kinetics, Ensemble Modeling (EM)

(Tran et al., 2008) relating fluxes to metabolite concentrations using mass-action kinetics

in conjunction with elementary-step decomposition of the mechanism of enzyme catalysis,

and GRASP (Saa and Nielsen, 2015) which uses the general Monod-Wyman-Changeux

formalism (Monod et al., 1965) within a Bayesian paradigm. Besides these, models

employing Michaelis-Menten and Hill kinetic formalisms have also been parameterized

(Chassagnole et al., 2002; Srinivasan et al., 2018). Of all these frameworks, only ORACLE

and GRASP can provide a reliable estimate of local sensitivity of fluxes to metabolite

concentration fluctuations as they support easy computation of thermodynamically feasible

elasticities and control coefficients. Furthermore, GRASP is also able to provide a

distribution of kinetic parameters explaining the available experimental data as it is cast

within a Bayesian statistical framework. However, the predictive capabilities of ORACLE

are limited to the vicinity of the reference state about which linearization is performed and

GRASP has limited scalability due to insufficient sampling of higher dimensional kinetic

spaces using Monte-Carlo-based sampling techniques. The MASS framework assumes that

all reactions follow generalized mass-action kinetics. Since it does not account for enzyme

saturation effects, good predictive capabilities are limited to the substrate-limited regime

(Du et al., 2016). Of all the frameworks, MASS represents the kinetic model using the

fewest number of parameters and is therefore the most scalable (Saa and Nielsen, 2017).

Mechanistic frameworks such as EM and Michaelis-Menten-based formalisms represent

conservation of mass across metabolites using a system of ODEs, thereby requiring an

ODE solver (Hoops et al., 2006; Tran et al., 2008) for steady-state evaluation. Compared

to Michaelis-Menten formalisms, EM offers a tractable framework for relating fluxes,

enzyme abundances, metabolite concentrations, and kinetic parameters through specified

mechanisms which decompose into systems of bilinear equations. This allows easy

insertion/deletion of regulatory components without the need for reformulating the kinetic

rate-law expression. The main limitation with mechanistic frameworks is that due to the

large dynamic range of kinetic parameters, the system of ODEs can be stiff, thereby

rendering integration computationally expensive and susceptible to failure. Gradient

calculations must be performed using forward sensitivity analysis (Raue et al., 2013) as the

use of finite difference approximations for functions that are the solution to a system of

ODEs is computationally expensive, inefficient, and inaccurate (Frohlich et al., 2017).

Forward sensitivity analysis has very poor scalability and can require the solution of over

100,000 ODEs for a model with only 1,000 kinetic parameters. Owing to these limitations,

the use of metaheuristic approaches such as genetic algorithm (GA) (Khodayari et al.,

2014) and particle swarm optimization (Millard et al., 2017) for traversal of the feasible

solution space is favored. Meta-heuristic algorithms suffer from two key limitations: (i) the

exponential increase in the number of function evaluations required to adequately sample

the kinetic space upon model scale-up, and (ii) the inability to confirm optimality of a

reported solution due to the exclusion of gradient evaluations. The exclusion of gradient

calculations also prevents the evaluation of local sensitivities and any follow-up

calculations on uncertainty of estimated kinetic parameters. Although the GRASP

framework is compatible with the EM formalism, the poor scalability of the underlying

Monte-Carlo approach limits its application in uncertainty analysis of larger models. This

motivates the development of a kinetic parameterization framework that overcomes

difficulties associated with numerical integration and allows convenient calculation of

sensitivities to improve compatibility with local optimization solvers.
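The scaling of forward sensitivity analysis noted above follows from the fact that each parameter contributes one sensitivity equation per state variable, in addition to the model equations themselves. A minimal sketch of this count is given below; the function name and the choice of 100 state variables are assumptions made for illustration.

```python
def forward_sensitivity_ode_count(n_states, n_params):
    """Number of ODEs integrated jointly in forward sensitivity analysis.

    n_states model equations plus n_states sensitivity equations per parameter.
    """
    return n_states * (1 + n_params)


# Hypothetical model with 100 state variables and 1,000 kinetic parameters.
print(forward_sensitivity_ode_count(100, 1000))  # 100100
```

Under this assumption the count exceeds 100,000 equations, consistent with the figure quoted above, and it grows with the product of the number of states and parameters.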

1.5. Aim and outline of the thesis

The objective of this thesis is to develop computational tools based on nonlinear

optimization to construct large-scale predictive models of metabolism integrated with

kinetic descriptions for fluxes using 13C labeling data. First, tools for flux elucidation and

confidence interval estimation for genome-scale models are described. Following this, a novel

decomposition-based algorithm for accelerated and reproducible parameterization of

kinetic models is introduced. This thesis is outlined as follows:

• Chapter 2 details the application of isotopic steady-state 13C-MFA to a genome-

scale metabolic network of E. coli. Metabolic models used in 13C metabolic flux

analysis generally include a limited number of reactions primarily from central

metabolism. They typically omit degradation pathways, complete cofactor

balances, and atom transition contributions for reactions outside central

metabolism. This chapter addresses the impact on prediction fidelity upon scaling-

up mapping models to a genome-scale. To this end, the genome-scale metabolic

mapping model (GSMM) (imEco726) is constructed using as a basis the iAF1260

model upon eliminating reactions guaranteed to not carry flux based on growth and

fermentation data for a minimal glucose growth medium. This chapter discusses

the role of stoichiometric flux coupling in the resolution of metabolic fluxes at the

genome-scale and quantifies the loss of information associated with mapping fluxes from

MFA on a core model to a GSM model.

• Chapter 3 describes the scale-up of existing algorithms for isotopic instationary

MFA to genome-scale models and demonstrates an application for flux elucidation

in Synechocystis PCC 6803. Completeness and accuracy of metabolic mapping

models impact the reliability of flux estimation in photoautotrophic systems. In

this chapter, metabolic fluxes under photoautotrophic growth conditions in the

widely-used cyanobacterium Synechocystis PCC 6803 are quantified by re-

analyzing an existing dataset using genome-scale isotopic instationary 13C-

Metabolic Flux Analysis (INST-MFA). Flux elucidation using the genome-scale

carbon mapping model reveals a qualitatively different solution relative to that

predicted by a core model and identifies a novel bifurcated pathway topology that

enables maximum carbon routing towards biomass. Flux prediction departures

from the ones obtained with the core model demonstrate the importance of

constructing mapping models with global coverage to reliably glean new biological

insights using labeled substrates.

• Chapter 4 introduces a novel decomposition-based algorithm for estimation of

kinetic parameters using available fluxomic data. Parameterization of organism-

level kinetic models that faithfully reproduce the effect of different genetic or

environmental perturbations remains an open challenge due to the intractability of

existing algorithms. This chapter introduces K-FIT, an accelerated kinetic

parameterization workflow that leverages a novel decomposition approach to

identify steady-state fluxes in response to genetic perturbations followed by a

gradient-based update of kinetic parameters until predictions simultaneously agree

with the metabolic flux data for all perturbed metabolic networks. The applicability

of this approach to large-scale models is demonstrated by parameterizing an

expanded kinetic model for E. coli (307 reactions and 258 metabolites) using

fluxomic data for six mutants. The 1,000-fold speed-up afforded by K-FIT is

transformational as it enables follow-up robustness of inference analyses and

optimal design of experiments that can inform metabolic engineering strategies.

• Chapter 5 summarizes the accomplishments of this thesis, details some of the

successful follow-up work it has enabled, and discusses possible future

directions in the field of metabolic modeling.

Figure 1.1: A toy reaction network example for MFA

Figure 1.2: Isotopomers, cumomers, and EMUs for metabolite 𝐴. (a) Isotopomers and

the grouping of isotopomers into mass isotopomers. (b) Grouping of isotopomers into

cumomers. (c) Grouping of isotopomers into EMUs. The solid black circles represent

labeled atoms whereas the white circles represent unlabeled atoms.

Table 1.1: Reaction stoichiometry and atom mapping for toy network

Reaction | Reaction Stoichiometry | Reaction Atom Mapping
𝑣1 | 𝐴 → 𝐵 | 𝐴(𝑎𝑏𝑐) → 𝐵(𝑎𝑏𝑐)
𝑣2 | 𝐵 → 𝐷 | 𝐵(𝑎𝑏𝑐) → 𝐷(𝑐𝑏𝑎)
𝑣3 | 𝐷 → 𝐵 | 𝐷(𝑎𝑏𝑐) → 𝐵(𝑐𝑏𝑎)
𝑣4 | 𝐵 → 𝐷 | 𝐵(𝑎𝑏𝑐) → 𝐷(𝑎𝑏𝑐)
𝑣5 | 𝐵 → 𝐸 + 𝐶 | 𝐵(𝑎𝑏𝑐) → 𝐸(𝑎) + 𝐶(𝑏𝑐)
𝑣6 | 2𝐶 → 𝐷 + 𝐹 | 𝐶(𝑎𝑏) + 𝐶(𝑎𝑏) → 𝐷(𝑎𝑏𝑎) + 𝐹(𝑏)
𝑣7 | 𝐶 + 𝐹 → 𝐷 | 𝐶(𝑎𝑏) + 𝐹(𝑐) → 𝐷(𝑎𝑏𝑐)

Chapter 2

13C Metabolic flux analysis at the genome-scale

This chapter has been previously published in modified form in Metabolic Engineering

(Saratram Gopalakrishnan and Costas D. Maranas. 13C Metabolic flux analysis at the

genome-scale. Metabolic Engineering 32(2015): 12-22.)

2.1. Introduction

Cellular metabolism is a direct indicator of the physiological state of a cell (Nielsen, 2003).

Estimation of fluxes using 13C metabolic flux analysis involves solving a nonlinear least-

squares problem for the flux distribution capable of matching experimentally measured

labeling patterns of analyzed metabolites (Zomorrodi et al., 2012), typically amino acids

and fatty acids. Labeling patterns given a flux distribution can be predicted by relating the

target labeling patterns to input tracers and a flux distribution using a system of algebraic

equations. This can be achieved by decomposing the network using various frameworks

such as isotopomers (Schmidt et al., 1997), cumomers (Wiechert et al., 1999), or the EMU

method (Antoniewicz et al., 2007) all of which are based on the atom mapping matrix

(AMM) concept (Zupke and Stephanopoulos, 1994). The computational complexity arises

from the fact that the number of equations scales super-linearly with network size. Network

decomposition using the isotopomer approach results in 4,612 unknown mass isotopomers

involving bilinear terms for a complete central metabolic network of E. coli (Antoniewicz

et al., 2007). Efforts in the last decade have focused on reducing complexity and proposing

better algorithms to solve for the mass isotopomer distributions (MIDs) (Wiechert and de

Graaf, 1996) (Wiechert et al., 1997). For instance, the EMU method reduces the number

of isotopomer variables from 4,612 to 310 for a central metabolic network of E. coli. Sub-

networks can be simplified further using the Dulmage-Mendelsohn decomposition to

improve the speed of estimation (Young et al., 2008). A variety of optimization approaches

(Schmidt et al., 1999) have been used to infer the metabolic fluxes that minimize the sum

of squared residuals, while the statistical significance of the estimated flux distribution is

evaluated using the χ² test (Pazman, 1993). Due to the limited number of analyzed

metabolites and inherent measurement error, flux ranges rather than unique values are

obtained for the metabolic fluxes using either linearized statistics (Mollney et al., 1999),

grid search, or non-linear statistics (Antoniewicz et al., 2006). All of these approaches are

iterative in nature, requiring repeated solution of the least-squares minimization problem

placing additional computational burden.

The general practice is for 13C MFA models to include only a skeletal representation of

central metabolism comprised of the EMP pathway, PPP, TCA cycle, glyoxylate shunt,

and the ED pathway. Important pathways such as serine and arginine degradation are

typically absent. 13C MFA has been used extensively to elucidate the metabolic properties

of knockout strains (Flores et al., 2002; Hua et al., 2003; Shimizu, 2004; Usui et al., 2012;

Zhao and Shimizu, 2003), identify metabolic bottlenecks (Antoniewicz et al., 2007), and

even confirm the activity of various pathways (Crown et al., 2011; You et al., 2014; Young

et al., 2011). Earlier E. coli mapping models (Shimizu, 2004; Zhao and Shimizu, 2003) did

not use a biomass equation; instead, they included specific drains proportional to the

specific growth rate (Hua et al., 2003) to account for biomass formation (Holms, 1996).

Newer models use a defined biomass equation obtained from macromolecular composition

of a wild-type strain (Antoniewicz et al., 2007). While the use of a biomass equation in

MFA constrains metabolism by imposing specific requirements on macromolecule

precursors from central metabolism, it neglects the contribution of the soluble pool and the

energetic requirements that are part of a genome-scale model. The use of cofactor balances

in MFA, limited to newer prokaryotic models (Bonarius et al., 1998; van Gulik and

Heijnen, 1995), can sharpen the reaction bounds involved in energy metabolism. However,

such MFA models assume that the cell is purely biosynthetic neglecting possibly active

pathways such as gluconeogenesis and amino acid degradation. MFA models for

organisms such as Synechocystis (Young et al., 2011), CHO cells (Ahn and Antoniewicz,

2011), and hybridoma cell lines (Murphy et al., 2013) have so far omitted cofactor

balances. Nevertheless, it has been previously shown that neglecting potentially active

reactions that contribute to cofactor balances can alter the estimated flux ranges using 13C

MFA (Bonarius et al., 1998). The key advantage of using a genome-scale model in MFA

is that it represents the totality of reactions that can be carried out by the organism avoiding

any biases introduced by lumping reactions or omitting pathways pre-judged as non-

functional. An earlier MFA study on a larger metabolic model of E. coli (Suthers et al.,

2007) proposed the possibility of the integration of a number of non-central pathways, and

found that nearly half of the fluxes were fixed by stoichiometry alone. Here we take the

next step by making use of a genome-scale model for flux elucidation thereby avoiding

any pre-conceived assumptions about which pathways should be active or inactive.

Mapping models used for MFA typically include fewer than 10% of the reactions contained

within a genome-scale model. Flux ranges obtained using 13C MFA have been used

extensively to test the validity of genome-scale models (Chen et al., 2011; Dash et al.,

2014; Saha et al., 2012). However, this transfers the assumptions used in the construction

of MFA models to the GSM model, thereby providing a solution space which may be more

constrained than what the labeling data supports. On the other hand, GSMs are generally

analyzed using methods such as Flux Balance Analysis (Varma and Palsson, 1994), Flux

Variability Analysis (Mahadevan and Schilling, 2003), and MOMA (Segre et al., 2002).

Often, the predicted metabolic phenotypic space is quite large with split ratios and cycles

poorly resolved. 13C MFA at a genome-scale holds the promise of resolving split ratios

and cycles while avoiding making any assumptions about which pathways should be active.

As a result, it can identify the activity of all degradation pathways which are generally

neglected by existing mapping models, impose detailed cofactor balances, generate

unbiased confidence intervals for all fluxes within the network, provide insight into which

fluxes can or cannot be resolved using 13C labeling data (i.e., identifiability problem),

maintain consistency with a comprehensive biomass equation describing metabolite

demands for macromolecule biosynthesis, soluble pool, and experimentally measured

energy demands, and even accurately predict the impact of genetic modifications which

are essentially unresolvable by constraint-based modeling techniques (Copeland et al.,

2012).

Successful 13C-MFA at a genome-scale requires a reliable GSM model along with detailed

atom maps of every reaction in the network. Atom mapping information for central

metabolism reactions is readily available from biochemistry textbooks. For other pathways,

online databases such as KEGG (Latendresse et al., 2012), MetaCyc (Korner and

Apostolakis, 2008), or MetRxn (Kumar et al., 2012) are useful resources. MetRxn includes

reaction mapping information for over 27,000 reactions generated using a novel sub-

structure search algorithm known as Canonical Labeling for Clique Approximation

(CLCA) (Kumar and Maranas, 2014) which offers improved accuracy and memory

utilization over existing heuristic algorithms. The approach utilizes number theory to

generate unique ids for each atom followed by a maximum common substructure search.

The MetRxn database contains atom mapping information for reactions from 112

metabolic models including iAF1260 directly downloadable from

http://www.metrxn.che.psu.edu/.

In this study we carry out estimation of flux ranges for E. coli using both a core mapping

model (Leighty and Antoniewicz, 2013) and the genome-scale model iAF1260 (Feist et

al., 2007) using measured fluxes and 13C labeling data as constraints. The GSM model is

refined further by imposing measured extracellular fluxes as constraints and then

performing flux variability analysis (FVA) to identify the part of the network that can carry

non-zero fluxes. Subsequently, active sub-networks of different size are generated by

decomposing the network using the EMU algorithm (Antoniewicz et al., 2007). The fluxes

are estimated by solving a least squares problem involving the minimization of the sum of

squares of difference between predicted metabolite labeling and experimentally observed

metabolite labeling patterns. 95% confidence intervals are generated by varying the fluxes

individually until the minimum sum of squares exceeds a pre-defined threshold. We

demonstrate how a combination of the constraints involved in FBA and 13C-MFA can be

used in a concerted manner to effectively resolve fluxes through key branch points such as

the oxidative pentose phosphate pathway, the Entner-Doudoroff pathway, and the

glyoxylate shunt. Our results allude to the possibility of coexistence of anabolic and

catabolic reactions and bypasses resulting in expanded ranges for many reactions which

were previously reported to be precisely inferred. They also shed light on the inability of

MFA alone to resolve alternate pathways and energy metabolism when the entirety of

metabolic reactions implied by the GSM model is used. Surprisingly, we found that results

are largely insensitive to biomass composition fluctuations as the experimental error in the

labeling data is the dominant source of prediction uncertainty. The impact of using a core

model for MFA is quantitatively assessed by contrasting the corresponding flux ranges. In

addition, the loss of information when fluxes derived from MFA in the core metabolic

model are directly ported on a GSM model is assessed and discussed.

2.2. Methods

2.2.1. Genome-scale atom mapping model

The genome-scale metabolic model of E. coli (Feist et al., 2007) consisting of 2,382

reactions and 1,670 metabolites was pruned by eliminating reactions incapable of carrying

flux for the bioprocess data measured by Leighty et al. (Leighty and Antoniewicz, 2013).

The model was further simplified by manually eliminating thermodynamically infeasible

cycles disjoint from the metabolic network. The resultant model has 697 reactions (29

reversible reactions) and 595 metabolites with glucose as the sole carbon source. Examples

of reactions eliminated from the model include uptake systems of other carbon sources,

beta-oxidation, and nucleotide salvage pathways in agreement with Suthers et al. (Suthers

et al., 2007). Atom mapping information for the core model was obtained from the study

conducted by Leighty et al. (Leighty and Antoniewicz, 2013). Atom mapping information

for the genome-scale model was obtained using the CLCA algorithm (Kumar and Maranas,

2014).

2.2.2. Flux estimation procedure

Network decomposition was accomplished using the EMU algorithm that relates the

labeling pattern of the input tracer and a flux distribution to a labeling pattern of all

analyzed intracellular metabolites. Fluxes were estimated by solving a non-linear least

squares problem described in detail in Appendix A. This problem minimizes the variance-

weighted sum of the squares of differences between the predicted and experimentally

observed labeling patterns for 18 fragments from 10 different intracellular amino acids

subject to flux non-negativity. Glucose labeled at the second carbon with 99.5% purity was

used as the tracer input in the analyzed dataset (Leighty and Antoniewicz, 2013) as it was

found to best resolve oxidative PPP. All other carbon atoms were assumed to contain the

heavy isotope of carbon at natural abundance. The least squares objective

function 𝜑 (see Appendix A) depends on the subset of fluxes (𝒘) present in the EMU

model which is a subset of all fluxes (𝒗) in the S-matrix of the metabolic model. Since the

system of component balance equations describing the metabolic network is

underdetermined, the set of fluxes 𝒗 can be expressed in terms of the free fluxes 𝒖 by

means of a null-space decomposition (Antoniewicz et al., 2006). Consequently, the set of

fluxes describing the EMU model 𝒘 and the objective function 𝜑 can also be expressed in

terms of the subset of free fluxes 𝒖. This allows for the estimation of all the fluxes within

the metabolic network by the resolution of the free fluxes 𝒖. The problem described in

Appendix A was solved using the fmincon function from the optimization toolbox of

MATLAB. A user-supplied Hessian matrix was provided for the interior point algorithm

(Byrd et al., 2000; Byrd et al., 1999; Waltz et al., 2006) of fmincon using the procedure

introduced by Antoniewicz et al. (Antoniewicz et al., 2006). Given the nonconvex nature

of the objective function, the problem was solved 100 times and the best solution was

selected as the optimal flux distribution candidate. This solution was selected as the optimal

flux distribution for further analysis only if it satisfied two criteria: the optimization

problem converged to the same solution at least 70 times out of 100 runs and the obtained

flux distribution was unaffected by local perturbations. All fluxes (mmol/dmol-glc) are

reported using 100 mmol of glucose uptake per gram dry cell weight as the basis. Fluxes

were also estimated with the amino acid MS data obtained using glucose labeled at the fifth

carbon as well to clearly identify loss of resolution due to model scale-up.
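Two ingredients of the procedure above, the null-space parameterization of fluxes and the variance-weighted least-squares objective, can be illustrated with a minimal Python sketch. The three-reaction network, the matrix `N`, the helper names, and all numerical values are hypothetical and are not taken from the models used in this study; the full formulation is the one given in Appendix A.

```python
def matvec(matrix, vector):
    """Plain-Python matrix-vector product."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# Hypothetical network: v1 produces B, v2 and v3 consume B.
# The single row of S is the steady-state balance on B: v1 - v2 - v3 = 0.
S = [[1, -1, -1]]

# Columns of N span the null space of S, so v = N u satisfies S v = 0
# for any choice of the free fluxes u = (v2, v3).
N = [[1, 1],
     [1, 0],
     [0, 1]]


def fluxes_from_free(u):
    """Recover all fluxes v from the free fluxes u."""
    return matvec(N, u)


def weighted_ssr(predicted, measured, std_dev):
    """Variance-weighted sum of squared residuals between labeling patterns."""
    return sum(((p - m) / s) ** 2 for p, m, s in zip(predicted, measured, std_dev))


v = fluxes_from_free([60.0, 40.0])  # [100.0, 60.0, 40.0]
print(matvec(S, v))                 # [0.0], the balance on B is satisfied

# Hypothetical predicted and measured MDVs with a measurement error of 0.01.
ssr = weighted_ssr([0.63, 0.25, 0.12], [0.60, 0.27, 0.13], [0.01, 0.01, 0.01])
# ssr is approximately 14 (3**2 + 2**2 + 1**2)
```

Optimizing over the free fluxes rather than over all fluxes removes the stoichiometric equality constraints from the problem, leaving only bounds such as flux non-negativity for the solver to handle.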

2.2.3. Confidence intervals

The underdetermined nature of the metabolic network could result in multiple flux

distributions with the same labeling pattern. Furthermore, metabolite labeling

measurements are inherently noisy introducing error in the data that further contributes to

metabolic flux inference uncertainty. To this end, we estimated the lower and upper bounds

of the 95% confidence interval of each flux such that the sum of squares of residuals (SSR)

is within 3.84 of the minimum SSR. The value 3.84 corresponds to the 𝜒² statistic for a

p-value of 0.05 and one degree of freedom. The lower and upper bounds were estimated

using an iterative procedure (Antoniewicz et al., 2006) where every flux 𝑣𝑗 is successively

varied up (or down for lower bound) and a new best flux distribution is re-calculated. The

upper (or lower) bound for 𝑣𝑗 defining the 95% confidence interval corresponds to the

value that renders the difference between the re-calculated and original SSRs equal to

3.84. Since the genome-scale model contains a large number of measurement-coupled

reactions, a flux coupling analysis (Burgard et al., 2004) is performed to identify the list of

all reactions coupled to an extracellular measurement (i.e., 411 out of a total of 697

reactions). These reactions are assigned a range consistent with the extracellular

measurement variance to which they are fully coupled. In addition, a flux coupling analysis

between every reaction pair in the metabolic model further reduces the number of reactions

whose range needs to be estimated based on the procedure described above. We identified

250 coupled reaction pairs in the EMU-balanced network implying that there were only

186 remaining reaction fluxes whose confidence intervals needed to be directly assessed.

Additional flux range reduction was achieved by successively performing FVA

(Mahadevan and Schilling, 2003) using the obtained 95% confidence level ranges as flux

bounds. The technical details of the implementation of 13C-MFA for GSM

models are described in Appendix A.
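
The confidence interval procedure can be illustrated with a minimal sketch, assuming a toy straight-line model in place of the EMU network. The data, the known measurement error `sigma`, and all function names are hypothetical; the only element taken from the text is the threshold of 3.84 on the increase in SSR. One parameter is stepped away from its optimum, the remaining parameter is re-optimized, and bisection locates the value at which the SSR increase reaches the threshold.

```python
import math

# Hypothetical data for a straight-line model y = a + b * x with known sigma.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1]
sigma = 0.2

def profile_ssr(b):
    # Re-optimize the remaining parameter a with b held fixed (closed form here).
    a = sum(y - b * x for x, y in zip(xs, ys)) / len(xs)
    return sum(((y - a - b * x) / sigma) ** 2 for x, y in zip(xs, ys))

def fit_slope():
    # Ordinary least-squares slope and the sum of squared x deviations.
    xm = sum(xs) / len(xs)
    ym = sum(ys) / len(ys)
    sxx = sum((x - xm) ** 2 for x in xs)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    return sxy / sxx, sxx

def bound(b_hat, direction, threshold=3.84, tol=1e-10):
    # Move b away from its optimum until the SSR increase crosses the
    # threshold, then bisect to locate the crossing point.
    ssr_min = profile_ssr(b_hat)
    lo, hi = 0.0, 1.0
    while profile_ssr(b_hat + direction * hi) - ssr_min < threshold:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if profile_ssr(b_hat + direction * mid) - ssr_min < threshold:
            lo = mid
        else:
            hi = mid
    return b_hat + direction * 0.5 * (lo + hi)

b_hat, sxx = fit_slope()
b_lower = bound(b_hat, -1.0)
b_upper = bound(b_hat, +1.0)
```

For this linear toy problem the interval half-width coincides with the analytical value `sqrt(3.84) * sigma / sqrt(sxx)`, which provides a quick consistency check; no such closed form exists for the nonlinear 13C-MFA problem.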

2.3. Results

2.3.1. Active EMU network

Decomposition of the genome-scale network using the EMU algorithm resulted in EMU

sub-networks of sizes 1 through 9 (i.e., number of carbons in the EMU fragment). The

network consisted of 1,400 balanced EMUs and 3,526 EMU reactions, spanning 432 out

of the 726 fluxes in the GSM model. Of the 3,526 EMU reactions, 1,405 reactions were

duplicates, contributed by redundant mappings. In comparison, the core model consists of

310 balanced EMUs and 863 EMU reactions with 181 duplicates, spanning 80 out of the

100 fluxes in the model. It is interesting to note that a nine-fold scaling up of the mapping

model resulted in only a five-fold increase in the number of EMUs, a four-fold increase in

the number of EMU reactions, and a three-fold increase in the number of unique EMU

reactions. This moderate increase is because only 256 out of 595 metabolites (present in

426 out of 726 fluxes) are required to predict the experimentally observed labeling patterns.

65% of all the fluxes involved in EMU balances are from central metabolism and amino

acid metabolism. The remaining 35% result from the contributions of cofactor

biosynthesis, lipid biosynthesis, and nucleotide biosynthesis, accounting for novel carbon

transformations absent in the core model. In comparison, the core EMU network includes

all reactions from central metabolism and a limited number of reactions from amino acid

metabolism including Serine hydroxymethyltransferase (SHMT), glycine cleavage system,

and threonine aldolase. The GSMM model sheds light on novel carbon transformations,

alternate atom mapping pathways and redundant atom maps, which provide the means to

explain the experimentally measured labeling patterns better and highlight the role of

assumptions involved in flux estimation using the core model.

The novel carbon transformations allowed by the pathways outside central and amino acid

metabolism in the GSM model stem from the production of molecules such as CO2,

glycolaldehyde, and formate, which are eventually recycled into central metabolism. CO2 is

produced as a by-product of the synthesis of several cofactors and trace metabolites such

as coenzyme A, thiamine pyrophosphate, heme, pyridoxal phosphate, menaquinol 8, and

NAD. In addition, CO2 is produced by the decarboxylation of phosphatidylserine to

phosphatidylethanolamine. The core model only accounts for CO2 production and consumption within

the central metabolic reactions. Similarly, formate is absent in the core model, whereas it

is produced by the degradation of formyl-tetrahydrofolate, and biosynthesis of

tetrahydrofolate, riboflavin, and thiamine pyrophosphate in the GSM model. Glycolaldehyde,

which is absent in the core model, is produced as a by-product of tetrahydrofolate

biosynthesis in the GSM model. Since the FVA flux ranges for these reactions have

non-zero lower bounds, they provide quantitative evidence that these often-ignored

transformations play a role in explaining the observed labeling data.

The GSMM model also traces alternate routes of existing pathways in the core model. For

example, the production of succinate from α-ketoglutarate occurs only through the TCA

cycle in the core model, whereas the GSMM allows for two additional routes: the

degradation of glutamate through the γ-aminobutyrate pathway and the degradation of

arginine. The key difference between these three pathways is the energy output. The TCA

cycle route produces one NADH and one ATP, whereas the γ-aminobutyrate pathway route

produces only one NADH and the arginine pathway produces one NADH but requires one

ATP to recycle the consumed acetyl-CoA. Since the atom transitions for succinate are

identical for all three pathways, 13C-MFA will fail to resolve fluxes between these

pathways; the core model therefore arbitrarily apportions the entire flux towards succinate

through the TCA cycle. Resolution between the three parallel pathways can be achieved

only after cofactor balances are included. A similar diversity of metabolic routes implied

by the GSM model can be seen with the production of pyruvate from succinate. The core

model only includes the pathway involving malic enzyme whereas the GSM model

accounts for an additional route through propionyl-CoA with an identical energy output

implying non-identifiability between the two alternatives. Such alternate routes with

identical atom mapping are ubiquitous in GSM models, resulting in non-identifiability

or poor resolution of fluxes that misleadingly appear well resolved in the core

model.

Multiple metabolic reactions are sometimes described by exactly the same EMU reaction

leading to non-identifiability. We identified 195 such instances among the 726 reactions in

the GSM model. The complete list of EMU reactions consisted of 122 duplicate reactions,

85 triplicate reactions, 27 quadruplicate reactions, and 48 EMU reactions describing

identically five or more reactions from the genome-scale model. The source of this

redundancy can be traced back to three factors: (i) isozymes with different cofactors, (ii)

alternate reactions facilitating the same atom transfer, and (iii) group transfer reactions such

as transaminases. For example, isozymes of Malic Enzyme catalyze the oxidation of malate

to pyruvate using either NAD or NADP as a cofactor. Thus, the EMU reaction describing

the atom transfer reaction from malate to pyruvate is identical for both isozymes. The

elongation of the fatty acid chain releases one CO2 from malonyl-ACP in the first step of

the cycle. As a result, the corresponding EMU reaction occurs eleven times accounting for

all the fatty-acid chain elongation reactions. The conversion of glutamate to α-ketoglutarate

occurs through the transaminase reactions for which glutamate is the amino group donor.

The GSM model contains 18 different aminotransferase reactions thereby resulting in

multiple EMU reactions producing α-ketoglutarate from glutamate. Similarly, the EMU

reaction describing the carbon transfer from ATP to ADP (or AMP) arises in an identical

manner for up to 94 reactions in the GSM model. Therefore, all GSM reactions that map

to the same EMU reaction cannot be resolved by MFA alone as only the sum of their

respective flux values is constrained.
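
One way to detect such redundancy is to group reactions by their EMU reaction signature, as in the minimal sketch below. The reaction identifiers and signature strings are hypothetical placeholders chosen for illustration; they are not entries from the GSM model.

```python
from collections import defaultdict

# Hypothetical reaction list: (reaction id, EMU reaction signature).
emu_reactions = [
    ("ME_NAD", "mal[123] -> pyr[123]"),
    ("ME_NADP", "mal[123] -> pyr[123]"),
    ("TA_1", "glu[12345] -> akg[12345]"),
    ("TA_2", "glu[12345] -> akg[12345]"),
    ("TA_3", "glu[12345] -> akg[12345]"),
    ("PGI", "g6p[123456] -> f6p[123456]"),
]

def group_by_signature(reactions):
    # Collect the ids of all reactions that share an identical EMU reaction.
    groups = defaultdict(list)
    for rxn_id, signature in reactions:
        groups[signature].append(rxn_id)
    return dict(groups)

def unresolvable_groups(reactions):
    # Signatures shared by two or more reactions: labeling data constrain
    # only the summed flux of each such group.
    grouped = group_by_signature(reactions)
    return {sig: ids for sig, ids in grouped.items() if len(ids) > 1}

shared = unresolvable_groups(emu_reactions)
```

Each group returned in `shared` would be treated as a single lumped flux in the regression, with the individual members separated, if at all, by cofactor or energy balances.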

2.3.2. Flux identifiability and statistical validity of the model

Because the EMU network spans a much smaller fraction of the GSM model, the number

of χ² degrees of freedom (DOF) for the regression model differs significantly depending

on whether it is defined with respect to the entire GSM model or with respect to the EMU

network alone. Table 2.1 shows the comparison of the χ² degrees of freedom for the core

model and the GSM model. The number of DOF is defined as the difference between the

number of data points (measured fluxes and metabolite mass fractions) and the number of

free variables in the network (EMU and GSM networks). Statistical significance of

inference requires that the estimated minimum sum of squares of residuals (SSR) be within

an expected range determined by the confidence level and the number of DOF for the

model.

For the given set of labeling data, the core model has 55 DOF. The core model contains 20

reactions (20% of all the fluxes within the metabolic network) that do not provide EMU

balances of which 15 are fully coupled to an extracellular flux measurement. Since the

number of variables outside the EMU balances included in the least-squares regression

model is very small, the impact on the expected sum of squares of residuals (SSR) is

minimal, thereby ensuring that it is safe to use the entire metabolic network as the

regression model. With a non-negative DOF, the core model seldom encounters issues with

statistical validity of the estimated fluxes. In contrast, the DOF for the GSM model is -30 after

a cursory analysis. This is because the number of reactions unaccounted for by EMU balances

for the GSM is considerably larger causing the number of free fluxes in the entire metabolic

network to exceed the number of available data points. A negative DOF for the GSM model

would imply over-fitting and lack of statistical significance. However, the EMU network

only spans about 60% of the metabolic network. Analyzing the EMU network associated

with the GSM model revealed that the regression model has 27 DOF. This non-negative

value of the DOF arises due to a 40% reduction in the number of variables (i.e., free fluxes)

and a large increase in the number of fluxes coupled to an extracellular flux measurement.

It was found that 256 out of the 595 balanced metabolites were involved in EMU balances.

Of these, 214 metabolites were completely balanced by reactions involved in EMU

balances, meaning that they did not feed into non-EMU pathways. Of the 42 metabolites

feeding into other pathways, 28 of them were consumed by measurement-coupled

pathways only, indicating that only 14 metabolites required additional equality constraints

to be considered balanced EMU metabolites. Therefore, the actual EMU model contains

439 reactions and 242 balanced metabolites. Further reduction of the model by elimination

of measurement-coupled reactions revealed that there are only 99 free fluxes. In

comparison, the entire metabolic network has 274 free fluxes. Therefore, 13C-MFA can

be performed on this model if there are at least 99 data points. Since the data set used in

this analysis contains 126 data points, the least-squares fitting approach can be safely

applied to obtain statistically acceptable fits. Also shown in Table 2.1 is the maximum

allowed SSR for an accepted fit with 95% confidence. An increase in the number of

variables causes a reduction in the degrees of freedom, as a result of which the maximum

allowed SSR for 95% confidence is reduced for the GSM model.
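
The relationship between data points, free fluxes, and the acceptable SSR can be sketched as follows. The counts of 126 data points, 99 free fluxes, and 55 DOF for the core model are taken from the text. Everything else is an assumption made for illustration: the Wilson-Hilferty formula is an approximation to the χ² quantile used to keep the example dependency-free, the two-sided range is one common convention, and the function names are hypothetical. The values it produces are not the entries of Table 2.1.

```python
from statistics import NormalDist

def chi2_quantile(p, dof):
    # Wilson-Hilferty approximation to the chi-square quantile (an approximation,
    # used here only to avoid a dependency on a statistics library).
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3

def degrees_of_freedom(n_data_points, n_free_fluxes):
    # DOF = number of data points minus number of free variables.
    return n_data_points - n_free_fluxes

def ssr_acceptance_range(dof, confidence=0.95):
    # Two-sided range within which the minimized SSR is expected to fall.
    alpha = 1.0 - confidence
    return chi2_quantile(alpha / 2.0, dof), chi2_quantile(1.0 - alpha / 2.0, dof)

dof_emu = degrees_of_freedom(126, 99)  # counts quoted in the text
ssr_low_emu, ssr_high_emu = ssr_acceptance_range(dof_emu)
ssr_low_core, ssr_high_core = ssr_acceptance_range(55)  # core model DOF from the text
```

Because the upper limit grows with the DOF, the model with fewer degrees of freedom has the smaller maximum allowed SSR, which is the trend stated above.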

2.3.3. Flux and range estimation at the genome-scale

Flux elucidation using the GSMM model predicts the experimentally observed MS data

better than the core model (Figure 2.1). This improved prediction is attributed to the

improved prediction of alanine and valine MS measurements with minimal changes to the

quality of prediction of other labeling patterns. Improvements for alanine and valine are

due to changes in the estimated fluxes in central metabolism (Figure 2.2b, 2.2c, 2.2d and

2.2e). The inclusion of alternate pathways, new carbon mapping information, and complete

metabolite and cofactor balances in the GSMM model results in significant changes in the

PPP and wider flux ranges for reactions in glycolysis and the TCA cycle. The ED pathway

and glyoxylate shunt flux ranges remained similar in both the core model and GSM model.

Identical trends were observed upon analysis with a glucose tracer labeled at the fifth

carbon (Figure 2.5).

Among the glycolytic reactions (Figure 2.2a and 2.2b), flux through PGI was unaffected.

However, the remaining reactions had expanded flux ranges due to the inclusion of

gluconeogenesis and alternate pathways of pyruvate metabolism. An unaffected PGI flux

range indicates that 13C-MFA using a GSMM model is capable of resolving the

glycolysis/PPP split ratio despite the large increase in the number of reactions. Expanded

flux ranges for TPI, GAPD, PGM, and ENO arise from the presence of an alternate pathway

from dihydroxyacetone phosphate to pyruvate through methylglyoxal. However, the

alternate methylglyoxal pathway involves a different energy balance yielding no ATP and

less NADH than the EMP pathway, thus limiting its upper bound to 20 mmol/dmol-glc. As

a consequence of this, the glycolytic reactions have a non-zero lower bound of 65

mmol/dmol-glc. Both the lower and upper bounds of PYK are altered significantly. The

lower bound of PYK drops to 5 mmol/dmol-glc. Two factors contribute to this reduction:

(i) the availability of the phosphotransferase system (PTS) for glucose uptake as an

alternative to PYK, and (ii) a significant flux through the anaplerotic reaction PPC which

serves to replenish TCA metabolites. The inferred non-zero lower bound for PYK suggests

that the alternate pathways (methylglyoxal and PTS) can only carry a fraction of the flux

in lower glycolysis. The upper bound of PYK increases to 141 mmol/dmol-glc due to the

presence of a futile cycle with PPS (carrying a maximum flux of 20 mmol/dmol-glc)

resulting in the hydrolysis of one ATP per unit flux through this cycle. A similar effect was

also observed with the phosphorylation of glucose where G6PP can carry at most 20

mmol/dmol-glc of flux, thus increasing the upper bound of the PTS and HK reactions by

the same amount. The impact of gluconeogenesis is manifested in the flux range of PFK.

A decreased lower bound of this reaction compared to the core model is due to the reduced

contribution of the non-oxidative PPP towards fructose-6-phosphate production. The

increased upper bound of this reaction is due to the activity of the FBP reaction from

gluconeogenesis. With the upper bound of the FBP reaction limited to 8% of the total

glucose input to the network, the upper bound of PFK is increased by the same amount to

account for this futile cycle, which is fully resolved in the GSMM.
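
The effect of a futile cycle on the bounds of the forward reaction can be illustrated with simple interval arithmetic. In this minimal sketch the net flux range is hypothetical, while the cap of 8 mmol/dmol-glc on the reverse reaction follows the 8% limit on FBP stated above. Adding the intervals term by term ignores any correlation between the net and reverse fluxes, so it is a simplification rather than the FVA calculation used in this work.

```python
def forward_range(net_range, reverse_range):
    # forward = net + reverse, so interval bounds add term by term.
    net_lb, net_ub = net_range
    rev_lb, rev_ub = reverse_range
    return net_lb + rev_lb, net_ub + rev_ub

# Hypothetical net (PFK minus FBP) range; the FBP cap of 8 comes from the text.
pfk_range = forward_range((70.0, 80.0), (0.0, 8.0))
```

The lower bound of the forward reaction is unchanged, whereas its upper bound rises by exactly the maximum flux of the reverse reaction.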

Among the monophosphate shunts (PPP and ED pathways; Figure 2.2a and 2.2e), G6PDH

and GND reactions showed a small difference between the core and GSMM models

to account for the increased glucose-6-phosphate demand for glycogen synthesis in the

genome-scale model. In contrast, a significant shift was observed in the non-oxidative PPP

reactions: TKT1, TKT2 and TALA. Both the lower and upper bounds of these reactions

were reduced by a factor of 2.5 mmol/dmol-glc, implying reduced carbon flux through this

pathway. This is a consequence of increased drains for biomass components R5P, S7P, and

E4P in accordance with the biomass composition in the GSM model. Specifically, S7P was

diverted towards lipopolysaccharide biosynthesis, R5P was shunted towards

tetrahydrofolate biosynthesis and nucleotides, and E4P was used in pyridoxal phosphate

and aromatic amino acid synthesis. Under the stated experimental growth condition, these

drains amounted to 1%, 0.2%, and 0.04% of the total glucose uptake for S7P, R5P, and

E4P, respectively. In addition, some Ru5P was diverted for lipopolysaccharide

biosynthesis amounting to a drain of 1% of the total glucose uptake. In contrast, the core

model only contains drains for R5P for nucleotide biosynthesis and E4P for aromatic amino

acid biosynthesis thereby predicting higher fluxes through the non-oxidative branch of the

PPP. These differences arise from the fact that the biomass equation for the core model

neglects the soluble pool and other cell wall components, which constitute up to 12% of

the cell dry weight (Long and Antoniewicz, 2014).

Loss of flux identifiability when scaling up to the GSMM was manifested in the TCA cycle

and associated fluxes (Figure 2.2a and 2.2d). This inability to resolve fluxes was primarily

due to the presence of various alternate pathways between metabolites. Flux through PDH

was lower due to a required flux through pyruvate oxidase (POX) which converts pyruvate

directly to acetate. In fact, because acetyl-CoA can be fully produced using the ACK

reaction, PDH can be bypassed completely. The lower bound of AKGDH decreased

to zero due to the presence of multiple alternate pathways between glutamate and succinate.

The conversion of glutamate to succinate via γ-aminobutyrate and γ-glutamylsuccinate

showed similar flux ranges as AKGDH indicating the inability of 13C-MFA to resolve

between these alternative pathways. Another pathway contributing to the expanded

AKGDH range was the degradation of arginine which had a non-zero lower bound to

account for the production of biomass components, putrescine and spermidine. Expansion

of FUM and MDH flux ranges was due to the presence of amino group transferring

mechanisms in the arginine and purine biosynthetic pathways which remove the amino

group from aspartate to produce fumarate. The lower bound of MDH was as low as zero

due to the presence of an alternate MDH (MQO) which uses ubiquinone as the electron

acceptor. Despite the presence of wide ranges at the individual flux level, the sum total of

all alternate pathways for a particular reaction resulted in similar bounds as in the core

model. The glyoxylate shunt was equally well resolved in the GSMM model as in the core

one but with an expanded repertoire of functions. The production of glycolaldehyde as a

by-product of tetrahydrofolate biosynthesis and the eventual conversion of glycolaldehyde

to glyoxylate resulted in a partial glyoxylate shunt activity in which MALS was active with

a non-zero lower bound.

Reactions that were insensitive to the 13C labels and thus outside EMU balances are

indirectly resolved by component balance constraints imposed by flux ranges of EMU-

resolved reactions. Such reactions include a majority of cofactor biosynthesis pathways,

lipid biosynthesis, pyrimidine biosynthesis, and energy metabolism. Figure 2.3 shows the

flux ranges corresponding to oxidative phosphorylation, NADH transhydrogenase, and

total free ATP within the network. Oxidative phosphorylation was well resolved as the

oxygen uptake limits total flux through this pathway. The presence of pathways that

convert NADPH to NADH at the expense of ATP results in a finite upper bound for the

transhydrogenase due to ATP limits. A negative lower bound for this reaction indicates

that the direction of this reaction could not be resolved by 13C-MFA. The lower bound of

ATPM is 8.39, which matches exactly the non-growth associated ATP maintenance

requirement in iAF1260. The upper bound of available ATP predicted using the GSMM

model was far lower than that of the core model due to the fact that the GSM model globally

accounts for all ATP requirements. The flux range of excess ATP availability predicted

using the GSM model was far narrower than that of the core model because the core model

only accounts for quantifiable (i.e., polymerization and biosynthesis of macromolecules)

ATP costs, which only amount to 56% of the total growth-associated requirement (Feist

et al., 2007). A side product of the well resolved ATP balance is that the activity of futile

cycles is highly constrained and can thus be resolved by FVA.

Reaction flux resolution by 13C-MFA requires that there exist distinct atom transition

profiles between alternatives. This property affects the resolvability of gluconeogenesis,

ED pathway and the glyoxylate shunt. The flux through gluconeogenesis can be estimated

by the intersection of the ranges of three reactions: PPS, FBP, and G6PP which reverse

reactions PYK, PFK, and HK, respectively, from glycolysis. FBP alone provides a sharp

estimate for gluconeogenesis as it is the only well resolved flux out of the three. G6PP is

unresolvable despite the fact that glucose-6-phosphate and glucose have distinct labeling

patterns because the labeling pattern of intracellular glucose is not measured. In addition,

an inactive malic enzyme results in identical phosphoenolpyruvate and pyruvate labeling

patterns thereby rendering PPS unresolvable. The reversibility of TPI and FBA results in

altered labeling patterns of fructose-1,6-bisphosphate compared to fructose-6-phosphate.

An active FBP would alter the labeling pattern of all metabolites within the PPP thereby

affecting the observed labeling patterns of downstream amino acids such as glycine, serine,

alanine, valine, leucine, and isoleucine. This property aids in the resolution of FBP and

thus facilitates the resolution of gluconeogenesis to within 8% of the total glucose uptake.

No information regarding the activity of gluconeogenesis can be gleaned using only the

core model. The ED pathway is equally well resolved using the GSMM as in the core

model due to the fact that it produces a pyruvate molecule with a different carbon atom

arrangement compared to glycolysis. This directly impacts the predicted labeling of alanine

and the branched chain amino acids derived from pyruvate. Similarly, flux through the

glyoxylate shunt produces a differently labeled aspartate when compared to the

conventional TCA pathway.

A closer analysis of the obtained flux ranges provided an insight into the sensitivity of the

obtained flux ranges to biomass composition. Any perturbations to drains from central

metabolism lie within the estimated 95% confidence interval of all the fluxes thereby

rendering flux predictions insensitive to perturbed biomass composition assuming that the

range of perturbation does not exceed 10%. It was found that the size of the obtained flux

ranges was primarily due to errors in extracellular flux measurements. To confirm this, we

re-estimated the fluxes and flux ranges for the same network while allowing a 10% change

to each biomass component individually while maintaining the cell dry weight. The

absence of any significant flux range shift, along with a proportional increase to the drain

of our target biomass component from central metabolism corresponding to the perturbation,

confirmed our hypothesis of insensitivity of flux ranges to biomass composition

perturbation given this data set. Even though fluxes of biomass components changed in

proportion to the perturbation, any additional impact on central metabolism was minimal.
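
The biomass perturbation test can be sketched as follows. The mass fractions are hypothetical, and rescaling the remaining components proportionally is an assumption about how the cell dry weight is held constant; the text specifies only that each component is changed by 10% individually while the dry weight is maintained.

```python
def perturb_biomass(fractions, target, change=0.10):
    # Scale one component, then rescale the rest so the total stays constant.
    total = sum(fractions.values())
    new_target = fractions[target] * (1.0 + change)
    rest = total - fractions[target]
    scale = (total - new_target) / rest
    return {k: (new_target if k == target else v * scale)
            for k, v in fractions.items()}

# Hypothetical mass fractions (g per g dry cell weight), for illustration only.
biomass = {"protein": 0.55, "RNA": 0.20, "DNA": 0.03, "lipid": 0.09, "other": 0.13}

# One perturbed composition per biomass component.
perturbed = {name: perturb_biomass(biomass, name) for name in biomass}
```

Each perturbed composition would then replace the biomass equation coefficients before the fluxes and flux ranges are re-estimated.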

As described earlier, a common practice is to perform MFA using a core model and then

project the identified flux ranges onto a genome-scale model. We performed the same two-

stage implementation and compared the results with MFA using the full GSMM model.

We found that the use of the core model for MFA generates flux ranges that upon mapping

onto the GSM model, propagate all assumptions made during the construction of the core

model. For as many as 90% of the reactions in the GSM model, flux ranges are more restricted

than when the full GSMM is used for MFA. Figure 2.4 shows the distribution of flux range

contraction upon imposing flux ranges derived for the core model to obtain GSM flux

ranges using FVA. For more than half of the reactions in the GSM model, ranges are more

than halved compared to the correct elucidation using the GSMM model. Notably, there

are 295 reactions whose estimated range upon projection to the GSM model was 54%

narrower than supported by the data. This demonstrates in a quantitative manner the fact

that assumptions made during the core model construction propagate onto the GSM model

leading to possibly erroneous conclusions about reaction flux identifiability. Overly tight

flux ranges predicted using the core model often shut down alternate pathways which

confounds reaction essentiality prediction. For example, acetate kinase is essential based

on predictions using core model MFA; however, using GSMM-based MFA, the reaction

is correctly predicted (Baba et al., 2006) as non-essential. Acetate production can be taken

over by the POX reaction which generates ATP by transferring electrons from pyruvate to

the electron transport chain while producing acetate and carbon dioxide. In contrast, for

about 10% of reactions, MFA using the GSMM model leads to tighter flux resolution

compared to the core model. These reactions include energy balance reactions such as

oxidative phosphorylation, which require network-wide resolution of redox balances for

proper resolution. This quantitatively demonstrates the significance of describing

metabolism at the genome-scale and the feasibility of inferring fluxes using MFA at a

genome-scale.
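
The range contraction comparison can be expressed as a fractional narrowing of each projected range relative to the range obtained with the full model. The sketch below uses hypothetical reaction names and ranges; the definition of contraction as one minus the ratio of range widths is an assumption consistent with the description of ranges being "more than halved", not necessarily the exact metric plotted in Figure 2.4.

```python
def contraction(full_range, projected_range):
    # Fractional narrowing of the projected range relative to the full-model range.
    full_width = full_range[1] - full_range[0]
    proj_width = projected_range[1] - projected_range[0]
    if full_width <= 0.0:
        return 0.0
    return 1.0 - proj_width / full_width

# Hypothetical (lower, upper) ranges: full GSMM-based MFA vs. core ranges projected by FVA.
ranges = {
    "R1": ((0.0, 100.0), (40.0, 60.0)),
    "R2": ((10.0, 30.0), (10.0, 30.0)),
    "R3": ((0.0, 20.0), (0.0, 5.0)),
}

contractions = {r: contraction(full, proj) for r, (full, proj) in ranges.items()}
more_than_halved = [r for r, c in contractions.items() if c > 0.5]
```

A contraction of zero means the projection reproduces the full-model range, whereas values close to one flag reactions whose apparent identifiability is inherited from core model assumptions.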

2.4. Discussion

In this chapter, we have applied the framework of 13C-MFA to perform flux and range

elucidation using a genome-scale model. Using available extracellular flux measurement

data, we eliminated blocked reactions and those incapable of carrying flux in the iAF1260

GSM model of E. coli using FVA. The resulting GSM model contained 697 reactions and

595 metabolites with 29 reversible reactions. The corresponding atom mapping model was

generated by integrating the atom mapping information for every reaction in this GSM

model using the CLCA algorithm. Finally, the network was decomposed using the EMU

algorithm to evaluate MIDs of target metabolite fragments for a given flux

distribution and thereby iteratively obtain an optimal flux distribution that best explains

experimentally observed GCMS labeling data. In order to account for degeneracy within

the metabolic network and the inherent error associated with experimental data, we also

evaluated the 95% confidence intervals associated with each flux. Given the

computationally intensive nature of the 13C-MFA procedure, we modified the confidence

interval estimation procedure so as to decrease the computational time by up to 67% by

identifying the minimal set of fluxes whose 95% confidence intervals need to be evaluated

using flux coupling analysis checks. We also redefined the χ² degrees of

freedom to better describe the structural properties of the genome-scale EMU model and

found that our obtained optimal flux distribution was statistically acceptable.

We found that the GSM model is able to produce a better fit compared to the core model

owing to improved prediction of alanine and valine MS data. While the overall flux

distribution remained similar to that of the core model, the comprehensive biomass

equation used in the GSM model resulted in a shifted PPP range. The presence of

gluconeogenesis along with glycolysis created futile cycles for three key reactions, of

which only PFK could be resolved properly using 13C-MFA. The availability of the

methylglyoxal pathway as an alternative to lower glycolysis resulted in a reduction of the

lower bound for TPI, GAPD, PGK, and ENO. PYK experienced significant bound

expansion in the GSM model compared to the core model due to the availability of the PTS

mechanism for glucose uptake as an alternate pathway facilitating the conversion of

phosphoenolpyruvate to pyruvate. The reduced flux through PDH was found to be due to

the activity of POX, which is known to be a key reaction in the growth of E. coli under

aerobic conditions (Abdel-Hamid et al., 2001; Li et al., 2007). The expansion in the TCA

cycle ranges was due to increased glutamate synthesis for biomass production, availability

of arginine and glutamate degradation as alternatives to AKGDH, and the presence of an

additional amino group donation mechanism with aspartate as the donor and fumarate as

the product. Since both evaluated tracer schemes produced consistent results, it is evident

that the inability to resolve certain pathways is not tracer-specific (Figure 2.2 and Figure

2.5), but a property of the GSMM model, indicating the need for additional experimental

data to completely resolve all fluxes in the metabolic network. We found that energy

metabolism was quite well resolved with the amount of free ATP being greatly limited

thereby facilitating the resolution of futile cycles. The transhydrogenase reaction, on the

other hand, was poorly resolved with much uncertainty in its predicted direction. Finally,

we found that utilization of the bounds estimated using the core model resulted in much

smaller flux ranges in as many as 90% of the reactions due to the fact that these bounds

carry with them the assumptions involved in the creation of the core model. The source of

this reduction was found to be the inactivation of alternate pathways such as glutamate

degradation, and futile cycles such as gluconeogenesis, occurring due to tighter mass

balance constraints imposed by the flux ranges estimated using the core model. While such

assumptions do accelerate flux computations by elimination of variables and provide

information on total sum of fluxes through all alternate routes for a given transformation,

they fail to account for flexibility within the network, thus having an adverse impact on

secondary inferences from the estimated flux ranges. As such, this study highlights the

need for the use of more comprehensive models for flux elucidation so as to obtain a better

agreement with experimentally observed data and improved quality of inference. The flux

range expansion cannot be resolved by optimal tracer design alone because it stems from

the presence of alternate pathways with identical overall atom transitions. This loss of

resolution will persist even if every branch point were to be resolved perfectly (e.g., by

COMPLETE-MFA). Therefore, flux and range estimation using alternate glucose-based

tracer schemes (Crown et al., 2015) or simultaneous fitting of multiple data sets (Leighty

and Antoniewicz, 2013) would be unable to resolve this ambiguity in alternate pathway

activity, as confirmed by the evaluation of multiple tracer schemes in this study.

A major challenge with MFA using GSM models is the increase in computation time

associated with the vast increase in the number of variables. An effective way to address

this issue would be to simplify the genome-scale MFA model to the size of a (near) core

model. The strategy here is to decrease the number of variables while retaining all the

information regarding novel mappings and alternate pathways contained within the GSM

model. An elementary reduction method for simplification of linear pathways at the EMU

network level has already been proposed (Antoniewicz et al., 2007), but not yet

implemented. Analysis of the EMU model has already revealed that only 60% of all the

reactions contained within the GSM model can be resolved using 13C-MFA. We have also

seen that the actual regression model described by EMU balances has only 99 free

variables. As such, the decoupling of the EMU network from the entire GSM model alone

can result in a 63% reduction in the number of free fluxes. Further reduction of EMU

networks can be achieved by elimination of linear pathways, so as to obtain the simplest

possible EMU network of each size. The reduction of model size will help alleviate some

of the complexity associated with inclusion of intracellular compartments.
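As an illustration of the linear-pathway simplification mentioned above, the following minimal Python sketch contracts nodes that have exactly one predecessor and one successor in a directed graph. It is only a sketch under stated assumptions: the edge-list representation, the function name `contract_linear_nodes`, and the toy network are hypothetical, and this is not the published EMU reduction procedure of Antoniewicz et al. (2007).

```python
def contract_linear_nodes(edges, keep=()):
    """Remove every node with exactly one incoming and one outgoing edge.

    edges: iterable of (source, target) pairs (hypothetical toy format).
    keep:  nodes that must survive, e.g. measured metabolites.
    Returns a new set of edges in which each removed node is bypassed.
    Parallel edges created by a contraction are merged because a set is used.
    """
    edges = set(edges)
    changed = True
    while changed:
        changed = False
        nodes = {n for edge in edges for n in edge}
        for node in sorted(nodes):
            if node in keep:
                continue
            incoming = [e for e in edges if e[1] == node]
            outgoing = [e for e in edges if e[0] == node]
            if len(incoming) == 1 and len(outgoing) == 1:
                src, dst = incoming[0][0], outgoing[0][1]
                if src == node or dst == node:
                    continue  # skip self-loops
                edges -= {incoming[0], outgoing[0]}
                edges.add((src, dst))
                changed = True
                break  # recompute degrees after each contraction
    return edges


# Toy network: A -> B -> C -> D is linear, while D also receives flux from E.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "D")]
reduced = contract_linear_nodes(toy_edges, keep={"A", "D", "E"})
print(sorted(reduced))
```

In this toy case the intermediates B and C are bypassed, while the branch point at D is retained because it has two incoming edges.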

Better resolution of fluxes using a GSM model requires an optimal tracer design and the

maximum set of extracellular flux measurements. It has already been shown that no single

tracer is sufficient to resolve all the fluxes within a metabolic network, and that different

tracers promote the resolution of specific branch points (Leighty and Antoniewicz, 2013).

While MFA using multiple datasets does work effectively on a core model, it faces the

same problems associated with scale-up at the genome-scale. This can be overcome by

designing an isotope-labeling experiment using an optimal set of tracers and GC-MS

measurements, which can reliably resolve all the key branch points in the metabolic

network. The basis for an optimal tracer design has already been proposed in the form of

EMU basis vectors (Crown and Antoniewicz, 2012), which decouple substrate labeling

from the fluxes in the model. The availability of MS measurements for other metabolites

besides amino acids will further improve the resolution of poorly resolved fluxes in the

GSM model. MS measurements of fatty acids (Yoo et al., 2008) and intracellular

metabolites (Luo et al., 2007; Metallo et al., 2012) have already been utilized to infer flux

distributions. Regarding the flux elucidation presented in this paper, Table 2.2 provides a

candidate list of metabolites that, if measured, would resolve alternate routes. This is based

on the idea that, if the labeling distribution of a metabolite unique to a given alternate

pathway is different from the labeling distribution at the start of the experiment, then the

pathway carries flux and the steady-state flux through the pathway can be evaluated using

an isotopic non-stationary flux analysis procedure for E. coli as described previously (Noh

et al., 2007; Young et al., 2008). Noh et al. (2007) used a rapid sampling approach

followed by methanol quenching to obtain multiple isotopic non-stationary data points

before attainment of isotopic steady state. Established protocols for measuring the pool sizes

of methylglyoxal (Girgis et al., 2012) and γ-aminobutyrate (O'Byrne et al., 2011)

are already available for resolving flux through their corresponding pathways.

Incorporation of these additional measurements can further sharpen the estimated flux

ranges to provide a better resolution of metabolism. The estimated flux ranges can be

refined further with the availability of a more complete set of extracellular flux

measurements to include secretion profiles of additional products, such as succinate and

other organic acids, so as to close the carbon balance for the given growth condition. This

set of measurements will reduce the solution space by constraining the flux through various

production pathways, thereby further simplifying the task of identifying the optimal flux

distribution. A complete set of measurements required for maximum flux elucidation can

be obtained using a formulation such as OptMeas (Chang et al., 2008).

The extent of resolution obtained using a GSM model for E. coli points to the possibility

of more reliable flux elucidation and biologically relevant inferences in more complex

systems such as yeast, plants, and mammalian systems with compartmentalized

metabolism. Application of genome-scale MFA to such systems will allow the use of

closed cofactor balances without the risk of altering the actual flux distribution predicted

using the flux estimation procedure. This will enable identifying metabolic bottlenecks

leading to more informed metabolic engineering interventions that improve the yield of

target products. It is important to note that flux resolution using 13C-MFA ultimately

depends on how well the organism’s genome is annotated, the complexity of the underlying

EMU network, and the quality of experimental data used for flux estimation.

Table 2.1: 𝜒2 degrees of freedom for the core model and the genome-scale model.

Statistical significance testing requires that the number of degrees of freedom be positive, as the 𝜒2

distribution is defined only for positive degrees of freedom. The difference in the number of degrees of

freedom estimated using free fluxes and based on the EMU network points to an inherent

flaw in using the free fluxes as the number of model parameters.

Model | Degrees of Freedom | Maximum SSRES

Core model | 55 | 96

Genome-scale model (based on free fluxes) | -30 | Not defined

Genome-scale model (based on EMU network) | 27 | 44

Table 2.2: Additional suggested MS measurements for resolving various alternate

routes.

Central metabolic pathway | Alternate pathway | Metabolite measurement candidate | Measurement type

Lower glycolysis | Methylglyoxal pathway | methylglyoxal | Time-course MS with known pool size

TCA cycle | Arginine degradation | γ-aminobutyrate | Time-course MS with known pool size

HEX1-G6PP futile cycle | G6PP | Intracellular glucose | Steady-state MS

Figure 2.1: Comparison of prediction of experimentally observed amino acid MS data

by the core model (green bars) and the GSM model (brown bars).

Figure 2.2: Comparison of fluxes elucidated using 2-13C-glucose with the core model

and GSM model. (a) Schematic representation of all reactions and metabolites involved in

central metabolism of E. coli. Comparison of flux ranges (in mmol/dmol-glucose) using

core model (green bars) and GSM model (brown bars) for (b) glycolysis and

gluconeogenesis, (c) anaplerotic reactions and glyoxylate shunt, (d) TCA cycle, (e) PPP

and ED pathway.

[Figure 2.2 panels not reproduced. Panel (a) labels, reactions: PGI, PFK, FBA, TPI, G6PDH, GND, GAPD, PGK/PGM, ENO, PYK, PDH, ACONT, IDH, AKGDH, SUCOAS, SDH, FUM, MDH, CS, RPI, RPE, TKT2, TALA, TKT1, EDA/EDD, PPC, PPCK, ME, ICL, MALS; metabolites: glc-D, g6p, f6p, fdp, dhap, g3p, 6pg, ru5p, xu5p, r5p, s7p, e4p, 3pg, 2pg, pep, pyr, accoa, cit, icit, akg, succ, fum, mal, succoa, oaa, glyox. Panels (b) to (e): flux-range bar charts.]

Figure 2.3: Resolution of energy metabolism in core model (green bars) and GSM

model (brown bars).

Figure 2.4: Loss of information expressed as % bound contraction of flux ranges for

every reaction in the GSM model when flux ranges are estimated using FVA with core

model-based MFA derived flux ranges as constraints.

Figure 2.5: Flux distribution comparison for core model and GSM model using 5-13C

glucose tracer. Comparison of flux ranges (in mmol/dmol-glucose) using core model

(green bars) and GSM model (brown bars) for (a) glycolysis and gluconeogenesis, (b)

Pentose phosphate pathway, (c) anaplerotic reactions and glyoxylate shunt, (d) TCA cycle,

(e) energy metabolism. The reaction nomenclature is described in Figure 2.2a.

[Figure 2.5 panels (a) to (e) not reproduced: flux-range bar charts.]

Chapter 3

Elucidation of photoautotrophic carbon flux topology in Synechocystis

PCC 6803 using genome-scale carbon mapping models

This chapter has been previously published in modified form in Metabolic Engineering

(Saratram Gopalakrishnan, Himadri B. Pakrasi, and Costas D. Maranas. Elucidation of

photoautotrophic carbon flux topology in Synechocystis PCC 6803 using genome-scale

carbon mapping models. Metabolic Engineering 47(2018): 190-199.)

3.1. Introduction

Metabolic engineering of photosynthetic organisms is aimed at the sustainable

bioconversion of abundant and inexpensive substrates such as sunlight and CO2 into

valuable products such as biomass (Maurino and Weber, 2013), biofuels (Atsumi et al.,

2009), and secondary metabolites (Giuliano, 2014). The efficacy of metabolic engineering

interventions is evaluated by measuring internal fluxes via 13C-Metabolic flux analysis

(13C-MFA) methods (Metallo et al., 2009; Sauer, 2006; Tang et al., 2009). These methods

determine the intracellular flux distributions consistent with experimentally measured

metabolite labeling distributions given a stable-isotope-labeled input carbon substrate

(Zomorrodi et al., 2012). Having CO2 as the only carbon substrate in cyanobacterial

photoautotrophic metabolism implies uniformity of metabolite labeling distributions

under isotopic steady-state (Shastri and Morgan, 2007). As a consequence of this, flux

elucidation under photoautotrophic conditions requires transient labeling experiments and

aims to recapitulate metabolite labeling dynamics in addition to steady-state labeling

distributions (Young et al., 2008). While this approach presents the opportunity to address

key questions pertinent to cyanobacterial metabolism such as (i) completion of the TCA

cycle, (ii) utilization of the photorespiratory pathway, (iii) existence of the glyoxylate

shunt, and (iv) carbon fixation efficiency, experimental and computational challenges have

so far restricted wide applicability resulting in a limited number of isotopic instationary

MFA (INST-MFA) studies. These include a demonstration of sub-optimal carbon

incorporation in Synechocystis PCC 6803 (hereafter Synechocystis) (Young et al., 2011)

and assessment of TCA cycle functionality (Xiong et al., 2015) using a simplified central

metabolic model. Other studies aimed at capturing the metabolic response to nitrogen

depletion (Hasunuma et al., 2013) and essentiality of the photorespiratory pathway (Huege

et al., 2011) only obtained split ratios using fractional labeling and turnover of metabolites

as opposed to network-wide fluxes. Targeted flux ratio elucidation in any labeling

experiment is vulnerable to errors arising from distal influences (Gopalakrishnan and

Maranas, 2015a; McCloskey et al., 2016b; Suthers et al., 2007). Moreover, ignoring pre-

existing (unlabeled) carbon pools upon reaction lumping in core metabolic models causes

the artificial acceleration of labeling dynamics leading to significant disagreements

between model predictions and experimental data (Noh and Wiechert, 2011). Furthermore,

pool sizes are often not measured despite being co-estimated with fluxes, resulting in poor

resolution of most metabolite pool sizes. These factors are likely to bias the analysis of

labeling data using core metabolic models, motivating the re-analysis of metabolite

labeling dynamics obtained during transient labeling experiments using a genome-scale

metabolic mapping (GSMM) model.
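The effect of reaction lumping on labeling dynamics can be illustrated with a toy calculation. The sketch below is not taken from the cited studies; it assumes a hypothetical two-pool linear pathway with first-order labeling kinetics and arbitrary flux, pool-size, and tracer values, and compares the enrichment of the downstream pool with and without the upstream pool.

```python
import math


def enrichment_lumped(t, v, c_b, u):
    """Downstream pool B fed directly by the tracer (upstream pool lumped out)."""
    return u * (1.0 - math.exp(-(v / c_b) * t))


def enrichment_chain(t, v, c_a, c_b, u):
    """Pool B fed through pool A; analytical solution for k_a != k_b,
    with both pools unlabeled at t = 0."""
    k_a, k_b = v / c_a, v / c_b
    decay = (k_b * math.exp(-k_a * t) - k_a * math.exp(-k_b * t)) / (k_b - k_a)
    return u * (1.0 - decay)


# Hypothetical values: flux v, pool sizes c_a and c_b, tracer enrichment u.
v, c_a, c_b, u = 1.0, 0.5, 1.0, 0.5
for t in (0.5, 1.0, 2.0, 5.0):
    print(t, round(enrichment_chain(t, v, c_a, c_b, u), 4),
          round(enrichment_lumped(t, v, c_b, u), 4))
```

At every time point the lumped description labels faster than the two-pool chain, which is the artificial acceleration referred to above.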

The accuracy of flux estimation using a GSMM model is contingent on the curation of the

base genome-scale metabolic (GSM) model. The GSM model for Synechocystis iSyn731

(Saha et al., 2012) accurately predicts 95% of the available gene (non)essentiality data

which is better than the prediction capability of the iAF1260 model for E. coli (Zomorrodi

and Maranas, 2010). This safeguards against incorrect flux inference arising from omission

or overly permissive inclusion of reactions in the model (Gopalakrishnan and Maranas,

2015a). With the availability of a curated GSM model and transient metabolite labeling

distributions (Shastri and Morgan, 2007), (i) the construction of a genome-scale metabolic

mapping (GSMM) model and (ii) scalability of existing algorithms become the bottlenecks

for successful flux elucidation at the genome-scale (Gopalakrishnan and Maranas, 2015b).

In addition to the carbon paths contained within core models (Abernathy et al., 2017;

Alagesan et al., 2013; Feng et al., 2010; Yang et al., 2002a, b, c; You et al., 2014; Young

et al., 2011; Zhang and Bryant, 2011), the GSMM model affords expanded pathway

coverage to include glyoxylate metabolism, completion of the TCA cycle, and recycling of

by-products of peripheral metabolism such as CO2, formate, glycolate, and acetate. While

the most reliable way to obtain atom mapping data is to trace the reaction mechanism

directly, mechanisms are not available for most reactions, thus requiring the use of computational

procedures such as MCS (Chen et al., 2013), PMCD (Jochum et al., 1980), EC (Morgan,

1965), MWED (Latendresse et al., 2012), and CLCA (Kumar and Maranas, 2014) to infer

plausible mappings. Simulation of labeling distributions for a given flux distribution is

performed via integration of a system of ordinary differential equations (ODEs) (Noh et

al., 2006; Young et al., 2008) upon decomposition of the mapping model using frameworks

such as cumomers (Wiechert et al., 1999) or Elementary Metabolite Units (EMUs)

(Antoniewicz et al., 2007). Fluxes are estimated as the solution of a non-linear least-squares

fitting problem that minimizes the deviation of predicted intracellular metabolite labeling

distributions and dynamics from experimental data. Since the analytical solution for the

system of ODEs describing labeling dynamics is not tractable, the ODEs must be integrated

numerically. Memory requirements limit the use of available integration packages, thus

requiring the development of customized integrators. The state-of-the-art algorithm (Young

et al., 2008) utilizes an exponential integrator in conjunction with a first-order hold

equivalent. When expressed in state-space form, the solution to these equations involves

the computation of the exponential of a matrix, which scales poorly with network size,

motivating the development of the more efficient algorithm undertaken in this study.
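For orientation, the fitting problem and the linear labeling dynamics described above can be written in a generic form. This is a sketch under assumptions rather than the formulation used in this work: F denotes a state matrix that depends on the fluxes v and pool sizes c, while the symbols G, U, x^meas, x^sim, and σ are introduced here for illustration only, with U(t) standing for the known forcing from the tracer and from upstream blocks of the cascaded EMU system. The exact expressions are those of the cited references.

```latex
% Weighted least-squares fit of simulated to measured labeling data
\min_{\mathbf{v},\,\mathbf{c}} \; \mathrm{SSRES}
  = \sum_{i} \left( \frac{x_i^{\text{meas}} - x_i^{\text{sim}}(\mathbf{v},\mathbf{c})}{\sigma_i} \right)^{2}

% Linear labeling dynamics in state-space form (G and U are illustrative symbols)
\frac{d\mathbf{X}}{dt} = \mathbf{F}(\mathbf{v},\mathbf{c})\,\mathbf{X}(t) + \mathbf{G}(\mathbf{v},\mathbf{c})\,\mathbf{U}(t)

% Exact update over one time step \Delta t, which requires the matrix exponential
\mathbf{X}(t+\Delta t) = e^{\mathbf{F}\Delta t}\,\mathbf{X}(t)
  + \int_{0}^{\Delta t} e^{\mathbf{F}(\Delta t-\tau)}\,\mathbf{G}\,\mathbf{U}(t+\tau)\,d\tau
```

The last expression makes the scaling issue visible: every evaluation of the objective requires the exponential of a matrix whose dimension grows with the number of simulated EMUs.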

In this chapter, genome-scale INST-MFA is performed to glean insights into the metabolic

map of photoautotrophically grown Synechocystis. A GSMM model imSyn617 for

Synechocystis is constructed based on the corresponding GSM model iSyn731 (Saha et al.,

2012) to enable flux elucidation using previously measured metabolite labeling dynamics

(Young et al., 2011). The set of active reactions under photoautotrophic growth conditions

is identified by performing Flux Variability Analysis (Mahadevan and Schilling, 2003)

upon constraining the model with experimentally measured growth and product yields

(Young et al., 2011) for growth with bicarbonate as the sole carbon source. The GSMM

imSyn617 is constructed to encompass all active reactions involved in carbon balances.

Reaction mapping information is assembled from imEco726 (Gopalakrishnan and

Maranas, 2015a), reaction mechanisms and the CLCA algorithm (Kumar and Maranas,

2014). imSyn617 is deployed for genome-scale INST-MFA to uncover novel insights into

the biology of photoautotrophic growth of Synechocystis using the published labeling data

for 15 metabolites from central metabolism (Young et al., 2011). We infer that only 88%

of the assimilated bicarbonate is fixed via the Calvin-Benson-Bassham (CBB) cycle while

the rest is fixed by phosphoenolpyruvate carboxylase (PPC) but eventually off-gassed as

CO2 through malic enzyme, the TCA cycle, and peripheral metabolic pathways. We

confirmed that there is no flux through the oxidative pentose phosphate pathway and that

regeneration of pentose phosphates occurs through the transaldolase reaction. With no flux

through pyruvate kinase, pyruvate is synthesized indirectly from phosphoenolpyruvate

(PEP) via PPC and malic enzyme. Trace flux is observed from α-ketoglutarate (AKG) to

succinate indicating dispensability of the lower TCA cycle during photoautotrophic

growth. Moreover, the oxygenase reaction of RuBisCO is found to be the primary source

of glycine with serine being synthesized directly from 3-phosphoglycerate (3PG). These

modalities result in a bifurcated topology of the TCA cycle reactions and serine metabolism

enabling maximal conversion of RuBisCO fixed CO2 to biomass. This analysis confirmed

that maximization of biomass yield from fixed carbons explains the allocation of fluxes in

the metabolic network in Synechocystis as supported by experimental findings.

3.2. Methods

3.2.1. Construction of imSyn617

The GSM model for Synechocystis (iSyn731) (Saha et al., 2012) was simplified using Flux

Variability Analysis (FVA) (Mahadevan and Schilling, 2003) under photoautotrophic

conditions using bicarbonate as the sole carbon source to eliminate all reactions incapable

of carrying flux. The feasible solution space was constrained using growth rate and organic

acids yield (Young et al., 2011). Photon fluxes were calculated based on experimental

lighting conditions described earlier (Nogales et al., 2012). Thermodynamically infeasible

cycles (Schellenberger et al., 2011) in the form of isles (Wiechert and Wurzel, 2001) were

manually eliminated to further reduce the size of the metabolic model. The phosphoserine

pathway was included in accordance with recent genome annotation updates (Klemke et

al., 2015). The recently proposed Entner-Doudoroff pathway was excluded from the

metabolic model based on its dispensability under photoautotrophic growth conditions and

a lack of pathway characterization using tracer experiments (Chen et al., 2016). The

phototrophic growth model for Synechocystis contains 729 reactions and 679 metabolites.
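The reaction-elimination step above relies on the standard FVA formulation, sketched here for reference. The symbols S (stoichiometric matrix), the bounds v^lb and v^ub, and the tolerance ε are generic notation assumed for illustration and do not appear in the original text.

```latex
% For every reaction j, solve two linear programs:
v_j^{\min} = \min_{\mathbf{v}} \; v_j
\qquad
v_j^{\max} = \max_{\mathbf{v}} \; v_j

% subject to steady state, flux bounds, and the measured growth and yield constraints
\text{s.t.} \quad \mathbf{S}\,\mathbf{v} = \mathbf{0}, \qquad
v_k^{\text{lb}} \le v_k \le v_k^{\text{ub}} \quad \forall k

% Reaction j cannot carry flux and is removed when
|v_j^{\min}| \le \varepsilon \quad \text{and} \quad |v_j^{\max}| \le \varepsilon
```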

The GSMM model imSyn617 was constructed for Synechocystis starting from the existing

GSMM for E. coli, imEco726 (Gopalakrishnan and Maranas, 2015a). Carbon mappings for

498 reactions were obtained directly from imEco726, spanning glycolysis, pentose

phosphate pathway, TCA cycle, biosynthesis of all amino acids except glycine and serine,

synthesis of palmitate and stearate, nucleotide biosynthesis, and the synthesis of cofactors:

NAD, tetrahydrofolate, and riboflavin. Of the originally 109 unmapped reactions, 96

reactions spanning carbon fixation, photorespiration, glyoxylate metabolism, glycolipid

and polyunsaturated fatty acid synthesis, and porphyrin biosynthetic pathways generated

metabolites recycled in central metabolism. Atom mapping for 68 unmapped reactions was

constructed from the reaction mechanism of each reaction (Supplementary File 1).

Mapping for the remaining 41 reactions including spontaneous reactions and those without

available mechanisms was obtained using the CLCA algorithm (Kumar and Maranas,

2014; Kumar et al., 2012). Alternate mappings were generated for 49 reactions based on

the presence of nine symmetric metabolites in the cyanobacterial models. Carbon

rearrangements within the triose phosphates were identified by carbon path tracing using

an EMU-based depth-first search algorithm. All carbon atoms (single and bonded) are

represented using their corresponding EMU following a carbon numbering scheme

consistent with the IUPAC convention. The metabolic model and the corresponding atom

mapping model are made available in Supplementary File 1.
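How a symmetric metabolite gives rise to an alternate mapping can be sketched as follows. The string-based mapping notation, the function name `alternate_mappings`, and the fumarate-to-malate example are assumptions made for illustration; they are not the data format of Supplementary File 1.

```python
def alternate_mappings(substrate_atoms, product_atoms, substrate_is_symmetric):
    """Return the atom mappings implied by a symmetric substrate.

    substrate_atoms / product_atoms: strings of carbon labels, one letter per
    carbon in numbering order (hypothetical notation, e.g. "abcd").
    A symmetric substrate is indistinguishable from its reversed numbering,
    so the product can also receive the carbons in reversed order.
    """
    mappings = [(substrate_atoms, product_atoms)]
    if substrate_is_symmetric:
        # Relabel the substrate carbons in reverse order and carry the
        # relabeling through to the product.
        relabel = dict(zip(substrate_atoms, reversed(substrate_atoms)))
        flipped = "".join(relabel[atom] for atom in product_atoms)
        if flipped != product_atoms:
            mappings.append((substrate_atoms, flipped))
    return mappings


# Fumarate (symmetric, 4 carbons) hydrated to malate.
for substrate, product in alternate_mappings("abcd", "abcd", True):
    print(f"fum ({substrate}) -> mal ({product})")
```

Both mappings would then be carried in the model so that label entering a symmetric metabolite is distributed over both orientations.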

3.2.2. Algorithmic procedure for flux estimation based on least-squares

minimization

Flux and range estimation following EMU decomposition (Antoniewicz et al., 2007) of the

mapping model was performed as described earlier (Gopalakrishnan and Maranas, 2015a).

The labeling dynamics of 15 central metabolites, spanning sugar phosphates, glycolytic

intermediates, and organic acids, utilized for flux estimation were obtained from a previous

study (Young et al., 2011) with Synechocystis grown under photoautotrophic growth

conditions and 50% 13C-bicarbonate tracer. Model decomposition resulted in the

identification of 851 EMUs, 156 free fluxes (Wiechert et al., 1999), and 204 pool sizes.

Metabolite labeling dynamics were modeled using a system of 8.4 × 10^5 simultaneous

ODEs relating metabolite labeling distributions 𝑿(𝑡) to the initial labeling state, 𝑿(0), the

carbon tracer, and the system state transition matrix, 𝑭, containing fluxes, 𝒗, and pool sizes,

𝒄. This system of ODEs simulates 2,311 EMU mass fractions and their sensitivity to 367

fitted parameters. The system of equations in continuous time domain was converted to

discrete time domain using the procedure described in Appendix B. The mathematical

expressions for the transition matrices, 𝜱, 𝜞, and 𝜴, in terms of 𝑭 were obtained by

solving the system of ODEs after applying a non-causal first-order hold equivalent

(Franklin et al., 1997) as opposed to the previously described state-space form method

(Young et al., 2008) so as to improve the scalability of the flux estimation procedure. This

resulted in a 7% and 48% reduction in computation time for the core model and the GSM

model, respectively. The significance of this improvement is anticipated to increase with

model size. The NLP was solved using a modified Levenberg-Marquardt algorithm

(Madsen et al., 2004) equipped to handle linear inequality constraints (Gill et al., 1984).

The NLP was solved with 100 randomized initial feasible flux distributions and the best

solution was chosen for confidence interval calculations owing to the non-convex nature

of the objective function. The quality of the obtained flux distribution was evaluated using

a 𝜒2 goodness of fit test to ensure statistical significance of the obtained results. 95%

confidence intervals were determined as described earlier (Antoniewicz et al., 2006;

Gopalakrishnan and Maranas, 2015a). All fluxes, expressed in mmol/dmol bicarbonate

uptake (BCU), are normalized to 100 mmol/gdw-hr HCO3- uptake.
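The multi-start strategy described above can be illustrated on a toy problem. The sketch below makes several simplifying assumptions that are not part of the original method: a hypothetical one-dimensional non-convex objective stands in for the SSRES, plain gradient descent replaces the modified Levenberg-Marquardt algorithm, no constraints are handled, and all names and numerical settings are illustrative.

```python
import random


def objective(x):
    """Toy non-convex objective with two local minima (stand-in for SSRES)."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Analytical derivative of the toy objective."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def local_descent(x0, step=0.01, iterations=2000):
    """Plain gradient descent from one starting point."""
    x = x0
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


def multistart(n_starts=100, lower=-2.0, upper=2.0, seed=0):
    """Run the local solver from random starts and keep the best solution."""
    rng = random.Random(seed)
    candidates = [local_descent(rng.uniform(lower, upper)) for _ in range(n_starts)]
    return min(candidates, key=objective)


best_x = multistart()
print(round(best_x, 3), round(objective(best_x), 3))
```

Starts that fall in the basin of the shallower minimum converge to it, so only the comparison across starts recovers the better of the two local solutions.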

3.3. Results

This section highlights the novel carbon paths included upon scale-up to a GSMM model

and their role in facilitating prediction departures from flux distributions obtained using

core models. In addition, flux topologies of pathways absent in the core model are

elucidated and their biological implications are discussed.

3.3.1. New carbon paths covered by mapping model imSyn617

Expansion of pathway coverage in the GSMM model of Synechocystis to include

glyoxylate, amino acid, lipid, and peripheral metabolism contributes to 18 novel carbon

paths not captured by the core model (Figure 3.1). These novel paths arise from new carbon

skeleton rearrangements, conserved group recycling, and new mechanisms for CO2

incorporation in Synechocystis. Three alternate routes to lower glycolysis are traced

through methylglyoxal synthesis, photorespiration, and serine metabolism with identical

atom mapping. Two paths via arginine degradation and the GABA shunt are present with

atom transitions identical to the lower TCA cycle, indicating the presence of a TCA-like

carbon skeleton rearrangement despite the unresolved completion of the TCA cycle

(Steinhauser et al., 2012; Yu et al., 2013; Zhang and Bryant, 2011). Carbon recycling from

peripheral metabolism occurs via acetate, formate and CO2. The condensation of the

methyl group from S-adenosyl methionine and the δ-carbon of glutamate (GLU-5) in the

adenosylcobalamin pathway produces acetate, which is either metabolized via the TCA

cycle or channeled into lipid production. Formate is produced during the biosynthesis of

tetrahydrofolate (THF), riboflavin, and thiamin pyrophosphate, whereas CO2 is generated

as a byproduct of porphyrin, terpenoid, and pyridoxal phosphate biosynthetic pathways.

Formate and CO2 are also produced as the end products of glyoxylate oxidation via oxalate.

Formate is oxidized to CO2 via formate dehydrogenase, which is eventually reincorporated

via RuBisCO and the anaplerotic PPC reaction. The tetrahydrofolate pathway also

generates glycolate, which feeds into the photorespiratory pathway. Since Synechocystis

lacks PEP carboxykinase (PPCK) activity to drive carbon flow from the TCA cycle to

lower glycolysis, CO2 incorporated via PPC is routed to TCA cycle-derived metabolites

only. CO2 is also incorporated via glycine dehydrogenase (GLYDH) in glyoxylate and

glycine metabolism in which glycine is synthesized by condensation of one CO2 and a

methenyl group donated by methenyl-THF. This reaction, in conjunction with flux through

the photorespiratory pathway, contributes to three novel carbon backbone arrangements

possibly unique to cyanobacteria (Figure 3.2a). It is important to note that these

mechanisms both incorporate and off-gas CO2 atoms of different origin resulting in

alteration of labeling distributions of various intracellular metabolites similar to 13C

labeling dilution effects seen during aeration of cell cultures (Leighty and Antoniewicz,

2012). However, it appears that net carbon fixation is performed by RuBisCO alone. In

addition to carbon skeleton rearrangements, the mapping model reveals the existence of

pathways facilitating conserved moiety cycling (E4P and G3P), which are capable of

delaying 13C incorporation (Figure 3.2b). The comprehensive inventory of carbon paths

contained within the GSMM model provides the means for better recapitulation of

experimental data and accurate flux elucidation with a high level of detail.

3.3.2. Comparison of fluxes elucidated using imSyn617 and core mapping

models

The simulated labeling distributions are in much better agreement with experimental data

when fitted using imSyn617 (Sum of Squares of Residuals, SSRES = 511.4; Degrees of

freedom, DOF = 556) compared to the core model (Young et al., 2011) (SSRES = 684,

DOF = 697). The statistical significance of the reduction in SSRES using imSyn617 was

assessed using an F-test. The F-test provides a way of testing whether the improvement in

fit upon model expansion is not due to an increased number of parameters but rather due to

better capturing of labeled carbon routes through metabolism. The F-statistic is calculated

to be 1.335 (p = 0.012). This value confirms the statistical significance of the additional

parameters introduced in imSyn617 at a confidence level of 95%. Furthermore, the

substantially different flux distribution elucidated using imSyn617 associated with the

reduced SSRES captures a statistically significant new optimum in the least squares

objective function. The improved fit is attributed to the better recapitulation of the labeling

dynamics of PEP-167, 3PG-185, and RuBP-309 fragments indicated by a reduction in

SSRES (Figure 3.3). Because the experimentally measured metabolite labeling distribution

and dynamics are inconsistent with the sole action of the CBB cycle (Figure 3.6 and 3.7),

flux datasets inferred by both models employ compensatory mechanisms to delay the mass

shift associated with 13C incorporation. Simplification of reactions in the core model via

lumping of linear pathways may accelerate metabolite labeling dynamics (Noh and

Wiechert, 2011). As a result, the core model derives unlabeled carbons from glycogen

degradation in conjunction with flux through the oxidative pentose phosphate pathway to

delay turnover of metabolite pools (Figure 3.8). In contrast, imSyn617 does not lump

reactions and further delays metabolite pool turnover by favoring carbon paths involving

conserved moieties (Figure 3.2 and 3.4), thereby affording a reduction in deviation from

experimental data using imSyn617 (Figure 3.6). An immediate consequence of this flux

redistribution is the dispensability of flux through the oxidative PP pathway (Figure 3.4)

according to imSyn617. The biomass formation reaction in the core model is approximated

using precursors from central metabolism (Shastri and Morgan, 2007) whereas imSyn617

mirrors completely the biomass equation of iSyn731 parameterized using experimental

measurements (Nogales et al., 2012; Saha et al., 2012). This results in significant

differences in metabolite drains between the two models. Overall, differences in labeling

dynamics and stoichiometry associated with biomass metabolite drains contribute to stark

shifts in estimated central metabolic flux ranges upon scale-up from a core to a genome-

scale mapping model (Figure 3.4).
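The F-statistic reported earlier in this section can be approximately reproduced from the stated SSRES and DOF values. The sketch below assumes the standard F-test for comparing a reduced and an expanded least-squares model; whether the original analysis used exactly this form is an assumption, and the function name is illustrative. Because the SSRES values are rounded in the text, the result agrees with the reported value of 1.335 only to about two decimal places.

```python
def f_statistic(ssres_reduced, dof_reduced, ssres_full, dof_full):
    """F-statistic for the improvement in fit of the expanded model.

    Numerator: reduction in SSRES per additional parameter.
    Denominator: residual variance estimate of the expanded model.
    """
    extra_parameters = dof_reduced - dof_full
    numerator = (ssres_reduced - ssres_full) / extra_parameters
    denominator = ssres_full / dof_full
    return numerator / denominator


# Values quoted in the text: core model (SSRES = 684, DOF = 697)
# and imSyn617 (SSRES = 511.4, DOF = 556).
f_value = f_statistic(684.0, 697, 511.4, 556)
print(round(f_value, 2))
```

The corresponding p-value would follow from the F-distribution with 141 and 556 degrees of freedom, which is not computed here to keep the sketch free of external dependencies.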

Flux elucidation using a GSMM model reveals that only 88% of the assimilated

bicarbonate is fixed by RuBisCO (Figure 3.4) compared to 120% in the core model. The

increased flux through RuBisCO predicted by the core model is attributed to a 16

mmol/dmol bicarbonate uptake (BCU) flux through G6PDH, resulting in a futile cycle

between the CBB cycle and the PP pathway. This futile cycle is shown to be inactive when

using imSyn617. This is consistent with experimentally verified dispensability of this

pathway inferred from unimpaired growth of the Synechocystis zwf mutant under

photoautotrophic growth conditions (Scanlan et al., 1995). As a consequence of this, an

83% reduction in the flux through PGI is seen using imSyn617 compared to the core model,

with PGI serving only to generate G6P for glycogen and glycolipid synthesis. imSyn617

leverages the E4P recycling mechanism (Figure 3.2b) to delay metabolite labeling

dynamics leading to a two-fold increase in flux through SBA and SBP reactions and a flux

of 37 mmol/dmol BCU through TAL in imSyn617 compared to the core model. The use

of this pathway for regeneration of pentose sugar phosphates results in no flux through

FBA and FBP reactions. Note that Synechocystis contains two FBAs: CI-FBA with higher

reactivity for fructose-bisphosphate and CII-FBA with higher reactivity for sedoheptulose-

bisphosphate. CI-FBA has been shown to be non-essential during photoautotrophic growth

of Synechocystis (Nakahara et al., 2003). It has been suggested that over 90% of the FBA

in Synechocystis under photoautotrophic growth conditions is CII-FBA (Liang and

Lindblad, 2016), consistent with the fact that CII-FBA has a 3-fold higher transcriptomic

abundance (Saha et al., 2016) and a 39-fold higher proteomic abundance (Takabayashi et

al., 2013) compared to CI-FBA. These findings support an inactive CI-FBA which results

in the absence of flux through the fructose bisphosphate aldolase and fructose-

bisphosphatase reactions. As a consequence of this, hexose phosphates for glycogen

synthesis can only be synthesized via the TAL reaction. This increased flux through TAL

is consistent with the experimentally observed higher expression levels of the tal gene

during photoautotrophic growth phase in Synechocystis (Kucho et al., 2005). In order to

assess the impact of the higher flux through the TAL reaction on the quality of fit, flux

elucidation was performed using imSyn617 upon constraining the flux through TAL to

zero. Removal of the TAL reaction redirects carbon flux through the FBA/FBP route and

the oxidative pentose phosphate pathway. Since the TAL reaction participates in a cycle

involving a conserved E4P moiety, flux through this cycle delays the 13C incorporation into

sugar phosphate intermediates in the CBB cycle. This delay is not possible via the

FBA/FBP route and therefore increases the SSRES to 541 due to poorly recapitulated

labeling dynamics of R5P229 and RuBP309 fragments (Figure 3.9). In comparison, the

core model uses the traditional CBB pathway for pentose phosphate regeneration, resulting

in a flux of 58 mmol/dmol BCU through FBA and FBP reactions while the directionality

of TAL remains unresolved.
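
The significance of the SSRES increase upon removing TAL can be checked with a nested-model F-statistic. The following is a minimal Python sketch, assuming the standard extra-sum-of-squares F-test; the function name and the residual degrees of freedom (278) are illustrative assumptions that are not stated in this section, while the SSRES values 511 and 541 are those quoted for the TAL test in Figure 3.9.

```python
def nested_f_statistic(ssres_restricted, ssres_full, extra_params, dof_full):
    """F-statistic comparing a restricted fit against the unrestricted fit.

    ssres_restricted: SSRES with the constraint imposed (e.g. TAL flux = 0)
    ssres_full:       SSRES of the unconstrained fit
    extra_params:     number of parameters freed in the unrestricted model
    dof_full:         residual degrees of freedom of the unrestricted fit
    """
    if ssres_restricted < ssres_full:
        raise ValueError("a restricted fit cannot have a lower SSRES than the full fit")
    gain_per_param = (ssres_restricted - ssres_full) / extra_params
    noise_per_dof = ssres_full / dof_full
    return gain_per_param / noise_per_dof


# SSRES values from the TAL test; dof_full = 278 is an assumption of this sketch.
f_tal = nested_f_statistic(541.0, 511.0, extra_params=1, dof_full=278)
print(round(f_tal, 2))
```

Under these assumed degrees of freedom the statistic comes out close to the value reported for the TAL test; the corresponding p-value would follow from the F-distribution with (1, 278) degrees of freedom.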

While both models assume the same biomass macromolecular compositions, differences

in precursor drains result in significant flux range shifts downstream of carbon fixation in

the core and imSyn617 models. In particular, lipids are traced indirectly only through free

fatty acids in the core model as opposed to a complete coverage of glycolipids, di- and

triacylglycerols (DAGs and TAGs), phospholipids, and sulfoquinovosyl DAGs in

imSyn617. This results in a reduced acetyl-CoA demand and an increased DHAP demand

for biomass production in imSyn617. As a consequence, a 50%, 89%, and 67% reduction

in flux is predicted through lower glycolysis, PK, and PDH reactions, respectively. In

addition to this, the core model uses the glyoxylate shunt as a secondary source of

glyoxylate, thereby enabling completion of the TCA cycle without including the AKGDH

reaction. In contrast, the glyoxylate shunt is excluded from iSyn731 as this pathway is

shown to be absent in Synechocystis (Thiel et al., 2017) resulting in glyoxylate production

only in the photorespiratory pathway. Furthermore, iSyn731 accounts for multiple avenues

for the completion of the TCA cycle via AKGDH and its alternate routes and captures

glycine and serine metabolism, thereby elucidating parts of the metabolic network not

captured by the core model.

3.3.3. New insight on carbon paths gained using imSyn617

The overall carbon balance reveals that 86% of the assimilated bicarbonate is channeled

towards biomass production, 12% is ultimately off-gassed as CO2 and the remaining 2% is

distributed between organic acids and glycogen storage. 602 reactions are resolved with a

flux range narrower than 10 mmol/dmol BCU. 407 reactions are identified to be growth-

coupled. These flux ranges were compared to flux ranges generated using FVA upon

constraining the bounds of substrate uptake and product yields with MFA-derived lower

and upper bounds. The superior flux resolution afforded by INST-MFA compared to

simply FVA is attributed to the unambiguous elucidation of fluxes across all branch-points

such as CBB cycle/photorespiration, glycolysis/PP pathway, anaplerotic reactions, and the

TCA cycle. In addition to this, futile cycles involving central metabolic reactions such as

PK, GAK, and PFK are well resolved with zero flux using INST-MFA compared to FVA

based on their contribution to carbon skeleton rearrangement and impact on metabolite

labeling dynamics. 61 reactions outside the purview of the EMU model are poorly resolved

by both INST-MFA and FVA. These include reactions from energy metabolism such as

cyclic and non-cyclic photophosphorylation, Mehler reaction, and oxidative

phosphorylation, and reactions facilitating reversible transfer of reducing equivalents

between carriers such as ferredoxin, NAD+, and NADP+, including glutamate

dehydrogenase, the glutamine synthetase/glutamate:oxoglutarate aminotransferase system,

and isozymes of glyceraldehyde-3-phosphate dehydrogenase. The expanded pathway

coverage in iSyn731 provides insights into carbon flows through various pathways not

modeled in the core model such as aspartate, glutamate, glycine, and serine metabolism

and reveals the existence of pathway topologies supporting carbon conversion to biomass

with near 100% efficiency.

Glycine and serine metabolism exhibits a bifurcated topology involving reactions from the

photorespiratory pathway, the phosphoserine pathway, and SHMT (Figure 3.5a). Flux

through the carboxylation and oxygenation reactions of RuBisCO is partitioned in a 90:10

ratio with 9.7 mmol/dmol BCU of flux entering the photorespiratory pathway (Figure 3.4).

Oxygenation of RuBP produces one molecule of 3PG and one molecule of 2PGLYC, which

is oxidized to glyoxylate in the photorespiratory pathway (Figure 3.5a). Since no oxidation

of glyoxylate to formate or CO2 occurs, all of the 2PGLYC synthesized via oxygenation of

RuBP is converted to glycine. Absence of flux through glyoxylate oxidation is supported

by experimentally observed insignificant 13C incorporation into oxalate (Young et al.,

2011). 3PG is converted to serine via the phosphoserine (PHSER) pathway similar to E.

coli. Glycine is also produced from serine via the SHMT reaction. Since Synechocystis

does not accumulate or secrete glycine, and no glycine degradation occurs via GLYDH, it

is exclusively utilized for biomass production, as a result of which, the glycine producing

branch of the photorespiratory pathway is identified to be growth-coupled. Moreover, the

SHMT reaction is identified to be the sole source of the one-carbon pool carried by

tetrahydrofolate. A trace flux is observed through glycerate indicating that the second half

of the photorespiratory pathway is inactive causing the phosphoserine pathway to be

growth-coupled. This flux distribution results in a unique bifurcated topology achieving

complete carbon conversion of RuBP to glycine and serine with no losses in the form of

CO2. Furthermore, cysteine is also synthesized from serine and completely routed to

biomass as there is no flux through the cysteine-degrading mercaptopyruvate pathway. The

overall topology of this pathway allows glycine and serine biosynthesis from bicarbonate

with a 100% carbon conversion efficiency while reinforcing the essentiality of the

glycolate pathway in Synechocystis (Eisenhut et al., 2008). This observation is in contrast

to the linear pathway proposed in earlier GSM models of Synechocystis (Knoop et al.,

2010) and affords a higher 13C enrichment of serine than glycine, consistent with

experimental observations (Huege et al., 2007; Young et al., 2011).

The genome-scale mapping model imSyn617 achieves an unambiguous resolution of

fluxes around the pyruvate node (Figure 3.4). Pyruvate synthesis occurs indirectly from

PEP via the anaplerotic PPC and ME reactions due to the inactivity of PK, methylglyoxal,

and serine degradation pathways. In addition to this, no flux is seen through the PPS

reaction indicating unidirectional flux from glycolysis to the TCA cycle, thereby localizing

any CO2 incorporated via PPC to the TCA cycle only. Acetyl-CoA is produced via the

PDH reaction for lipid synthesis and TCA metabolism. Absence of flux through all

alternate routes connecting AKG and succinate in conjunction with the lack of a glyoxylate

shunt (Thiel et al., 2017; Varman et al., 2013) renders the TCA cycle incomplete with a

bifurcated topology incapable of completely oxidizing acetyl-CoA (Figure 3.4b). As a

consequence, all reactions of the TCA cycle are identified to be growth coupled as

Synechocystis does not produce any organic acids as byproducts of photoautotrophic

metabolism (Young et al., 2011). Fumarate is not synthesized directly via the TCA cycle

but is instead generated as a byproduct of arginine and purine biosynthetic pathways. This

fumarate serves as a precursor for succinate required for growth, while the excess fumarate

is converted to malate via fumarate hydratase.

3.4. Discussion

In this chapter, genome-scale INST-MFA is applied to elucidate photoautotrophic

metabolism in Synechocystis. Reactions capable of carrying flux in iSyn731 (Saha et al.,

2012) are identified via FVA using extracellular flux measurement data (Young et al.,

2011). The corresponding GSMM model imSyn617 includes all carbon-balanced reactions.

Atom mapping for reactions shared with E. coli is derived from imEco726 (Gopalakrishnan

and Maranas, 2015a) and the remaining reactions are mapped using the CLCA algorithm

or based on reaction mechanism when available. A customized algorithm is developed with

improved scalability and memory efficiency leading to a 48% reduction per iteration in the

computational time required to simulate metabolite labeling dynamics in larger

networks. INST-MFA is performed to identify a suitable flux distribution accurately

recapitulating the labeling distribution and dynamics of 15 central metabolites obtained

during photoautotrophic growth of Synechocystis with 50% 13C-labeled bicarbonate as the

tracer (Young et al., 2011). In response to degeneracy in the metabolic network and

experimental errors, 95% confidence intervals were also determined using the established

procedure (Antoniewicz et al., 2006; Gopalakrishnan and Maranas, 2015a) to identify flux

ranges for all reactions.

Upon evaluating the significance of the improved recapitulation afforded by imSyn617

using the F-test, the F-statistic is 1.335 (p = 0.012). In comparison, the corresponding F-

statistic for scale-up in E. coli was 0.152 (p = 0.999) indicating that the core model accounts

for the carbon paths necessary to recapitulate the labeling data used in that study

(Gopalakrishnan and Maranas, 2015a). The increased uncertainty of flux estimation was

attributed to the inclusion of alternate paths with identical atom mapping information. In

contrast, the statistical significance associated with model scale-up in this study implies

that unique and often surprising insights into the carbon flows under phototrophic growth

are obtained by the re-analysis of an existing dataset using a detailed description of the

entirety of metabolism in Synechocystis. Flux elucidation of photoautotrophic growth of

Synechocystis using imSyn617 reveals that Synechocystis deploys a carbon efficient

metabolism enabling maximal conversion of fixed carbons to biomass precursors with

minimal production of organic acids and glycogen. This is in contrast to heterotrophic

bacteria such as E. coli where 35% of the taken-up glucose is secreted as acetate (Sandberg

et al., 2016) resulting in a 30% biomass yield loss from the theoretical maximum biomass

yield (Feist et al., 2007). The flux ranges estimated in this study provide a comprehensive

set of essential and dispensable metabolic reactions in Synechocystis under

photoautotrophic growth conditions to serve as a guideline for editing photosynthetic

prokaryotic genomes. The estimated flux ranges reveal that net carbon fixation accounts

for only 88% of the assimilated bicarbonate. The remaining 12% is fixed by PPC, but is

subsequently oxidized to CO2 via malic enzyme, TCA cycle, and peripheral metabolic

reactions. These carbons are not recycled by the CBB cycle and are therefore off-gassed.

This inability to recycle these carbons via the CBB cycle is identified as a target to improve

upon in photosynthetic carbon fixation. It is unclear from this analysis whether this is

caused by a rate-limiting enzyme in the CBB cycle or a paucity of available NADPH and

ATP as the fluxes through the photosynthetic light reactions and oxidative phosphorylation

are poorly resolved by INST-MFA. Resolution of these reactions requires knowledge of

the spectral composition of the light source and photon flux partitioning between

photosystems I and II to distinguish ATP production via non-cyclic and cyclic

photophosphorylation. When combined with the measurement of net oxygen evolution

rate, these measurements will allow accurate elucidation of fluxes through the

photosynthetic light reactions and oxidative phosphorylation. This will enable resolution

of NADPH production and provide insights into the biological impact of a mandatory flux

through Malic Enzyme, consistent with experimentally verified essentiality of this gene

(Bricker et al., 2004). Unlike in E. coli (Gopalakrishnan and Maranas, 2015a), here

alternate routes to lower glycolysis and TCA cycle are extremely well resolved based on

differences in metabolite labeling dynamics, thereby demonstrating the superior capability

of INST-MFA in resolving pathways with similar atom transitions and establishing the

dispensability of the lower TCA cycle under photoautotrophic growth conditions.

The introduced algorithmic procedure (Appendix B) for performing flux elucidation at a

genome-scale offers a 48% reduction in computation time which will grow with larger

models. As this scheme employs an exponential integrator, a moderate level of stiffness

can still be handled when pool sizes exceed 10⁻⁴ mmol/gdw. Stiffness in INST-MFA

models arises from a degeneracy in pathway labeling dynamics due to the inclusion of

more pool size parameters than necessary to recapitulate experimentally observed labeling

distributions. As a consequence of this, the confidence interval estimation procedure will

fail to compute an upper bound for many metabolite turnover rates (defined as the ratio of

flux through a metabolite to its pool size). Since fluxes are scaled to bicarbonate uptake

and bounded by stoichiometric mass balance constraints, the uncertainties in the estimation

of metabolite turnover rates will be reflected in uncertainties in pool size resolution but

do not affect flux confidence interval calculation as long as the solution lies outside the

stiff regions. Due to this, pool size ranges are not computed in this study. This would

require the development of a higher order implicit method for ODE integration so as to

ensure accuracy and stability of the procedure. It has been previously reported that

channeling plays a key role in explaining the observed metabolite labeling dynamics

(Huege et al., 2007; Young et al., 2011). The presence of substrate channeling was

hypothesized based on the existence of segregated metabolite pools inferred from dilution

parameters (Young et al., 2011). Consistent with earlier findings, up to 10% of the 3PG

and F6P pools are found to be metabolically inactive with no segregation of the PEP pool,

alluding to the presence of a channeling mechanism from 3PG to PEP. Quantification of

pool sizes will provide detailed insights into substrate channeling mechanisms arising from

CBB cycle enzyme co-localization similar to that seen in plant chloroplasts (Anderson and

Carol, 2004; Anderson et al., 2005; Suss et al., 1993). In conjunction with the bi-

functionality of the SBPase enzyme (Yan and Xu, 2008), this would explain the preference

for TAL-SBPase-SBA route for the regeneration of pentose sugar phosphates as opposed

to the conventional TPI-FBA-FBPase pathway despite the lack of an energetic advantage.

Nevertheless, the flux estimation algorithm always converged outside the stiff regions in

the solution space, indicating that the obtained flux ranges are not confounded by stiffness

of the system of ODEs describing metabolite labeling dynamics in GSM models. imSyn617

coupled with customized integrators enables the elucidation of fluxes with a global

coverage and high statistical confidence by re-analyzing already available labeling

datasets. This newly reached scope and fidelity in flux elucidation promises both to enhance

kinetic model parameterization (Khodayari and Maranas, 2016) and to facilitate the use

of strain design algorithms such as OptForce (Ranganathan et al., 2010) and k-OptForce

(Chowdhury et al., 2014).
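
The link between small pool sizes, high turnover rates, and stiffness can be illustrated on a single labeled pool. The following is a minimal Python sketch, assuming one metabolite pool fed by a constant 50% labeled input and obeying dx/dt = (v/c)(u - x), where v/c is the turnover rate; it is not the integrator described in Appendix B, and all names and numerical values are hypothetical. It contrasts an exponential update, which is exact for constant input, with an explicit Euler update at the same step size.

```python
import math


def exponential_step(x, u, turnover, h):
    """Advance dx/dt = turnover * (u - x) by h; exact when u is constant over the step."""
    return u + (x - u) * math.exp(-turnover * h)


def euler_step(x, u, turnover, h):
    """Explicit Euler update of the same equation; unstable when turnover * h > 2."""
    return x + h * turnover * (u - x)


def simulate(step, turnover, h, n_steps, x0=0.0, u=0.5):
    """Apply n_steps updates starting from an unlabeled pool (x0 = 0)."""
    x = x0
    for _ in range(n_steps):
        x = step(x, u, turnover, h)
    return x


fast_turnover = 1.0e4  # hypothetical flux-to-pool-size ratio for a very small pool
h = 1.0e-3             # step size, so that turnover * h = 10

x_exp = simulate(exponential_step, fast_turnover, h, 20)
x_eul = simulate(euler_step, fast_turnover, h, 20)
print(x_exp, x_eul)
```

In this toy setting the exponential update settles at the input enrichment, whereas the explicit Euler iterate grows without bound, which is the behavior that motivates specialized integrators for pools with very high turnover.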

Figure 3.1. Representation of central metabolism in Synechocystis. The reactions

exclusive to the core model and the GSM model are indicated in orange and green,

respectively. Metabolite drains for biomass formation and peripheral metabolism are

indicated in dashed arrows with GSM-specific interactions indicated in green. Completion

of the TCA cycle (AKGDH) is indicated using a dashed green arrow to represent the

existence of alternate routes between this pair of metabolites.

Figure 3.2. Carbon incorporation paths and conserved moiety cycling in imSyn617. (a)

CO2 reincorporation via photorespiration. Solid black circles represent reincorporated CO2

atoms. Reversible glycine degradation is the primary carbon scrambling reaction in this

pathway allowing the incorporation of degraded glyoxylate carbons as well as substrate

bicarbonate to generate three unique carbon arrangement patterns of 3PG. (b) Recycling of

conserved moieties within central metabolism. The conserved E4P moiety generated due

to the interaction between TAL from the non-oxidative PP pathway and SBA and SBPase

from the regeneration phase of the CBB cycle is indicated in red whereas the conserved

triose phosphate moiety recycled between the serine biosynthetic pathway,

photorespiration, and lower glycolysis is indicated in blue.

(a) [Atom-transition diagram; labeled species: RuBP, CO2, FOR, GLX, OXL, GLY, SER, 3PG, MEETHF]

(b)

Figure 3.3. Recapitulation of experimentally observed labeling distribution and

dynamics expressed in terms of variance-weighted sum of squares of residuals (SSRES)

using the core model (orange bars) and the GSMM model (green bars) of Synechocystis.

Fragments with an SSRES difference exceeding 25 are indicated using a black box.

Figure 3.4. Flux ranges (expressed in mmol/dmol bicarbonate uptake (BCU)) of central

metabolic reactions in Synechocystis during photoautotrophic growth predicted using a

core model (orange bars) and a GSMM model (green bars). The bars represent the range

of flux from its lower bound to its upper bound. The reaction names on the y-axis are

consistent with the nomenclature used in Figure 3.1. Excluded reactions in each model are

assumed to carry no flux.

Figure 3.5. Bifurcated topology of the photorespiratory pathway (a) and the TCA cycle

(b). Flux (in mmol/dmol BCU) through each reaction is specified in blue. Arrows indicate

direction of flux. Reaction abbreviations are consistent with Figure 3.1. The dashed arrows

represent metabolite drains for biomass production.

(a)

(b)

Figure 3.6. Recapitulation of labeling dynamics of CBB intermediates. Fit quantified

by the standard deviation-weighted residuals for the mass isotopomers of (a) PEP-167, (b)

3PGA-185, and (c) RUBP-309 at various time points for the core model (black bars) and

the GSM model (gray bars).

(a)

(b)

(c)

Figure 3.7. Carbon positional shifts in Synechocystis due to scrambling in upper

glycolysis, PPP, and the Calvin Cycle. (A) Carbon paths mapping positions C1 and C2 of

RuBP (RuBP-1,2) to 3PG via PPP and the Calvin cycle depicted as EMU reactions. (B)

Positional shifts of glucose carbon positions C2 and C3 upon flux through the PPP.

A

B

Figure 3.8. F-Test on the oxidative pentose phosphate pathway. (a) Recapitulation of

experimentally observed labeling distribution and dynamics expressed in terms of

variance-weighted sum of squares of residuals (SSRES) using the core model of

Synechocystis when the oxidative pentose phosphate pathway is allowed to carry flux (gray

bars) and when it is constrained to carry no flux (black bars). The total SSRES increases

from 684 to 742. A statistically significant reduction in SSRES is seen upon permitting

flux through the oxidative pentose phosphate pathway (F = 59.1, p = 5 × 10⁻¹⁴). However,

both flux distributions are statistically acceptable. (b) Comparison of labeling dynamics of

RuBP309 fragment when the oxidative pentose phosphate pathway is active (gray bars)

and when it is constrained to carry no flux (black bars). The increase in SSRES arises from

an acceleration in labeling dynamics based on a reduction in the unlabeled fraction upon

constraining the flux through the oxidative pentose phosphate pathway to zero.

(a)

(b)

Figure 3.9. F-Test on Transaldolase. Recapitulation of experimentally observed

labeling distribution and dynamics expressed in terms of variance-weighted sum of squares

of residuals (SSRES) using imSyn617 when the transaldolase (TAL) reaction is allowed to

carry flux (gray bars) and when it is constrained to carry no flux (black bars). The total

SSRES increases from 511 to 541. A statistically significant reduction in SSRES is seen

upon permitting flux through the TAL reaction (F = 16.32, p = 6.1 × 10⁻⁵). However, both

flux distributions are statistically acceptable. Upon constraining the flux through TAL to

zero, carbon flux is diverted through the conventional FBA/FBPase route, accompanied by

a reduction of flux through the SBA reaction.

Chapter 4

K-FIT: An accelerated kinetic parameterization algorithm using steady-

state fluxomic data

4.1. Introduction

The pressing need for the rapid development of truly predictive models of metabolism to

accelerate build-design-test cycles for metabolic engineering has been widely reported

(Cheng and Alper, 2014; Dromms and Styczynski, 2012; Long et al., 2015). Advances in

synthetic biology (Chae et al., 2017; Cho et al., 2018; Stovicek et al., 2017) have alleviated

the challenge of genome editing placing the onus on the decision of what genetic

modifications to carry out in metabolic engineering projects. There already exist a number

of strain design algorithms that operate on genome-scale stoichiometric descriptions of

metabolism including OptKnock (Burgard et al., 2003), RobustKnock (Tepper and Shlomi,

2010), BiMOMA (Kim et al., 2011), and OptForce (Ranganathan et al., 2010) which have

been successfully applied to engineer glutamate and succinate overproducing strains in E.

coli (Kim et al., 2011), fatty acid production in E. coli (Ranganathan et al., 2012; Xu et al.,

2011) and overproduction of flavonoid precursor shikimate in Saccharomyces cerevisiae

(Suastegui et al., 2017).

Because stoichiometric models do not directly capture the effect of metabolite

concentration changes, protein level fluctuations, enzyme saturation, or allosteric

regulation of enzymatic activity (Chowdhury et al., 2015a; Saa and Nielsen, 2017), strain

design algorithms cannot identify interventions to allosteric and transcriptional regulations

such as the ptsG knockout for succinate overproduction in E. coli (Chowdhury et al.,

2015b) or upregulation of the transcription regulator FapR for malonyl-CoA

overproduction in E. coli (Xu et al., 2014). Kinetic models of metabolism can alleviate

these shortcomings by quantitatively describing the relationship between fluxes, enzyme

levels, and metabolite concentrations based on mechanistic and/or approximate rate law

formalisms. This allows kinetic models to trace the effect of allosteric regulation and

enzyme level changes (van Eunen et al., 2012), assess metabolic changes in response to

altered carbon sources (Kotte et al., 2010), resolve the accessibility of metabolic steady-

states (Lafontaine Rivera et al., 2017), and predict metabolic changes in response to drug

interventions (Frohlich et al., 2018).

The promise of superior product yield prediction offered by kinetic models by tracking

both enzyme levels and metabolite concentrations through metabolism comes at the

expense of substantially increased experimental data requirements and complexity in

model assembly, parameterization and interpretation of results. Construction of a kinetic

model requires knowledge across all reactions of (i) the mechanism of enzyme catalysis,

(ii) the effect of regulators (activators and allosteric inhibitors), and (iii) a network model

that is elementally balanced accurately reflecting the reaction stoichiometry. The catalytic

mechanism of the enzymes can be obtained from literature or from kinetic model

repositories such as KiMoSys (Costa et al., 2014) and information about effectors can be

obtained from databases such as BRENDA (Placzek et al., 2017) and SABIO-RK (Wittig

et al., 2012). Although it is tempting to also rely on database entries from BRENDA for

obtaining enzyme kinetic parameter values derived largely from in vitro enzyme assays,

limited availability of organism-specific data and differences between in vivo and in vitro

assay conditions lead to the haphazard integration of heterogeneous datasets into the same

kinetic model. Even when sufficient in vitro derived kinetic information is available to

construct a kinetic model, meaningful results are not necessarily achieved (Teusink et al.,

2000).

Therefore, in vivo kinetic parameters must be estimated by solving a nonlinear

programming (NLP) problem that recapitulates experimentally measured temporal

concentration profiles (Jahan et al., 2016) or steady-state fluxes (Khodayari et al., 2014) in

WT and genetically and/or environmentally perturbed mutants. The computational difficulties

arising from non-convexity in this NLP have been alleviated in the past by linearization of

nonlinear rate laws around a reference steady-state using a log-linear formalism

(Hatzimanikatis and Bailey, 1997) which forms the basis for the ORACLE framework

(Miskovic and Hatzimanikatis, 2010). However, linearization about a reference state limits

the predictive capabilities to the vicinity of the reference state (Saa and Nielsen, 2017).

Other mechanistic frameworks bypassing linearization such as Ensemble Modeling (EM)

(Tran et al., 2008) relate fluxes to metabolite concentrations using mass-action kinetics in

conjunction with elementary-step decomposition of the mechanism of enzyme catalysis.

Conservation of mass across metabolites and enzymes is decomposed into systems of

bilinear algebraic equations, allowing convenient insertion/deletion of regulatory

components. Simulation of concentration dynamics and identification of steady-state

fluxes requires integration of the system of ODEs representing conservation of mass across

metabolites (Hoops et al., 2006; Tran et al., 2008). In addition to these, many kinetic

models based on Michaelis-Menten and Hill kinetic formalisms have also been constructed

(Chassagnole et al., 2002; Srinivasan et al., 2018).
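
The elementary-step, mass-action description used by Ensemble Modeling can be illustrated on a single reversible uni-uni enzyme. The following is a minimal Python sketch, assuming the two-step mechanism E + S <-> ES <-> E + P with fixed metabolite concentrations; the function name and all rate constants are hypothetical. The bilinear terms (rate constant times metabolite times free enzyme) are combined with the enzyme conservation e + es = e_total to give the steady-state enzyme fractions and flux in closed form.

```python
def uni_uni_steady_state(k1f, k1r, k2f, k2r, s, p, e_total=1.0):
    """Steady state of E + S <-> ES <-> E + P under mass-action kinetics.

    Returns the free enzyme level, the enzyme-substrate complex level, and the net flux.
    """
    formation = k1f * s + k2r * p   # pseudo-first-order routes from E to ES
    breakdown = k1r + k2f           # first-order routes from ES back to E
    es = e_total * formation / (formation + breakdown)
    e = e_total - es                # enzyme conservation
    flux = k2f * es - k2r * p * e   # net rate of the product-release step
    return e, es, flux


# Irreversible limit (k2r = 0): the flux matches Michaelis-Menten kinetics with
# Vmax = k2f * e_total = 3 and Km = (k1r + k2f) / k1f = 2, so v is approximately 2.0.
e, es, v = uni_uni_steady_state(k1f=2.0, k1r=1.0, k2f=3.0, k2r=0.0, s=4.0, p=0.0)
print(e, es, v)
```

In a network model, one such set of elementary steps is written per reaction, and regulatory interactions can be added or removed as additional elementary steps without altering the remaining equations.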

Unfortunately, the non-convexity in the NLP arising from enforcing conservation of mass

across all species limits the direct use of local optimization solvers such as MINOS

(Murtagh and Saunders, 1978), CONOPT (Drud, 1985) or fmincon within MATLAB™.

Instead, metaheuristic approaches such as genetic algorithms (GA) (Khodayari et al.,

2014) and particle swarm optimization (Millard et al., 2017) have been used in the past for

the traversal of kinetic parameter solution space. Meta-heuristic algorithms rarely saturate

the kinetic parameter space with function evaluations and require over 50,000 hours on a

high-performance computing cluster to parameterize a near-genome-scale kinetic model

containing 5,239 kinetic parameters (Khodayari and Maranas, 2016). More importantly,

they cannot confirm optimality of a reported solution due to the lack of gradient evaluations

which must be calculated using the computationally expensive forward sensitivity analysis

(Raue et al., 2013). The lack of efficient gradient estimation also prevents the evaluation

of local sensitivities along with any follow-up investigations on kinetic parameter

uncertainty. Although recasting the problem within a Bayesian framework such as GRASP

(Saa and Nielsen, 2015) permits the computation of confidence intervals, applicability of

GRASP to large kinetic models is limited by the poor scalability of the underlying Monte-

Carlo-based sampling methods within the Bayesian paradigm (Saa and Nielsen, 2017).

Thus, long parameterization times stemming from poor scalability of existing

parameterization frameworks ultimately preclude the efficient computation of local

sensitivities of steady-state fluxes to kinetic parameters. As a result of this, accurate

confidence intervals for estimated kinetic parameters cannot be readily calculated, and

insights into (i) the robustness of resolution of kinetic parameters given mutant flux

datasets, (ii) kinetic parameter confidence levels, and (iii) the need for follow-up measurements

to improve prediction cannot be gleaned. In response to these challenges we put forth the

K-FIT algorithm, a decomposition-based approach for parameterization of kinetic models

using steady-state fluxomic and/or metabolomic data collected for multiple perturbation

mutants. K-FIT builds upon the concept of Ensemble Modeling (EM) by anchoring

concentrations and kinetic parameters to a reference strain but unlike earlier efforts

employing a genetic algorithm (Khodayari et al., 2014) to parameterize the model, K-FIT

achieves many orders of magnitude improvement in efficiency by relying on a customized

decomposition approach. K-FIT was first benchmarked against EM for three test kinetic

models of increasing size ranging from 100 to 953 kinetic parameters to demonstrate the

increase in computational savings with model size. K-FIT remained tractable for even a

large kinetic model containing 307 reactions, 258 metabolites, and 2,407 kinetic

parameters parameterized with 1,728 steady-state fluxes from six single gene-deletion

mutants determined using 13C-metabolic flux analysis (13C-MFA) (Long et al., 2018).

The parameterization was carried out 100 times with random initializations and was

completed within 48 hours of computation time. The best solution was recovered 44 out of

100 times providing confidence that convergence to the true optimum was indeed achieved.

The kinetic model k-ecoli307 accurately recapitulated fluxes to within 15 mmol/gdw-h of

the values reported by 13C-MFA while also predicting fluctuations in glucose uptake in

response to genetic perturbation and flux rerouting through energy metabolism to meet

biosynthetic NADPH demands. The yield predictions of acetate, lactate, and malate for

engineered strains were found to be within 30% of the experimental yield for metabolites

derived from central metabolism. The presented algorithm includes local sensitivity

calculations for computation of gradients which ensures optimality of obtained solutions.

This feature will enable follow-up calculations on uncertainty in parameter estimations and

control coefficients to aid efficient design of experiments, improve prediction fidelity of

kinetic models, and inform metabolic engineering strategies.

4.2. Methods

4.2.1. Kinetic parameterization using K-FIT

K-FIT is a gradient-based kinetic parameterization algorithm that minimizes the least-

squares objective function representing the weighted squared deviation between predicted

and measured steady-state metabolic fluxes (and possibly metabolite concentrations)

across multiple genetic perturbation mutants. The full mathematical description for the K-

FIT algorithm is provided in the supplementary methods. The least-squares NLP is solved

using the Levenberg-Marquardt algorithm (Madsen et al., 2004) in conjunction with the

active-set method for enforcing linear inequality constraints (Gill et al., 1984). K-FIT is

encoded and implemented in MATLAB™ and run on an Intel-i7 (4-core processor,

2.6GHz, 12GB RAM) computer. K-FIT is tested using kinetic models at different size

scales. The full source-code is made available on GitHub.
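As an illustration of the damped Gauss-Newton iteration that underlies the Levenberg-Marquardt method, the following Python sketch fits a two-parameter Michaelis-Menten rate law to synthetic data. It is not the K-FIT source code, which is written in MATLAB. The function names, the toy data, and the damping schedule (factors 0.3 and 10) are assumptions made for brevity, and the active-set handling of inequality constraints is omitted.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, x0, lam=1e-2, tol=1e-9, max_iter=200):
    """Minimize sum(residual(x)**2) with a damped Gauss-Newton iteration."""
    x = np.asarray(x0, dtype=float)
    r = residual(x)
    for _ in range(max_iter):
        J = jacobian(x)
        g = J.T @ r                       # half the gradient of the SSR
        if np.linalg.norm(g) < tol:       # first-order optimality reached
            break
        H = J.T @ J                       # Gauss-Newton approximation to half the Hessian
        step = np.linalg.solve(H + lam * np.eye(x.size), -g)
        r_trial = residual(x + step)
        if r_trial @ r_trial < r @ r:     # SSR decreased: accept, relax the damping
            x, r, lam = x + step, r_trial, lam * 0.3
        else:                             # SSR increased: reject, damp more strongly
            lam *= 10.0
    return x, float(r @ r)

# Toy data generated from v = Vmax * s / (Km + s) with Vmax = 5, Km = 0.6 (hypothetical values).
s = np.array([0.1, 0.5, 1.0, 2.0, 5.0])
v_meas = 5.0 * s / (0.6 + s)
residual = lambda p: p[0] * s / (p[1] + s) - v_meas
jacobian = lambda p: np.column_stack([s / (p[1] + s), -p[0] * s / (p[1] + s) ** 2])

p_hat, ssr = levenberg_marquardt(residual, jacobian, x0=[4.0, 1.0])
```

In a weighted fit, each residual would additionally be divided by the standard deviation of the corresponding measurement before being passed to the solver.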

Computation of standard deviations for estimated kinetic parameters was performed using

linear regression tools applied to a local linearization of mutant fluxes using Taylor series

expansion (Wiechert et al., 1997). Briefly, the covariance matrix 𝑪 is computed by

inverting the Hessian 𝑯 computed by K-UPDATE. When the linear approximation holds,

the diagonal of the covariance matrix represents the estimation variance of kinetic

parameters. The approximate standard deviation of kinetic parameter 𝑘𝑝 (𝜎𝑝) is evaluated

as 𝜎𝑝 = √𝐶𝑝𝑝. The approximate confidence interval is computed as 𝑘𝑝 ± 𝜎𝑝.
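The calculation above can be sketched in a few lines of Python. The sketch assumes that the Hessian is approximated by the Gauss-Newton form JᵀWJ, where J holds the sensitivities of the fitted fluxes to the parameters and W contains the inverse measurement variances; the Hessian actually assembled by K-UPDATE may differ, and the numbers below are purely illustrative.

```python
import numpy as np

def parameter_std(jacobian, flux_sigma):
    """Approximate parameter standard deviations from a local linearization."""
    # Scale each residual row by 1/sigma_j so that H = J^T W J with W = diag(1/sigma^2).
    weighted_j = jacobian / flux_sigma[:, None]
    hessian = weighted_j.T @ weighted_j
    covariance = np.linalg.inv(hessian)
    return np.sqrt(np.diag(covariance)), covariance

# Hypothetical sensitivities of three fluxes to two kinetic parameters.
J = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])
sigma_flux = np.array([0.1, 0.1, 0.2])
sigma_k, C = parameter_std(J, sigma_flux)
```

An ill-conditioned Hessian signals parameters that cannot be resolved from the data; in that case a pseudo-inverse or a reduced parameter set would be needed, which this sketch does not attempt.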

4.2.2. Construction of the expanded kinetic model of E. coli, k-ecoli307

The expanded metabolic model is constructed by de-lumping the central and peripheral

metabolic pathways in the core model (Foster et al., 2019 (Under Review)) based on the

reported biomass composition (Neidhardt and Curtiss, 1996). The expanded model

contains 307 reactions and 258 metabolites. Atom mappings for the additional reactions

were obtained from the previously published genome-scale carbon mapping model for E.

coli (Gopalakrishnan and Maranas, 2015a). The amino acid labeling data for flux

elucidation was obtained from the published work by Long et al. (2018). Metabolic fluxes

and 95% confidence intervals were elucidated using 13C-metabolic flux analysis as

described earlier (Antoniewicz et al., 2006; Gopalakrishnan and Maranas, 2015a). The

mechanism and allosteric regulation of enzyme-catalyzed reactions in the model were

obtained from k-ecoli457, the near-genome-scale kinetic model for E. coli (Khodayari and

Maranas, 2016). The standard deviation $\sigma_j$ corresponding to the estimated flux $V_j$ to be

used as a weighting factor in the K-FIT algorithm is computed from the lower and upper

bounds of the confidence interval, $V_j^{LB}$ and $V_j^{UB}$, reported by 13C-MFA as

$\sigma_j = (V_j^{UB} - V_j^{LB}) / 3.92$.
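The divisor 3.92 is twice 1.96, the half-width of a two-sided 95% interval of a normal variable expressed in standard deviations. A minimal Python sketch of the conversion follows; the function name and the flux bounds are hypothetical.

```python
def sigma_from_ci(lower, upper):
    """Standard deviation implied by a symmetric 95% confidence interval."""
    # A two-sided 95% normal interval spans 2 * 1.96 = 3.92 standard deviations.
    return (upper - lower) / 3.92

sigma = sigma_from_ci(8.04, 11.96)   # hypothetical flux bounds in mmol/gdw-h
```

The conversion presumes an approximately symmetric interval; strongly asymmetric intervals from 13C-MFA would be only roughly represented by a single standard deviation.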

Computed kinetic parameters were also packaged into Michaelis-Menten parameters as

described earlier (Khodayari and Maranas, 2016).

4.3. Results

In this section, the workflow of K-FIT is first described schematically (see Figure 4.1).

The performance of K-FIT is next benchmarked against Ensemble Modeling

(EM) (Khodayari et al., 2014) using three test kinetic models to assess the impact of model

scale-up on the computational savings afforded by K-FIT. The applicability of K-FIT to

near genome-scale models is then demonstrated using an expanded kinetic model for E.

coli (k-ecoli307) containing 307 reactions, 258 metabolites, and 2,407 kinetic parameters

parameterized using 13C-amino acid labeling data in six single gene-deletion mutants

(Long et al., 2018). Confidence intervals for all estimated elementary kinetic and

Michaelis-Menten parameters are estimated by leveraging the gradient calculations

embedded within K-FIT. The predictive capability of k-ecoli307 is then assessed by

comparing predicted product yields against experimentally measured yields in six over-

producing strains.

4.3.1. The K-FIT Algorithm

K-FIT is a gradient-based kinetic parameter estimation algorithm using steady-state flux

measurements from multiple genetic perturbation mutants. The schematic workflow for K-

FIT is shown in Figure 4.1 and Figure 4.4. Reaction fluxes are related to metabolite

concentrations using mass-action kinetics after decomposition of the enzyme catalytic

mechanism into elementary steps (see Appendix C for the detailed procedure for

elementary step decomposition). Conservation of mass across enzyme complexes and

metabolites is therefore expressed as a system of bilinear equations. This formalism was

chosen because it is mechanistically sound, obeys mass conservation laws and is inherently

thermodynamically feasible (Saa and Nielsen, 2017) while at the same time allows for easy

integration of allosteric regulation without the need to derive cumbersome nonlinear rate

laws. The K-FIT algorithm iteratively applies the following three steps till convergence is

reached: (i) K-SOLVE, (ii) Steady-State Flux Evaluator (SSF-Evaluator), and (iii) K-

UPDATE.

The objective of K-SOLVE is to anchor kinetic parameters to the reference state (WT

network) as described by Tran et al. (2008). K-SOLVE uses as input the reverse

fluxes of all elementary steps and the enzyme fractions for the WT metabolic network. It

also removes the measured WT fluxes from the sum of the least squares objective function.

It then uses the resultant equalities, WT elementary fluxes and WT enzyme fractions to

satisfy all remaining degrees of freedom and assign unique values to the kinetic parameters

so that they inherently satisfy mass balances across metabolites and enzymes in the WT

network. This is an important consideration as mass balances are not always satisfied under

metabolic steady-state for an arbitrary assignment of values to the kinetic parameters.
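The anchoring idea can be illustrated for a single enzyme with the hypothetical mechanism E + S <-> ES <-> E + P. The sketch below assumes mass-action kinetics, metabolite concentrations normalized to one in the reference strain (as is customary in Ensemble Modeling), and a net flux through every elementary step equal to the reaction flux V at steady state. The function name and all numerical values are illustrative and do not come from the K-FIT source code.

```python
def anchor_uni_uni(V, v_rev, e_free, e_bound, s=1.0, p=1.0):
    """Rate constants for E + S <-> ES <-> E + P that reproduce the reference flux V."""
    v1r, v2r = v_rev                  # sampled reverse fluxes of the two elementary steps
    v1f, v2f = V + v1r, V + v2r       # net flux through every step equals V at steady state
    return {
        "k1f": v1f / (e_free * s),    # E + S -> ES
        "k1r": v1r / e_bound,         # ES -> E + S
        "k2f": v2f / e_bound,         # ES -> E + P
        "k2r": v2r / (e_free * p),    # E + P -> ES
    }

k = anchor_uni_uni(V=2.0, v_rev=(1.0, 0.5), e_free=0.4, e_bound=0.6)
```

Because each rate constant is obtained by dividing an elementary flux by the abundance of its reactant species, the resulting parameter set reproduces the reference flux by construction, whatever reverse fluxes and enzyme fractions are supplied.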

Kinetic parameters anchored by K-SOLVE are then used by SSF-Evaluator to compute the

steady-state fluxes and concentrations across all mutants one at a time. The system of

bilinear algebraic equations in metabolite and enzyme complex concentration resulting

from the elementary step decomposition of enzyme catalysis is decomposed into two sub-

problems. The first bilinear sub-problem representing conservation of mass across enzyme

complexes, is reduced to a system of linear algebraic equations in enzyme complex

concentrations when the metabolite concentrations are specified. The second bilinear sub-

problem describing conservation of mass across all metabolites reduces to a system of

linear algebraic equations in metabolite concentrations when the concentrations of enzyme

complexes are specified. By iterating between these two linear sub-problems, the

steady-state enzyme levels and metabolite concentrations in each mutant are identified.

Convergence is achieved when the concentrations of enzyme complexes and metabolites

remain almost unchanged between successive iterations. This iterative scheme therefore

enables the direct evaluation of steady-state fluxes without the need to integrate any ODEs

and contributes to the speed-up of the kinetic parameterization process.
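The alternation between the two linear sub-problems can be illustrated with a deliberately small toy system: one metabolite A supplied at a constant rate v_in and consumed by one enzyme through the steps E + A <-> EA (rate constants k1, k2) and EA -> E + B (k3). This is a sketch under these assumptions rather than the SSF-Evaluator implementation; in a full model each sub-problem would be a matrix equation instead of the scalar expressions used here, and all names and values are hypothetical.

```python
def steady_state(v_in, k1, k2, k3, a=1.0, tol=1e-12, max_iter=1000):
    """Alternate between the enzyme balance and the metabolite balance.

    Mechanism: E + A <-> EA (k1, k2), EA -> E + B (k3); A is supplied at rate v_in.
    """
    for iteration in range(max_iter):
        # Sub-problem 1: with A fixed, the enzyme balance is linear in (e, ea):
        #   k1*a*e - (k2 + k3)*ea = 0,   e + ea = 1
        ea = k1 * a / (k1 * a + k2 + k3)
        e = 1.0 - ea
        # Sub-problem 2: with (e, ea) fixed, the balance on A is linear in A:
        #   v_in - k1*e*a + k2*ea = 0
        a_new = (v_in + k2 * ea) / (k1 * e)
        if abs(a_new - a) <= tol * max(1.0, abs(a)):
            return a_new, e, ea, iteration + 1
        a = a_new
    raise RuntimeError("no convergence; v_in may exceed the enzyme capacity k3")

a, e, ea, n_iter = steady_state(v_in=2.0, k1=10.0, k2=1.0, k3=5.0)
flux = 5.0 * ea   # k3 * ea, the steady-state flux through the enzyme
```

For this toy system the update of A is a linear contraction with ratio (v_in + k2)/(k2 + k3), so the iteration converges whenever v_in is below the maximal rate k3, and the fixed point coincides with the Michaelis-Menten solution A = v_in Km/(k3 - v_in) with Km = (k2 + k3)/k1. No comparable guarantee is implied for larger networks.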

The calculated fluxes for all mutants are compared against the corresponding measured

fluxes, and the sum of squared residuals (SSR) and its first- and second-order gradients are

computed by K-UPDATE. The WT reverse elementary fluxes and WT enzyme fractions

are then updated using a Newton step and the core loop of K-FIT is repeated until the

minimum deviation of predicted fluxes from experimental measurements is reached. The

local sensitivity of fluxes with respect to kinetic parameters for gradient calculation can

now readily be computed by solving a system of linear equations (see supplementary

methods) as opposed to having to perform costly forward sensitivity analysis. This enables

K-FIT to confirm that any reported solution is indeed optimal while also allowing for the

assembly of the covariance matrix from which approximate confidence intervals for the

estimated kinetic parameters can be efficiently calculated.
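The linear system for the sensitivities follows from the implicit function theorem applied to the steady-state balances. The generic derivation is sketched below, where $f$ denotes the stacked mass balances, $x$ the steady-state concentrations, $k$ the kinetic parameters, and $V$ the fluxes; these symbols are chosen here for illustration, and the exact formulation used by K-FIT is the one given in the supplementary methods.

```latex
\begin{align}
  f(x(k), k) &= 0
  && \text{(steady-state mass balances)} \\
  \frac{\partial f}{\partial x}\,\frac{\partial x}{\partial k}
  + \frac{\partial f}{\partial k} &= 0
  && \text{(differentiate with respect to } k\text{)} \\
  \frac{\partial f}{\partial x}\,\frac{\partial x}{\partial k}
  &= -\frac{\partial f}{\partial k}
  && \text{(linear system in } \partial x / \partial k\text{)} \\
  \frac{dV}{dk} &= \frac{\partial V}{\partial k}
  + \frac{\partial V}{\partial x}\,\frac{\partial x}{\partial k}
  && \text{(chain rule for the fluxes)}
\end{align}
```

Because the same coefficient matrix $\partial f / \partial x$ appears for every parameter, a single factorization can in principle be reused across all right-hand sides.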

4.3.2. Benchmarking K-FIT against Ensemble Modeling

The computational performance of the K-FIT algorithm was first compared against

solution with a genetic algorithm (GA) operating on a population of models constructed

using the Ensemble Modeling (EM) approach. Three test models of increasing sizes were

used to assess the impact of model size on parameter estimation speed and solution

reproducibility. The first small model containing 14 reactions, 11 metabolites, and 100

kinetic parameters was adapted from the three glycolytic pathways in E. coli and was

parameterized using flux distributions from four single gene-deletion mutants (Figure

4.5a). The second medium-sized kinetic model containing 33 reactions, 28 metabolites,

and 235 kinetic parameters was adapted from a previous study (Greene et al., 2017) and

was parameterized using flux distributions from seven single gene-deletion mutants

(Figure 4.5b). The third test model describing carbon flows through central and amino acid

metabolism was adapted from the model developed by Foster et al. (2019 (Under

Review)). This model (Figure 4.5c) contains 108 reactions, 65 metabolites, and 953 kinetic

parameters and was parameterized using flux distributions from seven single gene-deletion

mutants.

Kinetic parameters were estimated using K-FIT in 9 minutes, 30 minutes, and 4 hours for

the three models, respectively. In contrast, EM required 60 hours, 726 hours, and 4,278

hours, respectively to parameterize the same three models. Computational speed-up

increased from 100-fold for the first model to 1000-fold for the core kinetic model upon

switching from GA to K-FIT. This dramatic reduction in parameterization time arises from

the (largely) integration-free steady-state flux evaluation using the SSF-Evaluator step and

the fact that K-FIT traverses the variable space in a highly economical manner (i.e., Newton

steps) requiring fewer than 500 steady-state flux evaluations to identify the optimal

solution. In contrast, the GA approach relies on iterative recombination and mutation of

kinetic parameter vectors requiring as many as 20,000 steady-state flux evaluations before

finding the same solution (Khodayari et al., 2014) though without confirming optimality.

SSF-Evaluator evaluated steady-state fluxes, on average, in 0.42, 1.13, and 6.12 seconds,

for the three models, respectively, whereas, numerical integration required 3.5, 120, and

440 seconds, respectively. Bypassing integration also enables SSF-Evaluator to handle stiff

systems of ODEs arising from the large dynamic range of kinetic parameters and ensures

that steady-state fluxes are always within a mass imbalance of just 0.001 mol%. It is

important to note that Newton’s Method can only guarantee convergence to a local and not

necessarily the global minimum of SSR. As a safeguard against failure to reach the true

minimum, K-FIT was run 100 times starting from random initial starting points. For the

three test models K-FIT exhibited a best solution recovery of 98%, 93%, and 60%,

respectively. This high solution reproducibility provides confidence that K-FIT is able to

consistently converge to the lowest SSR of 0 for the small model, 8.9 for the medium-sized

model, and 1.3 for the core model, respectively. Notably, no alternate optima in the vicinity

of the best solution (within an SSR of 100) were detected for any of the three models

implying that the best SSR minimization solution is the only good kinetic parameterization

candidate.
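The best-solution recovery reported above is taken here to be the fraction of randomly initialized runs that end at the lowest SSR found. A minimal Python sketch of that bookkeeping follows; the tolerance and the list of final SSR values are hypothetical and are not results from this study.

```python
import numpy as np

def recovery_rate(final_ssr, tol=1e-6):
    """Best SSR and the fraction of runs that reached it (within tol)."""
    final_ssr = np.asarray(final_ssr, dtype=float)
    best = final_ssr.min()
    return float(best), float(np.mean(final_ssr <= best + tol))

# Hypothetical final SSR values from ten randomly initialized runs.
runs = [1.3, 1.3, 250.0, 1.3, 1.3, 97.4, 1.3, 1.3, 1.3, 640.2]
best, rate = recovery_rate(runs)
```

Widening the tolerance to an SSR window (for example, 100 units above the best value) gives a simple screen for alternate optima near the best solution.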

The inherent ability of K-FIT to quickly calculate local sensitivities of predicted fluxes to

metabolite concentrations was leveraged to confirm whether SSF-Evaluator reported

steady-state concentrations that are stable. To this end, the eigenvalues of the Jacobian

matrix (Greene et al., 2017) at metabolic steady-state were calculated to confirm that the

real parts of all eigenvalues were strictly negative for all iterations and problems solved.
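The stability test described above amounts to checking the sign of the real parts of the Jacobian eigenvalues. A minimal NumPy sketch is given below; the two matrices are hypothetical examples and not Jacobians from any of the models in this chapter.

```python
import numpy as np

def is_stable(jacobian, tol=0.0):
    """True when every eigenvalue of the Jacobian has a strictly negative real part."""
    return bool(np.all(np.linalg.eigvals(jacobian).real < -tol))

# Hypothetical Jacobians d(dx/dt)/dx evaluated at a steady state.
stable_J = np.array([[-2.0, 1.0],
                     [0.5, -1.0]])
unstable_J = np.array([[0.5, 1.0],
                       [0.0, -1.0]])
```

A small positive `tol` can be used to flag marginally stable steady states whose leading eigenvalue is negative only within numerical error.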

The confidence intervals for the inferred kinetic parameters revealed that elementary

kinetic parameters were generally unresolved in all three models. For the first model, only

ten elementary kinetic parameters were resolved with a standard deviation less than 10%.

For the medium and core kinetic models, 86 and 90 kinetic parameters are resolved with a

standard deviation less than 10%. In order to investigate the origin of these wide confidence

intervals, we computed the confidence intervals for the enzyme fractions and elementary

fluxes in the WT strain that serve to anchor kinetic parameters in the K-SOLVE step. We

found that all elementary fluxes in all three models were estimated with an uncertainty of

less than 10%. In fact, the average standard deviation in the estimation of elementary fluxes

for the small, medium-sized, and core models was only 1.7%, 2.8%, and 0.97%,

respectively. However, the corresponding average standard deviation for enzyme fractions

for the three models was higher with values of 0.64, 0.27, and 1.81 mol/mol-total enzyme,

respectively. For the three models, only 4, 53, and 52 enzyme fractions were resolved with

a standard deviation less than 0.1 mol/mol-total enzyme. Since enzyme fractions are

bounded between zero and one, this implies that enzyme fractions are generally poorly

resolved for all three models. The uncertainty in the estimation of enzyme fractions

propagates to the aggregated kinetic parameters resulting in the observed wide confidence

intervals. The better resolution of enzyme fractions for larger models can be traced back to

the availability of flux data in more mutants for the medium and core models (seven and

six mutants, respectively) compared to only four mutants for the small kinetic model.

4.3.3. Parameterization of a kinetic model (k-ecoli307) for E. coli with near-genome-

wide coverage

Following the application of K-FIT on three test models of increasing size, K-FIT was

deployed for the parameterization of k-ecoli307, an E. coli kinetic model with near-

genome-wide coverage similar to k-ecoli457 (Khodayari and Maranas, 2016). The

expanded model containing 307 reactions, 258 metabolites and 2,407 kinetic parameters

encompasses central metabolism, expanded amino acid, fatty acid, and nucleotide

pathways, and lumped pathways for peptidoglycan biosynthesis. Compared to k-ecoli457,

this model lacks the pathways for anaerobic metabolism and secretion of organic acids as

it was parameterized using data under aerobic growth only. Flux data for six single gene-

deletion mutants were computed using 13C-Metabolic Flux Analysis (13C-MFA) to

recapitulate the measured labeling distribution of 10 proteinogenic amino acids and two

sugar phosphates grown with 1,2-13C-glucose as the carbon tracer (Long et al., 2018). This

provided a total of 1,728 MFA-determined fluxes for kinetic parameterization from the six

mutants. All 69 substrate level regulatory interactions for 26 reactions in the expanded

model were transferred from k-ecoli457. Complete cofactor balances were not included in

k-ecoli307. Instead, ATP was modeled as an energy sink replenished from the hydrolysis

of a single phosphate group, and NADH and NADPH balances were modeled as electron

pairs transferred. This simplification is necessary to allow the total pool of ATP, NADH

and NADPH to fluctuate across mutants, which would be otherwise impossible due to

metabolite pool dependencies introduced by cofactor recycling. The expanded model was

parameterized using flux distributions from the six mutants Δpgi, Δgnd, Δzwf, Δeda, Δedd,

and Δfbp with 288 fitted fluxes per mutant. One hundred fluxes were inferred using 13C labeling

data whereas 207 reactions were growth-coupled. Parameterization using the K-FIT

algorithm was completed in 48 hours on an Intel-i7 (4-core processor, 2.6GHz, 12GB

RAM) computer with a minimum SSR of 131 and a solution reproducibility of 44%.

The recapitulation of the experimentally measured fluxes by K-FIT for the six mutants is

shown in Figure 4.2. All predicted fluxes were within 15 mmol/gdw-h of their

corresponding flux reported by 13C-MFA. This corresponds to a maximum deviation of

only 10% from the experimentally determined fluxes. Flux distributions for Δeda, Δedd,

and Δfbp mutants were largely unchanged from WT (Figures 4.2a, 4.2b, and 4.2c) alluding

to the dispensability of the corresponding genes. In contrast, carbon flux was significantly

rerouted in response to the knockout of pgi, zwf, and gnd genes. Glucose uptake remained

similar to WT for the Δzwf mutant but routed completely via the EMP pathway (Figure

4.2d). The non-oxidative pentose phosphate pathway (TKT and TAL reactions) operated

in reverse to generate ribose-5-phosphate for nucleotide biosynthesis. Glucose catabolism

solely via the EMP pathway increased acetate and biomass production by 10%. The

expanded model also revealed that the loss of NADPH production via the oxidative pentose

phosphate pathway was compensated by a 90% increase in the flux through the

transhydrogenase reaction in the Δzwf strain. Glucose uptake for the Δgnd mutant was

decreased by less than 10% compared to WT (Figure 4.2e). Δgnd was the only strain with

a measurable flux through the ED pathway by rerouting 24.9 mmol/gdw-h of flux through

EDD and EDA reactions. Similar to Δzwf, the reversal of flux through the non-oxidative

pentose phosphate pathway generated the required ribose-5-phosphate for nucleotide

biosynthesis.

Of the remaining mutants, Δpgi involved the most significant flux rerouting relative to WT.

Glucose uptake was reduced by 75% compared to WT resulting in a 70% reduction in

growth rate (Figure 4.2f). Flux redirection through the glyoxylate shunt and reduction of

acetate secretion improved carbon routing towards biomass precursors, thereby increasing

the biomass yield by 22% compared to WT. Interestingly, the ED pathway was found to

carry only 2 mmol/gdw-h of flux with almost all of the carbon being metabolized via the

pentose phosphate pathway (see Figure 4.2f). In addition, an 80% reduction in flux through

glycolysis along with the absence of acetate secretion lowers overall glycolytic ATP

production. This loss is compensated by the reversal of flux through the transhydrogenase

reaction relative to WT to enable oxidation of excess NADPH generated by the oxidative

pentose phosphate pathway. k-ecoli307 captured that non-competitive inhibition of the

EDA reaction by glyceraldehyde-3-phosphate limits flux through the ED pathway in all

strains but Δgnd. In the Δgnd mutant, a 37-fold increase in the concentration of 6-

phosphogluconate provided the necessary driving force to overcome this product

inhibition, thereby allowing a flux of 24.9 mmol/gdw-h through the ED pathway. In

contrast, in mutant Δpgi a two-fold increase in the concentration of glyceraldehyde-3-

phosphate maintains the inhibition on the ED pathway which could not be overcome by a

40% increase in the concentration of 6-phosphogluconate (Hoque et al., 2011).

A total of 2,501 Km and Vmax values were subsequently assembled using the estimated

elementary kinetic parameters. Contrary to what was previously thought, the total number of

Michaelis-Menten parameters exceeds the number of elementary kinetic parameters. This

can be traced back to the fact that the number of elementary kinetic parameters per reaction

increases linearly with the number of participating species (reactants and products) (see

Appendix C.1) as opposed to quadratic scaling with Michaelis-Menten parameters

(Cleland, 1963). The number of Michaelis-Menten parameters is always two less than the

number of elementary kinetic parameters for a reversible uni-uni reaction mechanism

(Cleland, 1963), is one less than the number of elementary kinetic parameters for a bi-uni

or uni-bi reaction and always exceeds the number of elementary kinetic parameters for bi-

bi and higher order reaction mechanisms (Cleland, 1963). Since the vast majority of

reactions in k-ecoli307 are uni-bi or bi-uni type reactions involving cofactors, the number

of Michaelis-Menten parameters is underestimated in this study due to simplifications in

cofactor metabolism. If the reaction description in k-ecoli307 were expanded to account

for all cofactor forms, protons, water, and phosphate groups, the number of Michaelis-

Menten parameters would be much higher. The larger number of Michaelis-Menten

parameters compared to elementary kinetic parameters would explain the reported

correlations within Michaelis-Menten parameters leading to multicollinearity (Heijnen and

Verheijen, 2013).

The standard deviation for the Michaelis-Menten parameters was calculated by

propagating the corresponding uncertainty of the elementary kinetic parameters (Figure

4.3). Only 321 of the 570 Vmax parameters were resolved with a standard deviation of less

than 20%. 181 Vmax values had a standard deviation exceeding 100% and were deemed

unresolved. Similarly, only 1,424 Km values were resolved with a standard deviation under

20% and 443 Km parameters had standard deviation exceeding 100%. As expected, the

estimation uncertainty for the elementary kinetic parameters was propagated to Km and

Vmax (Figure 4.6a). Interestingly, the fraction of very well-resolved (standard deviation <

1%) Michaelis-Menten parameters was 60% whereas only 28% of the elementary kinetic

parameters were well resolved. This was because the nonlinear mapping of elementary kinetic

parameters onto Michaelis-Menten parameters resulted in 1,500 Michaelis-Menten

parameters assuming a value under 10 and thus having a narrow confidence interval despite

the uncertainty propagation. As was the case with the three test models, we find that all

1,129 reverse elementary fluxes in the WT network were resolved with a standard deviation

less than 1% whereas the resolution of enzyme fractions exhibited the same trend as

elementary kinetic parameters. This indicates that the wide confidence intervals for

elementary and Michaelis-Menten kinetic parameters can be traced back to the inability to pin

down WT enzyme fractions. Unlike with the test models, the average standard deviation

for enzyme fractions was only 0.04 mol/mol-total enzyme with 793 of the 933 enzyme

fractions being resolved with a standard deviation of less than 0.1 mol/mol-total enzyme.

On the other hand, 333 elementary kinetic parameters were poorly resolved due to the fact

that the reactant enzyme complex corresponding to the unresolved kinetic parameter is

estimated to have an abundance less than 0.1 mol/mol-total enzyme leading to a larger

relative uncertainty of estimation. In contrast, 50 of 69 kinetic parameters for inhibition of

enzyme catalysis were resolved with a standard deviation of less than 10%. Well resolved

inhibition kinetic parameters include the inhibition of the oxidative pentose phosphate

pathway by NADPH, product inhibition of the ED pathway by glyceraldehyde-3-

phosphate, and product inhibition of cis-Aconitase by isocitrate. The narrow confidence

interval also places a non-zero lower bound on these inhibition constants implying the

essentiality of regulatory interactions in k-ecoli307 to explain the available experimental

flux datasets. It is important to note that the approach used for computing confidence

intervals generally places an upper bound on the confidence interval as it does not take into

account the nonlinear structure of the kinetic model. Accurate confidence intervals can be

computed using profile-likelihood approaches (Antoniewicz et al., 2006) and generally

result in narrower confidence intervals as commonly seen in 13C-MFA.
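One common way to propagate the uncertainty of elementary kinetic parameters onto a derived Michaelis-Menten parameter is the first-order (delta-method) approximation, in which the variance of the derived quantity is the quadratic form of its gradient with the parameter covariance matrix. The sketch below assumes this approach and the textbook relation Km = (k_off + k_cat)/k_on for the simple mechanism E + S <-> ES -> E + P. The propagation formula used in this chapter is not spelled out in the text, and all numbers are hypothetical.

```python
import numpy as np

def propagate_std(gradient, covariance):
    """First-order (delta-method) standard deviation of a scalar function of the parameters."""
    gradient = np.asarray(gradient, dtype=float)
    return float(np.sqrt(gradient @ covariance @ gradient))

# Km = (k_off + k_cat) / k_on for the hypothetical mechanism E + S <-> ES -> E + P.
k_on, k_off, k_cat = 10.0, 1.0, 5.0
Km = (k_off + k_cat) / k_on
# Partial derivatives of Km with respect to (k_on, k_off, k_cat).
grad_Km = [-(k_off + k_cat) / k_on**2, 1.0 / k_on, 1.0 / k_on]
C = np.diag([1.0, 0.04, 0.25])   # assumed covariance of (k_on, k_off, k_cat)
sigma_Km = propagate_std(grad_Km, C)
```

Off-diagonal covariance terms, set to zero here, can either widen or narrow the propagated interval, which is one reason a derived parameter may be better resolved than the elementary parameters it is built from.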

The predictive capability of the model was evaluated by comparing the model prediction

of product yields in engineered strains with the corresponding experimental yield. The

genetic perturbation mutants considered for evaluation of predictive capability were not

included in the training dataset for kinetic parameterization. Of the six over-producing

strains evaluated, the kinetic model successfully predicted the yields of acetate, malate, and

lactate to within 30% of the reported experimental yield (Table 4.1). This indicates that

the genetic perturbations in the training dataset for parameterization and the regulatory

structure of the expanded kinetic model are sufficient to explain the phenotypic response of

E. coli to perturbations in the EMP pathway. The yield predictions for acetate and malate

were superior to those by k-ecoli457 due to the fact that both the training dataset for

parameterization of the expanded model and cultivation of the engineered strains were at

the same mid-exponential growth phase, whereas, the training dataset for k-ecoli457 was

generated during late exponential growth phase. The transcriptomic and fluxomic

differences between these two growth conditions limit the carbon flux through acetate

metabolism in the late exponential growth phase (Ishii et al., 2007). Unlike predictions for

core metabolism, product yields originating from peripheral metabolism were poorly

predicted by both models. This was because fluxes through peripheral metabolism are

growth-coupled in both models which limits both flux through these pathways as well as

flux variability across different mutant conditions and adversely impacts the prediction

fidelity of both kinetic models.

4.4. Discussion

This chapter details the development of K-FIT, an accelerated kinetic parameterization

algorithm based on steady-state fluxomic data. The K-FIT algorithm estimates kinetic

parameters by solving a nonlinear least-squares minimization problem to recapitulate

experimentally measured steady-state metabolite concentrations and fluxes using an

iterative loop comprised of three steps: K-SOLVE, SSF-Evaluator, and K-UPDATE. The

computational savings afforded by bypassing ODE integration improves parameterization

speed of K-FIT by over three orders of magnitude compared to the GA-based EM

procedure for a core model of metabolism containing 953 kinetic parameters. We anticipate

that these savings would become even more pronounced for larger models. The

parallelizable architecture of SSF-Evaluator improves the scalability of the procedure while

allowing compatibility with GPU-based computing architectures which affords significant

improvements in computation speed. The iterative scheme presented in SSF-Evaluator is

inherently numerically stable which allows it to handle stiff systems of equations with ease

while permitting reliable calculation of first- and second-order gradients. Furthermore, the

ability to calculate gradients enables local statistical analysis of inferred kinetic parameters.

The poor resolution of elementary kinetic parameters arises from the propagation of

relative uncertainty in the estimation of enzyme fractions. While enzyme fractions were

generally poorly resolved in the three test models, 85% of the enzyme fractions in k-

ecoli307 were resolved with a standard deviation of less than 0.1 mol/mol-total enzyme. It

is important to note that this uncertainty calculation is based on local linear statistics and

thus, places an upper bound on uncertainty estimates. In order to obtain better estimates

for uncertainty, accurate confidence intervals (Antoniewicz et al., 2006) will have to be

constructed that account for the nonlinear relationships between elementary fluxes,

enzyme fractions, metabolite concentrations, and kinetic parameters. Accurate confidence

intervals will provide insights into resolvability of kinetic parameters for the set of

experimental data and enable the identification of informative mutants (Zomorrodi et al.,

2013) and design of experiments (Banga and Balsa-Canto, 2008) to pin down the poorly

resolved kinetic parameters. Furthermore, using accurate confidence intervals, additional

insights into reaction reversibility and importance of regulatory interactions can be

gleaned. Currently, the statistical significance of regulatory interactions can only be

evaluated using frameworks such as SIMMER (Hackett et al., 2016).

The applicability of K-FIT to large-scale models was demonstrated using an expanded

kinetic model of E. coli containing 307 reactions, 258 metabolites, and 2,407 kinetic

parameters, parameterized using fluxes elucidated using 13C-MFA. In order to avoid any

error propagation arising from flux projection from simpler models, a recently developed

two-step computational pipeline (Foster et al., 2019 (Under Review)) was used for kinetic

parameterization using 13C-labeling data. First, fluxes were elucidated for the expanded

model in the WT and six single gene-deletion strains using 13C-MFA. The elucidated

fluxes were then used to parameterize the kinetic model corresponding to the same

stoichiometric model. Although the expanded kinetic model recapitulated the fluxes better

than a core model for E. coli, product yield predictions in engineered strains did not differ

significantly compared to those predicted by the core model. This was traced back to a lack

of variability in fluxes through peripheral pathways across mutants due to growth coupling.

Since most amino acids are not catabolized by E. coli, reliable parameterization of these

pathways requires model expansion to include amino acid pool turnover by protein synthesis and

degradation. Additional fluxomic and metabolomic data from overproducing strains will

also be required to capture the link between genetic perturbations and increased flux

through peripheral metabolism as the WT strain of E. coli does not secrete any amino acids

during the mid-exponential growth phase.

Overall, this procedure highlights the data-demanding nature of the kinetic

parameterization problem. Although kinetic parameterization was performed using only

steady-state flux data, steady-state metabolite concentration data can be used in the SSR

objective function. In all studies enzyme levels were assumed to remain the same in

mutants as in WT with the exception of enzymes that are associated with knock-out genes

which were obviously set to zero. Nevertheless, K-FIT allows for enzyme levels for the

mutants to be pre-specified if the information is known a priori. Ideally, one would want

to integrate allosteric with transcriptional regulation so that the enzyme concentrations in

the mutant networks can be related to the altered metabolite concentrations (Fuhrer et al.,

2017). This would ultimately enable the integration of mutant network data generated

under both genetic and environmental perturbations and improve its predictive capabilities.

Furthermore, the local sensitivity of fluxes and metabolite concentrations with respect to

kinetic parameters directly map to elasticity coefficients used in metabolic control analysis.

They can thus be used to calculate flux and concentration control coefficients at minimal

additional cost to inform metabolic engineering strategies.

Table 4.1: Comparison of predicted product yields (mol/mol glucose) with experimental yields in engineered over-producing strains of E. coli. The experimental yields and predictions by k-ecoli457 were obtained from previously published data (Khodayari and Maranas, 2016).

Product       Perturbed Enzyme      Predicted Yield   Predicted Yield (k-ecoli457)   Experimental Yield
Acetate       0.1x RPI              0.93              0.2                            0.75
L-Valine      0.1x THRD             0.03              0.02                           0.34
Lactate       0x ACKr               1.4               1.11                           1.13
Malate        0.3x PTA; 10x PPCK    0.16              0.84                           0.15
Artemisinin   2x PDH                0.17              0.03                           0.38
Naringenin    2x ACCOAC             0.026             0.012                          0.008

Figure 4.1: Overview of the core loop of the K-FIT algorithm

Figure 4.2: Flux distribution through central metabolism of the expanded model for E.

coli in (a) Δeda, (b) Δedd, (c) Δfbp, (d) Δzwf, (e) Δgnd, and (f) Δpgi mutant strains.

Reactions representing metabolite flows between central and peripheral metabolism are

indicated using green arrows. Fluxes elucidated using 13C-MFA are shown in green and

the corresponding flux prediction by the expanded kinetic model is shown in brown.

Reactions corresponding to the knocked-out genes in each mutant strain are indicated using

red arrows. Flux measurements for PFK and FBP were not fitted due to poor resolution by 13C-MFA.

[Panels (a) to (f): pathway maps of central metabolism (glycolysis, pentose phosphate pathway, Entner-Doudoroff pathway, TCA cycle, glyoxylate shunt, and acetate metabolism) annotated with flux values for each mutant strain; the diagrams are not recoverable from the extracted text.]

Figure 4.3: Uncertainty in estimation of Michaelis-Menten kinetic parameters (Km and

Vmax) in k-ecoli307. The width of the confidence interval refers to the standard deviation

of the estimated kinetic parameter determined from the Covariance matrix.

Figure 4.4: Overview of the K-FIT algorithm showing the flow of information between

various components.

Figure 4.5: Test models used to benchmark the performance of K-FIT against GA-

based EM procedure. (a) Small model containing 14 reactions and 11 metabolites. (b)

Medium-sized model containing 33 reactions and 28 metabolites. (c) Core model

containing 108 reactions and 65 metabolites. Reactions knocked out in the single gene-

deletion mutants are indicated using a red X.

Figure 4.6: Uncertainty in estimation of (a) elementary kinetic parameters and (b) WT

enzyme fractions in k-ecoli307. Width of the confidence interval refers to the standard

deviation computed from the Covariance matrix.

Chapter 5

Summary and future work

5.1. Summary

This thesis introduces three important tools that enable the construction and deployment of large-scale predictive kinetic models of metabolism. Identification of kinetic parameters for a reaction requires knowledge of (a) the flux through the reaction under different conditions, and (b) the concentrations of the various species involved in the reaction (reactants, products, activators, and inhibitors). Since in vivo fluxes are not directly measurable, indirect approaches must be applied. The most reliable approach involves tracing carbons, hydrogens, and oxygens from a nutrient source (usually a carbon source such as glucose in heterotrophs or CO2 in photoautotrophs) to various intracellular metabolites using a stable isotope such as 13C, 2H, or 18O. Pathway-specific bond breaks and bond formations alter the labeling distributions of downstream metabolites, and the relative contributions of the various pathways can be estimated using nonlinear regression techniques. This technique was initially applied only to small network models comprising central metabolism due to the high computational cost associated with large-scale models, the assumed intractability of existing modeling frameworks (Choi and Antoniewicz, 2019), the assumed sufficiency of core metabolic models, and the limited availability of reaction atom mapping information for peripheral metabolic pathways.

In order to elucidate fluxes at the genome-scale in E. coli, first, an atom mapping model,

imEco726, providing a comprehensive inventory of carbon paths was constructed using

the CLCA algorithm (Kumar and Maranas, 2014) and manually curated. Tractability of the

EMU algorithm for the genome-scale model was confirmed based on the fact that a ten-

fold increase in the number of reactions only resulted in a five-fold increase in the number

of EMUs (Gopalakrishnan and Maranas, 2015a). The major computational bottleneck was

identified to be the construction of accurate confidence intervals for the estimated fluxes.

Since computing the confidence interval for a single flux in imEco726 takes as much as 30 minutes, the total computation time can reach 10 days on an HPC cluster. To reduce this computation time, an efficient algorithm was developed that leverages the topological features of the stoichiometric network to identify the minimum set of fluxes for which confidence intervals must be computed. This algorithm identifies all the

reactions that are resolved using 13C data and are not coupled to an external flux

measurement. The number of fluxes for which confidence intervals must be constructed is

reduced by nearly 75% allowing all confidence intervals to be completed in just 3 days.

The confidence intervals for the remaining fluxes are computed using FVA (Mahadevan

and Schilling, 2003). The tools developed as a part of this study enabled the assessment of

the caveats associated with the practice of projecting fluxes elucidated using a core model

onto larger networks for downstream applications such as OptForce (Ranganathan et al.,

2010) and kinetic parameterization. Loss of feasible solutions associated with

simplifications in the core model propagated to the GSM model upon flux projection

resulting in an average 56% reduction in the width of confidence intervals for 90% of all

reactions in the GSM model. This propagation of simplifications reveals the dangers

associated with flux projection and reaffirms the need for direct flux elucidation using

expanded and comprehensive metabolic and mapping models.

For isotopic instationary MFA, the computational bottleneck was identified to be the

simulation of metabolite labeling dynamics. To this end, the existing exponential

integration scheme was improved upon by deriving analytical update formulae for the

transition matrices as opposed to numerical computation to accelerate the simulation of

labeling dynamics while decreasing memory requirements. This enabled both the

simulation of metabolite labeling dynamics as well as forward sensitivity analysis using

larger networks to predict metabolite labeling distributions and sensitivity to intracellular

fluxes at various time points. The use of this new algorithm decreased the time required for

ODE integration by as much as 48%. The improved algorithm was deployed to elucidate

fluxes in Synechocystis PCC 6803 under photoautotrophic growth conditions. Flux

elucidation revealed three key insights. First, carbon flux distribution in Synechocystis

supported maximum carbon routing towards biomass with minimal routing towards

byproducts such as organic acids and glycogen. Second, Synechocystis is unable to recycle

fixed CO2 that is oxidized in anabolic reactions and must therefore rely on bifurcated

pathway topologies in the TCA cycle and serine metabolism to minimize loss of fixed

carbon. Finally, Synechocystis employs an unconventional pathway for regeneration of

pentose phosphates in the CBB cycle using the TAL bypass as an alternative to FBPase.

Having developed the tools for reliable flux elucidation at the genome-scale, the next requirement for the construction of predictive models of metabolism is an efficient algorithm for the identification of kinetic parameters corresponding to all enzyme-catalyzed reactions. In response to the long computation times that preclude any follow-up statistical inference on estimated kinetic parameters, a novel decomposition-based kinetic parameterization algorithm, K-FIT, was developed. Using a two-pronged strategy, K-FIT achieves a 1,000-fold speed-up in parameterization. First, K-FIT bypasses numerical integration for elucidation of steady-state metabolite concentrations by solving a system of bilinear algebraic equations using a fixed-point iteration scheme that iterates between two smaller linearized sub-problems until a steady state is found. The computed steady-state concentrations are then used to evaluate steady-state fluxes and the local sensitivities of steady-state fluxes to kinetic parameters. Second, using this information, the lack of fit to experimental data and its first- and second-order gradients are computed to update the kinetic parameters. By traversing the feasible kinetic parameter space using steps informed by gradient information, an optimal solution is usually found within 500 iterations, contributing to the computational speed-up relative to currently used meta-heuristic methods such as GA or particle-swarm optimization. The applicability of the K-FIT algorithm to large-scale kinetic models was then demonstrated by parameterizing a near-genome-scale kinetic model for E. coli, k-ecoli307, with fluxes elucidated using 13C-MFA in six single gene-deletion mutants.

5.2. Completed and ongoing research

A key requirement for flux elucidation using 13C-MFA is the construction of curated

genome-scale carbon mapping models. As a part of this thesis, mapping models for two

model organisms were constructed: imEco726 for E. coli and imSyn617 for Synechocystis

PCC 6803. imSyn617 served as the template for the construction of imSyu593, the

mapping model for the fast-growing cyanobacterium Synechococcus elongatus UTEX

2973 (Hendry et al., 2019). imSyu593 built upon imSyn617 by adding the phosphoketolase

pathway and reactions from calomide biosynthesis allowing E4P recycling from peripheral

metabolism. Flux elucidation revealed that Synechococcus favored the use of the

phosphoketolase pathway over pyruvate dehydrogenase for acetyl-CoA production. In

addition to this, bifurcated topology in serine metabolism was not observed and the

photorespiratory pathway was complete due to the ability to re-fix carbons oxidized in

anabolic metabolism. The ability to reincorporate oxidized carbons minimized carbon loss

in the form of CO2 and allowed Synechococcus to achieve a near-perfect routing of all

carbons towards biomass production. This, in conjunction with faster CO2 uptake and

higher light tolerance facilitated faster growth in Synechococcus compared to

Synechocystis.

The ability to trace carbons through peripheral metabolism opens up the possibility of

including metabolite labeling distributions from peripheral metabolism to infer flux

distributions within central metabolism. These additional measurements include labeling

distributions of ATP, ADP, AMP, NADP, Coenzyme-A, glucosamine, and N-

acetylglucosamine. Some of these metabolites have already been used for flux elucidation

using an expanded metabolic network for E. coli (McCloskey et al., 2016a) to improve the

precision of flux estimates and resolve exchange fluxes. Currently, carbon mapping
information from imEco726 is being used to elucidate fluxes in cellobiose-grown Clostridium

thermocellum using a combination of amino acid labeling data measured using GC-MS

and LC-MS-derived labeling distributions for central metabolites, ATP, coenzyme-A, and

N-acetyl-glucosamine.

More recently, flux elucidation using 13C-MFA and kinetic parameterization using K-FIT

have been combined into a streamlined computational pipeline for the construction of a

core kinetic model for E. coli (k-ecoli74) using amino acid labeling distributions measured

from seven single gene-deletion mutants from upper glycolysis (Foster et al., 2019 (Under

Review)). Kinetic parameterization was carried out in two stages. First, fluxes and

confidence intervals for all reactions from central metabolism in E. coli at isotopic steady-

state were elucidated using the techniques and mapping model established in Chapter 2.

Following this, kinetic parameterization of the same stoichiometric model was performed

using K-FIT as described in Chapter 4 of this thesis. In addition to constructing a predictive

kinetic model for central metabolism in E. coli, this study also assessed the adverse effects

of flux projection on accuracy of kinetic parameterization and its predictive capabilities.

Finally, this study also demonstrated that accurate in silico emulation and training of kinetic models using data derived from conditions reflecting the growth conditions of engineered strains contribute to better agreement of model predictions with experimentally measured product yields in untrained genetic conditions.

5.3. Future directions

Being able to quickly parameterize kinetic models using K-FIT now opens up the

possibility of computing accurate confidence intervals within reasonable time. The current

practice with kinetic parameterization involves reporting the best solution with the lowest

SSR without performing any goodness-of-fit tests on the regressed parameters. Chapter 4

reports that the poor resolvability of kinetic parameters can be traced back to large relative

uncertainty corresponding to the predicted enzyme complex concentrations. Since the accuracy of the calculated standard deviations hinges on the validity of the linearization approximation (i.e., on remaining close to the optimum), it is important to construct accurate confidence intervals that provide a clearer picture of the resolvability of enzyme fractions and elementary fluxes, which is required for designing meaningful experiments to resolve the unresolved parameters. A meaningful next step would be the development of a framework that can

also report the variance of predicted fluxes in various mutant conditions given the

uncertainty in kinetic parameter estimation so that the model predictions are more

informative than a mere point estimate as is currently reported.

Currently, kinetic models assume that the total enzyme concentrations do not fluctuate

between different mutants. While this is generally true with the assessed gene-deletion

mutants, transcriptional changes have been widely reported in environmental perturbations

(Fuhrer et al., 2017). To this end, it would be of interest to construct a statistical model that

relates the total enzyme abundance to intracellular metabolite concentrations and global

transcription regulators that can capture the transcriptional differences arising from

changes to environmental conditions. Currently, the constructed kinetic models have good

predictive capabilities only in the growth conditions in which they are trained. The ability

to capture proteomic fluctuations will extend the predictive capabilities of trained kinetic

models to other growth conditions such as late-exponential growth phase, stationary phase

as well as anaerobic metabolism, which are of interest for industrial production of valuable chemicals such as succinate and 2,3-butanediol. In addition to improving predictions, the ability to predict proteomic fluctuations will enable kinetic models to reaffirm the enzyme cost-minimization hypothesis (Noor et al., 2016), which establishes a link between protein cost and thermodynamics and can be an important factor in determining pathway usage in engineered organisms.

The state of the art in computational strain design is the k-OptForce algorithm which

combines a kinetic description of central metabolism with a stoichiometric description of

peripheral metabolism. The separation of kinetic and stoichiometric description is

implemented due to limited availability of kinetic descriptions for peripheral metabolism.

As a result, regulatory interactions within peripheral metabolism, such as feedback inhibition in fatty acid biosynthesis, the shikimate pathway, and purine biosynthesis, are not

modeled by k-OptForce. Expanding the scope of the component kinetic model to include

regulatory interactions from peripheral metabolism will also expand the repertoire of

meaningful interventions that can be identified by k-OptForce.

Appendix A

Flux elucidation at isotopic steady-state

A.1. Predicting labeling patterns

Decomposing the network using the EMU algorithm provides an exhaustive list of

metabolite fragments and reactions involved in predicting the labeling pattern of target

metabolite fragments for a given tracer input and flux distribution. The mass balance for a

reaction within the EMU network at isotopic and metabolic steady state shown below is

described as:

$$\sum_{i} v_i M_{1,2}^{i} - \Big(\sum_{i} v_i\Big) M_{1,2} = 0 \qquad (1)$$

If M3 is a substrate to the network, then the above equation can be re-written as,

$$\sum_{i=1,2} v_i M_{1,2}^{i} - \Big(\sum_{i} v_i\Big) M_{1,2} = -v_3 M_{1,2}^{3} \qquad (2)$$

Based on equations (1) and (2), the mass balance for all the reactions of a particular EMU size can be expressed as:

$$\mathbf{A}\mathbf{X} = \mathbf{B}\mathbf{Y} \qquad (3)$$


Here, X and Y represent the vectors of balanced and input EMUs, respectively, and A and B are the corresponding coefficient matrices, which are functions of the fluxes. Since A is a square matrix, X can be obtained by multiplying the inverse of A with the right-hand side of equation (3). The set of target metabolite fragments, x, is a subset of X, and their corresponding mass isotopomer distributions (MIDs) can be obtained by solving equation

(3). The MIDs estimated above need to be corrected for uncorrected pool dilutions, and

additional label dilution arising from sparged CO2 (Leighty and Antoniewicz, 2012, 2013).
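
As a minimal sketch of how equation (3) is used, the following Python snippet solves AX = BY for a hypothetical two-EMU network. The network topology, the flux values, and the helper names `solve_linear` and `matmul` are assumptions made purely for illustration and are not taken from the thesis. Exact fractions are used so that the resulting MIDs can be checked by hand.

```python
from fractions import Fraction

def solve_linear(A, R):
    """Solve A X = R by Gauss-Jordan elimination (A square, R may have several columns)."""
    n = len(A)
    # Augmented matrix [A | R] in exact rational arithmetic.
    M = [[Fraction(a) for a in A[i]] + [Fraction(r) for r in R[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0:
            raise ValueError("A is singular; the EMU balances are not independent")
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def matmul(P, Q):
    """Plain matrix product of two nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

# Hypothetical one-carbon network (illustrative only):
#   input Y1 -> X1 (v1), X2 -> X1 (v2), X1 -> X2 (v3), input Y2 -> X2 (v4)
v1, v2, v3, v4 = 1, 1, 1, 1
A = [[-(v1 + v2), v2],
     [v3, -(v3 + v4)]]
B = [[-v1, 0],
     [0, -v4]]
# Each row is an MID [M+0, M+1]; Y1 is fully labeled, Y2 is unlabeled.
Y = [[0, 1],
     [1, 0]]

X = solve_linear(A, matmul(B, Y))
for name, mid in zip(("X1", "X2"), X):
    print(name, [str(f) for f in mid])
```

Each row of X is an MID and therefore sums to one, which provides a quick sanity check on the assembled A and B matrices.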

A.2. Least-Squares NLP

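$$
\begin{aligned}
\min_{\mathbf{v}} \quad & \sum_{i=1}^{N} \left( \frac{x_i^{p}(\mathbf{v}) - x_i^{m}}{\sigma_i} \right)^{2} \\
\text{s.t.} \quad & \mathbf{S}\,\mathbf{v} = 0 \\
& v_j^{LB} \le v_j \le v_j^{UB}
\end{aligned}
$$

In the above formulation,

$x_i^{p}(\mathbf{v})$ is the predicted labeling pattern of fragment i for a given flux distribution, v.

$x_i^{m}$ is the experimentally measured labeling pattern for fragment i.

$\sigma_i$ is the standard error of measurement for fragment i.

S is the stoichiometry matrix.

$v_j^{LB}$ is the lower bound on flux $v_j$.

$v_j^{UB}$ is the upper bound on flux $v_j$.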

A.3. Implementation

The equality constraint in the above formulation can be eliminated by expressing the vector of fluxes, $\mathbf{v}$, in terms of the set of free fluxes, $\mathbf{u}$, using:

$$\mathbf{v} = \mathbf{N}\,\mathbf{u}$$

Where $\mathbf{N}$ is the rational basis for the null space of $\mathbf{S}$.
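
The projection onto free fluxes can be sketched in a few lines of Python. The function below builds a rational null-space basis from the reduced row echelon form of S; the function name `rational_null_space`, the four-reaction toy network, and the chosen free-flux values are all hypothetical and serve only to illustrate the idea, not to reproduce the implementation used in the thesis.

```python
from fractions import Fraction

def rational_null_space(S):
    """Return (N, free): a rational null-space basis of S and the free-flux indices."""
    rows = [[Fraction(x) for x in r] for r in S]
    n_cols = len(rows[0])
    pivots = []
    r = 0
    for c in range(n_cols):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column, so it corresponds to a free flux
        rows[r], rows[p] = rows[p], rows[r]
        pv = rows[r][c]
        rows[r] = [x / pv for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n_cols) if c not in pivots]
    N = [[Fraction(0)] * len(free) for _ in range(n_cols)]
    for k, fc in enumerate(free):
        N[fc][k] = Fraction(1)          # a free flux maps onto itself
        for i, pc in enumerate(pivots):
            N[pc][k] = -rows[i][fc]     # dependent fluxes follow from the RREF rows
    return N, free

# Toy network (illustrative): v1: -> A, v2: A -> B, v3: A ->, v4: B ->
S = [[1, -1, -1, 0],
     [0, 1, 0, -1]]
N, free = rational_null_space(S)

u = [2, 3]  # chosen values for the free fluxes (v3, v4)
v = [sum(N[i][k] * u[k] for k in range(len(u))) for i in range(len(N))]
print("free flux indices:", free)
print("v =", [str(x) for x in v])
```

Any choice of u yields a flux vector that satisfies S v = 0 by construction, so only the bound constraints on N u remain to be enforced.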

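Since a non-negativity constraint is imposed on all fluxes, the equality constraint can be replaced with the inequality constraint:

$$\mathbf{N}\,\mathbf{u} \ge 0$$

If bounds for $\mathbf{v}$ are available, the above inequality constraint can be modified to account for the lower and upper bounds, $\mathbf{v}^{LB}$ and $\mathbf{v}^{UB}$:

$$\mathbf{N}\,\mathbf{u} \ge \mathbf{v}^{LB}, \qquad \mathbf{N}\,\mathbf{u} \le \mathbf{v}^{UB}$$

This transforms the NLP problem to:

$$
\begin{aligned}
\min_{\mathbf{u}} \quad & \sum_{i=1}^{N} \left( \frac{x_i^{p}(\mathbf{u}) - x_i^{m}}{\sigma_i} \right)^{2} \\
\text{s.t.} \quad & \mathbf{N}\,\mathbf{u} \ge \mathbf{v}^{LB} \\
& \mathbf{N}\,\mathbf{u} \le \mathbf{v}^{UB}
\end{aligned}
$$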

The above minimization problem can be solved using the fmincon function within the Optimization Toolbox in MATLAB. Among the different algorithm options, the interior-point algorithm accepts a user-supplied Hessian, which can be computed using the first-order Taylor series expansion (Antoniewicz et al., 2006) of the objective function to yield:

$$\mathbf{H} = \left( \frac{d\mathbf{x}}{d\mathbf{u}} \right)^{T} \mathbf{W}^{-1} \left( \frac{d\mathbf{x}}{d\mathbf{u}} \right)$$

Where W is the covariance matrix serving as a weighting matrix for the least-squares minimization problem.
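
The Hessian expression above is commonly referred to as a Gauss-Newton approximation. A minimal Python sketch is given below under two assumptions that are not stated in the text: the sensitivity matrix J holds one row per measurement and one column per free flux (J[i][a] is the derivative of measurement i with respect to free flux a), and W is diagonal, i.e., measurement errors are uncorrelated. The function name and the numbers are illustrative only.

```python
def gauss_newton_hessian(J, variances):
    """Return H = J^T W^{-1} J for a diagonal covariance matrix W = diag(variances)."""
    n_meas, n_par = len(J), len(J[0])
    return [[sum(J[i][a] * J[i][b] / variances[i] for i in range(n_meas))
             for b in range(n_par)]
            for a in range(n_par)]

# Illustrative sensitivities of three measurements to two free fluxes.
J = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
variances = [1.0, 4.0, 1.0]  # measurement variances (diagonal of W)

H = gauss_newton_hessian(J, variances)
print(H)
```

Because only first derivatives of the predicted MIDs are needed, the same sensitivities used for the gradient can be reused to assemble H.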

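The derivative of the estimated MIDs with respect to the free fluxes can be estimated using the following equation.

$$\left( \frac{d\mathbf{x}}{d\mathbf{u}} \right)^{T} = \left( \frac{d\mathbf{x}}{d\mathbf{v}} \right)^{T} \left( \frac{d\mathbf{v}}{d\mathbf{u}} \right) = \left( \frac{d\mathbf{x}}{d\mathbf{v}} \right)^{T} \mathbf{N}$$

To obtain the derivative of MIDs with respect to all the fluxes within the network, we have to differentiate equation (3) and rearrange it. Since A and B are both functions of the fluxes, the product rule gives the following expression.

$$\frac{d\mathbf{X}}{d\mathbf{v}} = \mathbf{A}^{-1} \left( \frac{d\mathbf{B}}{d\mathbf{v}} \mathbf{Y} + \mathbf{B} \frac{d\mathbf{Y}}{d\mathbf{v}} - \frac{d\mathbf{A}}{d\mathbf{v}} \mathbf{X} \right)$$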
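
The sensitivity calculation can be illustrated on the smallest possible case. Differentiating AX = BY with the product rule gives (dA/dv) X + A (dX/dv) = (dB/dv) Y + B (dY/dv), so the sketch below keeps the (dA/dv) X term explicitly. The single-EMU network, the sign convention (A = v1 + v2, B = [v1, v2]), and the function names are assumptions chosen for illustration; the analytical result is checked against a central finite difference.

```python
def emu_abundance(v1, v2, y1, y2):
    """Solve the scalar balance A X = B Y with A = v1 + v2 and B = [v1, v2]."""
    return (v1 * y1 + v2 * y2) / (v1 + v2)

def d_abundance_dv1(v1, v2, y1, y2):
    """Analytical sensitivity dX/dv1 = A^-1 (dB/dv1 Y + B dY/dv1 - dA/dv1 X)."""
    A = v1 + v2
    X = emu_abundance(v1, v2, y1, y2)
    dB_Y = y1        # dB/dv1 = [1, 0], so (dB/dv1) Y = y1
    B_dY = 0.0       # the inputs Y do not depend on the fluxes in this toy case
    dA_X = 1.0 * X   # dA/dv1 = 1
    return (dB_Y + B_dY - dA_X) / A

# Illustrative values: y1 and y2 are the M+1 abundances of the two inputs.
v1, v2, y1, y2 = 1.0, 3.0, 1.0, 0.0
analytic = d_abundance_dv1(v1, v2, y1, y2)

h = 1e-6
numeric = (emu_abundance(v1 + h, v2, y1, y2)
           - emu_abundance(v1 - h, v2, y1, y2)) / (2 * h)
print(analytic, numeric)
```

In this toy case, dropping the (dA/dv) X term would give a sensitivity of 0.25 instead of the value confirmed by the finite-difference check, which is why the term is retained.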

A.4. Estimation of confidence intervals

With the EMU network spanning a much smaller portion of the overall metabolic network, the original procedure for estimation of confidence intervals was modified to enhance the speed of estimation. While the range estimation procedure (Antoniewicz et al., 2006) for an individual flux was not modified, the set of fluxes whose ranges need to be directly determined was reduced based on flux coupling properties. The procedure is as follows:

Step 1: Define the initial set of fluxes (v) as the union of the sets of fluxes

involved in EMU balances and those corresponding to an extracellular

measured flux.

Step 2: Identify all fluxes coupled to an extracellular measurement and

eliminate these fluxes from v.

Step 3: For each flux, vi, within v, identify and eliminate fluxes, vj, that are

fully coupled to vi.

Step 4: Estimate the 95% confidence interval for all the fluxes remaining in v.

Step 5: Using the estimated confidence intervals as flux bounds, perform an

FVA to estimate all the other flux ranges within the metabolic network.

While performing FVA, only the net flux of reversible reactions was considered due to the

fact that exchange fluxes which are not involved in EMU balances cannot be resolved by

13C-MFA.
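
Steps 1 to 3 of the procedure above can be sketched as follows. The sketch rests on an assumption that is not stated in the text: two fluxes are treated as fully coupled when their rows in a null-space basis N of S are proportional, which is a sufficient criterion when only S v = 0 is considered (flux bounds and irreversibility can induce additional couplings that this check does not detect). The function names and the toy basis are hypothetical.

```python
from fractions import Fraction

def fully_coupled_groups(N):
    """Group flux indices whose rows in the null-space basis N are proportional."""
    groups = {}
    for idx, row in enumerate(N):
        lead = next((x for x in row if x != 0), None)
        if lead is None:
            continue  # zero row: the flux cannot carry flux at steady state
        # Scale each row by its first nonzero entry so proportional rows share a key.
        key = tuple(Fraction(x) / Fraction(lead) for x in row)
        groups.setdefault(key, []).append(idx)
    return list(groups.values())

def fluxes_needing_intervals(N, candidates, measured):
    """Pick one representative per coupled group that is not tied to a measurement."""
    selected = []
    for group in fully_coupled_groups(N):
        if any(j in measured for j in group):
            continue  # Step 2: coupled to an extracellular measurement
        members = [j for j in group if j in candidates]
        if members:
            selected.append(members[0])  # Step 3: one flux per fully coupled group
    return selected

# Toy null-space basis (rows = fluxes v1..v4, columns = free fluxes); illustrative only.
N = [[1, 1],
     [0, 1],
     [1, 0],
     [0, 1]]
groups = fully_coupled_groups(N)
selected = fluxes_needing_intervals(N, candidates={0, 1, 2, 3}, measured={3})
print(groups, selected)
```

Here the second and fourth fluxes form a coupled group tied to the measured flux, so confidence intervals would only be computed directly for the two remaining representatives.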

Appendix B

Flux elucidation procedure for isotopic instationary MFA

B.1. Least-squares NLP for flux and pool size estimation

Cellular growth with a 13C-labeled substrate results in the incorporation of labeled atoms into various downstream metabolites, causing the synthesis of molecules with different masses based on the extent of 13C incorporation. These mass shifts are quantified using NMR spectroscopy or mass spectrometry (MS) following separation of metabolites using chromatography (GC or LC). During MS, metabolites can be fragmented as a consequence of electron impact, thus providing information about the labeling distributions of both the complete metabolite and its fragments. These measured fragments (comprising a partial or whole metabolite) are represented as mass-isotopomer distribution vectors (MDVs), which are row vectors of the fractional abundances of molecules of various masses according to their 13C labeling distribution. They are denoted as $\mathbf{x}_i^{meas}$ and have a measurement variance $\mathbf{\Sigma}_i$. A transient labeling experiment involves sampling metabolites at various time points during the isotopic instationary period, so the labeling distributions depend on both the flux distribution (v) at metabolic steady-state and the intracellular metabolite pool sizes (c) (Noh et al., 2006). The objective of 13C-MFA is to identify a flux distribution and pool sizes consistent with $\mathbf{x}_i^{meas}$. While the solution to the forward problem of estimating labeling distributions with known fluxes and pool sizes can be obtained easily, either by solving a system of algebraic equations in the case of isotopic steady-state or a system of ordinary differential equations (ODEs) under transient labeling conditions, the inverse problem is nonlinear and non-convex. As a result, fluxes and pool sizes at metabolic steady-state must be obtained as the solution of a variance-weighted least-squares nonlinear programming (NLP) problem that minimizes the sum of squared residuals (SSRES), i.e., the sum of squared deviations of the predicted metabolite labeling distributions ($\mathbf{x}_i^{pred}$) from the corresponding experimental measurements. Note that the procedure for estimating $\mathbf{x}_i^{pred}$ given a flux distribution and pool sizes is described in the next subsection. In addition to labeling distributions, extracellular fluxes such as substrate uptake rate, growth rate, and product yields can also be measured (corresponding to $v_j^{meas}$) and included in SSRES.

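$$
\begin{aligned}
\min_{\mathbf{v},\,\mathbf{c}} \quad SSRES = {} & \sum_{i=1}^{P} \left( \mathbf{x}_i^{pred}(\mathbf{v},\mathbf{c}) - \mathbf{x}_i^{meas} \right) \mathbf{W}_i \left( \mathbf{x}_i^{pred}(\mathbf{v},\mathbf{c}) - \mathbf{x}_i^{meas} \right)^{T} + \sum_{j=1}^{Q} \left( \frac{v_j^{pred} - v_j^{meas}}{\sigma_j} \right)^{2} \\
\text{s.t.} \quad & \mathbf{S}\,\mathbf{v} = 0 \\
& \mathbf{v}^{LB} \le \mathbf{v} \le \mathbf{v}^{UB} \\
& \mathbf{c} \ge 0
\end{aligned}
$$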

The following quantities participate in formulation SSRES with $n_v$ fluxes and $n_c$ metabolite pool sizes:

$P$ is the number of metabolite fragments whose labeling distribution is quantified by MS.

$Q$ is the number of measured extracellular fluxes (substrate uptake, growth rate, product yields).

$\mathbf{v}$ is an $[n_v \times 1]$ vector of metabolic fluxes, with each reversible reaction decomposed into separate forward and backward reactions.

$\mathbf{c}$ is an $[n_c \times 1]$ vector of pool sizes.

$\mathbf{x}_i^{meas}$ is the $[1 \times (k+1)]$ experimentally measured labeling distribution vector of fragment $i$ containing $k$ carbons. It has $(k+1)$ columns because the fragment can contain from zero to $k$ labeled carbons.

$\mathbf{x}_i^{pred}(\mathbf{v},\mathbf{c})$ is the predicted labeling distribution vector of fragment $i$ and has the same dimensions as $\mathbf{x}_i^{meas}$. It is related implicitly to the intracellular fluxes $\mathbf{v}$ and pool sizes $\mathbf{c}$; the procedure for calculating labeling distributions for a given flux distribution and metabolite pool sizes is described in the next subsection.

$\mathbf{W}_i$ is a $[(k+1) \times (k+1)]$ diagonal matrix of weights equal to $\boldsymbol{\Sigma}_i^{-1}$.

$v_j^{meas}$ corresponds to the measured extracellular fluxes and product yields with standard deviation $\sigma_j$.

$v_j^{pred}$ are the predicted (calculated) extracellular fluxes.

$\mathbf{S}$ is the stoichiometry matrix.

$\mathbf{v}^{LB}$ and $\mathbf{v}^{UB}$ denote the lower and upper bounds on fluxes $\mathbf{v}$, respectively, obtained using flux variability analysis (FVA).
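To make the objective concrete, the following is a minimal NumPy sketch of how SSRES could be evaluated for given predictions. The function name `ssres`, the argument layout, and the toy numbers are illustrative assumptions and are not taken from the dissertation; the weights are taken as inverse variances, as in the definition of $\mathbf{W}_i$.

```python
import numpy as np

def ssres(x_pred, x_meas, x_sd, v_pred, v_meas, v_sd):
    """Variance-weighted sum of squared residuals over P fragments and Q fluxes."""
    total = 0.0
    # Labeling term: one (k+1)-element MDV per measured fragment.
    for xp, xm, sd in zip(x_pred, x_meas, x_sd):
        r = np.asarray(xp, dtype=float) - np.asarray(xm, dtype=float)
        W = np.diag(1.0 / np.asarray(sd, dtype=float) ** 2)  # inverse variances
        total += float(r @ W @ r)
    # Extracellular flux term: squared standardized deviations.
    z = (np.asarray(v_pred, dtype=float) - np.asarray(v_meas, dtype=float))
    z = z / np.asarray(v_sd, dtype=float)
    total += float(np.sum(z ** 2))
    return total

# Hypothetical data: one two-carbon fragment (three mass isotopomers), one flux.
x_pred = [[0.50, 0.30, 0.20]]
x_meas = [[0.52, 0.29, 0.19]]
x_sd = [[0.01, 0.01, 0.01]]
value = ssres(x_pred, x_meas, x_sd, v_pred=[10.0], v_meas=[9.5], v_sd=[0.5])
```

In this toy case the fragment contributes 6 (residuals of two, one, and one standard deviations) and the flux contributes 1.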

The above NLP structure is similar to the NLP formulation used for steady-state MFA

(Antoniewicz et al., 2006; Gopalakrishnan and Maranas, 2015a). The equality constraints

in the above NLP (𝑺. 𝒗 = 0) can be transformed into inequality constraints capturing the

flux bounds (𝒗𝐿𝐵 and 𝒗𝑈𝐵) using a null-space projection representing 𝑛𝑣 fluxes v in terms

of 𝑛𝑢 free fluxes u (Wiechert et al., 1997). This enables a reduction in the number of

dimensions in the search space (Antoniewicz et al., 2006; Gopalakrishnan and Maranas,

2015a).

$$
\mathbf{v} = \mathbf{N}\,\mathbf{u} \qquad (1)
$$

$\mathbf{N}$ is an $[n_v \times n_u]$ matrix whose columns form a basis for the null space of $\mathbf{S}$, derived from the reduced row echelon form of $\mathbf{S}$. The number of columns in $\mathbf{N}$ corresponds to the number of degrees of freedom of the null space of $\mathbf{S}$. As a consequence, the independent variables, comprising $n_u$ free fluxes and $n_c$ pool sizes, can be combined into an $[n_p \times 1]$ vector of parameters $\mathbf{p}$ such that $\mathbf{p} = [\mathbf{u}^T \,|\, \mathbf{c}^T]^T$. The least-squares NLP is now minimized over $n_p$ parameters and can be rewritten as:

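$$
\min_{\mathbf{p}} \; SSRES = \sum_{i=1}^{P} \left(\mathbf{x}_i^{pred}(\mathbf{p}) - \mathbf{x}_i^{meas}\right) \mathbf{W}_i \left(\mathbf{x}_i^{pred}(\mathbf{p}) - \mathbf{x}_i^{meas}\right)^T + \sum_{j=1}^{Q} \left(\frac{v_j^{pred}(\mathbf{p}) - v_j^{meas}}{\sigma_j}\right)^2
$$

$$
\text{s.t.} \quad \mathbf{N}\,\mathbf{u} \ge \mathbf{v}^{LB}, \qquad \mathbf{N}\,\mathbf{u} \le \mathbf{v}^{UB}, \qquad \mathbf{c} \ge \mathbf{0}
$$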
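The null-space projection can be illustrated with a short NumPy sketch. Note the assumptions: the text derives $\mathbf{N}$ from the reduced row echelon form of $\mathbf{S}$, whereas this sketch swaps in a singular value decomposition, which spans the same null space with a different (orthonormal) basis. The toy network, the function names, and the bounds are hypothetical.

```python
import numpy as np

def null_space_basis(S, tol=1e-10):
    """Return N (n_v x n_u) whose columns span the null space of S (SVD-based)."""
    _, sing, vt = np.linalg.svd(S)
    rank = int(np.sum(sing > tol))
    return vt[rank:].T

def is_feasible(u, N, v_lb, v_ub):
    """Check the inequality constraints N u >= v_LB and N u <= v_UB."""
    v = N @ u
    return bool(np.all(v >= v_lb) and np.all(v <= v_ub))

# Hypothetical toy network: -> A, A -> B, A -> C, B ->, C ->
S = np.array([
    [1.0, -1.0, -1.0,  0.0,  0.0],   # metabolite A
    [0.0,  1.0,  0.0, -1.0,  0.0],   # metabolite B
    [0.0,  0.0,  1.0,  0.0, -1.0],   # metabolite C
])
N = null_space_basis(S)          # 5 fluxes, 3 independent balances -> 2 free fluxes
u = np.array([0.4, -1.3])        # arbitrary free-flux values
v = N @ u                        # satisfies S v = 0 by construction
feasible = is_feasible(u, N, v_lb=-10.0 * np.ones(5), v_ub=10.0 * np.ones(5))
```

Because every $\mathbf{v} = \mathbf{N}\mathbf{u}$ satisfies the stoichiometric balance automatically, the optimizer only has to respect the bound inequalities.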

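In vector notation, SSRES is represented as:

$$
SSRES = \left(\mathbf{x}^{pred}(\mathbf{p}) - \mathbf{x}^{meas}\right) \mathbf{W} \left(\mathbf{x}^{pred}(\mathbf{p}) - \mathbf{x}^{meas}\right)^T
$$

In the above equation, $\mathbf{x}^{pred}(\mathbf{p})$ and $\mathbf{x}^{meas}$ are assembled as follows:

$$
\mathbf{x}^{pred}(\mathbf{p}) = \left[\mathbf{x}_1^{pred}(\mathbf{p}) \,|\, \mathbf{x}_2^{pred}(\mathbf{p}) \,|\, \dots \,|\, \mathbf{x}_P^{pred}(\mathbf{p}) \,|\, v_1^{pred}(\mathbf{p}) \,|\, v_2^{pred}(\mathbf{p}) \,|\, \dots \,|\, v_Q^{pred}(\mathbf{p})\right],
$$

$$
\mathbf{x}^{meas} = \left[\mathbf{x}_1^{meas} \,|\, \mathbf{x}_2^{meas} \,|\, \dots \,|\, \mathbf{x}_P^{meas} \,|\, v_1^{meas} \,|\, v_2^{meas} \,|\, \dots \,|\, v_Q^{meas}\right],
$$

$\mathbf{W}$ is the combined $[n_m \times n_m]$ diagonal matrix of weights equal to the inverse of the variance associated with the $n_m$ measurements contained in $\mathbf{x}^{meas}$.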

As described earlier (Antoniewicz et al., 2006), a first-order Taylor series expansion of

𝒙𝑝𝑟𝑒𝑑(𝒑) can be performed to obtain a quadratic approximation for SSRES so that the step

direction can be computed as described in Equation (2):

∆𝒑 = −𝑯−1𝑱 (2)

In the above equation, J is an [𝑛𝑝 × 1] vector representing the approximate gradient of

SSRES and H is an [𝑛𝑝 × 𝑛𝑝] matrix corresponding to the approximate Hessian of


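SSRES. $\mathbf{J}$ and $\mathbf{H}$ are related to the predicted values ($\mathbf{x}^{pred}$) and their sensitivities to the parameters $\mathbf{p}$ ($\partial \mathbf{x}^{pred} / \partial \mathbf{p}$) by Equations (3) and (4):

$$
\mathbf{J} = \left(\frac{\partial \mathbf{x}^{pred}}{\partial \mathbf{p}}\right) \mathbf{W} \left(\mathbf{x}^{pred} - \mathbf{x}^{meas}\right)^T \qquad (3)
$$

$$
\mathbf{H} = \left(\frac{\partial \mathbf{x}^{pred}}{\partial \mathbf{p}}\right) \mathbf{W} \left(\frac{\partial \mathbf{x}^{pred}}{\partial \mathbf{p}}\right)^T \qquad (4)
$$

Note that Equations (3) and (4) require the computation of the sensitivities $\partial \mathbf{x}^{pred} / \partial \mathbf{p}$ in addition to $\mathbf{x}^{pred}$. For Equations (3) and (4) to be dimensionally consistent, $\partial \mathbf{x}^{pred} / \partial \mathbf{p}$ is an $[n_p \times n_m]$ matrix containing the sensitivities of the $P$ predicted MDVs ($\mathbf{x}_i^{pred}(\mathbf{p})$) and $Q$ extracellular fluxes ($v_i^{pred}$). It is assembled as shown in Equation (5):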

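$$
\frac{\partial \mathbf{x}^{pred}}{\partial \mathbf{p}} = \left[\frac{\partial \mathbf{x}_1^{pred}}{\partial \mathbf{p}} \,\Big|\, \frac{\partial \mathbf{x}_2^{pred}}{\partial \mathbf{p}} \,\Big|\, \dots \,\Big|\, \frac{\partial \mathbf{x}_P^{pred}}{\partial \mathbf{p}} \,\Big|\, \frac{\partial v_1^{pred}}{\partial \mathbf{p}} \,\Big|\, \frac{\partial v_2^{pred}}{\partial \mathbf{p}} \,\Big|\, \dots \,\Big|\, \frac{\partial v_Q^{pred}}{\partial \mathbf{p}}\right] \qquad (5)
$$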
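The step computation of Equations (2) to (4) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the sensitivity matrix is stored with shape $[n_p \times n_m]$ so that the products in Equations (3) and (4) are conformable, the toy model $\mathbf{x}^{pred}(\mathbf{p}) = \mathbf{p}\,\mathbf{D}$ is hypothetical and linear (so a single step lands on the minimizer), and the function name is invented for the example.

```python
import numpy as np

def gauss_newton_step(dx_dp, W, x_pred, x_meas):
    """Return the step -H^{-1} J built from sensitivities, weights and residuals."""
    r = x_pred - x_meas            # residual vector, length n_m
    J = dx_dp @ W @ r              # approximate gradient, Eq. (3)
    H = dx_dp @ W @ dx_dp.T        # approximate Hessian, Eq. (4)
    return -np.linalg.solve(H, J)  # Eq. (2), without forming H^{-1} explicitly

# Hypothetical linear model with n_p = 2 parameters and n_m = 4 measurements.
D = np.array([[1.0, 0.0, 1.0,  2.0],
              [0.0, 1.0, 1.0, -1.0]])
p_true = np.array([0.3, 0.7])
x_meas = p_true @ D                                  # noise-free "measurements"
W = np.diag(1.0 / np.array([0.01, 0.02, 0.01, 0.05]) ** 2)

p0 = np.array([1.0, 1.0])                            # initial guess
p1 = p0 + gauss_newton_step(D, W, p0 @ D, x_meas)    # one iteration
```

For the nonlinear labeling model the same step would be recomputed at every iteration with updated predictions and sensitivities.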

Under the imposed metabolic steady-state conditions, the sensitivity of $v_i^{pred}$ with respect to the parameters $\mathbf{p}$ is constant. Since kinetic parameters are not invoked in the INST-MFA modeling framework, free fluxes $\mathbf{u}$ and pool sizes $\mathbf{c}$ are treated as independent fitted parameters. Therefore, metabolic fluxes $\mathbf{v}$ are insensitive to changes in pool sizes $\mathbf{c}$. The sensitivity of the calculated fluxes ($v_i^{pred}$) with respect to parameters $\mathbf{u}$ and $\mathbf{c}$ is calculated as:

$$
\frac{\partial v_i^{pred}}{\partial \mathbf{c}} = \mathbf{0}, \qquad \frac{\partial v_i^{pred}}{\partial \mathbf{u}} = \mathbf{N}_k^T
$$

$\mathbf{N}_k$ is a $[1 \times n_u]$ vector derived from the $k$th row of $\mathbf{N}$ relating the predicted flux $v_i^{pred}$ to the free fluxes $\mathbf{u}$.

The aggregate parameter sensitivity matrix is assembled as follows:

$$
\frac{\partial v_i^{pred}}{\partial \mathbf{p}} = \left[\left(\frac{\partial v_i^{pred}}{\partial \mathbf{u}}\right)^T \,\Big|\, \left(\frac{\partial v_i^{pred}}{\partial \mathbf{c}}\right)^T\right]^T \qquad (6)
$$

$\partial v_i^{pred} / \partial \mathbf{p}$ has the dimensions $[n_p \times 1]$. Unlike predicted fluxes, the sensitivity of the predicted metabolite labeling distributions ($\partial \mathbf{x}_i^{pred} / \partial \mathbf{p}$) depends on the labeling dynamics due to instationary isotopic conditions and must be computed together with the metabolite labeling distributions ($\mathbf{x}_i^{pred}$) as described in the next subsection.

B.2. Dynamic EMU balances and simulation of labeling distributions

$\mathbf{x}_i^{meas}$ and $\mathbf{x}_i^{pred}$ represent the measured and predicted 13C labeling distributions for a subset of all the carbon atoms of a particular measured metabolite. A subset of atoms of any metabolite is termed an Elementary Metabolite Unit (EMU) (Antoniewicz et al., 2007). For example, for an MS fragment of a three-carbon metabolite M comprising the carbons at positions 2 and 3, $\mathbf{x}_i^{meas}$ encodes the measured MDV whereas $\mathbf{x}_i^{pred}$ describes the MDV of the corresponding EMU $\mathbf{M}_{2,3}$. The predicted labeling distribution of $\mathbf{M}_{2,3}$ depends on the labeling distribution of all externally provided substrates. Therefore, the atoms represented by EMU $\mathbf{M}_{2,3}$ must be traced back to each external substrate through all the paths afforded by the carbon mapping model using algorithms such as EMU decomposition (Antoniewicz et al., 2007). Any intracellular metabolite M can be produced by one of four possible reaction types shown in Table B.1. Consider the flux balance across EMU $\mathbf{M}_{2,3}$ of size 2 (the number of carbons contained in the EMU) as shown in Figure B.1, with reactions $v_1$, $v_2$, $v_3$, and $v_4$ generating $\mathbf{M}_{2,3}$ from EMUs $\mathbf{P}_{2,3}$, $\mathbf{Q}_{2,3}$, $\mathbf{R}_{2,3}$, and the convolution of $\mathbf{D}_2$ and $\mathbf{E}_1$, respectively. Convolution of EMUs (Reaction type 4 in Table B.1) arises from the formation of a bond between two EMUs (i.e., $\mathbf{D}_2$ and $\mathbf{E}_1$) of a smaller size than $\mathbf{M}_{2,3}$. The corresponding MDV convolution is described by Equation (6). Note that metabolite R (Reaction type 3) denotes an externally provided substrate.

$$
\mathbf{D}_2 = [\,a \quad (1-a)\,]; \qquad \mathbf{E}_1 = [\,b \quad (1-b)\,]
$$

$$
\mathbf{D}_2 * \mathbf{E}_1 = [\,ab \quad a(1-b) + (1-a)b \quad (1-a)(1-b)\,] \qquad (6)
$$
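The convolution above is an ordinary discrete convolution of the two MDVs. The following plain-Python sketch reproduces it; the function name `convolve_mdv` and the numerical values of `a` and `b` are illustrative assumptions.

```python
def convolve_mdv(mdv_a, mdv_b):
    """Convolve two MDVs: entry m collects all products whose label counts sum to m."""
    result = [0.0] * (len(mdv_a) + len(mdv_b) - 1)
    for i, prob_a in enumerate(mdv_a):
        for j, prob_b in enumerate(mdv_b):
            result[i + j] += prob_a * prob_b
    return result

# Hypothetical single-carbon EMUs: a and b are the unlabeled (M+0) fractions.
a, b = 0.9, 0.8
d2 = [a, 1 - a]
e1 = [b, 1 - b]
m23 = convolve_mdv(d2, e1)   # MDV of the two-carbon product EMU: M+0, M+1, M+2
```

The three entries correspond term by term to $ab$, $a(1-b) + (1-a)b$, and $(1-a)(1-b)$.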

Table B.1. Four types of reaction classes impacting EMU balances. Reaction v1 involves no rearrangement of the carbon skeleton of the reactant P. Reaction v2 involves breaking of the C-C bond between carbons 3 and 4 of Q. Reaction v3 is an uptake reaction for the external substrate R. Reaction v4 involves a bond formation between the second carbon of D and the single-carbon E, giving rise to the convolution term described in Equation (6).

| Reaction type | Example |
|---|---|
| $v_1$: P (abc) → M (abc) | Enolase: 2PG → PEP |
| $v_2$: Q (abcd) → M (abc) + S1 (d) | Malic Enzyme: Mal → Pyr + CO2 |
| $v_3$: R (abc) → M (abc) | Glucose uptake via PTS: Gluc + PEP → G6P + Pyr |
| $v_4$: D (ab) + E (c) → M (abc) | SHMT: Gly + MEETHF → Ser + THF |


Figure B.1. Flux balance for EMU 𝑴2,3. 𝑴2,3 is produced by four separate reactions

(one from each class) as described in Table B.1.

The labeling dynamics of EMU 𝑴2,3 can be expressed using the following relation:

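$$
C_M \frac{d\mathbf{M}_{2,3}}{dt} = \begin{bmatrix} v_1 & v_2 & v_3 & v_4 & -(v_1 + v_2 + v_3 + v_4) \end{bmatrix} \begin{bmatrix} \mathbf{P}_{2,3} \\ \mathbf{Q}_{2,3} \\ \mathbf{R}_{2,3} \\ \mathbf{D}_2 * \mathbf{E}_1 \\ \mathbf{M}_{2,3} \end{bmatrix} \qquad (7)
$$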

Here, $C_M$ denotes the pool size of metabolite M. $\mathbf{R}_{2,3}$ has a constant labeling distribution since R is an externally supplied substrate (such as glucose or CO2). A characteristic feature of the EMU method is that the labeling distributions of smaller-sized EMUs are unaffected by the EMUs of larger size (Antoniewicz et al., 2007). As a result, $\mathbf{R}_{2,3}$ and $\mathbf{D}_2 * \mathbf{E}_1$ can be separated from the remaining EMUs in Equation (7), as they are unaffected by the labeling dynamics of the other size 2 EMUs:

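$$
C_M \frac{d\mathbf{M}_{2,3}}{dt} = \begin{bmatrix} v_1 & v_2 & -(v_1 + v_2 + v_3 + v_4) \end{bmatrix} \begin{bmatrix} \mathbf{P}_{2,3} \\ \mathbf{Q}_{2,3} \\ \mathbf{M}_{2,3} \end{bmatrix} + \begin{bmatrix} v_3 & v_4 \end{bmatrix} \begin{bmatrix} \mathbf{R}_{2,3} \\ \mathbf{D}_2 * \mathbf{E}_1 \end{bmatrix} \qquad (8)
$$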

Equation (8) reveals that 𝑴2,3 depends on 𝑷2,3 and 𝑸2,3 which must be traced back to

external substrates and smaller EMU convolutions in a similar fashion resulting in an EMU

network of size 2. This results in two sets of EMUs: 𝑿𝟐 containing 𝑷2,3, 𝑸2,3, and 𝑴2,3,

connected to each other by the coefficient matrix 𝑨𝟐=[𝑣1 𝑣2 −(𝑣1 + 𝑣2 + 𝑣3 + 𝑣4)],

and 𝒀𝟐 containing the EMUs and convolutions unaffected by 𝑿𝟐 and related to 𝑿𝟐 via the

coefficient matrix 𝑩𝟐 = [𝑣3 𝑣4]. Equation (8) can thus be re-expressed in a general form

as:

$$
C_M \frac{d\mathbf{M}_{2,3}}{dt} = \mathbf{A}_2 \mathbf{X}_2 + \mathbf{B}_2 \mathbf{Y}_2 \qquad (9)
$$

Although unaffected by size 2 EMUs, $\mathbf{D}_2$ and $\mathbf{E}_1$ do not have constant MDVs, as metabolites D and E are not externally supplied substrates. Therefore, a size 1 EMU network must be constructed in the same manner as the size 2 network to trace $\mathbf{D}_2$ and $\mathbf{E}_1$ back to the substrate R, resulting in EMU balances similar to Equation (8). Since convolution terms require at least two atoms, a size 1 network will have no such terms. A dynamic balance such as Equation (9) can be extended to all balanced metabolites in the EMU model of a particular size $n$ and conforms to the following mathematical structure (Young et al., 2008):

$$
\mathbf{C}_n \frac{d\mathbf{X}_n}{dt} = \mathbf{A}_n \mathbf{X}_n + \mathbf{B}_n \mathbf{Y}_n \qquad (10)
$$

In this mathematical description, n corresponds to the size of the EMU network. If k

intracellular metabolite EMUs, m extracellular substrate EMUs, and q convolution terms

are contained within the size n EMU network we have:

$\mathbf{X}_n$ is a $[k \times (n+1)]$ matrix describing the labeling distribution of the $k$ size-$n$ EMUs, similar to $\mathbf{X}_2$ in Equation (9).

$\mathbf{Y}_n$ is an $[(m+q) \times (n+1)]$ matrix encoding the labeling distribution of extracellular substrate EMUs and convolution terms, similar to $\mathbf{Y}_2$ in Equation (9).

$\mathbf{A}_n$ is a $[k \times k]$ matrix representing the connectivity between the $k$ EMUs in the size $n$ network, similar to $\mathbf{A}_2$ in Equation (9).

$\mathbf{B}_n$ is a $[k \times (m+q)]$ matrix capturing the connectivity between the extracellular substrate EMUs and convolution terms and the $k$ EMUs in the size $n$ network, similar to $\mathbf{B}_2$ in Equation (9). All the elements of $\mathbf{A}_n$ and $\mathbf{B}_n$ are linear functions of fluxes as shown in Equation (8).

$\mathbf{C}_n$ is a $[k \times k]$ diagonal matrix capturing the pool sizes of the metabolites corresponding to the EMUs in $\mathbf{X}_n$.

Equation (10) can be rewritten as:

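$$
\frac{d\mathbf{X}_n}{dt} = \mathbf{F}_n \mathbf{X}_n + \mathbf{G}_n \qquad (11)
$$

New parameters $\mathbf{F}_n$ and $\mathbf{G}_n$ are defined as:

$$
\mathbf{F}_n = \mathbf{C}_n^{-1} \mathbf{A}_n, \qquad \mathbf{G}_n = \mathbf{C}_n^{-1} \mathbf{B}_n \mathbf{Y}_n \qquad (12)
$$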

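Evaluation of the step direction described in Equations (2)-(4) for the least-squares NLP problem (section 1.1) requires knowledge of the sensitivity of the predicted labeling distributions with respect to parameters $\mathbf{p}$ (i.e., $\partial \mathbf{x}_i^{pred} / \partial \mathbf{p}$), which can be obtained by differentiating Equation (11) with respect to $\mathbf{p}$ such that:

$$
\frac{d}{dt}\left(\frac{\partial \mathbf{X}_n}{\partial \mathbf{p}}\right) = \mathbf{F}_n \frac{\partial \mathbf{X}_n}{\partial \mathbf{p}} + \frac{\partial \mathbf{F}_n}{\partial \mathbf{p}} \mathbf{X}_n + \frac{\partial \mathbf{G}_n}{\partial \mathbf{p}} \qquad (13)
$$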

The overall system is thus represented by the following system of Equations:

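$$
\frac{d\mathbf{X}_n}{dt} = \mathbf{F}_n \mathbf{X}_n + \mathbf{G}_n \qquad (14)
$$

$$
\frac{d}{dt}\left(\frac{\partial \mathbf{X}_n}{\partial \mathbf{p}}\right) = \mathbf{F}_n \frac{\partial \mathbf{X}_n}{\partial \mathbf{p}} + \mathbf{H}_n \qquad (15)
$$

$\mathbf{H}_n$ consists of all the terms unaffected by changes in $\partial \mathbf{X}_n / \partial \mathbf{p}$ and can be expressed as:

$$
\mathbf{H}_n = \frac{\partial \mathbf{F}_n}{\partial \mathbf{p}} \mathbf{X}_n + \frac{\partial \mathbf{G}_n}{\partial \mathbf{p}}
$$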

For a system of equations with $n_p$ parameters, the sensitivities $\partial \mathbf{X}_n / \partial p_i$ are independent of $\partial \mathbf{X}_n / \partial p_j$ but are dependent on $\mathbf{X}_n$ (see Eq. 13). As a result, Equations (14) and (15) must be solved simultaneously for labeling distributions and sensitivities to free fluxes and pool sizes. At the start of the isotope labeling experiment, all atoms are assumed to have a 13C enrichment equal to the natural abundance of 13C. Solving Equation (13) at the isotopic steady-state conditions prior to the start of the labeling experiment confirms that $\partial \mathbf{X}_n / \partial \mathbf{p}$ is zero at $t = 0$. Labeling distributions are sampled at various time points ($t_1, t_2, \dots, t_n$) after the introduction of the tracer. As a result, Equations (14) and (15) must be integrated between the required time intervals ($[t_1, t_2], [t_2, t_3], \dots, [t_{n-1}, t_n]$) to extract the relevant $\mathbf{x}_i^{pred}$ and the corresponding sensitivities $\partial \mathbf{x}_i^{pred} / \partial \mathbf{p}$ at ($t_1, t_2, \dots, t_n$).

The analytical solution to the above system of equations is not available due to the presence of nonlinear EMU convolution terms. An approximate analytical solution for $\mathbf{X}_n$ at any future time point ($t_0 + \Delta t$) as a function of the labeling distribution at a previous time point $\mathbf{X}_n(t_0)$, $\mathbf{F}_n$, $\mathbf{G}_n$, and the time interval $\Delta t$ can be obtained by solving the system of ODEs described by Equations (14) and (15) using the integrating factor approach described earlier (Young et al., 2008):

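$$
\mathbf{X}_n(t_0 + \Delta t) = e^{\mathbf{F}_n \Delta t}\, \mathbf{X}_n(t_0) + \int_0^{\Delta t} e^{\mathbf{F}_n (\Delta t - \tau)}\, \mathbf{G}_n(t_0 + \tau)\, d\tau \qquad (16)
$$

$$
\frac{\partial \mathbf{X}_n}{\partial \mathbf{p}}(t_0 + \Delta t) = e^{\mathbf{F}_n \Delta t}\, \frac{\partial \mathbf{X}_n}{\partial \mathbf{p}}(t_0) + \int_0^{\Delta t} e^{\mathbf{F}_n (\Delta t - \tau)}\, \mathbf{H}_n(t_0 + \tau)\, d\tau \qquad (17)
$$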

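The current state-of-the-art procedure (Young et al., 2008) discretizes Equations (14) and (15) using a non-causal first-order hold equivalent to compute the one-step solution represented by Equations (18), where the transition matrices $\boldsymbol{\Phi}_n$, $\boldsymbol{\Gamma}_n$, and $\boldsymbol{\Omega}_n$ are calculated using Equation (19):

$$
\begin{aligned}
\mathbf{X}_n(t_0 + \Delta t) &= \boldsymbol{\Phi}_n \mathbf{X}_n(t_0) + \boldsymbol{\Gamma}_n \mathbf{G}_n(t_0) + \boldsymbol{\Omega}_n \left(\mathbf{G}_n(t_0 + \Delta t) - \mathbf{G}_n(t_0)\right) \\
\frac{\partial \mathbf{X}_n}{\partial \mathbf{p}}(t_0 + \Delta t) &= \boldsymbol{\Phi}_n \frac{\partial \mathbf{X}_n}{\partial \mathbf{p}}(t_0) + \boldsymbol{\Gamma}_n \mathbf{H}_n(t_0) + \boldsymbol{\Omega}_n \left(\mathbf{H}_n(t_0 + \Delta t) - \mathbf{H}_n(t_0)\right)
\end{aligned} \qquad (18)
$$

$$
\begin{bmatrix} \boldsymbol{\Phi}_n & \boldsymbol{\Gamma}_n & \boldsymbol{\Omega}_n \\ \mathbf{0} & \mathbf{I} & \mathbf{I} \\ \mathbf{0} & \mathbf{0} & \mathbf{I} \end{bmatrix} = \exp\left(\begin{bmatrix} \mathbf{F}_n \Delta t & \mathbf{I} \Delta t & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{I} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} \end{bmatrix}\right) \qquad (19)
$$

Only the first block row of the exponential is used; the remaining block rows follow from the zero diagonal blocks of the argument.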

B.3. An improved algorithm for simulating labeling dynamics and sensitivities

Here we propose a faster and memory-efficient approach to compute the transition matrices

by discretizing the partial analytical solution represented by equations (16) and (17) in

order to obtain analytical expressions for the transition matrices 𝚽𝒏, 𝚪𝒏, and 𝛀𝒏 in terms

of 𝑭𝑛. 𝑮𝑛 and 𝑯𝑛 are linearized in the interval [𝑡0, 𝑡0 + ∆𝑡] using the computable quantities

𝑮𝑛(𝑡0), 𝑮𝑛(𝑡0 + ∆𝑡), 𝑯𝑛(𝑡0), and 𝑯𝑛(𝑡0 + ∆𝑡) using a non-causal first-order hold

equivalent (Franklin et al., 1997) so that 𝑮𝑛(𝑡0 + 𝜏) and 𝑯𝑛(𝑡0 + 𝜏) at any time 𝑡0 + 𝜏

between 𝑡0 and 𝑡0 + ∆𝑡 can be expressed as:

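$$
\mathbf{G}_n(t_0 + \tau) = \mathbf{G}_n(t_0) + \frac{\tau}{\Delta t}\left(\mathbf{G}_n(t_0 + \Delta t) - \mathbf{G}_n(t_0)\right) \qquad (20)
$$

$$
\mathbf{H}_n(t_0 + \tau) = \mathbf{H}_n(t_0) + \frac{\tau}{\Delta t}\left(\mathbf{H}_n(t_0 + \Delta t) - \mathbf{H}_n(t_0)\right) \qquad (21)
$$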

Upon substituting Equations (20) and (21) in Equations (16) and (17) and integrating, we

get:

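$$
\begin{aligned}
\mathbf{X}_n(t_0 + \Delta t) &= \boldsymbol{\Phi}_n \mathbf{X}_n(t_0) + \boldsymbol{\Gamma}_n \mathbf{G}_n(t_0) + \boldsymbol{\Omega}_n \left(\mathbf{G}_n(t_0 + \Delta t) - \mathbf{G}_n(t_0)\right) \\
\frac{\partial \mathbf{X}_n}{\partial \mathbf{p}}(t_0 + \Delta t) &= \boldsymbol{\Phi}_n \frac{\partial \mathbf{X}_n}{\partial \mathbf{p}}(t_0) + \boldsymbol{\Gamma}_n \mathbf{H}_n(t_0) + \boldsymbol{\Omega}_n \left(\mathbf{H}_n(t_0 + \Delta t) - \mathbf{H}_n(t_0)\right)
\end{aligned} \qquad (22)
$$

Note that matrices $\boldsymbol{\Phi}_n$, $\boldsymbol{\Gamma}_n$, and $\boldsymbol{\Omega}_n$ are recast as:

$$
\begin{aligned}
\boldsymbol{\Phi}_n &= e^{\mathbf{F}_n \Delta t} \\
\boldsymbol{\Gamma}_n &= \left(e^{\mathbf{F}_n \Delta t} - \mathbf{I}\right) \mathbf{F}_n^{-1} \\
\boldsymbol{\Omega}_n &= \left[\left(e^{\mathbf{F}_n \Delta t} - \mathbf{I}\right) (\mathbf{F}_n \Delta t)^{-1} - \mathbf{I}\right] \mathbf{F}_n^{-1}
\end{aligned} \qquad (23)
$$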

Equation (22) represents a time-discretized form of the ODEs defined by Equations (14) and (15). Matrix $\boldsymbol{\Phi}_n$ captures the nonlinear coupling between labeling distributions, fluxes, and pool sizes. Since all the eigenvalues of $\mathbf{F}_n$ are negative (Anderson, 1983), $e^{\mathbf{F}_n t}$ eventually vanishes, implying that the product $\boldsymbol{\Gamma}_n \mathbf{G}_n$ contains the labeling distributions at isotopic steady-state. While the end result of both approaches is the same, the matrix for which the exponential is computed in Equation (23) is 1/9th the size of the matrix in Equation (19) (a $[k \times k]$ matrix instead of a $[3k \times 3k]$ matrix). This size reduction reduces memory requirements while accelerating matrix exponential evaluation, which is the computational bottleneck in this algorithm, thus improving scalability and enabling INST-MFA using a genome-scale mapping model.
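The equivalence of the two routes to the transition matrices can be checked numerically. The sketch below is an illustration under stated assumptions, not the dissertation's implementation: `expm_taylor` is a simple stand-in matrix exponential (scaling and squaring with a truncated Taylor series), and the $2 \times 2$ matrix `F`, the matrix `G`, and the step `dt` are hypothetical values chosen so that `F` is invertible with negative eigenvalues.

```python
import numpy as np

def expm_taylor(A, terms=20):
    """Stand-in matrix exponential: scale, sum a truncated Taylor series, square back."""
    norm = np.linalg.norm(A, ord=np.inf)
    j = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    scaled = A / (2.0 ** j)
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k
        result = result + term
    for _ in range(j):
        result = result @ result
    return result

def transition_matrices(F, dt):
    """Closed-form Phi, Gamma, Omega: one exponential of a k x k matrix."""
    I = np.eye(F.shape[0])
    F_inv = np.linalg.inv(F)
    Phi = expm_taylor(F * dt)
    Gamma = (Phi - I) @ F_inv
    Omega = ((Phi - I) @ np.linalg.inv(F * dt) - I) @ F_inv
    return Phi, Gamma, Omega

def transition_matrices_block(F, dt):
    """Same matrices read off the first block row of a 3k x 3k exponential."""
    k = F.shape[0]
    I, Z = np.eye(k), np.zeros((k, k))
    M = np.block([[F * dt, I * dt, Z], [Z, Z, I], [Z, Z, Z]])
    E = expm_taylor(M)
    return E[:k, :k], E[:k, k:2 * k], E[:k, 2 * k:]

F = np.array([[-2.0, 1.0], [0.5, -1.5]])   # hypothetical, eigenvalues -1 and -2.5
G = np.array([[1.0, 0.5], [0.2, 0.3]])     # hypothetical constant forcing term
dt = 0.1
Phi, Gamma, Omega = transition_matrices(F, dt)
Phi_b, Gamma_b, Omega_b = transition_matrices_block(F, dt)
X_ss = -np.linalg.solve(F, G)              # steady state of dX/dt = F X + G
```

With a constant forcing term, `X_ss` is a fixed point of the update $\mathbf{X} \leftarrow \boldsymbol{\Phi}_n \mathbf{X} + \boldsymbol{\Gamma}_n \mathbf{G}_n$, which gives a quick sanity check on the discretization.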

The matrix exponential can be approximated using different approaches such as Taylor series expansion or Padé approximation (Moler and Van Loan, 2003). Because the Padé approximation is accurate only when the matrix elements are small, the matrix $\mathbf{F}_n$ must first be rescaled. The matrix exponential is then evaluated as:

$$
e^{\mathbf{F}_n} = \left(e^{\mathbf{F}_n / s}\right)^s \qquad (24)
$$

$$
s = 2^q \qquad (25)
$$

Here $q$ is a positive integer chosen such that the absolute maximum value of any element in the matrix $\mathbf{F}_n / s$ is less than 0.5 (Golub and Loan, 1996). The exponential of $\mathbf{F}_n / s$ is evaluated using the Padé approximation and is then squared $q$ times to obtain the exponential of $\mathbf{F}_n$. Integration of the ODEs described by Equation (22) can be accomplished using adaptive step-size and error control methods such as adaptive Runge-Kutta methods or Richardson extrapolation integrators (Press et al., 2007a).
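A minimal sketch of the scaling-and-squaring scheme in Equations (24) and (25) is given below. It is an illustration only: the Padé order of 6, the function name `expm_pade`, and the choice to allow $q = 0$ when the matrix is already small are assumptions made for the example, and the coefficient recursion is the standard one for diagonal Padé approximants of the exponential.

```python
import numpy as np

def expm_pade(F, order=6):
    """Approximate exp(F) by scaling, a diagonal Pade approximant, and squaring."""
    F = np.asarray(F, dtype=float)
    n = F.shape[0]
    # Smallest q >= 0 such that max|F_ij| / 2**q < 0.5 (q = 0 is allowed here).
    q = 0
    max_abs = np.max(np.abs(F))
    while max_abs / (2.0 ** q) >= 0.5:
        q += 1
    A = F / (2.0 ** q)
    # Numerator N = sum_k c_k A^k and denominator D = sum_k c_k (-A)^k.
    X = np.eye(n)
    N = np.eye(n)
    D = np.eye(n)
    c = 1.0
    for k in range(1, order + 1):
        c = c * (order - k + 1) / ((2 * order - k + 1) * k)
        X = A @ X
        N = N + c * X
        D = D + ((-1) ** k) * c * X
    E = np.linalg.solve(D, N)      # exp(A) is approximated by D^{-1} N
    for _ in range(q):             # undo the scaling: square q times
        E = E @ E
    return E

# Checks against cases with known closed-form exponentials.
diag_case = expm_pade(np.diag([-1.0, -3.0]))
rotation_case = expm_pade(np.array([[0.0, -1.0], [1.0, 0.0]]))
```

The diagonal case should reproduce the scalar exponentials on the diagonal, and the skew-symmetric case should reproduce a rotation by one radian.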


Appendix C

Mathematical description of K-FIT

C.1. Overview of elementary reaction step decomposition

The complex catalytic mechanism of enzyme catalysis can be decomposed into a series of

elementary steps that are modeled using mass action kinetics. Each elementary step is

treated as reversible with one forward and one reverse elementary reaction. Each

elementary reaction is associated with one of three types of events: (i) binding of one

metabolite with an enzyme complex, (ii) release of one metabolite from an enzyme

complex, or (iii) conversion of the enzyme-reactant complex to the enzyme-product

complex. The flux through each elementary reaction is termed elementary flux and is

related to the concentration of metabolites and enzyme complexes using mass-action

kinetics. The following example details the decomposition of an enzyme-catalyzed

reaction into elementary steps and establishes the basic terms used in the kinetic

parameterization algorithm.

Consider the conversion of a metabolite A to B catalyzed by an enzyme E, regulated by a

non-competitive inhibitor C, and an activator D. The reaction mechanism can be

decomposed into six elementary steps as shown in Table C.1. The set of elementary steps 𝐿

is defined as 𝐿 = {1,2,3,4,5,6}. Elementary steps 1, 2, and 3 describe the conversion of A

to B and are therefore termed catalytic elementary steps. Elementary steps 4 and 5 model

the inhibition of enzyme catalysis by metabolite C and step 6 denotes the activation of the

inactive enzyme complex for catalysis by metabolite D. Steps 4, 5, and 6 do not participate


in the reaction; instead they regulate enzyme function and are thus referred to as regulatory

elementary steps. The set of catalytic elementary steps, denoted by 𝐿𝑐𝑎𝑡, is defined here as

𝐿𝑐𝑎𝑡 = {1,2,3}. The corresponding set of regulatory elementary steps 𝐿𝑟𝑒𝑔 is defined as

𝐿𝑟𝑒𝑔 = {4,5,6}. From Table C.1, we see that the number of unique enzyme complexes

formed over the course of the reaction is equal to the number of elementary steps required

to model the catalytic and regulatory functions of the enzyme.

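Table C.1. List of elementary steps describing the catalytic mechanism and regulation of enzyme $E$

| Type of elementary step | Elementary step # | Elementary step | Description of elementary step |
|---|---|---|---|
| Catalytic | 1 | $A + E \underset{\hat{k}_2}{\overset{\hat{k}_1}{\rightleftharpoons}} EA$ | Reactant binding to free enzyme |
| Catalytic | 2 | $EA \underset{\hat{k}_4}{\overset{\hat{k}_3}{\rightleftharpoons}} EB$ | Conversion of reactant to product |
| Catalytic | 3 | $EB \underset{\hat{k}_6}{\overset{\hat{k}_5}{\rightleftharpoons}} E + B$ | Product release from bound complex |
| Regulatory | 4 | $C + E \underset{\hat{k}_8}{\overset{\hat{k}_7}{\rightleftharpoons}} EC$ | Inhibition of free enzyme |
| Regulatory | 5 | $C + EA \underset{\hat{k}_{10}}{\overset{\hat{k}_9}{\rightleftharpoons}} EAC$ | Inhibition of enzyme-substrate complex |
| Regulatory | 6 | $D + E^* \underset{\hat{k}_{12}}{\overset{\hat{k}_{11}}{\rightleftharpoons}} E$ | Activation of inactive enzyme form |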


Each elementary step is modeled to be reversible with two separate elementary reactions in the forward and reverse directions. Thus, an enzyme-catalyzed reaction that decomposes into $n_L$ elementary steps will involve $n_P = 2 n_L$ elementary reactions. The index of any elementary step $l \in L$ is related to the corresponding indices of its forward and reverse elementary reactions ($fwd$ and $rev$, respectively) as follows:

$$
fwd = 2l - 1 \qquad (1)
$$

$$
rev = 2l \qquad (2)
$$

Based on this, the set of elementary reactions is defined as $P = \{p \,|\, p = 1, 2, \dots, 2 n_L\}$. This implies that there is a sequence of alternating forward and reverse elementary reactions contained within set $P$. Each elementary reaction is associated with its own kinetic rate constant $\hat{k}_p \; \forall p \in P$. We define $[A]$, $[B]$, $[C]$, and $[D]$ to be the concentrations of metabolites $A$, $B$, $C$, and $D$, respectively, and $[E^*]$, $[E]$, $[EA]$, and $[EB]$ to denote the concentrations of the un-activated enzyme $E^*$, active free enzyme $E$, substrate-bound complex $EA$, and product-bound complex $EB$, respectively. $[EC]$ and $[EAC]$ denote concentrations of the inhibitor-bound complexes $EC$ and $EAC$, respectively. As stated earlier, the flux through an elementary reaction is referred to as elementary flux. For the example reaction, the elementary fluxes through the twelve elementary reactions, $v_p \; \forall p \in P$, can be computed by expressing the rate of each elementary reaction using mass-action kinetics as described by Tran et al. (2008), as shown in Equations (3):

$$
\begin{aligned}
v_1 &= \hat{k}_1 [A][E] & v_2 &= \hat{k}_2 [EA] \\
v_3 &= \hat{k}_3 [EA] & v_4 &= \hat{k}_4 [EB] \\
v_5 &= \hat{k}_5 [EB] & v_6 &= \hat{k}_6 [B][E] \\
v_7 &= \hat{k}_7 [C][E] & v_8 &= \hat{k}_8 [EC] \\
v_9 &= \hat{k}_9 [C][EA] & v_{10} &= \hat{k}_{10} [EAC] \\
v_{11} &= \hat{k}_{11} [D][E^*] & v_{12} &= \hat{k}_{12} [E]
\end{aligned} \qquad (3)
$$

Consistent with the convention introduced by Tran et al. (2008), the concentrations of metabolites $A$, $B$, $C$, and $D$ are normalized with respect to their concentrations in the wild-type (WT) strain, $[A_{WT}]$, $[B_{WT}]$, $[C_{WT}]$, and $[D_{WT}]$, respectively. The corresponding relative concentrations $a$, $b$, $c$, and $d$ are defined as:

$$
a = \frac{[A]}{[A_{WT}]}, \qquad b = \frac{[B]}{[B_{WT}]}, \qquad c = \frac{[C]}{[C_{WT}]}, \qquad d = \frac{[D]}{[D_{WT}]} \qquad (4)
$$

The total concentration [𝐸0] of the enzyme catalyzing the conversion of 𝐴 to 𝐵 is related

to the concentration of various enzyme forms/complexes as:


[𝐸0] = [𝐸] + [𝐸𝐴] + [𝐸𝐵] + [𝐸𝐶] + [𝐸𝐴𝐶] + [𝐸∗] (5)

Enzyme fractions are defined as the fractional abundance of each enzyme form relative to

the total enzyme [𝐸0].

$$
\begin{aligned}
e &= [E]/[E_0] & e_c &= [EC]/[E_0] \\
e_a &= [EA]/[E_0] & e_{ac} &= [EAC]/[E_0] \\
e_b &= [EB]/[E_0] & e^* &= [E^*]/[E_0]
\end{aligned} \qquad (6)
$$

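Metabolite and total enzyme concentrations in the WT strain are often unavailable and are therefore lumped together with kinetic rate constants, yielding the following aggregated kinetic parameters:

$$
\begin{aligned}
k_1 &= \hat{k}_1 [A_{WT}][E_0] & k_2 &= \hat{k}_2 [E_0] \\
k_3 &= \hat{k}_3 [E_0] & k_4 &= \hat{k}_4 [E_0] \\
k_5 &= \hat{k}_5 [E_0] & k_6 &= \hat{k}_6 [B_{WT}][E_0] \\
k_7 &= \hat{k}_7 [C_{WT}][E_0] & k_8 &= \hat{k}_8 [E_0] \\
k_9 &= \hat{k}_9 [C_{WT}][E_0] & k_{10} &= \hat{k}_{10} [E_0] \\
k_{11} &= \hat{k}_{11} [D_{WT}][E_0] & k_{12} &= \hat{k}_{12} [E_0]
\end{aligned} \qquad (7)
$$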


Upon substituting the definitions from Equations (4), (6) and (7) in Equation (3)

expressions are derived for all fluxes as a function of aggregated kinetic parameters,

relative metabolite concentrations and fractional enzyme abundances:

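$$
\begin{aligned}
v_1 &= k_1 a e & v_2 &= k_2 e_a \\
v_3 &= k_3 e_a & v_4 &= k_4 e_b \\
v_5 &= k_5 e_b & v_6 &= k_6 b e \\
v_7 &= k_7 c e & v_8 &= k_8 e_c \\
v_9 &= k_9 c e_a & v_{10} &= k_{10} e_{ac} \\
v_{11} &= k_{11} d e^* & v_{12} &= k_{12} e
\end{aligned} \qquad (8)
$$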

Conservation of mass across all enzyme fractions at pseudo-steady-state yields the

following linear equalities:

$$
\frac{de}{dt} = v_2 + v_5 + v_8 + v_{11} - v_1 - v_6 - v_7 - v_{12} = 0 \qquad (9)
$$

$$
\frac{de_a}{dt} = v_1 + v_4 + v_{10} - v_2 - v_3 - v_9 = 0 \qquad (10)
$$

$$
\frac{de_b}{dt} = v_3 + v_6 - v_4 - v_5 = 0 \qquad (11)
$$

$$
\frac{de_c}{dt} = v_7 - v_8 = 0 \qquad (12)
$$

$$
\frac{de_{ac}}{dt} = v_9 - v_{10} = 0 \qquad (13)
$$

$$
\frac{de^*}{dt} = v_{12} - v_{11} = 0 \qquad (14)
$$

Upon substituting the flux expressions from Equations (8) in Equations (9) - (14), an

[𝑛𝐿 × 𝑛𝐿] square system of linear algebraic equations with the enzyme fractions as the only

variables is obtained assuming that the relative metabolite concentrations (𝑎, 𝑏, 𝑐, and 𝑑)

and kinetic parameters (𝑘𝑝∀𝑝 ∈ 𝑃) are specified.

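$$
\frac{de}{dt} = k_2 e_a + k_5 e_b + k_8 e_c + k_{11} d e^* - (k_1 a + k_6 b + k_7 c + k_{12}) e = 0 \qquad (15)
$$

$$
\frac{de_a}{dt} = k_1 a e + k_4 e_b + k_{10} e_{ac} - (k_2 + k_3 + k_9 c) e_a = 0 \qquad (16)
$$

$$
\frac{de_b}{dt} = k_3 e_a + k_6 b e - (k_4 + k_5) e_b = 0 \qquad (17)
$$

$$
\frac{de_c}{dt} = k_7 c e - k_8 e_c = 0 \qquad (18)
$$

$$
\frac{de_{ac}}{dt} = k_9 c e_a - k_{10} e_{ac} = 0 \qquad (19)
$$

$$
\frac{de^*}{dt} = k_{12} e - k_{11} d e^* = 0 \qquad (20)
$$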

Note that Equation (15) can be reconstituted as a linear combination of Equations (16) - (20) because the free enzyme must be regenerated at the end of the catalytic cycle to maintain steady-state. This results in a rank deficiency in this system of equations that can be rectified by appending Equation (21), which ensures that the total enzyme concentration is maintained constant at metabolic steady-state. Equation (21) is obtained by substituting Equations (6) in Equation (5):

$$
e + e_a + e_b + e_c + e_{ac} + e^* = 1 \qquad (21)
$$

Equation (21) replaces Equation (15), resulting in a full-rank $[n_L \times n_L]$ system of equations for computing enzyme fractions given kinetic parameters and relative metabolite concentrations. This means that given WT-normalized concentrations $a$, $b$, $c$, and $d$, and kinetic parameters $k_p \; \forall p \in P$, solving the system of linear equations yields a unique assignment for the enzyme fractions $e$, $e_a$, $e_b$, $e_c$, $e_{ac}$, and $e^*$. Fluxes through the elementary reactions are computed by substituting the newly computed enzyme fractions in Equations (8). Using the mapping of elementary flux indices to elementary step indices described in Equations (1) and (2), the net flux through any elementary step $l \in \{1,2,3,4,5,6\}$ can be recovered as follows:

$$
v_l^{(net)} = v_{2l-1} - v_{2l} \qquad (22)
$$

The net flux through all the catalytic steps ($l \in \{1,2,3\}$) is equal to the net flux through the overall reaction $V$.

$$
v_l^{(net)} = V, \qquad l \in \{1,2,3\} \qquad (23)
$$

From the steady-state conditions on the "dead-end" complexes formed via substrate-level regulation (see Equations (12), (13), and (14)), it can be derived that the net flux through the regulatory elementary steps is always equal to zero.

$$
v_l^{(net)} = 0, \qquad l \in \{4,5,6\} \qquad (24)
$$
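The calculation described above can be sketched for the six-step example as follows. This is a hedged illustration, not the K-FIT implementation: the values of the aggregated parameters `k` and of the relative concentrations are made up, the function names are invented, and the last row of the linear system is the total-enzyme conservation obtained from Equations (5) and (6), which includes the inactive form $e^*$.

```python
import numpy as np

def enzyme_fractions(k, a, b, c, d):
    """Solve Eqs. (16)-(20) plus enzyme conservation for [e, ea, eb, ec, eac, e*]."""
    k1, k2, k3, k4, k5, k6, k7, k8, k9, k10, k11, k12 = k
    A = np.array([
        [k1 * a, -(k2 + k3 + k9 * c), k4,         0.0, k10,  0.0],       # Eq. (16)
        [k6 * b, k3,                  -(k4 + k5), 0.0, 0.0,  0.0],       # Eq. (17)
        [k7 * c, 0.0,                 0.0,        -k8, 0.0,  0.0],       # Eq. (18)
        [0.0,    k9 * c,              0.0,        0.0, -k10, 0.0],       # Eq. (19)
        [k12,    0.0,                 0.0,        0.0, 0.0,  -k11 * d],  # Eq. (20)
        [1.0,    1.0,                 1.0,        1.0, 1.0,  1.0],       # conservation
    ])
    rhs = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0])
    return np.linalg.solve(A, rhs)

def elementary_fluxes(k, fractions, a, b, c, d):
    """Evaluate the twelve elementary fluxes of Equations (8)."""
    e, ea, eb, ec, eac, e_star = fractions
    k1, k2, k3, k4, k5, k6, k7, k8, k9, k10, k11, k12 = k
    return np.array([k1 * a * e, k2 * ea, k3 * ea, k4 * eb, k5 * eb, k6 * b * e,
                     k7 * c * e, k8 * ec, k9 * c * ea, k10 * eac,
                     k11 * d * e_star, k12 * e])

# Hypothetical aggregated parameters k1..k12 and WT-like concentrations.
k = [10.0, 2.0, 5.0, 1.0, 8.0, 0.5, 3.0, 4.0, 2.0, 6.0, 7.0, 1.0]
a, b, c, d = 1.0, 1.0, 1.0, 1.0
fractions = enzyme_fractions(k, a, b, c, d)
v = elementary_fluxes(k, fractions, a, b, c, d)
v_net = v[0::2] - v[1::2]      # Eq. (22): forward minus reverse for each step
```

The three catalytic entries of `v_net` coincide and give the overall flux $V$, while the three regulatory entries vanish, as stated in Equations (23) and (24).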

The automated calculation of the net flux through a reaction given elementary kinetic

parameters and relative metabolite concentrations is facilitated by deriving generalized

expressions for Equations (8) - (14) using the following quantities:

𝒗 is the [𝑛𝑃 × 1] vector of elementary fluxes whose elements 𝑣𝑝 denote the flux through

elementary reaction 𝑝 ∈ 𝑃

𝒆 is the [𝑛𝐿 × 1] vector of enzyme fractions whose elements 𝑒𝑙 represent the fractional

abundance of enzyme complex 𝑙 ∈ 𝐿

𝐼 = {𝑖|𝑖 = 1,2, … , 𝑛𝑀} is the set of all metabolites. In the above example 𝑛𝑀 = 4.

𝒔 is the [𝑛𝑀 × 1] vector of relative metabolite concentrations whose elements 𝑠𝑖 represent the fold-change in concentration of metabolite 𝑖 ∈ 𝐼 relative to WT.

𝑬 is the enzyme complex stoichiometry matrix of dimensions [𝑛𝐿 × 𝑛𝑃] whose elements

𝐸𝑙𝑝 represent the stoichiometric coefficient of enzyme complex 𝑙 ∈ 𝐿 in elementary

reaction 𝑝 ∈ 𝑃

𝑬 is defined as follows for the above example:


Note that all the elements in $\mathbf{E}$ can assume only a value of -1, 0, or 1 and that $\mathbf{E}$ has exactly one negative and one positive entry per column. This is because, by definition, an elementary reaction operates on a single enzyme form (either metabolite-bound or free), which is converted into another form but never destroyed. In contrast, the same enzyme form can participate in multiple elementary reactions, and there exists at least one elementary reaction that consumes it (entry of -1) and at least one that produces it (entry of 1).

𝑺 is the metabolite stoichiometry matrix of dimensions [𝑛𝑀 × 𝑛𝑃] whose elements 𝑆𝑖𝑝

represent the stoichiometric coefficient of metabolite 𝑖 ∈ 𝐼 in elementary reaction 𝑝 ∈ 𝑃

𝑺 is defined as follows for the above example:


As was the case for matrix 𝑬, all elements of 𝑺 are equal to either -1, 0 or 1. In addition, 𝑺

has at most one non-zero entry per column. This is because an elementary reaction can

represent only a single binding, release, or catalysis event (Saa and Nielsen, 2017).

Catalytic elementary reactions do not involve metabolites, whereas binding and release

events either consume or produce a metabolite, respectively.

The flux $v_p$ through elementary reaction $p$ is related to the concentrations of metabolites and enzyme complexes using mass-action kinetics (Khodayari and Maranas, 2016):

$$
v_p = k_p \left(\prod_{l \,|\, E_{lp} < 0} e_l\right) \left(\prod_{i \,|\, S_{ip} \le 0} s_i^{-S_{ip}}\right) \qquad \forall p \in P \qquad (25)
$$

In Equation (25), the product operator in $\left(\prod_{l \,|\, E_{lp} < 0} e_l\right)$ serves to identify the only reactant enzyme complex participating in elementary reaction $p$. Recall that matrix $\mathbf{E}$ has a single element equal to -1 per column. Likewise, the product operator in $\left(\prod_{i \,|\, S_{ip} \le 0} s_i^{-S_{ip}}\right)$ serves to identify the only reactant metabolite (if any) in elementary reaction $p$. Recall that matrix $\mathbf{S}$ has at most one non-zero element per column, equal to -1 or 1. Therefore, elementary reactions representing catalysis or product release do not involve a metabolite on the reactant side, thus yielding a zero exponent. Elementary reactions modeling binding of a metabolite with an enzyme complex always involve a single reacting metabolite, which yields an exponent of 1 (the negative of the stoichiometric coefficient of -1). This implies that in Equation (25) the exponent on the metabolite concentration is always equal to either 0 or 1. Equation (25) thus captures either a linear relation between $v_p$ and $e_l$ when there is no participating metabolite or a bilinear relation when a metabolite is a co-reactant in the elementary reaction. The kinetic parameter $k_p$ for elementary reaction $p \in P$ is a lumped parameter expressed as the product of the kinetic rate constant $\hat{k}_p$, the total enzyme concentration $[E_0]$, and the metabolite concentration in the WT as described by Equation (7).

Conservation of mass across the $l$th enzyme complex is mathematically represented as:

$$
\frac{de_l}{dt} = \sum_{p=1}^{n_P} E_{lp} v_p \qquad \forall l \in L \qquad (26)
$$

At pseudo-steady-state Equation (26) simplifies to:

$$
\sum_{p=1}^{n_P} E_{lp} v_p = 0 \qquad \forall l \in L \qquad (27)
$$

The net flux through the $l$th elementary step ($v_l^{(net)}$) is computed as the difference between the fluxes through the corresponding forward and reverse elementary reactions as described by Equation (22). The net flux through all catalytic elementary steps is equal to the net overall flux through the reaction. As a convention, we use the net flux through the last catalytic elementary step as the indicator of the net flux ($V$) through the overall reaction. The index of this step is stored in the set $L^{(net)}$, which is $L^{(net)} = \{3\}$ for the above example. The mapping of the last catalytic step to the net flux through the overall reaction is accomplished using a $[1 \times n_L]$ indicator vector $\mathbf{N}$ whose elements are as follows:

$$
N_l = \begin{cases} 1, & \text{if } l \in L^{(net)} \\ 0, & \text{otherwise} \end{cases} \qquad (28)
$$

In reference to the above example, $\mathbf{N}$ is a $[1 \times 6]$ vector defined as $\mathbf{N} = [0 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \;\; 0]$. The net flux ($V$) through the overall reaction is recovered by the summation operator in Equation (29). Only a single term in the sum is non-zero.

$$
V = \sum_{l=1}^{n_L} N_l \, v_l^{(net)} \qquad (29)
$$
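The generalized expressions in Equations (25), (22), and (29) can be sketched with explicit $\mathbf{E}$ and $\mathbf{S}$ matrices for the six-step example. Everything below is an assumption made for illustration: the matrices are rebuilt from the elementary steps listed in Table C.1 with complexes ordered as (E, EA, EB, EC, EAC, E*), and the parameter values, enzyme fractions, and relative concentrations are arbitrary rather than steady-state values.

```python
import numpy as np

complexes = ["E", "EA", "EB", "EC", "EAC", "E*"]
metabolites = ["A", "B", "C", "D"]
# One tuple per elementary reaction p = 1..12:
# (consumed complex, produced complex, consumed metabolite, produced metabolite)
elementary_reactions = [
    ("E", "EA", "A", None), ("EA", "E", None, "A"),      # step 1
    ("EA", "EB", None, None), ("EB", "EA", None, None),  # step 2
    ("EB", "E", None, "B"), ("E", "EB", "B", None),      # step 3
    ("E", "EC", "C", None), ("EC", "E", None, "C"),      # step 4
    ("EA", "EAC", "C", None), ("EAC", "EA", None, "C"),  # step 5
    ("E*", "E", "D", None), ("E", "E*", None, "D"),      # step 6
]

def build_matrices():
    """Assemble E (n_L x n_P) and S (n_M x n_P) from the reaction list."""
    E = np.zeros((len(complexes), len(elementary_reactions)))
    S = np.zeros((len(metabolites), len(elementary_reactions)))
    for p, (e_in, e_out, m_in, m_out) in enumerate(elementary_reactions):
        E[complexes.index(e_in), p] = -1.0
        E[complexes.index(e_out), p] = 1.0
        if m_in is not None:
            S[metabolites.index(m_in), p] = -1.0
        if m_out is not None:
            S[metabolites.index(m_out), p] = 1.0
    return E, S

def elementary_fluxes(k, e, s, E, S):
    """Eq. (25). Zero entries of S contribute a factor of 1, so only S < 0 is visited."""
    v = np.array(k, dtype=float)
    for p in range(E.shape[1]):
        for l in np.where(E[:, p] < 0)[0]:
            v[p] *= e[l]
        for i in np.where(S[:, p] < 0)[0]:
            v[p] *= s[i] ** (-S[i, p])
    return v

E, S = build_matrices()
k = np.arange(1.0, 13.0)                             # hypothetical k1..k12
e = np.array([0.30, 0.20, 0.10, 0.15, 0.05, 0.20])   # hypothetical enzyme fractions
s = np.array([2.0, 0.5, 1.5, 1.0])                   # hypothetical a, b, c, d
v = elementary_fluxes(k, e, s, E, S)
v_net = v[0::2] - v[1::2]                            # Eq. (22)
N_ind = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])     # Eq. (28), L(net) = {3}
V = float(N_ind @ v_net)                             # Eq. (29)
```

Because the enzyme fractions here are arbitrary, `E @ v` is not zero; at a true pseudo-steady state it would satisfy Equation (27).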

Even though the above treatment refers to a reversible uni-molecular reaction with non-competitive inhibition, the same concepts can be generalized to any ordered or ping-pong mechanism of enzyme catalysis involving $n_{subs}$ substrates, $n_{pdt}$ products, activators, competitive inhibitors, and uncompetitive inhibitors. Examples of elementary step decomposition for various reaction mechanisms are shown in Table C.2. The above definitions and concepts form the foundation for the K-FIT procedure for estimating kinetic parameters given flux distributions.

Page 191: NONLINEAR OPTIMIZATION TECHNIQUES FOR GENOME …

180

Table C.2. Elementary step decomposition for various reactions

| Reaction | Reaction Mechanism | Elementary Step Decomposition | Example from Central Metabolism |
|---|---|---|---|
| A ⇌ B | Uni-Uni | E + A ⇌ EA; EA ⇌ EB; EB ⇌ E + B | Phosphoglucose isomerase |
| A ⇌ B + C | Uni-Bi | E + A ⇌ EA; EA ⇌ EBC; EBC ⇌ EC + B; EC ⇌ E + C | Fructose bisphosphate aldolase |
| A + B ⇌ C + D | Ordered Bi-Bi | E + A ⇌ EA; EA + B ⇌ EAB; EAB ⇌ ECD; ECD ⇌ ED + C; ED ⇌ E + D | Phosphoglycerate kinase |
| A + B ⇌ C + D | Bi-substrate Ping-Pong | E + A ⇌ EA; EA ⇌ EC; EC ⇌ E* + C; E* + B ⇌ E*B; E*B ⇌ ED; ED ⇌ E + D | Transketolase |
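As a small illustration of how each reversible elementary step in Table C.2 gives rise to a forward and a reverse elementary reaction, the sketch below uses the numbering $p = 2l-1$ (forward) and $p = 2l$ (reverse) that appears later in Equation (31). The function name and the `<=>` text notation are hypothetical choices made only for this example.

```python
def elementary_reactions(steps):
    """Split each reversible elementary step l into reactions p = 2l-1 and p = 2l."""
    reactions = []
    for l, step in enumerate(steps, start=1):
        lhs, rhs = (side.strip() for side in step.split("<=>"))
        reactions.append((2 * l - 1, lhs, rhs))   # forward direction
        reactions.append((2 * l, rhs, lhs))       # reverse direction
    return reactions


# Uni-Uni mechanism from the first row of Table C.2.
uni_uni = ["E + A <=> EA", "EA <=> EB", "EB <=> E + B"]
for p, substrates, products in elementary_reactions(uni_uni):
    print(p, substrates, "->", products)
```

Three elementary steps therefore produce six elementary reactions, which is the reason the kinetic parameter vector has twice as many entries as there are elementary steps.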


C.2. Nonlinear least-squares regression-based procedure for kinetic

parameterization

The K-FIT kinetic parameterization procedure is designed to make use of steady-state flux

measurements for multiple genetic perturbations to parameterize a single kinetic model of

metabolism. Kinetic parameter values are estimated by solving a least-squares problem
that minimizes the deviation between predicted and experimentally measured steady-state
flux distributions across all perturbed networks. The formal description of this least-squares
optimization problem requires the definition of the following sets, parameters,
variables and constraints:

Sets

Set of metabolites $I = \{i \mid i = 1, 2, \ldots, n_M\}$

Set of reactions $J = \{j \mid j = 1, 2, \ldots, n_R\}$

Set of elementary steps $L = \{l \mid l = 1, 2, \ldots, n_L\}$

$L_j^{cat} \subseteq L$ is the subset of all catalytic elementary steps for reaction $j$

$L_j^{reg} \subset L$ is the subset of all regulatory elementary steps for reaction $j$

Set of elementary reactions $P = \{p \mid p = 1, 2, \ldots, n_P\}$

Set of perturbation mutants $C = \{c \mid c = 1, 2, \ldots, n_C\}$ with $c = 1$ denoting the wild-type (WT, or reference) network

$J_c^{meas} \subseteq J$ is the subset of all reactions with available flux measurements under perturbation mutant $c \in C$. The cardinality of $J_c^{meas}$ is $n_c^{meas}$.

Parameters

$\boldsymbol{S}$ is the metabolite stoichiometry matrix of dimensions $[n_M \times n_P]$ whose elements $S_{ip}$ represent the stoichiometric coefficient of metabolite $i \in I$ in elementary reaction $p \in P$

$\boldsymbol{E}$ is the enzyme complex stoichiometry matrix of dimensions $[n_L \times n_P]$ whose elements $E_{lp}$ represent the stoichiometric coefficient of enzyme complex (or free enzyme) $l \in L$ in elementary reaction $p \in P$

$\boldsymbol{V}_c^{(meas)}$ is the $[n_c^{meas} \times 1]$ vector of flux measurements in mutant $c \in C$ whose elements $V_{j,c}^{(meas)}$ represent the measured flux through reaction $j \in J_c^{meas}$ with standard deviation $\sigma_{j,c}^{(meas)}$

$\boldsymbol{L}^{(net)}$ is the $[n_R \times 1]$ net flux mapping vector whose elements ($L_j^{(net)}$) store the index of the last catalytic elementary step $l \in L$ that quantifies the net flux through the overall reaction $j \in J$.

Variables

$\boldsymbol{k}$ is the $[n_P \times 1]$ vector of kinetic parameters whose elements $k_p$ denote the kinetic parameter for elementary reaction $p \in P$

$\boldsymbol{s}$ is the $[n_M \times n_C]$ matrix of relative metabolite concentrations whose elements $s_{ic}$ represent the fold-change in concentration of metabolite $i \in I$ in mutant $c \in C$ relative to WT. The $c^{th}$ column representing the $[n_M \times 1]$ vector of relative metabolite concentrations in mutant $c \in C$ is denoted as $\boldsymbol{s}_c$.

$\boldsymbol{e}$ is the $[n_L \times n_C]$ matrix of enzyme fractions whose elements $e_{lc}$ represent the fractional abundance of enzyme complex $l \in L$ in mutant $c \in C$. The $c^{th}$ column representing the $[n_L \times 1]$ vector of enzyme fractions for mutant $c \in C$ is denoted as $\boldsymbol{e}_c$. The number of enzyme complexes is equal to the number of elementary steps as discussed earlier.

$\boldsymbol{v}$ is the $[n_P \times n_C]$ matrix of elementary fluxes whose elements $v_{p,c}$ denote the flux through elementary reaction $p \in P$ in mutant $c \in C$. The $c^{th}$ column representing the $[n_P \times 1]$ vector of elementary fluxes in mutant $c \in C$ is denoted as $\boldsymbol{v}_c$.

$\boldsymbol{v}^{net}$ is the $[n_L \times n_C]$ matrix of net elementary fluxes whose elements $v_{l,c}^{net}$ represent the net flux through elementary step $l \in L$ in mutant $c \in C$. The $c^{th}$ column representing the $[n_L \times 1]$ vector of net elementary fluxes in mutant $c \in C$ is denoted as $\boldsymbol{v}_c^{net}$.

$\boldsymbol{V}$ is the $[n_R \times n_C]$ matrix of reaction fluxes whose elements $V_{j,c}$ denote the flux through reaction $j \in J$ in mutant $c \in C$. The $c^{th}$ column representing the $[n_R \times 1]$ vector of reaction fluxes in mutant $c \in C$ is denoted as $\boldsymbol{V}_c$.

In addition to these variable declarations the following three matrices are defined:

$\boldsymbol{R}$ is an $[n_R \times n_L]$ grouping matrix that indicates which enzyme complexes $l \in L$ participate in reaction $j \in J$. It is defined as:

$$R_{jl} = \begin{cases} 1, & \text{if } l \in \{L_j^{cat} \cup L_j^{reg}\} \\ 0, & \text{otherwise} \end{cases}$$

$\boldsymbol{N}$ is an $[n_R \times n_L]$ indicator matrix that is used to map net flux through elementary steps $l$ to flux through the overall reaction $j \in J$. Based on the convention established in the Introduction section (Equation (28)), the last catalytic step serves as a measure of flux through the overall reaction. It is defined as:

$$N_{jl} = \begin{cases} 1, & \text{if } l = L_j^{(net)} \\ 0, & \text{otherwise} \end{cases}$$

$\boldsymbol{Z}$ is an $[n_R \times n_C]$ indicator matrix that maps the abundance of the enzyme catalyzing reaction $j$ in mutant $c \in C$ relative to its abundance in the WT strain. It is defined as:

$$Z_{j,c} = \begin{cases} 0, & \text{if reaction } j \in J \text{ is eliminated under condition } c \in C \\ 1, & \text{otherwise} \end{cases}$$

The definition of matrix 𝒁 implies that the mutant networks are derived by eliminating one

or more reactions from the metabolic network of the reference strain. This definition can

be generalized to incorporate other genetic perturbations such as over-expression and

down-regulation of gene expression. In the absence of proteomic data in mutant strains, we

assume that the enzymes maintain levels as in the WT.
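The three matrices can be assembled mechanically once the elementary steps of each reaction are known. The following is a minimal sketch under simplifying assumptions that are not part of the text: elementary steps are numbered contiguously per reaction, every step is catalytic (no regulatory steps), and the last step of each block is the last catalytic step. The function name `build_R_N_Z` and its arguments are hypothetical.

```python
import numpy as np


def build_R_N_Z(steps_per_reaction, knockouts):
    """Assemble R, N and Z for a toy network.

    steps_per_reaction[j] : number of elementary steps of reaction j
    knockouts[c]          : set of reaction indices eliminated in condition c
                            (empty set for the WT, c = 0 in 0-based indexing)
    """
    n_R = len(steps_per_reaction)
    n_L = sum(steps_per_reaction)
    n_C = len(knockouts)
    R = np.zeros((n_R, n_L))
    N = np.zeros((n_R, n_L))
    Z = np.ones((n_R, n_C))
    start = 0
    for j, n_steps in enumerate(steps_per_reaction):
        R[j, start:start + n_steps] = 1.0     # complexes belonging to reaction j
        N[j, start + n_steps - 1] = 1.0       # last catalytic step of reaction j
        start += n_steps
    for c, removed in enumerate(knockouts):
        for j in removed:
            Z[j, c] = 0.0                     # reaction j eliminated in condition c
    return R, N, Z


# Two Uni-Uni reactions (three steps each); WT plus one mutant lacking reaction 2.
R, N, Z = build_R_N_Z([3, 3], [set(), {1}])
print(R, N, Z, sep="\n")
```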


Least-squares minimization problem P1

Using the definitions introduced above the least-squares minimization problem for kinetic

parameterization is formulated for the general case as the following nonlinear optimization

problem:

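$$\min_{\boldsymbol{k},\boldsymbol{e},\boldsymbol{s},\boldsymbol{v},\boldsymbol{V}} \; \phi = \sum_{c=1}^{n_C} \sum_{j \in J_c^{meas}} \left( \frac{V_{jc} - V_{jc}^{(meas)}}{\sigma_{jc}} \right)^2$$

subject to:

$$v_{p,c} = k_p \left( \prod_{l \mid E_{lp}<0} e_{l,c} \right) \left( \prod_{i \mid S_{ip}\le 0} s_{i,c}^{-S_{ip}} \right) \quad \forall p \in P,\ \forall c \in C \tag{30}$$

$$v_{l,c}^{(net)} = v_{(2l-1),c} - v_{2l,c} \quad \forall l \in L,\ \forall c \in C \tag{31}$$

$$\sum_{p=1}^{n_P} E_{lp}\, v_{pc} = 0 \quad \forall l \in L,\ \forall c \in C \tag{32}$$

$$\sum_{l=1}^{n_L} R_{jl}\, e_{l,c} = Z_{jc} \quad \forall j \in J,\ \forall c \in C \tag{33}$$

$$\sum_{p=1}^{n_P} S_{ip}\, v_{pc} = 0 \quad \forall i \in I,\ \forall c \in C \tag{34}$$

$$V_{j,c} = \sum_{l=1}^{n_L} N_{jl}\, v_{l,c}^{(net)} \quad \forall j \in J,\ \forall c \in C \tag{35}$$

$$s_{i,c} \ge 0 \quad \forall i \in I,\ \forall c \in C \tag{36}$$

$$s_{i,1} = 1 \quad \forall i \in I \tag{37}$$

$$0 \le e_{l,c} \le 1 \quad \forall l \in L,\ \forall c \in C \tag{38}$$

$$k_p \ge 0 \quad \forall p \in P \tag{39}$$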

Equation (30) in the above formulation represents the rate law for any elementary reaction
governed by mass-action kinetics. It is a generalized form of Equation (25) accounting for
reaction rates across all mutants $c \in C$. As discussed before, the role of the product
operators is to select the single enzyme complex and (possibly) metabolite participating in
the elementary reaction rate equation. Therefore, the right-hand side of Equation (30)
contains either a bilinear term (the product of an enzyme fraction and a relative metabolite
concentration) or a linear term (an enzyme fraction alone). Equations (32) and (34) enforce
conservation of mass across all enzyme complexes and metabolites, respectively. Equation (32)
is an extension of Equation (27) to include enzyme complex balances across all mutants.
Equation (33) ensures that the total amount of the enzyme catalyzing reaction $j$, summed
over all of its forms, remains constant. It is a generalization of Equation (21) that accounts
for enzyme presence or absence in different mutants. Thus, the presence or absence of
reaction $j$ in mutant $c$ is captured by Equation (33). Equation (31) computes the net flux
through any elementary step based on the mapping of elementary reactions to elementary
steps established with Equations (1), (2) and (22). Equation (35) links the net flux through
the last catalytic elementary step of a reaction to the overall flux through reaction $j$.
Equation (38) ensures that the enzyme fractional abundances are bounded between zero
and one. Equations (36) and (39) enforce non-negativity of relative metabolite
concentrations and kinetic parameters, respectively. Since all metabolite concentrations are
normalized with respect to the corresponding concentrations in the WT strain as described
in Equation (4), Equation (37) sets all relative concentrations for the WT strain ($c = 1$)
equal to one.
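To make the mass-action rate law of Equation (30) and the net-flux mapping of Equation (31) concrete, the sketch below evaluates them for a single Uni-Uni reaction A ⇌ B. The matrices, parameter values, and function names are illustrative assumptions; the code also assumes exactly one enzyme complex is consumed per elementary reaction, as stated in the text.

```python
import numpy as np


def elementary_fluxes(k, e, s, S, E):
    """Equation (30) for one network: v_p = k_p * e_l * prod_i s_i^(-S_ip)."""
    consumed = np.argmax(E < 0, axis=0)                # complex l with E_lp < 0
    e_term = e[consumed]
    # Exponent -S_ip for consumed metabolites, 0 (factor of one) otherwise.
    s_term = np.prod(s[:, None] ** np.clip(-S, 0, None), axis=0)
    return k * e_term * s_term


def net_step_fluxes(v):
    """Equation (31): forward minus reverse flux for each elementary step."""
    return v[0::2] - v[1::2]


# Uni-Uni reaction: E + A <=> EA, EA <=> EB, EB <=> E + B.
# Columns are elementary reactions p = 1..6; rows of S are (A, B), rows of E are (E, EA, EB).
S = np.array([[-1, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1, -1]])
E = np.array([[-1, 1, 0, 0, 1, -1],
              [1, -1, -1, 1, 0, 0],
              [0, 0, 1, -1, -1, 1]])
k = np.array([4.0, 1 / 0.3, 2 / 0.3, 5.0, 10.0, 2.0])   # illustrative values
e = np.array([0.5, 0.3, 0.2])                            # enzyme fractions, sum to one
s = np.array([1.0, 1.0])                                 # WT: relative concentrations are one

v = elementary_fluxes(k, e, s, S, E)
v_net = net_step_fluxes(v)
print(v, v_net)
```

With these values every elementary step carries the same net flux and the enzyme complex balance of Equation (32) is satisfied, i.e. `E @ v` vanishes.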

Equation (30), which involves (at most) bilinear terms, is the only set of nonlinear constraints in
NLP problem P1. This constraint renders the optimization formulation nonconvex, making
even the identification of a feasible point challenging, let alone convergence to the optimum
value. Therefore, any attempt to solve problem P1 using an off-the-shelf NLP solver such as
MINOS (Murtagh and Saunders, 1978), CONOPT (Drud, 1985), or fmincon from the
Optimization Toolbox in MATLAB™ is unlikely to succeed due to difficulties in
maintaining feasibility and progressively reduced step-length in the line-search.
Conceptually, this can be remedied by integrating Equations (32) and (34) to steady-state
after substituting the expression for elementary flux from Equation (30). However, this
tends to be rather time consuming (i.e., order of minutes) due to the stiffness of the
differential equations and the loss of accuracy arising from taking large time steps.


Furthermore, the inability to integrate Equations (32) and (34) to steady-state for some sets

of kinetic parameters results in the premature termination of any gradient-based

optimization algorithm. Therefore, past efforts in kinetic parameterization have relied on

meta-heuristic optimization algorithms such as Genetic Algorithm (Khodayari et al., 2014)

and particle swarm optimization (Millard et al., 2017). The lack of gradient information in

this class of methods limits efficient traversal of the kinetic space in search of an acceptable

solution which may or may not be optimal or even near-optimal for the least squares

objective function. This computational inefficiency in performing kinetic model
parameterization prevents any follow-up calculations to assess uncertainties in kinetic
parameters due to experimental errors or internal kinetic parameter dependencies. It is
also one of the contributing factors that have so far held back the parameterization of
large-scale kinetic models and their wide application in strain design.

Faced with these challenges, we put forth a customized procedure that can reliably identify
optimal or near optimal kinetic model parameterizations while achieving orders of magnitude
improvement in computational time over stochastic approaches. The following subsections

will describe strategies to transform problem P1 into a successive sequence of easier-to-

solve subproblems. These strategies form the basis of the kinetic parameterization

algorithm, K-FIT. K-FIT allows for the efficient solution of NLP problem P1 using three

main tasks/procedures:

I. Procedure K-SOLVE anchors kinetic parameters $k_p$ to the specified steady-state
flux distribution in the WT network $\boldsymbol{V}_1$ such that conservation of
mass across metabolites (Equation (34)), the pseudo-steady-state condition across all
enzyme complexes (Equation (32)), and normalization of metabolite concentrations
(Equation (37)) are simultaneously satisfied for the WT network. This is
accomplished by rearranging Equation (30) to express $\boldsymbol{k}$ as a function of the WT
enzyme fractions $\boldsymbol{e}_1$ and the flux through the reverse elementary reactions $\boldsymbol{v}_r \subset \boldsymbol{v}_1$
while maintaining the relative metabolite concentrations $s_{i,1} = 1\ \forall i \in I$ (i.e.,
$\boldsymbol{k} = f(\boldsymbol{v}_r, \boldsymbol{e}_1)$).

II. Procedure SSF-Evaluator computes the steady-state fluxes $\boldsymbol{V}_c$ and relative
metabolite concentrations $\boldsymbol{s}_c$ across all mutants ($c > 1$) using the kinetic
parameters $\boldsymbol{k}$ computed in procedure K-SOLVE. Procedure SSF-Evaluator
decomposes the system of bilinear equations in $\boldsymbol{s}_c$ and $\boldsymbol{e}_c$ defined by Equations
(30), (32), (33), and (34) into two blocks of equations representing conservation of
mass across enzyme complexes and metabolites, respectively. The bilinear
equations become linear when either $\boldsymbol{s}_c$ or $\boldsymbol{e}_c$ is specified. When $\boldsymbol{s}_c$ is
specified, Equations (32) and (33) form an exactly determined $[n_L \times n_L]$ system of
linear algebraic equations in $\boldsymbol{e}_c$. Similarly, Equation (34) represents an exactly
determined $[n_M \times n_M]$ system of linear algebraic equations in $\boldsymbol{s}_c$ when $\boldsymbol{e}_c$ is
specified. SSF-Evaluator iterates between these two blocks, initially using a fixed-point
iteration (FPI) scheme (and switching to Newton’s method or Richardson extrapolation if
needed), until a steady-state is found. This strategy allows for the direct evaluation of both
fluxes and concentrations across all mutants that automatically satisfy all the nonlinear
equality constraints from problem P1 and leaves only linear (in)equalities in the
constraint set.


III. Procedure K-UPDATE computes the sensitivity of net flux through all
reactions $\boldsymbol{V}$ to WT enzyme fractions $\boldsymbol{e}_1$ and reverse elementary fluxes $\boldsymbol{v}_r$, which is
then used to compute the approximate gradient $\boldsymbol{G}$ and the approximate Hessian $\boldsymbol{H}$ for the
objective function $\phi$. $\boldsymbol{G}$ and $\boldsymbol{H}$ are then used to check for optimality and update $\boldsymbol{e}_1$
and $\boldsymbol{v}_r$ using a Newton step if optimality is not achieved. The updated values for
$\boldsymbol{e}_1$ and $\boldsymbol{v}_r$ are then fed to the K-SOLVE procedure, which evaluates updated kinetic
parameters $\boldsymbol{k}$, and the calculation sequence described above is repeated.
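The calculation sequence formed by the three procedures can be summarized as a simple loop. The sketch below is purely structural: the three callables are hypothetical placeholders for K-SOLVE, SSF-Evaluator and K-UPDATE, and the toy stand-ins at the bottom are not a metabolic model. They only exercise the control flow.

```python
def k_fit_loop(k_solve, ssf_evaluate, k_update, x0, max_iter=100):
    """Repeat K-SOLVE -> SSF-Evaluator -> K-UPDATE until K-UPDATE reports optimality.

    x bundles the independent variables (WT enzyme fractions and reverse fluxes).
    """
    x = x0
    for _ in range(max_iter):
        k = k_solve(x)                        # I.   anchor k to the WT flux distribution
        V = ssf_evaluate(k)                   # II.  steady-state fluxes in the mutants
        x_new, is_optimal = k_update(x, V)    # III. optimality check and Newton step
        if is_optimal:
            return k, x
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")


# Toy stand-ins: find the scalar x for which the "predicted flux" 2*k matches 6.
k, x = k_fit_loop(
    k_solve=lambda x: x,
    ssf_evaluate=lambda k: 2.0 * k,
    k_update=lambda x, V: (x - (V - 6.0) / 2.0, abs(V - 6.0) < 1e-9),
    x0=1.0,
)
print(k, x)
```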

The mathematical details and implementation of all the component subroutines of K-FIT

are described in the following subsections.

C.3. K-SOLVE: Anchoring kinetic parameters to the WT flux distribution

K-SOLVE computes a set of kinetic parameters 𝒌 that satisfy Equations (30) - (39) for the

WT network (𝑐 = 1) when the WT flux distribution 𝑽1, enzyme fractions 𝒆1 and non-

negative elementary fluxes 𝒗1 are specified. This anchoring is required because

conservation of mass across all enzyme fractions, mass balance across metabolites, and

normalization of metabolite concentrations may not be simultaneously satisfied. To

demonstrate this, we recast Equations (32), (33), and (34) after substituting the expression

for elementary fluxes in terms of mass-action kinetics described in Equation (30) and

setting 𝑠𝑖,1 = 1∀𝑖 ∈ 𝐼 based on Equation (37).

$$\sum_{l=1}^{n_L} R_{jl}\, e_{l,1} = 1 \quad \forall j \in J \tag{40}$$

$$\sum_{p=1}^{n_P} E_{lp}\, k_p \left( \prod_{l' \mid E_{l'p}<0} e_{l',1} \right) = 0 \quad \forall l \in L \tag{41}$$

$$\sum_{p=1}^{n_P} S_{ip}\, k_p \left( \prod_{l \mid E_{lp}<0} e_{l,1} \right) = 0 \quad \forall i \in I \tag{42}$$

Equations (40), (41), and (42) form an overdetermined system of $(n_L + n_M)$ linear
algebraic equations in $n_L$ unknown enzyme fractions $\boldsymbol{e}_1$ when kinetic parameters $\boldsymbol{k}$ are
specified. This system of equations will likely be infeasible for arbitrary values of $\boldsymbol{k}$,
indicating that not all possible values of the kinetic parameters $\boldsymbol{k}$ simultaneously satisfy
conservation of mass across all metabolites and enzyme complexes. This necessitates the
development of the K-SOLVE procedure, which derives a link between $\boldsymbol{k}$ and $\boldsymbol{e}_1$ so that
conservation of mass is always satisfied. This is achieved by rearranging Equation (30) for
the WT network and exploiting the property that the product term containing relative
metabolite concentrations $\left( \prod_{i \mid S_{ip} \le 0} s_{i,1}^{-S_{ip}} \right)$ will always be equal to one because the
metabolite concentrations are scaled with respect to WT (i.e., $s_{i,1} = 1$):


$$k_p = v_{p,1} \left( \prod_{l \mid E_{lp}<0} e_{l,1} \right)^{-1} \quad \forall p \in P \tag{43}$$

Note that Equation (43) reveals that 𝑘𝑝 can be uniquely determined when both 𝒗1 and 𝒆1

are specified. Of these variables, 𝒆1 is bounded between 0 and 1, and further constrained

by the following relations:

$$\sum_{l=1}^{n_L} R_{jl}\, e_{l,1} = 1 \quad \forall j \in J \tag{40}$$

$$0 \le e_{l,1} \le 1 \quad \forall l \in L \tag{44}$$

Equation (43) relates $2n_L$ kinetic parameters to $n_L$ enzyme fractions and $2n_L$ elementary
fluxes. Enzyme fractions are further constrained by Equation (40) and bounded as shown
in Equation (44), implying that there exist multiple value assignments for the $2n_L$
elementary fluxes that could yield the same $k_p$ values. This implies that the assignment of
values for the elementary fluxes $v_p$ is not unique and that there exist unsatisfied degrees of
freedom as only a subset of $v_p$ are independent variables. The reason for this dependency
is the presence of pairs of forward and reverse fluxes that can assume an infinite number of
possible combinations of values with the same net flux $v_l^{(net)}$ through the elementary step. We
extract an independent subset of $\boldsymbol{v}_1$ by arbitrarily selecting the reverse fluxes as the
independent variables and expressing the forward fluxes as a function of the reverse and net
fluxes. This requires the definition of two separate $[n_L \times 1]$ vectors $\boldsymbol{v}_f$ and $\boldsymbol{v}_r$ denoting
fluxes through forward and reverse elementary reactions, respectively, in the WT strain.


Elements of vectors $\boldsymbol{v}_f$ and $\boldsymbol{v}_r$ are mapped to the $[2n_L \times 1]$ vector of elementary fluxes in
the WT network ($\boldsymbol{v}_1$) using Equations (45) and (46).

$$v_{f,l} = v_{2l-1,1} \tag{45}$$

$$v_{r,l} = v_{2l,1} \tag{46}$$

Because the net flux through an elementary step $v_{l,1}^{(net)}$ is the difference between the
forward and reverse elementary fluxes, we obtain

$$v_{f,l} = v_{l,1}^{(net)} + v_{r,l} \quad \forall l \in L \tag{47}$$

The net flux through all elementary steps of an enzyme-catalyzed reaction in the WT strain
is related to the net flux through the reaction in the WT ($c = 1$) by Equations (48) and (49).

$$v_{l,1}^{(net)} = V_{j,1} \quad \forall l \in L_j^{cat},\ \forall j \in J \tag{48}$$

$$v_{l,1}^{(net)} = 0 \quad \forall l \in L_j^{reg},\ \forall j \in J \tag{49}$$

When $\boldsymbol{V}_1$ is specified, the values of the net fluxes $v_l^{(net)}$ through all elementary steps
(both catalytic and regulatory) can be recovered from Equations (48) and (49). These
values can then be plugged into Equation (47) to calculate $\boldsymbol{v}_f$ for a given assignment of
values of the independent variables $\boldsymbol{v}_r$. Since vector $\boldsymbol{V}_1$ stores the steady-state fluxes in the
WT, equality constraints representing conservation of mass across metabolites in the WT
in problem P1 are inherently satisfied.

Therefore, when $\boldsymbol{v}_1^{(net)}$, $\boldsymbol{v}_r$, and $\boldsymbol{e}_1$ are specified, a unique set of kinetic parameters $\boldsymbol{k}$ can
be obtained by solving the following $[n_P \times n_P]$ system of linear algebraic equations.

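$$\begin{aligned} v_{r,l} + v_l^{(net)} &= k_{(2l-1)} \left( \prod_{l' \mid E_{l'p}<0} e_{l',1} \right), \quad p = 2l-1 \\ v_{r,l} &= k_{(2l)} \left( \prod_{l' \mid E_{l'p}<0} e_{l',1} \right), \quad p = 2l \end{aligned} \qquad \forall l \in L \tag{50}$$

Note that the rate law expressions in Equation (50) are derived by setting the relative
metabolite concentrations in Equation (30) for WT to one. The vector of kinetic parameters
$\boldsymbol{k}$ is recovered from Equation (50) as the following explicit relations:

$$\begin{aligned} k_{(2l-1)} &= \left( v_{r,l} + v_l^{(net)} \right) \left( \prod_{l' \mid E_{l'p}<0} e_{l',1} \right)^{-1}, \quad p = 2l-1 \\ k_{(2l)} &= v_{r,l} \left( \prod_{l' \mid E_{l'p}<0} e_{l',1} \right)^{-1}, \quad p = 2l \end{aligned} \qquad \forall l \in L \tag{51}$$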

Values assumed by 𝒆1 and 𝒗𝑟 are constrained by the following (in)equalities:

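$$\sum_{l=1}^{n_L} R_{jl}\, e_{l,1} = 1 \quad \forall j \in J \tag{40}$$

$$0 \le e_{l,1} \le 1 \quad \forall l \in L \tag{44}$$

$$v_{r,l} \ge 0 \quad \forall l \in L \tag{52}$$

$$v_{r,l} + v_{l,1}^{(net)} \ge 0 \quad \forall l \in L \tag{53}$$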

Since all elementary fluxes and enzyme fractions are non-negative, non-negativity of the
kinetic parameters $\boldsymbol{k}$ computed in Equation (51) is always guaranteed. The steps for
computing this feasible set of kinetic parameters are provided in the following algorithmic
description for K-SOLVE. K-SOLVE accepts WT enzyme fractions $\boldsymbol{e}_1$ and reverse
elementary fluxes $\boldsymbol{v}_r$ as inputs and returns kinetic parameters $\boldsymbol{k}$ as the output.

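Algorithm procedure K-SOLVE

Begin
    Specify and fix the flux distribution in the WT strain $\boldsymbol{V}_1$.
    Specify and fix $\boldsymbol{e}_1$ and $\boldsymbol{v}_r$ satisfying Equations (40), (44), (52), and (53).
    Set $v_{l,1}^{(net)}\ \forall l \in L_j^{cat}$ to $V_{j,1}\ \forall j \in J$
    Set $v_{l,1}^{(net)}\ \forall l \in L_j^{reg}$ to 0
    Compute kinetic parameters $\boldsymbol{k}$ by substituting $\boldsymbol{e}_1$, $\boldsymbol{v}_r$, and $\boldsymbol{v}_1^{(net)}$ in Equation (51)
    return $\boldsymbol{k}$
end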
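The K-SOLVE steps above translate almost line by line into array operations. The following is a minimal sketch, not the dissertation's implementation: the function signature, the index arrays `step_reaction`, `is_catalytic` and `consumed_complex`, and the numerical values are all assumptions introduced for illustration.

```python
import numpy as np


def k_solve(V1, e1, v_r, step_reaction, is_catalytic, consumed_complex):
    """Sketch of K-SOLVE for the WT network (all relative concentrations equal one).

    V1[j]               WT net flux through reaction j
    e1[l]               WT enzyme fractions
    v_r[l]              reverse elementary flux of step l (independent variables)
    step_reaction[l]    index of the reaction that step l belongs to
    is_catalytic[l]     True for catalytic steps, False for regulatory steps
    consumed_complex[p] index of the enzyme complex consumed by elementary reaction p,
                        stored 0-based so that p = 2l is forward and p = 2l + 1 is reverse
    """
    v_net = np.where(is_catalytic, V1[step_reaction], 0.0)   # Equations (48) and (49)
    v_f = v_net + v_r                                        # Equation (47)
    if np.any(v_r < 0) or np.any(v_f < 0):
        raise ValueError("Equations (52) and (53) are violated")
    v1 = np.empty(2 * len(v_r))
    v1[0::2] = v_f                                           # Equation (45)
    v1[1::2] = v_r                                           # Equation (46)
    # Equation (51); assumes every consumed complex has a strictly positive fraction.
    return v1 / e1[consumed_complex]


# Single Uni-Uni reaction: complexes (E, EA, EB), three catalytic steps.
k = k_solve(
    V1=np.array([1.0]),
    e1=np.array([0.5, 0.3, 0.2]),
    v_r=np.array([1.0, 1.0, 1.0]),
    step_reaction=np.array([0, 0, 0]),
    is_catalytic=np.array([True, True, True]),
    consumed_complex=np.array([0, 1, 1, 2, 2, 0]),
)
print(k)
```

Because the fluxes are divided by enzyme fractions, a zero entry in `e1` would need separate handling. The sketch leaves that case out.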


C.4. SSF-Evaluator: Evaluation of steady-state fluxes for the mutant networks

using the kinetic parameter assignments of K-SOLVE

Having computed a set of kinetic parameters 𝒌 satisfying Equations (30) - (39) for the WT

strain (𝑐 = 1) using K-SOLVE, the objective of SSF-Evaluator is to compute the flux

distributions in the mutant strains. Typically, this is achieved by integrating the ODEs

describing conservation of mass across all metabolites and enzyme complexes. To

circumvent the unreliability and high computational cost associated with numerical

integration, we put forth a decomposition-based approach that leverages the bilinear

structure of the underlying system of equations. In this section, we derive updating

formulae for the metabolite concentrations (Equations (57), (59), and (82)) in response to

the altered enzyme concentrations compared to WT in the mutant networks (see Equation

(33)). These update formulae are then fed into the SSF-Evaluator procedure that evaluates

fluxes and metabolite concentrations in mutants when the kinetic parameters 𝒌 are

provided.

Substituting the expression for $v_{p,c}$ from Equation (30) into Equations (32) and (34) poses
the enzyme and metabolite mass balances as functions of enzyme fractions $\boldsymbol{e}_c$ and relative
metabolite concentrations $\boldsymbol{s}_c$ across all mutant networks $c \in C$, yielding Equations (54) and
(55), respectively:

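$$\sum_{p=1}^{n_P} E_{lp}\, k_p \left( \prod_{l' \mid E_{l'p}<0} e_{l',c} \right) \left( \prod_{i \mid S_{ip}\le 0} s_{i,c}^{-S_{ip}} \right) = 0 \quad \forall l \in L,\ \forall c \in C \tag{54}$$

$$\sum_{p=1}^{n_P} S_{ip}\, k_p \left( \prod_{l \mid E_{lp}<0} e_{l,c} \right) \left( \prod_{i' \mid S_{i'p}\le 0} s_{i',c}^{-S_{i'p}} \right) = 0 \quad \forall i \in I,\ \forall c \in C \tag{55}$$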

Equations (54) and (55) must be supplemented by Equation (33) that imposes that the sum

of the fractional abundance of all enzyme complexes of a particular enzyme must be equal

to the fold-change in the total enzyme level relative to WT. Thus, for every mutant network

𝑐, the enzyme fractions 𝒆𝑐 encode any changes to enzyme level by means of upregulation,

downregulation or absence as described by Equation (33).

$$\sum_{l=1}^{n_L} R_{jl}\, e_{l,c} = Z_{jc} \quad \forall j \in J,\ \forall c \in C \tag{33}$$

Equations (33) and (54) form a [𝑛𝐿 × 𝑛𝐿] system of linear algebraic equations of full rank

in 𝒆𝑐 that can efficiently be solved for the fractional enzyme complex abundances in all

mutant networks given the values for the relative metabolite concentrations 𝒔𝑐 and kinetic

parameters 𝒌 (Briggs and Haldane, 1925). It is important to note that the steady-state

enzyme fractions 𝒆𝑐 encode any changes to enzyme presence in mutant 𝑐 through Equation

(33).


Elementary binding and release steps bind only one reactant or release only one product at

a time. This ensures that the only possible exponent for the metabolite concentration term

in Equations (54) and (55) is equal to one. Equation (55) therefore simplifies to a system
of linear algebraic equations in $\boldsymbol{s}_c$ when $\boldsymbol{e}_c$ is specified and can be recast as:

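$$\sum_{p \in P \mid S_{ip}>0} S_{ip}\, k_p \left( \prod_{l \mid E_{lp}<0} e_{l,c} \right) + \sum_{p \in P \mid S_{ip}<0} S_{ip}\, k_p \left( \prod_{l \mid E_{lp}<0} e_{l,c} \right) s_{i,c} = 0 \quad \forall i \in I,\ \forall c \in C \tag{56}$$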

The relative metabolite concentrations can then be directly calculated from the following

explicit expression:

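$$s_{i,c} = - \frac{\displaystyle \sum_{p \in P \mid S_{ip}>0} S_{ip}\, k_p \left( \prod_{l \mid E_{lp}<0} e_{l,c} \right)}{\displaystyle \sum_{p \in P \mid S_{ip}<0} S_{ip}\, k_p \left( \prod_{l \mid E_{lp}<0} e_{l,c} \right)} \quad \forall i \in I,\ \forall c \in C \tag{57}$$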

Equation (57) relates relative metabolite concentrations 𝒔𝑐 to enzyme fractions 𝒆𝑐 at

metabolic steady-state for a given set of elementary step kinetic parameters 𝒌. When 𝒔𝑐

and 𝒆𝑐 do not represent steady-state relative metabolite concentrations and enzyme

fractions, the left hand-side of Equation (56) quantifies the mass imbalance of metabolite

𝑖 in network 𝑐 as shown in Equation (58).

$$\frac{ds_{i,c}}{dt} = \sum_{p \in P \mid S_{ip}>0} S_{ip}\, k_p \left( \prod_{l \mid E_{lp}<0} e_{l,c} \right) + \sum_{p \in P \mid S_{ip}<0} S_{ip}\, k_p \left( \prod_{l \mid E_{lp}<0} e_{l,c} \right) s_{i,c} \quad \forall i \in I,\ \forall c \in C \tag{58}$$
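Equations (57) and (58) reduce to two matrix-vector products once the enzyme complex consumed by each elementary reaction is identified. The sketch below assumes, as stated above, that exactly one complex is consumed per elementary reaction and that every metabolite has at least one consuming reaction. The function names and the Uni-Uni test matrices are illustrative.

```python
import numpy as np


def production_and_consumption(k, e, S, E):
    """Per-metabolite production term and consumption coefficient of Equation (56)."""
    consumed = np.argmax(E < 0, axis=0)                # complex l with E_lp < 0
    rate_coeff = k * e[consumed]                       # k_p * e_l for every p
    production = np.where(S > 0, S, 0) @ rate_coeff    # sum over p with S_ip > 0
    consumption = np.where(S < 0, S, 0) @ rate_coeff   # sum over p with S_ip < 0 (negative)
    return production, consumption


def update_concentrations(k, e, S, E):
    """Equation (57): explicit relative concentrations for given enzyme fractions."""
    production, consumption = production_and_consumption(k, e, S, E)
    return -production / consumption


def mass_imbalance(k, e, s, S, E):
    """Equation (58): ds/dt for concentrations s and enzyme fractions e."""
    production, consumption = production_and_consumption(k, e, S, E)
    return production + consumption * s


# Uni-Uni reaction: rows of S are (A, B), rows of E are (E, EA, EB).
S = np.array([[-1, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1, -1]])
E = np.array([[-1, 1, 0, 0, 1, -1],
              [1, -1, -1, 1, 0, 0],
              [0, 0, 1, -1, -1, 1]])
k = np.ones(6)
e = np.array([0.5, 0.3, 0.2])

s = update_concentrations(k, e, S, E)
print(s, mass_imbalance(k, e, s, S, E))
```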

C.4.1. Fixed-Point Iteration (FPI)

In summary, the enzyme fractions 𝒆𝑐 can be computed from the kinetic parameters 𝒌 and

metabolite concentrations 𝒔𝑐 by solving the system of linear equations (33) and (54) and in

turn the computed enzyme fractions 𝒆𝑐 can be used to update metabolite concentrations 𝒔𝑐.

This establishes the following fixed-point iteration (FPI) procedure to solve for the

unknown concentrations 𝒔𝑐 and enzyme fractions 𝒆𝑐 given kinetic parameters 𝒌:

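Algorithmic Implementation of FPI

Begin
    Specify and fix $\boldsymbol{k}$
    Set $stol := 10^{-6}$, $iter := 1$
    Initialize $s_{i,c}^{(0)} := 1,\ \forall i \in I$ and $c \in C$
    Compute $\boldsymbol{e}_c^{(iter)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(0)}$
    Compute $\boldsymbol{s}_c^{(iter)}$ by solving Equation (57) with $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$
    Compute $\frac{d\boldsymbol{s}_c}{dt}$ by solving Equation (58) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$
    While $\left\| \frac{d\boldsymbol{s}_c}{dt} \right\|_\infty > stol$ and $\left\| \boldsymbol{s}_c^{(iter)} - \boldsymbol{s}_c^{(iter-1)} \right\|_\infty > 10^{-4}$
        $iter := iter + 1$
        Compute $\boldsymbol{e}_c^{(iter)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter-1)}$
        Compute $\boldsymbol{s}_c^{(iter)}$ by solving Equation (57) with $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$
        Compute $\frac{d\boldsymbol{s}_c}{dt}$ by solving Equation (58) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$
    return $\boldsymbol{s}_c^{(FPI)} := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c^{(FPI)} := \boldsymbol{e}_c^{(iter)}$
end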
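The alternation between the enzyme block and the metabolite block can be traced on a deliberately small toy system. The sketch below is an assumption-laden illustration, not the SSF-Evaluator itself: one metabolite A is supplied at a constant influx `v_in` (a simplification that is not an enzymatic mass-action step) and is consumed by one enzyme through E + A ⇌ EA followed by an irreversible release EA → E. The total enzyme level `z` plays the role of $Z_{jc}$ generalized to a fractional value, and the mass imbalance is evaluated after re-solving the enzyme fractions at the updated concentration.

```python
def enzyme_fractions(s, k1, k2, k3, z):
    """Enzyme block: balance on EA plus e_free + e_ea = z, solved for a given s."""
    e_ea = z * k1 * s / (k1 * s + k2 + k3)
    return z - e_ea, e_ea


def imbalance(s, e_free, e_ea, k1, k2, v_in):
    """Mass imbalance of metabolite A (analogue of Equation (58))."""
    return v_in + k2 * e_ea - k1 * e_free * s


def fixed_point_iteration(k1, k2, k3, v_in, z, stol=1e-6, max_iter=10_000):
    s = 1.0                                                 # start from the WT value
    for iteration in range(1, max_iter + 1):
        e_free, e_ea = enzyme_fractions(s, k1, k2, k3, z)   # enzyme block
        s = (v_in + k2 * e_ea) / (k1 * e_free)              # metabolite block (Eq. (57) analogue)
        e_free, e_ea = enzyme_fractions(s, k1, k2, k3, z)   # refresh before checking
        if abs(imbalance(s, e_free, e_ea, k1, k2, v_in)) <= stol:
            return s, e_free, e_ea, iteration
    raise RuntimeError("fixed-point iteration did not reach steady state")


# With z = 1 the parameters below give s = 1 (the WT reference).
# Lowering the enzyme level to z = 0.75 forces the substrate to accumulate.
s, e_free, e_ea, n_iter = fixed_point_iteration(k1=2.0, k2=1.0, k3=1.0, v_in=0.5, z=0.75)
print(s, e_free, e_ea, n_iter)
```

For these values the update contracts the error by a constant factor of 5/6 per pass, so several dozen passes are needed, which is the slow linear convergence that motivates switching to Newton's method.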


It is important to note that the FPI algorithm has linear convergence which causes the

method to slow down as we approach metabolic steady-state. This can be accelerated by

switching to Newton’s method which has quadratic convergence. We switch to Newton’s

method when either the mass imbalance is within the specified threshold of 𝑠𝑡𝑜𝑙 or the

progress towards steady-state becomes too slow. This happens when the change in

metabolite concentrations between iterations falls below a pre-specified threshold of 10−4.

C.4.2. Newton’s method for accelerating convergence

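Let $\boldsymbol{s}_c^{(FPI)}$ and $\boldsymbol{e}_c^{(FPI)}$ be current iterates that do not represent steady-state relative metabolite
concentrations and enzyme fractions, respectively. They can be used as starting points for
Newton’s method where relative metabolite concentrations are updated in the $n^{th}$ iteration
as:

$$\boldsymbol{s}_c^{(n+1)} = \boldsymbol{s}_c^{(n)} - \left( \frac{\partial \left( \frac{d\boldsymbol{s}_c}{dt} \right)}{\partial \boldsymbol{s}_c} \right)^{-1} \frac{d\boldsymbol{s}_c}{dt} \tag{59}$$

In Equation (59), $\frac{d\boldsymbol{s}_c}{dt}$ is computed using Equation (58). The quantity $\frac{\partial (d\boldsymbol{s}_c/dt)}{\partial \boldsymbol{s}_c}$ represents
the Jacobian $\boldsymbol{J}$ of the function $\frac{d\boldsymbol{s}_c}{dt}$ described in Equation (59) and can be recast in terms of
elementary fluxes as:

$$\frac{ds_{i,c}}{dt} = \sum_{p=1}^{n_P} S_{ip}\, v_{pc} = 0 \quad \forall i \in I,\ \forall c \in C \tag{60}$$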

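The Jacobian $\boldsymbol{J}$ obtained by differentiating Equation (60) with respect to $\boldsymbol{s}_c$ yields:

$$J_{ii',c} = \frac{\partial}{\partial s_{i',c}} \left( \frac{ds_{i,c}}{dt} \right) = \sum_{p=1}^{n_P} S_{ip} \left( \frac{\partial v_{pc}}{\partial s_{i',c}} \right) \quad \forall i \in I,\ \forall c \in C,\ \forall i' \in I \tag{61}$$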

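Recall that $v_{pc}$ is related to kinetic parameters $\boldsymbol{k}$, enzyme fractions $\boldsymbol{e}_c$, and relative
metabolite concentrations $\boldsymbol{s}_c$ using the mass-action kinetics of Equation (30). The
sensitivity of $v_{pc}$ to the relative metabolite concentrations is obtained by differentiating
Equation (30) with respect to $\boldsymbol{s}_c$ using the product rule:

$$v_{p,c} = k_p \left( \prod_{l \mid E_{lp}<0} e_{l,c} \right) \left( \prod_{i \mid S_{ip}<0} s_{i,c}^{-S_{ip}} \right) \quad \forall p \in P,\ \forall c \in C \tag{30}$$

$$\frac{\partial v_{pc}}{\partial s_{i',c}} = k_p \left[ \left( \prod_{i \mid S_{ip}<0} s_{i,c}^{-S_{ip}} \right) \frac{\partial}{\partial s_{i',c}} \left( \prod_{l \mid E_{lp}<0} e_{l,c} \right) + \left( \prod_{l \mid E_{lp}<0} e_{l,c} \right) \frac{\partial}{\partial s_{i',c}} \left( \prod_{i \mid S_{ip}<0} s_{i,c}^{-S_{ip}} \right) \right] \quad \forall p \in P,\ \forall c \in C,\ \forall i' \in I \tag{62}$$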

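Since only one enzyme complex and (at most) one metabolite participates in any
elementary reaction, the derivatives in Equation (62) can be simplified as:

$$\frac{\partial}{\partial s_{i',c}} \left( \prod_{l \mid E_{lp}<0} e_{l,c} \right) = - \sum_{l \mid E_{lp}\le 0} E_{lp} \left( \frac{\partial e_{l,c}}{\partial s_{i',c}} \right) \quad \forall i' \in I,\ \forall p \in P,\ \forall c \in C \tag{63}$$

$$\frac{\partial}{\partial s_{i',c}} \left( \prod_{i \mid S_{ip}\le 0} s_{i,c}^{-S_{ip}} \right) = - \sum_{i \mid S_{ip}\le 0} S_{ip} \left( \frac{\partial s_{i,c}}{\partial s_{i',c}} \right) \quad \forall i' \in I,\ \forall p \in P,\ \forall c \in C \tag{64}$$

Equation (62) can therefore be simplified by substituting the expressions for the derivatives
in Equations (63) and (64) as:

$$\frac{\partial v_{pc}}{\partial s_{i',c}} = -k_p \left[ \left( \prod_{i \mid S_{ip}<0} s_{i,c}^{-S_{ip}} \right) \left( \sum_{l \mid E_{lp}\le 0} E_{lp} \frac{\partial e_{l,c}}{\partial s_{i',c}} \right) + \left( \prod_{l \mid E_{lp}<0} e_{l,c} \right) \left( \sum_{i \mid S_{ip}\le 0} S_{ip} \frac{\partial s_{i,c}}{\partial s_{i',c}} \right) \right] \quad \forall p \in P,\ \forall c \in C,\ \forall i' \in I \tag{65}$$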

The partial derivatives $\frac{\partial e_{l,c}}{\partial s_{i',c}}$ must be computed to quantify the sensitivity of elementary
fluxes to substrate concentrations. This is achieved by differentiating Equations (33) and
(54) with respect to $s_{i',c}$:

$$\sum_{l=1}^{n_L} R_{jl} \frac{\partial e_{l,c}}{\partial s_{i',c}} = 0 \quad \forall j \in J,\ \forall c \in C,\ \forall i' \in I \tag{66}$$

$$\sum_{p=1}^{n_P} E_{lp}\, k_p \left( \prod_{i \mid S_{ip}\le 0} s_{i,c}^{-S_{ip}} \right) \sum_{l' \mid E_{l'p}<0} \left( \frac{\partial e_{l',c}}{\partial s_{i',c}} \right) - \sum_{p=1}^{n_P} S_{i'p}\, E_{lp}\, k_p \left( \prod_{l' \mid E_{l'p}<0} e_{l',c} \right) = 0 \quad \forall l \in L,\ \forall c \in C,\ \forall i' \in I \tag{67}$$

$\frac{\partial e_{l,c}}{\partial s_{i',c}}$ is computed by solving an exactly determined $[n_L \times n_L]$ system of linear algebraic
equations formed by Equations (66) and (67). The computed $\frac{\partial e_{l,c}}{\partial s_{i',c}}$ is then substituted in
Equation (65) to compute $\frac{\partial v_{pc}}{\partial s_{i',c}}$, which is subsequently substituted in Equation (61) to
compute all elements in the Jacobian $\boldsymbol{J}$. Having computed $\boldsymbol{J}$, metabolite concentrations can
be updated using Equation (59) until the steady-state concentrations are reached or $\boldsymbol{J}$
becomes singular. An alternative updating scheme for when $\boldsymbol{J}$ becomes singular is detailed
in the following subsection. The following algorithm details the steps involved in the
identification of steady-state metabolite concentrations using Newton’s method.

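Algorithmic Implementation of Newton’s Method

Begin
    Specify and fix $\boldsymbol{k}$
    Set $stol := 10^{-6}$, $iter := 1$
    Initialize $s_{i,c}^{(iter)} := s_{i,c}^{(FPI)},\ \forall i \in I$ and $c \in C$
    Compute $\boldsymbol{e}_c^{(iter)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$
    Compute $\frac{d\boldsymbol{s}_c}{dt}$ by substituting $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$ into Equation (58)
    Compute $\frac{\partial \boldsymbol{e}_c}{\partial \boldsymbol{s}}$ by solving Equations (66) and (67) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$
    Compute $\frac{\partial \boldsymbol{v}_c}{\partial \boldsymbol{s}}$ by substituting $\frac{\partial \boldsymbol{e}_c}{\partial \boldsymbol{s}}$ into Equation (65)
    Compute $\boldsymbol{J}$ by substituting $\frac{\partial \boldsymbol{v}_c}{\partial \boldsymbol{s}}$ into Equation (61)
    While $\left( \left\| \frac{d\boldsymbol{s}_c}{dt} \right\|_\infty > stol \right)$ and $\boldsymbol{J}$ is not singular
        $iter := iter + 1$
        Update $\boldsymbol{s}_c^{(iter)}$ by substituting $\frac{\partial (d\boldsymbol{s}_c/dt)}{\partial \boldsymbol{s}_c} = \boldsymbol{J}$ and $\boldsymbol{s}_c^{(n)} = \boldsymbol{s}_c^{(iter-1)}$ into Equation (59)
        Compute $\boldsymbol{e}_c^{(iter)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$
        Compute $\frac{d\boldsymbol{s}_c}{dt}$ by substituting $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$ into Equation (58)
        Compute $\frac{\partial \boldsymbol{e}_c}{\partial \boldsymbol{s}}$ by solving Equations (66) and (67) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$
        Compute $\frac{\partial \boldsymbol{v}_c}{\partial \boldsymbol{s}}$ by substituting $\frac{\partial \boldsymbol{e}_c}{\partial \boldsymbol{s}}$ into Equation (65)
        Compute $\boldsymbol{J}$ by substituting $\frac{\partial \boldsymbol{v}_c}{\partial \boldsymbol{s}}$ into Equation (61)
    return $\boldsymbol{s}_c^{(NM)} := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c^{(NM)} := \boldsymbol{e}_c^{(iter)}$
end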
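The Newton loop above can be sketched generically for any residual function and Jacobian. In the sketch below the residual `f` stands for $d\boldsymbol{s}_c/dt$ with the enzyme fractions already eliminated, and `jac` stands for $\boldsymbol{J}$. Detecting singularity through a condition-number threshold (`cond_max`) is an assumption made for this example, as is the scalar toy residual, which corresponds to a single metabolite whose steady state lies at $s = 2$.

```python
import numpy as np


def newton_steady_state(f, jac, s0, stol=1e-6, max_iter=50, cond_max=1e12):
    """Newton iteration of Equation (59) with a simple singularity guard."""
    s = np.asarray(s0, dtype=float)
    for _ in range(max_iter):
        residual = f(s)
        if np.linalg.norm(residual, np.inf) <= stol:
            return s, "converged"
        J = jac(s)
        if np.linalg.cond(J) > cond_max:       # treat an ill-conditioned J as singular
            return s, "singular"
        s = s - np.linalg.solve(J, residual)   # solve J * step = residual, no explicit inverse
    return s, "max_iter"


# Toy residual: imbalance of one metabolite after eliminating the enzyme fractions.
f = lambda s: 0.5 - 0.75 * s / (s + 1.0)
jac = lambda s: np.diag(-0.75 / (s + 1.0) ** 2)

s, status = newton_steady_state(f, jac, [1.0])
print(s, status)
```

A caller that receives the `"singular"` status would hand the current iterate to the alternative updating scheme rather than continue with Newton steps.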

On average, we find that $\boldsymbol{J}$ becomes singular in only approximately 5% of all mutant flux
evaluations using SSF-Evaluator, thus requiring a different updating formula.

C.4.3. Richardson’s Extrapolation when J becomes singular

If singularity of the Jacobian is detected, we switch to a semi-implicit first-order
integrator (Press et al., 2007b) using Richardson’s extrapolation, initializing the relative
metabolite concentrations at the current point ($\boldsymbol{s}_c^{(NM)}$). The update formula for the
metabolite concentrations (Equation (72)) is derived using the following procedure. The
initial value problem described by Equation (60) can be expressed in matrix form as:

$$\frac{d\boldsymbol{s}_c}{dt} = \boldsymbol{S} \cdot \boldsymbol{v}_c = \boldsymbol{f}(\boldsymbol{s}_c) \quad \forall c \in C \tag{68}$$

Page 216: NONLINEAR OPTIMIZATION TECHNIQUES FOR GENOME …

205

Equation (68) is integrated starting from the initial condition 𝒔(0) = 𝒔𝑐(𝑁𝑀)

where 𝒔𝑐(𝑁𝑀)

is the vector of relative metabolite concentrations when Newton’s method fails (J becomes

singular). We use the implicit Euler’s method to update substrate concentrations 𝒔 upon

taking a time step of ℎ. This is due to the stiffness of the system of equations that precludes

the use of a less costly explicit method. The update formula for the 𝑛𝑡ℎ iteration is:

𝒔𝑐(𝑛+1)

−𝒔𝑐(𝑛)

ℎ= 𝒇(𝒔𝑐

(𝑛+1))

∀𝑐 ∈ 𝐶 (69)

Since 𝒔𝑐(𝑛+1)

is unknown, 𝒇(𝒔𝑐(𝑛+1)

) cannot be evaluated a priori and must be approximated

using Taylor series expansion.

𝒇 (𝒔𝑐(𝑛+1)

) = 𝒇 (𝒔𝑐(𝑛)

) +𝜕𝒇

𝜕𝒔𝑐

(𝒔𝑐(𝑛+1)

− 𝒔𝑐(𝑛)

) ∀𝑐 ∈ 𝐶 (70)

Equation (70) is substituted back in Equation (69) to yield:

𝒔𝑐(𝑛+1)

− 𝒔𝑐(𝑛)

= ℎ𝒇 (𝒔𝑐(𝑛)

) + ℎ𝜕𝒇

𝜕𝒔 (𝒔𝑐

(𝑛+1)− 𝒔𝑐

(𝑛)) ∀𝑐 ∈ 𝐶 (71)

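Equation (71) is rearranged to obtain the semi-implicit update formula for $\boldsymbol{s}_c$:

$$\boldsymbol{s}_c^{(n+1)} = \boldsymbol{s}_c^{(n)} + \left(\boldsymbol{I} - h\,\frac{\partial \boldsymbol{f}}{\partial \boldsymbol{s}_c}\right)^{-1} h\,\boldsymbol{f}\left(\boldsymbol{s}_c^{(n)}\right) \quad \forall c \in C \qquad (72)$$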

$\partial \boldsymbol{f} / \partial \boldsymbol{s}_c$ in Equation (72) is the Jacobian matrix $\boldsymbol{J}$ also present in Equation (61) and is calculated as described earlier. Equation (72) is integrated using error-controlled Richardson extrapolation until either the time step $h$ exceeds a maximum time step $h_{max}$ or the desired threshold on Equation (68) is reached ($\| d\boldsymbol{s}_c / dt \|_\infty \le stol$). If $h$ exceeds $h_{max}$, Newton's method is reinitialized using concentrations at the termination point of the semi-implicit integration procedure ($\boldsymbol{s}_c^{(INT)}$) and solved until $\| d\boldsymbol{s}_c / dt \|_\infty \le stol$ is achieved.
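The update in Equation (72) can be illustrated with a minimal NumPy sketch. The function name `semi_implicit_euler_step` and the two-variable stiff linear test system below are hypothetical stand-ins chosen for illustration; they are not the kinetic model of this study. The linear system $(\boldsymbol{I} - h\boldsymbol{J})\,\Delta\boldsymbol{s} = h\boldsymbol{f}$ is solved directly rather than forming the inverse.

```python
import numpy as np

def semi_implicit_euler_step(s, f, jac, h):
    """One step of Equation (72): s_new = s + (I - h*J)^(-1) * h * f(s)."""
    n = s.size
    # Solve (I - h*J) * delta = h * f(s) instead of inverting the matrix.
    delta = np.linalg.solve(np.eye(n) - h * jac(s), h * f(s))
    return s + delta

# Hypothetical stiff test system ds/dt = A s + b with steady state [1, 2].
A = np.array([[-1000.0, 0.0], [0.0, -1.0]])
b = np.array([1000.0, 2.0])

def f(s):
    return A @ s + b

def jac(s):
    return A

s = np.array([0.0, 0.0])
for _ in range(200):
    # h = 0.5 is far beyond the explicit Euler stability limit for this system.
    s = semi_implicit_euler_step(s, f, jac, h=0.5)
```

For this test system an explicit Euler step of the same size would amplify the error in the fast variable at every step, which is the stiffness argument made in the text above.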

Algorithmic Implementation of Semi-implicit integration using Richardson’s extrapolation

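Begin
Specify and fix $\boldsymbol{k}$
Set $stol := 10^{-6}$, $iter := 1$, $h := 2 \times 10^{-6}$, $h_{max} := 10^{10}$, $tol := 10^{-4}$
Initialize $s_{i,c}^{(iter)} := s_{i,c}^{(NM)}$, $\forall i \in I$ and $c \in C$
Compute $\boldsymbol{e}_c^{(iter)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$
Compute $d\boldsymbol{s}_c/dt$ by substituting $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$ into Equation (58)
Compute $\boldsymbol{J}$ by solving Equations (61), (65), (66) and (67) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$
Set $iter := iter + 1$
While ($\| d\boldsymbol{s}_c/dt \|_\infty > stol$) and ($h < h_{max}$)
    Compute $\boldsymbol{s}_c^{(n)}$ by substituting $\boldsymbol{s}_c^{(n-1)} := \boldsymbol{s}_c^{(iter-1)}$, $\boldsymbol{f}(\boldsymbol{s}_c^{(n-1)}) := d\boldsymbol{s}_c/dt$, $\partial \boldsymbol{f}/\partial \boldsymbol{s}_c := \boldsymbol{J}$, and $h := h$ into Equation (72)
    Set $\boldsymbol{s}_c^{(one\text{-}step)} := \boldsymbol{s}_c^{(n)}$
    Compute $\boldsymbol{s}_c^{(n)}$ by substituting $\boldsymbol{s}_c^{(n-1)} := \boldsymbol{s}_c^{(iter-1)}$, $\boldsymbol{f}(\boldsymbol{s}_c^{(n-1)}) := d\boldsymbol{s}_c/dt$, $\partial \boldsymbol{f}/\partial \boldsymbol{s}_c := \boldsymbol{J}$, and $h := h/2$ into Equation (72).
    Compute $\boldsymbol{e}_c^{(iter)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(n)}$
    Compute $d\boldsymbol{s}_c/dt$ by substituting $\boldsymbol{s}_c := \boldsymbol{s}_c^{(n)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$ into Equation (58).
    Compute $\boldsymbol{J}$ by solving Equations (61), (65), (66) and (67) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(n)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$
    Compute $\boldsymbol{s}_c^{(n)}$ by substituting $\boldsymbol{s}_c^{(n-1)} := \boldsymbol{s}_c^{(n)}$, $\boldsymbol{f}(\boldsymbol{s}_c^{(n-1)}) := d\boldsymbol{s}_c/dt$, $\partial \boldsymbol{f}/\partial \boldsymbol{s}_c := \boldsymbol{J}$, and $h := h/2$ into Equation (72).
    Set $\boldsymbol{s}_c^{(two\text{-}step)} := \boldsymbol{s}_c^{(n)}$
    if ($\| \boldsymbol{s}_c^{(two\text{-}step)} - \boldsymbol{s}_c^{(one\text{-}step)} \|_\infty < tol$)
        Set $\boldsymbol{s}_c^{(iter)} := \boldsymbol{s}_c^{(two\text{-}step)}$
        Compute $\boldsymbol{e}_c^{(iter)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$
        Compute $d\boldsymbol{s}_c/dt$ by substituting $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$ into Equation (58).
        Compute $\boldsymbol{J}$ by solving Equations (61), (65), (66) and (67) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$
        Set $h := h \times \sqrt{tol} \,/\, \sqrt{\| \boldsymbol{s}^{(two\text{-}step)} - \boldsymbol{s}^{(one\text{-}step)} \|_\infty}$
        Set $iter := iter + 1$
    else
        Set $h := h/2$
return $\boldsymbol{s}_c^{(INT)} := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c^{(INT)} := \boldsymbol{e}_c^{(iter)}$
end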
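The step-doubling logic of the procedure above can be sketched as follows. This is a simplified illustration under stated assumptions: the enzyme-fraction updates (Equations (33) and (54)) are omitted, the right-hand side `f` and Jacobian `jac` are supplied as plain functions, the stiff linear test system is hypothetical, and the `max_iter` cap and the guard against a zero error estimate are additions that do not appear in the original procedure.

```python
import numpy as np

def step(s, f, jac, h):
    """Semi-implicit Euler step, Equation (72)."""
    return s + np.linalg.solve(np.eye(s.size) - h * jac(s), h * f(s))

def integrate_to_steady_state(s0, f, jac, stol=1e-6, tol=1e-4,
                              h=2e-6, h_max=1e10, max_iter=10000):
    s = np.asarray(s0, dtype=float)
    for _ in range(max_iter):
        # Stop when the mass imbalance is small or the step size saturates.
        if np.max(np.abs(f(s))) <= stol or h >= h_max:
            break
        one_step = step(s, f, jac, h)                 # one full step of size h
        half = step(s, f, jac, h / 2)                 # two half steps of size h/2
        two_step = step(half, f, jac, h / 2)
        err = np.max(np.abs(two_step - one_step))     # Richardson error estimate
        if err < tol:
            s = two_step                              # accept and enlarge the step
            h = h * np.sqrt(tol / max(err, 1e-300))
        else:
            h = h / 2                                 # reject and retry
    return s, h

# Hypothetical stiff test system ds/dt = A s + b with steady state [1, 2].
A = np.array([[-1000.0, 0.0], [0.0, -1.0]])
b = np.array([1000.0, 2.0])
f = lambda s: A @ s + b
jac = lambda s: A

s_ss, h_final = integrate_to_steady_state([0.0, 0.0], f, jac)
```

Because accepted steps enlarge $h$ whenever the error estimate falls below $tol$, the step size grows as the trajectory flattens near the steady state.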

For the large-scale kinetic model (k-ecoli307) parameterized in this study (see Results section in the main manuscript), the average computation time required to evaluate steady-state fluxes in mutants by FPI, Newton's method, and semi-implicit integration was 10 seconds, 4 seconds, and 37 seconds, respectively. In contrast, steady-state flux evaluation using numerical integration alone required over 6 minutes to achieve the same mass imbalance of $10^{-3}$ mol%. CPU times are reported for an Intel i7 computer (4-core processor, 2.6 GHz, 12 GB RAM) using a single-core implementation.


C.4.4. Integration of FPI, Newton’s method, and semi-implicit integration into a

single pipeline

The three methods of updating metabolite concentrations, (i) FPI, (ii) Newton's method, and (iii) semi-implicit integration, together with their switching criteria, are integrated into the SSF-Evaluator procedure. SSF-Evaluator initially solves for steady-state concentrations using FPI and switches to Newton's method when the change in metabolite concentrations between successive iterations falls below a pre-specified threshold of $10^{-4}$. Newton's method fails when the Jacobian $\boldsymbol{J}$ becomes singular, which prompts the switch to semi-implicit integration using Richardson's extrapolation. The algorithmic steps involved are summarized below:

Algorithmic Implementation of Steady-State Flux Estimator (SSF-Evaluator)

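Begin
Specify and fix $\boldsymbol{k}$
Specify $mutant\ c$
Set $stol := 10^{-6}$
Initialize $s_{i,c}^{(init)} := 1$, $\forall i \in I$
Compute $\boldsymbol{e}_c^{(init)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(init)}$
Compute $d\boldsymbol{s}_c/dt$ by substituting $\boldsymbol{s}_c := \boldsymbol{s}_c^{(init)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(init)}$ into Equation (58).
While ($\| d\boldsymbol{s}_c/dt \|_\infty > stol$)
    Compute $\boldsymbol{s}_c^{(FPI)}$ using the FPI algorithm with $\boldsymbol{s}_c^{(0)} = \boldsymbol{s}_c^{(init)}$
    Compute $\boldsymbol{e}_c^{(FPI)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(FPI)}$
    Compute $d\boldsymbol{s}_c/dt$ by substituting $\boldsymbol{s}_c := \boldsymbol{s}_c^{(FPI)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(FPI)}$ into Equation (58).
    if ($\| d\boldsymbol{s}_c/dt \|_\infty \le stol$)
        Set $\boldsymbol{s}_c^{(SS)} := \boldsymbol{s}_c^{(FPI)}$
    else
        Set $\boldsymbol{s}_c^{(INT)} := \boldsymbol{s}_c^{(FPI)}$
        while ($\| d\boldsymbol{s}_c/dt \|_\infty > stol$)
            Compute $\boldsymbol{s}_c^{(NM)}$ using Newton's method with $\boldsymbol{s}_c^{(0)} = \boldsymbol{s}_c^{(INT)}$
            Compute $\boldsymbol{e}_c^{(NM)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(NM)}$
            Compute $d\boldsymbol{s}_c/dt$ by substituting $\boldsymbol{s}_c := \boldsymbol{s}_c^{(NM)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(NM)}$ into Equation (58).
            if ($\| d\boldsymbol{s}_c/dt \|_\infty > stol$)
                Compute $\boldsymbol{s}_c^{(INT)}$ using semi-implicit integration with $\boldsymbol{s}_c^{(0)} = \boldsymbol{s}_c^{(NM)}$
                Compute $\boldsymbol{e}_c^{(INT)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(INT)}$
                Compute $d\boldsymbol{s}_c/dt$ by substituting $\boldsymbol{s}_c := \boldsymbol{s}_c^{(INT)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(INT)}$ into Equation (58).
                if ($\| d\boldsymbol{s}_c/dt \|_\infty \le stol$)
                    Set $\boldsymbol{s}_c^{(SS)} := \boldsymbol{s}_c^{(INT)}$
            else
                Set $\boldsymbol{s}_c^{(SS)} := \boldsymbol{s}_c^{(NM)}$
Compute $\boldsymbol{e}_c^{(SS)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(SS)}$
Compute steady-state fluxes $\boldsymbol{V}_c^{(SS)}$ by solving Equations (30), (31), and (35) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(SS)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(SS)}$
return $\boldsymbol{V}_c^{(SS)}$, $\boldsymbol{s}_c^{(SS)}$ and $\boldsymbol{e}_c^{(SS)}$
end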
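The switching between Newton's method and semi-implicit integration can be sketched in a few lines. The sketch below is illustrative only: the FPI stage is model-specific and is therefore left out, singularity is detected with a condition-number threshold (`cond_max`, an assumed criterion), the fallback takes a fixed number of fixed-size semi-implicit steps instead of the error-controlled scheme, and the scalar test problem $f(s) = 1 - s^2$ is hypothetical. It is chosen because its Jacobian is exactly singular at the starting point $s = 0$.

```python
import numpy as np

def newton(s, f, jac, stol=1e-6, max_iter=50, cond_max=1e12):
    """Newton iterations; returns (s, converged). Exits early if J is (near-)singular."""
    for _ in range(max_iter):
        r = f(s)
        if np.max(np.abs(r)) <= stol:
            return s, True
        J = jac(s)
        c = np.linalg.cond(J)
        if not np.isfinite(c) or c > cond_max:
            return s, False                      # hand over to the integrator
        s = s - np.linalg.solve(J, r)
    return s, bool(np.max(np.abs(f(s))) <= stol)

def semi_implicit_steps(s, f, jac, h=1e-2, n_steps=20):
    """A few fixed-size steps of Equation (72) to move away from the singular point."""
    for _ in range(n_steps):
        s = s + np.linalg.solve(np.eye(s.size) - h * jac(s), h * f(s))
    return s

def steady_state(s0, f, jac, stol=1e-6, max_rounds=20):
    s = np.asarray(s0, dtype=float)
    for _ in range(max_rounds):
        s, converged = newton(s, f, jac, stol)
        if converged:
            return s
        s = semi_implicit_steps(s, f, jac)       # then re-initialize Newton's method
    raise RuntimeError("steady state not reached")

# Hypothetical test problem: ds/dt = 1 - s^2, Jacobian -2s is singular at s = 0.
f = lambda s: 1.0 - s**2
jac = lambda s: np.diag(-2.0 * s)

s_ss = steady_state([0.0], f, jac)
```

Newton's method is retried after every integration burst, mirroring the reinitialization described for the case where integration terminates before the tolerance is met.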


Overall, SSF-Evaluator provides an integrated procedure for calculating steady-state relative metabolite concentrations and enzyme fractions across all mutant networks given a set of kinetic parameters, bypassing integration in almost all cases. Steady-state elementary fluxes are then computed by substituting the known $\boldsymbol{k}$, $\boldsymbol{s}^{(SS)}$, and $\boldsymbol{e}^{(SS)}$ into Equation (30). Elementary fluxes are then related to the net flux through the reaction using Equations (31) and (35).

It is important to note that SSF-Evaluator is parallelizable across all mutant networks because reaction fluxes in any particular mutant are independent of metabolite concentrations and enzyme abundances in any other mutant. Based on this, K-SOLVE and SSF-Evaluator generate steady-state reaction fluxes $\boldsymbol{V}^{(SS)}$, relative metabolite concentrations $\boldsymbol{s}^{(SS)}$, and enzyme fractions $\boldsymbol{e}^{(SS)}$ across all mutants given enzyme fractions $\boldsymbol{e}_1$ and reverse elementary fluxes $\boldsymbol{v}_r$ for the WT.

C.5. NLP problem K-FIT

K-SOLVE allows for the calculation of the kinetic parameters as a function of the enzyme

fractions and reverse elementary fluxes in WT. The SSF-Evaluator procedure, in turn,

allows for the calculation of the relative metabolite concentrations and enzyme fractions

using as input the kinetic parameters estimated by K-SOLVE. This implies that metabolic

fluxes 𝑽𝑐 in the mutant networks can be expressed as implicit functions of 𝒆1 and 𝒗𝑟.

Executing procedure K-SOLVE and SSF-Evaluator allows for the calculation of the value

of these implicit functions 𝑽𝑐 = 𝑽𝑐(𝒆1, 𝒗𝑟).


This means that NLP problem P1 can be recast as the NLP problem described below, which has only linear constraints. No equality constraints describing conservation of mass across all metabolites and enzyme complexes in the WT need to be explicitly imposed within K-FIT as they are implicitly enforced by K-SOLVE, which limits kinetic parameter values to only those that simultaneously satisfy conservation of mass (for both enzymes and metabolites) and concentration scaling with respect to WT. By propagating the calculated $\boldsymbol{k}$, SSF-Evaluator identifies fluxes $\boldsymbol{V}_c$ across all mutants that automatically satisfy conservation of mass constraints. The objective function $\phi$ as defined in K-FIT below includes only the sum of squared errors for steady-state fluxes in the mutant networks. Nevertheless, metabolite concentration measurements for the mutant networks, whenever available, can be added to the objective function in a similar manner.

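$$\min_{\boldsymbol{e}_1, \boldsymbol{v}_r} \ \phi(\boldsymbol{e}_1, \boldsymbol{v}_r) = \sum_{c=2}^{C} \sum_{j \in J_c^{meas}} \left( \frac{V_{jc}(\boldsymbol{e}_1, \boldsymbol{v}_r) - v_{jc}^{(meas)}}{\sigma_{jc}} \right)^2$$

Subject to:

$$v_{l,1}^{(net)} = V_{j,1} \quad \forall l \in L_j^{cat},\ \forall j \in J \qquad (48)$$

$$v_{l,1}^{(net)} = 0 \quad \forall l \in L_j^{reg},\ \forall j \in J \qquad (49)$$

$$\sum_{l=1}^{n_L} \left( R_{jl}\, e_{l,1} \right) = 1 \quad \forall j \in J \qquad (40)$$

$$0 \le e_{l,1} \le 1 \quad \forall l \in L \qquad (44)$$

$$v_{r,l} \ge 0 \quad \forall l \in L \qquad (52)$$

$$v_{l,1}^{(net)} + v_{r,l} \ge 0 \quad \forall l \in L \qquad (53)$$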

Since all constraints in formulation K-FIT are linear, K-FIT can efficiently be solved using

a gradient-based method that requires as inputs the first- and second-order gradients of the

objective function with respect to the variables 𝒆1 and 𝒗𝑟 to construct the update formula

and check for convergence. The expressions that relate the approximate gradient and

Hessian to the sensitivity of the predicted steady-state fluxes can be derived by constructing

a quadratic approximation for the objective function 𝜙. The following procedure describes

the construction of the quadratic approximation of 𝜙 used to update 𝒆1 and 𝒗𝑟 at each

iteration of K-FIT.

C.6. K-UPDATE procedure that checks for convergence and updates kinetic

parameters using the approximate gradient and Hessian of 𝝓

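The variables $\boldsymbol{e}_1$ and $\boldsymbol{v}_r$ are first assembled for convenience into a single $[2n_L \times 1]$ vector $\boldsymbol{x}$:

$$\boldsymbol{x} = \left[ (\boldsymbol{e}_1)^T \,\middle|\, (\boldsymbol{v}_r)^T \right]^T \qquad (73)$$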


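The objective function $\phi$ and flux through reaction $j$ in mutant $c$ are recast as implicit functions of $\boldsymbol{x}$ as $\phi(\boldsymbol{x})$ and $V_{j,c}(\boldsymbol{x})$. The objective function is expressed in vector form as

$$\phi(\boldsymbol{x}) = \left( \boldsymbol{V}(\boldsymbol{x}) - \boldsymbol{V}^{(meas)} \right)^T \boldsymbol{W}^{-1} \left( \boldsymbol{V}(\boldsymbol{x}) - \boldsymbol{V}^{(meas)} \right) \qquad (74)$$

$\boldsymbol{V}(\boldsymbol{x})$ is the $[n_{meas} \times 1]$ vector of the calculated steady-state fluxes in mutants, where

$$n_{meas} = \sum_{c} \left( \text{cardinality of } J_c^{(meas)} \right)$$

$\boldsymbol{V}^{(meas)}$ is the $[n_{meas} \times 1]$ vector of measured fluxes.

$\boldsymbol{W}$ is the $[n_{meas} \times n_{meas}]$ diagonal matrix storing the variance of the flux measurements, thus

$$W_{ii} = \sigma_i^{2} \quad \forall i \in \{1, 2, \dots, n_{meas}\}$$

so that $\boldsymbol{W}^{-1}$ weights each squared residual by $1/\sigma_i^2$, consistent with the K-FIT objective function.

Upon defining the residual $\boldsymbol{r}(\boldsymbol{x}) = \boldsymbol{V}(\boldsymbol{x}) - \boldsymbol{V}^{(meas)}$, the objective function is expressed more compactly as

$$\phi(\boldsymbol{x}) = (\boldsymbol{r}(\boldsymbol{x}))^T \boldsymbol{W}^{-1} \boldsymbol{r}(\boldsymbol{x}) \qquad (75)$$

For a small perturbation $\Delta\boldsymbol{x}$ to the parameter vector $\boldsymbol{x}$, the objective function at $\boldsymbol{x} + \Delta\boldsymbol{x}$ becomes equal to

$$\phi(\boldsymbol{x} + \Delta\boldsymbol{x}) = (\boldsymbol{r}(\boldsymbol{x} + \Delta\boldsymbol{x}))^T \boldsymbol{W}^{-1} \boldsymbol{r}(\boldsymbol{x} + \Delta\boldsymbol{x}) \qquad (76)$$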

Equation (73) is identical to the least squares representation of isotope tracer-based flux elucidation using 13C-MFA (Antoniewicz et al., 2006). A popular and successful solution strategy involves constructing a quadratic approximation of the objective function described by Equation (76). Using a Taylor series expansion, $\boldsymbol{r}(\boldsymbol{x} + \Delta\boldsymbol{x})$ is linearized about $\boldsymbol{x}$ as described by Antoniewicz et al. (2006):

$$\boldsymbol{r}(\boldsymbol{x} + \Delta\boldsymbol{x}) = \boldsymbol{r}(\boldsymbol{x}) + \frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}} \Delta\boldsymbol{x} \qquad (77)$$

$\partial \boldsymbol{r} / \partial \boldsymbol{x}$ is the $[n_{meas} \times 2n_L]$ matrix representing the local sensitivity of $\boldsymbol{r}(\boldsymbol{x})$ with respect to $\boldsymbol{x}$.

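$\phi(\boldsymbol{x} + \Delta\boldsymbol{x})$ is computed by substituting Equation (77) in Equation (76), yielding:

$$\begin{aligned}
\phi(\boldsymbol{x} + \Delta\boldsymbol{x}) &= (\boldsymbol{r}(\boldsymbol{x} + \Delta\boldsymbol{x}))^T \boldsymbol{W}^{-1} \boldsymbol{r}(\boldsymbol{x} + \Delta\boldsymbol{x}) \\
&= \left( \boldsymbol{r}(\boldsymbol{x}) + \frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}} \Delta\boldsymbol{x} \right)^T \boldsymbol{W}^{-1} \left( \boldsymbol{r}(\boldsymbol{x}) + \frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}} \Delta\boldsymbol{x} \right) \\
&= (\boldsymbol{r}(\boldsymbol{x}))^T \boldsymbol{W}^{-1} \boldsymbol{r}(\boldsymbol{x}) + 2 (\Delta\boldsymbol{x})^T \left( \frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}} \right)^T \boldsymbol{W}^{-1} \boldsymbol{r}(\boldsymbol{x}) + (\Delta\boldsymbol{x})^T \left( \frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}} \right)^T \boldsymbol{W}^{-1} \frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}} \Delta\boldsymbol{x}
\end{aligned} \qquad (78)$$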

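The approximate gradient $\boldsymbol{G}$ and the approximate Hessian $\boldsymbol{H}$ are defined using Equation (79).

$$\boldsymbol{G} = \left( \frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}} \right)^T \boldsymbol{W}^{-1} \boldsymbol{r}(\boldsymbol{x}), \qquad \boldsymbol{H} = \left( \frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}} \right)^T \boldsymbol{W}^{-1} \frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}} \qquad (79)$$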

Upon replacing the relevant terms in Equation (78) using the definitions of the objective function $\phi(\boldsymbol{x})$ from Equation (75) and the approximate gradient and Hessian from Equation (79), Equation (78) is simplified as

$$\phi(\boldsymbol{x} + \Delta\boldsymbol{x}) = \phi(\boldsymbol{x}) + 2 \Delta\boldsymbol{x}^T \boldsymbol{G} + \Delta\boldsymbol{x}^T \boldsymbol{H} \Delta\boldsymbol{x} \qquad (80)$$

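Equation (80) is the local quadratic approximation (Antoniewicz et al., 2006) of the objective function $\phi(\boldsymbol{x})$. In the above expression, $\boldsymbol{G} = \left( \frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}} \right)^T \boldsymbol{W}^{-1} \boldsymbol{r}(\boldsymbol{x})$ and $\boldsymbol{H} = \left( \frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}} \right)^T \boldsymbol{W}^{-1} \frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}}$ are the approximate gradient and Hessian, respectively. Upon subtracting Equation (75) from Equation (80) we obtain:

$$\Delta\phi = \phi(\boldsymbol{x} + \Delta\boldsymbol{x}) - \phi(\boldsymbol{x}) = 2 \Delta\boldsymbol{x}^T \boldsymbol{G} + \Delta\boldsymbol{x}^T \boldsymbol{H} \Delta\boldsymbol{x} \qquad (81)$$

A stationary point (i.e., local minimum) for the (approximated) objective function is reached when $\frac{d(\Delta\phi)}{d(\Delta\boldsymbol{x})} = 0$, that is, when $2\boldsymbol{G} + 2\boldsymbol{H}\Delta\boldsymbol{x} = \boldsymbol{0}$ (using the symmetry of $\boldsymbol{H}$), which yields:

$$\Delta\boldsymbol{x} = -\boldsymbol{H}^{-1} \boldsymbol{G} \qquad (82)$$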
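A minimal NumPy sketch of Equations (75), (79), and (82) is given here for concreteness. It assumes that $\boldsymbol{W}$ holds the measurement variances $\sigma_i^2$ on its diagonal, so that $\boldsymbol{W}^{-1}$ weights each squared residual by $1/\sigma_i^2$. The function name `gauss_newton_step` and the small linear flux model `A @ x` are hypothetical and serve only to show that, for a linear residual, one step lands on the weighted least-squares solution.

```python
import numpy as np

def gauss_newton_step(r, dr_dx, W):
    """Return phi (Eq. 75), G and H (Eq. 79), and the step dx (Eq. 82)."""
    W_inv = np.linalg.inv(W)
    phi = r @ W_inv @ r
    G = dr_dx.T @ W_inv @ r
    H = dr_dx.T @ W_inv @ dr_dx
    dx = -np.linalg.solve(H, G)       # solve H dx = -G rather than inverting H
    return phi, G, H, dx

# Hypothetical linear "flux model" V(x) = A x with three measurements.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
V_meas = np.array([1.0, 2.0, 3.1])
sigma = np.array([0.1, 0.1, 0.2])
W = np.diag(sigma**2)                 # assumed: variances on the diagonal

x0 = np.zeros(2)
phi0, G0, H0, dx = gauss_newton_step(A @ x0 - V_meas, A, W)
x1 = x0 + dx

# Reference solution: ordinary least squares on the sigma-scaled system.
x_ref = np.linalg.lstsq(A / sigma[:, None], V_meas / sigma, rcond=None)[0]
```

For the nonlinear flux predictions in K-FIT the same step is applied repeatedly, with the residual and sensitivity matrix re-evaluated at each new point.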

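Equation (82) computes the unconstrained search direction at each iteration. Note that $\frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}}$ is needed to update $\boldsymbol{x}$. Because the residual vector $\boldsymbol{r}(\boldsymbol{x})$ only contains steady-state fluxes, $\frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}}$ is assembled using the sensitivity of fluxes to $\boldsymbol{x}$ based on the chain rule:

$$\frac{\partial \boldsymbol{v}_c}{\partial \boldsymbol{x}} = \frac{\partial \boldsymbol{v}_c}{\partial \boldsymbol{k}} \frac{\partial \boldsymbol{k}}{\partial \boldsymbol{x}} \qquad (83)$$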
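Analytic sensitivities assembled with the chain rule are easy to get wrong, so it can help to compare them against finite differences on a small case. The sketch below does this for Equation (83) using hypothetical toy maps `k_of_x` and `v_of_k` (illustrative assumptions, unrelated to the actual rate laws); in the full procedure $\partial \boldsymbol{v}_c / \partial \boldsymbol{k}$ additionally carries the implicit dependence of $\boldsymbol{e}_c$ and $\boldsymbol{s}_c$ on $\boldsymbol{k}$.

```python
import numpy as np

# Hypothetical toy maps standing in for k(x) and v(k).
def k_of_x(x):
    return np.array([x[0] * x[1], x[1] ** 2])

def dk_dx(x):
    return np.array([[x[1], x[0]],
                     [0.0, 2.0 * x[1]]])

def v_of_k(k):
    return np.array([k[0] + k[1], k[0] * k[1], np.exp(-k[0])])

def dv_dk(k):
    return np.array([[1.0, 1.0],
                     [k[1], k[0]],
                     [-np.exp(-k[0]), 0.0]])

def dv_dx(x):
    """Chain rule, Equation (83): dv/dx = dv/dk @ dk/dx."""
    return dv_dk(k_of_x(x)) @ dk_dx(x)

def finite_difference(fun, x, eps=1e-6):
    """Central-difference Jacobian of fun at x, used only as a check."""
    x = np.asarray(x, dtype=float)
    f0 = fun(x)
    J = np.empty((f0.size, x.size))
    for j in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[j] += eps
        xm[j] -= eps
        J[:, j] = (fun(xp) - fun(xm)) / (2.0 * eps)
    return J

x = np.array([0.5, 1.5])
analytic = dv_dx(x)
numeric = finite_difference(lambda z: v_of_k(k_of_x(z)), x)
```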

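$\frac{\partial \boldsymbol{v}}{\partial \boldsymbol{k}}$ is computed by differentiating Equation (30) with respect to $\boldsymbol{k}$ to yield:

$$\begin{aligned}
\frac{\partial v_{p,c}}{\partial \boldsymbol{k}} = {} & k_p \left( \left( \prod_{i \mid S_{ip} \le 0} s_{i,c}^{-S_{ip}} \right) \frac{\partial}{\partial \boldsymbol{k}} \left( \prod_{l \mid E_{lp} < 0} e_{l,c} \right) + \left( \prod_{l \mid E_{lp} < 0} e_{l,c} \right) \frac{\partial}{\partial \boldsymbol{k}} \left( \prod_{i \mid S_{ip} \le 0} s_{i,c}^{-S_{ip}} \right) \right) \\
& + \left( \prod_{i \mid S_{ip} \le 0} s_{i,c}^{-S_{ip}} \right) \left( \prod_{l \mid E_{lp} < 0} e_{l,c} \right) \frac{\partial k_p}{\partial \boldsymbol{k}}
\end{aligned} \quad \forall p \in P,\ \forall c \in C,\ \forall i \in I \qquad (84)$$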

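In Equation (84), both $\frac{\partial \boldsymbol{e}_c}{\partial \boldsymbol{k}}$ and $\frac{\partial \boldsymbol{s}_c}{\partial \boldsymbol{k}}$ are unknown. They can be inferred by solving the system of linear algebraic equations formed by differentiating Equations (33), (54), and (56), respectively, with respect to $\boldsymbol{k}$ as follows: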

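$$\sum_{l=1}^{n_L} \left( R_{jl} \frac{\partial e_{l,c}}{\partial \boldsymbol{k}} \right) = 0 \quad \forall c \in C,\ \forall j \in J \qquad (85)$$

$$\sum_{p=1}^{P} E_{lp} \left( k_p \left( \left( \prod_{i \mid S_{ip} \le 0} s_{i,c}^{-S_{ip}} \right) \frac{\partial}{\partial \boldsymbol{k}} \left( \prod_{l \mid E_{lp} < 0} e_{l,c} \right) + \left( \prod_{l \mid E_{lp} < 0} e_{l,c} \right) \frac{\partial}{\partial \boldsymbol{k}} \left( \prod_{i \mid S_{ip} \le 0} s_{i,c}^{-S_{ip}} \right) \right) + \left( \prod_{i \mid S_{ip} \le 0} s_{i,c}^{-S_{ip}} \right) \left( \prod_{l \mid E_{lp} < 0} e_{l,c} \right) \frac{\partial k_p}{\partial \boldsymbol{k}} \right) = 0 \quad \forall l \in L,\ \forall c \in C,\ \forall i \in I \qquad (86)$$

$$\sum_{p=1}^{P} S_{ip} \left( k_p \left( \left( \prod_{i \mid S_{ip} \le 0} s_{i,c}^{-S_{ip}} \right) \frac{\partial}{\partial \boldsymbol{k}} \left( \prod_{l \mid E_{lp} < 0} e_{l,c} \right) + \left( \prod_{l \mid E_{lp} < 0} e_{l,c} \right) \frac{\partial}{\partial \boldsymbol{k}} \left( \prod_{i \mid S_{ip} \le 0} s_{i,c}^{-S_{ip}} \right) \right) + \left( \prod_{i \mid S_{ip} \le 0} s_{i,c}^{-S_{ip}} \right) \left( \prod_{l \mid E_{lp} < 0} e_{l,c} \right) \frac{\partial k_p}{\partial \boldsymbol{k}} \right) = 0 \quad \forall l \in L,\ \forall c \in C,\ \forall i \in I \qquad (87)$$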
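Equations (85) to (87) amount to implicit differentiation of the steady-state conditions: if the balances are written as $\boldsymbol{f}(\boldsymbol{y}, \boldsymbol{k}) = \boldsymbol{0}$ with $\boldsymbol{y}$ collecting the unknown concentrations and enzyme fractions, then $\frac{\partial \boldsymbol{f}}{\partial \boldsymbol{y}} \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{k}} = -\frac{\partial \boldsymbol{f}}{\partial \boldsymbol{k}}$ is a linear system in the sensitivities. A minimal sketch on a hypothetical one-metabolite balance $f(s, \boldsymbol{k}) = k_1 - k_2 s^2$ (an illustrative assumption, not a rate law from this study):

```python
import numpy as np

def steady_state_sensitivity(df_dy, df_dk):
    """Solve (df/dy) (dy/dk) = -(df/dk) for dy/dk at a steady state f(y, k) = 0."""
    return np.linalg.solve(df_dy, -df_dk)

# Hypothetical balance: f(s, k) = k1 - k2 * s**2, so s_ss = sqrt(k1 / k2).
k = np.array([2.0, 0.5])
s = np.sqrt(k[0] / k[1])                      # steady state, s = 2

df_ds = np.array([[-2.0 * k[1] * s]])         # partial of f with respect to s
df_dk = np.array([[1.0, -s**2]])              # partials with respect to k1 and k2

ds_dk = steady_state_sensitivity(df_ds, df_dk)
```

Differentiating the closed-form steady state $s = \sqrt{k_1/k_2}$ directly gives $\partial s/\partial k_1 = 1/(2 k_2 s)$ and $\partial s/\partial k_2 = -s/(2 k_2)$, which the linear solve reproduces without needing the closed form.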


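The partial derivatives of the product operators $\frac{\partial}{\partial \boldsymbol{k}} \left( \prod_{l \mid E_{lp} < 0} e_{l,c} \right)$ and $\frac{\partial}{\partial \boldsymbol{k}} \left( \prod_{i \mid S_{ip} \le 0} s_{i,c}^{-S_{ip}} \right)$ can be expressed in summation form as shown in Equations (63) and (64). Equations (85), (86), and (87) therefore form a $[(n_L + n_M) \times (n_L + n_M)]$ system of linear algebraic equations that can be solved to obtain $\frac{\partial \boldsymbol{e}_c}{\partial \boldsymbol{k}}$ and $\frac{\partial \boldsymbol{s}_c}{\partial \boldsymbol{k}}$ when $\boldsymbol{k}$, $\boldsymbol{e}_c$, and $\boldsymbol{s}_c$ are specified. $\frac{\partial \boldsymbol{v}_c}{\partial \boldsymbol{k}}$ is calculated by substituting $\frac{\partial \boldsymbol{e}_c}{\partial \boldsymbol{k}}$ and $\frac{\partial \boldsymbol{s}_c}{\partial \boldsymbol{k}}$ into Equation (84). Because $\boldsymbol{x}$ contains both WT enzyme fractions and elementary fluxes, $\frac{\partial \boldsymbol{k}}{\partial \boldsymbol{x}}$ is calculated by differentiating Equation (50) with respect to $\boldsymbol{x}$ using the product rule to yield: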

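$$\begin{aligned}
\frac{\partial}{\partial \boldsymbol{x}} \left( v_{r,l} + v_{l,1}^{(net)} \right) &= \frac{\partial k_{(2l-1)}}{\partial \boldsymbol{x}} \left( \prod_{l \mid E_{lp} < 0} e_{l,1} \right) + k_{(2l-1)} \frac{\partial}{\partial \boldsymbol{x}} \left( \prod_{l \mid E_{lp} < 0} e_{l,1}^{-E_{lp}} \right) \\
\frac{\partial v_{r,l}}{\partial \boldsymbol{x}} &= \frac{\partial k_{(2l)}}{\partial \boldsymbol{x}} \left( \prod_{l \mid E_{lp} < 0} e_{l,1} \right) + k_{(2l)} \frac{\partial}{\partial \boldsymbol{x}} \left( \prod_{l \mid E_{lp} < 0} e_{l,1}^{-E_{lp}} \right)
\end{aligned} \quad \forall p \in P,\ \forall l \in L \qquad (88)$$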

Solution to the $[n_P \times n_P]$ square system of linear algebraic equations formed by Equation (88) yields $\frac{\partial \boldsymbol{k}}{\partial \boldsymbol{x}}$. Flux sensitivities can be obtained by substituting $\frac{\partial \boldsymbol{k}}{\partial \boldsymbol{x}}$ in Equation (84). Having computed the sensitivity of elementary fluxes, the sensitivity of all net reaction fluxes is calculated by substituting $\frac{\partial \boldsymbol{v}}{\partial \boldsymbol{x}}$ in Equations (89) and (90), which are obtained by differentiating Equations (31) and (35) with respect to $\boldsymbol{x}$ as shown below.


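$$\frac{\partial v_{l,c}^{(net)}}{\partial \boldsymbol{x}} = \frac{\partial v_{(2l-1),c}}{\partial \boldsymbol{x}} - \frac{\partial v_{2l,c}}{\partial \boldsymbol{x}} \quad \forall l \in L,\ \forall c \in C \qquad (89)$$

$$\frac{\partial V_{j,c}}{\partial \boldsymbol{x}} = \sum_{l=1}^{n_L} \left( N_{jl} \frac{\partial v_{l,c}^{(net)}}{\partial \boldsymbol{x}} \right) \quad \forall j \in J,\ \forall c \in C \qquad (90)$$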
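Equations (89) and (90) are linear operations on the elementary-flux sensitivities and reduce to array slicing and one matrix product. The sketch below assumes that the rows of `dv_dx` are ordered so that elementary fluxes $2l-1$ (forward) and $2l$ (reverse) of step $l$ are adjacent, and that `N` is the $[n_J \times n_L]$ matrix with entries $N_{jl}$; the numerical values are made up for illustration.

```python
import numpy as np

def net_flux_sensitivities(dv_dx, N):
    """Apply Equations (89) and (90) to elementary-flux sensitivities.

    dv_dx : (2*n_L, n_x) array, rows ordered v_1, v_2, ..., v_{2 n_L}
    N     : (n_J, n_L) array with entries N_jl
    """
    # Equation (89): forward minus reverse sensitivity for each elementary step.
    dvnet_dx = dv_dx[0::2, :] - dv_dx[1::2, :]
    # Equation (90): weighted sum over elementary steps for each reaction.
    dV_dx = N @ dvnet_dx
    return dvnet_dx, dV_dx

# Made-up example with n_L = 2 elementary steps, n_x = 3 variables, n_J = 1 reaction.
dv_dx = np.array([[1.0, 0.0, 2.0],
                  [0.5, 0.0, 1.0],
                  [0.0, 3.0, 0.0],
                  [0.0, 1.0, 1.0]])
N = np.array([[1.0, 1.0]])

dvnet_dx, dV_dx = net_flux_sensitivities(dv_dx, N)
```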

The sequence of steps to be followed to compute the approximate gradients of the objective

function and update the variables 𝒙 in every iteration is described by the algorithmic

procedure for K-UPDATE.

Algorithmic procedure K-UPDATE

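begin
Specify and fix $\boldsymbol{k}$, $\boldsymbol{e}$, $\boldsymbol{s}$, and $\boldsymbol{V}$ computed by K-SOLVE and SSF-Evaluator
Specify measured fluxes $\boldsymbol{V}^{(meas)}$ and the weighting matrix $\boldsymbol{W}$
Specify list of $mutants$
Compute $\frac{\partial \boldsymbol{k}}{\partial \boldsymbol{x}}$ by solving the $[n_P \times n_P]$ system of linear Equation (88) using $\boldsymbol{e}_1 := \boldsymbol{e}_1^{(SS)}$ and $\boldsymbol{k} := \boldsymbol{k}$
for all mutants:
    Calculate sensitivities $\frac{\partial \boldsymbol{s}_c^{(SS)}}{\partial \boldsymbol{k}}$ and $\frac{\partial \boldsymbol{e}_c^{(SS)}}{\partial \boldsymbol{k}}$ by solving the $[(n_L + n_M) \times (n_L + n_M)]$ system of linear Equations (85), (86), and (87) using $\boldsymbol{e}_c := \boldsymbol{e}_c^{(SS)}$ and $\boldsymbol{s}_c := \boldsymbol{s}_c^{(SS)}$
    Calculate $\frac{\partial \boldsymbol{v}_c^{(SS)}}{\partial \boldsymbol{k}}$ by substituting $\frac{\partial \boldsymbol{s}_c^{(SS)}}{\partial \boldsymbol{k}}$ and $\frac{\partial \boldsymbol{e}_c^{(SS)}}{\partial \boldsymbol{k}}$ in Equation (84).
    Calculate $\frac{\partial \boldsymbol{v}_c^{(SS)}}{\partial \boldsymbol{x}}$ by substituting $\frac{\partial \boldsymbol{v}_c^{(SS)}}{\partial \boldsymbol{k}}$ and $\frac{\partial \boldsymbol{k}}{\partial \boldsymbol{x}}$ into Equation (83).
    Calculate $\frac{\partial \boldsymbol{v}_c^{(net)}}{\partial \boldsymbol{x}}$ by substituting $\frac{\partial \boldsymbol{v}_c^{(SS)}}{\partial \boldsymbol{x}}$ into Equation (89).
    Calculate $\frac{\partial \boldsymbol{V}_c}{\partial \boldsymbol{x}}$ by substituting $\frac{\partial \boldsymbol{v}_c^{(net)}}{\partial \boldsymbol{x}}$ into Equation (90).
Assemble the residual vector $\boldsymbol{r}(\boldsymbol{x})$ from $\boldsymbol{V}$ and $\boldsymbol{V}^{(meas)}$
Assemble the sensitivity matrix $\frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}}$ from $\frac{\partial \boldsymbol{V}}{\partial \boldsymbol{x}}$
Compute the objective function $\phi$ by substituting $\boldsymbol{r}(\boldsymbol{x})$ and $\boldsymbol{W}$ into Equation (75).
Compute the approximate gradient $\boldsymbol{G}$ and the approximate Hessian $\boldsymbol{H}$ by substituting $\boldsymbol{r}(\boldsymbol{x})$, $\frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}}$, and $\boldsymbol{W}$ into Equation (79)
return $\phi$, $\boldsymbol{G}$, and $\boldsymbol{H}$
end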

C.7. Algorithmic description of K-FIT

The procedures K-SOLVE, SSF-Evaluator, and K-UPDATE are integrated into the algorithm K-FIT as described below. Briefly, WT enzyme fractions $\boldsymbol{e}_1$ and reverse elementary fluxes $\boldsymbol{v}_r$ satisfying (in)equalities in Equations (40), (44), (52), and (53) are randomly initialized. For convenience, we combine the operations of K-SOLVE and SSF-Evaluator into a single algorithm FLUXSOLVE, which predicts steady-state fluxes in mutant networks given $\boldsymbol{e}_1$ and $\boldsymbol{v}_r$. In the first step of FLUXSOLVE, kinetic parameters $\boldsymbol{k}$ anchored to WT steady-state fluxes $\boldsymbol{V}_1$ are computed from $\boldsymbol{e}_1$ and $\boldsymbol{v}_r$ using procedure K-SOLVE. The kinetic parameters are then used to evaluate steady-state fluxes in mutant networks using procedure SSF-Evaluator. Having computed steady-state fluxes in mutants $\boldsymbol{V}$, relative metabolite concentrations $\boldsymbol{s}$, and enzyme fractions $\boldsymbol{e}$, the objective function $\phi$ and its approximate gradient $\boldsymbol{G}$ and Hessian $\boldsymbol{H}$ are computed using procedure K-UPDATE. $\boldsymbol{G}$ and $\boldsymbol{H}$ are used to check for convergence and update $\boldsymbol{e}_1$ and $\boldsymbol{v}_r$ if optimality is not achieved.


The algorithmic description of FLUXSOLVE is provided below:

Algorithmic description of FLUXSOLVE

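begin
Specify and fix WT enzyme fractions $\boldsymbol{e}_1$ and reverse elementary fluxes $\boldsymbol{v}_r$.
Specify and fix the WT steady-state flux distribution $\boldsymbol{V}_1$.
Specify the list of $mutants$
Compute anchored kinetic parameters $\boldsymbol{k}$ using Procedure K-SOLVE with the specified $\boldsymbol{e}_1$, $\boldsymbol{v}_r$, and $\boldsymbol{V}_1$.
for all mutants
    Compute steady-state fluxes $\boldsymbol{V}_c^{(SS)}$, relative metabolite concentrations $\boldsymbol{s}_c^{(SS)}$, and enzyme fractions $\boldsymbol{e}_c^{(SS)}$ in $mutant\ c \in C$ using SSF-Evaluator with kinetic parameters $\boldsymbol{k}$
    Set $\boldsymbol{V}_c := \boldsymbol{V}_c^{(SS)}$, $\boldsymbol{s}_c := \boldsymbol{s}_c^{(SS)}$, and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(SS)}$
return $\boldsymbol{V}$, $\boldsymbol{s}$, and $\boldsymbol{e}$
end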

The overall workflow for the K-FIT algorithm combining procedures FLUXSOLVE and

K-UPDATE is described below and is also pictorially shown in Figure 4.4:

Overall algorithmic procedure K-FIT

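begin
Specify and fix WT flux distribution $\boldsymbol{V}_1$, measured fluxes $\boldsymbol{V}^{(meas)}$, variance $\boldsymbol{W}$, set of mutants, $xtol$ and $gtol$
Randomly initialize $\boldsymbol{x}$ satisfying constraints in Equations (40), (44), (52), and (53).
Set $stol := 10^{-6}$
Using FLUXSOLVE and inputs $\boldsymbol{x}$ evaluate initial steady-state fluxes $\boldsymbol{V}$, relative metabolite concentrations $\boldsymbol{s}$, and enzyme fractions $\boldsymbol{e}$.
Evaluate the initial value of the objective function $\phi(\boldsymbol{x})$, the approximate gradient $\boldsymbol{G}$, and the approximate Hessian $\boldsymbol{H}$ using K-UPDATE.
Set $done := false$
Set $\boldsymbol{x}_{best} := \boldsymbol{x}$, $\phi_{best} := \phi(\boldsymbol{x})$
while (not $done$)
    Compute $\Delta\boldsymbol{x}$ using Equation (82)
    if ($\| \Delta\boldsymbol{x} \|_\infty \le xtol$) or ($\| \boldsymbol{G} \|_\infty \le gtol$)
        Set $done := true$
    else
        Update $\boldsymbol{x} := \boldsymbol{x}_{best} + \Delta\boldsymbol{x}$
        Using FLUXSOLVE and inputs $\boldsymbol{x}$ evaluate steady-state fluxes $\boldsymbol{V}$, relative metabolite concentrations $\boldsymbol{s}$, and enzyme fractions $\boldsymbol{e}$.
        Evaluate the objective function $\phi(\boldsymbol{x})$, the approximate gradient $\boldsymbol{G}$, and the approximate Hessian $\boldsymbol{H}$ using K-UPDATE.
        if $\phi(\boldsymbol{x}) < \phi_{best}$
            Update $\boldsymbol{x}_{best} := \boldsymbol{x}$, $\phi_{best} := \phi(\boldsymbol{x})$
return $\boldsymbol{x}_{best}$, $\phi_{best}$
end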
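The outer loop can be sketched compactly once FLUXSOLVE and the sensitivity calculation are treated as black boxes. In the sketch below, `fluxsolve` and `sensitivities` are hypothetical callables standing in for FLUXSOLVE and the $\partial \boldsymbol{V}/\partial \boldsymbol{x}$ part of K-UPDATE; the three-flux toy model is invented for illustration, the linear constraints of Equations (40), (44), (52), and (53) are not enforced, and the `max_iter` cap is an added safeguard.

```python
import numpy as np

def k_fit(x0, fluxsolve, sensitivities, V_meas, W,
          xtol=1e-8, gtol=1e-8, max_iter=100):
    W_inv = np.linalg.inv(W)

    def k_update(x):
        r = fluxsolve(x) - V_meas                  # residual vector r(x)
        dr_dx = sensitivities(x)                   # sensitivity matrix
        phi = r @ W_inv @ r                        # Equation (75)
        return phi, dr_dx.T @ W_inv @ r, dr_dx.T @ W_inv @ dr_dx   # Equation (79)

    x = np.asarray(x0, dtype=float)
    phi, G, H = k_update(x)
    x_best, phi_best = x.copy(), phi
    for _ in range(max_iter):
        dx = -np.linalg.solve(H, G)                # Equation (82)
        if np.max(np.abs(dx)) <= xtol or np.max(np.abs(G)) <= gtol:
            break
        x = x_best + dx
        phi, G, H = k_update(x)
        if phi < phi_best:
            x_best, phi_best = x.copy(), phi
    return x_best, phi_best

# Invented toy model with "measurements" generated at x_true = [2, 3].
def fluxsolve(x):
    return np.array([x[0] ** 2, x[0] * x[1], x[1]])

def sensitivities(x):
    return np.array([[2.0 * x[0], 0.0],
                     [x[1], x[0]],
                     [0.0, 1.0]])

V_meas = fluxsolve(np.array([2.0, 3.0]))
W = 0.01 * np.eye(3)

x_fit, phi_fit = k_fit([1.5, 2.5], fluxsolve, sensitivities, V_meas, W)
```

Because the toy data are noise-free, the residual vanishes at the solution and the iterates converge rapidly from the chosen starting point; with real measurements the final $\phi$ would instead reflect the measurement error.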


References

Abdel-Hamid, A.M., Attwood, M.M., and Guest, J.R. (2001). Pyruvate oxidase contributes to the

aerobic growth efficiency of Escherichia coli. Microbiology 147, 1483-1498.

Abernathy, M.H., Yu, J., Ma, F., Liberton, M., Ungerer, J., Hollinshead, W.D., Gopalakrishnan,

S., He, L., Maranas, C.D., Pakrasi, H.B., et al. (2017). Deciphering cyanobacterial

phenotypes for fast photoautotrophic growth via isotopically nonstationary metabolic flux

analysis. Biotechnology for Biofuels 10, 273.

Ahn, W.S., and Antoniewicz, M.R. (2011). Metabolic flux analysis of CHO cells at growth and

non-growth phases using isotopic tracers and mass spectrometry. Metabolic engineering

13, 598-609.

Alagesan, S., Gaudana, S.B., Sinha, A., and Wangikar, P.P. (2013). Metabolic flux analysis of

Cyanothece sp. ATCC 51142 under mixotrophic conditions. Photosynth Res 118, 191-

198.

Anderson, D.H. (1983). Compartmental Modeling and Tracer Kinetics. (Springer-Verlag Berlin

Heidelberg).

Anderson, L.E., and Carol, A.A. (2004). Enzyme co-localization with rubisco in pea leaf

chloroplasts. Photosynth Res 82, 49-58.

Anderson, L.E., Gatla, N., and Carol, A.A. (2005). Enzyme co-localization in pea leaf

chloroplasts: glyceraldehyde-3-P dehydrogenase, triose-P isomerase, aldolase and

sedoheptulose bisphosphatase. Photosynth Res 83, 317-328.

Antoniewicz, M.R., Kelleher, J.K., and Stephanopoulos, G. (2006). Determination of confidence

intervals of metabolic fluxes estimated from stable isotope measurements. Metabolic

engineering 8, 324-337.

Antoniewicz, M.R., Kelleher, J.K., and Stephanopoulos, G. (2007). Elementary metabolite units

(EMU): a novel framework for modeling isotopic distributions. Metabolic engineering 9,

68-86.

Atsumi, S., Higashide, W., and Liao, J.C. (2009). Direct photosynthetic recycling of carbon

dioxide to isobutyraldehyde. Nat Biotechnol 27, 1177-1180.

Baba, T., Ara, T., Hasegawa, M., Takai, Y., Okumura, Y., Baba, M., Datsenko, K.A., Tomita, M.,

Wanner, B.L., and Mori, H. (2006). Construction of Escherichia coli K-12 in-frame,


single-gene knockout mutants: the Keio collection. Molecular systems biology 2, 2006

0008.

Banga, J.R., and Balsa-Canto, E. (2008). Parameter estimation and optimal experimental design.

Essays Biochem 45, 195-209.

Bonarius, H.P., Timmerarends, B., de Gooijer, C.D., and Tramper, J. (1998). Metabolite-

balancing techniques vs. 13C tracer experiments to determine metabolic fluxes in

hybridoma cells. Biotechnol Bioeng 58, 258-262.

Bricker, T.M., Zhang, S., Laborde, S.M., Mayer, P.R., 3rd, Frankel, L.K., and Moroney, J.V.

(2004). The malic enzyme is required for optimal photoautotrophic growth of

Synechocystis sp. strain PCC 6803 under continuous light but not under a diurnal light

regimen. Journal of bacteriology 186, 8144-8148.

Briggs, G.E., and Haldane, J.B. (1925). A Note on the Kinetics of Enzyme Action. Biochem J 19,

338-339.

Burgard, A.P., Nikolaev, E.V., Schilling, C.H., and Maranas, C.D. (2004). Flux coupling analysis

of genome-scale metabolic network reconstructions. Genome Res 14, 301-312.

Burgard, A.P., Pharkya, P., and Maranas, C.D. (2003). Optknock: a bilevel programming

framework for identifying gene knockout strategies for microbial strain optimization.

Biotechnol Bioeng 84, 647-657.

Byrd, R.H., Gilbert, J.C., and Nocedal, J. (2000). A trust region method based on interior point

techniques for nonlinear programming. Math. Program. 89, 149-185.

Byrd, R.H., Hribar, M.E., and Nocedal, J. (1999). An Interior Point Algorithm for Large-Scale

Nonlinear Programming. SIAM J. on Optimization 9, 877-900.

Caspi, R., Altman, T., Billington, R., Dreher, K., Foerster, H., Fulcher, C.A., Holland, T.A.,

Keseler, I.M., Kothari, A., Kubo, A., et al. (2014). The MetaCyc database of metabolic

pathways and enzymes and the BioCyc collection of Pathway/Genome Databases.

Nucleic Acids Res 42, D459-471.

Chae, T.U., Choi, S.Y., Kim, J.W., Ko, Y.S., and Lee, S.Y. (2017). Recent advances in systems

metabolic engineering tools and strategies. Current opinion in biotechnology 47, 67-82.

Chang, Y., Suthers, P.F., and Maranas, C.D. (2008). Identification of optimal measurement sets

for complete flux elucidation in metabolic flux analysis experiments. Biotechnol Bioeng

100, 1039-1049.


Chassagnole, C., Noisommit-Rizzi, N., Schmid, J.W., Mauch, K., and Reuss, M. (2002).

Dynamic modeling of the central carbon metabolism of Escherichia coli. Biotechnol

Bioeng 79, 53-73.

Chen, W.L., Chen, D.Z., and Taylor, K.T. (2013). Automatic reaction mapping and reaction

center detection. Wiley Interdisciplinary Reviews: Computational Molecular Science 3,

560-593.

Chen, X., Alonso, A.P., Allen, D.K., Reed, J.L., and Shachar-Hill, Y. (2011). Synergy between

(13)C-metabolic flux analysis and flux balance analysis for understanding metabolic

adaptation to anaerobiosis in E. coli. Metabolic engineering 13, 38-48.

Chen, X., Schreiber, K., Appel, J., Makowka, A., Fähnrich, B., Roettger, M., Hajirezaei, M.R.,

Sönnichsen, F.D., Schönheit, P., Martin, W.F., et al. (2016). The Entner–Doudoroff

pathway is an overlooked glycolytic route in cyanobacteria and plants. Proceedings of the

National Academy of Sciences 113, 5441-5446.

Cheng, J.K., and Alper, H.S. (2014). The genome editing toolbox: a spectrum of approaches for

targeted modification. Current opinion in biotechnology 30, 87-94.

Cho, S., Shin, J., and Cho, B.K. (2018). Applications of CRISPR/Cas System to Bacterial

Metabolic Engineering. Int J Mol Sci 19.

Choi, J., and Antoniewicz, M.R. (2019). Tandem Mass Spectrometry for (13)C Metabolic Flux

Analysis: Methods and Algorithms Based on EMU Framework. Front Microbiol 10, 31.

Chowdhury, A., Khodayari, A., and Maranas, C.D. (2015a). Improving prediction fidelity of

cellular metabolism with kinetic descriptions. Current opinion in biotechnology 36, 57-

64.

Chowdhury, A., Zomorrodi, A.R., and Maranas, C.D. (2014). k-OptForce: integrating kinetics

with flux balance analysis for strain design. PLoS Comput Biol 10, e1003487.

Chowdhury, A., Zomorrodi, A.R., and Maranas, C.D. (2015b). Bilevel optimization techniques in

computational strain design. Computers & Chemical Engineering 72, 363-372.

Clasquin, M.F., Melamud, E., Singer, A., Gooding, J.R., Xu, X., Dong, A., Cui, H., Campagna,

S.R., Savchenko, A., Yakunin, A.F., et al. (2011). Riboneogenesis in yeast. Cell 145,

969-980.

Cleland, W.W. (1963). The kinetics of enzyme-catalyzed reactions with two or more substrates or

products: I. Nomenclature and rate equations. Biochimica et Biophysica Acta (BBA) -

Specialized Section on Enzymological Subjects 67, 104-137.


Copeland, W.B., Bartley, B.A., Chandran, D., Galdzicki, M., Kim, K.H., Sleight, S.C., Maranas,

C.D., and Sauro, H.M. (2012). Computational tools for metabolic engineering. Metabolic

engineering 14, 270-280.

Costa, R.S., Verissimo, A., and Vinga, S. (2014). KiMoSys: a web-based repository of

experimental data for KInetic MOdels of biological SYStems. BMC systems biology 8,

85.

Crown, S.B., and Antoniewicz, M.R. (2012). Selection of tracers for 13C-metabolic flux analysis

using elementary metabolite units (EMU) basis vector methodology. Metabolic

engineering 14, 150-161.

Crown, S.B., Indurthi, D.C., Ahn, W.S., Choi, J., Papoutsakis, E.T., and Antoniewicz, M.R.

(2011). Resolving the TCA cycle and pentose-phosphate pathway of Clostridium

acetobutylicum ATCC 824: Isotopomer analysis, in vitro activities and expression

analysis. Biotechnol J 6, 300-305.

Crown, S.B., Long, C.P., and Antoniewicz, M.R. (2015). Integrated 13C-metabolic flux analysis

of 14 parallel labeling experiments in Escherichia coli. Metabolic engineering 28, 151-

158.

Dash, S., Khodayari, A., Zhou, J., Holwerda, E.K., Olson, D.G., Lynd, L.R., and Maranas, C.D.

(2017). Development of a core Clostridium thermocellum kinetic metabolic model

consistent with multiple genetic perturbations. Biotechnol Biofuels 10, 108.

Dash, S., Mueller, T.J., Venkataramanan, K.P., Papoutsakis, E.T., and Maranas, C.D. (2014).

Capturing the response of Clostridium acetobutylicum to chemical stressors using a

regulated genome-scale metabolic model. Biotechnol Biofuels 7, 144.

Dromms, R.A., and Styczynski, M.P. (2012). Systematic applications of metabolomics in

metabolic engineering. Metabolites 2, 1090-1122.

Drud, A. (1985). CONOPT: A GRG code for large sparse dynamic nonlinear optimization

problems. Math. Program. 31, 153-191.

Du, B., Zielinski, D.C., Kavvas, E.S., Drager, A., Tan, J., Zhang, Z., Ruggiero, K.E.,

Arzumanyan, G.A., and Palsson, B.O. (2016). Evaluation of rate law approximations in

bottom-up kinetic models of metabolism. BMC systems biology 10, 40.

Eisenhut, M., Ruth, W., Haimovich, M., Bauwe, H., Kaplan, A., and Hagemann, M. (2008). The

photorespiratory glycolate metabolism is essential for cyanobacteria and might have been

conveyed endosymbiontically to plants. Proc Natl Acad Sci U S A 105, 17199-17204.


Feist, A.M., Henry, C.S., Reed, J.L., Krummenacker, M., Joyce, A.R., Karp, P.D., Broadbelt,

L.J., Hatzimanikatis, V., and Palsson, B.O. (2007). A genome-scale metabolic

reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and

thermodynamic information. Molecular systems biology 3, 121.

Feng, X., Bandyopadhyay, A., Berla, B., Page, L., Wu, B., Pakrasi, H.B., and Tang, Y.J. (2010).

Mixotrophic and photoheterotrophic metabolism in Cyanothece sp. ATCC 51142 under

continuous light. Microbiology 156, 2566-2574.

Flores, S., Gosset, G., Flores, N., de Graaf, A.A., and Bolivar, F. (2002). Analysis of carbon

metabolism in Escherichia coli strains with an inactive phosphotransferase system by

(13)C labeling and NMR spectroscopy. Metabolic engineering 4, 124-137.

Foster, C.J., Gopalakrishnan, S., Antoniewicz, M.R., and Maranas, C.D. (2019 (Under Review)).

From E. coli mutant 13C labeling data to a core kinetic model: A kinetic model

parameterization pipeline.

Franklin, G.F., Powell, D.J., and Workman, M.L. (1997). Digital Control of Dynamic Systems

(3rd Edition). (Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc.).

Frohlich, F., Kaltenbacher, B., Theis, F.J., and Hasenauer, J. (2017). Scalable Parameter

Estimation for Genome-Scale Biochemical Reaction Networks. PLoS Comput Biol 13,

e1005331.

Frohlich, F., Kessler, T., Weindl, D., Shadrin, A., Schmiester, L., Hache, H., Muradyan, A.,

Schutte, M., Lim, J.H., Heinig, M., et al. (2018). Efficient Parameter Estimation Enables

the Prediction of Drug Response Using a Mechanistic Pan-Cancer Pathway Model. Cell

Syst 7, 567-579 e566.

Fuhrer, T., Zampieri, M., Sevin, D.C., Sauer, U., and Zamboni, N. (2017). Genomewide

landscape of gene-metabolome associations in Escherichia coli. Molecular systems

biology 13, 907.

Gill, P.E., Murray, W., and Wright, M.H. (1984). Practical Optimization. (London: Academic

Press).

Girgis, H.S., Harris, K., and Tavazoie, S. (2012). Large mutational target size for rapid

emergence of bacterial persistence. Proceedings of the National Academy of Sciences of

the United States of America 109, 12740-12745.

Giuliano, G. (2014). Plant carotenoids: genomics meets multi-gene engineering. Curr Opin Plant

Biol 19, 111-117.


Golub, G.H., and Loan, C.F.V. (1996). Matrix computations (3rd ed.). (Johns Hopkins University

Press).

Gopalakrishnan, S., and Maranas, C.D. (2015a). 13C metabolic flux analysis at a genome-scale.

Metabolic engineering 32, 12-22.

Gopalakrishnan, S., and Maranas, C.D. (2015b). Achieving Metabolic Flux Analysis for S.

cerevisiae at a Genome-Scale: Challenges, Requirements, and Considerations.

Metabolites 5, 521-535.

Greene, J.L., Waechter, A., Tyo, K.E.J., and Broadbelt, L.J. (2017). Acceleration Strategies to

Enhance Metabolic Ensemble Modeling Performance. Biophysical journal 113, 1150-

1162.

Hackett, S.R., Zanotelli, V.R., Xu, W., Goya, J., Park, J.O., Perlman, D.H., Gibney, P.A.,

Botstein, D., Storey, J.D., and Rabinowitz, J.D. (2016). Systems-level analysis of

mechanisms regulating yeast metabolic flux. Science 354.

Hasunuma, T., Kikuyama, F., Matsuda, M., Aikawa, S., Izumi, Y., and Kondo, A. (2013).

Dynamic metabolic profiling of cyanobacterial glycogen biosynthesis under conditions of

nitrate depletion. J Exp Bot 64, 2943-2954.

Hatzimanikatis, V., and Bailey, J.E. (1997). Effects of spatiotemporal variations on metabolic

control: approximate analysis using (log)linear kinetic models. Biotechnol Bioeng 54, 91-104.

Heijnen, J.J., and Verheijen, P.J. (2013). Parameter identification of in vivo kinetic models:

limitations and challenges. Biotechnol J 8, 768-775.

Hendry, J.I., Gopalakrishnan, S., Ungerer, J., Pakrasi, H.B., Tang, Y.J., and Maranas, C.D.

(2019). Genome-Scale Fluxome of Synechococcus elongatus UTEX 2973 Using

Transient (13)C-Labeling Data. Plant Physiol 179, 761-769.

Holms, H. (1996). Flux analysis and control of the central metabolic pathways in Escherichia

coli. FEMS microbiology reviews 19, 85-116.

Hoops, S., Sahle, S., Gauges, R., Lee, C., Pahle, J., Simus, N., Singhal, M., Xu, L., Mendes, P.,

and Kummer, U. (2006). COPASI--a COmplex PAthway SImulator. Bioinformatics 22,

3067-3074.

Hoque, M.A., Fard, A.T., Rahman, M., Alattas, O., Akazawa, K., and Merican, A.F. (2011).

Comparison of dynamic responses of cellular metabolites in Escherichia coli to pulse

addition of substrates. Biologia 66, 954.

Hua, Q., Yang, C., Baba, T., Mori, H., and Shimizu, K. (2003). Responses of the central

metabolism in Escherichia coli to phosphoglucose isomerase and glucose-6-phosphate

dehydrogenase knockouts. Journal of bacteriology 185, 7053-7067.

Huege, J., Goetze, J., Schwarz, D., Bauwe, H., Hagemann, M., and Kopka, J. (2011). Modulation

of the major paths of carbon in photorespiratory mutants of Synechocystis. PLoS One 6,

e16278.

Huege, J., Sulpice, R., Gibon, Y., Lisec, J., Koehl, K., and Kopka, J. (2007). GC-EI-TOF-MS

analysis of in vivo carbon-partitioning into soluble metabolite pools of higher plants by

monitoring isotope dilution after 13CO2 labelling. Phytochemistry 68, 2258-2272.

Ishii, N., Nakahigashi, K., Baba, T., Robert, M., Soga, T., Kanai, A., Hirasawa, T., Naba, M.,

Hirai, K., Hoque, A., et al. (2007). Multiple high-throughput analyses monitor the

response of E. coli to perturbations. Science 316, 593-597.

Jahan, N., Maeda, K., Matsuoka, Y., Sugimoto, Y., and Kurata, H. (2016). Development of an

accurate kinetic model for the central carbon metabolism of Escherichia coli. Microbial

cell factories 15, 112.

Jamshidi, N., and Palsson, B.O. (2008). Formulating genome-scale kinetic models in the post-

genome era. Mol Syst Biol 4, 171.

Jochum, C., Gasteiger, J., and Ugi, I. (1980). The Principle of Minimum Chemical Distance

(PMCD). Angewandte Chemie International Edition in English 19, 495-505.

Khodayari, A., and Maranas, C.D. (2016). A genome-scale Escherichia coli kinetic metabolic

model k-ecoli457 satisfying flux data for multiple mutant strains. Nat Commun 7, 13806.

Khodayari, A., Zomorrodi, A.R., Liao, J.C., and Maranas, C.D. (2014). A kinetic model of

Escherichia coli core metabolism satisfying multiple sets of mutant flux data. Metabolic

engineering 25, 50-62.

Kim, J., Reed, J.L., and Maravelias, C.T. (2011). Large-scale bi-level strain design approaches

and mixed-integer programming solution techniques. PLoS One 6, e24162.

Klemke, F., Baier, A., Knoop, H., Kern, R., Jablonsky, J., Beyer, G., Volkmer, T., Steuer, R.,

Lockau, W., and Hagemann, M. (2015). Identification of the light-independent

phosphoserine pathway as an additional source of serine in the cyanobacterium

Synechocystis sp. PCC 6803. Microbiology 161, 1050-1060.

Knoop, H., Zilliges, Y., Lockau, W., and Steuer, R. (2010). The metabolic network of

Synechocystis sp. PCC 6803: systemic properties of autotrophic growth. Plant Physiol

154, 410-422.

Korner, R., and Apostolakis, J. (2008). Automatic determination of reaction mappings and

reaction center information. 1. The imaginary transition state energy approach. J Chem

Inf Model 48, 1181-1189.

Kotte, O., Zaugg, J.B., and Heinemann, M. (2010). Bacterial adaptation through distributed

sensing of metabolic fluxes. Molecular systems biology 6, 355.

Kucho, K., Okamoto, K., Tsuchiya, Y., Nomura, S., Nango, M., Kanehisa, M., and Ishiura, M.

(2005). Global analysis of circadian expression in the cyanobacterium Synechocystis sp.

strain PCC 6803. Journal of bacteriology 187, 2190-2199.

Kumar, A., and Maranas, C.D. (2014). CLCA: maximum common molecular substructure queries

within the MetRxn database. J Chem Inf Model 54, 3417-3438.

Kumar, A., Suthers, P.F., and Maranas, C.D. (2012). MetRxn: a knowledgebase of metabolites

and reactions spanning metabolic models and databases. BMC bioinformatics 13, 6.

Lafontaine Rivera, J.G., Theisen, M.K., Chen, P.W., and Liao, J.C. (2017). Kinetically accessible

yield (KAY) for redirection of metabolism to produce exo-metabolites. Metabolic

engineering 41, 144-151.

Latendresse, M., Malerich, J.P., Travers, M., and Karp, P.D. (2012). Accurate atom-mapping

computation for biochemical reactions. J Chem Inf Model 52, 2970-2982.

Leighty, R.W., and Antoniewicz, M.R. (2012). Parallel labeling experiments with [U-

13C]glucose validate E. coli metabolic network model for 13C metabolic flux analysis.

Metabolic engineering 14, 533-541.

Leighty, R.W., and Antoniewicz, M.R. (2013). COMPLETE-MFA: complementary parallel

labeling experiments technique for metabolic flux analysis. Metabolic engineering 20,

49-55.

Li, M., Yao, S., and Shimizu, K. (2007). Effect of poxB gene knockout on metabolism in

Escherichia coli based on growth characteristics and enzyme activities. World J

Microbiol Biotechnol 23, 573-580.

Liang, F., and Lindblad, P. (2016). Effects of overexpressing photosynthetic carbon flux control

enzymes in the cyanobacterium Synechocystis PCC 6803. Metabolic engineering 38, 56-64.

Long, C.P., and Antoniewicz, M.R. (2014). Quantifying biomass composition by gas

chromatography/mass spectrometry. Analytical chemistry 86, 9423-9427.

Long, C.P., Gonzalez, J.E., Feist, A.M., Palsson, B.O., and Antoniewicz, M.R. (2018). Dissecting

the genetic and metabolic mechanisms of adaptation to the knockout of a major metabolic

enzyme in Escherichia coli. Proc Natl Acad Sci U S A 115, 222-227.

Long, M.R., Ong, W.K., and Reed, J.L. (2015). Computational methods in metabolic engineering

for strain design. Current opinion in biotechnology 34, 135-141.

Luo, B., Groenke, K., Takors, R., Wandrey, C., and Oldiges, M. (2007). Simultaneous

determination of multiple intracellular metabolites in glycolysis, pentose phosphate

pathway and tricarboxylic acid cycle by liquid chromatography-mass spectrometry. J

Chromatogr A 1147, 153-164.

Machado, D., and Herrgard, M. (2014). Systematic evaluation of methods for integration of

transcriptomic data into constraint-based models of metabolism. PLoS Comput Biol 10,

e1003580.

Madsen, K., Nielsen, H.B., and Tingleff, O. (2004). Methods for Non-Linear Least Squares

Problems (2nd Edition). (Kongens Lyngby: Technical University of Denmark).

Mahadevan, R., and Schilling, C.H. (2003). The effects of alternate optimal solutions in

constraint-based genome-scale metabolic models. Metabolic engineering 5, 264-276.

Maurino, V.G., and Weber, A.P. (2013). Engineering photosynthesis in plants and synthetic

microorganisms. J Exp Bot 64, 743-751.

McCloskey, D., Young, J.D., Xu, S., Palsson, B.O., and Feist, A.M. (2016a). MID Max: LC-

MS/MS Method for Measuring the Precursor and Product Mass Isotopomer Distributions

of Metabolic Intermediates and Cofactors for Metabolic Flux Analysis Applications.

Analytical chemistry 88, 1362-1370.

McCloskey, D., Young, J.D., Xu, S., Palsson, B.O., and Feist, A.M. (2016b). Modeling Method

for Increased Precision and Scope of Directly Measurable Fluxes at a Genome-Scale.

Analytical chemistry 88, 3844-3852.

Metallo, C.M., Gameiro, P.A., Bell, E.L., Mattaini, K.R., Yang, J., Hiller, K., Jewell, C.M.,

Johnson, Z.R., Irvine, D.J., Guarente, L., et al. (2012). Reductive glutamine metabolism

by IDH1 mediates lipogenesis under hypoxia. Nature 481, 380-384.

Metallo, C.M., Walther, J.L., and Stephanopoulos, G. (2009). Evaluation of 13C isotopic tracers

for metabolic flux analysis in mammalian cells. Journal of biotechnology 144, 167-174.

Millard, P., Smallbone, K., and Mendes, P. (2017). Metabolic regulation is sufficient for global

and robust coordination of glucose uptake, catabolism, energy production and growth in

Escherichia coli. PLoS Comput Biol 13, e1005396.

Miskovic, L., and Hatzimanikatis, V. (2010). Production of biofuels and biochemicals: in need of

an ORACLE. Trends Biotechnol 28, 391-397.

Moler, C., and Van Loan, C. (2003). Nineteen Dubious Ways to Compute the Exponential of a

Matrix, Twenty-Five Years Later. SIAM Review 45, 3-49.

Mollney, M., Wiechert, W., Kownatzki, D., and de Graaf, A.A. (1999). Bidirectional reaction

steps in metabolic networks: IV. Optimal design of isotopomer labeling experiments.

Biotechnology and bioengineering 66, 86-103.

Monod, J., Wyman, J., and Changeux, J.P. (1965). On the Nature of Allosteric Transitions: A

Plausible Model. J Mol Biol 12, 88-118.

Morgan, H.L. (1965). The Generation of a Unique Machine Description for Chemical Structures-

A Technique Developed at Chemical Abstracts Service. Journal of Chemical

Documentation 5, 107-113.

Murphy, T.A., Dang, C.V., and Young, J.D. (2013). Isotopically nonstationary 13C flux analysis

of Myc-induced metabolic reprogramming in B-cells. Metabolic engineering 15, 206-217.

Murtagh, B.A., and Saunders, M.A. (1978). Large-scale linearly constrained optimization. Math.

Program. 14, 41-72.

Nakahara, K., Yamamoto, H., Miyake, C., and Yokota, A. (2003). Purification and

characterization of class-I and class-II fructose-1,6-bisphosphate aldolases from the

cyanobacterium Synechocystis sp. PCC 6803. Plant Cell Physiol 44, 326-333.

Nazem-Bokaee, H., Gopalakrishnan, S., Ferry, J.G., Wood, T.K., and Maranas, C.D. (2016).

Assessing methanotrophy and carbon fixation for biofuel production by Methanosarcina

acetivorans. Microbial cell factories 15, 10.

Neidhardt, F.C., and Curtiss, R. (1996). Escherichia coli and Salmonella: cellular and molecular

biology.

Nielsen, J. (2003). It is all about metabolic fluxes. Journal of bacteriology 185, 7031-7035.

Nogales, J., Gudmundsson, S., Knight, E.M., Palsson, B.O., and Thiele, I. (2012). Detailing the

optimality of photosynthesis in cyanobacteria through systems biology analysis. Proc

Natl Acad Sci U S A 109, 2678-2683.

Noh, K., Gronke, K., Luo, B., Takors, R., Oldiges, M., and Wiechert, W. (2007). Metabolic flux

analysis at ultra short time scale: isotopically non-stationary 13C labeling experiments.

Journal of biotechnology 129, 249-267.

Noh, K., Wahl, A., and Wiechert, W. (2006). Computational tools for isotopically instationary

13C labeling experiments under metabolic steady state conditions. Metabolic engineering

8, 554-577.

Noh, K., and Wiechert, W. (2011). The benefits of being transient: isotope-based metabolic flux

analysis at the short time scale. Applied microbiology and biotechnology 91, 1247-1265.

Noor, E., Flamholz, A., Bar-Even, A., Davidi, D., Milo, R., and Liebermeister, W. (2016). The

Protein Cost of Metabolic Fluxes: Prediction from Enzymatic Rate Laws and Cost

Minimization. PLoS Comput Biol 12, e1005167.

O'Byrne, C.P., Feehily, C., Ham, R., and Karatzas, K.A. (2011). A modified rapid enzymatic

microtiter plate assay for the quantification of intracellular gamma-aminobutyric acid and

succinate semialdehyde in bacterial cells. J Microbiol Methods 84, 137-139.

Patil, K.R., Rocha, I., Forster, J., and Nielsen, J. (2005). Evolutionary programming as a platform

for in silico metabolic engineering. BMC Bioinformatics 6, 308.

Pazman, A. (1993). Nonlinear statistical models.

Pharkya, P., Burgard, A.P., and Maranas, C.D. (2004). OptStrain: a computational framework for

redesign of microbial production systems. Genome Res 14, 2367-2376.

Placzek, S., Schomburg, I., Chang, A., Jeske, L., Ulbrich, M., Tillack, J., and Schomburg, D.

(2017). BRENDA in 2017: new perspectives and new tools in BRENDA. Nucleic Acids

Res 45, D380-D388.

Press, W.H., Teukolsky, S.A., Vetterling, W.T., and Flannery, B.P. (2007a). Numerical Recipes:

The Art of Scientific Computing (3rd Edition). (Cambridge University Press).

Press, W.H., Teukolsky, S.A., Vetterling, W.T., and Flannery, B.P. (2007b). Numerical Recipes:

The Art of Scientific Computing (3rd Edition). (Cambridge University Press).

Ranganathan, S., Suthers, P.F., and Maranas, C.D. (2010). OptForce: an optimization procedure

for identifying all genetic manipulations leading to targeted overproductions. PLoS

Comput Biol 6, e1000744.

Ranganathan, S., Tee, T.W., Chowdhury, A., Zomorrodi, A.R., Yoon, J.M., Fu, Y., Shanks, J.V.,

and Maranas, C.D. (2012). An integrated computational and experimental study for

overproducing fatty acids in Escherichia coli. Metabolic engineering 14, 687-704.

Raue, A., Schilling, M., Bachmann, J., Matteson, A., Schelker, M., Kaschek, D., Hug, S., Kreutz,

C., Harms, B.D., Theis, F.J., et al. (2013). Lessons learned from quantitative dynamical

modeling in systems biology. PLoS One 8, e74335.

Saa, P., and Nielsen, L.K. (2015). A general framework for thermodynamically consistent

parameterization and efficient sampling of enzymatic reactions. PLoS Comput Biol 11,

e1004195.

Saa, P.A., and Nielsen, L.K. (2017). Formulation, construction and analysis of kinetic models of

metabolism: A review of modelling frameworks. Biotechnol Adv 35, 981-1003.

Saha, R., Liu, D., Hoynes-O'Connor, A., Liberton, M., Yu, J., Bhattacharyya-Pakrasi, M.,

Balassy, A., Zhang, F., Moon, T.S., Maranas, C.D., et al. (2016). Diurnal Regulation of

Cellular Processes in the Cyanobacterium Synechocystis sp. Strain PCC 6803: Insights

from Transcriptomic, Fluxomic, and Physiological Analyses. MBio 7.

Saha, R., Verseput, A.T., Berla, B.M., Mueller, T.J., Pakrasi, H.B., and Maranas, C.D. (2012).

Reconstruction and comparison of the metabolic potential of cyanobacteria Cyanothece

sp. ATCC 51142 and Synechocystis sp. PCC 6803. PLoS One 7, e48285.

Sandberg, T.E., Long, C.P., Gonzalez, J.E., Feist, A.M., Antoniewicz, M.R., and Palsson, B.O.

(2016). Evolution of E. coli on [U-13C]Glucose Reveals a Negligible Isotopic Influence

on Metabolism and Physiology. PLoS One 11, e0151130.

Sauer, U. (2006). Metabolic networks in motion: 13C-based flux analysis. Molecular systems

biology 2, 62.

Scanlan, D.J., Sundaram, S., Newman, J., Mann, N.H., and Carr, N.G. (1995). Characterization of

a zwf mutant of Synechococcus sp. strain PCC 7942. Journal of bacteriology 177, 2550-2553.

Schellenberger, J., Lewis, N.E., and Palsson, B.O. (2011). Elimination of thermodynamically

infeasible loops in steady-state metabolic models. Biophysical journal 100, 544-553.

Schmidt, K., Carlsen, M., Nielsen, J., and Villadsen, J. (1997). Modeling isotopomer distributions

in biochemical networks using isotopomer mapping matrices. Biotechnol Bioeng 55, 831-840.

Schmidt, K., Nielsen, J., and Villadsen, J. (1999). Quantitative analysis of metabolic fluxes in

Escherichia coli, using two-dimensional NMR spectroscopy and complete isotopomer

models. Journal of biotechnology 71, 175-189.

Segre, D., Vitkup, D., and Church, G.M. (2002). Analysis of optimality in natural and perturbed

metabolic networks. Proc Natl Acad Sci U S A 99, 15112-15117.

Shastri, A.A., and Morgan, J.A. (2007). A transient isotopic labeling methodology for 13C

metabolic flux analysis of photoautotrophic microorganisms. Phytochemistry 68, 2302-2312.

Shimizu, K. (2004). Metabolic flux analysis based on 13C-labeling experiments and integration

of the information with gene and protein expression patterns. Advances in biochemical

engineering/biotechnology 91, 1-49.

Srinivasan, S., Cluett, W.R., and Mahadevan, R. (2018). Model-based design of bistable cell

factories for metabolic engineering. Bioinformatics 34, 1363-1371.

Steinhauser, D., Fernie, A.R., and Araujo, W.L. (2012). Unusual cyanobacterial TCA cycles: not

broken just different. Trends Plant Sci 17, 503-509.

Stovicek, V., Holkenbrink, C., and Borodina, I. (2017). CRISPR/Cas system for yeast genome

engineering: advances and applications. FEMS Yeast Res 17.

Suastegui, M., Yu Ng, C., Chowdhury, A., Sun, W., Cao, M., House, E., Maranas, C.D., and

Shao, Z. (2017). Multilevel engineering of the upstream module of aromatic amino acid

biosynthesis in Saccharomyces cerevisiae for high production of polymer and drug

precursors. Metabolic engineering 42, 134-144.

Suss, K.H., Arkona, C., Manteuffel, R., and Adler, K. (1993). Calvin cycle multienzyme

complexes are bound to chloroplast thylakoid membranes of higher plants in situ. Proc

Natl Acad Sci U S A 90, 5514-5518.

Suthers, P.F., Burgard, A.P., Dasika, M.S., Nowroozi, F., Van Dien, S., Keasling, J.D., and

Maranas, C.D. (2007). Metabolic flux elucidation for large-scale models using 13C

labeled isotopes. Metabolic engineering 9, 387-405.

Takabayashi, A., Kadoya, R., Kuwano, M., Kurihara, K., Ito, H., Tanaka, R., and Tanaka, A.

(2013). Protein co-migration database (PCoM-DB) for Arabidopsis thylakoids and

Synechocystis cells. Springerplus 2, 148.

Tanabe, M., and Kanehisa, M. (2012). Using the KEGG database resource. Curr Protoc

Bioinformatics Chapter 1, Unit 1.12.

Tang, Y.J., Martin, H.G., Myers, S., Rodriguez, S., Baidoo, E.E., and Keasling, J.D. (2009).

Advances in analysis of microbial metabolic fluxes via (13)C isotopic labeling. Mass

Spectrom Rev 28, 362-375.

Tepper, N., and Shlomi, T. (2010). Predicting metabolic engineering knockout strategies for

chemical production: accounting for competing pathways. Bioinformatics 26, 536-543.

Teusink, B., Passarge, J., Reijenga, C.A., Esgalhado, E., van der Weijden, C.C., Schepper, M.,

Walsh, M.C., Bakker, B.M., van Dam, K., Westerhoff, H.V., et al. (2000). Can yeast

glycolysis be understood in terms of in vitro kinetics of the constituent enzymes? Testing

biochemistry. Eur J Biochem 267, 5313-5329.

Thiel, K., Vuorio, E., Aro, E.M., and Kallio, P.T. (2017). The effect of enhanced acetate influx on

Synechocystis sp. PCC 6803 metabolism. Microbial cell factories 16, 21.

Tian, M., and Reed, J.L. (2018). Integrating proteomic or transcriptomic data into metabolic

models using linear bound flux balance analysis. Bioinformatics 34, 3882-3888.

Tran, L.M., Rizk, M.L., and Liao, J.C. (2008). Ensemble modeling of metabolic networks.

Biophysical journal 95, 5606-5617.

Usui, Y., Hirasawa, T., Furusawa, C., Shirai, T., Yamamoto, N., Mori, H., and Shimizu, H.

(2012). Investigating the effects of perturbations to pgi and eno gene expression on

central carbon metabolism in Escherichia coli using (13)C metabolic flux analysis.

Microbial cell factories 11, 87.

van Eunen, K., Kiewiet, J.A., Westerhoff, H.V., and Bakker, B.M. (2012). Testing biochemistry

revisited: how in vivo metabolism can be understood from in vitro enzyme kinetics. PLoS

Comput Biol 8, e1002483.

van Gulik, W.M., and Heijnen, J.J. (1995). A metabolic network stoichiometry analysis of

microbial growth and product formation. Biotechnol Bioeng 48, 681-698.

Varma, A., and Palsson, B.O. (1994). Stoichiometric flux balance models quantitatively predict

growth and metabolic by-product secretion in wild-type Escherichia coli W3110. Applied

and environmental microbiology 60, 3724-3731.

Varman, A.M., Yu, Y., You, L., and Tang, Y.J. (2013). Photoautotrophic production of D-lactic

acid in an engineered cyanobacterium. Microbial cell factories 12, 117.

Waltz, R.A., Morales, J.L., Nocedal, J., and Orban, D. (2006). An interior algorithm for nonlinear

optimization that combines line search and trust region steps. Math. Program. 107, 391-408.

Weininger, D., Weininger, A., and Weininger, J.L. (1989). SMILES. 2. Algorithm for generation

of unique SMILES notation. Journal of Chemical Information and Computer Sciences 29,

97-101.

Wiechert, W., and de Graaf, A.A. (1996). In vivo stationary flux analysis by 13C labeling

experiments. Advances in biochemical engineering/biotechnology 54, 109-154.

Wiechert, W., Mollney, M., Isermann, N., Wurzel, M., and de Graaf, A.A. (1999). Bidirectional

reaction steps in metabolic networks: III. Explicit solution and analysis of isotopomer

labeling systems. Biotechnol Bioeng 66, 69-85.

Wiechert, W., Siefke, C., de Graaf, A.A., and Marx, A. (1997). Bidirectional reaction steps in

metabolic networks: II. Flux estimation and statistical analysis. Biotechnology and

bioengineering 55, 118-135.

Wiechert, W., and Wurzel, M. (2001). Metabolic isotopomer labeling systems. Part I: global

dynamic behavior. Math Biosci 169, 173-205.

Wittig, U., Kania, R., Golebiewski, M., Rey, M., Shi, L., Jong, L., Algaa, E., Weidemann, A.,

Sauer-Danzwith, H., Mir, S., et al. (2012). SABIO-RK--database for biochemical reaction

kinetics. Nucleic Acids Res 40, D790-796.

Xiong, W., Lo, J., Chou, K.J., Wu, C., Magnusson, L., Dong, T., and Maness, P. (2018). Isotope-

Assisted Metabolite Analysis Sheds Light on Central Carbon Metabolism of a Model

Cellulolytic Bacterium Clostridium thermocellum. Front Microbiol 9, 1947.

Xiong, W., Morgan, J.A., Ungerer, J., Wang, B., Maness, P.-C., and Yu, J. (2015). The plasticity

of cyanobacterial metabolism supports direct CO2 conversion to ethylene. Nature Plants

1, 15053.

Xu, H., Andi, B., Qian, J., West, A.H., and Cook, P.F. (2006). The alpha-aminoadipate pathway

for lysine biosynthesis in fungi. Cell Biochem Biophys 46, 43-64.

Xu, P., Li, L., Zhang, F., Stephanopoulos, G., and Koffas, M. (2014). Improving fatty acids

production by engineering dynamic pathway regulation and metabolic control. Proc Natl

Acad Sci U S A 111, 11299-11304.

Xu, P., Ranganathan, S., Fowler, Z.L., Maranas, C.D., and Koffas, M.A. (2011). Genome-scale

metabolic network modeling results in minimal interventions that cooperatively force

carbon flux towards malonyl-CoA. Metabolic engineering 13, 578-587.

Yan, C., and Xu, X. (2008). Bifunctional enzyme FBPase/SBPase is essential for

photoautotrophic growth in cyanobacterium Synechocystis sp. PCC 6803. Progress in

Natural Science 18, 149-153.

Yang, C., Hua, Q., and Shimizu, K. (2002a). Integration of the information from gene expression

and metabolic fluxes for the analysis of the regulatory mechanisms in Synechocystis.

Applied microbiology and biotechnology 58, 813-822.

Yang, C., Hua, Q., and Shimizu, K. (2002b). Metabolic flux analysis in Synechocystis using

isotope distribution from 13C-labeled glucose. Metabolic engineering 4, 202-216.

Yang, C., Hua, Q., and Shimizu, K. (2002c). Quantitative analysis of intracellular metabolic

fluxes using GC-MS and two-dimensional NMR spectroscopy. Journal of bioscience and

bioengineering 93, 78-87.

Yoo, H., Antoniewicz, M.R., Stephanopoulos, G., and Kelleher, J.K. (2008). Quantifying

reductive carboxylation flux of glutamine to lipid in a brown adipocyte cell line. The

Journal of biological chemistry 283, 20621-20627.

You, L., Berla, B., He, L., Pakrasi, H.B., and Tang, Y.J. (2014). 13C-MFA delineates the

photomixotrophic metabolism of Synechocystis sp. PCC 6803 under light- and carbon-

sufficient conditions. Biotechnol J 9, 684-692.

Young, J.D., Shastri, A.A., Stephanopoulos, G., and Morgan, J.A. (2011). Mapping

photoautotrophic metabolism with isotopically nonstationary (13)C flux analysis.

Metabolic engineering 13, 656-665.

Young, J.D., Walther, J.L., Antoniewicz, M.R., Yoo, H., and Stephanopoulos, G. (2008). An

elementary metabolite unit (EMU) based method of isotopically nonstationary flux

analysis. Biotechnol Bioeng 99, 686-699.

Yu, Y., You, L., Liu, D., Hollinshead, W., Tang, Y.J., and Zhang, F. (2013). Development of

Synechocystis sp. PCC 6803 as a phototrophic cell factory. Mar Drugs 11, 2894-2916.

Zhang, S., and Bryant, D.A. (2011). The tricarboxylic acid cycle in cyanobacteria. Science 334,

1551-1553.

Zhao, J., and Shimizu, K. (2003). Metabolic flux analysis of Escherichia coli K12 grown on 13C-

labeled acetate and glucose using GC-MS and powerful flux calculation method. Journal

of biotechnology 101, 101-117.

Zomorrodi, A.R., Lafontaine Rivera, J.G., Liao, J.C., and Maranas, C.D. (2013). Optimization-

driven identification of genetic perturbations accelerates the convergence of model

parameters in ensemble modeling of metabolic networks. Biotechnol J 8, 1090-1104.

Zomorrodi, A.R., and Maranas, C.D. (2010). Improving the iMM904 S. cerevisiae metabolic

model using essentiality and synthetic lethality data. BMC systems biology 4, 178.

Zomorrodi, A.R., Suthers, P.F., Ranganathan, S., and Maranas, C.D. (2012). Mathematical

optimization applications in metabolic networks. Metabolic engineering 14, 672-686.

Zupke, C., and Stephanopoulos, G. (1994). Modeling of Isotope Distributions and Intracellular

Fluxes in Metabolic Networks Using Atom Mapping Matrixes. Biotechnology progress

10, 489-498.

VITA

SARATRAM GOPALAKRISHNAN

EDUCATION

The Pennsylvania State University, Sep 2013 - Mar 2019: PhD in Chemical Engineering

Johns Hopkins University, Sep 2011 - May 2013: MSE in Chemical and Biomolecular Engineering

Manipal University, Aug 2007 - May 2011: BE in Biotechnology

HONORS AND AWARDS

1. McWhirter Graduate Fellowship, The Pennsylvania State University, 2013

2. Best Candidacy Award, McWhirter Graduate Research Symposium, Sep 2014

3. Best Paper Award, McWhirter Graduate Research Symposium, Sep 2016

SELECT PUBLICATIONS

1. Gopalakrishnan, S., & Maranas, C. D. (2015). 13C metabolic flux analysis at a

genome-scale. Metab Eng, 32, 12-22.

2. Soo, V. W., McAnulty, M. J., Tripathi, A., Zhu, F., Zhang, L., Hatzakis, E., . . . ,

Gopalakrishnan, S., . . . Wood, T. K. (2016). Reversing methanogenesis to capture

methane for liquid biofuel precursors. Microb Cell Fact, 15(1), 11.

3. Nazem-Bokaee, H., Gopalakrishnan, S., Ferry, J. G., Wood, T. K., & Maranas, C.

D. (2016). Assessing methanotrophy and carbon fixation for biofuel production by

Methanosarcina acetivorans. Microb Cell Fact, 15(1), 10.

4. Abernathy, M. H., Yu, J., Ma, F., Liberton, M., Ungerer, J., Hollinshead, W. D., . . . ,

Gopalakrishnan, S., . . . Tang, Y. J. (2017). Deciphering cyanobacterial phenotypes

for fast photoautotrophic growth via isotopically nonstationary metabolic flux

analysis. Biotechnol Biofuels, 10, 273.

5. Gopalakrishnan, S., Pakrasi, H. B., & Maranas, C. D. (2018). Elucidation of

photoautotrophic carbon flux topology in Synechocystis PCC 6803 using genome-

scale carbon mapping models. Metab Eng, 47, 190-199.

6. Hendry, J. I., Gopalakrishnan, S., Ungerer, J., Pakrasi, H. B., Tang, Y. J., &

Maranas, C. D. (2019). Genome-Scale Fluxome of Synechococcus elongatus UTEX

2973 Using Transient (13)C-Labeling Data. Plant Physiol, 179, 761-769.