model building and model checking for biochemical processes · biochemical reactions can be modeled...

16
INTRODUCTION Recent advances in genomics have made it possible for the first time for a biologist to access enormous amounts of information for a number of organisms, including human, mouse, arabidopsis, fruit fly, yeast, and Escherichia coli. These developments are at the heart of the many renewed ambitious attempts by biologists to understand the functional roles of a group of genes using powerful computa- tional models and high-throughput microbio- logic protocols. The emerging fields of system biology, and its sister field of bioinformatics, focus on creating a finely detailed and “mecha- nistic” picture of biology at the cellular level by Model Building and Model Checking for Biochemical Processes Marco Antoniotti, 1 Alberto Policriti, 2 Nadia Ugel, 1 and Bud Mishra *,1,3 1 Courant Institute of Mathematical Sciences, New York University, NY; 2 Università di Udine, Italy; 3 Cold Spring Harbor Laboratory, Long Island, NY Abstract A central claim of computational systems biology is that, by drawing on mathematical approaches developed in the context of dynamic systems, kinetic analysis, computational theory and logic, it is possible to create powerful simulation, analysis, and reasoning tools for working biologists to deci- pher existing data, devise new experiments, and ultimately to understand functional properties of genomes, proteomes, cells, organs, and organisms. In this article, a novel computational tool is described that achieves many of the goals of this new discipline. The novelty of this system involves an automaton-based semantics of the temporal evolution of complex biochemical reactions starting from the representation given as a set of differential equations. The related tools also provide ability to qualitatively reason about the systems using a propositional temporal logic that can express an ordered sequence of events succinctly and unambiguously. The implementation of mathematical and computational models in the Simpathica and XSSYS systems is described briefly. Several example applications of these systems to cellular and biochemical processes are presented: the two most prominent are Leibler et al.’s repressilator (an artificial synthesized oscillatory network), and Curto- Voit-Sorribas-Cascante’s purine metabolism reaction model. Index Entries: Biochemical reactions; biological models; cellular processes; model building; model checking; purine metabolism; repressilator; temporal logic. *Author to whom all correspondence and reprint requests should be addressed. E-mail: [email protected] ORIGINAL ARTICLE © Copyright 2003 by Humana Press Inc. All rights of any nature whatsoever reserved. 1085-9195/03/38/271–286/$20.00 Cell Biochemistry and Biophysics 271 Volume 38, 2003

Upload: others

Post on 23-Jun-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Model Building and Model Checking for Biochemical Processes · Biochemical reactions can be modeled with sets of differential equations. The classical Michaelis-Menten’s formulation

INTRODUCTION

Recent advances in genomics have made itpossible for the first time for a biologist toaccess enormous amounts of information for anumber of organisms, including human,

mouse, arabidopsis, fruit fly, yeast, andEscherichia coli. These developments are at theheart of the many renewed ambitious attemptsby biologists to understand the functional rolesof a group of genes using powerful computa-tional models and high-throughput microbio-logic protocols. The emerging fields of systembiology, and its sister field of bioinformatics,focus on creating a finely detailed and “mecha-nistic” picture of biology at the cellular level by

Model Building and Model Checking for Biochemical Processes

Marco Antoniotti,1 Alberto Policriti,2 Nadia Ugel,1and Bud Mishra*,1,3

1Courant Institute of Mathematical Sciences, New York University, NY; 2Università di Udine, Italy;3Cold Spring Harbor Laboratory, Long Island, NY

Abstract

A central claim of computational systems biology is that, by drawing on mathematical approachesdeveloped in the context of dynamic systems, kinetic analysis, computational theory and logic, it ispossible to create powerful simulation, analysis, and reasoning tools for working biologists to deci-pher existing data, devise new experiments, and ultimately to understand functional properties ofgenomes, proteomes, cells, organs, and organisms. In this article, a novel computational tool isdescribed that achieves many of the goals of this new discipline. The novelty of this system involvesan automaton-based semantics of the temporal evolution of complex biochemical reactions startingfrom the representation given as a set of differential equations. The related tools also provide abilityto qualitatively reason about the systems using a propositional temporal logic that can express anordered sequence of events succinctly and unambiguously. The implementation of mathematical andcomputational models in the Simpathica and XSSYS systems is described briefly. Several exampleapplications of these systems to cellular and biochemical processes are presented: the two mostprominent are Leibler et al.’s repressilator (an artificial synthesized oscillatory network), and Curto-Voit-Sorribas-Cascante’s purine metabolism reaction model.

Index Entries: Biochemical reactions; biological models; cellular processes; model building;model checking; purine metabolism; repressilator; temporal logic.

*Author to whom all correspondence and reprintrequests should be addressed. E-mail: [email protected]

ORIGINAL ARTICLE

© Copyright 2003 by Humana Press Inc.All rights of any nature whatsoever reserved.1085-9195/03/38/271–286/$20.00

Cell Biochemistry and Biophysics 271 Volume 38, 2003

Page 2: Model Building and Model Checking for Biochemical Processes · Biochemical reactions can be modeled with sets of differential equations. The classical Michaelis-Menten’s formulation

combining the part-lists (genes, regulatorysequences, other objects from an annotatedgenome, and known metabolic pathways),with observations of transcriptional states of acell (using microarrays) and translational statesof the cell (using proteomics tools). In theprocess, it has become evident that the mathe-matical foundation of these systems needs tobe explored accurately and that their computa-tional models be implemented in softwarepackages faithfully while exploiting the poten-tial trade-offs among usability, accuracy, andscalability dealing with large amounts of data.

Several biological and biochemical mecha-nisms can be modeled with relatively simple sets

of differential algebraic equations (DAE). Thenumerical solution to these differential equationsprovides a potentially powerful and effectiveinvestigative tool for biologists and biochemists.In this article, we demonstrate the power of anovel computational tool with the ability toquery massive sets of numerical data obtainedfrom in silico experiments on complex biologicalsystems. The computational tool derives itsexpressiveness, flexibility, and power by inte-grating many commonly available tools fromnumerical analysis, symbolic computation, tem-poral logic, model checking, and visualization.

In this article, we describe XS-systems—anew computational model extending the basic

272 Antoniotti et al.

Cell Biochemistry and Biophysics Volume 38, 2003

Fig. 1. The Simpathica Main Window. The system being analyzed is the “repressilator” circuit (1).The upper left frame contains a list of the reactants. The upper right frame is used to insert differ-ent kinds of reactions. The lower left frame contains a list of known reactions. Finally, the lowerright frame contains a depiction of the reactions’ network.

Page 3: Model Building and Model Checking for Biochemical Processes · Biochemical reactions can be modeled with sets of differential equations. The classical Michaelis-Menten’s formulation

foundations provided by the “S-systems mod-els of biochemical processes.” The main innov-ative extension provided by the XS-systeminvolves an automaton-based semantics of thetemporal evolution of complex biochemical reac-tions starting from the representation as a set ofdifferential equations. The implementation ofour mathematical and computational modelsin the Simpathica and XSSYS systems will bedescribed briefly (see Figs. 1 and 2 for screen-shots of the two systems). However, a detaileddiscussion of the underlying mathematicalfoundations will be omitted to keep the articleaccessible to a wider readership. The mainemphasis of the article will be biological appli-cations illustrating how we envision Simpathicaand XSSYS in practice. The work described inthis article is part of a much larger project, still

in progress and thus only provides a partialand evolving picture of a new paradigm forcomputational biology.

The remainder of the article is organized asfollows: The second section describes our mod-els of biological experiments and how thesemodels can be interpreted in terms of anautomaton whose structure is determined byboth numerical and analytic solutions of a setof “parameterized” differential equations. Thissection also contains the description of a tem-poral logic language for expressing and verify-ing properties of XS-systems together with aprototype implementation. The third sectioncontains the description of a simple system (therepressilator [1]) with a “walk through” ofSimpathica. The fourth section contains a morecomplex biological example and demonstrates

Model Building and Checking for Biochemical Processes 273

Cell Biochemistry and Biophysics Volume 38, 2003

Fig. 2. The main XSSYS view. The core XSSYS interface is on the left. The left frame contains a listof the “loaded” traces (obtained from simulators such as the Simpathica/Octave engine or PLAS[2]). The right frame is used to type in various temporal logic queries and to check the results. Theright window is simply a plotting application (PtPlot from UC Berkeley), which we use for visual-ization purposes.

Page 4: Model Building and Model Checking for Biochemical Processes · Biochemical reactions can be modeled with sets of differential equations. The classical Michaelis-Menten’s formulation

how the computational tools of this paper areapplied, and the fifth section concludes thepaper.

EXPERIMENTS AND SIMULATION

Imagine a computational biologist about toperform simulations of complex biochemicalpathways in conjunction with related experi-mental data collection. The researcher will oftenmodel the underlying biological and biochemi-cal mechanisms with sets of relatively simpleDAE, each one representing a reversible chemi-cal reaction, a degradation process, a synthesisprocess, or a reaction modulated by an enzymeor a co-enzyme. The numerical solutions tothese differential equations and the time-series“tracing” the evolutions of an RNA transcript,protein, or a lipid, etc., provide the basic ingre-dients for data interpretation, data validation,and hypothesis formation and falsification.

As the model complexity of the biologicalsystems increases, the sets of numerical tracesbecome increasingly difficult to interpret andthe traditional biological reasoning processfails to scale beyond a handful of genes and rel-atively small and coarsely modeled pathways.To cope with this problem, we propose a novelapproach that first summarizes the numericaltraces into an automaton with distinguishablebiologic states and a deterministic set of rulesof transition from state-to-state and finally,checks the automaton model for its ability tosatisfy various temporal logic statements.

Our starting reference point is the classical S-systems (2,3) and the idea that a natural comple-tion for that approach would be an automatonsummarizing the states along which the simu-lated biochemical system evolves in time. Theautomaton, so generated, allows the user toview, manipulate, and reason, using a well-integrated set of tools.

As an example (to be explored further), con-sider the case study of purine metabolism (2).This case study, as many others, illustrates thefact that the “right” cellular behavior is oftendifficult to capture with an initial abstraction

model. Often it is necessary to improve themodel in many successive iterations, each stepinvolving a more accurate estimation of somemodel parameters; identification and elimina-tion of some “structural” problems in the set ofequations; or incorporation of some new unex-pected insight obtained by closer examinationof an intermediate model. Our thesis is thatrapid successive refinement of a model is facil-itated through the model checking algorithmsand the underlying formalisms provided bytemporal logic formulae are capable of identi-fying the missing features in a partial/incom-plete model.

MATHEMATICAL MODELS,DIFFERENTIAL EQUATIONS, AND CANONICAL FORMS

Biochemical reactions can be modeled withsets of differential equations. The classicalMichaelis-Menten’s formulation of reactionspeed is essentially differential equations for therate of change of the product of an enzymaticreaction. The parameters of such equation arethe constants Km (Michaelis-Menten constant)and Vmax (maximum velocity of a reaction). AnS-system is simply a set of ordinary differentialequations where each equation appears in aparticularly simple form and models the rate ofchange in the concentration of a product (orsubstrate) in terms of its synthesis from anddegradation into other reactants.

Canonical Forms

A set of differential equations can be rewrit-ten (recast) in special canonical forms by purelyalgebraic transformations and further inclu-sions of a set of algebraic constraint equations.S-systems are sets of ODEs in canonical form.Canonical forms have several advantages overmore general forms of equations, because theycan be more easily manipulated, integrated,and interpreted in mathematical terms.

The extended S-system model we use—XS-systems—has a simple canonical form. An XS-

274 Antoniotti et al.

Cell Biochemistry and Biophysics Volume 38, 2003

Page 5: Model Building and Model Checking for Biochemical Processes · Biochemical reactions can be modeled with sets of differential equations. The classical Michaelis-Menten’s formulation

system is simply a list of expressions describ-ing the rate of change of a given quantity in amodel (say the concentration of a compound),plus a set of equations describing some con-straints on the relationships among some of theparameters characterizing the model. Theseconstraints may or may not be needed for thecomplete model to be meaningful. Their pres-ence depends on the system under examina-tion. Each of the expressions describing a ratehas a very simple form as well: it is simply adifference between two algebraic power-prod-ucts (or monomials) one representing synthesisand the other, dissociation (i.e., it is a S-sys-tem). More formally, we have the following.

DEFINITION 1 An XS-system is defined by a set of pairs of equa-

tions (a rate equation and a constraint equation)

with index variables, i ranging from 1 to n, and j,from 1 to k. This formalism describes an XS-systemwith n equations and k constraints.

An XS-system can be interpreted as the rep-resentation of a set of flows of reactants withina network of reactions (2) and thus describeshow to translate a graphical rendition of suchreaction networks into the classical S-system.Our XS-system formulation naturally capturesthese steps in a computer-assisted translation,which had been traditionally carried out by amanual manipulation (2).

The XS-system formulation makes one moredistinction between dependent and independentvariables. Independent variables representenvironmental conditions that influence thebehavior of the system but do not influencethemselves in return. Dependent variables areall the others. Of course, to complete thedescription of the system it is necessary tospecify all the rate constants (α’s and β’s) and

the kinetic orders (g’s, h’s, and c’s) of each equa-tion and constraint.

Characterizing the Behavior of the SystemAcross Different “States”

Although the dependent and independentvariables of an S-system give a quantitativedescription of the reactants (substrates, prod-ucts, enzymes, etc.) involved in the experiment,a tool for the qualitative analysis of the systemis still missing. We have developed the theoret-ical framework for such a tool and provided afirst implementation in the XSSYS system.

We note that the data we can count on in thedevelopment of our tool are not only numerical(i.e., the solution of the power-law differentialequations defining the S-system) but also logi-cal, appearing in the form of constraints, knownto the biologist modeling an experiment, andrelating values of the substances involved.

A simple and natural example of a logicalproperty of many biological systems is the onedescribing the existence of a steady-state.Informally, a system is in a “steady state” whennothing “changes” in the system as timepasses. More formally, for the purpose of ourdiscussion, the “steady state” is reached whenall the first derivatives of the functions describ-ing dependent variables become equal to zeroand do not change in the future. The softwaretool can check this event by solving an alge-braic problem to detect the existence of a com-mon root (this operation may be ratherinvolved, see [4]).

Very often the biologist not only knows thatin the absence of external stimuli, such a statemust be reached sooner or later, but also knowswhat are (at least) the relative values of sub-stances involved in such a state. Another nat-ural property involves boundedness of thereactant concentrations involved in a biologicalprocess and may need to be ascertained as aprecondition to other interesting propertiessuch as existence of a limit-cycle or steady-statebehavior.

The key to manipulate “qualitatively” thesenotions (e.g., the notion of steady-state) lies in

X = X X X X X X

a X X X a X X X

a X X X

ig1i g2i

ngni

ih1i h2i

nhni

1jc j c j

ncn1j

2jc j c j

ncn2j

mjc mj c mj

ncnmj

α β1 2 1 2

111

221

112

222

11

22 0

L L

L L

L L

( ) − ( )

+

+ +

=

Model Building and Checking for Biochemical Processes 275

Cell Biochemistry and Biophysics Volume 38, 2003

Page 6: Model Building and Model Checking for Biochemical Processes · Biochemical reactions can be modeled with sets of differential equations. The classical Michaelis-Menten’s formulation

grouping each instant in time and the corre-sponding values of each variable into “states.”In the simplest case we simply denote eachinstant in time (our sampled time) as associatedto a “state.” More interesting states (with deteri-orating computational efficiency implications)are constructed by grouping several timeinstants according to some simple rules, e.g., alinearization rule that groups states, if their ratesof change are within a user defined parameter.

Such construction yields an “automaton,” acommon abstraction tool used in computer sci-ence and other engineering disciplines. Theautomaton we construct allows us to capturequalitative features of a biological system, e.g.,the notion of a steady state. Note that the con-struction of the automaton will eventually behidden from the user. The automaton serves asa tool to answer queries about the evolution ofthe system, and its construction is provided bythe software tool.

DEFINITION 2 Given an S-system S, the S-system automaton

AS describes a set of qualitative states, S, of the sys-tem together with rules of state-transitions, ∆.More formally, an AS associated with S is a 4-tupleAS = (S, ∆, S0, F), where S ⊆ D1 × ⋅ ⋅ ⋅ × Dn+m is a(finite or infinite) set of states, ∆ ⊆ S × S is the tran-sition relation, and S0, F ⊆ S are the initial and finalstates, respectively.

The key idea is that the values of the depen-dent and independent variables uniquely char-acterize the state of the system and that thecollection of such values, together with a rela-tion governing the possible transitions, consti-tutes the automaton qualitatively describingthe behavior of the system.

The subsequent steps consist in providingthe biologist with a language to impose con-straints on qualitative features of the systemunder study. To this end, we introduce a notionthat renders the temporal evolution of the sys-tem in terms of a trace.

DEFINITION 3A trace of an S-system automaton AS is a (finite

or infinite) sequence s0, s1, . . . sn, . . . such that s0 is

an initial state (s0 ∈ S0) and there is a transitionrule allowing the automaton to enter si+1 from si,(∆(si, si+1) holds) for all i ≥ 0.

A trace can also be defined as:

trace(AS) = ⟨⟨X1(t) . . .Xn+m(t)⟩ | t ∈{t0 + k step : k ≥ 0}⟩,

is called the trace of AE.

A trace of an S-system automaton is there-fore a sequence of arrays of values that allowsa complete description of the dynamics of thereactants within a fixed time [t0, t]. The preci-sion of the description is parametric in thevalue of the step variable: the smaller the stepthe higher the precision.

The following definition is used to “focus”the automaton on certain dependent valuesdetermined by a fixed subset of variables—these are the variables used explicitly or implic-itly in the description of a qualitative propertyor needed by the quantitative analysis.

DEFINITION 4 Given any set of variables U ⊆ {X1, . . . ,Xn+m},

the sequence:

trace(AS|U) = ⟨⟨Xi(t) | Xi ∈ U⟩ : t ∈{t0 + k step : k ≥ 0}⟩,

is called the trace of U. If U consists of a single variable Xi the trace is

called the trace of Xi.

Notice that a single trace of the automatonresults when only one “set-up” (e.g., initialconditions, a set of values for the parameters, aset of signaling events, etc.) for the system isbeing considered. To allow more than onetrace, it is necessary to consider different “set-ups,” e.g., many possible values for the para-meters, such as rate constants and kineticorders. If multiple traces are available, they canbe combined within a single model containingdifferent possible evolutions of the system: avery common situation easily handled by anybranching time temporal logic, which allows aconceptual clock to split intermittently tomodel simultaneous evolutions in many possi-

276 Antoniotti et al.

Cell Biochemistry and Biophysics Volume 38, 2003

Page 7: Model Building and Model Checking for Biochemical Processes · Biochemical reactions can be modeled with sets of differential equations. The classical Michaelis-Menten’s formulation

ble worlds. Of course, the question of howmany of these traces are necessary to assess thevalidity of the model is still open, we envisionour tools to be used in an iterative fashion, con-verging to a “good” model of the system underinvestigation.

Regardless of whether one single trace ormany of them are combined into a uniquemodel, the number of resulting qualitativestates as defined previously can be prohibi-tively large and may ultimately be redundantfor a qualitative analysis. To optimize the com-putational efficiency of the quantitative analy-sis, we have implemented a collapse operationcombining those states that are indistinguish-able as the numerical values characterizingthem are same or only differ by an impercepti-bly small amount.

A distinct feature of the collapsing operationsis that they can be interleaved with the phaseperforming the numerical computation of theapproximate solutions of the power-law differ-ential equations. This feature guarantees thatthe collapse does not impose a heavy computa-tional burden on the entire system, during theprocess of producing a more genuine temporallogic model (the automaton) for the S-system.

As a consequence, once this automaton hasbeen constructed, it provides many differentavenues for performing a qualitative analysison the temporal evolution of the S-system, andstudying in parallel (within a single structure)multiple evolutions and experiments differingin rate constants and kinetic orders, e.g., wild-types and many mutants.

Note that the automaton proposed here arenot necessarily unique and one may considermore complex automata with different seman-tics and amenable to different logical analysis.For instance, a timed automaton (with quanti-tative temporal information or with constraintlabeling the transitions) could be produced atno additional cost during the same numericalcomputation. Currently, other variant auto-mata are under active investigation and weanticipate several more novel model automatato be incorporated into our final tool. In partic-ular we are investigating how to consider sev-

eral traces at a time and how to build incre-mentally a Hybrid Automata without loss ofinformation: this will address the issues ofexhaustivity that must be considered with careto assess the overall validity of a model.

TEMPORAL LOGIC

Temporal logic (TL) (5,6) has been studied indepth in the context of systems whose behav-ior change in time, for instance, computerhardware, network protocols, and engineeringsystems. We omit a detailed introduction toany or all of many specific TLs that have beenintroduced. Instead we concentrate on themain ideas at the core of these TLs to providethe intuition about how it can be used in theanalysis of biochemical systems.

Fundamental to a temporal logic is thenotion that time-dependent terms from naturallanguage, such as “eventually” and “always,”can be given a precise meaning (semantics) interms of the abstract behavior of a systemunder discourse. As an example, consider thefollowing sentence:

The concentration of guanosine triphosphate(GTP) is equal to x.

Such a sentence is true only in certain cir-cumstances. Given a biologic system in equi-librium the above sentence may or may not betrue at any or all instants of time. In particular,we can easily construct sentences (in a suitablenatural language) that express the fact that,given a certain set of initial conditions, theabove sentence will eventually hold true.Temporal logic precisely formalizes the mean-ing of the adverb eventually (and other such“modes”: always, infinitely often, and almostalways) and the resulting semantics lead to aprecise model-checking algorithm for deter-mining the validity of TL sentences in the con-text of an automaton.

This particular attribute of TL is very impor-tant because it concisely captures the notion ofa logical property like “steady-state” and for-malizes this notion in a simple consistent way

Model Building and Checking for Biochemical Processes 277

Cell Biochemistry and Biophysics Volume 38, 2003

Page 8: Model Building and Model Checking for Biochemical Processes · Biochemical reactions can be modeled with sets of differential equations. The classical Michaelis-Menten’s formulation

that is directly handled by the model-checkingalgorithm.

Consider a system M and a (simulation)trace trace(M). If we consider a state s intrace(M), we can simply check if all the firstderivatives in s are 0. Suppose we have a pro-cedure that answers yes (or no) when this is thecase. Let us call this predicate, zero_derivative.Suppose that there actually is a state s’ intrace(M) where zero_derivative yields yes.Now, by the rules of TL, the following state-ment would be true:

Eventually(zero_derivative)

for each instant from the start, at least up untilthe instant characterized as state s’.

Now we can expand the language of TL andintroduce a new predicate “steady state” to bea synonym of the following concept: thereexists an instant (a state s’ in trace(M)) afterwhich zero_derivative will always be true. Moreformally,

steady_state(M)

is defined to be logically equivalent to the fol-lowing:

Eventually(Always(zero_derivative))

meaning that, when we consider the simulation(or in vivo) trace of the system there will be atime where all the rates of change of the sys-tem’s variables reach 0 and remain at that value.

Alternatively, we could be more selectiveand ask whether some specific variable reachesthe steady state. We can determine the answeras a result of the Definition 4.

steady_state(M, GTP).

Another set of properties that we may wantto express (and subsequently check) is the oneinvolving “persistence.” In other words, prop-erties of the form: something is always true (orfalse). For instance, we could ask whether in agiven system

Always (GTP > k).

Thus, we query whether the GTP levelalways remains greater than k, independent of

other changes occurring during the evolutionof the system.

How to Translate a Statement in English into TL

The previous discussion illustrates the mainideas needed to translate an English sentenceinvolving temporal claims into a query in TL.The translation from English to TL is ratherstraightforward. Simple conjunctions (“and”s),disjunctions (“or”s), and negations (“not”s)can be expressed directly. The correspondingpropositional logic is then augmented withtemporal modes: Always and Eventually.

Now, suppose we wish to determine if (1)our system reaches a steady state, and (2) thelevel of GTP is less than k after a certaininstant. This statement is simply expressed inTL as

steady_state and Eventually(Always[GTP < k]). (a)

Note that the validity of the previous state-ment is completely determined by the two con-stituent sub-expressions. Furthermore, thetruth property of the statement requires exam-ining the entire system trace, becausesteady_state is a “global” property, and the sec-ond conjunct has the same form. To appreciatethe subtleties of TL, consider the followingexpression: eventually the system will be insteady state and the level of GTP will be lessthan k.

Eventually(steady_state and Always[GTP < k]) (b)

Given the properties of TL, the aboveexpression (if true) will actually guarantee thatwhen the system attains the steady state, it alsohas a GTP level less than k. This is a differentstatement than (a), and it shows how flexibleand yet precise a TL statement can be, withoutsacrificing a high degree of expressive power.In (7), we discuss some of the mathematicaland computational problems associated withthis approach, e.g., the dependency of theanalysis on the density of time points.

278 Antoniotti et al.

Cell Biochemistry and Biophysics Volume 38, 2003

Page 9: Model Building and Model Checking for Biochemical Processes · Biochemical reactions can be modeled with sets of differential equations. The classical Michaelis-Menten’s formulation

A Simple Example: the Repressilator

As a simple yet very interesting example,consider the repressilator system constructedby Elowitz and Leibler (1). First, the authorsconstructed a mathematical model of a net-work of three interacting transcriptional regu-lators and produced a trace of the interactionusing a traditional mathematical package(MATLAB®). Subsequently they constructed aplasmid with the three regulators and collecteddata from in vivo experiments to match themwith the predicted values.

The observed trace of the six combined vari-ables is shown in Fig. 3. The system exhibits anoscillatory steady state.

Simpathica was used to enter the descriptionof the repressillator system and to analyze itsbehavior. Although this is a simple toy exam-ple for our application, it still presents a clearidea regarding how a biologist may use theSimpathica system.

Figure 4 shows how to use SIMPATHICA toenter a reactant (in this case pLambdaCI — theprotein LambdaCI). In Fig. 5, a reaction isinserted in the system and the graphical depic-tion is immediately visible in the bottom leftframe; in this case, the reaction just entered is amodulated reaction: the production of TetR isinhibited by the amount of LambdaCI present.

Figure 6 shows how to enter a second reactionin the system.

Once all the reactions have been entered,along with a set of initial conditions, the menu“Simulation->Run Simulation” will performthe following steps.

Generate the appropriate set of differentialequations in canonical form. (Simpathica strivesto generate an S-system from a given network—if it cannot, it generates a GMA-system.)

Start the integrated Octave program (www.octave.org) to simulate the system.

Finally, we can use our system to verify thatthe repressilator system’s variables actually“oscillate.” We may formulate and test a querysuch as the one shown below (for variablepLambdaCI):

Eventually[not [Always(pLambdaCI < 0.25)

or Always(pLambdaCI > 0.5)]].

The above query states that the value of thepLambdaCI variable oscillates1 between thetwo extremes of 0.25 and 0.5.

The repressilator example is interestingbecause it indicates how well our system scaleswhen many variables are present in the model.In a more recent work by Guet et al. (8), theauthors show how to construct plasmids con-taining many more kinds of promoters (e.g.,five) modulating these genes, thus potentiallymany more gene network topologies. Askingprecise and circumscribed queries about thebehavior of such combinations is much morepreferable to visually examining complexgraphical renditions of the complex interactingpatterns of reactant concentrations.

Cell Biochemistry and Biophysics Volume 38, 2003

Fig. 3. The simulation trace of the repressila-tor system.

Model Building and Checking for Biochemical Processes 279

1A more precise, yet more complex, query would be

Eventually[(pLambdaCI < 0.25)and Always[(plambdaCI < 0.25)implies Eventually[plambdaCI > 0.5]]

and Always[(plambdaCI > 0.5)implies Eventually[plambdaCI < 0.25]]].

This query seems to be common enough to warrant a"macro"-like steady_state.

Page 10: Model Building and Model Checking for Biochemical Processes · Biochemical reactions can be modeled with sets of differential equations. The classical Michaelis-Menten’s formulation

Fig. 4. Entering a reactant. The name for the quantity we want to use in the simulation (in thiscase pLamdaCI) is entered in the text field along with the initial concentration (upper left frame).

Fig. 5. Entering a reaction. A modulated reaction (production of TetR, inhibited by LambdaCI) isentered in the system. The rate of the reaction is a = 1. The inhibition effect is expressed by the expo-nent f = –1, assigning a label to the modulation arrow (the light one originating from the pLacIfield). Note the graphic rendition in the lower right frame. The grayed oval represents the “input”of the reaction, while the dotted arrow represents the modulation effect of pLacI.

Page 11: Model Building and Model Checking for Biochemical Processes · Biochemical reactions can be modeled with sets of differential equations. The classical Michaelis-Menten’s formulation

A More Complex Example: Purine Metabolism

Let us now revisit in detail the example ofpurine metabolism described in (4) and fullyanalyzed in (9,10). The pathway for purinemetabolism is presented in Fig. 7. A briefdescription of the key reactions follows, andthe reader is invited to examine the moredetailed summaries contained in (2,9,10) aswell as the related literature referenced there.

The main metabolite in purine biosynthesisis 5-phosphoribosyl-α-1-pyrophosphate (PRPP).A linear cascade of reactions converts PRPPinto inosine monophosphate (IMP). IMP is thecentral branch point of the purine metabolismpathway. IMP is transformed into AMP andGMP. Guanosine, adenosine, and their deriva-

tives are recycled (unless used elsewhere) intohypoxanthine (HX) and xanthine (XA). XA isfinally oxidized into uric acid (UA). In additionto these processes, there appear to be two “sal-vage” pathways that serve to maintain IMPlevel and thus of adenosine and guanosinelevels as well. In these pathways, adeninephosphoribosyltransferase (APRT) and hypox-anthine-guanine phosphoribosyltransferase(HGPRT) combine with PRPP to form ribonu-cleotides.

The consequences of a malfunctioningpurine metabolism pathway are severe and canlead to death. The entire pathway is quite com-plex and contains several feedback loops,cross-activations and reversible reactions, andthus an ideal candidate for reasoning with thecomputational tools we have developed.

Model Building and Checking for Biochemical Processes 281

Cell Biochemistry and Biophysics Volume 38, 2003

Fig. 6. Entering a second reaction. In this case, we enter a new modulated reaction. Note how thenew reaction is rendered in the bottom right frame and how it appears in the reactions list in thebottom left frame.

Page 12: Model Building and Model Checking for Biochemical Processes · Biochemical reactions can be modeled with sets of differential equations. The classical Michaelis-Menten’s formulation

In ref. 2, a sequence of models for purinemetabolism is presented alongside an analysisof how to identify discrepancies with physi-cally observed data, and how to amend thecurrent model to explain these discrepancies.

We also show how to formulate queries overthe simulation traces to express various desir-able properties (or absence of undesirableones) that the model should possess. Shouldany of these queries “fail,” the model will bemarked for further examination, experimenta-tion, and correction.

Model 2

Given the purine metabolism model labeledas “model 2” in (2), and following the exampleclosely, we start by querying the simulationtrace for the reachability of a steady state. This

property is easily formulated in our frameworkwith the following query:

steady_state()

A precise definition of the predicatesteady_state in terms of other atomic predi-cates has been given earlier in this article. In asimilar manner we proceed to other interestingqueries, as illustrated below.

Variation of the initial concentration of PRPPdoes not change the steady state.

(PRPP = 10 * PRPP1) implies steady_state()

This query will be true when evaluatedagainst the modified simulation run (i.e., theone where the initial concentration of PRPP is10 times the initial concentration in the first run- PRPP1). Figure 8 illustrates how XSSYS treatssuch a query.

Persistent increase in the initial concentra-tion of PRPP does cause unwanted changes inthe steady state values of some metabolites.

In this case, if the increase in the level ofPRPP is in the order of 70% (see ref. 2), then thesystem does reach a steady state, and weexpect to see increases in the levels of IMP andof the hypoxanthine pool in a “comparable”order of magnitude. This means that

Always (PRPP = 1.7*PRPP1)implies steady_state()

will be true over the modified experimenttrace. Note the use of “always” in this TL queryto indicate the persistent increase of the PRPPvalue. For a contrast, consider the followingstatement:

Eventually(Always (PRPP = 1.7 * PRPP1)impliessteady_state()and Eventually(Always(IMP < 2 * IMP1))

and Eventually(Always(hx_pool < 10*hx_pool1)))

where IMP1 and hx_pool1 are the valuesobserved in the unmodified trace. The abovestatement turns out to be false over the modi-

282 Antoniotti et al.

Cell Biochemistry and Biophysics Volume 38, 2003

Fig. 7. The metabolic scheme of purinemetabolism in human. (Reprinted from ref. 10,where a full description and further referencesmay be found.)

Page 13: Model Building and Model Checking for Biochemical Processes · Biochemical reactions can be modeled with sets of differential equations. The classical Michaelis-Menten’s formulation

fied experiment trace. In fact, the increase inIMP is approx 6.5-fold whereas the hypoxan-thine pool increase is approx 60-fold. Figure 9illustrates how XSSYS treats such a query.

Because the above queries turn out to befalse over the modified trace, we conclude thatthe model “over-predicts” the increases insome of its products and that it should there-fore be amended.

Modes 3, 4, and “Final”

The following sequence of models (leadingto the “final” one, which is adapted from[9,10]) improves their response as comparedwith known clinical data. Again, we considera simple example from (2) where we expressthe properties being tested with our TLlanguage.

TEMPORARY PERTURBATIONS

We reused the model published in (2) andthe data generated with PLAS software with avariation. The PLAS model reaches the steadystate, and the in silico experiment shows thatwhen an initial level of PRPP is increased by50-fold, the steady state concentration isquickly absorbed by the system. The level ofPRPP returns rather quickly to the expectedsteady state values. IMP concentration levelalso rises and HX level falls before returning topredicted steady state values.

These results are tested with the same TLexpressions we described earlier. However, weneed to make a change to our model to makePRPP increase only after a certain time after thesimulation has started. This change allows usto reformulate our query as follows:

Model Building and Checking for Biochemical Processes 283

Cell Biochemistry and Biophysics Volume 38, 2003

Fig. 8. The first example discussed in the context of “Model 2.” The variable X1 is PRPP and thefact that the initial condition is changed is expressed as stated.

Page 14: Model Building and Model Checking for Biochemical Processes · Biochemical reactions can be modeled with sets of differential equations. The classical Michaelis-Menten’s formulation

Always(PRPP > 50 * PRPP1implies(steady_state()and Eventually(IMP > IMP1)and Eventually(HX < HX1)and Eventually(Always(IMP = IMP1))

and Eventually(Always(HX = HX1))

The above query may be interpreted as fol-lows: an (instantaneous) increase in the level ofPRPP will not make the system stray from thepredicted steady state, even if temporary vari-ations of IMP and HX are allowed. Figure 10shows how XSSYS responds to the new query.

CONCLUDING REMARKS

We released a preliminary version of oursoftware to the DARPA’s Biospice community(www.biospice.org). Several interesting ques-tions remain to be further explored; e.g., we areworking toward integrating some time/frequency analysis techniques into our tools tobetter discriminate different traces, and toincrementally construct a better model withoutlosing information (both numerically and sym-bolically). To get the interested reader to appre-ciate the challenges posed by this new branchof computational biology, we list the followingquestions.

284 Antoniotti et al.

Cell Biochemistry and Biophysics Volume 38, 2003

Fig. 9. In Model 2 of ref. 2, the level of PRPP is persistently raised to 1.7 of its undisturbed steadystate level. The XSSYS correctly answers the query whether the levels of IMP (variable X2) and theHX pool (variable X8) are still within acceptable bounds. Notice that the trajectories do not indicatea flaw in Model 2.

Page 15: Model Building and Model Checking for Biochemical Processes · Biochemical reactions can be modeled with sets of differential equations. The classical Michaelis-Menten’s formulation

Model Building and Checking for Biochemical Processes 285

Cell Biochemistry and Biophysics Volume 38, 2003

Reactions Models

We primarily focused on a simple ODEmodel (Differential Algebraic Equations,DAE) and narrowed this even further to amodel based on S- and XS-systems. Does thisimply that we are accommodating a signifi-cant deviation from reality? How can a sto-chastic model representing small number ofmolecules interacting pair-wise and randomlybe incorporated?

Numerical Issues

Many of the operations we propose need tobe carefully analyzed from the numerical ana-

lyst point of view to take into account errorpropagation issues.

Hybrid Systems

Certain interactions are purely discrete andafter each such interaction, the system dynamicsmay change. Such a hybrid model implies thatthe underlying automaton must be modified foreach such mode. How do these enhancementsmodify the basic symbolic model?

Spatial Models

The cellular interactions are highly specificto their spatial locations within the cell. How

Fig. 10. The in silico trace of the “final model” from ref. 2. We arbitrarily increased the level ofPRPP (variable X1) to more than 250 at time step 100. The XSSYS system correctly answers bothqueries. Because of numerical fluctuations we had to ask a less stringent question about the steadystate value of IMP (variable X2) and HX (variable X13).

Page 16: Model Building and Model Checking for Biochemical Processes · Biochemical reactions can be modeled with sets of differential equations. The classical Michaelis-Menten’s formulation

can these be modeled with richer abstractionsof automata, e.g., cellular-automata? How canwe account for dynamics because of changes tothe cell volume? The time constants associatedwith the diffusion may vary from location tolocation; how can that be modeled?

State Space

A number of interacting cells can be mod-eled by product automata. In addition to theclassical “state-explosion problem” we alsoneed to pay attention to the variable structurefrom the result of (1) cell division, (2) apopto-sis, and (3) differentiation.

Communication

How do we model the communicationamong the cells mediated by the interactionsbetween the extracellular factor and externalreceptor pairs?

Hierarchical Models

Finally, as we delve into more and morecomplex cellular processes, a clear understand-ing can only be obtained through modularizedhierarchical models. What are the ideal hierar-chical models? How do we model a populationof cells with related statistics?

ACKNOWLEDGMENTS

We thank Mike Wigler, Misha Gromov, AleCarbone, Tom Anantharaman, Vivek Mittal,Rob Lucito, Jack Schwartz, Roger Brockett,Sanjoy Mitter, Shankar Sastry, Sri Kumar, MitaDesai, David Harel, Amir Pnueli, JaneHubbard, Charlie Cantor, Yuri Lazebnick andHarel Weinstein for offering many excitingideas, useful contributions and ways to thinkabout genomes, proteomes, pathways and cel-lular processes. Many of our close colleaguesfrom NYU Bioinformatics group, Cold SpringHarbor Laboratory, and Mt. Sinai School ofMedicine have directly and indirectly con-tributed to this effort: Toto Paxia, Raoul

Daruwala, Joey Zhou, Archi Rudra, NaomiSilver, Frank Park, Chris Wiggins, VioletChang, Elizabeth Thomas, Ken Chang and JoeWest. To all of them, we are grateful. The workreported in this paper was supported by grantsfrom NSF’s Qubic program, DARPA’s Biospiceprogram, HHMI biomedical support researchgrant, the US department of Energy, the US airforce, National Institutes of Health and NewYork State Office of Science, Technology &Academic Research.

REFERENCES

1. Elowitz, M. and Leibler, S. (2000) A syntheticoscillatory network of transcriptional regula-tors. Nature 403, 335–338.

2. Voit, E.O. (2000) Computational Analysis ofBiochemical Systems. Cambridge UniversityPress, Cambridge, UK.

3. Voit, E.O. (ed). (1991) Canonical NonlinearModeling: S-System Approach to UnderstandingComplexity. Van Nostrand Reinhold, New York.

4. Savageau, M.A. (1993) Finding multiple roots ofnonlinear algebraic equations using s-systemmethodology. Appl. Math. Comput. 55, 187–199.

5. Emerson, E.A. (1990) Temporal and modallogic. In: van Leeuwen, J., ed. Handbook ofTheoretical Computer Science, vol. B. Elsevier,Amsterdam, pp. 997–1072.

6. Mishra, B., and Clarke, E.M. (1985) Hierarchicalverification of asynchronous circuits using tem-poral logic. Theoretical Computer Sci. 38, 269–291.

7. Antoniotti, M., Policriti, A., Ugel, N., andMishra, B. (2002) XS-systems: eXtented S-sys-tems and Algebraic Differential Automata forModeling Cellular Behavior. Proceedings of theInternational Conference of High PerformanceComputing, HiPC’02, Bangalore, India.

8. Guet, C.C., Elowitz, M.B., Hsing, W., andLeibler, S. (2002) Combinatorial synthesis ofgenetic networks. Science 286, 1466–1470.

9. Curto, R., Voit, E.O., Sorribas, A., and Cascante,M. (1998) Mathematical models of purinemetabolism in man. Math. Biosci. 151, 1–49.

10. Curto, R., Voit, E.O., Sorribas, A., andCascante, M. (1998) Analysis of abnormalitiesin purine metabolism leading to gout and toneurological dysfunctions in man. Biochem.l J.329, 477–487.

286 Antoniotti et al.

Cell Biochemistry and Biophysics Volume 38, 2003