design dna

Upload: purwanto

Post on 09-Mar-2016

213 views

Category:

Documents


0 download

DESCRIPTION

d

TRANSCRIPT

  • doi: 10.1098/rsif.2011.0800 published online 4 January 2012J. R. Soc. Interface

    Matthew R. Lakin, David Parker, Luca Cardelli, Marta Kwiatkowska and Andrew Phillips

    using probabilistic model checkingDesign and analysis of DNA strand displacement devices

    Supplementary datal http://rsif.royalsocietypublishing.org/content/suppl/2012/01/03/rsif.2011.0800.DC1.htm

    "Data Supplement"

    Referencesref-list-1http://rsif.royalsocietypublishing.org/content/early/2012/01/03/rsif.2011.0800.full.html#

    This article cites 27 articles, 6 of which can be accessed free

    P

  • Design and analysid

    idan

    n Aity1of

    diftheices aerfs tophngu

    be used to identify design aws and to evaluate the merits of contrasting design decisions,

    1. INTRO

    Molecular cconstruct inlevel. In paDNA showcation areamanufacturrect and roresults, inference betstrand dispfacilitate thstrand disp

    In thiscation teidentify faufocus on mto vericaof a nite-model checthe analys

    ing to theng timing.king tech-erties suchbound toing allowsthe prob-o completerobabilisticany otherhat is the

    uced to anit?. Moreeady beens from aunicationattacks foren used inr example,e use the

    as follows.t and theobabilistic

    model checking and the PRISM model checker. In 3, wepresent the results of applying probabilistic model

    *Author for cThese author

    Electronic supplementary material is available at http://dx.doi.org/10.1098/rsif.2011.0800 or via http://rsif.royalsocietypublishing.org.

    J. R. Soc. Interface

    on March 5, 2012rsif.royalsocietypublishing.orgDownloaded from Received 20 NAccepted 6 Deven on devices comprising relatively few inputs. We then demonstrate the use of these com-ponents to construct a DNA strand displacement device for approximate majority voting.Finally, we discuss some of the challenges and possible directions for applying these methodsto more complex designs.

    Keywords: DNA computing; formal verication; probabilistic model checking;DNA strand displacement

    DUCTION

    omputing is a relatively new eld that aims toformation-processing devices at the molecularrticular, molecular devices constructed usingpromise for a wide range of important appli-s, including biosensing, biomimetic moleculare and drug delivery. However, designing cor-bust DNA devices is a major challenge. Thispart, from the possibility of unwanted inter-ween molecules in the system. The DNAlacement (DSD) [1,2] has been developed toe design, simulation and analysis of DNAlacement devices.paper, we propose the use of formal veri-chniques to check the correctness of, andlty behaviour in, DNA device designs. Weodel checking, a fully automated approachtion based on the exhaustive explorationstate model. We also employ probabilisticking, which generalizes these techniques tois of probabilistic models of systems that

    exhibit stochastic behaviour, for example, owpossibility of failures or uncertainty regardiConventional (non-probabilistic) model-checniques can be used to check correctness propas molecules 1 and 2 are never simultaneouslymolecule 3, whereas probabilistic model checkverication of quantitative guarantees such asability of a strand displacement device failing twithin 20 min is at most 1026. Furthermore, pmodel checking can be used to evaluate mquantitative properties, such as performance: wexpected time for an input signal to be transdoutput signal by a strand displacement circugenerally, probabilistic model checking has alrsuccessfully applied to the analysis of systemwide range of application areas, from commprotocols such as Bluetooth [3] to pin-crackingcash machines [4]. In particular, it has also bethe domain of systems biology to analyse, focell signalling pathways [5,6]. In this paper, wprobabilistic model-checking tool PRISM [7].

    The remainder of the paper is structuredIn 2, we present DNA strand displacemenDSD programming language, and introduce pr

    orrespondence ([email protected]).s contributed equally to the study.displacement components and to investigate their kinetics. We show how our techniques candisplacementprobabilistic m

    Matthew R. Lakin1,3,, DavMarta Kwiatkowska2

    1Microsoft Research, 7 JJ Thomso2Department of Computer Science, Univers

    Oxford OX3Department of Computer Science, University

    Designing correct, robust DNA devices isunwanted interference between molecules inproposed as a design paradigm for DNA devprogramming language has been developed aing these devices to check for unwanted intthe use of probabilistic verication techniqueformance of DNA devices during the designPRISM, in combination with the DSD laovember 2011ecember 2011 1s of DNA strandevices usingodel checkingParker2,, Luca Cardelli1,d Andrew Phillips1,*

    venue, Cambridge CB3 0FB, UKof Oxford, Wolfson Building, Parks Road,3QD, UKNew Mexico, Albuquerque, NM 87131, USA

    cult because of the many possibilities forsystem. DNA strand displacement has beens, and the DNA strand displacement (DSD)means of formally programming and analys-erence. We demonstrate, for the rst time,analyse the correctness, reliability and per-

    ase. We use the probabilistic model checkerage, to design and debug DNA strand

    doi:10.1098/rsif.2011.0800Published onlineThis journal is q 2012 The Royal Society

  • (from releasing strands) and enthalpy (from formingadditional base pairs) drive the system forward [10].

    an

    2 Design and analysis of DNA M. R. Lakin et al.

    on March 5, 2012rsif.royalsocietypublishing.orgDownloaded from These increases typically result from the conversion ofactive gate structures into unreactive waste. Further-more, because DNA strand displacement relies solelyon hybridization between complementary nucleotidesequences to perform computational steps, these systemschecking to DNA strand displacement gates and systems,including a DNA strand displacement device for approxi-mate majority voting. Additional details of the methodsare presented in 4, followed by a discussion of futurework in 5.

    2. BACKGROUND

    2.1. DNA strand displacement

    DNA strand displacement [8] is a mechanism for per-forming computation with DNA molecules. Once initialspecies of DNA are mixed together, strand displacementsystems proceed autonomously [9] as increases in entropy

    x

    x

    x

    x

    x

    y

    y

    y

    y

    y

    y

    y

    x y

    y*

    y*

    x*

    x*

    y*

    y*

    x*

    x*

    t

    tt

    t

    t*t*

    t*

    t

    t t

    t

    t*

    t*

    t*

    t*

    t*

    Figure 1. Toehold-mediated DNA branch migrationrequire no additional enzymes or transcription machin-ery, which in turn allows experiments to be run usingsimple laboratory equipment.

    In most strand displacement schemes, populations ofsingle strands of DNA are interpreted as signals, whereasdouble-stranded DNA complexes act as gates, mediatingchanges in the signal populations.Within the system, thecomputational mechanism is toehold-mediated branchmigration and strand displacement [11]. At the peripheryof the system, signal populations may be connected touorophores for human-readable output, or regulatedby custom-designed aptamer molecules that interface tothe biological environment. The latter example high-lights a key strength of DNA-based computationaldevices: the ability to interface directly with biologicalsystems [12,13].

    Figure 1 presents example branch migration andstrand displacement reactions. Each letter in the gurerepresents a distinct domain (a sequence of nucleotides)and the asterisk operator (*) denotes the WatsonCrick (CG, TA) complement of a given domain.

    J. R. Soc. InterfaceShort domains (represented in colour) are known as toe-holds, while long domains (in grey) are often referred to asrecognition domains. We assume that toeholds are suf-ciently short (410 nucleotides) that they hybridizereversibly with their complements, whereas recognitiondomains are sufciently long (greater than 20 nucleo-tides) to hybridize irreversibly [11]. Each single strandis oriented from the 50 (left) end to the 30 (right) end,and each double-stranded complex consists of hybridizedsingle strands with opposite orientations. We assumethat the underlying nucleotide sequences have beenchosen such that distinct domains do not interact at all.

    In the rst reaction from gure 1, an incoming strandbinds to a gate because the t toehold domain inthe strand hybridizes with its exposed complementin the gate, producing the intermediate complex on theright-hand side. Because the incoming strand is onlyheld on by a toehold, this reaction can be reversed, caus-ing the single strand to oat away into the solution. In thesecond reaction, the x domain in the overhanging strandmatches the next domain along the double-stranded

    y

    x

    y

    x xy tt

    y*x*t* t*

    x y

    y

    y*x*

    t

    x yt t

    y t

    t* t*

    y*x*t* t*

    x y tt

    y*x*t* t*

    d strand displacement. (Online version in colour.)backbone, which means that the branching pointbetween the overhanging strand and the backbone canmove back and forth in a random walk called a branchmigration. Eventually, the random walk may completelydetach the short x strand from the gate in a strand dis-placement. This reaction is considered irreversiblebecause the invading strand is now attached to the gateby a long domain as well as a toehold. Note that if the rec-ognition domain on the strand did not match the nextdomain along the gate, then branch migration couldnot proceed, and the incoming strand would eventuallyunbind. We call such binding reactions unproductive.The third reaction is another branch migration, thoughin this case no strand is displaced because even afterthe y domain has been displaced, the rightmost strandis still attached by a toehold. The fourth reaction is a(reversible) unbinding reaction in which the rightmoststrand spontaneously unbinds because of the low bindingstrength of the toehold.

    Binding, migration and unbinding reactions such asthose illustrated in gure 1 allow signal populations

  • technique of annealing single strands, which has a higher

    [S]. The sequences L, R, L0 and R can potentiallybe empty, in which case we simply omit them. Gatesare built up by concatenating gate segments G1 andG2 along a common lower strand, written G1:G2, oralong a common upper strand, written G1::G2. Inthe graphical representation, we omit the colonsaltogether and connect the strands.

    An individual DNA species can be an upper strandkSl, a lower strand fSg or a gate G. We let D rangeover systems of such species. Multiple systems D1,D2 can be present in parallel, written as D1jD2.A domain N can also be restricted to molecules D, writ-ten new N D. This represents the assumption that thedomain (or its complement) is not used by any othermolecules outside D. We also allow module denitionsof the form X(n)=D, where n are the module par-ameters and X(m) is an instance of the module Dwith parameters n replaced by m. We assume a xedset of module denitions, which are declared at thestart of the program. The denitions are assumed tobe non-recursive, such that a module cannot invokeitself, either directly or indirectly via another module.

    All of the models discussed in this paper were createdusing the Visual DSD tool.1 This is a web-basedimplementation of the DSD language that allows net-works of strand displacement reactions to be designed,

    Table 1. Syntax of the DNA strand displacement (DSD)

    1http://lepton.research.microsoft.com/webdna

    Design and analysis of DNA M. R. Lakin et al. 3

    on March 5, 2012rsif.royalsocietypublishing.orgDownloaded from probability of producing unwanted secondary structures.In this paper, we use a variant of two-domain gates thatrelaxes the restrictions on single-strandedDNAmoleculesbut that retains the simplied gate structure with all itspractical benets to the experimentalist.

    2.2. The DNA strand displacementprogramming language

    The DSD programming language [2] provides a textualsyntax for expressing the structure of DNA species suchas those portrayed graphically in gure 1. The seman-tics of the DSD language denes a formal translationof a collection of DNA species into a system of chemicalreactions that captures the possible interactionsbetween the species. The language includes syntacticand graphical abbreviations that allow us to representa particular class of DNA molecules in a concisemanner. The class of molecules in question is thosewithout secondary structurethat is, only single-stranded DNA sequences may hang off the maindouble-stranded backbone of the molecule. This rulesout tree-like or pseudo-knotted structures, whichgreatly simplies the denition of the semantics whilestill allowing a wide variety of systems to be designed.

    The textual syntax of the DSD programminglanguage and the corresponding graphical represen-tation are presented in table 1. The syntax is denedin terms of sequences S, L, R, strands A, gates G and sys-tems D. A sequence S comprises one or more domains,which can be long domains N or short domains N^.DNA species can be single or double stranded. Asingle upper strand kSl denotes a sequence S orientedfrom left to right on the page, while a single lowerto be dynamically modied over time, and irreversi-ble strand displacement reactions such as the secondreaction from gure 1 provide a thermodynamic biastowards producing output. Combining these differentkinds of reactions allows us to construct cascades ofgates in which the output strands from one gate serveas the input strands for another. This technique hasenabled the construction of large, complex logic circuitsbased on DNA strand displacement [14].

    In this paper, we restrict our attention to a class ofgates closely related to Cardellis two-domain gatescheme [15]. Two-domain gates are a restricted class ofsystemswhere every strand consists of twodomains (a toe-hold and a long recognition domain) and gates haveno structures hanging off the main double-stranded back-bone of the complex. The initial and nal gates shown ingure 1 have this property, although the intermediatesteps do involve transient overhanging structures duringbranch migration and strand displacement steps. Thesegate structures can be thought of as one continuousstrand hybridized with a complementary strand thathas breaks at certain points. Such restricted gatestructures could be assembled by synthesizing double-stranded DNA and inserting the breaks using eitherrestriction enzymes or site-specic photocleavage tosplit the backbone of the DNA strand at the appropriatepoint. This technique should allow gates to be constructedwith a higher yield thanwould be obtained with the usualJ. R. Soc. Interfacestrand fSg denotes a sequence S oriented from rightto left on the page. A double strand [S] denotes anupper strand kSl bound to the complementary lowerstrand fS*g. A gate G is composed of double-strandedsegments of the form fLgkLl[S]kRlfRg, which rep-resents an upper strand kL S Rl bound to a lowerstrand fL S* Rg along the double-stranded region

    language, in terms of strands A, gates G and systems D.Where present, the graphical representation below isequivalent to the program code above.

  • tic temporal logics such as continuous stochastic logic(CSL) [18,19] and its extensions for reward-based prop-

    DSD code of gure 2. This also includes a denitionof single-stranded signals, where the S(N,x) moduledenotes a population of N single-stranded signals carry-ing the X domain. We assume that the t^ toehold isdened globally and shared by all gates and strands.The T(N,x,y) module represents N parallel copies ofthe transducer gate that implements the X! Y reac-tion. The body of the module denition consists oftwo populations of N complexes, and two populationsof N strands. The species present in the initial state(when N 1) are shown in gure 3a. The rst gate

    4 Design and analysis of DNA M. R. Lakin et al.

    on March 5, 2012rsif.royalsocietypublishing.orgDownloaded from erties [20]. For example, rather than verifying that theprotein always eventually degrades, using CSL allowsus to ask what is the probability that the protein even-tually degrades? or what is the probability that theprotein degrades within t hours? Reward-based proper-ties include what is the expected time that proteinsare bound within the rst t time units? and what issimulated and analysed. For the purposes of this work,we have developed additional functionality for visualDSD that allows the reaction network to be exportedas a model that can be loaded into the PRISM probabil-istic model checker for verication. Additional detailsof how reaction networks are computed in DSD areprovided in 4.

    2.3. Probabilistic model checking

    Model checking is an automated formal vericationtechnique, based on the exhaustive construction andanalysis of a nite-state model of the system being ver-ied. The model is usually a labelled state-transitionsystem, in which each state represents a possible con-guration of the system and each transition betweenstates represents a possible evolution from one congur-ation to another. The desired correctness properties ofthe system are typically expressed in temporal logics,such as computation tree logic (CTL) or linear-timetemporal logic. We omit here a precise description ofthese logics (see [16,17] for detailed coverage); instead,we give some typical CTL formulae below, along withtheir corresponding informal meanings:

    A [ G !(access1 & access2): processes 1and 2 never simultaneously access a shared resource;

    A [ F end ]: the algorithm always eventuallyterminates and

    E [ !fail U end ]: it is possible for the algor-ithm to terminate without any failures occurring.

    Once the desired correctness properties of the system havebeen formally expressed in this way, they can then be ver-ied using a model checker. This performs an exhaustiveanalysis of the system model, for each property eitherconcluding that it is satised or, if not, providing acounterexample illustrating why it is violated.

    Probabilistic model checking is a generalization ofmodel checking for the verication of systems that exhi-bit stochastic behaviour. In this case, the models thatare constructed and analysed are augmented with quan-titative information regarding the likelihood thattransitions occur and the times at which they do so.In practice, these models are typically Markov chainsor Markov decision processes. To model systems of reac-tions at a molecular level, the appropriate model iscontinuous-time Markov chains (CTMCs), in whichtransitions between states are assigned (positive, real-valued) rates. These values are interpreted as therates of negative exponential distributions.

    Properties of CTMCs are, like in non-probabilisticmodel checking, expressed in temporal logic, but arenow quantitative in nature. For this, we use probabilis-J. R. Soc. Interface3.1. Transducer gates: correctness

    We begin by considering one of the simplest reactiongates: a transducer that takes an X signal as inputand produces a Y signal as output. The gate can bethought of as implementing a unary chemical reactionX! Y. We will demonstrate the use of (non-probabi-listic) model checking to debug strand displacementsystems, by detecting a bug in a awed design for atransducer gate.

    Our initial transducer design is specied by thethe expected number of phosphorylations before reloca-tion occurs? For further details on probabilistic modelchecking of CTMCs, see [19,20]. For a description of theapplication of these techniques to the study of biologicalsystems, see Kwiatkowska et al. [21]. All of the modelsdiscussed in this paper were analysed using PRISM [7], aprobabilistic model-checking tool developed at theUniversities of Birmingham and Oxford. Additionaldetails of probabilistic model checking using PRISM areprovided in 4.

    3. RESULTS

    In this section, we present a series of case studiesthat demonstrate the application of probabilistic model-checking techniques to the design of DNA stranddisplacement systems. As mentioned previously, to doso we have extended the Visual DSD tool [2] with thecapability to generate, from DSD designs, correspondingmodel descriptions that can be directly analysed by thePRISM probabilistic model checker [7]. We will study ourmodied two-domain gate designs from 2.1, and presentanalyses of various correctness, reliability and perform-ance properties of the gates using PRISM. Finally, we willconstruct an approximate majority voting system usingthese components and show how additional approxi-mation mechanisms can be adopted in order to verifythis system.

    Figure 2. Initial transducer gate code, with additional denitionfor signal strands.

  • (cie

    Design and analysis of DNA M. R. Lakin et al. 5

    on March 5, 2012rsif.royalsocietypublishing.orgDownloaded from accepts an input signal X and the second gate producesan output signal Y. The single strands are fuel species:the rst ejects an intermediate strand from the inputgate in an irreversible reaction that prevents inputsignal X from being rejected (thereby undoing theexecution of the gate so far), whereas the secondejects the output signal Y from the output gate. Thisdesign is closely related to the two-domain designfrom [22], with the addition of a private c domainthat introduces an additional irreversible step intothe execution of the gate. The effect of this modicationis limited for the simple transducer gate but will becomemore apparent when we move on to more complicatedgate designs. Note, however, that the denition ofT(N,x,y) does not include a similar new declarationfor the a domain. This has implications for crosstalkbetween gate populations, as we shall see below.

    The expected nal species for the transducer gatedesign (whenN 1) are shown in gure 3b. The intendedeffect is to convert an incoming kt^ xl signal into kt^ yl,leaving only unreactive waste. We say that a strand orgate is reactive if it can react with some other speciespresent in the system to cause a strand to be displaced,and unreactive otherwise. In this example, the unreac-tive species are those in which all toeholds occur in

    x

    x

    x

    c.1

    c.1

    c.1

    c.1

    c.1*

    c.1*

    (a)

    (1)

    (1)

    (1)

    (1)

    (1)a* a*

    a

    a

    a a

    y

    y

    y* a*x*

    x*

    t

    t

    t

    t t

    t

    t

    t*

    t* t* t*

    t* t*

    Figure 3. (a) Initial species and (b) expected nal spedouble-stranded segments and are thus sequestered.Here, we will focus on verifying the correctness of two

    transducer gates in series. The rst gate should turn asignal X0 into X1, and then the second should turnsignal X1 into X2. Using the DSD code from gure 2,the input signal and pair of transducers are given by:S(1,x0) | T(1,x0,x1)| T(1,x1,x2).

    To formalize a correctness property to be checked byPRISM, we rst need to identify the states of the model inwhich the execution of the gates has completed succes-sfully. This is performed with the following PRISM code,which is, in a generic form, designed to be applicable tovarious different designs:

    Here, strands_reactive and gates_reactiveare pre-dened formulae (automatically generated

    J. R. Soc. Interfaceby Visual DSD) that, when evaluated in a particu-lar state of a model, return the number of reactivestrands and reactive gates in that state, respectively.The variable output gives the number of outputstrands (in this example, kt^ x2l) and N is thenumber of parallel copies of the system. So, we saythat the execution was successful when all copies ofthe gate have produced the required output and thereare no reactive gates or strands (other than outputstrands) present.

    Notice that, by denition, when the specicationall_done given above is true; no further reactionscan occur. Hence, such states of the model are deadlockstates (those with no outgoing transitions to otherstates). We specify the correctness of the systemdesign by stating that: (i) any possible deadlock statethat can be reached must satisfy all_done and(ii) there is at least one path through the system thatreaches a state satisfying all_done. These twoproperties can be represented by formulae in the (non-probabilistic) temporal logic CTL, which can be veriedby PRISM:

    x

    x

    c.1

    c.1

    (1)

    (1)

    (1)

    (1)

    (1)

    (2)

    b) a

    a

    a

    yt

    t t

    x y c.1t t t

    a*x* y* c.1*t* t* t*

    t

    x* c.1* a*a

    a*t* t* t*

    s for the transducer gate. (Online version in colour.)When we use PRISM to check these two queries,we nd that the second is true but the rst is false.In fact, we nd that there are two deadlock states inthe model, one where all_done is false and onewhere it is true. We can visualize both states usingthe Visual DSD tool, as shown in gure 4. State 2, onthe right-hand side, represents the case where thesystem has executed correctly and indeed this isthe result that we would anticipate: the state containsthe output strand kt^ x2l along with the inertgarbage left over from complete execution of the twotransducer gates. State 1, however, is incorrect: eventhough the output strand kt^ x2l is produced,we see that some constituent complexes of the trans-ducers are left in a reactive state, with exposedtoehold domains.

    When checking that the rst query above is false,PRISM also produces a counterexample in the form ofa path through the model that leads to a deadlock

  • 6 Design and analysis of DNA M. R. Lakin et al.

    on March 5, 2012rsif.royalsocietypublishing.orgDownloaded from (1)x1*x1

    x2*x2

    a*

    a

    c.2*c.2

    t* t*

    t

    t*

    tstate where all_done is not true (state 1 fromgure 4). Analysing this path reveals exactly how the

    x0 x0x0*

    c.1 ac.1* a*

    a

    a*t* t*

    t

    t*

    t t

    x0x0*

    c.1 c.1ac.1* a*

    a a

    a*t* t*

    t

    t*

    t t

    The problem arises because the ka t^l strand releasedfrom the input complex of the X0! X1 transducer can

    x1x1*

    x2x2*

    a a

    a*t*

    t

    t* t*

    t tc.2c.2*

    x1x1*

    x2 x2x2*

    a

    a*t*

    t

    t* t*

    t tc.2 c.2c.2*

    There are some subsequent reactions that tidy up asmany as possible of the species with exposed toeholds.The output strand X2 is produced, as expected, but

    (1)x1x1*

    c.2c.2* a* a*

    a a

    t* t* t*

    t t

    Figure 4. Deadlock states for the two faulty transducers in seriescolour

    J. R. Soc. Interface(1)x1*

    x1

    x0*

    x0

    a*

    ac.1

    c.1*t*

    t

    t*

    t

    t*

    tx1 c.1

    c.1* a*

    a

    a*

    a

    (1) x1 (1)

    (1)

    (1)

    x0*

    x0*

    t

    t*t*

    c.1

    c.1*

    c.1

    x0

    x0x1*x1

    a*

    a

    t

    t*

    ttc.1* a*

    a

    a*

    a(1)

    (1)

    x0* t*t*

    c.1x0

    x1*

    x1

    a*

    a

    a*

    ac.2

    c.2*

    t

    t*

    tt

    t* t*

    t t

    t*

    t

    t*

    t

    t* t*

    tx0 (1)t

    x0 (1)c.1

    c.2

    c.2 a

    (1)

    (1)

    (1)

    (1)x2

    c.2

    c.1

    a

    t (1)

    (2)

    (2)

    (2)

    t

    x2t(1)

    (a)a(b)system can fail. The rst few reactions proceed as onewould expect.

    x0x0x0*

    c.1 ac.1* a*

    a

    a*t* t*

    t

    t*

    t t

    c.1x0x0*

    c.1 ac.1* a*

    a

    a*t* t*

    t t

    t*

    now interact with the output complex of the X1! X2transducer, causing the following reactions.

    x1x1*

    x2 x2x2*

    a

    a*t*

    t

    t* t*

    t tc.2 c.2c.2*

    x1x1*

    x2x2*

    a a

    a*t*

    t t

    t* t*

    tc.2c.2*

    there are still some reactive species left at the end. Theka t^l strand from the X0! X1 transducer prematurelyactivates the X1! X2 transducer without producing the

    (1)a*

    ac.2

    c.2*

    x2

    x2*

    x1

    x1* t*

    t

    t*

    t

    t*

    t

    . (a) State 1 (error); (b) state 2 (success). (Online version in

    .)

  • We can x the crosstalk bug in the transducer

    indicator that the transducer did not execute correctly).The DSD code for N copies of the transducer pair is:

    S(N,x0)| T(N,x0,x1)| T(N,x1,x2). There areseveral different ways to compute the desired propertyusing PRISM. One simple way is to use the query:

    Design and analysis of DNA M. R. Lakin et al. 7

    on March 5, 2012rsif.royalsocietypublishing.orgDownloaded from module from gure 2 by adding an additional new adeclaration within the module denitions, as shown inthe denition of the alternative T2(N,x,y) modulein gure 5. This sufces to prevent crosstalk betweenthe populations of gates that implement the differentchemical reactions, because each population of gatesuses different domains for their private a and cdomains. We verify this using PRISM by re-running theabove-mentioned queries against the same model butwith occurrences of the T module replaced by T2 mod-ules. In each case, these designs are correct: both queriesare true.

    3.2. Transducer gates: performanceand reliability

    Next, we examine some quantitative properties of thetransducer gate designs from 3.1. Returning rst ofall to the pair of faulty transducers, we use PRISM to ana-lyse the kinetics of the system. Recall that there are twopossible outcomes once the system eventually termi-nates, one where the execution has completedsuccessfully and one where it has not. Using the PRISMtemporal logic queriesintermediate kt^ x1l signal, thereby leaving parts of thetransducers unused. This happens because of crosstalk:the two transducers share a common recognitiondomain a that allows them to interfere with eachother. In contrast, the new c declaration in the de-nition of the T(N,x,y) module from gure 2 enforcesthat the two transducers use different nucleotidesequences for their c domains. The existence of thisfaulty behaviour was pointed out in Cardelli [15] andillustrated by manually tracing a path through thesystem. Such bugs have, however, proved to be difcultto identify manually using simulation tools. Here, wedemonstrate that model checking can identify suchaws in an automatic fashion.

    Figure 5. Corrected transducer gate code, with an addi-tional new a declaration that prevents crosstalk betweendifferent gates.we can compute the probability that the transducerpair has, after exactly T seconds: (i) terminated, (ii)terminated in error; and (iii) terminated successfully.

    Figure 6 shows how these probabilities vary fordifferent values of T. We see that the erroneous out-come is more likely to occur in the early stages thanthe successful outcome. This is to be expected becausethe error in the system arises when various intermediatereaction steps are skipped. By removing the time

    J. R. Soc. Interfaceparameter T from the queries, we can use PRISM to com-pute the eventual probability of each outcome:

    As the plot in gure 6 suggests, there is in fact an equalprobability of 0.5 for each possible outcome. Intuitively,this can be explained as follows. There is a point in theexecution of the system where, as described in 3.1, it ispossible either for the reaction to proceed as intended orfor interference between gates to occur. In fact, each ofthese two conicting reactions occurs with the samerate, meaning that they are equally likely. Furthermore,each one is irreversible; so the nal outcome is effec-tively decided at this point.

    Interestingly, although a single copy of the faultytransducer pair is clearly unreliable (because it failswith probability 0.5), we can improve the overallreliability of the design by adding multiple (N) parallelcopies of the gates. Section 4 of Cardelli [15] suggeststhat, if large populations of these gates execute in par-allel, the additional strands available in the systemshould be able to unblock the partially completedgates in the erroneous deadlock state. This hypothesisis supported by evidence from individual stochasticsimulations of the system. Here, we use PRISM to performa more exhaustive analysis: we compute, for differentvalues of N, the expected percentage of gates in thenal state of the system that are still reactive (recallfrom 3.1 that a reactive gate in the nal state is an

    1.00.90.80.70.6

    terminateerror

    success

    prob

    abili

    ty

    0.50.40.30.20.1

    0 0.5 1.0 2.0 3.02.51.5T (104 s)

    Figure 6. Plot showing the probability for each possible out-come of the faulty transducer pair, after T seconds. (Onlineversion in colour.)which gives the probability that there are i reactivegates in the nal state of the system and, from this,compute the expected nal percentage.

    Figure 7 plots this value for a range of values of N.We see that the percentage of reactive gates decreaseswith increasing N, indicating, as conjectured, that run-ning more copies of the faulty gates in parallel (i.e.increasing N) means that more of the gates in thesystem will be used correctly.

  • Y Z. This is an extension of the transducer gate,which takes advantage of the fact that the extra reac-tant and product are both of the same species inorder to optimize the gate design. Figure 10 shows theinitial and expected nal states of the system for onecopy of the catalyst gate, i.e. the module instantiationC(1,x,y,z). Note that the catalyst gateC(N,x,y,z) effectively implements a catalytic reac-tion X Y! Y Z by consuming a strand Y asinput and producing a different strand Y as output.

    5

    8 Design and analysis of DNA M. R. Lakin et al.

    on March 5, 2012rsif.royalsocietypublishing.orgDownloaded from Finally, in this section, we consider performance prop-erties of DNA strand displacement systems, as opposedto the reliability properties discussed earlier. Seelig &Soloveichik [23] showed that the time for strand displace-ment circuits to execute increases linearly with thedepth of the circuit. We can verify such propertiescomputationally with our model of DNA strand dis-placement and PRISM. Using the corrected transducerdesign from gure 5, we constructed DSD models ofthe form S(1,x0)| T2(1,x0,x1)| . . .| T2(1,xf k2 1g,xk), for various values of k, which correspondto transducer chains of increasing length. Note that weonly consider a single copy of every transducer in thechain. We used the following PRISM temporal logic queryto compute the expected time to reach a state in whichall of the gates have nished executing:

    In this query, "time" denotes a simple PRISM rewardstructure (see 4.2) that assigns a reward of 1 to allstates of the model. This indicates the rate at which

    01 2 3 4 5 6

    N (number of copies of transducer pair)Figure 7. Plot showing expected percentage of leftover reac-tive gates in the nal state of the system against thenumber of parallel buggy transducersthat is, the parameterN in the system S(N,x0)| T(N,x0,x1)| T(N,x1,x2).We observe that the expected number of unreactive (i.e. cor-rectly executed) gates increases as we add more parallelcopies of the buggy transducer system. (Online version incolour.)40353025

    expe

    cted

    per

    cent

    age

    1520

    10reward will be accumulated, i.e. T units of reward foreach T seconds spent in a state. The property above-mentioned gives the expected reward accumulateduntil "all_done" rst becomes true, thus giving theexpected execution time.

    The results are shown in gure 8. We observe thatthere is indeed a linear relationship between the timeto complete the circuit and the number of transducersin the chain, which agrees with Seelig & Soloveichik[23]. Thus, we can use probabilistic model checkingto analytically investigate the kinetics of stranddisplacement circuits.

    3.3. Catalyst gates

    We now focus on a more complicated reaction gate thatmodels a chemical reaction X! Z catalysed by a thirdspecies Y, i.e. a reaction of the form X Y! Y Z.Figure 9 presents DSD code for a catalyst moduleC(N,x,y,z) that represents N copies of the reactiongate implementing the chemical reaction X Y!

    J. R. Soc. Interface4.54.03.5

    2.5

    expe

    cted

    tim

    e (

    104 s

    )

    K (size of transducer chain)

    1.5

    0.5

    0 1 2 3 4 5 6 7

    3.0

    2.0

    1.0

    Figure 8. Plot showing the expected time to terminate forchains of corrected transducer gates; that is, we vary the par-ameter k in the system S(1,x0)| T2(1,x0,x1)| . . .|T2(1,xf k2 1g ,xk). (Online version in colour.)

    Figure 9. Catalyst gate code, presenting two different gateimplementations: one that carries out garbage collectionreactions and one that does not.Even though a different strand is produced, the DNAimplementation effectively emulates the function of acatalyst by producing a strand identical to the onethat is consumedhence the population of the catalystremains constant. This is in line with the general idea ofemulating chemical reactions using DNA [24]: it is notthe exact species and reactions that are implemented,but equivalent ones.

    In this section, we will use probabilistic model check-ing to investigate the relative performance of twodifferent catalyst gate designs. In particular, we studythe effect of omitting garbage collection from thedesign, i.e. the process of tidying up intermediate speciesinto inert structures. Omitting garbage collection makesthe design simpler and cheaper to implement but, as wewill see, has an effect on its performance.

    Figure 9 also presents DSD code for a variant catalystmodule C_NoGC(N,x,y,z) that also implements Ncopies of the chemical reaction X Y! Y Z, butwith the key difference that intermediate strands dis-placed from the gate during execution are not garbage

  • b)

    (1)cx y aat t tt

    hee

    Design and analysis of DNA M. R. Lakin et al. 9

    on March 5, 2012rsif.royalsocietypublishing.orgDownloaded from collected by the gate structure. This means that someintermediate single-stranded fuels remain in solutionafter the gate has nished executing (see gure 11). Wewill quantify the effect that this has on the kinetics of sub-sequent reactions by using PRISM to compute the expectedcompletion times of catalyst gates with and without gar-bage collection. Intuitively, we would expect that theintermediate strands that are not garbage collected willaccumulate over time, gradually slowing the systemdownby providing a larger backward force in the kinetics.

    For both variants of the catalyst gate, we adapt theearlier fragment of PRISM code used to identify the statesof the model in which the gates have executedsuccessfully:

    This code is as before except that, because thereare now two output species we use output1 andoutput2 to refer to the two output signals, which areY and Z in the case of C(1,x,y,z). We no longerrequire that the only reactive strands must be theoutput strands because this is not true for the catalystgate without garbage collection (the leftover fuelstrands are reactive). Furthermore, our denition of

    c a (1)

    (a) (

    (1)

    (1)

    (1)

    (1)x*

    t

    xt

    y

    z c t

    t

    c*

    cx

    x*

    x

    z*

    z

    y*

    y

    a*

    a

    t*

    t

    t* t* t*

    t t

    (1)c*

    c

    y*

    y

    a*

    a

    a*

    a

    t*t*

    t

    t*t*

    tt

    Figure 10. (a) Initial species and (b) expected nal species for tinert structures being present among thgates_reactive must be carefully designed toensure a fair comparison between gates with and withoutgarbage collection. In the case without garbage collection(C_NoGC(N,X,y,z)), the nal gate structures actuallycontain exposed toeholds, because there are no garbagecollection reactions to seal off the nal toehold in thegate. Despite the fact that the nal gate has an exposedtoehold, the irreversible steps introduced by adding anextra domain, which was not present in the design fromCardelli [15], mean that execution of the gate is still irre-versible. Because the a and c domains in the catalystgate are both private to this gate, we can guaranteethat no other strand in the system will be able to reactwith these nished gates in a productive manner. Thus,it is reasonable to adjust our denition of gates_reac-tive in this case so that these structures are not countedas reactive.

    In the case with garbage collection (C(N,x,y,z)),the nal gate structures are indeed completely sealed

    J. R. Soc. Interfaceoff. In order to make a fair comparison of the kinetics,however, we must also designate the penultimate formof each gate structure as unreactive, that is, the struc-ture before the nal garbage collection reaction. Thisis essentially saying that, for the purposes of comput-ing the time to termination, we do not care whetherthe garbage collection reactions have actually takenplace when we decide if the system has termina-ted. Without this, it would not be possible to make afair comparison.

    We rst check that both designs satisfy the correct-ness property given earlier for transducers. Havingestablished this, we then look at the performance ofthe designs, i.e. how quickly they execute. To quantifythe effect of garbage collection on the kinetics ofthe system, we compared the behaviour of the systemsS(N,x)| S(N,y)| C(N,x,y,z) and S(N,x)|S(N,y)| C_NoGC(N,x,y,z) for different values ofN. We vary this parameter because one would expectthat the negative effects of garbage collection willonly begin to accumulate after a number of identicalgates have been executed. Re-using the PRISM temporallogic query from 3.2, we compute the expected timeuntil termination.

    The results of this analysis are presented in gure 12.

    x* c*y* a*a*t* t* t*t*

    (1)x* c*

    c

    z*

    z

    y*

    yx

    a*

    a

    t*

    t

    t*

    t

    t*t*

    tt

    catalyst gate C(1,x,y,z). Garbage collection results in onlynal species. (Online version in colour.)a (1)

    x (1)

    c (2)

    (1)yt

    (1)ztWe observe that, asN increases, the expected completiontime for the gates without garbage collection increasesfaster than for the gates with garbage collection. Thisconrms our intuition that accumulating waste strandsfrom earlier executions of the gate exerts an additionalbackward force on subsequent executions of the gate,gradually slowing the system down.

    3.4. Approximate majority

    We now use the catalyst gates from 3.3 to implement alarger systemthe approximate majority populationprotocol of Angluin et al. [25]using DNA strand displa-cement. The following chemical reactions implement theapproximate majority population protocol:

    a X Y ! Y B c B X ! X Xb Y X ! X B d B Y ! Y Y

  • )hes t

    10 Design and analysis of DNA M. R. Lakin et al.

    on March 5, 2012rsif.royalsocietypublishing.orgDownloaded from (a) (b

    c a (1)

    (1)

    (1)

    (1)

    t

    xt

    y

    z c t

    t

    (1)c*

    c

    z*

    z

    y*

    y

    a*

    a

    t*

    t

    t* t* t*

    t t

    x*

    x(1)

    c*

    c

    y*

    y

    a*

    a

    t*t*

    t

    t*t*

    tt

    Figure 11. (a) Initial species and (b) expected nal species for tgate does not perform the additional garbage collection reaction(Online version in colour.)When an X and a Y meet, reactions (a) and (b) con-vert one of them into an auxiliary species B with equalprobability. Then, when a B meets an X or Y, reactions(c) and (d) convert the B to match the species that itencountered. In this second step, the probability of a Bencountering an X as opposed to a Y depends on theinitial populations of X and Y in the system, and thisfact allows the system to amplify any excess populationof one species over the other to converge on a consensusin which all of the species are converted either to X or toY. Furthermore, it was proved by Angluin et al. [25] thatthe system converges with high probability to the popu-lation that was initially in the majority, if the originalmargin is sufciently large.

    Note that the above-mentioned reactions all involvecatalysts, like those from 3.3. Thus, we can use the -catalyst gates discussed therein to implement thissystem of chemical reactions in the DSD language, asshown in the code in gure 13. In order to reduce thenumber of reactions and to make model checkingmore tractable, we will use a catalyst gate without

    9.08.07.06.05.04.03.02.01.0

    0 2 4N (number of copies of catalyst gate)

    6 8

    expe

    cted

    tim

    e (

    104

    s)

    Figure 12. Plot of expected time to completion for N parallelcopies of catalyst gates with (solid line with lled circles) andwithout (solid line with lled triangle) garbage collection.(Online version in colour.)

    J. R. Soc. Interfacec (2)

    (1)at

    (1)yt

    (1)zt

    (1)x t

    (1)

    (1)

    x* c*

    cx

    y*y

    a*

    a

    t*t

    t*t

    t*t*t

    c*

    c

    z*

    z

    y*

    y

    a*

    a

    t* t*

    t

    t*t*

    tt

    alternative catalyst gate C_NoGC(1,x,y,z). Note that thishat produced the completely inert structures seen in gure 10.garbage collection. Even with this simplication, how-ever, the fact that there are cycles in the chemicalreactions means that the system can potentially use allof the available fuel, which causes the state space togrow enormously. To counteract this effect, we modifythe gate designs from 3.3 further, using the constantkeyword of the DSD language. This keyword declaresthat the population of a particular species should beheld constant across all reactions in the system, eventhose reactions where it is produced or consumed. Thisapproximation can be used when the species is inexcess, allowing us to abstract away from depletion offuels and accumulation of waste, in cases when thesespecies are in very high concentrations. This helps us togreatly restrict the size of the state space in the PRISMmodel, by essentially collapsing any states that differonly by their populations of fuels and/or waste products.

    Because the population protocol is not guaranteed toform a consensus around the species that was initially inthe majority, we use PRISM to compute the probabilitiesof reaching each consensus state (X or Y ) in the DNA

    Figure 13. DNA strand displacement (DSD) code for a cata-lyst gate, which extends the C_NoGC gate from gure 9 byusing the constant keyword from the DSD language toabstract away from population changes due to accumulationof waste and depletion of fuel.

  • Design and analysis of DNA M. R. Lakin et al. 11

    on March 5, 2012rsif.royalsocietypublishing.orgDownloaded from implementation, given different initial populations ofthe species. We constructed PRISM models for variousinput populations and computed the probabilities ofending up with the two possible consensus values.Care must be taken because, even when a consensushas been achieved, the resulting signal strands canstill speculatively bind to the remaining fuel gates,even though the gate will never execute fully (because

    1.0

    0.5

    05

    43

    init X

    prob

    abili

    ty o

    f cho

    osin

    g X

    init X2

    1 1 23 4

    5

    Figure 14. Surface plot which shows the probability of reach-ing a consensus of X, for various initial populations ofX and Y. (Online version in colour.)

    Table 2. Probability of reaching a consensus of X, for variousinitial populations of X and Y.

    Y 1 Y 2 Y 3 Y 4 Y 5X 1 0.5000 0.2531 0.1290 0.0658 0.0334X 2 0.7468 0.5000 0.3156 0.1917 0.1131X 3 0.8709 0.6843 0.5000 0.3462 0.2299X 4 0.9341 0.8082 0.6537 0.5000 0.3651X 5 0.9665 0.8869 0.7699 0.6349 0.5000we have reached a consensus, there are no differentinput species to bind and complete the reaction process)and the strand must therefore eventually unbind again.We use PRISM variables output_x and output_y,which return the number of individuals in the two con-sensus states, taking these transient structures intoaccount in the denitions. The required queries inPRISM are then:

    As a sanity check, we used the following query:

    to check that the probability of eventually ending up ineither consensus state is 1 in all cases. Thus, it sufcesto study just one of the two outcomes. Figure 14shows a plot of the probability of nishing in consensusstate X for initial populations of X and Y rangingbetween 1 and 5. Table 2 shows the same values. If Xor Y equals zero, then no reactions can take place,and for total initial populations exceeding 10, themodel becomes too large to handle. We observe that,if the initial populations of X and Y are the same,then the system is equally likely to form a consensusaround either X or Y. This is illustrated by the contourline at the 0.5 level in gure 14. However, as the initial

    J. R. Soc. Interfaceexcess of one species increases, the probability of endingup in that consensus state increases rapidly, so thatwhen all but one of the initial species are X (say),then the probability of forming a consensus around Xis almost one, as we would expect.

    Theoretical results from Angluin et al. [25] showthat the correct consensus is achieved with high prob-ability if the margin between the initial numbers of Xand Y is above

    N

    plogN , for large total initial popu-

    lation N. So, in gure 15, we also plot the probabilityof reaching a consensus of X against (X02 Y0 )/N,which is the initial difference between X and Y relativeto the total initial population N X0 Y0 , for N vary-ing between 4 and 10. Here, we see more clearly how theprobability grows with the increase in the size ofthe margin jX0 2 Y0j/N. Figure 15 also illustrateshow, for larger population sizes, we see an increasinglyclear threshold above which consensus is achievedwith high probability.

    1.0

    45678910

    0.90.80.70.60.50.40.30.20.1

    01.0 0.5 0

    (init Xinit Y)/N0.5 1.0

    Figure 15. Probability of reaching a consensus of X, plottedagainst (X02 Y0 )/N, which is the difference between theinitial populations of X and Y, relative to the total initialpopulation N X0 Y0 . Results are shown for N 4 . . . 10.Finally, it is worth remarking that an analysis of thissystem in terms of ordinary differential equations(ODEs) has a number of limitations. In particular,the simulation of the ODE model gives us the averagebehaviour of the system, but not the probability of com-puting the majority. Specically, because the system isinherently stochastic, there is always a chance that theminority wins, whereas in the ODE model the majorityalways wins. Furthermore, in cases where both specieshave equal initial populations the ODE model neverconverges to a majority, while the stochastic modeldoes converge. Thus, to correctly analyse the behaviourof the system, a detailed analysis of the chemical masterequation is required, which is analytically unsolvable inthis case. Another alternative is to simply run largenumbers of stochastic simulations, though this is alsointractable, owing to the very low probability of erroras the number of molecules increases.

    Thus, probabilistic model checking allows us to ana-lyse properties that would be considerably more difcultto obtain using other techniques. Essentially, probabil-istic model checking can be viewed as an automated

  • N*LS*

    12 Design and analysis of DNA M. R. Lakin et al.

    on March 5, 2012rsif.royalsocietypublishing.orgDownloaded from L L1

    R

    SR2

    ' 'S1 L N R

    L1 L

    N*

    N

    R

    S1*

    L1

    SL

    R1

    R

    '

    ' '

    ' 'way to solve the chemical master equation for small butnon-trivial model sizes. In particular, it allows us todetermine the distribution of the system, i.e. the indi-vidual possible outcomes of the system and theircorresponding probabilities.

    4. METHODS

    4.1. Model simulation in DNA stranddisplacement

    The syntax of DSD described in 2 interprets systems aswell-mixed solutions: hence we impose a standard setof structural equivalence rules (see Lakin & Phillips [2]for more details). In addition, we assume that nolong domain and its complement are simultaneouslyunbound anywhere in the system. This well-formednessrestriction ensures that two species can only interactwith each other via complementary toeholds.

    The rules in gure 16 present reduction rules thatformalize DNA interactions in the DSD language. Thearrows are labelled with rates that are used to

    S1 S S2

    LS1* S2*S*

    R

    R

    S1 S

    LS1* S*

    L L2 S

    R

    R2

    R

    ' '

    Figure 16. Elementary reduction rules of the DSD programmingsequence S and fst(S) denote its rst domain. We also let N+ anof a toehold N^. We assume that fst(R2)= fst(S2) for rule (Rgiven sequence and that rules (RM) and (RD) are mutually exclu

    J. R. Soc. InterfaceNN*

    R

    R

    R

    S

    LS*

    NN*

    L

    S1

    L1 S1* L

    L1 LR1 R

    L

    L1S 2

    '

    ' '

    '

    ' ' 'parametrize an exponential rate distribution. Rules(RB) and (RU) dene reversible toehold-mediatedbinding and unbinding reactions between a singlestrand and a double-stranded gate complex. Rule(RC) allows complementary toeholds to hybridize ifthey are present in opposite overhanging strands of agate. This situation could arise when an incomingstrand contains multiple toeholds that match up toexposed complementary toeholds in the gate. We donot provide versions of rules (RB), (RU) and (RC),where the complementary domain is not a toeholdbecause our well-formedness assumption ensures thatthe only complementary domains exposed simul-taneously are toeholds. Rules (RM) and (RD) presentprimitive branch migration and strand displacementreactions, respectively. Rule (RD) can be thought ofas the special case of (RM) where there are sufcientlymany matching domains for the branch migration tomake it right to the end of the strand. The additionalassumption that fst(R2)= fst(S2) in rule (RM)ensures that we only derive maximal branch migrationreactions and also ensures that rules (RM) and (RD)

    L2 S R2

    R

    S1 S

    LS1* S*

    L R

    S1 S S2

    LS1* S2*S*

    R

    R R

    ' '

    language. We let S denote the migration rate of a domaind N 2 denote the binding and unbinding rates, respectively,M). This ensures that branch migration is maximal along asive. (Online version in colour.)

  • have an underlying CTMC semantics. Translationsfrom stochastic process algebra such as PEPA and the

    Design and analysis of DNA M. R. Lakin et al. 13

    on March 5, 2012rsif.royalsocietypublishing.orgDownloaded from are mutually exclusive. As an example, the followingsteps illustrate the application of the sequence of rules(RB), (RD) and (RC) on an initial system kt^ x u^ljft^*g[x]fu^*g

    xx x

    x xx

    uu

    ut

    txt

    x* u*t*x*t*

    xt u

    x* u*t*x* u*

    u*t*

    The rules of gure 16 dene the fundamental DNAreactions that we consider in this paper. We do not con-sider reactions that would form chain polymers orcomplex secondary structures, or leak reactions. To com-plete the denition of reduction, however, we must alsoprovide additional contextual rules that allow theseprimitive steps to occur in richer contexts. For example,rule (RB) allows a strand to bind to the bottom-rightoverhanging strand of a segment, whereas in reality itcould bind to any of the overhangs (provided that thereis a complementary toehold available). We also requirethat these reactions can take place partway along morecomplex molecules. Thus, we require each reductionrule to be closed under rotation and mirroring of individ-ual DNA species about the axis of the double-strandedbackbone, and under concatenation of additionalsegments onto either end of the gate complex involvedin a particular reaction.We also lift the rules to transformsystems involving parallel compositions and name restric-tions in the standard way.We refer the reader to Lakin &Phillips [2] for further details.

    The reduction rules in gure 16 provide a detailedmodel of DNA strand displacement reactions betweensingle strands and gate complexes. In fact, for our pur-poses a simpler model would sufce. Hence, we use amerged reduction semantics for the DSD language, inwhich we model toehold binding reactions as having anite rate and all other reactions as instantaneous. Inparticular, we consider gates to be equivalent up tobranch migration; so if two gates differ only by appli-cations of rule (RM), we treat them as if they werethe same gate. These simplications are based on theassumption that toehold binding steps are sufcientlyslow to be rate limiting, which is valid in the limit oflow concentration.

    We write D)N D to denote a merged reduction fromD to D. Formally, D)N D means that D !RB;N D andD !RX1;r1 !RXk ;rk D both hold, for some D, and wherenone of the (RXi) rules are repeat occurrences of (RB).Furthermore, because we are assuming that branchmigration is included in the structural equivalencerelation, there should be no occurrences of (RM) either.In order to improve efciency and reduce the size of theresulting model, we ignore unproductive reactions wherea strand binds onto a gate but cannot perform any sub-sequent reaction other than an unbinding. This

    corresponds to merged reductions of the form D)N D.The merged reduction relation dened earlier is the

    Innite semantics from Lakin & Phillips [2]. This isthe most abstract model of DNA strand displacementinteractions dened in that paper, and allows us toJ. R. Soc. Interfacestochastic calculus [28] have also been developed.Formally, letting R0 denote the set of non-negative

    reals and AP denote a xed, nite set of atomic prop-ositions used to label states with properties of interest,a CTMC is a tuple (S,R,L) where:

    S is a nite set of states; R:(S S)! R0 is a transition rate matrix; L:S! 2AP is a labelling function which associates

    each state with a set of atomic propositions.

    The transition rate matrix R assigns rates to each pairof states, which are used as parameters of the exponen-tial distribution. A transition can only occur betweenstates s and s0 if R(s,s0). 0 and, in this case, the prob-ability of the transition being triggered within t timeunits is 1 2 e2 R(s,s

    0)t. Typically, in a state s, there ismore than one state s0 for which R(s,s0). 0; this isknown as a race condition and the rst transition tobe triggered determines the next state. The time spentin state s before any such transition occurs is exponen-tially distributed with the rate Es Ps0[S Rs; s0,called the exit rate. The probability of moving to states0 is given by R(s,s0)/E(s).produce more compact models for verication. In thispaper, we phrase all models in terms of these mergedrules, which allow us to translate a collection of DNAmolecules into a system of chemical reactions for analysis.Under this condensed semantics, the four reactions fromgure 1 are combined into the following single reaction.

    x

    x*

    y x y

    y*t* t*

    t t

    x x

    x*

    y y

    y*t* t*

    t t

    Where possible (i.e. where models do not become toolarge for verication), we have also re-run the exper-iments in this paper using models obtained withDSDs Default semantics, observing an overall slowdown in reaction time, but otherwise identical patternsin behaviour. For all simulations and analysis, thekinetic rates of toehold binding, toehold unbindingand branch migration were based on the experimen-tal measurements of Zhang & Winfree [26]. Theserates were in turn used to derive the correspondingsimulation and analysis times, in seconds.

    4.2. Probabilistic model checking in PRISM

    PRISM [7] is a probabilistic model checking tool devel-oped at the Universities of Birmingham and Oxford.It provides support for several types of probabilisticmodels, including CTMCs, which we use here. Modelsare specied in a simple, state-based language basedon guarded commands. Support for several otherhigh-level model description languages has also beenmade available through language-level translations tothe PRISM modelling language. For example, PRISM hasthe ability to import SBML [27] specications, which

  • accumulated each time a transition occurs and can be

    had to approximate the populations of fuel and wastespecies in the model as constant in order to preventthe state space from becoming too large to generate.This effect can be mitigated to an extent, for example,by careful gate design: our modied two-domain gateswere deliberately constructed to minimize the numberof asynchronous steps required for garbage collec-tion and sealing off used gates, which greatly expandthe state space. We also used a high level of abstrac-tion (the Innite semantics of DSD [2]) to reduce thenumber of reactions in the model as far as possible.However, even with the cleverest gate design, the

    14 Design and analysis of DNA M. R. Lakin et al.

    on March 5, 2012rsif.royalsocietypublishing.orgDownloaded from used to compute, e.g. the number of protein bindingsover a particular time period.

    PRISM can then be used to specify and verify a rangeof properties of CTMCs, including those expressedin the logic CSL and the reward-based extension ofKwiatkowska et al. [20]. The underlying computationperformed to apply probabilistic model checkinginvolves a combination of graphtheoretical algorithms(e.g. to construct and explore models) and numericalcomputation methods (e.g. to calculate probabilitiesor reward values). For the latter, PRISM typicallysolves linear equation systems or performs transientanalysis. Owing to the size of the models that need tobe handled, it uses iterative methods rather thandirect methods. For solutions of linear equation sys-tems, it supports a range of well-known techniques,including the Jacobi, Gauss-Seidel and SOR (successiveover-relaxation) methods; for transient analysis ofCTMCs, it employs uniformization.

    The PRISM tool also offers a graphical user interfacewith a range of functionality. First, it facilitates con-struction and editing of PRISM models and propertyspecications. In addition, it includes a graph-plottingcomponent for visualization of numerical results fromprobabilistic model checking. It also provides a tool toperform manual execution and debugging of proba-bilistic models, based on an underlying discrete-eventsimulation engine. Another use of this engine is togenerate approximate solutions for the numerical com-putations that underlie the model checking process,by applying Monte Carlo methods and sampling.These techniques offer increased scalability, at theexpense of numerical accuracy. The main strength ofPRISM, though, and the probabilistic model checkingtechniques that it implements, is the ability to computequantitative properties exactly, based on exhaustivemodel exploration and numerical solution.

    5. DISCUSSION

    We have introduced a modied version of Cardellistwo-domain gate designs [22], which is amenable to ver-ication yet retains the key simplication of the two-domain design: initial species contain no overhangsand can thus be constructed by inserting breaks intoone strand of a simple double-stranded DNA complex.This should give higher yields compared with simpleannealing of single strands in a test tube.

    We have demonstrated that probabilistic (and non-probabilistic) model checking can be used to verify aA CTMC can be augmented with rewards, attachedto states and/or transitions of the model. Formally, areward structure for a CTMC is a pair (c,C) where

    c:S! R0 is a state reward function; C:(S S)! R0 is a transition reward function.State rewards can represent either a quantitativemeasure of interest at a particular time instant (e.g.the number of phosphorylated proteins in the system)or the rate at which some measure accumulates overtime (e.g. energy dissipation). Transition rewards areJ. R. Soc. Interfacewide range of properties of individual circuit com-ponents constructing using this design. We showedthat the PRISM model checker can detect bugs owingto crosstalk between gates, analyse quantitative proper-ties such as reliability and performance, and computethe probability of different possible outcomes of thegates execution. These techniques can be particularlyuseful during the initial stages of gate design. Evenmodel checking a single gate executing in isolation, asin 3.2, can help us to identify errors in the designthat would be difcult to quantify using simulation-based methods. Although multiple simulation runscan be used to approximate the probability of a givenerror, performing large numbers of simulations can betime-consuming, particularly in cases such as gure 7,where the error probability is low for large number ofgates. More importantly, model checking can be usedto identify the source of an error, by providing a specicexecution trace of the behaviour that leads to its occur-rence. As illustrated in 3.3, we can also use modelchecking to investigate the potential of differentsystem designs, even when analysed using relativelysmall numbers of inputs. Finally, we have shown howapplying additional abstractions to the populations offuel and waste species can allow us to scale up to verify-ing more complicated systems, such as the approximatemajority population protocol [25].

    Nonetheless, model checking has its limitations. Asthe species populations grow, the number of reactioninterleavings explodes, which causes problems fornaively scaling up to larger systems. Table 3 shows stat-istics for a selection of the largest models that we usedto generate the results in 3 (model checking was runon a 2.80 GHz Dell PowerEdge R410 with 32 GB ofRAM). The table shows the size of each model andthe time required to check a single property. Asexpected, model sizes grow rapidly as population sizesare increased, meaning that models larger than thoseshown in the table could not be analysed. In 3.4, we

    Table 3. Model checking statistics: model sizes and run-times.

    example parameters states time (s)

    buggy transducers (3.2) N 4 57 188 4N 5 284 641 26N 6 1 160 292 145

    approx. majority (3.4) X 3,Y 5 240 286 266X 4,Y 5 674 066 860X 5,Y 5 1 785 250 2602

  • eral strand displacement schemes where strands can

    Design and analysis of DNA M. R. Lakin et al. 15

    on March 5, 2012rsif.royalsocietypublishing.orgDownloaded from sheer number of interleavings will eventually becometoo great.

    One key challenge is to extrapolate the results frommodel checking relatively small systems to systemswith higher numbers of molecules. Being able to ident-ify design aws in individual system components, suchas the buggy transducer gate in 3.1, is already valuablebecause the aws are still likely to occur when the com-ponent is present in larger numbers or is part of a morecomplex design. On the other hand, our full-systemmodel checking does not verify interactions with anarbitrary environment. For example, the buggy trans-ducer gate would appear to work correctly whenmodel-checked in isolationthe unwanted crosstalkonly becomes apparent when two gates are model-checked together. To help address this, we can selec-tively model check a given gate design with all of theremaining gates in the system, in order to identifypossible interferences.

    For quantitative properties, such as performance orreliability, we showed in 3.2 that it is already possibleto make comparisons between gate designs using rela-tively small numbers of molecules. Furthermore, forcertain categories of circuits, such as those involving loca-lized strands tethered to the surface of DNA origami [29],the internal behaviour of each origami circuit can be ana-lysed independently, and then used to accurately predictthe behaviour of potentially millions of circuits in sol-ution. This is because localization signicantly reducescross-talk between circuits, allowing them to be accu-rately analysed in modular and scalable ways. We alsonote that, in the context of cell signalling pathways,PRISM has been used successfully to evaluate regulationmechanisms for the broblast growth factor pathway [6]:behavioural predictions from a PRISM model over smallpopulation numbers were later validated experimen-tally [30]. In this paper, we were able to reproducebehavioural trends previously analysed in a theoreticalsetting [23,25]. In the future, we plan to investigate exper-imental validation of our analysis techniques for DNAstrand displacement circuit designs.

    Other important areas for research include develop-ing techniques to further improve the scalability ofprobabilistic model checking on DNA designsforexample, through the construction of abstractions orby analysing systems in a compositional manner. Prom-ising directions for the former include sliding windowabstractions [31], which optimize the analysis of tem-poral system properties by restricting analysis to aparticular subset of the state space for each phase ofits evolution, and the stochastic hybrid model of [32]for analysing systems in which some populations arepresent in small numbers and others in large numbers.Abstractions will also become essential when modellingTuring-powerful computation with DNA strand displa-cement, because the corresponding reaction network isof potentially unbounded size. In this case, a notion ofdynamically generated reactions is needed, as discussedin Lakin & Phillips [2].

    Regarding compositional techniques, it may bebenecial to consider stochastic Petri nets [33], whichare an alternative means of representing the behaviourof strand displacement systems, and have alreadyJ. R. Soc. Interfacecontain arbitrary domains. This insight could allow usto relate every gate complex with a nite-state automa-ton describing its possible interactions with theenvironment, which could form the basis for a compo-sitional verication technique. Just as we used generictemporal logic formulae to characterize correct nalstates of the reaction gates examined in 3, for compo-sitional model checking we would need to identifytemporal logic formulae that characterize validsequences of interactions between the gate and itsenvironment. We would hope to prove that, if a particu-lar set of gates all satisfy those formulae, then anycascade comprising just those gates would be correctin some sense. Such techniques will become increasinglyimportant as larger and more complex strand displace-ment systems are constructed, such as Qian &Winfrees [14] four-bit square root circuit.

    This work is partially supported by ERC Advanced GrantVERIWARE. We are grateful to the anonymous referees fortheir helpful comments.

    REFERENCES

    1 Phillips, A. & Cardelli, L. 2009 A programming languagefor composable DNA circuits. J. R. Soc. Interface 6,S419S436. (doi:10.1098/rsif.2009.0072.focus)

    2 Lakin, M. R. & Phillips, A. 2011 Modelling, simulatingand verifying Turing-powerful strand displacement sys-tems. In DNA computing and molecular programming:17th Int. Conf., DNA 17, 1923 September 2011 (edsL. Cardelli & W. M. Shih), pp. 130144. Pasadena, CA,USA. Springer Lecture Notes in Computer Science, no.6937. Berlin, Germany: Springer.

    3 Duot, M., Kwiatkowska, M., Norman, G. & Parker, D.2006 A formal analysis of Bluetooth device discovery.Int. J. Softw. Tools Technol. Trans. 8, 621632. (doi:10.1007/s10009-006-0014-x)

    4 Steel, G. 2006 Formal analysis of pin block attacks.Theor. Comput. Sci. 367, 257270. (doi:10.1016/j.tcs.2006.08.042)

    5 Calder, M., Gilmore, S. & Hillston, J. 2006 Modelling theinuence of RKIP on the ERK signalling pathway usingthe stochastic process algebra PEPA. In Transactions onbeen applied to systems and synthetic biology [34].In this approach, places correspond to DNA speciesand transitions to the chemical reactions betweenthem. Previous work has explored compositionalmodel checking of (non-stochastic) modular Petrinets [35] composed by transition sharing. In fact, inthe context of Petri net-based models for strand displa-cement systems, it would be advantageous to considercompositions based on both sharing of transitions andof places.

    We believe that advances in abstraction techniquesand compositional model checking will be vital inorder to apply model checking to larger DNA stranddisplacement systems. In fact, the two-domain schemeis an excellent framework for research into compo-sitional verication of strand displacement circuitsbecause the restricted syntax makes it straightforwardto compute, for any gate complex, the set of all single-stranded species that could interact with the complex.This is a much more challenging problem for more gen-

  • computational systems biology VII, vol. 4230 (edsC. Priami, A. Ingolfsdottir, B. Mishra & H. R. Nielson),pp. 123. LNBI. Berlin, Germany: Springer.

    6 Heath, J., Kwiatkowska, M., Norman, G., Parker, D. &Tymchyshyn, O. 2008 Probabilistic model checking ofcomplex biological pathways. Theor. Comput. Sci. 319,239257. (doi:10.1016/j.tcs.2007.11.013)

    7 Kwiatkowska, M., Norman, G. & Parker, D. 2011 PRISM

    22 Cardelli, L. In press. Two-domain DNA strand displace-ment. Math. Struct. Comput. Sci.

    23 Seelig, G. & Soloveichik, D. 2009 Time-complexity of mul-tilayered DNA strand displacement circuits. In DNAcomputing and molecular programming: 15th Int. Conf.,DNA 15, Fayetteville, AR, USA, 811 June 2009, RevisedSelected Papers, vol. 5877 (eds R. Deaton & A. Suyama).Lecture Notes in Computer Science, pp. 144153. Berlin,

    16 Design and analysis of DNA M. R. Lakin et al.

    on March 5, 2012rsif.royalsocietypublishing.orgDownloaded from Proc. 23rd Int. Conf. on Computer Aided Verication(CAV11), Springer Lecture Notes in Computer Science,vol. 6806, pp. 585591. Berlin, Germany: Springer

    8 Seelig, G., Soloveichik, D., Zhang, D. Y. & Winfree, E.2006 Enzyme-free nucleic acid logic circuits. Science314, 15851588. (doi:10.1126/science.1132493)

    9 Green, S. J., Lubrich, D. & Turbereld, A. J. 2006 DNAhairpins: fuel for autonomous DNA devices. Biophys. J.91, 26992675. (doi:10.1529/biophysj.106.084681)

    10 Zhang, D. Y. & Seelig, G. 2011 Dynamic DNA nanotech-nology using strand-displacement reactions. Nat. Chem. 3,103113. (doi:10.1038/nchem.957)

    11 Yurke, B. & Mills Jr, A. P. 2006 Using DNA to powernanostructures. Genet. Program. Evol. Mach. Arch. 4,111122. (doi:10.1023/A:1023928811651)

    12 Benenson, Y., Paz-Elizur, T., Adar, R., Keinan, E.,Livneh, Z. & Shapiro, E. 2001 Programmable and auton-omous computing machine made of biomolecules. Nature414, 430434.

    13 Venkataraman, S., Dirks, R. M., Ueda, C. T. & Pierce,N. A. 2010 Selective cell death mediated by small con-ditional RNAs. Proc. Natl Acad. Sci. USA 107, 16 77716 782. (doi:10.1073/pnas.1006377107)

    14 Qian, L. & Winfree, E. 2011 Scaling up digital circuit com-putation with DNA strand displacement cascades. Science332, 11961201. (doi:10.1126/science.1200520)

    15 Cardelli, L. 2010 Two-domainDNA strand displacement. InDevelopments in computationalmodels (DCM2010), vol. 26(eds S. B. Cooper, E. Kashe & P. Panangaden). ElectronicProc. in Theoretical Computer Science, pp. 4761

    16 Clarke, E., Grumberg, O. & Peled, D. 1999 Model check-ing. Cambridge, MA: MIT Press.

    17 Baier, C. & Katoen, J. P. 2008 Principles of model check-ing. Cambridge, MA: MIT Press.

    18 Aziz, A., Sanwal, K., Singhal, V. & Brayton, R. 2000Model-checking continuous time Markov chains. ACM Trans.Comput. Logic 1, 162170. (doi:10.1145/343369.343402)

    19 Baier, C., Haverkort, B., Hermanns, H. & Katoen, J. P.2003 Model-checking algorithms for continuous-timeMarkov chains. IEEE Trans. Softw. Eng. 29, 524541.(doi:10.1109/TSE.2003.1205180)

    20 Kwiatkowska, M., Norman, G. & Parker, D. 2007 Stochas-tic model checking. In Formal methods for the design ofcomputer, communication and software systems: perform-ance evaluation (SFM07), vol. 4486 (eds M. Bernardo &J. Hillston). Lecture Notes in Computer Science (tutorialvolume), pp. 220270. Berlin, Germany: Springer.

    21 Kwiatkowska, M., Norman, G. & Parker, D. 2010Symbolic systems biology. In Probabilistic model checkingfor systems biology (ed. M. Sriram Iyengar), pp. 3159.Sudbury, MA: Jones and Bartlett.J. R. Soc. Interface24 Soloveichik, D., Seelig, G. &Winfree, E. 2010 DNA as a uni-versal substrate for chemical kinetics. Proc. Natl Acad. Sci.USA 107, 53935398. (doi:10.1073/pnas.0909380107)

    25 Angluin, D., Aspnes, J. & Eisenstat, D. 2008 A simplepopulation protocol for fast robust approximate majority.Distrib. Comput. 21, 87102. (doi:10.1007/s00446-008-0059-z)

    26 Zhang, D. Y. & Winfree, E. 2009 Control of DNA stranddisplacement kinetics using toehold exchange. J. Am.Chem. Soc. 131, 17 303 17 314.(doi:10.1021/ja906987s)

    27 Hucka, M. et al. 2003 The systems biology markuplanguage (SBML): a medium for representation andexchange of biochemical network models. Bioinformatics9, 524531. (doi:10.1093/bioinformatics/btg015)

    28 Priami, C., Regev, A., Shapiro, E. & Silverman, W. 2001Application of a stochastic name-passing calculus torepresentation and simulation of molecular processes. Inf.Process. Lett. 80, 2531. (doi:10.1016/S0020-0190(01)00214-9)

    29 Chandran, H., Gopalkrishnan, N., Phillips, A. & Reif,J. H. Localized hybridization circuits. In DNA computingand molecular programming, 17th Int. Conf. DNA 17,Pasadena, CA, USA, 1923 September 2011, vol. 6937(eds L. Cardelli & W. M. Shih). Lecture Notes in Compu-ter Science, pp. 6483. Berlin, Germany: Springer.

    30 Sandilands, E., Akbarzadeh, S., Vecchione, A., McEwan,D. G., Frame, M. C. & Heath, J. K. 2007 Src kinase modu-lates the activation, transport and signalling dynamics ofbroblast growth factor receptors. EMBO Rep. 8, 11621169. (doi:10.1038/sj.embor.7401097)

    31 Wolf, V., Goel, R., Mateescu, M. & Henzinger, T. 2010Solving the chemical master equation using slidingwindows. BMC Syst. Biol. J. 4, 42. (doi:10.1186/1752-0509-4-42)

    32 Henzinger, T., Mateescu, M., Mikeev, L. & Wolf, V.2010 Hybrid numerical solution of the chemical masterequation. In Proc. 8th Int. Conf. on ComputationalMethods in Systems Biology (CMSB10), pp. 5565.New York, NY: ACM.

    33 Goss, P. J. E. & Peccoud, J. 1998 Quantitative modeling ofstochastic systems in molecular biology by using stochasticPetri nets. Proc. Natl Acad. Sci. USA 95, 67506755.(doi:10.1073/pnas.95.12.6750)

    34 Heiner, M., Gilbert, D. & Donaldson, R. 2008 Petri netsfor systems and synthetic biology. In Proc. SFM 2008,vol. 5016 (eds M. Bernardo, P. Degano & G. Zavattaro),Lecture Notes in Computer Science, pp. 215264. Berlin,Germany: Springer.

    35 Christensen, S. & Petrucci, L. 2000 Modular analysis ofPetri nets. Comput. J. 43, 224242. (doi:10.1093/comjnl/43.3.224)4.0: verication of probabilistic real-time systems. In Germany: Springer.

    Design and analysis of DNA strand displacement devices using probabilistic model checkingIntroductionBackgroundDNA strand displacementThe DNA strand displacement programming languageProbabilistic model checking

    ResultsTransducer gates: correctnessTransducer gates: performance and reliabilityCatalyst gatesApproximate majority

    MethodsModel simulation in DNA strand displacementProbabilistic model checking in prism

    DiscussionThis work is partially supported by ERC Advanced Grant VERIWARE. We are grateful to the anonymous referees for their helpful comments.REFERENCES