learning and detecting emergent behavior in networks of...

Learning and Detecting Emergent Behaviorin Networks of Cardiac Myocytes

R. Grosu1, E. Bartocci1,2, F. Corradini2,E. Entcheva3, S.A. Smolka1, and A. Wasilewska1

1 Department of Computer Science, Stony Brook UniversityStony Brook, NY 11794-4400, USA

2 Department of Mathematics and Computer Science, University of CamerinoCamerino (MC), I-62032, Italy

3 Department of Biomedical Engineering, Stony Brook UniversityStony Brook, NY 11794-8181, USA

Abstract. We address the problem of specifying and detecting emer-gent behavior in networks of cardiac myocytes, spiral electric waves inparticular, a precursor to atrial and ventricular fibrillation. To solve thisproblem we: (1) Apply discrete mode-abstraction to the cycle-linear hy-brid automata (CLHA) we have recently developed for modeling thebehavior of myocyte networks; (2) Introduce the new concept of spatial-superposition of CLHA modes; (3) Develop a new spatial logic, basedon spatial-superposition, for specifying emergent behavior; (4) Devisea new method for learning the formulae of this logic from the spatialpatterns under investigation; and (5) Apply bounded model checking todetect (within milliseconds) the onset of spiral waves. We have imple-mented our methodology as the Emerald tool-suite, a component ofour EHA framework for specification, simulation, analysis and controlof excitable hybrid automata. We illustrate the effectiveness of our ap-proach by applying Emerald to the scalar electrical fields produced byour CellExcite simulator.

1 Introduction

One of the most important and intriguing questions in systems biology is howto formally specify emergent behavior in biological tissue, and how to efficientlypredict and detect its onset. A prominent example of such behavior is electricalspiral waves in spatial networks of cardiac myocytes (heart cells). Spiral wavesof this kind are a precursor to a variety of cardiac disturbances, including atrialfibrillation (AF), an abnormal rhythm originating in the upper chambers of theheart. AF afflicts 2-3 million Americans alone, putting them at risk for clots andstrokes. Moreover, the likelihood of developing AF increases with age.

In this paper, we address this question by proposing a simple and efficientmethod for learning, and automatically detecting the onset of, spiral waves in car-diac tissue. See Figure 1 for an overview of our approach. Underlying our methodis a linear spatial-superposition logic (LSSL) we have developed for specifying

CX−Simulation Superposition QTPSelection

ClassificationBMChecking

DSEF SQTSQTP

SQTPSQTCLHAN

LSSLF

LSSLF: LSSL Formula

SQTP: SQT Path

CLHAN: CLHA Network

CX: CellExcite Tool

DSEF: Discrete SE Field

Fig. 1. Overview of our method for learning and detecting spiral waves.

properties of spatial networks. LSSL is discussed in greater detail below. Ourmethod also builds upon hybrid-automata, image-processing, machine-learning,and model-checking techniques to first learn an LSSL formula that character-izes such spirals. The formula is then automatically checked against a quadtreerepresentation [15] of the scalar electrical field (SEF), produced by simulatinga hybrid automata network modeling the myocytes, at each discrete time step.The quadtree representation is obtained via discrete mode-abstraction and hier-archical superposition of the elementary units within the SEF.

The electrical behavior of cardiac myocytes is hybrid in nature: they exhibitan all-or-nothing electrical response, the so-called action potential, to an externalexcitation. Despite their hybrid nature, networks of myocytes have traditionallybeen modeled using nonlinear partial differential equations. While highly accu-rate in describing the molecular processes underlying cell behavior, these modelsare not particularly amenable to formal analysis and typically do not scale wellfor the simulation of complex cell networks.

In [11], we showed that it is possible to automatically learn a much simplerCycle-Linear Hybrid Automaton (CLHA) for cardiac myocytes, which describestheir action potential up to a specified error margin. Moreover, as we have shownin [2], one can use a variant of this model [19, 20] to efficiently (up to an orderof magnitude faster) and accurately simulate the behavior of myocyte networks,and, in particular, induce spirals and fibrillation.

A key observation concerning our simulations (see Figure 2) is that mode-abstraction, in which the action-potential value of each CLHA in the networkis discretely abstracted to its corresponding mode, faithfully preserves the net-work’s waveform and other spatial characteristics. Hence, for the purpose oflearning, and detecting the onset of, spirals within CLHA networks, we can ex-ploit mode-abstraction to dramatically reduce the system state space. A similarmode-abstraction is possible for voltage recordings in live cell networks.

The state space of a 400× 400 CLHA network is still prohibitively large,even after applying the above-described abstraction: it contains 4160,000 modes,as each CLHA has four mode values. To combat this state explosion, we use aspatial abstraction inspired by [12]: we regard the mode of each automaton asa probability distribution and define the superposition of a set of CLHAs-modeas the probability that an arbitrary CLHA’s-mode in this set has a particu-lar value. By successively applying superposition to the network, we obtain atree structure, the root of which is the mode-superposition of the entire CLHA

network, and the leaves of which are the modes of the individual CLHA. The par-ticular superposition tree structure we employ, quadtrees, is inspired by image-processing techniques [15]. We shall refer to quadtrees obtained in this manneras superposition-quadtrees (SQT).

Our LSSL logic is an appropriate logic for reasoning about paths in SQTs, andthe spatial properties of CLHA networks in which we are interested, includingspirals, can be cast in LSSL. For example, we have observed that the presenceof a spiral can be formulated in LSSL as follows: Given an SQT, is there apath from its root leading to the core of a spiral? Based on this observation, webuild a machine-learning classifier, the training-set records for which correspondto the probability distributions associated to the nodes along such paths. Eachdistribution, for mode value stimulated, corresponds to an attribute of a training-set record, with the number of attributes bounded by the depth of the SQT. Anadditional attribute is used to classify the record as either spiral or non-spiral.For spiral-free SQTs, we simply record the path of maximum distribution.

For training purposes, we use the CellExcite simulator [2] to generate, uponsuccessive time steps, snapshots of a 400× 400 CLHA network and their mode-abstraction; see Figures 1,2. Training data for the classifier is then generatedby converting the hybrid-abstracted snapshots into SQTs and selecting pathsleading to the core of a spiral (if present). The resulting table is input to thedecision-tree algorithm of the Weka machine-learning tool suite [8], which pro-duces a classifier in the form of a predicate over the node-distribution attributes.

The syntax of LSSL is similar to that of linear temporal logic, with LSSL’sNext operator corresponding to concretization (anti-superposition). Moreover, a(finite) sequence of LSSL Next operators corresponds to a path through an SQT.The classifier produced by Weka can therefore be regarded as an LSSL formula.An SQT path can be thought of as a magnifying glass, which starting from theroot, produces an increasingly detailed but more focused view of the image. Thiseffect is analogous to concept hierarchy in data mining [13] and arguably similarto the way the brain organizes knowledge: a human can recognize a word or apicture without having to look at all of the characters in the word or all of thedetails in the picture, respectively.

We are now in a position to view spiral detection as a bounded-model-checking problem [3]: Given the SQT Q generated from the discrete SEF ofa CLHA network and an LSSL formula ϕ learned through classification, is therea finite path π in Q satisfying ϕ? We use this observation to check every secondduring simulation, whether or not a spiral has been created. More precisely, theLSSL formula we use states that no spiral is present, and we thus obtain as acounterexample one or all the paths leading to the core of a spiral. In the lattercase, we can identify the number of spirals in the SEF and their actual position.

The above-described method (including user-guided path selection) has beenfully implemented as the Emerald tool suite for automated spiral learning anddetection. Emerald is written in Java, and it is a new component of our EHA

environment for the specification, simulation, analysis and control of networksof Excitable Hybrid Automata. It is freely available from [10].

The rest of the paper is organized as follows. Section 2 reviews excitable-cell networks and their modeling with CLHA. Section 3 defines superpositionand quadtrees, the essential ideas underlying linear spatial-superposition logic,the topic of Section 4. Section 5 describes our learning and bounded-model-

checking techniques; their implementation is considered in Section 6, along withour experimental results. Section 7 discusses related work. Section 8 offers ourconcluding remarks and directions for future research.

2 Biological Background

An excitable cell has the ability to propagate an electrical signal, known atthe cellular level as the Action Potential (AP), to neighboring cells. An AP

corresponds to a change of potential across the cell membrane, and is caused bythe flow of ions between the inside and outside of the cell through ion channels.Generally, an AP is an externally triggered event: a cell fires an action potentialas an all-or-nothing response to a supra-threshold stimulus, and each AP followsthe same sequence of phases and maintains the same magnitude regardless ofthe applied stimulus. During an AP, generally no re-excitation can occur.

Despite differences in AP duration, morphology and underlying ion currents,the following major AP phases can be identified across different species anddifferent excitable-cell types: resting, stimulated, upstroke, early repolarization,plateau and final repolarization. We shall subsequently use the following abbre-viations for these phases: r (resting and final repolarization), s (stimulated), u(upstroke), and p (plateau and early repolarization).

Using these AP phases as a guide, we have developed, for several represen-tative excitable-cell types, Cycle-Linear Hybrid Automata (CLHA) models thatapproximates their AP and other bio-electrical properties with reasonable accu-racy [19, 20, 11]. This derivation was first performed manually [19, 20]. We sub-sequently showed in [11] how to fully automate this process by learning variousbiological aspects of the AP of the different cell types. The CLHA we obtained arefairly compact in nature, employing two or three continuous state variables andfour to six modes. The term Cycle-Linear is used to highlight the cyclic structureof CLHA, and the fact that while in each cycle they exhibit linear dynamics, thecoefficients of the corresponding linear equations and mode-transition guardsmay vary in interesting ways from cycle to cycle.

The dynamics of excitable-cell networks play an important role in the phys-iology of many biological processes. For cardiac cells, on each heart beat, anelectrical control signal is generated by the sinoatrial node, the heart’s internalpacemaking region. Electrical waves then travel along a prescribed path, excitingcells in atria and ventricles and assuring synchronous contractions. Of special in-terest are cardiac arrhythmias: disruptions of the normal excitation process dueto faulty processes at the cellular level, single ion-channel level, or at the level ofcell-to-cell communication. The clinical manifestation is a rhythm with alteredfrequency, tachycardia or bradycardia, or the appearance of multiple frequen-cies, polymorphic Atrial Tachycardia (AT), with subsequent deterioration to achaotic signal, Atrial Fibrillation (AF). AF is a serious condition in which thereis uncoordinated contraction of the cardiac muscle of the atria in the heart. As aresult, the heart fails to adequately pump blood, putting 2-3 million Americansalone at risk for clots and strokes. Moreover, AF likelihood increases with age.

Resting State Stimulated State Upstroke State Plateau State

Continuous behavior

Discrete behavior

1° Stimulus Normal Wave Propagation Ventricular Fibrillation2° Stimulus

1 ms 50 ms 100 ms 150 ms 200 ms 250 ms 300 ms 350 ms 400 ms

1° Stimulus Normal Wave Propagation Ventricular Fibrillation2° Stimulus

1 ms 50 ms 100 ms 150 ms 200 ms 250 ms 300 ms 350 ms 400 ms

Fig. 2. Simulation of continuous and discrete behavior of CLHA network.

In order to simulate the emergent behavior of cardiac tissue, we have devel-oped CellExcite [2], a CLHA-based simulation environment for excitable-cellnetworks. CellExcite allows the user to sketch a tissue of excitable cells, planthe stimuli to be applied during simulation, and customize the arrangement ofcells by selecting the appropriate lattice. Figure 2 presents our simulation resultsfor a 400× 400 CLHA network. The network was stimulated twice during simu-lation, at different regions. The results we obtain demonstrate the feasibility ofusing CLHA networks to capture and mimic different spatiotemporal behavior ofwave propagation in 2D isotropic cardiac tissue, including normal wave propaga-tion (1-150 ms); the creation of spirals, a precursor to fibrillation (200-250 ms);and the break-up of such spirals into more complex spatiotemporal patterns,signaling the transition to ventricular fibrillation (250-400 ms).

As can be clearly seen in Figure 2, a particular form of discrete abstraction,in which the action-potential value of each CLHA in the network is discretely ab-stracted to its corresponding mode, faithfully preserves the network’s waveformand other spatial characteristics. Hence, for the purpose of learning and detect-ing spirals within CLHA networks, we can exploit discrete mode-abstraction todramatically reduce the system state space.

3 Superposition and Quadtrees

A key benefit of hybrid automata compared to nonlinear ODEs is their explicitsupport for finite mode abstraction: the infinite range of values of a hybridautomaton’s continuous state variables can be abstracted to the automaton’s

discrete finite set of modes. As discussed in Sections 1,2, abstracting the AP

(voltage) of the constituent CLHA in a CLHA network to their correspondingmode (s, u, p or r) turns out to faithfully preserve the network’s waveform andother spatial characteristics. This allows us to reduce the spiral-onset verificationproblem to a finite-state verification problem.

Unfortunately, the state space of a 400×400 CLHA network, which would benecessary to simulate the behavior of a tissue of about 16 cm2 in size, is stilltoo large for analysis purposes: it has 4160,000 mode values! To combat stateexplosion, we use a spatial abstraction inspired by [12]: we regard the mode ofa CLHA as a degenerate probability distribution and define the superpositionof a set of (possibly superposed) modes as the mean of their distributions. Bysuccessively applying superposition, we obtain a tree whose root is the mode-superposition of the entire CLHA network, and whose leaves are the individualmode of the component CLHA. The particular superposition tree structure weemploy, the quadtree, was inspired by image-processing techniques [15].

Let A be a 2k ∗ 2k matrix of CLHA modes. A quadtree Q = (V,R) repre-sentation of A is a quaternary tree, such that each vertex v ∈ V represents asub-matrix of A. For example, the root v0 of the quadtree in Figure 3 representsthe entire matrix; child v1 represents the matrix {2k−1, . . . , 2k}× {0, . . . , 2k−1};child v6 represents the matrix {2k−1, . . . , 3 ∗ 2k−2} × {0, . . . , 2k−2}; etc.

2

v8v7v6v5

v6 v5

v8v7 v41

1 432

32

d=2

d=1

d=0

(b)(a)

v4v3

v

v0

v1

v1

v02

v4v3

Fig. 3. Quadtree representation

Definition 1 (Leaf distribution). Let N be a CLHA network whose con-stituent CLHA have modes M = {s, u, p, r}, and let Q be the quadtree represen-tation of N . Then each leaf node l ∈ Q has an associated degenerate leaf distri-bution Dl, whose probability mass function (PMF) satisfies: ∃m∈M. pl(m) = 1.

The intuition is as follows. If the leaf occurs at the maximum depth of thequadtree, then it corresponds to the mode of a CLHA. As CLHA are determin-istic, their states assume one of the values in M with probability 1.4 If the leafdoes not occur at the maximum depth of the quadtree, then it corresponds to thesuperposition of identical degenerate distributions, and no additional informa-tion is obtained by decomposing the leaf into its four superposition components.The visual interpretation is that a pixel has one definite color, and that nothingis learned by decomposing an area in which all pixels have the same color.

Definition 2 (Interior-node distribution). Let N be a CLHA network whoseconstituent CLHA have modes M = {s, u, p, r}, and let Q be the quadtree rep-resentation of N . Furthermore, let i ∈ Q be an interior node with children4 We will weaken this restriction at the end of the section.

i1, . . . , i4. Then i has an associated superposition distribution Di whose PMF

satisfies: ∀m∈M. pi(m) = 1/4∑4

j=1 pij(m).

If all of i’s children are leaves, then, for each mode value m, i’s superpositionis the mean of the occurrences of m. Hence, the probability that the mode ofthe parent is m is the probability that the mode of an arbitrary child is m. If i’schildren are interior nodes, it still holds that the probability that i’s mode is mis the probability that the mode of an arbitrary leaf below i’s children is m.

We call a quadtree whose nodes are labeled with leaf and interior-node dis-tributions a superposition quadtree (SQT). The distributions in an SQT are notknown in advance. The task of our learning algorithm is to determine these dis-tributions for what we perceive to be spirals. The use of probability distributionsis justified by the fact that different spirals might have slightly different shapes;i.e., slightly different values for the leaf nodes of their associated quadtrees.

1

(b) (d)(c)

1

(a)

1

y4

y3

1

1

2 43

1

4

1 2 3

4

1 4

2 3

0

4

x2

x x2 3

1,32

1

x

1

0

1 0

1 0

0

1

0

Fig. 4. Fractals as finite SQGs: (a) x = 2/3, (b) x = 5/11, y = 4/11, (c) x = 1/2.

The SQTs presented so far were constructed over a finite matrix A contain-ing 2k ∗ 2k elements. In general, however, SQTs can be obtained via the finiteunfolding of a superposition quad-graph. Let 4= {1, . . ., 4}.

Definition 3 (Superposition quad-graph (SQG)). A superposition quad-graph is a 4-tuple G = (V, v0, R, L) consisting of:

– A finite set of vertices V with initial vertex v0 ∈V ,– A transition relation R⊆V ×4×V s.t. ∀v ∈V, i∈4∃u∈V. (v, i, u)∈R,– A probability-distribution labeling L s.t. ∀v ∈V. L(v) = 1/4

∑u∈R(v) L(u).

The condition on R ensures that each vertex in V has precisely four successorsin R. The condition on L ensures that the probability distributions are relatedthrough superposition. Constructing SQTs as finite unfoldings of SQGs is morepowerful as it also supports the definition of infinite SQTs generated by recursion.That is, it supports the definition of fractals.

Figure 4 gives the specification of three fractals and the unfolding of theirSQGs up to depth 3. Recursive nodes are labeled by distribution variables, thevalues for which can be computed by solving a linear system. For example, x andy in Figure 4(b) are computed by solving the linear system x=1/4 (x + 1 + y)and y=1/4 (1 + x). In the pictures on the right, gray areas represent recursivenodes. The four self-loops of the leaves are not shown for simplicity. Note thatleaves may now be associated with any constant distribution. Also note thatgraphs (a) and (d) yield equivalent infinite SQTs.

4 Linear Spatial-Superposition Logic

Every finite SQT can be transformed into an SQG by adding to each leaf aself-loop labeled by i, for i∈4. Moreover, an SQG can be transformed into aKripke structure by erasing (forgetting) the transition labeling, collapsing iden-tical transitions, and assuming nondeterminism among transitions emanatingfrom the same node. For example, applying this forgetful transformation to theSQGs of Figure 4 yields the Kripke structures of Figure 5, where the self-loopsare made explicit. The Kripke structure of Figure 5(d) can be seen as a minimal-state equivalent of the one of Figure 5(b).

(a) (b)

1

(c) (d)

1

y

1

x

x

0 1

y x

x

1

10

001

1 0 0

Fig. 5. Kripke structures for SQGs of Figure 4.

Definition 4 (Kripke structure (KS)). A Kripke structure over a set ofatomic propositions AP is a four-tuple M = (S, I,R, L) consisting of:

– A countable set of states S, with initial states I ⊆S,– A transition relation R∈S×S with ∀s∈S ∃ t∈S. (s, t)∈R,– A labeling (or interpretation) function L : S→ 2AP .

The condition associated with the transition relation R ensures that every statehas a successor in R. Consequently, it is always possible to construct an infinitepath through the KS, an important property when dealing with reactive systems.In our case, it means that we can reason about recursive SQTs, i.e. fractals.

The labeling function L defines for each state s∈S the set L(s) of atomicpropositions that are valid in s. Our atomic propositions are inequalities overdistributions. Syntactically, they are written as follows: P (D=m)∼ d, where Dis a distribution function, m∈M for M ⊂R is a discrete value (e.g. a mode),d∈ [0..1], and ∼ is one of <, ≤, =, ≥, or >.

In order to verify properties of a reactive system modeled as a KS K, itis customary to use either a linear-time or a branching-time temporal logic. Amodel for a linear-time logic (LTL) formula is an infinite path π in K. A modelfor a branching-time logic formula is K itself; given a state s of K, this allowsone to quantify over the paths originating from s. For our current purposes ofspecifying, and detecting the onset of, spirals, LTL suffices.

Strictly speaking, our logic is a linear spatial-superposition logic (LSSL), asa path π in K represents a sequence of concretizations (anti-superpositions).Syntactically, however, our temporal-logic operators are the same as in LTL:the next operator X with Xϕ meaning that ϕ holds in a concretization of thecurrent state; its inverse operator B; the until operator U , with ϕUψ meaningthat ϕ holds along a path until ψ holds; and the release operator R, with ψRϕmeaning that ϕ holds along a path unless released by ψ.

Definition 5 (LSSL Syntax). The syntax of linear space-superposition logicis defined inductively as follows:

ϕ ::= > | ⊥ | P [D = m] ∼ d | ¬φ | ϕ ∨ ψ | Xϕ | Bϕ | ϕUϕ | ϕRϕ∼ ::= < | ≤ | = | ≥ | >

Although Kripke structures and the LSSL logic allow us to reason about infi-nite paths, physical considerations—such as the number of myocytes in a cardiactissue or the screen resolution—impose a maximum length k on such paths. Thelength k, however, is maintained as a parameter in LSSL’s semantic definition,permitting us to accommodate any number of myocytes or any screen resolu-tion. Defining LSSL’s semantics in this manner places us within the frameworkof bounded-model-checking [3].

Definition 6 (LSSL Semantics). Let K be a KS and π a path in K. Then,for k≥ 0, π satisfies an LSSL formula ϕ with bound k, written π |=k ϕ, if andonly if π |=0

k ϕ, where:

π |=ik > and π 6|=i

k ⊥π |=i

k p ⇔ p ∈ L(π[i])π |=i

k ¬ϕ ⇔ π 6|=ik ϕ

π |=ik ϕ ∨ ψ ⇔ π |=i

k ϕ or π |=ik ψ

π |=ik Xϕ ⇔ i < k and π |=i+1

k ϕ

π |=ik Bϕ ⇔ 0 < i ≤ k and π |=i−1

k ϕ

π |=ik ϕUψ ⇔ ∃j. i ≤ j ≤ k. π |=j

k ψ and ∀n. i ≤ n < j. π |=nk ϕ

π |=ik ψRϕ ⇔ ∀j. i ≤ j ≤ k. π |=j

k ϕ or ∃n. i ≤ n < j. π |=nk ψ

We say that K |=k ϕ if for all paths π in K, π |=k ϕ.

Our release operator R is a bounded version of the LTL’s R operator. Similarly,the globally operator G, defined as Gϕ ≡ ⊥Rϕ, is a bounded version of LTL’s Goperator. The finally operator F is defined as usual as Fϕ ≡ >U ϕ. In general,the unbounded LTL version of G is assumed to not hold. For example, Gϕ doesnot hold as ϕ could be violated at k+1; to decide Gϕ in LTL wrt. a bound k,one needs a more sophisticated analysis of the KS K, as discussed in [3].

As an example, consider an unfolding depth k of the KS in Figure 5(a), andassume the distributions correspond to mode s. This KS has a path π suchthat π |=k G (P [D= s] = 2/3) holds: the path that always returns to x. Toautomatically find π we will model-check the negation of the above formula.This will return π as a counterexample. By using the techniques in [3], one canshow that π also satisfies the unbounded LTL version of the formula.

5 Model Checking and Learning

Bounded model checking. Given a Kripke structure K, an LSSL formula ϕ, anda bound k, a bounded model checker (BMC) efficiently verifies if K |=k ¬ϕ. If so,

it returns one or more paths π in K that violate ϕ; otherwise, it returns true.Intuitively, a BMC applies the LSSL semantics inductively defined in Section 4to each path π in K. We have implemented a simple prototype BMC for Kripkestructures K derived from SQTs and LSSL formulae. The BMC first enumeratesall paths in K, and then for each path, it applies the LSSL semantics. ThisBMC is efficient enough to check within milliseconds the onset of spirals. We arecurrently improving the BMC for safety formulae (formulae without F operator),by traversing the SQT and pruning all subtrees of a vertex as soon as we detectthat the current path violates ¬ϕ. A more ambitious SAT-based BMC is alsounder development.

Machine learning. Writing the LTL formulae that a reactive system should sat-isfy is a nontrivial task. Developers often find it difficult to specify the systemproperties of interest. The classification of LTL formulae into safety (somethingbad should never happen) and liveness properties (something good should even-tually happen) provides some guidance, but the task remains difficult.

Writing LSSL formulae describing emerging properties of CLHA networks iseven more difficult. For example, what is the LSSL formula for spiral onset? In thefollowing, we describe a surprisingly simple, machine-learning-based approachthat we have successfully applied to spiral detection. The main idea is to castthe onset property as follows: Is there a path in the given SQT leading to thecore of a spiral? The implementation is simple as well. For an SEF produced bythe CellExcite simulator (see Figure 1), our Emerald tool set allows the userto select a path through the SEF’s corresponding SQT simply by clicking on apoint in the SEF (e.g. in the core of a spiral). If no spiral is present, the SQT

path with maximum PMF (probability mass function) is returned. Note that thismethod is not restricted to spirals: path selection via clicking on a representativepoint can be applied to normal wave propagation, wave collision, etc.

The paths so obtained are then used to learn the LSSL formula for theproperty we are interested in, such as spiral onset. The learning algorithm worksas follows: (1) For each path of length k, where k is the height of the SQT, wedefine k attributes a1, . . ., ak such that each ai holds the PMF value of vertex vi,for the mode we are interested in (for spirals, mode s); (2) Each path is classifiedby Emerald as spiral or non-spiral, depending on whether or not the user clickedon a point (core); the classification is stored as an additional classifier attributec; (3) All records (ai, . . ., ak, c) are stored in a table, which is provided to thedata classification phase; (4) At the end of this phase we obtain a path classifierwhich we translate into an LSSL formula.

Data classification [17] is generally a two-step process: training and testing.For training, we choose a classification algorithm that learns a set of descriptionsof our training data set. The form of these descriptions depends on the type ofclassification algorithm employed. For testing, we use a test data set, disjointfrom the training set, and containing the class attribute with a known value.The accuracy of the classifier on a given test set is the percentage of the testrecords that are correctly classified. Various techniques can be used to obtain testand training sets from an initial set of records, such as X-Cross Validation [8].

Classification algorithms also come in various flavors. We used a descriptiveclassifier, as this returns a set of if-then rules called discriminant rules. Underly-ing descriptive classifiers are either decision trees, rough sets, classification-by-association analysis, etc. A rule r has the form (

∧i∈ I ai = vi)⇒ (c= v), where

I is a subset of k. Usually, each class c has an associated set of rules r1, . . ., rn;i.e. c is characterized by

∧ni=1 ri. Using boolean arithmetic, this is equivalent to

(∨n

i=1

∧j ∈ Ii

aij = vij)⇒ (c= v). The antecedent formula∨n

i=1

∧j ∈ Ii

aij = vij iscalled the class description formula of the class c.

As is customary, we built a classifier for one class only (the class c), called thetarget class, using all other classes as one contrasting class. Hence the classifierconsists of only one class-description formula, describing the target class. We saythat we learned that formula. We have used Weka’s decision-tree algorithm, butany other rule-based algorithm could have been used as well. The classifier wehave learned for spirals, is as follows:if a7 <= 0.875 then {if a2 <= 0.04895 then ∼c else c}else if a3 <= 0.078359 then {if a0 <= 0.025021 then ∼c else c} else ∼cIts translation into linear spatial-superposition generated the following formula:

XX ( P(D= s)> 0.04895 ∧ XXXXX P(D= s)≤ 0.875 ) ∨P(D= s)> 0.025021 ∧ XXX ( P(D= s)≤ 0.078359 ∧ XXXX P(D= s)> 0.875 )

This formula is an approximate description of a spiral which we used togetherwith Emerald’s BMC to detect spiral onset within milliseconds. In case the BMC

returned a false positive, we add the corresponding record to the classificationtable as part of a retraining phase; see Figure 1.

6 Implementation and Experimental Results

Our techniques of Sections 2-5 have been implemented as the Emerald tool suiteof the EHA environment. Emerald is a Java application that can be used tolearn an LSSL formula for a particular spatial pattern, and to check the formulaagainst a set of images that reproduce the discrete behavior of a CLHA network.For ease of use, Emerald provides two graphical panels, one for Preprocessing(classification) and the other for Bounded Model Checking.

The Preprocessing panel (Figure 6(a)) enables users to browse the variouscollections of images they have assembled for machine-learning purposes, andto view their SQT representation. It comprises three graphical components: animage viewer on the right, a quadtree viewer on the top-left, and a data-tableviewer on the bottom-left, where user-selected paths are displayed. In the imageviewer, the user selects a path leading to a spiral core by clicking on an appro-priate stimulated point (in yellow) of the image. If the image does not containa spiral, the user can choose the maximum PMF path or a generic stimulatedpoint. Each path selected is stored in a data table in the form of the PMF se-quence of stimulated modes in each node of the traversed SQT. All such pathsare subsequently exported to Weka in a common format. Presently, we have cus-tomized Emerald for spiral detection, but we plan to extend the tool with thecapability to classify any generic spatial pattern.

Fig. 6. (a) Top: Preprocessing Panel. (b) Bottom: Bounded Model Checking Panel.

The BMC panel (Figure 6(b)) enables the user to check an LSSL formulaagainst the SQT representation of a specific image. As discussed in Section 5, theLSSL formula encodes the classifier for the spatial pattern under investigation. Ifthe SQT in question fails to satisfy the formula, the resulting counter-examples(spirals) are reported to the user both as rows in the counter-example table andas red points marking the core of the spiral contained in the image.

Table 1 contains our preliminary experimental results. For training and test-ing purposes, we used two different sets of images, each containing spirals andnormal wave propagation. The first set of images was used to train the classifier;we supervised the training by discriminating between paths leading to a spiralcore versus those (of maximum PMF) belonging to images that did not containa spiral. From this first set we extracted 512 possible paths, and used Weka tobuild a ruled-based classifier with a very high prediction accuracy (99.25%).

The test set was divided into increasingly larger sets of images: 500, 550,600 and 650 images. Applying the rule-based classifier on the first 500 imagesproduced 67 wrongly classified paths. We used these paths to obtain a new,retrained classifier. We then used both classifiers on the remaining sets of images,and for each classifier and test set we computed the LSSL formula accuracy, as an

Path Classifier Test Set 550 Test Set 600 Test Set 650

Trained (512 Paths) 87.00% 88.83% 88.23%

Retrained (512 Paths + 67 Counter-Examples) 97.10% 97.33% 93.07%

Table 1. Experimental Results

estimate of how well the formula specifies the spatial pattern. As Table 1 shows,retraining considerably improves accuracy, and can be repeated each time a falseclassification is returned. Weka’s decision-tree algorithm took no more than 9s toconstruct a rule-based classifier from the training (512 records) and retraining(579 records) tables, respectively. Our model checker took between 1.67s and7.09s, with an average of 4.72s to model check an SQT for a 400× 400 SEF ifno spiral was present, and between 1ms and 4.64s, with an average of 230ms if aspiral was present. All results were produced on a PC equipped with a Centrino2GHz processor with 1.5GB RAM.

7 Related Work

The use of hybrid automata to model and analyze spatial networks is a relativelynew subject area, and includes application to Delta-Notch signaling networks [9],coordinated control of autonomous underwater vehicles [14], and aircraft tra-jectories and landing protocols [7, 16]. In contrast, our focus is on emergentbehavior (in the form of spiral waves) in networks of cardiac myocytes, andthe use of spatial superposition as an abstraction mechanism. Predicting spi-rals [4] in pure continuous models [18] is a more complicated process than whatis implemented in Emerald, where discrete SQT structures, obtained via mode-abstraction and superposition, are used. Several logics have recently been pro-posed for describing the behavior and spatial structure of concurrent systems [5,6], and for reasoning about the topological aspects of modal logics and Kripkestructures [1]. Unlike LSSL, these logics are not based on an abstraction mech-anism like spatial-superposition that can be used to alleviate state explosionduring model checking.

8 Conclusions

In this paper, we have presented a framework for specifying and detecting emer-gent behavior in networks of cardiac myocytes. Our approach, which uses hybridautomata, discrete mode-abstraction, and bounded model checking, is based ona novel notion of spatial-superposition and its related logic LSSL, and a newmethod for the automated learning of formulae in this logic from the spatialpatterns under investigation. Our framework has been fully implemented in theEmerald tool suite. Our preliminary experimental results are very encouraging,with a prediction accuracy of over 93% on a test set comprising 650 images. Asfuture work, we plan to extend our framework to the learning of branching-timespatial-superposition properties, and the more intricate problem of specifyingand detecting spatiotemporal emergent behavior.

References

1. M. Aiello, J. Benthem, and G. Bezhanishvili. Reasoning about space: The modalway. J. Log. Comput., 13(6):889–920, 2003.

2. E. Bartocci, F. Corradini, E. Entcheva, R. Grosu, and S. A. Smolka. CellExcite: Atool for simulating in-silico excitable cells. To app. in BMC Bioinformatics, 2007.

3. A. Biere, A. Cimatti, E. Clarke, O. Strichman, and Y. Zhu. Bounded modelchecking. In Adv. in Comp. vol. 58: Highly Depend. Software. Acad. Press, 2003.

4. M. A. Bray, S. F. Lin, R. R. Aliev, B. J. Roth, and J. P. J. Wikswo. Experi-mental and theoretical analysis of phase singularity dynamics in cardiac tissue. JCardiovasc Electrophysiol, 12(6):716–722, 2001.

5. L. Caires and L. Cardelli. A spatial logic for concurrency (part I). Inf. Comput.,186(2):194–235, 2003.

6. L. Caires and L. Cardelli. A spatial logic for concurrency (part II. Theor. Comput.Sci., 322(3):517–565, 2004.

7. I. deOliveira and P. Cugnasca. Checking safe trajectories of aircraft using hybridautomata. In Proc. of SAFECOMP 2002. Springer-Verlag, Sept. 2002.

8. E. Frank, M. A. Hall, G. Holmes, R. Kirkby, B. Pfahringer, I. H. Witten, andL. Trigg. WEKA – a machine learning workbench for data mining. In The DataMining and Knowledge Discovery Handbook, pages 1305–1314. Springer, 2005.

9. R. Ghosh, A. Tiwari, and C. Tomlin. Automated symbolic reachability analysis;with application to delta-notch signaling automata. In HSCC, pages 233–248, 2003.

10. R. Grosu, E. Bartocci, F. Corradini, E. Entcheva, S. Smolka, M. True,A. Wasilewska, and P. Ye. EHA: An environment for the specification, sim-ulation, analysis and control of networks of excitable hybrid automata. URL:http://www.cs.sunysb.edu/~eha, 2007.

11. R. Grosu, S. Mitra, P. Ye, E. Entcheva, I. Ramakrishnan, and S. Smolka. Learningcycle-linear hybrid automata for excitable cells. In Proc. of HSCC’07, the 10thInternational Conference on Hybrid Systems: Computation and Control, volume4416 of LNCS, pages 245–258, Pisa, Italy, April 2007. Springer Verlag.

12. Y. Kwon and G. Agha. Scalable modeling and performance evaluation of wirelesssensor networks. In IEEE RT Tech. and App. Symp., pages 49–58, 2006.

13. Y. Lu. Concept hierarchy in data mining: Specification, generation and implemen-tation. Master’s thesis, Simon Fraser University, Dec. 1997.

14. F. Pereira and J. deSousa. Coordinated control of networked vehicles: An au-tonomous underwater system. Aut. and Remote Ctrl, 65(7):1037–1045, 2004.

15. E. Shusterman and M. Feder. Image compression via improved quadtree decom-position algorithms. IEEE Trans. on Image Processing, 3(2):207–215, mar 1994.

16. S. Umeno and N. Lynch. Safety verification of an aircraft landing protocol: Arefinement approach. In Proceedings of HSCC 2007, Apr. 2007.

17. A. Wasilewska and E. M. Ruiz. A classification model: Syntax and semantics forclassification. In RSFDGrC (2), pages 59–68, 2005.

18. N. Wedge, M. Branicky, and M. Cavusoglu. Computationally efficient cardiacbiolectricity models toward whole-heart simulation. In Proc.of Intl. Conf. IEEEEngineering in Medicine and Biology Society, pages 1–4, 2004.

19. P. Ye, E. Entcheva, R. Grosu, and S. Smolka. Efficient modeling of excitable cellsusing hybrid automata. In Proc. of CMSB’05, the 3rd Workshop on ComputationalMethods in Systems Biology, pages 216–227, Edinburgh, Scotland, April 2005.

20. P. Ye, E. Entcheva, S. Smolka, and R. Grosu. A cycle-linear hybrid-automatamodel for excitable cells. Accepted at the IET J. of Systems Biology (SYB), 2007.

learning and detecting emergent behavior in networks of...

Documents