Brief review of common modeling formalisms and representation approaches
Michael Hucka, Ph.D. – California Institute of Technology, Pasadena, CA
James Sluka, Ph.D. – Indiana University, Bloomington, IN
Herbert Sauro, Ph.D. – University of Washington, Seattle, WA
2015 MSM Consortium Satellite Meeting, September 2015, Bethesda, MD, USA
Trajectory of this presentation
Phenomena: What is the model about?
Formalisms: What is the form of the model?
Formats: How is the model stored and shared?
Phenomena
Dimensions of biological phenomena
Spatial scale: ecosystem, population, body, organ system, organ, tissue, cell, subcellular structures, molecule, atom
Illustrations from freepik.com, sciencekids.co.nz, wikimedia.org, and theapprofessor.org
Temporal scale: year, month, day, hour, minute, second, millisecond, picosecond
Carbo, A., Hontecillas, R., Andrew, T., Eden, K., Mei, Y., Hoops, S., & Bassaganya-Riera, J. Frontiers in Cell and Developmental Biology, 2014.
Modeling formalisms
Some common, basic formalisms
ODE
(Nonlinear) ordinary differential equations expressing rates of change of variables w.r.t. a single continuous variable
PDE
Partial differential equations expressing rates of change w.r.t. multiple variables (e.g., space & time)
Discrete stochastic
Monte Carlo methods (e.g., Gillespie’s SSA) used to produce time evolution of a system of individuals or particles
Constraint-based
Optimization of variables around a steady state, subject to constraints (e.g., mass balance)
Agent-based
Simulation of the operation and interaction of multiple (semi-)independent entities
Finite element
Solution methods for PDE problems that subdivide the domain into a mesh of smaller/simpler pieces
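As a small illustration of the ODE formalism listed above, the following sketch integrates a two-variable system with a fixed-step forward Euler scheme. The reaction (A converting to B), the rate constant K, and the step size are hypothetical choices made only for this example; a production model would normally use an adaptive solver.

```python
import math

def euler(rhs, y0, t_end, dt):
    """Integrate dy/dt = rhs(t, y) from t = 0 to t_end with a fixed step dt."""
    n_steps = round(t_end / dt)
    t, y = 0.0, list(y0)
    for _ in range(n_steps):
        dydt = rhs(t, y)
        y = [yi + dt * di for yi, di in zip(y, dydt)]
        t += dt
    return y

K = 0.5  # hypothetical first-order rate constant (1/time)

def conversion(t, y):
    """A -> B at rate K*[A]; the state vector is y = [A, B]."""
    a, b = y
    return [-K * a, K * a]

a_end, b_end = euler(conversion, [1.0, 0.0], t_end=4.0, dt=0.001)
exact_a = math.exp(-K * 4.0)  # analytic solution for A, used as a check
print(f"A(4): Euler = {a_end:.4f}, exact = {exact_a:.4f}")
```

The state here is continuous, time is continuous (approximated by small steps), and there is no spatial structure, which is the usual position of ODE models along the dimensions discussed in the slides.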
Dimensions of formal models
Discrete vs. continuous
Not spatial vs. explicitly spatial
Deterministic vs. stochastic
What is the form of a given variable or quantity or attribute?
Some example variations:
• Entity values:
- State value (e.g., Boolean on/off)
- Discrete molecular count
- Continuous concentration
• Time:
- System iterations
- Discrete time steps
- Continuous time
Discrete vs. continuous
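At the fully discrete end of this dimension, entity values are Boolean and time advances in system iterations. The sketch below shows a synchronous Boolean network; the three nodes and their wiring (A activates B, B activates C, C represses A) are hypothetical and chosen only to illustrate the idea.

```python
def step(state):
    """Synchronous update: every node reads the previous state."""
    return {
        "A": not state["C"],  # C represses A
        "B": state["A"],      # A activates B
        "C": state["B"],      # B activates C
    }

def run(state, iterations):
    """Return the list of states visited, starting with the initial one."""
    trajectory = [state]
    for _ in range(iterations):
        state = step(state)
        trajectory.append(state)
    return trajectory

start = {"A": True, "B": False, "C": False}
trajectory = run(start, 6)
for i, s in enumerate(trajectory):
    print(i, "".join("1" if s[name] else "0" for name in "ABC"))
```

With this wiring the network cycles and returns to its starting state after six iterations. Replacing the Boolean values with molecule counts or concentrations, and the iterations with time steps or continuous time, moves the same system along the discrete-to-continuous axis.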
Do you always get the same result given the same initial conditions?
Some example variations:
• Ordinary differential equations (⇒ deterministic)
• Hybrid deterministic and stochastic (⇒ both)
• Fully stochastic system
Deterministic vs. stochastic
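The question of whether the same initial conditions always give the same result can be made concrete with a small stochastic simulation. The sketch below follows the direct method of Gillespie's SSA for a hypothetical birth-death system (production at constant rate, first-order degradation); the rate constants and seeds are assumptions made for illustration.

```python
import random

def gillespie_birth_death(n0, k_prod, k_deg, t_end, seed):
    """Direct-method SSA for (nothing) -> A and A -> (nothing)."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod        # propensity of production
        a_deg = k_deg * n      # propensity of degradation
        a_total = a_prod + a_deg
        t += rng.expovariate(a_total)  # exponential waiting time to next event
        if t > t_end:
            break
        # choose the reaction with probability proportional to its propensity
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts

run_a = gillespie_birth_death(0, k_prod=10.0, k_deg=0.5, t_end=10.0, seed=1)
run_b = gillespie_birth_death(0, k_prod=10.0, k_deg=0.5, t_end=10.0, seed=1)
run_c = gillespie_birth_death(0, k_prod=10.0, k_deg=0.5, t_end=10.0, seed=2)
print("same seed reproduces the trajectory:", run_a == run_b)
print("final counts for seeds 1 and 2:", run_a[1][-1], run_c[1][-1])
```

Identical initial conditions give different trajectories unless the random seed is also fixed, so reproducing a stochastic result requires recording the seed (or reporting statistics over many runs) in addition to the model itself.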
Are spatial characteristics inherently accounted for?
Some example variations:
• “Pure” biochemical reaction network model
• Compartmental model
• Spatial diffusion model
Not spatial vs. explicitly spatial
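For the explicitly spatial end of this dimension, a minimal sketch of a one-dimensional diffusion model is given below. It uses an explicit finite-difference update (not a finite element method) with zero-flux boundaries; the grid size, diffusion coefficient, and time step are hypothetical values chosen to keep the scheme stable.

```python
def diffuse(u, d_coeff, dx, dt, n_steps):
    """Explicit finite-difference update of du/dt = D * d2u/dx2, zero-flux ends."""
    alpha = d_coeff * dt / dx ** 2
    if alpha > 0.5:
        raise ValueError("explicit scheme needs D*dt/dx^2 <= 0.5")
    u = list(u)
    for _ in range(n_steps):
        # mirror the end cells so that no material crosses the boundaries
        padded = [u[0]] + u + [u[-1]]
        u = [
            padded[i + 1] + alpha * (padded[i] - 2 * padded[i + 1] + padded[i + 2])
            for i in range(len(u))
        ]
    return u

initial = [0.0] * 21
initial[10] = 1.0  # all material starts in the central cell
profile = diffuse(initial, d_coeff=1.0, dx=1.0, dt=0.2, n_steps=50)
print(" ".join(f"{v:.3f}" for v in profile))
```

A compartmental model would sit between this and a pure reaction network: it tracks which compartment a quantity is in, but not where within the compartment.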
How some common formalisms compare
[Figure: each formalism (ODE, PDE, discrete stochastic, constraint-based, agent-based, finite element) is placed along three axes: continuous to discrete, deterministic to stochastic, and nonspatial to spatial.]
Encoding the models
Spectrum of approaches to encoding models
“Hard-coded”
• The code is the model
• Python, MATLAB, etc.
Mixture
• Structured format for some parts
• Hard coding for other parts
Structured format
• Software-independent
• SBML, CellML, NeuroML
What do we mean by a structured format?
General case:
• A definition in a software-independent file format is read by a format interpreter
• The interpreter converts it into the software’s internal format within the simulation software system
Alternative case:
• Inputs are a declarative model definition and a declarative simulation protocol definition
• Both pass through the format interpreter into the software’s internal format
Pros:
• Most flexibility and power for defining model & simulation
Cons:
• Model details intertwined with implementation details
• Others must read code to understand model
➡ Readers must have access to same environment (⇒ $$$)
➡ Readers must know language & environment
• Model reuse can be much more difficult
• Model annotation can be much more difficult
• Model comparison can be much more difficult
Pros and cons of hard-coded models
Pros:
• Software-independent ⇒ model usable in any compatible tool
• Model details are made explicit ⇒ reproducibility enhanced
- The knowledge represented by the model is clarified
• Implementation details not mixed in ⇒ less error prone
• Tools & facilities can be devised for annotation, comparison, search
Cons:
• Suitable formats not available for all model formalisms
• Encoding model in a given format may not be easy
- Formats are often an intersection of commonly needed features, not a union of all possible features ⇒ limited in their features
Pros and cons of structured definition formats
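The separation between model details and implementation details can be sketched in a few lines. In the example below the model is plain data and a generic interpreter turns it into a simulation. The dictionary layout (keys "species", "reactions", "k") is a hypothetical toy schema invented for this illustration, not SBML or any other real format, and the mass-action, fixed-step Euler interpreter is likewise only an assumption.

```python
# Declarative part: the model is data, with no simulation code mixed in.
MODEL = {
    "species": {"A": 1.0, "B": 0.0},  # initial amounts
    "reactions": [
        {"reactants": ["A"], "products": ["B"], "k": 0.5},
        {"reactants": ["B"], "products": ["A"], "k": 0.1},
    ],
}

def rates(model, amounts):
    """Mass-action rate of change of each species, derived only from the data."""
    change = {name: 0.0 for name in model["species"]}
    for rxn in model["reactions"]:
        flux = rxn["k"]
        for name in rxn["reactants"]:
            flux *= amounts[name]
        for name in rxn["reactants"]:
            change[name] -= flux
        for name in rxn["products"]:
            change[name] += flux
    return change

def simulate(model, t_end, dt):
    """Generic interpreter: works for any model written in the toy schema."""
    amounts = dict(model["species"])
    for _ in range(round(t_end / dt)):
        change = rates(model, amounts)
        amounts = {name: amounts[name] + dt * change[name] for name in amounts}
    return amounts

result = simulate(MODEL, t_end=50.0, dt=0.01)
print(result)
```

Because `simulate` knows nothing about this particular model, the same data could be handed to a different interpreter, compared with another model, or annotated, without reading any simulation code. The toy schema also shows the main drawback named above: anything it cannot express (here, any rate law other than mass action) cannot be encoded.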
Results of short MSM survey: approaches
“If you/your team write simulations, how do you usually encode or represent your models?”
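[Bar chart, axis 0 to 15, of responses per option: Express models in a spreadsheet; Hard-coded in programming language; Encoded using an open, structured format; Use application with its own internal format; Mix of approaches; Other. Bar labels as extracted: 210, 53, 110 (adjacent values fused).]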
Survey run in August 2015. Received 32 total responses.
However, only 14 responses listed model representation formats in answer to the question “If you use open formats, please list the relevant standards you use”.
Results of short MSM survey: formats
“If you use open formats to represent and store your models, please list the relevant standards that you use.”
[Bar chart, axis 0 to 10, of responses naming each of: SBML, CellML, FieldML, SED-ML, FEM, MATLAB, MIRIAM, VCML, AMPL, BioPAX, BioSignalML, BNGL, Fortran, GAMS, HDF5, GoTran, GML, JSON, MoML, NeuroML, OpenSim, Python, RDF, SymPy, VTK.]
[Same chart, with some entries additionally marked as: not a model format; application-specific; prog. language or API.]
Open formats named in the survey
SBML Declarative, process-oriented (e.g., reactions) descriptions. SBML Level 3 packages support added constructs & application areas.
CellML Declarative, component-oriented descriptions of mathematical models of any kind.
FieldML Declarative descriptions of hierarchically-structured generalized mathematical fields.
SED-ML Declarative descriptions of simulation procedures to be applied to models in SBML, CellML, NeuroML or other format.
BNGL Rule-based descriptions of biomolecular interactions. Originally BioNetGen’s format but now used by some other tools.
NeuroML Declarative descriptions of neuronal cell and network models.
MoML Descriptions of hierarchical components of any kind. Defines connections, ports, and meta-data.
RDF General data representation format using directed, labeled graphs.
JSON General data representation format using name-value pairs and ordered lists of values.
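To give a feel for what a declarative, process-oriented description looks like, the sketch below parses a small hand-written fragment shaped like an SBML Level 3 Version 1 core document. The fragment (model id, species, and reaction names) is hypothetical and has not been validated against the specification; it is read here with Python's standard XML parser purely for illustration, whereas real applications would normally use a dedicated library such as libSBML.

```python
import xml.etree.ElementTree as ET

# Hand-written, unvalidated fragment in the style of SBML Level 3 Version 1 core.
SBML_TEXT = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy_conversion">
    <listOfCompartments>
      <compartment id="cell" size="1" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="cell" initialConcentration="1"
               hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false"/>
      <species id="B" compartment="cell" initialConcentration="0"
               hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="A_to_B" reversible="false" fast="false">
        <listOfReactants>
          <speciesReference species="A" stoichiometry="1" constant="true"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="B" stoichiometry="1" constant="true"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""

NS = {"sbml": "http://www.sbml.org/sbml/level3/version1/core"}

root = ET.fromstring(SBML_TEXT)
model = root.find("sbml:model", NS)

# Species and their initial concentrations, read straight from the declaration.
species = {
    s.get("id"): float(s.get("initialConcentration"))
    for s in model.findall("sbml:listOfSpecies/sbml:species", NS)
}

# Each reaction as (reactant ids, product ids).
reactions = {}
for rxn in model.findall("sbml:listOfReactions/sbml:reaction", NS):
    reactants = [r.get("species")
                 for r in rxn.findall("sbml:listOfReactants/sbml:speciesReference", NS)]
    products = [p.get("species")
                for p in rxn.findall("sbml:listOfProducts/sbml:speciesReference", NS)]
    reactions[rxn.get("id")] = (reactants, products)

print(species)
print(reactions)
```

The file states what the species and reactions are, not how to simulate them; that choice is left to whichever tool reads it.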
What do we mean by a structured format? Example formats:
• Declarative model definition: SBML, CellML, NeuroML, FieldML
• Declarative simulation protocol definition: SED-ML
Formalisms vs. formats
ODE
SBML, CellML, NeuroML, VCell “.vcml”, COPASI “.cps”, OpenSim “.osim”, JSIM “.mml”
PDE
SBML Level 3, FieldML, VCell “.vcml”, JSIM “.mml”
Discrete stochastic
Molecular level: CHARMM “.crd” & “.psf”, LAMMPS files
Higher levels: SBML, BNGL, COPASI “.cps”, JSIM “.mml”, MIST zip
Constraint-based
SBML Level 3, AMPL “.mps”, GAMS “.gdx”
Agent-based
(application-specific formats, or programming languages + frameworks/APIs)
Finite element
SBML Level 3, FieldML, CMISS “.exelem”, CompuCell3D “.cc3d”, Tecplot “.tp”, FlexPDE “.pde”, COMSOL “.mph”, FEBio “.feb”, VCell “.vcml”, JSIM “.mml”
Why bother going down this road?
Responsible conduct of scientific research!
Promotes greater reproducibility
• Models tested in multiple software tools reveal hidden assumptions
Gives you access to a larger ecosystem of tools
• Databases (e.g., BioModels Database, Physiome Repository)
- Helps disseminate models (good for citations)
• Other compatible software written by other people: simulation, analysis, visualization, validation, comparison, annotation, …
• Automated model generation pipelines (e.g., Path2Models)
Ensures persistence of models after individual software tools disappear
Incentives for sharing models in open formats