data standards for systems biology
![Page 1: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/1.jpg)
Data Standards for Systems Biology
Neil Swainston
Manchester Centre for Integrative Systems Biology
![Page 2: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/2.jpg)
Introduction
• Experimental standards
  – Proteomics
  – Metabolomics
  – Enzyme kinetics
• Modelling standards
  – Models
  – Simulations
  – Results
![Page 3: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/3.jpg)
Why do we need standards?
• Aids researchers by facilitating management of experimental data
• Facilitates open-source software development and interoperability
• Allows data to be shared
• Increasingly becoming a requirement for journal submissions
![Page 4: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/4.jpg)
When are standards developed?
• Standards generally are generated organically
• Not for pioneers
• When an experimental technique becomes established
• Need for a standard becomes obvious
![Page 5: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/5.jpg)
Who develops standards?
• Usually two or more academic groups
  – Commercial providers often less enthusiastic
• Often formed by a Working Group
  – Proteomics Standards Initiative
  – Metabolomics Standards Initiative
• “Minimum information required” specification provided
• Followed by data schema, XML standard
![Page 6: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/6.jpg)
MCISB project overview
[Diagram: three experimental strands (quantitative proteomics, quantitative metabolomics, enzyme kinetics) feed a model. They supply its variables (metabolite and protein concentrations) and its parameters (KM, kcat). Data resources shown: PRIDE XML, MeMo, SABIO-RK and MeMo-RK, each with a web service.]
![Page 7: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/7.jpg)
Proteomics
• We wish to store:
• Raw experimental mass spectrometry data
• Protein / peptide identifications
• Protein / peptide quantitations
• Metadata (instrument, search algorithm, user, etc.)
![Page 10: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/10.jpg)
Mass spectrometry data
• The simple approach does provide a list of masses and intensities, but…
  – What instrument was used?
  – Who ran the instrument?
  – What sample was used?
  – …etc.
• The simple approach lacks metadata
• Many simple approaches (formats) exist
![Page 11: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/11.jpg)
Mass spectrometry data
• The less simple approach: mzData
  – Developed by the Proteomics Standards Initiative, 2005
  – Put together by a Working Group of academics and commercial parties
  – Regular meetings, both real and virtual
• Goal: unify the existing “simple” formats into one
  – Support “tagging” with metadata
![Page 12: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/12.jpg)
mzData
• http://www.psidev.info/index.php?q=node/80#mzdata
• XML format, includes…
  – Peak lists (m/z / intensities)
  – Experimental protocols
  – Admin (Who? When?)
  – Instrument details
  – etc.
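To make the contrast with a bare peak list concrete, here is a minimal Python sketch of a spectrum record that carries its metadata with it and serialises to XML. The class, field and element names are illustrative assumptions only; they are not the real mzData schema, which should be taken from the PSI specification.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Spectrum:
    """A peak list plus the descriptive metadata a bare list lacks."""
    mz: List[float]          # peak positions
    intensity: List[float]   # peak intensities, same order as mz
    instrument: str          # instrument details
    operator: str            # admin: who?
    acquired: str            # admin: when? (ISO 8601 date)
    protocol: str            # experimental protocol


def to_xml(spectrum: Spectrum) -> str:
    """Serialise to XML. Element names are hypothetical, NOT mzData."""
    root = ET.Element("spectrumRecord")
    admin = ET.SubElement(root, "admin")
    ET.SubElement(admin, "operator").text = spectrum.operator
    ET.SubElement(admin, "acquired").text = spectrum.acquired
    ET.SubElement(root, "instrument").text = spectrum.instrument
    ET.SubElement(root, "protocol").text = spectrum.protocol
    peaks = ET.SubElement(root, "peakList")
    for mz, inten in zip(spectrum.mz, spectrum.intensity):
        ET.SubElement(peaks, "peak", mz=str(mz), intensity=str(inten))
    return ET.tostring(root, encoding="unicode")


# Placeholder values, chosen only for illustration.
example = Spectrum(
    mz=[100.5, 200.25],
    intensity=[10.0, 20.0],
    instrument="example-instrument",
    operator="example-operator",
    acquired="2010-01-01",
    protocol="example protocol",
)
xml_text = to_xml(example)
```

The point of the sketch is only that the peaks and the "who, when, what instrument" answers travel in one document.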
![Page 13: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/13.jpg)
Controlled vocabularies
• Use of free text is “dangerous”
  – Non-standard, ambiguous terms
  – Difficult to match / compare
• Controlled vocabularies
  – Collection of standardised terms
  – Organised into vocabularies or ontologies
  – Ontologies contain controlled terms and relationships between them (predicates)
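A small Python sketch of why controlled terms are easier to match than free text. The term identifiers (`EX:…`), names and synonyms below are made up for illustration and do not come from any real ontology.

```python
from typing import Optional

# Hypothetical controlled vocabulary: term ID -> preferred name.
VOCABULARY = {
    "EX:0000": "ionisation method",
    "EX:0001": "electrospray ionisation",
    "EX:0002": "matrix-assisted laser desorption ionisation",
}

# Hypothetical "is a" relationships (predicates) between terms.
PARENTS = {"EX:0001": "EX:0000", "EX:0002": "EX:0000"}

# Free-text variants mapped onto controlled term IDs.
SYNONYMS = {
    "esi": "EX:0001",
    "electrospray": "EX:0001",
    "maldi": "EX:0002",
}


def normalise(free_text: str) -> Optional[str]:
    """Map free text to a term ID, or None if it is not recognised."""
    key = free_text.strip().lower()
    for term_id, name in VOCABULARY.items():
        if key == name:
            return term_id
    return SYNONYMS.get(key)


def is_a(term_id: str, ancestor_id: str) -> bool:
    """Follow the 'is a' relationships upwards from term_id."""
    current: Optional[str] = term_id
    while current is not None:
        if current == ancestor_id:
            return True
        current = PARENTS.get(current)
    return False
```

Two records annotated "ESI" and " Electrospray " compare as different strings, but both normalise to the same term ID and can then be matched directly.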
![Page 16: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/16.jpg)
Proteomics data
• Proteomics data is not solely mass spectrometry data
  – Sample preparation protocol?
  – Peptide / protein identifications?
  – Post-translational modifications?
  – Identification scores?
• To support this, an extension is required
  – Extension based on a defined set of “minimum requirements”
  – MIAPE (Minimum Information About a Proteomics Experiment)
![Page 18: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/18.jpg)
PRIDE
• Proteomics identifications database
  – Both a format and a database
  – Centralised, standards compliant, open source, public data repository for proteomics data
  – Query, submit and retrieve proteomics data in standardized XML formats
  – Public version housed at the EBI
  – http://www.ebi.ac.uk/pride/
![Page 20: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/20.jpg)
PRIDE Converter
• User interface
• Usable by biologists
• Interfaces with Ontology Lookup Service
• Developed by EBI
• Automatic upload to PRIDE database
![Page 22: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/22.jpg)
Future directions
• PRIDE does NOT hold:
  – Protein and peptide quantitations
• New approaches being developed
  – mzML: mass spectrometry format, enhancement of mzData, including support for richer datasets
  – mzIdentML: storage of protein and peptide identifications
  – mzQuantML: storage of protein and peptide quantitations
![Page 23: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/23.jpg)
Metabolomics
• We wish to store:
• Raw experimental mass spectrometry (and NMR) data
• Metabolite identifications
• Metabolite quantitations
• Metadata (instrument, search algorithm, user, etc.)
![Page 24: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/24.jpg)
Metabolomics
• Data standard does NOT currently exist
  – Core Information for Metabolomics Reporting
• Metabolomics Standards Initiative (MSI)
  – http://msi-workgroups.sourceforge.net/
• MetaboLights being developed at EBI
  – Not many details as yet
• In the meantime…
  – MCISB has developed its own repository
![Page 25: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/25.jpg)
MeMo
• Metabolomics Model database
• Designed initially for metabolomics data
  – SQL / XML hybrid approach
• Holds:
  – Experimental meta-data (submitter, lab, date)
  – Sample meta-data (including biological source)
  – Instrumentation meta-data
  – Mass spectra
  – Metabolite identifications
![Page 27: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/27.jpg)
![Page 29: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/29.jpg)
Enzyme kinetics
• How fast does a given reaction occur?
[Diagram: A → B, catalysed by the enzyme]
• Determination of kinetic constants which define the kinetics of the reaction
• Experimental approach: perform kinetic assays
![Page 30: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/30.jpg)
Enzyme kinetics
• Many approaches:
  – Absorbance
  – Fluorescence
  – others
• Currently concentrating on absorbance assays on BMG NOVOstar instrument
• Requirement: determination of KM and kcat for a given reaction under particular conditions (pH and temperature)
![Page 31: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/31.jpg)
Enzyme kinetics: Michaelis-Menten
• Traditionally, the initial rate, v, is determined for each assay
![Page 32: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/32.jpg)
Enzyme kinetics: Michaelis-Menten
• Performing this at various substrate concentrations allows KM and Vmax to be determined:
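The relationship behind this slide is the Michaelis-Menten equation, v = Vmax·[S] / (KM + [S]), and kcat follows from Vmax divided by the total enzyme concentration. As a rough illustration of how KM and Vmax can be recovered from initial rates, the Python sketch below uses a Hanes-Woolf linearisation ([S]/v plotted against [S]). This is one textbook approach chosen for simplicity, not necessarily the fitting method used at MCISB; non-linear regression is generally preferred for noisy data. All numbers are synthetic assumptions.

```python
def michaelis_menten(s, km, vmax):
    """Initial rate v at substrate concentration s."""
    return vmax * s / (km + s)


def fit_michaelis_menten(substrate, rate):
    """Estimate (KM, Vmax) by least squares on the Hanes-Woolf form.

    [S]/v = [S]/Vmax + KM/Vmax, so regressing [S]/v on [S] gives
    slope = 1/Vmax and intercept = KM/Vmax.
    """
    x = list(substrate)
    y = [s / v for s, v in zip(substrate, rate)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax


# Synthetic, noise-free data generated from assumed "true" values
# (arbitrary but consistent concentration and time units).
true_km, true_vmax = 0.5, 2.0
concentrations = [0.1, 0.25, 0.5, 1.0, 2.0, 5.0]
rates = [michaelis_menten(s, true_km, true_vmax) for s in concentrations]

km, vmax = fit_michaelis_menten(concentrations, rates)

# kcat = Vmax / total enzyme concentration (assumed value here).
enzyme_concentration = 0.01
kcat = vmax / enzyme_concentration
```

With noise-free input the fit should return the values used to generate the data; with real assay data the estimates depend on how the measurement error is distributed.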
![Page 33: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/33.jpg)
STRENDA guidelines
• Standards for Reporting Enzymology Data
  – http://www.beilstein-institut.de/en/projects/strenda/
• Specifies…
  – Reactants / products
  – Enzyme (wild-type, modified, purification, expressed in)
  – Experimental conditions (pH, temperature, buffer)
  – Instrument, experiment type
  – Submitter (contact details)
![Page 34: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/34.jpg)
SABIO-RK
• http://sabio.villa-bosch.de/
• Comprehensive collection of enzyme kinetic constants
• Adheres to the STRENDA recommendations
• Harvested from literature
• Searchable web interface
![Page 38: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/38.jpg)
BRENDA
• http://www.brenda-enzymes.org/
• Even more comprehensive
• Slightly less well-curated
• Again, searchable web interface
![Page 40: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/40.jpg)
Other experimental standards
• MIBBI: Minimum Information for Biological and Biomedical Investigations
  – http://mibbi.org/
• Over thirty recommendations for a range of experimental techniques
![Page 42: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/42.jpg)
MCISB project overview
[Diagram: three experimental strands (quantitative proteomics, quantitative metabolomics, enzyme kinetics) feed a model. They supply its variables (metabolite and protein concentrations) and its parameters (KM, kcat). Data resources shown: PRIDE XML, MeMo, SABIO-RK and MeMo-RK, each with a web service.]
![Page 44: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/44.jpg)
Modelling
• What is a model?
• “An analytic or computational model proposes specific testable hypotheses about a biological system”
• Mathematical / computational representation of a biological system
• May allow computational simulation of the system
![Page 45: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/45.jpg)
Pathway databases
• Building a model often starts with a topological description of a pathway or pathways
• What reacts with what?
• A number of existing data resources
  – Biochemical knowledge, curated from literature
![Page 50: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/50.jpg)
Simulation tools
• The systems biology community has developed a strong software infrastructure
• Many tools exist, including simulators
  – Several hundred
• How do we link pathway databases to these simulators?
• A standard: SBML
  – Systems Biology Markup Language
  – Recently celebrated its 10th birthday
![Page 51: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/51.jpg)
SBML
• XML markup language describing models
• Contains concepts such as…
  – compartments
  – species (metabolites, enzymes, RNA, etc.)
  – reactions
• Similar to pathway databases
  – KEGG2SBML tool exists for converting KEGG pathway maps to SBML files
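To show how compartments, species and reactions nest, here is a rough Python sketch that builds a toy SBML-shaped document with the standard library. The model content (`toy_model`, `cytosol`, `A`, `B`, `A_to_B`) is invented for illustration, and the output has not been checked against an SBML validator; in practice a dedicated library such as libSBML would normally be used.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 3 Version 1 Core.
SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"


def build_model() -> ET.Element:
    """Build a toy model: one compartment, species A and B, reaction A -> B."""
    sbml = ET.Element("sbml", xmlns=SBML_NS, level="3", version="1")
    model = ET.SubElement(sbml, "model", id="toy_model")

    compartments = ET.SubElement(model, "listOfCompartments")
    ET.SubElement(compartments, "compartment", id="cytosol", constant="true")

    species = ET.SubElement(model, "listOfSpecies")
    for species_id in ("A", "B"):
        ET.SubElement(
            species, "species",
            id=species_id, compartment="cytosol",
            hasOnlySubstanceUnits="false",
            boundaryCondition="false", constant="false",
        )

    reactions = ET.SubElement(model, "listOfReactions")
    reaction = ET.SubElement(
        reactions, "reaction", id="A_to_B", reversible="false", fast="false"
    )
    reactants = ET.SubElement(reaction, "listOfReactants")
    ET.SubElement(reactants, "speciesReference",
                  species="A", stoichiometry="1", constant="true")
    products = ET.SubElement(reaction, "listOfProducts")
    ET.SubElement(products, "speciesReference",
                  species="B", stoichiometry="1", constant="true")
    return sbml


root = build_model()
sbml_text = ET.tostring(root, encoding="unicode")
```

At this stage the document is purely topological: it says what reacts with what, much like a pathway database entry, and carries no kinetics yet.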
![Page 52: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/52.jpg)
Mathematical SBML
• Also contains concepts allowing simulations
  – Many of these driven by experimental work
• Specification of metabolite and enzyme concentrations
• Specification of kinetic laws and kinetic parameters
• Parameterised model = pathways + experimental data
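As a hedged illustration of "pathways + experimental data", the Python sketch below takes a one-reaction pathway (A → B), attaches a Michaelis-Menten kinetic law with assumed parameter values and initial concentrations, and integrates it with a fixed-step forward Euler scheme. Euler is used here only because it is short; real simulators typically use more robust adaptive solvers.

```python
def simulate(a0, b0, vmax, km, t_end, dt):
    """Forward Euler integration of A -> B with rate vmax * A / (km + A).

    Returns a list of (time, A, B) tuples, including the initial state.
    """
    steps = int(round(t_end / dt))
    a, b = a0, b0
    trajectory = [(0.0, a, b)]
    for i in range(1, steps + 1):
        flux = vmax * a / (km + a)  # kinetic law evaluated at current state
        a -= flux * dt              # substrate consumed
        b += flux * dt              # product formed
        trajectory.append((i * dt, a, b))
    return trajectory


# Assumed values: the pathway gives the structure (A -> B), experiments
# would supply the initial concentrations and the kinetic parameters.
trajectory = simulate(a0=1.0, b0=0.0, vmax=2.0, km=0.5, t_end=5.0, dt=0.01)
final_time, final_a, final_b = trajectory[-1]
```

Because every unit of A consumed appears as B, the total A + B stays constant throughout the run, which is a quick sanity check on the integration.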
![Page 54: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/54.jpg)
SBML data resources
• Biomodels.net
  – http://www.ebi.ac.uk/biomodels-main/
  – Curated collection of biochemical models at EBI
• JWS Online
  – http://jjj.mib.ac.uk/
  – Also curated
  – BUT also includes an online simulator
  – You’ll learn more next month…
![Page 55: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/55.jpg)
SBML tools
• Hundreds of ‘em (205)
  – http://sbml.org/SBML_Software_Guide
• Different goals
  – Whole cell / single pathway
  – Deterministic / stochastic simulators
  – Different platforms / programming languages
• Matrix exists, describing capabilities of each tool
  – http://sbml.org/SBML_Software_Guide/SBML_Software_Matrix
![Page 56: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/56.jpg)
Making SBML models: CellDesigner
![Page 57: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/57.jpg)
Other model representations
• CellML
  – http://www.cellml.org/
  – Larger scale modelling
  – Inter-cellular, used in whole organ modelling
• BioPAX
  – http://www.biopax.org/
  – Similar goals to SBML
• Overlap between “competing” representations is being reduced
  – Regular “COMBINE” meetings
![Page 58: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/58.jpg)
MIRIAM
• Minimum Information Required in the Annotation of Models
  – http://www.ebi.ac.uk/miriam/
• Set of guidelines describing how to make models reusable
  – Specify model creator contact details
  – Ensure consistent annotation of terms with database resources
  – e.g. use UniProt identifiers for unambiguous identification of enzymes
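A rough Python sketch of what consistent annotation can look like in practice: each enzyme in a model is tied to a UniProt accession, the accession is sanity-checked, and a resolvable URI is built from it. The accession pattern is the one UniProt documents, as far as I recall it; the identifiers.org URI form is one common convention; the enzyme names and the accession `P12345` are placeholders.

```python
import re
from typing import Dict, List, Optional

# UniProt accession pattern (assumed from UniProt's documentation).
UNIPROT_ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def uniprot_uri(accession: str) -> str:
    """Return an identifiers.org-style URI for a UniProt accession."""
    if not UNIPROT_ACCESSION.match(accession):
        raise ValueError(f"not a UniProt accession: {accession!r}")
    return f"http://identifiers.org/uniprot/{accession}"


def unannotated(enzymes: Dict[str, Optional[str]]) -> List[str]:
    """List the enzymes in a model that still lack an accession."""
    return sorted(name for name, acc in enzymes.items() if not acc)


# Hypothetical model content: enzyme name -> UniProt accession (or None).
model_enzymes = {"enzyme_1": "P12345", "enzyme_2": None}

annotations = {
    name: uniprot_uri(acc) for name, acc in model_enzymes.items() if acc
}
missing = unannotated(model_enzymes)
```

A free-text enzyme name may be ambiguous between organisms or isoforms, whereas the accession points at exactly one database entry.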
![Page 59: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/59.jpg)
SBML visualisation: SBGN
• Until recently, no standardised way of viewing models
• Systems Biology Graphical Notation
  – Attempts to generate standard “wiring-diagram” for biological representations
![Page 60: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/60.jpg)
Model simulation
![Page 61: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/61.jpg)
Model simulation
• Many simulators exist
• How do we tell a simulator what to simulate?
  – Simulation Experiment Description Markup Language (SED-ML)
• Contains concepts…
  – Model (what to run the simulation on)
  – Simulation (define what to simulate, duration, step size)
  – Data generation (post-processing normalisation)
  – Output (2D plot, 3D plot)
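The four concepts listed above can be pictured as a small data structure. The Python sketch below is only a conceptual mirror of them: the class and field names are assumptions made for illustration and are not the SED-ML element names, and the file name `toy_model.xml` is a placeholder.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Model:
    source: str  # what to run the simulation on, e.g. an SBML file


@dataclass
class Simulation:
    start: float
    end: float
    step_size: float

    def time_points(self) -> List[float]:
        """Evenly spaced output times from start to end inclusive."""
        n = int(round((self.end - self.start) / self.step_size))
        return [self.start + i * self.step_size for i in range(n + 1)]


@dataclass
class DataGenerator:
    variable: str       # which model variable to report
    scale: float = 1.0  # simple post-processing, e.g. normalisation

    def apply(self, values: List[float]) -> List[float]:
        return [v * self.scale for v in values]


@dataclass
class Output:
    kind: str  # "2D plot" or "3D plot"
    x: str
    y: str


@dataclass
class Experiment:
    model: Model
    simulation: Simulation
    generators: List[DataGenerator]
    outputs: List[Output]


experiment = Experiment(
    model=Model(source="toy_model.xml"),
    simulation=Simulation(start=0.0, end=10.0, step_size=0.5),
    generators=[DataGenerator(variable="A", scale=0.5)],
    outputs=[Output(kind="2D plot", x="time", y="A")],
)
times = experiment.simulation.time_points()
```

Keeping the experiment description separate from the model is what lets the same model be run under different simulation settings, and the same settings be handed to different simulators.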
![Page 62: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/62.jpg)
Simulation results: SBRML
• Simulation results are data too, and are represented by SBRML
  – Systems Biology Results Markup Language
  – Developed by Joseph Dada et al. (Manchester)
• Structured format for representing simulation results
• Dada JO, et al. SBRML: a markup language for associating systems biology data with models. Bioinformatics 2010, 26, 932-938.
![Page 63: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/63.jpg)
SBRML
![Page 64: Data standards for systems biology](https://reader036.vdocuments.site/reader036/viewer/2022062511/54c69fc74a795911758b4597/html5/thumbnails/64.jpg)
Conclusion
• Data standards greatly facilitate computational systems biology
• Standards exist (and are being continually developed) for both experimental and modelling data
• Provides a framework for data sharing and open-source software tool development