euro lipids 2014_graz

Bioinformatics for lipidomics: putting some building blocks together. Dr. Juan Antonio Vizcaíno, EMBL-EBI, Hinxton, Cambridge, UK

Uploaded by juan-antonio-vizcaino on 29 June 2015


DESCRIPTION

Bioinformatics for lipidomics: putting some building blocks together. 4th European Lipidomic Meeting, Graz, Austria, 22-24 September 2014.

TRANSCRIPT

Page 1: Euro lipids 2014_graz

Bioinformatics for lipidomics: putting some building blocks together

Dr. Juan Antonio Vizcaíno

EMBL-EBI

Hinxton, Cambridge, UK

Page 2

Juan A. Vizcaíno

4th European Lipidomic Meeting, Graz, 24 September 2014

Overview

• A bit of general context…

• Data standards: mzTab (and mzML)

• Standard nomenclature

• Public repository: MetaboLights

• Specialist resource: LipidHome

Page 3

Some of the main bioinformatics building blocks

Data standards

Databases, data repositories

Stable identifiers for molecules

Infrastructure to store and access the information

Nothing new… lipidomics (metabolomics) is following in the footsteps of other disciplines.

Page 4

Bioinformatics infrastructure

Usually we do not notice that they are there… until something stops working.

Page 5

Overview

• A bit of general context…

• Data standards: mzTab (and mzML)

• Standard nomenclature

• Public repository: MetaboLights

• Specialist resource: LipidHome

Page 6

Standards are needed in life: also in bioinformatics…

With a small number of standards, data converters are feasible.

Data standards are needed

Page 7

Metabolomics Standards Initiative 2007 publications

Roy Goodacre Metabolomics (2014) 10:5-7

In practice, there was little adoption…

Page 8

Situation in the field

LipidXplorer, LDA, ALEX, others

Lab 1, Lab 2, Lab 3, …

Different output files from different tools

How can results from different groups be compared easily? (The same applies to visualization, storage, …)

Page 9

Situation in the field

LipidXplorer, LDA, ALEX, others

Lab 1, Lab 2, Lab 3, …

Different output files from different tools → converters → mzTab → common analysis/visualization tools
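As a rough illustration of what one of these converters does, the sketch below turns rows from a tool-specific result table into mzTab-style metadata (MTD) and small molecule (SMH/SML) lines. The input keys (`lipid`, `mz`, `charge`, `rt`, `id`, `formula`) and the function name `to_mztab_lines` are hypothetical, and only a subset of the mzTab 1.0 small molecule columns is written, so the output is not expected to pass the mzTab validator as-is.

```python
import csv
import io

# Hypothetical subset of mzTab 1.0 small molecule (SML) columns; a complete
# file needs further mandatory columns and metadata entries.
SML_COLUMNS = ["identifier", "chemical_formula", "description",
               "exp_mass_to_charge", "charge", "retention_time"]


def to_mztab_lines(tool_rows, description):
    """Convert tool-specific result rows (hypothetical dict keys) to mzTab-like lines."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Metadata section: "MTD", then key, then value.
    writer.writerow(["MTD", "mzTab-version", "1.0.0"])
    writer.writerow(["MTD", "mzTab-mode", "Summary"])
    writer.writerow(["MTD", "mzTab-type", "Identification"])
    writer.writerow(["MTD", "description", description])
    # Small molecule header (SMH) followed by one SML row per reported lipid.
    writer.writerow(["SMH"] + SML_COLUMNS)
    for row in tool_rows:
        writer.writerow(["SML",
                         row.get("id", "null"),       # mzTab uses "null" for missing values
                         row.get("formula", "null"),
                         row["lipid"],
                         row["mz"],
                         row.get("charge", "null"),
                         row.get("rt", "null")])
    return out.getvalue().splitlines()
```

Writing through the `csv` module keeps the tab handling in one place; one such function per tool output format is all a converter in the diagram needs to be.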

Page 10

The mzTab format

http://code.google.com/p/mztab/

Page 11

mzTab – Aims and concept

• To provide a simple and efficient way of exchanging results from MS approaches.

• Simple summary report of the experimental results

• Peptides and proteins identified in a given experimental setting

• Small molecules identified

• Reported quantification values

• Technical and biological metadata

• Easier to update and maintain, and flexible enough.

• Easier for the research community, systems biologists and providers of knowledge bases to parse and use.

• It can be used by non-experts in bioinformatics.

Page 12

Why a tab-delimited file?

• Using the XML-based formats of the proteomics field (mzIdentML, mzQuantML) effectively requires sophisticated bioinformatics expertise.

• No alternative was available for metabolomics results…

• Many researchers still use MS Excel to “look at” or exchange their data.

• The transcriptomics field has a widely used standard tab-delimited format (MAGE-TAB) for exchanging data, and the MITAB format has also been a success in the molecular interaction field.

Page 13

mzTab – Format specification (version 1.0.0)

• Five sections:

• (Optional) Metadata section

• (Optional) Protein section

• (Optional) Peptide section

• (Optional) PSM (Peptide Spectrum Match) section

• (Optional) Small Molecule section

• Can report the experimental design in great detail.
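Because every line of an mzTab 1.0 file starts with a three-letter prefix that says which section it belongs to (MTD for metadata, PRH/PRT, PEH/PEP, PSH/PSM and SMH/SML for the header and data rows of the protein, peptide, PSM and small molecule sections, COM for comments), splitting a file into its sections needs no XML tooling. The sketch below assumes that prefix scheme; `split_sections` is a hypothetical helper name, not part of any mzTab library.

```python
from collections import defaultdict

# mzTab 1.0 line prefixes: one header prefix and one row prefix per table section.
SECTION_OF_PREFIX = {
    "MTD": "metadata",
    "PRH": "protein", "PRT": "protein",
    "PEH": "peptide", "PEP": "peptide",
    "PSH": "psm", "PSM": "psm",
    "SMH": "small_molecule", "SML": "small_molecule",
}


def split_sections(text):
    """Group the tab-separated fields of an mzTab file by section (prefix removed)."""
    sections = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no content
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "COM":
            continue  # comment lines are ignored
        section = SECTION_OF_PREFIX.get(prefix)
        if section is None:
            raise ValueError(f"unknown mzTab line prefix: {prefix!r}")
        sections[section].append(fields[1:])
    return dict(sections)
```

Since all sections are optional, the returned dictionary simply has no entry for sections that are absent from the file.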

Page 14

mzTab – Metadata Section

• It provides additional information about the dataset as key-value pairs.

• Extensive use of CVs/ontologies.

• Different requirements depending on the file mode (‘summary’ or ‘complete’) and type (‘identification’ or ‘quantification’).

• Support for experimental design (very similar to mzQuantML).
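A minimal sketch of reading these key-value pairs, assuming metadata lines of the form `MTD<tab>key<tab>value` and indexed keys such as `ms_run[1]-location` for repeated elements (runs, assays, study variables). The function name `parse_metadata` and the grouping into nested dictionaries are illustrative choices, not part of the specification.

```python
import re

# Indexed keys such as "ms_run[1]-location": element name, index, optional attribute.
INDEXED_KEY = re.compile(r"^(?P<element>[A-Za-z_]+)\[(?P<index>\d+)\](?:-(?P<attr>.+))?$")


def parse_metadata(lines):
    """Split MTD lines into plain key-value pairs and indexed elements."""
    plain, indexed = {}, {}
    for line in lines:
        fields = line.rstrip("\n").split("\t", 2)
        if fields[0] != "MTD" or len(fields) < 3:
            continue  # not a complete metadata line
        _, key, value = fields
        match = INDEXED_KEY.match(key)
        if match:
            element, index = match["element"], int(match["index"])
            attr = match["attr"] or ""
            indexed.setdefault(element, {}).setdefault(index, {})[attr] = value
        else:
            plain[key] = value
    return plain, indexed
```

Values that are controlled vocabulary parameters (written in square brackets) are kept as plain strings here; a fuller reader would split them into CV label, accession, name and value.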

Page 15

mzTab – Metadata Section

Page 16

mzTab – Small Molecule Table

• Main contents:

• Identifier

• Unit-ID

• Chemical formula

• SMILES identifier

• InChI identifier

• Descriptive name

• Mass-to-charge ratio

• Charge and retention time

• Taxonomy ID and species name

• Spectral library name + version

• Software name + version

• Relative or absolute quantification values

• Reference to the spectrum ID in an external file (e.g. mzML)
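Because the small molecule table is a header line (SMH) followed by data rows (SML), each row can be mapped onto the column names in the header, which is what makes the format easy to load into spreadsheets or scripts. The sketch below assumes that layout and mzTab's convention of writing missing values as `null`; the column names in the usage test (`description`, `exp_mass_to_charge`, `charge`) are mzTab 1.0 names as I understand them, and `read_small_molecules` is a hypothetical helper.

```python
def read_small_molecules(lines):
    """Map each SML row onto the column names given in the SMH header line."""
    header = None
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "SMH":
            header = fields[1:]
        elif fields[0] == "SML":
            if header is None:
                raise ValueError("SML row found before the SMH header")
            # "null" marks a missing value in mzTab; expose it as None.
            values = [None if v == "null" else v for v in fields[1:]]
            records.append(dict(zip(header, values)))
    return records
```

Values stay as strings; converting m/z, charge or abundance columns to numbers is left to the caller, since which columns are numeric depends on the file.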

Page 17

mzTab – Small Molecule Section

• It contains mandatory and optional fields.

• It is possible to link to the corresponding mass spectra in external files.
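In mzTab 1.0 this link is, as far as I understand the specification, a `spectra_ref` value of the form `ms_run[n]:<native spectrum ID>`, with several references separated by `|`; the run index points to an `ms_run[n]-location` entry in the metadata. A minimal sketch of splitting such a value, under that assumption (`parse_spectra_ref` is a hypothetical name):

```python
import re

# Assumed form: "ms_run[<n>]:<native spectrum ID>", multiple refs joined by "|".
SPECTRA_REF = re.compile(r"^ms_run\[(\d+)\]:(.+)$")


def parse_spectra_ref(value):
    """Split a spectra_ref value into (ms_run index, native spectrum ID) pairs."""
    if value in (None, "", "null"):
        return []  # no spectrum linked
    refs = []
    for part in value.split("|"):
        match = SPECTRA_REF.match(part.strip())
        if match is None:
            raise ValueError(f"malformed spectra_ref: {part!r}")
        refs.append((int(match.group(1)), match.group(2)))
    return refs
```

The native ID (e.g. `scan=1296`) is kept verbatim so it can be looked up in the referenced mzML file.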

Page 18

mzTab – Current implementations

• jmzTab (Java API): Version 3.0 is now a stable version. Manuscript published in the journal Proteomics.

• mzTab Validator, PRIDE XML to mzTab converter (PRIDE team).

• mzIdentML and mzQuantML to mzTab converters (Andy Jones' group).

• MaxQuant: a beta exporter is available.

• OpenMS (version 1.10).

• R/Bioconductor package MSnbase (L. Gatto, University of Cambridge).

• LipidDataAnalyzer (J. Hartler, University of Graz, see next talk).

• MetaboLights (EBI).

Page 19

Implementation in Lipid Data Analyzer

• In collaboration with Graz University of Technology (TU Graz).

• mzTab export support has been available since v1.6 (May 2012).

Page 20

mzTab format publications

http://code.google.com/p/mztab/

J. Griss et al., MCP, 2014

Q.W. Xu et al., Proteomics, 2014

Page 21

COSMOS: EU FP7 project

• COordination of Standards in MetabOlomicS

• Started October 2012

• 14 European partners

• Worldwide collaborators

• Standards!!

• Data exchange

• Open source

http://www.cosmos-fp7.eu/

Page 22

mzTab for metabolomics (Mx): extension ongoing

• Meeting in Tuebingen (March 2014) to extend mzTab for metabolomics.

• NEW! Three tables for small molecules (SM), analogous to those for proteins:

1) SmallMoleculeList

2) SmallMoleculeFeatures

3) SmallMoleculeEvidence

An example file is available at

https://github.com/sneumann/mtbls2/faahKO.mzTab

Page 23

mzML: Standard for MS data

• A data format for the storage and exchange of MS output files

• Originally designed for proteomics by merging the best aspects of both mzData and mzXML

• Developed with the full participation of academic researchers and hardware and software vendors

• Supports both raw data and processed peak lists.

• Version 1.1 released in June 2009

• Many implementations already exist in the proteomics world
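mzML is XML, so standard XML parsers can read it. The sketch below, using Python's built-in `xml.etree.ElementTree` and the mzML namespace `http://psi.hupo.org/ms/mzml`, builds a map from each spectrum's native ID to its index, the kind of lookup needed to resolve spectrum references from an mzTab file. The test input is a stripped-down fragment, not a schema-valid mzML document, and `spectrum_index` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# XML namespace used by mzML 1.1 documents.
MZML_NS = {"mz": "http://psi.hupo.org/ms/mzml"}


def spectrum_index(mzml_text):
    """Map each spectrum's native ID to its 0-based index attribute."""
    root = ET.fromstring(mzml_text)
    # ".//" also finds spectra nested inside an indexedmzML wrapper.
    return {spectrum.get("id"): int(spectrum.get("index"))
            for spectrum in root.iterfind(".//mz:spectrum", MZML_NS)}
```

For real files, which are often large, `ET.iterparse` with element clearing would avoid loading the whole document into memory.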

Page 24

mzML for Metabolomics

•A no-brainer. No need to reinvent the wheel

•No schema change required.

• Planned for the next documentation update:

1. Describe multidimensional retention time (GCxGC/MS, LCxLC/MS and LC-IMS/MS)

2. Describe conversion tools (especially for the GC world)

Page 25

Data standards in MS for metabolomics

Steffen Neumann

Page 26

Overview

• A bit of general context…

• Data standards: mzTab (and mzML)

• Standard nomenclature

• Public repository: MetaboLights and COSMOS

• Specialist resource: LipidHome

Page 27

Situation in the field

• It is very challenging to share experimental results efficiently:

• There is no standard data format for experimental results (Excel spreadsheets are routinely used).

• Different groups name lipid species slightly differently, and the level of detail also varies.

• This situation may be good enough for human consumption, but not for computers. It hinders the development of:

•Analysis tools

•Data repositories

•LIMS systems

Page 28

Standard LipidomicNet Nomenclature

• Addresses some limitations of the LIPID MAPS nomenclature (the de facto standard) for high-throughput lipid MS approaches

• Enables different levels of structural resolution for lipid species (needed to make clear how much detail the data actually support)

• Suitable for bioinformatics approaches (used in LipidHome)

• Currently includes the main lipid classes (from fatty acids to sterols).

G. Liebisch et al., JLR, 2013
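To show why levels of resolution matter for software, here is a minimal sketch of reading a simplified subset of the shorthand notation: a sum composition such as `PC 36:2`, chains with unknown positions joined by `_` (e.g. `PC 16:0_20:2`), and chains with known sn positions joined by `/` (e.g. `PC 16:0/20:2`). The mapping of these forms onto LipidHome's level names (species, fatty acid scan species, sub species) is my assumption, ether lipids such as `PC O-36:2` are deliberately not handled, and `parse_lipid`/`LipidName` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Simplified shorthand only: "<class> <C:D>[(_|/)<C:D>...]", no ether/oxidised forms.
NAME = re.compile(r"^(?P<cls>[A-Z][A-Za-z]*) (?P<chains>\d+:\d+(?:[_/]\d+:\d+)*)$")


@dataclass(frozen=True)
class LipidName:
    lipid_class: str
    level: str
    chains: tuple  # ((carbons, double_bonds), ...)


def parse_lipid(name):
    """Parse a simplified shorthand name and infer its annotation level."""
    match = NAME.match(name.strip())
    if match is None:
        raise ValueError(f"unsupported lipid name: {name!r}")
    text = match["chains"]
    chains = tuple(tuple(int(x) for x in chain.split(":"))
                   for chain in re.split(r"[_/]", text))
    if len(chains) == 1:
        level = "species"                   # sum composition only (or a single chain)
    elif "/" in text:
        level = "sub species"               # chains and their sn positions known
    else:
        level = "fatty acid scan species"   # chains known, positions unknown
    return LipidName(match["cls"], level, chains)
```

Note that a single-chain lysolipid such as `LPC 18:1` is classed as "species" here, a simplification of the real hierarchy.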

Page 29

Nomenclature Structural Hierarchy

Page 30

Overview

• A bit of general context…

• Data standards: mzTab (and mzML)

• Standard nomenclature

• Public repository: MetaboLights

• Specialist resource: LipidHome

Page 31

Data sharing in Biology

• In some ‘omics’ fields, a data-sharing culture is well established and is generally considered good scientific practice.

• In metabolomics (lipidomics), that culture is not there yet.

• Public availability of data enables:

• Reinterpretation

• Validation of the reported experimental results

• Reuse of the data (e.g. for meta-analysis studies)

• Specific use cases for metabolomics (lipidomics), e.g. development of MRM assays, spectral libraries, fragmentation models, etc.

Page 32

MetaboLights – metabolomics repository

www.ebi.ac.uk/metabolights (metabolights.org, metabolights.eu)

Page 33

MetaboLights – Data types stored

• Primary research data

• Investigation, Study, Assay and Protocols (metadata)

• Instrument and analytical software output (raw / processed)

• Metabolite references, QC, Blanks, …

• Open source formats

• Imported reference data for each metabolite

• Reference data imported from external databases

• Chemistry, Biology, Reactions, Pathways, NMR/MS spectra, Literature

• Link to:

• ChEBI, Rhea and others

Page 34

MetaboLights – Private Data – Share data

Page 35

MetabolomeXchange.org

Page 36

Overview

• A bit of general context…

• Data standards: mzTab (and mzML)

• Standard nomenclature

• Public repository: MetaboLights

• Specialist resource: LipidHome

Page 37

LipidHome

www.ebi.ac.uk/apweiler-srv/lipidhome

J. Foster et al., PLOS ONE, 2013

Page 38

LipidHome: executive summary

• Provides stable identifiers for all common lipid structures.

• Provides all theoretical lipid structures, while keeping them clearly separate from experimentally validated structures.

• Evidence-based system for annotating lipids with papers.

• A useful hierarchy of annotation levels that allows the database to be queried from whatever results you have, e.g. mass, structural fragment or empirical formula.

• Programmatic access so that lipid identification software/ LIMS / analysis pipelines can be built on top of it.

Page 39

LipidHome Structural Hierarchy

• Lipids are stored at the levels described in the proposed LipidomicNet nomenclature

• Lipid identifications can accurately be mapped to suitable records in the database
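Mapping an identification onto such a hierarchy amounts to deriving every coarser name from the given one: a sub species with sn positions implies a fatty acid scan species (positions dropped), which in turn implies a species (summed carbons and double bonds). A minimal sketch for the simplified shorthand `PC 16:0/20:2` / `PC 16:0_20:2` / `PC 36:2`, assuming that sorting chains by (carbons, double bonds) gives a canonical order; this is an illustration, not LipidHome's own mapping logic, and `roll_up` is a hypothetical name.

```python
def roll_up(name):
    """Map a shorthand lipid name onto its own level and every coarser one."""
    lipid_class, chains_text = name.split(" ", 1)
    chains = [tuple(int(x) for x in chain.split(":"))
              for chain in chains_text.replace("/", "_").split("_")]
    levels = {}
    if "/" in chains_text:
        levels["sub species"] = name  # sn positions are known
    if len(chains) > 1:
        # Positions dropped; chains sorted by (carbons, double bonds) as a canonical order.
        levels["fatty acid scan species"] = lipid_class + " " + "_".join(
            f"{c}:{d}" for c, d in sorted(chains))
    carbons = sum(c for c, _ in chains)
    double_bonds = sum(d for _, d in chains)
    levels["species"] = f"{lipid_class} {carbons}:{double_bonds}"
    return levels
```

With a canonical order, `PC 18:2_18:0` and `PC 18:0_18:2` map to the same record, which is what makes identifications from different tools comparable.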

Page 40

Use cases

• Which species/isomers are viable identifications for mass X with tolerance Y?

• For species PC 36:2, what are the experimentally validated isomers/fatty acid scan species?

• What are all the experimentally validated sub species containing the fatty acid species 18:2?

• What are all the identifications validated by “PMID:20564011”?

• For mass X, what is the most likely sub species based on previous identifications?
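The first use case (viable identifications for mass X with tolerance Y) can be sketched as a monoisotopic mass calculation followed by a tolerance filter. The candidate dictionary, the function names and the choice of an absolute tolerance in Da are assumptions for illustration; this is not LipidHome's search implementation or API. Masses here are neutral; to compare against an observed [M+H]+ m/z, add the proton mass (about 1.00728 u) to each candidate first.

```python
import re

# Monoisotopic masses (u) of the elements needed for common glycerophospholipids.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "P": 30.97376163}


def monoisotopic_mass(formula):
    """Sum element masses for a simple formula such as 'C44H84NO8P'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def search(query_mass, tolerance, candidates):
    """Return (name, mass error) for candidates within +/- tolerance Da, best first."""
    hits = []
    for name, formula in candidates.items():
        error = monoisotopic_mass(formula) - query_mass
        if abs(error) <= tolerance:
            hits.append((name, round(error, 4)))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Hypothetical candidate list (species level, neutral formulas).
CANDIDATES = {"PC 36:2": "C44H84NO8P", "PC 36:1": "C44H86NO8P", "PE 36:2": "C41H78NO8P"}
```

A ppm tolerance would replace `tolerance` with `query_mass * ppm / 1e6`; the filtering logic stays the same.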

Page 41

The data in LipidHome

Glycerolipids (GL):

• MG: MG, MG O-

• DG: DG, DG O-, DG dO-

• TG: TG, TG O-, TG dO-, TG tO-

Glycerophospholipids (GP):

• PC: PC, PC O-, PC dO-, LPC, LPC O-

• PA: PA, PA O-, PA dO-, LPA, LPA O-

• PE: PE, PE O-, PE dO-, LPE, LPE O-

• PS: PS, PS O-, PS dO-, LPS, LPS O-

• PI: PI, PI O-, PI dO-, LPI, LPI O-

• PG: PG, PG O-, PG dO-, LPG, LPG O-

Record counts:

• Species: 17,497

• Fatty acid scan species: 1,821,760

• Sub species: 2,140,592

• Annotated isomers: 7,584

• Fatty acid species: 164

Page 42

Theoretical lipid generation

• A set of rules was derived to describe common fatty acids:

• Minimum carbons = 2

• Maximum carbons = 30

• Minimum double bonds = 0

• Maximum double bonds = 10

• Minimum gap between double bonds
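The rules above can be sketched as a bounded enumeration of C:D fatty acid species with a feasibility check. The slide does not give the value of the minimum gap between double bonds, so `MIN_GAP = 3` (methylene-interrupted double bonds) is a hypothetical choice, as is the placement rule in the comments; the resulting list is therefore not expected to reproduce LipidHome's count of 164 fatty acid species.

```python
# Bounds from the slide; MIN_GAP is hypothetical because the slide gives no value.
MIN_CARBONS, MAX_CARBONS = 2, 30
MIN_DOUBLE_BONDS, MAX_DOUBLE_BONDS = 0, 10
MIN_GAP = 3  # hypothetical: methylene-interrupted double bonds


def is_feasible(carbons, double_bonds, min_gap=MIN_GAP):
    """Can `double_bonds` C=C bonds fit in a chain of `carbons` carbons?

    Assumes a double bond may start at delta positions 2..carbons-1 (never at
    the carboxyl carbon) and that successive start positions differ by at
    least `min_gap`.
    """
    if double_bonds == 0:
        return True
    # Tightest packing: first bond at position 2, each further one min_gap later.
    last_position = 2 + (double_bonds - 1) * min_gap
    return last_position <= carbons - 1


def fatty_acid_species(min_gap=MIN_GAP):
    """All C:D combinations within the bounds that pass the feasibility check."""
    return [f"{c}:{d}"
            for c in range(MIN_CARBONS, MAX_CARBONS + 1)
            for d in range(MIN_DOUBLE_BONDS, MAX_DOUBLE_BONDS + 1)
            if is_feasible(c, d, min_gap)]
```

Theoretical lipids at higher levels would then be built by combining these fatty acid species with each class's backbone and number of chains.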

Page 43

LipidHome – Species view

Page 44

LipidHome – MS1 search output

Page 45

The big picture…

Standard nomenclature

Different output files from different tools (LipidXplorer, LDA, ALEX, others) → data converters to mzTab → mzTab → common analysis and visualization software

Local LIMS systems and MetaboLights: mzTab importer into LIMS/resource, mzTab exporter from LIMS/resource

Page 46

Acknowledgements

Johannes Griss, Qing-Wei Xu, Joe Foster

R. Salek & C. Steinbeck, COSMOS partners

G. Liebisch, M. Troetzmueller, F. Spener, H. Koefeler & M. Wakelam

http://code.google.com/p/mztab/

Jurgen Hartler, Gerhard Thallinger

BBSRC PROCESS grant

Mathias Walzer, Timo Sachsenberg, Oliver Kohlbacher

Page 47

Questions?