Enabling Systems Biology: Development and Implementation of Proteomics Standards and Services

Post on 28-Jun-2020


Page 1: Enabling Systems Biology: Development and Implementation of Proteomics Standards … · 2017-08-16 · HUPO Proteomics Standards Initiative •Develop data format standards •Data

Enabling Systems Biology: Development and Implementation of Proteomics Standards and Services

Page 2

Engineering 1850

•Nuts and bolts fit perfectly together, but only if they originate from the same factory

•Standardisation proposal in 1864 by William Sellers

•It was not generally accepted until after WWII, though …

Proteomics today

•Proteomics results are perfectly compatible, but only if they are from the same lab, from the same software

•Fragmentation of proteomics data

•“Publish and vanish”

Page 3

Proteomics Data Sharing

Page 4

Incompleteness of the public record

•Nucleotide sequences, protein sequences, macromolecular 3D structures, DNA microarrays: Database submission mandatory

•Proteomics: No standardised reporting, no standard database submission

•Proteomics data is generated at a high rate, and lost at a high rate

•Simple questions like “Give me all tissues in which my protein of interest was identified” are currently unanswerable

•Experiments are repeated unnecessarily, the field advances more slowly than necessary

Page 5

The tide is turning, though …

•Bradshaw RA, Burlingame AL, Carr S, Aebersold R. Reporting protein identification data: the next generation of guidelines. Mol Cell Proteomics. 2006 May;5(5):787-8.

•Wilkins et al. Guidelines for the next 10 years of proteomics. Proteomics. 2006 Jan;6(1):4-8.

•Nature Biotechnology 2006, Nov:

• Editorial: Standard Operating Procedures

• Burgoon LD. The need for standards, not guidelines, in biological data reporting and sharing.

• Ball C. Are we stuck in standards?

•Nature Biotechnology: Community Consultation on Standards: http://www.nature.com/nbt/consult/index.html

Page 6

Community Consultation

•Nature Biotechnology community consultation

•http://www.nature.com/nbt/consult/index.html

•Currently nine “standards” papers on NBT website for public consultation, six of them from PSI

• MIAPE parent

• MIAPE MS

• MIAPE MS Informatics

• MIAPE Gel

• MIAPE MI

• PSI MO

Page 7

HUPO Proteomics Standards Initiative

•Develop data format standards

•Data representation and annotation standards

•Involve data producers, database providers, software producers, publishers

•Open community initiative

Page 8

PSI deliverables

•Minimum Information about a Proteomics Experiment (MIAPE)

•XML schema

•Detailed controlled vocabularies

•Support tools

Page 9

Document process

• Significant investments in PSI standards require a formal standards process

• Process ensures a good balance between expert design and public scrutiny

• Document process approved at PSI spring meeting, San Francisco, April 2006

• Now in implementation

Page 10

PSI work groups

[Diagram labels: PSI-MI (Molecular Interactions); PSI-MS (Mass Spectrometry); PSI-MOD; Separations]

Page 11

MGED collaboration

[Diagram labels: FuGE (Functional Genomics Experiment model); PSI-MI (Molecular Interactions); PSI-MS (Mass Spectrometry); PSI-MOD; Separations; MGED (MIAME, MAGE-OM; Microarray Standard)]

Page 12

PSI work groups: MI

[Diagram labels: FuGE (Functional Genomics Experiment model); PSI-MI (Molecular Interactions); PSI-MS (Mass Spectrometry); PSI-MOD; Separations; MGED (MIAME, MAGE-OM; Microarray Standard)]

Page 13

PSI-MI community standard

•Community standard for Molecular Interactions

•Jointly developed by major data providers: BIND, CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others

•XML schema

•Controlled vocabularies

•Tools

•Minimum requirements (submitted)

•Implemented by major data providers

Page 14

PSI-MI controlled vocabularies

•PSI develops not only formats, but also controlled vocabularies/ontologies where necessary

•Example: > 20 ways to write: yeast two hybrid, Y2H, 2H, yeast-two-hybrid, two-hybrid, …

•Ca. 800 terms, fully defined and cross-referenced

•GO format
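The yeast two hybrid example can be made concrete with a small sketch of what a controlled vocabulary buys: many free-text spellings resolve to one term. This is only an illustration under stated assumptions. The identifier `CV:0001`, the synonym table, and the function names are hypothetical and are not taken from the PSI-MI vocabulary itself.

```python
import re

# Hypothetical vocabulary entry: the identifier "CV:0001" is a placeholder,
# not an accession taken from the PSI-MI controlled vocabulary.
TWO_HYBRID = ("CV:0001", "two hybrid")

# Free-text spellings, keyed in normalised form (assumed synonym table).
SYNONYMS = {
    "yeast two hybrid": TWO_HYBRID,
    "y2h": TWO_HYBRID,
    "2h": TWO_HYBRID,
    "two hybrid": TWO_HYBRID,
}


def normalise(label):
    """Lower-case the label and collapse hyphens, underscores and spaces."""
    return re.sub(r"[\s\-_]+", " ", label.strip().lower())


def lookup(label):
    """Return (identifier, preferred name), or None if the label is unknown."""
    return SYNONYMS.get(normalise(label))


if __name__ == "__main__":
    for spelling in ["yeast two hybrid", "Y2H", "2H", "yeast-two-hybrid", "two-hybrid"]:
        print(spelling, "->", lookup(spelling))
```

Returning `None` for unknown labels, rather than guessing, mirrors the idea that anything outside the vocabulary should be flagged for a curator.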

Page 15

PSI-MI format development

•Iterative development: Do the feasible first, leave the unfeasible for later

•Version 1.0 published in February 2004

• The HUPO PSI Molecular Interaction Format - A community standard for the representation of protein interaction data. Henning Hermjakob et al., Nature Biotechnology 2004, 22, 176-183.

•Version 2.5 released December 2005

• Technical improvements

• Quantitative parameters

• Additional interactor types: DNA, RNA, small molecules

• Additional, simplified tabular format

• Submitted

Page 16

PSI-MI Support

•Data: DIP, HPRD, IntAct, MINT, MIPS, …

•Tools

•Conversion Tabular – PSI XML

•XML -> HTML

•Semantic validation

•Visualisation

•PimWalker ®: http://pim.hybrigenics.com/pimwalker

•ProViz: http://cbi.labri.fr/eng/proviz.htm

•Cytoscape: http://www.cytoscape.org
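As a rough illustration of the tabular-to-XML conversion listed under Tools, the sketch below turns tab-delimited rows into a small XML document. The column names (`interactor_a`, `interactor_b`, `method`) and the element names are assumptions made for this example. They are not the actual PSI-MI tabular columns or the PSI-MI XML schema.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_xml(tab_text):
    """Convert tab-delimited interaction rows into an illustrative XML string.

    Element and attribute names here are hypothetical, not PSI-MI schema names.
    """
    root = ET.Element("interactionList")
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    for row in reader:
        # One <interaction> per row, with the detection method as an attribute.
        interaction = ET.SubElement(root, "interaction", method=row["method"])
        for column in ("interactor_a", "interactor_b"):
            ET.SubElement(interaction, "participant").text = row[column]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "interactor_a\tinteractor_b\tmethod\nP1\tP2\ttwo hybrid\n"
    print(rows_to_xml(sample))
```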

Page 17

IntAct as an implementation of PSI MI

•Curated molecular interaction database

•128,000 binary interactions

•Open source

•Open data

•http://www.ebi.ac.uk/intact

Page 18

IntAct curation

•Detailed, “deep” curation

•Based on full text papers

•Experimental conditions

•Detailed interactor identification

•Use of detailed controlled vocabularies

•Annotation of binding domains, protein modifications, etc.

Page 19

The IMEx consortium

•International Molecular-Interaction Exchange consortium

•DIP, IntAct, MINT, MIPS are establishing an exchange of curated literature data in PSI-MI format from summer 2006 onwards to provide a network of stable, comprehensive resources for molecular interaction data

•Aims:

•Consistent body of public data

•Avoid redundant curation

•http://imex.sf.net

Page 20

IMEx data deposition

•Deposition of published data in one of the IMEx databases is strongly encouraged

•Any dataset submitted to one of the IMEx databases will be replicated to the other IMEx databases

•IMEx partners are already co-ordinating their curation efforts

•Public guidelines: Orchard et al. The Minimum Information on a Molecular Interaction Experiment (MIMIx). Nature Biotechnology, accepted.

Page 21

PSI work groups: MS

[Diagram labels: FuGE (Functional Genomics Experiment model); PSI-MI (Molecular Interactions); PSI-MS (Mass Spectrometry); PSI-MOD; Separations; MGED (MIAME, MAGE-OM; Microarray Standard)]

Page 22

Mass spectrometry: PSI-MS

•mzData format as common instrument output format

•Format beta version accepted in Nice, April 2004

•EBI workshop July 2004

•Version 1.05 released January 4, 2005

•Next revision spring 2007, in collaboration with the Institute for Systems Biology (ISB), merging mzData and mzXML

•Controlled vocabularies developed jointly with ASTM

•Key concept: Request direct vendor support to avoid version problems due to vendor API changes

•Move to mzML (merge of mzData and mzXML)

Page 23

Current mzData support

•Applied Biosystems

•Bruker

•EBI

•GeneBio

• Insilicos

•Kratos

•MatrixScience

•Swiss Institute of Bioinformatics

•GPM

•Thermo Electron

•Waters

Page 24

Mass spectrometry: PSI-MS

•analysisXML format as common search engine output format

•Suggested in Nice, April 2004

•Further developed in Siena, April 2005

•Aim: Facilitate comparison and archiving of search engine output, in particular in comparative projects like the HUPO PPP

•Beta release under internal review

Page 25

PSI-MS based data flow

[Diagram: mass spectrometer A, mass spectrometer B (proprietary format, converter) → mzData → search engine A, search engine B → analysisXML → Public repository]
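One way to read this data flow is as an argument about the number of converters. The sketch below does the arithmetic under a simplifying assumption that is not stated on the slide: without a common format, every search engine needs its own reader for every instrument format, while with a common format each instrument needs one converter and each search engine one reader. The function name and parameters are hypothetical.

```python
def converters_needed(n_instruments, n_engines, common_format):
    """Count format adapters between instruments and search engines.

    Assumed model: pairwise readers without a shared format, versus one
    converter per instrument plus one reader per engine with a shared format.
    """
    if common_format:
        return n_instruments + n_engines
    return n_instruments * n_engines


if __name__ == "__main__":
    for instruments, engines in [(2, 2), (5, 4), (10, 6)]:
        pairwise = converters_needed(instruments, engines, common_format=False)
        shared = converters_needed(instruments, engines, common_format=True)
        print(f"{instruments} instruments, {engines} engines: "
              f"{pairwise} pairwise vs {shared} with a common format")
```

The same reasoning would apply one step further down the flow, where a repository reading one common search-result format needs a single parser rather than one per search engine.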

Page 26

PRIDE – Protein Identification Database

•Turns publicly available data into publicly accessible data

•Protein identifications

•Experimental detail

•Peak lists

•Linkout to raw data

•Fully open source

•Fully open data

•Implementation of PSI standards as they are released
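Once protein identifications are stored together with experimental detail, the earlier question about all tissues in which a protein of interest was identified becomes a simple lookup. The following is a minimal sketch over in-memory records. The record layout, the accessions (`PROT_A`, `PROT_B`), and the function name are invented for illustration and do not reflect the PRIDE data model or API.

```python
# Hypothetical experiment records: each one lists a tissue and the
# protein accessions identified in it (all values are made up).
RECORDS = [
    {"experiment": "exp1", "tissue": "liver", "proteins": {"PROT_A", "PROT_B"}},
    {"experiment": "exp2", "tissue": "plasma", "proteins": {"PROT_A"}},
    {"experiment": "exp3", "tissue": "brain", "proteins": {"PROT_B"}},
    {"experiment": "exp4", "tissue": "liver", "proteins": {"PROT_A"}},
]


def tissues_for(protein, records):
    """Return the sorted, de-duplicated tissues in which a protein was identified."""
    return sorted({r["tissue"] for r in records if protein in r["proteins"]})


if __name__ == "__main__":
    print(tissues_for("PROT_A", RECORDS))
```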

Page 27

Page 28

Page 29

Page 30

Data views

Page 31

Experiment Comparison

Page 32

PRIDE private mode

[Diagram labels: Lab A; Lab B; Lab C; Private Data in PRIDE “Collaboration”; Comparison; Reviewer; Publicly available data]

•Private mode allows data analysis within a collaboration

•PRIDE tools are already accessible in private mode, in particular experiment comparison (alpha)

•On manuscript submission, reviewers can access the data in standard format

Page 33

PRIDE private mode

[Diagram labels: Lab A; Lab B; Lab C; Private Data “Collaboration”; Reviewer; Publicly available data]

•Private mode allows data analysis within a collaboration

•PRIDE tools are already accessible in private mode, in particular experiment comparison (alpha)

•On manuscript submission, reviewers can access the data in standard format

•On manuscript publication, the data becomes public
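The three stages described here (private within a collaboration, visible to reviewers on submission, public on publication) can be sketched as a small access table. This is a hedged illustration only. The stage names, role names, and `can_view` function are assumptions for the example and say nothing about how PRIDE actually implements access control.

```python
from enum import Enum


class Stage(Enum):
    PRIVATE = "private"            # data shared within the collaboration only
    UNDER_REVIEW = "under review"  # manuscript submitted
    PUBLIC = "public"              # manuscript published


# Roles allowed to view the data at each stage; None means everyone.
ACCESS = {
    Stage.PRIVATE: {"collaborator"},
    Stage.UNDER_REVIEW: {"collaborator", "reviewer"},
    Stage.PUBLIC: None,
}


def can_view(role, stage):
    """Return True if the given (hypothetical) role may view data at this stage."""
    allowed = ACCESS[stage]
    return allowed is None or role in allowed


if __name__ == "__main__":
    for stage in Stage:
        viewers = [r for r in ("collaborator", "reviewer", "anyone") if can_view(r, stage)]
        print(stage.value, "->", viewers)
```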

Page 34

Data entry

•Register

•XML-based data deposition

• Target group: Larger labs with good bioinformatics support, large scale data sets

•Generate PRIDE XML directly

•Supporting toolkit currently under development

• Fully automated, web-based submission

•Excel-based

• Target group: Smaller labs, low to medium throughput

• “Biologists love Excel”

•Advanced Excel spreadsheet will allow user input in “familiar” Excel environment

•Spreadsheet supports use of controlled vocabularies and validation

•Automatic submission directly from spreadsheet into PRIDE

Page 35

Medium term vision

•Collaborate with regional or project centers for data collection and analysis

•Establish data exchange and collaboration between PeptideAtlas, GPMDB, PRIDE, PRIDE@NPC, …

•Provide a set of compatible, synchronized, public resources for protein identification data

[Diagram labels: Regional Center; PRIDE; PeptideAtlas; Regional Center; HUPO xPP; PRIDE@NPC]

Page 36

Acknowledgements

•All PSI participants

• Luisa Montecchi-Palazzi

• Sandra Orchard

• Chris Taylor

• Randy Julian (Lilly)

• Patrick Pedrioli (ISB, ETH)

•PRIDE

• Phil Jones

• Lennart Martens

• Richard Cote

• Sebastian Klie

•BBSRC ISPIDER Grant

•BBSRC ProteomeHarvest Grant

•EU ProDaC grant

•Henning Hermjakob

•http://www.psidev.info

Page 37

Resources

•http://psidev.sf.net

•http://imex.sf.net

•http://www.ebi.ac.uk/intact

•http://www.ebi.ac.uk/pride

•http://www.nature.com/nbt/consult/index.html