PRIDE and ProteomeXchange: Training webinar

Dr. Juan Antonio Vizcaíno, PRIDE Group Coordinator, Proteomics Services Team, EMBL-EBI, Hinxton, Cambridge, UK [email protected]

Upload: juan-antonio-vizcaino

Post on 11-Feb-2017

Page 1: PRIDE and ProteomeXchange: Training webinar

PRIDE and ProteomeXchange: Training webinar

Dr. Juan Antonio Vizcaíno

PRIDE Group Coordinator, Proteomics Services Team, EMBL-EBI, Hinxton, Cambridge, UK [email protected]

Page 2: PRIDE and ProteomeXchange: Training webinar

Juan A. Vizcaíno [email protected]

Training webinar, 25 November 2015

Welcome - webinar instructions

• GoToTraining works best in Chrome or IE; avoid Firefox due to audio issues with Macs.

• To access the full features of GoToTraining, use the desktop version by clicking “switch to desktop version”.

• All microphones will be muted whilst the trainer is speaking.

• If you have a question during this time or at the end, please use the chat box at the bottom of the GoToTraining box.

• Please complete the feedback survey which will launch at the end of the webinar.

Page 3: PRIDE and ProteomeXchange: Training webinar

Data resources at EMBL-EBI

[Diagram of EMBL-EBI data resources grouped by category.]

Categories: Genes, genomes & variation; Gene, protein & metabolite expression; Protein sequences, families & motifs; Molecular structures; Chemical biology; Reactions, interactions & pathways; Systems; Literature & ontologies.

Resources shown: RNA Central, ArrayExpress, Expression Atlas, MetaboLights, PRIDE, InterPro, Pfam, UniProt, ChEMBL, ChEBI, Protein Data Bank in Europe, Electron Microscopy Data Bank, European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive, IntAct, Reactome, BioModels, Enzyme Portal, BioSamples, Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal, Europe PubMed Central, Gene Ontology, Experimental Factor Ontology.

Page 5: PRIDE and ProteomeXchange: Training webinar

Overview

• PRIDE Archive (in the context of ProteomeXchange and the PSI standards)

• How to submit data to PRIDE: PRIDE tools

• How to access data in PRIDE Archive

• A sneak peek at other PRIDE resources

Page 7: PRIDE and ProteomeXchange: Training webinar

Mass Spectrometry (MS)-based proteomics

• Many different workflows.

• Discovery mode:
  • Bottom-up proteomics
    • Data dependent acquisition
    • Data independent acquisition
  • Top down proteomics

• Targeted mode:
  • SRM (Selected Reaction Monitoring)

Page 9: PRIDE and ProteomeXchange: Training webinar

MS proteomics: tandem MS (bottom-up)

MS/MS matching identifies peptides, not proteins.

Proteins are inferred from the peptide sequences.
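The inference step can be illustrated with a minimal sketch. It assumes the simplest possible rule, plain substring matching of identified peptide sequences against protein sequences; real protein inference tools use more elaborate grouping and scoring. The function names, accessions and sequences below are made up for illustration.

```python
from collections import defaultdict


def map_peptides_to_proteins(peptides, proteins):
    """Return {peptide: sorted accessions of proteins whose sequence contains it}."""
    matches = defaultdict(list)
    for peptide in peptides:
        for accession, sequence in proteins.items():
            if peptide in sequence:
                matches[peptide].append(accession)
        matches[peptide].sort()
    return dict(matches)


def split_unique_and_shared(matches):
    """Separate peptides that point to one protein from those shared by several."""
    unique = {p: accs[0] for p, accs in matches.items() if len(accs) == 1}
    shared = {p: accs for p, accs in matches.items() if len(accs) > 1}
    return unique, shared


# Toy data (hypothetical accessions and sequences).
proteins = {"PROT_A": "MKTAYIAKQRLSDEK", "PROT_B": "MSTAYIAKGGLSDEK"}
peptides = ["TAYIAK", "QRLSDEK", "GGLSDEK"]

matches = map_peptides_to_proteins(peptides, proteins)
unique, shared = split_unique_and_shared(matches)
```

In this toy case the peptide TAYIAK occurs in both proteins, so on its own it cannot tell them apart; only the unique peptides support one specific protein.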

Page 10: PRIDE and ProteomeXchange: Training webinar

PRIDE (PRoteomics IDEntifications) database

• PRIDE stores mass spectrometry (MS)-based proteomics data:
  • Peptide and protein expression data (identification and quantification)
  • Post-translational modifications
  • Mass spectra (raw data and peak lists)
  • Technical and biological metadata
  • Any other related information

• Full support for tandem MS approaches

http://www.ebi.ac.uk/pride/archive

Martens et al., Proteomics, 2005; Vizcaíno et al., NAR, 2013

Page 11: PRIDE and ProteomeXchange: Training webinar

PRIDE Mission

• To archive all types of proteomics mass spectrometry data for the purpose of supporting reproducible research, allowing the application of quality control metrics and enabling the reuse of these data by other researchers.

• To integrate MS-based data in a protein-centric manner to provide information on protein variants, modifications, and expression.

• To provide mass spectrometry based expression data to the Expression Atlas.

Page 13: PRIDE and ProteomeXchange: Training webinar

What is a proteomics publication in 2015?

• Proteomics studies generate potentially large amounts of data and results.

• Ideally, a proteomics publication needs to:
  • Summarize the results of the study
  • Provide supporting information for reliability of any results reported

• Information in a publication:
  • Manuscript
  • Supplementary material
  • Associated data submitted to a public repository

Page 14: PRIDE and ProteomeXchange: Training webinar

Journal Submission Recommendations

• Journal guidelines recommend submission to proteomics repositories: Proteomics, Nature Biotechnology, Nature Methods, Molecular and Cellular Proteomics

• Funding agencies are enforcing public deposition of data to maximize the value of the funds provided.

Page 15: PRIDE and ProteomeXchange: Training webinar

PRIDE: Source of MS proteomics data

• PRIDE Archive already provides or will soon provide MS proteomics data to other EMBL-EBI resources such as UniProt, Ensembl and the EBI Expression Atlas.

http://www.ebi.ac.uk/pride/archive

Page 16: PRIDE and ProteomeXchange: Training webinar

Data content in PRIDE Archive

• Dataset submission driven resource.

• PRIDE is organised in datasets (group of assays).

• An assay represents one MS run (in most cases).

• No data reprocessing at present. PRIDE aims to represent the author’s view on the data.

• Main supported formats: PRIDE XML and mzIdentML.

• Raw data is also now stored.
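The dataset/assay organisation described above can be pictured as a small data model. This is only an illustrative sketch: the class names, field names and file names are hypothetical and do not reflect PRIDE's internal schema. The example values borrow the accession and run count of the example dataset mentioned later in the webinar.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """One assay, which in most cases corresponds to one MS run."""
    accession: str
    result_file: str
    result_format: str  # "PRIDE XML" or "mzIdentML"
    raw_files: List[str] = field(default_factory=list)


@dataclass
class Dataset:
    """A dataset is a group of assays submitted together."""
    accession: str
    title: str
    assays: List[Assay] = field(default_factory=list)

    def formats(self):
        """Return the distinct result formats used across all assays."""
        return sorted({assay.result_format for assay in self.assays})


# Hypothetical file names; 12 runs as in the example dataset.
dataset = Dataset(
    accession="PXD000764",
    title="Discovery of new CSF biomarkers for meningitis in children",
)
for run in range(1, 13):
    dataset.assays.append(
        Assay(
            accession=f"assay_{run}",
            result_file=f"run{run:02d}.mzid",
            result_format="mzIdentML",
            raw_files=[f"run{run:02d}.raw"],
        )
    )
```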

Page 17: PRIDE and ProteomeXchange: Training webinar

ProteomeXchange Consortium

• Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.

• Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and (very recently) MassIVE (UCSD, San Diego).

• Common identifier space (PXD identifiers)

• Two supported data workflows: MS/MS and SRM.

• Main objective: Make life easier for researchers

http://www.proteomexchange.org
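Because all repositories share the PXD identifier space, accessions are easy to recognise in text. The sketch below assumes the format "PXD" followed by six digits, as in the example accession PXD000764 used later in this webinar; the function names are illustrative.

```python
import re

# Assumption: a PXD accession is "PXD" followed by exactly six digits.
PXD_PATTERN = re.compile(r"^PXD\d{6}$")
PXD_IN_TEXT = re.compile(r"\bPXD\d{6}\b")


def is_pxd_accession(text):
    """Return True if the whole string looks like a PXD accession."""
    return bool(PXD_PATTERN.match(text.strip()))


def find_pxd_accessions(text):
    """Return every PXD-like accession mentioned in a block of text."""
    return PXD_IN_TEXT.findall(text)
```

A typical use is scanning a manuscript's data availability statement for the dataset accession.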

Page 18: PRIDE and ProteomeXchange: Training webinar

ProteomeXchange data workflow

[Diagram labels: researcher’s results, raw data*, metadata; receiving repositories: PRIDE (MS/MS data), MassIVE (MS/MS data), PASSEL (SRM data); ProteomeCentral (metadata / manuscript); results; journals; UniProt/neXtProt; PeptideAtlas; GPMDB; other DBs; reprocessed results.]

Vizcaíno et al., Nat Biotechnol, 2014

Page 21: PRIDE and ProteomeXchange: Training webinar

PX Data workflow for MS/MS data

1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).

2. Result files:

a. Complete submissions: Result files can be converted to PRIDE XML or the mzIdentML data standard.

b. Partial submissions: For workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form.

3. Metadata: Sufficiently detailed description of sample origin, workflow, instrumentation, submitter.

4. Other files: Optional files:

a. QUANT: Quantification related results
b. PEAK: Peak list files
c. GEL: Gel images
d. OTHER: Any other file type
e. FASTA
f. SP_LIBRARY

[Diagram labels: Published, Raw files, Other files]

Page 22: PRIDE and ProteomeXchange: Training webinar

Complete vs Partial submissions: processed results

For complete submissions, it is possible to connect the spectra with the identification processed results and they can be visualized.

[Screenshots: Complete, Partial]
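The distinction between complete and partial submissions can be expressed as a simple rule over the file categories in a submission. The sketch below is an illustration of the rules as described on these slides, not the logic of any PRIDE tool. The labels RESULT, PEAK, QUANT, GEL, OTHER, FASTA and SP_LIBRARY appear in the slides; the labels RAW and SEARCH, the function name and the file names are assumptions made for this example.

```python
def classify_submission(file_categories):
    """Classify a submission as 'complete' or 'partial'.

    file_categories maps file name -> category label.
    Assumed labels: RAW (binary raw data), PEAK (peak lists),
    RESULT (PRIDE XML or mzIdentML), SEARCH (search engine output).
    """
    categories = set(file_categories.values())
    # Mass spectrometer output is required: raw data or peak lists.
    if not categories & {"RAW", "PEAK"}:
        raise ValueError("mass spectrometer output files are missing")
    # A standard-format result file makes the submission complete.
    if "RESULT" in categories:
        return "complete"
    # Otherwise search engine output in its original form makes it partial.
    if "SEARCH" in categories:
        return "partial"
    raise ValueError("no result files or search engine output files found")


complete_example = {"run01.raw": "RAW", "run01.mzid": "RESULT", "db.fasta": "FASTA"}
partial_example = {"run01.raw": "RAW", "run01.dat": "SEARCH"}
```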

Page 23: PRIDE and ProteomeXchange: Training webinar

PX Data workflow for MS/MS data (continued)

• The list of optional file types under “Other files” can be extended.

Page 24: PRIDE and ProteomeXchange: Training webinar

PRIDE Components: Submission Process (step 1)

Components: PRIDE Converter 2, PRIDE Inspector, PX Submission Tool

File formats: mzIdentML, PRIDE XML

Page 25: PRIDE and ProteomeXchange: Training webinar

Before: only file conversion to PRIDE XML

[Diagram: original data files (search output files, spectra files); ‘RESULT’ file generation by file conversion (PRIDE Converter; other tools, e.g. hEIDI); final ‘RESULT’ file (PRIDE XML).]

Page 26: PRIDE and ProteomeXchange: Training webinar

PX Data workflow for MS/MS data

[Diagram: search engine results + MS files; PRIDE Converter 2; PRIDE XML]

PRIDE Converter 2

Coté & Griss et al., MCP, 2012

https://github.com/PRIDE-Toolsuite/pride-converter-2

- ‘Bulk’ conversion possible: Command Line mode
- Virtually no limit in file sizes.

Other tools available:

- PRIDE Converter
- PLGS (Waters)
- Proteios
- EasyProt
- hEIDI
- OmicsHub (Integromics)
- PeptideShaker (Compomics)

Page 27: PRIDE and ProteomeXchange: Training webinar

Now: native file export to mzIdentML

[Diagram: tools (Mascot, ProteinPilot, Scaffold, PEAKS, MSGF+, others); ‘RESULT’ file generation by native file export; final ‘RESULT’ file (mzIdentML) plus spectra files (mzML, mzXML, mzData, mgf, pkl, ms2, dta, apl).]

Page 28: PRIDE and ProteomeXchange: Training webinar

Complete submissions

[Diagram: search engine results + MS files; search engines; mzIdentML]

An increasing number of tools support export to mzIdentML 1.1:

- Mascot
- MSGF+
- MyriMatch and related tools from D. Tabb’s lab
- OpenMS
- PEAKS
- PeptideShaker
- ProCon (ProteomeDiscoverer, Sequest)
- Scaffold
- TPP via the idConvert tool (ProteoWizard)
- ProteinPilot (from version 5.0)
- X!Tandem native conversion (Beta, PILEDRIVER)
- Others: library for X!Tandem conversion, lab internal pipelines, …
- Crux

- Referenced spectral files need to be submitted as well (all open formats are supported).

Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.
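Since the spectra files referenced by an mzIdentML file must be submitted too, a quick check before submission can be useful. The sketch below reads the `location` attribute of each `SpectraData` element in the mzIdentML 1.1 namespace and reports referenced files that are not in the submission. It is a minimal illustration, not part of any PRIDE tool; the function names, the sample document (a trimmed fragment that is not schema-valid) and the file names are made up.

```python
import xml.etree.ElementTree as ET

# Namespace of mzIdentML 1.1 documents.
MZID_NS = "{http://psidev.info/psi/pi/mzIdentML/1.1}"


def referenced_spectra_files(mzid_xml):
    """Return the file names referenced by SpectraData elements."""
    root = ET.fromstring(mzid_xml)
    names = []
    for element in root.iter(MZID_NS + "SpectraData"):
        location = element.get("location", "")
        # Keep only the file name, whatever the path or URI style.
        names.append(location.replace("\\", "/").rsplit("/", 1)[-1])
    return names


def missing_spectra_files(mzid_xml, submitted_names):
    """Return referenced spectra files absent from the submitted file names."""
    submitted = set(submitted_names)
    return [n for n in referenced_spectra_files(mzid_xml) if n not in submitted]


# Trimmed, hypothetical fragment for illustration only.
SAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <DataCollection>
    <Inputs>
      <SpectraData id="SD_1" location="file:///data/run01.mgf"/>
      <SpectraData id="SD_2" location="C:\\data\\run02.mzML"/>
    </Inputs>
  </DataCollection>
</MzIdentML>"""
```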

Page 29: PRIDE and ProteomeXchange: Training webinar

PRIDE Components: Submission Process (step 2)

Components: PRIDE Converter 2, PRIDE Inspector, PX Submission Tool

File formats: mzIdentML, PRIDE XML

Page 30: PRIDE and ProteomeXchange: Training webinar

PRIDE Inspector Toolsuite

Wang et al., Nat. Biotechnology, 2012; Perez-Riverol et al., MCP, 2016, in press

PRIDE Inspector Toolsuite supports:

- PRIDE XML
- mzIdentML + all types of spectra files
- mzML
- mzTab identification and quantification + all types of spectra files

https://github.com/PRIDE-Toolsuite/

Page 31: PRIDE and ProteomeXchange: Training webinar

PRIDE Inspector Toolsuite

New visualisation functionality for Protein Groups

https://github.com/PRIDE-Toolsuite/

Page 32: PRIDE and ProteomeXchange: Training webinar

PRIDE Inspector Toolsuite

Private review of files submitted to PRIDE

https://github.com/PRIDE-Toolsuite/

Page 33: PRIDE and ProteomeXchange: Training webinar

PRIDE Components: Submission Process (step 3)

Components: PRIDE Converter 2, PRIDE Inspector, PX Submission Tool

File formats: mzIdentML, PRIDE XML

Page 34: PRIDE and ProteomeXchange: Training webinar

PX submission tool

• Capture the mappings between the different types of files.

• Make the file upload process straightforward to the submitter (it transfers all the files using Aspera or FTP).

• Command line alternative: Using the Aspera file transfer protocol.

http://www.proteomexchange.org/submission

[Diagram labels: Published, Raw, Other files, PX submission tool]

Page 35: PRIDE and ProteomeXchange: Training webinar

PX submission tool: step by step

Page 36: PRIDE and ProteomeXchange: Training webinar

PX submission tool: screenshots

Page 37: PRIDE and ProteomeXchange: Training webinar

Manuscript published detailing the process

Ternent et al., Proteomics, 2014

http://www.proteomexchange.org/submission

Example dataset: PXD000764

- Title: “Discovery of new CSF biomarkers for meningitis in children”
- 12 runs: 4 controls and 8 infected samples
- Identification and quantification data

Page 38: PRIDE and ProteomeXchange: Training webinar

PRIDE Archive submitted datasets up until 1st November, 2015

• 1,259 submitted datasets by November 1st

• 923 submitted datasets in 2014

• In the last 6 months, 155 submitted datasets per month

• Size: ~ 160 TB.

Page 39: PRIDE and ProteomeXchange: Training webinar

PRIDE: Size comparison with other EBI resources (May 2015)

[Chart: “Data accumulation by resource”. x-axis: date (2004 to 2016); y-axis: bytes (ticks at 1E+07, 1E+12, 1E+17). Series: Metabolites, PRIDE, EGA, ENA (less AE), AE.]

Chart generated by Guy Cochrane

Page 41: PRIDE and ProteomeXchange: Training webinar

Data access to PRIDE Archive

• Look for particular datasets of interest:

• For data reuse: which particular proteins and peptides (including PTMs) have been detected.

• Data reinterpretation or re-analysis.

• Validation of the experimental results reported.

• Specific use cases for proteomics: spectral libraries, fragmentation models, SRM transitions,…

Page 42: PRIDE and ProteomeXchange: Training webinar

RSS feed for public datasets

http://groups.google.com/group/proteomexchange/feed/rss_v2_0_msgs.xml
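A feed like this can be read with a few lines of standard-library code. The sketch below assumes the feed is plain RSS 2.0 with `item` elements carrying `title` and `link` children; the sample feed content is invented for illustration, and whether the URL above still serves a feed is not something this example verifies.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(rss_xml):
    """Return a list of {'title', 'link'} dicts, one per RSS item."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append(
            {
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
            }
        )
    return items


# Made-up feed content, for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example announcements</title>
    <item>
      <title>New public dataset PXD000764</title>
      <link>http://example.org/announcement/1</link>
    </item>
  </channel>
</rss>"""

announcements = parse_rss_items(SAMPLE_FEED)
```

To use it on live data, the feed could be downloaded first, for example with `urllib.request.urlopen(url).read()`, and the resulting bytes passed to `parse_rss_items`.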

Page 43: PRIDE and ProteomeXchange: Training webinar

Ways to access data in PRIDE Archive

• PRIDE web interface

• File repository

• REST web service

• PRIDE Inspector tool

Page 44: PRIDE and ProteomeXchange: Training webinar

PRIDE Archive web interface

Page 45: PRIDE and ProteomeXchange: Training webinar

PRIDE Archive web interface (2)

Page 47: PRIDE and ProteomeXchange: Training webinar

ProteomeCentral: Portal for all PX datasets

http://proteomecentral.proteomexchange.org/cgi/GetDataset
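A link to a single dataset can be built from the portal address above. The sketch assumes that the dataset accession is passed in a query parameter named `ID`; that parameter name is not stated on the slide and should be treated as an assumption, as should the function name.

```python
from urllib.parse import urlencode

# Base address as given on the slide.
PROTEOMECENTRAL_BASE = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def dataset_url(accession):
    """Build a ProteomeCentral link for one dataset.

    Assumption: the accession goes in the 'ID' query parameter.
    """
    return PROTEOMECENTRAL_BASE + "?" + urlencode({"ID": accession})


example_url = dataset_url("PXD000764")
```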

Page 49: PRIDE and ProteomeXchange: Training webinar

2015 overview of PRIDE resources

Page 50: PRIDE and ProteomeXchange: Training webinar

PRIDE Proteomes and PRIDE Cluster

• Provide an aggregated and QC filtered peptide-centric and protein-centric view on PRIDE Archive data.

http://www.ebi.ac.uk/pride/cluster/

http://wwwdev.ebi.ac.uk/pride/proteomes/

Page 51: PRIDE and ProteomeXchange: Training webinar

Conclusions

• Main characteristics of PRIDE Archive and ProteomeXchange (PX)

• PX/PRIDE submission workflow for MS/MS data
  • PRIDE Inspector
  • PX submission tool

• PRIDE/ProteomeXchange has become the de facto standard for data submission and data availability in proteomics

Page 52: PRIDE and ProteomeXchange: Training webinar

Do you want to know a bit more…?

http://www.slideshare.net/JuanAntonioVizcaino

Page 53: PRIDE and ProteomeXchange: Training webinar

Acknowledgements: The PRIDE Team

People: Attila Csordas, Tobias Ternent, Noemi del Toro, Gerhard Mayer (Bochum, de.NBI), Johannes Griss, Yasset Perez-Riverol, Henning Hermjakob

Former team members: Rui Wang, Florian Reisinger and Jose A. Dianes

Page 54: PRIDE and ProteomeXchange: Training webinar

Future webinars:

• 9 December: UniProt website updates

• 16 December: Ensembl release 83

All webinars @ 4:00pm GMT time unless stated

For details see: http://www.ebi.ac.uk/training/webinars