
Page 1: PRIDE-ProteomeXchange

PRIDE resources and ProteomeXchange

Dr. Juan Antonio Vizcaíno

PRIDE Group Coordinator, Proteomics Services Team, EMBL-EBI, Hinxton, Cambridge, UK

Page 2: PRIDE-ProteomeXchange

Juan A. Vizcaíno

WT Proteomics Bioinformatics Course 2015, Hinxton, 10 December 2015

Data resources at EMBL-EBI

• Genes, genomes & variation: European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive, Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal

• Gene, protein & metabolite expression: RNA Central, ArrayExpress, Expression Atlas, MetaboLights, PRIDE

• Protein sequences, families & motifs: InterPro, Pfam, UniProt

• Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank

• Chemical biology: ChEMBL, ChEBI

• Reactions, interactions & pathways: IntAct, Reactome, MetaboLights

• Systems: BioModels, Enzyme Portal, BioSamples

• Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology

Page 3: PRIDE-ProteomeXchange

Overview

• PRIDE Archive (in the context of ProteomeXchange and the PSI standards)

• How to submit data to PRIDE: PRIDE tools

• How to access data in PRIDE Archive

• PRIDE Cluster and PRIDE Proteomes

Page 5: PRIDE-ProteomeXchange

ProteomeXchange Consortium

• Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.

• Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and (very recently) MassIVE (UCSD, San Diego).

• Common identifier space (PXD identifiers)

• Two supported data workflows: MS/MS and SRM.

• Main objective: Make life easier for researchers

http://www.proteomexchange.org

Page 6: PRIDE-ProteomeXchange

PRIDE (PRoteomics IDEntifications) database

• PRIDE stores mass spectrometry (MS)-based proteomics data:
  • Peptide and protein expression data (identification and quantification)
  • Post-translational modifications
  • Mass spectra (raw data and peak lists)
  • Technical and biological metadata
  • Any other related information

• Full support for tandem MS approaches

http://www.ebi.ac.uk/pride/archive

Martens et al., Proteomics, 2005; Vizcaíno et al., NAR, 2013

Page 7: PRIDE-ProteomeXchange

PRIDE Mission

• To archive all types of proteomics mass spectrometry data for the purpose of supporting reproducible research, allowing the application of quality control metrics and enabling the reuse of these data by other researchers.

• To integrate MS-based data in a protein-centric manner to provide information on protein variants, modifications, and expression.

• To provide mass spectrometry based expression data to the Expression Atlas.

Page 9: PRIDE-ProteomeXchange

Data content in PRIDE Archive

• Submission-driven resource

• PRIDE is split into datasets (groups of assays)

• An assay represents one MS run (in most cases).

• No data reprocessing at present. PRIDE aims to represent the author’s view on the data

• Supported formats: PRIDE XML and mzIdentML.

• Raw data is also now stored

Page 10: PRIDE-ProteomeXchange

What is a proteomics publication in 2015?

• Proteomics studies generate potentially large amounts of data and results.

• Ideally, a proteomics publication needs to:
  • Summarize the results of the study
  • Provide supporting information for reliability of any results reported

• Information in a publication:
  • Manuscript
  • Supplementary material
  • Associated data submitted to a public repository

Page 11: PRIDE-ProteomeXchange

Journal Submission Recommendations

• Journal guidelines recommend submission to proteomics repositories:
  • Proteomics
  • Nature Biotechnology
  • Nature Methods
  • Molecular and Cellular Proteomics

• Funding agencies are enforcing public deposition of data to maximize the value of the funds provided.

Page 12: PRIDE-ProteomeXchange

PRIDE: Source of MS proteomics data

• PRIDE Archive already provides or will soon provide MS proteomics data to other EMBL-EBI resources such as UniProt, Ensembl and the Expression Atlas.

http://www.ebi.ac.uk/pride

Page 14: PRIDE-ProteomeXchange

ProteomeXchange data workflow

[Diagram labels. Submitted by the researcher: Researcher’s results, Raw data*, Metadata. Receiving repositories: PRIDE (MS/MS data), PASSEL (SRM data), MassIVE (MS/MS data), Other DBs. ProteomeCentral: Metadata / Manuscript. Other labels: Raw Data*, Results, Reprocessed results, Journals, UniProt/neXtProt, Peptide Atlas, GPMDB, Other DBs.]

Vizcaíno et al., Nat Biotechnol, 2014

Page 15: PRIDE-ProteomeXchange

Overview

• PRIDE Archive (in the context of ProteomeXchange and the PSI standards)

• How to submit data to PRIDE: PRIDE tools

• How to access data in PRIDE Archive

• A sneak peek at other PRIDE resources

Page 17: PRIDE-ProteomeXchange

Complete vs Partial submissions: processed results

For complete submissions, it is possible to connect the spectra with the identification processed results and they can be visualized.

[Screenshot panels: Complete, Partial]

Page 18: PRIDE-ProteomeXchange

Complete vs Partial submissions: experimental metadata

[Screenshot panels: Complete, Partial]

General experimental metadata about the projects is similar. However, at the assay level, the information in partial submissions is less detailed.

Page 19: PRIDE-ProteomeXchange

How to perform a complete PX submission to PRIDE

• Decide between a complete/partial submission.

• File conversion/export to PRIDE XML or mzIdentML

• File check before submission (PRIDE Inspector)

• Experimental annotation and actual file submission (PX submission tool)

• Post-submission steps

Page 21: PRIDE-ProteomeXchange

PX Data workflow for MS/MS data

1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).

2. Result files:

a. Complete submissions: Result files can be converted to PRIDE XML or the mzIdentML data standard.

b. Partial submissions: For workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form.

3. Metadata: Sufficiently detailed description of sample origin, workflow, instrumentation, submitter.

4. Other files: Optional files (the list can be extended):

a. QUANT: Quantification related results
b. PEAK: Peak list files
c. GEL: Gel images
d. OTHER: Any other file type
e. FASTA
f. SP_LIBRARY

[Diagram labels: Published, Raw Files, Other files]
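The file categories above can be illustrated with a small sketch. This is not the PX submission tool's actual logic: the extension table and both function names are assumptions made for this example, and only the type labels (RESULT, PEAK, FASTA, OTHER and so on) follow the slides. A submission is treated as complete when at least one result file in a standard format is present, otherwise as partial.

```python
from pathlib import PurePath

# Hypothetical extension table for this sketch only. In a real submission
# the submitter declares the type of each file explicitly.
EXTENSION_TYPES = {
    ".mzid": "RESULT",   # mzIdentML (assumed extension)
    ".xml": "RESULT",    # PRIDE XML (assumed extension)
    ".raw": "RAW",       # vendor binary output (assumed extension)
    ".mzml": "PEAK",
    ".mzxml": "PEAK",
    ".mgf": "PEAK",
    ".fasta": "FASTA",
}


def guess_file_type(filename):
    """Return a file-type label for one file name, defaulting to OTHER."""
    extension = PurePath(filename).suffix.lower()
    return EXTENSION_TYPES.get(extension, "OTHER")


def submission_type(filenames):
    """Return 'complete' if any standard-format result file is present."""
    types = {guess_file_type(name) for name in filenames}
    return "complete" if "RESULT" in types else "partial"


print(submission_type(["run1.raw", "run1.mzid"]))    # complete
print(submission_type(["run1.raw", "results.dat"]))  # partial
```

The extension lookup is only a convenience for the example; it cannot tell a PRIDE XML file from any other XML file.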

Page 22: PRIDE-ProteomeXchange

PRIDE Components: Submission Process

[Diagram, step 1 highlighted: PRIDE Converter 2 (PRIDE XML) and mzIdentML files; PRIDE Inspector; PX Submission Tool]

Page 23: PRIDE-ProteomeXchange

Before: only file conversion to PRIDE XML

[Diagram: Original data files (search output files, spectra files); ‘RESULT’ file generation (file conversion with PRIDE Converter or other tools, e.g. hEIDI); Final ‘RESULT’ file (PRIDE XML)]

Barsnes et al., Nat Biotechnol, 2009; Cote et al., MCP, 2012

Page 24: PRIDE-ProteomeXchange

Now: native file export to mzIdentML

[Diagram: Tools (Mascot, ProteinPilot, Scaffold, PEAKS, MSGF+, Others); ‘RESULT’ file generation (native file export); Final ‘RESULT’ file (mzIdentML), accompanied by spectra files (mzML, mzXML, mzData, mgf, pkl, ms2, dta, apl)]

Page 25: PRIDE-ProteomeXchange

Complete submissions

Search engine results + MS files

An increasing number of tools support export to mzIdentML 1.1:

- Mascot
- MSGF+
- MyriMatch and related tools from D. Tabb’s lab
- OpenMS
- PEAKS
- PeptideShaker
- ProCon (ProteomeDiscoverer, Sequest)
- Scaffold
- TPP via the idConvert tool (ProteoWizard)
- ProteinPilot (from version 5.0)
- X!Tandem native conversion (Beta, PILEDRIVER)
- Others: library for X!Tandem conversion, lab internal pipelines, …
- Crux

Referenced spectral files need to be submitted as well (all open formats are supported).

Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.
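Because the referenced spectral files have to be submitted too, it can help to list them before uploading. The sketch below reads the `location` attribute of each `SpectraData` element in an mzIdentML document using only the Python standard library. The namespace URI is the one used by mzIdentML 1.1 files as far as I know; files written against a later schema revision may declare a different one, so treat it as an assumption. The embedded document is a made-up fragment, not a full valid file.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for mzIdentML 1.1; adjust if the file declares another.
MZID_NS = {"mzid": "http://psidev.info/psi/pi/mzIdentML/1.1"}


def referenced_spectra_files(mzid_xml):
    """Return the location of every SpectraData element, in document order."""
    root = ET.fromstring(mzid_xml)
    return [
        element.get("location")
        for element in root.iterfind(".//mzid:SpectraData", MZID_NS)
    ]


# Hypothetical, heavily trimmed fragment used only to exercise the function.
EXAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <DataCollection>
    <Inputs>
      <SpectraData id="SD_1" location="file:///data/run1.mgf"/>
      <SpectraData id="SD_2" location="file:///data/run2.mgf"/>
    </Inputs>
  </DataCollection>
</MzIdentML>"""

for location in referenced_spectra_files(EXAMPLE):
    print(location)
```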

Page 26: PRIDE-ProteomeXchange

PRIDE Components: Submission Process

[Diagram, step 2 highlighted: PRIDE Converter 2 (PRIDE XML) and mzIdentML files; PRIDE Inspector; PX Submission Tool]

Page 27: PRIDE-ProteomeXchange

PRIDE Inspector Toolsuite

PRIDE Inspector 2 supports:

- PRIDE XML
- mzIdentML + all types of spectra files
- mzML
- mzTab identification and quantification

https://github.com/PRIDE-Toolsuite/

Wang et al., Nat. Biotechnology, 2012; Perez-Riverol et al., MCP, 2016, in press

Page 28: PRIDE-ProteomeXchange

PRIDE Inspector 2

New visualisation functionality for Protein Groups

https://github.com/PRIDE-Toolsuite/

Page 29: PRIDE-ProteomeXchange

PRIDE Components: Submission Process

[Diagram, step 3 highlighted: PRIDE Converter 2 (PRIDE XML) and mzIdentML files; PRIDE Inspector; PX Submission Tool]

Page 30: PRIDE-ProteomeXchange

PX submission tool

• Capture the mappings between the different types of files.

• Make the file upload process straightforward for the submitter (it transfers all the files using Aspera or FTP).

• Command line alternative: using the Aspera file transfer protocol.

[Diagram: Published, Raw and Other files go into the PX submission tool]

http://www.proteomexchange.org/submission

Page 31: PRIDE-ProteomeXchange

PX submission tool: screenshots

Page 32: PRIDE-ProteomeXchange

Fast file transfer with Aspera

- Aspera is the default file transfer protocol to PRIDE:
  - PX Submission tool
  - Command line

- Up to 50X faster than FTP

File transfer speed should not be a problem!!

Page 33: PRIDE-ProteomeXchange

Manuscript published detailing the process

Ternent et al., Proteomics, 2014

http://www.proteomexchange.org/submission

Example dataset: PXD000764

- Title: “Discovery of new CSF biomarkers for meningitis in children”
- 12 runs: 4 controls and 8 infected samples
- Identification and quantification data

Page 34: PRIDE-ProteomeXchange

PRIDE Archive: Number of submitted datasets in 2015

[Figure: number of submitted datasets to PRIDE Archive per month (as of November 1st 2015). X axis: month, 2012-01 to 2015-09. Y axis: number of datasets, 0 to 200.]

Page 35: PRIDE-ProteomeXchange

ProteomeXchange: 2,774 datasets up until 1st September, 2015

Type:
- 1681 PRIDE partial
- 813 PRIDE complete
- 173 MassIVE
- 84 PeptideAtlas/PASSEL complete
- 23 Reprocessed

Publicly accessible: 1372 datasets, 49% of all
- 90% PRIDE
- 6% PASSEL
- 4% MassIVE

Data volume:
- Total: ~150 TB
- Number of all files: ~400,000
- PXD000320-324: ~4 TB
- PXD002319-26: ~2.4 TB
- PXD001471: ~1.6 TB

Datasets/year: 2012: 102; 2013: 527; 2014: 963; 2015: 1182

Top species studied by at least 20 datasets (~500 species in total):
- 1080 Homo sapiens
- 335 Mus musculus
- 110 Saccharomyces cerevisiae
- 98 Arabidopsis thaliana
- 75 Rattus norvegicus
- 58 Escherichia coli
- 29 Bos taurus
- 23 Glycine max
- 20 Caenorhabditis elegans
- 20 Oryza sativa

Origin: 714 USA, 313 Germany, 252 United Kingdom, 163 China, 146 France, 121 Netherlands, 108 Switzerland, 103 Canada, 81 Denmark, 73 Spain, 68 Japan, 67 Australia, 63 Sweden, 57 Belgium, 43 Austria, 39 India, 34 Taiwan, 33 Norway, 26 Italy, 24 Ireland, 24 Finland, 21 Republic of Korea, 20 Brazil, 20 Russia, 18 Israel, 18 Singapore, …
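As a quick consistency check, the per-type and per-year counts on this slide can be added up and compared with the stated total and the stated public share. The numbers below are copied from the slide; the variable names are only illustrative.

```python
# Counts copied from the slide.
datasets_by_type = {
    "PRIDE partial": 1681,
    "PRIDE complete": 813,
    "MassIVE": 173,
    "PeptideAtlas/PASSEL complete": 84,
    "Reprocessed": 23,
}
datasets_by_year = {2012: 102, 2013: 527, 2014: 963, 2015: 1182}
public_datasets = 1372

total = sum(datasets_by_type.values())
public_share = public_datasets / total

print(total)                           # 2774
print(sum(datasets_by_year.values()))  # 2774
print(f"{public_share:.0%}")           # 49%
```

Both breakdowns add up to the 2,774 datasets in the slide title, and 1372 of 2774 rounds to the quoted 49%.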

Page 36: PRIDE-ProteomeXchange

Public data release: when does it happen?

• When the author tells us to do it (the authors can do it by themselves)

• When we find out that a dataset has been published

• We look for PXD identifiers in PubMed abstracts.

• If your PXD identifier is not in the abstract, a paper may have been published and the data is still private. Let us know!

• New web form in the PRIDE web to facilitate the process
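Looking for PXD identifiers in abstract text can be sketched with a regular expression. This is an illustration of the idea, not the PRIDE team's pipeline. It assumes an identifier is the prefix PXD followed by exactly six digits, as in the example dataset PXD000764, and the abstract text is invented for the example.

```python
import re

# Assumption: "PXD" followed by exactly six digits, e.g. PXD000764.
PXD_PATTERN = re.compile(r"\bPXD\d{6}\b")


def find_pxd_identifiers(text):
    """Return distinct PXD identifiers in order of first appearance."""
    found = []
    for identifier in PXD_PATTERN.findall(text):
        if identifier not in found:
            found.append(identifier)
    return found


# Hypothetical abstract text, written for this example.
abstract = (
    "Data are available via ProteomeXchange with identifier PXD000764. "
    "Dataset PXD000764 includes identification and quantification data."
)
print(find_pxd_identifiers(abstract))  # ['PXD000764']
```

An abstract that never mentions the identifier returns an empty list, which is the case the slide asks authors to report.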

Page 37: PRIDE-ProteomeXchange

Partial submissions can be used to store other data types

• Everything can be stored, not only MS/MS data: very flexible mechanism to be able to capture all types of datasets

• PRIDE does not store SRM data (it goes to PASSEL)

• Top down proteomics datasets.

• Mass Spectrometry Imaging datasets.

• Data independent acquisition techniques: e.g. SWATH-MS datasets.

Page 38: PRIDE-ProteomeXchange

A public repository for mass spectrometry imaging data

Römpp et al., 2015

[Figure panels C and D]

From original publication [13]:

1. Thermo RAW data / UDP
2. Mirion Software (JLU)

Reconstructed ProteomeXchange data:

1. Thermo RAW data / UDP
2. Convert to imzML
3. Upload to PRIDE (PRIDE database, European Bioinformatics Institute, Cambridge, UK)
4. Download from PRIDE
5. Display in MSiReader

- Vendor-independent data format
- Freely available software (open source)
- ‘open data’ – free to reuse
- Anybody can do this!

No file size limit!

Page 40: PRIDE-ProteomeXchange

Data access to PRIDE Archive

• Look for particular datasets of interest:

• For data reuse: which particular proteins and peptides (including PTMs) have been detected.

• Data reinterpretation or re-analysis.

• Validation of the experimental results reported.

• Specific use cases for proteomics: spectral libraries, fragmentation models, SRM transitions, …

Page 42: PRIDE-ProteomeXchange

ProteomeCentral: Portal for all PX datasets

http://proteomecentral.proteomexchange.org/cgi/GetDataset
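A link to a single dataset page can be built from the portal address above. The sketch assumes the page takes the accession in a query parameter named `ID`. That parameter name is not stated on the slide, so check it against the live site before relying on it. No network request is made here.

```python
from urllib.parse import urlencode

BASE_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def dataset_url(accession):
    """Build a dataset page URL; the ID parameter name is an assumption."""
    return f"{BASE_URL}?{urlencode({'ID': accession})}"


print(dataset_url("PXD000764"))
```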

Page 43: PRIDE-ProteomeXchange

RSS feed for public datasets

http://groups.google.com/group/proteomexchange/feed/rss_v2_0_msgs.xml
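Once a feed like this has been downloaded, the announcement titles can be read with the standard library. The sketch assumes a plain RSS 2.0 layout (`rss/channel/item/title`); the sample document is invented and is not the content of the real feed.

```python
import xml.etree.ElementTree as ET


def rss_item_titles(rss_xml):
    """Return the title of every item in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        item.findtext("title", default="")
        for item in root.iterfind("./channel/item")
    ]


# Hypothetical feed content, written for this example.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>Example announcement 1</title></item>
    <item><title>Example announcement 2</title></item>
  </channel>
</rss>"""

for title in rss_item_titles(SAMPLE_FEED):
    print(title)
```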

Page 44: PRIDE-ProteomeXchange

Ways to access data in PRIDE Archive

• PRIDE web interface

• File repository

• REST web service

• PRIDE Inspector tool

Page 45: PRIDE-ProteomeXchange

PRIDE Archive web interface

Page 46: PRIDE-ProteomeXchange

PRIDE Archive web interface (2)

Page 47: PRIDE-ProteomeXchange

CompOmics Open Source Analysis Pipeline

Vaudel M, Barsnes H, Berven FS, Sickmann A, Martens L: Proteomics 2011;11(5):996-9. https://github.com/compomics/searchgui

Vaudel M, Burkhart J, Zahedi RP, Berven FS, Sickmann A, Martens L, Barsnes H: Nature Biotechnology 2015;33(1):22-24. https://github.com/compomics/peptide-shaker

Page 48: PRIDE-ProteomeXchange

Reshake PRIDE data!

Find the desired PRIDE project …

… inspect the project details …

… and start re-analyzing the data!

Page 50: PRIDE-ProteomeXchange

PRIDE resources

Page 51: PRIDE-ProteomeXchange

[Diagram labels: PRIDE Archive (Original Submissions, Reprocessed datasets); Basic QC checks for PSMs; Aggregation; PRIDE Cluster; PRIDE Proteomes; Link to the original evidence (for original results)]

Page 52: PRIDE-ProteomeXchange

Sneak peek

• Provide an aggregated and QC-filtered peptide-centric and protein-centric view on PRIDE Archive data.

http://www.ebi.ac.uk/pride/cluster/

http://wwwdev.ebi.ac.uk/pride/proteomes/

Page 53: PRIDE-ProteomeXchange

PRIDE Cluster - Concept

• Use spectral clustering to reliably group spectra coming from the same peptide

• Infer reliable identifications by comparing submitted identifications of spectra within a cluster

• Increases quality through data increase (taking advantage of the wealth of data in PRIDE).

• Inherently adapts to new (labelling) techniques

Griss et al., Nat Methods, 2013

Page 54: PRIDE-ProteomeXchange

PRIDE Cluster - Concept

[Figure: originally submitted identified spectra (labelled NMMAACDPR or PPECPDFDPPR) are grouped into clusters, each summarized by a consensus spectrum.]

Threshold: At least 10 spectra in a cluster and ratio >70%.

Griss et al., Nat Methods, 2013
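The threshold on this slide can be written as a small check. The sketch assumes that "ratio" means the fraction of spectra in a cluster that share the most common submitted peptide identification; the slide does not define it, and the function name and example clusters are made up. The peptide sequences are the two shown in the figure.

```python
from collections import Counter


def is_reliable_cluster(peptide_ids, min_size=10, min_ratio=0.70):
    """Apply the size and ratio threshold to one cluster's identifications.

    Returns (most_common_peptide, ratio, reliable).
    """
    if not peptide_ids:
        return None, 0.0, False
    peptide, hits = Counter(peptide_ids).most_common(1)[0]
    ratio = hits / len(peptide_ids)
    reliable = len(peptide_ids) >= min_size and ratio > min_ratio
    return peptide, ratio, reliable


# Hypothetical clusters built from the two sequences shown on the slide.
small_cluster = ["NMMAACDPR"] * 5 + ["PPECPDFDPPR"] * 2
large_cluster = ["NMMAACDPR"] * 9 + ["PPECPDFDPPR"] * 3

print(is_reliable_cluster(small_cluster)[2])  # False: only 7 spectra
print(is_reliable_cluster(large_cluster))     # ('NMMAACDPR', 0.75, True)
```

The small cluster passes the ratio test (5 of 7) but fails on size, which shows why both conditions are needed.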

Page 55: PRIDE-ProteomeXchange

PRIDE Cluster Home page

http://www.ebi.ac.uk/pride/cluster/#/

Page 56: PRIDE-ProteomeXchange

PRIDE Cluster: result of searches

http://www.ebi.ac.uk/pride/cluster/#/

A couple of examples …

Page 57: PRIDE-ProteomeXchange

Examples: one perfect cluster

- 880 PSMs give the same peptide ID
- 4 species
- 28 datasets
- Same instruments

Page 58: PRIDE-ProteomeXchange

Examples: one perfect cluster (2)

Page 59: PRIDE-ProteomeXchange

Examples: one perfect cluster (3)

What does that peptide sequence correspond to?

Page 60: PRIDE-ProteomeXchange

Examples: very good cluster

Page 61: PRIDE-ProteomeXchange

Examples: very good cluster (2)

Page 62: PRIDE-ProteomeXchange

Examples: one perfect cluster (3)

What does that peptide sequence correspond to?

Page 63: PRIDE-ProteomeXchange

PRIDE Cluster – Spectral libraries

http://www.ebi.ac.uk/pride/cluster/#/libraries

Page 64: PRIDE-ProteomeXchange

PRIDE Proteomes: reusing PRIDE Cluster data

• Condensed and cross-dataset view of PRIDE Archive for identification data:

• Data filtering of PSMs is performed at the level of the submitted data.

• PSMs are grouped as peptide sequences.

• The peptide sequences are remapped to a recent version of UniProtKB (at present UniProtKB “complete proteome”).

• Linked to the original supporting evidence.

• “PRIDE Cluster” used as extra evidence for the PSMs.

http://wwwdev.ebi.ac.uk/pride/proteomes/
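Two of the steps above, grouping PSMs by peptide sequence and remapping peptides to protein sequences, can be sketched as follows. This is a simplified illustration, not the PRIDE Proteomes pipeline: the spectrum IDs, protein accessions and protein sequences are hypothetical, and remapping is reduced to exact substring matching.

```python
from collections import defaultdict


def group_psms_by_peptide(psms):
    """Map each peptide sequence to the spectrum IDs that support it."""
    groups = defaultdict(list)
    for spectrum_id, peptide in psms:
        groups[peptide].append(spectrum_id)
    return dict(groups)


def remap_peptides(peptides, proteins):
    """Map each peptide to the accessions whose sequence contains it.

    Exact substring matching is a simplification made for this sketch.
    """
    return {
        peptide: sorted(acc for acc, seq in proteins.items() if peptide in seq)
        for peptide in peptides
    }


# Hypothetical PSMs and protein sequences, written for this example.
psms = [
    ("spectrum_1", "NMMAACDPR"),
    ("spectrum_2", "NMMAACDPR"),
    ("spectrum_3", "PPECPDFDPPR"),
]
proteins = {"PROT_A": "MKNMMAACDPRLL", "PROT_B": "GGPPECPDFDPPRAA"}

groups = group_psms_by_peptide(psms)
print(groups)
print(remap_peptides(groups, proteins))
```

Keeping the spectrum IDs in each group is what preserves the link back to the original supporting evidence.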

Page 65: PRIDE-ProteomeXchange

PRIDE: Using it for giving reliability to IDs

Link to PRIDE Cluster web

http://wwwdev.ebi.ac.uk/pride/proteomes/

Page 67: PRIDE-ProteomeXchange

Conclusions

• Main characteristics of PRIDE Archive and ProteomeXchange

• PX/PRIDE submission workflow for MS/MS data
  • PRIDE Inspector
  • PX submission tool

• PRIDE/ProteomeXchange has become the de facto standard for data submission and data availability in proteomics

• PRIDE Proteomes and PRIDE Cluster: new resources

Page 68: PRIDE-ProteomeXchange

Do you want to know a bit more…?

http://www.slideshare.net/JuanAntonioVizcaino

Page 69: PRIDE-ProteomeXchange

Acknowledgements: The PRIDE Team and collaborators

People: Attila Csordas, Tobias Ternent, Noemi del Toro, Johannes Griss, Yasset Perez-Riverol, Henning Hermjakob

All past team members, especially Rui Wang, Florian Reisinger and Jose A. Dianes

All ProteomeXchange partners, especially Eric Deutsch and Nuno Bandeira

Page 70: PRIDE-ProteomeXchange

Questions?