Submitting your data to ProteomeXchange – a mini tutorial

Submitting your data to ProteomeXchange: A Mini-Tutorial. Dr. Juan Antonio Vizcaíno, PRIDE Group Coordinator, Proteomics Services Team, EMBL-EBI, Hinxton, Cambridge, UK

Upload: juan-antonio-vizcaino

Post on 02-Jul-2015


DESCRIPTION

Talk during the PSI/ProteomeXchange workshop in HUPO 2014. It summarizes how to submit data to ProteomeXchange via PRIDE.

TRANSCRIPT

Page 1: Submitting your data to ProteomeXchange – a mini tutorial

Submitting your data to ProteomeXchange – A Mini-Tutorial

Dr. Juan Antonio Vizcaíno

PRIDE Group Coordinator

Proteomics Services Team

EMBL-EBI

Hinxton, Cambridge, UK

Page 2: Submitting your data to ProteomeXchange – a mini tutorial

Juan A. Vizcaíno [email protected]

13th HUPO World Congress, Madrid, 7 October 2014

Overview

• The ProteomeXchange (PX) consortium

• How to submit and access data in PX via PRIDE

• How to access PX data

• Submitting data triggers data reuse

Page 3: Submitting your data to ProteomeXchange – a mini tutorial

ProteomeXchange Consortium

• Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.

• Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and (very recently) MassIVE (UCSD, San Diego).

• Common identifier space (PXD identifiers)

• Two supported data workflows: MS/MS and SRM.

• Main objective: Make life easier for researchers

http://www.proteomexchange.org

Page 4: Submitting your data to ProteomeXchange – a mini tutorial

ProteomeXchange data workflow (Vizcaíno et al., Nat Biotechnol, 2014)

[Figure: workflow diagram. Labels: researcher’s results (metadata/manuscript, raw data*, results); receiving repositories: PRIDE (MS/MS data), MassIVE (MS/MS data), PASSEL (SRM data); ProteomeCentral (metadata); reprocessed results; other resources: journals, UniProt/neXtProt, PeptideAtlas, GPMDB, other DBs.]

Page 6: Submitting your data to ProteomeXchange – a mini tutorial

Overview

• The ProteomeXchange (PX) consortium

• How to submit and access data in PX via PRIDE

• How to access PX data

• Submitting data triggers data reuse

Page 7: Submitting your data to ProteomeXchange – a mini tutorial

PRIDE (PRoteomics IDEntifications) database

http://www.ebi.ac.uk/pride

• Focused on MS/MS approaches

• Other data types can also be stored as “Partial submissions”.

Martens et al., Proteomics, 2005

Vizcaíno et al., NAR, 2013

Page 8: Submitting your data to ProteomeXchange – a mini tutorial

Manuscript just out detailing the process

Ternent et al., Proteomics, 2014

http://www.proteomexchange.org/submission

Example dataset:

PXD000764

- Title: “Discovery of new CSF biomarkers for meningitis in children”

- 12 runs: 4 controls and 8 infected samples

- Identification and quantification data

Page 9: Submitting your data to ProteomeXchange – a mini tutorial

PX Data workflow for MS/MS data

1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).

2. Result files:

a. Complete submissions: Result files can be converted to PRIDE XML or the mzIdentML data standard.

b. Partial submissions: For workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form.

3. Metadata: Sufficiently detailed description of sample origin, workflow, instrumentation, submitter.

4. Other files (optional):

a. QUANT: Quantification related results

b. PEAK: Peak list files

c. GEL: Gel images

d. OTHER: Any other file type

e. FASTA

f. SP_LIBRARY

[Diagram labels: Published, Raw files, Other files]
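As a rough illustration of how files might be sorted into the categories listed on this slide, here is a minimal Python sketch. The extension-to-type mapping and the function name `guess_file_type` are assumptions made for illustration only; the PX submission tool applies its own rules, and result files in particular usually need to be tagged by hand.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to a PX file type label.
# Only the type labels (RAW, RESULT, PEAK, GEL, FASTA, OTHER) come from
# the slide; the extensions chosen here are assumptions.
EXTENSION_TO_TYPE = {
    ".raw": "RAW",
    ".mzml": "PEAK",
    ".mzxml": "PEAK",
    ".mzid": "RESULT",
    ".fasta": "FASTA",
    ".png": "GEL",
    ".tif": "GEL",
}


def guess_file_type(path: str) -> str:
    """Return a tentative PX file type for a path, defaulting to OTHER."""
    suffix = PurePosixPath(path).suffix.lower()
    return EXTENSION_TO_TYPE.get(suffix, "OTHER")
```

Anything the mapping does not recognise falls back to OTHER, so a submitter would still review the assignments before uploading.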

Page 10: Submitting your data to ProteomeXchange – a mini tutorial

Complete vs Partial submissions: experimental metadata

Complete Partial

General experimental metadata about the projects is similar.

However, at the assay level, the information in partial submissions is less detailed.

Page 11: Submitting your data to ProteomeXchange – a mini tutorial

Complete vs Partial submissions: processed results

Complete Partial

For complete submissions, the spectra can be connected to the processed identification results and visualized.

Page 12: Submitting your data to ProteomeXchange – a mini tutorial

How to perform a complete PX submission to PRIDE

• Decide between a complete/partial submission.

• File conversion/export to PRIDE XML or mzIdentML

• File check before submission (PRIDE Inspector)

• Experimental annotation and actual file submission (PX submission tool)

• Post-submission steps

Page 14: Submitting your data to ProteomeXchange – a mini tutorial

How to perform a complete PX submission to PRIDE

• Complete submission:

  • MS/MS data.

  • Processed results can be converted to the PSI standard mzIdentML or PRIDE XML.

• Partial submission:

  • Any type of data (not SRM, which goes to PASSEL)

  • E.g. top down, data independent acquisition, MS Imaging (to come), etc.

  • Processed results cannot be converted to a data standard.
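The decision described on this slide can be sketched as a small Python function. This is only an illustration of the rule as stated here; the function name `choose_submission` and the string labels are hypothetical, and borderline cases are best discussed with the PRIDE team.

```python
from typing import Optional

# Result formats the slide names as acceptable for a complete submission.
STANDARD_RESULT_FORMATS = {"mzidentml", "pride xml"}


def choose_submission(data_kind: str, result_format: Optional[str]) -> str:
    """Return 'complete', 'partial', or 'PASSEL' following the slide's rule.

    data_kind: e.g. "MS/MS", "SRM", "top down" (free-text labels, assumed).
    result_format: format the processed results can be exported to, if any.
    """
    kind = data_kind.strip().lower()
    if kind == "srm":
        # SRM data is not submitted to PRIDE; it goes to PASSEL.
        return "PASSEL"
    has_standard = (result_format or "").strip().lower() in STANDARD_RESULT_FORMATS
    if kind == "ms/ms" and has_standard:
        return "complete"
    return "partial"
```

For instance, MS/MS data with results exported to mzIdentML yields a complete submission, while a top-down experiment with no standard export is treated as partial.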

Page 15: Submitting your data to ProteomeXchange – a mini tutorial

How to perform a complete PX submission to PRIDE

• Decide between a complete/partial submission.

• File conversion/export to mzIdentML or PRIDE XML

• File check before submission (PRIDE Inspector)

• Experimental annotation and actual file submission (PX submission tool)

• Post-submission steps

Page 16: Submitting your data to ProteomeXchange – a mini tutorial

Complete submissions

[Diagram labels: search engine results + MS files; search engines; mzIdentML]

An increasing number of tools support export to mzIdentML 1.1:

- Mascot

- MSGF+

- Myrimatch and related tools from D. Tabb’s lab

- OpenMS

- PEAKS

- ProCon (ProteomeDiscoverer, Sequest)

- Scaffold

- TPP via the idConvert tool (ProteoWizard)

- ProteinPilot (planned by the end of 2014)

- Others: library for X!Tandem conversion, lab internal pipelines, …

Referenced spectral files need to be submitted as well (all open formats are supported).

Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.

Page 17: Submitting your data to ProteomeXchange – a mini tutorial

Before: file conversion

[Diagram: original data files (search output files, spectra files) → ‘RESULT’ file generation (file conversion with PRIDE Converter) → final ‘RESULT’ file (PRIDE XML)]

Page 18: Submitting your data to ProteomeXchange – a mini tutorial

Now: native file export

[Diagram: tools (Mascot, ProteinPilot, Scaffold, PEAKS, MSGF+, others) → ‘RESULT’ file generation (native file export) → final ‘RESULT’ file (mzIdentML), plus spectra files]

Page 19: Submitting your data to ProteomeXchange – a mini tutorial

How to perform a complete PX submission to PRIDE

• Decide between a complete/partial submission.

• File conversion/export to PRIDE XML or mzIdentML

• File check before submission (PRIDE Inspector)

• Experimental annotation and actual file submission (PX submission tool)

• Post-submission steps

Page 20: Submitting your data to ProteomeXchange – a mini tutorial

PRIDE Inspector 2.0

Wang et al., Nat. Biotechnology, 2012

Available for complete submissions

PRIDE Inspector 2.0 supports:

- PRIDE XML

- mzIdentML + all types of spectra files

- mzML

- mzTab Ident (work in progress)

http://code.google.com/p/pride-toolsuite/wiki/PRIDEInspector

Page 21: Submitting your data to ProteomeXchange – a mini tutorial

How to perform a complete PX submission to PRIDE

• Decide between a complete/partial submission.

• File conversion/export to PRIDE XML or mzIdentML

• File check before submission (PRIDE Inspector)

• Experimental annotation and actual file submission (PX submission tool)

• Post-submission steps

Page 22: Submitting your data to ProteomeXchange – a mini tutorial

PX submission tool

http://www.proteomexchange.org/submission

• Captures the mappings between the different types of files.

• Makes the file upload process straightforward for the submitter (it transfers all the files using Aspera or FTP).

• Command line alternative: some scripting is needed

[Diagram labels: Published, Raw, Other files → PX submission tool]

Page 23: Submitting your data to ProteomeXchange – a mini tutorial

PX submission tool: step by step

Step 1

Step 2

Page 24: Submitting your data to ProteomeXchange – a mini tutorial

PX submission tool: step by step

Step 3

Step 4

Page 25: Submitting your data to ProteomeXchange – a mini tutorial

PX submission tool: step by step

Step 5 Step 6

Page 26: Submitting your data to ProteomeXchange – a mini tutorial

PX submission tool: step by step

Step 7

Step 8

Page 27: Submitting your data to ProteomeXchange – a mini tutorial

PX submission tool: step by step

Step 9

Page 28: Submitting your data to ProteomeXchange – a mini tutorial

Fast file transfer with Aspera

- Aspera is the default file transfer protocol to PRIDE:

  - PX Submission tool

  - Command line

- Up to 50X faster than FTP

File transfer speed should not be a problem!!

Page 29: Submitting your data to ProteomeXchange – a mini tutorial

PX submission tool: HPP tags

Page 30: Submitting your data to ProteomeXchange – a mini tutorial

Batch submissions on the command line

• Generate on your own the PX summary file (generated by default by the PX submission tool).

MTD first_name John Arthur
MTD last_name Smith
MTD email [email protected]
MTD affiliation University of Cambridge
MTD title Human proteome
MTD description An experiment about human proteome
MTD keyword human, proteome
MTD pubmed 12345
MTD px 10.1000/182
MTD pride_login pride-user
FMH file_id file_type file_path file_mapping
FME 1 result /path/to/pride/xml/files/pride-1.xml 7,8,9
FME 2 result /path/to/pride/xml/files/pride-2.xml 4
FME 3 result /path/to/mzidentml/files/mzidentml-1.xml 5,10
FME 4 raw /path/to/raw/files/raw-1.bin
FME 5 raw /path/to/raw/files/raw-2.bin
FME 6 raw /path/to/raw/files/raw-3.bin
FME 7 raw ftp://some.url/at/some/place/raw-4.bin
FME 8 search /path/to/search/engine/output/search-1.out
FME 9 other /path/to/other/file/other-1.e
FME 10 peak /path/to/peak/list/mzml-1.xml
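Since the command line route requires writing this file yourself, the layout shown on the slide can be generated with a short script. The Python sketch below is an illustration only: the function name `render_px_summary` and the input dictionaries are hypothetical, and tab-separated fields are an assumption (the slide does not show the delimiter). The authoritative format is described in the PRIDE submission guidelines.

```python
def render_px_summary(metadata, files):
    """Build the text of a PX summary file in the MTD/FMH/FME layout.

    metadata: list of (key, value) pairs written as MTD lines.
    files: list of dicts with 'id', 'type', 'path' and optional 'mapping'
           (a list of file ids that this file refers to).
    Fields are joined with tabs, which is an assumption.
    """
    lines = [f"MTD\t{key}\t{value}" for key, value in metadata]
    lines.append("FMH\tfile_id\tfile_type\tfile_path\tfile_mapping")

    known_ids = {entry["id"] for entry in files}
    for entry in files:
        mapping = entry.get("mapping", [])
        # Every mapped id must itself be listed as a file entry.
        unknown = [m for m in mapping if m not in known_ids]
        if unknown:
            raise ValueError(f"file {entry['id']} maps to unknown ids {unknown}")
        row = ["FME", str(entry["id"]), entry["type"], entry["path"]]
        if mapping:
            row.append(",".join(str(m) for m in mapping))
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"


# Example input mirroring three of the entries on the slide.
example_files = [
    {"id": 3, "type": "result",
     "path": "/path/to/mzidentml/files/mzidentml-1.xml", "mapping": [5, 10]},
    {"id": 5, "type": "raw", "path": "/path/to/raw/files/raw-2.bin"},
    {"id": 10, "type": "peak", "path": "/path/to/peak/list/mzml-1.xml"},
]
example_text = render_px_summary([("title", "Human proteome")], example_files)
```

Checking the mappings while writing the file catches a result file that points at a raw or peak file missing from the listing, which is the relationship the file_mapping column is there to record.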

Page 32: Submitting your data to ProteomeXchange – a mini tutorial

Batch submissions on the command line (2)

• Generate on your own the PX summary file (generated by default by the PX submission tool).

• Put together all the files plus the PX summary file.

• Ask PRIDE team for a specific upload directory (pride-[email protected])
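Before putting the files together for upload, it can help to confirm that every local file listed in the summary file actually exists. The Python sketch below is one possible pre-upload check, not part of any PRIDE tool; the function names are hypothetical and the parser assumes tab-separated FME lines.

```python
import os


def parse_fme_rows(summary_text):
    """Extract the FME (file entry) rows from PX summary file text.

    Assumes tab-separated fields: FME, file_id, file_type, file_path
    and an optional comma-separated file_mapping.
    """
    rows = []
    for line in summary_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4 or fields[0] != "FME":
            continue
        mapping_field = fields[4] if len(fields) > 4 else ""
        rows.append({
            "id": int(fields[1]),
            "type": fields[2],
            "path": fields[3],
            "mapping": [int(m) for m in mapping_field.split(",") if m.strip()],
        })
    return rows


def missing_local_files(rows, exists=os.path.exists):
    """Return the listed local paths that cannot be found.

    Entries given as URLs (for example ftp://...) are skipped because
    they are not expected on the local disk.
    """
    return [row["path"] for row in rows
            if "://" not in row["path"] and not exists(row["path"])]
```

The `exists` argument defaults to `os.path.exists` and can be swapped for another callable, which keeps the check easy to try out without touching the file system.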

Page 33: Submitting your data to ProteomeXchange – a mini tutorial

How to perform a complete PX submission to PRIDE

• Decide between a complete/partial submission.

• File conversion/export to PRIDE XML or mzIdentML

• File check before submission (PRIDE Inspector)

• Experimental annotation and actual file submission (PX submission tool)

• Post-submission steps

Page 34: Submitting your data to ProteomeXchange – a mini tutorial

Post-processing steps

• PRIDE curators will check the files

• Files must be valid against the schema

• All the required annotations must be there

• Basic QC check (e.g. detect errors in PTM annotation)

• If everything is correct, submission to PRIDE is done

• The author receives a PXD identifier, a reviewer username and password, and a DOI (for complete submissions).

Page 35: Submitting your data to ProteomeXchange – a mini tutorial

Overview

• The ProteomeXchange (PX) consortium

• How to submit and access data in PX via PRIDE

• How to access PX data

• Submitting data triggers data reuse

Page 37: Submitting your data to ProteomeXchange – a mini tutorial

ProteomeCentral: Portal for all PX datasets

http://proteomecentral.proteomexchange.org/cgi/GetDataset
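A link to a single dataset can be built from the portal address above and a PXD identifier. The Python sketch below only constructs the URL and makes no network request. Two things are assumptions rather than statements from the slides: that the portal takes the accession in an `ID` query parameter, and that accessions are "PXD" followed by six digits (inferred from the examples in this talk, such as PXD000764).

```python
import re
from urllib.parse import urlencode

BASE_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"

# Pattern inferred from the accessions quoted in the slides (assumption).
PXD_PATTERN = re.compile(r"PXD\d{6}")


def dataset_url(accession: str) -> str:
    """Return a ProteomeCentral URL for one dataset accession.

    The 'ID' query parameter name is an assumption.
    """
    if not PXD_PATTERN.fullmatch(accession):
        raise ValueError(f"not a PXD accession: {accession!r}")
    return f"{BASE_URL}?{urlencode({'ID': accession})}"
```

Validating the accession first avoids building links from mistyped identifiers.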

Page 38: Submitting your data to ProteomeXchange – a mini tutorial

ProteomeXchange: 1,148 datasets up until August 2014

Type:
386 PRIDE complete
687 PRIDE partial
51 PeptideAtlas/PASSEL complete
1 MassIVE
23 reprocessed

Publicly Accessible:
544 datasets, 50% of all
90% PRIDE
10% PASSEL

Datasets/year:
2012: 102
2013: 527
2014: 519

Data volume:
Total: ~51 TB
Number of all files: ~130,000
PXD000320-324: ~5 TB
PXD000065: ~1.4 TB

Origin:
235 USA
142 Germany
97 United Kingdom
67 Switzerland
64 Netherlands
62 China
60 France
48 Canada
43 Spain
36 Belgium
32 Sweden
29 Australia
26 Denmark
23 Japan
18 Taiwan
17 India
16 Ireland
14 Norway
14 Italy
12 Finland
11 Republic of Korea
10 Brazil
8 Austria
7 Israel
7 Singapore …

Top Species studied by at least 10 datasets:
510 Homo sapiens
142 Mus musculus
46 Saccharomyces cerevisiae
45 Arabidopsis thaliana
23 Rattus norvegicus
16 Escherichia coli
15 Bos taurus
15 Mycobacterium tuberculosis
13 Oryza sativa
12 Drosophila melanogaster
12 Glycine max
~265 species in total
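The totals on this slide can be cross-checked with a few lines of Python. The figures are copied from the slide; the variable names are just for illustration. Both the per-type and the per-year counts add up to the 1,148 datasets quoted, and the 544 public datasets come to roughly 47% of the total, which the slide rounds to 50%.

```python
# Counts copied from the slide.
datasets_by_type = {
    "PRIDE complete": 386,
    "PRIDE partial": 687,
    "PeptideAtlas/PASSEL complete": 51,
    "MassIVE": 1,
    "reprocessed": 23,
}
datasets_by_year = {2012: 102, 2013: 527, 2014: 519}
total_datasets = 1148
public_datasets = 544

# Sums of the two breakdowns, to compare against the quoted total.
type_total = sum(datasets_by_type.values())
year_total = sum(datasets_by_year.values())

# Share of publicly accessible datasets, as a percentage with one decimal.
public_share_percent = round(100 * public_datasets / total_datasets, 1)
```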

Page 39: Submitting your data to ProteomeXchange – a mini tutorial

Overview

• The ProteomeXchange (PX) consortium

• How to submit and access data in PX via PRIDE

• How to access PX data

• Submitting data triggers data reuse

Page 40: Submitting your data to ProteomeXchange – a mini tutorial

Which are the most accessed datasets?

PXD Identifier | Hits | Dataset title | Publication
PXD000561 | 153512 | A draft map of the human proteome | Kim et al., Nature, 2014. PMID: 24870542
PXD000851 | 111587 | Membrane proteomic analysis of colorectal cancer tissue | Kume et al., MCP, 2014. PMID: 24687888
PXD000865 | 51639 | Mass spectrometry based draft of the human proteome | Wilhelm et al., Nature, 2014. PMID: 24870543

Page 41: Submitting your data to ProteomeXchange – a mini tutorial

Which are the most accessed datasets?

[Chart; axis label: Total Numbers]

Page 42: Submitting your data to ProteomeXchange – a mini tutorial

CompOmics Open Source Analysis Pipeline

http://searchgui.googlecode.com

Vaudel M, Barsnes H, Berven FS, Sickmann A, Martens L: Proteomics 2011;11(5):996-9.

http://peptide-shaker.googlecode.com

Vaudel M, Burkhart J, Zahedi RP, Berven FS, Sickmann A, Martens L, Barsnes H: Nature Biotechnology (in press)

Page 43: Submitting your data to ProteomeXchange – a mini tutorial

Reshake PRIDE data!

Find the desired PRIDE project …

… inspect the project details …

… and start re-analyzing the data!

Page 44: Submitting your data to ProteomeXchange – a mini tutorial

Conclusions

• Submission to ProteomeXchange via PRIDE is easy.

• Decide between complete and partial submissions.

• Different open source tools available to facilitate the process.

• File transfer speed should not be a problem (Aspera support)

Page 45: Submitting your data to ProteomeXchange – a mini tutorial

Acknowledgements

PRIDE Team

Attila Csordas

Rui Wang

Florian Reisinger

Jose A. Dianes

Tobias Ternent

Yasset Perez-Riverol

Noemi del Toro

Henning Hermjakob

EU FP7 grant number 260558

PeptideAtlas Team (ISB, Seattle)

Eric Deutsch

Terry Farrah

Zhi Sun

Andrew R. Jones

Lennart Martens

Juan Pablo Albar

Martin Eisenacher

Gil Omenn

And many other PX partners and stakeholders

Page 46: Submitting your data to ProteomeXchange – a mini tutorial

Questions?