
© 2009 SIB

Using MS/MS Spectrum Libraries for the Detection of PTMs

Markus Müller

Swiss Institute of Bioinformatics

Geneva, Switzerland


Outline

• MS/MS peptide identification

– Spectrum library versus sequence search

• QuickMod MS/MS workflow

• QuickMod open modification spectrum library search

– Alignment scoring

– Statistical validation

– Positioning of modifications

2 QuickMod Tutorial 2011


Spectrum Library Searches


Spectrum Library Searching


Peptide-Spectrum Match (PSM)

p = LREQLGPVTQEFWDNLEK; z = 3
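A peptide-spectrum match pairs an observed spectrum with the theoretical fragments of a candidate peptide. The minimal sketch below computes those theoretical values for the example p = LREQLGPVTQEFWDNLEK, z = 3. It assumes standard monoisotopic residue masses, unmodified residues and singly charged b and y ions only; the function names are illustrative and not part of QuickMod or any library.

```python
# Standard monoisotopic residue masses (Da) of the 20 unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def peptide_mass(seq):
    """Neutral monoisotopic peptide mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def precursor_mz(seq, z):
    """m/z of the [M + zH]^z+ precursor ion."""
    return (peptide_mass(seq) + z * PROTON) / z


def b_y_ions(seq):
    """Singly charged b_i and y_i ion m/z values for i = 1 .. n-1."""
    n = len(seq)
    b = {i: sum(RESIDUE_MASS[aa] for aa in seq[:i]) + PROTON for i in range(1, n)}
    y = {i: sum(RESIDUE_MASS[aa] for aa in seq[n - i:]) + WATER + PROTON
         for i in range(1, n)}
    return b, y


if __name__ == "__main__":
    p, z = "LREQLGPVTQEFWDNLEK", 3
    print(f"precursor m/z ({z}+): {precursor_mz(p, z):.4f}")
    b, y = b_y_ions(p)
    for i in (2, 5, 8):
        print(f"b{i}+ = {b[i]:.4f}   y{i}+ = {y[i]:.4f}")
```

A useful consistency check: for every i, b_i + y_(n-i) equals the neutral peptide mass plus two protons.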


Spectrum Library Search Scoring

• Log-transform intensities (variance stabilization, i.e. the variance of a peak becomes independent of its intensity).

• Bin the peak lists (m/z, intensity) into bins with a width of 0.1 to 1.0 m/z units.

• Normalized dot-product score:

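Each spectrum $S_k = \{(m_j^k, \log I_j^k)\}_{j=1,\dots,N_k}$ is binned into a vector $\mathbf{s}^k = (s_1^k, \dots, s_{N_{\text{binning}}}^k)$, where $s_i^k$ holds the log intensity of the peaks with $m_{\min} + (i-1)\Delta \le m_j^k < m_{\min} + i\Delta$ (bin width $\Delta$):

$$\text{score}(S_1, S_2) = \cos(\mathbf{s}^1, \mathbf{s}^2) = \frac{\sum_{i=1}^{N_{\text{binning}}} s_i^1 s_i^2}{\sqrt{\sum_{i=1}^{N_{\text{binning}}} (s_i^1)^2 \, \sum_{i=1}^{N_{\text{binning}}} (s_i^2)^2}}$$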
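The binning and normalized dot product can be sketched as follows. This is an illustrative Python version, not SpectraST's or QuickMod's code: the bin width, the use of log(1 + I) instead of log I (to avoid negative values for I < 1) and the summing of peaks that fall into the same bin are assumptions.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0, mz_min=0.0):
    """Bin (m/z, intensity) peaks into {bin index: summed log intensity}.

    log1p(I) = log(1 + I) keeps every contribution non-negative; summing
    peaks that share a bin is an assumption (taking the maximum also works).
    """
    bins = defaultdict(float)
    for mz, intensity in peaks:
        if mz < mz_min or intensity <= 0:
            continue  # ignore peaks below the m/z range or without signal
        index = int((mz - mz_min) // bin_width)
        bins[index] += math.log1p(intensity)
    return dict(bins)


def cosine_score(peaks1, peaks2, bin_width=1.0):
    """Normalized dot product of two binned, log-transformed spectra."""
    s1 = bin_spectrum(peaks1, bin_width)
    s2 = bin_spectrum(peaks2, bin_width)
    dot = sum(v * s2.get(i, 0.0) for i, v in s1.items())
    norm1 = math.sqrt(sum(v * v for v in s1.values()))
    norm2 = math.sqrt(sum(v * v for v in s2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm1 * norm2)


query = [(200.1, 100.0), (300.2, 50.0), (450.3, 20.0)]
library = [(200.4, 80.0), (300.1, 60.0), (500.0, 10.0)]
print(round(cosine_score(query, library), 3))
```

Identical spectra score 1 and spectra without shared bins score 0. Narrower bins (toward 0.1 m/z) suit high-resolution fragment spectra, at the cost of missing matches when the m/z calibration drifts.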


Spectral Library Search


Zhang et al., Proteomics 2011


Spectral Library Search


Zhang et al.


Spectral Library Search


Zhang et al.


Spectral Library Search


Zhang et al., Proteomics 2011


Spectrum Library Searches

• Spectrum library searches are more accurate than sequence searches.

• Scoring is less critical and easier to implement.

• Spectrum library searches are very fast compared to sequence searches.

• Libraries must be complete: only peptides with a library spectrum can be identified, and low-abundance proteins are rarely found in spectrum libraries.

• Different instruments require different libraries.


Completeness of Libraries


Yeast data and one of the most complete yeast libraries: 20,281 of 25,348 non-phospho peptides found (about 80%); 14,186 of 31,120 phospho peptides found (about 46%)


Completeness of Spectrum Libraries

• Only 2 transcription factors (TFs) are present in the NIST human spectrum libraries!

– For a given biological sample, measure the sample repeatedly, using inclusion/exclusion lists, to maximize the coverage of its peptides in the spectrum library (Schmidt A, et al.)

– Clone TF in bacteria, purify, digest and measure with LC-MS (Bart Deplancke Lab)

– Create synthetic peptides for all proteins of an organism and measure them with LC-MS (Aebersold lab)

– Combine sequence search with spectrum library search (Ahrné et al., 2009)

– Create realistic in silico spectra to complement real spectra (Cannon et al., JPR 2011)

• Few modified peptides in libraries

– Use an open modification search (OMS) spectrum library search tool if the unmodified form of the peptide is present (QuickMod, see below)

– Isolate modified peptides and create spectrum libraries for specific modifications (PhosphoPep, PHOSIDA, ...)


Prediction of MS/MS Spectra


Cannon et al., JPR 2011; Zhang et al., Proteomics 2011


Spectrum Library Searches


Ahrné et al., Proteomics 2009


Spectrum Libraries

Spectra identified with SpectraST but not with Phenyx (Ahrné et al., Proteomics 2009)


QuickMod Spectral Library Search Workflow


Ahrné et al., Proteomics 2009


Combining Search Tools (PepArML)


https://edwardslab.bmcb.georgetown.edu/pymsio/


Random and True Matches

• When searching a large database, most candidate peptides are not present at a detectable level in any given MS2 spectrum.

• For example, an in silico tryptic digest of 10,000 proteins may yield about 100 × 10,000 = 1,000,000 peptides, but only 300 of these peptides may actually be detectable in the MS2 spectra.

• The score distribution will (hopefully) be bimodal: many low scores for the random matches and higher scores for the true matches.

• If the database is large, the random and true score distributions will inevitably overlap.
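Why the distributions overlap can be illustrated with a small simulation: the best score among n random candidates keeps growing with n, so a large database produces random matches that reach into the range of true scores. The sketch assumes, purely for illustration, that random match scores follow a standard normal distribution; `best_random_score` is a hypothetical helper.

```python
import random


def best_random_score(checkpoints, seed=0):
    """Highest score among the first n random candidates, for each n in checkpoints.

    Random (false) match scores are drawn from a standard normal
    distribution, which is an illustrative assumption.
    """
    rng = random.Random(seed)
    best, drawn, result = float("-inf"), 0, {}
    for n in sorted(checkpoints):
        while drawn < n:  # keep drawing candidates up to the next checkpoint
            best = max(best, rng.gauss(0.0, 1.0))
            drawn += 1
        result[n] = best
    return result


for n, score in best_random_score([10, 1_000, 100_000]).items():
    print(f"{n:>7} candidates: best random score = {score:.2f}")
```

For a standard normal, the best of n scores grows roughly like sqrt(2 ln n), so a tenfold larger database shifts the top random score upward only slowly, but steadily.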


Statistical Scores

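False discovery rate: FDR = B/(A+B); p-value (false positive rate): pValue = B/(B+C); posterior error probability: PEP = b/(a+b) (see TPP). Here A and B are the numbers of true and false matches scoring above the threshold, C is the number of false matches scoring below it, and a and b are the densities of the true and false score distributions at the threshold score.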
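The three quantities can be computed directly once every match is labelled true or false: A and B count true and false matches at or above the threshold, C counts false matches below it, and a, b are the local densities at the threshold. In practice the labels are unknown and must be estimated (e.g. from decoy matches or a mixture model, as in TPP); this sketch assumes labelled scores and approximates a and b by counts in a small window around the threshold (`window` is a hypothetical parameter).

```python
def statistical_scores(true_scores, false_scores, threshold, window=0.5):
    """Return (FDR, p-value, PEP) at a score threshold for labelled scores."""
    A = sum(s >= threshold for s in true_scores)   # true matches accepted
    B = sum(s >= threshold for s in false_scores)  # false matches accepted
    C = sum(s < threshold for s in false_scores)   # false matches rejected
    fdr = B / (A + B) if A + B else 0.0
    p_value = B / (B + C) if B + C else 0.0
    # Local densities of true (a) and false (b) scores near the threshold,
    # approximated by counting scores within +/- window.
    a = sum(abs(s - threshold) < window for s in true_scores)
    b = sum(abs(s - threshold) < window for s in false_scores)
    pep = b / (a + b) if a + b else 0.0
    return fdr, p_value, pep


print(statistical_scores([4.0, 5.0, 6.0, 7.0], [1.0, 2.0, 3.0, 4.2], 4.0))
```

The FDR and p-value are cumulative (they summarize everything above the threshold), whereas the PEP describes a single match at that score.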


Statistical Scores

• Statistical scores do not depend on the details of the scoring function.

• The underlying scoring function can even be multidimensional, e.g. combine several scores of a search engine.

• Statistical scores have a unified probabilistic interpretation, i.e. they correspond to frequencies and counts.

• This allows comparing the statistical scores of different search engines with each other.


False Discovery Rate (FDR)

• Decoy searches are used to control the FDR at the peptide and protein level

• Works for both single and combined runs, if applied correctly

• Does not provide an answer about modification positioning.

• Does not resolve cases where a spectrum has more than one high-scoring PSM.

• FDR is very sensitive to high-scoring random matches.

• The number of peptides identified at a given FDR depends on how the decoy database is created and how the FDR is calculated.

• Statistically the FDR is an expectation value, i.e. the mean of many different decoy searches:

• Each estimate with a single decoy db is only accurate within its standard error (Granholm & Käll, Proteomics 2011):

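$$\mathrm{FDR} = E\left[\frac{FP}{TP + FP} \,\middle|\, TP + FP > 0\right]$$

$$\sigma_{\mathrm{FDR}} \approx \sqrt{\frac{\mathrm{FDR}\,(1 - \mathrm{FDR})}{TP + FP}}, \quad \text{e.g. } TP + FP = 400,\ \mathrm{FDR} = 0.01 \Rightarrow \sigma_{\mathrm{FDR}} \approx 0.005 = 0.5\,\mathrm{FDR}$$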
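A decoy-based FDR estimate and its precision can be sketched as below. The estimator (number of decoy hits divided by number of target hits above a threshold) is the common target-decoy approach and is assumed here rather than taken from the slides; the standard error uses the binomial approximation sqrt(FDR (1 - FDR) / (TP + FP)). Both function names are hypothetical.

```python
import math


def decoy_fdr(target_scores, decoy_scores, threshold):
    """Target-decoy FDR estimate: #decoys / #targets scoring >= threshold.

    Assumes separate searches against a target and an equally sized decoy
    library, so the decoy hits estimate the false target hits (FP).
    """
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    return n_decoy / n_target if n_target else 0.0


def fdr_standard_error(fdr, n_accepted):
    """Binomial approximation sqrt(FDR (1 - FDR) / (TP + FP))."""
    return math.sqrt(fdr * (1.0 - fdr) / n_accepted)


print(decoy_fdr([1, 2, 3, 4, 5], [0.5, 1.5, 4.5], 3))
print(round(fdr_standard_error(0.01, 400), 4))
```

Because a single decoy search gives only one draw of this noisy estimate, averaging over several independently shuffled decoy libraries brings the estimate closer to the expectation value.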


Robustness of FDR


Creation of Decoy Spectrum Libraries


1. Shuffle the peptide sequence

2. Move the annotated b, y, c and z ions in accordance with the shuffled sequence (e.g. the y8+ peak moves to the m/z of the y8+ ion of the shuffled peptide)

3. Randomly re-sample the m/z of non-annotated peaks if they do not belong to a conserved pattern (intensities are left intact)

Ahrné et al., Proteomics 2011
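The three steps can be sketched in Python under stated assumptions: only singly charged b and y ions are handled (c and z ions are omitted), the C-terminal residue is kept in place when shuffling (a common choice for tryptic peptides, not stated on the slide), and non-annotated peaks get a uniformly random m/z in a hypothetical range because the conserved-pattern test is not reproduced. `make_decoy` and `fragment_mz` are illustrative names, not the DeLiberator API.

```python
import random

# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276


def fragment_mz(seq, ion_type, index):
    """Singly charged b or y ion m/z of fragment `index` of `seq`."""
    if ion_type == "b":
        return sum(RESIDUE_MASS[aa] for aa in seq[:index]) + PROTON
    if ion_type == "y":
        return sum(RESIDUE_MASS[aa] for aa in seq[len(seq) - index:]) + WATER + PROTON
    raise ValueError(f"unsupported ion type: {ion_type}")


def make_decoy(seq, peaks, rng, mz_range=(100.0, 2000.0)):
    """Create a decoy (sequence, peaks) pair from a library entry.

    peaks: list of (m/z, intensity, annotation); annotation is
    (ion_type, index), e.g. ("y", 8), or None for unannotated peaks.
    """
    # 1. Shuffle the sequence, keeping the C-terminal residue (assumption).
    body = list(seq[:-1])
    rng.shuffle(body)
    decoy_seq = "".join(body) + seq[-1]

    decoy_peaks = []
    for mz, intensity, annotation in peaks:
        if annotation is not None:
            # 2. Move annotated ions to the same ion of the shuffled sequence.
            new_mz = fragment_mz(decoy_seq, *annotation)
        else:
            # 3. Re-sample unannotated m/z uniformly (conserved-pattern check omitted).
            new_mz = rng.uniform(*mz_range)
        decoy_peaks.append((new_mz, intensity, annotation))  # intensity kept
    return decoy_seq, decoy_peaks


rng = random.Random(7)
target = "TYFPHFDLSHGSAQVK"
peaks = [(fragment_mz(target, "y", 8), 100.0, ("y", 8)),
         (fragment_mz(target, "b", 3), 40.0, ("b", 3)),
         (523.7, 15.0, None)]
decoy_seq, decoy_peaks = make_decoy(target, peaks, rng)
print(decoy_seq)
```

Keeping intensities unchanged preserves the overall peak intensity distribution of the library spectrum, so decoy matches remain realistic competitors for true matches.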


Fragment Peak Distribution


Figure panels: ETD, IT


Controlling FDR


DeLiberator (Ahrné et al., Proteomics 2011)


MS/MS Spectra of Modified Peptides

• A modification of mass Δ on an amino acid of a peptide induces several important changes in the MS/MS spectrum:

– The precursor m/z is shifted by Δ/z (z: precursor charge).

– The m/z values of all fragment ions that contain the modified amino acid are shifted by Δ/z (z: fragment charge).

– The m/z values of all fragment ions that do not contain the modified amino acid remain the same. However, their intensities may change significantly.

– Multiple modifications induce more complicated changes.
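The effect of a single modification on the b/y ladder can be checked with the sketch below, which adds a mass shift delta to one residue and compares fragment m/z values before and after. It uses the tutorial's oxidation example GQGTLSVVTM{16}YHK with delta = 15.9949 Da; only b and y ions are considered, intensity changes are not modelled, and the function names are hypothetical.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276


def ion_ladder(residue_masses, charge=1):
    """m/z of b_i and y_i ions (i = 1 .. n-1) at the given fragment charge."""
    n = len(residue_masses)
    ions = {}
    for i in range(1, n):
        ions[f"b{i}"] = (sum(residue_masses[:i]) + charge * PROTON) / charge
        ions[f"y{i}"] = (sum(residue_masses[n - i:]) + WATER + charge * PROTON) / charge
    return ions


def modification_shifts(seq, mod_pos, delta, charge=1):
    """m/z shift of every b/y ion when residue mod_pos (0-based) gains mass delta."""
    unmodified = [RESIDUE_MASS[aa] for aa in seq]
    modified = list(unmodified)
    modified[mod_pos] += delta
    before, after = ion_ladder(unmodified, charge), ion_ladder(modified, charge)
    return {label: after[label] - before[label] for label in before}


shifts = modification_shifts("GQGTLSVVTMYHK", 9, 15.9949)  # oxidized M at index 9
moved = sorted((k for k, v in shifts.items() if abs(v) > 1e-6),
               key=lambda k: (k[0], int(k[1:])))
print("shifted:", moved)
```

Only b10 to b12 and y4 to y12 move, each by delta/charge; all other fragments keep their m/z, which is what makes the unmodified library spectrum still partially match the modified one.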


Similarity Between Modified and Unmodified Spectra


Oxidation of GQGTLSVVTM{16}YHK/2

Phosphorylation of TY{80}FPHFDLSHGSAQVK/2


QuickMod Open Modification Search

• Spectral alignment and scoring

• Controlling FDR

• Modification positioning


Ahrné et al., RECOMB 2011 / JPR, submitted


OMS: Spectrum Libraries Versus Theoretical Spectra


QuickMod Scores


QuickMod score = linear SVM combination of the 3 best scores

Figure panels: z = 2, z = 3
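A linear SVM learns weights w and a bias so that the combined score w · x + bias separates correct from incorrect matches. The sketch below trains one with the Pegasos sub-gradient method on three input scores per PSM; the toy data, the choice of Pegasos and the function names are assumptions for illustration, and which three scores QuickMod combines is not specified here.

```python
def train_linear_svm(features, labels, lam=0.01, epochs=200):
    """Pegasos sub-gradient training of a linear SVM (hinge loss + L2 penalty).

    A constant 1 is appended to every feature vector so that the bias is
    learned together with the weights (and is also regularized).
    """
    data = [(list(x) + [1.0], y) for x, y in zip(features, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (regularization)
            if margin < 1.0:  # hinge loss active: step toward this example
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def combined_score(w, scores):
    """Linear combination of the individual scores plus the bias term."""
    return sum(wi * xi for wi, xi in zip(w, list(scores) + [1.0]))


# Hypothetical, standardized triples of scores for correct and wrong PSMs.
correct = [(2.0, 1.0, 1.5), (1.5, 2.0, 1.0), (2.5, 1.5, 2.0)]
wrong = [(-1.0, -2.0, -1.5), (-2.0, -1.0, -1.0), (-1.5, -1.5, -2.0)]
w = train_linear_svm(correct + wrong, [1] * 3 + [-1] * 3)
print([round(combined_score(w, x), 2) for x in correct + wrong])
```

A linear combination keeps the final score cheap to compute and easy to inspect (each weight shows how much one input score contributes), which fits the speed goals of a library search.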


Benchmarking


Speed: InsPecT 30 min; PTMFinder 5 min; SpectraST 55 min; QuickMod 5 min


Modification Positioning


Peptide CISK; candidate site: fragment ions containing it → votes = sum

C: b1, b2, b3 → -1 -1 -1 = -3

I: b2, b3, y3 → -1 -1 +1 = -1

S: b3, y2, y3 → -1 +1 +1 = +1

K: y1, y2, y3 → +1 +1 +1 = +3
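The CISK vote table can be reproduced with a simple rule, assumed here: every b or y ion that contains the candidate site contributes +1 if it is observed at the shifted m/z and -1 if it is observed unshifted. With shifted y1, y2, y3 and unshifted b1, b2, b3 this yields -3, -1, +1, +3, favoring K. QuickMod's actual positioning score may weight ions differently; `position_scores` is a hypothetical name.

```python
def position_scores(seq, shifted, unshifted):
    """Vote score for each candidate modification site of `seq`.

    shifted / unshifted: sets of ion labels (e.g. "y2") observed with and
    without the modification mass shift.
    """
    n = len(seq)
    scores = []
    for p in range(n):  # p: 0-based candidate site
        # b_i contains residue p if i > p; y_j contains it if j >= n - p.
        ions = [f"b{i}" for i in range(p + 1, n)] + [f"y{j}" for j in range(n - p, n)]
        votes = sum(1 for ion in ions if ion in shifted)
        votes -= sum(1 for ion in ions if ion in unshifted)
        scores.append((seq[p], votes))
    return scores


print(position_scores("CISK", {"y1", "y2", "y3"}, {"b1", "b2", "b3"}))
```

The site with the highest vote is reported; ties or small differences indicate that the spectrum cannot distinguish neighbouring residues, which is where the positioning yields a region rather than a single amino acid.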


Modification Positioning


Multiple Modifications

• QuickMod is primarily designed for single modifications

• Double modifications can also be detected as long as the 2 modified residues are close together

• Positioning yields a region between the two modified amino acids


Modification Positioning


Modification Positioning

1) QuickMod workflow

2) Directed MS (inclusion list)

3) Complementary fragmentation (CID/HCD) or MS3

Figure panels: HCD/CID, CID; annotated ions: b2, y2, IK, IF, IH, y3, y4, y5, y7, y8


QuickMod Tools


Java Proteomics Library (JPL) http://javaprotlib.sourceforge.net/


Future Work

• Extend alignment to multiple modifications

• Develop modification specific scores and positioning algorithms (phosphorylation)

• Work on combined sequence search and spectrum library search

• Apply QuickMod to large datasets for phosphorylation and other modifications.

• Use it for verification of MS/MS assignments.

• …


Many Thanks to

Proteome Informatics Group Swiss Institute of Bioinformatics Swetha Ramagoni Luc Mottin Leelapavan Tadoori Nottania Campbell Erik Ahrné Yuki Ohta Frederic Nikitin Rostyk Kuzyakiv Dominique Kadio Koua Patricia Palagi Markus Müller Frederique Lisacek

BPRG Alex Scherl Maria Ramirez-Boo Xavier Robin Alex Hainard Natacha Turck Jean-Charles Sanchez

SCAHT Laurent Geiser Florent Glück Paola Antinori Denis Hochstrasser


SIP-CUI Fokko Beekhof Oleksiy Koval Slava Voloshynovskiy