
Upload: makenna-roswell

Post on 28-Mar-2015


Page 1: Modification Site Localization Why is this a problem? Calculating localization reliability Ways of representing reliability Modification ambiguity

Modification Site Localization

•Why is this a problem?

•Calculating localization reliability

•Ways of representing reliability

•Modification ambiguity

Page 2

PTM Analysis: An Exploding Field

•Large-scale PTM characterization studies are now common

• Phosphorylation

• O-GlcNAcylation

• Acetylation

• …

•Database search engines can identify modified peptides and report a measure of reliability for peptide IDs

• Peptide Level: p-value; e-value

• Dataset Level: FDR

•Most search engines do not assess modification site assignment reliability.

• No standard method for calculating the false localization rate (FLR)

Page 3

Search Engine Performance for Site Assignment

•Database search engines are optimized for peptide identification

•Optimal parameters for discriminating between correct and random answers are not the same as those for site assignment

• More peaks may be needed for site assignment

•Reliability of modified peptide identifications is higher than that of PTM site assignments

•What most search engines do:

• Report site consistent with data

• May be more than one site equally consistent with the data

• No information about how reliable site assignment is

Bradshaw et al. J Mass Spectrom (2010) 45 10 1095-1097

Page 4

There are Mistakes In The Literature

•There are several large-scale PTM datasets where site assignment was ‘by manual verification’.

• Did authors carefully look at 1000+ spectra?

•Results from publications are used to populate other databases

• SwissProt

• Phosphosite

Page 5

Evidence for Serine 486 Phosphorylation

•Spectrum from publication reporting unambiguous assignment of serine 4 (serine 487) phosphorylation.

Annotated spectra associated with publications are useful!

Page 6

Why I highlighted this example

•I found this modification site in my own data in 2006

SwissProt Entry of this protein in 2006

Page 7

Site Assignment Scoring Methods (1)

Probability of randomly observing a given peak

• A-Score (Gygi)

• PTM Score (Mann)

•Probability calculation based on unit mass measurement and assuming all masses equally possible at random:

• e.g. if considering 4 peaks per 100 Da, then probability of random match of a given peak is 4%

• A-score is a number; PTM score reports a probability

•How valid are these assumptions?

• Nominal mass may be appropriate for poor mass accuracy ion trap data, but not for high mass accuracy data

• Could adjust probability calculation to use more mass ‘bins’

• All masses are not equally probable; e.g. for b ions (nominal m/z and example residue compositions):

• 201 – EA, LS, IS, TV

• 202 – NS

• 203 – MA, CV, TT

• 204 – Not possible

• 205 – FG, CT

• 206 – Not possible
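The two ideas on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the A-Score or PTM Score implementation: it assumes a cumulative binomial model for random matches (the published methods differ in detail), and the function names are hypothetical. The second part enumerates which residue pairs give a b2 ion at a given nominal m/z, assuming unmodified cysteine; it may return pairs beyond the examples listed above.

```python
from itertools import combinations_with_replacement
from math import comb, log10


def random_match_probability(peaks_per_window, window_da=100.0):
    """Chance that a given unit-mass position holds a peak at random."""
    return peaks_per_window / window_da


def binomial_tail(n_matched, n_predicted, p):
    """P(at least n_matched of n_predicted ions match purely by chance)."""
    return sum(
        comb(n_predicted, k) * p**k * (1 - p) ** (n_predicted - k)
        for k in range(n_matched, n_predicted + 1)
    )


def log_score(probability):
    """Put a probability on a -10*log10 scale (larger = less likely random)."""
    return -10 * log10(probability)


# Nominal (integer) residue masses in Da; Cys is unmodified (assumption).
NOMINAL = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def b2_compositions(nominal_mz):
    """Unordered residue pairs whose singly charged b2 ion has this m/z."""
    target = nominal_mz - 1  # b ion = sum of residue masses + 1 (proton)
    pairs = combinations_with_replacement(sorted(NOMINAL), 2)
    return [a + b for a, b in pairs if NOMINAL[a] + NOMINAL[b] == target]


if __name__ == "__main__":
    p = random_match_probability(4)  # 4 peaks per 100 Da -> 0.04
    print(p, log_score(binomial_tail(3, 6, p)))
    for mz in range(201, 207):
        print(mz, b2_compositions(mz) or "not possible")
```

The empty results at 204 and 206 illustrate why a flat "all masses equally likely" assumption is only an approximation.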

Page 8

Site Assignment Scoring Methods (2)

Score/probability difference

•Compare search engine probabilities for peptide IDs with different site assignments

• Mascot Delta Score

• SLIP Score

e.g. Top scoring assignment: E-value 1E-5

• If the next best site assignment has E-value 1E-4, SLIP score = 10

• If the next best site assignment has E-value 1E-3, SLIP score = 20

Advantages:

•Can be calculated as part of database search

•Accounts for variation of probability of observing different masses

•If search engine makes use of mass accuracy, score will adjust to data of different mass accuracy
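The E-value example above is consistent with a score defined as the difference between the -10*log10 transformed E-values of the best and next best site assignments. A minimal sketch under that assumption (the function name `slip_score` is hypothetical and this is not Protein Prospector code):

```python
from math import log10


def slip_score(best_evalue, next_best_evalue):
    """Difference of -10*log10(E-value) between best and next best site."""
    best = -10 * log10(best_evalue)
    runner_up = -10 * log10(next_best_evalue)
    return best - runner_up


if __name__ == "__main__":
    print(slip_score(1e-5, 1e-4))  # one order of magnitude apart
    print(slip_score(1e-5, 1e-3))  # two orders of magnitude apart
```

Each order of magnitude between the two E-values adds 10 to the score, so the scale is independent of the absolute E-value of the top hit.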

Page 9

Assessing Reliability of Site Localization Scoring

•Data from 180 synthetic phosphopeptides

•Tested with a wide range of fragmentation data (CID, HCD, ETD, MSA…)

•Comparison of Mascot Delta Score to A-score

• SLIP Score in Protein Prospector

•PhosphoRS used a different set of synthetic phosphopeptides

Savitski et al. Mol Cell Proteomics (2011) M110.003830

Page 10

SLIP Score vs A-Score vs MD-Score

Dataset: QTOF Micro CID Data of 180 synthetic phosphopeptides1

• Modification sites known

Data Searched by Mascot: 2174 correct spectrum matches

Data Searched by PP: 2334 correct spectrum matches

Baker et al. Mol Cell Proteomics (2011) M111.008078

                   SLIP Score   A-Score   MD-Score
Site IDs           2053         1584      1840
Incorrect Sites    130          138       201
FLR                6.3%         8.7%      10.9%

1 Site Possible 164590 334

Ambiguous 220
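The FLR row follows from the two rows above it: incorrect sites divided by site IDs. A short sketch that recomputes it from the counts in the table (the function and dictionary names are illustrative only):

```python
def false_localization_rate(incorrect_sites, site_ids):
    """Fraction of reported site assignments that are wrong."""
    if site_ids <= 0:
        raise ValueError("site_ids must be positive")
    return incorrect_sites / site_ids


# (site IDs, incorrect sites) taken from the table above
RESULTS = {
    "SLIP Score": (2053, 130),
    "A-Score": (1584, 138),
    "MD-Score": (1840, 201),
}

if __name__ == "__main__":
    for name, (site_ids, incorrect) in RESULTS.items():
        flr = false_localization_rate(incorrect, site_ids)
        print(f"{name}: {100 * flr:.1f}%")
```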

Page 11

Decoy Sites for Estimating PEP (Local FLR)

[Figure: Local FLR (y-axis, 0.00 to 0.40) against SLIP Score / PP Site Score (x-axis, 0 to 20), with series Phospho E and Phospho P]

•Test Dataset: Synaptic phosphopeptides acquired in LTQ-Orbitrap Velos (IT-CID): 70,000 phosphopeptide spectra identified

•Altered Batch-Tag to allow for phosphorylation of Pro and Glu

•Filtered results to only phosphopeptide IDs containing one S, T or Y

• Modification site known

•Local FLR: SLIP score of 6 = 95% correct

•Global FLR (matches to phosphoP and phosphoE) similar to QTOF Micro data.

• Similar score threshold appropriate for ion trap CID and quadrupole CID data
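The decoy-site idea can be sketched as follows. Because the filtered peptides contain a single S, T or Y, any site assigned to Pro or Glu is known to be wrong, so the fraction of such assignments in a score bin gives a local FLR for that bin. This is a minimal sketch under that reading of the slide; the bin width, the function name and the toy data are invented for illustration and are not results from the study.

```python
from collections import defaultdict

DECOY_RESIDUES = {"P", "E"}  # residues that cannot carry a real phosphate


def local_flr_by_bin(assignments, bin_width=2.0):
    """Local FLR per score bin.

    assignments: iterable of (slip_score, assigned_residue) pairs.
    Returns {bin_lower_edge: fraction of assignments on decoy residues}.
    """
    totals = defaultdict(int)
    decoys = defaultdict(int)
    for score, residue in assignments:
        bin_edge = int(score // bin_width) * bin_width
        totals[bin_edge] += 1
        if residue in DECOY_RESIDUES:
            decoys[bin_edge] += 1
    return {edge: decoys[edge] / totals[edge] for edge in sorted(totals)}


if __name__ == "__main__":
    # Invented toy data: (SLIP score, residue the site was assigned to)
    toy = [(0.5, "P"), (1.0, "S"), (1.5, "E"), (1.9, "T"),
           (6.2, "S"), (6.8, "S"), (7.1, "T"), (7.9, "P"),
           (12.0, "S"), (13.5, "Y")]
    for edge, flr in local_flr_by_bin(toy).items():
        print(f"score {edge:g} to {edge + 2:g}: local FLR {flr:.2f}")
```

For data that are not filtered to a single real candidate site, the decoy count would likely need scaling by the ratio of real to decoy candidate residues; that correction is not shown here.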

Page 12

Representing Ambiguity

VATVSVLATR – Singly phosphorylated

Phospho@5=3

• Best site assignment with associated score. No information as to which is the second best site.

• Example software: A-Score; Mascot Delta Score; SLIP Score

Phospho@3|5

• Indicates inability to differentiate between two sites, either due to no information, or confidence below a defined threshold

• Example software: SLIP Score; VML Score

VAT(0.1)VS(0.89)VLAT(0.01)R

• Probabilities for all potential site assignments within the peptide are reported

• Example software: PTM Score / MaxQuant; PhosphoRS
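The three notations above can be related programmatically. The sketch below parses the per-site probability notation and, given a confidence threshold, emits either a single-site or an ambiguous "a|b" label. The 0.95 threshold and both function names are assumptions made for illustration; no tool named on this slide is claimed to work this way.

```python
import re

# One residue letter, optionally followed by "(probability)"
RESIDUE = re.compile(r"([A-Z])(?:\(([\d.]+)\))?")


def parse_site_probabilities(annotated):
    """Return (plain sequence, {1-based position: probability})."""
    sequence = []
    probabilities = {}
    for match in RESIDUE.finditer(annotated):
        sequence.append(match.group(1))
        if match.group(2) is not None:
            probabilities[len(sequence)] = float(match.group(2))
    return "".join(sequence), probabilities


def site_label(probabilities, mod="Phospho", threshold=0.95):
    """Single site if confident, otherwise the two most likely sites."""
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    best = ranked[0]
    if len(ranked) == 1 or probabilities[best] >= threshold:
        return f"{mod}@{best}"
    first, second = sorted(ranked[:2])
    return f"{mod}@{first}|{second}"


if __name__ == "__main__":
    seq, probs = parse_site_probabilities("VAT(0.1)VS(0.89)VLAT(0.01)R")
    print(seq, probs)
    print(site_label(probs))                  # below threshold: ambiguous
    print(site_label(probs, threshold=0.8))   # above threshold: one site
```

Note that the number after "=" in Phospho@5=3 is a score, not a probability, so it cannot be derived from the probabilities without further assumptions.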

Page 13

Representing Ambiguity

VATVSVLATR – Doubly phosphorylated

Phospho@3=12; Phospho@5=3

• Best site assignments with associated scores. A separate score is calculated for each site assignment. The score is in comparison to the best assignment not containing a particular modification site; i.e. @3 is relative to when residues 5 and 9 are modified.

Phospho@3=12; Phospho@5|9

• One site has a confidence measure; the other site does not.

VAT(0.95)VS(0.9)VLAT(0.15)R

• Probabilities are combination probabilities for one of the two modifications.

Page 14

Site-Level or Peptide-Level Assessment for Localization Reliability

All current software reports reliability for individual site localizations, but software could in theory calculate a reliability for the combination of modifications reported:

e.g. VAT(0.95)VS(0.9)VLAT(0.15)R

Could be reported as VAT(phospho)VS(phospho)VLATR with probability 0.95 × 0.9 ≈ 0.86
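A small sketch of the peptide-level calculation. The product treats the two site probabilities as independent, which is an approximation. As a cross-check, if exactly two of the three candidate sites (3, 5 and 9) are modified, the pair 3+5 is correct whenever site 9 is unmodified, so its probability is 1 minus the site 9 probability. Both function names are hypothetical.

```python
from math import prod


def combination_probability(site_probabilities):
    """Peptide-level probability as the product of site probabilities.

    Assumes the site assignments are independent (an approximation).
    """
    return prod(site_probabilities)


def pair_probability_two_of_three(excluded_site_probability):
    """Exact pair probability when exactly 2 of 3 candidate sites are modified.

    The pair is correct precisely when the third site is unmodified.
    """
    return 1.0 - excluded_site_probability


if __name__ == "__main__":
    print(combination_probability([0.95, 0.9]))   # product approach
    print(pair_probability_two_of_three(0.15))    # cross-check via site 9
```

For this example the two approaches agree closely (about 0.86 versus 0.85), but they need not do so in general.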

Page 15

Modification Ambiguity

•Some modifications are isobaric

• Acetyl vs Trimethyl; Phospho vs Sulfo; Ser->Thr vs Methyl

•Some combinations of modifications are isobaric/isomeric with a single modification

• Methyl + Methyl vs Dimethyl

• Carbamidomethyl + Carbamidomethyl vs GlyGly (ubiquitin)

• Carbamidomethyl + Methyl vs Propionamide (acrylamide)

• Acetyl + K+/Ca2+ adduct vs Phospho

Page 16

Modification Ambiguity

•Much of the published site localization software was written specifically for phosphorylation, so will not work for other PTMs.

•Site localization scoring based on search engine results should work for all modifications

• SLIP score; Mascot Delta score; VML score

•However, these scores will only be meaningful if the competing modification alternatives were considered in the initial database search

• If carbamidomethyl modification of lysines or N-termini, in addition to cysteines, was not considered, then two carbamidomethyl modifications may not be considered as an alternative to ubiquitination.

• Knowledge of which modifications were considered is relevant to evaluating site localization reliability

Page 17

PTMs in Crosslinked Peptides

For crosslinked peptides, ambiguity may be between peptides:

CAMKER

TMAKER

Oxidation could be on methionine in either peptide.

Page 18

What is an Acceptable FLR?

[Figure: Spiked Peptide PSM FLR (%) (y-axis, 0 to 18) for each participant identifier (x-axis). Values annotated on the figure: 5%, 5%, 1%, 1-2%, 1%, <1, 0.5, 10%, <30%, 0.01, <5%, <1%]

•2012 iPRG study involved identification of modified peptides

•Participants were asked to return results with 1% FDR at PSM level

•They were asked to indicate for which peptides they thought PTM site assignments were reliable

•Modified peptides were spiked in, so correct site localizations were known

What was the reliability of the reported results?