Confident Phosphopeptide Identification and Phosphosite Localization by LC-MS/MS
Karl R. Clauser
Broad Institute of MIT and Harvard
BioInfoSummer 2012
University of Adelaide
December, 2012
Topics Covered
• Basics of phospho site identification and localization
• Evolution of phosphoproteomic literature MS/MS reporting
• Modification site localization algorithm development
• 2010 ABRF-iPRG study of phosphopeptide ID and site localization
• Emerging false localization rate (FLR) metrics
Localizing a Phosphorylation Site
L/F|P/A/D|T/s/P/S T A\T K
L/F|P/A/D|t S/P/S T A\T K
PTM Site Localization
Test all Locations, Examine Score Gaps

No possible ambiguity (# PO4 sites = # S, T, or Y)
  Locations tested: AVsEEQQPALK
  Conclusion: AVS(1.0)EEQQPALK

Single site
  Locations tested: APsLTDLVK *, APSLtDLVK -
  Conclusion: APS(0.99)LT(0.0)DLVK

  Locations tested: sSSAGPEGPQLDVPR *, SsSAGPEGPQLDVPR *, SSsAGPEGPQLDVPR -
  Conclusion: S(0.50)S(0.50)S(0.0)AGPEGPQLDVPR

Multiple sites
  Locations tested: VTNDIsPEsSPGVGR *, VTNDIsPESsPGVGR *, VTNDISPEssPGVGR -, VtNDIsPESSPGVGR -, VtNDISPEsSPGVGR -, VtNDISPESsPGVGR -
  Conclusion: VT(0.0)NDIS(0.99)PES(0.50)S(0.50)PGVGR
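The "test all locations" step can be sketched in a few lines of Python. This is only an illustration, not the code of any search engine: the function name `candidate_localizations` is hypothetical, and it assumes phosphorylation is allowed only on S, T and Y and is written as a lowercase residue, as on the slide.

```python
from itertools import combinations


def candidate_localizations(sequence, n_phospho, residues="STY"):
    """List every placement of n_phospho phosphates on the allowed residues.

    Phosphorylated residues are written in lowercase (e.g. APsLTDLVK).
    """
    seq = sequence.upper()
    # 0-based indices of residues that could carry the modification
    sites = [i for i, aa in enumerate(seq) if aa in residues]
    candidates = []
    for chosen in combinations(sites, n_phospho):
        chars = list(seq)
        for i in chosen:
            chars[i] = chars[i].lower()
        candidates.append("".join(chars))
    return candidates


if __name__ == "__main__":
    for peptide, n in [("AVSEEQQPALK", 1), ("APSLTDLVK", 1), ("VTNDISPESSPGVGR", 2)]:
        print(peptide, n, candidate_localizations(peptide, n))
```

When the number of phosphates equals the number of S, T and Y residues, only one candidate is returned, which is the "no possible ambiguity" case.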
PTM Site Localization – Confident Localization
(K)A/P|s|L/T D|L\V K(S)
APS(0.99)LT(0.0)DLVK
PTM Site Localization – Ambiguous Localization
(R)S s/S/A/G/P E/G/P Q L|D|V|P R(E)
S(0.50)S(0.50)S(0.0)AGPEGPQLDVPR
PTM Site Localization – Ambiguous Localization
2 sites: 1 confident, 1 ambiguous
(R)V T N D|I|s/P E|s S/P G V\G R(R)
VT(0.0)NDIS(0.99)PES(0.50)S(0.50)PGVGR
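Reliability of LC/MS/MS Phosphoproteomic Literature ~2005

Ballif, BA, … Gygi, SP. 2004, MCP, 3, 1093-1101
  Approach: 1D gel, digest, SCX, LC/MS/MS
  Instrument: LCQ Deca XP
  # sites: 546; # ambiguous sites: 86
  Scores shown: yes; site ambiguity labeled: yes; supplemental spectra shown: no

Rush, J, … Comb, MJ. 2005, Nat Biotech, 23, 94-101
  Approach: digest lysate, pTyr Ab, LC/MS/MS
  Instrument: LCQ Deca XP
  # sites: 628; # ambiguous sites: 0
  Scores shown: yes; site ambiguity labeled: no; supplemental spectra shown: no

Collins, MO, … Grant, SGN. 2005, J Biol Chem, 280, 5972-5982
  Approach: protein IMAC, peptide IMAC, LC/MS/MS
  Instrument: Q-Tof Ultima
  # sites: 331; # ambiguous sites: 42
  Scores shown: no; site ambiguity labeled: yes; supplemental spectra shown: no

Gruhler, A, … Jensen, ON. 2005, MCP, 4, 310-327
  Approach: digest lysate, SCX, IMAC, LC/MS/MS
  Instrument: LTQ-FT
  # sites: 729; # ambiguous sites: 0
  Scores shown: yes; site ambiguity labeled: no; supplemental spectra shown: no
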
“Resulting sequences were inspected manually …. When the exact site of phosphorylation could not be assigned for a given phosphopeptide, it was tabulated as ambiguous.”
“All identified phosphopeptides were manually validated, and localization of phosphorylated residues within the individual peptide sequences were manually assigned…”
“All spectra supporting the final list of assigned peptides used to build the tables shown here were reviewed by at least three people to establish their credibility.”
“Assignment of phosphorylation sites was verified manually with the aid of PEAK Studio (Bioinformatics Solutions) software.”
MCP Guideline for publishing PTM data ~2010
http://www.mcponline.org/

III. POST-TRANSLATIONAL MODIFICATIONS
Studies focusing on posttranslational modifications (PTMs) require specialized methodology and documentation to assign the type(s) and site(s) of the modification(s). The guidelines in this section apply to PTMs that occur under physiological conditions and to which biological significance may be assigned, such as phosphorylation, glycosylation, etc. as well as purposefully induced chemical modifications of central importance to the results of the study, such as chemical cross-linking. These guidelines do not apply to common modifications arising from sample handling or preparation such as oxidation of Met or alkylation of Cys. In addition to the tabular presentation(s) of the data described in guideline II, the following information is required:

• The site(s) of modification. Within each peptide sequence, all modifications must be clearly located (unless ambiguous; see below) and the manner in which this was accomplished (through computation or manual inspection) must be described.
• A justification for any localization score threshold employed.
• Ambiguous assignments: Peptides containing ambiguous PTM site localizations must be listed in a separate table from those with unambiguous site localizations. In cases where there are multiple modification sites and at least one is ambiguous, then these peptides should be listed with the ambiguous assignments. Ambiguous assignments must be clearly labeled as such.
  Examples of ambiguities include:
  • Modified peptides in which one or more modification sites are ambiguous.
  • Instances where the peptide sequence is repeated in the same protein so the specific modification site cannot be assigned.
  • Instances in which the same peptide is repeated in multiple proteins, e.g. paralogs and splice variants (See also Section IV).
  • Isobaric modifications (e.g., acetylation vs. trimethylation, phosphorylation vs. sulfonation etc), where the possibilities may not be distinguished. Examples of methods able to distinguish between these include mass spectrometric approaches such as accurate mass determination, observation of signature fragment ions (e.g. m/z 79 vs. m/z 80 in negative ion mode for assignment of phosphorylation over sulfonation), or biological or chemical strategies.
• Annotated, mass labeled spectra: Spectra for ALL modified peptides must be either submitted to a public repository or accompany the manuscript as described in guideline II.
Supplemental Table Links to Each Labeled Spectrum
Spectrum Mill Scoring of MS/MS Interpretations
Peak Selection: De-Isotoping, S/N thresholding, Parent - neutral removal, Charge assignment
Match to Database Candidate Sequences
Score = Assignment Bonus (Ion Type Weighted) + Marker Ion Bonus (Ion Type Weighted) - Non-assignment Penalty (Intensity Weighted)
SPI (%): Scored Peak Intensity
Example values shown: Score 12.68, SPI 92%
Spectrum Mill Variable Modification Localization Score
VML score = Difference in Score of same identified sequences with different variable modification localizations
VML score > 1.1 indicates confident localization
Why a threshold value of 1.1?
• 1 implies that there is a distinguishing ion of b or y ion type
• 0.1 means that when unassigned, the peak is 10% the intensity of the base peak
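The score-gap idea behind the VML score can be sketched as below. This is a minimal illustration, not Spectrum Mill code: the function names and the example scores are hypothetical, and the per-localization scores are assumed to come from some upstream scoring step. Only the "difference between the top two localizations" rule and the 1.1 threshold are taken from the slide.

```python
def vml_score(scores_by_localization):
    """Return (best_localization, gap to the runner-up score).

    scores_by_localization maps each candidate localization of the same
    identified sequence to its (hypothetical) identification score.
    The gap is None when only one localization is possible.
    """
    ranked = sorted(scores_by_localization.items(), key=lambda kv: kv[1], reverse=True)
    best, best_score = ranked[0]
    if len(ranked) == 1:
        return best, None
    return best, best_score - ranked[1][1]


def is_confident(gap, threshold=1.1):
    """A single candidate (gap None) has no possible ambiguity."""
    return gap is None or gap > threshold


if __name__ == "__main__":
    # Illustrative scores only; they are not taken from the slides.
    best, gap = vml_score({"APsLTDLVK": 12.68, "APSLtDLVK": 10.20})
    print(best, gap, is_confident(gap))
```

A tie between the two best candidates gives a gap of 0, which is reported as ambiguous rather than forced onto one site.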
VML Scoring - Room for Improvement
S(0.50)Q T(0.50)PPGVAT(0.0)PPIPK
VML score: 1.09
y12
b2
VML Scoring - Room for Improvement
VML score: 0.49
S(0.0)T(0.0)S(0.25)T(0.25)PT(0.25)S(0.25)PGPR
S(0.0)T(0.0)[S(0.5)T(0.5)]P[T(0.5)S(0.5)]PGPR
Phosphosite Localization Scoring - Ascore
http://ascore.med.harvard.edu/
Supports Sequest results only, Linux only
Beausoleil SA, Villen J, Gerber SA, Rush J, Gygi SP (2006) Nat Biotechnol 24:1285–1292.
Phosphosite Localization Scoring - Andromeda
$$P = \binom{n}{k}\,p^{k}\,(1-p)^{n-k} = \frac{n!}{k!\,(n-k)!}\,(0.04)^{k}\,(0.96)^{n-k}$$
$$\text{PTM score} = -10 \times \log_{10}(P)$$
p: 0.04 - use the 4 most intense fragment ions per 100 m/z units
n: total number of possible b/y ions in the observed mass range for all possible combinations of PO4 sites in a peptide
k: number of peaks matching n
Olsen, J. V.; Blagoev, B.; Gnad, F.; Macek, B.; Kumar, C.; Mortensen, P.; Mann, M. Cell (2006), 127 (3), 635–48.
Olsen, J.V., and Mann, M. Proc. Natl. Acad. Sci. USA. (2004) 101, 13417–13422.
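The binomial PTM score above can be evaluated directly. The sketch below assumes the standard binomial coefficient n!/(k!(n-k)!) and a base-10 logarithm; the function names are hypothetical and this is not MaxQuant or Andromeda code.

```python
from math import comb, log10


def ptm_probability(n, k, p=0.04):
    """Binomial probability of matching exactly k of n theoretical ions by chance."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def ptm_score(n, k, p=0.04):
    """PTM score = -10 * log10(P); larger means less likely to be random."""
    return -10 * log10(ptm_probability(n, k, p))


if __name__ == "__main__":
    # Hypothetical peptide with 20 theoretical b/y ions
    for matched in (5, 10, 15):
        print(matched, round(ptm_score(20, matched), 1))
```

With p fixed at 0.04, the score rises steeply with the number of matched ions, which is why the choice of peak depth (4 peaks per 100 m/z) matters for the resulting values.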
True Probability or Just Effective Scores?
Peak selection assumptions
• All regions of spectrum equally likely
  • multiply charged fragments below precursor
  • some 100-300 m/z values not possible, dipeptide AA combinations
  • tolerance in Da, not ppm
• Tall and short peak intensities equally diagnostic
Fragment ion type assumptions
• All ion types equally probable
• Neutral losses ignored, y-H3PO4, y-H2O
Phosphosite Localization Scoring - PhosphoRS
Taus, T., Kocher, T., Pichler, P., Paschke, C., Schmidt, A., Henrich, C., and Mechtler, K. (2011) J Proteome Res. 10(12): 5354-62.
N: total # of extracted peaks
d: fragment ion mass tolerance
w: full mass range of spectrum
Score all theoretical fragment ions, not just site determining ions.
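The slide names N, d and w without showing how they combine. One reading, treated here as an assumption rather than a statement of the published algorithm, is that the chance of a random peak match is p = N·d/w and that each candidate localization is scored with a cumulative binomial over all of its theoretical fragment ions. The function names below are hypothetical.

```python
from math import comb


def random_match_probability(n_peaks, tolerance, mass_range):
    """Assumed form p = N * d / w, capped at 1."""
    if mass_range <= 0:
        raise ValueError("mass_range must be positive")
    return min(n_peaks * tolerance / mass_range, 1.0)


def cumulative_binomial(n, k, p):
    """Probability of matching k or more of n theoretical ions by chance."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical spectrum: 100 extracted peaks, 0.5 Da tolerance, 1000 m/z range
    p = random_match_probability(100, 0.5, 1000)
    print(p, cumulative_binomial(24, 12, p))
```

Because p depends on peak density, the same number of matched ions is less convincing in a crowded spectrum than in a sparse one.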
Key Aspects of Scoring Localizations
• Select peaks in spectrum to be used for identification/localization
• Test all sequence/location possibilities
• Assign fragment ion types to peaks
  • Allow for peaks to have different ion type assignments for conflicting localization possibilities
• Use score differences to make decision on localization certainty/ambiguity
• Decide upon conservative/aggressive thresholds.
• Provide a clear representation of the certainty/ambiguity in localization of each site
  • Allow for multiple sites with mix of certainty and ambiguity in localization
• Distinguish between:
  • Ambiguity – no distinguishing evidence, i.e. either possibility
  • Ambiguity – conflicting evidence, multiple co-eluting isoforms present
How can we calculate a false localization rate as a standard measure of certainty for phosphosite assignment across a dataset?
ABRF Proteome Informatics Research Group
iPRG: Informatic Evaluation of Phosphopeptide Identification and Phosphosite Localization
ABRF 2010, Sacramento, CA
March 22, 2010
Study Goals
1. Evaluate the consistency of reporting phosphopeptide identifications and phosphosite localization across laboratories
2. Characterize the underlying reasons why result sets differ
3. Produce a benchmark phosphopeptide dataset, spectral library and analysis resource
Study Design
• Use a common dataset
• Use a common sequence database
• Allow participants to use the bioinformatic tools and methods of their choosing
• Use a common reporting template
• Fix the identification confidence (1% FDR)
• Require an indication of phosphosite ambiguity per spectrum
• Ignore protein inference – for now
Study Materials and Instructions to Participants
Materials
• 1 Orbitrap XL dataset (3 files) – RAW, mzML, mzXML, MGF, pkl or dta – conversions by ProteoWizard
• 1 FASTA file (SwissProt human seq’s. v57.1)
• 1 template (Excel)
• 1 on-line survey (Survey Monkey)
Instructions
1. Analyze the dataset
2. Report the phosphopeptide spectrum matches in the provided template
3. Complete an on-line survey
4. Attach a 1-2 page description of your methodology
Reporting Template
ABRF iPRG 2010 Study Template: Phosphorylated Peptide Analysis
Instructions: Please fill in all REQUIRED fields. After deleting the example rows, create a new row for each phosphopeptide spectrum match. Multiple rows MAY be used to report ambiguous phosphosite localizations. Phosphorylated residues MUST be indicated in the 'Peptide Sequence' field, and results should be sorted by 'Peptide Identification Score' from most to least confident. Additional instructions can be found above each field header. Results should be emailed to '[email protected]' no later than Jan. 10, 2010. Please make sure to fill out the REQUIRED survey.

REQUIRED FIELDS
File: Name of data file (e.g., D20090930_PM_K562_SCX-IMAC_fxn03)
Spectrum Identifier: Identifiers should be unique scan numbers from data file but may also refer to a merged range of MS/MS scans (e.g., Scan:19, 2316.19.19.3.dta, 2316.19.19.3.pkl).
Precursor m/z: Precursor m/z as submitted to search engine
Precursor Charge: Precursor charge reported by search engine
Peptide Sequence: Use lowercase s, t or y (e.g. SLsGSsPCPK) OR a trailing symbol (e.g. SLS#GS#PCPK) OR a string in parentheses (e.g. SLS(ph)GS(ph)PCPK) immediately following each phosphorylated residue. Only phosphorylation of S, T and Y will be compared; all other modifications (e.g., oxidized M) will be ignored. It will be assumed that all modifications indicated on S, T or Y are phosphorylations.
Accession(s): Protein identifier(s) from Fasta file. Use multiple values if peptide is found in multiple proteins, e.g., Q9NZ18; Q9UQ35. Protein inference will not be scored.
Num. Phospho sites: Total number of phosphorylations as evidenced by the precursor m/z and MS2 spectrum.
Peptide Identification Certainty: Is this match above 1% FDR identification threshold (Y|N)? 'Y' indicates this match is BETTER than the confidence threshold. 'N' indicates the match is WORSE. Please report BOTH types of identifications in your ranked list.
Phosphosite Localization Certainty: Are ALL phosphosites unambiguously localized (Y|N)? Indicate 'Y' if ALL phosphorylations have been confidently localized. 'N' if one or more have not.
Peptide Identification Score: Peptide identification score reported by search engine (e.g., E-value, p-value, probability, Mascot score, etc.)

Example rows (fields in the order listed above):
D20090930_PM_K562_SCX-IMAC_fxn03  Scan:908  558.7576  2  qGsPVAAGAPAK  Q9NZI8  1  Y  Y  0.0002097
D20090930_PM_K562_SCX-IMAC_fxn04  Scan:2017  710.82233  2  TsPDPSPVSAAPSK  Q13469  1  Y  N  45.41
D20090930_PM_K562_SCX-IMAC_fxn03  Scan:683  692.28891  2  _APQTS(ph)S(ph)SPPPVR_  Q8IYB3  2  Y  N  30.09
D20090930_PM_K562_SCX-IMAC_fxn03  Scan:4832  775.3548  2  SQtPPGVAtPPIPK  Q15648  2  Y  N  31.79
D20090930_PM_K562_SCX-IMAC_fxn03  Scan:641  590.2127  2  SLsGSsPcPK  Q9UQ35  2  Y  N  0.0112023
D20090930_PM_K562_SCX-IMAC_fxn03  Scan:641  590.2127  2  sLSGSsPcPK  Q9UQ35  2  Y  N  0.0915611
Participant demographics (pie charts; categories and slice values are listed in extracted order)
Membership (n=33). Categories: ABRF Member, Non-member. Values: 55%, 45%.
Type of Lab. Categories: Academic, Biotech/Pharma/Industry, Contract Research Org, Government, Other. Values: 73%, 9%, 6%, 6%, 6%.
Location. Categories: Asia, Australia/New Zealand, Europe, North America. Values: 9%, 6%, 15%, 70%.
Resource Lab Status. Categories: Conduct both core functions and non-core lab research, Core only, Non-core research lab, Software development only. Values: 39%, 15%, 42%, 3%.
Primary Job Function. Categories: Bioinformatician/Developer, Director/Manager, Lab Scientist, Mass Spectrometrist, Other. Values: 58%, 9%, 12%, 18%, 3%.
Proteomics Experience (bar chart, axis 0-16). Categories: 1-2 years, 3-4 years, 5-10 years, >10 years, Unanswered.
• 59 requests / 32 submissions (54% return) 2 retractions + 7 iPRG members and 1 guest
Software Tools Used
[Bar chart: Phosphosite Localization (axis 0-6). Tools: Ascore, custom, In-house, MaxQuant, msInspect, MyriMatch, NNScore, PLS, Phosphinator, PhosphoScore, Prophossi, Spectrum Mill]
[Bar chart: Peptide Identification (axis 0-16). Tools: Mascot, X!Tandem, OMSSA, SEQUEST, MyriMatch, in-house, PeptideProphet, Scaffold, InsPecT, PepARML, Peptizer, pFind, TPP, iProphet, MaxQuant, msInspect, MSPepSearch, OpenMS/TOPP, ProteinProphet, Pview, SpectraST, SpectrumMill, thegpm]
The SCX/IMAC Enrichment Approach for Phosphoproteomics
Sample: 7.5x10e7 human K562 chronic myelogenous leukemia cells, 4 mg lysate
Protocol: Villen, J, and Gygi, SP, Nat Prot, 2008, 3, 1630-1638.
Lysis: 8M urea, 75mM NaCl, 50 mM Tris pH 8.2, phosphatase inhibitors
SCX: PolyLC - Polysulfoethyl A 9.4 mm X 200mm, elute: 0-105mM KCl, 30% Acn.
IMAC: Sigma - PhosSelect Fe IMAC beads, bind: 40% Acn, 0.1% formic acid, elute: 500 mM K2HPO4 pH 7
MS/MS: Thermo Fisher Orbitrap XL, high-res MS1 scans in the Orbitrap (60k), Top-8 fragmented in LTQ, exclude +1 and precursors w/ unassigned charges, 20s exclusion time, precursor mass error +/- 10 ppm
Preliminary Analysis of SCX Fractions and Dataset Selection
[Stacked bar chart: # spectra (0-3500) vs. SCX fr # (2-12), by precursor z (z2, z3, z4)]
[Stacked bar chart: % distinct peptides (0-100%) vs. SCX fr # (2-12), by # phosphosites (1P, 2P, 3P)]
[Stacked bar chart: % distinct peptides (0-100%) vs. SCX fr # (2-12), by solution charge (-1SC to 6SC)]
Frxn 3: multi-phosphosites
Frxn 4: single phospho, single basic
Frxn 12: multi-basic residues (RHK)
From 30,000 Ft.
[Bar chart: counts (0-8000) per participant alias. Series: # spectra Id Yes, # spectra Loc Yes, # unique Peptides UC ID Yes]
Participant alias (x-axis order): 14941, 87133, 22730, 86010, 13800, 84940v, 20899i, 53706, 92536i, 870486i, 45682, 870484i, 85246, 13867, 20441v, 40816i, 20109, 50308i, 29850v, 56365, 66398, 91943i, 47587, 71263, 65211, 63103, 97219i, 20814, 61963v, 18621, 74637, 15769, 77114, 66514, 77115
[Table beneath the chart, one column per participant; values are given in extracted order and per-participant column alignment is not preserved]
Spectral pre-processing: Ih Ih Rr, Ih Ih Ih Bw Ih Ih Mq Sm Sm Mc Rr, Xc Mq Di, Mq Bw Ih Em Ih R, Xc Ih
precursor m/z adjusted: Y Y Y Y Y Y Y Y Y Y Y Y Y
nterm acetyl: Y Y Y Y Y Y Y Y Y Y
Peptide identification: My, Om, Se, Xt, Pp Om, Xt, Pp, TPP, Ip, Sp Se Pf Pf Se, Pp Mp Om Ma, Mq Sm Ma, My, Om, Xt, Pl Sm Ma, My, Om, Xt, Pl Ma, Om, Xt Xt Ma My, Xt, In Ma, Ih Ma In Ma Ma Se Ma, In, Op, Pz Se Ma, Sc, Xt Xt*, Sc Ma Xt, Gp Ma, Xt, Sc Pv Ma, Ih Se, Pp, Ih Ma, Xt, Sc
Phosphosite localization: Ih Ih As Ih Pf, As As Ih Ph Mq Sm Ih Sm Ih As Id As, Ih Ma In Mq Ps In Ih Ih As Ih Ih Ih, Pr As Ih
Software Program Abbreviations
Software Program: Key
Ascore: As
Bioworks: Bw
Distiller: Di
extract_msn: Em
TheGPM: Gp
in-house: Ih
Inspect: In
IdPicker: Ip
iProphet: Id
Mascot: Ma
msconvert: Mc
msInspect: Mi
MyriMatch: Mm
MSPepSearch + Spec Lib.: Mp
MaxQuant: Mq
msInspect: Ms
OMSSA: Om
OpenMS: Op
pFind: Pf
Phosphinator: Ph
pepARML: Pl
PeptideProphet: Pp
Peptizer: Pz
Prophossi: Pr
PhosphoScore: Ps
Pview: Pv
ReAdW: Rr
Scaffold: Sc
SEQUEST: Se
Spectrum Mill: Sm
SpectraST + Spec Lib.: Sp
Xcalibur: Xc
X!Tandem: Xt
X!Tandem (k-score): Xt*
The data analysis tools used by the participants were collected from the on-line survey as reported by the participants. Many participants used multiple search engines and most used a software tool to localize the phosphosites. Moreover, many in-house (Ih) or custom software tools were used in the study, only some of which are published. The key at the left can be used to decode the names of the software tools in the table above, and the table is sorted (by number of confident peptide identifications), exactly as in the histogram above.
Relative Performance: Identification By Fraction
[Stacked bar chart: # spectra Id Yes (0-4000) per participant alias. Series: Frxn 3, Frxn 4, Frxn 12]
Performance was not equivalent across the 3 fractions for all participants.
[Stacked bar chart: # unique peptides UC Id Yes (0-4000) per participant alias. Series: Frxn 3, Frxn 4, Frxn 12]
Some participants saw more unique peptides than others.
Room for Improvement in ID Certainty Thresholds
[Three stacked bar charts: # spectra per participant alias. Series: #DN Diff Id No, #SN Same Id No, #DY Diff Id Yes, #SY Same Id Yes, #Y1P Id Yes single]
Frxn 3 – most multiple phos per peptide (axis 0-1800)
Frxn 12 – highest precursor charges (axis 0-2800)
Frxn 4 – most phosphopeptides (axis 0-4000)
Gray means – Number of spectra where < 2 people agreed on the Id
85246: 1205 spectra with 3-15 phosphosites, 624 spectra with 4-15
20814: ?, Frxn 12 >> Frxn 3, 4
77114, 77115: merged multiple scans, so can’t be compared with other 33
Resource for Inspecting Peptide Id Certainty Overlaps - Frxn 4
YY: Y – identification, Y – localization
YN: Y – identification, N – localization
NS: N – identification, but top sequence same as consensus
ND: N – identification, and top sequence different than consensus
Subset of Participants Used for Localization Analysis
Excluded
0: 0% localization
1: 100% localization
F: FDR - very high?
R: Replicate submission
M: Merged spectra
C: Categorization Errors
A: Y Loc only when no possible ambiguity
[Bar chart, all 35 participants: # spectra (0-8000) per participant alias. Series: # spectra Id Yes, # spectra Loc Yes. Exclusion codes marked on the chart, as extracted: RF 1 0 1 A0 F 1 CM 0 M]
[Bar chart, 22 retained participants: # spectra (0-8000) per participant alias. Series: # spectra Id Yes, # spectra Loc Yes]
Retained participant aliases: 14941, 87133, 22730, 86010, 13800, 84940v, 20899i, 53706, 92536i, 870486i, 45682, 13867, 20441v, 20109, 50308i, 56365, 91943i, 47587, 71263, 97219i, 61963v, 18621
If Participants Agree on the Identity, Do They Also Agree Site Localization Can be Certain?
Frxn 4: subset of 472 spectra for which 20/22 participants all agree on identity
[Bar chart: % of spectra (0-10%) vs. % participants indicating localization Yes (bins: NPA, 10%, 25%, 40%, 55%, 70%, 85%, 100%)]
NPA: No possibility of ambiguity
What Fraction of the Time Do They Agree On Localization(s)?
8050 spectra with > 2/22 Id Yes (Frxn 3, 4, 12)
[Bar chart: # spectra (0-7000)]
# Y loc 2-22 partic: 5918
# Y loc 1 partic: 798
# N loc all partic: 498
no ambiguity: 836
5918/8050 spectra with > 2/22 Loc Yes and Site Ambiguity Possible
[Pie chart: 5918 Y loc]
100% partic agree: 4685, 79%
67-99% partic agree: 563, 10%
< 67% partic agree: 670, 11%
For all of the participants that agree on identity when
• site ambiguity is possible (#S,T,Y > # phos)
• >2 participants mark Loc=Y
For 79% (4,685 of 5,918) of the spectra, all participants who mark Loc=Y unanimously agree on the localization of the phosphosites
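The agreement bins behind the 79% figure can be sketched as follows: for each spectrum, take the localized sequences reported by participants who marked Loc=Y and measure how many side with the most common choice. The function names, bin labels and input format are assumptions made for illustration; the study's actual analysis scripts are not shown in the slides.

```python
from collections import Counter


def agreement_fraction(localizations):
    """Fraction of Loc=Y participants who chose the most common localization."""
    if not localizations:
        raise ValueError("need at least one localization")
    top_count = Counter(localizations).most_common(1)[0][1]
    return top_count / len(localizations)


def agreement_bin(localizations):
    """Bin a spectrum into the three agreement categories used on the slide."""
    frac = agreement_fraction(localizations)
    if frac == 1.0:
        return "100% agree"
    if frac >= 0.67:
        return "67-99% agree"
    return "< 67% agree"


if __name__ == "__main__":
    reports = ["DSAIPVEsDtDDEGAPR"] * 3 + ["DSAIPVESDtDDEGAPR"]
    print(agreement_fraction(reports), agreement_bin(reports))
```

Counting spectra per bin across a dataset then gives the kind of breakdown shown in the pie chart.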
Which Participants are More Likely to Disagree on Localization?
[Three bar charts: % of spectra in minority localization choice per participant alias (22 retained participants); axis ranges 0-25%, 0-20%, 0-30%]
# Spectra with Loc Agreement 50.1-99.9%
Frxn 3: 154
Frxn 4: 498
Frxn 12: 227
x-axis is sorted in descending order of # identified
Liberal Localizers are More Disagreeable
The participants who are the most willing to localize are more likely to disagree with the majority view.
x-axis is sorted in descending order of # localized / # identified
A Challenging Problem
P(m/z) -H3PO4
879
3/7 DSAIPVESDtDDEGAPR
14/21 said can identify peptide but can not localize site
4/7 DSAIPVEsDtDDEGAPR
Primary Observations from iPRG 2010 study
1. Wide range of spectra marked confidently identified.
2. Wide range of spectra marked confidently localized.
3. When participants agree on the identification, phosphosite ambiguity is possible, and localization is marked as possible, participants unanimously agree on the localization(s) for 79% of the spectra.
4. For the remaining 21%, the participants who are liberal localizers are more likely to disagree with the majority view.
Acknowledgements
iPRG Members
• Paul A. Rudnick (chair) – NIST
• Manor Askenazi - Dana-Farber Cancer Institute
• Karl R. Clauser - Broad Institute of MIT and Harvard
• William S. Lane - Harvard University
• Lennart Martens - Ghent University, Belgium
• Karen Meyer-Arendt - University of Colorado
• W. Hayes McDonald - Vanderbilt University
• Brian C. Searle - Proteome Software, Inc.
• Jeffrey A Kowalak (EB Liaison) – NIMH
Additional Contributors
• Philipp Mertins, The Broad Institute – All wet lab work and an analysis
• Steve Gygi, Harvard Medical School – Test datasets
• Matthew Chambers, Vanderbilt University Medical Center – Data format conversions (ProteoWizard)
• Steve Stein and Yuri Mirokhin, NIST – A K562 phosphopeptide spectral library
• Renee Robinson, Harvard University – “The Anonymizer”
Emerging False Localization Rate (FLR) Metrics
Target/Decoy for localization
Decoy - AAs that cannot biologically bear the modification
Issues
• Allow decoys only during localization, not during identification; otherwise will bias identification FDR
• Ambiguity – more allowed sites will yield more ambiguous assignments, so may need to score targets and decoys separately then compare
• Frequency - decoy AA occurrence should be similar to target AAs; otherwise FLR will be inaccurate
• Proximity – a decoy AA nearer the site of a target AA has better chance of matching. Pro and Glu often found in the consensus motifs of many kinases
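One way to turn decoy-residue matches into an FLR estimate is sketched below. It is an assumption-laden illustration, not a published estimator: it assumes a wrong localization is equally likely to land on any alternative candidate residue, so the observed decoy localizations are scaled by the ratio of target to decoy candidate residues (the "Frequency" issue above). All names and counts are hypothetical.

```python
def estimate_flr(decoy_localizations, target_localizations,
                 n_target_residues, n_decoy_residues):
    """Estimate the false localization rate among target-site assignments.

    decoy_localizations:  confident localizations on decoy residues (e.g. Pro, Glu)
    target_localizations: confident localizations on S, T or Y
    n_target_residues / n_decoy_residues: candidate residue counts used to
        correct for unequal target and decoy frequencies (assumed model).
    """
    if target_localizations <= 0 or n_decoy_residues <= 0:
        raise ValueError("counts must be positive")
    scale = n_target_residues / n_decoy_residues
    estimated_false_targets = decoy_localizations * scale
    return min(estimated_false_targets / target_localizations, 1.0)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    print(estimate_flr(10, 1000, 3000, 1500))
```

The estimate degrades when decoy residues are much rarer than target residues or sit systematically closer to true sites, which are the frequency and proximity caveats listed on the slide.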
AA Frequency in the Proteome
http://proteomics.broadinstitute.org/millhtml/faindexframe.htm
select the Calculate statistics utility
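Amino acid frequencies of the kind reported by such a statistics utility can also be computed locally from any FASTA file. The sketch below is a generic standard-library illustration and is not the Spectrum Mill utility; the function name and the tiny example database are hypothetical.

```python
from collections import Counter


def aa_frequencies(fasta_text):
    """Return {amino acid: fraction of all residues} for FASTA-formatted text."""
    counts = Counter()
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and header lines
        counts.update(line.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}


if __name__ == "__main__":
    example = ">protein1\nSTYP\n>protein2\nPEEK\n"
    freqs = aa_frequencies(example)
    for aa in "STYPE":
        print(aa, freqs.get(aa, 0.0))
```

Comparing the combined frequency of candidate decoy residues such as P and E with that of S, T and Y indicates how much frequency correction an FLR estimate would need.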
ProteinProspector SLIP Scoring and Local FLR
• Test Dataset: Synaptic phosphopeptides acquired in LTQ-Orbitrap Velos (IT-CID): 70,000 phosphopeptide spectra identified
• Altered Batch-Tag to allow for phosphorylation of Pro and Glu
• Filtered results to only phosphopeptide IDs containing one S, T or Y
• Modification site known
• Local FLR: SLIP score of 6 = 95% correct
• Global FLR (matches to phosphoP and phosphoE) similar to QTOF Micro data.
Baker, P.R., Trinidad, J.C., and Chalkley, R.J. (2011) Mol Cell Proteomics. M111.008078.
Closing Thoughts
• More research in the area of FLR metric calculation is critical to the field for developing standard confidence thresholds for modification site localization.
• An ambiguous modification localization decision for a particular peptide spectrum match is far preferable to getting it wrong.
• As more raw LC-MS/MS data from PTM studies is deposited in the public domain, it becomes increasingly possible for knowledgebases to undertake efforts to reprocess the data with the most recent algorithms and scoring metrics and enforce uniform quality standards on the information they disseminate.
• PHOSIDA (www.phosida.com) disseminates modification sites identified and localized in publications emerging only from research in the laboratory of Matthias Mann, so all MS/MS data have been analyzed through a common software platform and are subject to consistent scoring thresholds.
Review Article
Modification Site Localization Scoring: Strategies and Performance
Chalkley, RJ and Clauser, KR
Mol Cell Proteomics 2012 11: 3-14. doi:10.1074/mcp.R111.015305.
http://www.mcponline.org/
Canonical pathways in lung cancer are being aggressively targeted for drug development
Janku et al. J Thoracic Oncol 2011; 6: 1601-1612
EML4-ALK fusion
Crystal, Clinical Advances in Hematology & Oncology, 2011, 9, 207-214.
Targeted therapy development time
Gerber and Minna Cancer Cell 2010; 18: 548-551
The future of lung cancer management
• Diagnose earlier
• Prognosticate better
• Treat more precisely
• Monitor more effectively
Herbst et al. NEJM 2008