Page 1: Reporting Protein Identifications  from MS/MS Results

Reporting Protein Identifications from MS/MS Results

Brian C. Searle

Proteome Software Inc.

Portland, Oregon USA

[email protected]

Creative Commons Attribution

Page 2: Reporting Protein Identifications  from MS/MS Results

Outline

• Assigning Proteins from Peptide IDs

• Correcting for One-Hit-Wonders

• Protein False Discovery Rates?

• Correcting for Shared Peptides

• Publication Standards

Page 4: Reporting Protein Identifications  from MS/MS Results

Just to Review:

[Figure: matches labeled F and R, with regions marked “clearly wrong” and “possibly correct”.]

Elias JE, Gygi SP. Nat Methods. 2007 Mar;4(3):207-14.

Page 5: Reporting Protein Identifications  from MS/MS Results

Just to Review:

#   Spectrum   Accession    Peptide            Score
1   scan 3632  P35908       GFSSGSAVVSGGSR       4.6
2   scan 3609  P0AFY8       FSAASQPAAPVTK        3.7
3   scan 3629  P0A940       GFQSNTIGPK           3.0
4   scan 3635  P0A6F9       STRGEVLAVGNGR        2.2
5   scan 3636  P0A870       ELAESEGAIER          2.1
6   scan 3607  P0A799       ADLNVPVKDGK          1.9
7   scan 3626  P0ABC7       EAEAYTNEVQPR         1.6
8   scan 3602  P0A853       IRVIEPVKR            1.4
9   scan 3623  P38489       KLTPEQAEQIK          0.9
10  scan 3616  P00448       GTTLQGDLK            0.8
11  scan 3621  P09546       LLPGPTGER            0.4
12  scan 3615  P0AFG8       AFLEGR               0.2
13  scan 3624  P14565       SAADVAIMK            0.0
14  scan 3613  rev_P06864   EGSLAVNVQGDAAIR     -0.4
15  scan 3604  P36562       DPEEVVGIGANLPTDK    -0.7
16  scan 3606  P0A9C5       IPVVSSPK            -0.7
17  scan 3611  P0ABB0       ASTISNVVR           -0.7
18  scan 3614  rev_Q2EEU2   KFVALTCDTLLLGER     -0.8
19  scan 3620  rev_P0ACL5   NNESAALMKEYCR       -0.9
20  scan 3633  rev_P37309   SDGSCNQRALNR        -0.9
21  scan 3627  P32132       VEETEDADAFRVSGR     -1.0
22  scan 3618  P37342       ILTQDEIDVR          -1.0
23  scan 3610  rev_P0ADK0   IANVSDVVPR          -1.2
24  scan 3601  P0AG93       LGMKREHMLQQK        -1.3
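A minimal sketch of how a cumulative FDR could be read off the 24 matches above. It assumes that accessions starting with rev_ are the decoys and that FDR is estimated as decoys divided by targets above a score threshold, which is one common convention rather than the only one. The names MATCHES and cumulative_fdr are made up for this illustration.

```python
# (accession, score) pairs copied from the review table; "rev_" marks a decoy hit.
MATCHES = [
    ("P35908", 4.6), ("P0AFY8", 3.7), ("P0A940", 3.0), ("P0A6F9", 2.2),
    ("P0A870", 2.1), ("P0A799", 1.9), ("P0ABC7", 1.6), ("P0A853", 1.4),
    ("P38489", 0.9), ("P00448", 0.8), ("P09546", 0.4), ("P0AFG8", 0.2),
    ("P14565", 0.0), ("rev_P06864", -0.4), ("P36562", -0.7), ("P0A9C5", -0.7),
    ("P0ABB0", -0.7), ("rev_Q2EEU2", -0.8), ("rev_P0ACL5", -0.9),
    ("rev_P37309", -0.9), ("P32132", -1.0), ("P37342", -1.0),
    ("rev_P0ADK0", -1.2), ("P0AG93", -1.3),
]


def cumulative_fdr(matches, threshold):
    """Estimate the FDR among matches scoring >= threshold as decoys / targets."""
    kept = [acc for acc, score in matches if score >= threshold]
    decoys = sum(acc.startswith("rev_") for acc in kept)
    targets = len(kept) - decoys
    return decoys / targets if targets else 0.0


for cutoff in (0.0, -0.7, -1.3):
    print(cutoff, round(cumulative_fdr(MATCHES, cutoff), 3))
```

Lowering the cutoff lets more decoys in, so the estimate rises. It describes the whole accepted list, not any single peptide in it.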

Page 8: Reporting Protein Identifications  from MS/MS Results

…Well, Maybe

Page 9: Reporting Protein Identifications  from MS/MS Results

Peptides: AEPTIR, IDVCIVLLQHK, NTGDR

Protein

Page 10: Reporting Protein Identifications  from MS/MS Results

AEPTIR: 85%
IDVCIVLLQHK: 65%
NTGDR: 25%

Protein: ??%

Page 11: Reporting Protein Identifications  from MS/MS Results

FDRs for Whole Datasets vs Individual Peptides

• Cumulative FDRs only estimate the validity of a data set

• Probabilities (or instantaneous FDRs) estimate the validity of a peptide of interest

Page 12: Reporting Protein Identifications  from MS/MS Results

One Possible Approach

• Instantaneous False Discovery Rate

Many Others:

• PeptideProphet (TPP, Scaffold)
• Percolator
• Spectral Energies
• RAId De Novo

Page 14: Reporting Protein Identifications  from MS/MS Results

Just to Review: the 24 matches from Page 5, binned by score

Score bins:
4 to 5
3 to 4
2 to 3
1 to 2
0 to 1
-1 to 0
-2 to -1

Page 15: Reporting Protein Identifications  from MS/MS Results

[Figure: Histogram of Decoy Matches. x-axis: Ion Score – Identity Score (-40 to 60); y-axis: # of Matches (0 to 800); series: “Correct” and “2x Decoy”.]

Page 17: Reporting Protein Identifications  from MS/MS Results

[Figure: Curve Fit Distributions. x-axis: Ion Score – Identity Score (-40 to 60); y-axis: # of Matches (0 to 800); fitted curves: “Correct” and “2x Decoy”.]

Choi H, Ghosh D, Nesvizhskii AI. J Proteome Res. 2008 Jan;7(1):286-92.

Page 18: Reporting Protein Identifications  from MS/MS Results

Instantaneous FDR Method

[Figure: fitted “Correct” and “2x Decoy” distributions. x-axis: Ion Score – Identity Score (-40 to 60); y-axis: # of Matches (0 to 800).]

$$p(+ \mid D) = \frac{p(D \mid +)\, p(+)}{p(D \mid +)\, p(+) + p(D \mid -)\, p(-)}$$

Choi H, Ghosh D, Nesvizhskii AI. J Proteome Res. 2008 Jan;7(1):286-92.
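The Bayes step on this slide can be sketched in a few lines. This is an illustration only: it assumes both the correct and the incorrect score distributions have already been fit as normal curves, and the means, widths, and prior below are invented numbers, not values from the cited paper.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_correct(score, prior_correct,
                      correct=(20.0, 10.0), incorrect=(-10.0, 8.0)):
    """p(+|D): probability a match is correct given its score.

    `correct` and `incorrect` are hypothetical (mean, sd) pairs for the
    fitted distributions; `prior_correct` is the assumed p(+).
    """
    like_pos = normal_pdf(score, *correct) * prior_correct
    like_neg = normal_pdf(score, *incorrect) * (1.0 - prior_correct)
    return like_pos / (like_pos + like_neg)


for s in (-20.0, 0.0, 10.0, 30.0):
    print(s, round(posterior_correct(s, 0.5), 4))
```

The probability is computed at one score, so it speaks to an individual peptide rather than to the list as a whole.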

Page 20: Reporting Protein Identifications  from MS/MS Results

Chance that each identification is wrong (1 - p), shown in parentheses:

AEPTIR: (15%)
IDVCIVLLQHK: (35%)
NTGDR: (75%)

Protein: (??%)

Feng J, Naiman DQ, Cooper B. Anal Chem. 2007 May 15;79(10):3901-11.

Page 21: Reporting Protein Identifications  from MS/MS Results

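Chance that each identification is wrong (1 - p), shown in parentheses:

AEPTIR: (15%)
IDVCIVLLQHK: (35%)
NTGDR: (75%)

Protein: (4%)

The protein is wrong only if all three peptides are wrong: 0.15 * 0.35 * 0.75 ≈ 0.04

Feng J, Naiman DQ, Cooper B. Anal Chem. 2007 May 15;79(10):3901-11.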

Page 22: Reporting Protein Identifications  from MS/MS Results

AEPTIR: 85%
IDVCIVLLQHK: 65%
NTGDR: 25%

Protein: 96%

1 - (0.15 * 0.35 * 0.75) ≈ 1 - 0.04 = 0.96

Feng J, Naiman DQ, Cooper B. Anal Chem. 2007 May 15;79(10):3901-11.
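The arithmetic on this slide, written out as a small sketch. It assumes the peptide identifications are independent, which is the simplification the next slide pokes at. The function name protein_probability is made up for this example.

```python
import math


def protein_probability(peptide_probs):
    """Probability that at least one peptide is correct, assuming independence."""
    return 1.0 - math.prod(1.0 - p for p in peptide_probs)


# AEPTIR 85%, IDVCIVLLQHK 65%, NTGDR 25%
print(round(protein_probability([0.85, 0.65, 0.25]), 2))
```

A protein backed by a single 85% peptide stays at 85%. The gain comes only from additional peptides that agree.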

Page 23: Reporting Protein Identifications  from MS/MS Results

If only it were so easy!

Page 24: Reporting Protein Identifications  from MS/MS Results

Peptide 1 through Peptide 10

80% Peptides

Page 25: Reporting Protein Identifications  from MS/MS Results

Peptide 1 through Peptide 10

Correct Protein A

Correct Protein B

80% Peptides

Page 26: Reporting Protein Identifications  from MS/MS Results

Peptide 1 through Peptide 10

Correct Protein A

Correct Protein B

Incorrect Protein C

Incorrect Protein D

80% Peptides, 50% Proteins
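One way to see where 80% peptides but only 50% proteins comes from. The peptide-to-protein mapping below is an assumption made for illustration (the slide does not spell it out): the eight correct peptides pile onto Proteins A and B, while each wrong peptide lands on a protein of its own.

```python
# Hypothetical mapping: correct peptides cluster, incorrect ones scatter.
PEPTIDE_TO_PROTEIN = {
    1: "A", 2: "A", 3: "A", 4: "A",
    5: "B", 6: "B", 7: "B", 8: "B",
    9: "C", 10: "D",
}
CORRECT_PEPTIDES = set(range(1, 9))  # peptides 1-8 are the correct ones


def accuracy(peptide_to_protein, correct_peptides):
    """Return (fraction of correct peptides, fraction of correct proteins)."""
    proteins = set(peptide_to_protein.values())
    # A protein counts as correct if at least one correct peptide supports it.
    correct_proteins = {peptide_to_protein[p] for p in correct_peptides}
    peptide_rate = len(correct_peptides) / len(peptide_to_protein)
    protein_rate = len(correct_proteins) / len(proteins)
    return peptide_rate, protein_rate


print(accuracy(PEPTIDE_TO_PROTEIN, CORRECT_PEPTIDES))
```

Both wrong proteins here are one-hit-wonders, which is why the error rate grows when moving from the peptide list to the protein list.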

Page 27: Reporting Protein Identifications  from MS/MS Results

One hit wonders are dubious at best

Page 28: Reporting Protein Identifications  from MS/MS Results

Outline

• Assigning Proteins from Peptide IDs

• Correcting for One-Hit-Wonders

• Protein False Discovery Rates?

• Correcting for Shared Peptides

• Publication Standards

Page 29: Reporting Protein Identifications  from MS/MS Results

[Figure: Actual Probability (y-axis) vs. Computed Probability (x-axis).]

Nesvizhskii, A. I.; Keller, A. et al. Anal. Chem. 75, 4646-4658

Page 30: Reporting Protein Identifications  from MS/MS Results

[Figure: Actual Probability (y-axis) vs. Computed Probability (x-axis), with regions labeled UNDERestimation and OVERestimation.]

Nesvizhskii, A. I.; Keller, A. et al. Anal. Chem. 75, 4646-4658

Page 32: Reporting Protein Identifications  from MS/MS Results

What if we could score one-hit-wonderness?

Nesvizhskii, A. I.; Keller, A. et al Anal. Chem. 75, 4646-4658

Page 35: Reporting Protein Identifications  from MS/MS Results

Combining different peptides

• Quantify as a score:
– If different peptides agree: Good!
– If peptides are one-hit-wonders: Bad!

• Peptide agreement score:

$$NSP_k = \sum_{k' \neq k} p(+ \mid D_{k'})$$

NSP score for peptide (k) is the sum of other agreeing peptides (not k)

Nesvizhskii, A. I.; Keller, A. et al. Anal. Chem. 75, 4646-4658
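A small sketch of the agreement score as described: for each peptide, add up the probabilities of the other peptides assigned to the same protein. The function name nsp_scores is invented here, and the example reuses the 85% / 65% / 25% peptides from earlier in the talk.

```python
def nsp_scores(peptide_probs):
    """NSP for each peptide k: sum of p(+|D) over the protein's other peptides."""
    total = sum(peptide_probs)
    return [total - p for p in peptide_probs]


# Three sibling peptides on one protein.
print([round(x, 2) for x in nsp_scores([0.85, 0.65, 0.25])])

# A one-hit-wonder has no siblings, so its NSP is zero.
print(nsp_scores([0.85]))
```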

Page 36: Reporting Protein Identifications  from MS/MS Results

ProteinProphet Distributions

[Figure: distributions for Multi-hit Proteins and One-hit Wonders.]

Page 39: Reporting Protein Identifications  from MS/MS Results

ProteinProphet Distributions

• one hit wonders (decrease prob)
• in between (keep same)
• multi-hit proteins (increase prob)
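A rough sketch of how an NSP-based adjustment can push a peptide probability down, leave it alone, or push it up. It uses a plain Bayes update with a likelihood ratio per NSP bin. The three ratios are invented for illustration and are not ProteinProphet's learned values.

```python
# Hypothetical likelihood ratios p(NSP bin | correct) / p(NSP bin | incorrect).
NSP_RATIO = {
    "one-hit wonder": 0.2,   # seen more often among incorrect peptides
    "in between": 1.0,       # uninformative
    "multi-hit": 5.0,        # seen more often among correct peptides
}


def adjust_for_nsp(p, likelihood_ratio):
    """Bayes update of a peptide probability p given its NSP bin's ratio."""
    weighted = p * likelihood_ratio
    return weighted / (weighted + (1.0 - p))


for label, ratio in NSP_RATIO.items():
    print(label, round(adjust_for_nsp(0.65, ratio), 3))
```

With a ratio of 1 the probability is unchanged, which matches the "keep same" region between the two distributions.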

Page 40: Reporting Protein Identifications  from MS/MS Results

[Figure: Actual Probability (y-axis) vs. Computed Probability (x-axis), with regions labeled UNDERestimation and OVERestimation.]

Nesvizhskii, A. I.; Keller, A. et al. Anal. Chem. 75, 4646-4658

Page 41: Reporting Protein Identifications  from MS/MS Results

[Figure: Actual Probability (y-axis) vs. Computed Probability (x-axis); curves: with NSP and without NSP.]

Nesvizhskii, A. I.; Keller, A. et al. Anal. Chem. 75, 4646-4658

Page 42: Reporting Protein Identifications  from MS/MS Results

Brian, I hate math. What do I do?

Page 43: Reporting Protein Identifications  from MS/MS Results

Option 1: Throw Out One-Hit-Wonders

Advantages: Easy, works!

Disadvantages: Loss of sensitivity!

Page 44: Reporting Protein Identifications  from MS/MS Results

Option 2: Use Multiple Filters

Filter 1 - Protein Mode
• ≥2 peptides/protein
• moderate spectrum threshold

Filter 2 - Peptide Mode
• 1 peptide/protein
• high spectrum threshold
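The two filters can be sketched as a single accept/reject rule. The cutoffs 2.0 and 4.0 are placeholders picked for this example; real thresholds depend on the search engine and its score scale.

```python
def passes_filters(peptide_scores, moderate=2.0, high=4.0):
    """Accept a protein if it passes either filter.

    Protein mode: at least two peptides at or above the moderate threshold.
    Peptide mode: a single peptide at or above the high threshold.
    Both thresholds are hypothetical values for illustration.
    """
    protein_mode = sum(score >= moderate for score in peptide_scores) >= 2
    peptide_mode = any(score >= high for score in peptide_scores)
    return protein_mode or peptide_mode


print(passes_filters([2.5, 2.1]))  # two moderate peptides
print(passes_filters([3.0]))       # one moderate peptide only
print(passes_filters([4.5]))       # one very strong peptide
```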

Page 45: Reporting Protein Identifications  from MS/MS Results

Option 2: Use Multiple Filters

Advantages: More sensitive!

Disadvantages: Pretty arbitrary!

Page 46: Reporting Protein Identifications  from MS/MS Results

Option 3:

• Assigning Proteins from Peptide IDs

• Correcting for One-Hit-Wonders

• Protein False Discovery Rates?

• Correcting for Shared Peptides

• Publication Standards

Page 47: Reporting Protein Identifications  from MS/MS Results

#   Accession    Protein Score
1   P0ABH7       4258.08
2   P0ABJ9       2423.84
3   P0A7S3       1670.86
4   P0ACF0       1230.35
5   P0AES0        896.12
6   P21165        702.89
7   P0AG59        524.04
8   P17952        409.74
9   P08997        327.85
10  rev_P76577    276.03
11  P41407        246.88
12  P39177        219.44
13  P37689        195.37
14  P0A951        177.02
15  P0AGG4        164.52
16  P29131        153.92
17  rev_P0AEQ1    146.86
18  rev_P09155    140.07
19  P0A9S5        132.29
20  P0AE45        125.41
21  P77718        120.12
22  P76115        116.15
23  rev_P76463    111.37
24  rev_P0A6E4    107.58
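A sketch of walking down this ranked protein list and asking how many target proteins can be kept at a chosen FDR. It assumes rev_ accessions are decoys and estimates FDR as decoys over targets at each rank. The helper name targets_at_fdr is made up.

```python
# Accessions in rank order (best protein score first), as in the list above.
RANKED = [
    "P0ABH7", "P0ABJ9", "P0A7S3", "P0ACF0", "P0AES0", "P21165",
    "P0AG59", "P17952", "P08997", "rev_P76577", "P41407", "P39177",
    "P37689", "P0A951", "P0AGG4", "P29131", "rev_P0AEQ1", "rev_P09155",
    "P0A9S5", "P0AE45", "P77718", "P76115", "rev_P76463", "rev_P0A6E4",
]


def targets_at_fdr(ranked, max_fdr):
    """Largest number of target proteins whose running FDR stays <= max_fdr."""
    decoys = targets = best = 0
    for accession in ranked:
        if accession.startswith("rev_"):
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best = targets
    return best


print(targets_at_fdr(RANKED, 0.01))
print(targets_at_fdr(RANKED, 0.12))
```

With so few proteins, a single decoy moves the estimate by several percent, so the cutoff is very coarse.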

Page 49: Reporting Protein Identifications  from MS/MS Results

Protein FDRs only accurate with >100 Proteins

[Figure: Uncertainty in Protein FDR (y-axis) vs. Number of Confidently IDed Proteins (x-axis), with the 1% Error In FDR Estimation level marked.]

Page 50: Reporting Protein Identifications  from MS/MS Results

[Figure: Histogram of Decoy PROTEIN Matches. x-axis: Protein Score; y-axis: # Protein Identifications; series: “Correct” and “2x Decoy”.]

Page 51: Reporting Protein Identifications  from MS/MS Results

Instantaneous Protein FDRs…

• Estimate the likelihood that a single protein of interest is present
• Are unreliable at best because of stochastic sampling
• Shouldn’t be used with <500 likely proteins
– Better off calculating protein probabilities using a model like ProteinProphet

Page 52: Reporting Protein Identifications  from MS/MS Results

Proteins don’t exist in isolation

Page 53: Reporting Protein Identifications  from MS/MS Results

Outline

• Assigning Proteins from Peptide IDs

• Correcting for One-Hit-Wonders

• Protein False Discovery Rates?

• Correcting for Shared Peptides

• Publication Standards

Page 54: Reporting Protein Identifications  from MS/MS Results

Nesvizhskii, A. I.; Aebersold, R. Mol. Cell. Proteom. 4.10, 1419-1440, 2005

Page 57: Reporting Protein Identifications  from MS/MS Results

Peptide YMACCLLYR: 85%

Tubulin alpha 3: ??%
Tubulin alpha 4: ??%
Tubulin alpha 6: ??%

Page 58: Reporting Protein Identifications  from MS/MS Results

Peptide YMACCLLYR: 85%

Tubulin alpha 3: 85% / 3
Tubulin alpha 4: 85% / 3
Tubulin alpha 6: 85% / 3

Nesvizhskii, A. I.; Keller, A. et al. Anal. Chem. 75, 4646-4658
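Reading the slide as an even three-way split of the shared peptide's 85%, the bookkeeping looks like this. The even split is an assumed starting point for illustration; a real model may reweight the shares once other evidence favors one protein.

```python
def split_shared_peptide(prob, proteins):
    """Divide a shared peptide's probability evenly among candidate proteins."""
    share = prob / len(proteins)
    return {name: share for name in proteins}


shares = split_shared_peptide(
    0.85, ["Tubulin alpha 3", "Tubulin alpha 4", "Tubulin alpha 6"]
)
for name, share in shares.items():
    print(name, round(share, 3))
```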

Page 59: Reporting Protein Identifications  from MS/MS Results

Peptides: YMACCLLYR, SIQFVDWCPTGFK

Tubulin alpha 3: ??%
Tubulin alpha 4: ??%
Tubulin alpha 6: ??%

Page 61: Reporting Protein Identifications  from MS/MS Results

Distinct Proteins

Protein A: Peptide 1 (100%), Peptide 2 (100%)
Protein B: Peptide 3 (100%), Peptide 4 (100%)

Page 62: Reporting Protein Identifications  from MS/MS Results

Indistinguishable Proteins

Protein A: Peptide 1 (50%), Peptide 2 (50%), Peptide 3 (50%), Peptide 4 (50%)
Protein B: Peptide 1 (50%), Peptide 2 (50%), Peptide 3 (50%), Peptide 4 (50%)

Page 63: Reporting Protein Identifications  from MS/MS Results

Differentiable Proteins

Protein A: Peptide 1 (100%), Peptide 2 (50%), Peptide 3 (50%)
Protein B: Peptide 2 (50%), Peptide 3 (50%), Peptide 4 (100%)

Page 64: Reporting Protein Identifications  from MS/MS Results

Subset Proteins

Protein A: Peptide 1 (100%), Peptide 2 (100%), Peptide 3 (100%), Peptide 4 (100%)
Protein B: Peptide 2 (0%), Peptide 3 (0%), Peptide 4 (0%)
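The four cases (distinct, indistinguishable, differentiable, subset) come down to set relationships between the two proteins' peptide lists. A minimal sketch, with the function name classify made up for this example:

```python
def classify(peptides_a, peptides_b):
    """Name the relationship between two proteins from their peptide sets."""
    a, b = set(peptides_a), set(peptides_b)
    if not a & b:
        return "distinct"           # no shared peptides
    if a == b:
        return "indistinguishable"  # identical evidence
    if a < b or b < a:
        return "subset"             # one protein adds nothing new
    return "differentiable"         # shared peptides plus unique ones on each side


print(classify({1, 2}, {3, 4}))
print(classify({1, 2, 3, 4}, {1, 2, 3, 4}))
print(classify({1, 2, 3}, {2, 3, 4}))
print(classify({1, 2, 3, 4}, {2, 3, 4}))
```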

Page 66: Reporting Protein Identifications  from MS/MS Results

Indistinguishable

Page 67: Reporting Protein Identifications  from MS/MS Results

Differentiable

Page 68: Reporting Protein Identifications  from MS/MS Results

Subset

Page 69: Reporting Protein Identifications  from MS/MS Results

The Quantitative Subset Complication

Protein A: Peptide 1, Peptide 2, Peptide 3, Peptide 4
Protein B: Peptide 2, Peptide 3, Peptide 4

Page 71: Reporting Protein Identifications  from MS/MS Results

The Quantitative Subset Complication

Protein A: Peptide 1, Peptide 2, Peptide 3, Peptide 4
Protein B: Peptide 2, Peptide 3, Peptide 4

?

Page 73: Reporting Protein Identifications  from MS/MS Results

EAFIDHGEEFSGR GSFPMAEK

NLGMGK

Specific to 2c29; Specific to 2c40; Common to both

Ratio ≈ 1.1

P450 2c40 P450 2c29

Ratio ≈ 1.6 Ratio ≈ 2.2

Page 75: Reporting Protein Identifications  from MS/MS Results

The Hidden Subset Complication

Protein A: Peptide 1, Peptide 2
Protein B: Peptide 2, Peptide 3
Protein C: Peptide 3, Peptide 4

Page 76: Reporting Protein Identifications  from MS/MS Results

The Hidden Subset Complication

Protein A: Peptide 1 (100%), Peptide 2
Protein B: Peptide 2, Peptide 3
Protein C: Peptide 3, Peptide 4 (100%)

Page 77: Reporting Protein Identifications  from MS/MS Results

The Hidden Subset Complication

Protein A: Peptide 1 (100%), Peptide 2 (100%)
Protein B: Peptide 2 (0%), Peptide 3 (0%)
Protein C: Peptide 3 (100%), Peptide 4 (100%)
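The hidden subset can be illustrated with a greedy Occam's razor pass: keep picking the protein that explains the most still-unexplained peptides. This is a sketch of a generic greedy set cover, not the method of any particular tool, and the A/B/C peptide assignment is read off the garbled slide layout.

```python
def parsimonious_proteins(protein_to_peptides):
    """Greedy set cover: the fewest proteins that explain every peptide."""
    unexplained = set().union(*protein_to_peptides.values())
    chosen = []
    while unexplained:
        # Sorting first makes ties resolve alphabetically and reproducibly.
        best = max(
            sorted(protein_to_peptides),
            key=lambda name: len(protein_to_peptides[name] & unexplained),
        )
        chosen.append(best)
        unexplained -= protein_to_peptides[best]
    return chosen


proteins = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4}}
print(parsimonious_proteins(proteins))
```

Protein B never gets picked because A and C between them already account for Peptides 2 and 3, even though B is not a subset of either one alone.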

Page 78: Reporting Protein Identifications  from MS/MS Results

The Bold Red Complication

Protein A: Peptide 1, Peptide 2, Peptide 3, Peptide 4
Protein B: Peptide 3, Peptide 4, Peptide 5

Page 79: Reporting Protein Identifications  from MS/MS Results

The Bold Red Complication

Protein A: Peptide 1 (100%), Peptide 2 (100%), Peptide 3 (100%), Peptide 4 (100%)
Protein B: Peptide 3 (0%), Peptide 4 (0%), Peptide 5 (100%)

Page 81: Reporting Protein Identifications  from MS/MS Results

The Bold Red Complication

Protein A: Peptide 1, Peptide 2, Peptide 3, Peptide 4
Protein B: Peptide 3, Peptide 4, Peptide 5

Protein Identification         Unique Peptides       Trust
Family of A and B              5 Unique, 5 Total     High
Definitive ID of Protein A     2 Unique, 4 Total     Med
Definitive ID of Protein B     1 Unique, 3 Total     Low
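The unique and total counts in this table can be reproduced from the two peptide lists. A small sketch, with the helper name peptide_counts invented for this example:

```python
def peptide_counts(protein_to_peptides):
    """Return {protein: (unique peptide count, total peptide count)}."""
    counts = {}
    for name, peptides in protein_to_peptides.items():
        others = set().union(
            *(p for other, p in protein_to_peptides.items() if other != name)
        )
        counts[name] = (len(peptides - others), len(peptides))
    return counts


proteins = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}}
print(peptide_counts(proteins))

# Treated as one family, every observed peptide counts toward the group.
family = set().union(*proteins.values())
print(len(family))
```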

Page 82: Reporting Protein Identifications  from MS/MS Results

The Similar Peptide Complication

AVGNLR

Scan Number: 2435

GLGNLR

Page 83: Reporting Protein Identifications  from MS/MS Results

The Similar Peptide Complication

AVGNLR

Scan Number: 2435 TLR9_HUMAN

GLGNLR

TRFE_HUMAN

LRFN1_HUMAN

Page 84: Reporting Protein Identifications  from MS/MS Results

The Similar Peptide Complication

AVGNLR

Scan Number: 2435 TLR9_HUMAN

TRFE_HUMAN

LRFN1_HUMAN
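One plausible reading of why AVGNLR and GLGNLR compete for the same scan: Ala + Val and Gly + Leu add up to the same mass, so the two peptides have the same precursor mass. The sketch below checks that with standard monoisotopic residue masses, rounded to five decimals.

```python
# Monoisotopic residue masses in daltons (only the residues needed here).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "V": 99.06841,
    "L": 113.08406, "N": 114.04293, "R": 156.10111,
}
WATER = 18.01056  # added once per peptide for the terminal H and OH


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


for seq in ("AVGNLR", "GLGNLR"):
    print(seq, round(peptide_mass(seq), 4))
```

Precursor mass cannot separate the two sequences, so the call rests on the fragment ions at the N-terminal end of the peptide.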

Page 85: Reporting Protein Identifications  from MS/MS Results

No software deals with all of these issues

Page 86: Reporting Protein Identifications  from MS/MS Results

Outline

• Assigning Proteins from Peptide IDs

• Correcting for One-Hit-Wonders

• Protein False Discovery Rates?

• Correcting for Shared Peptides

• Publication Standards

Page 87: Reporting Protein Identifications  from MS/MS Results

Publication Standards

• In 2006 MCP published guidelines for reporting peptide and protein identifications

• Other proteomics journals have adopted similar standards

• Revised “Paris 2” guidelines are forthcoming
– Expected to be enforced 1/1/2010!

Page 90: Reporting Protein Identifications  from MS/MS Results

Guidelines remind you:

• To present a complete methods/results section
– I. Search Parameters and Acceptance Criteria
– VI. Raw Data Submission

• To follow smart criteria for choosing results to publish
– II. Protein and Peptide Identification
– IV. Protein Inference from Peptide Assignments
– V. Quantification

• To not over-report your results
– III. Post-Translational Modifications

Page 91: Reporting Protein Identifications  from MS/MS Results

Software Can Make Guideline Fulfillment Easier

• Peak picking software, version, altered parameters

• Database Selection
– Database name and version
– Species restriction
– Number of proteins searched

• Database search parameters
– Search engine name and version
– Enzyme specificity
– # missed cleavages
– Fixed/variable modifications
– Mass tolerances

• Peptide selection criteria
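The list above works as a checklist that software could run over a draft methods section. In the sketch below the field names are invented for illustration and are not taken from the guidelines or from any file format.

```python
# Hypothetical keys, one per item on the slide.
REQUIRED = [
    "peak_picking_software", "database_name", "database_version",
    "species_restriction", "proteins_searched", "search_engine",
    "search_engine_version", "enzyme_specificity", "missed_cleavages",
    "modifications", "mass_tolerances", "peptide_selection_criteria",
]


def missing_items(methods):
    """List the required items a methods record leaves unstated."""
    # A value of 0 is a real answer (e.g. zero missed cleavages), so only
    # absent or empty-string entries count as missing.
    return [key for key in REQUIRED if methods.get(key) in (None, "")]


draft = {key: "stated" for key in REQUIRED}
draft["missed_cleavages"] = 0
del draft["database_version"]
print(missing_items(draft))
```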

Page 92: Reporting Protein Identifications  from MS/MS Results

XML Standards Can Make Guideline Fulfillment Easier

I. Search Parameters and Acceptance Criteria

II. Protein and Peptide Identification

III. Post-Translational Modifications

IV. Protein Inference from Peptide Assignments

V. Quantification

VI. Raw Data Submission

mzIdentML

mzML

http://www.psidev.info/

Page 94: Reporting Protein Identifications  from MS/MS Results

Where are they?

http://www.mcponline.org/misc/ParisReport_Final.dtl

Molecular & Cellular Proteomics: Bradshaw, R. A., Burlingame, A. L., Carr, S., Aebersold, R., Reporting Protein Identification Data: The next Generation of Guidelines. Mol. Cell. Proteomics, 5:787-788, 2006.

Journal of Proteome Research: Beavis, R., Editorial: The Paris Consensus. J. Proteome Res., 2005, 4 (5), p 1475

Proteomics: Wilkins, M. R., Appel, R. D., Van Eyk, J. E., Maxey, C. M., et al., Guidelines for the next 10 years of proteomics. Proteomics. 2006, 6, 1, 4-8.

Page 100: Reporting Protein Identifications  from MS/MS Results

Conclusions

• We identify Proteins (not Peptides)!
– Can’t stop at Peptide FDRs and Probabilities

• One-Hit-Wonders are often wrong and need to be seriously investigated (manually or mathematically)

• You can compute Protein level FDRs
– But take them with a grain of salt!

• Occam’s Razor can simplify Shared Peptides

• Publication Standards exist to help you