ASMS 2005
Protein Identification by Database Searching
John Cottrell, Matrix Science
Three ways to use mass spectrometry data for protein ID:
1. Peptide Mass Fingerprint
A set of peptide molecular weights from an enzyme digest of a protein
Henzel, W. J., Billeci, T. M., Stults, J. T., Wong, S. C., Grimley, C. and Watanabe, C. (1993). Proc Natl Acad Sci USA 90, 5011-5.
James, P., Quadroni, M., Carafoli, E. and Gonnet, G. (1993). Biochem Biophys Res Commun 195, 58-64.
Mann, M., Hojrup, P. and Roepstorff, P. (1993). Biol Mass Spectrom 22, 338-45.
Pappin, D. J. C., Hojrup, P. and Bleasby, A. J. (1993). Curr. Biol. 3, 327-32.
Yates, J. R., 3rd, Speicher, S., Griffin, P. R. and Hunkapiller, T. (1993). Anal Biochem 214, 397-408.
1993
Peptide Mass Fingerprint
• Fast, simple analysis
• High sensitivity
• Need database of protein sequences, not ESTs or genomic DNA
• Sequence (or close homolog) must be present in database
• Not good for mixtures, especially a minor component.
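A peptide mass fingerprint comparison can be sketched roughly as follows. This is a minimal illustration, not the method of any particular search engine: it assumes trypsin as the enzyme, no missed cleavages, no modifications, and neutral monoisotopic masses. The function names and the 0.2 Da tolerance are hypothetical choices made for the example.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, sequence, tol=0.2):
    """Count observed masses within tol (Da) of any theoretical peptide mass."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(m - t) <= tol for t in theoretical) for m in observed_masses
    )
```

A raw count of matching masses is only a starting point; it says nothing by itself about whether the match could have arisen by chance.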
[Figure: protonated peptide backbone with side chains R1 to R4, showing the cleavage sites that give the N-terminal fragment ions a1, b1, c1 … a3, b3, c3 and the complementary C-terminal fragment ions x3, y3, z3 … x1, y1, z1]
Roepstorff, P. and Fohlman, J. (1984). Proposal for a common nomenclature for sequence ions in mass spectra of peptides. Biomed Mass Spectrom 11, 601.
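For the two series most often used in database searching, b and y, the fragment masses can be computed from the sequence. The sketch below assumes singly charged ions, monoisotopic masses and no modifications; the function name `b_y_ions` is hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def b_y_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_i contains the first i residues; y_i contains the last i residues
    plus water. Index 0 of each list is b1 or y1.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions
```

Differences between consecutive members of one series equal residue masses, which is what allows a stretch of sequence to be read from a spectrum.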
Three ways to use mass spectrometry data for protein ID:
1. Peptide Mass Fingerprint
A set of peptide molecular weights from an enzyme digest of a protein
2. Sequence Query
Mass values combined with amino acid sequence or composition data
Mann, M. and Wilm, M. (1994). Error-tolerant identification of peptides in sequence databases by peptide sequence tags. Anal Chem 66, 4390-9.
[Figure: sequence tag example; residue labels T, A, G and mass values 913.2 and 1278.3]
Sequence Tag
• Rapid search times
• Error tolerant
• Requires interpretation
• Requires high quality data.
Three ways to use mass spectrometry data for protein ID:
1. Peptide Mass Fingerprint
A set of peptide molecular weights from an enzyme digest of a protein
2. Sequence Query
Mass values combined with amino acid sequence or composition data
3. MS/MS Ions Search
MS/MS data from a single peptide or from a complete LC-MS/MS run
Eng, J. K., McCormack, A. L. and Yates, J. R., 3rd (1994). An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976-89.
MS/MS Ions Search
• Easily automated for high throughput
• Get matches from marginal data
• Can be slow
– No enzyme
– Lots of variable modifications
– Large database
– Large dataset
• Peptide identification, proteins by inference.
MS/MS matching identifies peptides, not proteins
• Grouping peptide matches into protein matches is an arbitrary procedure
Protein A: Peptide 1, Peptide 2, Peptide 3
Protein B: Peptide 1, Peptide 3
Protein C: Peptide 2
• If peptides 1, 2 and 3 are matched from a 2D gel spot, Mascot will prefer Protein A (Occam’s razor)
• But the sample could easily have been a mixture of B and C.
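One simple way to express the Occam's razor preference is a greedy minimal-set selection: repeatedly pick the protein that explains the most still-unexplained peptides. The sketch below is an illustration of that idea only and is not claimed to be Mascot's actual grouping procedure; the function name and the tie-breaking rule (alphabetical order) are assumptions.

```python
def minimal_protein_set(protein_peptides, observed):
    """Greedily choose proteins until every observed peptide is explained.

    protein_peptides maps a protein name to the set of peptides it contains.
    Ties are broken by protein name so the result is deterministic.
    """
    remaining = set(observed)
    chosen = []
    while remaining:
        best = max(
            sorted(protein_peptides),
            key=lambda name: len(protein_peptides[name] & remaining),
        )
        covered = protein_peptides[best] & remaining
        if not covered:
            break  # some observed peptides match no protein at all
        chosen.append(best)
        remaining -= covered
    return chosen


proteins = {
    "Protein A": {"Peptide 1", "Peptide 2", "Peptide 3"},
    "Protein B": {"Peptide 1", "Peptide 3"},
    "Protein C": {"Peptide 2"},
}
observed = {"Peptide 1", "Peptide 2", "Peptide 3"}
result = minimal_protein_set(proteins, observed)  # ["Protein A"]
```

Removing Protein A from the database would make the same function return Protein B and Protein C, which shows that the peptide evidence alone cannot distinguish the two explanations.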
BLAST / FASTA
• Sequence against sequence
• Can be used to find weak / distant similarity
• Can make gapped alignments
MS-based ID
• Mass & intensity values against sequence
• Looking for identity or near identity
• Generally, short peptides
What is probability based scoring?
We compute the probability that the observed match between the experimental data and mass values calculated from a candidate protein or peptide sequence is a random event.
The ‘correct’ match, which is not a random event, has a very low probability.
Why is probability based scoring important?
• Human (even expert) judgement is subjective and can be unreliable.
Why is probability based scoring important?
• Human (even expert) judgement is subjective and can be unreliable
• Standard, statistical tests of significance can be applied to the results.
Standard significance tests can be applied to results
• Mascot score is -10Log10(P), where P is the absolute probability that the observed match is a random event
• If we make 50,000 trials, a 1 in 20 significance threshold is -10Log10(1 / (20 x 50,000)) = 60 … “identity”
• If data quality is poor, this may not be achievable. If the match is clearly an outlier, also report a lower, empirical threshold … “homology”
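The threshold arithmetic above can be reproduced in a few lines. This sketch only restates the formula on the slide; the function names are hypothetical, and the number of trials is taken as the number of candidate sequences tested against the data.

```python
import math


def score_from_probability(p):
    """Score defined as -10 * log10(P), with P the chance-match probability."""
    return -10 * math.log10(p)


def identity_threshold(num_trials, alpha=1 / 20):
    """Score a match must exceed to be significant at level alpha.

    With num_trials independent trials, a single match must have
    P < alpha / num_trials for fewer than alpha chance matches to be expected.
    """
    return score_from_probability(alpha / num_trials)


threshold = identity_threshold(50_000)  # approximately 60
```

Because the threshold depends on the number of trials, the same score can be significant in a search of a small database and not significant in a search of a large one.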
Why is probability based scoring important?
• Human (even expert) judgement is subjective and can be unreliable
• Standard, statistical tests of significance can be applied to the results
• Arbitrary scoring schemes are susceptible to false positives.
Major proteomics study published in Nature, 2002
• 11,381 peptides
• 2,415 proteins
• Matches to fully non-tryptic peptides discarded
• Overall fraction of semi-tryptic peptides 34%
• For proteins identified using
– 1 peptide: 63% semi-tryptic
– 2 peptides: 54% semi-tryptic
– 3 peptides: 46% semi-tryptic.
Can we calculate a probability that the match is correct?
• Maybe, if it is a test sample and you know what the answer should be
• If the sample is an unknown, then you have to define “correct” very carefully:
– The best match in the database?
– The best match out of all possible peptides?
– The peptide sequence that is uniquely and completely defined by the MS data?
– A statistically unlikely match?
[Figure: four example matches labelled with expect values 1.8E-5, 9.2E-4, 0.037 and 4.0]