Peptide Identification by Tandem Mass Spectrometry
Behshad Behzadi
April 2005
Outline
• Proteomics
• Tandem Mass Spectrometry
• Peptide Identification Problem
• Identification Via Database
• De novo peptide identification
Proteomics
• The systematic analysis of the proteins expressed by a cell or tissue.
• Identification, quantification, interactions, …
• Tandem mass spectrometry is an essential tool for identifying (and quantifying) the proteins in a mixture.
Proteins
• The primary structure of a protein is a sequence over an alphabet of 20 amino acids.
Amino Acids
Tandem Mass Spectrum: An Example
Secondary Fragmentation
Ionized parent peptide
What is the goal ?
• Spectrum → Peptide sequence
Protein Backbone
H...-HN-CH-CO-NH-CH-CO-NH-CH-CO-…OH
R_{i-1}   R_i   R_{i+1}
AA residue i-1   AA residue i   AA residue i+1
N-terminus C-terminus
Breaking of Protein Backbone
H...-HN-CH-CO NH-CH-CO-NH-CH-CO-…OH
R_{i-1}   R_i   R_{i+1}
AA residue i-1   AA residue i   AA residue i+1
N-terminus C-terminus
H+
How Does a Peptide Fragment?
For a peptide A1 A2 A3 A4:
m(y1) = 19 + m(A4)
m(y2) = 19 + m(A4) + m(A3)
m(y3) = 19 + m(A4) + m(A3) + m(A2)
m(b1) = 1 + m(A1)
m(b2) = 1 + m(A1) + m(A2)
m(b3) = 1 + m(A1) + m(A2) + m(A3)
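The b- and y-ion relations above can be sketched in a few lines of Python. This is only an illustration: the residue mass table is a small assumed subset (rounded monoisotopic values), the offsets 1 and 19 are the ones given on the slide, and the names `RESIDUE_MASS`, `b_ion_masses` and `y_ion_masses` are hypothetical.

```python
# Rounded monoisotopic residue masses (Da) for a few amino acids.
# Assumption: illustrative subset only, not the full table of 20.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "L": 113.08406,
}


def b_ion_masses(peptide, offset=1.0):
    """m(b_k) = offset + sum of the first k residue masses, k = 1..n-1."""
    masses, total = [], offset
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def y_ion_masses(peptide, offset=19.0):
    """m(y_k) = offset + sum of the last k residue masses, k = 1..n-1."""
    masses, total = [], offset
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


if __name__ == "__main__":
    for name, values in (("b", b_ion_masses("GASP")), ("y", y_ion_masses("GASP"))):
        print(name, [round(v, 3) for v in values])
```

With these offsets, a complementary pair b_k and y_(n-k) always sums to 20 plus the total residue mass of the peptide, which is a handy consistency check when reading a spectrum.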
The identification Algorithms
• Database Search Algorithms (Sequest, Mascot, …)
• De novo Algorithms (Lutefisk, Peaks,…)
Database Search Algorithms
• Interpreting the tandem mass spectral data by searching a protein database.
• SEQUEST (Eng et al. 1994)
• Mascot (Perkins et al. 1999)
• ProteinProspector (Clauser et al. 1999)
SEQUEST (Eng et al. 94)
• The protein database is searched for candidate amino acid sequences whose mass matches the parent peptide mass within a tolerance of 1.
• Produce the theoretical spectra for the candidates.
• Match the theoretical and experimental spectrum using a score function (Xcorr)
• Rank the candidates using this score.
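The four steps above (filter by parent mass, generate theoretical spectra, score, rank) can be sketched as follows. This is a toy illustration, not SEQUEST itself: the score is a simple shared-peak count used as a stand-in for Xcorr, the residue table is an assumed subset, the fragment offsets 1 and 19 follow the earlier slide, and all function names are hypothetical.

```python
# Assumption: small illustrative subset of rounded monoisotopic residue masses (Da).
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                "P": 97.05276, "V": 99.06841, "L": 113.08406}
WATER = 18.01056  # a neutral peptide is its residues plus one water


def peptide_mass(peptide):
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def theoretical_spectrum(peptide):
    """b and y fragment masses, using the offsets 1 and 19 from the slides."""
    peaks = []
    for k in range(1, len(peptide)):
        peaks.append(1.0 + sum(RESIDUE_MASS[aa] for aa in peptide[:k]))
        peaks.append(19.0 + sum(RESIDUE_MASS[aa] for aa in peptide[k:]))
    return sorted(peaks)


def shared_peak_count(theoretical, observed, tol=0.5):
    """Stand-in score (NOT Xcorr): theoretical peaks matched by an observed peak."""
    return sum(any(abs(t - o) <= tol for o in observed) for t in theoretical)


def search(database, observed, parent_mass, mass_tol=1.0):
    """Filter candidates by parent mass, score each one, rank best first."""
    candidates = [p for p in database
                  if abs(peptide_mass(p) - parent_mass) <= mass_tol]
    scored = [(shared_peak_count(theoretical_spectrum(p), observed), p)
              for p in candidates]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    database = ["GASP", "GAPS", "SAGP", "VLGA", "GASPV"]
    observed = [58.0, 116.1, 129.1, 150.0, 203.1, 216.1, 274.1]
    for score, peptide in search(database, observed, peptide_mass("GASP")):
        print(peptide, score)
```

The mass filter is what keeps the search tractable: only peptides that could have produced the parent ion are ever scored against the spectrum.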
Other probabilistic models for scores
• Qin et al. (1997)
• Dancik et al. (2000)
• Bafna and Edwards (2001)
Why do we need de novo?
• Unknown genomes of certain organisms.
• The sequences in the protein database are not accurate.
• Modifications in Amino Acids: RNA editing, Post-Translational Modifications
Methods
• Tree Based Search (Taylor et al. 97)
• Spectrum Graph Based Search (Dancik et al. 99)
• Dynamic Programming Algorithm (Chen et al. 2001)
• AuDeNS (Baginsky et al. 02)
• Sub-Optimal Algorithm (Lu and Chen 03)
• …
De Novo Identification
• Given a spectrum S and a defined scoring function f(), find a peptide sequence q that maximizes f(S|q).
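To make the problem statement concrete, here is a brute-force sketch that enumerates every sequence q over a tiny assumed alphabet whose mass fits the parent mass, and keeps the one maximizing a toy f(S|q). It is not one of the published algorithms listed above (those avoid exhaustive enumeration); the scoring function, the four-letter alphabet and all names are assumptions made for illustration.

```python
# Assumption: a four-letter alphabet with rounded monoisotopic residue masses (Da).
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276}
WATER = 18.01056


def fragments(q):
    """b and y fragment masses of q, with the offsets 1 and 19 from the slides."""
    peaks = []
    for k in range(1, len(q)):
        peaks.append(1.0 + sum(RESIDUE_MASS[aa] for aa in q[:k]))
        peaks.append(19.0 + sum(RESIDUE_MASS[aa] for aa in q[k:]))
    return peaks


def f(S, q, tol=0.5):
    """Toy scoring function: number of fragments of q that explain a peak of S."""
    return sum(any(abs(t - s) <= tol for s in S) for t in fragments(q))


def de_novo(S, parent_mass, tol=0.5):
    """Exhaustively search for the sequence q that maximizes f(S|q)."""
    target = parent_mass - WATER
    best = (-1, "")

    def extend(prefix, mass):
        nonlocal best
        if prefix and abs(mass - target) <= tol:
            score = f(S, prefix)
            if score > best[0]:
                best = (score, prefix)
        for aa, m in RESIDUE_MASS.items():
            if mass + m <= target + tol:  # prune prefixes that are already too heavy
                extend(prefix + aa, mass + m)

    extend("", 0.0)
    return best


if __name__ == "__main__":
    S = [58.0, 116.1, 129.1, 203.1, 216.1, 274.1]
    print(de_novo(S, parent_mass=330.15392))
```

The enumeration grows exponentially with peptide length, which is why the practical methods rely on spectrum graphs or dynamic programming instead.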
AuDeNS
• Uses "grass mowers" to preprocess the spectrum, then applies the dynamic programming approach.
• Computes a relevance for each peak by using different mowers.
• Applies a weighted version of the Chen et al. algorithm (DP).
Mowers
• Threshold Mower
• Window Mower
• Isotope Mower
• Intersection Mower
• Complement Mower
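The slides only name the mowers, so the following is a guess at what two of them might look like, shown purely to illustrate the idea of turning peaks into relevance values. The behaviour of `threshold_mower` and `window_mower`, the 0/1 relevance scale and the averaging in `combine` are all assumptions, not the AuDeNS definitions.

```python
def threshold_mower(peaks, threshold):
    """Assumed behaviour: relevance 1.0 for peaks at or above the threshold, else 0.0."""
    return [1.0 if intensity >= threshold else 0.0 for _, intensity in peaks]


def window_mower(peaks, width, keep):
    """Assumed behaviour: relevance 1.0 if the peak is among the `keep` most
    intense peaks lying within width/2 of its own m/z, else 0.0."""
    relevance = []
    for mz, intensity in peaks:
        neighbours = sorted(
            (i for m, i in peaks if abs(m - mz) <= width / 2), reverse=True
        )
        cutoff = neighbours[min(keep, len(neighbours)) - 1]
        relevance.append(1.0 if intensity >= cutoff else 0.0)
    return relevance


def combine(*relevances):
    """Average the relevance vectors produced by several mowers."""
    return [sum(values) / len(values) for values in zip(*relevances)]


if __name__ == "__main__":
    # Peaks are (m/z, intensity) pairs; the values are made up.
    peaks = [(100.0, 5.0), (101.0, 50.0), (102.0, 8.0), (300.0, 6.0)]
    by_threshold = threshold_mower(peaks, threshold=7.0)
    by_window = window_mower(peaks, width=10.0, keep=1)
    print(combine(by_threshold, by_window))
```

The combined relevance could then serve as the peak weight in a weighted dynamic programming step.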
Summary: De novo Sequencing
[Figure: annotated MS/MS spectrum. Header: S#: 1708 RT: 54.47 AV: 1 NL: 5.27E6 T: + c d Full ms2 638.00 [165.00 - 1925.00]. x-axis: m/z (200 to 2000); y-axis: Relative Abundance (0 to 100). Labeled peaks: 226.9, 326.0, 397.1, 425.0, 489.1, 524.9, 588.1, 589.2, 629.0, 687.3, 850.3, 851.4, 949.4, 1048.6, 1049.6. Peaks are annotated with the sequence.]
Intensities
• Intensities are the second dimension of information in a spectrum.
• Several factors determine the intensities.
Intensities (2)
• Amino Acid dependent factors,
• Ion type factors,
• Position-based factors (peaks in the middle of the spectrum are higher)
Conclusion
• Tandem mass spectrometry is now the most important tool for identifying proteins.
• Many approaches have been developed, but there is still a long way to go in extracting all the information that mass spectra contain.
Research Themes
• Combining de novo and database methods (e.g., extracting tags)
• Using the intensities
• Dealing better with PTMs (200 types)
• Clustering of high-throughput experiments
• Multi-Dimensional Interpretation.