Peptide Identification by Tandem Mass Spectrometry

Behshad Behzadi

April 2005

Outline

• Proteomics

• Tandem Mass Spectrometry

• Peptide Identification Problem

• Identification Via Database

• De novo peptide identification

Proteomics

• The systematic analysis of the proteins expressed by a cell or tissue.

• Identification, quantification, interactions, …

• Tandem mass spectrometry is an essential tool for identifying (and quantifying) the proteins in a mixture.

Proteins

• The primary structure of a protein is a sequence over the 20-letter alphabet of amino acids.

Amino Acids

Tandem Mass Spectrum: An Example

Secondary Fragmentation

Ionized parent peptide

What is the goal ?

• Spectrum → Peptide sequence

Protein Backbone

H...-HN-CH-CO-NH-CH-CO-NH-CH-CO-…OH

R_{i-1}   R_i   R_{i+1}

AA residue i-1   AA residue i   AA residue i+1

N-terminus C-terminus

Breaking of Protein Backbone

H...-HN-CH-CO NH-CH-CO-NH-CH-CO-…OH

R_{i-1}   R_i   R_{i+1}

AA residue i-1   AA residue i   AA residue i+1

N-terminus C-terminus

H+

How Does a Peptide Fragment?

m(y1) = 19 + m(A4)
m(y2) = 19 + m(A4) + m(A3)
m(y3) = 19 + m(A4) + m(A3) + m(A2)

m(b1) = 1 + m(A1)
m(b2) = 1 + m(A1) + m(A2)
m(b3) = 1 + m(A1) + m(A2) + m(A3)
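The fragment mass formulas above can be checked with a few lines of code. The following is a minimal sketch, assuming singly charged ions and the nominal constants used on the slide (1 for the proton on a b ion, 19 for water plus the proton on a y ion). The function names and the small residue mass table (standard monoisotopic values in Da, only a subset of the 20 amino acids) are illustrative choices, not part of the original slides.

```python
# Monoisotopic residue masses in Da (subset of the 20 amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}


def b_ion_masses(peptide, proton=1.0):
    """m(b_k) = 1 + m(A_1) + ... + m(A_k) for k = 1 .. n-1."""
    masses, total = [], proton
    for aa in peptide[:-1]:          # prefixes, N-terminus first
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def y_ion_masses(peptide, water_plus_proton=19.0):
    """m(y_k) = 19 + m(A_n) + ... + m(A_{n-k+1}) for k = 1 .. n-1."""
    masses, total = [], water_plus_proton
    for aa in reversed(peptide[1:]):  # suffixes, C-terminus first
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


if __name__ == "__main__":
    peptide = "GASK"  # hypothetical four-residue peptide A1 A2 A3 A4
    print("b ions:", [round(m, 3) for m in b_ion_masses(peptide)])
    print("y ions:", [round(m, 3) for m in y_ion_masses(peptide)])
```

A useful sanity check is that complementary ions add up to a constant: for a peptide of n residues, m(b_k) + m(y_{n-k}) equals the sum of all residue masses plus 20.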

The Identification Algorithms

• Database Search Algorithms (Sequest, Mascot, …)

• De novo Algorithms (Lutefisk, Peaks,…)

Database Search Algorithms

• Interpreting the tandem mass spectral data by searching a protein database.

• SEQUEST (Eng et al. 1994)

• Mascot (Perkins et al. 1999)

• ProteinProspector (Clauser et al. 1999)

SEQUEST (Eng et al. 1994)

• The protein database is searched for candidate amino acid sequences whose mass matches the parent peptide mass within a tolerance of 1 Da.

• Theoretical spectra are produced for the candidates.

• Each theoretical spectrum is matched against the experimental spectrum using a score function (Xcorr).

• The candidates are ranked by this score.
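The four steps above can be sketched as follows. This is a minimal illustration and not SEQUEST itself: the in-memory peptide list, the function names, and the shared-peak count used in place of Xcorr are all assumptions made for the example, and only singly charged b and y ions are generated.

```python
# Monoisotopic residue masses in Da (subset, for illustration only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259,
}
WATER = 18.0  # nominal mass of H2O, consistent with the constant 19 = 18 + 1


def peptide_mass(peptide):
    """Neutral peptide mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def theoretical_spectrum(peptide):
    """Singly charged b and y ion masses of a peptide, sorted."""
    total = sum(RESIDUE_MASS[aa] for aa in peptide)
    prefix, peaks = 0.0, []
    for aa in peptide[:-1]:
        prefix += RESIDUE_MASS[aa]
        peaks.append(1.0 + prefix)            # b ion
        peaks.append(19.0 + total - prefix)   # complementary y ion
    return sorted(peaks)


def shared_peak_count(theoretical, experimental, tol=0.5):
    """Stand-in score (NOT Xcorr): theoretical peaks found in the experiment."""
    return sum(any(abs(t - e) <= tol for e in experimental) for t in theoretical)


def search(database, experimental, parent_mass, mass_tol=1.0):
    """Filter candidates by parent mass, score them, and rank best first."""
    candidates = [p for p in database
                  if abs(peptide_mass(p) - parent_mass) <= mass_tol]
    scored = [(shared_peak_count(theoretical_spectrum(p), experimental), p)
              for p in candidates]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


if __name__ == "__main__":
    database = ["GASK", "GSAK", "AGSK", "VLTE", "PEK"]  # hypothetical entries
    spectrum = theoretical_spectrum("GASK")             # noise-free toy spectrum
    for score, peptide in search(database, spectrum, peptide_mass("GASK")):
        print(peptide, score)
```

In this toy run the three isobaric peptides pass the mass filter, and only the fragment-level comparison separates the correct sequence from its permutations, which is the reason the scoring step is needed at all.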

Other probabilistic models for scores

• Qin et al. (1997)

• Dancik et al. (2000)

• Bafna and Edwards (2001)

Why do we need de novo?

• Unknown genomes of certain organisms.

• The sequences in the protein database are not accurate.

• Modifications in Amino Acids: RNA editing, Post-Translational Modifications

Methods

• Tree-Based Search (Taylor et al. 1997)

• Spectrum Graph Based Search (Dancik et al. 1999)

• Dynamic Programming Algorithm (Chen et al. 2001)

• AuDeNS (Baginsky et al. 2002)

• Sub-Optimal Algorithm (Lu and Chen 2003)

• …

De Novo Identification

• Given a spectrum S and a defined scoring function f(), find a peptide sequence q that maximizes f(S|q).
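To make the optimization problem concrete, here is a deliberately naive sketch that enumerates every sequence whose mass matches the parent mass and keeps the one with the highest score. It is not one of the published algorithms listed earlier (those avoid exhaustive enumeration); the four-letter alphabet, the shared-peak scoring function `f`, and all function names are assumptions chosen to keep the example small.

```python
# Monoisotopic residue masses in Da for a toy four-letter alphabet.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841}
WATER = 18.0  # nominal mass of H2O


def theoretical_peaks(peptide):
    """Singly charged b and y ion masses of a peptide."""
    total = sum(RESIDUE_MASS[aa] for aa in peptide)
    prefix, peaks = 0.0, []
    for aa in peptide[:-1]:
        prefix += RESIDUE_MASS[aa]
        peaks += [1.0 + prefix, 19.0 + total - prefix]
    return peaks


def f(spectrum, peptide, tol=0.5):
    """Illustrative f(S|q): number of theoretical peaks of q found in S."""
    return sum(any(abs(t - s) <= tol for s in spectrum)
               for t in theoretical_peaks(peptide))


def de_novo(spectrum, parent_mass, score=f, mass_tol=0.5):
    """Exhaustively search for the sequence q that maximizes score(S, q)."""
    target = parent_mass - WATER
    best_score, best_peptide = float("-inf"), None

    def extend(prefix, mass):
        nonlocal best_score, best_peptide
        if prefix and abs(mass - target) <= mass_tol:
            s = score(spectrum, prefix)
            if s > best_score:
                best_score, best_peptide = s, prefix
        for aa, m in RESIDUE_MASS.items():
            if mass + m <= target + mass_tol:   # prune sequences that are too heavy
                extend(prefix + aa, mass + m)

    extend("", 0.0)
    return best_peptide, best_score


if __name__ == "__main__":
    spectrum = theoretical_peaks("GAVS")  # noise-free toy spectrum
    parent = sum(RESIDUE_MASS[aa] for aa in "GAVS") + WATER
    print(de_novo(spectrum, parent))
```

The search space grows exponentially with peptide length, which is why practical de novo tools rely on spectrum graphs or dynamic programming rather than enumeration.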

AuDeNS

• Uses grass mowers to preprocess the spectrum, then employs the dynamic programming approach.

• Computes a relevance for each peak by using different mowers.

• Applies a weighted version of the Chen et al. algorithm (DP).

Mowers

• Threshold Mower

• Window Mower

• Isotope Mower

• Intersection Mower

• Complement Mower
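The slides only name the mowers, so the following sketch is a guess at how two of them might assign peak relevances, not a description of the AuDeNS implementation. It assumes a threshold mower marks peaks above an intensity cutoff, a window mower marks the strongest peaks in each m/z window, and the relevances are averaged; the function names, parameters, and toy peak list are all hypothetical.

```python
def threshold_mower(peaks, threshold):
    """Relevance 1.0 for peaks with intensity >= threshold, else 0.0."""
    return [1.0 if intensity >= threshold else 0.0 for _, intensity in peaks]


def window_mower(peaks, width, top_k):
    """Relevance 1.0 for the top_k most intense peaks in each m/z window."""
    relevance = [0.0] * len(peaks)
    windows = {}
    for i, (mz, _) in enumerate(peaks):
        windows.setdefault(int(mz // width), []).append(i)
    for indices in windows.values():
        strongest = sorted(indices, key=lambda i: peaks[i][1], reverse=True)
        for i in strongest[:top_k]:
            relevance[i] = 1.0
    return relevance


def combined_relevance(peaks, mowers):
    """Average the relevances assigned by several mowers (one simple choice)."""
    votes = [mower(peaks) for mower in mowers]
    return [sum(v[i] for v in votes) / len(votes) for i in range(len(peaks))]


if __name__ == "__main__":
    # Hypothetical (m/z, intensity) pairs, not taken from a real spectrum.
    peaks = [(100.0, 5.0), (150.0, 50.0), (180.0, 20.0), (320.0, 8.0), (350.0, 2.0)]
    mowers = [
        lambda p: threshold_mower(p, threshold=10.0),
        lambda p: window_mower(p, width=200.0, top_k=1),
    ]
    print(combined_relevance(peaks, mowers))
```

The resulting relevances could then serve as peak weights in a weighted dynamic programming step, in the spirit of the last bullet of the AuDeNS slide.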

Summary: De novo Sequencing

[Figure: example tandem mass spectrum with the peptide sequence overlaid. Header: S#: 1708 RT: 54.47 AV: 1 NL: 5.27E6 T: + c d Full ms2 638.00 [165.00 - 1925.00]. x-axis: m/z (200 to 2000); y-axis: relative abundance (0 to 100). Labeled peaks: 226.9, 326.0, 397.1, 425.0, 489.1, 524.9, 588.1, 589.2, 629.0, 687.3, 850.3, 851.4, 949.4, 1048.6, 1049.6.]

Intensities

• Intensities are the second dimension of the information in a spectrum.

• Several factors determine the intensities.

Intensities (2)

• Amino Acid dependent factors,

• Ion type factors,

• Position-based factors (peaks in the middle of the spectrum are higher)

Conclusion

• Tandem mass spectrometry is now the most important tool for identifying proteins.

• Many approaches have been developed, but there is still a long way to go in extracting all the information that can be obtained from mass spectra.

Research Themes

• Combining de novo and database methods (e.g. extracting tags).

• Using the intensities.

• Dealing better with PTMs (200 types).

• Clustering of high-throughput experiments.

• Multi-Dimensional Interpretation.
