Interpreting MS/MS Results

Post on 14-Jun-2015

DESCRIPTION

Join Brian Searle on an illustrated tour of interpreting MS/MS peptide spectra. On this tour you will first see how you can relate mass spectra to peptides. Next you will see why the SEQUEST software was developed to interpret these spectra as peptides. Finally, you will see how other software approaches have been developed and how combining approaches produces even better results.

TRANSCRIPT

  • 1. Interpreting MS/MS Proteomics Results
    The first thing I should say is that none of the material presented is original research done at Proteome Software,
    but we do strive to make the tools presented here available in our software product, Scaffold. With that caveat aside,
    Brian C. Searle
    Proteome Software Inc.
    Portland, Oregon USA
    Brian.Searle@ProteomeSoftware.com
    NPC Progress Meeting
    (February 2nd, 2006)
    Illustrated by Toni Boudreault

2. Organization
Identify
SEQUEST
X! Tandem/Mascot
Differ
Combine
This is foremost an introduction, so we're first going to talk about how you go about identifying proteins with tandem mass spectrometry in the first place.
Then we're going to talk about the motivations behind the development of the first really useful bioinformatics technique in our field, SEQUEST.
This technique has been extended by two other tools called X! Tandem and Mascot.
We're also going to talk about how these programs differ and how we can use that to our advantage by considering them simultaneously using probabilities.
3. Start with a protein
[Figure: protein sequence AAIEPATHKKQIGLRLKNVITIDDCGVRTA]
So, this is proteomics, so we're going to use tandem mass spectrometry to identify proteins, hopefully many of them, and hopefully very quickly.
4. Cut with an enzyme
[Figure: protein sequence AAIEPATHKKQIGLRLKNVITIDDCGVRTA, cut into peptides]
And to use this technique you generally have to digest the protein into peptides about 8 to 20 amino acids in length and
5. Select a peptide
[Figure: protein sequence AAIEPATHKKQIGLRLKNVITIDDCGVRTA, with one peptide selected]
Look at each peptide individually.
We select the peptide by mass using the first half of the tandem mass spectrometer.
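The cutting and mass-selection steps on the last two slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the talk does not name the enzyme, so the sketch assumes trypsin with the common rule "cut after K or R unless the next residue is P"; the function names `digest`, `peptide_mass`, and `select_by_mass` are hypothetical; and the residue masses are standard monoisotopic values rather than anything taken from the slides.

```python
# Monoisotopic residue masses in daltons (standard values, not from the slides).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def digest(protein):
    """Assumed trypsin rule: cut after K or R, unless the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def select_by_mass(peptides, target_mass, tolerance=0.5):
    """Keep only peptides whose mass lies within the tolerance of the target."""
    return [p for p in peptides if abs(peptide_mass(p) - target_mass) <= tolerance]


protein = "AAIEPATHKKQIGLRLKNVITIDDCGVRTA"
peptides = digest(protein)
for peptide in peptides:
    print(peptide, round(peptide_mass(peptide), 3))
```

On this toy sequence the assumed rule also yields pieces shorter than the 8 to 20 residues mentioned in the talk, which is expected for such a short example.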
6. Impart energy in collision cell
[Figure: the peptide AEPTIR (+ H2O) fragmenting in the collision cell]
The mass spectrometer imparts energy into the peptide, causing it to fragment at the peptide bonds between amino acids.
7. Measure mass of daughter ions
[Figure: spectrum, intensity vs. m/z, with fragment peaks A (72.0), AE (201.1), AEP (298.1), and AEPT (399.2)]
The masses of these fragment ions are recorded using the second mass spectrometer.
8. B-type Ions
[Figure: B-type ions of AEPTIR (+ H2O); intensity vs. m/z, with peaks labeled 72.0, 129.0, 97.0, 101.0, 113.1, 174.1]
These ions are commonly called B ions, based on nomenclature you don't really want to know about.
But the mass difference between the peaks corresponds directly to the amino acid sequence.
9. B-type Ions
[Figure: B-type ions of AEPTIR (+ H2O); intensity vs. m/z. Peak spacings: A - 0 = 72.0, AE - A = 129.0, AEP - AE = 97.0, AEPT - AEP = 101.0, AEPTI - AEPT = 113.1, AEPTIR - AEPTI = 174.1]
For example, the AE peak minus the A peak should produce the mass of E.
You can build these mass differences up and derive a sequence for the original peptide.
This is pretty neat, and it makes tandem mass spectrometry one of the best tools out there for sequencing novel peptides.
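The mass-difference reading described on this slide can be sketched as follows. The first four peaks (72.0, 201.1, 298.1, 399.2) are the ones shown on the earlier spectrum slide; the fifth (512.3) is derived here by adding the 113.1 spacing from the figure. The function name `read_ladder`, the 0.1 Da tolerance, and the residue mass table (standard monoisotopic values) are assumptions made for illustration.

```python
# Monoisotopic residue masses in daltons (standard values, not from the slides).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728


def read_ladder(peaks, tolerance=0.1):
    """Turn a clean B-ion ladder into residue calls from peak-to-peak differences."""
    calls = []
    previous = PROTON  # the "A - 0" step: the first B ion is one residue plus a proton
    for peak in sorted(peaks):
        delta = peak - previous
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(mass - delta) <= tolerance]
        # Residues with the same mass (I and L) cannot be told apart this way.
        calls.append("/".join(matches) if matches else "?")
        previous = peak
    return calls


b_peaks = [72.0, 201.1, 298.1, 399.2, 512.3]
print(read_ladder(b_peaks))
```

The isoleucine step comes back as an ambiguous I/L call, because the two residues have identical masses and a difference of 113.1 fits both.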
10. So, it seems pretty easy, doesn't it?
But there are a couple of confounding factors.
For example,
11. B ions have a tendency to degrade and lose carbon monoxide, producing
[Figure: B-type ions of AEPTIR (+ H2O), each losing CO; intensity vs. m/z]
12. A ions.
[Figure: A-type ions of AEPTIR (+ H2O) after loss of CO; m/z axis]
Furthermore,
13. The second halves of the fragmented peptide are represented as Y ions, which sequence backwards.
[Figure: Y-type ions, read as R, I, T, P, E, A (+ H2O); intensity vs. m/z]
And, unfortunately, this is the real world, so
14. All the peaks have different measured heights and many peaks can often be missing.
[Figure: Y-type ions, read as R, I, T, P, E, A (+ H2O), with uneven peak heights and missing peaks; intensity vs. m/z]
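The three ion series introduced so far (B, A, and Y) can be computed for the example peptide AEPTIR as a minimal sketch. It assumes singly charged ions and standard monoisotopic masses; the function name `fragment_ions` is hypothetical and none of this code comes from the talk.

```python
# Monoisotopic residue masses in daltons for the example peptide (standard values).
RESIDUE_MASS = {
    "A": 71.03711, "E": 129.04259, "P": 97.05276,
    "T": 101.04768, "I": 113.08406, "R": 156.10111,
}
PROTON = 1.00728
WATER = 18.01056
CO = 27.99491  # carbon monoxide, lost when a B ion degrades to an A ion


def fragment_ions(peptide):
    """Singly charged B, A, and Y ion m/z values for every peptide bond."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = {"b": [], "a": [], "y": []}
    for i in range(1, len(peptide)):
        b = sum(masses[:i]) + PROTON            # N-terminal piece
        y = sum(masses[-i:]) + WATER + PROTON   # C-terminal piece, read backwards
        ions["b"].append(b)
        ions["a"].append(b - CO)
        ions["y"].append(y)
    return ions


ions = fragment_ions("AEPTIR")
for series, values in ions.items():
    print(series, [round(v, 2) for v in values])
```

The first B ion lands near 72.04, in line with the 72.0 peak on the slides, and the first Y ion is the C-terminal arginine plus water and a proton, near 175.12.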
15. All these peaks are seen together simultaneously
and we don't even know
[Figure: B-type, A-type, and Y-type ions shown together; intensity vs. m/z]
16. what type of ion they are, making the mass differences approach even more difficult.
Finally, as with all analytical techniques,
[Figure: spectrum; intensity vs. m/z]
17. there's noise,
producing a final spectrum that looks like
[Figure: spectrum; intensity vs. m/z]
18. this, on a good day.
And so it's actually fairly difficult to
[Figure: spectrum; intensity vs. m/z]
19. compute the mass differences to sequence the peptide, certainly in a computer-automated way.
[Figure: B-type ions of AEPTIR (+ H2O) with peaks labeled 72.0, 129.0, 97.0, 101.0, 113.1, 174.1; intensity vs. m/z]
20. So the community needed a new technique.
Now, it wasn't all without hope.
21. Known Ion Types
We knew a couple of things about peptide fragmentation.
B-type ions
A-type ions
Y-type ions
Not only do we know to expect B, A, and Y ions, but
22. Known Ion Types
We also know a couple of other variations on those ions that come up.
B-type ions
A-type ions
Y-type ions
B- or Y-type +2H ions
B- or Y-type -NH3 ions
B- or Y-type -H2O ions
We even know something about the
23. Known Ion Types
likelihood of seeing each type of ion,
B-type ions: 100%
A-type ions: 20%
Y-type ions: 100%
B- or Y-type +2H ions: 50%
B- or Y-type -NH3 ions: 20%
B- or Y-type -H2O ions: 20%
where generally B and Y ions are most prominent.
24. If we know the amino acid sequence of a peptide, we can guess what the spectrum should look like!
So it's actually pretty easy to guess what a spectrum should look like if we know what the peptide sequence is.
25. Model Spectrum
So as an example, consider the peptide ELVIS LIVES K
that was synthesized by Rich Johnson in Seattle
ELVISLIVESK
*Courtesy of Dr. Richard Johnson
http://www.hairyfatguy.com/
26. Model Spectrum
We can create a hypothetical spectrum based on our rules
27. B/Y type ions (100%)
B/Y +2H type ions (50%)
A type ions, B/Y -NH3/-H2O (20%)
Where B and Y ions are estimated at 100%, +2H ions are estimated at 50%, and other stragglers are at 20%.
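These rules are enough to build a hypothetical spectrum for ELVISLIVESK. The sketch below is an illustration only, not the code behind the slides. It assumes that the "+2H" ions are doubly charged versions of the B and Y ions, uses standard monoisotopic masses, and introduces a hypothetical function name, `model_spectrum`.

```python
# Monoisotopic residue masses in daltons (standard values, not from the slides).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728
WATER = 18.01056
AMMONIA = 17.02655
CO = 27.99491


def model_spectrum(peptide):
    """Hypothetical peaks as (m/z, relative intensity in %, label) tuples."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    peaks = []
    for i in range(1, len(peptide)):
        b = sum(masses[:i]) + PROTON
        y = sum(masses[-i:]) + WATER + PROTON
        for name, mz in ((f"b{i}", b), (f"y{i}", y)):
            peaks.append((mz, 100, name))                        # B/Y ions: 100%
            # Assumption: "+2H" means the doubly charged form of the ion.
            peaks.append(((mz + PROTON) / 2, 50, name + "++"))   # 50%
            peaks.append((mz - AMMONIA, 20, name + "-NH3"))      # 20%
            peaks.append((mz - WATER, 20, name + "-H2O"))        # 20%
        peaks.append((b - CO, 20, f"a{i}"))                      # A ions: 20%
    return sorted(peaks)


spectrum = model_spectrum("ELVISLIVESK")
for mz, intensity, label in spectrum[:10]:
    print(f"{mz:9.3f}  {intensity:3d}%  {label}")
```

An 11-residue peptide has 10 peptide bonds, so this produces 90 hypothetical peaks, 20 of them at full height.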
28. Model Spectrum
So if we consider the spectrum that was derived from the ELVIS LIVES K peptide
29. Model Spectrum
We can find where the overlap is between the hypothetical and the actual spectra
30. Model Spectrum
And say conclusively based on the evidence that the spectrum does belong to the ELVIS LIVES K peptide.
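One simple way to quantify the overlap between a hypothetical and an actual spectrum is to ask how much of the model's intensity is explained by observed peaks. The sketch below is a stand-in chosen for illustration, not the scoring used in the talk or in any particular search engine. The function name `overlap_score`, the 0.5 m/z tolerance, and the observed peak list are all made up; the first four model peaks reuse the B-ion values from the AEPTIR slides.

```python
def overlap_score(model, observed_mz, tolerance=0.5):
    """Fraction of the model's total intensity that has a matching observed peak."""
    total = sum(intensity for _, intensity in model)
    matched = sum(
        intensity
        for mz, intensity in model
        if any(abs(mz - obs) <= tolerance for obs in observed_mz)
    )
    return matched / total if total else 0.0


# Model peaks as (m/z, relative intensity): four B ions at 100% and one A ion at 20%.
model = [(72.0, 100), (201.1, 100), (298.1, 100), (399.2, 100), (44.0, 20)]
# Hypothetical observed m/z values, including one noise peak and one missing B ion.
observed = [72.1, 150.3, 201.0, 399.3]

score = overlap_score(model, observed)
print(round(score, 3))
```

Weighting by model intensity means that a missing B or Y ion costs more than a missing low-probability ion, which mirrors the likelihoods given earlier in the talk.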
31. But who cares?
The more important question is
what about situations where we don't know the sequence?
32. We guess!
33. PepSeq
And so this was an approach followed by a program called PepSeq
which would guess every combination of amino acids possible
AAAAAAAAAA
AAAAAAAAAC
AAAAAAAACC
AAAAAAACCC
ELVISLIVESK
WYYYYYYYYY
YYYYYYYYYY
build a hypothetical spectrum,
and find the best matching hypothetical.


J. Rozenski et al., Org. Mass Spectrom., 29 (1994) 654-658.
34. PepSeq
This was a start,
but it's clearly impossibly hard with larger peptides,
and there's a lot of room to overfit the data.
Impossibly hard after 7 or 8 amino acids!
High false positive rate because you consider so many options
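The size of the guessing problem is easy to check. The sketch below assumes the 20 standard amino acids and counts every ordered sequence of a given length; the names `candidate_count` and `candidates` are hypothetical, and the enumeration is only practical for very short peptides.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def candidate_count(length):
    """Number of distinct sequences of the given length."""
    return len(AMINO_ACIDS) ** length


def candidates(length):
    """Lazily enumerate every sequence, in the AAAA..., AAAC... order of the slide."""
    for letters in product(AMINO_ACIDS, repeat=length):
        yield "".join(letters)


for n in (2, 4, 8, 11):
    print(n, f"{candidate_count(n):,}")
```

At 8 residues there are already 25.6 billion candidates, and an 11-residue peptide such as ELVISLIVESK is one of roughly 2 x 10^14.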
35. PepSeq
So obviously this isn't going to work in the long run.
Another strategy is needed!
Impossibly hard after 7 or 8 amino acids!
High false positive rate because you consider so many options
36. Sequencing Explosion
We needed a new invention to come around,
and that was shotgun Sanger sequencing.
1977 Shotgun sequencing invented, bacteriophage φX174 sequenced.
1989 Yeast Genome project announced
1990 Human Genome project announced
1992 First chromosome (Yeast) sequenced
1995 H. influenzae sequenced
1996 Yeast Genome sequenced
2000 Human Genome draft

In '89 and '90 the Yeast and Human Genome projects were announced,
followed by the first chromosome in '92,
et cetera, et cetera.
37. Sequencing Explosion
1977 Shotgun sequencing invented, bacteriophage φX174 sequenced.
1989 Yeast Genome project announced
1990 Human Genome project announced
1992 First chromosome (Yeast) sequenced
1995 H. influenzae sequenced
1996 Yeast Genome sequenced
2000 Human Genome draft
Eng, J. K.; McCormack, A. L.; Yates, J. R. III
J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.

In 1994 Jimmy Eng and John Yates published a technique to exploit genome sequencing
for use in tandem mass spectrometry.
And the idea was
38. SEQUEST
...instead of searching all possible peptide sequences,
search only those in genome databases.
Now, in the post-genomic world this seems like a pretty trivial idea,
but back then there was a lot of assumption placed on the idea
that we'd actually have a complete Human genome in a reasonable amount of