Identification of Proteins Through Mass Spectrometry Databases

8/13/2019


Proteomics and Mass Spectrometry

Proteome: the complete set of proteins in a cell

Current methodologies: 2D gel, protein microarray, fluorescence microscopy, mass spectrometry, chromatography, nuclear magnetic resonance, microfluidics, microchip

Mass spectrometry is an important technique in molecular and cell biology

Recent advances automate mass spectrometry workflows: excision of protein spots, enzymatic digestion, acquisition of mass spectra, and database searching.

Techniques for modified proteins and quantification have been developed.



Servers available for Protein Identification through Mass Spectrometry

For PMF: ASCQ_ME, Bupid, Mascot, MassSearch, MS-Fit (Protein Prospector), PepMAPPER, ProFound (Prowl), Mowse, PeptideSearch

For Sequence Query: Mascot, MS-Seq (Protein Prospector), TagIdent

For MS/MS Ion Search: Inspect, Mascot, MS-Seq (Protein Prospector), Omssa, PepFrag (Prowl), PepProbe, RAId_DbS, Sonar (Knexus), X!Tandem (The GPM)



Mascot

Software search engine

Uses mass spectrometry data

Mascot is unique

Widely used

Freely available from Matrix Science

A license is required for in-house use



Mascot Server

Gives excellent results with peak lists from instruments manufactured by:

Agilent, Bruker, Thermo Scientific

Waters, AB Sciex, Shimadzu

In-house use:

Data sets that exceed the 1200-spectrum limit

Confidentiality

For automation

To add and edit modifications, enzymes, quantitation methods, etc.

The time taken by a search depends on the number of processors.



Three proven ways of using mass spectrometry data

Peptide mass fingerprint

Uses the molecular masses of the peptides resulting from digestion of a protein by a specific enzyme

Sequence query

Mass values combined with amino acid sequence or composition data.

MS/MS Ions Search

Uninterpreted MS/MS data from a single peptide or from a complete LC-MS/MS run.
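
The peptide mass fingerprint idea can be sketched in a few lines: digest a protein sequence in silico and compute the mass of each peptide. This is a minimal illustration, not how Mascot itself is implemented. The function names and the simple trypsin rule (cleave after K or R, but not before P) are assumptions made for this example.

```python
import re

# Monoisotopic residue masses in Da for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def tryptic_digest(protein):
    """Split after K or R unless the next residue is P (assumed trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[residue] for residue in peptide) + WATER


if __name__ == "__main__":
    for pep in tryptic_digest("AKRPGKGVFR"):
        print(pep, round(peptide_mass(pep), 4))
```

A search engine compares a list of such calculated masses, for every protein in the database, against the experimental peak list.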



Peptide Mass Fingerprint



Peptide Mass Fingerprint

Peak picking

Use a utility to convert the raw data into a peak list

Mass values matter most

Get as many peptide masses as possible in the range 1000 to 3500 Da

To perform a search

Paste your peak list or upload it as a file

Enter values for the search parameters

After submission, you receive the results: a list of matching proteins.



Peptide Mass Fingerprinting

Fast, simple analysis.

High sensitivity

Needs a database of proteins, not ESTs

The sequence must be present in the database

Not good for mixtures

Start with Swiss-Prot.

A protein hit is significant if its expect value is below 0.05



MS/MS Ions Search



MS/MS Ions Search

Single protein or a complex mixture

Use chromatography to regulate the flow of peptides into the mass spectrometer.

Select peptides one at a time using the first stage of mass analysis. Each isolated peptide is then induced to fragment. The second stage of mass analysis is used to collect an MS/MS spectrum.

Software determines which peptide sequence in the database gives the best match.

The degree of matching is scored.



Fragment ion structures

Peptide molecular ions fragment at preferred locations along the backbone.

The major peaks are b and y ions.

Which ions appear depends on the ionization technique, the mass analyser, and the peptide structure.

If peptides fragmented cleanly, we wouldn't need a database search: there would be a ladder of peaks for each ion series.

Fragmentation is rarely perfect
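
To make the "ladder of peaks" concrete, here is a minimal sketch that computes the ideal singly charged b and y ion series for an unmodified peptide. The function name `by_ions` is hypothetical, and the example assumes perfect fragmentation at every backbone bond, which, as noted above, real spectra rarely show.

```python
# Monoisotopic residue masses in Da for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def by_ions(peptide):
    """Return (b, y) lists of singly charged m/z values for a peptide."""
    masses = [MONO[residue] for residue in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:            # b ions grow from the N-terminus
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):   # y ions grow from the C-terminus
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = by_ions("GVFR")
    print("b:", [round(x, 4) for x in b])
    print("y:", [round(x, 4) for x in y])
```

The spacing between neighbouring peaks in either series equals a residue mass, which is why a clean ladder could be read directly as sequence.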



MS/MS ion search

Results are complicated to report

The report lists a series of proteins and the peptide matches that have been assigned to them.

The report uses a pop-up window to show the alternative peptide matches

The top match has a high score



MS/MS ion search

Easily automated

Searches can be slow

Without enzyme

Several variable modifications

Large dataset

Large database

MS/MS is peptide identification



Sequence Query



Sequence tag search

Even if the quality of the spectrum is poor, it's possible to pick out a minimum of four clean peaks

A few residues of amino acid sequence are interpreted

Mann and Wilm realized that this very short stretch of amino acid sequence might provide sufficient specificity for identification if it was combined with the fragment ion mass values that enclose it, the peptide mass, and the enzyme specificity.

Picking out a good tag requires both luck and experience.

Requires interpretation of spectrum

Usually manual, hence not high throughput

Tag has to be called correctly



Peptide Sequence tag

Standard sequence tag is obsolete.

Easier to skip the interpretation step and pass the peak list to the search engine.

Rapid search times

Error tolerant



Search parameters

Name, Email and Search Title

The name and email are saved as a browser cookie. If Mascot security is enabled, this information is taken from the user database

The email address is used for sending results



Databases

Swiss-Prot (~500000 entries)

Best annotated database, ideal for PMF

NCBI nr and UniRef100 (~19000000 entries)

Large, comprehensive, best choice for MS/MS

EST databases (>400000000 entries in translation)

Huge, not advisable for PMF

Single genome databases

Not suitable for PMF

cRAP and Contaminants



Database

Choose the right database

In Mascot 2.3 and later, you can select multiple databases

You cannot mix AA and DNA databases.

The comprehensive database repositories, NCBI and EBI, offer downloads of nr, GenBank, Swiss-Prot, EMBL, TrEMBL, etc.

When searching a single organism, always include a database of common contaminants.

If you are interested in a bacterium or plant, try comprehensive protein databases, e.g. NCBInr and UniRef100.



Nucleic Acid Databases

Mascot always performs a 6-frame translation

It translates the entire sequence and does not look for a start codon to begin

When a stop codon is encountered, it leaves a gap

It uses the correct genetic code, as long as the taxonomy is known.
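
A 6-frame translation can be sketched as follows. This is an illustration of the general technique, not Mascot's code: it uses only the standard genetic code (whereas the slide says the code is chosen from the taxonomy), marks stop codons with `*`, and the helper names are assumptions for this example.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Reverse-complement an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate from the first base; unknown codons become X, stops become *."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame(dna):
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    return [
        translate(strand[offset:])
        for strand in (dna, reverse_complement(dna))
        for offset in range(3)
    ]


def open_segments(frame):
    """Split a translated frame at stop codons, i.e. leave a gap at each stop."""
    return [segment for segment in frame.split("*") if segment]


if __name__ == "__main__":
    for frame in six_frame("ATGGCCTAAGGT"):
        print(frame, open_segments(frame))
```

No start codon is required: every frame is translated from its first base, and stops only break the translation into segments.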



Taxonomy

Speeds up the search

Simple report

Keep indexes up to date

Check the stats file for each database.

If the correct protein from the correct species is not in the database, don't specify a very narrow taxonomy.



Enzyme

First choice

Set allowed missed cleavage sites to zero

Choose a setting of 1 or 2 when you're not sure about your sample

A higher number increases the number of calculated peptide masses.

No enzyme only in exceptional cases, never for PMF

The enzyme list is user configurable.
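
The effect of the missed cleavage setting on the number of calculated peptides can be seen in a small sketch. The `digest` function and the simplified trypsin rule (cleave after K or R, not before P) are assumptions for illustration only.

```python
import re


def digest(protein, missed_cleavages=0):
    """Tryptic peptides allowing up to `missed_cleavages` skipped sites."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for start in range(len(pieces)):
        last = min(start + missed_cleavages, len(pieces) - 1)
        for end in range(start, last + 1):
            # Joining adjacent limit peptides models one or more missed sites.
            peptides.append("".join(pieces[start:end + 1]))
    return peptides


if __name__ == "__main__":
    for allowed in (0, 1, 2):
        print(allowed, len(digest("AKCRDKEF", allowed)))
```

For this toy sequence the count grows from 4 to 7 to 9 peptides as the setting goes from 0 to 2, so each extra allowed missed cleavage adds candidate masses that can match by chance.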



Modifications

Fixed modifications

Variable, post-translational modifications

Display all modifications

Keep the number of variable modifications small

Some modifications are worse than others

Mods that affect a terminus are less of a problem, e.g. Pyro-glu

Mods that apply to residues with a high fractional abundance and at any position are a big problem, e.g. Phospho (ST)



Modifications

Post-translational

Phosphorylation, acetylation

Artifacts

Oxidation, acetylations

Derivatization

Alkylation of cysteine

Sequence variants

Errors, SNPs, other variants

Take the complete list from Unimod

The cysteine modification depends on the alkylating agent: iodoacetamide (carbamidomethyl), iodoacetic acid (carboxymethyl), or MMTS (methylthio).



Phosphorylation

Site heterogeneity

Poor ionization efficiency

3 fragmentation channels

Intact fragments

Neutral loss of HPO3 (80 Da)

Neutral loss of H3PO4 (98 Da)

Can occur at S, T and Y: ~16% of residues



Protein mass

Mass of the intact protein in kDa.

If this field is left blank, there is no restriction on protein mass

Slows down the search a little.



Tolerance

Peptide tolerance

MS/MS tolerance

Error window on experimental peptide mass values

Units: percentage, milli-mass units, parts per million, or Daltons.

Protein/peptide view includes a graph of the mass errors for fragment ions.

Specifying too tight a peptide tolerance is a common reason for failing to get a match

A more appropriate tolerance would be +/- 0.3 in MS/MS
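
The four tolerance units can be compared by converting each to an absolute window in Daltons. The sketch below is an illustration under stated assumptions: the unit labels `"Da"`, `"mmu"`, `"ppm"` and `"%"` and both function names are chosen for this example and are not taken from the Mascot search form.

```python
def tolerance_in_da(mass, value, unit):
    """Convert a tolerance to an absolute half-width in Da at a given mass."""
    if unit == "Da":
        return value
    if unit == "mmu":              # milli-mass units: thousandths of a Dalton
        return value / 1000.0
    if unit == "ppm":              # parts per million of the mass
        return mass * value / 1e6
    if unit == "%":                # percentage of the mass
        return mass * value / 100.0
    raise ValueError(f"unknown tolerance unit: {unit}")


def within_tolerance(observed, theoretical, value, unit):
    """True if the observed mass lies inside the error window."""
    window = tolerance_in_da(theoretical, value, unit)
    return abs(observed - theoretical) <= window


if __name__ == "__main__":
    print(tolerance_in_da(2000.0, 50, "ppm"))            # window in Da at mass 2000
    print(within_tolerance(1000.52, 1000.50, 0.3, "Da"))
    print(within_tolerance(1000.52, 1000.50, 10, "ppm"))
```

The last two calls show how the same 0.02 Da error is accepted by a 0.3 Da window but rejected by a 10 ppm window, which is the failure mode the slide warns about.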



Mass type

Average or monoisotopic.

Monoisotopic: most abundant natural isotopes

First peak of isotope distribution.

Average mass is the chemical mass, the centre of gravity of the isotope distribution.

Difference is approximately 0.06%.

If you get this setting wrong, the mass errors will be very large
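
The size of the difference can be checked from atomic masses. The sketch below computes both mass types for one elemental composition, C50H71N13O12, which is the formula commonly quoted for the peptide angiotensin II and is used here only as an illustrative input. The function name is an assumption for this example.

```python
# Mass of the most abundant isotope of each element, in Da.
MONO_ATOM = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915, "S": 31.972071}
# Abundance-weighted average atomic mass of each element, in Da.
AVG_ATOM = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "S": 32.065}


def formula_mass(composition, table):
    """Sum atomic masses over an elemental composition such as {"C": 50, ...}."""
    return sum(table[element] * count for element, count in composition.items())


example = {"C": 50, "H": 71, "N": 13, "O": 12}   # illustrative peptide formula
mono = formula_mass(example, MONO_ATOM)
avg = formula_mass(example, AVG_ATOM)
percent_difference = 100 * (avg - mono) / mono

if __name__ == "__main__":
    print(round(mono, 4), round(avg, 4), round(percent_difference, 3))
```

For this composition the average mass is about 0.64 Da higher than the monoisotopic mass, roughly 0.06%, which is far larger than a typical peptide tolerance.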



Data (PMF)

The mass query window is used when there is no data file.

The data format is auto-detected.

List of mass values, one per line. If a second value is present, it is assumed to be intensity. Any further values on the same line are ignored

Mascot also supports other peak list formats

Applied Biosystems Data Explorer (.pkm)

Bruker Analysis AutoXecute data report

Bruker XML

mzData (1.05)

mzML
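
The simple PMF list format described above (one mass per line, optional intensity, anything further ignored) can be read with a short parser. This is a sketch under the assumption that values are separated by whitespace; the function name is hypothetical and the code does not cover the other listed formats.

```python
def parse_peak_list(text):
    """Return a list of (mass, intensity) tuples; intensity is None if absent."""
    peaks = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        try:
            mass = float(fields[0])
        except ValueError:
            continue                      # skip lines that do not start with a number
        intensity = None
        if len(fields) > 1:
            try:
                intensity = float(fields[1])
            except ValueError:
                intensity = None          # a non-numeric second field is not an intensity
        peaks.append((mass, intensity))   # fields beyond the second are ignored
    return peaks


if __name__ == "__main__":
    sample = "1045.53 200\n1296.68\n\n1347.73 55 ignored\n"
    print(parse_peak_list(sample))
```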



Data (MS/MS)

The format cannot be auto-detected, and must be specified.

Instrument

Type of instrument used to acquire the data. This setting determines which fragment ion series will be used for scoring



Report

AUTO displays only protein hits with significant scores, plus one additional hit after the cutoff at the significance threshold.



Final tip

Beware of

Narrowing the taxonomy

Reducing mass tolerances

Removing modifications

Selecting spectra or mass values

Set search parameters using standard samples



Types of Summary Report



Scoring and statistics

A list of proteins

Some matches not statistically significant.

The score threshold for this search is 76, and the top scoring match is 47.

The area shaded green indicates random, meaningless matches.



Probability based scoring

Scoring whether the match is random or not.

Probability that the observed match is a random event.

A real match, which is not random, has a very low probability.

Reject anything with a probability greater than a chosen threshold

The Mascot score is -10*log10(P)
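
The relation between probability and score can be written as a pair of helper functions. This is a minimal sketch of the formula on the slide, with hypothetical function names; it does not reproduce how Mascot derives the probability itself.

```python
import math


def score_from_probability(p):
    """Score = -10*log10(P): a smaller random-match probability gives a higher score."""
    return -10 * math.log10(p)


def probability_from_score(score):
    """Invert the score back to the probability of a random match."""
    return 10 ** (-score / 10)


if __name__ == "__main__":
    for p in (0.05, 1e-3, 1e-5):
        print(p, round(score_from_probability(p), 1))
```

Each factor of 10 in probability changes the score by 10 units.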



Significance thresholds

The threshold is calculated from the number of trials

P = 1/(20 x 500000)

Standard score

MudPIT score
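
As a worked example of the threshold calculation, the probability above can be converted to a score with the -10*log10(P) relation. The sketch assumes that the factor 20 stands for a 1-in-20 (0.05) significance level and that 500000 is the number of trials, for example the number of database entries; the function name is hypothetical.

```python
import math


def significance_threshold(n_trials, alpha=0.05):
    """Score threshold for a given number of trials at significance level alpha."""
    p_threshold = alpha / n_trials        # e.g. 1 / (20 * 500000)
    return -10 * math.log10(p_threshold)


if __name__ == "__main__":
    print(round(significance_threshold(500000), 1))
    print(round(significance_threshold(19000000), 1))
```

With 500000 trials the threshold works out to a score of 70, and a larger database pushes the threshold higher.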



Expectation value

The number of times you could expect to get this score or better by chance

E = Pthreshold*(10**((Sthreshold-score)/10))

A completely random match has an expectation value of 1 or more

The better the match, the smaller the expectation value.
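
The expectation value formula translates directly into code. The sketch below assumes a threshold probability of 0.05 and uses the threshold of 76 and top score of 47 quoted earlier for the example search; the function name is hypothetical.

```python
def expectation_value(score, score_threshold, p_threshold=0.05):
    """E = Pthreshold * 10**((Sthreshold - score) / 10)."""
    return p_threshold * 10 ** ((score_threshold - score) / 10)


if __name__ == "__main__":
    print(expectation_value(76, 76))   # a score exactly at the threshold
    print(expectation_value(86, 76))   # ten units above the threshold
    print(expectation_value(47, 76))   # the top match from the example search
```

A score of 47 against a threshold of 76 gives an expectation value well above 1, consistent with that match being reported as not significant.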



Error tolerant search

Take query 218. The observed mass difference could correspond to either carbamidomethylation or carboxymethylation at the N-terminus.

Since the sample was alkylated with iodoacetamide, carbamidomethylation is very believable: it is a known artefact of over-alkylation.

Finds new matches by introducing mass shifts



Phosphorylation site localization

For confident site localization: Ascore, PTM score and MD-score

MD-score: the score difference between the top two matches

Depends on fragmentation techniques

Ability increases with increasing distance

The MD score does not require complex computational



Validation (Decoy)

False discovery rate.

The most reliable approach is a decoy database

Decoy entries can be in a separate database or concatenated to the target entries



Decoy

Search a decoy database

Very simple

Repeat the search

Matches that are found in the decoy database are false positives.

It isn't useful when there is only a small number of spectra.
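
A false discovery rate estimate from decoy matches can be sketched as below. Both estimators are common conventions rather than anything stated on the slides: the separate-search form divides decoy matches by target matches, and the concatenated form doubles the decoy count on the assumption that wrong matches hit target and decoy entries equally often. The function names are hypothetical.

```python
def fdr_separate(target_matches, decoy_matches):
    """FDR estimate when target and decoy databases are searched separately."""
    if target_matches == 0:
        return 0.0
    return decoy_matches / target_matches


def fdr_concatenated(target_matches, decoy_matches):
    """FDR estimate for one search of a concatenated target plus decoy database."""
    total = target_matches + decoy_matches
    if total == 0:
        return 0.0
    return 2 * decoy_matches / total


if __name__ == "__main__":
    print(fdr_separate(1000, 10))
    print(fdr_concatenated(990, 10))
```

With only a handful of spectra the decoy count is zero or one, so the estimate is too coarse to be useful, as noted above.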



Decoy

A utility to create a decoy database

A reversed or randomised sequence of the same length is automatically generated and tested.

The average amino acid composition of the random sequences is the same

The matches and scores for the decoy sequences are recorded separately in the result file.
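
Generating decoy sequences is simple enough to sketch directly. The two helpers below are illustrative stand-ins, not the utility mentioned above: one reverses a sequence and the other shuffles it, which keeps the length and the exact residue composition of each entry (a stricter condition than matching the average composition).

```python
import random


def reversed_decoy(sequence):
    """Decoy made by reversing the target sequence."""
    return sequence[::-1]


def shuffled_decoy(sequence, seed=0):
    """Decoy made by shuffling residues; length and composition are preserved."""
    rng = random.Random(seed)     # fixed seed so the decoy is reproducible
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


if __name__ == "__main__":
    target = "MKWVTFISLLLLFSSAYSR"
    print(reversed_decoy(target))
    print(shuffled_decoy(target))
```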



Mascot Daemon

Automates the submission of data files

Batch mode

Real-time monitor mode

Follow-up tasks



Mascot Distiller

Access all of the popular data formats

To produce high quality peak lists

Submit and review Mascot search results.

Perform de novo sequencing and interpret sequence tags for tag searches



References

http://www.matrixscience.com

Mikhail M. S., Simone L., Markus B., Manja L., Toby M., Marcus B., Bernard K., The American Society for Biochemistry and Molecular Biology. (2011)

Ville R. Koskinen, Patrick A. Emery, David M. Creasy, and John S. Cottrell, Molecular and Cellular Proteomics, (2011)

Elias, J. E. and Gygi, S. P., Nature Methods 4 207-214 (2007)

