Peptide-assisted annotation of the Mlp genome

Philippe Tanguay, Nicolas Feau, David Joly, Richard Hamelin


Post on 01-Feb-2016



TRANSCRIPT

Page 1: Peptide-assisted annotation of the Mlp genome

Peptide-assisted annotation of the Mlp genome

Philippe Tanguay, Nicolas Feau, David Joly, Richard Hamelin

Page 2: Peptide-assisted annotation of the Mlp genome

Objective

• Use peptide libraries to validate the in silico prediction of gene models

Mapping peptides onto a translated genome sequence provides the « correct frames of translation »

Assumption: « if a peptide (protein) is detected, then there must be a gene that encodes it »
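The idea of recovering the frame of translation from a peptide can be sketched as follows. This is a minimal illustration, not the presenters' pipeline: it assumes the standard genetic code, and the function names (`six_frames`, `find_peptide`) and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard nuclear code applies).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return the six translations keyed by frame name (+1..+3, -1..-3)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def find_peptide(dna, peptide):
    """List (frame, amino-acid index) for each frame containing the peptide."""
    return [(name, protein.find(peptide))
            for name, protein in six_frames(dna).items()
            if peptide in protein]


# Toy sequence (hypothetical): the peptide MAKG is only readable in frame +2.
print(find_peptide("CATGGCTAAAGGTTAA", "MAKG"))
```

A peptide that is found in exactly one frame tells which frame is translated at that position, which is the information a gene model has to agree with.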

Page 3: Peptide-assisted annotation of the Mlp genome

Methodology (hardware)

Urediniospores (3729)

Protein extraction

1D SDS-PAGE

Gel slicing (64)

Trypsin digestion

LC-MS/MS

Bioinformatics

Waters MassPREP station

LTQ ThermoElectron

Extraction, slicing, digestion, elution

Peptide MS/MS data acquisition

Page 4: Peptide-assisted annotation of the Mlp genome

Methodology (Bioinformatic)

Protein databases built from…

– Gene catalog (16694 GM)
– 6-frame translation of the genome

Spectral identification by sequence database searching (Mascot, Sequest)

Statistical validation of peptide identifications

1 - Comparison of results from both db
2 - Comparison of peptides and GM (validation/correction of genome annotations)
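Database search engines compare each spectrum with theoretical peptides obtained by digesting the database proteins in silico. A minimal sketch of that step, assuming the classical trypsin rule (cleavage after K or R, except before P) and no missed cleavages; the function names and the `min_length` cutoff are hypothetical and are not settings taken from the slides:

```python
def tryptic_peptides(protein, min_length=6):
    """Cut after K or R unless the next residue is P; drop short peptides."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # C-terminal peptide that does not end with K or R
        peptides.append(protein[start:])
    return [p for p in peptides if len(p) >= min_length]


def digest_translation(translated_frame, min_length=6):
    """Digest a translated frame: split at stop codons (*), then digest."""
    peptides = []
    for segment in translated_frame.split("*"):
        peptides.extend(tryptic_peptides(segment, min_length))
    return peptides


# Toy sequences (hypothetical)
print(tryptic_peptides("MAKGGRPLLK", min_length=1))
print(digest_translation("MAKGGRPLLK*AAAAAAK"))
```

Splitting at stop codons first is what lets the same digestion be applied to a 6-frame translation, where each frame is a chain of open segments rather than a single protein.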

Page 5: Peptide-assisted annotation of the Mlp genome

MLP proteomic results so far

• 691 000 MS/MS spectra obtained from the total proteins

Unique peptides:

[Figure: Venn diagram of unique peptides. Gene catalog db searched with Mascot + Sequest; 6-frame translation db searched with Mascot only. Values shown: 10980, 352, 4699]

352 unique peptides obtained from the 6-frame translation db do not match any GM of the Gene catalog

False discovery rate below 1.6%
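Finding the peptides that have no counterpart in the Gene catalog amounts to checking each peptide against the gene-model proteins. A minimal sketch, assuming exact substring matching; the function name and the toy sequences are hypothetical:

```python
def peptides_without_gene_model(peptides, gene_model_proteins):
    """Return the unique peptides that occur in none of the GM proteins."""
    return sorted(
        pep for pep in set(peptides)
        if not any(pep in protein for protein in gene_model_proteins)
    )


# Toy data (hypothetical): two GM proteins and three peptides from the
# 6-frame translation search, one of them reported twice.
gm_proteins = ["MAKGGRPLLKTTT", "MSSDEEVLLK"]
six_frame_peptides = ["GGRPLLK", "DEEVLLK", "WWCNDR", "WWCNDR"]
print(peptides_without_gene_model(six_frame_peptides, gm_proteins))
```

Peptides returned by such a filter are the candidates for correcting or adding gene models.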

Page 6: Peptide-assisted annotation of the Mlp genome

Peptide frequency distribution on GM

[Figure: histogram. x-axis: No. peptide/gene model (0 to 79); y-axis: No. gene model (0 to 300)]

Mean: 9 peptides covering 134 AA per GM

The 10980 + 4699 peptides represent assignments for nearly 10% of the Gene catalog, i.e. 1659 GM
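The number of amino acids covered per gene model can be read as the number of residues of the GM protein that lie under at least one peptide. A minimal sketch under that assumption (overlapping peptides are counted once); the function name and toy sequences are hypothetical:

```python
def covered_residues(protein, peptides):
    """Count residues of `protein` covered by at least one peptide."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            # mark every position under this occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered)


# Toy example (hypothetical): MAK and AKGG overlap, so 5 residues are covered.
print(covered_residues("MAKGGRPLLKTTT", ["MAK", "AKGG"]))
```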

Page 7: Peptide-assisted annotation of the Mlp genome

Automated classification of peptides with no hit (352) on the Gene catalog

• 5’ extension of a predicted GM
– If peptide(s) located within the 1000 bp upstream of the predicted GM start codon

• 3’ extension of a predicted GM
– If peptide(s) located within the 1000 bp downstream of the predicted GM stop codon

• 5’ and 3’ extension of a predicted GM
– If peptides located within the 1000 bp upstream of the start codon and within the 1000 bp downstream of the predicted GM stop codon

• Internal extension of a predicted GM
– If peptide(s) located in the GM

• New GM
– If no predicted GM in the vicinity of the peptide(s)
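The classification rules can be sketched as a small function. This is an illustration only, not the presenters' script: the `GeneModel` class and function names are hypothetical, coordinates are assumed to be 1-based genomic positions with the gene span running from start codon to stop codon, a peptide overlapping the span is treated as internal, and the order in which the rules are tested when several apply is an assumption.

```python
from dataclasses import dataclass

FLANK = 1000  # bp window on either side of the gene model, as on the slide


@dataclass
class GeneModel:
    start: int   # leftmost genomic coordinate of the start..stop span
    end: int     # rightmost genomic coordinate of the span
    strand: str  # "+" or "-"


def locate(pep_start, pep_end, gm, flank=FLANK):
    """Return 'upstream', 'downstream', 'internal' or None for one peptide."""
    if pep_end < gm.start:
        side = "left" if pep_start >= gm.start - flank else None
    elif pep_start > gm.end:
        side = "right" if pep_end <= gm.end + flank else None
    else:
        return "internal"
    if side is None:
        return None
    # On the minus strand the 5' end of the gene is on the right-hand side.
    if gm.strand == "+":
        return "upstream" if side == "left" else "downstream"
    return "downstream" if side == "left" else "upstream"


def classify(peptides, gm, flank=FLANK):
    """Classify a list of (start, end) peptide hits against one gene model."""
    where = {locate(s, e, gm, flank) for s, e in peptides} - {None}
    if not where:
        return "New GM"
    if "upstream" in where and "downstream" in where:
        return "5' and 3' extension"
    if "upstream" in where:
        return "5' extension"
    if "downstream" in where:
        return "3' extension"
    return "Internal extension"


gm = GeneModel(start=5000, end=8000, strand="+")
print(classify([(4500, 4530)], gm))
print(classify([(100, 130)], gm))
```

Handling the strand explicitly matters: the same genomic position left of a gene is upstream for a plus-strand gene model and downstream for a minus-strand one.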

Page 8: Peptide-assisted annotation of the Mlp genome

Corrections-Additions to the Gene catalog

• Mapping of the peptides with no hit on the genome allowed the following modifications

Modification               Number of GM
5’ extension               44
Internal exon extension    31
3’ extension               22
5’ and 3’ extension        5
New GM                     73
Total                      172

Page 9: Peptide-assisted annotation of the Mlp genome

Manual curation - Internal extension

Page 10: Peptide-assisted annotation of the Mlp genome

Manual curation - Internal extension

• EuGene’s prediction is OK

Page 11: Peptide-assisted annotation of the Mlp genome

Manual curation - New GM

Page 12: Peptide-assisted annotation of the Mlp genome

Manual curation - New GM

Page 13: Peptide-assisted annotation of the Mlp genome

Summary – Peptide-assisted genome annotation

With few resources (6000 $ worth of materials and services, and a few weeks' worth of labour), our proteomic analysis:

– Validated 10 % of the predicted GM
– Corrected/found > 170 GM

According to the manual curation accomplished so far, it appears that EuGene had predicted most of the > 170 corrected/found GM

Page 14: Peptide-assisted annotation of the Mlp genome

Perspectives

• A quantitative proteomic approach (iTRAQ) will be used to compare urediniospores, germinated urediniospores and haustoria protein complexes

• Analysing the Sequest output obtained from the 6-frame translation
– Mascot: 5051 peptides identified (352 with no hits on the Gene catalog)
– Sequest: ?

Page 15: Peptide-assisted annotation of the Mlp genome

Available material

• Our set of peptide spectra from urediniospore proteins is available to validate new GM predictions

• The peptide GFF files will be made available to the Melampsora community
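For illustration, a peptide located in a 6-frame translation can be written as one GFF line once its position is converted back to genomic coordinates. This is a sketch only: it assumes the nine-column, tab-separated GFF3 layout with 1-based inclusive coordinates, and the source label, feature type, attribute layout, scaffold name and function name are hypothetical, not the format the authors will distribute.

```python
def peptide_gff3_line(seqid, seq_length, frame, aa_index, peptide,
                      source="MS_peptide"):
    """Build a GFF3 line for a peptide found in a 6-frame translation.

    frame is '+1'..'+3' or '-1'..'-3'; aa_index is the 0-based position of
    the peptide in the translation of that frame.
    """
    strand = frame[0]
    offset = int(frame[1]) - 1
    first = offset + 3 * aa_index      # 0-based, on the translated strand
    last = first + 3 * len(peptide)    # exclusive end on the same strand
    if strand == "+":
        start, end = first + 1, last
    else:
        # convert reverse-strand positions back to forward coordinates
        start, end = seq_length - last + 1, seq_length - first
    attributes = f"ID={peptide}_{start};Name={peptide}"
    return "\t".join([seqid, source, "protein_match", str(start), str(end),
                      ".", strand, ".", attributes])


# Toy example (hypothetical scaffold of 16 bp, peptide MAKG in frame +2)
print(peptide_gff3_line("scaffold_1", 16, "+2", 0, "MAKG"))
```

Such lines can be loaded next to the gene models in a genome browser, so that a peptide falling outside any predicted exon is immediately visible.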

Page 16: Peptide-assisted annotation of the Mlp genome

Finding the peptides on the different model prediction sets

Model prediction set    Total GM    GM validated    %
Gene Catalog            16694       1659            9.9%
EuGene                  12386       1348            10.9%
Genewise1               14087       977             6.9%
Genewise1Plus           14162       1046            7.4%
fgenesh1_pg             15760       1140            7.2%
fgenesh2_pg             17833       1377            7.7%

Do we need to perform a new spectra search on the whole model prediction sets?
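The percentage column is simply the number of validated GM divided by the total GM of each prediction set. A short sketch that recomputes it from the counts given on the slide (the variable and function names are hypothetical):

```python
# (total GM, GM validated) per prediction set, copied from the slide
PREDICTION_SETS = {
    "Gene Catalog": (16694, 1659),
    "EuGene": (12386, 1348),
    "Genewise1": (14087, 977),
    "Genewise1Plus": (14162, 1046),
    "fgenesh1_pg": (15760, 1140),
    "fgenesh2_pg": (17833, 1377),
}


def percent_validated(total_gm, validated_gm):
    """Share of gene models with at least one peptide, in percent."""
    return round(100 * validated_gm / total_gm, 1)


for name, (total, validated) in PREDICTION_SETS.items():
    pct = percent_validated(total, validated)
    print(f"{name:<14} {total:>6} {validated:>5} {pct:>5.1f}%")
```

EuGene has the highest validated share (10.9%) although it has the fewest gene models, while the Gene catalog has the largest absolute number of validated GM (1659).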