![Page 1: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/1.jpg)
Data Validation and Annotation: PRIDEViewer and PIKE
Bioinformatics analysis from proteomics data
ProteoRed Bioinformatics Workshop Salamanca
Alberto Medina-Aunon
March 15th, 2010
![Page 2: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/2.jpg)
Main Topics
• Mass spectrometry and protein and peptide validation
– PRIDEViewer: Description.
– Examples: Use cases.
• Experiment context: Linking functional information to our proteins.
– PIKE: Description.
– Examples: Use cases.
![Page 3: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/3.jpg)
MS Validation. The Easiest Way
• Starting from:
– Mass spectrum/spectra
– Tentative identification/sequence
– Search engine
Candidate: AFLLAMAARTGFRTR
![Page 4: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/4.jpg)
How to do it
• By hand:
– Just for a few sequences/spectra.
– We cannot read every file format (for instance, binaries).
• Semi-automatically:
– Using PRIDE files as input: PRIDEViewer
![Page 5: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/5.jpg)
PRIDEViewer Experiment info
![Page 6: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/6.jpg)
PRIDEViewer Sample and Instrument info
![Page 7: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/7.jpg)
PRIDEViewer Spectra and identifications
![Page 8: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/8.jpg)
PRIDEViewer Gel Separation
![Page 9: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/9.jpg)
PRIDEViewer Mascot interface
![Page 10: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/10.jpg)
One Example: Identification using 5 peptides
![Page 11: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/11.jpg)
Example Mascot output
![Page 12: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/12.jpg)
Another example: 350 input spectra
![Page 13: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/13.jpg)
Validation study
• Starting from one public proteomics repository (EBI PRIDE):
– Retrieve a set of available experiments.
– Check the level of completeness of the experiments.
– Repeat the protein and peptide identification.
VALIDATE THE EXPERIMENT…
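The completeness check in the steps above can be sketched as a simple filter. This is only an illustration: the `Experiment` record, its two flags and the sample entries are hypothetical and are not part of PRIDE or PRIDEViewer. The idea is that an experiment can only be re-validated if it carries both the spectra (to repeat the search) and the reported identifications (to compare against).

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    """Hypothetical summary of one downloaded experiment."""
    accession: int
    has_spectra: bool
    has_identifications: bool


def is_checkable(exp: Experiment) -> bool:
    # Repeating the search needs the spectra; comparing the outcome
    # needs the identifications that were originally reported.
    return exp.has_spectra and exp.has_identifications


# Invented records, used only to exercise the filter.
experiments = [
    Experiment(accession=1, has_spectra=True, has_identifications=True),
    Experiment(accession=2, has_spectra=False, has_identifications=True),
    Experiment(accession=3, has_spectra=True, has_identifications=False),
]

checkable = [e.accession for e in experiments if is_checkable(e)]
print(checkable)  # -> [1]
```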
![Page 15: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/15.jpg)
PRIDE: Searching experiments: Biomart
![Page 16: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/16.jpg)
Validation. First Round. Biomart
![Page 17: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/17.jpg)
Validation – First Round: PRIDE Accession 1642
![Page 18: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/18.jpg)
First View: Mascot Results
![Page 19: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/19.jpg)
Validation – First Round: PRIDE Accession 1642

| Protein Id | Database | Peptide Count | Identified |
| --- | --- | --- | --- |
| IPI00295598 | IPI | 2 | No |
| Q15843 | SwissProt | 6 | Yes |
| P62491 | SwissProt | 1 | No |

Why? If we explore the data, we’ll find…

| Protein Id | First Peptide | PRIDE mass | Calculated mass |
| --- | --- | --- | --- |
| IPI00295598 | VISEPGEAEVFMTPEDFVR | 2184.0375 | 2152.0267 |
| Q15843 | EIGPPQQQR | 1052.5697 | 1052.5483 |
| P62491 | DHADSNIVIMLVGNK | 1657.8186 | 1625.8316 |

Delta mass around 32 Da
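The mass comparison above can be reproduced with a few lines of Python. This is a sketch under one assumption: that the "Calculated mass" column is the singly protonated monoisotopic mass, [M+H]+, of the unmodified peptide. The standard monoisotopic residue masses below reproduce the three calculated values under that assumption. The names `RESIDUE_MASS`, `mh_plus` and `deltas` are illustrative only.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565   # H2O added for the free termini
PROTON = 1.007276   # one proton for the [M+H]+ ion


def mh_plus(sequence: str) -> float:
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + PROTON


# First peptide of each protein and the mass reported in the PRIDE file.
peptides = {
    "IPI00295598": ("VISEPGEAEVFMTPEDFVR", 2184.0375),
    "Q15843": ("EIGPPQQQR", 1052.5697),
    "P62491": ("DHADSNIVIMLVGNK", 1657.8186),
}

deltas = {}
for protein, (sequence, pride_mass) in peptides.items():
    calculated = mh_plus(sequence)
    deltas[protein] = pride_mass - calculated
    print(f"{protein}: calculated {calculated:.4f}, "
          f"delta {deltas[protein]:+.4f} Da")
```

Only the two unidentified proteins show the shift of about 32 Da; the identified one (Q15843) differs by roughly 0.02 Da.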
![Page 20: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/20.jpg)
Validation – First Round: PRIDE Accession 1642
• Hypothesis…
– The first and third sequences show a mass variation of around 32 Da.
• Is there a modification at the C or N terminus? If so, the second sequence would show it as well.
• Is one residue, or more than one, modified?
• We’ll extract the common amino acids: D, A, S, I, V, M and G.
• Compare them with the described modifications that have a mass variation of 32 Da.
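The residue extraction can be done with set operations on the peptide sequences from the first-round table. The sketch below is illustrative. The second step, removing residues that also occur in the unshifted peptide, is an extra assumption that holds only if the modification affects every occurrence of its target residue.

```python
# Peptides that show the ~32 Da shift, and the one that does not.
shifted = ["VISEPGEAEVFMTPEDFVR", "DHADSNIVIMLVGNK"]
unshifted = "EIGPPQQQR"

# Residues present in both shifted peptides.
common = set(shifted[0]) & set(shifted[1])
print(sorted(common))    # -> ['A', 'D', 'G', 'I', 'M', 'S', 'V']

# Assumption: a residue that also occurs in the unshifted peptide is
# unlikely to be the modified one, so drop it from the candidates.
narrowed = common - set(unshifted)
print(sorted(narrowed))  # -> ['A', 'D', 'M', 'S', 'V']
```

The remaining candidates are then compared against the known modifications with a mass shift of about 32 Da.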
![Page 21: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/21.jpg)
Validation – First Round: PRIDE Accession 1642
Only this modification could explain a property common to both sequences.
So, we’ll select it in the next round.
![Page 22: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/22.jpg)
Validation – First Round: PRIDE Accession 1642
![Page 23: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/23.jpg)
Validation – Second Round: Latest Experiments. Retrieved by hand
![Page 24: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/24.jpg)
Validation – Second Round: Latest experiments
• PRIDE accession id: 10470 to 11257 (787 experiments).
– None is suitable to check.
– No information regarding the identification is available.
• PRIDE accession id: 10000 to 10074 (74 experiments).
– One dataset could be checked: 10042 to 10060. (Dataset title: Low abundance proteome of human red blood cells captured by combinatorial peptide libraries)
![Page 25: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/25.jpg)
PRIDE Accession 10053
![Page 26: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/26.jpg)
Mascot output
![Page 27: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/27.jpg)
PRIDE Accession 10060
![Page 28: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/28.jpg)
Mascot output: No identification
![Page 29: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/29.jpg)
Validation – Third Round: Recent Experiments. Retrieved by hand
• Experiment id: 9900 to 9999
• Two datasets are suitable to check:
– 9900 to 9942: LC-MALDI experiments (Tannerella forsythia).
– 9944 to 9949: Rattus norvegicus.
– 9984: Zebrafish. No spectra.
– 9985 to 9992: Homo sapiens. (No identifications).
– 44 not available.
![Page 30: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/30.jpg)
Validation – Third Round: Experiment 9900
![Page 31: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/31.jpg)
Validation – Third Round: Experiment 9900
![Page 32: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/32.jpg)
Validation – Third Round: Experiment 9900. Summary

| Protein Id | Peptide Count | Identified | 1st Peptide Mass | Theoretical Mass |
| --- | --- | --- | --- | --- |
| TF2239 | 1 | No | 1228.5463 | 1228.6433 |
| TF26612 | 13 | Yes | -- | -- |
| TF1259 | 1 | No | 1271.6478 | 1271.6783 |
| TF2116 | 4 | No | 1139.5835 | 1139.6208 |
| TF1741 | 16 | No | 1044.5144 | 1044.5473 |
| TF0447 | 2 | No | 1092.4619 | 1092.5432 |
| TF2663 | 7 | Yes | -- | -- |
| TF2592 | 2 | No | 1022.5306 | 1022.5782 |

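To make the deviations in this summary easier to compare, the mass error of each unidentified protein's first peptide can be expressed in parts per million. This is a sketch using the standard ppm formula and the values from the summary. The function name `ppm_error` is illustrative, and no particular search tolerance is assumed.

```python
# (protein id, 1st peptide mass, theoretical mass) for the rows marked "No".
rows = [
    ("TF2239", 1228.5463, 1228.6433),
    ("TF1259", 1271.6478, 1271.6783),
    ("TF2116", 1139.5835, 1139.6208),
    ("TF1741", 1044.5144, 1044.5473),
    ("TF0447", 1092.4619, 1092.5432),
    ("TF2592", 1022.5306, 1022.5782),
]


def ppm_error(measured: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


errors = {pid: ppm_error(m, t) for pid, m, t in rows}
for pid, err in errors.items():
    print(f"{pid}: {err:+.1f} ppm")
```

All six errors are negative, that is, every reported mass lies below its theoretical value. Whether this reflects a calibration offset or the search settings cannot be decided from the table alone.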
![Page 33: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/33.jpg)
Study summary
• Around 1000 PRIDE experiments were downloaded from the PRIDE central repository.
• Around 100 of them were suitable to test.
• Fewer than 50% were successfully validated.
![Page 34: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/34.jpg)
In summary
• There is a lot of data within the repositories (PRIDE).
• There is a lot of missing information.
• It is not possible to check the data automatically.
• PRIDEViewer could help us save a lot of time.
![Page 35: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/35.jpg)
Protein Set
• At other times, a mistake in the identification is less significant, provided we can still reach the goal of the experiment.
• For instance, proteins involved in a particular function or biological process.
![Page 36: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/36.jpg)
DB id Protein Name
gi|12857455 Heat shock protein
gi|14017768 FKB9_HUMAN
gi|12836587 Tubulin alpha homo sapiens
gi|15010550 Ubiquitin specific protease
gi|15489190 vinculin isoform VCL Homo sapiens
gi|9963904 selenium binding protein 1 Homo sapiens
… …
PIKE http://proteo.cnb.csic.es/
PIKE: Protein Information and Knowledge extractor
![Page 41: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/41.jpg)
PIKE output. CSV
![Page 42: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/42.jpg)
PIKE output
![Page 43: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/43.jpg)
First example medium-complexity protein list (containing 57 proteins)
J Proteome Res. 2005 Nov-Dec;4(6):2435-41.
![Page 44: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/44.jpg)
First example medium-complexity protein list (containing 57 proteins)

| # | Entry name | Entry ID (UniProt ID) | Manual searching | PIKE output (keywords only) |
| --- | --- | --- | --- | --- |
| 6 | Integrin alpha-5 precursor | P08648 | 1 TM | Transmembrane |
| 7 | Sodium/potassium-transporting ATPase alpha-1 chain precursor | P05023 | 10 TM | Transmembrane |
| 8 | Short transient receptor potential channel 4 | Q9UBN4 | 8 TM | Transmembrane |
| 10 | Band 3 anion transport protein | P02730 | 11 TM | Transmembrane |
| 11 | Transferrin receptor protein 1 | P02786 | 1 TM | Transmembrane |
| 17 | calnexin precursor | P27824 | 1 TM | Transmembrane |
| 19 | 5'-nucleotidase precursor | P21589 | 1 TM; GPI | GPI-anchor |
| 21 | Alkaline phosphatase, placental type precursor | P05187 | GPI | Transmembrane; GPI-anchor |
| 22 | 4F2 cell-surface antigen heavy chain | P08195 | 1 TM | Transmembrane |
| 24 | Solute carrier family 2, facilitated glucose transporter, member 1 | P11166 | 12 TM | Transmembrane |
| 29 | chloride intracellular channel protein 5 | Q9NZA1 | | Transmembrane |
| 30 | 3beta-hydroxy-Delta5-steroid dehydrogenase multifunctional protein I | P14060 | 1 TM | Transmembrane |
| 41 | myristoylated alanine-rich C-kinase substrate | P29966 | Myristoylation | Myristate |
| 42 | Basigin precursor | P35613 | 1 TM | Transmembrane |
| 47 | Brain acid soluble protein 1 | P80723 | Myristoylation | Transmembrane; Myristate |
| 51 | ADP-ribosylation factor 1 | P84077 | | Transmembrane; Myristate |

![Page 45: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/45.jpg)
Second example Human Plasma Proteins from PRIDE (HPPP). PRIDE Accession 65
25 MOST FREQUENT PROTEINS

| Protein | Count |
| --- | --- |
| Serum albumin [Precursor] - Serum albumin - ALB | 356 |
| Complement C3 [Precursor] | 273 |
| IGHA1 protein | 225 |
| Calcium/calmodulin-dependent protein kinase kinase 2 | 100 |
| Inter-alpha-trypsin inhibitor heavy chain H1-H4 [Precursor] | 99 |
| Putative uncharacterized protein | 97 |
| IGL@ protein | 96 |
| ARF GTPase-activating protein GIT2 | 90 |
| Complement factor B [Precursor] | 90 |
| PRO2275 | 90 |
| IGHM protein | 78 |
| IGKC protein | 64 |
| Alpha-1B-glycoprotein [Precursor] | 62 |
| cDNA FLJ14473 fis, clone MAMMA1001080. | 58 |
| CDNA FLJ25298 fis, clone STM07683. | 58 |
| Fibronectin [Precursor] | 58 |
| IGHD protein | 56 |
| Trypsin | 55 |
| Apolipoprotein-L1 [Precursor] | 54 |
| HP protein | 53 |
| Alpha-2-macroglobulin [Precursor] | 52 |
| SNC66 protein | 52 |
| Ig kappa chain V-III region HAH [Precursor] | 50 |

PROTEIN COUNT: 2226
REDUNDANCY RATIO (Protein count/non redundant entries): 89.04%
![Page 46: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/46.jpg)
Third example The Human Plasma Proteome: A non redundant list:
Mol Cell Proteomics. 2004 Apr;3(4):311-26. Epub 2004 Jan 12.
>> We have merged four different views of the human plasma proteome, based on different methodologies, into a single nonredundant list of 1175 distinct gene products ….
![Page 47: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/47.jpg)
Third example The Human Plasma Proteome: A non redundant list:
Mol Cell Proteomics. 2004 Apr;3(4):311-26. Epub 2004 Jan 12.
![Page 48: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/48.jpg)
Conclusion
• PIKE represents a suitable and useful bioinformatics tool for small- or large-scale proteomics projects.
• PIKE's main characteristic is its ability to systematically access and automatically retrieve comprehensive biological information contained in common databases.
• The resulting information is output in a wide range of standard formats that can be directly viewed, exported, or downloaded for additional analysis.
![Page 49: Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815668550346895dc41991/html5/thumbnails/49.jpg)
Questions?