Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9


Post on 18-Dec-2015


Page 1: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Proteogenomics

Kelly Ruggles, Ph.D. Proteomics Informatics

Week 9

Page 2: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

As the cost of high-throughput genome sequencing goes down, whole-genome, exome and RNA sequencing can be easily obtained for most proteomics experiments.

In combination with mass spectrometry-based proteomics, sequencing can be used for:
1. Genome annotation
2. Studying the effect of genomic variation in the proteome
3. Biomarker identification

Proteogenomics: Intersection of proteomics and genomics

Page 3: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Proteogenomics: Intersection of proteomics and genomics

First published in 2004: “Proteogenomic mapping as a complementary method to perform genome annotation” (Jaffe JD, Berg HC and Church GM), which used peptide mass spectrometry data to better annotate the Mycoplasma pneumoniae genome

Renuse S, Chaerkady R and A Pandey, Proteomics. 11(4) 2011

Page 4: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Proteogenomics

• In the past, computational algorithms were commonly used to predict and annotate genes.
– Limitations: short genes are missed, alternative splicing is difficult to predict, transcription vs. translation (cDNA predictions)
• With mass spectrometry we can
– Confirm existing gene models
– Correct gene models
– Identify novel genes and splice isoforms

Renuse S, Chaerkady R and A Pandey, Proteomics. 11(4) 2011

Essentials for Proteogenomics

Page 5: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Proteogenomics

1. Genome annotation
2. Studying the effect of genomic variation in the proteome
3. Proteogenomic mapping

Page 6: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Proteogenomics

1. Genome annotation
2. Studying the effect of genomic variation in the proteome
3. Proteogenomic mapping

Page 7: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Proteogenomics Workflow

Krug K, Nahnsen S, Macek B, Molecular Biosystems 2010

Renuse S, Chaerkady R and A Pandey, Proteomics. 11(4) 2011

Page 8: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Protein Sequence Databases

• Identification of peptides from MS relies heavily on the quality of the protein sequence database (DB)

• DBs with missing peptide sequences will fail to identify the corresponding peptides

• DBs that are too large will have low sensitivity

• Ideal DB is complete and small, containing all proteins in the sample and no irrelevant sequences

Page 9: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Genome Sequence-based database for genome annotation

Reference protein DB

Compare, score, test significance

annotated peptides

6 frame translation of genome sequence

Compare, score, test significance

annotated + novel peptides

MS/MS spectrum (intensity vs. m/z)

Page 10: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Creating 6-frame translation database

ATGAAAAGCCTCAGCCTACAGAAACTCTTTTAATATGCATCAGTCAGAATTTAAAAAAAAAATC

M K S L S L Q K L F * Y A S V R I * K K N

* K A S A Y R N S F N M H Q S E F K K K I

E K P Q P T E T L L I C I S Q N L K K K S

H F A E A * L F E K L I C * D S N L F F I

S F G * G V S V R K I H M L * F K F F F D

F L R L R C F S K * Y A D T L I * F F F G

Positive Strand

Negative Strand

Software:
• Peppy: creates the database + searches MS, Risk BA, et al. (2013)
• BCM Search Launcher: web-based, Smith et al. (1996)
• InsPecT: perl script, Tanner et al. (2005)

Page 11: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Genome Annotation Example 1: A. gambiae

Renuse S, Chaerkady R and A Pandey, Proteomics. 11(4) 2011

Peptides mapping to annotated 3’ UTR

Peptides mapping to novel exon within an existing gene

Page 12: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Genome Annotation Example 1: A. gambiae

Renuse S, Chaerkady R and A Pandey, Proteomics. 11(4) 2011

Peptides mapping to unannotated gene

related strain

Page 13: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Armengaud J, Curr. Opin Microbiology 12(3) 2009

Genome Annotation Example 2: Correcting Mis-annotations

currently annotated genes

peptide mapping to nucleic acid sequence

manual validation of mis-annotation

A. Hypothetical protein confirmed
B. Confirm unannotated gene
C. Initiation codon is downstream
D. Initiation codon is upstream
E. Peptides indicate the gene frame is wrong
F. Peptides indicate that gene is on the wrong strand
G. In-frame stop codon or frameshift found

Page 14: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

RNA sequence-based database for alternative splicing identification

RNA-Seq junction DB

Compare, score, test significance

Identification of novel splice isoforms

MS/MS spectrum (intensity vs. m/z)

Page 15: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Annotation of organisms which lack genome sequencing

Compare, score, test significance

Identification of potential protein coding regions

Reference DB of related species

MS/MS spectrum (intensity vs. m/z)

De novo MS/MS sequencing

Page 16: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Proteogenomics: Genome Annotation Summary

Renuse S, Chaerkady R and A Pandey, Proteomics. 11(4) 2011

Page 17: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Proteogenomic Genome Annotation Summary

Renuse S, Chaerkady R and A Pandey, Proteomics. 11(4) 2011

Page 18: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Proteogenomics

1. Genome annotation
2. Studying the effect of genomic variation in the proteome
3. Proteogenomic mapping

Page 19: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Single nucleotide variant database for variant protein identification

Compare, score, test significance

Identification of variant proteins

MS/MS spectrum (intensity vs. m/z)

TCGAGAGCTGTCGAGAGCTGTCGAGAGCTGTCGAGAGCTGTCGAGAGCTGTCGATAGCTG

Exon 1

Variants predicted from genome sequencing

Reference protein DB

+ Variant DB

Page 20: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Creating variant sequence DB

VCF File Format

# Meta-information lines

Columns:
1. Chromosome
2. Position
3. ID (ex: dbSNP)
4. Reference base
5. Alternative allele
6. Quality score
7. Filter (PASS=passed filters)
8. Info (ex: SOMATIC, VALIDATED..)
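The eight VCF columns listed on this slide can be read with a short parser. This is a minimal sketch that assumes tab-separated fields and skips every line starting with "#". The `Variant` type, the `parse_vcf` function and the example records are hypothetical and only serve the illustration.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int            # 1-based position on the chromosome
    variant_id: str     # e.g. a dbSNP identifier, or "." if none
    ref: str
    alts: list          # one or more alternative alleles
    qual: str
    filter_status: str  # "PASS" when the call passed all filters
    info: dict


def parse_info(field):
    """Turn 'SOMATIC;DP=30' into {'SOMATIC': True, 'DP': '30'}."""
    info = {}
    if field == ".":
        return info
    for item in field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def parse_vcf(lines, passed_only=True):
    """Parse VCF data lines, skipping meta-information and header lines."""
    variants = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, pos, vid, ref, alt, qual, flt, info = line.split("\t")[:8]
        if passed_only and flt != "PASS":
            continue
        variants.append(
            Variant(chrom, int(pos), vid, ref, alt.split(","), qual, flt, parse_info(info))
        )
    return variants


# Hypothetical records, made up for this example.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t12345\t.\tA\tG\t50\tPASS\tSOMATIC;DP=30",
    "1\t22222\t.\tC\tT\t10\tLowQual\tDP=5",
]

if __name__ == "__main__":
    for variant in parse_vcf(EXAMPLE_VCF):
        print(variant)
```

Keeping only PASS calls by default keeps low-confidence variants out of the search database, in line with the earlier point that the database should stay small.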

Page 21: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Creating variant sequence DB

…GTATTGCAAAAATAAGATAGAATAAGAATAATTACGACAAGATTC…

……

…CTATTGCAAAAATACGATAGCATAAGAATAGTTACGACAAGATTC…

Add in variants within exon boundaries

In silico translation

EXON 1 EXON 2

…LLQKYDSIRIVTTRF…

Variant DB
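The two steps on this slide, adding variants inside exon boundaries and then translating in silico, can be sketched as follows. The sequences are the excerpts shown on the slide. As assumptions for this example, positions are counted from the first base of the excerpt (1-based, as in VCF) and the reading frame starts at that base. `apply_snvs` and `SLIDE_SNVS` are illustrative names.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDDDEEEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def apply_snvs(sequence, snvs):
    """Apply (position, ref, alt) single nucleotide variants; positions are 1-based."""
    bases = list(sequence)
    for pos, ref, alt in snvs:
        if bases[pos - 1] != ref:
            raise ValueError(f"reference base mismatch at position {pos}")
        bases[pos - 1] = alt
    return "".join(bases)


def translate(dna):
    """Translate complete codons only, starting at the first base."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


# Reference excerpt from the slide and the four differences to the variant excerpt.
SLIDE_REFERENCE = "GTATTGCAAAAATAAGATAGAATAAGAATAATTACGACAAGATTC"
SLIDE_SNVS = [(1, "G", "C"), (15, "A", "C"), (21, "A", "C"), (31, "A", "G")]

if __name__ == "__main__":
    variant_seq = apply_snvs(SLIDE_REFERENCE, SLIDE_SNVS)
    print(variant_seq)
    print(translate(variant_seq))
```

Checking the reference base before substituting catches coordinate mix-ups (0-based vs. 1-based, wrong strand) early, before a wrong protein sequence ends up in the variant DB.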

Page 22: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Splice junction database for novel exon, alternative splicing identification

Compare, score, test significance

Identification of novel splice proteins

MS/MS spectrum (intensity vs. m/z)

Intron/Exon boundaries from RNA sequencing

Reference protein DB + RNA-Seq junction DB

Exon 1 Exon 2 Exon 3

Alt. Splicing Novel Expression

Exon 1 Exon X Exon 2

Page 23: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Creating splice junction DB

BED File Format

Columns:
1. Chromosome
2. Chromosome Start
3. Chromosome End
4. Name
5. Score
6. Strand (+ or -)
7-9. Display info
10. # blocks (exons)
11. Size of blocks
12. Start of blocks
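To show how the block columns encode a splice junction, the sketch below converts one 12-column BED line into genomic exon blocks and the intron between them. It assumes the usual BED conventions: 0-based, half-open coordinates, and block starts given relative to the chromosome start. The function names and the example line are hypothetical.

```python
def parse_bed12(line):
    """Parse one 12-column BED line into a dict with absolute block coordinates."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, name, strand = fields[0], int(fields[1]), fields[3], fields[5]
    # Block sizes and starts are comma-separated; a trailing comma is allowed.
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    offsets = [int(x) for x in fields[11].rstrip(",").split(",")]
    blocks = [(start + off, start + off + size) for off, size in zip(offsets, sizes)]
    return {"chrom": chrom, "name": name, "score": fields[4], "strand": strand, "blocks": blocks}


def junctions(blocks):
    """Return (intron_start, intron_end) for each pair of adjacent blocks."""
    return [(blocks[i][1], blocks[i + 1][0]) for i in range(len(blocks) - 1)]


# Hypothetical junction record: two blocks of 50 and 60 bp flanking one intron.
EXAMPLE_BED = "chr1\t1000\t2000\tJUNC1\t12\t+\t1000\t2000\t255,0,0\t2\t50,60\t0,940"

if __name__ == "__main__":
    record = parse_bed12(EXAMPLE_BED)
    print(record["blocks"])
    print(junctions(record["blocks"]))
```

For a junction file, each record typically has two blocks, so `junctions` returns a single intron whose ends are the donor and acceptor positions to compare against known exon boundaries.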

Page 24: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Creating splice junction DB

Junction bed file

Map to known intron/exon boundaries

Exon 1 Exon 2

1. Annotated splicing
2. Unannotated alternative splicing
3. One end matches, one within exon
4. One end matches, one within intron
5. No matching exons

Bed file with new gene mapping

Intronic region

Exon 1 Exon 2 Exon 3

Exon 1 Exon 2 Exon 1 Exon 2
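The five junction categories on this slide can be expressed as a small decision function. This is a sketch under simplifying assumptions, not the implementation of any tool named in these slides: it looks at one gene on the forward strand, uses half-open exon coordinates, and treats any position outside the known exons as intronic. All names are illustrative.

```python
def classify_junction(donor, acceptor, exons, annotated_junctions):
    """Return the category number (1-5) for one RNA-Seq junction.

    donor and acceptor are the intron start and end; exons is a list of
    (start, end) pairs; annotated_junctions is a set of (donor, acceptor)
    pairs that occur in annotated transcripts.
    """
    exon_ends = {end for _, end in exons}
    exon_starts = {start for start, _ in exons}
    donor_ok = donor in exon_ends
    acceptor_ok = acceptor in exon_starts

    def inside_exon(pos):
        return any(start <= pos <= end for start, end in exons)

    if donor_ok and acceptor_ok:
        # Both ends are known boundaries: annotated pair or a new combination.
        return 1 if (donor, acceptor) in annotated_junctions else 2
    if donor_ok or acceptor_ok:
        unmatched = acceptor if donor_ok else donor
        return 3 if inside_exon(unmatched) else 4
    return 5


# Hypothetical three-exon gene used only for this example.
EXONS = [(100, 200), (300, 400), (500, 600)]
ANNOTATED = {(200, 300), (400, 500)}

if __name__ == "__main__":
    for junction in [(200, 300), (200, 500), (200, 350), (200, 450), (150, 450)]:
        print(junction, classify_junction(*junction, EXONS, ANNOTATED))
```

In the example, (200, 500) skips the middle exon, so both ends match known boundaries but the pair is not annotated, which puts it in category 2.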

Page 25: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Fusion protein identification

Compare, score, test significance

Identification of fusion proteins

MS/MS spectrum (intensity vs. m/z)

Reference protein DB + Fusion Gene DB

Chr 1: Gene X Exon 1, Gene X Exon 2

Chr 2: Gene Y Exon 1, Gene Y Exon 2

Fusion: Gene X Exon 1, Gene Y Exon 2

Page 26: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Fusion Genes

Fusion Location

.…AGAACTGGAAGAATTGG*AATGGTAGATAACGCAGATCATCT..…

Find consensus sequence

6 frame translation FASTA

Page 27: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Informatics tools for customized DB creation

• QUILTS: perl/python based tool to generate DB from genomic and RNA sequencing data (Fenyo lab)

• customProDB: R package to generate DB from RNA-Seq data (Zhang B, et al.)

• Splice-graph database creation (Bafna V. et al.)

Page 28: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Proteogenomics and Human Disease: Genomic Heterogeneity

•Whole genome sequencing has uncovered millions of germline variants between individuals

•Genomic and proteomic studies typically use a reference database to model the general population, masking patient-specific variation

Nature October 28, 2010

Page 29: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Proteogenomics and Human Disease: Cancer Proteomics

Cancer is characterized by altered expression of tumor drivers and suppressors

• Results from gene mutations causing changes in protein expression, activity

• Can influence diagnosis, prognosis and treatment

Cancer proteomics
• Are genomic variants evident at the protein level?
• What is their effect on protein function?
• Can we classify tumors based on protein markers?

Page 30: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Tumor Specific Proteomic Variation

Stephens, et al. Complex landscape of somatic rearrangement in human breast cancer genomes.

Nature 2009

Nature April 15, 2010

Page 31: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Personalized Database for Protein Identification

MS/MS spectrum (intensity vs. m/z)

Protein DB

Compare, score, test significance

Somatic Variants
SVATGSSEAAGGASGGGARGQVAGTMKIEIAQYRDSGSYGQSGGEQQREETSDFAEPTTCITNNQHSEPRDPRFIKGWFCFIISAR….

Germline Variants
MQYAPNTQVEIIPQGRSSAEVIAQSRASSSIIINESEPTTNIQIRQRAQEAIIQISQAISIMETVKSSPVEFECINDKSPAPGMAIGSGR…

Identified peptides and proteins

Page 32: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Personalized Database for Protein Identification

MS/MS spectrum (intensity vs. m/z)

Tumor Specific Protein DB

Compare, score, test significance

+ tumor specific + patient specific peptides

RNA-Seq

Genome Sequencing

Identified peptides and proteins

Page 33: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Tumor Specific Protein Databases

Tumor Specific Protein DB, built from:

• Reference Human Database (Ensembl)
• Non-tumor sample: genome sequencing to identify germline variants
• Tumor sample: genome sequencing and RNA-Seq to identify alternative splicing, somatic variants and novel expression

Figure panels:

– Variants: Exon 1, TCGAGAGCTGTCGAGAGCTGTCGAGAGCTGTCGAGAGCTGTCGAGAGCTGTCGATAGCTG
– Alt. Splicing: Exon 1, Exon 2, Exon 3
– Novel Expression: Exon 1, Exon X, Exon 2
– Fusion Genes: Gene X (Exon 1, Exon 2), Gene Y (Exon 1, Exon 2)

Page 34: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Proteogenomics and Biomarker Discovery

• Tumor-specific peptides identified by MS can be used as sensitive drug targets or diagnostic tools
– Fusion proteins
– Protein isoforms
– Variants

• Effects of genomic rearrangements on protein expression can elucidate cancer biology

Page 35: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Proteogenomics

1. Genome annotation
2. Studying the effect of genomic variation in the proteome
3. Proteogenomic mapping

Page 36: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Proteogenomic mapping

• Map back observed peptides to their genomic location.

• Use to determine:
– Exon location of peptides
– Proteotypic
– Novel coding region
– Visualize in genome browsers
– Quantitative comparison based on genomic location
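Mapping a peptide back to its genomic location amounts to converting its amino acid offset within the protein into nucleotide offsets within the coding sequence, then walking the exons. The sketch below is not how PGx or any other tool is implemented. It assumes a forward-strand gene, half-open 0-based exon coordinates that cover only coding sequence, and a peptide that occurs once in the protein. All names and coordinates are hypothetical.

```python
def peptide_to_genomic(peptide, protein, cds_exons):
    """Return genomic (start, end) blocks covered by the peptide.

    cds_exons lists the coding part of each exon in transcript order as
    half-open (start, end) pairs on the forward strand.
    """
    aa_start = protein.find(peptide)
    if aa_start < 0:
        return []
    # Offsets of the peptide within the spliced coding sequence.
    nt_start = 3 * aa_start
    nt_end = 3 * (aa_start + len(peptide))

    blocks = []
    consumed = 0  # coding bases contained in the exons walked so far
    for exon_start, exon_end in cds_exons:
        exon_len = exon_end - exon_start
        lo = max(nt_start, consumed)
        hi = min(nt_end, consumed + exon_len)
        if lo < hi:
            blocks.append((exon_start + lo - consumed, exon_start + hi - consumed))
        consumed += exon_len
    return blocks


# Hypothetical gene: a 10-residue protein encoded by two exons (12 + 18 nt).
PROTEIN = "MKSLSLQKLF"
CDS_EXONS = [(1000, 1012), (2000, 2018)]

if __name__ == "__main__":
    print(peptide_to_genomic("SLQK", PROTEIN, CDS_EXONS))  # within one exon
    print(peptide_to_genomic("LSLQ", PROTEIN, CDS_EXONS))  # spans the junction
```

A peptide that spans an exon junction comes back as two blocks, which is the form a genome browser track (for example BED) needs to draw it across the intron.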

Page 37: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Informatics tools for proteogenomic mapping

• PGx: python-based tool, maps peptides back to genomic coordinates using user defined reference database (Fenyo lab)

• The Proteogenomic Mapping Tool: Java-based search of peptides against a six-frame translated sequence database (Sanders WS, et al.).

Page 38: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

PGx: Proteogenomic mapping tool

Peptides

Sample specific protein database

Peptides mapped onto genomic coordinates

Manor Askenazi, David Fenyo

Log Fold Change in Expression (10,000 bp bins)

Copy Number Variation

Methylation Status

Exon Expression (RNA-Seq)

Number of Genes/Bin

Peptides

Page 39: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Variant Peptide Mapping

SVATGSSEAAGGASGGGAR

SVATGSSETAGGASGGGAR

ACG->GCG

Peptides with single amino acid changes corresponding to germline and somatic variants

ENSEMBL Gene

Tumor Peptide

Reference Peptide
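The two peptides on this slide differ at one residue, and the codon change ACG->GCG (threonine to alanine) accounts for it. Since ACG encodes threonine, the peptide containing T is taken to be the reference and the one containing A the tumor peptide. This is an inference from the codon change, as the slide's labels are not attached to specific sequences. The sketch below finds the substitution and lists the single-nucleotide codon changes that could explain it, using the standard genetic code. The function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDDDEEEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def aa_substitutions(reference, observed):
    """Return (1-based position, reference residue, observed residue) differences."""
    if len(reference) != len(observed):
        raise ValueError("peptides must have the same length")
    return [(i + 1, r, o) for i, (r, o) in enumerate(zip(reference, observed)) if r != o]


def single_base_codon_changes(ref_aa, alt_aa):
    """List (ref_codon, alt_codon) pairs that differ at exactly one base."""
    ref_codons = [c for c, aa in CODON_TABLE.items() if aa == ref_aa]
    alt_codons = [c for c, aa in CODON_TABLE.items() if aa == alt_aa]
    return [
        (rc, ac)
        for rc in ref_codons
        for ac in alt_codons
        if sum(a != b for a, b in zip(rc, ac)) == 1
    ]


REFERENCE_PEPTIDE = "SVATGSSETAGGASGGGAR"
TUMOR_PEPTIDE = "SVATGSSEAAGGASGGGAR"

if __name__ == "__main__":
    for pos, ref_aa, alt_aa in aa_substitutions(REFERENCE_PEPTIDE, TUMOR_PEPTIDE):
        print(pos, ref_aa, alt_aa, single_base_codon_changes(ref_aa, alt_aa))
```

If no single-base codon change can explain a substitution, the match is less likely to reflect a single nucleotide variant and deserves a closer look at the spectrum.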

Page 40: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Novel Peptide Mapping

Peptides corresponding to RNA-Seq expression in non-coding regions

ENSEMBL Gene

Tumor Peptide

Tumor RNA-Seq

Page 41: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Proteogenomic integration

Maps genomic, transcriptomic and proteomic data to same coordinate system including quantitative information

Variants

Proteomic Quantitation

RNA-Seq Data

Proteomic Mapping

Predicted gene expression

Page 42: Proteogenomics Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

Questions?