Generalized Protein Parsimony and Spectral Counting for Functional Enrichment Analysis
Nathan Edwards
Department of Biochemistry and Molecular & Cellular Biology
Georgetown University Medical Center
Systems Biology
Structured High-Throughput Experiments
Knowledge Databases
molecular biology ↕ phenotype
Systems Biology
Knowledge Databases
• Localization • Function • Process • Interactions • Pathway • Mutation
Structured High-Throughput Experiments
• Proteomics • Sequencing • Microarrays • Metabolomics
molecular biology ↕ biology
molecular biology ↕ phenotype
Systems Biology
Mathematical Models
Knowledge Databases
• Localization • Function • Process • Interactions • Pathway • Mutation
Structured High-Throughput Experiments
• Proteomics • Sequencing • Microarrays • Metabolomics
molecular biology ↕ biology
molecular biology ↕ phenotype
Systems Biology
Mathematical Models
Knowledge Databases
• Localization • Function • Process • Interactions • Pathway • Mutation
Structured High-Throughput Experiments
• Proteomics • Sequencing • Microarrays • Metabolomics
Functional Annotation Enrichment
molecular biology ↕ biology
molecular biology ↕ phenotype
Functional Annotation Enrichment
Draw 10 of 50!
In any draw, we expect: ~5 "evens", ~2 "≤ 10", etc.
Each ball is equally likely
Balls are independent
p-value is surprise!
For transcriptomics:
Genes ↔ Balls
Genome ↔ Tumbler
Diff. Expr. ↔ Draw
Annotation ↔ "evens", …
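The tumbler picture corresponds to a hypergeometric tail probability. Below is a minimal Python sketch, assuming the balls are numbered 1 to 50 (so 25 are "evens" and 10 are "≤ 10"); the function name `enrichment_pvalue` and the example of 9 observed evens are illustrative choices, not taken from the talk.

```python
from math import comb

def enrichment_pvalue(population, annotated, draws, observed):
    """P(X >= observed) when `draws` balls are taken without replacement
    from `population` balls, `annotated` of which carry the label."""
    upper = min(annotated, draws)
    tail = sum(comb(annotated, k) * comb(population - annotated, draws - k)
               for k in range(observed, upper + 1))
    return tail / comb(population, draws)

# Assumed setup: balls numbered 1..50, draw 10 of them.
expected_evens = 10 * 25 / 50   # expected number of even balls in a draw
expected_le_10 = 10 * 10 / 50   # expected number of balls numbered <= 10

# "Surprise" of seeing 9 or more evens among the 10 drawn balls.
p_nine_evens = enrichment_pvalue(50, 25, 10, 9)
print(expected_evens, expected_le_10, p_nine_evens)
```

In the transcriptomics reading, `population` would be the genes in the genome, `draws` the differentially expressed genes, and `annotated` the genes carrying a given annotation.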
Why not in proteomics?
Double counting and false positives… …due to traditional protein inference
Proteomics cannot see all proteins… …proteins are not equally likely to be drawn
Good relative abundance is hard… …extra chemistries, workflows, and software …missing values are particularly problematic
In proteomics…
Double counting and false positives… Use generalized protein parsimony
Proteomics cannot see all proteins… Use identified proteins as background
Good relative abundance is hard… Model differential spectral counts directly
Ignore some PSMs
FDR filtering leaves some false PSMs
Enforce strict protein inference criteria
Leave some PSMs uncovered
[Figure: Proteins vs. PSMs, labeled 10% and 90%]
Match uncovered PSMs to FDR
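One way to picture "leave some PSMs uncovered" is a greedy partial cover: repeatedly accept the protein that explains the most still-unexplained PSMs, but only if it contributes at least a minimum number of not-yet-covered peptides. The sketch below is an illustration under our own assumptions (a greedy heuristic, made-up toy data, and hypothetical names such as `greedy_parsimony`); it is not presented as the algorithm used in the talk.

```python
def greedy_parsimony(protein_peptides, peptide_psms, min_unique=2):
    """Greedy partial cover. Repeatedly accept the protein whose not-yet-covered
    peptides carry the most PSMs, provided it adds at least `min_unique`
    not-yet-covered peptides. Returns (accepted proteins, uncovered PSM fraction)."""
    covered = set()
    accepted = []
    candidates = dict(protein_peptides)
    while candidates:
        best, best_gain = None, 0
        for protein, peptides in sorted(candidates.items()):
            new = peptides - covered
            gain = sum(peptide_psms[p] for p in new)
            if len(new) >= min_unique and gain > best_gain:
                best, best_gain = protein, gain
        if best is None:          # no remaining protein meets the criterion
            break
        accepted.append(best)
        covered |= candidates.pop(best)
    total = sum(peptide_psms.values())
    uncovered = sum(n for p, n in peptide_psms.items() if p not in covered)
    return accepted, uncovered / total

# Toy data, made up for illustration: protein -> peptides, peptide -> PSM count.
protein_peptides = {"A": {"p1", "p2", "p3"}, "B": {"p3", "p4"}, "C": {"p5"}}
peptide_psms = {"p1": 10, "p2": 5, "p3": 8, "p4": 2, "p5": 1}

proteins, uncovered_fraction = greedy_parsimony(protein_peptides, peptide_psms, min_unique=2)
print(proteins, round(uncovered_fraction, 3))
```

Raising `min_unique` leaves more PSMs uncovered and lowering it leaves fewer, so the criterion can be tuned until the uncovered fraction is about the same as the PSM-level FDR.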
Plasma membrane enrichment
Pellicle enrichment of plasma membrane
Choksawangkarn et al. JPR 2013 (Fenselau Lab)
Six replicate LC-MS/MS analyses each
Cell-lysate (44,861 MS/MS)
Fe3O4-Al2O3 pellicle (21,871 MS/MS)
625 3-unique proteins to match 10% FDR:
Lysate: 18,976 PSMs; Pellicle: 13,723 PSMs
89 proteins with significantly (p < 10^-5) increased counts
Plasma membrane enrichment
Na/K+ ATPase subunit alpha-1 (P05023):
Lysate: 1; Pellicle: 90; p-value: 5.2 x 10^-33
Transferrin receptor protein 1 (P02786):
Lysate: 17; Pellicle: 63; p-value: 2.0 x 10^-11
DAVID Bioinformatics analysis (89/625):
Plasma membrane (GO:0005886): 29 (5.2 x 10^-5)
Transmembrane (SwissProtKW): 24 (1.3 x 10^-6)
Transmembrane (SwissProtKW):
Lysate: 524; Pellicle: 1335; p-value: 2.6 x 10^-158
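The per-protein comparisons above can be sketched as a one-sided hypergeometric test on spectral counts: given the total PSMs in each sample, how surprising is it that so many of a protein's PSMs fall in the pellicle? This is a minimal sketch assuming a two-sample, one-sided test with the PSM totals from the previous slide as the background; the function name `spectral_count_pvalue` is hypothetical and the exact model behind the reported p-values may differ.

```python
from math import comb

def spectral_count_pvalue(count_a, count_b, total_a, total_b):
    """One-sided hypergeometric tail: probability that at least `count_b` of the
    protein's PSMs fall in sample B if its PSMs were spread across the two
    samples in proportion to total_a : total_b."""
    n = count_a + count_b      # PSMs assigned to this protein
    N = total_a + total_b      # all PSMs in both samples
    tail = sum(comb(total_b, k) * comb(total_a, n - k)
               for k in range(count_b, min(n, total_b) + 1))
    return tail / comb(N, n)   # exact integer arithmetic, converted to float at the end

LYSATE_PSMS, PELLICLE_PSMS = 18_976, 13_723

# Counts taken from the slide: (lysate, pellicle).
p_atpase = spectral_count_pvalue(1, 90, LYSATE_PSMS, PELLICLE_PSMS)
p_tfr = spectral_count_pvalue(17, 63, LYSATE_PSMS, PELLICLE_PSMS)
print(f"Na/K+ ATPase alpha-1: {p_atpase:.1e}")
print(f"Transferrin receptor 1: {p_tfr:.1e}")
```

The same function applies to an annotation category by summing the counts of all proteins that carry it, as in the Transmembrane row (524 vs. 1335).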
A protein's PSMs rise and fall together!
A protein's PSMs rise and fall together?
Anomalies indicate proteoforms
Nascent polypeptide-associated complex subunit alpha
7.3 x 10^-8
Pyruvate kinase isozymes M1/M2
2.5 x 10^-5
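If a protein's PSMs rise and fall together, each peptide's counts should split between two conditions in roughly the same proportion as the protein's total counts. The sketch below flags peptides that deviate from that split. It assumes two conditions, a one-peptide-versus-rest hypergeometric test, and made-up counts; the function names and the two-sided rule are our assumptions, and the multivariate model used in the talk may differ.

```python
from math import comb

def hypergeom_pmf(k, n, K, N):
    """P(k of a peptide's n PSMs come from condition A) when K of the
    protein's N PSMs come from condition A."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def peptide_anomaly_pvalues(peptide_counts):
    """peptide_counts maps peptide -> (count in A, count in B) for one protein.
    For each peptide, return a two-sided p-value: the total probability of all
    splits that are no more likely than the observed one, given the protein totals."""
    total_a = sum(a for a, _ in peptide_counts.values())
    total_b = sum(b for _, b in peptide_counts.values())
    N = total_a + total_b
    pvalues = {}
    for peptide, (a, b) in peptide_counts.items():
        n = a + b
        observed = hypergeom_pmf(a, n, total_a, N)
        lo, hi = max(0, n - total_b), min(n, total_a)
        pvalues[peptide] = sum(
            p for p in (hypergeom_pmf(k, n, total_a, N) for k in range(lo, hi + 1))
            if p <= observed * (1 + 1e-9)   # small tolerance for float ties
        )
    return pvalues

# Made-up counts: pep3 runs against the trend of the other two peptides.
counts = {"pep1": (20, 5), "pep2": (16, 4), "pep3": (2, 18)}
anomaly_pvalues = peptide_anomaly_pvalues(counts)
for peptide, p in anomaly_pvalues.items():
    print(peptide, f"{p:.2e}")
```

A peptide with a very small p-value behaves differently from the rest of its protein, which is the kind of anomaly the slides associate with proteoforms.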
Summary
Functional annotation enrichment for proteomics too:
• Careful counting (generalized parsimony)
• Differential abundance by spectral counts
Use (multivariate-)hypergeometric model for
• Differential abundance by spectral counts
• Proteoform detection
HER2/Neu Mouse Model of Breast Cancer
Paulovich, et al. JPR, 2007
Study of normal and tumor mammary tissue by LC-MS/MS
1.4 million MS/MS spectra
Peptide-spectrum assignments
Normal samples (Nn): 161,286 (49.7%)
Tumor samples (Nt): 163,068 (50.3%)
4270 proteins identified in total
2-unique generalized protein parsimony
Distribution of p-values (Yeast)