Generalized Protein Parsimony and Spectral Counting for Functional Enrichment Analysis Nathan Edwards Department of Biochemistry and Molecular & Cellular Biology Georgetown University Medical Center

Post on 18-Dec-2015


Page 1

Generalized Protein Parsimony and Spectral Counting for Functional Enrichment Analysis

Nathan Edwards
Department of Biochemistry and Molecular & Cellular Biology
Georgetown University Medical Center

Page 2

Systems Biology

Structured High-Throughput Experiments

Knowledge Databases

Page 3

Systems Biology

Knowledge Databases
• Localization
• Function
• Process
• Interactions
• Pathway
• Mutation

Structured High-Throughput Experiments
• Proteomics
• Sequencing
• Microarrays
• Metabolomics

molecular biology ↕ phenotype

molecular biology ↕ biology

Page 4

Systems Biology

Mathematical Models

Knowledge Databases
• Localization
• Function
• Process
• Interactions
• Pathway
• Mutation

Structured High-Throughput Experiments
• Proteomics
• Sequencing
• Microarrays
• Metabolomics

molecular biology ↕ phenotype

molecular biology ↕ biology

Page 5

Systems Biology

Mathematical Models

Knowledge Databases
• Localization
• Function
• Process
• Interactions
• Pathway
• Mutation

Structured High-Throughput Experiments
• Proteomics
• Sequencing
• Microarrays
• Metabolomics

Functional Annotation Enrichment

molecular biology ↕ phenotype

molecular biology ↕ biology

Page 8

Functional Annotation Enrichment

Draw 10 of 50!

• In any draw, we expect: ~5 "evens", ~2 "≤ 10", etc.
• Each ball is equally likely
• Balls are independent
• p-value is surprise!
• For transcriptomics:
  • Genes ↔ Balls
  • Genome ↔ Tumbler
  • Diff. Expr. ↔ Draw
  • Annotation ↔ "evens", …
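The lottery picture on this slide can be made concrete with a hypergeometric calculation. The sketch below is an illustration rather than anything taken from the talk: the function names are made up here, and the choice of "8 or more evens" as the surprising outcome is an arbitrary example. It reproduces the expected counts quoted on the slide (10 × 25/50 = 5 evens, 10 × 10/50 = 2 balls numbered ≤ 10) and computes a p-value as the probability of a draw at least as extreme as the one observed.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(population, marked, draws, k):
    """P(exactly k marked balls) when drawing `draws` balls without replacement."""
    return Fraction(
        comb(marked, k) * comb(population - marked, draws - k),
        comb(population, draws),
    )


def hypergeom_tail(population, marked, draws, k):
    """P(at least k marked balls): the 'surprise' of seeing k or more."""
    upper = min(marked, draws)
    return sum(
        (hypergeom_pmf(population, marked, draws, i) for i in range(k, upper + 1)),
        Fraction(0),
    )


def expected_count(population, marked, draws):
    """Expected number of marked balls in the draw."""
    return Fraction(draws * marked, population)


# Tumbler of 50 balls, draw 10 (the slide's example).
expected_evens = expected_count(50, 25, 10)   # 25 of the 50 balls are even
expected_le_10 = expected_count(50, 10, 10)   # 10 of the 50 balls are <= 10

# Hypothetical observation: 8 of the 10 drawn balls are even.
p_eight_evens = float(hypergeom_tail(50, 25, 10, 8))
print(expected_evens, expected_le_10, p_eight_evens)
```

In the transcriptomics mapping, `population` would be the genes in the genome, `marked` the genes carrying an annotation, and `draws` the differentially expressed genes.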

Page 9

Why not in proteomics?

• Double counting and false positives…
  • …due to traditional protein inference
• Proteomics cannot see all proteins…
  • …proteins are not equally likely to be drawn
• Good relative abundance is hard…
  • …extra chemistries, workflows, and software
  • …missing values are particularly problematic

Page 10

In proteomics…

• Double counting and false positives…
  • Use generalized protein parsimony
• Proteomics cannot see all proteins…
  • Use identified proteins as background
• Good relative abundance is hard…
  • Model differential spectral counts directly

Page 11

Ignore some PSMs

• FDR filtering leaves some false PSMs
• Enforce strict protein inference criteria
• Leave some PSMs uncovered

[Figure: Proteins and PSMs, labeled 10%]

Page 12

Ignore some PSMs

• FDR filtering leaves some false PSMs
• Enforce strict protein inference criteria
• Leave some PSMs uncovered

[Figure: Proteins and PSMs, labeled 90%]

Page 13

Match uncovered PSMs to FDR
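The idea on the last three slides (select proteins under a strict uniqueness criterion and deliberately leave a fraction of PSMs uncovered, sized to the PSM-level FDR) can be sketched with a greedy set-cover heuristic. This is only an assumption-laden illustration: the talk does not say how the protein set is optimized, the function name, parameters, and toy data are hypothetical, and "unique" is simplified here to "not covered by a previously selected protein".

```python
def greedy_parsimony(protein_peptides, peptide_psms, min_unique=2, max_uncovered=0.10):
    """Greedy sketch of generalized parsimony (NOT the talk's actual algorithm).

    protein_peptides: dict mapping protein -> set of peptides it contains
    peptide_psms:     dict mapping peptide -> number of PSMs
    min_unique:       a protein is selectable only if it adds at least this
                      many peptides not covered by earlier selections
    max_uncovered:    stop once this fraction of PSMs (or less) is uncovered
    """
    total_psms = sum(peptide_psms.values())
    uncovered = set(peptide_psms)
    chosen = []
    while uncovered:
        uncovered_psms = sum(peptide_psms[p] for p in uncovered)
        if uncovered_psms <= max_uncovered * total_psms:
            break  # uncovered PSMs now match the tolerated false fraction
        best, best_gain = None, 0
        for protein, peptides in protein_peptides.items():
            new_peptides = peptides & uncovered
            if len(new_peptides) < min_unique:
                continue  # fails the strict inference criterion
            gain = sum(peptide_psms[p] for p in new_peptides)
            if gain > best_gain:
                best, best_gain = protein, gain
        if best is None:
            break  # no remaining protein satisfies the criterion
        chosen.append(best)
        uncovered -= protein_peptides[best]
    return chosen, uncovered


# Hypothetical toy data: P2 is subsumed by P1; P3 has a single peptide.
toy_proteins = {"P1": {"a", "b", "c"}, "P2": {"b", "c"}, "P3": {"d"}, "P4": {"e", "f"}}
toy_psms = {"a": 10, "b": 8, "c": 6, "d": 1, "e": 5, "f": 4}
selected, left_uncovered = greedy_parsimony(toy_proteins, toy_psms)
print(selected, left_uncovered)
```

In the toy data the single PSM for peptide `d` is left uncovered instead of forcing the one-peptide protein `P3` into the result, which is the behavior the "Leave some PSMs uncovered" bullet describes.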

Page 14

Plasma membrane enrichment

• Pellicle enrichment of plasma membrane
  • Choksawangkarn et al. JPR 2013 (Fenselau Lab)
• Six replicate LC-MS/MS analyses each
  • Cell-lysate (44,861 MS/MS)
  • Fe3O4-Al2O3 pellicle (21,871 MS/MS)
• 625 3-unique proteins to match 10% FDR:
  • Lysate: 18,976 PSMs; Pellicle: 13,723 PSMs
• 89 proteins with significantly (< 10^-5) increased counts

Page 15

Plasma membrane enrichment

• Na/K+ ATPase subunit alpha-1 (P05023):
  • Lysate: 1; Pellicle: 90; p-value: 5.2 × 10^-33
• Transferrin receptor protein 1 (P02786):
  • Lysate: 17; Pellicle: 63; p-value: 2.0 × 10^-11
• DAVID Bioinformatics analysis (89/625):
  • Plasma membrane (GO:0005886): 29 (5.2 × 10^-5)
  • Transmembrane (SwissProtKW): 24 (1.3 × 10^-6)
• Transmembrane (SwissProtKW):
  • Lysate: 524; Pellicle: 1335; p-value: 2.6 × 10^-158
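The spectral-count p-values above can be approached with the same hypergeometric reasoning as the lottery example: treat every PSM as a ball, the pellicle PSMs as the marked balls, and a protein's PSMs as the draw. The sketch below assumes a one-sided test for increased pellicle counts and uses the PSM totals from the previous slide (18,976 lysate, 13,723 pellicle). Whether the talk used exactly this form of the test is an assumption, and the function name is hypothetical.

```python
from math import comb


def count_enrichment_pvalue(count_a, count_b, total_a, total_b):
    """One-sided hypergeometric p-value for a protein's spectral counts.

    P(at least `count_b` of the protein's PSMs fall in condition B), given
    that conditions A and B contain `total_a` and `total_b` PSMs overall.
    """
    total = total_a + total_b
    drawn = count_a + count_b
    numerator = sum(
        comb(total_b, k) * comb(total_a, drawn - k)
        for k in range(count_b, min(drawn, total_b) + 1)
    )
    # Exact integers until the final division, so tiny tails do not underflow.
    return numerator / comb(total, drawn)


LYSATE_PSMS, PELLICLE_PSMS = 18_976, 13_723  # totals from the slide

# Counts from the slide: (lysate, pellicle)
p_atpase = count_enrichment_pvalue(1, 90, LYSATE_PSMS, PELLICLE_PSMS)
p_transferrin = count_enrichment_pvalue(17, 63, LYSATE_PSMS, PELLICLE_PSMS)
print(f"{p_atpase:.1e} {p_transferrin:.1e}")
```

The printed values can be compared against the p-values reported on the slide. The same function applies unchanged to counts pooled over all proteins sharing an annotation, such as the Transmembrane keyword line.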

Page 16

A protein's PSMs rise and fall together!

Page 17

A protein's PSMs rise and fall together?

Page 18

Anomalies indicate proteoforms

Page 19

Nascent polypeptide-associated complex subunit alpha: 7.3 × 10^-8

Page 20

Pyruvate kinase isozymes M1/M2: 2.5 × 10^-5

Page 21

Summary

• Functional annotation enrichment for proteomics too:
  • Careful counting (generalized parsimony)
  • Differential abundance by spectral counts
• Use (multivariate-)hypergeometric model for:
  • Differential abundance by spectral counts
  • Proteoform detection
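One way to read the proteoform bullet: if a protein's PSMs "rise and fall together", each peptide's split between two conditions should resemble the split of the whole protein, and a peptide that deviates strongly is a candidate proteoform marker. The sketch below tests each peptide against its parent protein's totals with a two-sided hypergeometric p-value. It is a per-peptide simplification of the multivariate model named in the summary, not the talk's procedure, and the function name and peptide counts are hypothetical.

```python
from fractions import Fraction
from math import comb


def peptide_anomaly_pvalue(pep_a, pep_b, prot_a, prot_b):
    """Two-sided hypergeometric p-value for one peptide's A/B split,
    conditioned on the parent protein's PSM totals in A and B."""
    total = prot_a + prot_b
    drawn = pep_a + pep_b
    denominator = comb(total, drawn)
    lowest = max(0, drawn - prot_a)
    highest = min(drawn, prot_b)
    pmf = {
        k: Fraction(comb(prot_b, k) * comb(prot_a, drawn - k), denominator)
        for k in range(lowest, highest + 1)
    }
    observed = pmf[pep_b]
    # Sum every outcome that is no more likely than the observed one.
    return float(sum(p for p in pmf.values() if p <= observed))


# Hypothetical counts per peptide: (condition A, condition B).
peptide_counts = {"pep1": (10, 2), "pep2": (12, 3), "pep3": (9, 1), "pep4": (0, 12)}
prot_a = sum(a for a, _ in peptide_counts.values())
prot_b = sum(b for _, b in peptide_counts.values())

pvalues = {
    name: peptide_anomaly_pvalue(a, b, prot_a, prot_b)
    for name, (a, b) in peptide_counts.items()
}
most_anomalous = min(pvalues, key=pvalues.get)
print(most_anomalous, pvalues[most_anomalous])
```

Here `pep4` is observed only in condition B while the rest of the protein is mostly in A, so it receives by far the smallest p-value.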

Page 22

HER2/Neu Mouse Model of Breast Cancer

• Paulovich, et al. JPR, 2007
• Study of normal and tumor mammary tissue by LC-MS/MS
  • 1.4 million MS/MS spectra
• Peptide-spectrum assignments
  • Normal samples (Nn): 161,286 (49.7%)
  • Tumor samples (Nt): 163,068 (50.3%)
• 4270 proteins identified in total
  • 2-unique generalized protein parsimony

Page 23

Distribution of p-values (Yeast)