Computational Proteomics: Structure/Function Prediction & the Protein Interactome

Jaime Carbonell (jgc@cs.cmu.edu), with Betty Cheng, Yan Liu, Eric Xing, Yanjun Qi, Judith Klein-Seetharaman, and Oznur Tastan

Carnegie Mellon University, Pittsburgh PA, USA

December, 2008




© 2003, Jaime Carbonell

Simplified View of Biology

Nobelprize.org

Protein sequence

Protein structure


Primary Sequence

MNGTEGPNFY VPFSNKTGVV RSPFEAPQYY LAEPWQFSML AAYMFLLIML GFPINFLTLY VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NLEGFFATLG GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT WVMALACAAP PLVGWSRYIP EGMQCSCGID YYTPHEETNN ESFVIYMFVV HFIIPLIVIF FCYGQLVFTV KEAAAQQQES ATTQKAEKEV TRMVIIMVIA FLICWLPYAG VAFYIFTHQG SDFGPIFMTI PAFFAKTSAV YNPVIYIMMN KQFRNCMVTT LCCGKNPLGD DEASTTVSKT ETSQVAPA

3D Structure

Folding

Complex function within network of proteins

Normal

PROTEINS: Sequence → Structure → Function

(Borrowed from: Judith Klein-Seetharaman)


Primary Sequence (same sequence as on the previous slide)

3D Structure

Folding

Complex function within network of proteins

Disease

PROTEINS: Sequence → Structure → Function


Motivation: Protein Structure and Function Prediction

• Ultimate goal: Sequence → Function
– …and Function → Sequence (drug design, …)
– Potential active binding sites are a good start, but how about stability, external accessibility, energetics, …

• Intermediate goal: Sequence → Structure
– Only 1.2% of proteins have been structurally resolved
– What-if analysis (precursor of mutagenesis experiments)

• Machine Learning & Language Technologies methods
– Power tools to model and predict structure & function
– Computational biology challenges are starting to drive new research in Machine Learning & Language Technologies


OUTLINE

• Motivation: sequence → structure → function

• Vocabulary-based classification approaches (Betty Cheng, Jaime Carbonell, Judith Klein-Seetharaman)
– GPCR subfamily classification
– Protein-protein coupling specificity

• Solving the “Folding Problem”: Machine Learning Approaches to Structure Prediction (Yan Liu, Jaime Carbonell, et al.)
– Tertiary folds: β-helix prediction via segmented CRFs
– Quaternary folds: viral adhesin and capsid complexes

• Conclusions and future directions


GPCR Superfamily: G-Protein Coupled Receptors

• Transmembrane protein

• Target of 60% of drugs (Moller, 2002)

• Involved in cancer, cardiovascular disease, Alzheimer’s and Parkinson’s diseases, stroke, diabetes, and inflammatory and respiratory diseases

[Figure: GPCR topology in the membrane. Labels: N-terminus, C-terminus, transmembrane helices I–VII, intracellular loops, extracellular loops.]


Protein Family & Subfamily Classification (applied to GPCRs)

Subfamily classification based on pharmaceutical properties


Comparative Study – Karchin et al., 2002

[Figure: classifiers arranged on a scale from complex to simple]

Complex: Support Vector Machines, Neural Nets, Clustering
Intermediate: Hidden Markov Models, K-Nearest Neighbours, BLAST
Simple: Decision Trees, Naïve Bayes

SVM is the best for subfamily classification (Karchin et al., 2002)

• Traditionally, hidden Markov models, k-nearest neighbours and BLAST have been used.

• Recently, more complicated classifiers have been used.

• Karchin et al. (2002) studied a range of classifiers of varied complexity in GPCR subfamily classification.

• But what about the simple classifiers at the other end of the scale?

Hypothesis: Bio-vocabulary selection is crucial for subfamily classification (and protein-protein interaction prediction)


Study “segments” with different vocabulary

AA, chemical groups, properties of AA
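
As a rough illustration of what "different vocabulary" could mean in practice, the sketch below rewrites a sequence from the amino-acid alphabet into a coarser property alphabet and counts n-grams in both. The grouping in `PROPERTY_GROUPS` is a hypothetical example chosen for this sketch; the slides do not list the actual chemical groups or properties used.

```python
from collections import Counter

# Hypothetical reduced alphabet (an assumption, not the grouping from the talk):
# each amino acid is mapped to one coarse physico-chemical class.
PROPERTY_GROUPS = {
    "h": "AVLIMFWC",   # treated here as hydrophobic
    "p": "STNQYGP",    # treated here as polar / small
    "+": "KRH",        # positively charged
    "-": "DE",         # negatively charged
}
AA_TO_GROUP = {aa: g for g, aas in PROPERTY_GROUPS.items() for aa in aas}


def translate(seq, table):
    """Rewrite a sequence in another vocabulary; unknown residues become 'x'."""
    return "".join(table.get(aa, "x") for aa in seq)


def ngram_counts(seq, n):
    """Count overlapping n-grams of length n."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


seq = "MNGTEGPNFY"                      # first ten residues of the example sequence
reduced = translate(seq, AA_TO_GROUP)   # same segment in the property vocabulary
aa_bigrams = ngram_counts(seq, 2)
group_bigrams = ngram_counts(reduced, 2)
```

The same `ngram_counts` function serves every vocabulary, so comparing vocabularies only requires swapping the translation table.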


Computing Chi-Square

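$$\chi^2(x) = \sum_{c \in C} \frac{\left[e(c,x) - o(c,x)\right]^2}{e(c,x)}, \qquad e(c,x) = \frac{n_c \, t_x}{N}$$

where
– o(c, x): observed # of sequences in class c with feature x
– e(c, x): expected # of sequences in class c with feature x
– N: total # of sequences
– t_x: # of sequences with feature x
– n_c: # of sequences in class c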
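
The chi-square score on this slide can be computed per feature with a few counts. The following is a minimal sketch, assuming each feature is represented as a list of booleans (present or absent in each sequence); the function name and data layout are illustrative, not taken from the original work.

```python
def chi_square(feature_present, labels):
    """Chi-square score of one binary feature x, summed over all classes c.

    feature_present[i] is True if sequence i contains feature x;
    labels[i] is the class of sequence i.
    """
    N = len(labels)                      # total number of sequences
    t_x = sum(feature_present)           # sequences containing feature x
    score = 0.0
    for c in set(labels):
        n_c = sum(1 for y in labels if y == c)              # sequences in class c
        observed = sum(1 for f, y in zip(feature_present, labels) if f and y == c)
        expected = n_c * t_x / N                            # e(c, x)
        if expected > 0:
            score += (expected - observed) ** 2 / expected
    return score
```

Ranking all n-grams by this score and keeping the top few thousand gives a feature set of the size range reported in the results tables.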


Level I Subfamily Optimization

[Figure: accuracy vs. number of features for Decision Trees and Naïve Bayes, using binary features and n-gram counts]


Level I Subfamily Results

Classifier | # of Features | Type of Features | Accuracy
Naïve Bayes | 5500-7700 | Binary | 93.0 %
Naïve Bayes | 3300-6900 | N-gram counts | 90.6 %
Naïve Bayes | All (9702) | N-gram counts | 90.0 %
SVM | 9 per match state in the HMM | Gradient of the log-likelihood that the sequence is generated by the given HMM model | 88.4 %
BLAST | - | Local sequence alignment | 83.3 %
Decision Tree | 900-2800 | Binary | 77.3 %
Decision Tree | 700-5600 | N-gram counts | 77.3 %
Decision Tree | All (9723) | N-gram counts | 77.2 %
SAM-T2K HMM | - | A HMM model built for each protein subfamily | 69.9 %
kernNN | 9 per match state in the HMM | Gradient of the log-likelihood that the sequence is generated by the given HMM model | 64.0 %
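
To make the "Naïve Bayes with binary features" rows concrete, here is a small sketch of a Bernoulli Naïve Bayes classifier over n-gram presence. It is an illustration under stated assumptions: Laplace smoothing and the set-based data layout are choices made for this sketch, and the original experiments may have used a different implementation.

```python
import math


def train_nb(X, y):
    """X: list of sets of n-grams present in each sequence; y: class labels."""
    vocab = set().union(*X)
    model = {}
    for c in sorted(set(y)):
        docs = [x for x, lab in zip(X, y) if lab == c]
        prior = len(docs) / len(X)
        # Laplace-smoothed estimate of P(feature present | class)
        p = {f: (sum(f in d for d in docs) + 1) / (len(docs) + 2) for f in vocab}
        model[c] = (prior, p)
    return model


def predict_nb(model, x):
    """Return the class with the highest log-posterior for feature set x."""
    best, best_lp = None, -math.inf
    for c, (prior, p) in model.items():
        lp = math.log(prior)
        for f, pf in p.items():
            lp += math.log(pf if f in x else 1 - pf)   # presence and absence both count
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```

A toy call such as `predict_nb(train_nb([{"AA"}, {"GG"}], ["c1", "c2"]), {"AA"})` returns the class whose training sequences share the query's n-grams.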


Level II Subfamily Results

Classifier | # of Features | Type of Features | Accuracy
Naïve Bayes | 8100 | Binary | 92.4 %
SVM | 9 per match state in the HMM | Gradient of the log-likelihood that the sequence is generated by the given HMM model | 86.3 %
Naïve Bayes | 5600 | N-gram counts | 84.2 %
SVMtree | 9 per match state in the HMM | Gradient of the log-likelihood that the sequence is generated by the given HMM model | 82.9 %
Naïve Bayes | All (9702) | N-gram counts | 81.9 %
BLAST | - | Local sequence alignment | 74.5 %
Decision Tree | 1200 | N-gram counts | 70.8 %
Decision Tree | 2300 | Binary | 70.2 %
SAM-T2K HMM | - | A HMM model built for each protein subfamily | 70.0 %
Decision Tree | All (9723) | N-gram counts | 66.0 %
kernNN | 9 per match state in the HMM | Gradient of the log-likelihood that the sequence is generated by the given HMM model | 51.0 %


Helices 3 and 7 are known to be important for signal transduction

Top 20 selected “words” for Class B GPCRs. They correlate with identified motifs.

Loop 1 is a suspected common binding site


Generalization to Other Superfamilies: Nuclear Receptors

Dataset | Feature Type | # of Features | Validation Accuracy | Testing Accuracy
Family | Binary | 1500-4200 | 96.96% | 94.53%
Family | N-gram counts | 400-4900 | 95.75% | 91.79%
Level I Subfamily | Binary | 1500-3100 | 98.09% | 97.77%
Level I Subfamily | N-gram counts | 500-1100 | 93.95% | 91.40%
Level II Subfamily | Binary | 1500-2100 | 95.32% | 93.62%
Level II Subfamily | N-gram counts | 3100-5600 | 86.39% | 85.54%


G-Protein Coupling Specificity Problem

• Predict which one or more families of G-proteins a GPCR can couple with, given the GPCR sequence

• Locate regions in the GPCR sequence where the majority of coupling specificity information lies

G-Protein Family | Function
Gs | Activates adenylyl cyclase
Gi/o | Inhibits adenylyl cyclase
Gq/11 | Activates phospholipase C
G12/13 | Unknown


N-gram Based Component

• Extract n-grams from all possible reading frames

• Use a set of binary k-NN, one for each G-protein family to predict whether the receptor couples to the family

• Predict coupling if k-NN outputs a probability higher than trained threshold

[Flowchart]
Test sequence (MGNASNDSQSEDCETRQWLPPGESPAI …)
→ Counts of all n-grams
→ K-NN classifier
→ Pr(coupling to family C) ≥ threshold?
   Yes: predict coupling to family C
   No: predict no coupling to family C
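
The n-gram component can be sketched as follows. This is an illustration only: the n-gram lengths, the use of cosine similarity, and the interpretation of the k-NN "probability" as the fraction of coupling neighbours are assumptions made for this sketch, not details given on the slide.

```python
import math
from collections import Counter


def ngram_vector(seq, n_values=(1, 2)):
    """Counts of all overlapping n-grams (every reading frame) for each n."""
    counts = Counter()
    for n in n_values:
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_coupling_probability(test_seq, training, k):
    """training: non-empty list of (sequence, couples_to_family) pairs.

    Returns the fraction of the k most similar training sequences that couple.
    """
    query = ngram_vector(test_seq)
    ranked = sorted(training,
                    key=lambda item: cosine(query, ngram_vector(item[0])),
                    reverse=True)
    top = ranked[:k]
    return sum(1 for _, couples in top if couples) / len(top)


def predict_coupling(test_seq, training, k, threshold):
    """One binary k-NN for one G-protein family, thresholded as on the slide."""
    return knn_coupling_probability(test_seq, training, k) >= threshold
```

One such classifier would be trained per G-protein family, each with its own threshold, so a receptor can be predicted to couple to several families at once.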


Alignment-Based Component

• A set of binary k-NN, one for each G-protein family to predict whether the receptor couples to the family

• Predict coupling if more than x% of retrieved sequences couple to the family

• 2 parameters:
– Number of neighbours, K
– Threshold x%

[Flowchart]
Test sequence (MGNASNDSQSEDCETRQWLPPGESPAI …)
→ BLAST
→ K1 most similar sequences (e.g. MDNTSNDSQSENREEPLWLPSGESPAIS …, MDNFLNDSKLMEDCKSRQWLLSGESPAI …, MNESYRCQTSTWVERGSSATMGAVLFG …)
→ x% of the K1 sequences couple to family C?
   Yes: predict coupling to family C
   No: predict no coupling to family C

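
The alignment-based vote can be sketched without running BLAST itself. The function below assumes the hits have already been retrieved and ranked (for example, parsed from BLAST output); the argument names are hypothetical. It uses "at least x%" rather than "more than x%" so that a threshold of 100% remains meaningful, which is an interpretation rather than something the slide states.

```python
def alignment_knn_predict(blast_hits, couples_to_family, k, x_percent):
    """Vote among the k most similar sequences.

    blast_hits: hit IDs sorted from most to least similar (assumed given).
    couples_to_family: dict mapping hit ID -> bool.
    Predict coupling if at least x_percent of the k nearest hits couple.
    """
    neighbours = blast_hits[:k]
    if not neighbours:
        return False                      # nothing retrieved: predict no coupling
    coupled = sum(1 for h in neighbours if couples_to_family[h])
    return 100.0 * coupled / len(neighbours) >= x_percent
```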


Our Hybrid Method: Combining Alignment and N-grams

[Flowchart]
Test sequence (MGNASNDSQSEDCETRQWLPPGESPAI …)
→ BLAST K-NN with x% = 100%
   Yes: predict coupling to family C
   No: → N-gram K-NN
      Yes: predict coupling to family C
      No: predict no coupling to family C
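
The cascade in this flowchart reduces to a two-step decision. The sketch below assumes the two component outputs have already been computed; the parameter names are hypothetical and the function only illustrates the control flow.

```python
def hybrid_predict(blast_fraction_coupling, ngram_knn_probability, ngram_threshold):
    """Hybrid decision for one G-protein family.

    blast_fraction_coupling: fraction (0..1) of the BLAST neighbours that couple.
    ngram_knn_probability: output of the n-gram k-NN for the same family.
    """
    # Step 1: trust alignment only when all retrieved neighbours agree (x% = 100%).
    if blast_fraction_coupling >= 1.0:
        return True
    # Step 2: otherwise fall back to the n-gram k-NN and its trained threshold.
    return ngram_knn_probability >= ngram_threshold
```

With the n-gram threshold of 0.66 reported for the hybrid method, `hybrid_predict(0.8, 0.7, 0.66)` predicts coupling through the fallback path.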


Evaluation Metrics & Dataset

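$$\text{Recall} = \frac{A}{A+C}, \qquad \text{Precision} = \frac{A}{A+B}$$

$$\text{Accuracy} = \frac{A+D}{A+B+C+D}, \qquad F_1 = \frac{2PR}{P+R}$$

where P is precision, R is recall, and A–D are the cells of the confusion matrix:

Predicted \ Truth | Couplings | Non-Couplings
Couplings | A | B
Non-Couplings | C | D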

(Cao et al., 2003)

81.3% training set

Same test set
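
The four metrics follow directly from the confusion-matrix cells A, B, C and D defined on this slide. A minimal sketch (the function name is illustrative, and it assumes no denominator is zero):

```python
def coupling_metrics(A, B, C, D):
    """A: predicted and true couplings; B: predicted couplings that are not true;
    C: true couplings predicted as non-couplings; D: correctly predicted non-couplings."""
    precision = A / (A + B)
    recall = A / (A + C)
    accuracy = (A + D) / (A + B + C + D)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1
```

As a consistency check on the F1 definition, the hybrid method's reported precision 0.698 and recall 0.952 give 2 × 0.698 × 0.952 / (0.698 + 0.952) ≈ 0.805, which matches the reported F1.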


Results on Cao et al. Dataset

Method | N-gram Threshold | Prec | Recall | F1
Hybrid | 0.66 | 0.698 | 0.952 | 0.805
N-gram | 0.34 | 0.658 | 0.794 | 0.719
Cao et al. | - | 0.577 | 0.889 | 0.700

• Hybrid method outperformed Cao et al. in precision, recall and F1
• Suggests alignment contains information not found in n-grams

Method | Max | Prec | Recall | F1
Whole Seq Alignment | F1 | 0.779 | 0.841 | 0.809
Hybrid | F1 | 0.775 | 0.873 | 0.821
Whole Seq Alignment | Precision | 0.793 | 0.730 | 0.760
Hybrid | Precision | 0.803 | 0.778 | 0.790

• Suggests n-grams contain information not found in alignment


Feature Selection of N-grams

• Pre-processing step to remove noisy or redundant features that may confuse classifier

• Many feature selection algorithms available

• Chi-square was used because of success in GPCR subfamily classification

[Flowchart]
Test sequence (MGNASNDSQSEDCETRQWLPPGESPAI …)
→ Counts of all n-grams
→ Chi-square feature selection
→ Selected n-gram counts
→ K-NN classifier
→ Pr(coupling to family C) ≥ threshold?
   Yes: predict coupling to family C
   No: predict no coupling to family C


IC Domain Combination Analysis

IC | Prec | Rec | F1 | Acc
1 | 0.782 | 0.703 | 0.739 | 0.796
2 | 0.820 | 0.799 | 0.808 | 0.845
3 | 0.661 | 0.721 | 0.682 | 0.730
4 | 0.632 | 0.755 | 0.670 | 0.694
1, 2 | 0.820 | 0.805 | 0.811 | 0.847
1, 3 | 0.799 | 0.765 | 0.780 | 0.825
1, 4 | 0.780 | 0.755 | 0.765 | 0.807
2, 3 | 0.837 | 0.825 | 0.828 | 0.861
2, 4 | 0.828 | 0.816 | 0.821 | 0.853
3, 4 | 0.773 | 0.807 | 0.788 | 0.821
1, 2, 3 | 0.822 | 0.814 | 0.816 | 0.850
1, 2, 4 | 0.807 | 0.809 | 0.807 | 0.843
1, 3, 4 | 0.792 | 0.807 | 0.797 | 0.832
2, 3, 4 | 0.839 | 0.820 | 0.828 | 0.861
1, 2, 3, 4 | 0.824 | 0.813 | 0.817 | 0.853

• Of the 4 intracellular (IC) domains, the 2nd domain yielded the best F1, followed by the 1st, 3rd and 4th domains

• Most information in IC1 is already found in IC2


Tertiary Protein Fold Prediction

• Protein function is strongly modulated by structure

• Predicting folds, domains and other regular structures requires modeling local and long-distance interactions in low-homology sequences
– Long distance: not addressed by n-grams, HMMs, etc.
– Low homology: not addressed by BLAST algorithms

• We focus on minimal mathematical structural modeling
– Segmented conditional random fields
– Layered graphical models
– Fully trainable to recognize new instances of structures

• First acid test: β-helix super-secondary structural prediction (with data and guidance from Prof. J. King at MIT)


Protein Structure Determination

• Lab experiments: time, cost, uncertainty, …
– X-ray crystallography (months to crystallize, uncertain outcome); Nobel Prize, Kendrew & Perutz, 1962
– NMR spectroscopy (only works for small proteins or domains); Nobel Prize, Kurt Wüthrich, 2002

• The gap between sequence and structure necessitates computational methods of protein structure determination
– 3,023,461 sequences vs. 36,247 resolved structures (1.2%)

[Figure: example structures, PDB entries 1MBN and 1BUS]


Predicting Protein Structures

• Protein Structure is a key determinant of protein function

• Resolving protein structures experimentally in vitro by crystallography is very expensive; NMR can only resolve very small proteins

• The gap between the known protein sequences and structures:

– 3,023,461 sequences vs. 36,247 resolved structures (1.2%)

– Therefore we need to predict structures in silico


Predicting Tertiary Folds

• Super-secondary structures

– Common protein domains and scaffolding patterns such as regular combinations of β-sheets and/or α-helices

• Our task

– Given a protein sequence, predict super-secondary structures and their components (e.g. β-helices and the location of each rung therein)

• Examples:
– Parallel right-handed β-helix
– Leucine-rich repeats


Parallel Right-handed β-Helix

• Structure
– A regular super-secondary structure with an elongated helix whose successive rungs are composed of β-strands
– Highly conserved T2 turn

• Computational importance
– Long-range interactions
– Repeat patterns

• Biological importance
– Functions such as the bacterial infection of plants, binding the O-antigen, etc.


Conditional Random Fields

• Hidden Markov model (HMM) [Rabiner, 1989]

$$P(\mathbf{x}, \mathbf{y}) = \prod_{i=1}^{N} P(x_i \mid y_i)\, P(y_i \mid y_{i-1})$$

• Conditional random fields (CRFs) [Lafferty et al., 2001]

$$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z_0} \exp\Big( \sum_{i=1}^{N} \sum_{k=1}^{K} \lambda_k f_k(\mathbf{x}, i, y_{i-1}, y_i) \Big)$$

– Model conditional probability directly (discriminative models, directly optimizable)
– Allow arbitrary dependencies in observation
– Adaptive to different loss functions and regularizers
– Promising results in multiple applications
– But need to scale up (computationally) and extend to long-distance dependencies
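
The CRF definition on this slide can be checked numerically on a toy problem. The sketch below uses a hypothetical two-label set and two hand-picked features with fixed weights (none of which come from the talk) and computes the normalizer Z both by brute-force enumeration and by the forward recursion that makes linear-chain CRFs tractable.

```python
import itertools
import math

LABELS = ["H", "C"]          # hypothetical label set for this sketch


def score(x, i, y_prev, y_cur):
    """sum_k lambda_k * f_k(x, i, y_prev, y_cur) for two toy features."""
    s = 0.0
    if x[i] in "AVLI" and y_cur == "H":   # observation feature, weight 1.0
        s += 1.0
    if y_prev == y_cur:                   # transition feature, weight 0.5
        s += 0.5
    return s


def sequence_score(x, y):
    prev, total = None, 0.0               # None plays the role of the start label y_0
    for i, cur in enumerate(y):
        total += score(x, i, prev, cur)
        prev = cur
    return total


def partition_brute_force(x):
    """Z computed by summing over every possible label sequence."""
    return sum(math.exp(sequence_score(x, y))
               for y in itertools.product(LABELS, repeat=len(x)))


def partition_forward(x):
    """Z computed with the forward recursion, linear in the sequence length."""
    alpha = {y: math.exp(score(x, 0, None, y)) for y in LABELS}
    for i in range(1, len(x)):
        alpha = {cur: sum(alpha[prev] * math.exp(score(x, i, prev, cur))
                          for prev in LABELS)
                 for cur in LABELS}
    return sum(alpha.values())


def conditional_probability(x, y):
    """P(y | x) under the toy model."""
    return math.exp(sequence_score(x, y)) / partition_forward(x)
```

The forward recursion only works because each feature looks at adjacent labels; the long-distance dependencies mentioned in the last bullet break this factorization, which motivates the segmentation models on the following slides.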


Our Solution: Conditional Graphical Models

• Outputs Y = {M, {W_i}}, where W_i = {p_i, q_i, s_i}

• Feature definition

– Node feature

$$f_k(w_i, \mathbf{x}) = f'_k(\mathbf{x}, p_i, q_i)\, I(s_i = s',\; q_i - p_i + 1 = d')$$

– Local interaction feature

$$f_k(w_{i-1}, w_i, \mathbf{x}) = I(s_{i-1} = s,\; s_i = s',\; p_i = q_{i-1} + 1)$$

– Long-range interaction feature

$$f_k(w_i, w_j, \mathbf{x}) = g'_k(\mathbf{x}, p_i, q_i, p_j, q_j)\, I(s_i = s,\; s_j = s')$$

[Figure: segment graph showing local and long-range dependencies]


Linked Segmentation CRF

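• Node: secondary structure elements and/or simple folds

• Edges: local interactions and long-range inter-chain and intra-chain interactions

• L-SCRF: the conditional probability of y given x is defined as

$$P(\mathbf{y}_1, \ldots, \mathbf{y}_R \mid \mathbf{x}_1, \ldots, \mathbf{x}_R) = \frac{1}{Z} \prod_{y_{i,j} \in V_G} \exp\Big( \sum_k \lambda_k f_k(\mathbf{x}_i, y_{i,j}) \Big) \prod_{\langle y_{i,j},\, y_{a,b} \rangle \in E_G} \exp\Big( \sum_l \mu_l\, g_l(\mathbf{x}_i, \mathbf{x}_a, y_{i,j}, y_{a,b}) \Big)$$

(y_1, …, y_R: joint labels)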


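Linked Segmentation CRF (II)

• Classification:

$$\mathbf{y}^* = \arg\max_{\mathbf{y}} \sum_{c \in C_G} \sum_{k=1}^{K} \lambda_k f_k(\mathbf{x}, Y_c)$$

• Training: learn the model parameters λ

– Minimizing regularized negative log loss

$$L = \sum_{c \in C_G} \sum_{k=1}^{K} \lambda_k f_k(\mathbf{x}, \mathbf{y}_c) - \log Z + \Omega(\lambda)$$

where Ω(λ) denotes the regularization term

– Iterative search algorithms seek the direction in which the empirical feature values agree with their expectation

$$\frac{\partial L}{\partial \lambda_k} = \sum_{c \in C_G} \Big( f_k(\mathbf{x}, \mathbf{y}_c) - E_{P(Y \mid \mathbf{x})}\big[f_k(\mathbf{x}, Y_c)\big] \Big) + \frac{\partial \Omega(\lambda)}{\partial \lambda_k} = 0$$

• Complex graphs result in huge computational complexity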


Model Roadmap

Conditional random fields [Lafferty et al., 2001]

Segmentation CRFs (Liu & Carbonell 2005)

Chain graph model (Liu, Xing & Carbonell, 2006)

Linked segmentation CRFs (Liu & Carbonell, 2007)

Long-range

Trade-off between local and long-range

Inter-chain long-range

Semi-Markov CRFs [Sarawagi & Cohen, 2005]

Beyond Markov dependencies

Generalized discriminative graphical models


Tertiary Fold Recognition: β-Helix fold

• Histogram and ranks for known β-helices against PDB-minus dataset

The chain graph model reduces the actual running time of the SCRF model by a factor of about 50


Fold Alignment Prediction: β-Helix

• Predicted alignment for known β-helices on cross-family validation


Discovery of New Potential β-helices

• Run structural predictor seeking potential β-helices from Uniprot (structurally unresolved) databases

– Full list (98 new predictions) can be accessed at www.cs.cmu.edu/~yanliu/SCRF.html

• Verification on 3 proteins with later experimentally resolved structures from different organisms

– 1YP2: Potato Tuber ADP-Glucose Pyrophosphorylase

– 1PXZ: The Major Allergen From Cedar Pollen

– GP14 of Shigella bacteriophage as a β-helix protein

– Not a single false positive!


Predicting Quaternary Folds

• Triple beta-spirals [van Raaij et al. Nature 1999]

– Virus fibers in adenovirus, reovirus and PRD1

• Double barrel trimer [Benson et al, 2004]

– Coat protein of adenovirus, PRD1, STIV, PBCV


Features for Protein Fold Recognition


Experiment Results: Quaternary Fold Recognition

[Figure panels: double barrel-trimers; triple beta-spirals]


Experiment Results: Alignment Prediction

Triple beta-spirals

Four states: B1, B2, T1 and T2

Correct Alignment:

B1: i – o B2: a - h

Predicted Alignment

B1 B2


Experiment Results: Discovering New Membership Proteins

• Predicted membership proteins of triple beta-spirals can be accessed at

http://www.cs.cmu.edu/~yanliu/swissprot_list.xls

• Membership proteins of double barrel-trimer suggested by biologists [Benson, 2005] compared with L-SCRF predictions


Conclusions & Challenges for Protein Structure/Function Prediction

• Methods from modern Machine Learning and Language Technologies really work in Computational Proteomics
– Family/subfamily/sub-subfamily predictions
– Protein-protein interactions (GPCRs ↔ G-proteins)
– Accurate tertiary & quaternary fold structural predictions

• Next generation of model sophistication…

• Addressing new challenges
– Structure → Function: structural predictions combined with binding-site & specificity analysis
– Predictive inversion: Function → Structure → Sequence for new hyper-specific drug design (anti-viral, oncology)


Proteins and Interactions

• Every function in the living cell depends on proteins

• Proteins are made of a linear sequence of amino acids and folded into unique 3D structures

• Proteins can bind to other proteins physically

– Enables them to carry out diverse cellular functions


Protein-Protein Interaction (PPI) Network

• PPIs play key roles in many biological systems

• A complete PPI network (naturally a graph)

– Critical for analyzing protein functions & understanding the cell

– Essential for disease studies & drug discovery


PPI Biological Experiments

• Small-scale PPI experiments
– One protein or several proteins at a time
– Small amount of available data
– Expensive and slow lab process

• Large-scale PPI experiments
– Hundreds / thousands of proteins at a time
– Noisy and incomplete data
– Little overlap among different sets

A large portion of the PPIs is still missing or noisy!


Learning of PPI Networks

• Goal I: Pairwise PPI (links of the PPI graph)
– Most pairwise protein-protein interactions have not been identified or are noisy
– Missing link prediction!

• Goal II: “Complex” (important groups)
– Proteins often interact stably and perform functions together as one unit (“complex”)
– Most complexes have not been discovered
– Important group detection!

[Figure: pairwise interactions, PPI network and protein complex, connected by link prediction and group detection]


Goal I: Missing Link Prediction

Pairwise Interactions

PPI Network


Related Biological Data

• Overall, four categories:

– Direct high-throughput experimental data: Two-hybrid screens (Y2H) and mass spectrometry (MS)

– Indirect high throughput data: Gene expression, protein-DNA binding, etc.

– Functional annotation data: Gene ontology annotation, MIPS annotation, etc.

– Sequence based data sources: Domain information, gene fusion, homology based PPIs, etc.

direct

Indirect

Utilize implicit evidence and available direct experimental results together


Related Data Evidence

Relational Evidence Between Proteins

1 Synthetic lethal

Attribute Evidence of Each Protein

Expression

Structure

Sequence

Annotation

……

……

Relation expanding1


Feature Vector for (Pairwise) Pairs

– For data representing protein-protein pairs, use directly

– For data representing a single protein (gene), calculate the (biologically meaningful) similarity between the two proteins for each type of evidence

[Figure: building the feature vector for pair A-B]
Two proteins (A and B), each with its own attributes:
– Sequence: mtaaqaagee…; GeneExp: 233.94, 162.85, ...
– Sequence: mrpsgtagaa…; GeneExp: 109.4, 975.3, ...
Pair-level evidence used directly: Synthetic lethal: 1, ……
Derived similarities: sequence similarity, gene expression correlation coefficient, …
Pair A-B: fea1, fea2, fea3, …….
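
A pair's feature vector can be assembled roughly as follows. This is a sketch under assumptions: the k-mer overlap is a crude stand-in for a real sequence-similarity score, the dictionary keys are hypothetical, and the expression profiles in the usage note are made-up values rather than the truncated numbers from the slide.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = math.sqrt(sum((u - ma) ** 2 for u in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (sa * sb) if sa and sb else 0.0


def shared_kmer_fraction(s1, s2, k=3):
    """Stand-in for sequence similarity: Jaccard overlap of the two k-mer sets."""
    a = {s1[i:i + k] for i in range(len(s1) - k + 1)}
    b = {s2[i:i + k] for i in range(len(s2) - k + 1)}
    return len(a & b) / len(a | b) if a | b else 0.0


def pair_features(protein_a, protein_b, relational):
    """protein_*: dicts with 'sequence' and 'expression' (per-protein attributes);
    relational: dict of pair-level evidence, used directly."""
    return [
        relational.get("synthetic_lethal", 0),                                # pair-level
        shared_kmer_fraction(protein_a["sequence"], protein_b["sequence"]),   # derived
        pearson(protein_a["expression"], protein_b["expression"]),            # derived
    ]
```

For example, `pair_features({"sequence": "mtaaqaagee", "expression": [1.0, 2.0, 3.0]}, {"sequence": "mrpsgtagaa", "expression": [3.0, 2.0, 1.0]}, {"synthetic_lethal": 1})` yields a three-element vector with one direct and two derived features.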


Problem Setting

• For each protein-protein pair:
– Target function: interacts or not?
– Treat as a binary classification task

• Feature set
– Features are heterogeneous
– Most features are noisy
– Most features have missing values

• Reference set:
– Small-scale PPI set as positive training (hundreds to thousands)
– No negative set (non-interacting pairs) available
– Highly skewed class distribution
» Many more non-interacting pairs than interacting pairs
» Estimated: 1 out of ~600 in yeast; 1 out of ~1000 in human
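
Because no negative set is available and true interactions are rare, one common workaround is to sample random protein pairs that are not known positives and treat them as negatives. The slide does not say how negatives were obtained, so the sketch below is an assumption for illustration only; the function name and parameters are hypothetical.

```python
import random


def sample_negative_pairs(proteins, positive_pairs, n_negatives, seed=0):
    """Draw random unordered pairs that are not in the known-positive set.

    Relies on the skewed class distribution: a random pair is very unlikely
    to be a true interaction, so it is used as a presumed negative.
    """
    rng = random.Random(seed)
    positives = {frozenset(p) for p in positive_pairs}
    max_pairs = len(proteins) * (len(proteins) - 1) // 2 - len(positives)
    if n_negatives > max_pairs:
        raise ValueError("not enough distinct non-positive pairs to sample")
    negatives = set()
    while len(negatives) < n_negatives:
        a, b = rng.sample(proteins, 2)          # two distinct proteins
        pair = frozenset((a, b))
        if pair not in positives:
            negatives.add(pair)
    return [tuple(sorted(p)) for p in negatives]
```

Sampling roughly 600 negatives per positive for yeast would mirror the estimated class ratio quoted above.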


PPI Inference via ML Methods

• Jansen, R., et al., Science 2003: Bayes classifier

• Lee, I., et al., Science 2004: Sum of log-likelihood ratio

• Zhang, L., et al., BMC Bioinformatics 2004: Decision tree

• Bader, J., et al., Nature Biotech 2004: Logistic regression

• Ben-Hur, A., et al., ISMB 2005: Kernel method

• Rhodes, D.R., et al., Nature Biotech 2005: Naïve Bayes

Present focus: Y. Qi, Z. Bar-Joseph, J. Klein-Seetharaman, Proteins 2006


Predicting Pairwise PPIs

– Prediction target (three types)
» Physical interaction
» Co-complex relationship
» Pathway co-membership inference

– Feature encoding
» (1) “detailed” style, and (2) “summary” style
» Feature importance varies

– Classification methods
» Random Forest & Support Vector Machine

Details in the paper: Y. Qi, Z. Bar-Joseph, J. Klein-Seetharaman, Proteins 2006


Human Membrane Receptors

[Figure: human membrane receptors. Labels: ligands; extracellular, transmembrane and cytoplasmic regions; Type I and Type II (GPCR) receptors; other membrane proteins; signal transduction cascades.]


PPI Predictions for Human Membrane Receptors

• A combined approach

– Binary classification

– Global graph analysis

– Biological feedback & validation

Y. Qi, et al 2008


Binary Classification

• Random Forest Classifier

– A collection of independent decision trees (ensemble classifier)

– Each tree is grown on a bootstrap sample of the training set

– Within each tree's training, the split at each node is chosen from a random subset of the attributes

[Figure: example decision trees whose nodes test features such as GeneExpress, TAP, Y2H, GOProcess, HMS-PCI, GeneOccur, GOLocalization, ProteinExpress, SynExpress, and Domain, with Y/N branches]

• Robust to noisy features

• Can handle different types of features
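
The two sources of randomness in a random forest (bootstrap sampling of training pairs and random feature subsets) can be sketched in a few lines. This is a deliberately reduced toy, not the classifier used in the cited work: each "tree" is a one-split decision stump with a single feature subset, and the three binary features and their values are hypothetical.

```python
import random


def train_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            # Polarity 1 predicts "interacts" when the feature value is >= t.
            errors = sum((r[f] >= t) != bool(y) for r, y in zip(rows, labels))
            for polarity, err in ((1, errors), (0, len(rows) - errors)):
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    _, f, t, polarity = best
    return f, t, polarity


def stump_predict(stump, row):
    f, t, polarity = stump
    return int((row[f] >= t) == bool(polarity))


def train_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample of the training pairs (sampling with replacement).
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random subset of the attributes considered for the split.
        feats = rng.sample(range(len(rows[0])), n_features)
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest


def forest_score(forest, row):
    """Fraction of stumps voting 'interacts'; usable as a ranking score."""
    return sum(stump_predict(s, row) for s in forest) / len(forest)


# Hypothetical binary evidence per protein pair: [Y2H hit, TAP hit, co-expressed].
rows = [[1, 1, 1]] * 4 + [[0, 0, 0]] * 4
labels = [1] * 4 + [0] * 4
forest = train_forest(rows, labels)
pos_score = forest_score(forest, [1, 1, 1])
neg_score = forest_score(forest, [0, 0, 0])
print(pos_score, neg_score)
```

A full random forest grows deep trees and redraws the feature subset at every node; the stump version keeps only the ensemble and sampling ideas visible.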


Classifier Comparison

• Compare classifiers

• Compare receptor PPI (sub-network) prediction to general human PPI prediction

(27 features extracted from 8 different data sources, modified with biological feedback)


Global Graph Analysis

• Degree distribution / Hub analysis / Disease checking

• Graph modules analysis (from bi-clustering study)

• Protein-family based graph patterns (receptors / receptor subclasses / ligands / etc.)


Global Graph Analysis

Network analysis reveals interesting features of the human membrane receptor PPI graph


For instance:

• Two types of receptors (GPCR and non-GPCR (Type I))

• GPCRs are less densely connected than non-GPCRs

(Figure legend: green = non-GPCR receptors; blue = GPCRs)


Experimental Validation

• Five predictions were chosen for experiments and three were verified

– EGFR with HCK (pull-down assay)

– EGFR with Dynamin-2 (pull-down assay)

– RHO with CXCL11 (functional assays, fluorescence spectroscopy, docking)

– Experiments performed at the University of Pittsburgh School of Medicine

Y. Qi, et al 2008

Details in the paper


Motivation

• Current situation of PPI task

– Only a small positive (interacting) set available

– No negative (not interacting) set available

– Highly skewed class distribution

» Many more non-interacting pairs than interacting pairs

– The cost of misclassifying an interacting pair is higher than that of misclassifying a non-interacting pair

– Accuracy is not an appropriate evaluation measure here

• Try to handle this task with ranking

– Rank the known positive pairs as high as possible

– At the same time, rank the still-unknown positive pairs as high as possible
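
A ranking view is usually evaluated with rank-based measures rather than accuracy. The sketch below shows two common ones, precision at k and average precision; the choice of these two measures and the toy scores are our assumptions, not taken from the slides.

```python
def precision_at_k(scores, labels, k):
    """Fraction of true interactions among the k highest-scored pairs."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each true interaction."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical classifier scores for six candidate pairs; 1 = known interaction.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 0, 0]
p_at_3 = precision_at_k(scores, labels, 3)
ap = average_precision(scores, labels)
print(p_at_3, ap)
```

Both measures reward placing the few positives near the top of the list and are unaffected by how many easy negatives sit at the bottom.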


Split Features into Multi-View

• Overall, four feature groups:

– P: Direct high-throughput experimental data: Two-hybrid screens (Y2H) and mass spectrometry (MS)

– E: Indirect high-throughput data: Gene expression, protein-DNA binding, etc.

– F: Functional annotation data: Gene ontology annotation, MIPS annotation, etc.

– S: Sequence based data sources: Domain information, gene fusion, homology based PPIs, etc.

[Figure: the four feature views, labeled Direct, Genomic, Functional, and Sequence]

Y. Qi, J. Klein-Seetharaman, Z. Bar-Joseph, BMC Bioinformatics 2007


Mixture of Feature Experts (MFE)

• Make protein interaction prediction by

– Weighted voting from the four roughly homogeneous feature categories

– Treat each feature group as a prediction expert

– The weights are also dependent on the input example

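• Hidden variable M modulates the choice of expert

[Figure: graphical model in which the hidden variable M selects among the experts P, F, S, and E to predict "Interact?"]

$$p(Y \mid X) = \sum_{M} p(Y \mid X, M)\, p(M \mid X)$$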


Mixture of Four Feature Experts

• Parameters are trained using EM

• Experts and root gate use logistic regression (ridge estimator)

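[Figure: a root gate combining four experts]

– Expert P: direct high-throughput PPI experimental data

– Expert F: functional annotation of proteins

– Expert S: sequence- or structure-based evidence

– Expert E: indirect high-throughput experimental data

$$p(y^{(n)} \mid x^{(n)}) = \sum_{i=1}^{4} p(m_i^{(n)} = 1 \mid x^{(n)}, v)\; p(y^{(n)} \mid x^{(n)}, m_i^{(n)} = 1, w_i)$$

Parameters: $(w_i, v)$, the expert weights $w_i$ and the gate weights $v$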
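
As a rough illustration of the mixture prediction, the sketch below combines four logistic-regression experts through a gate, i.e. p(y=1|x) = sum over i of p(m_i=1|x, v) * p(y=1|x, m_i=1, w_i). It assumes a softmax (multinomial logistic) gate and that each expert sees only its own feature group; all weights and feature values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mfe_predict(x_views, x_all, expert_w, gate_v):
    """Return (p(y=1|x), gate weights, per-expert p(y=1|x, m_i=1, w_i))."""
    gate = softmax([dot(v, x_all) for v in gate_v])          # p(m_i=1 | x, v)
    experts = [sigmoid(dot(w, xv)) for w, xv in zip(expert_w, x_views)]
    return sum(g * e for g, e in zip(gate, experts)), gate, experts


# Hypothetical pair: one feature per view (P, E, F, S), each preceded by a bias term.
x_views = [[1.0, 1.0], [1.0, 0.2], [1.0, 0.9], [1.0, 0.0]]
x_all = [1.0, 1.0, 0.2, 0.9, 0.0]
expert_w = [[-1.0, 3.0], [-1.0, 2.0], [-2.0, 4.0], [-1.0, 1.0]]  # illustrative only
gate_v = [[0.0] * 5 for _ in range(4)]  # all-zero gate: every expert weighted 1/4
p, gate, experts = mfe_predict(x_views, x_all, expert_w, gate_v)
print(round(p, 3), gate)
```

Because the gate depends on the input, a trained model can lean on a different expert for each protein pair; the all-zero gate here simply reduces to an unweighted average.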


Mixture of Four Feature Experts

• Handling missing values

– Add additional feature column for each feature having low feature coverage

– MFE uses present / absent information when weighting different feature groups

• The posterior weight for expert i in predicting pair n

– The weight can be used to indicate the importance of that feature view ( expert ) for this specific pair

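$$h_i^{(n)} = P(m_i^{(n)} = 1 \mid y^{(n)}, x^{(n)}, v^{t}, w^{t}) = \frac{P(m_i^{(n)} = 1 \mid x^{(n)}, v^{t})\; p(y^{(n)} \mid x^{(n)}, m_i^{(n)} = 1, w_i^{t})}{\sum_{j=1}^{4} P(m_j^{(n)} = 1 \mid x^{(n)}, v^{t})\; p(y^{(n)} \mid x^{(n)}, m_j^{(n)} = 1, w_j^{t})}$$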
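
The posterior weight of each expert for one pair is the gate probability times that expert's likelihood of the observed label, normalized over the four experts. A minimal sketch follows; the gate outputs and expert probabilities are hypothetical numbers chosen for illustration.

```python
def posterior_weights(gate, expert_p_y1, y):
    """h_i = gate_i * p(y | x, m_i=1, w_i) / sum_j gate_j * p(y | x, m_j=1, w_j)."""
    # Likelihood of the observed label under each expert.
    lik = [p if y == 1 else 1.0 - p for p in expert_p_y1]
    joint = [g * l for g, l in zip(gate, lik)]
    z = sum(joint)
    return [j / z for j in joint]


# Hypothetical values for experts P, E, F, S on one protein pair.
gate = [0.25, 0.25, 0.25, 0.25]      # p(m_i=1 | x, v)
expert_p = [0.6, 0.5, 0.9, 0.4]      # p(y=1 | x, m_i=1, w_i)
h = posterior_weights(gate, expert_p, y=1)
print([round(v, 3) for v in h])
```

In this toy case the third expert (F) receives the largest posterior weight because it assigns the highest probability to the observed interaction.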


Performance

• 162 features for yeast physical PPI prediction task

• Features extracted in “detail” encoding

• Under “detail” encoding, the ranking method performs almost the same as RF (not shown)


Functional Expert Dominates

Figure: The frequency at which each of the four experts has maximum contribution among validated and predicted pairs

300 candidate protein pairs

51 predicted interactions

33 validated already

18 newly predicted


Protein Complex

• Proteins form stable associations with multiple binding partners (termed a “complex”)

• Each complex member interacts with part of the group, and the members work together as a unit

• Identifying these important sub-structures is essential for understanding activities in the cell

Group detection within the PPI network


Identify Complex in PPI Graph

• PPI network as a weighted undirected graph

– Edge weights derived from supervised PPI predictions

• Previous work

– Unsupervised graph clustering style

– All rely on the assumption that complexes correspond to the dense regions of the network

• Related facts

– Many other possible topological structures

– A small number of complexes available from reliable experiments

– Complexes also have functional /biological properties (like weight / size / …)


Possible topological structures

Edge weight color coded

• Make use of the small number of known complexes → supervised

• Model the possible topological structures → subgraph statistics

• Model the biological properties of complexes → subgraph features


Properties of Subgraph

• Subgraph properties as features in BN

– Various topological properties from graph

– Biological attributes of complexes

No. Sub-Graph Property

1 Vertex Size

2 Graph Density

3 Edge Weight Ave / Var

4 Node degree Ave / Max

5 Degree Correlation Ave / Max

6 Clustering Coefficient Ave / Max

7 Topological Coefficient Ave / Max

8 First Two Eigenvalues

9 Fraction of Edge Weight > Certain Cutoff

10 Complex Member Protein Size Ave / Max

11 Complex Member Protein Weight Ave / Max
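
A few of the topological properties in this table can be computed directly from a weighted edge list. The sketch below covers vertex size, graph density, edge weight average and variance, and node degree average and maximum; the edge-dictionary representation, the function name, and the toy graph are our assumptions.

```python
from itertools import combinations
from statistics import mean, pvariance


def subgraph_features(nodes, weights):
    """weights maps frozenset({u, v}) -> edge weight for the whole PPI graph."""
    nodes = list(nodes)
    # Keep only edges with both endpoints inside the candidate subgraph.
    edges = {frozenset(p): weights[frozenset(p)]
             for p in combinations(nodes, 2) if frozenset(p) in weights}
    n = len(nodes)
    degree = [sum(1 for e in edges if u in e) for u in nodes]
    possible = n * (n - 1) / 2
    w = list(edges.values())
    return {
        "vertex_size": n,
        "density": len(edges) / possible if possible else 0.0,
        "edge_weight_avg": mean(w) if w else 0.0,
        "edge_weight_var": pvariance(w) if w else 0.0,
        "degree_avg": mean(degree) if degree else 0.0,
        "degree_max": max(degree) if degree else 0,
    }


# Hypothetical weighted PPI graph (weights stand in for predicted PPI scores).
weights = {
    frozenset({"A", "B"}): 0.9,
    frozenset({"B", "C"}): 0.8,
    frozenset({"A", "C"}): 0.7,
    frozenset({"C", "D"}): 0.2,
}
tri = subgraph_features(["A", "B", "C"], weights)
quad = subgraph_features(["A", "B", "C", "D"], weights)
print(tri["density"], quad["density"])
```

The fully connected triangle has density 1.0, while adding the weakly attached node D lowers both density and average edge weight.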


Model Complex Probabilistically

• Bayesian Network (BN)

– C : Whether this subgraph is a complex (1) or not (0)

– N : Number of nodes in subgraph

– Xi : Properties of subgraph

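[Figure: Bayesian network over the nodes C, N, and the subgraph properties X]

$$L = \log \frac{p(c = 1 \mid n, x_1, x_2, \ldots, x_m)}{p(c = 0 \mid n, x_1, x_2, \ldots, x_m)}$$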

Assume a probabilistic model (Bayesian Network) for representing complex sub-graphs


Model Complex Probabilistically

• BN parameters trained with MLE

– Trained from known complexes and randomly sampled non-complexes

– Discretize continuous features

– Bayesian Prior to smooth the multinomial parameters

• Evaluate candidate subgraphs with the log ratio score L

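$$L = \log \frac{p(c = 1 \mid n, x_1, x_2, \ldots, x_m)}{p(c = 0 \mid n, x_1, x_2, \ldots, x_m)} = \log \frac{p(c = 1)\, p(n \mid c = 1) \prod_{k=1}^{m} p(x_k \mid n, c = 1)}{p(c = 0)\, p(n \mid c = 0) \prod_{k=1}^{m} p(x_k \mid n, c = 0)}$$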
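
The log ratio score can be sketched with simple count tables over discretized features. The version below estimates p(c), p(n|c), and p(x_k|n, c) from counts with a symmetric pseudo-count standing in for the Bayesian prior; the class name, the three-bin discretization, and the training tuples are hypothetical.

```python
import math
from collections import Counter, defaultdict


class ComplexScorer:
    def __init__(self, n_values, alpha=1.0):
        self.alpha = alpha              # pseudo-count used for smoothing
        self.n_values = n_values        # number of discrete bins per variable
        self.class_counts = Counter()
        self.size_counts = defaultdict(Counter)   # c -> counts of size bin n
        self.feat_counts = defaultdict(Counter)   # (k, n, c) -> counts of x_k

    def fit(self, examples):
        """examples: iterable of (c, n, xs) with discretized n and features xs."""
        for c, n, xs in examples:
            self.class_counts[c] += 1
            self.size_counts[c][n] += 1
            for k, x in enumerate(xs):
                self.feat_counts[(k, n, c)][x] += 1
        return self

    def _smoothed(self, counter, value):
        total = sum(counter.values())
        return (counter[value] + self.alpha) / (total + self.alpha * self.n_values)

    def log_joint(self, c, n, xs):
        """log of p(c) * p(n|c) * prod_k p(x_k|n, c)."""
        total = sum(self.class_counts.values())
        lp = math.log((self.class_counts[c] + self.alpha) / (total + 2 * self.alpha))
        lp += math.log(self._smoothed(self.size_counts[c], n))
        for k, x in enumerate(xs):
            lp += math.log(self._smoothed(self.feat_counts[(k, n, c)], x))
        return lp

    def score(self, n, xs):
        """Log ratio L; p(n, x) cancels, so the ratio of joints is enough."""
        return self.log_joint(1, n, xs) - self.log_joint(0, n, xs)


# Hypothetical training set: (c, size bin, (density bin, edge-weight bin)), bins 0..2.
train = [(1, 0, (2, 2)), (1, 1, (2, 1)), (1, 0, (2, 2)),
         (0, 0, (0, 0)), (0, 1, (0, 1)), (0, 0, (1, 0))]
scorer = ComplexScorer(n_values=3).fit(train)
dense_score = scorer.score(0, (2, 2))
sparse_score = scorer.score(0, (0, 0))
print(round(dense_score, 3), round(sparse_score, 3))
```

A positive score means the candidate subgraph looks more like the known complexes than like the sampled non-complexes under this model.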


Experimental Setup

• Positive training data:

– Set1: MIPS Yeast complex catalog: a curated set of ~100 protein complexes

– Set2: TAP05 Yeast complex catalog: a reliable experimental set of ~130 complexes

– Complex size (number of nodes) follows a power law

• Negative training data:

– Generated from randomly selected nodes in the graph

– Size distribution follows the same power law as the positive complexes
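
One simple way to build such negatives is to draw each negative's size from the sizes of the known complexes and then pick that many nodes at random. Reusing the empirical sizes is our stand-in for fitting and sampling an explicit power law; the function name and the toy protein identifiers are hypothetical.

```python
import random


def sample_negatives(graph_nodes, positive_complexes, n_negatives, seed=0):
    """Random node sets whose sizes mirror the sizes of the known complexes."""
    rng = random.Random(seed)
    nodes = sorted(graph_nodes)
    known = {frozenset(c) for c in positive_complexes}
    sizes = [len(c) for c in positive_complexes]
    negatives = []
    while len(negatives) < n_negatives:
        size = rng.choice(sizes)                     # empirical size distribution
        candidate = frozenset(rng.sample(nodes, size))
        if candidate not in known:                   # skip exact known complexes
            negatives.append(candidate)
    return negatives


# Hypothetical graph of 20 proteins with three known complexes.
graph_nodes = [f"P{i}" for i in range(20)]
positives = [["P0", "P1", "P2"], ["P3", "P4"], ["P5", "P6", "P7", "P8"]]
negatives = sample_negatives(graph_nodes, positives, n_negatives=6)
print([sorted(neg) for neg in negatives])
```

The loop assumes the graph is much larger than any single complex, so that random draws rarely coincide with a known complex.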


Evaluation

• Train-Test style (Set1 & Set2)

• Precision / Recall / F1 measures

• A cluster “detects” a complex if

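A : Number of proteins only in the cluster

B : Number of proteins only in the known complex

C : Number of proteins shared by both

[Figure: Venn diagram of a detected cluster (A) and a known complex (B) overlapping in C]

$$\frac{C}{A + C} > p \quad \text{and} \quad \frac{C}{B + C} > p$$

Overlap threshold p is set to 50%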
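
The detection test compares the shared proteins against both the cluster and the known complex. The sketch below assumes the criterion is C/(A+C) > p and C/(B+C) > p with a strict inequality; the comparison operator did not survive in the slide text, so treat that detail as an assumption.

```python
def detects(cluster, known_complex, p=0.5):
    """True if the cluster overlaps the known complex above threshold p on both sides."""
    cluster, known_complex = set(cluster), set(known_complex)
    c = len(cluster & known_complex)    # C: proteins shared
    a = len(cluster - known_complex)    # A: proteins only in the cluster
    b = len(known_complex - cluster)    # B: proteins only in the complex
    if c == 0:
        return False
    return c / (a + c) > p and c / (b + c) > p


# Hypothetical protein identifiers.
good = detects({"A", "B", "C", "D"}, {"B", "C", "D", "E"})   # C=3, A=1, B=1
poor = detects({"A", "B"}, {"B", "C", "D", "E"})             # C=1, A=1, B=3
print(good, poor)
```

Requiring both ratios to pass prevents a very large cluster from "detecting" every complex it happens to contain, and a tiny cluster from matching a complex it barely touches.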


Performance Comparison

• On yeast predicted PPI graph (~2000 nodes)

• Compare to a popular complex detection package: MCODE (search for highly interconnected regions)

• Compare to local search relying on density evidence only

• Compare to local search with complex score from SVM (also supervised)

Methods   Precision   Recall   F1
Density   0.180       0.462    0.253
MCODE     0.219       0.075    0.111
SVM       0.211       0.377    0.269
BN        0.266       0.513    0.346


Learning PPI Networks

[Figure: overview of PPI network learning tasks: pairwise interactions, protein complexes, pathways, function implication (Func A, Func ?), and domain/motif interactions. Publication labels on the figure: PSB 05, PROTEINS 06, BMC Bioinfo 07, CCR 08, ISMB 08, Genome Biology 08, Human-PPI (Revise 08), HIV-Human PPI (Revise), Prepare]


Inter-species interactome

What are the interacting proteins between two organisms?


HIV-1 host protein interactions

HIV-1 depends on the cellular machinery in every aspect of its life cycle.

[Figure: HIV-1 life cycle stages: fusion, reverse transcription, transcription, budding, maturation]

Peterlin and Trono, Nature Rev Immunol 2003.


HIV-1 host protein interactions

[Figure legend: human protein; HIV protein]


FIN

Questions ?