
Inferring strengths of protein-protein interactions from experimental data using linear programming

Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu

Bioinformatics Center, Kyoto University

Overview
  Background
  Probabilistic model
  Related work
  Biological experimental data
  Proposed methods
    For binary data
    For numerical data
  Results of computational experiments
  Conclusion

Background (1/3)

Understanding protein-protein interactions is useful for understanding protein functions.
  Transcription factors
    Proteins interact with a factor.
    Regulate the gene.
  Receptors, etc.

Background (2/3)

Various methods have been developed for inferring protein-protein interactions.
  Gene fusion / Rosetta stone (Enright et al. and Marcotte et al., 1999)
    The number of genes to which it can be applied is limited.
  Molecular dynamics
    Long CPU time
    Difficult to predict precisely

Background (3/3)

A model based on domain-domain interactions has been proposed.
  It uses domains defined by databases such as InterPro or Pfam.

[Figure: proteins with labeled domains]


Probabilistic model of interaction (1/2)

Model (Deng et al., 2002)
  Two proteins interact if and only if at least one pair of their domains interacts.
  Interactions between domains are independent events.

[Figure: proteins P1 and P2 with domains D1, D2, D3, D4]

Notation
  Pij = 1: proteins Pi and Pj interact
  Dmn = 1: domains Dm and Dn interact
  (Dm, Dn) ∈ Pi × Pj: domain pair (Dm, Dn) is included in protein pair Pi × Pj

Probabilistic model of interaction (2/2)
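Under the two assumptions of the model (a protein pair interacts exactly when at least one of its domain pairs interacts, and domain-domain interactions are independent events), the interaction probability of a protein pair follows as Pr(Pij=1) = 1 - product of (1 - Pr(Dmn=1)) over the domain pairs (Dm, Dn) in Pi × Pj. The sketch below evaluates that expression; the function name and the numeric values are illustrative assumptions, not taken from the slides.

```python
from math import prod

def interaction_probability(domain_pair_probs):
    """Pr(Pij=1) = 1 - prod(1 - Pr(Dmn=1)) over the domain pairs of (Pi, Pj).

    domain_pair_probs: iterable of Pr(Dmn=1) values, one per domain pair
    contained in the protein pair.
    """
    return 1.0 - prod(1.0 - p for p in domain_pair_probs)

# Hypothetical protein pair containing two domain pairs.
print(interaction_probability([0.5, 0.2]))
```

A protein pair with no domain pairs gets probability 0 under this expression, because the empty product is 1.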


Related work

INPUT:
  interacting protein pairs (positive examples)
  non-interacting protein pairs (negative examples)
OUTPUT:
  Pr(Dmn=1) for all domain pairs

Association method (Sprinzak et al., 2001)

Inference of probabilities of domain-domain interactions using ratios of frequencies

Pr(Dmn=1) = Imn / Nmn

Imn: number of interacting protein pairs that include (Dm, Dn)

Nmn: number of protein pairs that include (Dm, Dn)
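The ratio of frequencies can be computed with two counters, one for the number of interacting protein pairs that include a domain pair (called Imn here) and one for the number of all protein pairs that include it (called Nmn here). This is a minimal sketch: the data layout, the function name, the toy proteins, and the choice to treat domain pairs as unordered are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

def association_scores(protein_domains, pairs, labels):
    """Return {domain pair: Imn / Nmn} for every domain pair seen in `pairs`."""
    n_total = Counter()        # Nmn: protein pairs that include (Dm, Dn)
    n_interacting = Counter()  # Imn: interacting protein pairs that include (Dm, Dn)
    for (pi, pj), interacts in zip(pairs, labels):
        # Treat domain pairs as unordered and count each once per protein pair.
        domain_pairs = {
            tuple(sorted(dp))
            for dp in product(protein_domains[pi], protein_domains[pj])
        }
        for dp in domain_pairs:
            n_total[dp] += 1
            if interacts:
                n_interacting[dp] += 1
    return {dp: n_interacting[dp] / n_total[dp] for dp in n_total}

# Hypothetical toy data: one positive and one negative example.
protein_domains = {"P1": ["D1", "D2"], "P2": ["D3"], "P3": ["D1"]}
pairs = [("P1", "P2"), ("P3", "P2")]
labels = [1, 0]
scores = association_scores(protein_domains, pairs, labels)
print(scores)
```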

EM method (Deng et al., 2002)

Consider the probability (likelihood L) that the experimental data {Oij ∈ {0,1}} are observed.

Use the EM algorithm to (locally) maximize L.

Estimate Pr(Dmn=1).
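To make the idea of a likelihood over binary observations concrete, the sketch below evaluates a log-likelihood for given domain-pair probabilities. It is a simplification under a stated assumption: each observation Oij is treated as an error-free draw with success probability Pr(Pij=1), which may differ from the likelihood actually used by the EM method. It only evaluates the quantity and does not perform the EM iterations; all names and values are hypothetical.

```python
from math import log, prod

def log_likelihood(observations, domain_pair_probs, eps=1e-12):
    """Sum of log Pr(Oij = observed value) over protein pairs.

    observations: list of (o_ij, [domain pair keys]) with o_ij in {0, 1}.
    domain_pair_probs: dict mapping a domain pair key to Pr(Dmn=1).
    Assumption: observations are error-free, so Pr(Oij=1) = Pr(Pij=1).
    """
    total = 0.0
    for o_ij, domain_pairs in observations:
        p = 1.0 - prod(1.0 - domain_pair_probs[dp] for dp in domain_pairs)
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite at 0 and 1
        total += log(p) if o_ij == 1 else log(1.0 - p)
    return total

# Hypothetical data: the same domain pair appears in one positive
# and one negative protein pair.
obs = [(1, ["a"]), (0, ["a"])]
print(log_likelihood(obs, {"a": 0.5}))
```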


Biological experimental data

Related methods (Association and EM) use only binary data (interact or not).
Experimental data from yeast two-hybrid screens
  Ito et al. (2000, 2001)
  Uetz et al. (2001)
For many protein pairs, different results (Oij ∈ {0,1}) were observed.
We developed new methods that use raw numerical data.

Numerical data

Ito et al. (2000, 2001)
  For each protein pair, experiments were performed multiple times.
  IST (Interaction Sequence Tag): number of observed interactions
  By applying a threshold, we obtain binary data.
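Thresholding the counts can be sketched as follows. The threshold value, the use of "greater than or equal", and the toy counts are assumptions for illustration; the slides do not state which threshold is used.

```python
def binarize(ist_counts, threshold):
    """Map each protein pair's IST count to 1 (interact) or 0 (not)."""
    return {pair: int(count >= threshold) for pair, count in ist_counts.items()}

# Hypothetical IST counts for two protein pairs.
ist_counts = {("P1", "P2"): 5, ("P1", "P3"): 1}
binary = binarize(ist_counts, threshold=3)
print(binary)
```

The step discards how often an interaction was seen, which is the information the numerical methods aim to keep.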


Proposed methods

It seems difficult to modify the EM method for numerical data.
Linear programming
  For binary data
    LPBN
    Combined methods
      LPEM
      EMLP
    SVM-based method
  For numerical data
    ASNM
    LPNM


LPBN (LP-based method) (1/2)

Transformation into linear inequalities

Pi and Pj interact
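One way a product-form model can be turned into a linear inequality is by taking logarithms. As an assumption for this sketch, read "Pi and Pj interact" as Pr(Pij=1) ≥ θ for some threshold θ in (0, 1). With Pr(Pij=1) = 1 - product of (1 - Pr(Dmn=1)), that condition is equivalent to sum of ln(1 - Pr(Dmn=1)) ≤ ln(1 - θ), which is linear in the variables xmn = ln(1 - Pr(Dmn=1)). The code checks the equivalence numerically; the threshold, names, and values are hypothetical, and this may differ in detail from the authors' formulation.

```python
from math import log, prod

def interacts_product_form(probs, theta):
    """Original condition: 1 - prod(1 - p) >= theta."""
    return 1.0 - prod(1.0 - p for p in probs) >= theta

def interacts_linear_form(probs, theta):
    """Same condition after taking logs: sum(x_mn) <= ln(1 - theta).

    x_mn = ln(1 - Pr(Dmn=1)) would be the LP variables.
    Requires every p < 1 and theta < 1 so the logarithms are defined.
    """
    x = [log(1.0 - p) for p in probs]
    return sum(x) <= log(1.0 - theta)

# Hypothetical domain-pair probabilities and threshold.
print(interacts_product_form([0.3, 0.4], 0.5), interacts_linear_form([0.3, 0.4], 0.5))
print(interacts_product_form([0.1, 0.2], 0.5), interacts_linear_form([0.1, 0.2], 0.5))
```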

LPBN (LP-based method) (2/2)

Linear programming for inference of protein-protein interactions

Combination of EM and LPBN

LPEM method
  Uses the results of LPBN as initial parameter values for EM.
EMLP method
  Adds inequality constraints to LPBN so that LP solutions stay close to EM solutions.

Simple SVM-based method

Feature vector

Simple linear kernel with
  Interacting pairs = positive examples
  Non-interacting pairs = negative examples


Strength of protein-protein interaction

For each protein pair, experiments were performed multiple times.

The ratio Kij / Mij can be considered as the strength.

Kij : Number of observed interactions for a protein pair (Pi,Pj)

Mij : Number of experiments for (Pi,Pj)

LPNM method (1/2)

Minimize the gap between Pr(Pij=1) and the observed ratio Kij / Mij using LP.
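The gap being minimized can be illustrated by comparing the model's Pr(Pij=1) with the observed ratio Kij / Mij for each protein pair. The sketch below only evaluates a mean absolute gap for fixed domain-pair probabilities; it does not solve the linear program, and the choice of absolute difference, the function name, and the sample values are assumptions.

```python
from math import prod

def mean_absolute_gap(samples, domain_pair_probs):
    """Average of |Pr(Pij=1) - Kij/Mij| over protein pairs.

    samples: list of (k_ij, m_ij, [domain pair keys]), where k_ij is the
    number of observed interactions and m_ij the number of experiments.
    domain_pair_probs: dict mapping a domain pair key to Pr(Dmn=1).
    """
    gaps = []
    for k_ij, m_ij, domain_pairs in samples:
        strength = k_ij / m_ij  # observed strength of the interaction
        predicted = 1.0 - prod(1.0 - domain_pair_probs[dp] for dp in domain_pairs)
        gaps.append(abs(predicted - strength))
    return sum(gaps) / len(gaps)

# Hypothetical domain-pair probabilities and experimental counts.
probs = {"a": 0.5, "b": 0.5}
print(mean_absolute_gap([(3, 4, ["a", "b"]), (1, 2, ["a"])], probs))
print(mean_absolute_gap([(1, 4, ["a"])], probs))
```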

LPNM method (2/2)

Linear programming for inference of strengths of protein-protein interactions

ASNM

Modified Association method for numerical data

For binary data (Sprinzak et al., 2001)


Computational experiments for binary data

DIP database (Xenarios et al., 2002)
  1767 protein pairs as positive examples
  2/3 of the pairs for training, 1/3 for testing

Computational environment
  Xeon processor, 2.8 GHz
  LP solver: loqo

Results on training data (binary data)

[Figure: results for SVM, EM, LPBN, and Association]

Results on test data (binary data)

[Figure: results for SVM, EM, EMLP, Association, and LPEM]

Computational experiments for numerical data

YIP database (Ito et al., 2001, 2002)
  IST (Interaction Sequence Tag)
  1586 protein pairs
  4/5 for training, 1/5 for testing

Computational environment
  Xeon processor, 2.8 GHz
  LP solver: lp_solve

Results on test data (numerical data)

[Figure: results for ASNM, EM, LPNM, and Association]

Results on test data (numerical data)

LPNM is the best.
EM and Association methods classify Pr(Pij=1) into either 0 or 1.

Method       LPNM     ASNM     EM      Association
Ave. Error   0.0308   0.0405   0.295   0.277
CPU (sec.)   1.20     0.0077   1.62    0.0088

Conclusion

We have defined a new problem: inferring strengths of protein-protein interactions.
We have proposed LP-based methods.
  For binary data
    LPBN, LPEM, EMLP
    SVM-based method
  For numerical data
    ASNM
    LPNM
  LPNM outperformed the other methods.

Future work
  Improve the methods to avoid overfitting.
  Improve the probabilistic model to understand protein-protein interactions more accurately.