bioinformatic analyses of protein-protein interaction networks ue systems biology and complex...


Post on 20-Jan-2016


Page 1: BIOINFORMATIC ANALYSES OF PROTEIN-PROTEIN INTERACTION NETWORKS UE Systems Biology and Complex Systems Master Biosciences Lyon 05/20/2010 & Biology Summer

BIOINFORMATIC ANALYSES OF PROTEIN-PROTEIN INTERACTION NETWORKS

UE Systems Biology and Complex Systems, Master Biosciences, Lyon, 05/20/2010
&
Biology Summer School, Marseille, 09/01/2010

Christine Brun, , Marseille

Page 2

A protein never acts alone…

…but interacts with others to perform its function.

Page 3

Molecular interactions :

- protein-DNA
- protein-RNA
- protein-protein
- protein-lipid
- protein-small molecule

MolecularMovies.org The Inner Life of the Cell

Page 4

THE DIVERSITY OF P-P INTERACTIONS, from Nooren & Thornton (1/3)

1- Structural diversity

• The interaction occurs between identical molecules: homo-dimers, trimers, tetramers...
Ex: ferritin, 24 identical polypeptides

• The interaction occurs between different polypeptides: hetero-dimers, trimers, tetramers...
Ex: RNA polymerase, 12 different polypeptides

Page 5

THE DIVERSITY OF P-P INTERACTIONS, from Nooren & Thornton (2/3)

2- Functional diversity

• Non-obligatory PPI:
Proteins are stable independently.
Proteins are functional independently.
The interaction performs an action.
Ex: antigen-antibody reaction, enzymatic reaction, phosphorylation reaction (signaling…)

Figure: interactions within the JAK-STAT signaling pathway in Drosophila

Page 6

THE DIVERSITY OF P-P INTERACTIONS, from Nooren & Thornton (2/3)

2- Functional diversity

• Obligatory PPI:
Proteins are not stable independently.
Proteins are not functional independently.
The interaction is necessary for stability and function.
Ex: protein complexes (DNA polymerase, RNA polymerase, ribosome…)

• Non-obligatory PPI:
Proteins are stable independently.
Proteins are functional independently.
The interaction performs an action.
Ex: antigen-antibody reaction, enzymatic reaction, phosphorylation reaction (signaling…)

Page 7

THE DIVERSITY OF P-P INTERACTIONS, from Nooren & Thornton (3/3)

2- Functional diversity

• Obligatory PPI:
Proteins are not stable independently.
Proteins are not functional independently.
The interaction is necessary for stability and function.
Ex: protein complexes (DNA polymerase, RNA polymerase, ribosome…)

Figure: interactions within a complex, the yeast proteasome

Page 8

THE DIVERSITY OF P-P INTERACTIONS, from Nooren & Thornton (3/3)

3- Dynamic diversity

• Transient PPI:
Associate and dissociate in vivo.
Transient PPI may be non-obligatory.
Figure: interactions within the JAK-STAT signaling pathway in Drosophila

• Permanent PPI:
Exist only in complexes.
Permanent PPI generally correspond to obligatory PPI.
Figure: interactions within a complex, the yeast proteasome

Page 9

A protein never acts alone…

…but interacts with others to perform its function.

Page 10

Protein function: a complex notion

Different integration levels:

Molecule: molecular function
Cell: cellular function
Tissue, organ: physiology
Organism: development, reproduction
Population: ecological equilibrium

Page 11

Molecular function of the proteins

Molecular activity
examples: kinase, ATPase, DNA binding...

Biochemical analyses
Bioinformatic predictions: similarity search between sequences and structures

But...

~30% of the genes/proteins of each newly sequenced organism do not show any similarity with any known gene.

Page 12

Cellular function of the proteins

Biological process
examples: signaling, transcription, establishment of the epithelium

Genetic analyses
Cellular biology

But...

Sharan et al., Mol Syst Biol, 2007

Page 13

Protein function: functional annotations are also missing in E. coli

Bouveret & Brun, MethMolBio, 2010

Page 14

How to study the protein cellular functions?

Cellular function = Biological process

Interaction analysis allows investigating protein cellular functions.

Interactions within a process: the JAK-STAT signaling pathway in Drosophila

Interactions within a complex: the yeast proteasome

Page 15

HOW TO IDENTIFY PROTEIN-PROTEIN INTERACTIONS AT THE WHOLE PROTEOME SCALE?

Two high-throughput methods:

- Two-hybrid screens

- Affinity Purification followed by Mass Spectrometry

Page 16

THE MODULARITY OF THE TRANSCRIPTION FACTORS, as an elementary principle of the yeast two-hybrid

Transcription factor: DNA binding site (BS) + transcription activation domain (AD), able to activate the basal transcription machinery

Figure labels: transcription factor binding site, transcription factor (BS + AD), gene, messenger RNA

Page 17

YEAST TWO-HYBRID
Principle of the test

• The bait protein (X) is fused to the BS of a transcription factor.
• The prey proteins (Y, potential interactors) are fused to the activation domain (AD) of a transcription factor.
• The fusion proteins are expressed in a yeast strain containing a reporter gene under the control of BS.

Figure labels: BS-bait X, AD-prey Y, reporter gene

Page 18

YEAST TWO-HYBRID
Principle of the test

• When the prey Y interacts with the bait X, the activation domain AD is brought close to the gene promoter and transcription can happen.

Figure labels: BS-bait X, AD-prey Y, reporter gene, reporter RNA

Page 19

THE LARGE-SCALE TWO-HYBRID SCREENS

S. cerevisiae: Uetz et al., 2000; Ito et al., 2001
P. falciparum: LaCount et al., 2005
C. elegans: Li et al., 2004
D. melanogaster: Giot et al., 2003; Stanyon et al., 2004; Formstecher et al., 2005
H. sapiens: Stelzl et al., 2005; Rual et al., 2005
T. pallidum: Titz et al., 2008
H. pylori: Rain et al., 2001
C. jejuni: Parrish et al., 2007
Synechocystis: Sato et al., 2007
Mesorhizobium: Shimoda et al., 2008

+ virus (bacteriophage T7, vaccinia, HCV, BPV, Herpes, EBV…)

+ host-virus (HCV-human, EBV-human)

Page 20

Protein reconstitution

Bouveret & Brun, MethMolBio, 2010

OTHER LARGE SCALE METHODS BASED ON THE TWO-HYBRID PRINCIPLE (1/2)

Page 21

MAPPIT, a functional complementation assay (cytokine receptor signaling pathway)

Bouveret & Brun, MethMolBio, 2010

OTHER LARGE SCALE METHODS BASED ON THE TWO-HYBRID PRINCIPLE (2/2)

Page 22

HOW TO IDENTIFY PROTEIN-PROTEIN INTERACTIONS AT THE WHOLE PROTEOME SCALE?

Two high-throughput methods:

- Two-hybrid screens

- Affinity Purification followed by Mass Spectrometry

Page 23

Tag Bait Y

antibody anti-Tag

PRINCIPLE OF COMPLEX PURIFICATION

Page 24

PROTOCOL FOR COMPLEX PURIFICATION

Different types of TAG, formed by 2 parts separated by a cleavage site, allow 2 purification steps

Bouveret & Brun, MethMolBio, 2010

Page 25

PROTOCOL FOR COMPLEX PURIFICATION

Bouveret & Brun, MethMolBio, 2010

Page 26

AP/MS analyses

S. cerevisiae: Gavin et al., 2002, 2006; Ho et al., 2002; Krogan et al., 2006
E. coli: Butland et al., 2005; Arifuzzaman et al., 2006; Hu et al., 2009
M. pneumoniae: Kühner et al., 2009
D. melanogaster: Perrimon lab, 2009

(+ signaling pathways in Drosophila and human)

Page 27

HOW TO IDENTIFY PROTEIN-PROTEIN INTERACTIONS AT THE WHOLE PROTEOME SCALE?

The two high-throughput methods do not detect the same interaction types:

- the yeast two-hybrid method detects interactions that are biophysically possible, transient or permanent.

- the tandem affinity purification method identifies permanent interactions in vivo.

Page 28

FINDING INTERACTION DATA?

International Molecular Exchange Consortium

Specialized databases

Multi-organisms:
- DIP (dip.doe-mbi.ucla.edu)
- IntAct (www.ebi.ac.uk/intact)
- MINT (mint.bio.uniroma2.it/mint)
- BioGRID (www.thebiogrid.org)
- BIND (www.blueprint.org)

Yeast:
- MPact (mips.gsf.de/genre/proj/mpact)

H:
- HPRD (www.hprd.org)
- Reactome (http://reactome.org/)

Meta-database

- APID (bioinfow.dep.usal.es/apid/index.htm)

Page 29

EXAMPLE 1: INTACT

Page 30

EXAMPLE 1: INTACT

Page 31

EXAMPLE 2: APID METABASE

Page 32

A standardized representation of interactions is proposed for databases. Authors are invited to submit their interactions to databases in this format when they submit their publication, for a better record of knowledge.

Page 33

Interactions described in databases can be represented as networks

= universal language to describe complex systems

Examples: disease spread [Krebs], social network, food web, neural network [Cajal], electronic circuit, Internet [Burch & Cheswick]

Page 34

PROTEIN-PROTEIN INTERACTION NETWORK

Undirected graph
Node = protein
Edge = physical interaction
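The graph definition above (node = protein, edge = physical interaction, no orientation) can be sketched in a few lines. This is only an illustration: the function name `build_network` and the protein names are hypothetical, and the input is assumed to be a plain list of interacting pairs rather than any particular database format.

```python
from collections import defaultdict

def build_network(interactions):
    """Build an undirected graph as a dict: protein -> set of neighbours."""
    network = defaultdict(set)
    for a, b in interactions:
        if a == b:
            # Assumption: a self-interaction only registers the node, no loop edge.
            network[a]
            continue
        network[a].add(b)
        network[b].add(a)  # undirected: store the edge in both directions
    return dict(network)

# Hypothetical interaction list; ("B", "A") duplicates ("A", "B") on purpose.
interactions = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("B", "A")]
network = build_network(interactions)

# Each edge is stored twice (once per endpoint), hence the division by 2.
n_edges = sum(len(neighbours) for neighbours in network.values()) // 2
print(sorted(network["C"]), n_edges)
```

Storing neighbours in sets makes duplicate reports of the same interaction collapse into a single edge, which matters when pairs come from several screens.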

Page 35

WHAT IS AN INTERACTOME?

The set of all possible protein-protein interactions between all the proteins of an organism.

Jeong et al., 2001 Li et al., 2004

Formstecher et al., 2005 Rual et al., 2005

Page 36

FAQ about INTERACTOMES

Interactome: the set of all possible protein-protein interactions between all the proteins of an organism.

What's the size of the interaction space?
~75 000 to 350 000 interactions in human

Are all detected interactions physiological?
They are physically possible

BUT... interactomes do not contain spatio-temporal information: 2D maps, projections, long-exposure photographs...

Page 37

SHOULD WE TRUST LS-Y2H ? (1/4)

Comparison of the results of the 3 drosophila LS-Y2H screens:
Giot et al., Science 2003
Stanyon et al., Genome Biol 2004
Formstecher et al., Genome Res 2005

• Low overlap between experiments
• The size of the complete set of interactions being unknown: false positives or barely overlapping sub-sets?

Page 38

SHOULD WE TRUST LS-Y2H ? (2/4)

Comparison of the results of 2 drosophila LS-Y2H screens (Giot et al., Science 2003; Formstecher et al., Genome Res 2005)… only 30 baits in common

[Venn diagrams; counts as extracted: 20416 203723 / 192 63823]

Yes, we should, since interactions detected in LS-Y2H are possible interactions.

Page 39

Braun et al., Nature Meth 2009

Different detection methods do not detect p-p interactions with the same efficiency

SHOULD WE TRUST LS-Y2H ? (3/4)

Page 40

SHOULD WE TRUST LS-Y2H ? (4/4)

From Venkatesan et al., Nat. Meth. 2009

Positive interactions (17-21%)
Negative interactions (false positive rate 0.5-2%)

1- SENSITIVITY:

2- COVERAGE:

Low sensitivity rather than low specificity

Page 41

AND TAP-TAG? (1/2)

Overlap of the interactions detected in the 3 TAP-TAG screens performed in E. coli.

Hu et al., Plos Biol 2009

Page 42

AND TAP-TAG? (2/2)

Overlap of the interactions detected in the 1 TAP-TAG screen performed in M. pneumoniae and interactions identified/inferred by other means.

Kühner et al., Science 2009

Page 43

WHAT ABOUT LITERATURE?

Cusick et al., Nature Meth 2009

Page 44

MOTIVATIONS OF INTERACTOMICS

Discovery science:
• Knowing all the parts
• ~30 % of the genes/proteins of sequenced genomes are of unknown function
Predict/discover protein function

Systems biology approach:
• Analyze the interactome properties
Bring new insights into old and novel biological questions

Page 45

POSSIBLE APPROACHES

1- Description

network organisation (stat, graph theory…)

- Protein degree
- Edge-betweenness
- K-core
- Diameter
- ...

Page 46

PROTEIN DEGREE

• The protein degree k: number of neighbours

• If the network is directed, k_in and k_out

Figure example: k = 4; in the directed version of the same node, k_in = 1 and k_out = 3
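As a minimal sketch of the two definitions, the snippet below reproduces the slide's example values (k = 4, k_in = 1, k_out = 3). The node names and the two helper functions are hypothetical; the undirected graph is assumed to be a dict of neighbour sets and the directed one a list of (source, target) pairs.

```python
def degree(network, protein):
    """Degree k in an undirected graph stored as protein -> set of neighbours."""
    return len(network[protein])

def in_out_degree(directed_edges, node):
    """(k_in, k_out) of a node in a directed graph given as (source, target) pairs."""
    k_in = sum(1 for _, target in directed_edges if target == node)
    k_out = sum(1 for source, _ in directed_edges if source == node)
    return k_in, k_out

# Hypothetical node P with four neighbours.
undirected = {"P": {"Q", "R", "S", "T"}, "Q": {"P"}, "R": {"P"}, "S": {"P"}, "T": {"P"}}
# Same neighbours, now with one incoming and three outgoing edges.
directed_edges = [("Q", "P"), ("P", "R"), ("P", "S"), ("P", "T")]

k = degree(undirected, "P")
k_in, k_out = in_out_degree(directed_edges, "P")
print(k, k_in, k_out)
```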

Page 47

PROTEIN DEGREE

• Protein degree distribution:

[Plot: yeast S. cerevisiae; x-axis: connectivity k (1 to 50); y-axis: number of genes (logarithmic scale, 0.2 to 2000)]

A lot of proteins are poorly connected

Some proteins are highly connected = « hubs »

What does it mean in biology?
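A degree distribution like the one plotted can be tabulated directly from the graph. The sketch below is illustrative only: the toy network is invented, and the degree threshold used to call a protein a hub is an arbitrary assumption, since the slide gives no cut-off.

```python
from collections import Counter

def degree_distribution(network):
    """Return {k: number of proteins having degree k}."""
    return Counter(len(neighbours) for neighbours in network.values())

def hubs(network, min_degree):
    """Proteins whose degree reaches a chosen threshold (the threshold is an assumption)."""
    return sorted(p for p, neighbours in network.items() if len(neighbours) >= min_degree)

# Hypothetical toy network: one highly connected protein H and poorly connected partners.
network = {
    "H": {"a", "b", "c", "d", "e"},
    "a": {"H"},
    "b": {"H"},
    "c": {"H", "d"},
    "d": {"H", "c"},
    "e": {"H"},
}

distribution = degree_distribution(network)
for k in sorted(distribution):
    print(k, distribution[k])
print(hubs(network, min_degree=5))
```

Plotting the counts against k on logarithmic axes is the usual way to inspect whether the tail is heavy, as on the slide's plot.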

Page 48

…when the airline traffic network is organized as a protein-protein interaction network ???

Power-law distribution

WHAT DOES IT MEAN BIOLOGICALLY?

Page 49

‘EDGE BETWEENNESS’

Number of shortest paths going through an edge (computed between all node pairs)

[Figure: example graph with nodes a to j; edge betweenness values shown on the edges: 1, 8, 8, 3.5, 15, 3.5, 5.5, 5.5, 24, 7, 14, 2, 9]

Bio: centrality. Processes are connected by interactions of high betweenness.

(Can also be used to disconnect the graph)
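The definition above can be sketched for an unweighted, undirected graph with a breadth-first search from every node followed by a backward accumulation (a Brandes-style scheme, chosen here for illustration; the slide does not say which algorithm was used). The sketch assumes that when several shortest paths join a pair, each counts for an equal fraction, which is consistent with the half-integer values shown in the figure.

```python
from collections import deque

def edge_betweenness(network):
    """Edge betweenness of an undirected graph given as node -> set of neighbours."""
    betweenness = {frozenset((u, v)): 0.0 for u in network for v in network[u]}
    for source in network:
        # Breadth-first search: distances and number of shortest paths (sigma).
        dist = {source: 0}
        sigma = {source: 1.0}
        preds = {source: []}
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in network[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0.0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path shares from the farthest nodes back to the source.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                betweenness[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered node pair was visited from both of its ends.
    return {edge: value / 2.0 for edge, value in betweenness.items()}

# Hypothetical examples: a path a-b-c-d and a 4-node cycle.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
path_scores = edge_betweenness(path)
cycle_scores = edge_betweenness(cycle)
print(path_scores[frozenset(("b", "c"))], cycle_scores[frozenset(("a", "b"))])
```

In the path, the central edge carries the pairs (a,c), (a,d), (b,c) and (b,d), so it scores highest; removing the highest-scoring edge is what disconnects the graph.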

Page 50

K-CORE NOTION or how to peel the interactome… (1/3)

* Recursively remove vertices/proteins according to their number of neighbours

[Figure: vertices a, b, c, d, e are crossed out first]

a, b, c, d, e belong to core 1

Page 51

K-CORE NOTION or how to peel the interactome… (2/3)

* Recursively remove vertices/proteins according to their number of neighbours

[Figure: vertices f, g, h, i, j, k, l are crossed out next]

a, b, c, d, e belong to core 1
f, g, h, i, j, k, l belong to core 2…

Page 52

K-CORE NOTION or how to peel the interactome…(3/3)

Bio: Central proteins vs. peripheral proteins

Functional differences ?
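The peeling procedure of the three k-core slides can be sketched as follows. The function name and the toy graph are hypothetical; the core assigned to a protein is the value of k at which it is removed, which matches the slides' wording ("belong to core 1", "belong to core 2").

```python
def core_numbers(network):
    """Return {protein: k}, the peeling round (core) at which each protein is removed."""
    remaining = {p: set(neighbours) for p, neighbours in network.items()}
    core = {}
    k = 0
    while remaining:
        # Vertices with at most k remaining neighbours are peeled at level k.
        to_remove = [p for p, neighbours in remaining.items() if len(neighbours) <= k]
        if not to_remove:
            k += 1  # nothing left to peel at this level: move to the next core
            continue
        for p in to_remove:
            core[p] = k
            for q in remaining.pop(p):
                if q in remaining:
                    remaining[q].discard(p)  # removal lowers the neighbours' degree
    return core

# Hypothetical graph: a triangle f-g-h with a pendant protein a attached to f.
network = {
    "a": {"f"},
    "f": {"a", "g", "h"},
    "g": {"f", "h"},
    "h": {"f", "g"},
}
cores = core_numbers(network)
print(cores)
```

Peripheral proteins get low core values and central ones high values, which is the central versus peripheral contrast raised on the slide.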

Page 53

Potential difficulties:

It is often difficult to match a graph property/characteristic to a biological role/property…

…and the difficulty increases when the graph/interactome does not contain any spatio-temporal information

Towards data integration

Page 54

EXAMPLE OF DATA INTEGRATION (1/2)

• The p-p interaction network is a static view
• Not all interactions happen at the same time and in the same place!
• ‘Dynamic’ information: expression data from transcriptome experiments.

[Plot: gene expression (on / off / on) as a function of time]

Page 55

EXAMPLE OF DATA INTEGRATION (1/2)

• Different kinds of hubs [Han et al., Nature 2004]:

1st possibility: simultaneous interactions, « party hubs »
Intra-process role

2nd possibility: successive interactions, « date hubs »
Inter-process communication
[Figure labels: M phase of the cell cycle, S phase of the cell cycle]
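One way expression data could separate the two kinds of hubs is to average the co-expression between a hub and its partners: high when interactions are simultaneous, low when they are successive. The sketch below assumes, which the slide does not state, that co-expression is measured with a Pearson correlation over expression profiles; all profiles and names are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (undefined for constant profiles)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def mean_coexpression(hub, partners, expression):
    """Average correlation between a hub's profile and those of its partners."""
    return sum(pearson(expression[hub], expression[p]) for p in partners) / len(partners)

# Hypothetical expression profiles over four time points.
expression = {
    "party_hub": [1, 2, 3, 4],
    "p1": [2, 4, 6, 8],   # expressed together with the hub
    "p2": [1, 2, 3, 5],   # expressed together with the hub
    "date_hub": [1, 2, 3, 4],
    "d1": [4, 3, 2, 1],   # expressed at other times than the hub
    "d2": [1, 2, 3, 4],
}
party_score = mean_coexpression("party_hub", ["p1", "p2"], expression)
date_score = mean_coexpression("date_hub", ["d1", "d2"], expression)
print(round(party_score, 2), round(date_score, 2))
```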

Page 56

2- Functional module identification for function prediction and systems biology

classification and graph partitioning

POSSIBLE APPROACHES

Page 57

FUNCTION PREDICTION: 2 types of methods

from Sharan et al., Mol Syst Biol, 2007

Function prediction
Function prediction + Systems biology

Page 58

FUNCTION PREDICTION: direct method

Inference of the function of an uncharacterized protein by transfer of its neighbours' functions.

- majority rule
- functional flux
- ...
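The majority rule listed above can be sketched as a vote among annotated neighbours. This is a simplified illustration, not the published algorithm: ties are returned together, unannotated neighbours are ignored, and the protein names and annotations are invented.

```python
from collections import Counter

def majority_rule(network, annotations, protein):
    """Predict the function(s) most frequent among the annotated neighbours of a protein."""
    votes = Counter()
    for neighbour in network[protein]:
        votes.update(annotations.get(neighbour, ()))  # unannotated neighbours cast no vote
    if not votes:
        return None  # no annotated neighbour: no prediction
    best = max(votes.values())
    return sorted(function for function, count in votes.items() if count == best)

# Hypothetical uncharacterized protein X and its three annotated neighbours.
network = {"X": {"A", "B", "C"}, "A": {"X"}, "B": {"X"}, "C": {"X"}}
annotations = {
    "A": {"cell cycle"},
    "B": {"cell cycle", "transcription"},
    "C": {"signaling"},
}
prediction = majority_rule(network, annotations, "X")
print(prediction)
```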

Page 59

FUNCTION PREDICTION: module detection

Identification of groups of proteins. Inference of the group function.

- density
- distances
- edge-betweenness/betweenness cut
- optimisation of a criterion: modularity (higher number of internal edges than in a random partition of the graph with the same class cardinalities)
- ...

Page 60

EXAMPLE 1: IDENTIFICATION OF MODULES BASED ON EDGE DENSITY

• What is a dense zone?

• « Rigorous » definition: the maximal number of connections between N proteins is ½N(N-1). Density is defined as:

$$d = \frac{\text{number of connections}}{\text{maximal number of connections}}$$

Figure examples: not dense, d = 6/21 = 0.29; rather dense, d = 14/21 = 0.67
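The density definition is easy to check numerically. The sketch below uses the two cases of the slide (6 and 14 connections out of a maximum of 21, which corresponds to N = 7 proteins); the function names are hypothetical, and `subgraph_density` assumes the network is stored as a dict of neighbour sets.

```python
def density(n_proteins, n_connections):
    """d = number of connections / maximal number of connections, with max = N(N-1)/2."""
    max_connections = n_proteins * (n_proteins - 1) / 2
    return n_connections / max_connections

def subgraph_density(network, proteins):
    """Density of the sub-network induced by a group of proteins."""
    group = set(proteins)
    # Each internal edge is seen from both of its ends, hence the division by 2.
    n_connections = sum(len(network[p] & group) for p in group) // 2
    return density(len(group), n_connections)

sparse = density(7, 6)
dense = density(7, 14)
print(round(sparse, 2), round(dense, 2))

# Hypothetical triangle: every possible connection is present.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(subgraph_density(triangle, ["a", "b", "c"]))
```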

Page 61

FUNCTIONAL MODULES IDENTIFIED BASED ON EDGE DENSITY

Cell cycle regulation

Signaling pathway triggered by pheromones

Spirin & Mirny, PNAS 2003

Page 62

EXAMPLE 2: THE PRODISTIN METHOD, A FUNCTIONAL CLASSIFICATION BASED ON INTERACTIONS

1-

The Czekanowski-Dice distance (Dice, 1945)

3- A classification tree

Brun et al., Genome Biology, 2003; Baudot et al., Bioinformatics, 2006

4- Annotated classification tree

+ GO annotations

2-

Page 63

HYPOTHESIS: the more interactors proteins share, the more likely they are to be functionally related

PRINCIPLE: calculate a distance based on the number of interactors shared and unshared by protein pairs, reflecting their functional similarity

[Figure: example network with proteins A, B, C, D]

Page 64

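A POSSIBLE TRANSLATION OF THE BIOLOGICAL HYPOTHESIS: THE CZEKANOWSKI-DICE DISTANCE

$$D(X, Y) = \frac{|X \setminus (X \cap Y)| + |Y \setminus (X \cap Y)|}{|X \cup Y| + |X \cap Y|} = \frac{2 + 3}{8 + 3} = \frac{5}{11} \approx 0.45$$

where X and Y are the sets of interactors of the two proteins.

(Dice, Ecology, 1945)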
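The distance can be sketched with plain set operations. The example sets are invented so that they reproduce the slide's counts (2 and 3 unshared interactors, 3 shared, 8 in the union). Whether a protein is counted in its own interactor set is a convention the slide does not state, so it is left to the caller here.

```python
def czekanowski_dice(x, y):
    """Czekanowski-Dice distance between two interactor sets (undefined if both are empty)."""
    x, y = set(x), set(y)
    shared = x & y
    unshared = len(x - shared) + len(y - shared)
    return unshared / (len(x | y) + len(shared))

# Hypothetical interactor sets: 3 shared interactors, 2 specific to X, 3 specific to Y.
interactors_x = {"p1", "p2", "s1", "s2", "s3"}
interactors_y = {"q1", "q2", "q3", "s1", "s2", "s3"}
distance = czekanowski_dice(interactors_x, interactors_y)
print(round(distance, 2))
```

Identical sets give a distance of 0 and disjoint sets a distance of 1, so the value can be fed directly to a distance-based tree-building method.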

Page 65

1-

The Czekanowski-Dice distance (Dice, 1945)

3- A classification tree

Brun et al., Genome Biology, 2003; Baudot et al., Bioinformatics, 2006

4- Annotated classification tree

+ GO annotations

2-

EXAMPLE 2: THE PRODISTIN METHOD, A FUNCTIONAL CLASSIFICATION BASED ON INTERACTIONS

Page 66

FUNCTIONAL MODULES IDENTIFIED BY THE COMPUTATION OF A DISTANCE

[CLASS: 76]
CA sensory_organ_development, neuroblast_division, cytoskeleton_organization_and_biogenesis
P# 5
PN pros, mira, insc, numb, baz

[CLASS: 60]
CA myoblast_fusion
P# 7
PN sls, Mhc, Act88F, Actn, rols, mbc, Crk

Functional classes = groups of proteins involved in the same pathway, the same protein complex or the same cellular process through interactions

Page 67

PROTEIN FUNCTION PREDICTION: EXAMPLE OF THE 'DNA METABOLISM' and 'CELL CYCLE' CLASS

Pre-Replication Complex
Protein kinase Complex
Telomere Replication
Targets PRC to origin
Cell cycle control
Chromatin Structure
?? Telomere tethering to the nuclear periphery

POSSIBILITY OF FUNCTION INFERENCE

Page 68

PREDICTION OF CELLULAR FUNCTION

93 uncharacterised proteins

42 belong to PRODISTIN classes: a cellular function is predicted

- 2 new predictions (5%)
- 40 predicted by other bioinformatic methods or recently characterised experimentally:
  27 in agreement (64%)
  13 different (30%)

Brun et al., Genome Biol, 2003

Page 69

STATISTICAL EVALUATION OF CLASS QUALITY AND FUNCTIONAL PREDICTIONS (1/2)

- Is the protein clustering significant? Would it happen by chance?
Test on random networks of the same topology (‘reshuffling’): 15 classes instead of 64. Significant.

- What is the functional prediction quality?
Suppose that all members of a class perform the function assigned to the class; compare these predictions with known functions.
Success rate = # correctly predicted functions / # predictions: 67 % (vs 43 % for the Majority Rule algorithm)

- What is the class quality?
Class Robustness Index (CRI), based on tree topological criteria (bootstrap for a distance-based method): 0 < 0.96 < 1

Brun et al., Genome Biol, 2003

Page 70

- How does PRODISTIN deal with noise in interaction data? Is it robust to the presence of both spurious and missing interactions in the dataset?
Test on networks of different topologies (‘rewiring’). What is the prediction success rate?

STATISTICAL EVALUATION OF CLASS QUALITY AND FUNCTIONAL PREDICTIONS (2/2)

[Plot: number of proteins predicted (0 to 600) and prediction rate (0 to 1) as a function of % rewiring (0 to 50)]

Brun et al., Genome Biol, 2003
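The slide does not describe the rewiring procedure itself. As a rough sketch under that caveat, one simple way to inject both missing and spurious interactions is to drop a chosen fraction of the edges and replace each with a random edge absent from the original network. The function name, the seed handling and the toy edge list are all assumptions made for illustration.

```python
import random

def rewire(edges, fraction, seed=0):
    """Replace a fraction of the edges by random edges not present in the original graph.

    Assumes a sparse graph: enough absent node pairs must exist to draw from.
    """
    rng = random.Random(seed)
    original = {frozenset(edge) for edge in edges}
    nodes = sorted({protein for edge in original for protein in edge})
    n_rewired = round(len(original) * fraction)
    # Keep a random subset of true edges (the dropped ones become missing interactions).
    kept = rng.sample(sorted(original, key=sorted), len(original) - n_rewired)
    rewired = set(kept)
    # Add random absent edges (spurious interactions) until the edge count is restored.
    while len(rewired) < len(original):
        candidate = frozenset(rng.sample(nodes, 2))
        if candidate not in original:
            rewired.add(candidate)
    return rewired

# Hypothetical network of 6 proteins and 5 interactions, rewired at 40 %.
original_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
noisy_edges = rewire(original_edges, 0.4)
n_conserved = len(noisy_edges & {frozenset(edge) for edge in original_edges})
print(len(noisy_edges), n_conserved)
```

Running the classification on networks rewired at increasing fractions, and recording how many proteins are still predicted correctly, gives curves of the kind plotted above.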

Page 71

INTERACTOMES AND THE EVOLUTION OF THE FUNCTION OF THE DUPLICATED GENES

WHAT CAN WE LEARN ABOUT THE EVOLUTION OF THE FUNCTION OF THE DUPLICATED GENES WHEN STUDYING THE INTERACTOME WITH THE CLASSIFICATION METHOD ?

THE ANCESTRAL WHOLE GENOME DUPLICATION IN YEAST

UPDATE RESULTS 2004 2007

Page 72

• 100-150 million years ago

• After the Kluyveromyces waltii and S. cerevisiae divergence (Kellis et al., 2004)

• Followed by massive deletion events

• 16% of the present genome is formed by WGD paralog pairs, remnants of this duplication event: 457-460 pairs (Wolfe & Shields, 1997; Seoighe & Wolfe, 1999; Kellis et al., 2004)

THE ANCESTRAL WHOLE GENOME DUPLICATION IN YEAST

Kellis et al., 2004

Page 73

EVOLUTION OF THE NUMBER AND THE IDENTITY OF THE PARALOGS' INTERACTORS AFTER A GENOME DUPLICATION

t0 (duplication of A): A and A, 4 shared interactors
t1: A’ and A’’, 2 shared interactors, 2 specific interactors
t2: A’ and A’’’, 2 shared interactors, 4 specific interactors
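Counting shared and specific interactors for a paralog pair is a direct set comparison. The sketch below is illustrative: the network and protein names are invented to reproduce the t1 situation (2 shared, 2 specific interactors), and excluding the paralogs from each other's interactor lists is an assumption.

```python
def compare_interactors(network, paralog_1, paralog_2):
    """Return (number of shared interactors, number of specific interactors) for a pair."""
    # Assumption: a direct interaction between the two paralogs is not counted.
    first = network[paralog_1] - {paralog_2}
    second = network[paralog_2] - {paralog_1}
    shared = first & second
    specific = (first | second) - shared
    return len(shared), len(specific)

# Hypothetical paralogs A1 and A2 after some divergence.
network = {
    "A1": {"i1", "i2", "i3"},
    "A2": {"i1", "i2", "i4"},
}
n_shared, n_specific = compare_interactors(network, "A1", "A2")
print(n_shared, n_specific)
```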

Page 74

YEAST PPI NETWORK

                                          2004                2007
Interactions                              3991 (*)            17656
Proteins                                  2644 (~46% ORFs)    4773 (82.3% ORFs)
Mean degree                               3                   7.4
Paralog pairs in the classification tree  38 (8% paralogs)    172 (37% paralogs)

(*) Core data LS-Y2H + homemade literature curation
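The mean degrees quoted for the two networks follow from the interaction and protein counts, since each interaction contributes to the degree of two proteins. A quick check, with a hypothetical helper name:

```python
def mean_degree(n_interactions, n_proteins):
    """Mean degree of an undirected network: every edge adds 1 to the degree of 2 nodes."""
    return 2 * n_interactions / n_proteins

smaller_network = mean_degree(3991, 2644)
larger_network = mean_degree(17656, 4773)
print(round(smaller_network, 1), round(larger_network, 1))
```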

Page 75

Functional classification tree for yeast proteins (2004)

38 paralog pairs: how are they classified?

Both paralogs are in the same class

Page 76

Functional classification tree for yeast proteins (2004)

Paralogs are classified in different classes devoted to the same cellular function

Page 77

Functional classification tree for yeast proteins (2004)

Paralogs are classified in different classes devoted to different cellular functions

Page 78

THREE CLASSIFICATION BEHAVIOURS

                                      2004        2007
Same class, same function             26 (68%)    95 (55.3%)
Different class, same function        3 (8%)      13 (7.6%)
Different class, different function   9 (24%)     21 (12.2%)
(label not recovered)                             43 (25%)

• The majority of the WGD paralogs (68%) are in the same functional class: they share interactors.
• The majority of the WGD paralogs are involved in the same cellular function (76%).

Page 79

EVOLUTION OF CELLULAR FUNCTION: A SCALE OF FUNCTIONAL DIVERGENCE FOR DUPLICATED GENES BASED ON INTERACTION ANALYSIS

Functional divergence, from - to +:
Same class, same function: 68%
Different class, same function: 8%
Different class, different function: 24%

EVOLUTION BY NEO-FUNCTIONALIZATION (Ohno, 1970)
EVOLUTION BY SUB-FUNCTIONALIZATION (Force et al., 1999)

Baudot et al., 2004, Genome Biology, 5: R76

Page 80

CLASSIFICATION BEHAVIOURS AND SEQUENCE IDENTITY

• Sequence conservation does not necessarily imply function conservation: complex relationship between sequence identity and cellular function.
• The functional classification shows paralog properties not detectable by sequence analysis alone.

Figure legend: same class, same function; different class, same function; different class, different function; not classified

Page 81

INTERACTION NETWORK AND EVOLUTION OF THE FUNCTION OF THE DUPLICATED GENES: CONCLUSIONS

* A scale of functional divergence for the duplicated genes based on the interactions: distinct scenarios for the evolution of the cellular function of the duplicated genes.

* Differences between paralog pairs detectable neither by sequence analysis nor by functional annotation analysis: a novel type of information from the analysis of the interactions.

(Baudot et al., Genome Biology, 2004)

Update 2004-2007:

* Stability of the results with a 4 times larger interactome:

- a high-quality interactome, even small, gives reliable biological information

- robustness of the network analysis method to false negatives/positives (Brun et al., Genome Biology, 2003).

- changes in functional annotations (‘knowledge’ effect) may change the biological interpretation of results.