
Post on 22-Dec-2015


Part II
Protein interactions and networks

[email protected]
http://www.bork.embl-heidelberg.de/

Peer Bork
EMBL & MDC
Heidelberg & Berlin

Proteome analysis in silico

www.bork.embl-heidelberg.de

II. Protein network analysis

- Genomic context analysis: Interaction predictions
- Building and destroying interaction networks
- STRING: a framework for network analysis
- Towards spatial and temporal network aspects

Genomic context analysis: Interaction predictions

Genomic context methods to predict protein interactions

- Dandekar et al., TIBS 98
- Overbeek et al., PNAS 99
- Pellegrini et al., PNAS 99
- Enright et al., Nature 99
- Marcotte et al., Science 99
- Morett et al., Nat. Biotechn. 03
- Korbel et al., Nat. Biotechn. 04

Prediction of analogous enzymes by anti-correlation of gene occurrences

Species   A  B  C  D
Gene a    -  +  -  -
Gene b    +  -  +  -

Collaboration with Enrique Morett et al., Mexico

Application: thiamine-PP biosynthesis

Morett et al., Nature Biotech. 21(03)790
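The anti-correlation idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the profiles mirror the toy species table (1 = gene present), and the scoring function, its name and the cutoff are hypothetical choices, not the published method of Morett et al.

```python
from itertools import combinations

# Toy presence/absence profiles across species A-D (1 = gene present).
# Values follow the species table on this slide; gene names are placeholders.
profiles = {
    "gene_a": (0, 1, 0, 0),
    "gene_b": (1, 0, 1, 0),
}

def anticorrelation(p, q):
    """Fraction of species in which exactly one of the two genes occurs,
    counted only over species that carry at least one of them."""
    informative = [(x, y) for x, y in zip(p, q) if x or y]
    if not informative:
        return 0.0
    exclusive = sum(1 for x, y in informative if x != y)
    return exclusive / len(informative)

def anticorrelated_pairs(profiles, threshold=1.0):
    """Gene pairs whose score reaches the (assumed) threshold."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if anticorrelation(profiles[g1], profiles[g2]) >= threshold
    ]

pairs = anticorrelated_pairs(profiles)
```

Pairs that never co-occur but together cover the species carrying the pathway are candidates for analogous enzymes; a real analysis would also need to correct for phylogenetic relatedness between the species.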

Gene neighbourhood conservation at evolutionary time scales

Conservation of divergently transcribed gene pairs reveals functional constraints

The more conserved divergently transcribed neighboring genes are, the higher is their level of co-expression

The resulting prediction method can reliably predict associations between >2500 pairs of genes; ca 650 of which are supported by other methods

Korbel, Jensen, von Mering, Bork, Nat. Biotechnol. 2004, July
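A minimal sketch of how conserved divergently transcribed pairs could be counted. It assumes each genome is given as a list of (orthologous-group label, start position, strand) tuples for a single chromosome; the function names and toy genomes are hypothetical, and the scoring used by Korbel et al. is not reproduced here.

```python
from collections import Counter

def divergent_pairs(genes):
    """genes: list of (name, start, strand) on one chromosome.
    Returns adjacent pairs transcribed away from each other:
    the upstream gene on the '-' strand, the downstream gene on '+'."""
    ordered = sorted(genes, key=lambda g: g[1])
    return [
        (left[0], right[0])
        for left, right in zip(ordered, ordered[1:])
        if left[2] == "-" and right[2] == "+"
    ]

def conservation(genomes):
    """Count in how many genomes each divergent pair is found.
    Pairs are stored as frozensets so that an inverted locus still matches."""
    counts = Counter()
    for genes in genomes.values():
        counts.update({frozenset(p) for p in divergent_pairs(genes)})
    return counts

# Hypothetical toy genomes; names stand for orthologous groups.
genomes = {
    "sp1": [("regX", 100, "-"), ("opY", 900, "+"), ("geneZ", 2000, "+")],
    "sp2": [("opY", 50, "-"), ("regX", 700, "+")],
    "sp3": [("regX", 10, "+"), ("opY", 800, "+")],
}
counts = conservation(genomes)
```

In this toy input the regX/opY pair is divergent in two of the three genomes; pairs seen in many distantly related genomes would be the ones taken as functionally associated.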

Transcriptional regulators comprise the majority of conserved divergently transcribed gene pairs

They are all self-regulatory!

Coverage: Homology vs. context

(80% accuracy level, taken from STRING COG mode)

Huynen, Snel, von Mering and Bork. Curr. Opin. Cell Biol. 15(03)191


Building and destroying interaction networks

Three context methods to predict functional interactions

- Phylogenetic co-occurrence
- Conserved neighborhood
- Gene fusion events

…allowing the study of networks

Combined and quantified in STRING: Von Mering et al. NAR 31(03)258
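The slide does not say how the three evidence types are combined. One common way to merge independent confidence scores is a noisy-OR, shown below as a sketch; the independence assumption, the function name and the example scores are illustrative and should not be read as the exact STRING scoring scheme.

```python
from math import prod

def combined_score(scores):
    """Noisy-OR combination of per-method confidence scores in [0, 1].
    Assumes the evidence types are independent (an assumption made here)."""
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    return 1.0 - prod(1.0 - s for s in scores)

# Hypothetical scores for one gene pair: co-occurrence, neighborhood, fusion.
example = combined_score([0.5, 0.5, 0.0])
```

With this rule a pair supported weakly by several methods can outrank a pair supported moderately by one, which is the practical point of combining the context methods.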

Biochemical pathways vs functional modules

[Figure: purine biosynthesis and histidine biosynthesis, shown as a pathway representation and as functional modules from comparative genomics]

www.string.embl-heidelberg.de

Giant component of gene context network

The more conservation (red), the higher the number of connections

High local connectivity (c = 0.6); hence a lot of substructure
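The slide does not define c; a plausible reading is the average local clustering coefficient of the network. The sketch below computes that quantity for a small undirected graph. The edge list and helper names are hypothetical.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_clustering(adj, node):
    """Fraction of neighbor pairs of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Toy network: a triangle a-b-c with one pendant node d attached to c.
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
c = average_clustering(adj)
```

A value well above what a random graph of the same density would give indicates tightly knit neighborhoods, which is what the slide means by substructure.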

Biochemical pathways vs functional modules: unsupervised clustering

Coverage: >70%

Specificity: ca 90%

Von Mering et al. PNAS 100 (2003) 15428

Biological discoveries

- Functional assignment of >3000 hypothetical proteins
- ‘Target’ for transcription regulators, transporters etc.
- Pathway links (CoA and nucleotide biosynth.)
- Missing enzymes in known pathways
- Potentially novel pathways/processes/complexes
- Independent modules within known pathways

Synergies between homology and context based methods

STRING annotations

[Figure, panel "known": Query protein: known transcriptional regulator PyrR. Labels: uracil permease, pyrimidine biosynthesis, uncharacterized]

[Figure, panel "novel": Query protein: putative transcriptional regulator, uncharacterized. Labels: riboflavin biosynthesis, uncharacterized response regulator]

Doerks et al. TIG, 2004


Functional modules in E. coli

About 650 modules predicted (120 metabolic)

About 140 modules dominated by ‘hypotheticals’

(Only modules with >3 nodes shown)

Functional Categories (COG):

- Information Processing: Translation, Transcription, DNA
- Cellular Processes: Transport, Motility, Signalling
- Metabolism: Anabolism, Catabolism, Energy
- Unassigned/Uncharacterized, or multiple assignments

[Figure: example module with YeaG, YeaH, YcgB and YfbU. YeaH: predicted integrin I domain. YeaG: predicted ATPase domain]


STRING: a framework for network analysis

Functional associations between proteins: 80,000 from large-scale approaches in yeast

- yeast two-hybrid (Uetz et al., 2000; Ito et al., 2000, 2001): 5125
- complex purification (analysis by mass spectrometry): 18027 (TAP), 33014 (HMS-PCI)
- mRNA synexpression (cell-cycle + Rosetta data): ~15000
- in silico predictions (neighborhood, fusion, co-occurrence): ~7000 (~9000 new)
- genetic interactions (synthetic lethality, Tong et al. 2001 + MIPS): 886
- small-scale interactions (MIPS + YPD): ~11000

Binary interactions vs. groups of interacting proteins

[Figure: network of LPD1, ARC1, CDC3, CDC10, SHS1, CIN2, CDC12, CDC11, SPR28 and GIN4. Legend: TAP purification, HMS-PCI purification, two-hybrid interaction, annotated member of septin complex]

Counting functional associations: Distribution of interacting proteins (TAP complexes)

[Figure: matrix of functional categories against each other, shaded by interaction density (actual interactions per 1000 possible pairs; scale 0 to 10)]

Category codes, in axis order:

- E: energy production
- G: aminoacid metabolism
- M: other metabolism
- P: translation
- T: transcription
- B: transcriptional control
- F: protein fate
- O: cellular organization
- A: transport and sensing
- R: stress and defense
- D: genome maintenance
- C: cellular fate/organization
- U: uncharacterized
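The interaction density (actual interactions per 1000 possible pairs) for one cell of such a category matrix can be sketched as below. The protein names, category assignments and function name are hypothetical; the input is assumed to list each unordered interaction once and to contain no self-interactions.

```python
def interaction_density(interactions, category_of, cat_x, cat_y):
    """Actual interactions per 1000 possible protein pairs between two
    functional categories (or within one category if cat_x == cat_y)."""
    in_x = {p for p, c in category_of.items() if c == cat_x}
    in_y = {p for p, c in category_of.items() if c == cat_y}
    if cat_x == cat_y:
        possible = len(in_x) * (len(in_x) - 1) // 2
    else:
        possible = len(in_x) * len(in_y)
    if possible == 0:
        return 0.0
    actual = sum(
        1
        for a, b in interactions
        if (a in in_x and b in in_y) or (a in in_y and b in in_x)
    )
    return 1000.0 * actual / possible

# Hypothetical toy data: two proteins in category E, two in category T.
category_of = {"p1": "E", "p2": "E", "p3": "T", "p4": "T"}
interactions = [("p1", "p2"), ("p1", "p3")]

density_ee = interaction_density(interactions, category_of, "E", "E")
density_et = interaction_density(interactions, category_of, "E", "T")
```

Normalising by the number of possible pairs is what makes cells comparable across categories of very different size.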

Reference interactions

[Figure: two category matrices (axes E G M P T B F O A R D C U)]

- manually annotated protein complexes (MIPS / YPD): 10907 interactions
- high-throughput interaction data, overlap of 2+ methods: 2455 interactions

Protein interaction datasets

[Figure: six category matrices (axes E G M P T B F O A R D C U), one per data set]

- purified complexes (TAP): 18027 interactions
- purified complexes (HMS-PCI): 33014 interactions
- genomic associations: 7446 interactions
- synthetic lethals: 886 interactions
- yeast two-hybrid: 5125 interactions
- mRNA synexpression: 16496 interactions

Benchmarking high-throughput interaction data

[Figure: coverage (fraction of reference set covered by data, %; log scale) plotted against accuracy (fraction of data confirmed by reference set, %; log scale), both axes from 0.1 to 100. Data sets shown: purified complexes TAP, purified complexes HMS-PCI, yeast two-hybrid, mRNA synexpression, genomic associations, synthetic lethality, two methods, three methods, combined evidence. Legend: raw data, filtered data, parameter choices]

Von Mering, C., Krause, R., Snel, B., Oliver, S.G., Fields, S. and Bork, P. Nature 417 (2002) 399

A probabilistic approach for function prediction (update to 89 species)
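The two axes of the benchmark, accuracy (fraction of data confirmed by the reference set) and coverage (fraction of the reference set covered by the data), can be computed as in this sketch. The protein names are hypothetical, and refinements of the published benchmark, such as restricting the comparison to proteins present in the reference set, are not modelled.

```python
def normalise(pairs):
    """Unordered, de-duplicated protein pairs without self-interactions."""
    return {frozenset(p) for p in pairs if p[0] != p[1]}

def benchmark(predicted, reference):
    """Return (accuracy %, coverage %) of a data set against a reference set."""
    pred = normalise(predicted)
    ref = normalise(reference)
    overlap = pred & ref
    accuracy = 100.0 * len(overlap) / len(pred) if pred else 0.0
    coverage = 100.0 * len(overlap) / len(ref) if ref else 0.0
    return accuracy, coverage

# Hypothetical toy data; ("B", "A") duplicates ("A", "B") and is collapsed.
predicted = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "D")]
reference = [("A", "B"), ("D", "C"), ("E", "F"), ("G", "H")]
accuracy, coverage = benchmark(predicted, reference)
```

Intersecting several data sets before calling `benchmark` would move a point towards higher accuracy and lower coverage, which is the trade-off the plot displays for "two methods" and "three methods".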

STRING: known and predicted functional links

Please show me the functional context of these proteins: ATP1, QCR2

[Figure: STRING network views for the queries ATP1 and QCR2; clusters labelled ATP synthase and Ubiquinol-Cyt. C reductase]

Evidence types:

- Conserved Neighborhood
- Phylogenetic Profiles
- Co-expression
- High-throughput Experiments
- Known Pathways/Complexes
- Literature Co-occurrence


Towards spatial and temporal network aspects

EMBL’s Structural and Computational Biology unit: From molecules to organisms

[Figure: levels of organisation from Protein/DNA through Complex, Subcellular structure and Cell to Organism, with the methods and units covering them: NMR, X-ray, EM, 3D tomography, Computational Biology, Cell Biology, Gene expression, Developmental Biology, Synchrotrons, Core facilities. Cell diagram labels: Nucleus, ER, Golgi, Mitochondria, Endosomes, Peroxisomes, Microtubules]

In red: other EMBL units

From interactions to 3D protein complexes: Large scale modeling and EM mapping

- TAP (Cellzome)
- Characterise the domains
- Predict which subunits interact
- Build assembly

x300

(exosome case study: Aloy et al., EMBO Rep, 2002)

Structure-based assembly of protein complexes

From functional associations to three dimensional assemblies (3D)

Analysis of 101 yeast complexes and their interactions

Have we seen any of these domains interacting before?

- Interface of 3D structure of interaction
- Parameters: side-chain to side-chain, side-chain to main-chain
- Rules, constraints
- Compatible? EM screen

Aloy, P., Boettcher B., Ceulemans, H., Leutwein, C., Mellwig, C., Fischer, S., Gavin, A.-C., Bork, P., Superti-Furga, G., Serrano, L. and Russell, R.B. Science 303 (2004) 2026

Dynamic complex formation during the 90 min yeast cell cycle (4D)

- Multiple arrays reveal 600 periodically expressed genes
- Projection to interaction data identifies novel assemblies
- Details on the time dependent formation in some assemblies revealed
- Some unknown proteins detected in well-studied cell cycle assemblies

Color: periodically expressed proteins

Lichtenberg, Larsen et al.


Losses/Gains of Functional Associations

[Figure legend: M. pneumoniae & M. genitalium; M. pneumoniae only]

(Linked by conserved neighborhood or fused proteins, combined score >0.95)
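A minimal sketch of this kind of differential comparison: keep only links above the 0.95 combined-score cutoff in each species, then take set intersections and differences. It assumes links are keyed by orthologous-group labels so that they are comparable across genomes; all identifiers and scores below are hypothetical.

```python
def confident_links(scored_links, cutoff=0.95):
    """Unordered links whose combined score exceeds the cutoff."""
    return {
        frozenset(pair)
        for pair, score in scored_links.items()
        if score > cutoff
    }

def compare_networks(links_a, links_b, cutoff=0.95):
    """Split confident links into shared ones and species-specific ones."""
    a = confident_links(links_a, cutoff)
    b = confident_links(links_b, cutoff)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}

# Hypothetical scored links for two species, keyed by orthologous groups.
species_a = {("og1", "og2"): 0.99, ("og2", "og3"): 0.97, ("og4", "og5"): 0.90}
species_b = {("og2", "og1"): 0.98, ("og4", "og5"): 0.99}
result = compare_networks(species_a, species_b)
```

Links that fall into `only_a` or `only_b` are the candidate losses or gains of functional associations; whether the underlying genes are absent or merely rearranged has to be checked separately.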


Comparison of the interaction networks in three mollicutes

Differential analysis

[Table: gene present (+) for M. pneumoniae, M. pulmonis and U. parvum; the per-species marks are not recoverable from the transcript. Rows:]

- urease enzyme complex
- ABC-type phosphate transport system (incl. regulator)
- fructose-specific phosphotransferase system (plus assoc. enzymes)
- ribose/xylose sugar-transport
- glycerol metabolism

Modification of functional modules at evolutionary time scales

TCA cycle

Huynen et al. TIM 1999

Summary (network analysis)

- Gene context methods already have ca 90% specificity/70% sensitivity in predicting functional modules in prokaryotes
- Gene context and other concepts for interaction predictions not only complement homology approaches, but are about to offer more functional information than BLAST et al.
- In eukaryotes, accurate prediction of networks and modules is still difficult, and heterogeneous experimental data have to be integrated
- Spatial and temporal aspects of protein networks have great potential, although data are still limited

Credits

Context methods, Functional modules, STRING, Networks in 3D, Network in 4D

- Ulrik de Lichtenberg, Soren Brunak (CBS)
- Christos Ouzounis et al. (EBI)
- Rob Russell, Patrick Aloy, Bettina Boettcher (EMBL)
- Cellzome AG
- Berend Snel, Martijn Huynen (Nejm)
- Enrique Morett et al. (Mex)
- Chicken international sequencing and analysis consortium

+ all other group members + many experim. collaborators