Posted on 22-Dec-2015
Part II: Protein interactions and networks
bork@embl.de, http://www.bork.embl-heidelberg.de/
Peer Bork, EMBL & MDC
Heidelberg & Berlin
Proteome analysis in silico
www.bork.embl-heidelberg.de
II. Protein network analysis
STRING: a framework for network analysis
Towards spatial and temporal network aspects
Building and destroying interaction networks
Genomic context analysis: Interaction predictions
Genomic context methods to predict protein interactions
Korbel et al., Nat. Biotechnol. 2004; Morett et al., Nat. Biotechnol. 2003;
Dandekar et al., TIBS 1998; Overbeek et al., PNAS 1999; Pellegrini et al., PNAS 1999;
Enright et al., Nature 1999; Marcotte et al., Science 1999
Prediction of analogous enzymes by anti-correlation of gene occurrences
Species: A B C D
Gene a:  - + - -
Gene b:  + - + -
Collaboration with Enrique Morett et al., Mexico
Application: thiamine-PP biosynthesis
Morett et al., Nature Biotech. 21 (2003) 790
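The anti-correlation idea can be sketched with binary presence/absence profiles. This is a minimal sketch, assuming profiles over the four species A-D read off the table above; the function names and the decision rule (the two genes never co-occur, each occurs somewhere, and together they cover at least two species) are hypothetical and not the published scoring.

```python
from typing import Sequence, Tuple

# Presence (1) / absence (0) over species A, B, C, D, read off the slide table.
PROFILES = {
    "gene_a": [0, 1, 0, 0],
    "gene_b": [1, 0, 1, 0],
}


def complementarity(p: Sequence[int], q: Sequence[int]) -> Tuple[int, int]:
    """Return (species with exactly one of the two genes, species with both)."""
    exactly_one = sum(1 for x, y in zip(p, q) if x != y)
    both = sum(1 for x, y in zip(p, q) if x and y)
    return exactly_one, both


def is_anticorrelated(p: Sequence[int], q: Sequence[int], min_species: int = 2) -> bool:
    """Hypothetical rule: never co-occur, each present somewhere, jointly in >= min_species."""
    exactly_one, both = complementarity(p, q)
    return both == 0 and any(p) and any(q) and exactly_one >= min_species


if __name__ == "__main__":
    a, b = PROFILES["gene_a"], PROFILES["gene_b"]
    print(complementarity(a, b))    # (3, 0)
    print(is_anticorrelated(a, b))  # True
```

Species D carries neither gene; under this rule that is simply read as D lacking the function. A real analysis would use many genomes and a statistical test rather than a fixed cut-off.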
Gene neighbourhood conservation at evolutionary time scales
Conservation of divergently transcribed gene pairs reveals functional constraints
The more conserved divergently transcribed neighboring genes are, the higher their level of co-expression
The resulting prediction method can reliably predict associations between >2500 pairs of genes, ca. 650 of which are supported by other methods
Korbel, Jensen, von Mering, Bork. Nat. Biotechnol. 2004, July
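What counts as a divergently transcribed pair, and how its conservation might be counted, can be made concrete with a short sketch. It assumes genes are given as (name, start, end, strand) on a linear map and that names already identify orthologs across genomes; the gene names and the simple "number of genomes" conservation measure are hypothetical, and the published method's scoring is not reproduced here.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (gene or orthologous-group name, start, end, strand) on a linear map; strand is "+" or "-".
Gene = Tuple[str, int, int, str]


def divergent_pairs(genes: Iterable[Gene]) -> List[Tuple[str, str]]:
    """Adjacent genes transcribed head-to-head, i.e. with their 5' ends facing each other.

    After sorting by start coordinate, a neighbouring pair (left, right) is divergent
    when the left gene is on the minus strand and the right gene on the plus strand.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    return [
        (left[0], right[0])
        for left, right in zip(ordered, ordered[1:])
        if left[3] == "-" and right[3] == "+"
    ]


def conservation_counts(genomes: Iterable[List[Gene]]) -> Counter:
    """Count in how many genomes each (unordered) divergent pair is found."""
    counts: Counter = Counter()
    for genes in genomes:
        for pair in {frozenset(p) for p in divergent_pairs(genes)}:
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    genome1 = [("regX", 100, 400, "-"), ("opX", 500, 900, "+"), ("geneY", 1000, 1300, "+")]
    # Same pair after an inversion: order and strands flip, but it is still head-to-head.
    genome2 = [("opX", 100, 500, "-"), ("regX", 600, 900, "+")]
    print(divergent_pairs(genome1))  # [('regX', 'opX')]
    print(conservation_counts([genome1, genome2])[frozenset({"regX", "opX"})])  # 2
```

Using unordered pairs makes the count robust to inversions of the region, which swap the left/right order of the two genes.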
Transcriptional regulators comprise the majority of conserved divergently transcribed gene pairs
They are all self-regulatory!
Coverage: homology vs. context
(80% accuracy level, taken from STRING COG mode)
Huynen, Snel, von Mering and Bork. Curr. Opin. Cell Biol. 15 (2003) 191
Building and destroying interaction networks
Three context methods to predict functional interactions
Phylogenetic Co-occurrence
Conserved Neighborhood
Gene fusion events
…allowing the study of networks
combined and quantified in STRING (Von Mering et al. NAR 31 (2003) 258)
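Of the three context methods, gene fusion is the easiest to sketch: two proteins are linked when, in another genome, both match non-overlapping regions of one longer protein. This is a minimal sketch, assuming homology hits (against other genomes only) are already computed as (target, start, end) intervals on the target; the protein names and the `max_overlap` tolerance are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# query protein -> list of (target protein in another genome, match start, match end)
Hits = Dict[str, List[Tuple[str, int, int]]]


def fusion_links(hits: Hits, max_overlap: int = 0) -> Set[FrozenSet[str]]:
    """Link two query proteins that hit the same target at (nearly) disjoint regions."""
    by_target: Dict[str, List[Tuple[str, int, int]]] = {}
    for query, matches in hits.items():
        for target, start, end in matches:
            by_target.setdefault(target, []).append((query, start, end))

    links: Set[FrozenSet[str]] = set()
    for segments in by_target.values():
        for (q1, s1, e1), (q2, s2, e2) in combinations(segments, 2):
            if q1 == q2:
                continue
            overlap = min(e1, e2) - max(s1, s2) + 1  # <= 0 means disjoint intervals
            if overlap <= max_overlap:
                links.add(frozenset((q1, q2)))
    return links


if __name__ == "__main__":
    hits = {
        "protA": [("fusedX", 1, 200)],
        "protB": [("fusedX", 210, 450)],
        "protC": [("fusedX", 150, 300)],  # overlaps both, so no link is inferred
    }
    print(fusion_links(hits))  # one link: protA-protB
```

Requiring disjoint regions is what separates a fusion ("Rosetta stone") signal from two proteins that merely share a domain.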
pathway representation
comparative genomics: functional modules
purine biosynthesis
histidine biosynthesis
Biochemical pathways vs functional modules
www.string.embl-heidelberg.de
Giant component of gene context network
The more conservation (red), the higher the number of connections
High local connectivity (c = 0.6); hence a lot of substructure
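The local connectivity value c is a clustering coefficient. As a sketch, assuming the usual definition averaged over all nodes, with nodes of degree below two counted as 0 (a convention the slide does not specify), it can be computed as follows; the toy graph is hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list (self-loops ignored)."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj: Dict[str, Set[str]], node: str) -> float:
    """Fraction of pairs of neighbours of `node` that are themselves connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention: nodes with fewer than two neighbours score 0
    linked = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * linked / (k * (k - 1))


def average_clustering(adj: Dict[str, Set[str]]) -> float:
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


if __name__ == "__main__":
    # Triangle a-b-c plus a pendant node d attached to a.
    adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])
    print(round(average_clustering(adj), 4))  # 0.5833
```

A value of 0.6 in a large sparse network is far above what random wiring would give, which is why it points to dense local substructure.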
unsupervised clustering
Coverage: >70%
Specificity: ca. 90% (Von Mering et al. PNAS 100 (2003) 15428)
- Functional assignment of >3000 hypothetical proteins
- 'Target' for transcription regulators, transporters etc.
- Pathway links (CoA and nucleotide biosynth.)
Biological discoveries
- Missing enzymes in known pathways
- Potentially novel pathways/processes/complexes
- Independent modules within known pathways
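The slides do not spell out the unsupervised clustering procedure. As a simplified stand-in, and explicitly not the method of the cited paper, functional modules can be approximated as connected components of the network after discarding links below a score threshold; the threshold, gene names and scores below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def modules_from_links(links: Iterable[Tuple[str, str, float]], threshold: float) -> List[Set[str]]:
    """Connected components of the graph formed by links with score >= threshold."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b, score in links:
        if score >= threshold:
            adj[a].add(b)
            adj[b].add(a)

    seen: Set[str] = set()
    modules: List[Set[str]] = []
    for start in list(adj):
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        modules.append(component)
    return modules


if __name__ == "__main__":
    links = [
        ("purA", "purB", 0.95), ("purB", "purC", 0.90),
        ("hisA", "hisB", 0.92), ("purC", "hisA", 0.30),  # weak link, dropped
    ]
    for module in modules_from_links(links, threshold=0.8):
        print(sorted(module))
```

Genes whose links all fall below the threshold end up in no module, and this single-linkage style grouping tends to chain unrelated groups together at low thresholds, which is why dedicated clustering is normally used.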
STRING annotations
Figure: STRING annotation examples.
Known (query protein: known transcriptional regulator PyrR): uracil permease, pyrimidine biosynthesis, uncharacterized
Novel (query protein: putative transcriptional regulator, uncharacterized): riboflavin biosynthesis, uncharacterized response regulator
Doerks et al. TIG, 2004
Synergies between homology and context based methods
Functional Categories (COG):
Information processing: translation, transcription, DNA
Cellular processes: transport, motility, signalling
Metabolism: anabolism, catabolism, energy
Unassigned/uncharacterized, or multiple assignments
Example module: YeaG, YeaH, YcgB, YfbU (YeaH: predicted integrin I domain; YeaG: predicted ATPase domain)
Functional modules in E. coli
About 650 modules predicted (120 metabolic)
About 140 modules dominated by 'hypotheticals'
(Only modules with >3 nodes shown)
Building and destroying interaction networks
STRING: a framework for network analysis
Functional associations between proteins: 80,000 from large-scale approaches in yeast
yeast two-hybrid (Uetz et al., 2000; Ito et al., 2000, 2001): 5125
complex purification (analysis by mass spectrometry): 18027 (TAP), 33014 (HMS-PCI)
mRNA synexpression (cell-cycle + Rosetta data): ~15000
in silico predictions (neighborhood, fusion, co-occurrence): ~7000 (~9000 new)
genetic interactions (synthetic lethality, Tong et al. 2001 + MIPS): 886
small-scale interactions (MIPS + YPD): ~11000
Binary interactions vs. groups of interacting proteins
Figure: septin complex example with LPD1, ARC1, CDC3, CDC10, SHS1, CIN2, CDC12, CDC11, SPR28 and GIN4. Legend: TAP purification; two-hybrid interaction; HMS-PCI purification; annotated member of septin complex.
Counting functional associations:
Distribution of interacting proteins (TAP complexes)
Figure: category-by-category matrix; axes E G M P T B F O A R D C U, i.e. in order: energy production, amino acid metabolism, other metabolism, translation, transcription, transcriptional control, protein fate, cellular organization, transport and sensing, stress and defense, genome maintenance, cellular fate/organization, uncharacterized
Color scale: interaction density (actual interactions per 1000 possible pairs), 0 to 10
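The density scale (actual interactions per 1000 possible pairs) can be computed for each pair of categories. This is a sketch, assuming each protein has exactly one category and that the number of possible pairs is n*(n-1)/2 within a category and n_i*n_j between two categories; the protein names are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement
from typing import Dict, Iterable, Tuple


def interaction_density(
    category_of: Dict[str, str], interactions: Iterable[Tuple[str, str]]
) -> Dict[Tuple[str, str], float]:
    """Interactions per 1000 possible protein pairs for every pair of categories.

    Assumes `interactions` holds unique, unordered pairs without self-interactions.
    """
    sizes = Counter(category_of.values())
    observed: Counter = Counter()
    for a, b in interactions:
        observed[tuple(sorted((category_of[a], category_of[b])))] += 1

    density = {}
    for c1, c2 in combinations_with_replacement(sorted(sizes), 2):
        if c1 == c2:
            possible = sizes[c1] * (sizes[c1] - 1) // 2  # pairs within one category
        else:
            possible = sizes[c1] * sizes[c2]  # pairs across two categories
        density[(c1, c2)] = 1000.0 * observed[(c1, c2)] / possible if possible else 0.0
    return density


if __name__ == "__main__":
    categories = {"p1": "E", "p2": "E", "p3": "T", "p4": "T"}
    print(interaction_density(categories, [("p1", "p2"), ("p1", "p3")]))
    # {('E', 'E'): 1000.0, ('E', 'T'): 250.0, ('T', 'T'): 0.0}
```

Normalising by possible pairs matters because large categories would otherwise look interaction-rich simply by having more members.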
Reference interactions
Figure: category-by-category interaction matrices (axes E G M P T B F O A R D C U) for two reference sets:
manually annotated protein complexes (MIPS / YPD): 10907 interactions
high-throughput interaction data, overlap of 2+ methods: 2455 interactions
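The "overlap of 2+ methods" reference set amounts to counting, for each unordered protein pair, how many independent datasets report it. A minimal sketch with hypothetical dataset names and proteins:

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Datasets = Dict[str, Iterable[Tuple[str, str]]]


def support_counts(datasets: Datasets) -> Counter:
    """Number of datasets (methods) reporting each unordered protein pair."""
    counts: Counter = Counter()
    for pairs in datasets.values():
        for pair in {frozenset(p) for p in pairs}:  # count each pair once per method
            counts[pair] += 1
    return counts


def supported_by_at_least(datasets: Datasets, k: int) -> Set[FrozenSet[str]]:
    """Pairs reported by at least k different datasets."""
    return {pair for pair, n in support_counts(datasets).items() if n >= k}


if __name__ == "__main__":
    datasets = {
        "tap": [("A", "B"), ("B", "C")],
        "y2h": [("B", "A"), ("C", "D")],
        "synexpr": [("C", "D"), ("A", "E")],
    }
    print(len(supported_by_at_least(datasets, 2)))  # 2 (A-B and C-D)
```

Pairs are stored as frozensets so that (A, B) and (B, A) from different methods are recognised as the same interaction.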
Protein interaction datasets
Figure: category-by-category interaction matrices (axes E G M P T B F O A R D C U), one per dataset:
purified complexes (TAP): 18027 interactions
purified complexes (HMS-PCI): 33014 interactions
genomic associations: 7446 interactions
synthetic lethals: 886 interactions
yeast two-hybrid: 5125 interactions
mRNA synexpression: 16496 interactions
Benchmarking high-throughput interaction data
Figure: coverage vs. accuracy for each dataset.
Accuracy: fraction of data confirmed by reference set (%; log scale, 0.1 to 100)
Coverage: fraction of reference set covered by data (%; log scale, 0.1 to 100)
Datasets: purified complexes (TAP), purified complexes (HMS-PCI), yeast two-hybrid, mRNA synexpression, genomic associations, synthetic lethality, combined evidence (two methods, three methods); raw data, filtered data and parameter choices are marked
(update to 89 species)
A probabilistic approach for function prediction
Von Mering, C., Krause, R., Snel, B., Oliver, S.G., Fields, S. and Bork, P. Nature 417 (2002) 399
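The two benchmark axes, accuracy and coverage, translate directly into set operations on unordered protein pairs. This is a sketch, assuming both the dataset and the reference set are plain pair lists; it omits refinements a real benchmark would need, such as scoring only pairs whose proteins the reference set actually covers. Protein names are hypothetical.

```python
from typing import Iterable, Tuple


def benchmark(
    data_pairs: Iterable[Tuple[str, str]], reference_pairs: Iterable[Tuple[str, str]]
) -> Tuple[float, float]:
    """Return (accuracy %, coverage %) of a dataset against a reference set.

    accuracy: fraction of data pairs confirmed by the reference set
    coverage: fraction of reference pairs found in the data
    """
    data = {frozenset(p) for p in data_pairs}
    reference = {frozenset(p) for p in reference_pairs}
    confirmed = len(data & reference)
    accuracy = 100.0 * confirmed / len(data) if data else 0.0
    coverage = 100.0 * confirmed / len(reference) if reference else 0.0
    return accuracy, coverage


if __name__ == "__main__":
    data = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
    reference = [("B", "A"), ("C", "D"), ("E", "F"), ("F", "G"), ("G", "H")]
    print(benchmark(data, reference))  # (50.0, 40.0)
```

Filtering a dataset typically moves its point toward higher accuracy and lower coverage, which is the trade-off between raw and filtered data.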
Please show me the functional context of these proteins.
Figure: STRING: known and predicted functional links for ATP1 (ATP synthase) and QCR2 (ubiquinol-cytochrome c reductase).
Evidence types: high-throughput experiments, literature co-occurrence, phylogenetic profiles, conserved neighborhood, known pathways/complexes, co-expression
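STRING integrates these evidence types into one score per protein pair. A common way to do this, shown here only as a sketch assuming calibrated per-channel probabilities S_i and independent channels, is S = 1 - Π(1 - S_i); the actual STRING scores may involve additional calibration that is not reproduced here, and the example scores are hypothetical.

```python
from math import prod  # Python 3.8+
from typing import Iterable


def combined_score(channel_scores: Iterable[float]) -> float:
    """Probability that at least one evidence channel is right, assuming independence."""
    scores = list(channel_scores)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("channel scores must be probabilities in [0, 1]")
    return 1.0 - prod(1.0 - s for s in scores)


if __name__ == "__main__":
    # Hypothetical scores, e.g. co-expression 0.5 and conserved neighborhood 0.5.
    print(combined_score([0.5, 0.5]))  # 0.75
```

Two moderate, independent pieces of evidence thus yield a higher combined confidence than either alone, which is the point of integrating channels.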
STRING: a framework for network analysis
Towards spatial and temporal network aspects
EMBL's Structural and Computational Biology unit: From molecules to organisms
Figure: levels of study from protein/DNA, complex, subcellular structure and cell to organism, with the approaches used (NMR, X-ray, EM, 3D tomography, cell biology, gene expression, developmental biology), subcellular compartments (endosomes, peroxisomes, mitochondria, Golgi, ER, microtubules, nucleus), computational biology, synchrotrons and core facilities. In red: other EMBL units.
Characterise the domains
Predict which subunits interact
Build assembly
TAP (Cellzome)
x300=
(exosome case study: Aloy et al., EMBO Rep, 2002)
From interactions to 3D protein complexes:
Large-scale modeling and EM mapping
Side-chain to side-chain
Side-chain to main-chain
Interface of 3D structure of interaction
Parameters
Have we seen any of these domains interacting before?
Rules, constraints
Compatible? EM screen
Aloy, P., Boettcher, B., Ceulemans, H., Leutwein, C., Mellwig, C., Fischer, S., Gavin, A.-C., Bork, P., Superti-Furga, G., Serrano, L. and Russell, R.B. Science 303 (2004) 2026
Structure-based assembly of protein complexes
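The question "Have we seen any of these domains interacting before?" can be sketched as a lookup: two subunits are candidate contacts if any of their domains form a pair already observed interacting in a known 3D structure. This is a minimal sketch, assuming domain assignments and a library of interacting domain pairs are given; subunit and domain names are hypothetical, and the later steps (interface parameters, EM compatibility) are not covered.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def predict_subunit_contacts(
    domains_of: Dict[str, List[str]], known_domain_pairs: Iterable[Tuple[str, str]]
) -> Dict[Tuple[str, str], List[Tuple[str, str]]]:
    """Predict which subunits may touch, based on domain pairs already seen interacting in 3D.

    Returns {(subunit1, subunit2): [supporting (domain1, domain2) pairs]}.
    """
    # Unordered domain pairs; a homotypic pair such as (PH, PH) becomes frozenset({"PH"}).
    known: Set[FrozenSet[str]] = {frozenset(p) for p in known_domain_pairs}
    predictions = {}
    for s1, s2 in combinations(sorted(domains_of), 2):
        evidence = sorted(
            (d1, d2)
            for d1 in domains_of[s1]
            for d2 in domains_of[s2]
            if frozenset((d1, d2)) in known
        )
        if evidence:
            predictions[(s1, s2)] = evidence
    return predictions


if __name__ == "__main__":
    subunits = {"Sub1": ["DomA", "DomB"], "Sub2": ["DomC"], "Sub3": ["DomD"]}
    structures = [("DomB", "DomC"), ("DomX", "DomY")]
    print(predict_subunit_contacts(subunits, structures))
    # {('Sub1', 'Sub2'): [('DomB', 'DomC')]}
```

Keeping the supporting domain pairs alongside each prediction makes it possible to later check the corresponding interface against structural constraints.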
From functional associations to three-dimensional assemblies
Analysis of 101 yeast complexes and their interactions
3D
Dynamic complex formation during the 90 min yeast cell cycle
Multiple arrays reveal 600 periodically expressed genes
Projection to interaction data identifies novel assemblies
Details on the time-dependent formation of some assemblies revealed
Some unknown proteins detected in well-studied cell cycle assemblies
Color: periodically expressed proteins
4D
Lichtenberg, Larsen et al.
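Calling a gene "periodically expressed" requires some periodicity score on its time course. As a generic illustration only, and not the scoring used by Lichtenberg, Larsen et al., the amplitude of the component at the 90 min cycle period can be obtained by projecting the mean-centred series onto a sine and a cosine; the time points and expression values are hypothetical.

```python
import math
from typing import Sequence


def periodic_amplitude(times: Sequence[float], values: Sequence[float], period: float) -> float:
    """Amplitude of the sinusoidal component with the given period (simple Fourier projection).

    Exact for evenly spaced samples covering whole periods; the series is mean-centred first.
    """
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    omega = 2.0 * math.pi / period
    a = 2.0 / n * sum(x * math.cos(omega * t) for t, x in zip(times, centred))
    b = 2.0 / n * sum(x * math.sin(omega * t) for t, x in zip(times, centred))
    return math.hypot(a, b)


if __name__ == "__main__":
    times = list(range(0, 90, 10))  # one 90 min cycle sampled every 10 min (9 points)
    cycling = [2.0 + 3.0 * math.sin(2 * math.pi * t / 90) for t in times]
    flat = [2.0] * len(times)
    print(round(periodic_amplitude(times, cycling, 90), 6))  # 3.0
    print(periodic_amplitude(times, flat, 90))  # 0.0
```

With unevenly spaced or noisy samples, this projection is only approximate, so a real analysis would compare the score against a background distribution before calling a gene periodic.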
Losses/Gains of Functional Associations
M. pneumoniae & M. genitalium
M. pneumoniae only
(Linked by conserved neighborhood or fused proteins,
combined score >0.95)
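The losses/gains comparison reduces to set differences between two networks once links are expressed in species-independent terms. This is a sketch, assuming proteins are mapped to shared orthologous-group identifiers (the OG names are hypothetical) and using the slide's cut-off of a combined score above 0.95.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Link = Tuple[str, str, float]  # (orthologous group 1, orthologous group 2, combined score)


def confident_links(links: Iterable[Link], min_score: float) -> Set[FrozenSet[str]]:
    """Unordered links whose combined score is strictly above min_score."""
    return {frozenset((a, b)) for a, b, score in links if score > min_score}


def compare_networks(links_1: Iterable[Link], links_2: Iterable[Link], min_score: float = 0.95):
    """Split confident links into (shared, only in species 1, only in species 2)."""
    net1 = confident_links(links_1, min_score)
    net2 = confident_links(links_2, min_score)
    return net1 & net2, net1 - net2, net2 - net1


if __name__ == "__main__":
    species_1 = [("OG1", "OG2", 0.99), ("OG3", "OG4", 0.97), ("OG5", "OG6", 0.90)]
    species_2 = [("OG2", "OG1", 0.98), ("OG7", "OG8", 0.96)]
    shared, only_1, only_2 = compare_networks(species_1, species_2)
    print(len(shared), len(only_1), len(only_2))  # 1 1 1
```

A link missing from one species can mean a real loss or just a weaker signal there, so differences near the cut-off deserve a second look.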
urease enzyme complex
ABC-type phosphate transport system (incl. regulator)
fructose-specific phosphotransferase system (plus assoc. enzymes)
ribose/xylose sugar-transport
glycerol metabolism
Figure: presence (+) of the module genes in M. pneumoniae, M. pulmonis and U. parvum
Differential analysis
Comparison of the interaction networks in three mollicutes
TCA cycle
Modification of functional modules at evolutionary time scales
Huynen et al. TIM 1999
Summary (network analysis)
Gene context methods already reach ca. 90% specificity / 70% sensitivity in predicting functional modules in prokaryotes
Gene context and other concepts for interaction prediction not only complement homology approaches, but are about to offer more functional information than BLAST et al.
In eukaryotes, accurate prediction of networks and modules is still difficult, and heterogeneous experimental data have to be integrated
Spatial and temporal aspects of protein networks have great potential, although data are still limited
Credits
Context methods, functional modules, STRING, networks in 3D, network in 4D
Ulrik de Lichtenberg, Soren Brunak (CBS)
Christos Ouzounis et al. (EBI)
Rob Russell, Patrick Aloy, Bettina Boettcher (EMBL), Cellzome AG
Berend Snel, Martijn Huynen (Nejm)
Enrique Morett et al. (Mex)
+ all other group members + many experimental collaborators
Chicken international sequencing and analysis consortium