3-D Structural Analysis of Protein Interaction Networks Gives New Insight Into Protein Function, Network Topology and Evolution
Philip M. Kim, Ph.D., Yale University
GCB 2006, Tuebingen, Germany, September 21st, 2006
060921_GCB2006_Talk_PMK
2
MOTIVATION
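ILLUSTRATIVE
[Figure: two hubs A, each with partners B1-B4; one forms a Cdk/cyclin complex, the other is part of the RNA-pol complex. Network perspective: the two look the same (=). Structural biology perspective: they are different (≠).]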
There remains a rich source of knowledge unmined by network theorists!
3
OUTLINE
Interaction Networks and their properties
Network properties revisited
A 3-D structural point of view
Conclusions
4
OUTLINE
Interaction Networks and their properties
Network properties revisited
A 3-D structural point of view
Conclusions
5
PROTEIN INTERACTION NETWORKS IN YEAST
Source: Gavin et al. Nature (2002), Uetz et al. Nature (2000), Cytoscape and DIP
• Determined by:
– Large-scale yeast two-hybrid
– TAP-Tagging
– Literature curation
• Currently over 20,000 unique interactions available in yeast
• Spawned a field of computational “graph theory” analyses that view proteins as “nodes” and interactions as “edges”
[Figure: a snapshot of the current interactome] Description and methodologies
ILLUSTRATIVE
DIP (Database of Interacting Proteins)
6
TINY GLOSSARY: DEGREE AND HUBS
A: Degree = 5; C: Degree = 1
A is a “Hub”*
*The definition of hubs is somewhat arbitrary; usually a degree cutoff is used
Source: PMK
Topology is dominated by hubs!
(“Scale-free”)
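The degree and hub definitions above can be sketched in a few lines of Python. This assumes the interactome is available as a list of (protein, protein) pairs; the helper names and the hub cutoff of 5 are hypothetical illustrations, not the cutoff used in this talk.

```python
from collections import Counter


def degrees(edges):
    """Return the degree (number of distinct partners) of every protein.

    Interactions are treated as undirected, so A-B and B-A count once.
    """
    unique = {tuple(sorted(pair)) for pair in edges}
    deg = Counter()
    for a, b in unique:
        deg[a] += 1
        deg[b] += 1
    return deg


def hubs(deg, cutoff=5):
    """Proteins whose degree reaches an (arbitrary, assumed) hub cutoff."""
    return {protein for protein, k in deg.items() if k >= cutoff}


# Toy network mirroring the glossary: A has 5 partners, C has 1.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("A", "F"), ("B", "A")]
deg = degrees(edges)
print(deg["A"], deg["C"])  # 5 1
print(hubs(deg))           # {'A'}
```

Self-interactions (homodimers) would add 2 to a protein's degree here and would need separate handling in a real dataset.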
7
HUBS TEND TO BE IMPORTANT PROTEINS, THEY ARE MORE LIKELY TO BE ESSENTIAL PROTEINS AND TEND TO BE MORE CONSERVED
Source: Jeong et al. Nature (2001), Yu et al. TiG (2004) and Fraser et al. Science (2002)
• By now it is well documented that proteins with a large degree tend to be essential proteins in yeast.
(“Hubs are essential”)
• Likewise, it has been found that hubs tend to evolve more slowly than other proteins
(“Hubs are slower evolving”)
There is some controversy regarding this relationship
8
THERE IS A RELATIONSHIP BETWEEN NETWORK TOPOLOGY AND GENE EXPRESSION DYNAMICS
Source: Han et al. Nature (2004) and Yu*, Kim* et al. (Submitted)
[Figure: histogram of co-expression correlation (x-axis) against frequency (y-axis)]
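One way to quantify the co-expression behind a plot like this is the average Pearson correlation between a protein's expression profile and those of its interaction partners (Han et al. split hubs using an average correlation of this kind). The sketch below assumes expression profiles are equal-length lists of values per protein; `pearson` and `mean_partner_coexpression` are hypothetical helper names and the profiles are toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined for constant profiles


def mean_partner_coexpression(protein, partners, profiles):
    """Average expression correlation between a protein and its partners."""
    values = [pearson(profiles[protein], profiles[p]) for p in partners]
    return sum(values) / len(values)


# Hypothetical expression profiles over four conditions.
profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],  # perfectly co-expressed with A
    "C": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with A
}
print(mean_partner_coexpression("A", ["B"], profiles))       # 1.0
print(mean_partner_coexpression("A", ["B", "C"], profiles))  # 0.0
```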
9
SCALE FREENESS GENERALLY EVOLVES THROUGH PREFERENTIAL ATTACHMENT (THE RICH GET RICHER)
Source: Albert et al. Rev. Mod. Phys. (2002) and Middendorf et al. PNAS (2005)
• Theoretical work shows that a mechanism of preferential attachment leads to a scale-free topology
(“The rich get richer”)
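The "rich get richer" mechanism can be sketched as a growth process in which each new node attaches to an existing node with probability proportional to that node's degree. This is a minimal Barabási–Albert-style sketch with one new edge per node; the function name and parameters are hypothetical, and it only illustrates the mechanism rather than modeling the yeast interactome.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, seed=0):
    """Grow a network in which every new node links to one existing node,
    chosen with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]    # start from a single interacting pair
    endpoints = [0, 1]  # a node appears here once per edge it touches
    for new in range(2, n_nodes):
        # A uniform choice over edge endpoints is a degree-proportional choice.
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges


edges = preferential_attachment(1000)
deg = Counter(node for edge in edges for node in edge)
print(len(edges), max(deg.values()))
```

Tabulating `deg` usually shows a handful of nodes with many partners and a large majority with only one or two, the heavy-tailed pattern described above.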
The Duplication-Mutation Model: Description
ILLUSTRATIVE
• In interaction networks, gene duplication followed by mutation of the duplicated gene is generally thought to lead to preferential attachment
• Simple reasoning: The partners of a hub are more likely to be duplicated than the partners of a non-hub
Gene duplication
The interaction partners of A are more likely to be duplicated
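The duplication-mutation model can be sketched as follows: pick a random gene, duplicate it so the copy inherits the original's interactions, then let mutation remove each inherited interaction independently. A protein with k partners gains a new partner whenever one of those k partners is duplicated and the copy keeps the link, so high-degree proteins tend to grow faster, which is the preferential-attachment effect described above. The parameter `loss_prob`, the seed network and the function name are hypothetical; published variants of the model differ in details (for example, whether the copy also links to its parent).

```python
import random


def duplication_mutation(n_nodes, loss_prob=0.5, seed=0):
    """Grow an interaction network by gene duplication followed by mutation.

    Returns an adjacency dict: protein -> set of interaction partners.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # seed network: one interacting pair
    for new in range(2, n_nodes):
        parent = rng.randrange(new)  # gene chosen for duplication
        # The duplicate starts with the parent's partners; mutation then
        # deletes each inherited interaction with probability loss_prob.
        kept = {p for p in sorted(adj[parent]) if rng.random() >= loss_prob}
        adj[new] = kept
        for partner in kept:
            adj[partner].add(new)  # keep the adjacency symmetric
    return adj


net = duplication_mutation(500, loss_prob=0.4)
print(max(len(partners) for partners in net.values()))
```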
10
OUTLINE
Interaction Networks and their properties
Network properties revisited
A 3-D structural point of view
Conclusions
11
THERE IS A PROBLEM WITH SCALE-FREENESS AND REALLY BIG HUBS IN INTERACTION NETWORKS
Source: DIP, Institut fuer Festkoerperchemie (Univ. Tuebingen)
A really big hub (>200 Interactions)
Gedankenexperiment
What is the maximum number of neighbors a protein can have?
• Clearly, a protein is very unlikely to have >200 simultaneous interactors.
• Some of the >200 are most likely false positives
• Some others are going to be mutually exclusive interactors (i.e. binding to the same interface).
Conclusion
• There appears to be an obvious discrepancy between >200 and 12 (the maximum number of equal-sized spheres that can touch a central sphere).
ILLUSTRATIVE
Wouldn’t it be great to be able to see the different binding interfaces?
12
UTILIZING PROTEIN CRYSTAL STRUCTURES, WE CAN DISTINGUISH THE DIFFERENT BINDING INTERFACES
*Many redundant structures
Source: PMK
ILLUSTRATIVE
• Interactome (~20000 interactions): use a high-confidence filter; map Pfam domains to all proteins in the interactome
• PDB (~10000 structures of interactions*): homology mapping of Pfam domains to all structures of interactions; combine with all structures of yeast protein complexes
• Annotate interactions with available structures; discard all others
• Distinguish interfaces
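The annotation step amounts to keeping an interaction only if the two proteins carry a pair of Pfam domains seen in contact in some structure. A minimal sketch, assuming the domain assignments and the set of structurally observed domain pairs (for example, derived from iPfam) are already available as Python sets; the function and variable names, and every domain name other than WD40, are hypothetical. The homology-mapping step itself is not reproduced.

```python
def annotate_interactions(interactions, domains, structural_pairs, ignore=frozenset()):
    """Keep only interactions explained by a domain-domain contact seen in a structure.

    interactions     -- iterable of (protein_a, protein_b)
    domains          -- dict: protein -> set of Pfam domain names
    structural_pairs -- set of frozenset({dom1, dom2}) observed in contact
                        (frozenset({dom}) for a homotypic contact)
    ignore           -- domains to skip, e.g. ubiquitous ones
    """
    annotated = {}
    for a, b in interactions:
        hits = {
            (da, db)
            for da in domains.get(a, set()) - ignore
            for db in domains.get(b, set()) - ignore
            if frozenset((da, db)) in structural_pairs
        }
        if hits:  # interactions without structural support are discarded
            annotated[(a, b)] = hits
    return annotated


domains = {"P1": {"DomX", "WD40"}, "P2": {"DomY"}, "P3": {"WD40"}, "P4": {"DomZ"}}
structural_pairs = {frozenset({"DomX", "DomY"}), frozenset({"WD40"})}
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]
print(annotate_interactions(interactions, domains, structural_pairs, ignore={"WD40"}))
# {('P1', 'P2'): {('DomX', 'DomY')}}
```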
13
THIS IS WHAT THE RESULTING NETWORK LOOKS LIKE
Source: PDB, Pfam, iPfam and PMK
• Represents a “very high confidence” network
• Total of 873 nodes and 1269 interactions, each of which is structurally characterized
• 438 interactions are classified as mutually exclusive and 831 as simultaneously possible
• While much smaller than DIP, it is similar in size to other high-confidence datasets
The Structural Interaction Dataset (SID): Properties
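The distinction between mutually exclusive and simultaneously possible interactions can be made concrete for a single hub: two interactions that use the same interface on the hub cannot occur at once, whereas interactions at different interfaces can. A minimal sketch, assuming each interaction has already been assigned a hub interface (for example, the Pfam domain found in contact in a structure); the names are hypothetical, and this labels pairs of interactions rather than reproducing the exact procedure behind the 438/831 counts.

```python
from itertools import combinations


def classify_interaction_pairs(partner_interface):
    """Label every pair of a hub's interactions.

    partner_interface maps partner -> interface on the hub used to bind it.
    """
    labels = {}
    for (p1, i1), (p2, i2) in combinations(sorted(partner_interface.items()), 2):
        same = i1 == i2
        labels[(p1, p2)] = "mutually exclusive" if same else "simultaneously possible"
    return labels


def max_simultaneous_partners(partner_interface):
    """Upper bound on partners bound at once, assuming one partner per interface."""
    return len(set(partner_interface.values()))


hub = {"B1": "iface1", "B2": "iface1", "B3": "iface2"}  # hypothetical assignments
print(classify_interaction_pairs(hub))
print(max_simultaneous_partners(hub))  # 2
```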
14
OUTLINE
Interaction Networks and their properties
Network properties revisited
A 3-D structural point of view
Conclusions
15
THERE DO NOT APPEAR TO BE ANY REALLY BIG HUBS OF THE KIND SEEN BEFORE – IS THE TOPOLOGY STILL SCALE-FREE?
Source: PMK
• With the maximum number of interactions at 13, there are no “really big hubs” in this network
• Note that in other high-confidence datasets (of similar size), there are still proteins with a much higher degree
• The degree distribution appears to top out much earlier and to be less scale-free than that of other networks
Degree distribution: Properties
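A degree distribution is the fraction of proteins with each degree k; for log-log plots the complementary cumulative form P(K ≥ k) is often used because it is less noisy in the tail. A sketch, assuming degrees are given as a protein-to-degree mapping; the function names and toy degrees are hypothetical.

```python
from collections import Counter


def degree_distribution(deg):
    """p(k): fraction of proteins that have exactly k interaction partners."""
    counts = Counter(deg.values())
    n = len(deg)
    return {k: counts[k] / n for k in sorted(counts)}


def complementary_cdf(deg):
    """P(K >= k): fraction of proteins with at least k partners."""
    values = list(deg.values())
    n = len(values)
    return {k: sum(v >= k for v in values) / n for k in sorted(set(values))}


deg = {"A": 5, "B": 1, "C": 1, "D": 1, "E": 2}  # toy degrees
print(degree_distribution(deg))  # {1: 0.6, 2: 0.2, 5: 0.2}
print(complementary_cdf(deg))    # {1: 1.0, 2: 0.4, 5: 0.2}
```

A network whose largest degree is small shows up on such a plot as a tail that ends early.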
16
[Bar chart: percentage of essential proteins for all proteins in the entire genome, all proteins in our dataset, single-interface hubs only and multi-interface hubs only; values shown: 64.9%, 31.8%, 32.3%, 15.1%]
IT’S REALLY ONLY THE MULTI-INTERFACE HUBS THAT ARE SIGNIFICANTLY MORE LIKELY TO BE ESSENTIAL
Source: PMK
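The comparison on this slide can be sketched as follows: split hubs by their number of distinct interfaces, then compute the fraction of essential proteins in each group. The hub cutoff, the helper names and the toy data are hypothetical; the cutoff used in the talk is not stated here.

```python
def split_hubs(deg, n_interfaces, cutoff=5):
    """Split hubs (degree >= cutoff) into single- and multi-interface hubs."""
    hubs = [p for p, k in deg.items() if k >= cutoff]
    single = {p for p in hubs if n_interfaces.get(p, 0) == 1}
    multi = {p for p in hubs if n_interfaces.get(p, 0) >= 2}
    return single, multi


def essential_fraction(proteins, essential):
    """Fraction of the given proteins that are in the essential set."""
    proteins = list(proteins)
    if not proteins:
        return float("nan")
    return sum(p in essential for p in proteins) / len(proteins)


deg = {"H1": 6, "H2": 7, "H3": 5, "X": 1}
n_interfaces = {"H1": 1, "H2": 3, "H3": 2, "X": 1}
essential = {"H2", "X"}
single, multi = split_hubs(deg, n_interfaces)
print(essential_fraction(single, essential))  # 0.0
print(essential_fraction(multi, essential))   # 0.5
```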
17
[Figure: expression correlation for all proteins in our dataset, single-interface hubs only and multi-interface hubs only (values shown: 0.2, 0.17, 0.25), with a histogram of expression correlation against frequency]
DATE-HUBS AND PARTY-HUBS ARE REALLY SINGLE-INTERFACE AND MULTI-INTERFACE HUBS
Source: Han et al. Nature (2004) and PMK
18
AND ONLY MULTI-INTERFACE PROTEINS EVOLVE MORE SLOWLY; SINGLE-INTERFACE HUBS DO NOT
[Bar chart: evolutionary rate (dN/dS) for all proteins in the entire genome, all proteins in our dataset, single-interface hubs only and multi-interface hubs only; values shown: 0.029, 0.077, 0.047, 0.051]
Source: PMK
19
IN FACT, EVOLUTIONARY RATE CORRELATES BEST WITH THE FRACTION OF AVAILABLE SURFACE AREA INVOLVED IN INTERFACES
Source: PMK
DATA IN BINS
Small portion of surface area involved in interfaces – fast evolving
Large portion of surface area involved in interfaces – slow evolving
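The binned relationship on this slide can be sketched as follows: for each protein, take the fraction of its surface area involved in interfaces, sort proteins by that fraction, average the evolutionary rate (dN/dS) within equal-count bins, and summarize the overall trend with a rank correlation. The talk does not say which statistic or binning was used; Spearman correlation, equal-count bins and all names here are assumptions, and the numbers are toy values.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            result[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def binned_means(x, y, n_bins):
    """Sort points by x, cut into n_bins near-equal groups, average each group."""
    points = sorted(zip(x, y))
    size = len(points) / n_bins
    bins = []
    for b in range(n_bins):
        chunk = points[round(b * size):round((b + 1) * size)]
        if chunk:
            bins.append((sum(p[0] for p in chunk) / len(chunk),
                         sum(p[1] for p in chunk) / len(chunk)))
    return bins


interface_fraction = [0.1, 0.2, 0.3, 0.4]  # toy values
dn_ds = [0.08, 0.06, 0.05, 0.02]           # toy values
print(spearman(interface_fraction, dn_ds))  # -1.0
print(binned_means(interface_fraction, dn_ds, n_bins=2))
```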
20
IS THERE A DIFFERENCE BETWEEN SINGLE-INTERFACE HUBS AND MULTI-INTERFACE HUBS WITH RESPECT TO NETWORK EVOLUTION?
Source: PMK
The Duplication-Mutation Model
Gene duplication
The interaction partners of A are more likely to be duplicated
From the structural viewpoint:
If these models were correct, there would be an enrichment of paralogs among B
21
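[Bar chart: fraction of paralogs between pairs of proteins for random pairs, pairs with the same partner, pairs with the same partner at a different interface, and pairs with the same partner at the same interface; values shown: 0.15%, 0.07%, 0.003% (axis starting at 0.00%)]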
MULTI-INTERFACE HUBS DO NOT APPEAR TO EVOLVE BY GENE DUPLICATION – THE DUPLICATION-MUTATION MODEL CAN ONLY EXPLAIN THE EXISTENCE OF SINGLE-INTERFACE HUBS
Source: PMK
But that also means that the duplication-mutation model cannot explain the full current interaction network!
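The test on this slide rests on the idea that if a hub's partners arose by duplication, paralogous partners should bind the same interface on the hub. A minimal sketch of the comparison, assuming per-hub partner-to-interface assignments and a set of paralog pairs are given; all names are hypothetical, the "random pair" baseline here is computed exhaustively over all protein pairs (a real proteome would call for sampling), and empty categories return 0.0.

```python
from itertools import combinations


def paralog_fractions(partner_interface, paralogs, proteins):
    """Fraction of protein pairs that are paralogs, by pair category.

    partner_interface: hub -> {partner: interface on the hub used to bind it}
    paralogs: set of frozenset({p, q}) pairs considered paralogous
    proteins: all proteins, used for the all-pairs baseline
    """
    def fraction(pairs):
        pairs = list(pairs)
        if not pairs:
            return 0.0
        return sum(frozenset(pair) in paralogs for pair in pairs) / len(pairs)

    same_iface, diff_iface = set(), set()
    for partners in partner_interface.values():
        for (b1, i1), (b2, i2) in combinations(sorted(partners.items()), 2):
            target = same_iface if i1 == i2 else diff_iface
            target.add(frozenset((b1, b2)))

    return {
        "random pair": fraction(combinations(sorted(proteins), 2)),
        "same partner": fraction(same_iface | diff_iface),
        "same partner, different interface": fraction(diff_iface),
        "same partner, same interface": fraction(same_iface),
    }


partner_interface = {"A": {"B1": "iface1", "B2": "iface1", "B3": "iface2"}}
paralogs = {frozenset({"B1", "B2"})}
print(paralog_fractions(partner_interface, paralogs, {"A", "B1", "B2", "B3"}))
```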
22
OUTLINE
Interaction Networks and their properties
Network properties revisited
A 3-D structural point of view
Conclusions
23
CONCLUSIONS
• The topology of a direct physical interaction network is much less dominated by hubs than previously thought
• Several genomic features that were previously thought to be correlated with the degree are in fact related to the number of interfaces and not the degree
• Specifically, a protein’s evolutionary rate appears to depend on the fraction of surface area involved in interactions rather than on the degree
• The current network growth model can only explain a part of currently known networks
PRELIMINARY
Source: PMK
24
ACKNOWLEDGEMENTS
Mark Gerstein
Long Jason Lu
Yu Brandon Xia
The Gerstein Lab, in particular:
Jan Korbel
Joel Rozowsky
Tom Royce
25
BACKUP
26
INTERESTING PROPERTIES OF INTERACTION NETWORKS
Source: Various, see following slides
Network topology
Network Evolution
Relationship of topology and genomic features
Examples of studies
• What distribution does the degree (number of interaction partners) follow?
• What is the relationship between the degree and a protein’s essentiality?
• Is there a relationship between a protein’s connectivity and expression profile?
• What is the relationship between a protein’s evolutionary rate and its degree?
• How did the observed network topology evolve?
OVERVIEW
27
INTERACTION NETWORKS ARE SCALE-FREE – THEIR TOPOLOGY IS DOMINATED BY SO-CALLED HUBS
Source: Barabasi, A. and Albert, R., Science (1999)
• So-called scale-free topology has been observed in many kinds of networks (among them interaction networks)
• Scale-freeness: a small number of highly connected nodes (hubs) and a large number of poorly connected ones (“power-law behavior”)
• Topology is dominated by “hubs”
• Scale-freeness is in stark contrast to a normal (Gaussian) distribution
$p(k) \sim k^{-\gamma}$
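Estimating the exponent of a power law $p(k) \sim k^{-\gamma}$ by fitting a straight line to a log-log histogram is common but can be biased. One widely used alternative is the approximate maximum-likelihood estimator for discrete data from Clauset, Shalizi and Newman (2009), $\hat{\gamma} \approx 1 + n\left[\sum_{i=1}^{n} \ln\frac{k_i}{k_{\min} - 1/2}\right]^{-1}$ over the $n$ degrees $k_i \ge k_{\min}$. The sketch below implements only that formula; choosing $k_{\min}$ and testing whether a power law fits at all are separate steps not shown, and this is not necessarily the method behind the results in this talk. The function name is hypothetical.

```python
import math


def powerlaw_exponent(degrees, k_min=1):
    """Approximate discrete MLE of gamma in p(k) ~ k^(-gamma), using k >= k_min."""
    if k_min < 1:
        raise ValueError("k_min must be at least 1")
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1 + len(tail) / log_sum


print(powerlaw_exponent([1, 1, 2, 4]))
```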
28
• But the “Yes” side appears to be winning
… OR ARE THEY? THERE IS AN ONGOING DEBATE ABOUT THE RELATIONSHIP BETWEEN EVOLUTIONARY RATE AND DEGREE
Source: See text
Yes, hubs are more conserved
• Fraser et al. Science (2002)
• Fraser et al. BMC Evol. Biol. (2003)
• Wuchty Genome Res. (2004)
• Jordan et al. Genome Res. (2002)
• Hahn et al. J. Mol. Evol. (2004)
• Jordan et al. BMC Evol. Biol. (2003)
No, the relationship is unclear
?
EXAMPLES
• Fraser Nature Genetics (2005)
29
SHORT DIGRESSION: THIS ALLOWS US TO DISTINGUISH SYSTEMATICALLY BETWEEN SIMULTANEOUSLY POSSIBLE AND MUTUALLY EXCLUSIVE INTERACTIONS
Simultaneously possible interactions
Mutually exclusive interactions
Source: PMK
30
[Bar charts comparing mutually exclusive and simultaneously possible interactions on four measures: fraction with the same biological process, fraction with the same molecular function, co-expression correlation, and fraction with the same cellular component (p << 0.001 for each); values shown: 0.24, 0.14, 0.33, 0.18, 0.23, 0.17, 0.27, 0.12]
SIMULTANEOUSLY POSSIBLE INTERACTIONS (“PERMANENT”) MORE OFTEN LINK PROTEINS THAT ARE FUNCTIONALLY SIMILAR, COEXPRESSED AND CO-LOCATED
Source: PMK
31
REMEMBER THE NETWORK PROPERTIES WE DESCRIBED BEFORE?
Source: Various, see following slides
Network topology
Network Evolution
Relationship of topology and genomic features
Examples of studies
• What distribution does the degree (number of interaction partners) follow?
• Does the network easily separate into more than one component?
• What is the relationship between the degree and a protein’s essentiality?
• Is there a relationship between a protein’s connectivity and expression profile?
• What is the relationship between a protein’s evolutionary rate and its degree?
• How did the observed network topology evolve?
OVERVIEW
32
• But the “Yes” side appears to be winning
… OR ARE THEY? THERE IS AN ONGOING DEBATE ABOUT THE RELATIONSHIP BETWEEN EVOLUTIONARY RATE AND DEGREE
Source: See text
Yes, hubs are more conserved
• Fraser et al. Science (2002)
• Fraser et al. BMC Evol. Biol. (2003)
• Wuchty Genome Res. (2004)
• Jordan et al. Genome Res. (2002)
• Hahn et al. J. Mol. Evol. (2004)
• Jordan et al. BMC Evol. Biol. (2003)
No, the relationship is unclear
?
This debate may have arisen because the two different sides were all looking at the wrong variable!
33
OUT
34
UTILIZING PROTEIN CRYSTAL STRUCTURES, WE CAN DISTINGUISH THE DIFFERENT BINDING INTERFACES
Source: PMK
Combine with all structures of yeast protein complexes
• Start with high-confidence interactome dataset
• Collected dimer and multimer structures and mapped Pfam domains onto the corresponding proteins
• Removed ubiquitous domains (e.g., WD40)
• All interactions that contain Pfam domains found to interact in a crystal structure are annotated with this structural information (all others are removed)
• Dataset: ~1269 interactions (combined with all structures that were from yeast).
ILLUSTRATIVE
Pfam -- Homology
Explain methodology…
36
SOME NETWORK STATISTICS – SCALE FREENESS?
Source: PMK
• In the Pfam dataset, the vast majority (570 out of 790) of the proteins (even hubs) have only one distinct interface.
• 220 proteins (~28%) have 2 or more interfaces.
• Most hubs are mediated by promiscuous interfaces rather than by many interfaces (~2.6 interactions per interface)
[Table: max degree, max interfaces, avg. degree and avg. interfaces for 161 nodes (degree > 5) and 220 nodes (number of interfaces > 1); values shown: 6.0, 1.4, 3.5, 19]
37
UTILIZING PROTEIN CRYSTAL STRUCTURES, WE CAN DISTINGUISH THE DIFFERENT BINDING INTERFACES
Source: PMK
ILLUSTRATIVE
PDB
Interactome
38
CLIQUES, K-PLEXES AND K-CORES IN SOCIAL NETWORKS
…
Source:…
• …
…
• …
…
• …
…
39
AUTOMORPHIC EQUIVALENCE
Source: …
• …
… …
40
NETWORKS IN MANAGEMENT SCIENCE - THE FIELD OF ORGANIZATION THEORY
… …
41
DECISION MAKING IN ORGANIZATIONS: DECENTRALIZATION OF CERTAIN ISSUES
… …
42
GROWING ORGANIZATIONS NEED TO DEPARTMENTALIZE
Source: …
• …
… …
43
DOES SIZE MATTER?
Source: …
• …
… …
44
ENVIRONMENTAL EFFECTS ON ORGANIZATIONAL STRUCTURE
* …
Source: …
…
• …
…
• …
…
…
…
• …
• …
• …
45
• …
FIVE DIFFERENT ORGANIZATIONAL CONFIGURATIONS
* …
Source: …
…
…
…
…
…
• …
• …
• …
• …
• …
• …
• …
• …
…
• …
• …
• …
• …
…