Modularity in Biological networks
Hypothesis: Biological functions are carried out by discrete functional modules.
Hartwell, L. H., Hopfield, J. J., Leibler, S., & Murray, A. W., Nature, 1999.
Question: Is modularity a myth, or a structural property of biological networks? (Are biological networks fundamentally modular?)
Modularity in Cellular Networks
Traditional view of modularity:
Modularity in cell biology
Definition of a module
• Loosely linked islands of densely connected nodes
• Groups of co-expressed genes
Concept of modules in a network
Computational analysis of modular structures
Data clustering approach
Concept of data clustering analysis
• Partitioning a data set into groups so that points in one group are similar to each other and are as different as possible from the points in other groups.
• The validity of a clustering is often in the eye of the beholder.
• To say whether two data points are similar, we need to define a similarity measure.
• We also need a score function that expresses our objective.
• A clustering algorithm can then be used to partition the data set so that the score function is optimized.
Types of clustering algorithms
• Partition-based clustering algorithms
• Hierarchical clustering algorithms
• Probabilistic model-based clustering algorithms
Partitioning problem
• Given a network of n nodes D={x(1),x(2),…,x(n)}, our task is to find K clusters C={C1,C2,…,CK} such that each node x(i) is assigned to a unique cluster Ck and the score function S(C1,C2,…,CK) is optimized.
Community structure of biological network
[Figure: network partitioned into Community 1, Community 2, and Community 3]
Score function for network clustering
• Maximize the number of intra-group connections and minimize the number of inter-group connections.
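As a minimal sketch of such a score, the Python function below counts intra-group and inter-group edges for a given assignment of nodes to groups. The function name, the edge list, and the group labels are hypothetical toy data chosen for illustration; they do not come from the slides.

```python
def intra_inter_edges(edges, cluster_of):
    """Return (number of intra-group edges, number of inter-group edges)."""
    intra = sum(1 for u, v in edges if cluster_of[u] == cluster_of[v])
    return intra, len(edges) - intra


# Hypothetical toy network: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
cluster_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

intra, inter = intra_inter_edges(edges, cluster_of)
print(intra, inter)
```

A good partition makes the first count large and the second small; comparing the two counts for different assignments is one simple way to rank them.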
Spectral analysis clustering algorithm
Adjacency Matrix
• Aij = 1 if the ith protein interacts with the jth protein
• Aij = 0 otherwise
• Aij = Aji (undirected graph)
• A is a sparse matrix: most elements Aij are zero
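The adjacency matrix defined above can be built as in the following sketch. The interaction list is a hypothetical six-protein example, and a dense list of lists is used only for readability; a real interaction network of thousands of proteins would normally be stored in a sparse format.

```python
def adjacency_matrix(n, interactions):
    """Build the symmetric 0/1 adjacency matrix of an undirected network."""
    A = [[0] * n for _ in range(n)]
    for i, j in interactions:
        A[i][j] = 1
        A[j][i] = 1  # undirected graph: Aij = Aji
    return A


# Hypothetical interactions between six proteins, indexed 0..5.
interactions = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = adjacency_matrix(6, interactions)

# Fraction of non-zero entries, a rough measure of how sparse A is.
density = sum(map(sum, A)) / (len(A) ** 2)
print(density)
```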
Spectral analysis
Algorithm (Spectral analysis)
• Randomly assign a vector X=(X1,X2,…,Xn)
• Iterate X(k+1)=AX(k), normalizing X at each step, until it converges
• Repeat with another vector that is perpendicular to the previously found eigenspace
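The iteration above is the power method. The sketch below is one plain-Python reading of it, under some assumptions that are not in the slides: the vector is normalized at every step, previously found eigenvectors are projected out to keep the new vector perpendicular to them, and convergence is judged by the change in the vector. The function names, tolerance, and toy network are illustrative.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def project_and_normalize(x, found):
    """Remove components along already-found eigenvectors, then normalize."""
    for v in found:
        c = dot(x, v)
        x = [xi - c * vi for xi, vi in zip(x, v)]
    norm = math.sqrt(dot(x, x))
    return [xi / norm for xi in x]


def power_iteration(A, found=(), iters=10000, tol=1e-12, seed=0):
    """Iterate X(k+1) = A X(k); return (eigenvalue estimate, eigenvector)."""
    rng = random.Random(seed)
    x = project_and_normalize([rng.gauss(0, 1) for _ in A], found)
    for _ in range(iters):
        y = project_and_normalize(matvec(A, x), found)
        # For a negative eigenvalue the vector flips sign every step,
        # so compare y against both x and -x.
        change = min(max(abs(a - b) for a, b in zip(x, y)),
                     max(abs(a + b) for a, b in zip(x, y)))
        x = y
        if change < tol:
            break
    return dot(x, matvec(A, x)), x  # Rayleigh quotient and eigenvector


# Hypothetical toy network: two triangles joined by one edge.
A = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
lam, v = power_iteration(A)
print(lam)
```

Passing `found=[v]` repeats the search in the space perpendicular to `v`. The plain iteration does not settle when two remaining eigenvalues have equal magnitude and opposite sign, so a practical implementation would usually rely on a library eigensolver instead.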
Topological Structure
[Figure: original network and its hidden topological structure]
An example
Protein-protein interaction network of Saccharomyces cerevisiae
• 80000 interactions among 5400 yeast proteins are each assigned a confidence value.
• We take the 11855 interactions with high and medium confidence among 2617 proteins, 353 of which have unknown function.
Data source
• Quasi-clique: positive eigenvalue
• Quasi-bipartite: negative eigenvalue
• With the spectral analysis, we obtain 48 quasi-cliques and 6 quasi-bipartites.
• A quasi-clique can contain annotated, unannotated, and unknown proteins.
Application: function prediction
Hierarchical clustering algorithm
• A similarity (distance) measure between nodes i and j, d(i,j)
• The similarity measure lets the network be treated as a weighted network Wij.
Types of hierarchical clustering
• Agglomerative hierarchical clustering
• Divisive hierarchical clustering
Properties of similarity measure
• d(i,j)≥0
• d(i,j)=d(j,i)
• d(i,j)≤d(i,k)+d(k,j)
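The three properties above can be checked directly on a candidate distance matrix. The helper below is an illustrative sketch; its name and the two small matrices are hypothetical examples, not data from the slides.

```python
from itertools import permutations


def satisfies_properties(d, tol=1e-12):
    """Check non-negativity, symmetry, and the triangle inequality."""
    n = len(d)
    for i in range(n):
        for j in range(n):
            if d[i][j] < 0 or abs(d[i][j] - d[j][i]) > tol:
                return False
    return all(d[i][j] <= d[i][k] + d[k][j] + tol
               for i, j, k in permutations(range(n), 3))


# Shortest path lengths on a three-node chain satisfy all three properties.
chain = [[0, 1, 2],
         [1, 0, 1],
         [2, 1, 0]]
# Here d(0,2) = 5 exceeds d(0,1) + d(1,2) = 2, so the triangle inequality fails.
broken = [[0, 1, 5],
          [1, 0, 1],
          [5, 1, 0]]
print(satisfies_properties(chain), satisfies_properties(broken))
```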
Similarity measure for agglomerative clustering
• Correlation
• Shortest path length
• Edge betweenness
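How good is agglomerative clustering?
Hierarchical tree (Dendrogram)
[Figure: dendrogram cut at a threshold]
Distance between clusters
• Single link: the distance between the closest pair of points, one from Cluster 1 and one from Cluster 2
• Complete link: the distance between the farthest pair of points, one from Cluster 1 and one from Cluster 2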
[Figure: 5 × 5 distance matrix D for the points x1 to x5 and the single-link dendrogram built from it, with leaves ordered x2, x3, x1, x4, x5 and merges at heights 1.5, 2.0, 2.2, and 3.5]
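To make the agglomerative procedure concrete, the sketch below repeatedly merges the two closest clusters and records the height of each merge. It is a naive illustration, not an efficient implementation, and the 4 × 4 distance matrix is a hypothetical example rather than the matrix D from the slide. The `link` parameter is an assumption of this sketch: `min` gives single link and `max` gives complete link.

```python
def agglomerative(D, link=min):
    """Naive agglomerative clustering on a symmetric distance matrix D.

    link=min is single link, link=max is complete link.
    Returns the merges as (members_a, members_b, height) tuples.
    """
    clusters = [frozenset([i]) for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = link(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Hypothetical distances between four points.
D = [[0, 1, 4, 5],
     [1, 0, 2, 6],
     [4, 2, 0, 3],
     [5, 6, 3, 0]]

for members_a, members_b, height in agglomerative(D, link=min):
    print(members_a, members_b, height)
```

On this toy matrix the two rules build different trees: single link chains the points together one at a time, while complete link pairs them up first. Cutting the resulting dendrogram at a threshold height gives the clusters.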
Divisive hierarchical clustering
M. E. J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004)
Definition of edge betweenness
$$B_k(i,j) = \frac{\text{number of paths passing through edge } k}{\text{number of paths connecting nodes } i \text{ and } j}$$
$$\text{edge betweenness of edge } k = \sum_{i \neq j} B_k(i,j)$$
$$\text{scaled edge betweenness of edge } k = \frac{2}{(N-2)(N-3)} \sum_{i \neq j} B_k(i,j)$$
Calculation of edge betweenness
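One way to carry out this calculation is sketched below. It assumes that the paths counted are shortest paths, as in Newman and Girvan's shortest-path betweenness, and it uses a breadth-first search from every node followed by a backward accumulation step (the approach usually attributed to Brandes). Each unordered pair of nodes is counted once. The function name and the toy network are illustrative.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness of an unweighted, undirected graph.

    adj maps each node to its neighbours. Keys of the result are
    frozenset({u, v}) edges.
    """
    bet = defaultdict(float)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing path fractions.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    # Every pair (i, j) was visited from both ends, so halve the totals.
    return {edge: value / 2.0 for edge, value in bet.items()}


# Hypothetical toy network: two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
betweenness = edge_betweenness(adj)
print(max(betweenness, key=betweenness.get))
```

The edge joining the two triangles carries every path between them, so it has the highest betweenness. Removing the highest-betweenness edge and recomputing is the basic step of the divisive algorithm.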
Quantitative measurement of network modularity
Modularity Q
$$Q = \sum_i \left( e_{ii} - a_i^2 \right), \qquad a_i = \sum_j e_{ij}$$
where $e_{ij}$ is the fraction of edges in the network connecting module $i$ and module $j$.
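The sketch below evaluates Q for a given assignment of nodes to modules. It follows the convention of Newman and Girvan in which the matrix e is symmetric, so an edge between two different modules contributes half of its weight to e_ij and half to e_ji. The function name and the toy network are hypothetical.

```python
from collections import defaultdict


def modularity(edges, module_of):
    """Q = sum_i (e_ii - a_i^2) for an undirected, unweighted network."""
    m = len(edges)
    e = defaultdict(float)
    for u, v in edges:
        cu, cv = module_of[u], module_of[v]
        # Symmetric e: half of each edge goes to (cu, cv), half to (cv, cu).
        e[(cu, cv)] += 0.5 / m
        e[(cv, cu)] += 0.5 / m
    modules = set(module_of.values())
    q = 0.0
    for i in modules:
        a_i = sum(e[(i, j)] for j in modules)
        q += e[(i, i)] - a_i ** 2
    return q


# Hypothetical toy network: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_module = {n: "A" for n in range(6)}

print(modularity(edges, two_modules), modularity(edges, one_module))
```

Splitting the toy network into its two triangles gives a positive Q, while putting every node in one module gives Q = 0. Evaluating Q at each level of the dendrogram is one way to choose where to cut it.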
Threshold selection
Karate club network
Examples of agglomerative hierarchical clustering
Can we identify the modules?
$$O_T(i,j) = \frac{J(i,j)}{\min(k_i, k_j)}$$
J(i,j): number of nodes that both i and j link to, plus 1 if there is a direct (i,j) link; k_i: degree of node i
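The topological overlap can be computed directly from neighbour sets, as in this sketch. The function name and the toy network are hypothetical, and every node is assumed to have at least one link so that the denominator is non-zero.

```python
def topological_overlap(adj, i, j):
    """O_T(i, j) = J(i, j) / min(k_i, k_j), with adj mapping node -> set."""
    common = len(adj[i] & adj[j])        # nodes that both i and j link to
    direct = 1 if j in adj[i] else 0     # +1 for a direct (i, j) link
    return (common + direct) / min(len(adj[i]), len(adj[j]))


# Hypothetical toy network: two triangles joined by the edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

print(topological_overlap(adj, 0, 1))  # two nodes of the same triangle
print(topological_overlap(adj, 2, 3))  # the two ends of the joining edge
```

Pairs inside the same dense group score higher than pairs that are linked only across groups, which is what makes the overlap usable as a similarity measure for hierarchical clustering.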
Modules in the E. coli metabolism
E. Ravasz et al., Science, 2002
Pyrimidine metabolism
Yeast signaling proteins in MIPS
$$A_{ij} = \frac{1}{l_{ij}^2}$$
$l_{ij}$: shortest path between proteins $i$ and $j$
PNAS, vol. 100, p. 1128 (2003).
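A minimal sketch of this measure: shortest path lengths are found by breadth-first search, and each reachable pair is given the value 1 / l². The function names and the four-protein chain are hypothetical; pairs with no connecting path are simply left out of the result.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def association(adj):
    """A[(i, j)] = 1 / l_ij**2 for every connected pair of distinct nodes."""
    A = {}
    for i in adj:
        for j, length in shortest_path_lengths(adj, i).items():
            if j != i:
                A[(i, j)] = 1.0 / length ** 2
    return A


# Hypothetical chain of four proteins: 0 - 1 - 2 - 3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
A = association(adj)
print(A[(0, 1)], A[(0, 2)], A[(0, 3)])
```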
Spotted microarray for Saccharomyces cerevisiae
Similarity measure
Regulatory module network
Genome Biology 9, R2 (2008).