Hierarchically Modular Structure in Complex Networks
Aaron Clauset
Santa Fe Institute
3 November 2008
DIMACS / DyDAn
“Network Models of Biological and Social Contagion”
Modular Hierarchies
Grassland species*
[Figure legend: plant, herbivore, parasite]
*thank you: Jennifer Dunne
The Task
How can we extract this hierarchical (multi-scale) structure from complex networks?
[Figure: network → ? → hierarchy]
One Approach
Model-based inference
1. describe how to generate hierarchies (a model)
2. “fit” model to empirical data
3. test “fitted” model
4. extract predictions + insight
5. profit!
A Model of Hierarchy
• model: D, {p_r} (a hierarchy D with a probability p_r at each internal node r)
• each instance of the model is an “inhomogeneous” random graph
• Pr(i, j connected) = p_r = p(lowest common ancestor of i, j)
[Figure: model → instance; assortative modules]
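To make the generative rule concrete, here is a minimal Python sketch of drawing one instance from such a model. The nested-tuple encoding of the hierarchy, the function names, and the toy probabilities are all assumptions made for illustration; none of them come from the talk.

```python
import random
from itertools import product

# Hypothetical encoding: a leaf is a vertex id (int); an internal node r is
# a tuple (p_r, left_subtree, right_subtree).
def leaves(tree):
    """Return the vertex ids found under this subtree."""
    if isinstance(tree, int):
        return [tree]
    _, left, right = tree
    return leaves(left) + leaves(right)

def sample_graph(tree, rng):
    """Draw one graph: every pair (i, j) whose lowest common ancestor is r
    becomes an edge independently with probability p_r."""
    if isinstance(tree, int):
        return set()
    p, left, right = tree
    edges = sample_graph(left, rng) | sample_graph(right, rng)
    # Pairs split across the two subtrees have this node as their
    # lowest common ancestor.
    for i, j in product(leaves(left), leaves(right)):
        if rng.random() < p:
            edges.add((min(i, j), max(i, j)))
    return edges

# Toy hierarchy (made-up numbers): two dense modules joined sparsely,
# which is the assortative case.
toy = (0.05, (0.9, (1.0, 0, 1), 2), (0.9, (1.0, 3, 4), 5))
graph = sample_graph(toy, random.Random(0))
print(sorted(graph))
```

Swapping the probabilities so that the root has a high value and the lower nodes have low values would give a disassortative instance under the same rule.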
Model Features
• explicit model = explicit assumptions
• very flexible (many parameters)
• captures structure at all scales
• arbitrary mixtures of assortativity, disassortativity
• learnable directly from data
Learning From Data
a direct approach
• likelihood function L = Pr( data | model ) (scores quality of model)
• sample the good models via Markov chain Monte Carlo
• technical details in arXiv:physics/0610051
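Because the model connects each pair independently with the probability at the pair's lowest common ancestor, the likelihood factorizes over internal nodes. The sketch below computes its logarithm in Python; the tuple encoding of the hierarchy and the helper names are assumptions for illustration, and the MCMC sampler itself is not shown.

```python
import math

# Hypothetical encoding: a leaf is a vertex id (int); an internal node is
# a tuple (p_r, left_subtree, right_subtree).
def leaves(tree):
    if isinstance(tree, int):
        return [tree]
    _, left, right = tree
    return leaves(left) + leaves(right)

def xlogy(count, prob):
    """count * log(prob), with 0 * log(0) taken as 0."""
    if count == 0:
        return 0.0
    return count * math.log(prob) if prob > 0 else -math.inf

def log_likelihood(tree, edges):
    """log Pr(data | model): at each internal node, the pairs split across
    its two subtrees contribute E*log(p) + (N - E)*log(1 - p), where N is
    the number of such pairs and E the number that are observed edges."""
    if isinstance(tree, int):
        return 0.0
    p, left, right = tree
    left_set, right_set = set(leaves(left)), set(leaves(right))
    n_pairs = len(left_set) * len(right_set)
    n_edges = sum(1 for i, j in edges
                  if (i in left_set and j in right_set)
                  or (j in left_set and i in right_set))
    here = xlogy(n_edges, p) + xlogy(n_pairs - n_edges, 1.0 - p)
    return here + log_likelihood(left, edges) + log_likelihood(right, edges)

# Toy data (made up): two disjoint edges, scored under two hierarchies.
edges = {(0, 1), (2, 3)}
good = (0.1, (0.9, 0, 1), (0.9, 2, 3))   # groups the connected pairs
bad = (0.1, (0.9, 0, 2), (0.9, 1, 3))    # groups unconnected pairs
print(log_likelihood(good, edges) > log_likelihood(bad, edges))
```

A sampler would use this score to favor hierarchies such as `good` over ones such as `bad`.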
From Graph to Ensemble
• Given graph G
• run MCMC to equilibrium
• then, for each sampled D, draw a resampled graph G' from ensemble
A test: do resampled graphs look like original?
Grassland species*
[Figure legend: plant, herbivore, parasite]
*thank you: Jennifer Dunne
Degree Distribution
[Figure a: fraction of vertices with degree k vs. degree k, log-log axes; resampled vs. original]
Clustering Coefficient
[Figure: fraction of graphs with clustering coefficient c vs. clustering coefficient c; resampled vs. original]
Distance Distribution
[Figure b: fraction of vertex-pairs at distance d vs. distance d; resampled vs. original]
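As a rough illustration of the statistics being compared, the Python sketch below computes a degree distribution and a distance distribution from an edge list; running it on the original graph and on each resampled graph would give the two curves to compare. It assumes vertices are labeled 0..n-1 and normalizes distances over connected pairs only, which may differ from the normalization used in the plots.

```python
from collections import Counter, defaultdict, deque

def adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = defaultdict(set)
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj

def degree_distribution(edges, n):
    """Fraction of the n vertices having each degree k."""
    adj = adjacency(edges)
    counts = Counter(len(adj[v]) if v in adj else 0 for v in range(n))
    return {k: c / n for k, c in sorted(counts.items())}

def distance_distribution(edges, n):
    """Fraction of connected vertex pairs at each shortest-path distance d."""
    adj = adjacency(edges)
    counts = Counter()
    for source in range(n):
        # Breadth-first search gives shortest-path distances from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Count each unordered pair once.
        counts.update(d for v, d in dist.items() if v > source)
    total = sum(counts.values())
    return {d: c / total for d, c in sorted(counts.items())}

# Toy example (made up): a path on four vertices.
path = [(0, 1), (1, 2), (2, 3)]
print(degree_distribution(path, 4))
print(distance_distribution(path, 4))
```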
Missing Links
A test: can model predict missing links?
Predicting is Hard
• remove k edges from G
• how easy to guess a missing link?
n = 75, m = 113
$p_{\mathrm{guess}} \approx \dfrac{k}{\binom{n}{2} - m + k} = O(n^{-2})$
$p_{\mathrm{guess}} = k / (2662 + k)$
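The 2662 in the denominator is the number of unconnected vertex pairs in the full graph: 75 choose 2 is 2775, and 2775 − 113 = 2662. A short Python check of that arithmetic follows; the function name and the choice k = 10 are only illustrative.

```python
from math import comb

def p_guess(n, m, k):
    """Chance that a uniformly random unconnected pair in the observed
    graph is one of the k removed edges."""
    candidates = comb(n, 2) - m + k   # non-edges of G plus the k removed edges
    return k / candidates

n, m = 75, 113
print(comb(n, 2) - m)        # 2662
print(p_guess(n, m, 10))     # 10 / 2672
```

Even with many edges removed, a random guess is rarely right, which is why the baseline for prediction is so low.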
Predicting Missing Links
• Given incomplete graph G
• run MCMC to equilibrium
• then, over sampled D, compute average ⟨p_r⟩ for links (i, j) ∉ G
• predict links with high ⟨p_r⟩ values are missing
Test idea via leave-k-out cross-validation
• perfect accuracy: AUC = 1
• no better than chance: AUC = 1/2
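One way to read the AUC here: it is the probability that a randomly chosen removed edge gets a higher score than a randomly chosen true non-edge. The Python sketch below computes it directly from two lists of scores; the function name and the example scores are made up, and the talk's own evaluation code may differ.

```python
def auc(missing_scores, nonedge_scores):
    """Probability that a random true missing link outscores a random
    true non-edge; ties count as half a win."""
    wins = 0.0
    for s in missing_scores:
        for t in nonedge_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(missing_scores) * len(nonedge_scores))

# Hypothetical scores for held-out edges and for true non-edges.
print(auc([0.9, 0.8], [0.1, 0.2]))   # 1.0: every missing link ranks first
print(auc([0.5], [0.5]))             # 0.5: no better than chance
```

In a leave-k-out test, the first list would hold the scores of the k removed edges and the second the scores of the pairs that were never connected.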
Missing Structure
[Figure: Grassland species network. Area under ROC curve (AUC) vs. fraction of edges observed, k/m. Curves: pure chance; simple predictors (common neighbors, Jaccard coeff., degree product, shortest paths); hierarchical structure.]
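For reference, three of the simple predictors named in the legend can be sketched in a few lines of Python using their usual textbook definitions. The function names and the toy graph are assumptions for illustration, and the exact variants used for the plot are not given in the slides.

```python
from collections import defaultdict

def neighbors(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = defaultdict(set)
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj

def common_neighbors(adj, i, j):
    """Number of vertices adjacent to both i and j."""
    return len(adj[i] & adj[j])

def jaccard(adj, i, j):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[i] | adj[j]
    return len(adj[i] & adj[j]) / len(union) if union else 0.0

def degree_product(adj, i, j):
    """Product of the two degrees."""
    return len(adj[i]) * len(adj[j])

# Toy graph (made up): a triangle 0-1-2 with a pendant vertex 3.
adj = neighbors([(0, 1), (0, 2), (1, 2), (2, 3)])
print(common_neighbors(adj, 0, 3), jaccard(adj, 0, 3), degree_product(adj, 0, 3))
```

Each function gives a score for an unconnected pair; higher scores are treated as more likely to be missing links.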
Other Networks
[Figure a: Terrorist association network. AUC vs. fraction of edges observed. Curves: pure chance, common neighbors, Jaccard coefficient, degree product, shortest paths, hierarchical structure.]
[Figure b: T. pallidum metabolic network. Same axes and curves.]
Summary
• Many real networks are hierarchically modular
• Hierarchies can
  • model multi-scale structure
  • generalize a single network
  • predict missing links
• Model-based inference is very powerful
Acknowledgments: C. Moore, M.E.J. Newman, C.H. Wiggins, and C.R. Shalizi
Fin