Source: mmds.imm.dtu.dk/presentations/hansen.pdf
TRANSCRIPT
Machine Learning in Simple Networks
Lars Kai Hansen
www.imm.dtu.dk/~lkh
Outline
• Communities and link prediction
• Modularity
  – Modularity as a combinatorial optimization problem
  – Gibbs sampling
• Detection threshold – a phase transition?
• Learning community parameters
  – The Hofman-Wiggins generative model
  – Is there a threshold for detection when you learn the parameters and complexity?
Muzeeker
• Wikipedia-based common sense
• Wikipedia used as a proxy for the music user's mental model
• Implementation: filter retrieval using Wikipedia's article/categories
• Muzeeker.com
• LINK PREDICTION to complete the ontological quality of Wikipedia
Network models
• Nodes/vertices and links/edges
  – Directed / undirected
  – Weighted / un-weighted
• Link distributions
  – Random
  – Long tail
  – Hubs and authorities
• Link-induced correlations
  – The rich club
• Communities
  – Link prediction
Motivation for community detection
• Community structure may mark a non-stationary link distribution with "high and low density" sub-networks; summarizing with a single "model" could therefore be misleading
Modularity can be predictive for dynamics
M.E.J. Newman and M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E 69, 026113 (2004).
Modularity objective function
The modularity is expressed as a sum over links, such that we penalize missing links within communities; "missing" is measured relative to a null distribution P_{ij}.

Q = \sum_{ij} \left[ \frac{A_{ij}}{2m} - P_{ij} \right] \delta(c_i, c_j)

c_i is the community assignment of node i, and 2m = \sum_{ij} A_{ij}, k_i = \sum_j A_{ij}.

The null is the baseline distribution P_{ij} = k_i k_j / (2m)^2.

The value of the modularity lies in the range [-1, 1]. It is positive if the number of edges within groups exceeds the number expected on the basis of chance.
M.E.J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69:026113, 2004, cond-mat/0308217.
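The objective above can be sketched in a few lines of NumPy. The toy graph (two triangles joined by a single edge) and the function name are illustrative choices, not from the slides:

```python
import numpy as np

def modularity(A, c):
    """Q = sum_ij [A_ij/(2m) - k_i k_j/(2m)^2] * delta(c_i, c_j)."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                        # node degrees k_i
    two_m = A.sum()                          # 2m, total of all adjacency entries
    delta = np.asarray(c)[:, None] == np.asarray(c)[None, :]
    return ((A / two_m - np.outer(k, k) / two_m**2) * delta).sum()

# two triangles joined by a single edge, split into the two obvious communities
A = np.zeros((6, 6))
for i, j in [(0,1), (1,2), (0,2), (3,4), (4,5), (3,5), (2,3)]:
    A[i, j] = A[j, i] = 1
Q = modularity(A, [0, 0, 0, 1, 1, 1])        # 5/14, about 0.357
```

Assigning all nodes to a single community gives Q = 0, since the null distribution reproduces the degree sequence exactly.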
Potts representation
Introduce 0/1 binary variables S_{kj} coding the community assignment: S_{kj} = 1 means "node j is a member of community k". Then

\delta(c_i, c_j) = \sum_k S_{ki} S_{kj}

Q = \sum_{ij} \left[ \frac{A_{ij}}{2m} - P_{ij} \right] \delta(c_i, c_j)
  = \sum_{ij} \sum_k \left[ \frac{A_{ij}}{2m} - P_{ij} \right] S_{ki} S_{kj}
  = \frac{1}{2m} \mathrm{Tr}(S B S'), \qquad B_{ij} = A_{ij} - \frac{k_i k_j}{2m}
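The matrix identity can be checked numerically: with a K × N one-hot matrix S built from a label vector, Tr(S B S')/(2m) reproduces the pairwise modularity sum. A minimal sketch (graph and names are illustrative):

```python
import numpy as np

def modularity_potts(A, labels, K):
    """Q = Tr(S B S^T) / (2m) with S[k, i] = 1 iff node i is in community k."""
    A = np.asarray(A, float)
    k = A.sum(axis=1)
    two_m = A.sum()
    B = A - np.outer(k, k) / two_m           # modularity matrix B_ij = A_ij - k_i k_j/(2m)
    S = np.zeros((K, A.shape[0]))
    S[labels, np.arange(A.shape[0])] = 1     # one-hot community assignment
    return np.trace(S @ B @ S.T) / two_m

# two triangles joined by one edge
A = np.zeros((6, 6))
for i, j in [(0,1), (1,2), (0,2), (3,4), (4,5), (3,5), (2,3)]:
    A[i, j] = A[j, i] = 1
Q = modularity_potts(A, [0, 0, 0, 1, 1, 1], K=2)   # equals the pairwise sum, 5/14
```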
Spectral optimization
• Newman relaxes the optimization problem to the simplex

Q = \frac{1}{2m} \mathrm{Tr}(S B S'), \qquad \delta(c_i, c_j) = \sum_k S_{ki} S_{kj}

Adding Lagrange multipliers \Lambda for the normalization of S,

L = \frac{1}{2m} \mathrm{Tr}(S B S') + \mathrm{Tr}(\Lambda S S') \;\Rightarrow\; B S' = S' \Lambda

i.e., the relaxed problem becomes an eigenvalue problem for the modularity matrix B.
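As a sketch of the relaxed problem: for a two-way split, the sign pattern of the leading eigenvector of B separates the communities. The two-triangle toy graph is an assumed example, not one from the slides:

```python
import numpy as np

# two triangles joined by one edge
A = np.zeros((6, 6))
for i, j in [(0,1), (1,2), (0,2), (3,4), (4,5), (3,5), (2,3)]:
    A[i, j] = A[j, i] = 1

k = A.sum(axis=1)
B = A - np.outer(k, k) / A.sum()     # modularity matrix
w, V = np.linalg.eigh(B)             # eigh returns eigenvalues in ascending order
v = V[:, -1]                         # leading eigenvector
labels = (v > 0).astype(int)         # sign pattern gives the two-way split
```

For this graph the split recovers the two triangles (up to the arbitrary overall sign of the eigenvector).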
Combinatorial optimization
• We can use a physics analogy: simulated annealing (Kirkpatrick et al., 1983)

P(S | A, T) \propto \exp\left( \frac{Q(S)}{T} \right) = \exp\left( \frac{\mathrm{Tr}(S B S')}{2mT} \right)
• Gibbs sampling is a Monte Carlo realization of a Markov process in which each variable is redrawn in turn from its conditional distribution given the remaining variables

P(S_j | S_{-j}, A, T) = \frac{P(S | A, T)}{\sum_{S_j} P(S | A, T)}

S. Geman, D. Geman, "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images". IEEE Transactions on Pattern Analysis and Machine Intelligence 6(6): 721–741 (1984)
Potts model, 1 node
• Discrete probability distribution on states k = 1, …, K

P(S | A, T) \propto \exp\left( \sum_{k=1}^{K} \frac{\varphi_k S_k}{T} \right)

P(S_k = 1 | A, T) = r_k = \frac{\exp(\varphi_k / T)}{\sum_{k'} \exp(\varphi_{k'} / T)}
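The one-node conditional above is a softmax over the K states. A minimal, numerically stable sketch (function name and field values are illustrative):

```python
import numpy as np

def potts_probs(phi, T):
    """r_k = exp(phi_k/T) / sum_k' exp(phi_k'/T) — a softmax over the K states."""
    z = np.asarray(phi, float) / T
    z -= z.max()                 # subtract the max: exp would overflow for small T
    e = np.exp(z)
    return e / e.sum()

r = potts_probs([0.2, 0.1, 0.1], T=0.05)   # strongly favors state 0 at low T
```

As T grows, the distribution flattens toward uniform; as T shrinks, it concentrates on the state with the largest field.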
Gibbs sampling

\varphi_{ki} = \sum_j \frac{B_{ij}}{2m} S_{kj} = \sum_j \frac{A_{ij}}{2m} S_{kj} - \frac{k_i}{(2m)^2} \sum_j k_j S_{kj}

r_{ki} = \frac{\exp(\varphi_{ki} / T)}{\sum_{k'} \exp(\varphi_{k'i} / T)}

S_i \sim \mathrm{potts}(r_i)
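One Gibbs sweep under these fields can be sketched as follows. The toy graph, temperature, and sweep count are illustrative choices, not values from the slides:

```python
import numpy as np

def gibbs_sweep(A, S, T, rng):
    """One Gibbs sweep over the nodes of the Potts modularity model.

    S is K x N one-hot; each node i is redrawn from
    r_ki = exp(phi_ki/T) / sum_k' exp(phi_k'i/T), with
    phi_ki = sum_j A_ij S_kj/(2m) - k_i sum_j k_j S_kj/(2m)^2.
    """
    A = np.asarray(A, float)
    k = A.sum(axis=1)
    two_m = A.sum()
    K, N = S.shape
    for i in range(N):
        phi = S @ A[:, i] / two_m - k[i] * (S @ k) / two_m**2
        r = np.exp((phi - phi.max()) / T)
        r /= r.sum()
        S[:, i] = 0
        S[rng.choice(K, p=r), i] = 1      # draw node i's new community
    return S

rng = np.random.default_rng(0)
A = np.zeros((6, 6))
for i, j in [(0,1), (1,2), (0,2), (3,4), (4,5), (3,5), (2,3)]:
    A[i, j] = A[j, i] = 1
S = np.zeros((2, 6))
S[rng.integers(0, 2, 6), np.arange(6)] = 1    # random one-hot initialization
for _ in range(20):
    S = gibbs_sweep(A, S, T=0.02, rng=rng)
```

At low T the sweep behaves almost greedily; annealing would lower T gradually from a high starting value.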
Deterministic annealing
• Instead of drawing Gibbs samples according to the conditionals we can average instead; this provides a set of self-consistent equations for the means (for 0/1 Bernoulli variables the mean is the probability \mu_{ki} = P(S_{ki} = 1))

r_{ki} = \frac{\exp(\varphi_{ki} / T)}{\sum_{k'} \exp(\varphi_{k'i} / T)}

\varphi_{ki} = \sum_j \frac{B_{ij}}{2m} r_{kj} = \sum_j \frac{A_{ij}}{2m} r_{kj} - \sum_j P_{ij} r_{kj}

S. Lehmann, L.K. Hansen: Deterministic modularity optimization. European Physical Journal B 60(1), 83–88 (2007).
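A sketch of one self-consistent update, assuming the means μ_ki are stored as a K × N matrix R (names and the toy setup are illustrative). Starting from the planted two-triangle split, the update keeps that assignment as the per-node argmax:

```python
import numpy as np

def mean_field_step(A, R, T):
    """One deterministic-annealing update of the mean parameters R (K x N).

    phi_ki = sum_j A_ij r_kj/(2m) - sum_j P_ij r_kj,  P_ij = k_i k_j/(2m)^2
    r_ki   = exp(phi_ki/T) / sum_k' exp(phi_k'i/T)
    """
    A = np.asarray(A, float)
    k = A.sum(axis=1)
    two_m = A.sum()
    phi = R @ A / two_m - np.outer(R @ k, k) / two_m**2
    Z = np.exp((phi - phi.max(axis=0)) / T)   # per-node numerical stabilization
    return Z / Z.sum(axis=0)                  # normalize each node's column

A = np.zeros((6, 6))
for i, j in [(0,1), (1,2), (0,2), (3,4), (4,5), (3,5), (2,3)]:
    A[i, j] = A[j, i] = 1
R = np.zeros((2, 6))
R[[0, 0, 0, 1, 1, 1], np.arange(6)] = 1
R = mean_field_step(A, R, T=0.02)             # planted split stays the argmax
```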
Experimental evaluation
• Create a simple testbed with link probability and "noise"
Generative community model (Hofman & Wiggins, 2008)

P(A | S, p, q) = p^{c} (1-p)^{d} q^{e} (1-q)^{f}

c = \frac{1}{2} \sum_{i \ne j} A_{ij} \sum_k S_{ki} S_{kj}

d = \frac{1}{2} \sum_{i \ne j} (1 - A_{ij}) \sum_k S_{ki} S_{kj}

e = \frac{1}{2} \sum_{i \ne j} A_{ij} \left( 1 - \sum_k S_{ki} S_{kj} \right)

f = \frac{1}{2} \sum_{i \ne j} (1 - A_{ij}) \left( 1 - \sum_k S_{ki} S_{kj} \right)

p is the intra-community and q the inter-community link probability.
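The four counts and the log-likelihood can be sketched as follows; the label-vector interface and names are illustrative:

```python
import numpy as np

def log_likelihood(A, labels, p, q):
    """log P(A|S,p,q) = c log p + d log(1-p) + e log q + f log(1-q).

    c/d: present/absent links within communities; e/f: present/absent links between.
    """
    A = np.asarray(A, float)
    same = np.asarray(labels)[:, None] == np.asarray(labels)[None, :]
    iu = np.triu_indices(A.shape[0], k=1)   # each pair i != j once (the 1/2 factors)
    a, s = A[iu], same[iu].astype(float)
    c = (a * s).sum()
    d = ((1 - a) * s).sum()
    e = (a * (1 - s)).sum()
    f = ((1 - a) * (1 - s)).sum()
    return c*np.log(p) + d*np.log(1-p) + e*np.log(q) + f*np.log(1-q)

A = np.zeros((6, 6))
for i, j in [(0,1), (1,2), (0,2), (3,4), (4,5), (3,5), (2,3)]:
    A[i, j] = A[j, i] = 1
ll = log_likelihood(A, [0, 0, 0, 1, 1, 1], p=0.9, q=0.1)
```

Maximum likelihood for fixed assignments then gives the closed-form estimates p̂ = c/(c+d) and q̂ = e/(e+f).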
Learning parameters of the generative model
• Hofman & Wiggins (2008)
  – "Variational Bayes"
  – Dirichlet/beta prior and posterior distributions for the probabilities
  – Very well determined (overkill)
  – Independent binomials for the assignment variables (misses correlations)
• Here
  – Maximum likelihood for the parameters
  – Gibbs sampling for the assignments
Jake M. Hofman and Chris H. Wiggins, Bayesian Approach to Network Modularity, Phys. Rev. Lett. 100, 258701 (2008)
The community detection threshold
How many links are needed to detect the structure?
Jorg Reichardt and Michele Leone, (Un)detectable Cluster Structure in Sparse Networks, Phys. Rev. Lett. 101, 078701 (2008)
\mathrm{SNR} = \frac{p_{in}}{q}

with p_{in} the intra-community and q the inter-community link probability, the latter spread over the C - 1 other communities.
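A testbed generator can be sketched as follows, assuming SNR denotes the ratio of intra- to inter-community link probability; names and parameter values are illustrative:

```python
import numpy as np

def planted_partition(N, C, p_in, snr, rng):
    """Graph with C equal-size communities; inter-community link prob q = p_in/snr.

    Assumes SNR is the ratio of intra- to inter-community link probability.
    """
    q = p_in / snr
    labels = np.repeat(np.arange(C), N // C)
    same = labels[:, None] == labels[None, :]
    P = np.where(same, p_in, q)               # pairwise link probabilities
    U = rng.random((N, N))
    A = np.triu(U < P, k=1).astype(int)       # sample each pair once, no self-links
    return A + A.T, labels

rng = np.random.default_rng(1)
A, labels = planted_partition(1000, 5, p_in=0.05, snr=10, rng=rng)
```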
Experimental design
• Planted solution
  – N = 1000 nodes
  – C_true = 5
  – Quality: mutual information between planted assignments and the best identified
• Gibbs sampling
  – No annealing
  – Burn-in: 200 iterations
  – Averaging: 800 iterations
• Parameter learning
  – Q = 10 iterations
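The quality measure can be sketched as plain mutual information between two label vectors. MI is invariant to relabeling the communities, which is what makes it suitable for comparing planted and identified assignments:

```python
import numpy as np

def mutual_information(a, b):
    """Mutual information (in nats) between two assignment vectors,
    computed from their joint contingency table."""
    a, b = np.asarray(a), np.asarray(b)
    mi = 0.0
    for x in np.unique(a):
        px = np.mean(a == x)
        for y in np.unique(b):
            py = np.mean(b == y)
            pxy = np.mean((a == x) & (b == y))
            if pxy > 0:                       # 0 * log(0) contributes nothing
                mi += pxy * np.log(pxy / (px * py))
    return mi
```

Identical two-way partitions give MI = log 2; independent partitions give MI = 0.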
Community Detection – fully informed on number of communities and probabilities
[Figure: mutual information between planted and identified assignments vs. intra-community link probability P; four panels: (N = 1000, C = 5, SNR = 5), (N = 1000, C = 5, SNR = 10), (N = 1000, C = 5, SNR = 50), (N = 1000, C = 10, SNR = 50)]
Now what happens to the phase transition if we learn the parameters … with a too-complex model (C > C_true = 5)?
[Figure: mutual information between planted and identified assignments vs. intra-community link probability P; panels: (N = 1000, C = 10, SNR = 10), (N = 1000, C = 10, SNR = 5). Histogram: number of memberships per community]
Conclusions
• Community detection can be formulated as an inference problem (Hofman & Wiggins, 2008)
• The sampling process for fixed SNR has a phase-transition-like detection threshold (Reichardt & Leone, 2008)
• The phase transition remains (sharpens?) if you learn the parameters of a generative model with unknown complexity