uncovering latent structure in valued...

47
Uncovering Latent Structure in Valued Graphs M. Mariadassou Joint work with S. Robin and C. Vacher Laboratoire MIG (UR INRA), Jouy-en-Josas, France. Singapore, IMS, 10 May 2011 Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 1 / 28

Upload: others

Post on 22-Aug-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Uncovering Latent Structure in Valued Graphs

M. MariadassouJoint work with S. Robin and C. Vacher

Laboratoire MIG (UR INRA), Jouy-en-Josas, France.

Singapore, IMS, 10 May 2011

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 1 / 28

Page 2: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Outline

1 Introduction

2 MixNet: a Mixture Model for Random Graphs

3 Parametric Estimation

4 Simulation Study

5 Ecological Network

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 2 / 28

Page 3: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Yeast Protein Interaction Network (PIN)

Figure: Yeast PIN. source: www.bordalierinstitute.com/images/yeastProteinInteractionNetwork.jpg

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 3 / 28

Page 4: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Goal: Simple Representation of the Graph

Figure: Zachary’s karate club (Zachary 77)

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 4 / 28

Page 5: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Goal: Simple Representation of the Graph

Figure: Zachary’s karate club (Zachary 77)

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 4 / 28

Page 6: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Models for Networks

Classical ModelsErdos-Renyi random graph (Erdos & Renyi 59);Degree distribution (Milo & al 04);Preferential Attachment (Barabasi & Albert 99);

Exponential ModelsERGM (Holland & Leinhardt 81).

→ Local structure induced by relative frequencies of motifs.

Mixture ModelStochastic Block Model / MixNet (Holland & al 83, Fienberg & al85, Snijders & Nowicki 97, Daudin & al 08)

→ Global structure induced by groups of similar nodes.

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 5 / 28

Page 7: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Models for Networks

Classical ModelsErdos-Renyi random graph (Erdos & Renyi 59);Degree distribution (Milo & al 04);Preferential Attachment (Barabasi & Albert 99);

Exponential ModelsERGM (Holland & Leinhardt 81).

→ Local structure induced by relative frequencies of motifs.

Mixture ModelStochastic Block Model / MixNet (Holland & al 83, Fienberg & al85, Snijders & Nowicki 97, Daudin & al 08)

→ Global structure induced by groups of similar nodes.

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 5 / 28

Page 8: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Models for Networks

Classical ModelsErdos-Renyi random graph (Erdos & Renyi 59);Degree distribution (Milo & al 04);Preferential Attachment (Barabasi & Albert 99);

Exponential ModelsERGM (Holland & Leinhardt 81).

→ Local structure induced by relative frequencies of motifs.

Mixture ModelStochastic Block Model / MixNet (Holland & al 83, Fienberg & al85, Snijders & Nowicki 97, Daudin & al 08)

→ Global structure induced by groups of similar nodes.

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 5 / 28

Page 9: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

MixNet Probabilistic Model (nodes)

Nodes heterogeneityI The nodes are distributed among Q different classes (e.g. ,N,�);

I Z = (Zi)i=1..n i.i.d. vectors Zi = (Zi1, . . . ,ZiQ) ∼ M(1,α) whereα = (α1, . . . , αQ) are the group proportions;

I Zi is not observed.

Example: (9 nodes, 3 classes)

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 6 / 28

Page 10: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

MixNet Probabilistic Model (nodes)

Nodes heterogeneityI The nodes are distributed among Q different classes (e.g. ,N,�);

I Z = (Zi)i=1..n i.i.d. vectors Zi = (Zi1, . . . ,ZiQ) ∼ M(1,α) whereα = (α1, . . . , αQ) are the group proportions;

I Zi is not observed.

Example: (9 nodes, 3 classes)

P( )=0.25

P( )=0.25

P( )=0.5

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 6 / 28

Page 11: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

MixNet Probabilistic Model (edges)

ObservationsI Edges values Xij where Xij ∈ R

s;I Conditional on Z, the (Xij) are independent with distribution

Xij|{Ziq = 1,Zj` = 1} ∼ f (., θq`)

I θ = (θq`)q,`=1..Q is the connectivity parameter.

Example: 3 classes with Poisson-valued edges

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 7 / 28

Page 12: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

MixNet Probabilistic Model (edges)

ObservationsI Edges values Xij where Xij ∈ R

s;I Conditional on Z, the (Xij) are independent with distribution

Xij|{Ziq = 1,Zj` = 1} ∼ f (., θq`)

I θ = (θq`)q,`=1..Q is the connectivity parameter.

Example: 3 classes with Poisson-valued edges

0 0.9 0.25

. 1 0.5

. . 1

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 7 / 28

Page 13: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Flexibility of MixNet

Classical Distributions:I fθ can be any probability distribution;

→ Bernoulli (interaction graph): presence/absence of an edge;Xij|{Ziq = 1,Zj` = 1} ∼ B(πq`)

→ Poisson (PM) (count): in coauthorship networks, number ofcopublished papers;

Xij|{Ziq = 1,Zj` = 1} ∼ P(λq`)

→ Poisson regression with homogeneous effects (PRMH) (countswith covariates): in ecological networks;

Xij|{Ziq = 1,Zj` = 1} ∼ P(λq` exp{βᵀyij})

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 8 / 28

Page 14: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

(Log)-Likelihood of the Model

I Complete data likelihood

L(X,Z) = ln Pr(X,Z) = ln Pr(Z)P(X|Z)

=∑

i

∑q

Ziq lnαq +∑i<j

∑q,l

ZiqZjl ln fθql(Xij)

I Observed data likelihood

L(X) = ln∑

ZexpL(X,Z)

I Sum over Qn is untractable, use EM algorithm instead.

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 9 / 28

Page 15: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

(Log)-Likelihood of the Model

I Complete data likelihood

L(X,Z) =∑

i

∑q

Ziq lnαq +∑i<j

∑q,l

ZiqZjl ln fθql(Xij)

I Observed data likelihood

L(X) = ln∑

ZexpL(X,Z)

I Sum over Qn is untractable, use EM algorithm instead.

But...The random variables Xij are not independent;The distribution Pr(.|X) of Z conditional on X is not a productdistribution;

→ Exact EM is not possible...Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 9 / 28

Page 16: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Variational Inference: Pseudo Likelihood

If RX is a distribution over Z, let

J(RX) = L(X) − KL(RX,Pr(.|X))

For RX = Pr(.|X), J(RX) = L(X);Variational approximation: replace complicated distributionPr(.|X) by a simple RX such that KL(RX,Pr(.|X)) is minimal to obtaina tight lower bound of L(X).

J(RX) = L(X) − KL(RX,Pr(.|X))

= H(RX) + ERX[L(X,Z)]

where H(RX) is the entropy of RX.

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 10 / 28

Page 17: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Variational Inference: Pseudo Likelihood

If RX is a distribution over Z, let

J(RX) = L(X) − KL(RX,Pr(.|X))

For RX = Pr(.|X), J(RX) = L(X);Variational approximation: replace complicated distributionPr(.|X) by a simple RX such that KL(RX,Pr(.|X)) is minimal to obtaina tight lower bound of L(X).

J(RX) = L(X) − KL(RX,Pr(.|X))

= H(RX) + ERX[L(X,Z)]

where H(RX) is the entropy of RX.

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 10 / 28

Page 18: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Variational Inference: Pseudo Likelihood

If RX is a distribution over Z, let

J(RX) = L(X) − KL(RX,Pr(.|X))

For RX = Pr(.|X), J(RX) = L(X);Variational approximation: replace complicated distributionPr(.|X) by a simple RX such that KL(RX,Pr(.|X)) is minimal to obtaina tight lower bound of L(X).

J(RX) = L(X) − KL(RX,Pr(.|X))

= H(RX) + ERX[L(X,Z)]

where H(RX) is the entropy of RX.

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 10 / 28

Page 19: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Variational Inference: Pseudo Likelihood (II)

Computing ERX[L(X,Z)] is easy, computing H(RX) is hard (ingeneral).Restrict RX to a comfortable class of distributions:

RX[Z] =∏

i

h(Zi; τi)

with h(.; τi) the multinomial with paramater τi = (τi1, . . . , τiQ).Intuitively, τiq ' Pr(Ziq = 1|X).For such RX,

J((τi)i=1..n) = −∑

i

∑q

τiq ln τiq +∑

i

∑q

τiq lnαq +∑i<j

τiqτj` ln fθq`(Xij)

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 11 / 28

Page 20: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Variational Inference: Pseudo Likelihood (II)

Computing ERX[L(X,Z)] is easy, computing H(RX) is hard (ingeneral).Restrict RX to a comfortable class of distributions:

RX[Z] =∏

i

h(Zi; τi)

with h(.; τi) the multinomial with paramater τi = (τi1, . . . , τiQ).Intuitively, τiq ' Pr(Ziq = 1|X).For such RX,

J((τi)i=1..n) = −∑

i

∑q

τiq ln τiq +∑

i

∑q

τiq lnαq +∑i<j

τiqτj` ln fθq`(Xij)

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 11 / 28

Page 21: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Variational Inference: Pseudo Likelihood (II)

Computing ERX[L(X,Z)] is easy, computing H(RX) is hard (ingeneral).Restrict RX to a comfortable class of distributions:

RX[Z] =∏

i

h(Zi; τi)

with h(.; τi) the multinomial with paramater τi = (τi1, . . . , τiQ).Intuitively, τiq ' Pr(Ziq = 1|X).For such RX,

J((τi)i=1..n) = −∑

i

∑q

τiq ln τiq +∑

i

∑q

τiq lnαq +∑i<j

τiqτj` ln fθq`(Xij)

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 11 / 28

Page 22: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

2 Steps Iterative Algorithm

Maximize pseudo-likelihood:

J((α, θ), (τi)i=1..n) = −∑

i

∑q

τiq ln τiq +∑

i

∑q

τiq lnαq +∑i<j

τiqτj` ln fθq` (Xij)

Step 1 Optimize J w.r.t. (τi):→ Constraint:

∑q τiq = 1 for all i;

→ τiq variational parameter found via a fixed point algorithm:

τiq ∝ αq

∏j,i

Q∏`=1

fθq` (Xij)τjl

Step 2 Optimize J w.r.t. (α, θ):→ Constraint:

∑q αq = 1

αq =∑

i

τiq/n

θql = arg maxθ

∑i,j

τiqτjl log fθ(Xij)

→ Simple expression of θql for classical distributions (weighted MLE).

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 12 / 28

Page 23: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

2 Steps Iterative Algorithm

Maximize pseudo-likelihood:

J((α, θ), (τi)i=1..n) = −∑

i

∑q

τiq ln τiq +∑

i

∑q

τiq lnαq +∑i<j

τiqτj` ln fθq` (Xij)

Step 1 Optimize J w.r.t. (τi):→ Constraint:

∑q τiq = 1 for all i;

→ τiq variational parameter found via a fixed point algorithm:

τiq ∝ αq

∏j,i

Q∏`=1

fθq` (Xij)τjl

Step 2 Optimize J w.r.t. (α, θ):→ Constraint:

∑q αq = 1

αq =∑

i

τiq/n

θql = arg maxθ

∑i,j

τiqτjl log fθ(Xij)

→ Simple expression of θql for classical distributions (weighted MLE).

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 12 / 28

Page 24: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

2 Steps Iterative Algorithm

Maximize pseudo-likelihood:

J((α, θ), (τi)i=1..n) = −∑

i

∑q

τiq ln τiq +∑

i

∑q

τiq lnαq +∑i<j

τiqτj` ln fθq` (Xij)

Step 1 Optimize J w.r.t. (τi):→ Constraint:

∑q τiq = 1 for all i;

→ τiq variational parameter found via a fixed point algorithm:

τiq ∝ αq

∏j,i

Q∏`=1

fθq` (Xij)τjl

Step 2 Optimize J w.r.t. (α, θ):→ Constraint:

∑q αq = 1

αq =∑

i

τiq/n

θql = arg maxθ

∑i,j

τiqτjl log fθ(Xij)

→ Simple expression of θql for classical distributions (weighted MLE).

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 12 / 28

Page 25: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Model Selection Criterion

BIC-like criterion to select the number of classes;

The likelihood can be split: L(X,Z|Q) = L(X|Z,Q) +L(Z|Q);

These terms can be penalized separately:

L(X|Z,Q) → penX|ZPQ log n(n − 1)

L(Z|Q) → penZ = (Q − 1) log(n)

ICL(Q) = maxθL(X, Z|θ,mQ) − 1

2

(PQ log n(n − 1) − (Q − 1) log(n)

)

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 13 / 28

Page 26: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Model Selection Criterion

BIC-like criterion to select the number of classes;

The likelihood can be split: L(X,Z|Q) = L(X|Z,Q) +L(Z|Q);

These terms can be penalized separately:

L(X|Z,Q) → penX|ZPQ log n(n − 1)

L(Z|Q) → penZ = (Q − 1) log(n)

ICL(Q) = maxθL(X, Z|θ,mQ) − 1

2

(PQ log n(n − 1) − (Q − 1) log(n)

)

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 13 / 28

Page 27: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

MixNet Properties

IdentifiabilityIdentifiablity of Parameters (Allman and al., 2009, 2011);

Model Selection criteria (Daudin and al., 2008, Latouche and al.,2011)

Quality of EstimatesVEM algorithm converge to a different optimum than ML in thegeneral case (Gunawardana and Byrne (2005)), except fordegenerated models;

SBM are in a certain sense degenerated: Pr(.|X)→ δZ (ongoingwork of Celisse and Daudin, Mariadassou and Matias)

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 14 / 28

Page 28: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

MixNet Properties

IdentifiabilityIdentifiablity of Parameters (Allman and al., 2009, 2011);

Model Selection criteria (Daudin and al., 2008, Latouche and al.,2011)

Quality of EstimatesVEM algorithm converge to a different optimum than ML in thegeneral case (Gunawardana and Byrne (2005)), except fordegenerated models;

SBM are in a certain sense degenerated: Pr(.|X)→ δZ (ongoingwork of Celisse and Daudin, Mariadassou and Matias)

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 14 / 28

Page 29: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Quality of the Estimates: Simulation Setup

→ Undirected graph with Q = 3 classes;

→ Poisson-valued edges;

→ n = 100, 500 vertices;

→ αq ∝ aq for a = 1, 0.5, 0.2;a = 1: balanced classes;a = 0.2: unbalanced classes (80.6%, 16.1%, 3.3%)

→ Connectivity matrix of the form

λ γλ γλ

γλ λ γλ

γλ γλ λ

for

γ = 0.1, 0.5, 0.9, 1.5 and λ = 2, 5.γ = 1: all classes equivalent (same connectivity pattern);γ , 1: classes are different;λ: mean value of an edge;

→ 100 repeats for each setup.

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 15 / 28

Page 30: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Quality of the Estimates: Results

Root Mean Square Error (RMSE) =√

Bias2 + Variance

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 16 / 28

Page 31: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Quality of the Estimates: Results

Root Mean Square Error (RMSE) =√

Bias2 + Variance

RMSE for the αq RMSE for the λql

1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

2 4 60

0.5

1

1.5

2

2.5

3

2 4 60

0.5

1

1.5

2

2.5

3

2 4 60

0.5

1

1.5

2

2.5

3

2 4 60

0.5

1

1.5

2

2.5

3

2 4 60

0.5

1

1.5

2

2.5

3

2 4 60

0.5

1

1.5

2

2.5

3

x-axis: α1, α2, α3 x-axis: λ11, λ22, λ33, λ12, λ13, λ23Top: n = 100, Bottom: n = 500

Left to right: a = 1, 0.5, 0.2Solid line: λ = 5, dashed line: λ = 2

Symbols depend on γ: ◦ = 0.1, O = 0.5, M= 0.9, ∗ = 1.5

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 16 / 28

Page 32: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Number of Classes

→ Undirected graph with Q? = 3 classes and Poisson edges;

→ n = 50, 100, 500, 1000 vertices;

→ αq = (57.1%, 28, 6%, 14, 3%);

→ Connectivity matrix of the form

2 1 11 2 11 1 2

Q

n 2 3 450 82 17 1

100 7 90 3500 0 100 0

1000 0 100 0

Table: Frequency of selected Q for various n.

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 17 / 28

Page 33: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Fungi Trees Interactions

Dataset Parisitic behavior of 154 fungi on 51 trees;

Network Valued Network on trees: Xtt′ = number of fungisinfecting both t and t′.

Goal Identify groups of trees sharing similar interactions: issimilarity driven by evolution or geography ?

Poisson Model We assumeXij|{Ziq = 1,Zj` = 1} ∼ P(λq`)

CovariatePhylogenetic relatedness measured by genetic\taxonomicdistance;Geographical relatedness measured by Jaccard distance;

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 18 / 28

Page 34: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

With no covariate (7 classes)

0 5 10 15 20 25 30 35 40 45 500

5

10

15

20

25

30

35

40

45

50

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 19 / 28

Page 35: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Groups of Trees: No Covariate

T1 T2 T3 T4 T5 T6 T7

ConipherophytaMagnoliophyta

0

4

8

12

Mean number of interactions

Group size and composition

Taxonomic rank: species ¡ genus ¡ family ¡ order ¡ class ¡ phylum;Strong effect of taxonomic rank on the group composition;Groups T1, T2, T3, T4 are even monofamily;Need to account for taxonomic distance.

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 20 / 28

Page 36: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Groups of Trees: No Covariate

T1 T2 T3 T4 T5 T6 T7

ConipherophytaMagnoliophyta

0

4

8

12

Mean number of interactions

Group size and composition

Taxonomic rank: species ¡ genus ¡ family ¡ order ¡ class ¡ phylum;Strong effect of taxonomic rank on the group composition;Groups T1, T2, T3, T4 are even monofamily;Need to account for taxonomic distance.

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 20 / 28

Page 37: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Groups of Trees: No Covariate (II)

λq` T1 T2 T3 T4 T5 T6 T7T1 14.46 4.19 5.99 7.67 2.44 0.13 1.43T2 4.19 14.13 0.68 2.79 4.84 0.53 1.54T3 5.99 0.68 3.19 4.10 0.66 0.02 0.69T4 7.67 2.79 4.10 7.42 2.57 0.04 1.05T5 2.44 4.84 0.66 2.57 3.64 0.23 0.83T6 0.13 0.53 0.02 0.04 0.23 0.04 0.06T7 1.43 1.54 0.69 1.05 0.83 0.06 0.27αq 7.8 7.8 13.7 13.7 15.7 19.6 21.6

T1, T2, T3, T4, T5: trees sharing lots of parasites;T6, T7: Trees with sharing few parasites with any other.

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 21 / 28

Page 38: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Groups of Trees: With Covariate

Model: Xij ∼ P(λqleβyij) with yij taxonomic distance

Q = 4 classes;β = −0.317;

T’1 T’2 T’3 T’4T1 0 0 0 4T2 0 0 0 4T3 2 5 0 0T4 0 2 0 5T5 0 2 0 6T6 0 0 10 0T7 7 2 2 0

λq` T’1 T’2 T’3 T’4T’1 0.75 2.46 0.40 3.77T’2 2.46 4.30 0.52 8.77T’3 0.40 0.52 0.080 1.05T’4 3.77 8.77 1.05 14.22αq 17.7 21.5 23.5 37.3

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 22 / 28

Page 39: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Groups of Trees: With Covariate

Model: Xij ∼ P(λqleβyij) with yij taxonomic distance

Q = 4 classes;β = −0.317;

T’1 T’2 T’3 T’4T1 0 0 0 4T2 0 0 0 4T3 2 5 0 0T4 0 2 0 5T5 0 2 0 6T6 0 0 10 0T7 7 2 2 0

T’1 T’2 T’3 T’4

ConipherophytaMagnoliophyta

0

5

10

15

20

Mean number of interactions

Group size and composition

λq` T’1 T’2 T’3 T’4T’1 0.75 2.46 0.40 3.77T’2 2.46 4.30 0.52 8.77T’3 0.40 0.52 0.080 1.05T’4 3.77 8.77 1.05 14.22αq 17.7 21.5 23.5 37.3

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 22 / 28

Page 40: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Goodness of fit

Check predictive power of the model for

Weighted degree Single Edge Value

0 20 40 60 80 100 120 140 160 1800

50

100

150

200

250

0 2 4 6 8 10 12 140

5

10

15

20

25

Ki =∑

j,i Xij Xij

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 23 / 28

Page 41: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Other covariates

Genetic distance: same effect than taxonomic distance;Jaccard distance: no effect;

→ Main sources of similarity in trees parasitic assemblages areevolutionary processes and not ecological processes.

Tree interaction networkFactor Covariate Q (PM) Q (PRMH) ∆ ICLPhylogenetic Taxonomic Distance 7 4 116.0relatedness Genetic distance 7 4 94.8Geographical Jaccard distance 7 7 -8.6overlap

Table: Effect of covariates. ∆ ICL = gain of switching from PM to PRMH.

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 24 / 28

Page 42: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Other covariates

Genetic distance: same effect than taxonomic distance;Jaccard distance: no effect;

→ Main sources of similarity in trees parasitic assemblages areevolutionary processes and not ecological processes.

Tree interaction networkFactor Covariate Q (PM) Q (PRMH) ∆ ICLPhylogenetic Taxonomic Distance 7 4 116.0relatedness Genetic distance 7 4 94.8Geographical Jaccard distance 7 7 -8.6overlap

Table: Effect of covariates. ∆ ICL = gain of switching from PM to PRMH.

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 24 / 28

Page 43: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Summary

MixNetFlexible probabilistic model to detect structure in complex valuedgraphs;Pseudo-likelihood estimators computed through variational EM(consistency ?);A statistical model selection criteria for the number of classes;Package available athttp://pbil.univ-lyon1.fr/software/MixNet.

Host-Parasite NetworkSimilarity in parasitic assemblages of two trees explained byphylogenetic relatedness, not geographical overlap.

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 25 / 28

Page 44: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Summary

MixNetFlexible probabilistic model to detect structure in complex valuedgraphs;Pseudo-likelihood estimators computed through variational EM(consistency ?);A statistical model selection criteria for the number of classes;Package available athttp://pbil.univ-lyon1.fr/software/MixNet.

Host-Parasite NetworkSimilarity in parasitic assemblages of two trees explained byphylogenetic relatedness, not geographical overlap.

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 25 / 28

Page 45: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

E. Coli Reaction Network

Reaction Network of E.Coli:→ data from http://www.biocyc.org/,→ n = 605 vertices (reactions) and 1 782 edges.→ 2 reactions i and j are connected if the product of i is the substrate

of j (cofactors excluded),→ V. Lacroix and M.-F. Sagot (INRIA - Helix).

Question:→ Interpretation of the connectivity structure of classes?

MixNet results:→ ICL gives Q = 21 classes,→ Most classes correspond to pseudo-cliques,

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 26 / 28

Page 46: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Biological interpretation of the groups I

Dot-plot representation→ adjacency matrix (sorted)

Biological interpretation:→ Groups 1 to 20 gather

reactions involving all thesame compound either as asubstrate or as a product,

→ A compound (chorismate,pyruvate, ATP,etc) can beassociated to each group.

The structure of the metabolicnetwork is governed by thecompounds.

0 100 200 300 400 500 6000

100

200

300

400

500

600

0 0.5 10

100

200

300

400

500

600

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 27 / 28

Page 47: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/1105-Singapore.pdf · Exponential Models ERGM (Holland & Leinhardt 81).!Localstructure induced

Biological interpretation of the groups II

→ Classes 1 and 16 constitute ssingle clique corresponding toa single compound (pyruvate),

→ They are split into two classesbecause they interactdifferently with classes 7(CO2) and 10 (AcetylCoA)

→ Connectivity matrix (sample):q, l 1 7 10 161 1.07 .11 .6510 .43 .6716 1.0 .01 ε 1.0

Adjacency matrix (sample)

Mariadassou (INRA) Uncovering Structure in Valued Graphs May 11 28 / 28