stat 375: inference in graphical models lectures 11, 12 · stat 375: inference in graphical models...

72

Upload: others

Post on 31-May-2020

18 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Stat 375: Inference in Graphical Models

Lectures 11, 12

Andrea Montanari

Stanford University

May 13, 2012

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 1 / 46

Page 2: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Will focus on Ising models

x1

x2 x3 x4

x5x6

x7x8x9

x10

x11x12

G = (V ;E), V = [n ], x = (x1; : : : ; xn), xi 2 f+1;�1g

�G;�(x ) =1

ZG;�exp

n X(i ;j )2E

�ij xixj +Xi2V

�ixio:

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 2 / 46

Page 3: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Outline

1 Learning graphical models: Setting

2 Parameter learning

3 Maximum likelihood

4 Structural learning

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 3 / 46

Page 4: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Learning graphical models: Setting

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 4 / 46

Page 5: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Learning

You are given

x (1); x (2); : : : ; x (n) �i:i:d: �G;�( � )

Question: Estimate �, G

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 5 / 46

Page 6: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Some notation

Number of vertices p

Number of samples n

When necessary, `truth' will be indicated by ��

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 6 / 46

Page 7: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Three critiques of this setting

I The actual distribution is only roughly Ising.

I Variables only observed partially/corrupted.

I Sampling is hard. How can you hope to get i.i.d. samples?

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 7 / 46

Page 8: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

A toy example: US Senate

Number of senators p = 100

Number of bills in 2004� 2006 n = 542

Voting pattern on bill p x (`) 2 f+1;�1gp

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 8 / 46

Page 9: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

A toy example: US Senate

[O.Banerjee, L.El Ghaoui, A.d'Aspremont, J.Mach.Learn.Res., 2008]

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 9 / 46

Page 10: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Parameter learning

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 10 / 46

Page 11: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Is it at all possible?

De�ne, for each i 2 V , fi ; j g � V

e i (xi ) ��(xi ;+1V ni )

�(+1V );

e i ;j (xi ; xj ) ��(xi ; xj ;+1V nfi ;jg)�(+1V )

�(xi ;+1V ni )�(xj ;+1V nj ):

By Hammersley-Cli�ord

�(x ) = �(+1V )Y

fi ;jg2V

e ij (xi ; xj ) Yi2V

e i (xi ) ;e ij ( � ; � ) 6= 1) (i ; j ) 2 E

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 11 / 46

Page 12: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Is it at all possible?

De�ne, for each i 2 V , fi ; j g � V

e i (xi ) ��(xi ;+1V ni )

�(+1V );

e i ;j (xi ; xj ) ��(xi ; xj ;+1V nfi ;jg)�(+1V )

�(xi ;+1V ni )�(xj ;+1V nj ):

By Hammersley-Cli�ord

�(x ) = �(+1V )Y

fi ;jg2V

e ij (xi ; xj ) Yi2V

e i (xi ) ;e ij ( � ; � ) 6= 1) (i ; j ) 2 E

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 11 / 46

Page 13: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

We can estimate these from samples

b�(n)(x ) =1

n

nX`=1

I(x (`) = x )

b (n)i (xi ) �b�(n)(xi ;+1V ni )b�(n)(+1V )

;

b (n)i ;j (xi ; xj ) �b�(n)(xi ; xj ;+1V nfi ;jg) b�(n)(+1V )b�(n)(xi ;+1V ni ) b�(n)(xj ;+1V nj ) :

limn!1

b (n) = e Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 12 / 46

Page 14: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

We can estimate these from samples

b�(n)(x ) =1

n

nX`=1

I(x (`) = x )

b (n)i (xi ) �b�(n)(xi ;+1V ni )b�(n)(+1V )

;

b (n)i ;j (xi ; xj ) �b�(n)(xi ; xj ;+1V nfi ;jg) b�(n)(+1V )b�(n)(xi ;+1V ni ) b�(n)(xj ;+1V nj ) :

limn!1

b (n) = e Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 12 / 46

Page 15: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

We can estimate these from samples

b�(n)(x ) =1

n

nX`=1

I(x (`) = x )

b (n)i (xi ) �b�(n)(xi ;+1V ni )b�(n)(+1V )

;

b (n)i ;j (xi ; xj ) �b�(n)(xi ; xj ;+1V nfi ;jg) b�(n)(+1V )b�(n)(xi ;+1V ni ) b�(n)(xj ;+1V nj ) :

limn!1

b (n) = e Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 12 / 46

Page 16: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

How big is 1?

Exercise: Let X1; : : :Xn �i:i:d: Bernoulli(q) with q 2 (0; 1=2]. Thesmallest n to estimate q with multiplicative accuracy ", withprobability at least 1� � is

n �C

q"2log(1=�)

Hint: Use the large deviations estimate

Pn 1n

nXi=1

Xi � ro� e�nD(r jjq) :

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 13 / 46

Page 17: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

We want this with multiplicative error "

b�(n)(x ) =1

n

nX`=1

I(x (`) = x )

n &1

�(x )= e�(p)

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 14 / 46

Page 18: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

We want this with multiplicative error "

b�(n)(x ) =1

n

nX`=1

I(x (`) = x )

n &1

�(x )= e�(p)

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 14 / 46

Page 19: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Wait a minute

For each i 2 V , fi ; j g � V

e i (xi ) ��(xi ;+1V ni )

�(+1V );

e i ;j (xi ; xj ) ��(xi ; xj ;+1V nfi ;jg)�(+1V )

�(xi ;+1V ni )�(xj ;+1V nj ):

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 15 / 46

Page 20: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Wait a minute

For each i 2 V , fi ; j g � V

e i (xi ) ��(xi ;+1@i )

�(+1i[@i );

e i ;j (xi ; xj ) ��(xi ; xj + 1@ij )�(+1fi ;jg[@ij )

�(xi ;+1fjg[@ij )�(xj ;+1fig[@ij ):

Su�cient to estimate b�@ij[fi ;jg:

n �C

"2minjS j�3k ;xS �S (xS )log(1=�)

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 16 / 46

Page 21: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Wait a minute

For each i 2 V , fi ; j g � V

e i (xi ) ��(xi ;+1@i )

�(+1i[@i );

e i ;j (xi ; xj ) ��(xi ; xj + 1@ij )�(+1fi ;jg[@ij )

�(xi ;+1fjg[@ij )�(xj ;+1fig[@ij ):

Su�cient to estimate b�@ij[fi ;jg:

n �C

"2minjS j�3k ;xS �S (xS )log(1=�)

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 16 / 46

Page 22: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Sample complexity

Theorem (P.Abeel, D.Koller, A.Ng, J. Mach. Learn. Res., 2006)

Assume

I maxi2V degG(i) = k;

I min(ij )2E minxi ;xj ij (xi ; xj ) � �, mini2V minxi i (xi ) � �.

Then, with probability at least 1� �, we can learn all the 's withrelative accuracy " provided

n �C (�)k

"2log

�p�

�:

Further, under these assumptions D(�jjb�) +D(b�jj�) � C jE j".

Proof.

Lower bound minjS j�3k ;xS �S (xS ) plus union bound.

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 17 / 46

Page 23: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Sample complexity

Theorem (P.Abeel, D.Koller, A.Ng, J. Mach. Learn. Res., 2006)

Assume

I maxi2V degG(i) = k;

I min(ij )2E minxi ;xj ij (xi ; xj ) � �, mini2V minxi i (xi ) � �.

Then, with probability at least 1� �, we can learn all the 's withrelative accuracy " provided

n �C (�)k

"2log

�p�

�:

Further, under these assumptions D(�jjb�) +D(b�jj�) � C jE j".

Proof.

Lower bound minjS j�3k ;xS �S (xS ) plus union bound.

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 17 / 46

Page 24: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

How good is this?

I Number of bits to specify the 's = jE j log(1=(�")).

I Number of bits per sample = p

n �jE j

plog

� 1

�"

�=

k

2log

� 1

�"

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 18 / 46

Page 25: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

How good is this?

I Number of bits to specify the 's = jE j log(1=(�")).

I Number of bits per sample = p

n �jE j

plog

� 1

�"

�=

k

2log

� 1

�"

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 18 / 46

Page 26: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Pretty good, isn't it?

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 19 / 46

Page 27: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Pretty good, isn't it?

We are assuming that G is known!

[But see later]

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 20 / 46

Page 28: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Maximum likelihood

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 21 / 46

Page 29: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Another approach

Likelihood

Pn ;G;�

�x (1); : : : ; x (n)

=

nY`=1

�G;�(x(`))

=1

ZG(�)nexp

n X(i ;j )2E

�ij

nX`=1

x(`)i x

(`)j +

Xi2V

�i

nX`=1

x(`)i

o

Ln(�; fx(`)g) � �

1

nlogPn ;G;�

�x (1); : : : ; x (n)

= �hcM ; �i+ �(�)

cMi =1

n

nX`=1

x(`)i ; cMij =

1

n

nX`=1

x(`)i x

(`)j

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 22 / 46

Page 30: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Another approach

Likelihood

Pn ;G;�

�x (1); : : : ; x (n)

=

nY`=1

�G;�(x(`))

=1

ZG(�)nexp

n X(i ;j )2E

�ij

nX`=1

x(`)i x

(`)j +

Xi2V

�i

nX`=1

x(`)i

o

Ln(�; fx(`)g) � �

1

nlogPn ;G;�

�x (1); : : : ; x (n)

= �hcM ; �i+ �(�)

cMi =1

n

nX`=1

x(`)i ; cMij =

1

n

nX`=1

x(`)i x

(`)j

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 22 / 46

Page 31: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Maximum likelihood

b�(n) = b�(fx (`)g1�`�n) � argmin�Ln(�; fx

(`)g)

Unique : � 7! Ln(�; fx(`)g) is strictly convex;

Consistent : As n !1, cM !M , Mi = E��fxig, Mij = E��fxixj g

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 23 / 46

Page 32: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Maximum likelihood

b�(n) = b�(fx (`)g1�`�n) � argmin�Ln(�; fx

(`)g)

Unique : � 7! Ln(�; fx(`)g) is strictly convex;

Consistent : As n !1, cM !M , Mi = E��fxig, Mij = E��fxixj g

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 23 / 46

Page 33: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Maximum likelihood

b�(n) = b�(fx (`)g1�`�n) � argmin�Ln(�; fx

(`)g)

Unique : � 7! Ln(�; fx(`)g) is strictly convex;

Consistent : As n !1, cM !M , Mi = E��fxig, Mij = E��fxixj g

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 23 / 46

Page 34: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Proof of consistency

�(b�)� hcM ; b�i � �(�)� hcM ; �i

By strict convexity, for � > 0

�(�0) � �(�) + hM ; (�0 � �)i+�

2k�0 � �k22

Hence

hM ; (b� � �)i+ �

2kb� � �k22 � hcM ; b�i � �hcM ; �i

2kb� � �k22 � h(cM �M ); (b� � �)i � kcM �Mk2kb� � �k2

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 24 / 46

Page 35: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Proof of consistency

�(b�)� hcM ; b�i � �(�)� hcM ; �i

By strict convexity, for � > 0

�(�0) � �(�) + hM ; (�0 � �)i+�

2k�0 � �k22

Hence

hM ; (b� � �)i+ �

2kb� � �k22 � hcM ; b�i � �hcM ; �i

2kb� � �k22 � h(cM �M ); (b� � �)i � kcM �Mk2kb� � �k2

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 24 / 46

Page 36: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Proof of consistency

�(b�)� hcM ; b�i � �(�)� hcM ; �i

By strict convexity, for � > 0

�(�0) � �(�) + hM ; (�0 � �)i+�

2k�0 � �k22

Hence

hM ; (b� � �)i+ �

2kb� � �k22 � hcM ; b�i � �hcM ; �i

2kb� � �k22 � h(cM �M ); (b� � �)i � kcM �Mk2kb� � �k2

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 24 / 46

Page 37: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Proof of consistency

�(b�)� hcM ; b�i � �(�)� hcM ; �i

By strict convexity, for � > 0

�(�0) � �(�) + hM ; (�0 � �)i+�

2k�0 � �k22

Hence

hM ; (b� � �)i+ �

2kb� � �k22 � hcM ; b�i � �hcM ; �i

2kb� � �k22 � h(cM �M ); (b� � �)i � kcM �Mk2kb� � �k2

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 24 / 46

Page 38: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

High-dimensional consistency bound

jE j = m

1

mkb� � �k22 �

4

�2mkcM �Mk22

�C

�2 n m

p

2

!

� � inf��min(r

2�(�))

with more work can eliminate the inf�.

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 25 / 46

Page 39: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

High-dimensional consistency bound

jE j = m

1

mkb� � �k22 �

4

�2mkcM �Mk22

�C

�2 n m

p

2

!

� � inf��min(r

2�(�))

with more work can eliminate the inf�.

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 25 / 46

Page 40: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

High-dimensional consistency bound

jE j = m

1

mkb� � �k22 �

4

�2mkcM �Mk22

�C

�2 n m

p

2

!

� � inf��min(r

2�(�))

with more work can eliminate the inf�.

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 25 / 46

Page 41: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Sounds good, right?

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 26 / 46

Page 42: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Sounds good, right?

Approximating �(�) is hard!

Bad sample complexity for sparse graphs!

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 27 / 46

Page 43: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Structural learning

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 28 / 46

Page 44: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Structural learning

nAlg(G ; �) � inf�n 2 N : Pn ;G;�fAlg(x

(1); : : : ; x (n)) = Gg � 1� �;

�Alg(G ; �) � # operations of Alg when run on nAlg(G ; �) samples

Typically, we assume G sparse

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 29 / 46

Page 45: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

How would you modify maximum likelihod?

minimize L(�; fx (`)g)

subject to k�k0 � m

Intractable!

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 30 / 46

Page 46: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

How would you modify maximum likelihod?

minimize L(�; fx (`)g)

subject to k�k0 � m

Intractable!

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 30 / 46

Page 47: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

`1-regularized maximum likelihood

b� = argmin�

L(�; fx (`)) + �k�k1

= �hcM ; �i+ �(�) + �k�k1

[cf. J.Friedman, T.Hastie, R.Tibshirani, Biostatistics, 2008]

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 31 / 46

Page 48: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

`1-regularized maximum likelihood

b� = argmin�

L(�; fx (`)) + �k�k1

= �hcM ; �i+ �(�) + �k�k1

[cf. J.Friedman, T.Hastie, R.Tibshirani, Biostatistics, 2008]

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 31 / 46

Page 49: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

`1-regularized maximum likelihood

b� = argmin�

L(�; fx (`)) + �k�k1

= �hcM ; �i+ �(�) + �k�k1

[cf. J.Friedman, T.Hastie, R.Tibshirani, Biostatistics, 2008]

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 31 / 46

Page 50: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Local independence test

Idea: For each i 2 V , and for any candidate neighborood S ,

test independence of xi and xV nSi , Si � S [ fig.

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 32 / 46

Page 51: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

A possible implementation

Local Independence Test( samples fx (`)g )

1: For each i 2 V ;2: For each S � V n fig, jS j � k ,

3: Compute Score(S ; i) = cH (Xi jXS );4: Set S� = argminS Score(S ; i) and connect i to all j 2 S�;5: Prune the resulting graph.

[P.Abeel, D.Koller, A.Ng, 2006]

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 33 / 46

Page 52: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Another implementation

Score(S ; i) � minW�V nS ;j2S

maxxi ;xW ;xS ;xj��bPn ;G;�fXi = xi jXW = xW ;XS = xSg�bPn ;G;�fXi = xi jXW = xW ;XSnj = xSnj ;Xj = zj g

�� :

[G.Bresler, E.Mossel and A.Sly, APPROX 2008]

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 34 / 46

Page 53: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Another implementation

Local Independence Test( samples fx (`)g, thresholds ("; ) )

1: For each i 2 V ;2: For each S � V n fig, jS j � k ,3: Compute Score(S ; i);4: S� = argmaxfjS j : Score(S ; i) > "g and connect i to all j 2 S�;

Nobody would ever use these in practice!!

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 35 / 46

Page 54: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Another implementation

Local Independence Test( samples fx (`)g, thresholds ("; ) )

1: For each i 2 V ;2: For each S � V n fig, jS j � k ,3: Compute Score(S ; i);4: S� = argmaxfjS j : Score(S ; i) > "g and connect i to all j 2 S�;

Nobody would ever use these in practice!!

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 35 / 46

Page 55: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Why?

nk+1 operations!

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 36 / 46

Page 56: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

For the sake of simplicity

�ij = �, �i = 0

�G;�(x ) =1

ZG(�)exp

n�

X(i ;j )2E

xixj

o

M = (Mij )1�i ;j�n ; Mij = EG;�fxixj g ; cMij =1

n

nX`=1

x(`)i x

(`)j

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 37 / 46

Page 57: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

For the sake of simplicity

�ij = �, �i = 0

�G;�(x ) =1

ZG(�)exp

n�

X(i ;j )2E

xixj

o

M = (Mij )1�i ;j�n ; Mij = EG;�fxixj g ; cMij =1

n

nX`=1

x(`)i x

(`)j

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 37 / 46

Page 58: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

A very simple algorithm

Thresholding( samples fx (`)g, threshold � )

1: Compute the empirical correlations fcMij g(i ;j )2V�V ;2: For each (i ; j ) 2 V �V

3: If cMij � � , set (i ; j ) 2 E ;

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 38 / 46

Page 59: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

And its analysis

Theorem

If G is a tree, and � (�) = (tanh� + tanh2 �)=2, then

nThr(�)(G ; �) �8

(tanh� � tanh2 �)2log

2p

�:

Theorem

If G has maximum degree k > 1 and if � < atanh(1=(2k)) then

nThr(�)(G ; �) �8

(tanh� � 12k

)2log

2p

�:

Theorem

If k > 3 and � > C=k, there are graphs such that for any � ,nThr(�) = 1.

[J.Bento and A.Montanari, NIPS 2009, and arXiv:1110.1769]Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 39 / 46

Page 60: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

And its analysis

Theorem

If G is a tree, and � (�) = (tanh� + tanh2 �)=2, then

nThr(�)(G ; �) �8

(tanh� � tanh2 �)2log

2p

�:

Theorem

If G has maximum degree k > 1 and if � < atanh(1=(2k)) then

nThr(�)(G ; �) �8

(tanh� � 12k

)2log

2p

�:

Theorem

If k > 3 and � > C=k, there are graphs such that for any � ,nThr(�) = 1.

[J.Bento and A.Montanari, NIPS 2009, and arXiv:1110.1769]Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 39 / 46

Page 61: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

And its analysis

Theorem

If G is a tree, and � (�) = (tanh� + tanh2 �)=2, then

nThr(�)(G ; �) �8

(tanh� � tanh2 �)2log

2p

�:

Theorem

If G has maximum degree k > 1 and if � < atanh(1=(2k)) then

nThr(�)(G ; �) �8

(tanh� � 12k

)2log

2p

�:

Theorem

If k > 3 and � > C=k, there are graphs such that for any � ,nThr(�) = 1.

[J.Bento and A.Montanari, NIPS 2009, and arXiv:1110.1769]Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 39 / 46

Page 62: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Basic intuition

Thresholding works if

min(i ;j )2E

Mij > max(kl)62E

Mkl

This is true at small � because. . .

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 40 / 46

Page 63: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

High temperature series

ZG(�) =X

H�G;even

� jE(H )j ;

EG;�fxixj g =1

ZG(�)

XH�G;odd(H )=fi ;jg;

� jE(H )j ;

� = tanh� :

Theorem (R.Gri�ths, J. Math. Phys., 1967)

EG;�fxixj g �X

2SAW(i!j )

� j j

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 41 / 46

Page 64: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

High temperature series

ZG(�) =X

H�G;even

� jE(H )j ;

EG;�fxixj g =1

ZG(�)

XH�G;odd(H )=fi ;jg;

� jE(H )j ;

� = tanh� :

Theorem (R.Gri�ths, J. Math. Phys., 1967)

EG;�fxixj g �X

2SAW(i!j )

� j j

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 41 / 46

Page 65: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Does not work always because

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 42 / 46

Page 66: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

This phenomenon is generic

Example: Regularized pseudo-likelihoods[Meinshausen , Bühlmann, Ann.Stat. 2006]

[P.Ravikumar, M.Wainwright, J.La�erty, Ann.Stat. 2010]

�(i) � f�i ;j : j 2 [p] n figg.

minimize �1

n

nX`=1

logP�fx(`)i jx

(`)@i g+ �k�(i)k1

The �rst therm only depends on �(i)! Has explicit expression!

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 43 / 46

Page 67: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

This phenomenon is generic

Example: Regularized pseudo-likelihoods[Meinshausen , Bühlmann, Ann.Stat. 2006]

[P.Ravikumar, M.Wainwright, J.La�erty, Ann.Stat. 2010]

�(i) � f�i ;j : j 2 [p] n figg.

minimize �1

n

nX`=1

logP�fx(`)i jx

(`)@i g+ �k�(i)k1

The �rst therm only depends on �(i)! Has explicit expression!

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 43 / 46

Page 68: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

This phenomenon is generic

Example: Regularized pseudo-likelihoods[Meinshausen , Bühlmann, Ann.Stat. 2006]

[P.Ravikumar, M.Wainwright, J.La�erty, Ann.Stat. 2010]

�(i) � f�i ;j : j 2 [p] n figg.

minimize �1

n

nX`=1

logP�fx(`)i jx

(`)@i g+ �k�(i)k1

The �rst therm only depends on �(i)! Has explicit expression!

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 43 / 46

Page 69: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

This phenomenon is generic

Example: Regularized pseudo-likelihoods[Meinshausen , Bühlmann, Ann.Stat. 2006]

[P.Ravikumar, M.Wainwright, J.La�erty, Ann.Stat. 2010]

�(i) � f�i ;j : j 2 [p] n figg.

minimize �1

n

nX`=1

logP�fx(`)i jx

(`)@i g+ �k�(i)k1

The �rst therm only depends on �(i)! Has explicit expression!

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 43 / 46

Page 70: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

A numerical experiment

Uniformly random graphs of degree k = 4

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 44 / 46

Page 71: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

A numerical experiment

Theorem (J.Bento, A.Montanari, 2010)

There exists C > 0 such that regularized pseudolikelihod fails

I On random regular graphs of deree k for all � > C=k.

I On random subgraphs of the 2d grid, for all � > C.

I On `double-star' graphs for all � > C, and n > n0

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 45 / 46

Page 72: Stat 375: Inference in Graphical Models Lectures 11, 12 · Stat 375: Inference in Graphical Models Lectures 11, 12 Andrea Montanari Stanford University May 13, 2012 Andrea Montanari

Structural learning

I Lots of open problems

Andrea Montanari (Stanford) Stat375: Lecture 11, 12 May 13, 2012 46 / 46