some surprises in the theory of generalized belief propagation jonathan yedidia mitsubishi electric...

46
Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT) Yair Weiss (Hebrew University) See “Constructing Free Energy Approximations and Generalized Belief Propagation Algorithms,” MERL TR2004-040, to be published in IEEE Trans. Info. Theory.

Upload: jewel-hudson

Post on 16-Jan-2016

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Some Surprises in the Theory of Generalized Belief Propagation

Jonathan Yedidia Mitsubishi Electric Research Labs (MERL)

Collaborators:Bill Freeman (MIT)

Yair Weiss (Hebrew University)

See “Constructing Free Energy Approximations and Generalized Belief Propagation Algorithms,” MERL TR2004-040, to be published in IEEE Trans. Info. Theory.

Page 2: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Outline

• Introduction to GBP– Quick Review of Standard BP– Free Energies– Region Graphs and Valid

Approximations

• Some Surprises• Maxent Normal Approximations & a useful heuristic

Page 3: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Factor Graphs

A B C

1 2 3 4

a

aa XfZ

Xp )(1

)(

(Kschischang, et.al. 2001)

)(),,(),,(1

),,,( 44323214321 xfxxxfxxxfZ

xxxxp CBA

Page 4: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Computing Marginal Probabilities

• Decoding error-correcting codes

• Inference in Bayesian networks

• Computer vision

• Statistical physics of magnets

SXX

SS XpXp\

)()(

Fundamental for

Non-trivial because of the huge number of terms in the sum.

Page 5: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Error-correcting Codes

+ + +

Marginal Probabilities = A posteriori bit probabilities

(Tanner, 1981

Gallager, 1963)

Page 6: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Statistical Physics

Marginal Probabilities= local magnetization

Page 7: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Standard Belief Propagation

i

)(

)()(iNa

iiaii xmxb

“beliefs” “messages”

a

)( \)(

)()()(aNi aiNb

iibaaaa xmXfXb

The “belief” is the BP approximation of the marginal probability.

Page 8: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

BP Message-update Rules

Using ,)()(\

ia xX

aaii Xbxb we get

ai

ia xX iaNj ajNb

jjbaaiia xmXfxm\ \)( \)(

)()()(

i a=

Page 9: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Variational (Gibbs) Free Energy Kullback-Leibler Distance:

X Xp

XbXbpbD

)(

)(ln)()||(

“Boltzmann’s Law” (serves to define the “energy”):

)(exp1

)(1

)( XEZ

XfZ

Xpa

aa

X X

ZXbXbXEXbpbD ln)(ln)()()()||(

Variational Free Energy is minimized when

)(bU )(bH

)()( XpXb bG

Page 10: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

So we’ve replaced the intractable problem of computing marginal probabilities with the even more intractable problem of minimizing a variational free energy over the space of possible beliefs.

But the point is that now we can introduce interesting approximations.

Page 11: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Region-based Approximations to the Variational Free Energy

)]([ XbG

)}]([{ rr XbG

Exact:

Regions:

(intractable)

(Kikuchi, 1951)

Page 12: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Defining a “Region” A region r is a set of variable nodes Vr

and factor nodes Fr such that if a factor node a belongs to Fr , all variable nodes neighboring a must belong to Vr.

Regions

Not a Region

Page 13: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Region Definitions

rX

aFa

arr XfXEr

ln

rr Xb

rX

rrrrr XEXbbUr

rX

rrrrr XbXbbHr

ln

Region states:

Region beliefs:

Region energy:

Region average energy:

Region entropy:

Region free energy: rrrrrr bHbUbG

Page 14: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

“Valid” ApproximationsIntroduce a set of regions R, and a counting number cr for each region r in R, such that cr=1 for the largest regions, and for every factor node a and variable node i,

Rr

rrrr ViIcFaIc 1

Count every node once!

Rr

rrrr bGcbG }{

Indicator functions

Page 15: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Entropy and Energy

• : Counting each factor node once makes the approximate energy exact (if the beliefs are).

• : Counting each variable node once makes the approximate entropy reasonable (at least the entropy is correct when all states are equiprobable).

1rr FaIc

1rr ViIc

Page 16: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Comments• We could actually use different counting

numbers for energy and entropy—leads to “fractional BP” (Weigerinck & Heskes 2002) or “convexified free energy” (Wainwright et.al., 2002) algorithms.

• Of course, we also need to impose normalization and consistency constraints on our free energy.

Page 17: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Methods to Generate Valid Region-based Approximations

Cluster Variational

Method

(Kikuchi)

Junction graphs

Aji-McEliece

Region Graphs

Bethe

(Bethe is example of Kikuchi for Factor graphs with no 4-cycles; Bethe is example of Aji-McEliece for “normal” factor graphs.)

Junction trees

Page 18: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Example of a Region Graph

A,C,1,2,4,5 B,D,2,3,5,6 C,E,4,5,7,8 D,F,5,6,8,9

2,5 C,4,5 D,5,6 5,8

5[5] is a child of [2,5]

Page 19: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Definition of a Region Graph

• Labeled, directed graph of regions.• Arc may exist from region A to region B

if B is a subset of A. • Sub-graphs formed from regions

containing a given node are connected.• where is the set of

ancestors of region r. (Mobius Function)• We insist that

)(

1rAs

sr cc )(rA

.1:,

Rr

rrrr ViIcFaIcia

Page 20: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Bethe Method

Two sets of regions:

Large regions containing a single factor node a and all attached variable nodes.

Small regions containing a single variable node i.

3

6

9

1 2

4 5

7 8

A B

C D

E F

A;1,2,4,5 B;2,3,5,6 C;4,5 D;5,6 E;4,5,7,8 F;5,6,8,9

1 2 3 4 5 6 7 8 9

1rc

ir dc 1

(after Bethe, 1935)

Page 21: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Bethe Approximation to Gibbs Free Energy

ia xiiii

ii

a aa

aa

XaaBethe xbxbd

Xf

XbXbG ln1ln

Equal to the exact Gibbs free energy when the factor graph is a tree because in that case,

id

iii

aaa xbXbXb

1

)(

Page 22: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Minimizing the Bethe Free Energy

a xXiiaa

aNi xiai

xii

iiBethe

iai

i

xbXbx

xbGL

\)(

}1)({

0)(

ii xb

L

)(

)(1

1exp)(

iNaiai

iii x

dxb

0)(

aa Xb

L

)(

)()(exp)(aNi

iaiaaaa xXEXb

Page 23: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Bethe = BP

aiNb

iibiai xmx)(

)(ln)(Identify

)(

)()(iNa

iiaii xmxb

to obtain BP equations:

)( \)(

)()()(aNi aiNb

iibaaaa xmXfXb

i

Page 24: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Cluster Variation Method

Form a region graph with an arbitrary number of different sized regions. Start with largest regions.

Then find intersection regions of the largest regions, discarding any regions that are sub-regions of other intersection regions.

Continue finding intersections of those intersection regions, etc.

All intersection regions obey , where

S(r ) is the set of super-regions of region r.

1rc

)(

1rSs

sr cc

(Kikuchi, 1951)

Page 25: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Region Graph Created Using CVM

A,C,1,2,4,5 B,D,2,3,5,6 C,E,4,5,7,8 D,F,5,6,8,9

2,5 C,4,5 D,5,6 5,8

5

Page 26: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Minimizing a Region Graph Free Energy

• Minimization is possible, but it may be awkward because of all the constraints that must be satisfied.

• We introduce generalized belief propagation algorithms whose fixed points are provably identical to the stationary points of the region graph free energy.

Page 27: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Generalized Belief Propagation

• Belief in a region is the product of:– Local information (factors in region)– Messages from parent regions– Messages into descendant regions from

parents who are not descendants.

• Message-update rules obtained by enforcing marginalization constraints.

Page 28: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

58

2356 4578 5689

25 45 56

1245

5

Generalized Belief Propagation

1 2 3

4 5 6

7 8 9

Page 29: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

58

2356 4578 5689

25 45 56

1245

5

Generalized Belief Propagation

585654525 mmmmb

1 2 3

4 5 6

7 8 9

Page 30: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

58

2356 4578 5689

25 45 56

1245

5

Generalized Belief Propagation

]][[ 585652457845124545 mmmmmfb

1 2 3

4 5 6

7 8 9

Page 31: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

58

2356 4578 5689

25 45 56

1245

5

Generalized Belief Propagation

1 2 3

4 5 6

7 8 9][ 452514121245 ffffb ][ 585645782536 mmmm

Page 32: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

1 2 3

4 5 6

7 8 9

Generalized Belief Propagation

1 2 3

4 5 6

7 8 9

4

),()( 544555x

xxbxb

=

Use Marginalization Constraints to Derive Message-Update Rules

Page 33: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

1 2 3

4 5 6

7 8 9

Generalized Belief Propagation

1 2 3

4 5 6

7 8 9

4

),()( 544555x

xxbxb

=

Use Marginalization Constraints to Derive Message-Update Rules

Page 34: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

1 2 3

4 5 6

7 8 9

Generalized Belief Propagation

1 2 3

4 5 6

7 8 9

4

),()( 544555x

xxbxb

=

Use Marginalization Constraints to Derive Message-Update Rules

Page 35: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

1 2 3

4 5 6

7 8 9

Generalized Belief Propagation

1 2 3

4 5 6

7 8 9

4

),(),(),()( 5445785445125445554x

xxmxxmxxfxm

=

Use Marginalization Constraints to Derive Message-Update Rules

Page 36: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

(Mild) Surprise #1• Region beliefs (even those given by

Bethe/BP) may not be realizable as the marginals of any global belief.

1.4.

4.1.),(

4.1.

1.4.),(),(

32

3121

xxb

xxbxxb

c

ba

a

c

b

1

2 3

is a perfectly acceptable solution to BP, but cannot arise from any ),,( 321 xxxb

Page 37: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

(Minor) Surprise #2

• For some sets of beliefs (sometimes even when the marginal beliefs are the exactly correct ones), the Bethe entropy is negative!

2ln22ln)2(42ln6 H

Each large region has counting number 1, each small region has counting number -2, so if the beliefs are

then the entropy is negative:

5.0

05.),(

5.5.)(

ji

i

xxb

xb

Page 38: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

(Serious) Surprise #3

• When there are no interactions, the minimum of the CVM free energy should correspond to the equiprobable global distribution, but sometimes it doesn’t!

Page 39: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

ExampleFully connected pairwise model, using CVM and all triplets as the largest regions, with pairs and singlets as the intersection regions.

2/)3)(2(

32/)1(

12/)2)(1(

NNN

NNN

NNNTriplets

Pairs

Singlets

# of regions Counting #

Page 40: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Two simple distributions

• Equiprobable distribution: each state is equally probable for all beliefs.– e.g.

• Distribution obtained by marginalizing the global distribution that only allows all-zeros or all-ones. “Equipolarized distribution”– e.g.

25.25.

25.25.),( ji xxb

5.0

05.),( ji xxb

Each region gives 2lnrr nH

Each region gives 2lnrH

Page 41: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Surprise #3 (cont.)

• Surprisingly, the “equipolarized” distribution can have a greater entropy than the equiprobable distribution for some valid CVM approximations.

• This is a serious problem, because if the model gets the wrong answer without any interactions, it can’t be expected to be correct with interactions.

Page 42: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

The Fix: Consider only Maxent-Normal Approximations

• A maxent-normal approximation is one which gives a maximum of the entropy when the beliefs correspond to the equiprobable distribution.

• Bethe approximations are provably maxent-normal.

• Some other CVM and other region-graph approximations are also provably maxent-normal. – Using quartets on square lattices – Empirically, these approximations give good results

• Other valid CVM or region-graph approximations are not maxent-normal.– Empirically, these approximations give poor results

Page 43: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

A Very Simple Heuristic

• Sum all the counting numbers. – If the sum is greater than N, the equipolarized

distribution will have a greater entropy than the equiprobable distribution. (Too Much—Very Bad)

– If the sum is less than 0, the equipolarized distribution will have a negative entropy. (Too Little)

– If the sum equals 1, the equipolarized distribution will have the correct entropy. (Just Right)

Page 44: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

10x10 Ising Spin Glass

Random interactions

Random fields

Page 45: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)
Page 46: Some Surprises in the Theory of Generalized Belief Propagation Jonathan Yedidia Mitsubishi Electric Research Labs (MERL) Collaborators: Bill Freeman (MIT)

Conclusions• Standard BP essentially equivalent to minimizing the

Bethe free energy.• Bethe method and cluster variation method are special

cases of the more general region graph method for generating valid free energy approximations.

• GBP is essentially equivalent to minimizing region graph free energy.

• One should be careful to use maxent-normal approximations.

• Sometimes you can prove that your approximation is maxent-normal; and simple heuristics can be used to prove when it isn’t.