Chapter 8 Cluster Graph & Belief Propagation Probabilistic Graphical Models 2016 Fall


Post on 22-Sep-2020


Page 1: Chapter 8 Cluster Graph & Belief Propagationbioinfo.au.tsinghua.edu.cn/.../jgu/pgm/materials/Chapter8-ClusterGra… · Chapter 8 Cluster Graph & Belief Propagation Probabilistic Graphical

Chapter 8 Cluster Graph &

Belief Propagation

Probabilistic Graphical Models

2016 Fall


Outline

• Variable Elimination (VE)

– Simple case: linear chain Bayesian networks

– VE in complex graphs

– Inferences in HMMs and linear-chain CRFs

• Exact Inference: Clique Tree

– Cluster graph and clique tree

– Message passing: sum product

– Message passing: belief update

– Constructing clique tree


Outline

• From clique tree to loopy cluster graph

• Bethe cluster graph or factor graph

• Belief propagation as variational inference

• Extensions of belief propagation

– Generalized belief propagation

– Convex belief propagation

– Expectation propagation

– ......


Simple Case: VE in Chains

[Figure: chain Bayesian network A → B → C → D → E]

$P(e) = \sum_d \sum_c \sum_b \sum_a P(a, b, c, d, e)$


Simple Case: VE in Chains

• By the chain rule for probabilities:

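[Figure: chain Bayesian network A → B → C → D → E]

$P(e) = \sum_d \sum_c \sum_b \sum_a P(a, b, c, d, e)$

$= \sum_d \sum_c \sum_b \sum_a P(a)\,P(b \mid a)\,P(c \mid b)\,P(d \mid c)\,P(e \mid d)$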


Simple Case: VE in Chains

• Rearranging terms ...

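[Figure: chain Bayesian network A → B → C → D → E]

$P(e) = \sum_d \sum_c \sum_b \sum_a P(a)\,P(b \mid a)\,P(c \mid b)\,P(d \mid c)\,P(e \mid d)$

$= \sum_d \sum_c \sum_b P(c \mid b)\,P(d \mid c)\,P(e \mid d) \sum_a P(a)\,P(b \mid a)$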


Simple Case: VE in Chains

• Perform the innermost summation

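[Figure: chain A → B → C → D → E with node A crossed out]

$P(e) = \sum_d \sum_c \sum_b P(c \mid b)\,P(d \mid c)\,P(e \mid d) \sum_a P(a)\,P(b \mid a)$

$= \sum_d \sum_c \sum_b P(c \mid b)\,P(d \mid c)\,P(e \mid d)\,p(b)$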


Simple Case: VE in Chains

• Rearrange and then sum again

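[Figure: chain A → B → C → D → E with nodes A and B crossed out]

$P(e) = \sum_d \sum_c \sum_b P(c \mid b)\,P(d \mid c)\,P(e \mid d)\,p(b)$

$= \sum_d \sum_c P(d \mid c)\,P(e \mid d) \sum_b P(c \mid b)\,p(b)$

$= \sum_d \sum_c P(d \mid c)\,P(e \mid d)\,p(c)$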
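The chain derivation above can be sketched in a few lines of Python. This is only an illustration: the binary variables and the CPD numbers are assumptions made up for the example, and the same conditional table is reused for every link of the chain.

```python
from itertools import product

# Hypothetical CPDs for a binary chain A -> B -> C -> D -> E.
p_a = [0.6, 0.4]                      # P(a)
# cpd[x][y] = P(child = y | parent = x); reused for every link (assumption).
cpd = [[0.7, 0.3], [0.2, 0.8]]

def push(marginal, table):
    """Sum out the parent: p(y) = sum_x p(x) * P(y | x)."""
    return [sum(marginal[x] * table[x][y] for x in range(2)) for y in range(2)]

# Variable elimination: eliminate a, then b, then c, then d.
p = p_a
for _ in range(4):
    p = push(p, cpd)
p_e_ve = p

# Brute force for comparison: sum the full joint over a, b, c, d.
p_e_brute = [0.0, 0.0]
for a, b, c, d, e in product(range(2), repeat=5):
    p_e_brute[e] += p_a[a] * cpd[a][b] * cpd[b][c] * cpd[c][d] * cpd[d][e]
```

Each elimination step touches a 2 x 2 table, whereas the brute-force sum enumerates all 32 joint assignments; the gap grows exponentially with the chain length.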


General Bayesian Networks

[Figure: Bayesian network with nodes Visit (V), Smoking (S), Tuberculosis (T), Lung Cancer (L), Abnormality in Chest (A), Bronchitis (B), X-Ray (X), Dyspnea (D)]

Example: a simplified lung disease diagnostic network


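[Figure: network over V, S, T, L, A, B, X, D]

$P(V)\,P(S)\,P(T \mid V)\,P(L \mid S)\,P(B \mid S)\,P(A \mid T,L)\,P(X \mid A)\,P(D \mid A,B)$

• We want to compute P(D)

• Need to eliminate: V,S,X,T,L,A,B

Initial factors:

$\phi_V(V)\,\phi_S(S)\,\phi_T(T,V)\,\phi_L(L,S)\,\phi_B(B,S)\,\phi_A(A,T,L)\,\phi_X(X,A)\,\phi_D(D,A,B)$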


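• We want to compute P(D)

• Need to eliminate: V,S,X,T,L,A,B

Eliminate: V

Compute: $\tau_1(T) = \sum_V \phi_V(V)\,\phi_T(T,V)$

Note: τ1(T) = P(T)

Factors before: $\phi_V(V)\,\phi_S(S)\,\phi_T(T,V)\,\phi_L(L,S)\,\phi_B(B,S)\,\phi_A(A,T,L)\,\phi_X(X,A)\,\phi_D(D,A,B)$

Current factors: $\tau_1(T)\,\phi_S(S)\,\phi_L(L,S)\,\phi_B(B,S)\,\phi_A(A,T,L)\,\phi_X(X,A)\,\phi_D(D,A,B)$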


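• We want to compute P(D)

• Need to eliminate: S,X,T,L,A,B

Eliminate: S

Compute: $\tau_2(B,L) = \sum_S \phi_S(S)\,\phi_L(L,S)\,\phi_B(B,S)$

Factors before: $\tau_1(T)\,\phi_S(S)\,\phi_L(L,S)\,\phi_B(B,S)\,\phi_A(A,T,L)\,\phi_X(X,A)\,\phi_D(D,A,B)$

Current factors: $\tau_1(T)\,\tau_2(B,L)\,\phi_A(A,T,L)\,\phi_X(X,A)\,\phi_D(D,A,B)$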


• We want to compute P(D)

• Need to eliminate: X,T,L,A,B

Eliminate: X

Compute: $\tau_3(A) = \sum_X \phi_X(X,A)$

Note: τ3(A) = 1 for all values of A!

Factors before: $\tau_1(T)\,\tau_2(B,L)\,\phi_A(A,T,L)\,\phi_X(X,A)\,\phi_D(D,A,B)$

Current factors: $\tau_1(T)\,\tau_2(B,L)\,\phi_A(A,T,L)\,\tau_3(A)\,\phi_D(D,A,B)$


• We want to compute P(D)

• Need to eliminate: T,L,A,B

Eliminate: T

Compute: $\tau_4(A,L) = \sum_T \tau_1(T)\,\phi_A(A,T,L)$

Factors before: $\tau_1(T)\,\tau_2(B,L)\,\phi_A(A,T,L)\,\tau_3(A)\,\phi_D(D,A,B)$

Current factors: $\tau_4(A,L)\,\tau_2(B,L)\,\tau_3(A)\,\phi_D(D,A,B)$


• We want to compute P(D)

• Need to eliminate: L,A,B

Eliminate: L

Compute: $\tau_5(A,B) = \sum_L \tau_4(A,L)\,\tau_2(B,L)$

Factors before: $\tau_4(A,L)\,\tau_2(B,L)\,\tau_3(A)\,\phi_D(D,A,B)$

Current factors: $\tau_5(A,B)\,\tau_3(A)\,\phi_D(D,A,B)$


• We want to compute P(D)

• Need to eliminate: A,B

Eliminate: A

Compute: $\tau_6(B,D) = \sum_A \tau_5(A,B)\,\tau_3(A)\,\phi_D(D,A,B)$

Factors before: $\tau_5(A,B)\,\tau_3(A)\,\phi_D(D,A,B)$

Current factors: $\tau_6(B,D)$


• We want to compute P(D)

• Need to eliminate: B

Eliminate: B

Compute: $\tau_7(D) = \sum_B \tau_6(B,D)$

Note: τ7(D) is P(D)

Factors before: $\tau_6(B,D)$

Current factors: $\tau_7(D)$
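The seven elimination steps above can be reproduced with a small generic routine. The sketch below is an illustration under stated assumptions: all variables are binary, the CPD entries are random numbers (the slides give no values), and the helper names `make_cpd`, `multiply`, `sum_out` and `eliminate` are hypothetical.

```python
import random
from itertools import product

def make_cpd(child, parents, rng):
    """Random binary CPD P(child | parents) stored as (scope, table)."""
    scope = (child,) + tuple(parents)
    table = {}
    for pa in product(range(2), repeat=len(parents)):
        p = rng.uniform(0.1, 0.9)
        table[(0,) + pa] = p
        table[(1,) + pa] = 1.0 - p
    return scope, table

def multiply(f, g):
    """Factor product over the union of the two scopes."""
    (fs, ft), (gs, gt) = f, g
    scope = fs + tuple(v for v in gs if v not in fs)
    table = {}
    for vals in product(range(2), repeat=len(scope)):
        a = dict(zip(scope, vals))
        table[vals] = ft[tuple(a[v] for v in fs)] * gt[tuple(a[v] for v in gs)]
    return scope, table

def sum_out(f, var):
    """Marginalize one variable out of a factor."""
    fs, ft = f
    table = {}
    for vals, x in ft.items():
        key = tuple(val for v, val in zip(fs, vals) if v != var)
        table[key] = table.get(key, 0.0) + x
    return tuple(v for v in fs if v != var), table

def eliminate(factors, var):
    """Multiply all factors mentioning var, sum var out, return the new factor list."""
    used = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    prod = used[0]
    for f in used[1:]:
        prod = multiply(prod, f)
    tau = sum_out(prod, var)
    return rest + [tau], tau

rng = random.Random(0)
parents = {"V": [], "S": [], "T": ["V"], "L": ["S"], "B": ["S"],
           "A": ["T", "L"], "X": ["A"], "D": ["A", "B"]}
factors = [make_cpd(c, ps, rng) for c, ps in parents.items()]

taus = {}
for var in ["V", "S", "X", "T", "L", "A", "B"]:   # ordering used on the slides
    factors, taus[var] = eliminate(factors, var)
p_d = factors[0]   # the single remaining factor, tau_7(D) = P(D)
```

With any valid CPDs, `taus["X"]` comes out as the all-ones factor τ3(A) noted above, and the scopes of the intermediate factors match τ1(T), τ2(B,L), τ4(A,L), τ5(A,B) and τ6(B,D) up to argument order.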


Dealing with Evidence

• How do we deal with evidence?

• Suppose we observe the evidence V = v, S = s, D = d

• We want to compute P(L, V = v, S = s, D = d)

[Figure: network over V, S, T, L, A, B, X, D]


Dealing with Evidence

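• Compute P(L, V = v, S = s, D = d)

• Initial factors, after setting evidence:

$\phi_V(v)\,\phi_S(s)\,\phi_T(T,v)\,\phi_L(L,s)\,\phi_B(B,s)\,\phi_A(A,T,L)\,\phi_X(X,A)\,\phi_D(d,A,B)$

$= \tilde\phi_V\,\tilde\phi_S\,\tilde\phi_T(T)\,\tilde\phi_L(L)\,\tilde\phi_B(B)\,\phi_A(A,T,L)\,\phi_X(X,A)\,\tilde\phi_D(A,B)$

[Figure: network over V, S, T, L, A, B, X, D]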
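Setting evidence amounts to restricting each factor to the rows consistent with the observed values, which shrinks its scope. A minimal sketch follows; the factor layout `(scope, table)`, the function name `reduce_factor` and the numbers in the example CPD are all assumptions for illustration.

```python
def reduce_factor(factor, evidence):
    """Keep only entries consistent with the evidence and drop observed variables."""
    scope, table = factor
    keep = tuple(v for v in scope if v not in evidence)
    reduced = {}
    for vals, x in table.items():
        a = dict(zip(scope, vals))
        if all(a[v] == evidence[v] for v in scope if v in evidence):
            reduced[tuple(a[v] for v in keep)] = x
    return keep, reduced

# Hypothetical CPD P(T | V) stored with scope (T, V).
phi_t = (("T", "V"), {(0, 0): 0.99, (1, 0): 0.01, (0, 1): 0.95, (1, 1): 0.05})

# Evidence V = 1, S = 0, D = 1; only V is in this factor's scope.
phi_t_reduced = reduce_factor(phi_t, {"V": 1, "S": 0, "D": 1})
```

After reduction the factor over (T, V) becomes a factor over T alone, and variable elimination proceeds on the reduced factors exactly as before.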


Induced Graph in VE

• Want to compute P(D)

• Step 1: Moralizing

[Figure: the directed network (left) and its moralized graph (right), in which the co-parents T, L and A, B are connected]


Induced Graph in VE

• Want to compute P(D)

• Moralizing • Eliminating V

$\tau_1(T) = \sum_V \phi_V(V)\,\phi_T(T,V)$

[Figure: the network and its moralized graph with V eliminated]


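Induced Graph in VE

• Want to compute P(D)

• Moralizing • Eliminating V • Eliminating S

$\tau_2(B,L) = \sum_S \phi_S(S)\,\phi_L(L,S)\,\phi_B(B,S)$

[Figure: the network and its moralized graph with V and S eliminated; eliminating S connects its neighbors L and B]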


Induced Graph in VE

• Want to compute P(D)

• Moralizing • Eliminating V • Eliminating S • Eliminating X

$\tau_3(A) = \sum_X \phi_X(X,A)$

[Figure: the network and its moralized graph with V, S and X eliminated]


Induced Graph in VE

• Want to compute P(D)

• Moralizing • Eliminating V • Eliminating S • Eliminating X • Eliminating T

$\tau_4(A,L) = \sum_T \tau_1(T)\,\phi_A(A,T,L)$

[Figure: the network and its moralized graph with V, S, X and T eliminated]


Induced Graph in VE

• Want to compute P(D)

• Moralizing • Eliminating V • Eliminating S • Eliminating X • Eliminating T • Eliminating L

$\tau_5(A,B) = \sum_L \tau_4(A,L)\,\tau_2(B,L)$

[Figure: the network and its moralized graph with V, S, X, T and L eliminated]


Induced Graph in VE

• Want to compute P(D)

• Moralizing • Eliminating V • Eliminating S • Eliminating X • Eliminating T • Eliminating L • Eliminating A, B

$\tau_6(B,D) = \sum_A \tau_5(A,B)\,\tau_3(A)\,\phi_D(D,A,B)$

$\tau_7(D) = \sum_B \tau_6(B,D)$

[Figure: the network and its moralized graph with all variables except D eliminated]


Induced Graph in VE

[Figure: the induced graph over V, S, T, L, A, B, X, D]

The induced graph is:

1) Moralized (for a BN)

2) Chordal
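The induced graph can be computed mechanically: moralize, then for each eliminated variable connect its remaining neighbors. The sketch below assumes the lung network structure read off the factorization and the ordering V, S, X, T, L, A, B; the function names are hypothetical.

```python
from itertools import combinations

def moralize(parents):
    """Undirected edges of the moral graph: parent-child links plus co-parent links."""
    edges = set()
    for child, ps in parents.items():
        for p in ps:
            edges.add(frozenset((child, p)))
        for p, q in combinations(ps, 2):
            edges.add(frozenset((p, q)))
    return edges

def induced_graph(edges, order):
    """Add fill edges produced by eliminating variables in the given order."""
    edges = set(edges)
    fill, removed = [], set()
    for var in order:
        nbrs = {v for e in edges if var in e for v in e
                if v != var and v not in removed}
        for p, q in combinations(sorted(nbrs), 2):
            e = frozenset((p, q))
            if e not in edges:
                edges.add(e)
                fill.append((p, q))
        removed.add(var)
    return edges, fill

parents = {"V": [], "S": [], "T": ["V"], "L": ["S"], "B": ["S"],
           "A": ["T", "L"], "X": ["A"], "D": ["A", "B"]}
moral = moralize(parents)
induced, fill = induced_graph(moral, ["V", "S", "X", "T", "L", "A", "B"])
```

For this ordering the only fill edge is B-L, created when S is eliminated; it corresponds to the scope of τ2(B,L).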


VE: Inferences in HMMs and CRFs

• Please recall the graphical representations of HMMs, MEMMs and linear-chain CRFs

• Given X, the backbone of Y is the same

[Figure: two chain models over labels Y0, Y1, Y2, Y3 with observations X1, X2, X3]


VE: Forward Algorithm

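Compute $P(X \mid \theta)$

Define $\alpha_t(i) = P(x_1, \ldots, x_t, y_t = i \mid \theta)$

• Initialization: $\alpha_1(i) = \pi_i\, e_{i,x_1}$

• Induction: $\alpha_{t+1}(i) = \sum_{j=1}^{N} \alpha_t(j)\, t_{j,i}\, e_{i,x_{t+1}}$

• Termination: $P(X \mid \theta) = \sum_{i=1}^{N} \alpha_T(i)$

[Figure: HMM with states Y1, Y2, Y3 and observations X1, X2, X3]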
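The three steps of the forward algorithm translate directly into code. In this sketch the two-state HMM parameters (`pi`, `trans`, `emit`) and the observation sequence are made-up values, and the brute-force sum over all state paths is included only as a sanity check.

```python
from itertools import product

def forward(pi, trans, emit, obs):
    """Return P(obs | model) using alpha_t(i) = P(x_1..x_t, y_t = i)."""
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * e_{i, x_1}
    alpha = [pi[i] * emit[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(i) = sum_j alpha_t(j) * t_{j,i} * e_{i, x_{t+1}}
    for x in obs[1:]:
        alpha = [sum(alpha[j] * trans[j][i] for j in range(n)) * emit[i][x]
                 for i in range(n)]
    # Termination: sum over the final states
    return sum(alpha)

# Hypothetical parameters for a 2-state, 2-symbol HMM.
pi = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]   # trans[j][i] = P(y_{t+1} = i | y_t = j)
emit = [[0.8, 0.2], [0.3, 0.7]]    # emit[i][x] = P(x_t = x | y_t = i)
obs = [0, 1, 1]

likelihood = forward(pi, trans, emit, obs)

# Brute force: enumerate every state path (exponential in the sequence length).
brute = 0.0
for ys in product(range(2), repeat=len(obs)):
    p = pi[ys[0]] * emit[ys[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= trans[ys[t - 1]][ys[t]] * emit[ys[t]][obs[t]]
    brute += p
```

The forward pass costs O(T N^2) operations, which is variable elimination along the chain Y1, Y2, ..., YT.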


VE Disadvantages

• We need to traverse the whole graph for each run of inference

• Many intermediate results from previous runs of variable elimination can be re-used

– For example, if we have already run inference for P(D), then to infer P(X) the results of eliminating V, S, T, L, A, B can be re-used

– We can directly re-start from P(A)

[Figure: the lung disease network over V, S, T, L, A, B, X, D]


VE Disadvantages

• For the induced undirected graph of a BN (moralized & chordal), the basic structures are the (maximal) cliques

• If we can pre-calculate the marginal distributions defined on the maximal cliques, inference can avoid many re-calculations

[Figure: network over C, D, I, G, S, L, J, H]

Can we design an algorithm to achieve this goal?


Clique Tree: A Concrete Example

• Clique Tree

– Defined for a chordal graph

– A tree whose nodes are the (maximal) cliques

Two important features:

– Tree and family preserving

– Running intersection property

Clique tree: C1 {C,D} --[D]-- C2 {G,I,D} --[G,I]-- C3 {G,S,I} --[G,S]-- C5 {G,J,S,L} --[G,J]-- C4 {H,G,J}

[Figure: the underlying network over C, D, I, G, S, L, J, H]


Clique Tree: A Concrete Example

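• Tree and family preserving

– Factors 𝜙𝑖 are defined on cliques: every factor is assigned to a clique that contains its scope

– Each edge carries the sepset S_{i,j} = C_i ∩ C_j of the two cliques it directly connects

• Running intersection property

– The cliques that contain any given variable X form a single connected sub-path (subtree) of the tree

Clique tree: C1 {C,D} --[D]-- C2 {G,I,D} --[G,I]-- C3 {G,S,I} --[G,S]-- C5 {G,J,S,L} --[G,J]-- C4 {H,G,J}

[Figure: the underlying network over C, D, I, G, S, L, J, H]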
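Both properties are easy to verify programmatically. The sketch below checks them for the example tree, assuming the clique numbering used in the later message-passing slides (C4 = {H,G,J}, C5 = {G,J,S,L}) and the families implied by the CPDs P(C), P(D|C), P(I), P(G|I,D), P(S|I), P(L|G), P(J|L,S), P(H|G,J); the function names are hypothetical.

```python
cliques = {1: {"C", "D"}, 2: {"G", "I", "D"}, 3: {"G", "S", "I"},
           4: {"H", "G", "J"}, 5: {"G", "J", "S", "L"}}
edges = [(1, 2), (2, 3), (3, 5), (5, 4)]
families = [{"C"}, {"C", "D"}, {"I"}, {"G", "I", "D"}, {"S", "I"},
            {"L", "G"}, {"J", "L", "S"}, {"H", "G", "J"}]

def family_preserving(cliques, families):
    """Every family (child plus parents) must fit inside some clique."""
    return all(any(fam <= c for c in cliques.values()) for fam in families)

def running_intersection(cliques, edges):
    """For each variable, the cliques containing it must form a connected subtree."""
    for var in set().union(*cliques.values()):
        nodes = {i for i, c in cliques.items() if var in c}
        adj = {i: set() for i in nodes}
        for i, j in edges:
            if i in nodes and j in nodes:
                adj[i].add(j)
                adj[j].add(i)
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            for nb in adj[stack.pop()]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if seen != nodes:
            return False
    return True

ok_family = family_preserving(cliques, families)
ok_rip = running_intersection(cliques, edges)
# A deliberately wrong tree: placing {H,G,J} between the two cliques that share S.
bad_rip = running_intersection(cliques, [(1, 2), (2, 3), (3, 4), (4, 5)])
```

In the wrong tree, S appears in C3 and C5 but not in C4, which lies between them, so the running intersection property fails.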


Clique Tree: A Concrete Example

• Assign local CPDs to factors

– C1 {C,D}: P(C), P(D|C)

– C2 {G,I,D}: P(G|I,D)

– C3 {G,S,I}: P(I), P(S|I)

– C5 {G,J,S,L}: P(L|G), P(J|L,S)

– C4 {H,G,J}: P(H|G,J)

The clique tree represents the same distribution P as the original factorization

[Figure: the network over C, D, I, G, S, L, J, H and its clique tree with the assigned CPDs]


Exact Inference: Clique Trees

• Exploits factorization of the distribution for efficient inference, similar to variable elimination

– Advantage: avoids unnecessary (or repeated) computations when repeated queries are needed

• The (un-normalized) distribution can be represented by a clique tree with associated factors

– $\tilde P_\Phi(\mathcal{X}) = \prod_{\phi_i \in \Phi} \phi_i(\mathcal{X}_i)$

• For Bayesian networks, factors are local CPDs

• For Markov networks, factors are clique potentials


Message Passing: Sum Product on Clique Tree

[Figure: the network over C, D, I, G, S, L, J, H and its clique tree C1 {C,D} --[D]-- C2 {G,I,D} --[G,I]-- C3 {G,S,I} --[G,S]-- C5 {G,J,S,L} --[G,J]-- C4 {H,G,J}, with CPDs P(C), P(D|C) at C1; P(G|I,D) at C2; P(I), P(S|I) at C3; P(L|G), P(J|L,S) at C5; P(H|G,J) at C4]

• Goal: Compute P(J)

– Set initial factors at each cluster as products

– C1: Eliminate C, sending δ1→2(D) to C2

– C2: Eliminate D, sending δ2→3(G,I) to C3

– C3: Eliminate I, sending δ3→5(G,S) to C5

– C4: Eliminate H, sending δ4→5(G,J) to C5

– C5: Obtain P(J) by summing out G,S,L


Message Passing: Sum Product on Clique Tree

[Figure: the network over C, D, I, G, S, L, J, H and its clique tree C1 {C,D} --[D]-- C2 {G,I,D} --[G,I]-- C3 {G,S,I} --[G,S]-- C5 {G,J,S,L} --[G,J]-- C4 {H,G,J}, with the same CPD assignment]

• Goal: Compute P(J)

– Set initial factors at each cluster as products

– C1: Eliminate C, sending δ1→2(D) to C2

– C2: Eliminate D, sending δ2→3(G,I) to C3

– C3: Eliminate I, sending δ3→5(G,S) to C5

– C5: Eliminate S, L, sending δ5→4(G,J) to C4

– C4: Obtain P(J) by summing out H,G
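The two message schedules above (root C5 and root C4) can be checked against each other in code. The sketch below is illustrative only: variables are assumed binary, CPD values are random, and the helpers (`make_cpd`, `multiply`, `marginalize_to`, `message`, `marginal_at_root`) are hypothetical names, not part of any library.

```python
import random
from itertools import product

def make_cpd(child, parents, rng):
    """Random binary CPD P(child | parents) stored as (scope, table)."""
    scope = (child,) + tuple(parents)
    table = {}
    for pa in product(range(2), repeat=len(parents)):
        p = rng.uniform(0.1, 0.9)
        table[(0,) + pa], table[(1,) + pa] = p, 1.0 - p
    return scope, table

def multiply(f, g):
    (fs, ft), (gs, gt) = f, g
    scope = fs + tuple(v for v in gs if v not in fs)
    table = {}
    for vals in product(range(2), repeat=len(scope)):
        a = dict(zip(scope, vals))
        table[vals] = ft[tuple(a[v] for v in fs)] * gt[tuple(a[v] for v in gs)]
    return scope, table

def product_of(fs):
    out = fs[0]
    for f in fs[1:]:
        out = multiply(out, f)
    return out

def marginalize_to(f, keep):
    """Sum out every variable of f that is not in keep."""
    fs, ft = f
    table = {}
    for vals, x in ft.items():
        key = tuple(val for v, val in zip(fs, vals) if v in keep)
        table[key] = table.get(key, 0.0) + x
    return tuple(v for v in fs if v in keep), table

rng = random.Random(1)
parents = {"C": [], "D": ["C"], "I": [], "G": ["I", "D"], "S": ["I"],
           "L": ["G"], "J": ["L", "S"], "H": ["G", "J"]}
cpd = {v: make_cpd(v, ps, rng) for v, ps in parents.items()}

scopes = {1: {"C", "D"}, 2: {"G", "I", "D"}, 3: {"G", "S", "I"},
          4: {"H", "G", "J"}, 5: {"G", "J", "S", "L"}}
tree = {1: [2], 2: [1, 3], 3: [2, 5], 5: [3, 4], 4: [5]}
assign = {1: ["C", "D"], 2: ["G"], 3: ["I", "S"], 4: ["H"], 5: ["L", "J"]}
psi = {i: product_of([cpd[v] for v in vs]) for i, vs in assign.items()}

def message(i, j):
    """delta_{i->j}: initial potential times messages from all neighbors except j."""
    f = psi[i]
    for k in tree[i]:
        if k != j:
            f = multiply(f, message(k, i))
    return marginalize_to(f, scopes[i] & scopes[j])

def marginal_at_root(root, var):
    f = psi[root]
    for k in tree[root]:
        f = multiply(f, message(k, root))
    return marginalize_to(f, {var})

p_j_root5 = marginal_at_root(5, "J")
p_j_root4 = marginal_at_root(4, "J")
```

Both roots yield the same P(J); the choice of root only changes the direction in which messages flow.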


[Figure: the network over C, D, I, G, S, L, J, H and its clique tree with the assigned CPDs P(C), P(D|C), P(G|I,D), P(I), P(S|I), P(L|G), P(J|L,S), P(H|G,J)]


Clique Tree Message Passing

• Let T be a clique tree and C1, ..., Ck its cliques

– Multiply the factors of each clique to obtain its initial potential, where each factor φ is assigned to some clique α(φ):

$\beta_j^0[C_j] = \prod_{\phi:\,\alpha(\phi)=j} \phi$

then we have

$\prod_{\phi \in \Phi} \phi = \prod_{j=1}^{k} \beta_j^0[C_j]$

– Define Cr as the root cluster

– Start from tree leaves and move inward

[Figure: the network over C, D, I, G, S, L, J, H and its clique tree]


Clique Tree Message Passing: Example

[Figure: clique tree over C1, C2, C3, C4, C5, C6]

• Root C6

– Legal ordering I: 1,2,3,4,5,6

– Legal ordering II: 2,5,1,3,4,6

– Illegal ordering: 3,4,1,2,5,6
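An ordering is legal when every clique sends its message toward the root only after all of its children have sent theirs. The tree structure is not recoverable from the text, so the sketch below assumes edges C1-C4, C3-C4, C4-C6, C2-C5, C5-C6, which is one structure consistent with the three orderings listed; `is_legal` is a hypothetical helper.

```python
# Assumed tree, rooted at C6: child -> parent (the neighbor on the path to the root).
parent = {1: 4, 3: 4, 4: 6, 2: 5, 5: 6}

def is_legal(order, parent):
    """A clique may send upward only after all of its children have sent."""
    children = {}
    for c, p in parent.items():
        children.setdefault(p, set()).add(c)
    done = set()
    for c in order:
        if not children.get(c, set()) <= done:
            return False
        done.add(c)
    return True

legal_1 = is_legal([1, 2, 3, 4, 5, 6], parent)
legal_2 = is_legal([2, 5, 1, 3, 4, 6], parent)
illegal = is_legal([3, 4, 1, 2, 5, 6], parent)
```

Under this assumed tree, the third ordering fails because C4 would send its message before receiving the one from C1.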


Clique Tree Calibration

• Calibration: the beliefs of two adjacent cliques must agree on the marginal over their shared sepset

• To calculate the probability of any variable, we need an algorithm more efficient than repeating the above sum operations once for each clique

• Some of the information computed during message passing can be re-used to calculate the probability of other variables

[Figure: clique tree C1 {C,D} --[D]-- C2 {G,I,D} --[G,I]-- C3 {G,S,I} --[G,S]-- C5 {G,J,S,L} --[G,J]-- C4 {H,G,J}]


Clique Tree Calibration

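• We say that cluster C_i is ready to transmit to neighbor C_j when C_i has received messages from all of its neighbors except C_j.

• When C_i is ready, it computes the message δ_{i→j}(S_{i,j}) by multiplying its initial potential β_i^0 with all the incoming messages δ_{k→i}, k ∈ Nb_i − {j}, and then eliminating the variables not in the sepset, C_i − S_{i,j}:

$\delta_{i \to j}(S_{i,j}) = \sum_{C_i - S_{i,j}} \beta_i^0 \prod_{k \in Nb_i - \{j\}} \delta_{k \to i}$

• The final potential β_i is the initial potential multiplied by the messages from all neighbors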


Clique Tree Calibration: Example

• Root: C5 (first pass: upward, from the leaves to the root)

[Figure: the network over C, D, I, G, S, L, J, H and its clique tree with the assigned CPDs; messages flow toward C5 {G,J,S,L}]

After the upward pass, we already have the marginal factor β(G,J,L,S) at the root


Clique Tree Calibration: Example

• Root: C5 (second pass: downward, from the root to the leaves)

[Figure: the network over C, D, I, G, S, L, J, H and its clique tree with the assigned CPDs; messages flow away from C5 {G,J,S,L}]


Clique Tree Calibration: Sum-Product

[Figure: two copies of the clique tree {C,D} --[D]-- {G,I,D} --[G,I]-- {G,S,I} --[G,S]-- {G,J,S,L} --[G,J]-- {H,G,J}, one for each message-passing direction]


Clique Tree Calibration: Sum-Product

• After calibration, the final factors associated with the cliques are updated to the marginal distributions (or factors) over the cliques

[Figure: two copies of the clique tree {C,D} --[D]-- {G,I,D} --[G,I]-- {G,S,I} --[G,S]-- {G,J,S,L} --[G,J]-- {H,G,J}, one for each message-passing direction]


Clique Tree Calibration

• If X appears in multiple cliques, they must agree on its marginal

– A clique tree with potentials β_i[C_i] is said to be calibrated if for all neighboring cliques C_i and C_j:

– $\sum_{C_i - S_{i,j}} \beta_i[C_i] = \sum_{C_j - S_{i,j}} \beta_j[C_j]$

• Advantage: compute posteriors for all the cliques using only two passes


Calibrated Clique Tree as Distribution

• A calibrated clique tree is more than simply a data structure that stores the results of probabilistic inference for all of the cliques in the tree.

• It can also be viewed as an alternative representation of the measure $\tilde P_\Phi$ (un-normalized distribution).

• We can easily prove:

$\tilde P_\Phi(\mathcal{X}) = \frac{\prod_{C_i \in T} \beta_i[C_i]}{\prod_{(C_i - C_j) \in T} \mu_{i,j}(S_{i,j})}$

That is, the product of the marginal distributions (or factors) of all the cliques divided by the marginal factors of all the sepsets


Calibrated Clique Tree as Distribution

[Figure: Bayesian network A → B → C and its clique tree C1 {A,B} --[B]-- C2 {B,C}]

• For a calibrated tree

$P(C \mid B) = \frac{P(B,C)}{P(B)} = \frac{\beta_2[B,C]}{P(B)} = \frac{\beta_2[B,C]}{\sum_C \beta_2[B,C]} = \frac{\beta_2[B,C]}{\sum_A \beta_1[A,B]}$

• Joint distribution can thus be written as

$P(A,B,C) = P(A,B)\,P(C \mid B) = \frac{\beta_1[A,B]\,\beta_2[B,C]}{\mu_{1,2}[B]}$

They are equal!

Calibrated trees can be alternative representations of the distributions.
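The identity for the A → B → C example can be confirmed numerically. In the sketch below the CPD values are made up; the two messages are the upward and downward passes on the two-clique tree, and the final check compares β1 β2 / μ with the original joint.

```python
from itertools import product

# Hypothetical CPDs for A -> B -> C (all binary).
p_a = [0.3, 0.7]
p_b_a = [[0.9, 0.1], [0.4, 0.6]]   # p_b_a[a][b] = P(b | a)
p_c_b = [[0.2, 0.8], [0.5, 0.5]]   # p_c_b[b][c] = P(c | b)

# Initial potentials: C1 = {A,B} holds P(A)P(B|A), C2 = {B,C} holds P(C|B).
psi1 = {(a, b): p_a[a] * p_b_a[a][b] for a, b in product(range(2), repeat=2)}
psi2 = {(b, c): p_c_b[b][c] for b, c in product(range(2), repeat=2)}

# Two passes: delta_{1->2}(B) and delta_{2->1}(B).
d12 = [sum(psi1[a, b] for a in range(2)) for b in range(2)]
d21 = [sum(psi2[b, c] for c in range(2)) for b in range(2)]

# Calibrated beliefs.
beta1 = {(a, b): psi1[a, b] * d21[b] for (a, b) in psi1}
beta2 = {(b, c): psi2[b, c] * d12[b] for (b, c) in psi2}

# Sepset marginal computed from either side.
mu_from_1 = [sum(beta1[a, b] for a in range(2)) for b in range(2)]
mu_from_2 = [sum(beta2[b, c] for c in range(2)) for b in range(2)]

# Largest deviation between beta1 * beta2 / mu and the true joint P(a, b, c).
max_err = max(
    abs(beta1[a, b] * beta2[b, c] / mu_from_1[b]
        - p_a[a] * p_b_a[a][b] * p_c_b[b][c])
    for a, b, c in product(range(2), repeat=3)
)
```

Here `mu_from_1` and `mu_from_2` coincide, which is the calibration condition, and `max_err` is zero up to floating-point rounding.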


To Random-Order Message Passing

• In the above sum-product message passing, a clique i can transmit a message to j only if it has received all the other messages (the clique is ready)

• Can clique i transmit a message to its neighbors when it is not ready?


Message Passing: Belief Update

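• Recall the clique tree calibration algorithm

– Upon calibration the final potential at i is:

$\beta_i = \beta_i^0 \prod_{k \in Nb_i} \delta_{k \to i}$

– A message from i to j sums out the non-sepset variables from the product of initial potential and all other messages

$\delta_{i \to j} = \sum_{C_i - S_{i,j}} \beta_i^0 \prod_{k \in Nb_i - \{j\}} \delta_{k \to i}$

– Can also be viewed as multiplying all messages and dividing by the message from j to i

$\delta_{i \to j} = \frac{\sum_{C_i - S_{i,j}} \beta_i^0 \prod_{k \in Nb_i} \delta_{k \to i}}{\delta_{j \to i}} = \frac{\sum_{C_i - S_{i,j}} \beta_i}{\delta_{j \to i}}$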


Message Passing: Belief Update

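[Figure: Bayesian network X1 → X2 → X3 → X4 and its clique tree C1 {X1,X2} --[X2]-- C2 {X2,X3} --[X3]-- C3 {X3,X4}]

• Root: C2

• C1 to C2 Message:

$\delta_{1 \to 2}(X_2) = \sum_{X_1} \beta_1^0[X_1,X_2] = \sum_{X_1} P(X_1)\,P(X_2 \mid X_1)$

• C2 to C1 Message:

$\delta_{2 \to 1}(X_2) = \sum_{X_3} \beta_2^0[X_2,X_3]\,\delta_{3 \to 2}(X_3)$

• Alternatively compute

$\beta_2[X_2,X_3] = \delta_{1 \to 2}(X_2)\,\delta_{3 \to 2}(X_3)\,\beta_2^0[X_2,X_3]$

• And then:

$\delta_{2 \to 1}(X_2) = \frac{\sum_{X_3} \beta_2[X_2,X_3]}{\delta_{1 \to 2}(X_2)} = \frac{\sum_{X_3} \delta_{1 \to 2}(X_2)\,\delta_{3 \to 2}(X_3)\,\beta_2^0[X_2,X_3]}{\delta_{1 \to 2}(X_2)} = \sum_{X_3} \beta_2^0[X_2,X_3]\,\delta_{3 \to 2}(X_3)$

• Thus, the two approaches are equivalent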


Message Passing: Belief Update

• Based on the observation above, belief update

– Different message passing scheme

– Each clique C_i maintains its fully updated belief β_i

• product of the initial potential and the messages from neighbors

– Store at each sepset S_{i,j} the previous message μ_{i,j} that was passed, regardless of its direction

– When passing a message, divide by the previous μ_{i,j}

– Claim: message passing is correct regardless of which clique sent the last message

– This is called belief update


Algorithm for Belief Update

The update order in BU is arbitrary. You can randomly choose any edge in the clique tree for the update. At convergence:

$\sum_{C_i - S_{i,j}} \beta_i = \sum_{C_j - S_{i,j}} \beta_j$
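A minimal sketch of belief update on the three-clique chain {X1,X2} - {X2,X3} - {X3,X4} is given below. The CPD values are made up, and the schedule is an arbitrary fixed one that deliberately starts with a clique that is not "ready"; a random schedule would work the same way as long as every direction is eventually visited.

```python
from itertools import product

# Hypothetical parameters: P(X1) and one transition table reused for every link.
p1 = [0.6, 0.4]
t = [[0.7, 0.3], [0.2, 0.8]]        # t[x][y] = P(next = y | prev = x)

pairs = list(product(range(2), repeat=2))
# beta[i] is the belief of clique i over (X_{i+1}, X_{i+2}), started at the initial potential.
beta = [
    {(x, y): p1[x] * t[x][y] for x, y in pairs},
    {(x, y): t[x][y] for x, y in pairs},
    {(x, y): t[x][y] for x, y in pairs},
]
# mu[e] is the sepset belief on the edge between cliques e and e + 1, initialized to 1.
mu = [[1.0, 1.0], [1.0, 1.0]]

def sepset_marginal(i, e):
    """Marginal of beta[i] over the variable shared along edge e."""
    pos = 1 if i == e else 0        # shared variable is last in clique e, first in clique e + 1
    out = [0.0, 0.0]
    for vals, x in beta[i].items():
        out[vals[pos]] += x
    return out

def bu_message(i, j):
    """Belief-update message from clique i to its neighbor j."""
    e = min(i, j)
    sigma = sepset_marginal(i, e)
    pos = 1 if j == e else 0
    for vals in beta[j]:
        beta[j][vals] *= sigma[vals[pos]] / mu[e][vals[pos]]
    mu[e] = sigma

# Arbitrary schedule; the middle clique sends first, before it is ready.
schedule = [(1, 0), (1, 2), (2, 1), (0, 1)]
for _ in range(3):
    for i, j in schedule:
        bu_message(i, j)

# Reference marginals computed directly along the chain.
p2 = [sum(p1[x] * t[x][y] for x in range(2)) for y in range(2)]
p3 = [sum(p2[x] * t[x][y] for x in range(2)) for y in range(2)]
```

After the loop, `beta[1]` equals P(X2, X3) and `beta[2]` equals P(X3, X4), even though early messages were sent from partially informed cliques; dividing by the stored μ prevents information from being counted twice.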


Clique Tree Invariant for BU

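• BU maintains the distribution as an invariant

– Upon calibration we have

$\tilde P_\Phi(\mathcal{X}) = \frac{\prod_{C_i \in T} \beta_i[C_i]}{\prod_{(C_i - C_j) \in T} \mu_{i,j}(S_{i,j})}$

– Initially this invariant holds obviously

– At each update step the invariant is also maintained

• A message from i to j only changes β_j and μ_{i,j}

• We need to prove $\frac{\beta_j^U}{\mu_{i,j}^U} = \frac{\beta_j}{\mu_{i,j}}$, where the superscript U marks the updated quantities

• This is exactly the message passing step $\beta_j^U = \beta_j \frac{\mu_{i,j}^U}{\mu_{i,j}}$

Belief update re-parameterizes P at each step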


Inference on Calibrated Clique Trees

• Single variable inference

– The posterior of a target variable X can be directly

computed by eliminating the redundant variables

from a clique that contains X

• Inference outside a clique

• Inference with increment (or evidence)

After calibration, the final factors associated with

cliques are updated to the marginal distributions

(or factors) over the cliques


Answering Queries Outside a Clique

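Select a subtree T' of the calibrated clique tree such that $\mathbf{Y} \subseteq \mathrm{Scope}[T']$. Then

$\tilde P_\Phi(\mathrm{Scope}[T']) = \frac{\prod_{i \in V_{T'}} \beta_i}{\prod_{(i - j) \in T'} \mu_{i,j}}$

and $P(\mathbf{Y})$ is obtained by summing out $\mathrm{Scope}[T'] - \mathbf{Y}$.

Find a minimal sub-path on the clique tree which contains all the query variables!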


Answering Queries with Increments

• Introducing evidence Z=z

• Compute posterior of X where X is in a clique with Z

– Since clique tree is calibrated, multiply the clique that contains X and Z with indicator function I(Z=z) and sum out irrelevant variables

• Compute posterior of X if not sharing a clique with Z

– Introduce indicator function I(Z=z) into some clique containing Z and propagate messages along path to clique containing X

– Sum out irrelevant variables from the clique containing X


Comments on Calibrated Clique Trees

• What is a calibrated clique tree?

– Starting from the initial factors ψ(C_i) (or ψ_i) generated from BNs/MNs, a clique tree is calibrated once we have obtained the joint distributions β(C_i) (or β_i) associated with all nodes (cliques) and the joint distributions μ_{i,j} of all sepsets in the tree. We can use either sum-product or belief update to do the calibration.


Comments on Calibrated Clique Trees

• What is a calibrated clique tree?

• Why does a clique tree need to be calibrated?

– All CPDs/factors in BNs/MNs are equivalently transformed into calibrated factors and messages in the clique tree. The joint distribution is preserved by the calibrated beliefs and at every step of BU.

– The calibrated factors β_i and messages μ_{i,j} are directly and completely associated with the clique tree.


Comments on Calibrated Clique Trees

• What is a calibrated clique tree?

• Why does a clique tree need to be calibrated?

• What is the advantage of clique tree?

– In most cases, the structure of clique tree is simpler

than the original BNs/MNs. Inference will be more

efficient.

– Belief propagation can be easily extended to approximate

inference


Constructing Clique Trees

• Goal: construct a tree that is family preserving and obeys the running intersection property

• Triangulate the graph to construct a chordal graph H

– NP-hard to find a triangulation where the largest clique in the resulting chordal graph has minimum size

• Find cliques in H and make each a node in the graph

– Finding maximal cliques is NP-hard in general graphs

– Can start with families and grow greedily

• Construct a tree over the clique nodes

– Use maximum spanning tree on an undirected graph whose nodes are maximal cliques and whose edge weights are |C_i ∩ C_j|

– Can show that the resulting tree obeys the running intersection property


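[Figure: three graphs over C, D, I, G, S, L, J, H: the original network, the moralized graph, and one possible triangulation]

Cluster graph with edge weights |C_i ∩ C_j| over the cliques {C,D}, {G,I,D}, {G,S,I}, {G,S,L}, {L,S,J}, {G,H}

Find the maximum spanning tree

[Figure: the resulting clique tree over {C,D}, {G,I,D}, {G,S,I}, {G,S,L}, {L,S,J}, {G,H}]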
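The last construction step can be sketched with Kruskal's algorithm on the cluster graph, taking the six cliques listed on this slide and using the sepset size |C_i ∩ C_j| as edge weight. The function name and the tie-breaking order are assumptions of this sketch; other tie-breaks give different but equally valid trees.

```python
from itertools import combinations

cliques = [frozenset("CD"), frozenset("GID"), frozenset("GSI"),
           frozenset("GSL"), frozenset("LSJ"), frozenset("GH")]

def max_spanning_tree(cliques):
    """Kruskal's algorithm with weights |C_i & C_j|, heaviest edges first."""
    root = list(range(len(cliques)))

    def find(i):
        # Union-find lookup with path halving.
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    candidates = [(len(cliques[i] & cliques[j]), i, j)
                  for i, j in combinations(range(len(cliques)), 2)
                  if cliques[i] & cliques[j]]
    candidates.sort(key=lambda e: -e[0])   # stable sort keeps ties in index order

    tree = []
    for w, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:
            root[ri] = rj
            tree.append((i, j, w))
    return tree

tree = max_spanning_tree(cliques)
total_weight = sum(w for _, _, w in tree)
```

The three weight-2 edges ({G,I,D}-{G,S,I}, {G,S,I}-{G,S,L}, {G,S,L}-{L,S,J}) are always selected; {C,D} and {G,H} are then attached with weight-1 edges.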


Limitations of Clique Tree

• The induced graph and clique tree are hard to construct for a large graph

• Inefficient for Markov networks with loops

[Figure: pairwise Markov network]


From Clique Tree to Loopy Cluster Graph

• Can we directly define clusters on the cliques of the original graph rather than the induced graph?

• Message passing may then never stop, because messages keep circulating around the loops

[Figure: a Markov network, its clique tree, and a loopy cluster graph]


BP in Loopy Cluster Graph

[Figure: a Markov network and its loopy cluster graph]


BP in Loopy Cluster Graph

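• The approximate marginal of a cluster is obtained by multiplying its initial belief with all the incoming messages

• A new message from cluster i to cluster j is obtained by multiplying the initial belief of i with all its incoming messages except the one from j, then summing out the variables not in the sepset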


Comments for BP in Loopy Graph

• Initialize all the messages as 1, so any cluster can send its messages at the beginning

• Messages are not required to be equal in both directions

• Maintain the factor beliefs by multiplying initial beliefs/potentials with all the incoming messages

• Send messages by multiplying initial beliefs/potentials with all the incoming messages except the target cluster (eliminating the beliefs of the variables not in the sepset)

Page 69

Running Intersection Property

• For any variable X that appears in both clusters Ci and Cj, there is exactly one path between the two clusters on which all the clusters and sepsets contain X

• If a cluster graph satisfies the running intersection property, the calibrated beliefs are approximate marginals of the clusters. Calibrated means:

– Beliefs no longer change under further message passing

– Or, neighbouring clusters agree on their sepset: $\sum_{C_i - S_{i,j}} \beta_i(C_i) = \sum_{C_j - S_{i,j}} \beta_j(C_j)$
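The running intersection property can be checked mechanically: for each variable X, the clusters containing X together with the edges whose sepsets contain X must form a tree. The sketch below assumes a hypothetical representation (a dict of cluster scopes and a dict of sepsets keyed by cluster pairs); the function name `satisfies_rip` and the example graphs are illustrative only.

```python
def satisfies_rip(clusters, sepsets):
    """clusters: name -> set of variables; sepsets: (name, name) -> set of variables.
    Returns True if, for every variable, the clusters and edges carrying it form a tree."""
    variables = set().union(*clusters.values())
    for x in variables:
        nodes = {c for c, scope in clusters.items() if x in scope}
        x_edges = [e for e, s in sepsets.items() if x in s]
        # A tree on n nodes has exactly n - 1 edges; more means a loop for x,
        # fewer means some clusters holding x are disconnected.
        if len(x_edges) != len(nodes) - 1:
            return False
        adj = {c: set() for c in nodes}
        for u, v in x_edges:
            if u not in nodes or v not in nodes:
                return False  # a sepset must be contained in both of its clusters
            adj[u].add(v)
            adj[v].add(u)
        # Depth-first search to confirm the x-subgraph is connected.
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            for nxt in adj[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if seen != nodes:
            return False
    return True


# Bethe-style graph: one cluster per pairwise factor, one per variable.
bethe_clusters = {
    "f_AB": {"A", "B"}, "f_BC": {"B", "C"}, "f_CD": {"C", "D"}, "f_DA": {"D", "A"},
    "v_A": {"A"}, "v_B": {"B"}, "v_C": {"C"}, "v_D": {"D"},
}
bethe_sepsets = {
    ("f_AB", "v_A"): {"A"}, ("f_AB", "v_B"): {"B"},
    ("f_BC", "v_B"): {"B"}, ("f_BC", "v_C"): {"C"},
    ("f_CD", "v_C"): {"C"}, ("f_CD", "v_D"): {"D"},
    ("f_DA", "v_D"): {"D"}, ("f_DA", "v_A"): {"A"},
}

# Three clusters that all pass A to each other: A travels around a loop.
loop_clusters = {"c1": {"A", "B"}, "c2": {"A", "C"}, "c3": {"A", "D"}}
loop_sepsets = {("c1", "c2"): {"A"}, ("c2", "c3"): {"A"}, ("c1", "c3"): {"A"}}

print(satisfies_rip(bethe_clusters, bethe_sepsets))
print(satisfies_rip(loop_clusters, loop_sepsets))
```

The second example fails because information about A could circulate around the three-cluster loop, which is exactly what the property is meant to rule out.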

Page 70

Bethe Cluster Graph or Factor Graph

• Two types of nodes

– Factor nodes defined on cliques

– Variable nodes defined on variables

• A factor graph guarantees the running intersection property

• Two types of messages

– From factors to variables (simply eliminate the other variables)

– From variables to factors

[Figure: factor graph with factor nodes {A, B}, {B, C}, {C, D}, {D, A} and variable nodes A, B, C, D]

Page 71

Bethe Cluster Graph or Factor Graph

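• From variables to factors: $m_{x \to f}(x) = \prod_{c \in \mathrm{nb}(x) \setminus f} m_{c \to x}(x)$

• From factors to variables: $m_{f \to x}(x) = \sum_{\mathbf{y}} f(x, \mathbf{y}) \prod_{c \in \mathrm{nb}(f) \setminus x} m_{c \to f}(c)$, where $\mathbf{y}$ denotes the other variables of $f$

• Beliefs after the propagation: $\beta(x) \propto \prod_{c \in \mathrm{nb}(x)} m_{c \to x}(x)$

[Figure: factor graph with factor nodes {A, B}, {B, C}, {C, D}, {D, A} and variable nodes A, B, C, D]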

A factor graph makes it easy to compute the approximate marginal of a single variable
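The two message types and the final belief can be sketched on a tiny factor graph. To keep the result checkable, this illustration uses a hypothetical chain A - B - C (not the loop in the figure), where belief propagation gives exact single-variable marginals; the factor tables and the names `neighbours`, `run_bp`, `v2f`, and `f2v` are assumptions made for the example.

```python
import numpy as np

# Hypothetical chain A - B - C of binary variables with two pairwise factors.
factors = {
    "f_AB": (("A", "B"), np.array([[1.0, 2.0], [3.0, 1.0]])),
    "f_BC": (("B", "C"), np.array([[2.0, 1.0], [1.0, 4.0]])),
}
variables = ["A", "B", "C"]


def neighbours(var):
    """Factors whose scope contains `var`."""
    return [f for f, (scope, _) in factors.items() if var in scope]


def run_bp(n_iters=10):
    v2f = {(v, f): np.ones(2) for v in variables for f in neighbours(v)}
    f2v = {(f, v): np.ones(2) for (v, f) in v2f}
    for _ in range(n_iters):
        # Variable -> factor: product of messages from the *other* factors.
        for (v, f) in v2f:
            m = np.ones(2)
            for g in neighbours(v):
                if g != f:
                    m = m * f2v[(g, v)]
            v2f[(v, f)] = m / m.sum()
        # Factor -> variable: multiply in the other variable's message,
        # then eliminate (sum out) that other variable.
        for (f, v) in f2v:
            (x, y), table = factors[f]
            if v == x:
                m = table @ v2f[(y, f)]
            else:
                m = table.T @ v2f[(x, f)]
            f2v[(f, v)] = m / m.sum()
    # Belief of a variable: normalized product of all incoming factor messages.
    beliefs = {}
    for v in variables:
        b = np.ones(2)
        for f in neighbours(v):
            b = b * f2v[(f, v)]
        beliefs[v] = b / b.sum()
    return beliefs


beliefs = run_bp()

# Brute-force marginals for comparison.
joint = np.einsum("ab,bc->abc", factors["f_AB"][1], factors["f_BC"][1])
joint = joint / joint.sum()
exact = {"A": joint.sum(axis=(1, 2)), "B": joint.sum(axis=(0, 2)), "C": joint.sum(axis=(0, 1))}
for v in variables:
    print(v, beliefs[v], exact[v])
```

On a factor graph with loops the same updates apply unchanged, but the resulting beliefs are approximations rather than exact marginals.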

Page 72

The Problem of Convergence

• The first concern with belief propagation is whether it converges at all

• Even when it converges, the calibrated beliefs are generally not equal to the correct marginals

Are there heuristic remedies?

Page 73

The Problem of Convergence

• Several improvements

– Message scheduling: residual belief propagation orders the messages by how much they change and sends the largest changes first

– Minimum spanning tree: at each step, select a different spanning tree of the cluster graph and calibrate it

– ......

Can we find a theoretical explanation of belief propagation on cluster graphs?

Page 74

An application to an 11 × 11 Ising model

Page 75

BP as Variational Inference

• The major advance in the theoretical analysis of belief propagation is the result of Yedidia et al. (2000, 2005), who show that these approaches can be viewed as maximizing an approximate energy functional

• This result connected the algorithmic developments in the field with the literature on free-energy approximations developed in statistical mechanics (Bethe 1935; Kikuchi 1951), a literature that also includes mean field inference

Page 76

Exact Inference as Optimization

• Energy functional

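– $D_{KL}(Q \,\|\, P_\Phi) = \ln Z - \Big( H_Q(X) + \sum_{\phi \in \Phi} E_Q[\ln \phi] \Big)$

– The second term is the energy functional $F[\tilde{P}_\Phi, Q]$

• For a clique tree with cliques $C_i$, we have a set of beliefs $Q = \{\beta_i, \mu_{i,j}\}$ (not necessarily calibrated). Its energy functional:

– $\tilde{P}_\Phi = \prod_i \psi_i$ ($\psi_i$ is the initial factor of each clique)

– $F[\tilde{P}_\Phi, Q] = \sum_i H_{\beta_i}(C_i) - \sum_{(i,j)} H_{\mu_{i,j}}(S_{i,j}) + \sum_i E_{\beta_i}[\ln \psi_i]$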

Page 77

Exact Inference as Optimization

• If Q is calibrated (denoted Q*), we can conclude that

– $F[\tilde{P}_\Phi, Q^*] = \max_Q F[\tilde{P}_\Phi, Q] = \ln Z$

• Because calibrated beliefs still represent the same distribution $P_\Phi$, the relative entropy attains its minimum value, 0

• This recasts the sum-product (SP) and belief-update (BU) algorithms as optimization
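The step from calibration to $\ln Z$ can be spelled out in two lines. This sketch uses only the identity $D_{KL}(Q \,\|\, P_\Phi) = \ln Z - F[\tilde{P}_\Phi, Q]$ and the non-negativity of relative entropy; the bracket notation for the energy functional is an assumption about the slide's original typesetting.

```latex
\begin{align*}
D_{KL}(Q \,\|\, P_\Phi) \ge 0
  &\;\Longrightarrow\;
  F[\tilde{P}_\Phi, Q] = \ln Z - D_{KL}(Q \,\|\, P_\Phi) \le \ln Z
  \quad \text{for every } Q, \\
Q^* = P_\Phi
  &\;\Longrightarrow\;
  D_{KL}(Q^* \,\|\, P_\Phi) = 0
  \;\Longrightarrow\;
  F[\tilde{P}_\Phi, Q^*] = \ln Z .
\end{align*}
```

So $\ln Z$ is an upper bound on the energy functional, and the calibrated beliefs attain it.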

Page 78

Extensions of Belief Propagation

• Generalized belief propagation

• Convex belief propagation

• Expectation propagation

• ......

Page 79

Region Graph

Page 80

References for Belief Propagation

• Textbook #2: Chapter 22, More variational inference

• Wainwright MJ, Jordan MI. Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning, 2008, 1(1-2):1-305.

Page 81

The End of Chapter 8

The cluster graph is another data structure for efficient inference in PGMs