Intro to Junction Tree propagation and adaptations for a Distributed Environment
Thor Whalen
Metron, Inc.
[Figure: a loopy network over a, b, c, d; naive updates (messages 1 to 8) arrive at the same nodes along different paths and conflict.]
This naive approach of updating the network inherits oscillation problems!
Idea behind the Junction Tree Algorithm
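[Figure: a clever algorithm turns the Bayesian network over a, b, c, d into a secondary structure over the compound nodes a, bc, d. For the network over a to h, the secondary structure is the junction tree with cliques abd, ade, ace, ceg, egh, def and separators ad, ae, ce, de, eg.]
Bayesian Network
• one-dim. random variables
• conditional probabilities
Secondary Structure/Junction Tree
• multi-dim. random variables
• joint probabilities (potentials)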
Variable Elimination (General Idea)
• Write the query in the form
$$P(X_1) = \sum_{X_k} \cdots \sum_{X_3} \sum_{X_2} \prod_i P(X_i \mid \mathrm{par}(X_i))$$
• Iteratively
  – Move all irrelevant terms outside of the innermost sum
  – Perform the innermost sum, getting a new term
  – Insert the new term into the product
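The iterative recipe above can be sketched in Python. This is a minimal illustration, not the slides' own code: it assumes each factor is stored as a (scope, table) pair over small discrete domains, and the helper names `multiply`, `sum_out` and `eliminate`, as well as the chain a → b → c and its CPT values, are hypothetical.

```python
from itertools import product

# A factor is a pair (scope, table): scope is a tuple of variable names and
# table maps each assignment tuple (ordered like scope) to a non-negative number.


def multiply(f, g, domains):
    """Pointwise product of two factors over the union of their scopes."""
    (fs, ft), (gs, gt) = f, g
    scope = fs + tuple(v for v in gs if v not in fs)
    table = {}
    for assign in product(*(domains[v] for v in scope)):
        env = dict(zip(scope, assign))
        table[assign] = ft[tuple(env[v] for v in fs)] * gt[tuple(env[v] for v in gs)]
    return scope, table


def sum_out(f, var):
    """Sum the variable `var` out of factor f."""
    fs, ft = f
    keep = tuple(v for v in fs if v != var)
    idx = [fs.index(v) for v in keep]
    table = {}
    for assign, val in ft.items():
        key = tuple(assign[i] for i in idx)
        table[key] = table.get(key, 0.0) + val
    return keep, table


def eliminate(factors, order, domains):
    """Sum-product variable elimination following `order`."""
    factors = list(factors)
    for var in order:
        # Terms that do not mention `var` stay outside the innermost sum.
        inside = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        if not inside:
            continue
        prod_f = inside[0]
        for f in inside[1:]:
            prod_f = multiply(prod_f, f, domains)
        # Perform the innermost sum and insert the new term into the product.
        factors.append(sum_out(prod_f, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f, domains)
    return result


# Hypothetical chain a -> b -> c with binary variables and made-up CPTs.
domains = {"a": (0, 1), "b": (0, 1), "c": (0, 1)}
p_a = (("a",), {(0,): 0.6, (1,): 0.4})
p_b_given_a = (("b", "a"), {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8})
p_c_given_b = (("c", "b"), {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5})

p_c = eliminate([p_a, p_b_given_a, p_c_given_b], ["a", "b"], domains)
print(p_c)
```

Here P(b) = (0.5, 0.5), so the result should be P(c) = (0.5 × 0.9 + 0.5 × 0.5, 0.5 × 0.1 + 0.5 × 0.5) = (0.7, 0.3), up to floating-point rounding.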
Example of Variable Elimination
• The “Asia” network:
[Figure: the Asia network with nodes Visit to Asia (V), Smoking (S), Tuberculosis (T), Lung Cancer (L), Abnormality in Chest (A), Bronchitis (B), X-Ray (X), Dyspnea (D).]
We are interested in P(d)
- Need to eliminate: v,s,x,t,l,a,b
Initial factors:
$$P(v)P(s)P(t \mid v)P(l \mid s)P(b \mid s)P(a \mid t,l)P(x \mid a)P(d \mid a,b)$$
Brute force:
$$P(d) = \sum_v \sum_s \sum_x \sum_t \sum_l \sum_a \sum_b P(v)P(s)P(t \mid v)P(l \mid s)P(b \mid s)P(a \mid t,l)P(x \mid a)P(d \mid a,b)$$
Eliminate variables in order: v, s, x, t, l, a, b
Initial factors:
$$P(v)P(s)P(t \mid v)P(l \mid s)P(b \mid s)P(a \mid t,l)P(x \mid a)P(d \mid a,b)$$

Eliminate v:
$$f_v(t) = \sum_v P(v)P(t \mid v)$$
$$\Rightarrow f_v(t)P(s)P(l \mid s)P(b \mid s)P(a \mid t,l)P(x \mid a)P(d \mid a,b)$$
[ Note: $f_v(t) = P(t)$. In general, the result of elimination is not necessarily a probability term. ]

Eliminate s:
$$f_s(b,l) = \sum_s P(s)P(b \mid s)P(l \mid s)$$
$$\Rightarrow f_v(t)f_s(b,l)P(a \mid t,l)P(x \mid a)P(d \mid a,b)$$
[ Note: the result of elimination may be a function of several variables. ]

Eliminate x:
$$f_x(a) = \sum_x P(x \mid a)$$
$$\Rightarrow f_v(t)f_s(b,l)f_x(a)P(a \mid t,l)P(d \mid a,b)$$
[ Note: $f_x(a) = 1$ for all values of a. ]

Eliminate t:
$$f_t(a,l) = \sum_t f_v(t)P(a \mid t,l)$$
$$\Rightarrow f_s(b,l)f_x(a)f_t(a,l)P(d \mid a,b)$$

Eliminate l:
$$f_l(a,b) = \sum_l f_s(b,l)f_t(a,l)$$
$$\Rightarrow f_l(a,b)f_x(a)P(d \mid a,b)$$

Eliminate a:
$$f_a(b,d) = \sum_a f_l(a,b)f_x(a)P(d \mid a,b)$$
$$\Rightarrow f_a(b,d)$$

Eliminate b:
$$f_b(d) = \sum_b f_a(b,d)$$
$$\Rightarrow f_b(d) = P(d)$$
Intermediate factors
In our previous example (order v, s, x, t, l, a, b):
$f_v(t)$, $f_s(b,l)$, $f_x(a)$, $f_t(a,l)$, $f_l(a,b)$, $f_a(b,d)$, $f_b(d)$
With a different ordering (a, b, x, t, v, s, l):
$g_a(l,t,d,b,x)$, $g_b(l,t,d,x,s)$, $g_x(l,t,d,s)$, $g_t(l,d,s,v)$, $g_v(l,d,s)$, $g_s(l,d)$, $g_l(d)$
Complexity is exponential in the size of these factors!
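The effect of the ordering can be checked symbolically. The sketch below uses a hypothetical helper, `intermediate_scopes`, written for illustration: it tracks only which variables each new factor mentions (not its numbers), starting from the Asia factor scopes listed above. For the order v, s, x, t, l, a, b it reproduces $f_v(t)$ through $f_b(d)$; for a, b, x, t, v, s, l it reproduces $g_a$ through $g_l$.

```python
# Scopes of the initial Asia factors P(v), P(s), P(t|v), P(l|s), P(b|s),
# P(a|t,l), P(x|a), P(d|a,b).
ASIA_FACTORS = [
    {"v"}, {"s"}, {"t", "v"}, {"l", "s"}, {"b", "s"},
    {"a", "t", "l"}, {"x", "a"}, {"d", "a", "b"},
]


def intermediate_scopes(factor_scopes, order):
    """Return (eliminated variable, scope of the new factor) for each step."""
    scopes = [set(s) for s in factor_scopes]
    created = []
    for var in order:
        inside = [s for s in scopes if var in s]
        scopes = [s for s in scopes if var not in s]
        # The new factor mentions every variable that shared a factor with `var`.
        new = set().union(*inside) - {var}
        created.append((var, new))
        scopes.append(new)
    return created


for order in ("vsxtlab", "abxtvsl"):
    steps = intermediate_scopes(ASIA_FACTORS, order)
    print(order, [f"{v}({','.join(sorted(s))})" for v, s in steps])
```

The largest new factor has two variables under the first ordering but five under the second, which is where the exponential cost comes from.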
Notes about variable elimination
• Actual computation is done in the elimination steps
• Computation depends on the order of elimination
• For each query we need to compute everything again!
  – Many redundant calculations
Junction Trees
• The junction tree algorithm “generalizes” Variable Elimination to avoid redundant calculations
• The JT algorithm compiles a class of elimination orders into a data structure that supports the computation of all possible queries.
Building a Junction Tree
DAG → Moral Graph → Triangulated Graph → Identifying Cliques → Junction Tree
Step 1: Moralization
[Figure: the DAG G = (V, E) over a to h, the same graph with the parents of each node joined, and the resulting moral graph G_M.]
1. For all w ∈ V:
   • For all u, v ∈ pa(w), add an edge e = u-v.
2. Undirect all edges.
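A minimal Python sketch of the two moralization steps. The parent lists are an assumption: they are read off the example figure (a→b, a→c, b→d, c→e, d→f, e→f, c→g, e→h, g→h) and are not stated in the text; `moralize` is a hypothetical helper.

```python
from itertools import combinations

# ASSUMPTION: parents of each node in the example DAG over a..h, read off the
# slide figures (a->b, a->c, b->d, c->e, d->f, e->f, c->g, e->h, g->h).
PARENTS = {
    "a": [], "b": ["a"], "c": ["a"], "d": ["b"], "e": ["c"],
    "f": ["d", "e"], "g": ["c"], "h": ["e", "g"],
}


def moralize(parents):
    """Return the moral graph G_M as a dict: node -> set of neighbours."""
    adj = {v: set() for v in parents}
    for child, pa in parents.items():
        # Step 1: marry every pair of parents u, v of the same child w.
        for u, v in combinations(pa, 2):
            adj[u].add(v)
            adj[v].add(u)
        # Step 2: keep each parent-child edge, now undirected.
        for p in pa:
            adj[child].add(p)
            adj[p].add(child)
    return adj


moral = moralize(PARENTS)
added = sorted(
    (u, v) for u, v in combinations(sorted(PARENTS), 2)
    if v in moral[u] and u not in PARENTS[v] and v not in PARENTS[u]
)
print(added)  # [('d', 'e'), ('e', 'g')]
```

For this network the only marriages are d-e (the parents of f) and e-g (the parents of h).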
Step 2: Triangulation
Add edges to G_M such that there is no cycle of length ≥ 4 that does not contain a chord.
[Figure: a chordless cycle (NO) and a cycle with a chord (YES); the moral graph G_M over a to h and the triangulated graph G_T.]
Step 2: Triangulation (cont.)
• Each elimination ordering triangulates the graph, but not necessarily in the same way:
[Figure: the same graph over A to H triangulated under several different elimination orderings.]
Step 2: Triangulation (cont.)
• Intuitively, triangulations with as few fill-ins as possible are preferred
  – Leaves us with small cliques (small probability tables)
• A common heuristic: Repeat until no nodes remain:
– Find the node whose elimination would require the least number of fill-ins (may be zero).
– Eliminate that node, and note the need for a fill-in edge between any two non-adjacent neighbors.
• Add the fill-in edges to the original graph.
[Figure: the moral graph G_M shrinking as vertices are eliminated one by one, and the resulting triangulated graph G_T.]
Eliminate the vertex that requires the least number of edges to be added.

step | removed vertex | induced clique | added edges
1 | h | egh | -
2 | g | ceg | -
3 | f | def | -
4 | c | ace | a-e
5 | b | abd | a-d
6 | d | ade | -
7 | e | ae | -
8 | a | a | -
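The elimination procedure and the greedy fill-in heuristic can be sketched as follows. The moral-graph edges are an assumption reconstructed from the figures (the DAG edges plus d-e and e-g), and all helper names are hypothetical. `eliminate` replays a given order; with the order from the table (h, g, f, c, b, d, e, a) it should add exactly the fill-ins a-e and a-d.

```python
from itertools import combinations

# ASSUMPTION: moral graph G_M of the example, i.e. the DAG's edges (read off the
# figures) plus the marriage edges d-e and e-g.
EDGES = ["ab", "ac", "bd", "ce", "df", "ef", "cg", "eh", "gh", "de", "eg"]


def moral_graph():
    adj = {v: set() for v in "abcdefgh"}
    for u, w in EDGES:
        adj[u].add(w)
        adj[w].add(u)
    return adj


def fill_ins(adj, v):
    """Edges needed to make the neighbours of v pairwise adjacent."""
    return [(u, w) for u, w in combinations(sorted(adj[v]), 2) if w not in adj[u]]


def remove(adj, v):
    """Add v's fill-in edges, then delete v; return (induced clique, fill-ins)."""
    new_edges = fill_ins(adj, v)
    for u, w in new_edges:
        adj[u].add(w)
        adj[w].add(u)
    clique = {v} | adj[v]
    for n in adj.pop(v):
        adj[n].discard(v)
    return clique, new_edges


def eliminate(adj, order):
    """Eliminate vertices in a given order; collect induced cliques and fill-ins."""
    adj = {v: set(n) for v, n in adj.items()}  # work on a copy
    cliques, added = [], []
    for v in order:
        clique, new_edges = remove(adj, v)
        cliques.append(clique)
        added += new_edges
    return cliques, added


def min_fill_order(adj):
    """Greedy heuristic: always eliminate the vertex needing the fewest fill-ins.
    Ties are broken alphabetically, which need not match the slides' choices."""
    adj = {v: set(n) for v, n in adj.items()}
    order = []
    while adj:
        v = min(sorted(adj), key=lambda x: len(fill_ins(adj, x)))
        remove(adj, v)
        order.append(v)
    return order


cliques, added = eliminate(moral_graph(), "hgfcbdea")
print(added)                          # fill-ins for the order in the table
print(min_fill_order(moral_graph()))  # order chosen by the heuristic
```

With alphabetical tie-breaking the heuristic picks the order f, h, g, a, b, c, d, e, which adds b-c and c-d instead: a different triangulation, but also with two fill-ins. The outcome depends on how ties are broken.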
Step 3: Junction Graph
• A junction graph for an undirected graph G is an undirected, labeled graph.
• The nodes are the cliques in G.
• If two cliques intersect, they are joined in the junction graph by an edge labeled with their intersection.
[Figure: the Bayesian network G = (V, E), the moral graph G_M, the triangulated graph G_T, and its cliques abd, ade, ace, ceg, egh, def. In the junction graph G_J (not complete), cliques are joined by edges labeled with their separators ad, ae, ce, de, eg, e.g. ceg ∩ egh = eg; further links carry single-variable labels such as a and e.]
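A sketch of the junction graph construction, using the six cliques listed in the slides. Each pair of intersecting cliques gets an edge labeled with the intersection; `junction_graph` is a hypothetical helper name.

```python
from itertools import combinations

# Cliques of the triangulated example graph G_T, as listed in the slides.
CLIQUES = ["abd", "ade", "ace", "ceg", "egh", "def"]


def junction_graph(cliques):
    """Join every pair of intersecting cliques by an edge labeled with the intersection."""
    edges = {}
    for c1, c2 in combinations(cliques, 2):
        sep = set(c1) & set(c2)
        if sep:  # cliques that share no variable are not joined
            edges[(c1, c2)] = "".join(sorted(sep))
    return edges


jg = junction_graph(CLIQUES)
for (c1, c2), label in jg.items():
    print(f"{c1} -- {label} -- {c2}")
```

Of the 15 clique pairs, 13 intersect (only abd-ceg and abd-egh do not), so the complete junction graph has 13 labeled edges.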
Step 4: Junction Tree
• A junction tree is a sub-graph of the junction graph that
  – Is a tree
  – Contains all the cliques (spanning tree)
  – Satisfies the running intersection property: for each pair of nodes U, V, all nodes on the path between U and V contain U ∩ V
Running intersection?
All vertices C and sepsets S along the path between any two vertices A and B contain the intersection A ∩ B.
[Figure: the junction tree with cliques abd, ade, ace, ceg, egh, def and sepsets ad, ae, ce, de, eg; the path from A to B passes through clique C and sepsets S1, S2.]
Ex: A = {a,b,d}, B = {a,c,e}, A ∩ B = {a}; C = {a,d,e} ⊇ {a}, S1 = {a,d} ⊇ {a}, S2 = {a,e} ⊇ {a}
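The running intersection property can be checked directly on the junction tree from the slides. This is a sketch with hypothetical helper names; because each sepset is the intersection of its two neighboring cliques, it suffices to check the cliques on each path.

```python
from itertools import combinations

# Junction tree from the slides: cliques joined through sepsets ad, ae, ce, de, eg.
TREE_EDGES = [("abd", "ade"), ("ade", "ace"), ("ace", "ceg"), ("ceg", "egh"), ("ade", "def")]


def tree_path(edges, start, goal):
    """Return the unique path (list of cliques) between two nodes of a tree."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for n in nbrs[node]:
            if n not in path:
                stack.append((n, path + [n]))
    raise ValueError("nodes are not connected")


def has_running_intersection(edges):
    """Check that every clique on the path between A and B contains A ∩ B."""
    nodes = sorted({c for e in edges for c in e})
    for a, b in combinations(nodes, 2):
        common = set(a) & set(b)
        # Sepsets are intersections of neighbouring cliques on the path, so they
        # contain A ∩ B whenever both of their cliques do.
        if not all(common <= set(c) for c in tree_path(edges, a, b)):
            return False
    return True


print(tree_path(TREE_EDGES, "abd", "ace"))  # A = abd, C = ade, B = ace
print(has_running_intersection(TREE_EDGES))
```

Re-attaching def to egh instead of ade breaks the property: d is shared by abd and def but missing from ace, ceg and egh on the path between them.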
A few useful Theorems
• Theorem: An undirected graph is triangulated if and only if its junction graph has a junction tree
• Theorem: A sub-tree of the junction graph of a triangulated graph is a junction tree if and only if it is a spanning tree of maximal weight (the weight of a link is the number of variables in its label; the weight of a tree is the sum over its links).
[Figure: the junction graph G_J (not complete) over the cliques abd, ade, ace, ceg, egh, def, and the junction tree G_JT that keeps the links ad, ae, ce, de, eg.]
There are several methods to find a maximum-weight spanning tree.
Kruskal’s algorithm: successively choose a link of maximal weight unless it creates a cycle.
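Kruskal's rule can be sketched with a union-find structure for cycle detection. This is an illustration assuming, as in the theorem above, that a link's weight is the number of variables in its separator label; `max_weight_spanning_tree` is a hypothetical name.

```python
from itertools import combinations

CLIQUES = ["abd", "ade", "ace", "ceg", "egh", "def"]


def max_weight_spanning_tree(cliques):
    """Kruskal's algorithm with the heaviest junction-graph links first."""
    links = []
    for c1, c2 in combinations(cliques, 2):
        sep = set(c1) & set(c2)
        if sep:
            # Link weight = number of variables in its label (the separator).
            links.append((len(sep), c1, c2, "".join(sorted(sep))))
    links.sort(key=lambda link: link[0], reverse=True)

    parent = {c: c for c in cliques}  # union-find forest for cycle detection

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    tree = []
    for _, c1, c2, sep in links:
        r1, r2 = find(c1), find(c2)
        if r1 != r2:  # skip links that would create a cycle
            parent[r1] = r2
            tree.append((c1, c2, sep))
    return tree


for c1, c2, sep in max_weight_spanning_tree(CLIQUES):
    print(f"{c1} -- {sep} -- {c2}")
```

Here the five links of weight 2 (ad, ae, ce, de, eg) already form a spanning tree of total weight 10, so every weight-1 link is rejected and the result is the junction tree G_JT.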
Colorful example
• Compute the elimination cliques (the order here is f, d, e, c, b, a).
• Form the complete junction graph over the maximal elimination cliques and find a maximum-weight spanning tree.
Principle of Inference
DAG → Junction Tree
→ Initialization → Inconsistent Junction Tree
→ Propagation → Consistent Junction Tree
→ Marginalization → $P(V = v \mid E = e)$
[Figure: in the junction tree G_JT, the cliques abd, ade, ace, ceg, egh, def become vertices, joined through the sepsets ad, ae, ce, de, eg; e.g. ceg ∩ egh = eg.]
Potentials
DEFINITION: A potential $\phi_A$ over a set of variables $X_A$ is a function that maps each instantiation $x_A$ into a non-negative real number.
Ex: A potential $\phi_{abc}$ over the set of vertices {a,b,c}. $X_a$ has four states, and $X_b$ and $X_c$ have three states, so $\phi_{abc}$ has 4 × 3 × 3 = 36 entries.
A joint probability is a special case of a potential where $\sum_{x_A} \phi_A(x_A) = 1$.
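As a small illustration of the definition, the sketch below stores $\phi_{abc}$ as a table with one entry per instantiation. The entry values (1 + a + b·c) are arbitrary placeholders, not data from the slides, and `tabulate` is a hypothetical helper.

```python
from itertools import product

# Hypothetical example: a potential phi_abc where X_a has 4 states and
# X_b, X_c have 3 states each. The values are arbitrary non-negative numbers.
STATES = {"a": range(4), "b": range(3), "c": range(3)}


def tabulate(states, value):
    """Store a potential as one non-negative number per joint instantiation."""
    names = list(states)
    return {x: value(*x) for x in product(*(states[n] for n in names))}


phi_abc = tabulate(STATES, lambda a, b, c: 1.0 + a + b * c)
total = sum(phi_abc.values())
# Rescaling so the entries sum to 1 turns the potential into a joint probability.
p_abc = {x: v / total for x, v in phi_abc.items()}
print(len(phi_abc), total)
```

The entries sum to 126, so dividing by 126 yields a potential whose entries sum to one, i.e. a joint probability over $X_a, X_b, X_c$.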
The potentials in the junction tree are not consistent with each other, i.e. if we use marginalization to get the probability distribution for a variable $X_u$, we will get different results depending on which clique we use.
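[Figure: the junction tree with cliques abd, ade, ace, ceg, egh, def and sepsets ad, ae, ce, de, eg.]
$$P(X_a) = \sum_{d,e} \phi_{ade} = (0.02, 0.43, 0.31, 0.12)$$
$$P(X_a) = \sum_{c,e} \phi_{ace} = (0.12, 0.33, 0.11, 0.03)$$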
The potentials might not even sum to one, so they are not joint probability distributions.
Message Passing from clique A to clique B
1. Project the potential of A into $S_{AB}$
2. Absorb the potential of $S_{AB}$ into B
[Figure: projection from A onto the sepset $S_{AB}$, then absorption into B.]
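Projection and absorption can be sketched as two table operations. This assumes the Hugin-style absorption used in the worked example later in these slides (multiply by the new sepset potential divided by the old one); the cliques A = {x, y} and B = {y, z}, their values, and the function names are hypothetical.

```python
def project(phi, scope, sep):
    """Projection: marginalize clique potential phi (over `scope`) onto `sep`."""
    idx = [scope.index(v) for v in sep]
    out = {}
    for x, val in phi.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + val
    return out


def absorb(phi, scope, sep, new_sep, old_sep):
    """Absorption: multiply phi by new_sep / old_sep on the shared variables."""
    idx = [scope.index(v) for v in sep]
    out = {}
    for x, val in phi.items():
        key = tuple(x[i] for i in idx)
        old = old_sep[key]
        # Where the old sepset entry is 0 the result is set to 0 (0/0 = 0 convention).
        out[x] = 0.0 if old == 0 else val * new_sep[key] / old
    return out


# Hypothetical cliques A = {x, y} and B = {y, z} with sepset S_AB = {y}.
phi_A = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}  # scope (x, y)
phi_B = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5}  # scope (y, z): P(z | y)
phi_S = {(0,): 1.0, (1,): 1.0}                                # sepset starts at 1

new_S = project(phi_A, ("x", "y"), ("y",))                    # 1. projection
phi_B = absorb(phi_B, ("y", "z"), ("y",), new_S, phi_S)       # 2. absorption
phi_S = new_S
print(project(phi_B, ("y", "z"), ("y",)), phi_S)  # B now agrees with A on y
```

Because $\phi_B$ here is a conditional table P(z | y), after absorption its projection onto y equals the new sepset (0.4, 0.6), so A and B agree on y.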
Propagating potentials
1. COLLECT-EVIDENCE: messages 1-5
2. DISTRIBUTE-EVIDENCE: messages 6-10
Global Propagation
[Figure: the junction tree with cliques abd, ade, ace, ceg, egh, def and a chosen root; messages 1-5 are passed toward the root, and messages 6-10 are passed back out to the leaves.]
A priori distribution
• After global propagation, the potentials are consistent
• Marginalization gives the probability distributions for the variables
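The order of the ten messages can be generated recursively from any root. The sketch below only produces the schedule (which clique sends to which), using the junction tree from the slides with the root ace chosen arbitrarily; the function names are hypothetical.

```python
# Junction tree from the slides as an adjacency list (sepsets ad, ae, ce, de, eg).
TREE = {
    "abd": ["ade"], "ade": ["abd", "ace", "def"], "ace": ["ade", "ceg"],
    "ceg": ["ace", "egh"], "egh": ["ceg"], "def": ["ade"],
}


def collect(tree, node, parent=None):
    """COLLECT-EVIDENCE: a neighbour sends to `node` after its own subtree has."""
    msgs = []
    for n in tree[node]:
        if n != parent:
            msgs += collect(tree, n, node)
            msgs.append((n, node))
    return msgs


def distribute(tree, node, parent=None):
    """DISTRIBUTE-EVIDENCE: `node` sends to a neighbour, which then passes it on."""
    msgs = []
    for n in tree[node]:
        if n != parent:
            msgs.append((node, n))
            msgs += distribute(tree, n, node)
    return msgs


def global_propagation(tree, root):
    """Message schedule: collect toward the root, then distribute from it."""
    return collect(tree, root) + distribute(tree, root)


# The root choice is arbitrary; "ace" is a hypothetical pick.
for i, (src, dst) in enumerate(global_propagation(TREE, "ace"), start=1):
    print(i, src, "->", dst)
```

Messages 1-5 flow toward the root and messages 6-10 retrace the same links in the opposite direction.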
Example: Create Join Tree
[Figure: Bayesian network with edges B → A, B → C, C → D.]
(this BN corresponds to an HMM with 2 time steps)
Junction Tree: A,B - [B] - B,C - [C] - C,D (sepsets in brackets)
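Example: Initialization
Variable | Associated cluster | Potential function
B | A,B | $\psi_{A,B} = P(B)$
A | A,B | $\psi_{A,B} = P(B)P(A \mid B)$
C | B,C | $\psi_{B,C} = P(C \mid B)$
D | C,D | $\psi_{C,D} = P(D \mid C)$
Each variable's CPT is multiplied into the potential of its associated cluster; the sepset potentials start at 1.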
Example: Collect Evidence
• Choose an arbitrary clique, e.g. B,C, where all potential functions will be collected.
• Recursively call neighboring cliques for messages:
• 1. Call A,B:
  – 1. Projection onto B: $\phi_B = \sum_A \psi_{A,B} = \sum_A P(B)P(A \mid B) = P(B)$
  – 2. Absorption: $\psi_{B,C} = \psi_{B,C}^{old} \, \phi_B = P(C \mid B)P(B) = P(B,C)$
Example: Collect Evidence (cont.)
• 2. Call C,D:
  – 1. Projection: $\phi_C = \sum_D \psi_{C,D} = \sum_D P(D \mid C) = 1$
  – 2. Absorption: $\psi_{B,C} = \psi_{B,C}^{old} \, \phi_C = P(B,C)$
Example: Distribute Evidence
• Pass messages recursively to neighboring nodes
• Pass message from B,C to A,B:
  – 1. Projection: $\phi_B = \sum_C \psi_{B,C} = \sum_C P(B,C) = P(B)$
  – 2. Absorption: $\psi_{A,B} = \psi_{A,B}^{old} \, \frac{\phi_B}{\phi_B^{old}} = P(A,B)\,\frac{P(B)}{P(B)} = P(A,B)$
Example: Distribute Evidence (cont.)
• Pass message from B,C to C,D:
  – 1. Projection: $\phi_C = \sum_B \psi_{B,C} = \sum_B P(B,C) = P(C)$
  – 2. Absorption: $\psi_{C,D} = \psi_{C,D}^{old} \, \frac{\phi_C}{\phi_C^{old}} = P(D \mid C)\,\frac{P(C)}{1} = P(C,D)$
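The whole HMM example can be replayed numerically. The CPT values below are hypothetical (the slides give none); the steps follow the initialization, collect, and distribute slides above, with absorption written as multiplication by the new over the old sepset potential.

```python
# HYPOTHETICAL CPTs for B -> A, B -> C, C -> D; all variables binary (0/1).
VALS = (0, 1)
P_B = {0: 0.6, 1: 0.4}
P_A_given_B = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.1, (1, 1): 0.9}   # key (a, b)
P_C_given_B = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.25, (1, 1): 0.75}  # key (c, b)
P_D_given_C = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}   # key (d, c)

# Initialization: each CPT goes into its cluster; sepsets phi_B, phi_C start at 1.
psi_AB = {(a, b): P_B[b] * P_A_given_B[(a, b)] for a in VALS for b in VALS}
psi_BC = {(b, c): P_C_given_B[(c, b)] for b in VALS for c in VALS}
psi_CD = {(c, d): P_D_given_C[(d, c)] for c in VALS for d in VALS}
phi_B = {b: 1.0 for b in VALS}
phi_C = {c: 1.0 for c in VALS}

# Collect evidence into the root B,C.
new_B = {b: sum(psi_AB[(a, b)] for a in VALS) for b in VALS}                # P(B)
psi_BC = {(b, c): v * new_B[b] / phi_B[b] for (b, c), v in psi_BC.items()}
phi_B = new_B
new_C = {c: sum(psi_CD[(c, d)] for d in VALS) for c in VALS}                # all ones
psi_BC = {(b, c): v * new_C[c] / phi_C[c] for (b, c), v in psi_BC.items()}  # P(B,C)
phi_C = new_C

# Distribute evidence from the root B,C.
new_B = {b: sum(psi_BC[(b, c)] for c in VALS) for b in VALS}                # P(B)
psi_AB = {(a, b): v * new_B[b] / phi_B[b] for (a, b), v in psi_AB.items()}  # P(A,B)
phi_B = new_B
new_C = {c: sum(psi_BC[(b, c)] for b in VALS) for c in VALS}                # P(C)
psi_CD = {(c, d): v * new_C[c] / phi_C[c] for (c, d), v in psi_CD.items()}  # P(C,D)
phi_C = new_C

p_D = {d: sum(psi_CD[(c, d)] for c in VALS) for d in VALS}
print(p_D)
```

With these numbers P(C) = (0.58, 0.42) and P(D) = (0.69, 0.31); after propagation each clique potential is the joint distribution of its variables, so any clique containing a variable gives the same marginal for it.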
Netica’s Animal Characteristics BN
Subnet 1:
* JTnode 1: An,En
* JTnode 2: An,Sh
* JTnode 3: An,Cl
Subnet 2:
* JTnode 4: Cl,Yo
* JTnode 5: Cl,Wa
Subnet 3:
* JTnode 6: Cl,Bod