statistical methods in ai/ml
Post on 14-Feb-2016
Statistical Methods in AI/ML
Bucket elimination
Vibhav Gogate
Bucket Elimination: Initialization
[Figure: interaction graph over the variables A, B, C, D, E, F with functions (A,B), (A,C), (B,D), (C,D), (C,E), (D,F), (E,F).]
• You put each function in exactly one bucket
• How?
• Along the order, find the first bucket such that one of the variables in the function’s scope is the bucket variable
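[Figure: buckets along the order A, E, D, F, B, C. Bucket A: (A,B), (A,C). Bucket E: (C,E), (E,F). Bucket D: (B,D), (C,D), (D,F). Buckets F, B, C: empty.]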
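The initialization rule can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the name `init_buckets` and the representation of a function by its scope (a tuple of variable names) are assumptions made for the example.

```python
def init_buckets(order, scopes):
    """Put each function (given by its scope) in exactly one bucket:
    the first bucket along `order` whose variable is in the scope."""
    buckets = {var: [] for var in order}
    for scope in scopes:
        for var in order:
            if var in scope:
                buckets[var].append(scope)
                break
    return buckets

# Order and function scopes from the running example.
order = ["A", "E", "D", "F", "B", "C"]
scopes = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
          ("C", "E"), ("D", "F"), ("E", "F")]

buckets = init_buckets(order, scopes)
for var in order:
    print(var, buckets[var])
```

Buckets F, B and C start out empty here; they only receive functions once earlier buckets are processed.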
Bucket elimination: Processing Buckets
• Process in order
• Multiply all the functions in the bucket
• Sum-out the bucket variable
• Put the new function in one of the buckets obeying the initialization constraint
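[Figure: processing the buckets along the order A, E, D, F, B, C. Bucket A generates ψ(B,C), placed in bucket B. Bucket E generates ψ(C,F), placed in bucket F. Bucket D generates ψ(B,C,F), placed in bucket F. Bucket F generates ψ2(B,C), placed in bucket B. Bucket B generates ψ(C), placed in bucket C. Bucket C generates Z.]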
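The processing loop can be sketched as follows. This is a minimal sketch under stated assumptions: variables are binary, a function is stored as a pair (scope, table) with the table keyed by value tuples, and the potential values are made up for illustration. The names `sum_out` and `bucket_elimination` are hypothetical.

```python
from itertools import product

def sum_out(factors, var, domain):
    """Multiply all (scope, table) factors and sum out `var`."""
    scope = sorted({v for s, _ in factors for v in s})
    out_scope = tuple(v for v in scope if v != var)
    table = {}
    for values in product(domain, repeat=len(scope)):
        assign = dict(zip(scope, values))
        weight = 1.0
        for s, t in factors:
            weight *= t[tuple(assign[v] for v in s)]
        key = tuple(assign[v] for v in out_scope)
        table[key] = table.get(key, 0.0) + weight
    return out_scope, table

def bucket_elimination(order, factors, domain=(0, 1)):
    buckets = {v: [] for v in order}

    def place(factor):
        # Initialization constraint: first bucket whose variable is in the scope.
        for v in order:
            if v in factor[0]:
                buckets[v].append(factor)
                return

    for f in factors:
        place(f)
    z = 1.0
    for var in order:
        if not buckets[var]:
            z *= len(domain)       # a variable in no function contributes |domain|
            continue
        new = sum_out(buckets[var], var, domain)
        if new[0]:
            place(new)             # new function goes to a later bucket
        else:
            z *= new[1][()]        # empty scope: a constant
    return z

# Hypothetical potentials (not from the slides): each function favors equal values.
table = {(x, y): 2.0 if x == y else 1.0 for x in (0, 1) for y in (0, 1)}
scopes = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
          ("C", "E"), ("D", "F"), ("E", "F")]
factors = [(s, table) for s in scopes]
z = bucket_elimination(["A", "E", "D", "F", "B", "C"], factors)
print(z)
```

The result can be checked against a brute-force sum over all 2^6 assignments, which is feasible only because the example is tiny.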
Bucket elimination: Why it works?
$$Z = \sum_{c}\sum_{b}\sum_{f}\sum_{d}\sum_{e}\sum_{a} \phi(a,b)\,\phi(a,c)\,\phi(b,d)\,\phi(c,d)\,\phi(c,e)\,\phi(d,f)\,\phi(e,f)$$
Push each sum inward past the functions that do not mention its variable, replace the innermost sum with a new function ψ, and so on until only Z remains.
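For the order A, E, D, F, B, C, the regrouping can be written out step by step. The derivation below is a reconstruction from the function names ψ(B,C), ψ(C,F), ψ(B,C,F), ψ2(B,C) and ψ(C) in the figure; it relies only on the distributive law, which lets a factor move outside any sum whose variable it does not mention.

```latex
\begin{align*}
Z &= \sum_{c}\sum_{b}\sum_{f}\sum_{d}\sum_{e}\sum_{a}
     \phi(a,b)\,\phi(a,c)\,\phi(b,d)\,\phi(c,d)\,\phi(c,e)\,\phi(d,f)\,\phi(e,f) \\
  &= \sum_{c}\sum_{b}\sum_{f}\sum_{d} \phi(b,d)\,\phi(c,d)\,\phi(d,f)
     \sum_{e} \phi(c,e)\,\phi(e,f)
     \underbrace{\sum_{a} \phi(a,b)\,\phi(a,c)}_{\psi(b,c)} \\
  &= \sum_{c}\sum_{b} \psi(b,c) \sum_{f}
     \underbrace{\Big(\sum_{e} \phi(c,e)\,\phi(e,f)\Big)}_{\psi(c,f)}
     \underbrace{\Big(\sum_{d} \phi(b,d)\,\phi(c,d)\,\phi(d,f)\Big)}_{\psi(b,c,f)} \\
  &= \sum_{c}\sum_{b} \psi(b,c)
     \underbrace{\sum_{f} \psi(c,f)\,\psi(b,c,f)}_{\psi_2(b,c)} \\
  &= \sum_{c} \underbrace{\sum_{b} \psi(b,c)\,\psi_2(b,c)}_{\psi(c)}
   = \sum_{c} \psi(c)
\end{align*}
```

Each underbraced term is exactly what one bucket computes: the product of the functions in the bucket, summed over the bucket variable.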
Bucket elimination: Complexity
[Figure: the buckets along the order A, E, D, F, B, C with their functions and the cost of processing each bucket.]
• Bucket A: exp(3)
• Bucket E: exp(3)
• Bucket D: exp(4)
• Bucket F: exp(3)
• Bucket B: exp(2)
• Bucket C: exp(1)
• Total: ≈ 6 exp(3)
• Complexity: O(n exp(w))
  – w: scope of the largest function generated
  – n: #variables
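The per-bucket exponents can be recovered without any numeric tables by tracking scopes only. The sketch below assumes that the exponent of a bucket is the number of distinct variables among the functions in it, counting the bucket variable itself; the name `bucket_scope_sizes` is hypothetical.

```python
def bucket_scope_sizes(order, scopes):
    """Simulate bucket elimination on scopes only and return, per bucket,
    the number of distinct variables appearing in that bucket."""
    buckets = {v: [] for v in order}

    def place(scope):
        for v in order:
            if v in scope:
                buckets[v].append(scope)
                return

    for s in scopes:
        place(frozenset(s))
    sizes = []
    for var in order:
        combined = frozenset().union(*buckets[var])
        sizes.append(len(combined))
        place(combined - {var})   # scope of the generated function
    return sizes

order = ["A", "E", "D", "F", "B", "C"]
scopes = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
          ("C", "E"), ("D", "F"), ("E", "F")]
sizes = bucket_scope_sizes(order, scopes)
print(dict(zip(order, sizes)))
```

The largest generated function, ψ(B,C,F), has one variable fewer than the largest bucket (bucket D), because the bucket variable is summed out.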
Bucket elimination: Determining complexity graphically
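• Schematic operation on a graph
  – Process nodes in order
  – Connect all children of a node (its neighbors that have not been processed yet) to each other
[Figure: the interaction graph on A, B, C, D, E, F and the same nodes arranged along the elimination order.]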
Bucket elimination: Complexity
• Complexity of processing a bucket “i”
  – exp(children_i)
• Complexity of bucket elimination
  – n exp(max_i(children_i))
[Figure: the ordered graph on A, B, C, D, E, F.]
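The graphical procedure can be sketched directly on an adjacency structure. This is a minimal sketch that reads "children" as the neighbors of a node that are still unprocessed when the node is reached, which is an interpretation rather than something the slides spell out; `schematic_elimination` is a hypothetical name.

```python
def schematic_elimination(order, edges):
    """Process nodes in order; connect the unprocessed neighbors
    (children) of each node to each other. Returns the number of
    children per node and the list of edges that were added."""
    adj = {v: set() for v in order}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    children_counts, fill_edges = {}, []
    for var in order:
        children = sorted(adj.pop(var))
        children_counts[var] = len(children)
        for i, a in enumerate(children):
            adj[a].discard(var)
            for b in children[i + 1:]:
                if b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill_edges.append((a, b))
    return children_counts, fill_edges

order = ["A", "E", "D", "F", "B", "C"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
         ("C", "E"), ("D", "F"), ("E", "F")]
counts, fill = schematic_elimination(order, edges)
print(counts)
print(fill)
```

Under this reading, the children count plus one (for the node itself) reproduces the exponents 3, 3, 4, 3, 2, 1 listed for the buckets of the running example.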
Treewidth and Tree Decompositions
• Running schematic bucket elimination yields a chordal graph
  – Each cycle of length > 3 has a chord (an edge connecting two nodes that are not adjacent in the cycle)
• Every chordal graph can be represented using a tree decomposition
Tree Decomposition of Chordal graphs
[Figure: the chordal graph on A, B, C, D, E, F and its tree decomposition. Clusters: ABC, EFC, DBCF, FBC, BC, C. Separators: BC, FC, FBC, BC, C.]
Tree Decomposition and Treewidth: Definition
• Given a network and its interaction graph
• A tree decomposition is a set of subsets of variables connected by a tree such that:
  – Each variable is present in at least one subset
  – Each edge is present in at least one subset (both endpoints appear together)
  – The subsets containing a variable “X” form a connected sub-tree (running intersection property)
• Width of a tree decomposition: cardinality of the largest subset minus 1
• Treewidth: minimum width over all possible tree decompositions
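The three conditions can be checked mechanically. The sketch below tests them on the clusters ABC, EFC, DBCF, FBC, BC, C of the running example. The tree edges are an assumption reconstructed from the listed separators, and the helper `is_tree_decomposition` is hypothetical; it does not verify that the given edges actually form a tree.

```python
def is_tree_decomposition(clusters, tree_edges, graph_edges):
    """clusters: name -> set of variables; tree_edges: pairs of cluster names."""
    variables = {v for e in graph_edges for v in e}
    # 1. Each variable is present in at least one subset.
    if not all(any(v in c for c in clusters.values()) for v in variables):
        return False
    # 2. Each graph edge is contained in at least one subset.
    if not all(any(set(e) <= c for c in clusters.values()) for e in graph_edges):
        return False
    # 3. Running intersection: clusters containing v form a connected sub-tree.
    for v in variables:
        nodes = {n for n, c in clusters.items() if v in c}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            n = stack.pop()
            for a, b in tree_edges:
                for x, y in ((a, b), (b, a)):
                    if x == n and y in nodes and y not in seen:
                        seen.add(y)
                        stack.append(y)
        if seen != nodes:
            return False
    return True

clusters = {name: set(name) for name in ["ABC", "EFC", "DBCF", "FBC", "BC", "C"]}
# Assumed tree edges (reconstructed, not stated explicitly in the slides).
tree_edges = [("ABC", "BC"), ("EFC", "FBC"), ("DBCF", "FBC"),
              ("FBC", "BC"), ("BC", "C")]
graph_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
               ("C", "E"), ("D", "F"), ("E", "F")]

ok = is_tree_decomposition(clusters, tree_edges, graph_edges)
width = max(len(c) for c in clusters.values()) - 1
print(ok, width)
```

The width computed here is that of this particular decomposition; the treewidth is the minimum over all decompositions and may be smaller.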
Bucket elimination: Complexity
• Best possible complexity: O(n exp(w+1)), where w is the treewidth of the graph
• Thus, we have a graph-based algorithm for determining the complexity of bucket elimination.
• If w is small, we can solve the problem efficiently!
Generating Tree Decompositions
• Computing treewidth is NP-hard
• Branch and Bound algorithm (Gogate & Dechter, 2004)
• Best-first search algorithm (Dow and Korf, 2009)
• Heuristics in practice
  – min-fill heuristic
  – min-degree heuristic
Min-degree and min-fill
• min-degree
  – At each point, select a variable with minimum degree (ties broken arbitrarily)
  – Connect the children of the variable to each other
• min-fill
  – At each point, select a variable that adds the minimum number of edges to the current graph
  – Connect the children of the selected variable to each other
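Both heuristics fit the same greedy loop and differ only in the score used to pick the next variable. The sketch below is one possible implementation, not the lecture's: `greedy_order` is a hypothetical name, and ties are broken alphabetically here only to make the output reproducible (the slides allow arbitrary tie-breaking).

```python
from itertools import combinations

def greedy_order(edges, heuristic="min-fill"):
    """Return (elimination order, width of that order)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def fill_in(v):
        # Number of edges that eliminating v would add.
        return sum(1 for a, b in combinations(adj[v], 2) if b not in adj[a])

    def degree(v):
        return len(adj[v])

    score = fill_in if heuristic == "min-fill" else degree
    order, width = [], 0
    while adj:
        v = min(sorted(adj), key=score)   # alphabetical tie-breaking (assumption)
        nbrs = adj.pop(v)
        width = max(width, len(nbrs))
        for a, b in combinations(nbrs, 2):
            adj[a].add(b)                 # connect the children to each other
            adj[b].add(a)
        for n in nbrs:
            adj[n].discard(v)
        order.append(v)
    return order, width

edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
         ("C", "E"), ("D", "F"), ("E", "F")]
order_fill, width_fill = greedy_order(edges, "min-fill")
order_deg, width_deg = greedy_order(edges, "min-degree")
print(order_fill, width_fill)
print(order_deg, width_deg)
```

On this small graph both heuristics should return an order of width 2, compared with 3 for the order A, E, D, F, B, C used in the running example. On larger graphs neither heuristic is guaranteed to find the treewidth.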
Computing all Marginals
• Bucket elimination computes
  – P(e) or Z
  – P(Xi|e) where “Xi” is the last variable eliminated
• To compute all marginals P(Xi|e) for all variables Xi
  – Run bucket elimination “n” times
• Efficient algorithm
  – Junction tree algorithm or bucket tree propagation
  – Requires only two passes to compute all marginals
Junction tree algorithm: An exact message passing algorithm
• Construct a tree decomposition T
• Initialize the tree decomposition as in bucket elimination
• Select an arbitrary node of T as root
• Pass messages from leaves to root (upward pass)
• Pass messages from root to leaves (downward pass)
Message passing Equations
• Multiply all received messages except the one from R
• Multiply all functions
• Sum-out all variables except the separator
[Figure: two adjacent clusters S and R.]
$$m(S \to R) = \sum_{\mathrm{Vars}(S) - \mathrm{Sep}(S,R)} \; \prod_{f \in \mathrm{functions}(S)} f \prod_{G \in \mathrm{Neighbors}(S) - R} m(G \to S)$$
Computing all marginals
[Figure: a cluster S and its marginal P(S).]
Message passing Equations
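• Select “EFC” as root
• Pass messages from leaves to root
• Pass messages from root to leaves
[Figure: tree decomposition with clusters ABC, EFC, DBCF, FBC, BC, C and separators FC, FBC, BC, C. Functions: (A,B), (A,C) in ABC; (C,E), (E,F) in EFC; (B,D), (C,D), (D,F) in DBCF.]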
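The two passes on this example can be sketched as below. This is a minimal sketch under stated assumptions: the variables are binary, every pairwise function uses the same made-up potential `phi`, and the tree edges are reconstructed from the separators. Messages follow the equation m(S→R): multiply the functions of S and the messages received from every neighbor except R, then sum out everything except the separator. All function names are hypothetical.

```python
from itertools import product

DOMAIN = (0, 1)

def phi(x, y):
    # Hypothetical pairwise potential (not from the slides): favors equal values.
    return 2.0 if x == y else 1.0

# A cluster is named by its variables, e.g. "DBCF" holds D, B, C, F.
clusters = ["ABC", "EFC", "DBCF", "FBC", "BC", "C"]
tree = [("ABC", "BC"), ("EFC", "FBC"), ("DBCF", "FBC"),
        ("FBC", "BC"), ("BC", "C")]          # assumed tree edges
functions = {"ABC": ["AB", "AC"], "EFC": ["CE", "EF"], "DBCF": ["BD", "CD", "DF"]}

neighbors = {c: set() for c in clusters}
for a, b in tree:
    neighbors[a].add(b)
    neighbors[b].add(a)

def local_product(S, assign, msgs, exclude=None):
    """Functions of S times all received messages, except from `exclude`."""
    w = 1.0
    for x, y in functions.get(S, []):
        w *= phi(assign[x], assign[y])
    for G in neighbors[S]:
        if G != exclude:
            sep = [v for v in G if v in S]
            w *= msgs[(G, S)][tuple(assign[v] for v in sep)]
    return w

def send(S, R, msgs):
    """Message m(S -> R): sum out every variable of S outside the separator."""
    sep = [v for v in S if v in R]
    out = {}
    for values in product(DOMAIN, repeat=len(S)):
        assign = dict(zip(S, values))
        key = tuple(assign[v] for v in sep)
        out[key] = out.get(key, 0.0) + local_product(S, assign, msgs, exclude=R)
    return out

def two_pass(root):
    parent, order, stack = {root: None}, [], [root]
    while stack:                              # preorder from the root
        n = stack.pop()
        order.append(n)
        for g in neighbors[n]:
            if g not in parent:
                parent[g] = n
                stack.append(g)
    msgs = {}
    for n in reversed(order):                 # upward pass: leaves to root
        if parent[n] is not None:
            msgs[(n, parent[n])] = send(n, parent[n], msgs)
    for n in order:                           # downward pass: root to leaves
        if parent[n] is not None:
            msgs[(parent[n], n)] = send(parent[n], n, msgs)
    return msgs

msgs = two_pass("EFC")
z_at = {S: sum(local_product(S, dict(zip(S, vals)), msgs)
               for vals in product(DOMAIN, repeat=len(S)))
        for S in clusters}
print(z_at)
```

After both passes every cluster holds enough information to compute Z and its own marginal, so the same normalizing constant should appear at each cluster.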
Architectures
• Shenoy-Shafer architecture
• Hugin architecture
  – Associate one function with each cluster
  – Requires multiplication
  – Smaller time complexity
  – Higher space complexity