Statistical Methods in AI/ML

Bucket elimination

Vibhav Gogate

Bucket Elimination: Initialization

[Figure: the functions (A,B), (A,C), (B,D), (C,D), (C,E), (D,F) placed in buckets along the elimination order]

• You put each function in exactly one bucket
• How?
– Along the order, find the first bucket such that one of the variables in the function’s scope is the bucket variable
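The placement rule can be sketched in a few lines of Python. This is a minimal illustration rather than the lecture's own code: the function name `init_buckets`, the choice to represent each function only by its scope, and the elimination order A, E, D, F, B, C are all assumptions made for the example.

```python
def init_buckets(scopes, order):
    """Place each function (represented by its scope) in exactly one bucket."""
    position = {var: i for i, var in enumerate(order)}
    buckets = {var: [] for var in order}
    for scope in scopes:
        # The bucket variable is the first variable along the order
        # that appears in the function's scope.
        bucket_var = min(scope, key=position.__getitem__)
        buckets[bucket_var].append(scope)
    return buckets

# Hypothetical example: seven pairwise functions and an assumed order.
scopes = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
          ("C", "E"), ("D", "F"), ("E", "F")]
order = ["A", "E", "D", "F", "B", "C"]
buckets = init_buckets(scopes, order)
for var in order:
    print(var, buckets[var])
```

Under this order the buckets of F, B and C start out empty; they only receive functions once earlier buckets are processed.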

Bucket elimination: Processing Buckets

• Process in order
• Multiply all the functions in the bucket
• Sum-out the bucket variable
• Put the new function in one of the buckets obeying the initialization constraint

[Figure: the buckets after processing. Original functions: (E,F), (C,E), (D,F), (B,D), (C,D), (A,C), (A,B). Generated functions: ψ(B,C), ψ(C,F), ψ(B,C,F), ψ2(B,C)]
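The four processing steps can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference implementation: variables are binary, a function is a `(scope, table)` pair with the table stored as a dict, every variable in the order occurs in at least one function, and the names `multiply`, `sum_out`, `bucket_elimination` and `pairwise` are hypothetical helpers introduced for the example.

```python
from itertools import product

def multiply(f, g):
    """Product of two functions; a function is (scope, table) over binary variables."""
    (f_scope, f_table), (g_scope, g_table) = f, g
    scope = tuple(dict.fromkeys(f_scope + g_scope))  # ordered union of scopes
    table = {}
    for values in product((0, 1), repeat=len(scope)):
        a = dict(zip(scope, values))
        table[values] = (f_table[tuple(a[v] for v in f_scope)]
                         * g_table[tuple(a[v] for v in g_scope)])
    return scope, table

def sum_out(f, var):
    """Sum a variable out of a function."""
    scope, table = f
    i = scope.index(var)
    new_table = {}
    for values, weight in table.items():
        key = values[:i] + values[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + weight
    return scope[:i] + scope[i + 1:], new_table

def bucket_elimination(functions, order):
    """Return Z. Assumes every variable in `order` occurs in some function."""
    buckets = {var: [] for var in order}
    z = 1.0

    def place(function):
        # Initialization constraint: first bucket along the order whose
        # variable is in the function's scope.
        for var in order:
            if var in function[0]:
                buckets[var].append(function)
                return True
        return False  # empty scope: the function is a constant

    for function in functions:
        place(function)
    for var in order:                       # process in order
        result = ((), {(): 1.0})
        for function in buckets[var]:       # multiply all functions in the bucket
            result = multiply(result, function)
        new_function = sum_out(result, var)  # sum out the bucket variable
        if not place(new_function):          # put the new function in a bucket
            z *= new_function[1][()]
    return z

def pairwise(x, y, w):
    # Hypothetical potential: weight w when the two variables disagree.
    return (x, y), {(0, 0): 1.0, (0, 1): w, (1, 0): w, (1, 1): 1.0}

order = ["A", "E", "D", "F", "B", "C"]      # assumed elimination order
functions = [pairwise("A", "B", 2.0), pairwise("A", "C", 0.5),
             pairwise("B", "D", 3.0), pairwise("C", "D", 1.5),
             pairwise("C", "E", 0.25), pairwise("D", "F", 4.0),
             pairwise("E", "F", 2.5)]
z = bucket_elimination(functions, order)
print(z)
```

Because an eliminated variable never appears in a newly generated function, scanning the order from the start always lands the new function in a bucket that has not been processed yet.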

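Bucket elimination: Why it works?

[Figure: the processed buckets, with the original functions (C,E), (E,F), (D,F), (B,D), (C,D), (A,C), (A,B) and the generated functions ψ(B,C), ψ(C,F), ψ(B,C,F), ψ2(B,C)]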

$$Z = \sum_{c} \sum_{b} \sum_{f} \sum_{d} \sum_{e} \sum_{a} \phi(a,b)\,\phi(a,c)\,\phi(b,d)\,\phi(c,d)\,\phi(c,e)\,\phi(d,f)\,\phi(e,f)$$

[Figure: the same bucket diagram, shown once per elimination step, and so on.]
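Written out for this example, the argument is that each sum can be pushed inward past every function that does not mention its variable. The derivation below is a sketch that assumes the elimination order a, e, d, f, b, c implied by the nesting of the sums. The intermediate functions are named as in the bucket diagram; the final ψ(c) is a label added here for the last step.

```latex
\begin{align*}
Z &= \sum_{c}\sum_{b}\sum_{f}\sum_{d}\phi(b,d)\,\phi(c,d)\,\phi(d,f)
     \sum_{e}\phi(c,e)\,\phi(e,f)\sum_{a}\phi(a,b)\,\phi(a,c) \\
  &= \sum_{c}\sum_{b}\psi(b,c)\sum_{f}\psi(c,f)\sum_{d}\phi(b,d)\,\phi(c,d)\,\phi(d,f) \\
  &= \sum_{c}\sum_{b}\psi(b,c)\sum_{f}\psi(c,f)\,\psi(b,c,f) \\
  &= \sum_{c}\sum_{b}\psi(b,c)\,\psi_2(b,c) \;=\; \sum_{c}\psi(c)
\end{align*}
% where
%   \psi(b,c)   = \sum_a \phi(a,b)\,\phi(a,c)
%   \psi(c,f)   = \sum_e \phi(c,e)\,\phi(e,f)
%   \psi(b,c,f) = \sum_d \phi(b,d)\,\phi(c,d)\,\phi(d,f)
%   \psi_2(b,c) = \sum_f \psi(c,f)\,\psi(b,c,f)
```

Each line corresponds to processing one bucket: the inner sum is replaced by the function it generates, which then sits in a later bucket.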

Bucket elimination: Complexity

[Figure: the processed buckets annotated with per-bucket costs exp(3), exp(4), exp(3), exp(2), exp(1); total ≈ 6 exp(3)]

Complexity: O(n exp(w))
– w: scope size of the largest function generated
– n: number of variables

Bucket elimination: Determining complexity graphically

• Schematic operation on a graph
– Process nodes in order
– Connect all children of a node to each other

Bucket elimination: Complexity

• Complexity of processing a bucket “i”
– exp(children_i + 1): the bucket's table ranges over the bucket variable and its children
• Complexity of bucket elimination
– n exp(max_i(children_i) + 1)
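The graphical procedure can be simulated without touching any tables. The sketch below is an illustration under assumptions: "children" is read as the neighbors of a node that have not yet been processed, and the function name `schematic_elimination`, the edge list and the order A, E, D, F, B, C are hypothetical choices for the example.

```python
def schematic_elimination(edges, order):
    """Return {variable: number of children} for the given order."""
    adj = {var: set() for var in order}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    children = {}
    for var in order:
        nbrs = adj.pop(var)           # neighbors not yet processed
        children[var] = len(nbrs)
        for u in nbrs:
            adj[u].discard(var)
            adj[u] |= nbrs - {u}      # connect the children to each other
    return children

# Hypothetical interaction graph: one edge per pairwise function.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
         ("C", "E"), ("D", "F"), ("E", "F")]
order = ["A", "E", "D", "F", "B", "C"]
children = schematic_elimination(edges, order)
width = max(children.values())
print(children, width)
```

For this order, D has three children (B, C and F), so its bucket ranges over four variables. That is the most expensive bucket in the run.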

Treewidth and Tree Decompositions

• Running schematic bucket elimination yields a chordal graph
– Each cycle of length > 3 has a chord (an edge connecting two nodes that are not adjacent in the cycle)
• Every chordal graph can be represented using a tree decomposition

Tree Decomposition of Chordal graphs

Tree Decomposition and Treewidth: Definition

• Given a network and its interaction graph
• Tree Decomposition is a set of subsets of variables connected by a tree such that:
– Each variable is present in at least one subset
– Each edge is present in at least one subset
– The set of subsets containing a variable “X” forms a connected sub-tree (the running intersection property)
• Width of a tree decomposition: cardinality of the maximum subset minus 1
• Treewidth: minimum width out of all possible tree decompositions
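The three conditions and the width are easy to check mechanically. The following is a minimal sketch, assuming the caller supplies edges that really do form a tree over the subsets (this is not verified). The names `is_tree_decomposition` and `width` and the example clusters ABC, BCDF, CEF are assumptions made for illustration.

```python
def is_tree_decomposition(variables, graph_edges, bags, tree_edges):
    """Check the three conditions; `tree_edges` is assumed to form a tree."""
    bag_sets = {name: set(bag) for name, bag in bags.items()}
    # 1. Each variable is present in at least one subset.
    if not all(any(v in bag for bag in bag_sets.values()) for v in variables):
        return False
    # 2. Each edge is present in at least one subset.
    if not all(any({u, v} <= bag for bag in bag_sets.values())
               for u, v in graph_edges):
        return False
    # 3. The subsets containing a variable form a connected sub-tree.
    for v in variables:
        nodes = {name for name, bag in bag_sets.items() if v in bag}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for a, b in tree_edges:
                other = b if a == node else a if b == node else None
                if other in nodes and other not in seen:
                    seen.add(other)
                    stack.append(other)
        if seen != nodes:
            return False
    return True

def width(bags):
    """Cardinality of the largest subset minus 1."""
    return max(len(bag) for bag in bags.values()) - 1

graph_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
               ("C", "E"), ("D", "F"), ("E", "F")]
bags = {"ABC": "ABC", "BCDF": "BCDF", "CEF": "CEF"}   # hypothetical clusters
good_tree = [("ABC", "BCDF"), ("BCDF", "CEF")]
bad_tree = [("ABC", "CEF"), ("CEF", "BCDF")]          # B breaks condition 3
print(is_tree_decomposition("ABCDEF", graph_edges, bags, good_tree), width(bags))
print(is_tree_decomposition("ABCDEF", graph_edges, bags, bad_tree))
```

The second tree fails because B appears in ABC and BCDF but not in CEF, which lies between them. The width computed here belongs to this particular decomposition; the treewidth is the minimum over all decompositions and may be smaller.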

Bucket elimination: Complexity

• Best possible complexity: O(n exp(w+1)) where w is the treewidth of the graph

• Thus, we have a graph-based algorithm for determining the complexity of bucket elimination.

• If w is small, we can solve the problem efficiently!

Generating Tree Decompositions

• Computing treewidth is NP-hard
• Branch and Bound algorithm (Gogate & Dechter, 2004)
• Best-first search algorithm
– (Dow and Korf, 2009)
• Heuristics in practice
– min-fill heuristic
– min-degree heuristic

Min-degree and min-fill

• min-degree
– At each point, select a variable with minimum degree (ties broken arbitrarily)
– Connect the children of the variable to each other
• min-fill
– At each point, select a variable that adds the minimum number of edges to the current graph
– Connect the children of the selected variable to each other
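Both heuristics share the same greedy loop and differ only in how a candidate variable is scored. The sketch below makes that explicit. The names `greedy_order`, `degree` and `fill` are hypothetical, ties are broken alphabetically (one arbitrary choice among many), and the graph is the assumed seven-edge example.

```python
def greedy_order(edges, score):
    """Build an elimination order by repeatedly picking the lowest-scoring variable."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    order = []
    while adj:
        # Lowest score wins; ties go to the alphabetically first variable.
        var = min(sorted(adj), key=lambda x: score(adj, x))
        nbrs = adj.pop(var)
        for u in nbrs:
            adj[u].discard(var)
            adj[u] |= nbrs - {u}      # connect the children to each other
        order.append(var)
    return order

def degree(adj, var):
    """min-degree score: number of neighbors in the current graph."""
    return len(adj[var])

def fill(adj, var):
    """min-fill score: number of edges that eliminating `var` would add."""
    nbrs = sorted(adj[var])
    return sum(1 for i, a in enumerate(nbrs)
               for b in nbrs[i + 1:] if b not in adj[a])

edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
         ("C", "E"), ("D", "F"), ("E", "F")]
min_degree_order = greedy_order(edges, degree)
min_fill_order = greedy_order(edges, fill)
print(min_degree_order, min_fill_order)
```

On this small graph the two heuristics happen to agree; on larger graphs they generally produce different orders.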

Computing all Marginals

• Bucket elimination computes
– P(e) or Z
– P(Xi|e) where “Xi” is the last variable eliminated
• To compute all marginals P(Xi|e) for all variables Xi
– Run bucket elimination “n” times
• Efficient algorithm
– Junction tree algorithm or bucket tree propagation
– Requires only two passes to compute all marginals

Junction tree algorithm: An exact message passing algorithm

• Construct a tree decomposition T
• Initialize the tree decomposition as in bucket elimination
• Select an arbitrary node of T as root
• Pass messages from leaves to root (upward pass)
• Pass messages from root to leaves (downward pass)

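Message passing Equations

To send a message from cluster S to its neighbor R:

• Multiply all received messages except from R
• Multiply all functions
• Sum-out all variables except the separator

$$m(S \to R) = \sum_{\mathrm{Vars}(S) - \mathrm{Sep}(S,R)} \; \prod_{f \in \mathrm{functions}(S)} f \prod_{G \in \mathrm{Neighbors}(S) - R} m(G \to S)$$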
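A message can be computed directly from this recipe. The sketch below is illustrative only and rests on several assumptions: variables are binary, functions are `(scope, table)` pairs, the tree has the hypothetical clusters ABC, BCDF and CEF, and all helper names are invented for the example. Instead of scheduling an explicit upward and downward pass, it uses a memoized recursion, which computes each directed message exactly once and so does the same work as the two passes.

```python
from itertools import product

def multiply(f, g):
    """Product of two functions over binary variables."""
    (f_scope, f_table), (g_scope, g_table) = f, g
    scope = tuple(dict.fromkeys(f_scope + g_scope))
    table = {}
    for values in product((0, 1), repeat=len(scope)):
        a = dict(zip(scope, values))
        table[values] = (f_table[tuple(a[v] for v in f_scope)]
                         * g_table[tuple(a[v] for v in g_scope)])
    return scope, table

def sum_out(f, var):
    """Sum a variable out of a function."""
    scope, table = f
    i = scope.index(var)
    new_table = {}
    for values, weight in table.items():
        key = values[:i] + values[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + weight
    return scope[:i] + scope[i + 1:], new_table

def message(src, dst, bags, functions, neighbors, cache):
    """m(src -> dst); each directed message is computed once and cached."""
    if (src, dst) not in cache:
        result = ((), {(): 1.0})
        for f in functions[src]:              # multiply all functions
            result = multiply(result, f)
        for g in neighbors[src]:              # multiply received messages
            if g != dst:                      # ... except the one from dst
                result = multiply(
                    result, message(g, src, bags, functions, neighbors, cache))
        separator = set(bags[src]) & set(bags[dst])
        for var in result[0]:                 # sum out all but the separator
            if var not in separator:
                result = sum_out(result, var)
        cache[(src, dst)] = result
    return cache[(src, dst)]

def cluster_belief(node, bags, functions, neighbors, cache):
    """Unnormalized marginal over a cluster: its functions times all incoming messages."""
    result = ((), {(): 1.0})
    for f in functions[node]:
        result = multiply(result, f)
    for g in neighbors[node]:
        result = multiply(result, message(g, node, bags, functions, neighbors, cache))
    return result

def pairwise(x, y, w):
    # Hypothetical potential: weight w when the two variables disagree.
    return (x, y), {(0, 0): 1.0, (0, 1): w, (1, 0): w, (1, 1): 1.0}

bags = {"ABC": "ABC", "BCDF": "BCDF", "CEF": "CEF"}
neighbors = {"ABC": ["BCDF"], "BCDF": ["ABC", "CEF"], "CEF": ["BCDF"]}
functions = {
    "ABC": [pairwise("A", "B", 2.0), pairwise("A", "C", 0.5)],
    "BCDF": [pairwise("B", "D", 3.0), pairwise("C", "D", 1.5),
             pairwise("D", "F", 4.0)],
    "CEF": [pairwise("C", "E", 0.25), pairwise("E", "F", 2.5)],
}
cache = {}
z_by_cluster = {node: sum(cluster_belief(node, bags, functions,
                                         neighbors, cache)[1].values())
                for node in bags}
print(z_by_cluster, len(cache))
```

Summing any cluster's belief gives the same Z, and normalizing a belief by Z gives the marginal over that cluster's variables. The cache ends up holding one message per direction of each tree edge.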

Computing all marginals

Message passing Equations

• Select “EFC” as root
• Pass messages from leaves to root
• Pass messages from root to leaves

[Figure: tree decomposition with the functions (C,E), (E,F), (A,C), (A,B) assigned to clusters]

Architectures

• Shenoy-Shafer architecture
• Hugin architecture
– Associate one function with each cluster
– Requires multiplication
– Smaller time complexity
– Higher space complexity
