the sum-product algorithm
The Sum-Product Algorithm
• Use the factor graph framework to derive an algorithm that is applicable to tree-structured graphs
• Focus on the problem of evaluating local marginals
• Assume that the original graph is an undirected tree, a directed tree, or a polytree
• First, convert the original graph into a factor graph so that all three cases can be handled within the same framework
Goal
• The goal is to exploit the structure of the graph to achieve two things:
(i) To obtain an efficient, exact inference algorithm for finding marginals
(ii) In situations where several marginals are required, to allow computations to be shared efficiently
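The Sum-Product Algorithm
• Suppose that all of the variables are hidden
• By definition, the marginal is obtained by summing the joint distribution over all variables except $x$:
$$p(x) = \sum_{\mathbf{x} \setminus x} p(\mathbf{x})$$
where $\mathbf{x} \setminus x$ is the set of variables in $\mathbf{x}$ without including $x$
• Use the factor graph expression for the joint distribution, $p(\mathbf{x}) = \prod_s f_s(\mathbf{x}_s)$
• Then, interchange the summations and the products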
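The effect of interchanging sums and products can be checked numerically. The sketch below is illustrative only: it assumes a hypothetical chain of three variables with two random non-negative factors `f_a(x1, x2)` and `f_b(x2, x3)` (none of these names come from the slides) and compares the brute-force marginal of `x2` with the version in which each sum is pushed inside the product.

```python
import itertools
import random

random.seed(0)
K = 3  # states per variable (assumed for illustration)

# Hypothetical non-negative factors f_a(x1, x2) and f_b(x2, x3).
f_a = [[random.random() for _ in range(K)] for _ in range(K)]
f_b = [[random.random() for _ in range(K)] for _ in range(K)]

# Brute force: sum the joint over all variables except x2 (K**3 terms).
brute = [0.0] * K
for x1, x2, x3 in itertools.product(range(K), repeat=3):
    brute[x2] += f_a[x1][x2] * f_b[x2][x3]

# Interchanged: each sum is pushed inside the product and evaluated separately.
fast = [
    sum(f_a[x1][x2] for x1 in range(K)) * sum(f_b[x2][x3] for x3 in range(K))
    for x2 in range(K)
]

# Both give the same unnormalized marginal; normalize to obtain p(x2).
Z = sum(fast)
p_x2 = [v / Z for v in fast]
```

The brute-force loop touches every joint configuration, while the interchanged form only needs one small sum per factor and per state of `x2`.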
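The Sum-Product Algorithm
• Consider the following graph [figure not reproduced in this transcript]
• Joint distribution:
$$p(\mathbf{x}) = \prod_{s \in \mathrm{ne}(x)} F_s(x, X_s)$$
• $\mathrm{ne}(x)$: the set of factor nodes that are neighbours of $x$
• $X_s$: the set of all variables in the subtree connected to $x$ via the factor node $f_s$
• $F_s(x, X_s)$: the product of all the factors in the group associated with factor $f_s$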
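The Sum-Product Algorithm
• Substituting the joint distribution into the marginal and interchanging the sums and products:
$$p(x) = \prod_{s \in \mathrm{ne}(x)} \left[ \sum_{X_s} F_s(x, X_s) \right]$$
• Introduce a set of functions:
$$\mu_{f_s \to x}(x) \equiv \sum_{X_s} F_s(x, X_s)$$
• View $\mu_{f_s \to x}(x)$ as messages from the factor node $f_s$ to the variable node $x$
$$p(x) = \prod_{s \in \mathrm{ne}(x)} \left[ \sum_{X_s} F_s(x, X_s) \right] = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x)$$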
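The Sum-Product Algorithm
• Each factor $F_s(x, X_s)$ is described by a factor (sub-)graph and so can itself be factorized:
$$F_s(x, X_s) = f_s(x, x_1, \ldots, x_M)\, G_1(x_1, X_{s1}) \cdots G_M(x_M, X_{sM})$$
• The variables associated with factor $f_s$, in addition to $x$, are denoted $x_1, \ldots, x_M$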
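The Sum-Product Algorithm
$$\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \left[ \sum_{X_{sm}} G_m(x_m, X_{sm}) \right]$$
$$= \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)$$
• $\mu_{f_s \to x}(x)$: the messages that go from factor nodes to variable nodes
• $\mu_{x_m \to f_s}(x_m) \equiv \sum_{X_{sm}} G_m(x_m, X_{sm})$: the messages that go from variable nodes to factor nodes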
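Proof
$$\mu_{f_s \to x}(x) = \sum_{X_s} F_s(x, X_s)$$
$$= \sum_{x_1} \cdots \sum_{x_M} \sum_{X_{s1}} \cdots \sum_{X_{sM}} f_s(x, x_1, \ldots, x_M)\, G_1(x_1, X_{s1}) \cdots G_M(x_M, X_{sM})$$
$$= \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \left[ \sum_{X_{s1}} G_1(x_1, X_{s1}) \right] \cdots \left[ \sum_{X_{sM}} G_M(x_M, X_{sM}) \right]$$
$$= \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)$$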
The Sum-Product Algorithm
• Derive an expression for evaluating the messages from variable nodes to factor nodes, again by making use of the (sub-)graph factorization
$$G_m(x_m, X_{sm}) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} F_l(x_m, X_{ml})$$
The Sum-Product Algorithm
• Each of these messages can be computed recursively in terms of other messages
• To start the recursion, view the node $x$ as the root of the tree and begin at the leaf nodes
• If a leaf node is a variable node, then the message that it sends along its one and only link is $\mu_{x \to f}(x) = 1$
• If the leaf node is a factor node, the message should take the form $\mu_{f \to x}(x) = f(x)$
The Sum-Product Algorithm
• Start by viewing the variable node $x$ as the root of the factor graph and initiating messages at the leaves
• The message passing steps are then applied until messages have been propagated along every link
$$\mu_{x_m \to f_s}(x_m) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m)$$
• The root node will receive messages from all its neighbours
• The required marginal can then be evaluated
$$p(x) = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x)$$
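The recursion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the slides' own implementation: the graph (`x1 - fa - x2`, `x2 - fb - x3`, `x2 - fc - x4`), the factor values, and all function names are hypothetical. Messages are computed recursively from whichever node is treated as the root, and a brute-force sum over the joint serves as a reference.

```python
import itertools
from functools import lru_cache
from math import prod

# Hypothetical tree-structured factor graph over four binary variables.
card = {"x1": 2, "x2": 2, "x3": 2, "x4": 2}

def table(rows):
    """Turn a nested list into a {(i, j): value} lookup table."""
    return {(i, j): v for i, row in enumerate(rows) for j, v in enumerate(row)}

factors = {
    "fa": (("x1", "x2"), table([[1.0, 2.0], [3.0, 1.0]])),
    "fb": (("x2", "x3"), table([[2.0, 1.0], [1.0, 4.0]])),
    "fc": (("x2", "x4"), table([[1.0, 1.0], [2.0, 3.0]])),
}

def neighbours(x):
    """Factor nodes adjacent to variable node x."""
    return [f for f, (scope, _) in factors.items() if x in scope]

@lru_cache(maxsize=None)  # cache so that several marginals share messages
def msg_var_to_fac(x, f):
    """mu_{x->f}: product of messages from all other neighbouring factors.
    For a leaf variable the product is empty, giving the initial message 1."""
    others = [g for g in neighbours(x) if g != f]
    return tuple(prod(msg_fac_to_var(g, x)[v] for g in others)
                 for v in range(card[x]))

@lru_cache(maxsize=None)
def msg_fac_to_var(f, x):
    """mu_{f->x}: multiply f by incoming messages, sum out the other variables."""
    scope, tab = factors[f]
    others = [y for y in scope if y != x]
    incoming = {y: msg_var_to_fac(y, f) for y in others}
    out = [0.0] * card[x]
    for assign in itertools.product(*(range(card[y]) for y in scope)):
        val = dict(zip(scope, assign))
        out[val[x]] += tab[assign] * prod(incoming[y][val[y]] for y in others)
    return tuple(out)

def marginal(x):
    """p(x): normalized product of all messages arriving at x."""
    unnorm = [prod(msg_fac_to_var(f, x)[v] for f in neighbours(x))
              for v in range(card[x])]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def brute_marginal(x):
    """Reference answer: sum the full joint over every other variable."""
    names = list(card)
    out = [0.0] * card[x]
    for assign in itertools.product(*(range(card[n]) for n in names)):
        val = dict(zip(names, assign))
        out[val[x]] += prod(tab[tuple(val[y] for y in scope)]
                            for scope, tab in factors.values())
    z = sum(out)
    return [o / z for o in out]
```

Calling `marginal` for each variable in turn reuses the cached messages, which is one simple way to get the sharing of computation mentioned among the goals.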
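Example
• Unnormalized joint distribution: [expression not reproduced in this transcript]
[Figure: factor graph with the root node and a leaf node labelled]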
Example
• Each message in the example is evaluated with one of the two message equations:
$$\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)$$
$$\mu_{x_m \to f_s}(x_m) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m)$$
Sum-Product And Max-Sum Algorithm
Sum-product algorithm:
• Take a joint distribution expressed as a factor graph
• Efficiently find marginals over the component variables
Max-sum algorithm:
• Find a setting of the variables that has the largest probability
• Find the value of that probability
• Can be viewed as an application of dynamic programming
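• Goal: find the vector $\mathbf{x}^{\max}$ that maximizes the joint distribution, $\mathbf{x}^{\max} = \arg\max_{\mathbf{x}} p(\mathbf{x})$, and the maximal value $p(\mathbf{x}^{\max}) = \max_{\mathbf{x}} p(\mathbf{x})$
• A simple approach: run the sum-product algorithm to obtain the marginal $p(x_i)$ for every variable, and then, for each marginal in turn, find the value $x_i^{\star}$ that maximizes that marginal
• However, $\mathbf{x}^{\max}$ is not always the same as the set of individually most probable values $x_i^{\star}$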
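Example
Joint distribution $p(x, y)$ of two binary variables, with its marginals:

          x = 0   x = 1   p(y)
y = 0     0.3     0.4     0.7
y = 1     0.3     0.0     0.3
p(x)      0.6     0.4

• So, the marginals are maximized by $x = 0$ and $y = 0$, which corresponds to a joint probability of 0.3
• But the largest joint probability is 0.4, attained at $x = 1$, $y = 0$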
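The mismatch can be reproduced directly from the four joint probabilities of the example. In the sketch below, calling the two variables `x` (columns) and `y` (rows) is an assumption made for illustration; the numbers are those of the example.

```python
# Joint table from the example, indexed as p[y][x] (the names x and y are assumed).
p = [[0.3, 0.4],
     [0.3, 0.0]]

# Marginals: column sums give p(x), row sums give p(y).
p_x = [p[0][x] + p[1][x] for x in range(2)]
p_y = [sum(p[y]) for y in range(2)]

# Maximize each marginal separately.
x_star = max(range(2), key=lambda x: p_x[x])
y_star = max(range(2), key=lambda y: p_y[y])
joint_at_marginal_maxima = p[y_star][x_star]

# Maximize the joint distribution directly.
y_max, x_max = max(((y, x) for y in range(2) for x in range(2)),
                   key=lambda yx: p[yx[0]][yx[1]])
joint_max = p[y_max][x_max]
```

Here the per-marginal maximizers pick the cell with probability 0.3, while the joint maximizer picks the cell with probability 0.4.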
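The Max-Sum Algorithm
• Write out the max operator in terms of its components:
$$\max_{\mathbf{x}} p(\mathbf{x}) = \max_{x_1} \cdots \max_{x_M} p(\mathbf{x})$$
where $M$ is the total number of variables
• Substitute for $p(\mathbf{x})$ using the product of factors and use the distributive law of multiplication, $\max(ab, ac) = a \max(b, c)$ for $a \ge 0$, to exchange maximizations with products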
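As a small worked instance of this exchange, assume a hypothetical chain of three variables with non-negative factors $f_a(x_1, x_2)$ and $f_b(x_2, x_3)$ (these names are illustrative and not taken from the slides). Each maximization can be moved inside until it acts only on the factor that depends on its variable:

```latex
\begin{aligned}
\max_{x_1, x_2, x_3} f_a(x_1, x_2)\, f_b(x_2, x_3)
  &= \max_{x_2} \max_{x_1} \Bigl[ f_a(x_1, x_2) \max_{x_3} f_b(x_2, x_3) \Bigr] \\
  &= \max_{x_2} \Bigl[ \max_{x_1} f_a(x_1, x_2) \Bigr] \Bigl[ \max_{x_3} f_b(x_2, x_3) \Bigr]
\end{aligned}
```

Both steps rely on the factors being non-negative, which is what allows a factor that does not depend on the maximized variable to be pulled outside the max.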
The Max-Sum Algorithm
• The final maximization is performed over the product of all messages arriving at the root node, and gives the maximum value for $p(\mathbf{x})$
• This is called the max-product algorithm and is identical to the sum-product algorithm except that summations are replaced by maximizations
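The Max-Sum Algorithm
• Products of many small probabilities can lead to numerical underflow, so work with the logarithm of the joint distribution
• The logarithm is monotonic: if $a > b$ then $\ln a > \ln b$, so the max operator and the logarithm can be interchanged
$$\ln \left( \max_{\mathbf{x}} p(\mathbf{x}) \right) = \max_{\mathbf{x}} \ln p(\mathbf{x})$$
• Taking the logarithm replaces the products with sums, which gives the max-sum algorithm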
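Both points can be seen in a few lines of Python. The numbers below are made up for illustration: 100 factor values of 1e-5 each, whose product is far below the smallest representable double, and a short list of probabilities used to check that max and log can be swapped.

```python
import math

probs = [1e-5] * 100  # hypothetical small factor values

# Multiplying directly underflows to exactly 0.0 in double precision.
product = 1.0
for q in probs:
    product *= q

# Summing logarithms keeps the same information in a representable range.
log_sum = sum(math.log(q) for q in probs)

# Monotonicity: the log of the maximum equals the maximum of the logs.
values = [0.1, 0.4, 0.3]
log_of_max = math.log(max(values))
max_of_logs = max(math.log(v) for v in values)
```

After running this, `product` is zero while `log_sum` is a finite negative number, so comparisons between configurations remain possible in the log domain.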
The Max-Sum Algorithm
• The initial messages sent by the leaf nodes:
$$\mu_{x \to f}(x) = 0, \qquad \mu_{f \to x}(x) = \ln f(x)$$
• The probability at the root node (in the log domain, since the messages are logarithms):
$$p^{\max} = \max_{x} \left[ \sum_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x) \right]$$
The Max-Sum Algorithm
• The maximum of the joint distribution is the same irrespective of which node is chosen as the root
• Evaluating the above equation also gives the most probable value of the root variable:
$$x^{\max} = \arg\max_{x} \left[ \sum_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x) \right]$$
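The Max-Sum Algorithm
• Consider a simple chain with $N$ variables, each having $K$ states
[Figure: chain $x_1, \ldots, x_{n-1}, f_{n-1,n}, x_n, f_{n,n+1}, x_{n+1}, \ldots, x_N$]
• Take $x_N$ as the root node
• In the first phase, propagate messages from the leaf node $x_1$ to the root node using
$$\mu_{x_n \to f_{n,n+1}}(x_n) = \mu_{f_{n-1,n} \to x_n}(x_n)$$
$$\mu_{f_{n-1,n} \to x_n}(x_n) = \max_{x_{n-1}} \left[ \ln f_{n-1,n}(x_{n-1}, x_n) + \mu_{x_{n-1} \to f_{n-1,n}}(x_{n-1}) \right]$$
• The initial message: $\mu_{x_1 \to f_{1,2}}(x_1) = 0$
• The most probable value for $x_N$ is given by
$$x_N^{\max} = \arg\max_{x_N} \left[ \mu_{f_{N-1,N} \to x_N}(x_N) \right]$$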
The Max-Sum Algorithm
• Need to determine the states of the previous variables that correspond to the same maximizing configuration
• This is done by keeping track of which values of the variables gave rise to the maximum state of each variable
The Max-Sum Algorithm
Lattice or trellis diagram
• Not a probabilistic graphical model, because the nodes represent individual states of variables
[Figure: trellis diagram, with labels marking a variable node and the nodes for the second state]
• For each state of a given variable, there is a unique state of the previous variable that maximizes the probability, corresponding to the function $\phi(x_n)$, and indicated by the lines connecting the nodes
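The Max-Sum Algorithm
• Once we know the most probable value of the final node $x_N$, simply follow the link back to find the most probable state of node $x_{N-1}$, and so on back to the initial node $x_1$
• Using the stored function
$$\phi(x_n) = \arg\max_{x_{n-1}} \left[ \ln f_{n-1,n}(x_{n-1}, x_n) + \mu_{x_{n-1} \to f_{n-1,n}}(x_{n-1}) \right]$$
together with the recursion $x_{n-1}^{\max} = \phi(x_n^{\max})$ in this way is known as back-tracking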
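The forward pass and the back-tracking step on a chain can be sketched as follows. This is an illustrative sketch under stated assumptions: the chain length, the number of states, and the random positive factors `f[n]` are all made up, and `phi` stores the back-pointers in the role of $\phi(x_n)$. A brute-force search over all configurations is included only as a check.

```python
import itertools
import math
import random

random.seed(1)
N, K = 5, 3  # chain length and states per variable (assumed)

# Hypothetical positive pairwise factors: f[n][i][j] couples state i of one
# variable with state j of the next variable along the chain.
f = [[[random.uniform(0.1, 1.0) for _ in range(K)] for _ in range(K)]
     for _ in range(N - 1)]

# Forward pass towards the root (the last variable), in the log domain.
msg = [0.0] * K   # initial message from the leaf variable
phi = []          # phi[n][j]: best previous state given that the next state is j
for n in range(N - 1):
    new_msg, back = [], []
    for j in range(K):
        scores = [math.log(f[n][i][j]) + msg[i] for i in range(K)]
        best = max(range(K), key=scores.__getitem__)
        new_msg.append(scores[best])
        back.append(best)
    msg = new_msg
    phi.append(back)

# Root: most probable state of the last variable and the maximum log value.
x_last = max(range(K), key=msg.__getitem__)
log_p_max = msg[x_last]

# Back-tracking: follow the stored pointers from the root back to the leaf.
path = [x_last]
for back in reversed(phi):
    path.append(back[path[-1]])
path.reverse()

def log_joint(xs):
    """Log of the unnormalized joint for one full configuration."""
    return sum(math.log(f[n][xs[n]][xs[n + 1]]) for n in range(N - 1))

# Brute-force reference over all K**N configurations.
best_config = max(itertools.product(range(K), repeat=N), key=log_joint)
```

The forward pass costs on the order of N·K² operations, whereas the brute-force check enumerates K^N configurations, which is why it is only practical for a toy chain like this one.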
The Max-Sum Algorithm
• Two paths, each of which we shall suppose corresponds to a global maximum
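The Max-Sum Algorithm
• If a message is sent from a factor node $f$ to a variable node $x$, a maximization is performed over all other variable nodes $x_1, \ldots, x_M$ that are neighbours of that factor node, using
$$\mu_{f \to x}(x) = \max_{x_1, \ldots, x_M} \left[ \ln f(x, x_1, \ldots, x_M) + \sum_{m \in \mathrm{ne}(f) \setminus x} \mu_{x_m \to f}(x_m) \right]$$
• When performing this maximization, keep a record of which values of the variables $x_1, \ldots, x_M$ gave rise to the maximum
• In the back-tracking step, having found $x^{\max}$, use these stored values to assign consistent maximizing states $x_1^{\max}, \ldots, x_M^{\max}$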