1
Sampling Bayesian Networks
ICS 275b
2005
2
Approximation Algorithms
Structural approximations: eliminate some dependencies
  Remove edges
  Mini-Bucket approach
Search: approach for optimization tasks (MPE, MAP)
Sampling: generate random samples and compute values of interest from the samples, not the original network
3
Algorithm Tree
4
Sampling
Input: Bayesian network with set of nodes X
Sample = a tuple with assigned values:
s = (X1=x1, X2=x2, …, Xk=xk)
A tuple may include all variables (except evidence) or a subset
Sampling schemas dictate how to generate samples (tuples)
Ideally, samples are distributed according to P(X|E)
5
Sampling
Idea: generate a set of T samples; estimate P(Xi|E) from the samples
Need to know:
  How to generate a new sample?
  How many samples T do we need?
  How to estimate P(Xi|E)?
6
Sampling Algorithms
Forward Sampling
Likelihood Weighting
Gibbs Sampling (MCMC)
  Blocking
  Rao-Blackwellised
Importance Sampling
Sequential Monte Carlo (Particle Filtering) in Dynamic Bayesian Networks
7
Forward Sampling
Case with no evidence
Case with evidence
N and error bounds
8
Forward Sampling, No Evidence (Henrion, 1988)
Input: Bayesian network X = {X1,…,XN}, N - #nodes, T - #samples
Output: T samples
Process nodes in topological order – first process the ancestors of a node, then the node itself:
1. For t = 1 to T
2.   For i = 1 to N
3.     Xi ← sample xi^t from P(xi | pa_i)
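For concreteness, a minimal Python sketch of this loop. The network interface — a `parents` map and a `cpt(node, parent_values)` lookup returning a {value: probability} dict — is an assumed representation for illustration, not something prescribed by the slides.

```python
import random

def forward_sample(nodes, parents, cpt):
    """Draw one sample; `nodes` is assumed topologically ordered."""
    sample = {}
    for x in nodes:
        pa_vals = tuple(sample[p] for p in parents[x])
        dist = cpt(x, pa_vals)                 # {value: P(value | pa)}
        values = list(dist)
        sample[x] = random.choices(values, weights=[dist[v] for v in values])[0]
    return sample

# Hypothetical two-node network X1 -> X2, used only as a smoke test.
parents = {"X1": (), "X2": ("X1",)}
table = {
    ("X1", ()):   {0: 0.3, 1: 0.7},
    ("X2", (0,)): {0: 0.8, 1: 0.2},
    ("X2", (1,)): {0: 0.4, 1: 0.6},
}
cpt = lambda x, pa: table[(x, pa)]
samples = [forward_sample(["X1", "X2"], parents, cpt) for _ in range(1000)]
```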
9
Sampling A Value
What does it mean to sample xi^t from P(Xi | pa_i)?
Assume D(Xi) = {0,1} and P(Xi | pa_i) = (0.3, 0.7).
Draw a random number r from [0,1]:
If r falls in [0, 0.3], set Xi = 0
If r falls in (0.3, 1], set Xi = 1
[Figure: the interval [0,1] split at 0.3, with r landing in one of the two segments]
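The same inverse-CDF draw, spelled out in Python exactly as on the slide (the 0.3/0.7 split is the slide's example distribution):

```python
import random

def sample_binary(p0=0.3):
    """Inverse-CDF draw from P(Xi | pa_i) = (p0, 1 - p0)."""
    r = random.random()          # r uniform in [0, 1)
    return 0 if r < p0 else 1    # r in [0, 0.3) -> Xi = 0, else Xi = 1

counts = [0, 0]
for _ in range(10000):
    counts[sample_binary()] += 1
print(counts)  # roughly [3000, 7000]
```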
10
Sampling a Value
When we sample xi^t from P(Xi | pa_i):
  most of the time, we will pick the most likely value of Xi
  occasionally, we will pick an unlikely value of Xi
We want to find high-probability tuples. But!!!…
Choosing an unlikely value allows us to “cross” the low-probability tuples to reach the high-probability tuples!
11
Forward Sampling (example)
[Figure: network with X1 → X2, X1 → X3, and X2, X3 → X4; CPTs P(x1), P(x2|x1), P(x3|x1), P(x4|x2,x3)]
Evidence: X3 = 0
// generate sample k:
1. Sample x1 from P(x1)
2. Sample x2 from P(x2 | x1)
3. Sample x3 from P(x3 | x1)
4. If x3 is inconsistent with the evidence (x3 ≠ 0), reject the sample and start again from step 1; otherwise
5. Sample x4 from P(x4 | x2, x3)
12
Forward Sampling – Answering Queries
Task: given T samples {S1, S2, …, ST}, estimate P(Xi = xi):
$$\hat{P}(X_i = x_i) = \frac{\#\text{samples}(X_i = x_i)}{T}$$
Basically, count the proportion of samples where Xi = xi.
13
Forward Sampling with Evidence
Input: Bayesian network X = {X1,…,XN}, N - #nodes, E - evidence, T - #samples
Output: T samples consistent with E
1. For t = 1 to T
2.   For i = 1 to N
3.     Xi ← sample xi^t from P(xi | pa_i)
4.     If Xi ∈ E and xi^t ≠ ei, reject sample:
5.       set i = 1 and go to step 2
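A Python sketch of steps 1–5, reusing the illustrative `nodes`/`parents`/`cpt` interface assumed in the earlier sketch; `evidence` is a {node: value} dict.

```python
import random

def rejection_sample(nodes, parents, cpt, evidence, T):
    """Forward sampling with evidence: restart on any mismatch."""
    samples = []
    while len(samples) < T:
        sample, rejected = {}, False
        for x in nodes:                          # topological order
            pa_vals = tuple(sample[p] for p in parents[x])
            dist = cpt(x, pa_vals)
            values = list(dist)
            sample[x] = random.choices(values, weights=[dist[v] for v in values])[0]
            if x in evidence and sample[x] != evidence[x]:
                rejected = True                  # step 4: reject,
                break                            # step 5: restart the sample
        if not rejected:
            samples.append(sample)               # consistent with E
    return samples
```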
14
Forward Sampling: Illustration
Let Y be a subset of evidence nodes s.t. Y=u
15
Forward Sampling – How Many Samples?
Theorem: Let s(y) be the estimate of P(y) resulting from a randomly chosen sample set S with T samples. Then, to guarantee relative error at most ε with probability at least 1−δ, it is enough to have:
$$T \ge \frac{c}{P(y)\,\epsilon^2\,\delta}$$
Derived from Chebyshev's Bound:
$$P\Big(s(y) \in \big[P(y)-\epsilon,\; P(y)+\epsilon\big]\Big) \ge 1 - 2e^{-2T\epsilon^2}$$
16
Forward Sampling – How Many Samples?
Theorem: Let s(y) be the estimate of P(y) resulting from a randomly chosen sample set S with T samples. Then, to guarantee relative error at most ε with probability at least 1−δ, it is enough to have:
$$T \ge \frac{4}{P(y)\,\epsilon^2}\,\ln\frac{2}{\delta}$$
Derived from Hoeffding's Bound (full proof is given in Koller):
$$P\Big(s(y) \in \big[P(y)-\epsilon,\; P(y)+\epsilon\big]\Big) \ge 1 - 2e^{-2T\epsilon^2}$$
17
Forward Sampling: Performance
Advantages:
  P(xi | pa(xi)) is readily available
  Samples are independent!
Drawbacks:
  If evidence E is rare (P(e) is low), then we will reject most of the samples!
  Since P(y) in the estimate of T is unknown, we must estimate P(y) from the samples themselves!
  If P(e) is small, T will become very big!
18
Problem: Evidence
Forward Sampling: high rejection rate
Fix evidence values:
  Gibbs sampling (MCMC)
  Likelihood Weighting
  Importance Sampling
19
Forward Sampling Bibliography
{henrion88} M. Henrion, "Propagating uncertainty in Bayesian networks by probabilistic logic sampling", Uncertainty in AI, pp. 149–163, 1988.
20
Likelihood Weighting (Fung and Chang, 1990; Shachter and Peot, 1990)
Works well for likely evidence!
"Clamping" evidence + forward sampling + weighting samples by evidence likelihood
21
Likelihood Weighting
// generate sample k, for k = 1 to T:
w ← 1
For each Xi in topological order X1,…,Xn:
  If Xi ∉ E:
    Xi = xi, sampled from P(xi | pa_i)
  else:
    assign Xi = ei
    w ← w · P(ei | pa_i)
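A Python sketch of one likelihood-weighted sample, plus the weighted query estimate from the next slide, again on the assumed `nodes`/`parents`/`cpt` interface:

```python
import random

def lw_sample(nodes, parents, cpt, evidence):
    """One forward pass with evidence clamped; returns (sample, weight)."""
    sample, w = {}, 1.0
    for x in nodes:                              # topological order
        pa_vals = tuple(sample[p] for p in parents[x])
        dist = cpt(x, pa_vals)
        if x in evidence:
            sample[x] = evidence[x]              # assign Xi = e_i
            w *= dist[evidence[x]]               # w <- w * P(e_i | pa_i)
        else:
            values = list(dist)
            sample[x] = random.choices(values, weights=[dist[v] for v in values])[0]
    return sample, w

def lw_query(nodes, parents, cpt, evidence, y, y_val, T=10000):
    """P(Y = y_val | E) as a ratio of weighted counts to total weight."""
    num = den = 0.0
    for _ in range(T):
        s, w = lw_sample(nodes, parents, cpt, evidence)
        num += w * (s[y] == y_val)               # sum_k w_k * s_k(y)
        den += w                                 # sum_k w_k
    return num / den
```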
22
Likelihood Weighting
Compute sample likelihood:
$$w^{(k)} = \prod_{X_i \in E} P(e_i \mid pa_i^{(k)})$$
Compute query:
$$\hat P(Y = y \mid E) = \frac{\sum_k w^{(k)}\, s_k(y)}{\sum_k w^{(k)}}, \qquad s_k(y) = \begin{cases} 1 & \text{if } y^{(k)} = y \\ 0 & \text{otherwise} \end{cases}$$
23
Likelihood Convergence (Chebyshev's Inequality)
Assume P(X=x|e) has mean μ and variance σ².
Chebyshev:
$$P\left(\left|\hat{P} - \mu\right| \ge \frac{c\,\sigma}{\sqrt{N}}\right) \le \frac{1}{c^2}$$
μ = P(x|e) is unknown ⇒ obtain it from the samples!
24
Error Bound Derivation
Let K = #samples with X = x′; K is a Bernoulli (0/1) count, so with p = P(X = x′) and q = 1 − p:
$$\hat{P} = \frac{K}{T}, \qquad Var(\hat{P}) = \frac{pq}{T}$$
Chebyshev:
$$P\left(\left|\hat{P} - p\right| \ge c\sqrt{\frac{pq}{T}}\right) \le \frac{1}{c^2}$$
Corollary: $\hat{P} \in \big[\,p - c\sqrt{pq/T},\; p + c\sqrt{pq/T}\,\big]$ with probability at least $1 - \frac{1}{c^2}$ (e.g., 3/4 for c = 2).
From the Law of Large Numbers, $\hat{P} \to p$ as $T \to \infty$.
25
Likelihood Convergence 2
Assume P(X=x|e) has mean μ and variance σ².
Zero-One Estimation Theory (Karp et al., 1989):
$$T \ge \frac{4}{\mu\,\epsilon^2}\,\ln\frac{2}{\delta}$$
μ = P(x|e) is unknown ⇒ obtain it from the samples!
26
Local Variance Bound (LVB) (Dagum & Luby, 1994)
Let Δ be the LVB of a binary-valued network whose conditional probabilities are bounded away from 0 and 1:
$$P(x \mid pa(x)),\; P(\bar{x} \mid pa(x)) \in [l, u], \qquad l, u \in [0,1],\; u = 1 - l$$
$$\Delta = \max\left\{\frac{u}{l},\; \frac{1-l}{1-u}\right\}$$
27
LVB Estimate (Pradhan, Dagum, 1996)
Using the LVB, the Zero-One Estimator can be re-written:
$$T \ge \frac{4\,\Delta^k}{\epsilon^2}\,\ln\frac{2}{\delta}$$
28
Importance Sampling Idea
In general, it is hard to sample from the target distribution P(X|E)
Generate samples from a sampling (proposal) distribution Q(X)
Weigh each sample against P(X|E):
$$I = \int f(x)\,P(x)\,dx = \int f(x)\,\frac{P(x)}{Q(x)}\,Q(x)\,dx$$
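A toy Python sketch of the identity above: estimate I = E_P[f(X)] by drawing from an easy proposal Q and weighting by P/Q. The two discrete distributions here are made up purely for illustration.

```python
import random

P = {0: 0.9, 1: 0.1}     # target distribution (hard to sample in general)
Q = {0: 0.5, 1: 0.5}     # proposal distribution (easy to sample)
f = lambda x: x          # so I = E_P[f(X)] = 0.1

T, total = 100000, 0.0
for _ in range(T):
    x = random.choices(list(Q), weights=list(Q.values()))[0]  # x ~ Q
    total += f(x) * P[x] / Q[x]                               # weigh vs. target
print(total / T)                                              # approx 0.1
```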
29
Importance Sampling Variants
Importance sampling: forward, non-adaptive
  Nodes sampled in topological order
  Sampling distribution (for non-instantiated nodes) equal to the prior conditionals
Importance sampling: forward, adaptive
  Nodes sampled in topological order
  Sampling distribution adapted according to average importance weights obtained in previous samples [Cheng, Druzdzel 2000]
30
AIS-BN
The most efficient variant of importance sampling to date is AIS-BN – Adaptive Importance Sampling for Bayesian Networks.
Jian Cheng and Marek J. Druzdzel. AIS-BN: An adaptive importance sampling algorithm for evidential reasoning in large Bayesian networks. Journal of Artificial Intelligence Research (JAIR), 13:155-188, 2000.
31
Gibbs Sampling
Markov Chain Monte Carlo method (Gelfand and Smith, 1990; Smith and Roberts, 1993; Tierney, 1994)
Samples are dependent and form a Markov chain
Samples directly from P(X|e)
Guaranteed to converge when all P > 0
Methods to improve convergence:
  Blocking
  Rao-Blackwellised
Error bounds:
  Lag-t autocovariance
  Multiple chains, Chebyshev's inequality
32
MCMC Sampling Fundamentals
Given a set of variables X = {X1, X2, …, Xn} that represents a joint probability distribution π(X) and some function g(X), we can compute the expected value of g(X):
$$E_\pi[g] = \int g(x)\,\pi(x)\,dx$$
33
MCMC Sampling From π(X)
Given independent, identically distributed (iid) samples S1, S2, …, ST from π(X), it follows from the Strong Law of Large Numbers that:
$$E_\pi[g] \approx \frac{1}{T}\sum_{t=1}^{T} g(S_t)$$
A sample S_t is an instantiation:
$$S_t = \{x_1^t, x_2^t, \ldots, x_n^t\}$$
34
Gibbs Sampling (Pearl, 1988)
A sample t ∈ [1,2,…] is an instantiation of all variables in the network:
$$x^t = \{X_1 = x_1^t,\; X_2 = x_2^t,\; \ldots,\; X_N = x_N^t\}$$
Sampling process:
  Fix values of observed variables e
  Instantiate node values in sample x^0 at random
  Generate samples x^1, x^2, …, x^T from P(x|e)
  Compute posteriors from the samples
35
Ordered Gibbs Sampler
Generate sample x^{t+1} from x^t, processing all variables in some order:
$$X_1 = x_1^{t+1}, \;\text{sampled from}\; P(x_1 \mid x_2^t, x_3^t, \ldots, x_N^t, e)$$
$$X_2 = x_2^{t+1}, \;\text{sampled from}\; P(x_2 \mid x_1^{t+1}, x_3^t, \ldots, x_N^t, e)$$
$$\ldots$$
$$X_N = x_N^{t+1}, \;\text{sampled from}\; P(x_N \mid x_1^{t+1}, x_2^{t+1}, \ldots, x_{N-1}^{t+1}, e)$$
In short, for i = 1 to N:
$$X_i = x_i^{t+1}, \;\text{sampled from}\; P(x_i \mid x^t \setminus x_i, e)$$
36
Gibbs Sampling (cont'd) (Pearl, 1988)
$$P(x_i \mid x^t \setminus x_i) \propto P(x_i \mid pa_i)\prod_{X_j \in ch_i} P(x_j \mid pa_j)$$
Important: $P(x_i \mid x^t \setminus x_i) = P(x_i \mid markov_i^t)$
Markov blanket:
$$M_i = pa_i \cup ch_i \cup \Big(\bigcup_{X_j \in ch_i} pa_j\Big)$$
Given its Markov blanket (parents, children, and children's parents), X_i is independent of all other nodes.
37
Ordered Gibbs Sampling Algorithm
Input: X, E
Output: T samples {x^t}
Fix evidence E, then generate samples from P(X | E):
1. For t = 1 to T (compute samples)
2.   For i = 1 to N (loop through variables)
3.     Xi ← sample xi^t from P(Xi | markov_i^t)
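A Python sketch of the ordered Gibbs loop. The Markov-blanket score P(x_i | pa_i) · ∏ P(x_j | pa_j) is abstracted as an assumed helper `blanket_score(node, value, state)`; normalizing it over D(Xi) gives the sampling distribution used in step 3.

```python
import random

def ordered_gibbs(nodes, domains, blanket_score, evidence, T):
    state = dict(evidence)
    for x in nodes:                               # random initialization
        if x not in evidence:
            state[x] = random.choice(domains[x])
    samples = []
    for _ in range(T):
        for x in nodes:                           # loop through variables
            if x in evidence:
                continue                          # evidence stays fixed
            scores = [blanket_score(x, v, state) for v in domains[x]]
            total = sum(scores)
            state[x] = random.choices(domains[x],
                                      weights=[s / total for s in scores])[0]
        samples.append(dict(state))
    return samples
```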
38
Answering Queries
Query: P(xi | e) = ?
Method 1: count #samples where Xi = xi (histogram estimator):
$$\hat P(X_i = x_i) = \frac{\#\text{samples}(X_i = x_i)}{T}$$
Method 2: average probability (mixture estimator):
$$\hat P(X_i = x_i) = \frac{1}{T}\sum_{t=1}^{T} P(X_i = x_i \mid markov_i^t)$$
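Both estimators in Python, over the output of the Gibbs sketch above (`blanket_score` is the same assumed helper):

```python
def histogram_estimate(samples, x, x_val):
    """Method 1: fraction of samples with X_i = x_i."""
    return sum(s[x] == x_val for s in samples) / len(samples)

def mixture_estimate(samples, domains, blanket_score, x, x_val):
    """Method 2: average of P(x_i | markov_i^t) over samples."""
    total = 0.0
    for s in samples:
        scores = {v: blanket_score(x, v, s) for v in domains[x]}
        total += scores[x_val] / sum(scores.values())
    return total / len(samples)
```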
39
Importance vs. Gibbs
Gibbs:
$$x^t \sim P(x \mid e), \qquad \hat{f} = \frac{1}{T}\sum_{t=1}^{T} f(x^t)$$
Importance:
$$x^t \sim Q(x), \qquad \hat{f} = \frac{1}{T}\sum_{t=1}^{T} f(x^t)\,\frac{P(x^t \mid e)}{Q(x^t)}$$
The ratio $w^t = P(x^t \mid e)\,/\,Q(x^t)$ is the importance weight.
40
Gibbs Sampling Example - BN
X = {X1, X2, …, X9}
E = {X9}
[Figure: Bayesian network over nodes X1–X9]
41
Gibbs Sampling Example - BN
Initialize nodes at random: X1 = x1^0, X2 = x2^0, X3 = x3^0, X4 = x4^0, X5 = x5^0, X6 = x6^0, X7 = x7^0, X8 = x8^0
[Figure: the network with the initial assignment]
42
Gibbs Sampling Example - BN
$$X_1 \leftarrow P(X_1 \mid x_2^0, \ldots, x_8^0, x_9)$$
E = {X9}
[Figure: the network; X1 is being resampled]
43
Gibbs Sampling Example - BN
$$X_2 \leftarrow P(X_2 \mid x_1^1, x_3^0, \ldots, x_8^0, x_9)$$
E = {X9}
[Figure: the network; X2 is being resampled]
44
Gibbs Sampling: Illustration
49
Gibbs Sampling: Burn-In
We want to sample from P(X | E)
But… the starting point is random
Solution: throw away the first K samples
Known as "burn-in"
What is K? Hard to tell. Use intuition.
Alternative: sample the first sample's values from an approximation of P(x|e) (for example, run IBP first)
50
Gibbs Sampling: Convergence
Converges to the stationary distribution π*:
$$\pi^* = \pi^* P$$
where P is the transition kernel, p_ij = P(Xi → Xj)
Guaranteed to converge iff the chain is:
  irreducible
  aperiodic
  ergodic (∀ i,j: p_ij > 0)
51
Irreducible
A Markov chain (or its probability transition matrix) is said to be irreducible if it is possible to reach every state from every other state (not necessarily in one step).
In other words, ∀ i,j ∃ k : P^(k)_ij > 0, where k is the number of steps taken to get to state j from state i.
52
Aperiodic
Define d(i) = g.c.d.{n > 0 | it is possible to go from i to i in n steps}. Here, g.c.d. means the greatest common divisor of the integers in the set. If d(i) = 1 for ∀i, then the chain is aperiodic.
53
Ergodicity
A recurrent state is a state to which the chain returns with probability 1:
$$\sum_n P^{(n)}_{ii} = \infty$$
Recurrent, aperiodic states are ergodic.
Note: an extra condition for ergodicity is that the expected recurrence time is finite. This holds for recurrent states in a finite-state chain.
55
Gibbs Convergence
Gibbs convergence is generally guaranteed as long as all probabilities are positive!
Intuition for the ergodicity requirement: if nodes X and Y are correlated s.t. X = 0 ⟺ Y = 0, then:
  once we sample and assign X = 0, we are forced to assign Y = 0;
  once we sample and assign Y = 0, we are forced to assign X = 0;
  we will never be able to change their values again!
Another problem: it can take a very long time to converge!
56
Gibbs Sampling: Performance
+ Advantage: guaranteed to converge to P(X|E)
− Disadvantage: convergence may be slow
Problems:
  Samples are dependent!
  Statistical variance is too big in high-dimensional problems
57
Gibbs: Speeding Convergence
Objectives:
1. Reduce dependence between samples (autocorrelation)
  Skip samples
  Randomize variable sampling order
2. Reduce variance
  Blocking Gibbs sampling
  Rao-Blackwellisation
58
Skipping Samples
Pick only every k-th sample (Geyer, 1992)
Can reduce dependence between samples!
Increases variance! Wastes samples!
59
Randomized Variable Order
Random Scan Gibbs Sampler:
Pick each next variable Xi for update at random with probability pi, Σi pi = 1.
(In the simplest case, the pi are distributed uniformly.)
In some instances, reduces variance (MacEachern, Peruggia, 1999, "Subsampling the Gibbs Sampler: Variance Reduction")
60
Blocking
Sample several variables together, as a block
Example: Given three variables X, Y, Z with domains of size 2, group Y and Z together to form a variable W = {Y, Z} with domain size 4. Then, given sample (x^t, y^t, z^t), compute the next sample as follows (a code sketch follows this slide):
$$x^{t+1} \leftarrow P(x \mid y^t, z^t) = P(x \mid w^t)$$
$$w^{t+1} = (y^{t+1}, z^{t+1}) \leftarrow P(y, z \mid x^{t+1})$$
+ Can improve convergence greatly when two variables are strongly correlated!
− The domain of the block variable grows exponentially with the #variables in a block!
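A sketch of that blocked update in Python: enumerate the four joint values of (y, z), score them against the current x, normalize, and draw the pair together. `joint_score(x, y, z)`, proportional to P(x, y, z), is an assumed helper.

```python
import random

def blocked_update(x, joint_score):
    """Resample W = (Y, Z) jointly, conditioned on the current x."""
    pairs = [(y, z) for y in (0, 1) for z in (0, 1)]     # domain of W, size 4
    scores = [joint_score(x, y, z) for (y, z) in pairs]  # proportional to P(y,z|x)
    total = sum(scores)
    return random.choices(pairs, weights=[s / total for s in scores])[0]
```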
61
Blocking Gibbs Sampling
Jensen, Kong, Kjaerulff, 1993: "Blocking Gibbs Sampling in Very Large Probabilistic Expert Systems"
Select a set of subsets E1, E2, E3, …, Ek s.t.:
  Ei ⊆ X
  ∪i Ei = X
  Ai = X \ Ei
Sample P(Ei | Ai)
62
Rao-Blackwellisation
Do not sample all variables! Sample a subset!
Example: Given three variables X, Y, Z, sample only X and Y, and sum out Z. Given sample (x^t, y^t), compute the next sample:
$$x^{t+1} \leftarrow P(x \mid y^t)$$
$$y^{t+1} \leftarrow P(y \mid x^{t+1})$$
63
Rao-Blackwell Theorem
Bottom line: reducing the number of variables in a sample reduces variance!
64
Blocking vs. Rao-Blackwellisation
Standard Gibbs: P(x|y,z), P(y|x,z), P(z|x,y)  (1)
Blocking: P(x|y,z), P(y,z|x)  (2)
Rao-Blackwellised: P(x|y), P(y|x)  (3)
Var3 < Var2 < Var1 [Liu, Wong, Kong, 1994, "Covariance structure of the Gibbs sampler…"]
[Figure: three-node network over X, Y, Z]
65
Rao-Blackwellised Gibbs: Cutset Sampling
Select C ⊆ X (possibly a cycle-cutset), |C| = m
Fix evidence E
Initialize nodes with random values: for i = 1 to m, set Ci = ci^0
For t = 1 to T, generate samples:
  For i = 1 to m:
    Ci = ci^{t+1}, sampled from P(ci | c1^{t+1}, …, c_{i−1}^{t+1}, c_{i+1}^t, …, c_m^t, e)
66
Cutset Sampling
Select a subset C = {C1, …, CK} ⊆ X
A sample t ∈ [1,2,…] is an instantiation of C:
$$c^t = \{C_1 = c_1^t,\; C_2 = c_2^t,\; \ldots,\; C_K = c_K^t\}$$
Sampling process:
  Fix values of observed variables e
  Generate sample c^0 at random
  Generate samples c^1, c^2, …, c^T from P(c|e)
  Compute posteriors from the samples
67
Cutset Sampling: Generating Samples
Generate sample c^{t+1} from c^t:
$$C_1 = c_1^{t+1}, \;\text{sampled from}\; P(c_1 \mid c_2^t, c_3^t, \ldots, c_K^t, e)$$
$$C_2 = c_2^{t+1}, \;\text{sampled from}\; P(c_2 \mid c_1^{t+1}, c_3^t, \ldots, c_K^t, e)$$
$$\ldots$$
$$C_K = c_K^{t+1}, \;\text{sampled from}\; P(c_K \mid c_1^{t+1}, c_2^{t+1}, \ldots, c_{K-1}^{t+1}, e)$$
In short, for i = 1 to K:
$$C_i = c_i^{t+1}, \;\text{sampled from}\; P(c_i \mid c^t \setminus c_i, e)$$
68
Rao-Blackwellised Gibbs: Cutset Sampling
How to compute P(ci | c^t \ ci, e)?
Compute the joint P(ci, c^t \ ci, e) for each value ci ∈ D(Ci), then normalize:
$$P(c_i \mid c^t \setminus c_i, e) = \alpha\, P(c_i,\, c^t \setminus c_i,\, e)$$
Computation efficiency depends on the choice of C
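A sketch of this normalization step in Python; `joint(ci, rest, e)`, standing in for an exact-inference call (e.g., bucket-tree elimination over the non-cutset variables), is an assumed helper.

```python
def cutset_conditional(domain_ci, rest, e, joint):
    """P(c_i | c \\ c_i, e) by enumerating D(C_i) and normalizing."""
    scores = {ci: joint(ci, rest, e) for ci in domain_ci}  # joint for each c_i
    total = sum(scores.values())
    return {ci: s / total for ci, s in scores.items()}
```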
69
Rao-Blackwellised Gibbs: Cutset Sampling
How to choose C?
Special case: C is a cycle-cutset, O(N)
General case: apply Bucket Tree Elimination (BTE), O(exp(w)), where w is the induced width of the network when the nodes in C are observed.
Pick C wisely so as to minimize w ⇒ notion of a w-cutset
70
w-cutset Sampling
C = w-cutset of the network: a set of nodes such that, when C and E are instantiated, the adjusted induced width of the network is w
Complexity of exact inference: bounded by w!
A cycle-cutset is a special case
71
Cutset Sampling - Answering Queries
Query: ∀ ci ∈ C, P(ci | e) = ? Same as Gibbs (special case of the w-cutset):
$$\hat P(c_i \mid e) = \frac{1}{T}\sum_{t=1}^{T} P(c_i \mid c^t \setminus c_i, e)$$
(computed while generating sample t)
Query: P(xi | e) = ?
$$\hat P(x_i \mid e) = \frac{1}{T}\sum_{t=1}^{T} P(x_i \mid c^t, e)$$
(computed after generating sample t)
72
Cutset Sampling Example
C = {X2, X5}, E = {X9}
$$c^0 = \{x_2^0,\; x_5^0\}$$
[Figure: Bayesian network over nodes X1–X9]
73
Cutset Sampling Example
Sample a new value for X2:
$$c^0 = \{x_2^0,\; x_5^0\}$$
$$x_2^1 \leftarrow P(x_2 \mid x_5^0, x_9) = \frac{\mathrm{BTE}(x_2', x_5^0, x_9)}{\sum_{x_2''} \mathrm{BTE}(x_2'', x_5^0, x_9)}$$
[Figure: the network; X2 highlighted]
74
Cutset Sampling Example
Sample a new value for X5:
$$c^0 = \{x_2^0,\; x_5^0\}$$
$$x_2^1 \leftarrow P(x_2 \mid x_5^0, x_9)$$
$$x_5^1 \leftarrow P(x_5 \mid x_2^1, x_9) = \frac{\mathrm{BTE}(x_5', x_2^1, x_9)}{\sum_{x_5''} \mathrm{BTE}(x_5'', x_2^1, x_9)}$$
$$c^1 = \{x_2^1,\; x_5^1\}$$
[Figure: the network; X5 highlighted]
75
Cutset Sampling Example
Query P(x2 | e) for sampling node X2:
Sample 1: $x_2^1 \leftarrow P(x_2 \mid x_5^0, x_9)$
Sample 2: $x_2^2 \leftarrow P(x_2 \mid x_5^1, x_9)$
Sample 3: $x_2^3 \leftarrow P(x_2 \mid x_5^2, x_9)$
$$\hat P(x_2 \mid x_9) = \frac{1}{3}\Big[P(x_2 \mid x_5^0, x_9) + P(x_2 \mid x_5^1, x_9) + P(x_2 \mid x_5^2, x_9)\Big]$$
76
Cutset Sampling Example
Query P(x3 | e) for non-sampled node X3:
$$c^1 = \{x_2^1, x_5^1\}: \quad P(x_3 \mid x_2^1, x_5^1, x_9)$$
$$c^2 = \{x_2^2, x_5^2\}: \quad P(x_3 \mid x_2^2, x_5^2, x_9)$$
$$c^3 = \{x_2^3, x_5^3\}: \quad P(x_3 \mid x_2^3, x_5^3, x_9)$$
$$\hat P(x_3 \mid x_9) = \frac{1}{3}\Big[P(x_3 \mid x_2^1, x_5^1, x_9) + P(x_3 \mid x_2^2, x_5^2, x_9) + P(x_3 \mid x_2^3, x_5^3, x_9)\Big]$$
77
Gibbs: Error Bounds
Objectives:
  Estimate the needed number of samples T
  Estimate the error
Methodology:
  1 chain → use lag-k autocovariance to estimate T
  M chains → standard sampling variance to estimate the error
78
Gibbs: lag-k autocovariance
$$P_t = P(x_i \mid x^t \setminus x_i), \qquad \hat P = \hat P(x_i \mid e) = \frac{1}{T}\sum_{t=1}^{T} P_t$$
Lag-k autocovariance:
$$\gamma(k) = \frac{1}{T}\sum_{t=1}^{T-k}\big(P_t - \hat P\big)\big(P_{t+k} - \hat P\big)$$
$$Var(\hat P) = \frac{1}{T}\Big[\gamma(0) + 2\sum_{k=1}^{2\delta+1}\gamma(k)\Big]$$
79
Gibbs: lag-k autocovariance
Estimate the Monte Carlo variance:
$$Var(\hat P) = \frac{1}{T}\Big[\gamma(0) + 2\sum_{k=1}^{2\delta+1}\gamma(k)\Big]$$
Here, δ is the smallest positive integer for which the pairwise sum is still positive:
$$\gamma(2\delta) + \gamma(2\delta+1) > 0$$
Effective chain size:
$$\hat T = \frac{\gamma(0)}{Var(\hat P)}$$
In the absence of autocovariance: $\hat T = T$
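These diagnostics in Python, over a chain of mixture values P_t = P(x_i | x^t \ x_i):

```python
def autocov(p, k):
    """Lag-k autocovariance of the chain p (a list of P_t values)."""
    T, mean = len(p), sum(p) / len(p)
    return sum((p[t] - mean) * (p[t + k] - mean) for t in range(T - k)) / T

def mc_variance(p, delta):
    """Var(P_hat) = (1/T) [gamma(0) + 2 * sum_{k=1}^{2*delta+1} gamma(k)]."""
    return (autocov(p, 0)
            + 2 * sum(autocov(p, k) for k in range(1, 2 * delta + 2))) / len(p)

def effective_size(p, delta):
    """T_hat = gamma(0) / Var(P_hat); equals T when there is no autocovariance."""
    return autocov(p, 0) / mc_variance(p, delta)
```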
80
Gibbs: Multiple Chains
Generate M chains of size K
Each chain produces an independent estimate Pm:
$$P_m = \hat P(x_i \mid e) = \frac{1}{K}\sum_{t=1}^{K} P(x_i \mid x^t \setminus x_i)$$
Estimate P(xi|e) as the average of the Pm(xi|e):
$$\hat P = \frac{1}{M}\sum_{m=1}^{M} P_m$$
Treat the Pm as independent random variables.
81
Gibbs: Multiple Chains
The {Pm} are independent random variables; therefore:
$$S^2 = Var(P_m) = \frac{1}{M-1}\sum_{m=1}^{M}\big(P_m - \hat P\big)^2$$
$$\hat P \pm t_{\alpha/2,\,M-1}\,\frac{S}{\sqrt{M}}$$
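A sketch of this interval in Python; the critical value t_{α/2, M−1} is passed in (e.g., from a t-table), since the slides do not fix one.

```python
from statistics import mean, stdev

def multi_chain_interval(p_ms, t_crit):
    """p_ms: per-chain estimates P_m; t_crit: t_{alpha/2, M-1}."""
    M = len(p_ms)
    p_hat = mean(p_ms)
    s = stdev(p_ms)              # sqrt((1/(M-1)) * sum (P_m - P_hat)^2)
    half = t_crit * s / M ** 0.5
    return (p_hat - half, p_hat + half)

# e.g., multi_chain_interval([0.31, 0.29, 0.33, 0.30], t_crit=3.182)  # M=4, 95%
```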
82
Geman & Geman, 1984
Geman, S. & Geman, D., 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pat. Anal. Mach. Intel. 6, 721-41.
Introduce Gibbs sampling; place the idea of Gibbs sampling in a general setting in which the collection of variables is structured in a graphical model and each variable has a neighborhood corresponding to a local region of the graphical structure. Geman and Geman use the Gibbs distribution to define the joint distribution on this structured set of variables.
83
Tanner & Wong, 1987
Tanner and Wong (1987)
Data augmentation
Convergence results
84
Pearl, 1988
Pearl, 1988. Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann.
In the case of Bayesian networks, the neighborhoods correspond to the Markov blanket of a variable, and the joint distribution is defined by the factorization of the network.
85
Gelfand & Smith, 1990
Gelfand, A.E. and Smith, A.F.M., 1990. Sampling-based approaches to calculating marginal densities. J. Am. Statist. Assoc. 85, 398-409.
Show variance reduction in using the mixture estimator for posterior marginals.
86
Neal, 1992
R. M. Neal, 1992. Connectionist learning of belief networks, Artificial Intelligence, v. 56, pp. 71-118.
Stochastic simulation in noisy-OR networks.
87
CPCS54 Test Results
MSE vs. #samples (left) and time (right)
Ergodic, |X| = 54, D(Xi) = 2, |C| = 15, |E| = 4
Exact Time = 30 sec using Cutset Conditioning
[Charts: CPCS54, n=54, |C|=15, |E|=3 — MSE vs. #samples (0–5000) and MSE vs. time (0–25 sec), comparing Cutset and Gibbs]
88
CPCS179 Test Results
MSE vs. #samples (left) and time (right)
Non-Ergodic (1 deterministic CPT entry), |X| = 179, |C| = 8, 2 ≤ D(Xi) ≤ 4, |E| = 35
Exact Time = 122 sec using Loop-Cutset Conditioning
[Charts: CPCS179, n=179, |C|=8, |E|=35 — MSE vs. #samples (100–4000) and MSE vs. time (0–80 sec), comparing Cutset and Gibbs]
89
CPCS360b Test Results
MSE vs. #samples (left) and time (right)
Ergodic, |X| = 360, D(Xi) = 2, |C| = 21, |E| = 36
Exact Time > 60 min using Cutset Conditioning
Exact values obtained via Bucket Elimination
[Charts: CPCS360b, n=360, |C|=21, |E|=36 — MSE vs. #samples (0–1000) and MSE vs. time (1–60 sec), comparing Cutset and Gibbs]
90
Random Networks
MSE vs. #samples (left) and time (right)
|X| = 100, D(Xi) = 2, |C| = 13, |E| = 15-20
Exact Time = 30 sec using Cutset Conditioning
[Charts: RANDOM, n=100, |C|=13, |E|=15-20 — MSE vs. #samples (0–1200) and MSE vs. time (0–11 sec), comparing Cutset and Gibbs]
91
Coding Networks
MSE vs. time
Non-Ergodic, |X| = 100, D(Xi) = 2, |C| = 13-16, |E| = 50
Sample Ergodic Subspace U = {U1, U2, …, Uk}
Exact Time = 50 sec using Cutset Conditioning
[Figure: coding network with nodes u1–u4, parity nodes p1–p4, and output nodes y1–y4]
[Chart: Coding Networks, n=100, |C|=12-14 — MSE (0.001–0.1, log scale) vs. time (0–60 sec), comparing IBP, Gibbs, and Cutset]
92
Non-Ergodic Hailfinder
MSE vs. #samples (left) and time (right)
Non-Ergodic, |X| = 56, |C| = 5, 2 ≤ D(Xi) ≤ 11, |E| = 0
Exact Time = 2 sec using Loop-Cutset Conditioning
[Charts: HailFinder, n=56, |C|=5, |E|=1 — MSE (log scale) vs. #samples (0–1500) and MSE vs. time (1–10 sec), comparing Cutset and Gibbs]
93
Non-Ergodic CPCS360b - MSE
MSE vs. time
Non-Ergodic, |X| = 360, |C| = 26, D(Xi) = 2
Exact Time = 50 min using BTE
[Chart: cpcs360b, N=360, |E|=[20-34], w*=20 — MSE (0–0.000025) vs. time (0–1600 sec) for Gibbs, IBP, |C|=26 (fw=3), and |C|=48 (fw=2)]
94
Non-Ergodic CPCS360b - MaxErr
Maximum error vs. time
[Chart: cpcs360b, N=360, |E|=[20-34] — MaxErr (0–0.007) vs. time (0–1600 sec) for Gibbs, IBP, |C|=26 (fw=3), and |C|=48 (fw=2)]