Markov Chain Monte Carlo (MCMC) Inference Seung-Hoon Na Chonbuk National University


Page 1: Markov Chain Monte Carlo (MCMC) Inference · nlp.chonbuk.ac.kr/PGM2019/slides_jbnu/MCMC.pdf · 2019-05-29 · Seung-Hoon Na, Chonbuk National University

Markov Chain Monte Carlo (MCMC) Inference

Seung-Hoon Na

Chonbuk National University

Page 2

Monte Carlo Approximation

• Generate some (unweighted) samples from the posterior:

• Use these samples to compute any quantity of interest

– Posterior marginal: 𝑝(𝑥1|𝐷)

– Posterior predictive: 𝑝(𝑦|𝐷)

– …

Page 3

Sampling from standard distributions

• Using the cdf

– Based on the inverse probability transform

Page 4

Sampling from standard distributions: Inverse CDF

Page 5

Sampling from standard distributions: Inverse CDF

• Example: Exponential distribution

cdf

Page 6

Sampling from a Gaussian: Box-Muller Method

• Sample 𝑧1, 𝑧2 ∈ (−1,1) uniformly

• Discard pairs (𝑧1, 𝑧2) that do not lie inside the unit circle, i.e., keep those satisfying 𝑧1² + 𝑧2² ≤ 1

• 𝑝(𝒛) = (1/𝜋) 𝕀(𝒛 inside circle)

• Define

Page 7

Sampling from a Gaussian: Box-Muller Method

• Sampling in polar coordinates

• Sample 𝑟 in proportion to the normal density: 𝑟² follows an exponential distribution

• Sample 𝜃 uniformly: for a given 𝑟, 𝑥 and 𝑦 are uniformly distributed on the circle

Page 8

Sampling from a Gaussian: Box-Muller Method

• 𝑋 ∼ 𝑁(0,1), 𝑌 ∼ 𝑁(0,1)

• 𝑝(𝑥, 𝑦) = (1/2𝜋) exp(−(𝑥² + 𝑦²)/2) = (1/2𝜋) exp(−𝑟²/2)

• 𝑟² ∼ 𝐸𝑥𝑝(1/2)

• 𝐸𝑥𝑝(𝜆) = −log(𝑈(0,1))/𝜆

• 𝑟 ∼ √(−2 log 𝑈(0,1))

𝑃(𝑈 ≤ 1 − exp(−0.5𝑟²)) = 𝑃(𝑟² ≤ −2 log 𝑈) = 𝑃(𝑟 ≤ √(−2 log 𝑈))

https://theclevermachine.wordpress.com/2012/09/11/sampling-from-the-normal-distribution-using-the-box-muller-transform/

Page 9

Sampling from a Gaussian: Box-Muller Method

• 1. Draw 𝑢1, 𝑢2 ∼ 𝑈(0,1)

• 2. Transform to polar representation:

– 𝑟 = √(−2 log 𝑢1), 𝜃 = 2𝜋𝑢2

• 3. Transform to Cartesian representation:

– 𝑥 = 𝑟 cos 𝜃, 𝑦 = 𝑟 sin 𝜃
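The three steps above translate directly into code; a minimal sketch (function name and the moment checks are illustrative):

```python
import math
import random

def box_muller():
    """Return two independent N(0,1) draws built from two U(0,1) draws."""
    u1, u2 = random.random(), random.random()
    r = math.sqrt(-2.0 * math.log(u1))  # radius: sqrt of an Exp(1/2) draw
    theta = 2.0 * math.pi * u2          # uniform angle
    return r * math.cos(theta), r * math.sin(theta)

random.seed(0)
draws = [v for _ in range(50_000) for v in box_muller()]
mean = sum(draws) / len(draws)
var = sum(d * d for d in draws) / len(draws) - mean ** 2
print(mean, var)  # approximately 0 and 1
```

Each call consumes two uniforms and produces two independent Gaussian samples, which is why the polar form is popular in practice.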

Page 10

Rejection sampling

• Used when the inverse cdf method cannot be applied

• Create a proposal distribution 𝑞(𝑥) such that 𝑀𝑞(𝑥) provides an upper envelope for 𝑝(𝑥)

• Sample 𝑥 ∼ 𝑞(𝑥)

• Sample 𝑢 ∼ 𝑈(0,1)

• If 𝑢 > 𝑝(𝑥)/(𝑀𝑞(𝑥)), reject the sample

• Otherwise accept it
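The accept/reject loop above can be sketched as follows. The target here is a Beta(2,2) density with a uniform proposal; these concrete choices (and 𝑀 = 1.5) are illustrative, not from the slides:

```python
import random

def rejection_sample(p, q_sample, q_pdf, M):
    """Propose x ~ q and accept with probability p(x) / (M * q(x))."""
    while True:
        x = q_sample()
        u = random.random()
        if u <= p(x) / (M * q_pdf(x)):
            return x

# Target: the Beta(2,2) density p(x) = 6x(1-x) on [0,1]; proposal q = U(0,1).
# M*q(x) must upper-bound p(x); max p = 1.5 at x = 0.5, so M = 1.5 is tightest.
random.seed(0)
p = lambda x: 6.0 * x * (1.0 - x)
xs = [rejection_sample(p, random.random, lambda x: 1.0, M=1.5) for _ in range(20_000)]
print(sum(xs) / len(xs))  # close to the Beta(2,2) mean of 0.5
```

With 𝑀 = 1.5 the acceptance rate is 1/𝑀 = 2/3, which previews the efficiency analysis on a later slide.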

Page 11

Rejection sampling

Page 12

Rejection sampling: Proof

• Why does rejection sampling work?

• The cdf of the accepted points coincides with the cdf of 𝑝(𝑥)

Page 13

Rejection sampling

• How efficient is this method?

• 𝑃(accept) = ∬ 𝑞(𝑥) 𝕀((𝑥, 𝑢) ∈ 𝑆) 𝑑𝑢 𝑑𝑥 = ∫ [𝑝(𝑥)/(𝑀𝑞(𝑥))] 𝑞(𝑥) 𝑑𝑥 = (1/𝑀) ∫ 𝑝(𝑥) 𝑑𝑥

(using ∫₀¹ 𝕀(𝑢 ≤ 𝑦) 𝑑𝑢 = 𝑦)

We need to choose 𝑀 as small as possible while still satisfying 𝑀𝑞(𝑥) ≥ 𝑝(𝑥)

Page 14

Rejection sampling: Example

• Suppose we want to sample from a Gamma

• When 𝛼 is an integer, i.e., 𝛼 = 𝑘, we can use:

• But for non-integer 𝛼 we cannot use this trick, and instead use rejection sampling

Page 15

Rejection sampling: Example

• Use as a proposal, where

• To obtain 𝑀 as small as possible, check the ratio 𝑝(𝑥)/𝑞(𝑥):

• This ratio attains its maximum when

Page 16

Rejection sampling: Example

• Proposal

Page 17

Rejection Sampling:

Application to Bayesian Statistics

• Suppose we want to draw (unweighted) samples from the posterior

• Use rejection sampling

– Target distribution

– Proposal:

– M:

• Acceptance probability:

Page 18

Adaptive rejection sampling

• Upper bound the log density with a piecewise linear function

Page 19

Importance Sampling

• MC methods for approximating integrals of the form:

• The idea: Draw samples 𝒙 in regions which have high probability, 𝑝(𝒙), but also where |𝑓(𝒙)| is large

• Define 𝑓 𝒙 = 𝐼 𝒙 ∈ 𝐸

• It is easier to sample from a proposal 𝑞(𝒙) than from 𝑝(𝒙) itself

Page 20

Importance Sampling

• Samples from any proposal 𝑞(𝒙) to estimate the integral:

• How should we choose the proposal?

– Minimize the variance of the estimate 𝐼̂.
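The estimator 𝐼̂ = (1/𝑆) Σ 𝑓(𝒙ˢ)𝑤(𝒙ˢ) with 𝑤 = 𝑝/𝑞 can be sketched on a rare-event example, where the benefit of a good proposal is most visible (the specific target 𝑃(𝑋 > 3) and the shifted proposal are illustrative choices, not from the slides):

```python
import math
import random

def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def importance_estimate(f, p_pdf, q_pdf, q_sample, n):
    """Estimate E_p[f(X)] as (1/n) * sum_s f(x_s) * w(x_s), x_s ~ q, w = p/q."""
    total = 0.0
    for _ in range(n):
        x = q_sample()
        total += f(x) * p_pdf(x) / q_pdf(x)
    return total / n

# Rare-event example: P(X > 3) for X ~ N(0,1). Sampling from p directly almost
# never hits the event; a proposal centred where |f| is large works far better.
random.seed(0)
est = importance_estimate(
    f=lambda x: 1.0 if x > 3.0 else 0.0,
    p_pdf=normal_pdf,
    q_pdf=lambda x: normal_pdf(x, mu=3.0),
    q_sample=lambda: random.gauss(3.0, 1.0),
    n=100_000,
)
print(est)  # near the true tail probability 1 - Phi(3) ≈ 0.00135
```

Shifting the proposal onto the high-|𝑓| region is exactly the variance-reduction idea the next slide formalizes via Jensen's inequality.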

Page 21

Importance Sampling

• By Jensen's inequality, we have 𝐸[𝑢²(𝒙)] ≥ (𝐸[𝑢(𝒙)])²

• Setting 𝑢(𝒙) = 𝑝(𝒙)|𝑓(𝒙)|/𝑞(𝒙) = 𝑤(𝒙)|𝑓(𝒙)|, we have the lower bound:

Equality holds when 𝑢(𝒙) = 𝑝(𝒙)|𝑓(𝒙)|/𝑞(𝒙) is constant, i.e., the optimal proposal is 𝑞(𝒙) ∝ 𝑝(𝒙)|𝑓(𝒙)|

Page 22

Importance Sampling: Handling unnormalized distributions

• What if only the unnormalized target distribution and proposal are available, without 𝑍𝑝 and 𝑍𝑞?

• Use the same set of samples to evaluate 𝑍𝑝/𝑍𝑞:

unnormalized importance weight
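With unnormalized weights the estimator becomes self-normalized: weight each sample by 𝑤̃ = 𝑝̃/𝑞̃ and divide by the sum of the weights, so both normalizers cancel. A minimal sketch (the particular unnormalized Gaussian target and proposal are illustrative):

```python
import math
import random

def snis_mean(p_tilde, q_tilde, q_sample, n):
    """Self-normalized IS: E_p[X] ≈ sum(w_s * x_s) / sum(w_s), w = p̃/q̃.

    Neither p̃ nor q̃ needs its normalizing constant."""
    xs, ws = [], []
    for _ in range(n):
        x = q_sample()
        xs.append(x)
        ws.append(p_tilde(x) / q_tilde(x))
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

# Unnormalized target p̃(x) = exp(-x^2/2): a N(0,1) missing its 1/sqrt(2*pi).
random.seed(0)
m = snis_mean(
    p_tilde=lambda x: math.exp(-0.5 * x * x),
    q_tilde=lambda x: math.exp(-0.5 * ((x - 1.0) / 2.0) ** 2),  # ∝ N(1, 2^2)
    q_sample=lambda: random.gauss(1.0, 2.0),
    n=50_000,
)
print(m)  # near E_p[X] = 0
```

The price of self-normalization is a small bias that vanishes as the number of samples grows.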

Page 23

Ancestral sampling for PGM

• Ancestral sampling

– Sample the root nodes,

– then sample their children,

– then sample their children, etc.

– This is okay when we have no evidence
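The root-then-children recipe can be sketched on a toy three-node chain 𝐴 → 𝐵 → 𝐶 of binary variables; the CPT values below are made-up illustration numbers, not from the slides:

```python
import random

# Minimal ancestral sampling for a toy chain A -> B -> C (all binary).
def sample_chain():
    a = random.random() < 0.6                 # root: P(A=1) = 0.6
    b = random.random() < (0.8 if a else 0.2)  # child: P(B=1 | A)
    c = random.random() < (0.9 if b else 0.1)  # grandchild: P(C=1 | B)
    return a, b, c

random.seed(0)
draws = [sample_chain() for _ in range(100_000)]
p_c1 = sum(c for _, _, c in draws) / len(draws)
# Exact: P(B=1) = .6*.8 + .4*.2 = 0.56; P(C=1) = .56*.9 + .44*.1 = 0.548
print(p_c1)
```

Each node is sampled only after its parents, so a single topological pass yields one joint sample.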

Page 24

Ancestral sampling

Page 25

Ancestral sampling: Example

Page 26

Rejection sampling for PGM

• Now, suppose that we have some evidence with interest in conditional queries :

• Rejection sampling (local sampling):

– Perform ancestral sampling,

– but as soon as we sample a value that is inconsistent with an observed value, reject the whole sample and start again

• However, rejection sampling is very inefficient (requiring very many samples) and cannot be applied with real-valued evidence

Page 27

Importance Sampling for DGM:

Likelihood weighting

• Likelihood weighting

– Sample unobserved variables as before, conditional on their parents; But don’t sample observed variables; instead we just use their observed values.

– This is equivalent to using a proposal of the form:

– The corresponding importance weight: a product over the set of observed nodes
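Likelihood weighting on the toy chain 𝐴 → 𝐵 → 𝐶 with evidence 𝐶 = 1 can be sketched as below: unobserved nodes are sampled given their parents, and the observed node contributes its likelihood as the weight. The CPT values are made-up illustration numbers, not from the slides:

```python
import random

def lw_estimate_b_given_c1(n):
    """Likelihood weighting estimate of P(B=1 | C=1) on the chain A -> B -> C."""
    num = den = 0.0
    for _ in range(n):
        a = random.random() < 0.6                  # sample root from its prior
        b = random.random() < (0.8 if a else 0.2)  # sample B given A
        w = 0.9 if b else 0.1                      # weight = P(C=1 | B=b), C clamped
        num += w * b
        den += w
    return num / den

random.seed(0)
est = lw_estimate_b_given_c1(100_000)
print(est)  # exact value: 0.504/0.548 ≈ 0.9197
```

Unlike rejection sampling, no sample is ever thrown away; inconsistency with the evidence is handled by down-weighting instead.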

Page 28

Likelihood weighting

Page 29

Likelihood weighting

Page 30

Sampling importance resampling (SIR)

• Draw unweighted samples by first using importance sampling

• Sample with replacement where the probability that we pick 𝒙𝑠 is 𝑤𝑠
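The two SIR stages (weight, then resample with replacement) can be sketched as below; the unnormalized Gaussian target and the wide proposal are illustrative choices:

```python
import math
import random

# SIR sketch: weight S samples from q by p̃/q, normalize the weights,
# then resample S' << S times with replacement to get roughly unweighted draws.
random.seed(0)
p_tilde = lambda x: math.exp(-0.5 * x * x)  # unnormalized N(0,1)
q_pdf = lambda x: math.exp(-0.5 * ((x - 1.0) / 2.0) ** 2) / (2.0 * math.sqrt(2.0 * math.pi))

xs = [random.gauss(1.0, 2.0) for _ in range(50_000)]      # S proposals
ws = [p_tilde(x) / q_pdf(x) for x in xs]                  # unnormalized weights
total = sum(ws)
probs = [w / total for w in ws]                           # normalized weights w_s
resampled = random.choices(xs, weights=probs, k=5_000)    # S' << S
print(sum(resampled) / len(resampled))  # near 0, the mean of the target
```

`random.choices` performs exactly the "pick 𝒙ˢ with probability 𝑤ˢ" step from the slide.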

Page 31

Sampling importance resampling (SIR)

• Application: Bayesian inference

– Goal: draw samples from the posterior

– Unnormalized posterior:

– Proposal:

– Normalized weights:

Typically S’ << S

Page 32

Particle Filtering

• A simulation-based algorithm for recursive Bayesian inference

– Sequential importance sampling with resampling

Page 33

Markov Chain Monte Carlo

(MCMC)

• 1) Construct a Markov chain on the state space X

– whose stationary distribution is the target density 𝑝∗(𝒙) of interest

• 2) Perform a random walk on the state space

– in such a way that the fraction of time we spend in each state x is proportional to 𝑝∗(𝒙)

• 3) By drawing (correlated!) samples 𝒙0, 𝒙1, 𝒙2, . . . from the chain, perform Monte Carlo integration w.r.t. 𝑝∗

Page 34

Markov Chain Monte Carlo (MCMC) vs.

Variational inference

• Variational inference

– (1) For small to medium problems, it is usually faster;

– (2) it is deterministic;

– (3) it is easy to determine when to stop;

– (4) it often provides a lower bound on the log likelihood.

Page 35

Markov Chain Monte Carlo

(MCMC) vs. Variational inference

• MCMC

– (1) It is often easier to implement;

– (2) it is applicable to a broader range of models, such as models whose size or structure changes depending on the values of certain variables (e.g., as happens in matching problems), or models without nice conjugate priors;

– (3) sampling can be faster than variational methods when applied to really huge models or datasets.

Page 36

Gibbs Sampling

• Sample each variable in turn, conditioned on the values of all the other variables in the distribution

• For example, if we have D = 3 variables

• Need to derive the full conditional for each variable 𝑖
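The variable-by-variable updating can be sketched on a case where the full conditionals are known in closed form: a zero-mean bivariate Gaussian with correlation 𝜌 (a two-variable illustration rather than the slide's D = 3; the parameter values are assumptions):

```python
import math
import random

def gibbs_bivariate_gauss(rho, n_steps, burn_in=1_000):
    """Gibbs sampling for a zero-mean bivariate Gaussian with correlation rho.

    Full conditionals: x1 | x2 ~ N(rho*x2, 1 - rho^2), and symmetrically for x2.
    """
    sd = math.sqrt(1.0 - rho * rho)
    x1 = x2 = 0.0
    samples = []
    for t in range(n_steps):
        x1 = random.gauss(rho * x2, sd)  # sample x1 | x2
        x2 = random.gauss(rho * x1, sd)  # sample x2 | x1
        if t >= burn_in:
            samples.append((x1, x2))
    return samples

random.seed(0)
s = gibbs_bivariate_gauss(rho=0.8, n_steps=100_000)
corr = sum(a * b for a, b in s) / len(s)  # E[x1*x2] = rho for unit variances
print(corr)
```

Note the samples are correlated across iterations; a burn-in period is discarded before computing Monte Carlo averages.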

Page 37

Gibbs Sampling: Ising model

• Full conditional
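For the Ising prior with coupling 𝐽, the full conditional of one spin reduces to a sigmoid of the sum of its neighbours' spins: 𝑝(𝑥ₛ = +1 | nbrs) = 𝜎(2𝐽 Σₜ 𝑥ₜ). A minimal prior-only sweep (grid size, 𝐽, and sweep count are arbitrary illustration choices; no evidence term is included):

```python
import math
import random

def gibbs_sweep_ising(x, J):
    """One Gibbs sweep over an n-by-n Ising grid with coupling J.

    Full conditional of a spin: p(x_s = +1 | nbrs) = sigmoid(2*J*sum of nbr spins).
    """
    n = len(x)
    for i in range(n):
        for j in range(n):
            nbr = 0
            if i > 0: nbr += x[i - 1][j]
            if i < n - 1: nbr += x[i + 1][j]
            if j > 0: nbr += x[i][j - 1]
            if j < n - 1: nbr += x[i][j + 1]
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * J * nbr))
            x[i][j] = 1 if random.random() < p_plus else -1
    return x

random.seed(0)
n = 16
grid = [[random.choice([-1, 1]) for _ in range(n)] for _ in range(n)]
for _ in range(200):
    gibbs_sweep_ising(grid, J=1.0)

# At strong coupling, neighbouring spins end up agreeing most of the time.
pairs = [(grid[i][j], grid[i][j + 1]) for i in range(n) for j in range(n - 1)]
agree = sum(1 for a, b in pairs if a == b) / len(pairs)
print(agree)
```

Adding a local evidence term 𝜓ₜ, as on the next slides, simply adds its log-odds to the sigmoid's argument.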

Page 38

Gibbs Sampling: Ising model

• Combine an Ising prior with a local evidence term 𝜓𝑡

Page 39

Gibbs Sampling: Ising model

• Ising prior with 𝑊𝑖𝑗 = 𝐽 = 1

– Gaussian noise model with 𝜎 = 2

Page 40

Gibbs Sampling: Ising model

Page 41

Gaussian Mixture Model (GMM)

• Likelihood function

• Factored conjugate prior

Page 42

Gaussian Mixture Model (GMM):

Variational EM

• Standard VB approximation to the posterior:

• Mean field approximation

• VBEM results in the optimal form of 𝑞 𝒛, 𝜽 :

Page 43

[Ref] Gaussian Models

• Marginals and conditionals of a Gaussian model

• 𝑝(𝒙) ∼ 𝑁(𝝁, 𝚺)

• Marginals:

• Posterior conditionals:

Page 44

[Ref] Gaussian Models

• Linear Gaussian systems

• The posterior:

• The normalization constant:

Page 45

[Ref] Gaussian Models: Posterior distribution of 𝝁

• The likelihood wrt 𝝁:

• The prior:

• The posterior:

Page 46

[Ref] Gaussian Models: Posterior distribution of 𝚺

• The likelihood form of Σ

• The conjugate prior: the inverse Wishart distribution

• The posterior:

Page 47

Gaussian Mixture Model (GMM):

Gibbs Sampling

• Full joint distribution:

Page 48

Gaussian Mixture Model (GMM): Gibbs Sampling

• The full conditionals:

Page 49

Gaussian Mixture Model (GMM):

Gibbs Sampling

Page 50

Label switching problem

• Unidentifiability

– The parameters of the model 𝜽, and the indicator functions 𝒛, are unidentifiable

– We can arbitrarily permute the hidden labels without affecting the likelihood

• Monte Carlo average of the samples for clusters:

– Samples for cluster 1 may be used for samples for cluster 2

• Label switching problem

– If we could average over all modes, we would find 𝐸[𝝁𝑘|𝐷] is the same for all 𝑘

Page 51

Collapsed Gibbs sampling

• Analytically integrate out some of the unknown quantities, and just sample the rest

• Suppose we sample 𝒛 and integrate out 𝜽

• Thus the 𝜽 parameters do not participate in the Markov chain;

• Consequently we can draw conditionally independent samples

This will have much lower variance than samples drawn from the joint state space

Rao-Blackwellisation

Page 52

Collapsed Gibbs sampling

• Theorem 24.2.1 (Rao-Blackwell)

– Let 𝒛 and 𝜽 be dependent random variables, and 𝑓(𝒛, 𝜽) be some scalar function. Then var[𝐸𝜽[𝑓(𝒛, 𝜽) | 𝒛]] ≤ var𝒛,𝜽[𝑓(𝒛, 𝜽)]

The variance of the estimate created by analytically integrating out 𝜽 will always be lower (or rather, will never be higher) than the variance of a direct MC estimate.

Page 53

Collapsed Gibbs sampling

After integrating out the parameters.

Page 54

Collapsed Gibbs: GMM

• Analytically integrate out the model parameters 𝝁𝑘, 𝚺𝑘 and 𝝅, and just sample the indicators 𝒛

– Once we integrate out 𝝅, all the 𝑧𝑖 nodes become inter-dependent.

– Once we integrate out 𝜽𝑘, all the 𝑥𝑖 nodes become inter-dependent

• Full conditionals:

Page 55

Collapsed Gibbs: GMM

• Suppose a symmetric prior of the form 𝝅 = 𝐷𝑖𝑟(𝜶) with 𝛼𝑘 = 𝛼/𝐾

• Integrating out 𝝅, using the Dirichlet-multinomial formula:

where Σ𝑘=1..𝐾 𝑁𝑘,−𝑖 = 𝑁 − 1

Page 56

[Ref] Dirichlet-multinomial

• The marginal likelihood for the Dirichlet-multinoulli model:

Page 57

Collapsed Gibbs: GMM

All the data assigned to cluster 𝑘 except for 𝒙𝑖

Page 58

Collapsed Gibbs: GMM

• Collapsed Gibbs sampler for a mixture model

Page 59

Collapsed Gibbs vs. Vanilla Gibbs

A mixture of K = 4 2-d Gaussians applied to N = 300 data points; 20 different random initializations.

Page 60

Collapsed Gibbs vs. Vanilla Gibbs: log probability averaged over 100 different random initializations.

Solid line: the median; thick dashed: the 0.25 and 0.75 quantiles; thin dashed: the 0.05 and 0.95 quantiles.

Page 61

Metropolis Hastings algorithm

• At each step, propose to move from the current state 𝒙 to a new state 𝒙′ with probability 𝑞(𝒙′|𝒙)

– 𝑞: the proposal distribution

– E.g.)

Random walk Metropolis algorithm

Page 62

Metropolis Hastings algorithm

• The acceptance probability if the proposal is symmetric:

𝛼 = min(1, 𝑝∗(𝒙′)/𝑝∗(𝒙))

• The acceptance probability if the proposal is asymmetric:

𝛼 = min(1, (𝑝∗(𝒙′) 𝑞(𝒙|𝒙′)) / (𝑝∗(𝒙) 𝑞(𝒙′|𝒙)))

– Need the Hastings correction

Page 63

Metropolis Hastings algorithm

• When evaluating α, we only need to know unnormalized density
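This is the key practical property: the normalizer cancels in the ratio 𝑝̃(𝒙′)/𝑝̃(𝒙), so the sampler only ever touches an unnormalized density. A random-walk Metropolis sketch on a mixture of two 1D Gaussians, mirroring the slides' later example (the mixture weights, proposal variance v = 8, and chain length are assumptions for illustration):

```python
import math
import random

def rw_metropolis(p_tilde, x0, n_steps, prop_sd):
    """Random-walk Metropolis: only the unnormalized density p̃ is needed,
    since the normalizing constant cancels in p̃(x')/p̃(x)."""
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = random.gauss(x, prop_sd)              # symmetric proposal
        alpha = min(1.0, p_tilde(x_new) / p_tilde(x))
        if random.random() < alpha:
            x = x_new                                 # accept the move
        samples.append(x)                             # else keep the old state
    return samples

# Unnormalized mixture of N(0,1) and N(4,1) with weights 0.3 and 0.7.
p_tilde = lambda x: 0.3 * math.exp(-0.5 * x * x) + 0.7 * math.exp(-0.5 * (x - 4.0) ** 2)
random.seed(0)
chain = rw_metropolis(p_tilde, x0=0.0, n_steps=100_000, prop_sd=math.sqrt(8.0))
burned = chain[5_000:]
print(sum(burned) / len(burned))  # near the mixture mean 0.3*0 + 0.7*4 = 2.8
```

Rejected proposals duplicate the current state in the chain; dropping that duplication would bias the samples away from 𝑝∗.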

Page 64

Metropolis Hastings algorithm

Page 65

Metropolis Hastings algorithm

• Gibbs sampling is a special case of MH, using the proposal:

• Then, the acceptance probability is 100%

Page 66

Metropolis Hastings: Example

An example of the Metropolis Hastings algorithm for sampling from a mixture of two 1D Gaussians

MH sampling results using a Gaussian proposal with the variance 𝑣 = 1

Page 67

Metropolis Hastings: Example

An example of the Metropolis Hastings algorithm for sampling from a mixture of two 1D Gaussians

MH sampling results using a Gaussian proposal with the variance 𝑣 = 500

Page 68

Metropolis Hastings: Example

An example of the Metropolis Hastings algorithm for sampling from a mixture of two 1D Gaussians

MH sampling results using a Gaussian proposal with the variance 𝑣 = 8

Page 69

Metropolis Hastings: Gaussian Proposals

• 1) an independence proposal

• 2) a random walk proposal

MH for binary logistic regression

Page 70

Metropolis Hastings: Example

• MH for binary logistic regression

Joint posterior of the parameters

Page 71

Metropolis Hastings: Example

• MH for binary logistic regression

– Initialize the chain at the mode, computed using IRLS

– Use the random walk Metropolis sampler

Page 72

Metropolis Hastings: Example

• MH for binary logistic regression