![Page 1: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/1.jpg)
Gibbs Sampling and VariationalMethods
Héctor Corrada Bravo
University of Maryland, College Park, USA CMSC 644: 20190403
![Page 2: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/2.jpg)
Blei (2012), Comm. ACM
Mixture ModelsDocuments as mixtures of topics (Hoffman 1999, Blei et al. 2003)
1 / 48
![Page 3: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/3.jpg)
Kovacevic (2014), PLOS One
Mixture ModelsMore applications: genetics, populations as mixture of ancestralpopulations
2 / 48
![Page 4: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/4.jpg)
Glueck, et al. (2017). TVCG
Mixture ModelsMore applications: clinical subtyping
3 / 48
![Page 5: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/5.jpg)
Schulman and Saria (2016). JMLR
Mixture ModelsMore applications: clinical prognosis
4 / 48
![Page 8: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/8.jpg)
Mixture ModelsWe have a set of documents
Each document modeled as a bagofwords (bow) over dictionary .
: the number of times word appears in document .
D
W
xw,d w ∈ W d ∈ D
7 / 48
![Page 9: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/9.jpg)
Approximate Inference by SamplingUltimately, what we are interested in is learning topics
Perhaps instead of finding parameters that maximize likelihood
Sample from a distribution that gives us topic estimates
θ
Pr(θ|D)
8 / 48
![Page 10: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/10.jpg)
Approximate Inference by SamplingUltimately, what we are interested in is learning topics
Perhaps instead of finding parameters that maximize likelihood
Sample from a distribution that gives us topic estimates
But, we only have talked about how can we sample parameters?
θ
Pr(θ|D)
Pr(D|θ)
9 / 48
![Page 11: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/11.jpg)
Approximate Inference by SamplingLike EM, the trick here is to expand model with latent data
And sample from distribution
Zm
Pr(θ, Zm|Z)
10 / 48
![Page 12: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/12.jpg)
Approximate Inference by SamplingLike EM, the trick here is to expand model with latent data
And sample from distribution
This is challenging, but sampling from and is easier
Zm
Pr(θ, Zm|Z)
Pr(θ|Zm, Z) Pr(Zm|θ, Z)
11 / 48
![Page 13: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/13.jpg)
Approximate Inference by SamplingThe Gibbs Sampler does exactly that
Property: After some rounds, samples from the conditional distributions
Correspond to samples from marginal
Pr(θ|Zm, Z)
Pr(θ|Z) = ∑Z
m Pr(θ, Zm|Z)
12 / 48
![Page 14: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/14.jpg)
Approximate Inference by SamplingQuick aside, how to simulate data for pLSA?
Generate parameters and Generate
{pd} {θt}
Δw,d,t
13 / 48
![Page 15: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/15.jpg)
Approximate Inference by SamplingLet's go backwards, let's deal with Δw,d,t
14 / 48
![Page 16: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/16.jpg)
Approximate Inference by SamplingLet's go backwards, let's deal with
Where was as given by Estep
Δw,d,t
Δw,d,t ∼ Multxw,d(γw,d,1, … , γw,d,T )
γw,d,t
15 / 48
![Page 17: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/17.jpg)
Approximate Inference by SamplingLet's go backwards, let's deal with
Where was as given by Estep
for d in range(num_docs):
delta[d,w,:] = np.random.multinomial(doc_mat[d,w],
gamma[d,w,:])
Δw,d,t
Δw,d,t ∼ Multxw,d(γw,d,1, … , γw,d,T )
γw,d,t
16 / 48
![Page 18: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/18.jpg)
Approximate Inference by SamplingHmm, that's a problem since we need ...
But, we know so, let's use that to generate each as
xw,d
Pr(w, d) = ∑t pt,dθw,t xw,d
xw,d ∼ Multnd(Pr(1, d), … ,Pr(W , d))
17 / 48
![Page 19: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/19.jpg)
Approximate Inference by SamplingHmm, that's a problem since we need ...
But, we know so, let's use that to generate each as
for d in range(num_docs):
doc_mat[d,:] = np.random.multinomial(nw[d], np.sum(p[:,d] * theta), axis=0)
xw,d
Pr(w, d) = ∑t pt,dθw,t xw,d
xw,d ∼ Multnd(Pr(1, d), … ,Pr(W , d))
18 / 48
![Page 20: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/20.jpg)
Approximate Inference by SamplingNow, how about ? How do we generate the parameters of a Multinomialdistribution?
pd
19 / 48
![Page 21: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/21.jpg)
Approximate Inference by SamplingNow, how about ? How do we generate the parameters of a Multinomialdistribution?
This is where the Dirichlet distribution comes in...
If , then
pd
pd ∼ Dir(α)
Pr(pd) ∝T
∏t=1
pαt−1t,d
20 / 48
![Page 22: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/22.jpg)
Approximate Inference by SamplingSome interesting properties:
So, if we set all we will tend to have uniform probability over topics ( each on average)
If we increase it will also have uniform probability but will have verylittle variance (it will almost always be )
E[pt,d] =αt
∑t′ αt′
αt = 1
1/t
αt = 100
1/t
21 / 48
![Page 23: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/23.jpg)
Approximate Inference by SamplingSo, we can say and pd ∼ Dir(α) θt ∼ Dir(β)
22 / 48
![Page 24: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/24.jpg)
Approximate Inference by SamplingSo, we can say and
And generate data as (with )
for d in range(num_docs):
p[:,d] = np.random.dirichlet(1. * np.ones(num_topics))
pd ∼ Dir(α) θt ∼ Dir(β)
αt = 1
23 / 48
![Page 25: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/25.jpg)
Approximate Inference by SamplingSo what we have is a prior over parameters and : and
And we can formulate a distribution for missing data :
{pd} {θt} Pr(pd|α) Pr(θt|β)
Δw,d,t
Pr(Δw,d,t|pd, θt,α,β) =
Pr(Δw,d,t|pd, θt)Pr(pd|α)Pr(θt|β)
24 / 48
![Page 26: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/26.jpg)
Approximate Inference by SamplingHowever, what we care about is the posterior distribution
What do we do???
Pr(pd|Δw,d,t, θt,α,β)
25 / 48
![Page 27: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/27.jpg)
Approximate Inference by SamplingAnother neat property of the Dirichlet distribution is that it is conjugate tothe Multinomial
If and , thenθ|α ∼ Dir(α) X|θ ∼ Multinomial(θ)
θ|X, α ∼ Dir(X + α)
26 / 48
![Page 28: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/28.jpg)
Approximate Inference by SamplingThat means we can sample from
and
pd
pt,d ∼ Dir(∑w
Δw,d,t + α)
θw,t ∼ Dir(∑d
Δw,d,t + β)
27 / 48
![Page 29: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/29.jpg)
Blei, Ng, Jordan (2003), JMLR
Approximate Inference by SamplingCoincidentally, we have just specified the Latent Dirichlet Allocationmethod for topic modeling.
This is the most commonly used method for topic modeling
28 / 48
![Page 30: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/30.jpg)
Approximate Inference by SamplingWe can now specify a full Gibbs Sampler for an LDA mixture model.
Given:
Worddocument counts Number of topics Prior parameters and
Do: Learn parameters and for topics
xw,d
K
α β
{pd} {θt} K
29 / 48
![Page 31: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/31.jpg)
Approximate Inference by SamplingStep 0: Initialize parameters and
and
{pd} {θt}
pd ∼ Dir(α)
θt ∼ Dir(β)
30 / 48
![Page 32: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/32.jpg)
Approximate Inference by SamplingStep 1:
Sample based on current parameters and Δw,d,t {pd} {θt}
Δw,d,. ∼ Multxw,d(γw,d,1, … , γw,d,T )
31 / 48
![Page 33: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/33.jpg)
Approximate Inference by SamplingStep 2:
Sample parameters from
and
pt,d ∼ Dir(∑w
Δw,d,t + α)
θw,t ∼ Dir(∑d
Δw,d,t + β)
32 / 48
![Page 34: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/34.jpg)
Approximate Inference by SamplingStep 3:
Get samples for a few iterations (e.g., 200), we want to reach astationary distribution...
33 / 48
![Page 35: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/35.jpg)
Approximate Inference by SamplingStep 4:
Estimate as the average of the estimates from the last iterations(e.g., m=500)
Δ̂w,d,t m
34 / 48
![Page 36: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/36.jpg)
Approximate Inference by SamplingStep 5:
Estimate parameters and based on estimated pd θt Δ̂w,d,t
p̂ t,d =∑w Δ̂w,d,t + α
∑t∑w Δ̂w,d,t + α
θ̂w,t =∑d Δ̂w,d,t + β
∑w∑d Δ̂w,d,t + β
35 / 48
![Page 37: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/37.jpg)
Mixture modelsWe have now seen two different mixture models: soft kmeans and topicmodels
36 / 48
![Page 38: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/38.jpg)
Mixture modelsWe have now seen two different mixture models: soft kmeans and topicmodels
Two inference procedures:
Exact Inference with Maximum Likelihood using the EM algorithmApproximate Inference using Gibbs Sampling
37 / 48
![Page 39: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/39.jpg)
Mixture modelsWe have now seen two different mixture models: soft kmeans and topicmodels
Two inference procedures:
Exact Inference with Maximum Likelihood using the EM algorithmApproximate Inference using Gibbs Sampling
Next, we will go back to Maximum Likelihood but learn aboutApproximate Inference using Variational Methods
38 / 48
![Page 40: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/40.jpg)
Variational MethodsConsider LDA model again
Benefits
Full document generative modelCan process new documents(posterior over topics) and words(prior parameters)
39 / 48
![Page 41: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/41.jpg)
Variational MethodsConsider LDA model again
With Gibbs we sampled from
What if we want to estimateparameters again? (maximum aposteriori parameters)
Pr(θ, Δ|x, α, β)
40 / 48
![Page 42: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/42.jpg)
Variational MethodsConsider LDA model again
Very difficult to maximize
Harder than pLSA due to Dirichletpriors
41 / 48
![Page 43: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/43.jpg)
Variational MethodsConsider LDA model again
Let's get inspiration from EM:maximize lower bound
But what should the lower boundbe?
42 / 48
![Page 44: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/44.jpg)
Variational MethodsConsider LDA model again
Make missing data and parameters"independent"!
43 / 48
![Page 45: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/45.jpg)
Variational MethodsConsider LDA model again
Find parameters that make simplemodel most similar to original model
44 / 48
![Page 46: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/46.jpg)
Variational MethodsCan then define EMlike algorithm
Estep: define expectation w.r.t. approximate distribution
Mstep: maximize parameters of approximate distribution
45 / 48
![Page 47: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/47.jpg)
Variational MethodsNet result:
1) Maximum posterior estimates 2) Super simple updates 3) Withstochastic approach (update using a few words at a time), extremelyscalable
46 / 48
![Page 48: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/48.jpg)
Hoffman, et al. (2010). NIPS
Variational Methods
47 / 48
![Page 49: Methods Gibbs Sampling and Variational · Gibbs Sampling and Variational Methods ... Mixture Models More applications: genetics, populations as mixture of ancestral populations 2](https://reader033.vdocuments.site/reader033/viewer/2022050109/5f46a5250072892d4b7f73af/html5/thumbnails/49.jpg)
ConclusionProbabilistic mixture models: powerful model class with manyapplications
Awesome historical algorithmic development
Outstanding software support
48 / 48