A Permutation-Augmented Sampler
for DP Mixture Models
ICML 2007 Corvallis, Oregon
June 21, 2007
Percy Liang (UC Berkeley), Michael I. Jordan (UC Berkeley), Ben Taskar (U Penn)
Introduction
Dirichlet process mixture models:
• Clustering applications:
  – natural language processing, e.g. [Blei et al., 2004; Daume, Marcu, 2005; Goldwater et al., 2006; Liang et al., 2007]
  – vision, e.g. [Sudderth et al., 2006]
  – bioinformatics, e.g. [Xing et al., 2004]
• Nonparametric: number of clusters adapts to data
• Current inference based on local moves
Outline:
• DP mixture model
• Permutation-augmented model ⇒ global moves
• Experiments
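Dirichlet processes
DP mixture model (graphical model: G → θ_i → x_i, with a plate over the n data points):
$$G \sim \mathrm{DP}(\alpha_0, G_0)$$
For each data point $i = 1, \dots, n$:
$$\theta_i \sim G, \qquad x_i \sim F(\theta_i)$$
Definition: $G_0$ = a distribution on $\Theta$, $\alpha_0$ = concentration parameter.
G is a draw from a Dirichlet process, denoted $G \sim \mathrm{DP}(\alpha_0, G_0)$
⇔
$$(G(A_1), \dots, G(A_K)) \sim \mathrm{Dirichlet}(\alpha_0 G_0(A_1), \dots, \alpha_0 G_0(A_K))$$
for all partitions $(A_1, \dots, A_K)$ of $\Theta$.
(Figure: Θ partitioned into four regions A1, A2, A3, A4.)
[Ferguson, 1973; Antoniak, 1974]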
Inference
Representations:
• Chinese restaurant process: marginalize G
• Stick-breaking representation: explicitly represent G
Previous algorithms:
• Collapsed Gibbs sampling [Escobar, West, 1995]
• Blocked Gibbs sampling [Ishwaran, James, 2001]
• Split-merge sampling [Jain, Neal, 2000; Dahl, 2003]
• Variational [Blei, Jordan, 2005; Kurihara et al., 2007]
• A-star search [Daume, 2007]
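To make the stick-breaking representation listed above a little more concrete, here is a minimal sketch that draws a truncated approximation of G ∼ DP(α0, G0). The function name `stick_breaking`, the truncation level, and the standard normal used as G0 are all assumptions made for this illustration; none of them come from the talk.

```python
import random


def stick_breaking(alpha0, base_sampler, truncation, rng):
    """Truncated stick-breaking draw; returns (weights, atoms)."""
    weights, atoms = [], []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(truncation):
        beta = rng.betavariate(1.0, alpha0)  # fraction broken off the remaining stick
        weights.append(remaining * beta)
        atoms.append(base_sampler(rng))      # atom location drawn from G0
        remaining *= 1.0 - beta
    return weights, atoms


rng = random.Random(0)
# Hypothetical choice: G0 is a standard normal on the real line.
weights, atoms = stick_breaking(1.0, lambda r: r.gauss(0.0, 1.0), 50, rng)
```

The weights sum to slightly less than 1 because of the truncation; the Chinese restaurant process avoids this by marginalizing G altogether.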
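Chinese restaurant process
$G \sim \mathrm{DP}(\alpha_0, G_0)$ is discrete (with probability 1)
Marginalize out G ⇒ induces clustering C
Each cluster c ∈ C is a subset of {1, …, n}
Example: C = {{1}, {2, 3, 5}, {4}}
$$p(i \in c) = \begin{cases} \dfrac{|c|}{i - 1 + \alpha_0} & \text{if } c \text{ old} \\[6pt] \dfrac{\alpha_0}{i - 1 + \alpha_0} & \text{if } c \text{ new} \end{cases}$$
(Figure: customers 1 to 5 seated at tables {1}, {2, 3, 5}, {4}, with the probability of each seating step.)
probability, for customers 1 to 5 in order:
$$\frac{\alpha_0}{0 + \alpha_0}, \quad \frac{\alpha_0}{1 + \alpha_0}, \quad \frac{1}{2 + \alpha_0}, \quad \frac{\alpha_0}{3 + \alpha_0}, \quad \frac{2}{4 + \alpha_0}$$
[Pitman, 2002]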
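The seating rule on this slide can be simulated directly. The sketch below is only an illustration: the function name `sample_crp` and the list-of-lists representation of a clustering are assumptions, not part of the talk.

```python
import random


def sample_crp(n, alpha0, rng):
    """Seat customers 1..n one at a time and return the clustering as a list of lists."""
    clusters = []
    for i in range(1, n + 1):
        # An old table c has weight |c|; a new table has weight alpha0.
        # The weights sum to i - 1 + alpha0, the denominator in the seating rule.
        weights = [len(c) for c in clusters] + [alpha0]
        k = rng.choices(range(len(clusters) + 1), weights=weights)[0]
        if k == len(clusters):
            clusters.append([i])   # open a new table
        else:
            clusters[k].append(i)  # join an existing table
    return clusters


clustering = sample_crp(10, 1.0, random.Random(0))
```

Larger values of `alpha0` make new tables more likely, so the number of clusters tends to grow with both `alpha0` and `n`.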
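CRP prior over clusterings
Previous example:
$$p(C) = \frac{\alpha_0}{0 + \alpha_0} \cdot \frac{\alpha_0}{1 + \alpha_0} \cdot \frac{1}{2 + \alpha_0} \cdot \frac{\alpha_0}{3 + \alpha_0} \cdot \frac{2}{4 + \alpha_0}$$
In general:
$$p(C) = \frac{1}{\mathrm{AF}(\alpha_0, n)} \prod_{c \in C} \alpha_0 \, (|c| - 1)!$$
$\mathrm{AF}(\alpha_0, n) = \alpha_0 (\alpha_0 + 1) \cdots (\alpha_0 + n - 1)$ is the ascending factorial
Key: p(C) decomposes over clusters c
[Antoniak, 1974]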
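As a quick numerical check, the sketch below evaluates the general formula in log space and compares it with the step-by-step product for the example C = {{1}, {2, 3, 5}, {4}}. The function names are made up for this illustration.

```python
import math


def log_crp_prior(clustering, alpha0):
    """log p(C) = sum_c [log alpha0 + log (|c| - 1)!] - log AF(alpha0, n)."""
    n = sum(len(c) for c in clustering)
    log_af = sum(math.log(alpha0 + j) for j in range(n))  # ascending factorial
    # math.lgamma(k) equals log((k - 1)!) for a positive integer k.
    return sum(math.log(alpha0) + math.lgamma(len(c)) for c in clustering) - log_af


def sequential_example(alpha0):
    """Product of the five seating probabilities for C = {{1}, {2, 3, 5}, {4}}."""
    a = alpha0
    return (a / (0 + a)) * (a / (1 + a)) * (1 / (2 + a)) * (a / (3 + a)) * (2 / (4 + a))


example = [[1], [2, 3, 5], [4]]
closed_form = math.exp(log_crp_prior(example, 0.7))
step_by_step = sequential_example(0.7)
```

Both numbers agree because each cluster contributes one factor α0 (when it is opened) and the factors 1, 2, …, |c| − 1 (as it grows), while the denominators together form the ascending factorial.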
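DP mixture model via the CRP
Each cluster (table) c has a dish θ.
Data points (customers) generated i.i.d. given dish.
Assuming conjugacy, we can marginalize out θ.
(Graphical models: G → θ_i → x_i with a plate over n, and the collapsed model C → x.)
$$p(C) = \frac{1}{\mathrm{AF}(\alpha_0, n)} \prod_{c \in C} \alpha_0 \, (|c| - 1)!$$
$$p(x \mid C) = \prod_{c \in C} \underbrace{\int \prod_{i \in c} F(x_i; \theta) \, G_0(d\theta)}_{\stackrel{\text{def}}{=} \; p(x_c)}$$
Key: p(C) and p(x | C) decompose over clusters c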
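To show what marginalizing out θ looks like in a conjugate case, here is a small sketch that assumes binary data with F = Bernoulli(θ) and G0 = Beta(a, b). This model is chosen only because its integral has a simple closed form; the talk's experiments use Gaussian and multinomial models, and all function names here are hypothetical.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(xs, a=1.0, b=1.0):
    """log p(x_c) for Bernoulli data under a Beta(a, b) prior: B(a + ones, b + zeros) / B(a, b)."""
    ones = sum(xs)
    zeros = len(xs) - ones
    return log_beta(a + ones, b + zeros) - log_beta(a, b)


def log_likelihood(x, clustering, a=1.0, b=1.0):
    """log p(x | C): a sum of per-cluster marginal likelihoods."""
    return sum(log_marginal([x[i] for i in c], a, b) for c in clustering)


x = {1: 1, 2: 0, 3: 0, 4: 1, 5: 0}  # toy data indexed by point
value = log_likelihood(x, [[1], [2, 3, 5], [4]])
```

Because the result is a sum over clusters, changing one cluster only requires recomputing that cluster's term.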
Posterior inference
(Graphical model: C → x)
Goal: compute p(C | x)
• Exact inference: sum over exponential number of clusterings
• Collapsed Gibbs sampler: change C one assignment at a time
• Split-merge sampler: change C two clusters at a time
• Permutation-augmented sampler: can change all of C
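To get a feel for why exact inference is out of reach, the sketch below counts the clusterings of n points. That count is the Bell number B_n, computed here with the Bell triangle; the function name is an assumption for this example.

```python
def bell_numbers(n_max):
    """Return [B_0, ..., B_{n_max}], where B_n is the number of clusterings of n points."""
    row, bells = [1], [1]
    for _ in range(n_max):
        # Each row of the Bell triangle starts with the last entry of the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


counts = bell_numbers(10)
```

Already for 10 points there are 115975 clusterings, so summing over all of them is only feasible for very small n.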
Local optima
Collapsed Gibbs can get stuck in local optima
(Figure: clusterings linked by "one collapsed Gibbs move"; a further clustering is labeled "Hard to reach this state".)
Augmenting with a permutation
(Graphical model: C → x and C → π)
Sampler: alternate between sampling C and π
Why augment?
• Conditioned on π, can use dynamic programming to efficiently sample all of C
• If we sample in the augmented model, we can marginalize out (ignore) π to recover the original model
Example run:
{{1}, {2, 3, 5}, {4}}
  ↓ sample π | C
4 1 5 2 3
  ↓ sample C | π, x
{{4, 1}, {5}, {2, 3}}
  ↓ sample π | C
5 4 1 3 2
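Sampling the permutation
(Graphical model: C → x and C → π; this step samples from p(π | C, x))
What's p(π | C)?
Let Π(C) = permutations consistent with C (all clusters contiguous in permutation)
Example:
Clustering C = {{1, 3}, {2}}
Of the six permutations 132, 312, 213, 231, 123, 321, the consistent ones are 132, 312, 213, 231; 123 and 321 are not consistent because they separate 1 and 3.
p(π | C) = uniform over Π(C)
$$p(\pi \mid C) = \frac{1}{|C|! \, \prod_{c \in C} |c|!} \ \text{ if } \pi \in \Pi(C), \text{ else } 0.$$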
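A uniform draw from Π(C) can be produced by shuffling the order of the clusters and then shuffling the points inside each cluster, since every consistent permutation corresponds to exactly one such pair of choices. The sketch below illustrates this; the function names are assumptions for the example.

```python
import math
import random


def sample_permutation(clustering, rng):
    """Draw pi uniformly from the permutations that keep every cluster contiguous."""
    clusters = [list(c) for c in clustering]
    rng.shuffle(clusters)      # one of |C|! cluster orders
    for c in clusters:
        rng.shuffle(c)         # one of |c|! orders inside each cluster
    return [i for c in clusters for i in c]


def num_consistent_permutations(clustering):
    """|Pi(C)| = |C|! * prod_c |c|!."""
    return math.factorial(len(clustering)) * math.prod(
        math.factorial(len(c)) for c in clustering
    )


rng = random.Random(0)
pi = sample_permutation([[1, 3], [2]], rng)
count = num_consistent_permutations([[1, 3], [2]])  # 2! * (2! * 1!) = 4
```

For C = {{1, 3}, {2}} the count is 4, so each consistent permutation has probability 1/4.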
Sampling the clustering
(Graphical model: C → x and C → π; this step samples C)
$$p(C \mid \pi, x) \propto p(C) \, p(x \mid C) \, p(\pi \mid C)$$
Number of consistent clusterings C: $2^{n-1}$
Example:
Permutation π = 312
Consistent clusterings C:
{{3}, {1}, {2}}
{{3, 1}, {2}}
{{3}, {1, 2}}
{{3, 1, 2}}
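The count 2^(n−1) comes from the n − 1 gaps between adjacent positions of π: each gap either is a cluster boundary or is not. The following sketch enumerates the clusterings this way; the function name is made up for the illustration.

```python
from itertools import combinations


def consistent_clusterings(pi):
    """List every clustering whose clusters are contiguous blocks of the permutation pi."""
    n = len(pi)
    result = []
    for k in range(n):
        # Choose k of the n - 1 gaps between adjacent positions as cluster boundaries.
        for cuts in combinations(range(1, n), k):
            bounds = [0, *cuts, n]
            result.append([list(pi[a:b]) for a, b in zip(bounds, bounds[1:])])
    return result


clusterings = consistent_clusterings([3, 1, 2])
```

For π = 312 this returns the four clusterings listed on the slide.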
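Sampling the clustering
$$p(C \mid \pi, x) \propto p(C) \, p(x \mid C) \, p(\pi \mid C)$$
$$p(C) = \frac{1}{\mathrm{AF}(\alpha_0, n)} \prod_{c \in C} \alpha_0 \, (|c| - 1)!$$
$$p(x \mid C) = \prod_{c \in C} p(x_c)$$
$$p(\pi \mid C) = \frac{\mathbb{1}[\pi \in \Pi(C)]}{|C|! \, \prod_{c \in C} |c|!}$$
$$p(C, \pi, x) = \underbrace{\frac{\mathbb{1}[\pi \in \Pi(C)]}{\mathrm{AF}(\alpha_0, n) \, |C|!}}_{\stackrel{\text{def}}{=} \; A(|C|)} \ \prod_{c \in C} \underbrace{\frac{\alpha_0 \, p(x_c)}{|c|}}_{\stackrel{\text{def}}{=} \; B(c)}$$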
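DPDP
$$p(C, \pi, x) = A(|C|) \prod_{c \in C} B(c)$$
Goal:
$$p(\pi, x) = \sum_{K=1}^{n} A(K) \underbrace{\sum_{C \,:\, \pi \in \Pi(C), \, |C| = K} \ \prod_{c \in C} B(c)}_{\stackrel{\text{def}}{=} \; g(n, K)}$$
g(r, K) = sum over clusterings of 1 … r with K clusters
$$g(r, K) = \sum_{m=1}^{r} g(r - m, K - 1) \, B(\{\pi_{r-m+1}, \dots, \pi_r\})$$
(Figure: positions 1 … r of the permutation; the prefix 1 … r − m contributes g(r − m, K − 1), the last cluster contributes B({π_{r−m+1}, …, π_r}), and together they give g(r, K).)
Running time: $O(n^3)$, space: $O(n^2)$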
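The recurrence above can be sketched as a small table-filling routine, followed by backward sampling of C given π and x. Everything named here is an assumption for illustration: `toy_B` is a made-up stand-in for B(c) = α0 p(x_c) / |c|, the base case g(0, 0) = 1 (the empty product) is not stated on the slide, and `brute_force` exists only to check the table on a tiny input.

```python
import math
import random
from itertools import combinations


def toy_B(cluster, alpha0=1.0):
    """Hypothetical B(c): alpha0 * p(x_c) / |c| with a made-up p(x_c)."""
    return alpha0 * 0.5 ** (max(cluster) - min(cluster)) / len(cluster)


def dp_table(pi, B):
    """g[r][K]: sum over clusterings of the first r positions into K contiguous clusters."""
    n = len(pi)
    g = [[0.0] * (n + 1) for _ in range(n + 1)]
    g[0][0] = 1.0  # assumed base case: empty prefix, zero clusters
    for r in range(1, n + 1):
        for m in range(1, r + 1):          # m = size of the last cluster
            b = B(pi[r - m:r])
            for K in range(1, r + 1):
                g[r][K] += g[r - m][K - 1] * b
    return g


def sample_clustering(pi, B, rng):
    """Sample C from p(C | pi, x), which is proportional to A(|C|) * prod_c B(c)."""
    n = len(pi)
    g = dp_table(pi, B)
    ks = list(range(1, n + 1))
    # A(K) is proportional to 1/K!; the ascending factorial is a constant.
    K = rng.choices(ks, weights=[g[n][k] / math.factorial(k) for k in ks])[0]
    clusters, r = [], n
    while r > 0:
        sizes = list(range(1, r + 1))
        weights = [g[r - m][K - 1] * B(pi[r - m:r]) for m in sizes]
        m = rng.choices(sizes, weights=weights)[0]
        clusters.append(list(pi[r - m:r]))
        r, K = r - m, K - 1
    return clusters[::-1]


def brute_force(pi, B):
    """totals[K] = g(n, K), computed by enumerating all 2^(n-1) consistent clusterings."""
    n = len(pi)
    totals = [0.0] * (n + 1)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = [0, *cuts, n]
            totals[k + 1] += math.prod(B(pi[a:b]) for a, b in zip(bounds, bounds[1:]))
    return totals


pi = (3, 1, 2, 5, 4)
table = dp_table(pi, toy_B)
sampled = sample_clustering(pi, toy_B, random.Random(0))
```

The three nested loops give the O(n³) running time, assuming each B(c) can be evaluated in constant time (for example from cached sufficient statistics), and the table takes O(n²) space.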
Optimizations
Current running time: $O(n^3)$, space: $O(n^2)$
$$p(C, \pi, x) = A(|C|) \prod_{c \in C} B(c)$$
• Remove dependence on |C| to get MH proposal ⇒ $O(n^2)$ dynamic program
• Use a beam ⇒ $O(n)$ time
Final running time: empirically $O(n)$, space: $O(n)$
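One possible reading of the first bullet is sketched below: drop the A(|C|) factor, propose C with probability proportional to ∏ B(c) using a one-dimensional table, and correct with a Metropolis-Hastings step. This is an assumption about how the proposal might be built, not a description of the authors' implementation; `toy_B` and the other names are hypothetical, and the beam is not shown.

```python
import math
import random


def toy_B(cluster, alpha0=1.0):
    """Hypothetical B(c): alpha0 * p(x_c) / |c| with a made-up p(x_c)."""
    return alpha0 * 0.5 ** (max(cluster) - min(cluster)) / len(cluster)


def prefix_sums(pi, B):
    """h[r] = sum over all clusterings of the first r positions of prod_c B(c)."""
    n = len(pi)
    h = [1.0] + [0.0] * n
    for r in range(1, n + 1):
        h[r] = sum(h[r - m] * B(pi[r - m:r]) for m in range(1, r + 1))
    return h


def propose_clustering(pi, B, rng):
    """Sample C with probability proportional to prod_c B(c), ignoring A(|C|)."""
    h = prefix_sums(pi, B)
    clusters, r = [], len(pi)
    while r > 0:
        sizes = list(range(1, r + 1))
        weights = [h[r - m] * B(pi[r - m:r]) for m in sizes]
        m = rng.choices(sizes, weights=weights)[0]
        clusters.append(list(pi[r - m:r]))
        r -= m
    return clusters[::-1]


def acceptance_probability(num_old, num_new):
    """MH acceptance for this independence proposal: min(1, A(new) / A(old))."""
    # A(K) is proportional to 1/K!, so the ratio is old! / new!.
    return min(1.0, math.factorial(num_old) / math.factorial(num_new))


pi = (3, 1, 2)
proposal = propose_clustering(pi, toy_B, random.Random(0))
```

The table has n entries and each entry sums over at most n cluster sizes, which matches the O(n²) figure on the slide under the same constant-time assumption for B(c).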
Data-dependent permutations
(Graphical models: the original model with C → x and C → π, and a second model in which π also depends on x.)
Goal: use data x to guide permutation: place similar points near each other
Two possible p(π | C, x):
• Markov Gibbs scans
• Random projections
Random projections
How to sample from p(π | C, x):
• Choose a random direction u
• Project points onto u ⇒ induces permutation
• Note: keep clusters contiguous in permutation
(Figure: four points labeled 1, 2, 3, 4 projected onto a direction u.)
Permutation induced by projection u: 3 1 2 4
Computing p(π | C, x) is hard; ignore it ⇒ stochastic hill-climbing algorithm
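The three steps above might look roughly like the sketch below. The slide does not say how clusters are kept contiguous, so the rule used here (order clusters by their mean projection, then order points inside each cluster by their own projection) is an assumption, as are the function name and the Gaussian choice for the random direction.

```python
import random


def random_projection_permutation(points, clusters, rng):
    """Order points along a random direction u while keeping each cluster contiguous."""
    dim = len(next(iter(points.values())))
    u = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random direction
    proj = {i: sum(a * b for a, b in zip(x, u)) for i, x in points.items()}
    # Assumed tie to the current clustering: sort clusters by mean projection,
    # then sort the points inside each cluster by their own projection.
    ordered = sorted(clusters, key=lambda c: sum(proj[i] for i in c) / len(c))
    return [i for c in ordered for i in sorted(c, key=proj.get)]


points = {1: (0.0, 0.0), 2: (1.0, 0.2), 3: (-1.0, 0.5), 4: (2.0, 2.0)}
clusters = [[1, 3], [2, 4]]
perm = random_projection_permutation(points, clusters, random.Random(1))
```

Since p(π | C, x) is not evaluated, a sampler using such permutations is no longer an exact MCMC scheme, which is why the slide describes it as stochastic hill-climbing.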
Experimental setup
Interleave different moves to form hybrid samplers:
Gibbs: collapsed Gibbs [Escobar, West, 1995]
Gibbs+SplitMerge: with split-merge [Dahl, 2003]
Gibbs+Perm: with permutation (this paper)
Gibbs+SplitMerge+Perm: with all three moves
• Run on synthetic Gaussians and two real-world datasets
• Evaluate on log-probability of clustering
Synthetic Gaussians
Setup: generate mixture of Gaussians: 10,000 points, 10–80 dimensions, 20–160 true clusters
(Figure: log probability vs. seconds for Gibbs, Gibbs+SplitMerge, Gibbs+Perm, and Gibbs+SplitMerge+Perm. Panel (g): 40 dimensions, 40 true clusters. Panel (f): 160 true clusters, 40 dimensions.)
• Gibbs+Perm significantly outperforms Gibbs
• Gibbs+Perm outperforms Gibbs+SplitMerge, especially when there are many clusters
AP dataset
2246 points, 10,473 dimensions [multinomial model]
(Figure, panel (j) AP: log probability vs. seconds for Gibbs, Gibbs+SplitMerge, Gibbs+Perm, and Gibbs+SplitMerge+Perm.)
Gibbs+SplitMerge outperforms Gibbs+Perm
Gibbs+SplitMerge+Perm performs best
MNIST dataset
10,000 points, 50 dimensions (obtained via PCA on pixels) [Gaussian model]
(Figure, panel (i) MNIST: log probability vs. seconds for Gibbs, Gibbs+SplitMerge, Gibbs+Perm, and Gibbs+SplitMerge+Perm.)
Gibbs+Perm outperforms Gibbs+SplitMerge
Gibbs+SplitMerge+Perm performs best
Conclusions
• Inference algorithms for DP mixtures suffer from local minima when they make small moves
• Key idea: can use dynamic programming to sum over all clusterings consistent with a permutation
• Random projections yield an effective stochastic hill-climbing algorithm
What sampler should I use for my data?
• Gibbs is good at refining clusterings
• Split-merge is good when there are few clusters
• Permutation-augmented is good at changing many clusters at once
Combining all three often leads to best performance.