Applied Bayesian Nonparametrics — 1. Models & Inference
Tutorial at CVPR 2012, Erik Sudderth (Brown University)

TRANSCRIPT

Page 1:

Applied Bayesian Nonparametrics
1. Models & Inference

Tutorial at CVPR 2012
Erik Sudderth, Brown University

Additional detail & citations in background chapter:
E. B. Sudderth, Graphical Models for Visual Object Recognition and Tracking, PhD Thesis, MIT, 2006.

Page 2:

Applied Bayesian Nonparametric Statistics

• Learning probabilistic models of visual data: clustering & features, space & time, mostly unsupervised.
• Complex data motivates models of unbounded complexity. We often need to learn the structure of the model itself.
• Not "no parameters"! Models with infinitely many parameters: distributions on uncertain functions, distributions, …
• Focus on those models which are most useful in practice. To understand those models, we'll start with theory!

Page 3:

Applied BNP: Part I

• Review of parametric Bayesian models
  – Finite mixture models
  – Beta and Dirichlet distributions
• Canonical Bayesian nonparametric (BNP) model families
  – Dirichlet & Pitman-Yor processes for infinite clustering
  – Beta processes for infinite feature induction
• Key representations for BNP learning
  – Infinite-dimensional stochastic processes
  – Stick-breaking constructions
  – Partitions and Chinese restaurant processes
  – Infinite limits of finite, parametric Bayesian models
• Learning and inference algorithms
  – Representation and truncation of infinite models
  – MCMC methods and Gibbs samplers
  – Variational methods and mean field

Page 4:

Coffee Break

Page 5:

Applied BNP: Part II

Page 6:

Bayes Rule (Bayes Theorem)

θ — unknown parameters (many possible models)
D — observed data available for learning
p(θ) — prior distribution (domain knowledge)
p(D | θ) — likelihood function (measurement model)
p(θ | D) — posterior distribution (learned information)

p(θ, D) = p(θ) p(D | θ) = p(D) p(θ | D)

p(θ | D) = p(θ, D) / p(D) = p(D | θ) p(θ) / Σ_{θ′∈Θ} p(D | θ′) p(θ′) ∝ p(D | θ) p(θ)
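
Over a discrete parameter set, Bayes' rule is just a normalized elementwise product of prior and likelihood. The following Python sketch is illustrative only (the coin example and function names are not from the slides):

```python
import math

def posterior(prior, likelihood, data):
    """Discrete Bayes rule: p(theta | D) proportional to p(D | theta) p(theta).

    prior: dict mapping each candidate theta to p(theta)
    likelihood: function (data, theta) -> p(D | theta)
    """
    unnorm = {th: likelihood(data, th) * p for th, p in prior.items()}
    evidence = sum(unnorm.values())  # p(D) = sum over theta of p(D|theta) p(theta)
    return {th: v / evidence for th, v in unnorm.items()}

# Toy example: is a coin fair (theta = 0.5) or biased (theta = 0.9)?
def coin_lik(data, theta):
    heads, n = data
    return math.comb(n, heads) * theta**heads * (1 - theta)**(n - heads)

post = posterior({0.5: 0.5, 0.9: 0.5}, coin_lik, data=(9, 10))
```

After seeing 9 heads in 10 tosses, the posterior concentrates on the biased hypothesis.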

Page 7:

Gaussian Mixture Models

• Observed feature vectors: x_i ∈ R^d, i = 1, 2, …, N
• Hidden cluster labels: z_i ∈ {1, 2, …, K}, i = 1, 2, …, N
• Hidden mixture means: μ_k ∈ R^d, k = 1, 2, …, K
• Hidden mixture covariances: Σ_k ∈ R^{d×d}, k = 1, 2, …, K
• Hidden mixture probabilities: π_k ≥ 0, Σ_{k=1}^K π_k = 1
• Gaussian mixture marginal likelihood:

p(x_i | z_i, π, μ, Σ) = N(x_i | μ_{z_i}, Σ_{z_i})
p(x_i | π, μ, Σ) = Σ_{z_i=1}^K π_{z_i} N(x_i | μ_{z_i}, Σ_{z_i})
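
The generative process and marginal likelihood above can be sketched in plain Python for the 1D case. This is illustrative code, not from the tutorial; the parameter values are arbitrary:

```python
import math, random

def gauss_pdf(x, mu, sigma):
    """Univariate normal density N(x | mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, pi, mu, sigma):
    """Marginal likelihood p(x | pi, mu, Sigma): sum out the hidden label z."""
    return sum(pi[k] * gauss_pdf(x, mu[k], sigma[k]) for k in range(len(pi)))

def sample_mixture(n, pi, mu, sigma, seed=0):
    """Ancestral sampling: z_i ~ Cat(pi), then x_i ~ N(mu_z, sigma_z^2)."""
    rng = random.Random(seed)
    zs = rng.choices(range(len(pi)), weights=pi, k=n)
    return [rng.gauss(mu[z], sigma[z]) for z in zs]

pi, mu, sigma = [0.5, 0.3, 0.2], [-2.0, 0.0, 3.0], [0.5, 1.0, 0.7]
xs = sample_mixture(1000, pi, mu, sigma)
```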

Page 8:

Gaussian Mixture Models

[Figure: mixture of 3 Gaussian distributions in 2D; contour plot of the joint density, marginalizing cluster assignments.]

p(x_i | z_i, π, μ, Σ) = N(x_i | μ_{z_i}, Σ_{z_i})
p(x_i | π, μ, Σ) = Σ_{z_i=1}^K π_{z_i} N(x_i | μ_{z_i}, Σ_{z_i})

Page 9:

Gaussian Mixture Models

Surface Plot of Joint Density,

Marginalizing Cluster Assignments

Page 10:

Clustering with Gaussian Mixtures

C. Bishop, Pattern Recognition & Machine Learning

(a) Complete data, labeled by true cluster assignments
(b) Incomplete data: points to be clustered

Page 11:

Inference Given Cluster Parameters

(b) Incomplete data: points to be clustered
(c) Posterior probabilities of assignment to each cluster

r_ik = p(z_i = k | x_i, π, θ) = π_k p(x_i | θ_k) / Σ_{ℓ=1}^K π_ℓ p(x_i | θ_ℓ)
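
The responsibility formula r_ik is a normalization of the per-cluster weighted likelihoods. A 1D Python sketch (illustrative, with made-up parameter values):

```python
import math

def gauss_pdf(x, mu, sigma):
    """Univariate normal density N(x | mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def responsibilities(x, pi, mu, sigma):
    """r_ik = pi_k p(x | theta_k) / sum_l pi_l p(x | theta_l)."""
    unnorm = [pi[k] * gauss_pdf(x, mu[k], sigma[k]) for k in range(len(pi))]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# x = 0.1 is closer to the second cluster mean, so r[1] should dominate.
r = responsibilities(0.1, pi=[0.5, 0.5], mu=[-1.0, 1.0], sigma=[1.0, 1.0])
```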

Page 12:

Learning Binary Probabilities

Bernoulli Distribution: Single toss of a (possibly biased) coin
Ber(x | θ) = θ^I(x=1) (1 − θ)^I(x=0),  0 ≤ θ ≤ 1

• Suppose we observe N samples from a Bernoulli distribution with unknown mean:
X_i ∼ Ber(θ), i = 1, …, N
p(x_1, …, x_N | θ) = θ^N_1 (1 − θ)^N_0

• What is the maximum likelihood parameter estimate?
θ̂ = arg max_θ log p(x | θ) = N_1 / N

Page 13:

Beta Distributions

Probability density function, for x ∈ [0, 1]:
Beta(x | a, b) = (Γ(a + b) / (Γ(a) Γ(b))) x^(a−1) (1 − x)^(b−1)
where Γ(k) = (k − 1)! for integer k, and Γ(x + 1) = x Γ(x).

Page 14:

Beta Distributions

E[x] = a / (a + b)
V[x] = a b / ((a + b)^2 (a + b + 1))
Mode[x] = arg max_{x ∈ [0,1]} Beta(x | a, b) = (a − 1) / ((a − 1) + (b − 1))

Page 15:

Bayesian Learning of Probabilities

Bernoulli Likelihood: Single toss of a (possibly biased) coin
Ber(x | θ) = θ^I(x=1) (1 − θ)^I(x=0),  0 ≤ θ ≤ 1
p(x_1, …, x_N | θ) = θ^N_1 (1 − θ)^N_0

Beta Prior Distribution:
p(θ) = Beta(θ | a, b) ∝ θ^(a−1) (1 − θ)^(b−1)

Posterior Distribution:
p(θ | x) ∝ θ^(N_1+a−1) (1 − θ)^(N_0+b−1) ∝ Beta(θ | N_1 + a, N_0 + b)

• This is a conjugate prior, because the posterior is in the same family.
• Estimate by posterior mode (MAP) or mean (preferred).
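
Because the Beta prior is conjugate, the posterior update is pure counting. A minimal Python sketch (the function names are my own, not the tutorial's):

```python
def beta_posterior(a, b, xs):
    """Conjugate update: Beta(a, b) prior + Bernoulli data -> Beta(N1 + a, N0 + b)."""
    n1 = sum(xs)           # number of ones (heads)
    n0 = len(xs) - n1      # number of zeros (tails)
    return n1 + a, n0 + b

def beta_mean(a, b):
    """Posterior mean estimate E[theta] = a / (a + b)."""
    return a / (a + b)

def beta_mode(a, b):
    """MAP estimate (a - 1) / (a + b - 2); requires a, b > 1."""
    assert a > 1 and b > 1
    return (a - 1) / (a + b - 2)

# Beta(2, 2) prior, then observe 4 heads and 1 tail.
a_post, b_post = beta_posterior(2, 2, [1, 1, 1, 0, 1])
```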

Page 16:

Sequence of Beta Posteriors

[Figure: Beta posterior densities on [0, 1] after n = 5, 50, and 100 observations, concentrating around the true parameter as n grows. Murphy, 2012]

Page 17:

Multinomial Simplex

Page 18:

Learning Categorical Probabilities

Categorical Distribution: Single roll of a (possibly biased) die
Cat(x | θ) = Π_{k=1}^K θ_k^{x_k},  x ∈ X = {0, 1}^K with Σ_{k=1}^K x_k = 1

• If we have N_k observations of outcome k in N trials:
p(x_1, …, x_N | θ) = Π_{k=1}^K θ_k^{N_k}

• The maximum likelihood parameter estimates are then:
θ̂ = arg max_θ log p(x | θ),  θ̂_k = N_k / N

• Will this produce sensible predictions when K is large?
For nonparametric models we let K approach infinity!

Page 19:

Dirichlet Distributions

[Figure: Dirichlet probability densities on the simplex for α = 10.00 (concentrated near the mean) and α = 0.10 (concentrated near the corners).]

Moments: E[θ_k] = α_k / α_0, where α_0 = Σ_{j=1}^K α_j.
Marginal Distributions: each θ_k ∼ Beta(α_k, α_0 − α_k).

Page 20:

Dirichlet Probability Densities

Page 21:

Dirichlet Samples

Dir(θ | 0.1, 0.1, 0.1, 0.1, 0.1)   Dir(θ | 1.0, 1.0, 1.0, 1.0, 1.0)

Page 22:

Bayesian Learning of Probabilities

Categorical Distribution: Single roll of a (possibly biased) die
Cat(x | θ) = Π_{k=1}^K θ_k^{x_k},  x ∈ X = {0, 1}^K with Σ_{k=1}^K x_k = 1
p(x_1, …, x_N | θ) = Π_{k=1}^K θ_k^{N_k}

Dirichlet Prior Distribution:
p(θ) = Dir(θ | α) ∝ Π_{k=1}^K θ_k^{α_k − 1}

Posterior Distribution:
p(θ | x) ∝ Π_{k=1}^K θ_k^{N_k + α_k − 1} ∝ Dir(θ | N_1 + α_1, …, N_K + α_K)

• This is a conjugate prior, because the posterior is in the same family.
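
The Dirichlet-categorical update is likewise pure counting per outcome. A small Python sketch (illustrative, with arbitrary data):

```python
from collections import Counter

def dirichlet_posterior(alpha, xs, K):
    """Dir(alpha) prior + categorical data -> Dir(N_1 + alpha_1, ..., N_K + alpha_K)."""
    counts = Counter(xs)
    return [counts.get(k, 0) + alpha[k] for k in range(K)]

def posterior_mean(alpha_post):
    """E[theta_k | x] = (N_k + alpha_k) / (N + alpha_0)."""
    total = sum(alpha_post)
    return [a / total for a in alpha_post]

# Symmetric Dir(1, 1, 1) prior; observed outcomes of a 3-sided die.
alpha_post = dirichlet_posterior([1.0, 1.0, 1.0], xs=[0, 0, 1, 0, 2, 0], K=3)
mean = posterior_mean(alpha_post)
```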

Page 23:

Directed Graphical Models

The chain rule implies that any joint distribution equals:
p(x_1, …, x_T) = Π_{t=1}^T p(x_t | x_1, …, x_{t−1})

A directed graphical model implies a restricted factorization:
p(x_1, …, x_T) = Π_{t=1}^T p(x_t | x_{a(t)})
nodes → random variables
a(t) → parents with edges pointing to node t

Valid for any directed acyclic graph (DAG): equivalent to dropping conditional dependencies in the standard chain rule.

Page 24:

Plates: Learning with Priors

• Boxes, or plates, indicate replication of variables.
• Variables which are observed, or fixed, are often shaded.
• Prior distributions may themselves have hyperparameters λ.

Page 25:

Gaussian Mixture Models

[Figure: contour plot of the mixture density, as on page 8.]

p(x_i | z_i, π, μ, Σ) = N(x_i | μ_{z_i}, Σ_{z_i})
p(x_i | π, μ, Σ) = Σ_{z_i=1}^K π_{z_i} N(x_i | μ_{z_i}, Σ_{z_i})

Page 26:

Finite Bayesian Mixture Models

• Cluster frequencies: symmetric Dirichlet prior on π.
• Cluster shapes: any valid prior on the chosen family (e.g., Gaussian mean & covariance), θ_k ∼ H.
• Data: assign each data item to a cluster, and sample from that cluster's likelihood:
z_i ∼ Cat(π)
x_i ∼ F(θ_{z_i})

Page 27:

Generative Gaussian Mixture Samples

Learning is simplest with conjugate priors on cluster shapes:
• Gaussian with known variance: Gaussian prior on mean
• Gaussian with unknown mean & variance: normal-inverse-Wishart

Page 28:

Mixtures as Discrete Measures

Toy visualization: 1D Gaussian mixture with unknown cluster means and fixed variance.

• Define the mixture via a discrete probability measure on cluster parameters:
G(θ) = Σ_{k=1}^K π_k δ_{θ_k}(θ)
where δ_{θ_k} is an atom (point mass, Dirac delta) at θ_k.

• Generate data via repeated draws from G:
θ̄_i ∼ G, equivalently θ̄_i = θ_{z_i} with z_i ∼ Cat(π)

Page 29:

Mixtures Induce Partitions

• If our goal is clustering, the output grouping is defined by assignment indicator variables: z_i ∼ Cat(π)
• The number of ways of assigning N data points to K mixture components is K^N.
• If K ≥ N, this is much larger than the number of ways of partitioning that data:
N = 3: 5 partitions versus 3^3 = 27 assignments

Page 30:

Mixtures Induce Partitions

• If our goal is clustering, the output grouping is defined by assignment indicator variables: z_i ∼ Cat(π)
• The number of ways of assigning N data points to K mixture components is K^N.
• If K ≥ N, this is much larger than the number of ways of partitioning that data:
N = 5: 52 partitions versus 5^5 = 3125 assignments

For any clustering, there is a unique partition, but many ways to label that partition's blocks. (Illustration courtesy Wikipedia.)
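
The partition-versus-labeling counts quoted above can be checked mechanically. This sketch computes Bell numbers via the Bell triangle (a standard recurrence, not shown on the slides):

```python
def bell(n):
    """Bell number B_n: the number of partitions of n items, via the Bell triangle.

    Each row starts with the last entry of the previous row; each subsequent
    entry adds the entry above. The last entry of row n is B_n.
    """
    row = [1]
    for _ in range(n - 1):
        new = [row[-1]]
        for v in row:
            new.append(new[-1] + v)
        row = new
    return row[-1]

# N items into K labeled components: K**N labelings, but only bell(N) partitions.
print(bell(3), 3 ** 3)   # 5 partitions vs 27 labelings
print(bell(5), 5 ** 5)   # 52 partitions vs 3125 labelings
```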

Page 31:

Dirichlet Process Mixtures

The Dirichlet Process (DP): a distribution on countably infinite discrete probability measures. Sampling yields a Polya urn.
Stick-Breaking: an explicit construction for the weights in DP realizations.
Chinese Restaurant Process (CRP): the distribution on partitions induced by a DP prior.
Infinite Mixture Models: an infinite limit of finite mixtures with Dirichlet weight priors.

Page 32:

Dirichlet Process Mixtures

Chinese Restaurant Process (CRP): the distribution on partitions induced by a DP prior.

Page 33:

Nonparametric Clustering

• Large support: all partitions of the data, from one giant cluster to N singletons, have positive probability under the prior.
• Exchangeable: partition probabilities are invariant to permutations of the data.
• Desirable: good asymptotics, computational tractability, flexibility and ease of generalization.

[Figure: binary cluster-assignment matrix, observations (arbitrarily ordered) versus clusters (arbitrarily ordered). Ghahramani, BNP 2009]

Page 34:

Chinese Restaurant Process (CRP)

• Visualize clustering as a sequential process of customers sitting at tables in an (infinitely large) restaurant:
customers — observed data to be clustered
tables — distinct blocks of partition, or clusters

• The first customer sits at a table. Subsequent customers randomly select a table according to:
p(z_{N+1} = k | z_1, …, z_N) ∝ N_k for an occupied table k, and ∝ α for a new table k̄
K — number of tables occupied by the first N customers
N_k — number of customers seated at table k
k̄ — a new, previously unoccupied table
α — positive concentration parameter
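
The sequential seating rule is easy to simulate. An illustrative Python sketch (the weights follow the proportional rule described above, with α the concentration parameter):

```python
import random

def crp(n, alpha, seed=0):
    """Sequential CRP: customer i+1 joins table k w.p. N_k/(alpha + i),
    or a new table w.p. alpha/(alpha + i)."""
    rng = random.Random(seed)
    counts = []        # customers per table
    assignments = []
    for _ in range(n):
        weights = counts + [alpha]   # existing tables, then one new table
        z = rng.choices(range(len(weights)), weights=weights)[0]
        if z == len(counts):
            counts.append(1)         # open a new table
        else:
            counts[z] += 1
        assignments.append(z)
    return assignments, counts

zs, counts = crp(100, alpha=2.0)
```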

Page 35:

Chinese Restaurant Process (CRP)

Page 36:

CRPs & Exchangeable Partitions

• The probability of a seating arrangement of N customers is independent of the order they enter the restaurant:

p(z_1, …, z_N | α) = (Γ(α) / Γ(N + α)) α^K Π_{k=1}^K Γ(N_k)

where Γ(α)/Γ(N + α) collects the normalization constants 1/(1 + α) · 1/(2 + α) ⋯ 1/(N − 1 + α), α^K accounts for the first customer to sit at each table, and Γ(N_k) = 1 · 2 ⋯ (N_k − 1) = (N_k − 1)! counts the other customers joining each table.

• The CRP is thus a prior on infinitely exchangeable partitions.
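
Exchangeability can be verified numerically: two seating orders that induce the same table sizes receive identical probability, matching the closed-form expression above. A Python sketch (illustrative):

```python
import math

def eppf(counts, alpha):
    """p(z_1..z_N | alpha) = Gamma(alpha)/Gamma(N + alpha) * alpha^K * prod_k Gamma(N_k)."""
    n, k = sum(counts), len(counts)
    return (math.gamma(alpha) / math.gamma(n + alpha)
            * alpha ** k * math.prod(math.gamma(c) for c in counts))

def sequential_prob(zs, alpha):
    """Probability of this exact seating order under the sequential CRP rule."""
    p, counts = 1.0, {}
    for i, z in enumerate(zs):
        p *= counts[z] / (i + alpha) if z in counts else alpha / (i + alpha)
        counts[z] = counts.get(z, 0) + 1
    return p

# Two orders with the same table sizes {3, 1}:
p1 = sequential_prob([0, 0, 0, 1], alpha=1.5)
p2 = sequential_prob([0, 1, 1, 1], alpha=1.5)
```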

Page 37:

De Finetti's Theorem

• Finitely exchangeable random variables satisfy, for any permutation σ:
p(x_1, …, x_N) = p(x_σ(1), …, x_σ(N))

• A sequence is infinitely exchangeable if every finite subsequence is exchangeable.

• Exchangeable variables need not be independent, but always have a representation with conditional independencies:
p(x_1, …, x_N) = ∫ [Π_{i=1}^N p(x_i | G)] dP(G), for some distribution P over measures G

An explicit construction is useful in hierarchical modeling!

Page 38:

De Finetti’s Directed Graph

What distribution underlies the infinitely exchangeable CRP?

Page 39:

Dirichlet Process Mixtures

The Dirichlet Process (DP): a distribution on countably infinite discrete probability measures. Sampling yields a Polya urn.
Chinese Restaurant Process (CRP): the distribution on partitions induced by a DP prior.

Page 40:

Dirichlet Processes (Ferguson 1973)

• Given a base measure (distribution) H and concentration parameter α > 0, write G ∼ DP(α, H).
• Then for any finite partition (T_1, …, T_K) of the parameter space, the distribution of the measure of those cells is Dirichlet:
(G(T_1), …, G(T_K)) ∼ Dir(α H(T_1), …, α H(T_K))

Page 41:

Dirichlet Processes (Ferguson 1973)

• Marginalization properties of finite Dirichlet distributions satisfy Kolmogorov's extension theorem for stochastic processes: the finite-dimensional Dirichlet marginals above are mutually consistent under merging of cells, so the DP exists as a stochastic process.

Page 42:

DP Posteriors and Conjugacy

G ∼ DP(α, H),  θ̄_i ∼ G, i = 1, …, N

• Does the posterior distribution of G have a tractable form? For any partition, the posterior mean given N observations is available in closed form.
• In fact, the posterior distribution is another Dirichlet process, with mean that depends on the data's empirical distribution:
G | θ̄_1, …, θ̄_N ∼ DP(α + N, (α H + Σ_{i=1}^N δ_{θ̄_i}) / (α + N))

Page 43:

DPs and Polya Urns

G ∼ DP(α, H),  θ̄_i ∼ G, i = 1, …, N

• Can we simulate observations without constructing G?
• Yes, by a variation on the classical balls-in-urns analogy:
– Consider an urn containing α pounds of very tiny, colored sand (the space of possible colors is Θ).
– Take out one grain of sand; record its color as θ̄_1.
– Put that grain back, and add 1 extra pound of that color.
– Repeat this process!
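
The sand-urn scheme above is the Polya urn predictive rule: draw a fresh color from H with probability α/(α + i), otherwise repeat a past color with probability proportional to its accumulated mass. A Python sketch, using a standard-normal base measure as an arbitrary illustrative choice:

```python
import random

def polya_urn(n, alpha, base_sampler, seed=0):
    """Draw theta_1..theta_n marginally from G ~ DP(alpha, H) without building G."""
    rng = random.Random(seed)
    draws = []
    for i in range(n):
        if rng.random() < alpha / (alpha + i):
            draws.append(base_sampler(rng))   # new color from the base measure H
        else:
            draws.append(rng.choice(draws))   # reuse: proportional to past counts
    return draws

# Base measure H = N(0, 1); any distribution would do.
thetas = polya_urn(200, alpha=1.0, base_sampler=lambda r: r.gauss(0, 1))
```

Note the draws are discrete in effect: values repeat, and a handful of distinct "colors" dominate.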

Page 44:

Dirichlet Process Mixtures

The Dirichlet Process (DP): a distribution on countably infinite discrete probability measures. Sampling yields a Polya urn.
Chinese Restaurant Process (CRP): the distribution on partitions induced by a DP prior.
Stick-Breaking: an explicit construction for the weights in DP realizations.

Page 45:

A Stick-Breaking Construction (Sethuraman 1994)

• Dirichlet process realizations are discrete with probability one:
G ∼ DP(α, H)  ⇒  G(θ) = Σ_{k=1}^∞ π_k δ_{θ_k}(θ)

• Cluster shape parameters are drawn from the base measure: θ_k ∼ H
• Cluster weights are drawn from a stick-breaking process with concentration parameter α:
β_k ∼ Beta(1, α),  π_k = β_k Π_{ℓ=1}^{k−1} (1 − β_ℓ)
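
A truncated version of this construction is the usual computational representation. A Python sketch (the truncation level and α are arbitrary; the leftover stick mass is lumped into the final weight):

```python
import random

def stick_breaking(alpha, truncation, seed=0):
    """Truncated stick-breaking: beta_k ~ Beta(1, alpha),
    pi_k = beta_k * prod_{l<k} (1 - beta_l); remaining mass goes to the last atom."""
    rng = random.Random(seed)
    pis, remaining = [], 1.0
    for _ in range(truncation - 1):
        beta = rng.betavariate(1.0, alpha)
        pis.append(beta * remaining)
        remaining *= 1.0 - beta
    pis.append(remaining)   # truncation: lump leftover stick into the final weight
    return pis

pi = stick_breaking(alpha=2.0, truncation=50)
```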

Page 46:

Dirichlet Stick-Breaking

β_k ∼ Beta(1, α),  E[β_k] = 1 / (1 + α)

Page 47:

DPs and Stick Breaking

Page 48:

Dirichlet Process Mixtures

The Dirichlet Process (DP): a distribution on countably infinite discrete probability measures. Sampling yields a Polya urn.
Chinese Restaurant Process (CRP): the distribution on partitions induced by a DP prior.
Stick-Breaking: an explicit construction for the weights in DP realizations.
Infinite Mixture Models: an infinite limit of finite mixtures with Dirichlet weight priors.

Page 49:

DP Mixture Models

Random measure view:
G ∼ DP(α, H),  G(θ) = Σ_{k=1}^∞ π_k δ_{θ_k}(θ)
θ̄_i ∼ G,  x_i ∼ F(θ̄_i)

Stick-breaking view:
π ∼ Stick(α),  θ_k ∼ H
z_i ∼ Cat(π),  x_i ∼ F(θ_{z_i})

• Stick-breaking: explicit size-biased sampling of weights
• Chinese restaurant process: indicator sequence z_1, z_2, z_3, …
• Polya urn: corresponding parameter sequence θ̄_1, θ̄_2, θ̄_3, …
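
Putting the pieces together, data can be drawn from a DP Gaussian mixture using the CRP indicator sequence. A Python sketch (the base measure N(0, 3²) and the observation noise 0.5 are arbitrary illustrative choices):

```python
import random

def dp_mixture_sample(n, alpha, seed=1):
    """Draw n points from a 1D DP Gaussian mixture via the CRP representation:
    cluster means theta_k ~ H = N(0, 3^2); observations x_i ~ N(theta_{z_i}, 0.5^2)."""
    rng = random.Random(seed)
    counts, means, xs, zs = [], [], [], []
    for _ in range(n):
        weights = counts + [alpha]
        z = rng.choices(range(len(weights)), weights=weights)[0]
        if z == len(counts):              # new cluster: draw its parameters from H
            counts.append(1)
            means.append(rng.gauss(0, 3))
        else:
            counts[z] += 1
        zs.append(z)
        xs.append(rng.gauss(means[z], 0.5))
    return xs, zs, means

xs, zs, means = dp_mixture_sample(300, alpha=1.0)
```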

Page 50:

Samples from DP Mixture Priors

N=50 

Page 51:

Samples from DP Mixture Priors

N=200 

Page 52:

Samples from DP Mixture Priors

N=1000 

Page 53:

Finite versus DP Mixtures

Finite Mixture: G_K(θ) = Σ_{k=1}^K π_k δ_{θ_k}(θ), with θ_k ∼ H
DP Mixture: G ∼ DP(α, H), with π ∼ Stick(α)
In both cases: z_i ∼ Cat(π), x_i ∼ F(θ_{z_i})

THEOREM: For any measurable function f, as K → ∞ the finite mixture with symmetric Dirichlet weights converges to the DP: ∫ f(θ) dG_K(θ) converges in distribution to ∫ f(θ) dG(θ), where G ∼ DP(α, H).

Page 54:

Finite versus CRP Partitions

Finite Mixture: π ∼ Dir(α/K, …, α/K), z_i ∼ Cat(π)
DP Mixture: π ∼ Stick(α), z_i ∼ Cat(π)

Finite Dirichlet:
p(z_1, …, z_N | α) = (Γ(α) / Γ(N + α)) (α/K)^{K⁺} Π_{k=1}^{K⁺} Π_{j=1}^{N_k − 1} (j + α/K)

Chinese Restaurant Process:
p(z_1, …, z_N | α) = (Γ(α) / Γ(N + α)) α^{K⁺} Π_{k=1}^{K⁺} (N_k − 1)!

K⁺ — number of occupied clusters (blocks of the partition)

• The probability of any particular Dirichlet indicator sequence approaches zero as K → ∞.
• The probability of the induced Dirichlet partition approaches the CRP as K → ∞.

Page 55:

Dirichlet Process Mixtures

The Dirichlet Process (DP): a distribution on countably infinite discrete probability measures. Sampling yields a Polya urn.
Stick-Breaking: an explicit construction for the weights in DP realizations.
Chinese Restaurant Process (CRP): the distribution on partitions induced by a DP prior.
Infinite Mixture Models: an infinite limit of finite mixtures with Dirichlet weight priors.

Page 56:

Pitman-Yor Process Mixtures

Stick-Breaking: an explicit construction for the weights in PY realizations.
Chinese Restaurant Process (CRP): the distribution on partitions induced by a PY prior.
Infinite Mixture Models: but not an infinite limit of finite mixtures with symmetric weight priors.

Page 57:

Pitman-Yor Processes

The Pitman-Yor process defines a distribution on infinite discrete measures, or partitions, via stick-breaking with discount parameter a and concentration parameter b:
β_k ∼ Beta(1 − a, b + k a),  π_k = β_k Π_{ℓ=1}^{k−1} (1 − β_ℓ)

Dirichlet process: the special case a = 0, b = α.

Page 58:

Dirichlet Stick-Breaking

All stick indices k share the same proportion distribution: β_k ∼ Beta(1, α).

Page 59:

Pitman-Yor Stick-Breaking

Page 60:

Chinese Restaurant Process (CRP)

customers — observed data to be clustered
tables — distinct blocks of partition, or clusters

• Partitions sampled from the PY process can be generated via a generalized CRP, which remains exchangeable.
• The first customer sits at a table. Subsequent customers randomly select a table according to:

p(z_{N+1} = z | z_1, …, z_N) = (1 / (b + N)) [ Σ_{k=1}^K (N_k − a) δ(z, k) + (b + K a) δ(z, k̄) ]

K — number of tables occupied by the first N customers
N_k — number of customers seated at table k
k̄ — a new, previously unoccupied table
0 ≤ a < 1, b > −a — discount & concentration parameters
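
This predictive rule is directly simulable. The sketch below (illustrative parameter values) shows that a positive discount a typically yields many more occupied tables than the DP special case a = 0, reflecting the power-law growth discussed on the following slides:

```python
import random

def pitman_yor_crp(n, a, b, seed=0):
    """Generalized CRP: join table k w.p. (N_k - a)/(b + i); new table w.p. (b + K*a)/(b + i)."""
    assert 0 <= a < 1 and b > -a
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        weights = [c - a for c in counts] + [b + len(counts) * a]
        z = rng.choices(range(len(weights)), weights=weights)[0]
        if z == len(counts):
            counts.append(1)
        else:
            counts[z] += 1
    return counts

py_counts = pitman_yor_crp(2000, a=0.5, b=1.0)   # Pitman-Yor
dp_counts = pitman_yor_crp(2000, a=0.0, b=1.0)   # DP special case
```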

Page 61:

Human Image Segmentations

Labels for more than 29,000 segments in 2,688 images of natural scenes

Page 62:

Statistics of Human Segments

How many objects are in this image? Some scenes contain a few large objects; others contain many small objects. Object sizes follow a power law.

Page 63:

Statistics of Semantic Labels

How frequent are text region labels? A few labels (sky, trees, person) are very common, while most (lichen, rainbow, wheelbarrow, waterfall, …) are rare.

Page 64:

Why Pitman-Yor? (Jim Pitman, Marc Yor)

Generalizing the Dirichlet Process
– The distribution on partitions leads to a generalized Chinese restaurant process.
– Special cases of interest in probability: Markov chains, Brownian motion, …

Power Law Distributions (DP versus PY)
– Heaps' Law: number of unique clusters in N observations.
– Zipf's Law: size of the k-th largest sorted cluster weight.

Natural Language Statistics: Goldwater, Griffiths, & Johnson, 2005; Teh, 2006.

Page 65:

An Aside: Toy Dataset Bias

[Figure: six 2D toy clustering datasets with spectral clustering results; the panel titles are unrecoverable from the extraction. Ng, Jordan, & Weiss, NIPS 2001]

Page 66: BNP1models Cvpr2012 Applied Bayesian Nonparametrics

7/31/2019 BNP1models Cvpr2012 Applied Bayesian Nonparametrics

http://slidepdf.com/reader/full/bnp1models-cvpr2012-applied-bayesian-nonparametrics 66/98

Pitman-Yor Process Mixtures

Infinite Mixture Models
But not an infinite limit of finite mixtures with symmetric weight priors

Stick-Breaking
An explicit construction for the weights in PY realizations

Chinese Restaurant Process (CRP)
The distribution on partitions induced by a PY prior

Dirichlet processes and finite Dirichlet distributions do not produce heavy-tailed, power law distributions

Page 67: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Latent Feature Models

•! Latent feature modeling: Each group of observations is associated with a subset of the possible latent features
•! Factorial power: There are 2^K combinations of K features, while accurate mixture modeling may require many more clusters

•! Question: What is the analog of the DP for feature modeling?

Binary matrix 

indicating feature

 presence/absence

Depending on application, features

can be associated with any 

 parameter value of interest 

Page 68: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Nonparametric Binary Features

The Beta Process (BP)
A Lévy process whose realizations are countably infinite collections of atoms, with mass between 0 and 1.

Infinite Feature Models
As an infinite limit of a finite beta-Bernoulli binary feature model

Stick-Breaking
An explicit construction for the feature frequencies in BP realizations

Indian Buffet Process (IBP)
The distribution on sparse binary matrices induced by a BP

Page 69: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Poisson Distribution for Counts

X = {0, 1, 2, 3, ...},    θ > 0

Poi(x | θ) = e^{−θ} θ^x / x!

Page 70: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Indian Buffet Process (IBP)
•! Visualize feature assignment as a sequential process of customers sampling dishes from an (infinitely long) buffet:
   customers → observed data to be modeled
   dishes → binary features to be selected
•! The first customer chooses Poisson(α) dishes, where α > 0
•! Subsequent customer i randomly samples each previously tasted dish k with probability m_k / i, i.e. f_ik ~ Ber(m_k / i), where m_k is the number of previous customers to sample dish k
•! That customer also samples Poisson(α / i) new dishes
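The buffet metaphor above translates directly into a sampler. A minimal stdlib-only sketch (the function names are mine; a simple inverse-CDF Poisson sampler is included so no external libraries are needed):

```python
import math
import random

def sample_ibp(N, alpha, seed=0):
    """Sample an N-row binary feature matrix from the Indian buffet process:
    customer 1 takes Poisson(alpha) dishes; customer i takes previously
    tasted dish k with probability m_k / i, plus Poisson(alpha / i) new ones."""
    rng = random.Random(seed)

    def poisson(lam):                 # inverse-CDF Poisson sampler
        u, k = rng.random(), 0
        p = math.exp(-lam)
        cdf = p
        while u > cdf:
            k += 1
            p *= lam / k
            cdf += p
        return k

    dishes = []                        # dishes[k] = set of customers who took k
    for i in range(1, N + 1):
        for takers in dishes:          # previously introduced dishes
            if rng.random() < len(takers) / i:
                takers.add(i)
        for _ in range(poisson(alpha / i)):
            dishes.append({i})         # brand-new dishes for customer i
    return [[1 if i in takers else 0 for takers in dishes]
            for i in range(1, N + 1)]

Z = sample_ibp(20, alpha=3.0, seed=3)
```

The total number of columns in Z concentrates around α·H_N = α Σ_i 1/i, consistent with the O(α log N) feature growth discussed on the next slide.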

Page 71: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Binary Feature Realizations

•! IBP is exchangeable, up to a permutation of the order with which dishes are listed in the binary feature matrix

•! Clustering models like the DP have one “feature” per customer 

•! The number of features sampled at least once is O(α log N)

Ghahramani, BNP 2009

Page 72: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Finite Beta-Bernoulli Features

π_k ~ Beta(α/K, 1),    z_ik ~ Ber(π_k)

•! The expected number of active features among N customers is N α / (1 + α/K) → N α as K → ∞
•! The marginal probability of the realized binary matrix equals
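A quick Monte Carlo check of the expectation above, under the stated beta-Bernoulli model (function name and the particular constants are mine, for illustration only):

```python
import random

def mean_active_features(N, K, alpha, trials=1000, seed=4):
    """Monte Carlo estimate of the expected number of ones in the N x K
    beta-Bernoulli matrix: pi_k ~ Beta(alpha/K, 1), z_ik ~ Ber(pi_k)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        for _ in range(K):
            p = rng.betavariate(alpha / K, 1.0)
            total += sum(rng.random() < p for _ in range(N))
    return total / trials

est = mean_active_features(N=10, K=50, alpha=2.0)
theory = 10 * 2.0 / (1 + 2.0 / 50)   # N * alpha / (1 + alpha/K)
```

Since E[π_k] = (α/K) / (α/K + 1), summing over the N·K entries gives exactly the N α / (1 + α/K) formula above.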

Page 73: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Beta-Bernoulli and the IBP
•! We can show that the limit of the finite beta-Bernoulli model, and the IBP, produce the same distribution on left-ordered-form equivalence classes of binary matrices
•! The Poisson distribution in the IBP arises from the law of rare events:
   •! Flip K coins, each with probability α/K of coming up heads
   •! As K → ∞, the distribution of the number of total heads approaches Poisson(α)


Page 74: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Nonparametric Binary Features

The Beta Process (BP)
A Lévy process whose realizations are countably infinite collections of atoms, with mass between 0 and 1.

Infinite Feature Models
As an infinite limit of a finite beta-Bernoulli binary feature model

Stick-Breaking
An explicit construction for the feature frequencies in BP realizations

Indian Buffet Process (IBP)
The distribution on sparse binary matrices induced by a BP

Extensions: Additional control over feature sharing, power laws, …


Page 75: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Nonparametric Learning

Finite Bayesian Models
Set finite model order to be larger than the expected number of clusters or features.

Stick-Breaking
Truncate stick-breaking to produce a provably accurate approximation.

CRP & IBP
Tractably learn via finite summaries of the true, infinite model.

Infinite Stochastic Processes
Conceptually useful, but usually impractical or impossible for learning algorithms.

Page 76: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Markov Chain Monte Carlo

•! At each time point, the state z^(t) is a configuration of all the variables in the model: parameters, hidden variables, etc.
•! New states are drawn from a transition distribution: z^(t+1) ~ q(z | z^(t))
•! We design q(z | z^(t)) so that the chain is irreducible and ergodic, with a unique stationary distribution p*(z) satisfying
   p*(z) = ∫ q(z | z') p*(z') dz'
•! For learning, the target equilibrium distribution is usually the posterior distribution given data x: p*(z) = p(z | x)
•! Popular recipes: Metropolis-Hastings and Gibbs samplers
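As one concrete instance of the Metropolis-Hastings recipe mentioned above, here is a random-walk sampler for a one-dimensional target. This is a toy sketch (function name, proposal scale, and the standard-normal "posterior" are my illustrative choices):

```python
import math
import random

def metropolis_hastings(log_target, x0, steps, prop_std=1.0, seed=5):
    """Random-walk Metropolis: with a symmetric Gaussian proposal the
    Hastings correction cancels, so we accept with prob min(1, p*(y)/p*(x))."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    chain = []
    for _ in range(steps):
        y = x + rng.gauss(0.0, prop_std)
        ly = log_target(y)
        if rng.random() < math.exp(min(0.0, ly - lp)):  # accept or keep state
            x, lp = y, ly
        chain.append(x)
    return chain

# Toy target "posterior": p*(z) proportional to exp(-z^2 / 2)
chain = metropolis_hastings(lambda z: -0.5 * z * z, x0=5.0, steps=20000, seed=5)
mean = sum(chain[5000:]) / len(chain[5000:])   # discard burn-in
```

Even starting far from the mode (x0 = 5), the chain mixes into the stationary distribution and the post-burn-in mean approaches the target mean of zero.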

Page 77: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Gibbs Sampler for a 2D Gaussian

C. Bishop, Pattern Recognition & Machine Learning 

General Gibbs Sampler
   z_i^(t) ~ p(z_i | z^(t−1)_{\i})   for the chosen index i = i(t)
   z_j^(t) = z_j^(t−1)   for all j ≠ i(t)

Under mild conditions, converges assuming all variables are resampled infinitely often (order can be fixed or random)
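The bivariate-Gaussian example can be coded directly, since both full conditionals are univariate Gaussians. A toy illustration (assuming zero means, unit variances, and correlation ρ; the function name is mine):

```python
import random

def gibbs_2d_gaussian(rho, steps, seed=6):
    """Gibbs sampler for a zero-mean bivariate Gaussian with unit variances
    and correlation rho; each full conditional is N(rho * other, 1 - rho^2)."""
    rng = random.Random(seed)
    z1 = z2 = 0.0
    sd = (1.0 - rho * rho) ** 0.5
    out = []
    for _ in range(steps):
        z1 = rng.gauss(rho * z2, sd)   # resample z1 | z2
        z2 = rng.gauss(rho * z1, sd)   # resample z2 | z1
        out.append((z1, z2))
    return out

samples = gibbs_2d_gaussian(rho=0.9, steps=20000, seed=6)
corr_est = sum(a * b for a, b in samples) / len(samples)  # should approach rho
```

As in Bishop's figure, high correlation forces the sampler into small axis-aligned steps, so mixing slows as ρ → 1 even though the stationary distribution is correct.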

Page 78: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Finite Mixture Gibbs Sampler 

Most basic approach: alternately sample the assignments z, mixture weights π, and cluster parameters θ
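A minimal sketch of this three-step sweep, for a 1D Gaussian mixture with known noise variance and a conjugate Normal prior on the means. All modeling choices and names here are my illustrative assumptions, not the tutorial's exact model:

```python
import math
import random

def gibbs_gmm_1d(x, K, alpha=1.0, prior_var=100.0, noise_var=1.0,
                 iters=100, seed=7):
    """Standard (uncollapsed) Gibbs for a K-component 1D Gaussian mixture:
    alternate  z | pi, mu  ->  pi | z  ->  mu | z, x."""
    rng = random.Random(seed)
    mu = [rng.gauss(0.0, 1.0) for _ in range(K)]
    pi = [1.0 / K] * K
    z = [0] * len(x)
    for _ in range(iters):
        for i, xi in enumerate(x):                 # 1. assignments z_i
            w = [pi[k] * math.exp(-0.5 * (xi - mu[k]) ** 2 / noise_var)
                 for k in range(K)]
            r, s = rng.random() * sum(w), 0.0
            for k in range(K):
                s += w[k]
                if r < s:
                    z[i] = k
                    break
        g = [rng.gammavariate(alpha + z.count(k), 1.0) for k in range(K)]
        pi = [gk / sum(g) for gk in g]             # 2. pi ~ Dirichlet(alpha + counts)
        for k in range(K):                         # 3. conjugate Normal posterior
            xs = [xi for xi, zi in zip(x, z) if zi == k]
            prec = 1.0 / prior_var + len(xs) / noise_var
            mu[k] = rng.gauss(sum(xs) / noise_var / prec, prec ** -0.5)
    return sorted(mu)

rng = random.Random(0)
data = ([rng.gauss(-5.0, 1.0) for _ in range(30)] +
        [rng.gauss(5.0, 1.0) for _ in range(30)])
mu = gibbs_gmm_1d(data, K=2)
```

On well-separated data like this, the sampler quickly locks the two component means onto the two clusters.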

Page 79: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Standard Finite Mixture Sampler 

Page 80: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Standard Sampler: 2 Iterations

Page 81: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Standard Sampler: 10 Iterations

Page 82: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Standard Sampler: 50 Iterations

Page 83: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Collapsed Finite Bayesian Mixture

•! Conjugate priors allow analytic integration of some parameters
•! Resulting sampler operates on a reduced space of cluster assignments (implicitly considers all possible cluster shapes)

Page 84: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Collapsed Finite Mixture Sampler 

Page 85: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Standard versus Collapsed Samplers

Page 86: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


DP Mixture Models

Page 87: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Collapsed DP Mixture Sampler 

Page 88: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Collapsed DP Sampler: 2 Iterations


Page 89: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Standard Sampler: 10 Iterations


Page 90: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Standard Sampler: 50 Iterations


Page 91: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


DP versus Finite Mixture Samplers


Page 92: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


DP Posterior Number of Clusters


Page 93: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Bayesian Ockham’s Razor 

William of Ockham

   Plurality must never be posited without necessity.    

Notation: D = observed data, m = model, θ = model parameters.

Even with a uniform prior p(m), the marginal likelihood provides a model selection bias


Page 94: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Example: Is this coin fair?
M0: Tosses are from a fair coin: θ = 1/2
M1: Tosses are from a coin of unknown bias: θ ~ Unif(0, 1)

Marginal likelihoods of M0 and M1, as a function of the number of heads in N = 5 tosses:
•! ML: Always prefer M1
•! Bayes: Unbalanced counts are much more likely with a biased coin, so favor M1
•! Bayes: Balanced counts only happen with some biased coins, so favor M0
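The N = 5 comparison can be computed exactly: under M1, integrating the binomial likelihood against the uniform prior gives a Beta integral equal to 1/(N + 1), regardless of the observed counts. A sketch (function names mine):

```python
from math import comb

def marginal_fair(h, N):
    """p(D | M0): theta fixed at 1/2, for h heads in N tosses."""
    return comb(N, h) * 0.5 ** N

def marginal_uniform(h, N):
    """p(D | M1): integrating C(N,h) theta^h (1-theta)^(N-h) over
    theta ~ Unif(0,1) gives C(N,h) * B(h+1, N-h+1) = 1 / (N + 1)."""
    return 1.0 / (N + 1)

N = 5
m1 = marginal_uniform(3, N)          # 1/6, independent of the counts
m0_balanced = marginal_fair(3, N)    # 10/32: balanced counts favor M0
m0_allheads = marginal_fair(5, N)    # 1/32: all heads favors M1
```

This makes the Ockham effect concrete: the fair-coin model concentrates its probability mass on balanced outcomes, so it wins there and loses on skewed ones.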


Page 95: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Variational Approximations

•! Derivation: multiply by one, q(x)/q(x), inside the marginalization, then apply Jensen's inequality to obtain a lower bound on the log marginal likelihood
•! Minimizing the KL divergence maximizes a likelihood bound
•! Variational EM algorithms, which maximize over q(x) within some tractable family, retain BNP model selection behavior


Page 96: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Mean Field for DP Mixtures

•! Truncate stick-breaking at some upper bound K on the true number of occupied clusters: stick-break for k = 1, ..., K − 1, and set
   π_K = 1 − Σ_{k=1}^{K−1} π_k
•! Priors encourage assigning data to fewer than K clusters

Blei & Jordan, 2006
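The truncation above can be sketched as follows: draw the first K − 1 stick-breaking proportions v_k ~ Beta(1, α) and assign the leftover stick mass to π_K, so the K weights sum exactly to one (helper name mine):

```python
import random

def truncated_gem(alpha, K, seed=8):
    """Truncated DP stick-breaking: v_k ~ Beta(1, alpha) for k < K, with the
    final weight set to the leftover stick so the K weights sum to one."""
    rng = random.Random(seed)
    pi, remaining = [], 1.0
    for _ in range(K - 1):
        v = rng.betavariate(1.0, alpha)
        pi.append(remaining * v)
        remaining *= 1.0 - v
    pi.append(remaining)     # pi_K = 1 - sum of the first K-1 weights
    return pi

pi = truncated_gem(alpha=2.0, K=20, seed=8)
```

Since the leftover mass shrinks geometrically in expectation, modest K already yields a provably accurate approximation to the infinite model.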


Page 97: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


MCMC & Variational Learning

Finite Bayesian Models
Set finite model order to be larger than the expected number of clusters or features.

Stick-Breaking
Truncate stick-breaking to produce a provably accurate approximation.

CRP & IBP
Tractably learn via finite summaries of the true, infinite model.

Infinite Stochastic Processes
Conceptually useful, but usually impractical or impossible for learning algorithms.


Page 98: BNP1models Cvpr2012 Applied Bayesian Nonparametrics


Applied BNP: Part II