
University of Joensuu, Dept. of Computer Science
P.O. Box 111, FIN-80101 Joensuu
Tel. +358 13 251 7959, fax +358 13 251 7955
www.cs.joensuu.fi

Gaussian Mixture Models

Speech and Image Processing Unit, Department of Computer Science

University of Joensuu, FINLAND

Ville Hautamäki

Clustering Methods: Part 8

Preliminaries

• We assume that the dataset X has been generated by a parametric distribution p(X).

• Estimation of the parameters of p is known as density estimation.

• We consider the Gaussian distribution.

Figures taken from: http://research.microsoft.com/~cmbishop/PRML/

Typical parameters (1)

• Mean (μ): the average value of p(X), also called the expectation.

• Variance (σ²): a measure of variability in p(X) around the mean.

Typical parameters (2)

• Covariance: measures how much two variables vary together.

• Covariance matrix: collection of covariances between all pairs of dimensions.

– The diagonal of the covariance matrix contains the variances of the individual attributes.

One-dimensional Gaussian

• Parameters to be estimated are the mean (μ) and the variance (σ²):

Normal(x | μ, σ²) = 1/√(2πσ²) · exp( −(x − μ)² / (2σ²) )
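As a small numeric sketch (not part of the original slides; the function name normal_pdf is our own), the one-dimensional density can be evaluated directly:

```python
import math

def normal_pdf(x, mu, var):
    """Density of Normal(x | mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# At the mean, the density peaks at 1/sqrt(2*pi*var).
print(normal_pdf(0.0, 0.0, 1.0))  # ≈ 0.3989
```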

Multivariate Gaussian (1)

• In the multivariate case we have a covariance matrix Σ instead of a variance:

Normal(x | μ, Σ) = 1/( (2π)^{D/2} · det(Σ)^{1/2} ) · exp( −(1/2) (x − μ)^T Σ⁻¹ (x − μ) )
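A NumPy sketch of the multivariate density above (illustrative; mvn_pdf is our own name):

```python
import numpy as np

def mvn_pdf(x, mu, cov):
    """Density of a D-dimensional Normal(x | mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(cov))
    expo = -0.5 * diff @ np.linalg.solve(cov, diff)  # (x-mu)^T Sigma^-1 (x-mu)
    return np.exp(expo) / norm

# At the mean with Sigma = I in 2-D, the density is 1/(2*pi).
print(mvn_pdf(np.zeros(2), np.zeros(2), np.eye(2)))  # ≈ 0.1592
```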

Multivariate Gaussian (2)

Three common covariance structures (2-D example):

Single:   Σ = σ² I
Diagonal: Σ = [ σ₁²  0  ;  0  σ₂² ]
Full:     Σ = [ σ₁₁²  σ₁₂²  ;  σ₁₂²  σ₂₂² ]

Complete data log likelihood:

ln p(X) = sum_{n=1}^{N} ln Normal(x_n | μ, Σ)
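The covariance structures and the complete data log likelihood can be sketched in NumPy (our own helper, with illustrative 2-D values):

```python
import numpy as np

# Single (spherical), diagonal, and full 2-D covariance matrices.
single = 1.5 * np.eye(2)
diagonal = np.diag([1.0, 2.5])
full = np.array([[1.0, 0.4],
                 [0.4, 2.5]])

def log_likelihood(X, mu, cov):
    """Complete data log likelihood of one Gaussian: sum of log densities."""
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(cov)
    quad = np.einsum('ni,ij,nj->n', diff, inv, diff)   # Mahalanobis terms
    logdet = np.log(np.linalg.det(cov))
    return np.sum(-0.5 * (d * np.log(2 * np.pi) + logdet + quad))

X = np.array([[0.5, -0.2], [1.0, 0.3]])
for cov in (single, diagonal, full):
    print(log_likelihood(X, X.mean(axis=0), cov))
```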

Maximum Likelihood (ML) parameter estimation

• Maximize the log likelihood formulation.

• Setting the gradient of the complete data log likelihood to zero, we can find the closed-form solution.

– In the case of the mean, this is the sample average.
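A quick numeric check of the closed-form ML estimates (synthetic data, our own variable names):

```python
import numpy as np

# Draw a sample from a known Gaussian, then recover its parameters
# with the closed-form ML estimates.
rng = np.random.default_rng(1)
X = rng.normal(loc=3.0, scale=2.0, size=10_000)

mu_ml = X.mean()                     # ML mean = sample average
var_ml = ((X - mu_ml) ** 2).mean()   # ML variance (divides by N, not N-1)
print(mu_ml, var_ml)                 # close to the true 3.0 and 4.0
```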

When one Gaussian is not enough

• Real-world datasets are rarely unimodal!

Mixtures of Gaussians

p(x) = sum_{k=1}^{M} π_k · Normal(x | μ_k, Σ_k)

Mixtures of Gaussians (2)

• In addition to mean and covariance parameters (now M times), we have mixing coefficients πk.

The following properties hold for the mixing coefficients:

sum_{k=1}^{M} π_k = 1,   0 ≤ π_k ≤ 1

π_k can be seen as the prior probability of component k.
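A 1-D sketch of the mixture density above (illustrative function name, standard library only):

```python
import math

def gmm_pdf(x, weights, means, variances):
    """1-D Gaussian mixture density: sum_k pi_k * Normal(x | mu_k, var_k)."""
    total = 0.0
    for pi_k, mu_k, var_k in zip(weights, means, variances):
        total += (pi_k * math.exp(-(x - mu_k) ** 2 / (2 * var_k))
                  / math.sqrt(2 * math.pi * var_k))
    return total

# Mixing coefficients sum to 1 and each lies in [0, 1].
weights = [0.3, 0.7]
print(gmm_pdf(0.0, weights, [-1.0, 1.0], [1.0, 1.0]))
```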

Responsibilities (1)

• Component labels (red, green and blue) cannot be observed.

• We have to calculate approximations (responsibilities).

[Figure: complete data, incomplete data, and the computed responsibilities]

Responsibilities (2)

• Responsibility describes how probable it is that observation vector x comes from component k.

• In hard clustering, responsibilities take only the values 0 and 1, thus defining a hard partitioning.

Responsibilities (3)

We can express the marginal density p(x) as:

p(x) = sum_{k=1}^{M} p(k) · p(x | k)

From this, we can find the responsibility of the kth component for x using Bayes' theorem:

γ_k(x) = p(k | x) = p(k) p(x | k) / sum_{l} p(l) p(x | l)
                  = π_k Normal(x | μ_k, Σ_k) / sum_{l} π_l Normal(x | μ_l, Σ_l)
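The Bayes-theorem computation of the responsibilities can be sketched in 1-D (illustrative function, standard library only):

```python
import math

def responsibilities(x, weights, means, variances):
    """gamma_k(x): posterior probability that x came from component k."""
    # Joint terms pi_k * Normal(x | mu_k, var_k)
    joint = [pi * math.exp(-(x - mu) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
             for pi, mu, v in zip(weights, means, variances)]
    total = sum(joint)              # marginal density p(x)
    return [j / total for j in joint]

# A point midway between two symmetric components is split 50/50.
print(responsibilities(0.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]))
```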

Expectation Maximization (EM)

• Goal: Maximize the log likelihood of the whole data

• When responsibilities are calculated, we can maximize individually for the means, covariances and the mixing coefficients!

ln p(X | π, μ, Σ) = sum_{n=1}^{N} ln [ sum_{k=1}^{M} π_k Normal(x_n | μ_k, Σ_k) ]

Exact update equations

New mean estimates:

μ_k = (1/N_k) sum_{n=1}^{N} γ_k(x_n) x_n,   where N_k = sum_{n=1}^{N} γ_k(x_n)

Covariance estimates:

Σ_k = (1/N_k) sum_{n=1}^{N} γ_k(x_n) (x_n − μ_k)(x_n − μ_k)^T

Mixing coefficient estimates:

π_k = N_k / N
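A NumPy sketch of these M-step updates (our own function name; gamma is the N × M responsibility matrix):

```python
import numpy as np

def m_step(X, gamma):
    """Re-estimate means, covariances and mixing coefficients
    from responsibilities gamma (shape N x M)."""
    N, _ = X.shape
    Nk = gamma.sum(axis=0)                  # effective count per component
    mu = (gamma.T @ X) / Nk[:, None]        # new means
    cov = []
    for k in range(gamma.shape[1]):
        diff = X - mu[k]
        cov.append((gamma[:, k, None] * diff).T @ diff / Nk[k])
    pi = Nk / N                             # new mixing coefficients
    return mu, np.array(cov), pi
```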

EM Algorithm

• Initialize parameters.

• while not converged:

– E step: calculate responsibilities.

– M step: estimate new parameters.

– Calculate the log likelihood of the new parameters.
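The loop above can be sketched as a minimal 1-D EM implementation (assumptions: quantile-based initialization and a simple log-likelihood convergence test, both our own choices, not from the slides):

```python
import numpy as np

def _dens(x, mu, var, pi):
    """Per-component weighted densities pi_k * Normal(x_n | mu_k, var_k), shape N x M."""
    return (pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
            / np.sqrt(2 * np.pi * var))

def em_gmm_1d(x, M=2, iters=100, tol=1e-6):
    """Minimal EM for a 1-D Gaussian mixture."""
    # Initialize parameters (means from evenly spaced quantiles)
    mu = np.quantile(x, (np.arange(M) + 0.5) / M)
    var = np.full(M, x.var())
    pi = np.full(M, 1.0 / M)
    prev_ll = -np.inf
    for _ in range(iters):
        # E step: responsibilities
        dens = _dens(x, mu, var, pi)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: re-estimate means, variances, mixing coefficients
        Nk = gamma.sum(axis=0)
        mu = (gamma * x[:, None]).sum(axis=0) / Nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / len(x)
        # Log likelihood of the new parameters; stop when converged
        ll = np.log(_dens(x, mu, var, pi).sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return mu, var, pi

# Usage on synthetic bimodal data:
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-5, 1, 500), rng.normal(5, 1, 500)])
mu, var, pi = em_gmm_1d(data)
print(np.sort(mu))  # close to [-5, 5]
```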

Example of EM

Computational complexity

• Hard clustering with MSE criterion is NP-complete.

• Can we find optimal GMM in polynomial time?

• Finding the optimal GMM is in class NP.

Some insights

• In GMM we need to estimate the parameters, which are all real numbers.

– Number of parameters: M + M·D + M·D(D+1)/2 (mixing coefficients, means, and free entries of the symmetric covariance matrices).

• Hard clustering has no parameters, just a set partitioning (remember the optimality criteria!)
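The parameter count can be computed directly (our own helper; a symmetric D × D covariance has D(D+1)/2 free entries):

```python
def gmm_param_count(M, D):
    """Free parameters of an M-component, D-dimensional GMM with full
    covariances: M mixing weights + M*D means + M symmetric covariances."""
    return M + M * D + M * (D * (D + 1) // 2)

print(gmm_param_count(3, 2))  # 3 + 6 + 9 = 18
```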

Some further insights (2)

• Both optimization functions are mathematically rigorous!

• Solutions minimizing MSE are always meaningful.

• Maximization of the log likelihood might lead to a singularity!

Example of singularity
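The original slide's figure is not preserved, but the singularity is easy to show numerically: if one component's mean sits exactly on a single data point, shrinking its variance makes the likelihood grow without bound (illustrative values, standard library only):

```python
import math

# Density of Normal(x | mu=x, var) evaluated at its own mean:
# as var -> 0, the peak density (and the log likelihood) diverges.
densities = []
for var in (1.0, 1e-2, 1e-4, 1e-8):
    peak = 1.0 / math.sqrt(2 * math.pi * var)
    densities.append(peak)
    print(f"var={var:g}  peak density={peak:.4g}")
```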
