Machine Learning using Matlab - uni-konstanz.de (lecture transcript)
-
Machine Learning using Matlab
Lecture 10 Clustering part 2
-
Outline
● Gaussian Mixture Model (GMM)
● Expectation-Maximization (EM)
● Mean shift
● Mean shift clustering
-
Multivariate Gaussian distribution
● The probability density function of the multivariate Gaussian/normal distribution is given by:

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

where $\mu$ is the mean vector, $\Sigma$ is the covariance matrix, and $|\Sigma|$ denotes the determinant of the covariance matrix.
● The partial derivatives with respect to the mean and the covariance are used below for maximum likelihood estimation.
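Although the course uses Matlab, the density above is easy to check numerically; the following is a minimal NumPy sketch (the function name `gaussian_pdf` is our own, not from the lecture):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate Gaussian density N(x | mu, Sigma)."""
    d = mu.shape[0]
    diff = x - mu
    # normalization constant (2*pi)^(d/2) * |Sigma|^(1/2)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu), via a linear solve
    expo = -0.5 * diff @ np.linalg.solve(Sigma, diff)
    return np.exp(expo) / norm
```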
-
GMM - model representation
● Let $z_i \in \{1, \dots, k\}$ be the latent variable; $p(z_i = j)$ denotes the probability that $x_i$ is generated from the $j$-th Gaussian model ($k$ models in total).
● The Gaussian mixture density is given by:

$$p(x_i) = \sum_{j=1}^{k} p(x_i \mid z_i = j)\, p(z_i = j)$$

● Assume $z_i \sim \mathrm{Multinomial}(\phi)$, where $\phi_j \ge 0$ and $\sum_{j=1}^{k} \phi_j = 1$. Correspondingly, $p(z_i = j) = \phi_j$.
● The Gaussian mixture density is rewritten as:

$$p(x_i) = \sum_{j=1}^{k} \phi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)$$

where $\phi_j$ is the mixture coefficient.
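The rewritten mixture density is just a weighted sum of Gaussian densities; a minimal NumPy sketch (illustrative names, not the course's Matlab code):

```python
import numpy as np

def mixture_density(x, phi, mus, Sigmas):
    """p(x) = sum_j phi_j * N(x | mu_j, Sigma_j)."""
    total = 0.0
    for phi_j, mu_j, Sigma_j in zip(phi, mus, Sigmas):
        d = mu_j.shape[0]
        diff = x - mu_j
        norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma_j))
        total += phi_j * np.exp(-0.5 * diff @ np.linalg.solve(Sigma_j, diff)) / norm
    return total
```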
-
GMM
● The parameters of the GMM are thus $\theta = \{\phi_j, \mu_j, \Sigma_j\}_{j=1}^{k}$.
● Maximum Likelihood Estimation (MLE) maximizes the following log likelihood:

$$\ell(\theta) = \sum_{i=1}^{m} \log p(x_i \mid \theta) = \sum_{i=1}^{m} \log \sum_{j=1}^{k} \phi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)$$
-
GMM
● If the parameters maximize the log likelihood, then setting the derivative with respect to $\mu_j$ to zero, we have:

$$\sum_{i=1}^{m} \frac{\phi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \phi_l\, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}\, \Sigma_j^{-1} (x_i - \mu_j) = 0$$
-
GMM
● Bayes rule: $p(A \mid B)\, p(B) = p(B \mid A)\, p(A)$
● Posterior probability (soft assignment):

$$w_{ij} = p(z_i = j \mid x_i) = \frac{\phi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \phi_l\, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}$$

● The mean and covariance are computed as:

$$\mu_j = \frac{\sum_{i=1}^{m} w_{ij}\, x_i}{\sum_{i=1}^{m} w_{ij}}, \qquad \Sigma_j = \frac{\sum_{i=1}^{m} w_{ij}\,(x_i - \mu_j)(x_i - \mu_j)^T}{\sum_{i=1}^{m} w_{ij}}$$
-
GMM
● We still have an additional constraint: $\sum_{j=1}^{k} \phi_j = 1$.
● Introduce the Lagrange multiplier $\beta$:

$$L(\phi) = \ell(\theta) + \beta\!\left(\sum_{j=1}^{k} \phi_j - 1\right)$$

● Taking the derivative and solving, we have:

$$\phi_j = \frac{1}{m} \sum_{i=1}^{m} w_{ij}$$
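The intermediate derivative step was lost in the transcript; a standard reconstruction (with $w_{ij}$ the posterior probabilities and $\beta$ the Lagrange multiplier) is:

```latex
\frac{\partial L}{\partial \phi_j}
  = \sum_{i=1}^{m} \frac{\mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
                        {\sum_{l=1}^{k} \phi_l\, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)} + \beta
  = \sum_{i=1}^{m} \frac{w_{ij}}{\phi_j} + \beta = 0
  \quad\Rightarrow\quad
  \phi_j = -\frac{1}{\beta} \sum_{i=1}^{m} w_{ij}.
```

Summing over $j$ and using $\sum_j \phi_j = 1$ and $\sum_j w_{ij} = 1$ gives $\beta = -m$, hence $\phi_j = \frac{1}{m}\sum_{i=1}^{m} w_{ij}$.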
-
GMM algorithm
● Randomly initialize the parameters $\phi_j, \mu_j, \Sigma_j$
● Repeat until convergence:
○ E step. Compute the posterior probability $w_{ij}$:

$$w_{ij} = \frac{\phi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \phi_l\, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}$$

○ M step. Update the parameters using the current $w_{ij}$:

$$\phi_j = \frac{1}{m} \sum_{i=1}^{m} w_{ij}, \qquad \mu_j = \frac{\sum_i w_{ij}\, x_i}{\sum_i w_{ij}}, \qquad \Sigma_j = \frac{\sum_i w_{ij}\,(x_i - \mu_j)(x_i - \mu_j)^T}{\sum_i w_{ij}}$$
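The E and M steps can be sketched in a few lines of NumPy (illustrative code, not the course's Matlab implementation; the small ridge term `1e-6 * I` is our addition to keep covariances invertible):

```python
import numpy as np

def gmm_em(X, k, n_iter=50, seed=0):
    """EM for a Gaussian mixture. X: (m, d) data. Returns (phi, mus, Sigmas, W)."""
    m, d = X.shape
    rng = np.random.default_rng(seed)
    phi = np.full(k, 1.0 / k)                       # mixture coefficients
    mus = X[rng.choice(m, k, replace=False)]        # means start at random data points
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    for _ in range(n_iter):
        # E step: responsibilities w_ij = p(z_i = j | x_i)
        W = np.empty((m, k))
        for j in range(k):
            diff = X - mus[j]
            inv = np.linalg.inv(Sigmas[j])
            norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigmas[j]))
            W[:, j] = phi[j] * np.exp(-0.5 * np.sum(diff @ inv * diff, axis=1)) / norm
        W /= W.sum(axis=1, keepdims=True)
        # M step: update phi, mu, Sigma with the current responsibilities
        Nj = W.sum(axis=0)
        phi = Nj / m
        mus = (W.T @ X) / Nj[:, None]
        for j in range(k):
            diff = X - mus[j]
            Sigmas[j] = (W[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(d)
    return phi, mus, Sigmas, W
```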
-
The General EM algorithm
Given a joint distribution $p(X, Z \mid \theta)$ over observed variables $X$ and latent variables $Z$, governed by parameters $\theta$, the goal is to maximize the likelihood function $p(X \mid \theta)$ with respect to $\theta$.
● Choose an initial setting for the parameters $\theta^{old}$
● Repeat until convergence:
○ E step: evaluate $p(Z \mid X, \theta^{old})$ using the current parameters
○ M step: evaluate $\theta^{new} = \arg\max_{\theta} \sum_{Z} p(Z \mid X, \theta^{old}) \log p(X, Z \mid \theta)$
-
GMM vs. k-means
GMM (EM):
● Randomly initialize the parameters
● Repeat until convergence:
○ E step. Compute the soft membership, i.e., the posterior probability $w_{ij}$
○ M step. Update the parameters using the current soft membership
k-means:
● Randomly initialize k cluster centroids
● Repeat until convergence:
○ Assign each data point to its closest centroid (hard membership)
○ Update each cluster centroid using its currently assigned points
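The k-means side of the comparison can be sketched as a "hard assignment" version of the EM loop (illustrative NumPy code, not the course's Matlab implementation):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """k-means: hard-membership counterpart of EM for GMMs."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # "E step": hard membership -- assign each point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # "M step": update each centroid from its currently assigned points
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```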
-
Mean shift
● Mean shift is a procedure for locating the maxima of a density function given discrete data sampled from that function (a mode-seeking algorithm)
● Algorithm:
○ Input: a randomly initialized centroid and a fixed window size
○ Repeat until convergence:
■ Compute the mean of the points inside the window
■ Shift the centroid to the new mean
● It is guaranteed to move in the direction of maximum increase in the density
● Applications: clustering, tracking, ...
-
Mean shift - model representation
● Suppose the current mean is $y$, and let $(x_1, x_2, \dots, x_n)$ be the data points inside the window of size $h$. The mean shift vector is given by:

$$m_h(y) = \frac{\sum_{i=1}^{n} K\!\left(\frac{x_i - y}{h}\right) x_i}{\sum_{i=1}^{n} K\!\left(\frac{x_i - y}{h}\right)} - y$$

where $K$ is the kernel function.
● Mean shift procedure:
○ Compute the mean shift vector $m_h(y)$
○ Shift to the new mean: $y \leftarrow y + m_h(y)$
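With a flat kernel, the procedure reduces to repeatedly averaging the points inside the window; a minimal NumPy sketch (the function name is our own):

```python
import numpy as np

def mean_shift_mode(X, y, h, n_iter=100, tol=1e-6):
    """Run the mean shift procedure from start point y with a flat kernel of radius h."""
    for _ in range(n_iter):
        in_window = np.linalg.norm(X - y, axis=1) <= h  # flat kernel: weight 1 inside window
        new_y = X[in_window].mean(axis=0)               # mean of the points in the window
        if np.linalg.norm(new_y - y) < tol:             # mean shift vector ~ 0 -> at a mode
            break
        y = new_y
    return y
```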
-
Common kernels
● Flat kernel: $K(x) = 1$ if $\|x\| \le 1$, and $0$ otherwise
● Gaussian kernel: $K(x) = \exp\!\left(-\frac{\|x\|^2}{2}\right)$
● ...
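Both kernels can be written directly as weight functions of the normalized offset $x$; a short sketch:

```python
import numpy as np

def flat_kernel(x):
    """K(x) = 1 if ||x|| <= 1, else 0: every point in the window counts equally."""
    return 1.0 if np.linalg.norm(x) <= 1 else 0.0

def gaussian_kernel(x):
    """K(x) = exp(-||x||^2 / 2): weights decay smoothly with distance."""
    return np.exp(-np.linalg.norm(x) ** 2 / 2)
```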
-
Properties of Mean shift
● Adaptive gradient ascent
○ Automatic convergence speed: the mean shift vector size depends on the gradient itself
○ Near maxima, the steps are small and refined
● Convergence is guaranteed for infinitesimal steps only
-
Mean shift clustering
● Attraction basin: the region for which all trajectories lead to the same mode
● Clustering: all data points in the attraction basin of a mode form one cluster
-
Mean shift clustering - Algorithm
● Starting from the data points, run the mean shift procedure to find the stationary points of the density function
● Prune those points by retaining only the local maxima
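The two steps can be combined into one sketch, assuming a flat kernel and merging trajectories whose endpoints are closer than h/2 (the merge tolerance is our choice, not from the lecture):

```python
import numpy as np

def mean_shift_clustering(X, h, n_iter=50, merge_tol=None):
    """Run mean shift from every data point; merge runs that reach the same mode."""
    if merge_tol is None:
        merge_tol = h / 2
    modes = []
    labels = np.empty(len(X), dtype=int)
    for i, y in enumerate(X):
        for _ in range(n_iter):                          # mean shift procedure
            in_window = np.linalg.norm(X - y, axis=1) <= h
            new_y = X[in_window].mean(axis=0)
            if np.linalg.norm(new_y - y) < 1e-6:
                break
            y = new_y
        # merge with an existing mode if close enough, else record a new one
        for j, mode in enumerate(modes):
            if np.linalg.norm(y - mode) < merge_tol:
                labels[i] = j
                break
        else:
            modes.append(y)
            labels[i] = len(modes) - 1
    return np.array(modes), labels
```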
-
Image segmentation by mean shift
● Find features (color, gradients, texture, etc.)
● Initialize windows at individual pixel locations
● Perform mean shift for each window until convergence
● Merge windows that end up at the same mode
-
Segmentation results
Results from: Mean Shift: A Robust Approach toward Feature Space Analysis, PAMI 2002.
-
Problem: computational complexity
Slides from Fei-Fei Li
-
Speedups
Slides from Fei-Fei Li
-
Summary of mean shift clustering
● Pros:
○ A general, application-independent tool that is simple to implement
○ Model-free: does not assume any prior shape of the data clusters
○ Nearly parameter-free: only the window size h
○ No need to select the number of clusters
○ Robust to outliers
● Cons:
○ The output depends on the window radius, which is not trivial to choose
■ An inappropriate window size can cause modes to be merged, or generate additional "shallow" modes → use an adaptive window size
○ Computationally intensive