Machine Learning using Matlab - uni-konstanz.de (lecture transcript)
-
Machine Learning using Matlab
Lecture 10 Clustering part 2
-
Outline
● Gaussian Mixture Model (GMM)
● Expectation-Maximization (EM)
● Mean shift
● Mean shift clustering
-
Multivariate Gaussian distribution
● The probability density function of the multivariate Gaussian/normal distribution is given by:

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

where $\mu$ is the mean vector, $\Sigma$ is the covariance matrix, and $|\Sigma|$ denotes the determinant of the covariance matrix.
● The partial derivatives with respect to the mean and the covariance are used below for maximum likelihood estimation.
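Although the course uses Matlab, the density above is easy to check numerically; the following is a minimal NumPy sketch (the function name `gaussian_pdf` is our own, not from the lecture):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate Gaussian density N(x | mu, Sigma)."""
    d = mu.shape[0]
    diff = x - mu
    # normalization constant (2*pi)^(d/2) * |Sigma|^(1/2)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu), via a linear solve
    expo = -0.5 * diff @ np.linalg.solve(Sigma, diff)
    return np.exp(expo) / norm
```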
-
GMM - model representation
● Let $z_i \in \{1, \dots, k\}$ be the latent variable; $p(z_i = j)$ denotes the probability that $x_i$ is generated from the $j$-th Gaussian model ($k$ models in total).
● The Gaussian mixture density is given by:

$$p(x_i) = \sum_{j=1}^{k} p(x_i \mid z_i = j)\, p(z_i = j)$$

● Assume $z_i \sim \mathrm{Multinomial}(\phi)$, where $\phi_j \ge 0$ and $\sum_{j=1}^{k} \phi_j = 1$. Correspondingly, $p(z_i = j) = \phi_j$.
● The Gaussian mixture density is rewritten as:

$$p(x_i) = \sum_{j=1}^{k} \phi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)$$

where $\phi_j$ is the mixture coefficient.
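The rewritten mixture density is just a weighted sum of Gaussian densities; a minimal NumPy sketch (illustrative names, not the course's Matlab code):

```python
import numpy as np

def mixture_density(x, phi, mus, Sigmas):
    """p(x) = sum_j phi_j * N(x | mu_j, Sigma_j)."""
    total = 0.0
    for phi_j, mu_j, Sigma_j in zip(phi, mus, Sigmas):
        d = mu_j.shape[0]
        diff = x - mu_j
        norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma_j))
        total += phi_j * np.exp(-0.5 * diff @ np.linalg.solve(Sigma_j, diff)) / norm
    return total
```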
-
GMM
● The parameters of the GMM are thus $\theta = \{\phi_j, \mu_j, \Sigma_j\}_{j=1}^{k}$.
● Maximum Likelihood Estimation (MLE) maximizes the following log likelihood:

$$\ell(\theta) = \sum_{i=1}^{m} \log p(x_i \mid \theta) = \sum_{i=1}^{m} \log \sum_{j=1}^{k} \phi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)$$
-
GMM
● If the parameters maximize the log likelihood, then setting the derivative with respect to $\mu_j$ to zero, we have:

$$\sum_{i=1}^{m} \frac{\phi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \phi_l\, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}\, \Sigma_j^{-1} (x_i - \mu_j) = 0$$
-
GMM
● Bayes rule: $p(A \mid B)\, p(B) = p(B \mid A)\, p(A)$
● Posterior probability (soft assignment):

$$w_{ij} = p(z_i = j \mid x_i) = \frac{\phi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \phi_l\, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}$$

● The mean and covariance are computed as:

$$\mu_j = \frac{\sum_{i=1}^{m} w_{ij}\, x_i}{\sum_{i=1}^{m} w_{ij}}, \qquad \Sigma_j = \frac{\sum_{i=1}^{m} w_{ij}\,(x_i - \mu_j)(x_i - \mu_j)^T}{\sum_{i=1}^{m} w_{ij}}$$
-
GMM
● We still have an additional constraint: $\sum_{j=1}^{k} \phi_j = 1$.
● Introduce the Lagrange multiplier $\beta$:

$$L(\phi) = \ell(\theta) + \beta\!\left(\sum_{j=1}^{k} \phi_j - 1\right)$$

● Taking the derivative and solving, we have:

$$\phi_j = \frac{1}{m} \sum_{i=1}^{m} w_{ij}$$
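The intermediate derivative step was lost in the transcript; a standard reconstruction (with $w_{ij}$ the posterior probabilities and $\beta$ the Lagrange multiplier) is:

```latex
\frac{\partial L}{\partial \phi_j}
  = \sum_{i=1}^{m} \frac{\mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
                        {\sum_{l=1}^{k} \phi_l\, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)} + \beta
  = \sum_{i=1}^{m} \frac{w_{ij}}{\phi_j} + \beta = 0
  \quad\Rightarrow\quad
  \phi_j = -\frac{1}{\beta} \sum_{i=1}^{m} w_{ij}.
```

Summing over $j$ and using $\sum_j \phi_j = 1$ and $\sum_j w_{ij} = 1$ gives $\beta = -m$, hence $\phi_j = \frac{1}{m}\sum_{i=1}^{m} w_{ij}$.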
-
GMM algorithm
● Randomly initialize the parameters $\phi_j, \mu_j, \Sigma_j$
● Repeat until convergence:
○ E step. Compute the posterior probability $w_{ij}$:

$$w_{ij} = \frac{\phi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \phi_l\, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}$$

○ M step. Update the parameters using the current $w_{ij}$:

$$\phi_j = \frac{1}{m} \sum_{i=1}^{m} w_{ij}, \qquad \mu_j = \frac{\sum_i w_{ij}\, x_i}{\sum_i w_{ij}}, \qquad \Sigma_j = \frac{\sum_i w_{ij}\,(x_i - \mu_j)(x_i - \mu_j)^T}{\sum_i w_{ij}}$$
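The E and M steps can be sketched in a few lines of NumPy (illustrative code, not the course's Matlab implementation; the small ridge term `1e-6 * I` is our addition to keep covariances invertible):

```python
import numpy as np

def gmm_em(X, k, n_iter=50, seed=0):
    """EM for a Gaussian mixture. X: (m, d) data. Returns (phi, mus, Sigmas, W)."""
    m, d = X.shape
    rng = np.random.default_rng(seed)
    phi = np.full(k, 1.0 / k)                       # mixture coefficients
    mus = X[rng.choice(m, k, replace=False)]        # means start at random data points
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    for _ in range(n_iter):
        # E step: responsibilities w_ij = p(z_i = j | x_i)
        W = np.empty((m, k))
        for j in range(k):
            diff = X - mus[j]
            inv = np.linalg.inv(Sigmas[j])
            norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigmas[j]))
            W[:, j] = phi[j] * np.exp(-0.5 * np.sum(diff @ inv * diff, axis=1)) / norm
        W /= W.sum(axis=1, keepdims=True)
        # M step: update phi, mu, Sigma with the current responsibilities
        Nj = W.sum(axis=0)
        phi = Nj / m
        mus = (W.T @ X) / Nj[:, None]
        for j in range(k):
            diff = X - mus[j]
            Sigmas[j] = (W[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(d)
    return phi, mus, Sigmas, W
```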
-
The General EM algorithm
Given a joint distribution $p(X, Z \mid \theta)$ over observed variables $X$ and latent variables $Z$, governed by parameters $\theta$, the goal is to maximize the likelihood function $p(X \mid \theta)$ with respect to $\theta$.
● Choose an initial setting for the parameters $\theta^{old}$
● Repeat until convergence:
○ E step: evaluate $p(Z \mid X, \theta^{old})$ using the current parameters
○ M step: evaluate $\theta^{new} = \arg\max_{\theta} \sum_{Z} p(Z \mid X, \theta^{old}) \log p(X, Z \mid \theta)$
-
GMM vs. k-means
GMM (EM):
● Randomly initialize the parameters
● Repeat until convergence:
○ E step. Compute the soft membership, i.e., the posterior probability $w_{ij}$
○ M step. Update the parameters using the current soft membership
k-means:
● Randomly initialize k cluster centroids
● Repeat until convergence:
○ Assign each data point to its closest centroid (hard membership)
○ Update each cluster centroid using its currently assigned points
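The k-means side of the comparison can be sketched as a "hard assignment" version of the EM loop (illustrative NumPy code, not the course's Matlab implementation):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """k-means: hard-membership counterpart of EM for GMMs."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # "E step": hard membership -- assign each point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # "M step": update each centroid from its currently assigned points
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```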
-
Mean shift
● Mean shift is a procedure for locating the maxima of a density function given discrete data sampled from that function (a mode-seeking algorithm)
● Algorithm:
○ Input: a randomly initialized centroid and a fixed window size
○ Repeat until convergence:
■ Compute the mean of the points inside the window
■ Shift the centroid to the new mean
● It is guaranteed to move in the direction of maximum increase in the density
● Applications: clustering, tracking, ...
-
Mean shift - model representation
● Suppose the current mean is $y$, and let $(x_1, x_2, \dots, x_n)$ be the data points inside the window of size $h$. The mean shift vector is given by:

$$m_h(y) = \frac{\sum_{i=1}^{n} K\!\left(\frac{x_i - y}{h}\right) x_i}{\sum_{i=1}^{n} K\!\left(\frac{x_i - y}{h}\right)} - y$$

where $K$ is the kernel function.
● Mean shift procedure:
○ Compute the mean shift vector $m_h(y)$
○ Shift to the new mean: $y \leftarrow y + m_h(y)$
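With a flat kernel, the procedure reduces to repeatedly averaging the points inside the window; a minimal NumPy sketch (the function name is our own):

```python
import numpy as np

def mean_shift_mode(X, y, h, n_iter=100, tol=1e-6):
    """Run the mean shift procedure from start point y with a flat kernel of radius h."""
    for _ in range(n_iter):
        in_window = np.linalg.norm(X - y, axis=1) <= h  # flat kernel: weight 1 inside window
        new_y = X[in_window].mean(axis=0)               # mean of the points in the window
        if np.linalg.norm(new_y - y) < tol:             # mean shift vector ~ 0 -> at a mode
            break
        y = new_y
    return y
```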
-
Common kernels
● Flat kernel: $K(x) = 1$ if $\|x\| \le 1$, and $0$ otherwise
● Gaussian kernel: $K(x) = \exp\!\left(-\frac{\|x\|^2}{2}\right)$
● ...
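Both kernels can be written directly as weight functions of the normalized offset $x$; a short sketch:

```python
import numpy as np

def flat_kernel(x):
    """K(x) = 1 if ||x|| <= 1, else 0: every point in the window counts equally."""
    return 1.0 if np.linalg.norm(x) <= 1 else 0.0

def gaussian_kernel(x):
    """K(x) = exp(-||x||^2 / 2): weights decay smoothly with distance."""
    return np.exp(-np.linalg.norm(x) ** 2 / 2)
```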
-
Properties of Mean shift
● Adaptive gradient ascent
○ Automatic convergence speed: the mean shift vector size depends on the gradient itself
○ Near maxima, the steps are small and refined
● Convergence is guaranteed for infinitesimal steps only
-
Mean shift clustering
● Attraction basin: the region for which all trajectories lead to the same mode
● Clustering: all data points in the attraction basin of a mode form one cluster
-
Mean shift clustering - Algorithm
● Starting from the data points, run the mean shift procedure to find the stationary points of the density function
● Prune those points by retaining only the local maxima
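The two steps can be combined into one sketch, assuming a flat kernel and merging trajectories whose endpoints are closer than h/2 (the merge tolerance is our choice, not from the lecture):

```python
import numpy as np

def mean_shift_clustering(X, h, n_iter=50, merge_tol=None):
    """Run mean shift from every data point; merge runs that reach the same mode."""
    if merge_tol is None:
        merge_tol = h / 2
    modes = []
    labels = np.empty(len(X), dtype=int)
    for i, y in enumerate(X):
        for _ in range(n_iter):                          # mean shift procedure
            in_window = np.linalg.norm(X - y, axis=1) <= h
            new_y = X[in_window].mean(axis=0)
            if np.linalg.norm(new_y - y) < 1e-6:
                break
            y = new_y
        # merge with an existing mode if close enough, else record a new one
        for j, mode in enumerate(modes):
            if np.linalg.norm(y - mode) < merge_tol:
                labels[i] = j
                break
        else:
            modes.append(y)
            labels[i] = len(modes) - 1
    return np.array(modes), labels
```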
-
Image segmentation by mean shift
● Find features (color, gradients, texture, etc.)
● Initialize windows at individual pixel locations
● Perform mean shift for each window until convergence
● Merge windows that end up at the same mode
-
Segmentation results
Results from: Mean Shift: A Robust Approach toward Feature Space Analysis, PAMI 2002.
-
Problem: computational complexity
Slides from Fei-Fei Li
-
Speedups
Slides from Fei-Fei Li
-
Summary of mean shift clustering
● Pros:
○ A general, application-independent tool that is simple to implement
○ Model-free: does not assume any prior shape of the data clusters
○ Nearly parameter-free: only the window size h
○ No need to select the number of clusters
○ Robust to outliers
● Cons:
○ The output depends on the window radius, which is not trivial to choose
■ An inappropriate window size can cause modes to be merged, or generate additional "shallow" modes → use an adaptive window size
○ Computationally intensive