
Normal Deviate

Thoughts on Statistics and Machine Learning

The Amazing Mean Shift Algorithm

The mean shift (http://en.wikipedia.org/wiki/Mean_shift) algorithm is a mode-based clustering method due to Fukunaga and Hostetler (1975) that is commonly used in computer vision but seems less well known in statistics.

The steps are: (1) estimate the density, (2) find the modes of the density, (3) associate each data point to one mode.

1. The Algorithm

We are given data $X_1, \ldots, X_n$ which are a sample from a density $p$. Let

$$\hat p_h(x) = \frac{1}{n}\sum_{i=1}^n \frac{1}{h^d}\, K\!\left(\frac{\|x - X_i\|}{h}\right)$$

be a kernel density estimator with kernel $K$ and bandwidth $h$. Let $\nabla \hat p_h$ denote the gradient of $\hat p_h$.
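For a Gaussian kernel, the estimator and its gradient are easy to write down explicitly. Here is a minimal sketch (the function names are my own; the post itself contains no code):

```python
import numpy as np

def kde(x, data, h):
    """Gaussian kernel density estimate at a point x, given data of shape (n, d)."""
    n, d = data.shape
    z = (x - data) / h                          # (n, d) scaled differences
    k = np.exp(-0.5 * np.sum(z ** 2, axis=1))   # unnormalized Gaussian kernel values
    const = (2 * np.pi) ** (d / 2) * h ** d     # Gaussian normalizing constant
    return k.sum() / (n * const)

def kde_gradient(x, data, h):
    """Gradient of the Gaussian KDE at x."""
    n, d = data.shape
    z = (x - data) / h
    k = np.exp(-0.5 * np.sum(z ** 2, axis=1))
    const = (2 * np.pi) ** (d / 2) * h ** d
    # d/dx exp(-||x - X_i||^2 / (2 h^2)) = -((x - X_i) / h^2) exp(...)
    grads = -(x - data) / h ** 2 * k[:, None]
    return grads.sum(axis=0) / (n * const)
```

As a sanity check, with two data points the gradient vanishes exactly at their midpoint, the mode of the two-point density estimate.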

Suppose that $\hat p_h$ has modes $m_1, \ldots, m_k$. (Note that $k$ is not determined in advance; it is simply the number of modes in the density estimator.) An arbitrary point $x$ is assigned to mode $m_j$ if the gradient ascent curve through $x$ leads to $m_j$. More formally, the gradient defines a family of integral curves $\pi_x$. An integral curve $\pi$ is defined by the differential equation

$$\pi'(t) = \nabla \hat p_h(\pi(t)).$$

In words, moving along $\pi$ corresponds to moving up in the direction of the gradient. The integral curves partition the space.

For each mode $m_j$, define the set $A_j$ to be the set of all $x$ such that the integral curve starting at $x$ leads to $m_j$. Intuitively, if we place a particle at $x$ and let it flow up $\hat p_h$, it will end up at $m_j$. That is, if $\pi_x$ is the integral curve starting at $x$, then $\lim_{t \to \infty} \pi_x(t) = m_j$. The sets $A_1, \ldots, A_k$ form a partition.

We can now partition the data according to which sets they fall into. In other words, we define clusters $C_1, \ldots, C_k$ where $C_j = \{ X_i : X_i \in A_j \}$.
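Numerically, an integral curve can be traced by Euler steps along the estimated gradient. A sketch, again with a Gaussian kernel (the step size `eta` and the helper names are my own choices, not from the post):

```python
import numpy as np

def kde_grad(x, data, h):
    """Gradient, up to a positive constant, of a Gaussian KDE at x.
    The dropped constant only rescales time along the curve, not its path."""
    z = (x - data) / h
    k = np.exp(-0.5 * np.sum(z ** 2, axis=1))
    return (-(x - data) / h ** 2 * k[:, None]).sum(axis=0)

def ascend(x0, data, h, eta=0.5, tol=1e-8, max_iter=10_000):
    """Follow the gradient ascent curve from x0 via Euler steps x <- x + eta * grad."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        g = kde_grad(x, data, h)
        if np.linalg.norm(g) < tol:   # (numerically) at a mode
            break
        x = x + eta * g
    return x
```

With two points at 0 and 0.5 and a wide bandwidth, the density estimate has a single mode at 0.25, and gradient ascent from either side ends up there.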


To find the clusters, we only need to start at each $X_i$, move along its gradient ascent curve and see which mode it goes to. This is where the mean shift algorithm comes in.

The mean shift algorithm approximates the integral curves (i.e. the gradient ascent curves). Although it is usually applied to the data $X_1, \ldots, X_n$, it can be applied to any set of points. Suppose we pick some point $x$ and we want to find out which mode it belongs to. We set $x^{(0)} = x$ and then, for $j = 0, 1, 2, \ldots$, set

$$x^{(j+1)} = \frac{\sum_{i=1}^n X_i \, K\!\left(\frac{\|x^{(j)} - X_i\|}{h}\right)}{\sum_{i=1}^n K\!\left(\frac{\|x^{(j)} - X_i\|}{h}\right)}.$$

In other words, replace $x^{(j)}$ with a weighted average around $x^{(j)}$ and repeat. The trajectory converges to one of the modes.

In practice, we run the algorithm not on some arbitrary $x$ but on each data point $X_i$. Thus, by running the algorithm you accomplish two things: you find the modes and you find which mode each data point goes to. Hence, you find the mode-based clusters.
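The whole procedure — run the iteration from every data point and group points whose trajectories end at the same mode — can be sketched as follows (the merging tolerance of $h/10$ for deciding when two endpoints are the same mode is an ad hoc choice of mine, not part of the algorithm as stated):

```python
import numpy as np

def mean_shift_step(x, data, h):
    """One mean shift update: a Gaussian-weighted average of the data around x."""
    w = np.exp(-0.5 * np.sum(((x - data) / h) ** 2, axis=1))
    return (w[:, None] * data).sum(axis=0) / w.sum()

def mean_shift_cluster(data, h, tol=1e-6, max_iter=500):
    """Run mean shift from each data point; label points by the mode they reach."""
    endpoints = []
    for x in data:
        x = x.copy()
        for _ in range(max_iter):
            x_new = mean_shift_step(x, data, h)
            if np.linalg.norm(x_new - x) < tol:
                break
            x = x_new
        endpoints.append(x_new)
    # merge endpoints that are numerically the same mode
    modes, labels = [], []
    for e in endpoints:
        for j, m in enumerate(modes):
            if np.linalg.norm(e - m) < h / 10:
                labels.append(j)
                break
        else:
            modes.append(e)
            labels.append(len(modes) - 1)
    return np.array(modes), np.array(labels)
```

For two well-separated blobs this recovers two modes and splits the data accordingly, since the Gaussian weights from the distant blob are negligible.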

The algorithm can be slow, but the literature is filled with papers with speed-ups, improvements etc.

Here is a simple example:


(https://normaldeviate.files.wordpress.com/2012/07/simple-meanshift-crop.png)

The big black dots are data points. The blue crosses are the modes. The red curves are the mean shift paths. Pretty, pretty cool.

2. Some Theory

Donoho and Liu (1991) showed that, if $p$ behaves in a locally quadratic fashion around the modes, then there is a well-defined minimax rate for estimating the modes, and that the rate is achieved by kernel density estimators. Thus, the above procedure has some theoretical foundation. (Their result is for one dimension but I am pretty sure it extends to higher dimensions.)

More recently, Klemelä (2005) showed that it is possible to obtain minimax estimators of the mode that adapt to the local behavior of $p$ around the modes. He uses a method due to Lepski (1992) that is a fairly general way to do adaptive inference.

My colleagues (Chris Genovese, Marco Perone-Pacifico and Isa Verdinelli) and I have been working on some extensions to the mean shift algorithm. I’ll report on that when we finish the paper.

I’ll close with a question: why don’t we use this algorithm more in statistical machine learning?

3. References

Cheng, Yizong (1995). Mean Shift, Mode Seeking, and Clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 790-799.

Donoho, D.L. and Liu, R.C. (1991). Geometrizing rates of convergence, II. The Annals of Statistics, 19, 633-667.

Comaniciu, Dorin and Meer, Peter (2002). Mean Shift: A Robust Approach Toward Feature Space Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 603-619.

Fukunaga, Keinosuke and Hostetler, Larry D. (1975). The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition. IEEE Transactions on Information Theory, 21, 32-40.

Klemelä, J. (2005). Adaptive estimation of the mode of a multivariate density. Journal of Nonparametric Statistics, 17, 83-105.

Lepski, O.V. (1992). On problems of adaptive estimation in white Gaussian noise. Topics in nonparametric estimation, 12, 87-106.

—Larry Wasserman