meanshift and camshift -...

As we have seen in class, tracking consists of maintaining an estimate of the position of an object in a video. In the previous lab session, we did this by scanning the whole image for pixels that match a certain hue, binarizing the image, removing the noise and calculating the centroid of the remaining blob. Repeating this process for every frame of the video effectively tracks the object.
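For comparison with what follows, here is a minimal sketch of that whole-image approach in plain Python (no OpenCV). It assumes the hue channel is given as a list of rows of integer hue values; the function names, the tolerance parameter `tol` and the toy data are hypothetical. The noise-removal step is left out, and hue wrap-around (red sits at both ends of the hue range) is ignored for simplicity.

```python
def binarize_by_hue(hue, target, tol):
    """Return a 0/1 mask marking pixels whose hue is within tol of target.

    hue is a list of rows of integer hue values. Hue wrap-around is ignored.
    """
    return [[1 if abs(h - target) <= tol else 0 for h in row] for row in hue]


def centroid(mask):
    """Return the (x, y) centroid of the nonzero pixels, or None if there are none."""
    count = sum_x = sum_y = 0
    for y, row in enumerate(mask):          # visits every pixel of the frame
        for x, value in enumerate(row):
            if value:
                count += 1
                sum_x += x
                sum_y += y
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)


# Hypothetical 4x4 hue image with a 2x2 patch of hue 60 in the middle.
hue = [
    [0, 0, 0, 0],
    [0, 60, 60, 0],
    [0, 60, 60, 0],
    [0, 0, 0, 0],
]
# Noise removal (e.g. a morphological opening) would go between these two steps.
print(centroid(binarize_by_hue(hue, target=60, tol=5)))  # (1.5, 1.5)
```

Note that every call touches every pixel of the frame, which is exactly the cost the rest of this page tries to avoid.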

Meanshift and CAMSHIFT

However, this approach is clearly inefficient: since the object we are tracking usually does not move much from one frame to the next, scanning the whole image is unnecessary most of the time.

A more efficient tracking algorithm would therefore look for the object only in the vicinity of where it was in the previous frame. Meanshift is one such algorithm (among many).

Meanshift acts on a grayscale image called a back projection (explained in more detail in the lab session 7 guide sheet). It first calculates the centroid of all the pixels in this image. This centroid is “weighted”: each pixel contributes in proportion to its intensity, so lighter pixels count more and black pixels do not count at all.

Figure: the original image and its back projection. The back projection represents the probability that each pixel belongs to the object of interest (the ball).
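The weighted centroid is simply a ratio of image moments: with M00 = Σ I(x, y), M10 = Σ x·I(x, y) and M01 = Σ y·I(x, y), where I is the back projection, the centroid is (M10/M00, M01/M00). Below is a minimal pure-Python sketch, assuming the back projection is a list of rows of 8-bit values indexed as `bp[y][x]`; the function name is hypothetical.

```python
def weighted_centroid(bp):
    """Intensity-weighted centroid of a back projection bp[y][x].

    Returns (xc, yc), or None if the image is entirely black (M00 == 0).
    """
    m00 = m10 = m01 = 0
    for y, row in enumerate(bp):
        for x, weight in enumerate(row):
            m00 += weight          # total weight: black pixels add nothing
            m10 += x * weight      # first moment in x
            m01 += y * weight      # first moment in y
    if m00 == 0:
        return None
    return (m10 / m00, m01 / m00)


# One bright pixel and one dim pixel: the centroid is pulled toward the bright one.
print(weighted_centroid([[255, 0, 85]]))  # (0.5, 0.0)
```

An unweighted centroid of the two nonzero pixels would be at x = 1.0; weighting by intensity moves it to x = 0.5.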

It then places a window (of arbitrary but small size) centered on that centroid. From then on, it looks for the object only inside this window; this is what makes it more efficient.

The algorithm then calculates the centroid of the pixels inside the window and moves the window so that it is centered on this new centroid. These two steps are repeated, and all of this happens within the same video frame.

In this picture, the algorithm first places (circular) window C1 on centroid C1_o. It calculates the centroid of the pixels in the window (C1_r) and then moves the window there (not shown). After a few iterations, the window will end up in the area with highest pixel concentration (window C2).

When this happens, the centroid of the window is virtually the same as the centroid of the object of interest.
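The iteration described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the lab's reference code: the window is a rectangle `(x0, y0, w, h)` instead of the circular window in the picture, the loop stops when the window moves by less than `eps` pixels or after `max_iter` iterations, and all names and default values are hypothetical.

```python
import math


def window_moments(bp, x0, y0, w, h):
    """M00, M10, M01 of bp[y][x] inside the window, clipped to the image."""
    height, width = len(bp), len(bp[0])
    m00 = m10 = m01 = 0
    for y in range(max(0, y0), min(height, y0 + h)):
        for x in range(max(0, x0), min(width, x0 + w)):
            v = bp[y][x]
            m00 += v
            m10 += x * v
            m01 += y * v
    return m00, m10, m01


def meanshift(bp, window, max_iter=20, eps=1.0):
    """Move the window onto the local weighted centroid; return the final window."""
    x0, y0, w, h = window
    for _ in range(max_iter):
        m00, m10, m01 = window_moments(bp, x0, y0, w, h)
        if m00 == 0:
            break  # nothing inside the window: the object is lost
        # Re-center the window on the centroid of the pixels inside it.
        nx0 = int(round(m10 / m00 - (w - 1) / 2))
        ny0 = int(round(m01 / m00 - (h - 1) / 2))
        shift = math.hypot(nx0 - x0, ny0 - y0)
        x0, y0 = nx0, ny0
        if shift < eps:
            break  # converged: the window barely moved
    return (x0, y0, w, h)
```

For instance, with a 3×3 bright blob centered at (13, 13) and an initial 7×7 window at (6, 6) that overlaps only one pixel of the blob, the window moves twice and then settles at (10, 10), i.e. centered on (13, 13). Note that if the initial window does not overlap the object at all, Meanshift has nothing to follow.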

A very helpful animation that makes this process easier to visualize is available on the French Wikipedia at http://fr.wikipedia.org/wiki/Camshift

There is, however, a problem with this tracking algorithm. Can you see what it is by looking at the picture to the left?

As the object of interest approaches the camera, it appears larger, but the window stays the same size. The window may then cover only part of the object, giving an incorrect estimate of its position.

An improvement over this algorithm would be to vary the size of the search window, so that it contains the whole object. CAMSHIFT (Continuously Adaptive Mean Shift) does this by first running Meanshift until it converges, and then using an equation to resize the window Meanshift will use on the next video frame.
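A sketch of one CAMSHIFT step in the same style follows. The resize rule is an assumption on our part: we use s = 2·√(M00/256), the form given in Bradski's paper for 8-bit back projections. Here M00/256 roughly counts the fully "on" pixels in the converged window, so √(M00/256) is the side of a square with that area, and the factor 2 leaves room for the object to grow. The equation in the guide sheet may be written differently. Rounding the side up to an odd number and the `min_side` limit are our own (hypothetical) choices, and the helpers repeat the Meanshift sketch so the block runs on its own.

```python
import math


def window_moments(bp, x0, y0, w, h):
    """M00, M10, M01 of bp[y][x] inside the window, clipped to the image."""
    height, width = len(bp), len(bp[0])
    m00 = m10 = m01 = 0
    for y in range(max(0, y0), min(height, y0 + h)):
        for x in range(max(0, x0), min(width, x0 + w)):
            v = bp[y][x]
            m00 += v
            m10 += x * v
            m01 += y * v
    return m00, m10, m01


def meanshift(bp, window, max_iter=20, eps=1.0):
    """Fixed-size Meanshift, as in the previous sketch."""
    x0, y0, w, h = window
    for _ in range(max_iter):
        m00, m10, m01 = window_moments(bp, x0, y0, w, h)
        if m00 == 0:
            break
        nx0 = int(round(m10 / m00 - (w - 1) / 2))
        ny0 = int(round(m01 / m00 - (h - 1) / 2))
        shift = math.hypot(nx0 - x0, ny0 - y0)
        x0, y0 = nx0, ny0
        if shift < eps:
            break
    return (x0, y0, w, h)


def camshift_step(bp, window, min_side=3):
    """Run Meanshift to convergence, then resize the window for the next frame."""
    x0, y0, w, h = meanshift(bp, window)
    m00, m10, m01 = window_moments(bp, x0, y0, w, h)
    if m00 == 0:
        return (x0, y0, w, h)  # object lost: keep the window as it is
    xc, yc = m10 / m00, m01 / m00
    s = 2 * math.sqrt(m00 / 256)  # ASSUMED resize rule (Bradski's form)
    side = max(min_side, int(round(s)))
    if side % 2 == 0:
        side += 1  # odd side, so the window has a central pixel
    half = (side - 1) // 2
    # New square window centered on the converged centroid.
    return (int(round(xc)) - half, int(round(yc)) - half, side, side)
```

In our test data, a 3×3 blob that grows to 9×9 in the next frame makes the returned window grow from 7×7 to 15×15, which is the behavior plain Meanshift lacks.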

The length of the side of the new (square) window is given by an equation that is explained in more detail in the lab session guide sheet. CAMSHIFT further improves on Meanshift by calculating the orientation of the object within the window (through equations also shown in the guide sheet). This way we obtain not only the coordinates of the object of interest, but also its orientation.
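For the orientation, a common formulation (the one in Bradski's paper, written here with `atan2` to avoid dividing by zero) uses the second moments. With (xc, yc) the centroid, θ = ½·atan2(2(M11/M00 - xc·yc), (M20/M00 - xc²) - (M02/M00 - yc²)). The following minimal sketch uses the same list-of-rows assumption as before; the function name is hypothetical, and the guide sheet's equations may be written differently.

```python
import math


def orientation(bp):
    """Centroid and orientation (radians) of the weighted blob in bp[y][x].

    Returns (xc, yc, theta), or None if bp is entirely black.
    """
    m00 = m10 = m01 = m11 = m20 = m02 = 0
    for y, row in enumerate(bp):
        for x, v in enumerate(row):
            m00 += v
            m10 += x * v
            m01 += y * v
            m11 += x * y * v
            m20 += x * x * v
            m02 += y * y * v
    if m00 == 0:
        return None
    xc, yc = m10 / m00, m01 / m00
    a = m20 / m00 - xc * xc        # spread along x
    b = 2 * (m11 / m00 - xc * yc)  # twice the x-y covariance
    c = m02 / m00 - yc * yc        # spread along y
    theta = 0.5 * math.atan2(b, a - c)
    return xc, yc, theta
```

A horizontal line of pixels gives θ = 0, a vertical one θ = π/2 and the main diagonal θ = π/4. Since image rows grow downwards, a positive θ is a clockwise rotation on screen. These moments give the axis of the object but not which end is its front (θ and θ + π look the same), so a true heading needs extra information such as the direction of motion.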

A complete description of CAMSHIFT can be found in the original publication on this algorithm (easy to find online):

Bradski, G. R., “Real time face and object tracking as a component of a perceptual user interface,” in Proceedings of the Fourth IEEE Workshop on Applications of Computer Vision (WACV ‘98), pp. 214-219, 19-21 Oct. 1998.
