

    People Tracking via a Modified CAMSHIFT Algorithm

Fahad Fazal Elahi Guraya1, Pierre-Yves Bayle2 and Faouzi Alaya Cheikh1

1Department of Computer Science and Media Technology, Gjovik University College, Gjovik 2802, Norway

2Department of Computer Science, Universite de Bourgogne, Dijon, France

    Email: [email protected]

    ABSTRACT

The past decade has witnessed a rapid growth in the use of video cameras in all aspects of our daily life. The use of video cameras for surveillance purposes has also increased. This has increased the demand for automatic video surveillance algorithms that can detect abnormal activities/events in the surveillance area and raise an alarm. Most event detection applications rely on person detection and tracking or object tracking algorithms. CAMSHIFT is a tracking algorithm that has already been used for face tracking. It has, however, not been successfully used for people/person tracking. This paper presents a modified version of the CAMSHIFT algorithm which can be used for person tracking. The proposed algorithm incorporates motion information into the CAMSHIFT algorithm, which allows it to track people successfully even in the case of occlusions. The experimental results demonstrate the consistency of the proposed algorithm.

Keywords: Object Tracking, CAMSHIFT, Motion estimation, Tracking.

1. INTRODUCTION

Object tracking is used for many applications such as motion-based recognition, automated surveillance, video indexing and retrieval, human-computer interaction, and traffic monitoring. Object tracking can be divided into a series of steps, such as object representation, feature selection for tracking, object detection, background subtraction, and object segmentation.

Objects can be represented with the help of a single point (e.g. the centroid) [1] or a set of points [2], by primitive geometric shapes (rectangles or ellipses) [3], or by silhouettes and contours for tracking complex non-rigid objects [4]. Feature selection also plays an important role in object tracking. The most commonly used features are color, edges, optical flow and texture [10, 11]. Color spaces other than RGB, such as HSV, are used for tracking because the RGB color space does not correspond to the color differences perceived by humans [5]. A commonly used edge detector is the Canny edge detector [6]. Optical flow can also be used as a feature for object tracking; some of the most widely used optical flow methods are [7, 8, 9].

Object trackers can be categorized into three different categories: point trackers, kernel trackers and silhouette trackers. Point tracking methods can be further sub-categorized into deterministic methods [12, 13] and statistical methods [14, 15]. Kernel-based object tracking methods include Mean Shift tracking [16] and Continuously Adaptive Mean Shift tracking (CAMSHIFT) [17]. Eigen-tracking [18] and Support Vector Machines [19] are kernel-based methods for multi-view appearance models. Silhouette tracking methods are divided into contour evolution methods [20, 21] and shape matching methods [22, 23].

Background modeling is a basic step in many video analysis applications, used to extract foreground or moving objects from the video frames. Changes in a scene or foreground objects can be extracted from a video sequence by subtracting the background image from each frame. In general the background is considered to be constant or slowly changing due to luminance changes. In practice the background pixels are always changing; for that reason we need a model which accounts for gradual changes. Several techniques have been proposed in the literature for modeling the variation of the background information [24, 25, 26]. In [24], modeling each stationary background pixel with a single 3D (YUV color-space) Gaussian was proposed. A single 3D (YUV) Gaussian, however, is not suitable for outdoor scenes [27], since at a certain location multiple colors can be seen due to repetitive motion, shadows or reflectance. Thus a Mixture of Gaussians (MoG) for modeling a single pixel color was proposed in [28].

In this paper, we use the MoG method for background modeling. Contour detection provided by the Intel Open Source Computer Vision Library (OpenCV) [29] is used for object detection and representation. CAMSHIFT kernel-based object tracking is used to track objects using color information in the Hue-Saturation-Value (HSV) color space, which is considered a better representative of the color perceived by human vision [5]. Motion information is also incorporated with the color information using the optical flow method of [32].

In the next section, the implementation of the tracking algorithms is presented. Section 3 presents the proposed CAMSHIFT algorithm with motion detection. Conclusions and future work are given in Section 4.

2. PEOPLE DETECTION AND TRACKING

2.1 Background Estimation and Post-Processing

To perform people tracking, we first use the Mixture of Gaussians (MoG) method to compute a good background model. The OpenCV functions are an implementation of the Gaussian Mixture Model in [30]. In this implementation, the model assumes that each pixel in the scene is modeled by a mixture of K Gaussian distributions, where different Gaussians represent different colors. The weight parameters of the mixture represent the time proportions that those colors stay in the scene. Thus, the probable background colors are the ones which stay longer and are more static. The probability that a certain pixel has a value $x_N$ at time N can be written as $p(x_N)$, as shown in eq.(1).

$p(x_N) = \sum_{j=1}^{K} w_j \,\eta(x_N; \theta_j)$   (1)

where $w_j$ is the weight parameter of the jth Gaussian component and $\eta(x_N; \theta_j)$ is the normal distribution of the jth component. Static single-color objects tend to form tight clusters in the color space, while moving ones form widened clusters due to the different reflecting surfaces encountered during the movement. The measure of this was called the fitness value [30]. The K distributions are ordered based on fitness, and the first B distributions are used as a model of the background of the scene, where B is estimated as shown in eq.(2).

$B = \arg\min_b \left( \sum_{j=1}^{b} w_j > T \right)$   (2)

The threshold T is the minimum fraction of the background model; in other words, it is the minimum prior probability that the background is in the scene. Background subtraction is performed by marking as a foreground pixel any pixel that is more than 2.5 standard deviations away from any of the B distributions. Figure 1 shows the original frame and the background obtained after applying the Gaussian mixture model.

Figure 1. An example of the background results obtained with the Adaptive Gaussian Mixture Model.
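For illustration, a minimal sketch of this background modeling step using the modern OpenCV Python bindings is given below. The paper's implementation used the C API of [29]; the function `cv2.createBackgroundSubtractorMOG2`, the parameter values and the input file name are our assumptions, not the authors' code.

```python
import cv2

# Hypothetical input clip; the paper's own test sequences are not available.
cap = cv2.VideoCapture("surveillance.avi")

# Per-pixel Gaussian mixture background model. `history` controls how long a
# color must persist before it is absorbed into the background; `varThreshold`
# is the squared distance used to match a pixel to a mixture component
# (2.5 standard deviations, as in the text, gives 2.5**2 = 6.25).
mog = cv2.createBackgroundSubtractorMOG2(history=200,
                                         varThreshold=2.5 ** 2,
                                         detectShadows=False)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = mog.apply(frame)              # 255 = foreground, 0 = background
    background = mog.getBackgroundImage()   # current background estimate
    cv2.imshow("foreground mask", fg_mask)
    if cv2.waitKey(30) == 27:               # Esc quits
        break

cap.release()
cv2.destroyAllWindows()
```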

Next, we need to remove the shadows in the foreground obtained with the previous background subtraction method. In [31] the authors assume that a pixel can be considered as shaded background, or shadow, if it has similar chromaticity but lower brightness than the same pixel in the background image. Thus, with an appropriate threshold T, we can remove shadows from the foreground image. Eq.(3) shows the decision whether a certain pixel belongs to a shadow or not.

$\mathrm{Shadow}(x,y) = \begin{cases} 1 & \text{if } \mathit{brightness}_{img} < \mathit{brightness}_{bg} \text{ and } |\mathit{chromaticity}_{img} - \mathit{chromaticity}_{bg}| < T \\ 0 & \text{otherwise} \end{cases}$   (3)
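A hedged sketch of this shadow test follows. The HSV brightness/chromaticity decomposition is in the spirit of [31], but the function name and threshold values are illustrative assumptions, not the paper's parameters.

```python
import cv2
import numpy as np

def suppress_shadows(frame_bgr, bg_bgr, fg_mask,
                     beta_low=0.4, beta_high=0.95, tau=12):
    """Relabel foreground pixels as shadow per eq.(3): lower brightness than
    the background but similar chromaticity (thresholds are illustrative)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV).astype(np.int32)
    hsv_bg = cv2.cvtColor(bg_bgr, cv2.COLOR_BGR2HSV).astype(np.int32)

    # Brightness test: darker than the background, but not completely dark.
    ratio = hsv[..., 2] / np.maximum(hsv_bg[..., 2], 1)
    darker = (ratio >= beta_low) & (ratio <= beta_high)

    # Chromaticity test: hue and saturation close to the background's.
    similar = (np.abs(hsv[..., 0] - hsv_bg[..., 0]) < tau) & \
              (np.abs(hsv[..., 1] - hsv_bg[..., 1]) < tau)

    cleaned = fg_mask.copy()
    cleaned[darker & similar & (fg_mask > 0)] = 0   # drop shadow pixels
    return cleaned
```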


2.2 CAMSHIFT Tracking

The CAMSHIFT algorithm [17] consists of the following steps:

(1) Set the calculation region of the probability distribution to the whole image.

(2) Select an initial location of the Mean Shift search window. The selected location is the target distribution to be tracked.

(3) Calculate a color probability distribution of the region centered at the Mean Shift search window.

(4) Iterate the Mean Shift algorithm to find the centroid of the probability image. Store the 0th moment (distribution area) and centroid location.

(5) For the following frame, center the search window at the mean location found in step 4 and set the window size to a function of the 0th moment. Then go to Step 3.
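As a concrete illustration of steps 1-5, the sketch below uses `cv2.CamShift` from the OpenCV Python bindings, which packages the mean shift iteration and the adaptive window resizing; the hue histogram and back-projection follow the description in the following paragraphs. Variable names and parameters are our assumptions, not the authors' implementation.

```python
import cv2

def camshift_track(frames, bbox, fg_mask):
    """Track one person from an initial bounding box (x, y, w, h), using the
    foreground mask of the first frame. Illustrative sketch only."""
    x, y, w, h = bbox
    hsv = cv2.cvtColor(frames[0], cv2.COLOR_BGR2HSV)

    # Steps 1-3: 1D hue histogram of the ROI, masked by the foreground and
    # rescaled to [0, 255] as in eq.(4).
    hist = cv2.calcHist([hsv[y:y+h, x:x+w]], [0],
                        fg_mask[y:y+h, x:x+w], [180], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    window = bbox
    for frame in frames[1:]:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # Back-projection: each pixel receives its hue bin's histogram value.
        backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], scale=1)
        # Steps 4-5: mean shift to the centroid, window resized by 0th moment.
        rot_rect, window = cv2.CamShift(backproj, window, term)
        yield rot_rect, window
```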

The creation of the color histogram corresponds to steps 1 to 3. The first step is to define the region of interest (ROI), which is the bounding box corresponding to the detected person that we want to track. Then we calculate the color histogram corresponding to this person. For that we use the HSV color space and calculate a one-dimensional histogram of the first component, hue. We also define a mask for the histogram calculation, which is the foreground image, so that the histogram is calculated only for the person and not for the background inside the bounding box. The results obtained with this method were not satisfying, however, because when the background has almost the same color as the person, it is not possible to tell the two apart in the back-projection image. That is why we then decided to use a three-dimensional histogram over the three components of the HSV color space: hue, saturation and value. With this method, we were able to find the location of the person in the whole frame, even with a similar background. In all cases the histogram bin values are scaled to be within the discrete pixel range of the 2D probability distribution image using eq.(4).

$\hat{p}_u = \min\left( \dfrac{255}{\max_u\{q_u\}}\, q_u,\; 255 \right), \quad u = 1, \ldots, m$   (4)

That is, the histogram bin values are rescaled to the range [0, 255], so that pixels with the highest probability of belonging to the sample histogram map to visible intensities in the 2D histogram back-projection image.
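The three-dimensional variant can be sketched as follows, again with the foreground image as the histogram mask; the bin counts are an illustrative choice, since the paper does not state them.

```python
import cv2

def hsv_histogram_3d(frame_bgr, fg_mask, bins=(16, 16, 16)):
    """3D hue-saturation-value histogram of the masked person region,
    rescaled to [0, 255] as in eq.(4). Illustrative sketch."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], fg_mask, list(bins),
                        [0, 180, 0, 256, 0, 256])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    return hist

# Back-projection of a later frame `hsv_next` over all three channels:
# backproj = cv2.calcBackProject([hsv_next], [0, 1, 2], hist,
#                                [0, 180, 0, 256, 0, 256], scale=1)
```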

Now that we have a histogram of the moving person, we need to find this person in all the frames (steps 4 to 5). To that end, we calculate the back-projection of the histogram in the subsequent frame. For each pixel of the input image we put, in the back-projection image, the value of the histogram bin corresponding to that pixel. In terms of statistics, the value of each output image pixel is the probability that the observed pixel belongs to the tracked object, given the distribution (histogram). Finally, using the previous location of the person, we detect the new position of the moving person and use it as the starting search window for the next frame. The search window center is computed from:

$M_{00} = \sum_x \sum_y I(x,y)$   (5)

$M_{10} = \sum_x \sum_y x\, I(x,y)$   (6)

$M_{01} = \sum_x \sum_y y\, I(x,y)$   (7)

$x_c = \dfrac{M_{10}}{M_{00}}; \quad y_c = \dfrac{M_{01}}{M_{00}}$   (8)

where $M_{00}$ in eq.(5) is the zeroth moment, and $M_{10}$ in eq.(6) and $M_{01}$ in eq.(7) are the first moments; $I(x,y)$ is the back-projection (probability) image. These moments are used to compute the next center position of the tracking window, $x_c$ and $y_c$, as shown in eq.(8). Then, back at step 3, we calculate the new histogram of the person to update the previous one, using a slow update, to preserve the differences between persons when they overlap. The tracked persons and their corresponding tracking windows obtained with the CAMSHIFT algorithm are shown in figure 5.
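The moment computation of eqs.(5)-(8) and the slow histogram update can be sketched as below; `alpha` is an assumed update rate, since the paper does not give its value.

```python
import cv2
import numpy as np

def window_centroid(backproj, window):
    """Centroid of the back-projection inside the search window, eqs.(5)-(8)."""
    x, y, w, h = window
    m = cv2.moments(backproj[y:y+h, x:x+w])   # provides M00, M10, M01, ...
    if m["m00"] == 0:
        return None
    return x + m["m10"] / m["m00"], y + m["m01"] / m["m00"]

def slow_update(hist, new_hist, alpha=0.05):
    """Blend the new histogram into the old one slowly, so overlapping persons
    keep distinct models (alpha is an illustrative rate)."""
    return (1.0 - alpha) * hist + alpha * new_hist
```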

But when two persons cross, it can happen that both tracking windows follow the person in the foreground, for example when the two persons have similar colors.

Figure 5. One frame with the tracking window of each tracked person, using CAMSHIFT tracking.

In order to solve this problem, we decided to add more information to the back-projection image, namely motion coherence. When two persons are moving in two opposite directions, the motion allows us to follow the right person.

3. CAMSHIFT WITH MOTION (PEOPLE TRACKING USING OPTICAL FLOW)

In order to improve the results of the CAMSHIFT algorithm described above, we use an optical flow algorithm (the Lucas-Kanade method) [32] to determine the motion of the tracked persons.

3.1. Lucas-Kanade Algorithm

The basic idea of the LK algorithm rests on three assumptions:

(1) Brightness constancy: A pixel from the image of an object in the scene does not change in appearance as it (possibly) moves from frame to frame. For grayscale images, this means we assume that the brightness of a pixel does not change as it is tracked from frame to frame.

(2) Temporal persistence or small movements: The image motion of a surface patch changes slowly in time. In practice, this means the temporal increments are fast enough relative to the scale of motion in the image that the object does not move much from frame to frame.

(3) Spatial coherence: Neighboring points in a scene belong to the same surface, have similar motion, and project to nearby points on the image plane.

The OpenCV function implements a sparse iterative version of Lucas-Kanade optical flow in pyramids [32]. It calculates the coordinates of the feature points on the current video frame given their coordinates on the previous frame. The function finds the coordinates with sub-pixel accuracy.

So the aim of the optical flow method is, for a given set of points in a video frame, to find those same points in the next frame. That is, for a given point $[u_x, u_y]^T$ in frame $F_1$, find the point $[u_x + \delta_x,\, u_y + \delta_y]^T$ in frame $F_2$ that minimizes the residual $\varepsilon$ shown in eq.(9).

$\varepsilon(\delta_x, \delta_y) = \sum_{x=u_x-w_x}^{u_x+w_x} \; \sum_{y=u_y-w_y}^{u_y+w_y} \big( F_1(x,y) - F_2(x+\delta_x,\, y+\delta_y) \big)^2$   (9)
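A minimal sketch of this step with OpenCV's pyramidal LK implementation [32] is given below; the Shi-Tomasi corner detector and all parameter values are our assumptions.

```python
import cv2
import numpy as np

def sparse_flow(prev_bgr, next_bgr):
    """Track good features from one frame to the next with pyramidal LK.
    Returns matched (previous, current) point pairs. Illustrative sketch."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)

    # "Good features to track" (figure 6) found as Shi-Tomasi corners.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=300,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return np.empty((0, 2)), np.empty((0, 2))

    # Sparse iterative pyramidal LK; winSize is the integration window
    # (w_x, w_y) of eq.(9).
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts,
                                                 None, winSize=(15, 15),
                                                 maxLevel=3)
    ok = status.ravel() == 1
    return pts[ok].reshape(-1, 2), nxt[ok].reshape(-1, 2)
```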

Figure 6 presents the results of the Lucas-Kanade algorithm. The current frame is shown in figure 6 (left), and arrows are drawn between the previous and next positions of the pixels corresponding to the good features to track in figure 6 (right).

To combine the previous histogram and the motion calculated by the LK optical flow method, we first calculate the histogram and then calculate the back-projection image using this histogram. Before finding the new location of the person with CAMSHIFT, we update the back-projection image using the motion information. For that, we calculate the global motion of each person and, for each back-projection image calculated using the histogram, we update each point calculated with the LK algorithm, putting a higher value on pixels moving in the same direction as the person. Doing this, we are able to follow two persons crossing and to know who is who when they keep going in opposite directions. With this method we also need to use the first type of histogram, namely the one-dimensional histogram (hue component), because it was difficult to combine information from the 3D histogram with motion. Using this method, we still have the problem of the background being similar to the person's color; to solve it, we combine the back-projection image with the foreground mask, to keep only the back-projection of the foreground.
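A hedged sketch of this combination step: back-projection pixels at feature points whose LK flow agrees with the person's global motion direction are boosted, and the result is masked by the foreground. The boost factor and the cosine agreement test are illustrative assumptions; the paper only states that such pixels receive a higher value.

```python
import numpy as np

def motion_weighted_backproj(backproj, fg_mask, prev_pts, next_pts,
                             person_dir, boost=2.0):
    """Raise back-projection values at features moving in the person's global
    direction, then keep only foreground pixels. Illustrative sketch."""
    out = backproj.astype(np.float32)
    flow = next_pts - prev_pts                    # per-feature motion vectors
    norm = np.linalg.norm(person_dir) + 1e-6
    for (x, y), v in zip(next_pts.astype(int), flow):
        # Cosine test: does this feature move like the tracked person?
        agree = np.dot(v, person_dir) / (norm * (np.linalg.norm(v) + 1e-6))
        if agree > 0.5 and 0 <= y < out.shape[0] and 0 <= x < out.shape[1]:
            out[y, x] *= boost
    out = np.clip(out, 0, 255).astype(np.uint8)
    return np.where(fg_mask > 0, out, 0)          # keep only the foreground
```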

Figure 6. Original frame (left) and representation of the results of the LK algorithm (right).

    3.2. Experimental Results

An example video is used to first test the simple CAMSHIFT algorithm without motion information. CAMSHIFT without motion information failed to track the persons after the occlusion, as shown in figure 7, which means color information is not enough to track a person through occlusion. The same video is used to test our proposed tracking algorithm using both the color histogram and motion information; the video frames before and after the occlusion are shown in figure 8. We can see here an example of two persons crossing in opposite directions; the motion in this case allows us to track both persons after the occlusion. The tracking windows keep the same tracking numbers before and after the occlusion, which shows that the algorithm can track a person even after occlusion.

Figure 7. One frame with two persons before crossing (left) and the same persons just after crossing (right) (tracking using CAMSHIFT only).

Figure 8. One frame with two persons before crossing (left) and the same persons just after crossing (right) (tracking using CAMSHIFT plus motion information).

4. CONCLUSIONS AND FUTURE WORK

In this paper, a modified CAMSHIFT algorithm is presented. The algorithm uses color features like the classical CAMSHIFT algorithm, but motion (optical flow) information is added to make it robust against occlusions. The algorithm is verified on a set of videos. As future work, the same algorithm will be extended to the multiple-camera scenario, also referred to as multiple-camera tracking.

5. REFERENCES

[1] Veenman, C., Reinders, M., and Backer, E. 2001. Resolving motion correspondence for densely moving points. IEEE Trans. Patt. Analy. Mach. Intell. 23(1), 54-72.

[2] Serby, D., Koller-Meier, S., and Gool, L. V. 2004. Probabilistic object tracking using multiple features. In IEEE International Conference on Pattern Recognition (ICPR). 184-187.

[3] Comaniciu, D., Ramesh, V., and Meer, P. 2003. Kernel-based object tracking. IEEE Trans. Patt. Analy. Mach. Intell. 25, 564-575.

[4] Yilmaz, A., Li, X., and Shah, M. 2004. Contour-based object tracking with occlusion handling in video acquired using mobile cameras. IEEE Trans. Patt. Analy. Mach. Intell. 26(11), 1531-1536.

[5] Paschos, G. 2001. Perceptually uniform color spaces for color texture analysis: an empirical evaluation. IEEE Trans. Image Process. 10, 932-937.

[6] Canny, J. 1986. A computational approach to edge detection. IEEE Trans. Patt. Analy. Mach. Intell. 8(6), 679-698.

[7] Horn, B. and Schunck, B. 1981. Determining optical flow. Artific. Intell. 17, 185-203.

[8] Kanade, T., Collins, R., Lipton, A., Burt, P., and Wixson, L. 1998. Advances in cooperative multi-sensor video surveillance. DARPA IU Workshop. 3-24.

[9] Black, M. and Anandan, P. 1996. The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. Comput. Vision Image Understand. 63(1), 75-104.

[10] Haralick, R., Shanmugam, B., and Dinstein, I. 1973. Textural features for image classification. IEEE Trans. Syst. Man Cybern. 33(3), 610-622.

[11] Laws, K. 1980. Textured image segmentation. PhD thesis, Electrical Engineering, University of Southern California.

[12] Salari, V. and Sethi, I. K. 1990. Feature point correspondence in the presence of occlusion. IEEE Trans. Patt. Analy. Mach. Intell. 12(1), 87-91.

[13] Veenman, C., Reinders, M., and Backer, E. 2001. Resolving motion correspondence for densely moving points. IEEE Trans. Patt. Analy. Mach. Intell. 23(1), 54-72.

[14] Broida, T. and Chellappa, R. 1986. Estimation of object motion parameters from noisy images. IEEE Trans. Patt. Analy. Mach. Intell. 8(1), 90-99.

[15] Streit, R. L. and Luginbuhl, T. E. 1994. Maximum likelihood method for probabilistic multi-hypothesis tracking. In Proceedings of the International Society for Optical Engineering (SPIE), vol. 2235. 394-405.

[16] Comaniciu, D., Ramesh, V., and Meer, P. 2003. Kernel-based object tracking. IEEE Trans. Patt. Analy. Mach. Intell. 25, 564-575.

[17] Bradski, G. R. 1998. Real time face and object tracking as a component of a perceptual user interface. In Proceedings of the 4th IEEE Workshop on Applications of Computer Vision (WACV'98) (October 19-21, 1998). IEEE Computer Society, Washington, DC, 214.

[18] Black, M. and Jepson, A. 1998. Eigentracking: Robust matching and tracking of articulated objects using a view-based representation. Int. J. Comput. Vision 26(1), 63-84.

[19] Avidan, S. 2001. Support vector tracking. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 184-191.

[20] Isard, M. and Blake, A. 1998. Condensation - conditional density propagation for visual tracking. Int. J. Comput. Vision 29(1), 5-28.

[21] Ronfard, R. 1994. Region based strategies for active contour models. Int. J. Comput. Vision 13(2), 229-251.

[22] Huttenlocher, D., Noh, J., and Rucklidge, W. 1993. Tracking nonrigid objects in complex scenes. In IEEE International Conference on Computer Vision (ICCV). 93-101.

[23] Kang, J., Cohen, I., and Medioni, G. 2003. Continuous tracking within and across camera streams. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 267-272.

[24] Wren, C., Azarbayejani, A., Darrell, T., and Pentland, A. 1997. Pfinder: Real-time tracking of the human body. IEEE Trans. Patt. Analy. Mach. Intell. 19, 780-785.

[25] Monnet, A., Mittal, A., Paragios, N., and Ramesh, V. 2003. Background modeling and subtraction of dynamic scenes. Oct. 2003, vol. 2, 1305-1312.

[26] Irani, M. and Anandan, P. 1998. Video indexing based on mosaic representations. Proceedings of the IEEE 86(5), 905-921.

[27] Gao, X., Boult, T., Coetzee, F., and Ramesh, V. 2000. Error analysis of background adaption. Vol. 1, 503-510.

[28] Stauffer, C. and Grimson, W. 2000. Learning patterns of activity using real-time tracking. IEEE Trans. Patt. Analy. Mach. Intell. 22(8), 747-757.

[29] Intel Corporation. 2001. Open Source Computer Vision Library Reference Manual.

[30] KaewTraKulPong, P. and Bowden, R. 2001. An improved adaptive background mixture model for real-time tracking with shadow detection. In Proc. 2nd European Workshop on Advanced Video-Based Surveillance Systems.

[31] Prati, A., Mikic, I., Trivedi, M. M., and Cucchiara, R. 2003. Detecting moving shadows: Algorithms and evaluation. IEEE Trans. Patt. Analy. Mach. Intell. 25(7), 918-923.

[32] Bouguet, J.-Y. 2000. Pyramidal implementation of the Lucas Kanade feature tracker: Description of the algorithm. Intel Corporation Microprocessor Research Labs.