8/12/2019 Modified CAMSHIFT
People Tracking via a Modified CAMSHIFT Algorithm
Fahad Fazal Elahi Guraya(1), Pierre-Yves Bayle(2) and Faouzi Alaya Cheikh(1)
(1) Department of Computer Science and Media Technology, Gjovik University College, Gjovik 2802, Norway
(2) Department of Computer Science, Universite de Bourgogne, Dijon, France
Email: [email protected]
ABSTRACT
The past decade has witnessed a rapid growth in the use of video cameras in all aspects of our daily life. The use of video cameras for surveillance purposes has also increased. This has increased the demand for automatic video surveillance algorithms that can detect abnormal activities or events in the surveillance area and raise an alarm. Most event detection applications rely on person detection and tracking or object tracking algorithms. CAMSHIFT is a tracking algorithm that has already been used for face tracking. It has, however, not been successfully used for people tracking. This paper presents a modified version of the CAMSHIFT algorithm that can be used for person tracking. The proposed algorithm incorporates motion information into the CAMSHIFT algorithm, which allows it to track people successfully even in the case of occlusions. The experimental results demonstrate the consistency of the proposed algorithm.
Keywords: Object Tracking, CAMSHIFT, Motion estimation, Tracking.
1. INTRODUCTION
Object tracking is used in many applications such as motion-based recognition, automated surveillance, video indexing and retrieval, human-computer interaction and traffic monitoring. Object tracking can be divided into a series of steps, such as object representation, feature selection for tracking, object detection, background subtraction, and object segmentation.
Objects can be represented with the help of a single point (e.g. centroid) [1] or a set of points [2], by primitive geometric shapes (rectangles or ellipses) [3], or by silhouettes and contours for tracking complex non-rigid objects [4]. Feature selection also plays an important role in object tracking. The most commonly used features are color, edges, optical flow and texture [10, 11]. Various color spaces other than RGB, such as HSV, are used for tracking purposes because the RGB color space does not correspond to the color differences perceived by humans [5]. A commonly used edge detector is the Canny edge detector [6]. Optical flow can also be used as a feature for object tracking; some of the most widely used optical flow methods are [7, 8, 9].
Object trackers can be divided into three categories: point trackers, kernel trackers and silhouette trackers. Point tracking methods can be further sub-categorized into deterministic methods such as [12, 13] and statistical methods [14, 15]. Kernel based object tracking methods include Mean Shift tracking [16] and Continuously Adaptive Mean Shift tracking (CAMSHIFT) [17]. Eigen-tracking [18] and Support Vector Machines [19] are kernel based methods for multi-view appearance models. Silhouette tracking methods are divided into contour evolution methods [20, 21] and shape matching methods [22, 23].
Background modeling is the basic step in many video analysis applications, used to extract foreground or moving objects from the video frames. Changes in a scene, or foreground objects, can be extracted from a video sequence by subtracting the background image from each frame. In general the background is considered to be constant or slowly changing due to luminance changes. In practice the background pixels are always changing; for that reason we need a model which accounts for gradual changes. Several techniques have been proposed in the literature for modeling the variation of the background information [24, 25, 26]. In [24], modeling each stationary background pixel with a single 3D (YUV color-space) Gaussian was proposed. A single 3D (YUV) Gaussian, however, is not suitable for outdoor scenes [27], since at a certain location multiple colors can be seen due to repetitive motion, shadows or reflectance. Thus a Mixture of Gaussians (MoG) for modeling a single pixel's color was proposed in [28].
In this paper, we use the MoG method for background modeling. Contour detection provided by the Intel Open Source Computer Vision Library (OpenCV) [29] is used for object detection and representation. CAMSHIFT kernel based object tracking is used to track objects with color information in the Hue-Saturation-Value color space, considered a better representative of color as perceived by human vision [5]. Motion information is also incorporated with the color information using the optical flow method provided in [32].
In the next section, the implementation of the tracking algorithms is presented. Section 3 presents the proposed CAMSHIFT algorithm with motion detection. Conclusions and future work are given in Section 4.
2. PEOPLE DETECTION AND TRACKING
2.1 Background Estimation and Post Processing
To perform people tracking, we first use the Mixture of Gaussians (MoG) method to calculate a good background model. The OpenCV functions are an implementation of the Gaussian Mixture Model in [30]. In this implementation, the model assumes that each pixel in the scene is modeled by a mixture of K Gaussian distributions, where different Gaussians represent different colors. The weight parameters of the mixture represent the proportions of time that those colors stay in the scene. Thus, the probable background colors are the ones which stay longer and are more static. The probability that a certain pixel has a value of x_N at time N can be written as p(x_N), as shown in eq. (1):

p(x_N) = \sum_{j=1}^{K} w_j \, \eta(x_N; \theta_j)    (1)

where w_j is the weight parameter of the j-th Gaussian component and \eta(x_N; \theta_j) is the normal distribution of the j-th component. Static single-color objects tend to form tight clusters in the color space, while moving ones form wider clusters due to the different reflecting surfaces encountered during the movement. The measure of this was called the fitness value [30]. The K distributions are ordered based on the fitness, and the first B distributions are used as a model of the background of the scene, where B is estimated as shown in eq. (2):
B = \arg\min_b \left( \sum_{j=1}^{b} w_j > T \right)    (2)
The threshold T is the minimum fraction of the background model; in other words, it is the minimum prior probability that the background is in the scene. Background subtraction is performed by marking as foreground any pixel that is more than 2.5 standard deviations away from all of the B distributions. Figure 1 shows the original frame and the background obtained after applying the Gaussian mixture model.
Figure 1. An example of the background results obtained with the Adaptive Gaussian Mixture Model.
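The per-pixel foreground test described above can be sketched as follows. This is a minimal illustration with a diagonal covariance and illustrative parameter values, not the OpenCV implementation; components are assumed to be pre-sorted by fitness.

```python
import numpy as np

def is_foreground(pixel, means, stds, weights, T=0.7, k=2.5):
    """Classify a pixel against a per-pixel mixture of Gaussians.

    means, stds: (K, 3) per-component color mean / std, sorted by
    fitness (most reliable background component first).
    weights: (K,) mixture weights summing to 1.
    A pixel is foreground if it lies more than k standard deviations
    from every one of the first B components, where B is the smallest
    prefix whose cumulative weight exceeds T (eq. 2).
    """
    B = int(np.searchsorted(np.cumsum(weights), T, side='right') + 1)
    for m, s in zip(means[:B], stds[:B]):
        # per-channel test against a diagonal-covariance Gaussian
        if np.all(np.abs(pixel - m) <= k * s):
            return False   # matches a background component
    return True
```

The threshold values here are only placeholders; in practice T and k would be tuned per scene.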
Next, we need to remove the shadows in the foreground obtained with the previous background subtraction method. In [31] the authors assume that a pixel can be considered shaded background, or shadow, if it has similar chromaticity to, but lower brightness than, the same pixel in the background image. Thus, with an appropriate threshold T, we can remove shadows from the foreground image. Eq. (3) expresses the decision whether a certain pixel belongs to a shadow or not:
Shadow(x, y) = \begin{cases} 1 & \text{if } brightness_{img} < brightness_{bg} \text{ and } |chromaticity_{img} - chromaticity_{bg}| < T \\ 0 & \text{otherwise} \end{cases}    (3)
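A rough per-pixel sketch of this shadow test, in the spirit of [31], is given below. The brightness ratio bounds and chromaticity threshold are illustrative values, not the paper's; brightness is approximated by the channel sum and chromaticity by normalized rgb.

```python
import numpy as np

def is_shadow(img_pix, bg_pix, chroma_thresh=0.05, low=0.5, high=0.95):
    """Shadow test: lower (but not too low) brightness than the
    background pixel, with nearly unchanged chromaticity."""
    img = np.asarray(img_pix, dtype=float)
    bg = np.asarray(bg_pix, dtype=float)
    b_img, b_bg = img.sum(), bg.sum()
    if b_bg == 0:
        return False
    ratio = b_img / b_bg
    if not (low <= ratio < high):          # darker, but not a black object
        return False
    chroma_img = img / max(b_img, 1e-9)    # normalized rgb chromaticity
    chroma_bg = bg / b_bg
    return bool(np.all(np.abs(chroma_img - chroma_bg) < chroma_thresh))
```

Pixels flagged this way are removed from the foreground mask before tracking.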
(2) Select an initial location of the Mean Shift search window. The selected location is the target distribution to be tracked.
(3) Calculate a color probability distribution of the region centered at the Mean Shift search window.
(4) Iterate the Mean Shift algorithm to find the centroid of the probability image. Store the 0th moment (distribution area) and the centroid location.
(5) For the following frame, center the search window at the mean location found in step 4 and set the window size to a function of the 0th moment. Then go to step 3.
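The core of steps (3)-(5) can be illustrated with a minimal Mean Shift iteration over a back-projection image: the search window is repeatedly re-centered on the centroid of the probability mass it contains. This is only a sketch with a fixed window size; the actual CAMSHIFT additionally adapts the window size from the 0th moment.

```python
import numpy as np

def mean_shift(prob, cx, cy, w, h, iters=10):
    """Re-center a w x h search window on the centroid of the
    probability mass it contains, until convergence.
    prob: 2-D back-projection image; (cx, cy): initial window center.
    Returns the converged center and the window's zeroth moment.
    """
    H, W = prob.shape
    m00 = 0.0
    for _ in range(iters):
        x0, x1 = max(0, cx - w // 2), min(W, cx + w // 2 + 1)
        y0, y1 = max(0, cy - h // 2), min(H, cy + h // 2 + 1)
        win = prob[y0:y1, x0:x1]
        m00 = win.sum()
        if m00 == 0:
            break
        ys, xs = np.mgrid[y0:y1, x0:x1]
        nx = int(round((xs * win).sum() / m00))   # centroid x of the window
        ny = int(round((ys * win).sum() / m00))   # centroid y of the window
        if (nx, ny) == (cx, cy):
            break                                 # converged
        cx, cy = nx, ny
    return cx, cy, m00
```

Starting the window anywhere overlapping the target, the center drifts onto the mode of the probability image within a few iterations.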
The creation of the color histogram corresponds to steps 1 to 3. The first step is to define the region of interest (ROI), which is the bounding box corresponding to the detected person that we want to track. Then, we need to calculate the color histogram corresponding to this person. For that we use the HSV color space and calculate a one-dimensional histogram of the first component, hue. We also define a mask for the histogram calculation, which is the foreground image, so that the histogram is calculated only for the person and not for the background inside the bounding box. However, the results obtained with this method were not satisfactory, because when the background has almost the same color as the person, it is not possible to tell the two apart in the back-projection image. That is why we subsequently decided to use a three-dimensional histogram, using the three components of the HSV color space: hue, saturation and value. With this method, we were able to find the location of the person in the whole frame, even with a similar background. In all cases the histogram bin values are scaled to be within the discrete pixel range of the 2D probability distribution image using eq. (4):
p_u = \min\left( \frac{255}{\max_u \{q_u\}} \, q_u, \; 255 \right), \quad u = 1, \ldots, m    (4)
That is, the histogram bin values are rescaled to the range [0, 255], so that pixels with the highest probability of belonging to the sample histogram map to visible intensities in the 2D histogram back-projection image.
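The rescaling of eq. (4) is a one-liner; a minimal sketch:

```python
import numpy as np

def rescale_hist(q):
    """Eq. (4): map histogram bin counts q_u to [0, 255] so the most
    populated bin becomes full intensity in the back-projection."""
    q = np.asarray(q, dtype=float)
    m = q.max()
    if m == 0:
        return np.zeros_like(q)
    return np.minimum(255.0 * q / m, 255.0)
```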
Now that we have a histogram of the moving person, we need to find this person in all the frames (steps 4 to 5). To that end, we calculate the back-projection of the histogram in the subsequent frame. For each pixel of the input image we put, in the back-projection image, the value of the histogram bin corresponding to that pixel. In terms of statistics, the value of each output image pixel is the probability that the observed pixel belongs to the tracked object, given the distribution (histogram). Finally, using the previous location of the person, we detect the new position of the moving person and use it as the starting search window for the next frame. The search window center can be computed from:
M_{00} = \sum_x \sum_y I(x, y)    (5)

M_{10} = \sum_x \sum_y x \, I(x, y)    (6)

M_{01} = \sum_x \sum_y y \, I(x, y)    (7)

x_c = \frac{M_{10}}{M_{00}}; \quad y_c = \frac{M_{01}}{M_{00}}    (8)
where M_{00} in eq. (5) is the zeroth moment, and M_{10} in eq. (6) and M_{01} in eq. (7) are the first moments. These moments can be used to compute the next center position of the tracking window, x_c and y_c, as shown in eq. (8). Then, going back to step 3, we calculate the new histogram of the person to update the previous one, using a slow update, to keep the difference between different persons if they are overlapping. The persons tracked and their corresponding tracking windows using the CAMSHIFT algorithm are shown in figure 5.
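The moment computation of eqs. (5)-(8) on a probability patch I can be sketched directly:

```python
import numpy as np

def window_centroid(I):
    """Compute eqs. (5)-(8) for a probability patch I: the zeroth
    and first moments, then the centroid (x_c, y_c)."""
    ys, xs = np.mgrid[0:I.shape[0], 0:I.shape[1]]
    m00 = I.sum()                   # eq. (5)
    m10 = (xs * I).sum()            # eq. (6)
    m01 = (ys * I).sum()            # eq. (7)
    return m10 / m00, m01 / m00     # eq. (8)
```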
However, when two persons cross, it can happen that both tracking windows follow the person in the foreground, for example when the two persons have similar colors.
Figure 5. One frame with the tracking window of each tracked person, using CAMSHIFT tracking.
In order to solve this problem, we decided to add more information to the back-projection image, namely motion coherence. When two persons are going in two opposite directions, the motion will allow us to follow the right person.
3. CAMSHIFT WITH MOTION (PEOPLE TRACKING USING OPTICAL FLOW)
In order to improve the results of the CAMSHIFT algorithm described above, we use an optical flow algorithm (the Lucas-Kanade method) [32] to determine the motion of the tracked persons.
3.1. Lucas-Kanade Algorithm
The basic idea of the LK algorithm rests on three assumptions:
(1) Brightness constancy: A pixel from the image of an object in the scene does not change in appearance as it (possibly) moves from frame to frame. For grayscale images, this means we assume that the brightness of a pixel does not change as it is tracked from frame to frame.
(2) Temporal persistence or small movements: The image motion of a surface patch changes slowly in time. In practice, this means the temporal increments are fast enough relative to the scale of motion in the image that the object does not move much from frame to frame.
(3) Spatial coherence: Neighboring points in a scene belong to the same surface, have similar motion, and project to nearby points on the image plane.
The OpenCV function implements a sparse iterative version of Lucas-Kanade optical flow in pyramids [32]. It calculates the coordinates of the feature points on the current video frame given their coordinates on the previous frame. The function finds the coordinates with sub-pixel accuracy.
So the aim of the optical flow method is, for a given set of points in a video frame, to find those same points in the next frame. That is, for a given point [u_x, u_y]^T in frame F_1, find the point [u_x + d_x, u_y + d_y]^T in image F_2 that minimizes the error \epsilon, as shown in eq. (9):

\epsilon(d_x, d_y) = \sum_{x = u_x - w_x}^{u_x + w_x} \; \sum_{y = u_y - w_y}^{u_y + w_y} \left( F_1(x, y) - F_2(x + d_x, y + d_y) \right)^2    (9)
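A minimal single-level sketch of this minimization is shown below: eq. (9) is linearized to first order and the resulting 2x2 least-squares system is solved for the window displacement. This is only one level of the method; the implementation in [32] iterates this over an image pyramid with sub-pixel refinement.

```python
import numpy as np

def lk_flow(F1, F2, ux, uy, w=3):
    """Single-level Lucas-Kanade estimate of the displacement
    (dx, dy) of the window of half-size w around (ux, uy),
    minimizing eq. (9) to first order."""
    F1 = F1.astype(float)
    F2 = F2.astype(float)
    Ix = np.gradient(F1, axis=1)    # spatial image gradients
    Iy = np.gradient(F1, axis=0)
    It = F2 - F1                    # temporal difference
    sl = (slice(uy - w, uy + w + 1), slice(ux - w, ux + w + 1))
    ix, iy, it = Ix[sl].ravel(), Iy[sl].ravel(), It[sl].ravel()
    A = np.stack([ix, iy], axis=1)
    # solve A d = -It in the least-squares sense over the window
    d, *_ = np.linalg.lstsq(A, -it, rcond=None)
    return d[0], d[1]               # (dx, dy)
```

On a smooth pattern shifted by one pixel, this recovers a displacement close to (1, 0).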
Figure 6 presents the results of the Lucas-Kanade algorithm. The arrows in figure 6 (right) are drawn between the previous and next positions of the pixels corresponding to the good features to track; the current frame is shown in figure 6 (left).
To combine the previous histogram with the motion calculated by the LK optical flow method, we first calculate the histogram, and then calculate the back-projection image using this histogram. Before finding the new location of the person with CAMSHIFT, we update the back-projection image using the motion information. For that, we calculate the global motion of each person, and for each back-projection image calculated using the histogram, we update each point calculated with the LK algorithm, putting a higher value on pixels moving in the same direction as the person. Doing this, we are able to follow two persons crossing, and to know who is who when they keep going in two opposite directions. With this method, we need to use the first type of histogram, namely the one-dimensional histogram (hue component), because it was difficult to combine information from the 3-D histogram with motion. We are then again faced with the problem of the background being similar to the person's color; to solve it, we combine the back-projection image with the foreground mask, to keep only the back-projection of the foreground.
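The reweighting step described above can be sketched as follows, assuming a dense flow field for illustration (the paper uses sparse LK points): pixels whose motion agrees with the tracked person's global direction get a boosted back-projection value. The boost factor is a hypothetical choice.

```python
import numpy as np

def motion_weighted_backprojection(backproj, flow, direction, boost=1.5):
    """Boost back-projection values of pixels whose flow vector has a
    positive component along the person's global motion direction.
    backproj: (H, W) hue back-projection; flow: (H, W, 2) motion
    vectors; direction: unit 2-vector of the person's global motion.
    Returns the reweighted image clipped to [0, 255]."""
    agree = (flow @ np.asarray(direction, dtype=float)) > 0
    out = backproj.astype(float)
    out[agree] *= boost
    return np.clip(out, 0, 255)
```

In the full pipeline this reweighted image would also be masked by the foreground before being handed to CAMSHIFT.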
Figure 6. Original frame (left) and representation of the results of the LK algorithm (right).
3.2. Experimental Results
An example video is used first to test the simple CAMSHIFT algorithm without motion information. CAMSHIFT without motion information failed to track the persons after the occlusion, as shown in figure 7, which means color information is not enough to track a person through an occlusion. The same video is used to test our proposed tracking algorithm using both the color histogram and motion information; the video frames before and after the occlusion are shown in figure 8. We can see here an example of two persons crossing in opposite directions; the motion in this case allows us to track both persons after the occlusion. The tracking windows have the same tracking numbers before and after the occlusion, which shows that the algorithm can track a person even after an occlusion.
Figure 7. One frame with two persons before crossing (left) and the same persons just after crossing (right) (tracking using CAMSHIFT only).
Figure 8. One frame with two persons before crossing (left) and the same persons just after crossing (right) (tracking using CAMSHIFT plus motion information).
4. CONCLUSIONS AND FUTURE WORK
In this paper, a modified CAMSHIFT algorithm is presented. This algorithm uses color features like the classical CAMSHIFT algorithm, but motion (optical flow) information is also added to make it robust against occlusions. The algorithm is verified on a set of videos. As future work, the same algorithm will be extended to the multiple-camera scenario, also referred to as multiple camera tracking.
5. REFERENCES
[1] Veenman, C., Reinders, M., and Backer, E. 2001. Resolving motion correspondence for densely moving points. IEEE Trans. Patt. Analy. Mach. Intell. 23, 1, 54-72.
[2] Serby, D., Koller-Meier, S., and Gool, L. V. 2004. Probabilistic object tracking using multiple features. In IEEE International Conference on Pattern Recognition (ICPR), 184-187.
[3] Comaniciu, D., Ramesh, V., and Meer, P. 2003. Kernel-based object tracking. IEEE Trans. Patt. Analy. Mach. Intell. 25, 564-575.
[4] Yilmaz, A., Li, X., and Shah, M. 2004. Contour-based object tracking with occlusion handling in video acquired using mobile cameras. IEEE Trans. Patt. Analy. Mach. Intell. 26, 11, 1531-1536.
[5] Paschos, G. 2001. Perceptually uniform color spaces for color texture analysis: an empirical evaluation. IEEE Trans. Image Process. 10, 932-937.
[6] Canny, J. 1986. A computational approach to edge detection. IEEE Trans. Patt. Analy. Mach. Intell. 8, 6, 679-698.
[7] Horn, B. and Schunck, B. 1981. Determining optical flow. Artific. Intell. 17, 185-203.
[8] Kanade, T., Collins, R., Lipton, A., Burt, P., and Wixson, L. 1998. Advances in cooperative multi-sensor video surveillance. DARPA IU Workshop, 3-24.
[9] Black, M. and Anandan, P. 1996. The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. Comput. Vision Image Understand. 63, 1, 75-104.
[10] Haralick, R., Shanmugam, B., and Dinstein, I. 1973. Textural features for image classification. IEEE Trans. Syst. Man Cybern. 33, 3, 610-622.
[11] Laws, K. 1980. Textured image segmentation. PhD thesis, Electrical Engineering, University of Southern California.
[12] Salari, V. and Sethi, I. K. 1990. Feature point correspondence in the presence of occlusion. IEEE Trans. Patt. Analy. Mach. Intell. 12, 1, 87-91.
[13] Veenman, C., Reinders, M., and Backer, E. 2001. Resolving motion correspondence for densely moving points. IEEE Trans. Patt. Analy. Mach. Intell. 23, 1, 54-72.
[14] Broida, T. and Chellappa, R. 1986. Estimation of object motion parameters from noisy images. IEEE Trans. Patt. Analy. Mach. Intell. 8, 1, 90-99.
[15] Streit, R. L. and Luginbuhl, T. E. 1994. Maximum likelihood method for probabilistic multi-hypothesis tracking. In Proceedings of the International Society for Optical Engineering (SPIE), vol. 2235, 394-405.
[16] Comaniciu, D., Ramesh, V., and Meer, P. 2003. Kernel-based object tracking. IEEE Trans. Patt. Analy. Mach. Intell. 25, 564-575.
[17] Bradski, G. R. 1998. Real time face and object tracking as a component of a perceptual user interface. In Proceedings of the 4th IEEE Workshop on Applications of Computer Vision (WACV'98), October 19-21, 1998. IEEE Computer Society, Washington, DC, 214.
[18] Black, M. and Jepson, A. 1998. Eigentracking: Robust matching and tracking of articulated objects using a view-based representation. Int. J. Comput. Vision 26, 1, 63-84.
[19] Avidan, S. 2001. Support vector tracking. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 184-191.
[20] Isard, M. and Blake, A. 1998. Condensation - conditional density propagation for visual tracking. Int. J. Comput. Vision 29, 1, 5-28.
[21] Ronfard, R. 1994. Region based strategies for active contour models. Int. J. Comput. Vision 13, 2, 229-251.
[22] Huttenlocher, D., Noh, J., and Rucklidge, W. 1993. Tracking nonrigid objects in complex scenes. In IEEE International Conference on Computer Vision (ICCV), 93-101.
[23] Kang, J., Cohen, I., and Medioni, G. 2003. Continuous tracking within and across camera streams. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 267-272.
[24] Wren, C., Azarbayejani, A., Darrell, T., and Pentland, A. 1997. Pfinder: Real-time tracking of the human body. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 780-785.
[25] Monnet, A., Mittal, A., Paragios, N., and Ramesh, V. 2003. Background modeling and subtraction of dynamic scenes. Oct. 2003, vol. 2, pp. 1305-1312.
[26] Irani, M. and Anandan, P. 1998. Video indexing based on mosaic representations. Proceedings of the IEEE, vol. 86, no. 5, pp. 905-921, May 1998.
[27] Gao, X., Boult, T., Coetzee, F., and Ramesh, V. 2000. Error analysis of background adaption. Vol. 1, pp. 503-510.
[28] Stauffer, C. and Grimson, W. 2000. Learning patterns of activity using real-time tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 747-757, Aug 2000.
[29] Intel Corporation. 2001. Open Source Computer Vision Library Reference Manual.
[30] KaewTraKulPong, P. and Bowden, R. 2001. An improved adaptive background mixture model for real-time tracking with shadow detection. In Proc. 2nd European Workshop on Advanced Video-Based Surveillance Systems.
[31] Prati, A., Mikic, I., Trivedi, M. M., and Cucchiara, R. 2003. Detecting moving shadows: Algorithms and evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 7, pp. 918-923.
[32] Bouguet, J.-Y. 2000. Pyramidal implementation of the Lucas Kanade feature tracker: Description of the algorithm. Intel Corporation Microprocessor Research Labs.