

    People Tracking via a Modified CAMSHIFT Algorithm

Fahad Fazal Elahi Guraya1, Pierre-Yves Bayle2 and Faouzi Alaya Cheikh1

1Department of Computer Science and Media Technology, Gjovik University College, Gjovik 2802, Norway

2Department of Computer Science, Universite de Bourgogne, Dijon, France

    Email: [email protected]

    ABSTRACT

The past decade has witnessed a rapid growth in the use of video cameras in all aspects of our daily life. The use of video cameras for surveillance purposes has also increased. This has increased the demand for automatic video surveillance algorithms that can detect abnormal activities/events in the surveillance area and raise an alarm. Most event detection applications rely on person detection and tracking or object tracking algorithms. CAMSHIFT is a tracking algorithm that has already been used for face tracking. It has, however, not been successfully used for people/person tracking. This paper presents a modified version of the CAMSHIFT algorithm which can be used for person tracking. The proposed algorithm incorporates motion information into the CAMSHIFT algorithm, which allows it to track people successfully even in the case of occlusions. The experimental results demonstrate the consistency of the proposed algorithm.

Keywords: Object Tracking, CAMSHIFT, Motion estimation, Tracking.

1. INTRODUCTION

Object tracking is used for many applications such as motion-based recognition, automated surveillance, video indexing and retrieval, human-computer interaction, and traffic monitoring. Object tracking can be divided into a series of steps, such as object representation, feature selection for tracking, object detection, background subtraction, and object segmentation.

Objects can be represented with the help of a single point (e.g. the centroid) [1] or a set of points [2], by primitive geometric shapes (rectangles or ellipses) [3], or by silhouettes and contours for tracking complex non-rigid objects [4]. Feature selection also plays an important role in object tracking. The most commonly used features are color, edges, optical flow and texture [10, 11]. Color spaces other than RGB, such as HSV, are used for tracking because the RGB color space does not correspond to the color differences perceived by humans [5]. A commonly used edge detector is the Canny edge detector [6]. Optical flow can also be used as a feature for object tracking; some of the most widely used optical flow methods are [7, 8, 9].

Object trackers can be categorized into three different categories: point trackers, kernel trackers and silhouette trackers. Point tracking methods can be further sub-categorized into deterministic methods [12, 13] and statistical methods [14, 15]. Kernel-based object tracking methods include Mean Shift tracking [16] and Continuously Adaptive Mean Shift tracking (CAMSHIFT) [17]. Eigen-tracking [18] and Support Vector Machines [19] are kernel-based methods for multi-view appearance models. Silhouette tracking methods are divided into contour evolution methods [20, 21] and shape matching methods [22, 23].

Background modeling is a basic step in many video analysis applications, used to extract foreground or moving objects from the video frames. Changes in a scene or foreground objects can be extracted from a video sequence by subtracting the background image from each frame. In general the background is considered to be constant or slowly changing due to luminance changes. In practice the background pixels are always changing; for that reason we need a model which accounts for gradual changes. Several techniques have been proposed in the literature for modeling the variation of the background information [24, 25, 26]. In [24], modeling each stationary background pixel with a single 3D (YUV color-space) Gaussian was proposed. A single 3D (YUV) Gaussian, however, is not suitable for outdoor scenes [27], since at a certain location multiple colors can be seen due to repetitive motion, shadows or reflectance. Thus a Mixture of Gaussians (MoG) for modeling a single pixel color was proposed in [28].

In this paper, we use the MoG method for background modeling. Contour detection provided by the Intel Open Source Computer Vision Library (OpenCV) [29] is used for object detection and representation. CAMSHIFT kernel-based object tracking is used to track objects using color information in the Hue-Saturation-Value (HSV) color space, which is considered a better representative of the color perceived by human vision [5]. Motion information is also incorporated with the color information using the optical flow method of [32].

In the next section, the implementation of the tracking algorithms is presented. Section 3 presents the proposed CAMSHIFT algorithm with motion detection. Conclusions and future work are given in Section 4.

2. PEOPLE DETECTION AND TRACKING

2.1 Background Estimation and Post-Processing

To perform people tracking, we first use the Mixture of Gaussians (MoG) method to compute a good background model. The OpenCV functions are an implementation of the Gaussian Mixture Model in [30]. In this implementation, the model assumes that each pixel in the scene is modeled by a mixture of K Gaussian distributions, where different Gaussians represent different colors. The weight parameters of the mixture represent the time proportions that those colors stay in the scene. Thus, the probable background colors are the ones which stay longer and are more static. The probability that a certain pixel has a value $x_N$ at time N can be written as $p(x_N)$, as shown in eq.(1).

$p(x_N) = \sum_{j=1}^{K} w_j \,\eta(x_N; \theta_j)$   (1)

where $w_j$ is the weight parameter of the jth Gaussian component and $\eta(x_N; \theta_j)$ is the normal distribution of the jth component. Static single-color objects tend to form tight clusters in the color space, while moving ones form widened clusters due to the different reflecting surfaces encountered during the movement. The measure of this was called the fitness value [30]. The K distributions are ordered based on fitness, and the first B distributions are used as a model of the background of the scene, where B is estimated as shown in eq.(2).

$B = \arg\min_b \left( \sum_{j=1}^{b} w_j > T \right)$   (2)

The threshold T is the minimum fraction of the background model; in other words, it is the minimum prior probability that the background is in the scene. Background subtraction is performed by marking as a foreground pixel any pixel that is more than 2.5 standard deviations away from any of the B distributions. Figure 1 shows the original frame and the background obtained after applying the Gaussian mixture model.

Figure 1. An example of the background results obtained with the Adaptive Gaussian Mixture Model.
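For illustration, a minimal sketch of this background modeling step using the modern OpenCV Python bindings is given below. The paper's implementation used the C API of [29]; the function `cv2.createBackgroundSubtractorMOG2`, the parameter values and the input file name are our assumptions, not the authors' code.

```python
import cv2

# Hypothetical input clip; the paper's own test sequences are not available.
cap = cv2.VideoCapture("surveillance.avi")

# Per-pixel Gaussian mixture background model. `history` controls how long a
# color must persist before it is absorbed into the background; `varThreshold`
# is the squared distance used to match a pixel to a mixture component
# (2.5 standard deviations, as in the text, gives 2.5**2 = 6.25).
mog = cv2.createBackgroundSubtractorMOG2(history=200,
                                         varThreshold=2.5 ** 2,
                                         detectShadows=False)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = mog.apply(frame)              # 255 = foreground, 0 = background
    background = mog.getBackgroundImage()   # current background estimate
    cv2.imshow("foreground mask", fg_mask)
    if cv2.waitKey(30) == 27:               # Esc quits
        break

cap.release()
cv2.destroyAllWindows()
```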

Next, we need to remove the shadows in the foreground obtained with the previous background subtraction method. In [31] the authors assume that a pixel can be considered as shaded background, or shadow, if it has similar chromaticity but lower brightness than the same pixel in the background image. Thus, with an appropriate threshold T, we can remove shadows from the foreground image. Eq.(3) shows the decision whether a certain pixel belongs to a shadow or not.

$\mathrm{Shadow}(x,y) = \begin{cases} 1 & \text{if } \mathit{brightness}_{img} < \mathit{brightness}_{bg} \text{ and } |\mathit{chromaticity}_{img} - \mathit{chromaticity}_{bg}| < T \\ 0 & \text{otherwise} \end{cases}$   (3)
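A hedged sketch of this shadow test follows. The HSV brightness/chromaticity decomposition is in the spirit of [31], but the function name and threshold values are illustrative assumptions, not the paper's parameters.

```python
import cv2
import numpy as np

def suppress_shadows(frame_bgr, bg_bgr, fg_mask,
                     beta_low=0.4, beta_high=0.95, tau=12):
    """Relabel foreground pixels as shadow per eq.(3): lower brightness than
    the background but similar chromaticity (thresholds are illustrative)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV).astype(np.int32)
    hsv_bg = cv2.cvtColor(bg_bgr, cv2.COLOR_BGR2HSV).astype(np.int32)

    # Brightness test: darker than the background, but not completely dark.
    ratio = hsv[..., 2] / np.maximum(hsv_bg[..., 2], 1)
    darker = (ratio >= beta_low) & (ratio <= beta_high)

    # Chromaticity test: hue and saturation close to the background's.
    similar = (np.abs(hsv[..., 0] - hsv_bg[..., 0]) < tau) & \
              (np.abs(hsv[..., 1] - hsv_bg[..., 1]) < tau)

    cleaned = fg_mask.copy()
    cleaned[darker & similar & (fg_mask > 0)] = 0   # drop shadow pixels
    return cleaned
```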


2.2 CAMSHIFT Tracking

The CAMSHIFT algorithm [17] consists of the following steps:

(1) Set the calculation region of the probability distribution to the whole image.

(2) Select an initial location of the Mean Shift search window. The selected location is the target distribution to be tracked.

(3) Calculate a color probability distribution of the region centered at the Mean Shift search window.

(4) Iterate the Mean Shift algorithm to find the centroid of the probability image. Store the 0th moment (distribution area) and centroid location.

(5) For the following frame, center the search window at the mean location found in step 4 and set the window size to a function of the 0th moment. Then go to Step 3.
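As a concrete illustration of steps 1-5, the sketch below uses `cv2.CamShift` from the OpenCV Python bindings, which packages the mean shift iteration and the adaptive window resizing; the hue histogram and back-projection follow the description in the following paragraphs. Variable names and parameters are our assumptions, not the authors' implementation.

```python
import cv2

def camshift_track(frames, bbox, fg_mask):
    """Track one person from an initial bounding box (x, y, w, h), using the
    foreground mask of the first frame. Illustrative sketch only."""
    x, y, w, h = bbox
    hsv = cv2.cvtColor(frames[0], cv2.COLOR_BGR2HSV)

    # Steps 1-3: 1D hue histogram of the ROI, masked by the foreground and
    # rescaled to [0, 255] as in eq.(4).
    hist = cv2.calcHist([hsv[y:y+h, x:x+w]], [0],
                        fg_mask[y:y+h, x:x+w], [180], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    window = bbox
    for frame in frames[1:]:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # Back-projection: each pixel receives its hue bin's histogram value.
        backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], scale=1)
        # Steps 4-5: mean shift to the centroid, window resized by 0th moment.
        rot_rect, window = cv2.CamShift(backproj, window, term)
        yield rot_rect, window
```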

The creation of the color histogram corresponds to steps 1 to 3. The first step is to define the region of interest (ROI), which is the bounding box corresponding to the detected person that we want to track. Then we calculate the color histogram corresponding to this person. For that we use the HSV color space and calculate a one-dimensional histogram of the first component, hue. We also define a mask for the histogram calculation, which is the foreground image, so that the histogram is calculated only for the person and not for the background inside the bounding box. The results obtained with this method were not satisfying, however, because when the background has almost the same color as the person, it is not possible to tell the two apart in the back-projection image. That is why we then decided to use a three-dimensional histogram over the three components of the HSV color space: hue, saturation and value. With this method, we were able to find the location of the person in the whole frame, even with a similar background. In all cases the histogram bin values are scaled to be within the discrete pixel range of the 2D probability distribution image using eq.(4).

$\hat{p}_u = \min\left( \dfrac{255}{\max_u\{q_u\}}\, q_u,\; 255 \right), \quad u = 1, \ldots, m$   (4)

That is, the histogram bin values are rescaled to the range [0, 255], so that pixels with the highest probability of belonging to the sample histogram map to visible intensities in the 2D histogram back-projection image.
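The three-dimensional variant can be sketched as follows, again with the foreground image as the histogram mask; the bin counts are an illustrative choice, since the paper does not state them.

```python
import cv2

def hsv_histogram_3d(frame_bgr, fg_mask, bins=(16, 16, 16)):
    """3D hue-saturation-value histogram of the masked person region,
    rescaled to [0, 255] as in eq.(4). Illustrative sketch."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], fg_mask, list(bins),
                        [0, 180, 0, 256, 0, 256])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    return hist

# Back-projection of a later frame `hsv_next` over all three channels:
# backproj = cv2.calcBackProject([hsv_next], [0, 1, 2], hist,
#                                [0, 180, 0, 256, 0, 256], scale=1)
```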

Now that we have a histogram of the moving person, we need to find this person in all the frames (steps 4 to 5). To that end, we calculate the back-projection of the histogram in the subsequent frame. For each pixel of the input image we put, in the back-projection image, the value of the histogram bin corresponding to that pixel. In terms of statistics, the value of each output image pixel is the probability that the observed pixel belongs to the tracked object, given the distribution (histogram). Finally, using the previous location of the person, we detect the new position of the moving person and use it as the starting search window for the next frame. The search window center is computed from:

$M_{00} = \sum_x \sum_y I(x,y)$   (5)

$M_{10} = \sum_x \sum_y x\, I(x,y)$   (6)

$M_{01} = \sum_x \sum_y y\, I(x,y)$   (7)

$x_c = \dfrac{M_{10}}{M_{00}}; \quad y_c = \dfrac{M_{01}}{M_{00}}$   (8)

where $M_{00}$ in eq.(5) is the zeroth moment, and $M_{10}$ in eq.(6) and $M_{01}$ in eq.(7) are the first moments; $I(x,y)$ is the back-projection (probability) image. These moments are used to compute the next center position of the tracking window, $x_c$ and $y_c$, as shown in eq.(8). Then, back at step 3, we calculate the new histogram of the person to update the previous one, using a slow update, to preserve the differences between persons when they overlap. The tracked persons and their corresponding tracking windows obtained with the CAMSHIFT algorithm are shown in figure 5.
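The moment computation of eqs.(5)-(8) and the slow histogram update can be sketched as below; `alpha` is an assumed update rate, since the paper does not give its value.

```python
import cv2
import numpy as np

def window_centroid(backproj, window):
    """Centroid of the back-projection inside the search window, eqs.(5)-(8)."""
    x, y, w, h = window
    m = cv2.moments(backproj[y:y+h, x:x+w])   # provides M00, M10, M01, ...
    if m["m00"] == 0:
        return None
    return x + m["m10"] / m["m00"], y + m["m01"] / m["m00"]

def slow_update(hist, new_hist, alpha=0.05):
    """Blend the new histogram into the old one slowly, so overlapping persons
    keep distinct models (alpha is an illustrative rate)."""
    return (1.0 - alpha) * hist + alpha * new_hist
```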

But when two persons cross, it can happen that both tracking windows follow the person in the foreground, for example when the two persons have similar colors.

Figure 5. One frame with the tracking window of each tracked person, using CAMSHIFT tracking.

In order to solve this problem, we decided to add more information to the back-projection image, namely motion coherence. When two persons are moving in two opposite directions, the motion allows us to follow the right person.

3. CAMSHIFT WITH MOTION (PEOPLE TRACKING USING OPTICAL FLOW)

In order to improve the results of the CAMSHIFT algorithm described above, we use an optical flow algorithm (the Lucas-Kanade method) [32] to determine the motion of the tracked persons.

3.1. Lucas-Kanade Algorithm

The basic idea of the LK algorithm rests on three assumptions:

(1) Brightness constancy: A pixel from the image of an object in the scene does not change in appearance as it (possibly) moves from frame to frame. For grayscale images, this means we assume that the brightness of a pixel does not change as it is tracked from frame to frame.

(2) Temporal persistence or small movements: The image motion of a surface patch changes slowly in time. In practice, this means the temporal increments are fast enough relative to the scale of motion in the image that the object does not move much from frame to frame.

(3) Spatial coherence: Neighboring points in a scene belong to the same surface, have similar motion, and project to nearby points on the image plane.

The OpenCV function implements a sparse iterative version of Lucas-Kanade optical flow in pyramids [32]. It calculates the coordinates of the feature points on the current video frame given their coordinates on the previous frame. The function finds the coordinates with sub-pixel accuracy.

So the aim of the optical flow method is, for a given set of points in a video frame, to find those same points in the next frame. That is, for a given point $[u_x, u_y]^T$ in frame $F_1$, find the point $[u_x + \delta_x,\, u_y + \delta_y]^T$ in frame $F_2$ that minimizes the residual $\varepsilon$ shown in eq.(9).

$\varepsilon(\delta_x, \delta_y) = \sum_{x=u_x-w_x}^{u_x+w_x} \; \sum_{y=u_y-w_y}^{u_y+w_y} \big( F_1(x,y) - F_2(x+\delta_x,\, y+\delta_y) \big)^2$   (9)
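A minimal sketch of this step with OpenCV's pyramidal LK implementation [32] is given below; the Shi-Tomasi corner detector and all parameter values are our assumptions.

```python
import cv2
import numpy as np

def sparse_flow(prev_bgr, next_bgr):
    """Track good features from one frame to the next with pyramidal LK.
    Returns matched (previous, current) point pairs. Illustrative sketch."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)

    # "Good features to track" (figure 6) found as Shi-Tomasi corners.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=300,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return np.empty((0, 2)), np.empty((0, 2))

    # Sparse iterative pyramidal LK; winSize is the integration window
    # (w_x, w_y) of eq.(9).
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts,
                                                 None, winSize=(15, 15),
                                                 maxLevel=3)
    ok = status.ravel() == 1
    return pts[ok].reshape(-1, 2), nxt[ok].reshape(-1, 2)
```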

Figure 6 presents the results of the Lucas-Kanade algorithm. The current frame is shown in figure 6 (left), and arrows are drawn between the previous and next positions of the pixels corresponding to the good features to track in figure 6 (right).

To combine the previous histogram and the motion calculated by the LK optical flow method, we first calculate the histogram and then calculate the back-projection image using this histogram. Before finding the new location of the person with CAMSHIFT, we update the back-projection image using the motion information. For that, we calculate the global motion of each person and, for each back-projection image calculated using the histogram, we update each point calculated with the LK algorithm, putting a higher value on pixels moving in the same direction as the person. Doing this, we are able to follow two persons crossing and to know who is who when they keep going in opposite directions. With this method we also need to use the first type of histogram, namely the one-dimensional histogram (hue component), because it was difficult to combine information from the 3D histogram with motion. Using this method, we still have the problem of the background being similar to the person's color; to solve it, we combine the back-projection image with the foreground mask, to keep only the back-projection of the foreground.
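A hedged sketch of this combination step: back-projection pixels at feature points whose LK flow agrees with the person's global motion direction are boosted, and the result is masked by the foreground. The boost factor and the cosine agreement test are illustrative assumptions; the paper only states that such pixels receive a higher value.

```python
import numpy as np

def motion_weighted_backproj(backproj, fg_mask, prev_pts, next_pts,
                             person_dir, boost=2.0):
    """Raise back-projection values at features moving in the person's global
    direction, then keep only foreground pixels. Illustrative sketch."""
    out = backproj.astype(np.float32)
    flow = next_pts - prev_pts                    # per-feature motion vectors
    norm = np.linalg.norm(person_dir) + 1e-6
    for (x, y), v in zip(next_pts.astype(int), flow):
        # Cosine test: does this feature move like the tracked person?
        agree = np.dot(v, person_dir) / (norm * (np.linalg.norm(v) + 1e-6))
        if agree > 0.5 and 0 <= y < out.shape[0] and 0 <= x < out.shape[1]:
            out[y, x] *= boost
    out = np.clip(out, 0, 255).astype(np.uint8)
    return np.where(fg_mask > 0, out, 0)          # keep only the foreground
```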

Figure 6. Original frame (left) and representation of the results of the LK algorithm (right).

    3.2. Experimental Results

An example video is used to first test the simple CAMSHIFT algorithm without motion information. CAMSHIFT without motion information failed to track the persons after the occlusion, as shown in figure 7, which means color information is not enough to track a person through occlusion. The same video is used to test our proposed tracking algorithm using both the color histogram and motion information; the video frames before and after the occlusion are shown in figure 8. We can see here an example of two persons crossing in opposite directions; the motion in this case allows us to track both persons after the occlusion. The tracking windows keep the same tracking numbers before and after the occlusion, which shows that the algorithm can track a person even after occlusion.

Figure 7. One frame with two persons before crossing (left) and the same persons just after crossing (right) (tracking using CAMSHIFT only).

Figure 8. One frame with two persons before crossing (left) and the same persons just after crossing (right) (tracking using CAMSHIFT plus motion information).

4. CONCLUSIONS AND FUTURE WORK

In this paper, a modified CAMSHIFT algorithm is presented. The algorithm uses color features like the classical CAMSHIFT algorithm, but motion (optical flow) information is added to make it robust against occlusions. The algorithm is verified on a set of videos. As future work, the same algorithm will be extended to the multiple-camera scenario, also referred to as multiple-camera tracking.

5. REFERENCES

[1] Veenman, C., Reinders, M., and Backer, E. 2001. Resolving motion correspondence for densely moving points. IEEE Trans. Patt. Analy. Mach. Intell. 23(1), 54-72.

[2] Serby, D., Koller-Meier, S., and Gool, L. V. 2004. Probabilistic object tracking using multiple features. In IEEE International Conference on Pattern Recognition (ICPR). 184-187.

[3] Comaniciu, D., Ramesh, V., and Meer, P. 2003. Kernel-based object tracking. IEEE Trans. Patt. Analy. Mach. Intell. 25, 564-575.

[4] Yilmaz, A., Li, X., and Shah, M. 2004. Contour-based object tracking with occlusion handling in video acquired using mobile cameras. IEEE Trans. Patt. Analy. Mach. Intell. 26(11), 1531-1536.

[5] Paschos, G. 2001. Perceptually uniform color spaces for color texture analysis: an empirical evaluation. IEEE Trans. Image Process. 10, 932-937.

[6] Canny, J. 1986. A computational approach to edge detection. IEEE Trans. Patt. Analy. Mach. Intell. 8(6), 679-698.

[7] Horn, B. and Schunck, B. 1981. Determining optical flow. Artific. Intell. 17, 185-203.

[8] Kanade, T., Collins, R., Lipton, A., Burt, P., and Wixson, L. 1998. Advances in cooperative multi-sensor video surveillance. DARPA IU Workshop. 3-24.

[9] Black, M. and Anandan, P. 1996. The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. Comput. Vision Image Understand. 63(1), 75-104.

[10] Haralick, R., Shanmugam, B., and Dinstein, I. 1973. Textural features for image classification. IEEE Trans. Syst. Man Cybern. 33(3), 610-622.

[11] Laws, K. 1980. Textured image segmentation. PhD thesis, Electrical Engineering, University of Southern California.

[12] Salari, V. and Sethi, I. K. 1990. Feature point correspondence in the presence of occlusion. IEEE Trans. Patt. Analy. Mach. Intell. 12(1), 87-91.

[13] Veenman, C., Reinders, M., and Backer, E. 2001. Resolving motion correspondence for densely moving points. IEEE Trans. Patt. Analy. Mach. Intell. 23(1), 54-72.

[14] Broida, T. and Chellappa, R. 1986. Estimation of object motion parameters from noisy images. IEEE Trans. Patt. Analy. Mach. Intell. 8(1), 90-99.

[15] Streit, R. L. and Luginbuhl, T. E. 1994. Maximum likelihood method for probabilistic multi-hypothesis tracking. In Proceedings of the International Society for Optical Engineering (SPIE), vol. 2235. 394-405.

[16] Comaniciu, D., Ramesh, V., and Meer, P. 2003. Kernel-based object tracking. IEEE Trans. Patt. Analy. Mach. Intell. 25, 564-575.

[17] Bradski, G. R. 1998. Real time face and object tracking as a component of a perceptual user interface. In Proceedings of the 4th IEEE Workshop on Applications of Computer Vision (WACV'98) (October 19-21, 1998). IEEE Computer Society, Washington, DC, 214.

[18] Black, M. and Jepson, A. 1998. Eigentracking: Robust matching and tracking of articulated objects using a view-based representation. Int. J. Comput. Vision 26(1), 63-84.

[19] Avidan, S. 2001. Support vector tracking. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 184-191.

[20] Isard, M. and Blake, A. 1998. Condensation - conditional density propagation for visual tracking. Int. J. Comput. Vision 29(1), 5-28.

[21] Ronfard, R. 1994. Region based strategies for active contour models. Int. J. Comput. Vision 13(2), 229-251.

[22] Huttenlocher, D., Noh, J., and Rucklidge, W. 1993. Tracking nonrigid objects in complex scenes. In IEEE International Conference on Computer Vision (ICCV). 93-101.

[23] Kang, J., Cohen, I., and Medioni, G. 2003. Continuous tracking within and across camera streams. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 267-272.

[24] Wren, C., Azarbayejani, A., Darrell, T., and Pentland, A. 1997. Pfinder: Real-time tracking of the human body. IEEE Trans. Patt. Analy. Mach. Intell. 19, 780-785.

[25] Monnet, A., Mittal, A., Paragios, N., and Ramesh, V. 2003. Background modeling and subtraction of dynamic scenes. Oct. 2003, vol. 2, 1305-1312.

[26] Irani, M. and Anandan, P. 1998. Video indexing based on mosaic representations. Proceedings of the IEEE 86(5), 905-921.

[27] Gao, X., Boult, T., Coetzee, F., and Ramesh, V. 2000. Error analysis of background adaption. Vol. 1, 503-510.

[28] Stauffer, C. and Grimson, W. 2000. Learning patterns of activity using real-time tracking. IEEE Trans. Patt. Analy. Mach. Intell. 22(8), 747-757.

[29] Intel Corporation. 2001. Open Source Computer Vision Library Reference Manual.

[30] KaewTraKulPong, P. and Bowden, R. 2001. An improved adaptive background mixture model for real-time tracking with shadow detection. In Proc. 2nd European Workshop on Advanced Video-Based Surveillance Systems.

[31] Prati, A., Mikic, I., Trivedi, M. M., and Cucchiara, R. 2003. Detecting moving shadows: Algorithms and evaluation. IEEE Trans. Patt. Analy. Mach. Intell. 25(7), 918-923.

[32] Bouguet, J.-Y. 2000. Pyramidal implementation of the Lucas Kanade feature tracker: Description of the algorithm. Intel Corporation Microprocessor Research Labs.