[lecture notes in computer science] innovations in applied artificial intelligence volume 3533 ||...

M. Ali and F. Esposito (Eds.): IEA/AIE 2005, LNAI 3533, pp. 26 – 35, 2005. © Springer-Verlag Berlin Heidelberg 2005

Object Tracking Using Mean Shift and Active Contours

Jae Sik Chang1, Eun Yi Kim2, KeeChul Jung3, and Hang Joon Kim1

1 Dept. of Computer Engineering, Kyungpook National Univ., South Korea {jschang, hjkim}@ailab.knu.ac.kr

2 School of Internet and Multimedia, NITRI (Next-Generation Innovative Technology Research Institute), Konkuk Univ., South Korea [email protected]

3 School of Media, College of Information Science, Soongsil University [email protected]

Abstract. Active contour based tracking methods have been widely used for object tracking because of two advantages: 1) they describe complex object boundaries effectively, and 2) they can track a dynamic object boundary. However, their tracking results are very sensitive to the location of the initial curve. An initial curve far from the object incurs a heavier computational cost and lower accuracy, and may lose a highly active object altogether. Therefore, this paper presents an object tracking method using a mean shift algorithm and active contours. The proposed method consists of two steps: object localization and object extraction. In the first step, the object location is estimated using mean shift. In the second step, the initial curve is placed at that location and evolved using an active contour model. To assess the effectiveness of the proposed method, it is applied to synthetic sequences and real image sequences that include moving objects.

1 Introduction

An active contour model is a description of an object boundary that is iteratively adjusted until it matches the object of interest [1]. Recently, such models have been used successfully for object detection and tracking because of their elasticity and their ability to describe curves effectively. They have been applied in many areas, such as detection and tracking of non-rigid objects (hands, pedestrians, etc.) and shape warping systems [2, 3, 4].

In tracking approaches based on active contour models, object tracking is treated as a curve evolution problem: the initial curve, initialized with the object boundary from the previous frame, is evolved until it matches the object boundary of interest [2, 3]. Generally, the curve evolution is computed in a narrow band around the current curve. This small computation area keeps the computational cost low, and an initial curve near the object boundary practically guarantees that the curve converges to the object boundary. However, the tracking results are very sensitive to the conditions of the initial curve, such as its location, scale, and shape. Among these conditions, the location of the initial curve has a strong effect on the results. An initial curve far from the object requires more computation to converge and induces errors, such as noise and holes whose features are similar to those of the object boundary. Moreover, it loses highly active objects that have large movements.


Accordingly, this paper proposes a method for object tracking using the mean shift algorithm and active contours. The method consists of two steps: object localization and object extraction. In the first step, the object location is estimated using mean shift. In the second step, the initial curve is placed at that location and evolved using an active contour model. The proposed method not only retains the advantages of curve evolution based approaches but also adds robustness to large object motion.

The remainder of the paper is organized as follows. Section 2 describes how to localize the object using the mean shift algorithm, and Section 3 presents the active contour based object extraction method. Experimental results are presented in Section 4. Finally, Section 5 concludes the paper.

2 Object Localization

2.1 Mean Shift Algorithm

The mean shift algorithm is a nonparametric technique that climbs the gradient of a probability distribution to find the nearest dominant mode (peak) [5, 6]. The algorithm has recently been adopted as an efficient technique for object tracking [6, 7].

The algorithm operates on an object probability distribution {P(Iij|αo)}, i = 1,…,IW, j = 1,…,IH (IW: image width, IH: image height), which represents the probability of pixel (i,j) in the image being part of the object, where αo denotes its parameters and I is a photometric variable. At each iteration, the search window location is simply replaced by the centroid of this distribution within the window, computed as follows [5, 6, 7]:

x = M10/M00 and y = M01/M00 , (1)

where Mab is the (a + b)th moment as defined by

$$M_{ab}(W) = \sum_{i,j \in W} i^{a} j^{b}\, P(I_{ij} \mid \alpha_o).$$

The object location is obtained by successive computations of the search window location (x,y).
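As an illustration of Eq. (1), the following minimal sketch computes the zeroth and first moments of a probability map inside a window and returns the centroid. The names `window_moments` and `window_centroid`, the row-major layout `prob[j][i]`, and the `(x0, y0, width, height)` window format are assumptions made for this example and do not come from the paper.

```python
def window_moments(prob, window):
    """Return (M00, M10, M01) of `prob` over window = (x0, y0, width, height).

    `prob[j][i]` is assumed to hold P(I_ij | alpha_o) for column i and row j.
    """
    x0, y0, w, h = window
    m00 = m10 = m01 = 0.0
    # Clip the window to the image so that partial windows are handled.
    for j in range(max(y0, 0), min(y0 + h, len(prob))):
        for i in range(max(x0, 0), min(x0 + w, len(prob[0]))):
            p = prob[j][i]
            m00 += p
            m10 += i * p
            m01 += j * p
    return m00, m10, m01


def window_centroid(prob, window):
    """Eq. (1): x = M10 / M00, y = M01 / M00; None if the window is empty."""
    m00, m10, m01 = window_moments(prob, window)
    if m00 == 0.0:
        return None
    return m10 / m00, m01 / m00
```

Returning `None` for an empty window is a design choice of this sketch; the paper does not say how that case is handled.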

2.2 Object Localization Using Mean Shift

The mean shift algorithm for object localization is as follows:

1. Set up initial location and size of search window W and repeat Steps 2 to 4 until terminal condition is satisfied.

2. Generate a distribution over a photometric variable, object probability distribution, within W.

3. Estimate the search window location using Eq. (1).

4. (In the second iteration, reset the size of W to the bounding-box size of the initial curve.)

5. Output the window location as the object location.

If the variation of the window location is smaller than a threshold value, then the terminal condition is satisfied.
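The localization steps above can be sketched as the loop below. It is only an illustration under several assumptions: the object probability map `prob` is taken as precomputed for the whole image (Step 2 is therefore not shown), the window is centered on the current location, and the function name, the `later_size` parameter, the default `threshold`, and the `max_iter` safeguard are hypothetical.

```python
import math


def mean_shift_localize(prob, center, size, later_size=None,
                        threshold=0.5, max_iter=50):
    """Iterate the centroid update of Eq. (1) until the window stops moving.

    prob       : 2D list, prob[j][i] = object probability at column i, row j
    center     : (x, y) initial window location
    size       : (width, height) of the window in the first iteration
    later_size : window size used from the second iteration on (Step 4)
    """
    cx, cy = float(center[0]), float(center[1])
    w, h = size
    rows, cols = len(prob), len(prob[0])
    for iteration in range(max_iter):
        if iteration == 1 and later_size is not None:
            w, h = later_size                      # Step 4
        x0 = int(round(cx - w / 2.0))
        y0 = int(round(cy - h / 2.0))
        m00 = m10 = m01 = 0.0
        for j in range(max(y0, 0), min(y0 + h, rows)):
            for i in range(max(x0, 0), min(x0 + w, cols)):
                p = prob[j][i]
                m00 += p
                m10 += i * p
                m01 += j * p
        if m00 == 0.0:                             # nothing to follow
            break
        new_cx, new_cy = m10 / m00, m01 / m00      # Step 3, Eq. (1)
        shift = math.hypot(new_cx - cx, new_cy - cy)
        cx, cy = new_cx, new_cy
        if shift < threshold:                      # termination condition
            break
    return cx, cy                                  # Step 5
```

In a full implementation, `prob` would be regenerated only inside W at every iteration, which is where the computational saving discussed in this section comes from.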


In the mean shift algorithm, instead of calculating the object probability distribution over the whole image, the calculation can be restricted to a smaller image region within the search window. This results in significant computational savings when the object does not dominate the image [5].

2.3 Adaptation of Search Window Size

The search window size of the general mean shift algorithm is determined according to the object size. This is effective for tracking an object whose motion is smaller than the object size. However, in many cases, objects exhibit large motion because of their activity or a low frame rate. A search window smaller than the object motion fails to track the object. Accordingly, in this paper, the size of the search window in the first iteration of the mean shift algorithm is adaptively determined in direct proportion to the amount of the object's motion, as follows:

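$$W_{width} = \max\left(\alpha\left(m_x^{t} - m_x^{t-1} - B_{width}\right),\, 0\right) + \beta B_{width} \quad \text{and}$$

$$W_{height} = \max\left(\alpha\left(m_y^{t} - m_y^{t-1} - B_{height}\right),\, 0\right) + \beta B_{height}, \tag{2}$$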

where α and β are constants and the superscript of m denotes the frame index.
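A small numerical sketch of Eq. (2) may help to see how the window grows with motion. It assumes that m is the object location in a frame and that B is the bounding-box size of the curve; neither is defined explicitly in the text, and the function name and default constants are hypothetical.

```python
def first_iteration_window(m_curr, m_prev, bbox, alpha=1.0, beta=1.0):
    """Search window size for the first mean shift iteration, after Eq. (2).

    m_curr, m_prev : (x, y) locations with frame indices t and t-1 (assumed)
    bbox           : (B_width, B_height), assumed bounding-box size
    alpha, beta    : the two constants of Eq. (2); defaults are placeholders
    """
    (mx_t, my_t), (mx_p, my_p) = m_curr, m_prev
    b_w, b_h = bbox
    # The displacement enters with its sign, exactly as written in Eq. (2).
    w = max(alpha * (mx_t - mx_p - b_w), 0.0) + beta * b_w
    h = max(alpha * (my_t - my_p - b_h), 0.0) + beta * b_h
    return w, h
```

For example, with alpha = 2, beta = 1.5, a 10 x 8 bounding box, and a displacement of (30, 2) pixels, the window becomes 55 x 12: the width is enlarged because the horizontal motion exceeds the box width, while the height stays at beta times the box height.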

3 Object Extraction

3.1 Active Contours Based on Region Competition

Zhu and Yuille proposed a hybrid approach to image segmentation, called region competition [8]. Their basic functional is as follows:

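$$E[\Gamma, \{\alpha_i\}] = \sum_{i=1}^{M} \left\{ \frac{\mu}{2} \int_{\partial R_i} ds - \log P\left(\{I_s : s \in R_i\} \mid \alpha_i\right) + \lambda \right\}, \tag{3}$$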

where Γ is the boundary in the image, P(·) is a specific distribution for region Ri, αi denotes its parameters, M is the number of regions, s is a site in the image coordinate system, and µ and λ are two constants.

To minimize the energy E, steepest descent is performed with respect to the boundary Γ. For any point $\vec{v}$ on the boundary Γ we obtain:

$$\frac{d\vec{v}}{dt} = -\frac{\delta E[\Gamma, \{\alpha_i\}]}{\delta \vec{v}}, \tag{4}$$

where the right-hand side is (minus) the functional derivative of the energy E. Taking the functional derivative yields the motion equation for point $\vec{v}$:

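$$\frac{d\vec{v}}{dt} = \sum_{k \in Q(\vec{v})} \left\{ -\frac{\mu}{2}\, k_{k(\vec{v})}\, \vec{n}_{k(\vec{v})} + \log P\left(I_{(\vec{v})} \mid \alpha_k\right) \vec{n}_{k(\vec{v})} \right\}, \tag{5}$$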

where $Q(\vec{v}) = \{k \mid \vec{v} \text{ lies on } \Gamma_k\}$, i.e., the summation is done over those regions Rk for which $\vec{v}$ is on Γk. $k_{k(\vec{v})}$ is the curvature of Γk at point $\vec{v}$ and $\vec{n}_{k(\vec{v})}$ is the unit normal to Γk at point $\vec{v}$.


Region competition contains many of the desirable properties of region growing and active contours. Indeed, many aspects of these models can be derived as special cases of region competition [8, 9]. Active contours are the special case in which there are two regions (object region Ro and background region Rb) and a common boundary Γ, as follows:

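$$\frac{d\vec{v}}{dt} = -\mu\, k_{o(\vec{v})}\, \vec{n}_{o(\vec{v})} + \left( \log P\left(I_{(\vec{v})} \mid \alpha_o\right) - \log P\left(I_{(\vec{v})} \mid \alpha_b\right) \right) \vec{n}_{o(\vec{v})} \tag{6}$$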
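The reduction from Eq. (5) to Eq. (6) can be checked in a few lines. The sketch below assumes that both normals are taken outward from their own region, so that on the common boundary the background normal and curvature are the negatives of the object ones; this convention is the usual one in region competition but is not spelled out in the text.

```latex
% Two regions sharing one boundary: Q(\vec{v}) = \{o, b\}, with
% \vec{n}_{b(\vec{v})} = -\vec{n}_{o(\vec{v})} and k_{b(\vec{v})} = -k_{o(\vec{v})}.
\begin{aligned}
\frac{d\vec{v}}{dt}
 &= -\frac{\mu}{2}\left( k_{o(\vec{v})}\vec{n}_{o(\vec{v})} + k_{b(\vec{v})}\vec{n}_{b(\vec{v})} \right)
    + \log P(I_{(\vec{v})} \mid \alpha_o)\,\vec{n}_{o(\vec{v})}
    + \log P(I_{(\vec{v})} \mid \alpha_b)\,\vec{n}_{b(\vec{v})} \\
 &= -\frac{\mu}{2}\left( 2\,k_{o(\vec{v})}\vec{n}_{o(\vec{v})} \right)
    + \left( \log P(I_{(\vec{v})} \mid \alpha_o) - \log P(I_{(\vec{v})} \mid \alpha_b) \right)\vec{n}_{o(\vec{v})} \\
 &= -\mu\,k_{o(\vec{v})}\vec{n}_{o(\vec{v})}
    + \left( \log P(I_{(\vec{v})} \mid \alpha_o) - \log P(I_{(\vec{v})} \mid \alpha_b) \right)\vec{n}_{o(\vec{v})}.
\end{aligned}
```

The two curvature contributions add up because the sign changes of the curvature and of the normal cancel, while the two log-likelihood terms end up competing with opposite signs along the object normal.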

3.2 Level Set Implementation

The active contour evolution was implemented using the level set technique. We represent curve Γ implicitly by the zero level set of function u : ℜ2 → ℜ, with the region inside Γ corresponding to u > 0. Accordingly, Eq. (6) can be rewritten by the following equation, which is a level set evolution equation [2, 3]:

$$\frac{du(s)}{dt} = -\mu\, k_s \|\nabla u\| + \left( \log P(I_s \mid \alpha_o) - \log P(I_s \mid \alpha_b) \right) \|\nabla u\|, \tag{7}$$

where

$$k_s = \frac{u_{xx} u_y^2 - 2 u_x u_y u_{xy} + u_{yy} u_x^2}{\left(u_x^2 + u_y^2\right)^{3/2}}.$$

The curve evolution is achieved by iterative calculation of the level values u(s) using Eq. (7). The stopping criterion is satisfied when the difference in the number of pixels inside the curve between successive iterations is less than a threshold value. The threshold is a constant chosen experimentally.
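One explicit update of Eq. (7) can be sketched as follows, using central differences on a pixel grid. This is an illustration under stated assumptions rather than the authors' implementation: the whole grid is updated instead of a narrow band, border pixels are left unchanged, and the values of `mu` and `dt` are placeholders. In addition, the curvature is taken with respect to the outward normal of the zero level set, which for u > 0 inside is minus the second-derivative expression for k_s applied to u; this sign choice is an assumption made here so that the curvature term shortens the curve, as in Eq. (6).

```python
import math


def curvature_and_gradient(u, x, y):
    """Return (k, |grad u|) at interior pixel (x, y) by central differences."""
    ux = (u[y][x + 1] - u[y][x - 1]) / 2.0
    uy = (u[y + 1][x] - u[y - 1][x]) / 2.0
    uxx = u[y][x + 1] - 2.0 * u[y][x] + u[y][x - 1]
    uyy = u[y + 1][x] - 2.0 * u[y][x] + u[y - 1][x]
    uxy = (u[y + 1][x + 1] - u[y + 1][x - 1]
           - u[y - 1][x + 1] + u[y - 1][x - 1]) / 4.0
    grad = math.hypot(ux, uy)
    if grad < 1e-12:            # flat spot: curvature is undefined, use 0
        return 0.0, 0.0
    expr = (uxx * uy * uy - 2.0 * ux * uy * uxy + uyy * ux * ux) / grad ** 3
    return -expr, grad          # sign flipped for u > 0 inside (assumption)


def level_set_step(u, log_po, log_pb, mu=0.1, dt=0.2):
    """One explicit Euler step of Eq. (7) on all interior pixels."""
    h, w = len(u), len(u[0])
    new_u = [row[:] for row in u]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            k, grad = curvature_and_gradient(u, x, y)
            speed = -mu * k + (log_po[y][x] - log_pb[y][x])
            new_u[y][x] = u[y][x] + dt * speed * grad
    return new_u


def count_inside(u):
    """Number of pixels inside the curve (u > 0), used by the stopping test."""
    return sum(1 for row in u for value in row if value > 0.0)
```

The stopping criterion described above then amounts to comparing `count_inside` before and after a step against the experimentally chosen threshold.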

3.3 Object Extraction Using Active Contours

The aim of object extraction is to find a closed curve that separates the image into object and background regions. The object to be tracked is assumed to be characterized by a probability distribution, the object probability distribution P(Is| αo), over some variable such as intensity, color, or texture. Unlike the object region, the background is difficult to characterize with a simple probability distribution. Because backgrounds vary widely, their distribution is not clustered in a small area of the feature space; rather, it is spread roughly uniformly across the whole space. We therefore assume that the photometric variable of the background is uniformly distributed in the feature space, so that the distribution P(Is| αb) can be taken as proportional to a constant value.
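Under this assumption the region term of Eq. (7) takes a particularly simple form. Writing the constant background density as c (a symbol introduced here only for illustration), the evolution can be sketched as:

```latex
\frac{du(s)}{dt}
  = -\mu\, k_s \|\nabla u\|
    + \bigl( \log P(I_s \mid \alpha_o) - \log c \bigr) \|\nabla u\|
```

Ignoring the curvature term, u increases at sites where P(Is| αo) > c and decreases where P(Is| αo) < c. Since the inside of the curve corresponds to u > 0, the constant c effectively acts as a threshold on the object probability that decides whether the curve expands or contracts at a site.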

Active contour model based object boundary extraction algorithm is as follows:

1. Set up initial level values u, and repeat Steps 2 to 3 until the terminal condition is satisfied.

2. Update level values using Eq. (7) within a narrow band around the curve (the zero level set).

3. Reconstruct the evolved curve (the zero level set).

4. Output the final evolved curve as the object boundary.


To set up the initial level values, we use a Euclidean distance mapping technique: the Euclidean distance between each pixel of the image and the initial curve is assigned to the pixel as its level value. In general active contours, the search area for the optimal boundary curve is restricted to a narrow band around the curve. This not only saves computational cost but also avoids local optima when the initial curve is near the object boundary. However, it makes the evolving curve miss the boundary when the curve is far from the object.

After updating the level values, the approximated final propagated curve, the zero level set, is reconstructed. Curve reconstruction is accomplished by determining the zero crossing grid locations in the level set function. The terminal condition is satisfied when the difference in the number of pixels inside contour Γ between successive iterations is less than a manually chosen threshold value.
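The distance-map initialization can be illustrated with a brute-force sketch. It assumes that the initial curve is given as a binary mask of the enclosed region and that the distance is signed, positive inside and negative outside, to match the convention u > 0 inside used in Section 3.2; the text itself only speaks of the Euclidean distance. The function names are hypothetical, and a real implementation would use a faster distance transform.

```python
import math


def boundary_pixels(mask):
    """Inside pixels that have at least one 4-neighbour outside the region."""
    h, w = len(mask), len(mask[0])
    points = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < w and 0 <= ny < h) or not mask[ny][nx]:
                    points.append((x, y))
                    break
    return points


def init_level_values(mask):
    """Signed Euclidean distance to the initial curve (sign is an assumption)."""
    curve = boundary_pixels(mask)
    if not curve:
        raise ValueError("mask contains no region")
    u = []
    for y, row in enumerate(mask):
        u_row = []
        for x, inside in enumerate(row):
            d = min(math.hypot(x - cx, y - cy) for cx, cy in curve)
            u_row.append(d if inside else -d)
        u.append(u_row)
    return u
```

With this initialization the curve pixels themselves receive the level value 0, so the zero level set coincides with the initial curve.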

4 Experimental Results

This paper presents a method for tracking objects that have distributions over some photometric variable such as intensity, color, or texture. This section focuses on evaluating the proposed method. To assess its effectiveness, the method was tested on a synthetic image sequence and on hand image sequences, and the results were compared with those obtained using the active contours for distribution tracking proposed by Freedman et al. [2].

Freedman's method uses active contours to find the region whose interior sample distribution most closely matches the model distribution. For distribution matching, the method considers the Kullback-Leibler distance and the Bhattacharyya measure. In this experiment, we tested only the former.

4.1 Evaluation Function

To quantitatively evaluate the performance of the two methods, the Chamfer distance was used. This distance has been widely used as a matching measure between shapes [10]. To calculate the distance, ground truths are manually extracted from the images to construct accurate boundaries of each object. Then, the distances between the ground truth and the object boundaries extracted by the respective methods are calculated.

The Chamfer distance is the average, over one shape, of the distance to the closest point on the other, and is defined as

$$C(F, G) = \frac{1}{3} \sqrt{\frac{1}{n} \sum_{i=1}^{n} v_i^2}, \tag{8}$$

where F and G are the sets of pixels on the object boundary detected by the proposed method and extracted manually, respectively. In Eq. (8), vi are the distance values from each point on F to the closest point on G, and n is the number of points in the curve. The distance values vi are described in [10].
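Eq. (8) can be evaluated with the short sketch below. The distance values vi are defined in [10] and are not reproduced in this paper, so the helper `distance_values` uses a stand-in: it assumes a scale on which one pixel step costs 3 units and approximates each vi as 3 times the Euclidean distance to the closest point of G. Both function names and that approximation are assumptions of this example.

```python
import math


def chamfer_distance(values):
    """Eq. (8): one third of the root mean square of the distance values v_i."""
    n = len(values)
    if n == 0:
        raise ValueError("at least one distance value is required")
    return math.sqrt(sum(v * v for v in values) / n) / 3.0


def distance_values(f_points, g_points, unit=3.0):
    """Stand-in for v_i: `unit` times the Euclidean distance from each point
    of F to its closest point in G (assumed scale, see lead-in)."""
    return [unit * min(math.hypot(fx - gx, fy - gy) for gx, gy in g_points)
            for fx, fy in f_points]
```

Under these assumptions, a detected boundary that is displaced from the ground truth by exactly one pixel everywhere yields C(F, G) = 1, and identical boundaries yield 0.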

4.2 Tracking in Synthetic Sequences

To demonstrate the ability of the method to track textured regions, a synthetic image sequence is used. In the sequence, the background is composed of horizontal stripes, while the object is composed of diagonal stripes. As the photometric variable that describes the object, a simple texture vector is chosen based on the directions of the (nonzero) intensity gradients in the neighborhood of a pixel.

Figs. 1 and 2 show the tracking results in the synthetic sequence obtained using the proposed method and Freedman's method, respectively. In the first frame, an initial curve was manually selected around the object, and then the curve was evolved using only active contours. The Chamfer distances of the two methods are shown in Fig. 3. In the case of the proposed method, object localization using mean shift is counted as the first iteration. The distance in the proposed method decreases more sharply, and the method satisfies the stopping criterion after fewer iterations than Freedman's method. As a result, Freedman's method takes longer to track the object than the proposed method, as shown in Table 1. On visual inspection, the proposed method produces better detection results than Freedman's method. As shown in Figs. 1 and 2, the proposed method detects the object boundary accurately, whereas Freedman's method produces some holes and rough boundaries. This is because active contours pick up every local optimum that the curve passes during evolution, whereas the proposed method moves the initial curve near the global optimum using the mean shift algorithm before curve evolution.

Fig. 1. Tracking with the proposed method in synthetic images (panels: 1st to 6th frame)

Fig. 2. Tracking with Freedman's method in synthetic images

Fig. 3. Comparison of the two methods in terms of the Chamfer distance (horizontal axis: iteration, 0 to 24; vertical axis: Chamfer distance, 0 to 40; curves: Freedman's method and the proposed method)

Table 1. Time taken for tracking in synthetic images (sec.)

Method              1st frame   2nd frame   3rd frame   4th frame   5th frame   6th frame
Freedman's method   0.031000    0.157000    0.172000    0.281000    0.313000    0.282000
Proposed method     0.031000    0.047000    0.063000    0.063000    0.093000    0.125000

Fig. 4. Tracking in synthetic images that include a large amount of motion



One of the problems of most active contour methods is that the search area for optima is limited to a narrow band around the curve. Because of this, active contours have difficulty tracking objects that have a large amount of motion. In the proposed method, on the other hand, the initial curve is moved near the global optimum before curve evolution. Accordingly, the method is more effective at tracking objects that have a large amount of motion. Fig. 4 shows the tracking results in a synthetic sequence designed to demonstrate this ability. As the photometric variable that describes the object, the RGB color value of a pixel is used. As shown in Fig. 4, the proposed method tracks the object while Freedman's method fails to track it.

Table 2. Time taken for tracking in hand sequence (sec.)

Method              1st frame   2nd frame   3rd frame   4th frame   5th frame   6th frame
Freedman's method   0.192000    0.360000    0.359000    0.453000    0.188000    0.438000
Proposed method     0.192000    0.188000    0.187000    0.218000    0.156000    0.188000

Fig. 5. Tracking with the proposed method in hand sequence

4.3 Tracking in Hands Images

To assess the effectiveness of the proposed method on a real image sequence, it is applied to hand tracking. As the photometric variable that describes the hands, we use skin-color information represented by a 2D Gaussian model. In the RGB space, the color representation includes both color and brightness. Therefore, RGB is not necessarily the best color representation for detecting pixels with skin color. Brightness can be removed by dividing the three components of a color pixel (R, G, B) by the intensity. The resulting space is known as chromatic color space, in which a color is represented by a normalized vector with two components (r, g). The skin-color model is obtained from 200 sample images. The mean and covariance matrix of the skin-color model are as follows:

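$$m = (\bar{r}, \bar{g}) = (117.588,\ 79.064),$$

$$\Sigma = \begin{bmatrix} \sigma_r^2 & \rho_{X,Y}\,\sigma_g \sigma_r \\ \rho_{X,Y}\,\sigma_r \sigma_g & \sigma_g^2 \end{bmatrix} = \begin{bmatrix} 24.132 & -10.085 \\ -10.085 & 8.748 \end{bmatrix}.$$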
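A minimal sketch of how such a skin-color model could be evaluated for a pixel is given below, using the mean and covariance stated above. The chromatic components are assumed to be scaled to the range 0 to 255, that is r = 255 R / (R + G + B), because the magnitude of the reported mean suggests such a scale; the paper does not state the scaling, and the function names are hypothetical.

```python
import math

MEAN = (117.588, 79.064)                       # (r, g) mean from the paper
COV = ((24.132, -10.085), (-10.085, 8.748))    # covariance from the paper


def chromatic(rgb, scale=255.0):
    """Brightness-normalised (r, g); the 0..255 scale is an assumption."""
    r, g, b = rgb
    total = r + g + b
    if total == 0:
        return 0.0, 0.0
    return scale * r / total, scale * g / total


def skin_density(rgb):
    """2D Gaussian density of the pixel's chromatic colour under the model."""
    r, g = chromatic(rgb)
    dr, dg = r - MEAN[0], g - MEAN[1]
    (a, b), (c, d) = COV
    det = a * d - b * c
    # Mahalanobis distance using the closed-form inverse of a 2x2 matrix.
    maha = (d * dr * dr - (b + c) * dr * dg + a * dg * dg) / det
    return math.exp(-0.5 * maha) / (2.0 * math.pi * math.sqrt(det))
```

Evaluating `skin_density` for every pixel inside the search window would give the object probability distribution P(Is| αo) used by both the mean shift step and the curve evolution.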

The hand tracking result in a real image sequence is shown in Fig. 5. The proposed method successfully tracks the hand through the entire 80-frame sequence. Freedman's method also succeeds in tracking the hand in this sequence, because the sequence has a high capture rate and the hand does not move much. However, Freedman's method takes longer to track the hand than the proposed method, as shown in Table 2.

5 Conclusions

In this paper, we have proposed an active contour model based object tracking method with a mean shift algorithm. In approaches based on active contour models, object tracking is treated as a curve flow problem, and the results are very sensitive to the condition of the initial contour. A bad initial condition leads to a heavy computational cost, low accuracy, and loss of objects that have large movements. Accordingly, the proposed method consists of two steps: object localization and object extraction. The first step finds the object location using a mean shift algorithm. The initial curve is then evolved at that location using an active contour model to find the object boundary. The experimental results demonstrate that the proposed method yields accurate tracking results at a low computational cost.

Acknowledgement

This work was supported by the Korea Research Foundation Grant (KRF-2004-041-D00643).

References

1. Fenster, S. D., Kender, J. R.: Sectored Snakes: Evaluating Learned Energy Segmentations. IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 23, No. 9 (2001) 1028-1034

2. Freedman, D., Zhang, T.: Active Contours for Tracking Distributions. IEEE Transactions on Image Processing. Vol. 13, No. 4 (2004) 518-526

3. Chan, T. F., Vese, L. A.: Active Contours Without Edges. IEEE Transactions on Image Processing. Vol. 10, No. 2 (2001) 266-277


4. Gastaud, M., Barlaud, M., Aubert, G.: Combining Shape Prior and Statistical Features for Active Contour Segmentation. IEEE Transactions on Circuits and Systems for Video Technology. Vol. 14. No. 5 (2004) 726-734

5. Kim, K. I., Jung, K., Kim, J. H.: Texture-Based Approach for Text Detection in Image Using Support Vector Machines and Continuously Adaptive Mean Shift Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 25, No. 12 (2003) 1631-1639

6. Bradski, G. R.: Computer Vision Face Tracking For Use in a Perceptual User Interface. Intel Technology Journal 2nd quarter (1998) 1-15

7. Jaffre, G., Crouzil, A.: Non-rigid Object Localization From Color Model Using Mean Shift. In Proceedings of the International Conference on Image Processing, Vol. 3 (2003) 317-319

8. Zhu, S. C., Yuille, A.: Region Competition: Unifying Snakes, Region Growing, and Bayes/MDL for Multiband Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 18, No 9 (1996) 884-900

9. Mansouri, A.: Region Tracking via Level Set PDEs without Motion Computation. IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 24, No. 7 (2002) 947-961

10. Borgefors, G.: Hierarchical Chamfer Matching: A Parametric Edge Matching Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 10, No. 11 (1988) 849-865
