
A Supervised Approach to Deformable Contour Tracking

Paper No.: 1131

Abstract

Contour tracking is a tough problem since the motion of the contour exhibits a complex dynamic process in which nonlinear deformation often occurs. This paper proposes a supervised approach to deformable contour tracking. The tracking system is built on a dynamic Bayesian network, which introduces one more hidden Markov process to switch among different contours. The contours are used to help the tracker estimate and predict the target's position through motion decomposition. The motion is decomposed into two components: a global affine motion and a kind of local non-rigid deformation. The global affine motion and the local deformation are utilized together to limit the sample space and supervise the sampling process, in combination with information from the image observations. This approach has the advantage that only a small number of samples are needed to infer the motion state during tracking. The effectiveness of the proposed method for tracking deformable contours is demonstrated on a variety of image sequences.

1 Introduction

Video-based contour tracking in cluttered environments is a challenging problem. The difficulty lies in the following three facts. First, there is no easy way to identify the target directly through image measurements. Second, the contour curve is too complex to be represented in a low-dimensional parameter space in many applications. This situation becomes more apparent and difficult when nonlinear deformation occurs during motion. Third, occlusion makes the tracking task increasingly arduous. Since tracking must be fulfilled in a dynamic environment, the important work is not to describe the contour curve itself, but to analyze its variations, namely, the global motion and the local deformation. This variation information is heuristic for the state search. Thus, it would be helpful to the tracker if utilized appropriately.

Background modelling [1] can help us detect the motion of the target. However, when parts of the target or the target itself stay motionless for a long time, the corresponding contour information is lost. Methodologically, the process of learning the background is in general done in a pixelwise measurement style, so noise from shadows is a particular problem for extracting the target contour. One can instead turn to visual-tracking-based approaches.

The dynamic process of deformable contour motion is complex. To increase the robustness and accuracy of the tracker, some hints can help to search for the position of the target along the right directions. How to obtain and utilize these hints then becomes the task to be accomplished.

To address these problems, this paper presents a tracking system based on a Dynamic Bayesian Network (DBN), which introduces one more Hidden Markov Process (HMP) on top of the ordinary DBN used for visual tracking [2, 3]. The HMP can switch among different contours. Those contours are application dependent, and can be learned or calculated from training samples. Through them, the variations of the target contour and the latent appearances are both implicitly represented. They are used to supervise the motion and the deformation of the target contour. The supervised variation contains two components: a global affine motion stipulated by an affine transform matrix, and a kind of local non-rigid deformation expressed by thin-plate splines [4]. Moreover, the real dynamic system is also modelled implicitly through the HMP. Tracking can be done by inferring the target states and the hidden states simultaneously.

Although we use a DBN as the tracking framework, we need not employ a large number of weighted samples during tracking. The reason is that the information about the supervised motion and deformation helps to limit the sampling space. During this process, the motion information extracted from the image observations is utilized to further restrain the sampling process. As a result, one can use a small number of samples to infer the target state by maximum a posteriori (MAP) estimation via the observation model.

Compared with Condensation [2], the proposed approach integrates the hidden Markov factor, the supervised motion and deformation, and the motion information extracted from the image observations during the tracking process.


2 Related Work

We denote the target state at time t by Xt, and all the observed image measurements by Z1:t = {Z1, · · · , Zt}. The task of visual tracking is to estimate the time-evolving posterior density p(Xt|Z1:t) and infer the state Xt. The conditional density propagation is governed by the dynamic model p(Xt+1|Xt) and the observation model p(Zt|Xt).
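For reference, these two models enter the standard Bayesian filtering recursion (a textbook identity stated here for completeness, not a result of the paper):

p(Xt+1 | Z1:t+1) ∝ p(Zt+1 | Xt+1) ∫ p(Xt+1 | Xt) p(Xt | Z1:t) dXt.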

Bayesian sequential estimation is a popular approach to the above problem. It is difficult to obtain analytical results under this framework, due to the coarse approximations to the dynamic model and the observation model.

Much effort has been devoted to learning dynamic models [5, 6, 7, 8]. The original intention of employing learning algorithms is to discover the latent patterns and thereby model them more explicitly. To learn the dynamic model on-line, or to use learned models directly, a switch model can be added to a DBN. Pavlovic et al. describe a DBN-based switching linear dynamic system for human figure tracking [9]. Isard and Blake [2] design a mixed-state Condensation tracker, which uses motion models to predict the motion of a bouncing ball.

As for the observation model, its computation is densely connected to image measurement [2, 10, 11]. In order to incorporate multiple cues by expanding the image observation, Chen et al. [12] propose an HMM framework targeted at contour-based object tracking, in which a joint probability data association filter (JPDAF) is used to compute the HMM's transition probabilities. Wu et al. [13] propose a generative model approach to contour tracking based on a DBN, which consists of multiple hidden processes. The designed tracker can automatically switch among different observation models.

In recent years the particle filter [3] has proved to be a successful approach. Its success stems from its simplicity, generality and adaptability over a wide range of challenging applications. However, these good properties are lost if there are not enough samples. It also needs a suitable proposal distribution from which new particles can be simulated [14, 15]. In addition, in most work on visual tracking, the importance weights are directly calculated from image measurements and are seldom associated with the motion of the target.

It is difficult for particle-filter-based approaches to get a good estimate in the case of a high-dimensional state space. When we use a simple shape with a few parameters to describe the contour, such as a rectangle, an ellipse, etc., we are faced with how to represent its deformation. Within a tracking context, the important work is to describe the contour's motion and deformation. Non-rigid shape matching [4, 16, 17] provides a methodology for solving this task.

3 Tracking Model

The motion of a deformable contour is more complex than that of a rigid one. Due to the deformation, it is in general difficult to model the whole continuous motion process. Since the DBN allows recursive estimation of the state, it is feasible to provide the tracker with local motion information at each time step. In addition, a complex motion can in general be parsed into a set of fundamental poses. If we associate each pose with a state, then a Markov chain can be introduced to describe a new motion sequence. Furthermore, for a real motion, the target pose at time t is always similar to one of those poses. Those poses not only explicitly represent the appearances of the target contour, but also implicitly render its variation information. Thus, we can avoid representing the contour curve with geometrical parameters of high dimensionality. This observation further motivates us to use the similar poses to obtain the variation information of the target contour. As a result, we can avoid sampling in a high-dimensional space when using a DBN as the tracking framework. The tracking model of this paper is described by the DBN in Figure 1.

Figure 1: The Bayesian network representation of the tracking system (hidden pose states St−1, St, St+1, target states Xt−1, Xt, Xt+1, and observations Zt−1, Zt, Zt+1).

Based on the ordinary DBN used for visual tracking, this model introduces one more HMP {St}, which switches among the different fundamental poses.

In this model, St ∈ T = {C1, · · · , CNs} is a discrete random variable indicating which fundamental pose is selected. P(St+1|St), the state transition probability, stipulates the transition, which is described by a finite state machine M:

M = [m(i, j)] = [P(Cj | Ci)],   i, j = 1, · · · , Ns.
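As a concrete illustration (not taken from the paper), M can simply be stored as an Ns × Ns stochastic matrix; the values below are made up, and reachable_states is a hypothetical helper returning the poses with non-zero transition probability from the current one:

import numpy as np

# Illustrative transition matrix M for Ns = 3 fundamental poses C1..C3
# (values are invented; in the paper M is set manually or learned).
M = np.array([
    [0.7, 0.3, 0.0],   # P(C_j | C_1)
    [0.0, 0.7, 0.3],   # P(C_j | C_2)
    [0.3, 0.0, 0.7],   # P(C_j | C_3)
])

def reachable_states(s_t):
    """Indices j with P(C_j | C_{s_t}) != 0, i.e. the candidate next poses."""
    return np.nonzero(M[s_t] > 0)[0]

print(reachable_states(0))  # -> [0 1]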


According to the graphical model shown in Figure 1, the posterior density p(Xt, St|Z1:t) should be estimated by sampling. A very large number of samples would be required to obtain a good estimate in the case of a high-dimensional state space. The next sections discuss how to reduce the number of samples.

4 Supervising the Contour's Motion and Deformation

For a deformable contour, nonlinear deformation often occurs during motion. We assume that the motion can be decomposed into two components. One component is the global affine motion, which is stipulated by an affine transform matrix. The other is the local non-affine deformation, namely nonlinear deformation, which can be described by thin-plate splines [4]. As a result, the motion of the kth point PkCs on the contour Cs can be expressed as follows [4]:

FCd(PkCs, A, D) = PkCs · A + φ(PkCs) · D,   (1)

where FCd(PkCs, A, D) denotes the new location of the point PkCs, A is a global affine matrix with six unknown parameters, and D is the coefficient matrix of the local non-linear deformation with 2N unknown parameters. Here, N is the number of points of Cs to be considered. The kernel distance vector is calculated as

φ(PkCs) = (φ(‖PkCs − P1Cs‖), · · · , φ(‖PkCs − PNCs‖)),

where φ(·) is related to the twist energy function [4], which is constructed from the thin-plate splines.

There are a total of 2N+6 parameters to be estimated in Formula (1). However, due to motion continuity, the variation of Cs at each time step is not free. Since it deforms and moves towards a destination location, the variation during this process can be supervised by an appropriate contour, Cd. In Formula (1), Cd is the destination of Cs and acts as a motion and deformation controller. Given Cd, we can use Formula (1) to solve for the parameters in A and D for Cs by minimizing the twist energy [4].
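The following minimal sketch evaluates Formula (1) for given A and D. It assumes a homogeneous parameterization of the six-parameter affine matrix and the standard 2D thin-plate kernel φ(r) = r² log r; the estimation of A and D by minimizing the twist energy, as done in [4], is not reproduced here:

import numpy as np

def tps_kernel(r):
    """Standard 2D thin-plate kernel U(r) = r^2 log(r); defined as 0 at r = 0."""
    out = np.zeros_like(r, dtype=float)
    nz = r > 0
    out[nz] = (r[nz] ** 2) * np.log(r[nz])
    return out

def warp_contour(Cs, A, D):
    """Apply Formula (1) to every point of the source contour Cs.

    Cs : (N, 2) contour points
    A  : (3, 2) affine matrix acting on homogeneous points [x, y, 1] (6 parameters)
    D  : (N, 2) non-rigid deformation coefficients (2N parameters)
    """
    N = Cs.shape[0]
    homog = np.hstack([Cs, np.ones((N, 1))])                          # (N, 3)
    affine_part = homog @ A                                           # global affine motion
    dists = np.linalg.norm(Cs[:, None, :] - Cs[None, :, :], axis=2)   # (N, N) pairwise distances
    Phi = tps_kernel(dists)                                           # kernel distance vectors
    local_part = Phi @ D                                              # local non-rigid deformation
    return affine_part + local_part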

Figure 2 shows an example of a contour's motion and deformation. The contour Cs is extracted from the bottom image in Figure 2(a), while Cd is from the top. The motion is demonstrated pointwise by line segments in Figure 2(b). The result is also drawn onto a blended image at the original resolution (see Figure 2(c)). From Figure 2 we can see that the variation of Cs cannot be stipulated only by a normal affine transformation with six parameters. The local variation is described by the thin-plate functions in Formula (1).

Figure 2: (a) two frames from a video; (b) the motion of a contour restricted by another guiding contour; (c) the result shown on the blended image.

5 Tracking Algorithm

5.1. Observation Model

It is indispensable and critical for contour tracking to calculate the likelihood P(Zt|Xt), since it decides the reliability of a predicted sample. Some existing observation models can be found in [2, 13]. Here we calculate the likelihood through the Hausdorff distance between the contours and the current image, computed by fast global matching [18].

We denote the predicted contour by C. Note that only N controlling points are used to depict C. To compute the Hausdorff distance, we first get all the discrete pixels of C and denote them by Cp. Then we have

P(Zt | Xt = C) ≈ P(Zt | Xt = C, Cp).   (2)

Through Cp and the edge features of the observed image, the Hausdorff distance d is easy to compute [18].

We assume that the distances {d} associated with the predicted contours and image edge features bear a Gaussian distribution N(d : dh, σh), where dh and σh denote the mean and variance, respectively. Thus we obtain

P(Zt | Xt = C, Cp) = N(d − dh : 0, σh).
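A minimal sketch of this observation model, assuming the contour pixels Cp and the image edge pixels are already available as coordinate arrays; the fast global matching of [18] is replaced here by a brute-force Hausdorff computation, dh = 4 and σh = 10 are the values reported in Section 6, and σh is treated as the Gaussian scale parameter:

import numpy as np

def hausdorff(Cp, edges):
    """Symmetric Hausdorff distance between contour pixels and edge pixels.

    Cp, edges : (M, 2) and (K, 2) arrays of pixel coordinates.
    """
    d = np.linalg.norm(Cp[:, None, :] - edges[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def likelihood(Cp, edges, d_h=4.0, sigma_h=10.0):
    """P(Z_t | X_t = C) from Section 5.1: a Gaussian evaluated at (d - d_h)."""
    d = hausdorff(Cp, edges)
    return np.exp(-0.5 * ((d - d_h) / sigma_h) ** 2) / (np.sqrt(2 * np.pi) * sigma_h)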

5.2. Sampling Process

According to the graphical model depicted in Figure 1, the propagation process requires the tracker to predict Xt+1 by sampling from p(Xt+1|Xt, St+1). For each predicted sample, an importance weight is associated with it through the likelihood model, which is based on the image observation.

In general, a large number of samples are needed to obtain a good approximate description of p(Xt+1|Xt, St+1). For instance, suppose that the dimension of the state space is 246 (which corresponds to the situation where N equals 120 in Formula (1)), and that only 50 samples are assigned to each dimension; then the total number of samples is 50^246. For a high-dimensional state space, one faces two questions: (1) how many samples are sufficient for prediction, and (2) how to keep a good structure for the distribution of these samples. In effect, one needs to efficiently find out which samples are highly weighted.

For a sample far from the right state, the likelihood is relatively low. Thus, discarding such samples can improve the efficiency of the sampling process. Discarding those samples amounts to re-sampling according to the importance weights [3]. This is equivalent to supervising the sampling process to qualify the samples via observations by limiting the sampling space.

On the other hand, in many applications one may obtain or learn prior knowledge about the target motion. In the context of contour tracking, one may know in advance a set of fundamental poses of a complex motion, for example, the poses of a weight-lifting athlete when lifting a barbell. For this purpose, the HMP in Figure 1 is introduced to supervise the tracking process. From a sampling point of view, the tracker can utilize the information provided by the hidden states to reduce the search space and thus implicitly realize importance sampling.

In addition, we can also utilize the planar movement of the contour shown in the image sequence. The latent motion information may be extracted via the change of image features. For 2D contour tracking, it can be directly associated with the motion of the target contour. In cooperation with the information provided by the HMP, the search space is further reduced.

Based on the above considerations, we give the updating equation as follows:

Xt+1 = Xt + wI · It+1(Zt, Zt+1) + wG · (X′t+1(St+1) − Xt) + Vt+1,   (3)

wI + wG = 1,   (4)

where It+1(Zt, Zt+1) describes the change of the observations from Zt to Zt+1, X′t+1(St+1) − Xt denotes the restriction imposed by the hidden state on Xt's motion and deformation, wI and wG denote the respective weights, and the term Vt+1 denotes random noise.

Equation (3) is an implicit description of the dynamic system of the target state, which has the recursive form fitting the DBN in Figure 1. From Equation (3) we can see that the sampling process is now equivalent to generating samples for X′t+1(St+1) according to the finite state machine M and the random term Vt+1. We may get good tracking results from a small number of samples since It+1(Zt, Zt+1) and (X′t+1(St+1) − Xt) simultaneously constrain the sampling space.
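A minimal sketch of one prediction step following Equations (3)-(4), assuming the image-motion term It+1 and the guided contour X′t+1(St+1) have already been computed (see Section 5.4); the function name and argument names are illustrative:

import numpy as np

def predict_state(X_t, X_guided, I_obs, w_I, noise_std=1.0, rng=None):
    """One prediction step following Equation (3).

    X_t      : (N, 2) current contour estimate
    X_guided : (N, 2) contour X'_{t+1}(S_{t+1}) proposed by the hidden pose
    I_obs    : (N, 2) image-motion term I_{t+1}(Z_t, Z_{t+1})
    w_I      : scalar weight on the image term; w_G = 1 - w_I (Equation (4))
    """
    rng = np.random.default_rng() if rng is None else rng
    w_G = 1.0 - w_I
    V = rng.normal(scale=noise_std, size=X_t.shape)    # random noise V_{t+1}
    return X_t + w_I * I_obs + w_G * (X_guided - X_t) + V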

5.3. Inferring Target State

The previous section discussed the sampling process. Now we can generate samples for Xt+1. First construct an indicator set

∆t+1 = {j | P(Sj | St) ≠ 0}.   (5)

For j ∈ ∆t+1, we then get samples of Xt+1:

X(j)t+1 = Xt + w(j)I · It+1(Zt, Zt+1) + w(j)G · (X′t+1(Cj) − Xt),   j = 1, · · · , N′,   (6)

where Xt is the estimate at time t, w(j)I and w(j)G are calculated from It+1(Zt, Zt+1) and X′t+1(Cj) − Xt (see Section 5.4), and N′ is the number of elements of ∆t+1. Given {S(j)t+1}, we generate Xt+1 under the assumption that the conditional probability P(X|S) is uniform. That is, the generation of {X(j)t+1} is based on {S(j)t+1}, which is in turn governed by the state transition probabilities {P(·|St)}. We have

P(Xt+1 = X(j)t+1 | Zt+1)
    = P(Zt+1 | Xt+1 = X(j)t+1) · P(Xt+1 = X(j)t+1) / P(Zt+1)
    ∝ P(Zt+1 | Xt+1 = X(j)t+1) · P(Sj | St) / P(Zt+1)
    ∝ P(Zt+1 | Xt+1 = X(j)t+1) · P(Sj | St).

By MAP, we finally obtain

< Xt+1, St+1 > = arg max over {< X(j)t+1, Sj > : j ∈ ∆t+1} of P(Zt+1 | Xt+1 = X(j)t+1) · P(Sj | St).   (7)
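A minimal sketch of this MAP step. The candidate contours X(j)t+1, the observation model and the transition probabilities are passed in as arguments, so the function only carries out the maximization of Formula (7); all names are illustrative:

def infer_next_state(s_t, candidates, observe, trans_prob):
    """MAP step over candidate hidden poses (Equations (5)-(7)).

    s_t        : index of the current hidden state S_t
    candidates : dict {j: X_cand} of predicted contours X^(j)_{t+1}, one per
                 hidden state j with trans_prob(s_t, j) != 0 (Eqs. (5)-(6))
    observe    : callable X -> P(Z_{t+1} | X_{t+1} = X), the observation model
    trans_prob : callable (i, j) -> P(C_j | C_i), the finite state machine M
    """
    best_X, best_j, best_score = None, None, -1.0
    for j, X_cand in candidates.items():
        score = observe(X_cand) * trans_prob(s_t, j)     # numerator of Eq. (7)
        if score > best_score:
            best_X, best_j, best_score = X_cand, j, score
    return best_X, best_j                                # <X_{t+1}, S_{t+1}>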

5.4. Details of the Computation

This subsection discusses how to calculate X′t+1(St+1) − Xt, It+1(Zt, Zt+1), wI and wG in Equation (3).


Note that X′t+1(St+1) should be an approximate estimate of Xt+1. Because Xt+1 is unknown here, we take X′t+1(St+1) ≈ St+1. This means that the HMP uses the contours in the set T to supervise the sampling process for Xt+1. The reason we make this approximation is that we then only need to optimize the parameters in Formula (1) once.

For Formula (1), let Cs = Xt and Cd = X′t+1(St+1); then we can calculate the parameters in A and D. Finally, the displacement of each point on Xt can be evaluated. Denote the displacement of the kth point of Xt by Dk.

Since It+1(Zt, Zt+1) reflects the change of the observations from Zt to Zt+1, it can naturally be evaluated from the inter-frame displacements [19] of the target. However, the inter-frame displacements are sometimes not very robust. Thus we need further computation to obtain It+1(Zt, Zt+1).

Denoting the neighbour set of the kth point of Xt by Wk, we construct an indicator set ∆k = {j | Pj ∈ Wk, Ilk(Pj) · Dk > 0}, where Ilk(Pj) is the displacement of Pj, calculated from the inter-frame displacement. Thus we have

Ikt+1 = (1 / |∆k|) Σ_{j∈∆k} Ilk(Pj),   (8)

where Ikt+1 is the kth element of It+1(Zt, Zt+1), and |∆k| is the number of elements in ∆k.

If |∆k| = 0, all the directions of {Ilk(Pj)} are opposite to that of Dk. This means that the two kinds of motion information conflict with each other. This situation occurs when an unsuitable St+1 is employed to guide the motion of the target. In this situation, we set

Ikt+1 = (Ilk(Pk) + Dk) / 2.   (9)

Finally, wI and wG are calculated as follows:

wI = Σ_{k=1}^{N} ‖Ikt+1‖ / Σ_{k=1}^{N} [ ‖Ikt+1‖ + ‖Dk‖ ],   (10)

wG = 1 − wI.   (11)
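A minimal sketch of these computations, assuming a dense inter-frame displacement field (e.g. obtained with [19]) is available as a per-pixel array; image-boundary handling and coarse-to-fine details are omitted, and the function name and signature are illustrative:

import numpy as np

def image_motion_and_weights(Xt, Dk, flow, win=1):
    """Compute I_{t+1}(Z_t, Z_{t+1}) and (w_I, w_G) following Equations (8)-(11).

    Xt   : (N, 2) integer (x, y) pixel positions of the current contour points
    Dk   : (N, 2) guided displacements of the contour points (Section 5.4)
    flow : (H, W, 2) inter-frame displacement field
    win  : half-size of the neighbour window W_k (3x3 window -> win = 1)
    """
    I = np.zeros_like(Dk, dtype=float)
    for k, (x, y) in enumerate(Xt.astype(int)):
        patch = flow[y - win:y + win + 1, x - win:x + win + 1].reshape(-1, 2)
        agree = patch[patch @ Dk[k] > 0]          # Delta_k: displacements agreeing with D_k
        if len(agree) > 0:
            I[k] = agree.mean(axis=0)             # Equation (8)
        else:
            I[k] = (flow[y, x] + Dk[k]) / 2.0     # Equation (9): conflicting information
    num = np.linalg.norm(I, axis=1).sum()
    denom = num + np.linalg.norm(Dk, axis=1).sum()
    w_I = num / max(denom, 1e-12)                 # Equation (10)
    return I, w_I, 1.0 - w_I                      # Equation (11)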

5.5. Algorithmic Procedures

Now we can summarize the procedures of the tracking algorithm as follows.

Given St and Xt at time t, the following operations are performed in succession.

Step 1: construct ∆t+1 according to Equation (5). For each j ∈ ∆t+1:

1) calculate the unknown parameters A and D in Formula (1) according to Xt and Cj, then calculate (X′t+1(Cj) − Xt);

2) calculate Ikt+1 according to Formula (8) or Formula (9), k = 1, · · · , N, and thus obtain It+1(Zt, Zt+1);

3) calculate w(j)I and w(j)G according to Formula (10) and Formula (11), respectively;

4) calculate X(j)t+1 according to Formula (6).

Step 2: infer St+1 by MAP, according to Formula (7).

Step 3: add samples and correct Xt+1.

Step 4: re-sample {Xt+1}.

Through Steps 1 and 2, we limit the main region of the samples with high confidence as well as obtain the candidate appearances for the target contour. To obtain good adaptability in real situations, random factors should be further introduced into the samples to keep a better distribution structure for prediction and to get a better estimate of Xt+1. We first define an offset point set Ws = {(x, y) | −r ≤ x, y ≤ r}, with r ∈ N, then translate Xt+1 according to Ws. This gives (2r + 1)²/4 new samples for Xt+1.

We should further rectify Xt+1 to fit the image features. For each point on Xt+1, we decide whether it should be rectified according to a preset probability P0. For each selected point, a line segment of length L along the point's normal vector is produced to measure the feature points [13]. From those detected points and the point itself, we select one point at random as its replacement. As a result, a new contour is produced. Geometrically, it is then faired to obtain a smoothed estimate of Xt+1. The rectification is done in a coarse-to-fine style. Finally, by maximizing P(Zt+1|Xt+1), we get a new estimate, which is denoted Xt+1.

Note that the sampling process according to Ws is essentially a sample-adding process for Xt+1. To this end, we assume that Xt+1 bears a uniform distribution when sampling near the approximate estimate of Xt+1 obtained through Step 1.

The two procedures of the sampling process, namely Steps 1 and 3, implicitly deal with the term Vt+1 in Equation (3). The normal re-sampling is also necessary after the above number-reduced sampling.
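A minimal sketch of the sample-adding part of Step 3, translating the Step-1 estimate over the offset grid and keeping the translation with the highest observation likelihood; for simplicity the full (2r + 1)² grid is scanned here, and the point-wise rectification and fairing are not shown:

import numpy as np

def add_translated_samples(X, observe, r=5):
    """Generate translated copies of X over the offset set W_s and keep the best.

    X       : (N, 2) contour estimate from Steps 1-2
    observe : callable X -> P(Z_{t+1} | X_{t+1} = X)
    r       : half-extent of the offset grid (r = 5 in the experiments)
    """
    best_X, best_score = X, observe(X)
    for dx in range(-r, r + 1):
        for dy in range(-r, r + 1):
            X_shift = X + np.array([dx, dy], dtype=float)
            score = observe(X_shift)
            if score > best_score:
                best_X, best_score = X_shift, score
    return best_X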

6 Experimental Results

The proposed method has been applied to the task of tracking the contour of the human body and of other objects of interest. Two experiments are reported here. One is tracking the contour of a single human body, and the other is tracking the whole contour shaped by a human body and a barbell in a weight-lifting exercise.


Figure 3(a) and Figure 3(b) show the poses representing the two kinds of motion. These poses are selected and averaged from the video sequences every eight frames, respectively. The contours of these poses compose the hidden state set T. Each contour is discretized into 120 controlling points. To calculate the likelihood, we take dh = 4 and σh = 10. When doing the sample-adding process for Xt+1 described in Section 5.5, we take r = 5. When rectifying the points, L is equal to 5 and P0 is equal to 0.3333. To compute It+1(Zt, Zt+1), the size of the neighbour window is 3 × 3. The parameters of the transition of {St} and the initial position for tracking are set manually.
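For convenience, the parameter settings reported above, collected in one place (the dictionary keys are illustrative names, not from the paper):

# Parameter settings used in the reported experiments.
PARAMS = {
    "num_control_points": 120,   # points per contour
    "d_h": 4,                    # Hausdorff likelihood mean
    "sigma_h": 10,               # Hausdorff likelihood scale
    "r": 5,                      # offset grid half-extent (sample adding)
    "L": 5,                      # measurement line length for rectification
    "P0": 0.3333,                # per-point rectification probability
    "flow_window": (3, 3),       # neighbour window for I_{t+1}
}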

Figure 3: (a) a set of single human body poses; (b) a set of poses shaped by a human body and a barbell.

Figure 4 shows some results of the first experiment. The object of interest is the human body. During tracking, the motion is guided by the poses shown in Figure 3(a). Figure 5 shows some results of the second experiment, where the objects of interest are the human body and the barbell. The motion is guided by the poses in Figure 3(b). Because of the high speed of the barbell and the occlusion, the motion and deformation of the contour change sharply during lifting. Through the guidance of the hidden poses, the tracker obtains good tracking results.

Actually, the tracker utilizes useful information from two sources. One is the HMP. The other is the observations, from which not only the likelihood is calculated but also the motion information is extracted. The sampling process is then more explicit since the search space is tightly limited. This enables us to get a good estimate from a small number of samples.

For many special applications, the poses in T can be learned from sample sequences by machine learning methods or clustering approaches. Based on the constructed T, the proposed method can be applied to the tough problem of deformable contour tracking with high accuracy, stability and robustness.

Figure 4: Some tracked contours of a single human body. The real size of each frame is 720 × 576.

Figure 5: Some tracked contours of the human body and the barbell. The real size of each frame is 720 × 576.



7 Conclusions

In this paper, we propose a supervised approach to deformable contour tracking. An HMP is employed to guide the tracker to estimate the motion and the deformation of the target contour. Geometrically, the motion and the deformation are described by a global affine transform and thin-plate splines. The sampling space is limited by using the variation information from the hidden states of the HMP and the image observations. Thus, tracking is implemented under supervision. Under this framework, we get good deformable contour tracking results based on a small number of samples.

The proposed approach can be applied to many situations such as motion analysis and action recognition. Since we add one more HMP to the normal DBN to supervise the motion and deformation, the proposed approach can be combined with approaches from machine learning and shape analysis. How to achieve real-time tracking is one of our further research topics.

References

[1] Chris Stauffer and W. Eric L. Grimson, "Learning patterns of activity using real-time tracking", IEEE Trans. on PAMI, vol. 22, pp. 747–757, 2000.

[2] Michael Isard and Andrew Blake, "Condensation – conditional density propagation for visual tracking", Int'l J. Computer Vision, vol. 29, pp. 5–28, 1998.

[3] Jun Liu, Rong Chen, and Tanya Logvinenko, "A theoretical framework for sequential importance sampling and resampling", in Sequential Monte Carlo Methods in Practice, A. Doucet, N. de Freitas, and N. Gordon, Eds. Springer-Verlag, New York, 2001.

[4] Haili Chui and Anand Rangarajan, "A new algorithm for non-rigid point matching", in Proc. IEEE Int. Conf. on Computer Vision and Pattern Recognition, South Carolina, 2000, pp. 44–51.

[5] Andrew Blake, Ben North, and Michael Isard, "Learning multi-class dynamics", in Advances in Neural Information Processing Systems 11, Cambridge, MA, 1999, pp. 389–395.

[6] Matthew Brand, "Pattern discovery via entropy minimization", Tech. Rep. TR98-21, Mitsubishi Electric Research Lab, http://uncertainty99.microsoft.com/brand.htm, 1998.

[7] Christoph Bregler and Stephen M. Omohundro, "Nonlinear manifold learning for visual speech recognition", in Proc. of IEEE Int'l Conf. on Computer Vision, Cambridge, MA, 1995, pp. 494–499.

[8] Zoubin Ghahramani and Geoffrey E. Hinton, "Variational learning for switching state-space models", Neural Computation, vol. 12, pp. 831–864, 2000.

[9] Vladimir Pavlovic, James M. Rehg, Tat-Jen Cham, and Kevin P. Murphy, "A dynamic Bayesian network approach to figure tracking using learned dynamic models", in Proc. of IEEE Int'l Conf. on Computer Vision, Corfu, Greece, 1999, pp. 94–101.

[10] John MacCormick and Andrew Blake, "A probabilistic contour discriminant for object localization", in Proc. of IEEE Int'l Conf. on Computer Vision, Bombay, India, 1998, pp. 390–395.

[11] John MacCormick and Andrew Blake, "A probabilistic exclusion principle for tracking multiple objects", in Proc. of IEEE Int'l Conf. on Computer Vision, Corfu, Greece, 1999, pp. 572–578.

[12] Yunqiang Chen, Yong Rui, and Thomas S. Huang, "JPDAF based HMM for real-time contour tracking", in Proc. IEEE Int. Conf. on Computer Vision and Pattern Recognition, 2001, pp. 543–550.

[13] Ying Wu, Gang Hua, and Ting Yu, "Switching observation models for contour tracking in clutter", in Proc. IEEE Int. Conf. on Computer Vision and Pattern Recognition, Madison, Wisconsin, 2003, pp. 295–302.

[14] Arnaud Doucet, Simon Godsill, and Christophe Andrieu, "On sequential Monte Carlo sampling methods for Bayesian filtering", Statistics and Computing, vol. 10, pp. 197–208, 2000.

[15] Elise Arnaud and Etienne Mémin, "Optimal importance sampling for tracking in image sequences: Application to point tracking", in Proc. European Conf. on Computer Vision, Prague, Czech Republic, 2004, pp. 302–314.

[16] Fred L. Bookstein, "Principal warps: Thin-plate splines and the decomposition of deformations", IEEE Trans. on PAMI, vol. 11, pp. 567–585, 1989.

[17] Stefano Soatto and Anthony J. Yezzi, "Shape average and the joint registration and segmentation of images", in Proc. of European Conf. on Computer Vision, Copenhagen, Denmark, 2002, pp. 32–57.

[18] Daniel P. Huttenlocher, Gregory A. Klanderman, and William A. Rucklidge, "Comparing images using the Hausdorff distance", IEEE Trans. on PAMI, vol. 15, pp. 850–863, 1993.

[19] J. R. Bergen, P. Anandan, K. J. Hanna, and R. Hingorani, "Hierarchical model-based motion estimation", in Proc. European Conf. on Computer Vision, 1992, pp. 237–252.
