
Kernel-Based Object Tracking

Dorin Comaniciu, Visvanathan Ramesh
Real-Time Vision and Modeling Department, Siemens Corporate Research, 755 College Road East, Princeton, NJ 08540

Peter Meer
Electrical and Computer Engineering Department, Rutgers University, 94 Brett Road, Piscataway, NJ 08854-8058

Abstract

A new approach toward target representation and localization, the central component in visual tracking of non-rigid objects, is proposed. The feature histogram based target representations are regularized by spatial masking with an isotropic kernel. The masking induces spatially-smooth similarity functions suitable for gradient-based optimization, hence, the target localization problem can be formulated using the basin of attraction of the local maxima. We employ a metric derived from the Bhattacharyya coefficient as similarity measure, and use the mean shift procedure to perform the optimization. In the presented tracking examples the new method successfully coped with camera motion, partial occlusions, clutter, and target scale variations. Integration with motion filters and data association techniques is also discussed. We describe only a few of the potential applications: exploitation of background information, Kalman tracking using motion models, and face tracking.

Keywords: non-rigid object tracking; target localization and representation; spatially-smooth similarity function; Bhattacharyya coefficient; face tracking.

1 Introduction

Real-time object tracking is the critical task in many computer vision applications such as surveillance [44, 16, 32], perceptual user interfaces [10], augmented reality [26], smart rooms [39, 75, 47], object-based video compression [11], and driver assistance [34, 4].

Two major components can be distinguished in a typical visual tracker. Target Representation and Localization is mostly a bottom-up process which has also to cope with the changes in the appearance of the target. Filtering and Data Association is mostly a top-down process dealing with the dynamics of the tracked object, learning of scene priors, and evaluation of different hypotheses. The way the two components are combined and weighted is application dependent and plays a decisive role in the robustness and efficiency of the tracker. For example, face tracking in a crowded scene relies more on target representation than on target dynamics [21], while in aerial video surveillance, e.g., [74], the target motion and the ego-motion of the camera are the more important components. In real-time applications only a small percentage of the system resources can be allocated for tracking, the rest being required for the preprocessing stages or for high-level tasks such as recognition, trajectory interpretation, and reasoning. Therefore, it is desirable to keep the computational complexity of a tracker as low as possible.

The most abstract formulation of the filtering and data association process is through the state space approach for modeling discrete-time dynamic systems [5]. The information characterizing the target is defined by the state sequence $\{x_k\}_{k=0,1,\ldots}$, whose evolution in time is specified by the dynamic equation $x_k = f_k(x_{k-1}, v_k)$. The available measurements $\{z_k\}_{k=1,2,\ldots}$ are related to the corresponding states through the measurement equation $z_k = h_k(x_k, n_k)$. In general, both $f_k$ and $h_k$ are vector-valued, nonlinear and time-varying functions. Each of the noise sequences, $\{v_k\}_{k=1,2,\ldots}$ and $\{n_k\}_{k=1,2,\ldots}$, is assumed to be independent and identically distributed (i.i.d.).

The objective of tracking is to estimate the state $x_k$ given all the measurements $z_{1:k}$ up to that moment, or equivalently to construct the probability density function (pdf) $p(x_k \mid z_{1:k})$. The theoretically optimal solution is provided by the recursive Bayesian filter which solves the problem in two steps. The prediction step uses the dynamic equation and the already computed pdf of the state at time $k-1$, $p(x_{k-1} \mid z_{1:k-1})$, to derive the prior pdf of the current state, $p(x_k \mid z_{1:k-1})$. Then, the update step employs the likelihood function $p(z_k \mid x_k)$ of the current measurement to compute the posterior pdf $p(x_k \mid z_{1:k})$.
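In this notation the two steps take the standard form

$$p(x_k \mid z_{1:k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})\, dx_{k-1}, \qquad p(x_k \mid z_{1:k}) \propto p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1}),$$

where the transition density $p(x_k \mid x_{k-1})$ is determined by the dynamic equation and the noise $v_k$.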

When the noise sequences are Gaussian and $f_k$ and $h_k$ are linear functions, the optimal solution is provided by the Kalman filter [5, p.56], which yields a posterior that is also Gaussian. (We will return to this topic in Section 6.2.) When the functions $f_k$ and $h_k$ are nonlinear, by linearization the Extended Kalman Filter (EKF) [5, p.106] is obtained, the posterior density being still modeled as Gaussian. A recent alternative to the EKF is the Unscented Kalman Filter (UKF) [42], which uses a set of discretely sampled points to parameterize the mean and covariance of the posterior density. When the state space is discrete and consists of a finite number of states, Hidden Markov Model (HMM) filters [60] can be applied for tracking. The most general class of filters is represented by particle filters [45], also called bootstrap filters [31], which are based on Monte Carlo integration methods. The current density of the state is represented by a set of random samples with associated weights and the new density is computed based on these samples and weights (see [23, 3] for reviews). The UKF can be employed to generate proposal distributions for particle filters, in which case the filter is called the Unscented Particle Filter (UPF) [54].

When the tracking is performed in a cluttered environment where multiple targets can be present [52], problems related to the validation and association of the measurements arise [5, p.150]. Gating techniques are used to validate only measurements whose predicted probability of appearance is high. After validation, a strategy is needed to associate the measurements with the current targets. In addition to the Nearest Neighbor Filter, which selects the closest measurement, techniques such as the Probabilistic Data Association Filter (PDAF) are available for the single target case. The underlying assumption of the PDAF is that for any given target only one measurement is valid, and the other measurements are modeled as random interference, that is, i.i.d. uniformly distributed random variables. The Joint Probabilistic Data Association Filter (JPDAF) [5, p.222], on the other hand, calculates the measurement-to-target association probabilities jointly across all the targets. A different strategy is represented by the Multiple Hypothesis Filter (MHF) [63, 20], [5, p.106], which evaluates the probability that a given target gave rise to a certain measurement sequence. The MHF formulation can be adapted to track the modes of the state density [13]. The data association problem for multiple target particle filtering is presented in [62, 38].

The filtering and association techniques discussed above were applied in computer vision for various tracking scenarios. Boykov and Huttenlocher [9] employed the Kalman filter to track vehicles in an adaptive framework. Rosales and Sclaroff [65] used the Extended Kalman Filter to estimate a 3D object trajectory from 2D image motion. Particle filtering was first introduced in vision as the Condensation algorithm by Isard and Blake [40]. Probabilistic exclusion for tracking multiple objects was discussed in [51]. Wu and Huang developed an algorithm to integrate multiple target clues [76]. Li and Chellappa [48] proposed simultaneous tracking and verification based on particle filters applied to vehicles and faces. Chen et al. [15] used the Hidden Markov Model formulation for tracking combined with JPDAF data association. Rui and Chen proposed to track the face contour based on the unscented particle filter [66]. Cham and Rehg [13] applied a variant of MHF for figure tracking.

The emphasis in this paper is on the other component of tracking: target representation and localization. While filtering and data association have their roots in control theory, algorithms for target representation and localization are specific to images and related to registration methods [72, 64, 56]. Both target localization and registration maximize a likelihood type function. The difference is that in tracking, as opposed to registration, only small changes are assumed in the location and appearance of the target in two consecutive frames. This property can be exploited to develop efficient, gradient based localization schemes using the normalized correlation criterion [6]. Since the correlation is sensitive to illumination, Hager and Belhumeur [33] explicitly modeled the geometry and illumination changes. The method was improved by Sclaroff and Isidoro [67] using robust M-estimators. Learning of appearance models by employing a mixture of stable image structure, motion information, and an outlier process was discussed in [41]. In a different approach, Ferrari et al. [26] presented an affine tracker based on planar regions and anchor points. Tracking people, which raises many challenges due to the presence of large 3D, non-rigid motion, was extensively analyzed in [36, 1, 30, 73]. Explicit tracking approaches for people [69] are time-consuming and often the simpler blob model [75] or adaptive mixture models [53] are employed instead.

The main contribution of the paper is to introduce a new framework for efficient tracking of non-rigid objects. We show that by spatially masking the target with an isotropic kernel, a spatially-smooth similarity function can be defined and the target localization problem is then reduced to a search in the basin of attraction of this function. The smoothness of the similarity function allows application of a gradient optimization method which yields much faster target localization compared with the (optimized) exhaustive search. The similarity between the target model and the target candidates in the next frame is measured using the metric derived from the Bhattacharyya coefficient. In our case the Bhattacharyya coefficient has the meaning of a correlation score. The new target representation and localization method can be integrated with various motion filters and data association techniques. We present tracking experiments in which our method successfully coped with complex camera motion, partial occlusion of the target, presence of significant clutter and large variations in target scale and appearance. We also discuss the integration of background information and Kalman filter based tracking.

The paper is organized as follows. Section 2 discusses issues of target representation and the importance of a spatially-smooth similarity function. Section 3 introduces the metric derived from the Bhattacharyya coefficient. The optimization algorithm is described in Section 4. Experimental results are shown in Section 5. Section 6 presents extensions of the basic algorithm and the new approach is put in the context of computer vision literature in Section 7.

2 Target Representation

To characterize the target, first a feature space is chosen. The reference target model is represented by its pdf $q$ in the feature space. For example, the reference model can be chosen to be the color pdf of the target. Without loss of generality the target model can be considered as centered at the spatial location 0. In the subsequent frame a target candidate is defined at location $y$, and is characterized by the pdf $p(y)$. Both pdf-s are to be estimated from the data. To satisfy the low computational cost imposed by real-time processing, discrete densities, i.e., $m$-bin histograms, should be used. Thus we have

target model: $\hat{q} = \{\hat{q}_u\}_{u=1\ldots m}$, with $\sum_{u=1}^{m}\hat{q}_u = 1$

target candidate: $\hat{p}(y) = \{\hat{p}_u(y)\}_{u=1\ldots m}$, with $\sum_{u=1}^{m}\hat{p}_u = 1$.

The histogram is not the best nonparametric density estimate [68], but it suffices for our purposes. Other discrete density estimates can be also employed.

We will denote by

$$\hat{\rho}(y) \equiv \rho[\hat{p}(y), \hat{q}] \qquad (1)$$

a similarity function between $\hat{p}$ and $\hat{q}$. The function $\hat{\rho}(y)$ plays the role of a likelihood and its local maxima in the image indicate the presence of objects in the second frame having representations similar to $\hat{q}$ defined in the first frame. If only spectral information is used to characterize the target, the similarity function can have large variations for adjacent locations on the image lattice and the spatial information is lost. To find the maxima of such functions, gradient-based optimization procedures are difficult to apply and only an expensive exhaustive search can be used. We regularize the similarity function by masking the objects with an isotropic kernel in the spatial domain. When the kernel weights, carrying continuous spatial information, are used in defining the feature space representations, $\hat{\rho}(y)$ becomes a smooth function in $y$.


2.1 Target Model

A target is represented by an ellipsoidal region in the image. To eliminate the influence of different target dimensions, all targets are first normalized to a unit circle. This is achieved by independently rescaling the row and column dimensions with $h_x$ and $h_y$.

Let $\{x_i^*\}_{i=1\ldots n}$ be the normalized pixel locations in the region defined as the target model. The region is centered at 0. An isotropic kernel, with a convex and monotonically decreasing kernel profile $k(x)$ (the profile of a kernel $K$ is the function $k : [0,\infty) \rightarrow \mathbb{R}$ such that $K(x) = k(\|x\|^2)$), assigns smaller weights to pixels farther from the center. Using these weights increases the robustness of the density estimation since the peripheral pixels are the least reliable, being often affected by occlusions (clutter) or interference from the background.

The function $b : \mathbb{R}^2 \rightarrow \{1 \ldots m\}$ associates to the pixel at location $x_i^*$ the index $b(x_i^*)$ of its bin in the quantized feature space. The probability of the feature $u = 1 \ldots m$ in the target model is then computed as

$$\hat{q}_u = C \sum_{i=1}^{n} k(\|x_i^*\|^2)\,\delta[b(x_i^*) - u], \qquad (2)$$

where $\delta$ is the Kronecker delta function. The normalization constant $C$ is derived by imposing the condition $\sum_{u=1}^{m}\hat{q}_u = 1$, from where

$$C = \frac{1}{\sum_{i=1}^{n} k(\|x_i^*\|^2)}, \qquad (3)$$

since the summation of delta functions for $u = 1 \ldots m$ is equal to one.

2.2 Target Candidates

Let $\{x_i\}_{i=1\ldots n_h}$ be the normalized pixel locations of the target candidate, centered at $y$ in the current frame. The normalization is inherited from the frame containing the target model. Using the same kernel profile $k(x)$, but with bandwidth $h$, the probability of the feature $u = 1 \ldots m$ in the target candidate is given by

$$\hat{p}_u(y) = C_h \sum_{i=1}^{n_h} k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)\delta[b(x_i) - u], \qquad (4)$$

where

$$C_h = \frac{1}{\sum_{i=1}^{n_h} k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)} \qquad (5)$$

is the normalization constant. Note that $C_h$ does not depend on $y$, since the pixel locations $x_i$ are organized in a regular lattice and $y$ is one of the lattice nodes. Therefore, $C_h$ can be precalculated for a given kernel and different values of $h$. The bandwidth $h$ defines the scale of the target candidate, i.e., the number of pixels considered in the localization process.
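As an illustration only, a minimal Python/NumPy sketch of the weighted histograms (2)-(5) might look as follows; the Epanechnikov profile, the 16-bins-per-channel RGB quantization, and all function names are assumptions made for this example, not part of the original formulation.

import numpy as np

def epanechnikov_profile(x):
    # k(x) proportional to (1 - x) for x <= 1, and 0 otherwise; the constant
    # factor cancels after normalization, so it is omitted here.
    return np.where(x <= 1.0, 1.0 - x, 0.0)

def bin_index(pixels, bins_per_channel=16):
    # b(x_i): map an (n, 3) array of 8-bit RGB values to histogram bin indices
    # (an assumed quantization; the paper only requires some function b).
    q = (pixels // (256 // bins_per_channel)).astype(int)
    return q[:, 0] * bins_per_channel**2 + q[:, 1] * bins_per_channel + q[:, 2]

def weighted_histogram(locations, pixels, center, h, m):
    # Kernel-weighted histogram of eqs. (2)-(5): `locations` holds the pixel
    # coordinates, `center` is 0 for the model or y for a candidate, `h` is the
    # bandwidth; the division by k.sum() is the normalization constant C (or C_h).
    dist2 = np.sum(((locations - center) / h) ** 2, axis=1)
    k = epanechnikov_profile(dist2)
    hist = np.bincount(bin_index(pixels), weights=k, minlength=m)
    return hist / k.sum()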

2.3 Similarity Function Smoothness

The similarity function (1) inherits the properties of the kernel profile $k(x)$ when the target model and candidate are represented according to (2) and (4). A differentiable kernel profile yields a differentiable similarity function and efficient gradient-based optimization procedures can be used for finding its maxima. The presence of the continuous kernel introduces an interpolation process between the locations on the image lattice. The employed target representations do not restrict the way similarity is measured and various functions can be used for $\rho$. See [59] for an experimental evaluation of different histogram similarity measures.

3 Metric Based on Bhattacharyya Coefficient

The similarity function defines a distance between the target model and the candidates. To accommodate comparisons among various targets, this distance should have a metric structure. We define the distance between two discrete distributions as

$$d(y) = \sqrt{1 - \rho[\hat{p}(y), \hat{q}]}, \qquad (6)$$

where we chose

$$\hat{\rho}(y) \equiv \rho[\hat{p}(y), \hat{q}] = \sum_{u=1}^{m}\sqrt{\hat{p}_u(y)\,\hat{q}_u}, \qquad (7)$$

the sample estimate of the Bhattacharyya coefficient between $p$ and $q$ [43].
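Continuing the NumPy sketch above, the coefficient (7) and the distance (6) are each a single line:

def bhattacharyya(p, q):
    # rho[p, q] = sum_u sqrt(p_u * q_u) for two normalized histograms.
    return np.sum(np.sqrt(p * q))

def distance(p, q):
    # d = sqrt(1 - rho), the metric of eq. (6).
    return np.sqrt(1.0 - bhattacharyya(p, q))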

The Bhattacharyya coefficient is a divergence-type measure [49] which has a straightforward geometric interpretation. It is the cosine of the angle between the $m$-dimensional unit vectors $(\sqrt{\hat{p}_1}, \ldots, \sqrt{\hat{p}_m})^\top$ and $(\sqrt{\hat{q}_1}, \ldots, \sqrt{\hat{q}_m})^\top$. The fact that $p$ and $q$ are distributions is thus explicitly taken into account by representing them on the unit hypersphere. At the same time we can interpret (7) as the (normalized) correlation between the vectors $(\sqrt{\hat{p}_1}, \ldots, \sqrt{\hat{p}_m})^\top$ and $(\sqrt{\hat{q}_1}, \ldots, \sqrt{\hat{q}_m})^\top$. Properties of the Bhattacharyya coefficient such as its relation to the Fisher measure of information, quality of the sample estimate, and explicit forms for various distributions are given in [22, 43].

The statistical measure (6) has several desirable properties:

1. It imposes a metric structure (see Appendix). The Bhattacharyya distance [28, p.99] or Kullback divergence [19, p.18] are not metrics since they violate at least one of the distance axioms.

2. It has a clear geometric interpretation. Note that the $L_p$ histogram metrics (including histogram intersection [71]) do not enforce the conditions $\sum_{u=1}^{m}\hat{q}_u = 1$ and $\sum_{u=1}^{m}\hat{p}_u = 1$.

3. It uses discrete densities, and therefore it is invariant to the scale of the target (up to quantization effects).

4. It is valid for arbitrary distributions, thus being superior to the Fisher linear discriminant, which yields useful results only for distributions that are separated by the mean-difference [28, p.132].

5. It approximates the chi-squared statistic, while avoiding the singularity problem of the chi-square test when comparing empty histogram bins [2].

Divergence based measures were already used in computer vision. The Chernoff and Bhattacharyya bounds have been employed in [46] to determine the effectiveness of edge detectors. The Kullback divergence between joint distribution and product of marginals (e.g., the mutual information) has been used in [72] for registration. Information theoretic measures for target distinctness were discussed in [29].

4 Target Localization

To find the location corresponding to the target in the current frame, the distance (6) should be minimized as a function of $y$. The localization procedure starts from the position of the target in the previous frame (the model) and searches in the neighborhood. Since our distance function is smooth, the procedure uses gradient information which is provided by the mean shift vector [17]. More involved optimizations based on the Hessian of (6) can be applied [58].

Color information was chosen as the target feature; however, the same framework can be used for texture and edges, or any combination of them. In the sequel it is assumed that the following information is available: (a) detection and localization in the initial frame of the objects to track (target models) [50, 8]; (b) periodic analysis of each object to account for possible updates of the target models due to significant changes in color [53].

4.1 Distance Minimization

Minimizing the distance (6) is equivalent to maximizing the Bhattacharyya coefficient $\hat{\rho}(y)$. The search for the new target location in the current frame starts at the location $\hat{y}_0$ of the target in the previous frame. Thus, the probabilities $\{\hat{p}_u(\hat{y}_0)\}_{u=1\ldots m}$ of the target candidate at location $\hat{y}_0$ in the current frame have to be computed first. Using Taylor expansion around the values $\hat{p}_u(\hat{y}_0)$, the linear approximation of the Bhattacharyya coefficient (7) is obtained after some manipulations as

$$\rho[\hat{p}(y), \hat{q}] \approx \frac{1}{2}\sum_{u=1}^{m}\sqrt{\hat{p}_u(\hat{y}_0)\,\hat{q}_u} + \frac{1}{2}\sum_{u=1}^{m}\hat{p}_u(y)\sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{y}_0)}}. \qquad (8)$$

The approximation is satisfactory when the target candidate $\{\hat{p}_u(y)\}_{u=1\ldots m}$ does not change drastically from the initial $\{\hat{p}_u(\hat{y}_0)\}_{u=1\ldots m}$, which is most often a valid assumption between consecutive frames. The condition $\hat{p}_u(\hat{y}_0) > 0$ (or larger than some small threshold) for all $u = 1 \ldots m$ can always be enforced by not using the feature values in violation. Recalling (4) results in

$$\rho[\hat{p}(y), \hat{q}] \approx \frac{1}{2}\sum_{u=1}^{m}\sqrt{\hat{p}_u(\hat{y}_0)\,\hat{q}_u} + \frac{C_h}{2}\sum_{i=1}^{n_h} w_i\, k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right) \qquad (9)$$

where

$$w_i = \sum_{u=1}^{m}\sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{y}_0)}}\;\delta[b(x_i) - u]. \qquad (10)$$

Thus, to minimize the distance (6), the second term in (9) has to be maximized, the first term being independent of $y$. Observe that the second term represents the density estimate computed with kernel profile $k(x)$ at $y$ in the current frame, with the data being weighted by $w_i$ (10). The mode of this density in the local neighborhood is the sought maximum, which can be found employing the mean shift procedure [17]. In this procedure the kernel is recursively moved from the current location $\hat{y}_0$ to the new location $\hat{y}_1$ according to the relation

$$\hat{y}_1 = \frac{\sum_{i=1}^{n_h} x_i\, w_i\, g\!\left(\left\|\frac{\hat{y}_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n_h} w_i\, g\!\left(\left\|\frac{\hat{y}_0 - x_i}{h}\right\|^2\right)}, \qquad (11)$$

where $g(x) = -k'(x)$, assuming that the derivative of $k(x)$ exists for all $x \in [0, \infty)$, except for a finite set of points. The complete target localization algorithm is presented below.

Bhattacharyya Coefficient $\rho[\hat{p}(y), \hat{q}]$ Maximization

Given: the target model $\{\hat{q}_u\}_{u=1\ldots m}$ and its location $\hat{y}_0$ in the previous frame.

1. Initialize the location of the target in the current frame with $\hat{y}_0$, compute $\{\hat{p}_u(\hat{y}_0)\}_{u=1\ldots m}$, and evaluate $\rho[\hat{p}(\hat{y}_0), \hat{q}] = \sum_{u=1}^{m}\sqrt{\hat{p}_u(\hat{y}_0)\,\hat{q}_u}$.

2. Derive the weights $\{w_i\}_{i=1\ldots n_h}$ according to (10).

3. Find the next location of the target candidate according to (11).

4. Compute $\{\hat{p}_u(\hat{y}_1)\}_{u=1\ldots m}$, and evaluate $\rho[\hat{p}(\hat{y}_1), \hat{q}] = \sum_{u=1}^{m}\sqrt{\hat{p}_u(\hat{y}_1)\,\hat{q}_u}$.

5. While $\rho[\hat{p}(\hat{y}_1), \hat{q}] < \rho[\hat{p}(\hat{y}_0), \hat{q}]$
   Do $\hat{y}_1 \leftarrow \frac{1}{2}(\hat{y}_0 + \hat{y}_1)$
   Evaluate $\rho[\hat{p}(\hat{y}_1), \hat{q}]$

6. If $\|\hat{y}_1 - \hat{y}_0\| < \varepsilon$ Stop.
   Otherwise set $\hat{y}_0 \leftarrow \hat{y}_1$ and go to Step 2.


4.2 Implementation of the Algorithm

The stopping criterion threshold $\varepsilon$ used in Step 6 is derived by constraining the vectors $\hat{y}_0$ and $\hat{y}_1$ to be within the same pixel in original image coordinates. A lower threshold will induce subpixel accuracy. Due to real-time constraints (i.e., uniform CPU load in time), we also limit the number of mean shift iterations to $N_{\max}$, typically taken equal to 20. In practice the average number of iterations is much smaller, about 4.

Implementation of the tracking algorithm can be much simpler than as presented above. The role of Step 5 is only to avoid potential numerical problems in the mean shift based maximization. These problems can appear due to the linear approximation of the Bhattacharyya coefficient. However, a large set of experiments tracking different objects for long periods of time has shown that the Bhattacharyya coefficient computed at the new location $\hat{y}_1$ failed to increase in only 0.1% of the cases. Therefore, Step 5 is not used in practice, and as a result, there is no need to evaluate the Bhattacharyya coefficient in Steps 1 and 4.

In the practical algorithm, we only iterate by computing the weights in Step 2, deriving the new location in Step 3, and testing the size of the kernel shift in Step 6. The Bhattacharyya coefficient is computed only after the algorithm completes, to evaluate the similarity between the target model and the chosen candidate.

Kernels with the Epanechnikov profile [17]

$$k(x) = \begin{cases} \frac{1}{2}c_d^{-1}(d+2)(1-x) & \text{if } x \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad (12)$$

are recommended to be used. In this case the derivative of the profile, $g(x)$, is constant and (11) reduces to

$$\hat{y}_1 = \frac{\sum_{i=1}^{n_h} x_i\, w_i}{\sum_{i=1}^{n_h} w_i}, \qquad (13)$$

i.e., a simple weighted average.
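A sketch of the practical localization loop of Section 4.2, built on the helpers introduced earlier, is given below; it assumes the Epanechnikov profile, so the update (11) reduces to the weighted average (13) over the pixels inside the kernel support. Selecting the candidate pixels from the image and handling degenerate cases are left out.

def track(locations, pixels, q_model, y0, h, m, eps=1.0, max_iter=20):
    # Iterate Steps 2, 3, and 6 of the algorithm: weights (10), weighted-average
    # update (13), and the kernel-shift stopping test.
    y = np.asarray(y0, dtype=float)
    bins = bin_index(pixels)
    y_new = y
    for _ in range(max_iter):
        inside = np.sum(((locations - y) / h) ** 2, axis=1) <= 1.0
        p = weighted_histogram(locations, pixels, y, h, m)
        # Step 2: weights w_i of eq. (10); bins with p_u = 0 are skipped.
        ratio = np.sqrt(np.divide(q_model, p, out=np.zeros_like(q_model), where=p > 0))
        w = ratio[bins[inside]]
        # Step 3: new location, eq. (13) -- a simple weighted average.
        y_new = (locations[inside] * w[:, None]).sum(axis=0) / w.sum()
        # Step 6: stop when the shift drops below the threshold (about one pixel).
        if np.linalg.norm(y_new - y) < eps:
            break
        y = y_new
    return y_new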

The maximization of the Bhattacharyya coefficient can also be interpreted as a matched filtering procedure. Indeed, (7) is the correlation coefficient between the unit vectors $\sqrt{\hat{q}}$ and $\sqrt{\hat{p}(y)}$, representing the target model and candidate. The mean shift procedure thus finds the local maximum of the scalar field of correlation coefficients.


We will call the operational basin of attraction the region in the current frame in which the new location of the target can be found by the proposed algorithm. Due to the use of kernels this basin is at least equal to the size of the target model. In other words, if in the current frame the center of the target remains in the image area covered by the target model in the previous frame, the local maximum of the Bhattacharyya coefficient is a reliable indicator for the new target location. We assume that the target representation provides sufficient discrimination, such that the Bhattacharyya coefficient presents a unique maximum in the local neighborhood.

The mean shift procedure finds a root of the gradient as a function of location, which can, however, also correspond to a saddle point of the similarity surface. The saddle points are unstable solutions, and since the image noise acts as an independent perturbation factor across consecutive frames, they cannot influence the tracking performance in an image sequence.

4.3 Adaptive Scale

According to the algorithm described in Section 4.1, for a given target model, the location of the target in the current frame minimizes the distance (6) in the neighborhood of the previous location estimate. However, the scale of the target often changes in time, and thus in (4) the bandwidth $h$ of the kernel profile has to be adapted accordingly. This is possible due to the scale invariance property of (6).

Denote by $h_{prev}$ the bandwidth in the previous frame. We measure the bandwidth $h_{opt}$ in the current frame by running the target localization algorithm three times, with bandwidths $h = h_{prev}$, $h = h_{prev} + \Delta h$, and $h = h_{prev} - \Delta h$. A typical value is $\Delta h = 0.1\, h_{prev}$. The best result, $h_{opt}$, yielding the largest Bhattacharyya coefficient, is retained. To avoid over-sensitive scale adaptation, the bandwidth associated with the current frame is obtained through filtering

$$h_{new} = \gamma\, h_{opt} + (1 - \gamma)\, h_{prev}, \qquad (14)$$

where the default value for $\gamma$ is 0.1. Note that the sequence of $h_{new}$ contains important information about the dynamics of the target scale which can be further exploited.
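A corresponding sketch of the scale adaptation, again reusing the helpers above (the three-bandwidth search and the filtering of eq. (14); the 10% step is the typical value quoted in the text):

def adapt_scale(locations, pixels, q_model, y0, h_prev, m, gamma=0.1, dh_frac=0.1):
    # Run the localization at h_prev and h_prev +/- dh, keep the bandwidth giving
    # the largest Bhattacharyya coefficient, then filter it with eq. (14).
    best = (-1.0, h_prev, np.asarray(y0, dtype=float))
    for h in (h_prev, (1 + dh_frac) * h_prev, (1 - dh_frac) * h_prev):
        y = track(locations, pixels, q_model, y0, h, m)
        rho = bhattacharyya(weighted_histogram(locations, pixels, y, h, m), q_model)
        if rho > best[0]:
            best = (rho, h, y)
    h_new = gamma * best[1] + (1 - gamma) * h_prev   # eq. (14)
    return best[2], h_new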


4.4 Computational Complexity

Let $N$ be the average number of iterations per frame. In Step 2 the algorithm requires computing the representation $\{\hat{p}_u(y)\}_{u=1\ldots m}$. Weighted histogram computation has roughly the same cost as the unweighted histogram since the kernel values are precomputed. In Step 3 the centroid (13) is computed, which involves a weighted sum of items representing the square root of a division of two terms. We conclude that the mean cost of the proposed algorithm for one scale is approximately given by

$$C_O = N(c_H + n_h c_A) \approx 2 N n_h c_A, \qquad (15)$$

where $c_H$ is the cost of the histogram and $c_A$ the cost of an addition, a square root, and a division. We assume that the number of histogram entries $m$ and the number of target pixels $n_h$ are in the same range.

It is of interest to compare the complexity of the new algorithm with that of target localization without gradient optimization, as discussed in [25]. The search area is assumed to be equal to the operational basin of attraction, i.e., a region covering the target model pixels. The first step is to compute $n_h$ histograms. Assume that each histogram is derived in a squared window of $n_h$ pixels. To implement the computation efficiently we obtain a target histogram and update it by sliding the window $n_h$ times ($\sqrt{n_h}$ horizontal steps times $\sqrt{n_h}$ vertical steps). For each move $2\sqrt{n_h}$ additions are needed to update the histogram, hence, the effort is $c_H + 2 n_h \sqrt{n_h}\, c$, where $c$ is the cost of an addition. The second step is to compute $n_h$ Bhattacharyya coefficients. This can also be done by computing one coefficient and then updating it sequentially. The effort is $m c_A + 2 n_h \sqrt{n_h}\, c_A$. The total effort for target localization without gradient optimization is then

$$C_{NO} = c_H + 2 n_h \sqrt{n_h}\, c + (m + 2 n_h \sqrt{n_h})\, c_A \approx 4 n_h \sqrt{n_h}\, c_A. \qquad (16)$$

The ratio between (16) and (15) is $2\sqrt{n_h}/N$. In a typical setting (as will be shown in Section 5) the target has about $50 \times 50$ pixels (i.e., $\sqrt{n_h} = 50$) and the mean number of iterations is $N = 4.19$. Thus the proposed optimization technique reduces the computational time $2 \times 50 / 4.19 \approx 24$ times.

An optimized implementation of the new tracking algorithm has been tested on a 1 GHz PC. The basic framework with scale adaptation (which involves three optimizations at each step) runs at a rate of 150 fps, allowing simultaneous tracking of up to five targets in real time. Note that without scale adaptation these numbers should be multiplied by three.


Figure 1: Football sequence, tracking player no. 75. The frames 30, 75, 105, 140, and 150 are shown.

Figure 2: The number of mean shift iterations as a function of the frame index for the Football sequence. The mean number of iterations is 4.19 per frame.

5 Experimental Results

The kernel-based visual tracker was applied to many sequences and it is integrated into several applications. Here we just present some representative results. In all but the last experiment the RGB color space was taken as feature space and it was quantized into $16 \times 16 \times 16$ bins. The algorithm was implemented as discussed in Section 4.2. The Epanechnikov profile was used for histogram computations and the mean shift iterations were based on weighted averages.

The Football sequence (Figure 1) has 154 frames of $352 \times 240$ pixels, and the movement of player no. 75 was tracked. The target was initialized with a hand-drawn elliptical region (frame 30) of size $71 \times 53$ (yielding normalization constants equal to $(h_x, h_y) = (35, 26)$). The kernel-based tracker proves to be robust to partial occlusion, clutter, distractors (frame 140), and camera motion. Since no motion model has been assumed, the tracker adapted well to the nonstationary character of the player's movements, which alternate abruptly between slow and fast action. In addition, the intense blurring present in some frames due to the camera motion does not influence the tracker performance (frame 150). The same conditions, however, can largely perturb contour based trackers.

Figure 3: The similarity surface (values of the Bhattacharyya coefficient) corresponding to the rectangle marked in frame 105 of Figure 1. The initial and final locations of the mean shift iterations are also shown.

The number of mean shift iterations necessary for each frame (one scale) is shown in Figure 2. The two central peaks correspond to the movement of the player to the center of the image and back to the left side. The last and largest peak is due to a fast move from the left to the right. In all these cases the relatively large movement between two consecutive frames puts more burden on the mean shift procedure.

To demonstrate the efficiency of our approach, Figure 3 presents the surface obtained by computing the Bhattacharyya coefficient for the $81 \times 81$ pixel rectangle marked in Figure 1, frame 105. The target model (the elliptical region selected in frame 30) has been compared with the target candidates obtained by sweeping the elliptical region inside the rectangle in frame 105. The surface is asymmetric due to neighboring colors that are similar to the target. While most of the tracking approaches based on regions [7, 27, 50] must perform an exhaustive search in the rectangle to find the maximum, our algorithm converged in four iterations as shown in Figure 3. Note that the operational basin of attraction of the mode covers the entire rectangular window.


Figure 4: Subway-1 sequence. The frames 3140, 3516, 3697, 5440, 6081, and 6681 are shown.

Figure 5: The minimum value of the distance d as a function of the frame index for the Subway-1 sequence.


Figure 6: Subway-2 sequence. The frames 30, 240, 330, 450, 510, and 600 are shown.

In controlled environments with a fixed camera, additional geometric constraints (such as the expected scale) and background subtraction [24] can be exploited to improve the tracking process. The Subway-1 sequence (Figure 4) is suitable for such an approach; however, the results presented here have been obtained with the algorithm unchanged. This is a two minute sequence in which a person is tracked from the moment she enters the subway platform till she gets on the train (about 3600 frames). The tracking is made more challenging by the low quality of the sequence due to image compression artifacts. Note the changes in the size of the tracked target.

The minimum value of the distance (6) for each frame, i.e., the distance between the target model and the chosen candidate, is shown in Figure 5. The compression noise elevates the residual distance value from 0 (perfect match) to about 0.3. Significant deviations from this value correspond to occlusions generated by other persons or rotations in depth of the target (large changes in the representation). For example, the $d \approx 0.6$ peak corresponds to the partial occlusion in frame 3697. At the end of the sequence, the person being tracked gets on the train, which produces a complete occlusion.

The shorter Subway-2 sequence (Figure 6) of about 600 frames is even more difficult since the camera quality is worse and the amount of compression is higher, introducing clearly visible artifacts. Nevertheless, the algorithm was still able to track a person through the entire sequence.

Figure 7: Mug sequence. The frames 60, 150, 240, 270, 360, and 960 are shown.

In the Mug sequence (Figure 7) of about 1000 frames the image of a cup (frame 60) was used as the target model. The normalization constants were $(h_x = 44, h_y = 64)$. The tracker was tested for fast motion (frame 150), dramatic changes in appearance (frame 270), rotations (frame 270), and scale changes (frames 360-960).

6 Extensions of the Tracking Algorithm

We present three extensions of the basic algorithm: integration of the background information, Kalman tracking, and an application to face tracking. It should be emphasized, however, that there are many other possibilities through which the visual tracker can be further enhanced.

6.1 Background-Weighted Histogram

The background information is important for at least two reasons. First, if some of the target features are also present in the background, their relevance for the localization of the target is diminished. Second, in many applications it is difficult to exactly delineate the target, and its model might contain background features as well. At the same time, the improper use of the background information may affect the scale selection algorithm, making it impossible to measure similarity across scales, hence, to determine the appropriate target scale. Our approach is to derive a simple representation of the background features, and to use it for selecting only the salient parts from the representations of the target model and target candidates.

Let $\{\hat{o}_u\}_{u=1\ldots m}$ (with $\sum_{u=1}^{m}\hat{o}_u = 1$) be the discrete representation (histogram) of the background in the feature space and $\hat{o}^*$ be its smallest nonzero entry. This representation is computed in a region around the target. The extent of the region is application dependent and we used an area equal to three times the target area. The weights

$$v_u = \min\!\left(\frac{\hat{o}^*}{\hat{o}_u},\, 1\right), \qquad u = 1 \ldots m, \qquad (17)$$

are similar in concept to the ratio histogram computed for backprojection [71]. However, in our case these weights are only employed to define a transformation for the representations of the target model and candidates. The transformation diminishes the importance of those features which have low $v_u$, i.e., are prominent in the background. The new target model representation is then defined by

$$\hat{q}_u = C\, v_u \sum_{i=1}^{n} k(\|x_i^*\|^2)\,\delta[b(x_i^*) - u] \qquad (18)$$

with the normalization constant $C$ expressed as

$$C = \frac{1}{\sum_{i=1}^{n} k(\|x_i^*\|^2)\sum_{u=1}^{m} v_u\,\delta[b(x_i^*) - u]}. \qquad (19)$$

Compare with (2) and (3). Similarly, the new target candidate representation is

$$\hat{p}_u(y) = C_h\, v_u \sum_{i=1}^{n_h} k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)\delta[b(x_i) - u] \qquad (20)$$

where now $C_h$ is given by

$$C_h = \frac{1}{\sum_{i=1}^{n_h} k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)\sum_{u=1}^{m} v_u\,\delta[b(x_i) - u]}. \qquad (21)$$
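As a sketch (reusing the earlier helpers; names are again illustrative), the background weighting amounts to scaling each pixel's kernel value by the weight of its bin before normalization:

def background_weights(background_hist):
    # Eq. (17): v_u = min(o_star / o_u, 1), with o_star the smallest nonzero
    # entry of the background histogram computed around the target; bins absent
    # from the background keep weight 1.
    o = np.asarray(background_hist, dtype=float)
    o_star = o[o > 0].min()
    return np.where(o > 0, np.minimum(o_star / o, 1.0), 1.0)

def background_weighted_histogram(locations, pixels, center, h, m, v):
    # Eqs. (18)-(21): the same kernel-weighted histogram as before, with each
    # pixel's contribution scaled by v_u of its bin; the denominator is the
    # normalization constant of (19) or (21).
    dist2 = np.sum(((locations - center) / h) ** 2, axis=1)
    k = epanechnikov_profile(dist2)
    bins = bin_index(pixels)
    weights = k * v[bins]
    return np.bincount(bins, weights=weights, minlength=m) / weights.sum()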

An important benefit of using background-weighted histograms is shown for the Ball sequence (Figure 8). The movement of the ping-pong ball from frame to frame is larger than its size. Applying the technique described above, the target model can be initialized with a $21 \times 31$ region (frame 2), larger than the one obtained by accurate target delineation. The larger region yields a satisfactory operational basin of attraction, while the probabilities of the colors that are part of the background are considerably reduced. The ball is reliably tracked over the entire sequence of 60 frames.

Figure 8: Ball sequence. The frames 2, 12, 16, 26, 32, 40, 48, and 51 are shown.

The last example is also taken from the Football sequence. This time the head and shoulder of player no. 59 is tracked (Figure 9). Note the changes in target appearance along the entire sequence and the rapid movements of the target.

Figure 9: Football sequence, tracking player no. 59. The frames 70, 96, 108, 127, 140, and 147 are shown.

6.2 Kalman Prediction

It was already mentioned in Section 1 that the Kalman filter assumes that the noise sequences $v_k$ and $n_k$ are Gaussian and the functions $f_k$ and $h_k$ are linear. The dynamic equation becomes $x_k = F x_{k-1} + v_k$, while the measurement equation is $z_k = H x_k + n_k$. The matrix $F$ is called the system matrix and $H$ is the measurement matrix. As in the general case, the Kalman filter solves the state estimation problem in two steps: prediction and update. For more details see [5, p.56].

The kernel-based target localization method was integrated with the Kalman filtering framework. For a faster implementation, two independent trackers were defined for horizontal and vertical movement. A constant-velocity dynamic model with acceleration affected by white noise [5, p.82] has been assumed. The uncertainty of the measurements has been estimated according to [55]. The idea is to normalize the similarity surface and represent it as a probability density function. Since the similarity surface is smooth, for each filter only 3 measurements are taken into account, one at the convergence point (peak of the surface) and the other two at a distance equal to half of the target dimension, measured from the peak. We fit a scaled Gaussian to the three points and compute the measurement uncertainty as the standard deviation of the fitted Gaussian.
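A minimal per-axis constant-velocity Kalman filter of the kind described above might be sketched as follows; the noise magnitudes and the initial covariance are placeholders, not the values used in the paper.

import numpy as np

class ConstantVelocityKalman1D:
    # State is [position, velocity]; one instance per image axis.
    def __init__(self, x0, accel_std=5.0):
        self.x = np.array([float(x0), 0.0])
        self.P = np.eye(2) * 100.0                   # initial state covariance (placeholder)
        self.F = np.array([[1.0, 1.0], [0.0, 1.0]])  # system matrix, unit frame interval
        self.H = np.array([[1.0, 0.0]])              # measurement matrix: position only
        self.Q = np.eye(2) * accel_std**2            # process noise from white acceleration
    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[0]
    def update(self, z, r):
        # r: measurement variance, e.g. the squared standard deviation of the
        # Gaussian fitted to the similarity surface as described above.
        S = self.H @ self.P @ self.H.T + r
        K = self.P @ self.H.T / S
        self.x = self.x + (K * (z - self.H @ self.x)).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return self.x[0]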

A first set of tracking results incorporating the Kalman filter is presented in Figure 10 for the 120-frame Hand sequence, where the dynamic model is assumed to be affected by a noise with standard deviation equal to 5. The size of the green cross marked on the target indicates the state uncertainty for the two trackers. Observe that the overall algorithm is able to track the target (hand) in the presence of complete occlusion by a similar object (the other hand). The presence of a similar object in the neighborhood increases the measurement uncertainty in frame 46, determining an increase in state uncertainty. In Figure 11a we present the measurements (dotted) and the estimated location of the target (continuous). Note that the graph is symmetric with respect to the number of frames since the sequence has been played forward and backward. The velocity associated with the two trackers is shown in Figure 11b.

Figure 10: Hand sequence. The frames 40, 46, 50, and 57 are shown.

Figure 11: Measurements and estimated state for the Hand sequence. (a) The measurement value (dotted curve) and the estimated location (continuous curve) as a function of the frame index. The upper curves correspond to the y filter, while the lower curves correspond to the x filter. (b) Estimated velocity. The dotted curve is for the y filter and the continuous curve is for the x filter.

A second set of results showing tracking with the Kalman filter is displayed in Figure 12. The sequence has 187 frames of $320 \times 240$ pixels each and the initial normalization constants were $(h_x, h_y) = (11, 18)$. Observe the adaptation of the algorithm to scale changes. The uncertainty of the state is indicated by the green cross.

Figure 12: Subway-3 sequence. The frames 470, 529, 580, 624, and 686 are shown.

6.3 Face Tracking

We applied the proposed framework for real-time face tracking. The face model is an elliptical region whose histogram is represented in the intensity normalized RG space with $128 \times 128$ bins. To adapt to fast scale changes we also exploit the gradient information in the direction perpendicular to the border of the hypothesized face region. The scale adaptation is thus determined by maximizing the sum of two normalized scores, based on color and gradient features, respectively. In Figure 13 we present the capability of the kernel-based tracker to handle scale changes, the subject turning away (frame 150), in-plane rotations of the head (frame 498), and foreground/background saturation due to back-lighting (frame 576). The tracked face is shown in the small upper-left window.


Figure 13: Face sequence. The frames 39, 150, 163, 498, 576, and 619 are shown.

7 Discussion

The kernel-based tracking technique introduced in this paper uses the basin of attraction of the similarity function. This function is smooth since the target representations are derived from continuous densities. Several examples validate the approach and show its efficiency. Extensions of the basic framework were presented regarding the use of background information, Kalman filtering, and face tracking. The new technique can be further combined with more sophisticated filtering and association approaches such as multiple hypothesis tracking [13].

Centroid computation has been also employed in [53]. The mean shift is used for tracking human faces by projecting the histogram of a face model onto the incoming frame [10]. However, the direct projection of the model histogram onto the new frame can introduce a large bias in the estimated location of the target, and the resulting measure is scale variant (see [37, p.262] for a discussion).

We mention that since its original publication in [18] the idea of kernel-based tracking has been exploited and developed further by various researchers. Chen and Liu [14] experimented with the same kernel-weighted histograms, but employed the Kullback-Leibler distance as dissimilarity while performing the optimization based on trust-region methods. Haritaoglu and Flickner [35] used an appearance model based on color and edge density in conjunction with a kernel tracker for monitoring shopping groups in stores. Yilmaz et al. [78] combined kernel tracking with global motion compensation for forward-looking infrared (FLIR) imagery. Xu and Fujimura [77] used night vision for pedestrian detection and tracking, where the detection is performed by a support vector machine and the tracking is kernel-based. Rao et al. [61] employed kernel tracking in their system for action recognition, while Caenen et al. [12] followed the same principle for texture analysis. The benefits of guiding random particles by gradient optimization are discussed in [70] and a particle filter for color histogram tracking based on the metric (6) is implemented in [57].

Finally we would like to add a word of caution. The tracking solution presented in this paper has several desirable properties: it is efficient, modular, has a straightforward implementation, and provides superior performance on most image sequences. Nevertheless, we note that this technique should not be applied blindly. Instead, the versatility of the basic algorithm should be exploited to integrate a priori information, which is almost always available when a specific application is considered. For example, if the motion of the target from frame to frame is known to be larger than the operational basin of attraction, one should initialize the tracker in multiple locations in the neighborhood of the basin of attraction, according to the motion model. If occlusions are present, one should employ a more sophisticated motion filter. Similarly, one should verify that the chosen target representation is sufficiently discriminant for the application domain. The kernel-based tracking technique, when combined with prior task-specific information, can achieve reliable performance. (A patent application has been filed covering the tracking algorithm together with the extensions and various applications [79].)

APPENDIX

Proof that the distance $d(\hat{p}, \hat{q}) = \sqrt{1 - \rho(\hat{p}, \hat{q})}$ is a metric

The proof is based on the properties of the Bhattacharyya coefficient (7). According to Jensen's inequality [19, p.25] we have

$$\rho(\hat{p}, \hat{q}) = \sum_{u=1}^{m}\sqrt{\hat{p}_u \hat{q}_u} = \sum_{u=1}^{m}\hat{p}_u\sqrt{\frac{\hat{q}_u}{\hat{p}_u}} \le \sqrt{\sum_{u=1}^{m}\hat{q}_u} = 1, \qquad (A.1)$$


with equality iff $\hat{p} = \hat{q}$. Therefore, $d(\hat{p}, \hat{q}) = \sqrt{1 - \rho(\hat{p}, \hat{q})}$ exists for all discrete distributions $\hat{p}$ and $\hat{q}$, is positive, symmetric, and is equal to zero iff $\hat{p} = \hat{q}$.

To prove the triangle inequality consider three discrete distributions defining the $m$-dimensional vectors $\hat{p}$, $\hat{q}$, and $\hat{r}$, associated with the points $\xi_p = (\sqrt{\hat{p}_1}, \ldots, \sqrt{\hat{p}_m})^\top$, $\xi_q = (\sqrt{\hat{q}_1}, \ldots, \sqrt{\hat{q}_m})^\top$, and $\xi_r = (\sqrt{\hat{r}_1}, \ldots, \sqrt{\hat{r}_m})^\top$ on the unit hypersphere. From the geometric interpretation of the Bhattacharyya coefficient, the triangle inequality

$$d(\hat{p}, \hat{r}) + d(\hat{q}, \hat{r}) \ge d(\hat{p}, \hat{q}) \qquad (A.2)$$

is equivalent to

$$\sqrt{1 - \cos(\xi_p, \xi_r)} + \sqrt{1 - \cos(\xi_q, \xi_r)} \ge \sqrt{1 - \cos(\xi_p, \xi_q)}. \qquad (A.3)$$

If we fix the points $\xi_p$ and $\xi_q$, and the angle between $\xi_p$ and $\xi_r$, the left side of inequality (A.3) is minimized when the vectors $\xi_p$, $\xi_q$, and $\xi_r$ lie in the same plane. Thus, the inequality (A.3) can be reduced to a two-dimensional problem that can be easily proven by employing the half-angle sine formula and a few trigonometric manipulations.
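In the planar case, with $\alpha$ and $\beta$ denoting the angles $(\xi_p, \xi_r)$ and $(\xi_q, \xi_r)$, the half-angle formula $\sqrt{1 - \cos\alpha} = \sqrt{2}\,\sin(\alpha/2)$ turns (A.3) into

$$\sin\frac{\alpha}{2} + \sin\frac{\beta}{2} \ge \sin\frac{\alpha + \beta}{2} = \sin\frac{\alpha}{2}\cos\frac{\beta}{2} + \cos\frac{\alpha}{2}\sin\frac{\beta}{2},$$

which holds because the cosines are at most one and all the angles lie in $[0, \pi/2]$.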

Acknowledgment

A preliminary version of the paper was presented at the 2000 IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head, SC. Peter Meer was supported by the National Science Foundation under the award IRI 99-87695.

References

[1] J. Aggarwal and Q. Cai, "Human motion analysis: A review," Computer Vision and Image Understanding, vol. 73, pp. 428–440, 1999.

[2] F. Aherne, N. Thacker, and P. Rockett, "The Bhattacharyya metric as an absolute similarity measure for frequency coded data," Kybernetika, vol. 34, no. 4, pp. 363–368, 1998.

[3] S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for on-line non-linear/non-Gaussian Bayesian tracking," IEEE Trans. Signal Process., vol. 50, no. 2, pp. 174–189, 2002.

[4] S. Avidan, "Support vector tracking," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, volume I, 2001, pp. 184–191.

[5] Y. Bar-Shalom and T. Fortmann, Tracking and Data Association. Academic Press, 1988.


[6] B. Bascle and R. Deriche, "Region tracking through image sequences," in Proc. 5th Intl. Conf. on Computer Vision, Cambridge, MA, 1995, pp. 302–307.

[7] S. Birchfield, "Elliptical head tracking using intensity gradients and color histograms," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Santa Barbara, CA, 1998, pp. 232–237.

[8] M. Black and D. Fleet, "Probabilistic detection and tracking of motion boundaries," Intl. J. of Computer Vision, vol. 38, no. 3, pp. 231–245, 2000.

[9] Y. Boykov and D. Huttenlocher, "Adaptive Bayesian recognition in tracking rigid objects," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Hilton Head, SC, 2000, pp. 697–704.

[10] G. R. Bradski, "Computer vision face tracking as a component of a perceptual user interface," in Proc. IEEE Workshop on Applications of Computer Vision, Princeton, NJ, October 1998, pp. 214–219.

[11] A. D. Bue, D. Comaniciu, V. Ramesh, and C. Regazzoni, "Smart cameras with real-time video object generation," in Proc. IEEE Intl. Conf. on Image Processing, Rochester, NY, volume III, 2002, pp. 429–432.

[12] G. Caenen, V. Ferrari, A. Zalesny, and L. Van Gool, "Analyzing the layout of composite textures," in Proceedings Texture 2002 Workshop, Copenhagen, Denmark, 2002, pp. 15–19.

[13] T. Cham and J. Rehg, "A multiple hypothesis approach to figure tracking," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Fort Collins, CO, volume II, 1999, pp. 239–219.

[14] H. Chen and T. Liu, "Trust-region methods for real-time tracking," in Proc. 8th Intl. Conf. on Computer Vision, Vancouver, Canada, volume II, 2001, pp. 717–722.

[15] Y. Chen, Y. Rui, and T. Huang, "JPDAF-based HMM for real-time contour tracking," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, volume I, 2001, pp. 543–550.

[16] R. Collins, A. Lipton, H. Fujiyoshi, and T. Kanade, "Algorithms for cooperative multisensor surveillance," Proceedings of the IEEE, vol. 89, no. 10, pp. 1456–1477, 2001.

[17] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 5, pp. 603–619, 2002.

[18] D. Comaniciu, V. Ramesh, and P. Meer, "Real-time tracking of non-rigid objects using mean shift," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Hilton Head, SC, volume II, June 2000, pp. 142–149.

[19] T. Cover and J. Thomas, Elements of Information Theory. John Wiley & Sons, New York, 1991.

[20] I. Cox and S. Hingorani, "An efficient implementation of Reid's multiple hypothesis tracking algorithm and its evaluation for the purpose of visual tracking," IEEE Trans. Pattern Anal. Machine Intell., vol. 18, no. 2, pp. 138–150, 1996.

[21] D. DeCarlo and D. Metaxas, "Optical flow constraints on deformable models with applications to face tracking," Intl. J. of Computer Vision, vol. 38, no. 2, pp. 99–127, 2000.

[22] A. Djouadi, O. Snorrason, and F. Garber, "The quality of training-sample estimates of the Bhattacharyya coefficient," IEEE Trans. Pattern Anal. Machine Intell., vol. 12, pp. 92–97, 1990.

[23] A. Doucet, S. Godsill, and C. Andrieu, "On sequential Monte Carlo sampling methods for Bayesian filtering," Statistics and Computing, vol. 10, no. 3, pp. 197–208, 2000.

[24] A. Elgammal, D. Harwood, and L. Davis, "Non-parametric model for background subtraction," in Proc. European Conf. on Computer Vision, Dublin, Ireland, volume II, June 2000, pp. 751–767.

[25] F. Ennesser and G. Medioni, "Finding Waldo, or focus of attention using local color information," IEEE Trans. Pattern Anal. Machine Intell., vol. 17, no. 8, pp. 805–809, 1995.


[26] V. Ferrari, T. Tuytelaars, and L. V. Gool, "Real-time affine region tracking and coplanar grouping," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, volume II, 2001, pp. 226–233.

[27] P. Fieguth and D. Terzopoulos, "Color-based tracking of heads and other mobile objects at video frame rates," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, San Juan, Puerto Rico, 1997, pp. 21–27.

[28] K. Fukunaga, Introduction to Statistical Pattern Recognition. Academic Press, second edition, 1990.

[29] J. Garcia, J. Valdivia, and X. Vidal, "Information theoretic measure for visual target distinctness," IEEE Trans. Pattern Anal. Machine Intell., vol. 23, no. 4, pp. 362–383, 2001.

[30] D. Gavrila, "The visual analysis of human movement: A survey," Computer Vision and Image Understanding, vol. 73, pp. 82–98, 1999.

[31] N. Gordon, D. Salmond, and A. Smith, "A novel approach to non-linear and non-Gaussian Bayesian state estimation," IEE Proceedings-F, vol. 140, pp. 107–113, 1993.

[32] M. Greiffenhagen, D. Comaniciu, H. Niemann, and V. Ramesh, "Design, analysis and engineering of video monitoring systems: An approach and a case study," Proceedings of the IEEE, vol. 89, no. 10, pp. 1498–1517, 2001.

[33] G. Hager and P. Belhumeur, "Real-time tracking of image regions with changes in geometry and illumination," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, San Francisco, CA, 1996, pp. 403–410.

[34] U. Handmann, T. Kalinke, C. Tzomakas, M. Werner, and W. von Seelen, "Computer vision for driver assistance systems," in Proceedings SPIE, volume 3364, 1998, pp. 136–147.

[35] I. Haritaoglu and M. Flickner, "Detection and tracking of shopping groups in stores," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, 2001.

[36] I. Haritaoglu, D. Harwood, and L. Davis, "W4: Who? When? Where? What? - A real time system for detecting and tracking people," in Proc. of Intl. Conf. on Automatic Face and Gesture Recognition, Nara, Japan, 1998, pp. 222–227.

[37] J. Huang, S. Kumar, M. Mitra, W. Zhu, and R. Zabih, "Spatial color indexing and applications," Intl. J. of Computer Vision, vol. 35, no. 3, pp. 245–268, 1999.

[38] C. Hue, J. Cadre, and P. Perez, "Sequential Monte Carlo filtering for multiple target tracking and data fusion," IEEE Trans. Signal Process., vol. 50, no. 2, pp. 309–325, 2002.

[39] S. Intille, J. Davis, and A. Bobick, "Real-time closed-world tracking," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, San Juan, Puerto Rico, 1997, pp. 697–703.

[40] M. Isard and A. Blake, "Condensation - Conditional density propagation for visual tracking," Intl. J. of Computer Vision, vol. 29, no. 1, 1998.

[41] A. Jepson, D. Fleet, and T. El-Maraghi, "Robust online appearance models for visual tracking," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Hawaii, volume I, 2001, pp. 415–422.

[42] S. Julier and J. Uhlmann, "A new extension of the Kalman filter to nonlinear systems," in Proc. SPIE, volume 3068, 1997, pp. 182–193.

[43] T. Kailath, "The divergence and Bhattacharyya distance measures in signal selection," IEEE Trans. Commun. Tech., vol. 15, pp. 52–60, 1967.

[44] V. Kettnaker and R. Zabih, "Bayesian multi-camera surveillance," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Fort Collins, CO, 1999, pp. 253–259.


[45] G. Kitagawa, "Non-Gaussian state-space modeling of nonstationary time series," J. of Amer. Stat. Assoc., vol. 82, pp. 1032–1063, 1987.

[46] S. Konishi, A. Yuille, J. Coughlan, and S. Zhu, "Fundamental bounds on edge detection: An information theoretic evaluation of different edge cues," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Fort Collins, 1999, pp. 573–579.

[47] J. Krumm, S. Harris, B. Meyers, B. Brumitt, M. Hale, and S. Shafer, "Multi-camera multi-person tracking for EasyLiving," in Proc. IEEE Intl. Workshop on Visual Surveillance, Dublin, Ireland, 2000, pp. 3–10.

[48] B. Li and R. Chellappa, "Simultaneous tracking and verification via sequential posterior estimation," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Hilton Head, SC, volume II, 2000, pp. 110–117.

[49] J. Lin, "Divergence measures based on the Shannon entropy," IEEE Trans. Information Theory, vol. 37, pp. 145–151, 1991.

[50] A. Lipton, H. Fujiyoshi, and R. Patil, "Moving target classification and tracking from real-time video," in IEEE Workshop on Applications of Computer Vision, Princeton, NJ, 1998, pp. 8–14.

[51] J. MacCormick and A. Blake, "A probabilistic exclusion principle for tracking multiple objects," Intl. J. of Computer Vision, vol. 39, no. 1, pp. 57–71, 2000.

[52] R. Mahler, "Engineering statistics for multi-object tracking," in Proc. IEEE Workshop Multi-Object Tracking, 2001.

[53] S. McKenna, Y. Raja, and S. Gong, "Tracking colour objects using adaptive mixture models," Image and Vision Computing Journal, vol. 17, pp. 223–229, 1999.

[54] R. Merwe, A. Doucet, N. Freitas, and E. Wan, "The unscented particle filter," Technical Report CUED/F-INFENG/TR 380, Cambridge University Engineering Department, 2000.

[55] K. Nickels and S. Hutchinson, "Estimating uncertainty in SSD-based feature tracking," Image and Vision Computing, vol. 20, pp. 47–58, 2002.

[56] C. Olson, "Image registration by aligning entropies," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, volume II, 2001, pp. 331–336.

[57] P. Perez, C. Hue, J. Vermaak, and M. Gangnet, "Color-based probabilistic tracking," in Proc. European Conf. on Computer Vision, Copenhagen, Denmark, volume I, 2002, pp. 661–675.

[58] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes in C. Cambridge University Press, second edition, 1992.

[59] J. Puzicha, Y. Rubner, C. Tomasi, and J. Buhmann, "Empirical evaluation of dissimilarity measures for color and texture," in Proc. 7th Intl. Conf. on Computer Vision, Kerkyra, Greece, 1999, pp. 1165–1173.

[60] L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–285, 1989.

[61] C. Rao, A. Yilmaz, and M. Shah, "View-invariant representation and recognition of actions," Intl. J. of Computer Vision, vol. 50, no. 2, to appear, 2003.

[62] C. Rasmussen and G. Hager, "Probabilistic data association methods for tracking complex visual objects," IEEE Trans. Pattern Anal. Machine Intell., vol. 23, no. 6, pp. 560–576, 2001.

[63] D. Reid, "An algorithm for tracking multiple targets," IEEE Trans. Automatic Control, vol. AC-24, pp. 843–854, 1979.


[64] A. Roche, G. Malandain, and N. Ayache, "Unifying maximum likelihood approaches in medical image registration," Technical Report 3741, INRIA, 1999.

[65] R. Rosales and S. Sclaroff, "3D trajectory recovery for tracking multiple objects and trajectory guided recognition of actions," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Fort Collins, CO, 1999, pp. 117–123.

[66] Y. Rui and Y. Chen, "Better proposal distributions: Object tracking using unscented particle filter," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, volume II, 2001, pp. 786–793.

[67] S. Sclaroff and J. Isidoro, "Active blobs," in Proc. 6th Intl. Conf. on Computer Vision, Bombay, India, 1998, pp. 1146–1153.

[68] D. W. Scott, Multivariate Density Estimation. Wiley, 1992.

[69] C. Sminchisescu and B. Triggs, "Covariance scaled sampling for monocular 3D body tracking," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, volume I, 2001, pp. 447–454.

[70] J. Sullivan and J. Rittscher, "Guiding random particles by deterministic search," in Proc. 8th Intl. Conf. on Computer Vision, Vancouver, Canada, volume I, 2001, pp. 323–330.

[71] M. Swain and D. Ballard, "Color indexing," Intl. J. of Computer Vision, vol. 7, no. 1, pp. 11–32, 1991.

[72] P. Viola and W. Wells, "Alignment by maximization of mutual information," Intl. J. of Computer Vision, vol. 24, no. 2, pp. 137–154, 1997.

[73] S. Wachter and H. Nagel, "Tracking persons in monocular image sequences," Computer Vision and Image Understanding, vol. 74, no. 3, pp. 174–192, 1999.

[74] R. Wildes, R. Kumar, H. Sawhney, S. Samarasekera, S. Hsu, H. Tao, Y. Guo, K. Hanna, A. Pope, D. Hirvonen, M. Hansen, and P. Burt, "Aerial video surveillance and exploitation," Proceedings of the IEEE, vol. 89, no. 10, pp. 1518–1539, 2001.

[75] C. Wren, A. Azarbayejani, T. Darrell, and A. Pentland, "Pfinder: Real-time tracking of the human body," IEEE Trans. Pattern Anal. Machine Intell., vol. 19, pp. 780–785, 1997.

[76] Y. Wu and T. Huang, "A co-inference approach to robust tracking," in Proc. 8th Intl. Conf. on Computer Vision, Vancouver, Canada, volume II, 2001, pp. 26–33.

[77] F. Xu and K. Fujimura, "Pedestrian detection and tracking with night vision," in Proc. IEEE Intelligent Vehicle Symposium, Versailles, France, 2002.

[78] A. Yilmaz, K. Shafique, N. Lobo, X. Li, T. Olson, and M. Shah, "Target tracking in FLIR imagery using mean shift and global motion compensation," in IEEE Workshop on Computer Vision Beyond Visible Spectrum, Kauai, Hawaii, 2001.

[79] "Real-time tracking of non-rigid objects using mean shift." US patent pending, 2000.
