SUBMITTED 1

Visual Tracking via Boolean Map Representations

Kaihua Zhang, Qingshan Liu, and Ming-Hsuan Yang

Abstract—In this paper, we present a simple yet effective Boolean map based representation that exploits connectivity cues for visual tracking. We describe a target object with histogram of oriented gradients and raw color features, each of which is characterized by a set of Boolean maps generated by uniformly thresholding the feature values. The Boolean maps effectively encode multi-scale connectivity cues of the target at different granularities. The fine-grained Boolean maps capture spatially structural details that are effective for precise target localization, while the coarse-grained ones encode global shape information that is robust to large target appearance variations. Finally, all the Boolean maps together form a robust representation that can be approximated by an explicit feature map of the intersection kernel, which is fed into a logistic regression classifier with online update, and the target location is estimated within a particle filter framework. The proposed representation scheme is computationally efficient and achieves favorable performance in terms of accuracy and robustness against state-of-the-art tracking methods on a large benchmark dataset of 50 image sequences.

Index Terms—Visual tracking, Boolean map, logistic regression.

I. INTRODUCTION

Object tracking is a fundamental problem in computer vision and image processing with numerous applications. Despite significant progress in past decades, it remains a challenging task due to large appearance variations caused by illumination changes, partial occlusion, deformation, and cluttered backgrounds. To address these challenges, a robust representation plays a critical role in the success of a visual tracker and has attracted much attention in recent years [1].
Kaihua Zhang and Qingshan Liu are with the Jiangsu Key Laboratory of Big Data Analysis Technology (B-DAT), Nanjing University of Information Science and Technology. E-mail: {cskhzhang, qsliu}@nuist.edu.cn. Ming-Hsuan Yang is with Electrical Engineering and Computer Science, University of California, Merced, CA 95344. E-mail: [email protected].

Numerous representation schemes have been developed for visual tracking based on holistic and local features. Lucas and Kanade [2] leverage holistic templates based on raw pixel values to represent target appearance. Matthews et al. [3] design an effective template update scheme that uses stable information from the first frame for visual tracking. In [4], Henriques et al. propose a correlation filter based template (trained with raw intensity) for visual tracking with promising performance. Zhang et al. [5] propose a multi-expert restoration scheme to address the drift problem in tracking, in which each base tracker leverages an explicit feature map representation via quantizing the CIE LAB color channels of spatially sampled image patches. To deal with appearance changes, subspace learning based trackers have been proposed. Black and Jepson [6] develop a pre-learned view-based eigenbasis representation for visual tracking. However, the pre-trained representation cannot adapt well to significant target appearance variations. In [7], Ross et al. propose an incremental update scheme to learn a low-dimensional subspace representation. Recently, numerous tracking algorithms based on sparse representation have been proposed. Mei and Ling [8] devise a dictionary of holistic intensity templates with target and trivial templates, and then find the location of the object with minimal reconstruction error via solving an ℓ1 minimization problem. Zhang et al. [9] formulate visual tracking as a multi-task sparse learning problem, which learns particle representations jointly. In [10] Wang et al.
introduce ℓ1 regularization into the eigen-reconstruction to develop an effective representation that combines the merits of both subspace and sparse representations.

Despite the demonstrated success of global representations for visual tracking, existing methods are less effective in dealing with heavy occlusion and large deformation, as local visual cues are not taken into account. Consequently, local representations have been developed to handle occlusion and deformation. Adam et al. [11] propose a fragment-based tracking method that divides a target object into a set of local regions and represents each region with a histogram. In [12], He et al. present a locality sensitive histogram for visual tracking that considers the contributions of local regions at each pixel, which can model target appearance well. Babenko et al. [13] formulate the tracking task as a multiple instance learning problem, in which Haar-like features are used to represent target appearance. Hare et al. [14] pose visual tracking as a structured learning task and leverage Haar-like features to describe target appearance. In [15], Henriques et al. propose an algorithm based on a kernelized correlation filter (KCF) to describe target templates with feature maps based on histograms of oriented gradients (HOG) [16]. This method has been shown to achieve promising performance on the recent tracking benchmark dataset [17] in terms of accuracy and efficiency. Kwon and Lee [18] present a tracking method that represents target appearance with a set of local patches whose topology is updated to account for large shape deformation. Jia et al. [19] propose a structural sparse representation scheme that divides a target object into local image patches on a regular grid and uses the coefficients to analyze occlusion and deformation.

Hierarchical representation methods that capture holistic and local object appearance have been developed for visual tracking [20]–[23]. Zhong et al.
[20] propose a sparse collaborative appearance model for visual tracking in which both holistic templates and local representations are used. Li and Zhu [21] extend the KCF tracker [15] with a scale adaptive

arXiv:1610.09652v1 [cs.CV] 30 Oct 2016





Fig. 1. Boolean map representation. For clarity, only the BMRs of a positive and a negative sample are demonstrated. Note that the Boolean maps contain more connected structures than the LAB+HOG representations.

scheme and effective color features. In [22], Wang et al. demonstrate that a simple tracker based on logistic regression with a representation composed of HOG and raw color channels performs favorably on the benchmark dataset [17]. Ma et al. [23] exploit features from hierarchical layers of a convolutional neural network and learn an effective KCF that takes account of spatial details and semantics of target objects for visual tracking.

In biological vision, it has been suggested that object tracking is carried out by attention mechanisms [24], [25], and global topological structure such as connectivity has been used to model tasks related to visual attention [26], [27]. However, none of the aforementioned representations considers topological structure for visual tracking.

In this work, we propose a Boolean map based representation (BMR) that leverages connectivity cues for visual tracking. One case of connectivity is the enclosure topological relationship between the (foreground) figure and the ground, which defines the boundaries of figures. Recent Gestalt psychological studies suggest that enclosure topological cues play an important role in figure-ground segregation, and such cues have been successfully applied to saliency detection [28] and to measuring objectness [29], [30]. The proposed BMR scheme characterizes target appearance by concatenating multiple layers of Boolean maps at different granularities, obtained by uniformly thresholding HOG and color feature maps. The fine-grained Boolean maps capture locally spatial structural details that are effective for precise localization, while the coarse-grained ones encode global shape information that accounts for significant appearance variations. The Boolean maps are then concatenated and normalized to form a representation that can be approximated by an explicit feature map. We learn a logistic regression classifier with online update on this representation to estimate target locations within a particle filter framework. The effectiveness of the proposed algorithm is demonstrated on a large tracking benchmark dataset with 50 challenging videos [17] against state-of-the-art approaches.

The main contributions of this work are summarized as follows:

• We demonstrate that connectivity cues can be effectively used for robust visual tracking.

• We show that the BMR scheme can be approximated as an explicit feature map of the intersection kernel, which makes it possible to find a nonlinear classification boundary via a linear classifier. In addition, this approach is easy to train and efficient for detection in robust visual tracking.

• The proposed tracking algorithm based on the BMR scheme performs favorably in terms of accuracy and robustness to initializations on the benchmark dataset with 50 challenging videos [17] against 35 methods, including state-of-the-art trackers based on hierarchical features from deep networks [23] and multiple experts with entropy minimization (MEEM) [5].

II. TRACKING VIA BOOLEAN MAP REPRESENTATIONS

We present the BMR scheme and a logistic regression classifier with online update for visual tracking.

A. Boolean Map Representation

The proposed image representation is based on recent findings on human visual attention [31], which show that momentary conscious awareness of a scene can be represented by Boolean maps. Boolean maps are concerned with center-surround contrast, mimicking the sensitivity of neurons either to dark centers on bright surrounds or vice versa [32]. Specifically, we exploit the connectivity cues inside a target measured by the Boolean maps, which can be used to separate the foreground object from the background effectively [26], [28]–[30]. As demonstrated in Figure 1, the connectivity inside a target can be well captured by Boolean maps at different scales.

Neurobiological studies have demonstrated that the human visual system is sensitive to color and edge orientations [33], which provide useful cues to discriminate the foreground object from the background. In this work, we use color features in the CIE LAB color space and HOG features to represent objects. To extract the perceptually uniform color features, we first normalize each sample x to a canonical size (32×32 in our experiments), then subsample it to half size to reduce appearance variations, and finally transform the sample into the CIE LAB color space, denoted as Φ_col(x) ∈ R^{n_col×n_col×3}


Fig. 2. Right two columns: reconstructed LAB+HOG representations of the target by BMRs in our experiments. Left two columns: the corresponding prototypes shown in Figure 1. Reconstructions with more connected structures than their prototypes are highlighted in yellow.

(n_col = 16 in this work). Furthermore, we leverage HOG features to capture edge orientation information of a target object, denoted as Φ_hog(x) ∈ R^{n_hog×n_hog×31} (n_hog = 4 in this work). Figure 1 demonstrates that most color and HOG feature maps of the target exhibit center-surround patterns similar to the biologically plausible architecture of primates in [32]. We normalize both Φ_col(x) and Φ_hog(x) to range from 0 to 1 and concatenate them into a feature vector φ(x) ∈ R^{d×1} with d = 3n_col² + 31n_hog². The feature vector is rescaled to [0, 1] by

φ(x) ← (φ(x) − min(φ(x))) / (max(φ(x)) − min(φ(x))),    (1)

where max(·) and min(·) denote the maximum and minimum operators, respectively.

Next, φ(x) in (1) is encoded into a set of vectorized Boolean maps B(x) = {b_i(x)}_{i=1}^{c} by

b_i(x) = 1 if φ(x) ⪰ θ_i, and 0 otherwise,    (2)

where θ_i ~ U(0, 1) is a threshold drawn from a uniform distribution over [0, 1] and ⪰ denotes elementwise inequality. In this work, we set θ_i = i/c, i.e., the thresholds are sampled at a fixed step size δ = 1/c; fixed-step sampling is equivalent to uniform sampling in the limit δ → 0 [28]. Hence we have b_1(x) ⪰ b_2(x) ⪰ · · · ⪰ b_c(x). It is easy to show that

0 ≤ φ_k(x) − (1/c) Σ_{j=1}^{c} b_{jk}(x) < δ,    (3)

where φ_k and b_{jk} are the k-th entries of φ and b_j, respectively.

Proof: Without loss of generality, assume that iδ ≤ φ_k(x) < (1 + i)δ, i = 0, . . . , c. Then b_{jk}(x) = 1 for all j ≤ i because b_1(x) ⪰ b_2(x) ⪰ · · · ⪰ b_c(x), and b_{jk}(x) = 0 for j > i. Therefore (1/c) Σ_{j=1}^{c} b_{jk}(x) = iδ and 0 ≤ φ_k(x) − (1/c) Σ_{j=1}^{c} b_{jk}(x) < (1 + i)δ − iδ = δ.

In (3), when δ → 0 (i.e., θ_i ~ U(0, 1)), we have

φ_k(x) = (1/c) Σ_{j=1}^{c} b_{jk}(x).    (4)
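The encoding (2) and the reconstruction bound (3) can be checked numerically. A small sketch with illustrative feature values, using the fixed-step thresholds θ_j = jδ with δ = 1/c:

```python
import numpy as np

def boolean_maps(phi, c=4):
    # Eq. (2) with fixed-step thresholds theta_j = j * delta, delta = 1/c;
    # the threshold at 1 is dropped, as in Algorithm 1, giving c - 1 maps.
    delta = 1.0 / c
    return np.stack([(phi >= j * delta).astype(float) for j in range(1, c)])

phi = np.array([0.05, 0.30, 0.55, 0.80, 0.95])  # toy phi(x), values in [0, 1)
B = boolean_maps(phi)

# the nesting b_1 >= b_2 >= ... used in the proof of Eq. (3)
assert all((B[j] >= B[j + 1]).all() for j in range(len(B) - 1))

# reconstruction bound of Eq. (3): 0 <= phi_k - (1/c) sum_j b_jk < delta
recon = B.sum(axis=0) / 4.0
assert ((phi - recon >= 0) & (phi - recon < 0.25)).all()
```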

In this work, we set δ = 0.25. Although (4) may not be strictly satisfied, empirical results show that most distinct structures in φ(x) can be reconstructed, as demonstrated in Figure 2. Furthermore, the reconstructed representations contain more connected structures than the original ones (see the regions highlighted in yellow in Figure 2), which shows that the Boolean maps facilitate capturing global geometric information of target objects.

Based on (4), to measure the similarity between two samples x and y, we use the intersection function [34]:

I(φ(x), φ(y)) = Σ_{k=1}^{d} min(φ_k(x), φ_k(y))
             = Σ_{k=1}^{d} min( (1/c) Σ_{j=1}^{c} b_{jk}(x), (1/c) Σ_{j=1}^{c} b_{jk}(y) )
             = (1/c) Σ_{k=1}^{d} Σ_{j=1}^{c} min(b_{jk}(x), b_{jk}(y))
             = (1/c) Σ_{k=1}^{d} Σ_{j=1}^{c} b_{jk}(x) b_{jk}(y)
             = ⟨b(x), b(y)⟩,    (5)

where b = [b_1ᵀ, . . . , b_cᵀ]ᵀ / √c.

To avoid favoring larger input sets [34], we normalize I(φ(x), φ(y)) in (5) and define the kernel Î(φ(x), φ(y)) as

Î(φ(x), φ(y)) = I(φ(x), φ(y)) / √( I(φ(x), φ(x)) I(φ(y), φ(y)) ) = ⟨b̂(x), b̂(y)⟩,    (6)

where b̂(·) is an explicit feature map function. In this work, the feature map function is defined by

b̂(x) = b(x) / |b(x)|₂,    (7)

where |·|₂ is the ℓ2 norm operator. We use b̂(x) to train a linear classifier, which is able to address the nonlinear classification problem in the feature space of φ for visual tracking with favorable performance. The computation of BMRs is summarized in Algorithm 1.
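The chain of equalities in (5)-(7) can be verified numerically when the feature values lie exactly on the δ grid, so that (4) holds with equality. A sketch, not the authors' implementation:

```python
import numpy as np

def bmr(phi, c=4):
    # Boolean maps stacked and scaled by 1/sqrt(c) as in Eq. (5),
    # then l2-normalized as in Eqs. (6)-(7).
    maps = [(phi >= j / c).astype(float) for j in range(1, c)]
    b = np.concatenate(maps) / np.sqrt(c)
    return b / np.linalg.norm(b)

def intersection(u, v):                      # I(u, v) of Eq. (5)
    return float(np.minimum(u, v).sum())

rng = np.random.default_rng(0)
x = np.floor(rng.random(50) * 4) / 4         # quantize to the delta = 1/4 grid
y = np.floor(rng.random(50) * 4) / 4

# normalized intersection kernel of Eq. (6) ...
lhs = intersection(x, y) / np.sqrt(intersection(x, x) * intersection(y, y))
# ... equals the inner product of the explicit feature maps
rhs = float(bmr(x) @ bmr(y))
assert abs(lhs - rhs) < 1e-9
```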

B. Learning a Linear Classifier with BMRs

We pose visual tracking as a binary classification problem with local search, in which a linear classifier is learned in the Boolean map feature space to separate the target from the background. Specifically, we use a logistic regressor to learn the classifier for measuring similarity scores of samples.

Let l_t(x_t^i) ∈ R² denote the location of the i-th sample at frame t. We assume that l_t(x_t) is the object location, densely draw samples D^α = {x : ‖l_t(x) − l_t(x_t)‖ < α} within a search radius α centered at the current object location, and label them as positive samples. Next, we uniformly sample some patches from the set D^{ζ,β} = {x : ζ < ‖l_t(x) − l_t(x_t)‖ < β} and label them as negative samples. After representing these samples with BMRs, we obtain a set of training data D_t = {(b̂(x_t^i), y_t^i)}_{i=1}^{n_t}, where y_t^i ∈ {+1, −1} is the class label and n_t is the number of samples. The cost function at


Algorithm 1 BMR

Input: normalized image patch x.
1) Compute the feature vector φ(x) by (1).
2) for all entries φ_i(x), i = 1, . . . , d, of φ(x) do
3)   for θ = δ, 2δ, . . . , 1 − δ do
4)     if φ_i(x) > θ
5)       b_i(x) = 1
6)     else
7)       b_i(x) = 0
8)     end if
9)   end for
10) end for
11) b(x) ← [b_1(x), . . . , b_{cd}(x)]ᵀ / √c
12) b̂(x) ← b(x) / |b(x)|₂  // normalization
Output: BMR b̂(x).

Algorithm 2 BMR-based Tracking

Input: target state s_{t−1}, classifier parameter vector w_t.
1) Sample n_p candidate particles {s_t^i}_{i=1}^{n_p} with the motion model p(s_t^i | s_{t−1}) in (12).
2) For each particle s_t^i, extract the corresponding image patch x_t^i, compute the BMR b̂(x_t^i) by Algorithm 1, and compute the corresponding observation model p(o_t | s_t^i) by (13).
3) Estimate the optimal state s_t* by (12) and obtain the corresponding image patch x_t*.
4) if f(x_t*) < ρ
5)   Update w_t by iterating (10) until convergence and set w_{t+1} ← w_t.
6) else
7)   w_{t+1} ← w_t.
8) end if
Output: target state s_t* and classifier parameter vector w_{t+1}.

frame t is defined as the negative log-likelihood for logistic regression,

ℓ_t(w) = (1/n_t) Σ_{i=1}^{n_t} log( 1 + exp(−y_t^i wᵀ b̂(x_t^i)) ),    (8)

where w is the classifier parameter vector, and the corresponding classifier is denoted as

f(x) = 1 / ( 1 + exp(−wᵀ b̂(x)) ).    (9)

We use a gradient descent method to minimize ℓ_t(w) by iterating

w ← w − ∂ℓ_t(w)/∂w,    (10)

where

∂ℓ_t(w)/∂w = −(1/n_t) Σ_{i=1}^{n_t} b̂(x_t^i) y_t^i exp(−y_t^i wᵀ b̂(x_t^i)) / ( 1 + exp(−y_t^i wᵀ b̂(x_t^i)) ).

In this work, we use the parameter w_{t−1} obtained at frame t − 1 to initialize w in (10) and iterate 20 times for the update.

C. Proposed Tracking Algorithm

We estimate the target states sequentially within a particle filter framework. Given the observation set O_t = {o_i}_{i=1}^{t} up to frame t, the target state s_t* is obtained by maximizing the posterior probability

s_t* = arg max_{s_t} p(s_t | O_t) ∝ p(o_t | s_t) p(s_t | O_{t−1}),    (11)

where p(s_t | O_{t−1}) = ∫ p(s_t | s_{t−1}) p(s_{t−1} | O_{t−1}) ds_{t−1}, and s_t = [x_t, y_t, s_t] is the target state with translations x_t and y_t and scale s_t. Here, p(s_t | s_{t−1}) is a dynamic model that describes the temporal correlation of the target states in two consecutive frames, and p(o_t | s_t) is the observation model that estimates the likelihood of a state given an observation. In the proposed algorithm, we assume that the target state parameters are independent and modeled by three scalar Gaussian distributions between two consecutive frames, i.e., p(s_t | s_{t−1}) = N(s_t | s_{t−1}, Σ), where Σ = diag(σ_x, σ_y, σ_s) is a diagonal covariance matrix whose elements are the standard deviations of the target state parameters. In visual tracking, the posterior probability p(s_t | O_t) in (11) is approximated by a finite set of particles {s_t^i}_{i=1}^{n_p} that are sampled with corresponding importance weights {π_t^i}_{i=1}^{n_p}, where π_t^i ∝ p(o_t | s_t^i). Therefore, (11) can be approximated as

s_t* = arg max_{s_t^i, i=1,...,n_p} p(o_t | s_t^i) p(s_t^i | s_{t−1}).    (12)

In our method, the observation model p(o_t | s_t^i) is defined as

p(o_t | s_t^i) ∝ f(x_t^i),    (13)

where f(x_t^i) is the logistic regression classifier defined by (9).

To adapt to target appearance variations while preserving the stable information that helps prevent the tracker from drifting to the background, we update the classifier parameters w in a conservative way: we update w by (10) only when the confidence of the target falls below a threshold ρ. This ensures that the target states always have high confidence scores and alleviates the problem of including noisy samples when updating the classifier [22]. The main steps of the proposed algorithm are summarized in Algorithm 2.
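One tracking step of (11)-(13) can be sketched as follows; the patch extraction and classifier are replaced by a toy confidence function, and the state and target values are purely illustrative:

```python
import numpy as np

def track_step(s_prev, confidence, n_p=400, sigmas=(6.0, 6.0, 0.01), seed=3):
    # Propagate particles with the Gaussian motion model p(s_t | s_{t-1}),
    # then pick the particle maximizing p(o_t | s_t) p(s_t | s_{t-1}), Eq. (12).
    rng = np.random.default_rng(seed)
    sig = np.asarray(sigmas)
    particles = s_prev + rng.normal(0.0, sig, size=(n_p, 3))
    prior = np.exp(-0.5 * ((particles - s_prev) / sig) ** 2).prod(axis=1)
    weights = np.array([confidence(p) for p in particles]) * prior
    return particles[np.argmax(weights)]

# toy observation model standing in for f(x) of Eq. (13), peaked at `target`
target = np.array([105.0, 52.0, 1.0])
conf = lambda s: float(np.exp(-np.sum((s - target) ** 2)))

s_new = track_step(np.array([100.0, 50.0, 1.0]), conf)
# the selected particle lies closer to the peak than the previous state
assert np.linalg.norm(s_new[:2] - target[:2]) < np.linalg.norm([5.0, 2.0])
```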

III. EXPERIMENTAL RESULTS

We first present implementation details of the proposed algorithm and discuss the dataset and metrics for performance evaluation. Next, we analyze the empirical results using widely adopted metrics. We then present an ablation study to examine the effectiveness of each key component in the proposed BMR scheme. Finally, we show and analyze some failure cases.

A. Implementation Details

All images are resized to a fixed size of 240×320 pixels [22] in the experiments, and each patch is resized to a canonical size of 32×32 pixels. In addition, each canonical patch is subsampled to half size, with n_col = 16, for the color representations. The HOG features are extracted from the canonical patches (supporting both gray and color images), and the size of the HOG feature maps is n_hog × n_hog × 31 = 4 × 4 × 31 (as implemented in http://github.com/pdollar/toolbox).


For grayscale videos, the original image patches are used to extract raw intensity and HOG features, and the feature dimension is d = 4×4×31 + 16×16 = 752. For color videos, the image patches are transformed into the CIE LAB color space to extract raw color features, and the original RGB image patches are used to extract HOG features; the corresponding total dimension is d = 4×4×31 + 16×16×3 = 1264. The number of Boolean maps is set to c = 4, so the total dimension of the BMRs is 3d = 2256 for gray videos and 3792 for color videos, and the sampling step is δ = 1/c = 0.25. The search radius for positive samples is set to α = 3. The inner search radius for negative samples is set to ζ = 0.3 min(w, h), where w and h are the width and height of the target, respectively, and the outer search radius is β = 100, with a search step of 5, which generates a small subset of negative samples. The target state parameters for the particle filter are set to [σ_x, σ_y, σ_s] = [6, 6, 0.01], and the number of particles is set to n_p = 400. The confidence threshold is set to ρ = 0.9. All parameter values are fixed for all sequences, and the source code will be made available to the public. More results and videos are available at http://kaihuazhang.net/bmr/bmr.htm.
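The dimension bookkeeping above can be checked directly; with c = 4, only the three thresholds δ, 2δ, 3δ produce maps, hence the factor of 3:

```python
# dimensions quoted in the text, checked explicitly
n_hog, n_col, c = 4, 16, 4
d_gray = n_hog ** 2 * 31 + n_col ** 2           # HOG + raw intensity
d_color = n_hog ** 2 * 31 + n_col ** 2 * 3      # HOG + CIE LAB
assert (d_gray, d_color) == (752, 1264)
# (c - 1) = 3 Boolean maps per feature entry gives the BMR dimension 3d
assert ((c - 1) * d_gray, (c - 1) * d_color) == (2256, 3792)
```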

B. Dataset and Evaluation Metrics

For performance evaluation, we use the tracking benchmark dataset and code library [17], which includes 29 trackers and 50 fully annotated videos. In addition, we add the corresponding results of 6 recent trackers: DLT [35], DSST [36], KCF [15], TGPR [37], MEEM [5], and HCF [23]. For detailed analysis, the sequences are annotated with 11 attributes based on different challenging factors, including low resolution (LR), in-plane rotation (IPR), out-of-plane rotation (OPR), scale variation (SV), occlusion (OCC), deformation (DEF), background clutter (BC), illumination variation (IV), motion blur (MB), fast motion (FM), and out-of-view (OV).

We quantitatively evaluate the trackers with success and precision plots [17]. Given the tracked bounding box B_T and the ground-truth bounding box B_G, the overlap score is defined as

score = Area(B_T ∩ B_G) / Area(B_T ∪ B_G).

Hence 0 ≤ score ≤ 1, and a larger value of score means better performance of the evaluated tracker. The success plot shows the percentage of frames with score > t over all thresholds t ∈ [0, 1]. Furthermore, the area under curve (AUC) of each success plot serves as a measure to rank the evaluated trackers. On the other hand, the precision plot shows the percentage of frames whose tracked locations are within a given threshold distance (i.e., 20 pixels in [17]) of the ground truth. Both success and precision plots are used in the one-pass evaluation (OPE), temporal robustness evaluation (TRE), and spatial robustness evaluation (SRE), where OPE reports the average precision or success rate by running the trackers through a test sequence with initialization from the ground-truth position, and TRE and SRE measure a tracker's robustness to initialization with temporal and spatial perturbations, respectively [17]. We report the OPE, TRE, and SRE results. For presentation clarity, we only present the top 10 algorithms in each plot.
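The overlap score is the standard intersection-over-union of axis-aligned boxes; a minimal sketch with boxes given as (x, y, w, h):

```python
def overlap_score(bt, bg):
    # Area(BT intersect BG) / Area(BT union BG) for (x, y, w, h) boxes
    ix = max(0.0, min(bt[0] + bt[2], bg[0] + bg[2]) - max(bt[0], bg[0]))
    iy = max(0.0, min(bt[1] + bt[3], bg[1] + bg[3]) - max(bt[1], bg[1]))
    inter = ix * iy
    return inter / (bt[2] * bt[3] + bg[2] * bg[3] - inter)

# identical boxes give 1; shifting by half the width gives 50/150 = 1/3
assert overlap_score((0, 0, 10, 10), (0, 0, 10, 10)) == 1.0
assert abs(overlap_score((0, 0, 10, 10), (5, 0, 10, 10)) - 1 / 3) < 1e-12
```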

C. Empirical Results

1) Overall Performance: Figure 3 shows the overall performance of the top 10 trackers in terms of success and precision plots. The BMR-based tracking algorithm ranks first on the success rate of OPE and second based on TRE and SRE. Furthermore, the BMR-based method ranks third based on the precision rates of OPE, TRE, and SRE. Overall, the proposed BMR-based tracker performs favorably against the state-of-the-art methods in terms of all metrics except for MEEM [5] and HCF [23]. The MEEM tracker exploits a multi-expert restoration scheme to handle the drift problem, combining a tracker and its historical snapshots as experts. In contrast, even using only a logistic regression classifier without any restoration strategy, the proposed BMR-based method performs well against MEEM in terms of most metrics (i.e., the success rates of the BMR-based method outperform those of MEEM, while its precision rates are comparable), which shows the effectiveness of the proposed representation scheme for visual tracking. In addition, the HCF method is based on deep learning; it leverages complex hierarchical convolutional features learned offline from a large dataset, together with correlation filters, for visual tracking. Notwithstanding, the proposed BMR-based algorithm performs comparably against HCF in terms of success rates on all metrics.

2) Attribute-based Performance: To demonstrate the strengths and weaknesses of BMR, we further evaluate the 35 trackers on videos with the 11 attributes categorized in [17].

Tables I and II summarize the success and precision scores of OPE with different attributes. The BMR-based method ranks within the top 3 for most attributes. Specifically, in the success rates of OPE, the BMR-based method ranks first on 4 of the 11 attributes and second on 6 of the 11. On the sequences with the BC attribute, the BMR-based method ranks third, and its score is close to that of the second-ranked MEEM scheme (0.555 vs. 0.569). For the precision scores of OPE, the BMR-based method ranks second on 4 of the 11 attributes and third on 3 of the 11. On the sequences with the OV attribute, the BMR-based tracker ranks first, and for the videos with the IPR and BC attributes, the proposed tracking algorithm ranks fourth with performance comparable to the third-ranked DSST and KCF methods.

Tables III and IV show the results of TRE with different attributes. The BMR-based method ranks within the top 3 for most attributes. In terms of success rates, it ranks first on 2 attributes, second on 3, and third on 6. In terms of precision rates, the BMR-based tracker ranks third on 7 attributes, and first and second on the OV and OCC attributes, respectively. Furthermore, for other attributes such as LR and BC, the BMR-based tracking algorithm ranks fourth, but its scores are close to those of MEEM and KCF, which rank third (0.581 vs. 0.598 and 0.772 vs. 0.776).

Tables V and VI show the results of SRE with different attributes. In terms of success rates, the rankings of the BMR-based method are similar to those based on TRE except for the IPR and OPR attributes. Among them, the BMR-based


Fig. 3. Success and precision plots of OPE, TRE, and SRE for the top 10 trackers. The trackers are ranked by the AUC scores (shown in the legends) for the success plots, or by the precision scores at a threshold of 20 pixels for the precision plots.

TABLE I: Success scores of OPE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.409  0.557     0.360     0.310     0.352      0.370      0.279     0.372        0.309     0.157      0.256
IPR (31)    0.557  0.582     0.535     0.497     0.532      0.479      0.458     0.444        0.416     0.425      0.383
OPR (39)    0.590  0.587     0.558     0.496     0.491      0.485      0.470     0.432        0.420     0.422      0.393
SV (28)     0.586  0.531     0.498     0.427     0.451      0.418      0.518     0.425        0.421     0.452      0.458
OCC (29)    0.615  0.606     0.552     0.513     0.480      0.484      0.487     0.413        0.402     0.376      0.384
DEF (19)    0.594  0.626     0.560     0.533     0.474      0.510      0.448     0.393        0.378     0.372      0.330
BC (21)     0.555  0.623     0.569     0.533     0.492      0.522      0.450     0.458        0.345     0.408      0.327
IV (25)     0.551  0.560     0.533     0.494     0.506      0.484      0.473     0.428        0.399     0.429      0.392
MB (12)     0.559  0.616     0.541     0.499     0.458      0.434      0.298     0.433        0.404     0.258      0.329
FM (17)     0.559  0.578     0.553     0.461     0.433      0.396      0.296     0.462        0.417     0.247      0.353
OV (6)      0.616  0.575     0.606     0.550     0.490      0.442      0.361     0.459        0.457     0.312      0.409

TABLE II: Precision scores of OPE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.517  0.897     0.490     0.379     0.534      0.538      0.305     0.545        0.349     0.156      0.303
IPR (31)    0.776  0.868     0.800     0.725     0.780      0.675      0.597     0.617        0.584     0.511      0.510
OPR (39)    0.819  0.869     0.840     0.730     0.732      0.678      0.618     0.597        0.596     0.518      0.527
SV (28)     0.803  0.880     0.785     0.680     0.740      0.620      0.672     0.639        0.606     0.552      0.606
OCC (29)    0.846  0.877     0.799     0.749     0.725      0.675      0.640     0.564        0.563     0.460      0.495
DEF (19)    0.802  0.881     0.846     0.741     0.657      0.691      0.586     0.521        0.512     0.445      0.512
BC (21)     0.742  0.885     0.797     0.752     0.691      0.717      0.578     0.585        0.428     0.496      0.440
IV (25)     0.742  0.844     0.766     0.729     0.741      0.671      0.594     0.558        0.537     0.517      0.492
MB (12)     0.755  0.844     0.715     0.650     0.603      0.537      0.339     0.551        0.518     0.278      0.427
FM (17)     0.758  0.790     0.742     0.602     0.562      0.493      0.333     0.604        0.551     0.253      0.435
OV (6)      0.773  0.695     0.727     0.649     0.533      0.505      0.429     0.539        0.576     0.333      0.505

tracker ranks third based on SRE and second based on TRE. Furthermore, although the MEEM method ranks higher than the BMR-based tracker on most attributes, the differences in the scores are within 0.01. In terms of precision rates, the BMR-based algorithm ranks within the top 3 for most attributes, except for the LR, DEF, and IV attributes.

The AUC score of the success rate measures the overall performance of each tracking method [17]. Figure 3 shows that the BMR-based method achieves better results in terms of success rates than precision rates for all metrics (OPE, SRE, TRE) and attributes. The tracking performance can be attributed to two factors. First, the proposed method exploits a logistic regression classifier with explicit feature maps, which


TABLE III: Success scores of TRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.444  0.520     0.424     0.382     0.403      0.443      0.304     0.456        0.299     0.278      0.324
IPR (31)    0.562  0.591     0.558     0.520     0.515      0.514      0.453     0.473        0.406     0.451      0.423
OPR (39)    0.578  0.595     0.572     0.531     0.507      0.523      0.480     0.477        0.425     0.465      0.428
SV (28)     0.564  0.544     0.517     0.488     0.473      0.468      0.496     0.446        0.418     0.487      0.448
OCC (29)    0.585  0.610     0.566     0.547     0.519      0.520      0.502     0.462        0.426     0.444      0.426
DEF (19)    0.599  0.651     0.611     0.571     0.548      0.577      0.515     0.500        0.425     0.466      0.399
BC (21)     0.575  0.631     0.577     0.565     0.518      0.530      0.469     0.478        0.372     0.445      0.366
IV (25)     0.555  0.597     0.564     0.528     0.529      0.518      0.475     0.486        0.402     0.468      0.427
MB (12)     0.537  0.594     0.553     0.493     0.472      0.483      0.290     0.485        0.388     0.296      0.349
FM (17)     0.516  0.560     0.542     0.456     0.429      0.461      0.282     0.464        0.392     0.285      0.350
OV (6)      0.593  0.557     0.581     0.539     0.505      0.440      0.344     0.417        0.434     0.325      0.403

TABLE IV: Precision scores of TRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.581  0.750     0.589     0.501     0.574      0.602      0.350     0.628        0.376     0.325      0.391
IPR (31)    0.767  0.851     0.802     0.728     0.725      0.716      0.581     0.650        0.569     0.582      0.572
OPR (39)    0.789  0.859     0.826     0.749     0.719      0.728      0.617     0.660        0.597     0.605      0.584
SV (28)     0.769  0.840     0.787     0.727     0.717      0.676      0.633     0.652        0.600     0.634      0.594
OCC (29)    0.791  0.854     0.788     0.758     0.726      0.705      0.633     0.631        0.579     0.560      0.550
DEF (19)    0.798  0.889     0.854     0.757     0.723      0.765      0.635     0.655        0.571     0.571      0.556
BC (21)     0.772  0.874     0.793     0.776     0.697      0.721      0.600     0.622        0.488     0.575      0.517
IV (25)     0.747  0.851     0.792     0.729     0.727      0.693      0.585     0.643        0.543     0.584      0.572
MB (12)     0.720  0.785     0.724     0.626     0.597      0.607      0.323     0.617        0.491     0.332      0.450
FM (17)     0.681  0.738     0.710     0.578     0.532      0.582      0.302     0.580        0.487     0.305      0.432
OV (6)      0.719  0.692     0.692     0.643     0.587      0.514      0.371     0.484        0.485     0.339      0.470

TABLE V: Success scores of SRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.352  0.488     0.374     0.289     0.326      0.332      0.254     0.360        0.305     0.213      0.243
IPR (31)    0.487  0.537     0.494     0.450     0.460      0.438      0.399     0.410        0.380     0.405      0.357
OPR (39)    0.510  0.536     0.514     0.445     0.439      0.455      0.396     0.409        0.387     0.404      0.368
SV (28)     0.524  0.492     0.463     0.401     0.413      0.396      0.438     0.395        0.384     0.440      0.402
OCC (29)    0.524  0.543     0.510     0.445     0.434      0.449      0.398     0.405        0.384     0.381      0.354
DEF (19)    0.492  0.566     0.516     0.469     0.434      0.504      0.358     0.398        0.357     0.386      0.322
BC (21)     0.500  0.569     0.517     0.483     0.451      0.483      0.387     0.408        0.334     0.410      0.303
IV (25)     0.486  0.516     0.490     0.442     0.446      0.438      0.389     0.396        0.350     0.405      0.347
MB (12)     0.503  0.565     0.513     0.425     0.389      0.420      0.266     0.451        0.385     0.256      0.312
FM (17)     0.504  0.534     0.518     0.415     0.384      0.412      0.269     0.464        0.392     0.285      0.350
OV (6)      0.578  0.526     0.575     0.455     0.426      0.391      0.335     0.421        0.407     0.316      0.314

TABLE VI: Precision scores of SRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.476  0.818     0.511     0.377     0.543      0.501      0.305     0.504        0.363     0.263      0.299
IPR (31)    0.704  0.839     0.752     0.667     0.704      0.648      0.546     0.592        0.554     0.556      0.503
OPR (39)    0.732  0.828     0.774     0.666     0.680      0.669      0.547     0.595        0.560     0.560      0.525
SV (28)     0.752  0.832     0.732     0.632     0.696      0.599      0.598     0.607        0.558     0.601      0.562
OCC (29)    0.735  0.815     0.730     0.662     0.671      0.649      0.540     0.568        0.516     0.514      0.483
DEF (19)    0.684  0.835     0.757     0.677     0.630      0.715      0.475     0.547        0.505     0.516      0.467
BC (21)     0.702  0.851     0.734     0.693     0.655      0.698      0.521     0.555        0.451     0.555      0.439
IV (25)     0.677  0.809     0.707     0.652     0.681      0.630      0.509     0.556        0.480     0.544      0.472
MB (12)     0.686  0.807     0.691     0.567     0.532      0.561      0.309     0.587        0.521     0.310      0.388
FM (17)     0.685  0.748     0.694     0.545     0.505      0.544      0.308     0.577        0.496     0.291      0.397
OV (6)      0.719  0.644     0.690     0.533     0.504      0.451      0.386     0.455        0.463     0.355      0.360

efficiently determines the nonlinear decision boundary through online training. Second, the online classifier parameter update scheme in (10) facilitates recovering from tracking drift.

Figure 4 shows sampled tracking results from six long sequences (each with more than 1,000 frames). The total number of frames of these sequences is 11,134, which accounts for about 38.4% of the total number of frames (about 29,000) in the benchmark, and hence the performance on these sequences plays an important role in the performance evaluation. For clear presentation, only the results of the top-performing BMR, HCF, and MEEM methods are shown. In all sequences, the BMR-based tracker is able to track the targets stably over almost all frames. However, the HCF scheme drifts away from the target objects after a few frames in the sylvester (#1178, #1285, #1345) and lemming (#386, #1137, #1336) sequences. The MEEM method drifts


Fig. 4. Screenshots of sampled results from six long sequences: sylvester, mhyang, dog1, lemming, liquor, and doll.

to the background when severe occlusions occur in the liquor sequence (#508, #728, #775). To further demonstrate the results over all frames clearly, Figure 5 shows the plots of the overlap score for each frame. Overall, the BMR-based tracker performs well against the HCF and MEEM methods in most frames of these sequences.

D. Analysis of BMR

To demonstrate the effectiveness of BMRs, we remove the Boolean map component from the proposed tracking algorithm and only leverage the LAB+HOG representations for visual tracking. In addition, we use KCF as a baseline, as it adopts the same HOG representations as the proposed tracking method. Figure 6 shows quantitative comparisons on the benchmark dataset. Without the proposed Boolean maps, the AUC score of the success rate in OPE of the proposed method is reduced by 7.5%. For TRE and SRE, the AUC scores of the proposed method are reduced by 3.9% and 4.7%, respectively, without the Boolean map component. It is worth noticing that the proposed method without the Boolean maps still outperforms KCF in terms of all metrics on success rates, which shows the effectiveness of the LAB color features in BMR. These experimental results show that the BMRs in the proposed method play a key role in robust visual tracking.

E. Failure Cases

Figure 7 shows failure cases of the proposed BMR-based method in two sequences, singer2 and motorRolling. In the singer2 sequence, the foreground object and background scene are similar due to the dim stage lighting at the beginning (frames #10 and #25). The HCF, MEEM, and proposed methods all drift to the background. Furthermore, as the target in the motorRolling sequence undergoes 360-degree in-plane rotation in early frames (#35), the MEEM and proposed methods do not adapt well to the drastic appearance variations due to limited training samples. In contrast, only the HCF tracker performs well in this sequence, because it leverages dense sampling and high-dimensional convolutional features.


Fig. 5. Overlap score plots of the six long sequences shown in Figure 4.

IV. CONCLUSIONS

In this paper, we propose a Boolean map based representation that exploits connectivity cues for visual tracking. In the BMR scheme, the HOG and raw color feature maps are decomposed into a set of Boolean maps by uniformly thresholding the respective channels. These Boolean maps are concatenated and normalized to form a robust representation, which approximates an explicit feature map of the intersection kernel. A logistic regression classifier with the explicit feature map is trained in an online manner to determine the nonlinear decision boundary for visual tracking. Extensive evaluations on a large tracking benchmark dataset demonstrate that the proposed tracking algorithm performs favorably against state-of-the-art algorithms in terms of accuracy and robustness.

REFERENCES

[1] X. Li, W. Hu, C. Shen, Z. Zhang, A. Dick, and A. van den Hengel, "A survey of appearance models in visual object tracking," ACM Transactions on Intelligent Systems and Technology, vol. 4, no. 4, p. 58, 2013.

[2] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in International Joint Conference on Artificial Intelligence, vol. 81, pp. 674-679, 1981.

[3] I. Matthews, T. Ishikawa, and S. Baker, "The template update problem," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 6, pp. 810-815, 2004.

[4] J. Henriques, R. Caseiro, P. Martins, and J. Batista, "Exploiting the circulant structure of tracking-by-detection with kernels," in Proceedings of European Conference on Computer Vision, pp. 702-715, 2012.

[5] J. Zhang, S. Ma, and S. Sclaroff, "MEEM: Robust tracking via multiple experts using entropy minimization," in Proceedings of European Conference on Computer Vision, pp. 188-203, 2014.

[6] M. J. Black and A. D. Jepson, "EigenTracking: Robust matching and tracking of articulated objects using a view-based representation," International Journal of Computer Vision, vol. 26, no. 1, pp. 63-84, 1998.

[7] D. Ross, J. Lim, R. Lin, and M.-H. Yang, "Incremental learning for robust visual tracking," International Journal of Computer Vision, vol. 77, no. 1, pp. 125-141, 2008.

[8] X. Mei and H. Ling, "Robust visual tracking and vehicle classification via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 11, pp. 2259-2272, 2011.

[9] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja, "Robust visual tracking via multi-task sparse learning," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2042-2049, 2012.

[10] D. Wang, H. Lu, and M.-H. Yang, "Online object tracking with sparse prototypes," IEEE Transactions on Image Processing, vol. 22, no. 1, pp. 314-325, 2013.

[11] A. Adam, E. Rivlin, and I. Shimshoni, "Robust fragments-based tracking using the integral histogram," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 798-805, 2006.

[12] S. He, Q. Yang, R. Lau, J. Wang, and M.-H. Yang, "Visual tracking via locality sensitive histograms," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2427-2434, 2013.

[13] B. Babenko, M.-H. Yang, and S. Belongie, "Robust object tracking with online multiple instance learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1619-1632, 2011.

[14] S. Hare, A. Saffari, and P. H. Torr, "Struck: Structured output tracking with kernels," in Proceedings of the IEEE International Conference on Computer Vision, pp. 263-270, 2011.

[15] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "High-speed tracking with kernelized correlation filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583-596, 2015.

[16] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886-893, 2005.

[17] Y. Wu, J. Lim, and M.-H. Yang, "Online object tracking: A benchmark," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2411-2418, 2013.

[18] J. Kwon and K. M. Lee, "Tracking of a non-rigid object via patch-based dynamic appearance modeling and adaptive basin hopping Monte Carlo sampling," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1208-1215, 2009.

[19] X. Jia, H. Lu, and M.-H. Yang, "Visual tracking via adaptive structural local sparse appearance model," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1822-1829, 2012.


Fig. 6. Success and precision plots of OPE, TRE, and SRE for BMR, BMR with only the LAB+HOG representations, and KCF (KCF is used as a baseline for comparisons).

Fig. 7. Failure cases of the BMR-based tracker in the singer2 and motorRolling sequences. The results of HCF and MEEM are also illustrated.

[20] W. Zhong, H. Lu, and M.-H. Yang, "Robust object tracking via sparsity-based collaborative model," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1838-1845, 2012.

[21] Y. Li and J. Zhu, "A scale adaptive kernel correlation filter tracker with feature integration," in European Conference on Computer Vision Workshops, pp. 254-265, 2014.

[22] N. Wang, J. Shi, D.-Y. Yeung, and J. Jia, "Understanding and diagnosing visual tracking systems," in Proceedings of the IEEE International Conference on Computer Vision, pp. 3101-3109, 2015.

[23] C. Ma, J.-B. Huang, X. Yang, and M.-H. Yang, "Hierarchical convolutional features for visual tracking," in Proceedings of the IEEE International Conference on Computer Vision, pp. 3074-3082, 2015.

[24] R. Allen, P. McGeorge, D. Pearson, and A. B. Milne, "Attention and expertise in multiple target tracking," Applied Cognitive Psychology, vol. 18, no. 3, pp. 337-347, 2004.

[25] P. Cavanagh and G. A. Alvarez, "Tracking multiple targets with multifocal attention," Trends in Cognitive Sciences, vol. 9, no. 7, pp. 349-354, 2005.

[26] L. Chen, "Topological structure in visual perception," Science, vol. 218, p. 699, 1982.

[27] S. E. Palmer, Vision Science: Photons to Phenomenology, vol. 1, MIT Press, 1999.

[28] J. Zhang and S. Sclaroff, "Saliency detection: A Boolean map approach," in Proceedings of the IEEE International Conference on Computer Vision, pp. 153-160, 2013.

[29] B. Alexe, T. Deselaers, and V. Ferrari, "Measuring the objectness of image windows," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2189-2202, 2012.

[30] M.-M. Cheng, Z. Zhang, W.-Y. Lin, and P. Torr, "BING: Binarized normed gradients for objectness estimation at 300fps," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3286-3293, 2014.

[31] L. Huang and H. Pashler, "A Boolean map theory of visual attention," Psychological Review, vol. 114, no. 3, p. 599, 2007.

[32] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 11, pp. 1254-1259, 1998.

[33] M. S. Livingstone and D. H. Hubel, "Anatomy and physiology of a color system in the primate visual cortex," The Journal of Neuroscience, vol. 4, no. 1, pp. 309-356, 1984.

[34] K. Grauman and T. Darrell, "The pyramid match kernel: Efficient learning with sets of features," The Journal of Machine Learning Research, vol. 8, pp. 725-760, 2007.

[35] N. Wang and D.-Y. Yeung, "Learning a deep compact image representation for visual tracking," in Advances in Neural Information Processing Systems, pp. 809-817, 2013.

[36] M. Danelljan, G. Hager, F. Khan, and M. Felsberg, "Accurate scale estimation for robust visual tracking," in Proceedings of British Machine Vision Conference, 2014.

[37] J. Gao, H. Ling, W. Hu, and J. Xing, "Transfer learning based visual tracking with Gaussian processes regression," in Proceedings of European Conference on Computer Vision, pp. 188-203, 2014.

[38] Z. Kalal, J. Matas, and K. Mikolajczyk, "P-N learning: Bootstrapping binary classifiers by structural constraints," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 49-56, 2010.



Fig. 1. Boolean map representation. For clarity, only the BMRs of a positive and a negative sample are demonstrated. Note that the Boolean maps contain more connected structures than the LAB+HOG representations.

scheme and effective color features. In [22], Wang et al. demonstrate that a simple tracker based on logistic regression with a representation composed of HOG and raw color channels performs favorably on the benchmark dataset [17]. Ma et al. [23] exploit features from hierarchical layers of a convolutional neural network and learn an effective KCF that takes into account both spatial details and semantics of target objects for visual tracking.

In biological vision, it has been suggested that object tracking is carried out by attention mechanisms [24], [25]. Global topological structure, such as connectivity, is used to model tasks related to visual attention [26], [27]. However, none of the aforementioned representations considers topological structure for visual tracking.

In this work, we propose a Boolean map based representation (BMR) that leverages connectivity cues for visual tracking. One case of connectivity is the enclosure topological relationship between the (foreground) figure and ground, which defines the boundaries of figures. Recent gestalt psychological studies suggest that enclosure topological cues play an important role in figure-ground segregation, and they have been successfully applied to saliency detection [28] and measuring objectness [29], [30]. The proposed BMR scheme characterizes target appearance by concatenating multiple layers of Boolean maps at different granularities, obtained by uniformly thresholding HOG and color feature maps. The fine-grained Boolean maps capture locally spatial structural details that are effective for precise localization, and the coarse-grained ones encode global shape information to account for significant appearance variations. The Boolean maps are then concatenated and normalized to form a representation that can be approximated by an explicit feature map. We learn a logistic regression classifier with online update using the BMR scheme to estimate target locations within a particle filter framework. The effectiveness of the proposed algorithm is demonstrated on a large tracking benchmark dataset with 50 challenging videos [17] against state-of-the-art approaches.

The main contributions of this work are summarized as follows:

• We demonstrate that connectivity cues can be effectively used for robust visual tracking.

• We show that the BMR scheme can be approximated as an explicit feature map of the intersection kernel, which can find a nonlinear classification boundary via a linear classifier. In addition, it is easy to train and run detection with this approach for robust visual tracking.

• The proposed tracking algorithm based on the BMR scheme performs favorably in terms of accuracy and robustness to initializations on the benchmark dataset with 50 challenging videos [17] against 35 methods, including state-of-the-art trackers based on hierarchical features from deep networks [23] and multiple experts using entropy minimization (MEEM) [5].

II. TRACKING VIA BOOLEAN MAP REPRESENTATIONS

We present the BMR scheme and a logistic regression classifier with online update for visual tracking.

A. Boolean Map Representation

The proposed image representation is based on recent findings of human visual attention [31], which show that momentary conscious awareness of a scene can be represented by Boolean maps. The Boolean maps are concerned with center-surround contrast, which mimics the sensitivity of neurons either to dark centers on bright surrounds or vice versa [32]. Specifically, we exploit the connectivity cues inside a target measured by the Boolean maps, which can be used to separate the foreground object from the background effectively [26], [28]-[30]. As demonstrated in Figure 1, the connectivity inside a target can be well captured by the Boolean maps at different scales.

Neurobiological studies have demonstrated that the human visual system is sensitive to color and edge orientations [33], which provide useful cues to discriminate foreground objects from the background. In this work, we use color features in the CIE LAB color space and HOG features to represent objects. To extract the perceptually uniform color features, we first normalize each sample x to a canonical size (32×32 in our experiments), then subsample it to half size to reduce appearance variations, and finally transform the sample into the CIE LAB color space, denoted as Φ_col(x) ∈ R^{n_col×n_col×3}


Fig. 2. Right two columns: reconstructed LAB+HOG representations of the target by BMRs in our experiments. Left two columns: the corresponding prototypes shown in Figure 1. Some reconstructed maps with more connected structures than their prototypes are highlighted in yellow.

(n_col = 16 in this work). Furthermore, we leverage the HOG features to capture edge orientation information of a target object, denoted as Φ_hog(x) ∈ R^{n_hog×n_hog×31} (n_hog = 4 in this work). Figure 1 demonstrates that most color and HOG feature maps of the target have center-surround patterns similar to the biologically plausible architecture of primates in [32]. We normalize both Φ_col(x) and Φ_hog(x) to range from 0 to 1, and concatenate them to form a feature vector φ(x) ∈ R^{d×1} with d = 3n_col^2 + 31n_hog^2. The feature vector is rescaled to [0, 1] by

    \phi(x) \leftarrow \frac{\phi(x) - \min(\phi(x))}{\max(\phi(x)) - \min(\phi(x))},    (1)

where max(·) and min(·) denote the maximum and minimum operators, respectively.

Next, φ(x) in (1) is encoded into a set of vectorized Boolean maps B(x) = {b_i(x)}_{i=1}^{c} by

    b_i(x) = \begin{cases} 1, & \phi(x) \succeq \theta_i \\ 0, & \text{otherwise,} \end{cases}    (2)

where θ_i ∼ U(0, 1) is a threshold drawn from a uniform distribution over [0, 1], and the symbol ⪰ denotes elementwise inequality. In this work, we set θ_i = i/c, which amounts to sampling with a fixed step size δ = 1/c; fixed-step sampling is equivalent to uniform sampling in the limit δ → 0 [28]. Hence we have b_1(x) ⪰ b_2(x) ⪰ ... ⪰ b_c(x). It is easy to show that

    0 \le \phi_k(x) - \frac{1}{c}\sum_{j=1}^{c} b_{jk}(x) < \delta,    (3)

where φ_k and b_jk are the k-th entries of φ and b_j, respectively.

Proof: Without loss of generality, assume that iδ ≤ φ_k(x) < (1+i)δ for some i ∈ {0, ..., c}. Then b_{jk}(x) = 1 for all j ≤ i because b_1(x) ⪰ b_2(x) ⪰ ... ⪰ b_c(x), and b_{jk}(x) = 0 for j > i. Therefore \frac{1}{c}\sum_{j=1}^{c} b_{jk}(x) = i\delta, and 0 ≤ φ_k(x) − iδ < (1+i)δ − iδ = δ.

In (3), when δ → 0 (i.e., θ_i ∼ U(0, 1)), we have

    \phi_k(x) = \frac{1}{c}\sum_{j=1}^{c} b_{jk}(x).    (4)

In this work, we set δ = 0.25. Although (4) may not be strictly satisfied, empirical results show that most distinct structures in φ(x) can be reconstructed, as demonstrated in Figure 2. Furthermore, the reconstructed representations contain more connected structures than the original ones (see the ones highlighted in yellow in Figure 2), which shows that the Boolean maps facilitate capturing global geometric information of target objects.
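As a sanity check on (2)-(4), the sketch below encodes a toy feature vector into Boolean maps with the paper's setting c = 4 (δ = 0.25), taking the comparison in (2) as elementwise ≥, and verifies the reconstruction bound (3); the helper name is hypothetical.

```python
import numpy as np

def boolean_maps(phi, c=4):
    """Encode phi (entries in [0, 1]) into the c - 1 non-trivial Boolean maps
    obtained by thresholding at theta = delta, 2*delta, ..., 1 - delta,
    with delta = 1/c (the map at theta = 1 would be identically zero)."""
    thetas = np.arange(1, c) / c                 # 0.25, 0.5, 0.75 for c = 4
    return np.stack([(phi >= t).astype(float) for t in thetas])

phi = np.array([0.0, 0.1, 0.26, 0.5, 0.6, 0.99])
B = boolean_maps(phi, c=4)

# Eq. (3)/(4): averaging the maps recovers phi up to quantization error < delta.
recon = B.sum(axis=0) / 4.0                      # e.g., 0.6 quantizes to 0.5
err = phi - recon
print(err)                                       # all entries in [0, 0.25)
```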

Based on (4), to measure the similarity between two samples x and y, we use the intersection function [34]:

    I(\phi(x), \phi(y)) = \sum_{k=1}^{d} \min(\phi_k(x), \phi_k(y))
                        = \sum_{k=1}^{d} \min\Big(\frac{1}{c}\sum_{j=1}^{c} b_{jk}(x), \ \frac{1}{c}\sum_{j=1}^{c} b_{jk}(y)\Big)
                        = \frac{1}{c} \sum_{k=1}^{d} \sum_{j=1}^{c} \min(b_{jk}(x), b_{jk}(y))
                        = \frac{1}{c} \sum_{k=1}^{d} \sum_{j=1}^{c} b_{jk}(x)\, b_{jk}(y)
                        = \langle b(x), b(y) \rangle,    (5)

where b = [b_1^\top, \ldots, b_c^\top]^\top / \sqrt{c}.

To avoid favoring larger input sets [34], we normalize I(φ(x), φ(y)) in (5) and define the kernel Ī(φ(x), φ(y)) as

    \bar{I}(\phi(x), \phi(y)) = \frac{I(\phi(x), \phi(y))}{\sqrt{I(\phi(x), \phi(x))\, I(\phi(y), \phi(y))}} = \langle \bar{b}(x), \bar{b}(y) \rangle,    (6)

where \bar{b}(\cdot) is an explicit feature map function. In this work, the feature map function is defined by

    \bar{b}(x) = \frac{b(x)}{\|b(x)\|_2},    (7)

where \|\cdot\|_2 denotes the ℓ2 norm. We use \bar{b}(x) to train a linear classifier, which is able to address the nonlinear classification problem in the feature space of φ for visual tracking with favorable performance. The computation of the BMR \bar{b}(x) is summarized in Algorithm 1.
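The identities in (5)-(7) can be verified numerically. The sketch below (hypothetical helper names) relies on the nesting b_1 ⪰ ... ⪰ b_c: for features quantized to multiples of 1/c, the intersection kernel equals the inner product of the stacked Boolean maps, and the normalized kernel (6) equals the dot product of the explicit feature maps (7).

```python
import numpy as np

def boolean_maps(phi, c=4):
    # Stacked Boolean maps at thresholds delta, ..., 1 - delta (cf. Eq. (2)).
    thetas = np.arange(1, c) / c
    return np.concatenate([(phi >= t).astype(float) for t in thetas])

def feature_map(phi, c=4):
    # Explicit feature map of Eq. (7): scale by 1/sqrt(c), then l2-normalize.
    b = boolean_maps(phi, c) / np.sqrt(c)
    return b / np.linalg.norm(b)

rng = np.random.default_rng(0)
c = 4
x, y = rng.random(10), rng.random(10)

# Quantize both vectors to multiples of 1/c, so that Eq. (4) holds exactly.
qx = boolean_maps(x, c).reshape(c - 1, -1).sum(axis=0) / c
qy = boolean_maps(y, c).reshape(c - 1, -1).sum(axis=0) / c

inter = np.minimum(qx, qy).sum()                    # intersection kernel, Eq. (5)
dot = boolean_maps(x, c) @ boolean_maps(y, c) / c   # <b(x), b(y)>, b scaled by 1/sqrt(c)
assert np.isclose(inter, dot)

# Normalized kernel of Eq. (6) vs. dot product of the feature maps of Eq. (7).
norm_inter = inter / np.sqrt(qx.sum() * qy.sum())   # I(x, x) = sum(qx) here
assert np.isclose(norm_inter, feature_map(x, c) @ feature_map(y, c))
```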

B. Learning a Linear Classifier with BMRs

We pose visual tracking as a binary classification problem with local search, in which a linear classifier is learned in the Boolean map feature space to separate the target from the background. Specifically, we use a logistic regressor to learn the classifier for measuring the similarity scores of samples.

Let l_t(x_t^i) ∈ R^2 denote the location of the i-th sample at frame t. We assume that l_t(\hat{x}_t) is the object location, and densely draw samples from D_α = {x : ‖l_t(x) − l_t(\hat{x}_t)‖ < α} within a search radius α centered at the current object location, labeling them as positive samples. Next, we uniformly sample patches from the set D_{ζ,β} = {x : ζ < ‖l_t(x) − l_t(\hat{x}_t)‖ < β} and label them as negative samples. After representing these samples with BMRs, we obtain a set of training data D_t = {(\bar{b}(x_t^i), y_t^i)}_{i=1}^{n_t}, where y_t^i ∈ {+1, −1} is the class label and n_t is the number of samples. The cost function at


Algorithm 1 BMR
Input: Normalized image patch x.
 1) Compute the feature vector φ(x) in (1).
 2) for all entries φ_i(x), i = 1, ..., d, of φ(x) do
 3)   for θ = δ, 2δ, ..., 1 − δ do
 4)     if φ_i(x) > θ
 5)       b_i(x) = 1
 6)     else
 7)       b_i(x) = 0
 8)     end if
 9)   end for
10) end for
11) b(x) ← [b_1(x), ..., b_{cd}(x)]^⊤ / √c
12) b̄(x) ← b(x)/‖b(x)‖_2   % Normalization
Output: BMR b̄(x)

Algorithm 2 BMR-based Tracking
Input: Target state ŝ_{t−1}, classifier parameter vector w_t.
 1) Sample n_p candidate particles {s_t^i}_{i=1}^{n_p} with the motion model p(s_t^i | ŝ_{t−1}) in (12).
 2) For each particle s_t^i, extract the corresponding image patch x_t^i, compute the BMR b̄(x_t^i) by Algorithm 1, and compute the corresponding observation model p(o_t | s_t^i) by (13).
 3) Estimate the optimal state ŝ_t by (12) and obtain the corresponding image patch x̂_t.
 4) if f(x̂_t) < ρ
 5)   Update w_t by iterating (10) until convergence and set w_{t+1} ← w_t.
 6) else
 7)   w_{t+1} ← w_t.
 8) end if
Output: Target state ŝ_t and classifier parameter vector w_{t+1}.

frame t is defined as the negative log-likelihood for logistic regression:

    \ell_t(w) = \frac{1}{n_t} \sum_{i=1}^{n_t} \log\big(1 + \exp(-y_t^i w^\top \bar{b}(x_t^i))\big),    (8)

where w is the classifier parameter vector, and the corresponding classifier is denoted as

    f(x) = \frac{1}{1 + \exp(-w^\top \bar{b}(x))}.    (9)

We use a gradient descent method to minimize \ell_t(w) by iterating

    w \leftarrow w - \frac{\partial \ell_t(w)}{\partial w},    (10)

where \frac{\partial \ell_t(w)}{\partial w} = -\frac{1}{n_t} \sum_{i=1}^{n_t} \frac{\bar{b}(x_t^i)\, y_t^i \exp(-y_t^i w^\top \bar{b}(x_t^i))}{1 + \exp(-y_t^i w^\top \bar{b}(x_t^i))}. In this work, we use the parameter w_{t−1} obtained at frame t − 1 to initialize w in (10) and iterate 20 times for updates.

C. Proposed Tracking Algorithm

We estimate the target states sequentially within a particle filter framework. Given the observation set O_t = {o_i}_{i=1}^{t} up to frame t, the target state ŝ_t is obtained by maximizing the posterior probability

    \hat{s}_t = \arg\max_{s_t} p(s_t | O_t) \propto p(o_t | s_t)\, p(s_t | O_{t-1}),    (11)

where p(s_t | O_{t-1}) = \int p(s_t | s_{t-1})\, p(s_{t-1} | O_{t-1})\, ds_{t-1}, and s_t = [x_t, y_t, s_t] is the target state with translations x_t and y_t and scale s_t. Here p(s_t | s_{t-1}) is a dynamic model that describes the temporal correlation of the target states in two consecutive frames, and p(o_t | s_t) is the observation model that estimates the likelihood of a state given an observation. In the proposed algorithm, we assume that the target state parameters are independent and modeled by three scalar Gaussian distributions between two consecutive frames, i.e., p(s_t | s_{t-1}) = \mathcal{N}(s_t | s_{t-1}, \Sigma), where \Sigma = \mathrm{diag}(\sigma_x, \sigma_y, \sigma_s) is a diagonal covariance matrix whose elements are the standard deviations of the target state parameters. In visual tracking, the posterior probability p(s_t | O_t) in (11) is approximated by a finite set of particles {s_t^i}_{i=1}^{n_p} that are sampled with corresponding importance weights {\pi_t^i}_{i=1}^{n_p}, where \pi_t^i \propto p(o_t | s_t^i). Therefore, (11) can be approximated as

    \hat{s}_t = \arg\max_{\{s_t^i\}_{i=1}^{n_p}} p(o_t | s_t^i)\, p(s_t^i | \hat{s}_{t-1}).    (12)

In our method, the observation model p(o_t | s_t^i) is defined as

    p(o_t | s_t^i) \propto f(x_t^i),    (13)

where f(x_t^i) is the logistic regression classifier defined by (9).

To adapt to target appearance variations while preserving the stable information that helps prevent the tracker from drifting to the background, we update the classifier parameters w in a conservative way. We update w by (10) only when the confidence of the target falls below a threshold ρ. This ensures that the target states always have high confidence scores and alleviates the problem of including noisy samples when updating the classifier [22]. The main steps of the proposed algorithm are summarized in Algorithm 2.
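One tracking step of (12)-(13) can be sketched as follows, with a toy confidence surface standing in for the classifier f applied to the patch at each state, and the motion-model standard deviations [6, 6, 0.01] from the implementation details; all helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def propagate(s_prev, n_particles=400, sigmas=(6.0, 6.0, 0.01)):
    """Draw particles from the Gaussian motion model p(s_t | s_{t-1}),
    with state s = [x, y, scale]."""
    return s_prev + rng.standard_normal((n_particles, 3)) * np.array(sigmas)

def track_step(s_prev, confidence):
    """MAP estimate of Eq. (12): score each particle with the observation
    model p(o_t | s_t^i) proportional to f(x_t^i) of Eq. (13), keep the best."""
    particles = propagate(s_prev)
    scores = np.array([confidence(s) for s in particles])
    return particles[np.argmax(scores)]

# Toy confidence peaked at (120, 80); one step moves the estimate toward it.
target = np.array([120.0, 80.0, 1.0])
confidence = lambda s: np.exp(-np.sum((s[:2] - target[:2]) ** 2) / 200.0)
s0 = np.array([110.0, 90.0, 1.0])
s1 = track_step(s0, confidence)
print(s1)
```

In the full tracker, the conservative update of Algorithm 2 would follow this step: the classifier is retrained only when the best particle's confidence drops below ρ.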

III. EXPERIMENTAL RESULTS

We first present implementation details of the proposed algorithm and discuss the dataset and metrics for performance evaluation. Next, we analyze the empirical results using widely adopted metrics. We then present an ablation study to examine the effectiveness of each key component in the proposed BMR scheme. Finally, we show and analyze some failure cases.

A. Implementation Details

All images are resized to a fixed size of 240×320 pixels [22] for experiments, and each patch is resized to a canonical size of 32×32 pixels. In addition, each canonical patch is subsampled to half size, with n_col = 16, for the color representations. The HOG features are extracted from the canonical patches, which supports both gray and color images, and the size of each HOG feature map is n_hog × n_hog × 31 = 4 × 4 × 31 (as implemented in https://github.com/pdollar/toolbox).


For grayscale videos, the original image patches are used to extract raw intensity and HOG features, and the feature dimension is d = 4×4×31 + 16×16 = 752. For color videos, the image patches are transformed into the CIE LAB color space to extract raw color features, and the original RGB image patches are used to extract HOG features; the corresponding total dimension is d = 4×4×31 + 16×16×3 = 1264. The number of Boolean maps is set to c = 4, the total dimension of the BMRs is 3d = 2256 for gray videos and 3792 for color videos, and the sampling step is δ = 1/c = 0.25. The search radius for positive samples is set to α = 3. The inner search radius for negative samples is set to ζ = 0.3 min(w, h), where w and h are the width and height of the target, respectively, and the outer search radius is β = 100, where the search step is set to 5, which generates a small subset of negative samples. The target state parameters for the particle filter are set to [σ_x, σ_y, σ_s] = [6, 6, 0.01], and the number of particles is set to n_p = 400. The confidence threshold is set to ρ = 0.9. All parameter values are fixed for all sequences, and the source code will be made available to the public. More results and videos are available at http://kaihuazhang.net/bmr/bmr.htm.
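The dimensionalities quoted above can be checked in a few lines; the factor of 3 reflects the assumption, consistent with δ = 0.25, that only the c − 1 = 3 thresholds below 1 yield non-trivial Boolean maps.

```python
# Feature dimensions for the settings in the text.
n_hog, hog_ch = 4, 31        # 4x4 HOG maps with 31 channels
n_col = 16                   # 16x16 intensity (gray) or LAB (color) maps
c = 4                        # delta = 1/c = 0.25

d_gray = n_hog * n_hog * hog_ch + n_col * n_col        # 752
d_color = n_hog * n_hog * hog_ch + n_col * n_col * 3   # 1264

bmr_gray, bmr_color = (c - 1) * d_gray, (c - 1) * d_color
print(d_gray, d_color, bmr_gray, bmr_color)            # 752 1264 2256 3792
```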

B. Dataset and Evaluation Metrics

For performance evaluation, we use the tracking benchmark dataset and code library [17], which includes 29 trackers and 50 fully-annotated videos. In addition, we also add the corresponding results of 6 recent trackers, including DLT [35], DSST [36], KCF [15], TGPR [37], MEEM [5], and HCF [23]. For detailed analysis, the sequences are annotated with 11 attributes based on different challenging factors, including low resolution (LR), in-plane rotation (IPR), out-of-plane rotation (OPR), scale variation (SV), occlusion (OCC), deformation (DEF), background clutter (BC), illumination variation (IV), motion blur (MB), fast motion (FM), and out-of-view (OV).

We quantitatively evaluate the trackers with success and precision plots [17]. Given the tracked bounding box B_T and the ground-truth bounding box B_G, the overlap score is defined as score = Area(B_T ∩ B_G) / Area(B_T ∪ B_G). Hence 0 ≤ score ≤ 1, and a larger value of score indicates better performance of the evaluated tracker. The success plot shows the percentage of frames with score > t over all thresholds t ∈ [0, 1]. Furthermore, the area under the curve (AUC) of each success plot serves as a measure to rank the evaluated trackers. On the other hand, the precision plot shows the percentage of frames whose tracked locations are within a given threshold distance (i.e., 20 pixels in [17]) of the ground truth. Both success and precision plots are used in the one-pass evaluation (OPE), temporal robustness evaluation (TRE), and spatial robustness evaluation (SRE), where OPE reports the average precision or success rate by running the trackers through a test sequence with initialization from the ground-truth position, and TRE as well as SRE measure a tracker's robustness to initialization with temporal and spatial perturbations, respectively [17]. We report the OPE, TRE, and SRE results. For presentation clarity, we only present the top 10 algorithms in each plot.
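The overlap score and success curve above can be sketched as follows; this is an illustrative re-implementation of the metrics, not the benchmark's own code, and the function names are ours:

```python
import numpy as np

def overlap_score(bt, bg):
    """Overlap between two boxes given as (x, y, w, h): Area(BT ∩ BG) / Area(BT ∪ BG)."""
    x1, y1 = max(bt[0], bg[0]), max(bt[1], bg[1])
    x2 = min(bt[0] + bt[2], bg[0] + bg[2])
    y2 = min(bt[1] + bt[3], bg[1] + bg[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = bt[2] * bt[3] + bg[2] * bg[3] - inter
    return inter / union

def success_auc(scores, thresholds=np.linspace(0, 1, 101)):
    """Fraction of frames with score > t at each threshold, averaged (AUC-style)."""
    curve = np.array([(scores > t).mean() for t in thresholds])
    return curve.mean()
```

For example, two 2×2 boxes offset by one pixel in each direction overlap with score 1/7, and identical boxes score 1.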

C. Empirical Results

1) Overall Performance: Figure 3 shows the overall performance of the top 10 trackers in terms of success and precision plots. The BMR-based tracking algorithm ranks first in success rate for OPE and second for TRE and SRE. Furthermore, the BMR-based method ranks third in precision rate for OPE, TRE, and SRE. Overall, the proposed BMR-based tracker performs favorably against the state-of-the-art methods in terms of all metrics except for MEEM [5] and HCF [23]. The MEEM tracker exploits a multi-expert restoration scheme to handle the drift problem, which combines a tracker and its historical snapshots as experts. In contrast, even using only a logistic regression classifier without any restoration strategy, the proposed BMR-based method performs well against MEEM in terms of most metrics (i.e., the success rates of the BMR-based method outperform those of the MEEM scheme, while the precision rates of the BMR-based method are comparable to those of the MEEM scheme), which shows the effectiveness of the proposed representation scheme for visual tracking. In addition, the HCF method is based on deep learning, which leverages complex hierarchical convolutional features learned offline from a large dataset and correlation filters for visual tracking. Notwithstanding, the proposed BMR-based algorithm performs comparably against HCF in terms of success rates on all metrics.

2) Attribute-based Performance: To demonstrate the strengths and weaknesses of BMR, we further evaluate the 35 trackers on videos with the 11 attributes categorized in [17].

Tables I and II summarize the success and precision scores of OPE with different attributes. The BMR-based method ranks within the top 3 for most attributes. Specifically, in terms of the success rate of OPE, the BMR-based method ranks first on 4 of the 11 attributes and second on 6 of the 11 attributes. In the sequences with the BC attribute, the BMR-based method ranks third, and its score is close to that of the second-ranked MEEM scheme (0.555 vs. 0.569). For the precision scores of OPE, the BMR-based method ranks second on 4 of the 11 attributes and third on 3 of the 11 attributes. In the sequences with the OV attribute, the BMR-based tracker ranks first, and for the videos with the IPR and BC attributes, the proposed tracking algorithm ranks fourth with performance comparable to the third-ranked DSST and KCF methods.

Tables III and IV show the results of TRE with different attributes. The BMR-based method ranks within the top 3 for most attributes. In terms of success rates, the BMR-based method ranks first on 2 attributes, second on 3 attributes, and third on 6 attributes. In terms of precision rates, the BMR-based tracker ranks third on 7 attributes, and first and second on the OV and OCC attributes, respectively. Furthermore, for other attributes such as LR and BC, the BMR-based tracking algorithm ranks fourth, but its scores are close to those of the third-ranked MEEM and KCF methods (0.581 vs. 0.598 and 0.772 vs. 0.776).

Tables V and VI show the results of SRE with different attributes. In terms of success rates, the rankings of the BMR-based method are similar to those based on TRE except for the IPR and OPR attributes. Among them, the BMR-based


Fig. 3. Success and precision plots of OPE, TRE, and SRE by the top 10 trackers. The trackers are ranked by the AUC scores (shown in the legends) when success rates are used, or by precision scores at the threshold of 20 pixels.

TABLE I: Success scores of OPE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute BMR HCF [23] MEEM [5] KCF [15] DSST [36] TGPR [37] SCM [20] Struck [14] TLD [38] ASLA [19] DLT [35]

LR (4)    0.409  0.557  0.360  0.310  0.352  0.370  0.279  0.372  0.309  0.157  0.256
IPR (31)  0.557  0.582  0.535  0.497  0.532  0.479  0.458  0.444  0.416  0.425  0.383
OPR (39)  0.590  0.587  0.558  0.496  0.491  0.485  0.470  0.432  0.420  0.422  0.393
SV (28)   0.586  0.531  0.498  0.427  0.451  0.418  0.518  0.425  0.421  0.452  0.458
OCC (29)  0.615  0.606  0.552  0.513  0.480  0.484  0.487  0.413  0.402  0.376  0.384
DEF (19)  0.594  0.626  0.560  0.533  0.474  0.510  0.448  0.393  0.378  0.372  0.330
BC (21)   0.555  0.623  0.569  0.533  0.492  0.522  0.450  0.458  0.345  0.408  0.327
IV (25)   0.551  0.560  0.533  0.494  0.506  0.484  0.473  0.428  0.399  0.429  0.392
MB (12)   0.559  0.616  0.541  0.499  0.458  0.434  0.298  0.433  0.404  0.258  0.329
FM (17)   0.559  0.578  0.553  0.461  0.433  0.396  0.296  0.462  0.417  0.247  0.353
OV (6)    0.616  0.575  0.606  0.550  0.490  0.442  0.361  0.459  0.457  0.312  0.409

TABLE II: Precision scores of OPE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute BMR HCF [23] MEEM [5] KCF [15] DSST [36] TGPR [37] SCM [20] Struck [14] TLD [38] ASLA [19] DLT [35]

LR (4)    0.517  0.897  0.490  0.379  0.534  0.538  0.305  0.545  0.349  0.156  0.303
IPR (31)  0.776  0.868  0.800  0.725  0.780  0.675  0.597  0.617  0.584  0.511  0.510
OPR (39)  0.819  0.869  0.840  0.730  0.732  0.678  0.618  0.597  0.596  0.518  0.527
SV (28)   0.803  0.880  0.785  0.680  0.740  0.620  0.672  0.639  0.606  0.552  0.606
OCC (29)  0.846  0.877  0.799  0.749  0.725  0.675  0.640  0.564  0.563  0.460  0.495
DEF (19)  0.802  0.881  0.846  0.741  0.657  0.691  0.586  0.521  0.512  0.445  0.512
BC (21)   0.742  0.885  0.797  0.752  0.691  0.717  0.578  0.585  0.428  0.496  0.440
IV (25)   0.742  0.844  0.766  0.729  0.741  0.671  0.594  0.558  0.537  0.517  0.492
MB (12)   0.755  0.844  0.715  0.650  0.603  0.537  0.339  0.551  0.518  0.278  0.427
FM (17)   0.758  0.790  0.742  0.602  0.562  0.493  0.333  0.604  0.551  0.253  0.435
OV (6)    0.773  0.695  0.727  0.649  0.533  0.505  0.429  0.539  0.576  0.333  0.505

tracker ranks third based on SRE and second based on TRE. Furthermore, although the MEEM method ranks higher than the BMR-based tracker on most attributes, the differences in scores are within 1%. In terms of precision rates, the BMR-based algorithm ranks within the top 3 for most attributes except for the LR, DEF, and IV attributes.

The AUC score of the success rate measures the overall performance of each tracking method [17]. Figure 3 shows that the BMR-based method achieves better results in terms of success rates than precision rates for all metrics (OPE, SRE, TRE) and attributes. The tracking performance can be attributed to two factors. First, the proposed method exploits a logistic regression classifier with explicit feature maps, which


TABLE III: Success scores of TRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute BMR HCF [23] MEEM [5] KCF [15] DSST [36] TGPR [37] SCM [20] Struck [14] TLD [38] ASLA [19] DLT [35]

LR (4)    0.444  0.520  0.424  0.382  0.403  0.443  0.304  0.456  0.299  0.278  0.324
IPR (31)  0.562  0.591  0.558  0.520  0.515  0.514  0.453  0.473  0.406  0.451  0.423
OPR (39)  0.578  0.595  0.572  0.531  0.507  0.523  0.480  0.477  0.425  0.465  0.428
SV (28)   0.564  0.544  0.517  0.488  0.473  0.468  0.496  0.446  0.418  0.487  0.448
OCC (29)  0.585  0.610  0.566  0.547  0.519  0.520  0.502  0.462  0.426  0.444  0.426
DEF (19)  0.599  0.651  0.611  0.571  0.548  0.577  0.515  0.500  0.425  0.466  0.399
BC (21)   0.575  0.631  0.577  0.565  0.518  0.530  0.469  0.478  0.372  0.445  0.366
IV (25)   0.555  0.597  0.564  0.528  0.529  0.518  0.475  0.486  0.402  0.468  0.427
MB (12)   0.537  0.594  0.553  0.493  0.472  0.483  0.290  0.485  0.388  0.296  0.349
FM (17)   0.516  0.560  0.542  0.456  0.429  0.461  0.282  0.464  0.392  0.285  0.350
OV (6)    0.593  0.557  0.581  0.539  0.505  0.440  0.344  0.417  0.434  0.325  0.403

TABLE IV: Precision scores of TRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute BMR HCF [23] MEEM [5] KCF [15] DSST [36] TGPR [37] SCM [20] Struck [14] TLD [38] ASLA [19] DLT [35]

LR (4)    0.581  0.750  0.589  0.501  0.574  0.602  0.350  0.628  0.376  0.325  0.391
IPR (31)  0.767  0.851  0.802  0.728  0.725  0.716  0.581  0.650  0.569  0.582  0.572
OPR (39)  0.789  0.859  0.826  0.749  0.719  0.728  0.617  0.660  0.597  0.605  0.584
SV (28)   0.769  0.840  0.787  0.727  0.717  0.676  0.633  0.652  0.600  0.634  0.594
OCC (29)  0.791  0.854  0.788  0.758  0.726  0.705  0.633  0.631  0.579  0.560  0.550
DEF (19)  0.798  0.889  0.854  0.757  0.723  0.765  0.635  0.655  0.571  0.571  0.556
BC (21)   0.772  0.874  0.793  0.776  0.697  0.721  0.600  0.622  0.488  0.575  0.517
IV (25)   0.747  0.851  0.792  0.729  0.727  0.693  0.585  0.643  0.543  0.584  0.572
MB (12)   0.720  0.785  0.724  0.626  0.597  0.607  0.323  0.617  0.491  0.332  0.450
FM (17)   0.681  0.738  0.710  0.578  0.532  0.582  0.302  0.580  0.487  0.305  0.432
OV (6)    0.719  0.692  0.692  0.643  0.587  0.514  0.371  0.484  0.485  0.339  0.470

TABLE V: Success scores of SRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute BMR HCF [23] MEEM [5] KCF [15] DSST [36] TGPR [37] SCM [20] Struck [14] TLD [38] ASLA [19] DLT [35]

LR (4)    0.352  0.488  0.374  0.289  0.326  0.332  0.254  0.360  0.305  0.213  0.243
IPR (31)  0.487  0.537  0.494  0.450  0.460  0.438  0.399  0.410  0.380  0.405  0.357
OPR (39)  0.510  0.536  0.514  0.445  0.439  0.455  0.396  0.409  0.387  0.404  0.368
SV (28)   0.524  0.492  0.463  0.401  0.413  0.396  0.438  0.395  0.384  0.440  0.402
OCC (29)  0.524  0.543  0.510  0.445  0.434  0.449  0.398  0.405  0.384  0.381  0.354
DEF (19)  0.492  0.566  0.516  0.469  0.434  0.504  0.358  0.398  0.357  0.386  0.322
BC (21)   0.500  0.569  0.517  0.483  0.451  0.483  0.387  0.408  0.334  0.410  0.303
IV (25)   0.486  0.516  0.490  0.442  0.446  0.438  0.389  0.396  0.350  0.405  0.347
MB (12)   0.503  0.565  0.513  0.425  0.389  0.420  0.266  0.451  0.385  0.256  0.312
FM (17)   0.504  0.534  0.518  0.415  0.384  0.412  0.269  0.464  0.392  0.285  0.350
OV (6)    0.578  0.526  0.575  0.455  0.426  0.391  0.335  0.421  0.407  0.316  0.314

TABLE VI: Precision scores of SRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute BMR HCF [23] MEEM [5] KCF [15] DSST [36] TGPR [37] SCM [20] Struck [14] TLD [38] ASLA [19] DLT [35]

LR (4)    0.476  0.818  0.511  0.377  0.543  0.501  0.305  0.504  0.363  0.263  0.299
IPR (31)  0.704  0.839  0.752  0.667  0.704  0.648  0.546  0.592  0.554  0.556  0.503
OPR (39)  0.732  0.828  0.774  0.666  0.680  0.669  0.547  0.595  0.560  0.560  0.525
SV (28)   0.752  0.832  0.732  0.632  0.696  0.599  0.598  0.607  0.558  0.601  0.562
OCC (29)  0.735  0.815  0.730  0.662  0.671  0.649  0.540  0.568  0.516  0.514  0.483
DEF (19)  0.684  0.835  0.757  0.677  0.630  0.715  0.475  0.547  0.505  0.516  0.467
BC (21)   0.702  0.851  0.734  0.693  0.655  0.698  0.521  0.555  0.451  0.555  0.439
IV (25)   0.677  0.809  0.707  0.652  0.681  0.630  0.509  0.556  0.480  0.544  0.472
MB (12)   0.686  0.807  0.691  0.567  0.532  0.561  0.309  0.587  0.521  0.310  0.388
FM (17)   0.685  0.748  0.694  0.545  0.505  0.544  0.308  0.577  0.496  0.291  0.397
OV (6)    0.719  0.644  0.690  0.533  0.504  0.451  0.386  0.455  0.463  0.355  0.360

efficiently determines the nonlinear decision boundary through online training. Second, the online classifier parameter update scheme in (10) facilitates recovering from tracking drift.

Figure 4 shows sampled tracking results from six long sequences (each with more than 1,000 frames). The total number of frames in these sequences is 11,134, which accounts for about 38.4% of the total number of frames (about 29,000) in the benchmark, and hence the performance on

these sequences plays an important role in performance evaluation. For clear presentation, only the results of the top-performing BMR, HCF, and MEEM methods are shown. In all sequences, the BMR-based tracker is able to track the targets stably over almost all frames. However, the HCF scheme drifts away from the target objects after a few frames in the sylvester (#1178, #1285, #1345) and lemming (#386, #1137, #1336) sequences. The MEEM method drifts


Fig. 4. Screenshots of sampled results from six long sequences: sylvester, mhyang, dog1, lemming, liquor, and doll.

to the background when severe occlusions occur in the liquor sequence (#508, #728, #775). To further demonstrate the results over all frames, Figure 5 shows the overlap score of each frame. Overall, the BMR-based tracker performs well against the HCF and MEEM methods in most frames of these sequences.

D. Analysis of BMR

To demonstrate the effectiveness of BMRs, we remove the Boolean map component from the proposed tracking algorithm and only leverage the LAB+HOG representations for visual tracking. In addition, we use KCF as a baseline, as it adopts the same HOG representations as the proposed tracking method. Figure 6 shows quantitative comparisons on the benchmark dataset. Without the proposed Boolean maps, the AUC score of the success rate in OPE of the proposed method is reduced by 7.5%. For TRE and SRE, the AUC scores of the proposed method are reduced by 3.9% and 4.7%, respectively, without the Boolean map component. It is worth noticing that the proposed method without the Boolean maps

still outperforms KCF in terms of all success-rate metrics, which shows the effectiveness of the LAB color features in BMR. These experimental results show that the BMRs in the proposed method play a key role in robust visual tracking.

E. Failure Cases

Figure 7 shows failure cases of the proposed BMR-based method in two sequences, singer2 and motorRolling. In the singer2 sequence, the foreground object and background scene are similar due to the dim stage lighting at the beginning (#10, #25). The HCF, MEEM, and proposed methods all drift to the background. Furthermore, as the target in the motorRolling sequence undergoes 360-degree in-plane rotation in early frames (#35), the MEEM and proposed methods do not adapt well to the drastic appearance variations due to limited training samples. In contrast, only the HCF tracker performs well in this sequence because it leverages dense sampling and high-dimensional convolutional features.


Fig. 5. Overlap score plots of the six long sequences shown in Figure 4.

IV. CONCLUSIONS

In this paper, we propose a Boolean map based representation that exploits connectivity cues for visual tracking. In the BMR scheme, the HOG and raw color feature maps are decomposed into a set of Boolean maps by uniformly thresholding the respective channels. These Boolean maps are concatenated and normalized to form a robust representation, which approximates an explicit feature map of the intersection kernel. A logistic regression classifier with the explicit feature map is trained in an online manner to determine the nonlinear decision boundary for visual tracking. Extensive evaluations on a large tracking benchmark dataset demonstrate that the proposed tracking algorithm performs favorably against state-of-the-art algorithms in terms of accuracy and robustness.

REFERENCES

[1] X. Li, W. Hu, C. Shen, Z. Zhang, A. Dick, and A. van den Hengel, "A survey of appearance models in visual object tracking," ACM Transactions on Intelligent Systems and Technology, vol. 4, no. 4, p. 58, 2013.
[2] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in International Joint Conference on Artificial Intelligence, vol. 81, pp. 674–679, 1981.
[3] I. Matthews, T. Ishikawa, and S. Baker, "The template update problem," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 6, pp. 810–815, 2004.
[4] J. Henriques, R. Caseiro, P. Martins, and J. Batista, "Exploiting the circulant structure of tracking-by-detection with kernels," in Proceedings of European Conference on Computer Vision, pp. 702–715, 2012.
[5] J. Zhang, S. Ma, and S. Sclaroff, "MEEM: Robust tracking via multiple experts using entropy minimization," in Proceedings of European Conference on Computer Vision, pp. 188–203, 2014.
[6] M. J. Black and A. D. Jepson, "EigenTracking: Robust matching and tracking of articulated objects using a view-based representation," International Journal of Computer Vision, vol. 26, no. 1, pp. 63–84, 1998.
[7] D. Ross, J. Lim, R. Lin, and M.-H. Yang, "Incremental learning for robust visual tracking," International Journal of Computer Vision, vol. 77, no. 1, pp. 125–141, 2008.
[8] X. Mei and H. Ling, "Robust visual tracking and vehicle classification via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 11, pp. 2259–2272, 2011.
[9] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja, "Robust visual tracking via multi-task sparse learning," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2042–2049, 2012.
[10] D. Wang, H. Lu, and M.-H. Yang, "Online object tracking with sparse prototypes," IEEE Transactions on Image Processing, vol. 22, no. 1, pp. 314–325, 2013.
[11] A. Adam, E. Rivlin, and I. Shimshoni, "Robust fragments-based tracking using the integral histogram," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 798–805, 2006.
[12] S. He, Q. Yang, R. Lau, J. Wang, and M.-H. Yang, "Visual tracking via locality sensitive histograms," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2427–2434, 2013.
[13] B. Babenko, M.-H. Yang, and S. Belongie, "Robust object tracking with online multiple instance learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1619–1632, 2011.
[14] S. Hare, A. Saffari, and P. H. Torr, "Struck: Structured output tracking with kernels," in Proceedings of the IEEE International Conference on Computer Vision, pp. 263–270, 2011.
[15] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "High-speed tracking with kernelized correlation filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583–596, 2015.
[16] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886–893, 2005.
[17] Y. Wu, J. Lim, and M.-H. Yang, "Online object tracking: A benchmark," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2411–2418, 2013.
[18] J. Kwon and K. M. Lee, "Tracking of a non-rigid object via patch-based dynamic appearance modeling and adaptive basin hopping Monte Carlo sampling," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1208–1215, 2009.
[19] X. Jia, H. Lu, and M.-H. Yang, "Visual tracking via adaptive structural local sparse appearance model," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1822–1829, 2012.


Fig. 6. Success and precision plots of OPE, TRE, and SRE for BMR, BMR with only the LAB+HOG representations, and KCF (KCF is used as a baseline for comparison).

Fig. 7. Failure cases of the BMR-based tracker in the singer2 and motorRolling sequences. The results of HCF and MEEM are also illustrated.

[20] W. Zhong, H. Lu, and M.-H. Yang, "Robust object tracking via sparsity-based collaborative model," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1838–1845, 2012.
[21] Y. Li and J. Zhu, "A scale adaptive kernel correlation filter tracker with feature integration," in European Conference on Computer Vision Workshops, pp. 254–265, 2014.
[22] N. Wang, J. Shi, D.-Y. Yeung, and J. Jia, "Understanding and diagnosing visual tracking systems," in Proceedings of the IEEE International Conference on Computer Vision, pp. 3101–3109, 2015.
[23] C. Ma, J.-B. Huang, X. Yang, and M.-H. Yang, "Hierarchical convolutional features for visual tracking," in Proceedings of the IEEE International Conference on Computer Vision, pp. 3074–3082, 2015.
[24] R. Allen, P. Mcgeorge, D. Pearson, and A. B. Milne, "Attention and expertise in multiple target tracking," Applied Cognitive Psychology, vol. 18, no. 3, pp. 337–347, 2004.
[25] P. Cavanagh and G. A. Alvarez, "Tracking multiple targets with multifocal attention," Trends in Cognitive Sciences, vol. 9, no. 7, pp. 349–354, 2005.
[26] L. Chen, "Topological structure in visual perception," Science, vol. 218, p. 699, 1982.
[27] S. E. Palmer, Vision Science: Photons to Phenomenology, vol. 1, MIT Press, 1999.
[28] J. Zhang and S. Sclaroff, "Saliency detection: A Boolean map approach," in Proceedings of the IEEE International Conference on Computer Vision, pp. 153–160, 2013.
[29] B. Alexe, T. Deselaers, and V. Ferrari, "Measuring the objectness of image windows," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2189–2202, 2012.
[30] M.-M. Cheng, Z. Zhang, W.-Y. Lin, and P. Torr, "BING: Binarized normed gradients for objectness estimation at 300fps," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3286–3293, 2014.
[31] L. Huang and H. Pashler, "A Boolean map theory of visual attention," Psychological Review, vol. 114, no. 3, p. 599, 2007.
[32] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 11, pp. 1254–1259, 1998.
[33] M. S. Livingstone and D. H. Hubel, "Anatomy and physiology of a color system in the primate visual cortex," The Journal of Neuroscience, vol. 4, no. 1, pp. 309–356, 1984.
[34] K. Grauman and T. Darrell, "The pyramid match kernel: Efficient learning with sets of features," The Journal of Machine Learning Research, vol. 8, pp. 725–760, 2007.
[35] N. Wang and D.-Y. Yeung, "Learning a deep compact image representation for visual tracking," in Advances in Neural Information Processing Systems, pp. 809–817, 2013.
[36] M. Danelljan, G. Hager, F. Khan, and M. Felsberg, "Accurate scale estimation for robust visual tracking," in Proceedings of British Machine Vision Conference, 2014.
[37] J. Gao, H. Ling, W. Hu, and J. Xing, "Transfer learning based visual tracking with Gaussian processes regression," in Proceedings of European Conference on Computer Vision, pp. 188–203, 2014.
[38] Z. Kalal, J. Matas, and K. Mikolajczyk, "P-N learning: Bootstrapping binary classifiers by structural constraints," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 49–56, 2010.


Fig. 2. Right two columns: reconstructed LAB+HOG representations of the target by BMRs in our experiments. Left two columns: the corresponding prototypes shown in Figure 1. Some reconstructed representations with more connected structures than their prototypes are highlighted in yellow.

(n_col = 16 in this work). Furthermore, we leverage the HOG features to capture edge orientation information of a target object, denoted as Φ_hog(x) ∈ R^{n_hog×n_hog×31} (n_hog = 4 in this work). Figure 1 demonstrates that most color and HOG feature maps of the target exhibit center-surround patterns that are similar to the biologically plausible architecture of primates in [32]. We normalize both Φ_col(x) and Φ_hog(x) to range from 0 to 1, and concatenate Φ_col(x) and Φ_hog(x) to form a feature vector φ(x) ∈ R^{d×1} with d = 3n_col² + 31n_hog². The feature vector is rescaled to [0, 1] by

    φ(x) ← (φ(x) − min(φ(x))) / (max(φ(x)) − min(φ(x))),   (1)

where max(·) and min(·) denote the maximal and minimal operators, respectively.

Next, φ(x) in (1) is encoded into a set of vectorized Boolean maps B(x) = {b_i(x)}_{i=1}^{c} by

    b_i(x) = { 1,  φ(x) ≻ θ_i
             { 0,  otherwise,   (2)

where θ_i ∼ U(0, 1) is a threshold drawn from a uniform distribution over [0, 1], and ≻ denotes elementwise inequality. In this work, we set θ_i = iδ, i.e., the thresholds are sampled at a fixed step size δ = 1/c, and fixed-step sampling is equivalent to uniform sampling in the limit δ → 0 [28]. Hence we have b_1(x) ⪰ b_2(x) ⪰ ... ⪰ b_c(x). It is easy to show that

    0 ≤ φ_k(x) − (1/c) Σ_{j=1}^{c} b_jk(x) < δ,   (3)

where φ_k and b_jk are the k-th entries of φ and b_j, respectively.

Proof: Without loss of generality, assume that iδ ≤ φ_k(x) < (i+1)δ for some i ∈ {0, ..., c}. As such, we have b_jk(x) = 1 for all j ≤ i because b_1(x) ⪰ b_2(x) ⪰ ... ⪰ b_c(x), and b_jk(x) = 0 for j > i. Therefore, (1/c) Σ_{j=1}^{c} b_jk(x) = iδ, and 0 ≤ φ_k(x) − (1/c) Σ_{j=1}^{c} b_jk(x) < (i+1)δ − iδ = δ. ∎

In (3), when δ → 0 (i.e., θ_i ∼ U(0, 1)), we have

    φ_k(x) = (1/c) Σ_{j=1}^{c} b_jk(x).   (4)

In this work, we set δ = 0.25. Although (4) may not be strictly satisfied, empirical results show that most distinct structures in φ(x) can be reconstructed, as demonstrated in Figure 2. Furthermore, the reconstructed representations contain more connected structures than the original ones (see the examples highlighted in yellow in Figure 2), which shows that the Boolean maps facilitate capturing global geometric information of target objects.
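The encoding in (2) and the reconstruction bound (3) can be checked numerically; a minimal sketch with our own names, using c = 4 thresholds as in the paper:

```python
import numpy as np

def boolean_maps(phi, c=4):
    """Encode a [0, 1]-valued feature vector into c Boolean maps, as in Eq. (2)."""
    thresholds = np.arange(1, c + 1) / c               # theta_i = i * delta
    return (phi[None, :] > thresholds[:, None]).astype(float)

phi = np.array([0.05, 0.30, 0.55, 0.80, 0.95])         # values off the grid points
B = boolean_maps(phi)
recon = B.mean(axis=0)                                 # (1/c) * sum_j b_j, Eq. (4)
# The reconstruction error is non-negative and bounded by delta = 0.25, Eq. (3)
assert np.all((phi - recon >= 0) & (phi - recon < 0.25))
```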

Based on (4), to measure the similarity between two samples x and y, we use the intersection function [34]:

    I(φ(x), φ(y)) = Σ_{k=1}^{d} min(φ_k(x), φ_k(y))
                  = Σ_{k=1}^{d} min( (1/c) Σ_{j=1}^{c} b_jk(x), (1/c) Σ_{j=1}^{c} b_jk(y) )
                  = (1/c) Σ_{k=1}^{d} Σ_{j=1}^{c} min(b_jk(x), b_jk(y))
                  = (1/c) Σ_{k=1}^{d} Σ_{j=1}^{c} b_jk(x) b_jk(y)
                  = ⟨b(x), b(y)⟩,   (5)

where b = [b_1^T, ..., b_c^T]^T / √c.

To avoid favoring larger input sets [34], we normalize I(φ(x), φ(y)) in (5) and define the kernel Î(φ(x), φ(y)) as

    Î(φ(x), φ(y)) = I(φ(x), φ(y)) / √( I(φ(x), φ(x)) I(φ(y), φ(y)) ) = ⟨b̂(x), b̂(y)⟩,   (6)

where b̂(·) is an explicit feature map function. In this work, the feature map function is defined by

    b̂(x) = b(x) / |b(x)|_2,   (7)

where |·|_2 denotes the ℓ2 norm. We use b̂(x) to train a linear classifier, which is able to address the nonlinear classification problem in the feature space of φ for visual tracking with favorable performance. The proposed tracking algorithm based on BMR is summarized in Algorithm 1.

B. Learning a Linear Classifier with BMRs

We pose visual tracking as a binary classification problem with local search, in which a linear classifier is learned in the Boolean map feature space to separate the target from the background. Specifically, we use a logistic regressor to learn the classifier for measuring the similarity scores of samples.

Let l_t(x_t^i) ∈ R² denote the location of the i-th sample at frame t. We assume that l_t(x_t) is the object location, densely draw samples D_α = {x | ‖l_t(x) − l_t(x_t)‖ < α} within a search radius α centered at the current object location, and label them as positive samples. Next, we uniformly sample some patches from the set D_{ζ,β} = {x | ζ < ‖l_t(x) − l_t(x_t)‖ < β} and label them as negative samples. After representing these samples with BMRs, we obtain a set of training data D_t = {(b̂(x_t^i), y_t^i)}_{i=1}^{n_t}, where y_t^i ∈ {+1, −1} is the class label and n_t is the number of samples. The cost function at


Algorithm 1 BMR

Input: Normalized image patch x
 1) Compute the feature vector φ(x) in (1)
 2) for all entries φ_i(x), i = 1, ..., d, of φ(x) do
 3)   for θ = δ, 2δ, ..., 1 − δ do
 4)     if φ_i(x) > θ
 5)       b_i(x) = 1
 6)     else
 7)       b_i(x) = 0
 8)     end if
 9)   end for
10) end for
11) b(x) ← [b_1(x), ..., b_cd(x)]^T / √c
12) b̂(x) ← b(x) / |b(x)|_2  % Normalization
Output: BMR b̂(x)
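A direct NumPy transcription of Algorithm 1 might look as follows (a sketch, not the authors' code; with c = 4 the thresholds δ, 2δ, 3δ yield the 3d-dimensional representation quoted in the implementation details):

```python
import numpy as np

def bmr(features, c=4):
    """Sketch of Algorithm 1: normalized feature vector -> BMR b_hat(x)."""
    phi = features.astype(float)
    phi = (phi - phi.min()) / (phi.max() - phi.min())       # Eq. (1)
    delta = 1.0 / c
    thresholds = np.arange(delta, 1.0, delta)               # delta, ..., 1 - delta
    b = (phi[None, :] > thresholds[:, None]).astype(float)  # Eq. (2), steps 2-10
    b = b.ravel() / np.sqrt(c)                              # step 11
    return b / np.linalg.norm(b)                            # step 12, Eq. (7)

phi = np.random.rand(752)          # gray-video feature dimension d = 752
rep = bmr(phi)
assert rep.shape == (3 * 752,)     # 3d = 2256, as stated in Section III-A
assert np.isclose(np.linalg.norm(rep), 1.0)
```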

Algorithm 2 BMR-based Tracking

Input: Target state s_{t−1}, classifier parameter vector w_t
1) Sample n_p candidate particles {s_t^i}_{i=1}^{n_p} with the motion model p(s_t^i | s_{t−1}) in (12)
2) For each particle s_t^i, extract the corresponding image patch x_t^i, compute the BMR b̂(x_t^i) by Algorithm 1, and compute the corresponding observation model p(o_t | s_t^i) by (13)
3) Estimate the optimal state s_t by (12) and obtain the corresponding image patch x_t
4) if f(x_t) < ρ
5)   Update w_t by iterating (10) until convergence, and set w_{t+1} ← w_t
6) else
7)   w_{t+1} ← w_t
8) end if
Output: Target state s_t and classifier parameter vector w_{t+1}

frame t is defined as the negative log-likelihood for logistic regression:

    ℓ_t(w) = (1/n_t) Σ_{i=1}^{n_t} log(1 + exp(−y_t^i w^T b̂(x_t^i))),   (8)

where w is the classifier parameter vector, and the corresponding classifier is denoted as

    f(x) = 1 / (1 + exp(−w^T b̂(x))).   (9)

We use a gradient descent method to minimize ℓ_t(w) by iterating

    w ← w − ∂ℓ_t(w)/∂w,   (10)

where ∂ℓ_t(w)/∂w = −(1/n_t) Σ_{i=1}^{n_t} b̂(x_t^i) y_t^i exp(−y_t^i w^T b̂(x_t^i)) / (1 + exp(−y_t^i w^T b̂(x_t^i))). In this work, we use the parameter w_{t−1} obtained at frame t−1 to initialize w in (10) and iterate 20 times for updates.
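The update (10) is plain gradient descent on the loss (8); a compact sketch with synthetic data (the unit step size mirrors (10), which shows no explicit learning rate, and all names are ours):

```python
import numpy as np

def update_w(w, B, y, iters=20):
    """Gradient descent on the logistic loss (8); one BMR per row of B,
    labels y in {+1, -1}. 20 iterations, as in the text."""
    n = len(y)
    for _ in range(iters):
        m = np.exp(-y * (B @ w))                   # exp(-y_i w^T b(x_i))
        grad = -(B.T @ (y * m / (1.0 + m))) / n    # the gradient below Eq. (10)
        w = w - grad                               # unit step size, as in Eq. (10)
    return w

rng = np.random.default_rng(0)
B = rng.random((40, 8))
y = np.where(rng.random(40) > 0.5, 1.0, -1.0)
loss = lambda w: np.mean(np.log1p(np.exp(-y * (B @ w))))
w0 = np.zeros(8)
w1 = update_w(w0, B, y)
assert loss(w1) <= loss(w0)                        # the update reduces the loss
```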

C. Proposed Tracking Algorithm

We estimate the target states sequentially within a particle filter framework. Given the observation set O_t = {o_i}_{i=1}^{t} up to frame t, the target state s_t is obtained by maximizing the posterior probability

    s_t = argmax_{s_t} p(s_t | O_t) ∝ p(o_t | s_t) p(s_t | O_{t−1}),   (11)

where p(s_t | O_{t−1}) = ∫ p(s_t | s_{t−1}) p(s_{t−1} | O_{t−1}) ds_{t−1}, and s_t = [x_t, y_t, s_t] is the target state with translations x_t and y_t and scale s_t. p(s_t | s_{t−1}) is a dynamic model that describes the temporal correlation of the target states in two consecutive frames, and p(o_t | s_t) is the observation model that estimates the likelihood of a state given an observation. In the proposed algorithm, we assume that the target state parameters are independent and modeled by three scalar Gaussian distributions between two consecutive frames, i.e., p(s_t | s_{t−1}) = N(s_t | s_{t−1}, Σ), where Σ = diag(σ_x, σ_y, σ_s) is a diagonal covariance matrix whose elements are the standard deviations of the target state parameters. In visual tracking, the posterior probability p(s_t | O_t) in (11) is approximated by a finite set of particles {s_t^i}_{i=1}^{n_p} that are sampled with corresponding importance weights {π_t^i}_{i=1}^{n_p}, where π_t^i ∝ p(o_t | s_t^i). Therefore, (11) can be approximated as

    s_t = argmax_{s_t^i, i=1,...,n_p} p(o_t | s_t^i) p(s_t^i | s_{t−1}).   (12)

In our method, the observation model p(o_t | s_t^i) is defined as

    p(o_t | s_t^i) ∝ f(x_t^i),   (13)

where f(x_t^i) is the logistic regression classifier defined in (9). To adapt to target appearance variations while preserving the stable information that helps prevent the tracker from drifting to the background, we update the classifier parameters w in a conservative way: we update w by (10) only when the confidence of the target falls below a threshold ρ. This ensures that the target states always have high confidence scores and alleviates the problem of including noisy samples when updating the classifier [22]. The main steps of the proposed algorithm are summarized in Algorithm 2.
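One tracking step of this scheme can be sketched as follows; extract_patch and bmr are placeholders of ours for patch cropping and Algorithm 1, and particles are scored by the classifier response only, as in (13):

```python
import numpy as np

def track_step(s_prev, w, extract_patch, bmr, sigmas=(6.0, 6.0, 0.01), n_p=400):
    """One step of Algorithm 2 (sketch). extract_patch and bmr are placeholders
    for patch cropping and Algorithm 1; names and simplifications are ours."""
    rng = np.random.default_rng()
    particles = s_prev + rng.normal(0.0, sigmas, size=(n_p, 3))  # p(s_t | s_{t-1})
    feats = np.stack([bmr(extract_patch(s)) for s in particles])
    scores = 1.0 / (1.0 + np.exp(-(feats @ w)))                  # f(x), Eq. (9)
    best = int(np.argmax(scores))                                # MAP estimate, Eq. (12)
    return particles[best], scores[best]
```

With toy stand-ins (identity patch extraction, plain normalization as the representation), the function returns a 3-dimensional state and a confidence in (0, 1), which Algorithm 2 compares against ρ to decide whether to update w.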

III EXPERIMENTAL RESULTS

We first present implementation details of the proposedalgorithm and discuss the dataset and metrics for perfor-mance evaluation Next we analyze the empirical results usingwidely-adopted metrics We present ablation study to examinethe effectiveness of each key component in the proposed BMRscheme Finally we show and analyze some failure cases

A Implementation Details

All images are resized to a fixed size of 240times320 pixels [22]for experiments and each patch is resized to a canonical size of32times32 pixels In addition each canonical patch is subsampledto a half size with ncol = 16 for color representations TheHOG features are extracted from the canonical patches thatsupports both gray and color images and the sizes of HOGfeature maps are the same as nhog times nhog times 31 = 4times 4times 31(as implemented in httpgithubcompdollartoolbox)


For grayscale videos, the original image patches are used to extract raw intensity and HOG features, and the feature dimension is d = 4×4×31 + 16×16 = 752. For color videos, the image patches are transformed to the CIE LAB color space to extract raw color features, and the original RGB image patches are used to extract HOG features. The corresponding total dimension is d = 4×4×31 + 16×16×3 = 1264. The number of Boolean maps is set to c = 4, so the total dimension of BMRs is 3d = 2256 for gray videos and 3792 for color videos, and the sampling step is δ = 1/c = 0.25. The search radius for positive samples is set to α = 3. The inner search radius for negative samples is set to 0.3 min(w, h), where w and h are the width and height of the target, respectively, and the outer search radius is β = 100, where the search step is set to 5, which generates a small subset of negative samples. The target state parameters for the particle filter are set to [σ_x, σ_y, σ_s] = [6, 6, 0.01], and the number of particles is set to n_p = 400. The confidence threshold is set to ρ = 0.9. All parameter values are fixed for all sequences, and the source code will be made available to the public. More results and videos are available at http://kaihuazhang.net/bmr/bmr.htm.
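The dimension bookkeeping above can be reproduced in a few lines (the variable names are mine, the numbers are from the text):

```python
# Reproduce the feature-dimension arithmetic from the implementation details.
n_hog, hog_channels, n_col = 4, 31, 16
d_gray = n_hog * n_hog * hog_channels + n_col * n_col       # HOG + raw intensity
d_color = n_hog * n_hog * hog_channels + n_col * n_col * 3  # HOG + CIE LAB
c = 4
delta = 1 / c
print(d_gray, d_color, 3 * d_gray, 3 * d_color, delta)
# 752 1264 2256 3792 0.25
```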

B. Dataset and Evaluation Metrics

For performance evaluation, we use the tracking benchmark dataset and code library [17], which includes 29 trackers and 50 fully annotated videos. In addition, we add the corresponding results of six recent trackers: DLT [35], DSST [36], KCF [15], TGPR [37], MEEM [5], and HCF [23]. For detailed analysis, the sequences are annotated with 11 attributes based on different challenging factors, including low resolution (LR), in-plane rotation (IPR), out-of-plane rotation (OPR), scale variation (SV), occlusion (OCC), deformation (DEF), background clutters (BC), illumination variation (IV), motion blur (MB), fast motion (FM), and out-of-view (OV).

We quantitatively evaluate the trackers with success and precision plots [17]. Given the tracked bounding box B_T and the ground-truth bounding box B_G, the overlap score is defined as

    score = Area(B_T ∩ B_G) / Area(B_T ∪ B_G).

Hence 0 ≤ score ≤ 1, and a larger value of score means better performance of the evaluated tracker. The success plot shows the percentage of frames with score > t over all thresholds t ∈ [0, 1]. Furthermore, the area under curve (AUC) of each success plot serves as a measure to rank the evaluated trackers. On the other hand, the precision plot shows the percentage of frames whose tracked locations are within a given threshold distance (i.e., 20 pixels in [17]) of the ground truth. Both success and precision plots are used in the one-pass evaluation (OPE), temporal robustness evaluation (TRE), and spatial robustness evaluation (SRE), where OPE reports the average precision or success rate by running the trackers through a test sequence with initialization from the ground-truth position, and TRE as well as SRE measure a tracker's robustness to initialization with temporal and spatial perturbations, respectively [17]. We report the OPE, TRE, and SRE results. For presentation clarity, we only present the top 10 algorithms in each plot.
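The overlap score and the two evaluation curves can be sketched as follows; the function names are mine, not the benchmark's API, and the threshold grid is an assumption:

```python
import numpy as np

def overlap_score(bt, bg):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    x1 = max(bt[0], bg[0]); y1 = max(bt[1], bg[1])
    x2 = min(bt[0] + bt[2], bg[0] + bg[2])
    y2 = min(bt[1] + bt[3], bg[1] + bg[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = bt[2] * bt[3] + bg[2] * bg[3] - inter
    return inter / union if union > 0 else 0.0

def success_auc(scores, thresholds=np.linspace(0, 1, 101)):
    """Area under the success plot: fraction of frames with score > t,
    averaged over the threshold grid t in [0, 1]."""
    scores = np.asarray(scores)
    return float(np.mean([(scores > t).mean() for t in thresholds]))

def precision_at(center_errors, threshold=20.0):
    """Fraction of frames whose center location error is within
    `threshold` pixels (20 pixels in the benchmark protocol)."""
    return float((np.asarray(center_errors) <= threshold).mean())
```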

C. Empirical Results

1) Overall Performance: Figure 3 shows the overall performance of the top 10 trackers in terms of success and precision plots. The BMR-based tracking algorithm ranks first on the success rate of OPE and second on those of TRE and SRE. Furthermore, the BMR-based method ranks third on the precision rates of OPE, TRE, and SRE. Overall, the proposed BMR-based tracker performs favorably against the state-of-the-art methods in terms of all metrics except against MEEM [5] and HCF [23]. The MEEM tracker exploits a multi-expert restoration scheme to handle the drift problem, which combines a tracker and its historical snapshots as experts. In contrast, even using only a logistic regression classifier without any restoration strategy, the proposed BMR-based method performs well against MEEM on most metrics (i.e., the success rates of the BMR-based method outperform the MEEM scheme, while the precision rates of the BMR-based method are comparable to the MEEM scheme), which shows the effectiveness of the proposed representation scheme for visual tracking. In addition, the HCF method is based on deep learning, which leverages complex hierarchical convolutional features learned offline from a large dataset together with correlation filters for visual tracking. Notwithstanding, the proposed BMR-based algorithm performs comparably against HCF in terms of success rates on all metrics.

2) Attribute-based Performance: To demonstrate the strengths and weaknesses of BMR, we further evaluate the 35 trackers on videos with the 11 attributes categorized in [17].

Tables I and II summarize the success and precision scores of OPE with different attributes. The BMR-based method ranks within the top 3 for most attributes. Specifically, for the success rate of OPE, the BMR-based method ranks first on 4 out of 11 attributes and second on 6 out of 11 attributes. In the sequences with the BC attribute, the BMR-based method ranks third, and its score is close to that of the MEEM scheme, which ranks second (0.555 vs. 0.569). For the precision scores of OPE, the BMR-based method ranks second on 4 out of 11 attributes and third on 3 out of 11 attributes. In the sequences with the OV attribute, the BMR-based tracker ranks first, and for the videos with the IPR and BC attributes, the proposed tracking algorithm ranks fourth with performance comparable to the third-ranked DSST and KCF methods.

Tables III and IV show the results of TRE with different attributes. The BMR-based method ranks within the top 3 for most attributes. In terms of success rates, the BMR-based method ranks first on 2 attributes, second on 3 attributes, and third on 6 attributes. In terms of precision rates, the BMR-based tracker ranks third on 7 attributes, and first and second on the OV and OCC attributes, respectively. Furthermore, for other attributes such as LR and BC, the BMR-based tracking algorithm ranks fourth, but its scores are close to the results of MEEM and KCF, which rank third (0.581 vs. 0.598 and 0.772 vs. 0.776).

Tables V and VI show the results of SRE with different attributes. In terms of success rates, the rankings of the BMR-based method are similar to those based on TRE except for the IPR and OPR attributes. Among them, the BMR-based


Fig. 3: Success and precision plots of OPE, TRE, and SRE by the top 10 trackers. The trackers are ranked by the AUC scores (shown in the legends) when success rates are used, or by precision scores at the threshold of 20 pixels.

TABLE I: Success scores of OPE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute | BMR | HCF [23] | MEEM [5] | KCF [15] | DSST [36] | TGPR [37] | SCM [20] | Struck [14] | TLD [38] | ASLA [19] | DLT [35]
LR (4)   | 0.409 | 0.557 | 0.360 | 0.310 | 0.352 | 0.370 | 0.279 | 0.372 | 0.309 | 0.157 | 0.256
IPR (31) | 0.557 | 0.582 | 0.535 | 0.497 | 0.532 | 0.479 | 0.458 | 0.444 | 0.416 | 0.425 | 0.383
OPR (39) | 0.590 | 0.587 | 0.558 | 0.496 | 0.491 | 0.485 | 0.470 | 0.432 | 0.420 | 0.422 | 0.393
SV (28)  | 0.586 | 0.531 | 0.498 | 0.427 | 0.451 | 0.418 | 0.518 | 0.425 | 0.421 | 0.452 | 0.458
OCC (29) | 0.615 | 0.606 | 0.552 | 0.513 | 0.480 | 0.484 | 0.487 | 0.413 | 0.402 | 0.376 | 0.384
DEF (19) | 0.594 | 0.626 | 0.560 | 0.533 | 0.474 | 0.510 | 0.448 | 0.393 | 0.378 | 0.372 | 0.330
BC (21)  | 0.555 | 0.623 | 0.569 | 0.533 | 0.492 | 0.522 | 0.450 | 0.458 | 0.345 | 0.408 | 0.327
IV (25)  | 0.551 | 0.560 | 0.533 | 0.494 | 0.506 | 0.484 | 0.473 | 0.428 | 0.399 | 0.429 | 0.392
MB (12)  | 0.559 | 0.616 | 0.541 | 0.499 | 0.458 | 0.434 | 0.298 | 0.433 | 0.404 | 0.258 | 0.329
FM (17)  | 0.559 | 0.578 | 0.553 | 0.461 | 0.433 | 0.396 | 0.296 | 0.462 | 0.417 | 0.247 | 0.353
OV (6)   | 0.616 | 0.575 | 0.606 | 0.550 | 0.490 | 0.442 | 0.361 | 0.459 | 0.457 | 0.312 | 0.409

TABLE II: Precision scores of OPE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute | BMR | HCF [23] | MEEM [5] | KCF [15] | DSST [36] | TGPR [37] | SCM [20] | Struck [14] | TLD [38] | ASLA [19] | DLT [35]
LR (4)   | 0.517 | 0.897 | 0.490 | 0.379 | 0.534 | 0.538 | 0.305 | 0.545 | 0.349 | 0.156 | 0.303
IPR (31) | 0.776 | 0.868 | 0.800 | 0.725 | 0.780 | 0.675 | 0.597 | 0.617 | 0.584 | 0.511 | 0.510
OPR (39) | 0.819 | 0.869 | 0.840 | 0.730 | 0.732 | 0.678 | 0.618 | 0.597 | 0.596 | 0.518 | 0.527
SV (28)  | 0.803 | 0.880 | 0.785 | 0.680 | 0.740 | 0.620 | 0.672 | 0.639 | 0.606 | 0.552 | 0.606
OCC (29) | 0.846 | 0.877 | 0.799 | 0.749 | 0.725 | 0.675 | 0.640 | 0.564 | 0.563 | 0.460 | 0.495
DEF (19) | 0.802 | 0.881 | 0.846 | 0.741 | 0.657 | 0.691 | 0.586 | 0.521 | 0.512 | 0.445 | 0.512
BC (21)  | 0.742 | 0.885 | 0.797 | 0.752 | 0.691 | 0.717 | 0.578 | 0.585 | 0.428 | 0.496 | 0.440
IV (25)  | 0.742 | 0.844 | 0.766 | 0.729 | 0.741 | 0.671 | 0.594 | 0.558 | 0.537 | 0.517 | 0.492
MB (12)  | 0.755 | 0.844 | 0.715 | 0.650 | 0.603 | 0.537 | 0.339 | 0.551 | 0.518 | 0.278 | 0.427
FM (17)  | 0.758 | 0.790 | 0.742 | 0.602 | 0.562 | 0.493 | 0.333 | 0.604 | 0.551 | 0.253 | 0.435
OV (6)   | 0.773 | 0.695 | 0.727 | 0.649 | 0.533 | 0.505 | 0.429 | 0.539 | 0.576 | 0.333 | 0.505

tracker ranks third based on SRE and second based on TRE. Furthermore, although the MEEM method ranks higher than the BMR-based tracker on most attributes, the differences in scores are within 1%. In terms of precision rates, the BMR-based algorithm ranks within the top 3 for most attributes except the LR, DEF, and IV attributes.

The AUC score of the success rate measures the overall performance of each tracking method [17]. Figure 3 shows that the BMR-based method achieves better results in terms of success rates than precision rates on all metrics (OPE, SRE, TRE) and attributes. The tracking performance can be attributed to two factors. First, the proposed method exploits a logistic regression classifier with explicit feature maps, which


TABLE III: Success scores of TRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute | BMR | HCF [23] | MEEM [5] | KCF [15] | DSST [36] | TGPR [37] | SCM [20] | Struck [14] | TLD [38] | ASLA [19] | DLT [35]
LR (4)   | 0.444 | 0.520 | 0.424 | 0.382 | 0.403 | 0.443 | 0.304 | 0.456 | 0.299 | 0.278 | 0.324
IPR (31) | 0.562 | 0.591 | 0.558 | 0.520 | 0.515 | 0.514 | 0.453 | 0.473 | 0.406 | 0.451 | 0.423
OPR (39) | 0.578 | 0.595 | 0.572 | 0.531 | 0.507 | 0.523 | 0.480 | 0.477 | 0.425 | 0.465 | 0.428
SV (28)  | 0.564 | 0.544 | 0.517 | 0.488 | 0.473 | 0.468 | 0.496 | 0.446 | 0.418 | 0.487 | 0.448
OCC (29) | 0.585 | 0.610 | 0.566 | 0.547 | 0.519 | 0.520 | 0.502 | 0.462 | 0.426 | 0.444 | 0.426
DEF (19) | 0.599 | 0.651 | 0.611 | 0.571 | 0.548 | 0.577 | 0.515 | 0.500 | 0.425 | 0.466 | 0.399
BC (21)  | 0.575 | 0.631 | 0.577 | 0.565 | 0.518 | 0.530 | 0.469 | 0.478 | 0.372 | 0.445 | 0.366
IV (25)  | 0.555 | 0.597 | 0.564 | 0.528 | 0.529 | 0.518 | 0.475 | 0.486 | 0.402 | 0.468 | 0.427
MB (12)  | 0.537 | 0.594 | 0.553 | 0.493 | 0.472 | 0.483 | 0.290 | 0.485 | 0.388 | 0.296 | 0.349
FM (17)  | 0.516 | 0.560 | 0.542 | 0.456 | 0.429 | 0.461 | 0.282 | 0.464 | 0.392 | 0.285 | 0.350
OV (6)   | 0.593 | 0.557 | 0.581 | 0.539 | 0.505 | 0.440 | 0.344 | 0.417 | 0.434 | 0.325 | 0.403

TABLE IV: Precision scores of TRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute | BMR | HCF [23] | MEEM [5] | KCF [15] | DSST [36] | TGPR [37] | SCM [20] | Struck [14] | TLD [38] | ASLA [19] | DLT [35]
LR (4)   | 0.581 | 0.750 | 0.589 | 0.501 | 0.574 | 0.602 | 0.350 | 0.628 | 0.376 | 0.325 | 0.391
IPR (31) | 0.767 | 0.851 | 0.802 | 0.728 | 0.725 | 0.716 | 0.581 | 0.650 | 0.569 | 0.582 | 0.572
OPR (39) | 0.789 | 0.859 | 0.826 | 0.749 | 0.719 | 0.728 | 0.617 | 0.660 | 0.597 | 0.605 | 0.584
SV (28)  | 0.769 | 0.840 | 0.787 | 0.727 | 0.717 | 0.676 | 0.633 | 0.652 | 0.600 | 0.634 | 0.594
OCC (29) | 0.791 | 0.854 | 0.788 | 0.758 | 0.726 | 0.705 | 0.633 | 0.631 | 0.579 | 0.560 | 0.550
DEF (19) | 0.798 | 0.889 | 0.854 | 0.757 | 0.723 | 0.765 | 0.635 | 0.655 | 0.571 | 0.571 | 0.556
BC (21)  | 0.772 | 0.874 | 0.793 | 0.776 | 0.697 | 0.721 | 0.600 | 0.622 | 0.488 | 0.575 | 0.517
IV (25)  | 0.747 | 0.851 | 0.792 | 0.729 | 0.727 | 0.693 | 0.585 | 0.643 | 0.543 | 0.584 | 0.572
MB (12)  | 0.720 | 0.785 | 0.724 | 0.626 | 0.597 | 0.607 | 0.323 | 0.617 | 0.491 | 0.332 | 0.450
FM (17)  | 0.681 | 0.738 | 0.710 | 0.578 | 0.532 | 0.582 | 0.302 | 0.580 | 0.487 | 0.305 | 0.432
OV (6)   | 0.719 | 0.692 | 0.692 | 0.643 | 0.587 | 0.514 | 0.371 | 0.484 | 0.485 | 0.339 | 0.470

TABLE V: Success scores of SRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute | BMR | HCF [23] | MEEM [5] | KCF [15] | DSST [36] | TGPR [37] | SCM [20] | Struck [14] | TLD [38] | ASLA [19] | DLT [35]
LR (4)   | 0.352 | 0.488 | 0.374 | 0.289 | 0.326 | 0.332 | 0.254 | 0.360 | 0.305 | 0.213 | 0.243
IPR (31) | 0.487 | 0.537 | 0.494 | 0.450 | 0.460 | 0.438 | 0.399 | 0.410 | 0.380 | 0.405 | 0.357
OPR (39) | 0.510 | 0.536 | 0.514 | 0.445 | 0.439 | 0.455 | 0.396 | 0.409 | 0.387 | 0.404 | 0.368
SV (28)  | 0.524 | 0.492 | 0.463 | 0.401 | 0.413 | 0.396 | 0.438 | 0.395 | 0.384 | 0.440 | 0.402
OCC (29) | 0.524 | 0.543 | 0.510 | 0.445 | 0.434 | 0.449 | 0.398 | 0.405 | 0.384 | 0.381 | 0.354
DEF (19) | 0.492 | 0.566 | 0.516 | 0.469 | 0.434 | 0.504 | 0.358 | 0.398 | 0.357 | 0.386 | 0.322
BC (21)  | 0.500 | 0.569 | 0.517 | 0.483 | 0.451 | 0.483 | 0.387 | 0.408 | 0.334 | 0.410 | 0.303
IV (25)  | 0.486 | 0.516 | 0.490 | 0.442 | 0.446 | 0.438 | 0.389 | 0.396 | 0.350 | 0.405 | 0.347
MB (12)  | 0.503 | 0.565 | 0.513 | 0.425 | 0.389 | 0.420 | 0.266 | 0.451 | 0.385 | 0.256 | 0.312
FM (17)  | 0.504 | 0.534 | 0.518 | 0.415 | 0.384 | 0.412 | 0.269 | 0.464 | 0.392 | 0.285 | 0.350
OV (6)   | 0.578 | 0.526 | 0.575 | 0.455 | 0.426 | 0.391 | 0.335 | 0.421 | 0.407 | 0.316 | 0.314

TABLE VI: Precision scores of SRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute | BMR | HCF [23] | MEEM [5] | KCF [15] | DSST [36] | TGPR [37] | SCM [20] | Struck [14] | TLD [38] | ASLA [19] | DLT [35]
LR (4)   | 0.476 | 0.818 | 0.511 | 0.377 | 0.543 | 0.501 | 0.305 | 0.504 | 0.363 | 0.263 | 0.299
IPR (31) | 0.704 | 0.839 | 0.752 | 0.667 | 0.704 | 0.648 | 0.546 | 0.592 | 0.554 | 0.556 | 0.503
OPR (39) | 0.732 | 0.828 | 0.774 | 0.666 | 0.680 | 0.669 | 0.547 | 0.595 | 0.560 | 0.560 | 0.525
SV (28)  | 0.752 | 0.832 | 0.732 | 0.632 | 0.696 | 0.599 | 0.598 | 0.607 | 0.558 | 0.601 | 0.562
OCC (29) | 0.735 | 0.815 | 0.730 | 0.662 | 0.671 | 0.649 | 0.540 | 0.568 | 0.516 | 0.514 | 0.483
DEF (19) | 0.684 | 0.835 | 0.757 | 0.677 | 0.630 | 0.715 | 0.475 | 0.547 | 0.505 | 0.516 | 0.467
BC (21)  | 0.702 | 0.851 | 0.734 | 0.693 | 0.655 | 0.698 | 0.521 | 0.555 | 0.451 | 0.555 | 0.439
IV (25)  | 0.677 | 0.809 | 0.707 | 0.652 | 0.681 | 0.630 | 0.509 | 0.556 | 0.480 | 0.544 | 0.472
MB (12)  | 0.686 | 0.807 | 0.691 | 0.567 | 0.532 | 0.561 | 0.309 | 0.587 | 0.521 | 0.310 | 0.388
FM (17)  | 0.685 | 0.748 | 0.694 | 0.545 | 0.505 | 0.544 | 0.308 | 0.577 | 0.496 | 0.291 | 0.397
OV (6)   | 0.719 | 0.644 | 0.690 | 0.533 | 0.504 | 0.451 | 0.386 | 0.455 | 0.463 | 0.355 | 0.360

efficiently determines the nonlinear decision boundary through online training. Second, the online classifier parameter update scheme in (10) facilitates recovery from tracking drift.

Figure 4 shows sampled tracking results from six long sequences (each with more than 1,000 frames). The total number of frames in these sequences is 11,134, which accounts for about 38.4% of the total number of frames (about 29,000) in the benchmark; hence, the performance on

these sequences plays an important role in the performance evaluation. For clarity of presentation, only the results of the top-performing BMR, HCF, and MEEM methods are shown. In all sequences, the BMR-based tracker is able to track the targets stably over almost all frames. However, the HCF scheme drifts away from the target objects after a few frames in the sylvester (#1178, #1285, #1345) and lemming (#386, #1137, #1336) sequences. The MEEM method drifts


Fig. 4: Screenshots of sampled results from six long sequences: sylvester, mhyang, dog1, lemming, liquor, and doll.

to the background when severe occlusions occur in the liquor sequence (#508, #728, #775). To further demonstrate the results over all frames, Figure 5 shows plots of the overlap score for each frame. Overall, the BMR-based tracker performs well against the HCF and MEEM methods in most frames of these sequences.

D. Analysis of BMR

To demonstrate the effectiveness of BMRs, we remove the Boolean map component from the proposed tracking algorithm and only leverage the LAB+HOG representations for visual tracking. In addition, we use KCF as a baseline because it adopts HOG representations like the proposed tracking method. Figure 6 shows quantitative comparisons on the benchmark dataset. Without the proposed Boolean maps, the AUC score of the success rate in OPE of the proposed method is reduced by 7.5%. For TRE and SRE, the AUC scores of the proposed method are reduced by 3.9% and 4.7%, respectively. It is worth noticing that the proposed method without the Boolean maps still outperforms KCF in terms of all metrics on success rates, which shows the effectiveness of the LAB color features in BMR. These experimental results show that the BMRs play a key role in robust visual tracking.

E. Failure Cases

Figure 7 shows failure cases of the proposed BMR-based method in two sequences, singer2 and motorRolling. In the singer2 sequence, the foreground object and background scene are similar due to the dim stage lighting at the beginning (#10, #25), and the HCF, MEEM, and proposed methods all drift to the background. Furthermore, as the target in the motorRolling sequence undergoes 360-degree in-plane rotation in early frames (#35), the MEEM and proposed methods do not adapt well to the drastic appearance variations due to limited training samples. In contrast, only the HCF tracker performs well in this sequence because it leverages dense sampling and high-dimensional convolutional features.


Fig. 5: Overlap score plots of the six long sequences shown in Figure 4.

IV. CONCLUSIONS

In this paper, we propose a Boolean map based representation that exploits connectivity cues for visual tracking. In the BMR scheme, the HOG and raw color feature maps are decomposed into a set of Boolean maps by uniformly thresholding the respective channels. These Boolean maps are concatenated and normalized to form a robust representation, which approximates an explicit feature map of the intersection kernel. A logistic regression classifier with the explicit feature map is trained online to determine the nonlinear decision boundary for visual tracking. Extensive evaluations on a large tracking benchmark dataset demonstrate that the proposed tracking algorithm performs favorably against the state-of-the-art algorithms in terms of accuracy and robustness.

REFERENCES

[1] X. Li, W. Hu, C. Shen, Z. Zhang, A. Dick, and A. van den Hengel, "A survey of appearance models in visual object tracking," ACM Transactions on Intelligent Systems and Technology, vol. 4, no. 4, p. 58, 2013.

[2] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in International Joint Conference on Artificial Intelligence, vol. 81, pp. 674-679, 1981.

[3] I. Matthews, T. Ishikawa, and S. Baker, "The template update problem," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 6, pp. 810-815, 2004.

[4] J. Henriques, R. Caseiro, P. Martins, and J. Batista, "Exploiting the circulant structure of tracking-by-detection with kernels," in Proceedings of European Conference on Computer Vision, pp. 702-715, 2012.

[5] J. Zhang, S. Ma, and S. Sclaroff, "MEEM: Robust tracking via multiple experts using entropy minimization," in Proceedings of European Conference on Computer Vision, pp. 188-203, 2014.

[6] M. J. Black and A. D. Jepson, "EigenTracking: Robust matching and tracking of articulated objects using a view-based representation," International Journal of Computer Vision, vol. 26, no. 1, pp. 63-84, 1998.

[7] D. Ross, J. Lim, R. Lin, and M.-H. Yang, "Incremental learning for robust visual tracking," International Journal of Computer Vision, vol. 77, no. 1, pp. 125-141, 2008.

[8] X. Mei and H. Ling, "Robust visual tracking and vehicle classification via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 11, pp. 2259-2272, 2011.

[9] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja, "Robust visual tracking via multi-task sparse learning," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2042-2049, 2012.

[10] D. Wang, H. Lu, and M.-H. Yang, "Online object tracking with sparse prototypes," IEEE Transactions on Image Processing, vol. 22, no. 1, pp. 314-325, 2013.

[11] A. Adam, E. Rivlin, and I. Shimshoni, "Robust fragments-based tracking using the integral histogram," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 798-805, 2006.

[12] S. He, Q. Yang, R. Lau, J. Wang, and M.-H. Yang, "Visual tracking via locality sensitive histograms," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2427-2434, 2013.

[13] B. Babenko, M.-H. Yang, and S. Belongie, "Robust object tracking with online multiple instance learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1619-1632, 2011.

[14] S. Hare, A. Saffari, and P. H. Torr, "Struck: Structured output tracking with kernels," in Proceedings of the IEEE International Conference on Computer Vision, pp. 263-270, 2011.

[15] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "High-speed tracking with kernelized correlation filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583-596, 2015.

[16] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886-893, 2005.

[17] Y. Wu, J. Lim, and M.-H. Yang, "Online object tracking: A benchmark," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2411-2418, 2013.

[18] J. Kwon and K. M. Lee, "Tracking of a non-rigid object via patch-based dynamic appearance modeling and adaptive basin hopping Monte Carlo sampling," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1208-1215, 2009.

[19] X. Jia, H. Lu, and M.-H. Yang, "Visual tracking via adaptive structural local sparse appearance model," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1822-1829, 2012.


Fig. 6: Success and precision plots of OPE, TRE, and SRE for BMR, BMR with only LAB+HOG representations, and KCF (KCF is used as a baseline for comparison).

Fig. 7: Failure cases of the BMR-based tracker in the singer2 and motorRolling sequences. The results of HCF and MEEM are also illustrated.

[20] W. Zhong, H. Lu, and M.-H. Yang, "Robust object tracking via sparsity-based collaborative model," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1838-1845, 2012.

[21] Y. Li and J. Zhu, "A scale adaptive kernel correlation filter tracker with feature integration," in European Conference on Computer Vision Workshops, pp. 254-265, 2014.

[22] N. Wang, J. Shi, D.-Y. Yeung, and J. Jia, "Understanding and diagnosing visual tracking systems," in Proceedings of the IEEE International Conference on Computer Vision, pp. 3101-3109, 2015.

[23] C. Ma, J.-B. Huang, X. Yang, and M.-H. Yang, "Hierarchical convolutional features for visual tracking," in Proceedings of the IEEE International Conference on Computer Vision, pp. 3074-3082, 2015.

[24] R. Allen, P. McGeorge, D. Pearson, and A. B. Milne, "Attention and expertise in multiple target tracking," Applied Cognitive Psychology, vol. 18, no. 3, pp. 337-347, 2004.

[25] P. Cavanagh and G. A. Alvarez, "Tracking multiple targets with multifocal attention," Trends in Cognitive Sciences, vol. 9, no. 7, pp. 349-354, 2005.

[26] L. Chen, "Topological structure in visual perception," Science, vol. 218, p. 699, 1982.

[27] S. E. Palmer, Vision Science: Photons to Phenomenology, MIT Press, 1999.

[28] J. Zhang and S. Sclaroff, "Saliency detection: A Boolean map approach," in Proceedings of the IEEE International Conference on Computer Vision, pp. 153-160, 2013.

[29] B. Alexe, T. Deselaers, and V. Ferrari, "Measuring the objectness of image windows," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2189-2202, 2012.

[30] M.-M. Cheng, Z. Zhang, W.-Y. Lin, and P. Torr, "BING: Binarized normed gradients for objectness estimation at 300fps," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3286-3293, 2014.

[31] L. Huang and H. Pashler, "A Boolean map theory of visual attention," Psychological Review, vol. 114, no. 3, p. 599, 2007.

[32] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 11, pp. 1254-1259, 1998.

[33] M. S. Livingstone and D. H. Hubel, "Anatomy and physiology of a color system in the primate visual cortex," The Journal of Neuroscience, vol. 4, no. 1, pp. 309-356, 1984.

[34] K. Grauman and T. Darrell, "The pyramid match kernel: Efficient learning with sets of features," The Journal of Machine Learning Research, vol. 8, pp. 725-760, 2007.

[35] N. Wang and D.-Y. Yeung, "Learning a deep compact image representation for visual tracking," in Advances in Neural Information Processing Systems, pp. 809-817, 2013.

[36] M. Danelljan, G. Hager, F. Khan, and M. Felsberg, "Accurate scale estimation for robust visual tracking," in Proceedings of British Machine Vision Conference, 2014.

[37] J. Gao, H. Ling, W. Hu, and J. Xing, "Transfer learning based visual tracking with Gaussian processes regression," in Proceedings of European Conference on Computer Vision, pp. 188-203, 2014.

[38] Z. Kalal, J. Matas, and K. Mikolajczyk, "P-N learning: Bootstrapping binary classifiers by structural constraints," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 49-56, 2010.


Algorithm 1: BMR

Input: Normalized image patch x.
 1) Compute the feature vector φ(x) in (1).
 2) for all entries φ_i(x), i = 1, ..., d, of φ(x) do
 3)   for θ_i = δ, 2δ, ..., 1 − δ do
 4)     if φ_i(x) > θ_i then
 5)       b_i(x) = 1
 6)     else
 7)       b_i(x) = 0
 8)     end if
 9)   end for
10) end for
11) b(x) ← [b_1(x), ..., b_cd(x)]^T / √c
12) b(x) ← b(x)/|b(x)|^{1/2}  (normalization)
Output: BMR b(x).

Algorithm 2: BMR-based Tracking

Input: Target state s_{t−1} and classifier parameter vector w_t.
1) Sample n_p candidate particles {s_t^i}_{i=1}^{n_p} with the motion model p(s_t^i|s_{t−1}) in (12).
2) For each particle s_t^i, extract the corresponding image patch x_t^i, compute the BMR b(x_t^i) by Algorithm 1, and compute the corresponding observation model p(o_t|s_t^i) by (13).
3) Estimate the optimal state s_t by (12) and obtain the corresponding image patch x_t.
4) if f(x_t) < ρ then
5)   update w_t by iterating (10) until convergence and set w_{t+1} ← w_t
6) else
7)   w_{t+1} ← w_t
8) end if
Output: Target state s_t and classifier parameter vector w_{t+1}.

The loss at frame t is defined as the negative log-likelihood for logistic regression,

    ℓ_t(w) = (1/n_t) Σ_{i=1}^{n_t} log(1 + exp(−y_t^i w^T b(x_t^i))),    (8)

where w is the classifier parameter vector, and the corresponding classifier is denoted as

    f(x) = 1 / (1 + exp(−w^T b(x))).    (9)

We use a gradient descent method to minimize ℓ_t(w) by iterating

    w ← w − ∂ℓ_t(w)/∂w,    (10)

where

    ∂ℓ_t(w)/∂w = −(1/n_t) Σ_{i=1}^{n_t} b(x_t^i) y_t^i exp(−y_t^i w^T b(x_t^i)) / (1 + exp(−y_t^i w^T b(x_t^i))).

In this work, we use the parameter w_{t−1} obtained at frame t−1 to initialize w in (10) and iterate 20 times for updates.

C. Proposed Tracking Algorithm

We estimate the target states sequentially within a particle filter framework. Given the observation set O_t = {o_i}_{i=1}^t up to frame t, the target state s_t is obtained by maximizing the posterior probability

    s_t = arg max_{s_t} p(s_t|O_t) ∝ p(o_t|s_t) p(s_t|O_{t−1}),    (11)

where p(s_t|O_{t−1}) = ∫ p(s_t|s_{t−1}) p(s_{t−1}|O_{t−1}) ds_{t−1}.

[xt yt st] is the target state with translations xt and yt andscale st p(st|stminus1) is a dynamic model that describes thetemporal correlation of the target states in two consecutiveframes and p(ot|st) is the observation model that estimatesthe likelihood of a state given an observation In the proposedalgorithm we assume that the target state parameters areindependent and modeled by three scalar Gaussian distri-butions between two consecutive frames ie p(st|stminus1) =N (st|stminus1Σ) where Σ = diag(σx σy σs) is a diagonalcovariance matrix whose elements are the standard deviationsof the target state parameters In visual tracking the posteriorprobability p(st|Ot) in (11) is approximated by a finite setof particles sit

np

i=1 that are sampled with correspondingimportance weights πit

np

i=1 where πit prop p(ot|sit) Therefore(11) can be approximated as

st = arg maxsit

npi=1

p(ot|sit)p(sit|stminus1) (12)

In our method the observation model p(ot|sit) is defined as

p(ot|sit) prop f(xit) (13)

where f(xit) is the logistic regression classifier defined by (9)To adapt to target appearance variations while preserving

the stable information that helps prevent the tracker fromdrifting to background we update the classifier parametersw in a conservative way We update w by (10) only whenthe confidence of the target falls below a threshold ρ Thisensures that the target states always have high confidencescores and alleviate the problem of including noisy sampleswhen updating classifier [22] The main steps of the proposedalgorithm are summarized in Algorithm 2

III EXPERIMENTAL RESULTS

We first present implementation details of the proposedalgorithm and discuss the dataset and metrics for perfor-mance evaluation Next we analyze the empirical results usingwidely-adopted metrics We present ablation study to examinethe effectiveness of each key component in the proposed BMRscheme Finally we show and analyze some failure cases

A Implementation Details

All images are resized to a fixed size of 240times320 pixels [22]for experiments and each patch is resized to a canonical size of32times32 pixels In addition each canonical patch is subsampledto a half size with ncol = 16 for color representations TheHOG features are extracted from the canonical patches thatsupports both gray and color images and the sizes of HOGfeature maps are the same as nhog times nhog times 31 = 4times 4times 31(as implemented in httpgithubcompdollartoolbox)

SUBMITTED 5

For grayscale videos the original image patches are usedto extract raw intensity and HOG features and the featuredimension d = 4times4times31+16times16 = 752 For color videos theimage patches are transformed to the CIE LAB color space toextract raw color features and the original RGB image patchesare used to extract HOG features The corresponding totaldimension d = 4times 4times 31+16times 16times 3 = 1264 The numberof Boolean maps is set to c = 4 and the total dimensionof BMRs is 3d = 2256 for gray videos and 3792 for colorvideos and the sampling step δ = 1c = 025 The searchradius for positive samples is set to α = 3 The inner searchradius for negative samples is set to 03min(w h) where wand h are the weight and height of the target respectivelyand the outer search radius β = 100 where the searchstep is set to 5 which generates a small subset of negativesamples The target state parameter set for particle filter is setto [σx σy σs] = [6 6 001] and the number of particles is setto np = 400 The confidence threshold is set to ρ = 09 Allparameter values are fixed for all sequences and the sourcecode will be made available to the public More results andvideos are available at httpkaihuazhangnetbmrbmrhtm

B Dataset and Evaluation Metrics

For performance evaluation we use the tracking benchmarkdataset and code library [17] which includes 29 trackers and50 fully-annotated videos In addition we also add the corre-sponding results of 6 most recent trackers including DLT [35]DSST [36] KCF [15] TGPR [37] MEEM [5] and HCF [23]For detailed analysis the sequences are annotated with 11attributes based on different challenging factors including lowresolution (LR) in-plane rotation (IPR) out-of-plane rotation(OPR) scale variation (SV) occlusion (OCC) deformation(DEF) background clutters (BC) illumination variation (IV)motion blur (MB) fast motion (FM) and out-of-view (OV)

We quantitatively evaluate the trackers with success andprecision plots [17] Given the tracked bounding box BT andthe ground truth bounding box BG the overlap score is definedas score = Area(BT

⋂BG)

Area(BT

⋃BG) Hence 0 le score le 1 and a

larger value of score means a better performance of the eval-uated tracker The success plot demonstrates the percentageof frames with score gt t through all threshold t isin [0 1]Furthermore the area under curve (AUC) of each success plotserves as a measure to rank the evaluated trackers On theother hand the precision plot shows the percentage of frameswhose tracked locations are within a given threshold distance(ie 20 pixels in [17]) to the ground truth Both success andprecision plots are used in the one-pass evaluation (OPE)temporal robustness evaluation (TRE) and spatial robustnessevaluation (SRE) where OPE reports the average precision orsuccess rate by running the trackers through a test sequencewith initialization from the ground truth position and TRE aswell as SRE measure a trackerprimes robustness to initializationwith temporal and spatial perturbations respectively [17] Wereport the OPE TRE and SRE results For presentation claritywe only present the top 10 algorithms in each plot

C. Empirical Results

1) Overall Performance: Figure 3 shows the overall performance of the top 10 trackers in terms of success and precision plots. The BMR-based tracking algorithm ranks first on the success rate of OPE and second based on TRE and SRE. Furthermore, the BMR-based method ranks third based on the precision rates of OPE, TRE, and SRE. Overall, the proposed BMR-based tracker performs favorably against the state-of-the-art methods in terms of all metrics except for MEEM [5] and HCF [23]. The MEEM tracker exploits a multi-expert restoration scheme to handle the drift problem, which combines a tracker and the historical snapshots as experts. In contrast, even using only a logistic regression classifier without any restoration strategy, the proposed BMR-based method performs well against MEEM in terms of most metrics (i.e., the success rates of the BMR-based method outperform the MEEM scheme, while the precision rates of the BMR-based method are comparable to the MEEM scheme), which shows the effectiveness of the proposed representation scheme for visual tracking. In addition, the HCF method is based on deep learning, which leverages complex hierarchical convolutional features learned off-line from a large dataset and correlation filters for visual tracking. Notwithstanding, the proposed BMR-based algorithm performs comparably against HCF in terms of success rates on all metrics.

2) Attribute-based Performance: To demonstrate the strengths and weaknesses of BMR, we further evaluate the 35 trackers on videos with the 11 attributes categorized in [17].

Tables I and II summarize the success and precision scores of OPE with different attributes. Among them, the BMR-based method ranks within the top 3 for most attributes. Specifically, in terms of the success rate of OPE, the BMR-based method ranks first on 4 out of 11 attributes and second on 6 out of 11 attributes. In the sequences with the BC attribute, the BMR-based method ranks third, and its score is close to that of the MEEM scheme that ranks second (0.555 vs. 0.569). For the precision scores of OPE, the BMR-based method ranks second on 4 out of 11 attributes and third on 3 out of 11 attributes. In the sequences with the OV attribute, the BMR-based tracker ranks first, and for the videos with the IPR and BC attributes, the proposed tracking algorithm ranks fourth with performance comparable to the third-ranked DSST and KCF methods.

Tables III and IV show the results of TRE with different attributes. The BMR-based method ranks within the top 3 for most attributes. In terms of success rates, the BMR-based method ranks first on 2 attributes, second on 3 attributes, and third on 6 attributes. In terms of precision rates, the BMR-based tracker ranks third on 7 attributes, and first and second on the OV and OCC attributes, respectively. Furthermore, for other attributes such as LR and BC, the BMR-based tracking algorithm ranks fourth, but its scores are close to the results of MEEM and KCF that rank third (0.581 vs. 0.598 and 0.772 vs. 0.776).

Tables V and VI show the results of SRE with different attributes. In terms of success rates, the rankings of the BMR-based method are similar to those based on TRE, except for the IPR and OPR attributes. Among them, the BMR-based


Fig. 3. Success and precision plots of OPE, TRE, and SRE by the top 10 trackers. The trackers are ranked by the AUC scores (shown in the legends) when the success rates are used, or by the precision scores at the threshold of 20 pixels.

TABLE I. Success scores of OPE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.409  0.557     0.360     0.310     0.352      0.370      0.279     0.372        0.309     0.157      0.256
IPR (31)    0.557  0.582     0.535     0.497     0.532      0.479      0.458     0.444        0.416     0.425      0.383
OPR (39)    0.590  0.587     0.558     0.496     0.491      0.485      0.470     0.432        0.420     0.422      0.393
SV (28)     0.586  0.531     0.498     0.427     0.451      0.418      0.518     0.425        0.421     0.452      0.458
OCC (29)    0.615  0.606     0.552     0.513     0.480      0.484      0.487     0.413        0.402     0.376      0.384
DEF (19)    0.594  0.626     0.560     0.533     0.474      0.510      0.448     0.393        0.378     0.372      0.330
BC (21)     0.555  0.623     0.569     0.533     0.492      0.522      0.450     0.458        0.345     0.408      0.327
IV (25)     0.551  0.560     0.533     0.494     0.506      0.484      0.473     0.428        0.399     0.429      0.392
MB (12)     0.559  0.616     0.541     0.499     0.458      0.434      0.298     0.433        0.404     0.258      0.329
FM (17)     0.559  0.578     0.553     0.461     0.433      0.396      0.296     0.462        0.417     0.247      0.353
OV (6)      0.616  0.575     0.606     0.550     0.490      0.442      0.361     0.459        0.457     0.312      0.409

TABLE II. Precision scores of OPE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.517  0.897     0.490     0.379     0.534      0.538      0.305     0.545        0.349     0.156      0.303
IPR (31)    0.776  0.868     0.800     0.725     0.780      0.675      0.597     0.617        0.584     0.511      0.510
OPR (39)    0.819  0.869     0.840     0.730     0.732      0.678      0.618     0.597        0.596     0.518      0.527
SV (28)     0.803  0.880     0.785     0.680     0.740      0.620      0.672     0.639        0.606     0.552      0.606
OCC (29)    0.846  0.877     0.799     0.749     0.725      0.675      0.640     0.564        0.563     0.460      0.495
DEF (19)    0.802  0.881     0.846     0.741     0.657      0.691      0.586     0.521        0.512     0.445      0.512
BC (21)     0.742  0.885     0.797     0.752     0.691      0.717      0.578     0.585        0.428     0.496      0.440
IV (25)     0.742  0.844     0.766     0.729     0.741      0.671      0.594     0.558        0.537     0.517      0.492
MB (12)     0.755  0.844     0.715     0.650     0.603      0.537      0.339     0.551        0.518     0.278      0.427
FM (17)     0.758  0.790     0.742     0.602     0.562      0.493      0.333     0.604        0.551     0.253      0.435
OV (6)      0.773  0.695     0.727     0.649     0.533      0.505      0.429     0.539        0.576     0.333      0.505

tracker ranks third based on SRE and second based on TRE. Furthermore, although the MEEM method ranks higher than the BMR-based tracker in most attributes, the differences of the scores are within 1%. In terms of precision rates, the BMR-based algorithm ranks within the top 3 for most attributes, except for the LR, DEF, and IV attributes.

The AUC score of the success rate measures the overall performance of each tracking method [17]. Figure 3 shows that the BMR-based method achieves better results in terms of success rates than precision rates on all metrics (OPE, SRE, TRE) and attributes. The tracking performance can be attributed to two factors. First, the proposed method exploits a logistic regression classifier with explicit feature maps, which


TABLE III. Success scores of TRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.444  0.520     0.424     0.382     0.403      0.443      0.304     0.456        0.299     0.278      0.324
IPR (31)    0.562  0.591     0.558     0.520     0.515      0.514      0.453     0.473        0.406     0.451      0.423
OPR (39)    0.578  0.595     0.572     0.531     0.507      0.523      0.480     0.477        0.425     0.465      0.428
SV (28)     0.564  0.544     0.517     0.488     0.473      0.468      0.496     0.446        0.418     0.487      0.448
OCC (29)    0.585  0.610     0.566     0.547     0.519      0.520      0.502     0.462        0.426     0.444      0.426
DEF (19)    0.599  0.651     0.611     0.571     0.548      0.577      0.515     0.500        0.425     0.466      0.399
BC (21)     0.575  0.631     0.577     0.565     0.518      0.530      0.469     0.478        0.372     0.445      0.366
IV (25)     0.555  0.597     0.564     0.528     0.529      0.518      0.475     0.486        0.402     0.468      0.427
MB (12)     0.537  0.594     0.553     0.493     0.472      0.483      0.290     0.485        0.388     0.296      0.349
FM (17)     0.516  0.560     0.542     0.456     0.429      0.461      0.282     0.464        0.392     0.285      0.350
OV (6)      0.593  0.557     0.581     0.539     0.505      0.440      0.344     0.417        0.434     0.325      0.403

TABLE IV. Precision scores of TRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.581  0.750     0.589     0.501     0.574      0.602      0.350     0.628        0.376     0.325      0.391
IPR (31)    0.767  0.851     0.802     0.728     0.725      0.716      0.581     0.650        0.569     0.582      0.572
OPR (39)    0.789  0.859     0.826     0.749     0.719      0.728      0.617     0.660        0.597     0.605      0.584
SV (28)     0.769  0.840     0.787     0.727     0.717      0.676      0.633     0.652        0.600     0.634      0.594
OCC (29)    0.791  0.854     0.788     0.758     0.726      0.705      0.633     0.631        0.579     0.560      0.550
DEF (19)    0.798  0.889     0.854     0.757     0.723      0.765      0.635     0.655        0.571     0.571      0.556
BC (21)     0.772  0.874     0.793     0.776     0.697      0.721      0.600     0.622        0.488     0.575      0.517
IV (25)     0.747  0.851     0.792     0.729     0.727      0.693      0.585     0.643        0.543     0.584      0.572
MB (12)     0.720  0.785     0.724     0.626     0.597      0.607      0.323     0.617        0.491     0.332      0.450
FM (17)     0.681  0.738     0.710     0.578     0.532      0.582      0.302     0.580        0.487     0.305      0.432
OV (6)      0.719  0.692     0.692     0.643     0.587      0.514      0.371     0.484        0.485     0.339      0.470

TABLE V. Success scores of SRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.352  0.488     0.374     0.289     0.326      0.332      0.254     0.360        0.305     0.213      0.243
IPR (31)    0.487  0.537     0.494     0.450     0.460      0.438      0.399     0.410        0.380     0.405      0.357
OPR (39)    0.510  0.536     0.514     0.445     0.439      0.455      0.396     0.409        0.387     0.404      0.368
SV (28)     0.524  0.492     0.463     0.401     0.413      0.396      0.438     0.395        0.384     0.440      0.402
OCC (29)    0.524  0.543     0.510     0.445     0.434      0.449      0.398     0.405        0.384     0.381      0.354
DEF (19)    0.492  0.566     0.516     0.469     0.434      0.504      0.358     0.398        0.357     0.386      0.322
BC (21)     0.500  0.569     0.517     0.483     0.451      0.483      0.387     0.408        0.334     0.410      0.303
IV (25)     0.486  0.516     0.490     0.442     0.446      0.438      0.389     0.396        0.350     0.405      0.347
MB (12)     0.503  0.565     0.513     0.425     0.389      0.420      0.266     0.451        0.385     0.256      0.312
FM (17)     0.504  0.534     0.518     0.415     0.384      0.412      0.269     0.464        0.392     0.285      0.350
OV (6)      0.578  0.526     0.575     0.455     0.426      0.391      0.335     0.421        0.407     0.316      0.314

TABLE VI. Precision scores of SRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second-best, and third-best performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.476  0.818     0.511     0.377     0.543      0.501      0.305     0.504        0.363     0.263      0.299
IPR (31)    0.704  0.839     0.752     0.667     0.704      0.648      0.546     0.592        0.554     0.556      0.503
OPR (39)    0.732  0.828     0.774     0.666     0.680      0.669      0.547     0.595        0.560     0.560      0.525
SV (28)     0.752  0.832     0.732     0.632     0.696      0.599      0.598     0.607        0.558     0.601      0.562
OCC (29)    0.735  0.815     0.730     0.662     0.671      0.649      0.540     0.568        0.516     0.514      0.483
DEF (19)    0.684  0.835     0.757     0.677     0.630      0.715      0.475     0.547        0.505     0.516      0.467
BC (21)     0.702  0.851     0.734     0.693     0.655      0.698      0.521     0.555        0.451     0.555      0.439
IV (25)     0.677  0.809     0.707     0.652     0.681      0.630      0.509     0.556        0.480     0.544      0.472
MB (12)     0.686  0.807     0.691     0.567     0.532      0.561      0.309     0.587        0.521     0.310      0.388
FM (17)     0.685  0.748     0.694     0.545     0.505      0.544      0.308     0.577        0.496     0.291      0.397
OV (6)      0.719  0.644     0.690     0.533     0.504      0.451      0.386     0.455        0.463     0.355      0.360

efficiently determines the nonlinear decision boundary through online training. Second, the online classifier parameter update scheme in (10) facilitates recovering from tracking drift.
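Equation (10) is not reproduced in this section, so the following is only a generic sketch of the kind of online logistic-regression update the tracker relies on; the learning rate, regularization weight, and exact update rule are assumptions rather than the paper's scheme.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping scores to confidences in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def online_update(w, X, y, lr=0.01, lam=1e-3):
    """One hypothetical online gradient step for logistic regression.

    w: (d,) classifier weights; X: (n, d) explicit feature maps of sampled
    patches; y: (n,) labels in {0, 1} for positive/negative samples.
    """
    p = sigmoid(X @ w)                       # predicted confidences
    grad = X.T @ (p - y) / len(y) + lam * w  # regularized log-loss gradient
    return w - lr * grad
```

Applying such a step on the samples drawn around each new target estimate keeps the decision boundary current, which is what makes recovery from drift possible.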

Figure 4 shows sampled tracking results from six long sequences (each with more than 1000 frames). The total number of frames of these sequences is 11,134, which accounts for about 38.4% of the total number of frames (about 29,000) in the benchmark, and hence the performance on

these sequences plays an important role in the performance evaluation. For clear presentation, only the results of the top-performing BMR, HCF, and MEEM methods are shown. In all sequences, the BMR-based tracker is able to track the targets stably over almost all frames. However, the HCF scheme drifts away from the target objects after a few frames in the sylvester (#1178, #1285, #1345) and lemming (#386, #1137, #1336) sequences. The MEEM method drifts


Fig. 4. Screenshots of sampled results from six long sequences: sylvester, mhyang, dog1, lemming, liquor, and doll.

to the background when severe occlusions happen in the liquor sequence (#508, #728, #775). To further demonstrate the results over all frames, Figure 5 shows the plots in terms of the overlap score of each frame. Overall, the BMR-based tracker performs well against the HCF and MEEM methods in most frames of these sequences.

D. Analysis of BMR

To demonstrate the effectiveness of BMRs, we remove the component of Boolean maps from the proposed tracking algorithm and only leverage the LAB+HOG representations for visual tracking. In addition, we use KCF as a baseline as it adopts HOG representations like the proposed tracking method. Figure 6 shows quantitative comparisons on the benchmark dataset. Without the proposed Boolean maps, the AUC score of the success rate in OPE of the proposed method is reduced by 7.5%. For TRE and SRE, the AUC scores of the proposed method are reduced by 3.9% and 4.7%, respectively, without the component of Boolean maps. It is worth noticing that the proposed method without the Boolean maps

still outperforms KCF in terms of all metrics on success rates, which shows the effectiveness of the LAB color features in BMR. These experimental results show that the BMRs in the proposed method play a key role in robust visual tracking.

E. Failure Cases

Figure 7 shows failure cases of the proposed BMR-based method in two sequences, singer2 and motorRolling. In the singer2 sequence, the foreground object and background scene are similar due to the dim stage lighting at the beginning (e.g., #10, #25). The HCF, MEEM, and proposed methods all drift to the background. Furthermore, as the target in the motorRolling sequence undergoes 360-degree in-plane rotation in the early frames (e.g., #35), the MEEM and proposed methods do not adapt well to the drastic appearance variations due to limited training samples. In contrast, only the HCF tracker performs well in this sequence because it leverages dense sampling and high-dimensional convolutional features.


Fig. 5. Overlap score plots of the six long sequences shown in Figure 4.

IV. CONCLUSIONS

In this paper, we propose a Boolean map based representation which exploits connectivity cues for visual tracking. In the BMR scheme, the HOG and raw color feature maps are decomposed into a set of Boolean maps by uniformly thresholding the respective channels. These Boolean maps are concatenated and normalized to form a robust representation, which approximates an explicit feature map of the intersection kernel. A logistic regression classifier with the explicit feature map is trained in an online manner to determine the nonlinear decision boundary for visual tracking. Extensive evaluations on a large tracking benchmark dataset demonstrate that the proposed tracking algorithm performs favorably against the state-of-the-art algorithms in terms of accuracy and robustness.
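One way to see why concatenated Boolean maps admit an explicit feature map of the intersection kernel: on binary vectors, min(a_i, b_i) = a_i·b_i, so the intersection kernel reduces to an ordinary dot product, and a linear classifier on the Boolean maps evaluates it implicitly. A minimal check:

```python
import numpy as np

# Two binary (Boolean-map) feature vectors.
a = np.array([1, 0, 1, 1, 0], dtype=float)
b = np.array([1, 1, 0, 1, 0], dtype=float)

k_int = np.minimum(a, b).sum()  # intersection (histogram) kernel
k_lin = a @ b                   # linear (dot-product) kernel

assert k_int == k_lin  # identical on binary inputs
```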

REFERENCES

[1] X. Li, W. Hu, C. Shen, Z. Zhang, A. Dick, and A. van den Hengel, "A survey of appearance models in visual object tracking," ACM Transactions on Intelligent Systems and Technology, vol. 4, no. 4, p. 58, 2013.

[2] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in International Joint Conference on Artificial Intelligence, vol. 81, pp. 674-679, 1981.

[3] I. Matthews, T. Ishikawa, and S. Baker, "The template update problem," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 6, pp. 810-815, 2004.

[4] J. Henriques, R. Caseiro, P. Martins, and J. Batista, "Exploiting the circulant structure of tracking-by-detection with kernels," in Proceedings of European Conference on Computer Vision, pp. 702-715, 2012.

[5] J. Zhang, S. Ma, and S. Sclaroff, "MEEM: Robust tracking via multiple experts using entropy minimization," in Proceedings of European Conference on Computer Vision, pp. 188-203, 2014.

[6] M. J. Black and A. D. Jepson, "EigenTracking: Robust matching and tracking of articulated objects using a view-based representation," International Journal of Computer Vision, vol. 26, no. 1, pp. 63-84, 1998.

[7] D. Ross, J. Lim, R. Lin, and M.-H. Yang, "Incremental learning for robust visual tracking," International Journal of Computer Vision, vol. 77, no. 1, pp. 125-141, 2008.

[8] X. Mei and H. Ling, "Robust visual tracking and vehicle classification via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 11, pp. 2259-2272, 2011.

[9] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja, "Robust visual tracking via multi-task sparse learning," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2042-2049, 2012.

[10] D. Wang, H. Lu, and M.-H. Yang, "Online object tracking with sparse prototypes," IEEE Transactions on Image Processing, vol. 22, no. 1, pp. 314-325, 2013.

[11] A. Adam, E. Rivlin, and I. Shimshoni, "Robust fragments-based tracking using the integral histogram," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 798-805, 2006.

[12] S. He, Q. Yang, R. Lau, J. Wang, and M.-H. Yang, "Visual tracking via locality sensitive histograms," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2427-2434, 2013.

[13] B. Babenko, M.-H. Yang, and S. Belongie, "Robust object tracking with online multiple instance learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1619-1632, 2011.

[14] S. Hare, A. Saffari, and P. H. Torr, "Struck: Structured output tracking with kernels," in Proceedings of the IEEE International Conference on Computer Vision, pp. 263-270, 2011.

[15] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "High-speed tracking with kernelized correlation filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583-596, 2015.

[16] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886-893, 2005.

[17] Y. Wu, J. Lim, and M.-H. Yang, "Online object tracking: A benchmark," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2411-2418, 2013.

[18] J. Kwon and K. M. Lee, "Tracking of a non-rigid object via patch-based dynamic appearance modeling and adaptive basin hopping monte carlo sampling," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1208-1215, 2009.

[19] X. Jia, H. Lu, and M.-H. Yang, "Visual tracking via adaptive structural local sparse appearance model," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1822-1829, 2012.


Fig. 6. Success and precision plots of OPE, TRE, and SRE for BMR, BMR only with LAB+HOG representations, and KCF (KCF is used as a baseline for comparisons).

Fig. 7. Failure cases of the BMR-based tracker in the singer2 and motorRolling sequences. The results of HCF and MEEM are also illustrated.

[20] W. Zhong, H. Lu, and M.-H. Yang, "Robust object tracking via sparsity-based collaborative model," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1838-1845, 2012.

[21] Y. Li and J. Zhu, "A scale adaptive kernel correlation filter tracker with feature integration," in European Conference on Computer Vision Workshops, pp. 254-265, 2014.

[22] N. Wang, J. Shi, D.-Y. Yeung, and J. Jia, "Understanding and diagnosing visual tracking systems," in Proceedings of the IEEE International Conference on Computer Vision, pp. 3101-3109, 2015.

[23] C. Ma, J.-B. Huang, X. Yang, and M.-H. Yang, "Hierarchical convolutional features for visual tracking," in Proceedings of the IEEE International Conference on Computer Vision, pp. 3074-3082, 2015.

[24] R. Allen, P. Mcgeorge, D. Pearson, and A. B. Milne, "Attention and expertise in multiple target tracking," Applied Cognitive Psychology, vol. 18, no. 3, pp. 337-347, 2004.

[25] P. Cavanagh and G. A. Alvarez, "Tracking multiple targets with multifocal attention," Trends in Cognitive Sciences, vol. 9, no. 7, pp. 349-354, 2005.

[26] L. Chen, "Topological structure in visual perception," Science, vol. 218, p. 699, 1982.

[27] S. E. Palmer, Vision Science: Photons to Phenomenology, vol. 1, MIT Press, 1999.

[28] J. Zhang and S. Sclaroff, "Saliency detection: A Boolean map approach," in Proceedings of the IEEE International Conference on Computer Vision, pp. 153-160, 2013.

[29] B. Alexe, T. Deselaers, and V. Ferrari, "Measuring the objectness of image windows," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2189-2202, 2012.

[30] M.-M. Cheng, Z. Zhang, W.-Y. Lin, and P. Torr, "BING: Binarized normed gradients for objectness estimation at 300fps," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3286-3293, 2014.

[31] L. Huang and H. Pashler, "A Boolean map theory of visual attention," Psychological Review, vol. 114, no. 3, p. 599, 2007.

[32] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 11, pp. 1254-1259, 1998.

[33] M. S. Livingstone and D. H. Hubel, "Anatomy and physiology of a color system in the primate visual cortex," The Journal of Neuroscience, vol. 4, no. 1, pp. 309-356, 1984.

[34] K. Grauman and T. Darrell, "The pyramid match kernel: Efficient learning with sets of features," The Journal of Machine Learning Research, vol. 8, pp. 725-760, 2007.

[35] N. Wang and D.-Y. Yeung, "Learning a deep compact image representation for visual tracking," in Advances in Neural Information Processing Systems, pp. 809-817, 2013.

[36] M. Danelljan, G. Hager, F. Khan, and M. Felsberg, "Accurate scale estimation for robust visual tracking," in Proceedings of British Machine Vision Conference, 2014.

[37] J. Gao, H. Ling, W. Hu, and J. Xing, "Transfer learning based visual tracking with Gaussian processes regression," in Proceedings of European Conference on Computer Vision, pp. 188-203, 2014.

[38] Z. Kalal, J. Matas, and K. Mikolajczyk, "P-N learning: Bootstrapping binary classifiers by structural constraints," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 49-56, 2010.

Page 5: SUBMITTED 1 Visual Tracking via Boolean Map Representations

SUBMITTED 5

For grayscale videos the original image patches are usedto extract raw intensity and HOG features and the featuredimension d = 4times4times31+16times16 = 752 For color videos theimage patches are transformed to the CIE LAB color space toextract raw color features and the original RGB image patchesare used to extract HOG features The corresponding totaldimension d = 4times 4times 31+16times 16times 3 = 1264 The numberof Boolean maps is set to c = 4 and the total dimensionof BMRs is 3d = 2256 for gray videos and 3792 for colorvideos and the sampling step δ = 1c = 025 The searchradius for positive samples is set to α = 3 The inner searchradius for negative samples is set to 03min(w h) where wand h are the weight and height of the target respectivelyand the outer search radius β = 100 where the searchstep is set to 5 which generates a small subset of negativesamples The target state parameter set for particle filter is setto [σx σy σs] = [6 6 001] and the number of particles is setto np = 400 The confidence threshold is set to ρ = 09 Allparameter values are fixed for all sequences and the sourcecode will be made available to the public More results andvideos are available at httpkaihuazhangnetbmrbmrhtm

B Dataset and Evaluation Metrics

For performance evaluation we use the tracking benchmarkdataset and code library [17] which includes 29 trackers and50 fully-annotated videos In addition we also add the corre-sponding results of 6 most recent trackers including DLT [35]DSST [36] KCF [15] TGPR [37] MEEM [5] and HCF [23]For detailed analysis the sequences are annotated with 11attributes based on different challenging factors including lowresolution (LR) in-plane rotation (IPR) out-of-plane rotation(OPR) scale variation (SV) occlusion (OCC) deformation(DEF) background clutters (BC) illumination variation (IV)motion blur (MB) fast motion (FM) and out-of-view (OV)

We quantitatively evaluate the trackers with success andprecision plots [17] Given the tracked bounding box BT andthe ground truth bounding box BG the overlap score is definedas score = Area(BT

⋂BG)

Area(BT

⋃BG) Hence 0 le score le 1 and a

larger value of score means a better performance of the eval-uated tracker The success plot demonstrates the percentageof frames with score gt t through all threshold t isin [0 1]Furthermore the area under curve (AUC) of each success plotserves as a measure to rank the evaluated trackers On theother hand the precision plot shows the percentage of frameswhose tracked locations are within a given threshold distance(ie 20 pixels in [17]) to the ground truth Both success andprecision plots are used in the one-pass evaluation (OPE)temporal robustness evaluation (TRE) and spatial robustnessevaluation (SRE) where OPE reports the average precision orsuccess rate by running the trackers through a test sequencewith initialization from the ground truth position and TRE aswell as SRE measure a trackerprimes robustness to initializationwith temporal and spatial perturbations respectively [17] Wereport the OPE TRE and SRE results For presentation claritywe only present the top 10 algorithms in each plot

C Empirical Results

1) Overall Performance Figure 3 shows overall perfor-mance of the top 10 trackers in terms of success and precisionplots The BMR-based tracking algorithm ranks first on thesuccess rate of all OPE and second based on TRE andSRE Furthermore the BMR-based method ranks third basedon the precision rates of OPE TRE and SRE Overall theproposed BMR-based tracker performs favorably against thestate-of-the-art methods in terms of all metrics except forMEEM [5] and HCF [23] The MEEM tracker exploits a multi-expert restoration scheme to handle the drift problem whichcombines a tracker and the historical snapshots as expertsIn contrast even using only a logistic regression classifierwithout using any restoration strategy the proposed BMR-based method performs well against MEEM in terms of mostmetrics (ie the success rates of the BMR-based methodoutperform the MEEM scheme while the precision rates of theBMR-based method are comparable to the MEEM scheme)which shows the effectiveness of the proposed representationscheme for visual tracking In addition the HCF method isbased on deep learning which leverages complex hierarchicalconvolutional features learned off-line from a large datasetand correlation filters for visual tracking Notwithstanding theproposed BMR-based algorithm performs comparably againstHCF in terms of success rates on all metrics

2) Attribute-based Performance To demonstrate thestrength and weakness of BMR we further evaluate the 35trackers on videos with 11 attributes categorized by [17]

Table I and II summarize the results of success and precisionscores of OPE with different attributes Among them theBMR-based method ranks within top 3 with most attributesSpecifically with the success rate of OPE the BMR-basedmethod ranks first on 4 out of 11 attributes while second on 6out of 11 attributes In the sequences with the BC attribute theBMR-based method ranks third and its score is close to theMEEM scheme that ranks second (0555 vs 0569) For theprecision scores of OPE the BMR-based method ranks secondon 4 out of 11 attributes and third on 3 out of 11 attributes Inthe sequences with the OV attribute the BMR-based trackerranks first and for the videos with the IPR and BC attributesthe proposed tracking algorithm ranks fourth with comparableperformance to the third-rank DSST and KCF methods

Table III and IV show the results of TRE with differentattributes The BMR-based method ranks within top 3 withmost attributes In terms of success rates the BMR-basedmethod ranks first on 2 attributes second on 3 attributes andthird on 6 attributes In terms of precision rates the BMR-based tracker ranks third on 7 attributes and first and secondon the OV and OCC attributes respectively Furthermore forother attributes such as LR and BC the BMR-based trackingalgorithm ranks fourth but it scores are close to the results ofMEEM and KCF that rank third (0581 vs 0598 and 0772vs 0776)

Table V and VI show the results of SRE with differentattributes In terms of success rates the rankings of the BMR-based method are similar to those based on TRE except forthe IPR and OPR attributes Among them the BMR-based

SUBMITTED 6

Fig 3 Success and precision plots of OPE TRE and SRE by the top 10 trackers The trackers are ranked by the AUC scores (shown inthe legends) when the success rates are used or precession cores at the threshold of 20 pixels

TABLE I Success score of OPE with 11 attributes The number after each attribute name is the number of sequences The red blue andgreen fonts indicate the best second and third performance

Attribute BMR HCF [23] MEEM [5] KCF [15] DSST [36] TGPR [37] SCM [20] Struck [14] TLD [38] ASLA [19] DLT [35]

LR (4) 0409 0557 0360 0310 0352 0370 0279 0372 0309 0157 0256IPR (31) 0557 0582 0535 0497 0532 0479 0458 0444 0416 0425 0383

OPR (39) 0590 0587 0558 0496 0491 0485 0470 0432 0420 0422 0393SV (28) 0586 0531 0498 0427 0451 0418 0518 0425 0421 0452 0458

OCC (29) 0615 0606 0552 0513 0480 0484 0487 0413 0402 0376 0384DEF (19) 0594 0626 0560 0533 0474 0510 0448 0393 0378 0372 0330

BC (21) 0555 0623 0569 0533 0492 0522 0450 0458 0345 0408 0327IV (25) 0551 0560 0533 0494 0506 0484 0473 0428 0399 0429 0392

MB (12) 0559 0616 0541 0499 0458 0434 0298 0433 0404 0258 0329FM (17) 0559 0578 0553 0461 0433 0396 0296 0462 0417 0247 0353

OV (6) 0616 0575 0606 0550 0490 0442 0361 0459 0457 0312 0409

TABLE II Precision scores of OPE with 11 attributes The number after each attribute name is the number of sequences The red blue andgreen fonts indicate the best second and third performance

Attribute BMR HCF [23] MEEM [5] KCF [15] DSST [36] TGPR [37] SCM [20] Struck [14] TLD [38] ASLA [19] DLT [35]

LR (4) 0517 0897 0490 0379 0534 0538 0305 0545 0349 0156 0303IPR (31) 0776 0868 0800 0725 0780 0675 0597 0617 0584 0511 0510

OPR (39) 0819 0869 0840 0730 0732 0678 0618 0597 0596 0518 0527SV (28) 0803 0880 0785 0680 0740 0620 0672 0639 0606 0552 0606

OCC (29) 0846 0877 0799 0749 0725 0675 0640 0564 0563 0460 0495DEF (19) 0802 0881 0846 0741 0657 0691 0586 0521 0512 0445 0512

BC (21) 0742 0885 0797 0752 0691 0717 0578 0585 0428 0496 0440IV (25) 0742 0844 0766 0729 0741 0671 0594 0558 0537 0517 0492

MB (12) 0755 0844 0715 0650 0603 0537 0339 0551 0518 0278 0427FM (17) 0758 0790 0742 0602 0562 0493 0333 0604 0551 0253 0435

OV (6) 0773 0695 0727 0649 0533 0505 0429 0539 0576 0333 0505

tracker ranks third based on SRE and second based on TREFurthermore although the MEEM method ranks higher thanthe BMR-based tracker in most attributes the differences ofthe scores are within 1 In terms of precision rates the BMR-based algorithm ranks within top 3 with most attributes exceptfor the LR DEF and IV attributes

The AUC score of success rate measures the overall perfor-

mance of each tracking method [17] Figure 3 shows that theBMR-based method achieves better results in terms of successrates than that precision rates in terms of all metrics (OPESRE TRE) and attributes The tracking performance can beattributed to two factors First the proposed method exploits alogistic regression classifier with explicit feature maps which

SUBMITTED 7

TABLE III Success scores of TRE with 11 attributes The number after each attribute name is the number of sequences The red blue andgreen fonts indicate the best second and third performance

Attribute BMR HCF [23] MEEM [5] KCF [15] DSST [36] TGPR [37] SCM [20] Struck [14] TLD [38] ASLA [19] DLT [35]

LR (4) 0444 0520 0424 0382 0403 0443 0304 0456 0299 0278 0324IPR (31) 0562 0591 0558 0520 0515 0514 0453 0473 0406 0451 0423

OPR (39) 0578 0595 0572 0531 0507 0523 0480 0477 0425 0465 0428SV (28) 0564 0544 0517 0488 0473 0468 0496 0446 0418 0487 0448

OCC (29) 0585 0610 0566 0547 0519 0520 0502 0462 0426 0444 0426DEF (19) 0599 0651 0611 0571 0548 0577 0515 0500 0425 0466 0399

BC (21) 0575 0631 0577 0565 0518 0530 0469 0478 0372 0445 0366IV (25) 0555 0597 0564 0528 0529 0518 0475 0486 0402 0468 0427

MB (12) 0537 0594 0553 0493 0472 0483 0290 0485 0388 0296 0349FM (17) 0516 0560 0542 0456 0429 0461 0282 0464 0392 0285 0350

OV (6) 0593 0557 0581 0539 0505 0440 0344 0417 0434 0325 0403

TABLE IV: Precision scores of TRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second, and third performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.581  0.750     0.589     0.501     0.574      0.602      0.350     0.628        0.376     0.325      0.391
IPR (31)    0.767  0.851     0.802     0.728     0.725      0.716      0.581     0.650        0.569     0.582      0.572
OPR (39)    0.789  0.859     0.826     0.749     0.719      0.728      0.617     0.660        0.597     0.605      0.584
SV (28)     0.769  0.840     0.787     0.727     0.717      0.676      0.633     0.652        0.600     0.634      0.594
OCC (29)    0.791  0.854     0.788     0.758     0.726      0.705      0.633     0.631        0.579     0.560      0.550
DEF (19)    0.798  0.889     0.854     0.757     0.723      0.765      0.635     0.655        0.571     0.571      0.556
BC (21)     0.772  0.874     0.793     0.776     0.697      0.721      0.600     0.622        0.488     0.575      0.517
IV (25)     0.747  0.851     0.792     0.729     0.727      0.693      0.585     0.643        0.543     0.584      0.572
MB (12)     0.720  0.785     0.724     0.626     0.597      0.607      0.323     0.617        0.491     0.332      0.450
FM (17)     0.681  0.738     0.710     0.578     0.532      0.582      0.302     0.580        0.487     0.305      0.432
OV (6)      0.719  0.692     0.692     0.643     0.587      0.514      0.371     0.484        0.485     0.339      0.470

TABLE V: Success scores of SRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second, and third performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.352  0.488     0.374     0.289     0.326      0.332      0.254     0.360        0.305     0.213      0.243
IPR (31)    0.487  0.537     0.494     0.450     0.460      0.438      0.399     0.410        0.380     0.405      0.357
OPR (39)    0.510  0.536     0.514     0.445     0.439      0.455      0.396     0.409        0.387     0.404      0.368
SV (28)     0.524  0.492     0.463     0.401     0.413      0.396      0.438     0.395        0.384     0.440      0.402
OCC (29)    0.524  0.543     0.510     0.445     0.434      0.449      0.398     0.405        0.384     0.381      0.354
DEF (19)    0.492  0.566     0.516     0.469     0.434      0.504      0.358     0.398        0.357     0.386      0.322
BC (21)     0.500  0.569     0.517     0.483     0.451      0.483      0.387     0.408        0.334     0.410      0.303
IV (25)     0.486  0.516     0.490     0.442     0.446      0.438      0.389     0.396        0.350     0.405      0.347
MB (12)     0.503  0.565     0.513     0.425     0.389      0.420      0.266     0.451        0.385     0.256      0.312
FM (17)     0.504  0.534     0.518     0.415     0.384      0.412      0.269     0.464        0.392     0.285      0.350
OV (6)      0.578  0.526     0.575     0.455     0.426      0.391      0.335     0.421        0.407     0.316      0.314

TABLE VI: Precision scores of SRE with 11 attributes. The number after each attribute name is the number of sequences. The red, blue, and green fonts indicate the best, second, and third performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.476  0.818     0.511     0.377     0.543      0.501      0.305     0.504        0.363     0.263      0.299
IPR (31)    0.704  0.839     0.752     0.667     0.704      0.648      0.546     0.592        0.554     0.556      0.503
OPR (39)    0.732  0.828     0.774     0.666     0.680      0.669      0.547     0.595        0.560     0.560      0.525
SV (28)     0.752  0.832     0.732     0.632     0.696      0.599      0.598     0.607        0.558     0.601      0.562
OCC (29)    0.735  0.815     0.730     0.662     0.671      0.649      0.540     0.568        0.516     0.514      0.483
DEF (19)    0.684  0.835     0.757     0.677     0.630      0.715      0.475     0.547        0.505     0.516      0.467
BC (21)     0.702  0.851     0.734     0.693     0.655      0.698      0.521     0.555        0.451     0.555      0.439
IV (25)     0.677  0.809     0.707     0.652     0.681      0.630      0.509     0.556        0.480     0.544      0.472
MB (12)     0.686  0.807     0.691     0.567     0.532      0.561      0.309     0.587        0.521     0.310      0.388
FM (17)     0.685  0.748     0.694     0.545     0.505      0.544      0.308     0.577        0.496     0.291      0.397
OV (6)      0.719  0.644     0.690     0.533     0.504      0.451      0.386     0.455        0.463     0.355      0.360

efficiently determines the nonlinear decision boundary through online training. Second, the online classifier parameter update scheme in (10) facilitates recovering from tracking drift.
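The classifier component described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the unary threshold coding stands in for the explicit feature map of the intersection kernel, plain SGD stands in for the update scheme in (10), and `n_bins`, the learning rate, and all function names are assumptions.

```python
import numpy as np

def intersection_feature_map(x, n_bins=8):
    """Unary-code each feature value over uniform thresholds in [0, 1).
    With the 1/sqrt(n_bins) scaling, the dot product of two coded vectors
    approximates the intersection kernel sum_i min(x_i, y_i)."""
    thresholds = np.linspace(0.0, 1.0, n_bins, endpoint=False)
    coded = (x[:, None] > thresholds[None, :]).astype(np.float64)
    return coded.ravel() / np.sqrt(n_bins)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

class OnlineLogisticRegression:
    """Logistic regression updated online by stochastic gradient descent."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def update(self, phi, label):
        # label in {0, 1}; gradient step on the log-loss for one sample
        err = label - logistic(self.w @ phi + self.b)
        self.w += self.lr * err * phi
        self.b += self.lr * err

    def score(self, phi):
        # probability that the candidate region is the target
        return logistic(self.w @ phi + self.b)
```

A tracker would score candidate regions with `score` each frame and call `update` with the newly labeled samples to adapt the decision boundary online.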

Figure 4 shows sampled tracking results from six long sequences (each with more than 1,000 frames). The total number of frames of these sequences is 11,134, which accounts for about 38.4% of the total number of frames (about 29,000) in the benchmark, and hence the performance on these sequences plays an important role in performance evaluation. For clear presentation, only the results of the top-performing BMR, HCF, and MEEM methods are shown. In all sequences, the BMR-based tracker is able to track the targets stably over almost all frames. However, the HCF scheme drifts away from the target objects after a few frames in the sylvester (e.g., frames 1178, 1285, 1345) and lemming (e.g., frames 386, 1137, 1336) sequences. The MEEM method drifts


Fig. 4. Sampled screenshots of tracking results from six long sequences: sylvester, mhyang, dog1, lemming, liquor, and doll.

to the background when severe occlusions happen in the liquor sequence (e.g., frames 508, 728, 775). To further demonstrate the results over all frames clearly, Figure 5 shows the plots in terms of the overlap score of each frame. Overall, the BMR-based tracker performs well against the HCF and MEEM methods in most frames of these sequences.
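The per-frame overlap score plotted in Figure 5 is the standard intersection-over-union of the tracked and ground-truth bounding boxes. A minimal sketch (the `(x, y, w, h)` box format is an assumption):

```python
def overlap_score(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # width and height of the intersection rectangle (clamped at zero)
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

Tracking succeeds in a frame when this score exceeds a chosen threshold (0.5 is a common choice).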

D. Analysis of BMR

To demonstrate the effectiveness of BMRs, we eliminate the component of Boolean maps in the proposed tracking algorithm and only leverage the LAB+HOG representations for visual tracking. In addition, we use the KCF as a baseline as it adopts the same HOG representations as the proposed tracking method. Figure 6 shows quantitative comparisons on the benchmark dataset. Without using the proposed Boolean maps, the AUC score of success rate in OPE of the proposed method is reduced by 7.5%. For TRE and SRE, the AUC scores of the proposed method are reduced by 3.9% and 4.7%, respectively, without the component of Boolean maps. It is worth noticing that the proposed method without using the Boolean maps still outperforms KCF in terms of all metrics on success rates, which shows the effectiveness of the LAB color features in BMR. These experimental results show that the BMRs in the proposed method play a key role for robust visual tracking.
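The AUC scores compared above follow the benchmark convention of averaging the success rate over uniformly spaced overlap thresholds. A short sketch of that computation (the threshold count and function name are assumptions):

```python
import numpy as np

def success_auc(overlaps, n_thresholds=21):
    """Success rate at each overlap threshold in [0, 1] and its average,
    the AUC score used to rank trackers in the benchmark protocol."""
    thresholds = np.linspace(0.0, 1.0, n_thresholds)
    overlaps = np.asarray(overlaps, dtype=float)
    # fraction of frames whose overlap exceeds each threshold
    success = np.array([(overlaps > t).mean() for t in thresholds])
    return success, success.mean()
```

Feeding in the per-frame overlap scores of a tracker over all sequences yields the success curve and the single AUC number reported in the tables.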

E. Failure Cases

Figure 7 shows failed results of the proposed BMR-based method in two sequences, singer2 and motorRolling. In the singer2 sequence, the foreground object and background scene are similar due to the dim stage lighting at the beginning (e.g., frames 10 and 25). The HCF, MEEM, and proposed methods all drift to the background. Furthermore, as the target in the motorRolling sequence undergoes 360-degree in-plane rotation in early frames (e.g., frame 35), the MEEM and proposed methods do not adapt to the drastic appearance variations well due to limited training samples. In contrast, only the HCF tracker performs well in this sequence because it leverages dense sampling and high-dimensional convolutional features.


Fig. 5. Overlap score plots of the six long sequences shown in Figure 4.

IV. CONCLUSIONS

In this paper, we propose a Boolean map based representation which exploits connectivity cues for visual tracking. In the BMR scheme, the HOG and raw color feature maps are decomposed into a set of Boolean maps by uniformly thresholding the respective channels. These Boolean maps are concatenated and normalized to form a robust representation, which approximates an explicit feature map of the intersection kernel. A logistic regression classifier with the explicit feature map is trained in an online manner to determine the nonlinear decision boundary for visual tracking. Extensive evaluations on a large tracking benchmark dataset demonstrate that the proposed tracking algorithm performs favorably against state-of-the-art algorithms in terms of accuracy and robustness.
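As a rough illustration of the decomposition summarized above (not the paper's exact procedure: the threshold count, the interior placement of the uniform thresholds, and the L2 normalization are assumptions):

```python
import numpy as np

def boolean_maps(feature_map, n_thresholds=4):
    """Decompose one real-valued feature channel into Boolean maps by
    uniform thresholding, then concatenate and L2-normalize them."""
    fmin, fmax = feature_map.min(), feature_map.max()
    # uniform interior thresholds between the channel's min and max
    thresholds = np.linspace(fmin, fmax, n_thresholds + 2)[1:-1]
    maps = [(feature_map > t).astype(np.float64) for t in thresholds]
    v = np.concatenate([m.ravel() for m in maps])
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

Low thresholds yield coarse maps that capture global shape, while high thresholds yield fine-grained maps that preserve structural detail; stacking them gives the multi-scale representation.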

REFERENCES

[1] X. Li, W. Hu, C. Shen, Z. Zhang, A. Dick, and A. van den Hengel, "A survey of appearance models in visual object tracking," ACM Transactions on Intelligent Systems and Technology, vol. 4, no. 4, p. 58, 2013.

[2] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in International Joint Conference on Artificial Intelligence, vol. 81, pp. 674–679, 1981.

[3] I. Matthews, T. Ishikawa, and S. Baker, "The template update problem," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 6, pp. 810–815, 2004.

[4] J. Henriques, R. Caseiro, P. Martins, and J. Batista, "Exploiting the circulant structure of tracking-by-detection with kernels," in Proceedings of European Conference on Computer Vision, pp. 702–715, 2012.

[5] J. Zhang, S. Ma, and S. Sclaroff, "MEEM: Robust tracking via multiple experts using entropy minimization," in Proceedings of European Conference on Computer Vision, pp. 188–203, 2014.

[6] M. J. Black and A. D. Jepson, "EigenTracking: Robust matching and tracking of articulated objects using a view-based representation," International Journal of Computer Vision, vol. 26, no. 1, pp. 63–84, 1998.

[7] D. Ross, J. Lim, R. Lin, and M.-H. Yang, "Incremental learning for robust visual tracking," International Journal of Computer Vision, vol. 77, no. 1, pp. 125–141, 2008.

[8] X. Mei and H. Ling, "Robust visual tracking and vehicle classification via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 11, pp. 2259–2272, 2011.

[9] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja, "Robust visual tracking via multi-task sparse learning," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2042–2049, 2012.

[10] D. Wang, H. Lu, and M.-H. Yang, "Online object tracking with sparse prototypes," IEEE Transactions on Image Processing, vol. 22, no. 1, pp. 314–325, 2013.

[11] A. Adam, E. Rivlin, and I. Shimshoni, "Robust fragments-based tracking using the integral histogram," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 798–805, 2006.

[12] S. He, Q. Yang, R. Lau, J. Wang, and M.-H. Yang, "Visual tracking via locality sensitive histograms," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2427–2434, 2013.

[13] B. Babenko, M.-H. Yang, and S. Belongie, "Robust object tracking with online multiple instance learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1619–1632, 2011.

[14] S. Hare, A. Saffari, and P. H. Torr, "Struck: Structured output tracking with kernels," in Proceedings of the IEEE International Conference on Computer Vision, pp. 263–270, 2011.

[15] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "High-speed tracking with kernelized correlation filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583–596, 2015.

[16] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886–893, 2005.

[17] Y. Wu, J. Lim, and M.-H. Yang, "Online object tracking: A benchmark," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2411–2418, 2013.

[18] J. Kwon and K. M. Lee, "Tracking of a non-rigid object via patch-based dynamic appearance modeling and adaptive basin hopping Monte Carlo sampling," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1208–1215, 2009.

[19] X. Jia, H. Lu, and M.-H. Yang, "Visual tracking via adaptive structural local sparse appearance model," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1822–1829, 2012.


Fig. 6. Success and precision plots of OPE, TRE, and SRE for BMR, BMR only with LAB+HOG representations, and KCF (KCF is used as a baseline for comparison).

Fig. 7. Failure cases of the BMR-based tracker in the singer2 and motorRolling sequences. The results of HCF and MEEM are also illustrated.

[20] W. Zhong, H. Lu, and M.-H. Yang, "Robust object tracking via sparsity-based collaborative model," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1838–1845, 2012.

[21] Y. Li and J. Zhu, "A scale adaptive kernel correlation filter tracker with feature integration," in European Conference on Computer Vision Workshops, pp. 254–265, 2014.

[22] N. Wang, J. Shi, D.-Y. Yeung, and J. Jia, "Understanding and diagnosing visual tracking systems," in Proceedings of the IEEE International Conference on Computer Vision, pp. 3101–3109, 2015.

[23] C. Ma, J.-B. Huang, X. Yang, and M.-H. Yang, "Hierarchical convolutional features for visual tracking," in Proceedings of the IEEE International Conference on Computer Vision, pp. 3074–3082, 2015.

[24] R. Allen, P. McGeorge, D. Pearson, and A. B. Milne, "Attention and expertise in multiple target tracking," Applied Cognitive Psychology, vol. 18, no. 3, pp. 337–347, 2004.

[25] P. Cavanagh and G. A. Alvarez, "Tracking multiple targets with multifocal attention," Trends in Cognitive Sciences, vol. 9, no. 7, pp. 349–354, 2005.

[26] L. Chen, "Topological structure in visual perception," Science, vol. 218, p. 699, 1982.

[27] S. E. Palmer, Vision Science: Photons to Phenomenology, vol. 1, MIT Press, 1999.

[28] J. Zhang and S. Sclaroff, "Saliency detection: A Boolean map approach," in Proceedings of the IEEE International Conference on Computer Vision, pp. 153–160, 2013.

[29] B. Alexe, T. Deselaers, and V. Ferrari, "Measuring the objectness of image windows," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2189–2202, 2012.

[30] M.-M. Cheng, Z. Zhang, W.-Y. Lin, and P. Torr, "BING: Binarized normed gradients for objectness estimation at 300fps," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3286–3293, 2014.

[31] L. Huang and H. Pashler, "A Boolean map theory of visual attention," Psychological Review, vol. 114, no. 3, p. 599, 2007.

[32] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 11, pp. 1254–1259, 1998.

[33] M. S. Livingstone and D. H. Hubel, "Anatomy and physiology of a color system in the primate visual cortex," The Journal of Neuroscience, vol. 4, no. 1, pp. 309–356, 1984.

[34] K. Grauman and T. Darrell, "The pyramid match kernel: Efficient learning with sets of features," The Journal of Machine Learning Research, vol. 8, pp. 725–760, 2007.

[35] N. Wang and D.-Y. Yeung, "Learning a deep compact image representation for visual tracking," in Advances in Neural Information Processing Systems, pp. 809–817, 2013.

[36] M. Danelljan, G. Hager, F. Khan, and M. Felsberg, "Accurate scale estimation for robust visual tracking," in Proceedings of British Machine Vision Conference, 2014.

[37] J. Gao, H. Ling, W. Hu, and J. Xing, "Transfer learning based visual tracking with Gaussian processes regression," in Proceedings of European Conference on Computer Vision, pp. 188–203, 2014.

[38] Z. Kalal, J. Matas, and K. Mikolajczyk, "P-N learning: Bootstrapping binary classifiers by structural constraints," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 49–56, 2010.

Page 6: SUBMITTED 1 Visual Tracking via Boolean Map Representations

SUBMITTED 6

Fig 3 Success and precision plots of OPE TRE and SRE by the top 10 trackers The trackers are ranked by the AUC scores (shown inthe legends) when the success rates are used or precession cores at the threshold of 20 pixels

TABLE I Success score of OPE with 11 attributes The number after each attribute name is the number of sequences The red blue andgreen fonts indicate the best second and third performance

Attribute BMR HCF [23] MEEM [5] KCF [15] DSST [36] TGPR [37] SCM [20] Struck [14] TLD [38] ASLA [19] DLT [35]

LR (4) 0409 0557 0360 0310 0352 0370 0279 0372 0309 0157 0256IPR (31) 0557 0582 0535 0497 0532 0479 0458 0444 0416 0425 0383

OPR (39) 0590 0587 0558 0496 0491 0485 0470 0432 0420 0422 0393SV (28) 0586 0531 0498 0427 0451 0418 0518 0425 0421 0452 0458

OCC (29) 0615 0606 0552 0513 0480 0484 0487 0413 0402 0376 0384DEF (19) 0594 0626 0560 0533 0474 0510 0448 0393 0378 0372 0330

BC (21) 0555 0623 0569 0533 0492 0522 0450 0458 0345 0408 0327IV (25) 0551 0560 0533 0494 0506 0484 0473 0428 0399 0429 0392

MB (12) 0559 0616 0541 0499 0458 0434 0298 0433 0404 0258 0329FM (17) 0559 0578 0553 0461 0433 0396 0296 0462 0417 0247 0353

OV (6) 0616 0575 0606 0550 0490 0442 0361 0459 0457 0312 0409

TABLE II Precision scores of OPE with 11 attributes The number after each attribute name is the number of sequences The red blue andgreen fonts indicate the best second and third performance

Attribute BMR HCF [23] MEEM [5] KCF [15] DSST [36] TGPR [37] SCM [20] Struck [14] TLD [38] ASLA [19] DLT [35]

LR (4) 0517 0897 0490 0379 0534 0538 0305 0545 0349 0156 0303IPR (31) 0776 0868 0800 0725 0780 0675 0597 0617 0584 0511 0510

OPR (39) 0819 0869 0840 0730 0732 0678 0618 0597 0596 0518 0527SV (28) 0803 0880 0785 0680 0740 0620 0672 0639 0606 0552 0606

OCC (29) 0846 0877 0799 0749 0725 0675 0640 0564 0563 0460 0495DEF (19) 0802 0881 0846 0741 0657 0691 0586 0521 0512 0445 0512

BC (21) 0742 0885 0797 0752 0691 0717 0578 0585 0428 0496 0440IV (25) 0742 0844 0766 0729 0741 0671 0594 0558 0537 0517 0492

MB (12) 0755 0844 0715 0650 0603 0537 0339 0551 0518 0278 0427FM (17) 0758 0790 0742 0602 0562 0493 0333 0604 0551 0253 0435

OV (6) 0773 0695 0727 0649 0533 0505 0429 0539 0576 0333 0505

tracker ranks third based on SRE and second based on TREFurthermore although the MEEM method ranks higher thanthe BMR-based tracker in most attributes the differences ofthe scores are within 1 In terms of precision rates the BMR-based algorithm ranks within top 3 with most attributes exceptfor the LR DEF and IV attributes

The AUC score of success rate measures the overall perfor-

mance of each tracking method [17] Figure 3 shows that theBMR-based method achieves better results in terms of successrates than that precision rates in terms of all metrics (OPESRE TRE) and attributes The tracking performance can beattributed to two factors First the proposed method exploits alogistic regression classifier with explicit feature maps which

SUBMITTED 7

TABLE III Success scores of TRE with 11 attributes The number after each attribute name is the number of sequences The red blue andgreen fonts indicate the best second and third performance

Attribute BMR HCF [23] MEEM [5] KCF [15] DSST [36] TGPR [37] SCM [20] Struck [14] TLD [38] ASLA [19] DLT [35]

LR (4) 0444 0520 0424 0382 0403 0443 0304 0456 0299 0278 0324IPR (31) 0562 0591 0558 0520 0515 0514 0453 0473 0406 0451 0423

OPR (39) 0578 0595 0572 0531 0507 0523 0480 0477 0425 0465 0428SV (28) 0564 0544 0517 0488 0473 0468 0496 0446 0418 0487 0448

OCC (29) 0585 0610 0566 0547 0519 0520 0502 0462 0426 0444 0426DEF (19) 0599 0651 0611 0571 0548 0577 0515 0500 0425 0466 0399

BC (21) 0575 0631 0577 0565 0518 0530 0469 0478 0372 0445 0366IV (25) 0555 0597 0564 0528 0529 0518 0475 0486 0402 0468 0427

MB (12) 0537 0594 0553 0493 0472 0483 0290 0485 0388 0296 0349FM (17) 0516 0560 0542 0456 0429 0461 0282 0464 0392 0285 0350

OV (6) 0593 0557 0581 0539 0505 0440 0344 0417 0434 0325 0403

TABLE IV Precision scores of TRE with 11 attributes The number after each attribute name is the number of sequences The red blueand green fonts indicate the best second and third performance

Attribute BMR HCF [23] MEEM [5] KCF [15] DSST [36] TGPR [37] SCM [20] Struck [14] TLD [38] ASLA [19] DLT [35]

LR (4) 0581 0750 0589 0501 0574 0602 0350 0628 0376 0325 0391IPR (31) 0767 0851 0802 0728 0725 0716 0581 0650 0569 0582 0572

OPR (39) 0789 0859 0826 0749 0719 0728 0617 0660 0597 0605 0584SV (28) 0769 0840 0787 0727 0717 0676 0633 0652 0600 0634 0594

OCC (29) 0791 0854 0788 0758 0726 0705 0633 0631 0579 0560 0550DEF (19) 0798 0889 0854 0757 0723 0765 0635 0655 0571 0571 0556

BC (21) 0772 0874 0793 0776 0697 0721 0600 0622 0488 0575 0517IV (25) 0747 0851 0792 0729 0727 0693 0585 0643 0543 0584 0572

MB (12) 0720 0785 0724 0626 0597 0607 0323 0617 0491 0332 0450FM (17) 0681 0738 0710 0578 0532 0582 0302 0580 0487 0305 0432

OV (6) 0719 0692 0692 0643 0587 0514 0371 0484 0485 0339 0470

TABLE V Success scores of SRE with 11 attributes The number after each attribute name is the number of sequences The red blue andgreen fonts indicate the best second and third performance

Attribute BMR HCF [23] MEEM [5] KCF [15] DSST [36] TGPR [37] SCM [20] Struck [14] TLD [38] ASLA [19] DLT [35]

LR (4) 0352 0488 0374 0289 0326 0332 0254 0360 0305 0213 0243IPR (31) 0487 0537 0494 0450 0460 0438 0399 0410 0380 0405 0357

OPR (39) 0510 0536 0514 0445 0439 0455 0396 0409 0387 0404 0368SV (28) 0524 0492 0463 0401 0413 0396 0438 0395 0384 0440 0402

OCC (29) 0524 0543 0510 0445 0434 0449 0398 0405 0384 0381 0354DEF (19) 0492 0566 0516 0469 0434 0504 0358 0398 0357 0386 0322

BC (21) 0500 0569 0517 0483 0451 0483 0387 0408 0334 0410 0303IV (25) 0486 0516 0490 0442 0446 0438 0389 0396 0350 0405 0347

MB (12) 0503 0565 0513 0425 0389 0420 0266 0451 0385 0256 0312FM (17) 0504 0534 0518 0415 0384 0412 0269 0464 0392 0285 0350

OV (6) 0578 0526 0575 0455 0426 0391 0335 0421 0407 0316 0314

TABLE VI Precision scores of SRE with 11 attributes The number after each attribute name is the number of sequences The red blueand green fonts indicate the best second and third performance

Attribute BMR HCF [23] MEEM [5] KCF [15] DSST [36] TGPR [37] SCM [20] Struck [14] TLD [38] ASLA [19] DLT [35]

LR (4) 0476 0818 0511 0377 0543 0501 0305 0504 0363 0263 0299IPR (31) 0704 0839 0752 0667 0704 0648 0546 0592 0554 0556 0503

OPR (39) 0732 0828 0774 0666 0680 0669 0547 0595 0560 0560 0525SV (28) 0752 0832 0732 0632 0696 0599 0598 0607 0558 0601 0562

OCC (29) 0735 0815 0730 0662 0671 0649 0540 0568 0516 0514 0483DEF (19) 0684 0835 0757 0677 0630 0715 0475 0547 0505 0516 0467

BC (21) 0702 0851 0734 0693 0655 0698 0521 0555 0451 0555 0439IV (25) 0677 0809 0707 0652 0681 0630 0509 0556 0480 0544 0472

MB (12) 0686 0807 0691 0567 0532 0561 0309 0587 0521 0310 0388FM (17) 0685 0748 0694 0545 0505 0544 0308 0577 0496 0291 0397

OV (6) 0719 0644 0690 0533 0504 0451 0386 0455 0463 0355 0360

efficiently determines the nonlinear decision boundary throughonline training Second the online classifier parameter updatescheme in (10) facilitates recovering from tracking drift

Figure 4 shows sampled tracking results from six longsequences (each with more than 1000 frames) The totalnumber of frames of these sequences is 11 134 that ac-counts for about 384 of the total number of frames (about29 000) in the benchmark and hence the performance on

these sequences plays an important role in performance eval-uation For clear presentation only the results of the topperforming BMR HCF and MEEM methods are shown Inall sequences the BMR-based tracker is able to track thetargets stably over almost all frames However the HCFscheme drifts away from the target objects after a fewframes in the sylvester (117812851345) and lemming(38611371336) sequences The MEEM method drifts

SUBMITTED 8

Fig 4 Screenshots sampled results from six long sequences sylvester mhyang dog1 lemming liquor and doll

to background when severe occlusions happen in the liquorsequence (508728775) To further demonstrate theresults over all frames clearly Figure 5 shows the plots interms of overlap score of each frame Overall the BMR-basedtracker performs well against the HCF and MEEM methodsin most frames of these sequences

D Analysis of BMR

To demonstrate the effectiveness of BMRs we eliminatethe component of Boolean maps in the proposed trackingalgorithm and only leverage the LAB+HOG representationsfor visual tracking In addition we use the KCF as a baselineas it adopts the HOG representations as the proposed trackingmethod Figure 6 shows quantitative comparisons on thebenchmark dataset Without using the proposed Boolean mapsthe AUC score of success rate in OPE of the proposed methodis reduced by 75 For TRE and SRE the AUC scores of theproposed method are reduced by 39 and 47 respectivelywithout the component of Boolean maps It is worth noticingthat the proposed method without using the Boolean maps

still outperforms KCF in terms of all metrics on success rateswhich shows the effectiveness of the LAB color features inBMR These experimental results show that the BMRs in theproposed method play a key role for robust visual tracking

E Failure Cases

Figure 7 shows failed results of the proposed BMR-basedmethod in two sequences singer2 and motorRolling In thesinger2 sequence the foreground object and background sceneare similar due to the dim stage lighting at the beginning(1025) The HCF MEEM and proposed methods all driftto the background Furthermore as the targets in the motor-Rolling sequence undergo from 360-degree in-plane rotationin early frames (35) the MEEM and proposed methods donot adapt to drastic appearance variations well due to limitedtraining samples In contrast only the HCF tracker performswell in this sequence because it leverages dense sampling andhigh-dimensional convolutional features

SUBMITTED 9

Fig 5 Overlap score plots of six long sequences shown in Figure 4

IV CONCLUSIONS

In this paper we propose a Boolean map based representa-tion which exploits the connectivity cues for visual trackingIn the BMR scheme the HOG and raw color feature mapsare decomposed into a set of Boolean maps by uniformlythresholding the respective channels These Boolean maps areconcatenated and normalized to form a robust representationwhich approximates an explicit feature map of the intersectionkernel A logistic regression classifier with the explicit featuremap is trained in an online manner that determines thenonlinear decision boundary for visual tracking Extensiveevaluations on a large tracking benchmark dataset demonstratethe proposed tracking algorithm performs favorably against thestate-of-the-art algorithms in terms of accuracy and robustness

REFERENCES

[1] X Li W Hu C Shen Z Zhang A Dick and A V D HengelldquoA survey of appearance models in visual object trackingrdquo ACMTransactions on Intelligent Systems and Technology vol 4 no 4 p 582013 1

[2] B D Lucas and T Kanade ldquoAn iterative image registration techniquewith an application to stereo visionrdquo in International Joint Conferenceon Artificial Intelligence vol 81 pp 674ndash679 1981 1

[3] I Matthews T Ishikawa and S Baker ldquoThe template update prob-lemrdquo IEEE Transactions on Pattern Analysis and Machine Intelligencevol 26 no 6 pp 810ndash815 2004 1

[4] J Henriques R Caseiro P Martins and J Batista ldquoExploiting thecirculant structure of tracking-by-detection with kernelsrdquo in Proceedingsof European Conference on Computer Vision pp 702ndash715 2012 1

[5] J Zhang S Ma and S Sclaroff ldquoMeem Robust tracking via multi-ple experts using entropy minimizationrdquo in Proceedings of EuropeanConference on Computer Vision pp 188ndash203 2014 1 2 5 6 7

[6] M J Black and A D Jepson ldquoEigentracking Robust matching andtracking of articulated objects using a view-based representationrdquo Inter-national Journal of Computer Vision vol 26 no 1 pp 63ndash84 19981

[7] D Ross J Lim R Lin and M-H Yang ldquoIncremental learningfor robust visual trackingrdquo International Journal of Computer Visionvol 77 no 1 pp 125ndash141 2008 1

[8] X Mei and H Ling ldquoRobust visual tracking and vehicle classificationvia sparse representationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 33 no 11 pp 2259ndash2272 2011 1

[9] T Zhang B Ghanem S Liu and N Ahuja ldquoRobust visual trackingvia multi-task sparse learningrdquo in Proceedings of IEEE Conference onComputer Vision and Pattern Recognition pp 2042ndash2049 2012 1

[10] D Wang H Lu and M-H Yang ldquoOnline object tracking with sparseprototypesrdquo IEEE Transactions on Image Processing vol 22 no 1pp 314ndash325 2013 1

[11] A Adam E Rivlin and I Shimshoni ldquoRobust fragments-based trackingusing the integral histogramrdquo in Proceedings of IEEE Conference onComputer Vision and Pattern Recognition pp 798ndash805 2006 1

[12] S He Q Yang R Lau J Wang and M-H Yang ldquoVisual tracking vialocality sensitive histogramsrdquo in Proceedings of IEEE Conference onComputer Vision and Pattern Recognition pp 2427ndash2434 2013 1

[13] B Babenko M-H Yang and S Belongie ldquoRobust object trackingwith online multiple instance learningrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 33 no 8 pp 1619ndash1632 20111

[14] S Hare A Saffari and P H Torr ldquoStruck Structured output trackingwith kernelsrdquo in Proceedings of the IEEE International Conference onComputer Vision pp 263ndash270 2011 1 6 7

[15] J F Henriques R Caseiro P Martins and J Batista ldquoHigh-speedtracking with kernelized correlation filtersrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 37 no 3 pp 583ndash5962015 1 5 6 7

[16] N Dalal and B Triggs ldquoHistograms of oriented gradients for humandetectionrdquo in Proceedings of IEEE Conference on Computer Vision andPattern Recognition vol 1 pp 886ndash893 2005 1

[17] Y Wu J Lim and M-H Yang ldquoOnline object tracking A benchmarkrdquoin Proceedings of IEEE Conference on Computer Vision and PatternRecognition pp 2411ndash2418 2013 1 2 5 6

[18] J Kwon and K M Lee ldquoTracking of a non-rigid object via patch-baseddynamic appearance modeling and adaptive basin hopping monte carlosamplingrdquo in Proceedings of IEEE Conference on Computer Vision andPattern Recognition pp 1208ndash1215 2009 1

[19] X Jia H Lu and M-H Yang ldquoVisual tracking via adaptive structurallocal sparse appearance modelrdquo in Proceedings of IEEE Conference onComputer Vision and Pattern Recognition pp 1822ndash1829 2012 1 67

SUBMITTED 10

Fig 6 Success and precision plots of OPE TRE and SRE for BMR BMR only with LAB+HOG representations and KCF (KCF is usedas a baseline for comparisons)

Fig 7 Failure cases of the BMR-based tracker in the singer2 and motorRolling sequences The results of HCF and MEEM are also illustrated

[20] W Zhong H Lu and M-H Yang ldquoRobust object tracking via sparsity-based collaborative modelrdquo in Proceedings of IEEE Conference onComputer Vision and Pattern Recognition pp 1838ndash1845 2012 16 7

[21] Y Li and J Zhu ldquoA scale adaptive kernel correlation filter trackerwith feature integrationrdquo in European Conference on Computer Vision-Workshops pp 254ndash265 2014 1

[22] N Wang J Shi D-Y Yeung and J Jia ldquoUnderstanding and diagnosingvisual tracking systemsrdquo in Proceedings of the IEEE InternationalConference on Computer Vision pp 3101ndash3109 2015 1 2 4

[23] C Ma J-B Huang X Yang and M-H Yang ldquoHierarchical con-volutional features for visual trackingrdquo in Proceedings of the IEEEInternational Conference on Computer Vision pp 3074ndash3082 20151 2 5 6 7

[24] R Allen P Mcgeorge D Pearson and A B Milne ldquoAttention andexpertise in multiple target trackingrdquo Applied Cognitive Psychologyvol 18 no 3 pp 337ndash347 2004 2

[25] P Cavanagh and G A Alvarez ldquoTracking multiple targets with multifo-cal attentionrdquo Trends in Cognitive Sciences vol 9 no 7 pp 349ndash3542005 2

TABLE III: Success scores of TRE for the 11 attributes. The number after each attribute name is the number of sequences; the red, blue, and green fonts indicate the best, second best, and third best performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.444  0.520     0.424     0.382     0.403      0.443      0.304     0.456        0.299     0.278      0.324
IPR (31)    0.562  0.591     0.558     0.520     0.515      0.514      0.453     0.473        0.406     0.451      0.423
OPR (39)    0.578  0.595     0.572     0.531     0.507      0.523      0.480     0.477        0.425     0.465      0.428
SV (28)     0.564  0.544     0.517     0.488     0.473      0.468      0.496     0.446        0.418     0.487      0.448
OCC (29)    0.585  0.610     0.566     0.547     0.519      0.520      0.502     0.462        0.426     0.444      0.426
DEF (19)    0.599  0.651     0.611     0.571     0.548      0.577      0.515     0.500        0.425     0.466      0.399
BC (21)     0.575  0.631     0.577     0.565     0.518      0.530      0.469     0.478        0.372     0.445      0.366
IV (25)     0.555  0.597     0.564     0.528     0.529      0.518      0.475     0.486        0.402     0.468      0.427
MB (12)     0.537  0.594     0.553     0.493     0.472      0.483      0.290     0.485        0.388     0.296      0.349
FM (17)     0.516  0.560     0.542     0.456     0.429      0.461      0.282     0.464        0.392     0.285      0.350
OV (6)      0.593  0.557     0.581     0.539     0.505      0.440      0.344     0.417        0.434     0.325      0.403

TABLE IV: Precision scores of TRE for the 11 attributes. The number after each attribute name is the number of sequences; the red, blue, and green fonts indicate the best, second best, and third best performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.581  0.750     0.589     0.501     0.574      0.602      0.350     0.628        0.376     0.325      0.391
IPR (31)    0.767  0.851     0.802     0.728     0.725      0.716      0.581     0.650        0.569     0.582      0.572
OPR (39)    0.789  0.859     0.826     0.749     0.719      0.728      0.617     0.660        0.597     0.605      0.584
SV (28)     0.769  0.840     0.787     0.727     0.717      0.676      0.633     0.652        0.600     0.634      0.594
OCC (29)    0.791  0.854     0.788     0.758     0.726      0.705      0.633     0.631        0.579     0.560      0.550
DEF (19)    0.798  0.889     0.854     0.757     0.723      0.765      0.635     0.655        0.571     0.571      0.556
BC (21)     0.772  0.874     0.793     0.776     0.697      0.721      0.600     0.622        0.488     0.575      0.517
IV (25)     0.747  0.851     0.792     0.729     0.727      0.693      0.585     0.643        0.543     0.584      0.572
MB (12)     0.720  0.785     0.724     0.626     0.597      0.607      0.323     0.617        0.491     0.332      0.450
FM (17)     0.681  0.738     0.710     0.578     0.532      0.582      0.302     0.580        0.487     0.305      0.432
OV (6)      0.719  0.692     0.692     0.643     0.587      0.514      0.371     0.484        0.485     0.339      0.470

TABLE V: Success scores of SRE for the 11 attributes. The number after each attribute name is the number of sequences; the red, blue, and green fonts indicate the best, second best, and third best performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.352  0.488     0.374     0.289     0.326      0.332      0.254     0.360        0.305     0.213      0.243
IPR (31)    0.487  0.537     0.494     0.450     0.460      0.438      0.399     0.410        0.380     0.405      0.357
OPR (39)    0.510  0.536     0.514     0.445     0.439      0.455      0.396     0.409        0.387     0.404      0.368
SV (28)     0.524  0.492     0.463     0.401     0.413      0.396      0.438     0.395        0.384     0.440      0.402
OCC (29)    0.524  0.543     0.510     0.445     0.434      0.449      0.398     0.405        0.384     0.381      0.354
DEF (19)    0.492  0.566     0.516     0.469     0.434      0.504      0.358     0.398        0.357     0.386      0.322
BC (21)     0.500  0.569     0.517     0.483     0.451      0.483      0.387     0.408        0.334     0.410      0.303
IV (25)     0.486  0.516     0.490     0.442     0.446      0.438      0.389     0.396        0.350     0.405      0.347
MB (12)     0.503  0.565     0.513     0.425     0.389      0.420      0.266     0.451        0.385     0.256      0.312
FM (17)     0.504  0.534     0.518     0.415     0.384      0.412      0.269     0.464        0.392     0.285      0.350
OV (6)      0.578  0.526     0.575     0.455     0.426      0.391      0.335     0.421        0.407     0.316      0.314

TABLE VI: Precision scores of SRE for the 11 attributes. The number after each attribute name is the number of sequences; the red, blue, and green fonts indicate the best, second best, and third best performance.

Attribute   BMR    HCF [23]  MEEM [5]  KCF [15]  DSST [36]  TGPR [37]  SCM [20]  Struck [14]  TLD [38]  ASLA [19]  DLT [35]
LR (4)      0.476  0.818     0.511     0.377     0.543      0.501      0.305     0.504        0.363     0.263      0.299
IPR (31)    0.704  0.839     0.752     0.667     0.704      0.648      0.546     0.592        0.554     0.556      0.503
OPR (39)    0.732  0.828     0.774     0.666     0.680      0.669      0.547     0.595        0.560     0.560      0.525
SV (28)     0.752  0.832     0.732     0.632     0.696      0.599      0.598     0.607        0.558     0.601      0.562
OCC (29)    0.735  0.815     0.730     0.662     0.671      0.649      0.540     0.568        0.516     0.514      0.483
DEF (19)    0.684  0.835     0.757     0.677     0.630      0.715      0.475     0.547        0.505     0.516      0.467
BC (21)     0.702  0.851     0.734     0.693     0.655      0.698      0.521     0.555        0.451     0.555      0.439
IV (25)     0.677  0.809     0.707     0.652     0.681      0.630      0.509     0.556        0.480     0.544      0.472
MB (12)     0.686  0.807     0.691     0.567     0.532      0.561      0.309     0.587        0.521     0.310      0.388
FM (17)     0.685  0.748     0.694     0.545     0.505      0.544      0.308     0.577        0.496     0.291      0.397
OV (6)      0.719  0.644     0.690     0.533     0.504      0.451      0.386     0.455        0.463     0.355      0.360

efficiently determines the nonlinear decision boundary through online training. Second, the online classifier parameter update scheme in (10) facilitates recovering from tracking drift.
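The online update can be sketched as a stochastic-gradient step on the regularized logistic loss. This is a minimal illustration, not the paper's exact rule (10): the learning rate `lr` and regularization weight `lam` are illustrative values.

```python
import numpy as np

def logistic_update(w, x, y, lr=0.05, lam=1e-3):
    """One online update of a logistic regression classifier.

    w : weight vector; x : feature vector; y : label in {+1, -1}.
    lr and lam are hypothetical settings, not the paper's.
    """
    p = 1.0 / (1.0 + np.exp(-y * np.dot(w, x)))  # probability of the correct label
    grad = -(1.0 - p) * y * x + lam * w          # gradient of log-loss + L2 penalty
    return w - lr * grad
```

Repeating this step on each new labeled sample keeps the classifier adapted to appearance changes while the L2 term damps drift of the weights.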

Figure 4 shows sampled tracking results from six long sequences (each with more than 1,000 frames). The total number of frames in these sequences is 11,134, which accounts for about 38.4% of the total number of frames (about 29,000) in the benchmark, and hence the performance on these sequences plays an important role in the evaluation. For clear presentation, only the results of the top performing BMR, HCF, and MEEM methods are shown. In all sequences, the BMR-based tracker is able to track the targets stably over almost all frames. However, the HCF scheme drifts away from the target objects after a few frames in the sylvester (#1178, #1285, #1345) and lemming (#386, #1137, #1336) sequences. The MEEM method drifts to the background when severe occlusions happen in the liquor sequence (#508, #728, #775). To further demonstrate the results over all frames, Figure 5 shows the overlap score of each frame. Overall, the BMR-based tracker performs well against the HCF and MEEM methods in most frames of these sequences.

Fig. 4: Sampled screenshots of results from six long sequences: sylvester, mhyang, dog1, lemming, liquor, and doll.
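The per-frame overlap score plotted in Figure 5 is the standard intersection-over-union between the tracked and ground-truth bounding boxes; a small sketch of that measure (the plotting itself is omitted):

```python
def overlap_score(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # intersection width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # intersection height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

A frame is typically counted as a success when this score exceeds a threshold such as 0.5.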

D. Analysis of BMR

To demonstrate the effectiveness of BMRs, we eliminate the Boolean map component from the proposed tracking algorithm and leverage only the LAB+HOG representations for visual tracking. In addition, we use KCF as a baseline, as it adopts the same HOG representations as the proposed method. Figure 6 shows quantitative comparisons on the benchmark dataset. Without the proposed Boolean maps, the AUC score of the success rate in OPE of the proposed method is reduced by 7.5%. For TRE and SRE, the AUC scores of the proposed method are reduced by 3.9% and 4.7%, respectively. It is worth noting that the proposed method without the Boolean maps still outperforms KCF in terms of all success rate metrics, which shows the effectiveness of the LAB color features in BMR. These experimental results show that the BMRs play a key role in robust visual tracking.
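The AUC scores compared in this ablation are areas under the success plot, computed per the benchmark protocol: the success rate is the fraction of frames whose overlap exceeds a threshold, averaged over thresholds in [0, 1]. A sketch (the 101-point sampling is an assumption matching the common benchmark setting):

```python
import numpy as np

def success_auc(overlap_scores, n_thresholds=101):
    """Area under the success plot: mean success rate over thresholds in [0, 1]."""
    scores = np.asarray(overlap_scores, dtype=float)
    thresholds = np.linspace(0.0, 1.0, n_thresholds)
    rates = [(scores > t).mean() for t in thresholds]  # success rate at each threshold
    return float(np.mean(rates))
```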

E. Failure Cases

Figure 7 shows failed results of the proposed BMR-based method in two sequences, singer2 and motorRolling. In the singer2 sequence, the foreground object and background scene are similar due to the dim stage lighting at the beginning (#1025). The HCF, MEEM, and proposed methods all drift to the background. Furthermore, as the target in the motorRolling sequence undergoes 360-degree in-plane rotation in early frames (#35), the MEEM and proposed methods do not adapt well to the drastic appearance variations due to limited training samples. In contrast, only the HCF tracker performs well in this sequence, because it leverages dense sampling and high-dimensional convolutional features.

Fig. 5: Overlap score plots of the six long sequences shown in Figure 4.

IV. CONCLUSIONS

In this paper, we propose a Boolean map based representation which exploits connectivity cues for visual tracking. In the BMR scheme, the HOG and raw color feature maps are decomposed into a set of Boolean maps by uniformly thresholding the respective channels. These Boolean maps are concatenated and normalized to form a robust representation, which approximates an explicit feature map of the intersection kernel. A logistic regression classifier with the explicit feature map is trained in an online manner to determine the nonlinear decision boundary for visual tracking. Extensive evaluations on a large tracking benchmark dataset demonstrate that the proposed tracking algorithm performs favorably against state-of-the-art algorithms in terms of accuracy and robustness.
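The representation summarized above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's exact implementation: the number of thresholds and the assumption that channels are pre-scaled to [0, 1] are hypothetical choices.

```python
import numpy as np

def boolean_map_representation(feature_maps, n_thresholds=4):
    """Decompose feature channels into Boolean maps by uniform thresholding.

    feature_maps : H x W x C array (e.g. stacked LAB and HOG channels),
    assumed scaled to [0, 1]. Returns an L2-normalized vector of the
    concatenated Boolean maps; for binary vectors the inner product then
    approximates the normalized intersection kernel.
    """
    h, w, c = feature_maps.shape
    thresholds = np.linspace(0.0, 1.0, n_thresholds, endpoint=False)
    maps = []
    for ch in range(c):
        channel = feature_maps[:, :, ch]
        for t in thresholds:
            # low thresholds give coarse global shape, high thresholds fine detail
            maps.append((channel > t).astype(np.float32).ravel())
    vec = np.concatenate(maps)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```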

REFERENCES

[1] X. Li, W. Hu, C. Shen, Z. Zhang, A. Dick, and A. van den Hengel, "A survey of appearance models in visual object tracking," ACM Transactions on Intelligent Systems and Technology, vol. 4, no. 4, p. 58, 2013.
[2] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in International Joint Conference on Artificial Intelligence, vol. 81, pp. 674–679, 1981.
[3] I. Matthews, T. Ishikawa, and S. Baker, "The template update problem," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 6, pp. 810–815, 2004.
[4] J. Henriques, R. Caseiro, P. Martins, and J. Batista, "Exploiting the circulant structure of tracking-by-detection with kernels," in Proceedings of European Conference on Computer Vision, pp. 702–715, 2012.
[5] J. Zhang, S. Ma, and S. Sclaroff, "MEEM: Robust tracking via multiple experts using entropy minimization," in Proceedings of European Conference on Computer Vision, pp. 188–203, 2014.
[6] M. J. Black and A. D. Jepson, "Eigentracking: Robust matching and tracking of articulated objects using a view-based representation," International Journal of Computer Vision, vol. 26, no. 1, pp. 63–84, 1998.
[7] D. Ross, J. Lim, R. Lin, and M.-H. Yang, "Incremental learning for robust visual tracking," International Journal of Computer Vision, vol. 77, no. 1, pp. 125–141, 2008.
[8] X. Mei and H. Ling, "Robust visual tracking and vehicle classification via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 11, pp. 2259–2272, 2011.
[9] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja, "Robust visual tracking via multi-task sparse learning," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2042–2049, 2012.
[10] D. Wang, H. Lu, and M.-H. Yang, "Online object tracking with sparse prototypes," IEEE Transactions on Image Processing, vol. 22, no. 1, pp. 314–325, 2013.
[11] A. Adam, E. Rivlin, and I. Shimshoni, "Robust fragments-based tracking using the integral histogram," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 798–805, 2006.
[12] S. He, Q. Yang, R. Lau, J. Wang, and M.-H. Yang, "Visual tracking via locality sensitive histograms," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2427–2434, 2013.
[13] B. Babenko, M.-H. Yang, and S. Belongie, "Robust object tracking with online multiple instance learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1619–1632, 2011.
[14] S. Hare, A. Saffari, and P. H. Torr, "Struck: Structured output tracking with kernels," in Proceedings of the IEEE International Conference on Computer Vision, pp. 263–270, 2011.
[15] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "High-speed tracking with kernelized correlation filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583–596, 2015.
[16] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886–893, 2005.
[17] Y. Wu, J. Lim, and M.-H. Yang, "Online object tracking: A benchmark," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2411–2418, 2013.
[18] J. Kwon and K. M. Lee, "Tracking of a non-rigid object via patch-based dynamic appearance modeling and adaptive basin hopping Monte Carlo sampling," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1208–1215, 2009.
[19] X. Jia, H. Lu, and M.-H. Yang, "Visual tracking via adaptive structural local sparse appearance model," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1822–1829, 2012.

Fig. 6: Success and precision plots of OPE, TRE, and SRE for BMR, BMR with only the LAB+HOG representations, and KCF (used as a baseline for comparison).

Fig. 7: Failure cases of the BMR-based tracker in the singer2 and motorRolling sequences. The results of HCF and MEEM are also illustrated.

[20] W. Zhong, H. Lu, and M.-H. Yang, "Robust object tracking via sparsity-based collaborative model," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1838–1845, 2012.
[21] Y. Li and J. Zhu, "A scale adaptive kernel correlation filter tracker with feature integration," in European Conference on Computer Vision Workshops, pp. 254–265, 2014.
[22] N. Wang, J. Shi, D.-Y. Yeung, and J. Jia, "Understanding and diagnosing visual tracking systems," in Proceedings of the IEEE International Conference on Computer Vision, pp. 3101–3109, 2015.
[23] C. Ma, J.-B. Huang, X. Yang, and M.-H. Yang, "Hierarchical convolutional features for visual tracking," in Proceedings of the IEEE International Conference on Computer Vision, pp. 3074–3082, 2015.
[24] R. Allen, P. Mcgeorge, D. Pearson, and A. B. Milne, "Attention and expertise in multiple target tracking," Applied Cognitive Psychology, vol. 18, no. 3, pp. 337–347, 2004.
[25] P. Cavanagh and G. A. Alvarez, "Tracking multiple targets with multifocal attention," Trends in Cognitive Sciences, vol. 9, no. 7, pp. 349–354, 2005.
[26] L. Chen, "Topological structure in visual perception," Science, vol. 218, p. 699, 1982.
[27] S. E. Palmer, Vision Science: Photons to Phenomenology, vol. 1. MIT Press, 1999.
[28] J. Zhang and S. Sclaroff, "Saliency detection: A Boolean map approach," in Proceedings of the IEEE International Conference on Computer Vision, pp. 153–160, 2013.
[29] B. Alexe, T. Deselaers, and V. Ferrari, "Measuring the objectness of image windows," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2189–2202, 2012.
[30] M.-M. Cheng, Z. Zhang, W.-Y. Lin, and P. Torr, "BING: Binarized normed gradients for objectness estimation at 300fps," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3286–3293, 2014.
[31] L. Huang and H. Pashler, "A Boolean map theory of visual attention," Psychological Review, vol. 114, no. 3, p. 599, 2007.
[32] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 11, pp. 1254–1259, 1998.
[33] M. S. Livingstone and D. H. Hubel, "Anatomy and physiology of a color system in the primate visual cortex," The Journal of Neuroscience, vol. 4, no. 1, pp. 309–356, 1984.
[34] K. Grauman and T. Darrell, "The pyramid match kernel: Efficient learning with sets of features," The Journal of Machine Learning Research, vol. 8, pp. 725–760, 2007.
[35] N. Wang and D.-Y. Yeung, "Learning a deep compact image representation for visual tracking," in Advances in Neural Information Processing Systems, pp. 809–817, 2013.
[36] M. Danelljan, G. Hager, F. Khan, and M. Felsberg, "Accurate scale estimation for robust visual tracking," in Proceedings of British Machine Vision Conference, 2014.
[37] J. Gao, H. Ling, W. Hu, and J. Xing, "Transfer learning based visual tracking with Gaussian processes regression," in Proceedings of European Conference on Computer Vision, pp. 188–203, 2014.
[38] Z. Kalal, J. Matas, and K. Mikolajczyk, "P-N learning: Bootstrapping binary classifiers by structural constraints," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 49–56, 2010.

Page 8: SUBMITTED 1 Visual Tracking via Boolean Map Representations

SUBMITTED 8

Fig 4 Screenshots sampled results from six long sequences sylvester mhyang dog1 lemming liquor and doll

to background when severe occlusions happen in the liquorsequence (508728775) To further demonstrate theresults over all frames clearly Figure 5 shows the plots interms of overlap score of each frame Overall the BMR-basedtracker performs well against the HCF and MEEM methodsin most frames of these sequences

D Analysis of BMR

To demonstrate the effectiveness of BMRs we eliminatethe component of Boolean maps in the proposed trackingalgorithm and only leverage the LAB+HOG representationsfor visual tracking In addition we use the KCF as a baselineas it adopts the HOG representations as the proposed trackingmethod Figure 6 shows quantitative comparisons on thebenchmark dataset Without using the proposed Boolean mapsthe AUC score of success rate in OPE of the proposed methodis reduced by 75 For TRE and SRE the AUC scores of theproposed method are reduced by 39 and 47 respectivelywithout the component of Boolean maps It is worth noticingthat the proposed method without using the Boolean maps

still outperforms KCF in terms of all metrics on success rateswhich shows the effectiveness of the LAB color features inBMR These experimental results show that the BMRs in theproposed method play a key role for robust visual tracking

E Failure Cases

Figure 7 shows failed results of the proposed BMR-basedmethod in two sequences singer2 and motorRolling In thesinger2 sequence the foreground object and background sceneare similar due to the dim stage lighting at the beginning(1025) The HCF MEEM and proposed methods all driftto the background Furthermore as the targets in the motor-Rolling sequence undergo from 360-degree in-plane rotationin early frames (35) the MEEM and proposed methods donot adapt to drastic appearance variations well due to limitedtraining samples In contrast only the HCF tracker performswell in this sequence because it leverages dense sampling andhigh-dimensional convolutional features

SUBMITTED 9

Fig 5 Overlap score plots of six long sequences shown in Figure 4

IV CONCLUSIONS

In this paper we propose a Boolean map based representa-tion which exploits the connectivity cues for visual trackingIn the BMR scheme the HOG and raw color feature mapsare decomposed into a set of Boolean maps by uniformlythresholding the respective channels These Boolean maps areconcatenated and normalized to form a robust representationwhich approximates an explicit feature map of the intersectionkernel A logistic regression classifier with the explicit featuremap is trained in an online manner that determines thenonlinear decision boundary for visual tracking Extensiveevaluations on a large tracking benchmark dataset demonstratethe proposed tracking algorithm performs favorably against thestate-of-the-art algorithms in terms of accuracy and robustness

REFERENCES

[1] X Li W Hu C Shen Z Zhang A Dick and A V D HengelldquoA survey of appearance models in visual object trackingrdquo ACMTransactions on Intelligent Systems and Technology vol 4 no 4 p 582013 1

[2] B D Lucas and T Kanade ldquoAn iterative image registration techniquewith an application to stereo visionrdquo in International Joint Conferenceon Artificial Intelligence vol 81 pp 674ndash679 1981 1

[3] I Matthews T Ishikawa and S Baker ldquoThe template update prob-lemrdquo IEEE Transactions on Pattern Analysis and Machine Intelligencevol 26 no 6 pp 810ndash815 2004 1

[4] J Henriques R Caseiro P Martins and J Batista ldquoExploiting thecirculant structure of tracking-by-detection with kernelsrdquo in Proceedingsof European Conference on Computer Vision pp 702ndash715 2012 1

[5] J Zhang S Ma and S Sclaroff ldquoMeem Robust tracking via multi-ple experts using entropy minimizationrdquo in Proceedings of EuropeanConference on Computer Vision pp 188ndash203 2014 1 2 5 6 7

[6] M J Black and A D Jepson ldquoEigentracking Robust matching andtracking of articulated objects using a view-based representationrdquo Inter-national Journal of Computer Vision vol 26 no 1 pp 63ndash84 19981

[7] D Ross J Lim R Lin and M-H Yang ldquoIncremental learningfor robust visual trackingrdquo International Journal of Computer Visionvol 77 no 1 pp 125ndash141 2008 1

[8] X Mei and H Ling ldquoRobust visual tracking and vehicle classificationvia sparse representationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 33 no 11 pp 2259ndash2272 2011 1

[9] T Zhang B Ghanem S Liu and N Ahuja ldquoRobust visual trackingvia multi-task sparse learningrdquo in Proceedings of IEEE Conference onComputer Vision and Pattern Recognition pp 2042ndash2049 2012 1

[10] D Wang H Lu and M-H Yang ldquoOnline object tracking with sparseprototypesrdquo IEEE Transactions on Image Processing vol 22 no 1pp 314ndash325 2013 1

[11] A Adam E Rivlin and I Shimshoni ldquoRobust fragments-based trackingusing the integral histogramrdquo in Proceedings of IEEE Conference onComputer Vision and Pattern Recognition pp 798ndash805 2006 1

[12] S He Q Yang R Lau J Wang and M-H Yang ldquoVisual tracking vialocality sensitive histogramsrdquo in Proceedings of IEEE Conference onComputer Vision and Pattern Recognition pp 2427ndash2434 2013 1

[13] B Babenko M-H Yang and S Belongie ldquoRobust object trackingwith online multiple instance learningrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 33 no 8 pp 1619ndash1632 20111

[14] S Hare A Saffari and P H Torr ldquoStruck Structured output trackingwith kernelsrdquo in Proceedings of the IEEE International Conference onComputer Vision pp 263ndash270 2011 1 6 7

[15] J F Henriques R Caseiro P Martins and J Batista ldquoHigh-speedtracking with kernelized correlation filtersrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 37 no 3 pp 583ndash5962015 1 5 6 7

[16] N Dalal and B Triggs ldquoHistograms of oriented gradients for humandetectionrdquo in Proceedings of IEEE Conference on Computer Vision andPattern Recognition vol 1 pp 886ndash893 2005 1

[17] Y Wu J Lim and M-H Yang ldquoOnline object tracking A benchmarkrdquoin Proceedings of IEEE Conference on Computer Vision and PatternRecognition pp 2411ndash2418 2013 1 2 5 6

[18] J Kwon and K M Lee ldquoTracking of a non-rigid object via patch-baseddynamic appearance modeling and adaptive basin hopping monte carlosamplingrdquo in Proceedings of IEEE Conference on Computer Vision andPattern Recognition pp 1208ndash1215 2009 1

[19] X Jia H Lu and M-H Yang ldquoVisual tracking via adaptive structurallocal sparse appearance modelrdquo in Proceedings of IEEE Conference onComputer Vision and Pattern Recognition pp 1822ndash1829 2012 1 67

SUBMITTED 10

Fig 6 Success and precision plots of OPE TRE and SRE for BMR BMR only with LAB+HOG representations and KCF (KCF is usedas a baseline for comparisons)

Fig 7 Failure cases of the BMR-based tracker in the singer2 and motorRolling sequences The results of HCF and MEEM are also illustrated

[20] W Zhong H Lu and M-H Yang ldquoRobust object tracking via sparsity-based collaborative modelrdquo in Proceedings of IEEE Conference onComputer Vision and Pattern Recognition pp 1838ndash1845 2012 16 7

[21] Y Li and J Zhu ldquoA scale adaptive kernel correlation filter trackerwith feature integrationrdquo in European Conference on Computer Vision-Workshops pp 254ndash265 2014 1

[22] N Wang J Shi D-Y Yeung and J Jia ldquoUnderstanding and diagnosingvisual tracking systemsrdquo in Proceedings of the IEEE InternationalConference on Computer Vision pp 3101ndash3109 2015 1 2 4

[23] C Ma J-B Huang X Yang and M-H Yang ldquoHierarchical con-volutional features for visual trackingrdquo in Proceedings of the IEEEInternational Conference on Computer Vision pp 3074ndash3082 20151 2 5 6 7

[24] R Allen P Mcgeorge D Pearson and A B Milne ldquoAttention andexpertise in multiple target trackingrdquo Applied Cognitive Psychologyvol 18 no 3 pp 337ndash347 2004 2

[25] P Cavanagh and G A Alvarez ldquoTracking multiple targets with multifo-cal attentionrdquo Trends in Cognitive Sciences vol 9 no 7 pp 349ndash3542005 2

[26] A Set and B Set ldquoTopological structure in visual perceptionrdquo Sciencevol 218 p 699 1982 2

[27] S E Palmer Vision Science Photons to Phenomenology vol 1 MIT

Press 1999 2[28] J Zhang and S Sclaroff ldquoSaliency detection A boolean map approachrdquo

in Proceedings of the IEEE International Conference on ComputerVision pp 153ndash160 2013 2 3

[29] B Alexe T Deselaers and V Ferrari ldquoMeasuring the objectness ofimage windowsrdquo IEEE Transactions Pattern Analysis and MachineIntelligence vol 34 no 11 pp 2189ndash2202 2012 2

[30] M-M Cheng Z Zhang W-Y Lin and P Torr ldquoBing Binarizednormed gradients for objectness estimation at 300fpsrdquo in Proceedingsof IEEE Conference on Computer Vision and Pattern Recognitionpp 3286ndash3293 2014 2

[31] L Huang and H Pashler ldquoA boolean map theory of visual attentionrdquoPsychological review vol 114 no 3 p 599 2007 2

[32] L Itti C Koch and E Niebur ldquoA model of saliency-based visual at-tention for rapid scene analysisrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence no 11 pp 1254ndash1259 1998 2 3

[33] M S Livingstone and D H Hubel ldquoAnatomy and physiology of a colorsystem in the primate visual cortexrdquo The Journal of Neuroscience vol 4no 1 pp 309ndash356 1984 2

[34] K Grauman and T Darrell ldquoThe pyramid match kernel Efficientlearning with sets of featuresrdquo The Journal of Machine LearningResearch vol 8 pp 725ndash760 2007 3

[35] N Wang and D-Y Yeung ldquoLearning a deep compact image representa-

SUBMITTED 11

tion for visual trackingrdquo in Advances in Neural Information ProcessingSystems pp 809ndash817 2013 5 6 7

[36] M Danelljan G Hager F Khan and M Felsberg ldquoAccurate scaleestimation for robust visual trackingrdquo in Proceedings of British MachineVision Conference 2014 5 6 7

[37] J Gao H Ling W Hu and J Xing ldquoTransfer learning based visualtracking with Gaussian processes regressionrdquo in Proceedings of Euro-pean Conference on Computer Vision pp 188ndash203 2014 5 6 7

[38] Z Kalal J Matas and K Mikolajczyk ldquoPn learning Bootstrappingbinary classifiers by structural constraintsrdquo in Proceedings of IEEEConference on Computer Vision and Pattern Recognition pp 49ndash562010 6 7

Page 9: SUBMITTED 1 Visual Tracking via Boolean Map Representations

SUBMITTED 9

Fig 5 Overlap score plots of six long sequences shown in Figure 4

IV CONCLUSIONS

In this paper we propose a Boolean map based representa-tion which exploits the connectivity cues for visual trackingIn the BMR scheme the HOG and raw color feature mapsare decomposed into a set of Boolean maps by uniformlythresholding the respective channels These Boolean maps areconcatenated and normalized to form a robust representationwhich approximates an explicit feature map of the intersectionkernel A logistic regression classifier with the explicit featuremap is trained in an online manner that determines thenonlinear decision boundary for visual tracking Extensiveevaluations on a large tracking benchmark dataset demonstratethe proposed tracking algorithm performs favorably against thestate-of-the-art algorithms in terms of accuracy and robustness

REFERENCES

[1] X Li W Hu C Shen Z Zhang A Dick and A V D HengelldquoA survey of appearance models in visual object trackingrdquo ACMTransactions on Intelligent Systems and Technology vol 4 no 4 p 582013 1

[2] B D Lucas and T Kanade ldquoAn iterative image registration techniquewith an application to stereo visionrdquo in International Joint Conferenceon Artificial Intelligence vol 81 pp 674ndash679 1981 1

[3] I Matthews T Ishikawa and S Baker ldquoThe template update prob-lemrdquo IEEE Transactions on Pattern Analysis and Machine Intelligencevol 26 no 6 pp 810ndash815 2004 1

[4] J Henriques R Caseiro P Martins and J Batista ldquoExploiting thecirculant structure of tracking-by-detection with kernelsrdquo in Proceedingsof European Conference on Computer Vision pp 702ndash715 2012 1

[5] J Zhang S Ma and S Sclaroff ldquoMeem Robust tracking via multi-ple experts using entropy minimizationrdquo in Proceedings of EuropeanConference on Computer Vision pp 188ndash203 2014 1 2 5 6 7

[6] M J Black and A D Jepson ldquoEigentracking Robust matching andtracking of articulated objects using a view-based representationrdquo Inter-national Journal of Computer Vision vol 26 no 1 pp 63ndash84 19981

[7] D Ross J Lim R Lin and M-H Yang ldquoIncremental learningfor robust visual trackingrdquo International Journal of Computer Visionvol 77 no 1 pp 125ndash141 2008 1

[8] X Mei and H Ling ldquoRobust visual tracking and vehicle classificationvia sparse representationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 33 no 11 pp 2259ndash2272 2011 1

[9] T Zhang B Ghanem S Liu and N Ahuja ldquoRobust visual trackingvia multi-task sparse learningrdquo in Proceedings of IEEE Conference onComputer Vision and Pattern Recognition pp 2042ndash2049 2012 1

[10] D Wang H Lu and M-H Yang ldquoOnline object tracking with sparseprototypesrdquo IEEE Transactions on Image Processing vol 22 no 1pp 314ndash325 2013 1

[11] A Adam E Rivlin and I Shimshoni ldquoRobust fragments-based trackingusing the integral histogramrdquo in Proceedings of IEEE Conference onComputer Vision and Pattern Recognition pp 798ndash805 2006 1

[12] S He Q Yang R Lau J Wang and M-H Yang ldquoVisual tracking vialocality sensitive histogramsrdquo in Proceedings of IEEE Conference onComputer Vision and Pattern Recognition pp 2427ndash2434 2013 1

[13] B Babenko M-H Yang and S Belongie ldquoRobust object trackingwith online multiple instance learningrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 33 no 8 pp 1619ndash1632 20111

[14] S Hare A Saffari and P H Torr ldquoStruck Structured output trackingwith kernelsrdquo in Proceedings of the IEEE International Conference onComputer Vision pp 263ndash270 2011 1 6 7

[15] J F Henriques R Caseiro P Martins and J Batista ldquoHigh-speedtracking with kernelized correlation filtersrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 37 no 3 pp 583ndash5962015 1 5 6 7

[16] N Dalal and B Triggs ldquoHistograms of oriented gradients for humandetectionrdquo in Proceedings of IEEE Conference on Computer Vision andPattern Recognition vol 1 pp 886ndash893 2005 1

[17] Y Wu J Lim and M-H Yang ldquoOnline object tracking A benchmarkrdquoin Proceedings of IEEE Conference on Computer Vision and PatternRecognition pp 2411ndash2418 2013 1 2 5 6

[18] J Kwon and K M Lee ldquoTracking of a non-rigid object via patch-baseddynamic appearance modeling and adaptive basin hopping monte carlosamplingrdquo in Proceedings of IEEE Conference on Computer Vision andPattern Recognition pp 1208ndash1215 2009 1

[19] X Jia H Lu and M-H Yang ldquoVisual tracking via adaptive structurallocal sparse appearance modelrdquo in Proceedings of IEEE Conference onComputer Vision and Pattern Recognition pp 1822ndash1829 2012 1 67

SUBMITTED 10

Fig 6 Success and precision plots of OPE TRE and SRE for BMR BMR only with LAB+HOG representations and KCF (KCF is usedas a baseline for comparisons)

Fig 7 Failure cases of the BMR-based tracker in the singer2 and motorRolling sequences The results of HCF and MEEM are also illustrated

[20] W Zhong H Lu and M-H Yang ldquoRobust object tracking via sparsity-based collaborative modelrdquo in Proceedings of IEEE Conference onComputer Vision and Pattern Recognition pp 1838ndash1845 2012 16 7

[21] Y Li and J Zhu ldquoA scale adaptive kernel correlation filter trackerwith feature integrationrdquo in European Conference on Computer Vision-Workshops pp 254ndash265 2014 1

[22] N Wang J Shi D-Y Yeung and J Jia ldquoUnderstanding and diagnosingvisual tracking systemsrdquo in Proceedings of the IEEE InternationalConference on Computer Vision pp 3101ndash3109 2015 1 2 4

[23] C Ma J-B Huang X Yang and M-H Yang ldquoHierarchical con-volutional features for visual trackingrdquo in Proceedings of the IEEEInternational Conference on Computer Vision pp 3074ndash3082 20151 2 5 6 7

[24] R Allen P Mcgeorge D Pearson and A B Milne ldquoAttention andexpertise in multiple target trackingrdquo Applied Cognitive Psychologyvol 18 no 3 pp 337ndash347 2004 2

[25] P Cavanagh and G A Alvarez ldquoTracking multiple targets with multifo-cal attentionrdquo Trends in Cognitive Sciences vol 9 no 7 pp 349ndash3542005 2

[26] L. Chen, "Topological structure in visual perception," Science, vol. 218, p. 699, 1982.

[27] S. E. Palmer, Vision Science: Photons to Phenomenology, vol. 1, MIT Press, 1999.

[28] J. Zhang and S. Sclaroff, "Saliency detection: A boolean map approach," in Proceedings of the IEEE International Conference on Computer Vision, pp. 153–160, 2013.

[29] B. Alexe, T. Deselaers, and V. Ferrari, "Measuring the objectness of image windows," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2189–2202, 2012.

[30] M.-M. Cheng, Z. Zhang, W.-Y. Lin, and P. Torr, "BING: Binarized normed gradients for objectness estimation at 300fps," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3286–3293, 2014.

[31] L. Huang and H. Pashler, "A boolean map theory of visual attention," Psychological Review, vol. 114, no. 3, p. 599, 2007.

[32] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254–1259, 1998.

[33] M. S. Livingstone and D. H. Hubel, "Anatomy and physiology of a color system in the primate visual cortex," The Journal of Neuroscience, vol. 4, no. 1, pp. 309–356, 1984.

[34] K. Grauman and T. Darrell, "The pyramid match kernel: Efficient learning with sets of features," The Journal of Machine Learning Research, vol. 8, pp. 725–760, 2007.

[35] N. Wang and D.-Y. Yeung, "Learning a deep compact image representation for visual tracking," in Advances in Neural Information Processing Systems, pp. 809–817, 2013.

[36] M. Danelljan, G. Hager, F. Khan, and M. Felsberg, "Accurate scale estimation for robust visual tracking," in Proceedings of British Machine Vision Conference, 2014.

[37] J. Gao, H. Ling, W. Hu, and J. Xing, "Transfer learning based visual tracking with Gaussian processes regression," in Proceedings of European Conference on Computer Vision, pp. 188–203, 2014.

[38] Z. Kalal, J. Matas, and K. Mikolajczyk, "P-N learning: Bootstrapping binary classifiers by structural constraints," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 49–56, 2010.
