Multi-target tracking with occlusions via skeleton points assignment


Post on 10-Sep-2016






Neurocomputing 83 (2012) 165–175

Huan Ding *, Wensheng Zhang

Institute of Automation, Chinese Academy of Sciences, No. 95 Zhongguancun East Road, Beijing 100190, China

* Corresponding author. Tel.: 86 010 82614489; fax: 86 010 62545229. E-mail addresses: (H. Ding), (W. Zhang).

Article info

Article history:
Received 19 May 2011
Received in revised form 6 December 2011
Accepted 10 December 2011
Communicated by Tao Mei
Available online 29 December 2011

Keywords:
Skeleton points assignment
Occlusion segmentation
Occlusion compensation
Multi-target tracking

Abstract

Multiple-target tracking in complex scenes is one of the most complicated problems in computer vision. Handling the occlusions between objects is the key issue in multiple-target tracking. This paper introduces the method of motion segmentation into the object tracking system, and presents a skeleton points assignment (SPA) based occlusion segmentation approach to track multiple people through complex situations captured by static monocular cameras. In the proposed method, we first select the skeleton points and evaluate their occlusion states using low-level information such as optical flow; then we assign these points to different objects using advanced semantic information, such as appearance, motion and color; finally, a dense classification of foreground pixels is used to accomplish occlusion segmentation, and a blob-based compensation strategy is utilized to estimate the missing information of occluded objects. Object tracking is handled by a particle filter-based tracking framework, in which a probabilistic appearance model is used to find the best particle. Experiments are conducted on the challenging public dataset PETS 2009. Results show that this approach can improve the performance of the existing tracking approach and handle dynamic occlusions better.

Crown Copyright © 2011 Published by Elsevier B.V. All rights reserved.

1. Introduction

Despite a lot of attention being dedicated to tracking multiple people in video sequences over the last 20 years, the problem remains very concerning in many computer vision and video processing tasks, such as video surveillance and event inference. In particular, the major challenge of multi-target tracking is the frequent presence of visual occlusions. Occlusions make the current observation totally or partially unavailable for some time intervals. The problem of dealing with occlusions correctly is still an open subject.

Solutions to managing occlusions can be decomposed into two groups, depending on whether a scene is captured by a single stationary camera or by multiple cameras. Multi-view methods fuse information from multiple views to localize people on multiple scene planes [1]. These methods work well, except for special geometries of the camera views and people locations. These special geometries provide insufficient information to generate a unique signal for tracking, due to visual occlusions. In these cases more specialized tracking methods are needed to maintain track identity. In addition, for some applications, multiple views are not always available. Thus, this paper focuses on designing a monocular tracking methodology, with the goal of handling relatively complex occlusion scenarios.

A considerable amount of research on the treatment of occlusions from a stationary monocular camera has been reported during the last decade. Most of these works consider occlusion segmentation as part of the object model, and embed it into the tracking process. They build object models using color [2], appearance [2–5] and motion [6–8] information. Unfortunately, these models are learned for describing the postures or actions of the tracking targets, and do not fit well with occlusion segmentation.

There are also many attempts to deal with the problem of occlusions explicitly. Hai et al. [9] decomposed video frames into coherent 2D motion layers and introduced a complete dynamic motion layer representation in which spatial and temporal constraints on shape, motion and appearance are estimated using the EM algorithm. Their method has been applied in an airborne vehicle tracking system, and examples of tracking vehicles in complex interactions are demonstrated. Qian et al. [10] proposed a framework for treating the general multiple target tracking problem, which was formulated in terms of finding the best spatial and temporal association of observations that could maximize the consistency of both motion and appearance of object trajectories. Papadourakis et al. [11] presented a robust object tracking algorithm which could automatically build appropriate object representations by color and handle spatially extended and temporally long object occlusions. The majority of the above methods rely on the simple assumption that object color satisfies either a single Gaussian or a Gaussian mixture distribution.



Meanwhile, these approaches do not take the historical information in the scene into account. As a result, all of the methods mentioned above perform poorly at occlusion segmentation and multi-target tracking when objects have similar colors.

Recently, some important issues in utilizing motion segmentation in tracking systems have been discussed. In the context of motion segmentation, the literature can be divided into two kinds: direct methods and feature-based methods [12]. Direct methods recover the unknown parameters directly from measurable image quantities at each pixel in the image. This is in contrast with feature-based methods, which first extract a sparse set of distinct features from each image separately, and then recover and analyze their correspondences in order to determine motion. Feature-based methods minimize an error measure that is based on distances between a few corresponding features, while direct methods minimize a global error measure that is based on direct image information collected from all pixels in the image. It is important to observe that with direct methods the pixel correspondence/classification is performed directly with the measurable image quantities at each pixel, while in feature-based methods this is done indirectly, based on independent feature measurements in a set of sparse pixels. An important property of the direct methods is that they can successfully estimate global motion even in the presence of multiple motions and/or outliers [13]. However, computational time is wasted by including in the minimization a large number of pixels where no flow can be reliably estimated. On the other hand, feature-based methods initially ignore areas of low information, resulting in a problem with fewer parameters to be estimated and good convergence even for long sequences. Hence, considering tracking efficiency, we focus only on the feature-based methods.
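The contrast between the two families can be made concrete with a toy example. The following is a minimal sketch, not the method of any cited work: it assumes a pure integer translation between two frames, and the function names `direct_translation` and `feature_translation` are hypothetical. The direct estimate scores every candidate shift with a sum of squared differences over all interior pixels, while the feature-based estimate uses only a few given point correspondences (for a pure translation, the least-squares solution is their mean displacement).

```python
import numpy as np


def direct_translation(prev, curr, max_shift=3):
    """Direct method (toy): pick the (dy, dx) shift that minimizes a
    global sum-of-squared-differences error over all interior pixels."""
    m = max_shift
    h, w = prev.shape
    core = prev[m:h - m, m:w - m].astype(float)  # region valid for every shift
    best, best_err = (0, 0), np.inf
    for dy in range(-m, m + 1):
        for dx in range(-m, m + 1):
            window = curr[m + dy:h - m + dy, m + dx:w - m + dx].astype(float)
            err = np.sum((window - core) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best


def feature_translation(pts_prev, pts_curr):
    """Feature-based method (toy): least-squares translation from a few
    correspondences, which is the mean displacement."""
    d = np.asarray(pts_curr, dtype=float) - np.asarray(pts_prev, dtype=float)
    return d.mean(axis=0)


# Synthetic frames: the second frame is the first shifted by (2, -1) pixels.
rng = np.random.default_rng(0)
prev = rng.random((20, 20))
curr = np.roll(prev, shift=(2, -1), axis=(0, 1))
print(direct_translation(prev, curr))

# Three hypothetical feature correspondences with the same displacement.
pts_prev = np.array([[5, 5], [10, 7], [12, 14]])
pts_curr = pts_prev + np.array([2, -1])
print(feature_translation(pts_prev, pts_curr))
```

The direct search touches every interior pixel for every candidate shift, which is the computational cost noted above; the feature-based estimate is cheap but depends entirely on the quality of the correspondences it is given.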

Feature-based methods for motion segmentation usually consist of two independent stages: (1) feature selection and/or correspondence and (2) motion parameter estimation [13]. The second stage is often performed through factorization methods [14], although a simpler clustering strategy can be used [15]. Several methods have been proposed for sparse feature selection and/or correspondence, and among them, the most popular are the Harris Corner Detector [16,17] and SIFT [18]. However, these sparse feature-based methods compute feature correspondences independently. Thus, they are very sensitive to outliers, making them susceptible to errors in motion parameter estimation/segmentation. Moreover, homogeneous regions of a frame may present few or no features, which makes motion estimation/segmentation difficult (or even impossible) in large areas of the video frames. In the object tracking field, Papadakis et al. [19] utilized a graph cuts approach and separated each object into visible and occluded parts using an energy function, which contains terms based on position and motion information. Silva et al. [12] obtained a pixel-wise segmentation by clustering a set of adaptively sampled points in the space and time domains. These methods still tend to emphasize motion segmentation, which focuses on the low-level information of the pixels. They rarely use the high-level semantic features associated with the tracking target. This leads to low tracking performance and a high computational cost.
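To illustrate the feature selection stage, and why homogeneous regions yield few features, here is a minimal NumPy sketch of a Harris-style corner response. It is an illustration under stated assumptions rather than the detector of [16,17] as published: a box window replaces the usual Gaussian weighting, `k = 0.05` is a commonly used but arbitrary constant, and the helper names `box_filter` and `harris_response` are hypothetical.

```python
import numpy as np


def box_filter(a, r=1):
    """Average over a (2r+1) x (2r+1) window, using edge padding."""
    p = np.pad(a, r, mode="edge")
    out = np.zeros_like(a, dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (2 * r + 1) ** 2


def harris_response(img, k=0.05, r=1):
    """Harris-style response R = det(M) - k * trace(M)^2, where M is the
    locally averaged structure tensor of the image gradients."""
    iy, ix = np.gradient(img.astype(float))  # derivatives along rows, columns
    sxx = box_filter(ix * ix, r)
    syy = box_filter(iy * iy, r)
    sxy = box_filter(ix * iy, r)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2


# Synthetic frame: a bright square on a dark, homogeneous background.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)

# Strongest response: expected at one of the square's corners.
peak = np.unravel_index(np.argmax(R), R.shape)
print(peak)
# Flat interior of the square: no gradient, so no feature.
print(R[10, 10])
```

In this toy frame the response is positive only near the four corners of the square, negative along its straight edges, and zero in the flat regions, so a detector that thresholds the response would return no points at all inside a uniformly colored object.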

In this paper, we present a novel approach for multi-target tracking where the scene is captured by a stationary monocular camera. Based on skeleton points assignment (SPA), our approach combines the advantages of feature-based motion segmentation methods and the probabilistic appearance-based particle filter tracking framework, as described next. Initially, a set of sparse points, which we call the skeleton points, is computed. Instead of computing point correspondences independently (as done in many feature-based methods), neighboring particles are treated as if they were linked, reducing the occurrence of outliers and

Compared with the existing methods, our contribution is stated as follows: firstly, we define the matching strategy and the state transition matrix of the skeleton point

