Multi-target tracking with occlusions via skeleton points assignment
Contents lists available at SciVerse ScienceDirect
journal homepage: www.els
Neurocomputing 83 (2012) 165–175

Huan Ding *, Wensheng Zhang
Institute of Automation, Chinese Academy of Sciences, No. 95 Zhongguancun East Road, Beijing 100190, China

* Corresponding author. Tel.: 86 010 82614489; fax: 86 010 62545229.
E-mail addresses: firstname.lastname@example.org (H. Ding), email@example.com (W. Zhang).
0925-2312/$ - see front matter

Article info
Received 19 May 2011
Received in revised form 6 December 2011
Accepted 10 December 2011
Communicated by Tao Mei
Available online 29 December 2011
Skeleton points assignment

Abstract
Multiple-target tracking in complex scenes is one of the most complicated problems in computer vision. Handling the occlusion between objects is the key issue in multiple-target tracking. This paper introduces the method of motion segmentation into the object tracking system, and presents a SPA (Skeleton Points Assign, SPA) based occlusion segmentation approach to track multiple people through complex situations captured by static monocular cameras. In the proposed method, we first select the skeleton points and evaluate their occlusion states by low-level information like optical flow; then we assign these points to different objects using advanced semantic information, such as appearance, motion and color; finally dense classification of foreground pixels are taken advantages of to accomplish occlusion segmentation and a blob-based compensation strategy is utilized to estimate the missing information of occluded objects. Object tracking is handled by a particle filter-based tracking framework, in which a probabilistic appearance model is used to find the best particle. Experiments are conducted on the public challenging dataset PETS 2009. Results show that this approach can improve the performance of the existing tracking approach and handle dynamic occlusions better.
Crown Copyright © 2011 Published by Elsevier B.V. All rights reserved.

1. Introduction

Despite a lot of attention being devoted to tracking multiple people in video sequences over the last 20 years, the problem remains very concerning in many computer vision and video processing tasks, such as video surveillance and event inference. In particular, the major challenge of multi-targets tracking is the frequent presence of visual occlusions. Occlusions make the current observation totally or partially unavailable for some time intervals. The problem of dealing with occlusions correctly is still an open subject.

Solutions to managing occlusions can be decomposed into two groups, depending on whether a scene is captured by a single stationary camera or by multiple cameras. Multi-view methods fuse information from multi-views to localize people on multiple scene planes. These methods work well, except for special geometries of the camera views and people locations. These special geometries provide insufficient information to generate a unique signal for tracking, due to visual occlusions. In these cases more specialized tracking methods need to maintain track identity. In addition, for some applications, multiple views are not always available. Thus, this paper focuses on designing a monocular tracking methodology, with the goal of handling relatively complex occlusion scenarios.

A considerable amount of research has reported on the treatment of occlusions from a stationary monocular camera during the last decade. Most of them consider occlusion segmentation as part of the object model, and embed it into tracking process. These works build object models using color, appearance and motion information. Unfortunately, these models are learned for describing the postures or actions of the tracking targets, and do not fit well with occlusion segmentation.

There are also many attempts to deal with the problem of occlusions explicitly. Hai et al. decomposed video frames into coherent 2D motion layers and introduced a complete dynamic motion layer representation in which spatial and temporal constraints on shape, motion and appearance are estimated using the EM algorithm. His method has been applied in an airborne vehicle tracking system and examples of tracking vehicles in complex interactions are demonstrated. Qian et al. proposed a framework for treating the general multiple target tracking problem which was formulated in terms of finding the best spatial and temporal association of observations that could maximize the consistency of both motion and appearance of object trajectories. Papadourakis et al. presented a robust object tracking algorithm which could automatically build appropriate object representations by color and handles spatially extended and temporally long object occlusions. The majority of the above methods are under a simple assumption that object color satisfies either a single Gaussian or a Gaussian mixture distribution. Meanwhile, these approaches do not take the historical information in the scene into account. Thus all the methods mentioned above lead to the poor performance of occlusion segmentation and multiple targets tracking for objects with similar color.

Compared with the existing methods, our contribution is
Recently, some important issues in utilizing motion segmentation in tracking systems have been discussed. In the context of motion segmentation, the literature can be divided into two kinds: direct methods and feature-based methods. Direct methods recover the unknown parameters directly from measurable image quantities at each pixel in the image. This is in contrast with the feature-based methods, which first extract a sparse set of distinct features from each image separately, and then recover and analyze their correspondences in order to determine motion. Feature-based methods minimize an error measure that is based on distances between a few corresponding features, while direct methods minimize a global error measure that is based on direct image information collected from all pixels in the image. It is important to observe that with direct methods the pixel correspondence/classification is performed directly with the measurable image quantities at each pixel, while in feature-based methods this is done indirectly, based on independent feature measurements in a set of sparse pixels. An important property of the direct methods is that they can successfully estimate global motion even in the presence of multiple motions and/or outliers. However, computational time is wasted by including in the minimization a large number of pixels where no flow can be reliably estimated. On the other hand, feature-based methods initially ignore areas of low information, resulting in a problem of fewer parameters to be estimated, with good convergence even for long sequences. Hence, considering the tracking efficiency, we only focus on the feature-based methods.
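The contrast between the two error measures can be illustrated with a minimal sketch for the simplest motion model, a single global translation. It is not taken from the paper: the function names, the synthetic test pattern and the translation-only model are assumptions made for illustration. The direct estimate uses the linearised brightness-constancy equation at every pixel, while the feature-based estimate uses only a handful of point correspondences, which are assumed to be already matched.

```python
import numpy as np

def direct_translation(frame1, frame2):
    """Direct method: one global (u, v) from intensity derivatives at all pixels."""
    f1 = frame1.astype(float)
    f2 = frame2.astype(float)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    iy, ix = np.gradient(0.5 * (f1 + f2))
    it = f2 - f1
    # Drop the border, where np.gradient falls back to one-sided differences.
    ix, iy, it = (a[1:-1, 1:-1].ravel() for a in (ix, iy, it))
    # Linearised brightness constancy: ix*u + iy*v + it = 0 at every pixel.
    a = np.stack([ix, iy], axis=1)
    (u, v), *_ = np.linalg.lstsq(a, -it, rcond=None)
    return u, v

def feature_translation(points1, points2):
    """Feature-based method: translation from a few matched (x, y) points."""
    p = np.asarray(points1, dtype=float)
    q = np.asarray(points2, dtype=float)
    # Minimising sum ||p_i + t - q_i||^2 over t gives the mean displacement.
    return tuple((q - p).mean(axis=0))

def synthetic_frame(shift_x=0.0, shift_y=0.0, size=64):
    """Hypothetical smooth test pattern, shifted by (shift_x, shift_y) pixels."""
    y, x = np.mgrid[0:size, 0:size].astype(float)
    x, y = x - shift_x, y - shift_y
    return np.sin(0.2 * x) + np.cos(0.15 * y) + 0.5 * np.sin(0.1 * x + 0.12 * y)

# Direct estimate: every interior pixel contributes one equation.
u, v = direct_translation(synthetic_frame(), synthetic_frame(0.3, -0.2))

# Feature-based estimate: two hand-made correspondences stand in for matched features.
t = feature_translation([[0, 0], [10, 5]], [[0.3, -0.2], [10.3, 4.8]])
```

The direct estimate solves a least-squares problem with one row per pixel, whereas the feature-based estimate involves only as many terms as there are correspondences, which reflects the efficiency argument made above.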
Feature-based methods for motion segmentation usually consist of two independent stages: (1) feature selection and/or correspondence and (2) motion parameter estimation. The second stage is often performed through factorization methods, although some simpler clustering strategy can be used. Several methods have been proposed for sparse feature selection and/or correspondence, and among them, the most popular are the Harris Corner Detector [16,17] and SIFT. However, these sparse feature-based methods compute feature correspondences independently. Thus, they are very sensitive to outliers, making them susceptible to errors in motion parameter estimation/segmentation. Moreover, homogeneous regions of a frame may present few or no features, which makes motion estimation/segmentation difficult (or even impossible) in large areas of the video frames. In the object tracking field, Papadakis et al. utilized a graph cuts approach and separated each object into visible and occluded parts using an energy function, which contains terms based on position and motion information. Silva et al. obtained a pixel-wise segmentation by clustering a set of adaptively sampled points in space and time domains. These methods still tend to emphasize the motion segmentation, which focuses on the low-level information of the pixels. They rarely use the high-level semantic features associated with the tracking target. This leads to low tracking performance and a high computational cost.
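The remark that homogeneous regions yield few or no features can be checked with a small sketch of the Harris corner response, written in plain NumPy. This is an illustrative approximation rather than the detector used in the cited works: the box window (instead of a Gaussian one), the constant k = 0.05 and the helper names are assumptions.

```python
import numpy as np

def box_sum(a, radius=1):
    """Sum of `a` over a (2*radius+1)^2 window, with zero padding at the border."""
    padded = np.pad(a, radius, mode="constant")
    out = np.zeros_like(a)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def harris_response(image, k=0.05, radius=1):
    """Harris response R = det(M) - k * trace(M)^2 of the local structure tensor M."""
    img = image.astype(float)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    iy, ix = np.gradient(img)
    sxx = box_sum(ix * ix, radius)
    syy = box_sum(iy * iy, radius)
    sxy = box_sum(ix * iy, radius)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace

# Hypothetical test frame: a bright square on a flat background.
frame = np.zeros((20, 20))
frame[5:15, 5:15] = 1.0
response = harris_response(frame)

# Strongest response and its location (row, column).
peak = np.unravel_index(np.argmax(response), response.shape)
```

On this frame the response is positive near the four corners of the square, negative along its straight edges, and exactly zero both in the flat background and in the flat interior of the square, so a detector that thresholds the response would return no feature points in those homogeneous areas.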
In this paper, we present a novel approach for multi-target tracking where the scene is captured by a stationary monocular camera. Based on skeleton points assignment (SPA), our approach combines the advantages of feature-based motion segmentation methods and the probabilistic appearance-based particle filter tracking framework, as described next. Initially, a set of sparse points is computed, which we call the skeleton points. Instead of computing point correspondences independently (as done in many feature-based methods), neighboring particles are treated as if they were linked, reducing the occurrence of outliers and
stated as follows: firstly, we define the matching strategy and the state transition matrix of the skeleton point