tracking people through partial occlusions

April 2009, 16(2): 117–121 www.sciencedirect.com/science/journal/10058885 www.buptjournal.cn/xben

The Journal of China Universities of Posts and Telecommunications

Tracking people through partial occlusions

LU Jian-guo, CAI An-ni

School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China

Abstract

This article presents a novel people-tracking approach to cope with partial occlusions caused by scene objects. Instead of predicting when and where the occlusions will occur, a part-based model is used to model the pixel distribution of the target body under occlusion. The patches into which the template image is subdivided are tracked independently using a Markov chain Monte Carlo (MCMC) method. A set of voting-based rules is then applied to the patch-tracking results to verify whether the target is indeed located at the estimated position. Experiments show the effectiveness of the proposed method.

Keywords partial occlusion, part-based model, MCMC, voting-based rules

1 Introduction

Visual surveillance has been widely used in intelligent monitoring, behavior analysis, and traffic control. Most recent research on human trajectory tracking has focused on interactions among multiple human bodies [1–3]. These studies establish models of the interaction among objects and solve the data association problem; occlusions are then inferred from the interaction model. However, in many cases, occlusions are caused by stationary scene objects, and it is difficult to use such generalized reasoning to characterize them. Some methods treat occlusion as a segmentation problem and use appearance models to cope with large occlusions [4–5], but they are difficult to apply in complicated environments. Recently, a few works have started to consider pixel assignment during occlusion.

In this article, occlusions are explained as the probability of invisible pixels on the surface area of a human body, and thus the problem is formulated as a subdivided multi-target tracking problem similar to that in Ref. [6]. The human’s articulated motion is not emphasized, and the body in upright pose is simply represented by a 2D spatial region. A part-based model is proposed to cope with the non-rigid body motion. Verification of the existence of a human body is treated as the joint effect of all patch areas, with occluded pixels discarded. This article is organized as follows: Sect. 2 describes the architecture of the proposed method. Sect. 3 details the part-based model and the tracking of multiple patches with the MCMC method. The voting-based rules for combining patches are described in Sect. 4. Experiment results and conclusions are presented in Sects. 5 and 6, respectively.

Received date: 11-04-2008 Corresponding author: LU Jian-guo, E-mail: [email protected] DOI: 10.1016/S1005-8885(08)60215-0

2 The proposed tracking scheme

The scheme of the proposed algorithm is shown in Fig. 1. First, the background within the field of view of the monitoring camera is modeled. Against this background, any person entering the area is segmented to generate the original object-body template. The extracted foreground is then subdivided into non-overlapping rectangular patches to form the part-based template model. This process may be regarded as system initialization.

Fig. 1 Diagram of the proposed method

Then, observation of the object body is turned into inspecting multiple patches on the body, which are marked uniquely and tracked independently. Finally, existence of the human body is verified by a voting strategy based on successfully tracked patches.
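The initialization step above can be illustrated with a minimal sketch that splits a foreground bounding box into non-overlapping rectangular patches and uses the list index as the patch ID. The function name `subdivide`, the `(x, y, width, height)` box layout, and the 3 × 2 grid are assumptions made for illustration; the article does not state the grid size.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height); layout is an assumption


def subdivide(box: Rect, rows: int, cols: int) -> List[Rect]:
    """Split a bounding box into rows x cols non-overlapping rectangles."""
    x, y, w, h = box
    # Integer edges computed proportionally so the patches tile the box exactly.
    xs = [x + (w * c) // cols for c in range(cols + 1)]
    ys = [y + (h * r) // rows for r in range(rows + 1)]
    patches = []
    for r in range(rows):
        for c in range(cols):
            patches.append((xs[c], ys[r], xs[c + 1] - xs[c], ys[r + 1] - ys[r]))
    return patches


# Hypothetical foreground box; the index in `patches` serves as the patch ID.
patches = subdivide((100, 50, 40, 90), rows=3, cols=2)
```

Because adjacent patches share edges computed from the same integer grid, no pixel is assigned to two patches and none is left out.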


3 Part-based tracking

In tracking a specific object in a given video sequence, suppose that a human body $O$ appears at time $t$. Usually, a template $T$ is given to represent $O$, and the sub-region of the image $I$ with the highest similarity to $T$ is considered as the presence of $O$. The tracking process is shown in Fig. 2. Suppose that the position of $O$ in the previous frame is $p_1: (x_0, y_0)$. A search will be carried out around the estimated position $p_2: (x_1, y_1)$ in the current frame. $p_1$ and $p_2$ are the coordinates of the object center at time $t-1$ and $t$, respectively. $P_T$ is a patch with a displacement $(d_x, d_y)$ from the template’s center, and $P_I$ is its corresponding patch in the current image frame.

Fig. 2 Tracking with template

3.1 Tracking model

Let the time index $t$ range from 1 to the end of the tracking time duration. Assume that there are $k$ patches in the proposed human-body appearance model, as shown in Fig. 3. Let $z_t$ denote the observation of all $P_I$, which correspond to $P_T$ in the template at time $t$. During the tracking, it is supposed that image features of the same target move together. Then, the tracking problem is defined as, given a set of observations $\mathbf{z}_t = \{z_1, z_2, \ldots, z_t\}$, inferring the positions of a varying number of patches $\{x_i\}$ and deducing the most probable ‘cluster’ of tracked patches with certain spatio-temporal constraints in each time step to validate the existence of the target object.

Fig. 3 Pre-partitioned model

In our algorithm, $x_i$ is defined as a rectangular image patch corresponding to the $i$th subarea in the pre-partitioned appearance model $T$, which is characterized by a statistical RGB histogram and assigned a unique identity (ID). The joint state of patches is composed of the continuous Markov random field (MRF) $\mathbf{x}_{k_t} = \{x_i\}_{i=1}^{k_t}$, with size $k_t$, that denotes the states of target patches, and a discrete process $\mathbf{c} = \{c_i\}$ for labeling the existence of the corresponding patches. Here, the object is located by maximizing the posterior probability of its elementary blobs $\mathbf{x}_{k_t,t}$ over all pixel sites. The posterior probability is conditioned on the measurements of patches in adjacent steps, and the collective state of patches in the current frame is expressed by Eq. (1):

$$\hat{X}_t = \arg\max \, p(\mathbf{x}_{k_t} \mid \mathbf{z}_t, \mathbf{c}, \mathbf{x}_{k_{t-1}})$$ (1)

3.2 Patch tracking

From the human-body appearance model, the patches of the target object are allowed to move independently. To maximize the likelihood of each patch, the tracking process for selected patches can be formulated as Eq. (2) following the Bayesian rule.

$$p(\mathbf{x}_{k_t,t} \mid \mathbf{z}_t) \propto p(z_t \mid \mathbf{x}_{k_t,t}) \int p(\mathbf{x}_{k_t,t}, k_t \mid \mathbf{x}_{k_{t-1},t-1}, k_{t-1}) \, p(\mathbf{x}_{k_{t-1},t-1} \mid \mathbf{z}_{t-1}) \, \mathrm{d}\mathbf{x}_{k_{t-1},t-1}$$ (2)

$$p(\mathbf{x}_{k_t,t}, k_t \mid \mathbf{x}_{k_{t-1},t-1}, k_{t-1}) = p(\mathbf{x}_{k_t,t} \mid k_t, \mathbf{x}_{k_{t-1},t-1}, k_{t-1}) \, p(k_t \mid \mathbf{x}_{k_{t-1},t-1}, k_{t-1})$$ (3)

Here, $p(z_t \mid \mathbf{x}_{k_t,t})$ denotes the measurement of the $k_t$ patches at time $t$, and $p(\mathbf{x}_{k_t,t}, k_t \mid \mathbf{x}_{k_{t-1},t-1}, k_{t-1})$ predicts the dynamics of patches and the probability of patches entering or leaving the patch set.

When implemented on a computer, Bayes filter-based algorithms are used to estimate a numeric description of the involved distributions, whose nature may be parametric (Kalman filtering) or nonparametric (particle filtering). In this article, particle filter (PF) methods are used because they are based on universal approximations while remaining compact in description and not subject to any constraints on the models [7]. The basic PF is a Monte Carlo (MC) method in which the target distribution is represented by a weighted set of samples $\{x_{i,t}^{(n)}, \pi_t^{(n)}\}_{n=1}^{N}$, where $\pi_t^{(n)}$ is the weight of the $n$th sample. The samples propagate with multiple hypotheses maintained at the same time and use a stochastic motion model to predict the object distribution. Assuming that all selected patches move independently and no interaction occurs, the posterior can be approximated as:

$$p(\mathbf{x}_{k_t,t} \mid \mathbf{z}_t) \approx p(z_t \mid \mathbf{x}_{k_t,t}) \sum_{n} \pi_{t-1}^{(n)} \prod_{i \in k_t} p(x_{i,t} \mid x_{i,t-1}^{(n)})$$ (4)

Here, the number $k_t$ remains constant within one time step. The condition for it to change will be discussed later.

However, the dimension of the multi-patch state-space grows linearly with the number of patches. Searching such a high-dimensional joint state-space, as described by Eq. (2), is not trivial for importance sampling techniques. Thus, a more efficient sampling scheme is necessary.

3.3 MCMC sampling

To avoid the dimension problem, the samples should be placed as close as possible to regions with high probability. In this section, an MCMC-based PF method similar to those in Refs. [3,8–9] is used. Given a data set $\mathbf{x}$, MCMC is a method of drawing new samples $\mathbf{x}^*$ from a proposal distribution $q(\mathbf{x}^* \mid \mathbf{x})$ using the Metropolis-Hastings (MH) sampler, and $\mathbf{x}^*$ will be accepted to construct a Markov chain under the acceptance ratio $\alpha$.

$$\alpha = \min\left(1, \frac{p(\mathbf{x}^*) \, q(\mathbf{x} \mid \mathbf{x}^*)}{p(\mathbf{x}) \, q(\mathbf{x}^* \mid \mathbf{x})}\right)$$ (5)

Here, $p(\mathbf{x})$ represents the patch distribution. Thus, the construction of the proposal density $q(\mathbf{x}^* \mid \mathbf{x})$ is the key problem. A proposal is used that selects a single target patch from all patches with equal probability, and its state $x_{i,t}$ is updated just by sampling from the proposal density of Eq. (6).

$$q(\mathbf{x}_t^* \mid \mathbf{x}_t) = \sum_{i} q(i) \, q(\mathbf{x}_t^* \mid \mathbf{x}_t, i)$$ (6)

$$q(\mathbf{x}_t^* \mid \mathbf{x}_t, i^*) = \begin{cases} \dfrac{1}{K} \sum_{n} p(x_{i,t}^* \mid x_{i,t-1}^{(n)}); & i = i^* \\ 0; & i \neq i^* \end{cases}$$ (7)

where K is the number of patches. From the above definition for MH sampler, candidates for each patch are set to a region of high likelihood at each time step. Once all particles are generated, the marginal mean estimation will be used to approximate the positions of patches.
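The sampling scheme of this section can be illustrated with a minimal Metropolis-Hastings sketch over the joint state of $K$ patch positions. It makes several simplifying assumptions that are not taken from the article: the patch distribution $p(\mathbf{x})$ is replaced by a hypothetical Gaussian score around known centers (`log_target`), and the single-patch proposal is a symmetric Gaussian random walk rather than the prediction mixture of Eq. (7), so the proposal terms in Eq. (5) cancel. All function names and parameter values are illustrative.

```python
import numpy as np


def log_target(state, centers, sigma=2.0):
    """Hypothetical stand-in for log p(x): independent Gaussian score per patch."""
    return -0.5 * np.sum((state - centers) ** 2) / sigma ** 2


def mh_track(init, centers, n_samples=4000, burn_in=500, step=1.5, seed=0):
    rng = np.random.default_rng(seed)
    state = np.asarray(init, dtype=float).copy()      # shape (K, 2)
    log_p = log_target(state, centers)
    chain = []
    for it in range(n_samples + burn_in):
        i = rng.integers(len(state))                  # pick one patch, q(i) = 1/K
        proposal = state.copy()
        proposal[i] += rng.normal(0.0, step, size=2)  # move only patch i
        log_p_new = log_target(proposal, centers)
        # Symmetric proposal: the acceptance ratio reduces to min(1, p(x*)/p(x)).
        if np.log(rng.random()) < log_p_new - log_p:
            state, log_p = proposal, log_p_new
        if it >= burn_in:
            chain.append(state.copy())
    return np.mean(chain, axis=0)                     # marginal mean per patch


# Three hypothetical patches; the chain starts 5 pixels away from each center.
centers = np.array([[10.0, 20.0], [30.0, 20.0], [10.0, 50.0]])
estimate = mh_track(init=centers + 5.0, centers=centers)
```

Updating one patch per iteration keeps each proposal low-dimensional, which is the motivation given above for preferring MCMC over importance sampling in the joint state-space.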

3.4 Variations of the number of patches

The above sampling method is used with a constant number of patches in every time step. While continuously running the tracker on a video sequence, some of the patches constructing the human model may enter or leave the monitoring scope. Although the MCMC-PF can cope with the trans-dimension problem using filter recursion [8], this would require a multi-object observation model that clearly defines the number of objects. However, it is hard for the multi-patch distribution in a single foreground body to fulfill such a condition. Hence, a detection-aided strategy is used to deal with the problem of changing number of patches.

Under the condition of leaving, it is assumed that the patches at time $t-1$ are transferred to the next time step, and some of them are allowed to be lost according to the rules given in Sect. 4. The lost patches may be predicted by using the neighborhood rule so that the human body is monitored continuously in subsequent frames.

On the other hand, if new blobs appear at some scene regions in a birth-like manner and do not overlap with the existing patches, the corresponding patch ID $j^*$ will be assigned to the new regions based on the neighborhood relationship with the remaining patches in the template, and the likelihood will be computed. The newly entered patches will also be validated following the rules presented in Sect. 4. The valid patches will be added to the current configuration as the prior condition for the next time step.

4 Voting-based rules

In the proposed framework, $c_i$ is evaluated according to the tracking results of patch $i$. Here, $c_i$ is treated as Boolean. If patch $i$ is successfully tracked, $c_i$ is set to 1, indicating the existence of patch $i$ in the current frame. Otherwise, $c_i$ is set to 0. The position of the human body is predicted according to the collection $\{c_i\}$. Instead of calculating the distribution of $\{c_i\}$, a more direct method based on a voting strategy is used to judge each patch’s contribution to the existence of the tracked object.

To explain the voting rule conveniently, some measurements are defined first.

1) Motion is represented by the displacement $(\Delta_x, \Delta_y)$ in adjacent frames and direction $(\theta_x, \theta_y)$ in the same Cartesian coordinate system.

2) The geometrical center of patch $k$ is represented by $C_k(x, y)$.

3) Distance $d$ is calculated as the Euclidean distance between the centers of two patches.

Let $O_t$ be the object body represented by a set of patches $\{P_t^i\}$ in frame $z_t$, and let $O_{t-1} \in z_{t-1}$ be depicted by a set of patches $\{P_{t-1}^i\}$. In the part-based model, the similarity of the patches’ spatial distribution is used to validate the hypothesis that the object body is indeed located at a certain position. A correspondence between the constellation $V(P_{t-1}^i)$ of patches in $O_{t-1}$ and $V(P_t^i)$, written as $M: V(P_{t-1}^i) \rightarrow V(P_t^i)$, will be accepted (or rejected) according to the following rules.

1) The patch $P_t^i$ in $O_t$ has three kinds of neighborhoods in position: diagonal, horizontal, and vertical, which can be expressed as:

$$M_d: (d_{i,j} \le T_H) \vee (d_{i,j} \le T_V) \vee (d_{i,j} \le T_D)$$ (8)

where $d_{i,j}$ is the Euclidean distance between the centers of patches $i$ and $j$, and $\{T_H, T_V, T_D\}$ are the thresholds corresponding to the three directions, respectively. The thresholds can be defined according to the size of patches.

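2) In the application, there is no need to track the human’s articulated motion. Thus, it is assumed that patches in the part-based model have similar motions, that is:

$$[(\Delta_{i,x} - \Delta_{j,x}) \le \varepsilon_H] \wedge [(\Delta_{i,y} - \Delta_{j,y}) \le \varepsilon_V], \quad (i \ne j)$$ (9)

where $\Delta_{i,x}$ and $\Delta_{i,y}$ are the displacement components of patch $i$ between adjacent frames, and $\varepsilon_H$ and $\varepsilon_V$ can be calculated as the means in the previous observation.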
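The two rules can be sketched as a per-patch vote. This is one possible reading rather than the authors' implementation: a patch votes positive if it has at least one template neighbor that is still within the distance threshold for its neighborhood kind and moves with a similar displacement. The absolute value on the displacement difference, the threshold values, the 2 × 2 layout, and the final "at least half of the patches" decision are all assumptions introduced for the example.

```python
import numpy as np


def vote(prev, curr, neighbors, dist_thresh, motion_thresh):
    """Return one Boolean vote per patch (the c_i flags).

    prev, curr    : (K, 2) patch centers at time t-1 and t
    neighbors     : (i, j, kind) triples, kind in {"H", "V", "D"}
    dist_thresh   : dict mapping kind -> distance threshold (rule 1)
    motion_thresh : (eps_x, eps_y) bounds on displacement differences (rule 2)
    """
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    disp = curr - prev                                   # per-patch displacement
    votes = np.zeros(len(curr), dtype=bool)
    for i, j, kind in neighbors:
        close = np.linalg.norm(curr[i] - curr[j]) <= dist_thresh[kind]
        similar = np.all(np.abs(disp[i] - disp[j]) <= motion_thresh)
        if close and similar:
            votes[i] = votes[j] = True
    return votes


# Hypothetical 2 x 2 template of 20-pixel patches; patch 3 drifts away.
prev = np.array([[10, 10], [30, 10], [10, 30], [30, 30]], dtype=float)
curr = prev + np.array([3.0, 0.0])
curr[3] = [60.0, 60.0]
neighbors = [(0, 1, "H"), (2, 3, "H"), (0, 2, "V"),
             (1, 3, "V"), (0, 3, "D"), (1, 2, "D")]
votes = vote(prev, curr, neighbors,
             dist_thresh={"H": 25.0, "V": 25.0, "D": 35.0},
             motion_thresh=(2.0, 2.0))
present = votes.mean() >= 0.5   # assumed decision rule, not from the article
```

In this example the drifting patch receives no support from any neighbor, while the other three patches confirm each other, so the object is still declared present.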

The above rules aim to keep spatial and temporal consistencies during tracking. In case of occlusions, the trackers will keep the same appearance model as in non-occlusion situations to deal with the case when ‘lost’ patches may reenter. The more patches respond positively, the stronger the indication of the existence of the object.

5 Experiment results

In this section, the tracking results and evaluations on two monocular video sequences are shown. The video sequences were collected from the Internet, with frame sizes of 720 × 576 and 352 × 288, respectively. The experiments are carried out under the assumption that humans walk with a nearly constant velocity. Cases such as sudden disappearance or changes in motion direction are not considered.

On initialization, the moving body area can be manually marked out or automatically extracted using the adaptive background subtraction method [10]. Target’s state was defined by the body’s position on the reference plane. The body’s appearance was represented by patches of the template, and each patch was modeled by the weighted color histogram [11]. The Bhattacharyya distance was used to measure the similarity between two patches.
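As an illustration of the patch similarity measure, the sketch below builds a normalized RGB histogram and compares two patches using the form $\sqrt{1 - \sum \sqrt{p\,q}}$, which is common in color-based tracking. The bin count, the optional per-pixel `weights` argument, and the choice of this particular distance variant are assumptions; the article does not specify them.

```python
import numpy as np


def color_histogram(patch, bins=8, weights=None):
    """RGB histogram of an (H, W, 3) uint8 patch, normalized to sum to 1."""
    pixels = patch.reshape(-1, 3).astype(int)
    idx = pixels // (256 // bins)                      # per-channel bin index
    flat = (idx[:, 0] * bins + idx[:, 1]) * bins + idx[:, 2]
    w = None if weights is None else np.asarray(weights, dtype=float).ravel()
    hist = np.bincount(flat, weights=w, minlength=bins ** 3).astype(float)
    return hist / hist.sum()


def bhattacharyya_distance(p, q):
    bc = np.sum(np.sqrt(p * q))                        # Bhattacharyya coefficient
    return np.sqrt(max(0.0, 1.0 - bc))


# Synthetic single-color patches, used only to exercise the functions.
red = np.zeros((4, 4, 3), dtype=np.uint8)
red[..., 0] = 255
blue = np.zeros((4, 4, 3), dtype=np.uint8)
blue[..., 2] = 255
d_same = bhattacharyya_distance(color_histogram(red), color_histogram(red))
d_diff = bhattacharyya_distance(color_histogram(red), color_histogram(blue))
```

Identical patches give a distance of 0 and patches with disjoint colors give 1, so a threshold on this value could serve as the test for whether a patch is successfully tracked.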

For status evolution, it is supposed that the sample set is propagated following the first order dynamic model as follows:

$$x_t = A x_{t-1} + w_t$$ (10)

where $A$ is a deterministic component, and $w_t$ is Gaussian noise.
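A minimal sketch of propagating a sample set with this first-order model is given below. The state layout `(x, y, vx, vy)`, the constant-velocity form of $A$ with a unit time step, and the noise level are assumptions chosen to match the nearly constant walking velocity stated earlier; the article does not give $A$ explicitly.

```python
import numpy as np


def propagate(particles, A, noise_std, rng):
    """Apply x_t = A x_{t-1} + w_t to every particle (one particle per row)."""
    noise = rng.normal(0.0, noise_std, size=particles.shape)
    return particles @ A.T + noise


# Assumed constant-velocity transition for the state (x, y, vx, vy).
A = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
rng = np.random.default_rng(0)
particles = np.tile([100.0, 50.0, 2.0, 0.0], (500, 1))   # hypothetical start state
predicted = propagate(particles, A, noise_std=0.5, rng=rng)
```

After one step the particle cloud is centered near the position advanced by the velocity, with the Gaussian term spreading the hypotheses around it.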

To evaluate the performance of the tracker quantitatively, the authors manually marked out the ground truth (GT) of the human body in original frames and recorded the predicted position error generated by the proposed algorithm.
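The per-frame position error can be computed as the Euclidean distance between the predicted and ground-truth body centers, which is one natural reading of the evaluation described above. The coordinates below are made up for illustration and are not results from the experiments.

```python
import numpy as np


def position_errors(predicted, ground_truth):
    """Euclidean distance between predicted and GT centers, one value per frame."""
    predicted = np.asarray(predicted, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    return np.linalg.norm(predicted - ground_truth, axis=1)


# Illustrative coordinates only.
gt = [(100, 50), (103, 50), (106, 51)]
pred = [(100, 50), (106, 54), (106, 51)]
errors = position_errors(pred, gt)
```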

Fig. 4 illustrates the result of tracking a person in a parking lot with the tracker. In this sequence, occlusions were caused by shoring and guardrail. The position error produced by the tracker is shown in Fig. 5 with solid line marked by ‘o’. To examine the effectiveness of this method, a traditional importance-sampling-based PF tracker is used to predict the position of the human body, and the position error is presented in Fig. 5 with solid line marked by ‘ ’.

Fig. 4 Tracking a person in a parking lot

Fig. 5 Tracking error in sequence 1

Fig. 6 shows the result of tracking a walking pedestrian by the proposed method. In this sequence, occlusions were caused by cars parked along the roadside. In this round of experiments, the authors also compared the tracking results of the method with the traditional PF method, as shown in Fig. 7. The two lines denote the same trackers as those in Fig. 5.

Fig. 6 Tracking a pedestrian

From the curves in Figs. 5 and 7, it can be seen that the traditional PF tracker drifts away significantly because of the background clutter and the distortion of body appearance caused by scene object occlusion. Under the same conditions, the proposed tracker performs much better, although poor predictions for some patches still affect the joint likelihood of all patches.

Fig. 7 Tracking error in sequence 2

6 Conclusions

This article addresses video monitoring in public areas, such as parking lots, roads, and indoor spaces. An MCMC-particle-filter-based framework is proposed for tracking people through partial occlusions caused by scene objects. Occlusion of the body may be regarded as a loss of original pixels. Hence, a part-based model is established, and each patch of the model is tracked independently. The patches existing in the current frame are combined to verify the existence of the tracked target. For comparison, a human body is modeled using a color histogram and tracked by the basic PF method. The experiment results indicate that the proposed tracker is more flexible under such situations.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (60772114).

References

1. Qu W, Schonfeld D, Mohamed M. Real-time interactively distributed multi-object tracking using a magnetic-inertia potential model. Proceedings of the 10th International Conference on Computer Vision (ICCV’05): Vol 1, Oct 17–21, 2005, Beijing, China. Piscataway, NJ, USA: IEEE, 2005: 535–540

2. Han M, Xu W, Tao H, et al. An algorithm for multiple object trajectory tracking. Proceedings of IEEE 2004 Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’04): Vol 1, Jun 27–Jul 2, 2004, Washington, DC, USA. Los Alamitos, CA, USA: IEEE Computer Society, 2004: 864–871

3. Khan Z, Balch T, Dellaert F. MCMC-based particle filtering for tracking a variable number of interacting targets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(11): 1805–1819

4. Cucchiara R, Grana C, Tardini G, et al. Probabilistic people tracking for occlusion handling. Proceedings of the 17th International Conference on Pattern Recognition (ICPR’04): Vol 1, Aug 23–26, 2004, Cambridge, UK. Piscataway, NJ, USA: IEEE, 2004: 132–135

5. Eng H L, Wang J X, Kam A H, et al. A Bayesian framework for robust human detection and occlusion handling using human shape model. Proceedings of the 17th International Conference on Pattern Recognition (ICPR’04): Vol 2, Aug 23–26, 2004, Cambridge, UK. Piscataway, NJ, USA: IEEE, 2004: 257–260

6. Zhao Q, Kang J M, Tao H, et al. Part based human tracking in a multiple cues fusion framework. Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06): Vol 1, Aug 20–24, 2006, Hong Kong, China. Piscataway, NJ, USA: IEEE, 2006: 450–455

7. Arulampalam M S, Maskell S, Gordon N, et al. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 2002, 50(2): 174–188

8. Smith K, Gatica-Perez D, Odobez J M. Using particles to track varying numbers of interacting people. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05): Vol 1, Jun 20–25, 2005, San Diego, CA, USA. Los Alamitos, CA, USA: IEEE Computer Society, 2005: 962–969

9. Yu Q, Medioni G, Cohen I. Multiple target tracking using spatio-temporal Markov chain Monte Carlo data association. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’07), Jun 17–22, 2007, Minneapolis, MN, USA. Los Alamitos, CA, USA: IEEE Computer Society, 2007: 1–8

10. Stauffer C, Grimson W E L. Learning patterns of activity using real-time tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8): 747–757

11. Satoh Y, Okatani T, Deguchi K. A color-based tracking by Kalman particle filter. Proceedings of the 17th International Conference on Pattern Recognition (ICPR’04): Vol 3, Aug 23–26, 2004, Cambridge, UK. Piscataway, NJ, USA: IEEE, 2004: 502–505

(Editor: ZHANG Ying)
