Single and Multiple Object Tracking Using Log-Euclidean Riemannian Subspace and Block-Division Appearance Model

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 12, November 2012

Weiming Hu, Xi Li, Wenhan Luo, Xiaoqin Zhang, Stephen Maybank, and Zhongfei Zhang

1

Outline

• Introduction

• Proposed Method

• Experimental Results

• Conclusion

2

Introduction

• Appearance Models

• Riemannian Metrics

• Proposed Work

3

Introduction

• Visual object tracking is one of the most fundamental tasks in applications of video motion processing, analysis, and data mining.

• Object appearance models can be based on color histograms (CH), kernel density estimates, Gaussian mixture models, conditional random fields, or learned subspaces.

4

Subspace-Based Appearance Model

• In this model, the matrices of pixel values in an image are rewritten as vectors, and global statistical information about the pixel values is obtained by principal component analysis (PCA).

5

Drawbacks

• General limitations of current subspace-based models:

• They do not directly use the local relations between object pixel values.

• It is difficult to update the appearance model during occlusions.

6

Riemannian Metrics

• A covariance matrix descriptor can capture the spatial correlations of the features extracted from an object region.

• Statistics for the matrices can be constructed using an appropriate Riemannian metric [26].

• Porikli et al. [24] proposed a Riemannian metric-based object tracking method.

• Tuzel et al. [25] proposed an algorithm for detecting people by classification on Riemannian manifolds.

7

Riemannian Metrics (cont.)

• Arsigny et al. [28] proposed the log-Euclidean Riemannian metric for statistics on the manifold of symmetric positive definite matrices.

8

[24] F. Porikli, O. Tuzel, and P. Meer, “Covariance Tracking Using Model Update Based on Lie Algebra,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 728-735, 2006.

[25] O. Tuzel, F. Porikli, and P. Meer, “Human Detection via Classification on Riemannian Manifolds,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 1-8, June 2007.

[26] P.T. Fletcher and S. Joshi, “Principal Geodesic Analysis on Symmetric Spaces: Statistics of Diffusion Tensors,” Proc. Computer Vision and Math. Methods in Medical and Biomedical Image Analysis, pp. 87-98, 2004.

[28] V. Arsigny, P. Fillard, X. Pennec, and N. Ayache, “Geometric Means in a Novel Vector Space Structure on Symmetric Positive-Definite Matrices,” SIAM J. Matrix Analysis and Applications, vol. 29, no. 1, pp. 328-347, Feb. 2007.

Proposed Work

• The proposed work is based on the authors' previous work [1], which uses the log-Euclidean Riemannian metric.

• The main components are a block-division appearance model, Bayesian state inference for single-object tracking, and multi-object tracking with occlusions.

9

[1] X. Li, W.M. Hu, Z.F. Zhang, X.Q. Zhang, M.L. Zhu, and J. Cheng, “Visual Tracking via Incremental Log-Euclidean Riemannian Subspace Learning,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 1-8, June 2008.

Proposed Work (cont.)

• In the incremental subspace learning algorithm, covariance matrices of image features are transformed into log-Euclidean Riemannian matrices.

• The object appearance region is divided into several blocks.

• The likelihood of each candidate block is computed, yielding a likelihood matrix.

• This matrix is then filtered locally and globally.

10

Proposed Method

• Incremental Log-Euclidean

Riemannian Subspace Learning

• Log-Euclidean Block-Division

Appearance Model

• Single/Multi-Object Tracking

11

Incremental Log-Euclidean Riemannian Subspace Learning

• First, the image covariance matrix descriptor is introduced.

• Riemannian geometry will be introduced as well.

• Then the proposed algorithm is described.

12

Covariance Matrix Descriptor

• Let $f_i$ be a $d$-dimensional feature vector of pixel $i$.

• $f_i$ is defined by $\left(x, y, (E_j)_{j=1,\ldots,\tau}\right)$.

13

Covariance Matrix Descriptor

• $L$ is the number of pixels in the region.

• The image region $R$ is represented by a $d \times d$ covariance matrix $C_R$.
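As a rough sketch of how such a descriptor could be computed, the Python/NumPy code below builds a covariance matrix from per-pixel feature vectors. The feature choice (position, intensity, and absolute gradients, so d = 5), the 1/(L-1) normalization, and the function name `covariance_descriptor` are assumptions made for illustration; the slides do not list the actual features.

```python
import numpy as np

def covariance_descriptor(region):
    """Return a d x d covariance matrix of per-pixel features for an (H, W) region."""
    region = region.astype(float)
    h, w = region.shape
    ys, xs = np.mgrid[0:h, 0:w]        # pixel coordinates
    gy, gx = np.gradient(region)       # derivatives along rows (y) and columns (x)
    # Assumed feature vector f_i = (x, y, intensity, |I_x|, |I_y|), so d = 5
    F = np.stack([xs, ys, region, np.abs(gx), np.abs(gy)], axis=-1).reshape(-1, 5)
    mu = F.mean(axis=0)                # mean feature vector over the region
    L = F.shape[0]                     # number of pixels in the region
    return (F - mu).T @ (F - mu) / (L - 1)
```

The result is symmetric and positive semidefinite; in practice a small multiple of the identity is often added so that the matrix logarithm is well defined.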

14

Riemannian Geometry for Symmetric Positive Definite Matrices

• Riemannian geometry provides a framework for calculating statistics of covariance matrices.

• The geometry depends on the Riemannian metric, which describes the distance relations between samples in the Riemannian space and determines how their mean is computed.

15


16

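• The exponential and the logarithm of a matrix are fundamental matrix operations.

• Given a symmetric positive definite matrix $A$, the SVD of $A$ is $A = U \Sigma U^T$, where $\Sigma$ is the diagonal matrix $\mathrm{Diag}(\lambda_1, \lambda_2, \ldots, \lambda_d)$.

• The logarithm of $A$ is then $\log(A) = U\,\mathrm{Diag}(\log\lambda_1, \log\lambda_2, \ldots, \log\lambda_d)\,U^T$.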
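A minimal sketch of the matrix logarithm and exponential for symmetric positive definite matrices, assuming the decomposition above is computed with NumPy's symmetric eigensolver (for such matrices it coincides with the SVD). The function names `spd_log` and `spd_exp` are illustrative only.

```python
import numpy as np

def spd_log(A):
    """Matrix logarithm of a symmetric positive definite matrix."""
    lam, U = np.linalg.eigh(A)          # A = U Diag(lam) U^T
    return (U * np.log(lam)) @ U.T      # U Diag(log lam) U^T

def spd_exp(S):
    """Matrix exponential of a symmetric matrix (inverse of spd_log)."""
    lam, U = np.linalg.eigh(S)
    return (U * np.exp(lam)) @ U.T      # U Diag(exp lam) U^T
```

Multiplying `U` by a vector scales its columns, which is equivalent to a product with the diagonal matrix but avoids building it explicitly.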

17


• Because the affine-invariant Riemannian metric has some limitations, the log-Euclidean Riemannian metric was proposed.

• In the Lie algebra, the mean $\mu$ is explicitly computed as the arithmetic mean of the matrix logarithms of the samples.
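The closed-form mean under the log-Euclidean metric can be sketched as follows: the matrix logarithms are averaged in the log domain, and the result can be mapped back to a symmetric positive definite matrix with the matrix exponential. The helper `_eig_apply` and the function name `log_euclidean_mean` are hypothetical names used only for this example.

```python
import numpy as np

def _eig_apply(S, fn):
    """Apply fn to the eigenvalues of a symmetric matrix S."""
    lam, U = np.linalg.eigh(S)
    return (U * fn(lam)) @ U.T

def log_euclidean_mean(mats):
    """Return (mean in the log domain, corresponding SPD matrix)."""
    log_mean = np.mean([_eig_apply(C, np.log) for C in mats], axis=0)
    return log_mean, _eig_apply(log_mean, np.exp)
```

For commuting matrices this reduces to the element-wise geometric mean of the eigenvalues, for example the mean of Diag(1, 4) and Diag(4, 1) is 2 times the identity.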

18


• A covariance matrix of the image features inside an object block is used to represent this object block.

• A sequence of $N$ images yields $N$ covariance matrices $C_t \in \mathbb{R}^{d \times d}$ $(t = 1, \ldots, N)$, which constitute a covariance matrix sequence $A \in \mathbb{R}^{d \times d \times N}$.

• By (4), $A$ is transformed into the log-Euclidean covariance matrix sequence $\alpha = (\log(C_1), \ldots, \log(C_N))$.

• $\log(C_t)$ is unfolded into a $d^2$-dimensional column vector $v_t$, so that $\alpha \rightarrow \Upsilon = (v_1 \ldots v_N)$.
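The mapping and unfolding steps can be sketched in Python/NumPy as below. The row-major flattening order and the function name `log_euclidean_unfold` are assumptions; any fixed ordering of the matrix entries would serve equally well as long as it is used consistently.

```python
import numpy as np

def log_euclidean_unfold(covs):
    """Map N SPD matrices of shape (N, d, d) to an unfolding matrix of shape (d*d, N)."""
    cols = []
    for C in covs:
        lam, U = np.linalg.eigh(C)               # C = U Diag(lam) U^T
        log_C = (U * np.log(lam)) @ U.T          # matrix logarithm of C
        cols.append(log_C.reshape(-1))           # unfold into a d^2-vector
    return np.stack(cols, axis=1)                # columns v_1 ... v_N
```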

19


• $\Upsilon$ is incrementally updated when new data arrive. Let $X$ be the matrix whose columns are obtained by subtracting the mean $\mu$ from each column vector of $\Upsilon$.

• The SVD of $X$ is then computed, $X = U D V^T$. The $k$ largest singular values in $D$ form $D_k$, and the corresponding columns of $U$ form $U_k$.

• The log-Euclidean Riemannian subspace is represented by $(\mu, U_k, D_k)$.
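A batch version of this subspace computation might look like the sketch below, where `Upsilon` is the unfolding matrix with one sample per column. The function name `learn_subspace` is illustrative; the singular values are returned as a vector rather than a diagonal matrix.

```python
import numpy as np

def learn_subspace(Upsilon, k):
    """Return (mu, U_k, D_k) for the columns of Upsilon."""
    mu = Upsilon.mean(axis=1, keepdims=True)      # column mean
    X = Upsilon - mu                              # centered data
    U, D, _ = np.linalg.svd(X, full_matrices=False)
    return mu, U[:, :k], D[:k]                    # k leading singular pairs
```

`np.linalg.svd` returns the singular values in descending order, so slicing the first `k` entries keeps the largest ones.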

20


• Example of the incremental updating process:

• 1. Let $(\mu_{t-1}, U^k_{t-1}, D^k_{t-1})$ be the previous log-Euclidean subspace at stage $t-1$.

• 2. At stage $t$, a new covariance matrix sequence $A^* \in \mathbb{R}^{d \times d \times N^*}$ is added.

• 3. $A^*$ is transformed into the log-Euclidean covariance matrix sequence $\alpha^*$.

21


• 4. Then $\alpha^*$ is transformed into a new log-Euclidean unfolding matrix $\Upsilon \in \mathbb{R}^{d^2 \times N^*}$.

• 5. Finally, the new subspace $(\mu_t, U^k_t, D^k_t)$ at stage $t$ is estimated.
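The slides do not give the update equations for step 5. One common way to merge a stored subspace with new columns is a sequential Karhunen-Loeve style update with a mean correction, sketched below under that assumption. The paper's actual procedure may differ, for instance by using a forgetting factor; the function name `update_subspace` and the explicit sample count `n_old` are illustrative.

```python
import numpy as np

def update_subspace(mu_old, U_old, D_old, n_old, Y_new, k):
    """Merge an old subspace (mean, basis, singular values) with new columns Y_new."""
    n_new = Y_new.shape[1]
    mu_new = Y_new.mean(axis=1, keepdims=True)
    n = n_old + n_new
    mu = (n_old * mu_old + n_new * mu_new) / n            # updated mean
    # Extra column that accounts for the shift between the old and new means
    shift = np.sqrt(n_old * n_new / n) * (mu_new - mu_old)
    # Old subspace scaled by its singular values, plus the centered new data
    stacked = np.hstack([U_old * D_old, Y_new - mu_new, shift])
    U, D, _ = np.linalg.svd(stacked, full_matrices=False)
    return mu, U[:, :k], D[:k], n
```

When no singular values were discarded in the old subspace, this reproduces the batch SVD of all data seen so far; with truncation it is an approximation.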

22

Likelihood Evaluation

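• Let $C_t \in \mathbb{R}^{d \times d}$ be a covariance matrix, and let $v_t$ be the column vector obtained by unfolding $\log(C_t)$.

• The distance between $v_t$ and the subspace $(\mu, U, D)$ is then measured by the reconstruction error of $v_t$ with respect to the subspace.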
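A common form of such a distance is the squared reconstruction error after projecting onto the subspace, with a likelihood that decays exponentially in that error. The sketch below assumes this form; it ignores the singular values D, and the function names are illustrative rather than taken from the paper.

```python
import numpy as np

def reconstruction_error(v, mu, U):
    """Squared norm of the part of (v - mu) that lies outside span(U)."""
    r = v - mu
    return float(np.sum((r - U @ (U.T @ r)) ** 2))

def block_likelihood(v, mu, U):
    """Assumed likelihood model: proportional to exp(-reconstruction error)."""
    return float(np.exp(-reconstruction_error(v, mu, U)))
```

A vector that lies in the subspace has zero error and therefore the maximum likelihood value of 1 under this assumed model.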

23

Log-Euclidean Block-Division Appearance Model

• The object region is divided into nonoverlapping blocks in order to incorporate more spatial information into the model.

• This section introduces block division and spatial filtering.

24

Appearance Block Division

• The object is divided into $m \times n$ blocks. For each block $(i, j)$, the covariance matrix feature $C^t_{ij} \in \mathbb{R}^{d \times d}$ is extracted using (1) and (2).

25


• Applying the log-Euclidean mapping (4) yields $\alpha_{ij}$, which is then unfolded.
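The block division itself can be sketched as a simple grid split. The code below assumes a 2-D region with at least m rows and n columns and lets the last blocks absorb any remainder; the function name `divide_into_blocks` is hypothetical. Each returned block would then be passed through the covariance descriptor and the log-Euclidean mapping.

```python
import numpy as np

def divide_into_blocks(region, m, n):
    """Split an (H, W) region into an m x n grid of nonoverlapping blocks."""
    rows = np.array_split(np.arange(region.shape[0]), m)   # row index groups
    cols = np.array_split(np.arange(region.shape[1]), n)   # column index groups
    return [[region[r[0]:r[-1] + 1, c[0]:c[-1] + 1] for c in cols] for r in rows]
```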

26


• After unfolding to obtain $\Upsilon_{ij}$, the subspace model $(\mu_{ij}, U_{ij}, D_{ij})$ is learned using the incremental log-Euclidean Riemannian subspace learning algorithm.

• $\mathbb{Z}_{ij}$, the Euclidean vector distance between the block $(i, j)$ and the learned log-Euclidean subspace model $(\mu_{ij}, U_{ij}, D_{ij})$, can be determined by (10).

29

Local and Global Spatial Filtering

• A likelihood matrix $M = (p_{ij})_{m \times n} \in \mathbb{R}^{m \times n}$ is obtained in the previous part.

• To remedy occasional inaccurate estimates, $M$ is filtered both locally and globally.

30

Observation Likelihood

• The overall likelihood is proportional to the product of all the block-specific likelihoods after local and global spatial filtering.
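Because a product of many small likelihoods can underflow, it is usually evaluated in the log domain. The sketch below assumes the filtered likelihood matrix is given as a NumPy array; the function name `overall_log_likelihood` and the floor value `eps` are illustrative choices.

```python
import numpy as np

def overall_log_likelihood(M, eps=1e-12):
    """Log of the product of all block likelihoods in the filtered matrix M."""
    M = np.asarray(M, dtype=float)
    return float(np.sum(np.log(np.maximum(M, eps))))   # floor avoids log(0)
```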

31

Single/Multi Object Tracking

• According to [1], a particle filter is used to approximate the distribution over the location of the object with a set of weighted samples, and is applied to estimate the optimal state.

32

Reference: M. Isard and A. Blake, “CONDENSATION: Conditional Density Propagation for Visual Tracking,” Int’l J. Computer Vision, vol. 29, no. 1, pp. 5-28, 1998.

Single/Multi Object Tracking

• The state $X_t$ in frame $t$ is described by the affine motion parameters $(x_t, y_t, \eta_t, s_t, \beta_t, \phi_t)$.

• In the tracking process, an observation model $p(O_t \mid X_t)$ (10) and a dynamic model $p(X_t \mid X_{t-1})$ are used to obtain the optimal state, where $O_t$ is the observation in frame $t$.
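One step of a basic particle filter over the six affine parameters can be sketched as follows. The Gaussian random-walk dynamic model, the per-parameter standard deviations `sigma`, and the choice of the highest-weight particle as the state estimate are assumptions for illustration; `likelihood_fn` stands in for the observation model.

```python
import numpy as np

def particle_filter_step(particles, weights, likelihood_fn, sigma, rng):
    """One resample, predict, update step. Returns particles, weights, best state."""
    n = len(particles)
    idx = rng.choice(n, size=n, p=weights)                  # resample by weight
    # Assumed dynamic model: Gaussian random walk on each affine parameter
    predicted = particles[idx] + rng.normal(scale=sigma, size=particles.shape)
    w = np.array([likelihood_fn(x) for x in predicted])     # observation model
    w = w / w.sum()                                         # normalize weights
    return predicted, w, predicted[np.argmax(w)]
```

Here `particles` has shape (n, 6), `weights` sums to 1, and `rng` is a `numpy.random.Generator`. In a tracker, `likelihood_fn` would crop the candidate region described by each state and evaluate its appearance likelihood.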

33

Occlusion Detection

• Occlusion is detected by observing the reconstruction error (10): when the object is occluded, the blocks corresponding to the occluded part have much larger values.
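The slides do not state the decision rule, so the following is only one plausible sketch: a block is flagged as occluded when its reconstruction error exceeds a multiple of the median error over all blocks. Both the median-based rule and the default `factor` are assumptions.

```python
import numpy as np

def occluded_blocks(errors, factor=3.0):
    """Flag blocks whose reconstruction error far exceeds the median block error."""
    errors = np.asarray(errors, dtype=float)
    return errors > factor * np.median(errors)   # boolean mask, same shape as errors
```

Blocks flagged in this way could be excluded from the appearance model update so that the occluder is not learned as part of the object.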

34

Experimental Results

35

Initial Setup

• The experiments covered 10 challenging videos, five taken by nonstationary cameras and five taken by stationary cameras.

• 1. Face tracking example: face detection

• 2. Stationary camera examples: background subtraction

• 3. Nonstationary camera examples: optical flow region analysis [19]

36

[19] X. Zhou, W.M. Hu, Y. Chen, and W. Hu, “Markov Random Field Modeled Level Sets Method for Object Tracking with Moving Cameras,” Proc. Asian Conf. Computer Vision, pp. 832-842, 2007.

Experiment Comparison

• The algorithm based on the affine-invariant Riemannian metric [24]

• The vector subspace-based algorithm [14]

• Jepson et al.’s algorithm [5]

• Yu and Wu’s algorithm [7]

• The multiple instance learning based algorithm [36]

37

[5] A.D. Jepson, D.J. Fleet, and T.F. El-Maraghi, “Robust Online Appearance Models for Visual Tracking,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 415-422, 2001.

[7] T. Yu and Y. Wu, “Differential Tracking Based on Spatial-Appearance Model (SAM),” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 720-727, June 2006.

[14] D.A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang, “Incremental Learning for Robust Visual Tracking,” Int’l J. Computer Vision, vol. 77, no. 2, pp. 125-141, May 2008.

[36] B. Babenko, M.-H. Yang, and S. Belongie, “Visual Tracking with Online Multiple Instance Learning,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 983-990, June 2009.

Example 1

• Available at http://www.cs.Toronto.edu/~dross/ivt

38

Example 1

39

Example 6

40

Example 10

• Database: PETS 2004

41

Conclusion

42

Conclusion

• Under the log-Euclidean Riemannian metric, image feature covariance matrices directly describe the spatial relations between pixel values.

• The block-division appearance model ensures that the tracking algorithm can adapt to large appearance changes.

• Compared with six tracking algorithms, the proposed method obtains more accurate tracking results under variations in illumination and pose, occlusions, etc.

43
