Single and Multiple Object Tracking Using Log-Euclidean Riemannian Subspace and Block-Division Appearance Model
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 34, NO. 12, NOVEMBER 2012
Weiming Hu, Xi Li, Wenhan Luo, Xiaoqin Zhang,
Stephen Maybank, and Zhongfei Zhang
1
Outline
• Introduction
• Proposed Method
• Experimental Result
• Conclusion
2
Introduction
• Appearance Models
• Riemannian Metrics
• Proposed Work
3
Introduction
• Visual object tracking is one of the most fundamental tasks in
applications of video motion processing, analysis, and data
mining.
• Object appearance models can be based on color histograms (CH), kernel density
estimates, Gaussian mixture models, conditional random fields,
or learned subspaces.
4
Subspace-Based Appearance Model
• In this model, the matrix of pixel values in each image is
rewritten as a vector, and global statistical information about
the pixel values is obtained by principal component analysis (PCA).
5
Drawbacks
• General limitations of current subspace-based models:
• They do not directly use the local relations between object pixel values.
• It is difficult to update the appearance model during occlusions.
6
Riemannian Metrics
• A covariance matrix descriptor can capture the spatial
correlations of the features extracted from an object region.
• Statistics for the matrices can be constructed using an
appropriate Riemannian metric [26].
• Porikli et al. [24] propose a Riemannian metric-based object
tracking method.
• Tuzel et al. [25] propose an algorithm for detecting people by
classification on Riemannian manifolds.
7
Riemannian Metrics (cont.)
• Arsigny et al. [28] propose the log-Euclidean Riemannian
metric for statistics on the manifold of symmetric positive
definite matrices.
8
[24] F. Porikli, O. Tuzel, and P. Meer, “Covariance Tracking Using Model Update Based on Lie Algebra,” Proc. IEEE Conf. Computer
Vision and Pattern Recognition, vol. 1, pp. 728-735, 2006.
[25] O. Tuzel, F. Porikli, and P. Meer, “Human Detection via Classification on Riemannian Manifolds,” Proc. IEEE Conf. Computer Vision
and Pattern Recognition, pp. 1-8, June 2007.
[26] P.T. Fletcher and S. Joshi, “Principal Geodesic Analysis on Symmetric Spaces: Statistics of Diffusion Tensors,” Proc. Computer
Vision and Math. Methods in Medical and Biomedical Image Analysis, pp. 87-98, 2004.
[28] V. Arsigny, P. Fillard, X. Pennec, and N. Ayache, “Geometric Means in a Novel Vector Space Structure on Symmetric Positive-Definite
Matrices,” SIAM J. Matrix Analysis and Applications, vol. 29, no. 1, pp. 328-347, Feb. 2007.
Proposed Work
• The proposed work builds on the authors' previous work [1], which
uses the log-Euclidean Riemannian metric.
• The main components are a block-division appearance model,
Bayesian state inference for single object tracking, and
multi-object tracking with occlusion.
9
[1] X. Li, W.M. Hu, Z.F. Zhang, X.Q. Zhang, M.L. Zhu, and J. Cheng, “Visual Tracking via Incremental Log-Euclidean Riemannian
Subspace Learning,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 1-8, June 2008.
Proposed Work (cont.)
• In their incremental subspace learning algorithm, covariance
matrices of image features are transformed into log-Euclidean
Riemannian matrices.
• The object appearance region is divided into several blocks.
• The likelihood of each candidate block is computed, which gives a
likelihood matrix over the blocks.
• The likelihood matrix is then filtered locally and globally.
10
Proposed Method
• Incremental Log-Euclidean
Riemannian Subspace Learning
• Log-Euclidean Block-Division
Appearance Model
• Single/Multi-Object Tracking
11
Incremental Log-Euclidean Riemannian Subspace Learning
• First, the image covariance matrix descriptor is introduced.
• Riemannian geometry will be introduced as well.
• Then the proposed algorithm is described.
12
Covariance Matrix Descriptor
• Let $f_i$ be a $d$-dimensional feature vector of pixel $i$.
• $f_i$ is defined by $f_i = (x, y, (E_j)_{j=1,\ldots,\tau})$.
13
Covariance Matrix Descriptor
• The image region $R$ is represented by the $d \times d$ covariance
matrix $C_R$ of its feature vectors.
• $L$ is the number of pixels in the region.
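As a rough illustration of the descriptor, the sketch below builds per-pixel feature vectors for a grayscale patch and computes their covariance. The particular features (intensity and absolute gradients), the 1/(L-1) normalization, and the function names are assumptions made for this example, not details taken from the slides.

```python
import numpy as np

def region_features(gray):
    """Hypothetical feature map: f_i = (x, y, intensity, |I_x|, |I_y|)."""
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    gy, gx = np.gradient(gray.astype(float))  # derivatives along rows, columns
    feats = np.stack([xs, ys, gray, np.abs(gx), np.abs(gy)], axis=-1)
    return feats.reshape(-1, 5)               # shape (L, d) with d = 5

def covariance_descriptor(features):
    """d x d covariance matrix of an (L, d) array of feature vectors."""
    features = np.asarray(features, dtype=float)
    L = features.shape[0]
    centered = features - features.mean(axis=0)
    # Unbiased estimate; the 1/(L-1) factor is an assumption of this sketch.
    return centered.T @ centered / (L - 1)

rng = np.random.default_rng(0)
patch = rng.random((16, 12))                  # stand-in for an image region R
feats = region_features(patch)
C_R = covariance_descriptor(feats)
```

The descriptor size depends only on the number of features d, not on the number of pixels L, which is why regions of different sizes can be compared.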
14
Riemannian Geometry for Symmetric Positive Definite Matrices
• Riemannian geometry can be used to calculate statistics
of covariance matrices.
• Riemannian geometry depends on the Riemannian metric,
which describes the distance relations between samples in the
Riemannian space and determines how their mean is computed.
15
Riemannian Geometry for Symmetric Positive Definite Matrices
• The exponential and the logarithm of matrices are fundamental
matrix operations.
• Given a symmetric positive definite matrix $A$, the SVD of $A$ is
$A = U \Sigma U^T$, where $\Sigma$ is the diagonal matrix $\mathrm{Diag}(\lambda_1, \lambda_2, \ldots, \lambda_d)$.
16
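For a symmetric positive definite matrix the SVD coincides with the eigendecomposition, so the matrix logarithm and exponential can be obtained by applying log or exp to the eigenvalues. A minimal NumPy sketch of this idea follows; the function names are illustrative.

```python
import numpy as np

def spd_log(A):
    """Matrix logarithm of a symmetric positive definite matrix."""
    lam, U = np.linalg.eigh(A)           # A = U diag(lam) U^T, lam > 0
    return (U * np.log(lam)) @ U.T       # U diag(log lam) U^T

def sym_exp(S):
    """Matrix exponential of a symmetric matrix."""
    lam, U = np.linalg.eigh(S)
    return (U * np.exp(lam)) @ U.T       # U diag(exp lam) U^T

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
log_A = spd_log(A)
A_back = sym_exp(log_A)                  # should recover A up to rounding
```

The logarithm of an SPD matrix is symmetric but not necessarily positive definite, which is what lets ordinary vector-space operations be applied in the log domain.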
Riemannian Geometry for Symmetric Positive Definite Matrices
17
Riemannian Geometry for Symmetric Positive Definite Matrices
• Because the affine-invariant Riemannian metric has some
limitations, the log-Euclidean Riemannian metric was
proposed.
• In the Lie algebra, the mean $\mu$ can be computed explicitly.
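Under the log-Euclidean metric the mean in the log domain is an ordinary arithmetic mean of matrix logarithms, $\mu = \frac{1}{N}\sum_{i=1}^{N}\log(X_i)$, where the sample symbols $X_1, \ldots, X_N$ are introduced here for illustration. The sketch below computes this mean and, as an optional extra step, maps it back to an SPD matrix with the matrix exponential.

```python
import numpy as np

def spd_log(A):
    lam, U = np.linalg.eigh(A)
    return (U * np.log(lam)) @ U.T

def sym_exp(S):
    lam, U = np.linalg.eigh(S)
    return (U * np.exp(lam)) @ U.T

def log_euclidean_mean(mats):
    """Return (mean in the log domain, the same mean mapped back to SPD)."""
    logs = [spd_log(X) for X in mats]
    mu = np.mean(logs, axis=0)           # arithmetic mean of the logarithms
    return mu, sym_exp(mu)

# Two commuting SPD matrices: their log-Euclidean mean is the
# element-wise geometric mean of the diagonals, here 2 * I.
X1 = np.diag([1.0, 4.0])
X2 = np.diag([4.0, 1.0])
mu, mean_spd = log_euclidean_mean([X1, X2])
```

No iterative optimization is needed, in contrast to means defined under the affine-invariant metric.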
18
Incremental Log-Euclidean Riemannian Subspace Learning
• A covariance matrix of the image features inside an object
block is used to represent this object block.
• A sequence of $N$ images yields $N$ covariance matrices $C_t \in R^{d \times d}$,
which constitute a covariance matrix sequence $A \in R^{d \times d \times N}$.
• By (4), $A$ is transformed into the log-Euclidean covariance matrix
sequence $\alpha = (\log(C_1), \ldots, \log(C_N))$.
• Each $\log(C_t)$ is unfolded into a $d^2$-dimensional column vector $v_t$, so that
$\alpha \rightarrow \Upsilon = (v_1 \; \ldots \; v_N)$.
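The mapping and unfolding steps can be sketched as follows. The random SPD matrices stand in for real covariance descriptors, and the row-major unfolding order is an assumption; any fixed order works as long as it is used consistently.

```python
import numpy as np

def spd_log(C):
    lam, U = np.linalg.eigh(C)
    return (U * np.log(lam)) @ U.T

def unfold_sequence(cov_seq):
    """Map each C_t to log(C_t) and stack the unfolded vectors as columns."""
    cols = [spd_log(C).reshape(-1) for C in cov_seq]   # each of length d*d
    return np.stack(cols, axis=1)                      # shape (d*d, N)

rng = np.random.default_rng(0)
d, N = 3, 5
cov_seq = []
for _ in range(N):
    B = rng.normal(size=(d, d))
    cov_seq.append(B @ B.T + d * np.eye(d))            # synthetic SPD matrix
Y = unfold_sequence(cov_seq)                           # plays the role of Upsilon
```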
19
Incremental Log-Euclidean Riemannian Subspace Learning
• $\Upsilon$ is incrementally updated when new data arrive. Let $X$ be the matrix
whose columns are obtained by subtracting the mean $\mu$ from each
column vector in $\Upsilon$.
• The SVD of $X$ is then computed, $X = U D V^T$. The $k$ largest singular
values in $D$ form $D_k$, and the corresponding columns of $U$ form $U_k$.
• The log-Euclidean Riemannian subspace is represented by
$(\mu, U_k, D_k)$.
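A batch version of this subspace construction might look like the sketch below. The data matrix is random and only stands in for an unfolding matrix; the function name and the choice of k are illustrative.

```python
import numpy as np

def learn_subspace(Y, k):
    """Return (mu, U_k, s_k) for a data matrix Y whose columns are samples."""
    mu = Y.mean(axis=1, keepdims=True)        # column mean, shape (D, 1)
    X = Y - mu                                # centered data
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return mu, U[:, :k], s[:k]                # k leading directions and values

rng = np.random.default_rng(0)
Y = rng.normal(size=(9, 20))                  # stand-in for Upsilon (d^2 = 9, N = 20)
mu, U_k, s_k = learn_subspace(Y, k=4)
```

`numpy.linalg.svd` returns singular values in descending order, so slicing the first k columns selects the dominant directions.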
20
Incremental Log-Euclidean Riemannian Subspace Learning
• Example of the incremental updating process:
• 1. Let $(\mu_{t-1}, U^k_{t-1}, D^k_{t-1})$ be the previous log-Euclidean subspace
at stage $t-1$.
• 2. At stage $t$, a covariance matrix sequence $A^* \in R^{d \times d \times N^*}$ is
added.
• 3. $A^*$ is transformed into the log-Euclidean covariance
matrix sequence $\alpha^*$.
21
Incremental Log-Euclidean Riemannian Subspace Learning
• 4. Then $\alpha^*$ is transformed into a new log-Euclidean unfolding
matrix $\Upsilon \in R^{d^2 \times N^*}$.
• 5. Finally, the new subspace $(\mu_t, U^k_t, D^k_t)$ at stage $t$ is estimated.
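One standard way to carry out step 5 without revisiting old data is a mean-corrected SVD update in the spirit of the incremental learning in [14]. The sketch below is an assumption about how such an update can be written, not necessarily the authors' exact procedure: it weights old and new samples equally, and it computes a plain SVD of the augmented matrix instead of a more economical QR-based variant.

```python
import numpy as np

def learn_subspace(Y, k):
    mu = Y.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(Y - mu, full_matrices=False)
    return mu, U[:, :k], s[:k]

def update_subspace(mu, U, s, n, Y_new, k):
    """Merge an old subspace (mu, U, s) built from n samples with new columns."""
    m = Y_new.shape[1]
    mu_new = Y_new.mean(axis=1, keepdims=True)
    mu_t = (n * mu + m * mu_new) / (n + m)            # updated mean
    # Extra column that accounts for the shift between old and new means.
    shift = np.sqrt(n * m / (n + m)) * (mu_new - mu)
    # Old subspace scaled by its singular values, centered new data, mean shift.
    M = np.hstack([U * s, Y_new - mu_new, shift])
    U_t, s_t, _ = np.linalg.svd(M, full_matrices=False)
    return mu_t, U_t[:, :k], s_t[:k], n + m

rng = np.random.default_rng(1)
Y_old = rng.normal(size=(9, 20))                      # stand-in for earlier data
Y_new = rng.normal(size=(9, 10))                      # stand-in for the new unfolding matrix
k = 9
mu0, U0, s0 = learn_subspace(Y_old, k)
mu_t, U_t, s_t, n_t = update_subspace(mu0, U0, s0, Y_old.shape[1], Y_new, k)
```

When no directions are truncated, as in this example, the updated singular values agree with those of a batch SVD over all samples; with a smaller k the update becomes an approximation.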
22
Likelihood Evaluation
• Let $C_t \in R^{d \times d}$ be a covariance matrix, and let $v_t$ be the column
vector obtained by unfolding $\log(C_t)$.
• The distance between $v_t$ and the subspace $(\mu, U, D)$ is then measured by the reconstruction error (10).
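A common form of such a distance is the squared reconstruction error of the centered vector after projection onto the subspace basis, with a likelihood that decays exponentially in that error. Both the exact form and the exp(-error) likelihood are assumptions in this sketch, since the formula itself is not reproduced in the transcript.

```python
import numpy as np

def reconstruction_error(v, mu, U):
    """Squared distance from v to the affine subspace mu + span(U)."""
    r = v - mu                                # center the sample
    residual = r - U @ (U.T @ r)              # part not explained by the basis
    return float(np.sum(residual ** 2))

def likelihood(v, mu, U):
    """Assumed likelihood model: exp(-reconstruction error)."""
    return float(np.exp(-reconstruction_error(v, mu, U)))

mu = np.array([1.0, 1.0, 1.0])
U = np.eye(3)[:, :2]                          # basis spanning the first two axes
v_in = mu + np.array([1.0, 2.0, 0.0])         # lies inside the subspace
v_out = mu + np.array([0.0, 0.0, 2.0])        # offset by 2 orthogonally
err_in = reconstruction_error(v_in, mu, U)
err_out = reconstruction_error(v_out, mu, U)
```

Here `U` must have orthonormal columns, as the left singular vectors from an SVD do.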
23
Log-Euclidean Block-Division Appearance Model
• The object region is divided into non-overlapping blocks in order to
incorporate more spatial information into the model.
• This section introduces block division and spatial
filtering.
24
Appearance Block Division
• The object is divided into $m \times n$ blocks. For each block $(i, j)$, the
covariance matrix feature $C^t_{ij} \in R^{d \times d}$ is extracted using (1), (2).
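The block division itself can be sketched as an even split of the object region into a grid. The rounding of block boundaries and the helper name are assumptions of this example; each returned block would then be passed to the covariance descriptor.

```python
import numpy as np

def split_into_blocks(region, m, n):
    """Split a 2-D region into an m x n grid of non-overlapping blocks."""
    H, W = region.shape[:2]
    row_edges = np.linspace(0, H, m + 1).astype(int)
    col_edges = np.linspace(0, W, n + 1).astype(int)
    return [[region[row_edges[i]:row_edges[i + 1],
                    col_edges[j]:col_edges[j + 1]]
             for j in range(n)]
            for i in range(m)]

region = np.arange(12 * 9, dtype=float).reshape(12, 9)   # stand-in object region
blocks = split_into_blocks(region, m=3, n=3)             # blocks[i][j] is block (i, j)
```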
25
Appearance Block Division
• Applying the log-Euclidean mapping (4) to the covariance matrices of block $(i, j)$ gives $\alpha_{ij}$, which is then unfolded into $\Upsilon_{ij}$.
26
Appearance Block Division
• Once $\Upsilon_{ij}$ has been obtained by unfolding, the subspace $(\mu_{ij}, U_{ij}, D_{ij})$ is learned using the
incremental log-Euclidean Riemannian subspace learning
algorithm.
• $\mathbb{Z}_{ij}$, the Euclidean vector distance between the block $(i, j)$ and
the learned log-Euclidean subspace model $(\mu_{ij}, U_{ij}, D_{ij})$, can be
determined by (10).
29
Local and Global Spatial Filtering
• The previous step yields a matrix of block likelihoods $M = (p_{ij})_{m \times n} \in R^{m \times n}$.
• To remedy occasional inaccurate estimates, $M$ is
filtered both locally and globally.
30
Observation Likelihood
• The overall likelihood is proportional to the product of all the
corresponding block-specific likelihoods after the local and
global spatial filtering.
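A product of many block likelihoods can underflow, so a practical sketch usually sums logarithms instead. The helper below assumes that the input matrix already holds the filtered block likelihoods; the `eps` floor is an assumption added to avoid taking the logarithm of zero.

```python
import numpy as np

def overall_likelihood(M, eps=1e-12):
    """Product of block likelihoods, computed through a sum of logs."""
    M = np.asarray(M, dtype=float)
    log_lik = np.sum(np.log(np.maximum(M, eps)))   # log of the product
    return float(np.exp(log_lik)), float(log_lik)

M = np.array([[0.5, 0.5],
              [1.0, 1.0]])                          # toy 2 x 2 likelihood matrix
p, log_p = overall_likelihood(M)
```

When candidates are only compared with each other, the log value can be used directly and the final exponentiation skipped.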
31
Single/Multi Object Tracking
• Following [1], a particle filter approximates the
distribution over the location of the object using a set of
weighted samples and is applied to estimate the optimal state.
32
Reference: M. Isard and A. Blake, “CONDENSATION: Conditional Density Propagation for Visual Tracking,”
Int’l J. Computer Vision, vol. 29, no. 1, pp. 5-28, 1998.
Single/Multi Object Tracking
• The state $X_t$ in frame $t$ is described by the affine motion
parameters $(x_t, y_t, \eta_t, s_t, \beta_t, \phi_t)$.
• In the tracking process, an observation model $p(O_t \mid X_t)$ (10) and a
dynamic model $p(X_t \mid X_{t-1})$ are used to obtain the optimal state,
where $O_t$ is the observation in frame $t$.
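One tracking step with such models can be sketched as follows. The Gaussian random-walk dynamic model, the choice of the highest-weight particle as the optimal state, the toy likelihood, and all numeric values are assumptions made for illustration; in the actual tracker the likelihood would come from the appearance model.

```python
import numpy as np

def track_step(prev_state, likelihood_fn, sigma, num_particles, rng):
    """Sample particles around the previous state and pick the best one."""
    prev_state = np.asarray(prev_state, dtype=float)
    # Assumed dynamic model: independent Gaussian noise on each parameter.
    noise = rng.normal(size=(num_particles, prev_state.size)) * sigma
    particles = prev_state + noise
    # Observation model: weight each candidate state by its likelihood.
    weights = np.array([likelihood_fn(p) for p in particles])
    weights = weights / weights.sum()
    best = particles[np.argmax(weights)]          # highest-weight particle
    return best, particles, weights

# Toy likelihood peaked at a hypothetical true state (six affine parameters).
target = np.array([52.0, 48.0, 0.0, 1.0, 1.0, 0.0])

def toy_likelihood(state):
    return np.exp(-np.sum((state - target) ** 2) / 50.0)

rng = np.random.default_rng(0)
sigma = np.array([4.0, 4.0, 0.02, 0.02, 0.01, 0.01])
prev = np.array([50.0, 50.0, 0.0, 1.0, 1.0, 0.0])
best, particles, weights = track_step(prev, toy_likelihood, sigma, 300, rng)
```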
33
Occlusion Detection
• Occlusion is detected from the reconstruction error (10): when the object is occluded, the blocks
corresponding to the occluded part have much larger error values.
34
Experimental Result
35
Initial Setup
• The experiments covered 10 challenging videos, five of which were
taken by nonstationary cameras and five of which were taken by
stationary cameras.
• 1. Face tracking example: face detection
• 2. Stationary camera example: background subtraction
• 3. Nonstationary camera example: optical flow region analysis [19]
36
[19] X. Zhou, W.M. Hu, Y. Chen, and W. Hu, “Markov Random Field Modeled Level Sets Method for Object Tracking with Moving
Cameras,” Proc. Asian Conf. Computer Vision, pp. 832-842, 2007.
Experiment Comparison
• The algorithm based on the affine-invariant Riemannian metric [24]
• The vector subspace-based algorithm [14]
• Jepson et al.’s algorithm [5]
• Yu and Wu’s algorithm [7]
• The multiple instance learning based algorithm [36]
37
[5] A.D. Jepson, D.J. Fleet, and T.F. El-Maraghi, “Robust Online Appearance Models for Visual Tracking,” Proc. IEEE Conf.
Computer Vision and Pattern Recognition, vol. 1, pp. 415-422, 2001.
[7] T. Yu and Y. Wu, “Differential Tracking Based on Spatial-Appearance Model (SAM),” Proc. IEEE Conf. Computer Vision and
Pattern Recognition, vol. 1, pp. 720-727, June 2006.
[14] D.A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang, “Incremental Learning for Robust Visual Tracking,” Int’l J. Computer Vision,
vol. 77, no. 2, pp. 125-141, May 2008.
[24] F. Porikli, O. Tuzel, and P. Meer, “Covariance Tracking Using Model Update Based on Lie Algebra,” Proc. IEEE Conf. Computer
Vision and Pattern Recognition, vol. 1, pp. 728-735, 2006.
[36] B. Babenko, M.-H. Yang, and S. Belongie, “Visual Tracking with Online Multiple Instance Learning,” Proc. IEEE Conf. Computer
Vision and Pattern Recognition, pp. 983-990, June 2009.
Example 1
• Available at http://www.cs.Toronto.edu/~dross/ivt
38
Example 1
39
Example 6
40
Example 10
• Database: PETS 2004
41
Conclusion
42
Conclusion
• Under the log-Euclidean Riemannian metric, image feature
covariance matrices directly describe spatial relations between
pixel values.
• The block-division appearance model ensures that the tracking algorithm
can adapt to large appearance changes.
• Compared with six tracking algorithms, the proposed method obtains
more accurate tracking results under illumination changes,
pose variations, occlusions, etc.
43