learning an attribute dictionary for human action classification qiang qiu, zhuolin jiang, and rama...
TRANSCRIPT
Learning an Attribute Dictionary for Human Action Classification
Qiang Qiu, Zhuolin Jiang, and Rama Chellappa, ”Sparse Dictionary-based Representation and Recognition of Action Attributes”, ICCV 2011
Qiang Qiu
Action Feature Representation
2
Shape
Motion
HOG
Action Sparse Representation
3
=
000.640.53-0.400.35000
0.430.6300-0.330-0.3600
=0.43 × + 0.63 × - 0.33 × - 0.36 ×
= 0.64× + 0.53× - 0.40 × +0.35 ×
Action Dictionary
Sparse code
K-SVD
4
=
Y
y2d1 d2 d3 …
000.640.53-0.400.35000
0.430.6300-0.330-0.3600
x1 x2
y1
D
X K-SVD
Input: signals Y, dictionary size, sparisty T Output: dictionary D, sparse codes X
arg min |Y- DX|2 s.t. i , |xi|0 ≤ TD,X
Input signals Dictionary
Sparse codes
[1] M. Aharon and M. Elad and A. Bruckstein, K-SVD: An Algorithm for Designing Overcomplete Dictionries for Sparse Representation, IEEE Trans. on Signal Process, 2006
5
CompactDiscriminativeand
Dictionary.
Learn a
Objective
Probabilistic Model for Sparse Representation
A Gaussian Process Dictionary Class Distribution
6
7
0.430.6300-0.330-0.36000000
000.640.53-0.400.350000000
00000000-0.280.6980.370.250
0000-0.4200000.420.4700.32
=
y2y1 y4y3
l1l1 l2l2
d1 d2 d3 …
x1 x2 x3 x4
xd1
l1l1 l2l2
More Views of Sparse Representation
8
y2y1 y4y3
l1l1 l2l2
xd10.43 0 0 0
x1 x2 x3 x4
0.63 0 0 0
0 0.64 0 0
0 0.53 0 0
-0.33 -0.40 0 -0.42… …
xd2
xd3
xd4
xd5
l1 l1 l2 l2
d1
d2
d3
d4
d5
A Gaussian Process• Covariance function entry: K(i,j) = cov(xdi, xdj)• P(Xd*|XD*) is a Gaussian with a closed-form conditional variance
A Gaussian Process
9
y2y1 y4y3
l1l1 l2l2
xd10.43 0 0 0x1 x2 x3 x4
0.63 0 0 0
0 0.64 0 0
0 0.53 0 0
-0.33 -0.40 0 -0.42… …
xd2
xd3
xd4
xd5
l1 l1 l2 l2
d1
d2
d3
d4
d5
Dictionary Class Distribution• P(L|di), L [1, M]• aggregate |xdi| based on class labels to obtain a M sized vector• P(L=l1|d5) = (0.33+0.40)/(0.33+0.40+0.42) = 0.6348• P(L=l2|d5) = (0+0.42)/(0.33+0.40+0.42) = 0.37
Dictionary Class Distribution
Dictionary Learning Approaches Maximization of Joint Entropy (ME) Maximization of Mutual Information
(MMI) Unsupervised Learning (MMI-1) Supervised Learning (MMI-2)
10
11
Maximization of Joint Entropy (ME)- Initialize dictionary using k-SVD
Do =
- Start with D* = - Untill |D*|=k, iteratively choose d* from Do\D*,
d* = arg max H(d|D*)dME dictionary
D
- A good approximation to ME criteriaarg max H(D)
where
12
Maximization of Mutual Information for Unsupervised Learning (MMI-1)
- Initialize dictionary using k-SVD
- Start with D* = - Untill |D*|=k, iteratively choose d* from Do\D*,
MMI dictionary
- Closed form:
- A near-optimal approximation to MMI
arg max I(D; Do\D) within (1-1/e) of the optimum
D
d* = arg max H(d|D*) - H(d|Do\(D*
d)) d
Diversity Coverage
13
Dictionary Class Distribution• P(L|di), L [1, M]• aggregate|xdi|based on class labels to obtain a M sized vector• P(l1|d5) = (0.33+0.40)/(0.33+0.40+0.42) = 0.6348• P(l2|d5) = (0+0.42)/(0.33+0.40+0.42) = 0.37
• P(Ld) = P(L|d) • P(LD) = P(L|D) , where
y2y1 y4y3
l1l1 l2l2
xd10.43 0 0 0
x1 x2 x3 x4
0.63 0 0 0
0 0.64 0 0
0 0.53 0 0
-0.33 -0.40 0 -0.42… …
xd2
xd3
xd4
xd5
l1 l1 l2 l2
d1
d2
d3
d4
d5
Revisit
14
Maximization of Mutual Information for Supervised Learning (MMI-2)- Initialize dictionary using k-SVD
- Start with D* = - Untill |D*|=k, iteratively choose d* from Do\D*,
dd* = arg max [H(d|D*) - H(d|Do\(D*
d))] + λ[H(Ld|LD*) – H(Ld|LDo\
(D* d))]
- MMI-1 is a special case of MMI-2 with λ=0.
Keck gesture dataset
15
Representation Consistency
16[1] J. Liu and M. Shah, Learning Human Actions via Information Maximization, CVPR 2008
[1]
Recognition Accuracy
17
The recognition accuracy using initial dictionary Do: (a) 0.23 (b) 0.42 (c) 0.71
Recognizing Realistic Actions
18
• 150 broadcast sports videos.• 10 different actions.
• Average recognition rate: 83:6%• Best reported result 86.6%