action recognition in video by sparse representation on ...[ali-shah-pami’10] action features...
TRANSCRIPT
![Page 1: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/1.jpg)
Action Recognition in Video by Sparse Representationon Covariance Manifolds of Silhouette Tunnels
Kai Guo, Prakash Ishwar, and Janusz Konrad
Department of Electrical & Computer Engineering
![Page 2: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/2.jpg)
Motivation
out of 352
• Recognize actions in video
• Applications
Surveillance: Sports & entertainment:
Wildlife habitatmonitoring:
Tools for hearing impaired:
skate
walk
![Page 3: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/3.jpg)
Challenges and assumptions
out of 353
Scene• Multiple objects • Clutter and occlusion• Illumination variability
![Page 4: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/4.jpg)
Challenges and assumptions
out of 354
Scene• Multiple objects • Clutter and occlusion• Illumination variability
Scene• Single object • No significant clutter and occlusion• No significant illumination change
![Page 5: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/5.jpg)
Challenges and assumptions
out of 355
Scene• Multiple objects • Clutter and occlusion• Illumination variability
Acquisition• Camera motion • Camera viewpoint and zoom• Camera imperfections
Scene• Single object • No significant clutter and occlusion• No significant illumination change
![Page 6: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/6.jpg)
Challenges and assumptions
out of 356
Scene• Multiple objects • Clutter and occlusion• Illumination variability
Acquisition• Camera motion • Camera viewpoint and zoom• Camera imperfections
Scene• Single object • No significant clutter and occlusion• No significant illumination change
Acquisition• Single camera, fixed viewpoint• No significant camera motion
and distortion
![Page 7: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/7.jpg)
Challenges and assumptions
out of 357
Scene• Multiple objects • Clutter and occlusion• Illumination variability
Acquisition• Camera motion • Camera viewpoint and zoom• Camera imperfections
Scene• Single object • No significant clutter and occlusion• No significant illumination change
Acquisition• Single camera, fixed viewpoint• No significant camera motion
and distortion
Action• Non-rigid objects• Complex motion (ballet video)• Intra and inter object motion
variability
![Page 8: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/8.jpg)
Challenges and assumptions
out of 358
Scene• Multiple objects • Clutter and occlusion• Illumination variability
Acquisition• Camera motion • Camera viewpoint and zoom• Camera imperfections
Scene• Single object • No significant clutter and occlusion• No significant illumination change
Acquisition• Single camera, fixed viewpoint• No significant camera motion
and distortion
Action• Non-rigid objects• Complex motion (ballet video)• Intra and inter object motion
variability
Action• Non-rigid objects• Complex motion• Intra and inter object motion
variability
![Page 9: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/9.jpg)
Problem statement
out of 359
Given:
Task: Action Recognition
Dictionary of labeled training data
walk run jump wave
unknown action
walk
Recall: challenges– non-rigid object– complex motion– intra and inter object motion variability
![Page 10: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/10.jpg)
Related work
out of 3510
Shape-basedfeatures
Interest -point based features
Geometric human body
features
Motion-based features
NearestNeighbor
[Gorelick-Irani-PAMI’07][Bobick-Davis-PAMI’01]
[Collins-Gross-ICAFGR’02]
[Dollar-Rabaud-VS PETS’05]
[Cunado-Nixon-CVIU’03]
[Wang-Ning-ICSVT’04]
[Seo-Milanfar-PAMI (submitted)],
[Liu-Ali-CVPR’08][Lowe-IJCV’04]
SVM[Ikizler-Duygulu-
LNCS’07][Ahmad-Lee-Journal of
Multimedia’10]
[Shuldt-Laptev-ICPR’04]
[Laptev-CVPR’08]
[Goncalves-Bernardo-CVPR’95]
[Danafar-Gheissari-ACCV07], [Scovanner-
Ali-ACMMultimedia’07]
Boosting [Zhang-Liu-ICCV’09] [Smith-Shah-ICCV’05] -
[Alireza-Mori-CVPR’08],
[Ke-Sukthankar-ICCV’05]
Graphical (Probabilistic)
model[Chen-Wu-ICDMW’08]
[Niebles-Lei-IJCV’08][Wong-Cipolla-
ICCV’07] [Rohr-CVGIP’94] [Ali-Shah-PAMI’10]
Action features
Actio
n c
lass
ifier
s
![Page 11: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/11.jpg)
Action recognition framework
out of 3511
Action recognition = Supervised learning problem,
where data samples are video clips
Two main ingredients:
— Representation of samples
— Classification
![Page 12: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/12.jpg)
Action representation
out of 3512
video clip bag of dense local featureslocal features
![Page 13: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/13.jpg)
Action representation
out of 3513
• How to reduce the dimension of bag of local features?
video clip bag of dense local featureslocal features
![Page 14: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/14.jpg)
Action representation
out of 3514
• How to reduce the dimension of bag of local features?
video clip
Action representation
— Ideally, one should learn and compare pdfs of features— Problem: it may not reduce the dimensionality
bag of dense local featureslocal features
![Page 15: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/15.jpg)
Action representation
out of 3515
• How to reduce the dimension of bag of local features?
video clip
Action representation
— Idea-1: Learn and compare 1st order statistics (mean)— Problem: not sufficiently discriminative
Meanbag of dense local featureslocal features
![Page 16: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/16.jpg)
Action representation
out of 3516
• How to reduce the dimension of bag of local features?
video clip covariance matrix
Action representation
— Idea-2: Learn and compare 2nd order statistics (covariance)[Tuzel-Porikli-Meer PAMI’08]
bag of dense local featureslocal features
![Page 17: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/17.jpg)
Action representation
out of 3517
• How to reduce the dimension of bag of local features?
video clip covariance matrix
ΨAction representation
— Idea-2: Learn and compare 2nd order statistics (covariance)
— Output: feature covariance matrix (e.g., 13-dim vector -> 91-dim covariance matrix)
[Tuzel-Porikli-Meer PAMI’08]
bag of dense local featureslocal features
![Page 18: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/18.jpg)
Action representation
out of 3518
• How to reduce the dimension of bag of local features?
video clip covariance matrix
ΨAction representation
— Idea-2: Learn and compare 2nd order statistics (covariance)
Main thesis: covariance matrix is “sufficient” for action recognition
— Output: feature covariance matrix (e.g., 13-dim vector -> 91-dim covariance matrix)
[Tuzel-Porikli-Meer PAMI’08]
bag of dense local featureslocal features
![Page 19: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/19.jpg)
Action representation
out of 3519
• How to reduce the dimension of bag of local features?
video clip covariance matrix
ΨAction representation
— Idea-2: Learn and compare 2nd order statistics (covariance)
Main thesis: covariance matrix is “sufficient” for action recognition
— Output: feature covariance matrix (e.g., 13-dim vector -> 91-dim covariance matrix)
[Tuzel-Porikli-Meer PAMI’08]
silhouette tunnel bag of dense local features
![Page 20: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/20.jpg)
Covariance manifold
• Covariance matrices form:— a Riemannian manifold— not a vector space
• Matrix-log maps a Riemannian manifoldto a vector space
——
out of 3520
CC’
Manifold ofcovariance matrices
matrix-log
log(C)
log(C’)
TUDUC =TUDUC )log(:)log( =
vector space
[Arsigny-Pennec-Ayache’06]
![Page 21: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/21.jpg)
Classification based on sparse linear representation
out of 3521
seg1
seg2
segK
label(1)
label(2)
label(N)
Ψ
C1
C2
CN
Cquery
Matrix-log&
vectorization
Matrix-log&
vectorization
Matrix-log&
vectorization
Matrix-log&
vectorization
pquery
P
...
Decide query label
Estimated query label
Ψ
Ψ
Ψ
pN
p2
p1
[Wright et al., PAMI’09]
![Page 22: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/22.jpg)
Classification algorithm
Each coefficient of weights the contribution of training segments to query segment
out of 3522
Step 1: Compute residual error:
Step 2: Determine query label
2* ||||)( iiqueryqueryi PR α−= pp
)(minarg)( queryiiquery Rlabel pp =
action 1 action 2 action 3
P
P1 P2 P3
*1α
*2α
*3α
*α
![Page 23: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/23.jpg)
Silhouette-based local features
out of 3523
dNE
dE
dSE
dS
dSW
dWdNW
dN
dT+
dT-
f(s) =
xyt
dN
dE…dT+
dT- 13x1
s = yt
x
(x,y,t)
• Dimensionality reductionT
SS S
C ))()()((||
1: μμ −−= ∑∈
sfsfs
![Page 24: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/24.jpg)
Implementation issues• How to process a long video?
out of 3524
— Break it into segments
• What should be the length of segments??
timex
y—Period for human action ≈ 0.4 — 0.8 s
—Segment length ≈ 10 — 20 frames
(@25 fps video)
![Page 25: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/25.jpg)
Action segments
out of 3525
• No knowledge about the beginning and end of periods— Use overlapping segments
• Additional benefits of overlapping segments— Reduced sensitivity to temporal action misalignment— Richer dictionary
![Page 26: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/26.jpg)
Segment-level classification
out of 3526
seg1
seg2
segK
label(1)
label(2)
label(N)
Ψ
C1
C2
CN
Cquery
Matrix-log&
vectorization
Matrix-log&
vectorization
Matrix-log&
vectorization
Matrix-log&
vectorization
pquery
P
...
Decide query label
Estimated query label
Ψ
Ψ
Ψ
pN
p2
p1
[Wright et al., PAMI’09]
![Page 27: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/27.jpg)
From segment decisions to a sequence decision
out of 3527
walk
• How to get the labels of query video ? — Majority rule
![Page 28: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/28.jpg)
Summary of the approach• Partitioning into segments
28 out of 35
![Page 29: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/29.jpg)
Summary of the approach• Partitioning into segments• Action representation for each segment
video clip covariance matrix
ΨAction representation
silhouette tunnel bag of dense local features
29 out of 35
![Page 30: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/30.jpg)
Summary of the approach• Partitioning into segments
• Segment-wise action recognition• Action representation for each segment
Cquery
…
C1 C2 CK
Bag of labeled feature covariance matrices
label(1) label(2) label(K)
Action recognition Estimated label of query segment
30 out of 35
![Page 31: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/31.jpg)
Summary of the approach• Partitioning into segments
• Segment-wise action recognition
• Decision fusion
walk
• Action representation for each segment
31 out of 35
![Page 32: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/32.jpg)
Experimental results
out of 3532
• Datasets— Weizmann: 9 persons x 10 actions (180x144)
— UT-Tower: 6 persons x 9 actions x 2 times (about 90x70)
wave1 wave2
run
side skip walk
pjumpjumpjumping-jackbend
point stand dig walk carry
runwave1 wave2jump
![Page 33: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/33.jpg)
out of 3533
Performances
• Weizmann dataset: (LOOCV, N=8)
Method Proposed NN-based Gorelick Niebles Ali Seo
SEG-CCR 96.74% 97.05% 97.83% — 95.75% —
SEQ-CCR 100% 100% — 90% — 96%
• Correct classification rate (CCR)— SEG-CCR: % of correctly classified query segments — SEQ-CCR: % of correctly classified query sequences
Method Proposed NN-based
SEG-CCR 96.15% 93.53%
SEQ-CCR 97.22% 96.30%
• UT-Tower dataset: (LOOCV, N=8)
![Page 34: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/34.jpg)
out of 3534
Computational complexity
• Platform: Dual Core 2.2 GHz + 2GB Memory + Matlab 7.6
• Action representation
video: 180 x 144 x 84 — 0.12 sec/frame (8.3fps)
• Action classification
0.07 sec/segment (14.3fps)
![Page 35: Action Recognition in Video by Sparse Representation on ...[Ali-Shah-PAMI’10] Action features Action classifiers. Action recognition framework 11 out of 35](https://reader036.vdocuments.site/reader036/viewer/2022071505/6125cfbd10c99e22c009d1db/html5/thumbnails/35.jpg)
Conclusions
• We proposed a novel approach to action recognition:
—action representation = covariance matrix of local features—action classification = sparse-representation-based classifier
• The proposed approach has
— state-of-the-art performance on Weizmann dataset— 100% performance on non-static actions in low-resolution
UT-Tower dataset— low memory requirements with close to real-time
performance
out of 3535