Mutually Reinforcing Motion-Pose Framework for Pose Invariant Action
Recognition
22nd November 2016, Tuesday
Manoj Ramanathan
Research Engineer, IMI
IMI Research Seminar
Contents
• Introduction
• Literature Review
– Motion
– Pose
– Motion + Pose
• Proposed Framework
– Propagate Motion Forward (PMF) Path
– Canonical Pose Feedback (CPF) Path
• Experimental Results & Discussion
• Conclusion
Introduction
• For several applications, a device needs to understand its environment and the humans in it.
• Recognition of human action is essential.
• RGB camera based action recognition is not easy.
Introduction & Motivation
Motivating Challenges & factors
• Occlusion
• Background Clutter
• View invariance
• Execution rate
• Anthropometric variations
• Moving Cameras
• Generalizability
• Action localization
Introduction & Motivation
• Objectives:
– RGB camera based action recognition that can handle the following challenges
• View angle changes
• Occlusion
• Pose Variations
• Background Clutter
– Generalized to handle actions performed in non-upright human postures.
Literature Review
Motion Based Approaches
- Motion History Images & Motion Energy Images [1] – To indicate presence of motion and recency of motion
- Trajectories [2]
- Optical Flow [3]
- Kinematic Features [4,5]
[1] A. F. Bobick and J. W. Davis, “The recognition of human movement using temporal templates,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 23, no. 3, pp. 257 – 267, March 2001.
[2] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu, “Dense trajectories and motion boundary descriptors for action recognition,” Intl. Journal of Computer Vision, vol. 103, pp. 60 – 79, May 2013.
[3] L. Liu, L. Shao, and P. Rockett, “Boosted key-frame selection and correlated pyramidal motion-feature representation for human action recognition,” Pattern Recognition, vol. 46, pp. 1810 – 1818, July 2013.
[4] M. Jain, H. Jegou, and P. Bouthemy, “Better exploiting motion for better action recognition,” in IEEE Conf. on Computer Vision and Pattern Recognition, June 2013.
[5] S. Ali and M. Shah, “Human action recognition in videos using kinematic features and multiple instance learning,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 32, pp. 288 – 303, February 2010.
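As a rough illustration of the temporal-template idea in [1], the sketch below builds a Motion History Image (recency of motion) and a Motion Energy Image (presence of motion) from simple frame differencing. The function names, the differencing threshold and the duration `tau` are assumptions made for this example; they are not taken from the slides.

```python
import numpy as np


def update_mhi(mhi, motion_mask, tau):
    """One update step: moving pixels are set to tau, all others decay by 1 (floor 0)."""
    decayed = np.maximum(mhi - 1, 0)
    return np.where(motion_mask, tau, decayed)


def motion_templates(frames, diff_threshold=25, tau=10):
    """Return (MHI, MEI) for a sequence of equally sized grayscale frames.

    Motion is detected by thresholded frame differencing, which is an
    assumption of this sketch; any motion detector could be used instead.
    """
    frames = [np.asarray(f, dtype=np.int32) for f in frames]
    mhi = np.zeros(frames[0].shape, dtype=np.int32)
    for prev, cur in zip(frames, frames[1:]):
        motion_mask = np.abs(cur - prev) > diff_threshold
        mhi = update_mhi(mhi, motion_mask, tau)
    mei = mhi > 0  # pixels that moved within the last tau updates
    return mhi, mei
```

Brighter MHI values mark more recent motion, while the MEI only records where motion occurred.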
Pose Based Approaches
- Shape [1], Contours [2]
- Based on extraction and representation of key poses [5]
- Silhouette [4]
- Poselets [3] – Body part detectors in 3D appearance space
[1] H. Zhang and L. E. Parker, “4-Dimensional local spatio-temporal features for human activity recognition,” in IEEE Intl. Conf. on Intelligent Robots and Systems, pp. 2044 – 2049, September 2011.
[2] S. Cheema, A. Eweiwi, C. Thurau, and C. Bauckhage, “Action recognition by learning discriminative key poses,” in IEEE Intl. Conf. on Computer Vision Workshops, pp. 1302 – 1309, November 2011.
[3] M. Raptis and L. Sigal, “Poselet key-framing: A model for human activity recognition,” in IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2650 – 2657, October 2013.
[4] F. Lv and R. Nevatia, “Single view human action recognition using key pose matching and Viterbi path searching,” in IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1 – 8, June 2007.
[5] N. Ikizler-Cinbis and S. Sclaroff, “Web-based classifiers for human action recognition,” IEEE Trans. on Multimedia, vol. 14, pp. 1031 – 1045, August 2012.
Motion + Pose Based Approaches
- Shape-Motion Prototypes [1]
- Motionlets [2] – mid-level spatio-temporal parts, each a tight cluster in motion and appearance space that corresponds to the movement of a body part
[1] Z. Jiang, Z. Lin, and L. S. Davis, “Recognizing human actions by learning and matching shape-motion prototype trees,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 34, pp. 533 – 547, March 2012.
[2] L. Wang, Y. Qiao, and X. Tang, “Motionlets: Mid-level 3D parts for human motion recognition,” in IEEE Conf. on Computer Vision and Pattern Recognition, June 2013.
Proposed Pose Invariant Action Recognition Framework
Consists of two components, a motion component and a pose component, combined in a mutually reinforcing framework
Framework for Action Recognition
• Actions are manifested as movements of body parts
• Detecting body parts and analyzing their motion provides a good framework
• Mutually assistive components to improve each other’s performance
• Represent motion of each body part with respect to the body (for pose-invariance)
Confidential To be Published
[Figure: block diagram of the proposed framework]
Input Video
Preprocessing – Foreground detection
Propagate Motion Forward Path
– Propagation Mechanism – Grid Division
– Human body centric space conversion
– Kinematic features – Div, Curl, Proj, Rot
Canonical Pose Feedback Path
– Canonical Pose Hypothesis – Identify pose in the frame
– Canonical Sticks from training videos
– Temporal Stick Features – Implicitly captures dynamics of pose evolution
– Realign grids based on head size
– Kinematic features – Div, Curl, Proj, Rot, BodyProj, BodyRot
Classification
– Action Model from training videos
– ELM Classifier
– Recognized Action
Propagate Motion Forward Path
Parameters assumed as available or estimated
– Foreground
– Neck point
– Major viewing direction
Requirements of each stage
– Propagation Mechanism – Grid Division: requires neck and foreground
– Human body centric space conversion: requires viewing direction
– Kinematic features – Div, Curl, Proj, Rot: requires neck
Propagate Motion Forward Path
Divide into grids based on body proportion (figure labels: 2x, 5x, 6x)
Propagate Motion Forward Path
• Optical Flow [1] used in the framework
• Kinematic features [2] extracted from Optical flow to represent and characterize actions
– Divergence
– Vorticity (Curl)
– Projection
– Rotation
[1] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, “High accuracy optical flow estimation based on a theory for warping,” ECCV, vol. 3024, pp. 25 – 36, 2004.
[2] S. Shojaeilangari, W. Y. Yau, K. Nandakumar, J. Li, and E. K. Teoh, “Robust representation and recognition of facial emotions using extreme sparse learning,” IEEE Trans. on Image Processing, vol. 24, no. 7, pp. 2140 – 2152, March 2015.
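The four kinematic features can be sketched from a dense flow field with NumPy. Divergence and vorticity follow their standard definitions. The slides do not define Proj and Rot, so the definitions used here are assumptions for illustration only: Proj is taken as the flow component along the unit vector from the neck point to each pixel, and Rot as the perpendicular (tangential) component. The function name and argument layout are likewise hypothetical.

```python
import numpy as np


def kinematic_features(u, v, neck_xy):
    """Compute (div, curl, proj, rot) maps from a dense flow field.

    u, v    : horizontal and vertical flow components, arrays of shape (H, W)
    neck_xy : (x, y) reference point; assumed here to be the neck point
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # np.gradient returns derivatives along axis 0 (y) first, then axis 1 (x).
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    div = du_dx + dv_dy    # local expansion or contraction
    curl = dv_dx - du_dy   # local rotation (vorticity)

    h, w = u.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    rx, ry = xs - neck_xy[0], ys - neck_xy[1]
    norm = np.hypot(rx, ry)
    norm[norm == 0] = 1.0  # avoid division by zero at the reference point
    rx, ry = rx / norm, ry / norm
    # Assumed definitions: radial and tangential flow relative to the neck.
    proj = u * rx + v * ry
    rot = rx * v - ry * u
    return div, curl, proj, rot
```

For a flow field that expands uniformly away from the neck, `div` is positive, `curl` and `rot` vanish, and `proj` grows with distance from the neck.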
Propagate Motion Forward Path
• Weighted & Unweighted Histograms of the motion features were used in Pose invariant Emotion recognition [1]
• The method in [1] assumes that the face is frontal and deals only with 2D motion
• For action recognition
– Method should handle 3D motion
– Human performing action need not be frontal
[1] S. Shojaeilangari, W. Y. Yau, K. Nandakumar, J. Li, and E. K. Teoh, “Robust representation and recognition of facial emotions using extreme sparse learning,” IEEE Trans. on Image Processing, vol. 24, no. 7, pp. 2140 – 2152, March 2015.
Propagate Motion Forward Path
Human body centric space
Encode the grids based on view
[Figure: human body centric space with axes Up, Left and Front, showing Grid 1, Grid 2 and Grid 3]
Canonical Pose Feedback Path
• Initially, a head size is assumed in order to divide the body into grids.
• The initial motion features are used to recognize an initial action.
• Canonical Pose Hypothesis – Identify pose in the frame; uses the initial action and a body part detector
• Canonical Sticks from training videos – offline training, performed only once
• Temporal Stick Features – Implicitly captures dynamics of pose evolution
• Realign grids based on head size
• Kinematic features – Div, Curl, Proj, Rot, BodyProj, BodyRot
Canonical Pose Feedback Path
Canonical Stick Extraction
From the available training videos:
1) Crop the foreground region in each frame
2) Convert to grayscale
3) Resize to a fixed dimension
4) Collect the resized images from all videos
5) Apply NNMF to this data and extract the top N (=100) basis images (the NNMF counterpart of Eigen vectors or principal components)
- Manually mark the sticks on each basis image.
- Use the neck point and the size of the head to obtain a normalized stick representation that can be compared with a test frame.
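The factorization step can be illustrated with a small NumPy sketch. The slides do not say which NNMF solver was used, so this example assumes the standard Lee-Seung multiplicative updates for the Frobenius loss. The function names, iteration count and random initialization are hypothetical choices.

```python
import numpy as np


def stack_crops(crops):
    """Flatten equally sized grayscale crops into the columns of a data matrix V."""
    return np.stack([np.asarray(c, dtype=float).ravel() for c in crops], axis=1)


def nmf(V, n_components, n_iter=200, seed=0, eps=1e-9):
    """Approximate non-negative V (pixels x images) as W @ H with W, H >= 0.

    Columns of W are non-negative basis images; H holds their per-image weights.
    Uses multiplicative updates (an assumed solver, not confirmed by the slides).
    """
    rng = np.random.default_rng(seed)
    n_pixels, n_images = V.shape
    W = rng.random((n_pixels, n_components)) + eps
    H = rng.random((n_components, n_images)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Each column of `W` can be reshaped to the crop size and viewed as an image, which is the form on which sticks would be marked manually.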
Canonical Pose Feedback Path
[Figure: example actions from each dataset]
Weizmann: Bend, Wave2, Run
KTH: Boxing, Hand Clapping, Running
UCF Sports: Golf Swing, Kicking, SkateBoarding
Canonical Pose Feedback Path
Canonical Pose hypothesis
1) Compare each canonical stick of the action with the image to identify the most likely pose.
2) A hypothesis score for each canonical pose n is computed as

$$PH_n = \sum_i \sum_j L(i,j)$$

$$L(i,j) = \begin{cases} l(i,j), & \text{if } d_j \le T_d \text{ and } (\Theta_j - \Theta_s) \le T_\Theta \\ 0, & \text{otherwise} \end{cases}$$

where
– l(i,j): likelihood score of body part i in segment j
– T_d: distance threshold
– T_Θ: orientation threshold
– i: index over body parts
– j: index over motion consistent segments
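The pose hypothesis score can be sketched as a gated sum of detector likelihoods. The data layout below (`SegmentDetection`, a per-part dictionary of stick orientations) is hypothetical. The sketch also assumes that the orientation constraint is applied to the absolute difference between the segment orientation and the stick orientation, which the slide does not state explicitly.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class SegmentDetection:
    """One motion consistent segment j that the detector labelled as body part i."""
    part: str           # body part i
    likelihood: float   # l(i, j) from the body part detector
    distance: float     # d_j, distance to that part's stick (normalized units)
    orientation: float  # Theta_j in degrees


def pose_hypothesis(detections: Iterable[SegmentDetection],
                    stick_orientation: Dict[str, float],
                    t_d: float, t_theta: float) -> float:
    """Sum l(i, j) over all detections that satisfy both constraints."""
    score = 0.0
    for det in detections:
        theta_s = stick_orientation[det.part]
        # Assumption: the orientation test uses the absolute difference.
        if det.distance <= t_d and abs(det.orientation - theta_s) <= t_theta:
            score += det.likelihood
    return score
```

Evaluating this score once per canonical pose gives the values from which the best-matching poses are selected.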
Canonical Pose Feedback Path
Algorithm
1. Start with the likelihood score L_i for each part i in the stick pose set to 0 (L_i = 0).
2. Using kinematic motion features, obtain an initial segmentation of the foreground region.
3. Pass each of these segments through the body part detector [1] to determine whether the segment is body part i.
4. If segment m is detected as body part i, the associated likelihood score l_{i,m} is obtained.
5. If segment m satisfies the distance and orientation constraints, l_{i,m} is added to the likelihood score L_i for body part i in the stick pose. (Distance constraints are imposed in normalized stick coordinates.)
6. Repeat steps 3 - 5 for every segment to obtain the final L_i for the canonical stick pose.
7. The pose hypothesis PH_n for canonical stick pose n is the sum of all L_i.
8. Repeat steps 1 - 7 for every canonical pose n and compute its pose hypothesis.
9. Choose the top 3 poses with the highest pose hypothesis and compute the mean pose.
10. Perform a pixel-wise segmentation into body parts based on the distance from each body part's stick in the mean pose.
11. Compute the body orientation using the obtained torso region and neck point.
12. Compute the head size using the obtained head region and body orientation.
13. Repeat steps 5 - 12 if computed head size and initial approximate
[1] M. Ramanathan, W.-Y. Yau, and E. K. Teoh, “Human body part detection using likelihood score computations,” IEEE Symposium on Computational Intelligence in Biometrics and Identity Management (CIBIM), pp. 160 – 166, December 2014.
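Steps 9 and 10 of the algorithm can be sketched as follows. The array layout is an assumption made for this example: each pose is stored as an array of shape (P, 2, 2) holding the two (x, y) endpoints of P sticks. All function names are hypothetical.

```python
import numpy as np


def mean_of_top_poses(poses, scores, k=3):
    """Average the k stick poses with the highest pose hypothesis score.

    poses : array of shape (N, P, 2, 2), N poses of P sticks with two endpoints
    scores: length-N sequence of pose hypothesis values
    """
    poses = np.asarray(poses, dtype=float)
    top = np.argsort(scores)[::-1][:k]
    return poses[top].mean(axis=0)


def distance_to_stick(points, a, b):
    """Euclidean distance from each (x, y) point to the line segment a-b."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    ab = b - a
    length_sq = float(ab @ ab)
    if length_sq == 0.0:
        t = np.zeros(len(points))  # degenerate stick: a single point
    else:
        t = np.clip((points - a) @ ab / length_sq, 0.0, 1.0)
    closest = a + t[:, None] * ab
    return np.linalg.norm(points - closest, axis=1)


def label_pixels(foreground_mask, sticks):
    """Label each foreground pixel with the index of its nearest stick (-1 = background)."""
    foreground_mask = np.asarray(foreground_mask, dtype=bool)
    labels = np.full(foreground_mask.shape, -1, dtype=int)
    ys, xs = np.nonzero(foreground_mask)
    if ys.size == 0:
        return labels
    points = np.stack([xs, ys], axis=1).astype(float)
    dists = np.stack([distance_to_stick(points, s[0], s[1]) for s in sticks], axis=1)
    labels[ys, xs] = dists.argmin(axis=1)
    return labels
```

The resulting label map is the kind of pixel-wise segmentation from which the torso and head regions in steps 11 and 12 could be read off.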
Temporal Stick Features
• Input video is divided into T temporal segments.
• For each segment, the average stick pose and neck point are computed, giving T stick poses in total.
• The motion of each stick joint between consecutive segments is computed.
• Proj & Rot features are computed with respect to the neck point.
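A minimal sketch of the temporal stick features, under stated assumptions: per-frame joint positions and neck points are already available, segments are formed by splitting the frames into T nearly equal runs, and Proj and Rot are taken as the radial and tangential parts of each joint's displacement relative to the neck. None of these details are given on the slide, and the function name is hypothetical.

```python
import numpy as np


def temporal_stick_features(joints, necks, n_segments):
    """Return (proj, rot), each of shape (n_segments - 1, J).

    joints     : array (F, J, 2), (x, y) of J stick joints in each of F frames
    necks      : array (F, 2), neck point in each frame
    n_segments : number of temporal segments T (requires F >= T)
    """
    joints = np.asarray(joints, dtype=float)
    necks = np.asarray(necks, dtype=float)
    segments = np.array_split(np.arange(len(joints)), n_segments)
    mean_joints = np.stack([joints[idx].mean(axis=0) for idx in segments])  # (T, J, 2)
    mean_necks = np.stack([necks[idx].mean(axis=0) for idx in segments])    # (T, 2)

    # Displacement of every joint between consecutive segments.
    disp = mean_joints[1:] - mean_joints[:-1]
    # Unit vectors from the neck to each joint in the earlier segment.
    radial = mean_joints[:-1] - mean_necks[:-1, None, :]
    norm = np.linalg.norm(radial, axis=-1, keepdims=True)
    norm[norm == 0] = 1.0
    radial = radial / norm

    proj = (disp * radial).sum(axis=-1)                                  # along neck-to-joint
    rot = radial[..., 0] * disp[..., 1] - radial[..., 1] * disp[..., 0]  # perpendicular part
    return proj, rot
```

Concatenating `proj` and `rot` over all joints and segment pairs gives one fixed-length descriptor per video.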
Canonical Pose Feedback Path
• Pose component helping motion component
– Re-align the grids according to the canonical pose identified for each frame
– Compute body part referenced kinematic features for each pixel using the pixel-wise segmentation
– Action is recognized based on the original motion features and the newly computed features.
• Framework forms a loop-like structure that can be repeated until action recognition converges.
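The loop-like structure can be summarized in a short control-flow sketch. The three callables stand in for the PMF feature extraction, the CPF refinement and the classifier. Their names and signatures are hypothetical, and the stopping rule (the predicted action no longer changes, or an iteration cap is reached) is an assumed reading of "converges".

```python
def recognize_with_feedback(video, extract_motion, refine_with_pose, classify, max_iters=5):
    """Alternate between pose feedback and classification until the label is stable.

    extract_motion(video)           -> initial motion features (forward path)
    refine_with_pose(video, action) -> body part referenced features (feedback path)
    classify(motion, refined)       -> action label; refined is None on the first pass
    Returns (action, number of feedback iterations performed).
    """
    motion = extract_motion(video)
    action = classify(motion, None)  # initial action from motion alone
    for iteration in range(1, max_iters + 1):
        refined = refine_with_pose(video, action)
        new_action = classify(motion, refined)
        if new_action == action:     # label unchanged: treat as converged
            return action, iteration
        action = new_action
    return action, max_iters
```

The iteration cap guards against the case where the label oscillates between two actions and never settles.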
NUAD
- Focus on Non-upright action (NUAD) instead of the usual set of upright actions
- 35 actors
- 8 actions: Bend, Squat, Push up, Climber, Knee bending, Single hand wave, Double hand wave, Lying down wave
- 3 views (Front, Left, Right)
- Ground truth marking done to indicate all body parts and neck points in the frames
Experiments & Discussion
- Datasets
• Simple ones – KTH & Weizmann
• Challenging ones – UCF Sports & Hollywood
• Cross Dataset – MSR Action
• Posture Variation – NUAD
- Tolerance range for neck markings
Experiments
Weizmann Dataset
- 9 actors
- 10 actions
- Simple background
- Leave one actor out method

Method                        Performance (%)
Proposed (only PMF)           92.47
Proposed (PMF + CPF)          100
Shape-Motion Prototype [1]    100
Kinematic Features [2]        95.75
MHI & MEI based [3]           93
[1] Z. Jiang, Z. Lin, and L. S. Davis, “Recognizing human actions by learning and matching shape-motion prototype trees,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 34, pp. 533 – 547, March 2012.
[2] S. Ali and M. Shah, “Human action recognition in videos using kinematic features and multiple instance learning,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 32, pp. 288 – 303, February 2010.
[3] Y. Lu, Y. Li, Y. Chen, F. Ding, X. Wang, J. Hu, and S. Ding, “A human action recognition method based on Tchebichef moment invariants and temporal templates,” in Intl. Conf. on Intelligent Human-Machine Systems and Cybernetics, pp. 76 – 79, August 2012.
[4] L. Wang, Y. Qiao, and X. Tang, “Motionlets: Mid-level 3D parts for human motion recognition,” in IEEE Conf. on Computer Vision and Pattern Recognition, June 2013.
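The leave-one-actor-out protocol can be sketched in plain Python. The `fit_predict` callable is a hypothetical stand-in for training and testing any classifier. Averaging the per-fold accuracies is an assumption, since the slides do not say how the folds were aggregated.

```python
def leave_one_actor_out(actor_ids):
    """Yield (held_out_actor, train_indices, test_indices), one fold per actor."""
    for actor in sorted(set(actor_ids)):
        test = [i for i, a in enumerate(actor_ids) if a == actor]
        train = [i for i, a in enumerate(actor_ids) if a != actor]
        yield actor, train, test


def loo_accuracy(actor_ids, labels, fit_predict):
    """Mean per-fold accuracy in percent.

    fit_predict(train_indices, test_indices) must return one predicted label
    per test index, after training only on the training indices.
    """
    fold_accuracy = []
    for _, train, test in leave_one_actor_out(actor_ids):
        predicted = fit_predict(train, test)
        correct = sum(p == labels[i] for p, i in zip(predicted, test))
        fold_accuracy.append(correct / len(test))
    return 100.0 * sum(fold_accuracy) / len(fold_accuracy)
```

Holding out every video of one actor at a time keeps the test actor's appearance and style out of the training data.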
KTH Dataset
- 25 actors
- 6 actions
- 4 scenarios
- Leave one out (LOO) method
- (16+9) Test validation

Method                        16 + 9 (%)    LOO (%)
Proposed (only PMF)           87            90
Proposed (PMF + CPF)          90            93.32

Other methods                 Performance (%)
Shape-Motion Prototype [1]    95.77
Kinematic Features [2]        87.77
Motionlets [4]                93.3
Experiments
UCF Sports
- 152 videos of 10 sports based actions
- Dynamic backgrounds, view changes, camera motion and pose variations
- Confused action pairs: Skateboard with Walk, and Run with Kick

Method                        Performance (%)
Proposed (only PMF)           76.4
Proposed (PMF + CPF)          87.4
Shape-Motion Prototype [1]    88
[2]                           96.6
[3]                           81.7
[1] Z. Jiang, Z. Lin, and L. S. Davis, “Recognizing human actions by learning and matching shape-motion prototype trees,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 34, pp. 533 – 547, March 2012.
[2] M. T. Harandi, C. Sanderson, S. Shirazi, and B. C. Lovell, “Kernel analysis on Grassmann manifolds for action recognition,” Pattern Recognition Letters, vol. 34, pp. 1906 – 1913, November 2013.
[3] K. G. Derpanis, M. Sizintsev, K. J. Cannons, and R. P. Wildes, “Action spotting and recognition based on spatiotemporal orientation analysis,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 35, pp. 527 – 540, March 2013.
[4] A. Gilbert, J. Illingworth, and R. Bowden, “Action recognition using mined hierarchical compound features,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 33, pp. 883 – 897, May 2011.
[5] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, “Learning realistic human actions from movies,” in IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1 – 8, June 2008.
Hollywood Dataset
- 8 actions (Interactions included)
- Dynamic backgrounds, view changes, camera motion and pose variations
- Occlusion handling: stick limits & grid based
- Test & train set provided

Method                        Performance (%)
Proposed (only PMF)           54.12
Proposed (PMF + CPF)          56.87
[4]                           53.3
[5]                           38.2
Experiments
Testing Pose effectiveness
- Reducing available canonical stick poses for each action.
- Adding sticks extracted from other datasets for certain actions

Dataset       All canonical sticks (%)    Fewer sticks (%)
UCF Sports    87.4                        83.4
Hollywood     56.87                       51.89

NUAD Dataset
- View-invariance holds only for mirror image cases (around 92.5%).
- For frontal & side cases, the accuracy is much lower (around 58% only).
- This is because the human body centric space created from 2D images is not the same for different views.
Cross Dataset applicability of canonical sticks
- Tested using MSR Action Dataset
- 3 actions only, the same as in the KTH dataset
- Canonical sticks extracted from KTH and used.
Method Performance (%)
Proposed (only PMF) 90.2
Proposed (PMF + CPF) 92.46
Method Performance (%)
Proposed (only PMF) 90.1
Proposed (PMF + CPF) 91.4
Experiments
[Figure: example failure cases]
- Errors because the neck is not available or the action is not visible
- Erroneous pose identification
Discussion
- Availability of different canonical sticks for each action.
- Project the available 2D sticks to 3D sticks so that they can be compared with any view.
- Estimation of Neck point and viewing direction.
- Tolerance for neck region – only 3% performance drop
- Foreground estimated using background averaging.
- Body part detection error resulting in wrong pose estimation
Conclusion
- Mutually reinforcing motion-pose framework for action recognition
• Pose-invariant
• Partially view-invariant
• Partial occlusion handling
- Forward path handles motion & Feedback path handles pose.
- Representation of motion of each body part in a body centric space that allows pose-invariance.
- Motion determines the initial action, which determines the canonical stick poses to be used.
- The identified canonical stick pose helps to realign the grids and to include motion features for each body part.
Thank you!!
Q & A ??