Human Pose Estimation by Deep Learning

Human Pose Estimation by Deep Learning Wei Yang Supervisor: Prof. WANG Xiaogang, Prof. OUYANG Wanli IVP Lab, CUHK September 11, 2015

Page 1: Human Pose Estimation by Deep Learning

Human Pose Estimation by Deep Learning

Wei Yang
Supervisor: Prof. WANG Xiaogang, Prof. OUYANG Wanli

IVP Lab, CUHK
September 11, 2015

Page 2: Human Pose Estimation by Deep Learning

Outline

• Introduction
• Traditional Approaches
• Deep Learning Methods
  – Global view (holistic view)
  – Local appearance
  – Combination of local appearance and global view
  – Others

2015/9/11

Page 3: Human Pose Estimation by Deep Learning

Introduction

• What is articulated body pose estimation?
  – It “recovers the pose of an articulated body, which consists of joints and rigid parts using image-based observations.”

Page 4: Human Pose Estimation by Deep Learning

Applications

• Action recognition
• Clothing parsing
• Gaming
• Human tracking

Page 5: Human Pose Estimation by Deep Learning

Challenges

Page 6: Human Pose Estimation by Deep Learning

Traditional Approaches

Pictorial Structure (Fischler & Elschlager 1973; Felzenszwalb & Huttenlocher 2005)
• Unary templates
• Pairwise springs

Mixtures of “mini-parts” (Yang & Ramanan 2011)
• Each part has a mixture of types
• Unary template for each part, given its mixture type
• Pairwise springs between two parts, given the mixture type of each

[Figure: parts labeled head, torso, and leg]

Example of mini-parts: near-vertical and near-horizontal limbs

Page 7: Human Pose Estimation by Deep Learning

Deep Learning for Pose Estimation

• Holistic view
  – e.g., joint position regression
• Local view
  – e.g., body part detection
• Combining global and local information
  – e.g., body part detection + joint position regression
• Others
  – e.g., motion features, pose estimation in videos

Page 8: Human Pose Estimation by Deep Learning

Holistic View

DeepPose: Human Pose Estimation via Deep Neural Networks

Page 9: Human Pose Estimation by Deep Learning

Holistic Reasoning

• Why holistic reasoning?
  – Besides the extreme variability in articulation, many of the joints are barely visible, so their locations have to be inferred from the rest of the pose.

Page 10: Human Pose Estimation by Deep Learning

DeepPose: A CNN Regressor

• Network architecture: AlexNet
  – Krizhevsky, Sutskever, and Hinton, NIPS 2012 (ImageNet)
  – The first time a deep model was shown to be effective at large scale

[Toshev & Szegedy, CVPR 2014]
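Holistic regression maps the whole image to a vector of joint coordinates and trains it with an L2 loss. The sketch below shows only the target preparation and the loss; the CNN is omitted. The box-relative normalization is an assumption based on the DeepPose paper, and the function names, box layout, and toy numbers are illustrative, not taken from the slides.

```python
import numpy as np

def normalize_joints(joints, box):
    """Map absolute (x, y) joints into coordinates relative to a person box.

    box = (cx, cy, w, h): box center and size (assumed layout).
    """
    cx, cy, w, h = box
    return (np.asarray(joints, dtype=float) - [cx, cy]) / [w, h]

def denormalize_joints(norm_joints, box):
    """Inverse of normalize_joints: back to image coordinates."""
    cx, cy, w, h = box
    return np.asarray(norm_joints, dtype=float) * [w, h] + [cx, cy]

def l2_loss(pred, target):
    """Sum of squared coordinate errors over all joints."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.sum(diff ** 2))

box = (110.0, 110.0, 220.0, 220.0)
gt = [[110.0, 55.0], [165.0, 110.0]]
target = normalize_joints(gt, box)  # regression target for the network
```

At test time the network output would be passed through `denormalize_joints` to recover pixel coordinates.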

Page 11: Human Pose Estimation by Deep Learning

Results on LSP (Leeds Sports Pose) dataset

Page 12: Human Pose Estimation by Deep Learning

Cascade of Pose Regressors

• The initial pose estimates are very coarse:
  – Because of its fixed input size of 220 × 220, the network has limited capacity to look at detail
  – Train a cascade of pose regressors for more precise joint localization
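The cascade idea can be sketched as a loop in which each stage looks at a crop around the current joint estimate and predicts a correction. This is a minimal sketch under stated assumptions: `regressors` are hypothetical stand-ins for trained stage networks, the crop size is arbitrary, and the stages return pixel displacements directly.

```python
import numpy as np

def refine_cascade(image, joints, regressors, crop_size=64):
    """Refine each joint with a sequence of stage regressors.

    regressors: list of callables (hypothetical stand-ins for trained CNNs);
    each takes a cropped patch and returns a (dx, dy) displacement in pixels.
    """
    joints = np.array(joints, dtype=float)
    half = crop_size // 2
    height, width = image.shape[:2]
    for regress in regressors:
        for k in range(len(joints)):
            x, y = joints[k]
            # Crop a window centered on the current estimate, clipped to the image.
            x0 = int(np.clip(int(round(x)) - half, 0, max(width - crop_size, 0)))
            y0 = int(np.clip(int(round(y)) - half, 0, max(height - crop_size, 0)))
            patch = image[y0:y0 + crop_size, x0:x0 + crop_size]
            joints[k] += np.asarray(regress(patch), dtype=float)
    return joints
```

Because later stages see a smaller region at higher effective resolution, they can correct errors that the full-image stage cannot resolve.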

Page 13: Human Pose Estimation by Deep Learning

Cascade of Pose Regressors

Page 14: Human Pose Estimation by Deep Learning

Refined pose estimation

Page 15: Human Pose Estimation by Deep Learning

Percentage of Correct Parts (PCP) on LSP dataset

Page 16: Human Pose Estimation by Deep Learning

Local Appearance Method

Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations

Page 17: Human Pose Estimation by Deep Learning

Motivation

• Local image patches are able to capture:
  – Part presence
  – Pairwise part spatial relationships

[Figure: lower-arm and upper-arm patches; number of mixture types for each pair: 6; number of relationships shown for 1 and 2 neighbors]

[Chen & Yuille NIPS 2014]

Page 18: Human Pose Estimation by Deep Learning

Tree-structured Relational Graph

– $V$: body parts
– $E$: pairwise relationships between parts
– $p_i$: pixel location of part $i$
– $t_{ij}$: pairwise relationship type
  – Defined by relative position
  – In experiments: 13 types for each pair

Page 19: Human Pose Estimation by Deep Learning

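Formulation

$$F(\mathbf{p}, \mathbf{t} \mid I; \boldsymbol{\omega}, \theta) = \sum_{i \in V} A_i(p_i \mid I; \theta) + \sum_{(i,j) \in E} R(p_i, p_j, t_{ij}, t_{ji} \mid I; \theta)$$

• Part presence: $A_i$, weighted by $\omega_i$
• Pairwise deformation: part of $R$, weighted by $\boldsymbol{\omega}_{ij}^{t_{ij}}$
• Pairwise relationship: part of $R$, weighted by $\omega_{ij}$

Inference:
• Tree structure
• Can be solved efficiently by dynamic programming

Learning: the weights $\boldsymbol{\omega}$ are learned by latent structural SVM.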
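Dynamic programming on a tree-structured model can be sketched as max-sum message passing from the leaves to the root, followed by backtracking. The sketch below assumes each part has a small set of candidate states (locations), with the relationship types already folded into the pairwise score matrices; the function name and data layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def tree_max_sum(unary, pairwise, children, root=0):
    """Exact MAP inference on a tree by dynamic programming (max-sum).

    unary[i]: scores over the candidate states of part i.
    pairwise[(i, j)]: score matrix, rows = states of parent i,
    columns = states of child j.
    children[i]: list of children of part i.
    """
    best_child_state = {}

    def upward(i):
        # Best achievable score of the subtree rooted at i, per state of i.
        score = np.array(unary[i], dtype=float)
        for j in children.get(i, []):
            total = np.asarray(pairwise[(i, j)], dtype=float) + upward(j)[None, :]
            best_child_state[(i, j)] = total.argmax(axis=1)
            score += total.max(axis=1)
        return score

    root_score = upward(root)
    states = {root: int(root_score.argmax())}
    stack = [root]
    while stack:  # backtrack from the root to recover the best states
        i = stack.pop()
        for j in children.get(i, []):
            states[j] = int(best_child_state[(i, j)][states[i]])
            stack.append(j)
    return float(root_score.max()), states

# Toy chain 0 -> 1 -> 2 with two candidate states per part.
unary = {0: [0, 1], 1: [0, 0], 2: [2, 0]}
agree = [[0, -5], [-5, 0]]  # neighbouring parts prefer the same state
best_score, best_states = tree_max_sum(
    unary, {(0, 1): agree, (1, 2): agree}, {0: [1], 1: [2]})
```

The cost is linear in the number of parts and quadratic in the number of candidate states per part, which is what makes the tree structure attractive.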

Page 20: Human Pose Estimation by Deep Learning

Learning DCNN parameters

Derive the type label for each patch:
• Use relative position to represent the pairwise relations
• Cluster the relative positions over the whole training set
• Type label: cluster index
• Mean relative position: cluster center
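Deriving type labels by clustering can be sketched with plain K-means over the relative positions of neighbouring parts. The slide only says "cluster"; K-means, the function name, and the toy offsets below are assumptions made for illustration.

```python
import numpy as np

def cluster_relative_positions(rel_pos, num_types, num_iters=20, seed=0):
    """K-means over relative positions; returns (type labels, cluster centers)."""
    rel_pos = np.asarray(rel_pos, dtype=float)
    rng = np.random.default_rng(seed)
    start = rng.choice(len(rel_pos), size=num_types, replace=False)
    centers = rel_pos[start].copy()

    def assign(current_centers):
        # Index of the nearest center for every relative position.
        diffs = rel_pos[:, None, :] - current_centers[None, :, :]
        return np.linalg.norm(diffs, axis=2).argmin(axis=1)

    for _ in range(num_iters):
        labels = assign(centers)
        for t in range(num_types):
            if np.any(labels == t):  # keep an empty cluster's center unchanged
                centers[t] = rel_pos[labels == t].mean(axis=0)
    return assign(centers), centers

# Toy offsets (dx, dy) from one part to its neighbour: rightward vs. upward.
offsets = [[10, 0], [11, 0], [9, 0], [0, -10], [0, -11], [0, -9]]
type_labels, mean_offsets = cluster_relative_positions(offsets, num_types=2)
```

Each patch then carries the cluster index as its type label, and the cluster center serves as the mean relative position for that type.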

Page 21: Human Pose Estimation by Deep Learning

Casting Full Connections into Convolutions

[Figure: part presence map and pairwise relationship map for the elbow]
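The equivalence behind this step can be sketched in a few lines: a fully connected layer trained on fixed-size patches is the same as a convolution whose kernel covers the whole patch, so sliding it over a larger feature map yields a dense response map. The shapes and names below are illustrative assumptions, and the explicit loop is written for clarity rather than speed.

```python
import numpy as np

def fc_as_conv(feature_map, fc_weight, fc_bias, window):
    """Apply a fully connected layer at every window position of a feature map.

    feature_map: (C, H, W); fc_weight: (num_out, C * window * window).
    Reshaping the weights into (num_out, C, window, window) kernels gives a
    'valid' cross-correlation whose output at each position equals the
    fully connected layer applied to that window.
    """
    channels, height, width = feature_map.shape
    kernels = fc_weight.reshape(-1, channels, window, window)
    out_h, out_w = height - window + 1, width - window + 1
    out = np.empty((kernels.shape[0], out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            patch = feature_map[:, y:y + window, x:x + window]
            out[:, y, x] = np.tensordot(kernels, patch, axes=3) + fc_bias
    return out
```

In practice a deep learning framework's convolution layer would replace the loop; the point is only that the weights can be reused unchanged.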

Page 22: Human Pose Estimation by Deep Learning

PCP and PDJ on LSP dataset and FLIC dataset

PCP (%) on LSP:

Method          Torso  Head  U.Leg  L.Leg  U.Arm  L.Arm  Mean
DCNN            92.5   85.1  82.7   76.3   70.2   55.9   74.8
Ouyang et al.   85.8   83.1  76.5   72.2   63.3   46.6   68.6

[Figure: results on LSP and FLIC]

Page 23: Human Pose Estimation by Deep Learning

Combining Local Appearance and Holistic View

Dual-Source Deep Neural Networks for Human Pose Estimation

Page 24: Human Pose Estimation by Deep Learning

Dual-Source CNN

• Integrate both the local part appearance and the holistic view of each local part for more accurate human pose estimation
• Each input is an image pair
  – Part patches
  – Body patches

Page 25: Human Pose Estimation by Deep Learning

Part patches: incorporate local appearance

• Generated by region proposals with some restrictions
  – Not too small (must contain at least one body part)
  – Not too big (may contain too many body parts and lack sufficient resolution)
• All classes of joints are covered by a similar number of part patches
• During testing, part patches are selected from multi-scale sliding windows

Page 26: Human Pose Estimation by Deep Learning

Body patches: holistic view

• Also from region proposals
  – Must cover all body parts
  – In the testing stage, the body patch can be generated by human detection
• For DS-CNN, each training sample is made up of 3 components
  – A part patch
  – A body patch
  – A binary mask specifying the location of the part patch in the body patch
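The three components of a training sample can be sketched as two crops plus a mask. The helper below is hypothetical: it assumes boxes are given as integer `(x0, y0, x1, y1)` pixel coordinates and leaves out the resizing to the network input size.

```python
import numpy as np

def make_training_sample(image, part_box, body_box):
    """Build (part patch, body patch, binary mask) from boxes (x0, y0, x1, y1).

    The mask has the body patch's height and width and is 1 where the part
    patch lies inside the body patch.
    """
    px0, py0, px1, py1 = part_box
    bx0, by0, bx1, by1 = body_box
    part_patch = image[py0:py1, px0:px1]
    body_patch = image[by0:by1, bx0:bx1]
    body_h, body_w = body_patch.shape[:2]
    mask = np.zeros((body_h, body_w), dtype=np.uint8)
    # Part box expressed in body-patch coordinates, clipped to the patch.
    y0, y1 = np.clip([py0 - by0, py1 - by0], 0, body_h)
    x0, x1 = np.clip([px0 - bx0, px1 - bx0], 0, body_w)
    mask[y0:y1, x0:x1] = 1
    return part_patch, body_patch, mask
```

The mask tells the network where the local patch sits inside the holistic view, which is what ties the two sources together.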

Page 27: Human Pose Estimation by Deep Learning

Training of the DS-CNN

[Figure: network diagram with shared weights and two outputs: classification (softmax) and regression (L2 distance)]
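A two-headed objective of this kind can be sketched as a softmax cross-entropy term for the part class plus a squared L2 term for the joint location. How the two terms are combined is not stated on the slide; the simple weighted sum and the `reg_weight` parameter below are assumptions.

```python
import numpy as np

def ds_cnn_loss(class_logits, true_class, joint_pred, joint_target, reg_weight=1.0):
    """Softmax cross-entropy for classification plus squared L2 for regression."""
    logits = np.asarray(class_logits, dtype=float)
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    classification = -log_probs[true_class]
    diff = np.asarray(joint_pred, dtype=float) - np.asarray(joint_target, dtype=float)
    regression = np.sum(diff ** 2)
    return float(classification + reg_weight * regression)
```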

Page 28: Human Pose Estimation by Deep Learning

Testing

• Part heat map
  – Same size as the input image
  – Uniformly distributed probability for each sliding window
  – Sum and average over all pixels

[Figure: heat maps for the K parts]
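One way to read the heat-map construction is sketched below: each sliding window spreads its class probability uniformly over its own pixels, and each pixel averages the contributions of the windows that cover it. This interpretation of "sum and average", the window format, and the function name are assumptions.

```python
import numpy as np

def part_heat_maps(image_shape, windows, class_probs):
    """Accumulate per-window class probabilities into per-part heat maps.

    windows: list of (x0, y0, x1, y1); class_probs: (num_windows, K) array.
    Returns an array of shape (K, height, width).
    """
    height, width = image_shape
    probs = np.asarray(class_probs, dtype=float)
    heat = np.zeros((probs.shape[1], height, width))
    count = np.zeros((height, width))
    for (x0, y0, x1, y1), p in zip(windows, probs):
        area = (x1 - x0) * (y1 - y0)
        heat[:, y0:y1, x0:x1] += p[:, None, None] / area  # uniform spread
        count[y0:y1, x0:x1] += 1
    return heat / np.maximum(count, 1)  # average over covering windows
```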

Page 29: Human Pose Estimation by Deep Learning

Testing

• Final pose estimation
  – Weighted average of predicted joint locations within part patches with high responses
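The final estimate can be sketched as a response-weighted mean over the best-scoring patches. The rule for what counts as a "high response" is not given on the slide; keeping the `top_k` patches is an assumed selection rule, and the function name is illustrative.

```python
import numpy as np

def estimate_joint(pred_locations, responses, top_k=3):
    """Response-weighted average of the joint locations predicted by the
    top_k highest-scoring part patches."""
    pred = np.asarray(pred_locations, dtype=float)
    resp = np.asarray(responses, dtype=float)
    keep = np.argsort(resp)[-top_k:]          # indices of the strongest patches
    weights = resp[keep] / resp[keep].sum()   # normalize to sum to 1
    return weights @ pred[keep]
```

Dropping weak patches before averaging keeps a single confident outlier-free cluster of predictions from being pulled toward low-confidence guesses.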

Page 30: Human Pose Estimation by Deep Learning

Results: PCP on LSP

Page 31: Human Pose Estimation by Deep Learning

Other Methods & Applications

• MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation

• Flowing ConvNets for Human Pose Estimation in Videos

Page 32: Human Pose Estimation by Deep Learning

Using Motion Features for Human Pose Estimation

• Motion is a powerful visual cue that alone can be used to extract high-level information, including articulated pose.

Image credit: T. Brox and J. Malik, “Large displacement optical flow: descriptor matching in variational motion estimation,” IEEE TPAMI, 33(3): 500-513, 2011

Page 33: Human Pose Estimation by Deep Learning

MoDeep: Using Motion Features for Human Pose Estimation

• Extended the Frames Labeled In Cinema (FLIC) dataset with additional motion features

MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation. A. Jain et al., ACCV 2014

[Figure: average of frame pair and optical flow]

Page 34: Human Pose Estimation by Deep Learning

Multi-resolution efficient sliding window model

Page 35: Human Pose Estimation by Deep Learning

Simple Spatial Model

• FLIC: multiple people with only one annotated person
• Testing: incorporate the annotated torso position with a simple spatial model

[Figure: predicted left shoulder, spatial mask of left shoulder, and result]
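Combining a predicted heat map with a torso-anchored mask can be sketched as an element-wise product followed by an argmax, which suppresses responses that belong to other people in the frame. The circular binary mask and the `radius` parameter are assumptions made for illustration; the slides do not specify the mask's shape.

```python
import numpy as np

def apply_spatial_mask(heat_map, torso_xy, radius):
    """Zero out responses farther than `radius` pixels from the torso and
    return the masked map together with the (x, y) of its maximum."""
    height, width = heat_map.shape
    ys, xs = np.mgrid[0:height, 0:width]
    mask = (xs - torso_xy[0]) ** 2 + (ys - torso_xy[1]) ** 2 <= radius ** 2
    masked = heat_map * mask
    y, x = np.unravel_index(np.argmax(masked), masked.shape)
    return masked, (int(x), int(y))

# Toy example: a strong response from another person at (1, 1) is masked out.
shoulder_map = np.zeros((10, 10))
shoulder_map[1, 1] = 0.9
shoulder_map[6, 5] = 0.6
masked_map, shoulder_xy = apply_spatial_mask(shoulder_map, torso_xy=(5, 5), radius=3)
```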

Page 36: Human Pose Estimation by Deep Learning

Experiment results

[Figure: results without and with motion features, for cases of occlusion, cluttered background, and motion blur]

Page 37: Human Pose Estimation by Deep Learning

Flowing ConvNets for Human Pose Estimation in Videos

• A CNN can benefit from temporal context by combining information across multiple frames using optical flow.

Page 38: Human Pose Estimation by Deep Learning

Spatial ConvNet

Why regress a heatmap instead of joint coordinates?
• The network output can be multi-modal
• Regressing coordinates directly is a highly non-linear mapping that is more difficult to learn
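Heatmap regression needs a target image for each joint rather than a coordinate pair. A common choice, assumed here, is a Gaussian centered on the annotated joint; the `sigma` value and the function names are illustrative.

```python
import numpy as np

def gaussian_heatmap(shape, joint_xy, sigma=1.5):
    """Target heatmap: a Gaussian bump centered on the joint location."""
    height, width = shape
    ys, xs = np.mgrid[0:height, 0:width]
    dist_sq = (xs - joint_xy[0]) ** 2 + (ys - joint_xy[1]) ** 2
    return np.exp(-dist_sq / (2.0 * sigma ** 2))

def heatmap_to_joint(heatmap):
    """Read the joint location back as the (x, y) of the maximum."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(x), int(y)

target_map = gaussian_heatmap((32, 32), (10, 20))
```

A heatmap can hold several peaks at once, for example `gaussian_heatmap(s, a) + gaussian_heatmap(s, b)`, which is what lets the output stay multi-modal when the network is unsure.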

Page 39: Human Pose Estimation by Deep Learning

Warping neighbouring heatmaps for improving pose estimates

• Heatmaps from frames (t − n) and (t + n), warped to frame t using tracks from optical flow (green and blue lines), can help correct wrongly estimated part locations
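Warping and combining heatmaps can be sketched with a dense flow field and nearest-neighbour sampling. The flow convention, the plain averaging in place of a learned pooling step, and the function names are all assumptions; computing the optical flow itself is out of scope here.

```python
import numpy as np

def warp_heatmap(heatmap, flow):
    """Warp a neighbouring frame's heatmap to frame t.

    flow[y, x] = (dx, dy): displacement from pixel (x, y) in frame t to its
    match in the neighbouring frame (assumed convention).
    """
    height, width = heatmap.shape
    ys, xs = np.mgrid[0:height, 0:width]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, width - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, height - 1)
    return heatmap[src_y, src_x]

def pool_heatmaps(current, warped_neighbours):
    """Average the current heatmap with the warped neighbours (a simple
    stand-in for a learned pooling step)."""
    return np.mean([current] + list(warped_neighbours), axis=0)
```

After warping, peaks from neighbouring frames line up with the joint's position in frame t, so a confident neighbour can outvote a wrong peak in the current frame.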

Page 40: Human Pose Estimation by Deep Learning

Results

Page 41: Human Pose Estimation by Deep Learning

Ongoing Project

• End-to-end pose estimation
  – Joint learning of pose features and pose configurations
  – Allow local appearance to be fine-tuned by pose configuration

[Figure: unary response and pairwise relationships]

Page 42: Human Pose Estimation by Deep Learning

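Ongoing Project

Add constraints between body parts in a network

[Figure: network diagram over a sequence …, $x_{t-2}$, $x_{t-1}$, $x_t$, …, $x_T$, with inputs $x_{t-1}$, $x_t$, $x_{t+1}$, weights $w_{dt}$ and $w_m$, and outputs $z_{t-1}$, $z_t$, $z_{t+1}$; components labeled unary response, distance transform, and pairwise relationships]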

Page 43: Human Pose Estimation by Deep Learning

Preliminary Results (PCP on LSP)

Head   Torso  U.arms  L.arms  U.legs  L.legs  Mean
84.7   91     68.7    53.6    80.7    73.3    72.82

• Future work
  – Pose relational graph learning
  – Multi-task learning
    • Human detection
    • Human segmentation
  – Combining global information

Page 44: Human Pose Estimation by Deep Learning

Recent developments

• DeepPose: Human Pose Estimation via Deep Neural Networks. A. Toshev, C. Szegedy. CVPR, 2014.
• Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation. J. J. Tompson, A. Jain, Y. LeCun, C. Bregler. NIPS, 2014.
• Human Pose Estimation with Iterative Error Feedback. J. Carreira et al. arXiv preprint arXiv:1507.06550, 2015.
• Maximum-Margin Structured Learning with Deep Networks for 3D Human Pose Estimation. S. Li, W. Zhang, A. B. Chan. arXiv preprint arXiv:1508.06708, 2015.
• Heterogeneous Multi-task Learning for Human Pose Estimation with Deep Convolutional Neural Network. S. Li, Z. Q. Liu, A. B. Chan. CVPR Workshop, 2014.
• Flowing ConvNets for Human Pose Estimation in Videos. T. Pfister, J. Charles, A. Zisserman. ICCV, 2015.
• R-CNNs for Pose Estimation and Action Detection. G. Gkioxari, B. Hariharan, R. Girshick, J. Malik. arXiv preprint arXiv:1406.5212, 2014.
• MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation. A. Jain, J. Tompson, Y. LeCun, C. Bregler. ACCV, 2014.
• Efficient Object Localization Using Convolutional Networks. J. Tompson, R. Goroshin, A. Jain, Y. LeCun, C. Bregler. CVPR, 2015.
• Combining Local Appearance and Holistic View: Dual-Source Deep Neural Networks for Human Pose Estimation. X. Fan, K. Zheng, Y. Lin, S. Wang. CVPR, 2015.
• Parsing Occluded People by Flexible Compositions. X. Chen, A. L. Yuille. CVPR, 2015.
• Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations. X. Chen, A. L. Yuille. NIPS, 2014.
• …

Page 45: Human Pose Estimation by Deep Learning

Thank you


Page 46: Human Pose Estimation by Deep Learning

Evaluation Metrics

• Percentage of Correct Parts (PCP)
  – Measures the percentage of correctly localized body parts.
  – A candidate body part is treated as correct if its segment endpoints lie within 50% of the length of the ground-truth segment from their annotated locations.
• Percentage of Detected Joints (PDJ)
  – Measures performance as a curve of the percentage of correctly localized joints as the localization precision threshold varies. The threshold is normalized by a scale defined as the distance between the left shoulder and right hip.
  – Invariant to scale
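Both metrics can be sketched directly from their definitions. The array layouts and function names below are assumptions, and the PCP variant shown requires both endpoints of a part to pass the test; other variants in the literature relax this.

```python
import numpy as np

def pcp(pred_segments, gt_segments, alpha=0.5):
    """Fraction of parts whose two predicted endpoints both lie within
    alpha * (ground-truth part length) of the annotated endpoints.

    Segments have shape (num_parts, 2, 2): part, endpoint, (x, y).
    """
    pred = np.asarray(pred_segments, dtype=float)
    gt = np.asarray(gt_segments, dtype=float)
    length = np.linalg.norm(gt[:, 0] - gt[:, 1], axis=1)
    err = np.linalg.norm(pred - gt, axis=2)  # (num_parts, 2) endpoint errors
    return float(np.mean(np.all(err <= alpha * length[:, None], axis=1)))

def pdj(pred_joints, gt_joints, torso_size, thresholds):
    """Detection rate at each normalized threshold; torso_size is the
    distance between the left shoulder and the right hip."""
    diff = np.asarray(pred_joints, dtype=float) - np.asarray(gt_joints, dtype=float)
    err = np.linalg.norm(diff, axis=1)
    return [float(np.mean(err <= t * torso_size)) for t in thresholds]

gt_parts = [[[0, 0], [0, 10]], [[0, 0], [10, 0]]]
pred_parts = [[[1, 0], [0, 12]], [[0, 6], [10, 0]]]
pcp_score = pcp(pred_parts, gt_parts)  # second part fails: one endpoint is off by 6 > 5
pdj_curve = pdj([[0, 1], [10, 5]], [[0, 0], [10, 0]], torso_size=10.0,
                thresholds=[0.05, 0.2, 0.5])
```

Plotting `pdj_curve` against the thresholds gives the PDJ curve; because the threshold scales with torso size, people of different sizes in the image are scored comparably.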