human pose estimation and action recognition · cascaded pyramid network for multi-person pose...
TRANSCRIPT
![Page 1: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/1.jpg)
Human Pose Estimation and Action Recognition
Gang Yu, Megvii (Face++)
Junsong Yuan, SUNY Buffalo
Zicheng Liu, Microsoft
ICIP 2019 Tutorial
![Page 2: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/2.jpg)
Overview
• Part 1: Human Pose Estimation
  • 2D Skeleton
    • Top-Down
    • Bottom-Up
  • 3D Skeleton
    • 2D -> 3D Skeleton
    • 2D -> 3D Shape
  • Application
• Part 2: Action Recognition
  • Datasets
    • RGB
    • RGB-D
  • Skeleton based approaches
    • 2D and 3D skeletons
  • Video based approaches
    • 2D/3D CNN features
![Page 3: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/3.jpg)
Gang Yu
yugang@megvii.com
Human Pose Estimation
Algorithm and Application
![Page 4: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/4.jpg)
Outline
• Introduction to Human Pose Estimation
• 2D Skeleton
  • Top-Down
  • Bottom-Up
• 3D Skeleton
  • 2D -> 3D Skeleton
  • 2D -> 3D Shape
• Application
• Conclusion
![Page 6: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/6.jpg)
What is Human Pose Estimation?
![Page 7: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/7.jpg)
Benchmark and Evaluation
• Benchmark
  • Single-person Estimation
    • MPII, FLIC, LSP, LIP
  • Multi-person Keypoint Detection
    • COCO, CrowdPose
  • Video
    • PoseTrack
  • 3D
    • Human3.6M, DensePose
• Evaluation on COCO
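COCO keypoint evaluation is built on Object Keypoint Similarity (OKS), which plays the role that IoU plays in box detection. The sketch below is a minimal illustration, not the official evaluation code: the function name `oks` and its argument layout are assumptions, and the per-keypoint constants are the values commonly used with COCO, which should be checked against the official tools.

```python
import math

# Per-keypoint constants for the 17 COCO keypoints, as commonly used by the
# COCO keypoint evaluation (assumed here; verify against the official tools).
COCO_SIGMAS = [0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
               0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089]


def oks(pred, gt, visibility, area, sigmas=COCO_SIGMAS):
    """Object Keypoint Similarity between one predicted and one labelled pose.

    pred, gt: lists of (x, y) pairs, one per keypoint.
    visibility: list of flags, where a value > 0 means the keypoint is labelled.
    area: area of the ground-truth person, used as the squared object scale.
    """
    area = max(area, 1e-9)  # guard against a zero-area annotation
    total, labelled = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, sigmas):
        if v <= 0:
            continue  # unlabelled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma
        total += math.exp(-d2 / (2.0 * area * k * k))
        labelled += 1
    return total / labelled if labelled else 0.0
```

A perfect prediction scores 1.0 and the score decays towards 0 as keypoints drift, faster for small people and for keypoints with small constants (for example the eyes). Average precision is then computed by thresholding OKS in the same way box AP thresholds IoU.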
![Page 8: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/8.jpg)
Outline
• Introduction to Human Pose Estimation
• 2D Skeleton
  • Top-Down
  • Bottom-Up
• 3D Skeleton
  • 2D -> 3D Skeleton
  • 2D -> 3D Shape
• Application
• Conclusion
![Page 9: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/9.jpg)
2D Skeleton: How to Do Pose Estimation
• Top-down Approach vs. Bottom-up Approach
• Top-down
  • Mask R-CNN, CPN, MSPN
  • High performance (good localization ability), high recall
• Bottom-up
  • OpenPose, Associative Embedding
  • Clean framework, potentially fast speed
[Figure: top-down vs. bottom-up pipelines; labels: Human, Head, L-Arm]
Mask R-CNN, Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, ICCV 2017
Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun, CVPR 2018
Rethinking on Multi-Stage Networks for Human Pose Estimation, Wenbo Li, Zhicheng Wang, Binyi Yin, Qixiang Peng, Yuming Du, Tianzi Xiao, Gang Yu, Hongtao Lu, Yichen Wei, Jian Sun
OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, Yaser Sheikh
Associative Embedding: End-to-End Learning for Joint Detection and Grouping, Alejandro Newell, Zhiao Huang, Jia Deng, NIPS 2017
![Page 10: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/10.jpg)
Challenges
• Ambiguous Appearance
• Crowd Case
• Large Pose
• Inference Speed
![Page 11: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/11.jpg)
Top-Down: Mask R-CNN
Mask R-CNN, Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, ICCV 2017
• Motivation:
  • Multi-task learning
  • ROI Pool -> ROI Align
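The move from ROI Pool to ROI Align matters for keypoints because snapping region coordinates to the integer feature grid shifts the features by a fraction of a cell, which is a large error at keypoint scale. The sketch below only illustrates the sampling difference on a tiny feature map; the function names are hypothetical and this is not the Mask R-CNN implementation.

```python
import math


def bilinear_sample(feature, x, y):
    """ROI Align style: read the feature map at a fractional (x, y) location
    by bilinear interpolation of the four surrounding cells."""
    h, w = len(feature), len(feature[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * feature[y0][x0] + ax * feature[y0][x1]
    bottom = (1 - ax) * feature[y1][x0] + ax * feature[y1][x1]
    return (1 - ay) * top + ay * bottom


def snapped_sample(feature, x, y):
    """Quantized read: the coordinate is snapped to an integer cell first,
    illustrating the rounding that ROI Pool applies to region coordinates."""
    return feature[int(math.floor(y))][int(math.floor(x))]


feature = [[0.0, 1.0],
           [2.0, 3.0]]
aligned = bilinear_sample(feature, 0.5, 0.5)   # blends all four cells
snapped = snapped_sample(feature, 0.5, 0.5)    # reads only the top-left cell
```

On this map the interpolated read at the centre blends all four cells, while the snapped read returns only the top-left value, so the half-cell offset is lost.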
![Page 12: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/12.jpg)
Top-Down: Mask R-CNN
Mask R-CNN, Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, ICCV 2017
• Experiments on COCO Skeleton:
![Page 13: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/13.jpg)
Top-Down: Hourglass
Stacked Hourglass Networks for Human Pose Estimation, Alejandro Newell, Kaiyu Yang, and Jia Deng, ECCV 2016
• Motivation:
  • Crop & Single Person Skeleton
  • Multi-stage context refinement
![Page 14: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/14.jpg)
Top-Down: Hourglass
• Structure of one block
Stacked Hourglass Networks for Human Pose Estimation, Alejandro Newell, Kaiyu Yang, and Jia Deng, ECCV 2016
![Page 15: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/15.jpg)
Top-Down: Hourglass
• Experiments
Stacked Hourglass Networks for Human Pose Estimation, Alejandro Newell, Kaiyu Yang, and Jia Deng, ECCV 2016
![Page 16: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/16.jpg)
Top-Down: Single Person Skeleton: CPM
• Motivation:
  • Multi-stage context refinement
  • Large receptive field -> long-range spatial relationships
Convolutional Pose Machines, Shih-En Wei, Varun Ramakrishna, Takeo Kanade, Yaser Sheikh, CVPR 2016
![Page 17: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/17.jpg)
Top-Down: Cascaded Pyramid Network
• Motivation: How to locate the “hard” joints
  • Human perspective
Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun, CVPR 2018
![Page 18: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/18.jpg)
Top-Down: Cascaded Pyramid Network
• Motivation: How to locate the “hard” joints
  • Human perspective
[Figure: example image. Visible easy keypoints (easy visible parts): nose, left elbow, right hand (✓). Two further joints are labelled "What?" (✕).]
![Page 19: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/19.jpg)
Top-Down: Cascaded Pyramid Network
• Motivation: How to locate the “hard” joints
  • Human perspective
[Figure: example image. Visible easy keypoints (easy visible parts): nose, left elbow, right hand (✓). Visible hard keypoints (hard visible parts), hard to distinguish without an enlarged view and context: left knee, right knee, left hip (✓). Two further joints are labelled "What?" (✕).]
![Page 20: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/20.jpg)
Top-Down: Cascaded Pyramid Network
• Motivation: How to locate the “hard” joints
  • Human perspective
[Figure: example image. Visible easy keypoints (easy visible parts): nose, left elbow, right hand (✓). Visible hard keypoints (hard visible parts), hard to distinguish without an enlarged view and context: left knee, right knee, left hip (✓). Invisible part, located from context: right shoulder (✓). Two further joints are labelled "What?" (✕).]
![Page 21: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/21.jpg)
Top-Down: Cascaded Pyramid Network
• Motivation: How to locate the “hard” joints
  • Human perspective: Coarse to Fine
[Figure: input image to output image, from coarse parts to fine parts; the receptive view gets larger and brings in more context.]
![Page 22: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/22.jpg)
Network Design Principles:
● Inspired by how humans locate keypoints, adapted to a CNN
  ○ locate easy parts => locate hard parts
● Two stages
  ○ GlobalNet: locates the easy parts (vanilla L2 loss)
  ○ RefineNet: locates the hard parts (deep layers) with online hard keypoint mining (hard mining loss)
Network Architecture
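The difference between the two losses can be sketched as follows: the vanilla L2 loss averages over every keypoint, while online hard keypoint mining backpropagates only the keypoints with the largest loss in each sample. This is a minimal sketch on nested lists; the function names and the `top_k` parameter are hypothetical, and the value of `top_k` used in practice is not stated on the slide.

```python
def l2_per_keypoint(pred_heatmaps, target_heatmaps):
    """Mean squared error of each keypoint's heatmap, one value per keypoint."""
    losses = []
    for pred, target in zip(pred_heatmaps, target_heatmaps):
        squared = [(p - t) ** 2
                   for pred_row, target_row in zip(pred, target)
                   for p, t in zip(pred_row, target_row)]
        losses.append(sum(squared) / len(squared))
    return losses


def vanilla_l2_loss(pred_heatmaps, target_heatmaps):
    """GlobalNet-style loss: every keypoint contributes equally."""
    losses = l2_per_keypoint(pred_heatmaps, target_heatmaps)
    return sum(losses) / len(losses)


def hard_mining_loss(pred_heatmaps, target_heatmaps, top_k):
    """RefineNet-style loss: keep only the top_k hardest keypoints."""
    losses = l2_per_keypoint(pred_heatmaps, target_heatmaps)
    hardest = sorted(losses, reverse=True)[:max(1, top_k)]
    return sum(hardest) / len(hardest)
```

In a real training loop the selection would be done per sample inside the autograd graph, so that gradients flow only through the selected keypoints.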
![Page 23: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/23.jpg)
Experiments: Person Detector
Det mAP: 36.3 41.1 44.3 49.3 52.1
Keypoint mAP: 68.8 69.4 69.7 69.8 69.8
![Page 24: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/24.jpg)
Experiments: Online Hard Keypoints Mining
![Page 25: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/25.jpg)
Experiments: Design Choices of GlobalNet & RefineNet
![Page 26: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/26.jpg)
Experiments
![Page 27: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/27.jpg)
Summary for CPN
• Hard Keypoints with Coarse-to-fine Strategy (context)
• Code: https://github.com/chenyilun95/tf-cpn
• MS COCO 2017 Challenge Winner
![Page 28: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/28.jpg)
Top-Down: A Simple Baseline
Simple Baselines for Human Pose Estimation and Tracking, Bin Xiao, Haiping Wu, Yichen Wei, ECCV 2018
• Motivation
  • Simple Baseline & OKS based tracking
  • Spatial Resolution
![Page 29: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/29.jpg)
Top-Down: A Simple Baseline
• Experiments on COCO and PoseTrack
Simple Baselines for Human Pose Estimation and Tracking, Bin Xiao, Haiping Wu, Yichen Wei, ECCV 2018
![Page 30: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/30.jpg)
Top-Down: HRNet
Deep High-Resolution Representation Learning for Human Pose Estimation, Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang, CVPR2019
• Motivation
  • High Resolution Feature maps
![Page 31: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/31.jpg)
Top-Down: HRNet
Deep High-Resolution Representation Learning for Human Pose Estimation, Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang, CVPR2019
![Page 32: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/32.jpg)
Top-Down: HRNet
Deep High-Resolution Representation Learning for Human Pose Estimation, Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang, CVPR2019
• Experiments
![Page 33: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/33.jpg)
Top-Down: Multi-stage Pose Estimation
• Motivation
  • Upper bound
  • Only two stages available (limited context)
Rethinking on Multi-Stage Networks for Human Pose Estimation, Wenbo Li, Zhicheng Wang, Binyi Yin, Qixiang Peng, Yuming Du, Tianzi Xiao, Gang Yu, Hongtao Lu, Yichen Wei, Jian Sun
![Page 34: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/34.jpg)
Top-Down: Multi-stage Pose Estimation
• Method
  • Coarse-to-fine with better information flow
  • Involve more stages
![Page 35: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/35.jpg)
Top-Down: Multi-stage Pose Estimation
• Cross Stage Feature Aggregation
• Coarse-to-fine Supervision
![Page 36: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/36.jpg)
Experiments: More Stages
![Page 37: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/37.jpg)
Experiments: CTF & CSFA
![Page 38: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/38.jpg)
Experiments: COCO test-dev
![Page 39: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/39.jpg)
Experiments: COCO test-Challenge
![Page 40: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/40.jpg)
Summary for MSPN
• Refined Coarse-to-fine Strategy
• Code: https://github.com/megvii-detection/MSPN
• MS COCO 2018 Challenge Winner
![Page 41: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/41.jpg)
Bottom-Up: DeepCut
• Motivation
  • Part Detector
  • Assemble (Integer Linear Optimization)
DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation, Leonid Pishchulin, Eldar Insafutdinov, Siyu Tang, Bjoern Andres, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele, CVPR 2016
![Page 42: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/42.jpg)
Bottom-Up: DeeperCut
• Motivation
  • Deeper Part Detector + Assemble (image-conditioned pairwise terms + incremental optimization)
DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model, Eldar Insafutdinov, Leonid Pishchulin, Bjoern Andres, Mykhaylo Andriluka, Bernt Schiele, ECCV2016
![Page 43: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/43.jpg)
Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh, CVPR 2017
Bottom-Up: OpenPose
• Motivation
  • Part Detector (CPM) + Assemble (PAF)
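The assembly step with Part Affinity Fields can be sketched as a line integral: for two candidate joints, sample the 2D vector field along the segment joining them and average its projection onto the segment direction. The sketch below assumes the field is given as two nested lists (`paf_x`, `paf_y`) indexed as [row][column]; the function name and the sample count are hypothetical and this is not the OpenPose code.

```python
import math


def paf_score(paf_x, paf_y, joint_a, joint_b, num_samples=10):
    """Average alignment between the field and the segment joint_a -> joint_b.

    joint_a, joint_b: (x, y) pixel coordinates of two candidate joints.
    Returns a value near 1 when the field points along the segment.
    """
    ax, ay = joint_a
    bx, by = joint_b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector of the candidate limb
    h, w = len(paf_x), len(paf_x[0])
    total = 0.0
    for i in range(num_samples):
        t = i / (num_samples - 1) if num_samples > 1 else 0.0
        # Nearest-pixel lookup along the segment, clamped to the map.
        x = min(max(int(round(ax + t * dx)), 0), w - 1)
        y = min(max(int(round(ay + t * dy)), 0), h - 1)
        total += paf_x[y][x] * ux + paf_y[y][x] * uy
    return total / num_samples
```

Candidate pairs with high scores are then matched, for example with a greedy or bipartite assignment per limb type, to assemble full skeletons.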
![Page 44: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/44.jpg)
Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh, CVPR 2017
Bottom-Up: OpenPose
• Motivation
  • Part Detector (CPM) + Assemble (PAF)
![Page 45: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/45.jpg)
Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh, CVPR 2017
Bottom-Up: OpenPose
• Experiments on MPII and COCO
![Page 46: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/46.jpg)
Associative Embedding: End-to-End Learning for Joint Detection and Grouping, Alejandro Newell, Zhiao Huang, Jia Deng, NIPS 2017
Bottom-Up: Associative Embedding
• Motivation
  • Part Detector (Hourglass) + Assemble (AE)
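With associative embedding, the network predicts a tag value alongside each detected joint, and joints whose tags are close are assigned to the same person. The sketch below is a simplified greedy grouping under assumed inputs: the tuple layout, the function name, and the distance threshold are hypothetical, and the real method includes details (detection scores, per-joint ordering, matching) that are omitted here.

```python
def group_by_tags(detections, threshold=1.0):
    """Greedily group joint detections into people by embedding tag.

    detections: list of (joint_type, x, y, tag) tuples.
    Returns a list of people, each a dict with "joints" and "tags".
    """
    people = []
    for joint_type, x, y, tag in sorted(detections, key=lambda d: d[0]):
        best, best_dist = None, threshold
        for person in people:
            if joint_type in person["joints"]:
                continue  # a person holds at most one joint of each type
            mean_tag = sum(person["tags"]) / len(person["tags"])
            dist = abs(tag - mean_tag)
            if dist < best_dist:
                best, best_dist = person, dist
        if best is None:
            # No existing person is close enough: start a new one.
            best = {"joints": {}, "tags": []}
            people.append(best)
        best["joints"][joint_type] = (x, y)
        best["tags"].append(tag)
    return people


detections = [
    ("nose", 10, 10, 0.1), ("nose", 50, 10, 5.0),
    ("neck", 10, 20, 0.2), ("neck", 50, 20, 4.9),
]
people = group_by_tags(detections)
```

In the example, the two joints with tags near 0 end up in one person and the two with tags near 5 in the other, without any explicit limb model.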
![Page 47: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/47.jpg)
Associative Embedding: End-to-End Learning for Joint Detection and Grouping, Alejandro Newell, Zhiao Huang, Jia Deng, NIPS 2017
Bottom-Up: Associative Embedding
• Motivation
  • Part Detector (Hourglass) + Assemble (AE)
![Page 48: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/48.jpg)
Associative Embedding: End-to-End Learning for Joint Detection and Grouping, Alejandro Newell, Zhiao Huang, Jia Deng, NIPS 2017
Bottom-Up: Associative Embedding
• Experiments on MPII and COCO
![Page 49: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/49.jpg)
Bottom-Up: Azure Kinect
![Page 50: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/50.jpg)
Azure Kinect DK
Build computer vision and speech models using a developer kit with advanced AI sensors
• Get started with a range of SDKs, including an open-source Sensor SDK.
• Experiment with multiple modes and mounting options.
• Add cognitive services and manage connected PCs with easy Azure integration.
![Page 51: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/51.jpg)
Azure Kinect Body Tracking SDK
• Bottom-up approach
• On IR image
  • Insensitive to environment lighting
• DNN outputs
  • Heat map
  • Part Affinity Field
  • Part Segmentation Map
• SDK outputs
  • 3D skeletons
  • Instance segmentation
![Page 52: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/52.jpg)
Neural Network
Contact: Lijuan Wang
Last Updated: April 20, 2019
![Page 53: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/53.jpg)
![Page 54: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/54.jpg)
Summary for 2D Skeleton
• Top-down vs Bottom-up
  • Top-down: Context & spatial resolution
  • Bottom-up: Assemble
• Remaining issues
  • Crowd
  • Spatial resolution
  • Speed
![Page 55: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/55.jpg)
Outline
• Introduction to Human Pose Estimation
• 2D Skeleton
  • Top-Down
  • Bottom-Up
• 3D Skeleton
  • 2D -> 3D Skeleton
  • 2D -> 3D Shape
• Application
• Conclusion
![Page 56: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/56.jpg)
Benchmark: H3.6M
• Large-scale Constrained 3D Skeleton benchmark
• 3.6M human poses
• Evaluations
• Protocol 1: Six subjects (S1, S5, S6, S7, S8, S9) are used in training. Evaluation is performed on every 64th frame of Subject 11’s videos. Alignment is used.
• Protocol 2: Five subjects (S1, S5, S6, S7, S8) are used for training. Evaluation is performed on every 64th frame of two subjects (S9, S11)
http://vision.imar.ro/human3.6m/description.php
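Both protocols subsample the test videos (every 64th frame) and report a per-joint position error. The sketch below computes the mean per-joint position error, the metric usually reported on Human3.6M, over the subsampled frames. The function names are hypothetical, and the alignment step mentioned in Protocol 1 is deliberately left out; it would be applied to each predicted pose before the error is measured.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error for one frame.

    pred, gt: lists of (x, y, z) joint positions in the same units
    (Human3.6M results are usually quoted in millimetres).
    """
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def evaluate(pred_frames, gt_frames, stride=64):
    """Average MPJPE over every stride-th frame of a sequence."""
    pairs = list(zip(pred_frames, gt_frames))[::stride]
    return sum(mpjpe(p, g) for p, g in pairs) / len(pairs)
```

With `stride=64`, frames 0, 64, 128, and so on are scored and the rest are ignored.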
![Page 57: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/57.jpg)
3D Skeleton: 3D Human Pose Estimation = 2D Pose Estimation + Matching
• Motivation
  • 3D = 2D CNN + NN (nearest-neighbor) match
https://zpascal.net/cvpr2017/Chen_3D_Human_Pose_CVPR_2017_paper.pdf
3D Human Pose Estimation = 2D Pose Estimation + Matching, Ching-Hang Chen, Deva Ramanan, CVPR 2017
• Split or Joint Training
• 3D structure: 2D Joints
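The matching step can be sketched as a lookup: a 2D pose predicted by the CNN is compared with the 2D projections of a library of 3D poses, and the 3D pose of the closest exemplar is returned. The sketch below is a simplified illustration under assumed data structures: the library layout, the normalization, and the function names are hypothetical, and details of the paper such as camera sampling and warping of the exemplar are omitted.

```python
import math


def normalize(pose2d):
    """Remove translation and scale so that poses are comparable."""
    n = len(pose2d)
    cx = sum(x for x, _ in pose2d) / n
    cy = sum(y for _, y in pose2d) / n
    centered = [(x - cx, y - cy) for x, y in pose2d]
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n) or 1.0
    return [(x / scale, y / scale) for x, y in centered]


def pose_distance(a, b):
    """Sum of joint distances between two normalized 2D poses."""
    return sum(math.dist(p, q) for p, q in zip(normalize(a), normalize(b)))


def lift_by_matching(pose2d, library):
    """library: list of (projected_2d_pose, pose_3d) exemplar pairs."""
    best = min(library, key=lambda entry: pose_distance(pose2d, entry[0]))
    return best[1]
```

Because the lookup depends only on 2D joints, the 2D detector and the 3D library can be built and improved independently.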
![Page 58: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/58.jpg)
3D Skeleton: 3D Human Pose Estimation = 2D Pose Estimation + Matching
• Experiments
https://zpascal.net/cvpr2017/Chen_3D_Human_Pose_CVPR_2017_paper.pdf
3D Human Pose Estimation = 2D Pose Estimation + Matching, Ching-Hang Chen, Deva Ramanan, CVPR 2017
![Page 59: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/59.jpg)
3D Skeleton: A simple yet effective baseline for 3d human pose estimation
• Motivation
  • 3D = 2D CNN + Mapping
http://openaccess.thecvf.com/content_ICCV_2017/papers/Martinez_A_Simple_yet_ICCV_2017_paper.pdf
A simple yet effective baseline for 3d human pose estimation, Julieta Martinez, Rayat Hossain, Javier Romero, James J. Little, ICCV 2017
• Split or Joint Training
• 3D structure: 2D Joints
![Page 60: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/60.jpg)
3D Skeleton: A simple yet effective baseline for 3d human pose estimation
• Experiments
http://openaccess.thecvf.com/content_ICCV_2017/papers/Martinez_A_Simple_yet_ICCV_2017_paper.pdf
A simple yet effective baseline for 3d human pose estimation, Julieta Martinez, Rayat Hossain, Javier Romero, James J. Little, ICCV 2017
![Page 61: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/61.jpg)
3D Skeleton: Compositional Human Pose Regression
• Motivation
  • Bone Representation + 2D & 3D Joint training
http://openaccess.thecvf.com/content_ICCV_2017/papers/Sun_Compositional_Human_Pose_ICCV_2017_paper.pdf
Compositional Human Pose Regression, Xiao Sun, Jiaxiang Shang, Shuang Liang, Yichen Wei, ICCV2017
• Split or Joint Training
• 3D structure: 2D Joints + bone
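A bone representation describes each joint relative to its parent in the kinematic tree instead of by its absolute position, and absolute joints are recovered by accumulating bones from the root outwards. The sketch below shows the two conversions on a hypothetical four-joint chain; the `parents` array, the function names, and the convention of storing the root position as the root's own "bone" are assumptions made for illustration.

```python
def joints_to_bones(joints, parents):
    """Bone vector of joint i = joints[i] - joints[parents[i]].

    parents[i] is the index of the parent joint, or -1 for the root,
    whose absolute position is kept unchanged.
    """
    bones = []
    for i, parent in enumerate(parents):
        if parent < 0:
            bones.append(tuple(joints[i]))
        else:
            bones.append(tuple(c - p for c, p in zip(joints[i], joints[parent])))
    return bones


def bones_to_joints(bones, parents):
    """Recover absolute joints by summing bones along the kinematic chain.

    Assumes every parent index appears before its children.
    """
    joints = [None] * len(bones)
    for i, parent in enumerate(parents):
        if parent < 0:
            joints[i] = tuple(bones[i])
        else:
            joints[i] = tuple(b + j for b, j in zip(bones[i], joints[parent]))
    return joints


parents = [-1, 0, 1, 2]  # hypothetical chain: root -> joint 1 -> joint 2 -> joint 3
joints = [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 1, 2)]
bones = joints_to_bones(joints, parents)
recovered = bones_to_joints(bones, parents)
```

A loss defined on bones, or on sums of bones between pairs of joints, ties the error of a joint to the joints it is connected to, rather than treating every joint independently.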
![Page 62: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/62.jpg)
3D Skeleton: Compositional Human Pose Regression
• Experiments
http://openaccess.thecvf.com/content_ICCV_2017/papers/Sun_Compositional_Human_Pose_ICCV_2017_paper.pdf
Compositional Human Pose Regression, Xiao Sun, Jiaxiang Shang, Shuang Liang, Yichen Wei, ICCV2017
![Page 63: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/63.jpg)
3D Skeleton: Integral Human Pose Regression
• Motivation
  • Heatmap vs Regression
    • Heatmap: non-differentiable (argmax), quantization error
    • Regression: misses spatial structure
  • Integral loss
http://openaccess.thecvf.com/content_ICCV_2017/papers/Sun_Compositional_Human_Pose_ICCV_2017_paper.pdf
Integral Human Pose Regression, Xiao Sun, Bin Xiao, Fangyin Wei, Shuang Liang, and Yichen Wei, ECCV2018
• Split or Joint Training
• 3D structure: 3D Heatmaps
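The integral operation replaces the argmax over a heatmap with an expectation: the heatmap is normalized with a softmax and the joint location is the probability-weighted average of all pixel coordinates. The sketch below shows the 2D case on a nested list; the function name is hypothetical, and the 3D case would add a third coordinate in the same way.

```python
import math


def soft_argmax_2d(heatmap):
    """Expected (x, y) location under a softmax of the heatmap.

    Unlike argmax, the result is a smooth function of the heatmap values
    and is not restricted to integer pixel positions.
    """
    peak = max(max(row) for row in heatmap)  # subtract the max for stability
    weights = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


# Two equally strong responses in neighbouring pixels.
x, y = soft_argmax_2d([[-100.0, 10.0, 10.0, -100.0]])
```

In the example the estimate falls halfway between the two pixels, which an argmax at heatmap resolution cannot express, and a loss placed on the output coordinates can be backpropagated into the heatmap.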
![Page 64: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/64.jpg)
3D Skeleton: Integral Human Pose Regression
• Experiments
Integral Human Pose Regression, Xiao Sun, Bin Xiao, Fangyin Wei, Shuang Liang, and Yichen Wei, ECCV2018
![Page 65: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/65.jpg)
3D Shape: DensePose
• Motivation
  • Dense Correspondence
DensePose: Dense Human Pose Estimation In The Wild, Rıza Alp Güler, Natalia Neverova, Iasonas Kokkinos, CVPR2018
![Page 66: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/66.jpg)
3D Shape: DensePose
• Dataset
  • DensePose-COCO Dataset
  • 50K annotated persons in COCO images, over 5M correspondences
  • 24 UV Parts
DensePose: Dense Human Pose Estimation In The Wild, Rıza Alp Güler, Natalia Neverova, Iasonas Kokkinos, CVPR2018
![Page 67: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/67.jpg)
3D Shape: DensePose
• Method
DensePose: Dense Human Pose Estimation In The Wild, Rıza Alp Güler, Natalia Neverova, Iasonas Kokkinos, CVPR2018
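As a rough sketch of how a dense correspondence can be read out per pixel, the snippet below assumes the network emits, for each pixel, 25 part scores (background plus the 24 UV parts) and one (U, V) regression per part. The function name `decode_pixel` and the argument layout are hypothetical, not the DensePose API.

```python
from typing import Optional, Sequence, Tuple

NUM_PARTS = 24  # body parts, each with its own UV chart


def decode_pixel(part_scores: Sequence[float],
                 u_per_part: Sequence[float],
                 v_per_part: Sequence[float]) -> Optional[Tuple[int, float, float]]:
    """Return (part, u, v) for one pixel, or None if it is background.

    part_scores has NUM_PARTS + 1 entries; index 0 is assumed to be background.
    u_per_part and v_per_part hold one regressed coordinate per body part.
    """
    assert len(part_scores) == NUM_PARTS + 1
    best = max(range(len(part_scores)), key=part_scores.__getitem__)
    if best == 0:
        return None
    # Only the UV regression of the winning part is used for this pixel.
    return best, u_per_part[best - 1], v_per_part[best - 1]
```

Splitting the problem into part classification followed by within-part regression keeps each UV regressor local to one chart of the body surface instead of asking a single head to cover the whole surface.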
![Page 68: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/68.jpg)
3D Shape: DensePose
• Experiments
DensePose: Dense Human Pose Estimation In The Wild, Rıza Alp Güler, Natalia Neverova, Iasonas Kokkinos, CVPR2018
![Page 69: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/69.jpg)
Summary for 3D Skeleton
• 3D Representation: 3D Skeleton vs 3D Shape
  • 2D -> 3D Joint -> 3D Shape
• Remaining issues
  • Unconstrained (in the wild) benchmark
  • Ambiguous poses
  • Joint training of both 2D and 3D skeleton data
![Page 70: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/70.jpg)
Outline
• Introduction to Human Pose Estimation
• 2D Skeleton
  • Top-Down
  • Bottom-Up
• 3D Skeleton
  • 2D -> 3D Skeleton
  • 2D -> 3D Shape
• Application
• Conclusion
![Page 71: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/71.jpg)
Application: Action Recognition
![Page 72: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/72.jpg)
Application: Robotics
![Page 73: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/73.jpg)
Application: Human-Computer Interaction
![Page 74: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/74.jpg)
Application: Mobile Applications
![Page 75: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/75.jpg)
Outline
• Introduction to Human Pose Estimation
• 2D Skeleton
  • Top-Down
  • Bottom-Up
• 3D Skeleton
  • 2D -> 3D Skeleton
  • 2D -> 3D Shape
• Application
• Conclusion
![Page 76: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/76.jpg)
Conclusion
• 2D Skeleton (context, resolution) -> 3D Skeleton (regression) -> 3D Shape (representation)
• Many potential applications build on skeletons
  • Action, Interaction, Game
• Improvements in skeleton estimation are a large step forward for industry
![Page 77: Human Pose Estimation and Action Recognition · Cascaded Pyramid Network for Multi-Person Pose Estimation, Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun,](https://reader034.vdocuments.site/reader034/viewer/2022050103/5f41fc1cf140b93fce28bd12/html5/thumbnails/77.jpg)