Total Capture: 3D Human Pose Estimation Fusing Video and Inertial Sensors
M. Trumble, A. Gilbert, C. Malleson, A. Hilton and J. Collomosse
Centre for Vision, Speech and Signal Processing
Tuesday, 03 October 2017 1
Motivation
Image credit: Electronic Arts
Related Work
Wei et al. Convolutional Pose Machines, CVPR 2016
Pavlakos et al. Harvesting Multiple Views for Marker-less 3D Human Pose Annotations, CVPR 2017
Andrews et al. Real-time Physics-based Motion Capture with Sparse Sensors, CVMP 2016
Contributions
• Accurate 3D human pose estimation
• 3D Convolutional Neural Network
• Fusion of video and IMUs
• New multi-modal dataset
Contributions
Total Capture Dataset
• 4 x 6 metre capture volume
• 8 x 1080p60 video cameras
• 13 IMU sensors
• Vicon ground truth labelling
• 5 subjects x 12 sequences
http://cvssp.org/data/totalcapture
Contributions
Total Capture Dataset
Xsens MTw Awinda wireless motion trackers
• Calibrated orientation and acceleration per unit at 60 Hz
Vicon motion capture for testing
• Solved skeleton provided in BVH format, also at 60 Hz
http://cvssp.org/data/totalcapture
Pipeline
Overview
Pipeline
Volumetric Pose Estimation – Probabilistic Visual Hull (PVH)
• Geometric proxy constructed from multi-viewpoint video (MVV)
• Capture volume decimated into 1 cm³ grid
• Voxels assigned probability of occupancy
• Downsampled to 30x30x30 grid for CNN
[Figure: voxel occupancy probability, colour scale from 0 to 1]
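The PVH steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 3x4 projection matrices, and the per-view foreground probability maps are hypothetical inputs, and combining views by a plain product (with zero probability for voxels that fall outside a view) is one simple choice that the slides do not spell out.

```python
import numpy as np


def occupancy_from_views(points, projections, fg_maps):
    """Occupancy per voxel as the product of per-view foreground probabilities.

    points:      (N, 3) voxel centres in world coordinates
    projections: list of (3, 4) camera projection matrices (hypothetical inputs)
    fg_maps:     list of (H, W) foreground probability images with values in [0, 1]
    """
    homog = np.hstack([points, np.ones((len(points), 1))])
    prob = np.ones(len(points))
    for P, fg in zip(projections, fg_maps):
        uvw = homog @ P.T
        depth = uvw[:, 2]
        visible = depth > 1e-9  # keep points in front of the camera
        u = np.zeros(len(points), dtype=int)
        v = np.zeros(len(points), dtype=int)
        u[visible] = np.round(uvw[visible, 0] / depth[visible]).astype(int)
        v[visible] = np.round(uvw[visible, 1] / depth[visible]).astype(int)
        h, w = fg.shape
        visible &= (u >= 0) & (u < w) & (v >= 0) & (v < h)
        # Assumption: a voxel that projects outside a view gets probability 0 there.
        view_prob = np.zeros(len(points))
        view_prob[visible] = fg[v[visible], u[visible]]
        prob *= view_prob
    return prob


def downsample_mean(grid, factor):
    """Reduce a cubic occupancy grid by averaging factor^3 blocks of voxels."""
    d, h, w = grid.shape
    if d % factor or h % factor or w % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    blocks = grid.reshape(d // factor, factor, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3, 5))


# Example: a 60x60x60 grid averaged down to the 30x30x30 CNN input size.
fine_grid = np.ones((60, 60, 60))
coarse_grid = downsample_mean(fine_grid, 2)
```

Block averaging keeps the downsampled values interpretable as probabilities in [0, 1]; other resampling schemes would work equally well for a sketch like this.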
Pipeline
Volumetric Pose Estimation – 3D CNN Training
• Trained with stochastic gradient descent to minimize mean squared error over 26 3D joint positions
• 100K unique training poses / 50K test from Total Capture dataset
• Augmented during training with random rotation around vertical axis
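As a rough illustration of the training objective and the augmentation applied to the labels, the sketch below computes a mean squared error over 26 joints and rotates joint targets about the vertical axis. The function names are hypothetical, and treating y as the vertical axis is an assumption; in a full pipeline the same rotation would also have to be applied to the input voxel volume.

```python
import numpy as np

NUM_JOINTS = 26  # from the slide: 26 3D joint positions


def mse_loss(pred, target):
    """Mean squared error over all joint coordinates."""
    return float(np.mean((pred - target) ** 2))


def rotate_about_vertical(joints, angle_rad):
    """Rotate (NUM_JOINTS, 3) joint positions about the y axis (assumed vertical)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rotation = np.array([[c, 0.0, s],
                         [0.0, 1.0, 0.0],
                         [-s, 0.0, c]])
    return joints @ rotation.T


# Example: augment one random pose with a random rotation angle.
rng = np.random.default_rng(0)
joints = rng.normal(size=(NUM_JOINTS, 3))
angle = rng.uniform(0.0, 2.0 * np.pi)
augmented = rotate_about_vertical(joints, angle)
```

The rotation leaves joint heights and bone lengths unchanged, which is why it is a safe augmentation for a subject standing on a flat floor.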
Pipeline
Inertial Pose Estimation
• 13 inertial measurement units (IMUs)
• Arms and legs, feet, head, sternum and pelvis
• Manual calibration to an initial T-pose
• Joint angles inferred by forward kinematics
Pipeline
Inertial Pose Estimation – forward kinematics
Assume fixed relative orientation between each IMU $k \in [1, 13]$ and bone: $R_{ib}^k$
Global bone orientation $R_b^k = (R_{ib}^k)^{-1} R_{iw}^k R_{im}^k$
where $R_{iw}^k$ is the IMU reference frame in global coordinates and $R_{im}^k$ is the local IMU measurement
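The global bone orientation can be written directly in NumPy. This is a minimal sketch assuming all orientations are 3x3 rotation matrices, so that the inverse is the transpose; the helper `rot_z` and the example angles are purely illustrative.

```python
import numpy as np


def rot_z(degrees):
    """Rotation matrix for a rotation about the z axis (illustrative helper)."""
    a = np.deg2rad(degrees)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


def bone_orientation(R_ib, R_iw, R_im):
    """Global bone orientation R_b = (R_ib)^-1 R_iw R_im.

    R_ib: fixed IMU-to-bone offset, R_iw: IMU reference frame in global
    coordinates, R_im: local IMU measurement. For rotation matrices the
    inverse equals the transpose.
    """
    return R_ib.T @ R_iw @ R_im


# Example with made-up angles: offsets of 30 and 10 degrees, measurement of 50 degrees.
R_b = bone_orientation(rot_z(30), rot_z(10), rot_z(50))
```

Because rotations about a common axis commute, this example reduces to a single 30 degree rotation about z (-30 + 10 + 50), which makes it easy to check by hand.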
Pipeline
Inertial Pose Estimation – forward kinematics
Local joint rotation $R_h^i = R_b^i (R_b^{par(i)})^{-1}$
Inferred from parent bone, $par(i)$, by forward kinematics beginning at root node
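The local joint rotation can be sketched over a small hierarchy as follows. The bone names, the dictionary layout, and the convention that the root's local rotation equals its global orientation are assumptions made for illustration only.

```python
import numpy as np


def rot_z(degrees):
    """Rotation matrix for a rotation about the z axis (illustrative helper)."""
    a = np.deg2rad(degrees)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


def local_joint_rotations(global_rotations, parent):
    """Apply R_h^i = R_b^i (R_b^par(i))^-1 to every bone in the hierarchy.

    global_rotations: dict of bone name -> 3x3 global bone orientation
    parent:           dict of bone name -> parent bone name, or None for the root
    """
    local = {}
    for name, R_b in global_rotations.items():
        par = parent[name]
        if par is None:
            local[name] = R_b  # assumption: the root keeps its global orientation
        else:
            local[name] = R_b @ global_rotations[par].T  # inverse == transpose
    return local


# Hypothetical three-bone chain with made-up global orientations.
parent = {"pelvis": None, "spine": "pelvis", "head": "spine"}
global_rotations = {"pelvis": rot_z(20), "spine": rot_z(50), "head": rot_z(45)}
local = local_joint_rotations(global_rotations, parent)
```

In this example the spine's local rotation is 30 degrees about z relative to the pelvis, and the head's is -5 degrees relative to the spine.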
Pipeline
Temporal Sequence Prediction (TSP)
• Long Short Term Memory RNN (LSTM)
• Exploits temporal nature of motion
• Independent model for each modality
• Learns joint locations based on previous 5 frames
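One way to prepare training data for such a model is to slice each sequence into overlapping 5-frame windows. The sketch below is an assumption about the data layout, not a detail given on the slides: the function name is hypothetical, and pairing each window with the ground-truth pose at its final frame is one plausible reading of "based on previous 5 frames".

```python
import numpy as np


def make_windows(estimates, targets, length=5):
    """Slice a pose sequence into overlapping windows for sequence learning.

    estimates: (T, D) per-frame pose estimates from one modality
    targets:   (T, D) ground-truth poses
    Returns X with shape (T - length + 1, length, D) and y with shape
    (T - length + 1, D), where y[i] is the target at the last frame of X[i].
    """
    count = len(estimates) - length + 1
    if count <= 0:
        raise ValueError("sequence is shorter than the window length")
    X = np.stack([estimates[i:i + length] for i in range(count)])
    y = targets[length - 1:]
    return X, y


# Example: 8 frames of 26 joints x 3 coordinates, flattened to 78 values per frame.
rng = np.random.default_rng(0)
estimates = rng.normal(size=(8, 78))
targets = rng.normal(size=(8, 78))
X, y = make_windows(estimates, targets)
```

Each modality would get its own set of windows, matching the slide's note that an independent model is trained per modality.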
Pipeline
Temporal Sequence Prediction (TSP) – LSTM details
Memory cell, $c_t = f_t \circ c_{t-1} + i_t \circ \sigma_h(W_x x_t + U_c h_{t-1} + b_c)$
Input gate $i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i)$
Forget gate $f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)$
Output gate $o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o)$
Input vector $x_t$, output vector $h_t = o_t \circ \sigma_h(c_t)$, learnt weights $W$ and $U$, sigmoid function $\sigma_g$, hyperbolic tangent $\sigma_h$, vector constant $b$
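The LSTM equations on these slides translate almost line for line into NumPy. The sketch below is a single time step under stated assumptions: the dictionary layout for the weights is hypothetical, and the key "c" holds the memory cell weights (written $W_x$, $U_c$, $b_c$ on the slides).

```python
import numpy as np


def sigmoid(z):
    """Sigmoid function, sigma_g on the slides."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W, U, b are dicts keyed by "i", "f", "o" and "c" holding the input
    weights, recurrent weights and bias vectors of each gate and of the
    memory cell. np.tanh plays the role of sigma_h.
    """
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate
    candidate = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])
    c_t = f_t * c_prev + i_t * candidate  # memory cell
    h_t = o_t * np.tanh(c_t)              # output vector
    return h_t, c_t


# Example with zero weights: every gate evaluates to 0.5 and the candidate to 0.
n_in, n_hid = 3, 2
W = {k: np.zeros((n_hid, n_in)) for k in "ifoc"}
U = {k: np.zeros((n_hid, n_hid)) for k in "ifoc"}
b = {k: np.zeros(n_hid) for k in "ifoc"}
h_t, c_t = lstm_step(np.ones(n_in), np.zeros(n_hid), np.ones(n_hid), W, U, b)
```

With all weights at zero the cell state is simply halved by the forget gate, which gives a quick sanity check on the gating arithmetic.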
Evaluation – video branch
Human 3.6M
[Figure panels: PVH Only | PVH + TSP | Ground Truth | Source]
Evaluation – video branch
Human 3.6M
Average per joint error in millimetres
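For reference, an average per joint error of this kind is commonly computed as the mean Euclidean distance between predicted and ground-truth joint positions. The sketch below assumes that definition and positions in millimetres; the function name is hypothetical and the exact evaluation protocol is not given on the slide.

```python
import numpy as np


def mean_per_joint_error(pred, target):
    """Mean Euclidean distance between predicted and true joint positions.

    pred, target: arrays of shape (frames, joints, 3), assumed to be in mm.
    """
    return float(np.mean(np.linalg.norm(pred - target, axis=-1)))


# Example: every joint is offset by (3, 4, 0) mm, so the error is 5 mm.
target = np.zeros((2, 26, 3))
pred = target + np.array([3.0, 4.0, 0.0])
error_mm = mean_per_joint_error(pred, target)
```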
Pipeline
Fusion layer
Evaluation – full pipeline
Total Capture Dataset – Full Pipeline
[Figure panels: PVH + TSP | IMU + TSP | Fusion | Source]
Evaluation – full pipeline
Total Capture Dataset
Average per joint error in millimetres
Evaluation – full pipeline
Total Capture Dataset – Full Pipeline
[Figure panels: PVH + TSP | IMU + TSP | Fusion | Source]
Evaluation
Training data volume and PVH resolution
Training data randomly sampled from ~100k MVV frames
[Plots: training data volume; PVH resolution, 16x16x16 to 48x48x48]
Evaluation
Camera ablation study
[Plot: relative accuracy change (mm/joint) with 4, 6 and 8 cameras]
Conclusion
• Novel 3D human pose estimation fusing MVV and IMU signals
• Demonstrates high accuracy and complementary nature of the two modalities
• New hybrid MVV dataset including video, IMU and 3D ground truth
http://cvssp.org/data/totalcapture