
A Non-parametric Bayesian Network Prior of Human Pose
Andreas M. Lehrmann¹, Peter V. Gehler¹, Sebastian Nowozin²
MPI for Intelligent Systems¹, Microsoft Research Cambridge²

• We propose a general-purpose Bayesian network prior of human pose.

• Fully non-parametric: Estimation of both optimal information-theoretic topology and local conditional distributions from data.

• Compositional: Effective handling of the combinatorial explosion of articulated objects, thereby improving generalization.

• Superior performance: Better data representation than traditional global models and parametric networks on the large Human 3.6M dataset.

• Real-time: Fast and accurate computation of approximate likelihoods on datasets with up to 100k training poses.

• We compute expected log-likelihoods for our Chow-Liu/CKDE model and several baselines on the Human 3.6M dataset.

Figure 1: Samples drawn from a single Chow-Liu/CKDE model.

[Plots: mean absolute error of the approximate log-likelihood (nats, log scale) and mean runtime (ms) versus the number of core clusters (4, 25, 50); the approximate LL at 4 core clusters is annotated with −82% runtime relative to the exact LL. Panels: Exact LL, Approximate LL, approximation with 4 core clusters. Legend: training data, test data, cluster centers, exact evaluations, approx. evaluations.]

• Our formulation allows substructures to be combined freely, but only if they do not share much information.

⇒ Compositionality exactly where needed and only where appropriate.

Perceiving Systems – ps.is.tue.mpg.de

2 Non-parametric Networks

[Figures: training samples and samples from the model for “Standing”, “Sitting”, and “Kneeling / Lying”; inferred network over the actions “neutral”, “wave left”, “wave right”, “wave both”, with mixed labels such as “wave right”: 50% / “wave left”: 50%.]

Table 1: Expected log-likelihoods.

Method                   Graph structure              Training   Testing
Gaussian                 Global                        −266.84   −271.15
KDE                      Global                        −239.61   −263.77
GPLVM *                  Global                        −327.85   −341.89
Gaussian linear network  Independent                   −352.80   −345.94
                         Kinematic chain (order 1)     −311.54   −310.98
                         Kinematic chain (order 2)     −305.54   −307.88
                         Chow-Liu tree                 −283.82   −284.03
CKDE network             Independent                   −322.64   −322.25
                         Kinematic chain (order 1)     −260.04   −270.52
                         Kinematic chain (order 2)     −247.35   −263.83
                         Chow-Liu tree (ours)          −242.24   −254.98

* 25% subsampling; FITC

[Diagrams: kinematic chain, mutual information, Chow-Liu tree.]

• Learning the conditional distributions:
We use a conditional kernel density estimate (CKDE) to learn the local models of the inferred tree,

$$p\left(X_j \mid X_{\mathrm{pa}(j)}\right) = \frac{p\left(X_j, X_{\mathrm{pa}(j)}\right)}{\int_{X_j} p\left(X_j, X_{\mathrm{pa}(j)}\right)\,\mathrm{d}X_j} = \frac{\sum_i \mathcal{N}\!\left((X_j, X_{\mathrm{pa}(j)}) \,\middle|\, (X_j^{(i)}, X_{\mathrm{pa}(j)}^{(i)}),\ BB^\top\right)}{\sum_i \mathcal{N}\!\left(X_{\mathrm{pa}(j)} \,\middle|\, X_{\mathrm{pa}(j)}^{(i)},\ (BB^\top)\big|_{X_{\mathrm{pa}(j)}}\right)},$$

where $p\left(X_j, X_{\mathrm{pa}(j)}\right)$ is an unconditional KDE with an isotropic Gaussian kernel and bandwidth $B$ proportional to the square root of the covariance.
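In practice, evaluating this ratio amounts to two kernel sums over the training poses. The sketch below is only an illustration under simplifying assumptions (a diagonal bandwidth matrix, so the parent block of $BB^\top$ is trivial to extract; variable names such as `X_j_train` are hypothetical), not the authors' implementation:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def ckde_conditional_logpdf(x_j, x_pa, X_j_train, X_pa_train, b_j, b_pa):
    """Evaluate log p(x_j | x_pa) as a ratio of two kernel sums.

    Assumes a diagonal bandwidth matrix B = diag(b_j, b_pa), so the
    restriction (BB^T)|_{X_pa(j)} is simply diag(b_pa)^2.
    X_j_train, X_pa_train are (N, d_j) and (N, d_pa) arrays of training poses.
    """
    cov_joint = np.diag(np.concatenate([b_j, b_pa]) ** 2)   # BB^T
    cov_pa = np.diag(np.asarray(b_pa) ** 2)                  # parent block of BB^T

    # Kernels centred at the training samples, evaluated at the query pose.
    Z = np.hstack([X_j_train, X_pa_train])                   # (N, d_j + d_pa)
    q = np.concatenate([x_j, x_pa])
    log_num = multivariate_normal.logpdf(Z - q, cov=cov_joint)
    log_den = multivariate_normal.logpdf(X_pa_train - x_pa, cov=cov_pa)

    # The 1/N normalisations of numerator and denominator cancel in the ratio.
    return logsumexp(log_num) - logsumexp(log_den)
```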

• Important operations are efficient:

– Computation of a log-likelihood requires O(|V |) KDE evaluations.

– Ancestral sampling requires O(|V |) samples from the local models.

[The local CKDE models are Gaussian mixture models with a non-uniform weight distribution.]
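To make the two bullet points concrete, here is an illustrative Python sketch of both operations over a parent-array representation of the tree (not the authors' code); `local_logpdf` and `local_sample` are hypothetical callables wrapping the CKDE local models:

```python
import numpy as np

def topological_order(parents):
    """Return node indices so that every parent precedes its children."""
    emitted, order = set(), []
    while len(order) < len(parents):
        for j, pa in enumerate(parents):
            if j not in emitted and (pa < 0 or pa in emitted):
                emitted.add(j)
                order.append(j)
    return order

def tree_log_likelihood(x, parents, local_logpdf):
    """Sum of |V| local CKDE evaluations, one per node of the tree."""
    return sum(local_logpdf(j, x[j], None if pa < 0 else x[pa])
               for j, pa in enumerate(parents))

def ancestral_sample(parents, local_sample, rng=None):
    """Draw one pose with |V| local samples, visiting parents before children."""
    rng = rng or np.random.default_rng()
    x = [None] * len(parents)
    for j in topological_order(parents):
        pa = parents[j]
        x[j] = local_sample(j, None if pa < 0 else x[pa], rng)
    return x
```

Both loops touch each of the $|V|$ nodes exactly once, matching the stated complexity.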


[Pipeline: 3D pose dataset → learn topology / local models → non-parametric Bayesian network prior of human pose → score test 3D skeletal data, e.g., for pose estimation.]

1 Overview

3 Compositionality & Generalization

4 Live Scoring

References

[1] C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 1968.

[2] A. Gray and A. Moore. Nonparametric density estimation: Toward computational tractability. SIAM International Conference on Data Mining, 2003.

[3] C. Ionescu, D. Papava, V. Olaru, and C. Sminchisescu. Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments. Technical report, University of Bonn, 2012.

Visit Us

Learn a sparse and non-parametric Bayesian network $\mathcal{B} = (p, G(V, E))$.

• Learning the graph structure:
Minimize the KL-divergence between the high-dimensional pose distribution $q(X)$ and the tree-structured network $p(X) = \prod_{j=1}^{|V|} p\left(X_j \mid X_{\mathrm{pa}(j)}\right)$,

$$G := \operatorname*{argmin}_{\mathrm{pa}}\ \mathrm{KL}\left(q(X) \,\middle\|\, p(X)\right) = \mathrm{MST}(G_0),$$

where $G_0$ is the complete graph with edge weights $e_{jk} = \widehat{\mathrm{MI}}(X_j, X_k)$.
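As an illustration of this structure step (not the authors' code), one can estimate pairwise mutual information between joint blocks — here with a simple Gaussian plug-in estimate standing in for $\widehat{\mathrm{MI}}$ — and hand the negated weights to a minimum-spanning-tree routine, which then returns the maximum-mutual-information (Chow-Liu) tree:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def gaussian_mi(X, Y):
    """Plug-in MI estimate for two column blocks, assuming joint Gaussianity."""
    Sx = np.atleast_2d(np.cov(X, rowvar=False))
    Sy = np.atleast_2d(np.cov(Y, rowvar=False))
    Sxy = np.cov(np.hstack([X, Y]), rowvar=False)
    return 0.5 * (np.linalg.slogdet(Sx)[1] + np.linalg.slogdet(Sy)[1]
                  - np.linalg.slogdet(Sxy)[1])

def chow_liu_edges(blocks):
    """blocks: list of (N, d_j) arrays, one per joint X_j.

    Edge weights are the negated MI estimates, so the *minimum* spanning tree
    of the complete graph is the maximum-MI tree. Relies on MI > 0 so that no
    candidate edge is dropped as a zero entry by the sparse MST routine.
    """
    V = len(blocks)
    W = np.zeros((V, V))
    for j in range(V):
        for k in range(j + 1, V):
            W[j, k] = -gaussian_mi(blocks[j], blocks[k])
    tree = minimum_spanning_tree(W)          # upper-triangular sparse result
    return [(int(j), int(k)) for j, k in zip(*tree.nonzero())]
```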

• Applications in real-time environments require additional speed.

• Training: Cluster the training points into clusters $\{C^{(i)}\}_i$ using k-means and build a kd-tree for their centres.

• Testing: Given a test pose $x$, use the kd-tree to compute a k-NN partitioning $\{C^{(i)}\}_i = \mathcal{C}_e(x) \,\dot\cup\, \mathcal{C}_a(x)$ and approximate the likelihood as

$$p(x) \approx (S_e + S_a) \,/\, (N \cdot \det(B)),$$

with

$$S_e = \sum_{C \in \mathcal{C}_e} \sum_{j \in C} \phi\!\left(B^{-1}\bigl(x - x^{(j)}\bigr)\right), \qquad \text{[exact]}$$

$$S_a = \sum_{C \in \mathcal{C}_a} |C|\, \phi\!\left(B^{-1}\bigl(x - \bar{C}\bigr)\right), \qquad \text{[approx.]}$$

where $\phi$ is the Gaussian kernel and $\bar{C}$ and $|C|$ denote the centre and size of cluster $C$, respectively.
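A minimal sketch of the two phases, assuming an isotropic Gaussian kernel $\phi$ and k-means/kd-tree utilities from SciPy (class and parameter names such as `ApproxKDE` and `n_core` are hypothetical, not the released code):

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.spatial import cKDTree
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

class ApproxKDE:
    """Cluster-based approximation of a KDE log-likelihood (illustrative only)."""

    def __init__(self, X, B, n_clusters=50, seed=0):
        # Training: k-means clustering of the poses + kd-tree over the centres.
        self.X, self.B = X, B                                  # (N, d), (d, d)
        self.Binv = np.linalg.inv(B)
        centres, labels = kmeans2(X, n_clusters, minit='++', seed=seed)
        self.members = [X[labels == c] for c in range(n_clusters)]
        self.sizes = np.array([len(m) for m in self.members])
        self.centres = centres
        self.tree = cKDTree(centres)

    def _log_phi(self, U):
        # Standard Gaussian kernel, evaluated at whitened offsets B^{-1}(x - .).
        return multivariate_normal.logpdf(U, cov=np.eye(self.X.shape[1]))

    def loglik(self, x, n_core=4):
        # Testing: core clusters C_e(x) evaluated exactly, the rest via centres.
        _, core = self.tree.query(x, k=n_core)
        core = set(np.atleast_1d(core).tolist())
        terms = []
        for c in range(len(self.centres)):
            if self.sizes[c] == 0:
                continue
            if c in core:                                      # S_e, exact
                U = (x - self.members[c]) @ self.Binv.T
                terms.append(np.atleast_1d(self._log_phi(U)))
            else:                                              # S_a, |C| * phi
                u = self.Binv @ (x - self.centres[c])
                terms.append(np.atleast_1d(np.log(self.sizes[c]) + self._log_phi(u)))
        log_S = logsumexp(np.concatenate(terms))
        # p(x) ~= (S_e + S_a) / (N * det(B)), computed in log space.
        return log_S - np.log(len(self.X)) - np.log(np.linalg.det(self.B))
```

Increasing `n_core` trades runtime for accuracy, in the spirit of the 4/25/50 core-cluster settings shown in the plots.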