Expectation Propagation in Dynamical Systems
Marc Peter Deisenroth
Joint Work with Shakir Mohamed (UBC)
August 10, 2012
Marc Deisenroth (TU Darmstadt) EP in Dynamical Systems 1
Motivation
Figure: Complex time series: motion capture, GDP, climate
Time series in economics, robotics, motion capture, etc. have unknown dynamical structure, are high-dimensional and noisy
Flexible and accurate models: Nonlinear (Gaussian process) dynamical systems (GPDS)
Accurate inference in (GP)DS important for
  Better knowledge about latent structures
  Parameter learning
Outline
1 Inference in Time Series Models
  Filtering and Smoothing
  Expectation Propagation
  Approximating the Partition Function
  Relation to Smoothing
2 EP in Gaussian Process Dynamical Systems
  Gaussian Processes
  Filtering/Smoothing in GPDS
  Expectation Propagation in GPDS
3 Results
Inference in Time Series Models Filtering and Smoothing
Time Series Models
Figure: Graphical model of the time series (latent states $x_{t-1}, x_t, x_{t+1}$; measurements $z_{t-1}, z_t, z_{t+1}$)
$x_t = f(x_{t-1}) + w, \quad w \sim \mathcal{N}(0, Q)$
$z_t = g(x_t) + v, \quad v \sim \mathcal{N}(0, R)$
Latent state $x \in \mathbb{R}^D$
Measurement/observation $z \in \mathbb{R}^E$
Transition function $f$
Measurement function $g$
Inference in Time Series Models
Objective: Posterior distribution over latent variables $x_t$
Filtering (Forward Inference)
  Compute $p(x_t | z_{1:t})$ for $t = 1, \dots, T$
Smoothing (Forward-Backward Inference)
  Compute $p(x_t | z_{1:t})$ for $t = 1, \dots, T$ (forward sweep)
  Compute $p(x_t | z_{1:T})$ for $t = T, \dots, 1$ (backward sweep)
Examples:
  Linear systems: Kalman filter/smoother (Kalman, 1960)
  Nonlinear systems: Approximate inference
    Extended Kalman Filter/Smoother (Kalman, 1959–1961)
    Unscented Kalman Filter/Smoother (Julier & Uhlmann, 1997)
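For the linear-Gaussian case, the forward and backward sweeps can be sketched as a Kalman filter followed by a Rauch-Tung-Striebel smoother. This is a minimal sketch, assuming linear functions $f(x) = Ax$ and $g(x) = Cx$; the function names and the 1-D random-walk example values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def kalman_filter(z, A, C, Q, R, m0, P0):
    """Forward sweep: filtered means/covariances of p(x_t | z_{1:t})."""
    T, D = len(z), len(m0)
    mf, Pf = np.zeros((T, D)), np.zeros((T, D, D))
    m_pred, P_pred = m0, P0                      # prior on the first state
    for t in range(T):
        if t > 0:                                # time update: x_t = A x_{t-1} + w
            m_pred = A @ mf[t - 1]
            P_pred = A @ Pf[t - 1] @ A.T + Q
        S = C @ P_pred @ C.T + R                 # innovation covariance
        K = P_pred @ C.T @ np.linalg.inv(S)      # Kalman gain
        mf[t] = m_pred + K @ (z[t] - C @ m_pred)
        Pf[t] = P_pred - K @ S @ K.T
    return mf, Pf

def rts_smoother(mf, Pf, A, Q):
    """Backward sweep: smoothed means/covariances of p(x_t | z_{1:T})."""
    ms, Ps = mf.copy(), Pf.copy()
    for t in range(len(mf) - 2, -1, -1):
        P_pred = A @ Pf[t] @ A.T + Q
        G = Pf[t] @ A.T @ np.linalg.inv(P_pred)  # smoother gain
        ms[t] = mf[t] + G @ (ms[t + 1] - A @ mf[t])
        Ps[t] = Pf[t] + G @ (Ps[t + 1] - P_pred) @ G.T
    return ms, Ps

# Illustrative 1-D random-walk model (all numbers are assumptions).
rng = np.random.default_rng(0)
A, C = np.array([[1.0]]), np.array([[1.0]])
Q, R = np.array([[0.1]]), np.array([[0.5]])
x = np.cumsum(rng.normal(0.0, np.sqrt(0.1), 20))
z = (x + rng.normal(0.0, np.sqrt(0.5), 20))[:, None]
mf, Pf = kalman_filter(z, A, C, Q, R, np.zeros(1), np.eye(1))
ms, Ps = rts_smoother(mf, Pf, A, Q)
```

At the final time step the smoothed and filtered estimates coincide, and elsewhere the smoothed variances are no larger than the filtered ones, since the smoother conditions on all measurements.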
Machine Learning Perspective
Treat filtering/smoothing as an inference problem in graphical models with hidden variables
Allows for efficient local message passing (distributed)
Messages are unnormalized probability distributions
Iterative refinement of the posterior marginals $p(x_t)$, $t = 1, \dots, T$
  Multiple forward-backward sweeps until global consistency (convergence)
Here: Expectation Propagation (Minka, 2001)
Inference in Time Series Models Expectation Propagation
Expectation Propagation
Figure: Graphical model of the time series and the corresponding factor graph with factors $p(x_{t+1} | x_t)$, $p(z_t | x_t)$, $p(z_{t+1} | x_{t+1})$
Inference in factor graphs
$p(x_t) = \prod_{i=1}^n t_i(x_t)$
$q(x_t) = \prod_{i=1}^n \tilde{t}_i(x_t)$
Approximate factors $\tilde{t}_i$ are members of the Exponential Family (e.g., Multinomial, Gamma, Gaussian)
Find a good approximation such that $q \approx p$
Expectation Propagation
Figure: Moment matching vs. mode matching. Borrowed from Bishop (2006)
EP locally minimizes $\mathrm{KL}(p \| q)$, where $p$ is the true distribution and $q$ is an approximation (from the Exponential Family) to it.
EP = moment matching (unlike Variational Bayes ["mode matching"], which minimizes $\mathrm{KL}(q \| p)$)
EP exploits properties of the Exponential Family: Compute moments of distributions via derivatives of the log-partition function
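The link between minimizing $\mathrm{KL}(p \| q)$ and moment matching can be checked numerically. The sketch below assumes a bimodal target $p$ (an equal-weight mixture of two Gaussians, chosen purely for illustration) on a grid, and compares the moment-matched Gaussian with a Gaussian that locks onto a single mode.

```python
import numpy as np

x = np.linspace(-12.0, 12.0, 6001)   # integration grid
dx = x[1] - x[0]

def log_gauss(x, m, v):
    """Log density of N(m, v) with variance v."""
    return -0.5 * np.log(2 * np.pi * v) - 0.5 * (x - m) ** 2 / v

# Illustrative bimodal target: modes at -2 and +2, component variance 0.5.
p = 0.5 * np.exp(log_gauss(x, -2.0, 0.5)) + 0.5 * np.exp(log_gauss(x, 2.0, 0.5))

# Moments of p by numerical integration.
mean_p = np.sum(x * p) * dx
var_p = np.sum((x - mean_p) ** 2 * p) * dx

def kl_p_q(m, v):
    """KL(p || N(m, v)) by numerical integration on the grid."""
    return np.sum(p * (np.log(p) - log_gauss(x, m, v))) * dx

kl_moment_matched = kl_p_q(mean_p, var_p)   # Gaussian with the moments of p
kl_single_mode = kl_p_q(2.0, 0.5)           # Gaussian sitting on one mode
```

Among all Gaussians, the one with the mean and variance of $p$ gives the smallest $\mathrm{KL}(p \| q)$; the single-mode Gaussian, which is closer to what minimizing $\mathrm{KL}(q \| p)$ tends to produce, scores much worse under this direction of the divergence.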
Expectation Propagation
Figure: Factor graph (left) and fully factored factor graph (right), with messages $q_B$, $q_M$, $q_C$ attached to $x_t$ and $x_{t+1}$.
Write down the (fully factored) factor graph
$p(x_t) = \prod_{i=1}^n t_i(x_t)$
$q(x_t) = \prod_{i=1}^n \tilde{t}_i(x_t)$
Find approximate factors $\tilde{t}_i$ such that $\mathrm{KL}(p \| q)$ is minimized.
Multiple sweeps through the graph until global consistency of the messages is assured
Messages in a Dynamical System
Figure: Fully factored factor graph with messages $q_B$, $q_M$, $q_C$ at $x_t$ and $x_{t+1}$
Approximate (factored) marginal: $q(x_t) = \prod_i \tilde{t}_i(x_t)$
Here, our messages $\tilde{t}_i$ have names:
  Measurement message $q_M$
  Forward message $q_B$
  Backward message $q_C$
Define cavity distribution: $q^{\setminus i}(x_t) = q(x_t) / \tilde{t}_i(x_t) = \prod_{k \neq i} \tilde{t}_k(x_t)$
Gaussian EP in More Detail
Figure: Fully factored factor graph with messages $q_B$, $q_M$, $q_C$ at $x_t$ and $x_{t+1}$
1 Write down the factor graph
2 Initialize all messages $\tilde{t}_i$, $i = M, B, C$
Until convergence:
3 For all latent variables $x_t$ and corresponding messages $\tilde{t}_i(x_t)$ do
  1 Compute the cavity distribution $q^{\setminus i}(x_t) = \mathcal{N}(x_t | \mu_t^{\setminus i}, \Sigma_t^{\setminus i})$ by Gaussian division.
  2 Compute the moments of $t_i(x_t)\, q^{\setminus i}(x_t)$. This gives the updated moments of $q(x_t)$
  3 Compute updated message $\tilde{t}_i(x_t) = q(x_t) / q^{\setminus i}(x_t)$
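The three inner steps can be sketched for a single scalar message. This is a minimal sketch under stated assumptions: a 1-D state, Gaussian division done in natural parameters, and the moments of $t_i\, q^{\setminus i}$ computed by brute-force quadrature on a grid (a stand-in for the analytic or approximate moment computations discussed later). All function names are hypothetical.

```python
import numpy as np

def to_natural(m, v):
    """(mean, variance) -> (precision, precision * mean)."""
    return 1.0 / v, m / v

def to_moments(lam, eta):
    """(precision, precision * mean) -> (mean, variance)."""
    return eta / lam, 1.0 / lam

def ep_site_update(q, site, true_factor, grid):
    """One EP update of a single Gaussian message; q and site are (mean, variance)."""
    lam_q, eta_q = to_natural(*q)
    lam_s, eta_s = to_natural(*site)
    # Step 1: cavity = q / site (Gaussian division subtracts natural parameters)
    m_c, v_c = to_moments(lam_q - lam_s, eta_q - eta_s)
    # Step 2: moments of true_factor(x) * cavity(x), here by quadrature
    cavity = np.exp(-0.5 * (grid - m_c) ** 2 / v_c) / np.sqrt(2 * np.pi * v_c)
    tilted = true_factor(grid) * cavity
    dx = grid[1] - grid[0]
    Z = tilted.sum() * dx                      # partition function
    m_new = (grid * tilted).sum() * dx / Z
    v_new = ((grid - m_new) ** 2 * tilted).sum() * dx / Z
    # Step 3: updated message = q_new / cavity
    lam_n, eta_n = to_natural(m_new, v_new)
    lam_c, eta_c = to_natural(m_c, v_c)
    site_new = to_moments(lam_n - lam_c, eta_n - eta_c)
    return (m_new, v_new), site_new, Z

# Illustrative check with a Gaussian true factor, where EP is exact.
grid = np.linspace(-10.0, 10.0, 20001)
z_obs, noise_var = 1.0, 0.5
likelihood = lambda x: np.exp(-0.5 * (z_obs - x) ** 2 / noise_var) / np.sqrt(2 * np.pi * noise_var)
prior, site = (0.0, 1.0), (0.0, 1e6)           # nearly flat initial message
lam_p, eta_p = to_natural(*prior)
lam_s, eta_s = to_natural(*site)
q = to_moments(lam_p + lam_s, eta_p + eta_s)   # q = prior * site
q_new, site_new, Z = ep_site_update(q, site, likelihood, grid)
```

With a Gaussian true factor the update recovers the exact posterior (mean 2/3, variance 1/3 for these numbers) and the message becomes the likelihood itself, which is a convenient sanity check before plugging in a nonlinear factor.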
Updating the Measurement Message
Figure: Latent variable $x_t$ with messages $q_B(x_t)$, $q_M(x_t)$, $q_C(x_t)$
Measurement message
$$q_M(x_t) = \frac{\mathrm{proj}\big[\,\overbrace{t_M(x_t)}^{\text{true factor}}\;\overbrace{q^{\setminus M}(x_t)}^{\text{cavity distr.}}\,\big]}{q^{\setminus M}(x_t)}$$
The proj[.] operator projects onto Exponential Family distributions
Implemented by taking derivatives of the log partition function $\log Z_M$, where
$$Z_M = \int t_M(x_t)\, q^{\setminus M}(x_t)\, dx_t, \qquad t_M(x_t) = p(z_t | x_t)$$
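For a Gaussian cavity $q^{\setminus M}(x_t) = \mathcal{N}(x_t | \mu^{\setminus M}, \Sigma^{\setminus M})$, the projection can be written explicitly in terms of the derivatives of $\log Z_M$. The following is the standard Gaussian EP moment update (Minka, 2001), stated here with the gradient $\nabla_\mu$ taken as a row vector, which is a layout assumption:

```latex
\begin{align}
\nabla_\mu &= \frac{\partial \log Z_M}{\partial \mu^{\setminus M}}, \qquad
\nabla_\Sigma = \frac{\partial \log Z_M}{\partial \Sigma^{\setminus M}} \\
\mu^{\text{new}} &= \mu^{\setminus M} + \Sigma^{\setminus M} \nabla_\mu^\top \\
\Sigma^{\text{new}} &= \Sigma^{\setminus M}
  - \Sigma^{\setminus M} \left( \nabla_\mu^\top \nabla_\mu - 2 \nabla_\Sigma \right) \Sigma^{\setminus M}
\end{align}
```

The updated measurement message then follows by Gaussian division, $q_M(x_t) = \mathcal{N}(x_t | \mu^{\text{new}}, \Sigma^{\text{new}}) / q^{\setminus M}(x_t)$. In the scalar linear-Gaussian case these expressions reduce to the usual Kalman measurement update.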
Updating in Context: Forward Message
Figure: Factor graph with transition factor $p(x_{t+1} | x_t)$ and fully factored factor graph with messages $q_B$, $q_M$, $q_C$
Forward message: Need to take the coupling between $x_t$ and $x_{t+1}$ into account (lost when writing down the fully factored factor graph).
Key insight: Want a close approximation
$$\underbrace{q_C(x_{t+1})\, q_M(x_{t+1})}_{\text{context } q^{\setminus B}(x_{t+1})}\; q_B(x_{t+1}) \approx q^{\setminus B}(x_{t+1}) \int p(x_{t+1} | x_t)\, q_B(x_t)\, q_M(x_t)\, dx_t$$
Achieve this by projection
$$q_B(x_{t+1}) = \frac{\mathrm{proj}\big[\,\overbrace{q^{\setminus B}(x_{t+1})}^{\text{cavity distr.}}\;\overbrace{t_B(x_{t+1})}^{\text{true factor}}\,\big]}{q^{\setminus B}(x_{t+1})}, \qquad t_B(x_{t+1}) = \int p(x_{t+1} | x_t)\, q_B(x_t)\, q_M(x_t)\, dx_t$$
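When the transition is linear, the true factor $t_B$ has a closed form up to a normalization constant. The sketch below assumes $p(x_{t+1} | x_t) = \mathcal{N}(x_{t+1} | A x_t, Q)$ and Gaussian messages $q_B(x_t)$, $q_M(x_t)$ given as (mean, covariance) pairs; the function names are hypothetical.

```python
import numpy as np

def gaussian_product(m1, P1, m2, P2):
    """Mean and covariance of the renormalized product of two Gaussians."""
    L1, L2 = np.linalg.inv(P1), np.linalg.inv(P2)   # precisions add
    P = np.linalg.inv(L1 + L2)
    return P @ (L1 @ m1 + L2 @ m2), P

def forward_factor_linear(qB, qM, A, Q):
    """Moments of t_B(x_{t+1}) = int N(x_{t+1} | A x_t, Q) q_B(x_t) q_M(x_t) dx_t,
    ignoring the constant scale factor of the product q_B * q_M."""
    m, P = gaussian_product(*qB, *qM)               # combine messages at x_t
    return A @ m, A @ P @ A.T + Q                   # push through the dynamics

# Illustrative 1-D numbers (assumptions).
qB = (np.array([0.0]), np.array([[1.0]]))
qM = (np.array([1.0]), np.array([[0.5]]))
A, Q = np.array([[2.0]]), np.array([[0.1]])
m_tB, P_tB = forward_factor_linear(qB, qM, A, Q)
```

For a nonlinear transition this integral is no longer Gaussian, which is where the approximation of the partition function comes in.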
Inference in Time Series Models Approximating the Partition Function
Key Points and Challenge
EP is based on matching the moments of $t_i(x_t)\, q^{\setminus i}(x_t)$
Computing the partition function
$$Z_i(\mu_t^{\setminus i}, \Sigma_t^{\setminus i}) = \int t_i(x_t)\, q^{\setminus i}(x_t)\, dx_t$$
and its derivatives with respect to $\mu_t^{\setminus i}$ and $\Sigma_t^{\setminus i}$ are sufficient for EP (properties of the Exponential Family)
Tricky part: Integral not solvable for nonlinear systems with continuous variables
Approach
Interpretation of partition function $Z_i$ as a probability distribution.
Example: Measurement message
$$Z_M = \int t_M(x)\, q^{\setminus M}(x)\, dx = \int p(z | x)\, q^{\setminus M}(x)\, dx = p(z)$$
Idea: Approximate $p(z)$ by a (Gaussian) distribution $\tilde{Z}_M$
Take the derivatives of $\log \tilde{Z}_M$ with respect to the moments of the cavity distribution
Get updated moments for the posterior and the messages
Fixes the intractability problems, but we are no longer exact
Possible Gaussian Approximations
Example: Measurement message
$$Z_M = \int t_M(x)\, q^{\setminus M}(x)\, dx = \int t_M(x)\, \mathcal{N}(x | \mu^{\setminus M}, \Sigma^{\setminus M})\, dx, \qquad t_M(x) = \mathcal{N}(z | g(x), S)$$
Linearize $g$ at $\mu^{\setminus M}$: integral tractable
Gaussian moment matching: compute mean and variance of $Z_M$, then approximate $Z_M$ by a Gaussian with the correct mean/variance
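The two approximations can be compared on a scalar example. This is a sketch under stated assumptions: a 1-D state, $g(x) = \sin(x)$ chosen for illustration, and Gauss-Hermite quadrature standing in for whatever moment computation a concrete smoother uses. Function names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def z_linearized(g, dg, mu, Sigma, S):
    """Moments of z when g is linearized at the cavity mean mu."""
    J = dg(mu)                                   # local slope of g
    return g(mu), J * Sigma * J + S

def z_moment_matched(g, mu, Sigma, S, order=40):
    """Mean and variance of z = g(x) + v, x ~ N(mu, Sigma), by quadrature."""
    nodes, weights = hermegauss(order)           # weight function exp(-x^2 / 2)
    weights = weights / np.sqrt(2 * np.pi)       # normalize to a N(0, 1) average
    gx = g(mu + np.sqrt(Sigma) * nodes)
    mean = np.sum(weights * gx)
    var = np.sum(weights * (gx - mean) ** 2)
    return mean, var + S                         # add measurement noise variance

def log_Z(z, mu_z, Sigma_z):
    """Log of the Gaussian approximation N(z | mu_z, Sigma_z) to Z_M."""
    return -0.5 * np.log(2 * np.pi * Sigma_z) - 0.5 * (z - mu_z) ** 2 / Sigma_z

# Illustrative cavity moments, noise variance and measurement (assumptions).
mu, Sigma, S, z = 0.5, 1.0, 0.01, 0.2
lin_mean, lin_var = z_linearized(np.sin, np.cos, mu, Sigma, S)
mm_mean, mm_var = z_moment_matched(np.sin, mu, Sigma, S)
logZ_lin, logZ_mm = log_Z(z, lin_mean, lin_var), log_Z(z, mm_mean, mm_var)
```

With a broad cavity the linearized mean stays at $g(\mu)$ while the moment-matched mean is pulled toward zero, so the two approximations of $Z_M$ can differ noticeably.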
Inference in Time Series Models Relation to Smoothing
Theoretical Results
$$Z_M = \int t_M(x)\, q^{\setminus M}(x)\, dx = \int t_M(x)\, \mathcal{N}(x | \mu^{\setminus M}, \Sigma^{\setminus M})\, dx, \qquad t_M(x) = \mathcal{N}(z | g(x), S)$$
Relation to Common Filters/Smoothers
  Approximating $Z_M$ by a Gaussian $\tilde{Z}_M$ is equivalent to approximating $p(x, z)$ by a Gaussian, an approximation that is common to almost all filtering algorithms (Deisenroth & Ohlsson, ACC 2011)
Generalizing Common Smoothers
  Linearizing $g(x)$ in $Z_M$ generalizes the EKS to an iterative procedure
  Moment matching generalizes the ADS to an iterative procedure
Interesting Side Effects
To minimize the KL divergence, EP updates require the derivatives
$$\frac{\partial \log Z_M}{\partial \mu^{\setminus M}}, \qquad \frac{\partial \log Z_M}{\partial \Sigma^{\setminus M}}$$
The Gaussian approximation of $Z_M = p(z) \approx \mathcal{N}(\mu_z, \Sigma_z)$ is exact if and only if there is a linear relationship between $x$ and $z$, i.e.,
$$z = Jx, \qquad x \sim \mathcal{N}(\mu^{\setminus M}, \Sigma^{\setminus M})$$
for some $J$. Then $\mu_z, \Sigma_z$ have a special form
Linearity must be explicitly encoded in the partial derivatives!
Example:
$$\frac{\partial \log Z_M}{\partial \mu^{\setminus M}} = \frac{\partial \log Z_M}{\partial \mu_z} \frac{\partial \mu_z}{\partial \mu^{\setminus M}} = (z - \mu_z)^\top \Sigma_z^{-1} J^\top$$
Even if $\mu_z$ is a general function of $\mu^{\setminus M}$ and $\Sigma^{\setminus M}$, this must be ignored. Otherwise: Inconsistent EP updates! (Deisenroth & Mohamed, arXiv preprint, 2012)
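In the linear case the derivative with respect to the cavity mean can be verified by finite differences. The sketch assumes $\mu_z = J \mu^{\setminus M}$ and $\Sigma_z = J \Sigma^{\setminus M} J^\top + S$ with measurement noise covariance $S$, and writes the gradient as a column vector, $J^\top \Sigma_z^{-1} (z - \mu_z)$; the matrix shapes and all numbers are illustrative assumptions.

```python
import numpy as np

def log_Z(mu, Sigma, z, J, S):
    """log N(z | J mu, J Sigma J^T + S) for a linear measurement model."""
    mu_z, Sigma_z = J @ mu, J @ Sigma @ J.T + S
    r = z - mu_z
    _, logdet = np.linalg.slogdet(2 * np.pi * Sigma_z)
    return -0.5 * logdet - 0.5 * r @ np.linalg.solve(Sigma_z, r)

def grad_mu(mu, Sigma, z, J, S):
    """Analytic gradient of log Z with respect to the cavity mean (column vector)."""
    Sigma_z = J @ Sigma @ J.T + S
    return J.T @ np.linalg.solve(Sigma_z, z - J @ mu)

# Illustrative 2-D example.
J = np.array([[1.0, 2.0], [0.0, 1.0]])
mu = np.array([0.3, -0.2])
Sigma = np.array([[1.0, 0.2], [0.2, 0.5]])
S = 0.1 * np.eye(2)
z = np.array([1.0, 0.5])

eps = 1e-6
finite_diff = np.array([
    (log_Z(mu + eps * e, Sigma, z, J, S) - log_Z(mu - eps * e, Sigma, z, J, S)) / (2 * eps)
    for e in np.eye(2)                           # central differences per coordinate
])
analytic = grad_mu(mu, Sigma, z, J, S)
```

The agreement holds because only the dependence of $\mu_z$ on $\mu^{\setminus M}$ through the fixed matrix $J$ enters the derivative, which is the linear structure that the update equations rely on.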
Illustration: Toy Tracking Problem
Figure: State vs. time step (20 time steps), two panels: ground truth with EKS, and ground truth with EP-EKS
Iteratively improving the posteriors via EP can heal the EKS
EP in Gaussian Process Dynamical Systems
Gaussian Process Dynamical Systems
Figure: Graphical model of the time series (latent states $x_{t-1}, x_t, x_{t+1}$; measurements $z_{t-1}, z_t, z_{t+1}$)
$x_t = f(x_{t-1}) + w, \quad w \sim \mathcal{N}(0, Q)$
$z_t = g(x_t) + v, \quad v \sim \mathcal{N}(0, R)$
State $x$ (not observed)
Measurement/observation $z$
GP distribution $p(f)$ over transition function $f$
GP distribution $p(g)$ over measurement function $g$
EP in Gaussian Process Dynamical Systems Gaussian Processes
Gaussian Processes for Flexible Modeling
Non-parametric method: flexible, i.e., shape of function adapts to data
Probabilistic method: consistently describes uncertainties about the unknown function
Sufficient: specification of high-level assumptions (e.g., smoothness)
Automatic trade-off between data-fit and complexity of the function (Occam's razor)
Figure: GP model of a function with input $(x_{t-1}, u_{t-1})$ on the horizontal axis and $x_t$ on the vertical axis
Gaussian Process Regression
Mathematically: Probability distribution over functions
Bayesian inference tractable:
  1 Specify high-level prior beliefs $p(f)$ about the function (e.g., smoothness)
  2 Observe data $X$, $y = f(X) + \varepsilon$
  3 Compute posterior distribution $p(f | X, y)$ over functions
Bayes' theorem:
$$p(f | X, y) = \frac{p(y | X, f)\, p(f)}{p(y | X)}$$
$p(f)$: Prior (over functions)
$p(y | X, f)$: Likelihood (noise model)
$p(f | X, y)$: Posterior (over functions)
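The three steps can be sketched for scalar inputs. This is a minimal sketch, assuming a zero-mean GP prior with a squared-exponential kernel and Gaussian noise; the kernel choice, hyperparameter values and function names are assumptions for illustration and are not specified in the slides.

```python
import numpy as np

def se_kernel(A, B, ell=1.0, sf2=1.0):
    """Squared-exponential covariance between 1-D input arrays A and B."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return sf2 * np.exp(-0.5 * d2 / ell ** 2)

def gp_posterior(X, y, Xs, noise_var=1e-2, ell=1.0, sf2=1.0):
    """Posterior mean and covariance of f at test inputs Xs given data (X, y)."""
    K = se_kernel(X, X, ell, sf2) + noise_var * np.eye(len(X))
    Ks = se_kernel(X, Xs, ell, sf2)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks.T @ alpha
    V = np.linalg.solve(L, Ks)
    cov = se_kernel(Xs, Xs, ell, sf2) - V.T @ V
    return mean, cov

# Illustrative data: a few function values of sin(x).
X = np.linspace(-4.0, 4.0, 8)
y = np.sin(X)
Xs = np.linspace(-5.0, 5.0, 50)
mean, cov = gp_posterior(X, y, Xs)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))   # pointwise posterior uncertainty
```

Near the observed function values the posterior standard deviation shrinks toward the noise level, and far from the data it returns to the prior uncertainty.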
Pictorial Introduction to Gaussian Processes
Figure sequence ($f(x)$ vs. $x$), three steps:
  1 Prior belief about the function.
  2 Observe some function values.
  3 Posterior belief about the function.
EP in Gaussian Process Dynamical Systems Filtering/Smoothing in GPDS
Gaussian Process Dynamical Systems
$x_t = f(x_{t-1}) + w, \quad w \sim \mathcal{N}(0, Q)$
$z_t = g(x_t) + v, \quad v \sim \mathcal{N}(0, R)$
GP distribution $p(f)$ over transition function $f$
GP distribution $p(g)$ over measurement function $g$
Let's talk about inference in GPDSs
Inference in GPDS
Figure: Mapping an input distribution $p(x_{t-1}, u_{t-1})$ through a GP to a distribution $p(\Delta_t)$; panel axes: $(x_{t-1}, u_{t-1})$, $\Delta_t$, $p(x_{t-1}, u_{t-1})$, $p(\Delta_t)$
Objective: Gaussian approximations to the joints $p(x_t, z_t | z_{1:t-1})$ and $p(x_{t-1}, x_t | z_{1:t-1})$ sufficient for Gaussian filtering/smoothing (Deisenroth & Ohlsson, ACC 2011)
Mapping distributions through a GP requires approximations, e.g.,
  Linearization of the posterior GP mean function (red)
  Moment matching (blue)
Filtering/smoothing in GPDS: GP-EKS, GP-ADS, GP-CKS, ... (Deisenroth et al., ICML 2009; Deisenroth et al., IEEE-TAC, 2012)
EP in Gaussian Process Dynamical Systems Expectation Propagation in GPDS
EP in GPDS
Generalize single-sweep forward-backward smoothing in GPDSs to an iterative procedure using EP
Slightly more involved than EP in nonlinear systems (e.g., EP-EKS)
  Also have to average over function distribution (GP)
Key idea the same as before: Approximate the partition function by a Gaussian distribution (Deisenroth & Mohamed, arXiv preprint, 2012)
  Linearization of the posterior mean function (e.g., Ko & Fox, 2009): EP-GPEKS
  Moment matching (e.g., Quinonero-Candela et al., 2003): EP-GPADS
Results
Results: Synthetic Data (1)
Figure: GP model with training set and ground truth ($f(x)$ vs. $x$; curves: ground truth, training data, GP)
$x_{t+1} = 4 \sin(4 x_t) + w, \quad w \sim \mathcal{N}(0, 0.1^2)$
$z_t = 4 \sin(4 x_t) + v, \quad v \sim \mathcal{N}(0, 0.1^2)$
Initial state distribution $p(x_1) = \mathcal{N}(0, 1)$: very broad
30 training points for GP models, randomly selected
Tracking horizon: 20 time steps
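The synthetic setup above can be reproduced in a few lines. This is a sketch under stated assumptions: the random seed is arbitrary, and the training inputs are drawn uniformly from $[-5, 5]$, a range guessed from the plot axis rather than stated on the slide.

```python
import numpy as np

rng = np.random.default_rng(1)                  # arbitrary seed

def f(x):
    """Transition function of the synthetic system."""
    return 4.0 * np.sin(4.0 * x)

def g(x):
    """Measurement function of the synthetic system."""
    return 4.0 * np.sin(4.0 * x)

sigma_w = sigma_v = 0.1                         # noise standard deviations
T = 20                                          # tracking horizon

x = np.zeros(T)
z = np.zeros(T)
x[0] = rng.normal(0.0, 1.0)                     # broad initial state N(0, 1)
for t in range(T):
    if t > 0:
        x[t] = f(x[t - 1]) + sigma_w * rng.normal()
    z[t] = g(x[t]) + sigma_v * rng.normal()

# 30 randomly selected training points for the GP model of f
# (input range [-5, 5] is an assumption).
X_train = rng.uniform(-5.0, 5.0, 30)
y_train = f(X_train) + sigma_w * rng.normal(size=30)
```

The pairs `(X_train, y_train)` would serve as GP training data for the transition function, and `z` as the measurement sequence handed to the smoother.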
Results: Synthetic Data (2)
Figure (a): Posterior trajectories with confidence bounds (state vs. time step): true state, posterior state distribution (EP-GPADS), posterior state distribution (GPADS).
Figure (b): Average NLL per data point as a function of the EP iteration with standard error: EP-GPADS vs. GPADS.
After convergence, the posterior is spot on (left)
Iterating EP greatly improves predictive power (right)
Results: Pendulum Tracking
Pendulum
Method      NLL_x           MAE_x          LPU_x
GPEKS       −0.29 ± 0.30    0.30 ± 0.02    −2.76 ± 0.12
EP-GPEKS    −0.24 ± 0.33    0.31 ± 0.02    −2.77 ± 0.12
GPADS       −0.75 ± 0.06    0.29 ± 0.02    −2.52 ± 0.06
EP-GPADS    −0.79 ± 0.06    0.29 ± 0.02    −2.58 ± 0.04
NLL: negative log likelihood (predictive performance)
MAE: mean absolute error (error of the posterior mean)
LPU: log posterior uncertainty (tightness of the posterior)
Linearization-based inference: Variances too small, EP makes things worse
Moment-matching based inference: Coherent estimates, EP improves posterior
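Two of the reported measures can be sketched for scalar Gaussian posteriors. This is an assumption-laden illustration: the slides do not spell out the exact definitions, so NLL is taken here as the average negative log density of the true state under a Gaussian posterior, and LPU is left out because its definition is not given.

```python
import numpy as np

def gaussian_nll(x_true, mean, var):
    """Average negative log density of x_true under N(mean, var) (assumed definition)."""
    return np.mean(0.5 * np.log(2 * np.pi * var) + 0.5 * (x_true - mean) ** 2 / var)

def mae(x_true, mean):
    """Mean absolute error of the posterior mean."""
    return np.mean(np.abs(x_true - mean))

# Illustrative numbers: same posterior means, different variances.
x_true = np.linspace(-1.0, 1.0, 20)
post_mean = x_true + 0.3                         # constant error of 0.3
nll_overconfident = gaussian_nll(x_true, post_mean, 0.001 * np.ones(20))
nll_calibrated = gaussian_nll(x_true, post_mean, 0.09 * np.ones(20))
error = mae(x_true, post_mean)
```

Both posteriors have the same MAE, but the one with variances that are too small is penalized heavily in NLL, which is the kind of effect described for linearization-based inference.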
Results: Motion Capture Data
10 trials of golf swings recorded at 40 Hz (mocap.cs.cmu.edu)
Observations $z \in \mathbb{R}^{56}$
Latent space $x \in \mathbb{R}^3$
7 training sequences, 3 test sequences
GPDS learning via GPDM approach (Wang et al., 2008)
Summary
General framework for iterative inference in dynamical systems
Key: Approximation of the partition function
Rederive classical filters/smoothers as a special case
Promising results in (GP)DS
http://www.ias.tu-darmstadt.de/Team/MarcDeisenroth
References
[1] C. M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer-Verlag, 2006.
[2] M. P. Deisenroth, M. F. Huber, and U. D. Hanebeck. Analytic Moment-based Gaussian Process Filtering. In L. Bottou and M. L. Littman, editors, Proceedings of the 26th International Conference on Machine Learning, pages 225–232, Montreal, QC, Canada, June 2009. Omnipress.
[3] M. P. Deisenroth and S. Mohamed. Expectation Propagation in Gaussian Process Dynamical Systems, July 2012. http://arxiv.org/abs/1207.2940.
[4] M. P. Deisenroth and H. Ohlsson. A General Perspective on Gaussian Filtering and Smoothing: Explaining Current and Deriving New Algorithms. In Proceedings of the American Control Conference, 2011.
[5] M. P. Deisenroth, R. Turner, M. Huber, U. D. Hanebeck, and C. E. Rasmussen. Robust Filtering and Smoothing with Gaussian Processes. IEEE Transactions on Automatic Control, 57(7):1865–1871, 2012. doi:10.1109/TAC.2011.2179426.
[6] S. J. Julier and J. K. Uhlmann. A New Extension of the Kalman Filter to Nonlinear Systems. In Proceedings of AeroSense: 11th Symposium on Aerospace/Defense Sensing, Simulation and Controls, pages 182–193, 1997.
[7] R. E. Kalman. A New Approach to Linear Filtering and Prediction Problems. Transactions of the ASME, Journal of Basic Engineering, 82(Series D):35–45, 1960.
[8] J. Ko and D. Fox. GP-BayesFilters: Bayesian Filtering using Gaussian Process Prediction and Observation Models. Autonomous Robots, 27(1):75–90, July 2009.
[9] T. P. Minka. A Family of Algorithms for Approximate Bayesian Inference. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, January 2001.
[10] J. Quinonero-Candela, A. Girard, J. Larsen, and C. E. Rasmussen. Propagation of Uncertainty in Bayesian Kernel Models - Application to Multiple-Step Ahead Forecasting. In IEEE International Conference on Acoustics, Speech and Signal Processing, volume 2, pages 701–704, April 2003.
[11] J. M. Wang, D. J. Fleet, and A. Hertzmann. Gaussian Process Dynamical Models for Human Motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2):283–298, 2008.