
Page 1: Expectation Propagation in Dynamical Systems · 8/10/2012 · Linear systems: Kalman filter/smoother (Kalman, 1959) Nonlinear systems: Approximate inference Extended Kalman Filter/Smoother

Expectation Propagation in Dynamical Systems

Marc Peter Deisenroth

Joint Work with Shakir Mohamed (UBC)

August 10, 2012

Marc Deisenroth (TU Darmstadt) EP in Dynamical Systems 1

Motivation

Figure: Complex time series: motion capture, GDP, climate

Time series in economics, robotics, motion capture, etc. have unknown dynamical structure, are high-dimensional and noisy

Flexible and accurate models: nonlinear (Gaussian process) dynamical systems (GPDS)

Accurate inference in (GP)DS is important for
  better knowledge about latent structures
  parameter learning

Outline

1 Inference in Time Series Models
    Filtering and Smoothing
    Expectation Propagation
    Approximating the Partition Function
    Relation to Smoothing

2 EP in Gaussian Process Dynamical Systems
    Gaussian Processes
    Filtering/Smoothing in GPDS
    Expectation Propagation in GPDS

3 Results

Inference in Time Series Models: Filtering and Smoothing

Time Series Models

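Figure: Graphical model with latent states $x_{t-1}, x_t, x_{t+1}$ and observations $z_{t-1}, z_t, z_{t+1}$

$x_t = f(x_{t-1}) + w, \quad w \sim \mathcal{N}(0, Q)$
$z_t = g(x_t) + v, \quad v \sim \mathcal{N}(0, R)$

Latent state $x \in \mathbb{R}^D$
Measurement/observation $z \in \mathbb{R}^E$
Transition function f
Measurement function g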
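To make the model concrete, here is a minimal NumPy sketch that draws one trajectory from a scalar (D = E = 1) instance of it. The particular f, g, noise variances and initial state below are hypothetical choices for illustration, not taken from the talk.

```python
import numpy as np

def simulate_ssm(f, g, x0, Q, R, T, rng):
    """Draw one trajectory of x_t = f(x_{t-1}) + w, z_t = g(x_t) + v (scalar case).

    Q and R are the process and measurement noise variances; x0 is the initial state.
    """
    xs = np.empty(T)
    zs = np.empty(T)
    x_prev = x0
    for t in range(T):
        xs[t] = f(x_prev) + rng.normal(0.0, np.sqrt(Q))  # latent transition
        zs[t] = g(xs[t]) + rng.normal(0.0, np.sqrt(R))   # noisy measurement
        x_prev = xs[t]
    return xs, zs

# Hypothetical transition and measurement functions, for illustration only
rng = np.random.default_rng(0)
xs, zs = simulate_ssm(f=lambda x: 0.9 * x + np.sin(x), g=lambda x: 2.0 * np.sin(x),
                      x0=0.5, Q=0.1, R=0.05, T=20, rng=rng)
```

Only the sequence zs would be available to an inference algorithm; xs plays the role of the unobserved ground truth.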

Inference in Time Series Models

Objective: posterior distribution over the latent variables $x_t$
  Filtering (forward inference): compute $p(x_t \mid z_{1:t})$ for $t = 1, \dots, T$
  Smoothing (forward-backward inference):
    compute $p(x_t \mid z_{1:t})$ for $t = 1, \dots, T$ (forward sweep)
    compute $p(x_t \mid z_{1:T})$ for $t = T, \dots, 1$ (backward sweep)

Examples:
  Linear systems: Kalman filter/smoother (Kalman, 1959)
  Nonlinear systems: approximate inference
    Extended Kalman filter/smoother (Kalman, 1959–1961)
    Unscented Kalman filter/smoother (Julier & Uhlmann, 1997)
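For the linear case, the forward and backward sweeps can be sketched as a scalar Kalman filter followed by a Rauch–Tung–Striebel (RTS) backward pass. This is a minimal illustration, assuming a scalar model $x_t = a x_{t-1} + w$, $z_t = c x_t + v$ with known a, c, Q, R and a Gaussian prior $\mathcal{N}(m_0, P_0)$ on $x_0$; the function and parameter names are hypothetical.

```python
import numpy as np

def kalman_rts(zs, a, c, Q, R, m0, P0):
    """Scalar Kalman filter (forward sweep) and RTS smoother (backward sweep).

    Model: x_t = a x_{t-1} + w, w ~ N(0, Q);  z_t = c x_t + v, v ~ N(0, R);
    prior x_0 ~ N(m0, P0). Returns filtered and smoothed means/variances.
    """
    T = len(zs)
    mp, Pp = np.empty(T), np.empty(T)   # predicted p(x_t | z_{1:t-1})
    mf, Pf = np.empty(T), np.empty(T)   # filtered  p(x_t | z_{1:t})
    m, P = m0, P0
    for t in range(T):
        # Time update (prediction through the transition)
        mp[t] = a * m
        Pp[t] = a * P * a + Q
        # Measurement update
        S = c * Pp[t] * c + R           # innovation variance
        K = Pp[t] * c / S               # Kalman gain
        mf[t] = mp[t] + K * (zs[t] - c * mp[t])
        Pf[t] = Pp[t] - K * c * Pp[t]
        m, P = mf[t], Pf[t]
    # Backward sweep: p(x_t | z_{1:T}), starting from the last filtered marginal
    ms, Ps = mf.copy(), Pf.copy()
    for t in range(T - 2, -1, -1):
        J = Pf[t] * a / Pp[t + 1]       # smoother gain
        ms[t] = mf[t] + J * (ms[t + 1] - mp[t + 1])
        Ps[t] = Pf[t] + J * (Ps[t + 1] - Pp[t + 1]) * J
    return mf, Pf, ms, Ps
```

As a sanity check, with a = c = 1 and Q = 0 the state is constant, so the smoothed marginal at every time step equals the posterior given all measurements.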

Machine Learning Perspective

Treat filtering/smoothing as an inference problem in graphical models with hidden variables

Allows for efficient, local message passing (distributed)

Messages are unnormalized probability distributions

Iterative refinement of the posterior marginals $p(x_t)$, $t = 1, \dots, T$:
  multiple forward-backward sweeps until global consistency (convergence)

Here: Expectation Propagation (Minka, 2001)

Inference in Time Series Models: Expectation Propagation

Expectation Propagation

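Figure: Graphical model and the corresponding factor graph, with transition factor $p(x_{t+1} \mid x_t)$ and measurement factors $p(z_t \mid x_t)$, $p(z_{t+1} \mid x_{t+1})$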

Inference in factor graphs

$p(x_t) = \prod_{i=1}^{n} t_i(x_t)$
$q(x_t) = \prod_{i=1}^{n} \tilde{t}_i(x_t)$

Approximate factors $\tilde{t}_i$ are members of the Exponential Family (e.g., Multinomial, Gamma, Gaussian)

Find a good approximation such that $q \approx p$

Expectation Propagation

Figure: Moment matching vs. mode matching (borrowed from Bishop, 2006)

EP locally minimizes KL(p||q), where p is the true distribution and q is an approximation to it from the Exponential Family.

EP = moment matching (unlike Variational Bayes [“mode matching”], which minimizes KL(q||p))

EP exploits properties of the Exponential Family: moments of distributions can be computed via derivatives of the log-partition function
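A small numerical illustration with made-up numbers: when q is restricted to Gaussians, minimizing KL(p||q) amounts to matching the mean and variance of p, so projecting a bimodal mixture yields one broad Gaussian between the modes. The second part checks, for a 1D Gaussian in natural parameters $\eta_1 = \mu/\sigma^2$, $\eta_2 = -1/(2\sigma^2)$, that derivatives of the log-partition function $A(\eta)$ recover $E[x]$ and $E[x^2]$. The function names are hypothetical.

```python
import numpy as np

def project_mixture_to_gaussian(weights, means, variances):
    """Moment-match (KL(p||q) projection) a 1D Gaussian mixture onto one Gaussian."""
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    var = np.asarray(variances, dtype=float)
    w = w / w.sum()
    mean = np.sum(w * mu)
    second_moment = np.sum(w * (var + mu**2))   # E[x^2] of the mixture
    return mean, second_moment - mean**2

def gaussian_log_partition(eta1, eta2):
    """Log-partition A(eta) of a 1D Gaussian, additive constants dropped."""
    return -eta1**2 / (4.0 * eta2) - 0.5 * np.log(-2.0 * eta2)

# Bimodal p: the projected q sits between the modes with a large variance
m, v = project_mixture_to_gaussian([0.5, 0.5], [-2.0, 2.0], [0.5, 0.5])

# Central differences of A: dA/deta1 = E[x], dA/deta2 = E[x^2]
mu, s2, h = 1.5, 0.7, 1e-6
eta1, eta2 = mu / s2, -0.5 / s2
dA_deta1 = (gaussian_log_partition(eta1 + h, eta2)
            - gaussian_log_partition(eta1 - h, eta2)) / (2 * h)
dA_deta2 = (gaussian_log_partition(eta1, eta2 + h)
            - gaussian_log_partition(eta1, eta2 - h)) / (2 * h)
```

Here m is 0 and v is 4.5, although p has almost no mass at 0; this is the "moment matching" behaviour in the figure, as opposed to locking onto one mode.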

Expectation Propagation

Figure: Factor graph with factors $p(x_{t+1} \mid x_t)$, $p(z_t \mid x_t)$, $p(z_{t+1} \mid x_{t+1})$ (left) and fully factored factor graph with forward messages $q_B$, measurement messages $q_M$ and backward messages $q_C$ (right)

Write down the (fully factored) factor graph

$p(x_t) = \prod_{i=1}^{n} t_i(x_t)$
$q(x_t) = \prod_{i=1}^{n} \tilde{t}_i(x_t)$

Find approximate factors $\tilde{t}_i$ such that KL(p||q) is minimized.

Multiple sweeps through the graph until global consistency of the messages is assured

Messages in a Dynamical System

Figure: Fully factored factor graph with messages $q_B$, $q_M$, $q_C$ at $x_t$ and $x_{t+1}$

Approximate (factored) marginal: $q(x_t) = \prod_i \tilde{t}_i(x_t)$

Here, our messages $\tilde{t}_i$ have names:
  measurement message $q_M$
  forward message $q_B$
  backward message $q_C$

Define the cavity distribution: $q^{\setminus i}(x_t) = q(x_t)/\tilde{t}_i(x_t) = \prod_{k \neq i} \tilde{t}_k(x_t)$

Gaussian EP in More Detail

1 Write down the factor graph

2 Initialize all messages $\tilde{t}_i$, $i = M, B, C$

Until convergence:

3 For all latent variables $x_t$ and corresponding messages $\tilde{t}_i(x_t)$ do
    1 Compute the cavity distribution $q^{\setminus i}(x_t) = \mathcal{N}(x_t \mid \mu_t^{\setminus i}, \Sigma_t^{\setminus i})$ by Gaussian division.
    2 Compute the moments of $t_i(x_t)\, q^{\setminus i}(x_t)$; these are the updated moments of $q(x_t)$
    3 Compute the updated message $\tilde{t}_i(x_t) = q(x_t)/q^{\setminus i}(x_t)$
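The three inner steps can be sketched for a single scalar Gaussian site. This is an illustrative sketch, not the authors' implementation: it assumes a scalar state, stores Gaussians in natural parameters (precision, precision × mean) so that Gaussian division becomes subtraction, and computes the moments of $t_i(x)\,q^{\setminus i}(x)$ by brute-force grid quadrature, which is only feasible in low dimensions. The measurement factor $t_M(x) = \mathcal{N}(z \mid x, r)$ and all numbers are hypothetical.

```python
import numpy as np

def to_natural(mean, var):
    """Scalar Gaussian -> natural parameters (precision, precision * mean)."""
    return 1.0 / var, mean / var

def from_natural(prec, shift):
    """Natural parameters (precision, precision * mean) -> (mean, variance)."""
    return shift / prec, 1.0 / prec

def ep_site_update(q_nat, site_nat, true_factor, grid):
    """One Gaussian EP update of a single scalar site (message) t~_i.

    q_nat, site_nat: natural parameters of the current marginal q and of the site.
    true_factor: callable evaluating the true factor t_i on an array of x values.
    grid: evenly spaced points used to compute the tilted moments numerically.
    """
    # Step 1: cavity q^{\i} = q / t~_i; Gaussian division subtracts natural parameters
    cav_prec = q_nat[0] - site_nat[0]
    cav_shift = q_nat[1] - site_nat[1]
    cav_mean, cav_var = from_natural(cav_prec, cav_shift)
    # Step 2: project t_i(x) q^{\i}(x) onto a Gaussian by matching mean and variance
    dx = grid[1] - grid[0]
    cavity = np.exp(-0.5 * (grid - cav_mean) ** 2 / cav_var) / np.sqrt(2 * np.pi * cav_var)
    tilted = true_factor(grid) * cavity
    Z = np.sum(tilted) * dx
    new_mean = np.sum(grid * tilted) * dx / Z
    new_var = np.sum((grid - new_mean) ** 2 * tilted) * dx / Z
    new_q_nat = to_natural(new_mean, new_var)
    # Step 3: updated site t~_i = q / q^{\i}, again by subtracting natural parameters
    new_site_nat = (new_q_nat[0] - cav_prec, new_q_nat[1] - cav_shift)
    return new_q_nat, new_site_nat

# Hypothetical usage: linear measurement z = x + v, v ~ N(0, r), flat initial site
r, z = 0.5, 1.0
t_M = lambda x: np.exp(-0.5 * (z - x) ** 2 / r) / np.sqrt(2 * np.pi * r)
grid = np.linspace(-10.0, 10.0, 20001)
q_new, site_new = ep_site_update(to_natural(0.0, 1.0), (0.0, 0.0), t_M, grid)
```

With a linear-Gaussian factor the tilted distribution is itself Gaussian, so the updated site has precision $1/r$ and shift $z/r$, i.e. it recovers the measurement likelihood exactly. For nonlinear factors the new site precision can come out negative; damping the update is a common safeguard, not shown here.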

Updating the Measurement Message

Figure: Variable $x_t$ with forward message $q_B$, measurement message $q_M$ and backward message $q_C$

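Measurement message

$q_M(x_t) = \mathrm{proj}\big[\, \overbrace{t_M(x_t)}^{\text{true factor}}\; \overbrace{q^{\setminus M}(x_t)}^{\text{cavity distr.}} \,\big] \,/\, q^{\setminus M}(x_t)$

The proj[·] operator projects onto Exponential Family distributions. It is implemented by taking derivatives of the log-partition function $\log Z_M$, where

$Z_M = \int t_M(x_t)\, q^{\setminus M}(x_t)\, dx_t, \qquad t_M(x_t) = p(z_t \mid x_t)$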

Updating in Context: Forward Message

Figure: Factor graph with transition factor $p(x_{t+1} \mid x_t)$ and the fully factored factor graph with messages $q_B$, $q_M$, $q_C$

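Forward message: we need to take the coupling between $x_t$ and $x_{t+1}$ into account (it is lost when writing down the fully factored factor graph).

Key insight: we want a close approximation

$\underbrace{q_C(x_{t+1})\, q_M(x_{t+1})}_{\text{context } q^{\setminus B}(x_{t+1})}\, q_B(x_{t+1}) \approx q^{\setminus B}(x_{t+1}) \int p(x_{t+1} \mid x_t)\, q_B(x_t)\, q_M(x_t)\, dx_t$

Achieve this by projection:

$q_B(x_{t+1}) = \mathrm{proj}\big[\, \overbrace{q^{\setminus B}(x_{t+1})}^{\text{cavity distr.}}\; \overbrace{t_B(x_{t+1})}^{\text{true factor}} \,\big] \,/\, q^{\setminus B}(x_{t+1}), \qquad t_B(x_{t+1}) = \int p(x_{t+1} \mid x_t)\, q_B(x_t)\, q_M(x_t)\, dx_t$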
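For a linear-Gaussian transition the integral defining $t_B$ has a closed form, and because $t_B$ is then already Gaussian, the projection is exact and $q_B(x_{t+1}) = t_B(x_{t+1})$ up to a constant. The sketch below assumes a hypothetical linear transition $x_{t+1} = A x_t + w$, $w \sim \mathcal{N}(0, Q)$, and Gaussian messages given by means and covariances: it multiplies $q_B(x_t)\, q_M(x_t)$ and pushes the product through the transition (a Kalman-style prediction). Normalizing constants are dropped, since messages are unnormalized anyway.

```python
import numpy as np

def gaussian_product(m1, S1, m2, S2):
    """Mean/cov of the (renormalized) product N(m1, S1) * N(m2, S2)."""
    P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)
    S = np.linalg.inv(P1 + P2)
    m = S @ (P1 @ m1 + P2 @ m2)
    return m, S

def forward_message_linear(mB, SB, mM, SM, A, Q):
    """t_B(x_{t+1}) = int p(x_{t+1}|x_t) q_B(x_t) q_M(x_t) dx_t, up to a constant,
    for the linear-Gaussian transition p(x_{t+1}|x_t) = N(A x_t, Q)."""
    m, S = gaussian_product(mB, SB, mM, SM)   # q_B(x_t) q_M(x_t)
    return A @ m, A @ S @ A.T + Q             # mean and covariance of t_B

# Hypothetical 2D example
A = np.array([[1.0, 0.1], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
mB, SB = np.zeros(2), np.eye(2)
mM, SM = np.array([1.0, 0.0]), np.eye(2)
m_next, S_next = forward_message_linear(mB, SB, mM, SM, A, Q)
```

For a nonlinear f the same integral is no longer Gaussian or tractable, which is what makes the projection step (and the approximations discussed next) necessary.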

Inference in Time Series Models: Approximating the Partition Function

Key Points and Challenge

EP is based on matching the moments of $t_i(x_t)\, q^{\setminus i}(x_t)$

Computing the partition function

$Z_i(\mu_t^{\setminus i}, \Sigma_t^{\setminus i}) = \int t_i(x_t)\, q^{\setminus i}(x_t)\, dx_t$

and its derivatives with respect to $\mu_t^{\setminus i}$ and $\Sigma_t^{\setminus i}$ is sufficient for EP (this follows from properties of the Exponential Family)

Tricky part: the integral cannot be solved analytically for nonlinear systems with continuous variables
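For a Gaussian cavity, the standard identities (written here for a scalar state) are $\mu^{\text{new}} = \mu^{\setminus i} + \Sigma^{\setminus i}\, \partial \log Z_i / \partial \mu^{\setminus i}$ and $\Sigma^{\text{new}} = \Sigma^{\setminus i} - (\Sigma^{\setminus i})^2 \big[ (\partial \log Z_i / \partial \mu^{\setminus i})^2 - 2\, \partial \log Z_i / \partial \Sigma^{\setminus i} \big]$. The sketch below checks them numerically in one dimension, where grid quadrature sidesteps the intractable integral (it does not scale to higher dimensions). The step size, grid and Gaussian test factor are assumptions for illustration.

```python
import numpy as np

def log_Z(factor, mu, var, grid):
    """log of Z(mu, var) = int factor(x) N(x | mu, var) dx, by grid quadrature (1D only)."""
    dx = grid[1] - grid[0]
    cavity = np.exp(-0.5 * (grid - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return np.log(np.sum(factor(grid) * cavity) * dx)

def moments_from_log_Z(factor, mu, var, grid, h=1e-4):
    """Updated mean/variance from central-difference derivatives of log Z
    with respect to the cavity mean and variance."""
    d_mu = (log_Z(factor, mu + h, var, grid) - log_Z(factor, mu - h, var, grid)) / (2 * h)
    d_var = (log_Z(factor, mu, var + h, grid) - log_Z(factor, mu, var - h, grid)) / (2 * h)
    new_mean = mu + var * d_mu
    new_var = var - var**2 * (d_mu**2 - 2.0 * d_var)
    return new_mean, new_var

# Check with a Gaussian factor N(z | x, r): the result must equal the Kalman update
z, r, mu, var = 1.0, 0.5, 0.0, 1.0
factor = lambda x: np.exp(-0.5 * (z - x) ** 2 / r) / np.sqrt(2 * np.pi * r)
grid = np.linspace(-10.0, 10.0, 20001)
new_mean, new_var = moments_from_log_Z(factor, mu, var, grid)
kalman_mean = mu + var / (var + r) * (z - mu)   # 2/3
kalman_var = var * r / (var + r)                # 1/3
```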

Approach

Interpret the partition function $Z_i$ as a probability distribution. Example: measurement message

$Z_M = \int t_M(x)\, q^{\setminus M}(x)\, dx = \int p(z \mid x)\, q^{\setminus M}(x)\, dx = p(z)$

Idea: approximate $p(z)$ by a (Gaussian) distribution $\tilde{Z}_M$

Take the derivatives of $\log \tilde{Z}_M$ with respect to the moments of the cavity distribution

Get updated moments for the posterior and the messages. This fixes the intractability problem, but the updates are no longer exact

Possible Gaussian Approximations

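Example: measurement message

$Z_M = \int t_M(x)\, q^{\setminus M}(x)\, dx = \int t_M(x)\, \mathcal{N}(x \mid \mu^{\setminus M}, \Sigma^{\setminus M})\, dx, \qquad t_M(x) = \mathcal{N}(z \mid g(x), S)$

Linearize g at $\mu^{\setminus M}$: the integral becomes tractable

Gaussian moment matching: compute the mean and variance of $Z_M$ and approximate $Z_M$ by a Gaussian with the correct mean/variance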
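To contrast the two options, consider a scalar example with the hypothetical measurement function $g(x) = \sin(x)$ and cavity $\mathcal{N}(\mu, \sigma^2)$. For a Gaussian input, the mean and variance of $\sin(x)$ are known in closed form ($E[\sin x] = \sin(\mu)\, e^{-\sigma^2/2}$, $E[\sin^2 x] = \tfrac{1}{2}(1 - e^{-2\sigma^2}\cos 2\mu)$), so exact moment matching can be compared with linearization. The measurement noise variance S is added in both cases; all numbers are illustrative.

```python
import numpy as np

def pz_linearized(mu, s2, S):
    """Gaussian approx. of p(z) by linearizing g(x) = sin(x) at the cavity mean."""
    J = np.cos(mu)                       # derivative of sin at mu
    return np.sin(mu), J * s2 * J + S

def pz_moment_matched(mu, s2, S):
    """Exact mean/variance of z = sin(x) + v, x ~ N(mu, s2), v ~ N(0, S)."""
    mean = np.sin(mu) * np.exp(-s2 / 2)
    second = 0.5 * (1.0 - np.exp(-2 * s2) * np.cos(2 * mu))   # E[sin(x)^2]
    return mean, second - mean**2 + S

lin = pz_linearized(0.0, 1.0, 0.1)        # mean 0.0, variance 1.1
mm = pz_moment_matched(0.0, 1.0, 0.1)     # mean 0.0, variance about 0.532
```

In this example linearization reports a variance of about 1.1 for p(z), roughly twice the exact value, because the first-order expansion ignores how $\sin$ saturates away from the expansion point.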

Inference in Time Series Models: Relation to Smoothing

Theoretical Results

$Z_M = \int t_M(x)\, q^{\setminus M}(x)\, dx = \int t_M(x)\, \mathcal{N}(x \mid \mu^{\setminus M}, \Sigma^{\setminus M})\, dx, \qquad t_M(x) = \mathcal{N}(z \mid g(x), S)$

Relation to Common Filters/Smoothers

Approximating $Z_M$ by a Gaussian $\tilde{Z}_M$ is equivalent to approximating $p(x, z)$ by a Gaussian, an approximation that is common to almost all filtering algorithms (Deisenroth & Ohlsson, ACC 2011)

Generalizing Common Smoothers

Linearizing $g(x)$ in $Z_M$ generalizes the extended Kalman smoother (EKS) to an iterative procedure

Moment matching generalizes the assumed density smoother (ADS) to an iterative procedure

Interesting Side Effects

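To minimize the KL divergence, EP updates require the derivatives

$\dfrac{\partial \log Z_M}{\partial \mu^{\setminus M}}, \qquad \dfrac{\partial \log Z_M}{\partial \Sigma^{\setminus M}}$

The Gaussian approximation $Z_M = p(z) \approx \mathcal{N}(\mu_z, \Sigma_z)$ is exact if and only if there is a linear relationship between x and z, i.e.,

$z = Jx, \quad x \sim \mathcal{N}(\mu^{\setminus M}, \Sigma^{\setminus M})$

for some J; then $\mu_z$ and $\Sigma_z$ have a special form (e.g., $\mu_z = J\mu^{\setminus M}$)

Linearity must be explicitly encoded in the partial derivatives!

Example:

$\dfrac{\partial \log Z_M}{\partial \mu^{\setminus M}} = \dfrac{\partial \log Z_M}{\partial \mu_z} \dfrac{\partial \mu_z}{\partial \mu^{\setminus M}} = (z - \mu_z)^\top \Sigma_z^{-1} J$

Even if $\mu_z$ is a general function of $\mu^{\setminus M}$ and $\Sigma^{\setminus M}$, this must be ignored. Otherwise: inconsistent EP updates!¹

¹ Deisenroth & Mohamed (arXiv preprint, 2012)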
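A quick numerical sanity check of the example derivative, assuming a linear measurement with additive noise, $z = Jx + v$, $v \sim \mathcal{N}(0, S)$, so that $\mu_z = J\mu^{\setminus M}$ and $\Sigma_z = J\Sigma^{\setminus M}J^\top + S$. The noise term, dimensions and random numbers are assumptions for illustration. The analytic gradient $(z - \mu_z)^\top \Sigma_z^{-1} J$ is compared with central finite differences of $\log \mathcal{N}(z \mid \mu_z, \Sigma_z)$ with respect to $\mu^{\setminus M}$.

```python
import numpy as np

def log_gauss(z, mean, cov):
    """log N(z | mean, cov) for a vector z."""
    d = z - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + len(z) * np.log(2 * np.pi))

def log_Z_linear(mu, Sigma, z, J, S):
    """log Z_M for the linear model z = J x + v: Z_M = N(z | J mu, J Sigma J^T + S)."""
    return log_gauss(z, J @ mu, J @ Sigma @ J.T + S)

rng = np.random.default_rng(1)
D, E = 3, 2
J = rng.normal(size=(E, D))
mu, Sigma = rng.normal(size=D), np.eye(D)
z, S = rng.normal(size=E), 0.1 * np.eye(E)

mu_z, Sigma_z = J @ mu, J @ Sigma @ J.T + S
analytic = (z - mu_z) @ np.linalg.solve(Sigma_z, J)   # (z - mu_z)^T Sigma_z^{-1} J

h = 1e-6
numeric = np.array([(log_Z_linear(mu + h * e, Sigma, z, J, S)
                     - log_Z_linear(mu - h * e, Sigma, z, J, S)) / (2 * h)
                    for e in np.eye(D)])
```

In this linear case $\Sigma_z$ does not depend on $\mu^{\setminus M}$, so the chain rule through $\mu_z$ alone gives the full gradient.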

Illustration: Toy Tracking Problem

Figure: Toy tracking problem, state against time step (1 to 20), in two panels: ground truth vs. EKS, and ground truth vs. EP-EKS

Iteratively improving the posteriors via EP can heal the EKS

EP in Gaussian Process Dynamical Systems

Gaussian Process Dynamical Systems

$x_t = f(x_{t-1}) + w, \quad w \sim \mathcal{N}(0, Q)$
$z_t = g(x_t) + v, \quad v \sim \mathcal{N}(0, R)$

State x (not observed)
Measurement/observation z
GP distribution p(f) over transition function f
GP distribution p(g) over measurement function g

EP in Gaussian Process Dynamical Systems: Gaussian Processes

Gaussian Processes for Flexible Modeling

Non-parametric method: flexible, i.e., the shape of the function adapts to the data
Probabilistic method: consistently describes uncertainties about the unknown function
Sufficient: specification of high-level assumptions (e.g., smoothness)
Automatic trade-off between data fit and complexity of the function (Occam’s razor)

Figure: GP model with input $(x_{t-1}, u_{t-1})$ and output $x_t$

Gaussian Process Regression

Mathematically: probability distribution over functions

Bayesian inference is tractable:
  1 Specify high-level prior beliefs p(f) about the function (e.g., smoothness)
  2 Observe data X, y = f(X) + ε
  3 Compute the posterior distribution p(f | X, y) over functions

Bayes’ theorem:

$p(f \mid X, y) = \dfrac{p(y \mid X, f)\, p(f)}{p(y \mid X)}$

p(f): prior (over functions)
p(y | X, f): likelihood (noise model)
p(f | X, y): posterior (over functions)
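With Gaussian noise ε the posterior over functions is again a GP, available in closed form. Below is a minimal GP regression sketch in NumPy, assuming 1D inputs, a squared-exponential kernel with hypothetical hyperparameters (length-scale ell, signal variance sf2, noise variance sn2) and made-up data; it returns the posterior mean and variance of f at test inputs.

```python
import numpy as np

def se_kernel(A, B, ell=1.0, sf2=1.0):
    """Squared-exponential covariance k(a, b) = sf2 * exp(-(a - b)^2 / (2 ell^2)), 1D inputs."""
    d = A[:, None] - B[None, :]
    return sf2 * np.exp(-0.5 * d**2 / ell**2)

def gp_posterior(X, y, Xs, ell=1.0, sf2=1.0, sn2=0.01):
    """Posterior mean and variance of f(Xs) given y = f(X) + eps, eps ~ N(0, sn2)."""
    K = se_kernel(X, X, ell, sf2) + sn2 * np.eye(len(X))
    L = np.linalg.cholesky(K)                            # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    Ks = se_kernel(X, Xs, ell, sf2)                      # train/test cross-covariances
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = sf2 - np.sum(v**2, axis=0)                     # diagonal of posterior covariance
    return mean, var

# Hypothetical data: noise-free samples of sin(x), treated as noisy observations
X = np.array([-4.0, -2.0, 0.0, 1.0, 3.0])
y = np.sin(X)
mean, var = gp_posterior(X, y, np.array([0.0, 10.0]))
```

The Cholesky-based solve is a common, numerically stable choice. At a training input the posterior variance drops below the noise variance, while far from the data (x = 10 here) the mean and variance revert to the prior values 0 and sf2.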

Pictorial Introduction to Gaussian Processes

Figure: f(x) against x

Prior belief about the function.

Observe some function values.

Posterior belief about the function.

EP in Gaussian Process Dynamical Systems: Filtering/Smoothing in GPDS

Gaussian Process Dynamical Systems

$x_t = f(x_{t-1}) + w, \quad w \sim \mathcal{N}(0, Q)$
$z_t = g(x_t) + v, \quad v \sim \mathcal{N}(0, R)$

GP distribution p(f) over transition function f
GP distribution p(g) over measurement function g

Let’s talk about inference in GPDSs


EP in Gaussian Process Dynamical Systems Filtering/Smoothing in GPDS

Inference in GPDS

[Figure: input distribution p(x_{t-1}, u_{t-1}) mapped through a GP to the distribution p(∆_t)]

Objective: Gaussian approximations to the joint distributions $p(x_t, z_t \mid z_{1:t-1})$ and $p(x_{t-1}, x_t \mid z_{1:t-1})$, which are sufficient for Gaussian filtering/smoothing²

Mapping distributions through a GP requires approximations, e.g.,
- linearization of the posterior GP mean function (red)
- moment matching (blue)

Filtering/smoothing in GPDS³: GP-EKS, GP-ADS, GP-CKS, ...
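The gap between the two approximations shows up even in a toy case. The sketch below pushes a Gaussian x ~ N(μ, σ²) through the known function f(x) = 4 sin(4x) (the function of the synthetic experiment in the results) rather than through a GP posterior mean, so it illustrates only the general idea, not GP-EKS or GP-ADS themselves. Linearization uses a first-order Taylor expansion at μ; moment matching uses the exact mean and variance, which have closed forms for sinusoids of a Gaussian. All function names are hypothetical.

```python
import math
import random

A, B = 4.0, 4.0  # f(x) = A * sin(B * x), assumed toy nonlinearity

def linearized_moments(mu, var):
    # First-order Taylor expansion of f around the input mean (EKF-style).
    mean = A * math.sin(B * mu)
    grad = A * B * math.cos(B * mu)  # f'(mu)
    return mean, grad ** 2 * var

def matched_moments(mu, var):
    # Exact mean and variance of f(x) for x ~ N(mu, var).
    # For x ~ N(mu, var): E[sin(c x)] = sin(c mu) exp(-c^2 var / 2),
    #                     E[cos(c x)] = cos(c mu) exp(-c^2 var / 2).
    mean = A * math.sin(B * mu) * math.exp(-0.5 * B ** 2 * var)
    # sin^2(Bx) = (1 - cos(2Bx)) / 2
    e_sin2 = 0.5 * (1.0 - math.cos(2 * B * mu) * math.exp(-2.0 * B ** 2 * var))
    return mean, A ** 2 * e_sin2 - mean ** 2

def monte_carlo_moments(mu, var, n=100_000, seed=0):
    # Sampling-based reference values for checking the two approximations.
    rng = random.Random(seed)
    ys = [A * math.sin(B * rng.gauss(mu, math.sqrt(var))) for _ in range(n)]
    m = sum(ys) / n
    return m, sum((y - m) ** 2 for y in ys) / n
```

For the broad input N(0, 1), moment matching gives a variance of about 8, close to the sampled value, while linearization gives 256; for a narrow input the two nearly agree.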

² Deisenroth & Ohlsson (ACC 2011)
³ Deisenroth et al. (ICML 2009), Deisenroth et al. (IEEE-TAC, 2012)
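Once Gaussian approximations of the two joints are available, the remaining filter and smoother steps are plain Gaussian conditioning. The scalar sketch below assumes the joint moments have already been computed by some approximation (for example linearization or moment matching); the function and argument names are hypothetical. With the exact moments of a linear-Gaussian model, these updates reduce to the Kalman filter and Rauch-Tung-Striebel smoother updates.

```python
def gaussian_measurement_update(mu_x, var_x, mu_z, var_z, cov_xz, z_obs):
    # Filter step: condition the joint p(x_t, z_t | z_{1:t-1}) on the observed z_t.
    gain = cov_xz / var_z                  # scalar (Kalman-type) gain
    mu_post = mu_x + gain * (z_obs - mu_z)
    var_post = var_x - gain * cov_xz
    return mu_post, var_post

def gaussian_smoother_step(mu_f, var_f, mu_pred, var_pred, cov_prev_next,
                           mu_s_next, var_s_next):
    # Smoother step: p(x_{t-1} | z_{1:T}) from the joint p(x_{t-1}, x_t | z_{1:t-1})
    # (filtered x_{t-1}, predicted x_t, and their cross-covariance) and the
    # already smoothed p(x_t | z_{1:T}).
    J = cov_prev_next / var_pred           # smoother gain
    mu_s = mu_f + J * (mu_s_next - mu_pred)
    var_s = var_f + J ** 2 * (var_s_next - var_pred)
    return mu_s, var_s
```

For example, with a linear measurement z = 3x + v, v ~ N(0, 0.5), and prior N(1, 2), the joint moments are μ_z = 3, var_z = 18.5, cov_xz = 6, and the update matches the information-form result (precision 1/2 + 9/0.5 = 18.5).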


EP in Gaussian Process Dynamical Systems Expectation Propagation in GPDS

EP in GPDS

Generalize single-sweep forward-backward smoothing in GPDSs to an iterative procedure using EP

Slightly more involved than EP in nonlinear systems (e.g., EP-EKS): we also have to average over the function distribution (GP)

Key idea as before: approximate the partition function by a Gaussian distribution⁴

Linearization of the posterior mean function (e.g., Ko & Fox, 2009) → EP-GPEKS

Moment matching (e.g., Quinonero-Candela et al., 2003) → EP-GPADS
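The core EP operation, repeated over all factors until convergence, is: divide a site out of the current approximation to get the cavity, match the moments of cavity × true factor, and divide the cavity back out to obtain the new site. The sketch below shows this for a single scalar variable with a Gaussian prior and non-Gaussian factors, computing the tilted moments by brute-force grid quadrature. It is a deliberate simplification: in a dynamical system the sites couple neighboring states and the moments come from linearization or moment matching through the GP, which this sketch does not attempt. The class and function names, grid bounds, and iteration count are illustrative assumptions.

```python
import math

class Gauss:
    # Univariate Gaussian (or unnormalized site) in natural parameters.
    def __init__(self, prec=0.0, shift=0.0):
        self.prec, self.shift = prec, shift  # 1/var and mean/var

    @staticmethod
    def from_moments(mean, var):
        return Gauss(1.0 / var, mean / var)

    def moments(self):
        return self.shift / self.prec, 1.0 / self.prec

    def __mul__(self, other):
        return Gauss(self.prec + other.prec, self.shift + other.shift)

    def __truediv__(self, other):
        return Gauss(self.prec - other.prec, self.shift - other.shift)

def tilted_moments(cavity, factor, lo=-6.0, hi=6.0, n=4001):
    # Mean and variance of cavity(x) * factor(x) by grid quadrature (1-D only).
    m, v = cavity.moments()
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ws = [math.exp(-0.5 * (x - m) ** 2 / v) * factor(x) for x in xs]
    Z = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / Z
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / Z
    return mean, var

def ep(prior, factors, n_iter=10):
    # One Gaussian site per factor; returns the Gaussian posterior approximation.
    sites = [Gauss() for _ in factors]  # flat initial sites
    q = prior
    for _ in range(n_iter):
        for i, f in enumerate(factors):
            cavity = q / sites[i]                                 # remove site i
            matched = Gauss.from_moments(*tilted_moments(cavity, f))
            sites[i] = matched / cavity                           # new site
            q = matched                                           # q = cavity * new site
    return q
```

With Gaussian factors EP recovers the exact posterior, and with a single factor it returns the exact moments of the tilted distribution. No damping or safeguard against negative site precisions is included; practical EP implementations often need both.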

⁴ Deisenroth & Mohamed (arXiv preprint, 2012)


Results

Results: Synthetic Data (1)

[Figure: f(x) against x for x ∈ [−5, 5], showing ground truth, training data, and GP model]

Figure: GP model with training set and ground truth

$x_{t+1} = 4\sin(4x_t) + w, \quad w \sim \mathcal{N}(0, 0.1^2)$
$z_t = 4\sin(4x_t) + v, \quad v \sim \mathcal{N}(0, 0.1^2)$

Initial state distribution $p(x_1) = \mathcal{N}(0, 1)$ (very broad)

30 randomly selected training points for the GP models

Tracking horizon: 20 time steps
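The synthetic system can be simulated in a few lines. This sketch reads both noise variances as 0.1² (standard deviation 0.1) and uses a fixed seed. It only generates a true trajectory and its measurements, not the 30 GP training points, whose selection procedure the slide does not specify. The function name `simulate` is hypothetical.

```python
import math
import random

def simulate(T=20, noise_std=0.1, seed=0):
    # Sample x_1 ~ N(0, 1), then x_{t+1} = 4 sin(4 x_t) + w and z_t = 4 sin(4 x_t) + v.
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # broad initial state distribution
    xs, zs = [], []
    for _ in range(T):
        xs.append(x)
        zs.append(4.0 * math.sin(4.0 * x) + rng.gauss(0.0, noise_std))  # measurement z_t
        x = 4.0 * math.sin(4.0 * x) + rng.gauss(0.0, noise_std)         # next state x_{t+1}
    return xs, zs
```

Because the initial distribution is much wider than the period of sin(4x), small differences in x_1 lead to very different trajectories, which is what makes this a hard test case for single-sweep smoothers.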


Results

Results: Synthetic Data (2)

[Figure (a): state against time step; true state and posterior state distributions (EP-GPADS, GPADS)]

(a) Posterior trajectories with confidence bounds.

[Figure (b): average NLL per data point against EP iteration; EP-GPADS and GPADS]

(b) Average NLL as a function of the EP iteration, with standard error.

After convergence, the EP-GPADS posterior closely tracks the true state (left)

Iterating EP substantially lowers the average NLL, i.e., improves predictive performance (right)


Results

Results: Pendulum Tracking

Pendulum
Method      NLL_x          MAE_x         LPU_x
GPEKS       −0.29 ± 0.30   0.30 ± 0.02   −2.76 ± 0.12
EP-GPEKS    −0.24 ± 0.33   0.31 ± 0.02   −2.77 ± 0.12
GPADS       −0.75 ± 0.06   0.29 ± 0.02   −2.52 ± 0.06
EP-GPADS    −0.79 ± 0.06   0.29 ± 0.02   −2.58 ± 0.04

NLL: negative log-likelihood (predictive performance)
MAE: mean absolute error (error of the posterior mean)
LPU: log posterior uncertainty (tightness of the posterior)

Linearization-based inference (GPEKS): variances too small; EP makes things worse

Moment-matching-based inference (GPADS): coherent estimates; EP improves the posterior
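NLL and MAE can be computed from the per-time-step Gaussian posteriors N(μ_t, σ_t²) and the true states x_t, as in the sketch below. The slide does not give the exact definition of LPU, so it is left out; the function names are hypothetical.

```python
import math

def gaussian_nll(xs, means, variances):
    # Average negative log-likelihood of the true states under N(mean_t, var_t).
    terms = [0.5 * math.log(2.0 * math.pi * v) + 0.5 * (x - m) ** 2 / v
             for x, m, v in zip(xs, means, variances)]
    return sum(terms) / len(terms)

def mean_absolute_error(xs, means):
    # Average absolute error of the posterior mean.
    return sum(abs(x - m) for x, m in zip(xs, means)) / len(xs)
```

Unlike MAE, the NLL also penalizes miscalibrated uncertainty: with the same mean error, a posterior whose variance is too small scores much worse. This is consistent with the slide's observation that GPEKS has an MAE similar to GPADS but a clearly higher NLL.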


Results

Results: Motion Capture Data

10 trials of golf swings recorded at 40 Hz (mocap.cs.cmu.edu)

Observations $z \in \mathbb{R}^{56}$

Latent space $x \in \mathbb{R}^{3}$

7 training sequences, 3 test sequences

GPDS learned via the GPDM approach (Wang et al., 2008)


Results

Summary

General framework for iterative inference in dynamical systems

Key: Approximation of the partition function

Rederive classical filters/smoothers as special cases

Promising results in (GP)DS


http://www.ias.tu-darmstadt.de/Team/MarcDeisenroth


Results

References

[1] C. M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer-Verlag, 2006.

[2] M. P. Deisenroth, M. F. Huber, and U. D. Hanebeck. Analytic Moment-based Gaussian Process Filtering. In L. Bottou and M. L. Littman, editors, Proceedings of the 26th International Conference on Machine Learning, pages 225–232, Montreal, QC, Canada, June 2009. Omnipress.

[3] M. P. Deisenroth and S. Mohamed. Expectation Propagation in Gaussian Process Dynamical Systems, July 2012. http://arxiv.org/abs/1207.2940.

[4] M. P. Deisenroth and H. Ohlsson. A General Perspective on Gaussian Filtering and Smoothing: Explaining Current and Deriving New Algorithms. In Proceedings of the American Control Conference, 2011.

[5] M. P. Deisenroth, R. Turner, M. Huber, U. D. Hanebeck, and C. E. Rasmussen. Robust Filtering and Smoothing with Gaussian Processes. IEEE Transactions on Automatic Control, 57(7):1865–1871, 2012. doi:10.1109/TAC.2011.2179426.

[6] S. J. Julier and J. K. Uhlmann. A New Extension of the Kalman Filter to Nonlinear Systems. In Proceedings of AeroSense: 11th Symposium on Aerospace/Defense Sensing, Simulation and Controls, pages 182–193, 1997.

[7] R. E. Kalman. A New Approach to Linear Filtering and Prediction Problems. Transactions of the ASME, Journal of Basic Engineering, 82(Series D):35–45, 1960.

[8] J. Ko and D. Fox. GP-BayesFilters: Bayesian Filtering using Gaussian Process Prediction and Observation Models. Autonomous Robots, 27(1):75–90, July 2009.

[9] T. P. Minka. A Family of Algorithms for Approximate Bayesian Inference. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, January 2001.

[10] J. Quinonero-Candela, A. Girard, J. Larsen, and C. E. Rasmussen. Propagation of Uncertainty in Bayesian Kernel Models: Application to Multiple-Step Ahead Forecasting. In IEEE International Conference on Acoustics, Speech and Signal Processing, volume 2, pages 701–704, April 2003.

[11] J. M. Wang, D. J. Fleet, and A. Hertzmann. Gaussian Process Dynamical Models for Human Motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2):283–298, 2008.
