
Page 1: Ca07 GPforSDEs Talk

8/16/2019 Ca07 GPforSDEs Talk

http://slidepdf.com/reader/full/ca07-gpforsdes-talk 1/18

Gaussian Process Approximations

of Stochastic Differential Equations

Cédric Archambeau

Centre for Computational Statistics and Machine Learning

University College London

[email protected] 

CSML 2007 Reading Group on SDEs
Joint work with Manfred Opper (TU Berlin), John Shawe-Taylor (UCL) and Dan Cornford (Aston).

Page 2: Ca07 GPforSDEs Talk


C. Archambeau (CSML)

C. Archambeau GP Approximations of SDEs

Context: numerical weather prediction (data assimilation)

• Model equations for the atmosphere are ordinary differential equations

(deterministic)

• Stochasticity arises from:
  • simplifications of the dynamics
  • discretisation of the continuous process
  • physics (unknown parameters)
• Observations are satellite radiances (indirect, possibly non-linear):
  • measurement error

• Decision making must take the uncertainty into account!

Page 3: Ca07 GPforSDEs Talk


Issues with the current methods

[Figure: double-well potential; ensemble Kalman smoother (Eyink et al., 2002)]

Page 4: Ca07 GPforSDEs Talk


Motivation

Data Assimilation:

• Improve current prediction tools
• Current methods cannot exploit large data sets

• Current methods fail on (highly) non-linear data

• Parameters are unknown (model, noise, observation operator)

Machine Learning:

• Deal with uncertainty in a principled way

• Discover optimal data representation:
  • find sparse solution
  • learn the kernel
  • …

• Deal with non-linear data

• Deal with huge amount of data

• Deal with high dimensional data

• Deal with large noise

Page 5: Ca07 GPforSDEs Talk


Outline

• Diffusion processes
• Gaussian approximation of the prior process

• Approximate posterior process after observing the data

• Smoothing algorithm (E-step)

• Parameter estimation (M-step)

• Conclusion

Page 6: Ca07 GPforSDEs Talk


Continuous-time continuous-state stochastic processes

• Stochastic differential equation for a diffusion process:

• To be interpreted as an (Ito) stochastic integral:

Wiener Process:

• Almost surely continuous

• Almost surely non-differentiable
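The equations on this slide did not survive extraction. A standard form of a diffusion-process SDE and its Itô-integral reading, consistent with the talk's setup (the symbols f for the drift, D for the diffusion and W for the Wiener process are notational assumptions, not taken from the slide):

```latex
% SDE for a diffusion process (drift f, diffusion D, Wiener process W):
\mathrm{d}x_t = f(x_t)\,\mathrm{d}t + D^{1/2}\,\mathrm{d}W_t .
% To be interpreted as an (Ito) stochastic integral:
x_t = x_0 + \int_0^t f(x_s)\,\mathrm{d}s + \int_0^t D^{1/2}\,\mathrm{d}W_s ,
% where the Wiener process starts at W_0 = 0 and has independent
% Gaussian increments:
W_{t+\Delta t} - W_t \sim \mathcal{N}(0,\, \Delta t\, I).
```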

Page 7: Ca07 GPforSDEs Talk


Describing diffusion processes

Fokker-Planck equation:

• Time evolution of the transition density
• Drift: instantaneous rate of change of the mean

• Diffusion: instantaneous rate of change of squared fluctuations

Stochastic differential equation:
• Depends on the same drift and diffusion terms
• Alternative representation of the transition density
• A (non-)linear SDE leads to a (non-)Gaussian transition density

Kernel representation:
• Captures correlations between data
• A particular SDE induces a specific (two-time) kernel

A time-varying approximation of an SDE is equivalent to learning the kernel (a hot topic!)
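The Fokker-Planck equation itself is missing from the extracted slide; its standard form for the transition density p(x, t), written with the same drift f and diffusion D as the SDE (a hedged reconstruction, not taken from the slide):

```latex
\frac{\partial p(x,t)}{\partial t}
  = -\sum_i \frac{\partial}{\partial x_i}\big[f_i(x)\,p(x,t)\big]
    + \frac{1}{2}\sum_{i,j} D_{ij}\,
      \frac{\partial^2 p(x,t)}{\partial x_i\,\partial x_j}.
```

The drift term transports probability mass along f, while the diffusion term spreads it at rate D, matching the bullet descriptions above.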

Page 8: Ca07 GPforSDEs Talk


Gaussian approximation of the non-Gaussian process

• Time varying linear approximation of the drift:

• Gaussian marginal density:

• Time evolution of the means and the covariances (consistency):

(see for example second CSML reading group on SDEs) 
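The formulas on this slide were lost in extraction; a plausible reconstruction of the three items, in the standard variational Gaussian notation (A(t), b(t), m(t), S(t) are assumed symbol choices):

```latex
% Time-varying linear approximation of the drift:
f_L(x, t) = -A(t)\,x + b(t).
% Gaussian marginal density:
q_t(x) = \mathcal{N}\big(x;\, m(t),\, S(t)\big).
% Consistency: time evolution of the means and covariances:
\dot{m}(t) = -A(t)\,m(t) + b(t),
\qquad
\dot{S}(t) = -A(t)\,S(t) - S(t)\,A^{\top}(t) + D.
```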

Page 9: Ca07 GPforSDEs Talk


Optimal (Gaussian) approximation of the prior process

• Euler-Maruyama discrete approximation:

• Probability density of a discrete-time sample path:

• Optimal prior process:

… path integral Kullback-Leibler divergence!

Same stochastic noise!
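The discretisation formula is missing above. As an illustration, here is a minimal Euler-Maruyama sketch in Python for a scalar diffusion; the double-well drift f(x) = 4x(1 - x²) and all parameter values are hypothetical choices, not taken from the slides:

```python
import numpy as np

def euler_maruyama(f, x0, D, dt, n_steps, rng):
    """Simulate one discrete-time sample path of  dx = f(x) dt + sqrt(D) dW."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        # Each transition is Gaussian: x_{k+1} ~ N(x_k + f(x_k) dt, D dt).
        x[k + 1] = x[k] + f(x[k]) * dt + np.sqrt(D * dt) * rng.standard_normal()
    return x

# Hypothetical double-well drift with stable fixed points at x = +/-1.
def drift(x):
    return 4.0 * x * (1.0 - x ** 2)

rng = np.random.default_rng(0)
path = euler_maruyama(drift, x0=1.0, D=0.5, dt=0.01, n_steps=5000, rng=rng)
```

Because each transition is Gaussian, the density of a discrete-time sample path factorises into a product of these transitions; that product is the object the path-integral Kullback-Leibler divergence compares between the true process and the approximating one.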

Page 10: Ca07 GPforSDEs Talk


Comment on the Kullback-Leibler divergence

Is this measure a good criterion?

Support…

Page 11: Ca07 GPforSDEs Talk


Including the observations

• Posterior process:

• Gaussian likelihood with linear observation operator (for simplicity):

• EM-type training algorithm:

where the lower bound is defined as
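The lower bound itself was lost in extraction; in the usual variational form (a reconstruction, with q the approximating Gaussian process, y_n the observations and θ the parameters):

```latex
\ln p(y_{1:N} \mid \theta) \;\ge\; \mathcal{L}(q, \theta)
  = \sum_{n=1}^{N} \mathbb{E}_{q}\big[\ln p(y_n \mid x_{t_n}, \theta)\big]
    \;-\; \mathrm{KL}\big[\, q \,\|\, p_{\mathrm{SDE}} \,\big].
```

The E-step maximises the bound over q with θ fixed; the M-step maximises it over θ with q fixed.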

Page 12: Ca07 GPforSDEs Talk


E-step: estimating the optimal (latent) state path

• Find optimal variational functions

• Constrained optimization problem:
  • ODE for the marginal means
  • ODE for the marginal covariances

• Integrating the Lagrangian by parts and differentiating leads to:
  • ODEs for the Lagrange multipliers
  • Gradient for the variational functions


Page 13: Ca07 GPforSDEs Talk


Smoothing algorithm:

Fix initial conditions. Repeat until convergence:

• Forward sweep:

Propagate means and covariances forward in time:

• Backward sweep:

Propagate Lagrange multipliers backward in time (adjoint operation):

Apply jump conditions at the observation times:

• Update variational functions:

(scaled gradient step)
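As an illustration of the forward sweep only, here is a minimal scalar sketch; the function name, the constant choices A(t) = 2 and b(t) = 1, and all other values are hypothetical, and the backward sweep for the Lagrange multipliers with its jump conditions is omitted:

```python
import numpy as np

def forward_sweep(A, b, D, m0, S0, dt):
    """Euler-propagate the marginal mean m(t) and variance S(t) of the
    approximating linear SDE  dx = (-A(t) x + b(t)) dt + sqrt(D) dW."""
    n = len(A)
    m = np.empty(n + 1)
    S = np.empty(n + 1)
    m[0], S[0] = m0, S0
    for k in range(n):
        m[k + 1] = m[k] + (-A[k] * m[k] + b[k]) * dt     # dm/dt = -A m + b
        S[k + 1] = S[k] + (-2.0 * A[k] * S[k] + D) * dt  # dS/dt = -2 A S + D
    return m, S

# With constant A and b the marginals relax to m = b/A and S = D/(2A).
n_steps = 2000
A = np.full(n_steps, 2.0)
b = np.full(n_steps, 1.0)
m, S = forward_sweep(A, b, D=0.5, m0=0.0, S0=1.0, dt=0.01)
```

With these constants the sweep relaxes towards the stationary values m = 0.5 and S = 0.125, which is a convenient sanity check before wiring in the backward sweep and the gradient update.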


Page 14: Ca07 GPforSDEs Talk


Double-well example

[Figure: Gaussian process regression vs. the variational Gaussian approximation]

Page 15: Ca07 GPforSDEs Talk


Comparison to MCMC simulation

Hybrid Monte Carlo approach:

• Reference solution

• Generate sample paths from posterior

• Modify the scheme to increase the acceptance rate (molecular dynamics)

• Still requires generating 100,000 sample paths for good results

• Hard to check convergence

(Y. Shen, Aston) 

Page 16: Ca07 GPforSDEs Talk


M-step: learning the parameters by maximum likelihood

• Linear transformation

• Observation noise

• Stochastic noise?

Page 17: Ca07 GPforSDEs Talk


When things go wrong…

Page 18: Ca07 GPforSDEs Talk


Conclusion

• Machine Learning for Data Assimilation

• Modeling the uncertainty is essential
• Learning the kernel is a challenging new concern

• Bunch of suboptimal/intermediate good solutions?

• Promising results: quality of the solution & potential extensions

Future work includes:
• High(er) dimensional data
• Check gradient approach (cf. stochastic noise)

• Simplifications when the force (drift) derives from a potential

• Investigate a full variational Bayesian treatment

• Combine the variational approach with MCMC?

Paper available from: www.cs.ucl.ac.uk/staff/C.Archambeau

Research project: VISDEM (multi-site, EPSRC funded)