Mathematical Foundation of Data Assimilation

Sebastian Reich

Universität Potsdam/ University of Reading

RISDA, January 24th, 2018

Outline

Part 1. Foundation of Bayesian inference

Part 2. Filtering and Smoothing for State Space Models

Part 3. Ensemble Kalman filtering and smoothing

Part 4. Particle filters for high-dimensional systems

My Background

Electrical engineer and applied mathematician by training.

Research interests in:
• numerical analysis
• Hamiltonian and molecular dynamics
• computational fluid dynamics
• data assimilation

DFG-funded Collaborative Research Center on Data Assimilation (www.sfb1294.de):
• maximum funding period: 12 years
• 12 scientific projects
• 24 doctoral and postdoctoral positions
• schools, fellowships, etc.

Numerical Weather Prediction

• Model: highly nonlinear discretized partial differential equations
• Data: heterogeneous mix of ground-, airborne-, satellite-based and radar data
• 24/7 data assimilation service for optimal weather prediction

Part 1. Foundation of Bayesian inference

Bayesian inference

The three key ingredients of Bayesian inference:

• a prior measure over the variable of interest

• the likelihood of an observation given the variable of interest

• the posterior measure over the variable of interest conditioned on the given observation

Note: All variables are treated as random variables, contrary to the frequentist approach to inference.

Bayes’ formula I

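Random variable of interest $Z$ with prior distribution/measure $P$ or density $\pi$:

$$Z \sim P \quad \text{or} \quad Z \sim \pi .$$

The expectation (expected value) of a function $g(z)$ under $P$ is defined as

$$\mathbb{E}[g] = \int g(z)\,P(\mathrm{d}z) \quad \text{or} \quad \mathbb{E}[g] = \int g(z)\,\pi(z)\,\mathrm{d}z .$$

We also use the shorthands $\bar g$, $\pi[g]$, $P[g]$.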

Bayes’ formula II

The likelihood characterizes the probability of observing $y$ given $z$:

$$l(y|z)$$

Note: We assume for simplicity that $l$ is normalized, i.e. $\int l(y|z)\,\mathrm{d}y = 1$.

Evidence of $y$ under the prior $P$:

$$l(y) = \int l(y|z)\,P(\mathrm{d}z) = P[l(y|\cdot)] = \pi[l(y|\cdot)] .$$

Bayes’ formula III

Rules of conditional probabilities

$$\pi(z,y) = l(y|z)\,\pi(z) = \pi(z|y)\,l(y)$$

yield the posterior density

$$\pi(z|y) = \frac{\pi(z,y)}{l(y)} = \frac{l(y|z)\,\pi(z)}{l(y)} \propto l(y|z)\,\pi(z) .$$

Notation. We use the shorthand $\nu^*(z)$ for $\pi(z|y)$ or, more generally, in case of a posterior measure, $Q^*(\mathrm{d}z)$.

Remark. The normalizing constant, i.e. the evidence $l(y)$, is important when comparing models, e.g. different prior distributions $P_\theta$.
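A minimal numerical sketch of the formula above, assuming a scalar $z$, the hypothetical model $z \sim \mathrm{N}(0,1)$, $y|z \sim \mathrm{N}(z, 0.5)$, and a uniform grid (all choices are illustrative). The posterior is the pointwise product of likelihood and prior divided by the evidence $l(y)$, which is computed by quadrature; for this model the exact values are a posterior mean of $2/3$ at $y = 1$ and $l(y) = \mathrm{N}(1; 0, 1.5)$.

```python
import math

def gauss(x, mean, var):
    """Normal density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def grid_posterior(y, prior_var=1.0, obs_var=0.5, lo=-10.0, hi=10.0, n=4001):
    """Posterior density pi(z|y) on a grid for the hypothetical model
    z ~ N(0, prior_var), y | z ~ N(z, obs_var)."""
    dz = (hi - lo) / (n - 1)
    zs = [lo + k * dz for k in range(n)]
    joint = [gauss(y, z, obs_var) * gauss(z, 0.0, prior_var) for z in zs]  # l(y|z) pi(z)
    evidence = sum(joint) * dz                                             # l(y) by a Riemann sum
    post = [p / evidence for p in joint]                                   # pi(z|y)
    return zs, post, evidence, dz

zs, post, evidence, dz = grid_posterior(y=1.0)
post_mean = sum(z * p for z, p in zip(zs, post)) * dz
print(post_mean, evidence)
```

The same grid can be reused to compare the evidence of two candidate priors, which is the model-comparison use mentioned in the remark.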

Bayes’ formula IV

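Bayes’ formula needs to be generalized when the prior is a measure.

Radon-Nikodym derivative of the posterior with respect to the prior measure:

$$\frac{\mathrm{d}Q^*}{\mathrm{d}P} = \frac{l(y|\cdot)}{l(y)} \propto l(y|\cdot) .$$

In words: $Q^*$ is absolutely continuous with respect to $P$ with density $l(y|z)/l(y)$.

In equation form: $Q^* \ll P$ and

$$Q^*[g] = \int g(z)\,Q^*(\mathrm{d}z) = \int g(z)\,\frac{\mathrm{d}Q^*}{\mathrm{d}P}(z)\,P(\mathrm{d}z) = \int g(z)\,\frac{l(y|z)}{l(y)}\,P(\mathrm{d}z) = \frac{1}{l(y)}\,P[g\,l(y|\cdot)] .$$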

Variational formulation

Kullback-Leibler divergence:

$$D(Q|Q^*) = \int \log \frac{\mathrm{d}Q}{\mathrm{d}Q^*}\,Q(\mathrm{d}z) = Q\left[\log \frac{\mathrm{d}Q}{\mathrm{d}Q^*}\right] .$$

It holds that $D(Q|Q^*) > 0$ for all $Q \neq Q^*$.

Donsker-Varadhan principle:

$$-\log l(y) = \inf_{Q \ll P} \left\{ -Q[\log l(y|\cdot)] + D(Q|P) \right\}$$

with the infimum taken over all measures $Q$ which are absolutely continuous with respect to $P$. The infimum is achieved for $Q = Q^*$.

$F = -\log l(y)$ is called the free energy.

Remark. $-Q[\log l(y|\cdot)]$ is called the expected loss under $Q$.
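On a finite state space every measure is a probability vector, so the principle can be checked directly. The sketch below assumes a hypothetical three-state prior and hypothetical likelihood values; the objective evaluated at $Q^*$ reproduces the free energy $-\log l(y)$, and any other $Q$ gives a larger value (the excess equals $D(Q|Q^*)$).

```python
import math

def kl(q, p):
    """D(q|p) = sum_k q_k log(q_k / p_k) for probability vectors (0 log 0 = 0)."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p) if qk > 0.0)

def objective(q, prior, lik):
    """-Q[log l(y|.)] + D(Q|P) from the Donsker-Varadhan principle."""
    expected_loss = -sum(qk * math.log(lk) for qk, lk in zip(q, lik) if qk > 0.0)
    return expected_loss + kl(q, prior)

prior = [0.5, 0.3, 0.2]          # hypothetical prior P on three states
lik = [0.1, 0.6, 0.3]            # hypothetical likelihood values l(y|z_k)
evidence = sum(p * l for p, l in zip(prior, lik))
posterior = [p * l / evidence for p, l in zip(prior, lik)]   # Q*

free_energy = -math.log(evidence)
print(objective(posterior, prior, lik), free_energy)         # equal up to rounding
print(objective([1/3, 1/3, 1/3], prior, lik) > free_energy)  # True
```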

Machine learning vs data assimilation

Key element of both machine learning (ML) and DA:

joint probability: $\pi(z,y) = l(y|z)\,\pi(z)$

ML: the (effective) dimension of the data $y$ is much larger than the (effective) dimension of the parameters $z$ (big data)

DA: the (effective) dimension of $z$ is much larger than the (effective) dimension of the data $y$ (complex models)

In addition:
• ML addresses mostly static inference problems
• DA has an element of forgetting (not just learning)
• Both ML and DA lead to complex minimization and uncertainty quantification (UQ) problems

Computational approaches

Overview:

• distributional approximations (deterministic)
  – point estimators such as the MAP estimator

    $$z^* := \arg\min_z V(z), \qquad V(z) := -\log \nu^*(z),$$

    leading to 3DVar and 4DVar from meteorology
  – variational Bayes (VB)
• Monte Carlo approximations (random)
  – Markov chain Monte Carlo (MCMC)
  – importance sampling (IS)

Variational Bayes

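Approximate the posterior $\nu^*$ by a Gaussian distribution

$$\nu(z) = (2\pi)^{-N_z/2}\,|P|^{-1/2} \exp\left(-\tfrac{1}{2}(z-\mu)^{\mathrm T} P^{-1}(z-\mu)\right)$$

with mean $\mu$ and covariance $P$ chosen such that the variational free energy

$$F(\nu) = -\nu[\log l(y|\cdot)] + D(\nu|\pi)$$

is minimised.

Critical points $(\mu^*, P^*)$ satisfy

$$0 = \nu\left[\nabla_z \log \nu^*\right], \qquad (P^*)^{-1} = -\nu\left[\nabla_z \nabla_z \log \nu^*\right],$$

where the expectations are taken under the Gaussian $\nu$ with parameters $(\mu^*, P^*)$.

Remark. Compare to the Laplace approximation:

$$0 = \nabla_z \log \nu^*|_{z=\mu^*}, \qquad (P^*)^{-1} = -\nabla_z \nabla_z \log \nu^*|_{z=\mu^*} .$$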
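For a scalar $z$ the two sets of conditions can be compared numerically. The sketch below assumes a hypothetical posterior $\nu^*(z) \propto e^{-V(z)}$ with $V(z) = z^4/4 + z^2/2$, computes the Gaussian expectations with a simple Riemann sum, and solves the VB conditions by a plain fixed-point iteration (a choice made for brevity, not the method of the cited literature). The Laplace approximation gives $P^* = 1/V''(0) = 1$, whereas the VB conditions reduce at $\mu^* = 0$ to $1/P^* = 3P^* + 1$, i.e. $P^* = (\sqrt{13}-1)/6 \approx 0.434$.

```python
import math

def dV(z):   # V'(z) = -d/dz log nu*(z) for V(z) = z**4/4 + z**2/2
    return z ** 3 + z

def d2V(z):  # V''(z)
    return 3.0 * z ** 2 + 1.0

def gauss_expect(g, mu, var, n=801, width=10.0):
    """E[g(Z)] for Z ~ N(mu, var) by a Riemann sum over mu +/- width*sd."""
    sd = math.sqrt(var)
    dz = 2.0 * width * sd / (n - 1)
    total = 0.0
    for k in range(n):
        z = mu - width * sd + k * dz
        total += g(z) * math.exp(-0.5 * (z - mu) ** 2 / var)
    return total * dz / math.sqrt(2.0 * math.pi * var)

def variational_gaussian(mu=0.5, var=1.0, iters=100):
    """Fixed-point iteration for 1/P = E[V''] and 0 = E[V'] (expectations under N(mu, P))."""
    for _ in range(iters):
        var = 1.0 / gauss_expect(d2V, mu, var)      # covariance condition
        mu = mu - var * gauss_expect(dV, mu, var)   # Newton step on the mean condition
    return mu, var

mu_vb, var_vb = variational_gaussian()
var_laplace = 1.0 / d2V(0.0)   # Laplace: mode at z = 0, inverse Hessian there
print(mu_vb, var_vb, var_laplace)
```

Because the target has lighter-than-Gaussian tails, the VB variance comes out noticeably smaller than the Laplace variance, which only sees the curvature at the mode.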

Random algorithms: Monte Carlo

Monte Carlo methods: random algorithms for producing (weighted) samples $z_i = Z_i(\omega)$, $i = 1,\ldots,M$, from $Q^*$.

The target measure $Q^*$ is approximated by the associated random measure

$$Q^* \approx \frac{1}{M} \sum_{i=1}^M w_i\,\delta(z - z_i),$$

where $\delta(\cdot)$ denotes the standard Dirac delta measure.

Two examples:
• Markov chain Monte Carlo (MCMC): $w_i = 1$
• importance sampling (IS): nonuniform weights $w_i$

Markov chain Monte Carlo I

General idea of MCMC:

Find a transition kernel $q(\mathrm{d}z'|z)$ such that the invariance condition

$$Q^*(\mathrm{d}z') = \int q(\mathrm{d}z'|z)\,Q^*(\mathrm{d}z)$$

holds.

Produce correlated samples $z_i$, $i = 1,\ldots,M$, sequentially:

$$z_i = Z_i(\omega) \sim q(\cdot|z_{i-1}), \qquad i = 1,\ldots,M .$$

Efficiency: the equivalent number $M_{\rm eff}$ of independent samples required to produce the same accuracy; typically

$$M_{\rm eff} \ll M .$$

Markov chain Monte Carlo II

Example. Consider the gradient flow SDE (Brownian dynamics)

$$\mathrm{d}Z_t = \nabla_z \log \nu^*(Z_t)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}W_t,$$

where $W_t$ is $N_z$-dimensional standard Brownian motion.

This SDE has $\nu^*$ as a stationary distribution.

Discretize in time by the Euler-Maruyama method

$$z_i = z_{i-1} + \nabla_z \log \nu^*(z_{i-1})\,\Delta t + \sqrt{2\Delta t}\,\Xi_i$$

with $\Xi_i \sim \mathrm{N}(0,I)$ and step-size $\Delta t > 0$.

If exact sampling is desired, apply a Metropolis-Hastings accept-reject criterion to correct for numerical errors.
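The Euler-Maruyama step combined with a Metropolis-Hastings correction is known as the Metropolis-adjusted Langevin algorithm (MALA). A minimal scalar sketch, assuming user-supplied functions `log_target` and `grad_log_target` (hypothetical names); the acceptance ratio accounts for the asymmetry of the Gaussian proposal $\mathrm{N}(z + \Delta t\,\nabla \log\nu^*(z), 2\Delta t)$.

```python
import math
import random

def mala(log_target, grad_log_target, z0, dt, n_samples, seed=0):
    """Metropolis-adjusted Langevin sampler for a scalar target nu*."""
    rng = random.Random(seed)

    def log_q(z_new, z_old):
        # log density (up to a constant) of the Euler-Maruyama proposal N(z_old + dt*grad, 2*dt)
        mean = z_old + dt * grad_log_target(z_old)
        return -((z_new - mean) ** 2) / (4.0 * dt)

    z, samples, accepted = z0, [], 0
    for _ in range(n_samples):
        prop = z + dt * grad_log_target(z) + math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)
        log_alpha = (log_target(prop) - log_target(z)
                     + log_q(z, prop) - log_q(prop, z))
        if rng.random() < math.exp(min(0.0, log_alpha)):   # accept-reject step
            z, accepted = prop, accepted + 1
        samples.append(z)
    return samples, accepted / n_samples

# Usage: standard Gaussian target, log nu*(z) = -z**2/2 + const
samples, acc = mala(lambda z: -0.5 * z * z, lambda z: -z, z0=3.0, dt=0.5, n_samples=20000)
kept = samples[1000:]                        # discard burn-in
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(mean, var, acc)
```

Dropping the accept-reject step gives the unadjusted Euler-Maruyama chain of the slide, whose stationary distribution carries an $O(\Delta t)$ bias.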

Importance Sampling I

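Find a proposal measure $Q$ such that
• $Q^* \ll Q$, i.e.

  $$\bar g := \int g(z)\,Q^*(\mathrm{d}z) = \int g(z)\,\frac{\mathrm{d}Q^*}{\mathrm{d}Q}(z)\,Q(\mathrm{d}z),$$

• $Q$ can be easily sampled from.

Example. $Q = P$.

Approximate $Q^*$ by

$$Q^* \approx \frac{1}{M} \sum_{i=1}^M w_i\,\delta(z - z_i)$$

with

$$z_i = Z_i(\omega) \sim Q, \qquad w_i = \frac{\mathrm{d}Q^*}{\mathrm{d}Q}(z_i) .$$

Example. $Q = P$, $w_i \propto l(y|z_i)$, $\sum_i w_i = M$.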

Importance Sampling II

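Notation: Radon-Nikodym derivative (e.g. likelihood):

$$L(z) := \frac{\mathrm{d}Q^*}{\mathrm{d}Q}(z)$$

Effective sample size:

$$M_{\rm eff} := \frac{M}{\frac{1}{M}\sum_{i=1}^M w_i^2} \le M, \qquad w_i \propto L(z_i), \quad \sum_{i=1}^M w_i = M .$$

Law of large numbers:

$$M_{\rm eff} \approx \rho M$$

with

$$\rho := \frac{1}{Q[L^2]} = \frac{Q[L]^2}{Q[L^2]} \le 1 \qquad (\text{since } Q[L] = 1) .$$

Upper bound:

$$\rho \le e^{-D(Q^*|Q)} \le 1,$$

which follows from Jensen's inequality: $Q[L^2] = Q^*[L] \ge \exp(Q^*[\log L]) = e^{D(Q^*|Q)}$.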
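A minimal sketch of importance sampling with $Q = P$, assuming the hypothetical scalar model $z \sim \mathrm{N}(0,1)$, $y|z \sim \mathrm{N}(z,R)$, for which the posterior mean $y/(1+R)$ is known in closed form. The weights are normalized to $\sum_i w_i = M$ as above, and the effective sample size is computed from them.

```python
import math
import random

def importance_sampling(y, R, M, seed=1):
    """Self-normalized importance sampling with the prior as proposal."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(M)]          # z_i ~ P = N(0, 1)
    logl = [-0.5 * (y - zi) ** 2 / R for zi in z]         # log l(y|z_i) up to a constant
    shift = max(logl)                                      # guard against underflow in exp
    raw = [math.exp(v - shift) for v in logl]
    total = sum(raw)
    w = [M * r / total for r in raw]                       # normalized so that sum_i w_i = M
    return z, w

def effective_sample_size(w):
    """M_eff = M / ((1/M) sum_i w_i^2) for weights with sum_i w_i = M."""
    M = len(w)
    return M / (sum(wi * wi for wi in w) / M)

y, R, M = 1.0, 0.5, 20000
z, w = importance_sampling(y, R, M)
post_mean = sum(wi * zi for wi, zi in zip(w, z)) / M       # Q*[z] ~ (1/M) sum_i w_i z_i
print(post_mean, y / (1.0 + R), effective_sample_size(w) / M)
```

The printed ratio $M_{\rm eff}/M$ is a Monte Carlo estimate of $\rho$.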

Part 2. Filtering and Smoothing for State Space Models

Signal process I

In case of state space models, the prior measure $P$ is defined recursively.

Discrete-time models:

$$X_n = M(X_{n-1}) + Q^{1/2}\,\Xi_n,$$

$\Xi_n \sim \mathrm{N}(0,I)$, $X_0 \sim \pi_0$, $n = 1,\ldots,N$, $x_n = X_n(\omega) \in \mathbb{R}^{N_x}$.

Continuous-time models (SDEs):

$$\mathrm{d}X_t = f(X_t)\,\mathrm{d}t + Q^{1/2}\,\mathrm{d}W_t,$$

where $W_t$ is $N_x$-dimensional standard Brownian motion, $X_0 \sim \pi_0$, $t \in [0,T]$.

Signal process II

Variable of interest (discrete time):

$$z = x_{0:N} = (x_0, x_1, \ldots, x_N)$$

Prior density/measure:

$$\pi(z) = \pi(x_{0:N}) = \pi_0(x_0)\,\pi(x_1|x_0) \cdots \pi(x_N|x_{N-1})$$

Transition kernel: $X_n \sim \pi(\cdot|x_{n-1})$ with

$$\pi(x|x') \propto \exp\left(-\tfrac{1}{2}(x - M(x'))^{\mathrm T} Q^{-1}(x - M(x'))\right) .$$

Remark. It is easy to generate samples from the prior density $\pi$.

Signal process II (continued)

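Formal derivation of the prior measure for time-continuous models.

Euler-Maruyama method for SDEs:

$$X_n = X_{n-1} + f(X_{n-1})\,\Delta t + Q^{1/2}\,\Xi_n,$$

with $T = \Delta t\,N$, $\Xi_n \sim \mathrm{N}(0, \Delta t\,I)$.

Transition kernel of the Euler-Maruyama method for finite $\Delta t$:

$$\pi_{\Delta t}(x|x') \propto \exp\left(-\frac{1}{2\Delta t}\,(x - x' - f(x')\Delta t)^{\mathrm T} Q^{-1}(x - x' - f(x')\Delta t)\right),$$

and $z = x_{0:N}$, $T = \Delta t\,N$.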
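Sampling from the prior amounts to iterating this recursion. A scalar sketch, assuming the hypothetical drift $f(x) = -x$ and $Q = 1$ (an Ornstein-Uhlenbeck process with stationary variance $Q/2 = 0.5$); it also evaluates the log of the Euler-Maruyama transition density up to its normalizing constant, which is the building block of $\log \pi(x_{0:N})$.

```python
import math
import random

def euler_maruyama_path(f, Q, x0, dt, N, rng):
    """Sample x_{0:N} from X_n = X_{n-1} + f(X_{n-1}) dt + Q^{1/2} Xi_n, Xi_n ~ N(0, dt)."""
    x = [x0]
    for _ in range(N):
        x.append(x[-1] + f(x[-1]) * dt + math.sqrt(Q * dt) * rng.gauss(0.0, 1.0))
    return x

def log_transition(x, x_prev, f, Q, dt):
    """log pi_dt(x | x_prev) up to an additive constant."""
    r = x - x_prev - f(x_prev) * dt
    return -0.5 * r * r / (Q * dt)

rng = random.Random(2)
f = lambda x: -x
paths = [euler_maruyama_path(f, 1.0, 0.0, 0.01, 300, rng) for _ in range(2000)]
end = [p[-1] for p in paths]                  # samples of X_T with T = 3
var_T = sum(e * e for e in end) / len(end)    # mean is zero for this model
log_prior = sum(log_transition(b, a, f, 1.0, 0.01) for a, b in zip(paths[0], paths[0][1:]))
print(var_T, log_prior)
```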

Signal process III

Limit $\Delta t \to 0$ for fixed $T = \Delta t\,N$.

Realizations of $Z$ are a.s. continuous functions

$$z = x_{[0,T]} \in C([0,T], \mathbb{R}^{N_x})$$

with measure $P$ over $C([0,T], \mathbb{R}^{N_x})$ formally defined by

$$\lim_{N \to \infty} \left( \pi_0(x_0) \prod_{n=1}^N \pi_{\Delta t}(x_n|x_{n-1}) \right) \to P(\mathrm{d}x_{[0,T]}) .$$

Note. Two SDEs with different diffusion matrices $Q$ lead to measures $P$ which are mutually singular.

Observation process I

Discrete-in-time.

Forward model:

$$Y_n = h(X_n) + R^{1/2}\,\Sigma_n,$$

$\Sigma_n \sim \mathrm{N}(0,I)$, $n = 1,\ldots,N$.

Likelihood:

$$l(y_{1:N}|x_{0:N}) \propto \exp\left(-\frac{1}{2}\sum_{n=1}^N (y_n - h(x_n))^{\mathrm T} R^{-1}(y_n - h(x_n))\right) .$$

Observation process II

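Continuous-in-time.

Forward model:

$$Y_t = \int_0^t h(X_s)\,\mathrm{d}s + R^{1/2}\,V_t,$$

where $V_t$ is standard Brownian motion, $Y_0 = 0$, $t \in [0,T]$.

Likelihood:

$$l(y_{[0,T]}|x_{[0,T]}) \propto \exp\left(-\frac{1}{2}\int_0^T \left(h_t^{\mathrm T} R^{-1} h_t\,\mathrm{d}t - 2\,h_t^{\mathrm T} R^{-1}\,\mathrm{d}y_t\right)\right)$$

with $h_t = h(x_t)$.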

Posterior measure I

Bayes’ formula:

Discrete-in-time model and observations:

$$\nu^*(x_{0:N}) \propto l(y_{1:N}|x_{0:N})\,\pi(x_{0:N})$$

Continuous-in-time model and observations:

$$\frac{\mathrm{d}Q^*}{\mathrm{d}P} \propto l(y_{[0,T]}|x_{[0,T]}) .$$

Remark. The normalising constants (i.e. the evidence) $l(y_{1:N})$ and $l(y_{[0,T]})$, respectively, are important for model comparison.

Posterior measure II

We are typically interested in marginal distributions only.

(a) Smoothing/reanalysis: distribution of $x_t$/$x_n$ given all the data:

$$X_{t|T} \sim \nu_{t|T}, \qquad X_{n|N} \sim \nu_{n|N} .$$

(b) Filtering: distribution of $x_t$/$x_n$ given all the data up to $t$/$n$:

$$X_{t|t} \sim \nu_{t|t}, \qquad X_{n|n} \sim \nu_{n|n} .$$

(c) Prediction: distribution of $x_t$/$x_n$ given the data up to $\tau < t$ / $k < n$:

$$X_{t|\tau} \sim \nu_{t|\tau}, \qquad X_{n|k} \sim \nu_{n|k} .$$

Posterior measure III

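Consider the modified SDE

$$\mathrm{d}X_t = f(X_t)\,\mathrm{d}t + Q^{1/2}\,\mathrm{d}\tilde W_t, \qquad X_0(\omega) = x_0, \qquad (1)$$

with

$$\tilde W_t = W_t + \int_0^t u_s\,\mathrm{d}s .$$

Theorem (Girsanov).

Let the measure $P$ be induced by (1) with $u_s \equiv 0$, and the measure $Q^u$ by (1) with any $u_s \neq 0$ such that

$$\mathbb{E}\left[\exp\left(\frac{1}{2}\int_0^t |u_s|^2\,\mathrm{d}s\right)\right] < \infty .$$

Then $Q^u$ is absolutely continuous with respect to $P$ with Radon-Nikodym derivative

$$\left.\frac{\mathrm{d}Q^u}{\mathrm{d}P}\right|_{W_{[0,t]}} = \exp Z^u_t, \qquad Z^u_t = \int_0^t u_s^{\mathrm T}\,\mathrm{d}W_s + \frac{1}{2}\int_0^t |u_s|^2\,\mathrm{d}s .$$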

Posterior measure IV

Application of Girsanov's theorem to data assimilation with underlying SDE models:

Find a control law $u$ and a change of the initial measure $\pi_0$ such that

$$Q^u \approx Q^* .$$

See below and Part 4 on proposal steps.

Remarks.

(i) A good choice of the model error diffusion matrix $Q$ in

$$\mathrm{d}X_t = f(X_t)\,\mathrm{d}t + Q^{1/2}\,\mathrm{d}W_t$$

is crucial (smoothing vs. prediction).

(ii) The filtering problem also leads to control-type formulations, but they are motivated differently. See the feedback particle filter later in this part.

Kalman-Bucy filter I

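Linear SDE:

$$\mathrm{d}X_t = AX_t\,\mathrm{d}t + Q^{1/2}\,\mathrm{d}W_t, \qquad X_0 \sim \mathrm{N}(\bar x_0, P_0) .$$

Linear forward model:

$$\mathrm{d}Y_t = HX_t\,\mathrm{d}t + \mathrm{d}V_t, \qquad Y_0 = 0 .$$

Prior and posterior distributions are Gaussian, with mean and covariance

Signal: $(\bar x_t, P_t)$,

Filtering: $(\bar x_{t|t}, P_{t|t})$,

Smoothing: $(\bar x_{t|T}, P_{t|T})$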

Kalman-Bucy filter II

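Evolution equations for

Signal:

$$\frac{\mathrm{d}\bar x_t}{\mathrm{d}t} = A\bar x_t, \qquad \frac{\mathrm{d}P_t}{\mathrm{d}t} = AP_t + P_tA^{\mathrm T} + Q,$$

$\bar x_0 = \bar x_{0|0}$, $P_0 = P_{0|0}$ given.

Filter:

$$\mathrm{d}\bar x_{t|t} = A\bar x_{t|t}\,\mathrm{d}t - K_t\left(H\bar x_{t|t}\,\mathrm{d}t - \mathrm{d}Y_t\right),$$

$$\frac{\mathrm{d}P_{t|t}}{\mathrm{d}t} = AP_{t|t} + P_{t|t}A^{\mathrm T} + Q - K_tHP_{t|t}$$

with Kalman gain matrix $K_t = P_{t|t}H^{\mathrm T}$.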
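A scalar sketch of the filter equations, discretized with a forward Euler step (an illustrative choice, not the only one). It assumes the hypothetical values $A = a = -1$, $Q = q = 1$, $H = h = 1$ and generates synthetic data $\mathrm{d}Y_t$ from a simulated truth; for these values the Riccati equation has the steady state $P = (a + \sqrt{a^2 + h^2 q})/h^2 = \sqrt{2} - 1$.

```python
import math
import random

def kalman_bucy(a, q, h, x0, p0, dt, n_steps, seed=3):
    """Euler discretization of the scalar Kalman-Bucy filter driven by synthetic data dY."""
    rng = random.Random(seed)
    x_true, mean, P = x0, x0, p0
    for _ in range(n_steps):
        # synthetic truth and observation increment dY = h x dt + dV
        x_true += a * x_true * dt + math.sqrt(q * dt) * rng.gauss(0.0, 1.0)
        dY = h * x_true * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        K = P * h                                   # Kalman gain K_t = P_{t|t} H^T
        mean += a * mean * dt - K * (h * mean * dt - dY)
        P += (2.0 * a * P + q - K * h * P) * dt     # Riccati equation
    return mean, P, x_true

mean, P, x_true = kalman_bucy(a=-1.0, q=1.0, h=1.0, x0=1.0, p0=2.0, dt=1e-3, n_steps=10000)
print(P, math.sqrt(2.0) - 1.0)
```

The covariance does not depend on the data, so it converges to the steady state regardless of the random seed.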

Kalman smoother

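Smoother:

Given the filter solution $(\bar x_{t|t}, P_{t|t})$, $t \in [0,T]$, solve backward in time

$$\frac{\mathrm{d}\bar x_{t|T}}{\mathrm{d}t} = A\bar x_{t|T} + QP_{t|t}^{-1}(\bar x_{t|T} - \bar x_{t|t}),$$

$$\frac{\mathrm{d}P_{t|T}}{\mathrm{d}t} = AP_{t|T} + P_{t|T}A^{\mathrm T} - Q + QP_{t|t}^{-1}P_{t|T} + P_{t|T}P_{t|t}^{-1}Q$$

for given $\bar x_{T|T}$ and $P_{T|T}$ at $t = T$.

Consistency check: without observations ($H = 0$) the smoother must coincide with the filter, and indeed $P_{t|T} = P_{t|t}$ then solves the covariance equation.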
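A scalar sketch of the backward sweep for the covariance, assuming the hypothetical values $a = -1$, $q = 1$, $h = 1$ and forward Euler steps taken backward in time; the filter variance is recomputed inside the block so that it is self-contained. Starting the filter at its steady state $P_{t|t} = \sqrt{2} - 1$, the smoother variance relaxes, away from $t = T$, to $q/(2(a + q/P_{t|t})) = 1/(2\sqrt{2}) \approx 0.354$, below the filter variance.

```python
import math

def filter_variance(a, q, h, p0, dt, n_steps):
    """Forward Euler for the scalar Riccati equation dP/dt = 2aP + q - h^2 P^2."""
    P = [p0]
    for _ in range(n_steps):
        p = P[-1]
        P.append(p + (2.0 * a * p + q - h * h * p * p) * dt)
    return P

def smoother_variance(a, q, P_filt, dt):
    """Explicit Euler steps backward in time for dP_s/dt = 2aP_s - q + 2q P_s / P_f."""
    n = len(P_filt) - 1
    P_s = [0.0] * (n + 1)
    P_s[n] = P_filt[n]                                  # P_{T|T} from the filter
    for k in range(n, 0, -1):
        rhs = 2.0 * a * P_s[k] - q + 2.0 * q * P_s[k] / P_filt[k]
        P_s[k - 1] = P_s[k] - rhs * dt                  # step from t_k back to t_{k-1}
    return P_s

a, q, h, dt, n = -1.0, 1.0, 1.0, 1e-3, 10000
P_f = filter_variance(a, q, h, math.sqrt(2.0) - 1.0, dt, n)
P_s = smoother_variance(a, q, P_f, dt)
print(P_s[0], 1.0 / (2.0 * math.sqrt(2.0)))
```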

Kalman-McKean-Vlasov equations

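Interacting particle McKean-Vlasov representation of the Kalman filter/smoother equations:

Signal:

$$\mathrm{d}X_t = AX_t\,\mathrm{d}t + Q^{1/2}\,\mathrm{d}W_t, \qquad X_0 \sim \pi_0 .$$

Filter:

$$\mathrm{d}X_{t|t} = AX_{t|t}\,\mathrm{d}t + Q^{1/2}\,\mathrm{d}W_t - K_t\left(\tfrac{1}{2}\left(HX_{t|t} + H\bar x_{t|t}\right)\mathrm{d}t - \mathrm{d}Y_t\right)$$

with $\bar x_{t|t} = \mathbb{E}[X_{t|t}]$, $K_t = P_{t|t}H^{\mathrm T}$, and $P_{t|t} = \mathbb{E}[(X_{t|t} - \bar x_{t|t})(X_{t|t} - \bar x_{t|t})^{\mathrm T}]$.

Smoother:

$$\mathrm{d}X_{t|T} = AX_{t|T}\,\mathrm{d}t + Q^{1/2}\,\mathrm{d}W_t + QP_{t|t}^{-1}(X_{t|T} - \bar x_{t|t})\,\mathrm{d}t$$

with $X_{T|T} \sim \pi_{T|T}$.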

Nonlinear extension I

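Signal:

$$\mathrm{d}X_t = f(X_t)\,\mathrm{d}t + Q^{1/2}\,\mathrm{d}W_t, \qquad X_0 \sim \pi_0 .$$

Feedback particle filter:

$$\mathrm{d}X_{t|t} = f(X_{t|t})\,\mathrm{d}t + Q^{1/2}\,\mathrm{d}W_t - K_t \circ \left(\tfrac{1}{2}\left(h(X_{t|t}) + \bar h_{t|t}\right)\mathrm{d}t - \mathrm{d}Y_t\right)$$

($\circ$ denotes the Stratonovich interpretation) with the Kalman gain now implicitly defined by

$$\nabla_x \cdot (\pi_{t|t}K_t) = -\pi_{t|t}\,(h - \bar h_{t|t}), \qquad \bar h_{t|t} = \pi_{t|t}[h] .$$

For linear $h(x) = Hx$ and Gaussian $\pi_{t|t}$, the constant gain $K_t = P_{t|t}H^{\mathrm T}$ solves this equation.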

Nonlinear extension II

Smoother extension of the feedback particle filter:

$$\mathrm{d}X_{t|T} = f(X_{t|T})\,\mathrm{d}t + Q^{1/2}\,\mathrm{d}W_t - Q\nabla_x \log \pi_{t|t}(X_{t|T})\,\mathrm{d}t$$

with $X_{T|T}$ given.

Forward optimal control formulation:

$$\mathrm{d}X_t = f(X_t)\,\mathrm{d}t + Q^{1/2}\,\mathrm{d}W_t + Q\nabla_x \log \frac{\pi_{t|T}}{\pi_{t|t}}(X_t)\,\mathrm{d}t$$

with $\pi_{0|T}$ given by the backward smoother formulation.

The (time-dependent) control law is given by

$$u_t(x) = Q^{1/2}\,\nabla_x \log \frac{\pi_{t|T}}{\pi_{t|t}}(x) .$$

Remark. Time-averaged controls

$$\bar u(x) = \lim_{T \to \infty} \frac{1}{T}\int_0^T u_t(x)\,\mathrm{d}t$$

provide systematic model correction terms.

Part 3. Ensemble Kalman filtering and smoothing

Ensemble Kalman-Bucy filter I

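McKean-Vlasov representation of the Kalman-Bucy filter for continuous-time signal and forward models:

$$\mathrm{d}X_{t|t} = AX_{t|t}\,\mathrm{d}t + Q^{1/2}\,\mathrm{d}W_t - K_t\left(\tfrac{1}{2}\left(HX_{t|t} + H\bar x_{t|t}\right)\mathrm{d}t - \mathrm{d}Y_t\right)$$

with $\bar x_{t|t} = \mathbb{E}[X_{t|t}]$, $K_t = P_{t|t}H^{\mathrm T}$, and $P_{t|t} = \mathbb{E}[(X_{t|t} - \bar x_{t|t})(X_{t|t} - \bar x_{t|t})^{\mathrm T}]$.

Monte Carlo approximation: for $i = 1,\ldots,M$

$$\mathrm{d}x^i_{t|t} = Ax^i_{t|t}\,\mathrm{d}t + Q^{1/2}\,\mathrm{d}W^i_t - K^M_t\left(\tfrac{1}{2}\left(Hx^i_{t|t} + H\bar x^M_{t|t}\right)\mathrm{d}t - \mathrm{d}Y_t\right)$$

with

$$\bar x^M_{t|t} = \frac{1}{M}\sum_{i=1}^M x^i_{t|t}, \qquad P^M_{t|t} = \frac{1}{M-1}\sum_{i=1}^M (x^i_{t|t} - \bar x^M_{t|t})(x^i_{t|t} - \bar x^M_{t|t})^{\mathrm T}$$

and $K^M_t = P^M_{t|t}H^{\mathrm T}$.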

Ensemble Kalman-Bucy filter II

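Extension to continuous-time nonlinear signal and forward models: for $i = 1,\ldots,M$

$$\mathrm{d}x^i_{t|t} = f(x^i_{t|t})\,\mathrm{d}t + Q^{1/2}\,\mathrm{d}W^i_t - K^M_t\left(\tfrac{1}{2}\left(h(x^i_{t|t}) + \bar h^M_{t|t}\right)\mathrm{d}t - \mathrm{d}Y_t\right)$$

with

$$\bar h^M_{t|t} = \frac{1}{M}\sum_{i=1}^M h(x^i_{t|t})$$

and

$$K^M_t = \frac{1}{M-1}\sum_{i=1}^M (x^i_{t|t} - \bar x^M_{t|t})(h(x^i_{t|t}) - \bar h^M_{t|t})^{\mathrm T} .$$

Alternative formulation (with independent Brownian motions $V^i_t$ perturbing the predicted observations):

$$\mathrm{d}x^i_{t|t} = f(x^i_{t|t})\,\mathrm{d}t + Q^{1/2}\,\mathrm{d}W^i_t - K^M_t\left(h(x^i_{t|t})\,\mathrm{d}t + \mathrm{d}V^i_t - \mathrm{d}Y_t\right)$$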
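A scalar sketch of the first formulation with an Euler-Maruyama time step and unit measurement noise. The drift `f`, the forward map `h`, and all parameter values are hypothetical; with the linear choices $f(x) = -x$, $h(x) = x$ and $Q = 1$ the gain reduces to $P^M_{t|t}H^{\mathrm T}$, and the ensemble variance should fluctuate around the Kalman-Bucy steady state $\sqrt{2} - 1$ (up to $O(\Delta t)$ and finite-$M$ effects).

```python
import math
import random

def enkbf(f, h, q, x_true0, ensemble, dt, n_steps, seed=4):
    """Ensemble Kalman-Bucy filter for a scalar state; returns the ensemble-variance history."""
    rng = random.Random(seed)
    x = list(ensemble)
    M = len(x)
    x_true, variances = x_true0, []
    for _ in range(n_steps):
        # synthetic truth and observation increment dY = h(x) dt + dV
        x_true += f(x_true) * dt + math.sqrt(q * dt) * rng.gauss(0.0, 1.0)
        dY = h(x_true) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        hx = [h(xi) for xi in x]
        x_mean, h_mean = sum(x) / M, sum(hx) / M
        # K^M_t: empirical cross-covariance between x and h(x)
        K = sum((xi - x_mean) * (hi - h_mean) for xi, hi in zip(x, hx)) / (M - 1)
        x = [xi + f(xi) * dt + math.sqrt(q * dt) * rng.gauss(0.0, 1.0)
             - K * (0.5 * (hi + h_mean) * dt - dY)
             for xi, hi in zip(x, hx)]
        m = sum(x) / M
        variances.append(sum((xi - m) ** 2 for xi in x) / (M - 1))
    return x, variances

rng0 = random.Random(5)
init = [rng0.gauss(0.0, 1.0) for _ in range(100)]
ens, variances = enkbf(lambda x: -x, lambda x: x, 1.0, 0.5, init, dt=0.01, n_steps=2000)
late = variances[1000:]                      # discard the initial transient
print(sum(late) / len(late), math.sqrt(2.0) - 1.0)
```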

Ensemble Kalman-Bucy smoother

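McKean-Vlasov representation of the Kalman smoother:

$$\mathrm{d}X_{t|T} = AX_{t|T}\,\mathrm{d}t + Q^{1/2}\,\mathrm{d}W_t + QP_{t|t}^{-1}(X_{t|T} - \bar x_{t|t})\,\mathrm{d}t$$

with $X_{T|T} \sim \pi_{T|T}$.

Monte Carlo approximation: for $i = 1,\ldots,M$

$$\mathrm{d}x^i_{t|T} = Ax^i_{t|T}\,\mathrm{d}t + Q^{1/2}\,\mathrm{d}W^i_t + Q(P^M_{t|t})^{-1}(x^i_{t|T} - \bar x^M_{t|t})\,\mathrm{d}t$$

with $x^i_{T|T}$ given by the Monte Carlo approximation to the Kalman-Bucy filter.

Ensemble Kalman-Bucy smoother:

$$\mathrm{d}x^i_{t|T} = f(x^i_{t|T})\,\mathrm{d}t + Q^{1/2}\,\mathrm{d}W^i_t + Q(P^M_{t|t})^{-1}(x^i_{t|T} - \bar x^M_{t|t})\,\mathrm{d}t$$

with $x^i_{T|T}$ at $t = T$ given by the ensemble Kalman-Bucy filter.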

Ensemble Kalman filter I

Discrete-time observations

$$y_n = Hx_n + R^{1/2}\,\Sigma_n,$$

$\Sigma_n \sim \mathrm{N}(0,I)$, $n = 1,\ldots,N$.

Required is an update from the forecast

$$X_f := X_{t_n|t_{n-1}}$$

to the analysis

$$X_a := X_{t_n|t_n}$$

at time $t_n$.

Ensemble Kalman filter II

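Notation:

mean: $\bar x_f := \mathbb{E}[X_f]$, $\bar x_a := \mathbb{E}[X_a]$

deviation: $\Delta X_f := X_f - \bar x_f$, $\Delta X_a := X_a - \bar x_a$

covariance matrix: $P_f := \mathbb{E}[\Delta X_f\,\Delta X_f^{\mathrm T}]$, $P_a := \mathbb{E}[\Delta X_a\,\Delta X_a^{\mathrm T}]$

The ensemble Kalman filter (EnKF) produces $X_a$ such that

mean update: $\bar x_a = \bar x_f - K(H\bar x_f - y_n)$

covariance update: $P_a = P_f - KHP_f$

with Kalman gain matrix

$$K = P_fH^{\mathrm T}(HP_fH^{\mathrm T} + R)^{-1} .$$

Remarks. (i) Neither $X_f$ nor $X_a$ need to be Gaussian random variables. (ii) The mean and covariance update conditions above do not determine $X_a$ uniquely.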

Ensemble Kalman filter III

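Implementation:

Forecast ensemble $x^i_f$, $i = 1,\ldots,M$:

empirical mean: $\bar x^M_f = \frac{1}{M}\sum_{i=1}^M x^i_f$

empirical covariance matrix: $P^M_f := \frac{1}{M-1}\sum_{i=1}^M x^i_f\,(x^i_f - \bar x^M_f)^{\mathrm T}$ (equal to the usual form with $x^i_f - \bar x^M_f$ as first factor, since the deviations sum to zero)

Kalman gain matrix: $K^M := P^M_fH^{\mathrm T}(HP^M_fH^{\mathrm T} + R)^{-1}$

Stochastic EnKF:

$$x^j_a = x^j_f - K^M\left(Hx^j_f + \eta^j - y_n\right), \qquad \eta^j \sim \mathrm{N}(0,R),$$

$j = 1,\ldots,M$.

Remark. There are many other variants of the EnKF.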
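A scalar sketch of the stochastic EnKF analysis step, assuming $H = 1$ and hypothetical numbers throughout. For a large Gaussian forecast ensemble the analysis mean and variance should approach the Kalman values; with forecast $\mathrm{N}(1, 2)$, $R = 1$ and $y_n = 3$ these are $K = 2/3$, $\bar x_a = 7/3$ and $P_a = 2/3$.

```python
import random

def stochastic_enkf(xf, y, R, rng):
    """Stochastic EnKF analysis for a scalar state observed directly (H = 1)."""
    M = len(xf)
    mean_f = sum(xf) / M
    P_f = sum(x * (x - mean_f) for x in xf) / (M - 1)   # empirical covariance as above
    K = P_f / (P_f + R)                                  # K^M = P H^T (H P H^T + R)^{-1}
    # each member sees its own perturbed observation via eta^j ~ N(0, R)
    return [x - K * (x + rng.gauss(0.0, R ** 0.5) - y) for x in xf]

rng = random.Random(6)
xf = [rng.gauss(1.0, 2.0 ** 0.5) for _ in range(20000)]
xa = stochastic_enkf(xf, y=3.0, R=1.0, rng=rng)
mean_a = sum(xa) / len(xa)
var_a = sum((x - mean_a) ** 2 for x in xa) / (len(xa) - 1)
print(mean_a, var_a)   # compare with 7/3 and 2/3
```

The perturbations $\eta^j$ are what make the analysis spread match $P_a$ on average: $(1-K)^2P_f + K^2R = P_f - KP_f$ for the optimal gain.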

Ensemble transform filter I

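Rewrite of the stochastic EnKF:

$$\begin{aligned}
x^j_a &= x^j_f - \sum_{i=1}^M x^i_f\,\frac{1}{M-1}\left\{(x^i_f - \bar x^M_f)^{\mathrm T}H^{\mathrm T}(HP^M_fH^{\mathrm T} + R)^{-1}(Hx^j_f + \eta^j - y_n)\right\} \\
&= x^j_f - \sum_{i=1}^M x^i_f\,s_{ij} \\
&= \sum_{i=1}^M x^i_f\,(\delta_{ij} - s_{ij}) \qquad (\delta_{ij} \text{ the Kronecker delta}) \\
&= \sum_{i=1}^M x^i_f\,d_{ij} .
\end{aligned}$$

Remark. Different EnKF formulations lead to different $d_{ij}$'s. But they all satisfy

$$\sum_{i=1}^M d_{ij} = 1$$

(for the stochastic EnKF this follows from $\sum_i s_{ij} = 0$, since the deviations $x^i_f - \bar x^M_f$ sum to zero).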

Ensemble transform filter II

Definition. The class of (linear) ensemble transform filters is defined by

$$x^j_a = \sum_{i=1}^M x^i_f\,d_{ij}$$

for appropriate coefficients $d_{ij}$ satisfying

$$\sum_{i=1}^M d_{ij} = 1 .$$

Remark. Define

$$w_i := \sum_{j=1}^M d_{ij}$$

and note that

$$\bar x^M_a = \frac{1}{M}\sum_{j=1}^M x^j_a = \frac{1}{M}\sum_{i,j=1}^M x^i_f\,d_{ij} = \frac{1}{M}\sum_{i=1}^M w_i\,x^i_f .$$

Beyond Gaussianity I

(i) For an ensemble transform filter to be consistent, it should hold that

$$w_i \propto \exp\left(-\tfrac{1}{2}(h(x^i_f) - y_n)^{\mathrm T}R^{-1}(h(x^i_f) - y_n)\right) \quad \text{(importance weights)}$$

subject to $\sum_{i=1}^M w_i = M$.

(ii) Absolute continuity of the posterior measure with respect to the prior measure suggests that any $x^j_a$ should be in the convex hull formed by the prior ensemble $\{x^i_f\}$.

This holds provided $d_{ij} \ge 0$ and $\sum_i d_{ij} = 1$ for all $i, j = 1,\ldots,M$.

Summary. A consistent ensemble transform filter should satisfy

$$d_{ij} \ge 0, \qquad \sum_{i=1}^M d_{ij} = 1, \qquad \sum_{j=1}^M d_{ij} = w_i \quad \text{(importance weights)}$$

Beyond Gaussianity II

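The conditions

$$d_{ij} \ge 0, \qquad \sum_{i=1}^M d_{ij} = 1, \qquad \sum_{j=1}^M d_{ij} = w_i \quad \text{(importance weights)}$$

do not uniquely determine the coefficients $d_{ij}$.

The ensemble transform particle filter (ETPF) is based on

$$\{d_{ij}\} = \arg\max \sum_{i=1}^M (x^i_a - \bar x^M_a)^{\mathrm T}(x^i_f - \bar x^M_f)$$

subject to the constraints stated above and

$$x^j_a := \sum_{i=1}^M x^i_f\,d_{ij} .$$

Remark. This is equivalent to a discrete optimal transport problem: minimize $\sum_{i,j} d_{ij}\,\|x^i_f - x^j_f\|^2$ subject to the same constraints.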
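For a scalar state the optimal transport problem has a simple solution: with squared-distance cost on the real line the optimal coupling is the monotone one, which the north-west corner rule produces once the particles are sorted. The sketch below relies on that one-dimensional shortcut (a general state dimension needs a linear-programming or optimal-transport solver instead); `importance_weights` and `etpf_1d` are hypothetical helper names and $h(x) = x$ is assumed.

```python
import math

def importance_weights(xf, y, R):
    """w_i proportional to exp(-(x_i - y)^2 / (2R)), normalized to sum to M (h(x) = x)."""
    logw = [-0.5 * (x - y) ** 2 / R for x in xf]
    shift = max(logw)
    raw = [math.exp(v - shift) for v in logw]
    M = len(xf)
    return [M * r / sum(raw) for r in raw]

def etpf_1d(xf, w):
    """ETPF analysis for scalar particles via the monotone (north-west corner) coupling."""
    M = len(xf)
    order = sorted(range(M), key=lambda k: xf[k])
    a = [w[k] / M for k in order]      # source masses w_i / M, in sorted order
    b = [1.0 / M] * M                  # target masses 1 / M, in sorted order
    d = [[0.0] * M for _ in range(M)]
    i = j = 0
    while i < M and j < M:
        m = min(a[i], b[j])
        d[order[i]][order[j]] += M * m  # coupling T_ij = d_ij / M
        a[i] -= m
        b[j] -= m
        if a[i] <= b[j]:                # source i exhausted: move to the next source
            i += 1
        else:                           # target j filled: move to the next target
            j += 1
    xa = [sum(xf[i] * d[i][j] for i in range(M)) for j in range(M)]
    return xa, d

xa, d = etpf_1d([0.0, 1.0, 2.0], [0.0, 1.5, 1.5])
print(xa)   # approximately [1.0, 1.5, 2.0]
```

By construction $d_{ij} \ge 0$, the column sums equal one and the row sums equal $w_i$, so each analysis particle lies in the convex hull of the forecast ensemble.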

Beyond Gaussianity III

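Lorenz-63 model, first component observed infrequently ($\Delta t = 0.12$) and with large measurement noise ($R = 8$):

Figure: RMSEs for various second-order accurate LETFs (linear ensemble transform filters) compared to the ETPF, the ESRF (ensemble square root filter), and the SIR PF (sequential importance resampling particle filter) as a function of the sample size $M$.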

Stability and accuracy

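Data (scalar) at time $t_n$: $y_n \sim \mathrm{N}(y_{\rm true}, R)$

Analysis ensemble at time $t_n$: $\{y^i_{n|n}\}_{i=1}^M$ with ensemble mean $\bar y_{n|n}$

RMS error:

$$\mathrm{RMSE} := \left(\frac{1}{N}\sum_{n=1}^N (y_n - \bar y_{n|n})^2\right)^{1/2} < R^{1/2}$$

Ensemble spread:

$$\mathrm{VAR} := \frac{1}{N}\sum_{n=1}^N \frac{1}{M}\sum_{i=1}^M (y^i_{n|n} - \bar y_{n|n})^2 < R$$

Calibration and sharpness (continuous ranked probability score):

$$\mathrm{CRPS} := \frac{1}{N}\sum_{n=1}^N \int \left(F_{y_n}(y) - F_{\{y^i_{n|n}\}}(y)\right)^2 \mathrm{d}y,$$

where $F_{y_n}$ is the step function jumping from 0 to 1 at the observation $y_n$ and $F_{\{y^i_{n|n}\}}$ is the empirical cumulative distribution function of the analysis ensemble.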
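A sketch of the three diagnostics for scalar data, assuming the analysis ensembles are stored as a list of lists (`analyses[n][i]` holds $y^i_{n|n}$, a hypothetical layout). The CRPS integral is evaluated with its equivalent closed form for an empirical ensemble, $\frac{1}{M}\sum_i |y^i - y_n| - \frac{1}{2M^2}\sum_{i,j}|y^i - y^j|$.

```python
import math

def rmse(obs, analyses):
    """Root mean square error between observations y_n and ensemble means."""
    errs = [(y - sum(ens) / len(ens)) ** 2 for y, ens in zip(obs, analyses)]
    return math.sqrt(sum(errs) / len(errs))

def spread(analyses):
    """Time-averaged ensemble variance (1/N) sum_n (1/M) sum_i (y^i - mean)^2."""
    total = 0.0
    for ens in analyses:
        m = sum(ens) / len(ens)
        total += sum((v - m) ** 2 for v in ens) / len(ens)
    return total / len(analyses)

def crps_ensemble(y, ens):
    """CRPS of the empirical ensemble CDF against a step function at the observation y."""
    M = len(ens)
    term1 = sum(abs(v - y) for v in ens) / M
    term2 = sum(abs(u - v) for u in ens for v in ens) / (2.0 * M * M)
    return term1 - term2

def crps(obs, analyses):
    """Time-averaged CRPS."""
    return sum(crps_ensemble(y, ens) for y, ens in zip(obs, analyses)) / len(obs)

obs = [1.0, 0.0]
analyses = [[0.0, 2.0], [1.0, 1.0]]
print(rmse(obs, analyses), spread(analyses), crps(obs, analyses))   # about 0.7071, 0.5, 0.75
```

The second toy ensemble has zero spread, so its CRPS reduces to the absolute error $|1 - 0| = 1$; a spread-out ensemble centred on the observation scores better.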

Part 4. Particle filters for high-dimensional systems

Curse of dimensionality

Importance sampling leads to a weighted particle approximation of the posterior measure:

$$Q^* \approx \frac{1}{M}\sum_{i=1}^M w_i\,\delta(z - z_i)$$

with $z_i = Z_i(\omega) \sim P$ and

$$w_i = L(z_i) := \frac{l(y|z_i)}{l(y)} .$$

It holds that

$$\rho := \frac{P[L]^2}{P[L^2]} = \frac{1}{P[L^2]} \le e^{-D(Q^*|P)},$$

which scales like $\rho \approx C e^{-cN_y}$, $c > 0$, in case of $N_y$ independent observations, and the effective sample size (see Part 1) decreases exponentially fast as $N_y \gg 1$.
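A small experiment illustrating this decay, assuming a hypothetical product model in which each of $N_y$ independent components has prior $\mathrm{N}(0,1)$ and is observed directly with unit noise and observed value 0. Each component then contributes a factor $1/P[L^2] = \sqrt{3}/2$, so $\rho = (\sqrt{3}/2)^{N_y}$, and the estimated $M_{\rm eff}/M$ should follow this decay (with growing Monte Carlo error as $\rho$ becomes tiny).

```python
import math
import random

def ess_fraction(n_obs, M=2000, seed=8):
    """Estimated M_eff / M for importance sampling with the prior as proposal."""
    rng = random.Random(seed)
    logw = []
    for _ in range(M):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]     # z_i ~ P = N(0, I)
        logw.append(-0.5 * sum(zk * zk for zk in z))         # log l(y|z_i) for y = 0, unit noise
    shift = max(logw)
    raw = [math.exp(v - shift) for v in logw]
    total = sum(raw)
    w = [M * r / total for r in raw]                        # sum_i w_i = M
    return (M / (sum(wi * wi for wi in w) / M)) / M

for n_obs in (1, 5, 20, 50):
    print(n_obs, ess_fraction(n_obs), (math.sqrt(3.0) / 2.0) ** n_obs)
```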

Currently available remedies

Available approaches to beat the curse of dimensionality include:

• variational data assimilation
• localization
• ensemble inflation
• hybrid filters
• alternative proposal steps

Variational data assimilation I

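Weak constraint 4DVar data assimilation:

$$x^{\rm MAP}_{0:N|N} = \arg\min_{x_{0:N}} L(x_{0:N})$$

with

$$L(x_{0:N}) = \frac{1}{2}(x_0 - \bar x_0)^{\mathrm T}P_0^{-1}(x_0 - \bar x_0) + \frac{1}{2}\sum_{n=1}^N \left\{a_n^{\mathrm T}Q^{-1}a_n + b_n^{\mathrm T}R^{-1}b_n\right\},$$

where

$$a_n := x_n - M(x_{n-1}), \qquad b_n := h(x_n) - y_n .$$

Remark. The Laplace approximation requires the Hessian of $L$ at $x^{\rm MAP}_{0:N|N}$; an approximation of it can be obtained as a byproduct of quasi-Newton methods.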
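A minimal sketch for a scalar state, assuming a hypothetical linear model $M(x) = 0.9x$, direct observations $h(x) = x$, and plain gradient descent with the analytic gradient of $L$ (an illustrative choice; operational systems use adjoint models with quasi-Newton or Gauss-Newton solvers). For the parameter values below the Hessian's eigenvalues lie between 1 and roughly 38, so the fixed step size 0.02 converges.

```python
def weak_4dvar(y, x0_bar, P0, Q, R, model, dmodel, step=0.02, iters=3000):
    """Minimize the weak-constraint 4DVar cost for a scalar state with h(x) = x.
    y[n-1] is the observation at time n = 1..N."""
    N = len(y)
    x = [x0_bar] * (N + 1)                                     # initial guess for x_{0:N}
    for _ in range(iters):
        a = [None] + [x[n] - model(x[n - 1]) for n in range(1, N + 1)]   # model errors a_n
        b = [None] + [x[n] - y[n - 1] for n in range(1, N + 1)]          # misfits b_n
        g = [0.0] * (N + 1)
        g[0] = (x[0] - x0_bar) / P0                            # background term
        for n in range(1, N + 1):
            g[n] += a[n] / Q + b[n] / R                        # dependence of a_n, b_n on x_n
            g[n - 1] -= a[n] * dmodel(x[n - 1]) / Q            # a_n also depends on x_{n-1}
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, g

def cost(x, y, x0_bar, P0, Q, R, model):
    """L(x_{0:N}) for the scalar setting above."""
    val = 0.5 * (x[0] - x0_bar) ** 2 / P0
    for n in range(1, len(x)):
        val += 0.5 * (x[n] - model(x[n - 1])) ** 2 / Q + 0.5 * (x[n] - y[n - 1]) ** 2 / R
    return val

y = [0.8, 0.5, 0.9, 0.3, 0.1]
x_map, grad = weak_4dvar(y, x0_bar=1.0, P0=1.0, Q=0.1, R=0.5,
                         model=lambda x: 0.9 * x, dmodel=lambda x: 0.9)
print([round(v, 3) for v in x_map], max(abs(gi) for gi in grad))
```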

Variational data assimilation II

The Randomized Maximum Likelihood (RML) method is one method that combines ensemble and variational approaches.

Idea. Perturb the cost functional $L(x_{0:N})$ in the following manner:

Initial conditions: $\bar x_0 + \xi^i_0$, $\xi^i_0 \sim N(0, P_0)$

Model errors: $a_n - \xi^i_n$, $\xi^i_n \sim N(0, Q)$

Measurement errors: $b_n + \eta^i_n$, $\eta^i_n \sim N(0, R)$

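This leads to

$$x^i_{0:N|N} = \arg\min L_i(x_{0:N})$$

with $a_n$ and $b_n$ as defined before and

$$L_i(x_{0:N}) = \frac{1}{2}(x_0 - \bar x_0 - \xi^i_0)^{\mathrm T} P_0^{-1}(x_0 - \bar x_0 - \xi^i_0) + \frac{1}{2}\sum_{n=1}^{N}(a_n - \xi^i_n)^{\mathrm T} Q^{-1}(a_n - \xi^i_n) + \frac{1}{2}\sum_{n=1}^{N}(b_n + \eta^i_n)^{\mathrm T} R^{-1}(b_n + \eta^i_n).$$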

Remark. RML sampling is exact for linear $M$ and $h$.
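The exactness claim can be checked numerically. Assume a scalar linear model $x_n = m x_{n-1}$ plus model error and $h(x) = x$ (hypothetical choices). Then every $L_i$ is quadratic with the same Hessian $H$, and only the linear term depends on $(\xi^i_0, \xi^i_n, \eta^i_n)$. The sketch below (NumPy assumed, illustrative names) draws the perturbations, solves for all members at once, and compares the ensemble with the exact posterior mean $H^{-1}g_0$ and covariance $H^{-1}$.

```python
import numpy as np


def build_hessian(p0, m, q, r, n_obs):
    """Hessian of L_i; identical for every RML member."""
    H = np.zeros((n_obs + 1, n_obs + 1))
    H[0, 0] = 1.0 / p0
    for n in range(1, n_obs + 1):
        H[n, n] += 1.0 / q + 1.0 / r
        H[n - 1, n - 1] += m * m / q
        H[n, n - 1] -= m / q
        H[n - 1, n] -= m / q
    return H


def rml_ensemble(x_bar, p0, m, q, r, y, n_members, rng):
    """Minimise the perturbed costs L_i for all members; columns are samples."""
    n_obs = len(y)
    H = build_hessian(p0, m, q, r, n_obs)
    xi0 = rng.normal(0.0, np.sqrt(p0), n_members)
    xi = rng.normal(0.0, np.sqrt(q), (n_obs, n_members))
    eta = rng.normal(0.0, np.sqrt(r), (n_obs, n_members))
    G = np.zeros((n_obs + 1, n_members))
    G[0] += (x_bar + xi0) / p0          # from (x_0 - x_bar - xi_0)^2 / p0
    G[1:] += xi / q                     # from (a_n - xi_n)^2 / q, w.r.t. x_n
    G[:-1] -= m * xi / q                # ... and w.r.t. x_{n-1}
    G[1:] += (y[:, None] - eta) / r     # from (b_n + eta_n)^2 / r
    return np.linalg.solve(H, G), H


rng = np.random.default_rng(42)
x_bar, p0, m, q, r = 0.0, 1.0, 0.9, 0.1, 0.2
y = np.array([0.5, 0.8, 1.0])           # hypothetical observations
samples, H = rml_ensemble(x_bar, p0, m, q, r, y, 20000, rng)
g0 = np.concatenate(([x_bar / p0], y / r))   # unperturbed linear term
post_mean = np.linalg.solve(H, g0)            # MAP = posterior mean here
post_cov = np.linalg.inv(H)                   # exact posterior covariance
```

The ensemble mean and sample covariance agree with `post_mean` and `post_cov` up to sampling error. For nonlinear $M$ or $h$ the same construction only yields approximate samples.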


Localization I

States $x$ are spatially dependent. To emphasise this aspect, we temporarily switch to the notation:

$$x_n \in C(\mathbb{R}^3,\mathbb{R}) \;\to\; u(x, t_n) \in \mathbb{R}, \qquad x \in \mathbb{R}^3.$$

Observations at locations $x_l \in \mathbb{R}^3$:

$$y_{n,l} = u^{\mathrm{truth}}(x_l, t_n) + R_l^{1/2}\,\xi_{n,l}, \qquad \xi_{n,l} \sim N(0, I).$$

Standard EnKF/ensemble transform filters lead to

$$u^j_a(x) = \sum_{i=1}^{M} u^i_f(x)\,d_{ij} \qquad \forall x \in \mathbb{R}^3.$$

Two concepts of localization:
• domain or B-localization
• observation or R-localization


Localization II

R-localization for EnKF/ensemble transform filters:

$$u^j_a(x) = \sum_{i=1}^{M} u^i_f(x)\,d_{ij}(x) \qquad \forall x \in \mathbb{R}^3.$$

Spatially dependent coefficients $d_{ij}(x)$ depend only on observations in the vicinity of $x$. This is achieved through

$$\frac{1}{R_l(x)} := \frac{\rho(x - x_l)}{R_l}$$

with $\rho(0) = 1$ and $\rho(x) \to 0$ as $|x| \to \infty$. E.g. importance weights:

$$w_i(x) \propto \exp\left(-\sum_l \frac{1}{2R_l(x)}\left(y_{n,l} - u^i(x_l, t_n)\right)^2\right)$$

for updating $u^i_f(x)$, $i = 1,\ldots,M$.
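A possible implementation of these R-localized weights on a one-dimensional grid (instead of $\mathbb{R}^3$) is sketched below. It assumes NumPy, a Gaussian taper $\rho(r) = \exp(-r^2/(2\ell^2))$ with a hypothetical length scale $\ell$, and equal error variances $R_l$; the function name is illustrative. Row $k$ of the returned array holds the normalized weights $w_i(x_k)$.

```python
import numpy as np


def localized_weights(grid, obs_loc, obs, u_at_obs, obs_var, length_scale):
    """R-localised importance weights w_i(x) at every grid point x.

    grid: (Nx,) positions; obs_loc, obs: (Ny,); u_at_obs: (M, Ny) forecast
    values u^i(x_l); obs_var: common R_l (assumption). Returns (Nx, M).
    """
    dist = grid[:, None] - obs_loc[None, :]             # (Nx, Ny)
    taper = np.exp(-0.5 * (dist / length_scale) ** 2)   # rho(x - x_l), rho(0) = 1
    inv_r_loc = taper / obs_var                         # 1 / R_l(x)
    sq_innov = (obs[None, :] - u_at_obs) ** 2           # (M, Ny)
    log_w = -0.5 * inv_r_loc @ sq_innov.T               # (Nx, M)
    log_w -= log_w.max(axis=1, keepdims=True)           # stabilise exp
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)             # normalise per grid point


rng = np.random.default_rng(3)
grid = np.linspace(0.0, 10.0, 101)
obs_idx = np.array([20])                     # one observation at x = grid[20]
u_f = rng.standard_normal((20, grid.size))   # hypothetical ensemble, M = 20
w = localized_weights(grid, grid[obs_idx], np.array([1.0]), u_f[:, obs_idx],
                      obs_var=0.1, length_scale=1.0)
local_mean = np.sum(w * u_f.T, axis=1)       # localised weighted mean per grid point
```

Far from the observation the taper vanishes, $1/R_l(x) \to 0$, and the weights revert to $1/M$, so the ensemble is left essentially unchanged there.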


Ensemble inflation

Multiplicative inflation:

$$x^i_f \to x^i_f + \alpha(x^i_f - \bar x^M_f), \qquad \alpha > 0,$$

where $\bar x^M_f$ denotes the forecast ensemble mean.

Equivalent to a forward Euler discretization of

$$\frac{d}{dt}x^i = x^i - \bar x^M, \qquad i = 1,\ldots,M,$$

with step-size $\alpha > 0$.

Statistically equivalent to the Euler-Maruyama discretization of the SDE

$$dX = P\,dW, \qquad P = E[(X - \bar x)(X - \bar x)^{\mathrm T}], \quad \bar x = E[X],$$

with step-size $\alpha > 0$ as the ensemble size $M \to \infty$.

Compare to Brownian motion where P is replaced by a constant matrix Q.
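The following NumPy sketch (illustrative function name `inflate`) applies multiplicative inflation to an ensemble stored row-wise. Because the ensemble mean is invariant under the flow $\dot x^i = x^i - \bar x^M$, one Euler step leaves the mean unchanged, while every deviation from the mean is stretched by $1+\alpha$, so the sample covariance is multiplied by $(1+\alpha)^2$.

```python
import numpy as np


def inflate(ensemble, alpha):
    """Multiplicative inflation x^i -> x^i + alpha (x^i - mean); rows are members.

    Equals one forward Euler step of size alpha for dx^i/dt = x^i - mean.
    """
    mean = ensemble.mean(axis=0)
    return ensemble + alpha * (ensemble - mean)


rng = np.random.default_rng(0)
x_f = rng.standard_normal((10, 3))   # hypothetical ensemble: M = 10 members, 3 variables
alpha = 0.1
x_infl = inflate(x_f, alpha)
```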


Hybrid filter I

Aim: Bridge EnKF and particle filters in an adaptive manner.

Idea: Decompose the likelihood function $l(y_n|x_f)$ as

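$$l(y_n|x_f) = l(y_n|x_f)^{\alpha}\, l(y_n|x_f)^{1-\alpha} = l_1(y_n|x_f)\, l_2(y_n|x_f)$$

with $\alpha \in [0,1]$. Bayes’ formula becomes

$$\frac{dQ_1}{dP}(x_f) \propto l_1(y_n|x_f), \qquad \frac{dQ_2}{dQ_1}(x_f) \propto l_2(y_n|x_f).$$

It holds that $Q^\ast = Q_2$.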
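For a scalar Gaussian prior $N(m, v)$ and Gaussian likelihood $N(y; x, r)$ the identity $Q^\ast = Q_2$ can be checked directly: $l(y|x)^{\alpha}$ is proportional to a Gaussian likelihood with variance $r/\alpha$, so the two-stage update amounts to two Kalman updates with variances $r/\alpha$ and $r/(1-\alpha)$. This is a sketch under these assumptions, valid for $0 < \alpha < 1$; the function names are illustrative.

```python
def kalman_update(mean, var, y, r):
    """Scalar Bayes update of N(mean, var) with likelihood N(y; x, r)."""
    gain = var / (var + r)
    return mean + gain * (y - mean), (1.0 - gain) * var


def tempered_two_stage(mean, var, y, r, alpha):
    """Update with l^alpha, then with l^(1 - alpha).

    For a Gaussian likelihood, l(y|x)^alpha is proportional to N(y; x, r / alpha).
    """
    m1, v1 = kalman_update(mean, var, y, r / alpha)
    return kalman_update(m1, v1, y, r / (1.0 - alpha))


m_single, v_single = kalman_update(0.0, 2.0, 1.0, 0.5)
m_two, v_two = tempered_two_stage(0.0, 2.0, 1.0, 0.5, alpha=0.3)
```

Both routes give posterior mean 0.8 and variance 0.4 (up to rounding), for any $\alpha \in (0,1)$.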


Hybrid filter II

Apply the ensemble transform particle filter to the first inference problem and the EnKF to the second (or vice versa).

Denote the filter coefficients by $d_{ij,1}$ and $d_{ij,2}$, respectively.

The resulting ensemble transform filter is of the form

$$x^j_a = \sum_{i=1}^{M}\sum_{k=1}^{M} x^i_f\, d_{ik,1}\, d_{kj,2} = \sum_{i=1}^{M} x^i_f\, d_{ij}$$

with

$$d_{ij} = \sum_{k=1}^{M} d_{ik,1}\, d_{kj,2}.$$
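In matrix form, with the forecast members stored as the columns of $X_f$, the update reads $X_a = X_f D$ with $D = D_1 D_2$. The NumPy sketch below checks that applying the two transforms in sequence equals applying $D$ once. The coefficient matrices are random placeholders with columns summing to one (an assumption made for illustration), not actual ETPF or EnKF coefficients; if both factors have unit column sums, so does $D$.

```python
import numpy as np


def compose_coefficients(d1, d2):
    """d_ij = sum_k d_ik,1 d_kj,2, i.e. the matrix product D = D1 D2."""
    return d1 @ d2


rng = np.random.default_rng(1)
M, nx = 5, 3
x_f = rng.standard_normal((nx, M))   # columns are forecast members x_f^i
d1 = rng.random((M, M))
d1 /= d1.sum(axis=0)                 # placeholder coefficients, columns sum to 1
d2 = rng.random((M, M))
d2 /= d2.sum(axis=0)
d = compose_coefficients(d1, d2)
x_a_two_stage = (x_f @ d1) @ d2      # x_a^j = sum_i x^i d_ij, applied twice
x_a_one_stage = x_f @ d
```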

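Question: How to choose the bridging parameter $\alpha \in [0,1]$? Currently used: the effective sample size

$$M_{\mathrm{eff}} = \frac{M}{\frac{1}{M}\sum_i w_i^2} \ge cM, \qquad w_i \propto l_1(y_n|x^i_f) = l(y_n|x^i_f)^{\alpha},$$

with $c < 1$ a given threshold value and $\sum_{i=1}^{M} w_i = M$.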
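One way to implement this criterion is to pick the largest $\alpha \in [0,1]$ with $M_{\mathrm{eff}} \ge cM$. Taking the largest such $\alpha$ and finding it by bisection are assumptions of this sketch; bisection is justified because $M_{\mathrm{eff}}$ is non-increasing in $\alpha$ for weights $w_i \propto l(y_n|x^i_f)^{\alpha}$. The inputs are the log-likelihood values $\log l(y_n|x^i_f)$; the function names are illustrative.

```python
import numpy as np


def normalized_ess(log_lik, alpha):
    """M_eff / M for w_i proportional to l(y|x_i)^alpha, scaled so sum_i w_i = M."""
    log_w = alpha * log_lik
    log_w = log_w - log_w.max()
    w = np.exp(log_w)
    w *= len(w) / w.sum()               # enforce sum_i w_i = M
    return 1.0 / np.mean(w ** 2)        # 1 / ((1/M) sum_i w_i^2)


def choose_alpha(log_lik, c=0.5, tol=1e-6):
    """Largest alpha in [0, 1] with M_eff >= c M, found by bisection."""
    if normalized_ess(log_lik, 1.0) >= c:
        return 1.0
    lo, hi = 0.0, 1.0                   # alpha = 0 always satisfies the criterion
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if normalized_ess(log_lik, mid) >= c:
            lo = mid
        else:
            hi = mid
    return lo


rng = np.random.default_rng(7)
log_lik = -0.5 * rng.standard_normal(50) ** 2 * 20.0   # hypothetical log-likelihoods
alpha = choose_alpha(log_lik, c=0.5)
```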


Hybrid filter III

Hybrid filter: $D := D_{\mathrm{ESRF}}(\alpha)\, D_{\mathrm{ETPF}}(1-\alpha)$.

Figure: RMSEs for hybrid ESRF (α = 0) and 2nd-order corrected LETF/ETPF (α = 1) as a function of the sample size, M.


Hybrid filter IV

Lorenz-96 model (a discretized nonlinear advection equation) with 40 grid points, every second one observed. Hybrid filter: $P := P_{\mathrm{LETKF}}(\alpha)\, P_{\mathrm{ETPF}}(1-\alpha)$ + localization.

Figure: RMSE for hybrid LETKF (α = 0) and 2nd-order corrected LETF/ETPF (α = 1).
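A minimal NumPy sketch of this test bed: the Lorenz-96 right-hand side, a fourth-order Runge-Kutta step and the "every second grid point" observation operator. The forcing $F = 8$ and the step size $\Delta t = 0.05$ are common choices assumed here, not values taken from the slide; the function names are illustrative.

```python
import numpy as np


def lorenz96_rhs(x, forcing=8.0):
    """dx_k/dt = (x_{k+1} - x_{k-2}) x_{k-1} - x_k + F, periodic in k (last axis)."""
    return ((np.roll(x, -1, axis=-1) - np.roll(x, 2, axis=-1))
            * np.roll(x, 1, axis=-1) - x + forcing)


def rk4_step(x, dt, forcing=8.0):
    """One classical fourth-order Runge-Kutta step."""
    k1 = lorenz96_rhs(x, forcing)
    k2 = lorenz96_rhs(x + 0.5 * dt * k1, forcing)
    k3 = lorenz96_rhs(x + 0.5 * dt * k2, forcing)
    k4 = lorenz96_rhs(x + dt * k3, forcing)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def observe_every_second(x):
    """Observation operator: grid points 0, 2, 4, ..."""
    return x[..., ::2]


n_grid, dt = 40, 0.05
rng = np.random.default_rng(0)
x = 8.0 + 0.01 * rng.standard_normal(n_grid)   # perturbed constant state x_k = F
for _ in range(200):
    x = rk4_step(x, dt)
y = observe_every_second(x)                     # 20 observed components
```

The constant state $x_k = F$ is an equilibrium (the right-hand side vanishes there), which gives a quick sanity check of the implementation.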


Alternative proposal steps I

Standard filter algorithms use the model dynamics

$$dX_t = f(X_t)\,dt + Q^{1/2}\,dW_t$$

to produce forecasts $x^i_f$ at time $t_n$ given an analysis $x^i_a$ at time $t_{n-1}$.

Alternatively, one can try to find controls $u^i_s$, $s \in [t_{n-1}, t_n]$, and use

$$dX_t = f(X_t)\,dt + Q^{1/2}u^i_t\,dt + Q^{1/2}\,dW_t$$

to produce forecasts $x^i_f$ at time $t_n$ given an analysis $x^i_a$ at time $t_{n-1}$.

Denote the resulting proposal distribution at $t_n$ by

$$q(x_f|u, x_a).$$
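A sketch of the controlled forecast step using the Euler-Maruyama method. It assumes NumPy, an ensemble stored row-wise, and user-supplied callables for the drift $f$ and the control $u$ (a hypothetical interface); setting $u \equiv 0$ recovers the standard forecast.

```python
import numpy as np


def controlled_forecast(x_a, f, u, sqrt_q, t0, t1, n_steps, rng):
    """Euler-Maruyama for dX = f(X) dt + Q^{1/2} u(t, X) dt + Q^{1/2} dW.

    x_a: (M, d) analysis ensemble at t0; f(x) and u(t, x) act row-wise on
    (M, d) arrays; sqrt_q: (d, d) matrix square root of Q.
    """
    dt = (t1 - t0) / n_steps
    x = np.array(x_a, dtype=float)
    t = t0
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), x.shape)          # Brownian increments
        x = x + f(x) * dt + (u(t, x) * dt + dw) @ sqrt_q.T  # row-wise Q^{1/2} v
        t += dt
    return x


rng = np.random.default_rng(0)
x_a = np.zeros((20000, 1))
x_f = controlled_forecast(x_a, lambda x: np.zeros_like(x),
                          lambda t, x: np.full_like(x, 0.5),   # hypothetical control
                          np.eye(1), 0.0, 1.0, 50, rng)
```

With $f = 0$, $Q = 1$ and the constant control $u \equiv 0.5$ on $[0, 1]$, the forecast ensemble has mean close to 0.5 and variance close to 1: the control shifts the proposal without changing its spread.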


Alternative proposal steps II

Target density:

$$\pi(y_n, x_f, x_a) \propto l(y_n|x_f)\,q(x_f|0,x_a)\,\pi_{n-1|n-1}(x_a) = \pi(x_f|x_a,y_n)\,\pi(y_n|x_a)\,\pi_{n-1|n-1}(x_a)$$

Proposal density:

$$q(x_f|u,x_a)\,\pi_{n-1|n-1}(x_a)$$

Importance sampling:

$$w_i \propto \frac{l(y_n|x^i_f)\,q(x^i_f|0,x^i_a)\,\pi_{n-1|n-1}(x^i_a)}{q(x^i_f|u^i,x^i_a)\,\pi_{n-1|n-1}(x^i_a)} \propto \frac{\pi(x^i_f|x^i_a,y_n)\,\pi(y_n|x^i_a)}{q(x^i_f|u^i,x^i_a)}$$


Alternative proposal steps III

Recall from Part 2:

$$\frac{dQ^u}{dP}\Big|_{W_{[t_{n-1},t_n]}} = \exp(Z^u_t) \;\Rightarrow\; \frac{q_f(\cdot\,|u,x_a)}{q_f(\cdot\,|0,x_a)}\Big|_{W_{[t_{n-1},t_n]}} = \exp(Z^u_t)$$

with

$$Z^u_t = \int_{t_{n-1}}^{t_n} u_s^{\mathrm T}\,dW_s + \frac{1}{2}\int_{t_{n-1}}^{t_n} |u_s|^2\,ds.$$

Optimal choice:

$$q(x^i_f|u^i,x^i_a) = \pi(x^i_f|x^i_a,y_n)$$

with importance weights $w_i \propto \pi(y_n|x^i_a)$.

I.e. an optimal control problem for finding $u^i_s$, $s \in [t_{n-1}, t_n]$, given $X_{t_{n-1}} = x^i_a$, and a change of measure at $t_{n-1}$ from $\pi_{n-1|n-1}$ to $\pi_{n-1|n}$.
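Combining a controlled Euler-Maruyama forecast with a discretized $Z^u$ gives log-weights $\log l(y_n|x^i_f) - Z^{u,i}$, i.e. the path-space version of the importance weights for the controlled proposal. The sketch assumes NumPy, evaluates $u$ at the left end of each step (Itô convention), and uses illustrative names; the increments $dW$ entering $Z^u$ are those that drive the controlled simulation.

```python
import numpy as np


def controlled_weights(x_a, f, u, sqrt_q, log_lik, t0, t1, n_steps, rng):
    """Controlled forecasts and normalised weights w_i ~ l(y|x_f^i) exp(-Z^u_i).

    Z^u = sum u^T dW + 0.5 sum |u|^2 dt along each simulated path.
    """
    dt = (t1 - t0) / n_steps
    x = np.array(x_a, dtype=float)
    z = np.zeros(x.shape[0])
    t = t0
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), x.shape)
        ut = u(t, x)                                 # control at left end point
        z += np.sum(ut * dw, axis=1) + 0.5 * np.sum(ut ** 2, axis=1) * dt
        x = x + f(x) * dt + (ut * dt + dw) @ sqrt_q.T
        t += dt
    log_w = log_lik(x) - z
    log_w -= log_w.max()
    w = np.exp(log_w)
    return x, w / w.sum()


rng = np.random.default_rng(1)
x_a = np.zeros((20000, 1))
zero_drift = lambda x: np.zeros_like(x)
flat_lik = lambda x: np.zeros(x.shape[0])           # hypothetical flat likelihood
x_f, w = controlled_weights(x_a, zero_drift, lambda t, x: np.full_like(x, 0.5),
                            np.eye(1), flat_lik, 0.0, 1.0, 50, rng)
weighted_mean = np.sum(w * x_f[:, 0])
```

With a flat likelihood the reweighted controlled ensemble should reproduce the uncontrolled forecast: for $f = 0$ and $x_a = 0$ the weighted mean is close to 0 even though the control $u \equiv 0.5$ shifts the unweighted ensemble towards 0.5.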


References

Peter Jan van Leeuwen, Yuan Cheng and Sebastian Reich, Frontiers in Applied Dynamical Systems: Reviews and Tutorials 2: Nonlinear Data Assimilation, Springer, 2015.

Nawinda Chustagulprom, Sebastian Reich, and Maria Reinhardt, A hybrid ensemble transform particle filter for nonlinear spatially extended dynamical systems, SIAM/ASA J UQ, 4, 2016, 592–608.

Paul Fearnhead and Hans R. Künsch, Particle Filters and Data Assimilation, arXiv:1709.04196, 2017.

Walter Acevedo, Jana de Wiljes, and Sebastian Reich, Second-order accurate ensemble transform particle filters, SIAM J. Sci. Comput., 39, 2017, A1834–A1850.

S. Agapiou, O. Papaspiliopoulos, D. Sanz-Alonso, A.M. Stuart, Importance sampling: Intrinsic dimension and computational cost, Statistical Science, 32, 2017, 405–431.

R.N. Bannister, A review of operational methods of variational and ensemble-variational data assimilation, Q.J. Royal Meteorol. Soc., 143, 2017, 607–633.
