
Mathematical Foundation of Data Assimilation

Sebastian Reich

Universität Potsdam/ University of Reading

RISDA, January 24th, 2018


Outline

Part 1. Foundation of Bayesian inference

Part 2. Filtering and Smoothing for State Space Models

Part 3. Ensemble Kalman filtering and smoothing

Part 4. Particle filters for high-dimensional systems


My Background

Electrical engineer and applied mathematician by training.

Research interests in:
- numerical analysis
- Hamiltonian and molecular dynamics
- computational fluid dynamics
- data assimilation

DFG-funded Collaborative Research Center on Data Assimilation (www.sfb1294.de):
- maximum funding period: 12 years
- 12 scientific projects
- 24 doctoral and postdoctoral positions
- schools, fellowships, etc.


Numerical Weather Prediction

- Model: highly nonlinear discretized partial differential equations
- Data: heterogeneous mix of ground-based, airborne, satellite and radar data
- 24/7 data assimilation service for optimal weather prediction


Bayesian inference

The three key ingredients of Bayesian inference:

- a prior measure over the variable of interest
- the likelihood of an observation given the variable of interest
- the posterior measure over the variable of interest conditioned on the given observation

Note: All variables are treated as random variables, in contrast to the frequentist approach to inference.


Bayes’ formula I

Random variable of interest Z with prior distribution/measure/density

Z ∼ P or Z ∼ π

The expectation (expected value) of a function $g(z)$ under $P$ is defined as

$$\mathbb{E}[g] = \int g(z)\,P(dz)$$

or

$$\mathbb{E}[g] = \int g(z)\,\pi(z)\,dz .$$

We also use the shorthand $\bar g$, $\pi[g]$, $P[g]$.


Bayes’ formula II

The likelihood characterizes the probability of observing y given z:

l(y|z)

Note: We assume for simplicity that $l$ is normalized, i.e. $\int l(y|z)\,dy = 1$.

Evidence of $y$ under the prior $P$:

$$l(y) = \int l(y|z)\,P(dz) = P[l(y|\cdot)] = \pi[l(y|\cdot)] .$$


Bayes’ formula III

Rules of conditional probabilities

π(z,y) = l(y|z)π(z) = π(z|y) l(y)

yield the posterior density

$$\pi(z|y) = \frac{\pi(z,y)}{l(y)} = \frac{l(y|z)\,\pi(z)}{l(y)} \propto l(y|z)\,\pi(z) .$$

Notation. We use the shorthand $\nu^*(z)$ for $\pi(z|y)$ or, more generally, in case of a posterior measure, $Q^*(dz)$.

Remark. The normalizing constant, i.e. the evidence $l(y)$, is important when comparing models, e.g. different prior distributions $P_\theta$.


Bayes’ formula IV

Bayes’ formula needs to be generalized when the prior is a measure.

Radon-Nikodym derivative

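$$\frac{dQ^*}{dP} = \frac{l(y|\cdot)}{l(y)} \propto l(y|\cdot)$$

of the posterior with respect to the prior measure.

In words: $Q^*$ is absolutely continuous with respect to $P$ with density $l(y|z)/l(y)$.

In equation form: $Q^* \ll P$ and

$$Q^*[g] = \int g(z)\,Q^*(dz) = \int g(z)\,\frac{dQ^*}{dP}(z)\,P(dz) = \int g(z)\,\frac{l(y|z)}{l(y)}\,P(dz) = \frac{1}{l(y)}\,P[g\,l(y|\cdot)] .$$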


Variational formulation

Kullback-Leibler divergence:

$$D(Q|Q^*) = \int \log\frac{dQ}{dQ^*}\,Q(dz) = Q\left[\log\frac{dQ}{dQ^*}\right] .$$

It holds that $D(Q|Q^*) > 0$ for all $Q \neq Q^*$.

Donsker-Varadhan principle:

$$-\log l(y) = \inf_{Q \ll P}\left\{ -Q[\log l(y|\cdot)] + D(Q|P) \right\}$$

with the infimum taken over all measures $Q$ which are absolutely continuous with respect to $P$. The infimum is attained at $Q = Q^*$.

F = − log l(y) is called the free energy.

Remark. −Q[log l(y|·)] is called the expected loss under Q.
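
The variational characterization above can be checked numerically on a finite state space, where all measures are probability vectors. The following is a minimal sketch under that assumption; the five-state prior, the likelihood values and the function name `free_energy` are illustrative choices, not part of the original slides.

```python
import numpy as np

def free_energy(q, prior, lik):
    """Variational free energy -Q[log l] + D(Q|P) on a finite state space."""
    mask = q > 0                                   # convention: 0 * log 0 = 0
    expected_loss = -np.sum(q[mask] * np.log(lik[mask]))
    kl = np.sum(q[mask] * np.log(q[mask] / prior[mask]))
    return expected_loss + kl

rng = np.random.default_rng(0)
prior = rng.dirichlet(np.ones(5))      # prior probabilities pi(z) (illustrative)
lik = rng.uniform(0.1, 1.0, size=5)    # likelihood values l(y|z) for one fixed y

evidence = np.sum(lik * prior)         # l(y) = P[l(y|.)]
posterior = lik * prior / evidence     # Bayes' formula

f_post = free_energy(posterior, prior, lik)
f_other = [free_energy(rng.dirichlet(np.ones(5)), prior, lik) for _ in range(1000)]
```

Here `f_post` should agree with `-log(evidence)` up to rounding, and no randomly drawn trial measure should achieve a smaller free energy.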


Machine learning vs data assimilation

Key element of both machine learning (ML) and DA:

joint probability : π(z,y) = l(y|z)π(z)

ML: the (effective) dimension of the data $y$ is much larger than the (effective) dimension of the parameters $z$ (big data)

DA: the (effective) dimension of $z$ is much larger than the (effective) dimension of the data $y$ (complex models)

In addition:
- ML addresses mostly static inference problems
- DA has an element of forgetting (not just learning)
- Both ML and DA lead to complex minimization and uncertainty quantification (UQ) problems


Computational approaches

Overview:

- distributional approximations (deterministic)
  - point estimators such as the MAP estimator $z^* := \arg\min_z V(z)$ with $V(z) := -\log \nu^*(z)$, leading to 3DVar and 4DVar from meteorology
  - variational Bayes (VB)
- Monte Carlo approximations (random)
  - Markov chain Monte Carlo (MCMC)
  - importance sampling (IS)


Variational Bayes

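Approximate the posterior $\nu^*$ by a Gaussian distribution

$$\nu(z) = (2\pi)^{-N_z/2}\,|P|^{-1/2}\,\exp\left(-\tfrac{1}{2}(z-\mu)^{\rm T} P^{-1}(z-\mu)\right)$$

with mean $\mu$ and covariance $P$ chosen such that the variational free energy

$$F(\nu) = -\nu[\log l(y|\cdot)] + D(\nu|\pi)$$

is minimised.

Critical points $(\mu^*, P^*)$ satisfy:

$$0 = \nu\left[\nabla_z \log \nu^*\right], \qquad (P^*)^{-1} = -\nu\left[\nabla_z\nabla_z \log \nu^*\right] .$$

Remark. Compare to the Laplace approximation:

$$0 = \nabla_z \log \nu^*(\mu^*), \qquad (P^*)^{-1} = -\nabla_z\nabla_z \log \nu^*(z)\big|_{z=\mu^*} .$$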
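
The difference between the two sets of conditions can be seen on a scalar toy problem. The sketch below assumes a hypothetical target with $\log\nu^*(z) = -z^4/4 - z^2/2$ (up to a constant), which is not taken from the slides. By symmetry both approximations have mean zero; the Laplace variance uses the curvature at the mode only, while the variational Gaussian variance solves the fixed-point condition $P^{-1} = -\nu[\partial_z^2 \log\nu^*] = 3P + 1$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Hypothetical 1D target: log nu*(z) = -z**4/4 - z**2/2 (up to a constant)
def d2_log_target(z):
    return -3.0 * z**2 - 1.0

# Gauss-Hermite rule for expectations under N(0, 1)
nodes, weights = hermegauss(40)
weights = weights / weights.sum()

# Laplace: mode at mu = 0, precision = minus the second derivative at the mode
P_laplace = 1.0 / (-d2_log_target(0.0))

# Variational Gaussian: mu = 0 by symmetry; iterate P = 1 / (-nu[d2 log nu*])
P_vb = 1.0
for _ in range(200):
    z = np.sqrt(P_vb) * nodes                  # quadrature points for N(0, P_vb)
    P_vb = 1.0 / (-np.sum(weights * d2_log_target(z)))

P_fixed_point = (np.sqrt(13.0) - 1.0) / 6.0    # positive root of 3 P^2 + P - 1 = 0
```

For this target the variational variance (about 0.43) is smaller than the Laplace variance (1), because the averaged curvature also sees the quartic term away from the mode.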


Random algorithms: Monte Carlo

Monte Carlo methods: Random algorithms for producing (weighted) samples $z_i = Z_i(\omega)$, $i = 1, \ldots, M$, from $Q^*$.

The target measure $Q^*$ is approximated by the associated random measure

$$Q^* \approx \frac{1}{M}\sum_{i=1}^M w_i\,\delta(z - z_i) ,$$

with $\delta(\cdot)$ the standard Dirac delta measure.

Two examples:
- Markov chain Monte Carlo (MCMC): $w_i = 1$
- importance sampling (IS): nonuniform weights $w_i$


Markov chain Monte Carlo I

General idea of MCMC:

Find a transition kernel $q(dz'|z)$ such that the invariance condition

$$Q^*(dz') = \int q(dz'|z)\,Q^*(dz)$$

holds.

Produce correlated samples $z_i$, $i = 1, \ldots, M$, sequentially:

$$z_i = Z_i(\omega) \sim q(\cdot\,|z_{i-1}), \qquad i = 1, \ldots, M .$$

Efficiency: the equivalent number, $M_{\rm eff}$, of independent samples required to produce the same accuracy; typically

$$M_{\rm eff} \ll M .$$


Markov chain Monte Carlo II

Example. Consider the gradient flow SDE (Brownian dynamics)

$$dZ_t = \nabla_z \log \nu^*(Z_t)\,dt + \sqrt{2}\,dW_t ,$$

with $W_t$ an $N_z$-dimensional standard Brownian motion.

This SDE has $\nu^*$ as a stationary distribution.

Discretize in time by the Euler-Maruyama method

$$z_i = z_{i-1} + \nabla_z \log \nu^*(z_{i-1})\,\Delta t + \sqrt{2\Delta t}\,\Xi_i$$

with $\Xi_i \sim {\rm N}(0, I)$ and step-size $\Delta t > 0$.

If exact sampling is desired, apply a Metropolis-Hastings accept-reject criterion to correct for numerical errors.
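
The Euler-Maruyama proposal combined with a Metropolis-Hastings correction can be sketched as follows. This is a minimal illustration, assuming the user supplies the unnormalised log-density and its gradient; the function name `mala` and the standard Gaussian test target are hypothetical choices rather than anything prescribed by the slides.

```python
import numpy as np

def mala(grad_log_target, log_target, z0, dt, n_steps, rng):
    """Euler-Maruyama proposals with a Metropolis-Hastings accept-reject step."""
    z = np.asarray(z0, dtype=float)
    samples = np.empty((n_steps, z.size))
    for i in range(n_steps):
        mean_fwd = z + dt * grad_log_target(z)
        prop = mean_fwd + np.sqrt(2.0 * dt) * rng.standard_normal(z.size)
        mean_bwd = prop + dt * grad_log_target(prop)
        # log proposal densities q(prop|z) and q(z|prop), both N(mean, 2 dt I)
        log_q_fwd = -np.sum((prop - mean_fwd) ** 2) / (4.0 * dt)
        log_q_bwd = -np.sum((z - mean_bwd) ** 2) / (4.0 * dt)
        log_alpha = log_target(prop) - log_target(z) + log_q_bwd - log_q_fwd
        if np.log(rng.uniform()) < log_alpha:
            z = prop                                # accept, otherwise keep z
        samples[i] = z
    return samples

rng = np.random.default_rng(6)
samples = mala(lambda z: -z, lambda z: -0.5 * np.sum(z ** 2),
               [3.0], 0.5, 20000, rng)              # target: N(0, 1)
```

Dropping the accept-reject step gives the plain Euler-Maruyama sampler, whose stationary distribution differs from $\nu^*$ by a step-size dependent bias.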


Importance Sampling I

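Find a proposal measure $Q$ such that
- $Q^* \ll Q$, i.e.

  $$\bar g := \int g(z)\,Q^*(dz) = \int g(z)\,\frac{dQ^*}{dQ}(z)\,Q(dz)$$

- $Q$ can be easily sampled from.

Example. $Q = P$.

Approximate $Q^*$ by

$$Q^* \approx \frac{1}{M}\sum_{i=1}^M w_i\,\delta(z - z_i)$$

with

$$z_i = Z_i(\omega) \sim Q, \qquad w_i = \frac{dQ^*}{dQ}(z_i) .$$

Example. $Q = P$, $w_i \propto l(y|z_i)$, $\sum_i w_i = M$.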


Importance Sampling II

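Notation: Radon-Nikodym derivative (e.g. likelihood):

$$L(z) := \frac{dQ^*}{dQ}(z)$$

Effective sample size:

$$M_{\rm eff} := \frac{M}{\frac{1}{M}\sum_i w_i^2} \le M, \qquad w_i := L(z_i) .$$

Law of large numbers: $M_{\rm eff} \approx \rho M$ with

$$\rho := \frac{1}{Q[L^2]} = \frac{Q[L]^2}{Q[L^2]} \le 1 .$$

Upper bound: since $Q[L^2] = Q^*[L] \ge \exp(Q^*[\log L]) = e^{D(Q^*|Q)}$ by Jensen's inequality,

$$\rho \le e^{-D(Q^*|Q)} \le 1 .$$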
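
Importance sampling from the prior and the effective sample size can be illustrated on a scalar Gaussian example. The sketch below assumes a prior ${\rm N}(0,1)$, a single observation $y = 1$ with error variance $R = 1$, and helper names chosen for illustration; for this set-up the exact posterior mean is $0.5$.

```python
import numpy as np

def importance_weights(log_lik):
    """Weights w_i proportional to l(y|z_i), normalised so that sum_i w_i = M."""
    w = np.exp(log_lik - log_lik.max())     # subtract the maximum for stability
    return w * (w.size / w.sum())

def effective_sample_size(w):
    """M_eff = M / ((1/M) sum_i w_i^2) for weights summing to M."""
    return w.size / np.mean(w ** 2)

rng = np.random.default_rng(1)
M, R, y = 20000, 1.0, 1.0
z = rng.standard_normal(M)                  # samples from the prior, Q = P
w = importance_weights(-0.5 * (y - z) ** 2 / R)
post_mean = np.mean(w * z)                  # weighted estimate of the posterior mean
m_eff = effective_sample_size(w)
```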


References

Christian Robert, The Bayesian Choice, Springer, 2007

Christian Robert, George Casella, Monte Carlo Statistical Methods, Springer, 2010

Sebastian Reich and Colin Cotter, Probabilistic Forecasting and Bayesian Data Assimilation, Cambridge University Press, 2015

Andrew Stuart, Inverse problems: A Bayesian perspective, Acta Numerica, 2010, 451–559

Manfred Opper and Cedric Archambeau, The Variational Gaussian Approximation Revisited, Neural Computation, 21, 2009, 786–792


Signal process I

In case of state space models, the prior measure P is defined recursively.

Discrete-time models:

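$$X_n = \mathcal{M}(X_{n-1}) + Q^{1/2}\,\Xi_n ,$$

$\Xi_n \sim {\rm N}(0, I)$, $X_0 \sim \pi_0$, $n = 1, \ldots, N$, $x_n = X_n(\omega) \in \mathbb{R}^{N_x}$.

Continuous-time models (SDEs):

$$dX_t = f(X_t)\,dt + Q^{1/2}\,dW_t ,$$

with $W_t$ an $N_x$-dimensional standard Brownian motion, $X_0 \sim \pi_0$, $t \in [0,T]$.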


Signal process II

Variable of interest (discrete time):

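$$z = x_{0:N} = (x_0, x_1, \ldots, x_N)$$

Prior density/measure:

$$\pi(z) = \pi(x_{0:N}) = \pi_0(x_0)\,\pi(x_1|x_0)\cdots\pi(x_N|x_{N-1})$$

Transition kernel: $X_n \sim \pi(\cdot\,|x_{n-1})$ with

$$\pi(x|x') \propto \exp\left(-\tfrac{1}{2}(x - \mathcal{M}(x'))^{\rm T} Q^{-1} (x - \mathcal{M}(x'))\right) .$$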

Remark. It is easy to generate samples from prior density π.


Signal process II

Formal derivation of prior measure for time-continuous models.

Euler-Maruyama method for SDEs:

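$$X_n = X_{n-1} + f(X_{n-1})\,\Delta t + Q^{1/2}\,\Xi_n ,$$

with $T = \Delta t\,N$ and $\Xi_n \sim {\rm N}(0, \Delta t\,I)$.

Transition kernel of the Euler-Maruyama method for finite $\Delta t$:

$$\pi_{\Delta t}(x|x') \propto \exp\left(-\tfrac{1}{2\Delta t}(x - x' - f(x')\Delta t)^{\rm T} Q^{-1} (x - x' - f(x')\Delta t)\right) ,$$

and $z = x_{0:N}$.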
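
Sampling from the discretized prior amounts to iterating the Euler-Maruyama recursion. A minimal sketch, assuming a user-supplied drift `f` and matrix square root `Q_sqrt`; the linear drift and parameter values in the usage lines are illustrative only.

```python
import numpy as np

def euler_maruyama(f, Q_sqrt, x0, dt, n_steps, rng):
    """Simulate X_n = X_{n-1} + f(X_{n-1}) dt + Q^{1/2} Xi_n with Xi_n ~ N(0, dt I)."""
    x = np.array(x0, dtype=float)
    path = np.empty((n_steps + 1, x.size))
    path[0] = x
    for n in range(1, n_steps + 1):
        xi = np.sqrt(dt) * rng.standard_normal(x.size)   # Xi_n ~ N(0, dt I)
        x = x + f(x) * dt + Q_sqrt @ xi
        path[n] = x
    return path

rng = np.random.default_rng(5)
# Illustrative example: linear drift f(x) = -x in two dimensions, T = 10
path = euler_maruyama(lambda x: -x, 0.5 * np.eye(2), [1.0, -1.0], 0.01, 1000, rng)
```

Each call returns one realization $x_{0:N}$ of the prior; repeated calls with independent random numbers produce a prior ensemble.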


Signal process III

Limit ∆t→ 0 for fixed T = ∆t N.

Realizations of $Z$ are a.s. continuous functions

$$z = x_{[0,T]} \in C([0,T], \mathbb{R}^{N_x})$$

with measure $P$ over $C([0,T], \mathbb{R}^{N_x})$ formally defined by

$$\lim_{N\to\infty}\left(\pi_0(x_0)\prod_{n=1}^N \pi_{\Delta t}(x_n|x_{n-1})\right) \to P(dx_{[0,T]}) .$$

Note. Two SDEs with different diffusion matrices $Q$ lead to measures $P$ which are mutually singular.


Observation process I

Discrete-in-time.

Forward model:

$$Y_n = h(X_n) + R^{1/2}\,\Sigma_n ,$$

$\Sigma_n \sim {\rm N}(0, I)$, $n = 1, \ldots, N$.

Likelihood:

$$l(y_{1:N}|x_{0:N}) \propto \exp\left(-\frac{1}{2}\sum_{n=1}^N (y_n - h(x_n))^{\rm T} R^{-1} (y_n - h(x_n))\right) .$$


Observation process I

Continuous-in-time.

Forward model:

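$$Y_t = \int_0^t h(X_s)\,ds + R^{1/2}\,V_t ,$$

with $V_t$ a standard Brownian motion, $Y_0 = 0$, $t \in [0,T]$.

Likelihood:

$$l(y_{[0,T]}|x_{[0,T]}) \propto \exp\left(-\frac{1}{2}\int_0^T \left(h_t^{\rm T} R^{-1} h_t\,dt - 2\,h_t^{\rm T} R^{-1}\,dy_t\right)\right)$$

with $h_t = h(x_t)$.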


Posterior measure I

Bayes’ formula:

Discrete-in-time model and observations:

$$\nu^*(x_{0:N}) \propto l(y_{1:N}|x_{0:N})\,\pi(x_{0:N})$$

Continuous-in-time model and observations:

$$\frac{dQ^*}{dP} \propto l(y_{[0,T]}|x_{[0,T]}) .$$

Remark. The normalising constants (i.e. the evidence) $l(y_{1:N})$ and $l(y_{[0,T]})$, respectively, are important for model comparison.


Posterior measure II

We are typically interested in marginal distributions only.

(a) Smoothing/reanalysis: distribution of $x_t$ / $x_n$ given all the data:

$$X_{t|T} \sim \nu_{t|T}, \qquad X_{n|T} \sim \nu_{n|T} .$$

(b) Filtering: distribution of $x_t$ / $x_n$ given all the data up to $t$ / $n$:

$$X_{t|t} \sim \nu_{t|t}, \qquad X_{n|n} \sim \nu_{n|n} .$$

(c) Prediction: distribution of $x_t$ / $x_n$ given data up to $\tau < t$ / $k < n$:

$$X_{t|\tau} \sim \nu_{t|\tau}, \qquad X_{n|k} \sim \nu_{n|k} .$$


Posterior measure III

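Consider the modified SDE

$$dX_t = f(X_t)\,dt + Q^{1/2}\,d\widetilde{W}_t, \qquad X_0(\omega) = x_0 , \qquad (1)$$

with

$$\widetilde{W}_t = W_t + \int_0^t u_s\,ds .$$

Theorem (Girsanov).

Let $P$ denote the measure induced by (1) with $u_s \equiv 0$, and $Q_u$ the measure induced by (1) for any $u_s \neq 0$ such that

$$\mathbb{E}\left[\exp\left(\frac{1}{2}\int_0^t |u_s|^2\,ds\right)\right] < \infty .$$

Then $Q_u$ is absolutely continuous with respect to $P$ with Radon-Nikodym derivative

$$\frac{dQ_u}{dP}\bigg|_{W_{[0,t]}} = \exp Z_t^u, \qquad Z_t^u = \int_0^t u_s^{\rm T}\,dW_s + \frac{1}{2}\int_0^t |u_s|^2\,ds .$$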


Posterior measure IV

Application of Girsanov to data assimilation with underlying SDE models:

Find a control law u and a change of the initial measure π0 such that

Qu ≈ Q∗ .

See below and Part 4 on proposal steps.

Remarks.

(i) A good choice of the model error diffusion matrix $Q$ in

$$dX_t = f(X_t)\,dt + Q^{1/2}\,dW_t$$

is crucial (smoothing vs. prediction).

(ii) The filtering problem also leads to control-type formulations, but they are motivated differently. See the feedback particle filter later in this part.


Kalman-Bucy filter I

Linear SDE:

$$dX_t = A X_t\,dt + Q^{1/2}\,dW_t , \qquad X_0 \sim {\rm N}(\bar x_0, P_0) .$$

Linear forward model:

$$dY_t = H X_t\,dt + dV_t , \qquad Y_0 = 0 .$$

Prior and posterior distributions are Gaussian:

Signal: $(\bar x_t, P_t)$,

Filtering: $(\bar x_{t|t}, P_{t|t})$,

Smoothing: $(\bar x_{t|T}, P_{t|T})$
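
For reference, the filtering mean and covariance of this linear model evolve according to the standard Kalman-Bucy equations. They are not stated at this point of the slides, so the form below (written for unit observation noise covariance, as in the forward model above, and with overbars denoting means) is supplied as a reminder rather than as the author's presentation.

```latex
\begin{aligned}
d\bar x_{t|t} &= A\,\bar x_{t|t}\,dt + P_{t|t} H^{\rm T}\left(dY_t - H\bar x_{t|t}\,dt\right), \\
\frac{dP_{t|t}}{dt} &= A P_{t|t} + P_{t|t} A^{\rm T} + Q - P_{t|t} H^{\rm T} H P_{t|t} ,
\end{aligned}
```

with initial conditions $\bar x_{0|0} = \bar x_0$ and $P_{0|0} = P_0$.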


Kalman smoother

Smoother:

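Given the filter solution $(\bar x_{t|t}, P_{t|t})$, $t \in [0,T]$, solve backward in time

$$\frac{d\bar x_{t|T}}{dt} = A\,\bar x_{t|T} + Q P_{t|t}^{-1}(\bar x_{t|T} - \bar x_{t|t}) ,$$

$$\frac{dP_{t|T}}{dt} = A P_{t|T} + P_{t|T} A^{\rm T} - Q + Q P_{t|t}^{-1} P_{t|T} + P_{t|T} P_{t|t}^{-1} Q$$

for given $\bar x_{T|T}$ and $P_{T|T}$ at $t = T$.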


Nonlinear extension II

Smoother extension of feedback particle filter:

$$dX_{t|T} = f(X_{t|T})\,dt + Q^{1/2}\,dW_t - Q\,\nabla_x \log \pi_{t|t}(X_{t|T})\,dt$$

with $X_{T|T}$ given.

Forward optimal control formulation:

$$dX_t = f(X_t)\,dt + Q^{1/2}\,dW_t + Q\,\nabla_x \log\frac{\pi_{t|T}}{\pi_{t|t}}(X_t)\,dt$$

with $\pi_{0|T}$ given by the backward smoother formulation.

The (time-dependent) control law is given by

$$u_t(x) = Q^{1/2}\,\nabla_x \log\frac{\pi_{t|T}}{\pi_{t|t}}(x) .$$

Remark. Time-averaged controls

$$\bar u(x) = \lim_{T\to\infty}\frac{1}{T}\int_0^T u_t(x)\,dt$$

provide systematic model correction terms.

References

A. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press, 1970

Kody Law, Andrew Stuart and Konstantinos Zygalakis, Data Assimilation: A Mathematical Introduction, Springer, 2015

Greg Pavliotis, Stochastic Processes and Applications, Springer, 2014

Amir Taghvaei, Jana de Wiljes, Prashant Mehta and Sebastian Reich, Kalman Filter and Its Modern Extensions for the Continuous-Time Nonlinear Filtering Problem, J. Dyn. Sys. Meas., Control, 140, 2017, 030904

Carsten Hartmann, L. Richter, Christof Schütte and W. Zhang, Variational characterization of free energy: theory and algorithms, Entropy, 19, 2017, 626–653.

Kai Bergemann and Sebastian Reich, An ensemble Kalman-Bucy filter for continuous data assimilation, Meteorologische Zeitschrift, 21, 2012, 213–219.


Ensemble Kalman filter I

Discrete-time observations

$$y_n = H x_n + R^{1/2}\,\Sigma_n ,$$

$\Sigma_n \sim {\rm N}(0, I)$, $n = 1, \ldots, N$.

Required is an update from the forecast

$$X_f := X_{t_n|t_{n-1}}$$

to the analysis

$$X_a := X_{t_n|t_n}$$

at time $t_n$.


Ensemble Kalman filter II

Notation:

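mean: $\bar x_f := \mathbb{E}[X_f]$, $\bar x_a := \mathbb{E}[X_a]$

deviation: $\Delta X_f := X_f - \bar x_f$, $\Delta X_a := X_a - \bar x_a$

covariance matrix: $P_f := \mathbb{E}[\Delta X_f \Delta X_f^{\rm T}]$, $P_a := \mathbb{E}[\Delta X_a \Delta X_a^{\rm T}]$

The ensemble Kalman filter (EnKF) produces $X_a$ such that

mean update: $\bar x_a = \bar x_f - K(H\bar x_f - y_n)$

covariance update: $P_a = P_f - K H P_f$

with Kalman gain matrix

$$K = P_f H^{\rm T}(H P_f H^{\rm T} + R)^{-1} .$$

Remarks. (i) Neither $X_f$ nor $X_a$ need to be Gaussian random variables. (ii) The stated mean and covariance update conditions do not determine $X_a$ uniquely.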


Ensemble Kalman filter III

Implementation:

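Forecast ensemble $x_f^i$, $i = 1, \ldots, M$:

empirical mean:

$$\bar x_f^M = \frac{1}{M}\sum_{i=1}^M x_f^i$$

empirical covariance matrix:

$$P_f^M := \frac{1}{M-1}\sum_{i=1}^M x_f^i\,(x_f^i - \bar x_f^M)^{\rm T}$$

Kalman gain matrix:

$$K^M := P_f^M H^{\rm T}(H P_f^M H^{\rm T} + R)^{-1}$$

Stochastic EnKF:

$$x_a^j = x_f^j - K^M\left\{H x_f^j + \eta^j - y_n\right\}, \qquad \eta^j \sim {\rm N}(0, R), \qquad j = 1, \ldots, M .$$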

Remark. There are many other variants of the EnKF.
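
The stochastic EnKF analysis step above can be written in a few lines. The following is a minimal sketch, assuming the ensemble is stored column-wise in an array of shape $(N_x, M)$; the function name and the scalar test problem (prior ${\rm N}(0,1)$, $H = R = 1$, $y_n = 1$, for which the exact posterior is ${\rm N}(0.5, 0.5)$) are illustrative.

```python
import numpy as np

def stochastic_enkf(Xf, H, R, y, rng):
    """One stochastic EnKF analysis step; Xf has shape (Nx, M)."""
    M = Xf.shape[1]
    dX = Xf - Xf.mean(axis=1, keepdims=True)             # ensemble deviations
    Pf = dX @ dX.T / (M - 1)                             # empirical covariance
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)       # Kalman gain K^M
    eta = rng.multivariate_normal(np.zeros(y.size), R, size=M).T  # eta^j ~ N(0, R)
    return Xf - K @ (H @ Xf + eta - y[:, None])

rng = np.random.default_rng(3)
Xf = rng.standard_normal((1, 5000))                      # forecast ensemble from N(0, 1)
H, R, y = np.eye(1), np.eye(1), np.array([1.0])
Xa = stochastic_enkf(Xf, H, R, y, rng)
```

For small observation dimension the explicit inverse is harmless; a linear solve would be the more careful choice in larger problems.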


Ensemble transform filter I

Rewrite of the stochastic EnKF:

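$$\begin{aligned}
x_a^j &= x_f^j - \sum_{i=1}^M x_f^i\,\frac{1}{M-1}\left\{(x_f^i - \bar x_f^M)^{\rm T} H^{\rm T}(H P_f^M H^{\rm T} + R)^{-1}(H x_f^j + \eta^j - y_n)\right\} \\
&= x_f^j - \sum_{i=1}^M x_f^i\,s_{ij} \\
&= \sum_{i=1}^M x_f^i\,(\delta_{ij} - s_{ij}) \qquad (\delta_{ij} \text{ the Kronecker delta}) \\
&= \sum_{i=1}^M x_f^i\,d_{ij}
\end{aligned}$$

Remark. Different EnKF formulations lead to different $d_{ij}$'s. But they all satisfy

$$\sum_{i=1}^M d_{ij} = 1 .$$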
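
The rewrite of the stochastic EnKF as a linear transformation of the forecast ensemble can be verified directly. The sketch below assumes column-wise ensemble storage and passes the perturbations $\eta^j$ in explicitly so that both forms use the same random numbers; dimensions and names are illustrative.

```python
import numpy as np

def enkf_transform(Xf, H, R, y, eta):
    """Coefficients d_ij = delta_ij - s_ij of the stochastic EnKF; Xf has shape (Nx, M)."""
    M = Xf.shape[1]
    dX = Xf - Xf.mean(axis=1, keepdims=True)
    Pf = dX @ dX.T / (M - 1)
    S_inv = np.linalg.inv(H @ Pf @ H.T + R)
    innov = H @ Xf + eta - y[:, None]                  # H x_f^j + eta^j - y_n
    S = dX.T @ H.T @ S_inv @ innov / (M - 1)           # matrix of s_ij
    return np.eye(M) - S

rng = np.random.default_rng(4)
Nx, Ny, M = 3, 2, 8
Xf = rng.standard_normal((Nx, M))
H = rng.standard_normal((Ny, Nx))
R = np.eye(Ny)
y = rng.standard_normal(Ny)
eta = rng.standard_normal((Ny, M))

D = enkf_transform(Xf, H, R, y, eta)
Xa = Xf @ D                                            # x_a^j = sum_i x_f^i d_ij
```

The columns of `D` sum to one because the deviations $x_f^i - \bar x_f^M$ sum to zero over $i$.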


Ensemble transform filter II

Definition. The class of (linear) ensemble transform filters is defined by

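$$x_a^j = \sum_{i=1}^M x_f^i\,d_{ij}$$

for appropriate coefficients $d_{ij}$ satisfying

$$\sum_{i=1}^M d_{ij} = 1 .$$

Remark. Define

$$w_i := \sum_{j=1}^M d_{ij}$$

and note that

$$\bar x_a^M = \frac{1}{M}\sum_{j=1}^M x_a^j = \frac{1}{M}\sum_{i,j=1}^M x_f^i\,d_{ij} = \frac{1}{M}\sum_{i=1}^M w_i\,x_f^i .$$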


Beyond Gaussianity I

(i) For an ensemble transform filter to be consistent, it should hold that

$$w_i \propto \exp\left(-\tfrac{1}{2}(h(x_f^i) - y_n)^{\rm T} R^{-1}(h(x_f^i) - y_n)\right) \qquad \text{(importance weights)}$$

subject to $\sum_{i=1}^M w_i = M$.

(ii) Absolute continuity of the posterior measure with respect to the prior measure suggests that any $x_a^j$ should lie in the convex hull formed by the prior ensemble $\{x_f^i\}$.

This holds provided $d_{ij} \ge 0$ for all $i, j = 1, \ldots, M$ and $\sum_i d_{ij} = 1$ for all $j = 1, \ldots, M$.

Summary. A consistent ensemble transform filter should satisfy

$$d_{ij} \ge 0, \qquad \sum_{i=1}^M d_{ij} = 1, \qquad \sum_{j=1}^M d_{ij} = w_i \quad \text{(importance weights)} .$$


Beyond Gaussianity II

The conditions

$$d_{ij} \ge 0, \qquad \sum_{i=1}^M d_{ij} = 1, \qquad \sum_{j=1}^M d_{ij} = w_i \quad \text{(importance weights)}$$

do not uniquely determine the coefficients $d_{ij}$.

The ensemble transform particle filter (ETPF) is based on

$$\{d_{ij}\} = \arg\max \sum_{i=1}^M (x_a^i - \bar x_a^M)^{\rm T}(x_f^i - \bar x_f^M)$$

subject to the constraints stated above and

$$x_a^j := \sum_{i=1}^M x_f^i\,d_{ij} .$$

Remark. This is equivalent to a discrete optimal transport problem.
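
Expanding $\sum_{i,j} d_{ij}|x_f^i - x_f^j|^2$ under the constraints shows that the maximization above is the same as minimizing this squared-distance transport cost. For scalar particles the minimizer is the monotone coupling of the sorted particles, which can be built with the north-west corner rule. The sketch below covers only this one-dimensional special case (the general case requires a linear programming or optimal transport solver); the function name and test values are illustrative.

```python
import numpy as np

def etpf_coefficients_1d(xf, w):
    """ETPF coefficients d_ij for scalar particles xf and weights w with sum(w) = M."""
    M = xf.size
    order = np.argsort(xf)
    supply = w[order].astype(float)         # row sums (importance weights), sorted
    D_sorted = np.zeros((M, M))
    i, j = 0, 0
    row_left, col_left = supply[0], 1.0     # every column must receive mass 1
    while i < M and j < M:
        m = min(row_left, col_left)
        D_sorted[i, j] += m
        row_left -= m
        col_left -= m
        if row_left <= 1e-14:               # row i exhausted, move to next particle
            i += 1
            row_left = supply[i] if i < M else 0.0
        if col_left <= 1e-14:               # column j filled, move to next column
            j += 1
            col_left = 1.0
    D = np.zeros((M, M))
    D[np.ix_(order, order)] = D_sorted      # undo the sorting
    return D

xf = np.array([0.3, -1.0, 2.0, 0.9])
w = np.array([0.5, 2.0, 0.2, 1.3])          # importance weights, sum equals M = 4
D = etpf_coefficients_1d(xf, w)
xa = xf @ D                                 # x_a^j = sum_i x_f^i d_ij
```

The analysis particles are equally weighted, stay inside the convex hull of the forecast particles, and reproduce the importance-weighted mean.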


Beyond Gaussianity II

Lorenz-63 model, first component observed infrequently ($\Delta t = 0.12$) and with large measurement noise ($R = 8$):

Figure: RMSEs for various second-order accurate LETFs compared to the ETPF, the ESRF, and the SIR PF as a function of the sample size, $M$.


Stability and accuracy

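Data (scalar) at time $t_n$: $y_n \sim {\rm N}(y_{\rm true}, R)$

Analysis ensemble at time $t_n$: $\{y_{n|n}^i\}$, with ensemble mean $\bar y_{n|n}$

RMS error:

$$\text{RMSE} := \left(\frac{1}{N}\sum_{n=1}^N (y_n - \bar y_{n|n})^2\right)^{1/2} < R^{1/2}$$

Ensemble spread:

$$\text{VAR} := \frac{1}{N}\sum_{n=1}^N \overline{(y_{n|n}^i - \bar y_{n|n})^2} < R ,$$

where the overline denotes the average over the ensemble members $i$.

Calibration and sharpness:

$$\text{CRPS} := \frac{1}{N}\sum_{n=1}^N \int \left(F_{y_n}(y) - F_{y_{n|n}^i}(y)\right)^2 dy ,$$

with $F_{y_n}$ the cumulative distribution function of the point observation $y_n$ (a unit step at $y_n$) and $F_{y_{n|n}^i}$ the empirical cumulative distribution function of the analysis ensemble.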
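
The three diagnostics can be computed from a time series of observations and analysis ensembles as sketched below. The CRPS integral is evaluated with the identity $\int (F_{\rm ens}(y) - \mathbb{1}\{y \ge y_n\})^2 dy = \mathbb{E}|X - y_n| - \tfrac{1}{2}\mathbb{E}|X - X'|$ for an empirical ensemble distribution; the array layout and function name are assumptions made for this illustration.

```python
import numpy as np

def verification_scores(obs, ens):
    """obs has shape (N,); ens has shape (N, M), ensembles in observation space."""
    mean = ens.mean(axis=1)
    rmse = np.sqrt(np.mean((obs - mean) ** 2))
    spread = np.mean(np.mean((ens - mean[:, None]) ** 2, axis=1))
    # CRPS of the empirical ensemble CDF against a point observation
    term1 = np.mean(np.abs(ens - obs[:, None]), axis=1)
    term2 = 0.5 * np.mean(np.abs(ens[:, :, None] - ens[:, None, :]), axis=(1, 2))
    crps = np.mean(term1 - term2)
    return rmse, spread, crps

# Single time step, two-member ensemble {0, 2}, observation 1
rmse, spread, crps = verification_scores(np.array([1.0]), np.array([[0.0, 2.0]]))
```

The pairwise term uses an array of shape $(N, M, M)$, which is fine for moderate ensemble sizes but should be replaced by a sorted-ensemble formula for very large $M$.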


References

G. Evensen, Data Assimilation. The Ensemble Kalman Filter, Springer, 2006

Kody Law, Andrew Stuart and Konstantinos Zygalakis, Data Assimilation: A Mathematical Introduction, Springer, 2015

Mark Asch, Marc Bocquet and Maelle Nodet, Data Assimilation. Methods, Algorithms, and Applications, SIAM, 2017.

Jana de Wiljes, Sebastian Reich, Wilhelm Stannat, Long-time stability and accuracy of the ensemble Kalman-Bucy filter for fully observed processes and small measurement noise, arXiv:1612.06065, 2017

Sebastian Reich, A nonparametric ensemble transform method for Bayesian inference, SIAM J. Sci. Comput., 35, 2013, A2013–A2024.

Tilmann Gneiting, Fadoua Balabdaoui, Adrian Raftery, Probabilistic forecasts, calibration and sharpness, J. Royal Stats. Soc., Series B, 69, 2007, 243–268.


Curse of dimensionality

Importance sampling leads to a weighted particle approximation of the posterior measure:

$$Q^* \approx \frac{1}{M}\sum_{i=1}^M w_i\,\delta(z - z_i)$$

with $z_i = Z_i(\omega) \sim P$ and

$$w_i = L(z_i) := \frac{l(y|z_i)}{l(y)} .$$

It holds that

$$\rho := \frac{P[L]^2}{P[L^2]} = \frac{1}{P[L^2]} \le e^{-D(Q^*|P)} ,$$

which scales like

$$\rho \approx C e^{-N_y}$$

in the case of $N_y$ independent observations, so that the effective sample size (see Part 1) decreases exponentially fast as $N_y$ grows.
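
The exponential loss of effective sample size is easy to reproduce. The sketch below assumes a toy set-up that is not from the slides: prior ${\rm N}(0, I)$ in $\mathbb{R}^{N_y}$, every component observed with unit error variance, and observed value $y = 0$. For this set-up $\rho = (\sqrt{3}/2)^{N_y}$, roughly $0.87$, $0.24$ and $7.5\times 10^{-4}$ for $N_y = 1, 10, 50$.

```python
import numpy as np

def ess_fraction(Ny, M, rng):
    """M_eff / M for importance sampling from the prior in the toy set-up."""
    z = rng.standard_normal((M, Ny))             # prior samples
    log_lik = -0.5 * np.sum(z ** 2, axis=1)      # log l(y|z) with y = 0, R = I
    w = np.exp(log_lik - log_lik.max())          # unnormalised weights
    return w.sum() ** 2 / (M * np.sum(w ** 2))

rng = np.random.default_rng(2)
fractions = {Ny: ess_fraction(Ny, 2000, rng) for Ny in (1, 10, 50)}
```

With a fixed ensemble size the estimate cannot fall below $1/M$, so for large $N_y$ the computed fraction overestimates $\rho$ while still signalling weight collapse.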


Currently available remedies

Available approaches to beat the curse of dimensionality include:

- variational data assimilation
- localization
- ensemble inflation
- hybrid filter
- alternative proposal steps


Variational data assimilation I

Weak constraint 4DVar data assimilation:

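$$x_{0:N|N}^{\rm MAP} = \arg\min L(x_{0:N})$$

with

$$L(x_{0:N}) = \frac{1}{2}(x_0 - \bar x_0)^{\rm T} P_0^{-1}(x_0 - \bar x_0) + \frac{1}{2}\sum_{n=1}^N \left\{a_n^{\rm T} Q^{-1} a_n + b_n^{\rm T} R^{-1} b_n\right\}$$

where

$$a_n := x_n - \mathcal{M}(x_{n-1}), \qquad b_n := h(x_n) - y_n .$$

Remark. The Laplace approximation requires the Hessian of $L$ at $x_{0:N|N}^{\rm MAP}$, which can be obtained as a byproduct of quasi-Newton methods.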
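
For a linear model and a linear observation operator the cost functional is quadratic, and the minimizer can be obtained from a single linear least-squares solve. The sketch below assumes a scalar state, the hypothetical model $\mathcal{M}(x) = m\,x$ and $h(x) = x$; it stacks the scaled background, model error and observation residuals into one system. Nonlinear problems would instead need an iterative minimizer.

```python
import numpy as np

def weak_4dvar_linear(m, x0_bar, P0, Q, R, y):
    """Minimise L(x_{0:N}) for the scalar model x_n = m x_{n-1} + error, h(x) = x."""
    N = y.size
    rows, rhs = [], []
    r = np.zeros(N + 1)
    r[0] = 1.0 / np.sqrt(P0)                    # background term (x_0 - x0_bar)
    rows.append(r)
    rhs.append(x0_bar / np.sqrt(P0))
    for n in range(1, N + 1):
        r = np.zeros(N + 1)                     # model error a_n = x_n - m x_{n-1}
        r[n], r[n - 1] = 1.0, -m
        rows.append(r / np.sqrt(Q))
        rhs.append(0.0)
        r = np.zeros(N + 1)                     # observation term b_n = x_n - y_n
        r[n] = 1.0
        rows.append(r / np.sqrt(R))
        rhs.append(y[n - 1] / np.sqrt(R))
    A, c = np.array(rows), np.array(rhs)
    x_map, *_ = np.linalg.lstsq(A, c, rcond=None)
    return x_map, A, c

def cost(x, A, c):
    """L(x_{0:N}) written as half the squared norm of the stacked residuals."""
    return 0.5 * np.sum((A @ x - c) ** 2)

y = np.array([1.0, 0.8, 0.7])
x_map, A, c = weak_4dvar_linear(0.9, 1.0, 1.0, 0.1, 0.5, y)
```

In this linear setting the Hessian of $L$ is `A.T @ A`, so the Laplace approximation of the smoothing distribution has covariance `np.linalg.inv(A.T @ A)`.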


Variational data assimilation II

The Randomized Maximum Likelihood (RML) method is one method that combines ensemble and variational approaches.

Idea. Perturb the cost functional $L(x_{0:N})$ in the following manner:

Initial conditions: $\bar x_0 + \xi_0^i$, $\xi_0^i \sim {\rm N}(0, P_0)$

Model errors: $a_n - \xi_n^i$, $\xi_n^i \sim {\rm N}(0, Q)$

Measurement errors: $b_n + \eta_n^i$, $\eta_n^i \sim {\rm N}(0, R)$

This leads to

$$x_{0:N|N}^i = \arg\min L^i(x_{0:N})$$

with $a_n$ and $b_n$ as defined before and

$$L^i(x_{0:N}) = \frac{1}{2}(x_0 - \bar x_0 - \xi_0^i)^{\rm T} P_0^{-1}(x_0 - \bar x_0 - \xi_0^i) + \frac{1}{2}\sum_{n=1}^N (a_n - \xi_n^i)^{\rm T} Q^{-1}(a_n - \xi_n^i) + \frac{1}{2}\sum_{n=1}^N (b_n + \eta_n^i)^{\rm T} R^{-1}(b_n + \eta_n^i) .$$

Remark. Sampling is exact for linear $\mathcal{M}$ and $h$.

Localization I

States $x$ are spatially dependent. To emphasise this aspect we temporarily switch to the notation:

$$x_n \in C(\mathbb{R}^3, \mathbb{R}) \;\to\; u(x, t_n) \in \mathbb{R}, \qquad x \in \mathbb{R}^3$$

Observations at location $x_l \in \mathbb{R}^3$:

$$y_{n,l} = u_{\rm truth}(x_l, t_n) + R_l^{1/2}\,\xi_{n,l}, \qquad \xi_{n,l} \sim {\rm N}(0, I) .$$

Standard EnKF / ensemble transform filters lead to

$$u_a^j(x) = \sum_{i=1}^M u_f^i(x)\,d_{ij} \qquad \forall x \in \mathbb{R}^3 .$$

Two concepts of localization:
- domain or B-localization
- observation or R-localization


Localization II

R-localization for EnKF / ensemble transform filter:

$$u_a^j(x) = \sum_{i=1}^M u_f^i(x)\,d_{ij}(x) \qquad \forall x \in \mathbb{R}^3 .$$

Spatially dependent coefficients $d_{ij}(x)$ depend only on observations in the vicinity of $x$. This is achieved through

$$\frac{1}{R_l(x)} := \frac{\rho(x - x_l)}{R_l}$$

with $\rho(0) = 1$ and $\rho(x) \to 0$ as $|x| \to \infty$. E.g. importance weights:

$$w_i(x) \propto \exp\left(-\sum_l \frac{1}{2R_l(x)}\,(y_{n,l} - u^i(x_l, t_n))^2\right)$$

for updating $u_f^i(x)$, $i = 1, \ldots, M$.


Ensemble inflation

Multiplicative inflation:

$$x_f^i \to x_f^i + \alpha\,(x_f^i - \bar x_f^M), \qquad \alpha > 0 .$$

Equivalent to a forward Euler discretization of

$$\frac{d}{dt}x^i = x^i - \bar x^M, \qquad i = 1, \ldots, M,$$

with step-size $\alpha > 0$.

Statistically equivalent to an Euler-Maruyama discretization of the SDE

$$dX = P\,dW , \qquad P = \mathbb{E}[(X - \bar x)(X - \bar x)^{\rm T}] ,$$

with step-size $\alpha > 0$ as the ensemble size $M \to \infty$.

Compare to Brownian motion, where $P$ is replaced by a constant matrix $Q$.
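
Multiplicative inflation is a one-line operation on the ensemble. The sketch below assumes column-wise ensemble storage; it leaves the ensemble mean unchanged and scales the empirical covariance by $(1+\alpha)^2$.

```python
import numpy as np

def inflate(Xf, alpha):
    """Multiplicative inflation x^i -> x^i + alpha (x^i - mean); Xf has shape (Nx, M)."""
    mean = Xf.mean(axis=1, keepdims=True)
    return Xf + alpha * (Xf - mean)

rng = np.random.default_rng(7)
Xf = rng.standard_normal((3, 10))
Xi = inflate(Xf, 0.1)
```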


Hybrid filter II

Apply the ensemble transform particle filter to the first inference problem and the EnKF to the second (or vice versa).

Denote the filter coefficients by $d_{ij,1}$ and $d_{ij,2}$, respectively.

The resulting ensemble transform filter is of the form

$$x_a^j = \sum_{i=1}^M\sum_{k=1}^M x_f^i\,d_{ik,1}\,d_{kj,2} = \sum_{i=1}^M x_f^i\,d_{ij}$$

with

$$d_{ij} = \sum_{k=1}^M d_{ik,1}\,d_{kj,2} .$$

Question: How to choose the bridging parameter $\alpha \in [0,1]$? Currently used:

$$\text{effective sample size:} \quad M_{\rm eff} = \frac{M}{\frac{1}{M}\sum_i w_i^2} \ge cM, \qquad w_i \propto l_1(y_n|x_f^i) = l(y_n|x_f^i)^\alpha ,$$

with $c < 1$ a given threshold value and $\sum_{i=1}^M w_i = M$.
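
One way to implement the effective sample size criterion is a bisection in $\alpha$, using the fact that the effective sample size of the tempered weights $l^\alpha$ does not increase with $\alpha$. The sketch below picks the largest $\alpha \in [0,1]$ that still satisfies $M_{\rm eff} \ge cM$; whether the largest admissible value is the intended choice is an assumption here, as are the function names and the test data.

```python
import numpy as np

def ess(log_lik, alpha):
    """Effective sample size of weights w_i proportional to l(y|x_i)**alpha."""
    w = np.exp(alpha * (log_lik - log_lik.max()))
    return w.sum() ** 2 / np.sum(w ** 2)

def bridging_parameter(log_lik, c, tol=1e-6):
    """Largest alpha in [0, 1] with ESS(alpha) >= c * M, found by bisection."""
    M = log_lik.size
    if ess(log_lik, 1.0) >= c * M:
        return 1.0
    lo, hi = 0.0, 1.0                  # ESS(lo) = M >= c M > ESS(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess(log_lik, mid) >= c * M:
            lo = mid
        else:
            hi = mid
    return lo

# Illustrative log-likelihood values for M = 50 forecast particles
log_lik = -5.0 * np.linspace(-3.0, 3.0, 50) ** 2
alpha = bridging_parameter(log_lik, c=0.5)
```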


Hybrid filter III

Hybrid filter: $D := D_{\rm ESRF}(\alpha)\,D_{\rm ETPF}(1 - \alpha)$.

Figure: RMSEs for hybrid ESRF (α = 0) and 2nd-order corrected LETF/ETPF (α = 1)as a function of the sample size, M.


Hybrid filter IV

Lorenz-96 model, discretized nonlinear advection equation, 40 grid points, every second one observed.

Hybrid filter $P := P_{\rm LETKF}(\alpha)\,P_{\rm ETPF}(1 - \alpha)$ + localization.

Figure: RMSE for hybrid LETKF (α = 0) and 2nd-order corrected LETF/ETPF (α = 1).


Alternative Proposal steps I

Standard filter algorithms use the model dynamics

$$dX_t = f(X_t)\,dt + Q^{1/2}\,dW_t$$

to produce forecasts $x_f^i$ at time $t_n$ given an analysis $x_a^i$ at time $t_{n-1}$.

Alternatively, one can try to find controls $u_s^i$, $s \in [t_{n-1}, t_n]$, and use

$$dX_t = f(X_t)\,dt + Q^{1/2}\,u_t^i\,dt + Q^{1/2}\,dW_t$$

to produce forecasts $x_f^i$ at time $t_n$ given an analysis $x_a^i$ at time $t_{n-1}$.

Denote the resulting proposal distribution at $t_n$ by

$$q(x_f|u, x_a) .$$


References

Peter Jan van Leeuwen, Yuan Cheng and Sebastian Reich, Frontiers in Applied Dynamical Systems: Reviews and Tutorials 2: Nonlinear Data Assimilation, Springer, 2015.

Nawinda Chustagulprom, Sebastian Reich, and Maria Reinhardt, A hybrid ensemble transform particle filter for nonlinear spatially extended dynamical systems, SIAM/ASA J UQ, 4, 2016, 592–608.

Paul Fearnhead and Hans R. Künsch, Particle Filters and Data Assimilation, arXiv:1709.04196, 2017.

Walter Acevedo, Jana de Wiljes, and Sebastian Reich, Second-order accurate ensemble transform particle filters, SIAM J. Sci. Comput., 39, 2017, A1834–A1850.

S. Agapiou, O. Papaspiliopoulos, D. Sanz-Alonso, A.M. Stuart, Importance sampling: Intrinsic dimension and computational cost, Statistical Science, 32, 2017, 405–431.

R.N. Bannister, A review of operational methods of variational and ensemble-variational data assimilation, Q.J. Royal Meteorol. Soc., 143, 2017, 607–633.

