
An introduction to data assimilation

Eric Blayo, University of Grenoble and INRIA

Data assimilation, the science of compromises

Context: characterizing a (complex) system and/or forecasting its evolution, given several heterogeneous and uncertain sources of information.

Widely used for geophysical fluids (meteorology, oceanography, atmospheric chemistry...), but also in numerous other domains (e.g. glaciology, nuclear energy, medicine, agricultural planning...).

Closely linked to inverse methods, control theory, estimation theory, filtering...

Data assimilation, the science of compromises

Numerous possible aims:
- Forecast: estimation of the present state (initial condition)
- Model tuning: parameter estimation
- Inverse modeling: estimation of parameter fields
- Data analysis: re-analysis (model = interpolation operator)
- OSSE: optimization of observing systems
- ...

Data assimilation, the science of compromises

Its application to Earth sciences generally raises a number of difficulties, some of them rather specific:

- nonlinearities
- huge dimensions
- poor knowledge of error statistics
- non-reproducibility (each experiment is unique)
- operational forecasting (computations must be performed in a limited time)

Objectives for these two lectures

- introduce data assimilation from several points of view
- give an overview of the main methods
- detail the basic ones and highlight their pros and cons
- introduce some current research problems

Outline
1. Data assimilation for dummies: a simple model problem
2. Generalization: linear estimation theory, variational and sequential approaches
3. Variational algorithms - Adjoint techniques
4. Reduced order Kalman filters
5. Some current research tracks

Some references

1. Blayo E. and M. Nodet, 2012: Introduction à l'assimilation de données variationnelle. Lecture notes for UJF Master course on data assimilation. https://team.inria.fr/moise/files/2012/03/Methodes-Inverses-Var-M2-math-2009.pdf
2. Bocquet M., 2014: Introduction aux principes et méthodes de l'assimilation de données en géophysique. Lecture notes for ENSTA - ENPC data assimilation course. http://cerea.enpc.fr/HomePages/bocquet/Doc/assim-mb.pdf
3. Bocquet M., E. Cosme and E. Blayo (Eds.), 2014: Advanced Data Assimilation for Geosciences. Oxford University Press.
4. Bouttier F. and P. Courtier, 1999: Data assimilation, concepts and methods. Meteorological training course lecture series, ECMWF, European Centre for Medium-Range Weather Forecasts, Reading, UK. http://www.ecmwf.int/newsevents/training/rcourse_notes/DATA_ASSIMILATION/ASSIM_CONCEPTS/Assim_concepts21.html
5. Cohn S., 1997: An introduction to estimation theory. Journal of the Meteorological Society of Japan, 75, 257-288.
6. Daley R., 1993: Atmospheric Data Analysis. Cambridge University Press.
7. Evensen G., 2009: Data Assimilation: The Ensemble Kalman Filter. Springer.
8. Kalnay E., 2003: Atmospheric Modeling, Data Assimilation and Predictability. Cambridge University Press.
9. Lahoz W., B. Khattatov and R. Ménard (Eds.), 2010: Data Assimilation. Springer.
10. Rodgers C., 2000: Inverse Methods for Atmospheric Sounding. World Scientific, Series on Atmospheric, Oceanic and Planetary Physics.
11. Tarantola A., 2005: Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM. http://www.ipgp.fr/~tarantola/Files/Professional/Books/InverseProblemTheory.pdf

A simple but fundamental example

Model problem: least squares approach

Two different measurements of a single quantity are available. Which estimate of its true value? → least squares approach

Example: 2 observations $y_1 = 19^\circ$C and $y_2 = 21^\circ$C of the (unknown) present temperature $x$.

- Let $J(x) = \frac{1}{2}\left[(x - y_1)^2 + (x - y_2)^2\right]$
- $\min_x J(x) \longrightarrow x = \frac{y_1 + y_2}{2} = 20^\circ$C
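A one-line numerical check of this toy problem (an illustrative Python/NumPy sketch, not part of the original slides; values as in the example):

```python
import numpy as np

y1, y2 = 19.0, 21.0  # the two temperature observations (deg C)

def J(x):
    # least squares cost J(x) = 1/2 [(x - y1)^2 + (x - y2)^2]
    return 0.5 * ((x - y1) ** 2 + (x - y2) ** 2)

xs = np.linspace(15.0, 25.0, 10001)  # candidate temperatures
print(xs[np.argmin(J(xs))])          # 20.0, the mean of the two observations
```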

Model problem: least squares approach

Observation operator. If the units differ: $y_1 = 66.2^\circ$F and $y_2 = 69.8^\circ$F

- Let $H(x) = \frac{9}{5}x + 32$
- Let $J(x) = \frac{1}{2}\left[(H(x) - y_1)^2 + (H(x) - y_2)^2\right]$
- $\min_x J(x) \longrightarrow x = 20^\circ$C

Drawback #1: if observation units are inhomogeneous, e.g. $y_1 = 66.2^\circ$F and $y_2 = 21^\circ$C:

- $J(x) = \frac{1}{2}\left[(H(x) - y_1)^2 + (x - y_2)^2\right] \longrightarrow x = 19.47^\circ$C !!

Drawback #2: if observation accuracies are inhomogeneous. If $y_1$ is twice as accurate as $y_2$, one should obtain $x = \frac{2y_1 + y_2}{3} = 19.67^\circ$C

→ $J$ should be $J(x) = \frac{1}{2}\left[\frac{(x - y_1)^2}{1/2} + \frac{(x - y_2)^2}{1}\right]$
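Drawback #1 can be seen numerically with a small illustrative Python/NumPy sketch (not from the original slides) that minimizes the unit-mixing cost function:

```python
import numpy as np

def H(x):
    # observation operator: deg C -> deg F
    return 9.0 / 5.0 * x + 32.0

y1, y2 = 66.2, 21.0  # y1 in deg F, y2 in deg C: inhomogeneous units

def J(x):
    # mixing units: the F-squared residual outweighs the C-squared one
    return 0.5 * ((H(x) - y1) ** 2 + (x - y2) ** 2)

xs = np.linspace(15.0, 25.0, 100001)
print(xs[np.argmin(J(xs))])  # ~19.47 deg C instead of 20 deg C
```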

Model problem: statistical approach

Reformulation in a probabilistic framework:
- the goal is to estimate a scalar value $x$
- $y_i$ is a realization of a random variable $Y_i$
- One is looking for an estimator (i.e. a r.v.) $X$ that is
  - linear: $X = \alpha_1 Y_1 + \alpha_2 Y_2$ (in order to be simple)
  - unbiased: $E(X) = x$ (it seems reasonable)
  - of minimal variance: $\mathrm{Var}(X)$ minimum (optimal accuracy)

→ BLUE (Best Linear Unbiased Estimator)

Let $Y_i = x + \varepsilon_i$ with

Hypotheses
- $E(\varepsilon_i) = 0$ $(i = 1, 2)$: unbiased measurement devices
- $\mathrm{Var}(\varepsilon_i) = \sigma_i^2$ $(i = 1, 2)$: known accuracies
- $\mathrm{Cov}(\varepsilon_1, \varepsilon_2) = 0$: independent measurement errors

Reminder: covariance of two random variables

Let $X$ and $Y$ be two random variables.

- Covariance: $\mathrm{Cov}(X,Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y)$, and $\mathrm{Cov}(X,X) = \mathrm{Var}(X)$
- Linear correlation coefficient: $\rho(X,Y) = \dfrac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$
- Property: $X$ and $Y$ independent $\Longrightarrow \mathrm{Cov}(X,Y) = 0$. The converse is generally false.

Model problem: statistical approach

Then, since $X = \alpha_1 Y_1 + \alpha_2 Y_2 = (\alpha_1 + \alpha_2)x + \alpha_1\varepsilon_1 + \alpha_2\varepsilon_2$:

- $E(X) = (\alpha_1 + \alpha_2)x + \alpha_1 E(\varepsilon_1) + \alpha_2 E(\varepsilon_2) \Longrightarrow \alpha_1 + \alpha_2 = 1$
- $\mathrm{Var}(X) = E\left[(X - x)^2\right] = E\left[(\alpha_1\varepsilon_1 + \alpha_2\varepsilon_2)^2\right] = \alpha_1^2\sigma_1^2 + (1 - \alpha_1)^2\sigma_2^2$

  $\dfrac{\partial\,\mathrm{Var}(X)}{\partial \alpha_1} = 0 \Longrightarrow \alpha_1 = \dfrac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}$

Model problem: statistical approach

In summary:

BLUE
$X = \dfrac{\frac{1}{\sigma_1^2}Y_1 + \frac{1}{\sigma_2^2}Y_2}{\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}}$

Its accuracy: $\left[\mathrm{Var}(X)\right]^{-1} = \dfrac{1}{\sigma_1^2} + \dfrac{1}{\sigma_2^2}$ (accuracies are added)

Remarks:
- The hypothesis $\mathrm{Cov}(\varepsilon_1, \varepsilon_2) = 0$ is not compulsory at all:
  $\mathrm{Cov}(\varepsilon_1, \varepsilon_2) = c \longrightarrow \alpha_i = \dfrac{\sigma_j^2 - c}{\sigma_1^2 + \sigma_2^2 - 2c}$
- Statistical hypotheses on the first two moments of $\varepsilon_1, \varepsilon_2$ lead to statistical results on the first two moments of $X$.
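A quick Monte Carlo check of these two formulas (an illustrative Python/NumPy sketch with arbitrary values $x = 20$, $\sigma_1 = 0.5$, $\sigma_2 = 1$; not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x, s1, s2 = 20.0, 0.5, 1.0                     # true value and error std devs

a1 = s2**2 / (s1**2 + s2**2)                   # optimal weight alpha_1 = 0.8
Y1 = x + s1 * rng.standard_normal(200_000)
Y2 = x + s2 * rng.standard_normal(200_000)
X = a1 * Y1 + (1.0 - a1) * Y2                  # BLUE

print(X.mean())                                # ~20: unbiased
print(X.var(), 1.0 / (1.0/s1**2 + 1.0/s2**2))  # ~0.2 both: accuracies are added
```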

Model problem: statistical approach

Variational equivalence
This is equivalent to the problem:

Minimize $J(x) = \dfrac{1}{2}\left[\dfrac{(x - y_1)^2}{\sigma_1^2} + \dfrac{(x - y_2)^2}{\sigma_2^2}\right]$

Remarks:
- This answers the previous problems of sensitivity to inhomogeneous units and insensitivity to inhomogeneous accuracies
- This gives a rationale for choosing the norm for defining $J$
- $\underbrace{J''(x)}_{\text{convexity}} = \dfrac{1}{\sigma_1^2} + \dfrac{1}{\sigma_2^2} = \underbrace{[\mathrm{Var}(x)]^{-1}}_{\text{accuracy}}$

Model problem

Alternative formulation: background + observation

If one considers that $y_1$ is a prior (or background) estimate $x_b$ of $x$, and $y_2 = y$ is an independent observation, then:

$J(x) = \underbrace{\frac{1}{2}\frac{(x - x_b)^2}{\sigma_b^2}}_{J_b} + \underbrace{\frac{1}{2}\frac{(x - y)^2}{\sigma_o^2}}_{J_o}$

and

$x = \dfrac{\frac{1}{\sigma_b^2}x_b + \frac{1}{\sigma_o^2}y}{\frac{1}{\sigma_b^2} + \frac{1}{\sigma_o^2}} = x_b + \underbrace{\frac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2}}_{\text{gain}}\underbrace{(y - x_b)}_{\text{innovation}}$
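The scalar "analysis = background + gain × innovation" update in code (an illustrative Python sketch with made-up values $\sigma_b = 1$, $\sigma_o = 0.5$, $x_b = 19$, $y = 21$; not from the original slides):

```python
sigma_b, sigma_o = 1.0, 0.5   # background and observation std devs (illustrative)
x_b, y = 19.0, 21.0           # background estimate and observation

gain = sigma_b**2 / (sigma_b**2 + sigma_o**2)
x_a = x_b + gain * (y - x_b)  # analysis = background + gain * innovation
var_a = 1.0 / (1.0 / sigma_b**2 + 1.0 / sigma_o**2)
print(x_a, var_a)             # 20.6 0.2: pulled toward the more accurate observation
```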

Model problem

Interpretation

If the background error and the observation error are uncorrelated, $E(e_o e_b) = 0$, then one can show that the estimation error and the innovation are uncorrelated:

$E(e_a (Y - X_b)) = 0$

→ orthogonal projection for the scalar product $\langle Z_1, Z_2 \rangle = E(Z_1 Z_2)$

Model problem: Bayesian approach

One can also consider $x$ as a realization of a r.v. $X$, and be interested in the pdf $p(X|Y)$.

Reminder: Bayes' theorem
Let $A$ and $B$ be two events.

- Conditional probability: $P(A|B) = \dfrac{P(A \cap B)}{P(B)}$

  Example: $P(\text{heart card} \mid \text{red card}) = \dfrac{1}{2} = \dfrac{P(\text{heart card} \cap \text{red card})}{P(\text{red card})} = \dfrac{8/32}{16/32}$

- Bayes' theorem: $P(A|B) = \dfrac{P(B|A)\,P(A)}{P(B)}$

Thus, if $X$ and $Y$ are two random variables:

$P(X = x \mid Y = y) = \dfrac{P(Y = y \mid X = x)\,P(X = x)}{P(Y = y)}$

Model problem: Bayesian approach

Several optimality criteria
- minimum variance: $X_{MV}$ such that the spread around it is minimal → $X_{MV} = E(X|Y)$
- maximum a posteriori: most probable value of $X$ given $Y$ → $X_{MAP}$ such that $\dfrac{\partial p(X|Y)}{\partial X} = 0$
- maximum likelihood: $X_{ML}$ that maximizes $p(Y|X)$

- Based on Bayes' rule: $P(X = x \mid Y = y) = \dfrac{P(Y = y \mid X = x)\,P(X = x)}{P(Y = y)}$
- requires additional hypotheses on the prior pdfs for $X$ and for $Y|X$
- In the Gaussian case, these estimates coincide with the BLUE

Model problem: Bayesian approach

Back to our example: observations $y_1$ and $y_2$ of an unknown value $x$.

The simplest approach: maximum likelihood (no prior on $X$)

Hypotheses:
- $Y_i \hookrightarrow \mathcal{N}(x, \sigma_i^2)$: unbiased, known accuracies + known pdf
- $\mathrm{Cov}(Y_1, Y_2) = 0$: independent measurement errors

Likelihood function: $\mathcal{L}(x) = dP(Y_1 = y_1 \text{ and } Y_2 = y_2 \mid X = x)$

One looks for $x_{ML} = \mathrm{Argmax}\,\mathcal{L}(x)$: the maximum likelihood estimate

Model problem: Bayesian approach

$\mathcal{L}(x) = \prod_{i=1}^{2} dP(Y_i = y_i \mid X = x) = \prod_{i=1}^{2} \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-\frac{(y_i - x)^2}{2\sigma_i^2}}$

$\mathrm{Argmax}\,\mathcal{L}(x) = \mathrm{Argmin}\left(-\ln\mathcal{L}(x)\right) = \mathrm{Argmin}\,\frac{1}{2}\left[\frac{(x - y_1)^2}{\sigma_1^2} + \frac{(x - y_2)^2}{\sigma_2^2}\right]$

Hence $x_{ML} = \dfrac{\frac{1}{\sigma_1^2}y_1 + \frac{1}{\sigma_2^2}y_2}{\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}}$: the BLUE again (because of the Gaussian hypothesis)
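The ML = BLUE coincidence can be checked numerically (an illustrative Python/NumPy sketch, not from the original slides; variances chosen so that $y_1$ is twice as accurate as $y_2$):

```python
import numpy as np

y = np.array([19.0, 21.0])
var = np.array([0.5, 1.0])   # y1 twice as accurate as y2 (half the error variance)

def neg_log_lik(x):
    # -ln L(x) up to an additive constant, for independent Gaussian observations
    return 0.5 * np.sum((x - y) ** 2 / var)

xs = np.linspace(15.0, 25.0, 100001)
x_ml = xs[np.argmin([neg_log_lik(x) for x in xs])]
x_blue = (y / var).sum() / (1.0 / var).sum()
print(x_ml, x_blue)          # both ~19.67 = (2 y1 + y2) / 3
```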

Model problem: synthesis

Data assimilation methods are often split into two families: variational methods and statistical methods.

- Variational methods: minimization of a cost function (least squares approach)
- Statistical methods: algebraic computation of the BLUE (with hypotheses on the first two moments), or approximation of the pdfs (with hypotheses on the pdfs) and computation of the MAP estimator
- There are strong links between these approaches, depending on the case (linear? Gaussian?)

Theorem
If you have understood this previous stuff, you have (almost) understood everything about data assimilation.

Generalization: variational approach

Generalization: arbitrary number of unknowns and observations

To be estimated: $\mathbf{x} = (x_1, \ldots, x_n)^T \in \mathbb{R}^n$
Observations: $\mathbf{y} = (y_1, \ldots, y_p)^T \in \mathbb{R}^p$
Observation operator: $\mathbf{y} \equiv H(\mathbf{x})$, with $H : \mathbb{R}^n \longrightarrow \mathbb{R}^p$

A simple example of observation operator

If $\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}$ and $\mathbf{y} = \begin{pmatrix} \text{an observation of } \frac{x_1 + x_2}{2} \\ \text{an observation of } x_4 \end{pmatrix}$

then $H(\mathbf{x}) = \mathbf{H}\mathbf{x}$ with $\mathbf{H} = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

Cost function: $J(\mathbf{x}) = \frac{1}{2}\|H(\mathbf{x}) - \mathbf{y}\|^2$, with $\|\cdot\|$ to be chosen.
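The slide's linear observation operator in code (an illustrative Python/NumPy sketch with a made-up state vector):

```python
import numpy as np

# the slide's linear observation operator H : R^4 -> R^2
H = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

x = np.array([1.0, 3.0, 7.0, 4.0])  # an illustrative state vector
print(H @ x)                        # [2. 4.]: the mean of x1, x2 and the value x4
```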

Reminder: norms and scalar products

$\mathbf{u} = (u_1, \ldots, u_n)^T \in \mathbb{R}^n$

- Euclidean norm: $\|\mathbf{u}\|^2 = \mathbf{u}^T\mathbf{u} = \sum_{i=1}^n u_i^2$

  Associated scalar product: $(\mathbf{u}, \mathbf{v}) = \mathbf{u}^T\mathbf{v} = \sum_{i=1}^n u_i v_i$

- Generalized norm: let $\mathbf{M}$ be a symmetric positive definite matrix

  $\mathbf{M}$-norm: $\|\mathbf{u}\|_{\mathbf{M}}^2 = \mathbf{u}^T\mathbf{M}\mathbf{u} = \sum_{i=1}^n \sum_{j=1}^n m_{ij}\, u_i u_j$

  Associated scalar product: $(\mathbf{u}, \mathbf{v})_{\mathbf{M}} = \mathbf{u}^T\mathbf{M}\mathbf{v} = \sum_{i=1}^n \sum_{j=1}^n m_{ij}\, u_i v_j$

Generalization: arbitrary number of unknowns and observations

Remark
(Intuitive) necessary (but not sufficient) condition for the existence of a unique minimum: $p \geq n$

Formalism "background value + new observations"

$\mathbf{z} = \begin{pmatrix} \mathbf{x}_b \\ \mathbf{y} \end{pmatrix} \begin{matrix} \longleftarrow \text{background} \\ \longleftarrow \text{new observations} \end{matrix}$

The cost function becomes:

$J(\mathbf{x}) = \underbrace{\frac{1}{2}\|\mathbf{x} - \mathbf{x}_b\|_b^2}_{J_b} + \underbrace{\frac{1}{2}\|H(\mathbf{x}) - \mathbf{y}\|_o^2}_{J_o}$

The necessary condition for the existence of a unique minimum ($p \geq n$) is automatically fulfilled.

If the problem is time dependent

- Observations are distributed in time: $\mathbf{y} = \mathbf{y}(t)$
- The observation cost function becomes:

  $J_o(\mathbf{x}) = \frac{1}{2}\sum_{i=0}^{N} \|H_i(\mathbf{x}(t_i)) - \mathbf{y}(t_i)\|_o^2$

- There is a model describing the evolution of $\mathbf{x}$: $\frac{d\mathbf{x}}{dt} = M(\mathbf{x})$ with $\mathbf{x}(t = 0) = \mathbf{x}_0$. Then $J$ is often no longer minimized w.r.t. $\mathbf{x}$, but w.r.t. $\mathbf{x}_0$ only, or some other parameters.

  $J_o(\mathbf{x}_0) = \frac{1}{2}\sum_{i=0}^{N} \|H_i(\mathbf{x}(t_i)) - \mathbf{y}(t_i)\|_o^2 = \frac{1}{2}\sum_{i=0}^{N} \|H_i(M_{0\to t_i}(\mathbf{x}_0)) - \mathbf{y}(t_i)\|_o^2$

If the problem is time dependent

$J(\mathbf{x}_0) = \underbrace{\frac{1}{2}\|\mathbf{x}_0 - \mathbf{x}_0^b\|_b^2}_{\text{background term } J_b} + \underbrace{\frac{1}{2}\sum_{i=0}^{N}\|H_i(\mathbf{x}(t_i)) - \mathbf{y}(t_i)\|_o^2}_{\text{observation term } J_o}$

Uniqueness of the minimum?

$J(\mathbf{x}_0) = J_b(\mathbf{x}_0) + J_o(\mathbf{x}_0) = \frac{1}{2}\|\mathbf{x}_0 - \mathbf{x}_b\|_b^2 + \frac{1}{2}\sum_{i=0}^{N}\|H_i(M_{0\to t_i}(\mathbf{x}_0)) - \mathbf{y}(t_i)\|_o^2$

- If $H$ and $M$ are linear then $J_o$ is quadratic.
- However it generally does not have a unique minimum, since the number of observations is generally smaller than the size of $\mathbf{x}_0$ (the problem is underdetermined: $p < n$).

Example: let $(x_1^t, x_2^t) = (1, 1)$ and $y = 1.1$ an observation of $\frac{1}{2}(x_1 + x_2)$.

$J_o(x_1, x_2) = \frac{1}{2}\left(\frac{x_1 + x_2}{2} - 1.1\right)^2$

Uniqueness of the minimum?

- If $H$ and $M$ are linear then $J_o$ is quadratic.
- However it generally does not have a unique minimum, since the number of observations is generally smaller than the size of $\mathbf{x}_0$ (the problem is underdetermined).
- Adding $J_b$ makes the problem of minimizing $J = J_o + J_b$ well posed.

Example: let $(x_1^t, x_2^t) = (1, 1)$ and $y = 1.1$ an observation of $\frac{1}{2}(x_1 + x_2)$. Let $(x_1^b, x_2^b) = (0.9, 1.05)$.

$J(x_1, x_2) = \underbrace{\frac{1}{2}\left(\frac{x_1 + x_2}{2} - 1.1\right)^2}_{J_o} + \underbrace{\frac{1}{2}\left[(x_1 - 0.9)^2 + (x_2 - 1.05)^2\right]}_{J_b}$

$\longrightarrow (x_1^*, x_2^*) = (0.94166\ldots,\ 1.09166\ldots)$
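As a quick check (an illustrative Python/NumPy sketch, not from the original slides), setting the gradient of this $J$ to zero yields a small linear system whose solution matches the slide's values:

```python
import numpy as np

# Setting grad J = 0 for the slide's example gives the 2x2 linear system:
#   (1/4 + 1) x1 + (1/4) x2 = 0.55 + 0.9
#   (1/4) x1 + (1/4 + 1) x2 = 0.55 + 1.05
A = np.array([[1.25, 0.25],
              [0.25, 1.25]])
b = np.array([1.45, 1.60])
print(np.linalg.solve(A, b))  # [0.94166667 1.09166667]
```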

Uniqueness of the minimum?

- If $H$ and/or $M$ are nonlinear then $J_o$ is no longer quadratic.

Example: the Lorenz system (1963)

$\frac{dx}{dt} = \alpha(y - x)$
$\frac{dy}{dt} = \beta x - y - xz$
$\frac{dz}{dt} = -\gamma z + xy$

(See http://www.chaos-math.org)

Uniqueness of the minimum?

Example (continued): for the Lorenz system, observing the $x$ component and minimizing with respect to the initial condition $y_0$:

$J_o(y_0) = \frac{1}{2}\sum_{i=0}^{N}\left(x(t_i) - x^{obs}(t_i)\right)^2$
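A minimal sketch of how such a cost function can be evaluated (illustrative Python/NumPy, not from the original slides; α, β, γ set to the classical Lorenz values 10, 28, 8/3, and a crude explicit Euler scheme):

```python
import numpy as np

def lorenz_x(y0, alpha=10.0, beta=28.0, gamma=8.0/3.0, dt=0.01, nsteps=500):
    # crude explicit Euler integration; returns the x-component trajectory
    x, y, z = 1.0, y0, 1.0
    xs = np.empty(nsteps + 1)
    xs[0] = x
    for k in range(nsteps):
        x, y, z = (x + dt * alpha * (y - x),
                   y + dt * (beta * x - y - x * z),
                   z + dt * (-gamma * z + x * y))
        xs[k + 1] = x
    return xs

x_obs = lorenz_x(1.0)  # synthetic observations from a "true" initial condition

def Jo(y0):
    return 0.5 * np.sum((lorenz_x(y0) - x_obs) ** 2)

for y0 in (0.9, 1.0, 1.1):
    print(y0, Jo(y0))  # Jo(1.0) = 0; nearby values already show a rugged, non-quadratic cost
```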

Uniqueness of the minimum?

- If $H$ and/or $M$ are nonlinear then $J_o$ is no longer quadratic.
- Adding $J_b$ makes it "more quadratic" ($J_b$ is a regularization term), but $J = J_o + J_b$ may however have several local minima.

A fundamental remark before going into minimization aspects

Once $J$ is defined (i.e. once all the ingredients are chosen: control variables, norms, observations...), the problem is entirely defined. Hence so is its solution.

The "physical" (i.e. the most important) part of data assimilation lies in the definition of $J$.

The rest of the job, i.e. minimizing $J$, is "only" technical work.

Minimum of a quadratic function in finite dimension

Theorem: generalized (or Moore-Penrose) inverse
Let $\mathbf{M}$ be a $p \times n$ matrix with rank $n$, and $\mathbf{b} \in \mathbb{R}^p$ (hence $p \geq n$).
Let $J(\mathbf{x}) = \|\mathbf{M}\mathbf{x} - \mathbf{b}\|^2 = (\mathbf{M}\mathbf{x} - \mathbf{b})^T(\mathbf{M}\mathbf{x} - \mathbf{b})$.
$J$ is minimum for $\mathbf{x} = \mathbf{M}^+\mathbf{b}$, where $\mathbf{M}^+ = (\mathbf{M}^T\mathbf{M})^{-1}\mathbf{M}^T$ (the generalized, or Moore-Penrose, inverse).

Corollary: with a generalized norm
Let $\mathbf{N}$ be a $p \times p$ symmetric positive definite matrix.
Let $J_1(\mathbf{x}) = \|\mathbf{M}\mathbf{x} - \mathbf{b}\|_{\mathbf{N}}^2 = (\mathbf{M}\mathbf{x} - \mathbf{b})^T\mathbf{N}(\mathbf{M}\mathbf{x} - \mathbf{b})$.
$J_1$ is minimum for $\mathbf{x} = (\mathbf{M}^T\mathbf{N}\mathbf{M})^{-1}\mathbf{M}^T\mathbf{N}\,\mathbf{b}$.
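The Moore-Penrose formula can be verified against a standard least squares solver (an illustrative Python/NumPy sketch with a random full-rank matrix, not from the original slides):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 6, 3
M = rng.standard_normal((p, n))  # full column rank with probability 1
b = rng.standard_normal(p)

x_mp = np.linalg.solve(M.T @ M, M.T @ b)     # (M^T M)^-1 M^T b
x_ls = np.linalg.lstsq(M, b, rcond=None)[0]  # reference least squares solution
print(np.allclose(x_mp, x_ls))               # True: both minimize ||Mx - b||^2
```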

Link with data assimilation

In the case of a linear, time independent data assimilation problem:

$J_o(\mathbf{x}) = \frac{1}{2}\|\mathbf{H}\mathbf{x} - \mathbf{y}\|_o^2 = \frac{1}{2}(\mathbf{H}\mathbf{x} - \mathbf{y})^T\mathbf{R}^{-1}(\mathbf{H}\mathbf{x} - \mathbf{y})$

Optimal estimation in the linear case: $J_o$ only

$\min_{\mathbf{x} \in \mathbb{R}^n} J_o(\mathbf{x}) \longrightarrow \mathbf{x} = (\mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{R}^{-1}\,\mathbf{y}$

Link with data assimilation

With the formalism "background value + new observations":

$J(\mathbf{x}) = J_b(\mathbf{x}) + J_o(\mathbf{x}) = \frac{1}{2}\|\mathbf{x} - \mathbf{x}_b\|_b^2 + \frac{1}{2}\|H(\mathbf{x}) - \mathbf{y}\|_o^2$
$= \frac{1}{2}(\mathbf{x} - \mathbf{x}_b)^T\mathbf{B}^{-1}(\mathbf{x} - \mathbf{x}_b) + \frac{1}{2}(\mathbf{H}\mathbf{x} - \mathbf{y})^T\mathbf{R}^{-1}(\mathbf{H}\mathbf{x} - \mathbf{y})$
$= \frac{1}{2}(\mathbf{M}\mathbf{x} - \mathbf{b})^T\mathbf{N}(\mathbf{M}\mathbf{x} - \mathbf{b}) = \frac{1}{2}\|\mathbf{M}\mathbf{x} - \mathbf{b}\|_{\mathbf{N}}^2$

with $\mathbf{M} = \begin{pmatrix} \mathbf{I}_n \\ \mathbf{H} \end{pmatrix}$, $\mathbf{b} = \begin{pmatrix} \mathbf{x}_b \\ \mathbf{y} \end{pmatrix}$, $\mathbf{N} = \begin{pmatrix} \mathbf{B}^{-1} & 0 \\ 0 & \mathbf{R}^{-1} \end{pmatrix}$

Optimal estimation in the linear case: $J_b + J_o$

$\mathbf{x} = \mathbf{x}_b + \underbrace{(\mathbf{B}^{-1} + \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{R}^{-1}}_{\text{gain matrix}}\underbrace{(\mathbf{y} - \mathbf{H}\mathbf{x}_b)}_{\text{innovation vector}}$

Remark: the gain matrix also reads $\mathbf{B}\mathbf{H}^T(\mathbf{H}\mathbf{B}\mathbf{H}^T + \mathbf{R})^{-1}$ (Sherman-Morrison-Woodbury formula).
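The two equivalent forms of the gain matrix in code (an illustrative Python/NumPy sketch with random small matrices, not from the original slides):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 4, 2
H = rng.standard_normal((p, n))
B = np.diag(rng.uniform(0.5, 2.0, size=n))  # background error covariance
R = np.diag(rng.uniform(0.1, 0.5, size=p))  # observation error covariance
xb, y = rng.standard_normal(n), rng.standard_normal(p)

Bi, Ri = np.linalg.inv(B), np.linalg.inv(R)
K1 = np.linalg.inv(Bi + H.T @ Ri @ H) @ H.T @ Ri  # (B^-1 + H^T R^-1 H)^-1 H^T R^-1
K2 = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)     # B H^T (H B H^T + R)^-1
print(np.allclose(K1, K2))                        # True: Sherman-Morrison-Woodbury
print(xb + K1 @ (y - H @ xb))                     # analysis = background + gain @ innovation
```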

Remark

Given the sizes of $n$ and $p$, it is generally impossible to handle $\mathbf{H}$, $\mathbf{B}$ and $\mathbf{R}$ explicitly, so the direct computation of the gain matrix is impossible.

→ Even in the linear case (for which we have an explicit expression for $\mathbf{x}$), the computation of $\mathbf{x}$ is performed using an optimization algorithm.

Generalization: statistical approach

Generalization: statistical approach

To be estimated: $\mathbf{x} = (x_1, \ldots, x_n)^T \in \mathbb{R}^n$. Observations: $\mathbf{y} = (y_1, \ldots, y_p)^T \in \mathbb{R}^p$.
Observation operator: $\mathbf{y} \equiv H(\mathbf{x})$, with $H : \mathbb{R}^n \longrightarrow \mathbb{R}^p$.

Statistical framework:
- $\mathbf{y}$ is a realization of a random vector $\mathbf{Y}$

Reminder: random vectors and covariance matrices

- Random vector: $\mathbf{X} = (X_1, \ldots, X_n)^T$, where each $X_i$ is a random variable
- Covariance matrix: $\mathbf{C} = \left(\mathrm{Cov}(X_i, X_j)\right)_{1\leq i,j\leq n} = E\left([\mathbf{X} - E(\mathbf{X})][\mathbf{X} - E(\mathbf{X})]^T\right)$

  On the diagonal: $C_{ii} = \mathrm{Var}(X_i)$
- Property: a covariance matrix is symmetric positive semidefinite (definite if the random variables are linearly independent).

Generalization: statistical approach

Statistical framework:
- $\mathbf{y}$ is a realization of a random vector $\mathbf{Y}$
- One is looking for the BLUE, i.e. a r.v. $\mathbf{X}$ that is
  - linear: $\mathbf{X} = \mathbf{A}\mathbf{Y}$ with $\mathrm{size}(\mathbf{A}) = (n, p)$
  - unbiased: $E(\mathbf{X}) = \mathbf{x}$
  - of minimal variance: $\mathrm{Var}(\mathbf{X}) = \sum_{i=1}^n \mathrm{Var}(X_i) = \mathrm{Tr}(\mathrm{Cov}(\mathbf{X}))$ minimum

Generalization: statistical approach

Hypotheses
- Linear observation operator: $H(\mathbf{x}) = \mathbf{H}\mathbf{x}$
- Let $\mathbf{Y} = \mathbf{H}\mathbf{x} + \boldsymbol{\varepsilon}$ with $\boldsymbol{\varepsilon}$ a random vector in $\mathbb{R}^p$
- $E(\boldsymbol{\varepsilon}) = 0$: unbiased measurement devices
- $\mathrm{Cov}(\boldsymbol{\varepsilon}) = E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T) = \mathbf{R}$: known accuracies and covariances

BLUE:
- linear: $\mathbf{X} = \mathbf{A}\mathbf{Y}$ with $\mathbf{A}$ of size $(n, p)$
- unbiased: $E(\mathbf{X}) = E(\mathbf{A}\mathbf{H}\mathbf{x} + \mathbf{A}\boldsymbol{\varepsilon}) = \mathbf{A}\mathbf{H}\mathbf{x} + \mathbf{A}E(\boldsymbol{\varepsilon}) = \mathbf{A}\mathbf{H}\mathbf{x}$

  So: $E(\mathbf{X}) = \mathbf{x} \Longleftrightarrow \mathbf{A}\mathbf{H} = \mathbf{I}_n$

Remark: $\mathbf{A}\mathbf{H} = \mathbf{I}_n \Longrightarrow \ker\mathbf{H} = \{0\} \Longrightarrow \mathrm{rank}(\mathbf{H}) = n$. Since $\mathrm{size}(\mathbf{H}) = (p, n)$, this implies $n \leq p$ (again!)

BLUE:
- minimal variance: $\min \mathrm{Tr}(\mathrm{Cov}(\mathbf{X}))$

$\mathbf{X} = \mathbf{A}\mathbf{Y} = \mathbf{A}\mathbf{H}\mathbf{x} + \mathbf{A}\boldsymbol{\varepsilon} = \mathbf{x} + \mathbf{A}\boldsymbol{\varepsilon}$

$\mathrm{Cov}(\mathbf{X}) = E\left([\mathbf{X} - E(\mathbf{X})][\mathbf{X} - E(\mathbf{X})]^T\right) = \mathbf{A}\,E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T)\,\mathbf{A}^T = \mathbf{A}\mathbf{R}\mathbf{A}^T$

Find $\mathbf{A}$ that minimizes $\mathrm{Tr}(\mathbf{A}\mathbf{R}\mathbf{A}^T)$ under the constraint $\mathbf{A}\mathbf{H} = \mathbf{I}_n$

Gauss-Markov theorem
$\mathbf{A} = (\mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{R}^{-1}$

This also leads to $\mathrm{Cov}(\mathbf{X}) = (\mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})^{-1}$
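A Monte Carlo check of the Gauss-Markov result (an illustrative Python/NumPy sketch with random small matrices, not from the original slides):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 3, 5
H = rng.standard_normal((p, n))
R = np.diag(rng.uniform(0.1, 1.0, size=p))
x = rng.standard_normal(n)  # the unknown "true" state

Ri = np.linalg.inv(R)
A = np.linalg.inv(H.T @ Ri @ H) @ H.T @ Ri  # Gauss-Markov optimal A

eps = rng.multivariate_normal(np.zeros(p), R, size=200_000)
Y = eps + H @ x             # N noisy observation vectors, shape (N, p)
X = Y @ A.T                 # BLUE estimates, shape (N, n)
print(X.mean(axis=0) - x)   # ~0: unbiased
print(np.abs(np.cov(X.T) - np.linalg.inv(H.T @ Ri @ H)).max())  # small: Cov(X) matches
```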

Link with the variational approach

Statistical approach: BLUE
$\mathbf{X} = (\mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{R}^{-1}\mathbf{Y}$, with $\mathrm{Cov}(\mathbf{X}) = (\mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})^{-1}$

Variational approach in the linear case
$J_o(\mathbf{x}) = \frac{1}{2}\|\mathbf{H}\mathbf{x} - \mathbf{y}\|_o^2 = \frac{1}{2}(\mathbf{H}\mathbf{x} - \mathbf{y})^T\mathbf{R}^{-1}(\mathbf{H}\mathbf{x} - \mathbf{y})$
$\min_{\mathbf{x}\in\mathbb{R}^n} J_o(\mathbf{x}) \longrightarrow \mathbf{x} = (\mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{R}^{-1}\,\mathbf{y}$

Remarks
- The statistical approach rationalizes the choice of the norm in the variational approach.
- $\underbrace{[\mathrm{Cov}(\mathbf{X})]^{-1}}_{\text{accuracy}} = \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H} = \underbrace{\mathrm{Hess}(J_o)}_{\text{convexity}}$

Statistical approach: formalism "background value + new observations"

$\mathbf{Z} = \begin{pmatrix} \mathbf{X}_b \\ \mathbf{Y} \end{pmatrix} \begin{matrix} \longleftarrow \text{background} \\ \longleftarrow \text{new observations} \end{matrix}$

Let $\mathbf{X}_b = \mathbf{x} + \boldsymbol{\varepsilon}_b$ and $\mathbf{Y} = \mathbf{H}\mathbf{x} + \boldsymbol{\varepsilon}_o$

Hypotheses:
- $E(\boldsymbol{\varepsilon}_b) = 0$: unbiased background
- $E(\boldsymbol{\varepsilon}_o) = 0$: unbiased measurement devices
- $\mathrm{Cov}(\boldsymbol{\varepsilon}_b, \boldsymbol{\varepsilon}_o) = 0$: independent background and observation errors
- $\mathrm{Cov}(\boldsymbol{\varepsilon}_b) = \mathbf{B}$ and $\mathrm{Cov}(\boldsymbol{\varepsilon}_o) = \mathbf{R}$: known accuracies and covariances

This is again the general BLUE framework, with

$\mathbf{Z} = \begin{pmatrix} \mathbf{X}_b \\ \mathbf{Y} \end{pmatrix} = \begin{pmatrix} \mathbf{I}_n \\ \mathbf{H} \end{pmatrix}\mathbf{x} + \begin{pmatrix} \boldsymbol{\varepsilon}_b \\ \boldsymbol{\varepsilon}_o \end{pmatrix}$ and $\mathrm{Cov}(\boldsymbol{\varepsilon}) = \begin{pmatrix} \mathbf{B} & 0 \\ 0 & \mathbf{R} \end{pmatrix}$

Statistical approach: formalism "background value + new observations"

Statistical approach: BLUE

$\mathbf{X} = \mathbf{X}_b + \underbrace{(\mathbf{B}^{-1} + \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{R}^{-1}}_{\text{gain matrix}}\underbrace{(\mathbf{Y} - \mathbf{H}\mathbf{X}_b)}_{\text{innovation vector}}$

with $[\mathrm{Cov}(\mathbf{X})]^{-1} = \mathbf{B}^{-1} + \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H}$: accuracies are added

Link with the variational approach

Statistical approach: BLUE
$\mathbf{X} = \mathbf{X}_b + (\mathbf{B}^{-1} + \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{R}^{-1}(\mathbf{Y} - \mathbf{H}\mathbf{X}_b)$
with $\mathrm{Cov}(\mathbf{X}) = (\mathbf{B}^{-1} + \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})^{-1}$

Variational approach in the linear case
$J(\mathbf{x}) = \frac{1}{2}\|\mathbf{x} - \mathbf{x}_b\|_b^2 + \frac{1}{2}\|H(\mathbf{x}) - \mathbf{y}\|_o^2 = \frac{1}{2}(\mathbf{x} - \mathbf{x}_b)^T\mathbf{B}^{-1}(\mathbf{x} - \mathbf{x}_b) + \frac{1}{2}(\mathbf{H}\mathbf{x} - \mathbf{y})^T\mathbf{R}^{-1}(\mathbf{H}\mathbf{x} - \mathbf{y})$
$\min_{\mathbf{x}\in\mathbb{R}^n} J(\mathbf{x}) \longrightarrow \mathbf{x} = \mathbf{x}_b + (\mathbf{B}^{-1} + \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{R}^{-1}(\mathbf{y} - \mathbf{H}\mathbf{x}_b)$

Same remarks as previously
- The statistical approach rationalizes the choice of the norms for $J_o$ and $J_b$ in the variational approach.
- $\underbrace{[\mathrm{Cov}(\mathbf{X})]^{-1}}_{\text{accuracy}} = \mathbf{B}^{-1} + \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H} = \underbrace{\mathrm{Hess}(J)}_{\text{convexity}}$

If the problem is time dependent

Dynamical system: $\mathbf{x}^t(t_{k+1}) = M(t_k, t_{k+1})\,\mathbf{x}^t(t_k) + \mathbf{e}(t_k)$

- $\mathbf{x}^t(t_k)$: true state at time $t_k$
- $M(t_k, t_{k+1})$: model, assumed linear, between $t_k$ and $t_{k+1}$
- $\mathbf{e}(t_k)$: model error at time $t_k$

At every observation time $t_k$, we have an observation $\mathbf{y}_k$ and a model forecast $\mathbf{x}^f(t_k)$. The BLUE can be applied.

If the problem is time dependent

$\mathbf{x}^t(t_{k+1}) = M(t_k, t_{k+1})\,\mathbf{x}^t(t_k) + \mathbf{e}(t_k)$

Hypotheses
- $\mathbf{e}(t_k)$ is unbiased, with covariance matrix $\mathbf{Q}_k$
- $\mathbf{e}(t_k)$ and $\mathbf{e}(t_l)$ are independent ($k \neq l$)
- Unbiased observation $\mathbf{y}_k$, with error covariance matrix $\mathbf{R}_k$
- $\mathbf{e}(t_k)$ and the analysis error $\mathbf{x}^a(t_k) - \mathbf{x}^t(t_k)$ are independent

If the problem is time dependent

Kalman filter (Kalman and Bucy, 1961)

Initialization:
$\mathbf{x}^a(t_0) = \mathbf{x}_0$ (approximate initial state)
$\mathbf{P}^a(t_0) = \mathbf{P}_0$ (error covariance matrix)

Step $k$ (prediction - correction, or forecast - analysis):

$\mathbf{x}^f(t_{k+1}) = M(t_k, t_{k+1})\,\mathbf{x}^a(t_k)$ (forecast)
$\mathbf{P}^f(t_{k+1}) = M(t_k, t_{k+1})\,\mathbf{P}^a(t_k)\,M^T(t_k, t_{k+1}) + \mathbf{Q}_k$

$\mathbf{x}^a(t_{k+1}) = \mathbf{x}^f(t_{k+1}) + \mathbf{K}_{k+1}\left[\mathbf{y}_{k+1} - \mathbf{H}_{k+1}\mathbf{x}^f(t_{k+1})\right]$ (BLUE)
$\mathbf{K}_{k+1} = \mathbf{P}^f(t_{k+1})\,\mathbf{H}_{k+1}^T\left[\mathbf{H}_{k+1}\mathbf{P}^f(t_{k+1})\mathbf{H}_{k+1}^T + \mathbf{R}_{k+1}\right]^{-1}$
$\mathbf{P}^a(t_{k+1}) = \mathbf{P}^f(t_{k+1}) - \mathbf{K}_{k+1}\mathbf{H}_{k+1}\mathbf{P}^f(t_{k+1})$

where the exponents $f$ and $a$ stand for forecast and analysis, respectively.
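The forecast-analysis cycle above translates directly into code. Here is a minimal sketch (illustrative Python/NumPy; the rotation model, observation operator and covariances are made-up toy choices, not from the original slides):

```python
import numpy as np

def kalman_step(xa, Pa, y, M, Q, H, R):
    # one forecast / analysis cycle of the linear Kalman filter
    xf = M @ xa                                     # forecast state
    Pf = M @ Pa @ M.T + Q                           # forecast error covariance
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)  # Kalman gain
    xa = xf + K @ (y - H @ xf)                      # analysis (BLUE)
    Pa = Pf - K @ H @ Pf                            # analysis error covariance
    return xa, Pa

# toy 2-variable rotation model, observing only the first component
th = 0.1
M = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
H = np.array([[1.0, 0.0]])
Q, R = 0.01 * np.eye(2), np.array([[0.1]])

xa, Pa = np.zeros(2), np.eye(2)
for y in (1.0, 1.1, 1.15):                # synthetic observations
    xa, Pa = kalman_step(xa, Pa, np.array([y]), M, Q, H, R)
print(xa, np.diag(Pa))                    # state estimate and analysis error variances
```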

If the problem is time dependent

Equivalence with the variational approach
If $\mathbf{H}_k$ and $M(t_k, t_{k+1})$ are linear, and if the model is perfect ($\mathbf{e}_k = 0$), then the Kalman filter and the variational method minimizing

$J(\mathbf{x}) = \frac{1}{2}(\mathbf{x} - \mathbf{x}_0)^T\mathbf{P}_0^{-1}(\mathbf{x} - \mathbf{x}_0) + \frac{1}{2}\sum_{k=0}^{N}\left(\mathbf{H}_k M(t_0, t_k)\mathbf{x} - \mathbf{y}_k\right)^T\mathbf{R}_k^{-1}\left(\mathbf{H}_k M(t_0, t_k)\mathbf{x} - \mathbf{y}_k\right)$

lead to the same solution at $t = t_N$.


In summary

Variational approach: least squares minimization (non-dimensional terms)
- no particular hypothesis
- valid for both stationary and time dependent problems
- If $M$ and $H$ are linear, the cost function is quadratic: there is a unique solution if $p \geq n$. Adding a background term ensures this property.
- If things are nonlinear, the approach is still valid, but possibly with several minima.

Statistical approach
- hypotheses on the first two moments
- time independent + $H$ linear + $p \geq n$: BLUE (first two moments)
- time dependent + $M$ and $H$ linear: Kalman filter (based on the BLUE)
- hypotheses on the pdfs: Bayesian approach (pdf) + ML or MAP estimator

In summary

The statistical approach gives a rationale for the choice of the norms, and gives an estimate of the uncertainty.

- Time independent problems: if $H$ is linear, the variational and the statistical approaches lead to the same solution (provided $\|\cdot\|_b$ is based on $\mathbf{B}^{-1}$ and $\|\cdot\|_o$ is based on $\mathbf{R}^{-1}$).
- Time dependent problems: if $H$ and $M$ are linear and if the model is perfect, both approaches (4D-Var and the Kalman filter) lead to the same solution at final time.

Common main methodological difficulties

- Nonlinearities: $J$ non-quadratic / what about the Kalman filter?
- Huge dimensions, $\dim(\mathbf{x}) = O(10^6 - 10^9)$: minimization of $J$ / management of huge matrices
- Poorly known error statistics: choice of the norms / of $\mathbf{B}$, $\mathbf{R}$, $\mathbf{Q}$
- Scientific computing issues (data management, code efficiency, parallelization...)

→ NEXT LECTURE