
An introduction to data assimilation for the geosciences

Ross Bannister, Amos Lawless, Alison Fowler

National Centre for Earth Observation, School of Mathematics and Physical Sciences

University of Reading

(A) Introductory lecture

(B) Variational intro + practical

(C) Kalman filter + practical

DA ‘surgery’

NCEO Early Career Science Conference, 16th – 18th April 2012

What is data assimilation?

What is the temperature, T, of the fluid inside each jar as a function of time, t?

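Measurements at t = 0:
• jar A: thermometer (in-situ), yA(0)
• jar B: radiometer (remotely sensed), yB(0)

Model:

$$T_A(t) = T_{\rm env} + (T_A(0) - T_{\rm env})\, e^{-\alpha_A t}$$

$$T_B(t) = T_{\rm env} + (T_B(0) - T_{\rm env})\, e^{-\alpha_B t}$$

Measurements at a later time t:
• jar A: thermometer, yA(t)
• jar B: radiometer, yB(t)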



Data assimilation is concerned with how we combine these pieces of information to obtain the best possible knowledge of the system as a function of time.
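The jar model can be sketched in a few lines of Python. This is only an illustration: the function name `jar_temperature` and all numerical values are hypothetical and do not come from the slides.

```python
import math

def jar_temperature(t, T0, T_env, alpha):
    """Cooling model from the slide: T(t) = T_env + (T0 - T_env) * exp(-alpha * t)."""
    return T_env + (T0 - T_env) * math.exp(-alpha * t)

# Hypothetical values, chosen only for illustration.
T_ENV = 20.0    # temperature of the environment
T_A0 = 80.0     # temperature of jar A at t = 0
ALPHA_A = 0.1   # cooling rate of jar A (per unit time)

# Print the modelled temperature of jar A at a few times.
for t in (0.0, 5.0, 10.0, 50.0):
    print(f"t = {t:5.1f}   T_A = {jar_temperature(t, T_A0, T_ENV, ALPHA_A):6.2f}")
```

The model on its own says nothing about how wrong T0, T_env or alpha might be, which is why the measurements and the uncertainties are needed as well.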

Observations + gauge of uncertainty

Model estimates + gauge of uncertainty

Data assimilation → Combined estimate + gauge of uncertainty

Note on uncertainty:

[Figure: probability against possible value (observed or modelled), a Gaussian with standard deviation σ = √⟨ε²⟩.]

“All models are wrong …” (George Box)

“All models are wrong and all observations are inaccurate” (a data assimilator)



[Figure: the state of the system plotted against time, showing the unknown truth xtrue(t), the observations, the forecasts xf(t1), xf(t2), xf(t3) and the analyses xa(t1), xa(t2).]

This is an example of a ‘filter’.

Data assimilation has:
• prediction stages (xf = ‘forecast’, ‘prior’, ‘background’), which extrapolate,
• analysis stages (xa), which interpolate.



“[The atmosphere] is a chaotic system in which errors introduced into the system can grow with time … As a consequence, data assimilation is a struggle between chaotic destruction of knowledge and its restoration by new observations.”

Leith (1993)


Outline and references


Applications of data assimilation in the geosciences

A prototype data assimilation system

Indirect observations and prior knowledge

Errors

Leading data assimilation methods

Essential mathematics

Challenges, subtleties, caveats, …

References:
• Kalnay, 2003, Atmospheric Modeling, Data Assimilation and Predictability.
• Daley, 1991, Atmospheric Data Analysis.
• Lorenc, 2003, The potential of the ensemble Kalman filter for NWP – a comparison with 4D-Var, QJRMS 129, 3183-3203.
• van Leeuwen, Particle filtering in geophysical systems.
• Rodgers, 2000, Inverse methods for atmospheric sounding, theory and practice, World Scientific, Singapore.
• Wang X., Snyder C., Hamill T.M., 2007, On the theoretical equivalence of differently proposed ensemble-3D-Var hybrid analysis schemes, Mon. Wea. Rev. 135, pp. 222-227.


Applications of data assimilation in the geosciences

• Atmospheric retrievals
• Atmospheric dynamics / NWP
• Inverse modelling for sources/sinks
• Reanalysis
• Atmospheric chemistry
• Hydrological cycle
• Carbon cycle
• Oceanography
• Parameter estimation (α, β, γ)


A prototype data assimilation problem

Consider two sources of information (e.g. measurements), x1 ± σ1 and x2 ± σ2 that each estimate x (assume Gaussian statistics)

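$$p_1(x_1|x) = \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left[-\frac{(x_1 - x)^2}{2\sigma_1^2}\right]$$

$$p_2(x_2|x) = \frac{1}{\sqrt{2\pi}\,\sigma_2} \exp\left[-\frac{(x_2 - x)^2}{2\sigma_2^2}\right]$$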

pn(xn|x) δxn: “the probability that the data xn lies between xn and xn + δxn given that the ‘true’ value is x”

The joint probability is p1(x1|x) δx1 p2(x2|x) δx2 (“the probability that x1 is … and x2 is … given x”)

$$p(x_1, x_2|x) = p_1(x_1|x)\, p_2(x_2|x) = \frac{1}{2\pi\sigma_1\sigma_2} \exp\left[-\frac{(x_1 - x)^2}{2\sigma_1^2} - \frac{(x_2 - x)^2}{2\sigma_2^2}\right]$$

In the above theory, x is known and x1 and x2 are unknown. Now introduce actual information x1 and x2: now x1 and x2 are known and x is unknown.

What x maximizes p(x1, x2|x)?

A prototype data assimilation problem: combining imperfect data


What x maximizes p(x1, x2|x)? The same x that minimizes the ‘cost function’

$$I(x) = \frac{1}{2}\left[\frac{(x_1 - x)^2}{\sigma_1^2} + \frac{(x_2 - x)^2}{\sigma_2^2}\right]$$

To minimize, look for stationary values of I:

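$$\left.\frac{\partial I}{\partial x}\right|_{x = x_e} = 0 \quad\Rightarrow\quad x_e = \frac{x_1/\sigma_1^2 + x_2/\sigma_2^2}{1/\sigma_1^2 + 1/\sigma_2^2}$$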

If information source 1 is much more accurate than information source 2, then σ1 << σ2:

$$x_e = \frac{x_1 + (\sigma_1^2/\sigma_2^2)\, x_2}{1 + \sigma_1^2/\sigma_2^2} \approx x_1$$

If information source 2 is much more accurate than information source 1, then σ2 << σ1:

$$x_e = \frac{(\sigma_2^2/\sigma_1^2)\, x_1 + x_2}{\sigma_2^2/\sigma_1^2 + 1} \approx x_2$$
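The combined estimate can be checked numerically with a minimal sketch. The function name `combine` and the numbers are hypothetical. The combined standard deviation returned here, 1/√(1/σ1² + 1/σ2²), is the standard result for two independent Gaussian estimates; it is not derived on the slides and is included as an extra.

```python
def combine(x1, sigma1, x2, sigma2):
    """Inverse-variance weighted combination of two estimates of x.

    Returns the estimate x_e that minimizes
    I(x) = 0.5 * ((x1 - x)**2 / sigma1**2 + (x2 - x)**2 / sigma2**2)
    together with the standard deviation of the combined estimate
    (the latter assumes independent Gaussian errors).
    """
    w1 = 1.0 / sigma1**2   # weight of source 1
    w2 = 1.0 / sigma2**2   # weight of source 2
    x_e = (w1 * x1 + w2 * x2) / (w1 + w2)
    sigma_e = (w1 + w2) ** -0.5
    return x_e, sigma_e

# Hypothetical numbers: equally accurate sources, then source 1 far more accurate.
print(combine(10.0, 1.0, 20.0, 1.0))
print(combine(10.0, 0.01, 20.0, 10.0))
```

With equal accuracies the estimate sits halfway between the two values; when σ1 << σ2 it sits very close to x1, as in the limit above.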


Indirect observations and prior information

If x1 and x2 are measurements, they are direct measurements of x. Many observations are indirect, e.g.:

Interested in (x) | Have observations of (y)
Atmospheric T, O3, q, ρx | Infrared radiances from satellite
Atmospheric T, q | Time delays from GPS satellite
Sources of trace gases | Trace gas measurements
Leaf area index | Optical reflectance from satellite
Sea surface temperature | Infrared or microwave radiances from satellite
Precipitation | Radar reflectivity

Generalise:
• x is the state vector (n elements)
• ymo is the model’s version of the observations (mo = “model observations”) (p elements)
• h is the forward model or observation operator (input n elements, output p elements)
• y is the observation vector (p elements)

Strategy: what x gives the best fit between y and ymo?

$$\mathbf{y}^{\rm mo} = \mathbf{h}(\mathbf{x})$$



$$\mathbf{y}^{\rm mo} = \mathbf{h}(\mathbf{x})$$

$$\mathbf{x} = \begin{pmatrix} \text{model field information} \\ \text{parameters used in model} \end{pmatrix}$$

The structure of the state vector: in the example of meteorological fields, u, v, θ, p and q are 3-D fields, and λ, φ and ℓ are longitude, latitude and vertical level. There are n elements in total.

$$\mathbf{y},\ \mathbf{y}^{\rm mo} = \begin{pmatrix} \text{observation } 1 \\ \vdots \\ \text{observation } p \end{pmatrix}$$

The observation vector comprises each observation made. There are p observations.



$$\mathbf{y}^{\rm mo} = \mathbf{h}(\mathbf{x})$$

Examples of h
• For in-situ observations, h is an interpolation function.
• For radiance observations, h is a radiative transfer operator.
• For observations at a later time than that of x, h includes a forecast model.
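As an illustration of the first example (h as an interpolation function), the sketch below builds a linear observation operator as a p × n matrix H for a 1-D grid. It assumes a strictly increasing grid and observation locations inside the grid range. The function name `interpolation_matrix` and all values are hypothetical.

```python
import numpy as np

def interpolation_matrix(grid, obs_locations):
    """Return H (p x n) such that H @ x linearly interpolates the state x,
    defined on `grid`, to the observation locations.

    Assumes `grid` is strictly increasing and that every observation
    location lies within [grid[0], grid[-1]].
    """
    grid = np.asarray(grid, dtype=float)
    obs_locations = np.asarray(obs_locations, dtype=float)
    H = np.zeros((obs_locations.size, grid.size))
    for k, z in enumerate(obs_locations):
        # Index of the grid point at, or immediately left of, the observation.
        j = np.searchsorted(grid, z, side="right") - 1
        j = min(max(j, 0), grid.size - 2)
        w = (z - grid[j]) / (grid[j + 1] - grid[j])
        H[k, j] = 1.0 - w
        H[k, j + 1] = w
    return H

# Hypothetical state on a 5-point grid, observed at 3 locations.
grid = np.linspace(0.0, 4.0, 5)
x = np.array([10.0, 12.0, 11.0, 15.0, 14.0])
obs_locations = np.array([0.5, 2.25, 4.0])

H = interpolation_matrix(grid, obs_locations)
y_mo = H @ x   # the model's version of the observations
print(y_mo)
```

Writing h as an explicit matrix is only practical for small problems, but it makes the roles of n (state size) and p (number of observations) visible.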

Prior information
• Often the observations are insufficient to determine x.
• Introduce prior information (a-priori, background, first guess, forecast), xf.

One strategy (variational assimilation) for solving the assimilation problem is to ask:

“What x (called xa [in an earlier slide this was called xe]) gives:
• ymo that is the closest possible to y and
• x that is the closest possible to xf?”

Construct a cost functional and minimize w.r.t. x (a generalized least-squares problem).

$$J(\mathbf{x}) \sim \|\mathbf{x} - \mathbf{x}^{\rm f}\|^2 + \|\mathbf{y} - \mathbf{h}(\mathbf{x})\|^2$$



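$$J(\mathbf{x}) \sim \|\mathbf{x} - \mathbf{x}^{\rm f}\|^2 + \|\mathbf{y} - \mathbf{h}(\mathbf{x})\|^2$$

$$J(\mathbf{x}) = \frac{1}{2} (\mathbf{x} - \mathbf{x}^{\rm f})^{\rm T} (\mathbf{P}^{\rm f})^{-1} (\mathbf{x} - \mathbf{x}^{\rm f}) + \frac{1}{2} (\mathbf{y} - \mathbf{h}(\mathbf{x}))^{\rm T} \mathbf{R}^{-1} (\mathbf{y} - \mathbf{h}(\mathbf{x}))$$

Each term is the square of the length of a vector.

Error covariance matrices define the norm (these respect the uncertainty of xf and y and are important!):
• Pf: forecast (or background) error covariance matrix (n × n matrix). Sometimes called B.
• R: observation error covariance matrix (p × p matrix).

This cost function
• can be derived from Bayes’ Theorem by assuming forecast and obs errors obey Gaussian stats,
• has argument x (think of it as a control variable),
• may be extended to include the fit to other unknowns in the system (e.g. the fact that h is imperfect), including model parameters.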

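$$J(\mathbf{x}, \mathbf{p}, \boldsymbol{\eta}) = \frac{1}{2} (\mathbf{x} - \mathbf{x}^{\rm f})^{\rm T} (\mathbf{P}^{\rm f})^{-1} (\mathbf{x} - \mathbf{x}^{\rm f}) + \frac{1}{2} (\mathbf{y} - [\mathbf{h}(\mathbf{x}, \mathbf{p}) + \boldsymbol{\eta}])^{\rm T} \mathbf{R}^{-1} (\mathbf{y} - [\mathbf{h}(\mathbf{x}, \mathbf{p}) + \boldsymbol{\eta}]) + \frac{1}{2} \boldsymbol{\eta}^{\rm T} \mathbf{Q}^{-1} \boldsymbol{\eta}$$

Here p holds the model parameters and η is the error in h, with error covariance matrix Q.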
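The basic cost function J(x), without the extension to p and the error in h, can be evaluated directly. The following is a minimal sketch for small dense matrices; the function name `cost` and the numbers are hypothetical, and real systems never form or solve with the full Pf like this.

```python
import numpy as np

def cost(x, x_f, P_f, y, R, h):
    """Evaluate J(x) = 0.5 (x - x_f)^T P_f^{-1} (x - x_f)
                     + 0.5 (y - h(x))^T R^{-1} (y - h(x)).

    h is any callable mapping a state vector (n,) to model observations (p,).
    """
    dx = x - x_f       # departure from the forecast
    dy = y - h(x)      # departure from the observations
    J_b = 0.5 * dx @ np.linalg.solve(P_f, dx)   # background term
    J_o = 0.5 * dy @ np.linalg.solve(R, dy)     # observation term
    return J_b + J_o

# Hypothetical example: n = 2 state elements, p = 1 observation of their sum.
x_f = np.array([1.0, 2.0])
P_f = np.eye(2)
y = np.array([5.0])
R = np.array([[4.0]])
h = lambda x: np.array([x[0] + x[1]])

print(cost(x_f, x_f, P_f, y, R, h))                    # cost at the forecast
print(cost(np.array([2.0, 2.0]), x_f, P_f, y, R, h))   # cost at a trial state
```

Using `np.linalg.solve` instead of forming explicit inverses is a small design choice: it gives the same value with better numerical behaviour.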


Errors everywhere

Random errors:
• background (a-priori) errors
• observation errors
• model errors
• representivity errors

Systematic errors:
• biases in background
• biases in observations
• biases in model

All significant sources of uncertainty should be accounted for in data assimilation

Example 1 – repeated observations of air temperature

[Figure: two panels of repeated T observations, y, scattered about the truth: one for an unbiased thermometer and one for a biased thermometer.]

Example 2 – representivity errors due to model grid
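Example 1 can be mimicked with simulated data. The sketch below assumes Gaussian random errors and a constant bias; the function name `simulate_thermometer` and all values are hypothetical. It shows that averaging many observations reduces the random error but leaves the systematic error untouched.

```python
import random
import statistics

def simulate_thermometer(truth, sigma, bias, n_obs, seed=0):
    """Simulate n_obs readings: truth + constant bias + Gaussian random error."""
    rng = random.Random(seed)
    return [truth + bias + rng.gauss(0.0, sigma) for _ in range(n_obs)]

# Hypothetical values: true temperature 15, random error std dev 1, bias 2.
TRUTH = 15.0
unbiased = simulate_thermometer(TRUTH, sigma=1.0, bias=0.0, n_obs=10000)
biased = simulate_thermometer(TRUTH, sigma=1.0, bias=2.0, n_obs=10000)

print("mean error, unbiased thermometer:", statistics.mean(unbiased) - TRUTH)
print("mean error, biased thermometer:  ", statistics.mean(biased) - TRUTH)
```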


Leading methods of solving the DA problem

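$$J(\mathbf{x}) = \frac{1}{2} (\mathbf{x} - \mathbf{x}^{\rm f})^{\rm T} (\mathbf{P}^{\rm f})^{-1} (\mathbf{x} - \mathbf{x}^{\rm f}) + \frac{1}{2} (\mathbf{y} - \mathbf{h}(\mathbf{x}))^{\rm T} \mathbf{R}^{-1} (\mathbf{y} - \mathbf{h}(\mathbf{x}))$$

Variational-type approach: find the xa that minimizes J(x).

Kalman filter-type approach (linear obs operator, Ht xt = ht(xt), and linear forecast model Mt):

$$\mathbf{x}^{\rm a}_t = \mathbf{x}^{\rm f}_t + \mathbf{P}^{\rm f}_t \mathbf{H}_t^{\rm T} (\mathbf{R}_t + \mathbf{H}_t \mathbf{P}^{\rm f}_t \mathbf{H}_t^{\rm T})^{-1} (\mathbf{y}_t - \mathbf{H}_t \mathbf{x}^{\rm f}_t) \qquad \leftarrow \text{analysis update at time } t$$

$$\mathbf{P}^{\rm a}_t = [\mathbf{I} - \mathbf{P}^{\rm f}_t \mathbf{H}_t^{\rm T} (\mathbf{R}_t + \mathbf{H}_t \mathbf{P}^{\rm f}_t \mathbf{H}_t^{\rm T})^{-1} \mathbf{H}_t]\, \mathbf{P}^{\rm f}_t \qquad \leftarrow \text{analysis error covariance}$$

$$\mathbf{x}^{\rm f}_{t+1} = \mathbf{M}_t \mathbf{x}^{\rm a}_t \qquad \leftarrow \text{forecast}$$

$$\mathbf{P}^{\rm f}_{t+1} = \mathbf{M}_t \mathbf{P}^{\rm a}_t \mathbf{M}_t^{\rm T} + \mathbf{Q}_t \qquad \leftarrow \text{forecast error covariance}$$

Here Mt is the linear forecast model and Qt is the model error covariance matrix.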
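The four Kalman filter equations translate almost line by line into NumPy. This is a minimal sketch for small dense matrices, with hypothetical function names `kf_analysis` and `kf_forecast`; it is not how an operational system would be coded.

```python
import numpy as np

def kf_analysis(x_f, P_f, y, H, R):
    """Analysis step: return (x_a, P_a) given forecast (x_f, P_f) and obs y."""
    S = R + H @ P_f @ H.T                    # innovation covariance (p x p)
    K = P_f @ H.T @ np.linalg.inv(S)         # Kalman gain (n x p)
    x_a = x_f + K @ (y - H @ x_f)            # analysis update
    P_a = (np.eye(x_f.size) - K @ H) @ P_f   # analysis error covariance
    return x_a, P_a

def kf_forecast(x_a, P_a, M, Q):
    """Forecast step: propagate the analysis with the linear model M."""
    x_f = M @ x_a
    P_f = M @ P_a @ M.T + Q
    return x_f, P_f

# Hypothetical scalar example (n = p = 1), written with 1 x 1 matrices.
x_f = np.array([10.0])
P_f = np.array([[1.0]])
y = np.array([20.0])
H = np.array([[1.0]])
R = np.array([[1.0]])

x_a, P_a = kf_analysis(x_f, P_f, y, H, R)
print(x_a, P_a)
print(kf_forecast(x_a, P_a, M=np.array([[2.0]]), Q=np.array([[0.1]])))
```

In the scalar case with H = 1 the analysis step reduces to the inverse-variance weighting of the prototype problem, with xf and y playing the roles of x1 and x2.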


Leading methods of solving the DA problem

Ensemble Kalman filter-type approach

Have N ensemble members (index i, 1 ≤ i ≤ N). Differences between them represent uncertainty.

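Approximate the forecast error covariance matrix with an ensemble to make the Kalman update equation manageable when N << n:

$$\mathbf{P}^{\rm f}_t \approx \frac{1}{N-1} \sum_{i=1}^{N} (\mathbf{x}^{\rm f}_{i,t} - \bar{\mathbf{x}}^{\rm f}_t)(\mathbf{x}^{\rm f}_{i,t} - \bar{\mathbf{x}}^{\rm f}_t)^{\rm T} \qquad (n \times n)$$

$$\mathbf{P}^{\rm f}_t \mathbf{H}_t^{\rm T} \approx \frac{1}{N-1} \sum_{i=1}^{N} (\mathbf{x}^{\rm f}_{i,t} - \bar{\mathbf{x}}^{\rm f}_t)\,[\mathbf{H}_t (\mathbf{x}^{\rm f}_{i,t} - \bar{\mathbf{x}}^{\rm f}_t)]^{\rm T} \qquad (n \times p)$$

$$\mathbf{H}_t \mathbf{P}^{\rm f}_t \mathbf{H}_t^{\rm T} \approx \frac{1}{N-1} \sum_{i=1}^{N} \mathbf{H}_t (\mathbf{x}^{\rm f}_{i,t} - \bar{\mathbf{x}}^{\rm f}_t)\,[\mathbf{H}_t (\mathbf{x}^{\rm f}_{i,t} - \bar{\mathbf{x}}^{\rm f}_t)]^{\rm T} \qquad (p \times p)$$

where the ensemble mean is

$$\bar{\mathbf{x}}^{\rm f}_t = \frac{1}{N} \sum_{i=1}^{N} \mathbf{x}^{\rm f}_{i,t}$$

Each member is updated with the Kalman update equation (y with index i is the observation vector used for member i):

$$\mathbf{x}^{\rm a}_{i,t} = \mathbf{x}^{\rm f}_{i,t} + \mathbf{P}^{\rm f}_t \mathbf{H}_t^{\rm T} (\mathbf{R}_t + \mathbf{H}_t \mathbf{P}^{\rm f}_t \mathbf{H}_t^{\rm T})^{-1} (\mathbf{y}_{i,t} - \mathbf{H}_t \mathbf{x}^{\rm f}_{i,t})$$

Let

$$\mathbf{d}_{i,t} = \mathbf{y}_{i,t} - \mathbf{H}_t \mathbf{x}^{\rm f}_{i,t}$$

Let

$$\delta\mathbf{y}_{i,t} = \mathbf{H}_t (\mathbf{x}^{\rm f}_{i,t} - \bar{\mathbf{x}}^{\rm f}_t)$$

Let

$$\mathbf{S}_t = \mathbf{H}_t \mathbf{P}^{\rm f}_t \mathbf{H}_t^{\rm T} + \mathbf{R}_t \approx \frac{1}{N-1} \sum_{i=1}^{N} \delta\mathbf{y}_{i,t}\, \delta\mathbf{y}_{i,t}^{\rm T} + \mathbf{R}_t$$

Then

$$\mathbf{x}^{\rm a}_{i,t} = \mathbf{x}^{\rm f}_{i,t} + \frac{1}{N-1} \sum_{j=1}^{N} (\mathbf{x}^{\rm f}_{j,t} - \bar{\mathbf{x}}^{\rm f}_t)\, \delta\mathbf{y}_{j,t}^{\rm T}\, \mathbf{S}_t^{-1} \mathbf{d}_{i,t}$$

The analysis increment is a superposition of ensemble members.

But beware ...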
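One common flavour of the ensemble update, the stochastic ("perturbed observations") ensemble Kalman filter, can be sketched as follows. The choice of that flavour, the function name `enkf_analysis` and the numbers are assumptions made for illustration. No inflation or localization is applied, so this sketch is only sensible when N is not small compared with n.

```python
import numpy as np

def enkf_analysis(X_f, y, H, R, rng):
    """Stochastic EnKF analysis step.

    X_f : (n, N) forecast ensemble, one member per column.
    y   : (p,) observation vector.
    H   : (p, n) linear observation operator.
    R   : (p, p) observation error covariance matrix.
    Returns the (n, N) analysis ensemble.
    """
    n, N = X_f.shape
    dX = X_f - X_f.mean(axis=1, keepdims=True)   # perturbations about the mean
    dY = H @ dX                                   # perturbations in obs space
    PHt = dX @ dY.T / (N - 1)                     # approximates P_f H^T (n x p)
    S = dY @ dY.T / (N - 1) + R                   # approximates H P_f H^T + R
    # Each member sees the observations plus a random draw from N(0, R).
    noise = rng.multivariate_normal(np.zeros(y.size), R, size=N).T
    D = (y[:, None] + noise) - H @ X_f            # innovations d_i, one per column
    return X_f + PHt @ np.linalg.solve(S, D)

# Hypothetical scalar example: prior N(0, 1), one observation y = 2 with R = 1.
rng = np.random.default_rng(0)
X_f = rng.normal(0.0, 1.0, size=(1, 2000))
X_a = enkf_analysis(X_f, np.array([2.0]), np.array([[1.0]]), np.array([[1.0]]), rng)
print("analysis mean:", X_a.mean(), " analysis variance:", X_a.var(ddof=1))
```

Note that Pf itself is never formed: only the n × p and p × p products are needed, which is what makes the update affordable for large n.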


Leading methods of solving the DA problem

Method | Description | Pros | Cons
A. Data insertion | Set grid points to observation values | 1. Easy to do | 1. No respect of uncertainty; 2. What about observation voids?; 3. Can’t deal with indirect observations
B. Variational data assimilation | Minimize a cost function. Many flavours: 3D, 4D, weak/strong constraint | 1. Respect of data uncertainty; 2. Direct and indirect observations; 3. Pf gives smooth and balanced fields; 4. Efficient; 5. Can deal with (weakly) non-linear h | 1. Pf is difficult to know, often static and suboptimal; 2. High development costs; 3. h: need tangent linear, H, and adjoint, HT; 4. Gaussian pdf
C. Kalman filtering | Evaluate KF equations | 1. As B.1, B.2, B.3; 2. Pf adapts with the state | 1. As B.3, B.4; 2. Difficult to use with non-linear h; 3. Prohibitively expensive for large n
D. Ensemble Kalman filtering | Approximate KF equations with ensemble of N model runs. Many flavours | 1. As B.1, B.2, B.4, B.5, C.2; 2. h: do not need H and HT; 3. Have measure of analysis spread | 1. As B.4; 2. Serious sampling issues when N << n; 3. Need ensemble inflation and localization schemes to overcome D.2
E. Hybrid | Cross between B and D | 1. As B.1, B.2, B.3, B.4, B.5, C.2 | 1. As D.2
F. Particle filter | Assign weights to ensemble members to represent any pdf | 1. As B.1, B.2; 2. Can deal with non-linear h; 3. Can deal with non-Gaussian pdf; 4. Have measure of analysis spread | 1. As D.2; 2. Inefficient – members often become redundant; 3. Need special techniques to overcome F.2

(A cross-reference such as “As B.1” points to item 1 of method B in the same column.)
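The weighting step of a particle filter (method F), and the redundancy problem F.2, can be illustrated with a minimal sketch. It assumes a scalar state and a Gaussian observation error, which are simplifications. The function names and numbers are hypothetical, and the effective sample size 1/Σw² is a commonly used diagnostic that does not appear on the slides.

```python
import math

def particle_weights(particles, y, sigma_o, h=lambda x: x):
    """Normalized weights w_i proportional to exp(-(y - h(x_i))**2 / (2 sigma_o**2))."""
    log_w = [-0.5 * ((y - h(x)) / sigma_o) ** 2 for x in particles]
    m = max(log_w)                            # subtract the max for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]

def effective_sample_size(weights):
    """1 / sum(w_i**2): equals N for equal weights, approaches 1 when one member dominates."""
    return 1.0 / sum(w * w for w in weights)

# Hypothetical ensemble of four scalar states and one direct observation.
particles = [0.0, 1.0, 2.0, 3.0]
weights = particle_weights(particles, y=2.1, sigma_o=0.5)
print(weights)
print(effective_sample_size(weights))
```

When the observation is accurate compared with the ensemble spread, nearly all the weight falls on one or two members, which is the inefficiency listed under F.2.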


Mathematics required

• Vector representation of fields
• Matrix algebra
• Linear vector spaces
• Matrix inversion
• Vector derivative
• Generalized chain rule
• Jacobians
• Eigenvectors/eigenvalues
• Singular vectors/values
• Variances, covariances, correlations
• Matrix rank
• Lagrange multipliers

www.met.reading.ac.uk/~ross/MTMD02/MathTools.pdf


Summary of basic principles

• DA is concerned with estimating the state of a system given:
  • observations (direct [e.g. in-situ] and indirect [e.g. remotely sensed]),
  • forecast models (to provide a-priori data, given too few obs),
  • observation operators (to connect model state with obs).

• All data have uncertainties, which must be quantified.
  • DA estimates are sensitive to uncertainty characteristics, which are often poorly known.
  • Many observations and models have systematic as well as random errors.
  • Should take into account all sources of error in the system.

• DA theory is suited mostly to errors that are Gaussian distributed.
  • Most errors are non-Gaussian, and non-linearity is synonymous with non-Gaussianity.

• DA problems are computationally expensive and require intensive development effort.


Some subtleties and caveats of DA

• DA estimates are not the ‘truth’ and can be problematic for some kinds of analyses:
  • A good fit to observations does not guarantee that the analysis is correct!
    • E.g. if the h-operator has inadequacies not accounted for, or if error covariance matrices are poor.
  • Unobserved parts of the system may be poor.
    • E.g. in meteorology, horizontal winds may be constrained well by obs, but the implied vertical wind may be poor.

• Assimilated fields may be subject to other constraints:
  • E.g. certain balance constraints.

• Be careful with error covariance matrices:
  • Pf and R need to be tuned for variational DA; Pf is subject to sampling problems for ensemble DA.

• DA systems should be well tested before using real data:
  • Test h-operators (forecast models and obs. operators): which parts of x is ymo sensitive to?
  • Adjoint tests of H and HT if using variational data assimilation.
  • Test the DA system with simulated obs. from a made-up truth (identical twin experiments).
  • For assimilation of real data, validate the analysis against independent obs. if possible.
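The adjoint test mentioned above checks that the code for HT really is the transpose of the code for H, using the identity (Hx)·y = x·(HTy) for arbitrary vectors x and y. The following is a minimal sketch with random test vectors; the function name `adjoint_test`, the tolerance and the example operator are hypothetical.

```python
import numpy as np

def adjoint_test(apply_H, apply_HT, n, p, rng, rtol=1e-10):
    """Return True if <H x, y> equals <x, H^T y> for random x (n,) and y (p,)."""
    x = rng.standard_normal(n)
    y = rng.standard_normal(p)
    lhs = np.dot(apply_H(x), y)     # inner product in observation space
    rhs = np.dot(x, apply_HT(y))    # inner product in state space
    return bool(abs(lhs - rhs) <= rtol * max(abs(lhs), abs(rhs), 1.0))

# Hypothetical linear operator with n = 5 and p = 3.
rng = np.random.default_rng(1)
H = rng.standard_normal((3, 5))

print(adjoint_test(lambda x: H @ x, lambda y: H.T @ y, 5, 3, rng))        # correct adjoint
print(adjoint_test(lambda x: H @ x, lambda y: 2.0 * H.T @ y, 5, 3, rng))  # deliberately wrong adjoint
```

Passing the operators as callables rather than matrices reflects how the tangent linear and adjoint are usually available in practice: as code, not as stored matrices.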
