NCAR Summer Colloquium 2003. Dee: Practice of Quality Control

Practice of Quality Control

Dick Dee

Global Modeling and Assimilation Office

NASA Goddard Space Flight Center

NCAR Summer Colloquium 2003


Outline

• Motivation
• QC procedures
• The background check
• The buddy check
• An adaptive buddy check algorithm
• The Bayesian framework
• Variational quality control
• Summary


QC Example 1: Rotated earth scenario


QC Example 2: Strange sat winds


QC Example 3: French Christmas Storm No. 2


Quality Control Procedures

• At the instrument site:
– E.g. radiation correction for rawinsonde temperatures

• During the retrieval process:
– E.g. cloud-track wind height assignment

• As part of preprocessing at the DAS site:
– E.g. aircraft wind checks
– E.g. hydrostatic checks for rawinsonde temperatures

• During the assimilation: Statistical quality control

• Background reading:

Some of the early papers in numerical weather map analysis: Bergthórsson and Döös 1955; Bedient and Cressman 1957

More recent papers with a good general discussion of QC: Lorenc and Hammon 1988; Collins and Gandin 1990


Statistical Quality Control

• Since this takes place late in the data assimilation process, a lot of information is at hand:

– Observations from various instruments
– A short-term forecast valid at the time of the observations
– Some information about expected errors

• Basic idea: check if each observed value is reasonable in view of all other available information

• Danger: rejecting good data / including bad data

This is clearly a problem in probability theory.


Background check

• Bergthórsson and Döös 1955; Bedient and Cressman 1957

• Compare each observation against its prediction based on first-guess fields (e.g. interpolated background)

• Flag or reject the observation if the difference is large (but what is large?)

Example: rawinsonde observed-minus-forecast temperature residuals


The background check as a hypothesis test

Definitions: observations

background

data residuals

In terms of errors:

Assumptions: errors for ‘good’ data

background errors

Therefore in the absence of gross errors.

For each single residual, the null hypothesis is

Reject the hypothesis if for some fixed tolerance

Probability of false rejection:
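A minimal sketch of the background check for a single scalar residual. It assumes independent, zero-mean Gaussian observation and background errors, so that the residual variance is the sum of the two error variances. The function names, the default tolerance of 3 standard deviations, and the scalar setting are illustrative assumptions, not taken from the slides.

```python
import math


def background_check(obs, background, sigma_o, sigma_b, tolerance=3.0):
    """Test one observation against its background prediction.

    Returns (residual, rejected). Under the null hypothesis (no gross
    error) the residual is assumed Gaussian with zero mean and variance
    sigma_o**2 + sigma_b**2.
    """
    residual = obs - background
    sigma_d = math.sqrt(sigma_o ** 2 + sigma_b ** 2)
    rejected = abs(residual) > tolerance * sigma_d
    return residual, rejected


def false_rejection_probability(tolerance):
    """P(|d| > tolerance * sigma_d) when the null hypothesis holds."""
    return math.erfc(tolerance / math.sqrt(2.0))
```

With both error standard deviations set to 1 K, a 2 K residual passes and a 6 K residual is rejected at a tolerance of 3. That tolerance falsely rejects about 0.27% of good data, and a tolerance of 2 about 4.6%.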


Traditional buddy check

• Identify a suspect observation (e.g. using a background check)

• Define a set of buddies (e.g. based on distance, data type)

• Predict the suspect from the buddies (e.g. using local OI)

• Reject the suspect observation if it is too far from the predicted value (based on error statistics)

• See: Lorenc 1981


The buddy check as a hypothesis test

Null hypothesis H0:

Divide into suspects and buddies:

Given H0, the conditional pdf of the suspects given the buddies is

where

Let

Reject the null hypothesis if for some fixed tolerance

The choice of tolerance determines the significance level δ of the test, which bounds the probability of false rejection of the null hypothesis:
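A minimal sketch of the buddy check for one suspect residual, using the standard formulas for conditioning a zero-mean multivariate Gaussian. The names `buddy_check` and `solve`, the argument layout, and the default tolerance are assumptions for illustration. The covariances are taken as given, whereas an operational system would derive them from its background and observation error models.

```python
import math


def solve(a, b):
    """Solve a small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def buddy_check(d_suspect, d_buddies, var_s, cov_sb, cov_bb, tolerance=3.0):
    """Test one suspect residual against its prediction from the buddies.

    var_s  : residual variance of the suspect
    cov_sb : covariances between the suspect and each buddy
    cov_bb : covariance matrix of the buddy residuals
    Returns (predicted, conditional_variance, rejected).
    """
    weights = solve(cov_bb, cov_sb)  # S_bb^{-1} S_bs
    predicted = sum(w * d for w, d in zip(weights, d_buddies))
    cond_var = var_s - sum(w * c for w, c in zip(weights, cov_sb))
    rejected = abs(d_suspect - predicted) > tolerance * math.sqrt(cond_var)
    return predicted, cond_var, rejected
```

With unit variances and a correlation of 0.8 between the suspect and a single buddy, a suspect residual of 4.0 is accepted when the buddy residual is 3.5 but rejected when the buddy residual is 0.0. A plain background check with the same tolerance would reject the suspect in both cases.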


Illustration of the buddy check


An adaptive buddy check algorithm

Loop:

– identify suspects

– predict suspects from buddies

– prediction error covariances

– null hypothesis

– adjust the error estimates

End loop
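The loop can be illustrated with a deliberately simplified one-dimensional sketch. It is not the published algorithm: the buddy mean stands in for a local OI prediction, the adjusted error estimate is taken as the larger of the prescribed sigma and the root-mean-square residual of the buddies, and all names and defaults are assumptions made for this example.

```python
import math


def adaptive_buddy_check(positions, residuals, sigma, radius,
                         tolerance=2.0, adaptive=True, max_iter=10):
    """Return the indices of observations that remain rejected."""
    n = len(residuals)
    # Identify suspects with a plain background check.
    suspects = {i for i in range(n) if abs(residuals[i]) > tolerance * sigma}
    for _ in range(max_iter):
        accepted = set()
        for i in suspects:
            buddies = [j for j in range(n)
                       if j not in suspects
                       and abs(positions[j] - positions[i]) <= radius]
            if not buddies:
                continue  # isolated: the background check verdict stands
            # Predict the suspect from its buddies (mean as a stand-in for OI).
            predicted = sum(residuals[j] for j in buddies) / len(buddies)
            # Adjust the error estimate from the buddies' actual residuals.
            adjusted_sigma = sigma
            if adaptive:
                rms = math.sqrt(sum(residuals[j] ** 2 for j in buddies)
                                / len(buddies))
                adjusted_sigma = max(sigma, rms)
            # Test the null hypothesis for this suspect.
            if abs(residuals[i] - predicted) <= tolerance * adjusted_sigma:
                accepted.add(i)
        if not accepted:
            break  # no suspect was re-accepted in this pass
        suspects -= accepted  # re-accepted data can serve as buddies next pass
    return suspects
```

For residuals `[1.5, 1.9, 4.5, 3.2, 0.2, 6.0, -0.1]` at positions `[0, 1, 2, 3, 10, 11, 12]`, with `sigma=1.0` and `radius=1.5`, the adaptive version re-accepts the coherent large residuals at positions 2 and 3 over two passes and rejects only the outlier at position 11. With `adaptive=False` all three suspects stay rejected.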


Illustration with fixed tolerances

[Figure legend: true range (μ ± 2σ); expected range; suspect observations; predicted suspects; rejected observations; acceptable discrepancy]


Illustration with adaptive tolerances

[Figure legend: adjusted range]


Illustration with real data

[Figure panels: fixed tolerances; adaptive tolerances]


Some remarks on the adaptive buddy check

Very little dependence on prescribed error statistics in densely observed regions

… but reverts to a simple background check for isolated observations

Cheap and simple to implement, although parallel implementation takes some care

Not effective for detecting systematic gross errors (coherent batches of bad data)

Does not incorporate prior information about instrument reliability … but that can be done, following Lorenc and Hammon (1988)

The analysis is not a smooth function of the observations

Quality control and analysis are treated as separate steps in the assimilation process


The Bayesian framework (1)

We can formulate the analysis problem in terms of conditional probabilities:

See: Lorenc 1986, Cohn 1997

For example, our earlier Gaussian error models:

can also be written as


Example: Gaussian distributions

Lorenc and Hammon (1988)


The Bayesian framework (2)

The Bayesian framework is not restricted to Gaussian distributions and/or linear operators.

Actually we’d be happy with just the mode of the conditional pdf:

This represents the most likely state in view of the available information.

For Gaussian distributions,

When h(x) is linear, J(x) is quadratic and the solution is

with
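The equations on this slide did not survive extraction. For reference, the standard forms are written out here in conventional notation, which is an assumption and not necessarily the notation of the slide: x^b is the background, y the observations, B and R the background and observation error covariances, and H the linear observation operator.

```latex
% Mode of the conditional pdf
x^a = \arg\max_x \, p(x \mid y) = \arg\min_x \, J(x),
\qquad J(x) = -\ln p(x \mid y) + \mathrm{const}

% Gaussian background and observation errors
J(x) = \tfrac{1}{2} (x - x^b)^T B^{-1} (x - x^b)
     + \tfrac{1}{2} \, [y - h(x)]^T R^{-1} [y - h(x)]

% Linear observation operator, h(x) = H x
x^a = x^b + K \, (y - H x^b),
\qquad K = B H^T (H B H^T + R)^{-1}
```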


Error models that account for bad data

Generalize the observation error model to account for possible gross errors:

If G is the event that a gross error occurred, then:

and

This is no longer a Gaussian pdf, and the variational problem becomes non-linear.

See: Purser 1984, Lorenc and Hammon 1988.
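The generalized error model can be written with the law of total probability. The version here is a sketch in assumed notation: \bar{G} denotes "no gross error", and the flat density used for gross errors is a common modelling choice, not necessarily the one shown on the slide.

```latex
% Observation pdf as a mixture over the gross-error event G
p(y \mid x) = p(y \mid x, \bar{G}) \, [1 - P(G)] + p(y \mid x, G) \, P(G)

% Good data: the Gaussian error model used so far
p(y \mid x, \bar{G}) = N(h(x), R)

% Bad data (assumption): flat density k over the range of plausible values
p(y \mid x, G) = k
```

Because the mixture is a sum of two terms, its negative logarithm is no longer quadratic in x, even when h(x) is linear.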


Example: Non-Gaussian observation errors

Lorenc and Hammon (1988)


Variational Quality Control at ECMWF (1)

After modification of p(y|x) to account for gross errors we have instead

Assuming independent Gaussian errors, the contribution of a single observation is

(cost)

(gradient)

Minimize cost function

where and

It turns out that is the a posteriori probability of gross error
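A minimal sketch of the single-observation terms. It assumes the observation pdf is a mixture of a Gaussian with prior weight 1 - A and a flat density 1/(2 d σ_o) with prior weight A, and that the residual lies inside the flat range. The function name, the argument names, and the default values of A and d are illustrative assumptions, not operational settings.

```python
import math


def varqc_terms(y, hx, sigma_o, prior_gross=0.05, half_width=5.0):
    """Return (cost, gradient, p_gross) for one observation.

    cost     : -ln of the mixture pdf, shifted so that it is 0 at y == hx
    gradient : derivative of cost with respect to hx
    p_gross  : a posteriori probability that the observation is a gross error
    """
    z = (y - hx) / sigma_o
    j_normal = 0.5 * z * z       # Gaussian cost
    grad_normal = -z / sigma_o   # d(j_normal) / d(hx)
    # Ratio of the flat to the Gaussian mixture weights at z = 0.
    gamma = (prior_gross * math.sqrt(2.0 * math.pi)
             / ((1.0 - prior_gross) * 2.0 * half_width))
    p_gross = gamma / (gamma + math.exp(-j_normal))
    cost = -math.log((gamma + math.exp(-j_normal)) / (gamma + 1.0))
    gradient = grad_normal * (1.0 - p_gross)
    return cost, gradient, p_gross
```

Under these assumptions the gradient is the Gaussian gradient scaled by one minus the posterior probability of gross error. An observation close to the current state therefore keeps nearly full weight, while one at four standard deviations has a posterior gross-error probability of roughly 0.98 and is almost ignored.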


Example: Impact of an observation in VarQC

Andersson and Järvinen (1999)


Some remarks on variational QC

Strong dependence on prescribed error statistics

Implementation for observations with correlated errors is much more complicated

Not effective for detecting systematic gross errors (coherent batches of bad data)

Incorporates prior information about instrument reliability

In principle, the analysis is a smooth function of the observations … but not really (multiple minima)

Quality control and analysis are done simultaneously – each can take advantage of iterative improvement during the optimization

Requires a relatively strict background check to avoid convergence issues


Summary


Literature

• Andersson, E., and H. Järvinen, 1999: Variational quality control. Quart. J. Royal Meteor. Soc., 125, 697-722.

• Bedient, H. A., and G. P. Cressman, 1957: An experiment in automatic data processing. Mon. Wea. Rev., 85, 333-340.

• Bergthórsson, P., and B. R. Döös, 1955: Numerical weather map analysis. Tellus, 7, 329-340.

• Collins, W. G., 1998: Complex quality control of significant level rawinsonde temperatures. J. Atmos. Ocean. Tech., 15, 69-79.

• Collins, W. G., and L. S. Gandin, 1990: Comprehensive hydrostatic quality control at the National Meteorological Center. Mon. Wea. Rev., 118, 2752-2767.

• Dee, D. P., L. Rukhovets, R. Todling, A. M. da Silva, and J. W. Larson, 2001: An adaptive buddy check for observational quality control. Quart. J. Royal Meteor. Soc., 127, 2451-2471.

• Dharssi, I., A. C. Lorenc, and N. B. Ingleby, 1992: Treatment of gross errors using maximum probability theory. Quart. J. Royal Meteor. Soc., 118, 1017-1036.

• Gandin, L. S., 1988: Complex quality control of meteorological observations. Mon. Wea. Rev., 116, 1137-1156.

• Ingleby, N. B., and A. C. Lorenc, 1993: Bayesian quality control using multivariate normal distributions. Quart. J. Royal Meteor. Soc., 119, 1195-1225.

• Lorenc, A. C., 1981: A global three-dimensional multivariate statistical interpolation scheme. Mon. Wea. Rev., 109, 701-721.

• Lorenc, A. C., and O. Hammon, 1988: Objective quality control of observations using Bayesian methods: Theory, and a practical implementation. Quart. J. Royal Meteor. Soc., 114, 515-543.

• Purser, R. J., 1984: A new approach to the optimal assimilation of meteorological data by iterative Bayesian analysis. Proceedings of 10th Conf. on Weather Forecasting and Analysis, American Meteorological Society, Boston, 102-105.
