
Page 1:

Validation of uncertain predictions against uncertain observations

Scott Ferson, scott@ramas.com, 16 October 2007, Stony Brook University, MAR 550, Challenger 165

Page 2:

V & V

• Verification (checking the math)
  – Code testing
  – Interval analysis, probability bounds analysis
  – Units/dimension checking

• Validation (checking against data)

Page 3:

Goals

• Objectively measure the conformance of predictions with empirical data

• Use this measure to characterize the reliability of other predictions

Page 4:

Initial setting

• The model is fixed, at least for the time being
  – No changing it on the fly during validation

• A prediction is a probability distribution
  – Expressing stochastic uncertainty

• Observations are precise (scalar) numbers
  – Measurement uncertainty is negligible

(These assumptions are relaxed later)

Page 5:

Validation metric

• A measure of the mismatch between the observed data and the model’s predictions
  – Low value means a good match
  – High value means they disagree

• Distance between prediction and data

Page 6:

Desirable properties of a metric

• Should be expressed in physical units

• Should generalize deterministic comparisons

• Should reflect performance of full distribution

• Shouldn’t be too sensitive to long tails

• Should be a true mathematical metric

• Should be unbounded (you can be really off)

Page 7:

How the data come

[Figure: observed data over time; x-axis Time [seconds], 600 to 1000; y-axis Temperature [degrees Celsius], 200 to 400]

Page 8:

How we look at them

[Figure: the data viewed as an empirical distribution; x-axis Temperature, 200 to 450; y-axis Probability, 0 to 1]

Page 9:

One suggestion for a metric

[Figure: the empirical distribution Sn and the predicted distribution plotted together; x-axis Temperature, 200 to 450; y-axis Probability, 0 to 1]

Area or average horizontal distance between the empirical distribution Sn and the predicted distribution
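A minimal computational sketch of this area metric, assuming a hypothetical normal prediction and made-up temperature observations (the distribution, data, and grid below are illustrative choices, not from the slides):

import numpy as np
from scipy.stats import norm

def area_metric(data, cdf, grid):
    """Area between the empirical distribution Sn of the data and a predicted CDF."""
    data = np.sort(np.asarray(data, dtype=float))
    s_n = np.searchsorted(data, grid, side="right") / data.size  # empirical CDF Sn on the grid
    gap = np.abs(cdf(grid) - s_n)
    return np.sum(0.5 * (gap[:-1] + gap[1:]) * np.diff(grid))    # trapezoid rule

prediction = norm(300, 25).cdf                        # hypothetical predicted distribution
observations = [285.0, 310.0, 297.0, 322.0, 301.0]    # made-up data
grid = np.linspace(150, 450, 3001)
print(area_metric(observations, prediction, grid))    # mismatch in the data's units (deg C)

Because the gap is integrated over the measurement axis, the result carries the physical units of the data, as the desirable-properties slide asks.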

Page 10:

Reflects full distribution

[Figure: three panels comparing predictions with data; panel captions "Matches in mean", "Both mean and variance", "Matches well overall"; x-axes roughly 0 to 20, Probability 0 to 1]

Page 11:

Single observation

[Figure: a predicted distribution and a single observation; x-axis 0 to 4, Probability 0 to 1]

Page 12:

Single observation

• A single datum can’t match an entire distribution (unless it’s degenerate)

• Single datum matches best if it’s at the median

• If the prediction is a uniform distribution over [a,b], a single observation can’t be any ‘closer’ to it than (b − a)/4
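A short check of that claim: for a uniform prediction on [a, b], with CDF F(t) = (t − a)/(b − a), and a single observation x in [a, b], the area between F and the observation’s step function is

\[
d(x) \;=\; \int_a^x F(t)\,dt \;+\; \int_x^b \bigl(1 - F(t)\bigr)\,dt
      \;=\; \frac{(x-a)^2 + (b-x)^2}{2\,(b-a)},
\]

which is smallest at the median x = (a + b)/2, where it equals (b − a)/4.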

Page 13:

When the prediction is really bad

• The metric degenerates to simple distance

• Probability is dimensionless, so units are the same

[Figure: a predicted distribution near the low end of the axis and a far-away observation; x-axis 0 to 28, Probability 0 to 1; d ≈ 24]

Page 14:

Depends on the local scale

• The metric depends on the units

• Can standardize (divide by s.d.), but this means the metric will no longer be in physical units

[Figure: the same comparison shown at two scales; left panel axis 0 to 4 with d ≈ 0.45, right panel axis 0 to 400 with d ≈ 45]

Page 15:

Why physical units?

• Distributions in the left graph don’t overlap but they seem closer than those on the right

[Figure: two pairs of non-overlapping distributions on 0 to 4 axes, Probability 0 to 1; the left pair is closer together than the right]

Page 16:

Why an unbounded metric?

• Neither overlaps, but left is better fit than right

• Smirnov’s metric Dmax considers these two cases indistinguishable (they’re both just ‘far’)
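For reference, Smirnov’s statistic is the largest vertical gap between the two distribution functions and is capped at 1, so it cannot say how much farther ‘far’ is:

\[
D_{\max} \;=\; \sup_x \bigl|\,S_n(x) - F(x)\,\bigr| \;\le\; 1 .
\]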

[Figure: two non-overlapping comparisons; left panel axis 0 to 4, right panel axis 0 to 40, Probability 0 to 1]

Page 17:

The model says different things

[Figure: observed data over time; x-axis Time [seconds], 600 to 1000; y-axis Temperature [degrees Celsius], 200 to 400]

Page 18:

[Figure: the data and predictions viewed as distributions; x-axis Temperature, 200 to 450; y-axis Probability, 0 to 1]

Page 19:

Pooling data comparisons

• When data are to be compared against a single distribution, they’re pooled into Sn

• When data are compared against different distributions, this isn’t possible

• Conformance must be expressed on some universal scale

Page 20:

Universal scale

ui = Fi(xi), where the xi are the data and the Fi are their respective predictions

[Figure: three observations, each plotted against its own predicted distribution on its own scale, mapped to values u1, u2, u3 on the probability axis]
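A small sketch of this transformation, with hypothetical predicted distributions and observations standing in for the three panels above (the particular distributions and values are illustrative only):

import numpy as np
from scipy.stats import lognorm, norm, uniform

# each observation x_i has its own predicted distribution F_i (all hypothetical)
predictions = [lognorm(s=1.0, scale=100.0), uniform(loc=0.0, scale=4.0), norm(50.0, 20.0)]
observations = [250.0, 1.7, 65.0]

# universal scale: u_i = F_i(x_i), each lying in [0, 1]
u = np.array([F.cdf(x) for F, x in zip(predictions, observations)])
print(u)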

Page 21:

Backtransforming to physical scale

[Figure: the common distribution G; u values on the probability axis map back through G to the physical axis, roughly 0 to 5]

Page 22:

Backtransforming to physical scale

• The distribution of G⁻¹(Fi(xi)) represents the empirical data (like Sn does) but in a common, transformed scale

• Could pick any of many scales, and each leads to a different value for the metric

• The distribution of interest is the one used for the regulatory statement
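Restating the two steps together (this combines the backtransformation with the area metric from earlier; the notation is mine):

\[
\tilde{x}_i = G^{-1}\!\bigl(F_i(x_i)\bigr),
\qquad
d = \int \bigl|\,G(t) - \tilde{S}_n(t)\,\bigr|\,dt ,
\]

where \(\tilde{S}_n\) is the empirical distribution of the backtransformed values and G is the distribution chosen as the common scale.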

Page 23:

Number of function evaluations

• Some models are difficult to evaluate

• Extracting distributional predictions may be expensive in terms of function evaluations
  – Blame the modeler rather than the validator!

• Can our validation metric be applied when only very coarse predictions based on few function evaluations are available?

Page 24:

Coarse prediction

[Figure: a coarse, step-like predicted distribution; x-axis 0 to 4, Probability 0 to 1]

Prediction can be expressed as an ‘empirical’ distribution too

Page 25:

Statistical test for model accuracy

• Kolmogorov-Smirnov test of distribution of ui’s against uniform over [0,1]

• This tests whether the empirical data are as though they were drawn from the respective prediction distributions

Probability integral transform theorem (Angus 1994) says the u’s will be distributed as uniform(0,1) if xi ~ Fi

• Assumes the empirical data are independent of each other
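A minimal sketch of that test, reusing the u-transform from before; the predictions and observations are again hypothetical stand-ins:

from scipy.stats import kstest, norm, uniform

predictions = [norm(2.0, 0.5), uniform(loc=0.0, scale=5.0), norm(10.0, 3.0), norm(-1.0, 1.0)]
observations = [1.7, 4.2, 8.9, -0.3]

u = [F.cdf(x) for F, x in zip(predictions, observations)]

# under the hypothesis x_i ~ F_i, the u's are uniform on [0, 1] (probability integral transform)
statistic, p_value = kstest(u, "uniform")
print(statistic, p_value)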

Page 26:

Generalizations

• Nonrandom sampling of observations

• Measurement uncertainty of empirical observations

• Imprecise predictions (intervals or p-boxes)

Page 27:

Epistemic uncertainty in predictions

• In the left panel, the datum evidences no discrepancy at all
• In the middle panel, the discrepancy is relative to the edge of the prediction
• In the right panel, the discrepancy is even smaller

[Figure: three panels, each with a p-box prediction and a single observation; x-axes 0 to 20, Probability 0 to 1]

Slide annotations (commands and results) for the three panels:

a = N([5,11],1)
show a
b = 8.1
show b in blue                                       (left panel: d = 0)
b = 15
breadth(env(rightside(a), b))      4.023263478773    (middle panel: d ≈ 4)
b = 11
breadth(env(rightside(a), b)) / 2  0.4087173895951   (right panel: d ≈ 0.4)
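A rough numerical check of the middle panel, assuming (my assumption, for illustration) that the p-box N([5,11],1) is bounded by the CDFs of normal distributions with means 5 and 11 and standard deviation 1; since the datum b = 15 lies beyond the whole p-box, its mismatch is the area between its step function and the nearest (right) bounding CDF:

import numpy as np
from scipy.stats import norm

def area_to_datum(cdf, b, lo, hi, n=20001):
    # area between a CDF and the step function of a single datum b
    x = np.linspace(lo, hi, n)
    gap = np.abs(cdf(x) - (x >= b))
    return np.sum(0.5 * (gap[:-1] + gap[1:]) * np.diff(x))  # trapezoid rule

right_edge = norm(11.0, 1.0).cdf                    # assumed right bounding CDF of the p-box
print(area_to_datum(right_edge, 15.0, -5.0, 35.0))  # about 4, in line with the slide's ~4.02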

Page 28:

Epistemic uncertainty in both

[Figure: three panels, each with an imprecise prediction and imprecise observations; x-axes 0 to 10, Probability 0 to 1]

Slide annotations (commands and results) for the three panels:

z = 0.0001; zz = 9.999
show z, zz
a = N([6,7],1) - 1
show a

b = -1 + mix(1,[5,7], 1,[6.5,8], 1,[7.6,9.99], 1,[3.3,6], 1,[4,8], 1,[4.5,8], 1,[5,7], 1,[7.5,9], 1,[4,8], 1,[5,9], 1,[6,9.99])
show b in blue

b = -0.2 + mix(1,[9,9.6], 1,[5.3,6.2], 1,[5.6,6], 1,[7.8,8.4], 1,[5.9,7.8], 1,[8.3,8.7], 1,[5,7], 1,[7.5,8], 1,[7.6,9.99], 1,[3.3,6], 1,[4,8], 1,[4.5,8], 1,[5,7], 1,[8.5,9], 1,[7,8], 1,[7,9], 1,[8,9.99])
breadth(env(rightside(a), b))      2.137345705795

c = -4
b = -0.2 + mix(1,[9,9.6], 1,[5.3,6.2]+c, 1,[5.6,6]+c, 1,[7.8,8.4], 1,[5.9,7.8], 1,[8.3,8.7], 1,[5,7], 1,[7.5,8], 1,[7.6,9.99], 1,[3.3,6], 1,[4,8], 1,[4.5,8]+c, 1,[5,7]+c, 1,[8.5,9], 1,[7,8], 1,[7,9], 1,[8,9.99])
breadth(env(rightside(a), b)) / 2  1.329372857714

Per-panel distances (left, middle, right): d = 0, d ≈ 0.05, d ≈ 0.07

Predictions in white; observations in blue

Page 29:

Validation: summary

• Both assessment and reliability of extrapolation
  – How good is the model?
  – Should we trust its pronouncements?

• Need metric to be both ad hoc and universal

• Updating is a separate activity

• Epistemic uncertainty introduces some wrinkles
  – Full credit for being modest about predictions

Page 30:

Complexities we can handle

• Variability in the experimental data
• Prediction is a probability distribution rather than a point
• Large measurement uncertainty in the data
• Multiple predictions (dimensions) to be assessed
• Available data aren’t directly relevant to the predictions
• Validation data collected under other conditions
• Model’s predictions are extremely expensive to compute

Page 31:

End

Page 32:

Page 33:

Definition of a true metric

• Positive, d(x, y) ≥ 0

• Symmetric, d(x, y) = d(y, x)

• Identicals indistinguishable, d(x, y) = 0 if and only if x = y

• Triangle inequality, d(x, y) + d(y, z) ≥ d(x, z)

• Quasi-, semi-, pseudo-, ultra-metric

Page 34:

Other metrics

• Area is only one of many possible metrics

• Area favors central tendency (median)

• Could also use the medial distance from a datum to the distribution, or maybe the 95th percentile of distances

• Might prefer conformance in the tails, or one tail in particular

Page 35:

Degrees of impossibility

• If a datum is completely outside the range of the prediction, it’s ‘impossible’

• Transforming to the u scale makes it 0 or 1

• We’d like to preserve how far outside it is

Page 36:

[Figure: a distribution function F (x roughly 0 to 20, Probability 0 to 1) and its extension F*, whose probability axis runs from -1 to 2 over x from 0 to 40]

Extended distribution functions:

F*(x) =
  F<(x)  for x below the range of F (extension values less than 0)
  F(x)   for x within the range of F (0 ≤ F(x) ≤ 1)
  F>(x)  for x above the range of F (extension values greater than 1)

Extension slopes can be set by the distribution’s dispersion, to mimic tails, or simply as relocated 45° lines

Page 37:

Using extensions in the metric

• Extended functions Fi* can be used to get u’s (now no longer ranging only on [0,1])

• The common backtransformation scale can also be extended to G* to accept these u’s

• This allows values considered impossible by the prediction to be represented

Page 38:

Vector of outputs

• Usually want to treat dimensions separately

• Possible to unify (pool) prediction-observation pairs even if they’re from different dimensions
  – Degrees, seconds, pascals, meters, etc.

• But there’s no G for backcalculation and so there can’t be a physically meaningful scale

Page 39:

Comparing accuracies

• Questions like “Is the match for temperature as good as the match for conductivity?” also require a universal scale to which all physical dimensions must be transformed

• If we do this, the metric becomes a norm

Page 40:

Uncertainty about a distribution

[Figure: 95% confidence bounds on the normal distribution; x-axis roughly 300 to 500, y-axis Cumulative probability, 0.0 to 1.0]