UQ: from end to end
Tony O’Hagan
Outline
Session 1: Quantifying input uncertainty
Information
Modelling
Elicitation
Coffee break: Propagation!
Session 2: Model discrepancy and calibration
All models are wrong
Impact of model discrepancy
Modelling model discrepancy
UQ: from end to end
Session 1: Quantifying input uncertainty
Context
You have a model
To simulate or predict some real-world process
I’ll call it a simulator
For a given use of the simulator you are unsure of the true or correct values of inputs
This uncertainty is a major component of UQ
Propagating it through the simulator is a fundamental step in UQ
We need to express that uncertainty in the form of probability distributions
But how?
I feel that this is a neglected area in UQ
Distributions assumed, often with no discussion of where they came from
Focus of this session
Probability distributions for inputs
Representing the analyst’s knowledge/uncertainty
What they mean
Interpretation of probability
Where they come from
Analysis of data and/or judgement
Elicitation
Principles
Single input
Multiple inputs
Multiple experts
The analyst
The distributions should represent the best knowledge of the model user about the inputs
I will refer to the model user as the analyst
They are the analyst’s responsibility
The analyst is the one who is interested in the simulator output
For a particular application
And some or all of the inputs refer specifically to that application
The analyst must own the input distributions
They should represent best knowledge
Obviously!
Anything else is unscientific
Less input uncertainty means (generally) less output uncertainty
What probability?
Before we go further, we need to understand how a probability distribution represents someone’s knowledge
The question goes right to the heart of what probability means
Example: We are interested in X = the proportion of people infected with HIV who will develop AIDS within 10 years
when treated with a new drug
X will be an input to a clinical trial simulator
To assist the pharmaceutical company in designing the drug’s development plan
Analyst Mary expresses a probability distribution for X
Mary’s distribution
The stated distribution is shown on the right
It specifies how probable any particular values of X are
E.g. It says there is a probability of almost 0.7 that X is below 0.4
And the expected value of X is 0.35
It even gives a nontrivial probability to X being less than 0.2
Which would represent a major reduction in HIV progression
How can X have probabilities?
Almost everyone learning probability is taught the frequency interpretation
The probability of something is the long run relative frequency with which it occurs in a very long sequence of repetitions
How can we have repetitions of X?
It’s a one-off: it will only ever have one value
It’s that unique value we’re interested in
Simulator inputs are almost always like this – they’re one-off!
Mary’s distribution can’t be a probability distribution in that sense
So what do her probabilities actually mean? And does she know?
Mary’s probabilities
Mary’s probability 0.7 that X < 0.4 is a judgement
She thinks it’s more likely to be below 0.4 than above
So in principle she would bet even money on it
In fact she would bet £2 to win £1 (because 0.7 > 2/3)
Her expectation of 0.35 is a kind of best estimate
Not a long run average over many repetitions
Her probabilities are an expression of her beliefs
They are personal judgements
You or I would have different probabilities
We want her judgements because she’s the expert!
We need a new definition of probability
Subjective probability
The probability of a proposition E is a measure of a person’s degree of belief in the truth of E
If they are certain that E is true then P(E) = 1
If they are certain it is false then P(E) = 0
Otherwise P(E) lies between these two extremes
Exercise 1 – How many Muslims in Britain?
Refer to the two questions on your sheet
The first asks for a probability
Make your own personal judgement
If you don’t already have a good feel for the probability scale, you may find it useful to think about betting odds
The second asks for another probability
Subjective includes frequency
The frequency and subjective definitions of probability are compatible
If the results of a very long sequence of repetitions are available, they agree
Frequency probability equates to the long run frequency
All observers who accept the sequence as comprising repetitions will give that frequency as their (personal/subjective) probability
for the next (or any future) result in the sequence
Subjective probability extends frequency probability
But also seamlessly covers propositions that are not repeatable
It’s also more controversial
It doesn’t include prejudice etc!
The word “subjective” has derogatory overtones
Subjectivity should not admit prejudice, bias, superstition, wishful thinking, sloppy thinking, manipulation ...
Subjective probabilities are judgements but they should be careful, honest, informed judgements
As “objective” as possible without ducking the issue
Using best practice
Formal elicitation methods
Bayesian analysis
Probability judgements go along with all the other judgements that a scientist necessarily makes
And should be argued for in the same careful, honest and informed way
What about data?
I’ve presented the analyst’s probability distributions as a matter of pure subjective judgement – what about data?
Many possible scenarios:
X is a parameter for which there is a published value
Analyst has one or more direct experimental evaluations for X
Analyst has data relating more or less directly to X
Analyst has some hard data but also personal expertise about X
Analyst relies on personal expertise about X
Analyst seeks input from an expert on X
…
The case of a published value
The published value may come with a completely characterised probability distribution for X representing uncertainty in the value
The analyst simply accepts this distribution as her own judgement
Or it may not
The analyst needs to consider the uncertainty in X around the published value P
X = P + E, where E is the error
Analyst formulates her own probability distribution for E
The published value P may simply come with a standard deviation
The analyst accepts this as one judgement about E
Using data – principles
The appropriate framework for using data is Bayesian statistics
Because it delivers a probability distribution for X
Classical frequentist statistics can’t do that
Even a confidence interval is not a probability statement about X
The data are related to X through a likelihood function
Derived from a statistical model
This is combined with whatever additional knowledge the analyst may have
In the form of a prior distribution
Combination is performed by Bayes’ theorem
The result is the analyst’s posterior distribution for X
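To make the mechanics concrete, here is a minimal Python sketch of that update, using an invented conjugate Beta-binomial example (the numbers are illustrative, not from the slides): the analyst's prior on a proportion X is combined with binomial data via Bayes' theorem.

```python
# A minimal sketch of a Bayesian update: X is a proportion, the prior encodes
# the analyst's judgement, and the data are k 'successes' out of n observations.
from scipy import stats

a0, b0 = 4, 8                       # analyst's prior judgement about X: Beta(4, 8)
prior = stats.beta(a0, b0)
k, n = 11, 40                       # hypothetical data: 11 of 40 patients progressed

posterior = stats.beta(a0 + k, b0 + (n - k))   # Bayes' theorem, conjugate form
print(prior.mean(), posterior.mean())          # prior and posterior best estimates
print(posterior.interval(0.95))                # 95% credible interval for X
```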
Using data – practicalities
If data are highly informative about X, prior information may not matter
Use a conventional non-informative prior distribution
Otherwise the analyst formulates her own prior distribution for X
Bayesian analysis can be complex
Analyst is likely to need the services of a Bayesian statistician
The likelihood/model is also a matter of judgement!
Although I will not delve into this today
Summary
We have identified several situations where distributions need to be formulated by personal judgement
No good data – analyst formulates distribution for X
Published data does not have complete characterisation of uncertainty – analyst formulates distribution for E
Data supplemented by additional expertise – analyst formulates prior distribution for X
Analyst may seek judgements of one or more experts
Rather than relying on her own
Particularly when the stakes are high
We have identified just one situation where personal judgements are not needed
Published data with completely characterised uncertainty
Elicitation
The process of representing the knowledge
of one or more persons (experts)
concerning an uncertain quantity
as a probability distribution for that quantity
Typically conducted as a dialogue between
the experts – who have substantive knowledge about the quantity (or quantities) of interest – and
a facilitator – who has expertise in the process of elicitation
Ideally face to face
But may also be done by video-conference, teleconference or online
Some history
The idea of formally representing uncertainty using subjective probability judgements began to be taken seriously in the 1960s
For instance, for judgement of extreme risks
Psychologists became interested
How do people make probability judgements?
What mental processes are used, and what does this tell us about the brain’s processing generally?
They found many ways that we make bad judgements
The heuristics and biases movement
And continued to look mostly at how we get it wrong
Since this told them a lot about our mental processes
Meanwhile ...
Statisticians increasingly made use of subjective probabilities
Growth of Bayesian statistics
Some formal elicitation but mostly unstructured judgements
Little awareness of the work in psychology
Reinforced recently by UQ with uncertain simulator inputs
Our interests are more complex
Not really interested in single probabilities
Whole probability distributions
Multivariate distributions
We want to know how to get it right
Psychology provides almost no help with these challenges
Heuristics and biases
Our brains evolved to make quick decisions
Heuristics are short-cut reasoning techniques
Allow us to make good judgements quickly in familiar situations
Judgement of probability is not something that we evolved to do well
The old heuristics now produce biases
Anchoring and adjustment
Availability
Overconfidence
And many others
Anchoring and adjustment
Exercise 1 was designed to exhibit this heuristic
The probabilities should on average be different in the two groups
When asked to make two related judgements, the second is affected by the first
The second is judged relative to the first
By adjustment away from the first judgement
The first is called the anchor
Adjustment is typically inadequate
Second response too close to the first (anchor)
Anchoring can be strong even when obviously not really relevant to the second question
Just putting any numbers into the discussion creates anchors
Availability
An event is judged more probable if we can quickly bring instances of it to mind
Things that are more memorable are deemed more probable
High profile train accidents in the UK lead people to imagine rail travel is more risky than it really is
My judgement of the risk of dying from a particular disease will be increased if I know (of) people who have the disease or have died from it
Important for analyst to review all the evidence
Overconfidence
It is generally said that experts are overconfident
When asked to give 95% intervals, say, far fewer than 95% contain the true value
Several possible explanations
Wish to demonstrate expertise
Anchoring to a central estimate
Difficulty of judging extreme events
Not thinking ‘outside the box’
Expertise often consists of specialist heuristics
Situations we elicit judgements on are not typical
Probably over-stated as a general phenomenon
Experts can be under-confident if afraid of consequences
A matter of personality and feeling of security
Evidence of over-confidence is not from real experts making judgements on serious questions
The keys to good elicitation
First, pay attention to the literature on psychology of elicitation
How you ask a question influences the answer
Second, ask about the right things
Things that experts are likely to assess most accurately
Third, prepare thoroughly
Provide help and training for experts
These are built into the SHELF system
Sheffield Elicitation Framework
The SHELF system
SHELF is a package of documents and simple software to aid elicitation
General advice on conducting the elicitation
Templates for recording the elicitation
Suitable for several different basic methods
Annotated versions of the templates with detailed guidance
Some R functions for fitting distributions and providing feedback
SHELF is freely available and comments and suggestions for additions are welcomed
Developed by Tony O’Hagan and Jeremy Oakley
R functions by Jeremy
http://tonyohagan.co.uk/shelf
A SHELF template
Word document
Facilitator follows a carefully constructed sequence of questions
Final step invites experts to give their own feedback
The tertile method
One of several supported in SHELF
Annotated template
For facilitator’s guidance
Advice on each field of the template
Ordinary text says what is required in each field
Text in brackets gives advice on how to work with experts
Text in italics says why we are doing it this way
Based on findings in psychology
Let’s see how it works
SHELF templates provide a carefully structured sequence of steps
Informed by psychology and practical experience
I’ll work through these, using the following illustrative example
An Expert is asked for her judgements about the distance D between the airports of Paris Charles de Gaulle and Chicago O’Hare
in miles
She has experience of flying distances but has not flown this route before
She knows that from LHR to JFK is about 3500 miles
Expert is asked for lower and upper credible bounds
Expert would be very surprised if X was found to be below the lower credible bound or above the upper credible bound
It’s not impossible to be outside the credible range, just highly unlikely
Practical interpretation might be a probability of 1% that X is below L and 1% that it’s above U
Example: Expert sets lower bound L = 3500
CDG to ORD surely more than LHR to JFK
Upper bound U = 5000
Additional flying distance for CDG to ORD surely less than 1500
Credible range L to U
The median M
The value of x for which the expert judges X to be equally likely to be above or below x
Probability 0.5 (or 50%) below and 0.5 above
Like a toss of a coin
Or chopping the range into two equally probable parts
If the expert were asked to choose either to bet on X < x or on X > x, he/she should have no preference
It’s a specific kind of ‘estimate’ of X
Need to think, not just go for mid-point of the credible range
Example: Expert chooses median M = 4000
The quartiles Q1 and Q3
The lower quartile Q1 is the p = 25% quantile
The expert judges X < Q1 to have probability 0.25
Like tossing two successive Heads with a coin
Equivalently, Q1 divides the range below the median into two equi-probable parts:
‘Less than Q1’ & ‘between Q1 and M’
Q1 should generally be closer to M than to L
Similarly, upper quartile Q3 is p = 75%
Q1, M and Q3 divide the range into four equi-probable parts
Example: Expert chooses Q1 = 3850, Q3 = 4300
Then fit a distribution
Any convenient distribution
As long as it fits the elicited summaries adequately
SHELF has software for fitting a range of standard distributions
At this point, the choice should not matter
The idea is that we have elicited enough
Any reasonable distribution choice will be similar to any other
Elicitation can never be exact
The elicited summaries are only approximate anyway
If the choice does matter, i.e. different fitted distributions give different answers to the problem for which we are doing the elicitation
We can try to remove the sensitivity by eliciting more summaries
Or involving more experts
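As an illustration of the fitting step (a sketch in Python, not the SHELF R code), one can choose a parametric family and pick its parameters so that its quantiles match the elicited summaries as closely as possible; here a normal distribution is fitted to the airport-distance judgements from the example.

```python
# Illustrative sketch only: fit a normal distribution to elicited summaries by
# least squares on the quantile scale (L, Q1, M, Q3, U from the example, in miles).
import numpy as np
from scipy import stats, optimize

probs    = np.array([0.01, 0.25, 0.50, 0.75, 0.99])
elicited = np.array([3500, 3850, 4000, 4300, 5000])

def loss(params):
    mu, log_sigma = params
    q = stats.norm.ppf(probs, loc=mu, scale=np.exp(log_sigma))
    return np.sum((q - elicited) ** 2)

res = optimize.minimize(loss, x0=[4000.0, np.log(300.0)], method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(f"fitted normal: mean {mu_hat:.0f}, sd {sigma_hat:.0f}")
print(stats.norm.ppf(probs, loc=mu_hat, scale=sigma_hat).round(0))  # feedback to expert
```

The fitted quantiles can then be reported back to the expert, who may revise her judgements if they look wrong.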
Exercise 2
So let’s do it!
We’re going to elicit your beliefs about one of the following (you can choose!)
Number of gold medals to be won by China in 2016 Olympics
Length of the Yangtze River
Population of Beijing in 2011
Proportion of the total world land area covered by China
Do we need a facilitator?
Yes, if the simulator output is sufficiently important
A skilled facilitator is essential to get the most accurate and reliable representation of the expert’s knowledge
At least for the most influential inputs
Otherwise, no
The analyst can simply quantify her own judgements
But it’s still very useful to follow the SHELF process
In effect, the analyst interrogates herself
Playing the role of facilitator as well as that of expert
Multiple inputs
Hitherto we’ve basically considered just one input X
In practice, simulators almost always have multiple inputs
Then we need to think about dependence
Two or more uncertain quantities are independent if:
When you learn something about one of them it doesn’t change your beliefs about the others
It’s a personal judgement, like everything else in elicitation!
They may be independent for one expert but not for another
Independence is nice
Independent inputs can just be elicited separately
Exercise 3
Which of the following sets of quantities would you consider independent?
1. The average weight B of black US males aged 40 and the average weight W of white US males aged 40
2. My height H and my age A
3. The time T taken by the Japanese bullet train to travel from Tokyo to Kyoto and the distance D travelled
4. The atomic numbers of Calcium (Ca), Silver (Ag) and Arsenic (As)
Eliciting dependence
If quantities are not independent we must elicit the nature and magnitude of dependence between them
Remembering that probabilities are the best summaries to elicit
Joint probabilities
Probability that X takes some x-values and Y takes some y-values
Conditional probabilities
Probability that Y takes some y-values if X takes some x-values
Much harder to think about than probabilities for a single quantity
Perhaps the simplest is the quadrant probability
Probability both X and Y are above their individual medians
Bivariate quadrant probability
First elicit medians
Now elicit quadrant probability
It can’t be negative
Or more than 0.5
Value indicates direction and strength of dependence
0.25 if X and Y are independent
Greater if positively correlated
0.5 if, whenever one is above its median, the other must be too
Less than 0.25 if negatively correlated
Zero if they can’t both be above their medians
(Figure: quadrant diagram – the two medians split each quantity’s range into halves of probability 0.5, and ? marks the probability that both quantities exceed their medians.)
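One convenient way to use an elicited quadrant probability is to translate it into a correlation. A minimal sketch, assuming a bivariate-normal (Gaussian copula) dependence: the quadrant probability p then satisfies p = 1/4 + arcsin(ρ)/(2π), so the implied correlation is ρ = sin(2π(p − 1/4)).

```python
# Convert an elicited quadrant probability to a correlation, under an assumed
# bivariate-normal (Gaussian copula) dependence structure.
import numpy as np

def implied_correlation(quadrant_prob):
    """Correlation implied by an elicited quadrant probability (Gaussian copula)."""
    if not 0.0 <= quadrant_prob <= 0.5:
        raise ValueError("quadrant probability must lie in [0, 0.5]")
    return np.sin(2 * np.pi * (quadrant_prob - 0.25))

print(implied_correlation(0.25))   # 0.0 -> independence
print(implied_correlation(0.35))   # positive dependence
print(implied_correlation(0.50))   # 1.0 -> one above its median forces the other
```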
Higher dimensions
This is already hard
Just for two uncertain quantities
In order to elicit dependence in any depth we will need to elicit several more joint or conditional probabilities
More than two variables – more complex still!
Even with just three quantities...
Three pairwise bivariate distributions
With constraints
The three-way joint distribution is not implied by those, either
We can’t even visualise or draw it!
There is no clear understanding among elicitation practitioners on how to elicit dependence
Avoiding the problem
It would be so much easier if the quantities we chose to elicit were independent
i.e. no dependence or correlation between them
Then eliciting a distribution for each quantity would be enough
We wouldn’t need to elicit multivariate summaries
The trick is to ask about the right quantities
Redefine inputs so they become independent
This is called elaboration
Or structuring
Example – two treatment effects
A clinical trial will compare a new treatment with an existing treatment
Existing treatment effect A is relatively well known
Expert has low uncertainty
But added uncertainty due to the effects of the sample population
New treatment effect B is more uncertain
Evidence mainly from small-scale Phase III trial
A and B will not be independent
Mainly because of the trial population effect
If A is at the high end of the expert’s distribution, she would expect B also to be relatively high
Can we break this dependence with elaboration?
Relative effect
In the two treatments example, note that in clinical trials attention often focuses on the relative effect R = B/A
When effect is bad, like deaths, this is called relative risk
Expert may judge R to be independent of A
Particularly if random trial effect is assumed multiplicative
If additive we might instead consider A independent of D = B – A
But this is unusual
So elicit separate distributions for R and A
The joint distribution of (A, B) is now implicit
Can be derived if needed
But often the motivating task can be rephrased in terms of (A, R)
Trial effect
Instead of simple structuring with the relative risk R, we can explicitly recognise the cause of the correlation
Let T be the trial effect due to difference between the trial patients and the wider population
Let E and N be efficacies of existing and new treatments in the wider population
Then A = E x T and B = N x T
Expert may be comfortable with independence of T, E and N
With E well known, T fairly well known and N more uncertain
We now have to elicit distributions for three quantities instead of two
But can possibly assume them independent
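A quick Monte Carlo sketch, with invented distributions purely to illustrate the structuring idea: eliciting E, T and N independently and forming A = E × T and B = N × T automatically induces the positive correlation between A and B that would otherwise have to be elicited directly.

```python
# Structuring sketch: independent elicited inputs E, T, N induce dependence
# between the derived quantities A = E*T and B = N*T via the shared trial effect T.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
E = rng.normal(0.50, 0.02, n)                      # existing efficacy: well known
T = rng.lognormal(mean=0.0, sigma=0.10, size=n)    # trial effect: fairly well known
N = rng.normal(0.55, 0.08, n)                      # new efficacy: more uncertain

A, B = E * T, N * T
print(np.corrcoef(A, B)[0, 1])   # positive correlation induced by the shared T
```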
General principles
Independence or dependence are in the head of the expert
Two quantities are dependent if learning about one of them would change his/her beliefs about the other
Explore possible structures with the expert(s)
Find out how they think about these quantities
Expertise often involves learning how to break a problem down into independent components
SHELF does not yet handle multivariate elicitation
But it does include an explicit structuring step
Which we can now see is potentially very important!
Templates for some special cases expected in the next release
Multiple experts
The case of multiple experts is important
When elicitation is used to provide expert input to a decision problem with substantial consequences, we generally want to use the skill of as many experts as possible
But they will all have different opinions
Different distributions
How do we aggregate them?
In order to get a single elicited distribution
Aggregating expert judgements
Two approaches
1. Aggregate the distributions
Elicit a distribution from each expert separately
Combine them using a suitable formula
For instance, simply average them
Called ‘mathematical aggregation’ or ‘pooling’
2. Aggregate the experts
Get the experts together and elicit a single distribution
Called ‘behavioural aggregation’
Neither is without problems
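For mathematical aggregation, the simplest pooling rule is a linear opinion pool: average the experts' elicited densities, optionally with weights. A minimal sketch with three invented expert distributions and equal weights:

```python
# Linear opinion pool: the pooled density is the (weighted) average of the
# experts' densities; the three Beta distributions below are invented examples.
import numpy as np
from scipy import stats

x = np.linspace(0, 1, 501)
expert_densities = [
    stats.beta(8, 14).pdf(x),    # expert 1's distribution for X
    stats.beta(5, 7).pdf(x),     # expert 2
    stats.beta(12, 30).pdf(x),   # expert 3
]
pooled = np.mean(expert_densities, axis=0)   # a mixture, still a proper density
print(pooled.sum() * (x[1] - x[0]))          # integrates to ~1
```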
Multiple experts in SHELF
SHELF uses behavioural aggregation
However, distributions are first elicited from experts separately
After sharing of key information
Allows facilitator to see the range of belief before aggregation
Then experts discuss their differences
With a view to assigning an aggregate distribution
To represent what an impartial, intelligent observer might reasonably believe after seeing the experts’ judgements and hearing their discussions
Facilitator can judge whether degree of compromise is appropriate to the intervening discussion
Challenges in behavioural aggregation
More psychological hazards
Group dynamic – dominant/reticent experts
Tendency to end up more confident
Block votes
Requires careful management
What to do if they can’t agree?
End up with two or more composite distributions
Need to apply mathematical pooling to these
But this is rare in practice
Conclusions – Session 1
The analyst needs to supply probability distributions for uncertain inputs
Probabilities are personal judgements
But as objective and scientific as possible
Distributions should represent her best knowledge
A range of scenarios for specifying distributions
From pure judgement
When no good data are available
To simply using a published value
With quantification of uncertainty around that value
Almost always, some part of the task will require distributions based on personal judgement
E.g. prior distributions for Bayesian analysis of data
Elicitation is the process of formulating knowledge about an uncertain quantity as a probability distribution
Many pitfalls, practical and psychological
SHELF is a set of protocols designed to avoid pitfalls
An example of best practice in elicitation
Templates to guide the facilitator or analyst through a structured sequence of steps
Particular challenges arise when eliciting judgements about multiple inputs or from multiple experts
And so to the coffee break
Once we have specified input distributions, the next task is propagation of uncertainty through the simulator
A well studied problem in UQ
Polynomial chaos favoured by engineers, mathematicians
Gaussian process emulators preferred by statisticians
In my opinion, a more powerful and comprehensive UQ approach
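For completeness, a minimal sketch of propagation by plain Monte Carlo, with an invented stand-in simulator: sample the inputs from their elicited distributions, run them through f, and summarise the output distribution. For an expensive simulator the direct runs of f would be replaced by an emulator (or a polynomial chaos surrogate).

```python
# Monte Carlo propagation of input uncertainty through a (cheap) simulator f.
import numpy as np

rng = np.random.default_rng(0)

def f(x1, x2):
    """Stand-in simulator; in practice this is the external model."""
    return np.exp(-x1) * np.sin(x2)

n = 10_000
x1 = rng.lognormal(0.0, 0.3, n)   # draws from the elicited distribution of input 1
x2 = rng.normal(1.5, 0.2, n)      # draws from the elicited distribution of input 2
y = f(x1, x2)
print(y.mean(), y.std(), np.percentile(y, [2.5, 97.5]))  # output uncertainty summary
```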
UQ: from end to end
Session 2: Model discrepancy and calibration
Case study – carbon flux
Vegetation can be a major factor in mitigating the increase of CO2 in the atmosphere
And hence reducing the greenhouse effect
Through photosynthesis, plants take atmospheric CO2
Carbon builds new plant material and O2 is released
But some CO2 is released again
Respiration, death and decay
The net reduction of CO2 is called Net Biosphere Production (NBP)
I will refer to it as the carbon flux
Complex processes modelled in SDGVM
Sheffield Dynamic Global Vegetation Model
SDGVM C flux outputs for 2000
Map of SDGVM estimates shows positive flux (C sink) in North, but negative (C source) in Midlands
Total estimated flux is 9.06 Mt C
Highly dependent on weather, so will vary greatly between years
Quantifying input uncertainties
Plant functional type parameters (growth characteristics)
Expert elicitation
Soil composition (nutrients and decomposition)
Simple analysis from extensive (published) data
Land cover (which PFTs are where)
More complex Bayesian analysis of ‘confusion matrix’ data
Elicitation
Beliefs of expert (developer of SDGVMd) regarding plausible values of PFT parameters
Four PFTs – Deciduous broadleaf (DBL), evergreen needleleaf (ENL), crops, grass
Many parameters for each PFT
Key ones identified by preliminary sensitivity analysis
Important to allow for uncertainty about mix of species in a site and role of parameter in the model
In the case of leaf life span for ENL, this was more complex
ENL leaf life span
Correlations
PFT parameter value at one site may differ from its value in another
Because of variation in species mix
Common uncertainty about average over all species induces correlation
Elicit beliefs about average over whole UK
ENL joint distributions are mixtures of 25 components, with correlation both between and within years
Soil composition
Percentages of sand, clay and silt, plus bulk density
Soil map available at high resolution
Multiple values in each SDGVM site
Used to form average (central estimate)
And to assess uncertainty (variance)
Augmented to allow for uncertainty in original data (expert judgement)
Assumed independent between sites
Land cover map
LCM2000 is another high resolution map
Obtained from satellite images
Vegetation in each pixel assigned to one of 26 classes
Aggregated to give proportions of each PFT at each site
But data are uncertain
Field data are available at a sample of pixels
Countryside Survey 2000
Table of CS2000 class versus LCM2000 class is called the confusion matrix
CS2000 versus LCM2000 matrix
Not symmetric
Rather small numbers
Bare is not a PFT and produces zero NBP

                  CS2000
LCM2000     DBL   ENL   Grass   Crop   Bare
DBL          66     3      19      4      5
ENL           8    20       1      0      0
Grass        31     5     356     22     15
Crop          7     1      41    289      9
Bare          2     0       3      8     81
Modelling land cover
The matrix tells us about the probability distribution of LCM2000 class given the true (CS2000) class
Subject to sampling errors
But we need the probability distribution of true PFT given observed PFT
Posterior probabilities as opposed to likelihoods
We need a prior distribution for land cover
We used observations in a neighbourhood
Implicitly assuming an underlying smooth random field
And the confusion matrix says nothing about spatial correlation of LCM2000 errors
We again relied on expert judgement
Using a notional equivalent number of independent pixels per site
Overall proportions
Red lines show LCM2000 proportions
Clear overall biases
Analysis gives estimates for all PFTs in each SDGVM site
With variances and correlations
Case study – results
Following on the carbon flux case study, input uncertainties were propagated through the SDGVM simulator
Extensive use of Gaussian process emulators
Mean NBP corrections
NBP standard deviations
Aggregate across 4 PFTs
Mean NBP Standard deviation
England & Wales aggregate
PFT           Plug-in estimate (Mt C)   Mean (Mt C)   Variance (Mt C²)
Grass                  5.28                 4.37            0.2453
Crop                   0.85                 0.43            0.0327
Deciduous              2.13                 1.80            0.0221
Evergreen              0.80                 0.86            0.0048
Covariances                                                -0.0081
Total                  9.06                 7.46            0.2968
Sources of uncertainty
The total variance of 0.2968 is made up as follows
Variance due to PFT and soil inputs = 0.2642
Variance due to land cover uncertainty = 0.0105
Variance due to interpolation/emulation = 0.0222
Land cover uncertainty much larger for individual PFT contributions
Dominates for ENL
But overall tends to cancel out
Changes estimates
Larger mean corrections and smaller overall uncertainty
But we haven’t addressed what is probably the biggest source of uncertainty in this carbon flux problem …
Notation
A simulator takes a number of inputs and produces a number of outputs
We can represent any output y as a function
y = f (x)
of a vector x of inputs
Example: A simple machine (SM)
A machine produces an amount of work y which depends on the amount of effort x put into it
Ideally, y = f(x, β) = βx
Parameter β is the rate at which effort can be converted to work
True value of β is β* = 0.65
Data zi = yi + εi
Graph shows observed data
Points lie below y = 0.65x
For large enough x
Because of losses due to friction etc.
Large relative to observation errors
The SM as a simulator
A simulator produces output from inputs
When we consider calibration we divide its inputs into:
Calibration parameters – unknown but fixed
Control variables – known features of application context
Calibration concerns learning about the calibration parameters
Using observations of the real process
Extrapolation concerns predicting the real process
At control variable values beyond where we have observations
We can view the SM as a (very simple) simulator
x is a control variable, β is a calibration parameter
Tuning and physical parameters
Calibration parameters may be physical or just for tuning
We adjust tuning parameters so the model fits reality better
We are not really interested in their ‘true’ values
We calibrate tuning parameters for prediction
Physical parameters are different
We are often really interested in true physical values
The SM’s efficiency parameter β is physical
It’s the theoretically achievable efficiency in the absence of friction
We like to think that calibration can help us learn about them
Exercise 4
Look at the four datasets, and in each case estimate the best fitting slope β
Draw a line by eye through the origin
Using a straight-edge
Read off the slope as the y value on the line when x = 1
Write that value beside the graph
The actual best-fitting calibrated values are:
Dataset 1 – 0.58
Dataset 2 – 0.58
Dataset 3 – 0.66
Dataset 4 – 0.57
Calibrating the SM
It’s basically a simple linear regression through the origin: zi = βxi + εi
Calibration
Posterior distribution misses the true value completely
More data makes things worse
More and more tightly concentrated on the wrong value
We could use a quadratic regression but the problem would remain
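A minimal numerical sketch of this story (the friction term below is an assumed illustrative form, not taken from the slides): fitting the wrong model z = βx to data generated by a reality that bends below the ideal line gives a biased estimate of β, and the estimate only becomes more confidently wrong as the sample grows.

```python
# Calibrating the wrong model: least-squares slope through the origin applied to
# data from a "reality" with an assumed friction loss.
import numpy as np

rng = np.random.default_rng(0)
beta_true = 0.65

def reality(x):
    return beta_true * x - 0.04 * x**2    # assumed friction loss, growing with effort

for n in (10, 100, 1000):
    x = rng.uniform(0, 4, n)
    z = reality(x) + rng.normal(0, 0.02, n)      # noisy observations of reality
    beta_hat = np.sum(x * z) / np.sum(x * x)     # least-squares slope through the origin
    se = 0.02 / np.sqrt(np.sum(x * x))           # its standard error
    print(f"n={n:5d}  beta_hat={beta_hat:.3f}  se={se:.4f}")
# beta_hat settles near 0.53, well below 0.65, while the standard error shrinks
```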
The problem is completely general
Calibrating (inverting, tuning, matching) a wrong model gives parameter estimates that are wrong
Not equal to their true physical values – biased
With more data we become more sure of these wrong values
The SM is a trivial model, but the same conclusions apply to all models
All models are wrong
In more complex models it is just harder to see what is going wrong
Even with the SM, it takes a lot of data to see any curvature in reality
What can we do about this?
Model discrepancy
The SM example suggests that we need to allow that the model does not correctly represent reality
For any values of the calibration parameters
The simulator outputs deviate systematically from reality
Model discrepancy (or model bias or model error)
There is a difference between the model with best/true parameter values and reality
r(x) = f(x, θ) + δ(x)
where δ(x) represents this discrepancy
and will typically itself have uncertain parameters
We observe
zi = r(xi) + εi = f(xi, θ) + δ(xi) + εi
SM revisited
Kennedy and O’Hagan (2001) introduced this model discrepancy
Modelled it as a zero-mean Gaussian process
They claimed it acknowledges additional uncertainty
And guards against over-fitting of θ
So add this model discrepancy term to the linear model of the simple machine
r(x) = βx + δ(x)
With δ(x) modelled as a zero-mean GP
Posterior distribution of β now behaves quite differently
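A minimal sketch (not the Kennedy–O'Hagan implementation) of why the posterior broadens: give δ(x) a zero-mean GP prior with an assumed squared-exponential covariance, marginalise it out so the observations have covariance K + σ²I, and with a flat prior on β the posterior is normal with the generalised-least-squares mean and variance computed below.

```python
# Marginalising a zero-mean GP discrepancy inflates the observation covariance,
# which widens the posterior for beta relative to the no-discrepancy fit.
import numpy as np

rng = np.random.default_rng(0)
beta_true, sig_eps = 0.65, 0.02
x = rng.uniform(0, 4, 30)
z = beta_true * x - 0.04 * x**2 + rng.normal(0, sig_eps, len(x))  # same assumed reality

def sqexp(x1, x2, var=0.05**2, length=1.0):
    return var * np.exp(-0.5 * (x1[:, None] - x2[None, :])**2 / length**2)

Sigma = sqexp(x, x) + sig_eps**2 * np.eye(len(x))   # cov of z after marginalising delta
Sinv_x = np.linalg.solve(Sigma, x)
post_var = 1.0 / (x @ Sinv_x)
post_mean = (Sinv_x @ z) * post_var
print(f"beta posterior: mean {post_mean:.3f}, sd {np.sqrt(post_var):.3f}")

# For comparison, the no-discrepancy fit is far more (over-)confident:
print(f"no-discrepancy sd: {sig_eps / np.sqrt(x @ x):.4f}")
```

The kernel and its hyperparameters here are assumptions made purely for illustration; in a full analysis they would be given priors and learned along with β.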
SM – calibration, with discrepancy
Posterior distribution much broader and doesn’t get worse with more data
But still misses the true value
Interpolation
Main benefit of simple GP model discrepancy is prediction
E.g. at x = 1.5
Prediction within the range of the data is possible
And gets better with more data
But when it comes to extrapolation …
… at x = 6
More data doesn’t help because it’s all in the range [0, 4]
Prediction OK here but gets worse for larger x
Extrapolation
One reason for wishing to learn about physical parameters
Should be better for extrapolation than just tuning
Without model discrepancy
The parameter estimates will be biased
Extrapolation will also be biased
Because best fitting parameter values are different in different parts of the control variable space
With more data we become more sure of these wrong values
With GP model discrepancy
Extrapolating far from the data does not work
No information about model discrepancy
Prediction just uses the (calibrated) simulator
We haven’t solved the problem
With simple GP model discrepancy the posterior distribution for θ is typically very wide
Increases the chance that we cover the true value
But is not very helpful
And increasing data does not improve the precision
Similarly, extrapolation with model discrepancy gives wide prediction intervals
And may still not be wide enough
What’s going wrong here?
Nonidentifiability
Formulation with model discrepancy is not identifiable
For any θ, there is a δ(x) to match reality perfectly
Reality is r(x) = f(x, θ) + δ(x)
Given θ, model discrepancy is δ(x) = r(x) – f(x, θ)
Suppose we had an unlimited number of observations
We would learn reality’s true function r(x) exactly
Within the range of the data
Interpolation works
But we would still not learn θ
It could in principle be anything
And we would still not be able to extrapolate reliably
The joint posterior
Calibration leads to a joint posterior distribution for θ and δ(x)
But nonidentifiability means there are many equally good fits (θ, δ(x)) to the data
Induces strong correlation between θ and δ(x)
This may be compounded by the fact that simulators often have large numbers of parameters
(Near-)redundancy means that different θ values produce (almost) identical predictions
Sometimes called equifinality
Within this set, the prior distributions for θ and δ(x) count
The importance of prior information
The nonparametric GP term allows the model to fit and predict reality accurately given enough data
Within the range of the data
But it doesn’t mean physical parameters are correctly estimated
The separation between original model and discrepancy is unidentified
Estimates depend on prior information
Unless the real model discrepancy is just the kind expected a priori the physical parameter estimates will still be biased
To learn about θ in the presence of model discrepancy we need better prior information
And this is also crucial for extrapolation
Better prior information
For calibration
Prior information about θ and/or δ(x)
We wish to calibrate because prior information about θ is not strong enough
So prior knowledge of model discrepancy is crucial
In the range of the data
For extrapolation
All this plus good prior knowledge of δ(x) outside the range of the calibration data
That’s seriously challenging!
In the SM, a model for δ(x) that says it is zero at x = 0, then increasingly negative, should do better
Inference about the physical parameter
We conditioned the GP
δ(0) = 0
δ′(0) = 0
δ′(0.5) < 0
δ′(1.5) < 0
Prediction
(Panels: predictions at x = 1.5 and at x = 6)
Where is the uncertainty?
Return to the general case
How might the simulator output y = f (x) differ from the true real-world value z that the simulator is supposed to predict?
Error in inputs x
Initial values
Forcing inputs
Model parameters
Error in model structure or solution
Wrong, inaccurate or incomplete science
Bugs, solution errors
Quantifying uncertainty
The ideal is to provide a probability distribution p(z) for the true real-world value
The centre of the distribution is a best estimate
Its spread shows how much uncertainty about z is induced by uncertainties on the previous slide
How do we get this?
Input uncertainty: characterise p(x), propagate through to p(y)
Model discrepancy: characterise p(z - y)
The hard part
We know pretty well how to do uncertainty propagation
Uncertainties associated with the simulator output
The hard part is the link to reality
The difference between the real system z and the simulator output y = f(x) using best input values
Because all models are wrong (Box, 1979)
It was through thinking about this link …
Particularly in the context of calibration
i.e. learning about uncertain parameters in the model
And also extrapolation
… that I was led to think more deeply about parameters
And to realise just how important model discrepancy is
Modelling model discrepancy
Three rules:
1. Must account for model discrepancy
Ignoring it leads to biased calibration, over-optimistic predictions
2. Discrepancy term must be modelled nonparametrically
Allows learning about reality and interpolative prediction
3. Model must incorporate realistic knowledge about discrepancy
To get unbiased learning about physical parameters and extrapolation
But following these rules is hard
Ongoing research
Managing uncertainty
To understand the implications of different uncertainty sources
Probabilistic, variance-based sensitivity analysis (see the sketch after this list)
Helps with targeting and prioritising research
To reduce uncertainty, get more information!
Informal – more/better science
Tighten p(x) through improved understanding
Tighten p(z - y) through improved modelling or programming
Formal – using real-world data
Calibration – learn about model parameters
Data assimilation – learn about the state variables
Learn about model discrepancy z - y
Validation (another talk!)
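As referenced above, a brute-force sketch of a variance-based (main-effect) sensitivity index, using an invented stand-in simulator: the contribution of input x1 is Var(E[Y | x1]) / Var(Y). In practice an emulator and more efficient Sobol'-type estimators would be used.

```python
# Brute-force main-effect index: fix x1, average over the other input, repeat,
# then compare the variance of those conditional means with the total variance.
import numpy as np

rng = np.random.default_rng(0)

def f(x1, x2):
    return x1**2 + 0.5 * x2   # stand-in simulator

def main_effect_index(n_outer=200, n_inner=500):
    cond_means = []
    for _ in range(n_outer):
        x1 = rng.normal(0, 1)                   # fix x1 at a sampled value
        x2 = rng.normal(0, 1, n_inner)          # vary the other input
        cond_means.append(f(x1, x2).mean())     # estimate E[Y | x1]
    x1 = rng.normal(0, 1, n_outer * n_inner)
    x2 = rng.normal(0, 1, n_outer * n_inner)
    return np.var(cond_means) / f(x1, x2).var()

print(main_effect_index())   # about 0.89 here: Var(x1^2) = 2 out of total 2.25
```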
Conclusions – Session 2
Without model discrepancy
Inference about physical parameters will be wrong
And will get worse with more data
The same is true of prediction
Both interpolation and extrapolation
With crude GP model discrepancy
Interpolation inference is OK
And gets better with more data
But we still get physical parameters and extrapolation wrong
The better our prior knowledge about model discrepancy, the more chance we have of getting physical parameters right
Also extrapolation
But then we need even better prior knowledge
Any final questions?
It remains just to say thank you for sitting through this morning’s sessions!