TRANSCRIPT
Characterising uncertainty through climate model ensembles
Open issues with model dependence, performance, and robustness
CliMathNet 2014
University of Leeds
July 14-18, 2014
Claudia Tebaldi
Climate and Global Dynamics Division
National Center for Atmospheric Research
Boulder, CO, USA
Why do we want ensembles?
• Either we believe that the truth lies in the consensus, or we want them to span a comprehensive range of possibilities (sometimes both, depending on the problem we are dealing with).
• In both cases we are undermined by:
- the lack of independence among models, which makes the robustness of their results less “comforting” than it appears;
- the heterogeneity of the models in the ensemble, with performance metrics telling us that not all models perform the same. But how do we take that into account?
Robustness
Are we making progress? Projections are robust, uncertainties remain
[Figure: maps comparing “old” and “new” generation projections]
Knutti and Sedlacek, Nature Climate Change 2012
Are we making progress? June-July-August mean precipitation projections
[Figure: maps comparing “new” and “old” generation projections]
Knutti and Sedlacek, Nature Climate Change 2012
Dependence
Climate model genealogy: models are not independent
Edwards, WIRE 2011
Climate model genealogy: models are not independent
Dissimilarity for surface temperature and precipitation
Knutti et al., GRL 2013
Climate model genealogy: models are not independent
Masson and Knutti, GRL 2011, Knutti et al., GRL 2013
Model performance
Metrics and model quality
• An infinite number of metrics can be defined.
• Many metrics are dependent.
• Observational datasets, and their uncertainties, matter.
• The concept of a “best model” is ill-defined.
• There may be a best model for a particular purpose, with “best” measured in a specific way. But determining that is hard.
How do we measure model performance?
Gleckler et al., JGR 2008
Performance metric: a measure of agreement between model and observation.
Model quality metric: a measure designed to infer the skill of a model for a specific purpose.
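To make the first notion concrete, here is a minimal sketch of one common performance metric, an area-weighted RMSE of a model climatology against an observational dataset (Python; the function name, arguments, and the regular latitude-longitude grid are assumptions of this illustration, not a metric from any of the cited papers):

import numpy as np

def area_weighted_rmse(model, obs, lat):
    # model, obs: 2-D (lat x lon) time-mean fields; lat: latitudes in degrees.
    # On a regular lat-lon grid, cell area is proportional to cos(latitude).
    w = np.broadcast_to(np.cos(np.deg2rad(lat))[:, None], model.shape)
    mse = np.sum(w * (model - obs) ** 2) / np.sum(w)
    return np.sqrt(mse)

Changing the variable, region, season, or observational dataset changes the metric, which is one reason an infinite number of metrics can be defined.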
My model is better than your model
Reichler and Kim, BAMS 2008
Model performance
[Figure: models ranked from better to worse performance]
Distance to observations: surface temperature and precipitation
Knutti et al., GRL 2013
Variability
Why models disagree (stippling indicates agreement)
IPCC AR4 WG1, Fig. SPM.7
Limits of predictability: warmest and coolest of 40 realizations
Deser et al., Nature Climate Change 2012
Limits of predictability: wettest and driest of 40 realizations
Deser et al., Nature Climate Change 2012
Why models disagree (stippling indicates agreement)
Tebaldi et al. 2011 GRL
Why models disagree (stippling indicates agreement)
Knutti et al. 2013 Nature Climate Change
Any attempt at formal uncertainty characterization/quantification should decide:
a) whether it is after a consensus or a distribution of equally likely outcomes spanning the range of possibilities;
b) whether model performance should inform weighting, or whether models should be treated equally.
Assumptions, especially in the weighting, matter for the result
Tebaldi and Knutti, Phil Trans Roy Soc, 2007
Surface warming, South East Asia, Dec-Feb 2080-2099, A1B scenario
[Figure: probability densities of surface warming (°C) from the Tebaldi, Greene, and Furrer methods, and from the raw models]
• Tebaldi: uses a truth+error paradigm and weights according to regional bias and consensus.
• Greene: uses a truth+error approach but weights according to past performance on regional trends.
• Furrer: uses a truth+error approach but weights according to global bias measures.
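As an illustration of the kind of machinery involved, here is a toy sketch of bias-and-consensus weighting in the truth+error spirit (Python; deliberately simplified, not the actual Bayesian model of any of the three methods above; all names are hypothetical):

import numpy as np

def bias_consensus_weights(hist, obs, proj, eps=1e-6):
    # hist: historical regional means, one per model; obs: observed value.
    # proj: projected regional changes, one per model.
    # Models close to observations (small bias) and close to the emerging
    # consensus (small distance to the weighted mean projection) get more weight.
    w = np.ones_like(proj) / proj.size
    for _ in range(50):  # iterate weights and consensus to a fixed point
        center = np.sum(w * proj) / np.sum(w)
        w = 1.0 / ((np.abs(hist - obs) + eps) * (np.abs(proj - center) + eps))
        w /= w.sum()
    return w, np.sum(w * proj)

The point of the figure above is precisely that such choices (regional bias vs. trends vs. global measures) lead to visibly different densities.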
What type of constraints/weighting matters for future projections is still an open question.
The research area of “emergent constraints” is open and actively looking for such sources of information.
It is not just a matter of mining for relationships between observables and future behavior within the ensembles, but also of understanding why such relationships exist and are significant.
Some observable climate indices
do correlate with future warming
Huber et al., J. Climate 2011
[Figure: future warming against the land-ocean contrast in surface longwave downward all-sky radiation]
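The mechanics of the “mining” step are simple; the hard part is the physical understanding. A minimal sketch of an across-ensemble emergent-constraint regression (Python; all variable names are hypothetical):

import numpy as np

def emergent_constraint(index_models, warming_models, index_obs):
    # index_models: a present-day observable index, one value per model.
    # warming_models: future warming, one value per model.
    # index_obs: the same index estimated from observations.
    slope, intercept = np.polyfit(index_models, warming_models, 1)
    r = np.corrcoef(index_models, warming_models)[0, 1]
    # Constrained estimate of warming at the observed index value, plus the
    # across-ensemble correlation that motivates (but does not justify) it.
    return slope * index_obs + intercept, r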
How to model dependence or discount “duplicate models”
is also an open question
Reweighting by performance and dependence
Ben Sanderson, in preparation
Climate sensitivity reweighted for performance and dependence
Work by Ben Sanderson, in preparation
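One plausible shape for such a reweighting, shown here only as a sketch of the general idea (not necessarily the scheme behind the figure; sigma_d and sigma_s are tuning scales chosen by the analyst):

import numpy as np

def performance_dependence_weights(d_obs, d_pair, sigma_d=1.0, sigma_s=1.0):
    # d_obs: distance of each of n models to observations (length n).
    # d_pair: n x n matrix of inter-model distances.
    skill = np.exp(-(d_obs / sigma_d) ** 2)      # reward proximity to obs
    similar = np.exp(-(d_pair / sigma_s) ** 2)   # near-duplicate detector
    np.fill_diagonal(similar, 0.0)               # ignore self-similarity
    w = skill / (1.0 + similar.sum(axis=1))      # discount duplicate models
    return w / w.sum()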
Planning CMIP6
What kind of ensemble of simulations do we want?
• The community is developing a better appreciation of the value of experimental design, driven by the necessity of limiting the simulation load on modeling centers.
• Questions about the size and make-up of multi-model ensembles, and about the size of initial-condition ensembles, are front and center.
Within each model’s simulation
• How large should initial-condition ensembles be?
- Perhaps we can run a large ensemble under one scenario and apply our findings to others, limiting the ensemble size to one for the other scenarios?
- Is the unforced variability in the control run the same as the variability in forced experiments? If it is, we can use long control experiments in place of large initial-condition ensembles.
• What types of scenarios are worth running?
- What does it mean for scenarios to be significantly different? This is relevant for the specification of both global and regional forcings. Should we rather focus on idealized experiments in order to build better emulators?
- Can pattern scaling/statistical emulators fill in the gaps? (A sketch follows this list.)
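Pattern scaling, mentioned in the last item, rests on the approximation ΔV(x, t) ≈ p(x) · ΔT_glob(t): local change is a fixed spatial pattern scaled by global-mean warming. A minimal emulator sketch (Python; names are hypothetical):

import numpy as np

def fit_pattern(local_change, gat_change):
    # local_change: (n_times, n_gridpoints) local changes from one scenario.
    # gat_change: (n_times,) global-mean temperature changes.
    # Per-gridpoint least-squares slope through the origin.
    return (gat_change @ local_change) / (gat_change @ gat_change)

def emulate(pattern, gat_change_new):
    # Fill in a new scenario by scaling the pattern by its global warming.
    return np.outer(gat_change_new, pattern)

Whether such an emulator is adequate depends on the variable and on how strongly regional and short-lived forcings break the scaling.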
As for the scenarios themselves
How different should scenarios be, to justify investing in coupled
model experiments?
- It probably depends on the eventual use of the climate information (which variables, which timescales, which regional scales);
- it depends on the horizon of interest, and on model uncertainties;
- it depends both on the final radiative-forcing difference and on the path to arrive there, and on the mix of forcings, including regional and short-lived ones.
From the perspective of the multi-model ensemble design:
How much of the Earth’s surface experiences significant temperature change for different levels of global temperature change or radiative forcing?
[Figure: fraction of significant change as a function of ΔGAT (°C) and RF (W/m²)]
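A sketch of how one point on such a curve could be computed, under the assumption that “significant” means the local change exceeds some multiple of the local internal-variability standard deviation (names and the threshold are illustrative):

import numpy as np

def fraction_significant(change, noise_std, lat, threshold=2.0):
    # change: (n_lat, n_lon) projected local temperature change.
    # noise_std: (n_lat, n_lon) std. dev. of internal variability.
    # lat: latitudes in degrees, used for area weighting.
    w = np.broadcast_to(np.cos(np.deg2rad(lat))[:, None], change.shape)
    signif = np.abs(change) > threshold * noise_std
    return np.sum(w * signif) / np.sum(w)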
About membership in the CMIP ensemble
• Should we have criteria of acceptance for models participating in CMIP exercises? A minimum/low bar, e.g., about the size of internal variability?
• Should we require a control run of sizeable length, free of drift?
• Is the sociology of modeling/the politics of modeling centers preventing a treatment of these problems as purely scientific questions?
Untouched issues
• Limited coverage of climate sensitivity range?
• Common biases.
• Irreducible discrepancies between models and the real world.
Conclusions
• Multi-model ensembles contain a wealth of (costly) information and, as a set, constitute an invaluable resource for uncertainty characterisation.
• The design of experiments in multi-model frameworks is hampered by the “opportunistic” nature of these ensembles, and by the sociology of model development.
• At least from this “bird’s-eye” perspective, challenges remain in quantifying dependencies and in leveraging past performance to guide the weighting of future projections.
• For specific problems/projections/times/regions/variables
there is hope: Stay tuned for Phil Sansom’s talk later on!