TRANSCRIPT
Characterising uncertainty through climate model ensembles
Open issues with model dependence, performance, and robustness
CliMathNet 2014
University of Leeds
July 14-18, 2014
Claudia Tebaldi
Climate and Global Dynamics Division
National Center for Atmospheric Research
Boulder, CO, USA
Why do we want ensembles?
• Either we believe that the truth lies in the consensus, or we want them to span a comprehensive range of possibilities (sometimes both, depending on the problem we are dealing with).
• In both cases we are undermined by:
- the lack of independence among models, which makes the robustness of their results less “comforting” than it appears;
- the heterogeneity of the models in the ensemble, with performance metrics telling us that not all models perform the same. But how do we take that into account?
Robustness
Are we making progress? Projections are robust, uncertainties remain
[Figure: maps comparing “old” and “new” generation projections]
Knutti and Sedlacek, Nature Climate Change 2012
Are we making progress? June-July-August mean precipitation projections
[Figure: maps comparing “new” and “old” generation projections]
Knutti and Sedlacek, Nature Climate Change 2012
Dependence
Climate model genealogy: models are not independent
Edwards, WIRE 2011
Climate model genealogy: models are not independent
Dissimilarity for surface temperature and precipitation
Knutti et al., GRL 2013
Climate model genealogy: models are not independent
Masson and Knutti, GRL 2011, Knutti et al., GRL 2013
Model performance
Metrics and model quality
• An infinite number of metrics can be defined.
• Many metrics are dependent.
• Observational datasets, and their uncertainties, matter.
• The concept of a “best model” is ill-defined.
• There may be a best model for a particular purpose, with “best” measured in a specific way. But determining that is hard.
How do we measure model performance?
Gleckler et al., JGR 2008
Performance metric: a measure of agreement between model and observation.
Model quality metric: a measure designed to infer the skill of a model for a specific purpose.
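To make the first notion concrete, here is a minimal sketch of one common performance metric, an area-weighted RMSE of a model climatology against an observational dataset (Python; the function name, arguments, and the regular latitude-longitude grid are assumptions of this illustration, not a metric from any of the cited papers):

import numpy as np

def area_weighted_rmse(model, obs, lat):
    # model, obs: 2-D (lat x lon) time-mean fields; lat: latitudes in degrees.
    # On a regular lat-lon grid, cell area is proportional to cos(latitude).
    w = np.broadcast_to(np.cos(np.deg2rad(lat))[:, None], model.shape)
    mse = np.sum(w * (model - obs) ** 2) / np.sum(w)
    return np.sqrt(mse)

Changing the variable, region, season, or observational dataset changes the metric, which is one reason an infinite number of metrics can be defined.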
My model is better than your model
Reichler and Kim, BAMS 2008
Model performance
[Figure: models ranked from better to worse performance]
Distance to observations: surface temperature and precipitation
Knutti et al., GRL 2013
Variability
Why models disagree (stippling indicates agreement)
IPCC AR4 WG1, Fig. SPM.7
Limits of predictability: warmest and coolest of 40 realizations
Deser et al., Nature Climate Change 2012
Limits of predictability: wettest and driest of 40 realizations
Deser et al., Nature Climate Change 2012
Why models disagree (stippling indicates agreement)
Tebaldi et al. 2011 GRL
Why models disagree (stippling indicates agreement)
Knutti et al. 2013 Nature Climate Change
Any attempt at formal uncertainty characterization/quantification should decide:
a) whether it is after a consensus or a distribution of equally likely outcomes spanning the range of possibilities;
b) whether model performance should inform weighting, or whether models should be treated equally.
Assumptions, especially in the weighting, matter for the result
Tebaldi and Knutti, Phil Trans Roy Soc, 2007
Surface warming, South East Asia, Dec-Feb 2080-2099, A1B scenario
[Figure: probability densities of surface warming (°C) from the Tebaldi, Greene, and Furrer methods, and from the raw models]
• Tebaldi: uses a truth+error paradigm and weights according to regional bias and consensus.
• Greene: uses a truth+error approach but weights according to past performance on regional trends.
• Furrer: uses a truth+error approach but weights according to global bias measures.
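As an illustration of the kind of machinery involved, here is a toy sketch of bias-and-consensus weighting in the truth+error spirit (Python; deliberately simplified, not the actual Bayesian model of any of the three methods above; all names are hypothetical):

import numpy as np

def bias_consensus_weights(hist, obs, proj, eps=1e-6):
    # hist: historical regional means, one per model; obs: observed value.
    # proj: projected regional changes, one per model.
    # Models close to observations (small bias) and close to the emerging
    # consensus (small distance to the weighted mean projection) get more weight.
    w = np.ones_like(proj) / proj.size
    for _ in range(50):  # iterate weights and consensus to a fixed point
        center = np.sum(w * proj) / np.sum(w)
        w = 1.0 / ((np.abs(hist - obs) + eps) * (np.abs(proj - center) + eps))
        w /= w.sum()
    return w, np.sum(w * proj)

The point of the figure above is precisely that such choices (regional bias vs. trends vs. global measures) lead to visibly different densities.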
What type of constraints/weighting matters for future projections is still an open question.
The research area of “emergent constraints” is open and actively looking for such sources of information.
It is not just a matter of mining for relationships between observables and future behavior within the ensembles, but also of understanding why such relationships exist and are significant.
Some observable climate indices
do correlate with future warming
Huber et al., J. Climate 2011
[Figure: future warming against the land-ocean contrast in surface longwave downward all-sky radiation]
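The mechanics of the “mining” step are simple; the hard part is the physical understanding. A minimal sketch of an across-ensemble emergent-constraint regression (Python; all variable names are hypothetical):

import numpy as np

def emergent_constraint(index_models, warming_models, index_obs):
    # index_models: a present-day observable index, one value per model.
    # warming_models: future warming, one value per model.
    # index_obs: the same index estimated from observations.
    slope, intercept = np.polyfit(index_models, warming_models, 1)
    r = np.corrcoef(index_models, warming_models)[0, 1]
    # Constrained estimate of warming at the observed index value, plus the
    # across-ensemble correlation that motivates (but does not justify) it.
    return slope * index_obs + intercept, r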
How to model dependence or discount “duplicate models”
is also an open question
Reweighting by performance and dependence
Ben Sanderson, in preparation
Climate sensitivity reweighted for performance and dependence
Work by Ben Sanderson, in preparation
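One plausible shape for such a reweighting, shown here only as a sketch of the general idea (not necessarily the scheme behind the figure; sigma_d and sigma_s are tuning scales chosen by the analyst):

import numpy as np

def performance_dependence_weights(d_obs, d_pair, sigma_d=1.0, sigma_s=1.0):
    # d_obs: distance of each of n models to observations (length n).
    # d_pair: n x n matrix of inter-model distances.
    skill = np.exp(-(d_obs / sigma_d) ** 2)      # reward proximity to obs
    similar = np.exp(-(d_pair / sigma_s) ** 2)   # near-duplicate detector
    np.fill_diagonal(similar, 0.0)               # ignore self-similarity
    w = skill / (1.0 + similar.sum(axis=1))      # discount duplicate models
    return w / w.sum()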
Planning CMIP6
What kind of ensemble of simulations do we want?
• The community is developing a better appreciation of the value of experimental design, driven by the necessity of limiting the simulation load on modeling centers.
• Questions about the size and make-up of multi-model ensembles, and about the size of initial-condition ensembles, are front and center.
Within each model’s simulation
• How large should initial-condition ensembles be?
- Perhaps we can run a large ensemble under one scenario and apply our findings to others, limiting the ensemble size to one for the other scenarios?
- Is the unforced variability in the control run the same as the variability in forced experiments? If it is, we can use long control experiments in place of large initial-condition ensembles.
• What types of scenarios are worth running?
- What does it mean for scenarios to be significantly different? This is relevant for the specification of both global and regional forcings. Should we rather focus on idealized experiments in order to build better emulators?
- Can pattern scaling/statistical emulators fill in the gaps? (A sketch follows this list.)
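Pattern scaling, mentioned in the last item, rests on the approximation ΔV(x, t) ≈ p(x) · ΔT_glob(t): local change is a fixed spatial pattern scaled by global-mean warming. A minimal emulator sketch (Python; names are hypothetical):

import numpy as np

def fit_pattern(local_change, gat_change):
    # local_change: (n_times, n_gridpoints) local changes from one scenario.
    # gat_change: (n_times,) global-mean temperature changes.
    # Per-gridpoint least-squares slope through the origin.
    return (gat_change @ local_change) / (gat_change @ gat_change)

def emulate(pattern, gat_change_new):
    # Fill in a new scenario by scaling the pattern by its global warming.
    return np.outer(gat_change_new, pattern)

Whether such an emulator is adequate depends on the variable and on how strongly regional and short-lived forcings break the scaling.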
As for the scenarios themselves
How different should scenarios be, to justify investing in coupled
model experiments?
- It probably depends on the eventual use of the climate information (which variables, which timescales, which regional scales);
- it depends on the horizon of interest, and on model uncertainties;
- it depends both on the final radiative-forcing difference and on the path to arrive there, and on the mix of forcings, including regional and short-lived ones.
From the perspective of the multi-model ensemble design:
How much of the Earth’s surface experiences significant temperature change for different levels of global temperature change or radiative forcing?
[Figure: fraction of significant change as a function of ΔGAT (°C) and RF (W/m²)]
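A sketch of how one point on such a curve could be computed, under the assumption that “significant” means the local change exceeds some multiple of the local internal-variability standard deviation (names and the threshold are illustrative):

import numpy as np

def fraction_significant(change, noise_std, lat, threshold=2.0):
    # change: (n_lat, n_lon) projected local temperature change.
    # noise_std: (n_lat, n_lon) std. dev. of internal variability.
    # lat: latitudes in degrees, used for area weighting.
    w = np.broadcast_to(np.cos(np.deg2rad(lat))[:, None], change.shape)
    signif = np.abs(change) > threshold * noise_std
    return np.sum(w * signif) / np.sum(w)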
About membership in the CMIP ensemble
• Should we have criteria of acceptance for models participating in CMIP exercises? A minimum/low bar, e.g., about the size of internal variability?
• Should we require a control run of sizeable length, free of drift?
• Is the sociology of modeling/the politics of modeling centers preventing a treatment of these problems as purely scientific questions?
Untouched issues
• Limited coverage of climate sensitivity range?
• Common biases.
• Irreducible discrepancies between models and the real world.
Conclusions
• Multi-model ensembles contain a wealth of (costly) information and, as a set, constitute an invaluable resource for uncertainty characterisation.
• The design of experiments in multi-model frameworks is hampered by the “opportunistic” nature of these ensembles, and by the sociology of model development.
• At least from this “bird’s-eye” perspective, challenges remain in quantifying dependencies and in leveraging past performance to guide the weighting of future projections.
• For specific problems/projections/times/regions/variables
there is hope: Stay tuned for Phil Sansom’s talk later on!