
Procedia Computer Science 00 (2012) 1–11

International Conference on Computational Science, ICCS 2012

Visualizing Climate Variability with Time-Dependent Probability Density Functions, Detecting it with Information Theory

J. Walter Larson (a, b, c)

(a) Mathematics and Computer Science Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439, USA
(b) Computation Institute, University of Chicago
(c) Research School of Computer Science, The Australian National University

Abstract

A framework is presented for visualizing and detecting climate variability and change based on time-dependent probability density functions (PDFs). The PDFs show how the distribution of values in the sample window changes over time and show more detail than do timeseries of windowed moments. A set of information-theoretic statistics based on the Shannon entropy and the Kullback-Leibler divergence (KLD) is defined to assess PDF complexity and temporal variability. The KLD-based measures quantify the representativeness of a 30-year sampling window of a larger climatic record: how well a long sample can predict a smaller sample’s PDF, and how well one 30-year sample matches a similar sample shifted in time. These information-theoretic statistics constitute a new type of climate variability, informatic variability. These techniques are applied to the Central England Temperature record, the longest continuous meteorological observational record.

Keywords: Probability Density Function, Information Theory, Climate Variability, Climate Change

1. Introduction

Climate is a statistical construct computed from meteorological state data sampled over a predefined period. By convention, this window sampling period W is 30 years, a number arrived at by a vote at the 1937 International Meteorological Organization meeting [1, 2]. Climate models do not model the climate directly; they compute solutions to equations of evolution for the Earth system’s instantaneous state and write daily or monthly summaries of the state to history files, which are then postprocessed to compute climatologies. The I/O-intensive nature of coupled climate models is arguably the most significant barrier to creating an exascale climate model. A natural question arises: Can we model the climate directly? Hasselmann [3] proposed a statistical dynamical model (SDM) approach that used the Fokker-Planck equation (FPE) as the equation of evolution for the probability density function (PDF). For some quantity X and time t with time-dependent PDF ρ(X, t), the univariate FPE is

\[ \partial_t \rho(X,t) + \partial_X\left[D_1(X,t)\,\rho(X,t)\right] = \partial_X\left[D_2(X,t)\,\partial_X \rho(X,t)\right]. \tag{1} \]

In (1), D_1(X, t) is the drift term, and D_2(X, t) is the diffusion term. In Hasselmann’s narrative, weather systems (atmospheric variations of duration on the order of 15 days or less) provided stochastic forcing to more slowly responding, integrative components of the Earth system (oceans, cryosphere, and biosphere), much in the way that molecular motions drive Brownian motion of larger particles suspended in a fluid. Direct deterministic modeling of a time-dependent climate PDF using something like (1) is a potent idea because the PDF provides a comprehensive description of a sample’s underlying population statistics. Direct dynamic modeling of ρ(X, t) would, in theory, enable deterministic modeling of moments and estimates for quantiles and extrema. SDMs, however, fell out of favor a decade after Hasselmann’s 1976 paper because the role of atmospheric momentum transport was not adequately covered by SDMs and advances in computer hardware made practicable the general circulation model-based approach employed in current coupled climate and Earth system models.

An alternative to the SDM approach is to ask a slightly different question: What does the empirical climatic PDF ρ(X, t) look like for the observational timeseries of some meteorological variable X, and is it possible to fit an equation of evolution similar to the FPE for ρ(X, t)? Here I tackle the first part of this question by posing, and proposing answers to, the following questions regarding a windowed sample of W years taken from a longer record of Y years:

Q1 What does the W-window-sampled ρ(t, X) look like, how confident can we be of its structure, and how does it evolve in time?

Q2 What is the information content of a sample of W years? How does it evolve in time over the record of Y years, and what does this signify?

Q3 How well does a sample of W years predict the whole available data record of Y > W years or other major climatically relevant subsets of the whole record?

Q4 How well does knowledge of the whole record’s time-independent PDF ρ(X) predict a local W-window-sample-generated PDF ρ(t, X)? That is, when viewed with prior knowledge of the parent density function, how unusual does ρ(t, X) look?

Q5 How well does one W-window sample’s PDF q(t, X) predict another W-window sample’s PDF p(t′, X)?

Q6 Is it possible to use time-dependent PDFs to classify periods of time that are climatically stable or undergoing climatic change?

I will address Q1 by using a density estimation technique that employs a Bayesian-derived optimal binning scheme. This binning scheme provides estimated PDFs in a form that is highly compatible with computing key information-theoretic statistics. These information-theoretic statistics provide the means to address Q2–Q6 and constitute a new type of climatic variability: informatic variability. In particular, this approach expresses differences in PDFs from different climatic sampling periods (and, by association, climate change) as a form of information loss, specifically, loss of ability of a past (changed) climate record’s PDF to predict a changed (past) climate record’s PDF. Application of these techniques to a classic meteorological timeseries, the Central England Temperature (CET) record [4], provides striking visualizations of the evolution of the climate over this record, reveals previously known properties, and puts in stark contrast the current climate’s oddity with respect to the previous observational record.

2. Probability Density Functions, Information Theory, and Climate

For a continuous random variable X ∈ (−∞, ∞), the probability density function p(x) satisfies the following conditions: $p(x) \ge 0$ for all x ∈ (−∞, ∞), and $\int_{-\infty}^{\infty} p(x)\,dx = 1$. Suppose x depends on the time t. The dependency x → x(t) implies potential time dependence in the PDF; that is, p(x) → p(x, t). Note that each “time slice” of p(x, t) satisfies the normalization condition of a univariate PDF; that is, for $t = t_C$, $\int_{-\infty}^{\infty} p(x,t)|_{t=t_C}\,dx = 1$. If the underlying statistics of X remain stationary, then the PDF remains solely a function of x. Nonstationarity (temporal sensitivity of the PDF) raises the question of how to estimate p(x, t). A common technique used by the climate community is to sample x(t) using a time window of width W centered at a time $t_C$, resulting in a sample $S = \{x(t),\ t \in [t_C - \tfrac{W}{2},\ t_C + \tfrac{W}{2})\}$, and to estimate a univariate $p(x)|_{t=t_C}$ to get $p(x, t_C)$. Advancing this window through time then provides p(x, t). This windowed sampling technique underlies PDF estimation in this paper. The windowed sampling and binning technique described in Section 3, combined with visualization, answers Q1.
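The windowed sampling described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `window_sample` and `sliding_windows`, the one-unit step, and the synthetic record are assumptions made for the example and are not taken from the paper.

```python
def window_sample(times, values, t_center, width):
    """Return the values x(t) with t in [t_center - width/2, t_center + width/2)."""
    lo = t_center - width / 2.0
    hi = t_center + width / 2.0
    return [x for t, x in zip(times, values) if lo <= t < hi]


def sliding_windows(times, values, width, step=1.0):
    """Yield (t_center, sample) pairs as the window advances through the record."""
    t_center = times[0] + width / 2.0
    last_center = times[-1] - width / 2.0
    while t_center <= last_center:
        yield t_center, window_sample(times, values, t_center, width)
        t_center += step


# Synthetic record (hypothetical, NOT CET data): one value per year, 1900..1959.
years = [1900 + i for i in range(60)]
temps = [9.0 + 0.01 * i for i in range(60)]

# Each entry pairs a window center t_C with the sample S used to estimate p(x, t_C).
windows = list(sliding_windows(years, temps, width=30))
```

Each sample in `windows` would then be passed to a density estimator to obtain one time slice of p(x, t).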

Information theory [5, 6] is a mathematical framework for quantifying information content and identifying relationships between random variables. It has been used extensively in the telecommunications and signal-processing communities; but it also has been applied by the climate community to solve problems of predictability [7] and model-reality comparison [8, 9] and to evaluate climate sampling window sizes [10]. Its conceptual roots lie in Boltzmann’s statistical mechanical formulation of thermodynamic entropy. The Shannon entropy (SE) H(X) is

\[ H(X) = -\int_{-\infty}^{\infty} p(x) \log p(x)\, dx. \tag{2} \]

The logarithm base in (2), and in (3) below, defines the units for information; for bases 2 and e, H(X) is measured in bits and nats, respectively. H(X) broadly quantifies the amount of “surprise” in the distribution of values of X. Given a precomputed PDF, computing the SE provides an answer to the information content component of Q2. Note that the integral formulation of the SE (2) can yield negative or infinite values; the reason is that the probability density function p(x) may locally exceed unity.
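The remark about negative values can be illustrated with a small sketch. It assumes a piecewise-constant density with uniform bin width, so that the integral in (2) reduces to a sum; the function names are hypothetical and chosen for this example only.

```python
import math


def differential_entropy(densities, bin_width, base=2.0):
    """Entropy (2) of a piecewise-constant PDF: -sum(mu_i * log(mu_i) * dx)."""
    h = 0.0
    for mu in densities:
        if mu > 0.0:  # the term 0 * log(0) is taken to be 0
            h -= mu * math.log(mu, base) * bin_width
    return h


def discrete_entropy(pmf, base=2.0):
    """Entropy of a probability mass function: -sum(p_i * log(p_i))."""
    h = 0.0
    for p in pmf:
        if p > 0.0:
            h -= p * math.log(p, base)
    return h


# A uniform density on [0, 0.5) has value 2 everywhere, which exceeds unity,
# so its entropy from (2) is negative: -log2(2) = -1 bit.
h_narrow = differential_entropy([2.0] * 4, bin_width=0.125)

# A fair coin, by contrast, has the familiar discrete entropy of 1 bit.
h_coin = discrete_entropy([0.5, 0.5])
```

The contrast between the two results is the point: the discrete form is never negative, whereas the integral form depends on the scale of x.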

Consider two distinct PDFs for X: p(x) and q(x). The additional information, or gain, required to predict p(x) given q(x) is the Kullback-Leibler divergence (KLD):

\[ D_{KL}(p \,\|\, q) = \int_{-\infty}^{\infty} p(x) \log\left[\frac{p(x)}{q(x)}\right] dx. \tag{3} \]

For a time-dependent variable x(t), similar arguments to those presented for PDFs can be applied to compute time-window-sampled SE and KLD values from (2) and (3), respectively.

The KLD quantifies differences between PDFs. Sometimes it is called a “distance measure” or “metric” for PDFs, but such terms are inaccurate: The KLD is not symmetric; in general, $D_{KL}(p \,\|\, q) \neq D_{KL}(q \,\|\, p)$. Furthermore, the KLD does not satisfy the triangle inequality. The nonsymmetric nature of the KLD is of particular use. The representativeness of a particular W-window’s PDF of a larger record can be addressed by constructing q(x, t) for each W-window of a climate record, using the entire Y-year record to construct a density p(x), and computing $D_{KL}(p \,\|\, q)$. In fact, $D_{KL}(p \,\|\, q)$ provides an answer to Q3. Reversing the arguments, an answer to Q4 is $D_{KL}(q \,\|\, p)$. Q5 may be answered by constructing p(x, t) from two different W-windows and computing the KLD. That is, one can construct two samples $S_1 = \{x(t),\ t \in [t_1 - \tfrac{W}{2},\ t_1 + \tfrac{W}{2})\}$ and $S_2 = \{x(t),\ t \in [t_2 - \tfrac{W}{2},\ t_2 + \tfrac{W}{2})\}$, and compute from them $p(x, t_1)$ and $p(x, t_2)$, respectively. Following the definition in (3), the time-shifted KLD $D_{KL}(p(x, t_1) \,\|\, p(x, t_2))$ quantifies how much additional information is required to predict the PDF at $t = t_1$ given the PDF at $t = t_2$. Large-scale application of the time-shifted KLD to all possible W-sampling windows in a record of Y years can provide clues about periods of relative climatic stability and rapid climate change, providing answers to Q6.
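The asymmetry of the KLD is easy to check numerically. The sketch below uses the discrete form of (3) on two made-up probability mass functions that share the same bins; the function name `kl_divergence` and the example values are illustrative assumptions, not data from the paper.

```python
import math


def kl_divergence(p, q, base=2.0):
    """Discrete form of (3): sum_i p_i * log(p_i / q_i), on shared bins.

    Assumes q_i > 0 wherever p_i > 0; a zero q_i there would make the
    divergence infinite (and raises ZeroDivisionError in this sketch).
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same bins")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:  # the term 0 * log(0 / q_i) is taken to be 0
            total += p_i * math.log(p_i / q_i, base)
    return total


# Two hypothetical PMFs on three shared bins.
p = [0.5, 0.4, 0.1]
q = [0.2, 0.3, 0.5]

d_pq = kl_divergence(p, q)  # gain required to predict p given q
d_qp = kl_divergence(q, p)  # gain required to predict q given p
```

Here `d_pq` and `d_qp` are both positive but unequal, which is why the direction of the comparison (window to record, or record to window) matters for Q3 and Q4.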

The aforementioned SE and KLD statistics characterize the system’s informatic variability.

Both the SE and KLD were originally formulated for discrete variables; one replaces the PDFs {p(x), q(x)} with the probability mass functions (PMFs) $\{\vec{\pi}, \vec{\zeta} \in \mathbb{R}^N\}$, respectively, and integrals w.r.t. x in (2) and (3) with summations over N discrete states [6]. All PDFs plotted and all computed SE and KLD quantities presented in this paper are derived from PMFs estimated from sample data. Numerical computation of the SE is sensitive to the discretization dx → ∆x. Thus, one must take care in discretizing or binning continuous data to form a PMF or PDF, respectively.
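The sensitivity to ∆x can be made explicit with a short derivation. It assumes a piecewise-constant density with uniform bin width ∆x, bin densities $\mu_i$ and $\nu_i$ for the two PDFs, and bin probabilities $\pi_i = \mu_i \Delta x$ and $\zeta_i = \nu_i \Delta x$; the symbols $\nu_i$ and $H_{\Delta}$ are introduced here for illustration only.

```latex
\begin{align*}
H_{\mathrm{PMF}} &= -\sum_{i=1}^{N} \pi_i \log \pi_i
                  = -\sum_{i=1}^{N} \mu_i \,\Delta x \,\log(\mu_i \,\Delta x) \\
                 &= \underbrace{-\sum_{i=1}^{N} \mu_i \log \mu_i \,\Delta x}_{H_{\Delta}}
                    \;-\; \log \Delta x \sum_{i=1}^{N} \pi_i
                  = H_{\Delta} - \log \Delta x , \\[4pt]
D_{KL}(\vec{\pi} \,\|\, \vec{\zeta}) &= \sum_{i=1}^{N} \pi_i \log \frac{\pi_i}{\zeta_i}
                  = \sum_{i=1}^{N} \mu_i \log \frac{\mu_i}{\nu_i} \,\Delta x .
\end{align*}
```

Here $H_{\Delta}$ is the discretized form of (2). The PMF entropy therefore carries an offset of $-\log \Delta x$ that depends on the bin width, whereas the factor ∆x cancels inside the logarithm of the KLD when both distributions share the same bins.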

3. Computational Methodology

PDF estimation from observational data is nontrivial and remains an active area of research; an excellent overview is given in [11]. I used an optimal binning scheme derived from Bayesian principles [12]. The technique’s underlying assumptions are threefold: a discrete uniform prior distribution for the number of bins within a feasible range, a Dirichlet prior distribution for the bin probabilities, and a piecewise constant PDF with uniform bins. Other prior knowledge I comprises the data sample’s size N, range V, and implied bin boundary locations assuming uniform widths. The posterior probability $P(M|\vec{d}, I)$ that a PDF with M bins describes the data sample $\vec{d}$ is [12]

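\[ P(M|\vec{d}, I) \propto \left(\frac{M}{V}\right)^{N} \frac{\Gamma\!\left(\frac{M}{2}\right)}{\Gamma\!\left(\frac{1}{2}\right)^{M}} \, \frac{\prod_{k=1}^{M} \Gamma\!\left(n_k + \frac{1}{2}\right)}{\Gamma\!\left(N + \frac{M}{2}\right)}, \tag{4} \]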

where Γ(·) is the gamma function and $n_k$ is the number of counts in the kth bin. The value of M that maximizes $P(M|\vec{d}, I)$ yields the most probable piecewise constant, uniform-bin-width PDF for $\vec{d}$. Maximizing the logarithm of the right-hand side (RHS) of (4) is computationally easier and also maximizes $P(M|\vec{d}, I)$. The optimally estimated PDF is parameterized as

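\[ \mu_i = \left(\frac{M}{V}\right)\left(n_i + \frac{1}{2}\right)\left(N + \frac{M}{2}\right)^{-1} \tag{5} \]

\[ \sigma_i^2 = \left(\frac{M}{V}\right)^{2}\left(n_i + \frac{1}{2}\right)\left(N - n_i + \frac{M-1}{2}\right)\left(N + \frac{M}{2} + 1\right)^{-1}\left(N + \frac{M}{2}\right)^{-2}, \tag{6} \]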

where $\mu_i$ and $\sigma_i^2$ are the bin probability densities and associated variances, respectively [12]. Even when data are absent from a bin, $\mu_i \neq 0$, and the $\mu_i$ are normalized by construction. The density and its associated uncertainties are defined by (5) and (6), respectively, thus answering Q1. Densities computed in this manner are used to evaluate all SE- and KLD-based statistics presented in this paper.
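Equations (4) through (6) translate fairly directly into code. The following is a minimal sketch under stated assumptions, not the author’s implementation: the function names, the search range of 1 to 50 bins, the use of the sample minimum and maximum as the range V, and the synthetic Gaussian sample are all choices made for this example.

```python
import math
import random


def bin_counts(data, m, lo, hi):
    """Histogram counts n_k for m uniform bins spanning [lo, hi]."""
    counts = [0] * m
    width = (hi - lo) / m
    for x in data:
        k = min(int((x - lo) / width), m - 1)  # x == hi goes in the last bin
        counts[k] += 1
    return counts


def log_posterior(data, m, lo, hi):
    """Logarithm of the RHS of (4), up to an additive constant."""
    n = len(data)
    v = hi - lo
    counts = bin_counts(data, m, lo, hi)
    return (n * math.log(m / v)
            + math.lgamma(m / 2.0)
            - m * math.lgamma(0.5)
            + sum(math.lgamma(n_k + 0.5) for n_k in counts)
            - math.lgamma(n + m / 2.0))


def optimal_bins(data, max_bins=50):
    """Number of bins M in 1..max_bins that maximizes the log posterior."""
    lo, hi = min(data), max(data)
    return max(range(1, max_bins + 1),
               key=lambda m: log_posterior(data, m, lo, hi))


def density_and_variance(counts, v):
    """Bin densities mu_i from (5) and variances sigma_i^2 from (6)."""
    m, n = len(counts), sum(counts)
    scale = m / v
    denom = n + m / 2.0
    mu = [scale * (n_i + 0.5) / denom for n_i in counts]
    var = [scale ** 2 * (n_i + 0.5) * (n - n_i + (m - 1) / 2.0)
           / ((denom + 1.0) * denom ** 2) for n_i in counts]
    return mu, var


# Synthetic sample (hypothetical, NOT CET data).
rng = random.Random(0)
sample = [rng.gauss(10.0, 5.0) for _ in range(500)]
v_range = max(sample) - min(sample)
m_best = optimal_bins(sample)
counts = bin_counts(sample, m_best, min(sample), max(sample))
mu, var = density_and_variance(counts, v_range)
```

Working with `math.lgamma` rather than the gamma function itself avoids overflow for realistic sample sizes, which is the practical reason for maximizing the logarithm of (4).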

Another desirable characteristic of this PDF estimation scheme is that it can detect severely truncated data [13]. Truncated data will clump at their truncated values, adding fine structure to a PDF’s broad shape and spawning a series of local maxima in the RHS of (4) that grow with M (cf. Figure 5D in [12]). This presents a problem when applied to the CET data. The CET data are truncated to the nearest 0.1 °C, and early application of this technique encountered the problem of successive maxima in the RHS of (4). Truncation effects can be removed by adding a random uniform deviate that brackets the rounded value by half the truncation value; this will not replace lost information [13] but does allow estimation of the PDF’s large-scale structure. I have smoothed the CET data by adding to each observation a uniform random deviate δi ∈ [−0.05, 0.05); this smoothing allows PDF estimation using (4) and, when truncated to the nearest 0.1 °C, yields the original timeseries. Multiple random smoothings of the CET have been performed, and SE and KLD results computed from the different smoothings agree [10].
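The smoothing step can be sketched as follows. The function names, the seed, and the sample values are hypothetical; only the half-open interval [−0.05, 0.05) and the 0.1 °C resolution come from the text.

```python
import random


def smooth_truncated(values, resolution=0.1, seed=0):
    """Add a uniform deviate in [-resolution/2, resolution/2) to each value."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so the deviate lies in [-0.5, 0.5) * resolution.
    return [x + (rng.random() - 0.5) * resolution for x in values]


def retruncate(values, decimals=1):
    """Round back to the original reporting precision (0.1 by default)."""
    return [round(x, decimals) for x in values]


# Hypothetical observations reported to the nearest 0.1 degree C (NOT CET data).
observed = [9.3, -1.2, 15.0, 0.1, 7.7]
smoothed = smooth_truncated(observed)
recovered = retruncate(smoothed)
```

Rounding `smoothed` back to one decimal place returns `observed`, which mirrors the property stated above that the smoothed series reproduces the original timeseries when truncated again.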

4. Central England Temperature Record

The CET is the longest observational record for surface air temperature and one of the most thoroughly studied [14]. Manley’s original CET [15, 4] comprises monthly averages beginning in 1659. The daily CET [16] (Figure 1(a)) spans the period 1772–present for average temperatures (Tavg) and 1878–present for minimum (Tmin) and maximum (Tmax) temperatures. Sampling periods were 1659–2009 for monthly Tavg, 1772–2006 for daily Tavg, and 1878–2006 for (Tmin, Tmax). All data were obtained from the British Atmospheric Data Centre [17]. CET daily and monthly temperature values are rounded to the nearest 0.1 °C, with the exception of the monthly record, which has low (0.5–1.0 °C) precision for the periods 1659–1699 and 1707–1721.

The CET has a secular warming trend (Figure 1(a)) [14], which emerges after 1900. The 18th and 19th centuries are periods of relative climate stability in that there are oscillations but no overall trend in the CET [14]. The CET exhibits oscillatory behavior at multiple periods up to and beyond the century scale [18]. Figures 1(b) and 1(c) show the daily CET average temperature PDFs ρfull(Tavg) for the full record 1772–2006 and ρpreind(Tavg) for the preindustrial era 1772–1870, respectively. These PDFs were obtained by using the technique outlined in Section 3.

Figure 1: CET: (a) daily Tavg timeseries (1772–2006), (b) full record ρfull(Tavg) (1772–2006), and (c) preindustrial record ρpreind(Tavg) (1772–1870). [Panel (a) plots temperature (°C) against date for the original timeseries and its 1-year, 5-year, and 10-year averages; panels (b) and (c) plot probability density against temperature (°C).]


5. CET Probability Density Functions and Informatic Variability

The time-dependent PDF ρ(t,Tavg) for CET monthly averages (Figure 2(a)) was computed by using (5). The PDF is bimodal, with the lower (upper) mode steady during the 17th century at about 5 °C (15 °C). The upper mode is broadened and flattened around the year 1800 and reappears shortly afterward, trending gently upward through the late 20th century. The lower mode is broadened and flattened around the year 1820, reappearing shortly afterward. This mode shifts toward warmer temperatures during the 19th and 20th centuries more dramatically than does the upper mode, and the shift steepens after 1950. The early years of the sample 1659–1721 show a more detailed multimodal structure. The reason is that the optimal binning scheme chose more bins for 30-year windows during this period, a consequence of the algorithm’s ability to detect truncated data.

Uncertainties in ρ(t,Tavg) (Figure 2(b)) were computed from (6) and show higher values throughout the temperature spectrum for the period 1659–1720, casting doubt on their associated PDF values in Figure 2(a). The ratio of the values in Figure 2(a) to those in Figure 2(b) defines a signal-to-noise (S/N) ratio (Figure 2(c)) that quantifies confidence in ρ(t,Tavg). Note that little confidence (S/N < 10) is associated with any values of ρ(t,Tavg) for t ∈ [1659, 1721], except for the higher-precision period 1699–1706. The values of ρ(t,Tavg) in the vicinity of its modes after the year 1721 are significant (S/N > 10).
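As a small illustration of the S/N ratio, the sketch below divides each bin density by its uncertainty. It assumes that the plotted uncertainty is the standard deviation, that is, the square root of the variance from (6); the text does not state this explicitly, and the function names and numbers are hypothetical.

```python
import math


def signal_to_noise(mu, var):
    """Return mu_i / sigma_i per bin, with sigma_i = sqrt(var_i) (assumed)."""
    return [m / math.sqrt(v) for m, v in zip(mu, var)]


def confident_bins(mu, var, threshold=10.0):
    """Flag bins whose S/N exceeds the threshold (10 in the text above)."""
    return [ratio > threshold for ratio in signal_to_noise(mu, var)]


# Hypothetical densities and variances for two bins (NOT CET values).
mu = [0.04, 0.001]
var = [1.0e-6, 1.0e-6]
flags = confident_bins(mu, var)  # first bin passes the threshold, second does not
```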

Figure 2: CET monthly Tavg: (a) ρ(t,Tavg), (b) uncertainties in ρ(t,Tavg), and (c) signal-to-noise ratio for ρ(t,Tavg).

PDFs for the daily CET are presented in Figure 3. The PDF ρ(t,Tavg) (Figure 3(a)) is bimodal, with the lower (upper) mode centered near 7 °C (14 °C). The lower mode is less pronounced before 1830. The warming period 1910–1950 is present with a narrowing and upward shift of the lower mode, which stabilizes for the period 1950–1975 before again shifting upward and broadening. This alternating strengthening (weakening) of the upper mode while the lower mode is weak (strong) is a pattern that is seen during the period 1790–1950 and has a wavelike character. If this pattern is considered a wave, its period τ may be estimated by measuring the time interval between peaks in the upper mode, or the width of the “island” in the lower mode; both approaches yield a period of τ ≈ 125 years. The S/N ratios for ρ(t,Tavg) (Figure 3(d)) show significant values broadly in the band −1 °C < Tavg < 21 °C, with the central area around the modes highly significant (S/N > 20). Within this region, PDF isopleths shift upward dramatically after 1980. The PDF ρ(t,Tmin) (Figure 3(b)) shows a single mode. This mode is broad around 1930, with width ∆Tmin ≈ 7 °C and center Tmin ≈ 5.5 °C. After 1930, the mode narrows dramatically to ∆Tmin ≈ 4 °C, with a slight warming up until 1950. The mode stabilizes and broadens near 1960, with width ∆Tmin ≈ 7.5 °C, and its center shifts upward to Tmin ≈ 6.0 °C. During the period 1960–1980, the mode broadens further to ∆Tmin ≈ 8.5 °C, and its center remains stationary. After 1980, the mode narrows, with its lower boundary PDF isopleth moving upward much more dramatically (a shift of nearly 2.0 °C) than its upper counterpart. The mode’s center shifts upward to 8 °C. This narrowing and rebroadening of the mode may be considered a wave structure in ρ(t,Tmin); if viewed that way, it has a period τ ≈ 70 years, based on measurement between the centers of the broad portions of the mode. S/N values in Figure 3(e) for ρ(t,Tmin) show significance for Tmin ∈ [−5, 15], high significance for Tmin ∈ [−2, 13], and highest significance values clustered about Tmin ∈ [0, 11], covering the PDF’s shifting mode. Within this high S/N zone, PDF isopleths in Figure 3(b) show weak variation superimposed on a slow warming trend of approximately 1.0 °C for the period 1878–2006. The PDF ρ(t,Tmax) (Figure 3(c)) is bimodal, though neither mode is present for the full record. The upper mode has width and center ($\Delta T^{U}_{max} \approx 3$ °C, $T^{U}_{max} \approx 17.5$ °C) and is most evident during the period 1910–1980. Its weakness before 1910 appears to be due to broadening of the midrange of Tmax. The disappearance of this mode after 1980 is caused by midrange broadening and fattening of the high-Tmax tail. The lower mode has width and center ($\Delta T^{L}_{max} \approx 4$ °C, $T^{L}_{max} \approx 10.5$ °C). It is most pronounced during the periods 1893–1937 and 1973–2006. It disappears briefly around 1955, just when the upper mode is at its strongest. Taken together, these modes suggest an oscillation in ρ(t,Tmax) with a period of τ ≈ 80 or τ ≈ 70 years if estimated from the peaks in the lower mode or width of the upper mode, respectively. All these results are significant for 1 °C < Tmax < 24 °C and highly significant for 4 °C < Tmax < 21 °C (Figure 3(f)).

Figure 3: Daily CET PDFs: (a) ρ(t,Tavg) 1772–2006, (b) ρ(t,Tmin) 1878–2006, and (c) ρ(t,Tmax) 1878–2006. Signal-to-noise ratio in time-dependent PDFs for (d) Tavg, (e) Tmin, and (f) Tmax.

The oscillatory nature of the PDFs is striking and may be related to previously known oscillations found in the monthly CET by Benner [18]. Benner estimated power spectra for the CET using four techniques: the fast Fourier transformation (FFT), the Lomb-Scargle periodogram (LSP), singular spectrum analysis (SSA), and the global wavelet spectrum (GWS). He found many different oscillation periods in the CET’s spectrum, and three of these techniques (FFT, LSP, and GWS) identified periods near those in the CET daily ρ(t,Tavg), ρ(t,Tmin), and ρ(t,Tmax). The 125-year oscillation that appears in ρ(t,Tavg) may be related to the long-period oscillation in the monthly CET Tavg timeseries that has τFFT = 113, τLSP = 112.97, and τGWS = 108.53 years. The 70–80-year oscillation that appears in ρ(t,Tmin) and ρ(t,Tmax) may be related to the interdecadal oscillation in the monthly CET Tavg timeseries that has τFFT = 67.8, τLSP = 67.78, and τGWS = 69.77 years, possibly the Atlantic Multidecadal Oscillation (AMO) [19].

Figure 4: Thirty-year windowed Shannon entropy H for CET daily (a) Tavg (1772–2006), (b) Tmin (1878–2006), and (c) Tmax (1878–2006). [Each panel plots Shannon entropy H (bits) against the center year of the 30-year sample window.]

SE timeseries H(t) were computed from the ρ(t,T) by using (2). Uncertainty quantification was performed by computing H using Monte Carlo–generated ensembles of 10,000 “neighboring” PDFs defined by (5) and (6) (Figure 4). The box-whisker symbols are defined with whiskers corresponding to the 1st and 99th percentiles, box edges to the 25th and 75th percentiles, and box center to the median. The dramatic up/down jumps in H with respect to time are thus unlikely to be numerical artifacts. The timeseries of H(Tavg) (Figure 4(a)) shows a pronounced peak at its maximum value shortly before 1820, when the lower mode was weak to nonexistent; this broadening would dramatically lower local values of the PDF and thus create larger logarithmic terms in (2). The periods with low (high) values of H(Tavg) frequently correspond to times when both the upper and lower modes in ρ(t,Tavg) are present (absent or weak), but this correspondence is not complete. The SE H(Tmax) (Figure 4(c)) is equally hard to interpret. Each jump or dip in H(Tmax) indicates some structural change in ρ(t,Tmax), but identifying the responsible feature(s) is difficult. The simplest of the SE plots to interpret is that of H(Tmin) (Figure 4(b)) because ρ(t,Tmin) is practically unimodal. The broad high value during 1895–1905 appears to be due to splitting of the mode. Narrow local maxima in H(Tmin) are present at other times when the mode is split, namely, shortly before 1920 and 1930 and in the early 1980s. A wide, almost uninterrupted trough in H(Tmin) for 1930–1950 appears to be caused by the sharpening of the mode. By contrast, an even lower trough during 1965–1982 coexists with the mode being at its widest. Although the SE is clearly a sensitive integrated measure, its integrative nature obscures which features change its value. Thus, Q2 is answered, but the answer’s utility is unclear.
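One way such an ensemble could be generated is sketched below. The text does not say how the “neighboring” PDFs were drawn, so the Gaussian perturbation of each bin density, the clipping to positive values, the renormalization, the nearest-rank percentile, and all names and numbers here are assumptions made purely for illustration.

```python
import math
import random


def entropy_bits(densities, dx):
    """Discretized form of (2) in bits for a piecewise-constant PDF."""
    return -sum(mu * math.log2(mu) * dx for mu in densities if mu > 0.0)


def entropy_ensemble(mu, sigma, dx, members=1000, seed=0):
    """Sorted entropies of perturbed PDFs (ASSUMED Gaussian perturbation)."""
    rng = random.Random(seed)
    values = []
    for _ in range(members):
        # Perturb each bin density, keep it positive, then renormalize.
        trial = [max(rng.gauss(m, s), 1e-12) for m, s in zip(mu, sigma)]
        norm = sum(trial) * dx
        values.append(entropy_bits([t / norm for t in trial], dx))
    return sorted(values)


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    rank = math.ceil(pct / 100.0 * len(sorted_values)) - 1
    return sorted_values[max(0, min(len(sorted_values) - 1, rank))]


# Hypothetical three-bin PDF with 10-degree-wide bins (NOT CET values).
mu = [0.02, 0.06, 0.02]
sigma = [0.002, 0.002, 0.002]
ensemble = entropy_ensemble(mu, sigma, dx=10.0)

# Box-whisker summary: 1st, 25th, 50th, 75th, and 99th percentiles.
box = [percentile(ensemble, p) for p in (1, 25, 50, 75, 99)]
```

If the spread between the whiskers in `box` is small compared with the jumps in H between neighboring windows, the jumps are unlikely to be artifacts of the density estimate, which is the argument made above.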

Figures 5 (a,c,e) collectively answer Q3 for the CET monthly Tavg and daily (Tavg,Tmin, Tmax). Figure 5(a) showsthe KL gain from any 30-year sample window’s PDF, to predict PDFs derived from the full sample (1659–2009; solidline) and the preindustrial era (1659–1869; dotted line). From previous discussion, the high KLD values for 1659–1750 are suspect because of truncation effects during the period 1659–1721. KLD values for the period 1750–1890are relatively low, with some oscillatory behavior. After 1890 there are dramatic jumps in the KLD to the whole-record PDF and even more dramatic increases in the KLD to the preindustrial PDF. Some drop occurs during thestabilization period 1950–1975, followed by even more dramatic increases after 1975. Late in the 20th century theKL gain to the preindustrial PDF is higher than for any other part of the CET record, signifying strong structuralchange in the PDF since the preindustrial era. The KL gains from the late 20th century PDFs ρ(t,Tavg) are also high.Overall, the KL gain identifies much of the 20th century’s time-evolving climate PDFs as distinctly weaker in theirability to predict the long-scale record, and thus structurally fundamentally different. The daily Tavg (Figure 5(c))shows a similar degradation in skill through increasing KLD values during the 20th century. The representativeness

Page 8: Visualizing Climate Variability with Time-Dependent ... · Visualizing Climate Variability with Time-Dependent Probability Density Functions, Detecting it with Information Theory

J.W. Larson / Procedia Computer Science 00 (2012) 1–11 8

[Figure 5, panels (a)–(f); plot images not reproduced. Horizontal axes: center of the 30-year sampling window (year), the q-sampling window in panels (a), (c), (e) and the p-sampling window in panels (b), (d), (f). Vertical axes: DKL(p ‖ q) (bits). Panels (a), (b): monthly CET Tavg, 1659–2009, with curves for the whole record (1659–2009) and the preindustrial record (1659–1869). Panels (c), (d): daily CET Tavg, 1772–2006, with curves for the whole record (1772–2006) and the preindustrial record (1772–1869). Panels (e), (f): daily CET extrema, 1878–2006, with curves for Tmin and Tmax.]

Figure 5: CET Kullback-Leibler divergences. Divergences from 30-year sampling windows to larger records (KL gain from 30-year window) for (a) monthly CET Tavg, (c) daily CET Tavg, and (e) daily CET Tmin and Tmax. Divergences from larger record to 30-year sampling windows (KL gain to 30-year window) for (b) monthly CET Tavg, (d) daily CET Tavg, and (f) daily CET Tmin and Tmax.

of the CET extreme values for their whole record 1878–2006 (Figure 5(e)) shows marked contrast between Tmin and Tmax. The KL gain from ρ(t,Tmax) is highly cyclic, with strong peaks at 1895, 1918, 1955, and 1990, which suggest a connection to the oscillation present in ρ(t,Tmax). The 1990 peak is the highest, confirming that ρ(1990,Tmax) is the most distantly related PDF to the complete record’s PDF, again suggesting that recent Tmax climatology is distinctly different from the century that preceded it. The KL gain from ρ(t,Tmin) to the complete record’s ρ(Tmin) has high peaks around 1895 and 1995, with weaker variability imposed on a trough spanning 1905–1980. Again, this suggests that any 30-year period centered on one of the years 1905–1980 is much more representative than the peaks at either end of the record. For the period 1980–1995 the KL gain required to predict ρ(Tmin) from ρ(t,Tmin) is nearly as high as the highest value seen in 1895 and suggests that recent Tmin climatology is relatively alien compared with the full record.

Figures 5(b,d,f) collectively answer Q4 for the CET monthly Tavg and daily (Tavg, Tmin, Tmax), that is, the amount of additional information needed to predict ρ(t,T), given the full record’s PDF ρ(T). The monthly Tavg shows the KL gain to any 30-year sample-generated ρ(t,Tavg) from the full and preindustrial PDFs to be large for the early, deprecated part of the record and the period after 1975. A long trough covers 1740–1975, with some weaker variability superimposed on it (Figure 5(b)). This is true for the full-sample and preindustrial PDFs. Significantly, the largest values of these KL gains lie in the period after 1975, again demonstrating that recent climatology is odder than any other 30-year period. For the daily Tavg the KL gain to ρ(t,Tavg) from the full-record and preindustrial PDFs (Figure 5(d)) identifies the period after 1980 as being the most distantly related to the full and preindustrial records. The gains necessary to predict recent sample PDFs from the full and preindustrial records are dramatically greater than those seen for the 1910–1950 warming period. It is no surprise that the information gain required to predict ρ(t,Tavg) from the preindustrial sample rises steadily from 1870 forward, a consequence of the p- and q-sampling windows becoming disjoint after 1900 and then being separated by increasing time lags up to the present. The respite from this rising trend is associated with the 1950–1975 cooling/stabilization period. The information gain from the PDF of the full daily CET extrema record 1878–2006 to any 30-year windowed PDF (Figure 5(f)) is structurally similar to Figure 5(e) but with slightly lower peaks. The reason is that the q-window used to generate the full PDF contains the reference PDF’s p-window as a subset. Note again that the full record’s PDF has the most trouble predicting PDFs generated from windows centered on years after 1985, identifying the climatology of daily CET extreme temperatures as distinctly odd with respect to the full observational record.
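As an illustration of how the two KL gains discussed above might be computed, the following Python sketch bins a record and one 30-year window on shared histogram edges and evaluates DKL(p ‖ q) in both directions. It is a minimal sketch under stated assumptions, not the procedure used to produce Figure 5: the fixed-width bins, the small floor added to empty bins, the synthetic data, and the names histogram_pmf, kld_bits, and window_gains are all hypothetical choices made for this example.

```python
import numpy as np

def histogram_pmf(samples, edges, floor=1e-12):
    """Bin samples on shared edges and return probability masses summing to 1.

    The small floor is an assumption of this sketch; it keeps empty bins
    from producing an infinite divergence.
    """
    counts, _ = np.histogram(samples, bins=edges)
    pmf = counts.astype(float) + floor
    return pmf / pmf.sum()

def kld_bits(p, q):
    """D_KL(p || q) in bits for discrete PMFs (assumes q > 0 wherever p > 0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p = 0 contribute nothing to the sum
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def window_gains(years, temps, center, edges, width=30):
    """Return (gain_from_window, gain_to_window) for one window center."""
    half = width / 2
    in_window = (years >= center - half) & (years < center + half)
    full = histogram_pmf(temps, edges)
    window = histogram_pmf(temps[in_window], edges)
    gain_from = kld_bits(full, window)  # q = window PDF, p = full-record PDF
    gain_to = kld_bits(window, full)    # p = window PDF, q = full-record PDF
    return gain_from, gain_to

# Synthetic monthly series with a weak trend (a stand-in, NOT the CET data).
rng = np.random.default_rng(0)
years = np.repeat(np.arange(1800, 2000), 12)
temps = rng.normal(9.0, 4.5, size=years.size) + 0.01 * (years - 1800)
edges = np.linspace(temps.min(), temps.max(), 21)

gain_from, gain_to = window_gains(years, temps, 1985, edges)
print(f"KL gain from window: {gain_from:.4f} bits, to window: {gain_to:.4f} bits")
```

Sliding the window center across the record and plotting each gain against the center year would give curves of the kind described for Figure 5. The two directions generally differ because the KLD is not symmetric in p and q.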


Figure 6: Time-shifted Kullback-Leibler divergences for the daily CET: (a) Tavg (1772–2006), (b) Tmin (1878–2006), and (c) Tmax (1878–2006).

The answer to Q5 can be found by computing time-shifted KLD values that compare each 30-year window to every other 30-year window present in the observational record. Analysis of these results will provide clues to whether we can answer Q6 as well. Time-shifted KLD values for the daily CET are shown in Figure 6. These plots have as abscissa (ordinate) the center year tq (tp) of a 30-year sampling window used to generate the PDF ρ(tq,T) (ρ(tp,T)). The value plotted at the point (tq, tp) is DKL(p ‖ q); this value is zero on the line tq = tp because DKL(p ‖ p) = 0. For a fixed value of tq, points vertically above (below) the point (tq, tq) signify the ability of this windowed PDF to predict future (past) climate PDFs. The CET daily Tavg time-shifted KLD results (Figure 6(a)) reveal a block-diagonal structure comprising multiple low-KLD-value regions; these correspond to periods of relative climate PDF stability in that any 30-year window from this range can, with relatively high accuracy, predict another 30-year windowed PDF in this range. Stable time ranges identified are 1800–1850, 1830–1910, 1910–1950, and 1950–1975. Note that some of these intervals touch or overlap; overlapping intervals signify a gentler transition in terms of PDF structure, while intervals that merely touch indicate a more abrupt change in PDF structure. These results are consistent with Figure 3(a). Off-diagonal blocks of low KLD values indicate periodicity expressed as similarities between windowed PDFs from one period and another. In particular, the intervals 1950–1970 and 1830–1870 generate similar PDFs. Note further that only a modest diagonal band is found after 1980, indicating relatively rapid change in ρ(t,Tavg) during this period. Also note that the highest time-shifted KLD values are in vertical (horizontal) bands defined by tq > 1980 (tp > 1980), indicating that climate PDFs centered after 1980 are distinctly different from pre-20th century climate.
Time-shifted KLD values for the CET daily Tmin record (Figure 6(b)) show a narrow block-diagonal structure, indicating a series of overlapping short periods of relative PDF stability. Strong contrasts between the warming period of the early 20th century and the mid-century cooling/stabilization period that follows it are evident in relatively strong KLD values. The sampling period after 1980 again appears distinctly different from most other time periods. The time-shifted KLD values for Tmax (Figure 6(c)) show a diagonal band whose width varies from 5 to 15 years, with some block-diagonal structures associated with the periods 1910–1930, 1940–1960, and 1960–1975. These periods are times of relatively slow change in ρ(t,Tmax). Off-diagonal low-KLD-value blocks indicate periodicity, with the intervals 1930–1950 and 1960–1980 appearing related. Off-diagonal high-KLD blocks indicate a strong divergence between the statistics associated with the sampling periods centered on the intervals 1910–1925 and 1945–1960. Note the strong divergence between PDFs associated with sampling windows centered on years after 1980 and the rest of the record, again signaling a fundamental shift in the structure of ρ(t,Tmax).
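The time-shifted comparison described above can be sketched as a matrix of pairwise divergences between windowed PDFs. The Python example below is a hedged illustration only: the synthetic series with an imposed step change, the fixed-width bins, the floor on empty bins, and the function names windowed_pmfs and time_shifted_kld are assumptions of this sketch and are not taken from the analysis behind Figure 6.

```python
import numpy as np

def windowed_pmfs(years, values, centers, edges, width=30, floor=1e-12):
    """Return one row of probability masses per window center (shared bins)."""
    half = width / 2
    rows = []
    for c in centers:
        sel = (years >= c - half) & (years < c + half)
        counts, _ = np.histogram(values[sel], bins=edges)
        pmf = counts.astype(float) + floor  # floor avoids log(0); an assumption
        rows.append(pmf / pmf.sum())
    return np.array(rows)

def time_shifted_kld(pmfs):
    """Matrix K with K[i, j] = D_KL(p_i || q_j) in bits.

    Row index i selects the p-window and column index j the q-window,
    so the diagonal (same window on both sides) is zero.
    """
    p = pmfs[:, None, :]  # shape (n, 1, bins)
    q = pmfs[None, :, :]  # shape (1, n, bins)
    return np.sum(p * np.log2(p / q), axis=-1)

# Synthetic daily series with an abrupt shift in 1900 (NOT the CET data).
rng = np.random.default_rng(1)
years = np.repeat(np.arange(1780, 2000), 365)
values = rng.normal(9.0, 5.0, size=years.size)
values[years >= 1900] += 3.0

centers = np.arange(1800, 1981, 10)
edges = np.linspace(values.min(), values.max(), 26)
K = time_shifted_kld(windowed_pmfs(years, values, centers, edges))

i1820, i1850, i1960 = 2, 5, 16  # indices of those centers in `centers`
print("same regime :", K[i1820, i1850])
print("across shift:", K[i1820, i1960])
```

Displaying K as an image with the column index (tq) on the horizontal axis and the row index (tp) on the vertical axis would match the axis convention described for Figure 6. In this synthetic case, windows on the same side of the imposed shift form a low-valued block, while pairs that straddle it give larger divergences.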


6. Conclusions and Future Work

An exploratory data analysis/information-theoretic framework to analyze climate variability has been developed and applied to the Central England Temperature record. Viewing the 30-year windowed PDFs has provided, at a glance, deep insight into the evolution of CET monthly and daily average and daily extreme temperatures. The time-dependent PDFs are consistent with and add new detail to the CET’s known warming trend. These PDFs also exhibit oscillatory behavior that may be connected to known interdecadal and century-scale oscillations present in the CET monthly average timeseries. The KLD-based measures of representativeness and oddness put the climate of the past 30 years in stark contrast with respect to the full and preindustrial observational records. The recent climatology of CET average, maximum, and minimum temperatures is distinctly different from that of previous observed times. Time-shifted KLD metrics have identified previously known periods of relative climatic stability. The metrics cast the climate of recent decades as distinctly different and changing rapidly with respect to the past century’s climate.

The results reported here are preliminary. Near-term areas of future investigation include using other density estimation techniques to verify these results, applying spectral techniques to the time-dependent PDFs to search more thoroughly for periodicity, developing automatic feature detection schemes to identify periods of climatic stability and change, and applying these techniques to larger observational and model-generated data sets. The long-term goal of this work is to determine whether equations of evolution for time-dependent climate PDFs, something akin to (1), may be reliably estimated from timeseries data and whether such empirical models have any predictive power.

Acknowledgment

This work was supported by the U.S. Department of Energy, under Contract DE-AC02-06CH11357.

References

[1] M. Hulme, S. Dessai, I. Lorenzoni, D. R. Nelson, Unstable climates: Exploring the statistical and social constructions of ’Normal’ climate, Geoforum 40 (2009) 197–206. doi:10.1016/j.geoforum.2008.09.010.

[2] International Meteorological Organization, Proceedings of the Meetings in Danzig and Warsaw, 29–31 August and 12 September 1935, Secretariat of the IMO, Leyden, 1937.

[3] K. Hasselmann, Stochastic climate models, Part I: Theory, Tellus 28 (6) (1976) 473–485. doi:10.1111/j.2153-3490.1976.tb00696.x. URL http://dx.doi.org/10.1111/j.2153-3490.1976.tb00696.x

[4] G. Manley, Central England temperatures: Monthly means 1659 to 1973, Quarterly Journal of the Royal Meteorological Society 100 (425) (1974) 389–405.

[5] C. E. Shannon, A mathematical theory of communication, Bell System Technical Journal 27 (1948) 379–423.

[6] T. M. Cover, J. A. Thomas, Elements of Information Theory, 2nd Edition, Wiley-Interscience, New York, 2006.

[7] T. DelSole, M. Tippett, Predictability: Recent insights from information theory, Reviews of Geophysics 45 (2007) RG4002.

[8] J. Shukla, T. DelSole, M. Fennessy, J. Kinter, D. Paolino, Climate model fidelity and projections of climate change, Geophysical Research Letters 33 (2006) L07702.

[9] J. W. Larson, Information-theoretic strategies for quantifying variability and model-reality comparison in the climate system, in: R. S. Anderssen, R. D. Braddock, L. T. H. Newham (Eds.), Proceedings of the 18th World IMACS Congress and MODSIM09 International Congress on Modelling and Simulation, Modelling and Simulation Society of Australia and New Zealand and International Association for Mathematics and Computers in Simulation, 2009, pp. 2639–2646.

[10] J. W. Larson, Can we define climate using information theory?, IOP Conference Series: Earth and Environmental Science 11 (1) (2010) 012028.

[11] D. W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization, Wiley, New York, 1992.

[12] K. H. Knuth, Optimal data-based binning for histograms, http://arxiv.org/abs/physics/0605197 (2006).

[13] K. H. Knuth, J. P. Castle, K. R. Wheeler, Identifying excessively rounded or truncated data, in: A. Rizzi, V. Maurizio (Eds.), Proceedings of the 17th meeting of the International Association for Statistical Computing—European Regional Section: Computational Statistics (COMPSTAT 2006), Springer, 2006.

[14] P. D. Jones, M. Hulme, The changing temperature of central England, in: M. Hulme, E. Barrow (Eds.), Climates of the British Isles, Present, Past, and Future, Routledge, London, 1997, pp. 173–196.

[15] G. Manley, The mean temperature of central England 1698–1952, Quarterly Journal of the Royal Meteorological Society 79 (340) (1952) 242–261.

[16] D. E. Parker, T. P. Legg, C. K. Folland, A new daily central England temperature series 1772–1991, International Journal of Climatology 12 (1992) 317–342.

[17] British Atmospheric Data Centre, Hadley Centre, UK Meteorological Office, Historical Central England Temperature (CET) Data, http://badc.nerc.ac.uk/data/cet/ (2009).

[18] T. C. Benner, Central England temperatures: long-term variability and teleconnections, International Journal of Climatology 19 (1999) 391–403.

[19] M. E. Schlesinger, N. Ramankutty, An oscillation in the global climate system of period 65-70 years, Nature 367 (6). doi:10.1038/367723a0.


Government License

The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory ("Argonne"). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.