an introduction to statistical extreme value theory · an introduction to statistical extreme value...

36
An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Upload: others

Post on 27-Mar-2020

12 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

An Introduction toStatistical Extreme Value Theory

Uli Schneider

Geophysical Statistics Project, NCAR

January 26, 2004

NCAR

Page 2: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Outline

Part I - Two basic approaches to extreme valuetheory – block maxima, threshold models.

Part II - Uncertainty, dependence, seasonality,trends.

Tutorial in Extreme Value Theory

Page 3: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Fundamentals

In classical statistics: model the AVERAGE behaviorof a process.

In extreme value theory: model the EXTREMEbehavior (the tail of a distribution).

Usually deal with very small data sets!

Tutorial in Extreme Value Theory

Page 4: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Fundamentals

In extreme value theory: model the EXTREMEbehavior (the tail of a distribution).

Usually deal with very small data sets!

Tutorial in Extreme Value Theory

Page 5: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Fundamentals

In extreme value theory: model the EXTREMEbehavior (the tail of a distribution).

Usually deal with very small data sets!

Tutorial in Extreme Value Theory

Page 6: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Different Approaches

Block Maxima (GEV)

Rth order statistic

Threshold approach (GPD)

Point processes

Tutorial in Extreme Value Theory

Page 7: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Block Maxima Approach

Model extreme daily rainfall in Boulder

Take “block maximum” – maximum daily precipitationfor each year: Mn = max{X1, . . . , X365}

54 annual records (data points for Mn):

1950 1960 1970 1980 1990 2000

100

200

300

400

Annual maximum of daily rainfall for Boulder (1948−2001)

years

max

. dai

ly p

reci

p in

1/1

00 in

Tutorial in Extreme Value Theory

Page 8: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Block Maxima Approach

The distribution of Mn = max{X1, . . . , Xn} convergesto (as n → ∞)

G(x) = exp{−[1 + ξ(x − µ

σ)]−

1

ξ }.

G(x) is called the “Generalized Extreme Value”(GEV) distribution and has 3 parameters:

shape parameter ξ

location parameter µ

scale parameter σ.

Tutorial in Extreme Value Theory

Page 9: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Fitting a GEV – Estimating Parameters

Use the 54 annual records to fit the GEV distribution.

Estimate the 3 parameters ξ, µ and σ with “maximumlikelihood” (MLE) using statistical software (R).

Get a GEV distribution with ξ = 0.09, µ = 50.16, andσ = 133.85.

Den

sity

100 200 300 400 500

0.00

00.

002

0.00

40.

006

Tutorial in Extreme Value Theory

Page 10: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Fitting a GEV – Return Levels

Often of interest: return level zm

P (M > zm) =1

m.

Expect every mth observation to exceed the level zm.Or: at any point, there is a 1/m% probability to exceedthe level zm.

Can be computed easily once the parameters are“known”.

E.g. m = 100, then z100 = 420, i.e. expect the annualdaily maximum to exceed 4.2 inches every 100 yearsin Boulder.

Tutorial in Extreme Value Theory

Page 11: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Fitting a GEV – Return Levels

Often of interest: return level zm

P (M > zm) =1

m.

Expect every mth observation to exceed the level zm.

0 20 40 60 80 100

010

020

030

040

0

Return Levels for Boulder

m (years)

m y

ear r

etur

n le

vel

Tutorial in Extreme Value Theory

Page 12: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Fitting a GEV – Assumptions

We did not need to know what the underlyingdistribution of each Xi, i.e. the daily total rainfall was.

Underlying assumption: observations are “iid”

independently andidentically distributed.

Tutorial in Extreme Value Theory

Page 13: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Threshold Models

Model exceedances over a high threshold u –X − u|X > u.

Daily total rainfall for Boulder exceeding 80 (1/100 in).

Allows to make more efficient use of the data.

1960 1970 1980 1990 2000

010

020

030

040

0

Daily total rainfall for Boulder (1948−2001)

years

max

. dai

ly p

reci

p in

1/1

00 in

1960 1970 1980 1990 2000

010

020

030

040

0

Daily total rainfall for Boulder (1948−2001)

years

max

. dai

ly p

reci

p in

1/1

00 in

Tutorial in Extreme Value Theory

Page 14: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Threshold Models

Model exceedances over a high threshold u –X − u|X > u.

Daily total rainfall for Boulder exceeding 80 (1/100 in).

Allows to make more efficient use of the data.

1950 1960 1970 1980 1990 2000

100

200

300

400

Annual maximum of daily rainfall for Boulder (1948−2001)

years

max

. dai

ly p

reci

p in

1/1

00 in

1960 1970 1980 1990 2000

010

020

030

040

0

Daily total rainfall for Boulder (1948−2001)

years

max

. dai

ly p

reci

p in

1/1

00 in

Tutorial in Extreme Value Theory

Page 15: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Threshold Models

Model exceedances over a high threshold u –X − u|X > u.

Daily total rainfall for Boulder exceeding 80 (1/100 in).

Allows to make more efficient use of the data.

1960 1970 1980 1990 2000

010

020

030

040

0

Daily total rainfall for Boulder (1948−2001)

years

max

. dai

ly p

reci

p in

1/1

00 in

Tutorial in Extreme Value Theory

Page 16: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Threshold Models

The distribution of Y := X − u|X > u converges to(as u → ∞)

H(y) = 1 − (1 + ξy

σ̃)−

1

ξ .

H(y) is called the “Generalized Pareto” distribution(GPD) with 2 parameters.

shape parameter ξ

scale parameter σ̃.

The shape parameter ξ is the “same” parameter as in theGEV distribution.

Tutorial in Extreme Value Theory

Page 17: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Fitting a GPD – Estimating Parameters

Use the 184 exceedances over the threshold u = 80to fit the GEV distribution.

Estimate the 2 parameters ξ and σ (using “maximumlikelihood” using statistical software (R).

Get a GPD distribution with ξ = 0.22 and σ = 51.46.

Den

sity

0 50 100 150 200 250

0.00

00.

005

0.01

00.

015

Tutorial in Extreme Value Theory

Page 18: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Fitting a GPD – Choosing a Threshold

Diagnostics: mean excess function – linear?

0 100 200 300 400

−50

050

100

150

200

u

Mea

n E

xces

s

Tutorial in Extreme Value Theory

Page 19: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Fitting a GPD – Choosing a Threshold

Diagnostics: mean excess function – linear?

0 100 200 300 400

−50

050

100

150

200

u

Mea

n E

xces

s

Tutorial in Extreme Value Theory

Page 20: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Fitting a GPD – Choosing a Threshold

Diagnostics: shape and modified scale – constant?

50 100 150 200 250 300

050

010

00

Threshold

Mod

ified

Sca

le

50 100 150 200 250 300

−3−2

−10

1

Threshold

Sha

pe

Tutorial in Extreme Value Theory

Page 21: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Fitting a GPD – Choosing a Threshold

Alternatively:

Choose the threshold u so that a certain percentage ofthe data lies above it (robust and automatic, but is theapproximation valid?).

Tutorial in Extreme Value Theory

Page 22: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Fitting a GPD – Return Levels

Compute 100-year return level for daily rainfall totalsusing the threshold approach: z36500 = 429, i.e.expect the daily total to exceed 4.29 inches every 100years (36500 days).

0 20 40 60 80 100

010

020

030

040

0

Return Levels for Boulder

m (years)

m y

ear r

etur

n le

vel

Tutorial in Extreme Value Theory

Page 23: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Uncertainty (GEV)

Essentially, the maximum likelihood approach yieldsstandard errors for the estimates and thereforeconfidence bounds on the parameters.

From the GEV (block maxima) fit for the yearlymaximum of daily precipitation for Boulder:

ξ = 0.09, 95% conf. interval is (-0.1,0.28).σ = 50.16, 95% conf. interval is (38.77, 61.54).µ = 133.85, 95% conf. interval is (118.58,149.12).

Tutorial in Extreme Value Theory

Page 24: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Uncertainty (GEV)

Essentially, the maximum likelihood approach yieldsstandard errors for the estimates.

These errors can be propagated to the return levels:

100

200

300

400

500

600

Return Period

Ret

urn

Leve

l

0.1 1 10 100 1000

Tutorial in Extreme Value Theory

Page 25: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Uncertainty (GPD)

More data means less uncertainty.

From the GPD (threshold model) fit for dailyprecipitation in Boulder:

ξ = 0.22, 95% conf. interval is (-0.12,0.16).

σ = 51.46, 95% conf. interval is (40.70, 62.21).

100

200

300

400

Return period (years)

Ret

urn

leve

l

0.1 1 10 100 1000

Tutorial in Extreme Value Theory

Page 26: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Uncertainty (GPD)

More data means less uncertainty.

From the GPD (threshold model) fit for dailyprecipitation in Boulder:

100

200

300

400

Return period (years)

Ret

urn

leve

l

0.1 1 10 100 1000

Tutorial in Extreme Value Theory

Page 27: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Dependence – Declustering

For the GEV and GPD approximations to be valid, weassume independence of the data.

If the data is dependent, can use declustering to“make” them independent.

E.g. pick only one (the max) point in a cluster thatexceeds a threshold.

Tutorial in Extreme Value Theory

Page 28: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Dependence – Declustering

Assume we want to make inference about hourlyprecipitation in Boulder.

To decluster (instead of using 24 values for eachday), we select only the maximum daily (1-h) recordto fit the GPD model.

1950 1960 1970 1980 1990 2000

050

100

150

time

daily

1h

max

. pre

cip.

Tutorial in Extreme Value Theory

Page 29: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Dependence – (fitting the GPD)

Choosing a threshold – mean excess function as adiagnostic:

0 100 200 300 400

−50

050

100

150

200

u

Mea

n E

xces

s

0 100 200 300 400

−50

050

100

150

200

u

Mea

n E

xces

s

Tutorial in Extreme Value Theory

Page 30: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Dependence – (fitting the GPD)

Choosing a threshold – mean excess function as adiagnostic:

0 100 200 300 400

−50

050

100

150

200

u

Mea

n E

xces

s

Tutorial in Extreme Value Theory

Page 31: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Dependence – (fitting the GPD)

50 100 150 200

−100

050

150

Threshold

Mod

ified

Sca

le

50 100 150 200

−0.4

0.0

0.4

0.8

Threshold

Sha

pe

Tutorial in Extreme Value Theory

Page 32: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Dependence – (fitting the GPD)

u = 75 seems to be a good threshold using thediagnostics.

But u = 75 only leaves 28 data points above thethreshold.

Use u = 35 instead (with 108 data points above thethreshold) to get the following estimates:

ξ = −0.05, 95% conf. interval is (-0.27,0.15).σ = 27.98, 95% conf. interval is (19.94, 36.02).100-year return level is zm = 185, i.e. expect thehourly rainfall to exceed 1.85 inches every 100years (10-year level is 1.36 inches.)

Tutorial in Extreme Value Theory

Page 33: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Dependence – (fitting the GPD)

Use u = 35 (with 108 data points above thethreshold) to fit a GPD model.

0 20 40 60 80 100

050

100

150

Return Levels for Boulder

m (years)

m y

ear (

hour

ly) r

etur

n le

vel

Tutorial in Extreme Value Theory

Page 34: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Seasonality

0.3 0.4 0.5 0.6 0.7 0.8

050

100

150

fraction of the year

daily

1h

max

. pre

cip.

Tutorial in Extreme Value Theory

Page 35: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Seasonality

To incorporate seasonality, link the scale parameterto covariates to describe the seasonal cycle.

Use the covariates X1(t) = sin(2πf(t)) andX2(t) = cos(2πf(t)), where f(t) =fraction of the yearfor each day t.

0.3 0.4 0.5 0.6 0.7 0.8

−1.0

−0.5

0.0

0.5

1.0

fraction of the year

cova

riate

s

Tutorial in Extreme Value Theory

Page 36: An Introduction to Statistical Extreme Value Theory · An Introduction to Statistical Extreme Value Theory Uli Schneider Geophysical Statistics Project, NCAR January 26, 2004 NCAR

Seasonality

To incorporate seasonality, link the scale parameterto covariates to describe the seasonal cycle.

Use the covariates X1(t) = sin(2πf(t)) andX2(t) = cos(2πf(t)), where f(t) =fraction of the yearfor each day t.

Use an exponential link function to link the covariatesto the scale parameter:

σ̃(t) = exp(β0 + β1X1(t) + β2X2(t)).

Fit a GPD with density

GPD (ξ, σ̃(t) = exp(β0 + β1X1(t) + β2X2(t))) .

Tutorial in Extreme Value Theory