an introduction to statistical modelling of extreme values

An introduction to statistical modelling of extreme values Application to calculate extreme wind speeds Edward Omey, Fermin Mallor, Eulalia Nualart

HUB RESEARCH PAPER 2009/36 NOVEMBER 2009


An introduction to statistical modelling

of extreme values.

Application to calculate extreme wind speeds

Pamplona, 2009

Fermín Mallor, Public University of Navarre

Eulalia Nualart, Public University of Navarre

Edward Omey, Hogeschool Universiteit Brussel



Contents

1. INTRODUCTION

2. CLASSICAL EXTREME VALUE THEORY

3. THRESHOLD MODELS

4. EXTREMES OF DEPENDENT SEQUENCES

5. BIBLIOGRAPHY



1. INTRODUCTION

High wind speeds pose a threat to the integrity of structures such as wind turbines. Accurate estimation of the occurrence of extreme wind speeds is an important factor in achieving the correct balance between safety and the cost of “over-design”. This design problem also arises in many other engineering areas, such as ocean engineering (wave heights), hydraulic engineering (floods), structural engineering (earthquakes) and fatigue strength (workloads), as well as in meteorology (temperatures, rainfall, etc.). What all these applications have in common is that the interest lies not in the average behaviour of the phenomenon under study but in its extreme behaviour. The distinguishing feature of an extreme value analysis is therefore that the objective is to describe not the usual behaviour of a stochastic phenomenon but its unusual and rarely observed events.

For example, suppose that a sea-wall is to be built to protect the coast against all sea-levels that are likely to occur within its projected life span (for example, 100 years). Accurate estimation of the highest sea-level in 100 years is necessary in order to balance economic and safety goals. The problem that statistical methods face is that the available records of sea-levels may span a much shorter period, say 15 years. The challenge is to estimate what sea-levels might occur over the next 100 years given only 15 years of data.



Motivation of extreme wind speed analysis: the calculation of Vref

Extreme wind speed estimates are used to determine the critical design loads that a turbine must withstand during its lifetime. According to the International Standard IEC 61400-1 for Wind Turbine Generator Systems, the extreme wind speed Vref is a basic parameter for wind turbine classes and is therefore strongly related to the design of wind turbines. Vref is defined as the extreme 10-minute average wind speed with a recurrence period of 50 years. In general, Vref has to be determined statistically on the basis of on-site measurements.

Classification of wind turbine generators according to Vref

Return levels and return periods

A return level with a return period of T = 1/p years is a high threshold x(p) (e.g., for the annual peak of the 10-minute average wind speed) whose probability of exceedance in any given year is p. For example, if p = 0.01, then the return period is T = 100 years.

The parameter of interest, Vref, is therefore the return level of the 10-minute average wind speed associated with a return period of 50 years.

Two common interpretations of a return level with a return period of T years are:

(i) Waiting time: the average waiting time until the next occurrence of the event is T years.

(ii) Number of events: the average number of events occurring within a T-year period is one.
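The two interpretations can be made concrete with a short calculation. The sketch below is illustrative only: it assumes that the annual maxima of different years are independent, and the function names are hypothetical. It computes the probability that a T-year return level is exceeded at least once during a given number of years, and the expected number of exceedances.

```python
def prob_at_least_one_exceedance(return_period, horizon):
    """Probability that the T-year level is exceeded at least once in `horizon` years.

    Assumes independent years, each with exceedance probability p = 1/T.
    """
    p = 1.0 / return_period
    return 1.0 - (1.0 - p) ** horizon


def expected_exceedances(return_period, horizon):
    """Expected number of years within `horizon` in which the T-year level is exceeded."""
    return horizon / return_period


# A 50-year level (the return period used for Vref) over a hypothetical 20-year design life.
risk = prob_at_least_one_exceedance(50, 20)
count = expected_exceedances(50, 50)
print(f"P(at least one exceedance in 20 years) = {risk:.3f}")
print(f"Expected number of exceedances in 50 years = {count:.1f}")
```

Under these assumptions the 50-year level is exceeded at least once in 20 years with probability of about one third, which shows that a return period should not be read as a guaranteed waiting time.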

The statistical theory developed to deal with these problems and this type of data is known as Extreme Value Theory. The two main purposes of this monograph are to present its main results and to apply them to the analysis of extreme wind speeds.

A sample of consequences of extreme winds

http://www.youtube.com/watch?v=oAWMpxX60KM&feature=player_embedded

http://www.youtube.com/watch_popup?v=CqEccgR0q-o

http://www.youtube.com/watch?v=b43lAoovqd8&feature=fvw



2. CLASSICAL EXTREME VALUE THEORY

2.1. Model formulation

The core of extreme value theory is the study of the statistical behaviour of

$$M_n = \max\{X_1, \dots, X_n\},$$

where $X_1, \dots, X_n$ is a sequence of independent random variables having a common distribution function F.

In applications, the variables $X_i$ usually represent values of a process measured on a regular time-scale, for example the 10-minute average (or maximum) wind speed. Then $M_n$ is the maximum of the observed process over n time units.

The distribution function of $M_n$ satisfies

$$P(M_n \le z) = P(X_1 \le z, \dots, X_n \le z) = P(X_1 \le z) \times \cdots \times P(X_n \le z) = (F(z))^n.$$

Thus, one way to study $M_n$ is to estimate F from the available data (for example, the 10-minute wind speed records measured during a certain interval of time) and then to substitute this estimate into the previous formula to obtain the distribution of $M_n$.

The problem with this approach is that small errors in the estimate of F can lead to large discrepancies in $F^n(z) = (F(z))^n$.
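A small numerical illustration of this sensitivity follows. It is a sketch under stated assumptions: the two values of F(z) are hypothetical, and n is taken as the number of 10-minute intervals in one non-leap year.

```python
# Number of 10-minute intervals in one non-leap year.
N_PER_YEAR = 6 * 24 * 365


def prob_max_below(f_z, n):
    """P(M_n <= z) = F(z)**n for n independent observations."""
    return f_z ** n


# Two hypothetical estimates of F(z) at the same level z.
f_a = 0.99999
f_b = 0.99998

p_a = prob_max_below(f_a, N_PER_YEAR)
p_b = prob_max_below(f_b, N_PER_YEAR)
print(f"F(z) = {f_a}: P(M_n <= z) = {p_a:.3f}")
print(f"F(z) = {f_b}: P(M_n <= z) = {p_b:.3f}")
```

A difference of 0.00001 in F(z) moves the probability for the annual maximum from roughly 0.59 to roughly 0.35.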

An alternative approach is to estimate $F^n(z)$ directly from the extreme data. The idea is similar to the one used to approximate the distribution of the sample mean. To follow this route it is necessary to study the behaviour of $F^n(z)$ as n tends to infinity. However, since F(z) < 1 for $z < z_{\sup}$, where $z_{\sup}$ is the smallest value of z such that F(z) = 1, we have $F^n(z) \to 0$ as $n \to \infty$ for every $z < z_{\sup}$.

To overcome this difficulty and obtain a limit different from 0, we use the following linear normalization of $M_n$:

$$M_n^* = \frac{M_n - b_n}{a_n},$$

where $\{a_n\}$ and $\{b_n\}$ are sequences of constants with $a_n > 0$.



Example 1 The Pareto distribution

The Pareto distribution is $F(z) = 1 - z^{-a}$, where z > 1 and a > 0.

Looking at $F^n(z)$, we have $F^n(z) = (1 - z^{-a})^n$. In view of the well-known limit formula for exp(z), we replace z by $n^{1/a} z$ to find that

$$F^n(n^{1/a} z) = (1 - n^{-1} z^{-a})^n \to \exp(-z^{-a}) \quad (n \to \infty).$$

Then $n^{-1/a} M_n \to Y$ in distribution, where the distribution function of Y is given by

$$F_Y(z) = \exp(-z^{-a}), \quad z > 0.$$

Example 2 The exponential distribution

Now we have $F(z) = 1 - \exp(-z)$, z > 0. In this case $F^n(z) = (1 - \exp(-z))^n$ and, again using the definition of the exponential function, we find that

$$F^n(z + \log n) = (1 - n^{-1} \exp(-z))^n \to \exp(-\exp(-z)) \quad (n \to \infty).$$

This shows that $M_n - \log(n) \to Y$ in distribution, where $F_Y(z) = \exp(-\exp(-z))$.

Example 3 The uniform distribution on [0, 1]

Here we have $F(x) = x$ for 0 < x < 1. Hence $F^n(x) = x^n$ (0 < x < 1) and consequently

$$F^n(1 + n^{-1} x) = (1 + n^{-1} x)^n \to \exp(x) \quad (n \to \infty), \quad \text{for } x < 0.$$

This shows that $n(M_n - 1) \to Y$ in distribution, where $F_Y(x) = \exp(x)$, x < 0.
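The convergence in Examples 1 to 3 can be checked numerically, because the distribution of the normalized maximum is available in closed form for each n. The following sketch is illustrative only; the function names and the evaluation points are arbitrary choices.

```python
import math


def pareto_normalized_max_cdf(z, n, a):
    """Exact P(n**(-1/a) * M_n <= z) when F(z) = 1 - z**(-a), z > 1."""
    x = n ** (1.0 / a) * z
    return (1.0 - x ** (-a)) ** n if x > 1.0 else 0.0


def exponential_normalized_max_cdf(z, n):
    """Exact P(M_n - log(n) <= z) when F(z) = 1 - exp(-z), z > 0."""
    x = z + math.log(n)
    return (1.0 - math.exp(-x)) ** n if x > 0.0 else 0.0


def uniform_normalized_max_cdf(x, n):
    """Exact P(n * (M_n - 1) <= x) for the uniform distribution on [0, 1]."""
    if x >= 0.0:
        return 1.0
    u = 1.0 + x / n
    return u ** n if u > 0.0 else 0.0


# Limits given in Examples 1 to 3 at the chosen evaluation points.
pareto_limit = math.exp(-(1.5 ** -2.0))
exponential_limit = math.exp(-math.exp(-0.5))
uniform_limit = math.exp(-1.0)

for n in (10, 100, 10000):
    print(
        n,
        round(pareto_normalized_max_cdf(1.5, n, a=2.0), 5),
        round(exponential_normalized_max_cdf(0.5, n), 5),
        round(uniform_normalized_max_cdf(-1.0, n), 5),
    )
print("limit", round(pareto_limit, 5), round(exponential_limit, 5), round(uniform_limit, 5))
```

As n grows, each column approaches the corresponding limit printed on the last line.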

These examples illustrate the following general result: if $M_n^*$ converges in distribution to a non-degenerate variable Y, then the distribution function of Y necessarily belongs to one of three classes of distributions.

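Theorem 1: Extremal types theorem. If there exist sequences of constants $\{a_n\}$ and $\{b_n\}$ such that

$$P\{(M_n - b_n)/a_n \le z\} \to G(z) \quad (n \to \infty),$$

where G is a non-degenerate distribution function, then G belongs to one of the following types:

I. $G(z) = \exp\left\{-\exp\left[-\left(\frac{z-b}{a}\right)\right]\right\}, \quad -\infty < z < \infty$ (Gumbel)

II. $G(z) = \begin{cases} 0, & z \le b \\ \exp\left\{-\left(\frac{z-b}{a}\right)^{-\alpha}\right\}, & z > b \end{cases}$ (Fréchet)

III. $G(z) = \begin{cases} \exp\left\{-\left[-\left(\frac{z-b}{a}\right)\right]^{\alpha}\right\}, & z < b \\ 1, & z \ge b \end{cases}$ (Weibull)

In all cases, a > 0 and b is real. In cases II and III, α > 0.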

These three classes of distributions are called the extreme value distributions of types I, II and III, also known as the Gumbel, Fréchet and Weibull families, respectively.

Observe that these three types of distributions are the only possible limits for the distribution of the normalized maxima, regardless of the distribution F of the population.

The three limit types have different forms of tail behaviour. The end point $z_{\sup}$ is finite for the Weibull distribution ($z_{\sup} = \mu - \sigma/\xi$ in the parameterization of Section 2.2), while $z_{\sup} = \infty$ for the Fréchet and Gumbel distributions. However, the density of the Gumbel distribution decays exponentially, whereas the density of the Fréchet distribution decays polynomially. Many common distributions, such as the normal, lognormal, exponential and gamma, lie in the domain of attraction of the Gumbel type. The Fréchet type has a heavy tail: $E(X^r) = \infty$ for $r \ge 1/\xi$ (which means that it has infinite variance if $\xi \ge 1/2$).

2.2. The generalized extreme value distribution

In the past it was usual to adopt one of the three families and then to estimate the parameters of the chosen model. This approach has a weakness: one of the three models must be chosen and assumed to be correct, and the uncertainty implied by this choice is not taken into account in subsequent inferences. A better analysis is obtained by combining the three models into a single family, called the generalized extreme value (GEV) distribution:

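$$G(z) = \exp\left\{-\left[1 + \xi\left(\frac{z-\mu}{\sigma}\right)\right]^{-1/\xi}\right\},$$

defined on the set of z such that $1 + \xi(z-\mu)/\sigma > 0$, with parameters

location: $-\infty < \mu < \infty$
scale: $\sigma > 0$
shape: $-\infty < \xi < \infty$

The type II (Fréchet) distribution is obtained when $\xi > 0$.

The type III (Weibull) distribution is obtained when $\xi < 0$.

The type I (Gumbel) distribution is obtained by letting $\xi \to 0$.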
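The GEV distribution function, including the Gumbel limit and the restriction on its support, can be sketched as follows. This is a minimal illustration rather than a reference implementation; the function name and the tolerance used to switch to the Gumbel form are assumptions.

```python
import math


def gev_cdf(z, mu, sigma, xi, tol=1e-12):
    """GEV distribution function G(z) with location mu, scale sigma > 0 and shape xi."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    s = (z - mu) / sigma
    if abs(xi) < tol:
        # Gumbel case, obtained as the limit xi -> 0.
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        # Outside the support: below the lower end point (xi > 0)
        # or above the upper end point mu - sigma/xi (xi < 0).
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


for xi in (-0.2, 0.0, 0.2):
    print(xi, round(gev_cdf(1.0, 0.0, 1.0, xi), 5))
```

Values of ξ close to zero give results close to the Gumbel case, which is the sense in which the type I family is the limit of types II and III.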



This unification facilitates the statistical analysis. The uncertainty in the estimate of the parameter ξ measures the lack of certainty in the choice among the three models. The extremal types theorem can now be restated as follows.

Theorem 2. If there exist sequences of constants $\{a_n\}$ and $\{b_n\}$ such that

$$P\{(M_n - b_n)/a_n \le z\} \to G(z) \quad (n \to \infty),$$

where G is a non-degenerate distribution function, then G is a member of the GEV family:

$$G(z) = \exp\left\{-\left[1 + \xi\left(\frac{z-\mu}{\sigma}\right)\right]^{-1/\xi}\right\},$$

defined on the set of z such that $1 + \xi(z-\mu)/\sigma > 0$, with parameters

location: $-\infty < \mu < \infty$
scale: $\sigma > 0$
shape: $-\infty < \xi < \infty$

One of the nice properties of the GEV family is that it is a family of max-stable distributions.

Theorem 2B

Suppose that $Y_1, Y_2, \dots, Y_n$ are independent random variables with a common GEV distribution function $F_Y(x) = G(x)$. Then $M_n = \max(Y_1, Y_2, \dots, Y_n)$ is of the same type, i.e. there are constants $u_n > 0$ and $v_n$ real such that

$$(M_n - v_n)/u_n \overset{d}{=} Y.$$

Proof

For simplicity take µ = 0 and σ = 1. For ξ = 0 we have $G(x) = \exp(-\exp(-x))$, and then it is clear that $G^n(x + \log n) = G(x)$.

For ξ ≠ 0, we have $G((x-1)/\xi) = \exp(-x^{-1/\xi})$ for x > 0, and it follows that

$$G^n((n^{\xi} x - 1)/\xi) = \exp(-n (n^{\xi} x)^{-1/\xi}) = \exp(-x^{-1/\xi}). \quad \bullet$$
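The max-stability property can be verified numerically for the standard GEV (µ = 0, σ = 1). In the sketch below the constants $u_n = n^{\xi}$ and $v_n = (n^{\xi} - 1)/\xi$ for ξ ≠ 0 (and $u_n = 1$, $v_n = \log n$ for ξ = 0) are obtained by rewriting the identities in the proof in terms of the argument of G; they and the function names are stated here as assumptions of the illustration.

```python
import math


def gev_cdf_std(x, xi):
    """Standard GEV distribution function (mu = 0, sigma = 1)."""
    if xi == 0.0:
        return math.exp(-math.exp(-x))
    t = 1.0 + xi * x
    if t <= 0.0:
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def max_stable_constants(n, xi):
    """Constants (u_n, v_n) such that G(u_n * x + v_n)**n equals G(x)."""
    if xi == 0.0:
        return 1.0, math.log(n)
    u_n = n ** xi
    return u_n, (u_n - 1.0) / xi


x = 0.7
for xi in (-0.2, 0.0, 0.3):
    for n in (5, 50):
        u_n, v_n = max_stable_constants(n, xi)
        lhs = gev_cdf_std(u_n * x + v_n, xi) ** n   # distribution of the maximum of n copies
        rhs = gev_cdf_std(x, xi)                     # distribution of a single variable
        print(xi, n, round(lhs, 10), round(rhs, 10))
```

The two printed columns agree up to rounding for each ξ and n, which is the statement $(M_n - v_n)/u_n \overset{d}{=} Y$ evaluated at one point.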



Furthermore, in practice the unknown normalizing constants cause no difficulty. For large n we have $P((M_n - b_n)/a_n \le z) \approx G(z)$, and it follows that

$$P(M_n \le z) \approx G((z - b_n)/a_n) = G^*(z),$$

where $G^*$ is again a member of the GEV family.

2.3. Practical implementation

The above results lead to the following approach for modelling the extremes of a series of independent and identically distributed observations $X_1, X_2, \dots$. The first step consists in blocking the data into sequences of n observations, with n sufficiently large. Then the maximum $Z_i$ of each block i is calculated and, finally, the GEV distribution is fitted to the series of block maxima $Z_1, Z_2, \dots$.

In environmental applications the block length is usually one year, and the data used are the annual maxima $Z_i$ of year i.

Once the GEV distribution has been fitted, say to the annual maxima, we can calculate the quantile function $z_p$ of the annual maximum distribution as

$$z_p = \begin{cases} \mu - \dfrac{\sigma}{\xi}\left[1 - \{-\log(1-p)\}^{-\xi}\right], & \xi \ne 0 \\ \mu - \sigma \log\{-\log(1-p)\}, & \xi = 0. \end{cases}$$

Observe that $G(z_p) = 1 - p$. In our terminology, $z_p$ is the return level associated with the return period 1/p. That is, $z_p$ is the level that is expected to be exceeded, on average, once every 1/p years. Equivalently, $z_p$ is the level that is exceeded by the annual maximum in any particular year with probability p.

Writing $y_p = -\log(1-p)$, this quantile function can be expressed as

$$z_p = \begin{cases} \mu - \dfrac{\sigma}{\xi}\left[1 - y_p^{-\xi}\right], & \xi \ne 0 \\ \mu - \sigma \log y_p, & \xi = 0. \end{cases}$$

If $z_p$ is plotted against $\log y_p$, the plot is linear in the case ξ = 0; it is convex in the case ξ < 0, with asymptotic limit $\mu - \sigma/\xi$ as p tends to 0; and it is concave for ξ > 0, with no finite bound.


This graph is called a return level plot. It is useful both as a validation tool and as a way of presenting the fitted model.
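The quantile function translates directly into code. The sketch below is illustrative; the function name is hypothetical, and the parameter values are the maximum likelihood estimates printed by the software for the Schiphol yearly maxima in Section 2.6.

```python
import math


def gev_return_level(p, mu, sigma, xi):
    """Level exceeded by the block maximum with probability p (0 < p < 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y_p = -math.log(1.0 - p)
    if xi == 0.0:
        return mu - sigma * math.log(y_p)
    return mu - (sigma / xi) * (1.0 - y_p ** (-xi))


# Maximum likelihood estimates reported in Section 2.6 (yearly maxima).
mu_hat, sigma_hat, xi_hat = 197.88060, 23.55972, -0.15223

z_50 = gev_return_level(1.0 / 50.0, mu_hat, sigma_hat, xi_hat)
print(f"50-year return level: {z_50:.2f}")
print(f"upper end point mu - sigma/xi: {mu_hat - sigma_hat / xi_hat:.2f}")
```

With these inputs the function returns approximately 267.2, in agreement with the 50-year return level of 267.1967 reported by the software in Section 2.6.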

2.4. Inference for the GEV distribution

The choice of block length implies a trade-off between bias and variance. When the blocks are short, the approximation of the distribution of the block maxima by the limit distribution is likely to be poor, which leads to bias in estimation and extrapolation. Long blocks, on the other hand, generate only a few block maxima, which leads to large estimation variance.

The method most commonly used to estimate the parameters is maximum likelihood. One difficulty with this approach is that the regularity conditions required for its application are not satisfied by the GEV distribution, because the end-points of the distribution depend on the parameter values. This violation means that the standard asymptotic likelihood results are not automatically applicable. The problem has been studied in detail (Smith, 1985), with the following results:

• When $\xi > -0.5$, the maximum likelihood estimators have the usual asymptotic properties.

• When $-1 < \xi < -0.5$, the maximum likelihood estimators can be obtained in general, but they do not have the standard asymptotic properties.

• When $\xi < -1$, the maximum likelihood estimators are unlikely to be obtainable.

Observe that the case $\xi < -0.5$ corresponds to distributions with a very short bounded upper tail, which is rarely encountered in real applications of extreme value modelling.

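Denoting the block maxima by $Z_1, \dots, Z_m$, and under the assumption that they are independent variables having a GEV distribution, the log-likelihood for the GEV parameters when $\xi \ne 0$ is

$$\ell(\mu,\sigma,\xi) = -m\log\sigma - \left(1+\frac{1}{\xi}\right)\sum_{i=1}^{m}\log\left[1+\xi\left(\frac{z_i-\mu}{\sigma}\right)\right] - \sum_{i=1}^{m}\left[1+\xi\left(\frac{z_i-\mu}{\sigma}\right)\right]^{-1/\xi},$$

provided that $1 + \xi(z_i-\mu)/\sigma > 0$ for i = 1, …, m. When this condition is not satisfied, the likelihood is zero and the log-likelihood is minus infinity.

In the Gumbel case (ξ = 0), the log-likelihood is

$$\ell(\mu,\sigma) = -m\log\sigma - \sum_{i=1}^{m}\left(\frac{z_i-\mu}{\sigma}\right) - \sum_{i=1}^{m}\exp\left\{-\left(\frac{z_i-\mu}{\sigma}\right)\right\}.$$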

By maximizing these log-likelihood functions we obtain the maximum likelihood estimates $(\hat\mu, \hat\sigma, \hat\xi)$. No analytical solution is available, so the maximization is carried out with numerical optimization algorithms.
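As an illustration, the two log-likelihood expressions can be coded as a single function that a numerical optimizer would maximize. This is a minimal sketch: the function name, the tolerance for switching to the Gumbel form and the sample values are hypothetical, and no optimizer is included.

```python
import math


def gev_loglik(z, mu, sigma, xi, tol=1e-8):
    """GEV log-likelihood of the block maxima z; -inf outside the parameter space."""
    if sigma <= 0.0:
        return -math.inf
    m = len(z)
    if abs(xi) < tol:
        # Gumbel case (xi = 0).
        s = [(z_i - mu) / sigma for z_i in z]
        return -m * math.log(sigma) - sum(s) - sum(math.exp(-s_i) for s_i in s)
    total = -m * math.log(sigma)
    for z_i in z:
        t = 1.0 + xi * (z_i - mu) / sigma
        if t <= 0.0:
            # An observation lies outside the support: likelihood is zero.
            return -math.inf
        total -= (1.0 + 1.0 / xi) * math.log(t) + t ** (-1.0 / xi)
    return total


# Hypothetical block maxima, for illustration only.
sample = [180.0, 205.0, 226.0, 250.0, 199.0]
print(gev_loglik(sample, 200.0, 25.0, -0.1))
print(gev_loglik(sample, 200.0, 25.0, 0.0))
print(gev_loglik(sample, 200.0, 25.0, -1.0))  # support violated: -inf
```

Returning minus infinity outside the support lets a general-purpose optimizer reject such parameter values without special handling.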

The classical theory of maximum likelihood estimation establishes that the distribution of $(\hat\mu, \hat\sigma, \hat\xi)$ is approximately normal with mean $(\mu, \sigma, \xi)$ and variance-covariance matrix equal to the inverse of the observed information matrix evaluated at the maximum likelihood estimate. Confidence intervals are obtained from this approximate normality of the estimator.

Inference for return levels

The maximum likelihood estimate of the 1/p return level $z_p$, for 0 < p < 1, is

$$\hat z_p = \begin{cases} \hat\mu - \dfrac{\hat\sigma}{\hat\xi}\left[1 - y_p^{-\hat\xi}\right], & \hat\xi \ne 0 \\ \hat\mu - \hat\sigma \log y_p, & \hat\xi = 0, \end{cases}$$

where $y_p = -\log(1-p)$. Confidence intervals can be obtained from the normal approximation to the distribution of the estimator, but caution is required in their interpretation, especially for return levels corresponding to long return periods, because the normal approximation may be poor. A better approximation is generally obtained from the profile likelihood function.
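One common way to build the normal-approximation interval is the delta method, in which the variance of $\hat z_p$ is approximated by $\nabla z_p^{T} V \nabla z_p$, with V the parameter covariance matrix. The text does not state which construction is used, so the sketch below is an assumption; the function names are hypothetical, and the estimates and covariance matrix are those printed by the software in Section 2.6.

```python
import math


def gev_return_level(p, mu, sigma, xi):
    """Return level z_p for xi != 0."""
    y_p = -math.log(1.0 - p)
    return mu - (sigma / xi) * (1.0 - y_p ** (-xi))


def return_level_gradient(p, sigma, xi):
    """Partial derivatives of z_p with respect to (mu, sigma, xi), for xi != 0."""
    y_p = -math.log(1.0 - p)
    y_pow = y_p ** (-xi)
    d_mu = 1.0
    d_sigma = -(1.0 - y_pow) / xi
    d_xi = sigma * (1.0 - y_pow) / xi ** 2 - (sigma / xi) * y_pow * math.log(y_p)
    return [d_mu, d_sigma, d_xi]


def delta_method_se(gradient, cov):
    """Square root of the quadratic form gradient' * cov * gradient."""
    size = len(gradient)
    variance = sum(
        gradient[i] * cov[i][j] * gradient[j] for i in range(size) for j in range(size)
    )
    return math.sqrt(variance)


# Estimates and parameter covariance matrix as reported in Section 2.6.
mu_hat, sigma_hat, xi_hat = 197.88060, 23.55972, -0.15223
cov = [
    [12.2742524, 1.68235174, -0.116433152],
    [1.6823517, 6.04824808, -0.098884367],
    [-0.1164332, -0.09888437, 0.008000298],
]

p = 1.0 / 50.0
z_hat = gev_return_level(p, mu_hat, sigma_hat, xi_hat)
se = delta_method_se(return_level_gradient(p, sigma_hat, xi_hat), cov)
lower, upper = z_hat - 1.96 * se, z_hat + 1.96 * se
print(f"z_hat = {z_hat:.2f}, se = {se:.2f}, 95% interval = ({lower:.2f}, {upper:.2f})")
```

The resulting interval is symmetric about the estimate by construction, unlike the profile likelihood interval reported in Section 2.6.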



2.5. Graphical model checking

Though it is impossible to check the validity of an extrapolation based on the GEV

model, assessment can be done with reference to the observed data.

Probability Plot.

A probability plot is a comparison of the empirical and fitted distribution functions. The empirical distribution function evaluated at the i-th ordered block maximum, $z_{(i)}$, is $\tilde G(z_{(i)}) = i/(m+1)$, and the fitted distribution function at the same point is

$$\hat G(z_{(i)}) = \exp\left\{-\left[1 + \hat\xi\left(\frac{z_{(i)} - \hat\mu}{\hat\sigma}\right)\right]^{-1/\hat\xi}\right\}.$$

For the model to be adequate it is necessary that $\tilde G(z_{(i)}) \approx \hat G(z_{(i)})$. In practice, the points

$$\left\{\left(\tilde G(z_{(i)}), \hat G(z_{(i)})\right), \ i = 1, \dots, m\right\}$$

should lie close to the unit diagonal. However, because both functions are bound to approach 1 as z increases, the plot is least informative in this region. The quantile plot described next avoids this deficiency.

Quantile plot.

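The quantile plot is a representation of the points

$$\left\{\left(\hat G^{-1}(i/(m+1)), z_{(i)}\right), \ i = 1, \dots, m\right\},$$

where

$$\hat G^{-1}\left(\frac{i}{m+1}\right) = \hat\mu - \frac{\hat\sigma}{\hat\xi}\left[1 - \left\{-\log\left(\frac{i}{m+1}\right)\right\}^{-\hat\xi}\right], \quad i = 1, \dots, m.$$

Ideally the points should lie close to the unit diagonal. Departures from linearity in the quantile plot indicate model failure.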

Return level plot.

The return level plot represents the points $\{(\log y_p, \hat z_p) : 0 < p < 1\}$. Confidence intervals are usually added to this plot to make it more informative. Return periods matter in engineering because the return period is used as a design criterion. Furthermore, to use this plot as a model diagnostic, the empirical estimates of the return level function are also added. For a suitable model, the model-based curve and the empirical estimates should be in agreement.
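The coordinates of the probability and quantile plots can be computed as in the following sketch. The function names, the block maxima and the parameter values are hypothetical and serve only to illustrate the definitions above (case ξ ≠ 0); plotting itself is omitted.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """Fitted GEV distribution function for xi != 0."""
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_quantile(q, mu, sigma, xi):
    """Inverse of the fitted GEV distribution function for xi != 0 and 0 < q < 1."""
    return mu - (sigma / xi) * (1.0 - (-math.log(q)) ** (-xi))


def probability_plot_points(maxima, mu, sigma, xi):
    """Pairs (empirical probability i/(m+1), fitted probability) for the ordered maxima."""
    z = sorted(maxima)
    m = len(z)
    return [(i / (m + 1), gev_cdf(z_i, mu, sigma, xi)) for i, z_i in enumerate(z, start=1)]


def quantile_plot_points(maxima, mu, sigma, xi):
    """Pairs (model quantile at i/(m+1), observed ordered maximum)."""
    z = sorted(maxima)
    m = len(z)
    return [(gev_quantile(i / (m + 1), mu, sigma, xi), z_i) for i, z_i in enumerate(z, start=1)]


# Hypothetical block maxima and parameter values, for illustration only.
maxima = [231.0, 189.0, 205.0, 176.0, 254.0, 212.0, 198.0]
params = (200.0, 25.0, -0.1)
for (emp, fit), (model_q, obs_q) in zip(
    probability_plot_points(maxima, *params), quantile_plot_points(maxima, *params)
):
    print(f"{emp:.3f} {fit:.3f} | {model_q:.1f} {obs_q:.1f}")
```

In both plots, a well-fitting model produces pairs whose two coordinates are nearly equal.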



2.6. Case study. Hourly average wind data from Schiphol in the Netherlands.

We consider the records of the hourly average wind speed at Schiphol, Netherlands (lat. 52.330 north, lon. 4.738 east). The data were recorded by the Royal Netherlands Meteorological Institute through the KNMI HYDRA PROJECT (http://www.knmi.nl/samenw/hydra/index.html) from March 1, 1950 to December 31, 2005. The measuring height was 10 meters.


The data. The following figures show the original data and the daily, monthly and yearly maxima, respectively.

Analysis of the yearly maxima data with the package extRemes. http://www.assessment.ucar.edu/toolkit/

The Extremes Toolkit is an interactive program for analyzing extreme value data using the R statistical programming language. A graphical user interface is provided, so knowledge of R is not necessarily required.


[Figure: Max. Wind Speed 1950-2005, Schiphol. x-axis: Year; y-axis: Mean hourly speed m/s]

Descriptive statistics of the yearly maxima:

N     mean     Std.Dev.   Min.   Q1      median   Q3      max
56    208.39   26.06      157    189.5   205.0    226.5   280.0

Step 1. Loading the file with the year maximum wind speed data.



The following plots of the data have been made using the scatter plot option of the package extRemes:



Step 2. Fit a GEV distribution to this yearly maxima data. We use the option of

extRemes: Analyze > Generalized Extreme Value (GEV) Distribution

The numerical results provided by the software are:

    ************ GEV fit -----------------------------------
    Response variable: speedmaxyear
    L-moments (stationary case) estimates (used to initialize MLE optimization routine):
    Location (mu): 197.6025
    Scale (sigma): 23.84311
    Shape (xi): -0.1420611
    Likelihood ratio test (5% level) for xi=0 does not reject Gumbel hypothesis.
    likelihood ratio statistic is 2.308828 < 3.841459 1 df chi-square critical value.
    p-value for likelihood-ratio test is 0.128641
    Convergence successfull!
    [1] "Convergence successfull!"
    [1] "Maximum Likelihood Estimates:"
                         MLE         Stand. Err.
    MU: (identity)       197.88060   3.50346
    SIGMA: (identity)    23.55972    2.45932
    Xi: (identity)       -0.15223    0.08944
    [1] "Negative log-likelihood: 260.451307992363"
    Parameter covariance:
               [,1]         [,2]          [,3]
    [1,]  12.2742524   1.68235174  -0.116433152
    [2,]   1.6823517   6.04824808  -0.098884367
    [3,]  -0.1164332  -0.09888437   0.008000298
    [1] "Convergence code (see help file for optim): 0"
    NULL
    Model name: gev.fit1

The diagnostic plots are:
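Some of the figures in this output can be cross-checked by hand. The sketch below is an illustration, not part of the extRemes output: it recomputes the p-value of the likelihood ratio test from the chi-square distribution with 1 degree of freedom, and the standard errors as the square roots of the diagonal of the parameter covariance matrix. The normal-approximation interval for ξ at the end is an added assumption, and the function name is hypothetical.

```python
import math


def chi2_1df_pvalue(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Values copied from the software output above.
lr_statistic = 2.308828
cov = [
    [12.2742524, 1.68235174, -0.116433152],
    [1.6823517, 6.04824808, -0.098884367],
    [-0.1164332, -0.09888437, 0.008000298],
]
xi_hat = -0.15223

p_value = chi2_1df_pvalue(lr_statistic)
std_errors = [math.sqrt(cov[i][i]) for i in range(3)]

# Normal-approximation (Wald) 95% interval for the shape parameter.
wald_ci_xi = (xi_hat - 1.96 * std_errors[2], xi_hat + 1.96 * std_errors[2])

print(f"p-value of the likelihood ratio test: {p_value:.6f}")
print("standard errors:", [round(s, 5) for s in std_errors])
print(f"Wald 95% interval for xi: ({wald_ci_xi[0]:.4f}, {wald_ci_xi[1]:.4f})")
```

The recomputed p-value and standard errors match the printed ones, and the Wald interval for ξ contains 0, in line with the test not rejecting the Gumbel hypothesis.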

This diagnostic plot can be recovered using:

Plot > Fit diagnostics



Confidence intervals for the parameters are obtained using the profile likelihood method:

Analyze > Parameter Confidence Intervals > GEV fit

The numerical results are:

    *****
    [1] "Estmating CIs for GEV 50-yr. return level and shape parameter (xi)."
    [1] "Estimated return level = 267.1967"
    [1] "Estimated (MLE) shape parameter = -0.1522"
    [1] "50-year return level: 95% confidence interval approximately"
    [1] "(253.91618, 295.71891)"
    [1] "shape parameter (xi): 95% confidence interval approximately"
    [1] "(-0.34134, 0.65109)"
    *****

The profile plots are



3. THRESHOLD MODELS

3.1. Model formulation and main result

Modelling only block maxima is wasteful of data if a detailed record of the phenomenon under study is available. We now present an alternative analysis that makes more efficient use of the data. The approach consists in regarding as extreme those observations that exceed a high threshold u, and then studying the stochastic behaviour of the excesses over u. More formally, given a sequence $X_1, \dots, X_n$ of independent and identically distributed random variables with distribution function F, we are interested in the conditional probability

$$F_u(y) = P(X \le u + y \mid X > u), \quad \text{that is,} \quad F_u(y) = \frac{F(u+y) - F(u)}{1 - F(u)}.$$

The following result gives an approximation to this probability for high values of the threshold u.

Theorem 3. Let $X_1, \dots, X_n$ be a sequence of independent and identically distributed random variables with a common distribution function F, and let $M_n = \max\{X_1, \dots, X_n\}$ satisfy the conditions to be approximated by a GEV distribution, that is, for large n,

$$\Pr\{M_n \le z\} \approx G(z), \quad \text{where} \quad G(z) = \exp\left\{-\left[1 + \xi\left(\frac{z-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}.$$

Then, for large enough u, the distribution function of (X − u), conditional on X > u, is approximately given by the generalized Pareto distribution

$$H(y) = 1 - \left(1 + \frac{\xi y}{\tilde\sigma}\right)^{-1/\xi},$$

defined on $\{y : y > 0 \text{ and } (1 + \xi y/\tilde\sigma) > 0\}$, where $\tilde\sigma = \sigma + \xi(u - \mu)$.


This result relates the two approaches to studying the distribution of extremes. The parameters of the generalized Pareto distribution (GPD) are uniquely determined by the parameters of the associated GEV distribution of block maxima. Observe that this implies that if we change the block size in the GEV analysis, the parameter ξ remains unchanged, while the parameters µ and σ change in such a way that their changes compensate to leave $\tilde\sigma$ fixed.

As for the GEV distribution, the parameter ξ is dominant in determining the qualitative behaviour of the GPD:

• If ξ < 0, the distribution of excesses is bounded above: the excess X − u cannot exceed $-\tilde\sigma/\xi$, so X has upper bound $u - \tilde\sigma/\xi$.

• If ξ > 0, the distribution is unbounded.

• If ξ = 0, the distribution is also unbounded; it is the exponential distribution with rate parameter $1/\tilde\sigma$.
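The relation between the GEV and GPD parameters and the three cases above can be illustrated with a short sketch. The function names and the numerical values are hypothetical.

```python
import math


def gpd_scale_from_gev(mu, sigma, xi, u):
    """GPD scale sigma_tilde = sigma + xi * (u - mu) implied by the GEV parameters."""
    return sigma + xi * (u - mu)


def gpd_cdf(y, sigma_tilde, xi):
    """Generalized Pareto distribution function H(y) of an excess y over the threshold."""
    if y <= 0.0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-y / sigma_tilde)
    t = 1.0 + xi * y / sigma_tilde
    if t <= 0.0:
        # Only possible when xi < 0: y lies beyond the upper bound -sigma_tilde/xi.
        return 1.0
    return 1.0 - t ** (-1.0 / xi)


# Hypothetical GEV parameters and threshold, for illustration only.
mu, sigma, xi, u = 200.0, 25.0, -0.15, 230.0
sigma_tilde = gpd_scale_from_gev(mu, sigma, xi, u)
upper_bound_excess = -sigma_tilde / xi

print(f"sigma_tilde = {sigma_tilde:.2f}")
print(f"upper bound of X from the GPD: {u + upper_bound_excess:.2f}")
print(f"upper end point of the GEV:    {mu - sigma / xi:.2f}")
print(f"H(30) = {gpd_cdf(30.0, sigma_tilde, xi):.4f}")
```

For ξ < 0 the upper bound $u - \tilde\sigma/\xi$ implied by the GPD coincides with the GEV end point $\mu - \sigma/\xi$, whatever the threshold u.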



3.2. Threshold selection. Mean residual life plot

Let $\{x_1, \dots, x_n\}$ be the original data, and let us regard as extreme events those observations that exceed a threshold u, say $x_{(1)}, \dots, x_{(k)}$. We denote the excesses over the threshold by $y_j = x_{(j)} - u$. By the previous theorem, when the threshold u is large enough the values $y_j$ can be viewed as independent realizations of a variable distributed according to a GPD, whose parameters have to be estimated and the model then validated.

The issue of how to choose the threshold is similar to that of selecting the block size, in the sense that both imply a balance between bias and variance. Too low a threshold is likely to violate the asymptotic basis of the model, leading to bias; too high a threshold yields few exceedances, leading to high variance.

A method to help in the choice of the threshold is based on the mean of the GPD: if Y is a random variable following a GPD with parameters σ and ξ, then $E(Y) = \sigma/(1-\xi)$ when ξ < 1; otherwise the mean is infinite.

If the model is valid for a threshold $u_0$, then it is also valid for all thresholds u greater than $u_0$. The means in the two cases are

$$e(u_0) = E(X - u_0 \mid X > u_0) = \frac{\tilde\sigma_{u_0}}{1-\xi},$$

$$e(u) = E(X - u \mid X > u) = \frac{\tilde\sigma_u}{1-\xi} = \frac{\tilde\sigma_{u_0} + \xi(u - u_0)}{1-\xi}.$$

Thus, $e(u) = E(X - u \mid X > u)$ is a linear function of u for $u > u_0$. Based on this result, the procedure to choose the threshold is as follows:

• Build the mean residual life plot, by representing the points

$$\left\{\left(u, \ \frac{1}{n_u}\sum_{i=1}^{n_u}\left(x_{(i)} - u\right)\right) : u < x_{\max}\right\},$$

where $x_{(1)}, \dots, x_{(n_u)}$ are the $n_u$ observations that exceed u and $x_{\max}$ is the maximum observation in the data set.

• Choose as threshold the value above which the plot is approximately linear in u. Adding confidence intervals to the plot can help in determining this point.
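The points of the mean residual life plot can be computed as in the following sketch. The function names and the data are hypothetical; confidence intervals and plotting are omitted.

```python
def mean_excess(data, u):
    """Sample mean of the excesses over u, or None if no observation exceeds u."""
    excesses = [x - u for x in data if x > u]
    if not excesses:
        return None
    return sum(excesses) / len(excesses)


def mean_residual_life(data, thresholds):
    """Points (u, e(u), n_u) of the mean residual life plot for thresholds u < max(data)."""
    x_max = max(data)
    points = []
    for u in thresholds:
        if u < x_max:
            n_u = sum(1 for x in data if x > u)
            points.append((u, mean_excess(data, u), n_u))
    return points


# Hypothetical observations, for illustration only.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
for u, e_u, n_u in mean_residual_life(data, [0.0, 2.0, 3.5, 5.0]):
    print(u, e_u, n_u)
```

Returning the number of exceedances $n_u$ alongside each point makes it easy to see where the estimate of e(u) rests on very few observations.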



3.3. Parameter estimation

Once the threshold has been chosen, the next step is to estimate the parameters of the GPD, for example by maximum likelihood. If we denote by $y_1, \dots, y_k$ the k excesses over the threshold, the log-likelihood function, in the case that ξ is not zero, is

$$\ell(\sigma,\xi) = -k\log\sigma - \left(1 + \frac{1}{\xi}\right)\sum_{i=1}^{k}\log\left(1 + \frac{\xi y_i}{\sigma}\right),$$

when $(1 + \xi y_i/\sigma) > 0$ for i = 1, …, k; otherwise $\ell(\sigma,\xi) = -\infty$.

In the case ξ = 0 the log-likelihood is

$$\ell(\sigma) = -k\log\sigma - \frac{1}{\sigma}\sum_{i=1}^{k} y_i.$$

Return levels

To calculate the return levels, we first need an expression for the unconditional distribution of the variable X. Writing $\delta_u = \Pr(X > u)$, from the conditional distribution

$$\Pr\{X > x \mid X > u\} = \left[1 + \xi\left(\frac{x-u}{\sigma}\right)\right]^{-1/\xi}$$

we obtain

$$\Pr\{X > x\} = \delta_u\left[1 + \xi\left(\frac{x-u}{\sigma}\right)\right]^{-1/\xi}.$$

Hence, the level $x_m$ that is exceeded on average once every m observations is the solution of

$$\delta_u\left[1 + \xi\left(\frac{x_m-u}{\sigma}\right)\right]^{-1/\xi} = \frac{1}{m},$$

which is

$$x_m = u + \frac{\sigma}{\xi}\left[(m\delta_u)^{\xi} - 1\right].$$

This expression is valid for values of m large enough that $x_m > u$.

In the case ξ = 0 the return level is $x_m = u + \sigma\log(m\delta_u)$, again for m large enough.

Estimating these return levels requires replacing the parameters by their estimates. For the probability $\delta_u = \Pr(X > u)$, the maximum likelihood estimator is the sample proportion of observations exceeding the threshold u, that is, $\hat\delta_u = k/n$.
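The return level formula and its defining equation can be checked against each other numerically. In the sketch below the function names are hypothetical and the inputs are rounded, illustrative values (33 exceedances of a threshold u = 210 among 670 monthly observations); they are not meant to reproduce any software output.

```python
import math


def gpd_return_level(m, u, sigma, xi, delta_u):
    """Level exceeded on average once every m observations (requires m * delta_u > 1)."""
    if m * delta_u <= 1.0:
        raise ValueError("m is too small: the return level would not exceed u")
    if xi == 0.0:
        return u + sigma * math.log(m * delta_u)
    return u + (sigma / xi) * ((m * delta_u) ** xi - 1.0)


def exceedance_probability(x, u, sigma, xi, delta_u):
    """Pr{X > x} for x > u under the threshold model."""
    if xi == 0.0:
        return delta_u * math.exp(-(x - u) / sigma)
    return delta_u * (1.0 + xi * (x - u) / sigma) ** (-1.0 / xi)


# Illustrative inputs only.
u, sigma_hat, xi_hat = 210.0, 28.7, -0.31
delta_hat = 33 / 670          # sample proportion of exceedances, k / n
m = 12 * 50                   # 50 years of monthly observations

x_m = gpd_return_level(m, u, sigma_hat, xi_hat, delta_hat)
check = m * exceedance_probability(x_m, u, sigma_hat, xi_hat, delta_hat)
print(f"x_m = {x_m:.2f}; m * Pr(X > x_m) = {check:.6f}")
```

The second printed quantity equals 1 up to rounding, confirming that $x_m$ solves the equation from which it was derived.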



3.4. Model checking

Another tool to help in the choice of threshold u

As we said before, when the GPD is a valid model for a threshold $u_0$, it is also a valid model for any $u > u_0$. At both levels the parameter ξ is the same, and the scale parameters are related by $\sigma_u = \sigma_{u_0} + \xi(u - u_0)$. Thus, the reparameterized scale $\sigma^* = \sigma_u - \xi u$ is constant with respect to u. Consequently, estimates of $\sigma^*$ and ξ should be constant above $u_0$ when $u_0$ is a valid threshold. This argument suggests plotting the estimates of $\sigma^*$ and ξ against u, together with their confidence intervals, and selecting $u_0$ as the lowest value of u for which the estimates remain near-constant.

Probability plots, quantile plots and return level plots are used to assess the quality of a fitted generalized Pareto model. Given a threshold u, ordered excesses $y_{(1)} \le \dots \le y_{(k)}$ and an estimated GPD model $\hat H$, we have:

Probability plot. It represents the points $\left\{\left(i/(k+1), \hat H(y_{(i)})\right), \ i = 1, \dots, k\right\}$.

Quantile plot. It represents the points $\left\{\left(\hat H^{-1}(i/(k+1)), y_{(i)}\right), \ i = 1, \dots, k\right\}$.

When the model is valid, the points in both plots should lie approximately on a straight line.

When $\hat\xi \ne 0$ the estimates are

$$\hat H(y) = 1 - \left(1 + \frac{\hat\xi y}{\hat\sigma}\right)^{-1/\hat\xi} \quad \text{and} \quad \hat H^{-1}(p) = \frac{\hat\sigma}{\hat\xi}\left[(1-p)^{-\hat\xi} - 1\right].$$

When $\hat\xi = 0$ the expressions are

$$\hat H(y) = 1 - \exp\left(-\frac{y}{\hat\sigma}\right) \quad \text{and} \quad \hat H^{-1}(p) = -\hat\sigma\ln(1-p).$$

Return level plot. It represents the points $\{(m, \hat x_m)\}$, where, as we have seen before, for $\hat\xi \ne 0$

$$\hat x_m = u + \frac{\hat\sigma}{\hat\xi}\left[(m\hat\delta_u)^{\hat\xi} - 1\right],$$

and for $\hat\xi = 0$ the return level is $\hat x_m = u + \hat\sigma\log(m\hat\delta_u)$.

Recall that $\hat x_m$ is the estimated level that is exceeded on average once every m observations.



3.5. Case study. Hourly average wind data from Schiphol in the Netherlands.

We analyse the Schiphol data using the threshold method. We consider the series of monthly maximum wind speeds.

[Figure: Monthly maximum speed (Schiphol). x-axis: Index; y-axis: speedmaxmonth]

We analyse these data using the extRemes package of R.

Descriptive statistics

Monthly max. speed data

N     mean     Std.Dev   min   Q1    median   Q3    max
670   149.69   33.35     87    125   146      172   280



Threshold selection. To select the threshold we look at the mean residual life plot. Recall that a threshold that is too low gives biased parameter estimates, whereas a threshold that is too high leaves only a few values for estimation, so that the estimated parameters have a large variance.

We can observe a decreasing linear trend between approximately 185 and 255 (this decreasing trend indicates a negative value of the parameter ξ). We choose the value 185 as threshold u.

We fit the generalized Pareto distribution and obtain the following results:

    Convergence successfull!
    [1] "Threshold = 185"
    [1] "Number of exceedances of threshold = 99"
    [1] "Exceedance rate (per year)= 1.77"
    [1] "Maximum Likelihood Estimates:"
                     MLE          Std. Err.
    Scale (sigma):   25.6462453   3.7656845
    Shape (xi):      -0.1298863   0.1078369
    [1] "Negative log-likelihood: 407.336"
    Parameter covariance:
               [,1]         [,2]
    [1,]  14.1803800  -0.32751390
    [2,]  -0.3275139   0.01162880



As a further threshold selection tool we fit the GPD over a range of thresholds, which, as described above, requires fitting the GPD several times, each time with a different threshold. We look for the lowest threshold above which the estimates of both parameters remain approximately constant.

It seems that a value somewhat greater than 185 would be more appropriate. We therefore repeat the analysis with a threshold of 210.

The new results are:

    Convergence successfull!
    [1] "Threshold = 210"
    [1] "Number of exceedances of threshold = 33"
    [1] "Exceedance rate (per year)= 0.591"
    [1] "Maximum Likelihood Estimates:"
                     MLE          Std. Err.
    Scale (sigma):   28.7074116   6.8613851
    Shape (xi):      -0.3147829   0.1727022
    [1] "Negative log-likelihood: 133.3984"
    Parameter covariance:
               [,1]         [,2]
    [1,]  47.078606   -1.04385528
    [2,]  -1.043855    0.02982605


Model checking

The estimate and confidence interval for the return level are:

    Estmating CIs for GPD 100-yr. return level.
    Using 12 days per year.
    Estimated 100-yr. return level = 276.2093
    100-year return level: 95% confidence interval approximately
    (264.44, 320.87)


We can also estimate a confidence interval for the shape parameter ξ:

    Estmating CIs for GPD shape parameter (xi).
    Estimated (MLE) shape parameter 0.2721
    shape parameter (xi): 95% confidence interval approximately
    (-0.4667, 0.0589)

The main objective of this type of analysis is the estimation of return levels. For the 50-year return level we have obtained:

    Estmating CIs for GPD 50-yr. return level.
    Estimated 50-yr. return level = 269.611
    50-year return level: 95% confidence interval approximately
    (258.20376, 301.5432)

We can compare these results with those obtained from the block maxima analysis using the GEV distribution:

    "Estimated return level = 267.1967"
    [1] "50-year return level: 95% confidence interval approximately"
    [1] "(253.91618, 295.71891)"



4. EXTREMES OF DEPENDENT SEQUENCES

4.1 Stationarity and limited long-range dependence

In the models studied so far we have assumed that the observations come from a sequence of independent random variables. In real applications this assumption is often unrealistic, because some dependence over time is usually observed. For example, in wind speed records it is natural to find high positive correlation between consecutive hourly observations. The next figure shows the autocorrelation function of the Schiphol wind series (hourly average wind speed) used in the previous sections.
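For reference, a sample autocorrelation function can be computed as in the sketch below. It uses one common estimator (lagged products of deviations from the mean, divided by the total sum of squares); the text does not say which estimator produced the figure, so this choice, the function name and the short series are assumptions.

```python
def sample_autocorrelation(series, max_lag):
    """Sample autocorrelations r(0), ..., r(max_lag) of a time series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]


# Hypothetical short series, for illustration only.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_autocorrelation(series, 2))
```

For hourly wind speeds one would expect values close to 1 at small lags, decreasing as the lag grows.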

We now study a generalization of a sequence of independent random variables to a stationary series. Stationarity corresponds to a series whose stochastic behaviour is homogeneous through time but whose variables may be mutually dependent.

Definition. A random process $X_1, X_2, \dots$ is said to be stationary if, given any set of integers $\{i_1, \dots, i_k\}$ and any integer m, the joint distributions of $(X_{i_1}, \dots, X_{i_k})$ and $(X_{i_1+m}, \dots, X_{i_k+m})$ are identical.

That is, the joint distribution of any set of variables remains unchanged when they are viewed m time units later. Stationarity thus allows the variables to be dependent, but excludes trends, seasonality and other deterministic cycles.

To obtain theoretical results for the extremes of stationary sequences, it is usual to assume a condition that limits the extent of long-range dependence at extreme levels. We will assume that the events $X_i > u$ and $X_j > u$ are approximately independent when the threshold level u is high enough and the time points i and j are far apart.

Many physical phenomena satisfy this property. In our wind speed example it means that a high wind today might influence the probability of an extreme wind tomorrow, perhaps because both are due to the passage of the same storm, but it is unlikely to influence the probability of an extreme wind in one month's time.

The following condition formalizes the notion of extreme events being near-independent if they are sufficiently distant in time.

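$D(u_n)$ condition. A stationary series $X_1, X_2, \ldots$ is said to satisfy the $D(u_n)$ condition if, for all $i_1 < \cdots < i_p < j_1 < \cdots < j_q$ with $j_1 - i_p > k$,

$$\left| \Pr\{X_{i_1} \le u_n, \ldots, X_{i_p} \le u_n, X_{j_1} \le u_n, \ldots, X_{j_q} \le u_n\} - \Pr\{X_{i_1} \le u_n, \ldots, X_{i_p} \le u_n\} \Pr\{X_{j_1} \le u_n, \ldots, X_{j_q} \le u_n\} \right| \le \alpha(n, k),$$

where $\alpha(n, k_n) \to 0$ for some sequence $k_n$ satisfying $k_n / n \to 0$ as $n \to \infty$.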

Observe that for independent sequences the difference is always 0. For the results of the next section, the condition needs to hold only for a sequence of thresholds $u_n$ that increases with $n$. This ensures that extreme observations that are sufficiently far apart are almost independent.



4.2 Limit result for the maximum of a stationary process

The following result establishes that the limit distribution of the normalized maximum of a stationary process belongs to the family of generalized extreme value distributions.

Theorem 4. Let $X_1, X_2, \ldots$ be a stationary process and $M_n = \max\{X_1, \ldots, X_n\}$. If there exist sequences of constants $\{a_n > 0\}$ and $\{b_n\}$ such that

$$\Pr\{(M_n - b_n)/a_n \le z\} \to G(z) \quad \text{as } n \to \infty,$$

where $G$ is a non-degenerate distribution function and the $D(u_n)$ condition is satisfied with $u_n = a_n z + b_n$ for every real $z$, then $G$ is a member of the generalized extreme value family of distributions.

Observe that this result implies that when the stationary series has limited long-range

dependence at extreme levels, the maxima follow the same limit laws as in the case of

independent series. Furthermore, there exists a relationship between both distributions.

Theorem 5. Let $X_1, X_2, \ldots$ be a stationary process and $X_1^*, X_2^*, \ldots$ be a sequence of independent variables with the same marginal distribution. Let $M_n = \max\{X_1, \ldots, X_n\}$ and $M_n^* = \max\{X_1^*, \ldots, X_n^*\}$. Under suitable regularity conditions, there exist sequences of constants $\{a_n > 0\}$ and $\{b_n\}$ such that

$$\Pr\{(M_n^* - b_n)/a_n \le z\} \to G_1(z) \ \text{as } n \to \infty \quad \text{if and only if} \quad \Pr\{(M_n - b_n)/a_n \le z\} \to G_2(z) \ \text{as } n \to \infty,$$

where $G_2(z) = G_1^{\theta}(z)$ for some constant $\theta$ with $0 < \theta \le 1$.

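From this relationship it is easy to see that both limit distributions have the same shape parameter $\xi$. If $G_1$ is a GEV distribution with parameters $(\mu, \sigma, \xi)$, then $G_2 = G_1^{\theta}$ is a GEV distribution with parameters $(\mu^*, \sigma^*, \xi)$, where

when $\xi \ne 0$: $\mu^* = \mu - \frac{\sigma}{\xi}\left(1 - \theta^{\xi}\right)$ and $\sigma^* = \sigma \theta^{\xi}$

when $\xi = 0$: $\mu^* = \mu + \sigma \log \theta$ and $\sigma^* = \sigma$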
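The relationship between the two sets of parameters can be checked numerically. The sketch below is an illustration only: the parameter values and the function names `gev_cdf` and `stationary_gev_parameters` are hypothetical. It evaluates $G_1^{\theta}$ directly and compares it with a GEV distribution function using $\mu^* = \mu - (\sigma/\xi)(1 - \theta^{\xi})$ and $\sigma^* = \sigma\theta^{\xi}$ (or the corresponding expressions for $\xi = 0$).

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function; xi == 0 is treated as the Gumbel case."""
    if xi == 0.0:
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower end point (xi > 0)
        # or above the upper end point (xi < 0).
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def stationary_gev_parameters(mu, sigma, xi, theta):
    """Location and scale of G1**theta when G1 is GEV(mu, sigma, xi)."""
    if xi == 0.0:
        return mu + sigma * math.log(theta), sigma
    return mu - (sigma / xi) * (1.0 - theta ** xi), sigma * theta ** xi

# Hypothetical parameter values, chosen only for illustration.
theta = 0.5
max_diff = 0.0
for mu, sigma, xi in [(100.0, 20.0, -0.1), (100.0, 20.0, 0.0), (100.0, 20.0, 0.2)]:
    mu_s, sigma_s = stationary_gev_parameters(mu, sigma, xi, theta)
    for z in range(60, 251, 10):
        lhs = gev_cdf(z, mu, sigma, xi) ** theta   # G1(z)**theta
        rhs = gev_cdf(z, mu_s, sigma_s, xi)        # GEV with starred parameters
        max_diff = max(max_diff, abs(lhs - rhs))

print(max_diff)  # expected to be at the level of rounding error
```

Because $\theta \le 1$, the starred location is never larger than $\mu$: clustering lowers the distribution of the maximum relative to the independent case.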



The quantity $\theta$ is called the extremal index. This index can be interpreted in terms of the propensity of the process to cluster at extreme levels. Loosely speaking, we have

$$\theta = (\text{limiting mean cluster size})^{-1},$$

where the limit is taken over clusters of exceedances of increasingly high thresholds.

4.3 Models for block maxima.

When the $D(u_n)$ condition is satisfied, the distribution of the block maxima falls in the same family of distributions as if the series were independent. This means that the dependence in the data can be ignored and the block maxima can be modelled as before, when we assumed independence. The only caveat is that, because $M_n$ has similar statistical properties to $M^*_{n\theta}$ (the maximum of $n\theta$ independent observations), the quality of the GEV family as an approximation to the distribution of block maxima is diminished: the effective number of independent observations in each block is $n\theta \le n$.

4.4 Threshold models.

The generalized Pareto distribution remains appropriate for threshold excesses but the

methodology needs to be adapted because the extremes have some tendency to cluster,

violating the assumption of independence among the individual excesses.

The most commonly used method for dealing with the problem of dependent exceedances in the threshold exceedance model is declustering. This process filters the dependent observations to obtain as a result a set of threshold excesses that are approximately independent.

[Figure: threshold exceedances grouped into clusters C1 to C6, with the gap between clusters marked.]



The steps for the analysis of a stationary series by the threshold method are the following:

• Use an empirical rule to define clusters of exceedances. For example, exceedances of the threshold belong to the same cluster if the runs of observations below the threshold that separate them are shorter than a certain value k. In the figure there are six clusters of exceedances when the length of the gap marked in the figure is taken as the value of k.

• Identify the maximum excess within each cluster.

• Assume that the cluster maxima are independent, with conditional excess distribution given by the generalized Pareto distribution.

• Fit the generalized Pareto distribution to the cluster maxima.
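The first two steps above can be sketched as follows. This is a minimal illustration of the runs rule as described in the text, not the implementation used by any particular package; the function name `decluster_runs` and the toy series are hypothetical.

```python
def decluster_runs(x, u, r):
    """Group exceedances of u into clusters using the runs rule.

    Two exceedances belong to the same cluster if fewer than r
    consecutive observations at or below u separate them.
    Returns a list of clusters, each a list of (index, value) pairs.
    """
    clusters = []
    current = []
    below = 0  # length of the current run of non-exceedances
    for i, v in enumerate(x):
        if v > u:
            if current and below >= r:
                clusters.append(current)  # the gap was long enough: close the cluster
                current = []
            current.append((i, v))
            below = 0
        else:
            below += 1
    if current:
        clusters.append(current)
    return clusters

# Toy series (hypothetical values), threshold u = 4 and run length r = 2.
x = [1, 5, 6, 1, 7, 1, 1, 1, 8, 9, 1]
clusters = decluster_runs(x, u=4, r=2)
cluster_maxima = [max(v for _, v in c) for c in clusters]
excesses = [m - 4 for m in cluster_maxima]  # these would be passed to the GPD fit
print(len(clusters), cluster_maxima, excesses)
```

In the toy series the exceedances 5, 6 and 7 form one cluster because only one observation below the threshold separates 6 from 7, while 8 and 9 form a second cluster after a run of three low values.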

Return level. The return level associated with the probability 1/m is

$$x_m = u + \frac{\sigma}{\xi}\left[(m \delta_u \theta)^{\xi} - 1\right],$$

where $\sigma$ and $\xi$ are the parameters of the generalized Pareto distribution of the threshold excesses, $\delta_u$ is the probability of an exceedance of $u$, and $\theta$ is the extremal index.

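Denoting the total number of observations by $n$, the number of exceedances of the threshold $u$ by $n_u$ and the number of clusters obtained above $u$ by $n_c$, the parameters $\delta_u$ and $\theta$ are estimated as

$$\hat{\delta}_u = \frac{n_u}{n} \quad \text{and} \quad \hat{\theta} = \frac{n_c}{n_u}.$$

$\hat{\delta}_u$ and $\hat{\theta}$ are the maximum likelihood estimators of $\delta_u$ and $\theta$, respectively. Then, when the parameters of the generalized Pareto distribution, $\xi$ and $\sigma$, are estimated by the maximum likelihood method,

$$\hat{x}_m = u + \frac{\hat{\sigma}}{\hat{\xi}}\left[(m \hat{\delta}_u \hat{\theta})^{\hat{\xi}} - 1\right]$$

is the maximum likelihood estimator of the return level $x_m$.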
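The estimators above translate directly into a few lines of code. The following sketch uses hypothetical counts and GPD parameters, chosen only to show the arithmetic; the $\xi = 0$ branch uses the limiting form $u + \sigma \log(m \delta_u \theta)$, which is a standard limit and not stated in the text.

```python
import math

def return_level(m, u, sigma, xi, delta_u, theta):
    """m-observation return level for declustered threshold excesses."""
    rate = m * delta_u * theta  # expected number of clusters in m observations
    if xi == 0.0:
        return u + sigma * math.log(rate)  # limiting form as xi -> 0
    return u + (sigma / xi) * (rate ** xi - 1.0)

# Hypothetical counts: n observations, n_u exceedances, n_c clusters.
n, n_u, n_c = 10000, 200, 80
delta_hat = n_u / n    # estimated probability of exceeding u
theta_hat = n_c / n_u  # estimated extremal index

# Hypothetical threshold and GPD estimates.
u, sigma_hat, xi_hat = 10.0, 2.0, 0.1
x_1000 = return_level(1000, u, sigma_hat, xi_hat, delta_hat, theta_hat)
x_5000 = return_level(5000, u, sigma_hat, xi_hat, delta_hat, theta_hat)
print(delta_hat, theta_hat, x_1000, x_5000)
```

Note that $m \hat{\delta}_u \hat{\theta} = m\, n_c / n$, so the return level depends on the data only through the rate of clusters and the fitted GPD parameters.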



4.5 Case study. Hourly average wind data from Schiphol in the Netherlands.

We analyse the full Schiphol series using the threshold approach for stationary processes. The descriptive statistics for the data are

Descriptive statistics for the hourly data

N               489504.0
mean                53.2
Std.Dev.            30.9
min                  0.0
Q1                  31.0
median              48.0
Q3                  71.0
max                280.0
missing values       0.0

[Figure: histogram of DatosSchiphol$UP (vertical axis: frequency) and plot of the complete series (horizontal axis: Index; vertical axis: speed m/s).]



The previous figures show the histogram, the plot of the complete series and a zoom of a part of this series.

Declustering the data. The first step of the analysis is to decluster the data in order to eliminate the dependence among neighbouring extreme observations. The procedure assumes that exceedances belong to the same cluster if they are separated by fewer than 'r' (run length) values below a given threshold. In our case we choose 170 as the threshold and 96 as the run length (that is, 4 days).



As a result, 222 clusters are obtained above the threshold 170.

[1] "Declustering ..."

[1] "declustering performed for:"

[1] "UP and assigned to UP.u170r96dc"

[1] "222 clusters using threshold of 170 and r = 96"

A new data vector is generated with the same length as the original data vector, but with the maxima from each cluster followed by 'filler' numbers that are below the given threshold, 'u'.

The following threshold analysis is carried out on this new data vector.



Fitting declustered data to a generalized Pareto distribution

We fit a GPD to the declustered data using 190 as the threshold level. Observe that the number of observations per year is 8766, which corresponds to 365.25 × 24.

We obtain the following results.

An initial estimate of the parameters by the L-moments method:

L-moments estimates for (stationary) GPD are:

scale: 23.63865 shape: -0.1041125

These L-moments estimators were used as initial parameter estimates.

A hypothesis test for $\xi = 0$:

Likelihood ratio test (5% level) for xi=0 does not reject Exponential hypothesis.

likelihood ratio statistic is 1.435923 < 3.841459 1 df chi-square critical value.

p-value for likelihood-ratio test is 0.2308003
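The p-value reported above can be reproduced from the likelihood ratio statistic alone. The sketch below assumes only that the statistic is compared with a chi-square distribution with 1 degree of freedom, as the output states; for 1 degree of freedom the upper tail probability equals erfc(sqrt(x/2)), so no statistics library is needed. The function name `chi2_sf_1df` is hypothetical.

```python
import math

def chi2_sf_1df(x):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

lr_statistic = 1.435923      # likelihood ratio statistic from the output
critical_value = 3.841459    # 5% critical value from the output

p_value = chi2_sf_1df(lr_statistic)
reject_exponential = lr_statistic > critical_value
print(p_value, reject_exponential)
```

Since the statistic is below the critical value, the exponential model ($\xi = 0$) is not rejected at the 5% level, in agreement with the output.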

Information about exceedances

[1] "Threshold = 190"

[1] "Number of exceedances of threshold = 83"

[1] "Exceedance rate (per year)= 1.4863"
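The exceedance rate per year follows from the counts already given: 489504 hourly observations, 8766 observations per year and 83 exceedances of the threshold 190. The following sketch only redoes this arithmetic; it assumes that the rate is the number of exceedances divided by the length of the record in years, which is consistent with the reported value.

```python
n_obs = 489504               # number of hourly observations (descriptive statistics)
obs_per_year = 365.25 * 24   # 8766 observations per year
n_exceed = 83                # declustered exceedances of the threshold 190

years = n_obs / obs_per_year          # length of the record in years
rate_per_year = n_exceed / years      # exceedances per year
print(round(years, 2), round(rate_per_year, 3))
```

The record covers a little under 56 years, so the 83 cluster maxima above 190 correspond to about 1.49 exceedances per year.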



The results for the maximum likelihood estimation

[1] "Maximum Likelihood Estimates:"

MLE Std. Err.

Scale (sigma): 24.60 3.766

Shape (xi): -0.1477 0.1079

[1] "Negative log-likelihood: 336.5808"

And the estimate of the variance-covariance matrix:

Parameter covariance:
           [,1]        [,2]
[1,] 14.1831957 -0.32382041
[2,] -0.3238204  0.01164461
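The standard errors in the table of estimates are the square roots of the diagonal of this matrix, which can be verified directly. The sketch below also computes the correlation between the two estimates; this is an extra quantity that is not part of the reported output.

```python
import math

# Variance-covariance matrix of (scale, shape) as reported above.
cov = [[14.1831957, -0.32382041],
       [-0.3238204, 0.01164461]]

se_scale = math.sqrt(cov[0][0])  # standard error of the scale estimate
se_shape = math.sqrt(cov[1][1])  # standard error of the shape estimate
corr = cov[0][1] / (se_scale * se_shape)  # correlation between the estimates
print(round(se_scale, 3), round(se_shape, 4), round(corr, 2))
```

The negative correlation is typical of GPD fits: a larger estimated scale tends to be compensated by a smaller estimated shape.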

Diagnostic plots show a reasonably good fit.



Graphical tools for determining the threshold

Mean residual plot

Linearity of parameter estimations



Estimation of return levels

We use the profile likelihood method to obtain confidence intervals for the shape parameter ξ and for the 50-year return level.

The package extRemes of R provides the following result:

Estmating CIs for GPD 50-yr. return level and shape parameter (xi).


Estimated 50-yr. return level = 268.4197

Estimated (MLE) shape parameter = -0.1477

50-year return level: 95% confidence interval approximately

(255.396, 305.700)

shape parameter (xi): 95% confidence interval approximately

(-0.32207, 0.11549)

Changing the confidence level to 90% we obtain these intervals:

50-year return level: 90% confidence interval approximately

(256.975, 296.137)

shape parameter (xi): 90% confidence interval approximately

(-0.29791, 0.06547)
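The point estimate of the 50-year return level can be recovered approximately from the quantities reported earlier. The sketch below assumes that the return level formula of Section 4.4 is applied with $m \delta_u \theta$ replaced by the expected number of declustered exceedances in 50 years (50 times the yearly exceedance rate); whether the package computes it in exactly this way is an assumption.

```python
u = 190.0               # threshold used for the GPD fit
sigma_hat = 24.60       # maximum likelihood estimate of the scale
xi_hat = -0.1477        # maximum likelihood estimate of the shape
rate_per_year = 1.4863  # exceedance rate per year
years = 50

expected_exceedances = years * rate_per_year
x_50 = u + (sigma_hat / xi_hat) * (expected_exceedances ** xi_hat - 1.0)
print(round(x_50, 1))
```

The result should land close to the reported estimate of 268.4; small differences are to be expected because the inputs are rounded.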



5. BIBLIOGRAPHY

1. Castillo, E.; Hadi, A. S.; Balakrishnan, N. and Sarabia, J. M. (2005). Extreme Value and Related Models with Applications in Engineering and Science. Wiley, New Jersey.

2. Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values. Springer, London.

3. Gilleland, E.; Katz, R. and Young, G. (2004). extRemes: Extreme value toolkit. R package version 1.59. http://www.assessment.ucar.edu/toolkit/

4. Ferro, C. A. T. and Segers, J. (2003). Inference for clusters of extreme values. Journal of the Royal Statistical Society B 65, 545-556.

5. Gross, J.; Heckert, A.; Lechner, J.; Simiu, E. (1994). Novel extreme value estimation procedures: application to extreme wind data. In J. Galambos et al. (eds.), Extreme Value Theory and Applications, 139-158. Kluwer Academic Publishers, Netherlands.

6. Perrin, O.; Rootzén, H.; Taesler, R. (2006). A discussion of statistical methods used to estimate extreme wind speeds. Theor. Appl. Climatol. 85, 203-215.

7. R Development Core Team (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

8. Reiss, R. D.; Thomas, M. (2007). Statistical Analysis of Extreme Values, with Applications to Insurance, Finance, Hydrology and Other Fields. Birkhäuser.

9. Sacré, C.; Moisselin, J. M.; Sabre, M.; Flori, J. P.; Dubuisson, B. (2007). A new statistical approach to extreme wind speeds in France. Journal of Wind Engineering

10. Sanabria, L. A.; Cechet, R. P. (2007). A Statistical Model of Severe Winds. Geoscience Australia Record 2007/12, 60 p. ISBN 978 1 921236 43 3.

11. Simiu, E. (2002). Meteorological extremes. Encyclopedia of Environmetrics, Volume 3, Abdel H. El-Shaarawi and Walter W. Piegorsch, eds., 1255-1259. John Wiley & Sons, Ltd, Chichester, United Kingdom.
