an application of extreme value theory for measuring financial

An Application of Extreme Value Theory for

Measuring Financial Risk 1

Manfred Gilli a,∗, Evis Kellezi b,2,

aDepartment of Econometrics, University of Geneva and FAMEbMirabaud & Cie, Boulevard du Theatre 3, 1204 Geneva, Switzerland

Abstract

Assessing the probability of rare and extreme events is an important issue in therisk management of financial portfolios. Extreme value theory provides the solidfundamentals needed for the statistical modelling of such events and the computa-tion of extreme risk measures. The focus of the paper is on the use of extreme valuetheory to compute tail risk measures and the related confidence intervals, applyingit to several major stock market indices.

Key words: Extreme Value Theory, Generalized Pareto Distribution, GeneralizedExtreme Value Distribution, Quantile Estimation, Risk Measures, MaximumLikelihood Estimation, Profile Likelihood Confidence Intervals

1 Introduction

The last years have been characterized by significant instabilities in financialmarkets worldwide. This has led to numerous criticisms about the existing risk

∗ Corresponding author: Department of Econometrics, University of Geneva, Bddu Pont d’Arve 40, 1211 Geneva 4, Switzerland. Tel.: + 41 22 379 8222; fax: +41 22 379 8299.

Email addresses: [email protected] (Manfred Gilli),[email protected] (Evis Kellezi).1 The original publication is available at www.springerlink.com (DOI:10.1007/s10614-006-9025-7).2 Supported by the Swiss National Science Foundation (project 12–52481.97). Weare grateful to three anonymous referees for corrections and comments and thankElion Jani and Agim Xhaja for their suggestions.

Article published in Computational Economics 27(1), 2006, 1–23.

management systems and motivated the search for more appropriate metho-dologies able to cope with rare events that have heavy consequences.

The typical question one would like to answer is: “If things go wrong, howwrong can they go? ” The problem is then how to model the rare phenomenathat lie outside the range of available observations. In such a situation it seemsessential to rely on a well founded methodology. Extreme value theory (EVT)provides a firm theoretical foundation on which we can build statistical modelsdescribing extreme events.

In many fields of modern science, engineering and insurance, extreme valuetheory is well established (see e.g. Embrechts et al. (1999), Reiss and Thomas(1997)). Recently, numerous research studies have analyzed the extreme vari-ations that financial markets are subject to, mostly because of currency crises,stock market crashes and large credit defaults. The tail behaviour of finan-cial series has, among others, been discussed in Koedijk et al. (1990), Da-corogna et al. (1995), Loretan and Phillips (1994), Longin (1996), Daniels-son and de Vries (2000), Kuan and Webber (1998), Straetmans (1998), Mc-Neil (1999), Jondeau and Rockinger (1999), Rootzen and Kluppelberg (1999),Neftci (2000), McNeil and Frey (2000) and Gencay et al. (2003b). An interest-ing discussion about the potential of extreme value theory in risk managementis given in Diebold et al. (1998).

This paper deals with the behavior of the tails of financial series. More specif-ically, the focus is on the use of extreme value theory to compute tail riskmeasures and the related confidence intervals.

Section 2 presents the definitions of the risk measures we consider in this pa-per. Section 3 reviews the fundamental results of extreme value theory usedto model the distributions underlying the risk measures. In Section 4, a prac-tical application is presented where six major developed market indices areanalyzed. In particular, point and interval estimates of the tail risk measuresare computed. Section 5 concludes.

2 Risk Measures

Some of the most frequent questions concerning risk management in financeinvolve extreme quantile estimation. This corresponds to the determinationof the value a given variable exceeds with a given (low) probability. A typicalexample of such measures is the Value-at-Risk (VaR). Other less frequentlyused measures are the expected shortfall (ES) and the return level. Hereafterwe define the risk measures we focus on in the following chapters.

2

Value-at-Risk

Value-at-Risk is generally defined as the capital sufficient to cover, in mostinstances, losses from a portfolio over a holding period of a fixed numberof days. Suppose a random variable X with continuous distribution functionF models losses or negative returns on a certain financial instrument over acertain time horizon. VaRp can then be defined as the p-th quantile of thedistribution F

VaRp = F−1(1− p), (1)

where F−1 is the so called quantile function 3 defined as the inverse of thedistribution function F .

For internal risk control purposes, most of the financial firms compute a 5%VaR over a one-day holding period. The Basle accord proposed that VaR for thenext 10 days and p = 1%, based on a historical observation period of at least1 year of daily data, should be computed and then multiplied by the ‘safetyfactor’ 3. The safety factor was introduced because the normal hypothesis forthe profit and loss distribution is widely recognized as unrealistic.

Expected Shortfall

Another informative measure of risk is the expected shortfall (ES) or the tailconditional expectation which estimates the potential size of the loss exceedingVaR. The expected shortfall is defined as the expected size of a loss that exceedsVaRp

ESp = E(X | X > VaRp). (2)

Artzner et al. (1999) argue that expected shortfall, as opposed to Value-at-Risk, is a coherent risk measure.

Return Level

If H is the distribution of the maxima observed over successive non overlappingperiods of equal length, the return level Rk

n = H−1(1− 1k) is the level expected

to be exceeded in one out of k periods of length n. The return level can be usedas a measure of the maximum loss of a portfolio, a rather more conservativemeasure than the Value-at-Risk.

3 More generally a quantile function is defined as the generalized inverse of F :F←(p) = inf{x ∈ R : F (x) ≥ p}, 0 < p < 1.

3

3 Extreme Value Theory

When modelling the maxima of a random variable, extreme value theory playsthe same fundamental role as the central limit theorem plays when modellingsums of random variables. In both cases, the theory tells us what the limitingdistributions are.

Generally there are two related ways of identifying extremes in real data. Letus consider a random variable representing daily losses or returns. The firstapproach considers the maximum the variable takes in successive periods, forexample months or years. These selected observations constitute the extremeevents, also called block (or per period) maxima. In the left panel of Figure 1,the observations X2, X5, X7 and X11 represent the block maxima for fourperiods of three observations each.

1 2 3 4

......................................... ................

........

........

......................

................

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

.

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

...X2•

........

........

........

........

........

........

........

........

........

........

......

........

........

........

........

........

........

........

........

........

.......

........

........

........

........

........

........

........

........

........

........

........

........

...X5•

........

........

........

........

........

........

..

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

.X7•

........

........

........

........

........

........

........

........

........

........

........

........

........

........

....

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

....

........

........

........

........

........

........

........

........

........

.

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

....X11•

........

........

........

........

........

...

........

........

........

........

........

........

........

........

........

.....

......................................... ................

........

........

......................

................

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

.X1•

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

...X2•

........

........

........

........

........

........

........

........

........

........

......

........

........

........

........

........

........

........

........

........

.......

........

........

........

........

........

........

........

........

........

........

........

........

...

........

........

........

........

........

........

..

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

.X7•

........

........

........

........

........

........

........

........

........

........

........

........

........

........

....X8•

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

....X9•

........

........

........

........

........

........

........

........

........

.

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

........

....X11•

........

........

........

........

........

...

........

........

........

........

........

........

........

........

........

.....

............. ............. ............. ............. ............. ............. ............. ............. ............. ............. ............. ............. ............. ............. ...........u

Fig. 1. Block-maxima (left panel) and excesses over a threshold u (right panel).

The second approach focuses on the realizations exceeding a given (high)threshold. The observations X1, X2, X7, X8, X9 and X11 in the right panel ofFigure 1, all exceed the threshold u and constitute extreme events.

The block maxima method is the traditional method used to analyze data withseasonality as for instance hydrological data. However, the threshold methoduses data more efficiently and, for that reason, seems to become the methodof choice in recent applications.

In the following subsections, the fundamental theoretical results underlyingthe block maxima and the threshold method are presented.

3.1 Distribution of Maxima

The limit law for the block maxima, which we denote by Mn, with n the sizeof the subsample (block), is given by the following theorem:

Theorem 1 (Fisher and Tippett (1928), Gnedenko (1943)) Let (Xn) be a

4

sequence of i.i.d. random variables. If there exist constants cn > 0, dn ∈ Rand some non-degenerate distribution function H such that

Mn − dn

cn

d→ H,

then H belongs to one of the three standard extreme value distributions:

Frechet: Φα(x) =

0, x ≤ 0

e−x−α, x > 0

α > 0,

Weibull: Ψα(x) =

e−(−x)α, x ≤ 0

1, x > 0α > 0,

Gumbel: Λ(x) = e−e−x, x ∈ R.

The shape of the probability density functions for the standard Frechet, Weibulland Gumbel distributions is given in Figure 2.

−1 0 1 2 4

0

0.2

0.4

0.6

.............................................................................................................................................................................................................................................................................................................................................................................................................................

α=1.5

−4 −2 0 1

0

0.2

0.4

0.6

................................................................................

......................................................................................................................................................................................................................................................................................................................................................................................................................................................

α=1.5

−2 −1 0 1 3

0

0.2

0.4

0.6

.....................................................................................................................................................................................................................................................................................................................................................................................................

Fig. 2. Densities for the Frechet, Weibull and Gumbel functions.

We observe that the Frechet distribution has a polynomially decaying tail andtherefore suits well heavy tailed distributions. The exponentially decaying tailsof the Gumbel distribution characterize thin tailed distributions. Finally, theWeibull distribution is the asymptotic distribution of finite endpoint distrib-utions.

Jenkinson (1955) and von Mises (1954) suggested the following one-parameterrepresentation

Hξ(x) =

e−(1+ξx)−1/ξif ξ 6= 0

e−e−xif ξ = 0

(3)

of these three standard distributions, with x such that 1 + ξx > 0. Thisgeneralization, known as the generalized extreme value (GEV) distribution, isobtained by setting ξ = α−1 for the Frechet distribution, ξ = −α−1 for the

5

Weibull distribution and by interpreting the Gumbel distribution as the limitcase for ξ = 0.

As in general we do not know in advance the type of limiting distribution ofthe sample maxima, the generalized representation is particularly useful whenmaximum likelihood estimates have to be computed. Moreover the standardGEV defined in (3) is the limiting distribution of normalized extrema. Giventhat in practice we do not know the true distribution of the returns and, asa result, we do not have any idea about the norming constants cn and dn, weuse the three parameter specification

Hξ,σ,µ(x) = Hξ

(x− µ

σ

)x ∈ D, D =

]−∞, µ− σξ [ ξ < 0

]−∞, ∞[ ξ = 0

]µ− σξ , ∞[ ξ > 0

(4)

of the GEV, which is the limiting distribution of the unnormalized maxima.The two additional parameters µ and σ are the location and the scale para-meters representing the unknown norming constants.

The quantities of interest are not the parameters themselves, but the quantiles,also called return levels, of the estimated GEV:

Rk = H−1ξ,σ,µ(1− 1

k) .

Substituting the parameters ξ, σ and µ by their estimates ξ, σ and µ, we get

Rk =

µ− σξ

(1−

(− log(1− 1

k))−ξ

)ξ 6= 0

µ− σ log(− log(1− 1

k))

ξ = 0

. (5)

A value of R10 of 7 means that the maximum loss observed during a periodof one year will exceed 7% once in ten years on average.

3.2 Distribution of Exceedances

An alternative approach, called the peak over threshold (POT) method, is toconsider the distribution of exceedances over a certain threshold. Our problemis illustrated in Figure 3 where we consider an (unknown) distribution functionF of a random variable X. We are interested in estimating the distributionfunction Fu of values of x above a certain threshold u.

6

0 u xF

1

................................. ................

........

.....................

................

............................................................................................................................................................ ..

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

...

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

.........................................................................................................................................................................................................

............................................

........................................................................................................................

F (x)

x

◦Fu

...........................................

0 xF − u

1

................................. ................

........

.....................

................

.................................................................................................................................................................................

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

...

........

........

........

........

........

........

.........

......................................................................................................................................................................................................................

.......................................................................

.............................................................................................................................................................................

Fu(y)

y

Fig. 3. Distribution function F and conditional distribution function Fu.

The distribution function Fu is called the conditional excess distribution func-tion and is defined as

Fu(y) = P (X − u ≤ y | X > u), 0 ≤ y ≤ xF − u (6)

where X is a random variable, u is a given threshold, y = x−u are the excessesand xF ≤ ∞ is the right endpoint of F . We verify that Fu can be written interms of F , i.e.

Fu(y) =F (u + y)− F (u)

1− F (u)=

F (x)− F (u)

1− F (u). (7)

The realizations of the random variable X lie mainly between 0 and u andtherefore the estimation of F in this interval generally poses no problems. Theestimation of the portion Fu however might be difficult as we have in generalvery little observations in this area.

At this point EVT can prove very helpful as it provides us with a powerfulresult about the conditional excess distribution function which is stated in thefollowing theorem:

Theorem 2 (Pickands (1975), Balkema and de Haan (1974)) For a largeclass of underlying distribution functions F the conditional excess distributionfunction Fu(y), for u large, is well approximated by

Fu(y) ≈ Gξ,σ(y), u →∞,

where

Gξ,σ(y) =

1−(1 + ξ

σy)−1/ξ

if ξ 6= 0

1− e−y/σ if ξ = 0(8)

for y ∈ [0, (xF − u)] if ξ ≥ 0 and y ∈ [0, −σξ] if ξ < 0. Gξ,σ is the so called

generalized Pareto distribution (GPD).

7

If x is defined as x = u + y, the GPD can also be expressed as a function ofx, i.e. Gξ,σ(x) = 1− (1 + ξ(x− u)/σ)−1/ξ.

Figure 4 illustrates the shape of the generalized Pareto distribution Gξ,σ(x)when ξ, called the shape parameter or tail index, takes a negative, a positiveand a zero value. The scaling parameter σ is kept equal to one.

0 2 4 8

0

0.5

1

.......................... ................

.......

...................

.......................................

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

........

........

.........

........

.........

........

........

.........

........

.........

........

.........

........

.........

.........

........

...........................................................................................................................................................................................................................................................................................

y

◦

ξ = −.5

← −σ/ξ

0 2 4 8

0

0.5

1

.......................... ................

.......

...................

.................................................................................................................................................................

........

........

........

........

........

........

.........

........

........

........

........

...........................................................................................................................

.........................................................................................................................................

y

ξ = 0

0 2 4 8

0

0.5

1

.......................... ................

.......

...................

.................................................................................................................................................................

........

........

........

........

........

........

...........................................................................................................................................................

..................................................................................

...........................................

y

ξ = .5

Fig. 4. Shape of the generalized Pareto distribution Gξ,σ for σ = 1.

The tail index ξ gives an indication of the heaviness of the tail, the larger ξ, theheavier the tail. As, in general, one cannot fix an upper bound for financiallosses, only distributions with shape parameter ξ ≥ 0 are suited to modelfinancial return distributions.

Assuming a GPD function for the tail distribution, analytical expressions forVaRp and ESp can be defined as a function of GPD parameters. Isolating F (x)from (7)

F (x) = (1− F (u)) Fu(y) + F (u)

and replacing Fu by the GPD and F (u) by the estimate (n−Nu)/n, where nis the total number of observations and Nu the number of observations abovethe threshold u, we obtain

F (x) = Nu

n

(1−

(1 + ξ

σ(x− u)

)−1/ξ)

+(1− Nu

n

)

which simplifies to

F (x) = 1− Nu

n

(1 + ξ

σ(x− u)

)−1/ξ. (9)

Inverting (9) for a given probability p gives

VaRp = u + σξ

((n

Nup)−ξ − 1

). (10)

Let us rewrite the expected shortfall as

ESp = VaRp + E(X − VaRp | X > VaRp)

8

where the second term on the right is the expected value of the exceedancesover the threshold VaRp. It is known that the mean excess function for theGPD with parameter ξ < 1 is

e(z) = E(X − z | X > z) =σ + ξz

1− ξ, σ + ξz > 0 . (11)

This function gives the average of the excesses of X over varying values of athreshold z. Another important result concerning the existence of moments isthat if X follows a GPD then, for all integers r such that r < 1/ξ, the r firstmoments exist. 4

Similarly, given the definition (2) for the expected shortfall and using expres-sion (11), for z = VaRp − u and X representing the excesses y over u weobtain

ESp = VaRp +σ + ξ(VaRp − u)

1− ξ=

VaRp

1− ξ+

σ − ξu

1− ξ. (12)

4 Application

Our aim is to illustrate the tail distribution estimation of a set of financialseries of daily returns and use the results to quantify the market risk. Table 1gives the list of the financial series considered in our analysis. The illustra-tion focuses mainly on the S&P500 index, providing confidence intervals andgraphical visualization of the estimates, whereas for the other series only pointestimates are reported.

Table 1Data analyzed.Symbol Index name Start End ObservationsES50 Dow Jones Euro Stoxx 50 2–01–87 17–08–04 4555FTSE100 FTSE 100 5–01–84 17–08–04 5215HS Hang Seng 9–01–81 17–08–04 5836Nikkei Nikkei 225 7–01–70 17–08–04 8567SMI Swiss Market Index 5–07–88 17–08–04 4050S&P500 S&P 500 5–01–60 16–08–04 11270

The application has been executed in a Matlab 7.x programming environ-ment. 5 The files with the data and the code can be downloaded from www.

4 See Embrechts et al. (1999), page 165.5 Other software for extreme value analysis can be found at www.math.ethz.ch/∼mcneil/software.html or in Gencay et al. (2003a). Standard numerical or statis-tical software, like for example Matlab, now also provide functions or routines thatcan be used for EVT applications.

9

unige.ch/ses/metri/gilli/evtrm/. Figure 5 shows the plot of the n =11270 observed daily returns of the S&P500 index.

1960 1965 1971 1976 1982 1987 1993 1998 2004

−20

−10

0

10

Fig. 5. Daily returns of the S&P500 index.

We consider both the left and the right tail of the return distribution. Thereason is that the left tail represents losses for an investor with a long positionon the index, whereas the right tail represents losses for an investor beingshort on the index.

As it can be seen from Figure 5, returns exhibit dependence in the secondmoment. McNeil and Frey (2000) propose a two stage method consisting inmodelling the conditional distribution of asset returns against the currentvolatility and then fitting the GPD on the tails of residuals. On the otherside, Danielsson and de Vries (2000) argue that for long time horizons an un-conditional approach is better suited. Indeed, as Christoffersen and Diebold(2000) notice, conditional volatility forecasting is not indicated for multipleday predictions. For a detailed discussion on these issues, including the i.i.d.assumptions, we refer the reader to the above mentioned references, believingthat the choice between conditional and unconditional approaches depends onthe final use of the risk measures and the time horizon considered. For shorttime horizons of the order of several hours or days, and if an automatic up-dating of the parameters is feasible, a conditional approach may be indicated.For longer horizons, a non conditional approach might be justified by the factthat it provides stable estimates through time requiring less frequent updates.

The methodology applying to right tails, in the left tail case we change thesign of the returns so that positive values correspond to losses.

First, we consider the distribution of the block maxima, which allows thedetermination of the return level. Second, we model the exceedances overa given threshold which enables us to estimate high quantiles of the returndistribution and the corresponding expected shortfall.

In both cases we use maximum likelihood estimation, which is one of the mostcommon estimation procedures used in practice. We also compute likelihood-

10

based interval estimates of the parameters and the quantities of interest whichprovide additional information related to the accuracy of the point estimates.These intervals, contrarily to those based on standard errors, do not rely onasymptotic theory results and restrictive assumptions. We expect them tobe more accurate in the case of small sample size. Another advantage of thelikelihood-based approach is the possibility to construct joint confidence inter-vals. The greater computational complexity of the likelihood-based approachis nowadays no longer an obstacle for its use.

4.1 Method of Block Maxima

The application of the method of block maxima goes through the followingsteps: divide the sample in n blocks of equal length, collect the maximumvalue in each block, fit the GEV distribution to the set of maxima and, finally,compute point and interval estimates for Rk

n.

The delicate point of this method is the appropriate choice of the periods defin-ing the blocks. The calendar naturally suggests periods like months, quarters,etc. In order to avoid seasonal effects, we choose yearly periods which are likelyto be sufficiently large for Theorem 1 to hold. The S&P500 data sample hasbeen divided into 45 non-overlapping sub-samples, each of them containingthe daily returns of the successive calendar years. Therefore not all our blocksare of exactly the same length. The maximum return in each of the blocksconstitute the data points for the sample of maxima M which is used to es-timate the generalized extreme value distribution (GEV). Figure 6 plots theyearly maxima for the left and right tails of the S&P500.

1960 1965 1971 1976 1982 1987 1993 1998 2004−30

−20

−10

0

10

Fig. 6. Yearly minima and maxima of the daily returns of the S&P500.

The log-likelihood function that we maximize with respect to the three un-

11

known parameters is

L(ξ, µ, σ; x) =∑

i

log(h(xi)

), xi ∈ M (13)

where

h(ξ, µ, σ; x) =1

σ

(1 + ξ

x− µ

σ

)−1/ξ−1

exp

(−

(1 + ξ

x− µ

σ

)−1/ξ)

is the probability density function if ξ 6= 0 and 1 + ξ x−µσ

> 0. If ξ = 0 thefunction h is

h(ξ, µ, σ; x) =1

σexp

(−x− µ

σ

)exp

(− exp

(−x− µ

σ

)).

In Figure 7, we give the plot of the sample distribution function 6 and thecorresponding fitted GEV distribution. Point and interval estimates for theparameters are given in Table 2.

0 5 10 15 20 250

0.2

0.4

0.6

0.8

1

0 5 10 15 20 250

0.2

0.4

0.6

0.8

1

Fig. 7. Sample distribution (dots) of yearly minima (left panel) and maxima (rightpanel) and corresponding fitted GEV distribution for S&P500.

Interval Estimates

In order to be able to compute interval estimates, 7 it is useful to approachthe quantile estimation problem by directly reparameterizing the GEV dis-tribution as a function of the unknown return level Rk. To achieve this, weisolate µ from equation (5) and substitute it into Hξ,σ,µ defined in (4). The

6 The sample distribution function Fn(xni ) for a set of n observations given in

increasing order xn1 ≤ · · · ≤ xn

n, is defined as Fn(xni ) = i

n , i = 1, . . . , n.7 For a good introduction to likelihood-based statistical inference, see Azzalini(1996).

12

GEV distribution function then becomes

Hξ,σ,Rk(x) =

exp(−

(ξσ (x−Rk) +

(− log(1− 1k )

)−ξ)−1/ξ

)ξ 6= 0

(1− 1k )

exp

(−x−Rk

σ

)ξ = 0

for x ∈ D defined as

D =

]−∞,(Rk − ξ

σ

(− log(1− 1k )

)−ξ)

[ ξ < 0

]−∞, ∞[ ξ = 0

](Rk − ξ

σ

(− log(1− 1k )

)−ξ)

, ∞[ ξ > 0

and we can directly obtain maximum likelihood estimates for Rk. The profilelog-likelihood function can then be used to compute separate or joint confi-dence intervals for each of the parameters. For example, in the case where theparameter of interest is Rk, the profile log-likelihood function will be definedas

L∗(Rk) = maxξ, σ

L(ξ, σ,Rk) .

The confidence interval we then derive includes all values of Rk satisfying thecondition

L∗(Rk)− L(ξ, σ, Rk) > −12χ2

α, 1

where χ2α, 1 refers to the (1−α)–level quantile of the χ2 distribution with 1 de-

gree of freedom. The function L∗(Rk)−L∗(ξ, σ, Rk) is called the relative profilelog-likelihood function and is plotted in the left panel of Figure 8. The pointestimate of 6.41% of R10 is included in the rather large interval (4.74, 11). Asless observations are available for higher quantiles, the interval is asymmetric,indicating more uncertainty for the upper bound of maximum losses.

Sometimes we are also interested in the value of ξ, which characterizes thetail heaviness of the underlying distribution. In this case, a joint confidenceregion on both ξ and R10 is needed. The profile log-likelihood function is thendefined as

L∗(ξ, Rk) = maxσ

L(ξ, σ,Rk),

with the confidence region defined as the contour at the level −12χ2

α, 2 of therelative profile log-likelihood function

L∗(ξ, Rk)− L(ξ, σ, Rk).

In the right panel of Figure 8, we reproduce single and joint confidence re-gions at level 95% for ξ and R10 of the S&P500. In the same graph we also plotthe pairs (R10, ξ) estimated on 1000 bootstrap samples. The joint confidenceregion covers approximately 95% of the bootstrap pairs, indicating that com-puting the joint interval region gives a good idea about the likely values of the

13

parameters. Moreover, we notice that the joint region is significantly differentfrom the one defined by the single confidence intervals.

In order to account for the small sample size, single confidence intervals are alsocomputed with a bias-corrected and accelerated (BCa) bootstrap method. 8 Asa result, for R10, the BCa interval narrows to (5.01, 9.19). Regarding the shapeparameter ξ, the difference is less pronounced. However, in both cases, theintervals clearly indicate a positive value for ξ, which implies that the limitingdistribution of maxima belongs to the Frechet family.

4.74 6.41 11

−1.92

0

R10 R10

ξ

5.01 6.41 9.19 11

0.217

0.53

0.849

ML

BCa

Fig. 8. Left panel: Relative profile log-likelihood and 95% confidence interval forR10 of the left tail. Right panel: Single and joint confidence regions for ξ and R10

at level 95%. Maximum likelihood estimates are marked with the symbol ∗.

The point estimates and the single confidence intervals for the reparameterizedGEV distribution for S&P500 are summarized in Table 2.

Table 2Point estimates and 95% maximum likelihood (ML) and bootstrap (BCa) confidenceintervals for the GEV method applied to S&P500.

Lower bound Point estimate Upper boundBCa ML ML ML BCa

Left tailξ 0.217 0.256 0.530 0.771 0.849σ 0.802 0.815 0.964 1.213 1.188

R10 5.006 4.741 6.411 11.001 9.190Right tail

ξ −.288 −.076 0.100 0.341 0.392σ 0.815 0.836 1.024 1.302 1.312

R10 4.368 4.230 4.981 6.485 5.869

8 For an introduction to bootstrap methods see Efron and Tibshirani (1993) orShao and Tu (1995).

14

For k = 10, we obtain for our data R10 = 6.41, meaning that the maximumloss observed during a period of one year exceeds 6.41% in one out of ten yearson average. In the same way we can derive that a loss of R100 = 21.27 % isexceeded on average only once in a century. Notice that this is very close tothe 87 crash daily loss of 22.90 %.

Table 3 summarizes point estimates for GEV for all six indices. Because ofthe low number of observations for ES50 and SMI the corresponding pointestimates are less reliable. We notice the high value of R10 for the left tail ofHang Seng (HS), twice as big as the next riskiest index.

Table 3Point estimates for the GEV method for six market indices.

ES50 FTSE100 HS Nikkei SMI S&P500

# maxima 18 21 24 35 17 45Left tail

ξ −.301 0.679 0.512 0.251 0.172 0.530σ 1.773 0.705 2.707 1.616 1.563 0.964

R10 7.217 6.323 16.950 8.243 8.341 6.411Right tail

ξ 0.185 0.309 0.179 0.096 −.032 0.100σ 1.252 0.919 1.754 1.518 1.773 1.024

R10 6.366 5.501 8.439 7.159 7.611 4.981

One way to better exploit information about extremes in the data sample isto use the POT method. Coles (2001, p. 81) suggests the estimation of returnlevels using GPD. However, if the data set is large enough, GEV may stillprove useful as it can avoid dealing with data clustering issues, provided thatblocks are sufficiently large. Furthermore, the estimation is simplified as theselection of a threshold u is not needed.

4.2 The Peak Over Threshold Method

The implementation of the peak over threshold method involves the followingsteps: select the threshold u, fit the GPD function to the exceedances overu and then compute point and interval estimates for Value-at-Risk and theexpected shortfall.

Selection of the threshold u

Theory tells us that u should be high in order to satisfy Theorem 2, but thehigher the threshold the less observations are left for the estimation of the

15

parameters of the tail distribution function.

So far, no automatic algorithm with satisfactory performance for the selectionof the threshold u is available. The issue of determining the fraction of databelonging to the tail is treated by Danielsson et al. (2001), Danielsson andde Vries (1997) and Dupuis (1998) among others. However these references donot provide a clear answer to the question of which method should be used.

A graphical tool that is very helpful for the selection of the threshold u is thesample mean excess plot defined by the points

(u, en(u)

), xn

1 < u < xnn , (14)

where en(u) is the sample mean excess function defined as

en(u) =

∑ni=k(x

ni − u)

n− k + 1, k = min{i | xn

i > u},

and n− k + 1 is the number of observations exceeding the threshold u.

The sample mean excess function, which is an estimate of the mean excessfunction e(u) defined in (11), should be linear. This property can be usedas a criterion for the selection of u. Figure 9 shows the sample mean excessplots corresponding to the S&P500 data. From a closer inspection of the plotswe suggest the values u = 2.2 for the threshold of the left tail and u = 1.4for the threshold of the right tail. These values are located at the beginningof a portion of the sample mean excess plot that is roughly linear, leavingrespectively 158 and 614 observations in the tails (see Figure 10).

−5 2.2 50

5

10

15

u1.4 5.57

0

1

2

3

4

5

u

Fig. 9. Sample mean excess plot for the left and right tail determination for S&P500data.

16

1960 1965 1971 1976 1982 1987 1993 1998 2004

−20

−10

0

10

Fig. 10. Exceedances of daily returns of the S&P500 index.

Maximum Likelihood Estimation

Given the theoretical results presented in the previous section, we know thatthe distribution of the observations above the threshold in the tail shouldbe a generalized Pareto distribution (GPD). Different methods can be usedto estimate the parameters of the GPD. 9 In the following we describe themaximum likelihood estimation method.

For a sample y = {y1, . . . , yn} the log-likelihood function L(ξ, σ|y) for theGPD is the logarithm of the joint density of the n observations

L(ξ, σ | y) =

−n log σ −

(1ξ

+ 1) ∑n

i=1 log(1 + ξ

σyi

)if ξ 6= 0

−n log σ − 1σ

∑ni=1 yi if ξ = 0.

We compute the values ξ and σ that maximize the log-likelihood functionfor the sample defined by the observations exceeding the threshold u. Weobtain the estimates ξ = 0.388 and σ = 0.545 for the left tail exceedances andξ = 0.137 and σ = 0.579 for the right tail. Figure 11 shows how GPD fits toexceedances of the left and right tails of the S&P500. Clearly the left tail isheavier than the right one. This can also be seen from the estimated value ofthe shape parameter ξ which is positive in both cases, but higher in the lefttail case.

High quantiles and expected shortfall may be directly read in the plot orcomputed from equations (10) and (12) where we replace the parameters bytheir estimates. For instance, for p = 0.01 we can compute VaR 0.01 = 2.397 and

9 These are the maximum likelihood estimation, the method of moments, themethod of probability-weighted moments and the elemental percentile method. Forcomparisons and detailed discussions about their use for fitting the GPD to data,see Hosking and Wallis (1987), Grimshaw (1993), Tajvidi (1996a), Tajvidi (1996b)and Castillo and Hadi (1997).

17

2.4 10 22.90.99

0.992

0.994

0.996

0.998

1

2.5 3.35 5 8.710.99

0.992

0.994

0.996

0.998

1

Fig. 11. Left panel: GPD fitted to the 158 left tail exceedances above the thresh-old u = 2.2. Right panel: GPD fitted to the 614 right tail exceedances above thethreshold u = 1.4.

ES 0.01 = 3.412 (VaR 0.01 = 2.505, ES 0.01 = 3.351 for the right tail). We observethat, with respect to the right tail, the left tail has a lower VaR but a higherES which illustrates the importance to go beyond a simple VaR calculation.

Interval Estimates

Again, we consider single and joint confidence intervals, based on the profilelog-likelihood functions. Log-likelihood based confidence intervals for VaRp canbe obtained by using a reparameterized version of GPD defined as a functionof ξ and VaRp:

Gξ,VaRp(y) =

1−(

1 +(

nNu

p)−ξ −1

VaRp−u y

)− 1ξ

ξ 6= 0

1− nNu

p exp( yVaRp−u) ξ = 0

.

The corresponding probability density function is

gξ,VaRp(y) =

(n

Nup)−ξ−1

ξ(VaRp−u)

(1 +

(n

Nup)−ξ −1

VaRp−u y

)− 1ξ−1

ξ 6= 0

−n

Nup exp( y

VaRp−u)

VaRp−u ξ = 0

.

Similarly, using the following reparameterization for ξ 6= 0

Gξ,ESp = 1−1 +

ξ +(

nNu

p

)−ξ − 1

(ESp − u)(1− ξ)y

− 1ξ

,

gξ,ESp =ξ +

(n

Nup

)−ξ − 1

ξ(1− ξ)(ESp − u)

1 +

ξ +(

nNu

p

)−ξ − 1

(ESp − u)(1− ξ)y

− 1ξ−1

,

18

we compute a log-likelihood based confidence interval for the expected shortfallESp. Figures 12–13 show the likelihood based confidence regions for the lefttail VaR and ES of S&P500 obtained by using these reparameterized versionsof GPD.

2.356 2.397 2.447

−1.921

0

VaR0.01

VaR0.01

ξ

2.361 2.397 2.432

0.216

0.388

0.587

MLBCa

Fig. 12. Left panel: Relative profile log-likelihood function and confidence intervalfor VaR0.01. Right panel: Single and joint confidence intervals at level 95% for ξ andVaR0.01. Dots represent 1000 bootstrap estimates from S&P500 data.

3.147 3.412 4.017

−1.921

0

ES0.01

ES0.01

ξ

3.14 3.412 3.852

0.218

0.388

0.582

ML

BCa

Fig. 13. Left panel: Relative profile log-likelihood function and confidence intervalfor ES0.01. Right panel: Single and joint confidence intervals at level 95% for ξ andES0.01. Dots represent 1000 bootstrap estimates from S&P500 data.

In the same figures we show the single bias-corrected and accelerated boot-strap confidence intervals. We also plot the pairs (ξi, VaR0.01,i), (ξi, ES0.01,i),i = 1, . . . , 1000 estimated from 1000 resampled data sets. We observe thatabout 5% lie outside the 95% joint confidence region based on likelihood(which is not the case for the single intervals). Again this shows the inter-est of considering joint confidence intervals.

The maximum likelihood (ML) point estimates, the maximum likelihood andthe BCa bootstrap confidence intervals for ξ, σ, VaR0.01 and ES0.01 for both tails

19

of the S&P500 are summarized in Table 4.

Table 4Point estimates and 95% maximum likelihood (ML) and bootstrap (BCa) confidenceintervals for the S&P500.

Lower bound Point estimate Upper boundBCa ML ML ML BCa

Left tailξ 0.221 0.245 0.388 0.585 0.588σ 0.442 0.444 0.545 0.671 0.651

VaR 0.01 2.361 2.356 2.397 2.447 2.432ES 0.01 3.140 3.147 3.412 4.017 3.852

Right tailξ 0.068 0.075 0.137 0.212 0.220σ 0.520 0.530 0.579 0.634 0.638

VaR 0.01 2.427 2.411 2.505 2.609 2.591ES 0.01 3.182 3.151 3.351 3.634 3.569

The results in Table 4 indicate that, with probability 0.01, the tomorrow’s losson a long position will exceed the value 2.397% and that the correspondingexpected loss, that is the average loss in situations where the losses exceed2.397%, is 3.412%.

It is interesting to note that the upper bound of the confidence interval forthe parameter ξ is such that the first order moment is finite (1/0.671 > 1).This guarantees that the estimated expected shortfall, which is a conditionalfirst moment, exists for both tails.

The point estimates for all the six market indices are reported in Table 5.Similarly to the S&P500 case the left tail is heavier than the right one for allindices. Looking at estimated VaR and ES values, we observe that Hang Seng(HS) and DJ Euro Stoxx 50 (ES50) are the most exposed to extreme losses,followed by Nikkei and the Swiss Market Index (SMI). The less exposed indicesare S&P500 and FTSE 100.

Regarding the right tail Hang Seng is again the most exposed to daily extrememoves and S&P500 and FTSE100 are the least exposed.

20

Table 5Point estimates for the POT method for six market indices.

ES50 FTSE100 HS Nikkei SMI S&P500

Left tailξ 0.045 0.232 0.298 0.181 0.264 0.388σ 1.102 0.656 1.395 0.810 0.772 0.545

VaR 0.01 3.819 2.862 5.146 3.435 3.347 2.397ES 0.01 5.057 4.103 8.046 4.697 4.880 3.412

Right tailξ 0.113 0.093 0.156 0.165 0.185 0.137σ 0.953 0.711 1.102 0.912 0.764 0.579

VaR 0.01 3.517 2.707 4.581 3.316 3.078 2.505ES 0.01 4.785 3.562 6.258 4.470 4.259 3.351

5 Concluding Remarks

We have illustrated how extreme value theory can be used to model tail-related risk measures such as Value-at-Risk, expected shortfall and returnlevel, applying it to daily log-returns on six market indices.

Our conclusion is that EVT can be useful for assessing the size of extremeevents. From a practical point of view this problem can be approached indifferent ways, depending on data availability and frequency, the desired timehorizon and the level of complexity one is willing to introduce in the model.One can choose to use a conditional or an unconditional approach, the BMMor the POT method, and finally rely on point or interval estimates.

In our application, the POT method proved superior as it better exploitsthe information in the data sample. Being interested in long term behaviorrather than in short term forecasting, we favored an unconditional approach.Finally, we find it is worthwhile computing interval estimates as they provideadditional information about the quality of the model fit.

References

Artzner, P., Delbaen, F., Eber, J.-M., and Heath, D. (1999). Coherent mea-sures of risk. Mathematical Finance, 9(3):203–228.

Azzalini, A. (1996). Statistical Inference Based on the Likelihood. Chapmanand Hall, London.

Balkema, A. A. and de Haan, L. (1974). Residual life time at great age. Annalsof Probability, 2:792–804.

21

Castillo, E. and Hadi, A. (1997). Fitting the Generalized Pareto Distributionto Data. Journal of the American Statistical Association, 92(440):1609–1620.

Christoffersen, P. and Diebold, F. (2000). How relevant is volatility forecastingfor financial risk management. Review of Economics and Statistics, 82:1–11.

Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values.Springer.

Dacorogna, M. M., Muller, U. A., Pictet, O. V., and de Vries, C. G. (1995).The distribution of extremal foreign exchange rate returns in extremelylarge data sets. Preprint, O&A Research Group.

Danielsson, J. and de Vries, C. (1997). Beyond the Sample: Extreme Quantileand Probability Estimation. Mimeo.

Danielsson, J. and de Vries, C. (2000). Value-at-Risk and Extreme Returns.Annales d’Economie et de Statistique, 60:239–270.

Danielsson, J., de Vries, C., de Haan, L., and Peng, L. (2001). Using a Boot-strap Method to Choose the Sample Fraction in Tail Index Estimation.Journal of Multivariate Analysis, 76(2):226–248.

Diebold, F. X., Schuermann, T., and Stroughair, J. D. (1998). Pitfalls andopportunities in the use of extreme value theory in risk management. InRefenes, A.-P., Burgess, A., and Moody, J., editors, Decision Technologiesfor Computational Finance, pages 3–12. Kluwer Academic Publishers.

Dupuis, D. J. (1998). Exceedances over high thresholds: A guide to thresholdselection. Extremes, 1(3):251–261.

Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap.Chapman & Hall, New York.

Embrechts, P., Kluppelberg, C., and Mikosch, T. (1999). Modelling ExtremalEvents for Insurance and Finance. Applications of Mathematics. Springer.2nd ed.(1st ed., 1997).

Fisher, R. and Tippett, L. H. C. (1928). Limiting forms of the frequencydistribution of largest or smallest member of a sample. Proceedings of theCambridge philosophical society, 24:180–190.

Gencay, R., Selcuk, F., and Ulugulyagci, A. (2003a). EVIM: a software packagefor extreme value analysis in Matlab. Studies in Nonlinear Dynamics andEconometrics, 5:213–239.

Gencay, R., Selcuk, F., and Ulugulyagci, A. (2003b). High volatility, thick tailsand extreme value theory in value-at-risk estimation. Insurance: Mathemat-ics and Economics, 33:337–356.

Gnedenko, B. V. (1943). Sur la distribution limite du terme d’une seriealeatoire. Annals of Mathematics, 44:423–453.

Grimshaw, S. (1993). Computing the Maximum Likelihood Estimates for theGeneralized Pareto Distribution to Data. Technometrics, 35(2):185–191.

Hosking, J. R. M. and Wallis, J. R. (1987). Parameter and quantile estimationfor the generalised Pareto distribution. Technometrics, 29:339–349.

Jenkinson, A. F. (1955). The frequency distribution of the annual maximum(minimum) values of meteorological events. Quarterly Journal of the Royal

22

Meteorological Society, 81:158–172.Jondeau, E. and Rockinger, M. (1999). The tail behavior of stock returns:

Emerging versus mature markets. Mimeo, HEC and Banque de France.Koedijk, K. G., Schafgans, M., and de Vries, C. (1990). The Tail Index of

Exchange Rate Returns. Journal of International Economics, 29:93–108.Kuan, C. H. and Webber, N. (1998). Valuing Interest Rate Derivatives Con-

sistent with a Volatility Smile. Working Paper, University of Warwick.Longin, F. M. (1996). The Assymptotic Distribution of Extreme Stock Market

Returns. Journal of Business, 69:383–408.Loretan, M. and Phillips, P. (1994). Testing the covariance stationarity of

heavy-tailed time series. Journal of Empirical Finance, 1(2):211–248.McNeil, A. J. (1999). Extreme value theory for risk managers. In Internal

Modelling and CAD II, pages 93–113. RISK Books.McNeil, A. J. and Frey, R. (2000). Estimation of tail-related risk measures for

heteroscedastic financial time series: an extreme value approach. Journal ofEmpirical Finance, 7(3–4):271–300.

Neftci, S. N. (2000). Value at risk calculations, extreme events, and tail esti-mation. Journal of Derivatives, pages 23–37.

Pickands, J. I. (1975). Statistical inference using extreme value order statistics.Annals of Statististics, 3:119–131.

Reiss, R. D. and Thomas, M. (1997). Statistical Analysis of Extreme Va-lues with Applications to Insurance, Finance, Hydrology and Other Fields.Birkhauser Verlag, Basel.

Rootzen, H. and Kluppelberg, C. (1999). A single number can’t hedge againsteconomic catastrophes. Ambio, 28(6):550–555. Preprint.

Shao, J. and Tu, D. (1995). The Jackknife and Bootstrap. Springer Series inStatistics. Springer.

Straetmans, S. (1998). Extreme financial returns and their comovements.Ph.D. Thesis, Tinbergen Institute Research Series, Erasmus University Rot-terdam.

Tajvidi, N. (1996a). Confidence Intervals and Accuracy Estimation for Heavy-tailed Generalized Pareto Distribution. Thesis article, Chalmers Universityof Technology. www.maths.lth.se/matstat/staff/nader/.

Tajvidi, N. (1996b). Design and Implementation of Statistical Computationsfor Generalized Pareto Distributions. Technical Report, Chalmers Univer-sity of Technology. www.maths.lth.se/matstat/staff/nader/.

von Mises, R. (1954). La distribution de la plus grande de n valeurs. InSelected Papers, Volume II, pages 271–294. American Mathematical Society,Providence, RI.

23

an application of extreme value theory for measuring financial

Documents