
Econometric Theory, 26, 2010, 1398–1436. doi:10.1017/S0266466609990624

ON TAIL INDEX ESTIMATION FOR DEPENDENT, HETEROGENEOUS DATA

JONATHAN B. HILL
University of North Carolina–Chapel Hill

In this paper we analyze the asymptotic properties of the popular distribution tail index estimator by Hill (1975) for dependent, heterogeneous processes. We develop new extremal dependence measures that characterize a massive array of linear, nonlinear, and conditional volatility processes with long or short memory. We prove that the Hill estimator is weakly and uniformly weakly consistent for processes with extremes that form mixingale sequences and asymptotically normal for processes with extremes that are near epoch dependent (NED) on some arbitrary mixing functional. The extremal persistence assumptions in this paper are known to hold for mixing, $L_p$-NED, and some non-$L_p$-NED processes, including ARFIMA, FIGARCH, explosive GARCH, nonlinear ARMA-GARCH, and bilinear processes, and nonlinear distributed lags like random coefficient and regime-switching autoregressions.

Finally, we deliver a simple nonparametric estimator of the asymptotic variance of the Hill estimator and prove consistency for processes with NED extremes.

1. INTRODUCTION

This paper develops an asymptotic theory for the popular distribution tail index estimator due to B.M. Hill (1975) under general conditions. Many time series in finance, macroeconomics, and meteorology exhibit extreme values that appear to cluster (Leadbetter, Lindgren, and Rootzen, 1983; Embrechts, Kluppelberg, and Mikosch, 1997). In order to deliver a Gaussian limit theory that is robust to the nature of persistence and heterogeneity in extremes, we introduce new extremal dependence measures and develop an associated weak and uniform limit theory for dependent, heterogeneous tail arrays.

Denote by $\{X_t\} = \{X_t : -\infty < t < \infty\}$ a stochastic process on some probability measure space, write $F_t(x) := P(X_t \le x)$, and assume $F_t$ has support on $[0,\infty)$. Assume $\bar{F}_t(x) := P(X_t > x)$ is regularly varying at $\infty$: for all $\lambda > 0$ and each $t$,

$$\lim_{x\to\infty} \frac{\bar{F}_t(\lambda x)}{\bar{F}_t(x)} = \lambda^{-\alpha}, \tag{1}$$

I would like to thank Enno Mammen for insight into the extremal dependence properties developed here, Oliver Linton for discussions concerning the strong-GARCH case, and Holger Drees for comments on an earlier version. In particular, I kindly thank three anonymous referees, co-editor Bruce Hansen, and editor Peter C.B. Phillips for expert commentary that led to substantial improvements. All errors, of course, are mine alone. Address correspondence to Jonathan B. Hill, Dept. of Economics, University of North Carolina–Chapel Hill; e-mail: [email protected].

© Cambridge University Press 2010 0266-4666/10 $15.00


where $\alpha > 0$ denotes the index of regular variation. Equivalently,

$$\bar{F}_t(x) = x^{-\alpha} L(x), \quad x > 0, \quad \text{where } L(x) \text{ is slowly varying.} \tag{2}$$

The distribution class (2) includes the domain of attraction of the stable laws, coincides with the maximum domain of attraction of the extreme value distributions $\exp\{-x^{-\alpha}\}$, and characterizes the tails of many stochastic recurrence equations, including GARCH processes. See Bingham, Goldie, and Teugels (1987), Resnick (1987), and Basrak, Davis, and Mikosch (2002).

Let $X_{(i)} > 0$ denote the $i$th order statistic of a sample path $\{X_t\}_{t=1}^{n}$ with sample size $n \ge 1$, $X_{(1)} \ge X_{(2)} \ge \cdots \ge X_{(n)}$, and let $\{m_n\}$ be an intermediate order sequence: $1 \le m_n < n$, $m_n \to \infty$ as $n \to \infty$, and $m_n = o(n)$. B.M. Hill's (1975) estimator of $\alpha^{-1}$ is simply the average log-exceedance

$$\hat{\alpha}^{-1}_{m_n} := \frac{1}{m_n}\sum_{t=1}^{n}\left(\ln\left(X_t/X_{(m_n+1)}\right)\right)_+ = \frac{1}{m_n}\sum_{i=1}^{m_n}\ln\left(X_{(i)}/X_{(m_n+1)}\right),$$

where $(z)_+ := \max\{z,0\}$. The so-called Hill estimator has been used pervasively in the applied finance, economics, statistics, and telecommunications literatures. Consider Akgiray and Booth (1988), Cheng and Rachev (1995), Quintos, Fan, and Phillips (2001), Resnick and Rootzen (2000), Chan, Deng, Peng, and Xia (2007), and Hill (2008b), to name a few. For alternative estimation techniques consult Pickands (1975), Smith (1987), Rootzen, Leadbetter, and de Haan (1990), Smith and Weissman (1994), Drees, Ferreira, and de Haan (2004), Csorgo and Viharos (1995), Beirlant, Dierckx, and Guillou (2005), and Iglesias and Linton (2008).
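As a minimal sketch of the average log-exceedance formula, the following computes the estimator of $\alpha^{-1}$ from the $m$ largest order statistics. The function name `hill_inverse_alpha`, the i.i.d. Pareto test data, and the choices $n = 20000$, $m = 500$ are illustrative assumptions, not part of the paper.

```python
import math
import random

def hill_inverse_alpha(sample, m):
    """Hill's estimator of 1/alpha from the m largest order statistics."""
    n = len(sample)
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    ordered = sorted(sample, reverse=True)   # X_(1) >= X_(2) >= ... >= X_(n)
    threshold = ordered[m]                   # X_(m+1), stored at 0-based index m
    if threshold <= 0:
        raise ValueError("X_(m+1) must be positive")
    # Average log-exceedance over the m observations above the threshold.
    return sum(math.log(x / threshold) for x in ordered[:m]) / m

# Illustration only: i.i.d. Pareto data with alpha = 2, so 1/alpha = 0.5.
rng = random.Random(0)
alpha = 2.0
data = [rng.paretovariate(alpha) for _ in range(20000)]
estimate = hill_inverse_alpha(data, m=500)
```

Sorting once and indexing the $(m+1)$th largest value keeps the two equivalent sums in the definition identical, since observations at or below the threshold contribute zero to the positive part.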

We are interested in the asymptotic properties of $\hat{\alpha}^{-1}_{m_n}$ under minimal but verifiable conditions. Asymptotic normality has been established for i.i.d., strong mixing, and $l$-dependent approximable sequences including GARCH(1,1) processes; and consistency was shown for $l$-dependent approximable sequences, infinite order moving averages, bilinear, ARCH(1), and stochastic recurrence equations (e.g., GARCH). See Mason (1982), Hall (1982), Davis and Resnick (1984), Hall and Welsh (1984), Haeusler and Teugels (1985), Rootzen et al. (1990), Hsing (1991, 1993), Resnick and Starica (1995, 1998), de Haan and Resnick (1998), and Quintos et al. (2001).

Hsing (1991) develops an asymptotic theory under remarkably general conditions and proves asymptotic normality for strong mixing processes. Sufficient conditions include restrictions on tail decay (2) and the existence of probability and distribution limits for nonlinear tail arrays based on $\{X_t\}$ (see Section 2). It is not obvious whether such limit theory holds beyond the strong mixing case and $\{m_n\}$ is intimately tied to tail decay.

Mixing properties are convenient because functions of mixing random variables are mixing, and a well-established limit theory exists (e.g., Ibragimov and Linnik, 1971). Nevertheless, it is typically difficult to verify a mixing condition, and many time series are not mixing, or are mixing only under harsh conditions. Infinite order distributed lags, for example, need not be mixing due to density smoothness requirements, including ARFIMA, nonlinear ARMA-GARCH, and some long memory processes (see Gorodetskii, 1977; Andrews, 1984; Guegan and Ladoucette, 2001; Carrasco and Chen, 2002; and Wu, 2005).

The near epoch dependence (NED) property (Ibragimov, 1962; Ibragimov and Linnik, 1971; Gallant and White, 1988), however, has substantial practical advantages because it only requires computation of a conditional expectation, it is typically easy to verify, it carries over to a large class of functions of NED random variables, and powerful central limit theory is available (Davidson, 1992; de Jong, 1997). NED characterizes any mixing process, infinite order distributed lags of a mixing process, and many nonmixing processes, since density smoothness is irrelevant (Davidson, 1994, 2004). McLeish's (1975) broader mixingale concept is advantageous for theoretical reasons: Processes that are NED on a mixing process form mixingale sequences that satisfy useful inequalities and laws of large numbers, and mixingales decompose to martingale differences for which central limit theory is available. A related conditional moment-based concept, $L_p$-weak dependence, and associated central limit theory are treated in Wu (2005) and Wu and Min (2005). NED and $L_p$-weak dependence appear to cover many of the same processes, where neither seems to dominate the other.

In a purely extreme value theoretic environment, however, the analyst may not want to commit to superfluous assumptions involving nonextremes. Leadbetter (1974) and Leadbetter et al. (1983) provide some relief with a so-called D-mixing property for serial extremes, but the property does not necessarily carry over to arbitrary functions of D-mixing random variables (see Section 2).

Further, there are no details in the literature on how to characterize the asymptotic variance of $\hat{\alpha}^{-1}_{m_n}$ in general without specifying a parametric model or exploiting independence or a mixing property (e.g., Hall, 1982; Hsing, 1991).

In Section 2 we control for memory and heterogeneity in extremes by introducing extremal versions of mixingale and NED properties. By exploiting primitive results in Hsing (1991), we prove in Section 3 that $\hat{\alpha}^{-1}_{m_n}$ and the intermediate order statistic $X_{(m_n+1)}$ are weakly and uniformly weakly consistent by assuming that extremes of $\{X_t\}$ form mixingale arrays and delivering new uniform laws for tail arrays. See Hall and Welsh (1985) for uniform consistency of the Hill estimator for i.i.d. data and Smith (1982) for uniform convergence of sample maxima of i.i.d. data. We then prove that $\hat{\alpha}^{-1}_{m_n}$ is asymptotically normal when $\{X_t\}$ has extremes that are NED on a mixing functional of some arbitrary process $\{\epsilon_t\}$.

The generality afforded by an extremal version of NED is important if we wish to analyze $X_t$ itself, rather than a prefiltered series based on a possibly misspecified model or a filter that erodes information reflecting tail shape.¹ The property characterizes a massive array of stochastic processes, including any geometrically mixing process (e.g., nonlinear GARCH with sufficiently smoothly distributed errors), both $L_p$-NED (e.g., ARFIMA, stationary GARCH) and non-$L_p$-NED (e.g., explosive GARCH) processes where underlying errors are only required to be $L_p$-bounded, as well as bilinear processes, and random coefficient and regime-switching autoregressions.


Finally, in Section 4 we develop a nonparametric kernel estimator of the asymptotic variance of $\hat{\alpha}^{-1}_{m_n}$ and prove consistency for processes with NED extremes. As far as we know this is the first of its kind in the extreme value theory literature. An underlying structure that may affect the parametric form of the limiting variance need not be specified (e.g., ARFIMA, GARCH, regime switching). Nevertheless, the asymptotic variance in the i.i.d. case, $\alpha^{-2}$, may hold for nonidentically distributed weakly orthogonal processes, including stochastic volatility (Hill 2008a).²

In related work, Quintos et al. (2001) also work with results due to Hsing (1991). They deliver a functional Gaussian limit for $\hat{\alpha}^{-1}_{m_n}$ for GARCH(1,1) processes by extending Hsing's (1991, Cor. 3.3) proof of asymptotic normality for tail mixing data. See Section 2.2 below for a definition of tail mixing. Quintos et al. (2001) use the theory to deliver a unique structural break test with respect to the tail index. Although their approach undoubtedly extends to other processes case by case, their arguments closely exploit GARCH model dynamics and rely on a case-dependent semiparametric construction of the asymptotic variance (cf. Hsing, 1991). By comparison we do not require stationarity in general, and our results cover GARCH, IGARCH, explosive GARCH, nonlinear GARCH (e.g., quadratic GARCH), and much more. Similarly, we do not require any information on the asymptotic variance other than existence in order to deliver the consistent kernel estimator. See, also, Hill (2009a) for functional limit theory for D-valued, dependent heterogeneous tail arrays of the same broad class of processes covered here.

Appendix A contains proofs of the main results, Appendix B contains preliminary results, and Appendix C compiles variable definitions for quick reference.

We employ the following notation conventions: $\xrightarrow{p}$ denotes convergence in probability, $\xrightarrow{a.s.}$ almost sure convergence, and $\Longrightarrow$ convergence in distribution; $[x]$ denotes the integer part of $x$; $K > 0$ denotes an arbitrary finite constant whose value may change from line to line; $\iota > 0$ is an arbitrarily tiny constant whose value may change; and $x_n \sim y_n$ implies $x_n/y_n \to 1$.

2. EXTREMAL DEPENDENCE

Assume $\bar{F}_t(x)/\bar{F}_t(x-) \to 1$ uniformly in $t \in \mathbb{Z}$ such that there exists a sequence of positive real numbers $\{b_{m_n}\}_{n\ge 1}$ satisfying (e.g., Leadbetter et al., 1983, Thm. 1.7.13)

$$\frac{n}{m_n} P\left(X_t > b_{m_n}\right) \to 1. \tag{3}$$

We implicitly assume $\{m_n, b_{m_n}\}$ satisfy (3) for all $t$. Intuitively, $b_{m_n}$ estimates the intermediate order statistic $X_{(m_n+1)}$, since $P(X_t > b_{m_n}) \sim m_n/n$ and $1/n \sum_{t=1}^{n} I(X_t > X_{(m_n+1)}) \sim m_n/n$ by construction.
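The link between the deterministic threshold and the intermediate order statistic can be checked numerically. The sketch below assumes an exact Pareto tail $P(X > x) = x^{-\alpha}$ for $x \ge 1$, for which the relation in (3) can be solved with equality as $b_{m_n} = (n/m_n)^{1/\alpha}$; the function name `pareto_threshold` and the sample settings are hypothetical choices for illustration.

```python
import random

def pareto_threshold(n, m, alpha):
    """Solve (n/m) * P(X > b) = 1 when P(X > x) = x**(-alpha) for x >= 1."""
    return (n / m) ** (1.0 / alpha)

rng = random.Random(1)
n, m, alpha = 50000, 500, 2.0
sample = sorted((rng.paretovariate(alpha) for _ in range(n)), reverse=True)

b = pareto_threshold(n, m, alpha)               # deterministic threshold b_{m_n}
order_stat = sample[m]                          # intermediate order statistic X_(m+1)
exceed_share = sum(x > b for x in sample) / n   # empirical P(X_t > b_{m_n})
```

Under these assumptions `order_stat / b` and `exceed_share * n / m` should both be close to one, which is the sense in which $b_{m_n}$ stands in for $X_{(m_n+1)}$.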

Hsing (1991, Thms. 2.2, 2.4) proves under a mild second order constraint on tail decay (2) that asymptotics concerning $\hat{\alpha}^{-1}_{m_n}$ are grounded on triangular tail arrays based on tail exceedances and events:

$$\left\{\left(\ln\left(X_t/b_{m_n}\right)\right)_+ ,\; I\left(X_t > b_{m_n}e^{u}\right) : 1 \le t \le n\right\}_{n\ge 1}.$$

Hsing (1991, Thm. 3.3) then imposes a mixing property on $\{(\ln(X_t/b_{m_n}))_+, I(X_t > b_{m_n}e^{u})\}$ to prove $\hat{\alpha}^{-1}_{m_n}$ is asymptotically normal. We impose new tail dependence properties on $\{I(X_t > b_{m_n}e^{u})\}$ that cover and substantially generalize Hsing's mixing condition.

2.1. Extremal Mixingale and Extremal NED

Let $\{\Im_{n,t}\} = \{\Im_{n,t} : 1 \le t \le n\}_{n\ge 1}$ be an increasing triangular array of $\sigma$-fields induced by some arbitrary, possibly vector-valued stochastic array $\{E_{n,t}\} = \{E_{n,t} : 1 \le t \le n\}_{n\ge 1}$:

$$\Im_{n,t} := \sigma\left(E_{n,\tau} : 1 \le \tau \le t\right).$$

Since the objects of interest $\{(\ln(X_t/b_{m_n}))_+, I(X_t > b_{m_n}e^{u})\}$ are tail arrays dependent on the sample size, we restrict information to sample time periods $t \in \{1,\ldots,n\}$. By convention, $\Im_{n,s}^{t} = \{\emptyset,\Omega\}$ if $t \le 0$ or $s > n$, hence $\Im_{n,s}^{t} = \Im_{n,1}^{t} = \Im_{n,-\infty}^{t}$ if $s \le 0$, $\Im_{n,s}^{t} = \Im_{n,s}^{n} = \Im_{n,s}^{+\infty}$ if $t \ge n$, and $\Im_{n,s} \subset \Im_{n,t}$ $\forall\, 1 \le s < t \le n$.

Consider two extremal dependence properties for $\{X_t\}$ that characterize how well information induced from $\{E_{n,t}\}$ can be used to predict extreme values of $\{X_t : 1 \le t \le n\}$ as $n \to \infty$. Throughout, $\{q_n\}$ denotes an arbitrary sequence of integer displacements satisfying $1 \le q_n < n$, and $q_n \to \infty$.

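Property 1. $L_p$-E-MIXL. $\{X_t, \Im_{n,t}\}$ forms an $L_p$-extremal mixingale array, $p > 0$, with size $\lambda > 0$ if

$$\left\| P\left(X_t > b_{m_n}e^{u}\right) - P\left(X_t > b_{m_n}e^{u} \mid \Im_{n,t-q_n}\right)\right\|_p \le e_{n,t}(u) \times \varphi_{q_n}$$

$$\left\| I\left(X_t > b_{m_n}e^{u}\right) - P\left(X_t > b_{m_n}e^{u} \mid \Im_{n,t+q_n}\right)\right\|_p \le e_{n,t}(u) \times \varphi_{q_n+1},$$

where $e_{n,t} : \mathbb{R}_+ \to \mathbb{R}_+$ is Lebesgue measurable, $\sup_{1\le t\le n}\sup_{u\ge 0} e_{n,t}(u) = O((m_n/n)^{1/p})$, and $\varphi_{q_n} = o(q_n^{-\lambda})$.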

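Property 2. $L_p$-E-NED. $\{X_t\}$ is $L_p$-extremal NED on $\{\Im_{n,t}\}$, $p > 0$, with size $\lambda > 0$ if

$$\left\| I\left(X_t > b_{m_n}e^{u}\right) - P\left(X_t > b_{m_n}e^{u} \mid \Im_{n,t-q_n}^{t+q_n}\right)\right\|_p \le f_{n,t}(u) \times \psi_{q_n},$$

where $f_{n,t} : \mathbb{R}_+ \to \mathbb{R}_+$ is Lebesgue measurable, $\sup_{1\le t\le n}\sup_{u\ge 0} f_{n,t}(u) = O((m_n/n)^{1/p})$, and $\psi_{q_n} = o(q_n^{-\lambda})$.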

Remark 1. In the spirit of conventional mixingale and NED definitions, the "constants" $e_{n,t}(u)$ and $f_{n,t}(u)$ permit time dependence in the $L_p$-norm and allow the "coefficients" $\varphi_{q_n}$ and $\psi_{q_n}$ to be scale independent. Thus, without loss of generality, assume

$$\sup_{n\ge 1}\left\{\varphi_{q_n}, \psi_{q_n}\right\} \in [0,1).$$

We say $\{X_t\}$ is geometrically $L_p$-E-NED if $\psi_{q_n} = o(\rho^{q_n})$ for some $\rho \in (0,1)$, in which case size $\lambda > 0$ is arbitrary.

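Remark 2. $L_p$-E-NED and $L_p$-E-MIXL are simply NED and mixingale properties assigned to $\{I(X_t > b_{m_n}e^{u})\}$, with adjustments to scale since $I(X_t > b_{m_n}e^{u})$ is asymptotically degenerate. For example, after multiplying out terms and invoking the law of iterated expectations, the $L_2$-E-NED property implies

$$\lim_{n\to\infty} \frac{n}{m_n} q_n^{2\lambda} \sup_{1\le t\le n}\sup_{u\ge 0}\left(P\left(X_t > b_{m_n}e^{u}\right) - E\left[P\left(X_t > b_{m_n}e^{u} \mid \Im_{n,t-q_n}^{t+q_n}\right)^2\right]\right) = 0,$$

and since $(n/m_n)P(X_t > b_{m_n}e^{u}) \to e^{-\alpha u}$ for all $t$ under (1)–(3),

$$\lim_{n\to\infty} q_n^{2\lambda} \sup_{1\le t\le n}\sup_{u\ge 0}\left(e^{-\alpha u} - \frac{n}{m_n} E\left[P\left(X_t > b_{m_n}e^{u} \mid \Im_{n,t-q_n}^{t+q_n}\right)^2\right]\right) = 0.$$

Literally, $\{X_t\}$ is $L_2$-E-NED on $\{\Im_{n,t}\}$ when also $(n/m_n)E[P(X_t > b_{m_n}e^{u} \mid \Im_{n,t-q_n}^{t+q_n})^2] \to e^{-\alpha u}$ sufficiently fast. Thus, $f_{n,t}(u) = O((m_n/n)^{1/2})$ ensures the norm does not collapse to zero simply due to degeneracy associated with the tail fractile (or "bandwidth") $m_n \to \infty$ and $m_n = o(n)$, as opposed to (near epoch) dependence.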

Remark 3. We exploit a displacement sequence $\{q_n\}$ rather than fixed $q$ due to the degenerate nature of $I(X_t > b_{m_n}e^{u})$. Unless $X_t$ is $l$-dependent for finite $l$ or the base $E_{n,t}$ is independent, in general $q_n \to \infty$ must be satisfied to be able to discern degeneracy from the ability to use $\{E_{n,\tau}\}_{\tau=t-q_n}^{t+q_n}$ to predict $I(X_t > b_{m_n}e^{u})$. See comments following the proof of Lemma 2 in Appendix A. Displacement sequences have been exploited by Leadbetter (1974), Leadbetter et al. (1983), Hsing (1991, 1993) and Davis and Hsing (1995) for tail mixing properties, and de Jong (1997) for mixingale arguments associated with Bernstein block arrays. See, e.g., Ibragimov and Linnik (1971), McLeish (1975), and Gallant and White (1988) for traditional usage of fixed $q$.

Remark 4. If $\Im_{n,t}$ is adapted to $X_t$ or simply $I(X_t > b_{m_n}e^{u})$, then E-NED is trivial: $I(X_t > b_{m_n}e^{u}) - P(X_t > b_{m_n}e^{u} \mid \Im_{n,t-q_n}^{t+q_n}) = I(X_t > b_{m_n}e^{u}) - I(X_t > b_{m_n}e^{u}) = 0$, hence size is arbitrary.

Remark 5. A process $\{X_t\}$ is $L_p$-E-NED with size $\lambda$ if and only if it is $L_s$-E-NED with size $\lambda p/\max\{p,s\}$ for any $s \ne p$ since $|I(X_t > b_{m_n}e^{u}) - P(X_t > b_{m_n}e^{u} \mid \Im_{n,t-q_n}^{t+q_n})| \le 1$ a.s. (see Hill, 2008c). But this suggests $p$ is irrelevant, since $L_p$-E-NED is equivalent to $L_s$-E-NED. It is nevertheless convenient to assume that $\{X_t\}$ is $L_2$-E-NED to ensure both exceedance and event processes $\{(\ln(X_t/b_{m_n}))_+, I(X_t > b_{m_n}e^{u})\}$ have the same memory property, since the two form the stochastic basis of $\hat{\alpha}^{-1}_{m_n}$.

2.2. Functional Mixing

In the E-MIXL and E-NED definitions, the $\sigma$-fields $\{\Im_{n,t}\}$ are induced by some triangular array $\{E_{n,t}\}$. We restrict persistence in $E_{n,t}$ by imposing a mixing condition. Assume $\{E_{n,t}\}$ is a possibly vector-valued functional of some process $\{\epsilon_t\}$ with $\sigma$-field

$$\mathcal{G}_t = \sigma(\epsilon_\tau : \tau \le t) \quad\text{and}\quad \mathcal{G}_s^t = \sigma(\epsilon_\tau : s \le \tau \le t), \quad\text{where } \Im_{n,t} \subseteq \mathcal{G}_t.$$

Let $E_{n,t} = 0$ for $t \notin \{1,\ldots,n\}$, and the remaining $E_{n,t}$ may, for example, be some lag or lags of $\epsilon_t$ or of the extreme event $I(\epsilon_t > a_{n,t})$, peak over threshold $(\epsilon_t - a_{n,t})_+$, or extreme value $\epsilon_t I(\epsilon_t > a_{n,t})$, each for some triangular array $\{a_{n,t}\}$ of constants, $a_{n,t} \to \infty$ as $n \to \infty$. Because nonsample $E_{n,t}$'s are constants, the associated $\sigma$-fields are trivial: $\Im_{n,s}^{t} = \{\emptyset,\Omega\}$ if $t \le 0$ or $s > n$.

The generality behind $E_{n,t}$ is not vacuous, since $\epsilon_t$ may be the innovations in a parametric model like strong-GARCH, or simply $\epsilon_t = X_t$. In the former case $\epsilon_t$ is i.i.d., so any functional $E_{n,t}$ of $\epsilon_t$ is trivially mixing. In the latter case, since under mild conditions $\hat{\alpha}^{-1}_{m_n}$ is grounded on $\{(\ln(X_t/b_{m_n}))_+, I(X_t > b_{m_n}e^{u})\}$, we may assume $E_{n,t} = I(\epsilon_t > b_{m_n}e^{u})$ and impose a mixing condition on $E_{n,t}$ as in Hsing (1991).

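Now define mixing coefficients, where $\{q_n\}$ again denotes a sequence of integer displacements, $1 \le q_n < n$ and $q_n \to \infty$:

$$\varepsilon_{n,q_n} := \sup_{A\in\Im_{n,-\infty}^{t},\, B\in\Im_{n,t+q_n}^{+\infty} :\, t\in\mathbb{Z}} \left|P(A\cap B) - P(A)P(B)\right|$$

$$\varpi_{n,q_n} := \sup_{A\in\Im_{n,-\infty}^{t},\, B\in\Im_{n,t+q_n}^{+\infty} :\, t\in\mathbb{Z}} \left|P(B \mid A) - P(B)\right|.$$

F-Mixing. If $(n/m_n)q_n^{\lambda}\varepsilon_{n,q_n} \to 0$ as $n \to \infty$ we say $\{\epsilon_t\}$ is functional-strong mixing with size $\lambda > 0$. If $(n/m_n)q_n^{\lambda}\varpi_{n,q_n} \to 0$ as $n \to \infty$ we say $\{\epsilon_t\}$ is functional-uniform mixing with size $\lambda > 0$.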

Remark 6. F-mixing on $\{\epsilon_t\}$ is simply mixing assigned to the triangular array $\{E_{n,t}\}$. There are, therefore, many variations on this concept. If, for example, $E_{n,t} = I(\epsilon_t > b_{m_n}e^{u})$ and $(n/m_n)q_n^{\lambda}\varepsilon_{n,q_n} \to 0$, we might say $\{\epsilon_t\}$ is extremal-strong mixing since tail events mix asymptotically.

Remark 7. The coefficients $\varepsilon_{n,q_n}$ and $\varpi_{n,q_n}$ intrinsically depend on sample size $n$ due to the triangular array nature of $\Im_{n,t}$, similar to the E-MIXL and E-NED constants $e_{n,t}(u)$ and $f_{n,t}(u)$. Mixing conditions applied to triangular arrays have a range of applications in the dependence and limit theory literatures (e.g., Andrews, 1985), in particular for sample-size dependent extremal arrays (Leadbetter, 1974; Leadbetter et al., 1983).


Remark 8. The scale $n/m_n \to \infty$ is required in general, since we use F-mixing $\{\epsilon_t\}$ as an E-NED base, and E-NED characterizes degenerate $I(X_t > b_{m_n}e^{u})$. Thus, $q_n \to \infty$ must also hold since, for example, $(n/m_n)q^{\lambda}\varepsilon_{n,q} \to \infty$ is possible unless $\varepsilon_{n,q} = 0$ uniformly in $n$ and $q$ (e.g., $E_{n,t}$ is independent). See especially the proof of Lemma 2. In general there is much room for interpretation, since $q_n \to \infty$ is otherwise arbitrary. By $\sigma$-field dominance $\Im_{n,t} \subseteq \mathcal{G}_t$, for example, it is easy to show that a strong mixing process $\{\epsilon_t\}$ of size 2 satisfies $q_n^2\varepsilon_{n,q_n} \to 0$ as $n \to \infty$. Now put $q_n = [n/m_n]$ and note that

$$\lim_{n\to\infty} [n/m_n]^2\, \varepsilon_{n,[n/m_n]} = \lim_{n\to\infty} (n/m_n)\, q_n\, \varepsilon_{n,q_n} = 0$$

implies F-strong mixing of size 1. Hill (2009b, Lem. C.1) shows that asymptotically infinite order lags of F-mixing random variables are F-mixing, and standard inequalities apply, as in Ibragimov (1962) and Serfling (1968).

Remark 9. By the construction of $\{\Im_{n,t}\}$, note identically

$$\varepsilon_{n,q_n} = \sup_{A\in\Im_{n,1}^{t},\, B\in\Im_{n,t+q_n}^{n} :\, 1\le t\le n-q_n} \left|P(A\cap B) - P(A)P(B)\right|$$

are Hsing's (1991, p. 1555) mixing coefficients. Using our notation, Hsing (1991) only considers the case $\epsilon_t = X_t$, $E_{n,t} = [(\ln(X_t/b_{m_n}))_+, I(X_t > b_{m_n}e^{u})]'$, $q_n = o(n)$, and $(n/q_n)\varepsilon_{n,q_n} \to 0$ to prove that $\hat{\alpha}^{-1}_{m_n}$ is asymptotically normal. Since Hsing's displacement $q_n = o(n)$ is otherwise arbitrary, suppose $q_n = m_n^{a}$ for some $a \in (0,1)$. Then $(n/q_n)\varepsilon_{n,q_n} = (n/m_n)q_n^{(1-a)/a}\varepsilon_{n,q_n} \to 0$ implies F-strong mixing of size $(1-a)/a \in (0,\infty)$.

Remark 10. F-strong mixing is also a generalized, uniform version of Leadbetter's (1974) D-mixing concept (cf. Leadbetter et al., 1983). For any triangular array $\{\epsilon_t : 1 \le t \le n\}_{n\ge 1}$ and any sequence of integers $1 \le t_1 < \cdots < t_{p_1} < s_1 < \cdots < s_{p_2} \le n$ for which $s_1 - t_{p_1} > q_n \to \infty$, define

$$\delta_{q_n} := \left|F_{t_1,\ldots,t_{p_1};\, s_1,\ldots,s_{p_2}}(a_n) - F_{t_1,\ldots,t_{p_1}}(a_n)\,F_{s_1,\ldots,s_{p_2}}(a_n)\right|,$$

where $F_{t_1,\ldots,t_{p_1}}(a_n) := P(\epsilon_{t_1} \le a_{n,t_1}, \ldots, \epsilon_{t_{p_1}} \le a_{n,t_{p_1}})$, $\{a_{n,t}\}$ is some deterministic array where $a_{n,t} \to \infty$ as $n \to \infty$, and $p_1$ and $p_2$ are arbitrary positive integers. Then $\{\epsilon_t\}$ is D-mixing if $\delta_{q_n} \to 0$ as $n \to \infty$. D-mixing implies joint independence of the events $\{\epsilon_i \le a_{n,i}\}_{i=1}^{t}$ and $\{\epsilon_i \le a_{n,i}\}_{i=t+q_n}^{n}$ as $n \to \infty$, and strong mixing implies D-mixing. If $\epsilon_t$ is F-strong mixing with respect to $E_{n,t} = I(\epsilon_t \le a_{n,t})$, then $\epsilon_t$ is necessarily D-mixing, since $\delta_{q_n} \le \varepsilon_{n,q_n}$ due to the sup-operator in $\varepsilon_{n,q_n}$. In this case D-mixing is a weaker condition, but D-mixing does not necessarily carry over to finite measurable functions of D-mixing random variables, while asymptotically infinite order lag functions of F-mixing random variables are F-mixing. In this regard F-mixing has a superlative advantage that we exploit in the proof of asymptotic normality of $\hat{\alpha}^{-1}_{m_n}$.


The following examples of F-mixing and E-NED processes are verified in Section 5:

Example 1 (Finite dependence)
Let $\{y_t\}$ be a one-sided $l$-dependent process for finite $l \in \mathbb{N}$. Then $X_t := |y_t|$ is trivially F-strong mixing with arbitrary size, since $\varepsilon_{n,q_n} = 0$ $\forall q_n \ge l$. If the E-NED base is simply $X_t$ itself, and $E_{n,t} = X_t$ for $t = 1,\ldots,n$ and 0 otherwise, then $\{X_t\}$ is $L_2$-E-NED on $\{\Im_{n,t}\}$ where E-NED and F-mixing sizes are arbitrary.

Example 2 (Strong mixing GARCH)
Let $y_t = h_t u_t$, where $u_t$ is i.i.d. and $h_t^2$ is stationary, geometrically strong mixing, and measurable with respect to $\sigma(y_\tau : \tau \le t-1)$. Examples include linear and nonlinear GARCH processes. See Carrasco and Chen (2002) and Meitz and Saikkonen (2008) for sufficient conditions for geometric strong mixing in GARCH processes. Define $X_t := |y_t|$ and let $E_{n,t} = X_t$ for $t \in \{1,\ldots,n\}$ and 0 otherwise. Then $\{X_t\}$ is geometrically F-strong mixing by Lemma C.1 in Hill (2009b), and since $\{\Im_{n,t}\}$ is adapted to $\{X_t\}$, the E-NED property is trivial: $\{X_t\}$ is geometrically $L_2$-E-NED on $\{\Im_{n,t}\}$ where E-NED and F-mixing sizes are arbitrary.

Example 3 (Hsing's mixing)
Strong mixing is far stronger than actually required. Let $E_{n,t} = I(X_t > b_{m_n}e^{u})$ for $t \in \{1,\ldots,n\}$ and 0 otherwise, and assume F-strong mixing coefficients $\varepsilon_{n,q_n}$ satisfy $(n/q_n)\varepsilon_{n,q_n} \to 0$, where $q_n = m_n^{a}$ for any $a \in (0,1]$. Then $\{X_t\}$ satisfies Hsing's (1991, p. 1555) mixing condition by Remark 9. But $\Im_{n,t}$ is adapted to $I(X_t > b_{m_n}e^{u})$ and $(n/m_n)q_n^{(1-a)/a}\varepsilon_{n,q_n} = (n/q_n)\varepsilon_{n,q_n} \to 0$, hence $\{X_t\}$ is $L_2$-E-NED on $\{\Im_{n,t}\}$ with arbitrary E-NED size and F-mixing size $(1-a)/a$.

Example 4 (Nonlinear distributed lag)
Consider $y_t = \sum_{i=0}^{\infty} \pi_{t,i}\epsilon_{t-i}$, where $|\epsilon_t|$ has tail (2) with index $\alpha > 1$ and $\lim_{\epsilon\to\infty} L(\epsilon) = K$. The innovations $\epsilon_t$ are strictly stationary, uniformly $L_{\alpha-\iota}$-bounded, and strong mixing with size $r/(r-2)$, $r > 2$. The coefficients $\{\pi_{t,i}\}$ are for each $i$ measurable with respect to $\sigma(\epsilon_\tau : \tau \le t-i)$, strong mixing with size $r/(r-2)$, and $\sup_{t\in\mathbb{Z}} |\pi_{t,i}| \le |\pi_i| = O(i^{-\mu})$ with probability 1 for some $\mu > 1/\min\{1, p/2\}$. Examples include regime switching and random coefficient autoregressions, and ARFIMA processes each with GARCH innovations. Assume $X_t := |y_t|$, and $E_{n,t} = [\epsilon_{t-i}]_{i=0}^{[q_n/2]}$ for $t \in \{1,\ldots,n\}$ and 0 otherwise. The lag structure of $E_{n,t}$ ensures that $\{X_t\}$ is $L_2$-E-NED with size 1/2 on an F-strong mixing base by ensuring that $(n/m_n)^{1/2}q_n^{1/2}\,\|I(X_t > b_{m_n}e^{u}) - P(X_t > b_{m_n}e^{u} \mid \Im_{n,t-q_n}^{t+q_n})\|_2 \to 0$ for each $1 \le t \le n$ as $n \to \infty$.

Example 5 (Explosive GARCH)
Let $y_t = h_t\epsilon_t$, where $\epsilon_t$ is i.i.d. and $h_t^2 = \beta + \gamma y_{t-1}^2 + \delta h_{t-1}^2$, $\beta > 0$, and $\gamma,\delta \ge 0$. Write $X_t := |y_t|$ and let $E_{n,t} = [\epsilon_{t-i}]_{i=0}^{[q_n/2]}$ for $t \in \{1,\ldots,n\}$ and 0 otherwise. By independence, $\{\epsilon_t\}$ is trivially F-strong mixing with arbitrary size. If the GARCH process has a unit root, and in many cases an explosive root, then $\{X_t\}$ is still geometrically $L_2$-E-NED on $\{\Im_{n,t}\}$ with arbitrary E-NED and F-mixing base sizes (Hill, 2008c), although $\{X_t\}$ itself need not be mixing nor population $L_p$-NED (Carrasco and Chen, 2002; Davidson, 2004).
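To make the GARCH recursion of Example 5 concrete, the sketch below simulates $X_t = |y_t|$ and applies the average log-exceedance estimator to the simulated path. It assumes standard normal innovations, a burn-in period, and two illustrative parameter sets (one with $\gamma + \delta < 1$, one with a unit root $\gamma + \delta = 1$); the helper names and all numerical settings are hypothetical and are not taken from the paper.

```python
import math
import random

def hill_inverse_alpha(sample, m):
    """Average log-exceedance over the m largest observations."""
    ordered = sorted(sample, reverse=True)
    threshold = ordered[m]                      # X_(m+1)
    return sum(math.log(x / threshold) for x in ordered[:m]) / m

def simulate_abs_garch(n, beta, gamma, delta, seed, burn_in=500):
    """Simulate X_t = |y_t| with y_t = h_t * e_t and
    h_t^2 = beta + gamma * y_{t-1}^2 + delta * h_{t-1}^2 (Gaussian e_t assumed)."""
    rng = random.Random(seed)
    h2, y = beta, 0.0                           # arbitrary positive start value
    path = []
    for t in range(n + burn_in):
        h2 = beta + gamma * y * y + delta * h2
        y = math.sqrt(h2) * rng.gauss(0.0, 1.0)
        if t >= burn_in:                        # discard start-up transients
            path.append(abs(y))
    return path

stationary = simulate_abs_garch(20000, beta=0.1, gamma=0.3, delta=0.6, seed=2)
integrated = simulate_abs_garch(20000, beta=0.1, gamma=0.4, delta=0.6, seed=2)
est_stationary = hill_inverse_alpha(stationary, m=200)
est_integrated = hill_inverse_alpha(integrated, m=200)
```

The estimator is applied to $X_t$ directly rather than to a prefiltered residual series, which is the use case the extremal NED property is meant to cover.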

3. MAIN RESULTS

We require two sets of assumptions concerning tail dependence and tail decay.

Assumption A.

(1) Let $\{\Im_{n,t}\}$ be an arbitrary array of $\sigma$-fields, and let $\{X_t, \Im_{n,t}\}$ form an $L_2$-E-MIXL array with coefficients $\varphi_{q_n}$ of size 1/2 and constants $e_{n,t}(u)$. In particular, $e_{n,t}(u)$ is integrable with respect to Lebesgue measure on $\mathbb{R}_+$ and $\sup_{1\le t\le n}\int_0^\infty e_{n,t}(u)\,du = O((m_n/n)^{1/2})$.

(2) $\{X_t\}$ is $L_2$-E-NED on $\{\Im_{n,t}\}$ with coefficients $\psi_{q_n}$ of size 1/2 and constants $f_{n,t}(u)$. In particular, $f_{n,t}(u)$ is integrable with respect to Lebesgue measure on $\mathbb{R}_+$ and $\sup_{1\le t\le n}\int_0^\infty f_{n,t}(u)\,du = O((m_n/n)^{1/2})$. The base $\{\epsilon_t\}$ is F-uniform mixing with size $r/[2(r-1)]$, $r \ge 2$, or F-strong mixing with size $r/(r-2)$, $r > 2$.

Remark 11. We work with the $L_2$-norm and assume Lebesgue integrability of $e_{n,t}(u)$ and $f_{n,t}(u)$ to ensure $\{(\ln(X_t/b_{m_n}))_+\}$ satisfies a corresponding mixingale or NED property. See Lemma B.1 in Appendix B, and see Section 5 for examples.

Remark 12. It is easy to show that $L_2$-E-NED Assumption A.2 ensures the $L_2$-E-MIXL Assumption A.1 by an argument identical to Theorem 17.5 of Davidson (1994).

In order to prove uniform consistency and characterize the limit distribution of $\hat{\alpha}^{-1}_{m_n}$, we appeal to the concept of slow variation with remainder as in condition (SR1) of Goldie and Smith (1987). See also Smith (1982), Haeusler and Teugels (1985), and Hsing (1991).

Assumption B. There exists a positive measurable function $g$ on $(0,\infty)$ such that for any $\lambda > 0$,

$$L(\lambda x)/L(x) - 1 = O(g(x)) \quad\text{as } x \to \infty. \tag{SR1}$$

In particular, $g$ has a bounded increase: There exist $0 < D, z_0 < \infty$, and $\tau \le 0$ such that $g(\lambda z)/g(z) \le D\lambda^{\tau}$ for $\lambda \ge 1$ and $z \ge z_0$. We require $m_n$, $b_{m_n}$, and $g$ to satisfy

$$m_n^{1/2}\, g(b_{m_n}) \to 0.$$

Remark 13. Assumption B implies the rate $m_n \to \infty$ must be made explicit depending on $\bar{F}_t(x)$. For example, if $\bar{F}_t(x) = cx^{-\alpha}(1 + O(x^{-\theta}))$, $\alpha,\theta > 0$, then $m_n^{1/2} g(b_{m_n}) \to 0$ only if $m_n = o(n^{2\theta/(2\theta+\alpha)})$. See Haeusler and Teugels (1985) for this and other examples, and see, inter alia, Hall (1982), Cline (1983), Chan and Tran (1989), Caner (1998), and Hill (2008b) for applications with this tail shape. Regularly varying tails with $L(x) = c(\ln x)^{\theta}$, on the other hand, do not satisfy (SR1) but do satisfy property (SR2) in Goldie and Smith (1987), which leads to uncentered limit laws for $\hat{\alpha}^{-1}_{m_n}$ (e.g., Haeusler and Teugels, 1985; Hsing, 1991).
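The rate restriction in Remark 13 can be traced numerically. The sketch below assumes $g(x) = x^{-\theta}$, takes $c = 1$ so that $b_{m_n} \approx (n/m_n)^{1/\alpha}$, and sets $m_n = n^{\xi}$; under those simplifying assumptions $m_n^{1/2} g(b_{m_n}) = n^{\xi/2 - \theta(1-\xi)/\alpha}$, which vanishes exactly when $\xi < 2\theta/(2\theta+\alpha)$. The function names are hypothetical.

```python
def fractile_exponent_bound(alpha, theta):
    """Upper bound 2*theta / (2*theta + alpha) on xi when m_n = n**xi."""
    return 2.0 * theta / (2.0 * theta + alpha)

def scaled_remainder(n, xi, alpha, theta):
    """m^(1/2) * g(b_m) with m = n**xi, g(x) = x**(-theta), b_m = (n/m)**(1/alpha)."""
    m = n ** xi
    b = (n / m) ** (1.0 / alpha)
    return m ** 0.5 * b ** (-theta)

alpha, theta = 2.0, 1.0
bound = fractile_exponent_bound(alpha, theta)
# xi below the bound: the scaled remainder shrinks as n grows.
below = [scaled_remainder(10.0 ** k, 0.4, alpha, theta) for k in (3, 6, 9)]
# xi above the bound: the scaled remainder grows with n.
above = [scaled_remainder(10.0 ** k, 0.6, alpha, theta) for k in (3, 6, 9)]
```

With $\alpha = 2$ and $\theta = 1$ the bound equals $1/2$, so $\xi = 0.4$ and $\xi = 0.6$ fall on opposite sides of it.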

3.1. Weak Consistency for E-MIXL Arrays

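Uniform consistency is delivered over a parametric class of Lipschitz continuous intermediate order sequences $\{m_n(\phi)\}$, $\phi \in \Phi$, where $\Phi$ is some compact subset of $\mathbb{R}_+$.

Assumption C. Let

$$1 \le \inf_{\phi\in\Phi} m_n(\phi) \to \infty, \quad n \ge \sup_{\phi\in\Phi} m_n(\phi) = o(n),$$

$$\liminf_{n\ge 1}\left\{\frac{m_n(\phi)}{m_n(\phi')}\right\} \ge 1 \iff \phi \ge \phi', \quad\text{and}\quad m_n(\phi)^{1/2} \times g\left(b_{m_n(\phi)}\right) \to 0. \tag{4}$$

Further, for some sequence of positive numbers $\{h_n\}$, $h_n = O(\inf_{\phi\in\Phi} m_n(\phi))$, $\forall \phi,\phi' \in \Phi$,

$$\left|m_n(\phi) - m_n(\phi')\right| \le h_n \times \left|\phi - \phi'\right|. \tag{5}$$

Remark 14. Monotonicity $m_n(\phi)/m_n(\phi') \ge 1 \iff \phi \ge \phi'$ simplifies proofs and could easily be replaced with $m_n(\phi)/m_n(\phi') \ge 1 \iff \phi \le \phi'$.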

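Define tail arrays of $X_t$: For $1 \le t \le n$, $n \ge 1$,

$$U_{m_n,t} := \left(\ln\left(X_t/b_{m_n}\right)\right)_+ - E\left[\left(\ln\left(X_t/b_{m_n}\right)\right)_+\right]$$

$$I_{m_n,t}(u) := I\left(X_t > b_{m_n}e^{u}\right) - P\left(X_t > b_{m_n}e^{u}\right), \quad\text{for any } u \in \mathbb{R}. \tag{6}$$

The E-MIXL property suffices for tail array strong laws.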

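LEMMA 1. Under Assumption A.1, for any $\rho$ in an arbitrary neighborhood of 1,

$$\frac{1}{m_n}\sum_{t=1}^{n} U_{m_n,t} \xrightarrow{a.s.} 0, \quad \frac{1}{\rho m_n}\sum_{t=1}^{n} I_{\rho m_n,t}(u) \xrightarrow{a.s.} 0 \quad\text{and}\quad \ln\left(\frac{X_{([\rho m_n])}}{b_{\rho m_n}}\right) \xrightarrow{p} 0.$$

Lipschitz continuity and Lemma 1 imply uniform strong laws for $\{(\ln(X_t/b_{m_n}))_+, I(X_t > b_{m_n}e^{u})\}$ by arguments in Andrews (1992), and therefore weak uniform consistency for $\hat{\alpha}^{-1}_{m_n(\phi)}$ by arguments in Hsing (1991).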


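THEOREM 1.

(i) Under Assumption A.1, $\hat{\alpha}^{-1}_{m_n} \xrightarrow{p} \alpha^{-1}$ for any $1 \le m_n < n$, $m_n \to \infty$, and $m_n = o(n)$.

Now let Assumptions A.1, B, and C hold.

(ii) The following limits are uniform on $\Phi$:

$$\frac{1}{m_n(\phi)}\sum_{t=1}^{n} U_{m_n(\phi),t} \xrightarrow{p} 0, \quad \frac{1}{m_n(\phi)}\sum_{t=1}^{n} I_{m_n(\phi),t}(u) \xrightarrow{p} 0, \quad \ln\left(\frac{X_{(m_n(\phi)+1)}}{b_{m_n(\phi)}}\right) \xrightarrow{p} 0.$$

(iii) Finally, $\sup_{\phi\in\Phi} |\hat{\alpha}^{-1}_{m_n(\phi)} - \alpha^{-1}| \xrightarrow{p} 0$.

Remark 15. Since E-NED suffices for E-MIXL, Hill's estimator is consistent for a truly massive array of time series. See Examples 1–5 and Section 5.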

There are notable limitations to Assumption C.

Example 6
If $\bar{F}_t(x) = cx^{-\alpha}(1 + O(x^{-\theta}))$, $\alpha,\theta > 0$, then $m_n(\phi) \sim n^{\xi} + n^{\phi}$ satisfies (4) and (5) for any fixed $\xi \in (0, 2\theta/(2\theta+\alpha))$, where $\phi \in \Phi = [0,\xi_0]$ for any $\xi_0 \in (0, \xi - 2\iota]$ and tiny $\iota > 0$. This follows since $\inf_{\phi\in\Phi} m_n(\phi) \sim n^{\xi} \to \infty$, and by the mean value theorem $|m_n(\phi) - m_n(\phi')| \le n^{\xi-\iota}\ln(n) \times |\phi - \phi'| = O(n^{\xi}) \times |\phi - \phi'|$.

Example 7
For the same tail shape, consider $m_n(\phi) \sim \phi n^{\xi}$ for any fixed $\xi \in (0, 2\theta/(2\theta+\alpha))$, where $\phi \in \Phi = [\phi_0, 1]$ for any $\phi_0 \in (0,1)$. Then $|m_n(\phi) - m_n(\phi')| \le n^{\xi}|\phi - \phi'|$ and $\inf_{\phi\in\Phi} m_n(\phi) = \phi_0 n^{\xi}$, hence (4) and (5) hold.

Example 8
Theorem 1 does not cover $m_n(\phi) \sim n^{\phi}$, $\phi \in (0, 2\theta/(2\theta+\alpha))$, for the same tail shape because Lipschitz continuity (5) with $h_n = O(\inf_{\phi\in\Phi} m_n(\phi))$ fails to hold. Whether $\hat{\alpha}^{-1}_{m_n(\phi)}$ is uniformly weakly consistent for such $m_n(\phi)$ is left for future consideration.
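The contrast between Examples 7 and 8 comes down to whether the Lipschitz constant $h_n$ stays of the same order as $\inf_{\phi} m_n(\phi)$. The sketch below computes the ratio $h_n / \inf_{\phi} m_n(\phi)$ for both parametrizations, taking $h_n$ to be the supremum of $|\partial m_n(\phi)/\partial\phi|$ over the parameter set; the function names and the numerical endpoints are illustrative assumptions.

```python
import math

def lipschitz_ratio_scaled(n, xi, phi0):
    """Example 7: m_n(phi) = phi * n**xi on [phi0, 1]."""
    h_n = n ** xi                       # derivative in phi is n**xi everywhere
    inf_m = phi0 * n ** xi              # smallest fractile over the set
    return h_n / inf_m

def lipschitz_ratio_power(n, phi_lo, phi_hi):
    """Example 8: m_n(phi) = n**phi on [phi_lo, phi_hi]."""
    h_n = n ** phi_hi * math.log(n)     # largest derivative n**phi * ln(n)
    inf_m = n ** phi_lo                 # smallest fractile over the set
    return h_n / inf_m

scaled = [lipschitz_ratio_scaled(10.0 ** k, 0.4, 0.25) for k in (3, 6, 9)]
power = [lipschitz_ratio_power(10.0 ** k, 0.2, 0.4) for k in (3, 6, 9)]
```

In the first case the ratio is the constant $1/\phi_0$, so $h_n = O(\inf_\phi m_n(\phi))$ holds; in the second it grows like $n^{\phi_{hi} - \phi_{lo}}\ln n$, which is why condition (5) with that bound on $h_n$ fails.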

3.2. Asymptotic Normality for E-NED Processes

Hsing (1991, Thm. 2.4) proves that if the tail arrays $\{U_{m_n,t}, I_{m_n,t}(u)\}$ in (6) have a joint central limit property

$$\left(\frac{1}{m_n^{1/2}}\sum_{t=1}^{n} U_{m_n,t},\; \alpha^{-1}\frac{1}{m_n^{1/2}}\sum_{t=1}^{n} I_{m_n,t}(u/m_n^{1/2})\right)' \Longrightarrow (Y_1, Y_2)' \tag{7}$$

in distribution to some random vector $(Y_1, Y_2)$, $L(\lambda x)/L(x) \to 1$ as $x \to \infty$ fast enough and $\ln(X_{([\rho m_n])}/b_{\rho m_n}) \xrightarrow{p} 0$ for $\rho$ in any neighborhood of 1, then

$$m_n^{1/2}\left(\hat{\alpha}^{-1}_{m_n} - \alpha^{-1}\right) \Longrightarrow Y_1 - Y_2.$$

We now characterize memory in $\{U_{m_n,t}, I_{m_n,t}(u)\}$ under Assumption A.2 and deliver a key tail array central limit theory for $L_2$-E-NED processes $\{X_t\}$.

Construct the following tail array:

$$T_{m_n,t}(\omega, u/m_n^{1/2}) := \frac{1}{m_n^{1/2}}\left[\omega_1 U_{m_n,t} + \omega_2\alpha^{-1} I_{m_n,t}(u/m_n^{1/2})\right], \quad \omega = [\omega_1, \omega_2]', \tag{8}$$

and variance

$$\sigma^2_{m_n}(\omega) = \sigma^2_{m_n}(\omega_1,\omega_2) := E\left(\sum_{t=1}^{n} T_{m_n,t}(\omega, u/m_n^{1/2})\right)^2. \tag{9}$$

The following ensures $\sigma^2_{m_n}(\omega) > 0$ uniformly in $n$ and $\omega \ne 0$.

Assumption D. The covariance matrix of $[1/m_n^{1/2}\sum_{t=1}^{n} U_{m_n,t},\; 1/m_n^{1/2}\sum_{t=1}^{n} I_{m_n,t}(u/m_n^{1/2})]'$ is positive definite uniformly in $n$.

LEMMA 2. Let Assumption A.2 hold. For each $\omega'\omega = 1$, $\{T_{m_n,t}(\omega, u/m_n^{1/2})\}$ is $L_2$-NED on $\{\Im_{n,t}\}$ with constants $d_{n,t} = O(m_n^{-1/2}(m_n/n)^{1/r})$ uniformly over $1 \le t \le n$, and coefficients $\psi^{*}_{n,q_n} = o((m_n/n)^{1/2-1/r}q_n^{-1/2})$. Further, $\{T_{m_n,t}(\omega, u/m_n^{1/2}), \Im_{n,t}\}$ forms an $L_2$-mixingale array with coefficients $\psi_{q_n} = o(q_n^{-1/2})$ and constants $c_{n,t} = Kn^{-1/2}$. Neither sequence of constants $\{d_{n,t}\}$ and $\{c_{n,t}\}$ depends on $\omega$.

The L2-mixingale property of {Tmn ,t (ω)} under Lemma 2 and a general centrallimit theorem due to de Jong (1997, Lem. 1) ensure the following central limittheorem.

LEMMA 3. Under Assumptions A.2 and D,

(i) $\sum_{t=1}^{n} T_{m_n,t}(\omega, u/m_n^{1/2})/\sigma_{m_n}(\omega) \Longrightarrow N(0,1)$ pointwise in $\omega'\omega = 1$ and $u \in \mathbb{R}$, where $\sup_{\omega'\omega=1}\sigma^2_{m_n}(\omega) = O(1)$;

(ii) $m_n^{1/2}\ln(X_{(m_n+1)}/b_{m_n})/\sigma_{m_n}(0,1) \Longrightarrow N(0,1)$, where $\sigma^2_{m_n}(0,1) = \alpha^{-2}E(1/m_n^{1/2}\sum_{t=1}^{n} I_{m_n,t}(u/m_n^{1/2}))^2$.

Remark 16. Invoke the Cramer-Wold theorem to deduce that $1/m_n^{1/2}\sum_{t=1}^{n}\{(\ln(X_t/b_{m_n}))_+ - E[(\ln(X_t/b_{m_n}))_+]\}$ and $1/m_n^{1/2}\sum_{t=1}^{n}\{I(X_t > b_{m_n}e^{u}) - P(X_t > b_{m_n}e^{u})\}$ have Gaussian distribution limits when $\{X_t\}$ is $L_2$-E-NED on an F-mixing base $\{\epsilon_t\}$. See Hsing (1991, 1993), Drees (2002), Einmahl and Lin (2006), and Rootzen (2009) for related limit theory for tail arrays of i.i.d., mixing, and $l$-dependent processes $\{X_t\}$, each of which is covered under E-NED (Section 5).

The Lemma 3 central limit theorem does not impose any restrictions on the slowly varying component $L(x)$ in (2). The following main result relies on slow variation with remainder (SR1).

THEOREM 2. Under Assumptions A.2, B, and D,

$$m_n^{1/2}\left(\hat{\alpha}^{-1}_{m_n} - \alpha^{-1}\right)/\sigma_{m_n} \Longrightarrow N(0,1),$$

where $\sigma^2_{m_n} = E(m_n^{1/2}(\hat{\alpha}^{-1}_{m_n} - \alpha^{-1}))^2 = O(1)$ and

$$\left|\sigma^2_{m_n} - E\left(\frac{1}{m_n^{1/2}}\sum_{t=1}^{n}\left\{U_{m_n,t} - \alpha^{-1} I_{m_n,t}(u/m_n^{1/2})\right\}\right)^2\right| \to 0.$$

Remark 17. If $\{X_t\}$ is i.i.d. then $\lim_{n\to\infty} \sigma^2_{m_n} = \alpha^{-2}$ (e.g., Hall, 1982).
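The i.i.d. benchmark in Remark 17 can be illustrated numerically. The following Python sketch is not part of the formal development: it assumes exact Pareto data with $P(X_t > x) = x^{-\alpha}$ for $x \ge 1$, and the helper names `hill_inverse_index` and `scaled_mse` are hypothetical. It approximates $E(m_n^{1/2}(\hat{\alpha}^{-1}_{m_n} - \alpha^{-1}))^2$ by Monte Carlo.

```python
import numpy as np

def hill_inverse_index(x, m):
    """Hill (1975) estimator of 1/alpha from the m largest observations of x."""
    x = np.sort(np.asarray(x, dtype=float))[::-1]   # descending order statistics
    # x[m] is the (m+1)-th largest value, i.e. the threshold X_(m+1)
    return float(np.mean(np.log(x[:m] / x[m])))

def scaled_mse(alpha, n, m, reps, seed=0):
    """Monte Carlo estimate of E[m (alpha_hat^{-1} - alpha^{-1})^2] for i.i.d. Pareto data."""
    rng = np.random.default_rng(seed)
    errors = np.empty(reps)
    for r in range(reps):
        # Generator.pareto draws Lomax variates; adding 1 gives P(X > x) = x^{-alpha}, x >= 1
        x = rng.pareto(alpha, size=n) + 1.0
        errors[r] = hill_inverse_index(x, m) - 1.0 / alpha
    return m * float(np.mean(errors ** 2))

if __name__ == "__main__":
    print(scaled_mse(alpha=2.0, n=2000, m=100, reps=2000))
```

For $\alpha = 2$ the printed value should be close to $\alpha^{-2} = 0.25$, up to simulation error. Under dependence or heterogeneity no such closed form is available in general, which motivates the estimator of Section 4.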

Remark 18. Notice the mean squared error $\sigma^2_{m_n} = E(m_n^{1/2}(\hat{\alpha}^{-1}_{m_n} - \alpha^{-1}))^2$ is not necessarily the variance, since $\hat{\alpha}^{-1}_{m_n}$ is in general biased (e.g., Hall, 1982; Segers, 2002). Nevertheless, $\sigma^2_{m_n}$ is proportional to the asymptotic variance under Assumptions A.2 and B, since by Theorem 2 $m_n^{1/2}(\hat{\alpha}^{-1}_{m_n} - \alpha^{-1})/\sigma_{m_n} \Rightarrow N(0,1)$.

4. KERNEL VARIANCE ESTIMATOR

In general the parametric form of the asymptotic variance $\lim_{n\to\infty} \sigma^2_{m_n}$ may depend upon underlying memory and heterogeneity properties and therefore model parameters (e.g., ARFIMA, regime switching, GARCH). Our next goal is a nonparametric estimator that sidesteps such distributional issues, at least for L2-E-NED data. We base our estimator on the following trivial expansion:

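\[
\begin{aligned}
\sigma^2_{m_n} &= m_n \times E\left( \frac{1}{m_n} \sum_{t=1}^{n} \left( \ln\left( \frac{X_t}{X_{(m_n+1)}} \right) \right)_+ - \alpha^{-1} \right)^2 \\
&= m_n \times E\left( \frac{1}{m_n} \sum_{t=1}^{n} \left\{ \left( \ln\left( \frac{X_t}{X_{(m_n+1)}} \right) \right)_+ - \frac{m_n}{n} \alpha^{-1} \right\} \right)^2 \\
&= \frac{1}{m_n} \sum_{s,t=1}^{n} E\left[ \left\{ \left( \ln\left( \frac{X_s}{X_{(m_n+1)}} \right) \right)_+ - \frac{m_n}{n} \alpha^{-1} \right\} \times \left\{ \left( \ln\left( \frac{X_t}{X_{(m_n+1)}} \right) \right)_+ - \frac{m_n}{n} \alpha^{-1} \right\} \right].
\end{aligned}
\]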


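It is well known that a standard estimator of the right-hand side,

\[
\frac{1}{m_n} \sum_{s,t=1}^{n} \left\{ \left( \ln\left( \frac{X_s}{X_{(m_n+1)}} \right) \right)_+ - \frac{m_n}{n} \hat{\alpha}^{-1}_{m_n} \right\} \left\{ \left( \ln\left( \frac{X_t}{X_{(m_n+1)}} \right) \right)_+ - \frac{m_n}{n} \hat{\alpha}^{-1}_{m_n} \right\},
\]

is not guaranteed to be positive (Newey and West, 1987). A powerful solution is a kernel estimator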

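\[
\hat{\sigma}^2_{m_n} = \frac{1}{m_n} \sum_{s,t=1}^{n} w_{s,t,n} \left\{ \left( \ln\left( \frac{X_s}{X_{(m_n+1)}} \right) \right)_+ - \frac{m_n}{n} \hat{\alpha}^{-1}_{m_n} \right\} \times \left\{ \left( \ln\left( \frac{X_t}{X_{(m_n+1)}} \right) \right)_+ - \frac{m_n}{n} \hat{\alpha}^{-1}_{m_n} \right\},
\]

where $w_{s,t,n} := w((s-t)/\gamma_n)$ denotes a kernel function with bandwidth $\gamma_n \to \infty$ as $n \to \infty$, $w(0) = 1$, and $w(z) = w(-z)$. The de Jong and Davidson (2000, Assum. 1) class of kernels ensures

\[
\hat{\sigma}^2_{m_n} > 0 \quad \text{a.s.},
\]

and includes Bartlett, Parzen, quadratic spectral, and Tukey-Hanning kernels. See also Newey and West (1987), Gallant and White (1988), and Hansen (1992).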

THEOREM 3. Let $m_n = o(n)$ and $m_n/n^{1/2} \to \infty$, and let $w_{s,t,n}$ satisfy Assumption 1 of de Jong and Davidson (2000) with bandwidth $\gamma_n \to \infty$ and $\gamma_n = o(n)$. In particular, $\gamma_n = o(m_n/n^{1/2})$ and $n^{-1} \sum_{s,t=1}^{n} |w_{s,t,n}| = O(\gamma_n)$. Under Assumptions A.2 and B, $|\hat{\sigma}^2_{m_n} - \sigma^2_{m_n}| \overset{p}{\to} 0$.
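The kernel estimator of this section is straightforward to compute. The following Python sketch is an illustration only: it assumes strictly positive data, picks the Bartlett kernel $w(z) = (1 - |z|)_+$ as one member of the admissible class, and uses the hypothetical function names `bartlett` and `kernel_variance`. It forms the full $n \times n$ weight matrix, which is simple but needs memory of order $n^2$.

```python
import numpy as np

def bartlett(z):
    """Bartlett kernel w(z) = max(0, 1 - |z|); symmetric with w(0) = 1."""
    return np.maximum(0.0, 1.0 - np.abs(z))

def kernel_variance(x, m, bandwidth, kernel=bartlett):
    """Kernel estimator (1/m) sum_{s,t} w((s-t)/bandwidth) z_s z_t, where
    z_t = (ln(X_t / X_(m+1)))_+ - (m/n) * alpha_hat^{-1}."""
    x = np.asarray(x, dtype=float)          # assumed strictly positive
    n = x.size
    threshold = np.sort(x)[::-1][m]         # (m+1)-th largest value X_(m+1)
    excess = np.maximum(np.log(x / threshold), 0.0)
    inv_alpha_hat = excess.sum() / m        # Hill estimator of 1/alpha
    z = excess - (m / n) * inv_alpha_hat    # centred terms; they sum to zero
    lags = np.arange(n)[:, None] - np.arange(n)[None, :]
    weights = kernel(lags / bandwidth)      # w_{s,t,n} for all pairs (s, t)
    return float(z @ weights @ z) / m

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.pareto(2.0, size=1000) + 1.0   # illustrative i.i.d. Pareto data
    print(kernel_variance(sample, m=60, bandwidth=4.0))
```

The bandwidth and the tail fractile in the usage lines are arbitrary illustrative values; Theorem 3 only restricts their rates, not their levels.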

Remark 19. The number of tail observations $m_n$ must increase sufficiently fast to ensure that the plug-ins $X_{(m_n+1)}$ and $\hat{\alpha}^{-1}_{m_n}$ that appear in every cross-product of $(\ln(X_t/X_{(m_n+1)}))_+ - (m_n/n)\hat{\alpha}^{-1}_{m_n}$ in $\hat{\sigma}^2_{m_n}$ do not affect the limit. The restriction $m_n/n^{1/2} \to \infty$ implies that some tails characterized by Assumption B are not covered here, including $\bar{F}(x) = c x^{-\alpha}(1 + O((\ln x)^{-\theta}))$, because $m_n = o((\ln n)^{2\theta})$ is required (Haeusler and Teugels, 1985).

Remark 20. As few as $m_n^2$ pairs $\{X_s, X_t\}$ go into the construction of $\hat{\sigma}^2_{m_n}$ due to the operator $(\cdot)_+$. Thus the bandwidth rate $\gamma_n \to \infty$, which regulates the number of included cross-products in $\hat{\sigma}^2_{m_n}$, must be restricted. The bound $\gamma_n = o(m_n/n^{1/2})$ implies that the largest bandwidth allowed is $\gamma_n \sim m_n^{1/2-\iota}$ for infinitesimal $\iota > 0$ because we then require $m_n \sim n^{1-\iota} = o(n)$.
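The rate arithmetic behind Remark 20 can be made concrete. The sketch below assumes polynomial rates $m_n = n^{\delta}$ and $\gamma_n = n^{\kappa}$, which is an assumption made here for illustration and not a requirement of Theorem 3, and uses the hypothetical helper `rate_check`. Under these rates, $m_n = o(n)$ and $m_n/n^{1/2} \to \infty$ require $1/2 < \delta < 1$, and $\gamma_n = o(m_n/n^{1/2})$ requires $\kappa < \delta - 1/2$.

```python
def rate_check(n, delta, kappa):
    """Return (m_n, gamma_n, gamma_n / (m_n / n**0.5)) for m_n = n**delta, gamma_n = n**kappa."""
    if not 0.5 < delta < 1.0:
        raise ValueError("need 1/2 < delta < 1 so that m_n = o(n) and m_n / n^(1/2) diverges")
    if not 0.0 < kappa < delta - 0.5:
        raise ValueError("need 0 < kappa < delta - 1/2 so that gamma_n = o(m_n / n^(1/2))")
    m_n = n ** delta
    gamma_n = n ** kappa
    # the ratio equals n**(kappa - delta + 1/2), which vanishes as n grows
    return m_n, gamma_n, gamma_n / (m_n / n ** 0.5)

if __name__ == "__main__":
    for n in (10**4, 10**6, 10**8):
        m_n, gamma_n, ratio = rate_check(n, delta=0.9, kappa=0.3)
        print(f"n={n:>9d}  m_n={m_n:12.1f}  gamma_n={gamma_n:8.1f}  ratio={ratio:.4f}")
```

As $\delta \uparrow 1$ the admissible exponent $\kappa$ approaches $1/2$, which is the sense in which the largest bandwidth is $\gamma_n \sim m_n^{1/2-\iota}$.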

5. APPLICATIONS: L2-E-NED

In this section we relate mixing and Lp-NED properties to L2-E-NED and characterize processes that have the L2-E-NED property. In particular, we want to know when Assumption A.2 holds.


5.1. Mixing Implies L2-E-NED

If $\Im_{n,t}$ is adapted to $X_t$ or simply $I(X_t > b_{m_n}e^u)$, then $\{X_t\}$ is trivially L2-E-NED on $\{\Im_{n,t}\}$ with constants $f_{n,t}(u) = 0$ and coefficients $\psi_{q_n}$ of any size, since $\|I(X_t > b_{m_n}e^u) - P(X_t > b_{m_n}e^u \mid \Im_{n,t-q_n}^{t+q_n})\|_p = \|I(X_t > b_{m_n}e^u) - I(X_t > b_{m_n}e^u)\|_p = 0$. For example, suppose $X_t$ is geometrically strong mixing and $E_{n,t} = X_t$ for $t \in \{1, \ldots, n\}$. Then $\{X_t\}$ is L2-E-NED on $\{\Im_{n,t}\}$ with arbitrary E-NED size and $\{\Im_{n,t}\}$ is induced by a strong mixing array $\{E_{n,t}\}$ with arbitrary size due to geometric memory, so Assumption A.2 is trivial. This covers finite dependent processes and geometrically ergodic processes like nonlinear AR-nonlinear GARCH with innovations that have a sufficiently smooth density (An and Huang, 1996; Carrasco and Chen, 2002; Leibscher, 2005; Meitz and Saikkonen, 2008). See Examples 1–3 in Section 2.

5.2. Lp-NED Implies L2-E-NED

By definition, $\{X_t\}$ is Lp-NED on $\{\Im_{n,t}\}$ with size $\lambda > 0$ if $\|X_t - E[X_t \mid \Im_{n,t-q}^{t+q}]\|_p \le d_{n,t}\vartheta_q$ for some constants $d_{n,t} \ge 0$ and coefficients $\vartheta_q = o(q^{-\lambda})$, where $q \in \mathbb{N}$ (Gallant and White, 1988). The following composite result implies that population Lp-NED implies Ls-E-NED for any $s > 0$.

LEMMA 4. Assume $X_t$ satisfies Assumption B.

(i) Let $\{X_t\}$ be Lp-NED on $\{\Im_{n,t}\}$, $0 < p < \alpha$, with constants $d_{n,t}$ and coefficients $\vartheta_q$ of size $\lambda > 0$. If the slowly varying component satisfies $\lim_{x\to\infty} L(x) = K > 0$, then

\[
\left\| I\left( X_t > b_{m_n}e^u \right) - P\left( X_t > b_{m_n}e^u \mid \Im_{n,t-q_n}^{t+q_n} \right) \right\|_2 \le \left\{ e^{-up/2} \left( 1 + d_{n,t}^p \right)^{1/2} (m_n/n)^{p/2\alpha} \right\} \times o\left( q_n^{-\lambda \min\{p,1\}/4} \right).
\]

In particular, if $p = \alpha - \iota$ for sufficiently tiny $\iota > 0$, $\sup_{n\ge1} \sup_{1\le t\le n} d_{n,t} \le K$, $\lambda \ge 1/\min\{1, p/2\}$, and $\{q_n\}$ satisfies $n/m_n = o(q_n^{\delta})$ for some $\delta > 0$, then Assumption A.2 is satisfied.

(ii) Let $\{X_t\}$ be Lp-E-NED on $\{\Im_{n,t}\}$, $p > 0$, with constants $f_{n,t}(u)$ and coefficients $\psi_{q_n}$ of size $\lambda > 0$. Then $\{X_t\}$ is Ls-E-NED on $\{\Im_{n,t}\}$ for any $s \gtrless p$ with constants $f_{n,t}(u)^{\theta}$ and coefficients $\psi_{q_n}^{\theta}$ of size $\lambda\theta$, $\theta = p/\max\{p,s\}$.

Remark 21. Boundedness $d_{n,t} \le K$ applies to $\{X_t\}$ with bounded forms of time dependence in the Lp-norm, like cyclical trend or stochastic breaks in variance when $p = 2$. Processes $\{X_t\}$ with tail (2) and $L(x) \to K$ include the popular class $\bar{F}_t(x) = c x^{-\alpha}(1 + o(1))$. Finally, any restriction on $q_n$ is irrelevant, since the main results only exploit $q_n \to \infty$.

The general class of nonlinear distributed lags in Example 4 satisfies Lemma 4.


LEMMA 5. Consider $X_t = \sum_{i=0}^{\infty} \pi_{t,i}\epsilon_{t-i}$ from Example 4. If $E_{n,t} = [\epsilon_{t-i}]_{i=0}^{[q_n/2]}$ for $t = 1, \ldots, n$ and 0 otherwise, and $n/m_n = o(q_n^{\delta})$ for some $\delta > 0$, then Assumption A.2 is satisfied.

5.3. Non-NED and L2-E-NED

The fact that such a large class of Lp-NED processes has the L2-E-NED property suggests it is safe simply to impose Lp-NED on $\{X_t\}$. However, not all interesting processes are NED. Consider the following GARCH process:

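\[
\begin{aligned}
& X_t = \sigma_t \epsilon_t, \quad \epsilon_t \text{ is i.i.d. and } L_p\text{-bounded}, \ p > 0; \\
& \sigma_t^2 = \omega + \sum_{i=1}^{p} \beta_i X_{t-i}^2 + \sum_{i=1}^{q} \gamma_i \sigma_{t-i}^2, \quad \omega > 0, \ \text{at least one } \beta_i, \gamma_i > 0; \\
& \text{the roots of } 1 - \sum_{i=1}^{q} \gamma_i z^i \text{ lie outside the unit circle};
\end{aligned} \tag{10}
\]

and the Lyapunov exponent $\gamma < 0$ (see Note 3). Class (10) has regularly varying tails of the form $P(|X_t| > x) = c x^{-\kappa}(1 + o(1))$, $c > 0$, $\kappa > 0$ (Basrak et al., 2002, Thm. 3.1). The root condition implies

\[
\sigma_t^2 = \pi_0 + \sum_{i=1}^{\infty} \pi_i X_{t-i}^2, \quad \pi_0 > 0, \ \pi_i \ge 0, \ \text{at least one } \pi_i > 0.
\]

Davidson (2004) shows that $\{X_t\}$ is L1- or L2-NED on $\{\epsilon_t\}$ if $\sum_{i=1}^{\infty} \pi_i < 1$, which neglects IGARCH and GARCH with explosive roots. The following result developed in Hill (2008c) reveals many of these latter processes are, however, E-NED. See also Example 5 in Section 2.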

LEMMA 6. Let $X_t$ be generated by (10) with $E[\epsilon_t] = 0$ and $E[\epsilon_t^2] = 1$. Let $0 \le \pi_i \le C\rho^i$ for some $\rho \in (0,1)$ and $C \in (0, 1/\rho)$. Then $\{X_t\}$ is geometrically L2-E-NED on $\{\Im_{n,t}\}$, where $\Im_{n,t}$ is induced by $E_{n,t} = [\epsilon_{t-i}]_{i=1}^{[q_n/2]}$ for $t = 1, \ldots, n$ and 0 otherwise.

Remark 22. The bound $\pi_i \le C\rho^i$ easily allows $\sum_{i=1}^{\infty} \pi_i \ge 1$, covering integrated and many explosive GARCH cases.
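To make the weights $\pi_i$ concrete, consider the GARCH(1,1) special case of (10), $\sigma_t^2 = \omega + \beta X_{t-1}^2 + \gamma \sigma_{t-1}^2$ with $0 < \gamma < 1$. Iterating the recursion gives $\pi_0 = \omega/(1-\gamma)$ and $\pi_i = \beta\gamma^{i-1}$, so $\sum_{i\ge1} \pi_i = \beta/(1-\gamma)$ and $\pi_i = C\rho^i$ with $\rho = \gamma$ and $C = \beta/\gamma$. The Python sketch below is illustrative only; the helper names `arch_inf_weights` and `geometric_bound` are hypothetical, and the Lyapunov condition of (10) is not checked by the code and must be verified separately.

```python
import numpy as np

def arch_inf_weights(omega, beta, gamma, n_terms):
    """Return (pi_0, [pi_1, ..., pi_n_terms]) for a GARCH(1,1) written as
    sigma_t^2 = pi_0 + sum_{i>=1} pi_i X_{t-i}^2."""
    if not (omega > 0 and beta > 0 and 0 < gamma < 1):
        raise ValueError("need omega > 0, beta > 0 and 0 < gamma < 1")
    lags = np.arange(1, n_terms + 1)
    pi0 = omega / (1.0 - gamma)
    pi = beta * gamma ** (lags - 1)
    return pi0, pi

def geometric_bound(beta, gamma):
    """Return (C, rho) such that pi_i = C * rho**i holds with equality."""
    return beta / gamma, gamma

if __name__ == "__main__":
    # IGARCH(1,1): beta + gamma = 1, so the weights sum to one and Davidson's
    # (2004) condition sum(pi_i) < 1 fails, while C = beta/gamma < 1/rho holds.
    pi0, pi = arch_inf_weights(omega=1.0, beta=0.1, gamma=0.9, n_terms=2000)
    C, rho = geometric_bound(beta=0.1, gamma=0.9)
    print(pi.sum(), C, 1.0 / rho)
```

In this parameterization the requirement $C < 1/\rho$ reduces to $\beta < 1$, which is compatible with $\beta + \gamma \ge 1$.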

Remark 23. Since εt is i.i.d., all parts of Assumption A.2 hold.

5.4. L2-E-NED: Direct Proofs

Despite knowing that E-NED covers mixing, NED, and certain non-NED processes, it is instructive to demonstrate the property from first principles. Assume that throughout $\{\epsilon_t\}$ is a symmetrically distributed process where $|\epsilon_t|$ has for each $t$ tail (2) with index $\alpha > 0$, and $E_{n,t} = [\epsilon_{t-i}]_{i=0}^{[q_n/2]}$ for $t = 1, \ldots, n$ and 0 otherwise.


Example 9 (Linear distributed lags)

Define $X_t := \sum_{i=0}^{\infty} \pi_i \epsilon_{t-i}$, $\pi_0 = 1$, where $\pi_i \ge 0$ and $\inf_{t\in\mathbb{Z}} P(\epsilon_t \ge 0) = 1$ for brevity, and $\sum_{i=0}^{\infty} \pi_i^{\alpha} < \infty$, general cases being similar. In the following we only require $\{\epsilon_t\}$ to behave like an independent sequence in the tails (cf. Feller, 1971; Cline, 1983; Hill, 2008b).

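LEMMA 7. Let $\{\epsilon_t\}$ satisfy the convolution tail property $P(\sum_{i=0}^{\infty} a_i \epsilon_{t-i} > x) \sim \sum_{i=0}^{\infty} P(a_i \epsilon_{t-i} > x)$ for any deterministic sequence of real numbers $\{a_i\}$, $\sum_{i=0}^{\infty} |a_i|^{\alpha} < \infty$. Then $X_t$ has tail (2) with index $\alpha$. Further, $\{X_t\}$ is L2-E-NED on $\{\Im_{n,t}\}$ with constants $f_{n,t}(u) = e^{-\alpha u/2}(m_n/n)^{1/2}$ and coefficients $\psi_{q_n} = (\sum_{i=q_n}^{\infty} \pi_i^{\alpha} / \sum_{i=0}^{\infty} \pi_i^{\alpha})^{1/2} \in (0,1)$ for any $r \ge 2$.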

Remark 24. Given the simple parametric structure of $X_t$, we do not require $\lim_{x\to\infty} L(x) = K > 0$ or $n/m_n = o(q_n^{\delta})$, contrary to Lemma 4.

Remark 25. Since $\epsilon_t$ is geometrically strong mixing the F-mixing property with arbitrary size is immediate, and $\sup_{1\le t\le n} f_{n,t}(u) = e^{-\alpha u/2}(m_n/n)^{1/2}$ is Lebesgue integrable on $\mathbb{R}_+$. Further, the E-NED size is 1/2 as long as $\pi_i$ decays sufficiently fast. This is trivial for stationary ARMA, since $\pi_i \to 0$ geometrically as $i \to \infty$, and for ARFIMA($p,d,q$) with Hurst $d < (\alpha - 1)/\alpha < 1$, since $\pi_i = O(i^{-(1-d)})$ implies both $\sum_{i=0}^{\infty} \pi_i^{\alpha} < \infty$ and $\psi_{q_n} = O(q_n^{-1})$.
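The Lemma 7 coefficients are simple to evaluate for a given weight sequence. The Python sketch below is an illustration under an assumed geometric sequence $\pi_i = \phi^i$, $0 < \phi < 1$ (an AR(1)-type case chosen here, not taken from the text), for which $\psi_{q} = \phi^{\alpha q/2}$ in closed form. The function name `ened_coefficient` is hypothetical, and the infinite sums are truncated at the length of the supplied array.

```python
import numpy as np

def ened_coefficient(pi, q, alpha):
    """psi_q = (sum_{i>=q} pi_i^alpha / sum_{i>=0} pi_i^alpha)^{1/2},
    with both sums truncated at len(pi) terms."""
    weights = np.abs(np.asarray(pi, dtype=float)) ** alpha
    return float(np.sqrt(weights[q:].sum() / weights.sum()))

if __name__ == "__main__":
    phi, alpha = 0.5, 1.5
    pi = phi ** np.arange(400)          # pi_0 = 1, pi_i = phi**i
    for q in (1, 5, 10, 20):
        # numerical value next to the closed form phi**(alpha * q / 2)
        print(q, ened_coefficient(pi, q, alpha), phi ** (alpha * q / 2))
```

Geometric decay of $\pi_i$ therefore delivers geometric decay of $\psi_{q_n}$, consistent with the stationary ARMA case in Remark 25.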

Example 10 (Bilinear)

Assume $X_t = \beta X_{t-1}\epsilon_{t-1} + \epsilon_t$, $\epsilon_t$ is i.i.d., $\beta > 0$, and $\beta^{\alpha/2} E[\epsilon_t^{\alpha/2}] < 1$. Then $\{X_t\}$ has a convergent linear distributed lag representation $X_t = \sum_{j=0}^{\infty} \beta^j \epsilon_t^{(j)}$, where $\epsilon_t^{(0)} = \epsilon_t$, and $\epsilon_t^{(j)} = \epsilon_{t-j}^2 (\prod_{i=1}^{j-1} \epsilon_{t-i})$ has tail (2) with index $\alpha/2$. In particular, the tail behavior of $X_t$ is dominated by $\sum_{j=1}^{\infty} \beta^j \epsilon_t^{(j)}$, which also satisfies (2) with index $\alpha/2$. See Davis and Resnick (1996, Cor. 2.4).
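The distributed lag representation in Example 10 can be verified on a finite sample. The Python sketch below is illustrative: it assumes the initial condition $X_0 = \epsilon_0$, under which the expansion truncates exactly at $j = t$, and it uses Gaussian noise purely for convenience (the identity is algebraic and does not depend on the tail of $\epsilon_t$). The function names are hypothetical.

```python
import numpy as np

def bilinear_recursion(eps, beta):
    """X_t = beta * X_{t-1} * eps_{t-1} + eps_t, started from X_0 = eps_0."""
    x = np.empty_like(eps)
    x[0] = eps[0]
    for t in range(1, eps.size):
        x[t] = beta * x[t - 1] * eps[t - 1] + eps[t]
    return x

def bilinear_expansion(eps, beta, t):
    """sum_{j=0}^{t} beta^j eps_t^{(j)}, where eps_t^{(0)} = eps_t and
    eps_t^{(j)} = eps_{t-j}^2 * prod_{i=1}^{j-1} eps_{t-i}."""
    total = eps[t]
    for j in range(1, t + 1):
        # eps[t-j+1:t] holds eps_{t-j+1}, ..., eps_{t-1}; the empty product is 1
        total += beta ** j * eps[t - j] ** 2 * np.prod(eps[t - j + 1:t])
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps = rng.standard_normal(25)
    x = bilinear_recursion(eps, beta=0.5)
    print(max(abs(x[t] - bilinear_expansion(eps, 0.5, t)) for t in range(eps.size)))
```

The printed maximum discrepancy should be at the level of floating-point rounding.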

LEMMA 8. $\{X_t\}$ is L2-E-NED on $\{\Im_{n,t}\}$ with constants $f_{n,t}(u) = e^{-\alpha u/2}(m_n/n)^{1/2}$ and coefficients $\psi_{q_n} = o(q_n^{-\lambda})$ for any $\lambda > 0$.

NOTES

1. GARCH processes, for example, are known to have regularly varying tails (Basrak et al., 2002). The scaled residuals $\{\epsilon_t/\sigma_t\}$ of GARCH $X_t = \sigma_t\epsilon_t$, however, may have substantially thinner tails than the original series itself, and need not have regularly varying tails (e.g., $\epsilon_t \overset{iid}{\sim} N(0,1)$). See Iglesias and Linton (2008) for a novel, direct approach for estimating the index of GARCH processes.

2. I would like to thank Oliver Linton for pointing out this issue.

3. The exponent $\gamma$ is associated with the first order difference equation form of $Z_t := [X_t^2, \ldots, X_{t-p+2}^2; \sigma_{t+1}^2, \sigma_t^2, \ldots, \sigma_{t-q+2}^2]'$. It is easy to show $Z_t = A_t Z_{t-1} + B_t$ for some i.i.d. sequences $\{A_t, B_t\}$ of $k \times k$ matrices $A_t$ and $k$-vectors $B_t$, $k \ge 1$. The exponent $\gamma$ is defined by $\gamma = \lim_{n\to\infty} n^{-1} \ln \|\prod_{t=1}^{n} A_t\|_o$, where $\|A\|_o = \sup_{x\in\mathbb{R}^k, |x|=1} |Ax|$. If $\epsilon_t$ in (10) is i.i.d. with zero mean and unit variance, then $\gamma < 0$ given the remaining properties (Basrak et al., 2002).


REFERENCES

Akgiray, V. & G.G. Booth (1988) The stable law model of stock returns. Journal of Business and Economic Statistics 6, 51–57.

An, H.Z. & F.C. Huang (1996) The geometrical ergodicity of nonlinear autoregressive models. Statistica Sinica 6, 943–956.

Andrews, D.W.K. (1984) Non-strong mixing autoregressive processes. Journal of Applied Probability 21, 930–934.

Andrews, D.W.K. (1985) A nearly independent, but non-strong mixing, triangular array. Journal of Applied Probability 22, 729–731.

Andrews, D.W.K. (1992) Generic uniform convergence. Econometric Theory 8, 241–257.

Basrak, B., R.A. Davis, & T. Mikosch (2002) Regular variation of GARCH processes. Stochastic Processes and Their Applications 99, 95–115.

Beirlant, J., D. Dierckx, & A. Guillou (2005) Estimation of the extreme value index and generalized plots. Bernoulli 11, 949–970.

Bingham, N.H., C.M. Goldie, & J.L. Teugels (1987) Regular Variation. Cambridge University Press.

Caner, M. (1998) Tests for cointegration with infinite variance errors. Journal of Econometrics 86, 155–175.

Carrasco, M. & X. Chen (2002) Mixing and moment properties of various GARCH and stochastic volatility models. Econometric Theory 18, 17–39.

Chan, N.H., S.D. Deng, L. Peng, & Z. Xia (2007) Interval estimation for the conditional value-at-risk based on GARCH with heavy tailed innovations. Journal of Econometrics 137, 556–576.

Chan, N.H. & L.T. Tran (1989) On the first order autoregressive process with infinite variance. Econometric Theory 5, 354–362.

Cheng, B.N. & S.T. Rachev (1995) Multivariate stable futures prices. Mathematical Finance 5, 133–153.

Cline, D.B.H. (1983) Estimation and Linear Prediction for Regression, Autoregression and ARMA with Infinite Variance Data. Ph.D. dissertation, Colorado State University.

Csorgo, S. & L. Viharos (1995) On the asymptotic normality of Hill’s estimator. Mathematical Proceedings of the Cambridge Philosophical Society 118, 375–382.

Davidson, J. (1992) A central limit theorem for globally nonstationary near-epoch dependent functions of mixing processes. Econometric Theory 8, 313–329.

Davidson, J. (1994) Stochastic Limit Theory. Oxford University Press.

Davidson, J. (2004) Moment and memory properties of linear conditional heteroscedasticity models, and a new model. Journal of Business and Economic Statistics 22, 16–29.

Davis, R. & T. Hsing (1995) Point process and partial sum convergence for weakly dependent random variables with infinite variance. Annals of Probability 23, 879–917.

Davis, R. & S. Resnick (1984) Tail estimates motivated by extreme value theory. Annals of Statistics 12, 1467–1487.

Davis, R. & S. Resnick (1996) Limit theory for bilinear processes with heavy-tailed noise. Annals of Applied Probability 6, 1191–1210.

de Haan, L. & S. Resnick (1998) On asymptotic normality of the Hill estimator. Stochastic Models 14, 849–867.

de Jong, R.M. (1997) Central limit theorems for dependent heterogeneous random variables. Econometric Theory 13, 353–367.

de Jong, R.M. & J. Davidson (2000) Consistency of kernel estimators of heteroscedastic and autocorrelated covariance matrices. Econometrica 68, 407–423.

Drees, H. (2002) Tail empirical processes under mixing conditions. In H. Dehling, T. Mikosch, & M. Sørensen (eds.), Empirical Process Techniques for Dependent Data. Birkhauser.

Drees, H., A. Ferreira, & L. de Haan (2004) On maximum likelihood estimation of the extreme value index. Annals of Applied Probability 14, 1179–1201.

Einmahl, J. & T. Lin (2006) Asymptotic normality of extreme value estimators on C[0,1]. Annals of Statistics 34, 469–492.

Embrechts, P., C. Kluppelberg, & T. Mikosch (1997) Modelling Extremal Events for Insurance and Finance. Springer.

Feller, W. (1971) An Introduction to Probability Theory and Its Applications, vol. 2. Wiley.

Gallant, A.R. & H. White (1988) A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Basil Blackwell.

Goldie, C.M. & R.L. Smith (1987) Slow variation with remainder: Theory and applications. Quarterly Journal of Mathematics 38, 45–71.

Gorodetskii, V.V. (1977) On the strong mixing property for linear sequences. Theory of Probability and Its Applications 22, 411–413.

Guegan, D. & S. Ladoucette (2001) Non-mixing properties of long memory processes. Comptes Rendus de l’Academie des Sciences, Series I, Mathematics 333, 373–376.

Haeusler, E. & J.L. Teugels (1985) On asymptotic normality of Hill’s estimator for the exponent of regular variation. Annals of Statistics 13, 743–756.

Hall, P. (1982) On some estimates of an exponent of regular variation. Journal of the Royal Statistical Society, Series B 44, 37–42.

Hall, P. & A.H. Welsh (1985) Adaptive estimates of parameters of regular variation. Annals of Statistics 13, 331–341.

Hansen, B. (1992) Consistent covariance matrix estimation for dependent heterogeneous processes. Econometrica 60, 967–972.

Hill, B.M. (1975) A simple general approach to inference about the tail of a distribution. Annals of Statistics 3, 1163–1174.

Hill, J.B. (2008a) Extremal Memory of Stochastic Volatility with Applications to Tail Shape and Tail Dependence Inference. Working paper, University of North Carolina–Chapel Hill.

Hill, J.B. (2008b) Gaussian Tests of Extremal White Noise for Dependent, Heterogeneous, Heavy Tailed Time Series. Working paper, University of North Carolina–Chapel Hill.

Hill, J.B. (2008c) Tail and Non-Tail Memory with Applications to Extreme Value and Robust Statistics. Working paper, University of North Carolina–Chapel Hill.

Hill, J.B. (2009a) On functional central limit theorems for dependent, heterogeneous arrays with applications. Journal of Statistical Planning and Inference 139, 2091–2110.

Hill, J.B. (2009b) On Tail Index Estimation for Dependent, Heterogeneous Data. Working paper, University of North Carolina–Chapel Hill.

Hsing, T. (1991) On tail index estimation using dependent data. Annals of Statistics 19, 1547–1569.

Hsing, T. (1993) Extremal index estimation for a weakly dependent stationary sequence. Annals of Statistics 21, 2043–2071.

Ibragimov, I.A. (1962) Some limit theorems for stationary processes. Theory of Probability and Its Applications 7, 349–382.

Ibragimov, I.A. & Y.V. Linnik (1971) Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff.

Iglesias, E. & O. Linton (2008) Estimation of Tail Thickness Parameters from GARCH Models. Mimeo, London School of Economics.

Leadbetter, M.R. (1974) On extreme values in stationary sequences. Zeitschrift fur Wahrscheinlichkeitstheorie und Verwandte Gebiete 28, 289–303.

Leadbetter, M.R., G. Lindgren, & H. Rootzen (1983) Extremes and Related Properties of Random Sequences and Processes. Springer-Verlag.

Leibscher, E. (2005) Towards a unified approach for proving geometric ergodicity and mixing properties of nonlinear autoregressive processes. Journal of Time Series Analysis 26, 669–689.

Mason, D. (1982) Laws of large numbers for sums of extreme values. Annals of Probability 10, 754–764.

McLeish, D.L. (1974) Dependent central limit theorems and invariance principles. Annals of Probability 2, 620–628.

McLeish, D.L. (1975) A maximal inequality and dependent strong law. Annals of Probability 3, 329–339.

Meitz, M. & P. Saikkonen (2008) Stability of nonlinear AR-GARCH models. Journal of Time Series Analysis 29, 453–475.

Newey, W.K. & K.D. West (1987) A simple, positive semi-definite, heteroscedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708.

Pickands, J. (1975) Statistical inference using extreme order statistics. Annals of Statistics 3, 119–131.

Quintos, C., Z. Fan, & P.C.B. Phillips (2001) Structural change tests in tail behavior and the Asian crisis. Review of Economic Studies 68, 633–663.

Resnick, S. (1987) Extreme Values, Regular Variation and Point Processes. Springer-Verlag.

Resnick, S. & H. Rootzen (2000) Self-similar communication models and very heavy tails. Annals of Applied Probability 10, 753–778.

Resnick, S. & C. Starica (1995) Consistency of Hill’s estimator for dependent data. Journal of Applied Probability 32, 139–167.

Resnick, S. & C. Starica (1998) Tail index estimation for dependent data. Annals of Applied Probability 8, 1156–1183.

Rootzen, H. (2009) Weak convergence of the tail empirical function for dependent sequences. Stochastic Processes and Their Applications 119, 468–490.

Rootzen, H., M. Leadbetter, & L. de Haan (1990) Tail and Quantile Estimation for Strongly Mixing Stationary Sequences. Technical report 292, University of North Carolina, Chapel Hill.

Segers, J. (2002) Abelian and Tauberian theorems on the bias of the Hill estimator. Scandinavian Journal of Statistics 29, 461–483.

Serfling, R.J. (1968) Contributions to central limit theory for dependent variables. Annals of Mathematical Statistics 39, 1158–1175.

Smith, R.L. (1982) Uniform rates of convergence in extreme-value theory. Advances in Applied Probability 14, 600–622.

Smith, R.L. (1987) Estimating tails of probability distributions. Annals of Statistics 15, 1175–1207.

Smith, R.L. & I. Weissman (1994) Estimating the extremal index. Journal of the Royal Statistical Society, Series B 56, 515–528.

Wu, W.B. (2005) Nonlinear system theory: Another look at dependence. Proceedings of the National Academy of Sciences USA 102, 14150–14154.

Wu, W.B. & M. Min (2005) On linear processes with dependent innovations. Stochastic Processes and Their Applications 115, 939–958.

APPENDIX A: Proofs of Main Results

The following proofs exploit Lemmas B.1–B.10 in Appendix B. Recall that $U_{m_n,t} = (\ln(X_t/b_{m_n}))_+ - E[(\ln(X_t/b_{m_n}))_+]$ and $I_{m_n,t}(u) = I(X_t > b_{m_n}e^u) - P(X_t > b_{m_n}e^u)$, $u \ge 0$.

Proof of Lemma 1. Under the maintained assumptions and Lemma B.1, $\{U_{m_n,t}, \Im_{n,t}\}$ and $\{I_{\rho m_n,t}(u), \Im_{n,t}\}$ for all $\rho$ in an arbitrary neighborhood of 1 form L2-mixingale arrays with size 1/2 and constants $\{e^*_{n,t}, e_{n,t}(u)\} = O((m_n/n)^{1/2})$. Now define an integer sequence $\{a_{n,t}\}$,

\[
a_{n,t} := t \times I(t \neq n) + m_n \times I(t = n), \quad t = 1, 2, \ldots,
\]

and note that $a_{n,n} = m_n$ and $a_{n,t} \to \infty$ as $t \to \infty$ $\forall n \ge 1$. For some finite $K > 0$, each $e_{n,t} \in \{e^*_{n,t}, e_{n,t}(u)\}$ satisfies (e.g., Davidson, 1994, Thm. 2.2.3)

\[
\sum_{t=1}^{\infty} (e_{n,t}/a_{n,t})^2 \le K \sum_{t \neq n} t^{-2} + o(1) < \infty.
\]

Thus $\sum_{t=1}^{n} U_{m_n,t}/a_{n,n} \overset{a.s.}{\to} 0$ and $\sum_{t=1}^{n} I_{\rho m_n,t}(u)/a_{n,n} \overset{a.s.}{\to} 0$ by Davidson’s (1994, Cor. 20.16) generalization of McLeish’s (1975) strong law for L2-mixingales. The weak limit $\ln(X_{([\rho m_n])}/b_{\rho m_n}) \overset{p}{\to} 0$ then follows from arguments in Hsing (1991, p. 1551). ∎

Proof of Theorem 1.

Claim (i). Weak consistency $\hat{\alpha}^{-1}_{m_n} \overset{p}{\to} \alpha^{-1}$ under Assumption A.1 follows from Lemma 1. See Theorem 2.2 of Hsing (1991).

Claim (ii). Uniform weak consistency $\sup_{\phi\in\Phi} |1/m_n(\phi) \sum_{t=1}^{n} U_{m_n(\phi),t}| \overset{p}{\to} 0$ and $\sup_{\phi\in\Phi} |1/m_n(\phi) \sum_{t=1}^{n} I_{m_n(\phi),t}(u)| \overset{p}{\to} 0$ follow instantly from Theorem 3 of Andrews (1992), cf. Davidson (1994, Thm. 21.10), given weak consistency Lemma 1 and Lemma B.3 Lipschitz properties.

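The argument for $\sup_{\phi\in\Phi} |\ln(X_{(m_n(\phi)+1)}/b_{m_n(\phi)})| \overset{p}{\to} 0$ is similar to Hsing’s (1991, p. 1551) consistency proof. First, note by subadditivity for any $u > 0$,

\[
P\left( \sup_{\phi\in\Phi} \left| \ln\left\{ \frac{X_{(m_n(\phi)+1)}}{b_{m_n(\phi)}} \right\} \right| > u \right) \le P\left( \left| \sup_{\phi\in\Phi} \ln\left\{ \frac{X_{(m_n(\phi)+1)}}{b_{m_n(\phi)}} \right\} \right| > u/2 \right) + P\left( \left| \sup_{\phi\in\Phi} -\ln\left\{ \frac{X_{(m_n(\phi)+1)}}{b_{m_n(\phi)}} \right\} \right| > u/2 \right).
\]

We will show that the first term on the right-hand side is $o(1)$, the second term being similar. Since (1)–(3) and Assumption B imply (cf. Hsing, 1991, pp. 1553–1554; see especially Smith, 1982, eqn. 2.2; Goldie and Smith, 1987, Thm. 2.1.1, Cor. 2.2.1) that

\[
\frac{n}{m_n(\phi)} E\left[ I\left( X_t > b_{m_n(\phi)}e^{u/2} \right) \right] = e^{-\alpha u/2} \times \left( 1 + o\left( 1/m_n(\phi)^{1/2} \right) \right),
\]

observe by construction

\[
\begin{aligned}
\ln\left\{ \frac{X_{(m_n(\phi)+1)}}{b_{m_n(\phi)}} \right\} > u/2
&\iff \frac{1}{m_n(\phi)} \sum_{t=1}^{n} I\left( X_t > b_{m_n(\phi)}e^{u/2} \right) > 1 \\
&\iff \frac{1}{m_n(\phi)} \sum_{t=1}^{n} I_{m_n(\phi),t}(u/2) > 1 - e^{-\alpha u/2} + o\left( 1/m_n(\phi)^{1/2} \right).
\end{aligned}
\]

Now use $\sup_{\phi\in\Phi} |1/m_n(\phi) \sum_{t=1}^{n} I_{m_n(\phi),t}(u)| \overset{p}{\to} 0$, $e^{-\alpha u/2} < 1$, and $\inf_{\phi\in\Phi} m_n(\phi) \to \infty$ under Assumption C to conclude, for some tiny $\iota > 0$,

\[
\begin{aligned}
\lim_{n\to\infty} P\left( \left| \sup_{\phi\in\Phi} \ln\left\{ \frac{X_{(m_n(\phi)+1)}}{b_{m_n(\phi)}} \right\} \right| > u/2 \right)
&\le \lim_{n\to\infty} P\left( \left| \sup_{\phi\in\Phi} \frac{1}{m_n(\phi)} \sum_{t=1}^{n} I_{m_n(\phi),t}(u/2) \right| > 1 - e^{-\alpha u/2} - |o(1)| \right) \\
&\le \lim_{n\to\infty} P\left( \sup_{\phi\in\Phi} \left| \frac{1}{m_n(\phi)} \sum_{t=1}^{n} I_{m_n(\phi),t}(u/2) \right| > \iota \right) = 0.
\end{aligned}
\]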

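Claim (iii). Consider $\sup_{\phi\in\Phi} |\hat{\alpha}^{-1}_{m_n(\phi)} - \alpha^{-1}| \overset{p}{\to} 0$ and define

\[
\tilde{W}_{m_n,t} := \ln\left( X_t/b_{m_n} \right) \times I\left( X_t > X_{(m_n+1)} \right) - \left( \ln\left( X_t/b_{m_n} \right) \right)_+ .
\]

Consistency $\hat{\alpha}^{-1}_{m_n} \overset{p}{\to} \alpha^{-1}$ under Claim (i), the Lemma B.4 identity

\[
\hat{\alpha}^{-1}_{m_n} - \alpha^{-1} = \frac{1}{m_n} \sum_{t=1}^{n} \left\{ U_{m_n,t} - \alpha^{-1} I_{m_n,t}\left( u/m_n^{1/2} \right) \right\} + \frac{1}{m_n} \sum_{t=1}^{n} \tilde{W}_{m_n,t} + o\left( 1/m_n^{1/2} \right), \tag{A.1}
\]

and the Lemma 1 implication $1/m_n \sum_{t=1}^{n} (U_{m_n,t} - \alpha^{-1} I_{m_n,t}(u/m_n^{1/2})) \overset{p}{\to} 0$ imply that $1/m_n \sum_{t=1}^{n} \tilde{W}_{m_n,t} \overset{p}{\to} 0$. Andrews’s (1992, Thm. 3) uniform law of large numbers and Lemma B.3 Lipschitz properties therefore imply that $\sup_{\phi\in\Phi} |1/m_n(\phi) \sum_{t=1}^{n} \tilde{W}_{m_n(\phi),t}| \overset{p}{\to} 0$. The proof now follows from identity (A.1), the Claim (ii) uniform laws, and $\inf_{\phi\in\Phi} m_n(\phi) \to \infty$ under Assumption C:

\[
\begin{aligned}
\sup_{\phi\in\Phi} \left| \hat{\alpha}^{-1}_{m_n(\phi)} - \alpha^{-1} \right|
&\le \sup_{\phi\in\Phi} \left| \frac{1}{m_n(\phi)} \sum_{t=1}^{n} U_{m_n(\phi),t} \right| + \sup_{\phi\in\Phi} \left| \frac{1}{m_n(\phi)} \sum_{t=1}^{n} I_{m_n(\phi),t}(u) \right| \\
&\quad + \sup_{\phi\in\Phi} \left| \frac{1}{m_n(\phi)} \sum_{t=1}^{n} \tilde{W}_{m_n(\phi),t} \right| + o\left( \left( \inf_{\phi\in\Phi} \{m_n(\phi)\} \right)^{-1/2} \right) \overset{p}{\to} 0.
\end{aligned}
\]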

∎

Proof of Lemma 2. Write

\[
T_{m_n,t} = T_{m_n,t}\left( \omega, u/m_n^{1/2} \right) = m_n^{-1/2} \left[ \omega_1 U_{m_n,t} + \omega_2 \alpha^{-1} I_{m_n,t}\left( u/m_n^{1/2} \right) \right], \quad \omega'\omega = 1.
\]

Step 1 (NED). Under the maintained assumptions and Lemma B.1, $\{U_{m_n,t}, I_{m_n,t}(u)\}$ are L2-NED on $\{\Im_{n,t}\}$ with coefficients $\psi^*_{n,q_n} = (m_n/n)^{1/2-1/r}\psi_{q_n} = o((m_n/n)^{1/2-1/r} q_n^{-1/2})$ and constants $\{f^*_{n,t}, f^*_{n,t}(u)\}$ that satisfy $\sup_{1\le t\le n} f^*_{n,t} = O((m_n/n)^{1/r})$ and $\sup_{1\le t\le n} \sup_{u\ge0} f^*_{n,t}(u) = O((m_n/n)^{1/r})$. Use Minkowski’s inequality and $\omega'\omega = 1$ to deduce $\{T_{m_n,t}\}$ is L2-NED on $\{\Im_{n,t}\}$ with coefficients $\psi^*_{n,q_n}$ and constants (Davidson, 1994, Thm. 17.8)

\[
d_{n,t} = K m_n^{-1/2} \max\left\{ f^*_{n,t}, \ \sup_{u\ge0} f^*_{n,t}(u) \right\} = O\left( m_n^{-1/2} (m_n/n)^{1/r} \right) \quad \text{uniformly in } 1 \le t \le n.
\]

Step 2 (Mixingale). Assume that the base $\{\epsilon_t\}$ is F-strong mixing with coefficients $\varepsilon_{n,q_n} = o((m_n/n) q_n^{-r/(r-2)})$. Standard inequalities for mixing random variables carry over to F-mixing, and distributed lags of F-mixing random variables are F-mixing (Hill, 2009b, Lem. C.1). Therefore Theorem 17.5 of Davidson (1994) applies. For some $r > 2$,

\[
\left\| T_{m_n,t} - E[T_{m_n,t} \mid \Im_{n,t-q_n}] \right\|_2 \le \max\left\{ \left\| T_{m_n,t} \right\|_r, \ d_{n,t} \right\} \times \max\left\{ 6\varepsilon_{n,q_n}^{1/2-1/r}, \ \psi^*_{n,q_n} \right\}.
\]

Use $\omega'\omega = 1$, Minkowski’s inequality, and the Lemma B.2 moment bounds to deduce that

\[
\left\| T_{m_n,t} \right\|_r \le K m_n^{-1/2} \left( \left\| U_{m_n,t} \right\|_r + \sup_{u\ge0} \left\| I_{m_n,t}\left( u/m_n^{1/2} \right) \right\|_r \right) = O\left( m_n^{-1/2} (m_n/n)^{1/r} \right).
\]

Multiply and divide by $n^{1/2}$ and rearrange terms,

\[
\begin{aligned}
\left\| T_{m_n,t} - E[T_{m_n,t} \mid \Im_{n,t-q_n}] \right\|_2
&\le K n^{-1/2} \times (n/m_n)^{1/2-1/r} \max\left\{ 6\varepsilon_{n,q_n}^{1/2-1/r}, \ \psi^*_{n,q_n} \right\} \\
&= K n^{-1/2} \times \max\left\{ \left[ (n/m_n)\varepsilon_{n,q_n} \right]^{1/2-1/r}, \ (n/m_n)^{1/2-1/r} \psi^*_{n,q_n} \right\} = c_{n,t} \times \psi_{q_n},
\end{aligned}
\]

say, where $\psi_{q_n} = o(q_n^{-1/2})$ under Assumption B.2 and $c_{n,t} = K n^{-1/2}$ given F-mixing and E-NED rates.

Analogous arguments apply to the remaining mixingale inequality $\|T_{m_n,t} - E[T_{m_n,t} \mid \Im_{n,-\infty}^{t+q_n}]\|_2 \le c_{n,t}\psi_{q_n+1}$ (e.g., Davidson, 1994; eqn. 17.19) and to the F-uniform mixing case. ∎

Remark A.1. Notice that $\|T_{m_n,t} - E[T_{m_n,t} \mid \Im_{n,t-q_n}]\|_2 \le o(n^{-1/2} q_n^{-1/2})$ requires the F-mixing coefficients to satisfy $(n/m_n) q_n^{r/(r-2)} \varepsilon_{n,q_n} \to 0$. In general, therefore, $q_n \to \infty$ must hold to ensure $\lim_{n\to\infty} \varepsilon_{n,q_n} = 0$, since $n/m_n \to \infty$. An obvious exception is $\varepsilon_{n,q} = 0$ uniformly in $n$ and $q$ (e.g., the base $E_{n,t}$ is independent).

Proof of Lemma 3. The proof exploits Lemma 2: $\{T_{m_n,t}(\omega, u/m_n^{1/2}), \Im_{n,t}\}$ forms an L2-mixingale array with coefficients $\psi_{q_n} = o(q_n^{-1/2})$ and constants $c_{n,t} = K n^{-1/2}$. Note by McLeish’s (1975) bound for L2-mixingales with size 1/2,

\[
\sup_{\omega'\omega=1} \sigma^2_{m_n}(\omega) = \sup_{\omega'\omega=1} E\left( \sum_{t=1}^{n} T_{m_n,t}(\omega, u/m_n^{1/2}) \right)^2 = O\left( \sup_{\omega'\omega=1} \sum_{t=1}^{n} c_{n,t}^2 \right) = O(1). \tag{A.2}
\]


Step 1 ($\sum_{t=1}^{n} T_{m_n,t}/\sigma_{m_n}(\omega) \Rightarrow N(0,1)$). Write $T_{m_n,t} := T_{m_n,t}(\omega, u/m_n^{1/2})$. We will show conditions (a)–(f) of de Jong’s (1997) Lemma 1 central limit theorem hold, replicated for reference in Lemma B.5. De Jong’s argument exploits the following real-valued sequences $\{k_n, l_n, r_n\}$ and Bernstein blocks $\{Z_{n,i}, L_{n,i}\}_{i=1}^{r_n}$:

\[
\begin{aligned}
& k_n/n \to 0, \quad k_n = o(m_n^{1/4}), \quad r_n = [n/k_n], \quad \text{where } k_n, r_n \to \infty \text{ as } n \to \infty; \\
& 1 \le l_n \le k_n - 1 \le n - 1, \quad \text{where } l_n/k_n \to 0 \text{ and } l_n \to \infty \text{ as } n \to \infty
\end{aligned} \tag{A.3}
\]

and

\[
Z_{n,i} := \sum_{t=(i-1)k_n+l_n+1}^{ik_n} T_{m_n,t} \quad \text{and} \quad L_{n,i} = \sum_{t=(i-1)k_n+1}^{(i-1)k_n+l_n} T_{m_n,t}. \tag{A.4}
\]

By construction, $\sum_{t=1}^{n} T_{m_n,t}$ obtains the decomposition

\[
\sum_{t=1}^{n} T_{m_n,t} = \sum_{i=1}^{r_n} Z_{n,i} + \sum_{i=1}^{r_n} L_{n,i} + R_n \quad \text{for some remainder } R_n.
\]

De Jong’s (1997) construction $r_n = [n/k_n]$ (cf. Davidson, 1992) renders $R_n = o_p(1)$. The sequences $k_n$ and $l_n$ regulate the amount of information in and between the blocks $L_{n,i}$ and $Z_{n,i}$ in such a way that $\sum_{i=1}^{r_n} L_{n,i} = o_p(1)$ is also asymptotically negligible. Finally, under the stated conditions $\{Z_{n,i}\}_{i=1}^{r_n}$ is approximable by a martingale difference array that satisfies McLeish’s (1974, Thm. 2.1) central limit theorem (cf. Lemma 1 of de Jong, 1997).

Note that $k_n = o(m_n^{1/4})$ is always possible and merely expedites the proof.
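The blocking scheme in (A.3) and (A.4) amounts to a partition of the time indices $\{1, \ldots, n\}$. The Python sketch below is an illustration with the hypothetical helper `bernstein_blocks`; it returns the index sets of the big blocks $Z_{n,i}$, the small blocks $L_{n,i}$, and the remainder $R_n$, using 1-based indices to match the text. It checks only the ordering constraints in (A.3), not the rate conditions.

```python
def bernstein_blocks(n, k, l):
    """Index sets (1-based) for the blocks in (A.4), with r = [n/k] block pairs.

    Small block i covers t = (i-1)k + 1, ..., (i-1)k + l      (L_{n,i}).
    Big block i covers   t = (i-1)k + l + 1, ..., ik          (Z_{n,i}).
    The remainder covers t = rk + 1, ..., n                   (R_n).
    """
    if not 1 <= l <= k - 1 <= n - 1:
        raise ValueError("need 1 <= l <= k - 1 <= n - 1 as in (A.3)")
    r = n // k
    small = [range((i - 1) * k + 1, (i - 1) * k + l + 1) for i in range(1, r + 1)]
    big = [range((i - 1) * k + l + 1, i * k + 1) for i in range(1, r + 1)]
    remainder = range(r * k + 1, n + 1)
    return big, small, remainder

if __name__ == "__main__":
    big, small, remainder = bernstein_blocks(n=23, k=5, l=2)
    for i, (z, s) in enumerate(zip(big, small), start=1):
        print(i, "L:", list(s), "Z:", list(z))
    print("R:", list(remainder))
```

Each index $t$ falls in exactly one block, so the three sums in the decomposition of $\sum_{t=1}^{n} T_{m_n,t}$ do not overlap.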

Define a $\sigma$-subfield associated with the mixing functional $E_{n,t}$

\[
\mathcal{F}_{n,i} := \sigma(\{E_{n,\tau} : \tau \le ik_n\}).
\]

Condition (a). Minkowski’s inequality and the Lemma B.2 moment bounds imply

\[
\left\| T_{m_n,t} \right\|_2 \le K m_n^{-1/2} \left( \left\| U_{m_n,t} \right\|_2 + \sup_{u\ge0} \left\| I_{m_n,t}(u/m_n^{1/2}) \right\|_2 \right) = O(m_n^{-1/2}(m_n/n)^{1/2}) = O(n^{-1/2}).
\]

Now use Minkowski’s inequality again and $r_n k_n - n \to 0$ to deduce

\[
\left\| \sum_{t=r_nk_n+1}^{n} T_{m_n,t} \right\|_2 \le \sum_{t=r_nk_n+1}^{n} \left\| T_{m_n,t} \right\|_2 \le (n - r_nk_n) K n^{-1/2} = o(1).
\]

Chebyshev’s inequality completes the proof: $\sum_{t=r_nk_n+1}^{n} T_{m_n,t} \overset{p}{\to} 0$.

Condition (b). The mixingale property and McLeish’s (1975) bound imply

\[
E\left( \sum_{i=1}^{r_n} \sum_{t=(i-1)k_n+1}^{(i-1)k_n+l_n} T_{m_n,t} \right)^2 = O\left( \sum_{i=1}^{r_n} \sum_{t=(i-1)k_n+1}^{(i-1)k_n+l_n} c_{n,t}^2 \right) = O(r_n l_n n^{-1}) = O(l_n/k_n) = o(1).
\]


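Condition (c). Define the index set

\[
A_{n,t} = \left\{ t : t \in \bigcup_{i=1}^{r_n} [(i-1)k_n + l_n + 1, \ ik_n] \right\}.
\]

Analogous to de Jong’s (1997, A.7–A.12) argument, for $t \in A_{n,t}$ it can be shown that $\{E[T_{m_n,t} \mid \mathcal{F}_{n,i-1}], \Im_{n,t}\}$ forms an L2-mixingale array with constants (i.e., de Jong’s "index numbers") $c_{n,t}\psi_{l_n}^{\iota}$ and coefficients $\psi_{l_n}^{1-\iota}$ satisfying $\psi_{l_n}^{1-\iota} = o(l_n^{-1/2})$ for sufficiently tiny $\iota > 0$. Thus, by McLeish’s (1975) bound and $l_n \to \infty$ as $n \to \infty$,

\[
\begin{aligned}
E\left( \sum_{i=1}^{r_n} E\left[ Z_{n,i} \mid \mathcal{F}_{n,i-1} \right] \right)^2
&= E\left( \sum_{i=1}^{r_n} \sum_{t=(i-1)k_n+l_n+1}^{ik_n} E\left[ T_{m_n,t} \mid \mathcal{F}_{n,i-1} \right] \right)^2 \\
&= O\left( \sum_{i=1}^{r_n} \sum_{t=(i-1)k_n+l_n+1}^{ik_n} c_{n,t}^2 \psi_{l_n}^{2\iota} \right) = O\left( r_n k_n n^{-1} l_n^{-\iota} \right) = O(l_n^{-\iota}) = o(1).
\end{aligned}
\]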

Condition (d). The argument here mimics the verification of Condition (c).

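Condition (e). Analogous to de Jong (1997, A.13–A.17) and Condition (c),

\[
\begin{aligned}
\left\| \sum_{i=1}^{r_n} Z_{n,i}^2 - \sum_{i=1}^{r_n} \left( E\left[ Z_{n,i} \mid \mathcal{F}_{n,i} \right] - E\left[ Z_{n,i} \mid \mathcal{F}_{n,i-1} \right] \right)^2 \right\|_1
&\le 3 \sum_{i=1}^{r_n} \left\| Z_{n,i} - \left( E[Z_{n,i} \mid \mathcal{F}_{n,i}] - E[Z_{n,i} \mid \mathcal{F}_{n,i-1}] \right) \right\|_2 \times \left\| Z_{n,i} \right\|_2 \\
&= O\left( \sum_{i=1}^{r_n} \left( \sum_{t=(i-1)k_n+l_n+1}^{ik_n} c_{n,t}^2 \psi_{l_n}^{2\iota} \right)^{1/2} \left( \sum_{t=(i-1)k_n+l_n+1}^{ik_n} c_{n,t}^2 \right)^{1/2} \right) \\
&= O\left( r_n \left( k_n n^{-1} l_n^{-\iota} \right)^{1/2} \left( k_n n^{-1} \right)^{1/2} \right) = O(l_n^{-\iota/2}) = o(1).
\end{aligned}
\]

Now apply Chebyshev’s inequality and $\sum_{i=1}^{r_n} Z_{n,i}^2/\sigma^2_{m_n}(\omega) \overset{p}{\to} 1$ by Lemma B.7.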

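Condition (f). Define $W_{n,i} := E[Z_{n,i} \mid \mathcal{F}_{n,i}] - E[Z_{n,i} \mid \mathcal{F}_{n,i-1}]$. We require the Lindeberg condition $\sum_{i=1}^{r_n} E[W_{n,i}^2 I(|W_{n,i}| > \varepsilon)] \to 0$ for any $\varepsilon > 0$. By the same reasoning as Condition (a) and the conditional Jensen’s inequality, $\forall r \ge 1$,

\[
\left\| W_{n,i} \right\|_r \le 2 \left\| Z_{n,i} \right\|_r \le 2 \sum_{t=(i-1)k_n+l_n+1}^{ik_n} \left\| T_{m_n,t} \right\|_r = O\left( k_n m_n^{-1/2} (m_n/n)^{1/r} \right).
\]

Therefore, $\forall p, s \ge 0$, $1/p + 1/s = 1$, and all $\varepsilon > 0$, under Holder’s and Markov’s inequalities

\[
\begin{aligned}
\max_{1\le i\le r_n} r_n E\left[ W_{n,i}^2 I(|W_{n,i}| > \varepsilon) \right]
&\le K \max_{1\le i\le r_n} \left\{ r_n \left\| W_{n,i} \right\|_{2p}^2 \times \left\| W_{n,i} \right\|_s \right\} \\
&= O\left( r_n k_n^2 m_n^{-1} (m_n/n)^{1/p} \times k_n m_n^{-1/2} (m_n/n)^{1/s} \right) \\
&= O\left( k_n^2 m_n^{-1/2} \right) = o(1),
\end{aligned}
\]

where the last line exploits $k_n = o(m_n^{1/4})$ in (A.3).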

Step 2 ($m_n^{1/2} \ln(X_{(m_n+1)}/b_{m_n})/\sigma_{m_n}(0,1) \Rightarrow N(0,1)$). Use Step 1 and a Cramer-Wold device to deduce

\[
\alpha^{-1} \frac{1}{m_n^{1/2}} \sum_{t=1}^{n} I_{m_n,t}(u/m_n^{1/2}) / \sigma_{m_n}(0,1) \Rightarrow N(0,1), \tag{A.5}
\]

where $\sigma^2_{m_n}(0,1) = E(\alpha^{-1} m_n^{-1/2} \sum_{t=1}^{n} I_{m_n,t}(u/m_n^{1/2}))^2 = O(1)$ by construction of $\sigma^2_{m_n}(\omega_1, \omega_2)$ in (9) and bound (A.2). It is straightforward to show that (A.5) implies $m_n^{1/2} \ln(X_{(m_n+1)}/b_{m_n})/\sigma_{m_n}(0,1) \Rightarrow N(0,1)$ (Hsing, 1991, Thm. 2.4). ∎

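Proof of Theorem 2. Lemma 3 and a Cramer-Wold device suffice to prove

\[
\left( \frac{1}{m_n^{1/2}} \sum_{t=1}^{n} \frac{U_{m_n,t}}{\sigma_{m_n}(1,0)}, \ \alpha^{-1} \frac{1}{m_n^{1/2}} \sum_{t=1}^{n} \frac{I_{m_n,t}(u/m_n^{1/2})}{\sigma_{m_n}(0,1)} \right) \Rightarrow (Z_1, Z_2) \tag{A.6}
\]

for some random vector $(Z_1, Z_2)$ with marginal distributions $Z_i \sim N(0,1)$, where $\sigma^2_{m_n}(\omega_1, \omega_2) = E(\sum_{t=1}^{n} T_{m_n,t}(\omega, u/m_n^{1/2}))^2$ and $T_{m_n,t}(\omega, u) = m_n^{-1/2}[\omega_1 U_{m_n,t} + \omega_2 \alpha^{-1} I_{m_n,t}(u)]$. Therefore, by the continuous mapping theorem,

\[
\begin{aligned}
& \frac{1}{m_n^{1/2}} \sum_{t=1}^{n} \left( U_{m_n,t} - \alpha^{-1} I_{m_n,t}(u/m_n^{1/2}) \right) / \sigma_{m_n}(1,-1) \\
&\quad = \frac{\sigma_{m_n}(1,0)}{\sigma_{m_n}(1,-1)} \frac{1}{m_n^{1/2}} \sum_{t=1}^{n} \frac{U_{m_n,t}}{\sigma_{m_n}(1,0)} - \frac{\sigma_{m_n}(0,1)}{\sigma_{m_n}(1,-1)} \alpha^{-1} \frac{1}{m_n^{1/2}} \sum_{t=1}^{n} \frac{I_{m_n,t}(u/m_n^{1/2})}{\sigma_{m_n}(0,1)} \\
&\quad \Rightarrow \left( \lim_{n\to\infty} \frac{\sigma_{m_n}(1,0)}{\sigma_{m_n}(1,-1)} \right) Z_1 - \left( \lim_{n\to\infty} \frac{\sigma_{m_n}(0,1)}{\sigma_{m_n}(1,-1)} \right) Z_2 \sim N(0,1).
\end{aligned} \tag{A.7}
\]

Now exploit the Theorem 1 assertion $\ln(X_{([\rho m_n])}/b_{\rho m_n}) \overset{p}{\to} 0$ for all $\rho$ in a neighborhood of 1, (A.6), and (A.7), and arguments identical to Hsing’s (1991, pp. 1553–1554) under tail decay Assumption B to conclude that

\[
m_n^{1/2} \left( \hat{\alpha}^{-1}_{m_n} - \alpha^{-1} \right) / \sigma_{m_n}(1,-1) \Rightarrow \left( \lim_{n\to\infty} \frac{\sigma_{m_n}(1,0)}{\sigma_{m_n}(1,-1)} \right) Z_1 - \left( \lim_{n\to\infty} \frac{\sigma_{m_n}(0,1)}{\sigma_{m_n}(1,-1)} \right) Z_2 \sim N(0,1).
\]

Since $\sigma^2_{m_n} := E(m_n^{1/2}(\hat{\alpha}^{-1}_{m_n} - \alpha^{-1}))^2$, it follows instantly that $|\sigma^2_{m_n}(1,-1) - \sigma^2_{m_n}| \overset{p}{\to} 0$. ∎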

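Proof of Theorem 3. Lemmas B.8 and B.9 together imply $|\hat{\sigma}^2_{m_n} - \sigma^2_{m_n}(1,-1)| \overset{p}{\to} 0$, and by Theorem 2, $|\sigma^2_{m_n}(1,-1) - \sigma^2_{m_n}| \overset{p}{\to} 0$. The claim $|\hat{\sigma}^2_{m_n} - \sigma^2_{m_n}| \overset{p}{\to} 0$ now follows from the triangle inequality. ∎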

Proof of Lemma 4.

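Claim (i). Let $\{X_t\}$ be Lp-NED on $\{\Im_{n,t}\}$. For any $\eta_n > 0$ to be defined below (I would like to thank an anonymous referee for insights into the proof of Lemma 4):

\[
\begin{aligned}
& E\left( I\left( X_t > b_{m_n}e^u \right) - E\left[ I\left( X_t > b_{m_n}e^u \right) \mid \Im_{n,t-q_n}^{t+q_n} \right] \right)^2 \\
&\quad \le E\left[ \left( I\left( X_t > b_{m_n}e^u \right) - I\left( E[X_t \mid \Im_{n,t-q_n}^{t+q_n}] > b_{m_n}e^u \right) \right)^2 \times I\left( \left| X_t - E[X_t \mid \Im_{n,t-q_n}^{t+q_n}] \right| \le \eta_n \right) \right] \\
&\qquad + E\left[ \left( I\left( X_t > b_{m_n}e^u \right) - I\left( E[X_t \mid \Im_{n,t-q_n}^{t+q_n}] > b_{m_n}e^u \right) \right)^2 \times I\left( \left| X_t - E[X_t \mid \Im_{n,t-q_n}^{t+q_n}] \right| > \eta_n \right) \right] \\
&\quad \le E\left[ I\left( b_{m_n}e^u - \eta_n < X_t < b_{m_n}e^u + \eta_n \right) \right] + P\left( \left| X_t - E[X_t \mid \Im_{n,t-q_n}^{t+q_n}] \right| > \eta_n \right) \\
&\quad \le \left[ \bar{F}_t\left( b_{m_n}e^u - \eta_n \right) - \bar{F}_t\left( b_{m_n}e^u + \eta_n \right) \right] + \left\| X_t - E[X_t \mid \Im_{n,t-q_n}^{t+q_n}] \right\|_p^p / \eta_n^p \\
&\quad \le \left[ \bar{F}_t\left( b_{m_n}e^u - \eta_n \right) - \bar{F}_t\left( b_{m_n}e^u + \eta_n \right) \right] + d_{n,t}^p \vartheta_{q_n}^p / \eta_n^p.
\end{aligned} \tag{A.8}
\]

The first inequality is due to the conditional expectations minimizing the mean squared error, and a trivial identity. The second follows from basic logic and a trivial inequality that exploits the indicator function. The third follows from Markov’s inequality, and the fourth from Lp-NED, where $\vartheta_{q_n} = o(q_n^{-\lambda})$.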

Define ϑ := supq≥1 ϑq ∈ [0,1) and put ηn = bmn euϑ1/2qn . Under Assumption B,

Ft(bmn

) = (mn/n) ×(

1+o(1/m1/2

n))

and Ft(bmn zn,t

)/Ft(bmn

) = a−αn,t ×(

1+o(

1/m1/2n

))for any array of nonstochastic positive real numbers {an,t }, an,t ≥ 1


(cf. Hsing, 1991, p. 1553). Therefore[Ft(bmn eu −ηn

)− Ft(bmn eu +ηn

)]+d pn,tϑ

pqn /η

pn

= Ft(bmn

) Ft

(bmn eu

(1−ϑ

1/2qn

))Ft(bmn

) − Ft(bmn

) Ft

(bmn eu(1+ϑ

1/2qn )

)Ft(bmn

)+ b−p

mn d pn,t e−upϑ

p/2qn

= (mn/n)e−αu[(1−ϑ

1/2qn )−α − (1+ϑ

1/2qn )−α

](1+o(1/m1/2

n ))

+ b−pmn d p

n,t e−upϑp/2qn

≤ K (mn/n)e−αuϑ1/2qn +b−p

mn d pn,t e−upϑ

p/2qn

≤ K ×max{mn/n, b−pmn }× e−up

(1+d p

n,t

)×ϑ

min{p,1}/2qn , (A.9)

where the second inequality exploits p < α, and the first follows from the mean valuetheorem:

(1−ϑ1/2qn )−α − (1+ϑ

1/2qn )−α ≤ α2(1− ϑ1/2)−α−1ϑ

1/2qn ≤ Kϑ

1/2qn .
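The mean value step can be checked numerically. The sketch below evaluates both sides of $(1-h)^{-\alpha} - (1+h)^{-\alpha} \le 2\alpha(1-h)^{-\alpha-1}h$ on a grid of $h \in (0,1)$, where $h$ plays the role of $\vartheta_{q_n}^{1/2}$. The function names, the grid, and the values of `alpha` are illustrative assumptions, not part of the paper.

```python
# Grid check of the mean-value-theorem bound behind (A.9):
#   (1 - h)^(-alpha) - (1 + h)^(-alpha) <= 2 * alpha * (1 - h)^(-alpha - 1) * h,  0 < h < 1.
# alpha and the grid are illustrative assumptions.

def mvt_gap(alpha: float, h: float) -> float:
    """Left-hand side: x^(-alpha) evaluated at 1 - h minus its value at 1 + h."""
    return (1.0 - h) ** (-alpha) - (1.0 + h) ** (-alpha)


def mvt_bound(alpha: float, h: float) -> float:
    """Right-hand side: interval length 2h times the steepest slope, attained at 1 - h."""
    return 2.0 * alpha * (1.0 - h) ** (-alpha - 1.0) * h


def bound_holds(alpha: float, grid_size: int = 999) -> bool:
    """Check the inequality at evenly spaced points h in (0, 1)."""
    hs = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    # A tiny relative slack guards against floating-point rounding only.
    return all(mvt_gap(alpha, h) <= mvt_bound(alpha, h) * (1.0 + 1e-12) for h in hs)
```

Since $\vartheta_{q_n} \le \vartheta$, replacing $(1-h)^{-\alpha-1}$ by $(1-\vartheta^{1/2})^{-\alpha-1}$ only enlarges the right-hand side, which gives the constant $K$ in the display.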

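If $\lim_{x\to\infty} L(x) = K > 0$, it is easy to show that $b_{m_n}^{-p} = K (m_n/n)^{p/\alpha} \ge K (m_n/n)$ from (3) and $p < \alpha$. Together (A.8), (A.9), and $\vartheta_{q_n} = o(q_n^{-\lambda})$ imply that

$$\Big\| I(X_t > b_{m_n} e^u) - P\big( X_t > b_{m_n} e^u \mid \Im_{n,t-q_n}^{t+q_n} \big) \Big\|_2 \le \Big\{ e^{-up/2} \big( 1 + d_{n,t}^p \big)^{1/2} (m_n/n)^{p/2\alpha} \Big\} \times o\big( q_n^{-\lambda \min\{p,1\}/4} \big).$$

Now suppose $p = \alpha - \iota$, $\sup_{n \ge 1} \sup_{1 \le t \le n} d_{n,t} \le K$, and $\lambda \ge 1/\min\{1, p/2\}$. Then the right-hand side is bounded by

$$K (m_n/n)^{1/2} e^{-up/2} \times o\big( (n/m_n)^{\iota} q_n^{-\lambda \min\{p,1\}/4} \big) = \Big\{ e^{-up/2} (m_n/n)^{1/2} \Big\} \times o\big( (n/m_n)^{\iota} q_n^{-1/2} \big) = f_{n,t}(u) \times \psi_{q_n},$$

where $\sup_{1 \le t \le n} f_{n,t}(u) = e^{-up/2} (m_n/n)^{1/2}$ is Lebesgue integrable on $\mathbb{R}_+$. As long as $n/m_n = o(q_n^{\delta})$ for some $\delta > 0$, then for sufficiently tiny $\iota > 0$, $\psi_{q_n} = o\big( (n/m_n)^{\iota} q_n^{-1/2} \big) = o(q_n^{-1/2})$.

Claim (ii). See Hill (2008c). ∎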

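Proof of Lemma 5. In light of Lemma 4, we need only prove that $\{X_t\}$ is $L_{\alpha-\iota}$-NED on $\{\Im_{n,t}\}$ with size $\lambda \ge 1/\min\{1, p/2\}$ and uniformly bounded constants $d_{n,t} \le K$. Recall $E_{n,t} = [\varepsilon_{t-i}]_{i=0}^{[q_n/2]}$ for $t = 1, \ldots, n$ and 0 otherwise. Since $\Im_{n,-\infty}^{t} = \Im_{n,1}^{t} \subseteq G_{-\infty}^{t}$ and $\Im_{n,t+q_n}^{+\infty} = \Im_{n,t+q_n}^{n} \subseteq G_{t+[q_n/2]}^{+\infty}$, it is easy to show the strong mixing property implies that $\varepsilon_t$ is F-strong mixing with size $r/(r-2)$, $r > 2$.

Recall $\alpha > 1$, note that $\sup_{t \in \mathbb{Z}} \|\varepsilon_t\|_{\alpha-\iota} \le K$ for tiny $\iota > 0$ by stationarity, and by construction $\Im_{n,t-q_n}^{t+q_n} = \sigma(\varepsilon_\tau : \max\{1 - [q_n/2],\ t - q_n - [q_n/2]\} \le \tau \le \min\{t + q_n, n\})$. Use $\sigma(\varepsilon_\tau : \tau \le t - i)$-measurability of $\pi_{t,i}$, $\sup_{t \in \mathbb{Z}} |\pi_{t,i}| \le |\pi_i| = O(i^{-\mu})$ for some $\mu > 1/\min\{1, p/2\}$ by the stipulations of Example 4, and Minkowski's and conditional Jensen's inequalities to deduce

$$\begin{aligned}
\Big\| X_t - E\big[ X_t \mid \Im_{n,t-q_n}^{t+q_n} \big] \Big\|_{\alpha-\iota}
&\le \sum_{i=[q_n/2]}^{\infty} \Big\| \pi_{t,i} \varepsilon_{t-i} - E\Big[ \pi_{t,i} \varepsilon_{t-i} \,\Big|\, \{\varepsilon_\tau\}_{\max\{1-[q_n/2],\, t-q_n-[q_n/2]\}}^{\min\{t+q_n,\, n\}} \Big] \Big\|_{\alpha-\iota} \\
&\le K \sum_{i=[q_n/2]}^{\infty} \big\| \pi_{t,i} \varepsilon_{t-i} \big\|_{\alpha-\iota} \le K \sum_{i=[q_n/2]}^{\infty} |\pi_i| = O\big( q_n^{-\mu} \big).
\end{aligned}$$

Therefore $\| X_t - E[X_t \mid \Im_{n,t-q_n}^{t+q_n}] \|_{\alpha-\iota} \le d_{n,t} \times o(q_n^{-\lambda})$ for $d_{n,t} = K$ and $\lambda \ge 1/\min\{1, p/2\}$. ∎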

Proof of Lemma 6. See Hill (2008c). ∎

Proof of Lemma 7.

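Step 1 ($X_t \sim (2)$). Use $\varepsilon_t \sim (2)$ with index $\alpha$, the convolution tail property of $\{\varepsilon_t\}$ and $\sum_{i=0}^{\infty} \pi_i^{\alpha} < \infty$ to deduce, as $z \to \infty$,

$$P(X_t > z) = P\left( \sum_{i=0}^{\infty} \pi_i \varepsilon_{t-i} > z \right) \sim \sum_{i=0}^{\infty} \pi_i^{\alpha} P(\varepsilon_{t-i} > z) = \sum_{i=0}^{\infty} \pi_i^{\alpha} \times z^{-\alpha} L(z).$$

Therefore $X_t \sim (2)$ with index $\alpha$. Further, since by construction of $\{m_n, b_{m_n}\}$

$$\lim_{n\to\infty} \frac{n}{m_n} P(X_t > b_{m_n}) = \sum_{i=0}^{\infty} \pi_i^{\alpha} \lim_{n\to\infty} \frac{n}{m_n} P(\varepsilon_{t-i} > b_{m_n}) = 1,$$

identical distributedness implies $(n/m_n) P(\varepsilon_t > b_{m_n}) \sim \big( \sum_{i=0}^{\infty} \pi_i^{\alpha} \big)^{-1}$.

Step 2 ($L_2$-E-NED). For notational clarity assume $q_n < t$. A similar argument applies for all $1 \le t \le n$. By iterated expectations and the Cauchy-Schwarz inequality,

$$\begin{aligned}
& E\Big( I(X_t > b_{m_n} e^u) - P\big( X_t > b_{m_n} e^u \mid \Im_{n,t-q_n}^{t+q_n} \big) \Big)^2 \\
&\quad = P(X_t > b_{m_n} e^u) - 2 E\Big[ I(X_t > b_{m_n} e^u)\, P\big( X_t > b_{m_n} e^u \mid \Im_{n,t-q_n}^{t+q_n} \big) \Big] + E\Big[ P\big( X_t > b_{m_n} e^u \mid \Im_{n,t-q_n}^{t+q_n} \big)^2 \Big] \\
&\quad = P(X_t > b_{m_n} e^u) - E\Big[ P\big( X_t > b_{m_n} e^u \mid \Im_{n,t-q_n}^{t+q_n} \big)^2 \Big] \\
&\quad = E\Big[ P\big( X_t > b_{m_n} e^u \mid \Im_{n,t-q_n}^{t+q_n} \big) \times \Big( 1 - P\big( X_t > b_{m_n} e^u \mid \Im_{n,t-q_n}^{t+q_n} \big) \Big) \Big] \\
&\quad \le \Big\| P\big( X_t > b_{m_n} e^u \mid \Im_{n,t-q_n}^{t+q_n} \big) \Big\|_2 \times \Big\| 1 - P\big( X_t > b_{m_n} e^u \mid \Im_{n,t-q_n}^{t+q_n} \big) \Big\|_2 \\
&\quad \le \Big\| P\big( X_t > b_{m_n} e^u \mid \Im_{n,t-q_n}^{t+q_n} \big) \Big\|_2 .
\end{aligned} \tag{A.10}$$

Let $\varepsilon^*_{t,a}$ denote a random draw from the distribution governing $\sum_{i=0}^{a} \pi_i \varepsilon_{t-i}$, $a \in \mathbb{N}$, and note $\pi_i \ge 0$ and $\varepsilon_t \ge 0$ a.s. $\forall t$ imply $\varepsilon^*_{t,a} \ge 0$ a.s. An argument similar to Step 1, and the proof of Lemma 5, reveals as $n \to \infty$,

$$\begin{aligned}
P\big( X_t > b_{m_n} e^u \mid \Im_{n,t-q_n}^{t+q_n} \big)
&= P\left( \sum_{i=q_n+[q_n/2]+1}^{\infty} \pi_i \varepsilon_{t-i} > b_{m_n} e^u - \varepsilon^*_{t,q_n+[q_n/2]} \right) \\
&\le \left( 1 - \frac{\varepsilon^*_{t,q_n+[q_n/2]}}{b_{m_n} e^u} \right)^{-\alpha} \sum_{i=q_n+1}^{\infty} \pi_i^{\alpha} P(\varepsilon_{t-i} > b_{m_n} e^u).
\end{aligned} \tag{A.11}$$

Since $\sum_{i=0}^{\infty} \pi_i^{\alpha} < \infty$, and $(n/m_n) P(\varepsilon_t > b_{m_n}) \sim \big( \sum_{i=0}^{\infty} \pi_i^{\alpha} \big)^{-1}$ by Step 1, for every $\epsilon > 0$ and $a \in \mathbb{N}$,

$$P\big( \varepsilon^*_{t,a} / b_{m_n} > \epsilon \big) \le P\left( \sum_{i=0}^{\infty} \pi_i \varepsilon_{t-i} > \epsilon \times b_{m_n} \right) \sim \epsilon^{-\alpha} \sum_{i=0}^{\infty} \pi_i^{\alpha} P(\varepsilon_{t-i} > b_{m_n}) = O(m_n/n),$$

hence $\varepsilon^*_{t,q_n+[q_n/2]} / b_{m_n} \overset{p}{\to} 0$.

Now use (A.11), Minkowski's inequality, $(n/m_n) P(\varepsilon_t > b_{m_n}) \sim \big( \sum_{i=0}^{\infty} \pi_i^{\alpha} \big)^{-1}$, and $\| (1 - \varepsilon^*_{t,q_n}/b_{m_n} e^u)^{-\alpha} \|_2 \overset{p}{\to} 1$ by $\varepsilon^*_{t,q_n}/b_{m_n} \overset{p}{\to} 0$ and the Helly-Bray theorem to deduce

$$\begin{aligned}
\lim_{n\to\infty} \frac{n}{m_n} \Big\| P\big( X_t > b_{m_n} e^u \mid \Im_{n,t-q_n}^{t+q_n} \big) \Big\|_2
&\le \lim_{n\to\infty} \left\| \left( 1 - \frac{\varepsilon^*_{t,q_n+[q_n/2]}}{b_{m_n} e^u} \right)^{-\alpha} \right\|_2 \times \sum_{i=q_n+1}^{\infty} |\pi_i|^{\alpha} \frac{n}{m_n} P(\varepsilon_{t-i} > b_{m_n}) \times e^{-\alpha u} \\
&= \lim_{n\to\infty} \sum_{i=q_n+1}^{\infty} |\pi_i|^{\alpha} \frac{n}{m_n} P(\varepsilon_{t-i} > b_{m_n})\, e^{-\alpha u}
= \sum_{i=q_n+1}^{\infty} |\pi_i|^{\alpha} \left( \sum_{i=0}^{\infty} |\pi_i|^{\alpha} \right)^{-1} e^{-\alpha u}.
\end{aligned} \tag{A.12}$$

Together, (A.10) and (A.12) imply, for any $r > 2$,

$$\Big\| I(X_t > b_{m_n} e^u) - P\big( X_t > b_{m_n} e^u \mid \Im_{n,t-q_n}^{t+q_n} \big) \Big\|_2 \le \left\{ e^{-\alpha u/2} \left( \frac{m_n}{n} \right)^{1/2} \right\} \times \left\{ \left( \frac{\sum_{i=q_n+1}^{\infty} \pi_i^{\alpha}}{\sum_{i=0}^{\infty} \pi_i^{\alpha}} \right)^{1/2} \right\} = f_{n,t}(u) \times \psi_{q_n},$$

say, where $\sup_{1 \le t \le n} f_{n,t}(u) = e^{-\alpha u/2} (m_n/n)^{1/2}$ is Lebesgue integrable on $\mathbb{R}_+$, and $\psi_{q_n} \in [0,1]$. ∎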
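To make the E-NED coefficient $\psi_{q_n}$ concrete, the following sketch evaluates $\psi_q = \big( \sum_{i>q} \pi_i^{\alpha} / \sum_{i \ge 0} \pi_i^{\alpha} \big)^{1/2}$ for geometric weights $\pi_i = \phi^i$, for which the ratio collapses to $\phi^{\alpha(q+1)/2}$. The geometric weights, the truncation length, and all function names are assumptions chosen for illustration; the lemma itself only requires $\sum_i \pi_i^{\alpha} < \infty$.

```python
# Illustrative computation of the E-NED coefficient psi_q from the proof of Lemma 7,
# assuming geometric distributed-lag weights pi_i = phi**i (a hypothetical choice).

def geometric_weights(phi: float, length: int = 2000) -> list:
    """Truncated weight sequence pi_0, ..., pi_{length-1} with pi_i = phi**i."""
    return [phi ** i for i in range(length)]


def psi_coefficient(weights: list, alpha: float, q: int) -> float:
    """Square root of the share of sum(pi_i**alpha) carried by lags beyond q."""
    tail = sum(w ** alpha for w in weights[q + 1:])
    total = sum(w ** alpha for w in weights)
    return (tail / total) ** 0.5


def psi_closed_form(phi: float, alpha: float, q: int) -> float:
    """Closed form for geometric weights: phi**(alpha * (q + 1) / 2)."""
    return phi ** (alpha * (q + 1) / 2.0)
```

With `phi = 0.8` and `alpha = 1.5`, `psi_coefficient(geometric_weights(0.8), 1.5, q)` agrees with `psi_closed_form(0.8, 1.5, q)` up to truncation error, so in this special case the coefficients decay geometrically in the displacement $q$.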

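Proof of Lemma 8. The tail of $X_t = \sum_{j=0}^{\infty} \beta^j \varepsilon_t^{(j)} = \varepsilon_t + \sum_{j=1}^{\infty} \beta^j \varepsilon_t^{(j)} = \varepsilon_t + X_t^*$ is dominated by $X_t^* \sim (2)$ with index $\alpha/2$ (cf. Davis and Resnick, 1996), hence it suffices to demonstrate that $X_t^*$ satisfies Lemma 7. Since $\varepsilon_t$ is i.i.d., straightforward generalizations of Corollaries 2.3 and 2.4 of Davis and Resnick (1996) reveal $P(\beta^j \varepsilon_t^{(j)} > x) \sim \beta^{j\alpha/2} (E|\varepsilon_t|^{\alpha/2})^{j-1} P(\varepsilon_t^2 > x)$ for each $j \ge 1$ and $P(X_t^* > x) \sim \sum_{j=1}^{\infty} \beta^{j\alpha/2} (E|\varepsilon_t|^{\alpha/2})^{j-1} P(\varepsilon_t^2 > x)$. But this implies that $\{\beta^j \varepsilon_t^{(j)}\}_{j=1}^{\infty}$ has the same tail behavior as some stochastic sequence $\{\beta^{j\alpha/2} (E|\varepsilon_t|^{\alpha/2})^{j-1} z_{t-j}\}_{j=1}^{\infty}$ where $\lim_{x\to\infty} P(z_{t-j} > x)/P(\varepsilon_t^2 > x) = 1$ for all $j \in \mathbb{N}$ and $\{z_{t-j}\}_{j=1}^{\infty}$ has the convolution tail property $P(\sum_{j=1}^{\infty} a_j z_{t-j} > x) \sim \sum_{j=1}^{\infty} P(a_i \varepsilon_{t-i} > x)$ for any sequence of real numbers $\{a_i\}$, $\sum_{i=0}^{\infty} |a_i|^{\alpha} < \infty$. Therefore $X_t^*$ satisfies the conditions of Lemma 7. ∎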

APPENDIX B: Supporting Lemmas B1–B10

Let $\rho$ be any number in an arbitrary neighborhood of 1, and write $T_{m_n,t} := T_{m_n,t}(\omega, u/m_n^{1/2})$. Lemmas B.1 and B.2 characterize moment and memory properties of the tail arrays $\{U_{m_n,t}, I_{m_n,t}(u)\}$, where $U_{m_n,t} := (\ln(X_t/b_{m_n}))_+ - E[(\ln(X_t/b_{m_n}))_+]$ and $I_{m_n,t}(u) := I(X_t > b_{m_n} e^u) - P(X_t > b_{m_n} e^u)$, $u \ge 0$.

LEMMA B.1.

(i) Under Assumption A.1, $\{U_{m_n,t}, \Im_{n,t}\}$ and $\{I_{\rho m_n,t}(u), \Im_{n,t}\}$ form $L_2$-mixingale arrays with common coefficients $\varphi_{q_n}$ and constants $\{e^*_{n,t}, e_{n,t}(u)\} = O((m_n/n)^{1/2})$, where $e^*_{n,t} = \int_0^{\infty} e_{n,t}(u)\,du$, provided $e_{n,t}(u)$ is Lebesgue integrable on $\mathbb{R}_+$.

(ii) Under Assumption A.2, $\{U_{m_n,t}, I_{\rho m_n,t}(u)\}$ are $L_2$-NED on $\{\Im_{n,t}\}$ with common coefficients $\psi^*_{n,q_n} = (m_n/n)^{1/2-1/r} \psi_{q_n}$ and constants $\{f^*_{n,t}, f^*_{n,t}(u)\}$, where $f^*_{n,t}(u) = (n/m_n)^{1/2-1/r} f_{n,t}(u)$ and $f^*_{n,t} = K (n/m_n)^{1/2-1/r} \int_0^{\infty} f_{n,t}(u)\,du$, provided $f_{n,t}(u)$ is Lebesgue integrable on $\mathbb{R}_+$. In particular, $\sup_{1 \le t \le n} f^*_{n,t}$ and $\sup_{1 \le t \le n} \sup_{u \ge 0} f^*_{n,t}(u)$ are $O((m_n/n)^{1/r})$.

LEMMA B.2. The tail arrays $\{U_{m_n,t}\}$ and $\{I_{\rho m_n,t}(u)\}$ are $L_r$-bounded for any $r \ge 1$:

$$\lim_{n\to\infty} \left( \frac{n}{m_n} \right)^{1/r} \big\| I_{\rho m_n,t}(u) \big\|_r \le A_r(u) < \infty \quad \text{and} \quad \lim_{n\to\infty} \left( \frac{n}{m_n} \right)^{1/r} \big\| U_{m_n,t} \big\|_r \le B_r < \infty,$$

where $A_r : \mathbb{R} \to \mathbb{R}_+$ is $p$-integrable with respect to Lebesgue measure on $\mathbb{R}_+$ for any $p > 0$, and uniformly bounded on $\mathbb{R}_+$. In particular, $\sup_{u \ge 0} \| I_{\rho m_n,t}(u) \|_r = O((m_n/n)^{1/r})$.

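Define

$$\widetilde{W}_{m_n,t} := \ln(X_t/b_{m_n}) \times I(X_t > X_{(m_n+1)}) - (\ln(X_t/b_{m_n}))_+ .$$

Lemmas B.3 and B.4 establish key Lipschitz properties and a decomposition for proving $\hat{\alpha}_{m_n}^{-1}$ is uniformly consistent for $\alpha^{-1}$.

LEMMA B.3. Define $m^* := \inf_{\phi \in \Phi} m_n(\phi)$ and let Assumptions A.1 and B hold. For each $y_{m_n} \in \{ 1/m_n \sum_{t=1}^{n} U_{m_n,t},\ 1/m_n \sum_{t=1}^{n} I_{m_n,t}(1, u/m_n^{1/2}),\ 1/m_n \sum_{t=1}^{n} \widetilde{W}_{m_n,t},\ \hat{\alpha}_{m_n}^{-1} \}$ there exists a stochastic array $\{B_{n,t}\}$ that is not a function of $\phi \in \Phi$ and that satisfies $1/m^* \sum_{t=1}^{n} E[B_{n,t}] = O(1)$, such that $|y_{m_n}(\phi) - y_{m_n}(\phi')| \le 1/m^* \sum_{t=1}^{n} B_{n,t} \times |\phi - \phi'|$ a.s. for all $\phi, \phi' \in \Phi$.

LEMMA B.4. Under Assumptions A.1 and B,

$$\hat{\alpha}_{m_n}^{-1} - \alpha^{-1} = \frac{1}{m_n} \sum_{t=1}^{n} \Big\{ U_{m_n,t} - \alpha^{-1} I_{m_n,t}(u/m_n^{1/2}) \Big\} + \frac{1}{m_n} \sum_{t=1}^{n} \widetilde{W}_{m_n,t} + o\big( 1/m_n^{1/2} \big),$$

where $o(1/m_n^{1/2})$ is deterministic.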

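LEMMA B.5. Let $\{X_{n,t}\}$ be a mean-zero stochastic array with $\sigma_n := \| \sum_{t=1}^{n} X_{n,t} \|_2 > 0$ uniformly in $n$. Define $Z_{n,i} := \sum_{t=(i-1)k_n+l_n+1}^{i k_n} X_{n,t}$ and $F_{n,i} := \sigma(\{E_{n,\tau}(a_n) : \tau \le i k_n\})$ and let the sequences $\{l_n, k_n, r_n\}$ be as in (A.3). Then $\sum_{t=1}^{n} X_{n,t}/\sigma_n \Longrightarrow N(0,1)$ under the following conditions:

(a) $\sum_{t=r_n k_n+1}^{n} X_{n,t} \overset{p}{\to} 0$,

(b) $\sum_{i=1}^{r_n} \sum_{t=(i-1)k_n+1}^{(i-1)k_n+l_n} X_{n,t} \overset{p}{\to} 0$,

(c) $\sum_{i=1}^{r_n} E[Z_{n,i} \mid F_{n,i-1}] \overset{p}{\to} 0$,

(d) $\sum_{i=1}^{r_n} (Z_{n,i} - E[Z_{n,i} \mid F_{n,i-1}]) \overset{p}{\to} 0$,

(e) $\sum_{i=1}^{r_n} (E[Z_{n,i} \mid F_{n,i}] - E[Z_{n,i} \mid F_{n,i-1}])^2 / \sigma_n \overset{p}{\to} 1$,

(f) $\sum_{i=1}^{r_n} E[W_{n,i}^2 I(|W_{n,i}| > \epsilon)] \overset{p}{\to} 0$ $\forall \epsilon > 0$, where $W_{n,i} := E[Z_{n,i} \mid F_{n,i}] - E[Z_{n,i} \mid F_{n,i-1}]$.

LEMMA B.6. If $\{T_{m_n,t}, \Im_{n,t}\}$ forms an $L_2$-mixingale array with size 1/2 and constants $c_{n,t}$, $\sup_{1 \le t \le n} c_{n,t} = O(n^{-1/2})$, then for the sequences $\{l_n, k_n, r_n\}$ defined in (A.3),

$$\lim_{n\to\infty} \left| \sum_{i=1}^{r_n} \sum_{j=i+1}^{r_n} \sum_{t=(i-1)k_n+l_n+1}^{i k_n} \sum_{s=(j-1)k_n+l_n+1}^{j k_n} E\big[ T_{m_n,s} T_{m_n,t} \big] \right| = 0.$$

Recall $Z_{n,i} = \sum_{t=(i-1)k_n+l_n+1}^{i k_n} T_{m_n,t}$.

LEMMA B.7. Under Assumptions A.2 and B, $\sum_{i=1}^{r_n} (Z_{n,i}^2 - E[Z_{n,i}^2]) \overset{p}{\to} 0$ and $\sum_{i=1}^{r_n} Z_{n,i}^2 / \sigma_{m_n}^2(\omega) \overset{p}{\to} 1$.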

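Compactly write the kernel function $w_{n,s,t} := w((s - t)/\gamma_n)$ from Theorem 3, and

$$\tilde{\sigma}_{m_n}^2 := \frac{1}{m_n} \sum_{s,t=1}^{n} w_{n,s,t} Y_{m_n,s} Y_{m_n,t}, \quad \text{where } Y_{m_n,t} := U_{m_n,t} - \frac{m_n}{n} \ln\left( \frac{X_{(m_n+1)}}{b_{m_n}} \right).$$

LEMMA B.8. Under the conditions of Theorem 3, $|\hat{\sigma}_{m_n}^2 - \tilde{\sigma}_{m_n}^2| \overset{p}{\to} 0$.

LEMMA B.9. Under the conditions of Theorem 3, $|\tilde{\sigma}_{m_n}^2 - \sigma_{m_n}^2(1,-1)| \overset{p}{\to} 0$.

LEMMA B.10. Under the conditions of Theorem 3, $\{m_n^{-1/2} Y_{m_n,t}, \Im_{n,t}\}$ forms an $L_2$-mixingale array with $O(n^{-1/2})$-constants and size 1/2.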
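A kernel-weighted sum of cross products of this type can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the estimator of Theorem 3 verbatim: it assumes a feasible version in which the unknown threshold $b_{m_n}$ is replaced by the order statistic $X_{(m_n+1)}$ and the centering term by $(m_n/n)\hat{\alpha}_{m_n}^{-1}$, and it uses the Bartlett kernel as one admissible-looking choice of $w$. All function names are hypothetical.

```python
import math

# Sketch of a kernel variance estimator (1/m) * sum_{s,t} w((s - t)/gamma) * Y_s * Y_t.
# Assumptions (not taken from the text): Bartlett kernel, threshold X_(m+1) in place of b_m,
# and centering by (m/n) times the Hill estimate of 1/alpha. The sample must be positive.

def bartlett(z: float) -> float:
    """Bartlett (triangular) kernel: 1 - |z| on [-1, 1], zero elsewhere."""
    return max(1.0 - abs(z), 0.0)


def centered_tail_logs(sample: list, m: int) -> list:
    """Y_t = (ln(X_t / X_(m+1)))_+ - (m/n) * hill, which sums to zero over t absent ties."""
    n = len(sample)
    ordered = sorted(sample, reverse=True)      # X_(1) >= X_(2) >= ...
    threshold = ordered[m]                      # X_(m+1)
    hill = sum(math.log(x / threshold) for x in ordered[:m]) / m
    return [max(math.log(x / threshold), 0.0) - (m / n) * hill for x in sample]


def kernel_variance(sample: list, m: int, bandwidth: float) -> float:
    """Kernel-weighted sum of cross products of the centered tail logs, scaled by 1/m."""
    y = centered_tail_logs(sample, m)
    n = len(y)
    reach = int(bandwidth)                      # kernel weights vanish for |s - t| >= bandwidth
    total = 0.0
    for s in range(n):
        for t in range(max(0, s - reach), min(n, s + reach + 1)):
            total += bartlett((s - t) / bandwidth) * y[s] * y[t]
    return total / m
```

With `bandwidth = 1.0` only the terms with $s = t$ receive positive weight, so the function returns $(1/m)\sum_t Y_t^2$; larger bandwidths add weighted autocovariance terms. The Bartlett weights keep the quadratic form nonnegative, which is one reason that kernel is a common default.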

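Proof of Lemma B.1. We will prove the E-NED assertion, the E-MIXL proof being similar. Since $\{\rho m_n\}$ forms an intermediate order sequence, under Assumption A.2 $\{I_{\rho m_n,t}(u)\}$ is by construction $L_2$-NED on $\{\Im_{n,t}\}$: $\| I_{\rho m_n,t}(u) - E[I_{\rho m_n,t}(u) \mid \Im_{n,t-q_n}^{t+q_n}] \|_2 \le \{ (n/m_n)^{1/2-1/r} f_{n,t}(u) \} \times \{ (m_n/n)^{1/2-1/r} \psi_{q_n} \} = f^*_{n,t}(u) \psi^*_{n,q_n}$, say, where the claimed properties of $f^*_{n,t}(u)$ and $\psi^*_{n,q_n}$ follow from Assumption A.2.

Now consider $U_{m_n,t}$, define $P_{n,t}(u) := I(X_t > b_{m_n} e^u) - P(X_t > b_{m_n} e^u \mid \Im_{n,t-q_n}^{t+q_n})$, invoke Assumption A.2, and let the E-NED constants $f_{n,t}(u)$ be Lebesgue integrable on $\mathbb{R}_+$. Then

$$\begin{aligned}
\Big\| U_{m_n,t} - E\big[ U_{m_n,t} \mid \Im_{n,t-q_n}^{t+q_n} \big] \Big\|_2
&= \Big\| (\ln(X_t/b_{m_n}))_+ - E\big[ (\ln(X_t/b_{m_n}))_+ \mid \Im_{n,t-q_n}^{t+q_n} \big] \Big\|_2 \\
&= \left[ E\left( \int_0^{\infty} \Big[ I(X_t > b_{m_n} e^u) - P\big( X_t > b_{m_n} e^u \mid \Im_{n,t-q_n}^{t+q_n} \big) \Big] du \right)^2 \right]^{1/2} \\
&= \left[ E \int_0^{\infty} \int_0^{\infty} P_{n,t}(u_1) P_{n,t}(u_2)\, du_1\, du_2 \right]^{1/2} \\
&= \left[ \int_0^{\infty} \int_0^{\infty} E\big[ P_{n,t}(u_1) P_{n,t}(u_2) \big]\, du_1\, du_2 \right]^{1/2} \\
&\le \left[ \int_0^{\infty} \int_0^{\infty} \big\| P_{n,t}(u_1) \big\|_2 \big\| P_{n,t}(u_2) \big\|_2\, du_1\, du_2 \right]^{1/2} \\
&= \int_0^{\infty} \big\| P_{n,t}(u) \big\|_2\, du = \int_0^{\infty} \Big\| I(X_t > b_{m_n} e^u) - P\big( X_t > b_{m_n} e^u \mid \Im_{n,t-q_n}^{t+q_n} \big) \Big\|_2\, du \\
&\le \left( \int_0^{\infty} f_{n,t}(u)\, du \right) \times \psi_{q_n} = \left( \int_0^{\infty} f^*_{n,t}(u)\, du \right) \times \psi^*_{n,q_n} = f^*_{n,t} \times \psi^*_{n,q_n},
\end{aligned}$$

say. The second equality follows from the identity $(\ln(X_t/b_{m_n}))_+ = \int_0^{\infty} I(X_t > b_{m_n} e^u)\, du$ and the Fubini-Tonelli theorem: $E[(\ln(X_t/b_{m_n}))_+ \mid \Im_{n,t-q_n}^{t+q_n}] = E[\int_0^{\infty} I(X_t > b_{m_n} e^u)\, du \mid \Im_{n,t-q_n}^{t+q_n}] = \int_0^{\infty} P(X_t > b_{m_n} e^u \mid \Im_{n,t-q_n}^{t+q_n})\, du$. The fourth equality follows from the Fubini-Tonelli theorem. The first inequality is the Cauchy-Schwarz inequality. The last inequality follows from Step 1 and Lebesgue integrability of $f_{n,t}(u)$. The asserted properties of $f^*_{n,t} = \int_0^{\infty} f^*_{n,t}(u)\, du = (n/m_n)^{1/2-1/r} \int_0^{\infty} f_{n,t}(u)\, du$ follow from Assumption A.2. ∎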
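The integral identity $(\ln(x/b))_+ = \int_0^{\infty} I(x > b e^u)\,du$ that drives this proof is easy to confirm numerically. The sketch below compares the positive part of the log ratio with a midpoint-rule approximation of the integrated indicator; the function names, the truncation point `upper`, and the step count are illustrative assumptions.

```python
import math

# Numerical check of (ln(x/b))_+ = integral over u >= 0 of I(x > b * e^u).
# The integration range [0, upper] and the step count are illustrative choices.

def positive_log(x: float, b: float) -> float:
    """Positive part of ln(x / b)."""
    return max(math.log(x / b), 0.0)


def integrated_indicator(x: float, b: float, upper: float = 20.0, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of I(x > b * e^u) over [0, upper]."""
    h = upper / steps
    hits = sum(1 for k in range(steps) if x > b * math.exp((k + 0.5) * h))
    return hits * h
```

The indicator equals one exactly for $u < \ln(x/b)$, so the integral measures the length of $[0, \ln(x/b))$ when $x > b$ and is zero otherwise; the approximation error is at most one step width provided `upper` exceeds $\ln(x/b)$.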


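Proof of Lemma B.2. Use (1)–(3) to deduce for any $u \in \mathbb{R}$, any $\rho$ in an arbitrary neighborhood of 1, and any $r \ge 1$,

$$\begin{aligned}
\lim_{n\to\infty} \left( \frac{n}{m_n} \right)^{1/r} \| I_{\rho m_n,t}(u) \|_r
&\le 2 \lim_{n\to\infty} \left( \frac{n}{m_n} \right)^{1/r} P\big( X_t > b_{\rho m_n} e^u \big)^{1/r} \\
&= 2 \lim_{n\to\infty} \left[ \frac{n}{m_n} P\big( X_t > b_{\rho m_n} \big) \frac{P\big( X_t > b_{\rho m_n} e^u \big)}{P\big( X_t > b_{\rho m_n} \big)} \right]^{1/r} \\
&= 2 \rho^{1/r} e^{-\alpha u/r} =: A_r(u) < \infty.
\end{aligned}$$

Trivially, $\sup_{u \ge 0} A_r(u) \le K < \infty$, $\int_0^{\infty} A_r(u)^p\, du \le K \int_0^{\infty} e^{-\alpha u p/r}\, du < \infty$ for any $p > 0$. Similarly, for any $r \ge 1$ it is easy to show under (1)–(3) (e.g., Hsing, 1991, eqn. 1.5) that

$$\lim_{n\to\infty} \left( \frac{n}{m_n} \right)^{1/r} \| U_{m_n,t} \|_r \le 2 \lim_{n\to\infty} (n/m_n)^{1/r} \Big\| (\ln(X_t/b_{m_n}))_+ \Big\|_r = 2 \left( \int_0^{\infty} e^{-\alpha u^{1/r}}\, du \right)^{1/r} =: B_r < \infty.$$

∎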
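The integral inside $B_r$ has a closed form: substituting $v = u^{1/r}$ gives $\int_0^{\infty} e^{-\alpha u^{1/r}}\,du = r \int_0^{\infty} v^{r-1} e^{-\alpha v}\,dv = \Gamma(r+1)/\alpha^r$, so $B_r = 2\,\Gamma(r+1)^{1/r}/\alpha$. The following sketch checks that closed form against a direct midpoint-rule integration; the function names, the cutoff rule, and the step count are assumptions made for illustration.

```python
import math

# Numerical check that the integral of exp(-alpha * u**(1/r)) over u >= 0 equals
# Gamma(r + 1) / alpha**r. Cutoff and step count are illustrative choices.

def tail_integral(alpha: float, r: float, steps: int = 400_000) -> float:
    """Midpoint rule on [0, cutoff], where the integrand has fallen below exp(-40)."""
    cutoff = (40.0 / alpha) ** r
    h = cutoff / steps
    return h * sum(math.exp(-alpha * ((k + 0.5) * h) ** (1.0 / r)) for k in range(steps))


def closed_form(alpha: float, r: float) -> float:
    """Gamma(r + 1) / alpha**r, obtained from the substitution v = u**(1/r)."""
    return math.gamma(r + 1.0) / alpha ** r


def b_r(alpha: float, r: float) -> float:
    """The moment bound B_r = 2 * (integral)**(1/r) written in closed form."""
    return 2.0 * closed_form(alpha, r) ** (1.0 / r)
```

For $r = 1$ the bound reduces to $B_1 = 2/\alpha$, and for $r = 2$ to $B_2 = 2\sqrt{2}/\alpha$, which makes explicit that heavier tails (smaller $\alpha$) loosen the moment bound.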

Proof of Lemma B.3. See Hill (2009b, Lem. B.3). ∎

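Proof of Lemma B.4. Write

$$\begin{aligned}
\hat{\alpha}_{m_n}^{-1} - \alpha^{-1}
&= \frac{1}{m_n} \sum_{j=1}^{m_n} \ln\left( \frac{X_{(j)}}{b_{m_n}} \right) - E\left[ \frac{1}{m_n} \sum_{t=1}^{n} \left( \ln\left( \frac{X_t}{b_{m_n}} \right) \right)_+ \right] \\
&\quad - \ln\left( \frac{X_{(m_n+1)}}{b_{m_n}} \right) + \left( E\left[ \frac{1}{m_n} \sum_{t=1}^{n} \left( \ln\left( \frac{X_t}{b_{m_n}} \right) \right)_+ \right] - \alpha^{-1} \right).
\end{aligned} \tag{B.1}$$

Under Assumption B, the last term satisfies (Hsing, 1991, p. 1554)

$$E\left[ \frac{1}{m_n} \sum_{t=1}^{n} \left( \ln\left( \frac{X_t}{b_{m_n}} \right) \right)_+ \right] - \alpha^{-1} = o(1/m_n^{1/2}),$$

and by construction, the first term can be written

$$\frac{1}{m_n} \sum_{j=1}^{m_n} \ln\left( \frac{X_{(j)}}{b_{m_n}} \right) = \frac{1}{m_n} \sum_{t=1}^{n} \left( \ln\left( \frac{X_t}{b_{m_n}} \right) \right)_+ + \frac{1}{m_n} \sum_{t=1}^{n} \widetilde{W}_{m_n,t},$$

where $\widetilde{W}_{m_n,t} := \ln(X_t/b_{m_n}) \times I(X_t > X_{(m_n+1)}) - (\ln(X_t/b_{m_n}))_+$. Further, it is easy to show that the third term in (B.1) satisfies, for all $u \in \mathbb{R}$,

$$m_n^{1/2} \ln\left( \frac{X_{(m_n+1)}}{b_{m_n}} \right) = u \iff \alpha^{-1} \frac{1}{m_n^{1/2}} \sum_{t=1}^{n} I_{m_n,t}(u/m_n^{1/2}) = u + o(1)$$

and deterministic $o(1)$ (Hsing, 1991, p. 1553). Therefore

$$\begin{aligned}
\hat{\alpha}_{m_n}^{-1} - \alpha^{-1}
&= \frac{1}{m_n} \sum_{t=1}^{n} \left( \ln\left( \frac{X_t}{b_{m_n}} \right) \right)_+ - E\left[ \frac{1}{m_n} \sum_{t=1}^{n} \left( \ln\left( \frac{X_t}{b_{m_n}} \right) \right)_+ \right] \\
&\quad + \frac{1}{m_n} \sum_{t=1}^{n} \ln\left( \frac{X_t}{b_{m_n}} \right) \times \big[ I(X_t > X_{(m_n+1)}) - I(X_t > b_{m_n}) \big] \\
&\quad - \alpha^{-1} \frac{1}{m_n} \sum_{t=1}^{n} I_{m_n,t}(u/m_n^{1/2}) + o(1/m_n^{1/2}) \\
&= \frac{1}{m_n} \sum_{t=1}^{n} \big( U_{m_n,t} - \alpha^{-1} I_{m_n,t}(u/m_n^{1/2}) \big) + \frac{1}{m_n} \sum_{t=1}^{n} \widetilde{W}_{m_n,t} + o(1/m_n^{1/2}).
\end{aligned}$$

∎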
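The starting point (B.1) rests on the algebraic fact that the Hill (1975) estimator $\frac{1}{m}\sum_{j=1}^{m} \ln(X_{(j)}/X_{(m+1)})$ equals $\frac{1}{m}\sum_{j=1}^{m} \ln(X_{(j)}/b) - \ln(X_{(m+1)}/b)$ for any $b > 0$. The sketch below computes the estimator both ways on a simulated sample. The exact Pareto data, the seed, and every function name are assumptions for illustration only.

```python
import math
import random

# Hill estimator of 1/alpha computed directly and via the two order-statistic terms of (B.1).
# The Pareto sample (P(X > x) = x**(-alpha), x >= 1) is an illustrative assumption.

def hill_inverse_index(sample: list, m: int) -> float:
    """Hill (1975) estimator of 1/alpha from the m largest observations."""
    ordered = sorted(sample, reverse=True)       # X_(1) >= X_(2) >= ...
    threshold = ordered[m]                       # X_(m+1)
    return sum(math.log(x / threshold) for x in ordered[:m]) / m


def decomposed_inverse_index(sample: list, m: int, b: float) -> float:
    """Same quantity as (1/m) * sum ln(X_(j)/b) minus ln(X_(m+1)/b), for any b > 0."""
    ordered = sorted(sample, reverse=True)
    first_term = sum(math.log(x / b) for x in ordered[:m]) / m
    third_term = math.log(ordered[m] / b)
    return first_term - third_term


def pareto_sample(n: int, alpha: float, seed: int = 0) -> list:
    """Inverse-transform draws; 1 - random() lies in (0, 1], so the power is well defined."""
    rng = random.Random(seed)
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]
```

For example, with `sample = pareto_sample(5000, 2.0)` and `m = 200`, the two functions agree up to rounding for any positive `b`, which is why the unknown threshold $b_{m_n}$ can be introduced into the expansion at no cost.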

Proof of Lemma B.5. See de Jong (1997, Lem. 1). ∎

Proof of Lemma B.6. See de Jong (1997, Lem. 4). ∎


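Proof of Lemma B.8. Recall $Y_{m_n,t} := U_{m_n,t} - (m_n/n) \ln(X_{(m_n+1)}/b_{m_n})$, write $w_{n,s,t} := w(|s - t|/\gamma_n)$, define

$$\begin{aligned}
A_{n,t} &:= \left( \ln\left( \frac{X_t}{X_{(m_n+1)}} \right) \right)_+ - \left( \ln\left( \frac{X_t}{b_{m_n}} \right) \right)_+ + \frac{m_n}{n} \ln\left( \frac{X_{(m_n+1)}}{b_{m_n}} \right), \\
B_n &:= \frac{m_n}{n} \times \left\{ \left( \frac{n}{m_n} E\left[ \left( \ln\left( \frac{X_t}{b_{m_n}} \right) \right)_+ \right] - \alpha^{-1} \right) + \big( \hat{\alpha}_{m_n}^{-1} - \alpha^{-1} \big) \right\},
\end{aligned}$$

and decompose $\hat{\sigma}_{m_n}^2 = \tilde{\sigma}_{m_n}^2 + R_n$, where

$$\begin{aligned}
\tilde{\sigma}_{m_n}^2 &= \frac{1}{m_n} \sum_{s,t=1}^{n} w_{n,s,t} Y_{m_n,s} Y_{m_n,t}, \\
R_n &= \frac{1}{m_n} \sum_{s,t=1}^{n} w_{n,s,t} A_{n,s} A_{n,t} + B_n^2 \frac{1}{m_n} \sum_{s,t=1}^{n} w_{n,s,t} + 2 \frac{1}{m_n} \sum_{s,t=1}^{n} w_{n,s,t} A_{n,s} Y_{m_n,t} \\
&\quad + 2 B_n \frac{1}{m_n} \sum_{s,t=1}^{n} w_{n,s,t} Y_{m_n,t} + 2 B_n \frac{1}{m_n} \sum_{s,t=1}^{n} w_{n,s,t} A_{n,t}.
\end{aligned}$$

We need only show that $\|R_n\|_1 = o(1)$.

By cases it is easy to show that $|A_{n,t}| \le |\ln(X_{(m_n+1)}/b_{m_n})|$, and Assumption B implies

$$m_n^{1/2} \left\{ \frac{n}{m_n} E\left[ \left( \ln\left( \frac{X_t}{b_{m_n}} \right) \right)_+ \right] - \alpha^{-1} \right\} = o(1). \tag{B.2}$$

Now apply Lemma 3 and Theorem 2 to deduce respectively that

$$\big\| A_{n,t} \big\|_2 \le \left\| \ln\left( \frac{X_{(m_n+1)}}{b_{m_n}} \right) \right\|_2 = O(m_n^{-1/2}) \quad \text{and} \quad \|B_n\|_2 = O(m_n^{1/2}/n). \tag{B.3}$$

Similarly, Lemma 3 and the Lemma B.2 moment bounds imply that

$$\big\| Y_{m_n,t} \big\|_2 \le \big\| U_{m_n,t} \big\|_2 + \frac{m_n}{n} \left\| \ln\left( \frac{X_{(m_n+1)}}{b_{m_n}} \right) \right\|_2 = O((m_n/n)^{1/2}). \tag{B.4}$$

Finally, by supposition,

$$\frac{1}{m_n} \sum_{s,t=1}^{n} \big| w_{n,s,t} \big| = o(\gamma_n n/m_n) = o(n^{1/2}). \tag{B.5}$$

Together (B.2)–(B.5), the Minkowski and Cauchy-Schwarz inequalities, and $m_n/n^{1/2} \to \infty$ by supposition give

$$\begin{aligned}
\|R_n\|_1 &= o(n^{1/2}) \times \Big\{ O(m_n^{-1}) + O(m_n/n^2) + O(n^{-1/2}) + O(m_n/n^{3/2}) + O(n^{-1}) \Big\} \\
&= o(n^{1/2}/m_n) + O(m_n/n^{3/2}) + o(1) + O(m_n/n) + O(n^{-1/2}) = o(1).
\end{aligned}$$

∎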

Proof of Lemma B.9. Write $I_{m_n,t} := I_{m_n,t}(u/m_n^{1/2})$, recall $Y_{m_n,t} := U_{m_n,t} - (m_n/n) \ln(X_{(m_n+1)}/b_{m_n})$, and

$$\sigma_{m_n}^2(1,-1) := E\left( \frac{1}{m_n^{1/2}} \sum_{t=1}^{n} \big( U_{m_n,t} - \alpha^{-1} I_{m_n,t} \big) \right)^2 \quad \text{and} \quad \tilde{\sigma}_{m_n}^2 := \frac{1}{m_n} \sum_{s,t=1}^{n} w_{n,s,t} Y_{m_n,s} Y_{m_n,t}.$$

We will prove $|\tilde{\sigma}_{m_n}^2 - E(1/m_n^{1/2} \sum_{t=1}^{n} Y_{m_n,t})^2| = o_p(1)$ and $|E(1/m_n^{1/2} \sum_{t=1}^{n} Y_{m_n,t})^2 - \sigma_{m_n}^2(1,-1)| = o_p(1)$.

Step 1. We will verify Assumptions 1–3 of de Jong and Davidson (2000) (JD) to show

$$\left| \tilde{\sigma}_{m_n}^2 - E\left( \frac{1}{m_n^{1/2}} \sum_{t=1}^{n} Y_{m_n,t} \right)^2 \right| \to 0. \tag{B.6}$$

JD's Assumption 1 holds by the statement of the lemma.

By Lemma B.10, $\{m_n^{-1/2} Y_{m_n,t}, \Im_{n,t}\}$ forms an $L_2$-mixingale array with size 1/2 and constants $c_{n,t} = K n^{-1/2}$. Thus JD's Assumption 2 is satisfied. [Equation (2.6) of de Jong and Davidson (2000) is only sufficient for the mixingale property to hold, but not necessary. By the proof of Lemma B.10 in Hill (2009b), $\{m_n^{-1/2} Y_{m_n,t}\}$ is $L_2$-NED on $\{\Im_{n,t}\}$ with $O((m_n/n)^{1/r})$-constants and $o((m_n/n)^{1/2-1/r} q_n^{-1/2})$-coefficients, and $\{m_n^{-1/2} Y_{m_n,t}, \Im_{n,t}\}$ forms an $L_2$-mixingale sequence with constants and coefficients $c_{n,t} \times \xi_{q_n} = K n^{-1/2} \times o(q_n^{-1/2})$. With these properties in hand, each of de Jong and Davidson's arguments that exploit their (2.6) goes through.]

Finally, JD's Assumption 3 is satisfied by $\gamma_n \max_{1 \le t \le n} c_{n,t}^2 = o(1)$ given $\gamma_n = o(n)$. This proves (B.6).

Step 2. Define $U_{m_n} := m_n^{-1/2} \sum_{t=1}^{n} U_{m_n,t}$, $I_{m_n} := \alpha^{-1} m_n^{-1/2} \sum_{t=1}^{n} I_{m_n,t}$, and $B_{m_n} = m_n^{1/2} \ln(X_{(m_n+1)}/b_{m_n})$. Arguments in Hsing (1991, p. 1553) and the Helly-Bray theorem imply under Assumption B that

$$\begin{aligned}
\left| E\left( \frac{1}{m_n^{1/2}} \sum_{t=1}^{n} Y_{m_n,t} \right)^2 - \sigma_{m_n}^2(1,-1) \right|
&= \Big| E\big( U_{m_n} - B_{m_n} \big)^2 - E\big( U_{m_n} - I_{m_n} \big)^2 \Big| \\
&\le 2 \big\| U_{m_n} \big\|_2 \big\| B_{m_n} - I_{m_n} \big\|_2 + \Big| E\big( B_{m_n} \big)^2 - E\big( I_{m_n} \big)^2 \Big| \\
&= O\big( m_n^{1/2} g(b_{m_n}) \big) = o(1).
\end{aligned} \tag{B.7}$$

Together, (B.6) and (B.7) imply $|\tilde{\sigma}_{m_n}^2 - \sigma_{m_n}^2(1,-1)| = o_p(1)$ as claimed. ∎


APPENDIX C: Symbols

The following table displays the most frequently used symbols and variables in order of appearance, their definitions, and the section(s) in which they first appear. If the symbol or variable first appears in a numbered equation, definition, etc., that information is also given. Consult the first appearance for a complete definition.

| Symbol | Definition | Section §, (eqn.), etc. |
|---|---|---|
| $F_t(x)$, $\bar{F}_t(x)$ | $P(X_t \le x)$, $P(X_t > x)$ | §1 |
| $L(x)$ | slowly varying component in $\bar{F}_t(x) = x^{-\alpha} L(x)$ | §1, (2) |
| $X_{(i)}$ | sample order statistic: $X_{(1)} \ge X_{(2)} \ge \cdots \ge X_{(n)}$ | §1 |
| $\{m_n\}$ | sequence of integers: $m_n \to \infty$, $m_n = o(n)$ | §1 |
| $b_{m_n}$ | threshold sequence: $(n/m_n) P(X_t > b_{m_n}) \to 1$ | §2, (3) |
| $\{E_{n,t}\}$ | stochastic triangular array, mixing functional of $\varepsilon_t$ | §2.1, §2.2 |
| $\{\Im_{n,t}\}$ | triangular $\sigma$-array induced by $\{E_{n,t}\}$ | §2.1 |
| $\{q_n\}$ | sequence of displacements, $q_n \to \infty$, $q_n = o(n)$ | §2.1 |
| $e_{n,t}(u)$, $\varphi_{q_n}$ | E-MIXL constants and coefficients | §2.1, Defn: E-MIXL |
| $f_{n,t}(u)$, $\psi_{q_n}$ | E-NED constants and coefficients | §2.1, Defn: E-NED |
| $\varepsilon_t$ | E-NED base | §2.2 |
| $G_t$ | $\sigma$-field induced by $\varepsilon_t$ | §2.2 |
| $\varepsilon_{n,q_n}$, ${}_{n,q_n}$ | F-strong and F-uniform mixing coefficients | §2.2 |
| $g$ | slow variation with remainder component of $L(x)$ | §3, Assumption B |
| $\{m_n(\phi)\}$ | sequence of Lipschitz integer functions | §3.1, (4)-(5) |
| $h_n$ | $O(\inf_{\phi \in \Phi} m_n(\phi))$-sequence for Lipschitz $m_n(\phi)$ | §3.1, (5) |
| $U_{m_n,t}$ | $(\ln(X_t/b_{m_n}))_+ - E[(\ln(X_t/b_{m_n}))_+]$ | §3.1, (6) |
| $I_{m_n,t}(u)$ | $I(X_t > b_{m_n} e^u) - E[I(X_t > b_{m_n} e^u)]$ | §3.1, (6) |
| $T_{m_n,t}(\omega,u)$ | $1/m_n^{1/2}\,[\omega_1 U_{m_n,t} - \omega_2 \alpha^{-1} I_{m_n,t}(u)]$ | §3.2, (8) |
| $\sigma_{m_n}^2(\omega)$ | $\sigma_{m_n}^2(\omega_1,\omega_2) := E(\sum_{t=1}^{n} T_{m_n,t}(\omega,u/m_n^{1/2}))^2$ | §3.2, (9) |
| $d_{n,t}$ | $L_2$-NED constants for $\{T_{m_n,t}(\omega,u/m_n^{1/2})\}$ | §3.2, Lemma 2 |
| $\psi^*_{n,q_n}$ | $L_2$-NED coefficients for $\{T_{m_n,t}(\omega,u/m_n^{1/2})\}$ | §3.2, Lemma 2 |
| $c_{n,t}$ | $L_2$-mixingale constants for $\{T_{m_n,t}(\omega,u/m_n^{1/2})\}$ | §3.2, Lemma 2 |
| $\sigma_{m_n}^2$ | $E(m_n^{1/2}(\hat{\alpha}_{m_n}^{-1} - \alpha^{-1}))^2$ | §3.2, Theorem 2 |
| $w_{n,s,t}$ | $w(\lvert s - t \rvert/\gamma_n)$ | §4 |
| $e^*_{n,t}$, $e_{n,t}(u)$ | $L_2$-mixingale constants of $\{U_{m_n,t}\}$ and $\{I_{m_n,t}(u)\}$ | Proof Lemma 1 |
| $f^*_{n,t}$, $f^*_{n,t}(u)$ | $L_2$-NED constants of $\{U_{m_n,t}\}$ and $\{I_{m_n,t}(u)\}$ | Proof Lemma 2 |
| $T_{m_n,t}$ | $T_{m_n,t}(\omega,u/m_n^{1/2})$ | Proof Lemma 2 |
| $k_n$, $l_n$, $r_n$ | integer sequences for Bernstein blocks | Proof Lemma 3, (A.3) |
| $\{Z_{n,i}\}_{i=1}^{r_n}$ | Bernstein blocks $\sum_{t=(i-1)k_n+l_n+1}^{i k_n} T_{m_n,t}(\omega,u/m_n^{1/2})$ | Proof Lemma 3, (A.4) |
| $F_{n,i}$ | $\sigma(\{E_{n,\tau} : \tau \le i k_n\})$, $i = 1, \ldots, r_n$ | Proof Lemma 3 |
