
Weighted Sums and Residual Empirical Processes for Time-varying Processes

¹Gabriel Chandler∗ and ²Wolfgang Polonik

¹Department of Mathematics, Pomona College, Claremont, CA 91711

²Department of Statistics, University of California, Davis, CA 95616

E-mail: [email protected] and [email protected]

March 6, 2011

Abstract

Function indexed weighted sums and sequential residual empirical processes based on time-varying AR-processes are studied. It is shown that under appropriate assumptions non-parametric estimation of the parameter functions does not influence the asymptotic distribution of the residual empirical process. An exponential inequality for weighted sums of time-varying processes provides the basis for a weak convergence result of weighted sum processes. A motivation for studying these two types of processes is provided by the fact that, as shown in this paper, large sample results for both processes can be utilized to derive the asymptotic distribution of a test for modality of the variance function that is studied by the authors in an accompanying paper.

Keywords: Cumulants, empirical process theory, exponential inequality, locally stationary processes, nonstationary processes, investigating unimodality.


1 Introduction

Consider the following time-varying AR-model of the form

\[
Y_t - \sum_{k=1}^{p} \theta_k\!\left(\tfrac{t}{T}\right) Y_{t-k} = \sigma\!\left(\tfrac{t}{T}\right)\varepsilon_t, \qquad t = 1, \dots, T, \tag{1}
\]

where θk are the autoregressive parameter functions, p is the order of the model, σ is a

function controlling the volatility, and εt ∼ (0, 1) i.i.d. Following Dahlhaus (1997), time is

rescaled to the interval [0, 1] in order to make a large sample analysis feasible. Observe that

this in particular means that Yt = Yt,T satisfying (1) in fact forms a triangular array.
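To make the triangular-array structure concrete, the following is a minimal simulation sketch of model (1) with rescaled time u = t/T. The function name `simulate_tvar`, the Gaussian innovations, the zero start values and the burn-in with the parameter functions frozen at u = 0 are illustrative assumptions, not part of the paper.

```python
import numpy as np

def simulate_tvar(T, thetas, sigma, rng, burn_in=200):
    """Simulate Y_t - sum_k theta_k(t/T) Y_{t-k} = sigma(t/T) eps_t, t = 1..T.

    thetas: list of p callables on [0, 1]; sigma: callable on [0, 1].
    Assumption: innovations are standard normal (the model only requires
    mean 0 and variance 1). During burn-in the functions are frozen at u = 0.
    """
    p = len(thetas)
    eps = rng.standard_normal(T + burn_in)
    y = np.zeros(T + burn_in + p)           # p leading zeros serve as start values
    for i in range(T + burn_in):
        u = max(i - burn_in + 1, 0) / T     # rescaled time t/T (0 during burn-in)
        past = y[i:i + p][::-1]             # (Y_{t-1}, ..., Y_{t-p})
        coeffs = np.array([th(u) for th in thetas])
        y[i + p] = coeffs @ past + sigma(u) * eps[i]
    return y[-T:], eps[-T:]                 # observations and innovations for t = 1..T
```

Because the parameter functions are evaluated at t/T, changing T changes the whole path and not only its length, which is the triangular-array feature noted above.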

The consideration of non-stationary time series models goes back to Priestley (1965), who considered evolutionary spectra, i.e. spectra of time series evolving in time. The time-varying AR-process has always been an important special case, either in more methodological and theoretical considerations of non-stationary processes, or in applications such as signal processing and (financial) econometrics, e.g. Subba Rao (1970), Grenier (1983), Hall et al. (1983), Rajan and Rayner (1996), Girault et al. (1998), Eom (1999), Drees and Starica (2002), Fryzlewicz et al. (2006), Orbe et al. (2005), and Chandler and Polonik (2006). Dahlhaus (1997) advanced the formal analysis of time-varying processes by introducing the notion of a locally stationary process. This is a time-varying process with time being rescaled to [0, 1] that satisfies certain regularity assumptions (see (9) - (11) below for the case of a time-varying AR-process). We would like to point out, however, that in this paper local stationarity is only used to calculate the asymptotic covariance function in Theorem 2. All the other results hold under weaker assumptions.

The interest of this paper is the investigation of the large sample behavior of two types of processes, residual sequential empirical processes and weighted sum processes, based on observations from the non-stationary process satisfying (1). One motivation for studying both of these processes is that their large sample behavior enters the derivation of the limit distribution of a test statistic for modality of the variance function in model (1). This testing procedure is discussed in an accompanying paper, Chandler and Polonik (2010); see section 4.

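Residual sequential empirical processes are defined as follows. Let $\mathbf{Y}_t = (Y_{t-1}, \dots, Y_{t-p})'$. Given an estimator $\hat\theta$ of $\theta = (\theta_1, \dots, \theta_p)'$ and corresponding residuals $\hat\eta_t = Y_t - \hat\theta\big(\tfrac{t}{T}\big)'\,\mathbf{Y}_t$, we estimate the distribution function of the innovations $\eta_t = \sigma\big(\tfrac{t}{T}\big)\varepsilon_t$ by the empirical distribution function of the residuals

\[
\hat H_T(z) = \frac{1}{T}\sum_{t=1}^{T} \mathbf{1}\{\hat\eta_t \le z\}, \qquad z \in \mathbb{R}.
\]

Observe that we can write $\hat H_T(z) = \frac{1}{T}\sum_{t=1}^{T} \mathbf{1}\big\{\sigma\big(\tfrac{t}{T}\big)\varepsilon_t \le (\hat\theta - \theta)'\big(\tfrac{t}{T}\big)\mathbf{Y}_t + z\big\}$. This motivates the consideration of the following type of process. Let $[a, b] \subset [0, 1]$, and let $Y_s$, $s = 1, \dots, n = \lfloor bT \rfloor - \lceil aT \rceil + 1$, denote the observations $Y_t$ with $\tfrac{t}{T} \in [a, b]$. Define

\[
\nu_n(\alpha, \mathbf{g}, z) = \frac{1}{\sqrt{n}} \sum_{s=1}^{\lfloor \alpha n \rfloor} \Big[ \mathbf{1}\big\{\sigma\big(a + \tfrac{s}{T}\big)\varepsilon_s \le \mathbf{g}'\big(a + \tfrac{s}{T}\big)\mathbf{Y}_s + z\big\} - F_{a + \frac{s}{T}}\big(\mathbf{g}'\big(a + \tfrac{s}{T}\big)\mathbf{Y}_s + z\big) \Big], \tag{2}
\]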

where $\alpha \in [0, 1]$, $z \in \mathbb{R}$, $\mathbf{g} : [a, b] \to \mathbb{R}^p$, $\mathbf{g} \in \mathcal{G}^p = \{\mathbf{g} = (g_1, \dots, g_p)',\ g_i \in \mathcal{G}\}$ with $\mathcal{G}$ an appropriate function class such that $\hat\theta - \theta \in \mathcal{G}^p$ (see below), and $F_u(z)$ denotes the distribution function of $\sigma(u)\varepsilon_t$, $u \in [0, 1]$. The reason for considering a subinterval $[a, b] \subset [0, 1]$ is again our application of the results to testing for modality of the variance function. Observe that with $\mathbf{0} = (0, \dots, 0)' \in \mathcal{G}^p$ denoting the p-vector of null-functions, $\nu_n(\alpha, \mathbf{0}, z)$ equals the empirical process based on the innovations rather than the residuals. The basic form of $\nu_n$ is standard, but in contrast to most of the related literature, we are considering a non-parametric index class $\mathcal{G}$, and of course $Y_t$ is non-stationary here. Letting $F_n(z)$ denote the empirical distribution function of the innovations $\varepsilon_s$, our results will in particular imply that under appropriate assumptions we have $\sup_{z \in \mathbb{R}} |\hat H_n(z) - F_n(z)| = o_P(T^{-1/2})$ even though non-parametric estimation of the parameter functions $\theta_k$ is involved.
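As a small illustration of the objects just introduced, the sketch below computes residuals from given estimates of the parameter functions and evaluates an empirical distribution function. The names `tvar_residuals` and `edf`, and the convention of passing the estimates as a (T, p) array with row t-1 holding the estimate at rescaled time t/T, are assumptions made for this example only.

```python
import numpy as np

def tvar_residuals(y, theta_hat):
    """Residuals eta_hat_t = Y_t - theta_hat(t/T)' (Y_{t-1}, ..., Y_{t-p}).

    y: array of length T; theta_hat: array of shape (T, p) whose row t-1
    holds the estimate at rescaled time t/T (hypothetical layout). The
    first p observations are dropped because their lags are not observed.
    """
    T, p = theta_hat.shape
    # Column k-1 holds the lag-k values Y_{t-k} for t = p+1, ..., T.
    lagged = np.column_stack([y[p - k:T - k] for k in range(1, p + 1)])
    return y[p:] - np.sum(theta_hat[p:] * lagged, axis=1)

def edf(sample, z):
    """Empirical distribution function of `sample` evaluated at the points z."""
    return np.searchsorted(np.sort(sample), z, side="right") / len(sample)
```

Evaluating `edf` once on the residuals and once on the (in practice unobserved) innovations of a simulated path gives a numerical feel for how close the two empirical distribution functions are.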

The second type of process considered here are weighted sum processes of the form

\[
Z_n(h) = \frac{1}{\sqrt{n}} \sum_{s=1}^{n} h\big(\tfrac{s}{n}\big)\, Y_s,
\]

$h \in \mathcal{H}$, where $\mathcal{H}$ is an appropriate function class. Such processes can be considered as generalizations of partial sum processes. Large sample behavior of $Z_n$, including exponential inequalities and weak convergence, is considered below.

The obtained results will be applied to derive the large sample distribution of a test for

modality of the variance function σ2(·) that is proposed in the accompanying paper Chandler

and Polonik (2010). In fact, the corresponding test statistic essentially is a functional of $\alpha \mapsto \nu_n(\alpha, \hat\theta - \theta, \hat q_\gamma)$ plus a term that can be approximated by a weighted sum, where $\hat q_\gamma$ is an estimate of an appropriate γ-quantile. It turns out that the test statistic is asymptotically

distribution free, again despite the non-parametric estimation of the parameter functions.

(For more details see below.)

The outline of the paper is as follows. In sections 2 and 3 we analyze the large sample

behavior of the function indexed residual empirical process and weighted sums, respectively,

under the time varying model (1). Section 4 applies some of the results from the preceding

sections to derive a crucial approximation result for a quantity based on the residual empirical

process. This approximation provides a crucial ingredient in Chandler and Polonik (2010) to derive the asymptotically distribution-free limit of a test statistic for testing modality of the variance function in model (1). Proofs are deferred to section 5.

Remark on measurability. Suprema of function indexed processes will enter the theo-

retical results below. We assume throughout the paper that such suprema are measurable.

Otherwise statements “in probability” have to be replaced by “outer” or “inner” probability,

respectively.

2 Residual empirical processes under time varying AR-models

There exists a substantial body of work on residual empirical processes indexed by a finite dimensional parameter. For the time series setting see for instance Horváth et al. (2001), Stute (2001), Koul (2002), Koul and Ling (2006), Laïb et al. (2008), Müller et al. (2009), and references therein. There is not much work available on residual empirical processes for models involving infinite dimensional parameter spaces. Except for Müller et al. (2009), who consider the estimation of the innovation distribution in a nonparametric autoregression model, we are only aware of Akritas and van Keilegom (2001) and Cheng (2005). Akritas and van Keilegom consider the estimation of the error distribution in a heteroscedastic non-parametric regression model, and Cheng (2005) estimates the distribution and density function of the errors in a nonparametric (homoscedastic) regression model utilizing a sample splitting technique. Here we are considering function indexed residual empirical processes based on non-stationary time series.

In order to formulate one of our main results for the residual empirical process $\nu_n(\alpha, \mathbf{g}, z)$ defined in (2) above, we first introduce some notation. For a function $h : [a, b] \to \mathbb{R}$ we denote

\[
\|h\|_\infty := \sup_{u \in [a, b]} |h(u)| \qquad\text{and}\qquad \|h\|_n^2 := \frac{1}{n} \sum_{t \in [aT, bT]} h^2\big(\tfrac{t}{T}\big).
\]

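Let $\mathcal{H}$ denote a class of functions defined on $[a, b]$. For a given $\delta > 0$, let $N(\delta, \mathcal{H})$ denote the minimal number $N$ of $\|\cdot\|_n$-balls of radius less than or equal to $\delta$ that are needed to cover $\mathcal{H}$, i.e., there exist functions $g_k$, $k = 1, \dots, N$, such that the balls $A_k = \{h \in \mathcal{H} : \|g_k - h\|_n \le \delta\}$ satisfy $\mathcal{H} \subset \bigcup_{k=1}^{N} A_k$. Then $\log N(\delta, \mathcal{H})$ is called the metric entropy of $\mathcal{H}$ with respect to $\|\cdot\|_n$. If the balls $A_k$ are replaced by brackets $B_k = \{h \in \mathcal{H} : \underline{g}_k \le h \le \overline{g}_k\}$ for pairs of functions $\underline{g}_k \le \overline{g}_k$, $k = 1, \dots, N$, with $\|\overline{g}_k - \underline{g}_k\|_n \le \delta$, then the minimal number $N = N_B(\delta, \mathcal{H})$ of such brackets with $\mathcal{H} \subset \bigcup_{k=1}^{N} B_k$ is called a bracketing covering number, and $\log N_B(\delta, \mathcal{H})$ is called the metric entropy with bracketing of $\mathcal{H}$ with respect to $\|\cdot\|_n$.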

Assumptions. (i) The process $Y_t = Y_{t,T}$ has an MA-type representation

\[
Y_{t,T} = \sum_{j=0}^{\infty} a_{t,T}(j)\,\varepsilon_{t-j},
\]

where $\varepsilon_t \sim_{\text{i.i.d.}} (0, 1)$. The distribution $F$ of $\varepsilon_t$ has a strictly positive Lipschitz continuous Lebesgue density $f$. The function $\sigma(u)$ in (1) is of bounded variation with $0 < m_* < \sigma(u) < m^* < \infty$ for all $u$.

(ii) The coefficients $a_{t,T}(\cdot)$ of the MA-type representation of $Y_{t,T}$ given in (i) satisfy

\[
\sup_{1 \le t \le T} |a_{t,T}(j)| \le \frac{K}{\ell(j)}
\]

for some $K > 0$, where for some $\kappa > 0$ we have $\ell(j) = j\,(\log j)^{1+\kappa}$ for $j > 1$ and $\ell(j) = 1$ for $j = 0, 1$.

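(iii) There exists a class $\mathcal{G}$ with

\[
\hat\theta_k(\cdot) - \theta_k(\cdot) \in \mathcal{G}, \qquad k = 1, \dots, p,
\]

such that $\sup_{g \in \mathcal{G}} \|g\|_\infty < \infty$ and, for some $\gamma \in [0, 1)$ and $C, c > 0$, and for all $\delta > \tfrac{c}{n}$,

\[
\log N_B(\delta, \mathcal{G}) \le
\begin{cases}
C\,\delta^{-\gamma}, & \text{for } 0 < \gamma < 1, \\
C \log(\delta^{-1}), & \text{for } \gamma = 0.
\end{cases}
\]

(iv) For $k = 1, \dots, p$ we have $\frac{1}{T} \sum_{t=1}^{T} \big(\hat\theta_k\big(\tfrac{t}{T}\big) - \theta_k\big(\tfrac{t}{T}\big)\big)^2 = O_P(m_n^{-1})$ with $m_n \to \infty$ as $T \to \infty$.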
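The weight sequence ℓ(j) of assumption (ii) is chosen so that the bounds K/ℓ(j) are summable, a fact used repeatedly in the proofs. The following sketch evaluates ℓ(j) and partial sums of 1/ℓ(j); the function names `ell` and `partial_sum` are hypothetical and only serve this illustration.

```python
import math

def ell(j, kappa):
    """ell(j) from assumption (ii): 1 for j in {0, 1}, j * (log j)^(1 + kappa) for j > 1."""
    if j <= 1:
        return 1.0
    return j * math.log(j) ** (1.0 + kappa)

def partial_sum(J, kappa):
    """Partial sum of 1 / ell(j) over j = 0, ..., J."""
    return sum(1.0 / ell(j, kappa) for j in range(J + 1))
```

By the integral test the tail of the series beyond J is at most 1/(κ (log J)^κ), so the partial sums stabilise, although slowly for small κ.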

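Theorem 1 Suppose that assumptions (i) and (ii) hold, and that for some $c > 0$ and $n$ sufficiently large $\mathcal{G}$ satisfies

\[
\int_{c/n}^{1} \sqrt{\log N_B(u^2, \mathcal{G})}\, du < \infty.
\]

Then we have for any $L > 0$ and $\delta_n \to 0$ as $n \to \infty$ that

\[
\sup_{\substack{\alpha \in [0,1],\ z \in [-L, L],\ \mathbf{g} \in \mathcal{G}^p \\ \|\mathbf{g}\|_n \le \delta_n}} |\nu_n(\alpha, z, \mathbf{g}) - \nu_n(\alpha, z, \mathbf{0})| = o_P(1). \tag{3}
\]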

The above type of result is typical for work on residual empirical processes (e.g. (8.2.32) in Koul 2002). However, as mentioned above, compared to the existing work we are dealing with non-stationary observations, and are considering an index set G that is a non-parametric class of functions. Also keep in mind that we are in fact considering triangular arrays (recall that Yt = Yt,T).

Discussion of the assumptions. Assumptions (i) and (ii) have been used in the literature

on locally stationary processes (e.g. Dahlhaus and Polonik, 2006, 2009). It is shown in

Dahlhaus and Polonik (2006) that (i) and (ii) (and also the additional assumptions (9) -

(11) needed for Theorem 3 below) hold for time-varying AR-processes (1) and, more generally,

for time-varying ARMA-models as long as the zeros of the corresponding AR-polynomials

are bounded away from the unit circle (uniformly in the rescaled time u) and the parameter

functions are of bounded variation.

Assumption (iii) controls the complexity of the class $\mathcal{G}$. In fact this assumption implies that for some $c > 0$ we have

\[
\int_{c/n}^{1} \sqrt{\log N_B(u^2, \mathcal{G})}\, du < \infty \qquad\text{and}\qquad \int_{c/n}^{1} \log N(u, \mathcal{G})\, du < \infty.
\]

Both of these conditions are used below. Notice that the standard condition on the covering integral is $\int_{0}^{1} \sqrt{\log N_B(u, \mathcal{G})}\, du < \infty$ (or similarly without bracketing). In contrast to that, our first condition uses $N_B(u^2, \mathcal{G})$ (rather than $N_B(u, \mathcal{G})$) in the integrand, and the second does not have a square root. This makes both of our conditions stronger than the standard assumption. The reason for this is that the exponential inequality underlying the derivations of our results is not of sub-Gaussian type (see Lemma 3).

A class of non-parametric estimators satisfying conditions (iii) and (iv) is given by the wavelet estimators of Dahlhaus et al. (1999). These estimators lie in the Besov smoothness class $B^s_{p,q}(C)$ where the smoothness parameters satisfy the condition $s + \frac{1}{2} - \frac{1}{\max(2, p)} > 1$. The constant $C > 0$ is a uniform bound on the (Besov) norm of the functions in the class. Dahlhaus et al. derive conditions under which their estimators converge at rate $\big(\frac{\log n}{n}\big)^{s/(2s+1)}$ in the $L_2$-norm. For $s \ge 1$ the functions in $B^s_{p,q}(C)$ have uniformly bounded total variation. Assuming that the model parameter functions also possess this property, the rate of convergence in the $\|\cdot\|_n$-norm is the same as that in the $L_2$-norm, because in this case the error in approximating the integral by the average over equidistant points is of order $O(n^{-1})$. Consequently, in this case we have $m_n^{-1} = \big(\frac{\log n}{n}\big)^{s/(2s+1)}$. In order to verify the condition on the bracketing covering numbers from (iii), we use Nickl and Pötscher (2007). Their Corollary 1, applied with $s = 2$, $p = q = 2$, implies that the bracketing entropy with respect to the $L_2$-norm can be bounded by $C\,\delta^{-1/2}$. (When applying their Corollary to our situation choose, in their notation, β = 0, µ = U[0, 1], r = 2 and γ = 2, say.)

The proof of Theorem 1 rests on the following lemma which is of independent interest. It is

modeled after similar results for empirical processes (see van de Geer, 2000, Theorem 5.11).

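Let $\mathcal{H} = [0, 1] \times [-L, L] \times \mathcal{G}^p$ denote the index space of the process $\nu_n$ where $L > 0$ is a constant. Define a metric on $\mathcal{H}$ as

\[
d(h_1, h_2) = d\big((\alpha_1, z_1, \mathbf{g}_1), (\alpha_2, z_2, \mathbf{g}_2)\big) = |\alpha_1 - \alpha_2| + |z_1 - z_2| + \sum_{k=1}^{p} \|g_{1,k} - g_{2,k}\|_n. \tag{4}
\]

Let $D_B(\varepsilon, \mathcal{H})$ denote the bracketing covering number of $\mathcal{H}$ with respect to the metric $d$.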

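Lemma 1 For $C_0 > 0$ let $F_n = \big\{ \frac{1}{n} \sum_{s=1-p}^{n} Y_s^2 \le C_0^2 \big\}$ and define $K^* = 1 + (1 + pC_0)\,\frac{\|f\|_\infty}{m_*}$. Suppose that $\mathcal{H}$ is totally bounded with respect to $d$, that assumptions (i) and (ii) hold, and that for $C_1 > 0$

\[
\eta \ge \frac{2^6 K^*}{\sqrt{n}}, \tag{5}
\]

\[
\eta \le \tfrac{1}{2} K^* \sqrt{n}\, (\tau^2 \wedge \tau), \tag{6}
\]

\[
\eta \ge C_1 \Big( \int_{\eta / (2^8 K^* \sqrt{n})}^{\tau} \sqrt{\log D_B(u^2, \mathcal{H})}\, du \ \vee\ \tau \Big). \tag{7}
\]

Then, for $C_1 \ge 2^6 \sqrt{10}\, K^*$ we have

\[
P\Big[ \sup_{d(h_1, h_2) \le \tau^2} |\nu_n(h_1) - \nu_n(h_2)| \ge \eta,\ F_n \Big] \le \Big( \frac{2^6 (2^6 + 1) K^*}{C_1^2} + 2 \Big) \exp\Big( -\frac{\eta^2}{2^6 (2^6 + 1) K^* \tau^2} \Big),
\]

where the supremum is extended over $h_1, h_2 \in \mathcal{H}$.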

3 Weighted sums under local stationarity

As discussed above, the second type of process of importance in our context are weighted

partial sums of locally stationary processes given by

\[
Z_n(h) = \frac{1}{\sqrt{n}} \sum_{s=1}^{n} h\big(a + \tfrac{s}{T}\big)\, Y_s, \qquad h \in \mathcal{H}. \tag{8}
\]

In the iid case weighted sums have received some attention in the literature. For functional

central limit theorems and exponential inequalities see for instance Alexander and Pyke

(1986), van de Geer (2000), and references therein.
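The weighted sum (8) and its partial-sum version, in which the sum stops at ⌊αn⌋, are straightforward to evaluate on a data window. The sketch below is only an illustration; the function name `weighted_sum` and its argument layout are assumptions, with `y_window` standing for the relabeled observations Y_1, ..., Y_n.

```python
import numpy as np

def weighted_sum(y_window, h, a, T, alpha=1.0):
    """Z_n(alpha, h) = n^{-1/2} * sum_{s=1}^{floor(alpha n)} h(a + s/T) * Y_s.

    y_window: observations Y_1..Y_n with t/T in [a, b]; h: vectorised weight
    function on [a, b]; alpha = 1 gives the full weighted sum Z_n(h) of (8).
    """
    n = len(y_window)
    m = int(np.floor(alpha * n))          # number of terms in the partial sum
    s = np.arange(1, m + 1)
    return float(np.sum(h(a + s / T) * y_window[:m]) / np.sqrt(n))
```

With h identically equal to one, the function reduces to the usual normalised partial sum process, which is the sense in which weighted sums generalise partial sums.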

We will show below that under appropriate assumptions, $Z_n(h)$ converges weakly to a Gaussian process. In order to calculate the covariance function of the limit we assume that the process $Y_t$ is locally stationary as in Dahlhaus and Polonik (2009). This means that we assume the existence of functions $a(\cdot, j) : (0, 1] \to \mathbb{R}$ with

\[
\sup_{u} |a(u, j)| \le \frac{K}{\ell(j)}, \tag{9}
\]

\[
\sup_{j} \sum_{t=1}^{n} \big| a_{t,T}(j) - a\big(\tfrac{t}{T}, j\big) \big| \le K, \tag{10}
\]

\[
TV(a(\cdot, j)) \le \frac{K}{\ell(j)}, \tag{11}
\]

where for a function $g : [0, 1] \to \mathbb{R}$ we denote by $TV(g)$ the total variation of $g$ on $[0, 1]$.

Further we define the time varying spectral density as the function

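\[
f(u, \lambda) := \frac{1}{2\pi} |A(u, \lambda)|^2
\]

with

\[
A(u, \lambda) := \sum_{j=-\infty}^{\infty} a(u, j) \exp(-i\lambda j),
\]

and

\[
c(u, k) := \int_{-\pi}^{\pi} f(u, \lambda) \exp(i\lambda k)\, d\lambda = \sum_{j=-\infty}^{\infty} a(u, k + j)\, a(u, j) \tag{12}
\]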

is the time varying covariance of lag k at rescaled time u ∈ [0, 1].
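For intuition, the quantities in (12) can be evaluated in closed form for a simple example. The sketch below assumes an AR(1) process with coefficients frozen at a fixed rescaled time u, so that a(u, j) = σ(u) θ(u)^j for j ≥ 0 and a(u, j) = 0 for j < 0; this frozen-coefficient example and the function names are illustrative assumptions, not taken from the paper. The sum of c(u, k) over all lags k equals the squared sum of the coefficients a(u, j).

```python
import numpy as np

def ma_coefficients(theta, sigma, J=500):
    """a(u, j) = sigma * theta**j for j = 0..J (frozen AR(1) at fixed u; zero for j < 0)."""
    return sigma * theta ** np.arange(J + 1)

def local_covariance(a, k):
    """c(u, k) = sum_j a(u, k + j) * a(u, j), truncated to the available lags."""
    k = abs(k)
    return float(np.dot(a[k:], a[:len(a) - k]))

def long_run_variance(a):
    """Sum over all lags k of c(u, k), which equals (sum_j a(u, j))**2."""
    return float(np.sum(a) ** 2)
```

For θ(u) = 0.5 and σ(u) = 2 this gives c(u, 0) = 16/3, c(u, 1) = 8/3 and a lag sum of σ²/(1 − θ)² = 16, up to truncation error.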

Theorem 2 Let $\mathcal{H}$ denote a class of uniformly bounded, real valued functions of bounded variation defined on $[a, b]$. Assume further that for some $c > 0$,

\[
\int_{c/n}^{1} \log N(u, \mathcal{H})\, du < \infty.
\]

Then we have under assumptions (i), (ii) and (9) - (11) that as $n \to \infty$ the process $Z_n(h)$, $h \in \mathcal{H}$, converges weakly to a tight, mean zero Gaussian process $\{G(h),\ h \in \mathcal{H}\}$ with variance-covariance function $C(h_1, h_2) = \frac{1}{b - a} \int_a^b h_1(u)\, h_2(u)\, S(u)\, du$, where $S(u) = \sum_{k=-\infty}^{\infty} c(u, k)$.

Remarks. a) Here weak convergence is meant in the sense of Hoffmann-Jørgensen; see van der Vaart and Wellner (1996) for more details.

b) Partial weighted sums of the form

\[
Z_n(\alpha, h) = \frac{1}{\sqrt{n}} \sum_{s=1}^{\lfloor \alpha n \rfloor} h\big(a + \tfrac{s}{T}\big)\, Y_s
\]

are in fact a special case of the processes considered in the theorem. This can be seen by writing $Z_n(\alpha, h) = \frac{1}{\sqrt{n}} \sum_{s=1}^{n} h_\alpha\big(a + \tfrac{s}{T}\big) Y_s$ with $h_\alpha(u) = \mathbf{1}(a \le u \le a + \alpha(b - a))\, h(u)$. In other words, the partial weighted sums are weighted sums indexed by a slightly modified class of functions $\tilde{\mathcal{H}} = \{h_\alpha = \mathbf{1}(a \le u \le a + \alpha(b - a))\, h(u),\ \alpha \in [0, 1],\ h \in \mathcal{H}\}$. If the class $\mathcal{H}$ satisfies the assumptions on the covering integral from the above theorem, then so does $\tilde{\mathcal{H}}$. The limit covariance can then be written as $C(h_{1,\alpha}, h_{2,\beta}) = \frac{1}{b - a} \int_a^{a + (\alpha \wedge \beta)(b - a)} h_1(u)\, h_2(u)\, S(u)\, du$.

c) Assumptions (9) - (11) are only used for calculating the covariance function of the limit process.

The main ingredients to the proof of this theorem are presented in the following results, the

proofs of which are given in the appendix. These results are of independent interest.

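Theorem 3 Let $\{Y_t,\ t = 1, \dots, T\}$ satisfy assumptions (i) and (ii) and let $\mathcal{H} = \{h : [a, b] \to \mathbb{R}\}$ be totally bounded with respect to $\|\cdot\|_n$. Further assume that there exists a constant $C > 0$ such that for all $k = 1, 2, \dots$ we have $E|\varepsilon_s|^k \le C^k$, and let $F_n = \big\{ \frac{1}{n} \sum_{s=1}^{n} Y_s^2 \le M^2 \big\}$ where $M > 0$. There exist constants $c_0, c_1, c_2 > 0$ such that for all $\eta > 0$ satisfying

\[
\eta < 16 M \sqrt{n}\, \tau \tag{13}
\]

and

\[
\eta > c_0 \Big( \int_{\eta / (8 M \sqrt{n})}^{\tau} \log N(u, \mathcal{H})\, du \ \vee\ \tau \Big) \tag{14}
\]

we have

\[
P\Big[ \sup_{h \in \mathcal{H},\ \|h\|_n \le \tau} |Z_n(h)| > \eta,\ F_n \Big] \le c_1 \exp\Big\{ -\frac{\eta}{c_2\, \tau} \Big\}.
\]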

The second result of importance for the proof of Theorem 2 deals with cumulants. For

random variables X1, . . . , Xk we denote by cum(X1, . . . , Xk) their joint cumulant, and if

Xi = X for all i = 1, . . . , k, then cum(X1, . . . , Xk) = cum(X, . . . , X) = cumk(X), the k-th

order cumulant of X.

Lemma 2 Let $\{Y_t,\ t = 1, \dots, T\}$ have an MA-type representation as given in assumption (i) with coefficients satisfying assumption (ii). For $j = 1, 2, \dots$ let $h_j$ be functions defined on $[a, b]$ with $\|h_j\|_n < \infty$. Then there exists a constant $1 \le K_0 < \infty$ such that for all $k \ge 1$,

\[
\big| \mathrm{cum}(Z_n(h_1), \dots, Z_n(h_k)) \big| \le K_0^{k-1}\, \big| \mathrm{cum}_k(\varepsilon_1) \big| \prod_{j=1}^{k} \|h_j\|_n.
\]

If, in addition, $\|h_j\|_\infty \le M < \infty$, $j = 1, \dots, k$, then for $k \ge 3$,

\[
\big| \mathrm{cum}(Z_n(h_1), \dots, Z_n(h_k)) \big| \le (K_0 M)^{k-2}\, n^{-\frac{k-2}{2}}.
\]

The behavior of the cumulants given in the above lemma is needed for the following crucial

exponential inequality.

Lemma 3 Suppose the assumptions of Lemma 2 hold. Let $h$ be a function with $\|h\|_n < \infty$. Assume that there exists a constant $C > 0$ such that for all $m = 1, 2, \dots$ we have $E|\varepsilon_s|^m \le C^m$. Then there exist constants $c_1, c_2 > 0$ such that for any $\eta > 0$ we have

\[
P\big[ |Z_n(h)| > \eta \big] \le c_1 \exp\Big\{ -\frac{\eta}{c_2\, \|h\|_n} \Big\}. \tag{15}
\]

4 An application: Asymptotic distribution of a test for modality of the variance function

In an accompanying paper, Chandler and Polonik (2010), we propose a test for modality of the variance function in model (1). The test statistic is essentially given by $T_n = \sup_{\alpha \in [0, 1]} \big( \hat G_{n,\gamma}(\alpha) - \alpha\gamma \big)$, where

\[
\hat G_{n,\gamma}(\alpha) = \frac{1}{n} \sum_{t = \lceil aT \rceil}^{\lceil aT \rceil + \lfloor \alpha n \rfloor - 1} \mathbf{1}\big( \hat\eta_t^{\,2} \ge \hat q_\gamma^{\,2} \big), \qquad \alpha \in [0, 1],
\]

and $\hat q_\gamma^{\,2}$ is the empirical quantile of the squared residuals on the interval $[a, b] \subset [0, 1]$ under consideration. Notice that $n\, \hat G_{n,\gamma}(\alpha)$ counts the number of large residuals within the first (100 × α)% of the interval $[a, b]$. If the variance function is constant, then, since one has a total of $\lfloor n\gamma \rfloor$ large residuals, the expected value of $n\, \hat G_{n,\gamma}(\alpha)$ approximately equals $\alpha n \gamma$.

This motivates the form of Tn. We assume the interval [a, b] to be fixed. For conducting

the test for modality, using an appropriate interval is crucial, and Chandler and Polonik

(2010) provide an estimator for the interval of interest and show that this estimation does

not influence the asymptotic limit of the test statistic.
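The statistic described above can be computed directly from the residuals on the interval [a, b]. The following is a minimal sketch under two assumptions made for illustration: the threshold is taken as the ⌊nγ⌋-th largest squared residual, and there are no ties among the squared residuals. The function name `modality_statistic` is hypothetical.

```python
import numpy as np

def modality_statistic(residuals, gamma):
    """Return (G, T_n) with G[j] = G_hat(j / n) and T_n = max_j (G[j] - (j / n) * gamma).

    residuals: residuals for the n observations with t/T in [a, b], in time order.
    """
    n = len(residuals)
    sq = residuals ** 2
    m = int(np.floor(n * gamma))          # number of 'large' residuals
    if m == 0:
        large = np.zeros(n, dtype=bool)
    else:
        q2 = np.sort(sq)[n - m]           # m-th largest squared residual
        large = sq >= q2
    # G_hat is a step function in alpha, so the supremum is attained on the grid j / n.
    G = np.concatenate(([0.0], np.cumsum(large))) / n
    alphas = np.arange(n + 1) / n
    return G, float(np.max(G - alphas * gamma))
```

If the large residuals cluster at the beginning of the interval, the running proportion rises above the line αγ early on and the statistic becomes large; for a constant variance function the running proportion stays close to αγ.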

Notice that $\hat G_{n,\gamma}(\alpha)$ is closely related to the sequential residual empirical process, and as can be seen from the proof of Theorem 4, weighted empirical processes enter the analysis of $\hat G_{n,\gamma}(\alpha)$ through handling the estimation of $q_\gamma$. Theorem 4 below provides an approximation of the test statistic $T_n$ by independent (but not necessarily identically distributed) random variables. This result crucially enters the proofs in Chandler and Polonik (2010). In particular, it implies that the large sample behavior of the test statistic $T_n$ under the null hypothesis is not influenced by the (in general non-parametric) estimation of the parameter functions, as long as the rate of convergence of these estimators is sufficiently fast. This is somewhat surprising, and it is connected to the particular structure of our model.

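First we introduce some additional notation. Let $f_u$ denote the pdf of $\sigma(u)\, \varepsilon_t$, i.e. $f_u(z) = \frac{1}{\sigma(u)} f\big( \frac{z}{\sigma(u)} \big)$, and

\[
G_{n,\gamma}(\alpha) = \frac{1}{n} \sum_{t = \lceil aT \rceil}^{\lceil aT \rceil + \lfloor \alpha n \rfloor - 1} \mathbf{1}\big( \varepsilon_t^2\, \sigma^2\big(\tfrac{t}{T}\big) \ge q_\gamma^2 \big),
\]

where $q_\gamma$ is defined via

\[
\Psi_{a,b}(z) = \int_0^1 F\Big( \frac{z}{\sigma(a + \beta(b - a))} \Big)\, d\beta - \int_0^1 F\Big( \frac{-z}{\sigma(a + \beta(b - a))} \Big)\, d\beta, \qquad z \ge 0,
\]

as the solution to the equation

\[
\Psi_{a,b}(q_\gamma) = 1 - \gamma. \tag{16}
\]

Notice that this solution is unique since we have assumed $F$ to be strictly monotonic, and if $\sigma^2(u) = \sigma_0^2$ is constant for all $u \in [a, b]$, then $q_\gamma^2$ equals the upper γ-quantile of the squared innovations $\eta_t^2 = \sigma_0^2\, \varepsilon_t^2$. The following approximation result does not assume that the variance is constant, however.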

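Theorem 4 Let $\gamma \in [0, 1]$ and suppose that $0 \le a < b \le 1$ are non-random. Assume further that $E|\varepsilon|^k < C^k$ for some $C > 0$ and all $k = 1, 2, \dots$ Then, under assumptions (i) - (iv), with $\frac{n^{1/2}}{m_n^2 \log n} = o(1)$ we have

\[
\sqrt{n}\, \sup_{\alpha \in [0, 1]} \Big| \hat G_{n,\gamma}(\alpha) - G_{n,\gamma}(\alpha) + c(\alpha) \big( G_{n,\gamma}(1) - E\, G_{n,\gamma}(1) \big) \Big| = o_P(1), \tag{17}
\]

where

\[
c(\alpha) = \frac{\int_a^{a + \alpha(b - a)} \big[ f_u(q_\gamma) + f_u(-q_\gamma) \big]\, du}{\int_a^b \big[ f_u(q_\gamma) + f_u(-q_\gamma) \big]\, du}.
\]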

Under the null-hypothesis $\sigma(u) \equiv \sigma_0 > 0$ for $u \in [a, b]$ we have $c(\alpha) = \alpha$. Moreover, in case the AR-parameters in model (1) are constant, and $\sqrt{n}$-consistent estimators are used, the moment assumptions on the innovations can be significantly relaxed to $E\, \varepsilon_t^2 < \infty$.

Under the worst case null-hypothesis, $\sigma^2(u) = \sigma_0^2$ for all $u \in [0, 1]$, the innovations are iid, we have $c(\alpha) = \alpha$, and $E\, G_{n,\gamma}(\alpha) = \alpha\gamma$. Therefore the above result implies that $(\gamma(1 - \gamma))^{-1/2} \sqrt{n}\, (\hat G_{n,\gamma}(\alpha) - \alpha\gamma)$ converges weakly to a standard Brownian Bridge. For more details we refer to Chandler and Polonik (2010).

5 Proofs

5.1 Proof of Lemma 1

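We only present a brief outline. In order to simplify the notation we assume w.l.o.g. that $a = 0$. First notice that $\nu_n(\alpha, z, \mathbf{g})$ is a sum of bounded martingale differences, because with $\xi_s^{z,\mathbf{g}} = \mathbf{1}\big( \sigma\big(a + \tfrac{s}{T}\big)\, \varepsilon_s \le \mathbf{g}'\big(a + \tfrac{s}{T}\big) \mathbf{Y}_{s-1} + z \big)$ we have

\[
\nu_n(\alpha, z, \mathbf{g}) = \frac{1}{\sqrt{n}} \sum_{s=1}^{n} \tilde\xi_s^{z,\mathbf{g}}\, \mathbf{1}\big( \tfrac{s}{T} \le \alpha b \big),
\]

where $\tilde\xi_s^{z,\mathbf{g}} = \xi_s^{z,\mathbf{g}} - E(\xi_s^{z,\mathbf{g}} \mid \mathcal{F}_{s-1})$ and $\mathcal{F}_s = \sigma(\varepsilon_s, \varepsilon_{s-1}, \dots)$ denotes the σ-algebra generated by $\{\varepsilon_s, \varepsilon_{s-1}, \dots\}$, and obviously also $\nu_n(\alpha_1, z_1, \mathbf{g}_1) - \nu_n(\alpha_2, z_2, \mathbf{g}_2)$ are sums of martingale differences. The proof of this lemma is based on the basic chaining device that is well-known in empirical process theory, utilizing the following exponential inequality for sums of bounded martingale differences from Freedman (1975).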

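Lemma. (Freedman 1975) Let $Z_1, \dots, Z_T$ denote martingale differences with respect to a filtration $\{\mathcal{F}_t,\ t = 0, \dots, T - 1\}$ with $|Z_t| \le C$ for all $t = 1, \dots, T$. Let further $S_n = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} Z_t$ and $V_n = V_n(S_n) = \frac{1}{T} \sum_{t=1}^{T} E(Z_t^2 \mid \mathcal{F}_{t-1})$. Then we have for all $\varepsilon, \tau^2 > 0$ that

\[
P\big( S_n \ge \varepsilon,\ V_n \le \tau^2 \big) \le \exp\Big( -\frac{\varepsilon^2}{2\tau^2 + 2\varepsilon C / \sqrt{T}} \Big). \tag{18}
\]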
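To see how the two terms in the exponent of (18) trade off, the bound can be evaluated numerically. The sketch below assumes that the exponent is read as −ε²/(2τ² + 2εC/√T), with T the number of summands; the function name `freedman_bound` is hypothetical.

```python
import math

def freedman_bound(eps, tau2, C, T):
    """Right-hand side of (18): exp(-eps^2 / (2 * tau2 + 2 * eps * C / sqrt(T)))."""
    return math.exp(-eps ** 2 / (2.0 * tau2 + 2.0 * eps * C / math.sqrt(T)))
```

For large T the second term in the denominator becomes negligible and the bound approaches the sub-Gaussian form exp(−ε²/(2τ²)), which is why controlling the quadratic variation is the key step in the chaining argument.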

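The form of (18) motivates that we first need to control the quadratic variation $V_n$. Let

\[
\eta_s^{\alpha, z, \mathbf{g}} = \tilde\xi_s^{z,\mathbf{g}}\, \mathbf{1}\big( \tfrac{s}{T} \le \alpha b \big).
\]

We have for $h_1 = (\alpha_1, z_1, \mathbf{g}_1)$, $h_2 = (\alpha_2, z_2, \mathbf{g}_2) \in \mathcal{H}$ with $d(h_1, h_2) \le \varepsilon$ that

\begin{align*}
V_n = V_n(\nu_n(h_1) - \nu_n(h_2)) &= \frac{1}{n} \sum_{s=1}^{n} E\big( |\eta_s^{\alpha_1, z_1, \mathbf{g}_1} - \eta_s^{\alpha_2, z_2, \mathbf{g}_2}| \,\big|\, \mathcal{F}_{s-1} \big) \\
&\le \frac{1}{n} \sum_{s=1}^{n} \big| \mathbf{1}\big( \tfrac{s}{T} \le \alpha_1 b \big) - \mathbf{1}\big( \tfrac{s}{T} \le \alpha_2 b \big) \big|\, E\big( |\tilde\xi_s^{z_1, \mathbf{g}_1}| \big) + \frac{1}{n} \sum_{s=1}^{n} E\big( |\tilde\xi_s^{z_1, \mathbf{g}_1} - \tilde\xi_s^{z_2, \mathbf{g}_2}| \,\big|\, \mathcal{F}_{s-1} \big) \\
&\le \frac{1}{n} \big| n(\alpha_2 - \alpha_1) + 1 \big| + \frac{1}{n} \sum_{s=1}^{n} E\big( |\tilde\xi_s^{z_1, \mathbf{g}_1} - \tilde\xi_s^{z_2, \mathbf{g}_2}| \,\big|\, \mathcal{F}_{s-1} \big) \\
&\le |\alpha_1 - \alpha_2| + \frac{1}{n} + \frac{1}{n} \sum_{s=1}^{n} E\big( |\tilde\xi_s^{z_1, \mathbf{g}_1} - \tilde\xi_s^{z_2, \mathbf{g}_2}| \,\big|\, \mathcal{F}_{s-1} \big).
\end{align*}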

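On $F_n$ we have for the last sum that

\begin{align*}
\frac{1}{n} \sum_{s=1}^{n} E\big( |\tilde\xi_s^{z_1, \mathbf{g}_1} - \tilde\xi_s^{z_2, \mathbf{g}_2}| \,\big|\, \mathcal{F}_{s-1} \big)
&\le \frac{1}{n} \sum_{s=1}^{n} \Big| F_{\frac{s}{T}}\big( z_1 + \mathbf{g}_1\big(\tfrac{s}{T}\big)' \mathbf{Y}_{s-1} \big) - F_{\frac{s}{T}}\big( z_2 + \mathbf{g}_1\big(\tfrac{s}{T}\big)' \mathbf{Y}_{s-1} \big) \Big| \\
&\quad + \frac{1}{n} \sum_{s=1}^{n} \Big| F_{\frac{s}{T}}\big( z_2 + \mathbf{g}_1\big(\tfrac{s}{T}\big)' \mathbf{Y}_{s-1} \big) - F_{\frac{s}{T}}\big( z_2 + \mathbf{g}_2\big(\tfrac{s}{T}\big)' \mathbf{Y}_{s-1} \big) \Big| \\
&\le \sup_{u, x} f_u(x)\, |z_1 - z_2| + \sup_{u, x} f_u(x)\, \frac{1}{n} \sum_{s=1}^{n} \big| (\mathbf{g}_1 - \mathbf{g}_2)\big(\tfrac{s}{T}\big)' \mathbf{Y}_{s-1} \big| \\
&\le \frac{\|f\|_\infty}{m_*} |z_1 - z_2| + \frac{\|f\|_\infty}{m_*} \sum_{k=1}^{p} \|g_{1k} - g_{2k}\|_n \sqrt{ \frac{1}{n} \sum_{s=1-p}^{n} Y_s^2 } \ \le\ (1 + C_0)\, \frac{\|f\|_\infty}{m_*}\, \varepsilon. \tag{19}
\end{align*}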

Thus, on $F_n$ we have for $h_1 = (\alpha_1, z_1, \mathbf{g}_1)$, $h_2 = (\alpha_2, z_2, \mathbf{g}_2) \in \mathcal{H}$ with $d(h_1, h_2) \le \varepsilon$ and $\varepsilon \ge \frac{1}{n}$ that

\[
V_n(\nu_n(h_1) - \nu_n(h_2)) \le \frac{1}{n} \sum_{s=1}^{n} E\big( |\eta_s^{h_1} - \eta_s^{h_2}| \,\big|\, \mathcal{F}_{s-1} \big)^2 \le K^* \varepsilon. \tag{20}
\]

This control of the quadratic variation, in conjunction with Freedman's exponential bound for martingales, now enables us to apply the chaining argument in a way similar to the proof of Theorem 5.11 in van de Geer (2000). Details are omitted.


5.2 Proof of Theorem 1

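We will utilize Lemma 1. For $\eta > 0$ we have

\begin{align*}
& P\Big( \sup_{\alpha \in [0, 1],\ z \in [-L, L],\ \mathbf{g} \in \mathcal{G}^p} |\nu_n(\alpha, z, \mathbf{g}) - \nu_n(\alpha, z, \mathbf{0})| \ge \eta \Big) \\
&\qquad \le P\big( (F_n)^c \big) + P\Big( \sup_{d(h_1, h_2) \le C_\varepsilon m_n^{-1}} |\nu_n(h_1) - \nu_n(h_2)| \ge \eta,\ F_n \Big),
\end{align*}

where $F_n = \big\{ \frac{1}{n} \sum_{s=1-p}^{n} Y_s^2 \le C_0^2 \big\}$ for some $C_0 > 0$. We will see below that $\frac{1}{n} \sum_{s=1-p}^{n} Y_s^2 = O_P(1)$ as $n \to \infty$. Therefore, for any given $\varepsilon > 0$ we can choose $C_0$ such that $P\big( (F_n)^c \big) \le \varepsilon$ for $n$ large enough. An application of Lemma 1 now gives the assertion. Arguments similar to those leading to (19) show that $\int_{c/n}^{1} \sqrt{\log N_B(u^2, \mathcal{G})}\, du < \infty$ implies $\int_{c/n}^{1} \sqrt{\log D_B(u^2, \mathcal{H})}\, du < \infty$ (see also (40)). It remains to show that in fact $\frac{1}{n} \sum_{s=1-p}^{n} Y_s^2 = O_P(1)$. To this end we show that

\[
E\Big[ \frac{1}{n} \sum_{s=1}^{n} Y_s^2 \Big] < \infty, \tag{21}
\]

\[
\mathrm{Var}\Big[ \frac{1}{n} \sum_{s=1}^{n} Y_s^2 \Big] = o(1). \tag{22}
\]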

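These two facts can be seen by direct calculations as demonstrated now. First we consider (21). We have

\begin{align*}
E\, Y_s^2 &= E\Big[ \sum_{j=-\infty}^{s} a_{s,T}(s - j)\, \varepsilon_j \Big]^2 = E \sum_{j=-\infty}^{s} \sum_{k=-\infty}^{s} a_{s,T}(s - j)\, a_{s,T}(s - k)\, \varepsilon_j \varepsilon_k \\
&= \sum_{j=-\infty}^{s} a_{s,T}^2(s - j) \le \sum_{j=0}^{\infty} \Big( \frac{K}{\ell(j)} \Big)^2 < C < \infty, \tag{23}
\end{align*}

for some $C > 0$, where we were using assumption (ii). This implies (21). Next we indicate (22). Straightforward calculations show that

\begin{align*}
& E(Y_t^2 Y_s^2) - E(Y_t^2)\, E(Y_s^2) \\
&\quad = \sum_{i=-\infty}^{t} \sum_{j=-\infty}^{t} \sum_{k=-\infty}^{s} \sum_{\ell=-\infty}^{s} a_{t,T}(t - i)\, a_{t,T}(t - j)\, a_{s,T}(s - k)\, a_{s,T}(s - \ell)\, E(\varepsilon_i \varepsilon_j \varepsilon_k \varepsilon_\ell) \\
&\qquad - \sum_{i=-\infty}^{t} a_{t,T}^2(t - i) \sum_{k=-\infty}^{s} a_{s,T}^2(s - k) \\
&\quad = (E\varepsilon_t^4 - 3)\, A(t, s) + 2 B(t, s),
\end{align*}

where

\[
A(t, s) = \sum_{i=0}^{\infty} a_{t,T}^2(i)\, a_{s,T}^2(i + |t - s|), \qquad
B(t, s) = \Big( \sum_{i=0}^{\infty} a_{t,T}(i)\, a_{s,T}(i + |t - s|) \Big)^2.
\]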

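We obtain that for some $C^* < \infty$,

\begin{align*}
A(t, s) &\le K^4 \sum_{i=0}^{\infty} \frac{1}{\ell^2(i)} \frac{1}{\ell^2(i + |t - s|)} \le K^4 \sum_{i=0}^{\infty} \frac{1}{\ell(i)} \frac{1}{\ell(i + |t - s|)} \\
&\le K^4 \frac{1}{\ell(|t - s|)} \sum_{i=0}^{\infty} \frac{1}{\ell(i)} < C^* \frac{1}{\ell(|t - s|)},
\end{align*}

and similarly

\[
|B(t, s)| \le \Big[ K^2 \sum_{i=0}^{\infty} \frac{1}{\ell(i)} \frac{1}{\ell(i + |t - s|)} \Big]^2 \le C^* \frac{1}{\ell(|t - s|)}. \tag{24}
\]

Now we have

\begin{align*}
\Big| \frac{1}{n^2} \sum_{s=1}^{n} \sum_{t=1}^{n} A(t, s) \Big| &\le \frac{C^*}{n^2} \sum_{s=1}^{n} \sum_{t=1}^{n} \frac{1}{\ell(|t - s|)} \le \frac{C^*}{n} \sum_{k=0}^{n-1} \frac{2(n - k)}{n} \frac{1}{\ell(k)} \\
&\le \frac{2 C^*}{n} \sum_{k=0}^{n-1} \frac{1}{\ell(k)} = O\Big( \frac{1}{n} \Big) \quad \text{as } n \to \infty.
\end{align*}

This implies that $\mathrm{Var}\big[ \frac{1}{n} \sum_{s=1}^{n} Y_s^2 \big] = O(1/n)$, which is (22).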

5.3 Proof of Theorem 2

Assume w.l.o.g. that a = 0. Showing weak convergence of the process Zn(h) means proving asymptotic tightness and convergence of the finite dimensional distributions (e.g. see van der Vaart and Wellner, 1996). Tightness follows from Theorem 3.

It remains to show convergence of the finite dimensional distributions. To this end we will utilize the Cramér-Wold device in conjunction with the method of cumulants. It follows from Lemma 2 that all the cumulants of $Z_n(h)$ of order $k \ge 3$ converge to zero as $n \to \infty$. Using the linearity of the cumulants, the same holds for any linear combination of $Z_n(h_i)$, $i = 1, \dots, K$. The mean of all the $Z_n(h)$ equals zero. It remains to show convergence of the covariances $\mathrm{cov}(Z_n(h_1), Z_n(h_2))$. The range of the summation indices below is such that the indices of the $Y$-variables are between 1 and $n$. For ease of notation we achieve this by formally setting $h_i(u) = 0$ for $u \le 0$ and $u > b$, $i = 1, 2$. We have

\begin{align*}
\mathrm{cov}(Z_n(h_1), Z_n(h_2)) &= \frac{1}{n} \sum_{s_1=1}^{n} \sum_{s_2=1}^{n} h_1\big(\tfrac{s_1}{T}\big)\, h_2\big(\tfrac{s_2}{T}\big)\, \mathrm{cov}(Y_{s_1}, Y_{s_2}) \\
&= \frac{1}{n} \sum_{s_1=1}^{n} \sum_{k=s_1-n}^{s_1-1} h_1\big(\tfrac{s_1}{T}\big)\, h_2\big(\tfrac{s_1-k}{T}\big)\, \mathrm{cov}(Y_{s_1}, Y_{s_1-k}) \\
&= \frac{1}{n} \sum_{s_1=1}^{n} \sum_{|k| \le \sqrt{n}} h_1\big(\tfrac{s_1}{T}\big)\, h_2\big(\tfrac{s_1-k}{T}\big)\, \mathrm{cov}(Y_{s_1}, Y_{s_1-k}) + R_{1n}, \tag{25}
\end{align*}

where for $n$ sufficiently large

\[
|R_{1n}| \le \frac{1}{n} \sum_{s_1=1}^{n} \sum_{|k| > \sqrt{n}} \big| h_1\big(\tfrac{s_1}{T}\big)\, h_2\big(\tfrac{s_1-k}{T}\big) \big|\, \big| \mathrm{cov}(Y_{s_1}, Y_{s_1-k}) \big|.
\]

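From Proposition 5.4 of Dahlhaus and Polonik (2009) we obtain that $\sup_s |\mathrm{cov}(Y_s, Y_{s-k})| \le \frac{K}{\ell(k)}$ for some constant $K$. Since both $h_1$ and $h_2$ are bounded and $\sum_{k=-\infty}^{\infty} \frac{1}{\ell(k)} < \infty$, we can conclude that $R_{1n} = o(1)$. The main term in (25) can be approximated as

\[
\frac{1}{n} \sum_{s_1=1}^{n} \sum_{|k| \le \sqrt{n}} h_1\big(\tfrac{s_1}{T}\big)\, h_2\big(\tfrac{s_1-k}{T}\big)\, c\big(\tfrac{s_1}{T}, k\big) + R_{2n}, \tag{26}
\]

where

\[
|R_{2n}| \le \frac{1}{n} \sum_{s_1=1}^{n} \sum_{|k| \le \sqrt{n}} \big| h_1\big(\tfrac{s_1}{T}\big)\, h_2\big(\tfrac{s_1-k}{T}\big) \big|\, \big| \mathrm{cov}(Y_{s_1}, Y_{s_1-k}) - c\big(\tfrac{s_1}{T}, k\big) \big|.
\]

Proposition 5.4 of Dahlhaus and Polonik (2009) also says that for $|k| \le \sqrt{n}$ we have $\sum_{s_1=0}^{n} \big| \mathrm{cov}(Y_{s_1}, Y_{s_1-k}) - c\big(\tfrac{s_1}{T}, k\big) \big| \le K \big( 1 + \frac{|k|}{n} \big)$ for some $K > 0$. Applying this result we obtain that

\begin{align*}
|R_{2n}| &\le \frac{1}{n} \sum_{s_1=1}^{n} \sum_{k=-\sqrt{n}}^{\sqrt{n}} \big| h_1\big(\tfrac{s_1}{T}\big)\, h_2\big(\tfrac{s_1-k}{T}\big) \big|\, \big| \mathrm{cov}(Y_{s_1}, Y_{s_1-k}) - c\big(\tfrac{s_1}{T}, k\big) \big| \\
&\le K_1 \frac{1}{n} \sum_{k=-\sqrt{n}}^{\sqrt{n}} \sum_{s_1=1}^{n} \big| \mathrm{cov}(Y_{s_1}, Y_{s_1-k}) - c\big(\tfrac{s_1}{T}, k\big) \big| \le K_1 \frac{1}{n} \sum_{k=-\sqrt{n}}^{\sqrt{n}} \Big( 1 + \frac{|k|}{\ell(k)} \Big) = o(1)
\end{align*}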

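as $n \to \infty$. Next we replace $h_2\big(\tfrac{s_1-k}{T}\big)$ in the main term of (26) by $h_2\big(\tfrac{s_1}{T}\big)$. The approximation error can be bounded by

\[
\frac{1}{n} \sum_{s_1=1}^{n} \sum_{|k| \le \sqrt{n}} \big| h_1\big(\tfrac{s_1}{T}\big) \big| \cdot \big| h_2\big(\tfrac{s_1-k}{T}\big) - h_2\big(\tfrac{s_1}{T}\big) \big|\, \frac{K}{\ell(k)} = o(1).
\]

Here we are using the fact that $\sup_u |c(u, k)| \le \frac{K}{\ell(k)}$ (see Proposition 5.4 in Dahlhaus and Polonik, 2009) together with the assumed (uniform) continuity of $h_2$, the boundedness of $h_1$ and the boundedness of $\sum_{k=-\infty}^{\infty} \frac{1}{\ell(k)}$. We have seen that

\[
\mathrm{cov}(Z_n(h_1), Z_n(h_2)) = \frac{1}{n} \sum_{s_1=1}^{n} h_1\big(\tfrac{s_1}{T}\big)\, h_2\big(\tfrac{s_1}{T}\big) \sum_{|k| \le \sqrt{n}} c\big(\tfrac{s_1}{T}, k\big) + o(1).
\]

Since $S(u) = \sum_{k=-\infty}^{\infty} c(u, k) < \infty$ we also have

\[
\mathrm{cov}(Z_n(h_1), Z_n(h_2)) = \sum_{k=-\infty}^{\infty} \frac{1}{n} \sum_{s_1=1}^{n} h_1\big(\tfrac{s_1}{T}\big)\, h_2\big(\tfrac{s_1}{T}\big)\, c\big(\tfrac{s_1}{T}, k\big) + o(1).
\]

Finally, we utilize the fact that $TV(c(\cdot, k)) \le \frac{K}{\ell(k)}$, which is another result from Proposition 5.4 of Dahlhaus and Polonik (2009). This result, together with the assumed bounded variation of both $h_1$ and $h_2$, allows us to replace the average over $s_1$ by the integral. Recalling that $n = \lfloor bT \rfloor - \lceil aT \rceil + 1 = (b - a)T + O(1)$ gives the assertion.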

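5.4 Proof of Lemma 2

For ease of notation we assume w.l.o.g. that $a = 0$. By utilizing multilinearity of cumulants we obtain that

\[
\mathrm{cum}(Z_n(h_1), \dots, Z_n(h_k)) = n^{-\frac{k}{2}} \sum_{s_1, s_2, \dots, s_k = 1}^{n} h_1\big(\tfrac{s_1}{T}\big) \cdots h_k\big(\tfrac{s_k}{T}\big)\, \mathrm{cum}(Y_{s_1}, \dots, Y_{s_k}).
\]

In order to estimate $\mathrm{cum}(Y_{s_1}, \dots, Y_{s_k})$ we utilize the special structure of $Y_s$. Since the $\varepsilon_j$ are independent we again obtain, by using multilinearity of the cumulants together with the fact that $\mathrm{cum}(\varepsilon_{j_1}, \dots, \varepsilon_{j_k}) = 0$ unless all the $j_\ell$, $\ell = 1, \dots, k$, are equal, that

\begin{align*}
\mathrm{cum}(Y_{s_1}, \dots, Y_{s_k}) &= \sum_{j=0}^{\min\{s_1, \dots, s_k\}} a_{s_1,T}(s_1 - j) \cdots a_{s_k,T}(s_k - j)\, \mathrm{cum}(\varepsilon_j, \dots, \varepsilon_j) \\
&= \mathrm{cum}_k(\varepsilon_1) \sum_{j=0}^{\min\{s_1, \dots, s_k\}} a_{s_1,T}(s_1 - j) \cdots a_{s_k,T}(s_k - j). \tag{27}
\end{align*}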

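Thus $\big| \mathrm{cum}(Y_{s_1}, \dots, Y_{s_k}) \big| \le \big| \mathrm{cum}_k(\varepsilon_1) \big| \sum_{j=0}^{\infty} \prod_{i=1}^{k} \frac{K}{\ell(s_i - j)}$, and consequently,

\begin{align*}
\big| \mathrm{cum}(Z_n(h_1), \dots, Z_n(h_k)) \big| &\le n^{-\frac{k}{2}} \big| \mathrm{cum}_k(\varepsilon_1) \big| \sum_{j=-\infty}^{\infty} \prod_{i=1}^{k} \Big[ \sum_{s_i=0}^{n} \big| h_i\big(\tfrac{s_i}{T}\big) \big| \frac{K}{\ell(s_i - j)} \Big] \\
&= n^{-\frac{k}{2}} \big| \mathrm{cum}_k(\varepsilon_1) \big| \sum_{j=-\infty}^{\infty} \prod_{i=1}^{2} \Big[ \sum_{s_i=0}^{n} \big| h_i\big(\tfrac{s_i}{T}\big) \big| \frac{K}{\ell(s_i - j)} \Big] \times \prod_{i=3}^{k} \Big[ \sum_{s_i=0}^{n} \big| h_i\big(\tfrac{s_i}{T}\big) \big| \frac{K}{\ell(s_i - j)} \Big].
\end{align*}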

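Utilizing the Cauchy-Schwarz inequality we have for the last product

\begin{align*}
\prod_{i=3}^{k} \Big[ \sum_{s_i=0}^{n} \big| h_i\big(\tfrac{s_i}{T}\big) \big| \frac{K}{\ell(s_i - j)} \Big] &\le \prod_{i=3}^{k} \sqrt{ \sum_{s_i=0}^{n} h_i\big(\tfrac{s_i}{T}\big)^2 }\ \sqrt{ \sum_{s_i=0}^{n} \Big( \frac{K}{\ell(s_i - j)} \Big)^2 } \\
&\le n^{\frac{k}{2} - 1} \prod_{i=3}^{k} \|h_i\|_n \sqrt{ \sum_{s=-\infty}^{\infty} \Big( \frac{K}{\ell(s)} \Big)^2 } \le K_0^{k-2}\, n^{\frac{k}{2} - 1} \prod_{i=3}^{k} \|h_i\|_n, \tag{28}
\end{align*}

where we used the fact that $\sqrt{ \sum_{s=-\infty}^{\infty} \big( \frac{K}{\ell(s)} \big)^2 } \le \sum_{s=-\infty}^{\infty} \frac{K}{\ell(s)} \le K_0$ for some $K_0 < \infty$. Notice that the bound (28) does not depend on the index $j$ anymore, so that

\[
\big| \mathrm{cum}(Z_n(h_1), \dots, Z_n(h_k)) \big| \le K_0^{k-2}\, n^{-1} \prod_{i=3}^{k} \|h_i\|_n\, \big| \mathrm{cum}_k(\varepsilon_1) \big| \sum_{j=-\infty}^{\infty} \prod_{i=1}^{2} \Big[ \sum_{s_i=0}^{n} \big| h_i\big(\tfrac{s_i}{T}\big) \big| \frac{K}{\ell(s_i - j)} \Big].
\]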

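As for the last sum, we have by again using the fact that $\sum_{j=-\infty}^{\infty} \frac{K}{\ell(s_1 - j)} \frac{K}{\ell(s_2 - j)} \le \frac{K^*}{\ell(s_1 - s_2)}$ (cf. (24)) for some $K^* > 0$, and the Cauchy-Schwarz inequality that

\begin{align*}
\sum_{j=-\infty}^{\infty} \prod_{i=1}^{2} \Big[ \sum_{s_i=0}^{n} \big| h_i\big(\tfrac{s_i}{T}\big) \big| \frac{K}{\ell(s_i - j)} \Big] &= \sum_{s_1=0}^{n} \sum_{s_2=0}^{n} \big| h_1\big(\tfrac{s_1}{T}\big) \big|\, \big| h_2\big(\tfrac{s_2}{T}\big) \big| \sum_{j=-\infty}^{\infty} \frac{K}{\ell(s_1 - j)} \frac{K}{\ell(s_2 - j)} \\
&\le \sum_{s_1=0}^{n} \sum_{s_2=0}^{n} \big| h_1\big(\tfrac{s_1}{T}\big) \big|\, \big| h_2\big(\tfrac{s_2}{T}\big) \big| \frac{K^*}{\ell(s_1 - s_2)} \\
&\le K^* \sqrt{ \sum_{s_1=0}^{n} \sum_{s_2=0}^{n} h_1\big(\tfrac{s_1}{T}\big)^2 \frac{1}{\ell(s_1 - s_2)} }\ \sqrt{ \sum_{s_1=0}^{n} \sum_{s_2=0}^{n} h_2\big(\tfrac{s_2}{T}\big)^2 \frac{1}{\ell(s_1 - s_2)} } \\
&= K^* \sqrt{ \sum_{s_1=0}^{n} h_1\big(\tfrac{s_1}{T}\big)^2 \sum_{s_2=0}^{n} \frac{1}{\ell(s_1 - s_2)} }\ \sqrt{ \sum_{s_2=0}^{n} h_2\big(\tfrac{s_2}{T}\big)^2 \sum_{s_1=0}^{n} \frac{1}{\ell(s_1 - s_2)} } \\
&\le K_0\, n\, \|h_1\|_n\, \|h_2\|_n.
\end{align*}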

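This completes the proof of the first part of the lemma. The second part follows similarly to the above by observing that if $\|h_i\|_\infty < M$ for all $i = 1, \ldots, k$, then, instead of the estimate (28), we have

$$\prod_{i=3}^{k} \Big[ \sum_{s_i=0}^{n} \big|h_i\big(\tfrac{s_i}{T}\big)\big| \frac{1}{\ell(s_i - j)} \Big] \le M^{k-2} \prod_{i=3}^{k} \sum_{s_i=0}^{n} \frac{1}{\ell(s_i - j)} \le (M K_0)^{k-2}$$

with $K_0 = \sum_{s=-\infty}^{\infty} \frac{1}{\ell(s)}$.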


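First observe that since $|\mathrm{cum}_k(\varepsilon_s)| \le \mathrm{E}(|\varepsilon_s|^k) + \sum_{j=1}^{k-1} \binom{k-1}{j-1} \mathrm{E}|\varepsilon_s|^{k-j}\, |\mathrm{cum}_j(\varepsilon_s)|$, it is straightforward to see that the assumption $\mathrm{E}|\varepsilon_s|^k \le C^k$ implies that $|\mathrm{cum}_k(\varepsilon_s)| \le 3^k C^k$. It follows by utilizing Lemma 2 that

$$\begin{aligned}
\Psi_{Z_n(h)}(t) = \log \mathrm{E}\, e^{t Z_n(h)} &= \sum_{k=1}^{\infty} \frac{t^k}{k!}\, \mathrm{cum}_k(Z_n(h)) \\
&\le \frac{1}{K_0} \sum_{k=1}^{\infty} \frac{t^k}{k!}\, (3 C K_0 \|h\|_n)^k = K_0^{-1} \big( e^{3 t C K_0 \|h\|_n} - 1 \big).
\end{aligned}$$

We obtain for any $t > 0$

$$\begin{aligned}
P\big[ |Z_n(h)| > \eta \big] &\le 2\, e^{-t\eta}\, \mathrm{E}\big( e^{t Z_n(h)} \big) = 2 \exp\{-t\eta\} \exp\{\Psi_{Z_n(h)}(t)\} \\
&\le 2 \exp\big\{ -t\eta + K_0^{-1} \big( e^{3 t C \|h\|_n} - 1 \big) \big\}.
\end{aligned}$$

Choosing $t = \frac{1}{3 C \|h\|_n}$ gives the assertion:

$$P\big[ |Z_n(h)| > \eta \big] \le 2 \exp\Big\{ -\frac{\eta}{3 C \|h\|_n} \Big\}\, e^{K_0^{-1}(e - 1)}.$$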
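The choice of $t$ in the last step can be checked numerically. The following is a minimal sketch, not part of the original argument: it treats the bound as a function of the form $2\exp\{-t\eta + (e^{at} - 1)/K_0\}$, where `a` is a hypothetical stand-in for the constant multiplying $t$ in the exponent ($3C\|h\|_n$ in the display above), and all numerical values are arbitrary assumptions.

```python
import math


def chernoff_bound(t, eta, a, k0):
    """Evaluate 2 * exp(-t*eta + (exp(a*t) - 1) / k0).

    `a` stands in for the constant multiplying t in the exponent
    (an assumption of this sketch), `k0` for the constant K_0.
    """
    return 2.0 * math.exp(-t * eta + (math.exp(a * t) - 1.0) / k0)


def bound_at_chosen_t(eta, a, k0):
    """Closed form obtained by plugging t = 1/a into chernoff_bound."""
    return 2.0 * math.exp(-eta / a) * math.exp((math.e - 1.0) / k0)


# Arbitrary illustrative values (not taken from the paper).
eta, a, k0 = 5.0, 0.3, 2.0
value = chernoff_bound(1.0 / a, eta, a, k0)
closed_form = bound_at_chosen_t(eta, a, k0)
```

With $t = 1/a$ the exponent $e^{at}$ reduces to $e$, which is why the resulting bound is exponential in $\eta/a$ with a constant prefactor.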


Using the exponential inequality from Lemma 3 we can mimic the proof of Lemma 3.2

from van de Geer (2000). As compared to van de Geer, our exponential bound is of the form $c_1 \exp\{-c_2\, \eta / \|h\|_n\}$ rather than $c_1 \exp\{-c_2\, (\eta / \|h\|_n)^2\}$. It is well-known that this type of inequality

leads to the covering integral being the integral of the metric entropy rather than the square

root of the metric entropy. (See for instance Theorem 2.2.4 in van der Vaart and Wellner,

1996.) This indicates the necessary modifications to the proof in van de Geer. Details are

omitted. Condition (13) just makes sure that the upper limit in the integral from (14) is

larger than the lower limit.



First we indicate how the two processes studied in this paper enter the picture, and simul-

taneously we provide an outline of the proof.

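Recall that we have relabeled the observations $Y_t$ inside the rescaled interval $[a, b] \subset [0, 1]$, i.e. $aT \le t \le bT$, to $Y_s$, $s = 1, \ldots, n$. Also, $Y_{s-k}$ with $k \ge s$ denotes $Y_{\lceil aT \rceil - (k - s)}$. Let

$$H_n(\alpha, z) = \frac{1}{n} \sum_{s=1}^{\lfloor \alpha n \rfloor} \mathbf{1}(\eta_s \le z) \quad \text{and} \quad F_n(\alpha, z) = \frac{1}{n} \sum_{s=1}^{\lfloor \alpha n \rfloor} \mathbf{1}\big( \sigma(a + \tfrac{s}{T})\, \varepsilon_s \le z \big).$$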
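Both quantities are sequential empirical distribution functions: only the first $\lfloor \alpha n \rfloor$ observations are counted, while the normalisation uses the full sample size $n$. A minimal sketch of this computation follows; the function name `sequential_edf` and the sample values are hypothetical and serve only as illustration.

```python
import numpy as np


def sequential_edf(x, alpha, z):
    """Return (1/n) * #{s <= floor(alpha*n) : x[s] <= z}.

    Note the normalisation by the full sample size n,
    not by floor(alpha*n).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    m = int(np.floor(alpha * n))
    return np.count_nonzero(x[:m] <= z) / n


sample = np.array([0.5, -1.0, 2.0, 0.1])   # hypothetical values
half = sequential_edf(sample, 0.5, 0.6)    # counts among the first 2 observations
full = sequential_edf(sample, 1.0, 0.6)    # counts among all 4 observations
```

Passing the residuals $\eta_s$ as `x` gives $H_n(\alpha, z)$; passing the (unobservable) scaled innovations $\sigma(a + s/T)\varepsilon_s$, which is possible only in a simulation, gives $F_n(\alpha, z)$.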

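With this notation

$$\begin{aligned}
\sqrt{n}\, \big( \widehat{G}_{n,\gamma}(\alpha) - G_{n,\gamma}(\alpha) \big)
&= \Big[ \sqrt{n}\, \big( F_n(\alpha, \hat{q}_\gamma) - H_n(\alpha, \hat{q}_\gamma) \big) - \sqrt{n}\, \big( F_n(\alpha, -\hat{q}_\gamma) - H_n(\alpha, -\hat{q}_\gamma) \big) \Big] \\
&\quad - \Big[ \sqrt{n}\, \big( F_n(\alpha, \hat{q}_\gamma) - F_n(\alpha, q_\gamma) \big) - \sqrt{n}\, \big( F_n(\alpha, -\hat{q}_\gamma) - F_n(\alpha, -q_\gamma) \big) \Big] \\
&=: I_n(\alpha) - II_n(\alpha)
\end{aligned} \tag{29}$$

with $I_n(\alpha)$ and $II_n(\alpha)$, respectively, denoting the two quantities inside the two $[\cdot]$-brackets. The assertion of Theorem 4 follows from

$$\sup_{\alpha \in [0,1]} |I_n(\alpha)| = o_P(1) \quad \text{and} \tag{30}$$

$$\sup_{\alpha \in [0,1]} \big| II_n(\alpha) - c(\alpha)\big( G_{n,\gamma}(1) - \mathrm{E}\, G_{n,\gamma}(1) \big) \big| = o_P(1). \tag{31}$$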

Property (31) can be shown by using empirical process theory based on independent, but not

identically distributed random variables. Verification of (30) involves both residual empirical

processes and weighted sum processes. To see the latter, observe that for each z ∈ R

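$$\begin{aligned}
H_n(\alpha, z) &= \frac{1}{n} \sum_{s=1}^{\lfloor \alpha n \rfloor} \mathbf{1}(\eta_s \le z) \\
&= \frac{1}{n} \sum_{s=1}^{\lfloor \alpha n \rfloor} \mathbf{1}\big( \sigma(a + \tfrac{s}{T})\, \varepsilon_s \le \sigma(a + \tfrac{s}{T})\, \varepsilon_s - \eta_s + z \big) \\
&= \frac{1}{n} \sum_{s=1}^{\lfloor \alpha n \rfloor} \mathbf{1}\Big( \sigma(a + \tfrac{s}{T})\, \varepsilon_s \le \big( (\hat{\theta} - \theta)(a + \tfrac{s}{T}) \big)' \mathbf{Y}_{s-1} + z \Big).
\end{aligned}$$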

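Recall that by assumption $\hat{\theta}_n(\cdot) - \theta(\cdot) \in \mathcal{G}^p$, where for $\mathbf{g} = (g_1, \ldots, g_p)' : [a, b] \to \mathbb{R}^p$ we write $\{\mathbf{g} \in \mathcal{G}^p\}$ for $\{g_i \in \mathcal{G},\ i = 1, \ldots, p\}$. We also write

$$F_n(\alpha, z, \mathbf{g}) = \frac{1}{n} \sum_{s=1}^{\lfloor \alpha n \rfloor} \mathbf{1}\big\{ \sigma(a + \tfrac{s}{T})\, \varepsilon_s \le \mathbf{g}'(a + \tfrac{s}{T})\, \mathbf{Y}_{s-1} + z \big\}.$$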

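By taking into account that by assumption $\hat{\theta} - \theta = O_P(m_n^{-1})$, the above implies that with probability tending to one we have for any $C > 0$ that

$$\sup_{\alpha \in [0,1],\, z \in [-L, L]} \sqrt{n}\, \big| F_n(\alpha, z) - H_n(\alpha, z) \big| \le \sup_{\substack{\alpha \in [0,1],\, z \in [-L, L], \\ \mathbf{g} \in \mathcal{G}^p,\, \|\mathbf{g}\|_n \le C m_n^{-1}}} \sqrt{n}\, \big| F_n(\alpha, z) - F_n(\alpha, z, \mathbf{g}) \big|. \tag{32}$$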

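Thus, if $L$ is such that $q_\gamma \in (-L, L)$ and $P(\hat{q}_\gamma \in [-L, L]) \to 1$ as $n \to \infty$, then the right hand side in (32) being $o_P(1)$ implies that $\sup_\alpha |I_n(\alpha)| = o_P(1)$. In order to control the right hand side in (32), an appropriate centering is needed. Let $F_u(z) = F\big( \tfrac{z}{\sigma(u)} \big)$ denote the cdf corresponding to the pdf $f_u(z)$, defined above Theorem 4. Let

$$\begin{aligned}
E_n(\alpha, z, \mathbf{g}) &:= \frac{1}{n} \sum_{s=1}^{\lfloor \alpha n \rfloor} \mathrm{E}\Big[ \mathbf{1}\big\{ \sigma(a + \tfrac{s}{T})\, \varepsilon_s \le \mathbf{g}'(a + \tfrac{s}{T})\, \mathbf{Y}_{s-1} + z \big\} \,\Big|\, \varepsilon_{s-1}, \varepsilon_{s-2}, \ldots \Big] \\
&= \frac{1}{n} \sum_{s=1}^{\lfloor \alpha n \rfloor} F_{a + \frac{s}{T}}\Big( \big( \mathbf{g}(a + \tfrac{s}{T}) \big)' \mathbf{Y}_{s-1} + z \Big)
\end{aligned} \tag{33}$$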

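and let $\mathbf{0} = (0, \ldots, 0)' \in \mathcal{G}^p$ denote the $p$-vector of null-functions. Write

$$\begin{aligned}
F_n(\alpha, z) - F_n(\alpha, z, \mathbf{g}) &= \Big[ \big( F_n(\alpha, z, \mathbf{0}) - E_n(\alpha, z, \mathbf{0}) \big) - \big( F_n(\alpha, z, \mathbf{g}) - E_n(\alpha, z, \mathbf{g}) \big) \Big] + \Big[ E_n(\alpha, z, \mathbf{0}) - E_n(\alpha, z, \mathbf{g}) \Big] \\
&=: T_{n1}(\alpha, z, \mathbf{g}) + T_{n2}(\alpha, z, \mathbf{g})
\end{aligned} \tag{34}$$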

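with $T_{n1}$ and $T_{n2}$ denoting the two terms inside the two $[\ ]$ brackets. This centering makes $T_{n1}$ a sum of martingale differences (cf. proof of Lemma 1 given above). The quantity $\nu_n(\alpha, z, \mathbf{g}) = \sqrt{n}\, \big( F_n(\alpha, z, \mathbf{g}) - E_n(\alpha, z, \mathbf{g}) \big)$ is the residual empirical process discussed in section 2. As for $T_{n2}(\alpha, z, \mathbf{g})$ notice that we have

$$\begin{aligned}
\sqrt{n}\, T_{n2}(\alpha, z, \mathbf{g}) &= \frac{1}{\sqrt{n}} \sum_{s=1}^{\lceil n\alpha \rceil} \Big[ F_{a + \frac{s}{T}}\Big( z + \big( \mathbf{g}(a + \tfrac{s}{T}) \big)' \mathbf{Y}_{s-1} \Big) - F_{a + \frac{s}{T}}(z) \Big] \\
&= \frac{1}{\sqrt{n}} \sum_{s=1}^{\lceil n\alpha \rceil} f_{a + \frac{s}{T}}(z)\, \big( \mathbf{g}(a + \tfrac{s}{T}) \big)' \mathbf{Y}_{s-1} + \frac{1}{\sqrt{n}} \sum_{s=1}^{\lceil n\alpha \rceil} \big( f_{a + \frac{s}{T}}(\zeta_s) - f_{a + \frac{s}{T}}(z) \big)\, \big( \mathbf{g}(a + \tfrac{s}{T}) \big)' \mathbf{Y}_{s-1}
\end{aligned} \tag{35}$$

with $\zeta_s$ between $z$ and $z + \big( \mathbf{g}(a + \tfrac{s}{T}) \big)' \mathbf{Y}_{s-1}$. The second term in (35) is a remainder term, while the first term is a weighted sum of the $Y_s$'s.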

The above motivates that (30) can be verified by appropriate control of both the residual

empirical process and a weighted sum process.

After this outline we now present the missing details of the proof. Without loss of generality

we assume that aT > p so that the residuals are all well defined, and we assume a = 0.

Choose L large enough such that qγ ∈ (−L,L). As outlined above, it suffices to show that

both

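$$\sup_{\alpha \in [0,1]} |I_n(\alpha)| = o_P(1) \quad \text{and} \quad \sup_{\alpha \in [0,1]} \big| II_n(\alpha) - c(\alpha)\big( G_{n,\gamma}(1) - \mathrm{E}\, G_{n,\gamma}(1) \big) \big| = o_P(1).$$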

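Proof of $\sup_{\alpha \in [0,1]} |I_n(\alpha)| = o_P(1)$. It follows from (32) and (34) that

$$\sup_{\alpha \in [0,1],\, z \in [-L, L],\, \mathbf{g} \in \mathcal{G}^p} \big| \sqrt{n}\, T_{n1}(\alpha, z, \mathbf{g}) \big| = o_P(1) \quad \text{and} \tag{36}$$

$$\sup_{\alpha \in [0,1],\, z \in [-L, L],\, \mathbf{g} \in \mathcal{G}^p} \big| \sqrt{n}\, T_{n2}(\alpha, z, \mathbf{g}) \big| = o_P(1) \tag{37}$$

together imply the desired result, provided we can show that $P(\hat{q}_\gamma \notin [-L, L]) = o(1)$ as $n \to \infty$. Since $q_\gamma \in (-L, L)$, the desired property follows from consistency of $\hat{q}_\gamma$ as an estimator for $q_\gamma$. This will be shown at the end of this proof.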

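Notice that by assumption (iv), for any given $\varepsilon > 0$ and $n$ large enough there exists a constant $C_\varepsilon > 0$ such that with

$$A_n(\varepsilon) := \big\{ \|\hat{\theta}_{n,k}(u) - \theta_k(u)\|_n \ge C_\varepsilon m_n^{-1} \text{ for all } k = 1, \ldots, p \big\}$$

we have $P(A_n(\varepsilon)) \le \varepsilon$. Thus, on $A_n^{\complement}(\varepsilon)$ we can assume that $\|\mathbf{g}\|_n \le C_\varepsilon m_n^{-1}$, which means that on $A_n^{\complement}(\varepsilon)$ we can assume $d\big( (\alpha, z, \mathbf{g}), (\alpha, z, \mathbf{0}) \big) \le C_\varepsilon m_n^{-1}$. Thus, on $A_n^{\complement}(\varepsilon)$,

$$\sup_{\alpha \in [0,1],\, z \in [-L, L],\, \mathbf{g} \in \mathcal{G}^p} |\nu_n(\alpha, z, \mathbf{g}) - \nu_n(\alpha, z, \mathbf{0})| \le \sup_{d(h_1, h_2) \le C_\varepsilon m_n^{-1}} |\nu_n(h_1) - \nu_n(h_2)|.$$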

It is shown in the proof of Theorem 1 that $\frac{1}{n} \sum_{s=-p}^{n} Y_s^2 = O_P(1)$, and thus $P(F_n) = o(1)$. By definition of $T_{n1}(\alpha, z, \mathbf{g})$ it remains to show that for $\eta > 0$ we have

$$P\Big( \sup_{d(h_1, h_2) \le C_\varepsilon m_n^{-1}} |\nu_n(h_1) - \nu_n(h_2)| \ge \eta,\ F_n \Big) = o(1). \tag{38}$$

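This, however, is an immediate application of Theorem 1, and (36) is verified. Now we consider (37). Again we assume that we are on $A_n^{\complement}(\varepsilon)$. We have already seen in (35) that

$$\sqrt{n}\, T_{n2}(\alpha, z, \mathbf{g}) = \frac{1}{\sqrt{n}} \sum_{s=1}^{\lceil n\alpha \rceil} f_{a + \frac{s}{T}}(z)\, \big( \mathbf{g}(\tfrac{s}{T}) \big)' \mathbf{Y}_{s-1} + \frac{1}{\sqrt{n}} \sum_{s=1}^{\lceil n\alpha \rceil} \big( f_{a + \frac{s}{T}}(\zeta_s) - f_{a + \frac{s}{T}}(z) \big)\, \big( \mathbf{g}(\tfrac{s}{T}) \big)' \mathbf{Y}_{s-1} \tag{39}$$

with $\zeta_s$ between $z$ and $z + \big( \mathbf{g}(\tfrac{s}{T}) \big)' \mathbf{Y}_{s-1}$. The second term in (39) will be treated below. The first sum on the right hand side can be written as $\sum_{k=1}^{p} Z_{k,n}(h)$ where

$$Z_{k,n}(h) = \frac{1}{\sqrt{n}} \sum_{s=1}^{n} h\big( \tfrac{s}{T} \big)\, Y_{s-k}, \quad k = 1, \ldots, p, \tag{40}$$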

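for $h \in \mathcal{H}$ with $\mathcal{H} = \{ h_{\alpha, z, g}(u) = \mathbf{1}_\alpha(u)\, f_u(z)\, g(u),\ \alpha \in [0, 1],\ z \in \mathbb{R},\ g \in \mathcal{G} \}$, where we use the shorthand notation $\mathbf{1}_\alpha(u) = \mathbf{1}(u \le \alpha b)$. We will apply Theorem 3 to show that each $Z_{k,n}(h)$ tends to zero uniformly in $h \in \mathcal{H}$. Since by our assumptions the functions $\{ f_u(z),\ z \in [-L, L] \}$ are uniformly bounded, we have $\sup_{\alpha, z, g} \|h_{\alpha, z, g}\|_n^2 \le \sup_{u, z} |f_u(z)|\, m_n^{-1} = o(1)$. Since $\mathrm{E}\, Z_n(h_{\alpha, z, g}) = 0$, Theorem 3 implies the result if we have shown that $\mathcal{H}$ has a finite covering integral.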
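For concreteness, the weighted sum in (40) can be computed as in the following minimal sketch. The function name `weighted_sum` and the storage convention for the pre-sample values are assumptions made for this illustration only; the example takes $a = 0$, $b = 1$, so that $n = T$.

```python
import numpy as np


def weighted_sum(y, h, k, T):
    """Compute n**-0.5 * sum_{s=1}^{n} h(s/T) * Y_{s-k}.

    Storage convention (an assumption of this sketch): `y` holds
    Y_{1-k}, ..., Y_0, Y_1, ..., Y_n, so that n = len(y) - k and
    Y_{s-k} sits at index s - 1.
    """
    y = np.asarray(y, dtype=float)
    n = y.size - k
    s = np.arange(1, n + 1)
    lagged = y[:n]                      # lagged[s-1] = Y_{s-k}
    return np.sum(h(s / T) * lagged) / np.sqrt(n)


y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # hypothetical Y_0, ..., Y_4
z_const = weighted_sum(y, lambda u: np.ones_like(u), k=1, T=4)
# Indicator weight 1(u <= 0.5), mimicking the factor 1_alpha(u) in the class H.
z_indic = weighted_sum(y, lambda u: (u <= 0.5).astype(float), k=1, T=4)
```

The indicator weight shows how the sequential parameter $\alpha$ enters through the function $h$ rather than through the upper summation limit.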

The finiteness of the covering integral of H with respect to ‖ · ‖n follows by standard argu-

ments. In fact, it is not difficult to see that for some C0 > 0

logN(C1δ,H) ≤ −C0 log ε+ logN(ε,G). (41)

Our assumptions now allow an application of Theorem 3, showing that the first term of (39)

converges to zero in probability uniformly in (α, z, g).


Now we treat the second term in (39). Recall that our assumptions imply that the functions

{fu, u ∈ [0, 1]} are uniformly Lipschitz continuous with Lipschitz constant c, say. Therefore

we can estimate the last term in (39) by

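$$\begin{aligned}
\frac{c}{\sqrt{n}} \sum_{s=1}^{n} \Big| \big( \mathbf{g}(\tfrac{s}{T}) \big)' \mathbf{Y}_{s-1} \Big|^2
&\le \frac{c}{\sqrt{n}} \sum_{s=1}^{n} \Big( \sum_{k=1}^{p} g_k^2\big( \tfrac{s}{T} \big) \sum_{j=0}^{p} Y_{s-j}^2 \Big) \\
&\le c\, p \sup_{-p \le t \le T} Y_t^2\; \sqrt{n} \sum_{k=1}^{p} \|g_k\|_n^2 = c\, p\, \frac{\sqrt{n}}{m_n^2}\, O_P(\log n) = o_P(1),
\end{aligned}$$

where the last inequality uses the fact that $\sup_{-p \le t \le T} Y_t^2 = O_P(\log n)$, which follows as in the proof of Lemma 5.9 of Dahlhaus and Polonik (2009).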

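Proof of $\sup_{\alpha \in [0,1]} \big| II_n(\alpha) - c(\alpha)\big( G_{n,\gamma}(1) - \mathrm{E}\, G_{n,\gamma}(1) \big) \big| = o_P(1)$. Define

$$\bar{F}_n(\alpha, z) := \mathrm{E}\, F_n(\alpha, z) = \frac{1}{n} \sum_{s=1}^{\lfloor \alpha n \rfloor} F_{a + \frac{s}{T}}(z).$$

(Recall that $F_u(z) = F\big( \tfrac{z}{\sigma(u)} \big)$.) We can write

$$II_n(\alpha) = \sqrt{n}\, \big( (F_n - \bar{F}_n)(\alpha, \hat{q}_\gamma) - (F_n - \bar{F}_n)(\alpha, q_\gamma) \big) \tag{42}$$

$$\qquad - \sqrt{n}\, \big( (F_n - \bar{F}_n)(\alpha, -\hat{q}_\gamma) - (F_n - \bar{F}_n)(\alpha, -q_\gamma) \big) \tag{43}$$

$$\qquad + \sqrt{n}\, \big( \bar{F}_n(\alpha, \hat{q}_\gamma) - \bar{F}_n(\alpha, q_\gamma) \big) - \sqrt{n}\, \big( \bar{F}_n(\alpha, -\hat{q}_\gamma) - \bar{F}_n(\alpha, -q_\gamma) \big) \tag{44}$$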

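The process $\nu_n(\alpha, z, \mathbf{0}) = \sqrt{n}\, (F_n - \bar{F}_n)(\alpha, z)$ is a sequential empirical process, or a Kiefer-Müller process, based on independent, but not necessarily identically distributed random variables. This process is asymptotically stochastically equicontinuous, uniformly in $\alpha$, with respect to $d_n(v, w) = \big| \bar{F}_n(1, v) - \bar{F}_n(1, w) \big|$, i.e. for every $\eta > 0$ there exists an $\varepsilon > 0$ with

$$\limsup_{n \to \infty} P\Big[ \sup_{\alpha \in [0,1],\ d_n(z_1, z_2) \le \varepsilon} |\nu_n(\alpha, z_1, \mathbf{0}) - \nu_n(\alpha, z_2, \mathbf{0})| > \eta \Big] = 0. \tag{45}$$

In fact, with $d_n\big( (\alpha_1, z_1), (\alpha_2, z_2) \big) = |\alpha_1 - \alpha_2| + d_n(z_1, z_2)$ we have

$$\sup_{\alpha \in [0,1]}\ \sup_{z_1, z_2 \in \mathbb{R},\ d_n(z_1, z_2) \le \varepsilon} |\nu_n(\alpha, z_1, \mathbf{0}) - \nu_n(\alpha, z_2, \mathbf{0})| \le \sup_{\substack{\alpha_1, \alpha_2 \in [0,1],\ z_1, z_2 \in \mathbb{R} \\ d_n( (\alpha_1, z_1), (\alpha_2, z_2) ) \le \varepsilon}} |\nu_n(\alpha_1, z_1, \mathbf{0}) - \nu_n(\alpha_2, z_2, \mathbf{0})|.$$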


Thus, (45) follows from asymptotic stochastic dn-equicontinuity of νn(α, z,0). This in turn

follows from a proof similar to, but simpler than, the proof of Lemma 1. In fact, it can be

seen from (19) that for $\mathbf{g}_1 = \mathbf{g}_2 = \mathbf{0}$ we simply can use the metric $d\big( (\alpha_1, z_1), (\alpha_2, z_2) \big) = |\alpha_1 - \alpha_2| + d_n(z_1, z_2)$ in the estimation of the quadratic variation, which in the simple case of

g1 = g2 = 0 amounts to the estimation of the variance, because the randomness only comes

in through the εs. With this modification the proof of the dn-equicontinuity of νn(α, z1,0)

follows the proof of Lemma 1.

Thus, if $\hat{q}_\gamma$ is consistent for $q_\gamma$ with respect to $d_n$, then it follows that both (42) and (43) are $o_P(1)$. We now prove this consistency of $\hat{q}_\gamma$.

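First observe that $\frac{1}{b} \int_0^b F\big( \tfrac{q_\gamma}{\sigma(v)} \big)\, dv = \int_0^1 F\big( \tfrac{q_\gamma}{\sigma(b u)} \big)\, du$ is close to $\bar{F}_n(1, q_\gamma)$. In fact, since by assumption $u \mapsto F\big( \tfrac{q_\gamma}{\sigma(u)} \big)$ is of bounded variation, the difference is of the order $O(1/n)$. Consequently we have

$$\Big| (1 - \gamma) - \big( \bar{F}_n(1, q_\gamma) - \bar{F}_n(1, -q_\gamma) \big) \Big| = \Big| \Psi_{0,b}(1, q_\gamma) - \big( \bar{F}_n(1, q_\gamma) - \bar{F}_n(1, -q_\gamma) \big) \Big| \le c\, n^{-1} \tag{46}$$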

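for some $c \ge 0$. In fact (46) holds uniformly in $\gamma$. This follows from the fact that the functions $u \mapsto F_u\big( \tfrac{z}{\sigma(u)} \big)$ are Lipschitz continuous uniformly in $z$. (Notice that in the case where $\sigma(\cdot) \equiv \sigma_0$ is constant on $[a, b]$ then $\Psi_{0,b}(z) = \int_0^1 F\big( \tfrac{z}{\sigma(b u)} \big)\, du - \int_0^1 F\big( \tfrac{-z}{\sigma(b u)} \big)\, du = F\big( \tfrac{z}{\sigma_0} \big) - F\big( \tfrac{-z}{\sigma_0} \big) = \bar{F}_n(1, z) - \bar{F}_n(1, -z)$. In other words, in this case we can choose $c = 0$.) We now show that under our assumptions we have for any fixed $0 < \gamma < 1$ that

$$d_n(\hat{q}_\gamma, q_\gamma) = \big| \bar{F}_n(1, \hat{q}_\gamma) - \bar{F}_n(1, q_\gamma) \big| = o_P(1). \tag{47}$$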

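By assumption $\Psi_{0,b}(z)$ is a strictly monotonic function in $z$. Together with (46) this implies that (47) is equivalent to

$$\Big| \big( \bar{F}_n(1, \hat{q}_\gamma) - \bar{F}_n(1, -\hat{q}_\gamma) \big) - \big( \bar{F}_n(1, q_\gamma) - \bar{F}_n(1, -q_\gamma) \big) \Big| = o_P(1),$$

which (by using (46)) follows from

$$\Big| \big( \bar{F}_n(1, \hat{q}_\gamma) - \bar{F}_n(1, -\hat{q}_\gamma) \big) - (1 - \gamma) \Big| = o_P(1). \tag{48}$$

Since by definition of $\hat{q}_\gamma$ we have $H_n(1, \hat{q}_\gamma) - H_n(1, -\hat{q}_\gamma) = 1 - \gamma$, (48) follows from

$$\sup_z |H_n(1, z) - \bar{F}_n(1, z)| = o_P(1). \tag{49}$$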


To see (49) notice that $\sup_z |H_n(1, z) - F_n(1, z)| = o_P(1)$. This follows from (32), (34), (36) and (37). Utilizing the triangle inequality it remains to show that $\sup_z |F_n(1, z) - \bar{F}_n(1, z)| = o_P(1)$. This uniform law of large numbers result follows from the arguments given below. It can also be seen directly by observing that for every fixed $z$ we have $|F_n(1, z) - \bar{F}_n(1, z)| = o_P(1)$ (which easily follows by observing that the variance of this quantity tends to zero as $n \to \infty$) together with a standard argument as in the proof of the classical Glivenko-Cantelli theorem for continuous random variables, utilizing monotonicity of $\bar{F}_n(1, z)$ and $F_n(1, z)$.
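The uniform law of large numbers for independent but not identically distributed observations can be illustrated by a small simulation. The sketch below is only an illustration under stated assumptions: Gaussian innovations, the hypothetical volatility function $\sigma(u) = 1 + u$, $a = 0$, $b = 1$ and $n = T$; the supremum is approximated on a finite grid.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 2000
u = np.arange(1, n + 1) / n
sigma = 1.0 + u                       # hypothetical volatility function
x = sigma * rng.standard_normal(n)    # independent, not identically distributed

# Standard normal cdf, applied elementwise.
norm_cdf = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))

grid = np.linspace(-8.0, 8.0, 161)
# Empirical cdf F_n(1, z) and averaged cdf bar F_n(1, z) on the grid.
f_emp = np.array([np.mean(x <= z) for z in grid])
f_bar = np.array([np.mean(norm_cdf(z / sigma)) for z in grid])
sup_dist = float(np.max(np.abs(f_emp - f_bar)))
```

Rerunning with larger `n` should make `sup_dist` smaller, in line with the $o_P(1)$ statement; the grid-based maximum is a lower bound for the true supremum.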

This completes the proof of (47), and as outlined above, this implies that both (42) and (43)

are oP (1), uniformly in α.

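It remains to consider the quantity in (44). First, we derive an upper bound for $\hat{q}_\gamma$. Let $B_n(\varepsilon) = \big\{ |\bar{F}_n(\hat{q}_\gamma) - \bar{F}_n(q_\gamma)| < \varepsilon;\ \sup_z |(H_n - \bar{F}_n)(1, z)| < \delta_\varepsilon / 6 \big\}$ where $\delta_\varepsilon > 0$ is such that $|\bar{F}_n(q_\gamma \pm \delta_\varepsilon) - \bar{F}_n(q_\gamma)| < \varepsilon$. The above shows that $P(B_n(\varepsilon)) \to 1$ as $n \to \infty$ for any $\varepsilon > 0$. Now choose $\varepsilon > 0$ small enough (such that $\gamma > \delta_\varepsilon / 6$ in order for the below to be well defined). On $B_n(\varepsilon)$ we have

$$\begin{aligned}
\hat{q}_\gamma &= \inf\big\{ z \ge 0 : H_n(1, z) - H_n(1, -z) \ge 1 - \gamma;\ |\bar{F}_n(z) - \bar{F}_n(q_\gamma)| < \varepsilon \big\} \\
&\le \inf\Big\{ z \ge 0 : \bar{F}_n(1, z) - \bar{F}_n(1, -z) \ge 1 - \gamma - \big[ (H_n - \bar{F}_n)(1, q_\gamma) - (H_n - \bar{F}_n)(1, -q_\gamma) \big] \\
&\qquad\qquad + 2 \sup_{|\bar{F}_n(v) - \bar{F}_n(w)| \le \varepsilon} \big| (H_n - \bar{F}_n)(1, v) - (H_n - \bar{F}_n)(1, w) \big| \Big\} \\
&\le \inf\Big\{ z \ge 0 : \Psi_{0,b}(z) \ge 1 - \gamma - \big[ (H_n - \bar{F}_n)(1, q_\gamma) - (H_n - \bar{F}_n)(1, -q_\gamma) \big] \\
&\qquad\qquad + 2 \sup_{|\bar{F}_n(v) - \bar{F}_n(w)| \le \varepsilon} \big| (H_n - \bar{F}_n)(1, v) - (H_n - \bar{F}_n)(1, w) \big| + c\, n^{-1} \Big\} \\
&= \Psi_{0,b}^{-1}(1 - \gamma - Q_n + r_n),
\end{aligned}$$

where for short $r_n = 2 \sup_{|\bar{F}_n(v) - \bar{F}_n(w)| \le \varepsilon} \big| (H_n - \bar{F}_n)(1, v) - (H_n - \bar{F}_n)(1, w) \big| + c\, n^{-1}$ with $c$ from (46), and $Q_n = \big[ (H_n - \bar{F}_n)(1, q_\gamma) - (H_n - \bar{F}_n)(1, -q_\gamma) \big]$. Now let

$$\Psi_{0,b}(\alpha, z) = \int_0^\alpha F\big( \tfrac{z}{\sigma(b u)} \big)\, du - \int_0^\alpha F\big( \tfrac{-z}{\sigma(b u)} \big)\, du, \quad z \ge 0,\ \alpha \in [0, 1]. \tag{50}$$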

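Observe that $\Psi_{0,b}(z) = \Psi_{0,b}(1, z)$. For ease of notation we omit the subscripts on $\Psi$ in what follows. Since by definition of $q_\gamma$ we have $q_\gamma = \Psi^{-1}(1 - \gamma)$, and since $\Psi(\alpha, z)$ is strictly increasing in $z$ for any $\alpha$, we have on $B_n(\varepsilon)$,

$$\begin{aligned}
\big[ \bar{F}_n(\alpha, \hat{q}_\gamma) - \bar{F}_n(\alpha, q_\gamma) \big] &- \big[ \bar{F}_n(\alpha, -\hat{q}_\gamma) - \bar{F}_n(\alpha, -q_\gamma) \big] \\
&= \Psi(\alpha, \hat{q}_\gamma) - \Psi(\alpha, q_\gamma) + c\, n^{-1} \\
&\le \Psi\big( \alpha, \Psi^{-1}(1 - \gamma - Q_n + r_n) \big) - \Psi\big( \alpha, \Psi^{-1}(1 - \gamma) \big) + c\, n^{-1} \\
&= - \frac{\frac{\partial}{\partial z} \Psi\big( \alpha, \Psi^{-1}(\xi_n^+) \big)}{\Psi'\big( \Psi^{-1}(\xi_n^+) \big)}\, (Q_n - r_n) + c\, n^{-1} \\
&= - \frac{\int_0^{b\alpha} \big[ f_u(\Psi^{-1}(\xi_n^+)) + f_u(-\Psi^{-1}(\xi_n^+)) \big]\, du}{\int_0^{b} \big[ f_u(\Psi^{-1}(\xi_n^+)) + f_u(-\Psi^{-1}(\xi_n^+)) \big]\, du}\, (Q_n - r_n) + c\, n^{-1} \\
&=: - \big( c(\alpha) + o_P(1) \big)\, (Q_n - r_n) + c\, n^{-1}
\end{aligned}$$

with $\xi_n^+ \in [1 - \gamma,\ 1 - \gamma - Q_n + r_n]$, $f_u(z) = \frac{\partial}{\partial z} F_u(z) = \frac{1}{\sigma(u)} f\big( \tfrac{z}{\sigma(u)} \big)$, and the $o_P(1)$-term equals $c_n(\alpha) - c(\alpha)$ with $c_n(\alpha)$ the ratio from the second to last line in the above formula, and

$$c(\alpha) = \frac{\int_0^{b\alpha} \big[ f_u(q_\gamma) + f_u(-q_\gamma) \big]\, du}{\int_0^{b} \big[ f_u(q_\gamma) + f_u(-q_\gamma) \big]\, du}.$$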
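The ratio $c(\alpha)$ is easy to evaluate numerically once $\sigma$ and the innovation density $f$ are specified. The following minimal sketch assumes a standard normal $f$ and uses a simple trapezoidal rule; the function name `c_alpha`, the choice of density, and the example volatility functions are assumptions of this illustration, not part of the paper.

```python
import math
import numpy as np


def c_alpha(alpha, q, sigma, b=1.0, num=2001):
    """Ratio of the integrals of f_u(q) + f_u(-q) over [0, b*alpha] and [0, b].

    Here f_u(z) = f(z / sigma(u)) / sigma(u) with f the standard normal
    pdf (the Gaussian choice is an assumption of this sketch), and
    `sigma` must accept a NumPy array of u values.
    """
    def integrand(u):
        s = sigma(u)
        dens = lambda z: np.exp(-0.5 * (z / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        return dens(q) + dens(-q)

    def integral(upper):
        grid = np.linspace(0.0, upper, num)
        vals = integrand(grid)
        # Trapezoidal rule on an equally spaced grid.
        return np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0

    return integral(b * alpha) / integral(b)


# Constant volatility: f_u does not depend on u, so the ratio reduces to alpha.
c_const = c_alpha(0.3, q=1.0, sigma=lambda u: np.full_like(u, 2.0))
# Hypothetical non-constant volatility sigma(u) = 1 + u.
c_vary = c_alpha(0.3, q=1.0, sigma=lambda u: 1.0 + u)
```

For constant `sigma` the integrand is constant in $u$, so the ratio equals $\alpha$; for a non-constant `sigma` it generally differs from $\alpha$.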

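The fact that $|c_n(\alpha) - c(\alpha)| = o_P(1)$ follows from our smoothness assumptions together with the fact that $|Q_n - r_n| = o_P(1)$. Similarly, we obtain a lower bound of the form

$$\big[ \bar{F}_n(\alpha, \hat{q}_\gamma) - \bar{F}_n(\alpha, q_\gamma) \big] - \big[ \bar{F}_n(\alpha, -\hat{q}_\gamma) - \bar{F}_n(\alpha, -q_\gamma) \big] \ge - \big( c(\alpha) + o_P(1) \big)\, (Q_n + r_n) - c\, n^{-1}.$$

From the above we know that $\sqrt{n}\, Q_n = \sqrt{n}\, (F_n - \bar{F}_n)(q_\gamma) - \sqrt{n}\, (F_n - \bar{F}_n)(-q_\gamma) + o_P(1)$ and $\sqrt{n}\, r_n = o_P(1)$, so that

$$\begin{aligned}
\sqrt{n}\, \Big( \big[ \bar{F}_n(\alpha, \hat{q}_\gamma) - \bar{F}_n(\alpha, q_\gamma) \big] &- \big[ \bar{F}_n(\alpha, -\hat{q}_\gamma) - \bar{F}_n(\alpha, -q_\gamma) \big] \Big) \\
&= - \frac{c(\alpha)}{\sqrt{n}} \sum_{s=1}^{n} \Big( \mathbf{1}\big( \sigma^2(\tfrac{s}{T})\, \varepsilon_s^2 \le q_\gamma^2 \big) - P\big( \sigma^2(\tfrac{s}{T})\, \varepsilon_s^2 \le q_\gamma^2 \big) \Big) + o_P(1) \\
&= - c(\alpha)\, \sqrt{n}\, \big( G_{n,\gamma}(1) - \mathrm{E}\, G_{n,\gamma}(1) \big) + o_P(1).
\end{aligned}$$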

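Finally, observe that under $H_0$, where $\sigma(u) \equiv \sigma_0$, we have $c(\alpha) = \alpha$, because $f_u$ does not depend on $u \in [a, b]$. The proof is complete once we have shown that $|\hat{q}_\gamma - q_\gamma| = o_P(1)$ as $n \to \infty$ (cf. comment right after (37)). To see that, first notice that our regularity conditions imply that $\inf_u f_u(q_\gamma) =: d^* > 0$. Again using the fact that our assumptions imply the uniform Lipschitz continuity of the functions $z \mapsto f_u(z)$, we obtain the existence of an $\varepsilon_0 > 0$ such that $f_u(z) > \frac{d^*}{2}$ for all $|z - q_\gamma| \le \varepsilon_0$ and all $u \in [0, 1]$. Consequently, if $|\hat{q}_\gamma - q_\gamma| \le \varepsilon_0$ then

$$\big| F_{\frac{s}{T}}(\hat{q}_\gamma) - F_{\frac{s}{T}}(q_\gamma) \big| = \Big| \int_{q_\gamma}^{\hat{q}_\gamma} f_{\frac{s}{T}}(u)\, du \Big| \ge \frac{d^*}{2}\, |\hat{q}_\gamma - q_\gamma| \quad \forall\ s = 0, 1, \ldots, n.$$

On the other hand, if $|\hat{q}_\gamma - q_\gamma| > \varepsilon_0$ then, because the $f_u(z)$ are all non-negative,

$$\big| F_{\frac{s}{T}}(\hat{q}_\gamma) - F_{\frac{s}{T}}(q_\gamma) \big| = \Big| \int_{q_\gamma}^{\hat{q}_\gamma} f_{\frac{s}{T}}(u)\, du \Big| \ge \frac{d^*}{2}\, \varepsilon_0 \quad \forall\ s = 0, 1, \ldots, n.$$

Thus, for any $\varepsilon > 0$ there exists an $\eta > 0$ with

$$\{ |\hat{q}_\gamma - q_\gamma| > \varepsilon \} \subset \big\{ \big| F_{\frac{s}{T}}(\hat{q}_\gamma) - F_{\frac{s}{T}}(q_\gamma) \big| > \eta \ \ \forall\ s = 0, 1, \ldots, n \big\}. \tag{51}$$

Since all the functions $z \mapsto F_{\frac{s}{T}}(z)$ are strictly monotone, we have

$$d_n(\hat{q}_\gamma, q_\gamma) = \big| \bar{F}_n(\hat{q}_\gamma) - \bar{F}_n(q_\gamma) \big| = \frac{1}{n} \sum_{s=1}^{n} \big| F_{\frac{s}{T}}(\hat{q}_\gamma) - F_{\frac{s}{T}}(q_\gamma) \big|.$$

Consequently, (51) implies that if $|\hat{q}_\gamma - q_\gamma|$ is not $o_P(1)$, then also $d_n(\hat{q}_\gamma, q_\gamma)$ is not $o_P(1)$. This is a contradiction to (47) and this completes our proof.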

6 References:

AKRITAS, M.G. and VAN KEILEGOM, I. (2001): Non-parametric estimation of the residual distribution. Scand. J. Statist. 28, 549-567.

ALEXANDER, K.S. and PYKE, R. (1986): A uniform central limit theorem for set-indexed

partial-sum processes with finite variance. Ann. Probab. 14, 582-597.

CHANDLER, G. and POLONIK, W. (2006). Discrimination of locally stationary time series

based on the excess mass functional. J. Amer. Statist. Assoc. 101, 240-253.

CHANDLER, G. and POLONIK, W. (2010): A test of modality for the variance function

in time-varying autoregression. Submitted.

CHENG, F. (2005). Asymptotic distributions of error density and distribution function estimators in nonparametric regression. J. Statist. Plann. Inference 128, 327-349.

DAHLHAUS, R. (1997). Fitting time series models to nonstationary processes. Ann. Statist.

25, 1-37.

DAHLHAUS, R., NEUMANN, M. and V. SACHS, R. (1999). Nonlinear wavelet estimation

of time-varying autoregressive processes. Bernoulli, 5, 873-906.

DAHLHAUS, R. and POLONIK, W. (2006). Nonparametric quasi-maximum likelihood estimation for Gaussian locally stationary processes. Ann. Statist. 34, 2790-2824.


DAHLHAUS, R. and POLONIK, W. (2009). Empirical spectral processes for locally stationary time series. Bernoulli 15, 1-39.

DREES, H. and STARICA, C. (2002). A simple non-stationary model for stock returns.

Preprint.

FRYZLEWICZ, P., SAPATINAS, T., and SUBBA RAO, S. (2006). A Haar-Fisz technique

for locally stationary volatility estimation. Biometrika 93, 687-704.

EOM, K.B. (1999). Analysis of acoustic signatures from moving vehicles using time-varying

autoregressive models, Multidimensional Systems and Signal Processing 10, 357-378.

GIRAULT, J-M., OSSANT, F., OUAHABI, A., KOUAME, D. and PATAT, F. (1998). Time-varying autoregressive spectral estimation for ultrasound attenuation in tissue characterization. IEEE Trans. Ultrasonics, Ferroelectrics, and Frequency Control 45, 650-659.

GRENIER, Y. (1983): Time-dependent ARMA modeling of nonstationary signals, IEEE Trans. on Acoustics, Speech, and Signal Processing 31, 899-911.

HALL, M.G., OPPENHEIM, A.V. and WILLSKY, A.S. (1983), Time varying parametric modeling of speech, Signal Processing 5, 267-285.

HORVATH, L., KOKOSZKA, P., TEYSSIERE, G. (2001): Empirical process of the squared residuals of an ARCH sequence. Ann. Statist. 29, 445-469.

KOUL, H. L.. (2002). Weighted empirical processes in dynamic nonlinear models, 2nd edn.,

Lecture Notes in Statistics, 166, Springer, New York.

KOUL, H.L., LING, S. (2006): Fitting an error distribution in some heteroscedastic time

series models. Ann. Statist. 34, 994-1012.

MULLER, U., SCHICK, A. and WEFELMEYER, W. (2008): Estimating the innovation distribution in nonparametric autoregression. Probab. Theory Relat. Fields 144, 53-77.

LAIB, N., LEMDANI, M. and OULD-SAID, E. (2008): On residual empirical processes of GARCH-SM models: Application to conditional symmetry tests. J. Time Ser. Anal. 29, 762-782.

NICKL, R. and POTSCHER, B.M. (2007). Bracketing metric entropy rates and empirical central limit theorems for function classes of Besov- and Sobolev-type. J. Theor. Probab. 20, 177-199.

ORBE, S., FERREIRA, E., and RODRIGUEZ-POO, J. (2005). Nonparametric estimation of time varying parameters under shape restrictions. J. Econometrics 126, 53-77.

PRIESTLEY, M.B. (1965). Evolutionary spectra and non-stationary processes. J. Royal

Statist. Soc. - Ser. B 27, 204-237.

RAJAN, J.J. and RAYNER, P.J. (1996). Generalized feature extraction for time-varying autoregressive models, IEEE Trans. on Signal Processing 44, 2498-2507.

STUTE, W. (2001). Residual analysis for ARCH(p)-time series. Test 1, 393-403.

SUBBA RAO, T. (1970). The fitting of nonstationary time-series models with time-dependent parameters, J. Royal Stat. Soc. - Ser. B 32, 312-322.

VAN DE GEER, S. (2000): Empirical processes in M-estimation. Cambridge University

Press.

VAN DER VAART, A.W. and WELLNER, J. (1996). Weak convergence and empirical

processes. Springer, New York.
