OPTIMAL BANDWIDTH SELECTION IN HETEROSKEDASTICITY-AUTOCORRELATION ROBUST TESTING

BY YIXIAO SUN, PETER C. B. PHILLIPS, AND SAINAN JIN

COWLES FOUNDATION PAPER NO. 1221

COWLES FOUNDATION FOR RESEARCH IN ECONOMICS, YALE UNIVERSITY
Box 208281, New Haven, Connecticut 06520-8281

2007

http://cowles.econ.yale.edu/


http://www.econometricsociety.org/

Econometrica, Vol. 76, No. 1 (January, 2008), 175–194

OPTIMAL BANDWIDTH SELECTION IN HETEROSKEDASTICITY–AUTOCORRELATION ROBUST TESTING

YIXIAO SUN
University of California, San Diego, La Jolla, CA 92093, U.S.A.

PETER C. B. PHILLIPS
Cowles Foundation, Yale University, New Haven, CT 06520, U.S.A.; University of Auckland, Auckland, New Zealand; and University of York, York, U.K.

SAINAN JIN
Guanghua School of Management, Peking University, Beijing, 100871, China

The copyright to this Article is held by the Econometric Society. It may be downloaded, printed and reproduced only for educational or research purposes, including use in course packs. No downloading or copying may be done for any commercial purpose without the explicit permission of the Econometric Society. For such commercial purposes contact the Office of the Econometric Society (contact information may be found at the website http://www.econometricsociety.org or in the back cover of Econometrica). This statement must be included on all copies of this Article that are made available electronically or in any other format.


BY YIXIAO SUN, PETER C. B. PHILLIPS, AND SAINAN JIN1

This paper considers studentized tests in time series regressions with nonparametrically autocorrelated errors. The studentization is based on robust standard errors with truncation lag M = bT for some constant b ∈ (0, 1] and sample size T. It is shown that the nonstandard fixed-b limit distributions of such nonparametrically studentized tests provide more accurate approximations to the finite sample distributions than the standard small-b limit distribution. We further show that, for typical economic time series, the optimal bandwidth that minimizes a weighted average of type I and type II errors is larger by an order of magnitude than the bandwidth that minimizes the asymptotic mean squared error of the corresponding long-run variance estimator. A plug-in procedure for implementing this optimal bandwidth is suggested and simulations (not reported here) confirm that the new plug-in procedure works well in finite samples.

KEYWORDS: Asymptotic expansion, bandwidth choice, kernel method, long-run variance, loss function, nonstandard asymptotics, robust standard error, type I and type II errors.

1. INTRODUCTION

IN TIME SERIES REGRESSIONS with autocorrelation of unknown form, the standard errors of regression coefficients are usually estimated nonparametrically by kernel-based methods that involve some smoothing over the sample autocovariances. The underlying smoothing parameter (b) may be defined as the ratio of the bandwidth to the sample size and is an important tuning parameter that determines the size and power properties of the associated test. It therefore seems sensible that the choice of b should take these properties into account. However, in conventional approaches (e.g., Andrews (1991); Newey and West (1987, 1994)) and most practical software, the parameter b is chosen to minimize the asymptotic mean squared error (AMSE) of the long-run variance (LRV) estimator. This approach follows what has long been standard practice in the context of spectral estimation (Grenander and Rosenblatt (1957)), where the focus of attention is the spectrum (or, in the present context, the LRV). Such a choice of the smoothing parameter is designed to be optimal in the AMSE sense for estimation of the relevant quantity (here, the asymptotic standard error or LRV), but is not necessarily best suited for hypothesis testing.

¹An earlier version of the paper is available as Sun, Phillips, and Jin (2006). The present version is a substantially shorter paper and readers are referred to the earlier version for further details, discussion, proofs, and some asymptotic expansion results of independent interest. The authors thank Whitney Newey, two anonymous referees, and many seminar participants for comments on earlier versions. Phillips acknowledges partial research support from the Kelly Foundation and the NSF under grant SES 04-142254. Jin acknowledges financial support from the NSFC (grant 70601001).



In contrast to the above convention, the present paper develops a new approach to choosing the smoothing parameter. We focus on two-sided tests and consider choosing b to optimize a loss function that involves a weighted average of the type I and type II errors, a criterion that addresses the central concerns of interest in hypothesis testing, balancing possible size distortion against possible power loss in the use of different bandwidths. This new approach to automatic bandwidth selection requires improved measurement of type I and type II errors, which are provided here by means of asymptotic expansions of both the finite sample distribution of the test statistic and the nonstandard limit distribution.

We first examine the asymptotic properties of the statistical test under different choices of b. Using a Gaussian location model, we show that the distribution of the conventionally constructed t statistic is closer to its limit distribution derived under the fixed-b asymptotics than that derived under the small-b asymptotics. More specifically, when b is fixed, the error in the rejection probability (ERP) of the nonstandard t-test is O(T^{-1}), while that of the standard normal test is O(1). The result is related to that of Jansson (2004), who showed that the ERP of the nonstandard test based on the Bartlett kernel with b = 1 is O(log T/T). Our result strengthens and generalizes Jansson's result in two aspects. First, we show that the log(T) factor can be dropped. Second, while Jansson's result applies only to the Bartlett kernel with b = 1, our result applies to more general kernels with both b = 1 and b < 1. On the other hand, when b is decreasing with the sample size, the ERP of the nonstandard t-test is smaller than that of the standard normal test, although they are of the same order of magnitude. As a consequence, the nonstandard t-test has less size distortion than the standard normal test. Our analytical findings here support earlier simulation results in Kiefer, Vogelsang, and Bunzel (2000, hereafter KVB), Kiefer and Vogelsang (2002a, 2002b, hereafter KV), Gonçalves and Vogelsang (2005), and our own work (Phillips, Sun, and Jin (2006, 2007), hereafter PSJ).

The theoretical development is based on two high-order asymptotic expansions. The first is the expansion of the nonstandard limiting distribution around its limiting chi-squared form. The second is the expansion of the finite sample distribution of the t statistic under conventional joint limits where the sample size T → ∞ and b → 0 simultaneously. These higher-order expansions enable us to develop improved approximations to the type I and type II errors. For typical economic time series, the dominant term in the type I error increases as b decreases, while the dominant term in the type II error decreases as b decreases. Since the desirable effects on these two types of errors generally work in opposing directions, there is an opportunity to trade off these effects. Accordingly, we construct a loss function criterion by taking a weighted average of these two types of errors and show how b may be selected in such a way as to minimize the loss.

Our approach gives an optimal b that generally has a shrinking rate of at most b = O(T^{-q/(q+1)}) and that can even be O(1) for certain loss functions, depending on the weights that are chosen. Note that the optimal b that minimizes the asymptotic mean squared error of the corresponding LRV estimator is O(T^{-2q/(2q+1)}) (cf. Andrews (1991)). Thus, optimal values of b for LRV estimation are smaller as T → ∞ than those that are most suited for statistical testing. The fixed-b rule is obtained by attaching substantially higher weight to the type I error in the construction of the loss function. This theory therefore provides some insight into the type of loss function for which there is a decision-theoretic justification for the use of fixed-b rules in econometric testing.

To implement the optimal bandwidth selection rule for two-sided tests in a location model, users may proceed via the following steps:

1. Specify the null hypothesis H₀: β = β₀ and an alternative hypothesis H₁: |β − β₀| = c₀ > 0, where c₀ may reflect a value of scientific interest or economic significance if such a value is available. (In the absence of such a value, we recommend that the user set the default discrepancy parameter value δ = 2 in (1) below.)

2. Specify the significance level α of the test and the associated two-sided standard normal critical value z_α that satisfies Φ(z_α) = 1 − α/2.

3. Specify a weighted average loss function (see (45) below) that captures the relative importance of the type I and II errors and reflects the relative cost of false acceptance and false rejection. Normalize the weight for the type II error to be unity and let w be the weight for the type I error.

4. Estimate the model by ordinary least squares (OLS) and construct the residuals û_t.

5. Fit an AR(1) model to the estimated residuals and compute
$$\hat d = \frac{2\hat\rho}{(1-\hat\rho)^2}, \qquad \hat\sigma^2 = \frac{1}{T}\sum_{t=2}^{T}(\hat u_t - \hat\rho\,\hat u_{t-1})^2, \qquad \hat\delta = \frac{\sqrt{T}\,c_0(1-\hat\rho)}{\hat\sigma}, \tag{1}$$
where ρ̂ is the OLS estimator of the autoregressive (AR) coefficient. Set δ̂ = 2 as a default value if the user is unsure about the alternative hypothesis.

6. Specify the kernel function to be used in heteroskedasticity–autocorrelation consistent (HAC) standard error estimation. Among the commonly used positive definite kernels, we recommend a suitable second-order kernel (q = 2) such as the Parzen or quadratic spectral (QS) kernel.

7. Compute the automatic bandwidth
$$\hat b = \begin{cases}\left(\dfrac{qg\hat d\,[wG_0'(z_\alpha^2) - G_\delta'(z_\alpha^2)]}{c_2 z_\alpha^2 K_\delta(z_\alpha^2)}\right)^{1/3}T^{-2/3}, & \hat d\,[wG_0'(z_\alpha^2) - G_\delta'(z_\alpha^2)] > 0,\\[2ex] \dfrac{\log T}{T}, & \hat d\,[wG_0'(z_\alpha^2) - G_\delta'(z_\alpha^2)] \le 0, \end{cases} \tag{2}$$


where
$$g = \begin{cases}6.000 & \text{Parzen kernel},\\ 1.421 & \text{QS kernel},\end{cases} \qquad c_2 = \begin{cases}0.539 & \text{Parzen kernel},\\ 1.000 & \text{QS kernel}.\end{cases} \tag{3}$$
G_δ(·) is the cumulative distribution function (cdf) of a (non)central chi-squared variate with 1 degree of freedom and noncentrality parameter δ², and K_δ(·) is defined in equation (26).

8. Compute the HAC standard error using bandwidth M = bT and construct the usual t statistic t_b.

9. Let z_{αb} = z_α + k₃b + k₄b², where k₃ and k₄ are given in Table I. Reject the null hypothesis if |t_b| ≥ z_{αb}.

The rest of the paper is organized as follows. Section 2 reviews the first-order limit theory for the t-test as T → ∞ with the parameter b fixed and as T → ∞ with b approaching zero. Section 3 develops an asymptotic expansion of the nonstandard distribution under the null and local alternative hypotheses as b → 0 about the usual central and noncentral chi-squared distributions. Section 4 develops comparable expansions of the finite sample distribution of the statistic as T → ∞ and b → 0 at the same time. Section 5 compares the accuracy of the nonstandard approximation with that of the standard normal approximation. Section 6 proposes a selection rule for b that is suitable for implementation in semiparametric testing. The last section provides some concluding discussion. Proofs and relevant technical results are available in Sun, Phillips, and Jin (2006, 2007, hereafter SPJ).
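The nine-step procedure above can be sketched compactly. The following Python fragment is our own illustration, not the authors' code: it implements steps 4–7 for the Parzen kernel (q = 2, g = 6.000, c₂ = 0.539), evaluating G′_δ and K_δ through the Poisson-mixture representation of the noncentral chi-squared distribution. The function names, the series truncation, and the illustrative weight w = 10 are all assumptions of this sketch.

```python
import math

def _pois_weight(j, lam_half):
    # Poisson(lambda/2) weight attached to the j-th mixture term
    if lam_half == 0.0:
        return 1.0 if j == 0 else 0.0
    return math.exp(-lam_half + j * math.log(lam_half) - math.lgamma(j + 1))

def _chi2_pdf(z, df):
    # density of a central chi-squared variate with df degrees of freedom
    h = 0.5 * df
    return math.exp((h - 1.0) * math.log(z) - 0.5 * z - math.lgamma(h) - h * math.log(2.0))

def G_prime(z, delta, terms=60):
    # G'_delta(z): density of a noncentral chi-squared(1, delta^2) variate
    lam_half = 0.5 * delta ** 2
    return sum(_pois_weight(j, lam_half) * _chi2_pdf(z, 1 + 2 * j) for j in range(terms))

def K(z, delta, terms=60):
    # K_delta(z) of eq. (26): each mixture term carries an extra factor j/z
    lam_half = 0.5 * delta ** 2
    return sum(_pois_weight(j, lam_half) * _chi2_pdf(z, 1 + 2 * j) * j / z
               for j in range(1, terms))

def plugin_bandwidth(u_hat, c0=None, alpha=0.05, w=10.0, g=6.000, c2=0.539, q=2):
    """Steps 4-7 for the Parzen kernel: fit an AR(1) to the OLS residuals,
    form d-hat and delta-hat as in eq. (1), then b-hat as in eq. (2)."""
    T = len(u_hat)
    num = sum(u_hat[t] * u_hat[t - 1] for t in range(1, T))
    den = sum(x * x for x in u_hat[:-1])
    rho = num / den
    d = 2.0 * rho / (1.0 - rho) ** 2
    if c0 is None:
        delta = 2.0                                  # default discrepancy (step 1)
    else:
        sigma2 = sum((u_hat[t] - rho * u_hat[t - 1]) ** 2 for t in range(1, T)) / T
        delta = math.sqrt(T) * c0 * (1.0 - rho) / math.sqrt(sigma2)
    z = 1.959964 if alpha == 0.05 else 1.644854      # two-sided normal critical value
    z2 = z * z
    gap = d * (w * G_prime(z2, 0.0) - G_prime(z2, delta))
    if gap <= 0.0:
        return math.log(T) / T                       # fallback branch of eq. (2)
    b = (q * g * gap / (c2 * z2 * K(z2, delta))) ** (1.0 / (q + 1)) * T ** (-q / (q + 1.0))
    return min(b, 1.0)                               # keep b inside (0, 1]
```

For positively autocorrelated residuals and a moderate w, the rule returns a b of order T^{-2/3}, visibly larger than the T^{-4/5} MSE-optimal rate discussed later in the paper.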

2. HETEROSKEDASTICITY–AUTOCORRELATION ROBUST INFERENCE

Throughout the paper, we focus on inference about β in the location model,

$$y_t = \beta + u_t, \qquad t = 1, 2, \ldots, T, \tag{4}$$

where u_t is a zero mean process with a nonparametric autocorrelation structure. The nonstandard limiting distribution in this section and its asymptotic expansion in Section 3 apply to general regression models under certain conditions on the regressors; see KV (2002a, 2002b, 2005). On the other hand, the asymptotic expansion of the finite sample distribution in Section 4 applies only to the location model. Although the location model is of interest in its own right, this is a limitation of the paper and some possible extensions are discussed in Section 7.

OLS estimation of β gives β̂ = Ȳ = (1/T)Σ_{t=1}^{T} y_t, and the scaled and centered estimation error is
$$\sqrt{T}(\hat\beta - \beta) = \frac{1}{\sqrt{T}}S_T, \tag{5}$$


where S_t = Σ_{τ=1}^{t} u_τ. Let û_τ = y_τ − β̂ be the demeaned time series and let Ŝ_t = Σ_{τ=1}^{t} û_τ be the corresponding partial sum process.

The following condition is commonly used to facilitate the limit theory (e.g., KVB and Jansson (2004)).

ASSUMPTION 1: S_{[Tr]} satisfies the functional law
$$T^{-1/2}S_{[Tr]} \Rightarrow \omega W(r), \qquad r \in [0,1], \tag{6}$$
where ω² is the long-run variance of u_t and W(r) is standard Brownian motion.

Under Assumption 1,
$$T^{-1/2}\hat S_{[Tr]} \Rightarrow \omega V(r), \qquad r \in [0,1], \tag{7}$$
where V is a standard Brownian bridge process, and
$$\sqrt{T}(\hat\beta - \beta) \Rightarrow \omega W(1) = N(0, \omega^2), \tag{8}$$
which provides the usual basis for robust testing about β. It is standard empirical practice to estimate ω² using kernel-based nonparametric estimators that involve some smoothing and possibly truncation of the autocovariances. When u_t is stationary, the long-run variance of u_t is
$$\omega^2 = \gamma(0) + 2\sum_{j=1}^{\infty}\gamma(j), \tag{9}$$
where γ(j) = E(u_t u_{t−j}). Correspondingly, heteroskedasticity–autocorrelation consistent (HAC) estimates of ω² typically have the form
$$\hat\omega^2(M) = \sum_{j=-T+1}^{T-1} k\Big(\frac{j}{M}\Big)\hat\gamma(j), \qquad \hat\gamma(j) = \begin{cases}\dfrac{1}{T}\displaystyle\sum_{t=1}^{T-j}\hat u_{t+j}\hat u_t & \text{for } j \ge 0,\\[2ex] \dfrac{1}{T}\displaystyle\sum_{t=-j+1}^{T}\hat u_{t+j}\hat u_t & \text{for } j < 0,\end{cases} \tag{10}$$
involving the sample covariances γ̂(j). In (10), k(·) is some kernel function and M is a bandwidth parameter. Consistency of ω̂²(M) requires M → ∞ and M/T → 0 as T → ∞ (e.g., Andrews (1991), Newey and West (1987, 1994)).
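A direct transcription of the estimator in (10), using the Parzen kernel as an example, might look as follows (a minimal sketch with our own function names; production code would avoid the O(T²) double loop):

```python
def parzen(x):
    # Parzen kernel: positive semidefinite, with Parzen exponent q = 2
    x = abs(x)
    if x <= 0.5:
        return 1.0 - 6.0 * x ** 2 + 6.0 * x ** 3
    if x <= 1.0:
        return 2.0 * (1.0 - x) ** 3
    return 0.0

def gamma_hat(u, j):
    # sample autocovariance at lag j; u holds the demeaned observations
    T, j = len(u), abs(j)
    return sum(u[t + j] * u[t] for t in range(T - j)) / T

def lrv_hat(u, M, kernel=parzen):
    # omega-hat^2(M) = sum_j k(j/M) gamma-hat(j), as in eq. (10)
    T = len(u)
    return sum(kernel(j / M) * gamma_hat(u, j) for j in range(-T + 1, T))
```

Because the Parzen kernel is positive semidefinite (see Assumption 2(iii) below), lrv_hat is nonnegative for any input series, which is the practical reason for restricting attention to such kernels.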


To test the null H₀: β = β₀ against H₁: β ≠ β₀, the standard approach relies on a nonparametrically studentized t-ratio statistic of the form
$$t_{\hat\omega(M)} = \sqrt{T}(\hat\beta - \beta_0)/\hat\omega(M), \tag{11}$$
which is asymptotically N(0, 1). Use of t_{ω̂(M)} is convenient empirically and therefore widespread in practice, in spite of well known problems of size distortion in inference.

To reduce size distortion, KVB and KV (2005) proposed the use of kernel-based estimators of ω² in which M is set proportional to T, that is, M = bT for some b ∈ (0, 1]. In this case, the estimator ω̂² becomes
$$\hat\omega_b^2 = \sum_{j=-T+1}^{T-1}k\Big(\frac{j}{bT}\Big)\hat\gamma(j), \tag{12}$$
and the associated t statistic is given by
$$t_b = \sqrt{T}(\hat\beta - \beta_0)/\hat\omega_b. \tag{13}$$

When the parameter b is fixed as T → ∞, KV (2005) showed that under Assumption 1, ω̂²_b ⇒ ω²Ξ_b, where the limit Ξ_b is random and given by
$$\Xi_b = \int_0^1\!\!\int_0^1 k_b(r - s)\,dV(r)\,dV(s), \tag{14}$$
with k_b(·) = k(·/b), and the t_b statistic has a nonstandard limit distribution. Under the null hypothesis,
$$t_b \Rightarrow W(1)\,\Xi_b^{-1/2}, \tag{15}$$
whereas under the local alternative H₁: β = β₀ + cT^{-1/2},
$$t_b \Rightarrow (\delta + W(1))\,\Xi_b^{-1/2}, \tag{16}$$
where δ = c/ω. Thus, the t_b statistic has a nonstandard limit distribution that arises from the random limit of the LRV estimate ω̂²_b when b is fixed as T → ∞. However, as b decreases, the effect of this randomness diminishes, and when b → 0, the limit distributions under the null and local alternative both approach those of conventional regression tests with consistent LRV estimates. It is important to point out that in the nonstandard limit distribution, W(1) is independent of Ξ_b. This is because W(1) is uncorrelated with V(r) for any r ∈ [0, 1], and both W(1) and V(r) are Gaussian.


In related work, the present authors (PSJ (2006, 2007)) proposed using an estimator of ω² of the form
$$\hat\omega_\rho^2 = \sum_{j=-T+1}^{T-1}\Big[k\Big(\frac{j}{T}\Big)\Big]^{\rho}\hat\gamma(j), \tag{17}$$
which involves setting M equal to T and taking an arbitrary power ρ ≥ 1 of the traditional kernel. Statistical tests based on ω̂²_ρ and ω̂²_b share many of the same properties, which is explained by the fact that ρ and b play similar roles in the construction of the estimates. The present paper focuses on ω̂²_b and tests associated with this estimate. Ideas and methods comparable to those explored in the present paper may be pursued in the context of estimates such as ω̂²_ρ and are reported in other work (PSJ (2005a, 2005b)).

3. EXPANSION OF THE NONSTANDARD LIMIT THEORY

This section develops asymptotic expansions of the limit distributions given in (15) and (16) as the bandwidth parameter b → 0. These expansions are taken about the relevant central and noncentral chi-squared limit distributions that apply when b → 0, corresponding to the null and the local alternative hypotheses. The expansions are of some independent interest. For instance, they can be used to deliver correction terms to the limit distributions under the null, thereby providing a mechanism for adjusting the nominal critical values provided by the usual chi-squared distribution. The latter correspond to the critical values that would be used for tests based on conventional consistent LRV estimates.

The expansions and later developments in the paper make use of the following kernel conditions:

ASSUMPTION 2: (i) k(x): ℝ → [0, 1] is symmetric, piecewise smooth with k(0) = 1 and ∫₀^∞ k(x)x dx < ∞.
(ii) The Parzen characteristic exponent defined by
$$q = \max\Big\{q_0 : q_0 \in \mathbb{Z}^{+},\ g_{q_0} = \lim_{x\to 0}\frac{1 - k(x)}{|x|^{q_0}} < \infty\Big\} \tag{18}$$
is greater than or equal to 1.
(iii) k(x) is positive semidefinite, that is, for any square integrable function f(x), ∫₀^∞∫₀^∞ k(s − t)f(s)f(t) ds dt ≥ 0.

Assumption 2 imposes only mild conditions on the kernel function. All the commonly used kernels satisfy (i) and (ii). The assumption ∫₀^∞ k(x)x dx < ∞ ensures that the integrals that appear frequently in later developments are finite and validates use of the Riemann–Lebesgue lemma in proofs. The assumption of positive semidefiniteness in (iii) ensures that the associated LRV estimator is nonnegative. Commonly used kernels that are positive semidefinite include the Bartlett kernel, the Parzen kernel, and the QS kernel, which are the main focus of the present paper. For the Bartlett kernel, the Parzen characteristic exponent is 1. For the Parzen and QS kernels, the Parzen characteristic exponent is 2.
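These characteristic exponents can be checked numerically from definition (18) by evaluating (1 − k(x))/|x|^q at a small x. The kernel formulas below are the standard ones; the helper names are ours.

```python
import math

def bartlett(x):
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0

def parzen(x):
    x = abs(x)
    if x <= 0.5:
        return 1.0 - 6.0 * x ** 2 + 6.0 * x ** 3
    if x <= 1.0:
        return 2.0 * (1.0 - x) ** 3
    return 0.0

def qs(x):
    # quadratic spectral (QS) kernel
    if x == 0.0:
        return 1.0
    a = 6.0 * math.pi * x / 5.0
    return 25.0 / (12.0 * math.pi ** 2 * x ** 2) * (math.sin(a) / a - math.cos(a))

def g_q(kernel, q, x=1e-3):
    # finite-x approximation to g_q = lim_{x -> 0} (1 - k(x)) / |x|^q of eq. (18)
    return (1.0 - kernel(x)) / x ** q
```

This reproduces q = 1 with g₁ = 1 for Bartlett, and q = 2 with g₂ = 6 (Parzen) and g₂ = 18π²/125 ≈ 1.421 (QS), the constants quoted as g in step 7 of the Introduction.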

Let F_δ(z) := P{|(δ + W(1))Ξ_b^{-1/2}| ≤ z} be the nonstandard limiting distribution and let G_λ = G(·; λ²) be the cdf of a noncentral χ²₁(λ²) variate with noncentrality parameter λ². The following theorem establishes the asymptotic expansion of F_δ(z) around G_δ(z²).

THEOREM 1: Let Assumption 2 hold. Then
$$F_\delta(z) = G_\delta(z^2) + p_\delta(z^2)\,b + q_\delta(z^2)\,b^2 + o(b^2), \tag{19}$$
where the term o(b²) holds uniformly over z ∈ ℝ⁺ as b → 0,
$$p_\delta(z^2) = c_2 G''_\delta(z^2)z^4 - c_1 G'_\delta(z^2)z^2, \qquad q_\delta(z^2) = -G'_\delta(z^2)z^2 c_3 + \tfrac{1}{2}G''_\delta(z^2)z^4(c_4 - c_1^2) - G'''_\delta(z^2)z^6 c_1 c_2, \tag{20}$$
and
$$c_1 = \int_{-\infty}^{\infty}k(x)\,dx, \quad c_2 = \int_{-\infty}^{\infty}k^2(x)\,dx, \quad c_3 = -\int_{-\infty}^{\infty}k(x)|x|\,dx, \quad c_4 = -\int_{-\infty}^{\infty}k^2(x)|x|\,dx. \tag{21}$$

As is apparent from the proof given in SPJ (2007), the term c₂G″_δ(z²)z⁴b in p_δ(z²) arises from the randomness of Ξ_b, whereas the term c₁G′_δ(z²)z²b in p_δ(z²) arises from the asymptotic bias of Ξ_b. Although Ξ_b converges to 1 as b → 0, we have var(Ξ_b) = 2bc₂(1 + o(1)) and E(Ξ_b) = 1 − bc₁(1 + o(1)), as established in the proof. Ξ_b is not centered exactly at 1 because the regression errors have to be estimated. The terms in q_δ(z²) are due to the first- and second-order biases of Ξ_b, the variance of Ξ_b, and their interactions.

It follows from Theorem 1 that when δ = 0,
$$F_0(z) = D(z^2) + [c_2 D''(z^2)z^4 - c_1 D'(z^2)z^2]\,b + \Big[-D'(z^2)z^2 c_3 + \tfrac{1}{2}D''(z^2)z^4(c_4 - c_1^2) - D'''(z^2)z^6 c_1 c_2\Big]b^2 + o(b^2), \tag{22}$$


where D(·) = G₀(·) is the cdf of the χ²₁ distribution. For any α ∈ (0, 1), let z²_α ∈ ℝ⁺ and z²_{αb} ∈ ℝ⁺ be such that D(z²_α) = 1 − α and F₀(z_{αb}) = 1 − α. Then, using a Cornish–Fisher-type expansion, we can obtain high-order corrected critical values.

Before presenting the corollary below, we introduce some terminology. We call the critical value that is correct to O(b) the second-order corrected critical value and call the critical value correct to O(b²) the third-order corrected critical value. We use this convention throughout the rest of the paper. The following quantities appear in the corollary:

$$k_1 = \Big(c_1 + \frac{1}{2}c_2\Big)z_\alpha^2 + \frac{1}{2}c_2 z_\alpha^4,$$
$$k_2 = \Big(\frac{1}{2}c_1^2 + \frac{3}{2}c_1 c_2 + \frac{3}{16}c_2^2 + c_3 + \frac{1}{4}c_4\Big)z_\alpha^2 + \Big(-\frac{1}{2}c_1 + \frac{3}{2}c_1 c_2 + \frac{9}{16}c_2^2 + \frac{1}{4}c_4\Big)z_\alpha^4 + \Big(\frac{5}{16}c_2^2\Big)z_\alpha^6 - \Big(\frac{1}{16}c_2^2\Big)z_\alpha^8,$$
$$k_3 = \frac{1}{2}\Big(c_1 + \frac{1}{2}c_2\Big)z_\alpha + \frac{1}{4}c_2 z_\alpha^3,$$
$$k_4 = \Big(\frac{1}{8}c_1^2 + \frac{5}{8}c_1 c_2 + \frac{1}{16}c_2^2 + \frac{1}{2}c_3 + \frac{1}{8}c_4\Big)z_\alpha + \Big(-\frac{1}{4}c_1 + \frac{5}{8}c_1 c_2 + \frac{7}{32}c_2^2 + \frac{1}{8}c_4\Big)z_\alpha^3 + \frac{1}{8}c_2^2 z_\alpha^5 - \frac{1}{32}c_2^2 z_\alpha^7.$$

COROLLARY 2: For asymptotic chi-squared and normal tests:
(i) The second-order corrected critical values are
$$z_{\alpha b}^2 = z_\alpha^2 + k_1 b + o(b), \qquad z_{\alpha b} = z_\alpha + k_3 b + o(b). \tag{23}$$
(ii) The third-order corrected critical values are
$$z_{\alpha b}^2 = z_\alpha^2 + k_1 b + k_2 b^2 + o(b^2), \qquad z_{\alpha b} = z_\alpha + k_3 b + k_4 b^2 + o(b^2), \tag{24}$$
where z_α is the nominal critical value from the standard normal distribution.

Our later developments require only the second-order corrected critical values. The third-order corrected critical values are given here because the second-order correction is not enough to deliver a good approximation to the exact critical values when the Parzen and QS kernels are employed and b is large.

For the Bartlett, Parzen, and QS kernels, we can compute c₁, c₂, c₃, and c₄ either analytically or numerically. Table I gives the high-order corrected critical values for these three kernels. These higher order corrected critical values provide excellent approximations to the exact ones when b is smaller than 0.5 and reasonably good approximations for other values of b.

TABLE I

HIGH-ORDER CORRECTED CRITICAL VALUES
z²_{αb} = z²_α + k₁b + k₂b² + o(b²), z_{αb} = z_α + k₃b + k₄b² + o(b²)

              α = 5% (z_α = 1.960)                  α = 10% (z_α = 1.645)
           k₁       k₂       k₃      k₄          k₁      k₂       k₃      k₄
Bartlett   10.0414  16.9197  2.5616  2.6423      6.0489   9.7192  1.8386  1.9267
Parzen      7.8964   9.5481  2.0144  1.4006      4.7337   5.5670  1.4388  1.0629
QS         14.1017  38.6840  3.5974  6.5671      8.3968  21.8723  2.5522  4.6682
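As an illustration of how the entries of Table I arise, for the Bartlett kernel c₁ = ∫k = 1 and c₂ = ∫k² = 2/3, and the k₁ and k₃ entries follow directly from the formulas above Corollary 2 (a numerical sketch; the midpoint integrator is ours):

```python
import math

def bartlett(x):
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0

def integrate(f, lo, hi, n=20000):
    # simple midpoint rule, adequate for these piecewise-polynomial integrands
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

c1 = integrate(bartlett, -1.0, 1.0)                    # = 1 for Bartlett
c2 = integrate(lambda x: bartlett(x) ** 2, -1.0, 1.0)  # = 2/3 for Bartlett

z = 1.960                                              # alpha = 5%
k1 = (c1 + 0.5 * c2) * z ** 2 + 0.5 * c2 * z ** 4
k3 = 0.5 * (c1 + 0.5 * c2) * z + 0.25 * c2 * z ** 3
```

This gives k1 ≈ 10.0414 and k3 ≈ 2.5616, matching the Bartlett row of Table I; the α = 10% entries follow by setting z = 1.645.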

When δ ≠ 0 and the second-order corrected critical values are used, we can use Theorem 1 to calculate the local asymptotic power, measured by P{|(δ + W(1))Ξ_b^{-1/2}| > z_{αb}}, as in the following corollary.

COROLLARY 3: Let Assumption 2 hold. Then the local asymptotic power satisfies
$$P\big\{|(\delta + W(1))\,\Xi_b^{-1/2}| > z_{\alpha b}\big\} = 1 - G_\delta(z_\alpha^2) - c_2 z_\alpha^4 K_\delta(z_\alpha^2)\,b + o(b) \tag{25}$$
as b → 0, where
$$K_\delta(z) = \sum_{j=0}^{\infty}\frac{(\delta^2/2)^j}{j!}\,e^{-\delta^2/2}\,\frac{z^{j-1/2}e^{-z/2}}{\Gamma(j + 1/2)\,2^{j+1/2}}\cdot\frac{j}{z} \tag{26}$$
is positive for all z and δ.

According to Corollary 3, the local asymptotic test power, as measured by P{|(δ + W(1))Ξ_b^{-1/2}| > z_{αb}}, decreases monotonically with b, at least when b is small. For a given critical value, we can show that z⁴_α K_δ(z²_α) achieves its maximum around δ = 2, implying that the power increase resulting from the choice of a small b is greatest when the local alternative is in an intermediate neighborhood of the null hypothesis. For any given local alternative, the function is monotonically increasing in z_α. Therefore, the power improvement due to the choice of a small b increases with the confidence level 1 − α. This is expected. When the confidence level is higher, the test is less powerful and the room for power improvement is greater.

4. EXPANSIONS OF THE FINITE SAMPLE DISTRIBUTION

This section develops a finite sample expansion for the simple location model. This development, like that of Jansson (2004), relies on Gaussianity, which facilitates the derivations. The assumption could be relaxed by taking distributions based (for example) on Gram–Charlier expansions, but at the cost of much greater complexity (see, for example, Phillips (1980), Velasco and Robinson (2001)). The following assumption facilitates the development of the higher order expansion.

ASSUMPTION 3: u_t is a mean zero covariance-stationary Gaussian process with Σ_{h=−∞}^{∞} h²|γ(h)| < ∞, where γ(h) = E(u_t u_{t−h}).

We develop an asymptotic expansion of P{|√T(β̂ − β₀)/ω̂_b| ≤ z} for β = β₀ + c/√T. Depending on whether c is zero or not, this expansion can be used to approximate the size and power of the t-test.

Since u_t is in general autocorrelated, β̂ and ω̂_b are statistically dependent, which makes it difficult to write down the finite sample distribution of the t statistic. To tackle this difficulty, we decompose β̂ and ω̂_b into statistically independent components. Let u = (u₁, …, u_T)′, y = (y₁, …, y_T)′, l_T = (1, …, 1)′, and Ω_T = var(u). Then the generalized least squares estimator of β is β̃ = (l′_T Ω_T^{-1} l_T)^{-1} l′_T Ω_T^{-1} y and
$$\hat\beta - \beta = \tilde\beta - \beta + (l_T'l_T)^{-1}l_T'\tilde u, \tag{27}$$
where ũ = (I − l_T(l′_T Ω_T^{-1} l_T)^{-1} l′_T Ω_T^{-1})u, which is statistically independent of β̃ − β. Since ω̂²_b can be written as a quadratic form in ũ, ω̂²_b is also statistically independent of β̃ − β. Next, it is easy to see that

$$\omega_T^2 := \operatorname{var}(\sqrt{T}(\hat\beta - \beta)) = T^{-1}l_T'\Omega_T l_T = \omega^2 + O(T^{-1}), \tag{28}$$
and it follows from Grenander and Rosenblatt (1957) that
$$\tilde\omega_T^2 := \operatorname{var}(\sqrt{T}(\tilde\beta - \beta)) = T(l_T'\Omega_T^{-1}l_T)^{-1} = \omega^2 + O(T^{-1}). \tag{29}$$
Therefore T^{-1/2}l′_T ũ = N(0, O(T^{-1})). Combining this result with the independence of β̃ and (ũ, ω̂_b), we have

$$P\big\{\sqrt{T}(\hat\beta - \beta_0)/\hat\omega_b \le z\big\} \tag{30}$$
$$= P\big\{\sqrt{T}(\tilde\beta - \beta)/\tilde\omega_T + c/\tilde\omega_T \le z\hat\omega_b/\tilde\omega_T - T^{-1/2}l_T'\tilde u/\tilde\omega_T\big\}$$
$$= E\,\Phi\big(z\hat\omega_b/\tilde\omega_T - c/\tilde\omega_T - T^{-1/2}l_T'\tilde u/\tilde\omega_T\big)$$
$$= E\,\Phi\big(z\hat\omega_b/\tilde\omega_T - c/\tilde\omega_T\big) - T^{-1/2}E\big[\varphi\big(z\hat\omega_b/\tilde\omega_T - c/\tilde\omega_T\big)\,l_T'\tilde u/\tilde\omega_T\big] + O(T^{-1})$$
$$= P\big\{\sqrt{T}(\tilde\beta - \beta)/\tilde\omega_T + c/\tilde\omega_T \le z\hat\omega_b/\tilde\omega_T\big\} + O(T^{-1})$$
uniformly over z ∈ ℝ, where Φ and φ are the cdf and probability density function of the standard normal distribution, respectively. The second to last equality follows because ω̂²_b is quadratic in ũ and thus E[φ(zω̂_b/ω̃_T − c/ω̃_T) l′_T ũ] = 0.

In a similar fashion we find that
$$P\big\{\sqrt{T}(\hat\beta - \beta_0)/\hat\omega_b \le -z\big\} = P\big\{\sqrt{T}(\tilde\beta - \beta)/\tilde\omega_T + c/\tilde\omega_T \le -z\hat\omega_b/\tilde\omega_T\big\} + O(T^{-1})$$
uniformly over z ∈ ℝ. Therefore,
$$F_{T\delta}(z) := P\big\{|\sqrt{T}(\hat\beta - \beta_0)/\hat\omega_b| \le z\big\} = E\,G_\delta(z^2\hat\omega_b^2/\tilde\omega_T^2) = E\,G_\delta(z^2\varsigma_{bT}) + O(T^{-1}) \tag{31}$$
uniformly over z ∈ ℝ⁺, where ς_{bT} := (ω̂_b/ω̃_T)² converges weakly to Ξ_b. Setting µ_{bT} = E(ς_{bT}), we have

$$F_{T\delta}(z) = G_\delta(\mu_{bT}z^2) + \frac{1}{2}G''_\delta(\mu_{bT}z^2)E(\varsigma_{bT} - \mu_{bT})^2 z^4 + \frac{1}{6}G'''_\delta(\mu_{bT}z^2)E(\varsigma_{bT} - \mu_{bT})^3 z^6 + O\big\{E(\varsigma_{bT} - \mu_{bT})^4\big\} + O(T^{-1}), \tag{32}$$
where the O(·) terms hold uniformly over z ∈ ℝ⁺. By developing asymptotic expansions of µ_{bT} and E(ς_{bT} − µ_{bT})^m for m = 1, 2, 3, 4, we can establish a higher-order expansion of the finite sample distribution for the case where T → ∞ and b → 0 at the same time. This expansion validates, for finite samples, the use of the second-order corrected critical values given in the previous section that were derived there on the basis of an expansion of the (nonstandard) limit distribution.

THEOREM 4: Let Assumptions 2 and 3 hold. If bT → ∞ as T → ∞ and b → 0, then
$$F_{T\delta}(z) = G_\delta(z^2) + [c_2 G''_\delta(z^2)z^4 - c_1 G'_\delta(z^2)z^2]\,b - g_q d_{qT}\,G'_\delta(z^2)z^2(bT)^{-q} + o\big\{b + (bT)^{-q}\big\} + O(T^{-1}), \tag{33}$$
where d_{qT} = ω_T^{-2} Σ_{h=−∞}^{∞} |h|^q γ(h), ω²_T = T^{-1} l′_T Ω_T l_T, and the o(·) and O(·) terms hold uniformly over z ∈ ℝ⁺.

Under the null hypothesis, δ = 0 and G_δ(·) = D(·), so
$$F_{T0}(z) = D(z^2) + [c_2 D''(z^2)z^4 - c_1 D'(z^2)z^2]\,b - g_q d_{qT} D'(z^2)z^2(bT)^{-q} + o\big\{b + (bT)^{-q}\big\} + O(T^{-1}). \tag{34}$$


The leading two terms (up to O(b)) in this expansion are the same as those in the corresponding expansion of the limit distribution F₀(z) given in (22) above. Thus, use of the second-order corrected critical values given in (23), which take account of terms up to O(b), should lead to size improvements when T^q b^{q+1} → ∞.

The third term in the expansion (34) is O(T^{-q}) when b is fixed. When b decreases with T, this term provides an asymptotic measure of the size distortion in tests based on the use of the first two terms of (34) or, equivalently, those based on the nonstandard limit theory, at least to O(b). Thus, the third term of (34) approximately measures how satisfactory the second-order corrected critical values given in (23) are for any given values of b and T.
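To get a feel for magnitudes, this dominant size-distortion term g_q d_{qT} D′(z²)z²(bT)^{-q} can be evaluated for an AR(1) error process, for which d_{qT} ≈ 2ρ/(1 − ρ)² when q = 2. The sketch below is our own back-of-envelope calculation under an assumed ρ, using the Parzen kernel constants (q = 2, g = 6):

```python
import math

def chi1_pdf(x):
    # D'(x): density of a central chi-squared(1) variate
    return math.exp(-0.5 * x) / math.sqrt(2.0 * math.pi * x)

def size_distortion_term(rho, b, T, q=2, g=6.0, z=1.960):
    # leading ERP term g_q * d_qT * D'(z^2) * z^2 * (bT)^{-q}
    # with d_qT = 2*rho/(1 - rho)^2 for AR(1) errors and q = 2
    d = 2.0 * rho / (1.0 - rho) ** 2
    return g * d * chi1_pdf(z ** 2) * z ** 2 * (b * T) ** (-q)
```

With ρ = 0.5, T = 200, and b = 0.1 the term is below one percentage point, and it grows as b shrinks or as ρ increases, which is precisely the size/power tension exploited in Section 6.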

Under the local alternative hypothesis, the power of the test based on the second-order corrected critical values is 1 − F_{Tδ}(z_{αb}). Theorem 4 shows that F_{Tδ}(z_{αb}) can be approximated by
$$G_\delta(z_{\alpha b}^2) + [c_2 G''_\delta(z_{\alpha b}^2)z_{\alpha b}^4 - c_1 G'_\delta(z_{\alpha b}^2)z_{\alpha b}^2]\,b - g_q d_{qT}\,G'_\delta(z^2)z^2(bT)^{-q}$$
with an approximation error of order o((bT)^{-q} + b) + O(T^{-1}). Note that the above approximation depends on δ only through δ², so it is valid under both H₁: β − β₀ = c/√T and H₁: β − β₀ = −c/√T.

These results on size distortion and local power are formalized in the following corollary.

COROLLARY 5: Let Assumptions 2 and 3 hold. If bT → ∞ as T → ∞ and b → 0, then:
(a) The size distortion of the two-sided t-test based on the second-order corrected critical values is
$$(1 - F_{T0}(z_{\alpha b})) - \alpha = g_q d_{qT} D'(z_\alpha^2)z_\alpha^2(bT)^{-q} + o((bT)^{-q} + b) + O(T^{-1}). \tag{35}$$
(b) Under the local alternative H₁: |β − β₀| = c/√T, the power of the two-sided t-test based on the second-order corrected critical values is
$$1 - F_{T\delta}(z_{\alpha b}) = 1 - G_\delta(z_\alpha^2) - c_2 z_\alpha^4 K_\delta(z_\alpha^2)\,b + g_q d_{qT} G'_\delta(z_\alpha^2)z_\alpha^2(bT)^{-q} + o((bT)^{-q} + b) + O(T^{-1}). \tag{36}$$

5. ACCURACY OF THE NONSTANDARD APPROXIMATION

The previous section showed that when b goes to zero at a certain rate, the ERP of the standard normal or chi-squared test is at least O(T^{-q/(q+1)}) for typical economic time series. This section establishes a related result for the nonstandard test when b is fixed and then compares the ERP of these two tests under the same asymptotic specification, that is, either b is fixed or b → 0. As in the previous section, we focus on the Gaussian location model.


In view of (31), the error of the nonstandard approximation is given by
$$F_{T0}(z) - F_0(z) := P\big\{|\sqrt{T}(\hat\beta - \beta)/\hat\omega_b| \le z\big\} - P\big\{|W(1)\Xi_b^{-1/2}| \le z\big\} \tag{37}$$
$$= E\,D(z^2\varsigma_{bT}) - E\,D(z^2\Xi_b) + O(T^{-1}).$$

By comparing the cumulants of ς_{bT} − Eς_{bT} with those of Ξ_b − EΞ_b, we can show that E D(z²ς_{bT}) − E D(z²Ξ_b) = O(T^{-1}), which, combined with the above equation, gives Theorem 6 below. The requirement b < 1/(16∫_{−∞}^{∞}|k(x)|dx) on b that appears in the theorem is a technical condition in the proof that facilitates the use of a power series expansion. The requirement can be relaxed, but at the cost of more extensive and tedious calculations.

THEOREM 6: Let Assumptions 2 and 3 hold. If b < 1/(16∫_{−∞}^{∞}|k(x)|dx), then
$$F_{T0}(z) = F_0(z) + O(T^{-1}) \tag{38}$$
uniformly over z ∈ ℝ⁺ when T → ∞ with fixed b.

Under the null hypothesis H₀: β = β₀, we have δ = 0. In this case, Theorem 6 indicates that the ERP for tests with b fixed and using critical values obtained from the nonstandard limit distribution of W(1)Ξ_b^{-1/2} is O(T^{-1}). The theorem is an extension of the result of Jansson (2004), who considered only the Bartlett-type kernel with b = 1 and proved that the ERP is O(T^{-1} log(T)). It was left as an open question in Jansson (2004) whether the log(T) factor can be omitted; Theorem 6 provides a positive answer to this question.

In the previous section, we showed that when b → 0 and T → ∞ such that bT → ∞,
$$F_{T0}(z) = D(z^2) + O(T^{-q/(q+1)}) \tag{39}$$
for typical economic time series. Comparing (38) with (39), one may conclude that the error of the nonstandard approximation is smaller than that of the standard normal approximation by an order of magnitude. However, the two O(·) terms are obtained under different asymptotic specifications. The O(·) term in (38) holds for fixed b, while the O(·) term in (39) holds for diminishing b. Since the O(·) term in (38) does not hold uniformly over b ∈ (0, 1], the two O(·) terms cannot be directly compared, although they are obviously suggestive of the relative quality of the two approximations when b is small, as it typically will be in practical applications.

Indeed, F₀(z) and D(z²) are just different approximations to the same quantity F_{T0}(z). To compare the two approximations more formally, we need to evaluate F_{T0}(z) − F₀(z) and F_{T0}(z) − D(z²) under the same asymptotic specification, that is, either b fixed or b → 0.


First, when b is fixed, we have

    F_{T,0}(z) − D(z²) = [F_{T,0}(z) − F_0(z)] + [F_0(z) − D(z²)] = O(1).   (40)

This is because F_{T,0}(z) − F_0(z) = O(1/T), as shown in Theorem 6, and F_0(z) − D(z²) = O(1). Comparing (40) with (38), we conclude that when b is fixed, the error of the nonstandard approximation is smaller than that of the standard approximation by an order of magnitude.

Second, when b = O(T^{−q/(q+1)}), we have

    F_{T,0}(z) − F_0(z) = [F_{T,0}(z) − D(z²)] − [F_0(z) − D(z²)]   (41)
                        = −g_q d_{q,T} D′(z²) z² (bT)^{−q} + o(b + (bT)^{−q}) + O(T^{−1}),

where we have used Theorems 1 and 4. Therefore, when d_{q,T} > 0, which is typical for economic time series, the error of the nonstandard approximation is smaller than that of the standard normal approximation, although they are of the same order of magnitude for this choice of b.

We can conclude from the above analysis that the nonstandard distribution provides a more accurate approximation to the finite sample distribution regardless of the asymptotic specification employed. There are two reasons for the better performance: the nonstandard distribution mimics the randomness of the denominator of the t-statistic, and it accounts for the bias of the LRV estimator that results from the unobservability of the regression errors. As a result, the critical values from the nonstandard limiting distribution provide a higher-order correction on the critical values from the standard normal distribution. However, just as in the standard limiting theory, the nonstandard limiting theory does not deal with another source of bias, that is, the usual bias that arises in spectral density estimation even when a time series is known to be mean zero and observed. This second source of bias manifests itself in the error of approximation given in (41).

6. OPTIMAL BANDWIDTH CHOICE

It is well known that the optimal choice of b that minimizes the asymptotic mean squared error in LRV estimation has the form b = O(T^{−2q/(2q+1)}). However, there is no reason to expect that such a choice is the most appropriate in statistical testing using nonparametrically studentized statistics. Developing an optimal choice of b for semiparametric testing is not straightforward and involves some conceptual as well as technical challenges. In what follows we provide one possible approach to constructing an optimizing criterion that is based on balancing the type I and type II errors.

190 Y. SUN, P. C. B. PHILLIPS, AND S. JIN

In view of the asymptotic expansion (35), we know that the type I error for a two-sided test with nominal size α can be expressed as

    1 − F_{T,0}(z_{α,b}) = α + g_q d_{q,T} D′(z²_α) z²_α (bT)^{−q} + o((bT)^{−q} + b) + O(T^{−1}).   (42)

Similarly, from (36), the type II error has the form

    G_δ(z²_α) + c₂ z⁴_α K_δ(z²_α) b − g_q d_{q,T} G′_δ(z²_α) z²_α (bT)^{−q}   (43)
      + o((bT)^{−q} + b) + O(T^{−1}).

A loss function for the test can be constructed based on the following three factors: (i) the magnitude of the type I error, as measured by the second term of (42); (ii) the magnitude of the type II error, as measured by the O(b) and O((bT)^{−q}) terms in (43); and (iii) the relative importance of the type I and type II errors.

For most economic time series we can expect that d_{q,T} > 0, and then both g_q d_{q,T} D′(z²_α) z²_α > 0 and g_q d_{q,T} G′_δ(z²_α) z²_α > 0. Hence, the type I error increases as b decreases. On the other hand, the (bT)^{−q} term in (43) indicates that there is a corresponding decrease in the type II error as b decreases. Indeed, for δ > 0 the decrease in the type II error will generally exceed the increase in the type I error because G′_δ(z²_α) > D′(z²_α) for δ ∈ (0, 7.5) and z_α = 1.645, 1.960, or 2.580. The situation is further complicated by the fact that there is an additional O(b) term in the type II error. As we saw earlier, K_δ(z²_α) > 0, so that the second term of (43) leads to a reduction in the type II error as b decreases. Thus, the type II error generally decreases with b for two reasons: one from the nonstandard limit theory and the other from the (typical) downward bias in estimating the long-run variance.

The case d_{q,T} < 0 usually arises when there is negative serial correlation in the errors and so tends to be less typical for economic time series. In such a case, (42) shows that the type I error increases with b, while the type II error may increase or decrease with b depending on which of the two terms in (43) dominates.

These considerations suggest that a loss function may be constructed by taking a suitable weighted average of the type I and type II errors given in (42) and (43). Setting

    e_{I,T} = α + g_q d_{q,T} D′(z²_α) z²_α (bT)^{−q},   (44)
    e_{II,T} = G_δ(z²_α) + c₂ z⁴_α K_δ(z²_α) b − g_q d_{q,T} G′_δ(z²_α) z²_α (bT)^{−q},

we define the loss function to be

    L(b; δ, T, z_α) = [ w_T(δ)/(1 + w_T(δ)) ] e_{I,T} + [ 1/(1 + w_T(δ)) ] e_{II,T},   (45)

where w_T(δ) is a function that determines the relative weight on the type I and type II errors, and this function is allowed to depend on the sample size T and on δ. Obviously, the loss L(b; δ, T, z_α) is here specified for a particular value of δ, and this function could be adjusted in a simple way so that the type II error is averaged over a range of values of δ with respect to some (prior) distribution over alternatives.

We focus below on the case of a fixed local alternative, in which case we can suppress the dependence of w_T(δ) on δ and write w_T = w_T(δ). To sum up, the loss function we consider is of the form

    L(b; δ, T, z_α) = { g_q d_{q,T} [ w_T D′(z²_α) − G′_δ(z²_α) ] z²_α (bT)^{−q} + c₂ z⁴_α K_δ(z²_α) b } × 1/(1 + w_T) + C_T,   (46)

where C_T = [ w_T α + G_δ(z²_α) ]/(1 + w_T), which does not depend on b. In the rest of this section, we consider the case w_T D′(z²_α) − G′_δ(z²_α) > 0, which holds if the relative weight w_T is large enough.

It turns out that the optimal choice of b depends on whether d_{q,T} > 0 or d_{q,T} < 0. We consider these two cases in turn. When d_{q,T} > 0, the loss function L(b; δ, T, z_α) is minimized by the following choice of b:

    b_opt = ( q g_q d_{q,T} [ w_T D′(z²_α) − G′_δ(z²_α) ] / [ c₂ z²_α K_δ(z²_α) ] )^{1/(q+1)} T^{−q/(q+1)}.   (47)

Therefore, the optimal shrinkage rate for b is O(T^{−q/(q+1)}) when w_T is a fixed constant. If w_T → ∞ as T → ∞, we then have

    b_opt = ( q g_q d_{q,T} D′(z²_α) / [ z²_α c₂ K_δ(z²_α) ] )^{1/(q+1)} ( w_T/T^q )^{1/(q+1)}.   (48)

Fixed-b rules may then be interpreted as assigning relative weight w_T = O(T^q) in the loss function, so that the emphasis in tests based on such rules is on a small type I error, at least when we expect the type I error to be larger than the nominal size of the test. This gives us an interpretation of fixed-b rules in terms of the loss perceived by the econometrician using such a rule. Within the more general framework given by (47), b may be fixed or may shrink with T up to an O(T^{−q/(q+1)}) rate, corresponding to the relative importance that is placed in the loss function on the type I and type II errors.
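To make the minimization behind (46)–(47) concrete, the following sketch evaluates the loss (up to the b-free constant C_T) on a grid and checks that the grid minimizer matches the closed-form b_opt in (47). All numerical values here (g_q, d_{q,T}, the weight w_T, and the stand-ins for D′(z²_α), G′_δ(z²_α), and K_δ(z²_α)) are hypothetical placeholders for quantities that must in practice be estimated or computed from the expansions above:

```python
import numpy as np

# Hypothetical stand-ins (illustration only): none of these constants are
# prescribed by the paper; in practice they come from the expansions above.
q, T = 1.0, 500            # Bartlett-type kernel (q = 1), sample size
g_q, d_qT = 1.0, 0.8       # kernel constant g_q and bias parameter d_{q,T}
w_T = 20.0                 # relative weight on the type I error
z_a = 1.960                # two-sided 5% critical value
Dp, Gp = 0.05, 0.10        # stand-ins for D'(z_a^2) and G'_delta(z_a^2)
c2, K = 2.0 / 3.0, 0.15    # c_2 for the Bartlett kernel; K_delta(z_a^2) stand-in

# Loss as a function of b, up to the b-free constant C_T (see (46))
def loss(b):
    return (g_q * d_qT * (w_T * Dp - Gp) * z_a**2 * (b * T) ** (-q)
            + c2 * z_a**4 * K * b) / (1.0 + w_T)

# Closed-form minimizer (47)
b_opt = (q * g_q * d_qT * (w_T * Dp - Gp)
         / (c2 * z_a**2 * K)) ** (1.0 / (q + 1)) * T ** (-q / (q + 1))

# Grid search over (0, 1) confirms the formula
grid = np.linspace(1e-4, 0.999, 200_000)
b_grid = grid[np.argmin(loss(grid))]
print(round(b_opt, 4), round(b_grid, 4))
```

The only substantive point of the sketch is structural: the loss is of the form B(bT)^{−q} + Cb with B, C > 0, whose minimizer is (qBT^{−q}/C)^{1/(q+1)}, exactly the shape of (47).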

Observe that when b = O((w_T/T^q)^{1/(q+1)}), the size distortion is O((w_T T)^{−q/(q+1)}) rather than O(T^{−1}) as it is when b is fixed. Thus, the use of b = b_opt for a finite w_T involves some compromise by allowing the error order in the rejection probability to be somewhat larger so as to achieve higher power. Such compromise is an inevitable consequence of balancing the two elements in the loss function (46). Note that even in this case, the order of the ERP is smaller than


O(T^{−q/(2q+1)}), which is the order of the ERP for the conventional procedure in which standard normal critical values are used and b is set to be O(T^{−2q/(2q+1)}).

For the Parzen and QS kernels the optimal rate of b is T^{−2/3}, whereas for the Bartlett kernel the optimal rate is T^{−1/2}. Therefore, in large samples, a smaller b should be used with the Parzen and QS kernels than with the Bartlett kernel. In the former cases, the ERP is at most of order T^{−2/3}, and in the latter case the ERP is at most of order T^{−1/2}. The O(T^{−2/3}) rate of the ERP for the quadratic kernels represents an improvement over the Bartlett kernel. Note that for the optimal b, the rate of the ERP is also the rate at which the loss function L(b; δ, T, z_α) approaches C_T from above. Therefore, the loss L(b; δ, T, z_α) is expected to be smaller for quadratic kernels than for the Bartlett kernel in large samples. Finite sample performance may not necessarily follow this ordering, however, and will depend on the sample size and the shape of the spectral density of u_t at the origin.

The formula for b_opt involves the unknown parameter d_{q,T}, which could be estimated nonparametrically (e.g., Newey and West (1994)) or by a standard plug-in procedure based on a simple model like an AR(1) (e.g., Andrews (1991)). Both methods achieve a valid order of magnitude, and the procedure is obviously analogous to conventional data-driven methods for HAC estimation.
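As one concrete illustration of the AR(1) plug-in idea (the parametric model and the closed form below are assumptions of this sketch, not a prescription from the paper): for q = 1, the bias parameter is, up to the finite-T adjustment, d₁ = ω^{−2} Σ_h |h| γ(h), which for an AR(1) with coefficient ρ reduces to 2ρ/(1 − ρ²). Fitting a first-order autoregression to the data then yields a plug-in value:

```python
import numpy as np

# Plug-in estimate of d_1 = ω^{-2} Σ_h |h| γ(h) (q = 1) under an assumed
# AR(1) model, where the closed form is d_1 = 2ρ/(1 - ρ^2).
rng = np.random.default_rng(0)
rho, T = 0.5, 20_000
u = np.empty(T)
u[0] = rng.standard_normal() / np.sqrt(1 - rho**2)  # stationary start
for t in range(1, T):
    u[t] = rho * u[t - 1] + rng.standard_normal()

rho_hat = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)  # OLS AR(1) fit
d1_hat = 2 * rho_hat / (1 - rho_hat**2)                 # plug-in value
d1_true = 2 * rho / (1 - rho**2)
print(round(d1_hat, 2), round(d1_true, 2))
```

Only the order of magnitude of the estimate matters for b_opt, which is why a deliberately simple parametric filter suffices.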

When w_T D′(z²_α) − G′_δ(z²_α) > 0 and d_{q,T} < 0, L(b; δ, T, z_α) is an increasing function of b. To minimize loss in this case, we can choose b to be as small as possible. Since the loss function is constructed under the assumption that b → 0 and bT → ∞, the choice of b is required to be compatible with these two rate conditions. These considerations lead to a choice of b of the form b = J_T/T for some J_T that goes to infinity but at a slowly varying rate relative to T. In practice, we may set J_T = log T, so that b = (log T)/T.

To sum up, for typical economic time series, the value of b that minimizes the weighted type I and type II errors has a shrinkage rate of b = O(T^{−q/(q+1)}). This rate may be compared with the optimal rate b = O(T^{−2q/(2q+1)}) that applies when minimizing the mean squared error of estimation of the corresponding HAC estimate, ω²_b itself (Andrews (1991)). Thus, the AMSE-optimal values of b for HAC estimation are smaller as T → ∞ than those that are most suited for statistical testing. In effect, optimal HAC estimation tolerates more bias so as to reduce variance in estimation.

In contrast, optimal b selection in HAC testing undersmooths the long-run variance estimate to reduce bias and allows for greater variance in long-run variance estimation through higher-order adjustments to the nominal asymptotic critical values or by direct use of the nonstandard limit distribution. This conclusion highlights the importance of bias reduction in statistical testing.
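The gap between the two rates is easy to quantify: since q/(q+1) < 2q/(2q+1) for every q ≥ 1, the testing-optimal bandwidth T^{−q/(q+1)} is eventually larger than the MSE-optimal bandwidth T^{−2q/(2q+1)}. A few representative values (plain arithmetic, no model inputs):

```python
# Compare the testing-optimal shrinkage rate T^{-q/(q+1)} with the
# MSE-optimal rate T^{-2q/(2q+1)}: the testing-optimal bandwidth is the
# larger of the two for every T > 1 (i.e., it undersmooths).
for q in (1, 2):                      # Bartlett (q = 1); Parzen/QS (q = 2)
    for T in (100, 1_000, 10_000):
        b_test = T ** (-q / (q + 1))
        b_mse = T ** (-2 * q / (2 * q + 1))
        assert b_test > b_mse
        print(q, T, round(b_test, 4), round(b_mse, 4))
```

For example, with q = 1 and T = 100 the testing-optimal rate gives b of order 0.1 while the MSE-optimal rate gives about 0.046, consistent with the undersmoothing conclusion.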

In Monte Carlo experiments not reported here, we have compared the finite sample performance of the new plug-in procedure with that of the conventional plug-in procedure given in Andrews (1991). The findings from these simulations indicate that the new plug-in procedure works well in terms of incurring a smaller loss than the conventional plug-in procedure. Detailed results of this experiment are reported in Sun, Phillips, and Jin (2006).


7. CONCLUDING DISCUSSION

Automatic bandwidth choice is a long-standing problem in time series models when the autocorrelation is of unknown form. Existing automatic methods are all based on minimizing the asymptotic mean squared error of the standard error estimator, a criterion that is not directed at statistical testing. In hypothesis testing, the focus of attention is the type I and type II errors that arise in testing, and it is these errors that give rise to loss. Consequently, it is desirable to make the errors of incorrectly rejecting a true null hypothesis and failing to reject a false null hypothesis as small as possible. While these two types of errors may not be simultaneously reduced, it is possible to design bandwidth choice to control the loss from these errors. This paper develops for the first time a theory of optimal bandwidth choice that achieves this end by minimizing a weighted average of the type I and type II errors.

The results in this paper suggest some important areas for future research. The present work has focused on two-sided tests in the Gaussian location model, but the ideas and methods explored here can be used as a foundation for tackling bandwidth choice problems in nonparametrically studentized tests and confidence interval construction in general regression settings with linear instrumental variable and generalized method of moments (GMM) estimation. Results for these general settings will be reported in later work. The additional complexity of the formulae in the general case complicates the search for an optimal truncation parameter, but the main conclusion of the present findings is the same, namely, that when the goal is testing or confidence interval construction, it is advantageous to reduce bias in HAC estimation by undersmoothing.

Dept. of Economics, University of California, San Diego, La Jolla, CA 92093, U.S.A.; [email protected],

Cowles Foundation, Yale University, Box 20828, New Haven, CT 06520, U.S.A.; University of Auckland, Auckland, New Zealand; and University of York, York, U.K.; [email protected],

and

Guanghua School of Management, Peking University, Beijing, 100871, China; [email protected].

Manuscript received December, 2004; final revision received June, 2007.

REFERENCES

ANDREWS, D. W. K. (1991): "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation," Econometrica, 59, 817–854. [175,177,179,192]

GONÇALVES, S., AND T. J. VOGELSANG (2005): "Block Bootstrap Puzzles in HAC Robust Testing: The Sophistication of the Naive Bootstrap," Working Paper, University of Montreal and Cornell University. [176]

GRENANDER, U., AND M. ROSENBLATT (1957): Statistical Analysis of Stationary Time Series. New York: Wiley. [175,185]

JANSSON, M. (2004): "On the Error of Rejection Probability in Simple Autocorrelation Robust Tests," Econometrica, 72, 937–946. [176,179,184,188]

KIEFER, N. M., AND T. J. VOGELSANG (2002a): "Heteroskedasticity–Autocorrelation Robust Testing Using Bandwidth Equal to Sample Size," Econometric Theory, 18, 1350–1366. [176,178]

KIEFER, N. M., AND T. J. VOGELSANG (2002b): "Heteroskedasticity–Autocorrelation Robust Standard Errors Using the Bartlett Kernel Without Truncation," Econometrica, 70, 2093–2095. [176,178]

KIEFER, N. M., AND T. J. VOGELSANG (2005): "A New Asymptotic Theory for Heteroskedasticity–Autocorrelation Robust Tests," Econometric Theory, 21, 1130–1164. [178,180]

KIEFER, N. M., T. J. VOGELSANG, AND H. BUNZEL (2000): "Simple Robust Testing of Regression Hypotheses," Econometrica, 68, 695–714. [176]

NEWEY, W. K., AND K. D. WEST (1987): "A Simple, Positive Semidefinite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix," Econometrica, 55, 703–708. [175,179]

NEWEY, W. K., AND K. D. WEST (1994): "Automatic Lag Selection in Covariance Estimation," Review of Economic Studies, 61, 631–654. [175,179,192]

PHILLIPS, P. C. B. (1980): "Finite Sample Theory and the Distributions of Alternative Estimators of the Marginal Propensity to Consume," Review of Economic Studies, 47, 183–224. [185]

PHILLIPS, P. C. B., Y. SUN, AND S. JIN (2005a): "Improved HAR Inference Using Power Kernels Without Truncation," Mimeographed, Yale University. [181]

PHILLIPS, P. C. B., Y. SUN, AND S. JIN (2005b): "Balancing Size and Power in Nonparametric Studentized Testing with Quadratic Power Kernels Without Truncation," Working Paper, Department of Economics, UCSD. [181]

PHILLIPS, P. C. B., Y. SUN, AND S. JIN (2006): "Spectral Density Estimation and Robust Hypothesis Testing Using Steep Origin Kernels Without Truncation," International Economic Review, 21, 837–894. [176,181]

PHILLIPS, P. C. B., Y. SUN, AND S. JIN (2007): "Long-Run Variance Estimation and Robust Regression Testing Using Sharp Origin Kernels with No Truncation," Journal of Statistical Planning and Inference, 137, 985–1023. [176,181]

SUN, Y., P. C. B. PHILLIPS, AND S. JIN (2006): "Optimal Bandwidth Selection in Heteroskedasticity–Autocorrelation Robust Testing," Working Paper, Department of Economics, UCSD. [175,178,192]

SUN, Y., P. C. B. PHILLIPS, AND S. JIN (2007): "Supplement to 'Optimal Bandwidth Selection in Heteroskedasticity–Autocorrelation Robust Testing'," Econometrica Supplementary Material, 76, http://www.econometricsociety.org/ecta/Supmat/5582proofs.pdf. [178,182]

VELASCO, C., AND P. M. ROBINSON (2001): "Edgeworth Expansions for Spectral Density Estimates and Studentized Sample Mean," Econometric Theory, 17, 497–539. [185]

Econometrica Supplementary Material

SUPPLEMENT TO "OPTIMAL BANDWIDTH SELECTION IN HETEROSKEDASTICITY–AUTOCORRELATION ROBUST TESTING"

(Econometrica, Vol. 76, No. 1, January 2008, 175–194)

BY YIXIAO SUN, PETER C. B. PHILLIPS, AND SAINAN JIN

THIS APPENDIX provides technical results and proofs for the paper.

A.1. Technical Lemmas and Supplements

LEMMA 1: Define

    c*₁ = 4 ∫_{−∞}^{∞} |k(v)| dv.

Let Assumption 2 hold. Then the cumulants κ_m of Ξ_b − µ_b satisfy

    |κ_m| ≤ 2^m (m − 1)! (c*₁ b)^{m−1} for m ≥ 1,   (A.1)

and its moments α_m = E(Ξ_b − µ_b)^m satisfy

    |α_m| ≤ 2^{2m} m! (c*₁ b)^{m−1} for m ≥ 1.   (A.2)

PROOF OF LEMMA 1: To find the cumulants of Ξ_b − µ_b, we write

    Ξ_b = ∫₀¹ ∫₀¹ k*_b(r, s) dW(r) dW(s),   (A.3)

where k*_b(r, s) is defined by

    k*_b(r, s) = k_b(r − s) − ∫₀¹ k_b(r − t) dt − ∫₀¹ k_b(τ − s) dτ + ∫₀¹ ∫₀¹ k_b(t − τ) dt dτ.

It can be shown that the cumulant generating function of Ξ_b − µ_b is

    ln φ(t) = Σ_{m=1}^{∞} κ_m (it)^m / m!,   (A.4)

where κ₁ = 0 and, for m ≥ 2,

    κ_m = 2^{m−1} (m − 1)! ∫₀¹ ··· ∫₀¹ ( ∏_{j=1}^{m} k*_b(τ_j, τ_{j+1}) ) dτ₁ ··· dτ_m,   (A.5)

with τ_{m+1} = τ₁.

Now

    | ∫₀¹ ··· ∫₀¹ ( ∏_{j=1}^{m} k*_b(τ_j, τ_{j+1}) ) dτ₁ ··· dτ_m |
      ≤ ∫₀¹ ··· ∫₀¹ |k*_b(τ₁, τ₂) k*_b(τ₂, τ₃) ··· k*_b(τ_{m−1}, τ_m)| × |k*_b(τ_m, τ₁)| dτ₁ ··· dτ_m   (A.6)
      ≤ 2 ∫₀¹ ··· ∫₀¹ |k*_b(τ₁, τ₂) k*_b(τ₂, τ₃) ··· k*_b(τ_{m−1}, τ_m)| dτ₁ ··· dτ_m   (A.7)
      ≤ 2 sup_{τ₂} ∫₀¹ |k*_b(τ₁, τ₂)| dτ₁ × ∫₀¹ ··· ∫₀¹ |k*_b(τ₂, τ₃) ··· k*_b(τ_{m−1}, τ_m)| dτ₂ ··· dτ_m
      ≤ 2 sup_{τ₂} ∫₀¹ |k*_b(τ₁, τ₂)| dτ₁ × sup_{τ₃} ∫₀¹ |k*_b(τ₂, τ₃)| dτ₂ ··· sup_{τ_m} ∫₀¹ |k*_b(τ_{m−1}, τ_m)| dτ_{m−1}
      = 2 ( sup_s ∫₀¹ |k*_b(r, s)| dr )^{m−1}.   (A.8)

But

    sup_s ∫₀¹ |k*_b(r, s)| dr ≤ 4 sup_s ∫₀¹ |k_b(r − s)| dr   (A.9)
      = 4 sup_{s∈[0,1]} ∫_{−s}^{1−s} |k_b(v)| dv
      ≤ 4 ∫_{−∞}^{∞} |k_b(v)| dv
      = b c*₁.

As a result,

    | ∫₀¹ ··· ∫₀¹ ( ∏_{j=1}^{m} k*_b(τ_j, τ_{j+1}) ) dτ₁ ··· dτ_m | ≤ 2 (c*₁ b)^{m−1},   (A.10)

so

    |κ_m| ≤ 2^m (m − 1)! (c*₁ b)^{m−1}.   (A.11)

Note that the moments α_j and cumulants κ_j satisfy the relationship

    α_m = Σ_π [ m! / ( (j₁!)^{m₁} (j₂!)^{m₂} ··· (j_ℓ!)^{m_ℓ} ) ] × [ 1 / (m₁! m₂! ··· m_ℓ!) ] ∏_{j∈π} κ_j,   (A.12)

where the sum is taken over the elements

    π = [ j₁, ..., j₁ (m₁ times), j₂, ..., j₂ (m₂ times), ..., j_ℓ, ..., j_ℓ (m_ℓ times) ]   (A.13)

for some integer ℓ and sequence {j_i}_{i=1}^{ℓ} such that j₁ > j₂ > ··· > j_ℓ and m = Σ_{i=1}^{ℓ} m_i j_i. Combining the preceding formula with (A.11) gives

    |α_m| < 2^m m! (c*₁ b)^{m−1} Σ_π [ (j₁)^{−m₁} (j₂)^{−m₂} ··· (j_ℓ)^{−m_ℓ} / (m₁! m₂! ··· m_ℓ!) ]   (A.14)
          ≤ 2^{2m} m! (c*₁ b)^{m−1},

where the last line follows because

    Σ_π [ (j₁)^{−m₁} (j₂)^{−m₂} ··· (j_ℓ)^{−m_ℓ} / (m₁! m₂! ··· m_ℓ!) ] ≤ Σ_π [ 1 / (m₁! m₂! ··· m_ℓ!) ] < 2^m.   (A.15)

Q.E.D.
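The bound (A.11) can be checked numerically for m = 2, where κ₂ = 2∫₀¹∫₀¹ (k*_b(r, s))² dr ds and the bound reads |κ₂| ≤ 2²·1!·(c*₁b) = 4c*₁b. The sketch below does this on a midpoint grid for the Bartlett kernel k(x) = (1 − |x|)⁺ with b = 0.1, for which c*₁ = 4∫|k| = 4; the kernel choice, bandwidth, and grid size are assumptions of the illustration, not part of the proof:

```python
import numpy as np

# Grid check of |κ_2| ≤ 4 c*_1 b (Lemma 1 with m = 2), Bartlett kernel.
b, n = 0.1, 800
r = (np.arange(n) + 0.5) / n                                   # midpoint grid
K = np.clip(1 - np.abs(r[:, None] - r[None, :]) / b, 0.0, None)  # k_b(r - s)
# Demeaned kernel k*_b(r, s) from (A.3): subtract row/column means, add back
# the grand mean.
Kstar = K - K.mean(axis=0, keepdims=True) - K.mean(axis=1, keepdims=True) + K.mean()
kappa2 = 2 * np.mean(Kstar**2)         # ≈ 2 ∫∫ (k*_b)^2 dr ds ≈ 0.11
c1_star = 4.0                          # 4 ∫ |k(v)| dv for the Bartlett kernel
print(round(kappa2, 3), 4 * c1_star * b)
```

For this kernel and bandwidth κ₂ is about 0.11, comfortably inside the bound 4c*₁b = 1.6, which also illustrates how conservative the uniform-in-m bound is.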

LEMMA 2: Let Assumptions 2 and 3 hold. When T → ∞ with b fixed, we have

    µ_{bT} = µ_b + O(1/T);   (A.16)

    κ_{mT} = κ_m + O( (m! 2^m / T²) (c*₁ b)^{m−2} )   (A.17)

uniformly over m ≥ 1; and

    α_{mT} = E(ς_{bT} − µ_{bT})^m = α_m + O( (m! 2^{2m} / T²) (c*₁ b)^{m−2} )   (A.18)

uniformly over m ≥ 1.


PROOF OF LEMMA 2: We first calculate µ_{bT} = (Tω²_T)^{−1} Trace(Ω_T A_T W_b A_T). Let W*_b = A_T W_b A_T. Then the (i, j)th element of W*_b is

    k_b(i/T, j/T) = k_b((i − j)/T) − (1/T) Σ_{p=1}^{T} k_b((i − p)/T)   (A.19)
                    − (1/T) Σ_{q=1}^{T} k_b((q − j)/T) + (1/T²) Σ_{p=1}^{T} Σ_{q=1}^{T} k_b((p − q)/T).

So

    Trace(Ω_T A_T W_b A_T) = Trace(Ω_T W*_b)   (A.20)
      = Σ_{1≤r₁,r₂≤T} γ(r₁ − r₂) k_b(r₁/T, r₂/T)
      = Σ_{r₂=1}^{T} Σ_{h₁=1−r₂}^{T−r₂} γ(h₁) k_b((r₂ + h₁)/T, r₂/T)
      = ( Σ_{h₁=1}^{T−1} Σ_{r₂=1}^{T−h₁} + Σ_{h₁=1−T}^{0} Σ_{r₂=1−h₁}^{T} ) γ(h₁) k_b((r₂ + h₁)/T, r₂/T).

But

    Σ_{r₂=1}^{T−h₁} k_b((r₂ + h₁)/T, r₂/T)   (A.21)
      = Σ_{r₂=1}^{T−h₁} k_b(h₁/T) − (1/T) Σ_{r₁=1+h₁}^{T} Σ_{p=1}^{T} k_b((r₁ − p)/T)
        − (1/T) Σ_{r₂=1}^{T−h₁} Σ_{q=1}^{T} k_b((q − r₂)/T) + Σ_{r₂=1}^{T−h₁} (1/T²) Σ_{p=1}^{T} Σ_{q=1}^{T} k_b((p − q)/T)
      = −(1/T) Σ_{r₁=1}^{T} Σ_{p=1}^{T} k_b((r₁ − p)/T) − (1/T) Σ_{r₂=1}^{T} Σ_{q=1}^{T} k_b((q − r₂)/T)
        + Σ_{r₂=1}^{T} (1/T²) Σ_{p=1}^{T} Σ_{q=1}^{T} k_b((p − q)/T) + T k_b(h₁/T) + C(h₁)
      = −(1/T) Σ_{r=1}^{T} Σ_{s=1}^{T} k_b((r − s)/T) + T k_b(h₁/T) + C(h₁)
      = Σ_{r₂=1}^{T} k_b(r₂/T, r₂/T) + T [ k_b(h₁/T) − k_b(0) ] + C(h₁),

where C(h₁) is a function of h₁ that satisfies |C(h₁)| ≤ h₁. Similarly,

    Σ_{r₂=1−h₁}^{T} k_b((r₂ + h₁)/T, r₂/T) = Σ_{r₂=1}^{T} k_b(r₂/T, r₂/T) + T [ k_b(h₁/T) − k_b(0) ] + C(h₁).   (A.22)

Therefore, Trace(Ω_T A_T W_b A_T) is equal to

    Σ_{h=−T+1}^{T−1} γ(h) Σ_{r₂=1}^{T} k_b(r₂/T, r₂/T) + T Σ_{h=−T+1}^{T−1} γ(h) [ k_b(h/T) − k_b(0) ] + O(1)   (A.23)
      = Σ_{h=−T+1}^{T−1} γ(h) Σ_{r₂=1}^{T} k_b(r₂/T, r₂/T)
        + T(bT)^{−q} Σ_{h=−T+1}^{T−1} |h|^q γ(h) [ k(h/(bT)) − k(0) ] / |h/(bT)|^q + O(1)
      = Σ_{h=−T+1}^{T−1} γ(h) Σ_{r₂=1}^{T} k_b(r₂/T, r₂/T) − T(bT)^{−q} g_q Σ_{h=−∞}^{∞} |h|^q γ(h) (1 + o(1)) + O(1).

Using

    Σ_{h=−T+1}^{T−1} γ(h) = ω²_T (1 + O(1/T))   (A.24)

and

    (1/T) Σ_{r₂=1}^{T} k_b(r₂/T, r₂/T) = ∫₀¹ k*_b(r, r) dr + O(1/T),   (A.25)

we now have

    µ_{bT} = ∫₀¹ k*_b(r, r) dr − (bT)^{−q} g_q ( ω_T^{−2} Σ_{h=−∞}^{∞} |h|^q γ(h) ) (1 + o(1)) + O(1/T).   (A.26)

By definition, µ_b = EΞ_b = ∫₀¹ k*_b(r, r) dr, and thus µ_{bT} = µ_b + O(T^{−1}), as desired.

We next approximate Trace[(Ω_T A_T W_b A_T)^m] for m > 1. The approach is similar to the case m = 1 but notationally more complicated. Let r_{2m+1} = r₁, r_{2m+2} = r₂, and h_{m+1} = h₁. Then

    Trace[(Ω_T A_T W_b A_T)^m]   (A.27)
      = Σ_{r₁,r₂,...,r_{2m}=1}^{T} ∏_{j=1}^{m} γ(r_{2j−1} − r_{2j}) k_b(r_{2j}/T, r_{2j+1}/T)
      = Σ_{r₂,r₄,...,r_{2m}=1}^{T} Σ_{h₁=1−r₂}^{T−r₂} Σ_{h₂=1−r₄}^{T−r₄} ··· Σ_{h_m=1−r_{2m}}^{T−r_{2m}} ∏_{j=1}^{m} γ(h_j) k_b(r_{2j}/T, (r_{2j+2} + h_{j+1})/T)
      = ( Σ_{h₁=1}^{T−1} Σ_{r₂=1}^{T−h₁} + Σ_{h₁=1−T}^{0} Σ_{r₂=1−h₁}^{T} ) ··· ( Σ_{h_m=1}^{T−1} Σ_{r_{2m}=1}^{T−h_m} + Σ_{h_m=1−T}^{0} Σ_{r_{2m}=1−h_m}^{T} )
          ∏_{j=1}^{m} γ(h_j) k_b(r_{2j}/T, (r_{2j+2} + h_{j+1})/T)
      = I + II,

where

    I = ( Σ_{h₁=1}^{T−1} Σ_{r₂=1}^{T−h₁} + Σ_{h₁=1−T}^{0} Σ_{r₂=1−h₁}^{T} ) ··· ( Σ_{h_m=1}^{T−1} Σ_{r_{2m}=1}^{T−h_m} + Σ_{h_m=1−T}^{0} Σ_{r_{2m}=1−h_m}^{T} )   (A.28)
          ∏_{j=1}^{m} γ(h_j) k_b(r_{2j}/T, r_{2j+2}/T)

and

    II = O( ( Σ_{h₁=1}^{T−1} Σ_{r₂=1}^{T−h₁} + Σ_{h₁=1−T}^{0} Σ_{r₂=1−h₁}^{T} ) ··· ( Σ_{h_m=1}^{T−1} Σ_{r_{2m}=1}^{T−h_m} + Σ_{h_m=1−T}^{0} Σ_{r_{2m}=1−h_m}^{T} )   (A.29)
          ∏_{j=1}^{m} |γ(h_j)| ( |h_{j+1}|/(bT) ) ).

Here we have used

    | k_b(r_{2j}/T, (r_{2j+2} + h_{j+1})/T) − k_b(r_{2j}/T, r_{2j+2}/T) | = O( |h_{j+1}|/(bT) ).   (A.30)

To show this, note that

    (1/T) Σ_{p=1}^{T} k_b((p − r_{2j+2} − h_{j+1})/T) = (1/T) Σ_{p=1−h_{j+1}}^{T−h_{j+1}} k_b((p − r_{2j+2})/T)   (A.31)
      = (1/T) Σ_{p=1}^{T} k_b((p − r_{2j+2})/T) + O( |h_{j+1}|/T )

and

    | k_b((r_{2j} − r_{2j+2} − h_{j+1})/T) − k_b((r_{2j} − r_{2j+2})/T) | = O( |h_{j+1}|/(bT) ),   (A.32)

so that

    k_b(r_{2j}/T, (r_{2j+2} + h_{j+1})/T)   (A.33)
      = k_b(r_{2j}/T, r_{2j+2}/T) + k_b((r_{2j} − r_{2j+2} − h_{j+1})/T) − k_b((r_{2j} − r_{2j+2})/T) + O( |h_{j+1}|/T )
      = k_b(r_{2j}/T, r_{2j+2}/T) + O( |h_{j+1}|/(bT) ).

The first term (I) can be written as

    I = ( Σ_{h₁=1−T}^{T−1} Σ_{r₂=1}^{T} − Σ_{h₁=1}^{T−1} Σ_{r₂=T−h₁+1}^{T} − Σ_{h₁=1−T}^{0} Σ_{r₂=1}^{−h₁} )   (A.34)
        ··· ( Σ_{h_m=1−T}^{T−1} Σ_{r_{2m}=1}^{T} − Σ_{h_m=1}^{T−1} Σ_{r_{2m}=T−h_m+1}^{T} − Σ_{h_m=1−T}^{0} Σ_{r_{2m}=1}^{−h_m} )
        ∏_{j=1}^{m} γ(h_j) k_b(r_{2j}/T, r_{2j+2}/T)
      = Σ_π Σ_{h₁,r₂} ··· Σ_{h_m,r_{2m}} ∏_{j=1}^{m} γ(h_j) k_b(r_{2j}/T, r_{2j+2}/T),

where Σ_{h_j,r_{2j}} is one of the three choices Σ_{h_j=1−T}^{T−1} Σ_{r_{2j}=1}^{T}, −Σ_{h_j=1}^{T−1} Σ_{r_{2j}=T−h_j+1}^{T}, or −Σ_{h_j=1−T}^{0} Σ_{r_{2j}=1}^{−h_j}, and Σ_π is the summation over all possible combinations of (Σ_{h₁,r₂}, ..., Σ_{h_m,r_{2m}}). The 3^m summands in (A.34) can be divided into two groups: the first group consists of the summands all of whose r indices run from 1 to T, and the second group consists of the rest. It is obvious that the first group can be written as

    ( Σ_h ∏_{j=1}^{m} γ(h_j) ) Σ_r ∏_{j=1}^{m} k_b(r_{2j}/T, r_{2j+2}/T).

The dominating terms (in terms of order of magnitude) in the second group are of the forms

    Σ_{h₁=1−T}^{T−1} Σ_{r₂=1}^{T} ··· Σ_{h_p=1−T}^{T−1} Σ_{r_{2p}=T−h_p+1}^{T} ··· Σ_{h_m=1−T}^{T−1} Σ_{r_{2m}=1}^{T} ∏_{j=1}^{m} γ(h_j) k_b(r_{2j}/T, r_{2j+2}/T)

or

    Σ_{h₁=1−T}^{T−1} Σ_{r₂=1}^{T} ··· Σ_{h_p=1−T}^{T−1} Σ_{r_{2p}=1}^{−h_p} ··· Σ_{h_m=1−T}^{T−1} Σ_{r_{2m}=1}^{T} ∏_{j=1}^{m} γ(h_j) k_b(r_{2j}/T, r_{2j+2}/T).

These are the summands with only one r index not running from 1 to T. Both of the above terms are bounded by

    Σ_{h₁=1−T}^{T−1} Σ_{r₂=1}^{T} ··· Σ_{h_p=1−T}^{T−1} ··· Σ_{h_m=1−T}^{T−1} Σ_{r_{2m}=1}^{T} ∏_{j=1}^{m} |γ(h_j)| |h_p| ∏_{j≠p} | k_b(r_{2j}/T, r_{2j+2}/T) |
      ≤ [ sup_{r₄} Σ_{r₂=1}^{T} |k_b(r₂/T, r₄/T)| ]^{m−2} ( Σ_{h_j} |γ(h_j)| )^{m−1} ( Σ_{h_p} |γ(h_p)| |h_p| ),

using the same approach as in (A.6). Approximating the sum by an integral and noting that the second group contains (m − 1) terms, all of which are of the same order of magnitude as the above typical dominating terms, we


conclude that the second group is O(2^m T^{m−2} (c*₁ b)^{m−2}) uniformly over m. As a consequence,

    I = ( Σ_h ∏_{j=1}^{m} γ(h_j) ) Σ_r ∏_{j=1}^{m} k_b(r_{2j}/T, r_{2j+2}/T) + O( 2^m T^{m−2} (c*₁ b)^{m−2} )   (A.35)

uniformly over m. The second term (II) is easily shown to be o(2^m T^{m−2} (c*₁ b)^{m−2}) uniformly over m. Therefore,

    Trace[(Ω_T A_T W_b A_T)^m] = ( Σ_h γ(h) )^m Σ_r ∏_{j=1}^{m} k_b(r_{2j}/T, r_{2j+2}/T) + O( 2^m T^{m−2} (c*₁ b)^{m−2} )   (A.36)

and

    κ_{mT} = 2^{m−1} (m − 1)! T^{−m} (ω²_T)^{−m} Trace[(Ω_T A_T W_b A_T)^m]   (A.37)
           = 2^{m−1} (m − 1)! T^{−m} Σ_r ∏_{j=1}^{m} k_b(r_{2j}/T, r_{2j+2}/T) + O( (2^m/T²) (c*₁ b)^{m−2} )
           = 2^{m−1} (m − 1)! ∫₀¹ ··· ∫₀¹ ∏_{j=1}^{m} k*_b(τ_j, τ_{j+1}) dτ₁ ··· dτ_m + O( (2^m/T²) (c*₁ b)^{m−2} )
           = κ_m + O( (m! 2^m / T²) (c*₁ b)^{m−2} )

uniformly over m.

Finally, we consider α_{mT}. Note that α_{1T} = E(ς_{bT} − µ_{bT}) = 0 and

    α_{mT} = Σ_π [ m! / ( (j₁!)^{m₁} (j₂!)^{m₂} ··· (j_k!)^{m_k} ) ] × [ 1 / (m₁! m₂! ··· m_k!) ] ∏_{j∈π} κ_{jT},   (A.38)

where the summation Σ_π is defined in (A.12). Combining the preceding formula with (A.17) gives

    α_{mT} = α_m + O( (2^m/T²) (c*₁ b)^{m−2} ) Σ_π [ m! / (m₁! m₂! ··· m_k!) ]   (A.39)
           = α_m + O( (m! 2^{2m} / T²) (c*₁ b)^{m−2} )

uniformly over m, where the last line follows because Σ_π 1/(m₁! m₂! ··· m_k!) < 2^m. Q.E.D.

LEMMA 3: Let Assumptions 2 and 3 hold. If b → 0 and T → ∞ such that bT → ∞, then

    µ_{bT} = ∫₀¹ k*_b(r, r) dr − (bT)^{−q} g_q ( ω_T^{−2} Σ_{h=−∞}^{∞} |h|^q γ(h) ) (1 + o(1)) + O(1/T);   (A.40)

    κ_{2T} = 2 ∫₀¹ ∫₀¹ (k*_b(r, s))² dr ds (1 + o(1)) + O(1/T);   (A.41)

and, for m = 3 and 4,

    κ_{mT} = O(b^{m−1}) + O(1/T).   (A.42)

PROOF OF LEMMA 3: We have proved (A.40) in the proof of Lemma 2 because equation (A.26) holds for both fixed b and decreasing b. It remains to consider κ_{mT} for m = 2, 3, and 4. We first consider κ_{2T} = 2T^{−2} ω_T^{−4} Trace[(Ω_T A_T W_b A_T)²]. As a first step, we have

    Trace[(Ω_T A_T W_b A_T)²]   (A.43)
      = Σ_{r₁,r₂,r₃,r₄} k_b(r₂/T, r₃/T) k_b(r₄/T, r₁/T) γ(r₁ − r₂) γ(r₃ − r₄)
      = Σ_{r₂=1}^{T} Σ_{h₁=1−r₂}^{T−r₂} Σ_{r₄=1}^{T} Σ_{h₂=1−r₄}^{T−r₄} k_b(r₂/T, (r₄ + h₂)/T) k_b(r₄/T, (r₂ + h₁)/T) γ(h₁) γ(h₂)
      = ( Σ_{h₁=1}^{T−1} Σ_{r₂=1}^{T−h₁} + Σ_{h₁=1−T}^{0} Σ_{r₂=1−h₁}^{T} ) ( Σ_{h₂=1}^{T−1} Σ_{r₄=1}^{T−h₂} + Σ_{h₂=1−T}^{0} Σ_{r₄=1−h₂}^{T} )
          k_b(r₂/T, (r₄ + h₂)/T) k_b(r₄/T, (r₂ + h₁)/T) γ(h₁) γ(h₂)
      := I₁ + I₂ + I₃ + I₄,


where

    I₁ = Σ_{h₁=1}^{T−1} Σ_{h₂=1}^{T−1} Σ_{r₂=1}^{T−h₁} Σ_{r₄=1}^{T−h₂} k_b(r₂/T, (r₄ + h₂)/T) k_b(r₄/T, (r₂ + h₁)/T) γ(h₁) γ(h₂),

    I₂ = Σ_{h₁=1}^{T−1} Σ_{r₂=1}^{T−h₁} Σ_{h₂=1−T}^{0} Σ_{r₄=1−h₂}^{T} k_b(r₂/T, (r₄ + h₂)/T) k_b(r₄/T, (r₂ + h₁)/T) γ(h₁) γ(h₂),

    I₃ = Σ_{h₁=1−T}^{0} Σ_{r₂=1−h₁}^{T} Σ_{h₂=1}^{T−1} Σ_{r₄=1}^{T−h₂} k_b(r₂/T, (r₄ + h₂)/T) k_b(r₄/T, (r₂ + h₁)/T) γ(h₁) γ(h₂),

and

    I₄ = Σ_{h₁=1−T}^{0} Σ_{r₂=1−h₁}^{T} Σ_{h₂=1−T}^{0} Σ_{r₄=1−h₂}^{T} k_b(r₂/T, (r₄ + h₂)/T) k_b(r₄/T, (r₂ + h₁)/T) γ(h₁) γ(h₂).

We now consider each term in turn. Using equation (A.33), we have

    k_b(r₂/T, (r₄ + h₂)/T) = k_b(r₂/T, r₄/T) + O( b|h₂|/T )   (A.44)

and

    k_b(r₄/T, (r₂ + h₁)/T) = k_b(r₄/T, r₂/T) + O( b|h₁|/T ).   (A.45)

It follows from (A.44) and (A.45) that

    I₁ = Σ_{h₁=1}^{T−1} Σ_{h₂=1}^{T−1} Σ_{r₂=1}^{T−1} Σ_{r₄=1}^{T−1} k_b(r₂/T, (r₄ + h₂)/T) k_b(r₄/T, (r₂ + h₁)/T) γ(h₁) γ(h₂)   (A.46)
         + O( Σ_{h₁=1}^{T−1} Σ_{h₂=1}^{T−1} [ T(|h₁| + |h₂|) + |h₁h₂| ] |γ(h₁) γ(h₂)| )
       = Σ_{h₁=1}^{T−1} Σ_{h₂=1}^{T−1} Σ_{r₂=1}^{T−1} Σ_{r₄=1}^{T−1} k_b(r₂/T, (r₄ + h₂)/T) k_b(r₄/T, (r₂ + h₁)/T) γ(h₁) γ(h₂) + O(T)
       = Σ_{h₁=1}^{T−1} Σ_{h₂=1}^{T−1} Σ_{r₂=1}^{T−1} Σ_{r₄=1}^{T−1} k_b(r₂/T, r₄/T) k_b(r₄/T, r₂/T) γ(h₁) γ(h₂) + O(T)
         + O( Σ_{h₁=1}^{T−1} Σ_{h₂=1}^{T−1} Σ_{r₂=1}^{T−1} Σ_{r₄=1}^{T−1} | k_b(r₂/T, r₄/T) | ( b(|h₁| + |h₂|)/T ) |γ(h₁) γ(h₂)| )
       = Σ_{h₁=1}^{T−1} Σ_{h₂=1}^{T−1} Σ_{r₂=1}^{T−1} Σ_{r₄=1}^{T−1} k_b(r₂/T, r₄/T) k_b(r₄/T, r₂/T) γ(h₁) γ(h₂) + O(T).

Following the same procedure, we can show that

    I₂ = Σ_{h₁=1}^{T−1} Σ_{r₂=1}^{T} Σ_{h₂=1−T}^{0} Σ_{r₄=1}^{T} k_b(r₂/T, r₄/T) k_b(r₄/T, r₂/T) γ(h₁) γ(h₂) + O(T),   (A.47)

    I₃ = Σ_{h₁=1−T}^{0} Σ_{r₂=1}^{T} Σ_{h₂=1}^{T−1} Σ_{r₄=1}^{T} k_b(r₂/T, r₄/T) k_b(r₄/T, r₂/T) γ(h₁) γ(h₂) + O(T),   (A.48)

and

    I₄ = Σ_{h₁=1−T}^{0} Σ_{r₂=1}^{T} Σ_{h₂=1−T}^{0} Σ_{r₄=1}^{T} k_b(r₂/T, r₄/T) k_b(r₄/T, r₂/T) γ(h₁) γ(h₂) + O(T).   (A.49)

As a consequence,

    Trace[(Ω_T A_T W_b A_T)²] = Σ_{r₂,r₄} [ k_b(r₂/T, r₄/T) ]² ( Σ_{h=1−T}^{T−1} γ(h) )² + O(T)   (A.50)

and

    κ_{2T} = 2T^{−2} ω_T^{−4} Trace[(Ω_T A_T W_b A_T)²]   (A.51)
           = 2 ∫₀¹ ∫₀¹ (k*_b(r, s))² dr ds (1 + o(1)) + O(1/T).

The proof for κ_{mT} for m = 3 and 4 is essentially the same, except that we use Lemma 1 to obtain the first term O(b^{m−1}). The details are omitted. Q.E.D.


A.2. Proofs of the Main Results

PROOF OF THEOREM 1: Using the independence between W(1) and Ξ_b, we have

    F_δ(z) = P{ |(W(1) + δ) Ξ_b^{−1/2}| < z } = E G_δ(z²Ξ_b).   (A.52)

Taking a fourth-order Taylor expansion of G_δ(z²Ξ_b) around µ_b z² yields

    G_δ(z²Ξ_b) = G_δ(µ_b z²) + G′_δ(µ_b z²) z² (Ξ_b − µ_b) + (1/2) G″_δ(µ_b z²) z⁴ (Ξ_b − µ_b)²   (A.53)
                 + (1/6) G‴_δ(µ_b z²) z⁶ (Ξ_b − µ_b)³ + (1/24) G⁽⁴⁾_δ(µ̄_b z²) z⁸ (Ξ_b − µ_b)⁴,

where µ̄_b lies on the line segment between µ_b and Ξ_b. Taking expectations on both sides of the equation and using the fact that |G⁽⁴⁾_δ(µ̄_b z²) z⁸| ≤ C for some constant C, we have

    E G_δ(z²Ξ_b) = G_δ(µ_b z²) + (1/2) G″_δ(µ_b z²) z⁴ E(Ξ_b − µ_b)² + (1/6) G‴_δ(µ_b z²) z⁶ α₃ + O(|α₄|)   (A.54)

as b → 0, where the O(·) term holds uniformly over z ∈ R⁺.

In view of Lemma 1, we have

    |α₄| = O(b³),  |α₃| ≤ |α₄|^{3/4} = O(b^{9/4}) = o(b²).   (A.55)

As a consequence,

    F_δ(z) = P{ |(W(1) + δ) Ξ_b^{−1/2}| < z } = G_δ(µ_b z²) + (1/2) G″_δ(µ_b z²) z⁴ α₂ + o(b²)   (A.56)

uniformly over z ∈ R⁺, where

    µ_b = EΞ_b = ∫₀¹ k*_b(r, r) dr = 1 − ∫₀¹ ∫₀¹ k_b(r − s) dr ds   (A.57)

and

    α₂ = 2 ( ∫₀¹ ∫₀¹ k_b(r − s) dr ds )² + 2 ∫₀¹ ∫₀¹ k²_b(r − s) dr ds   (A.58)
         − 4 ∫₀¹ ∫₀¹ ∫₀¹ k_b(r − p) k_b(r − q) dr dp dq.

We proceed to develop asymptotic expansions of µ_b and α₂ as b → 0. Let

    K₁(λ) = (1/2π) ∫_{−∞}^{∞} k(x) exp(−iλx) dx,   (A.59)
    K₂(λ) = (1/2π) ∫_{−∞}^{∞} k²(x) exp(−iλx) dx.

Then

    k(x) = ∫_{−∞}^{∞} K₁(λ) exp(iλx) dλ,  k²(x) = ∫_{−∞}^{∞} K₂(λ) exp(iλx) dλ.   (A.60)

For the integral that appears in both µ_b and α₂, we have

    ∫₀¹ ∫₀¹ k_b(r − s) dr ds   (A.61)
      = ∫_{−∞}^{∞} K₁(λ) ∫₀¹ ∫₀¹ exp( iλ(r − s)/b ) dr ds dλ
      = ∫_{−∞}^{∞} K₁(λ) ( ∫₀¹ exp(iλr/b) dr ) ( ∫₀¹ exp(−iλs/b) ds ) dλ
      = ∫_{−∞}^{∞} K₁(λ) (b²/λ²) [ (1 − cos(λ/b))² + sin²(λ/b) ] dλ
      = 4b² ∫_{−∞}^{∞} K₁(λ) λ^{−2} sin²(λ/(2b)) dλ
      = 2πbK₁(0) + 4b² ∫_{−∞}^{∞} [ (K₁(λ) − K₁(0))/λ² ] sin²(λ/(2b)) dλ,

where the last equality holds because

    ∫_{−∞}^{∞} (λ/(2b))^{−2} sin²(λ/(2b)) dλ = 2b ∫_{−∞}^{∞} x^{−2} sin²x dx = 2πb.   (A.62)

Now,

    ∫_{−∞}^{∞} [ (K₁(λ) − K₁(0))/λ² ] sin²(λ/(2b)) dλ   (A.63)
      = ∫_{−∞}^{∞} [ (K₁(λ) − K₁(0))/λ² ] ( sin²(λ/(2b)) − 1/2 ) dλ + (1/2) ∫_{−∞}^{∞} (K₁(λ) − K₁(0))/λ² dλ
      = −(1/2) ∫_{−∞}^{∞} [ (K₁(λ) − K₁(0))/λ² ] cos(λ/b) dλ + (1/2) ∫_{−∞}^{∞} (K₁(λ) − K₁(0))/λ² dλ
      = (1/2) ∫_{−∞}^{∞} (K₁(λ) − K₁(0))/λ² dλ + o(1)

as b → 0, where we have used the Riemann–Lebesgue lemma. In view of the symmetry of k(x), K₁(λ) = (1/2π) ∫_{−∞}^{∞} k(x) cos(λx) dx and, therefore, (A.61) and (A.63) lead to

    ∫₀¹ ∫₀¹ k_b(r − s) dr ds   (A.64)
      = 2πbK₁(0) + 2b² ∫_{−∞}^{∞} (K₁(λ) − K₁(0))/λ² dλ + o(b²)
      = 2πbK₁(0) + b² (1/π) ∫_{−∞}^{∞} ∫_{−∞}^{∞} k(x) (cos λx − 1)/λ² dx dλ + o(b²)
      = 2πbK₁(0) − b² ∫_{−∞}^{∞} k(x)|x| dx + o(b²)
      = b c₁ + b² c₃ + o(b²).

Similarly,

    ∫₀¹ ∫₀¹ k²_b(r − s) dr ds = b c₂ + b² c₄ + o(b²).   (A.65)

Next,

    ∫₀¹ k_b(r − s) ds   (A.66)
      = (1/2) ∫_{−∞}^{∞} K₁(λ) ∫₀¹ [ exp( iλ(r − s)/b ) + exp( −iλ(r − s)/b ) ] ds dλ
      = ∫_{−∞}^{∞} K₁(λ) ∫₀¹ cos( λ(r − s)/b ) ds dλ
      = b ∫_{−∞}^{∞} K₁(λ) λ^{−1} [ sin(λ(r − 1)/b) − sin(λr/b) ] dλ
      = b ∫_{−∞}^{∞} K₁(xb) x^{−1} [ sin(x(r − 1)) − sin(xr) ] dx,

so

    ∫₀¹ ∫₀¹ ∫₀¹ k_b(r − p) k_b(r − q) dr dp dq   (A.67)
      = b² ∫₀¹ [ ∫_{−∞}^{∞} K₁(xb) x^{−1} ( sin(x(r − 1)) − sin(xr) ) dx ]² dr
      = b² K₁²(0) ∫₀¹ ( ∫_{−∞}^{∞} x^{−1} sin(x(r − 1)) dx − ∫_{−∞}^{∞} x^{−1} sin(xr) dx )² dr (1 + o(1))
      = b² K₁²(0) ∫₀¹ ( ∫_{−∞}^{∞} ( sin(x(r − 1))/(x(r − 1)) ) d(x(r − 1)) − ∫_{−∞}^{∞} ( sin(xr)/(xr) ) d(xr) )² dr (1 + o(1))
      = b² K₁²(0) ∫₀¹ ( 2 ∫_{−∞}^{∞} y^{−1} sin y dy )² dr (1 + o(1)) = c₁² b² + o(b²).

Combining (A.64), (A.65), and (A.67) yields

    µ_b = 1 − b c₁ − b² c₃ + o(b²)   (A.68)

and

    α₂ = 2b c₂ + b² (c₄ − 2c₁²) + o(b²).   (A.69)
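For the Bartlett kernel the leading expansion (A.64) can be checked exactly: ∫₀¹∫₀¹ k_b(r − s) dr ds = ∫(1 − |v|) k(v/b) dv = b − b²/3 for b ≤ 1, which is bc₁ + b²c₃ with c₁ = ∫k(x)dx = 1 and c₃ = −∫k(x)|x|dx = −1/3. A short numerical confirmation (the kernel choice and the quadrature grid are assumptions of this sketch):

```python
import numpy as np

# Check (A.64) for the Bartlett kernel k(x) = (1 - |x|)_+ with b = 0.1:
# the double integral equals b*c1 + b^2*c3 = b - b^2/3 exactly here.
b = 0.1
v = np.linspace(-b, b, 200_001)
integrand = (1 - np.abs(v)) * np.clip(1 - np.abs(v) / b, 0.0, None)
integral = integrand.sum() * (v[1] - v[0])   # endpoints vanish, so this is
                                             # a trapezoid-rule quadrature
print(round(integral, 7), round(b - b**2 / 3, 7))
```

The agreement to machine-level quadrature error illustrates that for this kernel the o(b²) remainder in (A.64) is identically zero.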

Now

    F_δ(z) = G_δ(µ_b z²) + (1/2) G″_δ(µ_b z²) z⁴ α₂ + o(b²)   (A.70)
           = G_δ(z²) − G′_δ(z²) z² b c₁ − G′_δ(z²) z² b² c₃ + (1/2) G″_δ(z²) z⁴ c₁² b²
             + G″_δ(z²) z⁴ b c₂ + (1/2) G″_δ(z²) z⁴ b² (c₄ − 2c₁²) + (1/2) G‴_δ(z²) z⁶ (−b c₁)(2b c₂) + o(b²)
           = G_δ(z²) + [ c₂ G″_δ(z²) z⁴ − c₁ G′_δ(z²) z² ] b
             − ( G′_δ(z²) z² c₃ − (1/2) G″_δ(z²) z⁴ (c₄ − c₁²) + G‴_δ(z²) z⁶ c₁ c₂ ) b² + o(b²)

uniformly over z ∈ R⁺, where the uniformity holds because |G′_δ(z²)z²| ≤ C, |G″_δ(z²)z⁴| ≤ C, and |G‴_δ(z²)z⁶| ≤ C for some constant C. This completes the proof of the theorem. Q.E.D.

PROOF OF COROLLARY 2: Using a power series expansion, we have
\[
\begin{aligned}
F_0(z_{\alpha b}) &= D(z^2_{\alpha b}) + \bigl[c_2 D''(z^2_{\alpha b})z^4_{\alpha b} - c_1 D'(z^2_{\alpha b})z^2_{\alpha b}\bigr]b\\
&\quad - \Bigl(D'(z^2_{\alpha b})z^2_{\alpha b}c_3 - \frac{1}{2}D''(z^2_{\alpha b})z^4_{\alpha b}(c_4 - c_1^2) + D'''(z^2_{\alpha b})z^6_{\alpha b}c_1 c_2\Bigr)b^2 + o(b^2)\\
&= D(z^2_\alpha) + D'(z^2_\alpha)(z^2_{\alpha b}-z^2_\alpha) + \frac{1}{2}D''(z^2_\alpha)(z^2_{\alpha b}-z^2_\alpha)^2\\
&\quad + \bigl[c_2 D''(z^2_\alpha)z^4_\alpha - c_1 D'(z^2_\alpha)z^2_\alpha\bigr]b\\
&\quad + \bigl[c_2 D'''(z^2_\alpha)z^4_\alpha + 2c_2 D''(z^2_\alpha)z^2_\alpha - c_1 D''(z^2_\alpha)z^2_\alpha - c_1 D'(z^2_\alpha)\bigr](z^2_{\alpha b}-z^2_\alpha)b\\
&\quad - \Bigl(D'(z^2_\alpha)z^2_\alpha c_3 - \frac{1}{2}D''(z^2_\alpha)z^4_\alpha(c_4-c_1^2) + D'''(z^2_\alpha)z^6_\alpha c_1 c_2\Bigr)b^2 + o(b^2),
\end{aligned}
\tag{A.71}
\]

that is, because $F_0(z_{\alpha b}) = 1 - \alpha = D(z^2_\alpha)$,
\[
\begin{aligned}
&D'(z^2_\alpha)(z^2_{\alpha b}-z^2_\alpha) + \frac{1}{2}D''(z^2_\alpha)(z^2_{\alpha b}-z^2_\alpha)^2\\
&\quad + \bigl[c_2 D''(z^2_\alpha)z^4_\alpha - c_1 D'(z^2_\alpha)z^2_\alpha\bigr]b\\
&\quad + \bigl[c_2 D'''(z^2_\alpha)z^4_\alpha + 2c_2 D''(z^2_\alpha)z^2_\alpha - c_1 D''(z^2_\alpha)z^2_\alpha - c_1 D'(z^2_\alpha)\bigr](z^2_{\alpha b}-z^2_\alpha)b\\
&\quad - \Bigl(D'(z^2_\alpha)z^2_\alpha c_3 - \frac{1}{2}D''(z^2_\alpha)z^4_\alpha(c_4-c_1^2) + D'''(z^2_\alpha)z^6_\alpha c_1 c_2\Bigr)b^2 + o(b^2)\\
&= 0.
\end{aligned}
\tag{A.72}
\]

Let
\[
z^2_{\alpha b} = z^2_\alpha + k_1 b + k_2 b^2 + o(b^2).
\tag{A.73}
\]


Then
\[
\begin{aligned}
&\bigl[c_2 D''(z^2_\alpha)z^4_\alpha - c_1 D'(z^2_\alpha)z^2_\alpha\bigr]b + D'(z^2_\alpha)k_1 b\\
&\quad - \Bigl(D'(z^2_\alpha)z^2_\alpha c_3 - \frac{1}{2}D''(z^2_\alpha)z^4_\alpha(c_4-c_1^2) + D'''(z^2_\alpha)z^6_\alpha c_1 c_2\Bigr)b^2\\
&\quad + \bigl[c_2 D'''(z^2_\alpha)z^4_\alpha + 2c_2 D''(z^2_\alpha)z^2_\alpha - c_1 D''(z^2_\alpha)z^2_\alpha - c_1 D'(z^2_\alpha)\bigr]k_1 b^2\\
&\quad + \frac{1}{2}D''(z^2_\alpha)k_1^2 b^2 + D'(z^2_\alpha)k_2 b^2 + o(b^2) = 0.
\end{aligned}
\tag{A.74}
\]

This implies that
\[
k_1 = -\frac{1}{D'(z^2_\alpha)}\bigl[c_2 D''(z^2_\alpha)z^4_\alpha - c_1 D'(z^2_\alpha)z^2_\alpha\bigr]
\tag{A.75}
\]

and
\[
\begin{aligned}
k_2 = -\frac{1}{D'(z^2_\alpha)}\biggl[&-\Bigl(D'(z^2_\alpha)z^2_\alpha c_3 - \frac{1}{2}D''(z^2_\alpha)z^4_\alpha(c_4-c_1^2) + D'''(z^2_\alpha)z^6_\alpha c_1 c_2\Bigr)\\
&+\bigl(c_2 D'''(z^2_\alpha)z^4_\alpha + 2c_2 D''(z^2_\alpha)z^2_\alpha - c_1 D''(z^2_\alpha)z^2_\alpha - c_1 D'(z^2_\alpha)\bigr)k_1\\
&+\frac{1}{2}D''(z^2_\alpha)k_1^2\biggr].
\end{aligned}
\tag{A.76}
\]

Now
\[
D'(z) = \frac{z^{-1/2}e^{-z/2}}{\Gamma(1/2)\sqrt{2}},\qquad
D''(z) = \frac{1}{4\sqrt{\pi}\,z^2}\bigl(-\sqrt{2}\,z^{1/2}e^{-z/2} - \sqrt{2}\,z^{3/2}e^{-z/2}\bigr),
\tag{A.77}
\]
\[
D'''(z) = \frac{1}{8\sqrt{\pi}\,z^{7/2}}\bigl(3\sqrt{2}\,z\,e^{-z/2} + 2\sqrt{2}\,z^2 e^{-z/2} + \sqrt{2}\,z^3 e^{-z/2}\bigr),
\tag{A.78}
\]
and thus
\[
\frac{D''(z^2)}{D'(z^2)} = \frac{1}{4z^3}\bigl(-2z - 2z^3\bigr),\qquad
\frac{D'''(z^2)}{D'(z^2)} = \frac{1}{4z^4}\bigl(2z^2 + z^4 + 3\bigr).
\tag{A.79}
\]
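The two ratios in (A.79) are easy to verify numerically from the $\chi^2_1$ density $D'$ in (A.77) using finite differences; the evaluation point $z = 1.96$ below is purely illustrative.

```python
import numpy as np

# D'(z) = z^{-1/2} e^{-z/2} / sqrt(2*pi), the chi-square(1) density from (A.77)
# (note Gamma(1/2) = sqrt(pi)).
def Dp(z):
    return z ** -0.5 * np.exp(-z / 2.0) / np.sqrt(2.0 * np.pi)

z = 1.96
w = z ** 2
h = 1e-5
Dpp = (Dp(w + h) - Dp(w - h)) / (2 * h)              # finite-difference D''(z^2)
Dppp = (Dp(w + h) - 2 * Dp(w) + Dp(w - h)) / h ** 2  # finite-difference D'''(z^2)
ratio2 = (-2 * z - 2 * z ** 3) / (4 * z ** 3)        # (A.79), first ratio
ratio3 = (2 * z ** 2 + z ** 4 + 3) / (4 * z ** 4)    # (A.79), second ratio
err2 = abs(Dpp / Dp(w) - ratio2)
err3 = abs(Dppp / Dp(w) - ratio3)
```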

Hence
\[
k_1 = \Bigl(c_1 + \frac{1}{2}c_2\Bigr)z^2_\alpha + \frac{1}{2}c_2 z^4_\alpha
\tag{A.80}
\]

and
\[
\begin{aligned}
k_2 = {}&\Bigl(\frac{1}{2}c_1^2 + \frac{3}{2}c_1 c_2 + \frac{3}{16}c_2^2 + c_3 + \frac{1}{4}c_4\Bigr)z^2_\alpha\\
&+\Bigl(-\frac{1}{2}c_1^2 + \frac{3}{2}c_1 c_2 + \frac{9}{16}c_2^2 + \frac{1}{4}c_4\Bigr)z^4_\alpha + \frac{5}{16}c_2^2 z^6_\alpha - \frac{1}{16}c_2^2 z^8_\alpha,
\end{aligned}
\tag{A.81}
\]

as desired.

It follows from $z^2_{\alpha b} = z^2_\alpha + k_1 b + k_2 b^2 + o(b^2)$ that
\[
\begin{aligned}
z_{\alpha b} &= z_\alpha\left(1 + \frac{1}{2}\,\frac{k_1 b + k_2 b^2}{z^2_\alpha} - \frac{1}{8}\,\frac{k_1^2 b^2}{z^4_\alpha}\right) + o(b^2)\\
&= z_\alpha + \frac{1}{2}\,\frac{k_1}{z_\alpha}\,b + \left(\frac{1}{2}\,\frac{k_2}{z_\alpha} - \frac{1}{8}\,\frac{k_1^2}{z^3_\alpha}\right)b^2 + o(b^2)\\
&= z_\alpha + k_3 b + k_4 b^2 + o(b^2),
\end{aligned}
\tag{A.82}
\]

where
\[
k_3 = \frac{1}{2}\Bigl[\Bigl(c_1 + \frac{1}{2}c_2\Bigr)z_\alpha + \frac{1}{2}c_2 z^3_\alpha\Bigr]
\tag{A.83}
\]

and
\[
\begin{aligned}
k_4 = {}&\Bigl(\frac{1}{2}c_3 + \frac{1}{8}c_4 + \frac{5}{8}c_1 c_2 + \frac{1}{8}c_1^2 + \frac{1}{16}c_2^2\Bigr)z_\alpha\\
&+\Bigl(-\frac{1}{4}c_1^2 + \frac{1}{8}c_4 + \frac{5}{8}c_1 c_2 + \frac{7}{32}c_2^2\Bigr)z^3_\alpha + \frac{1}{8}c_2^2 z^5_\alpha - \frac{1}{32}c_2^2 z^7_\alpha.
\end{aligned}
\tag{A.84}
\]
Q.E.D.
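As a sanity check, the sketch below verifies (A.80) against (A.75) (via the first ratio in (A.79)) and then checks the square-root expansion in (A.82)–(A.83) numerically. The values $c_1 = 1$, $c_2 = 2/3$ (Bartlett constants), $z_\alpha = 1.96$, and $k_2 = 35$ are illustrative assumptions only.

```python
import numpy as np

c1, c2 = 1.0, 2.0 / 3.0   # illustrative (Bartlett) values of c1, c2
z = 1.96                  # illustrative z_alpha

# (A.75), using D''(z^2)/D'(z^2) = -(1/2)(1 + z^{-2}) from (A.79),
# after dividing numerator and denominator by D'(z_alpha^2):
ratio2 = -0.5 * (1.0 + z ** -2)
k1_A75 = -(c2 * ratio2 * z ** 4 - c1 * z ** 2)
# (A.80), the closed form:
k1_A80 = (c1 + 0.5 * c2) * z ** 2 + 0.5 * c2 * z ** 4
err_k1 = abs(k1_A75 - k1_A80)

# (A.82): sqrt(z^2 + k1*b + k2*b^2) = z + k3*b + k4*b^2 + o(b^2),
# with k3 = k1/(2z) and k4 = k2/(2z) - k1^2/(8z^3).
k1, k2, b = k1_A80, 35.0, 1e-3   # k2 is an arbitrary illustrative value
exact = np.sqrt(z ** 2 + k1 * b + k2 * b ** 2)
k3 = 0.5 * k1 / z
k4 = 0.5 * k2 / z - 0.125 * k1 ** 2 / z ** 3
err_sqrt = abs(exact - (z + k3 * b + k4 * b ** 2))   # should be O(b^3)
```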

PROOF OF COROLLARY 3: For notational convenience, let
\[
p^{(1)}(z^2_\alpha) = \Bigl(c_1 + \frac{c_2}{2}\Bigr)z^2_\alpha + \frac{c_2}{2}z^4_\alpha,
\tag{A.85}
\]
and then $z^2_{\alpha b} = z^2_\alpha + p^{(1)}(z^2_\alpha)b + o(b)$. We have

\[
\begin{aligned}
&1 - EG_\delta(z^2_{\alpha b}\Xi_b)\\
&\quad= 1 - G_\delta\bigl(z^2_\alpha + p^{(1)}(z^2_\alpha)b\bigr) - c_2 G''_\delta\bigl(z^2_\alpha + p^{(1)}(z^2_\alpha)b\bigr)\bigl[z^2_\alpha + bp^{(1)}(z^2_\alpha)\bigr]^2 b\\
&\qquad + c_1 G'_\delta\bigl(z^2_\alpha + p^{(1)}(z^2_\alpha)b\bigr)\bigl(z^2_\alpha + p^{(1)}(z^2_\alpha)b\bigr)b + o(b)\\
&\quad= 1 - G_\delta(z^2_\alpha) - G'_\delta(z^2_\alpha)p^{(1)}(z^2_\alpha)b - \bigl[c_2 G''_\delta(z^2_\alpha)z^4_\alpha - c_1 G'_\delta(z^2_\alpha)z^2_\alpha\bigr]b + o(b)\\
&\quad= 1 - G_\delta(z^2_\alpha) - G'_\delta(z^2_\alpha)\Bigl[\Bigl(c_1 + \frac{c_2}{2}\Bigr)z^2_\alpha + \frac{c_2}{2}z^4_\alpha\Bigr]b\\
&\qquad - \bigl[c_2 G''_\delta(z^2_\alpha)z^4_\alpha - c_1 G'_\delta(z^2_\alpha)z^2_\alpha\bigr]b + o(b)\\
&\quad= 1 - G_\delta(z^2_\alpha) - \Bigl(\frac{c_2}{2}G'_\delta(z^2_\alpha)z^4_\alpha + c_2 G''_\delta(z^2_\alpha)z^4_\alpha + \frac{c_2}{2}G'_\delta(z^2_\alpha)z^2_\alpha\Bigr)b + o(b)\\
&\quad= 1 - G_\delta(z^2_\alpha) - c_2\Bigl(\frac{1}{2}G'_\delta(z^2_\alpha)z^4_\alpha + G''_\delta(z^2_\alpha)z^4_\alpha + \frac{1}{2}G'_\delta(z^2_\alpha)z^2_\alpha\Bigr)b + o(b).
\end{aligned}
\tag{A.86}
\]

Note that
\[
G'_\delta(z) = \sum_{j=0}^{\infty}\frac{(\delta^2/2)^j}{j!}e^{-\delta^2/2}\,\frac{z^{j-1/2}e^{-z/2}}{\Gamma(j+1/2)2^{j+1/2}}
\tag{A.87}
\]
and
\[
\begin{aligned}
G''_\delta(z) &= \sum_{j=0}^{\infty}\frac{(\delta^2/2)^j}{j!}e^{-\delta^2/2}\left(\Bigl(j-\frac{1}{2}\Bigr)\frac{1}{z}\,\frac{z^{j-1/2}e^{-z/2}}{\Gamma(j+1/2)2^{j+1/2}} - \frac{1}{2}\,\frac{z^{j-1/2}e^{-z/2}}{\Gamma(j+1/2)2^{j+1/2}}\right)\\
&= \Bigl(-\frac{1}{2z}-\frac{1}{2}\Bigr)G'_\delta(z) + \sum_{j=0}^{\infty}\frac{(\delta^2/2)^j}{j!}e^{-\delta^2/2}\,\frac{z^{j-1/2}e^{-z/2}}{\Gamma(j+1/2)2^{j+1/2}}\,\frac{j}{z}\\
&= -\frac{1}{2}G'_\delta(z)\Bigl(\frac{1}{z}+1\Bigr) + K_\delta(z),
\end{aligned}
\tag{A.88}
\]
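Since $G_\delta$ is the noncentral $\chi^2(1,\delta^2)$ distribution function, the series (A.87) is a Poisson mixture of central $\chi^2_{2j+1}$ densities. The sketch below checks it against SciPy's noncentral chi-square density; $z$ and $\delta$ are illustrative values.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import ncx2

def Gp(z, delta, nterms=80):
    """G'_delta(z): the series in (A.87), summed in logs for stability."""
    j = np.arange(nterms)
    # Poisson(delta^2/2) weights
    logw = j * np.log(delta ** 2 / 2.0) - gammaln(j + 1) - delta ** 2 / 2.0
    # central chi-square(2j+1) density z^{j-1/2} e^{-z/2} / (Gamma(j+1/2) 2^{j+1/2})
    logf = (j - 0.5) * np.log(z) - z / 2.0 - gammaln(j + 0.5) - (j + 0.5) * np.log(2.0)
    return float(np.sum(np.exp(logw + logf)))

z, delta = 3.0, 1.5
series = Gp(z, delta)
direct = ncx2.pdf(z, df=1, nc=delta ** 2)   # noncentral chi-square(1, delta^2) density
err = abs(series - direct)
```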

so that
\[
\frac{1}{2}G'_\delta(z^2_\alpha)z^4_\alpha + G''_\delta(z^2_\alpha)z^4_\alpha + \frac{1}{2}G'_\delta(z^2_\alpha)z^2_\alpha = z^4_\alpha K_\delta(z^2_\alpha)
\tag{A.89}
\]
and
\[
1 - EG_\delta(z^2_{\alpha b}\Xi_b) = 1 - G_\delta(z^2_\alpha) - c_2 z^4_\alpha K_\delta(z^2_\alpha)b + o(b),
\tag{A.90}
\]
completing the proof of the corollary. Q.E.D.
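The decomposition (A.88) and the identity (A.89) can be checked numerically from the same series, with $K_\delta$ computed term by term; $\delta$ and $z_\alpha$ below are illustrative.

```python
import numpy as np
from scipy.special import gammaln

delta, z_alpha = 1.5, 1.96
u = z_alpha ** 2
j = np.arange(80)
logw = j * np.log(delta ** 2 / 2.0) - gammaln(j + 1) - delta ** 2 / 2.0
logf = (j - 0.5) * np.log(u) - u / 2.0 - gammaln(j + 0.5) - (j + 0.5) * np.log(2.0)
wf = np.exp(logw + logf)           # terms of the series (A.87) evaluated at z_alpha^2
Gp = float(np.sum(wf))             # G'_delta(z_alpha^2)
Kd = float(np.sum(wf * j / u))     # K_delta(z_alpha^2), the extra series in (A.88)
Gpp = -0.5 * Gp * (1.0 / u + 1.0) + Kd   # (A.88)

# Check (A.88) against a finite difference of the series for G'_delta.
h = 1e-6
def Gp_at(uu):
    lf = (j - 0.5) * np.log(uu) - uu / 2.0 - gammaln(j + 0.5) - (j + 0.5) * np.log(2.0)
    return float(np.sum(np.exp(logw + lf)))
err_A88 = abs((Gp_at(u + h) - Gp_at(u - h)) / (2 * h) - Gpp)

# Check the identity (A.89).
lhs = 0.5 * Gp * u ** 2 + Gpp * u ** 2 + 0.5 * Gp * u
rhs = u ** 2 * Kd
err_A89 = abs(lhs - rhs)
```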

PROOF OF THEOREM 4: It follows from Lemma 3 that when $b \to 0$,
\[
\begin{aligned}
\alpha_{2T} &= \kappa_{2T} = 2bc_2(1+o(1)) + O(T^{-1}),\\
\alpha_{3T} &= \kappa_{3T} = O(b^2) + O(T^{-1}),\\
\alpha_{4T} &= \kappa_{4T} + 3\kappa_{2T}^2 = O(b^2) + O(T^{-1}),
\end{aligned}
\tag{A.91}
\]
and
\[
\mu_{bT} = \mu_b - (bT)^{-q}g_q\left(\omega_T^{-2}\sum_{h=-\infty}^{\infty}|h|^q\gamma(h)\right)(1+o(1)) + O(T^{-1}).
\tag{A.92}
\]

Thus, as $b \to 0$,
\[
\begin{aligned}
F_{T\delta}(z) &= P\bigl\{\bigl|\sqrt{T}(\hat\beta-\beta_0)/\hat\omega_b\bigr| \le z\bigr\}\\
&= EG_\delta(z^2\varsigma_{bT}) + O(T^{-1})\\
&= G_\delta(\mu_{bT}z^2) + \frac{1}{2}G''_\delta(\mu_{bT}z^2)z^4\alpha_{2T} + o(b)\\
&= G_\delta(\mu_{bT}z^2) + \frac{1}{2}G''_\delta(\mu_{bT}z^2)z^4(2bc_2) + o(b) + O(T^{-1})\\
&= G_\delta(\mu_b z^2) + G'_\delta(\mu_b z^2)z^2(\mu_{bT}-\mu_b) + bc_2 G''_\delta(\mu_b z^2)z^4\\
&\quad + o(b) + O(T^{-1})
\end{aligned}
\tag{A.93}
\]
uniformly over $z \in \mathbb{R}^+$, using (A.91) and (A.92), but
\[
\begin{aligned}
G_\delta(\mu_b z^2) &= G_\delta(z^2) + G'_\delta(z^2)z^2(\mu_b - 1) + o(b)\\
&= G_\delta(z^2) - bc_1 G'_\delta(z^2)z^2 + o(b)
\end{aligned}
\tag{A.94}
\]

uniformly over $z \in \mathbb{R}^+$ and
\[
\begin{aligned}
&G'_\delta(\mu_b z^2)z^2(\mu_{bT}-\mu_b)\\
&\quad= \bigl(G'_\delta(z^2)+O(b)\bigr)z^2\\
&\qquad\times\left\{-g_q\left(\omega_T^{-2}\sum_{h=-\infty}^{\infty}|h|^q\gamma(h)\right)(bT)^{-q}(1+o(1)) + O(T^{-1})\right\}\\
&\quad= -g_q\left(\omega_T^{-2}\sum_{h=-\infty}^{\infty}|h|^q\gamma(h)\right)G'_\delta(z^2)z^2(bT)^{-q}(1+o(1))\\
&\qquad + o(b) + O(T^{-1})
\end{aligned}
\tag{A.95}
\]

uniformly over $z \in \mathbb{R}^+$. So
\[
\begin{aligned}
F_{T\delta}(z) &= G_\delta(z^2) + \bigl(c_2 G''_\delta(\mu_b z^2)z^4 - c_1 G'_\delta(z^2)z^2\bigr)b\\
&\quad - g_q d_{qT}G'_\delta(z^2)z^2(bT)^{-q} + o\bigl(b + (bT)^{-q}\bigr) + O(T^{-1})
\end{aligned}
\tag{A.96}
\]
uniformly over $z \in \mathbb{R}^+$, as desired. Q.E.D.

PROOF OF COROLLARY 5: (a) Using Theorem 4, we have, as $b + 1/T + 1/(bT) \to 0$,
\[
\begin{aligned}
F_{T0}(z_{\alpha b}) &= D(z^2_{\alpha b}) + \bigl[c_2 D''(z^2_{\alpha b})z^4_{\alpha b} - c_1 D'(z^2_{\alpha b})z^2_{\alpha b}\bigr]b\\
&\quad - g_q d_{qT}D'(z^2_{\alpha b})z^2_{\alpha b}(bT)^{-q} + O(T^{-1}) + o\bigl(b + (bT)^{-q}\bigr)\\
&= F_0(z_{\alpha b}) - g_q d_{qT}D'(z^2_{\alpha b})z^2_{\alpha b}(bT)^{-q} + O(T^{-1}) + o\bigl(b + (bT)^{-q}\bigr)\\
&= 1 - \alpha - g_q d_{qT}D'(z^2_\alpha)z^2_\alpha(bT)^{-q} + O(T^{-1}) + o\bigl(b + (bT)^{-q}\bigr),
\end{aligned}
\tag{A.97}
\]
so
\[
1 - F_{T0}(z_{\alpha b}) - \alpha = g_q d_{qT}D'(z^2_\alpha)z^2_\alpha(bT)^{-q} + O(T^{-1}) + o\bigl(b + (bT)^{-q}\bigr).
\tag{A.98}
\]
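To get a feel for the magnitude of (A.98), the sketch below evaluates the leading term $g_q d_{qT} D'(z^2_\alpha)z^2_\alpha(bT)^{-q}$. The values of $g_q$, $d_{qT}$, $q$, $b$, and $T$ are hypothetical placeholders, not kernel- or data-specific quantities from the paper.

```python
from scipy.stats import chi2, norm

alpha = 0.05
z_alpha = norm.ppf(1.0 - alpha / 2.0)   # z_alpha^2 is the chi-square(1) critical value
g_q, d_qT, q = 1.0, 0.5, 1              # hypothetical constants
b, T = 0.02, 200
Dp = chi2.pdf(z_alpha ** 2, df=1)       # D'(z_alpha^2)
distortion = g_q * d_qT * Dp * z_alpha ** 2 * (b * T) ** -q
```

For these placeholder values the leading size-distortion term is of the order of a percentage point, and it shrinks like $(bT)^{-q}$ as the effective bandwidth grows.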

(b) Plugging $z^2_{\alpha b}$ into
\[
\begin{aligned}
F_{T\delta}(z) &= G_\delta(z^2) + \bigl[c_2 G''_\delta(\mu_b z^2)z^4 - c_1 G'_\delta(z^2)z^2\bigr]b\\
&\quad - g_q d_{qT}G'_\delta(z^2)z^2(bT)^{-q} + o\bigl(b+(bT)^{-q}\bigr) + O(T^{-1})
\end{aligned}
\tag{A.99}
\]
yields
\[
\begin{aligned}
&P\left\{\left|\frac{\sqrt{T}(\hat\beta-\beta_0)}{\hat\omega_b}\right|^2 \ge z^2_{\alpha b}\right\}\\
&\quad= 1 - G_\delta(z^2_{\alpha b}) - \bigl[c_2 G''_\delta(z^2_{\alpha b})z^4_{\alpha b} - c_1 G'_\delta(z^2_{\alpha b})z^2_{\alpha b}\bigr]b\\
&\qquad + g_q d_{qT}G'_\delta(z^2_{\alpha b})z^2_{\alpha b}(bT)^{-q} + O(T^{-1}) + o\bigl(b+(bT)^{-q}\bigr)\\
&\quad= 1 - G_\delta(z^2_\alpha) - c_2 z^4_\alpha K_\delta(z^2_\alpha)b\\
&\qquad + g_q d_{qT}G'_\delta(z^2_\alpha)z^2_\alpha(bT)^{-q} + O(T^{-1}) + o\bigl(b+(bT)^{-q}\bigr),
\end{aligned}
\tag{A.100}
\]
where the last equality follows as in the proof of Corollary 3. Q.E.D.

PROOF OF THEOREM 6: First, since $D(\cdot)$ is a bounded function, we have
\[
\begin{aligned}
P\bigl\{|W(1)\Xi_b^{-1/2}| \le z\bigr\}
&= \lim_{B\to\infty} E\,D(z^2\Xi_b)\,\mathbf{1}\bigl\{|\Xi_b - \mu_b| \le B\bigr\}\\
&= \lim_{B\to\infty} E\sum_{m=0}^{\infty}\frac{1}{m!}D^{(m)}(\mu_b z^2)(\Xi_b-\mu_b)^m z^{2m}\,\mathbf{1}\bigl\{|\Xi_b-\mu_b| \le B\bigr\}\\
&= \lim_{B\to\infty}\sum_{m=0}^{\infty}\frac{1}{m!}D^{(m)}(\mu_b z^2)z^{2m}\,E\bigl[(\Xi_b-\mu_b)^m\mathbf{1}\bigl\{|\Xi_b-\mu_b| \le B\bigr\}\bigr],
\end{aligned}
\tag{A.101}
\]
where the last line follows because the infinite sum $\sum_{m=0}^{\infty}\frac{1}{m!}D^{(m)}(\mu_b z^2)(\Xi_b-\mu_b)^m z^{2m}$ converges uniformly to $D(z^2\Xi_b)$ when $|\Xi_b - \mu_b| \le B$. Uniformity holds because $D(\cdot)$ is infinitely differentiable with bounded derivatives.

Since $D(z^2)$ decays exponentially as $z^2 \to \infty$, there exists a constant $C$ such that $|D^{(m)}(\mu_b z^2)z^{2m}| < C$ for all $m$. Using this and Lemma 1, we have
\[
\begin{aligned}
\left|\sum_{m=1}^{\infty}\frac{1}{m!}D^{(m)}(\mu_b z^2)\alpha_m z^{2m}\right|
&\le C\sum_{m=1}^{\infty}\frac{1}{m!}|\alpha_m| \le C\sum_{m=1}^{\infty}\frac{1}{m!}2^{2m}m!(c_1^*b)^{m-1}\\
&= C(c_1^*b)^{-1}\sum_{m=1}^{\infty}(4c_1^*b)^m < \infty
\end{aligned}
\tag{A.102}
\]

provided that $b < 1/(4c_1^*)$. As a consequence, the operation $\lim_{B\to\infty}$ can be moved inside the summation sign in (A.101), giving
\[
P\bigl\{|W(1)\Xi_b^{-1/2}| \le z\bigr\} = \sum_{m=0}^{\infty}\frac{1}{m!}D^{(m)}(\mu_b z^2)\alpha_m z^{2m}
\tag{A.103}
\]
when $b < 1/(4c_1^*)$.

Second, it follows from
\[
F_{T\delta}(z) = P\bigl\{\bigl|\sqrt{T}(\hat\beta-\beta_0)/\hat\omega_b\bigr| \le z\bigr\} = EG_\delta(z^2\varsigma_{bT}) + O(T^{-1})
\tag{A.104}
\]
that, when $c = 0$, we have
\[
P\bigl\{\bigl|\sqrt{T}(\hat\beta-\beta_0)/\hat\omega_b\bigr| \le z\bigr\} = ED(z^2\varsigma_{bT}) + O(T^{-1}).
\tag{A.105}
\]
But
\[
ED(z^2\varsigma_{bT}) = \sum_{m=0}^{\infty}\frac{1}{m!}D^{(m)}(\mu_{bT}z^2)\alpha_{mT}z^{2m},
\tag{A.106}
\]

where the right-hand side converges to $ED(z^2\varsigma_{bT})$ uniformly over $T$ because
\[
\alpha_{mT} = \alpha_m + O\left(\frac{2^{2m}m!}{T^2}(c_1^*b)^{m-2}\right)
\]
uniformly over $m$ by Lemma 2, $|D^{(m)}(\mu_{bT}z^2)z^{2m}| < C$ for some constant $C$, and thus
\[
\left|\sum_{m=1}^{\infty}\frac{1}{m!}D^{(m)}(\mu_{bT}z^2)\alpha_{mT}z^{2m}\right| \le C\sum_{m=1}^{\infty}\frac{1}{m!}|\alpha_m| + \frac{C}{T^2}\sum_{m=1}^{\infty}2^{2m}(c_1^*b)^{m-2} < \infty
\]


when $b < 1/(4c_1^*)$. Therefore,
\[
P\left\{\left|\sqrt{T}\,\frac{\hat\beta-\beta_0}{\hat\omega_b}\right| \le z\right\} = \sum_{m=0}^{\infty}\frac{1}{m!}D^{(m)}(\mu_{bT}z^2)\alpha_{mT}z^{2m} + O\left(\frac{1}{T}\right)
\tag{A.107}
\]
uniformly over $z \in \mathbb{R}^+$.

It follows from (A.103) and (A.107) that
\[
\begin{aligned}
&|F_{T0}(z) - F_0(z)|\\
&\quad= \left|P\left\{\left|\sqrt{T}\,\frac{\hat\beta-\beta_0}{\hat\omega_b}\right| \le z\right\} - P\bigl\{|W(1)\Xi_b^{-1/2}| \le z\bigr\}\right|\\
&\quad= \left|\sum_{m=0}^{\infty}\frac{1}{m!}D^{(m)}(\mu_{bT}z^2)\alpha_{mT}z^{2m} - \sum_{m=0}^{\infty}\frac{1}{m!}D^{(m)}(\mu_b z^2)\alpha_m z^{2m}\right| + O\left(\frac{1}{T}\right)\\
&\quad= \left|\sum_{m=0}^{\infty}\frac{1}{m!}D^{(m)}(\mu_b z^2)\alpha_{mT}z^{2m} - \sum_{m=0}^{\infty}\frac{1}{m!}D^{(m)}(\mu_b z^2)\alpha_m z^{2m}\right| + O\left(\frac{1}{T}\right)
\end{aligned}
\tag{A.108}
\]

uniformly over $z \in \mathbb{R}^+$, where the second equality holds because
\[
D^{(m)}(\mu_{bT}z^2) = D^{(m)}(\mu_b z^2) + O\bigl(D^{(m+1)}(\mu_b z^2)z^2/T\bigr)
\]
uniformly over $z \in \mathbb{R}^+$ and
\[
\left|\sum_{m=1}^{\infty}\frac{1}{m!}D^{(m+1)}(\mu_b z^2)\alpha_{mT}z^{2m+2}\right| < \infty.
\]

Therefore,
\[
\begin{aligned}
|F_{T0}(z) - F_0(z)| &= \left|\sum_{m=0}^{\infty}\frac{1}{m!}D^{(m)}(\mu_b z^2)(\alpha_{mT}-\alpha_m)z^{2m}\right| + O\left(\frac{1}{T}\right)\\
&= O\left(\frac{1}{T^2}\sum_{m=1}^{\infty}2^{2m}(c_1^*b)^{m-2}\right) + O\left(\frac{1}{T}\right) = O\left(\frac{1}{T}\right)
\end{aligned}
\tag{A.109}
\]
uniformly over $z \in \mathbb{R}^+$, as desired. Q.E.D.

Dept. of Economics, University of California, San Diego, La Jolla, CA 92093, U.S.A.; [email protected],

Cowles Foundation, Yale University, Box 208281, New Haven, CT 06520, U.S.A.; University of Auckland, Auckland, New Zealand; and University of York, York, U.K.; [email protected],

and

Guanghua School of Management, Peking University, Beijing, 100871, China; [email protected].