REM WORKING PAPER SERIES
Generalised Empirical Likelihood Kernel Block Bootstrapping
Paulo M.D.C. Parente, Richard J. Smith
REM Working Paper 055-2018
November 2018
REM – Research in Economics and Mathematics, Rua Miguel Lúpi 20, 1249-078 Lisboa, Portugal
ISSN 2184-108X
Any opinions expressed are those of the authors and not those of REM. Short excerpts, up to two paragraphs, may be cited provided that full credit is given to the authors.
Generalised Empirical Likelihood
Kernel Block Bootstrapping
Paulo M.D.C. Parente, ISEG – Lisbon School of Economics & Management, Universidade de Lisboa; REM – Research in Economics and Mathematics; CEMAPRE – Centro de Matemática Aplicada à Previsão e Decisão Económica.
Richard J. Smith, cemmap, U.C.L. and I.F.S.; Faculty of Economics, University of Cambridge; Department of Economics, University of Melbourne; ONS Economic Statistics Centre of Excellence.
This Draft: October 2018
Abstract
This article unveils how the kernel block bootstrap method of Parente and Smith (2018a, 2018b) can be applied to make inferences on parameters of models defined through moment restrictions. Bootstrap procedures that resort to generalised empirical likelihood implied probabilities to draw observations are also introduced. We prove the first-order asymptotic validity of bootstrapped test statistics for overidentifying moment restrictions, parametric restrictions and additional moment restrictions. Resampling methods based on such probabilities were shown to be efficient by Brown and Newey (2002). A set of simulation experiments reveals that the statistical tests based on the proposed bootstrap methods perform better than those that rely on first-order asymptotic theory.
JEL Classification: C14, C15, C32
Keywords: Bootstrap; heteroskedastic and autocorrelation consistent inference; Generalised Method of Moments; Generalised Empirical Likelihood
1 Introduction
The objective of this article is to propose new bootstrap methods for models defined through moment restrictions in the time-series context, using a novel bootstrap method introduced recently by Parente and Smith (2018a, 2018b). At the same time, we amend some of the existing results in the related literature.
The generalized method of moments (GMM) estimator of Hansen (1982) has become one of the most popular tools in econometrics due to its applicability in different and varied situations. It can be used, for instance, to estimate parameters of interest under endogeneity and measurement error. Consequently, the rich set of inferential statistics provided by GMM can be extremely useful to economists doing empirical work. These statistics allow one to test for overidentifying moment conditions, parametric restrictions and additional moment conditions.
The performance of statistics based on GMM has been revealed to be poor in finite samples, and this situation worsens in time-series data due to the presence of autocorrelation [see Newey and West (1994), Burnside and Eichenbaum (1996), Christiano and den Haan (1996), among others]. To tackle this problem several alternative approaches have been proposed in the literature, the bootstrap being among the methods that have produced better results. The bootstrap is a resampling method introduced by Efron (1979) to make inferences on parameters of interest. It can be used not only to approximate the (asymptotic) distribution of an estimator or statistic, but also to estimate its variance. From the practical standpoint it has the benefit of not requiring the application of complicated formulae, and from the theoretical viewpoint it makes it possible to obtain asymptotic refinements when the statistic of interest is smooth and asymptotically pivotal.
Bootstrap methods in the context of moment restrictions have been introduced previously by Hahn (1996) and Brown and Newey (2002) for random samples, and by Hall and Horowitz (1996), Andrews (2002), Inoue and Shintani (2006), Allen et al. (2011) and Bravo and Crudu (2011) for dependent data. This literature can be divided into two strands.
Hahn (1996) proves consistency of the i.i.d. bootstrap distribution for GMM, but does not consider bootstrapped test statistics based on GMM. Hall and Horowitz (1996), Andrews (2002) and Inoue and Shintani (2006) propose the use of the standard moving blocks bootstrap applied to GMM. A second line of research is followed by Brown and Newey (2002), Allen et al. (2011) and Bravo and
Crudu (2011) who use empirical likelihood and generalised empirical likelihood implied probabilities to
draw observations or blocks of data.
Hall and Horowitz (1996) suggested applying the non-overlapping blocks bootstrap method of Carlstein (1986) to GMM after centering the bootstrap moment restrictions at their sample means. They prove that this method yields asymptotic refinements not only for the bootstrapped J statistic of Hansen (1982), but also for the bootstrapped t statistic for testing a single parametric restriction. Andrews (2002) extends the Hall and Horowitz (1996) method to the overlapping moving blocks bootstrap of Künsch (1989) and Liu and Singh (1992) and the k-step bootstrap of Davidson and MacKinnon (1999). However, Hall and Horowitz (1996) and Andrews (2002) require uncorrelatedness of the moment indicators after a certain number of lags. This assumption is relaxed by Inoue and Shintani (2006) in the special case of linear models estimated using instruments.
Brown and Newey (2002), in the i.i.d. setting, mention, though without a formal proof, that the same improvements can be obtained by using a method that they denominate the empirical likelihood (EL) bootstrap. The EL bootstrap consists in first computing the empirical likelihood implied probabilities associated with each observation under a set of moment restrictions, and then using these probabilities to draw each observation in order to construct the bootstrap samples. Although Brown and Newey (2002) did not prove the asymptotic validity of the method, they showed heuristically that it is efficient in the sense that the difference between the finite sample distribution of a statistic and its EL bootstrap counterpart is asymptotically normal (after proper scaling) with minimum variance. Recently the EL bootstrap method was extended to the time-series context by Allen et al. (2011) and Bravo and Crudu (2011) using a moving blocks bootstrap (MBB) procedure. Both articles suggest first computing implied probabilities for blocks of observations and then using these probabilities to draw blocks in order to construct the bootstrap samples.
There are some differences between these two articles. Firstly, while Allen et al. (2011) consider EL implied probabilities, Bravo and Crudu (2011) use the generalised empirical likelihood (GEL) implied probabilities of Smith (2011). Secondly, Allen et al. (2011) propose using both non-overlapping blocks and overlapping blocks, whereas Bravo and Crudu (2011) only study the latter. Thirdly, Allen et al. (2011) investigate the first-order validity of the method for general GMM estimators, while Bravo and Crudu (2011) consider only the efficient GMM estimator. Both articles address the first-order asymptotic behaviour of the bootstrapped J statistic and of bootstrapped Wald (W) statistics for tests of parametric restrictions. Finally, in the case of tests of parametric restrictions, Bravo and Crudu (2011) additionally propose drawing bootstrap samples based on the GEL implied probabilities computed under the null hypothesis and the moment restrictions, and put forward the bootstrapped Lagrange multiplier (LM) and distance (D) statistics in this framework.
In this article we also consider a time-series setting, but depart from the dominant paradigm of bootstrap methods based on moving blocks and introduce an alternative to these resampling schemes based on the kernel block bootstrap (KBB) method of Parente and Smith (2018a, 2018b). The KBB method consists in transforming the data using weighted moving averages of all observations and drawing bootstrap samples with replacement from the transformed sample. This method is akin to the tapered block bootstrap (TBB) method of Paparoditis and Politis (2001) in that, if the kernel chosen has bounded support, the KBB method can be seen as a variant of TBB that allows the inclusion of incomplete blocks. However, KBB can also be implemented using kernels with unbounded support. In the case of the sample mean, and for a particular choice of kernel with unbounded support, it yields a bootstrap variance estimator that is asymptotically equivalent to the quasi-spectral estimator of the long-run variance, which Andrews (1991) proved to be optimal. Additionally, the technical assumptions required by Paparoditis and Politis (2001) to prove the asymptotic validity of TBB are not satisfied by truncated kernels that are non-monotonic in the positive quadrant, such as the flat-top cosine windows described in D'Antona and Ferrero (2006, p. 40), while KBB can be applied using such kernels. We note, however, that both TBB and KBB allow the most popular truncated kernels to be used, such as the rectangular, Bartlett and Tukey-Hanning kernels.
We use the new method to approximate the asymptotic distribution of the J statistic of Hansen (1982), which allows one to test the overidentifying moment restrictions, and the trinity of test statistics (Wald, Lagrange multiplier and distance statistics, cf. Newey and McFadden, 1994, section 9, and Ruud, 2000, chapter 22) that permit testing parametric restrictions and additional moment conditions. We show that the first-order validity of the bootstrap test for overidentifying conditions does not require prior centering of the bootstrap moments; this centering can be done a posteriori.
In the spirit of Brown and Newey (2002), we additionally propose using the GEL implied probability associated with each transformed observation [Smith, 2011] to construct the bootstrap sample. We prove the first-order validity of the method and the corresponding test statistics. As in Allen et al. (2011) and Bravo and Crudu (2011), we prove the first-order validity of the bootstrapped distribution of the estimator and the bootstrapped J statistic, and of tests for parametric restrictions and additional moment conditions.
We show in this article that the proof of consistency of the EL block bootstrap of Allen et al. (2011) is in error, in that when applied to the inefficient GMM estimator the bootstrap distribution of the latter has to be centered at the efficient GMM estimator. Hence the results stated in their Theorems 1 and 2 are invalid in general, though they hold if the weighting matrix is a consistent estimator of the inverse of the covariance matrix of the moment indicators [cf. Theorem 1 of Bravo and Crudu (2010)]. Although our proof of this result applies only to the new bootstrap methods introduced in this article, the demonstration for EL block bootstrapping is analogous.
When testing for parametric restrictions and additional moment conditions, the GEL implied probabilities can be computed under the null or under the maintained hypothesis. Hence, two types of KBB bootstrap methods can be used: one using the GEL implied probabilities computed under the maintained hypothesis, as in Brown and Newey (2002) and Allen et al. (2011), and another based on these probabilities computed under the null, as suggested in the case of parametric restrictions by Bravo and Crudu (2011). This article investigates these two types of bootstrap methods. We note that Allen et al. (2011), in the case of the EL block bootstrap, actually do not present the formula of the bootstrapped Wald statistic, though their Theorem 3, which is based on their incorrect Theorems 1 and 2, refers to it. On the other hand, the formula for this statistic presented in Bravo and Crudu (2011) is only valid if the implied probabilities are computed under the maintained hypothesis and not under the null hypothesis, though it is presented jointly with the LM and D statistics, which are obtained with the implied probabilities computed under the null. We show that the trinity of test statistics can be computed using implied probabilities obtained under the null and under the maintained hypothesis, and that they have different mathematical expressions depending on the resampling scheme chosen.
This paper is organized as follows. In section 2 we summarize some important results on GMM and GEL in the time-series context. The KBB method is briefly explained in section 3. In section 4 we present the first-order asymptotic theory for the KBB methods computed using the following different probabilities to draw observations: uniform (the standard non-parametric KBB method), the implied probabilities associated with the moment restrictions, and the implied probabilities associated with the maintained hypothesis, parametric restrictions and additional moment conditions. In section 5 we present a Monte Carlo study in which we investigate the performance of the proposed bootstrap methods in finite samples. Finally, section 6 concludes. The proofs of the results are given in the Appendix.
2 Framework
Let $z_t$, $(t = 1, \ldots, T)$, denote observations on a finite dimensional (strictly) stationary process $\{z_t\}_{t=1}^{\infty}$. We assume initially that the process is ergodic, but later we will require the stronger condition of mixing. Consider the moment indicator $g(z_t, \beta)$, an $m$-vector of functions of the data observation $z_t$ and the $p$-vector $\beta$ of unknown parameters which are the object of inferential interest, where $m \ge p$. It is assumed that the true parameter vector $\beta_0$ uniquely satisfies the moment condition
$$E[g(z_t, \beta_0)] = 0,$$
where $E[\cdot]$ denotes expectation taken with respect to the unknown distribution of $z_t$.
2.1 The Generalized Method of Moments estimator
2.1.1 The Estimator
For notational simplicity we define $g_t(\beta) \equiv g(z_t, \beta)$, $(t = 1, \ldots, T)$, and $\bar g(\beta) \equiv \sum_{t=1}^{T} g_t(\beta)/T$; let also $G_t(\beta) \equiv \partial g_t(\beta)/\partial\beta'$, $(t = 1, \ldots, T)$, $G \equiv E[G_t(\beta_0)]$ and $\Omega \equiv \lim_{T\to\infty}\mathrm{var}[\sqrt{T}\,\bar g(\beta_0)]$. Denote by $\hat W$ a symmetric weighting matrix that converges in probability to a non-random matrix $W$. The GMM estimator is defined as
$$\hat\beta = \arg\min_{\beta\in B}\hat Q_T(\beta), \qquad \hat Q_T(\beta) = \bar g(\beta)'\,\hat W\,\bar g(\beta).$$
Hansen (1982) showed that under some regularity conditions $\hat\beta \xrightarrow{p} \beta_0$ and
$$\sqrt{T}(\hat\beta - \beta_0) \xrightarrow{d} N(0, \mathrm{avar}(\hat\beta)),$$
where $\xrightarrow{p}$ and $\xrightarrow{d}$ denote convergence in probability and in distribution respectively and
$$\mathrm{avar}(\hat\beta) = (G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}.$$
Denote $\Sigma \equiv (G'\Omega^{-1}G)^{-1}$ and $\bar G(\beta) = \sum_{t=1}^{T} G_t(\beta)/T$, $\hat G = \bar G(\hat\beta)$. Hansen (1982) also proved that the most efficient GMM estimator $\hat\beta^e$ is obtained when we set $W = \Omega^{-1}$, in which case $\mathrm{avar}(\hat\beta^e) = \Sigma$.
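As a concrete illustration of the estimator just defined, the following sketch computes first-step and efficient two-step GMM estimates in a simple overidentified linear instrumental-variables design. It is a minimal i.i.d. simulation, not the paper's time-series setting; the data-generating process, the grid minimization and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, beta0 = 2000, 1.5
z = rng.standard_normal((T, 2))           # two instruments (m = 2, p = 1)
u = rng.standard_normal(T)
x = z @ np.array([1.0, 0.5]) + 0.8 * u    # endogenous regressor
y = x * beta0 + u

def gbar(b):
    """Sample moment vector g-bar(beta) = sum_t z_t (y_t - x_t b) / T."""
    return z.T @ (y - x * b) / T

def gmm(b_grid, W):
    """Grid-minimize Q_T(beta) = gbar(b)' W gbar(b); a closed form exists
    for the linear case, but the grid keeps the sketch generic."""
    q = [gbar(b) @ W @ gbar(b) for b in b_grid]
    return b_grid[int(np.argmin(q))]

grid = np.linspace(0.5, 2.5, 2001)
b1 = gmm(grid, np.eye(2))                 # first step, W = I
u_hat = y - x * b1
m = z * u_hat[:, None]
Omega = m.T @ m / T                       # i.i.d. estimate of Omega
b2 = gmm(grid, np.linalg.inv(Omega))      # efficient step, W = Omega^{-1}
```

Both steps are consistent; the second step merely reweights the moments by the inverse of the estimated moment covariance, mirroring the choice $W = \Omega^{-1}$ above.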
We consider the following regularity conditions, which are sufficient to prove consistency.

Assumption 2.1 (i) The observed data are realizations of a stochastic process $z \equiv \{z_t : \Omega \to \mathbb{R}^n, n \in \mathbb{N}, t = 1, 2, \ldots\}$ on the complete probability space $(\Omega, \mathcal{F}, P)$, where $\Omega = \times_{t=1}^{\infty}\mathbb{R}^n$ and $\mathcal{F} = \mathcal{B}(\times_{t=1}^{\infty}\mathbb{R}^n)$ (the Borel $\sigma$-field generated by the measurable finite dimensional product cylinders); (ii) $z_t$ is stationary and ergodic; (iii) $g(\cdot, \beta)$ is Borel measurable for each $\beta \in B$ and $g(z_t, \cdot)$ is continuous on $B$ for each $z_t \in \mathcal{Z}$; (iv) $E[\sup_{\beta\in B}\|g(z_t, \beta)\|] < \infty$; (v) $E[g(z_t, \beta)]$ is continuous on $B$; (vi) $E[g(z_t, \beta)] = 0$ only for $\beta = \beta_0$; (vii) $B$ is compact; (viii) $\hat W = W + o_p(1)$ and $W$ is a positive semi-definite matrix.

The following theorem corresponds to Theorem 3.1 of Hall (2005, p. 68).

Theorem 2.1 Under Assumption 2.1, $\hat\beta = \beta_0 + o_p(1)$.

Assumption 2.2 ensures that the estimator is asymptotically normally distributed.¹
Assumption 2.2 (i) $\{z_t, -\infty < t < \infty\}$ is a strong mixing process with mixing coefficients of size $-r/(r-2)$, $r > 2$, and $E[\|g(z_t, \beta_0)\|^{r}] < \infty$, $r \ge 2$; (ii) $G_t(\beta)$ exists and is continuous on $B$ for each $z_t \in \mathcal{Z}$; (iii) $\mathrm{rank}(G) = p$; (iv) $E[\sup_{\beta\in N}\|G_t(\beta)\|] < \infty$, where $N$ is a neighborhood of $\beta_0$.

The following theorem is proven in Hansen (1982, Theorem 3.1) and Hall (2005, p. 71).

Theorem 2.2 Under Assumptions 2.1 and 2.2,
$$\sqrt{T}(\hat\beta - \beta_0) \xrightarrow{d} N(0, \mathrm{avar}(\hat\beta)),$$
where $\mathrm{avar}(\hat\beta) = (G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}$.

To obtain an efficient estimator we need to estimate $\Omega$. Numerous estimators of $\Omega$ have been proposed in the literature under different assumptions [see White (1984), Newey and West (1987), Gallant (1987), Andrews (1991), Ng and Perron (1996)]. Let $\hat\Omega = \Omega + o_p(1)$; the efficient two-step GMM estimator is defined as
$$\hat\beta^e = \arg\min_{\beta\in B}\tilde Q_T(\beta), \qquad \tilde Q_T(\beta) = \bar g(\beta)'\,\hat\Omega^{-1}\,\bar g(\beta).$$
¹These assumptions are different from those stated in Hansen (1982), but facilitate comparisons with the assumptions made later in the paper for GEL and KBB.
Overidentification tests Consider the hypothesis $H_0: E[g_t(\beta_0)] = 0$ vs. $H_1: E[g_t(\beta_0)] \neq 0$. Hansen (1982) proposed the $\mathcal{J}$ statistic to test this hypothesis, defined as
$$\mathcal{J} = T\,\bar g(\hat\beta^e)'\,\hat\Omega^{-1}\,\bar g(\hat\beta^e),$$
where $\hat\Omega$ is a consistent estimator of $\Omega$. Hansen (1982, Lemma 4.2) proved the following theorem:

Theorem 2.3 Under Assumptions 2.1 and 2.2 and if $m > p$, $\mathcal{J} \xrightarrow{d} \chi^2(m - p)$.
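The $\mathcal{J}$ statistic can be computed directly once an efficient estimate is available. A minimal sketch, again assuming a simple i.i.d. linear IV design (the closed-form linear GMM solution and the design are illustrative choices, not the paper's time-series setting):

```python
import numpy as np

rng = np.random.default_rng(1)
T, beta0 = 4000, 1.0
z = rng.standard_normal((T, 3))        # m = 3 instruments, p = 1
u = rng.standard_normal(T)
x = z.sum(axis=1) + u
y = x * beta0 + u

def gmm_linear(W):
    """Closed-form linear GMM: minimizes gbar(b)' W gbar(b)
    for g_t(b) = z_t (y_t - x_t b)."""
    zx, zy = z.T @ x / T, z.T @ y / T
    return (zx @ W @ zy) / (zx @ W @ zx)

b1 = gmm_linear(np.eye(3))             # first step
m1 = z * (y - x * b1)[:, None]
Omega = m1.T @ m1 / T                  # i.i.d. estimate of Omega
W_eff = np.linalg.inv(Omega)
b_e = gmm_linear(W_eff)                # efficient step
gbar = z.T @ (y - x * b_e) / T
J = T * gbar @ W_eff @ gbar            # Hansen's J; chi^2(m - p) = chi^2(2) under H0
```

Under correct specification the realized $\mathcal{J}$ should be an unremarkable draw from a $\chi^2(2)$ distribution, which is what a test at conventional levels checks.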
Specification Tests Here we consider tests of the null hypothesis
$$H_0: a(\beta_0) = 0, \qquad E[q(z_t, \beta_0)] = 0,$$
where $q(z_t, \beta_0)$ is an $s$-vector of moment indicators and $a(\beta)$ is an $r$-vector of constraints. The alternative $H_1$ is $a(\beta_0) \neq 0$ and/or $E[q(z_t, \beta_0)] \neq 0$.
In the context of GMM, test statistics for parametric restrictions were proposed by Newey and West (1987), and for additional moment restrictions by Newey (1985), Eichenbaum et al. (1988) and Ruud (2000) [see also Smith (1997) for tests based on GEL].
In order to introduce these statistics define $h(z_t, \beta) \equiv (g(z_t, \beta)', q(z_t, \beta)')'$, $q_t(\beta) \equiv q(z_t, \beta)$, $h_t(\beta) \equiv h(z_t, \beta)$ $(t = 1, \ldots, T)$, $\bar h(\beta) \equiv \sum_{t=1}^{T} h_t(\beta)/T$ and $\bar q(\beta) \equiv \sum_{t=1}^{T} q_t(\beta)/T$. Let also $\Lambda \equiv \lim_{T\to\infty}\mathrm{var}[\sqrt{T}\,\bar h(\beta_0)]$, $\Lambda_{12} \equiv \lim_{T\to\infty} E[T\,\bar g(\beta_0)\bar q(\beta_0)']$ and $\Lambda_{22} \equiv \lim_{T\to\infty}\mathrm{var}[\sqrt{T}\,\bar q(\beta_0)]$. Denote by $\hat\Lambda$ a consistent estimator of $\Lambda$ and let $\hat\Lambda_{12}$ and $\hat\Lambda_{22}$ be the submatrices of $\hat\Lambda$ that consistently estimate $\Lambda_{12}$ and $\Lambda_{22}$ respectively. Let also
$$R(\beta) \equiv \begin{pmatrix} A(\beta) & 0_{r\times s}\\ 0_{s\times p} & I_s\end{pmatrix},$$
where $A(\beta) \equiv \partial a(\beta)/\partial\beta'$ (an $r\times p$ matrix). The restricted efficient GMM estimator is defined as
$$\hat\beta^e_r = \arg\min_{\beta\in B_r}\bar Q_T(\beta), \qquad \bar Q_T(\beta) = \bar h(\beta)'\,\hat\Lambda^{-1}\,\bar h(\beta),$$
where $B_r = \{\beta\in B : a(\beta) = 0\}$. Let $\hat u \equiv \bar q(\hat\beta^e) - \hat\Lambda_{21}\hat\Lambda_{11}^{-1}\bar g(\hat\beta^e)$, $\hat r \equiv (a(\hat\beta^e)', \hat u')'$ and $\hat R \equiv R(\hat\beta^e)$. Define also $Q_t(\beta) \equiv \partial q_t(\beta)/\partial\beta'$, $\bar Q(\beta) \equiv \sum_{t=1}^{T} Q_t(\beta)/T$ and $Q \equiv E[\partial q_t(\beta_0)/\partial\beta']$. Let $\Phi \equiv (D'\Lambda^{-1}D)^{-1}$ and $\hat\Phi \equiv (\hat D'\hat\Lambda^{-1}\hat D)^{-1}$, where
$$D = \begin{pmatrix} G & 0_{m\times s}\\ Q & -I_s\end{pmatrix}, \qquad \bar D(\beta) = \begin{pmatrix}\bar G(\beta) & 0_{m\times s}\\ \bar Q(\beta) & -I_s\end{pmatrix},$$
and $\hat D = \bar D(\hat\beta^e)$.
We consider the following versions of the Wald, score and distance statistics:
$$\mathcal{W} = T\,\hat r'(\hat R\hat\Phi\hat R')^{-1}\hat r,$$
$$\mathcal{S} = T\,\bar h(\hat\beta^e_r)'\hat\Lambda^{-1}\hat D\hat\Phi\hat D'\hat\Lambda^{-1}\bar h(\hat\beta^e_r),$$
$$\mathcal{D} = T\,[\bar h(\hat\beta^e_r)'\hat\Lambda^{-1}\bar h(\hat\beta^e_r) - \bar g(\hat\beta^e)'\hat\Omega^{-1}\bar g(\hat\beta^e)].$$
The results of Newey and West (1987), Newey (1985), Eichenbaum et al. (1988) and Ruud (2000) are summarized in the following theorem, which is proven in the Appendix for completeness. We require the following additional assumptions to hold.

Assumption 2.3 (i) $\beta_0$ is the unique solution of $E[h_t(\beta)] = 0$ and $a(\beta) = 0$; (ii) $q(\cdot, \beta)$ is Borel measurable for each $\beta\in B$ and $q_t(\beta)$ is continuous in $\beta$ for each $z_t \in \mathcal{Z}$; (iii) $a(\beta)$ is twice continuously differentiable on $B$; (iv) $E[\|q(z_t, \beta_0)\|^{r}] < \infty$, $r \ge 2$; (v) $Q_t(\beta)$ exists and is continuous on $B$ for each $z_t \in \mathcal{Z}$; (vi) $\mathrm{rank}(Q) = s$; (vii) $E[\sup_{\beta\in N}\|Q_t(\beta)\|] < \infty$; (viii) $\Lambda$ is non-singular and $\hat\Lambda = \Lambda + o_p(1)$.

Theorem 2.4 unveils the asymptotic distribution of the trinity of test statistics.

Theorem 2.4 Under Assumptions 2.1, 2.2 and 2.3 the statistics $\mathcal{W}$, $\mathcal{S}$ and $\mathcal{D}$ are asymptotically equivalent and converge in distribution to $\chi^2(s + r)$.
2.1.2 Generalised Empirical Likelihood
In this section we review the efficient GEL estimator for time series proposed by Smith (2011). Consider the smoothed moment indicators
$$g_{tT}(\beta) = \frac{1}{S_T}\sum_{s=t-T}^{t-1} k\!\left(\frac{s}{S_T}\right) g_{t-s}(\beta), \quad t = 1, \ldots, T,$$
where the kernel function $k(\cdot)$ satisfies $\int_{-\infty}^{+\infty} k(a)\,da = 1$ and $S_T$ is a bandwidth parameter. Define $k_2 \equiv \int_{-\infty}^{+\infty} k(a)^2\,da$.
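The smoothed moments can be computed by a direct loop over $t$. A minimal NumPy sketch, assuming the normalized Bartlett kernel $k(v) = \max(1 - |v|, 0)$ as an illustrative choice (the function name and array layout are hypothetical):

```python
import numpy as np

def smooth_moments(g, s_t, kernel=lambda v: np.maximum(1.0 - np.abs(v), 0.0)):
    """g_tT = (1/S_T) * sum_{s=t-T}^{t-1} k(s/S_T) * g_{t-s}, for a T-by-m
    array g of moment indicators g_t(beta) evaluated at some fixed beta."""
    T = g.shape[0]
    out = np.zeros_like(g, dtype=float)
    for t in range(1, T + 1):            # 1-based t, as in the text
        s = np.arange(t - T, t)          # s = t-T, ..., t-1
        w = kernel(s / s_t) / s_t
        out[t - 1] = w @ g[(t - 1) - s]  # rows g_{t-s}
    return out
```

With a bounded-support kernel such as the Bartlett, each smoothed observation is a weighted local average of roughly $2 S_T$ neighbouring moment indicators, which is what makes the transformed sample carry the serial dependence.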
Let $\rho(v)$ be a function that is concave on its domain $\mathcal{V}$, an open interval containing zero. It is convenient to impose a normalization on $\rho(\cdot)$. Let $\rho_j(v) = \partial^j\rho(v)/\partial v^j$ and $\rho_j = \rho_j(0)$, $(j = 0, 1, 2, \ldots)$. We normalize this function so that $\rho_1 = \rho_2 = -1$. The GEL criterion for weakly dependent data was defined by Smith (2011) as
$$\hat P_T(\beta, \lambda) = \sum_{t=1}^{T}[\rho(k\lambda' g_{tT}(\beta)) - \rho_0]/T,$$
where $k = 1/k_2$. The GEL estimator is
$$\hat\beta_{gel} = \arg\min_{\beta\in B}\,\sup_{\lambda\in\Lambda_T}\,\hat P_T(\beta, \lambda),$$
where $\Lambda_T$ is defined below in Assumption 2.8. Let $\hat\lambda(\beta) = \arg\sup_{\lambda\in\Lambda_T}\hat P_T(\beta, \lambda)$, $\hat\lambda \equiv \hat\lambda(\hat\beta_{gel})$ and $G_{tT}(\beta) \equiv \partial g_{tT}(\beta)/\partial\beta'$.
Smith (2011) defined the implied probabilities as
$$\pi_t(\beta) = \frac{\rho_1(k\hat\lambda(\beta)' g_{tT}(\beta))}{\sum_{t=1}^{T}\rho_1(k\hat\lambda(\beta)' g_{tT}(\beta))}, \quad t = 1, \ldots, T.$$
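The implied probabilities can be sketched for a concrete member of the GEL family. Below, the exponential-tilting choice $\rho(v) = -\exp(v)$, which satisfies the normalization $\rho_1 = \rho_2 = -1$ at zero, is used purely as an illustration; the function name and inputs are hypothetical.

```python
import numpy as np

def implied_probs(g_smooth, lam, k):
    """pi_t = rho1(k * lam' g_tT) / sum_t rho1(k * lam' g_tT), using the
    exponential-tilting member rho(v) = -exp(v), so rho1(v) = -exp(v)
    (illustrative; any rho with rho1 = rho2 = -1 at zero fits the text)."""
    v = k * (g_smooth @ lam)   # k * lambda' g_tT for each t
    rho1 = -np.exp(v)
    return rho1 / rho1.sum()
```

At $\lambda = 0$ the weights reduce to the uniform $1/T$; a nonzero $\hat\lambda$ tilts mass toward observations whose smoothed moments better conform to the restrictions.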
Smith (2011) required the following assumptions to hold.

Assumption 2.4 The finite dimensional stochastic process $\{z_t\}_{t=1}^{\infty}$ is stationary and strong mixing with mixing coefficients $\alpha$ of size $-3v/(v-1)$ for some $v > 1$.

Remark 2.1 The mixing coefficient condition in Assumption 2.4 guarantees that $\sum_{j=1}^{\infty} j^2\alpha(j)^{(v-1)/v} < \infty$, see Andrews (1991, p. 824), a condition required for the results in Smith (2011).

Assumption 2.5 (i) $S_T \to \infty$ and $S_T/T^{1/2} \to 0$; (ii) $k(\cdot) : \mathbb{R} \to [-k_{\max}, k_{\max}]$, $k_{\max} < \infty$, $k(0) \neq 0$, $k_1 \neq 0$, and $k(\cdot)$ is continuous at zero and almost everywhere; (iii) $\int_{-\infty}^{\infty}\bar k(x)\,dx < \infty$, where $\bar k(x) = I(x \ge 0)\sup_{y\ge x}|k(y)| + I(x < 0)\sup_{y\le x}|k(y)|$; (iv) $K(\phi) \ge 0$ for all $\phi \in \mathbb{R}$, where $K(\phi) = (2\pi)^{-1}\int k(x)\exp(-ix\phi)\,dx$.

Assumption 2.6 $T \to \infty$ and $S_T = O(T^{1/2-\eta})$ for some $\eta \in (0, 1/2)$.

Assumption 2.7 (i) $\beta_0 \in B$ is the unique solution of $E[g_t(\beta)] = 0$; (ii) $B$ is compact; (iii) $g_t(\beta)$ is continuous at each $\beta \in B$; (iv) $E[\sup_{\beta\in B}\|g_t(\beta)\|^{\alpha}] < \infty$ for some $\alpha > \max(4v, 1/\eta)$; (v) $\Omega(\beta)$ is finite and positive definite for all $\beta \in B$.

Assumption 2.8 (i) $\rho(v)$ is twice differentiable and concave on its domain, an open interval $\mathcal{V}$ containing zero, with $\rho_1 = \rho_2 = -1$; (ii) $\hat\lambda \in \Lambda_T$, where $\Lambda_T = \{\lambda : \|\lambda\| \le D(T/S_T^2)^{-\xi}\}$ for some $D > 0$, with $1/2 > \xi > 1/(2\alpha)$.

Theorem 2.5 is proven in Smith (2011).

Theorem 2.5 If Assumptions 2.4, 2.6, 2.7 and 2.8 are satisfied, $\hat\beta_{gel} \xrightarrow{p} \beta_0$ and $\hat\lambda \xrightarrow{p} 0$. Moreover, $\hat\lambda = O_p[(T/S_T^2)^{-1/2}]$ and $\bar g_T(\hat\beta_{gel}) = O_p(T^{-1/2})$.
Let $H \equiv \Sigma G'\Omega^{-1}$ and $P \equiv \Omega^{-1} - \Omega^{-1}G\Sigma G'\Omega^{-1}$. The proof of asymptotic normality in Smith (2011) also required the following assumptions.

Assumption 2.9 (i) $\beta_0 \in \mathrm{int}(B)$; (ii) $g(\cdot, \beta)$ is differentiable in a neighborhood $N$ of $\beta_0$ and $E[\sup_{\beta\in N}\|G_t(\beta)\|^{\alpha/(\alpha-1)}] < \infty$; (iii) $\mathrm{rank}(G) = p$.

Smith (2011) proved the following theorem.

Theorem 2.6 If Assumptions 2.4, 2.6, 2.7, 2.8 and 2.9 are satisfied,
$$\begin{pmatrix} T^{1/2}(\hat\beta_{gel} - \beta_0)\\ T^{1/2}\hat\lambda/S_T\end{pmatrix} \xrightarrow{d} N(0, \mathrm{diag}(\Sigma, P)).$$
3 The kernel block bootstrap method
The idea behind the KBB method is to replace the original sample by a transformed sample and apply the i.i.d. bootstrap to the latter. To be more precise, consider a sample of $T$ observations, $(X_1, \ldots, X_T)$, on the zero-mean finite dimensional stationary and strong mixing stochastic process $\{X_t\}_{t=1}^{\infty}$ with $E[X_t] = 0$. Let $\bar X = \sum_{t=1}^{T} X_t/T$. Define the transformed variables
$$Y_{tT} = \frac{1}{S_T}\sum_{s=t-T}^{t-1} k\!\left(\frac{s}{S_T}\right)X_{t-s}, \quad (t = 1, \ldots, T),$$
where $S_T$ is a bandwidth parameter and $k(\cdot)$ is a kernel function standardized such that $\int_{-\infty}^{\infty} k(v)\,dv = 1$.

The standard KBB method consists in applying the non-parametric bootstrap for i.i.d. data to the transformed sample $(Y_{1T}, \ldots, Y_{TT})$, obtaining a bootstrap sample of size $m_T = T/S_T$; that is, each bootstrap observation is drawn from $(Y_{1T}, \ldots, Y_{TT})$ with equal probability $1/T$. The asymptotic validity of the method was proven by Parente and Smith (2018a, 2018b).

In this article we modify the original method in that each observation is drawn with probability $P[Y^*_{jT} = Y_{tT}] = p_{tT}$, for $j = 1, \ldots, m_T$ and $t = 1, \ldots, T$, where $p_{tT}$ can depend on the data and satisfies $0 \le p_{tT} \le 1$ and $\sum_{t=1}^{T} p_{tT} = 1$. The standard KBB method of Parente and Smith (2018a, 2018b) is obtained with $p_{tT} = 1/T$ for $j = 1, \ldots, m_T$ and $t = 1, \ldots, T$. Let $\tilde Y = \sum_{t=1}^{T} p_{tT} Y_{tT}$.
In order to prove that the bootstrap distribution of $\sqrt{T}(\bar Y^* - \tilde Y)$ is close to the asymptotic distribution of $T^{1/2}\bar X$ as $T$ goes to infinity, we require the following assumptions, taken from Parente and Smith (2018a, 2018b).
Assumption 3.1 The finite dimensional stochastic process $\{X_t\}_{t=1}^{\infty}$ is stationary and strong mixing with mixing coefficients $\alpha$ of size $-3v/(v-1)$ for some $v > 1$.

Assumption 3.2 (i) $m_T = T/S_T$, $S_T \to \infty$, $S_T = O(T^{1/2-\eta})$ for some $\eta \in (0, 1/2)$; (ii) $E[|X_t|^{\alpha}] < \infty$ for some $\alpha > \max(4v, 1/\eta)$; (iii) $\sigma^2 \equiv \lim_{T\to\infty}\mathrm{var}[T^{1/2}\bar X]$ is finite.

Assumption 3.3 (i) $0 \le p_{tT} \le 1$, $\sum_{t=1}^{T} p_{tT} = 1$ and $\max_{1\le t\le T}|T p_{tT} - 1| = o_p(1)$; (ii) $\sqrt{T}\tilde Y = O_p(1)$.

Similarly to Gonçalves and White (2004), $P$ denotes the probability measure of the original time series and $P^*$ that induced by the bootstrap method. For a bootstrap statistic $\Delta^*_T$ we write $\Delta^*_T \to 0$ prob-$P^*$, prob-$P$ if for any $\varepsilon > 0$ and any $\delta > 0$, $\lim_{T\to\infty} P\{P^*\{|\Delta^*_T| > \varepsilon\} > \delta\} = 0$. We also use measures of the magnitude of bootstrapped sequences as defined by Hahn (1997). Let $\Delta^*_T = O^{\omega}_p(a_T)$ if $\Delta^*_T$, when conditioned on $\omega$, is $O_p(a_T)$, and $\Delta^*_T = o^{\omega}_p(a_T)$ if $\Delta^*_T$, when conditioned on $\omega$, is $o_p(a_T)$. We write $\Delta^*_T = O_B(1)$ if, for a given subsequence $\{T'\}$, there exists a further subsequence $\{T''\}$ such that $\Delta^*_{T''} = O^{\omega}_p(1)$. Similarly, we write $\Delta^*_T = o_B(1)$ if, for a given subsequence $\{T'\}$, there exists a further subsequence $\{T''\}$ such that $\Delta^*_{T''} = o^{\omega}_p(1)$.
Theorem 3.1 shows that the bootstrap distribution of $\sqrt{T/k_2}\,(\bar Y^* - \tilde Y)$ is uniformly close to the asymptotic distribution of $T^{1/2}\bar X$.

Theorem 3.1 Under Assumptions 3.1-3.3 and 2.5, if $E[X_t] = 0$,
$$\lim_{T\to\infty} P\left\{\sup_{x\in\mathbb{R}}\left|P^*\{\sqrt{T/k_2}\,(\bar Y^* - \tilde Y) \le x\} - P\{T^{1/2}\bar X \le x\}\right| \ge \varepsilon\right\} = 0,$$
where $k_2 = \int_{-\infty}^{\infty} k^2(v)\,dv$.

The GEL-KBB method is obtained when $p_{tT} = \hat\pi_t$, where $\hat\pi_t = \pi_t(\hat\beta_{gel})$.

Lemma 3.1 Assumption 3.3 is satisfied if $p_{tT} = \hat\pi_t$.
4 Kernel block bootstrap methods for GMM
4.1 The standard KBB method
Consider a bootstrap sample of size $m_T$, $\{g^*_{tT}(\beta)\}_{t=1}^{m_T}$, drawn from $\{g_{tT}(\beta)\}_{t=1}^{T}$, and let $\hat W^*_T = \hat W_T + o_B(1)$, where $\hat W^*_T$ is a positive semi-definite matrix. Define also $\bar g^*_T(\beta) = \sum_{s=1}^{m_T} g^*_{sT}(\beta)/m_T$ and
$$\hat Q^*_T(\beta) = \bar g^*_T(\beta)'\,\hat W^*_T\,\bar g^*_T(\beta).$$
To prove consistency we require Assumption 4.1.

Assumption 4.1 (i) The observed data are realizations of a stochastic process $z \equiv \{z_t : \Omega \to \mathbb{R}^n, n \in \mathbb{N}, t = 1, 2, \ldots\}$ on the complete probability space $(\Omega, \mathcal{F}, P)$, where $\Omega = \times_{t=1}^{\infty}\mathbb{R}^n$ and $\mathcal{F} = \mathcal{B}(\times_{t=1}^{\infty}\mathbb{R}^n)$ (the Borel $\sigma$-field generated by the measurable finite dimensional product cylinders); (ii) $z_t$ is stationary and ergodic; (iii) $g : \mathbb{R}^l \times B \to \mathbb{R}^m$ is measurable for each $\beta \in B$, $B$ a compact subset of $\mathbb{R}^p$, and $g(z_t, \cdot)$ is continuous; (iv) $E[g(z_t, \beta)] = 0$ only for $\beta = \beta_0$; (v) $\hat W_T = W + o_p(1)$, where $W$ is a positive definite matrix, and $\hat W^*_T = \hat W_T + o_B(1)$; (vi) $E[\sup_{\beta\in B}\|g(z_t, \beta)\|^{\alpha}] < \infty$ for some $\alpha \ge 1$; (vii) $T^{1/\alpha}/m_T = o(1)$, where $m_T \to \infty$.

Theorem 4.1 shows that the GMM bootstrap estimator is consistent.

Theorem 4.1 Under Assumption 4.1, $\hat\beta^* - \hat\beta \to 0$, prob-$P^*$, prob-$P$.
To prove the consistency of the bootstrap distribution of the GMM estimator we require Assumption 4.2 to be satisfied.

Assumption 4.2 (i) The $(k\times 1)$ random vectors $\{z_t, -\infty < t < \infty\}$ form a strictly stationary and strong mixing process with mixing coefficients of size $-3v/(v-1)$ for some $v > 1$; (ii) $\beta_0 \in \mathrm{int}(B)$; (iii) $g(z_t, \cdot)$ is continuously differentiable in a neighborhood $N$ of $\beta_0$ with probability approaching one; (iv) $E[g(z_t, \beta_0)] = 0$ and $E[\|g(z_t, \beta_0)\|^{\alpha}]$ is finite for some $\alpha > \max(4v, 1/\eta)$; (v) $E[\sup_{\beta\in N}\|\partial g(z_t, \beta)/\partial\beta'\|^{a}] < \infty$ for some $a > 2/(1+2\eta)$; (vi) $G'WG$ is nonsingular and $\Omega$ exists and is positive definite; (vii) $m_T = T/S_T$.

Theorem 4.2 demonstrates the consistency of the KBB distribution of the GMM estimator.

Theorem 4.2 Under Assumptions 2.5, 4.1 and 4.2,
$$\lim_{T\to\infty} P\left(\sup_{x\in\mathbb{R}^p}\left|P^*\{\sqrt{T/k_2}\,(\hat\beta^* - \hat\beta) \le x\} - P\{T^{1/2}(\hat\beta - \beta_0) \le x\}\right| \ge \varepsilon\right) = 0.$$
4.1.1 Bootstrap Estimation of $\Omega$
Hansen (1982) showed that the most efficient estimator is obtained if one sets $W = \Omega^{-1}$. We now show how to obtain a consistent estimator of $\Omega$ using the bootstrap. Let
$$\hat\Omega^*(\tilde\beta^*) \equiv \frac{S_T}{m_T k_2}\sum_{t=1}^{m_T} g^*_t(\tilde\beta^*)\,g^*_t(\tilde\beta^*)',$$
where $\tilde\beta^*$ is a bootstrap estimator of $\beta_0$ such that $\sqrt{T}(\tilde\beta^* - \beta_0) = O_B(1)$.

Assumption 4.3 will be required.

Assumption 4.3 $E[\sup_{\beta\in N}\|\partial g(z_t, \beta)/\partial\beta'\|^{2\alpha/(\alpha-1)}] < \infty$.

The desired result is given by Lemma 4.1.

Lemma 4.1 Under Assumptions 2.5, 4.2 (i), (iii), (iv), (vi), (vii) and 4.3, and if $\sqrt{T}(\tilde\beta^* - \beta_0) = O_B(1)$, we have
$$\lim_{T\to\infty} P[P^*[\|\hat\Omega^*(\tilde\beta^*) - \Omega\| > \varepsilon] > \delta] = 0.$$
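Lemma 4.1's bootstrap estimator of $\Omega$ has a direct one-line implementation; the sketch below assumes the resampled smoothed moments are held in an $m_T \times m$ array, and the function name is hypothetical.

```python
import numpy as np

def omega_boot(g_star, s_t, k2):
    """Bootstrap long-run variance estimator
    Omega*(beta~*) = S_T / (m_T * k2) * sum_t g*_t g*_t',
    where g_star is an m_T-by-m array of resampled smoothed moments
    evaluated at the bootstrap estimate beta~*."""
    m_t = g_star.shape[0]
    return (s_t / (m_t * k2)) * (g_star.T @ g_star)
```

The $S_T/k_2$ rescaling undoes the variance shrinkage induced by the kernel averaging, so the outer-product of smoothed moments estimates the long-run, not the contemporaneous, variance.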
4.1.2 Testing for overidentifying restrictions
Let $\hat\Omega^* = \hat\Omega + o_B(1)$, let $\hat\beta^{e*}$ be the bootstrap GMM estimator obtained with $\hat W^*_T = \hat\Omega^{*-1}$, and define
$$\mathcal{J}^* = \frac{T}{k_2}\,[\bar g^*(\hat\beta^{e*}) - \bar g(\hat\beta^e)]'\,\hat\Omega^{*-1}\,[\bar g^*(\hat\beta^{e*}) - \bar g(\hat\beta^e)].$$
The following theorem proves the validity of the KBB $\mathcal{J}$ test for overidentifying restrictions.

Theorem 4.3 Under Assumptions 2.5, 4.1 and 4.2,
$$\lim_{T\to\infty} P\left\{\sup_{x\in\mathbb{R}}\left|P^*\{\mathcal{J}^* \le x\} - P\{\mathcal{J} \le x\}\right| \ge \varepsilon\right\} = 0.$$
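In practice a KBB $\mathcal{J}^*$ test repeats the resampling many times, computes the recentered statistic for each draw, and takes an empirical quantile as the bootstrap critical value. A minimal sketch, with stand-in data in place of actual smoothed moment indicators (the array `g_smooth`, the constants and the function name are all illustrative):

```python
import numpy as np

def j_star(g_star_bar, g_bar, omega_star_inv, T, k2):
    """Bootstrap J statistic with a-posteriori centering:
    J* = (T/k2) [gbar* - gbar]' Omega*^{-1} [gbar* - gbar]."""
    d = g_star_bar - g_bar
    return (T / k2) * d @ omega_star_inv @ d

# hypothetical usage: critical value from B = 199 bootstrap replications
rng = np.random.default_rng(3)
T, S_T, k2 = 256, 8, 2 / 3
m_T = T // S_T
g_smooth = rng.standard_normal((T, 2)) / S_T   # stand-in for {g_tT(beta^e)}
g_bar = g_smooth.mean(axis=0)
draws = []
for _ in range(199):
    idx = rng.integers(0, T, size=m_T)          # uniform KBB draw
    gs = g_smooth[idx]
    omega = (S_T / (m_T * k2)) * gs.T @ gs
    draws.append(j_star(gs.mean(axis=0), g_bar, np.linalg.inv(omega), T, k2))
crit = np.quantile(draws, 0.95)                 # bootstrap 5% critical value
```

The centering at $\bar g(\hat\beta^e)$ inside `j_star` is what the text refers to as a-posteriori centering: the bootstrap moments themselves are left uncentered.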
4.1.3 Bootstrap tests for parametric restrictions and additional moment conditions
In this subsection we propose bootstrap versions of the tests for parametric restrictions and additional moment conditions. Let
$$h_{tT}(\beta) = \frac{1}{S_T}\sum_{s=t-T}^{t-1} k\!\left(\frac{s}{S_T}\right) h_{t-s}(\beta), \quad t = 1, \ldots, T,$$
and consider a bootstrap sample of size $m_T$, $\{h^*_{tT}(\beta)\}_{t=1}^{m_T}$, drawn from $\{h_{tT}(\beta)\}_{t=1}^{T}$. Let $\tilde\Omega^* = \hat\Omega + o_B(1)$ and $\hat\Lambda^*_T = \hat\Lambda + o_B(1)$. Define also $\bar h^*_T(\beta) = \sum_{s=1}^{m_T} h^*_{sT}(\beta)/m_T$,
$$\bar Q^*_T(\beta) = \bar h^*_T(\beta)'\,\hat\Lambda^{*-1}_T\,\bar h^*_T(\beta),$$
and
$$\hat\beta^{e*}_r = \arg\min_{\beta\in B_r}\bar Q^*_T(\beta).$$
Let $\hat u^* = \bar q^*(\hat\beta^{e*}) - \hat\Lambda^*_{21}\hat\Lambda^{*-1}_{11}\bar g^*(\hat\beta^{e*})$, $\hat r^* = (a(\hat\beta^{e*})', \hat u^{*\prime})'$ and $\hat R^* = R(\hat\beta^{e*})$. Additionally, denote $Q^*_t(\beta) \equiv \partial q^*_t(\beta)/\partial\beta'$, $\bar Q^*(\beta) \equiv \sum_{t=1}^{T} Q^*_t(\beta)/T$ and $\hat\Phi^* \equiv (\hat D^{*\prime}\hat\Lambda^{*-1}\hat D^*)^{-1}$, where
$$\bar D^*(\beta) = \begin{pmatrix}\bar G^*(\beta) & 0_{m\times s}\\ \bar Q^*(\beta) & -I_s\end{pmatrix},$$
and $\hat D^* = \bar D^*(\hat\beta^*)$. We consider the following bootstrapped statistics:
$$\mathcal{W}^* = \frac{T}{k_2}\,[\hat r^* - \hat r]'[\hat R^*\hat\Phi^*\hat R^{*\prime}]^{-1}[\hat r^* - \hat r],$$
$$\mathcal{S}^* = \frac{T}{k_2}\,[\bar h^*(\hat\beta^{e*}_r) - \bar h(\hat\beta^e_r)]'\hat\Lambda^{*-1}\hat D^*\hat\Phi^*\hat D^{*\prime}\hat\Lambda^{*-1}[\bar h^*(\hat\beta^{e*}_r) - \bar h(\hat\beta^e_r)],$$
$$\mathcal{D}^* = \frac{T}{k_2}\left([\bar h^*(\hat\beta^{e*}_r) - \bar h(\hat\beta^e_r)]'\hat\Lambda^{*-1}[\bar h^*(\hat\beta^{e*}_r) - \bar h(\hat\beta^e_r)] - [\bar g^*(\hat\beta^{e*}) - \bar g(\hat\beta^e)]'\hat\Omega^{*-1}[\bar g^*(\hat\beta^{e*}) - \bar g(\hat\beta^e)]\right).$$
Hall and Horowitz (1996) considered t statistics for tests on a single parameter for GMM using the MBB, and consequently these bootstrapped statistics appear to be new in the literature.
In order to show that the bootstrap distributions of these statistics are close to their asymptotic distributions, the following assumptions are required.

Assumption 4.4 (i) $\beta_0$ is the unique solution of $E[h_t(\beta)] = 0$ and $a(\beta) = 0$, and $E[\|h(z_t, \beta_0)\|^{\alpha}]$ is finite; (ii) $q_t(\beta)$ is continuous in $\beta$ for each $z_t \in \mathcal{Z}$; (iii) $a(\beta)$ is twice continuously differentiable on $B$; (iv) $\partial q(z_t, \beta)/\partial\beta'$ exists and is continuous on $B$ for each $z_t \in \mathcal{Z}$; (v) $\mathrm{rank}(Q) = s$; (vi) $E[\sup_{\beta\in N}\|\partial q(z_t, \beta)/\partial\beta'\|^{a}] < \infty$; (vii) $\Lambda$ exists and is positive definite and $\hat\Lambda = \Lambda + o_p(1)$.

Theorem 4.4 reveals that the bootstrapped trinity of test statistics is consistent for the asymptotic distributions of the original statistics.

Theorem 4.4 Under Assumptions 2.5, 4.1, 4.2 and 4.4,
$$\lim_{T\to\infty} P\left\{\sup_{x\in\mathbb{R}}\left|P^*\{\mathcal{W}^* \le x\} - P\{\mathcal{W} \le x\}\right| \ge \varepsilon\right\} = 0,$$
$$\lim_{T\to\infty} P\left\{\sup_{x\in\mathbb{R}}\left|P^*\{\mathcal{S}^* \le x\} - P\{\mathcal{S} \le x\}\right| \ge \varepsilon\right\} = 0,$$
$$\lim_{T\to\infty} P\left\{\sup_{x\in\mathbb{R}}\left|P^*\{\mathcal{D}^* \le x\} - P\{\mathcal{D} \le x\}\right| \ge \varepsilon\right\} = 0.$$
Moreover, $\mathcal{W}^*$, $\mathcal{S}^*$ and $\mathcal{D}^*$ are asymptotically equivalent.
4.2 The generalised empirical likelihood kernel block bootstrap method
4.2.1 An efficient GMM estimator
In this sub-section we introduce a GMM-type estimator that is efficient and plays an important role in establishing the consistency of the kernel block bootstrap distribution to the asymptotic distribution of the GMM estimator. We consider the objective function
$$\tilde Q_T(\beta) = \tilde g_T(\beta)'\,\hat W_T\,\tilde g_T(\beta),$$
where $\tilde g_T(\beta) = \sum_{t=1}^{T} g_{tT}(\beta)\hat\pi_t$. The GMM-type estimator is defined as
$$\tilde\beta = \arg\min_{\beta\in B}\tilde Q_T(\beta),$$
where $\hat W_T \xrightarrow{p} W$ and $W$ is a positive semi-definite matrix.

We now characterize the asymptotic properties of the new estimator. Theorem 4.5 shows that this estimator is consistent for $\beta_0$.

Theorem 4.5 Under Assumptions 2.4, 2.5, 2.6, 2.7 and 2.8, $\tilde\beta \xrightarrow{p} \beta_0$.

Theorem 4.6 reveals that $\tilde\beta$ is asymptotically equivalent to $\hat\beta^e$.

Theorem 4.6 Under Assumptions 2.4, 2.5, 2.6, 2.7, 2.8 and 2.9,
$$\sqrt{T}(\tilde\beta - \beta_0) - \sqrt{T}(\hat\beta^e - \beta_0) \xrightarrow{p} 0, \qquad \sqrt{T}(\tilde\beta - \beta_0) \xrightarrow{d} N(0, \Sigma).$$
This theorem shows that no matter which weighting matrix $\hat W_T$ we choose, we always obtain an estimator that is asymptotically equivalent to the efficient two-step GMM estimator.
4.2.2 The bootstrap method
Let $g^{\star}_{iT}(\beta)$, $i = 1, \ldots, m_T$, be obtained by drawing observations from $\{g_{tT}(\beta)\}_{t=1}^{T}$, where $P(g^{\star}_{iT}(\beta) = g_{tT}(\beta)) = \hat\pi_t$, $t = 1, \ldots, T$. Denote $\bar g^{\star}_T(\beta) = \sum_{i=1}^{m_T} g^{\star}_{iT}(\beta)/m_T$. The generalised empirical likelihood kernel block bootstrap (GEL-KBB) estimator $\hat\beta^{\star}$ is defined as
$$\hat\beta^{\star} = \arg\min_{\beta\in B}\,\bar g^{\star}_T(\beta)'\,\hat W^{\star}_T\,\bar g^{\star}_T(\beta),$$
where $\hat W^{\star}_T = \hat W_T + o_B(1)$.

Let $P^{\star}$ be the bootstrap probability measure induced by the new resampling scheme.

Theorem 4.7 Under Assumptions 2.4, 2.5, 2.6, 2.7, 2.8 and 4.1, $\hat\beta^{\star} - \tilde\beta \to 0$ prob-$P^{\star}$, prob-$P$.
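Relative to the standard KBB scheme, drawing the GEL-KBB bootstrap sample only changes the resampling weights from uniform to the implied probabilities; a minimal sketch (function name and inputs are hypothetical):

```python
import numpy as np

def gel_kbb_draw(g_smooth, pi_hat, m_t, rng=None):
    """Draw m_T smoothed moment vectors with P(g*_i = g_tT) = pi_hat_t,
    i.e. the GEL-KBB resampling scheme (uniform weights give standard KBB)."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(g_smooth.shape[0], size=m_t, replace=True, p=pi_hat)
    return g_smooth[idx]
```

Passing `pi_hat = np.full(T, 1/T)` recovers the standard KBB draw, so a single resampling routine can serve both schemes.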
Assumption 4.5 $E[\sup_{\beta\in N}\|\partial g(z_t, \beta)/\partial\beta'\|^{l}] < \infty$ for $l = \max\{\alpha/(\alpha-1),\, 2/(1+2\eta) + \varepsilon\}$, for some $\varepsilon > 0$.
The following result shows consistency of the bootstrap distribution of the estimator to the asymptotic distribution of $\hat{\theta}$.

Theorem 4.8 Under Assumptions 2.4, 2.5, 2.6, 2.8, 2.9 and 4.2 strengthened by 4.5,
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}^p} \big|P^*\{\sqrt{T/k_2}\,(\theta^* - \tilde{\theta}) \le x\} - P\{T^{1/2}(\hat{\theta} - \theta_0) \le x\}\big| \ge \varepsilon\Big) = 0,
\]
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}^p} \big|P^*\{\sqrt{T/k_2}\,(\theta^* - \hat{\theta}_e) \le x\} - P\{T^{1/2}(\hat{\theta} - \theta_0) \le x\}\big| \ge \varepsilon\Big) = 0.
\]
We note that $\theta^*$ is centred at the efficient estimator $\tilde{\theta}$, not at the inefficient $\hat{\theta}$, even though the bootstrap distribution of $\sqrt{T/k_2}\,(\theta^* - \tilde{\theta})$ approximates the asymptotic distribution of the inefficient estimator $T^{1/2}(\hat{\theta} - \theta_0)$. This result is not specific to the GEL-KBB method; it also holds for the empirical likelihood moving blocks bootstrap of Allen et al. (2011), contradicting Theorems 1 and 2 of that article. The two estimators coincide only if $W = \Omega^{-1}$.
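As a concrete illustration of the resampling scheme (this is our sketch, not the authors' code; `g` holds the smoothed moment indicators $g_{tT}(\theta)$ at a fixed $\theta$ and `pi` the implied probabilities), the bootstrap draws are i.i.d. multinomial with weights $\pi_t$:

```python
import numpy as np

def kbb_resample_mean(g, pi, m_T, rng):
    """Draw m_T rows of g (the smoothed moments g_tT at a fixed theta)
    i.i.d. with probabilities pi, and return the bootstrap mean g*_T."""
    T = g.shape[0]
    idx = rng.choice(T, size=m_T, replace=True, p=pi)
    return g[idx].mean(axis=0)
```

In a full implementation the bootstrap estimator $\theta^*$ would minimise the quadratic form $\bar{g}^*_T(\theta)' W^*_T \bar{g}^*_T(\theta)$ over $\theta$; the sketch shows only the draw for a fixed $\theta$.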
4.2.3 GEL-KBB estimation of $\Omega$

Let $\bar{\theta}^*$ be a bootstrap estimator such that $\sqrt{T}(\bar{\theta}^* - \theta_0) = O_B(1)$. We now prove consistency of the bootstrap estimator of $\Omega$ under the GEL-KBB measure, which is given by
\[
\Omega^*(\bar{\theta}^*) \equiv \frac{S_T}{m_T k_2}\sum_{t=1}^{m_T} g^*_t(\bar{\theta}^*)\, g^*_t(\bar{\theta}^*)'.
\]
The consistency of $\Omega^*(\bar{\theta}^*)$ is proven in Lemma 4.2.

Lemma 4.2 Under Assumptions 2.4, 2.5, 2.6, 2.8, 2.9 and 4.2 strengthened by 4.3, if $\sqrt{T}(\bar{\theta}^* - \theta_0) = O_B(1)$ we have
\[
\lim_{T\to\infty} P[P^*[\|\Omega^*(\bar{\theta}^*) - \Omega\| > \varepsilon] > \delta] = 0.
\]
4.2.4 Testing for overidentifying restrictions

Let $W^*_T = \tilde{\Omega}^{*-1}$, where $\tilde{\Omega}^* = \Omega + o_B(1)$, and let $\hat{\theta}^*_e$ denote the bootstrap GMM estimator computed with this weighting matrix, which corresponds to the efficient estimator. Define
\[
\mathcal{J}^* = \frac{T}{k_2}\, \bar{g}^*(\hat{\theta}^*_e)'\, \tilde{\Omega}^{*-1}\, \bar{g}^*(\hat{\theta}^*_e).
\]
Theorem 4.9 Under Assumptions 2.4, 2.5, 2.6, 2.8, 2.9 and 4.2 strengthened by 4.5,
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}} |P^*\{\mathcal{J}^* \le x\} - P\{\mathcal{J} \le x\}| \ge \varepsilon\Big) = 0.
\]
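In practice, once a set of bootstrap replications of $\mathcal{J}^*$ is available, the bootstrap test compares $\mathcal{J}$ with an empirical quantile of the simulated draws. A minimal sketch (the function names are ours, not from the paper):

```python
import numpy as np

def bootstrap_critical_value(j_boot, level=0.05):
    """(1 - level) empirical quantile of the simulated J* draws."""
    return np.quantile(np.asarray(j_boot, dtype=float), 1.0 - level)

def bootstrap_pvalue(j_stat, j_boot):
    """Bootstrap p-value: fraction of J* draws at least as large as J."""
    return float(np.mean(np.asarray(j_boot, dtype=float) >= j_stat))
```

The test rejects at nominal level `level` when the sample statistic exceeds `bootstrap_critical_value(j_boot, level)`, equivalently when the bootstrap p-value falls below `level`.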
4.2.5 GEL-KBB tests for parametric restrictions and additional moment conditions under the maintained hypothesis

In this subsection we propose bootstrap versions of the tests for parametric restrictions and additional moment conditions. Consider a bootstrap sample of size $m_T$, $\{h^*_{tT}(\theta)\}_{t=1}^{m_T}$, drawn from $\{h_{tT}(\theta)\}_{t=1}^{T}$, where $P(h^*_{jT}(\theta) = h_{tT}(\theta)) = \pi_t$, $t = 1,\dots,T$ and $j = 1,\dots,m_T$. Let also $V^* = V + o_B(1)$, $\bar{h}^*(\theta) = \sum_{s=1}^{m_T} h^*_{sT}(\theta)/m_T$, $\tilde{h}_T(\theta) = \sum_{t=1}^{T} h_{tT}(\theta)\pi_t$ and $\tilde{q}(\theta) = \sum_{t=1}^{T} q_{tT}(\theta)\pi_t$. Consider the objective function $\bar{Q}^*_T(\theta) = \bar{h}^*(\theta)' V^{*-1} \bar{h}^*(\theta)$ and let
\[
\hat{\theta}^*_{e,r} = \arg\min_{\theta\in B_r} \bar{Q}^*_T(\theta).
\]
Define $\psi^* = \bar{q}^*(\hat{\theta}^*_e) - V^*_{21}V^{*-1}_{11}\bar{g}^*(\hat{\theta}^*_e)$, $r^* = (a(\hat{\theta}^*_e)', \psi^{*\prime})'$, $\tilde{\psi} = \tilde{q}(\hat{\theta}_e)$, $\tilde{r} = (a(\hat{\theta}_e)', \tilde{\psi}')'$ and $R^* = R(\hat{\theta}^*_e)$. Additionally, let $Q^*_t(\theta) \equiv \partial q^*_t(\theta)/\partial\theta'$ and $\bar{Q}^*(\theta) \equiv \sum_{t=1}^{T} Q^*_t(\theta)/T$. Denote also $\Sigma^* \equiv (D^{*\prime}V^{*-1}D^*)^{-1}$, where
\[
D^*(\theta) = \begin{pmatrix} G^*(\theta) & 0_{m\times s} \\ \bar{Q}^*(\theta) & -I_s \end{pmatrix},
\]
and $D^* = D^*(\hat{\theta}^*_e)$. We consider the following bootstrapped statistics:
\begin{align*}
\mathcal{W}^* &= (T/k_2)\,[r^* - \tilde{r}]'\,[R^*\Sigma^* R^{*\prime}]^{-1}\,[r^* - \tilde{r}], \\
\mathcal{S}^* &= (T/k_2)\,[\bar{h}^*(\hat{\theta}^*_{e,r}) - \tilde{h}(\hat{\theta}_{e,r})]'\, V^{*-1} D^*\Sigma^* D^{*\prime} V^{*-1}\, [\bar{h}^*(\hat{\theta}^*_{e,r}) - \tilde{h}(\hat{\theta}_{e,r})], \\
\mathcal{D}^* &= (T/k_2)\,\big([\bar{h}^*(\hat{\theta}^*_{e,r}) - \tilde{h}(\hat{\theta}_{e,r})]'\, V^{*-1}\, [\bar{h}^*(\hat{\theta}^*_{e,r}) - \tilde{h}(\hat{\theta}_{e,r})] - \bar{g}^*(\hat{\theta}^*_e)'\tilde{\Omega}^{*-1}\bar{g}^*(\hat{\theta}^*_e)\big).
\end{align*}
The Wald statistic can be seen as a generalization of the bootstrapped Wald statistic of Allen et al. (2011) and Bravo and Crudu (2011) for parametric restrictions. The remaining statistics appear to be new in the bootstrap literature.
Theorem 4.10 proves consistency of the bootstrap distribution of the trinity of test statistics.

Theorem 4.10 Under Assumptions 2.4, 2.5, 2.6, 2.8, 2.9, 4.2 strengthened by 4.5, and 4.4,
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}} |P^*\{\mathcal{W}^* \le x\} - P\{\mathcal{W} \le x\}| \ge \varepsilon\Big) = 0,
\]
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}} |P^*\{\mathcal{S}^* \le x\} - P\{\mathcal{S} \le x\}| \ge \varepsilon\Big) = 0,
\]
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}} |P^*\{\mathcal{D}^* \le x\} - P\{\mathcal{D} \le x\}| \ge \varepsilon\Big) = 0.
\]
Moreover, $\mathcal{W}^*$, $\mathcal{S}^*$ and $\mathcal{D}^*$ are asymptotically equivalent.
4.2.6 GEL-KBB tests for parametric restrictions and additional moment conditions under the null hypothesis

In this subsection we propose kernel block bootstrap versions of the tests for parametric restrictions and additional moment conditions that impose the null hypothesis through the generalised empirical likelihood implied probabilities, similar to the method proposed by Bravo and Crudu (2011).

Before presenting the method we introduce the GEL criterion for weakly dependent data and additional moments, which is given by
\[
\bar{P}_T(\theta, \varphi) = \sum_{t=1}^{T} [\rho(k\varphi' h_{tT}(\theta)) - \rho_0]/T,
\]
where $k = 1/k_2$. The GEL estimator is defined as
\[
\theta_{r,gel} = \arg\min_{\theta\in B_r}\,\sup_{\varphi\in\Lambda_T} \bar{P}_T(\theta,\varphi),
\]
where $\Lambda_T$ is defined below in Assumption 4.7. Define also $\varphi(\theta) = \arg\sup_{\varphi\in\Lambda_T} \bar{P}_T(\theta,\varphi)$ and $\varphi_r \equiv \varphi(\theta_{r,gel})$.

Consider a bootstrap sample of size $m_T$, $\{h^{\dagger}_{tT}(\theta)\}_{t=1}^{m_T}$, drawn from $\{h_{tT}(\theta)\}_{t=1}^{T}$, where $P(h^{\dagger}_{jT}(\theta) = h_{tT}(\theta)) = \tilde{\pi}_t$, $t = 1,\dots,T$ and $j = 1,\dots,m_T$, with
\[
\tilde{\pi}_t = \frac{\rho_1(\varphi_r' h_{tT}(\theta_{r,gel}))}{\sum_{j=1}^{T} \rho_1(\varphi_r' h_{jT}(\theta_{r,gel}))}, \quad t = 1,\dots,T.
\]
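As an illustration of how the implied probabilities above would be computed for the EL member of the GEL family, where $\rho(v) = \log(1-v)$ so that $\rho_1(v) = -1/(1-v)$ (a sketch under our own naming conventions; the common sign of $\rho_1$ cancels in the ratio):

```python
import numpy as np

def el_implied_probs(h, phi):
    """EL implied probabilities pi~_t = rho_1(phi' h_t) / sum_j rho_1(phi' h_j)
    with rho(v) = log(1 - v), hence rho_1(v) = -1/(1 - v).
    h: (T, m) array of smoothed moment indicators; phi: (m,) multiplier."""
    v = h @ phi
    w = 1.0 / (1.0 - v)  # proportional to rho_1(v); sign cancels in ratio
    return w / w.sum()
```

At $\varphi = 0$ the weights collapse to the uniform probabilities $1/T$, as expected.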
We consider the case in which the bootstrap weighting matrix is $W^{\dagger}_T = V^{\dagger-1}$, where $V^{\dagger} = V + o_B(1)$. Define $\bar{h}^{\dagger}_T(\theta) \equiv \frac{1}{m_T}\sum_{s=1}^{m_T} h^{\dagger}_{sT}(\theta)$ and $\bar{Q}^{\dagger}_T(\theta) = \bar{h}^{\dagger}_T(\theta)' V^{\dagger-1} \bar{h}^{\dagger}_T(\theta)$, and let
\[
\hat{\theta}^{\dagger}_e = \arg\min_{\theta\in B} \bar{Q}^{\dagger}_T(\theta), \qquad \hat{\theta}^{\dagger}_{e,r} = \arg\min_{\theta\in B_r} \bar{Q}^{\dagger}_T(\theta).
\]
Define $\psi^{\dagger} = \bar{q}^{\dagger}(\hat{\theta}^{\dagger}_e) - V^{\dagger}_{21}V^{\dagger-1}_{11}\bar{g}^{\dagger}(\hat{\theta}^{\dagger}_e)$, $r^{\dagger} = (a(\hat{\theta}^{\dagger}_e)', \psi^{\dagger\prime})'$ and $R^{\dagger} = R(\hat{\theta}^{\dagger}_e)$. Additionally, let us define $Q^{\dagger}_t(\theta) \equiv \partial q^{\dagger}_t(\theta)/\partial\theta'$ and $\bar{Q}^{\dagger}(\theta) \equiv \sum_{t=1}^{T} Q^{\dagger}_t(\theta)/T$. Denote also $\Sigma^{\dagger} \equiv (D^{\dagger\prime}V^{\dagger-1}D^{\dagger})^{-1}$, where
\[
D^{\dagger}(\theta) = \begin{pmatrix} G^{\dagger}(\theta) & 0_{m\times s} \\ \bar{Q}^{\dagger}(\theta) & -I_s \end{pmatrix}.
\]
We consider the following bootstrapped statistics:
\begin{align*}
\mathcal{W}^{\dagger} &= (T/k_2)\, r^{\dagger\prime}[R^{\dagger}\Sigma^{\dagger}R^{\dagger\prime}]^{-1} r^{\dagger}, \\
\mathcal{S}^{\dagger} &= (T/k_2)\, \bar{h}^{\dagger}(\hat{\theta}^{\dagger}_{e,r})'\, V^{\dagger-1} D^{\dagger}\Sigma^{\dagger}D^{\dagger\prime} V^{\dagger-1}\, \bar{h}^{\dagger}(\hat{\theta}^{\dagger}_{e,r}), \\
\mathcal{D}^{\dagger} &= (T/k_2)\,[\bar{h}^{\dagger}(\hat{\theta}^{\dagger}_{e,r})'\, V^{\dagger-1}\, \bar{h}^{\dagger}(\hat{\theta}^{\dagger}_{e,r}) - \bar{g}^{\dagger}(\hat{\theta}^{\dagger}_e)'\tilde{\Omega}^{\dagger-1}\bar{g}^{\dagger}(\hat{\theta}^{\dagger}_e)],
\end{align*}
where $\tilde{\Omega}^{\dagger} = \Omega + o_B(1)$.

Versions of the statistics $\mathcal{S}^{\dagger}$ and $\mathcal{D}^{\dagger}$ for the moving blocks bootstrap and parametric restrictions were previously introduced by Bravo and Crudu (2011). The statistic $\mathcal{W}^{\dagger}$ is new.
In order to show that the bootstrap distributions of these statistics are close to their asymptotic distributions, the following assumptions are required.

Assumption 4.6 (i) $\theta_0 \in B$ is the unique solution of $E[h_t(\theta)] = 0$; (ii) $B$ is compact; (iii) $h_t(\theta)$ is continuous at each $\theta \in B$; (iv) $E[\sup_{\theta\in B}\|h_t(\theta)\|^{\eta}] < \infty$ for some $\eta > \max(4v, 1/\delta)$; (v) $V(\theta)$ is finite and p.d. for all $\theta \in B$.

Assumption 4.7 $\varphi \in \Lambda_T$, where $\Lambda_T = \{\varphi : \|\varphi\| \le D(T/S_T^2)^{-\xi}\}$ for some $D > 0$, with $1/2 > \xi > 1/(2\eta)$.

Assumption 4.8 (i) $\theta_0 \in \mathrm{int}(B)$; (ii) $h(\cdot,\theta)$ is differentiable in a neighborhood $\mathcal{N}$ of $\theta_0$ and $E[\sup_{\theta\in\mathcal{N}}\|H_t(\theta)\|^{l}] < \infty$, where $l = \max\{\eta/(\eta-1),\, 2/(1+2\delta) + \epsilon\}$; (iii) $\mathrm{rank}(H) = p + q$.
Theorem 4.11 demonstrates that the bootstrapped Wald, score and distance statistics are asymptotically valid.

Theorem 4.11 Under Assumptions 2.5, 4.6, 4.7, 4.8, 4.2 and 4.4,
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}} |P^{\dagger}\{\mathcal{W}^{\dagger} \le x\} - P\{\mathcal{W} \le x\}| \ge \varepsilon\Big) = 0,
\]
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}} |P^{\dagger}\{\mathcal{S}^{\dagger} \le x\} - P\{\mathcal{S} \le x\}| \ge \varepsilon\Big) = 0,
\]
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}} |P^{\dagger}\{\mathcal{D}^{\dagger} \le x\} - P\{\mathcal{D} \le x\}| \ge \varepsilon\Big) = 0.
\]
Moreover, $\mathcal{W}^{\dagger}$, $\mathcal{S}^{\dagger}$ and $\mathcal{D}^{\dagger}$ are asymptotically equivalent.
5 Monte Carlo Study

In this section we present a simulation study in which we investigate the small-sample properties of the proposed bootstrap methods. The model used in our study is a version of an asset-pricing model considered in the Monte Carlo study of Hall and Horowitz (1996). The moment restrictions of this model are
\[
E\{\exp[\mu_s - \theta_0(x+z) + 3z] - 1\} = 0,
\]
\[
E\{z \exp[\mu_s - \theta_0(x+z) + 3z] - z\} = 0,
\]
where $\theta_0 = 3$, $\mu_s = -9s^2/2$, and $x$ and $z$ are scalars. The random variable $x$ has a normal distribution with mean zero and variance $s^2$, with $s = 0.2$ or $0.4$. The variable $z$ is independent of $x$, has a marginal normal distribution with zero mean and variance $s^2$, and is either sampled independently from this distribution or follows an AR(1) process with first-order serial correlation coefficient $\rho_z = 0.75$.
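The design above is straightforward to simulate; a minimal sketch (function names are ours, and the AR(1) innovations are scaled so that $z_t$ keeps the stationary variance $s^2$, which we take to be the intended design):

```python
import numpy as np

def simulate_hh(T, s, rho_z=0.0, rng=None):
    """Simulate the Hall-Horowitz style design: x ~ N(0, s^2) i.i.d.;
    z is i.i.d. N(0, s^2), or AR(1) with coefficient rho_z and
    stationary variance s^2."""
    rng = np.random.default_rng(rng)
    x = rng.normal(0.0, s, T)
    if rho_z == 0.0:
        z = rng.normal(0.0, s, T)
    else:
        innov_sd = s * np.sqrt(1.0 - rho_z**2)  # keeps var(z_t) = s^2
        z = np.empty(T)
        z[0] = rng.normal(0.0, s)
        for t in range(1, T):
            z[t] = rho_z * z[t - 1] + rng.normal(0.0, innov_sd)
    return x, z

def moments(theta, x, z, s):
    """Moment indicators g_t(theta) of the two restrictions."""
    mu_s = -9.0 * s**2 / 2.0
    e = np.exp(mu_s - theta * (x + z) + 3.0 * z)
    return np.column_stack([e - 1.0, z * (e - 1.0)])
```

At $\theta = \theta_0 = 3$ the exponent reduces to $\mu_s - 3x$, so $E[e] = \exp(\mu_s + 9s^2/2) = 1$ and both sample moments are close to zero in large samples.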
We evaluate the performance of Hansen's (1982) J test and the symmetrical t tests for the null hypothesis $H_0: \theta_0 = 3$ with asymptotic and bootstrap critical values. The J statistic is computed using the two-step GMM estimator in which the weighting matrix used in the first step is the identity matrix. In the second step the long-run variance of the moment indicators is computed using the Newey-West estimator (Newey and West, 1987).²
We obtain the bootstrap critical values for the J-tests and t-tests using the standard moving blocks bootstrap (MBB), the kernel block bootstrap (KBB) based on different kernel functions, and the versions of these methods based on the Empirical Likelihood (EL) implied probabilities. KBB is computed using the truncated kernel (KBBtr), the Bartlett kernel (KBBbt), the kernel that induces the quadratic-spectral kernel (KBBqs) [see Smith (2011)] and the kernel version of the optimal taper of Paparoditis and Politis (2001) (KBBpp). The EL implied probabilities are computed imposing the moment restrictions in the sample. In the tables of results we use the superscript el to denote the results obtained with the bootstrap methods based on the implied probabilities. Although the methods were designed for the case in which there is dependence in the data, we also apply them in the case in which there is no dependence.³
In order to investigate whether the proposed methods are sensitive to the choice of the bandwidth/block size we compute these parameters using two methods: the automatic bandwidth of Andrews (1991) based on an AR(1) model and a non-parametric version of the Andrews (1991) method based on a taper proposed by Politis and Romano (1995). These methods to compute the bandwidth were applied to the residuals obtained in the first step of the GMM problem [see Parente and Smith (2018b), section 4.3, for details]. Additionally, given that the computed automatic bandwidth $S_T$ might induce values of $m_T = \lceil T/S_T \rceil$ larger than $T$ or equal to 1, where $\lceil\cdot\rceil$ is the ceiling function, we replace $S_T$ by $S^*_T = \max\{S_T, 1\}$ and $m_T$ by $m^*_T = \max\{\lceil T/S^*_T \rceil, 2\}$. Consequently we have $2 \le m^*_T \le T$.

²We also computed a two-step GMM estimator in which the long-run variance of the moment indicators is estimated using the Andrews (1991) estimator based on the Quadratic Spectral kernel. These results are available upon request. Additionally, we investigated the performance of the tests based on the J-statistic in which the long-run variance of the moment indicators was estimated using the approach of Andrews and Monahan (1992), which requires pre-whitened series. The results obtained were not satisfactory in the Monte Carlo design considered and consequently are not presented.

³The quasi-Newton algorithm of MATLAB is used to compute GMM and EL, hence ensuring a local optimum. The Newton method is used to locate $\varphi(\theta)$ for given $\theta$, which is required for the profile EL objective function. EL computation requires some care since the EL criterion involves the logarithm function, which is undefined for negative arguments; this difficulty is avoided by employing the approach due to Owen in which logarithms are replaced by a function that is logarithmic for arguments larger than a small positive constant and quadratic below that threshold. See Owen (2001, (12.3), p. 235). Note, however, that this method might produce estimates that lie outside the convex hull of the data. In our study the worst case in which this problem occurred affected 1% of the replications and corresponded to the case n = 50, s = 0.2 with the truncated kernel. In all the remaining designs the problem occurred in at most 0.6% of the replications. Hence our results are not considerably affected by this issue.
We can find in the literature different bootstrap symmetric t-tests. Hall (1992; see sections 3.5, 3.6 and 3.12) considers the two-sided symmetric percentile t-test and the two-sided equal-tailed t-test. Here we report only the results for the former method because it provided the best results in our study. Additionally, because our objective is to compare the performance of several different bootstrap tests, we present, for succinctness, only the results computed using the 5% nominal level.⁴
Table 1 reports the empirical rejection rates of Hansen's (1982) J test. The results reveal that the J test based on asymptotic critical values is slightly undersized for s = 0.2 and becomes to some extent oversized for s = 0.4. Note that in the latter case the rejection frequencies do not get closer to the nominal size when the sample size increases from 50 to 100.⁵ The tests based on standard KBB and MBB critical values are considerably undersized. The tests based on the empirical likelihood versions of the bootstrap methods, although undersized for s = 0.2, yield empirical rejection rates closer to the nominal size for s = 0.4.
Table 2 presents the results for the t-tests of the hypothesis $H_0: \theta_0 = 3$. The empirical rejection rates of the t-tests based on the asymptotic critical values are considerably larger than the nominal rate. On the other hand, the performance of the t-tests based on the critical values obtained with MBB and KBB is noticeably better than that of the tests based on the asymptotic critical values. However, the t-tests based on the taper of Paparoditis and Politis (2001) are undersized. The empirical likelihood versions of these t-tests are in general slightly oversized, apart from the case in which the kernel version of the taper of Paparoditis and Politis (2001) is used.

Overall the results obtained with both methods to compute the automatic bandwidth are very similar,
⁴The results for the 1% and 10% nominal levels were also computed and are available upon request.

⁵Note that these results differ from those reported by Hall and Horowitz (1996), especially in the case s = 0.4, though they computed the GMM estimator using a different weighting matrix.
Table 1: Empirical rejection rates of the J-tests with asymptotic and bootstrap critical values at the 5% level

            |           Politis and Romano          |                Andrews
n           |     50            |      100          |     50            |      100
rho_z       |   0     |  0.75   |   0     |  0.75   |   0     |  0.75   |   0     |  0.75
s           | 0.2 0.4 | 0.2 0.4 | 0.2 0.4 | 0.2 0.4 | 0.2 0.4 | 0.2 0.4 | 0.2 0.4 | 0.2 0.4
asymp       | 2.4 7.9 | 3.8 5.6 | 3.1 9.2 | 2.9 7.5 | 3.3 8.0 | 3.5 7.0 | 3.6 8.8 | 4.5 7.7
kbbtr       | 0.3 1.3 | 0.6 1.5 | 0.3 2.4 | 0.7 2.6 | 0.5 2.1 | 0.6 1.9 | 0.7 2.8 | 1.1 2.1
kbbtr^el    | 1.2 5.2 | 1.9 3.1 | 2.4 7.1 | 2.4 5.9 | 1.6 5.0 | 1.6 3.8 | 2.8 7.2 | 3.4 5.7
kbbbt       | 0.1 0.4 | 0.1 0.6 | 0.5 0.8 | 0.6 1.1 | 0.3 0.5 | 0.3 0.8 | 0.7 0.9 | 0.8 1.2
kbbbt^el    | 0.8 4.4 | 1.3 2.9 | 1.5 6.4 | 1.7 5.2 | 0.7 4.6 | 1.3 3.7 | 2.2 6.9 | 2.5 4.7
kbbpp       | 0.0 0.0 | 0.1 0.0 | 0.2 0.3 | 0.5 0.6 | 0.2 0.2 | 0.2 0.4 | 0.6 0.4 | 0.7 0.7
kbbpp^el    | 0.5 3.6 | 1.8 2.6 | 1.0 5.1 | 1.5 4.1 | 0.6 3.6 | 1.5 3.3 | 1.3 5.4 | 2.1 3.6
kbbqs       | 0.1 0.2 | 0.1 0.3 | 0.3 0.8 | 0.7 0.9 | 0.3 0.1 | 0.2 0.8 | 0.7 0.6 | 0.8 1.2
kbbqs^el    | 0.6 3.3 | 1.5 2.0 | 1.5 6.3 | 1.7 4.6 | 1.0 3.9 | 1.3 2.9 | 1.9 6.1 | 2.7 3.9
mbb         | 0.2 0.1 | 0.2 0.1 | 0.3 0.5 | 0.5 0.7 | 0.4 0.2 | 0.3 0.6 | 0.8 0.6 | 0.7 0.9
mbb^el      | 0.6 4.0 | 1.3 2.4 | 1.2 6.2 | 1.6 4.5 | 0.5 3.9 | 1.1 3.3 | 1.6 6.0 | 2.3 4.0
which may indicate that the proposed methods are robust to the choice of this parameter.
6 Conclusion
In this article we put forward new bootstrap methods for models defined through moment restrictions for time series data that build on the kernel block bootstrap method of Parente and Smith (2018a, 2018b). These methods approximate the asymptotic distributions of tests for overidentifying conditions, parametric restrictions and additional moment restrictions. We consider methods that impose the null hypothesis, methods that impose the maintained hypothesis and methods that do not impose any restriction on the way the bootstrap samples are generated. We prove the first-order validity of the methods, generalizing and correcting the work of Allen et al. (2011) and Bravo and Crudu (2011). A simulation study reveals that the proposed methods perform well in practice.
Appendix: Proofs
Throughout the Appendix, $C$ and $\Delta$ denote generic positive constants that may differ across uses, and CS, M, and T denote the Cauchy-Schwarz, Markov, and triangle inequalities, respectively. We use the same notation as Gonçalves and White (2004): for a bootstrap statistic $W^*_T(\cdot,\omega)$ we write $W^*_T(\cdot,\omega) \to 0$ prob-$P^*$, prob-$P$ if, for any $\varepsilon > 0$ and any $\delta > 0$, $\lim_{T\to\infty} P[P^*_{T,\omega}[|W^*_T(\cdot,\omega)| > \varepsilon] > \delta] = 0$.

A.1 Proofs of the results in subsection 2.1.1

Proof of Theorem 2.4: As in Tauchen (1985) and Ruud (2000), we recast the test for $H_0$ as a test of the parametric restrictions $q^a_t(\theta,\psi) \equiv q_t(\theta) - \psi$ and construct the moment indicators $h^a_t(\theta,\psi) \equiv (g_t(\theta)', q^a_t(\theta,\psi)')'$. Under the null hypothesis $\psi = 0$ and $a(\theta_0) = 0$; thus we have the model $E(h^a_t(\theta_0, 0)) = 0$ and $a(\theta_0) = 0$. Define $\beta = (\theta', \psi')'$ and $\bar{h}^a(\beta) = \sum_{t=1}^{T} h^a_t(\theta,\psi)/T$.
Table 2: Empirical rejection rates of the t-tests with asymptotic and bootstrap critical values at the 5% level

            |           Politis and Romano              |                Andrews
n           |      50             |      100            |      50             |      100
rho_z       |    0     |  0.75    |    0     |  0.75    |    0     |  0.75    |    0     |  0.75
s           | 0.2  0.4 | 0.2  0.4 | 0.2  0.4 | 0.2  0.4 | 0.2  0.4 | 0.2  0.4 | 0.2  0.4 | 0.2  0.4
asymp       | 24.3 25.8| 22.2 25.7| 18.3 20.4| 19.5 19.9| 22.3 26.0| 20.0 25.9| 18.9 20.1| 17.4 20.0
kbbtr       | 4.4  6.6 | 6.0  7.9 | 4.6  6.1 | 5.9  6.6 | 4.4  7.1 | 4.7  6.6 | 5.5  6.3 | 5.8  6.9
kbbtr^el    | 6.9  8.7 | 7.5  9.3 | 6.6  7.8 | 7.3  7.8 | 6.5  8.4 | 6.2  8.2 | 7.1  6.9 | 6.8  7.5
kbbbt       | 4.1  4.4 | 4.8  6.5 | 3.6  3.9 | 5.1  4.5 | 3.9  4.2 | 4.1  5.2 | 3.8  3.8 | 4.3  5.2
kbbbt^el    | 6.0  6.4 | 6.8  8.5 | 5.5  5.7 | 6.9  6.0 | 6.5  5.3 | 5.7  6.9 | 7.6  4.9 | 6.2  5.8
kbbpp       | 2.9  2.5 | 3.4  4.1 | 2.8  2.5 | 3.3  3.2 | 2.9  2.9 | 2.9  4.0 | 3.0  2.3 | 3.0  3.1
kbbpp^el    | 4.4  4.0 | 4.5  5.5 | 4.0  4.3 | 5.2  4.3 | 4.4  3.1 | 3.7  5.1 | 6.0  3.1 | 4.6  3.7
kbbqs       | 4.4  4.5 | 5.1  6.4 | 3.6  4.3 | 5.2  4.8 | 4.0  4.8 | 4.3  5.4 | 4.0  4.2 | 4.2  5.1
kbbqs^el    | 6.6  6.6 | 6.8  8.9 | 5.8  6.4 | 7.0  7.0 | 6.9  6.3 | 6.2  7.9 | 7.7  5.8 | 6.6  7.0
mbb         | 3.6  3.3 | 4.4  5.2 | 3.0  2.9 | 4.6  4.0 | 3.6  3.5 | 3.9  4.6 | 3.4  3.2 | 3.8  4.1
mbb^el      | 5.8  5.3 | 5.6  7.4 | 5.1  5.0 | 6.6  4.9 | 5.7  4.2 | 5.5  5.9 | 6.7  4.6 | 6.1  5.2
Define $r(\beta) = (a(\theta)', \psi')'$ and the unrestricted GMM objective function
\[
\bar{Q}^a(\beta) = \bar{h}^a(\beta)'\hat{V}^{-1}\bar{h}^a(\beta).
\]
Consider the GMM estimator
\[
\hat\beta_e = \arg\min_{\beta\in\mathcal{B}} \bar{Q}^a(\beta).
\]
As pointed out by Ruud (2000, pp. 574-575), the sub-vectors of $\hat\beta_e$ are
\[
\hat\theta_e = \arg\min_{\theta\in B}\, \bar{g}(\theta)'\hat\Omega^{-1}\bar{g}(\theta), \qquad \hat\psi = \bar{q}(\hat\theta_e) - \hat{V}_{21}\hat{V}_{11}^{-1}\bar{g}(\hat\theta_e).
\]
We note that by Theorem 2.1 $\hat\theta_e = \theta_0 + o_p(1)$; also, as $\hat{V} = V + o_p(1)$ and $V_{11}$ is invertible, we have by a UWL that $\hat\psi = o_p(1)$, since $E(h^a_t(\theta_0, 0)) = 0$ under the regularity conditions of Theorem 2.1, and
\[
\sqrt{T}(\hat\beta_e - \beta_0) \overset{d}{\to} N(0,\Sigma)
\]
by Theorem 2.2, as $\theta_0 \in \mathrm{int}(B)$ and $0 \in \mathrm{int}(\mathbb{R}^s) = \mathbb{R}^s$, where $\Sigma = (D'V^{-1}D)^{-1}$. Furthermore, using the usual arguments based on first-order conditions, we have
\[
\sqrt{T}(\hat\beta_e - \beta_0) = -[D'V^{-1}D]^{-1}D'V^{-1}\sqrt{T}\,\bar{h}^a(\theta_0, 0) + o_p(1).
\]
Thus by a Taylor expansion we have, under $H_0$,
\[
\sqrt{T}\binom{a(\hat\theta_e)}{\hat\psi} = -R(\bar\beta)[D'V^{-1}D]^{-1}D'V^{-1}\sqrt{T}\,\bar{h}^a(\theta_0,0) + o_p(1) = -R[D'V^{-1}D]^{-1}D'V^{-1}\sqrt{T}\,\bar{h}^a(\theta_0,0) + o_p(1),
\]
where $\bar\beta$ lies on the line between $(\hat\theta_e', \hat\psi')'$ and $\beta_0$. Hence
\[
\mathcal{W} = T\binom{a(\hat\theta_e)}{\hat\psi}'\Big[\hat{R}(\hat{D}'\hat{V}^{-1}\hat{D})^{-1}\hat{R}'\Big]^{-1}\binom{a(\hat\theta_e)}{\hat\psi} = \sqrt{T}\,\bar{h}^a(\theta_0,0)'\, K\, \sqrt{T}\,\bar{h}^a(\theta_0,0) + o_p(1),
\]
as $\hat{D} = D + o_p(1)$, $\hat{V} = V + o_p(1)$, $\hat{R} = R + o_p(1)$, $\sqrt{T}\,\bar{h}^a(\theta_0,0) = O_p(1)$, and where
\[
K \equiv V^{-1}D[D'V^{-1}D]^{-1}R'\big[R(D'V^{-1}D)^{-1}R'\big]^{-1}R[D'V^{-1}D]^{-1}D'V^{-1}.
\]
Note that $VKVKV = VKV$ and $\mathrm{tr}(KV) = s + r$. Thus by Theorem 9.2.1 of Rao and Mitra (1971) it follows that $\mathcal{W} \overset{d}{\to} \chi^2(r + s)$.

We consider now the LM statistic
\[
\mathcal{LM} = T\,\bar{h}^a(\hat\beta_r)'\hat{V}^{-1}\hat{D}(\hat{D}'\hat{V}^{-1}\hat{D})^{-1}\hat{D}'\hat{V}^{-1}\bar{h}^a(\hat\beta_r).
\]
Note that the restricted GMM estimator solves
\[
\hat\beta_r = \arg\min_{\beta\in\mathcal{B}_r} \bar{h}^a(\beta)'\hat{V}^{-1}\bar{h}^a(\beta),
\]
where $\mathcal{B}_r = \{(\theta',\psi')' \in \mathcal{B} : a(\theta) = 0,\ \psi = 0\}$. We note that since $\mathcal{B}$ is compact, $\mathcal{B}_r$ is compact. Note that $\hat\beta_r = (\hat\theta_r', 0')'$ and $\hat\beta_r$ is consistent by Theorem 2.1.

We now derive the distribution of the restricted estimator. The Lagrangian is
\[
\mathcal{L} = \bar{h}^a(\beta)'\hat{V}^{-1}\bar{h}^a(\beta) - \lambda' r(\beta)
\]
and the first-order conditions are
\[
\hat{D}_r'\hat{V}^{-1}\bar{h}^a(\hat\beta_r) - R(\hat\beta_r)'\hat\lambda = 0, \qquad r(\hat\beta_r) = 0,
\]
where $\hat{D}_r = \hat{D}(\hat\beta_r)$. Thus by the usual arguments we have
\[
\sqrt{T}(\hat\beta_r - \beta_0) = -\big[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma\big]D'V^{-1}\sqrt{T}\,\bar{h}^a(\theta_0,0) + o_p(1),
\]
where $\Sigma = [D'V^{-1}D]^{-1}$. By a Taylor expansion,
\[
\sqrt{T}\,\bar{h}^a(\hat\beta_r) = \sqrt{T}\,\bar{h}^a(\theta_0,0) + D\sqrt{T}(\hat\beta_r - \beta_0)
= \big[I_{m+s} - D\Sigma D'V^{-1} + D\Sigma R'(R\Sigma R')^{-1}R\Sigma D'V^{-1}\big]\sqrt{T}\,\bar{h}^a(\theta_0,0) + o_p(1). \quad (A.1)
\]
Thus
\[
\mathcal{LM} = T\,\bar{h}^a(\hat\beta_r)'\hat{V}^{-1}\hat{D}(\hat{D}'\hat{V}^{-1}\hat{D})^{-1}\hat{D}'\hat{V}^{-1}\bar{h}^a(\hat\beta_r) = \sqrt{T}\,\bar{h}^a(\theta_0,0)'\,K\,\sqrt{T}\,\bar{h}^a(\theta_0,0) + o_p(1),
\]
as $\hat{D} = D + o_p(1)$ and $\hat{V} = V + o_p(1)$. Thus $\mathcal{LM}$ is asymptotically equivalent to $\mathcal{W}$.

Now we consider the distance statistic
\[
\mathcal{D} = T\big[\bar{h}(\hat\theta_{e,r})'\tilde{V}^{-1}\bar{h}(\hat\theta_{e,r}) - \bar{g}(\hat\theta_e)'\hat\Omega^{-1}\bar{g}(\hat\theta_e)\big]
= T\big[\bar{h}^a(\hat\beta_r)'\hat{V}^{-1}\bar{h}^a(\hat\beta_r) - \bar{h}^a(\hat\beta_e)'\hat{V}^{-1}\bar{h}^a(\hat\beta_e)\big].
\]
The result follows from replacing $\sqrt{T}\,\bar{h}^a(\hat\beta_r)$ by (A.1) and $\sqrt{T}\,\bar{h}^a(\hat\beta_e)$ by
\[
\sqrt{T}\,\bar{h}^a(\hat\beta_e) = \sqrt{T}\,\bar{h}^a(\theta_0,0) + D\sqrt{T}(\hat\beta_e - \beta_0) + o_p(1)
= \big[I_{m+s} - D[D'V^{-1}D]^{-1}D'V^{-1}\big]\sqrt{T}\,\bar{h}^a(\theta_0,0) + o_p(1).
\]
Thus, as $\sqrt{T}\,\bar{h}^a(\theta_0,0) = O_p(1)$, we have
\[
\mathcal{D} = \sqrt{T}\,\bar{h}^a(\theta_0,0)'\,K\,\sqrt{T}\,\bar{h}^a(\theta_0,0) + o_p(1),
\]
and the result follows.
A.2 Auxiliary results on Generalised Empirical Likelihood

A.2.1 Unrestricted models

The following Lemma corresponds to a version of Lemma A.1 of Ramalho and Smith (2011) for weakly dependent data.

Lemma A.1 If Assumptions 2.4, 2.6, 2.7 and 2.8 are satisfied, then $T\pi_t = 1 + o_p(1)$ and
\[
T^{1/2}(\pi_t - 1/T) = \frac{S_T}{T}\, g_{tT}'\, \frac{T^{1/2}}{S_T}\hat\lambda\,(1/k_2 + o_p(1)) + O_p\Big(\frac{S_T}{T}\Big)
\]
uniformly in $t = 1,\dots,T$.

Proof: The proof is contained in the proof of Theorem 3.1 of Smith (2004, p. A.11).

Let $w_{tT} = \frac{1}{S_T}\sum_{s=t-T}^{t-1} k\Big(\frac{s}{S_T}\Big) w_{t-s}$, $t = 1,\dots,T$, $\tilde{w} = \sum_{t=1}^{T} \pi_t w_{tT}$, and $\bar{w} = \sum_{t=1}^{T} w_t/T$.
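The smoothed series $w_{tT}$ just defined can be computed directly from the definition; a minimal sketch (our own code, with 0-based array indexing so that $w_{tT}[t] = \frac{1}{S_T}\sum_j k((t-j)/S_T)\, w[j]$):

```python
import numpy as np

def smooth_series(w, S_T, kernel):
    """Kernel-smoothed series w_tT = (1/S_T) * sum_s k(s/S_T) * w_{t-s},
    with the sum over all lags s that keep t - s inside the sample."""
    T = len(w)
    out = np.zeros(T)
    for t in range(T):
        for j in range(T):
            out[t] += kernel((t - j) / S_T) * w[j]
    return out / S_T

def bartlett(u):
    """Bartlett (triangular) kernel."""
    return max(0.0, 1.0 - abs(u))
```

With $S_T = 1$ and the Bartlett kernel only the own observation receives positive weight, so the smoothed series reduces to the original one; larger bandwidths average neighbouring observations.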
Assumption A.1 (i) The random vectors $\{(w_t, z_t),\ -\infty < t < \infty\}$ form a strictly stationary and mixing sequence with mixing coefficients of size $-3v/(v-1)$ for some $v > 1$; (ii) $E[w_t] = 0$, $E[\|w_t\|^{\eta}] < \infty$ for some $\eta > \max(4v, 1/\delta)$, and $\Omega_w \equiv \lim_{T\to\infty}\mathrm{var}[T^{1/2}\bar{w}]$ is finite and p.d.
The following Lemma corresponds to a simplified version of Theorem 3.1 of Smith (2011).

Lemma A.2 Under Assumptions 2.4, 2.5, 2.6, 2.7, 2.8, 2.9 and A.1,
\[
\sqrt{T}\tilde{w} = T^{-1/2}\sum_{t=1}^{T} w_t - B_0 P\, T^{1/2}\bar{g}(\theta_0) + o_p(1),
\]
where $B_0 = \sum_{s=-\infty}^{\infty} E[w_t g_{t-s}(\theta_0)']$. Additionally, if $w_t = g(z_t,\theta_0)$ we have
\[
\sqrt{T}\tilde{w} = [G\Sigma G'\Omega^{-1}]\, T^{1/2}\bar{g}(\theta_0) + o_p(1).
\]
Proof: Note that by Lemma A.1,
\begin{align*}
T^{1/2}\tilde{w} &= T^{1/2}\sum_{t=1}^{T} \pi_t w_{tT} \\
&= T^{1/2}\sum_{t=1}^{T} w_{tT}/T + \sum_{t=1}^{T}\Big[\frac{S_T}{T}\, g_{tT}'\,\frac{T^{1/2}}{S_T}\hat\lambda\,(k + o_p(1)) + O_p\Big(\frac{S_T}{T}\Big)\Big] w_{tT} \\
&= T^{-1/2}\sum_{t=1}^{T} w_{tT} + \frac{S_T}{T}\sum_{t=1}^{T} w_{tT}\, g_{tT}'\,\frac{T^{1/2}}{S_T}\hat\lambda\,(k + o_p(1)) + \sum_{t=1}^{T} w_{tT}\, O_p\Big(\frac{S_T}{T}\Big).
\end{align*}
Now by the proof of Theorem 2.3 of Smith (2011) (see expression B.2, p. A.11) we have
\[
\frac{T^{1/2}}{S_T}\hat\lambda = -T^{1/2} P\, \bar{g}_T(\theta_0) + o_p(1).
\]
Thus
\[
T^{1/2}\tilde{w} = T^{-1/2}\sum_{t=1}^{T} w_{tT} + \frac{S_T}{T}\sum_{t=1}^{T} w_{tT}\, g_{tT}'\,[-T^{1/2} P\,\bar{g}_T(\theta_0) + o_p(1)](k + o_p(1)) + \sum_{t=1}^{T} w_{tT}\, O_p\Big(\frac{S_T}{T}\Big),
\]
where $\bar{g}_T(\theta) = \frac{1}{T}\sum_{t=1}^{T}\frac{1}{S_T}\sum_{s=t-T}^{t-1} k(s/S_T)\, g_{t-s}(\theta)$ and $g_{tT} = g_{tT}(\theta)$. Now note that, as in Lemma A.2 of Smith (2011), we have
\[
T^{-1/2}\sum_{t=1}^{T} w_{tT} = T^{-1/2}\sum_{t=1}^{T} w_t + O_p(T^{-1/2}), \qquad
T^{1/2}\bar{g}_T(\theta_0) = T^{1/2}\sum_{t=1}^{T} g_t(\theta_0)/T + O_p(T^{-1/2}),
\]
and
\[
\sum_{t=1}^{T} w_{tT}\, O_p\Big(\frac{S_T}{T}\Big) = O_p\Big(\frac{S_T}{T^{1/2}}\Big)\Big[T^{-1/2}\sum_{t=1}^{T} w_t + O_p(T^{-1/2})\Big] = o_p(1).
\]
By arguments similar to those of the proof of Lemma A.3 of Smith (2011) we have
\[
\frac{S_T}{T}\, k\sum_{t=1}^{T} w_{tT}\, g_{tT}' = B_0 + o_p(1),
\]
where $B_0 = \sum_{s=-\infty}^{\infty} E[w_t g_{t-s}(\theta_0)']$. Hence
\[
\sqrt{T}\tilde{w} = T^{-1/2}\sum_{t=1}^{T} w_t - B_0 P\, T^{1/2}\bar{g}(\theta_0) + o_p(1).
\]
Now note that if $w_t = g(z_t,\theta_0)$ we have $T^{-1/2}\sum_{t=1}^{T} w_t = T^{1/2}\bar{g}(\theta_0)$ and $B_0 = \Omega$, and hence
\[
T^{-1/2}\sum_{t=1}^{T} w_t - B_0 P\, T^{1/2}\bar{g}(\theta_0) = T^{1/2}\bar{g}(\theta_0) - \Omega[\Omega^{-1} - \Omega^{-1}G\Sigma G'\Omega^{-1}]T^{1/2}\bar{g}(\theta_0) = G\Sigma G'\Omega^{-1}\, T^{1/2}\bar{g}(\theta_0).
\]
Proof of Theorem 4.5: Note that by CS,
\[
|\tilde{Q}_T(\theta) - \bar{Q}(\theta)| \le \|\tilde{g}(\theta) - \bar{g}(\theta)\|^2\, \|W_T\|.
\]
Note that by T,
\[
\sup_{\theta\in B}\|\tilde{g}(\theta) - \bar{g}(\theta)\| \le \sup_{\theta\in B}\|\tilde{g}(\theta) - E[g(z_t,\theta)]\| + \sup_{\theta\in B}\|\bar{g}(\theta) - E[g(z_t,\theta)]\|.
\]
Also, by a UWL,
\[
\sup_{\theta\in B}\|\bar{g}(\theta) - E[g(z_t,\theta)]\| = o_p(1).
\]
Now
\[
\sup_{\theta\in B}\|\tilde{g}(\theta) - E[g(z_t,\theta)]\| \le \sup_{\theta\in B}\|\tilde{g}(\theta) - \bar{g}_T(\theta)\| + \sup_{\theta\in B}\|\bar{g}_T(\theta) - E[g(z_t,\theta)]\|
\le \max_{1\le t\le T}|T\pi_t - 1|\,\sup_{\theta\in B}\|\bar{g}_T(\theta)\| + o_p(1) = o_p(1),
\]
by $\max_{1\le t\le T}|T\pi_t - 1| = o_p(1)$ and a UWL. Hence $|\tilde{Q}_T(\theta) - \bar{Q}(\theta)| = o_p(1)$, as $\|W_T - W\| = o_p(1)$. Thus the result follows by Theorem 2.1.

Proof of Theorem 4.6: The first-order conditions yield $\sqrt{T}\tilde{G}_T' W_T \tilde{g}_T(\tilde\theta) = 0$, where $\tilde{G}_T \equiv \partial\tilde{g}_T(\tilde\theta)/\partial\theta'$. Hence by a Taylor expansion,
\[
\sqrt{T}\tilde{G}_T' W_T \tilde{g}_T(\theta_0) + \tilde{G}_T' W_T \bar{G}_T \sqrt{T}(\tilde\theta - \theta_0) = 0,
\]
where $\bar{G}_T \equiv \partial\tilde{g}_T(\bar\theta)/\partial\theta'$ and $\bar\theta$ lies on the line joining $\tilde\theta$ and $\theta_0$. Solving for $\sqrt{T}(\tilde\theta - \theta_0)$ we obtain
\begin{align*}
\sqrt{T}(\tilde\theta - \theta_0) &= -(\tilde{G}_T' W_T \bar{G}_T)^{-1}\sqrt{T}\tilde{G}_T' W_T \tilde{g}_T(\theta_0) \qquad (A.2) \\
&= -(\tilde{G}_T' W_T \bar{G}_T)^{-1}\tilde{G}_T' W_T \{[G\Sigma G'\Omega^{-1}]T^{1/2}\bar{g}(\theta_0) + o_p(1)\}
\end{align*}
by Lemma A.2. By Lemma A.1 of Smith (2011) we have $\tilde{G}_T = G + o_p(1)$ and $\bar{G}_T = G + o_p(1)$; moreover $W_T = W + o_p(1)$ and $T^{1/2}\bar{g}(\theta_0) = O_p(1)$. Thus
\[
\sqrt{T}(\tilde\theta - \theta_0) = -(G'WG)^{-1}G'W G\Sigma G'\Omega^{-1}\, T^{1/2}\bar{g}(\theta_0) + o_p(1) = -\Sigma G'\Omega^{-1}\, T^{1/2}\bar{g}(\theta_0) + o_p(1),
\]
which corresponds to the asymptotic representation of the efficient GMM estimator (see, for instance, Hall, 2005, p. 70, eq. 3.26, with $W_T = \Omega^{-1}$).
A.2.2 Restricted models

For notational convenience we now define the restricted GEL estimator in a slightly different but equivalent manner to that of sub-section 4.2.6. Let
\begin{align*}
\bar{P}_n(\beta,\varphi) &= \frac{1}{T}\sum_{t=1}^{T}[\rho([\varphi' h^a_{tT}(\beta)]/k_2) - \rho_0], \\
P_n(\theta,\varphi) &= \bar{P}_n((\theta', 0')', \varphi) = \frac{1}{T}\sum_{t=1}^{T}[\rho([\varphi' h_{tT}(\theta)]/k_2) - \rho_0], \\
\tilde{P}_n(\beta,\varphi,\lambda) &= \frac{1}{T}\sum_{t=1}^{T}[\rho([\varphi' h^a_{tT}(\beta) + \lambda' r(\beta)]/k_2) - \rho_0],
\end{align*}
where $\beta = (\theta',\psi')'$, $h^a(z_t,\beta) = (g(z_t,\theta)', q(z_t,\theta)' - \psi')'$, and $h^a_{tT}(\beta) = \frac{1}{S_T}\sum_{s=t-T}^{t-1} k\big(\frac{s}{S_T}\big) h^a(z_{t-s},\beta)$, $t = 1,\dots,T$. Let $\mathcal{B}_r = \{\beta = (\theta',\psi')' : a(\theta) = 0,\ \psi = 0\}$; thus $\mathcal{B}_r = B_r \times \{0\}$. Let $\Lambda_T = \{\varphi : \|\varphi\| \le D(T/S_T^2)^{-\xi}\}$.

Let
\[
(\varphi(\beta)', \lambda(\beta)')' = \arg\max_{\varphi\in\Lambda_T,\ \lambda\in\mathbb{R}^s} \tilde{P}_n(\beta,\varphi,\lambda).
\]
Note that $\varphi(\beta)$ can also be defined as
\[
\varphi(\beta) = \arg\max_{\varphi\in\Lambda_T} \bar{P}_n(\beta,\varphi), \qquad \beta\in\mathcal{B}_r,
\]
and
\[
\hat\beta_r = \arg\min_{\beta\in\mathcal{B}_r} \bar{P}_n(\beta,\varphi(\beta)) = \arg\min_{\beta\in\mathcal{B}_r} \tilde{P}_n(\beta,\varphi(\beta),\lambda(\beta)),
\]
and let $\hat\varphi_r = \varphi(\hat\beta_r)$ and $\hat\lambda_r = \lambda(\hat\beta_r)$.

We note that $\hat\beta_r = S_1\hat\theta_r$, where
\[
\hat\theta_r = \arg\min_{\theta\in B_r}\sup_{\varphi\in\Lambda_T} P_n(\theta,\varphi)
\]
and $S_1$ is a matrix such that $S_1\hat\theta_r = (\hat\theta_r', 0_{s\times 1}')'$.

The following Theorem provides a convenient asymptotic representation of the restricted GEL estimator and corresponding Lagrange multiplier.
Theorem A.1 If Assumptions 2.4, 2.6, 4.6, 4.7 and 4.8 are satisfied, then $\hat\beta_r \overset{p}{\to} \beta_0$, $\hat\varphi_r \overset{p}{\to} 0$ and $\hat\lambda_r \overset{p}{\to} 0$. Moreover, $\|\hat\varphi_r\| = O_p[(T/S_T^2)^{-1/2}]$, $\|\hat\lambda_r\| = O_p[(T/S_T^2)^{-1/2}]$,
\[
S_1\sqrt{T}(\hat\theta_r - \theta_0) = -\big[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma\big]D'V^{-1}\sqrt{T}\,\bar{h}_T(\theta_0) + o_p(1),
\]
\[
\frac{\sqrt{T}\hat\varphi_r}{S_T} = -P_r\,\sqrt{T}\,\bar{h}_T(\theta_0) + o_p(1),
\]
where $P_r = V^{-1} - V^{-1}D S_1\big[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma\big]D'V^{-1}$.
Proof: Let $k = 1/k_2$. The first-order conditions are
\begin{align*}
k\frac{1}{T}\sum_{t=1}^{T}\rho_1(k\hat\varphi_r' h_{tT}(\hat\theta_r))\, h^a_{tT}(\hat\theta_r, 0) &= 0, \qquad (A.3)\\
k\frac{1}{T}\sum_{t=1}^{T}\rho_1(k\hat\varphi_r' h_{tT}(\hat\theta_r))\, r(\hat\theta_r, 0) &= 0, \\
k\frac{1}{T}\sum_{t=1}^{T}\rho_1(k\hat\varphi_r' h_{tT}(\hat\theta_r))\, D_{tT}(\hat\theta_r)'\hat\varphi_r + R(\hat\theta_r)'\hat\lambda_r &= 0,
\end{align*}
where $D_{tT}(\beta) = \partial h^a_{tT}(\beta)/\partial\beta'$ and $R(\beta) = \partial r(\beta)/\partial\beta'$. Note that $r(\hat\theta_r, 0) = 0$. Similarly to Theorem 2.5, we have $\hat\theta_r \to \theta_0$, $\hat\varphi_r \overset{p}{\to} 0$ and $\|\hat\varphi_r\| = O_p[(T/S_T^2)^{-1/2}]$. Therefore
\[
\max_{1\le t\le T}\big|\rho_1(k\hat\varphi_r' h^a_{tT}(\hat\theta_r, 0)) + 1\big| \overset{p}{\to} 0.
\]
Thus
\[
-k[1 + o_p(1)]\frac{1}{T}\sum_{t=1}^{T} D_{tT}(\hat\theta_r, 0)'\hat\varphi_r + R(\hat\theta_r, 0)'\hat\lambda_r = 0,
\]
and, as $\frac{1}{T}\sum_{t=1}^{T} D_{tT}(\hat\theta_r, 0)' \overset{p}{\to} D'$ by a UWL and $\hat\varphi_r \overset{p}{\to} 0$, we have
\[
(R + o_p(1))'\hat\lambda_r = o_p(1),
\]
and consequently, as $\mathrm{rank}(R) = r + s$, we must have $\hat\lambda_r \overset{p}{\to} 0$.

Note also that $\|\hat\varphi_r\| = O_p[(T/S_T^2)^{-1/2}]$; hence
\[
(R + o_p(1))'\hat\lambda_r = O_p[(T/S_T^2)^{-1/2}],
\]
and consequently $\hat\lambda_r = O_p[(T/S_T^2)^{-1/2}]$.
Now a first-order Taylor expansion of the lhs of (A.3) around $\varphi_r = 0$ gives
\[
-k\frac{1}{T}\sum_{t=1}^{T} h_{tT}(\hat\theta_r) + \frac{1}{T}\sum_{t=1}^{T}\rho_2(k\tilde\varphi_r' h^a_{tT}(\hat\theta_r, 0))\, h_{tT}(\hat\theta_r)\, h_{tT}(\hat\theta_r)'\hat\varphi_r = 0, \qquad (A.4)
\]
where $\tilde\varphi_r$ lies on the line joining $\hat\varphi_r$ and $0$. Now note that by a Taylor expansion we have
\[
h_{tT}(\hat\theta_r) = h_{tT}(\theta_0) + D_{tT}(\tilde\theta_r)\, S_1(\hat\theta_r - \theta_0), \qquad (A.5)
\]
where $\tilde\theta_r$ lies on the line joining $\hat\theta_r$ and $\theta_0$. Replacing (A.5) in (A.4) yields
\[
-k\frac{1}{T}\sum_{t=1}^{T} h_{tT}(\theta_0) - k\frac{1}{T}\sum_{t=1}^{T} D_{tT}(\tilde\theta_r)\, S_1(\hat\theta_r - \theta_0) + \frac{S_T}{T}\sum_{t=1}^{T}\rho_2(k\tilde\varphi_r' h_{tT}(\hat\theta_r))\, h_{tT}(\hat\theta_r)\, h_{tT}(\hat\theta_r)'\,\frac{\hat\varphi_r}{S_T} = 0.
\]
Now, as $\hat\lambda_r = O_p(S_T/\sqrt{T})$, $\hat\varphi_r = O_p(S_T/\sqrt{T})$, $\sqrt{T}(\hat\theta_r - \theta_0) = O_p(1)$ (which is a consequence of the fact that $\bar{h}_T(\hat\theta_r) = O_p(T^{-1/2})$, by Theorem 2.2 of Smith (2011) and Assumption 4.8) and $\max_{1\le t\le T}|\rho_2(k\hat\varphi_r' h_{tT}(\hat\theta_r)) + 1| = o_p(1)$, we have, by a UWL and continuity of $R(\cdot)$,
\[
D'\frac{\sqrt{T}}{S_T}\hat\varphi_r + R'\frac{\sqrt{T}}{S_T}\hat\lambda_r = o_p(1),
\]
\[
\sqrt{T}\,\bar{h}_T(\theta_0) + D S_1\sqrt{T}(\hat\theta_r - \theta_0) + V\,\frac{\sqrt{T}\hat\varphi_r}{S_T} = o_p(1),
\]
\[
R\sqrt{T}\, S_1(\hat\theta_r - \theta_0) = o_p(1).
\]
Multiplying the first equation by $R\Sigma$, where $\Sigma = [D'V^{-1}D]^{-1}$, and solving for $\hat\lambda_r$, we obtain
\[
\frac{\sqrt{T}}{S_T}\hat\lambda_r = -(R\Sigma R')^{-1} R\Sigma D'\,\frac{\sqrt{T}}{S_T}\hat\varphi_r + o_p(1).
\]
Replacing this in the first equation yields
\[
D'\frac{\sqrt{T}}{S_T}\hat\varphi_r - R'(R\Sigma R')^{-1}R\Sigma D'\frac{\sqrt{T}}{S_T}\hat\varphi_r = o_p(1),
\]
and multiplying both sides by $\Sigma$ we have
\[
\Sigma D'\frac{\sqrt{T}}{S_T}\hat\varphi_r - \Sigma R'(R\Sigma R')^{-1}R\Sigma D'\frac{\sqrt{T}}{S_T}\hat\varphi_r = o_p(1),
\]
which is equivalent to
\[
\big[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma\big] D'\frac{\sqrt{T}}{S_T}\hat\varphi_r = o_p(1).
\]
Consider now
\[
\sqrt{T}\,\bar{h}_T(\theta_0) + D S_1\sqrt{T}(\hat\theta_r - \theta_0) + V\,\frac{\sqrt{T}\hat\varphi_r}{S_T} = o_p(1).
\]
Multiplying both sides by $[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma]D'V^{-1}$ we obtain
\[
\big[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma\big]D'V^{-1}\sqrt{T}\,\bar{h}_T(\theta_0) + \big[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma\big]D'V^{-1}D S_1\sqrt{T}(\hat\theta_r - \theta_0) = o_p(1).
\]
Now
\[
R\Sigma D'V^{-1}D S_1\sqrt{T}(\hat\theta_r - \theta_0) = R S_1\sqrt{T}(\hat\theta_r - \theta_0) = o_p(1).
\]
Hence we have
\[
S_1\sqrt{T}(\hat\theta_r - \theta_0) = -\big[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma\big]D'V^{-1}\sqrt{T}\,\bar{h}_T(\theta_0) + o_p(1). \qquad (A.6)
\]
Additionally, note that
\[
\sqrt{T}\,\bar{h}_T(\theta_0) + D S_1\sqrt{T}(\hat\theta_r - \theta_0) + V\,\frac{\sqrt{T}\hat\varphi_r}{S_T} = o_p(1);
\]
replacing (A.6) in this equation and solving for $\sqrt{T}\hat\varphi_r/S_T$ yields
\begin{align*}
\frac{\sqrt{T}\hat\varphi_r}{S_T} &= -V^{-1}\big[\sqrt{T}\,\bar{h}_T(\theta_0) + D S_1\sqrt{T}(\hat\theta_r - \theta_0)\big] \\
&= -V^{-1}\big[\sqrt{T}\,\bar{h}_T(\theta_0) - D S_1[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma]D'V^{-1}\sqrt{T}\,\bar{h}_T(\theta_0)\big] \\
&= \big[-V^{-1} + V^{-1}D S_1[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma]D'V^{-1}\big]\sqrt{T}\,\bar{h}_T(\theta_0).
\end{align*}
Let
\[
\tilde\pi_t = \frac{\rho_1([\hat\varphi_r' h_{tT}(\hat\theta_r)]/k_2)}{\sum_{j=1}^{T}\rho_1([\hat\varphi_r' h_{jT}(\hat\theta_r)]/k_2)}, \quad t = 1,\dots,T.
\]
Lemma A.3 If Assumptions 2.4, 2.6, 2.7 and 2.8 are satisfied, then $T\tilde\pi_t = 1 + o_p(1)$ and
\[
T^{1/2}(\tilde\pi_t - 1/T) = \frac{S_T}{T}\, h_{tT}'\,\frac{T^{1/2}}{S_T}\hat\varphi_r\,(1/k_2 + o_p(1)) + O_p\Big(\frac{S_T}{T}\Big)
\]
uniformly in $t = 1,\dots,T$.

Proof: This is similar to the proof of Lemma A.1.

Let $\tilde{w}_r = \sum_{t=1}^{T}\tilde\pi_t w_{tT}$.

Lemma A.4 Under Assumptions 2.4, 2.6, 4.6, 4.7, 4.8 and A.1,
\[
\sqrt{T}\tilde{w}_r = T^{-1/2}\sum_{t=1}^{T} w_t - J_0 P_r\, T^{1/2}\bar{h}(\theta_0) + o_p(1),
\]
where $J_0 = \sum_{s=-\infty}^{\infty} E[w_t h_{t-s}(\theta_0)']$ and $P_r = V^{-1} - V^{-1}D S_1[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma]D'V^{-1}$. Additionally, if $w_t = h(z_t,\theta_0)$ we have
\[
\sqrt{T}\tilde{w}_r = D S_1\big[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma\big]D'V^{-1}\sqrt{T}\,\bar{h}(\theta_0) + o_p(1).
\]
Proof: Note that by Lemma A.3,
\begin{align*}
T^{1/2}\tilde{w}_r &= T^{1/2}\sum_{t=1}^{T}\tilde\pi_t w_{tT} \\
&= T^{1/2}\sum_{t=1}^{T} w_{tT}/T + \sum_{t=1}^{T}\Big[\frac{S_T}{T}\, h_{tT}'\,\frac{T^{1/2}}{S_T}\hat\varphi_r\,(k + o_p(1)) + O_p\Big(\frac{S_T}{T}\Big)\Big] w_{tT} \\
&= T^{-1/2}\sum_{t=1}^{T} w_{tT} + \frac{S_T}{T}\sum_{t=1}^{T} w_{tT}\, h_{tT}'\,\frac{T^{1/2}}{S_T}\hat\varphi_r\,(k + o_p(1)) + \sum_{t=1}^{T} w_{tT}\, O_p\Big(\frac{S_T}{T}\Big).
\end{align*}
Now by Theorem A.1 we have
\[
\frac{T^{1/2}}{S_T}\hat\varphi_r = -T^{1/2} P_r\,\bar{h}(\theta_0) + o_p(1).
\]
Thus
\[
T^{1/2}\tilde{w}_r = T^{-1/2}\sum_{t=1}^{T} w_{tT} + \frac{S_T}{T}\sum_{t=1}^{T} w_{tT}\, h_{tT}'\,[-T^{1/2} P_r\,\bar{h}(\theta_0) + o_p(1)](k + o_p(1)) + \sum_{t=1}^{T} w_{tT}\, O_p\Big(\frac{S_T}{T}\Big),
\]
where $\bar{h}_T(\theta) = \frac{1}{T}\sum_{t=1}^{T}\frac{1}{S_T}\sum_{s=t-T}^{t-1} k(s/S_T)\, h_{t-s}(\theta)$ and $h_{tT} = h_{tT}(\theta)$. Now note that, as in Lemma A.2 of Smith (2011), we have
\[
T^{-1/2}\sum_{t=1}^{T} w_{tT} = T^{-1/2}\sum_{t=1}^{T} w_t + O_p(T^{-1/2}), \qquad
T^{1/2}\bar{h}_T(\theta) = T^{1/2}\sum_{t=1}^{T} h_t(\theta_0)/T + O_p(T^{-1/2}),
\]
and
\[
\sum_{t=1}^{T} w_{tT}\, O_p\Big(\frac{S_T}{T}\Big) = O_p\Big(\frac{S_T}{T^{1/2}}\Big)\Big[T^{-1/2}\sum_{t=1}^{T} w_t + O_p(T^{-1/2})\Big] = o_p(1).
\]
By arguments similar to those of the proof of Lemma A.3 of Smith (2011) we have
\[
\frac{S_T}{T}\, k\sum_{t=1}^{T} w_{tT}\, h_{tT}' = J_0 + o_p(1),
\]
where $J_0 = \sum_{s=-\infty}^{\infty} E[w_t h_{t-s}(\theta_0)']$. Thus
\[
\sqrt{T}\tilde{w}_r = T^{-1/2}\sum_{t=1}^{T} w_t - J_0 P_r\, T^{1/2}\bar{h}(\theta_0) + o_p(1).
\]
Now note that if $w_t = h(z_t,\theta_0)$ we have $T^{-1/2}\sum_{t=1}^{T} w_t = T^{1/2}\bar{h}(\theta_0)$ and $J_0 = V$, and hence
\[
T^{-1/2}\sum_{t=1}^{T} w_t - J_0 P_r\, T^{1/2}\bar{h}(\theta_0) = D S_1\big[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma\big]D'V^{-1}\sqrt{T}\,\bar{h}(\theta_0) + o_p(1).
\]
A.3 Proofs of the results in sub-section 3 and auxiliary Lemmata on the weighted kernel block bootstrap method

In this Appendix we present the bootstrap LLN, CLT and UWL that are required to prove the results.

Proof of Theorem 3.1: Let
\[
q_{tT} \equiv \tilde{Y} + (S_T/k_2)^{1/2}(Y_{tT} - \tilde{Y}), \quad (t = 1,\dots,T),
\]
\[
q^*_{tT} \equiv \tilde{Y} + (S_T/k_2)^{1/2}(Y^*_{tT} - \tilde{Y}), \quad (t = 1,\dots,m_T),
\]
and
\[
\tilde{q} \equiv \sum_{t=1}^{T} q_{tT}\, p_{tT} = \sum_{t=1}^{T}\tilde{Y} p_{tT} + (S_T/k_2)^{1/2}\sum_{t=1}^{T}(Y_{tT} - \tilde{Y}) p_{tT} = \tilde{Y}, \qquad
\tilde{q}^* \equiv \sum_{t=1}^{m_T} q^*_{tT}/m_T.
\]
Thus
\[
\sqrt{m_T}(\tilde{q}^* - \tilde{q}) = \sqrt{m_T}\,(S_T/k_2)^{1/2}(\bar{Y}^* - \tilde{Y}) = \sqrt{T/k_2}\,(\bar{Y}^* - \tilde{Y}),
\]
and consequently
\[
P^*\{\sqrt{T/k_2}\,(\bar{Y}^* - \tilde{Y}) \le x\} = P^*\{\sqrt{m_T}(\tilde{q}^* - \tilde{q}) \le x\},
\]
where $\bar{Y}^* = \frac{1}{m_T}\sum_{j=1}^{m_T} Y^*_{jT}$. The result is proven if we are able to show the following steps. Step 1: $\bar{X} \overset{p}{\to} 0$. Step 2: $T^{1/2}\bar{X}/\sigma_1 \overset{d}{\to} N(0,1)$. Step 3: $\sup_{x\in\mathbb{R}} |P\{T^{1/2}\bar{X} \le x\} - \Phi(x/\sigma_1)| \to 0$, where $\Phi(\cdot)$ is the c.d.f. of the standard normal distribution. Step 4: $m_T\,\mathrm{var}^*[\tilde{q}^*] \overset{p}{\to} \sigma_1^2$. Step 5:
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}}\Big|P^*\Big\{\frac{\sqrt{m_T}(\tilde{q}^* - \tilde{q})}{\mathrm{var}^*[\sqrt{m_T}\,\tilde{q}^*]^{1/2}} \le x\Big\} - \Phi(x)\Big| \ge \varepsilon\Big) = 0.
\]
Step 1 follows from the ergodic theorem (Theorem 3.34 of White, 1999). Step 2 follows by White (1999, Theorem 5.20). Step 3 follows from Step 2 and the Polya Theorem (Serfling, 2002, p. 18), as $\Phi(\cdot)$ is a continuous c.d.f. To prove Step 4, note that
\[
E^*(q^*_{tT}) = E^*(\tilde{Y} + (S_T/k_2)^{1/2}(Y^*_{tT} - \tilde{Y})) = \tilde{Y}
\]
and
\begin{align*}
\mathrm{var}^*(q^*_{tT}) &= \mathrm{var}^*(\tilde{Y} + (S_T/k_2)^{1/2}(Y^*_{tT} - \tilde{Y})) = (S_T/k_2)\,\mathrm{var}^*(Y^*_{tT}) \\
&= \frac{S_T}{k_2}\sum_{t=1}^{T}(Y_{tT} - \tilde{Y})^2 p_{tT} = \frac{S_T}{k_2}\sum_{t=1}^{T} Y_{tT}^2\, p_{tT} - \frac{S_T}{k_2}\tilde{Y}^2 \\
&= \frac{S_T}{k_2}\frac{1}{T}\sum_{t=1}^{T} Y_{tT}^2\, T p_{tT} + O_p\Big(\frac{S_T}{T}\Big) = \frac{S_T}{k_2}\frac{1}{T}\sum_{t=1}^{T} Y_{tT}^2\,(1 + o_p(1)) + O_p\Big(\frac{S_T}{T}\Big) \\
&= \sigma_1^2 + o_p(1),
\end{align*}
since $\max_{1\le t\le T}|T p_{tT}| \overset{p}{\to} 1$, $\tilde{Y} = O_p(1/\sqrt{T})$, and by Lemma A.3 of Smith (2011). For Step 5, since the bootstrap sample observations are independent, we can apply the Berry-Esseen inequality. Thus
\[
\sup_{x\in\mathbb{R}}\Big|P^*\Big\{\frac{\sqrt{m_T}(\tilde{q}^* - \tilde{q})}{\mathrm{var}^*[\sqrt{m_T}\tilde{q}^*]^{1/2}} \le x\Big\} - \Phi(x)\Big| \le \frac{C}{m_T^{1/2}}\, E^*\Big[\Big(\frac{|q^*_{tT} - \tilde{q}|}{\mathrm{var}^*[q^*_{tT}]^{1/2}}\Big)^3\Big] = \frac{C}{m_T^{1/2}}\,\mathrm{var}^*[q^*_{tT}]^{-3/2}\, E^*[|q^*_{tT} - \tilde{q}|^3].
\]
Note that $\mathrm{var}^*[q^*_{tT}] = \sigma_1^2 + o_p(1)$ and that
\[
E^*[|q^*_{tT} - \tilde{q}|^3] = \sum_{t=1}^{T}|q_{tT} - \tilde{q}|^3\, p_{tT} \le \max_t |q_{tT} - \tilde{q}|\sum_{t=1}^{T}|q_{tT} - \tilde{q}|^2\, p_{tT}.
\]
Now
\[
\max_t |q_{tT} - \tilde{q}| = O(S_T^{1/2})\max_t |Y_{tT} - \tilde{Y}| = O_p(S_T^{1/2} T^{1/\eta})
\]
by Lemma A.1 of Smith (2011) and M, with $\eta > \max(4v, 1/\delta)$. Hence
\[
E^*[|q^*_{tT} - \tilde{q}|^3] = O_p(S_T^{1/2} T^{1/\eta}).
\]
Thus
\[
\frac{C}{m_T^{1/2}}\, E^*[|q^*_{tT} - \tilde{q}|^3] = \frac{S_T^{1/2}}{T^{1/2}}\, O_p(S_T^{1/2} T^{1/\eta}) = O_p(S_T T^{1/\eta - 1/2}) = O(T^{1/\eta - \delta})\, o_p(1) \qquad (A.7)
\]
since $S_T = O(T^{1/2-\delta})$. Now, as $\eta > \max(4v, 1/\delta) > 1/\delta$, we have $1/\eta < \delta$, and the result follows as $\mathrm{var}^*[q^*_{tT}] = \sigma_1^2 + o_p(1)$.
Assumption A.2 (a) $E[\|X_t\|^{4v}] < \infty$; (b) $\Sigma_1 \equiv \lim_{T\to\infty}\mathrm{var}[T^{1/2}\bar{X}]$ is finite and positive definite.

Theorem A.2 Let Assumptions 2.4, 2.5 and A.2 be satisfied. If $E[X_t] = 0$ and $m_T = T/S_T$, then
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}^d}\big|P^*\{T^{1/2}(\bar{Y}^* - \tilde{Y}) \le x\} - P\{T^{1/2}\bar{X} \le x\}\big| \ge \varepsilon\Big) = 0.
\]

Proof of Theorem A.2: Let $q_{tT}$, $q^*_{tT}$, $\tilde{q}$ and $\tilde{q}^*$ be defined as in the proof of Theorem 3.1. The result is proven if we are able to show the following steps; cf. Politis and Romano (1992b, Proof of Theorem 2). Step 1: $\bar{X} \overset{p}{\to} 0$. Step 2: $T^{1/2}\Sigma_1^{-1/2}\bar{X} \overset{d}{\to} N(0, I_d)$. Step 3: $\sup_{x\in\mathbb{R}^d}|P\{T^{1/2}\bar{X} \le x\} - \Phi_d(\Sigma_1^{-1/2}x)| \to 0$, where $\Phi_d(\cdot)$ is the c.d.f. of the standard $d$-variate normal distribution. Step 4: $T\,\mathrm{var}^*[\tilde{q}^*] \overset{p}{\to} \Sigma_1$. Step 5:
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}^d}\big|P^*\{\mathrm{var}^*[\tilde{q}^*]^{-1/2}(\tilde{q}^* - \tilde{q}) \le x\} - \Phi_d(x)\big| \ge \varepsilon\Big) = 0.
\]
The proofs of Steps 1-4 are analogous to the proofs of Steps 1-4 in Theorem 3.1. As pointed out by Cattaneo et al. (2010), to prove Step 5 we need to show that
\[
\lim_{T\to\infty} P\Big(\sup_{\mu\in\Theta_d}\sup_{x\in\mathbb{R}}\Big|P^*\Big\{\frac{m_T^{1/2}\,\mu'(\tilde{q}^* - \tilde{q})}{\mathrm{var}^*[\mu' q^*_{1T}]^{1/2}} \le x\Big\} - \Phi(x)\Big| \ge \varepsilon\Big) = 0,
\]
where $\Theta_d = \{\mu\in\mathbb{R}^d : \mu'\mu = 1\}$. Let $\bar\Theta_d = \{\mu\in\mathbb{R}^d : \mu'\mu \le 1\}$ and note that $\Theta_d \subset \bar\Theta_d$.
Given the sample, the bootstrap observations are independent; hence we can apply the Berry-Esseen inequality. Thus
\begin{align*}
\sup_{\mu\in\Theta_d}\sup_{x\in\mathbb{R}}\Big|P^*\Big\{\frac{m_T^{1/2}(\mu'\tilde{q}^* - \mu'\tilde{q})}{\mathrm{var}^*[\mu' q^*_{1T}]^{1/2}} \le x\Big\} - \Phi(x)\Big|
&\le \sup_{\mu\in\Theta_d}\frac{C}{m_T^{1/2}}\, E^*\Big[\Big(\frac{|\mu'(q^*_{1T} - \tilde{q})|}{\mathrm{var}^*[\mu' q^*_{1T}]^{1/2}}\Big)^3\Big] \\
&= \sup_{\mu\in\Theta_d}\frac{C}{m_T^{1/2}}\,\mathrm{var}^*[\mu' q^*_{1T}]^{-3/2}\, E^*[|\mu' q^*_{1T} - \mu'\tilde{q}|^3] \\
&\le \frac{C}{\inf_{\mu\in\Theta_d}\mathrm{var}^*[\mu' q^*_{1T}]^{3/2}}\,\sup_{\mu\in\bar\Theta_d}\frac{S_T^{1/2}}{T^{1/2}}\, E^*[|\mu' q^*_{1T} - \mu'\tilde{q}|^3].
\end{align*}
Now for fixed $\mu$ we have
\[
\frac{S_T^{1/2}}{T^{1/2}}\, E^*[|\mu' q^*_{1T} - \mu'\tilde{q}|^3] = \frac{S_T^{1/2}}{T^{1/2}}\frac{1}{T}\sum_{t=1}^{T}|\mu' q_{tT} - \mu'\tilde{q}|^3 = o_p(1),
\]
as in (A.7). Since $\bar\Theta_d$ is compact and convex and $|\cdot|^3$ is a convex function, we can apply the Convexity Lemma of Pollard (1991, p. 187) to strengthen pointwise convergence to uniform convergence; therefore $\sup_{\mu\in\bar\Theta_d} S_T^{1/2}T^{-1/2}E^*[|\mu' q^*_{1T} - \mu'\tilde{q}|^3] = o_p(1)$, using also the fact that $E[\sup_{\mu\in\Theta_d}|\mu' X_t|^{4v}] \le E[\|X_t\|^{4v}] < \infty$ by CS. Additionally, by Lemma A.3 of Smith (2011) we have
\begin{align*}
\inf_{\mu\in\Theta_d}\frac{1}{T}\sum_{t=1}^{T}|\mu' q_{tT} - \mu'\tilde{q}|^2 &= \inf_{\mu\in\Theta_d}\mu'\Big(\frac{1}{T}\sum_{t=1}^{T}(q_{tT} - \tilde{q})(q_{tT} - \tilde{q})'\Big)\mu \\
&= \inf_{\mu\in\Theta_d}\mu'\Sigma_1\mu + o_p(1) = \inf_{\mu\in\Theta_d}\mu' Q P Q'\mu + o_p(1) \\
&= \inf_{\mu\in\Theta_d}\mu' P\mu + o_p(1) \ge p_{\min} + o_p(1),
\end{align*}
where $P$ is the diagonal matrix of eigenvalues of $\Sigma_1$, $Q$ is the corresponding orthonormal matrix of eigenvectors, and $p_{\min} > 0$ is the smallest eigenvalue of $P$. Hence the result follows.
Let $\bar Y\equiv\frac{1}{T}\sum_{t=1}^TY_{tT}$ and $\tilde Y\equiv\sum_{t=1}^TY_{tT}\,p_{tT}$.

Assumption A.3 (a) The finite-dimensional stochastic process $\{X_t\}_{t=1}^\infty$ is stationary and ergodic; (b) $E[|X_t|^\gamma]<\infty$ for some $\gamma\ge1$; (c) $T^{1/\gamma}/m_T=o(1)$.

Lemma A.5 Let Assumptions A.3, 3.2 and 3.3 (a) hold. Then
\[
\bar Y^*-\bar Y\to0,\ \text{prob-}P^*,\ \text{prob-}P, \tag{A.8}
\]
\[
\bar Y^*-\tilde Y\to0,\ \text{prob-}P^*,\ \text{prob-}P. \tag{A.9}
\]
Proof: If we prove (A.9), then (A.8) follows from it and the fact that
\[
|\bar Y-\tilde Y|=\Big|\frac{1}{T}\sum_{t=1}^TY_{tT}-\sum_{t=1}^TY_{tT}p_{tT}\Big|=\Big|\frac{1}{T}\sum_{t=1}^TY_{tT}(1-Tp_{tT})\Big|\le\Big|\frac{1}{T}\sum_{t=1}^TY_{tT}\Big|\max_t|1-Tp_{tT}|\xrightarrow{p}0
\]
by the ergodic theorem and the fact that $\max_{1\le t\le T}|Tp_{tT}|=1+o_p(1)$. First note that
\[
E^*[|Y^*_{tT}|]=\sum_{t=1}^Tp_{tT}\Big|\frac{1}{S_T}\sum_{s=t-T}^{t-1}k\Big(\frac{s}{S_T}\Big)X_{t-s}\Big|
\le(1+o_p(1))\frac{1}{T}\sum_{t=1}^T\Big|\frac{1}{S_T}\sum_{s=t-T}^{t-1}k\Big(\frac{s}{S_T}\Big)X_{t-s}\Big|
=(1+o_p(1))\frac{1}{T}\sum_{t=1}^T\Big|\frac{1}{S_T}\sum_{j=1}^Tk\Big(\frac{t-j}{S_T}\Big)X_j\Big|
\le(1+o_p(1))\frac{1}{T}\sum_{t=1}^T\frac{1}{S_T}\sum_{j=1}^T\Big|k\Big(\frac{t-j}{S_T}\Big)\Big||X_j|
=(1+o_p(1))\frac{1}{T}\sum_{j=1}^T|X_j|\frac{1}{S_T}\sum_{s=1-j}^{T-j}\Big|k\Big(\frac{s}{S_T}\Big)\Big|.
\]
By Smith (2011, equation (A.4)) we have
\[
\frac{1}{S_T}\sum_{s=1-j}^{T-j}\Big|k\Big(\frac{s}{S_T}\Big)\Big|=O(1)
\]
uniformly in $j$. Also, by the ergodic theorem (White, 1999, Theorem 2.34), $\sum_{j=1}^T|X_j|/T=O_p(1)$. Thus $E^*[|Y^*_{tT}|]=O_p(1)$. In addition, by T,
\[
\Big|\sum_{t=1}^T|Y_{tT}|p_{tT}-\sum_{t=1}^T|Y_{tT}|p_{tT}I(|Y_{tT}|<\delta m_T)\Big|\le(1+o_p(1))\frac{1}{T}\sum_{t=1}^T|Y_{tT}|I(|Y_{tT}|\ge\delta m_T).
\]
Now by M
\[
\max_t|Y_{tT}|=O(1)\max_t|X_t|=O_p(T^{1/\gamma}).
\]
Since $T^{1/\gamma}/m_T=o(1)$, it follows that $\max_tI(|Y_{tT}|\ge\delta m_T)=o_p(1)$. Thus
\[
\frac{1}{T}\sum_{t=1}^T|Y_{tT}|I(|Y_{tT}|\ge\delta m_T)\le\frac{1}{T}\sum_{t=1}^T|Y_{tT}|\max_tI(|Y_{tT}|\ge\delta m_T)=o_p(1).
\]
The remaining part of the proof is similar to the proof of Khinchine's weak law of large numbers given in Rao (2002). Define a pair of new random variables for each $T$, $(t=1,\ldots,m_T)$:
\[
W_{tT}=Y^*_{tT},\ Z_{tT}=0\ \text{if }|Y^*_{tT}|<\delta m_T;\qquad W_{tT}=0,\ Z_{tT}=Y^*_{tT}\ \text{if }|Y^*_{tT}|\ge\delta m_T.
\]
Hence $Y^*_{tT}=W_{tT}+Z_{tT}$. Define
\[
\mu_T=E^*[W_{tT}]=\sum_{t=1}^Tp_{tT}Y_{tT}I[|Y_{tT}|<\delta m_T].
\]
Note that $\tilde Y=E^*[Y^*_{tT}]$ and $|\tilde Y-\mu_T|<\varepsilon$ for any $\varepsilon>0$ and $T$ large enough. The latter claim holds since, by T,
\[
\Big|\sum_{t=1}^Tp_{tT}Y_{tT}I[|Y_{tT}|<\delta m_T]-\sum_{t=1}^Tp_{tT}Y_{tT}\Big|\le(1+o_p(1))\frac{1}{T}\sum_{t=1}^T|Y_{tT}|I(|Y_{tT}|\ge\delta m_T)=o_p(1).
\]
Now
\[
\operatorname{var}^*[W_{tT}]=E^*[W_{tT}^2]-\mu_T^2\le E^*[W_{tT}^2]\le\delta m_TE^*[|W_{tT}|].
\]
Thus, writing $\bar W=\sum_{t=1}^{m_T}W_{tT}/m_T$ and using C,
\[
P^*\{|\bar W-\mu_T|\ge\varepsilon\}\le\frac{\operatorname{var}^*[W_{tT}]}{\varepsilon^2m_T}\le\frac{\delta E^*[|W_{tT}|]}{\varepsilon^2}.
\]
Hence, since $|\tilde Y-\mu_T|<\varepsilon$ for any $\varepsilon>0$ and $T$ large enough,
\[
P^*\{|\bar W-\tilde Y|\ge2\varepsilon\}\le\frac{\delta E^*[|W_{tT}|]}{\varepsilon^2}. \tag{A.10}
\]
Now by M it follows that
\[
P^*\{Z_{tT}\ne0\}=P^*\{|Y^*_{tT}|\ge\delta m_T\}\le\frac{1}{\delta m_T}E^*[|Y^*_{tT}|I[|Y^*_{tT}|\ge\delta m_T]]\le\frac{\delta}{m_T}
\]
w.p.a.1. To see this, as $E^*[|Y^*_{tT}|]=O_p(1)$, it follows that $E^*[|Y^*_{tT}|I[|Y^*_{tT}|\ge\delta m_T]]=o_p(1)$; thus we can always choose a constant $\delta_2\le\delta^2$ such that for $T$ large enough $E^*[|Y^*_{tT}|I[|Y^*_{tT}|\ge\delta m_T]]\le\delta_2$ w.p.a.1. Write $\bar Z=\sum_{t=1}^{m_T}Z_{tT}/m_T$. Note that
\[
P^*\{\bar Z\ne0\}\le P^*\{\max_tZ_{tT}\ne0\}\le\sum_{t=1}^{m_T}P^*\{Z_{tT}\ne0\}\le\delta. \tag{A.11}
\]
From eqs. (A.10) and (A.11),
\[
P^*\{|\bar Y^*-\tilde Y|\ge4\varepsilon\}=P^*\{|\bar W-\tilde Y+\bar Z|\ge4\varepsilon\}\le P^*\{|\bar W-\tilde Y|+|\bar Z|\ge4\varepsilon\}
\le P^*\{|\bar W-\tilde Y|\ge2\varepsilon\}+P^*\{|\bar Z|\ge2\varepsilon\}
\le\frac{\delta E^*[|W_{tT}|]}{\varepsilon^2}+P^*\{\bar Z\ne0\}\le\frac{\delta E^*[|W_{tT}|]}{\varepsilon^2}+\delta.
\]
Now choose $\delta$ small enough. As $E^*[|W_{tT}|]\le E^*[|Y^*_{tT}|]=O_p(1)$, the result follows from M.
The following theorem is due to Ranga Rao [see Wooldridge, 1994].

Theorem A.3 Let $\Theta\subset\mathbb{R}^p$, let $\{X_t\in\mathcal{X}:t=1,2,\ldots\}$ be a sequence of stationary and ergodic $m\times1$ random vectors and let $f:\mathcal{X}\times\Theta\to\mathbb{R}$ be a real-valued function. Assume that: (a) $\Theta$ is compact; (b) for each $\theta$, $f(\cdot,\theta)$ is measurable and, for each $x_t\in\mathcal{X}$, $f(x_t,\cdot)$ is continuous on $\Theta$; (c) $E[\sup_{\theta\in\Theta}|f(X_t,\theta)|]<\infty$. Then
\[
\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^Tf(X_t,\theta)-E[f(X_t,\theta)]\Big|=o_p(1).
\]
The following lemma corresponds to a weak uniform law of large numbers for kernel block bootstrapped sequences.
Lemma A.6 Let $\{X_t\in\mathcal{X}:t=1,2,\ldots\}$ be a sequence of stationary and ergodic $m\times1$ random vectors and let
\[
q_{tT}(\theta)=\frac{1}{S_T}\sum_{s=t-T}^{t-1}k\Big(\frac{s}{S_T}\Big)g(X_{t-s},\theta), \tag{A.12}
\]
and consider the sample $q_{tT}(\theta)$, $(t=1,\ldots,T)$. Draw a random sample of size $m_T$ with replacement from $q_{tT}(\theta)$, $(t=1,\ldots,T)$, to obtain the bootstrap sample $q^*_{sT}(\theta)$, $(s=1,\ldots,m_T)$, where $P(q^*_{sT}(\theta)=q_{tT}(\theta))=p_{tT}$ for $s=1,\ldots,m_T$ and $t=1,\ldots,T$. Assume that Assumptions 3.2 and 3.3 (a) hold and that: (a) (Bootstrap pointwise weak law of large numbers) for each fixed $\theta\in\Theta\subset\mathbb{R}^p$, $\Theta$ a compact set,
\[
\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\to0,\ \text{prob-}P^*,\ \text{prob-}P;
\]
(b) (Uniform convergence)
\[
\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)\Big|\xrightarrow{p}0,\qquad E\Big[\sup_{\theta\in\Theta}|g(X_t,\theta)|\Big]\le\Delta;
\]
(c) for each $\theta$, $g(\cdot,\theta)$ is measurable and, for each $x_t\in\mathcal{X}$, $g(x_t,\cdot)$ is continuous on $\Theta$. Then, as $m_T\to\infty$ and $S_T=o(T^{1/2})$, for any $\epsilon>0$ and $\eta>0$,
\[
\lim_{T\to\infty}P\Big\{P^*\Big\{\sup_{\theta\in\Theta}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)\Big|>\epsilon\Big\}>\eta\Big\}=0,
\]
\[
\lim_{T\to\infty}P\Big\{P^*\Big\{\sup_{\theta\in\Theta}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|>\epsilon\Big\}>\eta\Big\}=0.
\]
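For intuition, the two objects in Lemma A.6 — the kernel-smoothed blocks $q_{tT}$ in (A.12) and multinomial resampling with probabilities $p_{tT}$ — can be sketched in Python. The Bartlett-type kernel, the bandwidth and the uniform probabilities below are illustrative choices, not the ones mandated by the paper's assumptions:

```python
import numpy as np

def kbb_blocks(x, S_T, kernel=lambda u: np.maximum(0.0, 1.0 - np.abs(u))):
    """Kernel-weighted blocks q_{tT} = S_T^{-1} sum_{s=t-T}^{t-1} k(s/S_T) x_{t-s},
    i.e. q_{tT} = S_T^{-1} sum_{j=1}^{T} k((t-j)/S_T) x_j after the change of
    variable j = t - s.  Bartlett-type kernel used purely for illustration."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    t = np.arange(1, T + 1)
    weights = kernel((t[:, None] - t[None, :]) / S_T)  # entry (t, j) = k((t-j)/S_T)
    return weights @ x / S_T

def kbb_resample(q, m_T, p=None, rng=None):
    """Draw m_T blocks with replacement, P(q* = q_{tT}) = p_{tT} (uniform default)."""
    rng = np.random.default_rng(rng)
    T = len(q)
    p = np.full(T, 1.0 / T) if p is None else p
    return rng.choice(q, size=m_T, replace=True, p=p)
```

For a moment function $g(X_t,\theta)$ one applies the same construction to the series $g(X_t,\theta)$; the bootstrap mean $m_T^{-1}\sum_t q^*_{tT}(\theta)$ then estimates $\sum_t p_{tT}q_{tT}(\theta)$, as in part (a) of the lemma.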
Proof: First write
\[
A_T=P^*\Big\{\sup_{\theta\in\Theta}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)\Big|>\epsilon\Big\}
\]
and, by M, $P\{A_T>\eta\}\le\eta^{-1}E[A_T]$. Note that the Lebesgue convergence theorem is valid for sequences that converge in probability, by Proposition 20 of Royden (1988, p. 96). Therefore, as $A_T\le1$, the result follows from this theorem if we show that $A_T\xrightarrow{p}0$.
The proof is similar to the proof of a standard UWL (e.g. Amemiya, 1985). First note that
\[
\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)\Big|
\le\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|+\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|
\]
and that
\[
\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|
\le\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)\Big|+\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)Tp_{tT}-\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)\Big|.
\]
By Smith (2004, Lemma A.1) we have
\[
\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)\Big|=o_p(1). \tag{A.13}
\]
Also
\[
\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)Tp_{tT}-\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)\Big|\le\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)\Big|\max_{1\le t\le T}|Tp_{tT}-1|=o_p(1)
\]
since
\[
\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)\Big|\le O(1)\frac{1}{T}\sum_{t=1}^T\sup_{\theta\in\Theta}|g(X_t,\theta)|=O_p(1)
\]
by the ergodic theorem (White, 1999, Theorem 2.34) and the fact that $\max_{1\le t\le T}|Tp_{tT}-1|=o_p(1)$. We prove now that
\[
\lim_{T\to\infty}P\Big\{P^*\Big\{\sup_{\theta\in\Theta}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|>\epsilon\Big\}>\eta\Big\}=0.
\]
Since $\Theta$ is compact, there is a finite number of points $\theta_1,\theta_2,\ldots,\theta_{n_\delta}$ such that $\Theta\subset\bigcup_{i=1}^{n_\delta}B(\theta_i,\delta)$, where $B(\theta_i,\delta)$ is an open ball with centre $\theta_i$ and radius $\delta$. Thus
\[
P^*\Big\{\sup_{\theta\in\Theta}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|>\epsilon\Big\}
\le P^*\Big\{\bigcup_{i=1}^{n_\delta}\Big\{\sup_{\theta\in B(\theta_i,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|>\epsilon\Big\}\Big\}
\le\sum_{i=1}^{n_\delta}P^*\Big\{\sup_{\theta\in B(\theta_i,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|>\epsilon\Big\}.
\]
Now
P�f sup�2�(�i;�)
���� 1mT
XmT
t=1q�tT (�)�
XT
t=1qtT (�)ptT
���� > �g �P�f
���� 1mT
XmT
t=1q�tT (�i)�
XT
t=1qtT (�i)ptT
���� > �
3g
+P�f sup�2�(�i;�)
���� 1mT
XmT
t=1q�tT (�)�
1
mT
XmT
t=1q�tT (�i)
���� > �
3g
Pf sup�2�(�i;�)
����XT
t=1qtT (�)ptT �
XT
t=1qtT (�i)ptT
���� > �
3g
B1;T +B2;T +B3;T:
Now ���� 1mT
XmT
t=1q�tT (�i)�
XT
t=1ptT qtT (�i)
���� = oB(1)by the KBB Law of large numbers. Thus B1;T = op(1):
By M,
\[
P^*\Big\{\sup_{\theta\in B(\theta_i,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta_i)\Big|>\frac{\epsilon}{3}\Big\}
\le\frac{3}{\epsilon}E^*\Big[\sup_{\theta\in B(\theta_i,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta_i)\Big|\Big]
\le\frac{3}{\epsilon}\frac{1}{m_T}\sum_{t=1}^{m_T}E^*\Big[\sup_{\theta\in B(\theta_i,\delta)}|q^*_{tT}(\theta)-q^*_{tT}(\theta_i)|\Big]
=\frac{3}{\epsilon}\sum_{t=1}^Tp_{tT}\sup_{\theta\in B(\theta_i,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_i)|
=\frac{3}{\epsilon}\frac{1}{T}\sum_{t=1}^TTp_{tT}\sup_{\theta\in B(\theta_i,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_i)|
=(1+o_p(1))\frac{3}{\epsilon}\frac{1}{T}\sum_{t=1}^T\sup_{\theta\in B(\theta_i,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_i)|,
\]
where the second inequality follows from T. But by M
\[
P\Big(\frac{1}{T}\sum_{t=1}^T\sup_{\theta\in B(\theta_i,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_i)|>\epsilon\Big)\le\frac{1}{\epsilon T}\sum_{t=1}^TE\Big(\sup_{\theta\in B(\theta_i,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_i)|\Big);
\]
also
\[
\sup_{\theta\in B(\theta_i,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_i)|=\sup_{\theta\in B(\theta_i,\delta)}\Big|\frac{1}{S_T}\sum_{s=t-T}^{t-1}k\Big(\frac{s}{S_T}\Big)\big(g(X_{t-s},\theta)-g(X_{t-s},\theta_i)\big)\Big|
\le\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\sup_{\theta\in B(\theta_i,\delta)}|g(X_{t-s},\theta)-g(X_{t-s},\theta_i)|
\]
by T. Now taking expectations we have
\[
E\Big[\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\sup_{\theta\in B(\theta_i,\delta)}|g(X_{t-s},\theta)-g(X_{t-s},\theta_i)|\Big]
=\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|E\Big[\sup_{\theta\in B(\theta_i,\delta)}|g(X_{t-s},\theta)-g(X_{t-s},\theta_i)|\Big],
\]
where $\frac{1}{S_T}\sum_{s=t-T}^{t-1}|k(\frac{s}{S_T})|\le C$ and $E[\sup_{\theta\in B(\theta_i,\delta)}|g(X_{t-s},\theta)-g(X_{t-s},\theta_i)|]\to0$ by continuity of $g(X_{t-s},\cdot)$ and dominated convergence as $\delta\to0$. Consequently we have
\[
\frac{1}{\epsilon T}\sum_{t=1}^TE\Big(\sup_{\theta\in B(\theta_i,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_i)|\Big)\le\frac{1}{\epsilon T}\sum_{t=1}^T\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\,o(1)=o(1).
\]
Thus $B_{2,T}=o_p(1)$. Finally,
\[
\sup_{\theta\in B(\theta_i,\delta)}\Big|\sum_{t=1}^Tp_{tT}q_{tT}(\theta)-\sum_{t=1}^Tp_{tT}q_{tT}(\theta_i)\Big|
\le\frac{1}{T}\sum_{t=1}^TTp_{tT}\sup_{\theta\in B(\theta_i,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_i)|
\le(1+o_p(1))\frac{1}{T}\sum_{t=1}^T\sup_{\theta\in B(\theta_i,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_i)|
\]
by T, so $B_{3,T}=o_p(1)$ as above, and the first result follows. The second result follows from the first, (A.13) and the fact that $\max_{1\le t\le T}|Tp_{tT}-1|=o_p(1)$.
Lemma A.7 Let $\{X_t\in\mathcal{X}:t=1,2,\ldots\}$ be a sequence of stationary and ergodic $m\times1$ random vectors and let
\[
q_{tT}(\theta)=\frac{1}{S_T}\sum_{s=t-T}^{t-1}k\Big(\frac{s}{S_T}\Big)g(X_{t-s},\theta), \tag{A.14}
\]
and consider the sample $q_{tT}(\theta)$, $(t=1,\ldots,T)$. Draw a random sample of size $m_T$ with replacement from $q_{tT}(\theta)$, $(t=1,\ldots,T)$, to obtain the bootstrap sample $q^*_{sT}(\theta)$, $(s=1,\ldots,m_T)$, where $P(q^*_{sT}(\theta)=q_{tT}(\theta))=p_{tT}$ for $s=1,\ldots,m_T$ and $t=1,\ldots,T$. Assume that Assumptions 3.2 and 3.3 (a) hold and that: (a) (Bootstrap pointwise weak law of large numbers)
\[
\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta_0)-\sum_{t=1}^Tq_{tT}(\theta_0)p_{tT}\to0,\ \text{prob-}P^*,\ \text{prob-}P;
\]
(b) $E[\sup_{\theta\in N}|g(X_t,\theta)|]\le\Delta$, where $N$ is a neighbourhood of $\theta_0$; (c) for each $\theta$, $g(\cdot,\theta)$ is measurable and, for each $x_t\in\mathcal{X}$, $g(x_t,\cdot)$ is continuous on $\Theta$. Then
\[
\sup_{\theta\in N}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-E[g(X_t,\theta)]\Big|\xrightarrow{p}0
\]
and, as $m_T\to\infty$ and $S_T=o(T^{1/2})$, for any $\epsilon>0$ and $\eta>0$,
\[
\lim_{T\to\infty}P\Big\{P^*\Big\{\sup_{\theta\in N}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|>\epsilon\Big\}>\eta\Big\}=0,
\]
\[
\lim_{T\to\infty}P\Big\{P^*\Big\{\sup_{\theta\in N}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)\Big|>\epsilon\Big\}>\eta\Big\}=0.
\]
Proof: Let $N=B(\theta_0,\delta)$, where $B(\theta_0,\delta)$ is an open ball with centre $\theta_0$ and radius $\delta$. First note that
\[
\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-E[g(X_t,\theta)]\Big|
\le\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta_0)\Big|
+\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta_0)-E[g(X_t,\theta_0)]\Big|
+\sup_{\theta\in B(\theta_0,\delta)}|E[g(X_t,\theta_0)]-E[g(X_t,\theta)]|,
\]
and by M
\[
P\Big\{\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta_0)\Big|>\varepsilon\Big\}\le\frac{1}{\varepsilon}E\Big[\sup_{\theta\in B(\theta_0,\delta)}|g(X_t,\theta)-g(X_t,\theta_0)|\Big].
\]
Now by T
\[
E\Big[\sup_{\theta\in B(\theta_0,\delta)}|g(X_t,\theta)-g(X_t,\theta_0)|\Big]\le2E\Big[\sup_{\theta\in B(\theta_0,\delta)}|g(X_t,\theta)|\Big].
\]
Thus, by the dominated convergence theorem and continuity of $g(X_t,\cdot)$, as $\delta\to0$ we have
\[
\lim_{\delta\to0}E\Big[\sup_{\theta\in B(\theta_0,\delta)}|g(X_t,\theta)-g(X_t,\theta_0)|\Big]=0.
\]
Let us now consider
\[
\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)\Big|\xrightarrow{p}0.
\]
Note that
\[
\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)\Big|
\le\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-E[g(X_t,\theta)]\Big|
+\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)-E[g(X_t,\theta)]\Big|
=A_{1,T}+A_{2,T}.
\]
That $A_{1,T}\xrightarrow{p}0$ was proven above, and the proof that $A_{2,T}\xrightarrow{p}0$ is identical to the proof of Lemma A.1 of Smith (2011) [and uses the fact that $A_{1,T}\xrightarrow{p}0$].
We prove now that
\[
\lim_{T\to\infty}P\Big\{P^*\Big\{\sup_{\theta\in N}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|>\epsilon\Big\}>\eta\Big\}=0.
\]
Note that
\[
P^*\Big\{\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|>\epsilon\Big\}
\le P^*\Big\{\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta_0)-\sum_{t=1}^Tq_{tT}(\theta_0)p_{tT}\Big|>\frac{\epsilon}{3}\Big\}
+P^*\Big\{\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta_0)\Big|>\frac{\epsilon}{3}\Big\}
+P^*\Big\{\sup_{\theta\in B(\theta_0,\delta)}\Big|\sum_{t=1}^Tq_{tT}(\theta)p_{tT}-\sum_{t=1}^Tq_{tT}(\theta_0)p_{tT}\Big|>\frac{\epsilon}{3}\Big\}
\equiv C_{1,T}+C_{2,T}+C_{3,T}.
\]
Now
\[
\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta_0)-\sum_{t=1}^Tp_{tT}q_{tT}(\theta_0)\Big|=o_B(1)
\]
by the KBB law of large numbers. Thus $C_{1,T}=o_p(1)$. By M,
\[
P^*\Big\{\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta_0)\Big|>\frac{\epsilon}{3}\Big\}
\le\frac{3}{\epsilon}E^*\Big[\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta_0)\Big|\Big]
\le\frac{3}{\epsilon}\frac{1}{m_T}\sum_{t=1}^{m_T}E^*\Big[\sup_{\theta\in B(\theta_0,\delta)}|q^*_{tT}(\theta)-q^*_{tT}(\theta_0)|\Big]
=\frac{3}{\epsilon}\sum_{t=1}^Tp_{tT}\sup_{\theta\in B(\theta_0,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_0)|
=\frac{3}{\epsilon}\frac{1}{T}\sum_{t=1}^TTp_{tT}\sup_{\theta\in B(\theta_0,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_0)|
=(1+o_p(1))\frac{3}{\epsilon}\frac{1}{T}\sum_{t=1}^T\sup_{\theta\in B(\theta_0,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_0)|,
\]
where the second inequality follows from T. But
\[
P\Big(\frac{1}{T}\sum_{t=1}^T\sup_{\theta\in B(\theta_0,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_0)|>\epsilon\Big)\le\frac{1}{\epsilon T}\sum_{t=1}^TE\Big(\sup_{\theta\in B(\theta_0,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_0)|\Big);
\]
also
\[
\sup_{\theta\in B(\theta_0,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_0)|=\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{S_T}\sum_{s=t-T}^{t-1}k\Big(\frac{s}{S_T}\Big)\big(g(X_{t-s},\theta)-g(X_{t-s},\theta_0)\big)\Big|
\le\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\sup_{\theta\in B(\theta_0,\delta)}|g(X_{t-s},\theta)-g(X_{t-s},\theta_0)|
\]
by T. Now taking expectations we have
\[
E\Big[\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\sup_{\theta\in B(\theta_0,\delta)}|g(X_{t-s},\theta)-g(X_{t-s},\theta_0)|\Big]
=\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|E\Big[\sup_{\theta\in B(\theta_0,\delta)}|g(X_{t-s},\theta)-g(X_{t-s},\theta_0)|\Big],
\]
where $\frac{1}{S_T}\sum_{s=t-T}^{t-1}|k(\frac{s}{S_T})|\le C$ and $E[\sup_{\theta\in B(\theta_0,\delta)}|g(X_{t-s},\theta)-g(X_{t-s},\theta_0)|]\to0$ by continuity of $g(X_{t-s},\cdot)$ and dominated convergence as $\delta\to0$. Consequently we have
\[
\frac{1}{\epsilon T}\sum_{t=1}^TE\Big(\sup_{\theta\in B(\theta_0,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_0)|\Big)\le\frac{1}{\epsilon T}\sum_{t=1}^T\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\,o(1)=o(1).
\]
Thus the result follows. The second result follows from the fact that
\[
\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)\Big|
\le\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|
+\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|.
\]
The first term on the RHS was shown to converge to zero, and
\[
\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|
\le\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|
+\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)\Big|
\le\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)\Big|\max_{1\le t\le T}|1-Tp_{tT}|+o_p(1),
\]
as $A_{1,T}+A_{2,T}\xrightarrow{p}0$. Now $\sup_{\theta\in B(\theta_0,\delta)}|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)|=O_p(1)$ and $\max_{1\le t\le T}|1-Tp_{tT}|=o_p(1)$. Hence the result follows.
Lemma A.8 Suppose the finite-dimensional stochastic process $\{X_t\}_{t=1}^\infty$ satisfies Assumptions 3.1, 2.5 and 3.3 (a), $m_T=T/S_T$, $S_T=o(T^{1/2})$ and $E[X_t]=0$. Then
\[
\lim_{T\to\infty}P\Big[P^*\Big(\Big|\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}Y^{*2}_{tT}-\frac{S_T}{Tk_2}\sum_{t=1}^TY^2_{tT}\Big|>\varepsilon\Big)>\eta\Big]=0,
\]
\[
\lim_{T\to\infty}P\Big[P^*\Big(\Big|\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}Y^{*2}_{tT}-\frac{S_T}{k_2}\sum_{t=1}^TY^2_{tT}p_{tT}\Big|>\varepsilon\Big)>\eta\Big]=0.
\]
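As an illustration of Lemma A.8 (outside the proof), the bootstrap second-moment statistic can be simulated and compared with its in-sample target. The series, the kernel constant $k_2$ and the sizes below are arbitrary illustrative choices:

```python
import numpy as np

def second_moment_pair(y, S_T, k2, m_T, seed=0):
    """Return the bootstrap statistic (S_T/(m_T k2)) * sum Y*^2 together with its
    target (S_T/(T k2)) * sum Y^2, under uniform resampling probabilities p_tT = 1/T."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    T = len(y)
    y_star = rng.choice(y, size=m_T, replace=True)   # bootstrap sample of blocks
    boot = S_T / (m_T * k2) * np.sum(y_star ** 2)
    target = S_T / (T * k2) * np.sum(y ** 2)
    return boot, target

# The lemma says boot - target vanishes in (bootstrap) probability as T grows.
```

With a large $m_T$ the two quantities are close, which is the content of the first display of the lemma.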
Proof: The result is proved if we show that
\[
P^*\Big(\Big|\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}Y^{*2}_{tT}-\frac{S_T}{Tk_2}\sum_{t=1}^TY^2_{tT}\Big|>\varepsilon\Big)=o_p(1).
\]
Note that
\[
\Big|\frac{S_T}{k_2}\sum_{t=1}^TY^2_{tT}p_{tT}-\frac{S_T}{Tk_2}\sum_{t=1}^TY^2_{tT}\Big|\le\max_t|Tp_{tT}-1|\Big|\frac{S_T}{Tk_2}\sum_{t=1}^TY^2_{tT}\Big|,
\]
where $|\frac{S_T}{Tk_2}\sum_{t=1}^TY^2_{tT}|=O_p(1)$ by Lemma A.3 of Smith (2004, p. A.4) and $\max_{1\le t\le T}|Tp_{tT}-1|=o_p(1)$ by assumption. Hence the result follows by T if we show that
\[
P^*\Big(\Big|\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}Y^{*2}_{tT}-\frac{S_T}{k_2}\sum_{t=1}^TY^2_{tT}p_{tT}\Big|>\varepsilon\Big)=o_p(1).
\]
The proof of this result is similar to that of Lemma B.2 of Gonçalves and White (2004). First note that $E^*[Y^{*2}_{tT}]=\sum_{t=1}^Tp_{tT}Y^2_{tT}$. Thus by M we have
\[
P^*\Big(\Big|\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}Y^{*2}_{tT}-\frac{S_T}{k_2}\sum_{t=1}^TY^2_{tT}p_{tT}\Big|>\varepsilon\Big)\le\varepsilon^{-p}E^*\Big(\Big|\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}Y^{*2}_{tT}-\frac{S_T}{k_2}\sum_{t=1}^Tp_{tT}Y^2_{tT}\Big|^p\Big)
\]
for some $p>1$. Now
\[
E^*\Big(\Big|\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}Y^{*2}_{tT}-\frac{S_T}{k_2}\sum_{t=1}^Tp_{tT}Y^2_{tT}\Big|^p\Big)
=\Big(\frac{S_T}{m_Tk_2}\Big)^pE^*\Big(\Big|\sum_{t=1}^{m_T}\big(Y^{*2}_{tT}-E^*[Y^{*2}_{tT}]\big)\Big|^p\Big)
\le\Big(\frac{S_T}{m_Tk_2}\Big)^pCE^*\Big(\Big(\sum_{t=1}^{m_T}\big|Y^{*2}_{tT}-E^*[Y^{*2}_{tT}]\big|^2\Big)^{p/2}\Big)
\]
for some $C<\infty$ by an extension of the Burkholder inequality due to White and Chen (1996, Lemma A.2), as the $(Y^{*2}_{tT}-E^*[Y^{*2}_{tT}])$ are i.i.d. with zero mean. But for $1<p\le2$ we have, by the $c_r$ inequality (Davidson, 1994, p. 140) with $r=p/2$,
\[
\Big(\frac{S_T}{m_Tk_2}\Big)^pE^*\Big(\Big(\sum_{t=1}^{m_T}\big|Y^{*2}_{tT}-E^*[Y^{*2}_{tT}]\big|^2\Big)^{p/2}\Big)
\le\Big(\frac{S_T}{m_Tk_2}\Big)^p\sum_{t=1}^{m_T}E^*\big(\big|Y^{*2}_{tT}-E^*[Y^{*2}_{tT}]\big|^p\big)
=\frac{S_T^p}{m_T^{p-1}k_2^p}E^*\big(\big|Y^{*2}_{tT}-E^*[Y^{*2}_{tT}]\big|^p\big)
\le\frac{S_T^p}{m_T^{p-1}k_2^p}2^pE^*[|Y^*_{tT}|^{2p}]
=\frac{S_T^{3/2}}{m_T^{1/2}k_2^{3/2}}2^{3/2}E^*[|Y^*_{tT}|^3]
=\frac{S_T^2}{T^{1/2}k_2^{3/2}}2^{3/2}\sum_{t=1}^T|Y_{tT}|^3p_{tT}
=\frac{S_T^2}{T^{1/2}k_2^{3/2}}2^{3/2}\frac{1}{T}\sum_{t=1}^T|Y_{tT}|^3Tp_{tT}
=\frac{S_T^2}{T^{1/2}k_2^{3/2}}2^{3/2}(1+o_p(1))\frac{1}{T}\sum_{t=1}^T|Y_{tT}|^3
\]
as $\max_{1\le t\le T}|Tp_{tT}-1|=o_p(1)$, taking $p=3/2$. Now note that
\[
\frac{S_T}{T}\sum_{t=1}^T|Y_{tT}|^3\le\frac{S_T}{T}\sum_{t=1}^T|Y_{tT}|^2\max_t|Y_{tT}|=O_p(T^{1/\alpha})
\]
by Lemma A.3 of Smith (2011) and by M. Thus
\[
\frac{S_T}{T^{1/2}k_2^{3/2}}2^{3/2}(1+o_p(1))\frac{S_T}{T}\sum_{t=1}^T|Y_{tT}|^3=O_p(T^{-\xi+1/\alpha}).
\]
Since $\alpha>\max(4v,1/\xi)>1/\xi$, we have $\xi>1/\alpha$ and the result follows.
Lemma A.9 Suppose the finite-dimensional stochastic process $\{(X_t,Z_t)\}_{t=1}^\infty$ is strictly stationary and ergodic, $E(|X_t|^{dp})\le\Delta$ and $E(|Z_t|^{\frac{dp}{d-1}})\le\Delta$ for some $1<p\le2$ and $d>1$, Assumptions 2.5 and 3.3 hold, $m_T=T/S_T$ and $S_T=o(T^{1/2})$. Then
\[
\lim_{T\to\infty}P\Big[P^*\Big(\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}Y^*_{tT}Z^*_{tT}\Big|>T^{1/2}\varepsilon\Big)>\eta\Big]=0,
\]
where $Z_{tT}=\frac{1}{S_T}\sum_{s=t-T}^{t-1}k(\frac{s}{S_T})Z_{t-s}$, $(t=1,\ldots,T)$, and $(Z^*_{1T},\ldots,Z^*_{m_TT})$ is a bootstrap sample drawn from $(Z_{1T},\ldots,Z_{TT})$.
Proof: The proof is similar to the proof of Lemma B.2 of Gonçalves and White (2004). First note that by M, for some $1<p\le2$, we have
\[
P^*\Big(\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}Y^*_{tT}Z^*_{tT}\Big|>T^{1/2}\varepsilon\Big)
\le\frac{1}{\varepsilon^pT^{p/2}}E^*\Big[\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}Y^*_{tT}Z^*_{tT}\Big|^p\Big]
\le\frac{C}{\varepsilon^pT^{p/2}}E^*\Big[\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}\big(Y^*_{tT}Z^*_{tT}-E^*[Y^*_{tT}Z^*_{tT}]\big)\Big|^p\Big]
+\frac{C}{\varepsilon^pT^{p/2}}E^*\Big[\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}E^*[Y^*_{tT}Z^*_{tT}]\Big|^p\Big]
=F_1+F_2
\]
by the $c_r$ inequality. Now
\[
F_1\le\frac{C}{\varepsilon^pT^{p/2}}\frac{S_T^p}{m_T^p}E^*\Big[\Big|\sum_{t=1}^{m_T}\big(Y^*_{tT}Z^*_{tT}-E^*[Y^*_{tT}Z^*_{tT}]\big)\Big|^p\Big]
\le\frac{C}{\varepsilon^pT^{p/2}}\frac{S_T^p}{m_T^p}E^*\Big[\Big(\sum_{t=1}^{m_T}\big|Y^*_{tT}Z^*_{tT}-E^*[Y^*_{tT}Z^*_{tT}]\big|^2\Big)^{p/2}\Big]
\le\frac{C}{\varepsilon^pT^{p/2}}\frac{S_T^p}{m_T^p}E^*\Big[\sum_{t=1}^{m_T}\big|Y^*_{tT}Z^*_{tT}-E^*[Y^*_{tT}Z^*_{tT}]\big|^p\Big]
\le\frac{C}{\varepsilon^pT^{p/2}}\frac{S_T^p}{m_T^{p-1}}E^*[|Y^*_{tT}Z^*_{tT}|^p]
\]
by an extension of the Burkholder inequality due to White and Chen (1996, Lemma A.2) and the $c_r$ inequality with $r=p/2$. Also
\[
F_2=\frac{C}{\varepsilon^pT^{p/2}}\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}E^*[Y^*_{tT}Z^*_{tT}]\Big|^p
\le\frac{C}{\varepsilon^pT^{p/2}}\frac{S_T^p}{m_T^p}\Big|\sum_{t=1}^{m_T}E^*[Y^*_{tT}Z^*_{tT}]\Big|^p
=\frac{C}{\varepsilon^pT^{p/2}}S_T^p\big|E^*[Y^*_{tT}Z^*_{tT}]\big|^p
\le\frac{C}{\varepsilon^pT^{p/2}}S_T^pE^*[|Y^*_{tT}Z^*_{tT}|^p]
\]
by Jensen. Now
\[
\frac{C}{\varepsilon^pT^{p/2}}S_T^pE^*[|Y^*_{tT}Z^*_{tT}|^p]
=\frac{C}{\varepsilon^pT^{p/2}}S_T^p\sum_{t=1}^T|Y_{tT}|^p|Z_{tT}|^pp_{tT}
=\frac{C}{\varepsilon^pT^{p/2}}S_T^p\frac{1}{T}\sum_{t=1}^T|Y_{tT}|^p|Z_{tT}|^pTp_{tT}
=\frac{C}{\varepsilon^pT^{p/2}}S_T^p(1+o_p(1))\frac{1}{T}\sum_{t=1}^T|Y_{tT}|^p|Z_{tT}|^p
\]
as $\max_{1\le t\le T}|Tp_{tT}-1|=o_p(1)$.
But by M and the Hölder inequality,
\[
P\Big[\frac{C}{\varepsilon^pT^{p/2}}S_T^p\frac{1}{T}\sum_{t=1}^T|Y_{tT}|^p|Z_{tT}|^p>\eta\Big]
\le\frac{C}{\eta\varepsilon^pT^{p/2}}S_T^pE\Big[\frac{1}{T}\sum_{t=1}^T|Y_{tT}|^p|Z_{tT}|^p\Big]
=\frac{C}{\eta\varepsilon^pT^{p/2}}S_T^p\frac{1}{T}\sum_{t=1}^TE[|Y_{tT}|^p|Z_{tT}|^p]
\le\frac{C}{\eta\varepsilon^pT^{p/2}}S_T^p\frac{1}{T}\sum_{t=1}^T\big(E[|Y_{tT}|^{dp}]\big)^{1/d}\big(E[|Z_{tT}|^{\frac{dp}{d-1}}]\big)^{(d-1)/d}.
\]
Now by the T and Jensen inequalities,
\[
E[|Y_{tT}|^{dp}]=E\Big[\Big|\frac{1}{S_T}\sum_{s=t-T}^{t-1}k\Big(\frac{s}{S_T}\Big)X_{t-s}\Big|^{dp}\Big]
\le E\Big[\Big(\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big||X_{t-s}|\Big)^{dp}\Big]
=E\Big[\Big(\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\Big)^{dp}\Big(\frac{\frac{1}{S_T}\sum_{s=t-T}^{t-1}|k(\frac{s}{S_T})||X_{t-s}|}{\frac{1}{S_T}\sum_{s=t-T}^{t-1}|k(\frac{s}{S_T})|}\Big)^{dp}\Big]
\le E\Big[\Big(\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\Big)^{dp-1}\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big||X_{t-s}|^{dp}\Big]
=\Big(\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\Big)^{dp-1}\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|E[|X_{t-s}|^{dp}]
\le\Big(\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\Big)^{dp}\Delta=O(1)
\]
as $E(|X_t|^{dp})$ is bounded. By the same reasoning, $E[|Z_{tT}|^{\frac{dp}{d-1}}]=O(1)$. Thus the result follows since $S_T/T^{1/2}=o(1)$.
A.4 Proofs of the results in Section 4.1

In this subsection of the appendix we take $p_{tT}=1/T$, and consequently Assumption 3.3 (i) is automatically satisfied. Assumption 3.3 (ii) follows from Lemma A.2 of Smith (2011).
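To fix ideas for this subsection ($p_{tT}=1/T$), the bootstrapped GMM computation can be schematised: build kernel blocks of the moment series, resample them uniformly, and minimise the bootstrap criterion. Everything below — the scalar linear moment $g(z_t,\theta)=z_t-\theta$, the Bartlett-type kernel, identity weighting and grid minimisation — is a simplified stand-in for the paper's estimators, not their implementation:

```python
import numpy as np

def boot_gmm_theta(z, S_T, m_T, thetas, seed=0):
    """Toy scalar example with moment g(z_t, theta) = z_t - theta.
    Kernel blocks of g are blocks of z minus theta times blocks of 1,
    so we block once, resample indices, and minimise g*_T(theta)^2 on a grid."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    T = len(z)
    t = np.arange(1, T + 1)
    k = lambda u: np.maximum(0.0, 1.0 - np.abs(u))   # illustrative kernel
    W = k((t[:, None] - t[None, :]) / S_T) / S_T     # kernel weight matrix
    q = W @ z                                        # blocks of the series z
    c = W.sum(axis=1)                                # blocks of the constant 1
    idx = rng.integers(0, T, size=m_T)               # p_tT = 1/T resampling
    crit = [np.mean(q[idx] - th * c[idx]) ** 2 for th in thetas]
    return thetas[int(np.argmin(crit))]
```

The grid minimiser approximates the bootstrap GMM estimator $\hat\theta^*$, which Theorem 4.1 shows is consistent for $\hat\theta$.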
Proof of Theorem 4.1: The result is proven if we show that the conditions of Lemma A.2 of Gonçalves and White (2004) are satisfied. Conditions (a1), (a2), (b1) and (b2) are satisfied by Assumption 4.1 (i) and (iii) (see Jennrich, 1969, Lemma 2). Note that uniqueness of the minimum follows from Lemma 2.3 of Newey and McFadden (1994). To prove (a3), define $Q_0(\theta)=E[g(z_t,\theta)]'WE[g(z_t,\theta)]$ and note that, as in the proof of Theorem 2.6 of Newey and McFadden (1994), using T and CS,
\[
|Q_T(\theta)-Q_0(\theta)|\le\|g(\theta)-E[g(z_t,\theta)]\|^2\|W_T\|+2\|E[g(z_t,\theta)]\|\|g(\theta)-E[g(z_t,\theta)]\|\|W_T\|+\|E[g(z_t,\theta)]\|^2\|W_T-W\|.
\]
By Lemma A.3 we have $\sup_{\theta\in B}\|g(\theta)-E[g(z_t,\theta)]\|=o_p(1)$. Also, by assumption, $\|E[g(z_t,\theta)]\|$ is bounded and $\|W_T-W\|=o_p(1)$.

It remains to prove (b3). By T and CS,
\[
|Q^*_T(\theta)-Q_T(\theta)|\le\|g^*_T(\theta)-g(\theta)\|^2\|W^*_T\|+2\|g(\theta)\|\|g^*_T(\theta)-g(\theta)\|\|W^*_T\|+\|g(\theta)\|^2\|W^*_T-W_T\|.
\]
Now by Lemma A.6 we have $\sup_{\theta\in B}\|g^*_T(\theta)-g(\theta)\|=o_B(1)$; also
\[
\sup_{\theta\in B}\|g(\theta)\|\le\sup_{\theta\in B}\|g(\theta)-E[g(z_t,\theta)]\|+\sup_{\theta\in B}\|E[g(z_t,\theta)]\|=o_p(1)+C.
\]
Thus the result follows as $\|W^*_T-W_T\|=o_B(1)$.

Proof of Theorem 4.2: Let $G^*_T\equiv\partial g^*_T(\hat\theta^*)/\partial\theta'$. To prove asymptotic normality, notice that by the first-order conditions we have $\sqrt{T/k_2}\,G^{*\prime}_TW^*_Tg^*_T(\hat\theta^*)=0$. Hence a first-order Taylor expansion around $\hat\theta$ yields
\[
\sqrt{T/k_2}\,G^{*\prime}_TW^*_Tg^*_T(\hat\theta)+G^{*\prime}_TW^*_T\tilde G^*_T\sqrt{T/k_2}\,(\hat\theta^*-\hat\theta)=0,
\]
where $\tilde G^*_T\equiv\partial g^*_T(\tilde\theta^*)/\partial\theta'$ and $\tilde\theta^*$ is on a line joining $\hat\theta$ and $\hat\theta^*$. Solving for $\sqrt{T/k_2}\,(\hat\theta^*-\hat\theta)$ we obtain
\[
\sqrt{T/k_2}\,(\hat\theta^*-\hat\theta)=-[G^{*\prime}_TW^*_T\tilde G^*_T]^{-1}\sqrt{T/k_2}\,G^{*\prime}_TW^*_Tg^*_T(\hat\theta).
\]
By a Taylor expansion we have
\[
\sqrt{T/k_2}\,G^{*\prime}_TW^*_Tg^*_T(\hat\theta)=\sqrt{T/k_2}\,G^{*\prime}_TW^*_Tg^*_T(\theta_0)+\sqrt{T/k_2}\,G^{*\prime}_TW^*_T\bar G^*_T(\hat\theta-\theta_0)
=\sqrt{T/k_2}\,G^{*\prime}_TW^*_T[g^*_T(\theta_0)-g_T(\theta_0)]+\sqrt{T/k_2}\,G^{*\prime}_TW^*_Tg_T(\theta_0)+\sqrt{T/k_2}\,G^{*\prime}_TW^*_T\bar G^*_T(\hat\theta-\theta_0),
\]
where $\bar G^*_T\equiv\partial g^*_T(\bar\theta^*)/\partial\theta'$ and $\bar\theta^*$ is on a line joining $\hat\theta$ and $\theta_0$. We prove now that
\[
\sqrt{T/k_2}\,G^{*\prime}_TW^*_Tg_T(\theta_0)+\sqrt{T/k_2}\,G^{*\prime}_TW^*_T\bar G^*_T(\hat\theta-\theta_0)=o_B(1).
\]
Note that by the first-order conditions of the original GMM problem we have
\[
\sqrt{T/k_2}\,(\hat\theta-\theta_0)=-[G_T'W_T\bar G_T]^{-1}\sqrt{T/k_2}\,G_T'W_Tg_T(\theta_0),
\]
where $\bar G_T\equiv\partial g_T(\bar\theta)/\partial\theta'$ and $\bar\theta$ is on a line joining $\hat\theta$ and $\theta_0$. Thus
\[
\sqrt{T/k_2}\,G^{*\prime}_TW^*_Tg_T(\theta_0)+\sqrt{T/k_2}\,G^{*\prime}_TW^*_T\bar G^*_T(\hat\theta-\theta_0)
=\big[G^{*\prime}_TW^*_T-G^{*\prime}_TW^*_T\bar G^*_T[G_T'W_T\bar G_T]^{-1}G_T'W_T\big]\sqrt{T/k_2}\,g_T(\theta_0).
\]
Now, by assumption, $W^*_T=W_T+o_B(1)$ and $W_T=W+o_p(1)$; also, by the bootstrap uniform convergence Lemma A.7 and consistency of $\hat\theta^*$ and $\hat\theta$, $G^*_T-G=o_B(1)$, $\tilde G^*_T-G=o_B(1)$, $\bar G^*_T-G=o_B(1)$, $\bar G_T-G=o_p(1)$ and $G_T-G=o_p(1)$; and by the CLT of Wooldridge and White (Theorem 5.20 of White, 1999) $\sqrt{T/k_2}\,g_T(\theta_0)=O_p(1)$. Hence the above expression is $o_B(1)$.

Now $-[G^{*\prime}_TW^*_T\tilde G^*_T]^{-1}\sqrt{T/k_2}\,G^{*\prime}_TW^*_T[g^*_T(\theta_0)-g_T(\theta_0)]$ converges to $N(0,(G'WG)^{-1}G'W\Omega WG(G'WG)^{-1})$ by the bootstrap CLT, Theorem A.2, and the facts that $G^*_T-G=o_B(1)$ and $W^*_T=W+o_B(1)$. The result follows as $\sqrt{T/k_2}\,(\hat\theta^*-\hat\theta)$ converges to the same asymptotic distribution as $T^{1/2}(\hat\theta-\theta_0)$ and by the Pólya Theorem, Serfling (2002, p. 18), as $\Phi(\cdot)$ is a continuous c.d.f.
Proof of Lemma 4.1: We use the same strategy as the proof of Theorem 4.1 of Gonçalves and White (2004). First consider the unfeasible estimator of $\Omega$:
\[
\hat\Omega^*(\theta_0)=\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}g^*_t(\theta_0)g^*_t(\theta_0)'.
\]
Fix any $\lambda\in\mathbb{R}^m$. Now
\[
\lambda'\hat\Omega^*(\theta_0)\lambda=\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}\lambda'g^*_t(\theta_0)g^*_t(\theta_0)'\lambda=\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}\big(\lambda'g^*_t(\theta_0)\big)^2.
\]
Now, applying Lemma A.8 with $X_t=\lambda'g_t(\theta_0)$ and $p_{tT}=1/T$, $t=1,\ldots,T$, it follows that
\[
\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}\big(\lambda'g^*_t(\theta_0)\big)^2-\frac{S_T}{Tk_2}\sum_{t=1}^T\big(\lambda'g_{tT}(\theta_0)\big)^2=o_B(1)
\]
and, by Smith (2011), Lemma A.3,
\[
\frac{S_T}{Tk_2}\sum_{t=1}^T\big(\lambda'g_{tT}(\theta_0)\big)^2=\lambda'\Omega\lambda+o_p(1).
\]
Thus it remains to prove that $|\lambda'\hat\Omega^*(\tilde\theta^*)\lambda-\lambda'\hat\Omega^*(\theta_0)\lambda|=o_B(1)$. Note that by a first-order Taylor expansion of $(\lambda'g^*_t(\tilde\theta^*))^2$ around $\theta_0$ we have
\[
\big(\lambda'g^*_t(\tilde\theta^*)\big)^2=\big(\lambda'g^*_t(\theta_0)\big)^2+2\lambda'g^*_t(\bar\theta^*)\lambda'G^*_t(\bar\theta^*)(\tilde\theta^*-\theta_0),
\]
where $\bar\theta^*$ is on a line joining $\tilde\theta^*$ and $\theta_0$. Thus
\[
\lambda'\hat\Omega^*(\tilde\theta^*)\lambda=\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}\big(\lambda'g^*_t(\tilde\theta^*)\big)^2
=\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}\Big[\big(\lambda'g^*_t(\theta_0)\big)^2+2\lambda'g^*_t(\bar\theta^*)\lambda'G^*_t(\bar\theta^*)(\tilde\theta^*-\theta_0)\Big]
=\lambda'\hat\Omega^*(\theta_0)\lambda+\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}2\lambda'g^*_t(\bar\theta^*)\lambda'G^*_t(\bar\theta^*)(\tilde\theta^*-\theta_0).
\]
Now denote by $G^*_{t,j}(\bar\theta^*)$ the $j$th column of $G^*_t(\bar\theta^*)$; thus
\[
\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}2\lambda'g^*_t(\bar\theta^*)\lambda'G^*_t(\bar\theta^*)(\tilde\theta^*-\theta_0)\Big|
=\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}2\lambda'g^*_t(\bar\theta^*)\sum_{j=1}^p\lambda'G^*_{t,j}(\bar\theta^*)(\tilde\theta^*_j-\theta_{j,0})\Big|
=\Big|\sum_{j=1}^p\frac{S_T}{m_T}\sum_{t=1}^{m_T}2\lambda'g^*_t(\bar\theta^*)\lambda'G^*_{t,j}(\bar\theta^*)(\tilde\theta^*_j-\theta_{j,0})\Big|
\le\sum_{j=1}^p\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}2\lambda'g^*_t(\bar\theta^*)\lambda'G^*_{t,j}(\bar\theta^*)(\tilde\theta^*_j-\theta_{j,0})\Big|
=\sum_{j=1}^pO_B\Big(\frac{1}{\sqrt{T}}\Big)\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}2\lambda'g^*_t(\bar\theta^*)\lambda'G^*_{t,j}(\bar\theta^*)\Big|
\]
by T and the fact that $(\tilde\theta^*_j-\theta_{j,0})=O_B(1/T^{1/2})$. Note that
\[
\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}2\lambda'g^*_t(\bar\theta^*)\lambda'G^*_{t,j}(\bar\theta^*)\Big|
\le\frac{S_T}{m_T}\sum_{t=1}^{m_T}2\big|\lambda'g^*_t(\bar\theta^*)\lambda'G^*_{t,j}(\bar\theta^*)\big|
\le\frac{S_T}{m_T}\sum_{t=1}^{m_T}2\sup_{\theta\in B}\big|\lambda'g^*_t(\theta)\big|\sup_{\theta\in B}\big|\lambda'G^*_{t,j}(\theta)\big|.
\]
Now define $|Y_{tT}|=2\sup_{\theta\in B}|\lambda'g_t(\theta)|$ and $|Z_{tT}|=\sup_{\theta\in N}|\lambda'G_{t,j}(\theta)|$ and apply Lemma A.9 above with $p=2$, $d=\alpha/2$ and $p_{tT}=1/T$, $t=1,\ldots,T$, which shows that
\[
\frac{S_T}{m_T}\sum_{t=1}^{m_T}2\sup_{\theta\in B}\big|\lambda'g^*_t(\theta)\big|\sup_{\theta\in B}\big|\lambda'G^*_{t,j}(\theta)\big|=o_B(T^{1/2}),
\]
and hence the result follows.
Proof of Theorem 4.3: Note that by a Taylor expansion
\[
\sqrt{T/k_2}\,g^*(\hat\theta^{e*})=\sqrt{T/k_2}\,g^*(\hat\theta^{e})+\tilde G^*_T\sqrt{T/k_2}\,(\hat\theta^{e*}-\hat\theta^{e}),
\]
where $\tilde G^*_T\equiv\partial g^*_T(\tilde\theta^*)/\partial\theta'$ and $\tilde\theta^*$ is on a line joining $\hat\theta^{e*}$ and $\hat\theta^{e}$. Note that by Theorem 4.2 with $W^*_T=\tilde\Omega^{-1}$,
\[
\sqrt{T/k_2}\,(\hat\theta^{e*}-\hat\theta^{e})=-[G^{*\prime}_T\tilde\Omega^{-1}\tilde G^*_T]^{-1}G^{*\prime}_T\tilde\Omega^{-1}\sqrt{T/k_2}\,[g^*_T(\theta_0)-g_T(\theta_0)]+o_B(1).
\]
Also by a Taylor expansion
\[
\sqrt{T/k_2}\,\big(g^*(\hat\theta^{e})-g^*(\theta_0)-g(\hat\theta^{e})+g(\theta_0)\big)=(\bar G^*_T-\bar G_T)\sqrt{T/k_2}\,(\hat\theta^{e}-\theta_0)=o_B(1)O_p(1)=o_B(1), \tag{A.15}
\]
where $\bar G^*_T\equiv\partial g^*_T(\bar\theta)/\partial\theta'$ and $\bar G_T\equiv\partial g_T(\bar\theta)/\partial\theta'$, with $\bar\theta$ on a line joining $\hat\theta^{e}$ and $\theta_0$. Thus
\[
\sqrt{\frac{T}{k_2}}\,[g^*(\hat\theta^{e*})-g(\hat\theta^{e})]=\big[I_m-\tilde G^*_T[G^{*\prime}_T\tilde\Omega^{-1}\tilde G^*_T]^{-1}G^{*\prime}_T\tilde\Omega^{-1}\big]\sqrt{T/k_2}\,[g^*_T(\theta_0)-g_T(\theta_0)]+o_B(1).
\]
Now, since $\tilde G^*_T=G+o_B(1)$, $G^*_T=G+o_B(1)$, $\tilde\Omega^{-1}=\Omega^{-1}+o_B(1)$ and, by the bootstrap CLT Theorem A.2, $\sqrt{T/k_2}\,[g^*_T(\theta_0)-g_T(\theta_0)]$ converges to $N(0,\Omega)$, it follows that
\[
\sqrt{\frac{T}{k_2}}\,[g^*(\hat\theta^{e*})-g(\hat\theta^{e})]=\big[I_m-G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1}\big]\sqrt{T/k_2}\,[g^*_T(\theta_0)-g_T(\theta_0)]+o_B(1).
\]
Thus
\[
\mathcal{J}^*=\frac{T}{k_2}\,[g^*_T(\theta_0)-g_T(\theta_0)]'\big[\Omega^{-1}-\Omega^{-1}G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1}\big][g^*_T(\theta_0)-g_T(\theta_0)]+o_B(1).
\]
As
\[
\big[\Omega^{-1}-\Omega^{-1}G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1}\big]\,\Omega\,\big[\Omega^{-1}-\Omega^{-1}G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1}\big]=\big[\Omega^{-1}-\Omega^{-1}G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1}\big]
\]
and $\operatorname{tr}\big(\Omega\big[\Omega^{-1}-\Omega^{-1}G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1}\big]\big)=m-p$, it follows from Rao and Mitra (1972) that $\mathcal{J}^*$ converges in distribution, prob-$P^*$, to $\chi^2(m-p)$. Since $\mathcal{J}\xrightarrow{d}\chi^2(m-p)$, the result stated in the theorem is a consequence of the Pólya Theorem (Serfling, 2002, p. 18), as the chi-squared distribution has a continuous c.d.f.
Proof of Theorem 4.4: We start by deriving the asymptotic distribution of $\mathcal{W}^*$. Define $h^{a*}_t(\theta,\gamma)\equiv(g^*(z_t,\theta)',[q^*(z_t,\theta)-\gamma]')'$, $h^{a*}(\theta,\gamma)=\sum_{t=1}^{m_T}h^{a*}_t(\theta,\gamma)/m_T$ and $\tilde Q^*(\theta,\gamma)=h^{a*}(\theta,\gamma)'\tilde\Omega^{a*-1}h^{a*}(\theta,\gamma)$. Note that the unrestricted bootstrapped GMM estimator solves
\[
(\hat\theta^{e*\prime},\hat\gamma^{*\prime})'=\arg\min_{\theta\in B,\,\gamma\in\Gamma}\tilde Q^*(\theta,\gamma),
\]
where $\Gamma$ is a compact parameter space. The solution is given by
\[
\hat\theta^{e*}=\arg\min_{\theta\in B}\,g^*(\theta)'\tilde\Omega^{-1}g^*(\theta),\qquad
\hat\gamma^*=q^*(\hat\theta^{e*})-\tilde\Omega_{21}\tilde\Omega^{-1}g^*(\hat\theta^{e*}).
\]
We note that by Theorem 4.1 $\hat\theta^{e*}=\hat\theta^{e}+o_B(1)$ and, by Lemma A.6 and $\tilde\Omega^{a*}=\tilde\Omega^{a}+o_B(1)$, we have $\hat\gamma^*=\hat\gamma+o_B(1)$. Since these estimators satisfy the first-order conditions, we have $D^*(\hat\theta^{e*})'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e*},\hat\gamma^*)=0$ with
\[
D^*(\theta)\equiv\begin{pmatrix}\sum_{t=1}^{m_T}G^*_t(\theta)/m_T&0\\ \sum_{t=1}^{m_T}Q^*_t(\theta)/m_T&-I_s\end{pmatrix}.
\]
Thus, by a Taylor expansion around $(\hat\theta^{e\prime},\hat\gamma')'$,
\[
D^*(\hat\theta^{e*})'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e},\hat\gamma)+D^*(\hat\theta^{e*})'\tilde\Omega^{a*-1}D^*(\tilde\theta^*)\begin{pmatrix}\hat\theta^{e*}-\hat\theta^{e}\\ \hat\gamma^*-\hat\gamma\end{pmatrix}=0,
\]
where $\tilde\theta^*$ is on a line joining $\hat\theta^{e*}$ and $\hat\theta^{e}$. Thus
\[
\sqrt{T}\begin{pmatrix}\hat\theta^{e*}-\hat\theta^{e}\\ \hat\gamma^*-\hat\gamma\end{pmatrix}=-[D^{*\prime}\tilde\Omega^{a*-1}\tilde D^*]^{-1}D^{*\prime}\tilde\Omega^{a*-1}\sqrt{T}\,h^{a*}(\hat\theta^{e},\hat\gamma).
\]
Now notice that, as in the proof of Theorem 4.2,
\[
\sqrt{\frac{T}{k_2}}\begin{pmatrix}\hat\theta^{e*}-\hat\theta^{e}\\ \hat\gamma^*-\hat\gamma\end{pmatrix}=-[D'\Omega^{-1}D]^{-1}D'\Omega^{-1}\sqrt{\frac{T}{k_2}}\,[h^{a*}_T(\theta_0,\gamma_0)-h^a_T(\theta_0,\gamma_0)]+o_B(1).
\]
Thus by a Taylor expansion we have
\[
\sqrt{\frac{T}{k_2}}\begin{pmatrix}a(\hat\theta^{e*})-a(\hat\theta^{e})\\ \hat\gamma^*-\hat\gamma\end{pmatrix}=-R(\bar\theta^*)[D'\Omega^{-1}D]^{-1}D'\Omega^{-1}\sqrt{\frac{T}{k_2}}\,[h^{a*}_T(\theta_0,\gamma_0)-h^a_T(\theta_0,\gamma_0)]+o_B(1)
=-R[D'\Omega^{-1}D]^{-1}D'\Omega^{-1}\sqrt{\frac{T}{k_2}}\,[h^{a*}_T(\theta_0,\gamma_0)-h^a_T(\theta_0,\gamma_0)]+o_B(1),
\]
where $\bar\theta^*$ is on a line joining $\hat\theta^{e*}$ and $\hat\theta^{e}$, as $R(\bar\theta^*)=R+o_B(1)$. Thus
\[
\mathcal{W}^*=(T/k_2)\,[\hat r^*-\hat r]'\big[R^*(D^{*\prime}\tilde\Omega^{a*-1}D^*)^{-1}R^{*\prime}\big]^{-1}[\hat r^*-\hat r]
=\frac{T}{k_2}\,[h^{a*}_T(\theta_0,\gamma_0)-h^a_T(\theta_0,\gamma_0)]'\Omega^{-1}D[D'\Omega^{-1}D]^{-1}R'\big[R(D'\Omega^{-1}D)^{-1}R'\big]^{-1}R(D'\Omega^{-1}D)^{-1}D'\Omega^{-1}[h^{a*}_T(\theta_0,\gamma_0)-h^a_T(\theta_0,\gamma_0)]+o_B(1),
\]
since $D^*=D+o_B(1)$ by Lemma A.7, $\tilde\Omega^{a*}=\Omega+o_B(1)$ by Lemma A.6 and $\sqrt{T/k_2}\,[h^{a*}_T(\theta_0,\gamma_0)-h^a_T(\theta_0,\gamma_0)]=O_B(1)$ by the bootstrap CLT. Thus, as in Theorem 2.4 above, $\mathcal{W}^*$ converges to a chi-squared distribution with $s+r$ degrees of freedom.
We now consider the score statistic $\mathcal{S}^*$. We derive the distribution of the bootstrap restricted GMM estimator. Note that the Lagrangian of the restricted problem is
\[
\mathcal{L}^*=\tilde Q^*(\theta,\gamma)-a(\theta)'\tau^*-\gamma'\nu^*.
\]
Denote by $\varphi^*=(\tau^{*\prime},\nu^{*\prime})'$ the vector of Lagrange multipliers evaluated at the optimum. Thus the first-order conditions are
\[
D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e*}_r,0)-R(\hat\theta^{e*}_r)'\varphi^*=0.
\]
Multiplying both sides by $R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}$ we have
\[
R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e*}_r,0)-R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}R(\hat\theta^{e*}_r)'\varphi^*=0. \tag{A.16}
\]
Thus
\[
\varphi^*=\big[R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}R(\hat\theta^{e*}_r)'\big]^{-1}R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e*}_r,0). \tag{A.17}
\]
Hence, replacing (A.17) in the first-order conditions, we have
\[
D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e*}_r,0)-R(\hat\theta^{e*}_r)'\big[R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}R(\hat\theta^{e*}_r)'\big]^{-1}R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e*}_r,0)=0.
\]
But by a Taylor expansion, $h^{a*}(\hat\theta^{e*}_r,0)=h^{a*}(\hat\theta^{e}_r,0)+\tilde D^*(\bar\theta^*)S_1(\hat\theta^{e*}_r-\hat\theta^{e}_r)$, where $\bar\theta^*$ is on a line joining $\hat\theta^{e*}_r$ and $\hat\theta^{e}_r$ and $S_1$ is a selection matrix such that
\[
\tilde D^*(\bar\theta^*)S_1=\begin{pmatrix}\sum_{t=1}^{m_T}G^*_t(\bar\theta^*)/m_T\\ \sum_{t=1}^{m_T}Q^*_t(\bar\theta^*)/m_T\end{pmatrix}.
\]
Thus we have
\[
\big[I-R(\hat\theta^{e*}_r)'\big[R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}R(\hat\theta^{e*}_r)'\big]^{-1}R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}\big]
\big[D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}\sqrt{T}\,h^{a*}(\hat\theta^{e}_r,0)+D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}\tilde D^*(\bar\theta^*)S_1\sqrt{T}\,(\hat\theta^{e*}_r-\hat\theta^{e}_r)\big]=0,
\]
and consequently
\[
S_1\sqrt{\frac{T}{k_2}}\,(\hat\theta^{e*}_r-\hat\theta^{e}_r)=-[D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}\tilde D^*(\bar\theta^*)]^{-1}
\big[I-R(\hat\theta^{e*}_r)'\big[R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}R(\hat\theta^{e*}_r)'\big]^{-1}R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}\big]
D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}\sqrt{\frac{T}{k_2}}\,h^{a*}(\hat\theta^{e}_r,0).
\]
Now, as in (A.15) above, we have
\[
\sqrt{\frac{T}{k_2}}\,\big(h^{a*}(\hat\theta^{e}_r,0)-h^{a*}(\theta_0,0)-h^a(\hat\theta^{e}_r,0)+h^a(\theta_0,0)\big)=o_B(1). \tag{A.18}
\]
Therefore we have
\[
S_1\sqrt{\frac{T}{k_2}}\,(\hat\theta^{e*}_r-\hat\theta^{e}_r)=-[D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}\tilde D^*(\bar\theta^*)]^{-1}
\big[I-R(\hat\theta^{e*}_r)'\big[R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}R(\hat\theta^{e*}_r)'\big]^{-1}R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}\big]
D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}\sqrt{T/k_2}\,\big(h^{a*}(\theta_0,0)-h^a(\theta_0,0)\big)+A^*_T,
\]
where
\[
A^*_T=-[D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}\tilde D^*(\bar\theta^*)]^{-1}
\big[I_p-R(\hat\theta^{e*}_r)'\big[R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}R(\hat\theta^{e*}_r)'\big]^{-1}R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}\big]
D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}\sqrt{T/k_2}\,h^a(\hat\theta^{e}_r,0).
\]
We show now that $A^*_T=o_B(1)$. By the first-order conditions of the original restricted problem we have
\[
D'\tilde\Omega^{a-1}h^a(\hat\theta^{e}_r,0)-R(\hat\theta^{e}_r)'\big[R(\hat\theta^{e}_r)(D'\tilde\Omega^{a-1}D)^{-1}R(\hat\theta^{e}_r)'\big]^{-1}R(\hat\theta^{e}_r)(D'\tilde\Omega^{a-1}D)^{-1}D'\tilde\Omega^{a-1}h^a(\hat\theta^{e}_r,0)=0.
\]
Hence, subtracting this zero quantity inside $A^*_T$,
\[
A^*_T=-[D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}\tilde D^*(\bar\theta^*)]^{-1}\Big\{\big[I_p-R(\hat\theta^{e*}_r)'\big[R(\hat\theta^{e*}_r)(D^{*\prime}\tilde\Omega^{a*-1}D^*)^{-1}R(\hat\theta^{e*}_r)'\big]^{-1}R(\hat\theta^{e*}_r)(D^{*\prime}\tilde\Omega^{a*-1}D^*)^{-1}\big]D^{*\prime}\tilde\Omega^{a*-1}
-\big[I_p-R(\hat\theta^{e}_r)'\big[R(\hat\theta^{e}_r)(D'\tilde\Omega^{a-1}D)^{-1}R(\hat\theta^{e}_r)'\big]^{-1}R(\hat\theta^{e}_r)(D'\tilde\Omega^{a-1}D)^{-1}\big]D'\tilde\Omega^{a-1}\Big\}\sqrt{T/k_2}\,h^a(\hat\theta^{e}_r,0)=o_B(1)
\]
by the bootstrap local UWL and $\sqrt{T/k_2}\,h^a(\hat\theta^{e}_r,0)=O_p(1)$, which can be proven using a Taylor expansion and the fact that $\sqrt{T}\,(\hat\theta^{e}_r-\theta_0)=O_p(1)$. It follows that
\[
S_1\sqrt{\frac{T}{k_2}}\,(\hat\theta^{e*}_r-\hat\theta^{e}_r)=-[D'\Omega^{-1}D]^{-1}\big[I-R'[R(D'\Omega^{-1}D)^{-1}R']^{-1}R(D'\Omega^{-1}D)^{-1}\big]D'\Omega^{-1}\sqrt{T/k_2}\,\big(h^{a*}(\theta_0,0)-h^a(\theta_0,0)\big)+o_B(1).
\]
Consider now the bootstrapped score statistic
\[
\mathcal{S}^*=\frac{T}{k_2}\,\big[h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)\big]'\tilde\Omega^{a*-1}D^*(D^{*\prime}\tilde\Omega^{a*-1}D^*)^{-1}D^{*\prime}\tilde\Omega^{a*-1}\big[h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)\big].
\]
Note that by a Taylor expansion of $h^{a*}(\hat\theta^{e*}_r,0)$ around $\hat\theta^{e}_r$ we have
\[
D^{*\prime}\tilde\Omega^{a*-1}\sqrt{\frac{T}{k_2}}\,\big(h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)\big)=D^{*\prime}\tilde\Omega^{a*-1}\tilde D^*(\bar\theta_r)S_1\sqrt{\frac{T}{k_2}}\,(\hat\theta^{e*}_r-\hat\theta^{e}_r)+D^{*\prime}\tilde\Omega^{a*-1}\sqrt{\frac{T}{k_2}}\,\big(h^{a*}(\hat\theta^{e}_r,0)-h^a(\hat\theta^{e}_r,0)\big)
\]
\[
=-D^{*\prime}\tilde\Omega^{a*-1}\tilde D^*(\bar\theta_r)[D'\Omega^{-1}D]^{-1}\big[I-R'[R(D'\Omega^{-1}D)^{-1}R']^{-1}R(D'\Omega^{-1}D)^{-1}\big]D'\Omega^{-1}\sqrt{T/k_2}\,\big(h^{a*}(\theta_0,0)-h^a(\theta_0,0)\big)+D^{*\prime}\tilde\Omega^{a*-1}\sqrt{T/k_2}\,\big(h^{a*}(\theta_0,0)-h^a(\theta_0,0)\big)+o_B(1)
\]
\[
=R'[R(D'\Omega^{-1}D)^{-1}R']^{-1}R(D'\Omega^{-1}D)^{-1}D'\Omega^{-1}\sqrt{T/k_2}\,\big(h^{a*}(\theta_0,0)-h^a(\theta_0,0)\big)+o_B(1)
\]
by (A.18), the local bootstrap UWL and the bootstrap CLT. Thus
\[
\mathcal{S}^*=\frac{T}{k_2}\,\big(h^{a*}(\theta_0,0)-h^a(\theta_0,0)\big)'\Omega^{-1}D(D'\Omega^{-1}D)^{-1}R'[R(D'\Omega^{-1}D)^{-1}R']^{-1}R(D'\Omega^{-1}D)^{-1}D'\Omega^{-1}\big(h^{a*}(\theta_0,0)-h^a(\theta_0,0)\big)+o_B(1)=\mathcal{W}^*+o_B(1),
\]
and the result follows. Now we consider the distance statistic
\[
\mathcal{D}^*=\frac{T}{k_2}\Big\{\big[h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)\big]'\tilde\Omega^{a*-1}\big[h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)\big]-\big[g^*(\hat\theta^{e*})-g(\hat\theta^{e})\big]'\tilde\Omega^{-1}\big[g^*(\hat\theta^{e*})-g(\hat\theta^{e})\big]\Big\}
\]
\[
=\frac{T}{k_2}\Big\{\big[h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)\big]'\tilde\Omega^{a*-1}\big[h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)\big]-\big[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)\big]'\tilde\Omega^{a*-1}\big[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)\big]\Big\}+o_B(1),
\]
as
\[
g^*(\hat\theta^{e*})'\tilde\Omega^{-1}g^*(\hat\theta^{e*})=h^{a*}(\hat\theta^{e*},\hat\gamma^*)'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e*},\hat\gamma^*)
\]
and
\[
T\,g(\hat\theta^{e})'\tilde\Omega^{-1}g(\hat\theta^{e})=T\,h^a(\hat\theta^{e},\hat\gamma)'\tilde\Omega^{a-1}h^a(\hat\theta^{e},\hat\gamma)
=T\,h^a(\hat\theta^{e},\hat\gamma)'\tilde\Omega^{a*-1}h^a(\hat\theta^{e},\hat\gamma)+T\,h^a(\hat\theta^{e},\hat\gamma)'\big[\tilde\Omega^{a-1}-\tilde\Omega^{a*-1}\big]h^a(\hat\theta^{e},\hat\gamma)
=T\,h^a(\hat\theta^{e},\hat\gamma)'\tilde\Omega^{a*-1}h^a(\hat\theta^{e},\hat\gamma)+o_B(1),
\]
since $\sqrt{T}\,h^a(\hat\theta^{e},\hat\gamma)=O_p(1)$ and $\tilde\Omega^{a-1}-\tilde\Omega^{a*-1}=o_B(1)$.
Now note that by two first-order Taylor expansions we have
\[
h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)=h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)+D^*(\bar\theta^*)\begin{pmatrix}\hat\theta^{e*}_r-\hat\theta^{e*}\\-\hat\gamma^*\end{pmatrix}-D(\bar\theta)\begin{pmatrix}\hat\theta^{e}_r-\hat\theta^{e}\\-\hat\gamma\end{pmatrix},
\]
where $\bar\theta^*$ is on a line joining $\hat\theta^{e*}_r$ and $\hat\theta^{e*}$ and $\bar\theta$ is on a line joining $\hat\theta^{e}_r$ and $\hat\theta^{e}$. Thus
\[
\frac{T}{k_2}\,[h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)]'\tilde\Omega^{a*-1}[h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)]
=\frac{T}{k_2}\,[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)]'\tilde\Omega^{a*-1}[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)]
+\frac{2T}{k_2}\Big[D^*(\bar\theta^*)\begin{pmatrix}\hat\theta^{e*}_r-\hat\theta^{e*}\\-\hat\gamma^*\end{pmatrix}-D(\bar\theta)\begin{pmatrix}\hat\theta^{e}_r-\hat\theta^{e}\\-\hat\gamma\end{pmatrix}\Big]'\tilde\Omega^{a*-1}\big[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)\big]
+\frac{T}{k_2}\Big[D^*(\bar\theta^*)\begin{pmatrix}\hat\theta^{e*}_r-\hat\theta^{e*}\\-\hat\gamma^*\end{pmatrix}-D(\bar\theta)\begin{pmatrix}\hat\theta^{e}_r-\hat\theta^{e}\\-\hat\gamma\end{pmatrix}\Big]'\tilde\Omega^{a*-1}\Big[D^*(\bar\theta^*)\begin{pmatrix}\hat\theta^{e*}_r-\hat\theta^{e*}\\-\hat\gamma^*\end{pmatrix}-D(\bar\theta)\begin{pmatrix}\hat\theta^{e}_r-\hat\theta^{e}\\-\hat\gamma\end{pmatrix}\Big].
\]
Note that $D^*(\hat\theta^{e*})'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e*},\hat\gamma^*)=0$ and $D'\tilde\Omega^{a-1}h^a(\hat\theta^{e},\hat\gamma)=0$. Thus
\[
\sqrt{\frac{T}{k_2}}\,D^*(\bar\theta^*)'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e*},\hat\gamma^*)=\big(D^*(\bar\theta^*)'\tilde\Omega^{a*-1}-D^*(\hat\theta^{e*})'\tilde\Omega^{a*-1}\big)\sqrt{\frac{T}{k_2}}\,h^{a*}(\hat\theta^{e*},\hat\gamma^*)=o_B(1),
\]
\[
\sqrt{\frac{T}{k_2}}\,D^*(\bar\theta^*)'\tilde\Omega^{a*-1}h^a(\hat\theta^{e},\hat\gamma)=\big(D^*(\bar\theta^*)'\tilde\Omega^{a*-1}-D'\tilde\Omega^{a-1}\big)\sqrt{\frac{T}{k_2}}\,h^a(\hat\theta^{e},\hat\gamma)=o_B(1)
\]
by the bootstrap UWL, the standard UWL, $\sqrt{T/k_2}\,h^{a*}(\hat\theta^{e*},\hat\gamma^*)=O_B(1)$ and $\sqrt{T}\,h^a(\hat\theta^{e},\hat\gamma)=O_p(1)$. Thus
\[
\sqrt{\frac{T}{k_2}}\,D^*(\bar\theta^*)'\tilde\Omega^{a*-1}\big[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)\big]=o_B(1),
\]
and similarly
\[
\sqrt{\frac{T}{k_2}}\,D(\bar\theta)'\tilde\Omega^{a*-1}\big[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)\big]=o_B(1),
\]
since $D^*(\hat\theta^{e*})'\tilde\Omega^{a*-1}[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)]=o_B(1/\sqrt{T})$ and $D'\tilde\Omega^{a*-1}[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)]=o_B(1/\sqrt{T})$ by the first-order conditions, the bootstrap UWL and the standard UWL. Also
\[
\sqrt{\frac{T}{k_2}}\Big[D^*(\bar\theta^*)\begin{pmatrix}\hat\theta^{e*}_r-\hat\theta^{e*}\\-\hat\gamma^*\end{pmatrix}-D(\bar\theta)\begin{pmatrix}\hat\theta^{e}_r-\hat\theta^{e}\\-\hat\gamma\end{pmatrix}\Big]=O_B(1),
\]
as $\sqrt{T}\,(\hat\theta^{e*}_r-\hat\theta^{e}_r)=O_B(1)$, $\sqrt{T}\,(\hat\theta^{e*}-\hat\theta^{e})=O_B(1)$, $\sqrt{T}\,(\hat\theta^{e}-\theta_0)=O_p(1)$ and $\sqrt{T}\,(\hat\theta^{e}_r-\theta_0)=O_p(1)$. Thus
\[
\frac{2T}{k_2}\Big[D^*(\bar\theta^*)\begin{pmatrix}\hat\theta^{e*}_r-\hat\theta^{e*}\\-\hat\gamma^*\end{pmatrix}-D(\bar\theta)\begin{pmatrix}\hat\theta^{e}_r-\hat\theta^{e}\\-\hat\gamma\end{pmatrix}\Big]'\tilde\Omega^{a*-1}\big[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)\big]=o_B(1).
\]
Now notice that $D^*(\bar\theta^*)=D+o_B(1)$ and $D(\bar\theta)=D+o_p(1)$; thus
\[
\sqrt{\frac{T}{k_2}}\Big(D^*(\bar\theta^*)\begin{pmatrix}\hat\theta^{e*}_r-\hat\theta^{e*}\\-\hat\gamma^*\end{pmatrix}-D(\bar\theta)\begin{pmatrix}\hat\theta^{e}_r-\hat\theta^{e}\\-\hat\gamma\end{pmatrix}\Big)=\sqrt{\frac{T}{k_2}}\,D\begin{pmatrix}\hat\theta^{e*}_r-\hat\theta^{e}_r-(\hat\theta^{e*}-\hat\theta^{e})\\-(\hat\gamma^*-\hat\gamma)\end{pmatrix}+o_B(1),
\]
and consequently
\[
\sqrt{\tfrac{T}{k_{2}}}\,D\Big[\begin{pmatrix}\hat\theta^{e*}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}-\begin{pmatrix}\hat\theta^{e*}-\hat\theta^{e}\\\hat\nu^{*}-\hat\nu\end{pmatrix}\Big]=\sqrt{\tfrac{T}{k_{2}}}\,D[D'\Sigma^{-1}D]^{-1}R[R'(D'\Sigma^{-1}D)^{-1}R]^{-1}R'(D'\Sigma^{-1}D)^{-1}D'\Sigma^{-1}\sqrt{T/k_{2}}\big(h^{a*}(\theta_{0},0)-h^{a}(\theta_{0},0)\big)+o_{B}(1).
\]
Thus
\[
\begin{aligned}
&\frac{T}{k_{2}}\Big[D^{*}(\bar\theta^{*})\begin{pmatrix}\hat\theta^{e*}_{r}-\hat\theta^{e*}\\-\hat\nu^{*}\end{pmatrix}-D(\bar\theta)\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix}\Big]'\Sigma^{*-1}\Big[D^{*}(\bar\theta^{*})\begin{pmatrix}\hat\theta^{e*}_{r}-\hat\theta^{e*}\\-\hat\nu^{*}\end{pmatrix}-D(\bar\theta)\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix}\Big]\\
&\quad=\sqrt{T/k_{2}}\big(h^{a*}(\theta_{0},0)-h^{a}(\theta_{0},0)\big)'\Sigma^{-1}D(D'\Sigma^{-1}D)^{-1}R[R'(D'\Sigma^{-1}D)^{-1}R]^{-1}R'(D'\Sigma^{-1}D)^{-1}D'\Sigma^{-1}\sqrt{T/k_{2}}\big(h^{a*}(\theta_{0},0)-h^{a}(\theta_{0},0)\big)+o_{B}(1)\\
&\quad=\mathcal{W}^{*}+o_{B}(1).
\end{aligned}
\]
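The chi-squared limits obtained in these arguments all rest on the same linear-algebra fact: after symmetrisation by \(\Sigma^{1/2}\), the matrix sandwiched between the centred moment vectors is idempotent with trace equal to the number of restrictions. The following small numerical check illustrates this, with randomly generated matrices standing in for \(D\), \(\Sigma\) and \(R\) (all sizes and values are illustrative assumptions, not objects from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, r = 6, 4, 2                          # moments, parameters, restrictions (illustrative)

D = rng.standard_normal((m, p))            # stand-in for the Jacobian of the moment indicators
A0 = rng.standard_normal((m, m))
Sigma = A0 @ A0.T + m * np.eye(m)          # a positive-definite long-run variance
R = rng.standard_normal((p, r))            # stand-in for the Jacobian of the restrictions

Sinv = np.linalg.inv(Sigma)
Delta = np.linalg.inv(D.T @ Sinv @ D)      # (D' Sigma^-1 D)^-1

# Symmetrised version of Sigma^-1 D Delta R [R' Delta R]^-1 R' Delta D' Sigma^-1,
# the sandwich appearing in the limiting quadratic form of the Wald/distance statistics.
L = np.linalg.cholesky(Sigma)
M = np.linalg.solve(L, D)                  # Sigma^{-1/2} D, so M'M = D' Sigma^-1 D
A = M @ Delta @ R @ np.linalg.inv(R.T @ Delta @ R) @ R.T @ Delta @ M.T

print(np.allclose(A @ A, A))               # idempotent
print(np.isclose(np.trace(A), r))          # rank r, hence a chi-square(r) quadratic form
```

Because the sandwich is an idempotent matrix of rank \(r\), a normal vector with variance \(\Sigma\) plugged into the quadratic form yields a \(\chi^{2}(r)\) limit.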
A.5 Proofs of the results in section 4.2
Proof of Theorem 4.7: We need only show that the regularity conditions of Lemma A.2 of Gonçalves and White (2004) are satisfied. Condition (a1) is satisfied as \(g(\cdot,\theta)\) is measurable and continuous functions of measurable functions are measurable. Since \(g(z_{t},\theta)\) is continuous on \(B\), the objective function \(g(\theta)'W_{T}g(\theta)\) is continuous. Also note that by the triangle inequality
\[
\begin{aligned}
\sup_{\theta\in B}\big|g^{\star}_{T}(\theta)'W^{\star}_{T}g^{\star}_{T}(\theta)-E^{\star}[g^{\star}_{T}(\theta)]'W_{T}E^{\star}[g^{\star}_{T}(\theta)]\big|
&\le\sup_{\theta\in B}\big|g^{\star}_{T}(\theta)'W^{\star}_{T}g^{\star}_{T}(\theta)-E^{\star}[g^{\star}_{T}(\theta)]'W^{\star}_{T}E^{\star}[g^{\star}_{T}(\theta)]\big|\\
&\quad+\sup_{\theta\in B}\big|E^{\star}[g^{\star}_{T}(\theta)]'W^{\star}_{T}E^{\star}[g^{\star}_{T}(\theta)]-E^{\star}[g^{\star}_{T}(\theta)]'W_{T}E^{\star}[g^{\star}_{T}(\theta)]\big|.
\end{aligned}
\]
Now by the triangle and Cauchy-Schwarz inequalities
\[
\begin{aligned}
\sup_{\theta\in B}\big|g^{\star}_{T}(\theta)'W^{\star}_{T}g^{\star}_{T}(\theta)-E^{\star}[g^{\star}_{T}(\theta)]'W^{\star}_{T}E^{\star}[g^{\star}_{T}(\theta)]\big|
&\le\sup_{\theta\in B}\|g^{\star}_{T}(\theta)-E^{\star}[g^{\star}_{T}(\theta)]\|^{2}\,\|W^{\star}_{T}\|\\
&\quad+2\sup_{\theta\in B}\|g^{\star}_{T}(\theta)-E^{\star}[g^{\star}_{T}(\theta)]\|\,\|W^{\star}_{T}\|\,\sup_{\theta\in B}\|E^{\star}[g^{\star}_{T}(\theta)]\|.
\end{aligned}
\]
Now for \(p_{tT}=\pi_{t}\), note that by Lemma A.1 Assumption 3.3(a) is satisfied. Hence the bootstrap UWL and the local UWL given by Lemmata A.6 and A.7 can be applied, and therefore \(\sup_{\theta\in B}\|g^{\star}_{T}(\theta)-E^{\star}[g^{\star}_{T}(\theta)]\|=o_{B}(1)\), \(W^{\star}_{T}-W_{T}=o_{B}(1)\) and \(W_{T}=O_{p}(1)\). Moreover,
\[
E^{\star}[g^{\star}_{T}(\theta)]=\sum_{t=1}^{T}g_{tT}(\theta)\pi_{t}=(1+o_{p}(1))\frac{1}{T}\sum_{t=1}^{T}g_{tT}(\theta)=O_{p}(1),
\]
by Lemma A1 of Smith (2011). Thus
\[
\sup_{\theta\in B}\big|g^{\star}_{T}(\theta)'W^{\star}_{T}g^{\star}_{T}(\theta)-E^{\star}[g^{\star}_{T}(\theta)]'W^{\star}_{T}E^{\star}[g^{\star}_{T}(\theta)]\big|=o_{B}(1).
\]
Now
\[
\begin{aligned}
\sup_{\theta\in B}\big|E^{\star}[g^{\star}_{T}(\theta)]'W^{\star}_{T}E^{\star}[g^{\star}_{T}(\theta)]-E^{\star}[g^{\star}_{T}(\theta)]'W_{T}E^{\star}[g^{\star}_{T}(\theta)]\big|
&=\sup_{\theta\in B}\big|E^{\star}[g^{\star}_{T}(\theta)]'[W^{\star}_{T}-W_{T}]E^{\star}[g^{\star}_{T}(\theta)]\big|\\
&\le\sup_{\theta\in B}\|E^{\star}[g^{\star}_{T}(\theta)]\|^{2}\,\|W^{\star}_{T}-W_{T}\|=O_{p}(1)o_{p}(1).
\end{aligned}
\]
Uniqueness was proven in Lemma 2.3 of Newey and McFadden (1994).
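The object \(E^{\star}[g^{\star}_{T}(\theta)]=\sum_{t}g_{tT}(\theta)\pi_{t}\) manipulated in this proof is the bootstrap expectation of the moment indicator when observations are drawn with the implied probabilities \(\pi_{t}\). A minimal sketch of this recentring, using hypothetical exponentially tilted weights in place of the GEL implied probabilities and iid draws rather than kernel blocks (purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000
z = rng.standard_normal(T) + 0.5           # illustrative data with mean 0.5

def g(z, theta):                           # moment indicator g(z, theta) = z - theta
    return z - theta

# Hypothetical "implied" probabilities pi_t (an exponential tilt; any probabilities
# summing to one would do for this illustration).
w = np.exp(-0.1 * g(z, 0.5))
pi = w / w.sum()

# Drawing observations with probabilities pi_t and averaging the moment reproduces
# the weighted sample mean E*[g*_T(theta)] = sum_t pi_t g_t(theta).
theta, B = 0.5, 500
idx = rng.choice(T, size=(B, T), p=pi)
boot_means = g(z[idx], theta).mean(axis=1)

exact = np.sum(pi * g(z, theta))
print(abs(boot_means.mean() - exact) < 0.01)
```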
Proof of Theorem 4.8: Note that by Hansen (1982) we have
\[
\sqrt{T}(\hat\theta-\theta_{0})\stackrel{d}{\to}N\big(0,(G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}\big);
\]
hence, since the normal distribution is continuous, we have for \(\Delta=(G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}\)
\[
\sup_{x\in\mathbb{R}^{p}}\big|P\{\Delta^{-1/2}T^{1/2}(\hat\theta-\theta_{0})\le x\}-\Phi(x)\big|\to 0
\]
by Polya's Theorem. We now prove that
\[
\lim_{T\to\infty}P\Big(\sup_{x\in\mathbb{R}^{p}}\big|P^{\star}\{\Delta^{-1/2}\sqrt{T/k_{2}}(\hat\theta^{\star}-\tilde\theta)\le x\}-\Phi(x)\big|\ge\varepsilon\Big)=0.
\]
Let \(G^{\star}_{T}\equiv\partial g^{\star}_{T}(\hat\theta^{\star})/\partial\theta'\). To prove asymptotic normality notice that by the first-order conditions \(\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}g^{\star}_{T}(\hat\theta^{\star})=0\). Hence a first-order Taylor expansion around \(\tilde\theta\) yields
\[
\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}g^{\star}_{T}(\tilde\theta)+G^{\star\prime}_{T}W^{\star}_{T}\bar G^{\star}_{T}\sqrt{T/k_{2}}(\hat\theta^{\star}-\tilde\theta)=0,
\]
where \(\bar G^{\star}_{T}=\partial g^{\star}_{T}(\bar\theta^{\star})/\partial\theta'\) and \(\bar\theta^{\star}\) lies on the line joining \(\tilde\theta\) and \(\hat\theta^{\star}\). Solving for \(\sqrt{T/k_{2}}(\hat\theta^{\star}-\tilde\theta)\) we obtain
\[
\sqrt{T/k_{2}}(\hat\theta^{\star}-\tilde\theta)=-[G^{\star\prime}_{T}W^{\star}_{T}\bar G^{\star}_{T}]^{-1}\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}g^{\star}_{T}(\tilde\theta).
\]
Now notice that by a Taylor expansion
\[
\begin{aligned}
\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}g^{\star}_{T}(\tilde\theta)&=\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}g^{\star}_{T}(\theta_{0})+\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}\bar G^{\star}_{T}(\tilde\theta-\theta_{0})\\
&=\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}\big[g^{\star}_{T}(\theta_{0})-\tilde g_{T}(\theta_{0})\big]+\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}\tilde g_{T}(\theta_{0})+\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}\bar G^{\star}_{T}(\tilde\theta-\theta_{0}),
\end{aligned}
\]
where \(\bar G^{\star}_{T}\equiv\partial g^{\star}_{T}(\bar\theta)/\partial\theta'\), \(\tilde g_{T}(\theta_{0})=\sum_{t=1}^{T}g_{t,T}(\theta_{0})\pi_{t}\), and \(\bar\theta\) lies on the line joining \(\tilde\theta\) and \(\theta_{0}\). Now note that by (A.2) we have
\[
\sqrt{T}(\tilde\theta-\theta_{0})=-(G'_{T}W_{T}\bar G_{T})^{-1}\sqrt{T}\,G'_{T}W_{T}\tilde g_{T}(\theta_{0}).
\]
Thus
\[
\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}\tilde g_{T}(\theta_{0})+\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}\bar G^{\star}_{T}(\tilde\theta-\theta_{0})=\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}\tilde g_{T}(\theta_{0})-\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}\bar G^{\star}_{T}(G'_{T}W_{T}\bar G_{T})^{-1}G'_{T}W_{T}\tilde g_{T}(\theta_{0})=o_{B}(1),
\]
since \(\sqrt{T}\,\tilde g_{T}(\theta_{0})=O_{p}(1)\), \(W_{T}=W+o_{p}(1)\), \(G_{T}=G+o_{p}(1)\), \(G^{\star}_{T}=G+o_{B}(1)\) and \(W^{\star}_{T}=W_{T}+o_{B}(1)\).

Now \(-[G^{\star\prime}_{T}W^{\star}_{T}\bar G^{\star}_{T}]^{-1}\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}[g^{\star}_{T}(\theta_{0})-\tilde g_{T}(\theta_{0})]\) converges to \(N\big(0,(G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}\big)\) by the bootstrap CLT, Theorem A.2, and the facts that \(G^{\star}_{T}-G=o_{B}(1)\) and \(W^{\star}_{T}=W+o_{B}(1)\). The result follows as \(\sqrt{T/k_{2}}(\hat\theta^{\star}-\tilde\theta)\) converges uniformly to the same asymptotic distribution as \(T^{1/2}(\hat\theta-\theta_{0})\). We note that \(\tilde\theta\) can be replaced by \(\hat\theta^{e}\) because \(\sqrt{T}(\tilde\theta-\theta_{0})-\sqrt{T}(\hat\theta^{e}-\theta_{0})=o_{p}(1)\).
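The proof linearises the bootstrap first-order conditions around \(\tilde\theta\) and solves the resulting linear equation. For a moment indicator that is linear in \(\theta\) that Taylor step is exact, so a single Newton step from any starting value reproduces the GMM estimator. A minimal sketch under that assumption (scalar parameter, scalar weight \(W=1\); the data-generating process is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 400
x = rng.standard_normal(T)
y = 1.5 * x + rng.standard_normal(T)

# Linear moment g_t(theta) = x_t (y_t - x_t theta); gbar is its sample mean.
def gbar(theta):
    return np.mean(x * (y - x * theta))

G = -np.mean(x * x)                        # derivative of gbar with respect to theta
theta0 = 0.0                               # arbitrary starting value
theta_one_step = theta0 - gbar(theta0) / G # one step of (G'WG)^{-1} G'W gbar, W = 1

theta_exact = np.sum(x * y) / np.sum(x * x)
print(np.isclose(theta_one_step, theta_exact))
```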
Proof of Lemma 4.2: The proof of this lemma is identical to the proof of Lemma 4.1 with \(p_{tT}=\pi_{t}\), and uses the fact that \(T\pi_{t}=1+o_{p}(1)\) by Lemma A.1.
Proof of Theorem 4.9: Note that by a Taylor expansion
\[
\sqrt{T/k_{2}}\,g^{\star}(\hat\theta^{e\star})=\sqrt{T/k_{2}}\,g^{\star}(\tilde\theta)+\bar G^{\star}_{T}\sqrt{T/k_{2}}(\hat\theta^{e\star}-\tilde\theta),
\]
where \(\bar G^{\star}_{T}\equiv\partial g^{\star}_{T}(\bar\theta)/\partial\theta'\) and \(\bar\theta\) lies on the line joining \(\hat\theta^{e\star}\) and \(\tilde\theta\). Note that by Theorem 4.8 with \(W^{\star}_{T}=\tilde\Omega^{\star-1}\),
\[
\sqrt{T/k_{2}}(\hat\theta^{e\star}-\tilde\theta)=-[G^{\star\prime}_{T}\tilde\Omega^{\star-1}\bar G^{\star}_{T}]^{-1}G^{\star\prime}_{T}\tilde\Omega^{\star-1}\sqrt{T/k_{2}}\big[g^{\star}_{T}(\theta_{0})-\tilde g_{T}(\theta_{0})\big]+o_{B}(1).
\]
Also by a Taylor expansion
\[
\sqrt{T/k_{2}}\big(g^{\star}(\tilde\theta)-g^{\star}(\theta_{0})-\tilde g_{T}(\tilde\theta)+\tilde g_{T}(\theta_{0})\big)=(\bar G^{\star}_{T}-\bar G_{T})\sqrt{T/k_{2}}\,(\tilde\theta-\theta_{0}),
\]
where \(\bar G^{\star}_{T}=\partial g^{\star}_{T}(\bar\theta)/\partial\theta'\) with \(\bar\theta\) on the line joining \(\tilde\theta\) and \(\theta_{0}\), and \(\bar G_{T}=\partial\tilde g_{T}(\bar{\bar\theta})/\partial\theta'\) with \(\bar{\bar\theta}\) on the line joining \(\tilde\theta\) and \(\theta_{0}\). Now \(\bar G^{\star}_{T}=G+o_{B}(1)\) by Lemma A.7, and \(\bar G_{T}=G+o_{p}(1)\) by Lemma A.1 of Smith (2011) and the fact that \(T\pi_{t}=1+o_{p}(1)\) by Lemma A.1. Also, by Theorem 4.8, \(\sqrt{T}\big(\tilde\theta-\theta_{0}\big)=O_{p}(1)\).

We now show that \(\sqrt{T/k_{2}}\,\tilde g(\tilde\theta)=o_{p}(1)\). Note that by a Taylor expansion
\[
\sqrt{T}\,\tilde g(\tilde\theta)=\sqrt{T}\,\tilde g(\theta_{0})+\bar G_{T}\sqrt{T}(\tilde\theta-\theta_{0}),
\]
where \(\bar G_{T}=\partial\tilde g_{T}(\bar\theta)/\partial\theta'\) with \(\bar\theta\) on the line joining \(\tilde\theta\) and \(\theta_{0}\); \(\bar G_{T}=G+o_{p}(1)\) by Lemma A.1 of Smith (2011) and the fact that \(T\pi_{t}=1+o_{p}(1)\) by Lemma A.1. Thus by Theorem 4.8 we have
\[
\bar G_{T}\sqrt{T}(\tilde\theta-\theta_{0})=-G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1}T^{1/2}g(\theta_{0})+o_{p}(1).
\]
Now by Lemma A.2 we have
\[
\sqrt{T}\,\tilde g(\theta_{0})=G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1}T^{1/2}g(\theta_{0})+o_{p}(1).
\]
Hence \(\sqrt{T}\,\tilde g(\tilde\theta)=o_{p}(1)\). Thus
\[
\begin{aligned}
\sqrt{\tfrac{T}{k_{2}}}\,g^{\star}(\hat\theta^{e\star})&=\sqrt{T/k_{2}}\big(g^{\star}(\theta_{0})-\tilde g(\theta_{0})\big)-\bar G^{\star}[G^{\star\prime}_{T}\tilde\Omega^{\star-1}\bar G^{\star}_{T}]^{-1}G^{\star\prime}_{T}\tilde\Omega^{\star-1}\sqrt{T/k_{2}}\big[g^{\star}_{T}(\theta_{0})-\tilde g_{T}(\theta_{0})\big]+o_{B}(1)\\
&=\big[I_{m}-\bar G^{\star}[G^{\star\prime}_{T}\tilde\Omega^{\star-1}\bar G^{\star}_{T}]^{-1}G^{\star\prime}_{T}\tilde\Omega^{\star-1}\big]\sqrt{T/k_{2}}\big[g^{\star}_{T}(\theta_{0})-\tilde g_{T}(\theta_{0})\big]+o_{B}(1).
\end{aligned}
\]
Now since \(\bar G^{\star}=G+o_{B}(1)\), \(G^{\star}_{T}=G+o_{B}(1)\), \(\tilde\Omega^{\star-1}=\Omega^{-1}+o_{B}(1)\), and, by the bootstrap CLT Theorem A.2, \(\sqrt{T/k_{2}}[g^{\star}_{T}(\theta_{0})-\tilde g_{T}(\theta_{0})]\) converges to \(N(0,\Omega)\), it follows as in Theorem 4.3 that \(\mathcal{J}^{\star}=T\,g^{\star}(\hat\theta^{e\star})'\tilde\Omega^{\star-1}g^{\star}(\hat\theta^{e\star})/k_{2}\) converges
to \(\chi^{2}(m-p)\). Since \(\mathcal{J}\stackrel{d}{\to}\chi^{2}(m-p)\) the result follows by Polya's Theorem (Serfling, 2002, p. 18), as the chi-squared distribution has a continuous c.d.f.
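A sketch of the recentred bootstrap \(\mathcal{J}\) statistic whose chi-squared limit is established above, in the degenerate iid special case with \(k_{2}=1\), no estimated parameter (so \(m-p=m=2\)) and naive multinomial resampling rather than kernel blocks; this is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(3)
T, B = 300, 500
z = rng.standard_normal(T)
g = np.column_stack([z, z**2 - 1.0])       # two moment indicators, no parameter, m = 2

gbar = g.mean(axis=0)
Om = np.cov(g.T)

# Recentred bootstrap J statistic: J* = T (gbar* - gbar)' Om^{-1} (gbar* - gbar),
# approximately chi-square(2) under iid resampling.
Js = np.empty(B)
for b in range(B):
    gb = g[rng.integers(0, T, T)].mean(axis=0) - gbar
    Js[b] = T * gb @ np.linalg.solve(Om, gb)

print(abs(Js.mean() - 2.0) < 0.5)          # chi-square(2) has mean 2
```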
Proof of Theorem 4.10: We start by deriving the asymptotic distribution of \(\mathcal{W}^{\star}\). Define \(h^{a\star}_{t}(\theta,\nu)\equiv(g^{\star}(z_{t},\theta)',[q^{\star}(z_{t},\theta)-\nu]')'\), \(h^{a\star}(\theta,\nu)\equiv\sum_{t=1}^{m_{T}}h^{a\star}_{t}(\theta,\nu)/m_{T}\) and \(\tilde Q^{\star}(\theta,\nu)=h^{a\star}(\theta,\nu)'\Sigma^{\star-1}h^{a\star}(\theta,\nu)\). Note that the unrestricted GMM estimator solves
\[
(\hat\theta^{e\star\prime},\hat\nu^{\star\prime})'=\arg\min_{\theta\in B,\;\nu\in\mathbb{R}^{s}}\tilde Q^{\star}(\theta,\nu).
\]
As before the solution is given by
\[
\hat\theta^{e\star}=\arg\min_{\theta\in B}g^{\star}(\theta)'\Omega^{\star-1}g^{\star}(\theta),\qquad
\hat\nu^{\star}=q^{\star}(\hat\theta^{e\star})-\Omega^{\star}_{21}\Omega^{\star-1}_{11}g^{\star}(\hat\theta^{e\star}).
\]
Consistency of \(\hat\theta^{e\star}\) follows from Theorem 4.7. We note that by Lemma A.6 \(\Sigma^{\star}=\Sigma+o_{B}(1)\) and \(\hat\nu^{\star}=\hat\nu+o_{B}(1)\).

We now derive the asymptotic distribution of \((\hat\theta^{e\star\prime},\hat\nu^{\star\prime})'\). Since these estimators satisfy the first-order conditions, we have \(D^{\star\prime}\Sigma^{\star-1}h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})=0\). Thus by a Taylor expansion around \((\hat\theta^{e\prime},\hat\nu')'\),
\[
D^{\star\prime}\Sigma^{\star-1}h^{a\star}(\hat\theta^{e},\hat\nu)+D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}-\hat\theta^{e}\\\hat\nu^{\star}-\hat\nu\end{pmatrix}=0,
\]
where \(\bar D^{\star}\equiv D^{\star}(\bar\theta^{\star})\),
\[
D^{\star}(\theta)=\begin{pmatrix}\sum_{t=1}^{m_{T}}G^{\star}_{t}(\theta)/m_{T}&0\\\sum_{t=1}^{m_{T}}Q^{\star}_{t}(\theta)/m_{T}&-I_{s}\end{pmatrix},
\]
and \(\bar\theta^{\star}\) lies on the line joining \(\hat\theta^{e\star}\) and \(\hat\theta^{e}\). Thus
\[
\sqrt{T}\begin{pmatrix}\hat\theta^{e\star}-\hat\theta^{e}\\\hat\nu^{\star}-\hat\nu\end{pmatrix}=-[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\,h^{a\star}(\hat\theta^{e},\hat\nu).
\]
Now by a Taylor expansion
\[
\sqrt{T}\,h^{a\star}(\hat\theta^{e},\hat\nu)=T^{1/2}h^{a\star}(\theta_{0},0)+\bar D^{\star}\sqrt{T}\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\hat\nu\end{pmatrix},
\]
where \(\bar D^{\star}=D^{\star}(\bar\theta)\) and \(\bar\theta\) lies on the line joining \(\hat\theta^{e}\) and \(\theta_{0}\). We show now that
\[
[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}\Big[T^{1/2}\tilde h^{a}_{T}(\theta_{0},0)+T^{1/2}\bar D^{\star}\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\hat\nu\end{pmatrix}\Big]=o_{B}(1).
\]
First notice that by Lemma A.2 above we have
\[
T^{1/2}\tilde h^{a}_{T}(\theta_{0},0)=T^{-1/2}\sum_{t=1}^{T}h^{a}_{t,T}(\theta_{0},0)+\Sigma S_{1}P\,T^{1/2}g(\theta_{0})+o_{p}(1).\tag{A.19}
\]
Thus, as \(\Sigma^{\star-1}=\Sigma^{-1}+o_{B}(1)\), we have
\[
\begin{aligned}
[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}T^{1/2}\tilde h^{a}_{T}(\theta_{0},0)
&=[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}T^{-1/2}\sum_{t=1}^{T}h^{a}_{t,T}(\theta_{0},0)\\
&\quad+[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}S_{1}P\,T^{1/2}g(\theta_{0})+o_{p}(1)\\
&=[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}T^{-1/2}\sum_{t=1}^{T}h^{a}_{t,T}(\theta_{0},0)+o_{B}(1),
\end{aligned}
\]
as \(G^{\star}=G+o_{B}(1)\) by Lemma A.7 and \(G'P=0\).

Now note that
\[
\sqrt{T}\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\hat\nu\end{pmatrix}=\sqrt{T}\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\tilde\nu\end{pmatrix}-\sqrt{T}\begin{pmatrix}0\\\tilde\nu-\hat\nu\end{pmatrix},\tag{A.20}
\]
and the usual asymptotic representation of the efficient GMM estimator yields
\[
\sqrt{T}\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\tilde\nu\end{pmatrix}=-[D'\Sigma^{-1}\bar D]^{-1}D'\Sigma^{-1}T^{-1/2}\sum_{t=1}^{T}h^{a}_{t}(\theta_{0},0)+o_{p}(1),\tag{A.21}
\]
where \(\bar D=D(\bar\theta)\) and \(\bar\theta\) lies on the line joining \(\hat\theta^{e}\) and \(\theta_{0}\). Hence by (A.20) and (A.21) we have
\[
[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}\sqrt{T}\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\hat\nu\end{pmatrix}=[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}\sqrt{T}\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\tilde\nu\end{pmatrix}-O_{B}(1)\sqrt{T}\begin{pmatrix}0\\\tilde\nu-\hat\nu\end{pmatrix},
\]
as \(D^{\star}\) and \(\bar D^{\star}\) converge to \(D\) by Lemma A.7, and \(\Sigma^{\star-1}=\Sigma^{-1}+o_{B}(1)\). It remains to prove that \(\sqrt{T}(\tilde\nu-\hat\nu)=o_{p}(1)\).
First note that
\[
\sqrt{T}(\tilde\nu-\hat\nu)=\sqrt{T}\,\tilde q(\hat\theta^{e})-\sqrt{T}\,q(\hat\theta^{e})+\Omega_{21}\Omega_{11}^{-1}\sqrt{T}\,g(\hat\theta^{e}).
\]
Now Lemma A.1 above yields
\[
\begin{aligned}
\sqrt{T}\,\tilde q(\hat\theta^{e})&=\sqrt{T}\,q(\hat\theta^{e})+\sqrt{T}\sum_{t=1}^{T}(\pi_{t}-1/T)\,q_{tT}(\hat\theta^{e})\\
&=\sqrt{T}\,q(\hat\theta^{e})+\frac{S_{T}}{T}\sum_{t=1}^{T}q_{tT}(\hat\theta^{e})g_{tT}(\hat\theta^{e})'\,\frac{T^{1/2}}{S_{T}}\hat\lambda\,(1/k_{2}+o_{p}(1))+o_{p}(1).
\end{aligned}
\]
Also, by the FOC of the GEL problem with respect to \(\lambda\),
\[
\frac{1}{T}\sum_{t=1}^{T}\rho_{1}\big(k\hat\lambda'g_{tT}(\hat\theta_{gel})\big)g_{tT}(\hat\theta_{gel})=0.
\]
Thus by a Taylor expansion around \(0\) we have
\[
-\frac{1}{T}\sum_{t=1}^{T}g_{tT}(\hat\theta_{gel})+\frac{1}{T}\sum_{t=1}^{T}\rho_{2}\big(k\bar\lambda'g_{tT}(\hat\theta_{gel})\big)g_{tT}(\hat\theta_{gel})g_{tT}(\hat\theta_{gel})'\hat\lambda/k_{2}=0.
\]
Thus
\[
\frac{T^{1/2}}{S_{T}}\hat\lambda/k_{2}=\Big[\frac{S_{T}}{T}\sum_{t=1}^{T}\rho_{2}\big(k\bar\lambda'g_{tT}(\hat\theta_{gel})\big)g_{tT}(\hat\theta_{gel})g_{tT}(\hat\theta_{gel})'\Big]^{-1}\sqrt{T}\,\bar g_{T}(\hat\theta_{gel}).
\]
Now
\[
\Big[\frac{S_{T}}{T}\sum_{t=1}^{T}\rho_{2}\big(k\bar\lambda'g_{tT}(\hat\theta_{gel})\big)g_{tT}(\hat\theta_{gel})g_{tT}(\hat\theta_{gel})'\Big]^{-1}=-\Omega_{11}^{-1}+o_{p}(1)
\]
by Theorem 2.5 of Smith (2011). Also by a Taylor expansion
\[
\sqrt{T}\,\bar g_{T}(\hat\theta_{gel})=\sqrt{T}\,\bar g_{T}(\theta_{0})+\bar G_{T}\sqrt{T}(\hat\theta_{gel}-\theta_{0}),
\]
where \(\bar G_{T}\equiv\partial\bar g_{T}(\bar\theta)/\partial\theta'\) and \(\bar\theta\) lies on the line joining \(\hat\theta_{gel}\) and \(\theta_{0}\). Now by Lemma A.2 of Smith (2011), \(\sqrt{T}\,\bar g_{T}(\theta_{0})=\sqrt{T}\,g(\theta_{0})+O_{p}(T^{-1/2})\). Since
\[
\sqrt{T}\,g(\hat\theta^{e})=\sqrt{T}\,g(\theta_{0})+\bar G\sqrt{T}(\hat\theta^{e}-\theta_{0}),
\]
where \(\bar G\equiv\partial g(\bar\theta)/\partial\theta'\), it follows that \(\sqrt{T}\,\bar g_{T}(\hat\theta_{gel})=\sqrt{T}\,g(\hat\theta^{e})+o_{p}(1)\), as \(\bar G_{T}\) and \(\bar G\) converge to \(G\) and \(\sqrt{T}(\hat\theta_{gel}-\hat\theta^{e})=o_{p}(1)\). Consequently \(\sqrt{T}(\tilde\nu-\hat\nu)=o_{p}(1)\), as \(T^{1/2}g(\hat\theta^{e})=O_{p}(1)\).
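The GEL first-order condition in \(\lambda\) used above is what makes the implied probabilities recentre the moments. A one-moment empirical-likelihood sketch of this mechanism, with \(\pi_{t}=1/\{T(1+\lambda g_{t})\}\) and \(\lambda\) found by Newton iteration (the data and iteration count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 500
gt = rng.standard_normal(T) + 0.1          # moment indicator values g_t at a fixed theta

# Empirical-likelihood implied probabilities pi_t = 1 / (T (1 + lam g_t)), with lam
# solving sum_t g_t / (1 + lam g_t) = 0, so that sum_t pi_t g_t = 0.
lam = 0.0
for _ in range(50):                        # Newton iterations on f(lam) = sum g/(1+lam g)
    d = 1.0 + lam * gt
    f = np.sum(gt / d)
    fp = -np.sum(gt**2 / d**2)
    lam -= f / fp

pi = 1.0 / (T * (1.0 + lam * gt))
print(np.isclose(pi.sum(), 1.0))           # probabilities sum to one at the solution
print(abs(np.sum(pi * gt)) < 1e-8)         # implied probabilities recentre the moment
```

At the solution the identity \(\sum_{t}1/(1+\lambda g_{t})=T\) holds automatically, which is why the probabilities sum to one without a separate normalisation.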
Hence
\[
\sqrt{T/k_{2}}\begin{pmatrix}\hat\theta^{e\star}-\hat\theta^{e}\\\hat\nu^{\star}-\hat\nu\end{pmatrix}=-[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big[h^{a\star}_{T}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]+o_{B}(1).
\]
Thus by a Taylor expansion we have
\[
\sqrt{T/k_{2}}\begin{pmatrix}a(\hat\theta^{e\star})-a(\hat\theta^{e})\\\hat\nu^{\star}-\hat\nu\end{pmatrix}=-R(\bar\theta^{\star})[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big[h^{a\star}_{T}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]+o_{B}(1),
\]
where \(\bar\theta^{\star}\) lies on the line joining \(\hat\theta^{e\star}\) and \(\hat\theta^{e}\). Thus
\[
\begin{aligned}
\mathcal{W}^{\star}&=(T/k_{2})\,[\hat r^{\star}-\tilde r]'\big[R^{\star}(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R^{\star\prime}\big]^{-1}[\hat r^{\star}-\tilde r]\\
&=\sqrt{T/k_{2}}\big[h^{a\star}_{T}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]'\Sigma^{\star-1}D^{\star}[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}R(\bar\theta^{\star})'\big[R^{\star}(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R^{\star\prime}\big]^{-1}\\
&\qquad\times R(\bar\theta^{\star})[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big[h^{a\star}_{T}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big].
\end{aligned}
\]
Thus, as in Theorem 4.4 above, \(\mathcal{W}^{\star}\) converges to a chi-squared distribution with \(s+r\) degrees of freedom, as \(D^{\star}=D+o_{B}(1)\) by the bootstrap UWL Lemma A.7, \(\Sigma^{\star}=\Sigma+o_{B}(1)\), and the fact that by the bootstrap CLT \(\sqrt{T/k_{2}}[h^{a\star}_{T}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)]\) converges to \(N(0,\Sigma)\).
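In practice the first-order validity of \(\mathcal{W}^{\star}\) established here is exploited by comparing the sample statistic with the quantiles of its bootstrap replicates. A generic sketch of that step (the \(\chi^{2}(3)\) draws merely stand in for \(B\) computed bootstrap statistics):

```python
import numpy as np

# Bootstrap testing workflow justified by the theorems: reject when the sample
# statistic exceeds the (1 - alpha) quantile of its bootstrap replicates, or
# equivalently when the bootstrap p-value is below alpha.
def bootstrap_pvalue(stat, boot_stats):
    boot_stats = np.asarray(boot_stats)
    return (1.0 + np.sum(boot_stats >= stat)) / (1.0 + boot_stats.size)

rng = np.random.default_rng(4)
boot = rng.chisquare(df=3, size=999)       # stand-in for B bootstrap W* draws
print(bootstrap_pvalue(100.0, boot) < 0.05)   # huge statistic: small p-value
print(bootstrap_pvalue(0.0, boot) > 0.9)      # tiny statistic: large p-value
```

The \(+1\) corrections in numerator and denominator keep the p-value strictly positive, a standard finite-\(B\) adjustment.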
We consider now the \(\mathcal{S}^{\star}\) statistic. First we derive the distribution of the bootstrapped restricted estimator. We first note that this estimator is consistent by Theorem 4.7, adapted to the moment restrictions \(h(z_{t},\theta)\) and considering the compact parameter space \(\{\theta\in B:a(\theta)=0\}\). The Lagrangian of the restricted problem is
\[
\mathcal{L}^{\star}=\tilde Q^{\star}(\theta,\nu)-a(\theta)'\alpha-\nu'\beta.
\]
Denote the value of the Lagrange multiplier at the saddle point as \(\hat\varphi^{\star}=(\hat\alpha^{\star\prime},\hat\beta^{\star\prime})'\); thus the first-order conditions are
\[
D^{\star\prime}\Sigma^{\star-1}h^{a\star}(\hat\theta^{e\star}_{r},0)-R(\hat\theta^{e\star}_{r})\hat\varphi^{\star}=0.
\]
Multiplying both sides by \(R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}\) we have
\[
R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}D^{\star\prime}\Sigma^{\star-1}h^{a\star}(\hat\theta^{e\star}_{r},0)-R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})\hat\varphi^{\star}=0.
\]
Thus
\[
\hat\varphi^{\star}=[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}D^{\star\prime}\Sigma^{\star-1}h^{a\star}(\hat\theta^{e\star}_{r},0).
\]
Hence
\[
D^{\star\prime}\Sigma^{\star-1}h^{a\star}(\hat\theta^{e\star}_{r},0)-R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}D^{\star\prime}\Sigma^{\star-1}h^{a\star}(\hat\theta^{e\star}_{r},0)=0.
\]
But by a Taylor expansion around \(\hat\theta^{e}_{r}\) we have \(h^{a\star}(\hat\theta^{e\star}_{r},0)=h^{a\star}(\hat\theta^{e}_{r},0)+\bar D^{\star}S_{1}(\hat\theta^{e\star}_{r}-\hat\theta^{e}_{r})\), where \(\bar D^{\star}=D^{\star}(\bar\theta^{\star})\), \(\bar\theta^{\star}\) lies on the line joining \(\hat\theta^{e\star}_{r}\) and \(\hat\theta^{e}_{r}\), and \(S_{1}\) is a selection matrix such that
\[
\bar D^{\star}S_{1}=\begin{pmatrix}\sum_{t=1}^{T}G^{\star}_{t}(\bar\theta^{\star})/T\\\sum_{t=1}^{T}Q^{\star}_{t}(\bar\theta^{\star})/T\end{pmatrix}.
\]
Thus we have
\[
\big[I-R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}\big]\big[D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\,h^{a\star}(\hat\theta^{e}_{r},0)+D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}S_{1}\sqrt{T}(\hat\theta^{e\star}_{r}-\hat\theta^{e}_{r})\big]=0.
\]
Hence
\[
S_{1}\sqrt{T}(\hat\theta^{e\star}_{r}-\hat\theta^{e}_{r})=-[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}\big[I-R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}\big]D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\,h^{a\star}(\hat\theta^{e}_{r},0).
\]
Now note that by a Taylor expansion around \(\theta_{0}\) we have
\[
\sqrt{T/k_{2}}\big(h^{a\star}(\hat\theta^{e}_{r},0)-h^{a\star}(\theta_{0},0)-h^{a}(\hat\theta^{e}_{r},0)+h^{a}(\theta_{0},0)\big)=(\bar D^{\star}-\bar D)S_{1}\sqrt{T/k_{2}}(\hat\theta^{e}_{r}-\theta_{0})=o_{B}(1),\tag{A.22}
\]
where \(\bar D^{\star}=D^{\star}(\bar\theta)\) with \(\bar\theta\) on the line joining \(\hat\theta^{e}_{r}\) and \(\theta_{0}\), and \(\bar D=D(\bar{\bar\theta})\) with \(\bar{\bar\theta}\) on the line joining \(\hat\theta^{e}_{r}\) and \(\theta_{0}\). The second equality follows from a UWL, a bootstrap UWL and the fact that \(\sqrt{T/k_{2}}(\hat\theta^{e}_{r}-\theta_{0})=O_{p}(1)\).

We show now that
\[
D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\,\tilde h^{a}_{T}(\theta_{0},0)-D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\,h^{a}(\theta_{0},0)=o_{B}(1).\tag{A.23}
\]
Note that by Lemma A.2 we have
\[
D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\,\tilde h^{a}_{T}(\theta_{0},0)=D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\,h^{a}(\theta_{0},0)+D^{\star\prime}\Sigma^{\star-1}\Sigma S_{1}P\sqrt{T/k_{2}}\,g(\theta_{0})+o_{p}(1),
\]
and \(D^{\star\prime}\Sigma^{\star-1}\Sigma S_{1}P=D'S_{1}P+o_{B}(1)=o_{B}(1)\), as \(D^{\star}=D+o_{B}(1)\) by Lemma A.7, \(\Sigma^{\star-1}=\Sigma^{-1}+o_{B}(1)\), and \(G'P=0\); the result follows. Thus we have
\[
\begin{aligned}
S_{1}\sqrt{T/k_{2}}(\hat\theta^{e\star}_{r}-\hat\theta^{e}_{r})&=-[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}\big[I-R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}\big]\\
&\qquad\times D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big(h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)+A^{\star}_{T}+o_{B}(1),
\end{aligned}\tag{A.24}
\]
where
\[
A^{\star}_{T}=-[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}\big[I-R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}\big]D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\,h^{a}(\hat\theta^{e}_{r},0).
\]
But by the FOC of the original restricted problem we have
\[
D'\Sigma^{-1}h^{a}(\hat\theta^{e}_{r},0)-R(\hat\theta^{e}_{r})[R(\hat\theta^{e}_{r})'(D'\Sigma^{-1}D)^{-1}R(\hat\theta^{e}_{r})]^{-1}R(\hat\theta^{e}_{r})'(D'\Sigma^{-1}D)^{-1}D'\Sigma^{-1}h^{a}(\hat\theta^{e}_{r},0)=0.
\]
Thus
\[
\begin{aligned}
A^{\star}_{T}&=-[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}\Big(\big[I-R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}\big]D^{\star\prime}\Sigma^{\star-1}\\
&\qquad-\big[I-R(\hat\theta^{e}_{r})[R(\hat\theta^{e}_{r})'(D'\Sigma^{-1}D)^{-1}R(\hat\theta^{e}_{r})]^{-1}R(\hat\theta^{e}_{r})'(D'\Sigma^{-1}D)^{-1}\big]D'\Sigma^{-1}\Big)\sqrt{T/k_{2}}\,h^{a}(\hat\theta^{e}_{r},0)\\
&=o_{B}(1),
\end{aligned}
\]
by the bootstrap UWL and \(\sqrt{T/k_{2}}\,h^{a}(\hat\theta^{e}_{r},0)=O_{p}(1)\).

Now
\[
\mathcal{S}^{\star}=\frac{T}{k_{2}}\big[h^{\star}(\hat\theta^{e\star}_{r})-\tilde h_{T}(\hat\theta^{e}_{r})\big]'\Sigma^{\star-1}D^{\star}(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}D^{\star\prime}\Sigma^{\star-1}\big[h^{\star}(\hat\theta^{e\star}_{r})-\tilde h_{T}(\hat\theta^{e}_{r})\big].
\]
Notice that by two Taylor expansions
\[
\sqrt{T/k_{2}}\big[h^{\star}(\hat\theta^{e}_{r})-\tilde h_{T}(\hat\theta^{e}_{r})\big]=\sqrt{T/k_{2}}\big(h^{\star}(\theta_{0})-\tilde h_{T}(\theta_{0})\big)+\sqrt{T/k_{2}}\,(\bar D^{\star}-\bar D)(\hat\theta^{e}_{r}-\theta_{0})=\sqrt{T/k_{2}}\big(h^{\star}(\theta_{0})-\tilde h_{T}(\theta_{0})\big)+o_{B}(1),
\]
where \(\bar D^{\star}\equiv\partial h^{\star}(\bar\theta_{r})/\partial\theta'\) with \(\bar\theta_{r}\) on the line between \(\hat\theta^{e}_{r}\) and \(\theta_{0}\), and \(\bar D\equiv\partial\tilde h_{T}(\bar{\bar\theta}_{r})/\partial\theta'\) with \(\bar{\bar\theta}_{r}\) on the line between \(\hat\theta^{e}_{r}\) and \(\theta_{0}\). The second equality is due to a UWL, a bootstrap UWL and the fact that \(\sqrt{T/k_{2}}(\hat\theta^{e}_{r}-\theta_{0})=O_{p}(1)\).

Thus by a Taylor expansion
\[
\begin{aligned}
D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big(h^{\star}(\hat\theta^{e\star}_{r})-\tilde h_{T}(\hat\theta^{e}_{r})\big)
&=D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}S_{1}\sqrt{T/k_{2}}(\hat\theta^{e\star}_{r}-\hat\theta^{e}_{r})+D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big(h^{\star}(\hat\theta^{e}_{r})-\tilde h_{T}(\hat\theta^{e}_{r})\big)\\
&=-\big[I-R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}\big]\\
&\qquad\times D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big(h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)+D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big(h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)+o_{B}(1)\\
&=R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}\\
&\qquad\times D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big(h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)+o_{B}(1),
\end{aligned}
\]
using (A.23) and (A.24). Thus
\[
\begin{aligned}
\mathcal{S}^{\star}&=\frac{T}{k_{2}}\big[h^{\star}(\hat\theta^{e\star}_{r})-\tilde h_{T}(\hat\theta^{e}_{r})\big]'\Sigma^{\star-1}D^{\star}(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}D^{\star\prime}\Sigma^{\star-1}\big[h^{\star}(\hat\theta^{e\star}_{r})-\tilde h_{T}(\hat\theta^{e}_{r})\big]\\
&=\sqrt{T/k_{2}}\big(h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)'\Sigma^{\star-1}D^{\star}(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}\\
&\qquad\times R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big(h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)+o_{B}(1),
\end{aligned}
\]
and the result follows. Now we consider the distance statistic
\[
\begin{aligned}
\mathcal{D}^{\star}&=\frac{T}{k_{2}}\Big\{\big[h^{\star}(\hat\theta^{e\star}_{r})-\tilde h(\hat\theta^{e}_{r})\big]'\Sigma^{\star-1}\big[h^{\star}(\hat\theta^{e\star}_{r})-\tilde h(\hat\theta^{e}_{r})\big]-g^{\star}(\hat\theta^{e\star})'\tilde\Omega^{\star-1}g^{\star}(\hat\theta^{e\star})\Big\}\\
&=\frac{T}{k_{2}}\Big\{\big[h^{a\star}(\hat\theta^{e\star}_{r},0)-\tilde h^{a}_{T}(\hat\theta^{e}_{r},0)\big]'\Sigma^{\star-1}\big[h^{a\star}(\hat\theta^{e\star}_{r},0)-\tilde h^{a}_{T}(\hat\theta^{e}_{r},0)\big]-h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})'\tilde\Sigma^{\star-1}h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})\Big\},
\end{aligned}
\]
as
\[
g^{\star}(\hat\theta^{e\star})'\tilde\Omega^{\star-1}g^{\star}(\hat\theta^{e\star})=h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})'\tilde\Sigma^{\star-1}h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star}).
\]
Note now that by two Taylor expansions
\[
h^{a\star}(\hat\theta^{e\star}_{r},0)-\tilde h^{a}_{T}(\hat\theta^{e}_{r},0)=h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-\tilde h^{a}_{T}(\hat\theta^{e},\hat\nu)+\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e\star}\\-\hat\nu^{\star}\end{pmatrix}-\bar D\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix},
\]
where \(\bar D^{\star}\equiv\partial h^{a\star}(\bar\theta^{\star},\bar\nu^{\star})/\partial(\theta',\nu')\) with \((\bar\theta^{\star\prime},\bar\nu^{\star\prime})'\) on the line joining \((\hat\theta^{e\star\prime}_{r},0')'\) and \((\hat\theta^{e\star\prime},\hat\nu^{\star\prime})'\), and \(\bar D\equiv\partial\tilde h^{a}_{T}(\bar\theta,\bar\nu)/\partial(\theta',\nu')\) with \((\bar\theta',\bar\nu')'\) on the line joining \((\hat\theta^{e\prime}_{r},0')'\) and \((\hat\theta^{e\prime},\hat\nu')'\). Thus
\[
\begin{aligned}
&\frac{T}{k_{2}}\big[h^{a\star}(\hat\theta^{e\star}_{r},0)-\tilde h^{a}(\hat\theta^{e}_{r},0)\big]'\Sigma^{\star-1}\big[h^{a\star}(\hat\theta^{e\star}_{r},0)-\tilde h^{a}(\hat\theta^{e}_{r},0)\big]\\
&\quad=\frac{T}{k_{2}}\big[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-\tilde h^{a}(\hat\theta^{e},\hat\nu)\big]'\Sigma^{\star-1}\big[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-\tilde h^{a}(\hat\theta^{e},\hat\nu)\big]\\
&\qquad+\frac{2T}{k_{2}}\Big[\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e\star}\\-\hat\nu^{\star}\end{pmatrix}-\bar D\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix}\Big]'\Sigma^{\star-1}\big[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-\tilde h^{a}(\hat\theta^{e},\hat\nu)\big]\\
&\qquad+\frac{T}{k_{2}}\Big[\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e\star}\\-\hat\nu^{\star}\end{pmatrix}-\bar D\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix}\Big]'\Sigma^{\star-1}\Big[\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e\star}\\-\hat\nu^{\star}\end{pmatrix}-\bar D\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix}\Big].
\end{aligned}
\]
Note that
\[
\begin{aligned}
\sqrt{T}\,\bar D^{\star\prime}\Sigma^{\star-1}\tilde h^{a}_{T}(\hat\theta^{e},\hat\nu)&=\bar D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\,h^{a}_{T}(\hat\theta^{e},\hat\nu)+\bar D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\sum_{t=1}^{T}(T\pi_{t}-1)h_{tT}(\hat\theta^{e},\hat\nu)/T\\
&=\bar D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\,h^{a}_{T}(\hat\theta^{e},\hat\nu)+\bar D^{\star\prime}\Sigma^{\star-1}\frac{S_{T}}{T}\sum_{t=1}^{T}h_{tT}(\hat\theta^{e},\hat\nu)g'_{tT}\,\frac{T^{1/2}}{S_{T}}\hat\lambda+o_{p}(1)\\
&=\bar D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\,h^{a}_{T}(\hat\theta^{e},\hat\nu)-\bar D^{\star\prime}\Sigma^{\star-1}\big(\Sigma S_{1}+o_{p}(1)\big)\big(T^{1/2}Pg_{T}(\theta_{0})+o_{p}(1)\big),
\end{aligned}
\]
using Lemma A.1 and the fact that \((T^{1/2}/S_{T})\hat\lambda=-T^{1/2}Pg_{T}(\theta_{0})+o_{p}(1)\) by the proof of Theorem 2.3 of Smith (2011) (see expression B.2, p. A.11). Now, as \(G'P=0\), \(\bar D^{\star}=D+o_{B}(1)\) by Lemma A.7, \(\Sigma^{\star-1}=\Sigma^{-1}+o_{p}(1)\) and \(T^{1/2}g_{T}(\theta_{0})=O_{p}(1)\), we have
\[
\sqrt{T}\,\bar D^{\star\prime}\Sigma^{\star-1}\tilde h^{a}(\hat\theta^{e},\hat\nu)=\sqrt{T}\,\bar D^{\star\prime}\Sigma^{\star-1}h^{a}(\hat\theta^{e},\hat\nu)+o_{p}(1).\tag{A.25}
\]
Now by three Taylor expansions we have
\[
\begin{aligned}
h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-\tilde h^{a}_{T}(\hat\theta^{e},\hat\nu)&=h^{a\star}(\hat\theta^{e},\hat\nu)-\tilde h^{a}_{T}(\hat\theta^{e},\hat\nu)+\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}-\hat\theta^{e}\\\hat\nu^{\star}-\hat\nu\end{pmatrix}\\
&=h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)+\tilde h^{a}(\theta_{0},0)-h^{a}(\theta_{0},0)+(\bar D^{\star}-\dddot D)\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\hat\nu\end{pmatrix}+\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}-\hat\theta^{e}\\\hat\nu^{\star}-\hat\nu\end{pmatrix}\\
&=O_{B}(1/\sqrt{T}),
\end{aligned}
\]
where \(\bar D^{\star}\equiv\partial h^{a\star}(\bar\theta^{\star},\bar\nu^{\star})/\partial(\theta',\nu')\) with \((\bar\theta^{\star\prime},\bar\nu^{\star\prime})'\) on the line joining \((\hat\theta^{e\star\prime},\hat\nu^{\star\prime})'\) and \((\hat\theta^{e\prime},\hat\nu')'\), and \(\dddot D\equiv\partial\tilde h^{a}_{T}(\dddot\theta,\dddot\nu)/\partial(\theta',\nu')\) with \((\dddot\theta{}',\dddot\nu{}')'\) on the line joining \((\hat\theta^{e\prime},\hat\nu')'\) and \((\theta'_{0},0')'\). The result follows from the bootstrap CLT, the standard CLT, Lemma A.2 and asymptotic normality of \(((\hat\theta^{e\star}-\hat\theta^{e})',(\hat\nu^{\star}-\hat\nu)')'\) and \(((\hat\theta^{e}-\theta_{0})',\hat\nu')'\). Thus by (A.25)
\[
\begin{aligned}
\bar D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\big[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-\tilde h^{a}_{T}(\hat\theta^{e},\hat\nu)\big]&=\bar D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\big[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-h^{a}_{T}(\hat\theta^{e},\hat\nu)\big]+o_{p}(1)\\
&=\big[\bar D^{\star\prime}\Sigma^{\star-1}-D^{\star\prime}\Sigma^{\star-1}\big]\sqrt{T}\big[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-h^{a}_{T}(\hat\theta^{e},\hat\nu)\big]+o_{p}(1),
\end{aligned}
\]
as \(D^{\star\prime}\Sigma^{\star-1}h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})=0\) and \(D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\,h^{a}(\hat\theta^{e},\hat\nu)=[D^{\star\prime}\Sigma^{\star-1}-D'\Sigma^{-1}]\sqrt{T}\,h^{a}(\hat\theta^{e},\hat\nu)=o_{p}(1)\), since \(D'\Sigma^{-1}h^{a}(\hat\theta^{e},\hat\nu)=0\) and consequently
\[
D'\Sigma^{-1}\sqrt{T}\,h^{a}_{T}(\hat\theta^{e},\hat\nu)=D'\Sigma^{-1}\sqrt{T}\big(h^{a}_{T}(\hat\theta^{e},\hat\nu)-h^{a}(\hat\theta^{e},\hat\nu)\big)=D'\Sigma^{-1}\sqrt{T}\big(h^{a}_{T}(\theta_{0},0)-h^{a}(\theta_{0},0)\big)+D'\Sigma^{-1}(\bar D_{T}-\bar D)\sqrt{T}\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\hat\nu\end{pmatrix}=o_{B}(1),
\]
with \(\bar D_{T}=\partial h^{a}_{T}(\bar\theta,\bar\nu)/\partial(\theta',\nu')\) and \(\bar D=\partial h^{a}(\bar\theta,\bar\nu)/\partial(\theta',\nu')\), where \((\bar\theta',\bar\nu')'\) lies on the line joining \((\hat\theta^{e\prime},\hat\nu')'\) and \((\theta'_{0},0')'\). The result follows from equation (A11), page A.3, and Lemma A.1 of Smith (2011).

Since \(\sqrt{T}\,[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-h^{a}(\hat\theta^{e},\hat\nu)]=O_{B}(1)\), \(\bar D^{\star}=D+o_{B}(1)\) and \(D^{\star}=D+o_{B}(1)\) by Lemma A.7, and \(\Sigma^{\star-1}=\Sigma^{-1}+o_{p}(1)\), it follows that \(\bar D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\,[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-\tilde h^{a}(\hat\theta^{e},\hat\nu)]=o_{B}(1)\). Similarly \(\bar D'\Sigma^{\star-1}\sqrt{T}\,[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-\tilde h^{a}(\hat\theta^{e},\hat\nu)]=o_{B}(1)\).

Note also that \(\hat\theta^{e\star}_{r}-\hat\theta^{e\star}=(\hat\theta^{e\star}_{r}-\hat\theta^{e}_{r})-(\hat\theta^{e\star}-\hat\theta^{e})+(\hat\theta^{e}_{r}-\theta_{0})-(\hat\theta^{e}-\theta_{0})=O_{B}(1/\sqrt{T})\), \(\hat\theta^{e}_{r}-\hat\theta^{e}=(\hat\theta^{e}_{r}-\theta_{0})-(\hat\theta^{e}-\theta_{0})=O_{p}(1/\sqrt{T})\), \(\hat\nu^{\star}-\hat\nu=O_{B}(1/\sqrt{T})\) and \(\hat\nu=O_{p}(1/\sqrt{T})\).
Hence
\[
\frac{2T}{k_{2}}\Big[\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e\star}\\-\hat\nu^{\star}\end{pmatrix}-\bar D\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix}\Big]'\Sigma^{\star-1}\big[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-\tilde h^{a}_{T}(\hat\theta^{e},\hat\nu)\big]=o_{B}(1).
\]
Now note that
\[
\sqrt{\tfrac{T}{k_{2}}}\Big[\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e\star}\\-\hat\nu^{\star}\end{pmatrix}-\bar D\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix}\Big]=-\sqrt{\tfrac{T}{k_{2}}}\,D\Big[\begin{pmatrix}\hat\theta^{e\star}-\hat\theta^{e}\\\hat\nu^{\star}-\hat\nu\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}\Big]+o_{B}(1),
\]
as \(\bar D^{\star}=D+o_{B}(1)\), \(\bar D=D+o_{p}(1)\), \(\hat\theta^{e}_{r}-\hat\theta^{e}=O_{p}(1/\sqrt{T})\), \(\hat\nu^{\star}=O_{B}(1/\sqrt{T})\) and \(\hat\nu=O_{p}(1/\sqrt{T})\). Thus
\[
\begin{aligned}
-\sqrt{\tfrac{T}{k_{2}}}\,D\Big[\begin{pmatrix}\hat\theta^{e\star}-\hat\theta^{e}\\\hat\nu^{\star}-\hat\nu\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}\Big]&=\sqrt{\tfrac{T}{k_{2}}}\,D[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}\\
&\qquad\times R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big(h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big),
\end{aligned}
\]
using the asymptotic representations of \(\sqrt{T/k_{2}}\,((\hat\theta^{e\star}-\hat\theta^{e})',(\hat\nu^{\star}-\hat\nu)')'\) given in Theorem 4.8 and of \(S_{1}\sqrt{T/k_{2}}(\hat\theta^{e\star}_{r}-\hat\theta^{e}_{r})\) given in (A.24). Thus
\[
\begin{aligned}
&\frac{T}{k_{2}}\Big[\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e\star}\\-\hat\nu^{\star}\end{pmatrix}-\bar D\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix}\Big]'\Sigma^{\star-1}\Big[\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e\star}\\-\hat\nu^{\star}\end{pmatrix}-\bar D\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix}\Big]\\
&\quad=\sqrt{T/k_{2}}\big(h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)'\Sigma^{\star-1}D^{\star}(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}\\
&\qquad\times R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big(h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)+o_{B}(1),
\end{aligned}
\]
and the result follows as in the proof of Theorem 4.4.
Proof of Theorem 4.11: We start by deriving the asymptotic distribution of \(\mathcal{W}^{\dagger}\). Define \(h^{a\dagger}_{t}(\theta,\nu)\equiv(g^{\dagger}(z_{t},\theta)',[q^{\dagger}(z_{t},\theta)-\nu]')'\), \(h^{a\dagger}(\theta,\nu)=\sum_{t=1}^{m_{T}}h^{a\dagger}_{t}(\theta,\nu)/m_{T}\) and \(\tilde Q^{\dagger}(\theta,\nu)=h^{a\dagger}(\theta,\nu)'\Sigma^{\dagger-1}h^{a\dagger}(\theta,\nu)\). Note that the unrestricted GMM estimator solves
\[
(\hat\theta^{e\dagger\prime},\hat\nu^{\dagger\prime})'=\arg\min_{\theta\in B,\;\nu\in\mathbb{R}^{s}}\tilde Q^{\dagger}(\theta,\nu).
\]
The solution is given by
\[
\hat\theta^{e\dagger}=\arg\min_{\theta\in B}g^{\dagger}(\theta)'\Omega^{\dagger-1}g^{\dagger}(\theta),\qquad
\hat\nu^{\dagger}=q^{\dagger}(\hat\theta^{e\dagger})-\Omega^{\dagger}_{21}\Omega^{\dagger-1}_{11}g^{\dagger}(\hat\theta^{e\dagger}).
\]
Consistency of \(\hat\theta^{e\dagger}\) follows from Theorem 4.7; hence \(\hat\theta^{e\dagger}=\hat\theta^{e}+o_{B}(1)\), and, since \(\hat\theta^{e}=\hat\theta^{e}_{r}+o_{p}(1)\) as \(\hat\theta^{e}\) and \(\hat\theta^{e}_{r}\) are both consistent, we have \(\hat\theta^{e\dagger}=\hat\theta^{e}_{r}+o_{B}(1)\). We note that by Lemma A.6 \(\Sigma^{\dagger}=\Sigma+o_{B}(1)\), and hence \(\hat\nu^{\dagger}=\hat\nu+o_{B}(1)=o_{B}(1)\).

Since these estimators satisfy the first-order conditions, we have \(D^{\dagger\prime}\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})=0\). Thus by a Taylor expansion around \((\hat\theta^{e\prime}_{r},0')'\) we have
\[
D^{\dagger\prime}\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e}_{r},0)+D^{\dagger\prime}\Sigma^{\dagger-1}\bar D^{\dagger}\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}=0,
\]
where \(\bar D^{\dagger}=D^{\dagger}(\bar\theta^{\dagger})\),
\[
D^{\dagger}(\theta)=\begin{pmatrix}\sum_{t=1}^{m_{T}}G^{\dagger}_{t}(\theta)/m_{T}&0\\\sum_{t=1}^{m_{T}}Q^{\dagger}_{t}(\theta)/m_{T}&-I_{s}\end{pmatrix},
\]
and \(\bar\theta^{\dagger}\) lies on the line joining \(\hat\theta^{e\dagger}\) and \(\hat\theta^{e}_{r}\). Thus
\[
\sqrt{T}\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}=-[D^{\dagger\prime}\Sigma^{\dagger-1}\bar D^{\dagger}]^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T}\,h^{a\dagger}(\hat\theta^{e}_{r},0).
\]
Now notice that expanding \(\sqrt{T}\,h^{a\dagger}(\hat\theta^{e}_{r},0)\) around \(\theta_{0}\) yields
\[
\begin{aligned}
\sqrt{T}\,h^{a\dagger}(\hat\theta^{e}_{r},0)&=\sqrt{T}\,h^{a\dagger}(\theta_{0},0)+\bar D^{\dagger}S_{1}\sqrt{T}(\hat\theta^{e}_{r}-\theta_{0})\\
&=\sqrt{T}\,h^{a\dagger}(\theta_{0},0)-\sqrt{T}\,\tilde h^{a}_{T}(\theta_{0},0)+\sqrt{T}\,\tilde h^{a}_{T}(\theta_{0},0)+\tilde D^{\dagger}S_{1}\sqrt{T}(\hat\theta^{e}_{r}-\theta_{0}),
\end{aligned}
\]
where \(\tilde D^{\dagger}=D^{\dagger}(\tilde\theta^{\dagger})\) and \(\tilde\theta^{\dagger}\) lies on the line joining \(\hat\theta^{e}_{r}\) and \(\theta_{0}\).

By the asymptotic representation of \(\hat\theta^{e}_{r}\) we have, with \(\Delta\equiv(D'\Sigma^{-1}D)^{-1}\),
\[
\bar D^{\dagger}S_{1}\sqrt{T}(\hat\theta^{e}_{r}-\theta_{0})=-DS_{1}\big[\Delta-\Delta R'\big(R\Delta R'\big)^{-1}R\Delta\big]D'\Sigma^{-1}\sqrt{T}\,h^{a}_{T}(\theta_{0},0)+o_{p}(1),
\]
as \(\bar D^{\dagger}=D+o_{B}(1)\) by Lemma A.7. Also by Lemma A.4 we have
\[
\sqrt{T}\,\tilde h^{a}_{T}(\theta_{0},0)=DS_{1}\big[\Delta-\Delta R'\big(R\Delta R'\big)^{-1}R\Delta\big]D'\Sigma^{-1}\sqrt{T}\,h^{a}_{T}(\theta_{0},0)+o_{p}(1).\tag{A.26}
\]
Consequently
\[
\sqrt{T}\,\tilde h^{a}(\theta_{0},0)+\tilde D^{\dagger}S_{1}\sqrt{T}(\hat\theta^{e}_{r}-\theta_{0})=o_{p}(1).\tag{A.27}
\]
It follows that
\[
\sqrt{T/k_{2}}\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}=-[D^{\dagger\prime}\Sigma^{\dagger-1}\bar D^{\dagger}]^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\Big[\sqrt{\tfrac{T}{k_{2}}}\,h^{a\dagger}(\theta_{0},0)-\sqrt{\tfrac{T}{k_{2}}}\,\tilde h^{a}_{T}(\theta_{0},0)\Big]+o_{B}(1).
\]
Thus by a Taylor expansion we have
\[
\sqrt{T/k_{2}}\begin{pmatrix}a(\hat\theta^{e\dagger})\\\hat\nu^{\dagger}\end{pmatrix}=\begin{pmatrix}A(\bar\theta^{\dagger})&0\\0&I\end{pmatrix}\sqrt{T/k_{2}}\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}=-R(\bar\theta^{\dagger})[D^{\dagger\prime}\Sigma^{\dagger-1}\bar D^{\dagger}]^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\big[h^{a\dagger}_{T}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]+o_{B}(1),
\]
where \(\bar\theta^{\dagger}\) lies on the line joining \(\hat\theta^{e\dagger}\) and \(\hat\theta^{e}_{r}\). Thus
\[
\begin{aligned}
\mathcal{W}^{\dagger}&=(T/k_{2})\,\hat r^{\dagger\prime}\big[R^{\dagger}(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R^{\dagger\prime}\big]^{-1}\hat r^{\dagger}\\
&=\sqrt{T/k_{2}}\big[h^{a\dagger}_{T}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]'\Sigma^{\dagger-1}D^{\dagger}[D^{\dagger\prime}\Sigma^{\dagger-1}\bar D^{\dagger}]^{-1}R(\bar\theta^{\dagger})'\big[R^{\dagger}(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R^{\dagger\prime}\big]^{-1}\\
&\qquad\times R(\bar\theta^{\dagger})[D^{\dagger\prime}\Sigma^{\dagger-1}\bar D^{\dagger}]^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\big[h^{a\dagger}_{T}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big].
\end{aligned}
\]
Thus, as in the proof of Theorem 4.4 above, \(\mathcal{W}^{\dagger}\) converges to a chi-squared distribution with \(s+r\) degrees of freedom, as \(D^{\dagger}=D+o_{B}(1)\) by Lemma A.7, \(\Sigma^{\dagger}=\Sigma+o_{B}(1)\), and the fact that by the bootstrap CLT \(\sqrt{T/k_{2}}[h^{a\dagger}_{T}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)]\) converges to \(N(0,\Sigma)\).
We consider now the \(\mathcal{S}^{\dagger}\) statistic. First we derive the distribution of the bootstrap restricted estimator. We note that \(\hat\theta^{e\dagger}_{r}\) is consistent by Theorem 4.7 applied to the moment indicators \(h(z_{t},\theta)\) and the restricted parameter space \(B_{r}\). Note that the Lagrangian of the restricted problem is
\[
\mathcal{L}^{\dagger}=\tilde Q^{\dagger}(\theta,\nu)-a(\theta)'\alpha^{\dagger}-\nu'\beta^{\dagger}.
\]
Denote by \(\hat\varphi^{\dagger}=(\hat\alpha^{\dagger\prime},\hat\beta^{\dagger\prime})'\) the Lagrange multipliers evaluated at the optimum. Thus the first-order conditions yield
\[
D^{\dagger\prime}\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)-R(\hat\theta^{e\dagger}_{r})\hat\varphi^{\dagger}=0.
\]
Multiplying both sides by \(R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}\) we have
\[
R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)-R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})\hat\varphi^{\dagger}=0.
\]
Thus
\[
\hat\varphi^{\dagger}=[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger}_{r},0).
\]
Hence
\[
D^{\dagger\prime}\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)-R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)=0.
\]
But by a Taylor expansion \(h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)=h^{a\dagger}(\hat\theta^{e}_{r},0)+\tilde D^{\dagger}S_{1}(\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r})\), where \(S_{1}\) is a selection matrix such that
\[
\tilde D^{\dagger}S_{1}=\begin{pmatrix}\sum_{t=1}^{T}G^{\dagger}_{t}(\tilde\theta^{\dagger})/T\\\sum_{t=1}^{T}Q^{\dagger}_{t}(\tilde\theta^{\dagger})/T\end{pmatrix},
\]
\(\tilde D^{\dagger}=D^{\dagger}(\tilde\theta^{\dagger})\), and \(\tilde\theta^{\dagger}\) lies on the line joining \(\hat\theta^{e\dagger}_{r}\) and \(\hat\theta^{e}_{r}\); thus we have
\[
\big[I-R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}\big]\big[D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T}\,h^{a\dagger}(\hat\theta^{e}_{r},0)+D^{\dagger\prime}\Sigma^{\dagger-1}\tilde D^{\dagger}S_{1}\sqrt{T}(\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r})\big]=0.
\]
Hence
\[
S_{1}\sqrt{T}(\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r})=-[D^{\dagger\prime}\Sigma^{\dagger-1}\tilde D^{\dagger}]^{-1}\big[I-R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}\big]D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T}\,h^{a\dagger}(\hat\theta^{e}_{r},0).
\]
Now note that by a Taylor expansion around \(\theta_{0}\) we have
\[
\sqrt{T/k_{2}}\big(h^{a\dagger}(\hat\theta^{e}_{r},0)-h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\hat\theta^{e}_{r},0)+\tilde h^{a}_{T}(\theta_{0},0)\big)=(\bar D^{\dagger}-\bar D)S_{1}\sqrt{T/k_{2}}(\hat\theta^{e}_{r}-\theta_{0})=o_{B}(1),\tag{A.28}
\]
where \(\bar D^{\dagger}=D^{\dagger}(\bar\theta)\) with \(\bar\theta\) on the line joining \(\hat\theta^{e}_{r}\) and \(\theta_{0}\), and \(\bar D=\tilde D(\bar\theta_{r})\), where
\[
\tilde D(\theta)=\begin{pmatrix}\sum_{t=1}^{T}G_{t}(\theta)\pi_{t,r}&0\\\sum_{t=1}^{T}Q_{t}(\theta)\pi_{t,r}&-I_{s}\end{pmatrix},
\]
and \(\bar\theta_{r}\) lies on the line joining \(\hat\theta^{e}_{r}\) and \(\theta_{0}\). Note that by a Taylor expansion
\[
\sqrt{T/k_{2}}\,\tilde h^{a}_{T}(\hat\theta^{e}_{r},0)=\sqrt{T/k_{2}}\,\tilde h^{a}_{T}(\theta_{0},0)+\dddot DS_{1}\sqrt{T/k_{2}}(\hat\theta^{e}_{r}-\theta_{0}),\tag{A.29}
\]
where \(\dddot D=\tilde D(\dddot\theta)\) and \(\dddot\theta\) lies on the line joining \(\hat\theta^{e}_{r}\) and \(\theta_{0}\). Note that, similarly to (A.27), the right-hand side of (A.29) is \(o_{p}(1)\). Thus we have
\[
\begin{aligned}
S_{1}\sqrt{T}(\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r})&=-[D^{\dagger\prime}\Sigma^{\dagger-1}\tilde D^{\dagger}]^{-1}\big[I-R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}\big]\\
&\qquad\times D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T}\big[h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]+o_{B}(1).
\end{aligned}
\]
Now let us consider the score statistic
\[
\mathcal{S}^{\dagger}=\frac{T}{k_{2}}\,h^{\dagger}(\hat\theta^{e\dagger}_{r})'\Sigma^{\dagger-1}D^{\dagger}(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}h^{\dagger}(\hat\theta^{e\dagger}_{r}).
\]
We proved above that \(\sqrt{T/k_{2}}\big(h^{a\dagger}(\hat\theta^{e}_{r},0)-h^{a\dagger}(\theta_{0},0)+\tilde h^{a}_{T}(\theta_{0},0)\big)=o_{B}(1)\). Notice also that by a Taylor expansion
\[
\sqrt{T/k_{2}}\,h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)=\sqrt{T/k_{2}}\,h^{a\dagger}(\hat\theta^{e}_{r},0)+\bar D^{\dagger}S_{1}\sqrt{T/k_{2}}(\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}),
\]
where \(\bar D^{\dagger}=D^{\dagger}(\bar\theta)\) and \(\bar\theta\) lies on the line joining \(\hat\theta^{e\dagger}_{r}\) and \(\hat\theta^{e}_{r}\). Thus
\[
\begin{aligned}
D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\,h^{\dagger}(\hat\theta^{e\dagger}_{r})&=D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\big[h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]\\
&\quad-D^{\dagger\prime}\Sigma^{\dagger-1}\bar D^{\dagger}[D^{\dagger\prime}\Sigma^{\dagger-1}\tilde D^{\dagger}]^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\big[h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]\\
&\quad+D^{\dagger\prime}\Sigma^{\dagger-1}\bar D^{\dagger}[D^{\dagger\prime}\Sigma^{\dagger-1}\tilde D^{\dagger}]^{-1}R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}\\
&\qquad\times R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\big[h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]+o_{p}(1)\\
&=R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}\\
&\qquad\times D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\big[h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]+o_{p}(1).
\end{aligned}
\]
Hence
\[
\begin{aligned}
\mathcal{S}^{\dagger}&=\frac{T}{k_{2}}\,h^{\dagger}(\hat\theta^{e\dagger}_{r})'\Sigma^{\dagger-1}D^{\dagger}(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}h^{\dagger}(\hat\theta^{e\dagger}_{r})\\
&=\sqrt{T/k_{2}}\big(h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)'\Sigma^{\dagger-1}D^{\dagger}(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}\\
&\qquad\times R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\big(h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)+o_{B}(1),
\end{aligned}
\]
and the result follows as in the proof of Theorem 4.4. Now we consider the \(\mathcal{D}^{\dagger}\) statistic:
\[
\begin{aligned}
\mathcal{D}^{\dagger}&=\frac{T}{k_{2}}\big[h^{\dagger}(\hat\theta^{e\dagger}_{r})'\Sigma^{\dagger-1}h^{\dagger}(\hat\theta^{e\dagger}_{r})-g^{\dagger}(\hat\theta^{e\dagger})'\tilde\Omega^{\dagger-1}g^{\dagger}(\hat\theta^{e\dagger})\big]\\
&=\frac{T}{k_{2}}\big[h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)'\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)-h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})'\tilde\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})\big].
\end{aligned}
\]
Expanding \(h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)\) around \((\hat\theta^{e\dagger},\hat\nu^{\dagger})\) yields
\[
h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)=h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})+\bar D^{\dagger}\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e\dagger}\\-\hat\nu^{\dagger}\end{pmatrix}=h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})+\bar D^{\dagger}\Big[\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}\Big],
\]
where \(\bar D^{\dagger}=D^{\dagger}(\bar\theta)\) and \(\bar\theta\) lies on the line joining \(\hat\theta^{e\dagger}_{r}\) and \(\hat\theta^{e\dagger}\). Thus
\[
\begin{aligned}
&\frac{T}{k_{2}}\,h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)'\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)\\
&\quad=\frac{T}{k_{2}}\,h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})'\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})\\
&\qquad+\frac{2T}{k_{2}}\Big\{\bar D^{\dagger}\Big[\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}\Big]\Big\}'\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})\\
&\qquad+\frac{T}{k_{2}}\Big\{\bar D^{\dagger}\Big[\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}\Big]\Big\}'\Sigma^{\dagger-1}\Big\{\bar D^{\dagger}\Big[\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}\Big]\Big\}.
\end{aligned}
\]
Now notice that by the first-order conditions of the bootstrapped GMM problem we have
\[
\sqrt{T}\,\bar D^{\dagger\prime}\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})=\sqrt{T}\,(\bar D^{\dagger}-D^{\dagger})'\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})=o_{B}(1)\big[\sqrt{T}\,h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})\big].
\]
Now note that by two Taylor expansions
\[
\begin{aligned}
h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})&=h^{a\dagger}(\hat\theta^{e},\hat\nu)+\dddot D^{\dagger}\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}\\\hat\nu^{\dagger}-\hat\nu\end{pmatrix}\\
&=h^{a\dagger}(\theta_{0},0)-\tilde h^{a}(\theta_{0},0)+\tilde h^{a}(\theta_{0},0)+\bar D^{\dagger}\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\hat\nu\end{pmatrix}+\dddot D^{\dagger}\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}\\\hat\nu^{\dagger}-\hat\nu\end{pmatrix}=O_{B}(1/\sqrt{T}),
\end{aligned}
\]
where \(\bar D^{\dagger}=D^{\dagger}(\bar\theta)\) with \(\bar\theta\) on the line joining \(\hat\theta^{e}\) and \(\theta_{0}\), and \(\dddot D^{\dagger}=D^{\dagger}(\dddot\theta)\) with \(\dddot\theta\) on the line joining \(\hat\theta^{e\dagger}\) and \(\hat\theta^{e}\). Thus \(\sqrt{T}\,\bar D^{\dagger\prime}\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})=o_{B}(1)\), and
\[
\frac{2T}{k_{2}}\Big\{\bar D^{\dagger}\Big[\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}\Big]\Big\}'\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})=o_{B}(1),
\]
as \(\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}=O_{B}(1/\sqrt{T})\) and \(\hat\theta^{e\dagger}-\hat\theta^{e}_{r}=O_{B}(1/\sqrt{T})\).

Additionally notice that
\[
\frac{T}{k_{2}}\big\{h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})'\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})-h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})'\tilde\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})\big\}=\frac{T}{k_{2}}\,h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})'\big(\Sigma^{\dagger-1}-\tilde\Sigma^{\dagger-1}\big)h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})=o_{B}(1),
\]
as \(\Sigma^{\dagger-1}-\tilde\Sigma^{\dagger-1}=o_{B}(1)\); the result follows as we proved that \(h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})=O_{B}(1/\sqrt{T})\).

Also
\[
\sqrt{\tfrac{T}{k_{2}}}\,\bar D^{\dagger}\Big[\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}\Big]=-\sqrt{\tfrac{T}{k_{2}}}\,D\Big[\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}\Big]+o_{B}(1).
\]
Thus
\[
\begin{aligned}
-\sqrt{\tfrac{T}{k_{2}}}\,D\Big[\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}\Big]&=-\sqrt{\tfrac{T}{k_{2}}}\,D[D^{\dagger\prime}\Sigma^{\dagger-1}\tilde D^{\dagger}]^{-1}R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}\\
&\qquad\times R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\big(h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)+o_{B}(1),
\end{aligned}
\]
as
\[
\sqrt{T/k_{2}}\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}=-[D^{\dagger\prime}\Sigma^{\dagger-1}\bar D^{\dagger}]^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\Big[\sqrt{\tfrac{T}{k_{2}}}\,h^{a\dagger}(\theta_{0},0)-\sqrt{\tfrac{T}{k_{2}}}\,\tilde h^{a}_{T}(\theta_{0},0)\Big]+o_{B}(1),
\]
\[
\begin{aligned}
S_{1}\sqrt{T}(\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r})&=-[D^{\dagger\prime}\Sigma^{\dagger-1}\tilde D^{\dagger}]^{-1}\big[I-R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}\big]\\
&\qquad\times D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T}\big[h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]+o_{p}(1),
\end{aligned}
\]
and the facts that \(D^{\dagger}=D+o_{B}(1)\) by Lemma A.7, \(R(\hat\theta^{e\dagger}_{r})=R+o_{B}(1)\) by continuity of \(R(\cdot)\), and \(\Sigma^{\dagger-1}=\Sigma^{-1}+o_{p}(1)\). Thus
\[
\begin{aligned}
&\frac{T}{k_{2}}\Big\{\bar D^{\dagger}\Big[\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}\Big]\Big\}'\Sigma^{\dagger-1}\Big\{\bar D^{\dagger}\Big[\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}\Big]\Big\}\\
&\quad=\sqrt{T/k_{2}}\big(h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)'\Sigma^{\dagger-1}D^{\dagger}(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}\\
&\qquad\times R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\big(h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)+o_{B}(1),
\end{aligned}
\]
and the result follows as in the proof of Theorem 4.4.
References

Allen, J., Gregory, A. W. and K. Shimotsu (2011): "Empirical likelihood block bootstrapping," Journal of Econometrics, vol. 161(2), 110-121.
Andrews, D.W.K. (1991): "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation," Econometrica, vol. 59(3), 817-858.
Andrews, D.W.K. (2002): "Equivalence of the Higher Order Asymptotic Efficiency of k-step and Extremum Statistics," Econometric Theory, vol. 18(5), 1040-1085.
Andrews, D.W.K. and J.C. Monahan (1992): "An Improved Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimator," Econometrica, 60, 953-966.
Bravo, F. and F. Crudu (2011): "Efficient Bootstrap with Weakly Dependent Processes," forthcoming in Computational Statistics and Data Analysis.
Brown, B.W. and W.K. Newey (2002): "Generalized Method of Moments, Efficient Bootstrapping, and Improved Inference," Journal of Business and Economic Statistics, vol. 20(4), 507-517.
Burnside, C. and M.S. Eichenbaum (1996): "Small-Sample Properties of GMM-Based Wald Tests," Journal of Business and Economic Statistics, vol. 14(3), 294-308.
Carlstein, E. (1986): "The use of subseries values for estimating the variance of a general statistic from a stationary sequence," Annals of Statistics, 14, 1171-1179.
Cattaneo, M.D., R.K. Crump and M. Jansson (2010): "Bootstrapping Density-Weighted Average Derivatives," CREATES Research Papers 2010-23, School of Economics and Management, University of Aarhus.
Christiano, L.J. and W.J. den Haan (1996): "Small-Sample Properties of GMM for Business-Cycle Analysis," Journal of Business and Economic Statistics, vol. 14(3), 309-327.
D'Antona, G. and A. Ferrero (2005): Digital Signal Processing for Measurement Systems: Theory and Applications, Springer, New York, U.S.A.
Davidson, J.E.H. (1994): Stochastic Limit Theory. Oxford: Oxford University Press.
Davidson, R. and J.G. MacKinnon (1999): "Bootstrap Testing in Nonlinear Models," International Economic Review, vol. 40(2), 487-508.
Efron, B. (1979): "Bootstrap Methods: Another Look at the Jackknife," Annals of Statistics, 7(1), 1-26.
Gallant, R. (1987): Nonlinear Statistical Models, John Wiley and Sons, New York.
Gonçalves, S. and H. White (2004): "Maximum likelihood and the bootstrap for nonlinear dynamic models," Journal of Econometrics, vol. 119(1), 199-219.
Hahn, J. (1996): "A Note on Bootstrapping Generalized Method of Moments Estimators," Econometric Theory, vol. 12(1), 187-197.
Hall, A. (2005): Generalized Method of Moments, New York (NY): Oxford University Press.
Hall, P. (1992): The Bootstrap and Edgeworth Expansion, Springer-Verlag New York Inc.
Hall, P. and J.L. Horowitz (1996): "Bootstrap Critical Values for Tests Based on Generalized Method of Moments Estimators," Econometrica, 64, 891-916.
Hansen, L.P. (1982): "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, vol. 50(4), 1029-1054.
Inoue, A. and M. Shintani (2006): "Bootstrapping GMM estimators for time series," Journal of Econometrics, vol. 133(2), 531-555.
Künsch, H. (1989): "The jackknife and the bootstrap for general stationary observations," Annals of Statistics, 17, 1217-1241.
Liu, R. and K. Singh (1992): "Moving blocks jackknife and bootstrap capture weak dependence," in Exploring the Limits of Bootstrap, eds. LePage, R. and L. Billard, Wiley, New York, 225-248.
Newey, W.K. and D. McFadden (1994): "Large Sample Estimation and Hypothesis Testing," Handbook of Econometrics, Volume 4, R.F. Engle and D.L. McFadden (eds.), 2111-2245.
Newey, W.K. and K.D. West (1987): "A Simple, Positive Semi-definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix," Econometrica, 55, 703-708.
Newey, W.K. and K.D. West (1994): "Automatic Lag Selection in Covariance Matrix Estimation," Review of Economic Studies, vol. 61(4), 631-653.
Ng, S. and P. Perron (1996): "The Exact Error in Estimating the Spectral Density at the Origin," Journal of Time Series Analysis, 17, 379-408.
Paparoditis, E. and D.N. Politis (2001): "Tapered Block Bootstrap," Biometrika, vol. 88(4), 1105-1119.
Parente, P.M.D.C. and R.J. Smith (2018a): "Kernel Block Bootstrap," CWP 48/18, Centre for Microdata Methods and Practice, U.C.L. and I.F.S.
Parente, P.M.D.C. and R.J. Smith (2018b): "Quasi-Maximum Likelihood and The Kernel Block Bootstrap for Nonlinear Dynamic Models," working paper.
Politis, D. and J. Romano (1992b): "A general resampling scheme for triangular arrays of α-mixing random variables with application to the problem of spectral density estimation," Annals of Statistics, 20, 1985-2007.
Politis, D.N. and J.P. Romano (1995): "Bias-corrected nonparametric spectral estimation," Journal of Time Series Analysis, 16, 67-103.
Pollard, D. (1991): "Asymptotics for Least Absolute Deviation Regression Estimators," Econometric Theory, vol. 7, 186-199.
Ramalho, J.J.S. and R.J. Smith (2011): "Goodness of Fit Tests for Moment Conditions Models," working paper, Universidade de Évora.
Rao, C.R. (2002): Linear Statistical Inference and its Applications, Wiley.
Rao, C.R. and S.K. Mitra (1971): Generalized Inverse of Matrices and its Applications. New York: Wiley.
Royden, H.L. (1988): Real Analysis, 3rd ed., Macmillan.
Ruud, P. (2000): An Introduction to Classical Econometric Theory, Oxford University Press.
Serfling, R. (2002): Approximation Theorems of Mathematical Statistics, New York: Wiley.
Smith, R.J. (1997): "Alternative Semi-parametric Likelihood Approaches to Generalised Method of Moments Estimation," Economic Journal, vol. 107(441), 503-519.
Smith, R.J. (2011): "GEL Criteria for Moment Condition Models," Econometric Theory, 27, 1192-1235.
Tauchen, G. (1985): "Diagnostic Testing and Evaluation of Maximum Likelihood Models," Journal of Econometrics, 30, 415-443.
White, H. (1984): Asymptotic Theory for Econometricians. Academic Press.
White, H. (1999): Asymptotic Theory for Econometricians, 2nd ed. Academic Press.
White, H. and X. Chen (1996): "Laws of Large Numbers for Hilbert Space-Valued Mixingales with Applications," Econometric Theory, 12, 284-304.
Wooldridge, J. (1994): "Estimation and Inference for Dependent Processes," in Handbook of Econometrics, Volume 4, R.F. Engle and D.L. McFadden (eds.), 2639-2738.