REM WORKING PAPER SERIES
Generalised Empirical Likelihood Kernel Block Bootstrapping
Paulo M.D.C. Parente, Richard J. Smith
REM Working Paper 055-2018
November 2018
REM – Research in Economics and Mathematics, Rua Miguel Lúpi 20, 1249-078 Lisboa, Portugal
ISSN 2184-108X
Any opinions expressed are those of the authors and not those of REM. Short excerpts, up to two paragraphs, may be cited provided that full credit is given to the authors.
Generalised Empirical Likelihood
Kernel Block Bootstrapping
Paulo M.D.C. Parente, ISEG – Lisbon School of Economics & Management, Universidade de Lisboa; REM – Research in Economics and Mathematics; CEMAPRE – Centro de Matemática Aplicada à Previsão e Decisão Económica.
Richard J. Smith, cemmap, U.C.L. and I.F.S.; Faculty of Economics, University of Cambridge; Department of Economics, University of Melbourne; ONS Economic Statistics Centre of Excellence.
This Draft: October 2018
Abstract
This article unveils how the kernel block bootstrap method of Parente and Smith (2018a, 2018b) can be applied to make inferences on parameters of models defined through moment restrictions. Bootstrap procedures that resort to generalised empirical likelihood implied probabilities to draw observations are also introduced. We prove the first-order asymptotic validity of bootstrapped test statistics for overidentifying moment restrictions, parametric restrictions and additional moment restrictions. Resampling methods based on such probabilities were shown to be efficient by Brown and Newey (2002). A set of simulation experiments reveals that the statistical tests based on the proposed bootstrap methods perform better than those that rely on first-order asymptotic theory.
JEL Classification: C14, C15, C32
Keywords: Bootstrap; heteroskedastic and autocorrelation consistent inference; Generalised Method of Moments; Generalised Empirical Likelihood
1 Introduction
The objective of this article is to propose new bootstrap methods for models defined through moment restrictions in the time-series context, using a novel bootstrap method introduced recently by Parente and Smith (2018a, 2018b). At the same time, we amend some of the existing results in the related literature.
The generalized method of moments (GMM) estimator of Hansen (1982) has become one of the most popular tools in econometrics due to its applicability in different and varied situations. It can be used, for instance, to estimate parameters of interest under endogeneity and measurement error. Consequently, the rich set of inferential statistics provided by GMM can be extremely useful to economists doing empirical work. These statistics allow one to test for overidentifying moment conditions, parametric restrictions and additional moment conditions.
The performance of statistics based on GMM has been revealed to be poor in finite samples, and this situation worsens in time-series data due to the presence of autocorrelation [see Newey and West (1994), Burnside and Eichenbaum (1996), Christiano and den Haan (1996), among others]. To tackle this problem several alternative approaches have been proposed in the literature, the bootstrap being among the methods that have produced better results. The bootstrap is a resampling method introduced by Efron (1979) to make inferences on parameters of interest. It can be used not only to approximate the (asymptotic) distribution of an estimator or statistic, but also to estimate its variance. From the practical standpoint it has the benefit of not requiring the application of complicated formulae, and from the theoretical viewpoint it makes it possible to obtain asymptotic refinements when the statistic of interest is smooth and asymptotically pivotal.
Bootstrap methods in the context of moment restrictions have been introduced previously by Hahn (1996) and Brown and Newey (2002) for random samples, and by Hall and Horowitz (1996), Andrews (2002), Inoue and Shintani (2006), Allen et al. (2011) and Bravo and Crudu (2011) for dependent data. This literature can be divided into two strands.
Hahn (1996) proves consistency of the i.i.d. bootstrap distribution for GMM, but does not consider bootstrapped test statistics based on GMM. Hall and Horowitz (1996), Andrews (2002) and Inoue and Shintani (2006) propose the use of the standard moving blocks bootstrap applied to GMM. A second line of research is followed by Brown and Newey (2002), Allen et al. (2011) and Bravo and
Crudu (2011) who use empirical likelihood and generalised empirical likelihood implied probabilities to
draw observations or blocks of data.
Hall and Horowitz (1996) suggested applying the non-overlapping blocks bootstrap method of Carlstein (1986) to GMM after centering the bootstrap moment restrictions at their sample means. They prove that this method yields asymptotic refinements not only for the bootstrapped J statistic of Hansen (1982), but also for the bootstrapped t statistic for testing a single parametric restriction. Andrews (2002) extends the Hall and Horowitz (1996) method to the overlapping moving blocks bootstrap of Künsch (1989) and Liu and Singh (1992) and the k-step bootstrap of Davidson and MacKinnon (1999). However, Hall and Horowitz (1996) and Andrews (2002) require uncorrelatedness of the moment indicators after a certain number of lags. This assumption is relaxed by Inoue and Shintani (2006) in the special case of linear models estimated using instruments.
Brown and Newey (2002), in the i.i.d. setting, mention, though without a formal proof, that the same improvements can be obtained by using a method that they denominate the empirical likelihood (EL) bootstrap. The EL bootstrap consists in first computing the empirical likelihood implied probabilities associated with each observation under a set of moment restrictions, and then using these probabilities to draw each observation in order to construct the bootstrap samples. Although Brown and Newey (2002) did not prove the asymptotic validity of the method, they showed heuristically that it is efficient in the sense that the difference between the finite sample distribution of a statistic and its EL bootstrap counterpart is asymptotically normal (after proper scaling) with minimum variance. Recently the EL bootstrap method was extended to the time-series context by Allen et al. (2011) and Bravo and Crudu (2011) using a moving blocks bootstrap (MBB) procedure. Both articles suggest first computing implied probabilities for blocks of observations and then using these probabilities to draw blocks in order to construct the bootstrap samples.
There are some differences between these two articles. Firstly, while Allen et al. (2011) consider EL implied probabilities, Bravo and Crudu (2011) use the generalised empirical likelihood (GEL) implied probabilities of Smith (2011). Secondly, Allen et al. (2011) propose using both non-overlapping blocks and overlapping blocks, whereas Bravo and Crudu (2011) only study the latter. Thirdly, Allen et al. (2011) investigate the first-order validity of the method for general GMM estimators, while Bravo and Crudu (2011) consider only the efficient GMM estimator. Both articles address the first-order asymptotic behaviour of the bootstrapped J statistic and of bootstrapped Wald (W) statistics for tests of parametric restrictions. Finally, in the case of tests of parametric restrictions, Bravo and Crudu (2011) additionally propose drawing bootstrap samples based on the GEL implied probabilities computed under the null hypothesis and the moment restrictions, and put forward the bootstrapped Lagrange multiplier (LM) and distance (D) statistics in this framework.
In this article we also consider a time-series setting, but depart from the dominant paradigm of bootstrap methods based on moving blocks and introduce an alternative to these resampling schemes based on the kernel block bootstrap (KBB) method of Parente and Smith (2018a, 2018b). The KBB method consists in transforming the data using weighted moving averages of all observations and drawing bootstrap samples with replacement from the transformed sample. This method is akin to the tapered block bootstrap (TBB) method of Paparoditis and Politis (2001) in that, if the kernel chosen has bounded support, the KBB method can be seen as a variant of TBB that allows the inclusion of incomplete blocks. However, KBB can also be implemented using kernels with unbounded support. In the case of the sample mean, and for a particular choice of kernel with unbounded support, it yields a bootstrap variance estimator that is asymptotically equivalent to the quasi-spectral estimator of the long-run variance, which Andrews (1991) proved to be optimal. Additionally, the technical assumptions required by Paparoditis and Politis (2001) to prove the asymptotic validity of TBB are not satisfied by truncated kernels that are non-monotonic in the positive quadrant, such as the flat-top cosine windows described in D'Antona and Ferrero (2006, p. 40), while KBB can be applied using such kernels. We note, however, that both TBB and KBB allow the most popular truncated kernels to be used, such as the rectangular, Bartlett and Tukey-Hanning kernels.
We use the new method to approximate the asymptotic distribution of the J statistic of Hansen (1982), which allows one to test the overidentifying moment restrictions, and the trinity of test statistics (Wald, Lagrange multiplier and distance statistics, cf. Newey and McFadden, 1994, section 9, and Ruud, 2000, chapter 22) that permit testing parametric restrictions and additional moment conditions. We show that the first-order validity of the bootstrap test for overidentifying conditions does not require prior centering of the bootstrap moments; this centering can be done a posteriori.
In the spirit of Brown and Newey (2002), we additionally propose using the GEL implied probability associated with each transformed observation [Smith, 2011] to construct the bootstrap sample. We prove the first-order validity of the method and the corresponding test statistics. As in Allen et al. (2011) and Bravo and Crudu (2011), we prove the first-order validity of the bootstrapped distribution of the estimator and the bootstrapped J statistic, and of tests for parametric restrictions and additional moment conditions.
We show in this article that the proof of consistency of the EL block bootstrap of Allen et al. (2011) is in error, in that when applied to the inefficient GMM estimator the bootstrap distribution of the latter has to be centered at the efficient GMM estimator. Hence the results stated in their Theorems 1 and 2 are invalid in general, though they hold if the weighting matrix is a consistent estimator of the inverse of the covariance matrix of the moment indicators [cf. Theorem 1 of Bravo and Crudu (2010)]. Although our proof of this result applies only to the new bootstrap methods introduced in this article, the demonstration for EL block bootstrapping is analogous.
When testing for parametric restrictions and additional moment conditions, the GEL implied probabilities can be computed under the null or under the maintained hypothesis. Hence, two types of KBB bootstrap methods can be used: one using the GEL implied probabilities computed under the maintained hypothesis, as in Brown and Newey (2002) and Allen et al. (2011), and another based on these probabilities computed under the null, as suggested in the case of parametric restrictions by Bravo and Crudu (2011). This article investigates these two types of bootstrap methods. We note that Allen et al. (2011), in the case of the EL block bootstrap, actually do not present the formula of the bootstrapped Wald statistic, though their Theorem 3, which is based on their incorrect Theorems 1 and 2, refers to it. On the other hand, the formula for this statistic presented in Bravo and Crudu (2011) is only valid if the implied probabilities are computed under the maintained hypothesis and not under the null hypothesis, though it is presented jointly with the LM and D statistics, which are obtained with the implied probabilities computed under the null. We show that the trinity of test statistics can be computed using implied probabilities obtained under the null and under the maintained hypothesis, and that they have different mathematical expressions depending on the resampling scheme chosen.
This paper is organized as follows. In section 2 we summarize some important results on GMM and GEL in the time-series context. The KBB method is briefly explained in section 3. In section 4 we present the first-order asymptotic theory for the KBB methods computed using the following different probabilities to draw observations: uniform (the standard non-parametric KBB method), the implied probabilities associated with the moment restrictions, and the implied probabilities associated with the maintained hypothesis, parametric restrictions and additional moment conditions. In section 5 we present a Monte Carlo study in which we investigate the performance of the proposed bootstrap methods in finite samples. Finally, section 6 concludes. The proofs of the results are given in the Appendix.
2 Framework
Let $z_t$, $(t = 1, \ldots, T)$, denote observations on a finite dimensional (strictly) stationary process $\{z_t\}_{t=1}^{\infty}$. We assume initially that the process is ergodic, but later we will require the stronger condition of mixing. Consider the moment indicator $g(z_t, \beta)$, an $m$-vector of functions of the data observation $z_t$ and the $p$-vector $\beta$ of unknown parameters which are the object of inferential interest, where $m \ge p$. It is assumed that the true parameter vector $\beta_0$ uniquely satisfies the moment condition
$$E[g(z_t, \beta_0)] = 0,$$
where $E[\cdot]$ denotes expectation taken with respect to the unknown distribution of $z_t$.
2.1 The Generalized Method of Moments estimator
2.1.1 The Estimator
For notational simplicity we define $g_t(\beta) \equiv g(z_t, \beta)$, $(t = 1, \ldots, T)$, and $\bar g(\beta) \equiv \sum_{t=1}^{T} g_t(\beta)/T$; let also $G_t(\beta) \equiv \partial g_t(\beta)/\partial\beta'$, $(t = 1, \ldots, T)$, $G \equiv E[G_t(\beta_0)]$ and $\Omega \equiv \lim_{T\to\infty}\mathrm{var}[\sqrt{T}\,\bar g(\beta_0)]$. Denote by $\hat W$ a symmetric weighting matrix that converges in probability to a non-random matrix $W$. The GMM estimator is defined as
$$\hat\beta = \arg\min_{\beta\in B}\hat Q_T(\beta), \qquad \hat Q_T(\beta) = \bar g(\beta)'\,\hat W\,\bar g(\beta).$$
Hansen (1982) showed that under some regularity conditions $\hat\beta \xrightarrow{p} \beta_0$ and
$$\sqrt{T}(\hat\beta - \beta_0) \xrightarrow{d} N(0, \mathrm{avar}(\hat\beta)),$$
where $\xrightarrow{p}$ and $\xrightarrow{d}$ denote convergence in probability and in distribution respectively and
$$\mathrm{avar}(\hat\beta) = (G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}.$$
Denote $\Sigma \equiv (G'\Omega^{-1}G)^{-1}$ and $\bar G(\beta) = \sum_{t=1}^{T} G_t(\beta)/T$, $\hat G = \bar G(\hat\beta)$. Hansen (1982) also proved that the most efficient GMM estimator $\hat\beta^e$ is obtained when we set $W = \Omega^{-1}$, in which case $\mathrm{avar}(\hat\beta^e) = \Sigma$.
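As a concrete illustration of the estimator just defined, the following sketch computes first-step and efficient two-step GMM estimates in a simple overidentified linear instrumental-variables design. It is a minimal i.i.d. simulation, not the paper's time-series setting; the data-generating process, the grid minimization and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, beta0 = 2000, 1.5
z = rng.standard_normal((T, 2))           # two instruments (m = 2, p = 1)
u = rng.standard_normal(T)
x = z @ np.array([1.0, 0.5]) + 0.8 * u    # endogenous regressor
y = x * beta0 + u

def gbar(b):
    """Sample moment vector g-bar(beta) = sum_t z_t (y_t - x_t b) / T."""
    return z.T @ (y - x * b) / T

def gmm(b_grid, W):
    """Grid-minimize Q_T(beta) = gbar(b)' W gbar(b); a closed form exists
    for the linear case, but the grid keeps the sketch generic."""
    q = [gbar(b) @ W @ gbar(b) for b in b_grid]
    return b_grid[int(np.argmin(q))]

grid = np.linspace(0.5, 2.5, 2001)
b1 = gmm(grid, np.eye(2))                 # first step, W = I
u_hat = y - x * b1
m = z * u_hat[:, None]
Omega = m.T @ m / T                       # i.i.d. estimate of Omega
b2 = gmm(grid, np.linalg.inv(Omega))      # efficient step, W = Omega^{-1}
```

Both steps are consistent; the second step merely reweights the moments by the inverse of the estimated moment covariance, mirroring the choice $W = \Omega^{-1}$ above.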
We consider the following regularity conditions, which are sufficient to prove consistency.

Assumption 2.1 (i) The observed data are realizations of a stochastic process $z \equiv \{z_t : \Omega \to \mathbb{R}^n, n \in \mathbb{N}, t = 1, 2, \ldots\}$ on the complete probability space $(\Omega, \mathcal{F}, P)$, where $\Omega = \times_{t=1}^{\infty}\mathbb{R}^n$ and $\mathcal{F} = \mathcal{B}(\times_{t=1}^{\infty}\mathbb{R}^n)$ (the Borel $\sigma$-field generated by the measurable finite dimensional product cylinders); (ii) $z_t$ is stationary and ergodic; (iii) $g(\cdot, \beta)$ is Borel measurable for each $\beta \in B$ and $g(z_t, \cdot)$ is continuous on $B$ for each $z_t \in \mathcal{Z}$; (iv) $E[\sup_{\beta\in B}\|g(z_t, \beta)\|] < \infty$; (v) $E[g(z_t, \beta)]$ is continuous on $B$; (vi) $E[g(z_t, \beta)] = 0$ only for $\beta = \beta_0$; (vii) $B$ is compact; (viii) $\hat W = W + o_p(1)$ and $W$ is a positive semi-definite matrix.

The following theorem corresponds to Theorem 3.1 of Hall (2005, p. 68).

Theorem 2.1 Under Assumption 2.1, $\hat\beta = \beta_0 + o_p(1)$.

Assumption 2.2 ensures that the estimator is asymptotically normally distributed.¹
Assumption 2.2 (i) $\{z_t, -\infty < t < \infty\}$ is a strong mixing process with mixing coefficients of size $-r/(r-2)$, $r > 2$, and $E[\|g(z_t, \beta_0)\|^{r}] < \infty$, $r \ge 2$; (ii) $G_t(\beta)$ exists and is continuous on $B$ for each $z_t \in \mathcal{Z}$; (iii) $\mathrm{rank}(G) = p$; (iv) $E[\sup_{\beta\in N}\|G_t(\beta)\|] < \infty$, where $N$ is a neighborhood of $\beta_0$.

The following theorem is proven in Hansen (1982, Theorem 3.1) and Hall (2005, p. 71).

Theorem 2.2 Under Assumptions 2.1 and 2.2,
$$\sqrt{T}(\hat\beta - \beta_0) \xrightarrow{d} N(0, \mathrm{avar}(\hat\beta)),$$
where $\mathrm{avar}(\hat\beta) = (G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}$.

To obtain an efficient estimator we need to estimate $\Omega$. Numerous estimators of $\Omega$ have been proposed in the literature under different assumptions [see White (1984), Newey and West (1987), Gallant (1987), Andrews (1991), Ng and Perron (1996)]. Let $\hat\Omega = \Omega + o_p(1)$; the efficient two-step GMM estimator is defined as
$$\hat\beta^e = \arg\min_{\beta\in B}\tilde Q_T(\beta), \qquad \tilde Q_T(\beta) = \bar g(\beta)'\,\hat\Omega^{-1}\,\bar g(\beta).$$
¹These assumptions are different from those stated in Hansen (1982), but facilitate comparisons with the assumptions made later in the paper for GEL and KBB.
Overidentification tests Consider the hypothesis $H_0: E[g_t(\beta_0)] = 0$ vs. $H_1: E[g_t(\beta_0)] \neq 0$. Hansen (1982) proposed the $\mathcal{J}$ statistic to test this hypothesis, defined as
$$\mathcal{J} = T\,\bar g(\hat\beta^e)'\,\hat\Omega^{-1}\,\bar g(\hat\beta^e),$$
where $\hat\Omega$ is a consistent estimator of $\Omega$. Hansen (1982, Lemma 4.2) proved the following theorem:

Theorem 2.3 Under Assumptions 2.1 and 2.2 and if $m > p$, $\mathcal{J} \xrightarrow{d} \chi^2(m - p)$.
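The $\mathcal{J}$ statistic can be computed directly once an efficient estimate is available. A minimal sketch, again assuming a simple i.i.d. linear IV design (the closed-form linear GMM solution and the design are illustrative choices, not the paper's time-series setting):

```python
import numpy as np

rng = np.random.default_rng(1)
T, beta0 = 4000, 1.0
z = rng.standard_normal((T, 3))        # m = 3 instruments, p = 1
u = rng.standard_normal(T)
x = z.sum(axis=1) + u
y = x * beta0 + u

def gmm_linear(W):
    """Closed-form linear GMM: minimizes gbar(b)' W gbar(b)
    for g_t(b) = z_t (y_t - x_t b)."""
    zx, zy = z.T @ x / T, z.T @ y / T
    return (zx @ W @ zy) / (zx @ W @ zx)

b1 = gmm_linear(np.eye(3))             # first step
m1 = z * (y - x * b1)[:, None]
Omega = m1.T @ m1 / T                  # i.i.d. estimate of Omega
W_eff = np.linalg.inv(Omega)
b_e = gmm_linear(W_eff)                # efficient step
gbar = z.T @ (y - x * b_e) / T
J = T * gbar @ W_eff @ gbar            # Hansen's J; chi^2(m - p) = chi^2(2) under H0
```

Under correct specification the realized $\mathcal{J}$ should be an unremarkable draw from a $\chi^2(2)$ distribution, which is what a test at conventional levels checks.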
Specification Tests Here we consider tests of the null hypothesis
$$H_0: a(\beta_0) = 0, \qquad E[q(z_t, \beta_0)] = 0,$$
where $q(z_t, \beta_0)$ is an $s$-vector of moment indicators and $a(\beta)$ is an $r$-vector of constraints. The alternative $H_1$ is $a(\beta_0) \neq 0$ and/or $E[q(z_t, \beta_0)] \neq 0$.
In the context of GMM, test statistics for parametric restrictions were proposed by Newey and West (1987), and for additional moment restrictions by Newey (1985), Eichenbaum et al. (1988) and Ruud (2000) [see also Smith (1997) for tests based on GEL].
In order to introduce these statistics define $h(z_t, \beta) \equiv (g(z_t, \beta)', q(z_t, \beta)')'$, $q_t(\beta) \equiv q(z_t, \beta)$, $h_t(\beta) \equiv h(z_t, \beta)$ $(t = 1, \ldots, T)$, $\bar h(\beta) \equiv \sum_{t=1}^{T} h_t(\beta)/T$ and $\bar q(\beta) \equiv \sum_{t=1}^{T} q_t(\beta)/T$. Let also $\Lambda \equiv \lim_{T\to\infty}\mathrm{var}[\sqrt{T}\,\bar h(\beta_0)]$, $\Lambda_{12} \equiv \lim_{T\to\infty} E[T\,\bar g(\beta_0)\bar q(\beta_0)']$ and $\Lambda_{22} \equiv \lim_{T\to\infty}\mathrm{var}[\sqrt{T}\,\bar q(\beta_0)]$. Denote by $\hat\Lambda$ a consistent estimator of $\Lambda$ and let $\hat\Lambda_{12}$ and $\hat\Lambda_{22}$ be the submatrices of $\hat\Lambda$ that consistently estimate $\Lambda_{12}$ and $\Lambda_{22}$ respectively. Let also
$$R(\beta) \equiv \begin{pmatrix} A(\beta) & 0_{r\times s}\\ 0_{s\times p} & I_s\end{pmatrix},$$
where $A(\beta) \equiv \partial a(\beta)/\partial\beta'$ (an $r\times p$ matrix). The restricted efficient GMM estimator is defined as
$$\hat\beta^e_r = \arg\min_{\beta\in B_r}\bar Q_T(\beta), \qquad \bar Q_T(\beta) = \bar h(\beta)'\,\hat\Lambda^{-1}\,\bar h(\beta),$$
where $B_r = \{\beta\in B : a(\beta) = 0\}$. Let $\hat u \equiv \bar q(\hat\beta^e) - \hat\Lambda_{21}\hat\Lambda_{11}^{-1}\bar g(\hat\beta^e)$, $\hat r \equiv (a(\hat\beta^e)', \hat u')'$ and $\hat R \equiv R(\hat\beta^e)$. Define also $Q_t(\beta) \equiv \partial q_t(\beta)/\partial\beta'$, $\bar Q(\beta) \equiv \sum_{t=1}^{T} Q_t(\beta)/T$ and $Q \equiv E[\partial q_t(\beta_0)/\partial\beta']$. Let $\Phi \equiv (D'\Lambda^{-1}D)^{-1}$ and $\hat\Phi \equiv (\hat D'\hat\Lambda^{-1}\hat D)^{-1}$, where
$$D = \begin{pmatrix} G & 0_{m\times s}\\ Q & -I_s\end{pmatrix}, \qquad \bar D(\beta) = \begin{pmatrix}\bar G(\beta) & 0_{m\times s}\\ \bar Q(\beta) & -I_s\end{pmatrix},$$
and $\hat D = \bar D(\hat\beta^e)$.
We consider the following versions of the Wald, score and distance statistics:
$$\mathcal{W} = T\,\hat r'(\hat R\hat\Phi\hat R')^{-1}\hat r,$$
$$\mathcal{S} = T\,\bar h(\hat\beta^e_r)'\hat\Lambda^{-1}\hat D\hat\Phi\hat D'\hat\Lambda^{-1}\bar h(\hat\beta^e_r),$$
$$\mathcal{D} = T\,[\bar h(\hat\beta^e_r)'\hat\Lambda^{-1}\bar h(\hat\beta^e_r) - \bar g(\hat\beta^e)'\hat\Omega^{-1}\bar g(\hat\beta^e)].$$
The results of Newey and West (1987), Newey (1985), Eichenbaum et al. (1988) and Ruud (2000) are summarized in the following theorem, which is proven in the Appendix for completeness. We require the following additional assumptions to hold.

Assumption 2.3 (i) $\beta_0$ is the unique solution of $E[h_t(\beta)] = 0$ and $a(\beta) = 0$; (ii) $q(\cdot, \beta)$ is Borel measurable for each $\beta\in B$ and $q_t(\beta)$ is continuous in $\beta$ for each $z_t \in \mathcal{Z}$; (iii) $a(\beta)$ is twice continuously differentiable on $B$; (iv) $E[\|q(z_t, \beta_0)\|^{r}] < \infty$, $r \ge 2$; (v) $Q_t(\beta)$ exists and is continuous on $B$ for each $z_t \in \mathcal{Z}$; (vi) $\mathrm{rank}(Q) = s$; (vii) $E[\sup_{\beta\in N}\|Q_t(\beta)\|] < \infty$; (viii) $\Lambda$ is non-singular and $\hat\Lambda = \Lambda + o_p(1)$.

Theorem 2.4 unveils the asymptotic distribution of the trinity of test statistics.

Theorem 2.4 Under Assumptions 2.1, 2.2 and 2.3 the statistics $\mathcal{W}$, $\mathcal{S}$ and $\mathcal{D}$ are asymptotically equivalent and converge in distribution to $\chi^2(s + r)$.
2.1.2 Generalised Empirical Likelihood
In this section we review the efficient GEL estimator for time series proposed by Smith (2011). Consider the smoothed moment indicators
$$g_{tT}(\beta) = \frac{1}{S_T}\sum_{s=t-T}^{t-1} k\!\left(\frac{s}{S_T}\right) g_{t-s}(\beta), \quad t = 1, \ldots, T,$$
where the kernel function $k(\cdot)$ satisfies $\int_{-\infty}^{+\infty} k(a)\,da = 1$ and $S_T$ is a bandwidth parameter. Define $k_2 \equiv \int_{-\infty}^{+\infty} k(a)^2\,da$.
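The smoothed moments can be computed by a direct loop over $t$. A minimal NumPy sketch, assuming the normalized Bartlett kernel $k(v) = \max(1 - |v|, 0)$ as an illustrative choice (the function name and array layout are hypothetical):

```python
import numpy as np

def smooth_moments(g, s_t, kernel=lambda v: np.maximum(1.0 - np.abs(v), 0.0)):
    """g_tT = (1/S_T) * sum_{s=t-T}^{t-1} k(s/S_T) * g_{t-s}, for a T-by-m
    array g of moment indicators g_t(beta) evaluated at some fixed beta."""
    T = g.shape[0]
    out = np.zeros_like(g, dtype=float)
    for t in range(1, T + 1):            # 1-based t, as in the text
        s = np.arange(t - T, t)          # s = t-T, ..., t-1
        w = kernel(s / s_t) / s_t
        out[t - 1] = w @ g[(t - 1) - s]  # rows g_{t-s}
    return out
```

With a bounded-support kernel such as the Bartlett, each smoothed observation is a weighted local average of roughly $2 S_T$ neighbouring moment indicators, which is what makes the transformed sample carry the serial dependence.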
Let $\rho(v)$ be a function that is concave on its domain $\mathcal{V}$, an open interval containing zero. It is convenient to impose a normalization on $\rho(\cdot)$. Let $\rho_j(v) = \partial^j\rho(v)/\partial v^j$ and $\rho_j = \rho_j(0)$, $(j = 0, 1, 2, \ldots)$. We normalize this function so that $\rho_1 = \rho_2 = -1$. The GEL criterion for weakly dependent data was defined by Smith (2011) as
$$\hat P_T(\beta, \lambda) = \sum_{t=1}^{T}[\rho(k\lambda' g_{tT}(\beta)) - \rho_0]/T,$$
where $k = 1/k_2$. The GEL estimator is
$$\hat\beta_{gel} = \arg\min_{\beta\in B}\,\sup_{\lambda\in\Lambda_T}\,\hat P_T(\beta, \lambda),$$
where $\Lambda_T$ is defined below in Assumption 2.8. Let $\hat\lambda(\beta) = \arg\sup_{\lambda\in\Lambda_T}\hat P_T(\beta, \lambda)$, $\hat\lambda \equiv \hat\lambda(\hat\beta_{gel})$ and $G_{tT}(\beta) \equiv \partial g_{tT}(\beta)/\partial\beta'$.
Smith (2011) defined the implied probabilities as
$$\pi_t(\beta) = \frac{\rho_1(k\hat\lambda(\beta)' g_{tT}(\beta))}{\sum_{t=1}^{T}\rho_1(k\hat\lambda(\beta)' g_{tT}(\beta))}, \quad t = 1, \ldots, T.$$
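The implied probabilities can be sketched for a concrete member of the GEL family. Below, the exponential-tilting choice $\rho(v) = -\exp(v)$, which satisfies the normalization $\rho_1 = \rho_2 = -1$ at zero, is used purely as an illustration; the function name and inputs are hypothetical.

```python
import numpy as np

def implied_probs(g_smooth, lam, k):
    """pi_t = rho1(k * lam' g_tT) / sum_t rho1(k * lam' g_tT), using the
    exponential-tilting member rho(v) = -exp(v), so rho1(v) = -exp(v)
    (illustrative; any rho with rho1 = rho2 = -1 at zero fits the text)."""
    v = k * (g_smooth @ lam)   # k * lambda' g_tT for each t
    rho1 = -np.exp(v)
    return rho1 / rho1.sum()
```

At $\lambda = 0$ the weights reduce to the uniform $1/T$; a nonzero $\hat\lambda$ tilts mass toward observations whose smoothed moments better conform to the restrictions.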
Smith (2011) required the following assumptions to hold.

Assumption 2.4 The finite dimensional stochastic process $\{z_t\}_{t=1}^{\infty}$ is stationary and strong mixing with mixing coefficients $\alpha$ of size $-3v/(v-1)$ for some $v > 1$.

Remark 2.1 The mixing coefficient condition in Assumption 2.4 guarantees that $\sum_{j=1}^{\infty} j^2\alpha(j)^{(v-1)/v} < \infty$, see Andrews (1991, p. 824), a condition required for the results in Smith (2011).

Assumption 2.5 (i) $S_T \to \infty$ and $S_T/T^{1/2} \to 0$; (ii) $k(\cdot) : \mathbb{R} \to [-k_{\max}, k_{\max}]$, $k_{\max} < \infty$, $k(0) \neq 0$, $k_1 \neq 0$, and $k(\cdot)$ is continuous at zero and almost everywhere; (iii) $\int_{-\infty}^{\infty}\bar k(x)\,dx < \infty$, where $\bar k(x) = I(x \ge 0)\sup_{y\ge x}|k(y)| + I(x < 0)\sup_{y\le x}|k(y)|$; (iv) $K(\phi) \ge 0$ for all $\phi \in \mathbb{R}$, where $K(\phi) = (2\pi)^{-1}\int k(x)\exp(-ix\phi)\,dx$.

Assumption 2.6 $T \to \infty$ and $S_T = O(T^{1/2-\eta})$ for some $\eta \in (0, 1/2)$.

Assumption 2.7 (i) $\beta_0 \in B$ is the unique solution of $E[g_t(\beta)] = 0$; (ii) $B$ is compact; (iii) $g_t(\beta)$ is continuous at each $\beta \in B$; (iv) $E[\sup_{\beta\in B}\|g_t(\beta)\|^{\alpha}] < \infty$ for some $\alpha > \max(4v, 1/\eta)$; (v) $\Omega(\beta)$ is finite and positive definite for all $\beta \in B$.

Assumption 2.8 (i) $\rho(v)$ is twice differentiable and concave on its domain, an open interval $\mathcal{V}$ containing zero, with $\rho_1 = \rho_2 = -1$; (ii) $\hat\lambda \in \Lambda_T$, where $\Lambda_T = \{\lambda : \|\lambda\| \le D(T/S_T^2)^{-\xi}\}$ for some $D > 0$, with $1/2 > \xi > 1/(2\alpha)$.

Theorem 2.5 is proven in Smith (2011).

Theorem 2.5 If Assumptions 2.4, 2.6, 2.7 and 2.8 are satisfied, $\hat\beta_{gel} \xrightarrow{p} \beta_0$ and $\hat\lambda \xrightarrow{p} 0$. Moreover, $\hat\lambda = O_p[(T/S_T^2)^{-1/2}]$ and $\bar g_T(\hat\beta_{gel}) = O_p(T^{-1/2})$.
Let $H \equiv \Sigma G'\Omega^{-1}$ and $P \equiv \Omega^{-1} - \Omega^{-1}G\Sigma G'\Omega^{-1}$. The proof of asymptotic normality in Smith (2011) also required the following assumptions.

Assumption 2.9 (i) $\beta_0 \in \mathrm{int}(B)$; (ii) $g(\cdot, \beta)$ is differentiable in a neighborhood $N$ of $\beta_0$ and $E[\sup_{\beta\in N}\|G_t(\beta)\|^{\alpha/(\alpha-1)}] < \infty$; (iii) $\mathrm{rank}(G) = p$.

Smith (2011) proved the following theorem.

Theorem 2.6 If Assumptions 2.4, 2.6, 2.7, 2.8 and 2.9 are satisfied,
$$\begin{pmatrix} T^{1/2}(\hat\beta_{gel} - \beta_0)\\ T^{1/2}\hat\lambda/S_T\end{pmatrix} \xrightarrow{d} N(0, \mathrm{diag}(\Sigma, P)).$$
3 The kernel block bootstrap method
The idea behind the KBB method is to replace the original sample by a transformed sample and apply the i.i.d. bootstrap to the latter. To be more precise, consider a sample of $T$ observations, $(X_1, \ldots, X_T)$, on the zero-mean finite dimensional stationary and strong mixing stochastic process $\{X_t\}_{t=1}^{\infty}$ with $E[X_t] = 0$. Let $\bar X = \sum_{t=1}^{T} X_t/T$. Define the transformed variables
$$Y_{tT} = \frac{1}{S_T}\sum_{s=t-T}^{t-1} k\!\left(\frac{s}{S_T}\right)X_{t-s}, \quad (t = 1, \ldots, T),$$
where $S_T$ is a bandwidth parameter and $k(\cdot)$ is a kernel function standardized such that $\int_{-\infty}^{\infty} k(v)\,dv = 1$.

The standard KBB method consists in applying the non-parametric bootstrap for i.i.d. data to the transformed sample $(Y_{1T}, \ldots, Y_{TT})$, obtaining a bootstrap sample of size $m_T = T/S_T$; that is, each bootstrap observation is drawn from $(Y_{1T}, \ldots, Y_{TT})$ with equal probability $1/T$. The asymptotic validity of the method was proven by Parente and Smith (2018a, 2018b).

In this article we modify the original method in that each observation is drawn with probability $P[Y^*_{jT} = Y_{tT}] = p_{tT}$, for $j = 1, \ldots, m_T$ and $t = 1, \ldots, T$, where $p_{tT}$ can depend on the data and satisfies $0 \le p_{tT} \le 1$ and $\sum_{t=1}^{T} p_{tT} = 1$. The standard KBB method of Parente and Smith (2018a, 2018b) is obtained with $p_{tT} = 1/T$ for $j = 1, \ldots, m_T$ and $t = 1, \ldots, T$. Let $\tilde Y = \sum_{t=1}^{T} p_{tT} Y_{tT}$.
In order to prove that the bootstrap distribution of $\sqrt{T}(\bar Y^* - \tilde Y)$ is close to the asymptotic distribution of $T^{1/2}\bar X$ as $T$ goes to infinity, we require the following assumptions, taken from Parente and Smith (2018a, 2018b).
Assumption 3.1 The finite dimensional stochastic process $\{X_t\}_{t=1}^{\infty}$ is stationary and strong mixing with mixing coefficients $\alpha$ of size $-3v/(v-1)$ for some $v > 1$.

Assumption 3.2 (i) $m_T = T/S_T$, $S_T \to \infty$, $S_T = O(T^{1/2-\eta})$ for some $\eta \in (0, 1/2)$; (ii) $E[|X_t|^{\alpha}] < \infty$ for some $\alpha > \max(4v, 1/\eta)$; (iii) $\sigma^2 \equiv \lim_{T\to\infty}\mathrm{var}[T^{1/2}\bar X]$ is finite.

Assumption 3.3 (i) $0 \le p_{tT} \le 1$, $\sum_{t=1}^{T} p_{tT} = 1$ and $\max_{1\le t\le T}|T p_{tT} - 1| = o_p(1)$; (ii) $\sqrt{T}\tilde Y = O_p(1)$.

Similarly to Gonçalves and White (2004), $P$ denotes the probability measure of the original time series and $P^*$ that induced by the bootstrap method. For a bootstrap statistic $\Delta^*_T$ we write $\Delta^*_T \to 0$ prob-$P^*$, prob-$P$ if for any $\varepsilon > 0$ and any $\delta > 0$, $\lim_{T\to\infty} P\{P^*\{|\Delta^*_T| > \varepsilon\} > \delta\} = 0$. We also use measures of the magnitude of bootstrapped sequences as defined by Hahn (1997). Let $\Delta^*_T = O^{\omega}_p(a_T)$ if $\Delta^*_T$, when conditioned on $\omega$, is $O_p(a_T)$, and $\Delta^*_T = o^{\omega}_p(a_T)$ if $\Delta^*_T$, when conditioned on $\omega$, is $o_p(a_T)$. We write $\Delta^*_T = O_B(1)$ if, for a given subsequence $\{T'\}$, there exists a further subsequence $\{T''\}$ such that $\Delta^*_{T''} = O^{\omega}_p(1)$. Similarly, we write $\Delta^*_T = o_B(1)$ if, for a given subsequence $\{T'\}$, there exists a further subsequence $\{T''\}$ such that $\Delta^*_{T''} = o^{\omega}_p(1)$.
Theorem 3.1 shows that the bootstrap distribution of $\sqrt{T/k_2}\,(\bar Y^* - \tilde Y)$ is uniformly close to the asymptotic distribution of $T^{1/2}\bar X$.

Theorem 3.1 Under Assumptions 3.1-3.3 and 2.5, if $E[X_t] = 0$,
$$\lim_{T\to\infty} P\left\{\sup_{x\in\mathbb{R}}\left|P^*\{\sqrt{T/k_2}\,(\bar Y^* - \tilde Y) \le x\} - P\{T^{1/2}\bar X \le x\}\right| \ge \varepsilon\right\} = 0,$$
where $k_2 = \int_{-\infty}^{\infty} k^2(v)\,dv$.

The GEL-KBB method is obtained when $p_{tT} = \hat\pi_t$, where $\hat\pi_t = \pi_t(\hat\beta_{gel})$.

Lemma 3.1 Assumption 3.3 is satisfied if $p_{tT} = \hat\pi_t$.
4 Kernel block bootstrap methods for GMM
4.1 The standard KBB method
Consider a bootstrap sample of size $m_T$, $\{g^*_{tT}(\beta)\}_{t=1}^{m_T}$, drawn from $\{g_{tT}(\beta)\}_{t=1}^{T}$, and let $\hat W^*_T = \hat W_T + o_B(1)$, where $\hat W^*_T$ is a positive semi-definite matrix. Define also $\bar g^*_T(\beta) = \sum_{s=1}^{m_T} g^*_{sT}(\beta)/m_T$ and
$$\hat Q^*_T(\beta) = \bar g^*_T(\beta)'\,\hat W^*_T\,\bar g^*_T(\beta).$$
To prove consistency we require Assumption 4.1.

Assumption 4.1 (i) The observed data are realizations of a stochastic process $z \equiv \{z_t : \Omega \to \mathbb{R}^n, n \in \mathbb{N}, t = 1, 2, \ldots\}$ on the complete probability space $(\Omega, \mathcal{F}, P)$, where $\Omega = \times_{t=1}^{\infty}\mathbb{R}^n$ and $\mathcal{F} = \mathcal{B}(\times_{t=1}^{\infty}\mathbb{R}^n)$ (the Borel $\sigma$-field generated by the measurable finite dimensional product cylinders); (ii) $z_t$ is stationary and ergodic; (iii) $g : \mathbb{R}^l \times B \to \mathbb{R}^m$ is measurable for each $\beta \in B$, $B$ a compact subset of $\mathbb{R}^p$, and $g(z_t, \cdot)$ is continuous; (iv) $E[g(z_t, \beta)] = 0$ only for $\beta = \beta_0$; (v) $\hat W_T = W + o_p(1)$, where $W$ is a positive definite matrix, and $\hat W^*_T = \hat W_T + o_B(1)$; (vi) $E[\sup_{\beta\in B}\|g(z_t, \beta)\|^{\alpha}] < \infty$ for some $\alpha \ge 1$; (vii) $T^{1/\alpha}/m_T = o(1)$, where $m_T \to \infty$.

Theorem 4.1 shows that the GMM bootstrap estimator is consistent.

Theorem 4.1 Under Assumption 4.1, $\hat\beta^* - \hat\beta \to 0$, prob-$P^*$, prob-$P$.
To prove the consistency of the bootstrap distribution of the GMM estimator we require Assumption 4.2 to be satisfied.

Assumption 4.2 (i) The $(k\times 1)$ random vectors $\{z_t, -\infty < t < \infty\}$ form a strictly stationary and strong mixing process with mixing coefficients of size $-3v/(v-1)$ for some $v > 1$; (ii) $\beta_0 \in \mathrm{int}(B)$; (iii) $g(z_t, \cdot)$ is continuously differentiable in a neighborhood $N$ of $\beta_0$ with probability approaching one; (iv) $E[g(z_t, \beta_0)] = 0$ and $E[\|g(z_t, \beta_0)\|^{\alpha}]$ is finite for some $\alpha > \max(4v, 1/\eta)$; (v) $E[\sup_{\beta\in N}\|\partial g(z_t, \beta)/\partial\beta'\|^{a}] < \infty$ for some $a > 2/(1+2\eta)$; (vi) $G'WG$ is nonsingular and $\Omega$ exists and is positive definite; (vii) $m_T = T/S_T$.

Theorem 4.2 demonstrates the consistency of the KBB distribution of the GMM estimator.

Theorem 4.2 Under Assumptions 2.5, 4.1 and 4.2,
$$\lim_{T\to\infty} P\left(\sup_{x\in\mathbb{R}^p}\left|P^*\{\sqrt{T/k_2}\,(\hat\beta^* - \hat\beta) \le x\} - P\{T^{1/2}(\hat\beta - \beta_0) \le x\}\right| \ge \varepsilon\right) = 0.$$
4.1.1 Bootstrap Estimation of $\Omega$
Hansen (1982) showed that the most efficient estimator is obtained if one sets $W = \Omega^{-1}$. We now show how to obtain a consistent estimator of $\Omega$ using the bootstrap. Let
$$\hat\Omega^*(\tilde\beta^*) \equiv \frac{S_T}{m_T k_2}\sum_{t=1}^{m_T} g^*_t(\tilde\beta^*)\,g^*_t(\tilde\beta^*)',$$
where $\tilde\beta^*$ is a bootstrap estimator of $\beta_0$ such that $\sqrt{T}(\tilde\beta^* - \beta_0) = O_B(1)$.

Assumption 4.3 will be required.

Assumption 4.3 $E[\sup_{\beta\in N}\|\partial g(z_t, \beta)/\partial\beta'\|^{2\alpha/(\alpha-1)}] < \infty$.

The desired result is given by Lemma 4.1.

Lemma 4.1 Under Assumptions 2.5, 4.2 (i), (iii), (iv), (vi), (vii) and 4.3, and if $\sqrt{T}(\tilde\beta^* - \beta_0) = O_B(1)$, we have
$$\lim_{T\to\infty} P[P^*[\|\hat\Omega^*(\tilde\beta^*) - \Omega\| > \varepsilon] > \delta] = 0.$$
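Lemma 4.1's bootstrap estimator of $\Omega$ has a direct one-line implementation; the sketch below assumes the resampled smoothed moments are held in an $m_T \times m$ array, and the function name is hypothetical.

```python
import numpy as np

def omega_boot(g_star, s_t, k2):
    """Bootstrap long-run variance estimator
    Omega*(beta~*) = S_T / (m_T * k2) * sum_t g*_t g*_t',
    where g_star is an m_T-by-m array of resampled smoothed moments
    evaluated at the bootstrap estimate beta~*."""
    m_t = g_star.shape[0]
    return (s_t / (m_t * k2)) * (g_star.T @ g_star)
```

The $S_T/k_2$ rescaling undoes the variance shrinkage induced by the kernel averaging, so the outer-product of smoothed moments estimates the long-run, not the contemporaneous, variance.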
4.1.2 Testing for overidentifying restrictions
Let $\hat\Omega^* = \hat\Omega + o_B(1)$, let $\hat\beta^{e*}$ be the bootstrap GMM estimator obtained with $\hat W^*_T = \hat\Omega^{*-1}$, and define
$$\mathcal{J}^* = \frac{T}{k_2}\,[\bar g^*(\hat\beta^{e*}) - \bar g(\hat\beta^e)]'\,\hat\Omega^{*-1}\,[\bar g^*(\hat\beta^{e*}) - \bar g(\hat\beta^e)].$$
The following theorem proves the validity of the KBB $\mathcal{J}$ test for overidentifying restrictions.

Theorem 4.3 Under Assumptions 2.5, 4.1 and 4.2,
$$\lim_{T\to\infty} P\left\{\sup_{x\in\mathbb{R}}\left|P^*\{\mathcal{J}^* \le x\} - P\{\mathcal{J} \le x\}\right| \ge \varepsilon\right\} = 0.$$
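In practice a KBB $\mathcal{J}^*$ test repeats the resampling many times, computes the recentered statistic for each draw, and takes an empirical quantile as the bootstrap critical value. A minimal sketch, with stand-in data in place of actual smoothed moment indicators (the array `g_smooth`, the constants and the function name are all illustrative):

```python
import numpy as np

def j_star(g_star_bar, g_bar, omega_star_inv, T, k2):
    """Bootstrap J statistic with a-posteriori centering:
    J* = (T/k2) [gbar* - gbar]' Omega*^{-1} [gbar* - gbar]."""
    d = g_star_bar - g_bar
    return (T / k2) * d @ omega_star_inv @ d

# hypothetical usage: critical value from B = 199 bootstrap replications
rng = np.random.default_rng(3)
T, S_T, k2 = 256, 8, 2 / 3
m_T = T // S_T
g_smooth = rng.standard_normal((T, 2)) / S_T   # stand-in for {g_tT(beta^e)}
g_bar = g_smooth.mean(axis=0)
draws = []
for _ in range(199):
    idx = rng.integers(0, T, size=m_T)          # uniform KBB draw
    gs = g_smooth[idx]
    omega = (S_T / (m_T * k2)) * gs.T @ gs
    draws.append(j_star(gs.mean(axis=0), g_bar, np.linalg.inv(omega), T, k2))
crit = np.quantile(draws, 0.95)                 # bootstrap 5% critical value
```

The centering at $\bar g(\hat\beta^e)$ inside `j_star` is what the text refers to as a-posteriori centering: the bootstrap moments themselves are left uncentered.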
4.1.3 Bootstrap tests for parametric restrictions and additional moment conditions
In this subsection we propose bootstrap versions of the tests for parametric restrictions and additional moment conditions. Let
$$h_{tT}(\beta) = \frac{1}{S_T}\sum_{s=t-T}^{t-1} k\!\left(\frac{s}{S_T}\right) h_{t-s}(\beta), \quad t = 1, \ldots, T,$$
and consider a bootstrap sample of size $m_T$, $\{h^*_{tT}(\beta)\}_{t=1}^{m_T}$, drawn from $\{h_{tT}(\beta)\}_{t=1}^{T}$. Let $\tilde\Omega^* = \hat\Omega + o_B(1)$ and $\hat\Lambda^*_T = \hat\Lambda + o_B(1)$. Define also $\bar h^*_T(\beta) = \sum_{s=1}^{m_T} h^*_{sT}(\beta)/m_T$,
$$\bar Q^*_T(\beta) = \bar h^*_T(\beta)'\,\hat\Lambda^{*-1}_T\,\bar h^*_T(\beta),$$
and
$$\hat\beta^{e*}_r = \arg\min_{\beta\in B_r}\bar Q^*_T(\beta).$$
Let $\hat u^* = \bar q^*(\hat\beta^{e*}) - \hat\Lambda^*_{21}\hat\Lambda^{*-1}_{11}\bar g^*(\hat\beta^{e*})$, $\hat r^* = (a(\hat\beta^{e*})', \hat u^{*\prime})'$ and $\hat R^* = R(\hat\beta^{e*})$. Additionally, denote $Q^*_t(\beta) \equiv \partial q^*_t(\beta)/\partial\beta'$, $\bar Q^*(\beta) \equiv \sum_{t=1}^{T} Q^*_t(\beta)/T$ and $\hat\Phi^* \equiv (\hat D^{*\prime}\hat\Lambda^{*-1}\hat D^*)^{-1}$, where
$$\bar D^*(\beta) = \begin{pmatrix}\bar G^*(\beta) & 0_{m\times s}\\ \bar Q^*(\beta) & -I_s\end{pmatrix},$$
and $\hat D^* = \bar D^*(\hat\beta^*)$. We consider the following bootstrapped statistics:
$$\mathcal{W}^* = \frac{T}{k_2}\,[\hat r^* - \hat r]'[\hat R^*\hat\Phi^*\hat R^{*\prime}]^{-1}[\hat r^* - \hat r],$$
$$\mathcal{S}^* = \frac{T}{k_2}\,[\bar h^*(\hat\beta^{e*}_r) - \bar h(\hat\beta^e_r)]'\hat\Lambda^{*-1}\hat D^*\hat\Phi^*\hat D^{*\prime}\hat\Lambda^{*-1}[\bar h^*(\hat\beta^{e*}_r) - \bar h(\hat\beta^e_r)],$$
$$\mathcal{D}^* = \frac{T}{k_2}\left([\bar h^*(\hat\beta^{e*}_r) - \bar h(\hat\beta^e_r)]'\hat\Lambda^{*-1}[\bar h^*(\hat\beta^{e*}_r) - \bar h(\hat\beta^e_r)] - [\bar g^*(\hat\beta^{e*}) - \bar g(\hat\beta^e)]'\hat\Omega^{*-1}[\bar g^*(\hat\beta^{e*}) - \bar g(\hat\beta^e)]\right).$$
Hall and Horowitz (1996) considered t statistics for tests on a single parameter for GMM using the MBB, and consequently these bootstrapped statistics appear to be new in the literature.
In order to show that the bootstrap distributions of these statistics are close to their asymptotic distributions, the following assumptions are required.

Assumption 4.4 (i) $\beta_0$ is the unique solution of $E[h_t(\beta)] = 0$ and $a(\beta) = 0$, and $E[\|h(z_t, \beta_0)\|^{\alpha}]$ is finite; (ii) $q_t(\beta)$ is continuous in $\beta$ for each $z_t \in \mathcal{Z}$; (iii) $a(\beta)$ is twice continuously differentiable on $B$; (iv) $\partial q(z_t, \beta)/\partial\beta'$ exists and is continuous on $B$ for each $z_t \in \mathcal{Z}$; (v) $\mathrm{rank}(Q) = s$; (vi) $E[\sup_{\beta\in N}\|\partial q(z_t, \beta)/\partial\beta'\|^{a}] < \infty$; (vii) $\Lambda$ exists and is positive definite and $\hat\Lambda = \Lambda + o_p(1)$.

Theorem 4.4 reveals that the bootstrapped trinity of test statistics is consistent for the asymptotic distributions of the original statistics.

Theorem 4.4 Under Assumptions 2.5, 4.1, 4.2 and 4.4,
$$\lim_{T\to\infty} P\left\{\sup_{x\in\mathbb{R}}\left|P^*\{\mathcal{W}^* \le x\} - P\{\mathcal{W} \le x\}\right| \ge \varepsilon\right\} = 0,$$
$$\lim_{T\to\infty} P\left\{\sup_{x\in\mathbb{R}}\left|P^*\{\mathcal{S}^* \le x\} - P\{\mathcal{S} \le x\}\right| \ge \varepsilon\right\} = 0,$$
$$\lim_{T\to\infty} P\left\{\sup_{x\in\mathbb{R}}\left|P^*\{\mathcal{D}^* \le x\} - P\{\mathcal{D} \le x\}\right| \ge \varepsilon\right\} = 0.$$
Moreover, $\mathcal{W}^*$, $\mathcal{S}^*$ and $\mathcal{D}^*$ are asymptotically equivalent.
4.2 The generalised empirical likelihood kernel block bootstrap method
4.2.1 An efficient GMM estimator
In this sub-section we introduce a GMM-type estimator that is efficient and plays an important role in establishing the consistency of the kernel block bootstrap distribution to the asymptotic distribution of the GMM estimator. We consider the objective function
$$\tilde Q_T(\beta) = \tilde g_T(\beta)'\,\hat W_T\,\tilde g_T(\beta),$$
where $\tilde g_T(\beta) = \sum_{t=1}^{T} g_{tT}(\beta)\hat\pi_t$. The GMM-type estimator is defined as
$$\tilde\beta = \arg\min_{\beta\in B}\tilde Q_T(\beta),$$
where $\hat W_T \xrightarrow{p} W$ and $W$ is a positive semi-definite matrix.

We now characterize the asymptotic properties of the new estimator. Theorem 4.5 shows that this estimator is consistent for $\beta_0$.

Theorem 4.5 Under Assumptions 2.4, 2.5, 2.6, 2.7 and 2.8, $\tilde\beta \xrightarrow{p} \beta_0$.

Theorem 4.6 reveals that $\tilde\beta$ is asymptotically equivalent to $\hat\beta^e$.

Theorem 4.6 Under Assumptions 2.4, 2.5, 2.6, 2.7, 2.8 and 2.9,
$$\sqrt{T}(\tilde\beta - \beta_0) - \sqrt{T}(\hat\beta^e - \beta_0) \xrightarrow{p} 0, \qquad \sqrt{T}(\tilde\beta - \beta_0) \xrightarrow{d} N(0, \Sigma).$$
This theorem shows that no matter which weighting matrix $\hat W_T$ we choose, we always obtain an estimator that is asymptotically equivalent to the efficient two-step GMM estimator.
4.2.2 The bootstrap method
Let $g^{\star}_{iT}(\beta)$, $i = 1, \ldots, m_T$, be obtained by drawing observations from $\{g_{tT}(\beta)\}_{t=1}^{T}$, where $P(g^{\star}_{iT}(\beta) = g_{tT}(\beta)) = \hat\pi_t$, $t = 1, \ldots, T$. Denote $\bar g^{\star}_T(\beta) = \sum_{i=1}^{m_T} g^{\star}_{iT}(\beta)/m_T$. The generalised empirical likelihood kernel block bootstrap (GEL-KBB) estimator $\hat\beta^{\star}$ is defined as
$$\hat\beta^{\star} = \arg\min_{\beta\in B}\,\bar g^{\star}_T(\beta)'\,\hat W^{\star}_T\,\bar g^{\star}_T(\beta),$$
where $\hat W^{\star}_T = \hat W_T + o_B(1)$.

Let $P^{\star}$ be the bootstrap probability measure induced by the new resampling scheme.

Theorem 4.7 Under Assumptions 2.4, 2.5, 2.6, 2.7, 2.8 and 4.1, $\hat\beta^{\star} - \tilde\beta \to 0$ prob-$P^{\star}$, prob-$P$.
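Relative to the standard KBB scheme, drawing the GEL-KBB bootstrap sample only changes the resampling weights from uniform to the implied probabilities; a minimal sketch (function name and inputs are hypothetical):

```python
import numpy as np

def gel_kbb_draw(g_smooth, pi_hat, m_t, rng=None):
    """Draw m_T smoothed moment vectors with P(g*_i = g_tT) = pi_hat_t,
    i.e. the GEL-KBB resampling scheme (uniform weights give standard KBB)."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(g_smooth.shape[0], size=m_t, replace=True, p=pi_hat)
    return g_smooth[idx]
```

Passing `pi_hat = np.full(T, 1/T)` recovers the standard KBB draw, so a single resampling routine can serve both schemes.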
Assumption 4.5 $E[\sup_{\beta\in N}\|\partial g(z_t, \beta)/\partial\beta'\|^{l}] < \infty$ for $l = \max\{\alpha/(\alpha-1),\, 2/(1+2\eta) + \varepsilon\}$, for some $\varepsilon > 0$.
The following result shows consistency of the bootstrap distribution of the estimator to the asymptotic distribution of $\hat{\theta}$.

Theorem 4.8 Under Assumptions 2.4, 2.5, 2.6, 2.8, 2.9 and 4.2 strengthened by 4.5,
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}^p} \big|P^*\{\sqrt{T/k_2}\,(\theta^* - \tilde{\theta}) \le x\} - P\{T^{1/2}(\hat{\theta} - \theta_0) \le x\}\big| \ge \varepsilon\Big) = 0,
\]
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}^p} \big|P^*\{\sqrt{T/k_2}\,(\theta^* - \hat{\theta}_e) \le x\} - P\{T^{1/2}(\hat{\theta} - \theta_0) \le x\}\big| \ge \varepsilon\Big) = 0.
\]
We note that $\theta^*$ is centred at the efficient estimator $\tilde{\theta}$, not at the inefficient $\hat{\theta}$, even though the bootstrap distribution of $\sqrt{T/k_2}\,(\theta^* - \tilde{\theta})$ approximates the asymptotic distribution of the inefficient estimator $T^{1/2}(\hat{\theta} - \theta_0)$. This result is not specific to the GEL-KBB method; it also holds for the empirical likelihood moving blocks bootstrap of Allen et al. (2011), contradicting Theorems 1 and 2 of that article. The two estimators coincide only if $W = \Omega^{-1}$.
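As a concrete illustration of the resampling scheme (this is our sketch, not the authors' code; `g` holds the smoothed moment indicators $g_{tT}(\theta)$ at a fixed $\theta$ and `pi` the implied probabilities), the bootstrap draws are i.i.d. multinomial with weights $\pi_t$:

```python
import numpy as np

def kbb_resample_mean(g, pi, m_T, rng):
    """Draw m_T rows of g (the smoothed moments g_tT at a fixed theta)
    i.i.d. with probabilities pi, and return the bootstrap mean g*_T."""
    T = g.shape[0]
    idx = rng.choice(T, size=m_T, replace=True, p=pi)
    return g[idx].mean(axis=0)
```

In a full implementation the bootstrap estimator $\theta^*$ would minimise the quadratic form $\bar{g}^*_T(\theta)' W^*_T \bar{g}^*_T(\theta)$ over $\theta$; the sketch shows only the draw for a fixed $\theta$.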
4.2.3 GEL-KBB estimation of $\Omega$

Let $\bar{\theta}^*$ be a bootstrap estimator such that $\sqrt{T}(\bar{\theta}^* - \theta_0) = O_B(1)$. We now prove consistency of the bootstrap estimator of $\Omega$ under the GEL-KBB measure, which is given by
\[
\Omega^*(\bar{\theta}^*) \equiv \frac{S_T}{m_T k_2}\sum_{t=1}^{m_T} g^*_t(\bar{\theta}^*)\, g^*_t(\bar{\theta}^*)'.
\]
The consistency of $\Omega^*(\bar{\theta}^*)$ is proven in Lemma 4.2.

Lemma 4.2 Under Assumptions 2.4, 2.5, 2.6, 2.8, 2.9 and 4.2 strengthened by 4.3, if $\sqrt{T}(\bar{\theta}^* - \theta_0) = O_B(1)$ we have
\[
\lim_{T\to\infty} P[P^*[\|\Omega^*(\bar{\theta}^*) - \Omega\| > \varepsilon] > \delta] = 0.
\]
4.2.4 Testing for overidentifying restrictions

Let $W^*_T = \tilde{\Omega}^{*-1}$, where $\tilde{\Omega}^* = \Omega + o_B(1)$, and let $\hat{\theta}^*_e$ denote the bootstrap GMM estimator computed with this weighting matrix, which corresponds to the efficient estimator. Define
\[
\mathcal{J}^* = \frac{T}{k_2}\, \bar{g}^*(\hat{\theta}^*_e)'\, \tilde{\Omega}^{*-1}\, \bar{g}^*(\hat{\theta}^*_e).
\]
Theorem 4.9 Under Assumptions 2.4, 2.5, 2.6, 2.8, 2.9 and 4.2 strengthened by 4.5,
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}} |P^*\{\mathcal{J}^* \le x\} - P\{\mathcal{J} \le x\}| \ge \varepsilon\Big) = 0.
\]
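In practice, once a set of bootstrap replications of $\mathcal{J}^*$ is available, the bootstrap test compares $\mathcal{J}$ with an empirical quantile of the simulated draws. A minimal sketch (the function names are ours, not from the paper):

```python
import numpy as np

def bootstrap_critical_value(j_boot, level=0.05):
    """(1 - level) empirical quantile of the simulated J* draws."""
    return np.quantile(np.asarray(j_boot, dtype=float), 1.0 - level)

def bootstrap_pvalue(j_stat, j_boot):
    """Bootstrap p-value: fraction of J* draws at least as large as J."""
    return float(np.mean(np.asarray(j_boot, dtype=float) >= j_stat))
```

The test rejects at nominal level `level` when the sample statistic exceeds `bootstrap_critical_value(j_boot, level)`, equivalently when the bootstrap p-value falls below `level`.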
4.2.5 GEL-KBB tests for parametric restrictions and additional moment conditions under the maintained hypothesis

In this subsection we propose bootstrap versions of the tests for parametric restrictions and additional moment conditions. Consider a bootstrap sample of size $m_T$, $\{h^*_{tT}(\theta)\}_{t=1}^{m_T}$, drawn from $\{h_{tT}(\theta)\}_{t=1}^{T}$, where $P(h^*_{jT}(\theta) = h_{tT}(\theta)) = \pi_t$, $t = 1,\dots,T$ and $j = 1,\dots,m_T$. Let also $V^* = V + o_B(1)$, $\bar{h}^*(\theta) = \sum_{s=1}^{m_T} h^*_{sT}(\theta)/m_T$, $\tilde{h}_T(\theta) = \sum_{t=1}^{T} h_{tT}(\theta)\pi_t$ and $\tilde{q}(\theta) = \sum_{t=1}^{T} q_{tT}(\theta)\pi_t$. Consider the objective function $\bar{Q}^*_T(\theta) = \bar{h}^*(\theta)' V^{*-1} \bar{h}^*(\theta)$ and let
\[
\hat{\theta}^*_{e,r} = \arg\min_{\theta\in B_r} \bar{Q}^*_T(\theta).
\]
Define $\psi^* = \bar{q}^*(\hat{\theta}^*_e) - V^*_{21}V^{*-1}_{11}\bar{g}^*(\hat{\theta}^*_e)$, $r^* = (a(\hat{\theta}^*_e)', \psi^{*\prime})'$, $\tilde{\psi} = \tilde{q}(\hat{\theta}_e)$, $\tilde{r} = (a(\hat{\theta}_e)', \tilde{\psi}')'$ and $R^* = R(\hat{\theta}^*_e)$. Additionally, let $Q^*_t(\theta) \equiv \partial q^*_t(\theta)/\partial\theta'$ and $\bar{Q}^*(\theta) \equiv \sum_{t=1}^{T} Q^*_t(\theta)/T$. Denote also $\Sigma^* \equiv (D^{*\prime}V^{*-1}D^*)^{-1}$, where
\[
D^*(\theta) = \begin{pmatrix} G^*(\theta) & 0_{m\times s} \\ \bar{Q}^*(\theta) & -I_s \end{pmatrix},
\]
and $D^* = D^*(\hat{\theta}^*_e)$. We consider the following bootstrapped statistics:
\begin{align*}
\mathcal{W}^* &= (T/k_2)\,[r^* - \tilde{r}]'\,[R^*\Sigma^* R^{*\prime}]^{-1}\,[r^* - \tilde{r}], \\
\mathcal{S}^* &= (T/k_2)\,[\bar{h}^*(\hat{\theta}^*_{e,r}) - \tilde{h}(\hat{\theta}_{e,r})]'\, V^{*-1} D^*\Sigma^* D^{*\prime} V^{*-1}\, [\bar{h}^*(\hat{\theta}^*_{e,r}) - \tilde{h}(\hat{\theta}_{e,r})], \\
\mathcal{D}^* &= (T/k_2)\,\big([\bar{h}^*(\hat{\theta}^*_{e,r}) - \tilde{h}(\hat{\theta}_{e,r})]'\, V^{*-1}\, [\bar{h}^*(\hat{\theta}^*_{e,r}) - \tilde{h}(\hat{\theta}_{e,r})] - \bar{g}^*(\hat{\theta}^*_e)'\tilde{\Omega}^{*-1}\bar{g}^*(\hat{\theta}^*_e)\big).
\end{align*}
The Wald statistic can be seen as a generalization of the bootstrapped Wald statistic of Allen et al. (2011) and Bravo and Crudu (2011) for parametric restrictions. The remaining statistics appear to be new in the bootstrap literature.
Theorem 4.10 proves consistency of the bootstrap distribution of the trinity of test statistics.

Theorem 4.10 Under Assumptions 2.4, 2.5, 2.6, 2.8, 2.9, 4.2 strengthened by 4.5, and 4.4,
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}} |P^*\{\mathcal{W}^* \le x\} - P\{\mathcal{W} \le x\}| \ge \varepsilon\Big) = 0,
\]
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}} |P^*\{\mathcal{S}^* \le x\} - P\{\mathcal{S} \le x\}| \ge \varepsilon\Big) = 0,
\]
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}} |P^*\{\mathcal{D}^* \le x\} - P\{\mathcal{D} \le x\}| \ge \varepsilon\Big) = 0.
\]
Moreover, $\mathcal{W}^*$, $\mathcal{S}^*$ and $\mathcal{D}^*$ are asymptotically equivalent.
4.2.6 GEL-KBB tests for parametric restrictions and additional moment conditions under the null hypothesis

In this subsection we propose kernel block bootstrap versions of the tests for parametric restrictions and additional moment conditions that impose the null hypothesis through the generalised empirical likelihood implied probabilities, similar to the method proposed by Bravo and Crudu (2011).

Before presenting the method we introduce the GEL criterion for weakly dependent data and additional moments, which is given by
\[
\bar{P}_T(\theta, \varphi) = \sum_{t=1}^{T} [\rho(k\varphi' h_{tT}(\theta)) - \rho_0]/T,
\]
where $k = 1/k_2$. The GEL estimator is defined as
\[
\theta_{r,gel} = \arg\min_{\theta\in B_r}\,\sup_{\varphi\in\Lambda_T} \bar{P}_T(\theta,\varphi),
\]
where $\Lambda_T$ is defined below in Assumption 4.7. Define also $\varphi(\theta) = \arg\sup_{\varphi\in\Lambda_T} \bar{P}_T(\theta,\varphi)$ and $\varphi_r \equiv \varphi(\theta_{r,gel})$.

Consider a bootstrap sample of size $m_T$, $\{h^{\dagger}_{tT}(\theta)\}_{t=1}^{m_T}$, drawn from $\{h_{tT}(\theta)\}_{t=1}^{T}$, where $P(h^{\dagger}_{jT}(\theta) = h_{tT}(\theta)) = \tilde{\pi}_t$, $t = 1,\dots,T$ and $j = 1,\dots,m_T$, with
\[
\tilde{\pi}_t = \frac{\rho_1(\varphi_r' h_{tT}(\theta_{r,gel}))}{\sum_{j=1}^{T} \rho_1(\varphi_r' h_{jT}(\theta_{r,gel}))}, \quad t = 1,\dots,T.
\]
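As an illustration of how the implied probabilities above would be computed for the EL member of the GEL family, where $\rho(v) = \log(1-v)$ so that $\rho_1(v) = -1/(1-v)$ (a sketch under our own naming conventions; the common sign of $\rho_1$ cancels in the ratio):

```python
import numpy as np

def el_implied_probs(h, phi):
    """EL implied probabilities pi~_t = rho_1(phi' h_t) / sum_j rho_1(phi' h_j)
    with rho(v) = log(1 - v), hence rho_1(v) = -1/(1 - v).
    h: (T, m) array of smoothed moment indicators; phi: (m,) multiplier."""
    v = h @ phi
    w = 1.0 / (1.0 - v)  # proportional to rho_1(v); sign cancels in ratio
    return w / w.sum()
```

At $\varphi = 0$ the weights collapse to the uniform probabilities $1/T$, as expected.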
We consider the case in which the bootstrap weighting matrix is $W^{\dagger}_T = V^{\dagger-1}$, where $V^{\dagger} = V + o_B(1)$. Define $\bar{h}^{\dagger}_T(\theta) \equiv \frac{1}{m_T}\sum_{s=1}^{m_T} h^{\dagger}_{sT}(\theta)$ and $\bar{Q}^{\dagger}_T(\theta) = \bar{h}^{\dagger}_T(\theta)' V^{\dagger-1} \bar{h}^{\dagger}_T(\theta)$, and let
\[
\hat{\theta}^{\dagger}_e = \arg\min_{\theta\in B} \bar{Q}^{\dagger}_T(\theta), \qquad \hat{\theta}^{\dagger}_{e,r} = \arg\min_{\theta\in B_r} \bar{Q}^{\dagger}_T(\theta).
\]
Define $\psi^{\dagger} = \bar{q}^{\dagger}(\hat{\theta}^{\dagger}_e) - V^{\dagger}_{21}V^{\dagger-1}_{11}\bar{g}^{\dagger}(\hat{\theta}^{\dagger}_e)$, $r^{\dagger} = (a(\hat{\theta}^{\dagger}_e)', \psi^{\dagger\prime})'$ and $R^{\dagger} = R(\hat{\theta}^{\dagger}_e)$. Additionally, let us define $Q^{\dagger}_t(\theta) \equiv \partial q^{\dagger}_t(\theta)/\partial\theta'$ and $\bar{Q}^{\dagger}(\theta) \equiv \sum_{t=1}^{T} Q^{\dagger}_t(\theta)/T$. Denote also $\Sigma^{\dagger} \equiv (D^{\dagger\prime}V^{\dagger-1}D^{\dagger})^{-1}$, where
\[
D^{\dagger}(\theta) = \begin{pmatrix} G^{\dagger}(\theta) & 0_{m\times s} \\ \bar{Q}^{\dagger}(\theta) & -I_s \end{pmatrix}.
\]
We consider the following bootstrapped statistics:
\begin{align*}
\mathcal{W}^{\dagger} &= (T/k_2)\, r^{\dagger\prime}[R^{\dagger}\Sigma^{\dagger}R^{\dagger\prime}]^{-1} r^{\dagger}, \\
\mathcal{S}^{\dagger} &= (T/k_2)\, \bar{h}^{\dagger}(\hat{\theta}^{\dagger}_{e,r})'\, V^{\dagger-1} D^{\dagger}\Sigma^{\dagger}D^{\dagger\prime} V^{\dagger-1}\, \bar{h}^{\dagger}(\hat{\theta}^{\dagger}_{e,r}), \\
\mathcal{D}^{\dagger} &= (T/k_2)\,[\bar{h}^{\dagger}(\hat{\theta}^{\dagger}_{e,r})'\, V^{\dagger-1}\, \bar{h}^{\dagger}(\hat{\theta}^{\dagger}_{e,r}) - \bar{g}^{\dagger}(\hat{\theta}^{\dagger}_e)'\tilde{\Omega}^{\dagger-1}\bar{g}^{\dagger}(\hat{\theta}^{\dagger}_e)],
\end{align*}
where $\tilde{\Omega}^{\dagger} = \Omega + o_B(1)$.

Versions of the statistics $\mathcal{S}^{\dagger}$ and $\mathcal{D}^{\dagger}$ for the moving blocks bootstrap and parametric restrictions were previously introduced by Bravo and Crudu (2011). The statistic $\mathcal{W}^{\dagger}$ is new.
In order to show that the bootstrap distributions of these statistics are close to their asymptotic distributions, the following assumptions are required.

Assumption 4.6 (i) $\theta_0 \in B$ is the unique solution of $E[h_t(\theta)] = 0$; (ii) $B$ is compact; (iii) $h_t(\theta)$ is continuous at each $\theta \in B$; (iv) $E[\sup_{\theta\in B}\|h_t(\theta)\|^{\eta}] < \infty$ for some $\eta > \max(4v, 1/\delta)$; (v) $V(\theta)$ is finite and p.d. for all $\theta \in B$.

Assumption 4.7 $\varphi \in \Lambda_T$, where $\Lambda_T = \{\varphi : \|\varphi\| \le D(T/S_T^2)^{-\xi}\}$ for some $D > 0$, with $1/2 > \xi > 1/(2\eta)$.

Assumption 4.8 (i) $\theta_0 \in \mathrm{int}(B)$; (ii) $h(\cdot,\theta)$ is differentiable in a neighborhood $\mathcal{N}$ of $\theta_0$ and $E[\sup_{\theta\in\mathcal{N}}\|H_t(\theta)\|^{l}] < \infty$, where $l = \max\{\eta/(\eta-1),\, 2/(1+2\delta) + \epsilon\}$; (iii) $\mathrm{rank}(H) = p + q$.
Theorem 4.11 demonstrates that the bootstrapped Wald, score and distance statistics are asymptotically valid.

Theorem 4.11 Under Assumptions 2.5, 4.6, 4.7, 4.8, 4.2 and 4.4,
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}} |P^{\dagger}\{\mathcal{W}^{\dagger} \le x\} - P\{\mathcal{W} \le x\}| \ge \varepsilon\Big) = 0,
\]
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}} |P^{\dagger}\{\mathcal{S}^{\dagger} \le x\} - P\{\mathcal{S} \le x\}| \ge \varepsilon\Big) = 0,
\]
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}} |P^{\dagger}\{\mathcal{D}^{\dagger} \le x\} - P\{\mathcal{D} \le x\}| \ge \varepsilon\Big) = 0.
\]
Moreover, $\mathcal{W}^{\dagger}$, $\mathcal{S}^{\dagger}$ and $\mathcal{D}^{\dagger}$ are asymptotically equivalent.
5 Monte Carlo Study

In this section we present a simulation study in which we investigate the small-sample properties of the proposed bootstrap methods. The model used in our study is a version of an asset-pricing model considered in the Monte Carlo study of Hall and Horowitz (1996). The moment restrictions of this model are
\[
E\{\exp[\mu_s - \theta_0(x+z) + 3z] - 1\} = 0,
\]
\[
E\{z \exp[\mu_s - \theta_0(x+z) + 3z] - z\} = 0,
\]
where $\theta_0 = 3$, $\mu_s = -9s^2/2$, and $x$ and $z$ are scalars. The random variable $x$ has a normal distribution with mean zero and variance $s^2$, with $s = 0.2$ or $0.4$. The variable $z$ is independent of $x$, has a marginal normal distribution with zero mean and variance $s^2$, and is either sampled independently from this distribution or follows an AR(1) process with first-order serial correlation coefficient $\rho_z = 0.75$.
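The design above is straightforward to simulate; a minimal sketch (function names are ours, and the AR(1) innovations are scaled so that $z_t$ keeps the stationary variance $s^2$, which we take to be the intended design):

```python
import numpy as np

def simulate_hh(T, s, rho_z=0.0, rng=None):
    """Simulate the Hall-Horowitz style design: x ~ N(0, s^2) i.i.d.;
    z is i.i.d. N(0, s^2), or AR(1) with coefficient rho_z and
    stationary variance s^2."""
    rng = np.random.default_rng(rng)
    x = rng.normal(0.0, s, T)
    if rho_z == 0.0:
        z = rng.normal(0.0, s, T)
    else:
        innov_sd = s * np.sqrt(1.0 - rho_z**2)  # keeps var(z_t) = s^2
        z = np.empty(T)
        z[0] = rng.normal(0.0, s)
        for t in range(1, T):
            z[t] = rho_z * z[t - 1] + rng.normal(0.0, innov_sd)
    return x, z

def moments(theta, x, z, s):
    """Moment indicators g_t(theta) of the two restrictions."""
    mu_s = -9.0 * s**2 / 2.0
    e = np.exp(mu_s - theta * (x + z) + 3.0 * z)
    return np.column_stack([e - 1.0, z * (e - 1.0)])
```

At $\theta = \theta_0 = 3$ the exponent reduces to $\mu_s - 3x$, so $E[e] = \exp(\mu_s + 9s^2/2) = 1$ and both sample moments are close to zero in large samples.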
We evaluate the performance of Hansen's (1982) J test and the symmetrical t tests for the null hypothesis $H_0: \theta_0 = 3$ with asymptotic and bootstrap critical values. The J statistic is computed using the two-step GMM estimator in which the weighting matrix used in the first step is the identity matrix. In the second step the long-run variance of the moment indicators is computed using the Newey-West estimator (Newey and West, 1987).²
We obtain the bootstrap critical values for the J-tests and t-tests using the standard moving blocks bootstrap (MBB), the kernel block bootstrap (KBB) based on different kernel functions, and the versions of these methods based on the Empirical Likelihood (EL) implied probabilities. KBB is computed using the truncated kernel (KBBtr), the Bartlett kernel (KBBbt), the kernel that induces the quadratic-spectral kernel (KBBqs) [see Smith (2011)] and the kernel version of the optimal taper of Paparoditis and Politis (2001) (KBBpp). The EL implied probabilities are computed imposing the moment restrictions in the sample. In the tables of results we use the superscript el to denote the results obtained with the bootstrap methods based on the implied probabilities. Although the methods were designed for the case in which there is dependence in the data, we also apply them in the case in which there is no dependence.³
In order to investigate whether the proposed methods are sensitive to the choice of the bandwidth/block size we compute these parameters using two methods: the automatic bandwidth of Andrews (1991) based on an AR(1) model and a non-parametric version of the Andrews (1991) method based on a taper proposed by Politis and Romano (1995). These methods to compute the bandwidth were applied to the residuals obtained in the first step of the GMM problem [see Parente and Smith (2018b), section 4.3, for details]. Additionally, given that the computed automatic bandwidth $S_T$ might induce values of $m_T = \lceil T/S_T \rceil$ larger than $T$ or equal to 1, where $\lceil\cdot\rceil$ is the ceiling function, we replace $S_T$ by $S^*_T = \max\{S_T, 1\}$ and $m_T$ by $m^*_T = \max\{\lceil T/S^*_T \rceil, 2\}$. Consequently we have $2 \le m^*_T \le T$.

²We also computed a two-step GMM estimator in which the long-run variance of the moment indicators is estimated using the Andrews (1991) estimator based on the Quadratic Spectral kernel. These results are available upon request. Additionally, we investigated the performance of the tests based on the J-statistic in which the long-run variance of the moment indicators was estimated using the approach of Andrews and Monahan (1992), which requires pre-whitened series. The results obtained were not satisfactory in the Monte Carlo design considered and consequently are not presented.

³The quasi-Newton algorithm of MATLAB is used to compute GMM and EL, hence ensuring a local optimum. The Newton method is used to locate $\varphi(\theta)$ for given $\theta$, which is required for the profile EL objective function. EL computation requires some care since the EL criterion involves the logarithm function, which is undefined for negative arguments; this difficulty is avoided by employing the approach due to Owen in which logarithms are replaced by a function that is logarithmic for arguments larger than a small positive constant and quadratic below that threshold. See Owen (2001, (12.3), p. 235). Note, however, that this method might produce estimates that lie outside the convex hull of the data. In our study the worst case in which this problem occurred affected 1% of the replications and corresponded to the case n = 50, s = 0.2 with the truncated kernel. In all the remaining designs the problem occurred in at most 0.6% of the replications. Hence our results are not considerably affected by this issue.
We can find in the literature different bootstrap symmetric t-tests. Hall (1992; see sections 3.5, 3.6 and 3.12) considers the two-sided symmetric percentile t-test and the two-sided equal-tailed t-test. Here we report only the results for the former method because it provided the best results in our study. Additionally, because our objective is to compare the performance of several different bootstrap tests, we present, for succinctness, only the results computed using the 5% nominal level.⁴
Table 1 reports the empirical rejection rates of Hansen's (1982) J test. The results reveal that the J test based on asymptotic critical values is slightly undersized for s = 0.2 and becomes to some extent oversized for s = 0.4. Note that in the latter case the rejection frequencies do not get closer to the nominal size when the sample size increases from 50 to 100.⁵ The tests based on standard KBB and MBB critical values are considerably undersized. The tests based on the empirical likelihood versions of the bootstrap methods, although undersized for s = 0.2, yield empirical rejection rates closer to the nominal size for s = 0.4.
Table 2 presents the results for the t-tests of the hypothesis $H_0: \theta_0 = 3$. The empirical rejection rates of the t-tests based on the asymptotic critical values are considerably larger than the nominal rate. On the other hand, the performance of the t-tests based on the critical values obtained with MBB and KBB is noticeably better than that of the tests based on the asymptotic critical values. However, the t-tests based on the taper of Paparoditis and Politis (2001) are undersized. The empirical likelihood versions of these t-tests are in general slightly oversized, apart from the case in which the kernel version of the taper of Paparoditis and Politis (2001) is used.

Overall the results obtained with both methods to compute the automatic bandwidth are very similar,
⁴The results for the 1% and 10% nominal levels were also computed and are available upon request.

⁵Note that these results differ from those reported by Hall and Horowitz (1996), especially in the case s = 0.4, though they computed the GMM estimator using a different weighting matrix.
Table 1: Empirical rejection rates of the J-tests with asymptotic and bootstrap critical values at the 5% level

            |           Politis and Romano          |                Andrews
n           |     50            |      100          |     50            |      100
rho_z       |   0     |  0.75   |   0     |  0.75   |   0     |  0.75   |   0     |  0.75
s           | 0.2 0.4 | 0.2 0.4 | 0.2 0.4 | 0.2 0.4 | 0.2 0.4 | 0.2 0.4 | 0.2 0.4 | 0.2 0.4
asymp       | 2.4 7.9 | 3.8 5.6 | 3.1 9.2 | 2.9 7.5 | 3.3 8.0 | 3.5 7.0 | 3.6 8.8 | 4.5 7.7
kbbtr       | 0.3 1.3 | 0.6 1.5 | 0.3 2.4 | 0.7 2.6 | 0.5 2.1 | 0.6 1.9 | 0.7 2.8 | 1.1 2.1
kbbtr^el    | 1.2 5.2 | 1.9 3.1 | 2.4 7.1 | 2.4 5.9 | 1.6 5.0 | 1.6 3.8 | 2.8 7.2 | 3.4 5.7
kbbbt       | 0.1 0.4 | 0.1 0.6 | 0.5 0.8 | 0.6 1.1 | 0.3 0.5 | 0.3 0.8 | 0.7 0.9 | 0.8 1.2
kbbbt^el    | 0.8 4.4 | 1.3 2.9 | 1.5 6.4 | 1.7 5.2 | 0.7 4.6 | 1.3 3.7 | 2.2 6.9 | 2.5 4.7
kbbpp       | 0.0 0.0 | 0.1 0.0 | 0.2 0.3 | 0.5 0.6 | 0.2 0.2 | 0.2 0.4 | 0.6 0.4 | 0.7 0.7
kbbpp^el    | 0.5 3.6 | 1.8 2.6 | 1.0 5.1 | 1.5 4.1 | 0.6 3.6 | 1.5 3.3 | 1.3 5.4 | 2.1 3.6
kbbqs       | 0.1 0.2 | 0.1 0.3 | 0.3 0.8 | 0.7 0.9 | 0.3 0.1 | 0.2 0.8 | 0.7 0.6 | 0.8 1.2
kbbqs^el    | 0.6 3.3 | 1.5 2.0 | 1.5 6.3 | 1.7 4.6 | 1.0 3.9 | 1.3 2.9 | 1.9 6.1 | 2.7 3.9
mbb         | 0.2 0.1 | 0.2 0.1 | 0.3 0.5 | 0.5 0.7 | 0.4 0.2 | 0.3 0.6 | 0.8 0.6 | 0.7 0.9
mbb^el      | 0.6 4.0 | 1.3 2.4 | 1.2 6.2 | 1.6 4.5 | 0.5 3.9 | 1.1 3.3 | 1.6 6.0 | 2.3 4.0
which may indicate that the proposed methods are robust to the choice of this parameter.
6 Conclusion
In this article we put forward new bootstrap methods for models defined through moment restrictions for time series data that build on the kernel block bootstrap method of Parente and Smith (2018a, 2018b). These methods approximate the asymptotic distributions of tests for overidentifying conditions, parametric restrictions and additional moment restrictions. We consider methods that impose the null hypothesis, methods that impose the maintained hypothesis and methods that do not impose any restriction on the way the bootstrap samples are generated. We prove the first-order validity of the methods, generalizing and correcting the work of Allen et al. (2011) and Bravo and Crudu (2011). A simulation study reveals that the proposed methods perform well in practice.
Appendix: Proofs
Throughout the Appendix, $C$ and $\Delta$ denote generic positive constants that may differ across uses, and CS, M, and T denote the Cauchy-Schwarz, Markov, and triangle inequalities, respectively. We use the same notation as Gonçalves and White (2004): for a bootstrap statistic $W^*_T(\cdot,\omega)$ we write $W^*_T(\cdot,\omega) \to 0$ prob-$P^*$, prob-$P$ if, for any $\varepsilon > 0$ and any $\delta > 0$, $\lim_{T\to\infty} P[P^*_{T,\omega}[|W^*_T(\cdot,\omega)| > \varepsilon] > \delta] = 0$.

A.1 Proofs of the results in subsection 2.1.1

Proof of Theorem 2.4: As in Tauchen (1985) and Ruud (2000), we recast the test for $H_0$ as a test of the parametric restrictions $q^a_t(\theta,\psi) \equiv q_t(\theta) - \psi$ and construct the moment indicators $h^a_t(\theta,\psi) \equiv (g_t(\theta)', q^a_t(\theta,\psi)')'$. Under the null hypothesis $\psi = 0$ and $a(\theta_0) = 0$; thus we have the model $E(h^a_t(\theta_0, 0)) = 0$ and $a(\theta_0) = 0$. Define $\beta = (\theta', \psi')'$ and $\bar{h}^a(\beta) = \sum_{t=1}^{T} h^a_t(\theta,\psi)/T$.
Table 2: Empirical rejection rates of the t-tests with asymptotic and bootstrap critical values at the 5% level

            |           Politis and Romano              |                Andrews
n           |      50             |      100            |      50             |      100
rho_z       |    0     |  0.75    |    0     |  0.75    |    0     |  0.75    |    0     |  0.75
s           | 0.2  0.4 | 0.2  0.4 | 0.2  0.4 | 0.2  0.4 | 0.2  0.4 | 0.2  0.4 | 0.2  0.4 | 0.2  0.4
asymp       | 24.3 25.8| 22.2 25.7| 18.3 20.4| 19.5 19.9| 22.3 26.0| 20.0 25.9| 18.9 20.1| 17.4 20.0
kbbtr       | 4.4  6.6 | 6.0  7.9 | 4.6  6.1 | 5.9  6.6 | 4.4  7.1 | 4.7  6.6 | 5.5  6.3 | 5.8  6.9
kbbtr^el    | 6.9  8.7 | 7.5  9.3 | 6.6  7.8 | 7.3  7.8 | 6.5  8.4 | 6.2  8.2 | 7.1  6.9 | 6.8  7.5
kbbbt       | 4.1  4.4 | 4.8  6.5 | 3.6  3.9 | 5.1  4.5 | 3.9  4.2 | 4.1  5.2 | 3.8  3.8 | 4.3  5.2
kbbbt^el    | 6.0  6.4 | 6.8  8.5 | 5.5  5.7 | 6.9  6.0 | 6.5  5.3 | 5.7  6.9 | 7.6  4.9 | 6.2  5.8
kbbpp       | 2.9  2.5 | 3.4  4.1 | 2.8  2.5 | 3.3  3.2 | 2.9  2.9 | 2.9  4.0 | 3.0  2.3 | 3.0  3.1
kbbpp^el    | 4.4  4.0 | 4.5  5.5 | 4.0  4.3 | 5.2  4.3 | 4.4  3.1 | 3.7  5.1 | 6.0  3.1 | 4.6  3.7
kbbqs       | 4.4  4.5 | 5.1  6.4 | 3.6  4.3 | 5.2  4.8 | 4.0  4.8 | 4.3  5.4 | 4.0  4.2 | 4.2  5.1
kbbqs^el    | 6.6  6.6 | 6.8  8.9 | 5.8  6.4 | 7.0  7.0 | 6.9  6.3 | 6.2  7.9 | 7.7  5.8 | 6.6  7.0
mbb         | 3.6  3.3 | 4.4  5.2 | 3.0  2.9 | 4.6  4.0 | 3.6  3.5 | 3.9  4.6 | 3.4  3.2 | 3.8  4.1
mbb^el      | 5.8  5.3 | 5.6  7.4 | 5.1  5.0 | 6.6  4.9 | 5.7  4.2 | 5.5  5.9 | 6.7  4.6 | 6.1  5.2
Define $r(\beta) = (a(\theta)', \psi')'$ and the unrestricted GMM objective function
\[
\bar{Q}^a(\beta) = \bar{h}^a(\beta)'\hat{V}^{-1}\bar{h}^a(\beta).
\]
Consider the GMM estimator
\[
\hat\beta_e = \arg\min_{\beta\in\mathcal{B}} \bar{Q}^a(\beta).
\]
As pointed out by Ruud (2000, pp. 574-575), the sub-vectors of $\hat\beta_e$ are
\[
\hat\theta_e = \arg\min_{\theta\in B}\, \bar{g}(\theta)'\hat\Omega^{-1}\bar{g}(\theta), \qquad \hat\psi = \bar{q}(\hat\theta_e) - \hat{V}_{21}\hat{V}_{11}^{-1}\bar{g}(\hat\theta_e).
\]
We note that by Theorem 2.1 $\hat\theta_e = \theta_0 + o_p(1)$; also, as $\hat{V} = V + o_p(1)$ and $V_{11}$ is invertible, we have by a UWL that $\hat\psi = o_p(1)$, since $E(h^a_t(\theta_0, 0)) = 0$ under the regularity conditions of Theorem 2.1, and
\[
\sqrt{T}(\hat\beta_e - \beta_0) \overset{d}{\to} N(0,\Sigma)
\]
by Theorem 2.2, as $\theta_0 \in \mathrm{int}(B)$ and $0 \in \mathrm{int}(\mathbb{R}^s) = \mathbb{R}^s$, where $\Sigma = (D'V^{-1}D)^{-1}$. Furthermore, using the usual arguments based on first-order conditions, we have
\[
\sqrt{T}(\hat\beta_e - \beta_0) = -[D'V^{-1}D]^{-1}D'V^{-1}\sqrt{T}\,\bar{h}^a(\theta_0, 0) + o_p(1).
\]
Thus by a Taylor expansion we have, under $H_0$,
\[
\sqrt{T}\binom{a(\hat\theta_e)}{\hat\psi} = -R(\bar\beta)[D'V^{-1}D]^{-1}D'V^{-1}\sqrt{T}\,\bar{h}^a(\theta_0,0) + o_p(1) = -R[D'V^{-1}D]^{-1}D'V^{-1}\sqrt{T}\,\bar{h}^a(\theta_0,0) + o_p(1),
\]
where $\bar\beta$ lies on the line between $(\hat\theta_e', \hat\psi')'$ and $\beta_0$. Hence
\[
\mathcal{W} = T\binom{a(\hat\theta_e)}{\hat\psi}'\Big[\hat{R}(\hat{D}'\hat{V}^{-1}\hat{D})^{-1}\hat{R}'\Big]^{-1}\binom{a(\hat\theta_e)}{\hat\psi} = \sqrt{T}\,\bar{h}^a(\theta_0,0)'\, K\, \sqrt{T}\,\bar{h}^a(\theta_0,0) + o_p(1),
\]
as $\hat{D} = D + o_p(1)$, $\hat{V} = V + o_p(1)$, $\hat{R} = R + o_p(1)$, $\sqrt{T}\,\bar{h}^a(\theta_0,0) = O_p(1)$, and where
\[
K \equiv V^{-1}D[D'V^{-1}D]^{-1}R'\big[R(D'V^{-1}D)^{-1}R'\big]^{-1}R[D'V^{-1}D]^{-1}D'V^{-1}.
\]
Note that $VKVKV = VKV$ and $\mathrm{tr}(KV) = s + r$. Thus by Theorem 9.2.1 of Rao and Mitra (1971) it follows that $\mathcal{W} \overset{d}{\to} \chi^2(r + s)$.

We consider now the LM statistic
\[
\mathcal{LM} = T\,\bar{h}^a(\hat\beta_r)'\hat{V}^{-1}\hat{D}(\hat{D}'\hat{V}^{-1}\hat{D})^{-1}\hat{D}'\hat{V}^{-1}\bar{h}^a(\hat\beta_r).
\]
Note that the restricted GMM estimator solves
\[
\hat\beta_r = \arg\min_{\beta\in\mathcal{B}_r} \bar{h}^a(\beta)'\hat{V}^{-1}\bar{h}^a(\beta),
\]
where $\mathcal{B}_r = \{(\theta',\psi')' \in \mathcal{B} : a(\theta) = 0,\ \psi = 0\}$. We note that since $\mathcal{B}$ is compact, $\mathcal{B}_r$ is compact. Note that $\hat\beta_r = (\hat\theta_r', 0')'$ and $\hat\beta_r$ is consistent by Theorem 2.1.

We now derive the distribution of the restricted estimator. The Lagrangian is
\[
\mathcal{L} = \bar{h}^a(\beta)'\hat{V}^{-1}\bar{h}^a(\beta) - \lambda' r(\beta)
\]
and the first-order conditions are
\[
\hat{D}_r'\hat{V}^{-1}\bar{h}^a(\hat\beta_r) - R(\hat\beta_r)'\hat\lambda = 0, \qquad r(\hat\beta_r) = 0,
\]
where $\hat{D}_r = \hat{D}(\hat\beta_r)$. Thus by the usual arguments we have
\[
\sqrt{T}(\hat\beta_r - \beta_0) = -\big[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma\big]D'V^{-1}\sqrt{T}\,\bar{h}^a(\theta_0,0) + o_p(1),
\]
where $\Sigma = [D'V^{-1}D]^{-1}$. By a Taylor expansion,
\[
\sqrt{T}\,\bar{h}^a(\hat\beta_r) = \sqrt{T}\,\bar{h}^a(\theta_0,0) + D\sqrt{T}(\hat\beta_r - \beta_0)
= \big[I_{m+s} - D\Sigma D'V^{-1} + D\Sigma R'(R\Sigma R')^{-1}R\Sigma D'V^{-1}\big]\sqrt{T}\,\bar{h}^a(\theta_0,0) + o_p(1). \quad (A.1)
\]
Thus
\[
\mathcal{LM} = T\,\bar{h}^a(\hat\beta_r)'\hat{V}^{-1}\hat{D}(\hat{D}'\hat{V}^{-1}\hat{D})^{-1}\hat{D}'\hat{V}^{-1}\bar{h}^a(\hat\beta_r) = \sqrt{T}\,\bar{h}^a(\theta_0,0)'\,K\,\sqrt{T}\,\bar{h}^a(\theta_0,0) + o_p(1),
\]
as $\hat{D} = D + o_p(1)$ and $\hat{V} = V + o_p(1)$. Thus $\mathcal{LM}$ is asymptotically equivalent to $\mathcal{W}$.

Now we consider the distance statistic
\[
\mathcal{D} = T\big[\bar{h}(\hat\theta_{e,r})'\tilde{V}^{-1}\bar{h}(\hat\theta_{e,r}) - \bar{g}(\hat\theta_e)'\hat\Omega^{-1}\bar{g}(\hat\theta_e)\big]
= T\big[\bar{h}^a(\hat\beta_r)'\hat{V}^{-1}\bar{h}^a(\hat\beta_r) - \bar{h}^a(\hat\beta_e)'\hat{V}^{-1}\bar{h}^a(\hat\beta_e)\big].
\]
The result follows from replacing $\sqrt{T}\,\bar{h}^a(\hat\beta_r)$ by (A.1) and $\sqrt{T}\,\bar{h}^a(\hat\beta_e)$ by
\[
\sqrt{T}\,\bar{h}^a(\hat\beta_e) = \sqrt{T}\,\bar{h}^a(\theta_0,0) + D\sqrt{T}(\hat\beta_e - \beta_0) + o_p(1)
= \big[I_{m+s} - D[D'V^{-1}D]^{-1}D'V^{-1}\big]\sqrt{T}\,\bar{h}^a(\theta_0,0) + o_p(1).
\]
Thus, as $\sqrt{T}\,\bar{h}^a(\theta_0,0) = O_p(1)$, we have
\[
\mathcal{D} = \sqrt{T}\,\bar{h}^a(\theta_0,0)'\,K\,\sqrt{T}\,\bar{h}^a(\theta_0,0) + o_p(1),
\]
and the result follows.
A.2 Auxiliary results on Generalised Empirical Likelihood

A.2.1 Unrestricted models

The following Lemma corresponds to a version of Lemma A.1 of Ramalho and Smith (2011) for weakly dependent data.

Lemma A.1 If Assumptions 2.4, 2.6, 2.7 and 2.8 are satisfied, then $T\pi_t = 1 + o_p(1)$ and
\[
T^{1/2}(\pi_t - 1/T) = \frac{S_T}{T}\, g_{tT}'\, \frac{T^{1/2}}{S_T}\hat\lambda\,(1/k_2 + o_p(1)) + O_p\Big(\frac{S_T}{T}\Big)
\]
uniformly in $t = 1,\dots,T$.

Proof: The proof is contained in the proof of Theorem 3.1 of Smith (2004, p. A.11).

Let $w_{tT} = \frac{1}{S_T}\sum_{s=t-T}^{t-1} k\Big(\frac{s}{S_T}\Big) w_{t-s}$, $t = 1,\dots,T$, $\tilde{w} = \sum_{t=1}^{T} \pi_t w_{tT}$, and $\bar{w} = \sum_{t=1}^{T} w_t/T$.
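The smoothed series $w_{tT}$ just defined can be computed directly from the definition; a minimal sketch (our own code, with 0-based array indexing so that $w_{tT}[t] = \frac{1}{S_T}\sum_j k((t-j)/S_T)\, w[j]$):

```python
import numpy as np

def smooth_series(w, S_T, kernel):
    """Kernel-smoothed series w_tT = (1/S_T) * sum_s k(s/S_T) * w_{t-s},
    with the sum over all lags s that keep t - s inside the sample."""
    T = len(w)
    out = np.zeros(T)
    for t in range(T):
        for j in range(T):
            out[t] += kernel((t - j) / S_T) * w[j]
    return out / S_T

def bartlett(u):
    """Bartlett (triangular) kernel."""
    return max(0.0, 1.0 - abs(u))
```

With $S_T = 1$ and the Bartlett kernel only the own observation receives positive weight, so the smoothed series reduces to the original one; larger bandwidths average neighbouring observations.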
Assumption A.1 (i) The random vectors $\{(w_t, z_t),\ -\infty < t < \infty\}$ form a strictly stationary and mixing sequence with mixing coefficients of size $-3v/(v-1)$ for some $v > 1$; (ii) $E[w_t] = 0$, $E[\|w_t\|^{\eta}] < \infty$ for some $\eta > \max(4v, 1/\delta)$, and $\Omega_w \equiv \lim_{T\to\infty}\mathrm{var}[T^{1/2}\bar{w}]$ is finite and p.d.
The following Lemma corresponds to a simplified version of Theorem 3.1 of Smith (2011).

Lemma A.2 Under Assumptions 2.4, 2.5, 2.6, 2.7, 2.8, 2.9 and A.1,
\[
\sqrt{T}\tilde{w} = T^{-1/2}\sum_{t=1}^{T} w_t - B_0 P\, T^{1/2}\bar{g}(\theta_0) + o_p(1),
\]
where $B_0 = \sum_{s=-\infty}^{\infty} E[w_t g_{t-s}(\theta_0)']$. Additionally, if $w_t = g(z_t,\theta_0)$ we have
\[
\sqrt{T}\tilde{w} = [G\Sigma G'\Omega^{-1}]\, T^{1/2}\bar{g}(\theta_0) + o_p(1).
\]
Proof: Note that by Lemma A.1,
\begin{align*}
T^{1/2}\tilde{w} &= T^{1/2}\sum_{t=1}^{T} \pi_t w_{tT} \\
&= T^{1/2}\sum_{t=1}^{T} w_{tT}/T + \sum_{t=1}^{T}\Big[\frac{S_T}{T}\, g_{tT}'\,\frac{T^{1/2}}{S_T}\hat\lambda\,(k + o_p(1)) + O_p\Big(\frac{S_T}{T}\Big)\Big] w_{tT} \\
&= T^{-1/2}\sum_{t=1}^{T} w_{tT} + \frac{S_T}{T}\sum_{t=1}^{T} w_{tT}\, g_{tT}'\,\frac{T^{1/2}}{S_T}\hat\lambda\,(k + o_p(1)) + \sum_{t=1}^{T} w_{tT}\, O_p\Big(\frac{S_T}{T}\Big).
\end{align*}
Now by the proof of Theorem 2.3 of Smith (2011) (see expression B.2, p. A.11) we have
\[
\frac{T^{1/2}}{S_T}\hat\lambda = -T^{1/2} P\, \bar{g}_T(\theta_0) + o_p(1).
\]
Thus
\[
T^{1/2}\tilde{w} = T^{-1/2}\sum_{t=1}^{T} w_{tT} + \frac{S_T}{T}\sum_{t=1}^{T} w_{tT}\, g_{tT}'\,[-T^{1/2} P\,\bar{g}_T(\theta_0) + o_p(1)](k + o_p(1)) + \sum_{t=1}^{T} w_{tT}\, O_p\Big(\frac{S_T}{T}\Big),
\]
where $\bar{g}_T(\theta) = \frac{1}{T}\sum_{t=1}^{T}\frac{1}{S_T}\sum_{s=t-T}^{t-1} k(s/S_T)\, g_{t-s}(\theta)$ and $g_{tT} = g_{tT}(\theta)$. Now note that, as in Lemma A.2 of Smith (2011), we have
\[
T^{-1/2}\sum_{t=1}^{T} w_{tT} = T^{-1/2}\sum_{t=1}^{T} w_t + O_p(T^{-1/2}), \qquad
T^{1/2}\bar{g}_T(\theta_0) = T^{1/2}\sum_{t=1}^{T} g_t(\theta_0)/T + O_p(T^{-1/2}),
\]
and
\[
\sum_{t=1}^{T} w_{tT}\, O_p\Big(\frac{S_T}{T}\Big) = O_p\Big(\frac{S_T}{T^{1/2}}\Big)\Big[T^{-1/2}\sum_{t=1}^{T} w_t + O_p(T^{-1/2})\Big] = o_p(1).
\]
By arguments similar to those of the proof of Lemma A.3 of Smith (2011) we have
\[
\frac{S_T}{T}\, k\sum_{t=1}^{T} w_{tT}\, g_{tT}' = B_0 + o_p(1),
\]
where $B_0 = \sum_{s=-\infty}^{\infty} E[w_t g_{t-s}(\theta_0)']$. Hence
\[
\sqrt{T}\tilde{w} = T^{-1/2}\sum_{t=1}^{T} w_t - B_0 P\, T^{1/2}\bar{g}(\theta_0) + o_p(1).
\]
Now note that if $w_t = g(z_t,\theta_0)$ we have $T^{-1/2}\sum_{t=1}^{T} w_t = T^{1/2}\bar{g}(\theta_0)$ and $B_0 = \Omega$, and hence
\[
T^{-1/2}\sum_{t=1}^{T} w_t - B_0 P\, T^{1/2}\bar{g}(\theta_0) = T^{1/2}\bar{g}(\theta_0) - \Omega[\Omega^{-1} - \Omega^{-1}G\Sigma G'\Omega^{-1}]T^{1/2}\bar{g}(\theta_0) = G\Sigma G'\Omega^{-1}\, T^{1/2}\bar{g}(\theta_0).
\]
Proof of Theorem 4.5: Note that by CS,
\[
|\tilde{Q}_T(\theta) - \bar{Q}(\theta)| \le \|\tilde{g}(\theta) - \bar{g}(\theta)\|^2\, \|W_T\|.
\]
Note that by T,
\[
\sup_{\theta\in B}\|\tilde{g}(\theta) - \bar{g}(\theta)\| \le \sup_{\theta\in B}\|\tilde{g}(\theta) - E[g(z_t,\theta)]\| + \sup_{\theta\in B}\|\bar{g}(\theta) - E[g(z_t,\theta)]\|.
\]
Also, by a UWL,
\[
\sup_{\theta\in B}\|\bar{g}(\theta) - E[g(z_t,\theta)]\| = o_p(1).
\]
Now
\[
\sup_{\theta\in B}\|\tilde{g}(\theta) - E[g(z_t,\theta)]\| \le \sup_{\theta\in B}\|\tilde{g}(\theta) - \bar{g}_T(\theta)\| + \sup_{\theta\in B}\|\bar{g}_T(\theta) - E[g(z_t,\theta)]\|
\le \max_{1\le t\le T}|T\pi_t - 1|\,\sup_{\theta\in B}\|\bar{g}_T(\theta)\| + o_p(1) = o_p(1),
\]
by $\max_{1\le t\le T}|T\pi_t - 1| = o_p(1)$ and a UWL. Hence $|\tilde{Q}_T(\theta) - \bar{Q}(\theta)| = o_p(1)$, as $\|W_T - W\| = o_p(1)$. Thus the result follows by Theorem 2.1.

Proof of Theorem 4.6: The first-order conditions yield $\sqrt{T}\tilde{G}_T' W_T \tilde{g}_T(\tilde\theta) = 0$, where $\tilde{G}_T \equiv \partial\tilde{g}_T(\tilde\theta)/\partial\theta'$. Hence by a Taylor expansion,
\[
\sqrt{T}\tilde{G}_T' W_T \tilde{g}_T(\theta_0) + \tilde{G}_T' W_T \bar{G}_T \sqrt{T}(\tilde\theta - \theta_0) = 0,
\]
where $\bar{G}_T \equiv \partial\tilde{g}_T(\bar\theta)/\partial\theta'$ and $\bar\theta$ lies on the line joining $\tilde\theta$ and $\theta_0$. Solving for $\sqrt{T}(\tilde\theta - \theta_0)$ we obtain
\begin{align*}
\sqrt{T}(\tilde\theta - \theta_0) &= -(\tilde{G}_T' W_T \bar{G}_T)^{-1}\sqrt{T}\tilde{G}_T' W_T \tilde{g}_T(\theta_0) \qquad (A.2) \\
&= -(\tilde{G}_T' W_T \bar{G}_T)^{-1}\tilde{G}_T' W_T \{[G\Sigma G'\Omega^{-1}]T^{1/2}\bar{g}(\theta_0) + o_p(1)\}
\end{align*}
by Lemma A.2. By Lemma A.1 of Smith (2011) we have $\tilde{G}_T = G + o_p(1)$ and $\bar{G}_T = G + o_p(1)$; moreover $W_T = W + o_p(1)$ and $T^{1/2}\bar{g}(\theta_0) = O_p(1)$. Thus
\[
\sqrt{T}(\tilde\theta - \theta_0) = -(G'WG)^{-1}G'W G\Sigma G'\Omega^{-1}\, T^{1/2}\bar{g}(\theta_0) + o_p(1) = -\Sigma G'\Omega^{-1}\, T^{1/2}\bar{g}(\theta_0) + o_p(1),
\]
which corresponds to the asymptotic representation of the efficient GMM estimator (see, for instance, Hall, 2005, p. 70, eq. 3.26, with $W_T = \Omega^{-1}$).
A.2.2 Restricted models

For notational convenience we now define the restricted GEL estimator in a slightly different but equivalent manner to that of sub-section 4.2.6. Let
\begin{align*}
\bar{P}_n(\beta,\varphi) &= \frac{1}{T}\sum_{t=1}^{T}[\rho([\varphi' h^a_{tT}(\beta)]/k_2) - \rho_0], \\
P_n(\theta,\varphi) &= \bar{P}_n((\theta', 0')', \varphi) = \frac{1}{T}\sum_{t=1}^{T}[\rho([\varphi' h_{tT}(\theta)]/k_2) - \rho_0], \\
\tilde{P}_n(\beta,\varphi,\lambda) &= \frac{1}{T}\sum_{t=1}^{T}[\rho([\varphi' h^a_{tT}(\beta) + \lambda' r(\beta)]/k_2) - \rho_0],
\end{align*}
where $\beta = (\theta',\psi')'$, $h^a(z_t,\beta) = (g(z_t,\theta)', q(z_t,\theta)' - \psi')'$, and $h^a_{tT}(\beta) = \frac{1}{S_T}\sum_{s=t-T}^{t-1} k\big(\frac{s}{S_T}\big) h^a(z_{t-s},\beta)$, $t = 1,\dots,T$. Let $\mathcal{B}_r = \{\beta = (\theta',\psi')' : a(\theta) = 0,\ \psi = 0\}$; thus $\mathcal{B}_r = B_r \times \{0\}$. Let $\Lambda_T = \{\varphi : \|\varphi\| \le D(T/S_T^2)^{-\xi}\}$.

Let
\[
(\varphi(\beta)', \lambda(\beta)')' = \arg\max_{\varphi\in\Lambda_T,\ \lambda\in\mathbb{R}^s} \tilde{P}_n(\beta,\varphi,\lambda).
\]
Note that $\varphi(\beta)$ can also be defined as
\[
\varphi(\beta) = \arg\max_{\varphi\in\Lambda_T} \bar{P}_n(\beta,\varphi), \qquad \beta\in\mathcal{B}_r,
\]
and
\[
\hat\beta_r = \arg\min_{\beta\in\mathcal{B}_r} \bar{P}_n(\beta,\varphi(\beta)) = \arg\min_{\beta\in\mathcal{B}_r} \tilde{P}_n(\beta,\varphi(\beta),\lambda(\beta)),
\]
and let $\hat\varphi_r = \varphi(\hat\beta_r)$ and $\hat\lambda_r = \lambda(\hat\beta_r)$.

We note that $\hat\beta_r = S_1\hat\theta_r$, where
\[
\hat\theta_r = \arg\min_{\theta\in B_r}\sup_{\varphi\in\Lambda_T} P_n(\theta,\varphi)
\]
and $S_1$ is a matrix such that $S_1\hat\theta_r = (\hat\theta_r', 0_{s\times 1}')'$.

The following Theorem provides a convenient asymptotic representation of the restricted GEL estimator and corresponding Lagrange multiplier.
Theorem A.1 If Assumptions 2.4, 2.6, 4.6, 4.7 and 4.8 are satisfied, then $\hat\beta_r \overset{p}{\to} \beta_0$, $\hat\varphi_r \overset{p}{\to} 0$ and $\hat\lambda_r \overset{p}{\to} 0$. Moreover, $\|\hat\varphi_r\| = O_p[(T/S_T^2)^{-1/2}]$, $\|\hat\lambda_r\| = O_p[(T/S_T^2)^{-1/2}]$,
\[
S_1\sqrt{T}(\hat\theta_r - \theta_0) = -\big[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma\big]D'V^{-1}\sqrt{T}\,\bar{h}_T(\theta_0) + o_p(1),
\]
\[
\frac{\sqrt{T}\hat\varphi_r}{S_T} = -P_r\,\sqrt{T}\,\bar{h}_T(\theta_0) + o_p(1),
\]
where $P_r = V^{-1} - V^{-1}D S_1\big[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma\big]D'V^{-1}$.
Proof: Let $k = 1/k_2$. The first-order conditions are
\begin{align*}
k\frac{1}{T}\sum_{t=1}^{T}\rho_1(k\hat\varphi_r' h_{tT}(\hat\theta_r))\, h^a_{tT}(\hat\theta_r, 0) &= 0, \qquad (A.3)\\
k\frac{1}{T}\sum_{t=1}^{T}\rho_1(k\hat\varphi_r' h_{tT}(\hat\theta_r))\, r(\hat\theta_r, 0) &= 0, \\
k\frac{1}{T}\sum_{t=1}^{T}\rho_1(k\hat\varphi_r' h_{tT}(\hat\theta_r))\, D_{tT}(\hat\theta_r)'\hat\varphi_r + R(\hat\theta_r)'\hat\lambda_r &= 0,
\end{align*}
where $D_{tT}(\beta) = \partial h^a_{tT}(\beta)/\partial\beta'$ and $R(\beta) = \partial r(\beta)/\partial\beta'$. Note that $r(\hat\theta_r, 0) = 0$. Similarly to Theorem 2.5, we have $\hat\theta_r \to \theta_0$, $\hat\varphi_r \overset{p}{\to} 0$ and $\|\hat\varphi_r\| = O_p[(T/S_T^2)^{-1/2}]$. Therefore
\[
\max_{1\le t\le T}\big|\rho_1(k\hat\varphi_r' h^a_{tT}(\hat\theta_r, 0)) + 1\big| \overset{p}{\to} 0.
\]
Thus
\[
-k[1 + o_p(1)]\frac{1}{T}\sum_{t=1}^{T} D_{tT}(\hat\theta_r, 0)'\hat\varphi_r + R(\hat\theta_r, 0)'\hat\lambda_r = 0,
\]
and, as $\frac{1}{T}\sum_{t=1}^{T} D_{tT}(\hat\theta_r, 0)' \overset{p}{\to} D'$ by a UWL and $\hat\varphi_r \overset{p}{\to} 0$, we have
\[
(R + o_p(1))'\hat\lambda_r = o_p(1),
\]
and consequently, as $\mathrm{rank}(R) = r + s$, we must have $\hat\lambda_r \overset{p}{\to} 0$.

Note also that $\|\hat\varphi_r\| = O_p[(T/S_T^2)^{-1/2}]$; hence
\[
(R + o_p(1))'\hat\lambda_r = O_p[(T/S_T^2)^{-1/2}],
\]
and consequently $\hat\lambda_r = O_p[(T/S_T^2)^{-1/2}]$.
Now a first-order Taylor expansion of the lhs of (A.3) around $\varphi_r = 0$ gives
\[
-k\frac{1}{T}\sum_{t=1}^{T} h_{tT}(\hat\theta_r) + \frac{1}{T}\sum_{t=1}^{T}\rho_2(k\tilde\varphi_r' h^a_{tT}(\hat\theta_r, 0))\, h_{tT}(\hat\theta_r)\, h_{tT}(\hat\theta_r)'\hat\varphi_r = 0, \qquad (A.4)
\]
where $\tilde\varphi_r$ lies on the line joining $\hat\varphi_r$ and $0$. Now note that by a Taylor expansion we have
\[
h_{tT}(\hat\theta_r) = h_{tT}(\theta_0) + D_{tT}(\tilde\theta_r)\, S_1(\hat\theta_r - \theta_0), \qquad (A.5)
\]
where $\tilde\theta_r$ lies on the line joining $\hat\theta_r$ and $\theta_0$. Replacing (A.5) in (A.4) yields
\[
-k\frac{1}{T}\sum_{t=1}^{T} h_{tT}(\theta_0) - k\frac{1}{T}\sum_{t=1}^{T} D_{tT}(\tilde\theta_r)\, S_1(\hat\theta_r - \theta_0) + \frac{S_T}{T}\sum_{t=1}^{T}\rho_2(k\tilde\varphi_r' h_{tT}(\hat\theta_r))\, h_{tT}(\hat\theta_r)\, h_{tT}(\hat\theta_r)'\,\frac{\hat\varphi_r}{S_T} = 0.
\]
Now, as $\hat\lambda_r = O_p(S_T/\sqrt{T})$, $\hat\varphi_r = O_p(S_T/\sqrt{T})$, $\sqrt{T}(\hat\theta_r - \theta_0) = O_p(1)$ (which is a consequence of the fact that $\bar{h}_T(\hat\theta_r) = O_p(T^{-1/2})$, by Theorem 2.2 of Smith (2011) and Assumption 4.8) and $\max_{1\le t\le T}|\rho_2(k\hat\varphi_r' h_{tT}(\hat\theta_r)) + 1| = o_p(1)$, we have, by a UWL and continuity of $R(\cdot)$,
\[
D'\frac{\sqrt{T}}{S_T}\hat\varphi_r + R'\frac{\sqrt{T}}{S_T}\hat\lambda_r = o_p(1),
\]
\[
\sqrt{T}\,\bar{h}_T(\theta_0) + D S_1\sqrt{T}(\hat\theta_r - \theta_0) + V\,\frac{\sqrt{T}\hat\varphi_r}{S_T} = o_p(1),
\]
\[
R\sqrt{T}\, S_1(\hat\theta_r - \theta_0) = o_p(1).
\]
Multiplying the first equation by $R\Sigma$, where $\Sigma = [D'V^{-1}D]^{-1}$, and solving for $\hat\lambda_r$, we obtain
\[
\frac{\sqrt{T}}{S_T}\hat\lambda_r = -(R\Sigma R')^{-1} R\Sigma D'\,\frac{\sqrt{T}}{S_T}\hat\varphi_r + o_p(1).
\]
Replacing this in the first equation yields
\[
D'\frac{\sqrt{T}}{S_T}\hat\varphi_r - R'(R\Sigma R')^{-1}R\Sigma D'\frac{\sqrt{T}}{S_T}\hat\varphi_r = o_p(1),
\]
and multiplying both sides by $\Sigma$ we have
\[
\Sigma D'\frac{\sqrt{T}}{S_T}\hat\varphi_r - \Sigma R'(R\Sigma R')^{-1}R\Sigma D'\frac{\sqrt{T}}{S_T}\hat\varphi_r = o_p(1),
\]
which is equivalent to
\[
\big[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma\big] D'\frac{\sqrt{T}}{S_T}\hat\varphi_r = o_p(1).
\]
Consider now
\[
\sqrt{T}\,\bar{h}_T(\theta_0) + D S_1\sqrt{T}(\hat\theta_r - \theta_0) + V\,\frac{\sqrt{T}\hat\varphi_r}{S_T} = o_p(1).
\]
Multiplying both sides by $[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma]D'V^{-1}$ we obtain
\[
\big[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma\big]D'V^{-1}\sqrt{T}\,\bar{h}_T(\theta_0) + \big[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma\big]D'V^{-1}D S_1\sqrt{T}(\hat\theta_r - \theta_0) = o_p(1).
\]
Now
\[
R\Sigma D'V^{-1}D S_1\sqrt{T}(\hat\theta_r - \theta_0) = R S_1\sqrt{T}(\hat\theta_r - \theta_0) = o_p(1).
\]
Hence we have
\[
S_1\sqrt{T}(\hat\theta_r - \theta_0) = -\big[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma\big]D'V^{-1}\sqrt{T}\,\bar{h}_T(\theta_0) + o_p(1). \qquad (A.6)
\]
Additionally, note that
\[
\sqrt{T}\,\bar{h}_T(\theta_0) + D S_1\sqrt{T}(\hat\theta_r - \theta_0) + V\,\frac{\sqrt{T}\hat\varphi_r}{S_T} = o_p(1);
\]
replacing (A.6) in this equation and solving for $\sqrt{T}\hat\varphi_r/S_T$ yields
\begin{align*}
\frac{\sqrt{T}\hat\varphi_r}{S_T} &= -V^{-1}\big[\sqrt{T}\,\bar{h}_T(\theta_0) + D S_1\sqrt{T}(\hat\theta_r - \theta_0)\big] \\
&= -V^{-1}\big[\sqrt{T}\,\bar{h}_T(\theta_0) - D S_1[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma]D'V^{-1}\sqrt{T}\,\bar{h}_T(\theta_0)\big] \\
&= \big[-V^{-1} + V^{-1}D S_1[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma]D'V^{-1}\big]\sqrt{T}\,\bar{h}_T(\theta_0).
\end{align*}
Let
\[
\tilde\pi_t = \frac{\rho_1([\hat\varphi_r' h_{tT}(\hat\theta_r)]/k_2)}{\sum_{j=1}^{T}\rho_1([\hat\varphi_r' h_{jT}(\hat\theta_r)]/k_2)}, \quad t = 1,\dots,T.
\]
Lemma A.3 If Assumptions 2.4, 2.6, 2.7 and 2.8 are satisfied, then $T\tilde\pi_t = 1 + o_p(1)$ and
\[
T^{1/2}(\tilde\pi_t - 1/T) = \frac{S_T}{T}\, h_{tT}'\,\frac{T^{1/2}}{S_T}\hat\varphi_r\,(1/k_2 + o_p(1)) + O_p\Big(\frac{S_T}{T}\Big)
\]
uniformly in $t = 1,\dots,T$.

Proof: This is similar to the proof of Lemma A.1.

Let $\tilde{w}_r = \sum_{t=1}^{T}\tilde\pi_t w_{tT}$.

Lemma A.4 Under Assumptions 2.4, 2.6, 4.6, 4.7, 4.8 and A.1,
\[
\sqrt{T}\tilde{w}_r = T^{-1/2}\sum_{t=1}^{T} w_t - J_0 P_r\, T^{1/2}\bar{h}(\theta_0) + o_p(1),
\]
where $J_0 = \sum_{s=-\infty}^{\infty} E[w_t h_{t-s}(\theta_0)']$ and $P_r = V^{-1} - V^{-1}D S_1[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma]D'V^{-1}$. Additionally, if $w_t = h(z_t,\theta_0)$ we have
\[
\sqrt{T}\tilde{w}_r = D S_1\big[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma\big]D'V^{-1}\sqrt{T}\,\bar{h}(\theta_0) + o_p(1).
\]
Proof: Note that by Lemma A.3,
\begin{align*}
T^{1/2}\tilde{w}_r &= T^{1/2}\sum_{t=1}^{T}\tilde\pi_t w_{tT} \\
&= T^{1/2}\sum_{t=1}^{T} w_{tT}/T + \sum_{t=1}^{T}\Big[\frac{S_T}{T}\, h_{tT}'\,\frac{T^{1/2}}{S_T}\hat\varphi_r\,(k + o_p(1)) + O_p\Big(\frac{S_T}{T}\Big)\Big] w_{tT} \\
&= T^{-1/2}\sum_{t=1}^{T} w_{tT} + \frac{S_T}{T}\sum_{t=1}^{T} w_{tT}\, h_{tT}'\,\frac{T^{1/2}}{S_T}\hat\varphi_r\,(k + o_p(1)) + \sum_{t=1}^{T} w_{tT}\, O_p\Big(\frac{S_T}{T}\Big).
\end{align*}
Now by Theorem A.1 we have
\[
\frac{T^{1/2}}{S_T}\hat\varphi_r = -T^{1/2} P_r\,\bar{h}(\theta_0) + o_p(1).
\]
Thus
\[
T^{1/2}\tilde{w}_r = T^{-1/2}\sum_{t=1}^{T} w_{tT} + \frac{S_T}{T}\sum_{t=1}^{T} w_{tT}\, h_{tT}'\,[-T^{1/2} P_r\,\bar{h}(\theta_0) + o_p(1)](k + o_p(1)) + \sum_{t=1}^{T} w_{tT}\, O_p\Big(\frac{S_T}{T}\Big),
\]
where $\bar{h}_T(\theta) = \frac{1}{T}\sum_{t=1}^{T}\frac{1}{S_T}\sum_{s=t-T}^{t-1} k(s/S_T)\, h_{t-s}(\theta)$ and $h_{tT} = h_{tT}(\theta)$. Now note that, as in Lemma A.2 of Smith (2011), we have
\[
T^{-1/2}\sum_{t=1}^{T} w_{tT} = T^{-1/2}\sum_{t=1}^{T} w_t + O_p(T^{-1/2}), \qquad
T^{1/2}\bar{h}_T(\theta) = T^{1/2}\sum_{t=1}^{T} h_t(\theta_0)/T + O_p(T^{-1/2}),
\]
and
\[
\sum_{t=1}^{T} w_{tT}\, O_p\Big(\frac{S_T}{T}\Big) = O_p\Big(\frac{S_T}{T^{1/2}}\Big)\Big[T^{-1/2}\sum_{t=1}^{T} w_t + O_p(T^{-1/2})\Big] = o_p(1).
\]
By arguments similar to those of the proof of Lemma A.3 of Smith (2011) we have
\[
\frac{S_T}{T}\, k\sum_{t=1}^{T} w_{tT}\, h_{tT}' = J_0 + o_p(1),
\]
where $J_0 = \sum_{s=-\infty}^{\infty} E[w_t h_{t-s}(\theta_0)']$. Thus
\[
\sqrt{T}\tilde{w}_r = T^{-1/2}\sum_{t=1}^{T} w_t - J_0 P_r\, T^{1/2}\bar{h}(\theta_0) + o_p(1).
\]
Now note that if $w_t = h(z_t,\theta_0)$ we have $T^{-1/2}\sum_{t=1}^{T} w_t = T^{1/2}\bar{h}(\theta_0)$ and $J_0 = V$, and hence
\[
T^{-1/2}\sum_{t=1}^{T} w_t - J_0 P_r\, T^{1/2}\bar{h}(\theta_0) = D S_1\big[\Sigma - \Sigma R'(R\Sigma R')^{-1}R\Sigma\big]D'V^{-1}\sqrt{T}\,\bar{h}(\theta_0) + o_p(1).
\]
A.3 Proofs of the results in sub-section 3 and auxiliary Lemmata on the weighted kernel block bootstrap method

In this Appendix we present the bootstrap LLN, CLT and UWL that are required to prove the results.

Proof of Theorem 3.1: Let
\[
q_{tT} \equiv \tilde{Y} + (S_T/k_2)^{1/2}(Y_{tT} - \tilde{Y}), \quad (t = 1,\dots,T),
\]
\[
q^*_{tT} \equiv \tilde{Y} + (S_T/k_2)^{1/2}(Y^*_{tT} - \tilde{Y}), \quad (t = 1,\dots,m_T),
\]
and
\[
\tilde{q} \equiv \sum_{t=1}^{T} q_{tT}\, p_{tT} = \sum_{t=1}^{T}\tilde{Y} p_{tT} + (S_T/k_2)^{1/2}\sum_{t=1}^{T}(Y_{tT} - \tilde{Y}) p_{tT} = \tilde{Y}, \qquad
\tilde{q}^* \equiv \sum_{t=1}^{m_T} q^*_{tT}/m_T.
\]
Thus
\[
\sqrt{m_T}(\tilde{q}^* - \tilde{q}) = \sqrt{m_T}\,(S_T/k_2)^{1/2}(\bar{Y}^* - \tilde{Y}) = \sqrt{T/k_2}\,(\bar{Y}^* - \tilde{Y}),
\]
and consequently
\[
P^*\{\sqrt{T/k_2}\,(\bar{Y}^* - \tilde{Y}) \le x\} = P^*\{\sqrt{m_T}(\tilde{q}^* - \tilde{q}) \le x\},
\]
where $\bar{Y}^* = \frac{1}{m_T}\sum_{j=1}^{m_T} Y^*_{jT}$. The result is proven if we are able to show the following steps. Step 1: $\bar{X} \overset{p}{\to} 0$. Step 2: $T^{1/2}\bar{X}/\sigma_1 \overset{d}{\to} N(0,1)$. Step 3: $\sup_{x\in\mathbb{R}} |P\{T^{1/2}\bar{X} \le x\} - \Phi(x/\sigma_1)| \to 0$, where $\Phi(\cdot)$ is the c.d.f. of the standard normal distribution. Step 4: $m_T\,\mathrm{var}^*[\tilde{q}^*] \overset{p}{\to} \sigma_1^2$. Step 5:
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}}\Big|P^*\Big\{\frac{\sqrt{m_T}(\tilde{q}^* - \tilde{q})}{\mathrm{var}^*[\sqrt{m_T}\,\tilde{q}^*]^{1/2}} \le x\Big\} - \Phi(x)\Big| \ge \varepsilon\Big) = 0.
\]
Step 1 follows from the ergodic theorem (Theorem 3.34 of White, 1999). Step 2 follows by White (1999, Theorem 5.20). Step 3 follows from Step 2 and the Polya Theorem (Serfling, 2002, p. 18), as $\Phi(\cdot)$ is a continuous c.d.f. To prove Step 4, note that
\[
E^*(q^*_{tT}) = E^*(\tilde{Y} + (S_T/k_2)^{1/2}(Y^*_{tT} - \tilde{Y})) = \tilde{Y}
\]
and
\begin{align*}
\mathrm{var}^*(q^*_{tT}) &= \mathrm{var}^*(\tilde{Y} + (S_T/k_2)^{1/2}(Y^*_{tT} - \tilde{Y})) = (S_T/k_2)\,\mathrm{var}^*(Y^*_{tT}) \\
&= \frac{S_T}{k_2}\sum_{t=1}^{T}(Y_{tT} - \tilde{Y})^2 p_{tT} = \frac{S_T}{k_2}\sum_{t=1}^{T} Y_{tT}^2\, p_{tT} - \frac{S_T}{k_2}\tilde{Y}^2 \\
&= \frac{S_T}{k_2}\frac{1}{T}\sum_{t=1}^{T} Y_{tT}^2\, T p_{tT} + O_p\Big(\frac{S_T}{T}\Big) = \frac{S_T}{k_2}\frac{1}{T}\sum_{t=1}^{T} Y_{tT}^2\,(1 + o_p(1)) + O_p\Big(\frac{S_T}{T}\Big) \\
&= \sigma_1^2 + o_p(1),
\end{align*}
since $\max_{1\le t\le T}|T p_{tT}| \overset{p}{\to} 1$, $\tilde{Y} = O_p(1/\sqrt{T})$, and by Lemma A.3 of Smith (2011). For Step 5, since the bootstrap sample observations are independent, we can apply the Berry-Esseen inequality. Thus
\[
\sup_{x\in\mathbb{R}}\Big|P^*\Big\{\frac{\sqrt{m_T}(\tilde{q}^* - \tilde{q})}{\mathrm{var}^*[\sqrt{m_T}\tilde{q}^*]^{1/2}} \le x\Big\} - \Phi(x)\Big| \le \frac{C}{m_T^{1/2}}\, E^*\Big[\Big(\frac{|q^*_{tT} - \tilde{q}|}{\mathrm{var}^*[q^*_{tT}]^{1/2}}\Big)^3\Big] = \frac{C}{m_T^{1/2}}\,\mathrm{var}^*[q^*_{tT}]^{-3/2}\, E^*[|q^*_{tT} - \tilde{q}|^3].
\]
Note that $\mathrm{var}^*[q^*_{tT}] = \sigma_1^2 + o_p(1)$ and that
\[
E^*[|q^*_{tT} - \tilde{q}|^3] = \sum_{t=1}^{T}|q_{tT} - \tilde{q}|^3\, p_{tT} \le \max_t |q_{tT} - \tilde{q}|\sum_{t=1}^{T}|q_{tT} - \tilde{q}|^2\, p_{tT}.
\]
Now
\[
\max_t |q_{tT} - \tilde{q}| = O(S_T^{1/2})\max_t |Y_{tT} - \tilde{Y}| = O_p(S_T^{1/2} T^{1/\eta})
\]
by Lemma A.1 of Smith (2011) and M, with $\eta > \max(4v, 1/\delta)$. Hence
\[
E^*[|q^*_{tT} - \tilde{q}|^3] = O_p(S_T^{1/2} T^{1/\eta}).
\]
Thus
\[
\frac{C}{m_T^{1/2}}\, E^*[|q^*_{tT} - \tilde{q}|^3] = \frac{S_T^{1/2}}{T^{1/2}}\, O_p(S_T^{1/2} T^{1/\eta}) = O_p(S_T T^{1/\eta - 1/2}) = O(T^{1/\eta - \delta})\, o_p(1) \qquad (A.7)
\]
since $S_T = O(T^{1/2-\delta})$. Now, as $\eta > \max(4v, 1/\delta) > 1/\delta$, we have $1/\eta < \delta$, and the result follows as $\mathrm{var}^*[q^*_{tT}] = \sigma_1^2 + o_p(1)$.
Assumption A.2 (a) $E[\|X_t\|^{4v}] < \infty$; (b) $\Sigma_1 \equiv \lim_{T\to\infty}\mathrm{var}[T^{1/2}\bar{X}]$ is finite and positive definite.

Theorem A.2 Let Assumptions 2.4, 2.5 and A.2 be satisfied. If $E[X_t] = 0$ and $m_T = T/S_T$, then
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}^d}\big|P^*\{T^{1/2}(\bar{Y}^* - \tilde{Y}) \le x\} - P\{T^{1/2}\bar{X} \le x\}\big| \ge \varepsilon\Big) = 0.
\]

Proof of Theorem A.2: Let $q_{tT}$, $q^*_{tT}$, $\tilde{q}$ and $\tilde{q}^*$ be defined as in the proof of Theorem 3.1. The result is proven if we are able to show the following steps; cf. Politis and Romano (1992b, Proof of Theorem 2). Step 1: $\bar{X} \overset{p}{\to} 0$. Step 2: $T^{1/2}\Sigma_1^{-1/2}\bar{X} \overset{d}{\to} N(0, I_d)$. Step 3: $\sup_{x\in\mathbb{R}^d}|P\{T^{1/2}\bar{X} \le x\} - \Phi_d(\Sigma_1^{-1/2}x)| \to 0$, where $\Phi_d(\cdot)$ is the c.d.f. of the standard $d$-variate normal distribution. Step 4: $T\,\mathrm{var}^*[\tilde{q}^*] \overset{p}{\to} \Sigma_1$. Step 5:
\[
\lim_{T\to\infty} P\Big(\sup_{x\in\mathbb{R}^d}\big|P^*\{\mathrm{var}^*[\tilde{q}^*]^{-1/2}(\tilde{q}^* - \tilde{q}) \le x\} - \Phi_d(x)\big| \ge \varepsilon\Big) = 0.
\]
The proofs of Steps 1-4 are analogous to the proofs of Steps 1-4 in Theorem 3.1. As pointed out by Cattaneo et al. (2010), to prove Step 5 we need to show that
\[
\lim_{T\to\infty} P\Big(\sup_{\mu\in\Theta_d}\sup_{x\in\mathbb{R}}\Big|P^*\Big\{\frac{m_T^{1/2}\,\mu'(\tilde{q}^* - \tilde{q})}{\mathrm{var}^*[\mu' q^*_{1T}]^{1/2}} \le x\Big\} - \Phi(x)\Big| \ge \varepsilon\Big) = 0,
\]
where $\Theta_d = \{\mu\in\mathbb{R}^d : \mu'\mu = 1\}$. Let $\bar\Theta_d = \{\mu\in\mathbb{R}^d : \mu'\mu \le 1\}$ and note that $\Theta_d \subset \bar\Theta_d$.
Given the sample, the bootstrap observations are independent; hence we can apply the Berry-Esseen inequality. Thus
\begin{align*}
\sup_{\mu\in\Theta_d}\sup_{x\in\mathbb{R}}\Big|P^*\Big\{\frac{m_T^{1/2}(\mu'\tilde{q}^* - \mu'\tilde{q})}{\mathrm{var}^*[\mu' q^*_{1T}]^{1/2}} \le x\Big\} - \Phi(x)\Big|
&\le \sup_{\mu\in\Theta_d}\frac{C}{m_T^{1/2}}\, E^*\Big[\Big(\frac{|\mu'(q^*_{1T} - \tilde{q})|}{\mathrm{var}^*[\mu' q^*_{1T}]^{1/2}}\Big)^3\Big] \\
&= \sup_{\mu\in\Theta_d}\frac{C}{m_T^{1/2}}\,\mathrm{var}^*[\mu' q^*_{1T}]^{-3/2}\, E^*[|\mu' q^*_{1T} - \mu'\tilde{q}|^3] \\
&\le \frac{C}{\inf_{\mu\in\Theta_d}\mathrm{var}^*[\mu' q^*_{1T}]^{3/2}}\,\sup_{\mu\in\bar\Theta_d}\frac{S_T^{1/2}}{T^{1/2}}\, E^*[|\mu' q^*_{1T} - \mu'\tilde{q}|^3].
\end{align*}
Now for fixed $\mu$ we have
\[
\frac{S_T^{1/2}}{T^{1/2}}\, E^*[|\mu' q^*_{1T} - \mu'\tilde{q}|^3] = \frac{S_T^{1/2}}{T^{1/2}}\frac{1}{T}\sum_{t=1}^{T}|\mu' q_{tT} - \mu'\tilde{q}|^3 = o_p(1),
\]
as in (A.7). Since $\bar\Theta_d$ is compact and convex and $|\cdot|^3$ is a convex function, we can apply the Convexity Lemma of Pollard (1991, p. 187) to strengthen pointwise convergence to uniform convergence; therefore $\sup_{\mu\in\bar\Theta_d} S_T^{1/2}T^{-1/2}E^*[|\mu' q^*_{1T} - \mu'\tilde{q}|^3] = o_p(1)$, using also the fact that $E[\sup_{\mu\in\Theta_d}|\mu' X_t|^{4v}] \le E[\|X_t\|^{4v}] < \infty$ by CS. Additionally, by Lemma A.3 of Smith (2011) we have
\begin{align*}
\inf_{\mu\in\Theta_d}\frac{1}{T}\sum_{t=1}^{T}|\mu' q_{tT} - \mu'\tilde{q}|^2 &= \inf_{\mu\in\Theta_d}\mu'\Big(\frac{1}{T}\sum_{t=1}^{T}(q_{tT} - \tilde{q})(q_{tT} - \tilde{q})'\Big)\mu \\
&= \inf_{\mu\in\Theta_d}\mu'\Sigma_1\mu + o_p(1) = \inf_{\mu\in\Theta_d}\mu' Q P Q'\mu + o_p(1) \\
&= \inf_{\mu\in\Theta_d}\mu' P\mu + o_p(1) \ge p_{\min} + o_p(1),
\end{align*}
where $P$ is the diagonal matrix of eigenvalues of $\Sigma_1$, $Q$ is the corresponding orthonormal matrix of eigenvectors, and $p_{\min} > 0$ is the smallest eigenvalue of $P$. Hence the result follows.
Let $\bar Y\equiv\frac{1}{T}\sum_{t=1}^TY_{tT}$ and $\tilde Y\equiv\sum_{t=1}^TY_{tT}\,p_{tT}$.

Assumption A.3 (a) The finite-dimensional stochastic process $\{X_t\}_{t=1}^\infty$ is stationary and ergodic; (b) $E[|X_t|^\gamma]<\infty$ for some $\gamma\ge1$; (c) $T^{1/\gamma}/m_T=o(1)$.

Lemma A.5 Let Assumptions A.3, 3.2 and 3.3 (a) hold. Then
\[
\bar Y^*-\bar Y\to0,\ \text{prob-}P^*,\ \text{prob-}P, \tag{A.8}
\]
\[
\bar Y^*-\tilde Y\to0,\ \text{prob-}P^*,\ \text{prob-}P. \tag{A.9}
\]
Proof: If we prove (A.9), then (A.8) follows from it and the fact that
\[
|\bar Y-\tilde Y|=\Big|\frac{1}{T}\sum_{t=1}^TY_{tT}-\sum_{t=1}^TY_{tT}p_{tT}\Big|=\Big|\frac{1}{T}\sum_{t=1}^TY_{tT}(1-Tp_{tT})\Big|\le\Big|\frac{1}{T}\sum_{t=1}^TY_{tT}\Big|\max_t|1-Tp_{tT}|\xrightarrow{p}0
\]
by the ergodic theorem and the fact that $\max_{1\le t\le T}|Tp_{tT}|=1+o_p(1)$. First note that
\[
E^*[|Y^*_{tT}|]=\sum_{t=1}^Tp_{tT}\Big|\frac{1}{S_T}\sum_{s=t-T}^{t-1}k\Big(\frac{s}{S_T}\Big)X_{t-s}\Big|
\le(1+o_p(1))\frac{1}{T}\sum_{t=1}^T\Big|\frac{1}{S_T}\sum_{s=t-T}^{t-1}k\Big(\frac{s}{S_T}\Big)X_{t-s}\Big|
=(1+o_p(1))\frac{1}{T}\sum_{t=1}^T\Big|\frac{1}{S_T}\sum_{j=1}^Tk\Big(\frac{t-j}{S_T}\Big)X_j\Big|
\le(1+o_p(1))\frac{1}{T}\sum_{t=1}^T\frac{1}{S_T}\sum_{j=1}^T\Big|k\Big(\frac{t-j}{S_T}\Big)\Big||X_j|
=(1+o_p(1))\frac{1}{T}\sum_{j=1}^T|X_j|\frac{1}{S_T}\sum_{s=1-j}^{T-j}\Big|k\Big(\frac{s}{S_T}\Big)\Big|.
\]
By Smith (2011, equation (A.4)) we have
\[
\frac{1}{S_T}\sum_{s=1-j}^{T-j}\Big|k\Big(\frac{s}{S_T}\Big)\Big|=O(1)
\]
uniformly in $j$. Also, by the ergodic theorem (White, 1999, Theorem 2.34), $\sum_{j=1}^T|X_j|/T=O_p(1)$. Thus $E^*[|Y^*_{tT}|]=O_p(1)$. In addition, by T,
\[
\Big|\sum_{t=1}^T|Y_{tT}|p_{tT}-\sum_{t=1}^T|Y_{tT}|p_{tT}I(|Y_{tT}|<\delta m_T)\Big|\le(1+o_p(1))\frac{1}{T}\sum_{t=1}^T|Y_{tT}|I(|Y_{tT}|\ge\delta m_T).
\]
Now by M
\[
\max_t|Y_{tT}|=O(1)\max_t|X_t|=O_p(T^{1/\gamma}).
\]
Since $T^{1/\gamma}/m_T=o(1)$, it follows that $\max_tI(|Y_{tT}|\ge\delta m_T)=o_p(1)$. Thus
\[
\frac{1}{T}\sum_{t=1}^T|Y_{tT}|I(|Y_{tT}|\ge\delta m_T)\le\frac{1}{T}\sum_{t=1}^T|Y_{tT}|\max_tI(|Y_{tT}|\ge\delta m_T)=o_p(1).
\]
The remaining part of the proof is similar to the proof of Khinchine's weak law of large numbers given in Rao (2002). Define a pair of new random variables for each $T$, $(t=1,\ldots,m_T)$:
\[
W_{tT}=Y^*_{tT},\ Z_{tT}=0\ \text{if }|Y^*_{tT}|<\delta m_T;\qquad W_{tT}=0,\ Z_{tT}=Y^*_{tT}\ \text{if }|Y^*_{tT}|\ge\delta m_T.
\]
Hence $Y^*_{tT}=W_{tT}+Z_{tT}$. Define
\[
\mu_T=E^*[W_{tT}]=\sum_{t=1}^Tp_{tT}Y_{tT}I[|Y_{tT}|<\delta m_T].
\]
Note that $\tilde Y=E^*[Y^*_{tT}]$ and $|\tilde Y-\mu_T|<\varepsilon$ for any $\varepsilon>0$ and $T$ large enough. The latter claim holds since, by T,
\[
\Big|\sum_{t=1}^Tp_{tT}Y_{tT}I[|Y_{tT}|<\delta m_T]-\sum_{t=1}^Tp_{tT}Y_{tT}\Big|\le(1+o_p(1))\frac{1}{T}\sum_{t=1}^T|Y_{tT}|I(|Y_{tT}|\ge\delta m_T)=o_p(1).
\]
Now
\[
\operatorname{var}^*[W_{tT}]=E^*[W_{tT}^2]-\mu_T^2\le E^*[W_{tT}^2]\le\delta m_TE^*[|W_{tT}|].
\]
Thus, writing $\bar W=\sum_{t=1}^{m_T}W_{tT}/m_T$ and using C,
\[
P^*\{|\bar W-\mu_T|\ge\varepsilon\}\le\frac{\operatorname{var}^*[W_{tT}]}{\varepsilon^2m_T}\le\frac{\delta E^*[|W_{tT}|]}{\varepsilon^2}.
\]
Hence, since $|\tilde Y-\mu_T|<\varepsilon$ for any $\varepsilon>0$ and $T$ large enough,
\[
P^*\{|\bar W-\tilde Y|\ge2\varepsilon\}\le\frac{\delta E^*[|W_{tT}|]}{\varepsilon^2}. \tag{A.10}
\]
Now by M it follows that
\[
P^*\{Z_{tT}\ne0\}=P^*\{|Y^*_{tT}|\ge\delta m_T\}\le\frac{1}{\delta m_T}E^*[|Y^*_{tT}|I[|Y^*_{tT}|\ge\delta m_T]]\le\frac{\delta}{m_T}
\]
w.p.a.1. To see this, as $E^*[|Y^*_{tT}|]=O_p(1)$, it follows that $E^*[|Y^*_{tT}|I[|Y^*_{tT}|\ge\delta m_T]]=o_p(1)$; thus we can always choose a constant $\delta_2\le\delta^2$ such that for $T$ large enough $E^*[|Y^*_{tT}|I[|Y^*_{tT}|\ge\delta m_T]]\le\delta_2$ w.p.a.1. Write $\bar Z=\sum_{t=1}^{m_T}Z_{tT}/m_T$. Note that
\[
P^*\{\bar Z\ne0\}\le P^*\{\max_tZ_{tT}\ne0\}\le\sum_{t=1}^{m_T}P^*\{Z_{tT}\ne0\}\le\delta. \tag{A.11}
\]
From eqs. (A.10) and (A.11),
\[
P^*\{|\bar Y^*-\tilde Y|\ge4\varepsilon\}=P^*\{|\bar W-\tilde Y+\bar Z|\ge4\varepsilon\}\le P^*\{|\bar W-\tilde Y|+|\bar Z|\ge4\varepsilon\}
\le P^*\{|\bar W-\tilde Y|\ge2\varepsilon\}+P^*\{|\bar Z|\ge2\varepsilon\}
\le\frac{\delta E^*[|W_{tT}|]}{\varepsilon^2}+P^*\{\bar Z\ne0\}\le\frac{\delta E^*[|W_{tT}|]}{\varepsilon^2}+\delta.
\]
Now choose $\delta$ small enough. As $E^*[|W_{tT}|]\le E^*[|Y^*_{tT}|]=O_p(1)$, the result follows from M.
The following theorem is due to Ranga Rao [see Wooldridge, 1994].

Theorem A.3 Let $\Theta\subset\mathbb{R}^p$, let $\{X_t\in\mathcal{X}:t=1,2,\ldots\}$ be a sequence of stationary and ergodic $m\times1$ random vectors and let $f:\mathcal{X}\times\Theta\to\mathbb{R}$ be a real-valued function. Assume that: (a) $\Theta$ is compact; (b) for each $\theta$, $f(\cdot,\theta)$ is measurable and, for each $x_t\in\mathcal{X}$, $f(x_t,\cdot)$ is continuous on $\Theta$; (c) $E[\sup_{\theta\in\Theta}|f(X_t,\theta)|]<\infty$. Then
\[
\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^Tf(X_t,\theta)-E[f(X_t,\theta)]\Big|=o_p(1).
\]
The following lemma corresponds to a weak uniform law of large numbers for kernel block bootstrapped sequences.
Lemma A.6 Let $\{X_t\in\mathcal{X}:t=1,2,\ldots\}$ be a sequence of stationary and ergodic $m\times1$ random vectors and let
\[
q_{tT}(\theta)=\frac{1}{S_T}\sum_{s=t-T}^{t-1}k\Big(\frac{s}{S_T}\Big)g(X_{t-s},\theta), \tag{A.12}
\]
and consider the sample $q_{tT}(\theta)$, $(t=1,\ldots,T)$. Draw a random sample of size $m_T$ with replacement from $q_{tT}(\theta)$, $(t=1,\ldots,T)$, to obtain the bootstrap sample $q^*_{sT}(\theta)$, $(s=1,\ldots,m_T)$, where $P(q^*_{sT}(\theta)=q_{tT}(\theta))=p_{tT}$ for $s=1,\ldots,m_T$ and $t=1,\ldots,T$. Assume that Assumptions 3.2 and 3.3 (a) hold and that: (a) (Bootstrap pointwise weak law of large numbers) for each fixed $\theta\in\Theta\subset\mathbb{R}^p$, $\Theta$ a compact set,
\[
\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\to0,\ \text{prob-}P^*,\ \text{prob-}P;
\]
(b) (Uniform convergence)
\[
\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)\Big|\xrightarrow{p}0,\qquad E\Big[\sup_{\theta\in\Theta}|g(X_t,\theta)|\Big]\le\Delta;
\]
(c) for each $\theta$, $g(\cdot,\theta)$ is measurable and, for each $x_t\in\mathcal{X}$, $g(x_t,\cdot)$ is continuous on $\Theta$. Then, as $m_T\to\infty$ and $S_T=o(T^{1/2})$, for any $\epsilon>0$ and $\eta>0$,
\[
\lim_{T\to\infty}P\Big\{P^*\Big\{\sup_{\theta\in\Theta}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)\Big|>\epsilon\Big\}>\eta\Big\}=0,
\]
\[
\lim_{T\to\infty}P\Big\{P^*\Big\{\sup_{\theta\in\Theta}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|>\epsilon\Big\}>\eta\Big\}=0.
\]
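For intuition, the two objects in Lemma A.6 — the kernel-smoothed blocks $q_{tT}$ in (A.12) and multinomial resampling with probabilities $p_{tT}$ — can be sketched in Python. The Bartlett-type kernel, the bandwidth and the uniform probabilities below are illustrative choices, not the ones mandated by the paper's assumptions:

```python
import numpy as np

def kbb_blocks(x, S_T, kernel=lambda u: np.maximum(0.0, 1.0 - np.abs(u))):
    """Kernel-weighted blocks q_{tT} = S_T^{-1} sum_{s=t-T}^{t-1} k(s/S_T) x_{t-s},
    i.e. q_{tT} = S_T^{-1} sum_{j=1}^{T} k((t-j)/S_T) x_j after the change of
    variable j = t - s.  Bartlett-type kernel used purely for illustration."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    t = np.arange(1, T + 1)
    weights = kernel((t[:, None] - t[None, :]) / S_T)  # entry (t, j) = k((t-j)/S_T)
    return weights @ x / S_T

def kbb_resample(q, m_T, p=None, rng=None):
    """Draw m_T blocks with replacement, P(q* = q_{tT}) = p_{tT} (uniform default)."""
    rng = np.random.default_rng(rng)
    T = len(q)
    p = np.full(T, 1.0 / T) if p is None else p
    return rng.choice(q, size=m_T, replace=True, p=p)
```

For a moment function $g(X_t,\theta)$ one applies the same construction to the series $g(X_t,\theta)$; the bootstrap mean $m_T^{-1}\sum_t q^*_{tT}(\theta)$ then estimates $\sum_t p_{tT}q_{tT}(\theta)$, as in part (a) of the lemma.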
Proof: First write
\[
A_T=P^*\Big\{\sup_{\theta\in\Theta}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)\Big|>\epsilon\Big\}
\]
and, by M, $P\{A_T>\eta\}\le\eta^{-1}E[A_T]$. Note that the Lebesgue convergence theorem is valid for sequences that converge in probability, by Proposition 20 of Royden (1988, p. 96). Therefore, as $A_T\le1$, the result follows from this theorem if we show that $A_T\xrightarrow{p}0$.
The proof is similar to the proof of a standard UWL (e.g. Amemiya, 1985). First note that
\[
\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)\Big|
\le\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|+\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|
\]
and that
\[
\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|
\le\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)\Big|+\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)Tp_{tT}-\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)\Big|.
\]
By Smith (2004, Lemma A.1) we have
\[
\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)\Big|=o_p(1). \tag{A.13}
\]
Also
\[
\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)Tp_{tT}-\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)\Big|\le\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)\Big|\max_{1\le t\le T}|Tp_{tT}-1|=o_p(1)
\]
since
\[
\sup_{\theta\in\Theta}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)\Big|\le O(1)\frac{1}{T}\sum_{t=1}^T\sup_{\theta\in\Theta}|g(X_t,\theta)|=O_p(1)
\]
by the ergodic theorem (White, 1999, Theorem 2.34) and the fact that $\max_{1\le t\le T}|Tp_{tT}-1|=o_p(1)$. We prove now that
\[
\lim_{T\to\infty}P\Big\{P^*\Big\{\sup_{\theta\in\Theta}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|>\epsilon\Big\}>\eta\Big\}=0.
\]
Since $\Theta$ is compact, there is a finite number of points $\theta_1,\theta_2,\ldots,\theta_{n_\delta}$ such that $\Theta\subset\bigcup_{i=1}^{n_\delta}B(\theta_i,\delta)$, where $B(\theta_i,\delta)$ is an open ball with centre $\theta_i$ and radius $\delta$. Thus
\[
P^*\Big\{\sup_{\theta\in\Theta}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|>\epsilon\Big\}
\le P^*\Big\{\bigcup_{i=1}^{n_\delta}\Big\{\sup_{\theta\in B(\theta_i,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|>\epsilon\Big\}\Big\}
\le\sum_{i=1}^{n_\delta}P^*\Big\{\sup_{\theta\in B(\theta_i,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|>\epsilon\Big\}.
\]
Now
P�f sup�2�(�i;�)
���� 1mT
XmT
t=1q�tT (�)�
XT
t=1qtT (�)ptT
���� > �g �P�f
���� 1mT
XmT
t=1q�tT (�i)�
XT
t=1qtT (�i)ptT
���� > �
3g
+P�f sup�2�(�i;�)
���� 1mT
XmT
t=1q�tT (�)�
1
mT
XmT
t=1q�tT (�i)
���� > �
3g
Pf sup�2�(�i;�)
����XT
t=1qtT (�)ptT �
XT
t=1qtT (�i)ptT
���� > �
3g
B1;T +B2;T +B3;T:
Now ���� 1mT
XmT
t=1q�tT (�i)�
XT
t=1ptT qtT (�i)
���� = oB(1)by the KBB Law of large numbers. Thus B1;T = op(1):
By M,
\[
P^*\Big\{\sup_{\theta\in B(\theta_i,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta_i)\Big|>\frac{\epsilon}{3}\Big\}
\le\frac{3}{\epsilon}E^*\Big[\sup_{\theta\in B(\theta_i,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta_i)\Big|\Big]
\le\frac{3}{\epsilon}\frac{1}{m_T}\sum_{t=1}^{m_T}E^*\Big[\sup_{\theta\in B(\theta_i,\delta)}|q^*_{tT}(\theta)-q^*_{tT}(\theta_i)|\Big]
=\frac{3}{\epsilon}\sum_{t=1}^Tp_{tT}\sup_{\theta\in B(\theta_i,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_i)|
=\frac{3}{\epsilon}\frac{1}{T}\sum_{t=1}^TTp_{tT}\sup_{\theta\in B(\theta_i,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_i)|
=(1+o_p(1))\frac{3}{\epsilon}\frac{1}{T}\sum_{t=1}^T\sup_{\theta\in B(\theta_i,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_i)|,
\]
where the second inequality follows from T. But by M
\[
P\Big(\frac{1}{T}\sum_{t=1}^T\sup_{\theta\in B(\theta_i,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_i)|>\epsilon\Big)\le\frac{1}{\epsilon T}\sum_{t=1}^TE\Big(\sup_{\theta\in B(\theta_i,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_i)|\Big);
\]
also
\[
\sup_{\theta\in B(\theta_i,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_i)|=\sup_{\theta\in B(\theta_i,\delta)}\Big|\frac{1}{S_T}\sum_{s=t-T}^{t-1}k\Big(\frac{s}{S_T}\Big)\big(g(X_{t-s},\theta)-g(X_{t-s},\theta_i)\big)\Big|
\le\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\sup_{\theta\in B(\theta_i,\delta)}|g(X_{t-s},\theta)-g(X_{t-s},\theta_i)|
\]
by T. Now taking expectations we have
\[
E\Big[\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\sup_{\theta\in B(\theta_i,\delta)}|g(X_{t-s},\theta)-g(X_{t-s},\theta_i)|\Big]
=\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|E\Big[\sup_{\theta\in B(\theta_i,\delta)}|g(X_{t-s},\theta)-g(X_{t-s},\theta_i)|\Big],
\]
where $\frac{1}{S_T}\sum_{s=t-T}^{t-1}|k(\frac{s}{S_T})|\le C$ and $E[\sup_{\theta\in B(\theta_i,\delta)}|g(X_{t-s},\theta)-g(X_{t-s},\theta_i)|]\to0$ by continuity of $g(X_{t-s},\cdot)$ and dominated convergence as $\delta\to0$. Consequently we have
\[
\frac{1}{\epsilon T}\sum_{t=1}^TE\Big(\sup_{\theta\in B(\theta_i,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_i)|\Big)\le\frac{1}{\epsilon T}\sum_{t=1}^T\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\,o(1)=o(1).
\]
Thus $B_{2,T}=o_p(1)$. Finally,
\[
\sup_{\theta\in B(\theta_i,\delta)}\Big|\sum_{t=1}^Tp_{tT}q_{tT}(\theta)-\sum_{t=1}^Tp_{tT}q_{tT}(\theta_i)\Big|
\le\frac{1}{T}\sum_{t=1}^TTp_{tT}\sup_{\theta\in B(\theta_i,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_i)|
\le(1+o_p(1))\frac{1}{T}\sum_{t=1}^T\sup_{\theta\in B(\theta_i,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_i)|
\]
by T, so $B_{3,T}=o_p(1)$ as above, and the first result follows. The second result follows from the first, (A.13) and the fact that $\max_{1\le t\le T}|Tp_{tT}-1|=o_p(1)$.
Lemma A.7 Let $\{X_t\in\mathcal{X}:t=1,2,\ldots\}$ be a sequence of stationary and ergodic $m\times1$ random vectors and let
\[
q_{tT}(\theta)=\frac{1}{S_T}\sum_{s=t-T}^{t-1}k\Big(\frac{s}{S_T}\Big)g(X_{t-s},\theta), \tag{A.14}
\]
and consider the sample $q_{tT}(\theta)$, $(t=1,\ldots,T)$. Draw a random sample of size $m_T$ with replacement from $q_{tT}(\theta)$, $(t=1,\ldots,T)$, to obtain the bootstrap sample $q^*_{sT}(\theta)$, $(s=1,\ldots,m_T)$, where $P(q^*_{sT}(\theta)=q_{tT}(\theta))=p_{tT}$ for $s=1,\ldots,m_T$ and $t=1,\ldots,T$. Assume that Assumptions 3.2 and 3.3 (a) hold and that: (a) (Bootstrap pointwise weak law of large numbers)
\[
\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta_0)-\sum_{t=1}^Tq_{tT}(\theta_0)p_{tT}\to0,\ \text{prob-}P^*,\ \text{prob-}P;
\]
(b) $E[\sup_{\theta\in N}|g(X_t,\theta)|]\le\Delta$, where $N$ is a neighbourhood of $\theta_0$; (c) for each $\theta$, $g(\cdot,\theta)$ is measurable and, for each $x_t\in\mathcal{X}$, $g(x_t,\cdot)$ is continuous on $\Theta$. Then
\[
\sup_{\theta\in N}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-E[g(X_t,\theta)]\Big|\xrightarrow{p}0
\]
and, as $m_T\to\infty$ and $S_T=o(T^{1/2})$, for any $\epsilon>0$ and $\eta>0$,
\[
\lim_{T\to\infty}P\Big\{P^*\Big\{\sup_{\theta\in N}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|>\epsilon\Big\}>\eta\Big\}=0,
\]
\[
\lim_{T\to\infty}P\Big\{P^*\Big\{\sup_{\theta\in N}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)\Big|>\epsilon\Big\}>\eta\Big\}=0.
\]
Proof: Let $N=B(\theta_0,\delta)$, where $B(\theta_0,\delta)$ is an open ball with centre $\theta_0$ and radius $\delta$. First note that
\[
\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-E[g(X_t,\theta)]\Big|
\le\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta_0)\Big|
+\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta_0)-E[g(X_t,\theta_0)]\Big|
+\sup_{\theta\in B(\theta_0,\delta)}|E[g(X_t,\theta_0)]-E[g(X_t,\theta)]|,
\]
and by M
\[
P\Big\{\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta_0)\Big|>\varepsilon\Big\}\le\frac{1}{\varepsilon}E\Big[\sup_{\theta\in B(\theta_0,\delta)}|g(X_t,\theta)-g(X_t,\theta_0)|\Big].
\]
Now by T
\[
E\Big[\sup_{\theta\in B(\theta_0,\delta)}|g(X_t,\theta)-g(X_t,\theta_0)|\Big]\le2E\Big[\sup_{\theta\in B(\theta_0,\delta)}|g(X_t,\theta)|\Big].
\]
Thus, by the dominated convergence theorem and continuity of $g(X_t,\cdot)$, as $\delta\to0$ we have
\[
\lim_{\delta\to0}E\Big[\sup_{\theta\in B(\theta_0,\delta)}|g(X_t,\theta)-g(X_t,\theta_0)|\Big]=0.
\]
Let us now consider
\[
\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)\Big|\xrightarrow{p}0.
\]
Note that
\[
\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)\Big|
\le\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-E[g(X_t,\theta)]\Big|
+\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)-E[g(X_t,\theta)]\Big|
=A_{1,T}+A_{2,T}.
\]
That $A_{1,T}\xrightarrow{p}0$ was proven above, and the proof that $A_{2,T}\xrightarrow{p}0$ is identical to the proof of Lemma A.1 of Smith (2011) [and uses the fact that $A_{1,T}\xrightarrow{p}0$].
We prove now that
\[
\lim_{T\to\infty}P\Big\{P^*\Big\{\sup_{\theta\in N}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|>\epsilon\Big\}>\eta\Big\}=0.
\]
Note that
\[
P^*\Big\{\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|>\epsilon\Big\}
\le P^*\Big\{\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta_0)-\sum_{t=1}^Tq_{tT}(\theta_0)p_{tT}\Big|>\frac{\epsilon}{3}\Big\}
+P^*\Big\{\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta_0)\Big|>\frac{\epsilon}{3}\Big\}
+P^*\Big\{\sup_{\theta\in B(\theta_0,\delta)}\Big|\sum_{t=1}^Tq_{tT}(\theta)p_{tT}-\sum_{t=1}^Tq_{tT}(\theta_0)p_{tT}\Big|>\frac{\epsilon}{3}\Big\}
\equiv C_{1,T}+C_{2,T}+C_{3,T}.
\]
Now
\[
\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta_0)-\sum_{t=1}^Tp_{tT}q_{tT}(\theta_0)\Big|=o_B(1)
\]
by the KBB law of large numbers. Thus $C_{1,T}=o_p(1)$. By M,
\[
P^*\Big\{\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta_0)\Big|>\frac{\epsilon}{3}\Big\}
\le\frac{3}{\epsilon}E^*\Big[\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta_0)\Big|\Big]
\le\frac{3}{\epsilon}\frac{1}{m_T}\sum_{t=1}^{m_T}E^*\Big[\sup_{\theta\in B(\theta_0,\delta)}|q^*_{tT}(\theta)-q^*_{tT}(\theta_0)|\Big]
=\frac{3}{\epsilon}\sum_{t=1}^Tp_{tT}\sup_{\theta\in B(\theta_0,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_0)|
=\frac{3}{\epsilon}\frac{1}{T}\sum_{t=1}^TTp_{tT}\sup_{\theta\in B(\theta_0,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_0)|
=(1+o_p(1))\frac{3}{\epsilon}\frac{1}{T}\sum_{t=1}^T\sup_{\theta\in B(\theta_0,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_0)|,
\]
where the second inequality follows from T. But
\[
P\Big(\frac{1}{T}\sum_{t=1}^T\sup_{\theta\in B(\theta_0,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_0)|>\epsilon\Big)\le\frac{1}{\epsilon T}\sum_{t=1}^TE\Big(\sup_{\theta\in B(\theta_0,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_0)|\Big);
\]
also
\[
\sup_{\theta\in B(\theta_0,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_0)|=\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{S_T}\sum_{s=t-T}^{t-1}k\Big(\frac{s}{S_T}\Big)\big(g(X_{t-s},\theta)-g(X_{t-s},\theta_0)\big)\Big|
\le\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\sup_{\theta\in B(\theta_0,\delta)}|g(X_{t-s},\theta)-g(X_{t-s},\theta_0)|
\]
by T. Now taking expectations we have
\[
E\Big[\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\sup_{\theta\in B(\theta_0,\delta)}|g(X_{t-s},\theta)-g(X_{t-s},\theta_0)|\Big]
=\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|E\Big[\sup_{\theta\in B(\theta_0,\delta)}|g(X_{t-s},\theta)-g(X_{t-s},\theta_0)|\Big],
\]
where $\frac{1}{S_T}\sum_{s=t-T}^{t-1}|k(\frac{s}{S_T})|\le C$ and $E[\sup_{\theta\in B(\theta_0,\delta)}|g(X_{t-s},\theta)-g(X_{t-s},\theta_0)|]\to0$ by continuity of $g(X_{t-s},\cdot)$ and dominated convergence as $\delta\to0$. Consequently we have
\[
\frac{1}{\epsilon T}\sum_{t=1}^TE\Big(\sup_{\theta\in B(\theta_0,\delta)}|q_{tT}(\theta)-q_{tT}(\theta_0)|\Big)\le\frac{1}{\epsilon T}\sum_{t=1}^T\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\,o(1)=o(1).
\]
Thus the result follows. The second result follows from the fact that
\[
\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)\Big|
\le\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{m_T}\sum_{t=1}^{m_T}q^*_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|
+\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|.
\]
The first term on the RHS was shown to converge to zero, and
\[
\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|
\le\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)-\sum_{t=1}^Tq_{tT}(\theta)p_{tT}\Big|
+\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tg(X_t,\theta)-\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)\Big|
\le\sup_{\theta\in B(\theta_0,\delta)}\Big|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)\Big|\max_{1\le t\le T}|1-Tp_{tT}|+o_p(1),
\]
as $A_{1,T}+A_{2,T}\xrightarrow{p}0$. Now $\sup_{\theta\in B(\theta_0,\delta)}|\frac{1}{T}\sum_{t=1}^Tq_{tT}(\theta)|=O_p(1)$ and $\max_{1\le t\le T}|1-Tp_{tT}|=o_p(1)$. Hence the result follows.
Lemma A.8 Suppose the finite-dimensional stochastic process $\{X_t\}_{t=1}^\infty$ satisfies Assumptions 3.1, 2.5 and 3.3 (a), $m_T=T/S_T$, $S_T=o(T^{1/2})$ and $E[X_t]=0$. Then
\[
\lim_{T\to\infty}P\Big[P^*\Big(\Big|\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}Y^{*2}_{tT}-\frac{S_T}{Tk_2}\sum_{t=1}^TY^2_{tT}\Big|>\varepsilon\Big)>\eta\Big]=0,
\]
\[
\lim_{T\to\infty}P\Big[P^*\Big(\Big|\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}Y^{*2}_{tT}-\frac{S_T}{k_2}\sum_{t=1}^TY^2_{tT}p_{tT}\Big|>\varepsilon\Big)>\eta\Big]=0.
\]
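As an illustration of Lemma A.8 (outside the proof), the bootstrap second-moment statistic can be simulated and compared with its in-sample target. The series, the kernel constant $k_2$ and the sizes below are arbitrary illustrative choices:

```python
import numpy as np

def second_moment_pair(y, S_T, k2, m_T, seed=0):
    """Return the bootstrap statistic (S_T/(m_T k2)) * sum Y*^2 together with its
    target (S_T/(T k2)) * sum Y^2, under uniform resampling probabilities p_tT = 1/T."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    T = len(y)
    y_star = rng.choice(y, size=m_T, replace=True)   # bootstrap sample of blocks
    boot = S_T / (m_T * k2) * np.sum(y_star ** 2)
    target = S_T / (T * k2) * np.sum(y ** 2)
    return boot, target

# The lemma says boot - target vanishes in (bootstrap) probability as T grows.
```

With a large $m_T$ the two quantities are close, which is the content of the first display of the lemma.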
Proof: The result is proved if we show that
\[
P^*\Big(\Big|\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}Y^{*2}_{tT}-\frac{S_T}{Tk_2}\sum_{t=1}^TY^2_{tT}\Big|>\varepsilon\Big)=o_p(1).
\]
Note that
\[
\Big|\frac{S_T}{k_2}\sum_{t=1}^TY^2_{tT}p_{tT}-\frac{S_T}{Tk_2}\sum_{t=1}^TY^2_{tT}\Big|\le\max_t|Tp_{tT}-1|\Big|\frac{S_T}{Tk_2}\sum_{t=1}^TY^2_{tT}\Big|,
\]
where $|\frac{S_T}{Tk_2}\sum_{t=1}^TY^2_{tT}|=O_p(1)$ by Lemma A.3 of Smith (2004, p. A.4) and $\max_{1\le t\le T}|Tp_{tT}-1|=o_p(1)$ by assumption. Hence the result follows by T if we show that
\[
P^*\Big(\Big|\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}Y^{*2}_{tT}-\frac{S_T}{k_2}\sum_{t=1}^TY^2_{tT}p_{tT}\Big|>\varepsilon\Big)=o_p(1).
\]
The proof of this result is similar to that of Lemma B.2 of Gonçalves and White (2004). First note that $E^*[Y^{*2}_{tT}]=\sum_{t=1}^Tp_{tT}Y^2_{tT}$. Thus by M we have
\[
P^*\Big(\Big|\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}Y^{*2}_{tT}-\frac{S_T}{k_2}\sum_{t=1}^TY^2_{tT}p_{tT}\Big|>\varepsilon\Big)\le\varepsilon^{-p}E^*\Big(\Big|\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}Y^{*2}_{tT}-\frac{S_T}{k_2}\sum_{t=1}^Tp_{tT}Y^2_{tT}\Big|^p\Big)
\]
for some $p>1$. Now
\[
E^*\Big(\Big|\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}Y^{*2}_{tT}-\frac{S_T}{k_2}\sum_{t=1}^Tp_{tT}Y^2_{tT}\Big|^p\Big)
=\Big(\frac{S_T}{m_Tk_2}\Big)^pE^*\Big(\Big|\sum_{t=1}^{m_T}\big(Y^{*2}_{tT}-E^*[Y^{*2}_{tT}]\big)\Big|^p\Big)
\le\Big(\frac{S_T}{m_Tk_2}\Big)^pCE^*\Big(\Big(\sum_{t=1}^{m_T}\big|Y^{*2}_{tT}-E^*[Y^{*2}_{tT}]\big|^2\Big)^{p/2}\Big)
\]
for some $C<\infty$ by an extension of the Burkholder inequality due to White and Chen (1996, Lemma A.2), as the $(Y^{*2}_{tT}-E^*[Y^{*2}_{tT}])$ are i.i.d. with zero mean. But for $1<p\le2$ we have, by the $c_r$ inequality (Davidson, 1994, p. 140) with $r=p/2$,
\[
\Big(\frac{S_T}{m_Tk_2}\Big)^pE^*\Big(\Big(\sum_{t=1}^{m_T}\big|Y^{*2}_{tT}-E^*[Y^{*2}_{tT}]\big|^2\Big)^{p/2}\Big)
\le\Big(\frac{S_T}{m_Tk_2}\Big)^p\sum_{t=1}^{m_T}E^*\big(\big|Y^{*2}_{tT}-E^*[Y^{*2}_{tT}]\big|^p\big)
=\frac{S_T^p}{m_T^{p-1}k_2^p}E^*\big(\big|Y^{*2}_{tT}-E^*[Y^{*2}_{tT}]\big|^p\big)
\le\frac{S_T^p}{m_T^{p-1}k_2^p}2^pE^*[|Y^*_{tT}|^{2p}]
=\frac{S_T^{3/2}}{m_T^{1/2}k_2^{3/2}}2^{3/2}E^*[|Y^*_{tT}|^3]
=\frac{S_T^2}{T^{1/2}k_2^{3/2}}2^{3/2}\sum_{t=1}^T|Y_{tT}|^3p_{tT}
=\frac{S_T^2}{T^{1/2}k_2^{3/2}}2^{3/2}\frac{1}{T}\sum_{t=1}^T|Y_{tT}|^3Tp_{tT}
=\frac{S_T^2}{T^{1/2}k_2^{3/2}}2^{3/2}(1+o_p(1))\frac{1}{T}\sum_{t=1}^T|Y_{tT}|^3
\]
as $\max_{1\le t\le T}|Tp_{tT}-1|=o_p(1)$, taking $p=3/2$. Now note that
\[
\frac{S_T}{T}\sum_{t=1}^T|Y_{tT}|^3\le\frac{S_T}{T}\sum_{t=1}^T|Y_{tT}|^2\max_t|Y_{tT}|=O_p(T^{1/\alpha})
\]
by Lemma A.3 of Smith (2011) and by M. Thus
\[
\frac{S_T}{T^{1/2}k_2^{3/2}}2^{3/2}(1+o_p(1))\frac{S_T}{T}\sum_{t=1}^T|Y_{tT}|^3=O_p(T^{-\xi+1/\alpha}).
\]
Since $\alpha>\max(4v,1/\xi)>1/\xi$, we have $\xi>1/\alpha$ and the result follows.
Lemma A.9 Suppose the finite-dimensional stochastic process $\{(X_t,Z_t)\}_{t=1}^\infty$ is strictly stationary and ergodic, $E(|X_t|^{dp})\le\Delta$ and $E(|Z_t|^{\frac{dp}{d-1}})\le\Delta$ for some $1<p\le2$ and $d>1$, Assumptions 2.5 and 3.3 hold, $m_T=T/S_T$ and $S_T=o(T^{1/2})$. Then
\[
\lim_{T\to\infty}P\Big[P^*\Big(\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}Y^*_{tT}Z^*_{tT}\Big|>T^{1/2}\varepsilon\Big)>\eta\Big]=0,
\]
where $Z_{tT}=\frac{1}{S_T}\sum_{s=t-T}^{t-1}k(\frac{s}{S_T})Z_{t-s}$, $(t=1,\ldots,T)$, and $(Z^*_{1T},\ldots,Z^*_{m_TT})$ is a bootstrap sample drawn from $(Z_{1T},\ldots,Z_{TT})$.
Proof: The proof is similar to the proof of Lemma B.2 of Gonçalves and White (2004). First note that by M, for some $1<p\le2$, we have
\[
P^*\Big(\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}Y^*_{tT}Z^*_{tT}\Big|>T^{1/2}\varepsilon\Big)
\le\frac{1}{\varepsilon^pT^{p/2}}E^*\Big[\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}Y^*_{tT}Z^*_{tT}\Big|^p\Big]
\le\frac{C}{\varepsilon^pT^{p/2}}E^*\Big[\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}\big(Y^*_{tT}Z^*_{tT}-E^*[Y^*_{tT}Z^*_{tT}]\big)\Big|^p\Big]
+\frac{C}{\varepsilon^pT^{p/2}}E^*\Big[\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}E^*[Y^*_{tT}Z^*_{tT}]\Big|^p\Big]
=F_1+F_2
\]
by the $c_r$ inequality. Now
\[
F_1\le\frac{C}{\varepsilon^pT^{p/2}}\frac{S_T^p}{m_T^p}E^*\Big[\Big|\sum_{t=1}^{m_T}\big(Y^*_{tT}Z^*_{tT}-E^*[Y^*_{tT}Z^*_{tT}]\big)\Big|^p\Big]
\le\frac{C}{\varepsilon^pT^{p/2}}\frac{S_T^p}{m_T^p}E^*\Big[\Big(\sum_{t=1}^{m_T}\big|Y^*_{tT}Z^*_{tT}-E^*[Y^*_{tT}Z^*_{tT}]\big|^2\Big)^{p/2}\Big]
\le\frac{C}{\varepsilon^pT^{p/2}}\frac{S_T^p}{m_T^p}E^*\Big[\sum_{t=1}^{m_T}\big|Y^*_{tT}Z^*_{tT}-E^*[Y^*_{tT}Z^*_{tT}]\big|^p\Big]
\le\frac{C}{\varepsilon^pT^{p/2}}\frac{S_T^p}{m_T^{p-1}}E^*[|Y^*_{tT}Z^*_{tT}|^p]
\]
by an extension of the Burkholder inequality due to White and Chen (1996, Lemma A.2) and the $c_r$ inequality with $r=p/2$. Also
\[
F_2=\frac{C}{\varepsilon^pT^{p/2}}\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}E^*[Y^*_{tT}Z^*_{tT}]\Big|^p
\le\frac{C}{\varepsilon^pT^{p/2}}\frac{S_T^p}{m_T^p}\Big|\sum_{t=1}^{m_T}E^*[Y^*_{tT}Z^*_{tT}]\Big|^p
=\frac{C}{\varepsilon^pT^{p/2}}S_T^p\big|E^*[Y^*_{tT}Z^*_{tT}]\big|^p
\le\frac{C}{\varepsilon^pT^{p/2}}S_T^pE^*[|Y^*_{tT}Z^*_{tT}|^p]
\]
by Jensen. Now
\[
\frac{C}{\varepsilon^pT^{p/2}}S_T^pE^*[|Y^*_{tT}Z^*_{tT}|^p]
=\frac{C}{\varepsilon^pT^{p/2}}S_T^p\sum_{t=1}^T|Y_{tT}|^p|Z_{tT}|^pp_{tT}
=\frac{C}{\varepsilon^pT^{p/2}}S_T^p\frac{1}{T}\sum_{t=1}^T|Y_{tT}|^p|Z_{tT}|^pTp_{tT}
=\frac{C}{\varepsilon^pT^{p/2}}S_T^p(1+o_p(1))\frac{1}{T}\sum_{t=1}^T|Y_{tT}|^p|Z_{tT}|^p
\]
as $\max_{1\le t\le T}|Tp_{tT}-1|=o_p(1)$.
But by M and the Hölder inequality,
\[
P\Big[\frac{C}{\varepsilon^pT^{p/2}}S_T^p\frac{1}{T}\sum_{t=1}^T|Y_{tT}|^p|Z_{tT}|^p>\eta\Big]
\le\frac{C}{\eta\varepsilon^pT^{p/2}}S_T^pE\Big[\frac{1}{T}\sum_{t=1}^T|Y_{tT}|^p|Z_{tT}|^p\Big]
=\frac{C}{\eta\varepsilon^pT^{p/2}}S_T^p\frac{1}{T}\sum_{t=1}^TE[|Y_{tT}|^p|Z_{tT}|^p]
\le\frac{C}{\eta\varepsilon^pT^{p/2}}S_T^p\frac{1}{T}\sum_{t=1}^T\big(E[|Y_{tT}|^{dp}]\big)^{1/d}\big(E[|Z_{tT}|^{\frac{dp}{d-1}}]\big)^{(d-1)/d}.
\]
Now by the T and Jensen inequalities,
\[
E[|Y_{tT}|^{dp}]=E\Big[\Big|\frac{1}{S_T}\sum_{s=t-T}^{t-1}k\Big(\frac{s}{S_T}\Big)X_{t-s}\Big|^{dp}\Big]
\le E\Big[\Big(\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big||X_{t-s}|\Big)^{dp}\Big]
=E\Big[\Big(\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\Big)^{dp}\Big(\frac{\frac{1}{S_T}\sum_{s=t-T}^{t-1}|k(\frac{s}{S_T})||X_{t-s}|}{\frac{1}{S_T}\sum_{s=t-T}^{t-1}|k(\frac{s}{S_T})|}\Big)^{dp}\Big]
\le E\Big[\Big(\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\Big)^{dp-1}\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big||X_{t-s}|^{dp}\Big]
=\Big(\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\Big)^{dp-1}\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|E[|X_{t-s}|^{dp}]
\le\Big(\frac{1}{S_T}\sum_{s=t-T}^{t-1}\Big|k\Big(\frac{s}{S_T}\Big)\Big|\Big)^{dp}\Delta=O(1)
\]
as $E(|X_t|^{dp})$ is bounded. By the same reasoning, $E[|Z_{tT}|^{\frac{dp}{d-1}}]=O(1)$. Thus the result follows since $S_T/T^{1/2}=o(1)$.
A.4 Proofs of the results in Section 4.1

In this subsection of the appendix we take $p_{tT}=1/T$, and consequently Assumption 3.3 (i) is automatically satisfied. Assumption 3.3 (ii) follows from Lemma A.2 of Smith (2011).
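To fix ideas for this subsection ($p_{tT}=1/T$), the bootstrapped GMM computation can be schematised: build kernel blocks of the moment series, resample them uniformly, and minimise the bootstrap criterion. Everything below — the scalar linear moment $g(z_t,\theta)=z_t-\theta$, the Bartlett-type kernel, identity weighting and grid minimisation — is a simplified stand-in for the paper's estimators, not their implementation:

```python
import numpy as np

def boot_gmm_theta(z, S_T, m_T, thetas, seed=0):
    """Toy scalar example with moment g(z_t, theta) = z_t - theta.
    Kernel blocks of g are blocks of z minus theta times blocks of 1,
    so we block once, resample indices, and minimise g*_T(theta)^2 on a grid."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    T = len(z)
    t = np.arange(1, T + 1)
    k = lambda u: np.maximum(0.0, 1.0 - np.abs(u))   # illustrative kernel
    W = k((t[:, None] - t[None, :]) / S_T) / S_T     # kernel weight matrix
    q = W @ z                                        # blocks of the series z
    c = W.sum(axis=1)                                # blocks of the constant 1
    idx = rng.integers(0, T, size=m_T)               # p_tT = 1/T resampling
    crit = [np.mean(q[idx] - th * c[idx]) ** 2 for th in thetas]
    return thetas[int(np.argmin(crit))]
```

The grid minimiser approximates the bootstrap GMM estimator $\hat\theta^*$, which Theorem 4.1 shows is consistent for $\hat\theta$.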
Proof of Theorem 4.1: The result is proven if we show that the conditions of Lemma A.2 of Gonçalves and White (2004) are satisfied. Conditions (a1), (a2), (b1) and (b2) are satisfied by Assumption 4.1 (i) and (iii) (see Jennrich, 1969, Lemma 2). Note that uniqueness of the minimum follows from Lemma 2.3 of Newey and McFadden (1994). To prove (a3), define $Q_0(\theta)=E[g(z_t,\theta)]'WE[g(z_t,\theta)]$ and note that, as in the proof of Theorem 2.6 of Newey and McFadden (1994), using T and CS,
\[
|Q_T(\theta)-Q_0(\theta)|\le\|g(\theta)-E[g(z_t,\theta)]\|^2\|W_T\|+2\|E[g(z_t,\theta)]\|\|g(\theta)-E[g(z_t,\theta)]\|\|W_T\|+\|E[g(z_t,\theta)]\|^2\|W_T-W\|.
\]
By Lemma A.3 we have $\sup_{\theta\in B}\|g(\theta)-E[g(z_t,\theta)]\|=o_p(1)$. Also, by assumption, $\|E[g(z_t,\theta)]\|$ is bounded and $\|W_T-W\|=o_p(1)$.

It remains to prove (b3). By T and CS,
\[
|Q^*_T(\theta)-Q_T(\theta)|\le\|g^*_T(\theta)-g(\theta)\|^2\|W^*_T\|+2\|g(\theta)\|\|g^*_T(\theta)-g(\theta)\|\|W^*_T\|+\|g(\theta)\|^2\|W^*_T-W_T\|.
\]
Now by Lemma A.6 we have $\sup_{\theta\in B}\|g^*_T(\theta)-g(\theta)\|=o_B(1)$; also
\[
\sup_{\theta\in B}\|g(\theta)\|\le\sup_{\theta\in B}\|g(\theta)-E[g(z_t,\theta)]\|+\sup_{\theta\in B}\|E[g(z_t,\theta)]\|=o_p(1)+C.
\]
Thus the result follows as $\|W^*_T-W_T\|=o_B(1)$.

Proof of Theorem 4.2: Let $G^*_T\equiv\partial g^*_T(\hat\theta^*)/\partial\theta'$. To prove asymptotic normality, notice that by the first-order conditions we have $\sqrt{T/k_2}\,G^{*\prime}_TW^*_Tg^*_T(\hat\theta^*)=0$. Hence a first-order Taylor expansion around $\hat\theta$ yields
\[
\sqrt{T/k_2}\,G^{*\prime}_TW^*_Tg^*_T(\hat\theta)+G^{*\prime}_TW^*_T\tilde G^*_T\sqrt{T/k_2}\,(\hat\theta^*-\hat\theta)=0,
\]
where $\tilde G^*_T\equiv\partial g^*_T(\tilde\theta^*)/\partial\theta'$ and $\tilde\theta^*$ is on a line joining $\hat\theta$ and $\hat\theta^*$. Solving for $\sqrt{T/k_2}\,(\hat\theta^*-\hat\theta)$ we obtain
\[
\sqrt{T/k_2}\,(\hat\theta^*-\hat\theta)=-[G^{*\prime}_TW^*_T\tilde G^*_T]^{-1}\sqrt{T/k_2}\,G^{*\prime}_TW^*_Tg^*_T(\hat\theta).
\]
By a Taylor expansion we have
\[
\sqrt{T/k_2}\,G^{*\prime}_TW^*_Tg^*_T(\hat\theta)=\sqrt{T/k_2}\,G^{*\prime}_TW^*_Tg^*_T(\theta_0)+\sqrt{T/k_2}\,G^{*\prime}_TW^*_T\bar G^*_T(\hat\theta-\theta_0)
=\sqrt{T/k_2}\,G^{*\prime}_TW^*_T[g^*_T(\theta_0)-g_T(\theta_0)]+\sqrt{T/k_2}\,G^{*\prime}_TW^*_Tg_T(\theta_0)+\sqrt{T/k_2}\,G^{*\prime}_TW^*_T\bar G^*_T(\hat\theta-\theta_0),
\]
where $\bar G^*_T\equiv\partial g^*_T(\bar\theta^*)/\partial\theta'$ and $\bar\theta^*$ is on a line joining $\hat\theta$ and $\theta_0$. We prove now that
\[
\sqrt{T/k_2}\,G^{*\prime}_TW^*_Tg_T(\theta_0)+\sqrt{T/k_2}\,G^{*\prime}_TW^*_T\bar G^*_T(\hat\theta-\theta_0)=o_B(1).
\]
Note that by the first-order conditions of the original GMM problem we have
\[
\sqrt{T/k_2}\,(\hat\theta-\theta_0)=-[G_T'W_T\bar G_T]^{-1}\sqrt{T/k_2}\,G_T'W_Tg_T(\theta_0),
\]
where $\bar G_T\equiv\partial g_T(\bar\theta)/\partial\theta'$ and $\bar\theta$ is on a line joining $\hat\theta$ and $\theta_0$. Thus
\[
\sqrt{T/k_2}\,G^{*\prime}_TW^*_Tg_T(\theta_0)+\sqrt{T/k_2}\,G^{*\prime}_TW^*_T\bar G^*_T(\hat\theta-\theta_0)
=\big[G^{*\prime}_TW^*_T-G^{*\prime}_TW^*_T\bar G^*_T[G_T'W_T\bar G_T]^{-1}G_T'W_T\big]\sqrt{T/k_2}\,g_T(\theta_0).
\]
Now, by assumption, $W^*_T=W_T+o_B(1)$ and $W_T=W+o_p(1)$; also, by the bootstrap uniform convergence Lemma A.7 and consistency of $\hat\theta^*$ and $\hat\theta$, $G^*_T-G=o_B(1)$, $\tilde G^*_T-G=o_B(1)$, $\bar G^*_T-G=o_B(1)$, $\bar G_T-G=o_p(1)$ and $G_T-G=o_p(1)$; and by the CLT of Wooldridge and White (Theorem 5.20 of White, 1999) $\sqrt{T/k_2}\,g_T(\theta_0)=O_p(1)$. Hence the above expression is $o_B(1)$.

Now $-[G^{*\prime}_TW^*_T\tilde G^*_T]^{-1}\sqrt{T/k_2}\,G^{*\prime}_TW^*_T[g^*_T(\theta_0)-g_T(\theta_0)]$ converges to $N(0,(G'WG)^{-1}G'W\Omega WG(G'WG)^{-1})$ by the bootstrap CLT, Theorem A.2, and the facts that $G^*_T-G=o_B(1)$ and $W^*_T=W+o_B(1)$. The result follows as $\sqrt{T/k_2}\,(\hat\theta^*-\hat\theta)$ converges to the same asymptotic distribution as $T^{1/2}(\hat\theta-\theta_0)$ and by the Pólya Theorem, Serfling (2002, p. 18), as $\Phi(\cdot)$ is a continuous c.d.f.
Proof of Lemma 4.1: We use the same strategy as the proof of Theorem 4.1 of Gonçalves and White (2004). First consider the unfeasible estimator of $\Omega$:
\[
\hat\Omega^*(\theta_0)=\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}g^*_t(\theta_0)g^*_t(\theta_0)'.
\]
Fix any $\lambda\in\mathbb{R}^m$. Now
\[
\lambda'\hat\Omega^*(\theta_0)\lambda=\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}\lambda'g^*_t(\theta_0)g^*_t(\theta_0)'\lambda=\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}\big(\lambda'g^*_t(\theta_0)\big)^2.
\]
Now, applying Lemma A.8 with $X_t=\lambda'g_t(\theta_0)$ and $p_{tT}=1/T$, $t=1,\ldots,T$, it follows that
\[
\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}\big(\lambda'g^*_t(\theta_0)\big)^2-\frac{S_T}{Tk_2}\sum_{t=1}^T\big(\lambda'g_{tT}(\theta_0)\big)^2=o_B(1)
\]
and, by Smith (2011), Lemma A.3,
\[
\frac{S_T}{Tk_2}\sum_{t=1}^T\big(\lambda'g_{tT}(\theta_0)\big)^2=\lambda'\Omega\lambda+o_p(1).
\]
Thus it remains to prove that $|\lambda'\hat\Omega^*(\tilde\theta^*)\lambda-\lambda'\hat\Omega^*(\theta_0)\lambda|=o_B(1)$. Note that by a first-order Taylor expansion of $(\lambda'g^*_t(\tilde\theta^*))^2$ around $\theta_0$ we have
\[
\big(\lambda'g^*_t(\tilde\theta^*)\big)^2=\big(\lambda'g^*_t(\theta_0)\big)^2+2\lambda'g^*_t(\bar\theta^*)\lambda'G^*_t(\bar\theta^*)(\tilde\theta^*-\theta_0),
\]
where $\bar\theta^*$ is on a line joining $\tilde\theta^*$ and $\theta_0$. Thus
\[
\lambda'\hat\Omega^*(\tilde\theta^*)\lambda=\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}\big(\lambda'g^*_t(\tilde\theta^*)\big)^2
=\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}\Big[\big(\lambda'g^*_t(\theta_0)\big)^2+2\lambda'g^*_t(\bar\theta^*)\lambda'G^*_t(\bar\theta^*)(\tilde\theta^*-\theta_0)\Big]
=\lambda'\hat\Omega^*(\theta_0)\lambda+\frac{S_T}{m_Tk_2}\sum_{t=1}^{m_T}2\lambda'g^*_t(\bar\theta^*)\lambda'G^*_t(\bar\theta^*)(\tilde\theta^*-\theta_0).
\]
Now denote by $G^*_{t,j}(\bar\theta^*)$ the $j$th column of $G^*_t(\bar\theta^*)$; thus
\[
\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}2\lambda'g^*_t(\bar\theta^*)\lambda'G^*_t(\bar\theta^*)(\tilde\theta^*-\theta_0)\Big|
=\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}2\lambda'g^*_t(\bar\theta^*)\sum_{j=1}^p\lambda'G^*_{t,j}(\bar\theta^*)(\tilde\theta^*_j-\theta_{j,0})\Big|
=\Big|\sum_{j=1}^p\frac{S_T}{m_T}\sum_{t=1}^{m_T}2\lambda'g^*_t(\bar\theta^*)\lambda'G^*_{t,j}(\bar\theta^*)(\tilde\theta^*_j-\theta_{j,0})\Big|
\le\sum_{j=1}^p\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}2\lambda'g^*_t(\bar\theta^*)\lambda'G^*_{t,j}(\bar\theta^*)(\tilde\theta^*_j-\theta_{j,0})\Big|
=\sum_{j=1}^pO_B\Big(\frac{1}{\sqrt{T}}\Big)\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}2\lambda'g^*_t(\bar\theta^*)\lambda'G^*_{t,j}(\bar\theta^*)\Big|
\]
by T and the fact that $(\tilde\theta^*_j-\theta_{j,0})=O_B(1/T^{1/2})$. Note that
\[
\Big|\frac{S_T}{m_T}\sum_{t=1}^{m_T}2\lambda'g^*_t(\bar\theta^*)\lambda'G^*_{t,j}(\bar\theta^*)\Big|
\le\frac{S_T}{m_T}\sum_{t=1}^{m_T}2\big|\lambda'g^*_t(\bar\theta^*)\lambda'G^*_{t,j}(\bar\theta^*)\big|
\le\frac{S_T}{m_T}\sum_{t=1}^{m_T}2\sup_{\theta\in B}\big|\lambda'g^*_t(\theta)\big|\sup_{\theta\in B}\big|\lambda'G^*_{t,j}(\theta)\big|.
\]
Now define $|Y_{tT}|=2\sup_{\theta\in B}|\lambda'g_t(\theta)|$ and $|Z_{tT}|=\sup_{\theta\in N}|\lambda'G_{t,j}(\theta)|$ and apply Lemma A.9 above with $p=2$, $d=\alpha/2$ and $p_{tT}=1/T$, $t=1,\ldots,T$, which shows that
\[
\frac{S_T}{m_T}\sum_{t=1}^{m_T}2\sup_{\theta\in B}\big|\lambda'g^*_t(\theta)\big|\sup_{\theta\in B}\big|\lambda'G^*_{t,j}(\theta)\big|=o_B(T^{1/2}),
\]
and hence the result follows.
Proof of Theorem 4.3: Note that by a Taylor expansion
\[
\sqrt{T/k_2}\,g^*(\hat\theta^{e*})=\sqrt{T/k_2}\,g^*(\hat\theta^{e})+\tilde G^*_T\sqrt{T/k_2}\,(\hat\theta^{e*}-\hat\theta^{e}),
\]
where $\tilde G^*_T\equiv\partial g^*_T(\tilde\theta^*)/\partial\theta'$ and $\tilde\theta^*$ is on a line joining $\hat\theta^{e*}$ and $\hat\theta^{e}$. Note that by Theorem 4.2 with $W^*_T=\tilde\Omega^{-1}$,
\[
\sqrt{T/k_2}\,(\hat\theta^{e*}-\hat\theta^{e})=-[G^{*\prime}_T\tilde\Omega^{-1}\tilde G^*_T]^{-1}G^{*\prime}_T\tilde\Omega^{-1}\sqrt{T/k_2}\,[g^*_T(\theta_0)-g_T(\theta_0)]+o_B(1).
\]
Also by a Taylor expansion
\[
\sqrt{T/k_2}\,\big(g^*(\hat\theta^{e})-g^*(\theta_0)-g(\hat\theta^{e})+g(\theta_0)\big)=(\bar G^*_T-\bar G_T)\sqrt{T/k_2}\,(\hat\theta^{e}-\theta_0)=o_B(1)O_p(1)=o_B(1), \tag{A.15}
\]
where $\bar G^*_T\equiv\partial g^*_T(\bar\theta)/\partial\theta'$ and $\bar G_T\equiv\partial g_T(\bar\theta)/\partial\theta'$, with $\bar\theta$ on a line joining $\hat\theta^{e}$ and $\theta_0$. Thus
\[
\sqrt{\frac{T}{k_2}}\,[g^*(\hat\theta^{e*})-g(\hat\theta^{e})]=\big[I_m-\tilde G^*_T[G^{*\prime}_T\tilde\Omega^{-1}\tilde G^*_T]^{-1}G^{*\prime}_T\tilde\Omega^{-1}\big]\sqrt{T/k_2}\,[g^*_T(\theta_0)-g_T(\theta_0)]+o_B(1).
\]
Now, since $\tilde G^*_T=G+o_B(1)$, $G^*_T=G+o_B(1)$, $\tilde\Omega^{-1}=\Omega^{-1}+o_B(1)$ and, by the bootstrap CLT Theorem A.2, $\sqrt{T/k_2}\,[g^*_T(\theta_0)-g_T(\theta_0)]$ converges to $N(0,\Omega)$, it follows that
\[
\sqrt{\frac{T}{k_2}}\,[g^*(\hat\theta^{e*})-g(\hat\theta^{e})]=\big[I_m-G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1}\big]\sqrt{T/k_2}\,[g^*_T(\theta_0)-g_T(\theta_0)]+o_B(1).
\]
Thus
\[
\mathcal{J}^*=\frac{T}{k_2}\,[g^*_T(\theta_0)-g_T(\theta_0)]'\big[\Omega^{-1}-\Omega^{-1}G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1}\big][g^*_T(\theta_0)-g_T(\theta_0)]+o_B(1).
\]
As
\[
\big[\Omega^{-1}-\Omega^{-1}G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1}\big]\,\Omega\,\big[\Omega^{-1}-\Omega^{-1}G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1}\big]=\big[\Omega^{-1}-\Omega^{-1}G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1}\big]
\]
and $\operatorname{tr}\big(\Omega\big[\Omega^{-1}-\Omega^{-1}G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1}\big]\big)=m-p$, it follows from Rao and Mitra (1972) that $\mathcal{J}^*$ converges in distribution, prob-$P^*$, to $\chi^2(m-p)$. Since $\mathcal{J}\xrightarrow{d}\chi^2(m-p)$, the result stated in the theorem is a consequence of the Pólya Theorem (Serfling, 2002, p. 18), as the chi-squared distribution has a continuous c.d.f.
Proof of Theorem 4.4: We start by deriving the asymptotic distribution of $\mathcal{W}^*$. Define $h^{a*}_t(\theta,\gamma)\equiv(g^*(z_t,\theta)',[q^*(z_t,\theta)-\gamma]')'$, $h^{a*}(\theta,\gamma)=\sum_{t=1}^{m_T}h^{a*}_t(\theta,\gamma)/m_T$ and $\tilde Q^*(\theta,\gamma)=h^{a*}(\theta,\gamma)'\tilde\Omega^{a*-1}h^{a*}(\theta,\gamma)$. Note that the unrestricted bootstrapped GMM estimator solves
\[
(\hat\theta^{e*\prime},\hat\gamma^{*\prime})'=\arg\min_{\theta\in B,\,\gamma\in\Gamma}\tilde Q^*(\theta,\gamma),
\]
where $\Gamma$ is a compact parameter space. The solution is given by
\[
\hat\theta^{e*}=\arg\min_{\theta\in B}\,g^*(\theta)'\tilde\Omega^{-1}g^*(\theta),\qquad
\hat\gamma^*=q^*(\hat\theta^{e*})-\tilde\Omega_{21}\tilde\Omega^{-1}g^*(\hat\theta^{e*}).
\]
We note that by Theorem 4.1 $\hat\theta^{e*}=\hat\theta^{e}+o_B(1)$ and, by Lemma A.6 and $\tilde\Omega^{a*}=\tilde\Omega^{a}+o_B(1)$, we have $\hat\gamma^*=\hat\gamma+o_B(1)$. Since these estimators satisfy the first-order conditions, we have $D^*(\hat\theta^{e*})'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e*},\hat\gamma^*)=0$ with
\[
D^*(\theta)\equiv\begin{pmatrix}\sum_{t=1}^{m_T}G^*_t(\theta)/m_T&0\\ \sum_{t=1}^{m_T}Q^*_t(\theta)/m_T&-I_s\end{pmatrix}.
\]
Thus, by a Taylor expansion around $(\hat\theta^{e\prime},\hat\gamma')'$,
\[
D^*(\hat\theta^{e*})'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e},\hat\gamma)+D^*(\hat\theta^{e*})'\tilde\Omega^{a*-1}D^*(\tilde\theta^*)\begin{pmatrix}\hat\theta^{e*}-\hat\theta^{e}\\ \hat\gamma^*-\hat\gamma\end{pmatrix}=0,
\]
where $\tilde\theta^*$ is on a line joining $\hat\theta^{e*}$ and $\hat\theta^{e}$. Thus
\[
\sqrt{T}\begin{pmatrix}\hat\theta^{e*}-\hat\theta^{e}\\ \hat\gamma^*-\hat\gamma\end{pmatrix}=-[D^{*\prime}\tilde\Omega^{a*-1}\tilde D^*]^{-1}D^{*\prime}\tilde\Omega^{a*-1}\sqrt{T}\,h^{a*}(\hat\theta^{e},\hat\gamma).
\]
Now notice that, as in the proof of Theorem 4.2,
\[
\sqrt{\frac{T}{k_2}}\begin{pmatrix}\hat\theta^{e*}-\hat\theta^{e}\\ \hat\gamma^*-\hat\gamma\end{pmatrix}=-[D'\Omega^{-1}D]^{-1}D'\Omega^{-1}\sqrt{\frac{T}{k_2}}\,[h^{a*}_T(\theta_0,\gamma_0)-h^a_T(\theta_0,\gamma_0)]+o_B(1).
\]
Thus by a Taylor expansion we have
\[
\sqrt{\frac{T}{k_2}}\begin{pmatrix}a(\hat\theta^{e*})-a(\hat\theta^{e})\\ \hat\gamma^*-\hat\gamma\end{pmatrix}=-R(\bar\theta^*)[D'\Omega^{-1}D]^{-1}D'\Omega^{-1}\sqrt{\frac{T}{k_2}}\,[h^{a*}_T(\theta_0,\gamma_0)-h^a_T(\theta_0,\gamma_0)]+o_B(1)
=-R[D'\Omega^{-1}D]^{-1}D'\Omega^{-1}\sqrt{\frac{T}{k_2}}\,[h^{a*}_T(\theta_0,\gamma_0)-h^a_T(\theta_0,\gamma_0)]+o_B(1),
\]
where $\bar\theta^*$ is on a line joining $\hat\theta^{e*}$ and $\hat\theta^{e}$, as $R(\bar\theta^*)=R+o_B(1)$. Thus
\[
\mathcal{W}^*=(T/k_2)\,[\hat r^*-\hat r]'\big[R^*(D^{*\prime}\tilde\Omega^{a*-1}D^*)^{-1}R^{*\prime}\big]^{-1}[\hat r^*-\hat r]
=\frac{T}{k_2}\,[h^{a*}_T(\theta_0,\gamma_0)-h^a_T(\theta_0,\gamma_0)]'\Omega^{-1}D[D'\Omega^{-1}D]^{-1}R'\big[R(D'\Omega^{-1}D)^{-1}R'\big]^{-1}R(D'\Omega^{-1}D)^{-1}D'\Omega^{-1}[h^{a*}_T(\theta_0,\gamma_0)-h^a_T(\theta_0,\gamma_0)]+o_B(1),
\]
since $D^*=D+o_B(1)$ by Lemma A.7, $\tilde\Omega^{a*}=\Omega+o_B(1)$ by Lemma A.6 and $\sqrt{T/k_2}\,[h^{a*}_T(\theta_0,\gamma_0)-h^a_T(\theta_0,\gamma_0)]=O_B(1)$ by the bootstrap CLT. Thus, as in Theorem 2.4 above, $\mathcal{W}^*$ converges to a chi-squared distribution with $s+r$ degrees of freedom.
We now consider the score statistic $\mathcal{S}^*$. We derive the distribution of the bootstrap restricted GMM estimator. Note that the Lagrangian of the restricted problem is
\[
\mathcal{L}^*=\tilde Q^*(\theta,\gamma)-a(\theta)'\tau^*-\gamma'\nu^*.
\]
Denote by $\varphi^*=(\tau^{*\prime},\nu^{*\prime})'$ the vector of Lagrange multipliers evaluated at the optimum. Thus the first-order conditions are
\[
D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e*}_r,0)-R(\hat\theta^{e*}_r)'\varphi^*=0.
\]
Multiplying both sides by $R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}$ we have
\[
R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e*}_r,0)-R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}R(\hat\theta^{e*}_r)'\varphi^*=0. \tag{A.16}
\]
Thus
\[
\varphi^*=\big[R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}R(\hat\theta^{e*}_r)'\big]^{-1}R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e*}_r,0). \tag{A.17}
\]
Hence, replacing (A.17) in the first-order conditions, we have
\[
D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e*}_r,0)-R(\hat\theta^{e*}_r)'\big[R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}R(\hat\theta^{e*}_r)'\big]^{-1}R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e*}_r,0)=0.
\]
But by a Taylor expansion, $h^{a*}(\hat\theta^{e*}_r,0)=h^{a*}(\hat\theta^{e}_r,0)+\tilde D^*(\bar\theta^*)S_1(\hat\theta^{e*}_r-\hat\theta^{e}_r)$, where $\bar\theta^*$ is on a line joining $\hat\theta^{e*}_r$ and $\hat\theta^{e}_r$ and $S_1$ is a selection matrix such that
\[
\tilde D^*(\bar\theta^*)S_1=\begin{pmatrix}\sum_{t=1}^{m_T}G^*_t(\bar\theta^*)/m_T\\ \sum_{t=1}^{m_T}Q^*_t(\bar\theta^*)/m_T\end{pmatrix}.
\]
Thus we have
\[
\big[I-R(\hat\theta^{e*}_r)'\big[R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}R(\hat\theta^{e*}_r)'\big]^{-1}R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}\big]
\big[D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}\sqrt{T}\,h^{a*}(\hat\theta^{e}_r,0)+D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}\tilde D^*(\bar\theta^*)S_1\sqrt{T}\,(\hat\theta^{e*}_r-\hat\theta^{e}_r)\big]=0,
\]
and consequently
\[
S_1\sqrt{\frac{T}{k_2}}\,(\hat\theta^{e*}_r-\hat\theta^{e}_r)=-[D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}\tilde D^*(\bar\theta^*)]^{-1}
\big[I-R(\hat\theta^{e*}_r)'\big[R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}R(\hat\theta^{e*}_r)'\big]^{-1}R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}\big]
D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}\sqrt{\frac{T}{k_2}}\,h^{a*}(\hat\theta^{e}_r,0).
\]
Now, as in (A.15) above, we have
\[
\sqrt{\frac{T}{k_2}}\,\big(h^{a*}(\hat\theta^{e}_r,0)-h^{a*}(\theta_0,0)-h^a(\hat\theta^{e}_r,0)+h^a(\theta_0,0)\big)=o_B(1). \tag{A.18}
\]
Therefore we have
\[
S_1\sqrt{\frac{T}{k_2}}\,(\hat\theta^{e*}_r-\hat\theta^{e}_r)=-[D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}\tilde D^*(\bar\theta^*)]^{-1}
\big[I-R(\hat\theta^{e*}_r)'\big[R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}R(\hat\theta^{e*}_r)'\big]^{-1}R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}\big]
D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}\sqrt{T/k_2}\,\big(h^{a*}(\theta_0,0)-h^a(\theta_0,0)\big)+A^*_T,
\]
where
\[
A^*_T=-[D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}\tilde D^*(\bar\theta^*)]^{-1}
\big[I_p-R(\hat\theta^{e*}_r)'\big[R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}R(\hat\theta^{e*}_r)'\big]^{-1}R(\hat\theta^{e*}_r)(D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}D^*(\hat\theta^{e*}_r))^{-1}\big]
D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}\sqrt{T/k_2}\,h^a(\hat\theta^{e}_r,0).
\]
We show now that $A^*_T=o_B(1)$. By the first-order conditions of the original restricted problem we have
\[
D'\tilde\Omega^{a-1}h^a(\hat\theta^{e}_r,0)-R(\hat\theta^{e}_r)'\big[R(\hat\theta^{e}_r)(D'\tilde\Omega^{a-1}D)^{-1}R(\hat\theta^{e}_r)'\big]^{-1}R(\hat\theta^{e}_r)(D'\tilde\Omega^{a-1}D)^{-1}D'\tilde\Omega^{a-1}h^a(\hat\theta^{e}_r,0)=0.
\]
Hence, subtracting this zero quantity inside $A^*_T$,
\[
A^*_T=-[D^*(\hat\theta^{e*}_r)'\tilde\Omega^{a*-1}\tilde D^*(\bar\theta^*)]^{-1}\Big\{\big[I_p-R(\hat\theta^{e*}_r)'\big[R(\hat\theta^{e*}_r)(D^{*\prime}\tilde\Omega^{a*-1}D^*)^{-1}R(\hat\theta^{e*}_r)'\big]^{-1}R(\hat\theta^{e*}_r)(D^{*\prime}\tilde\Omega^{a*-1}D^*)^{-1}\big]D^{*\prime}\tilde\Omega^{a*-1}
-\big[I_p-R(\hat\theta^{e}_r)'\big[R(\hat\theta^{e}_r)(D'\tilde\Omega^{a-1}D)^{-1}R(\hat\theta^{e}_r)'\big]^{-1}R(\hat\theta^{e}_r)(D'\tilde\Omega^{a-1}D)^{-1}\big]D'\tilde\Omega^{a-1}\Big\}\sqrt{T/k_2}\,h^a(\hat\theta^{e}_r,0)=o_B(1)
\]
by the bootstrap local UWL and $\sqrt{T/k_2}\,h^a(\hat\theta^{e}_r,0)=O_p(1)$, which can be proven using a Taylor expansion and the fact that $\sqrt{T}\,(\hat\theta^{e}_r-\theta_0)=O_p(1)$. It follows that
\[
S_1\sqrt{\frac{T}{k_2}}\,(\hat\theta^{e*}_r-\hat\theta^{e}_r)=-[D'\Omega^{-1}D]^{-1}\big[I-R'[R(D'\Omega^{-1}D)^{-1}R']^{-1}R(D'\Omega^{-1}D)^{-1}\big]D'\Omega^{-1}\sqrt{T/k_2}\,\big(h^{a*}(\theta_0,0)-h^a(\theta_0,0)\big)+o_B(1).
\]
Consider now the bootstrapped score statistic
\[
\mathcal{S}^*=\frac{T}{k_2}\,\big[h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)\big]'\tilde\Omega^{a*-1}D^*(D^{*\prime}\tilde\Omega^{a*-1}D^*)^{-1}D^{*\prime}\tilde\Omega^{a*-1}\big[h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)\big].
\]
Note that by a Taylor expansion of $h^{a*}(\hat\theta^{e*}_r,0)$ around $\hat\theta^{e}_r$ we have
\[
D^{*\prime}\tilde\Omega^{a*-1}\sqrt{\frac{T}{k_2}}\,\big(h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)\big)=D^{*\prime}\tilde\Omega^{a*-1}\tilde D^*(\bar\theta_r)S_1\sqrt{\frac{T}{k_2}}\,(\hat\theta^{e*}_r-\hat\theta^{e}_r)+D^{*\prime}\tilde\Omega^{a*-1}\sqrt{\frac{T}{k_2}}\,\big(h^{a*}(\hat\theta^{e}_r,0)-h^a(\hat\theta^{e}_r,0)\big)
\]
\[
=-D^{*\prime}\tilde\Omega^{a*-1}\tilde D^*(\bar\theta_r)[D'\Omega^{-1}D]^{-1}\big[I-R'[R(D'\Omega^{-1}D)^{-1}R']^{-1}R(D'\Omega^{-1}D)^{-1}\big]D'\Omega^{-1}\sqrt{T/k_2}\,\big(h^{a*}(\theta_0,0)-h^a(\theta_0,0)\big)+D^{*\prime}\tilde\Omega^{a*-1}\sqrt{T/k_2}\,\big(h^{a*}(\theta_0,0)-h^a(\theta_0,0)\big)+o_B(1)
\]
\[
=R'[R(D'\Omega^{-1}D)^{-1}R']^{-1}R(D'\Omega^{-1}D)^{-1}D'\Omega^{-1}\sqrt{T/k_2}\,\big(h^{a*}(\theta_0,0)-h^a(\theta_0,0)\big)+o_B(1)
\]
by (A.18), the local bootstrap UWL and the bootstrap CLT. Thus
\[
\mathcal{S}^*=\frac{T}{k_2}\,\big(h^{a*}(\theta_0,0)-h^a(\theta_0,0)\big)'\Omega^{-1}D(D'\Omega^{-1}D)^{-1}R'[R(D'\Omega^{-1}D)^{-1}R']^{-1}R(D'\Omega^{-1}D)^{-1}D'\Omega^{-1}\big(h^{a*}(\theta_0,0)-h^a(\theta_0,0)\big)+o_B(1)=\mathcal{W}^*+o_B(1),
\]
and the result follows. Now we consider the distance statistic
\[
\mathcal{D}^*=\frac{T}{k_2}\Big\{\big[h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)\big]'\tilde\Omega^{a*-1}\big[h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)\big]-\big[g^*(\hat\theta^{e*})-g(\hat\theta^{e})\big]'\tilde\Omega^{-1}\big[g^*(\hat\theta^{e*})-g(\hat\theta^{e})\big]\Big\}
\]
\[
=\frac{T}{k_2}\Big\{\big[h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)\big]'\tilde\Omega^{a*-1}\big[h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)\big]-\big[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)\big]'\tilde\Omega^{a*-1}\big[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)\big]\Big\}+o_B(1),
\]
as
\[
g^*(\hat\theta^{e*})'\tilde\Omega^{-1}g^*(\hat\theta^{e*})=h^{a*}(\hat\theta^{e*},\hat\gamma^*)'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e*},\hat\gamma^*)
\]
and
\[
T\,g(\hat\theta^{e})'\tilde\Omega^{-1}g(\hat\theta^{e})=T\,h^a(\hat\theta^{e},\hat\gamma)'\tilde\Omega^{a-1}h^a(\hat\theta^{e},\hat\gamma)
=T\,h^a(\hat\theta^{e},\hat\gamma)'\tilde\Omega^{a*-1}h^a(\hat\theta^{e},\hat\gamma)+T\,h^a(\hat\theta^{e},\hat\gamma)'\big[\tilde\Omega^{a-1}-\tilde\Omega^{a*-1}\big]h^a(\hat\theta^{e},\hat\gamma)
=T\,h^a(\hat\theta^{e},\hat\gamma)'\tilde\Omega^{a*-1}h^a(\hat\theta^{e},\hat\gamma)+o_B(1),
\]
since $\sqrt{T}\,h^a(\hat\theta^{e},\hat\gamma)=O_p(1)$ and $\tilde\Omega^{a-1}-\tilde\Omega^{a*-1}=o_B(1)$.
Now note that by two first-order Taylor expansions we have
\[
h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)=h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)+D^*(\bar\theta^*)\begin{pmatrix}\hat\theta^{e*}_r-\hat\theta^{e*}\\-\hat\gamma^*\end{pmatrix}-D(\bar\theta)\begin{pmatrix}\hat\theta^{e}_r-\hat\theta^{e}\\-\hat\gamma\end{pmatrix},
\]
where $\bar\theta^*$ is on a line joining $\hat\theta^{e*}_r$ and $\hat\theta^{e*}$ and $\bar\theta$ is on a line joining $\hat\theta^{e}_r$ and $\hat\theta^{e}$. Thus
\[
\frac{T}{k_2}\,[h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)]'\tilde\Omega^{a*-1}[h^{a*}(\hat\theta^{e*}_r,0)-h^a(\hat\theta^{e}_r,0)]
=\frac{T}{k_2}\,[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)]'\tilde\Omega^{a*-1}[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)]
+\frac{2T}{k_2}\Big[D^*(\bar\theta^*)\begin{pmatrix}\hat\theta^{e*}_r-\hat\theta^{e*}\\-\hat\gamma^*\end{pmatrix}-D(\bar\theta)\begin{pmatrix}\hat\theta^{e}_r-\hat\theta^{e}\\-\hat\gamma\end{pmatrix}\Big]'\tilde\Omega^{a*-1}\big[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)\big]
+\frac{T}{k_2}\Big[D^*(\bar\theta^*)\begin{pmatrix}\hat\theta^{e*}_r-\hat\theta^{e*}\\-\hat\gamma^*\end{pmatrix}-D(\bar\theta)\begin{pmatrix}\hat\theta^{e}_r-\hat\theta^{e}\\-\hat\gamma\end{pmatrix}\Big]'\tilde\Omega^{a*-1}\Big[D^*(\bar\theta^*)\begin{pmatrix}\hat\theta^{e*}_r-\hat\theta^{e*}\\-\hat\gamma^*\end{pmatrix}-D(\bar\theta)\begin{pmatrix}\hat\theta^{e}_r-\hat\theta^{e}\\-\hat\gamma\end{pmatrix}\Big].
\]
Note that $D^*(\hat\theta^{e*})'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e*},\hat\gamma^*)=0$ and $D'\tilde\Omega^{a-1}h^a(\hat\theta^{e},\hat\gamma)=0$. Thus
\[
\sqrt{\frac{T}{k_2}}\,D^*(\bar\theta^*)'\tilde\Omega^{a*-1}h^{a*}(\hat\theta^{e*},\hat\gamma^*)=\big(D^*(\bar\theta^*)'\tilde\Omega^{a*-1}-D^*(\hat\theta^{e*})'\tilde\Omega^{a*-1}\big)\sqrt{\frac{T}{k_2}}\,h^{a*}(\hat\theta^{e*},\hat\gamma^*)=o_B(1),
\]
\[
\sqrt{\frac{T}{k_2}}\,D^*(\bar\theta^*)'\tilde\Omega^{a*-1}h^a(\hat\theta^{e},\hat\gamma)=\big(D^*(\bar\theta^*)'\tilde\Omega^{a*-1}-D'\tilde\Omega^{a-1}\big)\sqrt{\frac{T}{k_2}}\,h^a(\hat\theta^{e},\hat\gamma)=o_B(1)
\]
by the bootstrap UWL, the standard UWL, $\sqrt{T/k_2}\,h^{a*}(\hat\theta^{e*},\hat\gamma^*)=O_B(1)$ and $\sqrt{T}\,h^a(\hat\theta^{e},\hat\gamma)=O_p(1)$. Thus
\[
\sqrt{\frac{T}{k_2}}\,D^*(\bar\theta^*)'\tilde\Omega^{a*-1}\big[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)\big]=o_B(1),
\]
and similarly
\[
\sqrt{\frac{T}{k_2}}\,D(\bar\theta)'\tilde\Omega^{a*-1}\big[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)\big]=o_B(1),
\]
since $D^*(\hat\theta^{e*})'\tilde\Omega^{a*-1}[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)]=o_B(1/\sqrt{T})$ and $D'\tilde\Omega^{a*-1}[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)]=o_B(1/\sqrt{T})$ by the first-order conditions, the bootstrap UWL and the standard UWL. Also
\[
\sqrt{\frac{T}{k_2}}\Big[D^*(\bar\theta^*)\begin{pmatrix}\hat\theta^{e*}_r-\hat\theta^{e*}\\-\hat\gamma^*\end{pmatrix}-D(\bar\theta)\begin{pmatrix}\hat\theta^{e}_r-\hat\theta^{e}\\-\hat\gamma\end{pmatrix}\Big]=O_B(1),
\]
as $\sqrt{T}\,(\hat\theta^{e*}_r-\hat\theta^{e}_r)=O_B(1)$, $\sqrt{T}\,(\hat\theta^{e*}-\hat\theta^{e})=O_B(1)$, $\sqrt{T}\,(\hat\theta^{e}-\theta_0)=O_p(1)$ and $\sqrt{T}\,(\hat\theta^{e}_r-\theta_0)=O_p(1)$. Thus
\[
\frac{2T}{k_2}\Big[D^*(\bar\theta^*)\begin{pmatrix}\hat\theta^{e*}_r-\hat\theta^{e*}\\-\hat\gamma^*\end{pmatrix}-D(\bar\theta)\begin{pmatrix}\hat\theta^{e}_r-\hat\theta^{e}\\-\hat\gamma\end{pmatrix}\Big]'\tilde\Omega^{a*-1}\big[h^{a*}(\hat\theta^{e*},\hat\gamma^*)-h^a(\hat\theta^{e},\hat\gamma)\big]=o_B(1).
\]
Now notice that $D^*(\bar\theta^*)=D+o_B(1)$ and $D(\bar\theta)=D+o_p(1)$; thus
\[
\sqrt{\frac{T}{k_2}}\Big(D^*(\bar\theta^*)\begin{pmatrix}\hat\theta^{e*}_r-\hat\theta^{e*}\\-\hat\gamma^*\end{pmatrix}-D(\bar\theta)\begin{pmatrix}\hat\theta^{e}_r-\hat\theta^{e}\\-\hat\gamma\end{pmatrix}\Big)=\sqrt{\frac{T}{k_2}}\,D\begin{pmatrix}\hat\theta^{e*}_r-\hat\theta^{e}_r-(\hat\theta^{e*}-\hat\theta^{e})\\-(\hat\gamma^*-\hat\gamma)\end{pmatrix}+o_B(1),
\]
and consequently
\[
\sqrt{\tfrac{T}{k_{2}}}\,D\Big[\begin{pmatrix}\hat\theta^{e*}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}-\begin{pmatrix}\hat\theta^{e*}-\hat\theta^{e}\\\hat\nu^{*}-\hat\nu\end{pmatrix}\Big]=\sqrt{\tfrac{T}{k_{2}}}\,D[D'\Sigma^{-1}D]^{-1}R[R'(D'\Sigma^{-1}D)^{-1}R]^{-1}R'(D'\Sigma^{-1}D)^{-1}D'\Sigma^{-1}\sqrt{T/k_{2}}\big(h^{a*}(\theta_{0},0)-h^{a}(\theta_{0},0)\big)+o_{B}(1).
\]
Thus
\[
\begin{aligned}
&\frac{T}{k_{2}}\Big[D^{*}(\bar\theta^{*})\begin{pmatrix}\hat\theta^{e*}_{r}-\hat\theta^{e*}\\-\hat\nu^{*}\end{pmatrix}-D(\bar\theta)\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix}\Big]'\Sigma^{*-1}\Big[D^{*}(\bar\theta^{*})\begin{pmatrix}\hat\theta^{e*}_{r}-\hat\theta^{e*}\\-\hat\nu^{*}\end{pmatrix}-D(\bar\theta)\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix}\Big]\\
&\quad=\sqrt{T/k_{2}}\big(h^{a*}(\theta_{0},0)-h^{a}(\theta_{0},0)\big)'\Sigma^{-1}D(D'\Sigma^{-1}D)^{-1}R[R'(D'\Sigma^{-1}D)^{-1}R]^{-1}R'(D'\Sigma^{-1}D)^{-1}D'\Sigma^{-1}\sqrt{T/k_{2}}\big(h^{a*}(\theta_{0},0)-h^{a}(\theta_{0},0)\big)+o_{B}(1)\\
&\quad=\mathcal{W}^{*}+o_{B}(1).
\end{aligned}
\]
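The chi-squared limits obtained in these arguments all rest on the same linear-algebra fact: after symmetrisation by \(\Sigma^{1/2}\), the matrix sandwiched between the centred moment vectors is idempotent with trace equal to the number of restrictions. The following small numerical check illustrates this, with randomly generated matrices standing in for \(D\), \(\Sigma\) and \(R\) (all sizes and values are illustrative assumptions, not objects from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, r = 6, 4, 2                          # moments, parameters, restrictions (illustrative)

D = rng.standard_normal((m, p))            # stand-in for the Jacobian of the moment indicators
A0 = rng.standard_normal((m, m))
Sigma = A0 @ A0.T + m * np.eye(m)          # a positive-definite long-run variance
R = rng.standard_normal((p, r))            # stand-in for the Jacobian of the restrictions

Sinv = np.linalg.inv(Sigma)
Delta = np.linalg.inv(D.T @ Sinv @ D)      # (D' Sigma^-1 D)^-1

# Symmetrised version of Sigma^-1 D Delta R [R' Delta R]^-1 R' Delta D' Sigma^-1,
# the sandwich appearing in the limiting quadratic form of the Wald/distance statistics.
L = np.linalg.cholesky(Sigma)
M = np.linalg.solve(L, D)                  # Sigma^{-1/2} D, so M'M = D' Sigma^-1 D
A = M @ Delta @ R @ np.linalg.inv(R.T @ Delta @ R) @ R.T @ Delta @ M.T

print(np.allclose(A @ A, A))               # idempotent
print(np.isclose(np.trace(A), r))          # rank r, hence a chi-square(r) quadratic form
```

Because the sandwich is an idempotent matrix of rank \(r\), a normal vector with variance \(\Sigma\) plugged into the quadratic form yields a \(\chi^{2}(r)\) limit.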
A.5 Proofs of the results in section 4.2
Proof of Theorem 4.7: We need only show that the regularity conditions of Lemma A.2 of Gonçalves and White (2004) are satisfied. Condition (a1) is satisfied as \(g(\cdot,\theta)\) is measurable and continuous functions of measurable functions are measurable. Since \(g(z_{t},\theta)\) is continuous on \(B\), the objective function \(g(\theta)'W_{T}g(\theta)\) is continuous. Also note that by the triangle inequality
\[
\begin{aligned}
\sup_{\theta\in B}\big|g^{\star}_{T}(\theta)'W^{\star}_{T}g^{\star}_{T}(\theta)-E^{\star}[g^{\star}_{T}(\theta)]'W_{T}E^{\star}[g^{\star}_{T}(\theta)]\big|
&\le\sup_{\theta\in B}\big|g^{\star}_{T}(\theta)'W^{\star}_{T}g^{\star}_{T}(\theta)-E^{\star}[g^{\star}_{T}(\theta)]'W^{\star}_{T}E^{\star}[g^{\star}_{T}(\theta)]\big|\\
&\quad+\sup_{\theta\in B}\big|E^{\star}[g^{\star}_{T}(\theta)]'W^{\star}_{T}E^{\star}[g^{\star}_{T}(\theta)]-E^{\star}[g^{\star}_{T}(\theta)]'W_{T}E^{\star}[g^{\star}_{T}(\theta)]\big|.
\end{aligned}
\]
Now by the triangle and Cauchy-Schwarz inequalities
\[
\begin{aligned}
\sup_{\theta\in B}\big|g^{\star}_{T}(\theta)'W^{\star}_{T}g^{\star}_{T}(\theta)-E^{\star}[g^{\star}_{T}(\theta)]'W^{\star}_{T}E^{\star}[g^{\star}_{T}(\theta)]\big|
&\le\sup_{\theta\in B}\|g^{\star}_{T}(\theta)-E^{\star}[g^{\star}_{T}(\theta)]\|^{2}\,\|W^{\star}_{T}\|\\
&\quad+2\sup_{\theta\in B}\|g^{\star}_{T}(\theta)-E^{\star}[g^{\star}_{T}(\theta)]\|\,\|W^{\star}_{T}\|\,\sup_{\theta\in B}\|E^{\star}[g^{\star}_{T}(\theta)]\|.
\end{aligned}
\]
Now for \(p_{tT}=\pi_{t}\), note that by Lemma A.1 Assumption 3.3(a) is satisfied. Hence the bootstrap UWL and the local UWL given by Lemmata A.6 and A.7 can be applied, and therefore \(\sup_{\theta\in B}\|g^{\star}_{T}(\theta)-E^{\star}[g^{\star}_{T}(\theta)]\|=o_{B}(1)\), \(W^{\star}_{T}-W_{T}=o_{B}(1)\) and \(W_{T}=O_{p}(1)\). Moreover,
\[
E^{\star}[g^{\star}_{T}(\theta)]=\sum_{t=1}^{T}g_{tT}(\theta)\pi_{t}=(1+o_{p}(1))\frac{1}{T}\sum_{t=1}^{T}g_{tT}(\theta)=O_{p}(1),
\]
by Lemma A1 of Smith (2011). Thus
\[
\sup_{\theta\in B}\big|g^{\star}_{T}(\theta)'W^{\star}_{T}g^{\star}_{T}(\theta)-E^{\star}[g^{\star}_{T}(\theta)]'W^{\star}_{T}E^{\star}[g^{\star}_{T}(\theta)]\big|=o_{B}(1).
\]
Now
\[
\begin{aligned}
\sup_{\theta\in B}\big|E^{\star}[g^{\star}_{T}(\theta)]'W^{\star}_{T}E^{\star}[g^{\star}_{T}(\theta)]-E^{\star}[g^{\star}_{T}(\theta)]'W_{T}E^{\star}[g^{\star}_{T}(\theta)]\big|
&=\sup_{\theta\in B}\big|E^{\star}[g^{\star}_{T}(\theta)]'[W^{\star}_{T}-W_{T}]E^{\star}[g^{\star}_{T}(\theta)]\big|\\
&\le\sup_{\theta\in B}\|E^{\star}[g^{\star}_{T}(\theta)]\|^{2}\,\|W^{\star}_{T}-W_{T}\|=O_{p}(1)o_{p}(1).
\end{aligned}
\]
Uniqueness was proven in Lemma 2.3 of Newey and McFadden (1994).
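The object \(E^{\star}[g^{\star}_{T}(\theta)]=\sum_{t}g_{tT}(\theta)\pi_{t}\) manipulated in this proof is the bootstrap expectation of the moment indicator when observations are drawn with the implied probabilities \(\pi_{t}\). A minimal sketch of this recentring, using hypothetical exponentially tilted weights in place of the GEL implied probabilities and iid draws rather than kernel blocks (purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000
z = rng.standard_normal(T) + 0.5           # illustrative data with mean 0.5

def g(z, theta):                           # moment indicator g(z, theta) = z - theta
    return z - theta

# Hypothetical "implied" probabilities pi_t (an exponential tilt; any probabilities
# summing to one would do for this illustration).
w = np.exp(-0.1 * g(z, 0.5))
pi = w / w.sum()

# Drawing observations with probabilities pi_t and averaging the moment reproduces
# the weighted sample mean E*[g*_T(theta)] = sum_t pi_t g_t(theta).
theta, B = 0.5, 500
idx = rng.choice(T, size=(B, T), p=pi)
boot_means = g(z[idx], theta).mean(axis=1)

exact = np.sum(pi * g(z, theta))
print(abs(boot_means.mean() - exact) < 0.01)
```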
Proof of Theorem 4.8: Note that by Hansen (1982) we have
\[
\sqrt{T}(\hat\theta-\theta_{0})\stackrel{d}{\to}N\big(0,(G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}\big);
\]
hence, since the normal distribution is continuous, we have for \(\Delta=(G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}\)
\[
\sup_{x\in\mathbb{R}^{p}}\big|P\{\Delta^{-1/2}T^{1/2}(\hat\theta-\theta_{0})\le x\}-\Phi(x)\big|\to 0
\]
by Polya's Theorem. We now prove that
\[
\lim_{T\to\infty}P\Big(\sup_{x\in\mathbb{R}^{p}}\big|P^{\star}\{\Delta^{-1/2}\sqrt{T/k_{2}}(\hat\theta^{\star}-\tilde\theta)\le x\}-\Phi(x)\big|\ge\varepsilon\Big)=0.
\]
Let \(G^{\star}_{T}\equiv\partial g^{\star}_{T}(\hat\theta^{\star})/\partial\theta'\). To prove asymptotic normality notice that by the first-order conditions \(\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}g^{\star}_{T}(\hat\theta^{\star})=0\). Hence a first-order Taylor expansion around \(\tilde\theta\) yields
\[
\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}g^{\star}_{T}(\tilde\theta)+G^{\star\prime}_{T}W^{\star}_{T}\bar G^{\star}_{T}\sqrt{T/k_{2}}(\hat\theta^{\star}-\tilde\theta)=0,
\]
where \(\bar G^{\star}_{T}=\partial g^{\star}_{T}(\bar\theta^{\star})/\partial\theta'\) and \(\bar\theta^{\star}\) lies on the line joining \(\tilde\theta\) and \(\hat\theta^{\star}\). Solving for \(\sqrt{T/k_{2}}(\hat\theta^{\star}-\tilde\theta)\) we obtain
\[
\sqrt{T/k_{2}}(\hat\theta^{\star}-\tilde\theta)=-[G^{\star\prime}_{T}W^{\star}_{T}\bar G^{\star}_{T}]^{-1}\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}g^{\star}_{T}(\tilde\theta).
\]
Now notice that by a Taylor expansion
\[
\begin{aligned}
\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}g^{\star}_{T}(\tilde\theta)&=\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}g^{\star}_{T}(\theta_{0})+\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}\bar G^{\star}_{T}(\tilde\theta-\theta_{0})\\
&=\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}\big[g^{\star}_{T}(\theta_{0})-\tilde g_{T}(\theta_{0})\big]+\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}\tilde g_{T}(\theta_{0})+\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}\bar G^{\star}_{T}(\tilde\theta-\theta_{0}),
\end{aligned}
\]
where \(\bar G^{\star}_{T}\equiv\partial g^{\star}_{T}(\bar\theta)/\partial\theta'\), \(\tilde g_{T}(\theta_{0})=\sum_{t=1}^{T}g_{t,T}(\theta_{0})\pi_{t}\), and \(\bar\theta\) lies on the line joining \(\tilde\theta\) and \(\theta_{0}\). Now note that by (A.2) we have
\[
\sqrt{T}(\tilde\theta-\theta_{0})=-(G'_{T}W_{T}\bar G_{T})^{-1}\sqrt{T}\,G'_{T}W_{T}\tilde g_{T}(\theta_{0}).
\]
Thus
\[
\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}\tilde g_{T}(\theta_{0})+\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}\bar G^{\star}_{T}(\tilde\theta-\theta_{0})=\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}\tilde g_{T}(\theta_{0})-\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}\bar G^{\star}_{T}(G'_{T}W_{T}\bar G_{T})^{-1}G'_{T}W_{T}\tilde g_{T}(\theta_{0})=o_{B}(1),
\]
since \(\sqrt{T}\,\tilde g_{T}(\theta_{0})=O_{p}(1)\), \(W_{T}=W+o_{p}(1)\), \(G_{T}=G+o_{p}(1)\), \(G^{\star}_{T}=G+o_{B}(1)\) and \(W^{\star}_{T}=W_{T}+o_{B}(1)\).

Now \(-[G^{\star\prime}_{T}W^{\star}_{T}\bar G^{\star}_{T}]^{-1}\sqrt{T/k_{2}}\,G^{\star\prime}_{T}W^{\star}_{T}[g^{\star}_{T}(\theta_{0})-\tilde g_{T}(\theta_{0})]\) converges to \(N\big(0,(G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}\big)\) by the bootstrap CLT, Theorem A.2, and the facts that \(G^{\star}_{T}-G=o_{B}(1)\) and \(W^{\star}_{T}=W+o_{B}(1)\). The result follows as \(\sqrt{T/k_{2}}(\hat\theta^{\star}-\tilde\theta)\) converges uniformly to the same asymptotic distribution as \(T^{1/2}(\hat\theta-\theta_{0})\). We note that \(\tilde\theta\) can be replaced by \(\hat\theta^{e}\) because \(\sqrt{T}(\tilde\theta-\theta_{0})-\sqrt{T}(\hat\theta^{e}-\theta_{0})=o_{p}(1)\).
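The proof linearises the bootstrap first-order conditions around \(\tilde\theta\) and solves the resulting linear equation. For a moment indicator that is linear in \(\theta\) that Taylor step is exact, so a single Newton step from any starting value reproduces the GMM estimator. A minimal sketch under that assumption (scalar parameter, scalar weight \(W=1\); the data-generating process is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 400
x = rng.standard_normal(T)
y = 1.5 * x + rng.standard_normal(T)

# Linear moment g_t(theta) = x_t (y_t - x_t theta); gbar is its sample mean.
def gbar(theta):
    return np.mean(x * (y - x * theta))

G = -np.mean(x * x)                        # derivative of gbar with respect to theta
theta0 = 0.0                               # arbitrary starting value
theta_one_step = theta0 - gbar(theta0) / G # one step of (G'WG)^{-1} G'W gbar, W = 1

theta_exact = np.sum(x * y) / np.sum(x * x)
print(np.isclose(theta_one_step, theta_exact))
```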
Proof of Lemma 4.2: The proof of this lemma is identical to the proof of Lemma 4.1 with \(p_{tT}=\pi_{t}\), and uses the fact that \(T\pi_{t}=1+o_{p}(1)\) by Lemma A.1.
Proof of Theorem 4.9: Note that by a Taylor expansion
\[
\sqrt{T/k_{2}}\,g^{\star}(\hat\theta^{e\star})=\sqrt{T/k_{2}}\,g^{\star}(\tilde\theta)+\bar G^{\star}_{T}\sqrt{T/k_{2}}(\hat\theta^{e\star}-\tilde\theta),
\]
where \(\bar G^{\star}_{T}\equiv\partial g^{\star}_{T}(\bar\theta)/\partial\theta'\) and \(\bar\theta\) lies on the line joining \(\hat\theta^{e\star}\) and \(\tilde\theta\). Note that by Theorem 4.8 with \(W^{\star}_{T}=\tilde\Omega^{\star-1}\),
\[
\sqrt{T/k_{2}}(\hat\theta^{e\star}-\tilde\theta)=-[G^{\star\prime}_{T}\tilde\Omega^{\star-1}\bar G^{\star}_{T}]^{-1}G^{\star\prime}_{T}\tilde\Omega^{\star-1}\sqrt{T/k_{2}}\big[g^{\star}_{T}(\theta_{0})-\tilde g_{T}(\theta_{0})\big]+o_{B}(1).
\]
Also by a Taylor expansion
\[
\sqrt{T/k_{2}}\big(g^{\star}(\tilde\theta)-g^{\star}(\theta_{0})-\tilde g_{T}(\tilde\theta)+\tilde g_{T}(\theta_{0})\big)=(\bar G^{\star}_{T}-\bar G_{T})\sqrt{T/k_{2}}\,(\tilde\theta-\theta_{0}),
\]
where \(\bar G^{\star}_{T}=\partial g^{\star}_{T}(\bar\theta)/\partial\theta'\) with \(\bar\theta\) on the line joining \(\tilde\theta\) and \(\theta_{0}\), and \(\bar G_{T}=\partial\tilde g_{T}(\bar{\bar\theta})/\partial\theta'\) with \(\bar{\bar\theta}\) on the line joining \(\tilde\theta\) and \(\theta_{0}\). Now \(\bar G^{\star}_{T}=G+o_{B}(1)\) by Lemma A.7, and \(\bar G_{T}=G+o_{p}(1)\) by Lemma A.1 of Smith (2011) and the fact that \(T\pi_{t}=1+o_{p}(1)\) by Lemma A.1. Also, by Theorem 4.8, \(\sqrt{T}\big(\tilde\theta-\theta_{0}\big)=O_{p}(1)\).

We now show that \(\sqrt{T/k_{2}}\,\tilde g(\tilde\theta)=o_{p}(1)\). Note that by a Taylor expansion
\[
\sqrt{T}\,\tilde g(\tilde\theta)=\sqrt{T}\,\tilde g(\theta_{0})+\bar G_{T}\sqrt{T}(\tilde\theta-\theta_{0}),
\]
where \(\bar G_{T}=\partial\tilde g_{T}(\bar\theta)/\partial\theta'\) with \(\bar\theta\) on the line joining \(\tilde\theta\) and \(\theta_{0}\); \(\bar G_{T}=G+o_{p}(1)\) by Lemma A.1 of Smith (2011) and the fact that \(T\pi_{t}=1+o_{p}(1)\) by Lemma A.1. Thus by Theorem 4.8 we have
\[
\bar G_{T}\sqrt{T}(\tilde\theta-\theta_{0})=-G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1}T^{1/2}g(\theta_{0})+o_{p}(1).
\]
Now by Lemma A.2 we have
\[
\sqrt{T}\,\tilde g(\theta_{0})=G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1}T^{1/2}g(\theta_{0})+o_{p}(1).
\]
Hence \(\sqrt{T}\,\tilde g(\tilde\theta)=o_{p}(1)\). Thus
\[
\begin{aligned}
\sqrt{\tfrac{T}{k_{2}}}\,g^{\star}(\hat\theta^{e\star})&=\sqrt{T/k_{2}}\big(g^{\star}(\theta_{0})-\tilde g(\theta_{0})\big)-\bar G^{\star}[G^{\star\prime}_{T}\tilde\Omega^{\star-1}\bar G^{\star}_{T}]^{-1}G^{\star\prime}_{T}\tilde\Omega^{\star-1}\sqrt{T/k_{2}}\big[g^{\star}_{T}(\theta_{0})-\tilde g_{T}(\theta_{0})\big]+o_{B}(1)\\
&=\big[I_{m}-\bar G^{\star}[G^{\star\prime}_{T}\tilde\Omega^{\star-1}\bar G^{\star}_{T}]^{-1}G^{\star\prime}_{T}\tilde\Omega^{\star-1}\big]\sqrt{T/k_{2}}\big[g^{\star}_{T}(\theta_{0})-\tilde g_{T}(\theta_{0})\big]+o_{B}(1).
\end{aligned}
\]
Now since \(\bar G^{\star}=G+o_{B}(1)\), \(G^{\star}_{T}=G+o_{B}(1)\), \(\tilde\Omega^{\star-1}=\Omega^{-1}+o_{B}(1)\), and, by the bootstrap CLT Theorem A.2, \(\sqrt{T/k_{2}}[g^{\star}_{T}(\theta_{0})-\tilde g_{T}(\theta_{0})]\) converges to \(N(0,\Omega)\), it follows as in Theorem 4.3 that \(\mathcal{J}^{\star}=T\,g^{\star}(\hat\theta^{e\star})'\tilde\Omega^{\star-1}g^{\star}(\hat\theta^{e\star})/k_{2}\) converges
to \(\chi^{2}(m-p)\). Since \(\mathcal{J}\stackrel{d}{\to}\chi^{2}(m-p)\) the result follows by Polya's Theorem (Serfling, 2002, p. 18), as the chi-squared distribution has a continuous c.d.f.
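A sketch of the recentred bootstrap \(\mathcal{J}\) statistic whose chi-squared limit is established above, in the degenerate iid special case with \(k_{2}=1\), no estimated parameter (so \(m-p=m=2\)) and naive multinomial resampling rather than kernel blocks; this is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(3)
T, B = 300, 500
z = rng.standard_normal(T)
g = np.column_stack([z, z**2 - 1.0])       # two moment indicators, no parameter, m = 2

gbar = g.mean(axis=0)
Om = np.cov(g.T)

# Recentred bootstrap J statistic: J* = T (gbar* - gbar)' Om^{-1} (gbar* - gbar),
# approximately chi-square(2) under iid resampling.
Js = np.empty(B)
for b in range(B):
    gb = g[rng.integers(0, T, T)].mean(axis=0) - gbar
    Js[b] = T * gb @ np.linalg.solve(Om, gb)

print(abs(Js.mean() - 2.0) < 0.5)          # chi-square(2) has mean 2
```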
Proof of Theorem 4.10: We start by deriving the asymptotic distribution of \(\mathcal{W}^{\star}\). Define \(h^{a\star}_{t}(\theta,\nu)\equiv(g^{\star}(z_{t},\theta)',[q^{\star}(z_{t},\theta)-\nu]')'\), \(h^{a\star}(\theta,\nu)\equiv\sum_{t=1}^{m_{T}}h^{a\star}_{t}(\theta,\nu)/m_{T}\) and \(\tilde Q^{\star}(\theta,\nu)=h^{a\star}(\theta,\nu)'\Sigma^{\star-1}h^{a\star}(\theta,\nu)\). Note that the unrestricted GMM estimator solves
\[
(\hat\theta^{e\star\prime},\hat\nu^{\star\prime})'=\arg\min_{\theta\in B,\;\nu\in\mathbb{R}^{s}}\tilde Q^{\star}(\theta,\nu).
\]
As before the solution is given by
\[
\hat\theta^{e\star}=\arg\min_{\theta\in B}g^{\star}(\theta)'\Omega^{\star-1}g^{\star}(\theta),\qquad
\hat\nu^{\star}=q^{\star}(\hat\theta^{e\star})-\Omega^{\star}_{21}\Omega^{\star-1}_{11}g^{\star}(\hat\theta^{e\star}).
\]
Consistency of \(\hat\theta^{e\star}\) follows from Theorem 4.7. We note that by Lemma A.6 \(\Sigma^{\star}=\Sigma+o_{B}(1)\) and \(\hat\nu^{\star}=\hat\nu+o_{B}(1)\).

We now derive the asymptotic distribution of \((\hat\theta^{e\star\prime},\hat\nu^{\star\prime})'\). Since these estimators satisfy the first-order conditions, we have \(D^{\star\prime}\Sigma^{\star-1}h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})=0\). Thus by a Taylor expansion around \((\hat\theta^{e\prime},\hat\nu')'\),
\[
D^{\star\prime}\Sigma^{\star-1}h^{a\star}(\hat\theta^{e},\hat\nu)+D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}-\hat\theta^{e}\\\hat\nu^{\star}-\hat\nu\end{pmatrix}=0,
\]
where \(\bar D^{\star}\equiv D^{\star}(\bar\theta^{\star})\),
\[
D^{\star}(\theta)=\begin{pmatrix}\sum_{t=1}^{m_{T}}G^{\star}_{t}(\theta)/m_{T}&0\\\sum_{t=1}^{m_{T}}Q^{\star}_{t}(\theta)/m_{T}&-I_{s}\end{pmatrix},
\]
and \(\bar\theta^{\star}\) lies on the line joining \(\hat\theta^{e\star}\) and \(\hat\theta^{e}\). Thus
\[
\sqrt{T}\begin{pmatrix}\hat\theta^{e\star}-\hat\theta^{e}\\\hat\nu^{\star}-\hat\nu\end{pmatrix}=-[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\,h^{a\star}(\hat\theta^{e},\hat\nu).
\]
Now by a Taylor expansion
\[
\sqrt{T}\,h^{a\star}(\hat\theta^{e},\hat\nu)=T^{1/2}h^{a\star}(\theta_{0},0)+\bar D^{\star}\sqrt{T}\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\hat\nu\end{pmatrix},
\]
where \(\bar D^{\star}=D^{\star}(\bar\theta)\) and \(\bar\theta\) lies on the line joining \(\hat\theta^{e}\) and \(\theta_{0}\). We show now that
\[
[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}\Big[T^{1/2}\tilde h^{a}_{T}(\theta_{0},0)+T^{1/2}\bar D^{\star}\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\hat\nu\end{pmatrix}\Big]=o_{B}(1).
\]
First notice that by Lemma A.2 above we have
\[
T^{1/2}\tilde h^{a}_{T}(\theta_{0},0)=T^{-1/2}\sum_{t=1}^{T}h^{a}_{t,T}(\theta_{0},0)+\Sigma S_{1}P\,T^{1/2}g(\theta_{0})+o_{p}(1).\tag{A.19}
\]
Thus, as \(\Sigma^{\star-1}=\Sigma^{-1}+o_{B}(1)\), we have
\[
\begin{aligned}
[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}T^{1/2}\tilde h^{a}_{T}(\theta_{0},0)
&=[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}T^{-1/2}\sum_{t=1}^{T}h^{a}_{t,T}(\theta_{0},0)\\
&\quad+[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}S_{1}P\,T^{1/2}g(\theta_{0})+o_{p}(1)\\
&=[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}T^{-1/2}\sum_{t=1}^{T}h^{a}_{t,T}(\theta_{0},0)+o_{B}(1),
\end{aligned}
\]
as \(G^{\star}=G+o_{B}(1)\) by Lemma A.7 and \(G'P=0\).

Now note that
\[
\sqrt{T}\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\hat\nu\end{pmatrix}=\sqrt{T}\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\tilde\nu\end{pmatrix}-\sqrt{T}\begin{pmatrix}0\\\tilde\nu-\hat\nu\end{pmatrix},\tag{A.20}
\]
and the usual asymptotic representation of the efficient GMM estimator yields
\[
\sqrt{T}\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\tilde\nu\end{pmatrix}=-[D'\Sigma^{-1}\bar D]^{-1}D'\Sigma^{-1}T^{-1/2}\sum_{t=1}^{T}h^{a}_{t}(\theta_{0},0)+o_{p}(1),\tag{A.21}
\]
where \(\bar D=D(\bar\theta)\) and \(\bar\theta\) lies on the line joining \(\hat\theta^{e}\) and \(\theta_{0}\). Hence by (A.20) and (A.21) we have
\[
[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}\sqrt{T}\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\hat\nu\end{pmatrix}=[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}\sqrt{T}\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\tilde\nu\end{pmatrix}-O_{B}(1)\sqrt{T}\begin{pmatrix}0\\\tilde\nu-\hat\nu\end{pmatrix},
\]
as \(D^{\star}\) and \(\bar D^{\star}\) converge to \(D\) by Lemma A.7, and \(\Sigma^{\star-1}=\Sigma^{-1}+o_{B}(1)\). It remains to prove that \(\sqrt{T}(\tilde\nu-\hat\nu)=o_{p}(1)\).
First note that
\[
\sqrt{T}(\tilde\nu-\hat\nu)=\sqrt{T}\,\tilde q(\hat\theta^{e})-\sqrt{T}\,q(\hat\theta^{e})+\Omega_{21}\Omega_{11}^{-1}\sqrt{T}\,g(\hat\theta^{e}).
\]
Now Lemma A.1 above yields
\[
\begin{aligned}
\sqrt{T}\,\tilde q(\hat\theta^{e})&=\sqrt{T}\,q(\hat\theta^{e})+\sqrt{T}\sum_{t=1}^{T}(\pi_{t}-1/T)\,q_{tT}(\hat\theta^{e})\\
&=\sqrt{T}\,q(\hat\theta^{e})+\frac{S_{T}}{T}\sum_{t=1}^{T}q_{tT}(\hat\theta^{e})g_{tT}(\hat\theta^{e})'\,\frac{T^{1/2}}{S_{T}}\hat\lambda\,(1/k_{2}+o_{p}(1))+o_{p}(1).
\end{aligned}
\]
Also, by the FOC of the GEL problem with respect to \(\lambda\),
\[
\frac{1}{T}\sum_{t=1}^{T}\rho_{1}\big(k\hat\lambda'g_{tT}(\hat\theta_{gel})\big)g_{tT}(\hat\theta_{gel})=0.
\]
Thus by a Taylor expansion around \(0\) we have
\[
-\frac{1}{T}\sum_{t=1}^{T}g_{tT}(\hat\theta_{gel})+\frac{1}{T}\sum_{t=1}^{T}\rho_{2}\big(k\bar\lambda'g_{tT}(\hat\theta_{gel})\big)g_{tT}(\hat\theta_{gel})g_{tT}(\hat\theta_{gel})'\hat\lambda/k_{2}=0.
\]
Thus
\[
\frac{T^{1/2}}{S_{T}}\hat\lambda/k_{2}=\Big[\frac{S_{T}}{T}\sum_{t=1}^{T}\rho_{2}\big(k\bar\lambda'g_{tT}(\hat\theta_{gel})\big)g_{tT}(\hat\theta_{gel})g_{tT}(\hat\theta_{gel})'\Big]^{-1}\sqrt{T}\,\bar g_{T}(\hat\theta_{gel}).
\]
Now
\[
\Big[\frac{S_{T}}{T}\sum_{t=1}^{T}\rho_{2}\big(k\bar\lambda'g_{tT}(\hat\theta_{gel})\big)g_{tT}(\hat\theta_{gel})g_{tT}(\hat\theta_{gel})'\Big]^{-1}=-\Omega_{11}^{-1}+o_{p}(1)
\]
by Theorem 2.5 of Smith (2011). Also by a Taylor expansion
\[
\sqrt{T}\,\bar g_{T}(\hat\theta_{gel})=\sqrt{T}\,\bar g_{T}(\theta_{0})+\bar G_{T}\sqrt{T}(\hat\theta_{gel}-\theta_{0}),
\]
where \(\bar G_{T}\equiv\partial\bar g_{T}(\bar\theta)/\partial\theta'\) and \(\bar\theta\) lies on the line joining \(\hat\theta_{gel}\) and \(\theta_{0}\). Now by Lemma A.2 of Smith (2011), \(\sqrt{T}\,\bar g_{T}(\theta_{0})=\sqrt{T}\,g(\theta_{0})+O_{p}(T^{-1/2})\). Since
\[
\sqrt{T}\,g(\hat\theta^{e})=\sqrt{T}\,g(\theta_{0})+\bar G\sqrt{T}(\hat\theta^{e}-\theta_{0}),
\]
where \(\bar G\equiv\partial g(\bar\theta)/\partial\theta'\), it follows that \(\sqrt{T}\,\bar g_{T}(\hat\theta_{gel})=\sqrt{T}\,g(\hat\theta^{e})+o_{p}(1)\), as \(\bar G_{T}\) and \(\bar G\) converge to \(G\) and \(\sqrt{T}(\hat\theta_{gel}-\hat\theta^{e})=o_{p}(1)\). Consequently \(\sqrt{T}(\tilde\nu-\hat\nu)=o_{p}(1)\), as \(T^{1/2}g(\hat\theta^{e})=O_{p}(1)\).
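The GEL first-order condition in \(\lambda\) used above is what makes the implied probabilities recentre the moments. A one-moment empirical-likelihood sketch of this mechanism, with \(\pi_{t}=1/\{T(1+\lambda g_{t})\}\) and \(\lambda\) found by Newton iteration (the data and iteration count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 500
gt = rng.standard_normal(T) + 0.1          # moment indicator values g_t at a fixed theta

# Empirical-likelihood implied probabilities pi_t = 1 / (T (1 + lam g_t)), with lam
# solving sum_t g_t / (1 + lam g_t) = 0, so that sum_t pi_t g_t = 0.
lam = 0.0
for _ in range(50):                        # Newton iterations on f(lam) = sum g/(1+lam g)
    d = 1.0 + lam * gt
    f = np.sum(gt / d)
    fp = -np.sum(gt**2 / d**2)
    lam -= f / fp

pi = 1.0 / (T * (1.0 + lam * gt))
print(np.isclose(pi.sum(), 1.0))           # probabilities sum to one at the solution
print(abs(np.sum(pi * gt)) < 1e-8)         # implied probabilities recentre the moment
```

At the solution the identity \(\sum_{t}1/(1+\lambda g_{t})=T\) holds automatically, which is why the probabilities sum to one without a separate normalisation.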
Hence
\[
\sqrt{T/k_{2}}\begin{pmatrix}\hat\theta^{e\star}-\hat\theta^{e}\\\hat\nu^{\star}-\hat\nu\end{pmatrix}=-[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big[h^{a\star}_{T}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]+o_{B}(1).
\]
Thus by a Taylor expansion we have
\[
\sqrt{T/k_{2}}\begin{pmatrix}a(\hat\theta^{e\star})-a(\hat\theta^{e})\\\hat\nu^{\star}-\hat\nu\end{pmatrix}=-R(\bar\theta^{\star})[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big[h^{a\star}_{T}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]+o_{B}(1),
\]
where \(\bar\theta^{\star}\) lies on the line joining \(\hat\theta^{e\star}\) and \(\hat\theta^{e}\). Thus
\[
\begin{aligned}
\mathcal{W}^{\star}&=(T/k_{2})\,[\hat r^{\star}-\tilde r]'\big[R^{\star}(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R^{\star\prime}\big]^{-1}[\hat r^{\star}-\tilde r]\\
&=\sqrt{T/k_{2}}\big[h^{a\star}_{T}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]'\Sigma^{\star-1}D^{\star}[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}R(\bar\theta^{\star})'\big[R^{\star}(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R^{\star\prime}\big]^{-1}\\
&\qquad\times R(\bar\theta^{\star})[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big[h^{a\star}_{T}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big].
\end{aligned}
\]
Thus, as in Theorem 4.4 above, \(\mathcal{W}^{\star}\) converges to a chi-squared distribution with \(s+r\) degrees of freedom, as \(D^{\star}=D+o_{B}(1)\) by the bootstrap UWL Lemma A.7, \(\Sigma^{\star}=\Sigma+o_{B}(1)\), and the fact that by the bootstrap CLT \(\sqrt{T/k_{2}}[h^{a\star}_{T}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)]\) converges to \(N(0,\Sigma)\).
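In practice the first-order validity of \(\mathcal{W}^{\star}\) established here is exploited by comparing the sample statistic with the quantiles of its bootstrap replicates. A generic sketch of that step (the \(\chi^{2}(3)\) draws merely stand in for \(B\) computed bootstrap statistics):

```python
import numpy as np

# Bootstrap testing workflow justified by the theorems: reject when the sample
# statistic exceeds the (1 - alpha) quantile of its bootstrap replicates, or
# equivalently when the bootstrap p-value is below alpha.
def bootstrap_pvalue(stat, boot_stats):
    boot_stats = np.asarray(boot_stats)
    return (1.0 + np.sum(boot_stats >= stat)) / (1.0 + boot_stats.size)

rng = np.random.default_rng(4)
boot = rng.chisquare(df=3, size=999)       # stand-in for B bootstrap W* draws
print(bootstrap_pvalue(100.0, boot) < 0.05)   # huge statistic: small p-value
print(bootstrap_pvalue(0.0, boot) > 0.9)      # tiny statistic: large p-value
```

The \(+1\) corrections in numerator and denominator keep the p-value strictly positive, a standard finite-\(B\) adjustment.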
We consider now the \(\mathcal{S}^{\star}\) statistic. First we derive the distribution of the bootstrapped restricted estimator. We first note that this estimator is consistent by Theorem 4.7, adapted to the moment restrictions \(h(z_{t},\theta)\) and considering the compact parameter space \(\{\theta\in B:a(\theta)=0\}\). The Lagrangian of the restricted problem is
\[
\mathcal{L}^{\star}=\tilde Q^{\star}(\theta,\nu)-a(\theta)'\alpha-\nu'\beta.
\]
Denote the value of the Lagrange multiplier at the saddle point as \(\hat\varphi^{\star}=(\hat\alpha^{\star\prime},\hat\beta^{\star\prime})'\); thus the first-order conditions are
\[
D^{\star\prime}\Sigma^{\star-1}h^{a\star}(\hat\theta^{e\star}_{r},0)-R(\hat\theta^{e\star}_{r})\hat\varphi^{\star}=0.
\]
Multiplying both sides by \(R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}\) we have
\[
R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}D^{\star\prime}\Sigma^{\star-1}h^{a\star}(\hat\theta^{e\star}_{r},0)-R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})\hat\varphi^{\star}=0.
\]
Thus
\[
\hat\varphi^{\star}=[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}D^{\star\prime}\Sigma^{\star-1}h^{a\star}(\hat\theta^{e\star}_{r},0).
\]
Hence
\[
D^{\star\prime}\Sigma^{\star-1}h^{a\star}(\hat\theta^{e\star}_{r},0)-R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}D^{\star\prime}\Sigma^{\star-1}h^{a\star}(\hat\theta^{e\star}_{r},0)=0.
\]
But by a Taylor expansion around \(\hat\theta^{e}_{r}\) we have \(h^{a\star}(\hat\theta^{e\star}_{r},0)=h^{a\star}(\hat\theta^{e}_{r},0)+\bar D^{\star}S_{1}(\hat\theta^{e\star}_{r}-\hat\theta^{e}_{r})\), where \(\bar D^{\star}=D^{\star}(\bar\theta^{\star})\), \(\bar\theta^{\star}\) lies on the line joining \(\hat\theta^{e\star}_{r}\) and \(\hat\theta^{e}_{r}\), and \(S_{1}\) is a selection matrix such that
\[
\bar D^{\star}S_{1}=\begin{pmatrix}\sum_{t=1}^{T}G^{\star}_{t}(\bar\theta^{\star})/T\\\sum_{t=1}^{T}Q^{\star}_{t}(\bar\theta^{\star})/T\end{pmatrix}.
\]
Thus we have
\[
\big[I-R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}\big]\big[D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\,h^{a\star}(\hat\theta^{e}_{r},0)+D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}S_{1}\sqrt{T}(\hat\theta^{e\star}_{r}-\hat\theta^{e}_{r})\big]=0.
\]
Hence
\[
S_{1}\sqrt{T}(\hat\theta^{e\star}_{r}-\hat\theta^{e}_{r})=-[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}\big[I-R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}\big]D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\,h^{a\star}(\hat\theta^{e}_{r},0).
\]
Now note that by a Taylor expansion around \(\theta_{0}\) we have
\[
\sqrt{T/k_{2}}\big(h^{a\star}(\hat\theta^{e}_{r},0)-h^{a\star}(\theta_{0},0)-h^{a}(\hat\theta^{e}_{r},0)+h^{a}(\theta_{0},0)\big)=(\bar D^{\star}-\bar D)S_{1}\sqrt{T/k_{2}}(\hat\theta^{e}_{r}-\theta_{0})=o_{B}(1),\tag{A.22}
\]
where \(\bar D^{\star}=D^{\star}(\bar\theta)\) with \(\bar\theta\) on the line joining \(\hat\theta^{e}_{r}\) and \(\theta_{0}\), and \(\bar D=D(\bar{\bar\theta})\) with \(\bar{\bar\theta}\) on the line joining \(\hat\theta^{e}_{r}\) and \(\theta_{0}\). The second equality follows from a UWL, a bootstrap UWL and the fact that \(\sqrt{T/k_{2}}(\hat\theta^{e}_{r}-\theta_{0})=O_{p}(1)\).

We show now that
\[
D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\,\tilde h^{a}_{T}(\theta_{0},0)-D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\,h^{a}(\theta_{0},0)=o_{B}(1).\tag{A.23}
\]
Note that by Lemma A.2 we have
\[
D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\,\tilde h^{a}_{T}(\theta_{0},0)=D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\,h^{a}(\theta_{0},0)+D^{\star\prime}\Sigma^{\star-1}\Sigma S_{1}P\sqrt{T/k_{2}}\,g(\theta_{0})+o_{p}(1),
\]
and \(D^{\star\prime}\Sigma^{\star-1}\Sigma S_{1}P=D'S_{1}P+o_{B}(1)=o_{B}(1)\), as \(D^{\star}=D+o_{B}(1)\) by Lemma A.7, \(\Sigma^{\star-1}=\Sigma^{-1}+o_{B}(1)\), and \(G'P=0\); the result follows. Thus we have
\[
\begin{aligned}
S_{1}\sqrt{T/k_{2}}(\hat\theta^{e\star}_{r}-\hat\theta^{e}_{r})&=-[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}\big[I-R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}\big]\\
&\qquad\times D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big(h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)+A^{\star}_{T}+o_{B}(1),
\end{aligned}\tag{A.24}
\]
where
\[
A^{\star}_{T}=-[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}\big[I-R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}\big]D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\,h^{a}(\hat\theta^{e}_{r},0).
\]
But by the FOC of the original restricted problem we have
\[
D'\Sigma^{-1}h^{a}(\hat\theta^{e}_{r},0)-R(\hat\theta^{e}_{r})[R(\hat\theta^{e}_{r})'(D'\Sigma^{-1}D)^{-1}R(\hat\theta^{e}_{r})]^{-1}R(\hat\theta^{e}_{r})'(D'\Sigma^{-1}D)^{-1}D'\Sigma^{-1}h^{a}(\hat\theta^{e}_{r},0)=0.
\]
Thus
\[
\begin{aligned}
A^{\star}_{T}&=-[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}\Big(\big[I-R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}\big]D^{\star\prime}\Sigma^{\star-1}\\
&\qquad-\big[I-R(\hat\theta^{e}_{r})[R(\hat\theta^{e}_{r})'(D'\Sigma^{-1}D)^{-1}R(\hat\theta^{e}_{r})]^{-1}R(\hat\theta^{e}_{r})'(D'\Sigma^{-1}D)^{-1}\big]D'\Sigma^{-1}\Big)\sqrt{T/k_{2}}\,h^{a}(\hat\theta^{e}_{r},0)\\
&=o_{B}(1),
\end{aligned}
\]
by the bootstrap UWL and \(\sqrt{T/k_{2}}\,h^{a}(\hat\theta^{e}_{r},0)=O_{p}(1)\).

Now
\[
\mathcal{S}^{\star}=\frac{T}{k_{2}}\big[h^{\star}(\hat\theta^{e\star}_{r})-\tilde h_{T}(\hat\theta^{e}_{r})\big]'\Sigma^{\star-1}D^{\star}(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}D^{\star\prime}\Sigma^{\star-1}\big[h^{\star}(\hat\theta^{e\star}_{r})-\tilde h_{T}(\hat\theta^{e}_{r})\big].
\]
Notice that by two Taylor expansions
\[
\sqrt{T/k_{2}}\big[h^{\star}(\hat\theta^{e}_{r})-\tilde h_{T}(\hat\theta^{e}_{r})\big]=\sqrt{T/k_{2}}\big(h^{\star}(\theta_{0})-\tilde h_{T}(\theta_{0})\big)+\sqrt{T/k_{2}}\,(\bar D^{\star}-\bar D)(\hat\theta^{e}_{r}-\theta_{0})=\sqrt{T/k_{2}}\big(h^{\star}(\theta_{0})-\tilde h_{T}(\theta_{0})\big)+o_{B}(1),
\]
where \(\bar D^{\star}\equiv\partial h^{\star}(\bar\theta_{r})/\partial\theta'\) with \(\bar\theta_{r}\) on the line between \(\hat\theta^{e}_{r}\) and \(\theta_{0}\), and \(\bar D\equiv\partial\tilde h_{T}(\bar{\bar\theta}_{r})/\partial\theta'\) with \(\bar{\bar\theta}_{r}\) on the line between \(\hat\theta^{e}_{r}\) and \(\theta_{0}\). The second equality is due to a UWL, a bootstrap UWL and the fact that \(\sqrt{T/k_{2}}(\hat\theta^{e}_{r}-\theta_{0})=O_{p}(1)\).

Thus by a Taylor expansion
\[
\begin{aligned}
D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big(h^{\star}(\hat\theta^{e\star}_{r})-\tilde h_{T}(\hat\theta^{e}_{r})\big)
&=D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}S_{1}\sqrt{T/k_{2}}(\hat\theta^{e\star}_{r}-\hat\theta^{e}_{r})+D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big(h^{\star}(\hat\theta^{e}_{r})-\tilde h_{T}(\hat\theta^{e}_{r})\big)\\
&=-\big[I-R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}\big]\\
&\qquad\times D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big(h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)+D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big(h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)+o_{B}(1)\\
&=R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}\\
&\qquad\times D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big(h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)+o_{B}(1),
\end{aligned}
\]
using (A.23) and (A.24). Thus
\[
\begin{aligned}
\mathcal{S}^{\star}&=\frac{T}{k_{2}}\big[h^{\star}(\hat\theta^{e\star}_{r})-\tilde h_{T}(\hat\theta^{e}_{r})\big]'\Sigma^{\star-1}D^{\star}(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}D^{\star\prime}\Sigma^{\star-1}\big[h^{\star}(\hat\theta^{e\star}_{r})-\tilde h_{T}(\hat\theta^{e}_{r})\big]\\
&=\sqrt{T/k_{2}}\big(h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)'\Sigma^{\star-1}D^{\star}(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}\\
&\qquad\times R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big(h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)+o_{B}(1),
\end{aligned}
\]
and the result follows. Now we consider the distance statistic
\[
\begin{aligned}
\mathcal{D}^{\star}&=\frac{T}{k_{2}}\Big\{\big[h^{\star}(\hat\theta^{e\star}_{r})-\tilde h(\hat\theta^{e}_{r})\big]'\Sigma^{\star-1}\big[h^{\star}(\hat\theta^{e\star}_{r})-\tilde h(\hat\theta^{e}_{r})\big]-g^{\star}(\hat\theta^{e\star})'\tilde\Omega^{\star-1}g^{\star}(\hat\theta^{e\star})\Big\}\\
&=\frac{T}{k_{2}}\Big\{\big[h^{a\star}(\hat\theta^{e\star}_{r},0)-\tilde h^{a}_{T}(\hat\theta^{e}_{r},0)\big]'\Sigma^{\star-1}\big[h^{a\star}(\hat\theta^{e\star}_{r},0)-\tilde h^{a}_{T}(\hat\theta^{e}_{r},0)\big]-h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})'\tilde\Sigma^{\star-1}h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})\Big\},
\end{aligned}
\]
as
\[
g^{\star}(\hat\theta^{e\star})'\tilde\Omega^{\star-1}g^{\star}(\hat\theta^{e\star})=h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})'\tilde\Sigma^{\star-1}h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star}).
\]
Note now that by two Taylor expansions
\[
h^{a\star}(\hat\theta^{e\star}_{r},0)-\tilde h^{a}_{T}(\hat\theta^{e}_{r},0)=h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-\tilde h^{a}_{T}(\hat\theta^{e},\hat\nu)+\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e\star}\\-\hat\nu^{\star}\end{pmatrix}-\bar D\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix},
\]
where \(\bar D^{\star}\equiv\partial h^{a\star}(\bar\theta^{\star},\bar\nu^{\star})/\partial(\theta',\nu')\) with \((\bar\theta^{\star\prime},\bar\nu^{\star\prime})'\) on the line joining \((\hat\theta^{e\star\prime}_{r},0')'\) and \((\hat\theta^{e\star\prime},\hat\nu^{\star\prime})'\), and \(\bar D\equiv\partial\tilde h^{a}_{T}(\bar\theta,\bar\nu)/\partial(\theta',\nu')\) with \((\bar\theta',\bar\nu')'\) on the line joining \((\hat\theta^{e\prime}_{r},0')'\) and \((\hat\theta^{e\prime},\hat\nu')'\). Thus
\[
\begin{aligned}
&\frac{T}{k_{2}}\big[h^{a\star}(\hat\theta^{e\star}_{r},0)-\tilde h^{a}(\hat\theta^{e}_{r},0)\big]'\Sigma^{\star-1}\big[h^{a\star}(\hat\theta^{e\star}_{r},0)-\tilde h^{a}(\hat\theta^{e}_{r},0)\big]\\
&\quad=\frac{T}{k_{2}}\big[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-\tilde h^{a}(\hat\theta^{e},\hat\nu)\big]'\Sigma^{\star-1}\big[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-\tilde h^{a}(\hat\theta^{e},\hat\nu)\big]\\
&\qquad+\frac{2T}{k_{2}}\Big[\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e\star}\\-\hat\nu^{\star}\end{pmatrix}-\bar D\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix}\Big]'\Sigma^{\star-1}\big[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-\tilde h^{a}(\hat\theta^{e},\hat\nu)\big]\\
&\qquad+\frac{T}{k_{2}}\Big[\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e\star}\\-\hat\nu^{\star}\end{pmatrix}-\bar D\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix}\Big]'\Sigma^{\star-1}\Big[\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e\star}\\-\hat\nu^{\star}\end{pmatrix}-\bar D\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix}\Big].
\end{aligned}
\]
Note that
\[
\begin{aligned}
\sqrt{T}\,\bar D^{\star\prime}\Sigma^{\star-1}\tilde h^{a}_{T}(\hat\theta^{e},\hat\nu)&=\bar D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\,h^{a}_{T}(\hat\theta^{e},\hat\nu)+\bar D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\sum_{t=1}^{T}(T\pi_{t}-1)h_{tT}(\hat\theta^{e},\hat\nu)/T\\
&=\bar D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\,h^{a}_{T}(\hat\theta^{e},\hat\nu)+\bar D^{\star\prime}\Sigma^{\star-1}\frac{S_{T}}{T}\sum_{t=1}^{T}h_{tT}(\hat\theta^{e},\hat\nu)g'_{tT}\,\frac{T^{1/2}}{S_{T}}\hat\lambda+o_{p}(1)\\
&=\bar D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\,h^{a}_{T}(\hat\theta^{e},\hat\nu)-\bar D^{\star\prime}\Sigma^{\star-1}\big(\Sigma S_{1}+o_{p}(1)\big)\big(T^{1/2}Pg_{T}(\theta_{0})+o_{p}(1)\big),
\end{aligned}
\]
using Lemma A.1 and the fact that \((T^{1/2}/S_{T})\hat\lambda=-T^{1/2}Pg_{T}(\theta_{0})+o_{p}(1)\) by the proof of Theorem 2.3 of Smith (2011) (see expression B.2, p. A.11). Now, as \(G'P=0\), \(\bar D^{\star}=D+o_{B}(1)\) by Lemma A.7, \(\Sigma^{\star-1}=\Sigma^{-1}+o_{p}(1)\) and \(T^{1/2}g_{T}(\theta_{0})=O_{p}(1)\), we have
\[
\sqrt{T}\,\bar D^{\star\prime}\Sigma^{\star-1}\tilde h^{a}(\hat\theta^{e},\hat\nu)=\sqrt{T}\,\bar D^{\star\prime}\Sigma^{\star-1}h^{a}(\hat\theta^{e},\hat\nu)+o_{p}(1).\tag{A.25}
\]
Now by three Taylor expansions we have
\[
\begin{aligned}
h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-\tilde h^{a}_{T}(\hat\theta^{e},\hat\nu)&=h^{a\star}(\hat\theta^{e},\hat\nu)-\tilde h^{a}_{T}(\hat\theta^{e},\hat\nu)+\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}-\hat\theta^{e}\\\hat\nu^{\star}-\hat\nu\end{pmatrix}\\
&=h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)+\tilde h^{a}(\theta_{0},0)-h^{a}(\theta_{0},0)+(\bar D^{\star}-\dddot D)\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\hat\nu\end{pmatrix}+\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}-\hat\theta^{e}\\\hat\nu^{\star}-\hat\nu\end{pmatrix}\\
&=O_{B}(1/\sqrt{T}),
\end{aligned}
\]
where \(\bar D^{\star}\equiv\partial h^{a\star}(\bar\theta^{\star},\bar\nu^{\star})/\partial(\theta',\nu')\) with \((\bar\theta^{\star\prime},\bar\nu^{\star\prime})'\) on the line joining \((\hat\theta^{e\star\prime},\hat\nu^{\star\prime})'\) and \((\hat\theta^{e\prime},\hat\nu')'\), and \(\dddot D\equiv\partial\tilde h^{a}_{T}(\dddot\theta,\dddot\nu)/\partial(\theta',\nu')\) with \((\dddot\theta{}',\dddot\nu{}')'\) on the line joining \((\hat\theta^{e\prime},\hat\nu')'\) and \((\theta'_{0},0')'\). The result follows from the bootstrap CLT, the standard CLT, Lemma A.2 and asymptotic normality of \(((\hat\theta^{e\star}-\hat\theta^{e})',(\hat\nu^{\star}-\hat\nu)')'\) and \(((\hat\theta^{e}-\theta_{0})',\hat\nu')'\). Thus by (A.25)
\[
\begin{aligned}
\bar D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\big[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-\tilde h^{a}_{T}(\hat\theta^{e},\hat\nu)\big]&=\bar D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\big[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-h^{a}_{T}(\hat\theta^{e},\hat\nu)\big]+o_{p}(1)\\
&=\big[\bar D^{\star\prime}\Sigma^{\star-1}-D^{\star\prime}\Sigma^{\star-1}\big]\sqrt{T}\big[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-h^{a}_{T}(\hat\theta^{e},\hat\nu)\big]+o_{p}(1),
\end{aligned}
\]
as \(D^{\star\prime}\Sigma^{\star-1}h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})=0\) and \(D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\,h^{a}(\hat\theta^{e},\hat\nu)=[D^{\star\prime}\Sigma^{\star-1}-D'\Sigma^{-1}]\sqrt{T}\,h^{a}(\hat\theta^{e},\hat\nu)=o_{p}(1)\), since \(D'\Sigma^{-1}h^{a}(\hat\theta^{e},\hat\nu)=0\) and consequently
\[
D'\Sigma^{-1}\sqrt{T}\,h^{a}_{T}(\hat\theta^{e},\hat\nu)=D'\Sigma^{-1}\sqrt{T}\big(h^{a}_{T}(\hat\theta^{e},\hat\nu)-h^{a}(\hat\theta^{e},\hat\nu)\big)=D'\Sigma^{-1}\sqrt{T}\big(h^{a}_{T}(\theta_{0},0)-h^{a}(\theta_{0},0)\big)+D'\Sigma^{-1}(\bar D_{T}-\bar D)\sqrt{T}\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\hat\nu\end{pmatrix}=o_{B}(1),
\]
with \(\bar D_{T}=\partial h^{a}_{T}(\bar\theta,\bar\nu)/\partial(\theta',\nu')\) and \(\bar D=\partial h^{a}(\bar\theta,\bar\nu)/\partial(\theta',\nu')\), where \((\bar\theta',\bar\nu')'\) lies on the line joining \((\hat\theta^{e\prime},\hat\nu')'\) and \((\theta'_{0},0')'\). The result follows from equation (A11), page A.3, and Lemma A.1 of Smith (2011).

Since \(\sqrt{T}\,[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-h^{a}(\hat\theta^{e},\hat\nu)]=O_{B}(1)\), \(\bar D^{\star}=D+o_{B}(1)\) and \(D^{\star}=D+o_{B}(1)\) by Lemma A.7, and \(\Sigma^{\star-1}=\Sigma^{-1}+o_{p}(1)\), it follows that \(\bar D^{\star\prime}\Sigma^{\star-1}\sqrt{T}\,[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-\tilde h^{a}(\hat\theta^{e},\hat\nu)]=o_{B}(1)\). Similarly \(\bar D'\Sigma^{\star-1}\sqrt{T}\,[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-\tilde h^{a}(\hat\theta^{e},\hat\nu)]=o_{B}(1)\).

Note also that \(\hat\theta^{e\star}_{r}-\hat\theta^{e\star}=(\hat\theta^{e\star}_{r}-\hat\theta^{e}_{r})-(\hat\theta^{e\star}-\hat\theta^{e})+(\hat\theta^{e}_{r}-\theta_{0})-(\hat\theta^{e}-\theta_{0})=O_{B}(1/\sqrt{T})\), \(\hat\theta^{e}_{r}-\hat\theta^{e}=(\hat\theta^{e}_{r}-\theta_{0})-(\hat\theta^{e}-\theta_{0})=O_{p}(1/\sqrt{T})\), \(\hat\nu^{\star}-\hat\nu=O_{B}(1/\sqrt{T})\) and \(\hat\nu=O_{p}(1/\sqrt{T})\).
Hence
\[
\frac{2T}{k_{2}}\Big[\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e\star}\\-\hat\nu^{\star}\end{pmatrix}-\bar D\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix}\Big]'\Sigma^{\star-1}\big[h^{a\star}(\hat\theta^{e\star},\hat\nu^{\star})-\tilde h^{a}_{T}(\hat\theta^{e},\hat\nu)\big]=o_{B}(1).
\]
Now note that
\[
\sqrt{\tfrac{T}{k_{2}}}\Big[\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e\star}\\-\hat\nu^{\star}\end{pmatrix}-\bar D\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix}\Big]=-\sqrt{\tfrac{T}{k_{2}}}\,D\Big[\begin{pmatrix}\hat\theta^{e\star}-\hat\theta^{e}\\\hat\nu^{\star}-\hat\nu\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}\Big]+o_{B}(1),
\]
as \(\bar D^{\star}=D+o_{B}(1)\), \(\bar D=D+o_{p}(1)\), \(\hat\theta^{e}_{r}-\hat\theta^{e}=O_{p}(1/\sqrt{T})\), \(\hat\nu^{\star}=O_{B}(1/\sqrt{T})\) and \(\hat\nu=O_{p}(1/\sqrt{T})\). Thus
\[
\begin{aligned}
-\sqrt{\tfrac{T}{k_{2}}}\,D\Big[\begin{pmatrix}\hat\theta^{e\star}-\hat\theta^{e}\\\hat\nu^{\star}-\hat\nu\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}\Big]&=\sqrt{\tfrac{T}{k_{2}}}\,D[D^{\star\prime}\Sigma^{\star-1}\bar D^{\star}]^{-1}R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}\\
&\qquad\times R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big(h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big),
\end{aligned}
\]
using the asymptotic representations of \(\sqrt{T/k_{2}}\,((\hat\theta^{e\star}-\hat\theta^{e})',(\hat\nu^{\star}-\hat\nu)')'\) given in Theorem 4.8 and of \(S_{1}\sqrt{T/k_{2}}(\hat\theta^{e\star}_{r}-\hat\theta^{e}_{r})\) given in (A.24). Thus
\[
\begin{aligned}
&\frac{T}{k_{2}}\Big[\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e\star}\\-\hat\nu^{\star}\end{pmatrix}-\bar D\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix}\Big]'\Sigma^{\star-1}\Big[\bar D^{\star}\begin{pmatrix}\hat\theta^{e\star}_{r}-\hat\theta^{e\star}\\-\hat\nu^{\star}\end{pmatrix}-\bar D\begin{pmatrix}\hat\theta^{e}_{r}-\hat\theta^{e}\\-\hat\nu\end{pmatrix}\Big]\\
&\quad=\sqrt{T/k_{2}}\big(h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)'\Sigma^{\star-1}D^{\star}(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})[R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}R(\hat\theta^{e\star}_{r})]^{-1}\\
&\qquad\times R(\hat\theta^{e\star}_{r})'(D^{\star\prime}\Sigma^{\star-1}D^{\star})^{-1}D^{\star\prime}\Sigma^{\star-1}\sqrt{T/k_{2}}\big(h^{a\star}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)+o_{B}(1),
\end{aligned}
\]
and the result follows as in the proof of Theorem 4.4.
Proof of Theorem 4.11: We start by deriving the asymptotic distribution of \(\mathcal{W}^{\dagger}\). Define \(h^{a\dagger}_{t}(\theta,\nu)\equiv(g^{\dagger}(z_{t},\theta)',[q^{\dagger}(z_{t},\theta)-\nu]')'\), \(h^{a\dagger}(\theta,\nu)=\sum_{t=1}^{m_{T}}h^{a\dagger}_{t}(\theta,\nu)/m_{T}\) and \(\tilde Q^{\dagger}(\theta,\nu)=h^{a\dagger}(\theta,\nu)'\Sigma^{\dagger-1}h^{a\dagger}(\theta,\nu)\). Note that the unrestricted GMM estimator solves
\[
(\hat\theta^{e\dagger\prime},\hat\nu^{\dagger\prime})'=\arg\min_{\theta\in B,\;\nu\in\mathbb{R}^{s}}\tilde Q^{\dagger}(\theta,\nu).
\]
The solution is given by
\[
\hat\theta^{e\dagger}=\arg\min_{\theta\in B}g^{\dagger}(\theta)'\Omega^{\dagger-1}g^{\dagger}(\theta),\qquad
\hat\nu^{\dagger}=q^{\dagger}(\hat\theta^{e\dagger})-\Omega^{\dagger}_{21}\Omega^{\dagger-1}_{11}g^{\dagger}(\hat\theta^{e\dagger}).
\]
Consistency of \(\hat\theta^{e\dagger}\) follows from Theorem 4.7; hence \(\hat\theta^{e\dagger}=\hat\theta^{e}+o_{B}(1)\), and, since \(\hat\theta^{e}=\hat\theta^{e}_{r}+o_{p}(1)\) as \(\hat\theta^{e}\) and \(\hat\theta^{e}_{r}\) are both consistent, we have \(\hat\theta^{e\dagger}=\hat\theta^{e}_{r}+o_{B}(1)\). We note that by Lemma A.6 \(\Sigma^{\dagger}=\Sigma+o_{B}(1)\), and hence \(\hat\nu^{\dagger}=\hat\nu+o_{B}(1)=o_{B}(1)\).

Since these estimators satisfy the first-order conditions, we have \(D^{\dagger\prime}\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})=0\). Thus by a Taylor expansion around \((\hat\theta^{e\prime}_{r},0')'\) we have
\[
D^{\dagger\prime}\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e}_{r},0)+D^{\dagger\prime}\Sigma^{\dagger-1}\bar D^{\dagger}\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}=0,
\]
where \(\bar D^{\dagger}=D^{\dagger}(\bar\theta^{\dagger})\),
\[
D^{\dagger}(\theta)=\begin{pmatrix}\sum_{t=1}^{m_{T}}G^{\dagger}_{t}(\theta)/m_{T}&0\\\sum_{t=1}^{m_{T}}Q^{\dagger}_{t}(\theta)/m_{T}&-I_{s}\end{pmatrix},
\]
and \(\bar\theta^{\dagger}\) lies on the line joining \(\hat\theta^{e\dagger}\) and \(\hat\theta^{e}_{r}\). Thus
\[
\sqrt{T}\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}=-[D^{\dagger\prime}\Sigma^{\dagger-1}\bar D^{\dagger}]^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T}\,h^{a\dagger}(\hat\theta^{e}_{r},0).
\]
Now notice that expanding \(\sqrt{T}\,h^{a\dagger}(\hat\theta^{e}_{r},0)\) around \(\theta_{0}\) yields
\[
\begin{aligned}
\sqrt{T}\,h^{a\dagger}(\hat\theta^{e}_{r},0)&=\sqrt{T}\,h^{a\dagger}(\theta_{0},0)+\bar D^{\dagger}S_{1}\sqrt{T}(\hat\theta^{e}_{r}-\theta_{0})\\
&=\sqrt{T}\,h^{a\dagger}(\theta_{0},0)-\sqrt{T}\,\tilde h^{a}_{T}(\theta_{0},0)+\sqrt{T}\,\tilde h^{a}_{T}(\theta_{0},0)+\tilde D^{\dagger}S_{1}\sqrt{T}(\hat\theta^{e}_{r}-\theta_{0}),
\end{aligned}
\]
where \(\tilde D^{\dagger}=D^{\dagger}(\tilde\theta^{\dagger})\) and \(\tilde\theta^{\dagger}\) lies on the line joining \(\hat\theta^{e}_{r}\) and \(\theta_{0}\).

By the asymptotic representation of \(\hat\theta^{e}_{r}\) we have, with \(\Delta\equiv(D'\Sigma^{-1}D)^{-1}\),
\[
\bar D^{\dagger}S_{1}\sqrt{T}(\hat\theta^{e}_{r}-\theta_{0})=-DS_{1}\big[\Delta-\Delta R'\big(R\Delta R'\big)^{-1}R\Delta\big]D'\Sigma^{-1}\sqrt{T}\,h^{a}_{T}(\theta_{0},0)+o_{p}(1),
\]
as \(\bar D^{\dagger}=D+o_{B}(1)\) by Lemma A.7. Also by Lemma A.4 we have
\[
\sqrt{T}\,\tilde h^{a}_{T}(\theta_{0},0)=DS_{1}\big[\Delta-\Delta R'\big(R\Delta R'\big)^{-1}R\Delta\big]D'\Sigma^{-1}\sqrt{T}\,h^{a}_{T}(\theta_{0},0)+o_{p}(1).\tag{A.26}
\]
Consequently
\[
\sqrt{T}\,\tilde h^{a}(\theta_{0},0)+\tilde D^{\dagger}S_{1}\sqrt{T}(\hat\theta^{e}_{r}-\theta_{0})=o_{p}(1).\tag{A.27}
\]
It follows that
\[
\sqrt{T/k_{2}}\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}=-[D^{\dagger\prime}\Sigma^{\dagger-1}\bar D^{\dagger}]^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\Big[\sqrt{\tfrac{T}{k_{2}}}\,h^{a\dagger}(\theta_{0},0)-\sqrt{\tfrac{T}{k_{2}}}\,\tilde h^{a}_{T}(\theta_{0},0)\Big]+o_{B}(1).
\]
Thus by a Taylor expansion we have
\[
\sqrt{T/k_{2}}\begin{pmatrix}a(\hat\theta^{e\dagger})\\\hat\nu^{\dagger}\end{pmatrix}=\begin{pmatrix}A(\bar\theta^{\dagger})&0\\0&I\end{pmatrix}\sqrt{T/k_{2}}\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}=-R(\bar\theta^{\dagger})[D^{\dagger\prime}\Sigma^{\dagger-1}\bar D^{\dagger}]^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\big[h^{a\dagger}_{T}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]+o_{B}(1),
\]
where \(\bar\theta^{\dagger}\) lies on the line joining \(\hat\theta^{e\dagger}\) and \(\hat\theta^{e}_{r}\). Thus
\[
\begin{aligned}
\mathcal{W}^{\dagger}&=(T/k_{2})\,\hat r^{\dagger\prime}\big[R^{\dagger}(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R^{\dagger\prime}\big]^{-1}\hat r^{\dagger}\\
&=\sqrt{T/k_{2}}\big[h^{a\dagger}_{T}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]'\Sigma^{\dagger-1}D^{\dagger}[D^{\dagger\prime}\Sigma^{\dagger-1}\bar D^{\dagger}]^{-1}R(\bar\theta^{\dagger})'\big[R^{\dagger}(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R^{\dagger\prime}\big]^{-1}\\
&\qquad\times R(\bar\theta^{\dagger})[D^{\dagger\prime}\Sigma^{\dagger-1}\bar D^{\dagger}]^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\big[h^{a\dagger}_{T}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big].
\end{aligned}
\]
Thus, as in the proof of Theorem 4.4 above, \(\mathcal{W}^{\dagger}\) converges to a chi-squared distribution with \(s+r\) degrees of freedom, as \(D^{\dagger}=D+o_{B}(1)\) by Lemma A.7, \(\Sigma^{\dagger}=\Sigma+o_{B}(1)\), and the fact that by the bootstrap CLT \(\sqrt{T/k_{2}}[h^{a\dagger}_{T}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)]\) converges to \(N(0,\Sigma)\).
We consider now the \(\mathcal{S}^{\dagger}\) statistic. First we derive the distribution of the bootstrap restricted estimator. We note that \(\hat\theta^{e\dagger}_{r}\) is consistent by Theorem 4.7 applied to the moment indicators \(h(z_{t},\theta)\) and the restricted parameter space \(B_{r}\). Note that the Lagrangian of the restricted problem is
\[
\mathcal{L}^{\dagger}=\tilde Q^{\dagger}(\theta,\nu)-a(\theta)'\alpha^{\dagger}-\nu'\beta^{\dagger}.
\]
Denote by \(\hat\varphi^{\dagger}=(\hat\alpha^{\dagger\prime},\hat\beta^{\dagger\prime})'\) the Lagrange multipliers evaluated at the optimum. Thus the first-order conditions yield
\[
D^{\dagger\prime}\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)-R(\hat\theta^{e\dagger}_{r})\hat\varphi^{\dagger}=0.
\]
Multiplying both sides by \(R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}\) we have
\[
R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)-R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})\hat\varphi^{\dagger}=0.
\]
Thus
\[
\hat\varphi^{\dagger}=[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger}_{r},0).
\]
Hence
\[
D^{\dagger\prime}\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)-R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)=0.
\]
But by a Taylor expansion \(h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)=h^{a\dagger}(\hat\theta^{e}_{r},0)+\tilde D^{\dagger}S_{1}(\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r})\), where \(S_{1}\) is a selection matrix such that
\[
\tilde D^{\dagger}S_{1}=\begin{pmatrix}\sum_{t=1}^{T}G^{\dagger}_{t}(\tilde\theta^{\dagger})/T\\\sum_{t=1}^{T}Q^{\dagger}_{t}(\tilde\theta^{\dagger})/T\end{pmatrix},
\]
\(\tilde D^{\dagger}=D^{\dagger}(\tilde\theta^{\dagger})\), and \(\tilde\theta^{\dagger}\) lies on the line joining \(\hat\theta^{e\dagger}_{r}\) and \(\hat\theta^{e}_{r}\); thus we have
\[
\big[I-R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}\big]\big[D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T}\,h^{a\dagger}(\hat\theta^{e}_{r},0)+D^{\dagger\prime}\Sigma^{\dagger-1}\tilde D^{\dagger}S_{1}\sqrt{T}(\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r})\big]=0.
\]
Hence
\[
S_{1}\sqrt{T}(\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r})=-[D^{\dagger\prime}\Sigma^{\dagger-1}\tilde D^{\dagger}]^{-1}\big[I-R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}\big]D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T}\,h^{a\dagger}(\hat\theta^{e}_{r},0).
\]
Now note that by a Taylor expansion around \(\theta_{0}\) we have
\[
\sqrt{T/k_{2}}\big(h^{a\dagger}(\hat\theta^{e}_{r},0)-h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\hat\theta^{e}_{r},0)+\tilde h^{a}_{T}(\theta_{0},0)\big)=(\bar D^{\dagger}-\bar D)S_{1}\sqrt{T/k_{2}}(\hat\theta^{e}_{r}-\theta_{0})=o_{B}(1),\tag{A.28}
\]
where \(\bar D^{\dagger}=D^{\dagger}(\bar\theta)\) with \(\bar\theta\) on the line joining \(\hat\theta^{e}_{r}\) and \(\theta_{0}\), and \(\bar D=\tilde D(\bar\theta_{r})\), where
\[
\tilde D(\theta)=\begin{pmatrix}\sum_{t=1}^{T}G_{t}(\theta)\pi_{t,r}&0\\\sum_{t=1}^{T}Q_{t}(\theta)\pi_{t,r}&-I_{s}\end{pmatrix},
\]
and \(\bar\theta_{r}\) lies on the line joining \(\hat\theta^{e}_{r}\) and \(\theta_{0}\). Note that by a Taylor expansion
\[
\sqrt{T/k_{2}}\,\tilde h^{a}_{T}(\hat\theta^{e}_{r},0)=\sqrt{T/k_{2}}\,\tilde h^{a}_{T}(\theta_{0},0)+\dddot DS_{1}\sqrt{T/k_{2}}(\hat\theta^{e}_{r}-\theta_{0}),\tag{A.29}
\]
where \(\dddot D=\tilde D(\dddot\theta)\) and \(\dddot\theta\) lies on the line joining \(\hat\theta^{e}_{r}\) and \(\theta_{0}\). Note that, similarly to (A.27), the right-hand side of (A.29) is \(o_{p}(1)\). Thus we have
\[
\begin{aligned}
S_{1}\sqrt{T}(\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r})&=-[D^{\dagger\prime}\Sigma^{\dagger-1}\tilde D^{\dagger}]^{-1}\big[I-R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}\big]\\
&\qquad\times D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T}\big[h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]+o_{B}(1).
\end{aligned}
\]
Now let us consider the score statistic
\[
\mathcal{S}^{\dagger}=\frac{T}{k_{2}}\,h^{\dagger}(\hat\theta^{e\dagger}_{r})'\Sigma^{\dagger-1}D^{\dagger}(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}h^{\dagger}(\hat\theta^{e\dagger}_{r}).
\]
We proved above that \(\sqrt{T/k_{2}}\big(h^{a\dagger}(\hat\theta^{e}_{r},0)-h^{a\dagger}(\theta_{0},0)+\tilde h^{a}_{T}(\theta_{0},0)\big)=o_{B}(1)\). Notice also that by a Taylor expansion
\[
\sqrt{T/k_{2}}\,h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)=\sqrt{T/k_{2}}\,h^{a\dagger}(\hat\theta^{e}_{r},0)+\bar D^{\dagger}S_{1}\sqrt{T/k_{2}}(\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}),
\]
where \(\bar D^{\dagger}=D^{\dagger}(\bar\theta)\) and \(\bar\theta\) lies on the line joining \(\hat\theta^{e\dagger}_{r}\) and \(\hat\theta^{e}_{r}\). Thus
\[
\begin{aligned}
D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\,h^{\dagger}(\hat\theta^{e\dagger}_{r})&=D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\big[h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]\\
&\quad-D^{\dagger\prime}\Sigma^{\dagger-1}\bar D^{\dagger}[D^{\dagger\prime}\Sigma^{\dagger-1}\tilde D^{\dagger}]^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\big[h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]\\
&\quad+D^{\dagger\prime}\Sigma^{\dagger-1}\bar D^{\dagger}[D^{\dagger\prime}\Sigma^{\dagger-1}\tilde D^{\dagger}]^{-1}R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}\\
&\qquad\times R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\big[h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]+o_{p}(1)\\
&=R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}\\
&\qquad\times D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\big[h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]+o_{p}(1).
\end{aligned}
\]
Hence
\[
\begin{aligned}
\mathcal{S}^{\dagger}&=\frac{T}{k_{2}}\,h^{\dagger}(\hat\theta^{e\dagger}_{r})'\Sigma^{\dagger-1}D^{\dagger}(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}h^{\dagger}(\hat\theta^{e\dagger}_{r})\\
&=\sqrt{T/k_{2}}\big(h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)'\Sigma^{\dagger-1}D^{\dagger}(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}\\
&\qquad\times R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\big(h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)+o_{B}(1),
\end{aligned}
\]
and the result follows as in the proof of Theorem 4.4. Now we consider the \(\mathcal{D}^{\dagger}\) statistic:
\[
\begin{aligned}
\mathcal{D}^{\dagger}&=\frac{T}{k_{2}}\big[h^{\dagger}(\hat\theta^{e\dagger}_{r})'\Sigma^{\dagger-1}h^{\dagger}(\hat\theta^{e\dagger}_{r})-g^{\dagger}(\hat\theta^{e\dagger})'\tilde\Omega^{\dagger-1}g^{\dagger}(\hat\theta^{e\dagger})\big]\\
&=\frac{T}{k_{2}}\big[h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)'\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)-h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})'\tilde\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})\big].
\end{aligned}
\]
Expanding \(h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)\) around \((\hat\theta^{e\dagger},\hat\nu^{\dagger})\) yields
\[
h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)=h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})+\bar D^{\dagger}\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e\dagger}\\-\hat\nu^{\dagger}\end{pmatrix}=h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})+\bar D^{\dagger}\Big[\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}\Big],
\]
where \(\bar D^{\dagger}=D^{\dagger}(\bar\theta)\) and \(\bar\theta\) lies on the line joining \(\hat\theta^{e\dagger}_{r}\) and \(\hat\theta^{e\dagger}\). Thus
\[
\begin{aligned}
&\frac{T}{k_{2}}\,h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)'\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger}_{r},0)\\
&\quad=\frac{T}{k_{2}}\,h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})'\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})\\
&\qquad+\frac{2T}{k_{2}}\Big\{\bar D^{\dagger}\Big[\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}\Big]\Big\}'\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})\\
&\qquad+\frac{T}{k_{2}}\Big\{\bar D^{\dagger}\Big[\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}\Big]\Big\}'\Sigma^{\dagger-1}\Big\{\bar D^{\dagger}\Big[\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}\Big]\Big\}.
\end{aligned}
\]
Now notice that by the first-order conditions of the bootstrapped GMM problem we have
\[
\sqrt{T}\,\bar D^{\dagger\prime}\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})=\sqrt{T}\,(\bar D^{\dagger}-D^{\dagger})'\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})=o_{B}(1)\big[\sqrt{T}\,h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})\big].
\]
Now note that by two Taylor expansions
\[
\begin{aligned}
h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})&=h^{a\dagger}(\hat\theta^{e},\hat\nu)+\dddot D^{\dagger}\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}\\\hat\nu^{\dagger}-\hat\nu\end{pmatrix}\\
&=h^{a\dagger}(\theta_{0},0)-\tilde h^{a}(\theta_{0},0)+\tilde h^{a}(\theta_{0},0)+\bar D^{\dagger}\begin{pmatrix}\hat\theta^{e}-\theta_{0}\\\hat\nu\end{pmatrix}+\dddot D^{\dagger}\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}\\\hat\nu^{\dagger}-\hat\nu\end{pmatrix}=O_{B}(1/\sqrt{T}),
\end{aligned}
\]
where \(\bar D^{\dagger}=D^{\dagger}(\bar\theta)\) with \(\bar\theta\) on the line joining \(\hat\theta^{e}\) and \(\theta_{0}\), and \(\dddot D^{\dagger}=D^{\dagger}(\dddot\theta)\) with \(\dddot\theta\) on the line joining \(\hat\theta^{e\dagger}\) and \(\hat\theta^{e}\). Thus \(\sqrt{T}\,\bar D^{\dagger\prime}\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})=o_{B}(1)\), and
\[
\frac{2T}{k_{2}}\Big\{\bar D^{\dagger}\Big[\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}\Big]\Big\}'\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})=o_{B}(1),
\]
as \(\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}=O_{B}(1/\sqrt{T})\) and \(\hat\theta^{e\dagger}-\hat\theta^{e}_{r}=O_{B}(1/\sqrt{T})\).

Additionally notice that
\[
\frac{T}{k_{2}}\big\{h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})'\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})-h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})'\tilde\Sigma^{\dagger-1}h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})\big\}=\frac{T}{k_{2}}\,h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})'\big(\Sigma^{\dagger-1}-\tilde\Sigma^{\dagger-1}\big)h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})=o_{B}(1),
\]
as \(\Sigma^{\dagger-1}-\tilde\Sigma^{\dagger-1}=o_{B}(1)\); the result follows as we proved that \(h^{a\dagger}(\hat\theta^{e\dagger},\hat\nu^{\dagger})=O_{B}(1/\sqrt{T})\).

Also
\[
\sqrt{\tfrac{T}{k_{2}}}\,\bar D^{\dagger}\Big[\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}\Big]=-\sqrt{\tfrac{T}{k_{2}}}\,D\Big[\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}\Big]+o_{B}(1).
\]
Thus
\[
\begin{aligned}
-\sqrt{\tfrac{T}{k_{2}}}\,D\Big[\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}\Big]&=-\sqrt{\tfrac{T}{k_{2}}}\,D[D^{\dagger\prime}\Sigma^{\dagger-1}\tilde D^{\dagger}]^{-1}R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}\\
&\qquad\times R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\big(h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)+o_{B}(1),
\end{aligned}
\]
as
\[
\sqrt{T/k_{2}}\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}=-[D^{\dagger\prime}\Sigma^{\dagger-1}\bar D^{\dagger}]^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\Big[\sqrt{\tfrac{T}{k_{2}}}\,h^{a\dagger}(\theta_{0},0)-\sqrt{\tfrac{T}{k_{2}}}\,\tilde h^{a}_{T}(\theta_{0},0)\Big]+o_{B}(1),
\]
\[
\begin{aligned}
S_{1}\sqrt{T}(\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r})&=-[D^{\dagger\prime}\Sigma^{\dagger-1}\tilde D^{\dagger}]^{-1}\big[I-R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}\big]\\
&\qquad\times D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T}\big[h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big]+o_{p}(1),
\end{aligned}
\]
and the facts that \(D^{\dagger}=D+o_{B}(1)\) by Lemma A.7, \(R(\hat\theta^{e\dagger}_{r})=R+o_{B}(1)\) by continuity of \(R(\cdot)\), and \(\Sigma^{\dagger-1}=\Sigma^{-1}+o_{p}(1)\). Thus
\[
\begin{aligned}
&\frac{T}{k_{2}}\Big\{\bar D^{\dagger}\Big[\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}\Big]\Big\}'\Sigma^{\dagger-1}\Big\{\bar D^{\dagger}\Big[\begin{pmatrix}\hat\theta^{e\dagger}_{r}-\hat\theta^{e}_{r}\\0\end{pmatrix}-\begin{pmatrix}\hat\theta^{e\dagger}-\hat\theta^{e}_{r}\\\hat\nu^{\dagger}\end{pmatrix}\Big]\Big\}\\
&\quad=\sqrt{T/k_{2}}\big(h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)'\Sigma^{\dagger-1}D^{\dagger}(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})[R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}R(\hat\theta^{e\dagger}_{r})]^{-1}\\
&\qquad\times R(\hat\theta^{e\dagger}_{r})'(D^{\dagger\prime}\Sigma^{\dagger-1}D^{\dagger})^{-1}D^{\dagger\prime}\Sigma^{\dagger-1}\sqrt{T/k_{2}}\big(h^{a\dagger}(\theta_{0},0)-\tilde h^{a}_{T}(\theta_{0},0)\big)+o_{B}(1),
\end{aligned}
\]
and the result follows as in the proof of Theorem 4.4.
References

Allen, J., Gregory, A. W. and K. Shimotsu (2011): "Empirical likelihood block bootstrapping," Journal of Econometrics, vol. 161(2), 110-121.
Andrews, D.W.K. (1991): "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation," Econometrica, vol. 59(3), 817-858.
Andrews, D.W.K. (2002): "Equivalence of the Higher Order Asymptotic Efficiency of k-step and Extremum Statistics," Econometric Theory, vol. 18(5), 1040-1085.
Andrews, D.W.K. and J.C. Monahan (1992): "An Improved Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimator," Econometrica, 60, 953-966.
Bravo, F. and F. Crudu (2011): "Efficient Bootstrap with Weakly Dependent Processes," forthcoming in Computational Statistics and Data Analysis.
Brown, B.W. and W.K. Newey (2002): "Generalized Method of Moments, Efficient Bootstrapping, and Improved Inference," Journal of Business and Economic Statistics, vol. 20(4), 507-517.
Burnside, C. and M.S. Eichenbaum (1996): "Small-Sample Properties of GMM-Based Wald Tests," Journal of Business and Economic Statistics, vol. 14(3), 294-308.
Carlstein, E. (1986): "The use of subseries values for estimating the variance of a general statistic from a stationary sequence," Annals of Statistics, 14, 1171-1179.
Cattaneo, M.D., R.K. Crump and M. Jansson (2010): "Bootstrapping Density-Weighted Average Derivatives," CREATES Research Papers 2010-23, School of Economics and Management, University of Aarhus.
Christiano, L.J. and W.J. den Haan (1996): "Small-Sample Properties of GMM for Business-Cycle Analysis," Journal of Business and Economic Statistics, vol. 14(3), 309-327.
D'Antona, G. and A. Ferrero (2005): Digital Signal Processing for Measurement Systems: Theory and Applications, Springer, New York, U.S.A.
Davidson, J.E.H. (1994): Stochastic Limit Theory. Oxford: Oxford University Press.
Davidson, R. and J.G. MacKinnon (1999): "Bootstrap Testing in Nonlinear Models," International Economic Review, vol. 40(2), 487-508.
Efron, B. (1979): "Bootstrap Methods: Another Look at the Jackknife," Annals of Statistics, 7(1), 1-26.
Gallant, R. (1987): Nonlinear Statistical Models, John Wiley and Sons, New York.
Gonçalves, S. and H. White (2004): "Maximum likelihood and the bootstrap for nonlinear dynamic models," Journal of Econometrics, vol. 119(1), 199-219.
Hahn, J. (1996): "A Note on Bootstrapping Generalized Method of Moments Estimators," Econometric Theory, vol. 12(1), 187-197.
Hall, A. (2005): Generalized Method of Moments, New York (NY): Oxford University Press.
Hall, P. (1992): The Bootstrap and Edgeworth Expansion, Springer-Verlag New York Inc.
Hall, P. and J.L. Horowitz (1996): "Bootstrap Critical Values for Tests Based on Generalized Method of Moments Estimators," Econometrica, 64, 891-916.
Hansen, L.P. (1982): "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, vol. 50(4), 1029-1054.
Inoue, A. and M. Shintani (2006): "Bootstrapping GMM estimators for time series," Journal of Econometrics, vol. 133(2), 531-555.
Künsch, H. (1989): "The jackknife and the bootstrap for general stationary observations," Annals of Statistics, 17, 1217-1241.
Liu, R. and K. Singh (1992): "Moving blocks jackknife and bootstrap capture weak dependence," in Exploring the Limits of Bootstrap, eds. LePage, R. and L. Billard, Wiley, New York, 225-248.
Newey, W.K. and D. McFadden (1994): "Large Sample Estimation and Hypothesis Testing," Handbook of Econometrics, Volume 4, R.F. Engle and D.L. McFadden (eds.), 2111-2245.
Newey, W.K. and K.D. West (1987): "A Simple, Positive Semi-definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix," Econometrica, 55, 703-708.
Newey, W.K. and K.D. West (1994): "Automatic Lag Selection in Covariance Matrix Estimation," Review of Economic Studies, vol. 61(4), 631-653.
Ng, S. and P. Perron (1996): "The Exact Error in Estimating the Spectral Density at the Origin," Journal of Time Series Analysis, 17, 379-408.
Paparoditis, E. and D.N. Politis (2001): "Tapered Block Bootstrap," Biometrika, vol. 88(4), 1105-1119.
Parente, P.M.D.C. and R.J. Smith (2018a): "Kernel Block Bootstrap," CWP 48/18, Centre for Microdata Methods and Practice, U.C.L. and I.F.S.
Parente, P.M.D.C. and R.J. Smith (2018b): "Quasi-Maximum Likelihood and The Kernel Block Bootstrap for Nonlinear Dynamic Models," working paper.
Politis, D. and J. Romano (1992b): "A general resampling scheme for triangular arrays of α-mixing random variables with application to the problem of spectral density estimation," Annals of Statistics, 20, 1985-2007.
Politis, D.N. and J.P. Romano (1995): "Bias-corrected nonparametric spectral estimation," Journal of Time Series Analysis, 16, 67-103.
Pollard, D. (1991): "Asymptotics for Least Absolute Deviation Regression Estimators," Econometric Theory, vol. 7, 186-199.
Ramalho, J.J.S. and R.J. Smith (2011): "Goodness of Fit Tests for Moment Conditions Models," working paper, Universidade de Évora.
Rao, C.R. (2002): Linear Statistical Inference and its Applications, Wiley.
Rao, C.R. and S.K. Mitra (1971): Generalized Inverse of Matrices and its Applications. New York: Wiley.
Royden, H.L. (1988): Real Analysis, 3rd ed., Macmillan.
Ruud, P. (2000): An Introduction to Classical Econometric Theory, Oxford University Press.
Serfling, R. (2002): Approximation Theorems of Mathematical Statistics, New York: Wiley.
Smith, R.J. (1997): "Alternative Semi-parametric Likelihood Approaches to Generalised Method of Moments Estimation," Economic Journal, vol. 107(441), 503-519.
Smith, R.J. (2011): "GEL Criteria for Moment Condition Models," Econometric Theory, 27, 1192-1235.
Tauchen, G. (1985): "Diagnostic Testing and Evaluation of Maximum Likelihood Models," Journal of Econometrics, 30, 415-443.
White, H. (1984): Asymptotic Theory for Econometricians. Academic Press.
White, H. (1999): Asymptotic Theory for Econometricians, 2nd ed. Academic Press.
White, H. and X. Chen (1996): "Laws of Large Numbers for Hilbert Space-Valued Mixingales with Applications," Econometric Theory, 12, 284-304.
Wooldridge, J. (1994): "Estimation and Inference for Dependent Processes," in Handbook of Econometrics, Volume 4, R.F. Engle and D.L. McFadden (eds.), 2639-2738.