
On Standard Inference for GMM

with Local Identification Failure of Known Forms∗

Ji Hyung Lee † Zhipeng Liao‡

First Version: April 2014; This Version: October, 2016

Abstract

This paper studies the GMM estimation and inference problem that occurs when the Jacobian of the moment conditions is rank deficient of known forms at the true parameter values. Dovonon and Renault (2013) recently raised a local identification issue stemming from this type of degenerate Jacobian. The local identification issue leads to a slow rate of convergence of the GMM estimator and a non-standard asymptotic distribution of the over-identification test statistic. We show that the known form of the rank-deficient Jacobian matrix contains non-trivial information about the economic model. By exploiting this information in estimation, we provide GMM estimators and over-identification tests with standard properties. The main theory developed in this paper is applied to the estimation of, and inference about, the common conditionally heteroskedastic (CH) features in asset returns. The performance of the newly proposed GMM estimators and over-identification tests is investigated under simulation designs similar to those used in Dovonon and Renault (2013).

Keywords: Degenerate Jacobian, Conditionally Heteroskedastic Factors, GMM, Local Identification Failure, Non-standard Inference, Over-identification Test, Asymptotically Exact Inference

∗We acknowledge useful comments from Don Andrews, Anil Bera, Xu Cheng, Denis Chetverikov, Yanqin Fan, Jinyong Hahn, Guofang Huang, Roger Koenker, Ivana Komunjer, Rosa Matzkin, Peter Phillips, Shuyang Sheng, Ruoyao Shi, Yixiao Sun, and participants in econometrics seminars at UC Davis, UCLA, UCSD, UIUC, Yale, and the 2014 Seattle-Vancouver Econometrics Conference. We also thank the Co-Editor, Anna Mikusheva, and two anonymous referees for constructive suggestions which improved this paper. Any errors are the responsibility of the authors.
†Department of Economics, University of Illinois, 1407 W. Gregory Dr., 214 David Kinley Hall, Urbana, IL 61801. Email: [email protected]
‡Department of Economics, UC Los Angeles, 8379 Bunche Hall, Mail Stop: 147703, Los Angeles, CA 90095. Email: [email protected]


1 Introduction

The generalized method of moments (GMM) is a popular method for empirical research in economics and finance. Under some regularity conditions, Hansen (1982) showed that the GMM estimator has standard properties, such as √T-consistency and an asymptotically normal distribution, and that the over-identification test (J-test) statistic has an asymptotic chi-square distribution. On the other hand, when some of the regularity conditions are not satisfied, the GMM estimator may have non-standard properties. For example, when the moment conditions only contain weak information, the GMM estimator may be inconsistent or have a mixture normal asymptotic distribution (see, e.g., Andrews and Cheng, 2012; Staiger and Stock, 1997; Stock and Wright, 2000).

Dovonon and Renault (2013, hereafter DR) recently pointed out an interesting issue that occurs due to the violation of one regularity condition. When testing for common conditionally heteroskedastic (CH) features in asset returns, DR showed that the Jacobian of the moment conditions is a matrix of zeros at the true parameter value. This causes a slower than √T rate of convergence of the GMM estimator. A new limit theory was then developed to investigate the non-standard asymptotic distribution of the J-test. When H and p are the numbers of moment conditions and parameters, respectively, the J-test was shown to have an asymptotic distribution that lies between χ2(H − p) and χ2(H). Their results extend the findings in Sargan (1983) and provide an important empirical caution: the commonly used critical values based on χ2(H − p) lead to asymptotically oversized J-tests under the degeneracy of the Jacobian moments.

This paper revisits the issue raised in DR and, more generally, the non-standard inferential issues that arise when the expected Jacobian matrix is rank deficient with known forms. First, we consider moment functions for which the Jacobian of the moment conditions is known to be a matrix of zeros at the true parameter values due to the functional forms of the moment conditions. We provide alternative GMM estimation and inference using the zero Jacobian matrix as additional moment conditions. These additional moment restrictions contain extra information about the economic model. This information is exploited to achieve the first-order local identification of the unknown structural parameters. We construct GMM estimators with √T-consistency and asymptotic normality by adding the zero Jacobian as extra moment conditions. The J-test statistics based on the new set of moments are shown to have asymptotic chi-square distributions. Other forms of singular expected Jacobian are also addressed, and possible ways of recovering standard inference are discussed.

We apply the newly developed theory to the main example: inference on the common feature in the common CH factor model. When using J-tests for the existence of the common feature in this model, DR suggest using the conservative critical values based on χ2(H) to avoid the over-rejection issue. We show that, under the same sufficient conditions of DR, the common feature is not only first-order locally identified, but also globally identified by the zero Jacobian moment conditions. As a result, our GMM estimators of the common feature have √T-consistency and asymptotic normality. Our J-test statistic for the existence of the common feature has an asymptotic chi-square distribution, which enables non-conservative asymptotic inference. Moreover, the Jacobian-based GMM estimator of the common feature has a closed-form expression, which makes it particularly well suited to empirical applications.

The rest of this paper is organized as follows. Section 2 describes the key idea of our methods in the general GMM framework. Section 3 applies the main results developed in Section 2 to the common CH factor models. Section 4 contains simulation studies and Section 5 concludes. Extensions, proofs of the main results, and supplementary simulation results are included in the Appendix.

Standard notation is used. Throughout the paper, A′ denotes the transpose of a matrix A. For a d1 × 1 vector function f(x): R^{d2} → R^{d1}, we use ∂f(x)/∂x′ to denote the d1 × d2 matrix whose (i, j)-th element is ∂f_i(x)/∂x_j, where f_i(·) and x_j are the i-th component of f(·) and the j-th component of x, respectively. For any positive integers k1 and k2, I_{k1} denotes the k1 × k1 identity matrix, 0_{k1×k2} denotes the k1 × k2 zero matrix, and 1_{k1×k2} denotes the k1 × k2 matrix of ones. A ≡ B means that A is defined as B. a_n = o_p(b_n) means that for any constants ε1, ε2 > 0, Pr(|a_n/b_n| ≥ ε1) < ε2 eventually; a_n = O_p(b_n) means that for any ε > 0, there is a finite constant C_ε such that Pr(|a_n/b_n| ≥ C_ε) < ε eventually. "→_p" and "→_d" denote convergence in probability and convergence in distribution, respectively.

2 Degenerate Jacobian in GMM Models

We are interested in estimating a parameter θ0 ∈ Θ ⊂ R^p which is uniquely identified by H (H ≥ p) moment conditions:

ψ(θ0) ≡ E[ψ(X_t, θ0)] ≡ E[ψ_t(θ0)] = 0_{H×1},  (2.1)

where X_t is a random vector which is observable in period t. Given data {X_t}_{t=1}^T, the GMM estimator θ_{ψ,T} based on the moment conditions (2.1) is defined as

θ_{ψ,T} = arg min_{θ∈Θ} J_{ψ,T}(θ),  (2.2)

where J_{ψ,T}(θ) ≡ T^{-1} [∑_{t=1}^T ψ_t(θ)]′ W_{ψ,T} [∑_{t=1}^T ψ_t(θ)] and W_{ψ,T} is an H × H weight matrix. Let θ*_{ψ,T} be the GMM estimator in (2.2) with weight matrix W_{ψ,T} = Ω_{ψ,T}^{-1}, where Ω_{ψ,T} is a consistent estimator of the asymptotic variance of T^{-1/2} ∑_{t=1}^T ψ_t(θ0). The J-test statistic, defined as J_{ψ,T} ≡ J_{ψ,T}(θ*_{ψ,T}) with W_{ψ,T} = Ω_{ψ,T}^{-1}, is commonly used to test the validity of the moment conditions (2.1).
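As a concrete numerical illustration of the objective in (2.2) and the two-step J-statistic, the following sketch uses a hypothetical linear moment function with two instruments and one parameter; the data-generating process, grid search, and all variable names are our own illustrative choices, not the paper's.

```python
import numpy as np

# Sketch of the GMM objective J_{psi,T}(theta) in (2.2) and the two-step
# J-test statistic, for a hypothetical moment function
# psi_t(theta) = z_t * (y_t - theta * x_t)  (H = 2, p = 1).

rng = np.random.default_rng(0)
T = 2000
z = rng.normal(size=(T, 2))                       # two instruments
x = z @ np.array([1.0, 0.5]) + rng.normal(size=T)
theta0 = 0.7
y = theta0 * x + rng.normal(size=T)

def psi(theta):
    """Moment function psi_t(theta), returned as a T x H array."""
    return z * (y - theta * x)[:, None]

def J_stat(theta, W):
    """J_{psi,T}(theta) = T^{-1} [sum_t psi_t(theta)]' W [sum_t psi_t(theta)]."""
    s = psi(theta).sum(axis=0)
    return s @ W @ s / T

grid = np.linspace(0.0, 1.5, 3001)

# First step: identity weight matrix.
theta1 = grid[np.argmin([J_stat(th, np.eye(2)) for th in grid])]

# Second step: efficient weight matrix Omega_{psi,T}^{-1}.
W2 = np.linalg.inv(psi(theta1).T @ psi(theta1) / T)
theta2 = grid[np.argmin([J_stat(th, W2) for th in grid])]
J = J_stat(theta2, W2)     # compared with chi-square(H - p) critical values
print(theta2, J)
```

In this regular (full-rank Jacobian) example the usual χ2(H − p) comparison for J is valid; the degenerate case discussed below is exactly where it fails.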

As illustrated in Hansen (1982), global identification together with other regularity conditions can be used to show the standard properties of the GMM estimator θ_{ψ,T} and the J-test statistic J_{ψ,T}. For any θ ∈ Θ, define

Γ(θ) ≡ ∂E[ψ_t(θ)]/∂θ′.

Then Γ(θ) is an H × p matrix of functions. The standard properties of the GMM estimator θ_{ψ,T}, such as √T-consistency and asymptotic normality, rely on the condition that Γ(θ0) has full rank. When Γ(θ0) = 0_{H×p}, the properties of the GMM estimator are non-standard: for example, its convergence rate is slower than √T and the J-test statistic J_{ψ,T} has a mixture of asymptotic chi-square distributions. DR established these non-standard properties of GMM inference when Γ(θ0) = 0_{H×p} in the context of testing for the common feature in the common CH factor models. In this section, we discuss the same issue and propose an alternative solution in a general GMM context.

Define g(θ) ≡ vec(Γ(θ)′); then Γ(θ0) = 0_{H×p} implies that g(θ0) = 0_{pH×1}. The zero Jacobian matrix thus provides pH extra moment conditions:

g(θ) = 0_{pH×1} when θ = θ0.  (2.3)

The new set of moment restrictions ensures the first-order local identification of θ0 when the Jacobian of g(θ) (or, essentially, the Hessian of ψ(θ)) evaluated at θ0 has full column rank. We define the corresponding Jacobian matrix of g(θ) as

H(θ) ≡ ∂g(θ)/∂θ′,

where H(θ) is a pH × p matrix of functions. When H(θ0) ≡ H has full column rank, the first-order local identification of θ0 can be achieved, which makes it possible to construct GMM estimators and J-tests with standard properties. We next provide a lemma that enables checking the rank condition of the moment conditions based on the Jacobian matrix.

Lemma 2.1 Let ψ_h(θ) be the h-th (h = 1, . . . , H) component function of ψ(θ). Suppose that (i) θ0 belongs to the interior of Θ; and (ii) for any θ ∈ Θ,

( (θ − θ0)′ [∂²ψ_h(θ0)/∂θ∂θ′] (θ − θ0) )_{1≤h≤H} = 0  (2.4)

if and only if θ = θ0. Then the matrix H has full rank.

Lemma 2.1 shows that when the moment conditions in (2.3) are used under condition (ii), the first-order local identification of θ0 is achieved. The moment conditions in (2.3) alone may not ensure the global/unique identification of θ0. However, as θ0 is globally (uniquely) identified by the moment conditions in (2.1), we can use the moment conditions in (2.1) and (2.3) together in GMM to ensure both the global identification and the first-order local identification of θ0.
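The rank condition on H can be checked numerically. The sketch below does so for a hypothetical mean moment function whose Jacobian Γ(θ0) vanishes identically (p = 1, H = 2); the quadratic moment functions are our own toy construction, chosen so that condition (ii) of Lemma 2.1 holds.

```python
import numpy as np

# Numerical check of Lemma 2.1's rank condition for a toy mean moment
# function psibar(theta) with Gamma(theta0) = 0 but full-rank H(theta0).

theta0 = 0.5

def psibar(theta):
    # Each component is quadratic around theta0, so Gamma(theta0) = 0.
    return np.array([(theta - theta0) ** 2, -2.0 * (theta - theta0) ** 2])

def num_jacobian(f, theta, h=1e-5):
    """Central-difference derivative of a vector function of a scalar."""
    return (f(theta + h) - f(theta - h)) / (2 * h)

Gamma0 = num_jacobian(psibar, theta0)            # H x p Jacobian at theta0
g = lambda th: num_jacobian(psibar, th)          # g(theta) = vec(Gamma(theta)')
H_mat = num_jacobian(g, theta0).reshape(-1, 1)   # pH x p Jacobian of g

print(np.allclose(Gamma0, 0, atol=1e-6))         # True: degenerate Jacobian
print(np.linalg.matrix_rank(H_mat))              # 1: full column rank (p = 1)
```

Here H(θ0) collects the second derivatives of the mean moments, so its full column rank is exactly condition (ii) in this scalar case.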

Remark 2.1. Condition (ii) in Lemma 2.1 is the second-order local identification condition of θ0 based on the moment conditions in (2.1). This condition is derived as a general result in DR (see their Lemma 2.3), and is used as a high-level sufficient assumption in Dovonon and Goncalves (2016, hereafter DG) to justify the validity of bootstrap inference procedures. When this condition is violated, the GMM estimator that additionally uses the moment conditions from the zero Jacobian may not have the √T-asymptotic normal distribution. Moreover, when this condition is close to being violated (i.e., under weak local identification), the finite sample performance of the new GMM estimators and standard inference procedures proposed below may not closely follow the asymptotic analysis in the paper (see, e.g., the simulation results under D1.1 and D2.1 in Section 4).1

Let Γ_t(θ) ≡ ∂ψ_t(θ)/∂θ′, g_t(θ) ≡ vec(Γ_t(θ)′) and m_t(θ) ≡ (ψ_t(θ)′, g_t(θ)′)′. We define the GMM estimator of θ0 using all moment conditions as

θ_{m,T} = arg min_{θ∈Θ} J_{m,T}(θ),  (2.5)

where J_{m,T}(θ) ≡ T^{-1} [∑_{t=1}^T m_t(θ)]′ W_{m,T} [∑_{t=1}^T m_t(θ)] and W_{m,T} is an (Hp + H) × (Hp + H) weight matrix. Similarly, we define the GMM estimator of θ0 using only the moment conditions in (2.3) as

θ_{g,T} = arg min_{θ∈Θ} J_{g,T}(θ),  (2.6)

where J_{g,T}(θ) ≡ T^{-1} [∑_{t=1}^T g_t(θ)]′ W_{g,T} [∑_{t=1}^T g_t(θ)] and W_{g,T} is an Hp × Hp weight matrix.
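The stacking in (2.5) can be sketched numerically. The stylized scalar model below (p = 1, H = 2) is our own illustration, not the paper's: the mean moments are quadratic around θ0 with random curvatures, so the ψ-moments alone are only second-order identifying, while the stacked moments m_t = (ψ_t′, g_t′)′ restore first-order information.

```python
import numpy as np

# Sketch of the stacked-moment estimator theta_{m,T} in (2.5) for a
# hypothetical DGP with E[psi_t(theta)] quadratic around theta0 = 0.5.

rng = np.random.default_rng(1)
T, theta0 = 5000, 0.5
c = 1.0 + rng.normal(size=T)      # random curvatures, E[c_t] =  1
d = -2.0 + rng.normal(size=T)     #                    E[d_t] = -2
u = rng.normal(size=(T, 2))       # mean-zero noise in the psi-moments

def m_t(theta):
    """Stacked moments m_t(theta) = (psi_t', g_t')', T x (H + Hp)."""
    psi = np.column_stack([c * (theta - theta0) ** 2 + u[:, 0],
                           d * (theta - theta0) ** 2 + u[:, 1]])
    g = np.column_stack([2 * c * (theta - theta0),   # g_t = vec(Gamma_t')
                         2 * d * (theta - theta0)])
    return np.hstack([psi, g])

def J_m(theta, W):
    s = m_t(theta).sum(axis=0)
    return s @ W @ s / T

grid = np.linspace(0.0, 1.0, 2001)
theta_m = grid[np.argmin([J_m(th, np.eye(4)) for th in grid])]
print(theta_m)    # close to theta0 = 0.5
```

The g-block of the objective is linear in θ − θ0 near θ0, which is what delivers the √T rate in Proposition 2.1 below.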

Assumption 2.1 (i) The central limit theorem (CLT) holds: T^{-1/2} ∑_{t=1}^T m_t(θ0) →_d N(0, Ω_m), where Ω_m is a positive definite matrix with

Ω_m = ( Ω_ψ     Ω_{ψg}
        Ω_{gψ}  Ω_g   ),

where Ω_ψ is an H × H matrix and Ω_g is a pH × pH matrix; (ii) W_{m,T} →_p Ω_m^{-1} and W_{g,T} →_p W_g, where W_g is a symmetric, positive definite matrix.

We next state the asymptotic distributions of the GMM estimators θ_{m,T} and θ_{g,T}.

Proposition 2.1 Under the conditions of Lemma 2.1, Assumption 2.1 and Assumption B.1 in the Appendix, we have (i)

√T(θ_{m,T} − θ0) →_d N(0, Σ_{θ,m}),

where Σ_{θ,m}^{-1} = H′(Ω_g − Ω_{gψ} Ω_ψ^{-1} Ω_{ψg})^{-1} H; (ii) moreover, if θ0 is uniquely identified by (2.3), then

√T(θ_{g,T} − θ0) →_d N(0, Σ_{θ,g}),

where Σ_{θ,g} = (H′W_g H)^{-1} H′W_g Ω_g W_g H (H′W_g H)^{-1}.

In linear IV models, Condition (2.4) does not hold under the zero Jacobian, and hence θ0 is not identified. Phillips (1989) and Choi and Phillips (1992) showed that if the Jacobian has insufficient rank but is not entirely zero in linear IV models, some part of θ0 (after some rotation) is identified and √T-estimable. Proposition 2.1 shows that in some non-linear models, the degenerate Jacobian together with Condition (2.4) and other regularity conditions can be used to derive a √T-consistent GMM estimator of θ0. Our results therefore supplement the earlier findings in Phillips (1989) and Choi and Phillips (1992).

Remark 2.2. When W_g = Ω_g^{-1}, we have Σ_{θ,g}^{-1} = H′Ω_g^{-1}H ≤ Σ_{θ,m}^{-1}, which implies that θ_{m,T} is preferred to θ_{g,T} from the efficiency perspective. In some examples (e.g., the common CH factor model in the next section), the computation of θ_{g,T} is much easier than that of θ_{m,T}, as the former has a closed-form expression. We next propose an estimator which is as efficient as θ_{m,T}, but can be computed similarly to θ_{g,T}. This new procedure shares some features of the principle of control variables (Fieller and Hartley, 1954; Kleibergen, 2005; Antoine et al., 2007).2

Let Ω_{g,T}, Ω_{gψ,T} and Ω_{ψ,T} be consistent estimators of Ω_g, Ω_{gψ} and Ω_ψ, respectively. These variance matrix estimators can be constructed using θ_{g,T} with the identity weight matrix, for example. The new GMM estimator is defined as

θ_{g*,T} = arg min_{θ∈Θ} T^{-1} [∑_{t=1}^T g_{ψ,t}(θ)]′ W_{g*,T} [∑_{t=1}^T g_{ψ,t}(θ)],  (2.7)

where g_{ψ,t}(θ) = g_t(θ) − Ω_{gψ,T} Ω_{ψ,T}^{-1} ψ_t(θ_{g,T}) and W_{g*,T}^{-1} = Ω_{g,T} − Ω_{gψ,T} Ω_{ψ,T}^{-1} Ω_{ψg,T}.
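The adjustment in (2.7) can be sketched in isolation: it projects the correlation with the ψ-moments out of the g-moments. In the sketch below the draws of ψ_t(θ_{g,T}) and g_t are simulated placeholders (i.i.d., so sample second moments serve as the variance estimators); in an application they would come from the model's moment functions.

```python
import numpy as np

# Sketch of the control-variable adjustment behind theta_{g*,T} in (2.7):
# g_{psi,t} = g_t - Om_gpsi Om_psi^{-1} psi_t(theta_{g,T}).

rng = np.random.default_rng(2)
T, H, pH = 4000, 2, 2
psi_hat = rng.normal(size=(T, H))              # placeholder psi_t(theta_{g,T})
g = 0.6 * psi_hat + rng.normal(size=(T, pH))   # placeholder g_t, correlated with psi

# Variance estimates (i.i.d. case: sample second moments).
Om_psi = psi_hat.T @ psi_hat / T
Om_gpsi = g.T @ psi_hat / T

# Adjusted moments and the weight matrix W_{g*,T}^{-1} from (2.7).
g_adj = g - psi_hat @ np.linalg.solve(Om_psi, Om_gpsi.T)
W_inv = g.T @ g / T - Om_gpsi @ np.linalg.solve(Om_psi, Om_gpsi.T)

# The adjustment removes the sample correlation with psi_hat exactly.
print(np.abs(g_adj.T @ psi_hat / T).max())   # ~ 0
```

By construction the adjusted moments are orthogonal (in sample) to the ψ-moments, which is why estimation based on them attains the efficient variance Σ_{θ,m} in Theorem 2.1.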

Theorem 2.1 Under the conditions of Proposition 2.1, we have

√T(θ_{g*,T} − θ0) →_d N(0, Σ_{θ,m}),

where Σ_{θ,m} is defined in Proposition 2.1.

From Theorem 2.1, θ_{g*,T} has the same asymptotic variance as θ_{m,T}. Moreover, it is essentially computed based on the moment conditions (2.3); hence it enjoys computational simplicity whenever θ_{g,T} is easy to calculate.

Remark 2.3. Standard inference on θ0 can be conducted using the asymptotic normal distributions of θ_{m,T}, θ_{g,T} or θ_{g*,T}. The inference based on θ_{m,T} or θ_{g*,T} may be better than the inference based on θ_{g,T}, as the latter has a larger asymptotic variance. By contrast, the asymptotic distribution of θ_{ψ,T} is non-standard, which makes the inference based on θ_{ψ,T} difficult to use in practice. Moreover, the convergence rates of θ_{m,T}, θ_{g,T} and θ_{g*,T} are T^{-1/2}, which is faster than T^{-1/4}, the convergence rate of θ_{ψ,T}. Hence the inference based on θ_{m,T}, θ_{g,T} or θ_{g*,T} is not only easy to use but also better (in both size and power) from the theoretical perspective.

When the GMM estimators have standard properties, it is straightforward to construct the over-identification test statistics and derive their asymptotic distributions. As the model specification implies both the moment conditions in (2.1) and (2.3), one can jointly test their validity using the following standard result:3

J_{m,T} ≡ T^{-1} [∑_{t=1}^T m_t(θ_{m,T})]′ Ω_{m,T}^{-1} [∑_{t=1}^T m_t(θ_{m,T})] →_d χ2(Hp + H − p).  (2.8)

When θ0 is identified by (2.3), it may be convenient to use the J-test based on (2.6) in practice:

J_{g,T} ≡ T^{-1} [∑_{t=1}^T g_t(θ*_{g,T})]′ Ω_{g,T}^{-1} [∑_{t=1}^T g_t(θ*_{g,T})] →_d χ2(Hp − p),  (2.9)

where θ*_{g,T} denotes the GMM estimator defined in (2.6) with weight matrix W_{g,T} = Ω_{g,T}^{-1}. One interesting aspect of the proposed J-test in (2.8) is that it has the standard degrees of freedom, i.e., the number of moment conditions used in estimation minus the number of parameters estimated. Among the H(p + 1) moment restrictions, the H moments from (2.1) have a degenerate Jacobian matrix. By combining these H moments with the extra information provided by the Hp Jacobian moments, we avoid the issue of rank deficiency. Stacking the additional moments from (2.3) provides enough sensitivity of the J-test statistic to parameter variation. As a result, the standard degrees of freedom show up in the asymptotic chi-square distribution in (2.8).
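The degrees-of-freedom bookkeeping in (2.8) and (2.9) is mechanical and easy to verify. The sketch below uses illustrative values of H and p (not from the paper) and obtains the chi-square critical value by simulation, so that only numpy is required.

```python
import numpy as np

# Degrees of freedom for the J-tests in (2.8) and (2.9), H and p illustrative.
H, p = 5, 2
df_m = H * p + H - p     # J_{m,T} in (2.8): all H(p+1) moments, p parameters
df_g = H * p - p         # J_{g,T} in (2.9): only the Hp Jacobian moments

rng = np.random.default_rng(3)

def chi2_critical(df, alpha=0.05, reps=200_000):
    """Simulated (1 - alpha) quantile of the chi-square(df) distribution."""
    return np.quantile(rng.chisquare(df, size=reps), 1 - alpha)

crit = chi2_critical(df_m)
print(df_m, df_g, round(crit, 1))   # 13, 8, and about 22.4 (chi2_{0.95}(13))
```

The same simulation routine is reused below for non-pivotal limits, where no closed-form quantile is available.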

Without incurring greater computational costs, we prefer testing more valid moment restrictions under the null. For this purpose, one can use the following test statistic:

J_{h,T} ≡ T^{-1} [∑_{t=1}^T m_t(θ_{g,T})]′ W_{h,T} [∑_{t=1}^T m_t(θ_{g,T})],  (2.10)

where W_{h,T} is an (Hp + H) × (Hp + H) real matrix. Let m(θ) ≡ E[m_t(θ)] for any θ ∈ Θ. Theorem 2.2 below provides the asymptotic distribution of J_{h,T}.

Remark 2.4. The J-test J_{g,T} is expected to have power if a violation of the economic theory implies the invalidity of the moment conditions (2.3) (including the case of the CH factor model studied in the next section). The J-tests J_{m,T} and J_{h,T} test not only the moment conditions in (2.1) but also those in (2.3). Including more moment conditions in the J-test may not necessarily increase or decrease the power of the test.4 For example, suppose that the null hypothesis of an economic model implies that both (2.1) and (2.3) hold. If the moment conditions (2.1) hold under both the null and the alternative (i.e., only the moment conditions in (2.3) are violated under the alternative hypothesis), then the J-test based on (2.1) will have no power although it has fewer moment conditions than J_{m,T}. Ideally, one would like to include only the most misspecified moment conditions in the J-test for power considerations. The degree of misspecification of different moment conditions is, however, unknown in reality. To avoid the potential power loss from testing only mildly misspecified moment conditions, one may want to include all valid moment conditions implied by the null hypothesis in the J-test.


Theorem 2.2 Suppose that the conditions of Proposition 2.1 hold and W_{h,T} →_p W_h, where W_h is a non-random real matrix. Then we have

J_{h,T} →_d B′_{H(p+1)} Ω_m^{1/2} P′ W_h P Ω_m^{1/2} B_{H(p+1)},

where

P ≡ I_{H(p+1)} − [∂m(θ0)/∂θ′] (0_{p×H}, (H′W_g H)^{-1} H′W_g)

and B_{H(p+1)} denotes an H(p + 1) × 1 standard normal random vector.

Remark 2.5. Theorem 2.2 indicates that the asymptotic distribution of J_{h,T} is not pivotal. However, critical values of its asymptotic distribution are easy to simulate in practice. Unlike the bootstrap procedures in DG, one does not need to solve the GMM optimization problem repeatedly to obtain the simulated critical values.5
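Simulating critical values for the limit in Theorem 2.2 only requires drawing the standard normal vector B and evaluating the quadratic form. In the sketch below, Ω_m, P and W_h are arbitrary placeholder matrices of conformable size; in practice they would be replaced by their consistent estimates.

```python
import numpy as np

# Sketch of simulating critical values for the non-pivotal limit
# B' Omega_m^{1/2} P' W_h P Omega_m^{1/2} B in Theorem 2.2.

rng = np.random.default_rng(4)
k = 6                                        # k = H(p + 1)

A = rng.normal(size=(k, k))
Omega_m = A @ A.T + k * np.eye(k)            # placeholder positive definite Omega_m
P = np.eye(k) - 0.3 * rng.normal(size=(k, k)) / k   # placeholder P matrix
W_h = np.eye(k)                              # placeholder weight matrix

# Symmetric square root of Omega_m via its eigendecomposition.
vals, vecs = np.linalg.eigh(Omega_m)
Om_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T

M = Om_half @ P.T @ W_h @ P @ Om_half        # quadratic-form matrix
B = rng.normal(size=(200_000, k))            # draws of B_{H(p+1)}
draws = np.einsum('ij,jk,ik->i', B, M, B)    # B' M B for each draw

crit = np.quantile(draws, 0.95)              # simulated 95% critical value
print(crit)
```

No re-optimization is needed: each simulated draw is a single quadratic form, which is the computational advantage noted in Remark 2.5.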

Remark 2.6. From the asymptotic analysis in DR, we know that the asymptotic distribution of J_{ψ,T} is bounded by χ2(H − p) from below and by χ2(H) from above. Let C1 and C2 denote the tests based on J_{ψ,T} with critical values χ2_{1−α}(H − p) and χ2_{1−α}(H), respectively. We would expect that: (i) when the sample size is large, the empirical rejection probability of the test C1 (respectively, C2) will be larger (respectively, smaller) than the nominal size α; (ii) when H is large and p is small, the critical values χ2_{1−α}(H − p) and χ2_{1−α}(H) are close to each other (in relative magnitude), which implies that the empirical rejection probabilities of the tests C1 and C2 will be close; (iii) when the sample size is large with large H and small p, the empirical rejection probabilities of the tests C1 and C2 should be close to α. Compared with the tests C1 and C2, the standard inference based on J_{m,T}, J_{g,T} or J_{h,T} is asymptotically exact in the sense that their empirical rejection probabilities converge to the nominal size. Moreover, as the lower bound of the convergence rate of θ_{ψ,T} is T^{-1/4}, the empirical rejection probabilities of the tests C1 and C2 converge slowly to their asymptotic values.6 It is worth mentioning, however, that the conservative test C2 proposed in DR does not explicitly use the known information of the zero Jacobian. Hence the test remains (conservatively) under-sized when the known information is misspecified, for example, when the Jacobian in fact has full rank. The C2 test is robust from this perspective.

Remark 2.7. Compared with the bootstrap procedures in DG, the standard inference procedures provided in this paper have the following advantages. First, the leading error terms in the asymptotic approximation of the J-test statistics J_{m,T}, J_{g,T} and J_{h,T} converge to zero at the T^{-1/2} rate, while the corresponding term in J_{ψ,T} converges at a rate not faster than T^{-1/4}. As the bootstrap critical values in DG approximate the critical values of the asymptotic distribution of J_{ψ,T}, the empirical rejection probabilities of the bootstrap inference procedures converge to the nominal size more slowly than the standard inference based on J_{m,T}, J_{g,T} and J_{h,T}. Second, the inference based on J_{m,T}, J_{g,T} and J_{h,T} does not require solving the GMM optimization problem repeatedly, which makes it simpler to use in practice. One potential drawback of our inference procedures is that they invoke many moment conditions in GMM estimation and inference. The asymptotic results established in this section assume that both H and p are fixed; hence they may not capture the finite sample distributions of the GMM estimators and J-test statistics well when Hp is large. When there are many moment conditions, estimation methods such as the continuously updated GMM are preferred as they produce GMM estimators with smaller bias (see Newey and Smith, 2004), and inference procedures which take into account the (second-order) bias and variance contributed by many moment conditions are preferred (see Newey and Windmeijer, 2009).

Remark 2.8. Under the conditions of Proposition 2.1 and the null hypothesis that the moment conditions in (2.1) and (2.3) hold, one can show that

J_{g,T}(θ_{g*,T}) →_d χ2(Hp − p),  (2.11)

where J_{g,T}(θ_{g*,T}) = T^{-1} [∑_{t=1}^T g_{ψ,t}(θ_{g*,T})]′ W_{g*,T} [∑_{t=1}^T g_{ψ,t}(θ_{g*,T})]. The over-identification test based on J_{g,T}(θ_{g*,T}) and its asymptotic distribution in (2.11) controls the (asymptotic) size. The power of this test requires careful investigation. Consider the local misspecification case for illustration. In this case, the moment conditions (2.1) and (2.3) become

E[ψ_t(θ0)] = d_ψ T^{-1/2} and E[g_t(θ0)] = d_g T^{-1/2},  (2.12)

respectively, where d_ψ ∈ R^H and d_g ∈ R^{Hp} are real vectors. Suppose that θ0 is asymptotically uniquely identified by the second set of moment conditions in (2.12) and Condition (ii) in Lemma 2.1 holds. Then both θ_{g,T} and θ_{g*,T} are still √T-consistent estimators of θ0. Let g_{ψ,t} = g_t(θ0) − Ω_{gψ} Ω_ψ^{-1} ψ_t(θ0). It is easy to show that

J_{g,T}(θ_{g*,T}) →_d (B_{gψ} + d_{gψ})′ Ξ (B_{gψ} + d_{gψ}),

where Ξ = I_{Hp} − V^{-1/2} H (H′V^{-1}H)^{-1} H′ V^{-1/2} with V ≡ Ω_g − Ω_{gψ} Ω_ψ^{-1} Ω_{ψg} (so that H′V^{-1}H = Σ_{θ,m}^{-1}), d_{gψ} = V^{-1/2}(d_g − Ω_{gψ} Ω_ψ^{-1} d_ψ), and B_{gψ} is an Hp-dimensional standard normal random vector. The test based on J_{g,T}(θ_{g*,T}) has non-trivial power as long as d_{gψ} ≠ 0 and d_{gψ} does not lie in the eigenspace of the zero eigenvalue of Ξ. However, this test may have no (or low) power when d_{gψ} = 0 (or d_{gψ} is close to zero). Note that d_{gψ} can be zero or close to zero even if d_g and d_ψ are far away from zero. On the other hand, the J-test based on J_{m,T} or J_{h,T} has non-trivial power for most non-zero d_g and d_ψ. To avoid the power loss of J_{g,T}(θ_{g*,T}) with small d_{gψ} but large d_g and/or d_ψ, we do not recommend this test in practice.

Remark 2.9. In practice, one may want to test whether the Jacobian is zero or not. For this problem, the null and alternative hypotheses are:

H0 : E[g_t(θ0)] = 0 vs. H1 : E[g_t(θ0)] ≠ 0,  (2.13)

where E[ψ_t(θ0)] = 0 is maintained under both the null and the alternative. Several tests are available in the literature. First, one can use the J-test statistic J_{m,T} based on the stacked moment conditions E[m_t(θ0)] = 0. Second, one can use the quasi-likelihood ratio (QLR) statistic (see, e.g., Eichenbaum, Hansen and Singleton, 1988), which is the difference of two J-test statistics: J_{m,T} and the statistic based on E[ψ_t(θ0)] = 0. As explained in Hall (2005), the QLR statistic has the asymptotic χ2(Hp) distribution under the standard regularity conditions, potentially providing a more powerful test than the J-test statistic J_{m,T}. However, the GMM estimator based on E[ψ_t(θ0)] = 0 lacks local identification under the null hypothesis, so the asymptotic distribution of the QLR statistic becomes non-standard. On the other hand, the J-test statistic J_{m,T} has the standard chi-square limiting distribution under the null hypothesis, and hence is more convenient for testing (2.13).

3 Application to Common CH Factor Model

Multivariate volatility models commonly assume a smaller number of conditionally heteroskedastic (CH) factors than the number of assets. A small number of common factors may generate the CH behavior of many assets, which can be motivated by economic theories (see, e.g., the early discussion in Engle et al., 1990). Moreover, without imposing the common factor structure, there may be an overwhelming number of parameters to be estimated in multivariate volatility models. From these theoretical and empirical perspectives, common CH factor models are preferred and widely used. Popular examples include the Factor Gaussian generalized autoregressive conditional heteroskedasticity (GARCH) models (Silvennoinen and Terasvirta, 2009, Section 2.2) and Factor Stochastic Volatility models (see, e.g., Section 2.2 of Broto and Ruiz, 2004, and references therein). The relevant common CH factor model literature includes Diebold and Nerlove (1989), Engle, Ng and Rothschild (1990), Fiorentini, Sentana and Shephard (2004), Doz and Renault (2006) and Dovonon and Renault (2013). It is therefore important to test whether a common CH factor structure exists in the multivariate asset returns of interest.

Engle and Kozicki (1993) proposed to detect the existence of a common CH factor structure, or equivalently the common CH features, using the GMM over-identification test (Hansen, 1982). Consider an n-dimensional vector of asset returns Y_{t+1} = (Y_{1,t+1}, . . . , Y_{n,t+1})′ which satisfies

Var(Y_{t+1}|F_t) = Λ D_t Λ′ + Ω,  (3.1)

where Var(·|F_t) denotes the conditional variance given all available information F_t at period t, Λ = (λ_1, . . . , λ_p) is an n × p matrix (p ≤ n), D_t = diag(σ²_{1,t}, . . . , σ²_{p,t}) is a p × p diagonal matrix, Ω is an n × n positive definite matrix, and {F_t}_{t≥0} is the increasing filtration to which {Y_t}_{t≥0} and {σ²_{k,t}}_{t≥0, 1≤k≤p} are adapted. Assumptions 3.1, 3.2, 3.3, 3.4, 3.5 and 3.6 below are from DR.

Assumption 3.1 Rank(Λ) = p and Var[Diag(D_t)] is non-singular.

Assumption 3.2 E[Y_{t+1}|F_t] = 0.

Assumption 3.3 We have H many F_t-measurable random variables z_t such that: (i) Var(z_t) is non-singular, and (ii) Rank[Cov(z_t, Diag(D_t))] = p.


Assumption 3.4 The process (z_t, Y_t) is stationary and ergodic with E[‖z_t‖²] < ∞ and E[‖Y_t‖⁴] < ∞. We also allow a weak dependence structure so that (z_t, vec(Y_t Y_t′)) fulfills a central limit theorem.

When p < n, e.g., p = n−1, there exists a nonzero θ0∗ ∈ Rn such that θ0′

∗ Λ = 0. The real vector

θ0∗ is called the common CH feature in the literature. In the presence of the common CH feature,

we have

Var(θ0′∗ Yt+1|Ft

)= θ0′

∗ ΛDtΛ′θ0∗ + θ0′

∗ Ωθ0∗ = θ0′

∗ Ωθ0∗ = Constant. (3.2)

Note that the CH effects are nullified in the linear combination $\theta^{0\prime}_* Y_{t+1}$, while the individual returns $Y_{i,t+1}$ ($i = 1, \ldots, n$) display CH volatility. The equalities in (3.2) lead to the following moment conditions:
\[
E\left[(z_t - \mu_z)\left(\theta'_* Y_{t+1} Y_{t+1}' \theta_*\right)\right] = 0_{H \times 1} \quad \text{when } \theta_* = \theta^0_* \in \mathbb{R}^n,\ \theta^0_* \ne 0, \tag{3.3}
\]
where $\mu_z$ denotes the population mean of $z_t$.7 Given the restrictions in (3.3), one can use GMM to estimate the common feature $\theta^0_*$ and conduct inference about the validity of the moment conditions.
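The nullification in (3.2) is easy to verify numerically. The following Python sketch (using NumPy; the loading matrix and portfolio weights are illustrative choices, not from the paper) checks that any $\theta$ with $\theta'\Lambda = 0$ removes the CH term from the portfolio's conditional variance:

```python
import numpy as np

# Illustrative check of (3.2): n = 2 assets, p = 1 factor.
# Lambda and theta are hypothetical; theta is chosen so that theta' Lambda = 0.
Lambda = np.array([[1.0], [0.5]])       # n x p loading matrix
Omega = np.eye(2)                       # idiosyncratic covariance
theta = np.array([0.5, -1.0])
assert np.allclose(theta @ Lambda, 0.0)

rng = np.random.default_rng(0)
for _ in range(5):
    D_t = np.diag(rng.uniform(0.5, 2.0, size=1))   # a random draw of the CH factor variance
    cond_var = theta @ (Lambda @ D_t @ Lambda.T + Omega) @ theta
    # The factor term is annihilated: the conditional variance equals theta' Omega theta.
    assert np.isclose(cond_var, theta @ Omega @ theta)
print(theta @ Omega @ theta)   # 1.25, constant across t
```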

DR have shown that GMM inference using (3.3) is subject to the issue of zero Jacobian moments. The GMM estimator based on (3.3) can therefore converge as slowly as $T^{-1/4}$, with a nonstandard limiting distribution. As explained earlier, the J-test based on (3.3) has an asymptotic mixture of two different chi-square distributions, $\chi^2(H-p)$ and $\chi^2(H)$. Following DR's empirical suggestion of using critical values based on $\chi^2(H)$ rather than $\chi^2(H-p)$ provides conservative size control. We show that it is possible to construct $T^{1/2}$-consistent and asymptotically normally distributed GMM estimators and non-conservative J-tests by applying the theory developed in Section 2 to this common CH factor model.

Following DR, we assume that exclusion restrictions characterize a set $\bar\Theta_* \subset \mathbb{R}^n$ that contains at most one unknown common feature $\theta^0_*$ up to a normalization condition denoted by $\mathcal{N}$.

Assumption 3.5 We have $\theta_* \in \Theta_* \subset \mathbb{R}^n$ such that $\Theta_* = \bar\Theta_* \cap \mathcal{N}$ is a compact set and
\[
\left(\theta_* \in \Theta_* \text{ and } \theta'_* \Lambda = 0_{1 \times p}\right) \Leftrightarrow \left(\theta_* = \theta^0_*\right). \tag{3.4}
\]

The non-zero restriction on $\theta^0_*$ could be imposed in several ways. For example, the unit cost condition $\mathcal{N} = \{\theta \in \mathbb{R}^n : \sum_{i=1}^n \theta_i = 1\}$ can be maintained without loss of generality.8 To implement this restriction, we define an $n \times (n-1)$ matrix $G_2$ as
\[
G_2 = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1 \\
-1 & -1 & \cdots & -1
\end{pmatrix}_{n \times (n-1)}
= \begin{pmatrix} I_{n-1} \\ -1_{1 \times (n-1)} \end{pmatrix}_{n \times (n-1)}.
\]


Then for any $\theta_* \in \Theta_*$,
\[
\theta_* = \left(\theta_1, \ldots, \theta_{n-1},\, 1 - \sum_{i=1}^{n-1}\theta_i\right)' = \left(\theta',\, 1 - \sum_{i=1}^{n-1}\theta_i\right)' = G_2\theta + l_n, \tag{3.5}
\]
where $\theta = (\theta_1, \ldots, \theta_{n-1})'$ is an $(n-1)$-dimensional real vector and $l_n = (0, \ldots, 0, 1)'$ is an $n \times 1$ vector. Hence, we can write
\[
\Theta_* = \{\theta_* : \theta_* = G_2\theta + l_n,\ \forall \theta \in \Theta\}, \tag{3.6}
\]
where $\Theta$ is a non-degenerate subset of $\mathbb{R}^{n-1}$.
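A minimal numerical sketch of the reparametrization (3.5), assuming nothing beyond the definitions of $G_2$ and $l_n$ above (the numerical values of $\theta$ are arbitrary):

```python
import numpy as np

def G2_matrix(n: int) -> np.ndarray:
    """The n x (n-1) matrix stacking I_{n-1} on top of a row of -1's."""
    return np.vstack([np.eye(n - 1), -np.ones((1, n - 1))])

n = 4
theta = np.array([0.3, 0.2, 0.1])        # free (n-1)-dimensional parameter
l_n = np.zeros(n)
l_n[-1] = 1.0                            # l_n = (0, ..., 0, 1)'
theta_star = G2_matrix(n) @ theta + l_n  # reparametrization (3.5)

# theta_star = (theta_1, ..., theta_{n-1}, 1 - sum theta_i)'
assert np.allclose(theta_star[:-1], theta)
assert np.isclose(theta_star.sum(), 1.0)  # unit cost condition holds by construction
print(theta_star)  # [0.3 0.2 0.1 0.4]
```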

Remark 3.1. Under the unit cost condition, the identification condition (3.4) requires that
\[
\Lambda' G_2 \theta_0 = -\Lambda' l_n. \tag{3.7}
\]
The necessary and sufficient condition for the existence of a solution $\theta_0$ to equation (3.7) is that the matrices $\Lambda' G_2$ and $(\Lambda' G_2, -\Lambda' l_n)$ have the same rank. Moreover, when the solution $\theta_0$ exists, it is unique when the matrix $\Lambda' G_2$ has rank $n-1$. These findings, together with the special structure of $G_2$ and $l_n$, imply the following lemma.

Lemma 3.1 Suppose that Assumptions 3.1, 3.2 and 3.3 hold. Then:
(i) the moment conditions (3.3) hold if and only if there is a $\theta^0_*$ such that $\theta^{0\prime}_* \Lambda = 0_{1 \times p}$;
(ii) there exists a $\theta^0_*$ such that $\theta^{0\prime}_* \Lambda = 0_{1 \times p}$ if and only if
\[
\mathrm{rank}(\Lambda' G_2) = p; \tag{3.8}
\]
(iii) the identification condition (3.4) holds if and only if $\Lambda' G_2$ is invertible; in that case, we have
\[
\theta^0_* = \left(\theta_0',\, 1 - \sum_{i=1}^{n-1}\theta_{0,i}\right)' = G_2\theta_0 + l_n,
\]
where $\theta_0 = -(\Lambda' G_2)^{-1}\Lambda' l_n$.
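When $\Lambda' G_2$ is invertible, the formula in Lemma 3.1(iii) delivers the common feature in closed form. A quick numerical illustration in Python (using the loading matrix of DGP D2.1 from Table 5.1, where $n = 3$ and $p = 2$):

```python
import numpy as np

# Loadings taken from DGP D2.1 in Table 5.1 (n = 3, p = 2).
n, p = 3, 2
Lambda = np.array([[1.0, 0.0],
                   [1.0, 1.0],
                   [0.5, 0.5]])
G2 = np.vstack([np.eye(n - 1), -np.ones((1, n - 1))])
l_n = np.zeros(n)
l_n[-1] = 1.0

A = Lambda.T @ G2                           # p x (n-1), square here since p = n-1
theta0 = -np.linalg.solve(A, Lambda.T @ l_n)  # theta_0 = -(Lambda'G2)^{-1} Lambda' l_n
theta0_star = G2 @ theta0 + l_n

# The implied portfolio annihilates the factor loadings and obeys the unit cost condition.
assert np.allclose(theta0_star @ Lambda, 0.0)
assert np.isclose(theta0_star.sum(), 1.0)
print(theta0_star)   # [ 0. -1.  2.]
```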

Remark 3.2. Lemma 3.1 is useful for understanding the properties of the over-identification test when it is used for testing the existence of a common feature. Essentially, the over-identification test based on the moment conditions (3.3) enables one to test the following hypotheses:
\[
H_0: \text{there exists a unique common feature in } \Theta_*, \quad \text{vs.} \quad H_1: \text{no common feature in } \Theta_* \text{ exists among the assets } Y_{t+1}. \tag{3.9}
\]
By Lemma 3.1.(i), we know that under the null the moment conditions in (3.3) are valid. Lemma 3.1.(iii) further characterizes the identified common feature under the null. Under the alternative, when there is no common feature, $\Lambda$ is an invertible matrix, which implies that $\mathrm{rank}(\Lambda' G_2) = n - 1 < p$. Hence by Lemma 3.1.(i)-(ii), the moment conditions in (3.3) are invalid.

Remark 3.3. It is possible that the true data generating process (DGP) is nested by neither the null nor the alternative hypothesis in (3.9). There are two cases to consider: (i) the unit cost condition is misspecified; (ii) there are multiple common features among the assets $Y_{t+1}$. When the unit cost condition is misspecified, there is no $\theta^0_* \in \Theta_*$ such that $\theta^{0\prime}_* \Lambda = 0_{1 \times p}$, which together with Lemma 3.1.(i) implies that the moment conditions (3.3) are invalid and the over-identification test may lead to rejection even if a common feature exists. Next, suppose that the unit cost condition is correctly specified, but there are multiple common features among $Y_t$, i.e., $p < n - 1$. Then there are multiple $\theta^0_*$'s in $\Theta_*$ such that the moment conditions (3.3) hold. In this case, the common feature is not uniquely identified. The asymptotic distribution of the over-identification test in such a case is difficult to study and is left for future research.9

Remark 3.4. One possible way of testing the existence of the common features with $p < n - 1$ is to use only (the first) $p + 1$ assets out of the $n$ assets $Y_{t+1}$ and then construct the J-tests proposed below in this section. As long as the factor structure holds for the selected $p+1$ assets,10 the J-tests control the size under $H_0$ in (3.9) and have power in general against $H_1$ in (3.9). Let $\theta_{0,j}$ ($j = 1, \ldots, n$) denote the $j$-th component of $\theta_0$. This procedure essentially imposes $n - p - 1$ extra restrictions (in addition to the unit cost restriction), i.e., $\theta_{0,j} = 0$ for $j = p+2, \ldots, n$, to ensure the unique identification of a common feature in $\Theta_*$.

Assumption 3.6 The vector θ0 belongs to the interior of Θ.

Following the notation introduced in Section 2, we define
\[
\psi_t(\theta) \equiv (z_t - \mu_z)\,\theta'_*\left[Y_{t+1}Y_{t+1}' - E(Y_{t+1}Y_{t+1}')\right]\theta_* \quad \text{and} \quad g_t(\theta) \equiv \mathrm{vec}\left[\left(\frac{1}{2}\frac{\partial \psi_t(\theta)}{\partial \theta'}\right)'\right] \tag{3.10}
\]
for any $\theta \in \Theta$ and any $t$, where the relation between $\theta_*$ and $\theta$ is specified in (3.5).11 Then by definition, we have
\[
\psi(\theta) = E[\psi_t(\theta)] \quad \text{and} \quad g(\theta) = E[g_t(\theta)] \quad \text{for any } \theta \in \Theta.
\]

Under the integrability condition in Assumption 3.3, the nullity of the moment Jacobian occurs at a true common feature $\theta_0$ in (3.3) because
\[
\Gamma(\theta_0) = E\left[(z_t - \mu_z)\,\theta^{0\prime}_* Y_{t+1}Y_{t+1}' G_2\right] = 0_{H \times p}. \tag{3.11}
\]


We consider using both the restrictions in (3.3) and (3.11) by stacking them:
\[
m(\theta_0) \equiv E[m_t(\theta_0)] \equiv E\begin{bmatrix} \psi_t(\theta_0) \\ g_t(\theta_0) \end{bmatrix} = 0_{(pH+H) \times 1}, \tag{3.12}
\]
which are the moment conditions implied by $H_0$ in (3.9). As discussed in the previous section, the first-order local identification of $\theta_0$ could be achieved in (3.12) if we could show that the following matrix has full column rank:
\[
\frac{\partial m(\theta_0)}{\partial \theta'} = E\begin{bmatrix} \frac{\partial \psi_t(\theta_0)}{\partial \theta'} \\[4pt] \frac{\partial g_t(\theta_0)}{\partial \theta'} \end{bmatrix} = \begin{bmatrix} 0_{H \times p} \\ H_{pH \times p} \end{bmatrix}.
\]

Lemma 3.2 Under Assumptions 3.1, 3.3 and (3.4), the matrix $H$ has full rank.

Lemma 3.2 shows that $\theta^0_*$ is (first-order) locally identified by the stacked moment conditions. The local identification comes from the zero Jacobian matrix, which actually contains more information than is needed for local identification. We next show that if $\theta^0_*$ is uniquely identified by (3.3), it is also uniquely identified by (3.11).

Lemma 3.3 Under Assumptions 3.1, 3.3 and (3.4), $\theta^0_*$ is uniquely identified by (3.11).

By Lemmas 3.2 and 3.3, $\theta^0_*$ is not only (first-order) locally identified, but also globally identified by the moment conditions in (3.11). As a result, one may use only these moment conditions to estimate the common feature $\theta^0_*$. The moment conditions in (3.11) are linear in $\theta$, which makes the corresponding GMM estimators easy to compute. The GMM estimator based on the stacked moment conditions may be more efficient, as illustrated in Proposition 2.1. However, its computation may be costly, particularly when the dimension of $\theta^0_*$ is high.

Lemmas 3.2 and 3.3 imply that the moment conditions in (3.11) are valid and identify a unique common feature $\theta^0_*$ under the null hypothesis $H_0$ in (3.9). Hence the J-test based on (3.11) is expected to control the (asymptotic) size. We next show that, in general, this test also has power against the hypothesis that no common feature exists among $Y_{t+1}$.

Lemma 3.4 Suppose that $\Lambda$ is an invertible square matrix and Assumptions 3.2 and 3.3 hold. If $\lambda_j' G_2 \ne 0_{1 \times (n-1)}$ for any $j \in \{1, \ldots, p\}$, then there is no $\theta^0_* \in \Theta_*$ such that (3.11) holds.

By the definition of $G_2$, the left eigenvector of the zero eigenvalue of $G_2$ takes the form $a 1_{1 \times n}$ for any $a \ne 0$. Hence the restriction $\lambda_j' G_2 \ne 0_{1 \times (n-1)}$ for any $j \in \{1, \ldots, p\}$, together with the full rank of $\Lambda$, is equivalent to the condition that there is no $\lambda_j$ ($j \in \{1, \ldots, p\}$) such that $\lambda_j = a_j 1_{n \times 1}$ for some $a_j \in \mathbb{R}$.

Using the sample average $\bar{z}$ of $z_t$ as the estimator of $\mu_z$, we construct the feasible moment functions as
\[
m_t(\theta) \equiv \begin{bmatrix} \psi_t(\theta) \\ g_t(\theta) \end{bmatrix} \equiv \begin{bmatrix} (z_t - \bar{z})\left(\theta'_* Y_{t+1}Y_{t+1}'\theta_*\right) \\ ((z_t - \bar{z}) \otimes I_p)\, G_2' Y_{t+1}Y_{t+1}'\theta_* \end{bmatrix}. \tag{3.13}
\]


The GMM estimator $\theta_{m,T}$ is defined as
\[
\theta_{m,T} = \arg\min_{\theta \in \Theta}\, T^{-1}\left[\sum_{t=1}^T m_t(\theta)\right]' W_{m,T} \left[\sum_{t=1}^T m_t(\theta)\right], \tag{3.14}
\]
where $W_{m,T}$ is the inverse of an estimator of the asymptotic variance of $T^{-1/2}\sum_{t=1}^T m_t(\theta_0)$ defined in (3.17) below. In Appendix D, we show that under Assumptions 3.1–3.6, Proposition 2.1(i) holds for the GMM estimator $\theta_{m,T}$.

From $\theta_* = G_2\theta + l_n$, we can write
\[
T^{-1}\sum_{t=1}^T g_t(\theta) = H_T \theta + S_T,
\]
where we define
\[
H_T \equiv T^{-1}\sum_{t=1}^T ((z_t - \bar{z}) \otimes I_p)\, G_2' Y_{t+1}Y_{t+1}' G_2 \quad \text{and} \quad S_T \equiv T^{-1}\sum_{t=1}^T ((z_t - \bar{z}) \otimes I_p)\, G_2' Y_{t+1}Y_{t+1}' l_n. \tag{3.15}
\]

Given the weight matrix $W_{g,T}$, we can compute the GMM estimator $\theta_{g,T}$ as
\[
\theta_{g,T} = -\left(H_T' W_{g,T} H_T\right)^{-1} H_T' W_{g,T} S_T. \tag{3.16}
\]
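Since (3.15) and (3.16) are in closed form, computing $\theta_{g,T}$ reduces to building two sample averages and solving a linear system. A Python sketch of the mechanics (the data are placeholder i.i.d. draws with no factor structure, used only to illustrate the dimensions; $W_{g,T}$ is set to the identity):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 500, 3
p = n - 1                               # case of a single common feature (p = n - 1)
Y = rng.standard_normal((T + 1, n))     # placeholder returns, for shapes only
z = Y[:-1] ** 2                         # H = n instruments: squared lagged returns
zbar = z.mean(axis=0)
G2 = np.vstack([np.eye(n - 1), -np.ones((1, n - 1))])
l_n = np.zeros(n)
l_n[-1] = 1.0

# Sample averages H_T and S_T from (3.15).
H_T = np.zeros((n * p, p))
S_T = np.zeros(n * p)
for t in range(T):
    YY = np.outer(Y[t + 1], Y[t + 1])
    K = np.kron((z[t] - zbar)[:, None], np.eye(p))   # (z_t - zbar) x I_p, an Hp x p matrix
    H_T += K @ G2.T @ YY @ G2 / T
    S_T += K @ G2.T @ YY @ l_n / T

# (3.16) with identity weight: a least-squares formula.
theta_g = -np.linalg.solve(H_T.T @ H_T, H_T.T @ S_T)
print(theta_g.shape)   # (2,)
```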

Let $\Omega_{\psi,T}$, $\Omega_{g\psi,T}$ and $\Omega_{g,T}$ be the estimators of $\Omega_\psi$, $\Omega_{g\psi}$ and $\Omega_g$, respectively. For example, we can construct
\[
\Omega_{m,T} = T^{-1}\sum_{t=1}^T \left(m_t(\theta_{g,T}) - \bar{m}_T(\theta_{g,T})\right)\left(m_t(\theta_{g,T}) - \bar{m}_T(\theta_{g,T})\right)', \tag{3.17}
\]
where $\theta_{g,T} = -(H_T' H_T)^{-1} H_T' S_T$, $\bar{m}_T(\theta) = T^{-1}\sum_{t=1}^T m_t(\theta)$,
\[
m_t(\theta) = \begin{bmatrix} (z_t - \bar{z})\left(\theta'_*\left(Y_{t+1}Y_{t+1}' - T^{-1}\sum_{t=1}^T Y_{t+1}Y_{t+1}'\right)\theta_*\right) \\ ((z_t - \bar{z}) \otimes I_p)\, G_2'\left(Y_{t+1}Y_{t+1}' - T^{-1}\sum_{t=1}^T Y_{t+1}Y_{t+1}'\right)\theta_* \end{bmatrix},
\]
and $\Omega_{\psi,T}$, $\Omega_{g\psi,T}'$ and $\Omega_{g,T}$ are the leading $H \times H$, upper-right $H \times Hp$ and last $Hp \times Hp$ submatrices of $\Omega_{m,T}$, respectively.

The modified GMM estimator $\theta_{g*,T}$ as in (2.7) can also be obtained as
\[
\theta_{g*,T} = -\left(H_T' W_{g*,T} H_T\right)^{-1} H_T' W_{g*,T}(S_T - F_T), \tag{3.18}
\]
where $W_{g*,T}^{-1} = \Omega_{g,T} - \Omega_{g\psi,T}\Omega_{\psi,T}^{-1}\Omega_{g\psi,T}'$ as earlier, and
\[
F_T = \Omega_{g\psi,T}\Omega_{\psi,T}^{-1} A_T, \quad A_T = T^{-1}\sum_{t=1}^T (z_t - \bar{z})\left(\left(G_2\theta_{g,T} + l_n\right)' Y_{t+1}Y_{t+1}'\left(G_2\theta_{g,T} + l_n\right)\right).
\]


In Appendix D, we show that under Assumptions 3.1, 3.2, 3.3, 3.4 and (3.4), Proposition 2.1(i) and Theorem 2.1 hold for $\theta_{g,T}$ and $\theta_{g*,T}$ respectively. The closed-form expressions of $\theta_{g,T}$ and $\theta_{g*,T}$ enable us to show their $\sqrt{T}$ asymptotic normality without the compactness assumption on $\Theta_*$.

After the GMM estimators $\theta_{m,T}$, $\theta_{g,T}$ and $\theta_{g*,T}$ are obtained, we can use the J-test statistics defined in (2.8), (2.9) and (2.10) to conduct inference about the existence of the common feature $\theta^0_*$. The test based on $J_{g,T}$ is the easiest one to use in practice because $\theta_{g,T}$ has a closed-form solution and $J_{g,T}$ has an asymptotically pivotal distribution. The test using $J_{h,T}$ is also convenient, although one has to simulate the critical value. The test using $J_{m,T}$ is not easy to apply when the dimension of the parameter $\theta^0_*$ is high.
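Simulating the critical value of $J_{h,T}$ is inexpensive, since the asymptotic distribution is a quadratic form $Z'MZ$ in standard normals (the paper's notes describe a Hadamard-product implementation in Matlab). A NumPy sketch of the same idea, with $M$ standing in for the estimated matrix $M_T$:

```python
import numpy as np

def simulate_quantile(M: np.ndarray, alpha: float = 0.05, R: int = 250_000,
                      seed: int = 0) -> float:
    """Simulate the (1 - alpha) quantile of Z'MZ with Z ~ N(0, I_d).

    Vectorized Hadamard-product trick: each row b of B contributes the
    quadratic form b'Mb, computed as ((B @ M) * B).sum(axis=1).
    """
    rng = np.random.default_rng(seed)
    d = M.shape[0]
    B = rng.standard_normal((R, d))
    draws = ((B @ M) * B).sum(axis=1)
    return float(np.quantile(draws, 1.0 - alpha))

# Sanity check against a known case: M = I_d gives a chi-square(d) quantile.
# For d = 2, the 0.95 quantile of chi-square(2) is about 5.99.
q = simulate_quantile(np.eye(2))
assert abs(q - 5.99) < 0.1
```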

4 Simulation Studies

In this section, we study the finite sample performance of the proposed GMM estimators and J-tests using the CH factor model. Specifically, $Y_{t+1}$ is an $n \times 1$ vector of asset returns generated from the following model:
\[
Y_{t+1} = \Lambda F_{t+1} + u_{t+1}, \tag{4.1}
\]
where $\Lambda$ is an $n \times p$ matrix of factor loadings, $u_{t+1}$ is an $n \times 1$ vector of error terms, and $F_{t+1}$ is a $p \times 1$ vector of factors ($f_{l,t+1}$, $l = 1, \ldots, p$) generated from a GARCH model:
\[
f_{l,t+1} = \sigma_{l,t}\varepsilon_{l,t+1} \quad \text{and} \quad \sigma^2_{l,t} = \omega_l + \alpha_l f^2_{l,t} + \beta_l \sigma^2_{l,t-1}, \tag{4.2}
\]
where $\varepsilon_t = (\varepsilon_{1,t}, \ldots, \varepsilon_{p,t})'$ is independent of $u_s$ for any $t$ and $s$, and $u_{t+1}$ and $\varepsilon_{t+1}$ are i.i.d. from $N(0, I_n)$ and $N(0, 0.5 I_p)$ respectively. The finite sample properties of the GMM estimators and the inference procedures are investigated with 50,000 simulated samples.12
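The DGP in (4.1)-(4.2) can be sketched in a few lines of Python. The helper below follows the text (GARCH innovations with variance 0.5, standard normal errors, and the burn-in of footnote 12); the initialization of $\sigma^2_{l,0}$ is our own choice, and the function name is ours:

```python
import numpy as np

def simulate_ch_factor_model(Lambda, garch_params, T, burn=2500, seed=0):
    """Simulate Y_{t+1} = Lambda F_{t+1} + u_{t+1} with GARCH(1,1) factors, as in (4.1)-(4.2).

    garch_params is a list of (omega_l, alpha_l, beta_l) tuples, one per factor.
    Following the text, u_t is N(0, I_n) and the GARCH innovations have variance 0.5.
    """
    rng = np.random.default_rng(seed)
    n, p = Lambda.shape
    omega, alpha, beta = (np.array(v) for v in zip(*garch_params))
    sigma2 = omega / (1.0 - alpha - beta)   # initialize near the unconditional level (our choice)
    f_prev = np.zeros(p)
    Y = np.empty((T + burn, n))
    for t in range(T + burn):
        sigma2 = omega + alpha * f_prev ** 2 + beta * sigma2   # recursion (4.2)
        f_prev = np.sqrt(sigma2) * np.sqrt(0.5) * rng.standard_normal(p)
        Y[t] = Lambda @ f_prev + rng.standard_normal(n)        # equation (4.1)
    return Y[burn:]   # drop the burn-in, as in footnote 12

# DGP D1.1 from Table 5.1: n = 2, p = 1, Lambda = (1, 1/2)'.
Y = simulate_ch_factor_model(np.array([[1.0], [0.5]]), [(0.2, 0.2, 0.6)], T=1000)
print(Y.shape)   # (1000, 2)
```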

We consider the portfolio $(\theta_0',\, 1 - \sum_{i=1}^{n-1}\theta_{0,i})'$ where $\theta_0$ is an $(n-1) \times 1$ real vector. There are two sets of moment conditions for estimating $\theta$. The moment conditions proposed in DR are:
\[
E\left[(z_{t+1} - \mu_z)\left|Y_{n,t+1} + \sum_{i=1}^{n-1}\theta_{0,i}(Y_{i,t+1} - Y_{n,t+1})\right|^2\right] = 0, \tag{4.3}
\]
where $z_{t+1} = (Y^2_{1,t}, \ldots, Y^2_{n,t})'$, and the moment conditions defined using the Jacobian of the moment functions in (4.3) are:
\[
E\left[(z_{t+1} - \mu_z)(Y_{i,t+1} - Y_{n,t+1})\left(Y_{n,t+1} + \sum_{j=1}^{n-1}\theta_{0,j}(Y_{j,t+1} - Y_{n,t+1})\right)\right] = 0 \tag{4.4}
\]
for $i = 1, \ldots, n-1$. In addition to the squares of the lagged asset returns $Y^2_{i,t}$ ($i = 1, \ldots, n$), we include their cross-product terms $Y_{i,t}Y_{j,t}$ ($i, j = 1, \ldots, n$ and $i \ne j$) as extra IVs and hence augment $z_{t+1}$ to $\tilde{z}_{t+1} = (z_{t+1}', Y_{1,t}Y_{2,t}, \ldots, Y_{1,t}Y_{n,t}, \ldots, Y_{n-1,t}Y_{n,t})'$.13 We consider GMM estimation and inference with the IVs $z_{t+1}$ and $\tilde{z}_{t+1}$ respectively in the simulation studies.
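The two instrument sets can be generated mechanically from the lagged return vector. A small Python helper (the function name is ours, introduced for illustration):

```python
import numpy as np
from itertools import combinations

def instruments(Y_t: np.ndarray, augmented: bool = False) -> np.ndarray:
    """IVs built from the lagged return vector Y_t.

    Basic set: squared returns (Y_{1,t}^2, ..., Y_{n,t}^2), as below (4.3).
    Augmented set: additionally all cross-products Y_{i,t} Y_{j,t}, i < j.
    """
    z = Y_t ** 2
    if augmented:
        cross = np.array([Y_t[i] * Y_t[j]
                          for i, j in combinations(range(len(Y_t)), 2)])
        z = np.concatenate([z, cross])
    return z

Y_t = np.array([0.5, -1.0, 2.0])                 # one lagged return vector, n = 3
print(instruments(Y_t).shape)                    # (3,)  basic set
print(instruments(Y_t, augmented=True).shape)    # (6,)  augmented set
```

For $n = 3$ this reproduces the counts in the text: 3 IVs with $z_{t+1}$ and 6 IVs with $\tilde{z}_{t+1}$.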

We consider $n = 2$ and $n = 3$ separately in the simulation studies. The proposed J-tests are compared with the bootstrap procedures suggested in DG when $n = 2$. In that case, the portfolio $(\theta_0, 1 - \theta_0)'$ has only one unknown parameter to be estimated, which makes the bootstrap procedures less time consuming. When $n = 3$, the number of moment conditions in (4.3) is 3 if the set of IVs $z_{t+1}$ is used, and the number of moment conditions increases to 18 if both the moment conditions in (4.3) and (4.4) are used with the augmented set of IVs $\tilde{z}_{t+1}$. This is useful for investigating the finite sample properties of the proposed GMM estimators and J-tests with many moment conditions.

Four GMM estimators are studied: (i) the optimally weighted GMM estimator $\theta_{\psi,T}$ in (2.2), denoted the Null-GMM estimator;14 (ii) the GMM estimator $\theta_{m,T}$ in (3.14), denoted the Eff-GMM estimator; (iii) the GMM estimator $\theta_{g*,T}$ in (3.18), denoted the Feff-GMM estimator; and (iv) the GMM estimator $\theta_{g,T}$ in (3.16), denoted the Jac-GMM estimator. Five inference procedures are investigated for both $n = 2$ and $n = 3$ with nominal size $\alpha = 0.05$. In addition to the J-tests $J_{m,T}$, $J_{g,T}$ and $J_{h,T}$ proposed in this paper,15 we also consider the J-tests C1 and C2 based on $J_{\psi,T}$ with critical values $\chi^2_{1-\alpha}(H - p)$ and $\chi^2_{1-\alpha}(H)$ respectively. The bootstrap procedures in DG are investigated only under $n = 2$: their corrected bootstrap test and continuously-corrected bootstrap test are denoted B1 and B2 respectively.16 The number of moment conditions and the degrees of freedom of the asymptotic distributions of the J-tests $J_{m,T}$, $J_{g,T}$, $J_{h,T}$, C1 and C2 are summarized in Table E.1 in Appendix E.

4.1 Simulation results with n = 2

Four DGPs (labeled D1.1, D1.2, D1.3 and D1.4 respectively) are considered when $n = 2$. The parametrization of these DGPs is summarized in Table 5.1. The null hypothesis $H_0$ in (3.9) holds under D1.1 and D1.2, while the alternative hypothesis $H_1$ holds under D1.3 and D1.4.

The finite sample bias, standard deviation and root mean square error (RMSE) of the four GMM estimators in D1.1 and D1.2 are reported in Figures A.1 and A.2 in Appendix E, respectively. From the three left panels (i.e., the top-left, middle-left and bottom-left panels) of Figure A.1, we see that: (i) with the growth of the sample size, the biases of the Eff-GMM, Jac-GMM and Feff-GMM estimators go to zero much faster than that of the Null-GMM estimator; (ii) the standard deviation of the Null-GMM estimator is overall smaller than those of the other three GMM estimators in this DGP; (iii) the GMM estimators that take the zero Jacobian into estimation do not seem to dominate the Null-GMM estimator in terms of RMSE when the sample size is not large enough; (iv) the finite sample properties of the Eff-GMM and Feff-GMM estimators are very similar when the sample size becomes large (e.g., $T = 5{,}000$). From the three right panels (i.e., the top-right, middle-right and bottom-right panels) of Figure A.1, we see that including more moment conditions in GMM estimation slightly increases the bias of the GMM estimators, but reduces their variances.

The finite sample properties of the GMM estimators which take the zero Jacobian into estimation in D1.1 indicate that the additional local identification strength provided by the Jacobian moment conditions is rather weak.17

Table 5.1: Parametrization of the DGPs

For D1.1 – D1.4:

D1.1: $p = 1$, $\Lambda = (1, 1/2)'$, $(\omega_1, \alpha_1, \beta_1) = (0.2, 0.2, 0.6)$.
D1.2: $p = 1$, $\Lambda = (1, 1/2)'$, $(\omega_1, \alpha_1, \beta_1) = (0.2, 0.4, 0.4)$.
D1.3: $p = 2$, $\Lambda = I_2$, $(\omega_1, \alpha_1, \beta_1) = (0.2, 0.2, 0.6)$, $(\omega_2, \alpha_2, \beta_2) = (0.2, 0.4, 0.4)$.
D1.4: $p = 2$, $\Lambda = \begin{pmatrix} 1 & 1/2 \\ 1/2 & 0 \end{pmatrix}$, $(\omega_1, \alpha_1, \beta_1) = (0.2, 0.4, 0.4)$, $(\omega_2, \alpha_2, \beta_2) = (0.2, 0.6, 0.2)$.

For D2.1 – D2.4:

D2.1: $p = 2$, $\Lambda = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1/2 & 1/2 \end{pmatrix}$, $(\omega_1, \alpha_1, \beta_1) = (0.2, 0.2, 0.6)$, $(\omega_2, \alpha_2, \beta_2) = (0.2, 0.4, 0.4)$.
D2.2: $p = 2$, $\Lambda = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1/2 & 1/2 \end{pmatrix}$, $(\omega_1, \alpha_1, \beta_1) = (0.2, 0.4, 0.4)$, $(\omega_2, \alpha_2, \beta_2) = (0.2, 0.5, 0.3)$.
D2.3: $p = 3$, $\Lambda = I_3$, $(\omega_1, \alpha_1, \beta_1) = (0.2, 0.2, 0.6)$, $(\omega_2, \alpha_2, \beta_2) = (0.2, 0.4, 0.4)$, $(\omega_3, \alpha_3, \beta_3) = (0.1, 0.1, 0.8)$.
D2.4: $p = 3$, $\Lambda = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1/2 & 1/2 & 0 \end{pmatrix}$, $(\omega_1, \alpha_1, \beta_1) = (0.2, 0.4, 0.4)$, $(\omega_2, \alpha_2, \beta_2) = (0.2, 0.5, 0.3)$, $(\omega_3, \alpha_3, \beta_3) = (0.1, 0.6, 0.2)$.

The identification strength of the Jacobian moment conditions

is slightly increased in D1.2, which leads to significant improvements of the GMM estimators $\theta_{m,T}$, $\theta_{g,T}$ and $\theta_{g*,T}$, as we can see in Figure A.2. The finite sample properties of $\theta_{\psi,T}$ are also improved in D1.2, which may be due to the strengthening of the second-order local identification with a larger $\lambda_{\min}$. Specifically, in Figure A.2 we see that: (i) the finite sample bias and variance of $\theta_{m,T}$, $\theta_{g,T}$ and $\theta_{g*,T}$ converge to zero very fast with the growth of the sample size; (ii) the finite sample biases of $\theta_{m,T}$, $\theta_{g,T}$ and $\theta_{g*,T}$ are overall smaller than that of the Null-GMM estimator, and their finite sample variances are also smaller when the sample size is reasonably large (e.g., $T = 3{,}000$); (iii) including more IVs in GMM estimation clearly improves the finite sample properties of $\theta_{m,T}$, $\theta_{g,T}$ and $\theta_{g*,T}$ in terms of RMSE, but has almost no effect on the RMSE of the Null-GMM estimator; (iv) the RMSEs of $\theta_{m,T}$, $\theta_{g,T}$ and $\theta_{g*,T}$ become smaller than that of the Null-GMM estimator when the sample size is slightly large.

We next investigate the finite sample properties of the inference procedures under D1.1 – D1.4. The results are presented in Figure 5.1. We have the following findings. First, the empirical rejection probabilities of the J-tests $J_{m,T}$ (Eff-GMM), $J_{g,T}$ (Jac-GMM) and $J_{h,T}$ (Sim-GMM) converge to the nominal size slowly under D1.1, while they converge much faster under D1.2. The test $J_{m,T}$ shows over-rejection in D1.1 even when the sample size is relatively large. The tests $J_{g,T}$ and $J_{h,T}$ control size well in both D1.1 and D1.2. The slow convergence of the empirical rejection probabilities of the J-tests $J_{m,T}$, $J_{g,T}$ and $J_{h,T}$ in D1.1 is due to the weak local identification of the Jacobian moment conditions. Second, the tests C1 and C2 show over-rejection and under-rejection respectively in both D1.1 and D1.2 when the set of IVs $z_{t+1}$ is used. This is expected from the asymptotic analysis in DR. However, when more IVs, i.e. $\tilde{z}_{t+1}$, are used in the GMM estimation, the tests C1 and C2 are both under-sized in D1.2 even with a large sample size. This is rather surprising, because increasing the number of moment conditions (from 2 with $z_{t+1}$ to 3 with $\tilde{z}_{t+1}$) in D1.1 does not make the test C1 severely under-sized, and one would expect similar results in D1.2 as the local identification of the Null-GMM estimator is improved in this DGP. This suggests that the finite sample distribution of $J_{\psi,T}$ may deserve investigation beyond the limit theory in DR. Third, the bootstrap inference B2 is overall under-sized in both D1.1 and D1.2. In D1.2, B2 is even more conservative than the conservative test C2 proposed in DR. The empirical rejection probability of the bootstrap inference B1 is closer to the nominal size, although it is also under-sized in D1.2 (particularly when more IVs are used in GMM). Fourth, all the inference procedures have non-trivial power in D1.3 and D1.4. The power of the tests $J_{m,T}$, $J_{g,T}$ and $J_{h,T}$ is better than that of the bootstrap tests B1 and B2. From their parametrizations, we see that D1.4 is a local perturbation of D1.2. The under-rejection of the tests based on C1, C2, B1 and B2 in D1.2 leads to power loss in D1.4. Fifth, the condition $\lambda_j' G_2 \ne 0_{1 \times (n-1)}$ for any $j \in \{1, \ldots, p\}$ in Lemma 3.4 holds under D1.3 and D1.4. Hence the Jacobian moment conditions are misspecified in these DGPs, which explains the good power properties of the J-test $J_{g,T}$.

4.2 Simulation results with n = 3

Four DGPs (labeled D2.1, D2.2, D2.3 and D2.4 respectively) are considered when $n = 3$; their parametrizations are also summarized in Table 5.1. The null hypothesis $H_0$ in (3.9) holds under D2.1 and D2.2, while the alternative hypothesis $H_1$ holds under D2.3 and D2.4.

The finite sample properties of the GMM estimators of $(\theta_{0,1}, \theta_{0,2})$ in D2.1 and D2.2 are summarized in Figures A.3–A.6 of Appendix E. The local identification provided by the Jacobian moment conditions is slightly improved in D2.2 when compared with D2.1. From Figures A.3 – A.6, we see that: (i) with the growth of the sample size, the RMSEs of $\theta_{m,T}$, $\theta_{g,T}$ and $\theta_{g*,T}$ converge to zero much faster than that of the Null-GMM estimator; (ii) increasing the number of moment conditions leads to an increase of finite sample bias, a decrease of variance and a decrease of the RMSEs of the GMM estimators in D2.1 and D2.2; (iii) the finite sample properties of the Eff-GMM estimator $\theta_{m,T}$ and the Feff-GMM estimator $\theta_{g*,T}$ are very similar when the sample size becomes slightly large; (iv) the Eff-GMM estimator $\theta_{m,T}$ has smaller finite sample variance than the Jac-GMM estimator, which shows the efficiency gain of using the modified moment conditions; (v) the RMSEs of $\theta_{m,T}$, $\theta_{g,T}$ and $\theta_{g*,T}$ converge to zero much faster in D2.2 than in D2.1, which shows the improvement of their finite sample properties due to the improved local identification strength in D2.2.

Figure 5.1: Empirical Rejection Probabilities of the GMM Inference under D1.1 – D1.4

Figure 5.2: Empirical Rejection Probabilities of the GMM Inference under D2.1 – D2.4

We next investigate the properties of the inference procedures $J_{m,T}$, $J_{g,T}$, $J_{h,T}$, C1 and C2 under D2.1 – D2.4. The results are presented in Figure 5.2 and illustrate the following. First, the empirical rejection probabilities of the J-tests $J_{m,T}$, $J_{g,T}$ and $J_{h,T}$ converge to the nominal size slowly under D2.1, while they converge much faster under D2.2. As discussed above, the Jacobian moment conditions provide weak local identification under D2.1, leading to the slow convergence of the empirical size of the J-tests $J_{m,T}$, $J_{g,T}$ and $J_{h,T}$. The strengthened local identification from the Jacobian moment conditions in D2.2 improves not only the finite sample properties of the GMM estimators $\theta_{m,T}$ and $\theta_{g,T}$, but also the performance of the J-tests based on these estimators. Second, the J-test $J_{h,T}$ controls the size well in both D2.1 and D2.2, although it is slightly under-sized in finite samples. It has good power in D2.3 with 3 or 6 IVs and in D2.4 with 3 IVs. Its power suffers slightly in D2.4 when more IVs are included in GMM estimation. Third, in D2.1 and D2.2, we see that including more moment conditions in GMM decreases the empirical rejection probabilities of the J-tests. This effect is almost negligible on $J_{h,T}$, mild on $J_{m,T}$, substantial on $J_{g,T}$ and severe on the test C1. The under-rejection of the test C1 with more moment conditions is much more severe in D2.2 than in D1.2. As discussed in Remark 2.2, one might expect the empirical rejection probabilities of the tests C1 and C2 to be close to the nominal size when the sample size is large in D2.1 and/or D2.2 with 6 IVs, as the critical values $\chi^2_{0.95}(4)$ and $\chi^2_{0.95}(6)$ are close in relative magnitude. However, as we can see from Figure 5.2, the empirical rejection probabilities of the tests C1 and C2 are close to each other but away from the nominal size. The empirical rejection probability of C1 in D2.1 and/or D2.2 with 6 IVs increases with the sample size, but the rate is slow. Thus the finite sample distributions of the asymptotic lower and upper bounds of $J_{\psi,T}$ (i.e., $\chi^2(H-p)$ and $\chi^2(H)$) may deserve further investigation. Fourth, the under-rejection of the tests based on C1 and C2 leads to significant loss of power. The poor power of the test C2 is expected as it is conservative. However, in D2.3 and D2.4, the power of the test C1 is worse than that of the tests based on $J_{m,T}$, $J_{g,T}$ and $J_{h,T}$ when 6 IVs are used in GMM estimation. As D2.4 is a (local) perturbation of D2.2, the poor power of C1 in this DGP is not surprising. Finally, the condition $\lambda_j' G_2 \ne 0_{1 \times (n-1)}$ for any $j \in \{1, \ldots, p\}$ in Lemma 3.4 holds under D2.3 and D2.4. Hence the Jacobian moment conditions are misspecified in these DGPs, explaining the good power properties of the J-test $J_{g,T}$.

5 Conclusion

This paper investigates GMM estimation and inference when the Jacobian of the moment conditions has known deficient rank. We show that the degenerate Jacobian contains non-trivial information about the unknown parameters. When such information is employed in estimation, one can construct GMM estimators and over-identification tests with standard properties. Our simulation results in the common CH factor models support the proposed theory. The GMM estimators that additionally use the Jacobian-based moment conditions show good finite sample properties. Moreover, the J-tests based on the proposed GMM estimators have good size control and better power than the commonly used GMM inference which ignores the information contained in the Jacobian moments.

Notes

1. We want to emphasize that a weak second-order local identification issue is not allowed in Condition (2.4), because the asymptotic theory of this paper assumes that the latent data generating process (DGP) is fixed as the sample size grows. Our paper and other closely related papers (DR and DG) do not theoretically study this issue, but rather focus on the case of local identification failure of known forms with a deficient rank Jacobian. The developed theory and the efficiency gain from adding the Jacobian moments are therefore more clearly confirmed through simulations with first-order local identification failure but without weak second-order local identification (see, e.g., the simulation results under D1.2 and D2.2 in Section 4). Introducing a weak local identification issue into our framework is interesting but left for future research.

2. We appreciate a referee's comment pointing this out.

3. In reality, one may only be interested in testing the validity of the moment conditions in (2.1). For any $\sqrt{T}$-consistent estimator $\theta_T$, one can show that
\[
J_{\psi g,T} \equiv T^{-1}\left[\sum_{t=1}^T \psi_t(\theta_T)\right]' \Omega_{\psi,T}^{-1} \left[\sum_{t=1}^T \psi_t(\theta_T)\right] \to_d \chi^2(H)
\]
under the null hypothesis that (2.1) and (2.3) hold. In practice, one can let $\theta_T = \theta_{m,T}$, or $\theta_T = \theta_{g,T}$ when $\theta_0$ is identified by (2.3). It is interesting that the estimation error in $\theta_T$ does not affect the asymptotic distribution of the test statistic $J_{\psi g,T}$ under the null, because $J_{\psi g,T}$ has the same asymptotic $\chi^2(H)$ distribution if $\theta_T$ is replaced by $\theta_0$. The reason for this phenomenon is that the Jacobian of the moment conditions in (2.1) is degenerate.

4. See the simulation results under D1.3, D1.4, D2.3 and D2.4 for example. In these DGPs, the tests using the moment conditions (2.3) are more powerful than the tests which only consider the moment conditions (2.1).

5. Using the Hadamard product of matrices available in many standard software packages (e.g., Matlab), one can conveniently obtain the simulated critical values. Let $P_T$ and $\Omega_{m,T}$ be the consistent estimators of $P$ and $\Omega_m$ respectively. Define $M_T = \Omega_{m,T}^{1/2} P_T' W_{h,T} P_T \Omega_{m,T}^{1/2}$. Let $B_R$ be an $R \times (Hp + H)$ matrix whose rows are independently generated from the $(Hp + H)$-dimensional standard normal distribution; $B_R$ can be generated in one call in many software packages including Matlab. Let $A \circ B$ denote the Hadamard product of matrices $A$ and $B$. Define $J_R = ((B_R M_T) \circ B_R) \times 1_{H(p+1)}$, where $1_{H(p+1)}$ is an $(Hp + H) \times 1$ vector of ones. Then the simulated $(1-\alpha)$-th quantile of the asymptotic distribution of $J_{h,T}$ is the $(1-\alpha)R$-th smallest value in $J_R$. In the simulation studies of this paper, we choose $R = 250{,}000$, set $W_{h,T}$ to be the identity matrix, and use $\theta_{g,T}$ with the identity weight matrix to construct $M_T$.

6. From the simulation studies in Section 4, when $H$ is increased (e.g., $H = 3$ in D2.1 and D2.2 with 3 IVs but $H = 6$ in these DGPs with 6 IVs) and $p$ is fixed (e.g., $p = 2$ in D2.1 and D2.2), the empirical rejection probabilities of the tests C1 and C2 are close to each other, but both tests are under-sized even when the sample size is large. The empirical rejection probability of the test C1 converges slowly to its asymptotic value. See Figures 5.1 and 5.2 and the related discussion in Section 4 for details.

7. We assume that $\mu_z$ is known when studying the identification of the common feature $\theta^0_*$. This is valid as $\mu_z$ is uniquely identified by the moment conditions $E[z_t - \mu_z] = 0$.

8. We use this normalization in the rest of the paper because the main results in DR are also derived under this restriction. However, the issue in DR and our proposed solutions do not depend on a specific normalization condition.

9. In their simulation studies, DR found that the J-test based on $J_{\psi,T}$ is undersized in one of the DGPs which has multiple common features (see their simulation results under the DGP labeled D3). We find similar results for the J-tests based on $J_{m,T}$, $J_{g,T}$ and $J_{h,T}$ in the same DGP. These simulation results are available upon request.


10. Let (Y_{t+1})_{p+1} denote the leading (p+1)×1 subvector of Y_{t+1}. Then (3.1) implies that

$$
\mathrm{Var}\left((Y_{t+1})_{p+1}\,\middle|\,\mathcal{F}_t\right) = \Lambda_{p+1} D_t \Lambda_{p+1}' + \Omega,
$$

where Λ_{p+1} denotes the leading (p+1)×p submatrix of Λ. The factor structure of Y_{t+1} is kept in (Y_{t+1})_{p+1} if rank(Λ_{p+1}) = p.

11. As ψ_t(θ) is quadratic in θ, we define the Jacobian moment functions as the partial derivative of ψ_t(θ) divided by 2 to simplify notation.

12. For each simulated sample with sample size T, we generate T + 2500 observations and drop the first 2500 observations to reduce the effect of the initial conditions of the data generating mechanism on the estimation and inference.

13. The Jacobian of the moment conditions in (4.4) is quadratic in Y_{t+1}, which motivates us to include the cross-product terms of Y_t as extra IVs.

14. The optimal weight matrix is constructed using a preliminary GMM estimator θ_{ψ,T} based on the moment conditions (4.3) and the identity weight matrix.

15. We let W_{g,T} and W_{h,T} be the identity matrix when constructing J_{h,T}.

16. We use the same bootstrap estimating equation and bootstrap weight matrix employed in Section 5 of DG. Following DG, we set the number of bootstrap replications to 399.

17. Indeed, we find that the smallest eigenvalue λ_min of the inverse of the asymptotic variance-covariance matrix H'(Ω_g − Ω_{gψ}Ω_ψ^{-1}Ω_{ψg})^{-1}H of θ_{m,T} in D1.1 is very close to zero (around 0.0034). Simulation results on λ_min in D1.1 and other DGPs are available upon request.

References

[1] Andrews, D.W.K., and X. Cheng (2012): "Estimation and Inference With Weak, Semi-Strong, and Strong Identification," Econometrica, 80, 2153-2211.

[2] Antoine, B., H. Bonnal, and E. Renault (2007): "On the Efficient Use of the Informational Content of Estimating Equations: Implied Probabilities and Euclidean Empirical Likelihood," Journal of Econometrics, 138(2), 461-487.

[3] Broto, C., and E. Ruiz (2004): "Estimation Methods for Stochastic Volatility Models: A Survey," Journal of Economic Surveys, 18(5), 613-649.

[4] Choi, I., and P.C.B. Phillips (1992): "Asymptotic and Finite Sample Distribution Theory for IV Estimators and Tests in Partially Identified Structural Equations," Journal of Econometrics, 51(1), 113-150.

[5] Diebold, F.X., and M. Nerlove (1989): "The Dynamics of Exchange Rate Volatility: A Multivariate Latent Factor ARCH Model," Journal of Applied Econometrics, 4(1), 1-21.

[6] Dovonon, P., and S. Gonçalves (2016): "Bootstrapping the GMM Overidentification Test Under First-order Underidentification," Working Paper, Concordia University.

[7] Dovonon, P., and E. Renault (2013): "Testing for Common Conditionally Heteroskedastic Factors," Econometrica, 81(6), 2561-2586.

[8] Doz, C., and E. Renault (2006): "Factor Stochastic Volatility in Mean Models: A GMM Approach," Econometric Reviews, 25(2-3), 275-309.

[9] Eichenbaum, M., L. Hansen, and K. Singleton (1988): "A Time Series Analysis of Representative Agent Models of Consumption and Leisure Choice under Uncertainty," Quarterly Journal of Economics, 103(1), 51-78.

[10] Engle, R.F., and S. Kozicki (1993): "Testing for Common Features," Journal of Business and Economic Statistics, 11(4), 369-380.

[11] Engle, R.F., V.K. Ng, and M. Rothschild (1990): "Asset Pricing with a Factor-ARCH Covariance Structure: Empirical Estimates for Treasury Bills," Journal of Econometrics, 45(1), 213-237.

[12] Fieller, E.C., and H.O. Hartley (1954): "Sampling with Control Variables," Biometrika, 41(3/4), 494-501.

[13] Fiorentini, G., E. Sentana, and N. Shephard (2004): "Likelihood-based Estimation of Latent Generalized ARCH Structures," Econometrica, 72(5), 1481-1517.

[14] Hall, A.R. (2005): Generalized Method of Moments. Oxford University Press.

[15] Hansen, L.P. (1982): "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, 50, 1029-1054.

[16] Kleibergen, F. (2005): "Testing Parameters in GMM Without Assuming That They Are Identified," Econometrica, 73(4), 1103-1123.

[17] Newey, W.K., and D.F. McFadden (1994): "Large Sample Estimation and Hypothesis Testing," in R.F. Engle and D.F. McFadden (eds.), Handbook of Econometrics, Vol. 4. North-Holland, Amsterdam.

[18] Newey, W.K., and R.J. Smith (2004): "Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators," Econometrica, 72(1), 219-255.

[19] Newey, W.K., and F. Windmeijer (2009): "Generalized Method of Moments with Many Weak Moment Conditions," Econometrica, 77(3), 687-719.

[20] Phillips, P.C.B. (1989): "Partially Identified Econometric Models," Econometric Theory, 5, 181-240.

[21] Sargan, J.D. (1983): "Identification and Lack of Identification," Econometrica, 51, 1605-1633.

[22] Sentana, E. (2015): "Finite Underidentification," CEMFI Working Paper 1508.

[23] Silvennoinen, A., and T. Teräsvirta (2009): "Multivariate GARCH Models," in Handbook of Financial Time Series, 201-229, Springer, Berlin Heidelberg.

[24] Staiger, D., and J.H. Stock (1997): "Instrumental Variables Regression with Weak Instruments," Econometrica, 65, 557-586.

[25] Stock, J.H., and J.H. Wright (2000): "GMM with Weak Identification," Econometrica, 68, 1055-1096.


APPENDIX

A Two Extensions

In this section, we consider two extensions of the main results established in Section 2. When the Jacobian of the moment conditions is not zero but is known to have rank r_0 smaller than p, one may still improve the properties of the GMM estimator by utilizing this information. We illustrate such improvement in two cases: (i) the Jacobian has several zero columns; and (ii) the rank of the Jacobian satisfies r_0 = p − 1. We show that in both cases, the first-order local identification of the GMM estimator holds as long as the extra moment conditions constructed from the Jacobian are used in estimation. Hence the standard properties of the stacked GMM estimator established in the previous section can be easily generalized to these scenarios.

A.1 Jacobian with zero columns

In this subsection, we consider the case where the Jacobian has some zero columns and the positions of these columns are known. For ease of notation, we assume that the first q (q = p − r_0) columns of Γ(θ_0) are zero. Let Γ_{t,(q)}(θ) denote the first q columns of Γ_t(θ) and g_t(θ) = vec(Γ_{t,(q)}(θ)'). The suggested GMM estimator is based on the stacked moment conditions

$$
\mathrm{E}\left[m_t(\theta_0)\right]
= \begin{pmatrix} \mathrm{E}\left[\psi_t(\theta_0)\right] \\ \mathrm{E}\left[g_t(\theta_0)\right] \end{pmatrix}
= 0_{(H+Hq)\times 1}. \tag{A.1}
$$

We next show the first-order local identification of θ0 via the moment conditions in (A.1).

Lemma A.1 Let ψ_h(θ) be the h-th (h = 1, …, H) component function in ψ(θ) and θ_(q) denote the first q components of θ. Suppose that (i) θ_0 belongs to the interior of Θ; (ii) for any θ ∈ Θ,

$$
\left((\theta_{(q)} - \theta_{0,(q)})'\,\frac{\partial^2 \psi_h}{\partial \theta_{(q)}\partial \theta_{(q)}'}(\theta_0)\,(\theta_{(q)} - \theta_{0,(q)})\right)_{1\le h\le H} = 0 \tag{A.2}
$$

if and only if θ_(q) = θ_{0,(q)}; and (iii) ∂ψ(θ_0)/∂θ'_{(−q)} has full rank. Then the matrix ∂E[m_t(θ_0)]/∂θ' has full rank.

Remark A.1. Consider the one-step GMM estimation of µ_z and θ_0 in the CH factor model studied in Section 3. In this case, the moment conditions that do not take the insufficient rank of the Jacobian into account are

$$
\mathrm{E}\left[\psi_t(\mu_z, \theta_0)\right]
= \begin{pmatrix} \mathrm{E}\left[z_t - \mu_z\right] \\ \mathrm{E}\left[(z_t - \mu_z)\,\theta_*^{0\prime} Y_{t+1} Y_{t+1}'\,\theta_*^{0}\right] \end{pmatrix}
= 0_{2H\times 1}. \tag{A.3}
$$

By definition,

$$
\Gamma(\mu_z, \theta_0)
= \begin{pmatrix} -I_H & 0_{H\times d_\theta} \\ -\mathrm{E}\left[\theta_*^{0\prime} Y_{t+1} Y_{t+1}'\,\theta_*^{0}\right] I_H & 0_{H\times d_\theta} \end{pmatrix},
$$


which means that the first H columns of the Jacobian matrix are non-zero and its last d_θ columns are known to be zero. In this example, the second-order identification condition (A.2) can be verified using the same arguments as in the proof of Lemma 3.2. Hence, the CH factor model is a special example of the general framework studied in this subsection. One can stack the moment conditions from the zero columns in Γ(µ_z, θ_0) and the moment conditions (A.3) to construct a one-step GMM estimator of (µ_z, θ_0). Using the arguments in Section D, one can show that this one-step GMM estimator also has a √T-asymptotic normal distribution. Moreover, as µ_z is exactly identified by the moment conditions E[z_t − µ_z] = 0, the two-step GMM estimator of θ_0 proposed in Section 3 has the same asymptotic variance as the one-step GMM estimator (see, e.g., Ackerberg et al. (2014)).

Remark A.2. The zero columns of the Jacobian, together with condition (iii) of Lemma A.1, imply that the local identification failure of θ_0 (when only E[ψ_t(θ_0)] = 0_{H×1} is used) is caused by its first q components. As a result, we can relax condition (2.4) in Lemma 2.1 and only require the second-order local identification condition for the first q components of θ_0 in condition (ii) of Lemma A.1.

Remark A.3. By virtue of the local identification of θ_0, the same results as in Proposition 2.1.(i) and (2.8) can be established for the GMM estimator and J-test statistic based on the stacked moment conditions (A.1).

Proof of Lemma A.1. Let Γ_{t,j,(q)}(θ) denote the first q components in the j-th row of Γ_t(θ) for j = 1, …, H. Then it is clear that

$$
\Gamma_{t,j,(q)}(\theta) = \frac{\partial \psi_{j,t}(\theta)}{\partial \theta_{(q)}'}.
$$

By definition,

$$
\frac{\partial \mathrm{E}[m_t(\theta_0)]}{\partial \theta'}
= \begin{pmatrix}
\frac{\partial \mathrm{E}[\psi_t(\theta_0)]}{\partial \theta'} \\[4pt]
\mathrm{E}\left[\frac{\partial^2 \psi_{1,t}(\theta_0)}{\partial \theta_{(q)}\partial \theta'}\right] \\
\vdots \\
\mathrm{E}\left[\frac{\partial^2 \psi_{H,t}(\theta_0)}{\partial \theta_{(q)}\partial \theta'}\right]
\end{pmatrix}
= \begin{pmatrix}
0_{H\times q} & \frac{\partial \psi(\theta_0)}{\partial \theta_{(-q)}'} \\[4pt]
\frac{\partial^2 \psi_1(\theta_0)}{\partial \theta_{(q)}\partial \theta_{(q)}'} & \frac{\partial^2 \psi_1(\theta_0)}{\partial \theta_{(q)}\partial \theta_{(-q)}'} \\
\vdots & \vdots \\
\frac{\partial^2 \psi_H(\theta_0)}{\partial \theta_{(q)}\partial \theta_{(q)}'} & \frac{\partial^2 \psi_H(\theta_0)}{\partial \theta_{(q)}\partial \theta_{(-q)}'}
\end{pmatrix}. \tag{A.4}
$$

Because θ_0 is an interior point of Θ, condition (ii) of the lemma is equivalent to

$$
\forall x \in \mathbb{R}^q, \quad \left(x'\,\frac{\partial^2 \psi_h(\theta_0)}{\partial \theta_{(q)}\partial \theta_{(q)}'}\,x\right)_{1\le h\le H} = 0 \ \text{ if and only if } \ x = 0. \tag{A.5}
$$

Suppose that Rank(∂E[m_t(θ_0)]/∂θ') < p. Then there exists a non-zero x_1 ∈ ℝ^p such that

$$
\frac{\partial \mathrm{E}[m_t(\theta_0)]}{\partial \theta'}\,x_1 = 0_{H(q+1)\times 1}. \tag{A.6}
$$

Let x_{1,(q)} denote the first q elements of x_1 and x_{1,(−q)} denote the last p − q elements of x_1. By condition (iii) of the lemma, we know that x_{1,(−q)} = 0, which together with (A.6) implies that

$$
\left(x_{1,(q)}'\,\frac{\partial^2 \psi_h(\theta_0)}{\partial \theta_{(q)}\partial \theta_{(q)}'}\,x_{1,(q)}\right)_{1\le h\le H} = 0. \tag{A.7}
$$

By (A.5) and (A.7), we deduce that x_{1,(q)} = 0, which contradicts x_1 ≠ 0 and finishes the proof.
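As an illustration of Lemma A.1, here is a toy numerical check with H = 2, p = 2, q = 1; the numbers are ours and purely hypothetical, not from the paper. The ψ-Jacobian alone is rank deficient, but stacking the second-derivative rows as in (A.4) restores full rank.

```python
import numpy as np

# Toy check of Lemma A.1 with H = 2, p = 2, q = 1 (hypothetical values).
# Gamma: d E[psi_t(theta_0)]/d theta' with its first (q = 1) column zero.
Gamma = np.array([[0.0, 1.0],
                  [0.0, 2.0]])
# Rows E[d^2 psi_{h,t}(theta_0) / d theta_(q) d theta'] for h = 1, 2.
# Condition (ii) holds here since d^2 psi_1 / d theta_(1)^2 = 3 != 0.
D = np.array([[3.0, 0.5],
              [-1.0, 0.2]])
M = np.vstack([Gamma, D])                 # d E[m_t(theta_0)]/d theta' as in (A.4)
assert np.linalg.matrix_rank(Gamma) == 1  # psi alone: local identification fails
assert np.linalg.matrix_rank(M) == 2      # stacked moments restore full rank
```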

A.2 Jacobian with rank p − 1

As recently pointed out in Sentana (2015), a Jacobian matrix with insufficient rank contains information about the structural parameters which may be used to restore standard inference for GMM estimators. In this subsection, we provide primitive conditions for the local identification of θ_0 when the rank of the Jacobian is p − 1 and extra moment conditions constructed from the Jacobian are added in GMM estimation.

Under the rank restriction, there exists a p × 1 vector π_0 with π_0'π_0 = 1 such that

$$
\Gamma(\theta_0)\,\pi_0 = 0_{H\times 1}. \tag{A.8}
$$

For h = 1, …, H, let ψ_{h,t}(θ) denote the h-th component function of ψ_t(θ). Then the restriction in (A.8) implies the following set of moment conditions

$$
\mathrm{E}\left[g_{h,t}(\theta_0, \pi_0)\right] = \mathrm{E}\left[\pi_0'\,\frac{\partial \psi_{h,t}(\theta_0)}{\partial \theta}\right] = 0 \quad \text{for } h = 1, \dots, H. \tag{A.9}
$$

Let g_t'(θ_0, π_0) = [g_{1,t}(θ_0, π_0), …, g_{H,t}(θ_0, π_0)]. The stacked GMM estimation is based on the following moment conditions

$$
\mathrm{E}\left[m_t(\theta_0, \pi_0)\right]
= \begin{pmatrix} \mathrm{E}\left[\psi_t(\theta_0)\right] \\ \mathrm{E}\left[g_t(\theta_0, \pi_0)\right] \\ \pi_0'\pi_0 - 1 \end{pmatrix}
= 0_{(2H+1)\times 1}. \tag{A.10}
$$

When θ_0 is uniquely identified by E[ψ_t(θ_0)] = 0_{H×1}, we see that (θ_0', π_0')' is uniquely identified by the moment conditions in (A.10). Let

$$
\Psi_{H,\theta\theta} = \left(\pi_0'\,\mathrm{E}\left[\partial^2 \psi_{1,t}(\theta_0)/\partial\theta\partial\theta'\right]\pi_0,\ \dots,\ \pi_0'\,\mathrm{E}\left[\partial^2 \psi_{H,t}(\theta_0)/\partial\theta\partial\theta'\right]\pi_0\right)'.
$$

We next show the local identification of (θ_0', π_0')' via the moment conditions in (A.10).

Lemma A.2 Let ψ_h(θ) be the h-th (h = 1, …, H) component function in ψ(θ). Suppose that (i) θ_0 belongs to the interior of Θ; and (ii) for any θ ∈ Θ,

$$
\left((\theta - \theta_0)'\,\frac{\partial^2 \psi_h}{\partial\theta\partial\theta'}(\theta_0)\,(\theta - \theta_0)\right)_{1\le h\le H} = 0
$$

if and only if θ = θ_0. Then ∂E[m_t(θ_0, π_0)]/∂θ' has rank p. If (Ψ_{H,θθ}, Γ(θ_0)Γ(θ_0)') has rank p, then ∂E[m_t(θ_0, π_0)]/∂(θ', π') has rank 2p.


Remark A.4. By virtue of the local identification of θ_0, the same results as in Proposition 2.1.(i) and in (2.8) can be established for the GMM estimator and J-test statistic based on the stacked moment conditions in (A.10).

Remark A.5. As the non-random restriction π_0'π_0 = 1 is imposed in the moment restrictions in (A.10), the asymptotic variance-covariance matrix of T^{-1/2}∑_{t=1}^T m_t(θ_0, π_0) is degenerate. In the GMM estimation, one can choose the weight matrix

$$
W_{m,T} = \begin{pmatrix} \Omega_{m,T}^{-1} & 0_{2H\times 1} \\ 0_{1\times 2H} & 1 \end{pmatrix},
$$

where Ω_{m,T} is the estimator of the asymptotic variance of T^{-1/2}∑_{t=1}^T (ψ_t'(θ_0), g_t'(θ_0, π_0))'.

Remark A.6. When the rank of the Jacobian is between 1 and p − 1 (where p > 2), there exists a p × q (where q = p − r_0) matrix Π_0 with Π_0'Π_0 = I_{p−r_0} such that

$$
\mathrm{E}\left[g_{h,t}(\theta_0, \Pi_0)\right] = \mathrm{E}\left[\Pi_0'\,\frac{\partial \psi_{h,t}(\theta_0)}{\partial \theta}\right] = 0_{q\times 1} \quad \text{for } h = 1, \dots, H.
$$

Let g_t'(θ_0, Π_0) = [g_{1,t}'(θ_0, Π_0), …, g_{H,t}'(θ_0, Π_0)]. One can estimate (θ_0, Π_0) by the stacked moment conditions

$$
\mathrm{E}\left[m_t(\theta_0, \Pi_0)\right]
= \begin{pmatrix} \mathrm{E}\left[\psi_t(\theta_0)\right] \\ \mathrm{E}\left[g_t(\theta_0, \Pi_0)\right] \\ \mathrm{vech}\left(\Pi_0'\Pi_0 - I_{p-r_0}\right) \end{pmatrix}
= 0_{H^*\times 1}, \tag{A.11}
$$

where H^* = H(q+1) + q(q+1)/2 and vech(·) denotes the half-vectorization of a symmetric matrix. It is clear that θ_0 is uniquely identified by (A.11), while Π_0 is only identified up to rotation. In the sense of Phillips (1989), (θ_0, Π_0) is partly identified by the moment conditions in (A.11). Under conditions (i) and (ii) of Lemma A.2, we can show that ∂E[m_t(θ_0, Π_0)]/∂θ' has rank p. However, due to the non-unique identification of Π_0, the convergence rate and the limiting distribution of the GMM estimator of θ_0 based on (A.11) are challenging to derive. These issues are left for future investigation.

Proof of Lemma A.2. By definition,

$$
\frac{\partial \mathrm{E}[m_t(\theta_0, \pi_0)]}{\partial \theta'}
= \begin{pmatrix}
\Gamma(\theta_0) \\[2pt]
\mathrm{E}\left[\pi_0'\,\frac{\partial^2 \psi_{1,t}(\theta_0)}{\partial\theta\partial\theta'}\right] \\
\vdots \\
\mathrm{E}\left[\pi_0'\,\frac{\partial^2 \psi_{H,t}(\theta_0)}{\partial\theta\partial\theta'}\right] \\[2pt]
0_{1\times p}
\end{pmatrix}. \tag{A.12}
$$

Because θ_0 is an interior point of Θ, condition (ii) of Lemma A.2 is equivalent to

$$
\forall x \in \mathbb{R}^p, \quad \left(x'\,\frac{\partial^2 \psi_h(\theta_0)}{\partial\theta\partial\theta'}\,x\right)_{1\le h\le H} = 0 \ \text{ if and only if } \ x = 0. \tag{A.13}
$$

Let x_{1,p} ∈ ℝ^p be such that [∂E[m_t(θ_0, π_0)]/∂θ'] x_{1,p} = 0_{(2H+1)×1}. Then (A.12) yields

$$
\Gamma(\theta_0)\,x_{1,p} = 0_{H\times 1} \quad \text{and} \quad \mathrm{E}\left[\pi_0'\,\frac{\partial^2 \psi_{h,t}(\theta_0)}{\partial\theta\partial\theta'}\,x_{1,p}\right] = 0 \ \text{ for } h = 1, \dots, H. \tag{A.14}
$$

By the first equation in (A.14), we have x_{1,p} = cπ_0, where c is some finite constant. Hence (A.14) implies that

$$
c\,\mathrm{E}\left[\pi_0'\,\frac{\partial^2 \psi_{h,t}(\theta_0)}{\partial\theta\partial\theta'}\,\pi_0\right] = 0 \ \text{ for } h = 1, \dots, H. \tag{A.15}
$$

As π_0 ≠ 0, condition (ii) of Lemma A.2 implies that the quadratic forms in (A.15) cannot all be zero, so c must be zero, which implies that x_{1,p} = 0_{p×1}. This proves the first claim of the lemma.

For the second claim, note that by definition

$$
\frac{\partial \mathrm{E}[m_t(\theta_0, \pi_0)]}{\partial \pi'}
= \begin{pmatrix} 0_{H\times p} \\ \Gamma(\theta_0) \\ 2\pi_0' \end{pmatrix}.
$$

Let x_p = (x_{1,p}', x_{2,p}')' ∈ ℝ^{2p} be such that [∂E[m_t(θ_0, π_0)]/∂(θ', π')] x_p = 0. Then

$$
\begin{pmatrix}
\Gamma(\theta_0)\,x_{1,p} \\[2pt]
\mathrm{E}\left[\pi_0'\,\frac{\partial^2 \psi_{1,t}(\theta_0)}{\partial\theta\partial\theta'}\,x_{1,p}\right] \\
\vdots \\
\mathrm{E}\left[\pi_0'\,\frac{\partial^2 \psi_{H,t}(\theta_0)}{\partial\theta\partial\theta'}\,x_{1,p}\right] \\[2pt]
0
\end{pmatrix}
+ \begin{pmatrix} 0_{H\times p} \\ \Gamma(\theta_0) \\ 2\pi_0' \end{pmatrix} x_{2,p} = 0. \tag{A.16}
$$

As Γ(θ_0) has rank p − 1 and Γ(θ_0)π_0 = 0, Γ(θ_0)x_{1,p} = 0_{H×1} implies that x_{1,p} = 0_{p×1} or, after rescaling x_p, x_{1,p} = π_0. Suppose that x_{1,p} = 0_{p×1}. Then (A.16) implies that

$$
\begin{pmatrix} \Gamma(\theta_0) \\ 2\pi_0' \end{pmatrix} x_{2,p} = 0. \tag{A.17}
$$

As (Γ(θ_0)', π_0)' has full rank, (A.17) implies that x_{2,p} = 0_{p×1}. Now suppose that x_{1,p} = π_0. Then by (A.16),

$$
\pi_0'\,x_{2,p} = 0 \quad \text{and} \quad \Psi_{H,\theta\theta} + \Gamma(\theta_0)\,x_{2,p} = 0. \tag{A.18}
$$

The first equality in (A.18) implies that x_{2,p} = Γ(θ_0)'c_p for some c_p ∈ ℝ^H. Hence, by the second equality in (A.18),

$$
\Psi_{H,\theta\theta} + \Gamma(\theta_0)\Gamma(\theta_0)'\,c_p = 0. \tag{A.19}
$$

As the rank of Γ(θ_0)Γ(θ_0)' is p − 1 and the rank of (Ψ_{H,θθ}, Γ(θ_0)Γ(θ_0)') is p, we know that there is no c_p ∈ ℝ^H making (A.19) hold, which implies that we must have x_p = 0. This finishes the proof.
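As an illustration of Lemma A.2, the following toy numerical check (our hypothetical numbers, not from the paper) takes H = 2, p = 2, a Γ(θ_0) of rank p − 1 = 1 with unit null vector π_0, second-derivative matrices satisfying condition (ii), and verifies both the rank condition on (Ψ_{H,θθ}, Γ(θ_0)Γ(θ_0)') and that the stacked Jacobian built as in the proof has full rank 2p.

```python
import numpy as np

# Toy check of Lemma A.2 with H = 2, p = 2 (hypothetical values).
Gamma = np.array([[1.0, 0.0],
                  [2.0, 0.0]])             # rank p - 1 = 1
pi0 = np.array([0.0, 1.0])                 # unit null vector: Gamma @ pi0 = 0
A = [np.eye(2),                            # A_h = E[d^2 psi_{h,t}/d theta d theta']
     np.array([[1.0, 0.0], [0.0, -1.0]])]  # condition (ii): (x'A_h x)_h = 0 only at x = 0
Psi = np.array([[pi0 @ Ah @ pi0] for Ah in A])          # Psi_{H,theta theta} (H x 1)
GG = Gamma @ Gamma.T                                    # rank p - 1
assert np.linalg.matrix_rank(np.hstack([Psi, GG])) == 2 # rank condition of Lemma A.2
B = np.vstack([pi0 @ Ah for Ah in A])      # rows pi0' A_h  (H x p)
# Stacked Jacobian [[Gamma, 0], [B, Gamma], [0, 2 pi0']] as in the proof:
J = np.block([[Gamma, np.zeros((2, 2))],
              [B,     Gamma],
              [np.zeros((1, 2)), 2 * pi0.reshape(1, 2)]])
assert np.linalg.matrix_rank(J) == 4       # full rank 2p, as the lemma asserts
```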


B Proof of the Main Results in Section 2

Proof of Lemma 2.1. We first notice that by definition,

$$
H = H(\theta_0) = \frac{\partial g(\theta_0)}{\partial \theta'} = (A_1, \dots, A_H)', \tag{B.1}
$$

where A_h = ∂²ψ_h/∂θ∂θ'(θ_0) for h = 1, …, H. Because θ_0 is an interior point of Θ, condition (ii) of the lemma is equivalent to

$$
\forall x \in \mathbb{R}^p, \quad (x' A_h x)_{1\le h\le H} = 0 \ \text{ if and only if } \ x = 0. \tag{B.2}
$$

Now, suppose that Rank(H) < p. Then there exists a non-zero x ∈ ℝ^p such that

$$
x'H' = (x'A_1, \dots, x'A_H) = 0_{1\times pH},
$$

which implies that x'A_h x = 0 for all h. This contradicts (B.2), and hence we have rank(H) = p.

Assumption B.1 (i) m(θ) = E[m_t(θ)] is continuous in θ; (ii) sup_{θ∈Θ} ‖T^{-1}∑_{t=1}^T [m_t(θ) − m(θ)]‖ = O_p(T^{-1/2}); (iii) sup_{θ∈Θ} ‖T^{-1}∑_{t=1}^T [∂m_t(θ)/∂θ' − ∂m(θ)/∂θ']‖ = o_p(1); (iv) ∂m(θ)/∂θ' is continuous in θ.

The proof of Proposition 2.1 is standard (see, e.g., Newey and McFadden, 1994) and thus is omitted. We next present the proof of Theorem 2.1.

Proof of Theorem 2.1. By definition,

$$
T^{-1}\sum_{t=1}^T g_{\psi,t}(\theta) = T^{-1}\sum_{t=1}^T g_t(\theta) - \Omega_{g\psi,T}\,\Omega_{\psi,T}^{-1}\,T^{-1}\sum_{t=1}^T \psi_t(\theta_{g,T}). \tag{B.3}
$$

Using the consistency of Ω_{gψ,T} and Ω_{ψ,T}, the √T-consistency of θ_{g,T}, and Assumptions B.1(iii) and (iv), we deduce that

$$
\frac{\Omega_{g\psi,T}\,\Omega_{\psi,T}^{-1}}{T}\left[\sum_{t=1}^T \psi_t(\theta_{g,T}) - \sum_{t=1}^T \psi_t(\theta_0)\right] = O_p(T^{-1}), \tag{B.4}
$$

which together with T^{-1}∑_{t=1}^T ψ_t(θ_0) = O_p(T^{-1/2}) and Assumptions 2.1(ii) and B.1(ii) implies that

$$
\left[T^{-1}\sum_{t=1}^T g_{\psi,t}(\theta)\right]' W_{g*,T}\left[T^{-1}\sum_{t=1}^T g_{\psi,t}(\theta)\right] - g(\theta)'\,\Omega_{*,g}^{-1}\,g(\theta) = o_p(1) \tag{B.5}
$$

uniformly over θ ∈ Θ, where Ω_{*,g} = Ω_g − Ω_{gψ}Ω_ψ^{-1}Ω_{ψg}. It is clear that g(θ)'Ω_{*,g}^{-1}g(θ) is uniquely minimized at θ_0, because Ω_{*,g} is positive definite and θ_0 is identified by g(θ) = 0. This, together with the uniform convergence in (B.5) and the continuity of g(θ), implies the consistency of θ_{g*,T}.


Next, we note that θ_{g*,T} satisfies the first order condition

$$
\left[T^{-1}\sum_{t=1}^T \frac{\partial g_t(\theta_{g*,T})}{\partial\theta'}\right]' W_{g*,T}\left[T^{-\frac{1}{2}}\sum_{t=1}^T g_{\psi,t}(\theta_{g*,T})\right] = 0. \tag{B.6}
$$

Applying the mean value theorem and using the consistency of Ω_{gψ,T} and Ω_{ψ,T}, the √T-consistency of θ_{g,T}, (B.4), and Assumptions 2.1 and B.1, we get

$$
T^{-\frac{1}{2}}\sum_{t=1}^T g_{\psi,t}(\theta_{g*,T})
= T^{-\frac{1}{2}}\sum_{t=1}^T \left[g_t(\theta_0) - \Omega_{g\psi}\Omega_\psi^{-1}\psi_t(\theta_0)\right]
+ T^{-1}\sum_{t=1}^T \frac{\partial g_t(\bar{\theta}_{g*,T})}{\partial\theta'}\left[\sqrt{T}(\theta_{g*,T}-\theta_0)\right] + o_p(1), \tag{B.7}
$$

where ∂g_t(θ̄_{g*,T})/∂θ' denotes a pH×p matrix whose (i,j) element (i = 1, …, pH; j = 1, …, p) represents the function ∂g_{t,i}(·)/∂θ_j evaluated at the element-wise mean value θ̄_{g*,T,j} (between θ_{0,j} and θ_{g*,T,j}, for j = 1, …, p). Under Assumption 2.1(ii),

$$
T^{-\frac{1}{2}}\sum_{t=1}^T \left[g_t(\theta_0) - \Omega_{g\psi}\Omega_\psi^{-1}\psi_t(\theta_0)\right] \to_d N\left(0,\ \Omega_g - \Omega_{g\psi}\Omega_\psi^{-1}\Omega_{\psi g}\right). \tag{B.8}
$$

Using Assumptions B.1(iii) and (iv), and the √T-consistency of θ_{g,T}, we have

$$
T^{-1}\sum_{t=1}^T \frac{\partial g_t(\theta_{g*,T})}{\partial\theta'} \to_p H
\quad\text{and}\quad
T^{-1}\sum_{t=1}^T \frac{\partial g_t(\bar{\theta}_{g*,T})}{\partial\theta'} \to_p H,
$$

which together with (B.6), (B.7), (B.8), and the full rank of H and Ω_{*,g} proves the claimed result.

Proof of Theorem 2.2. Applying the mean value theorem, we get

$$
T^{-\frac{1}{2}}\sum_{t=1}^T m_t(\theta_{g,T}) = T^{-\frac{1}{2}}\sum_{t=1}^T m_t(\theta_0) + T^{-\frac{1}{2}}\sum_{t=1}^T \frac{\partial m_t(\bar{\theta}_{g,T})}{\partial\theta'}\,(\theta_{g,T}-\theta_0), \tag{B.9}
$$

where ∂m_t(θ̄_{g,T})/∂θ' is defined similarly to ∂g_t(θ̄_{g*,T})/∂θ' above. Using Assumptions B.1(iii) and (iv) and the √T-consistency of θ_{g,T}, we get

$$
T^{-\frac{1}{2}}\sum_{t=1}^T \frac{\partial m_t(\bar{\theta}_{g,T})}{\partial\theta'}\,(\theta_{g,T}-\theta_0) = \frac{\partial m(\theta_0)}{\partial\theta'}\left[\sqrt{T}(\theta_{g,T}-\theta_0)\right] + o_p(1). \tag{B.10}
$$


Using the standard arguments from the proof of Proposition 2.1, we have

$$
\sqrt{T}(\theta_{g,T}-\theta_0) = -(H'W_gH)^{-1}H'W_g\left[T^{-\frac{1}{2}}\sum_{t=1}^T g_t(\theta_0)\right] + o_p(1),
$$

which together with (B.9) and (B.10) implies that

$$
T^{-\frac{1}{2}}\sum_{t=1}^T m_t(\theta_{g,T}) = P\left[T^{-\frac{1}{2}}\sum_{t=1}^T m_t(\theta_0)\right] + o_p(1). \tag{B.11}
$$

Plugging the above expression into the definition of J_{h,T} and then applying Assumption 2.1 and W_{h,T} →_p W_h, we immediately prove the result.

C Proof of the Main Results in Section 3

Proof of Lemma 3.1. (i) By (3.2) and Assumption 3.2,

$$
\mathrm{E}\left[(z_t-\mu_z)\left(\theta_*^{0\prime} Y_{t+1}Y_{t+1}'\,\theta_*^{0}\right)\right] = \mathrm{E}\left[(z_t-\mu_z)\,\theta_*^{0\prime}\Lambda D_t \Lambda'\theta_*^{0}\right]. \tag{C.1}
$$

Hence if θ_*^{0′}Λ = 0, then (3.3) holds.

Let Λ'θ_*^0 = (c_1^*, …, c_p^*)'. Then we can write

$$
\mathrm{E}\left[(z_t-\mu_z)\left(\theta_*^{0\prime} Y_{t+1}Y_{t+1}'\,\theta_*^{0}\right)\right]
= \mathrm{Cov}\left(z_t, \mathrm{Diag}(D_t)\right)\begin{pmatrix}(c_1^*)^2\\ \vdots \\ (c_p^*)^2\end{pmatrix}. \tag{C.2}
$$

Hence if θ_*^{0′}Λ ≠ 0, we have ((c_1^*)², …, (c_p^*)²) ≠ 0, which together with (C.2) and Assumption 3.3.(ii) implies that

$$
\mathrm{E}\left[(z_t-\mu_z)\left(\theta_*^{0\prime} Y_{t+1}Y_{t+1}'\,\theta_*^{0}\right)\right] \neq 0. \tag{C.3}
$$

This proves the first claim of the lemma.

(ii) From (3.7), we know that there exists a θ_*^0 with θ_*^{0′}Λ = 0_{1×p} if and only if

$$
\mathrm{rank}(\Lambda' G_2) = \mathrm{rank}\left((\Lambda' G_2,\ -\Lambda' l_n)\right). \tag{C.4}
$$

Note that (Λ'G_2, −Λ'l_n) = Λ'(G_2, −l_n), where (G_2, −l_n) has full rank by definition; combined with Assumption 3.1, this implies that rank((Λ'G_2, −Λ'l_n)) = p. In view of (C.4), we have rank(Λ'G_2) = p.

(iii) First suppose that (3.4) holds. By the second result of the lemma, the existence of θ_*^0 implies that Λ'G_2 has full row rank p. On the other hand, the uniqueness of θ_*^0 implies that Λ'G_2 has full column rank n − 1. Hence (3.4) implies that Λ'G_2 is an invertible matrix. Conversely, if Λ'G_2 is invertible, then by (3.7) we know that θ_0 = −(Λ'G_2)^{-1}Λ'l_n, which implies the existence and uniqueness of θ_*^0.


Proof of Lemma 3.2. Let Λ = (λ_1, …, λ_p), where the λ_j's (j = 1, …, p) are n×1 real vectors. It is easy to see that

$$
\psi(\theta) = \mathrm{Cov}\left(z_t, \mathrm{Diag}(D_t)\right)\times \mathrm{Diag}(\Lambda'\theta_*\theta_*'\Lambda) = G_1 \times \mathrm{Diag}(\Lambda'\theta_*\theta_*'\Lambda), \tag{C.5}
$$

where G_1 ≡ Cov(z_t, Diag(D_t)) and

$$
\mathrm{Diag}(\Lambda'\theta_*\theta_*'\Lambda) \equiv \left[(\theta_*'\lambda_1)^2, \dots, (\theta_*'\lambda_p)^2\right]'.
$$

Note that G_1 is an H×p matrix with full rank by Assumption 3.3. Hence, we have

$$
\Gamma(\theta) = G_1\left[(\theta_*'\lambda_1)\lambda_1, \dots, (\theta_*'\lambda_p)\lambda_p\right]' G_2,
$$

where G_2 is defined in the main text. Using the Kronecker product, we get

$$
g(\theta) \equiv \mathrm{vec}(\Gamma(\theta)') = (G_1\otimes G_2')\begin{pmatrix}\lambda_1\lambda_1'\theta_*\\ \vdots \\ \lambda_p\lambda_p'\theta_*\end{pmatrix}, \tag{C.6}
$$

which further implies that

$$
H = (G_1\otimes G_2')\begin{pmatrix}\lambda_1\lambda_1'\\ \vdots \\ \lambda_p\lambda_p'\end{pmatrix} G_2. \tag{C.7}
$$

Let G_{1,hj} (h = 1, …, H and j = 1, …, p) be the h-th row, j-th column entry of G_1. Then we can rewrite equation (C.7) as

$$
H = \begin{pmatrix}\sum_{j=1}^p G_{1,1j}\, G_2'\lambda_j\lambda_j' G_2\\ \vdots \\ \sum_{j=1}^p G_{1,Hj}\, G_2'\lambda_j\lambda_j' G_2\end{pmatrix},
$$

where ∑_{j=1}^p G_{1,hj} G_2'λ_jλ_j'G_2 is a p×p matrix for each h = 1, …, H.

Suppose that there is an x ∈ ℝ^p such that

$$
Hx = \begin{pmatrix}\sum_{j=1}^p G_{1,1j}\, G_2'\lambda_j\lambda_j' G_2\,x\\ \vdots \\ \sum_{j=1}^p G_{1,Hj}\, G_2'\lambda_j\lambda_j' G_2\,x\end{pmatrix} = 0_{Hp\times 1}. \tag{C.8}
$$

But G_1 has full column rank, which means that Hx = 0_{Hp×1} if and only if

$$
G_2'\lambda_j\lambda_j' G_2\,x = 0_{p\times 1} \ \text{ for all } j. \tag{C.9}
$$

The condition in (C.9) implies that

$$
0 = \sum_{j=1}^p G_2'\lambda_j\lambda_j' G_2\,x = G_2'\Lambda\Lambda' G_2\,x.
$$

We have shown in Lemma 3.1 that Λ'G_2 is invertible, which implies that G_2'ΛΛ'G_2 is also invertible. Hence there must be x = 0.

Proof of Lemma 3.3. Recall equation (C.6) in the proof of Lemma 3.2:

$$
g(\theta) = (G_1\otimes G_2')\begin{pmatrix}\lambda_1\lambda_1'\theta_*\\ \vdots \\ \lambda_p\lambda_p'\theta_*\end{pmatrix}. \tag{C.10}
$$

As the common feature θ_*^0 satisfies Λ'θ_*^0 = 0_{p×1}, we immediately have λ_j'θ_*^0 = 0 for any j = 1, …, p, which implies that g(θ_0) = 0_{pH×1}. This shows that θ_*^0 is one possible solution of the linear equations g(θ) = 0_{pH×1}. By the relation θ_* = G_2θ + l_n and the definition of the matrix H, we can write the linear equations g(θ) = 0_{pH×1} as

$$
H\theta = -(G_1\otimes G_2')\begin{pmatrix}\lambda_1\lambda_1' l_n\\ \vdots \\ \lambda_p\lambda_p' l_n\end{pmatrix},
$$

which together with the fact that H is a full rank matrix implies that θ_*^0 is the unique solution.

Proof of Lemma 3.4. Suppose that there exists some θ_*^0 ∈ Θ_* such that (3.11) holds. Then using (3.1) and Assumption 3.2, we have

$$
0 = \mathrm{E}\left[(z_t-\mu_z)\,\theta_*^{0\prime} Y_{t+1}Y_{t+1}' G_2\right]
= \mathrm{Cov}\left(z_t, \mathrm{diag}(D_t)\right)\begin{pmatrix}\theta_*^{0\prime}\lambda_1\lambda_1'\\ \vdots \\ \theta_*^{0\prime}\lambda_p\lambda_p'\end{pmatrix} G_2, \tag{C.11}
$$

which combined with Assumption 3.3.(ii) and (3.11) implies that

$$
G_2'\lambda_j\lambda_j'\,\theta_*^{0} = 0_{n-1} \ \text{ for any } j \in \{1, \dots, p\}. \tag{C.12}
$$

As λ_j'G_2 ≠ 0'_{n−1} for any j ∈ {1, …, p}, (C.12) implies that λ_j'θ_*^0 = 0 for any j ∈ {1, …, p}. This is impossible because, by the full rank of Λ, λ_j'θ_*^0 = 0 for any j ∈ {1, …, p} implies that θ_*^0 = 0, which contradicts the unit cost restriction imposed on θ_*^0.


D Asymptotic Properties of the GMM Estimators and J-tests in Section 3

In this section we investigate the new GMM estimation and tests for the common CH features proposed in Section 3. Recall that we have defined

$$
m_t(\theta) = \begin{pmatrix}\psi_t(\theta)\\ g_t(\theta)\end{pmatrix}
= \begin{pmatrix}(z_t-\mu_z)\,\theta_*'\left[Y_{t+1}Y_{t+1}' - \mathrm{E}(Y_{t+1}Y_{t+1}')\right]\theta_*\\ ((z_t-\mu_z)\otimes I_p)\,G_2'\left[Y_{t+1}Y_{t+1}' - \mathrm{E}(Y_{t+1}Y_{t+1}')\right]\theta_*\end{pmatrix}, \tag{D.1}
$$

where z_t is an H×1 vector of instrumental variables and µ_z denotes its mean. The existence of the common CH features implies

$$
\mathrm{E}\left[m_t(\theta_0)\right] = 0 \ \text{ for some } \theta_0 \in \Theta. \tag{D.2}
$$

By definition,

$$
H = \frac{\partial \mathrm{E}\left[g_t(\theta)\right]}{\partial\theta'}
= \mathrm{E}\left[((z_t-\mu_z)\otimes I_p)\,G_2'\left(Y_{t+1}Y_{t+1}' - \mathrm{E}(Y_{t+1}Y_{t+1}')\right)G_2\right], \tag{D.3}
$$

which implies that H is an Hp×p real matrix that does not depend on θ. We first show the consistency of H_T.

Lemma D.1 Under Assumption 3.4, we have H_T − H = O_p(T^{-1/2}).

Proof of Lemma D.1. It is clear that

$$
\begin{aligned}
H_T &= T^{-1}\sum_{t=1}^T ((z_t-\bar{z})\otimes I_p)\,G_2' Y_{t+1}Y_{t+1}' G_2\\
&= T^{-1}\sum_{t=1}^T ((z_t-\mu_z)\otimes I_p)\,G_2'\left(Y_{t+1}Y_{t+1}' - \mathrm{E}(Y_{t+1}Y_{t+1}')\right)G_2\\
&\qquad - ((\bar{z}-\mu_z)\otimes I_p)\,G_2'\,\frac{\sum_{t=1}^T\left[Y_{t+1}Y_{t+1}' - \mathrm{E}(Y_{t+1}Y_{t+1}')\right]}{T}\,G_2\\
&= T^{-1}\sum_{t=1}^T ((z_t-\mu_z)\otimes I_p)\,G_2'\left(Y_{t+1}Y_{t+1}' - \mathrm{E}(Y_{t+1}Y_{t+1}')\right)G_2 + O_p(T^{-1})\\
&= \mathrm{E}\left[((z_t-\mu_z)\otimes I_p)\,G_2'\left(Y_{t+1}Y_{t+1}' - \mathrm{E}(Y_{t+1}Y_{t+1}')\right)G_2\right] + O_p(T^{-\frac{1}{2}}),
\end{aligned}
$$

where the third and the last equalities follow from Assumption 3.4.

Let m(θ) = E[m_t(θ)] for any θ ∈ Θ. The next corollary is useful for the asymptotic theory developed in the next subsections.

Corollary D.1 Under Assumption 3.4 and the compactness of Θ, we have:
(i) sup_{θ∈Θ} ‖T^{-1}∑_{t=1}^T m̂_t(θ) − m(θ)‖ = O_p(T^{-1/2}), where m̂_t(θ) denotes the sample analog of m_t(θ) with µ_z and E(Y_{t+1}Y_{t+1}') replaced by their sample counterparts;
(ii) sup_{θ∈Θ} ‖T^{-1}∑_{t=1}^T ∂m̂_t(θ)/∂θ' − ∂m(θ)/∂θ'‖ = O_p(T^{-1/2});
(iii) m(θ) and ∂m(θ)/∂θ' are continuous in θ;


(iv) T^{-1/2}∑_{t=1}^T m̂_t(θ_0) →_d N(0, Ω_m), where Ω_m = lim_{T→∞} Var[T^{-1/2}∑_{t=1}^T m_t(θ_0)].

Proof of Corollary D.1. First, we note that

$$
\frac{\sum_{t=1}^T \widehat{\psi}_t(\theta)}{\sqrt{T}} = \frac{\sum_{t=1}^T\left[(z_t-\bar{z})\,\theta_*' Y_{t+1}Y_{t+1}'\theta_*\right]}{\sqrt{T}}, \tag{D.4}
$$

where ψ̂_t(θ) and ĝ_t(θ) denote the sample analogs of ψ_t(θ) and g_t(θ) with µ_z and E(Y_{t+1}Y_{t+1}') replaced by their sample counterparts. For h = 1, …, H, we use z_{h,t} to denote the h-th component of z_t, and z̄_h and µ_{h,z} to denote its sample average and population mean, respectively. Then we can write

$$
\frac{\sum_{t=1}^T\left[(z_{h,t}-\bar{z}_h)\,\theta_*' Y_{t+1}Y_{t+1}'\theta_*\right]}{\sqrt{T}} = \theta_*'\,\frac{\sum_{t=1}^T\left[(z_{h,t}-\bar{z}_h)\,Y_{t+1}Y_{t+1}'\right]}{\sqrt{T}}\,\theta_*, \tag{D.5}
$$

where

$$
\begin{aligned}
\frac{\sum_{t=1}^T\left[(z_{h,t}-\bar{z}_h)\,Y_{t+1}Y_{t+1}'\right]}{\sqrt{T}}
&= \frac{\sum_{t=1}^T (z_{h,t}-\mu_{h,z})\left[Y_{t+1}Y_{t+1}' - \mathrm{E}(Y_{t+1}Y_{t+1}')\right]}{\sqrt{T}}\\
&\qquad - (\bar{z}_h-\mu_{h,z})\,\frac{\sum_{t=1}^T\left[Y_{t+1}Y_{t+1}' - \mathrm{E}(Y_{t+1}Y_{t+1}')\right]}{\sqrt{T}}\\
&= \frac{\sum_{t=1}^T (z_{h,t}-\mu_{h,z})\left[Y_{t+1}Y_{t+1}' - \mathrm{E}(Y_{t+1}Y_{t+1}')\right]}{\sqrt{T}} + O_p(T^{-\frac{1}{2}})
\end{aligned} \tag{D.6}
$$

by Assumption 3.4. Using (D.5), (D.6) and the compactness assumption on Θ, we have

$$
\frac{\sum_{t=1}^T\left[(z_{h,t}-\bar{z}_h)\,\theta_*' Y_{t+1}Y_{t+1}'\theta_*\right]}{\sqrt{T}}
= \frac{\sum_{t=1}^T (z_{h,t}-\mu_{h,z})\left\{(\theta_*' Y_{t+1})^2 - \mathrm{E}\left[(\theta_*' Y_{t+1})^2\right]\right\}}{\sqrt{T}} + O_p(T^{-\frac{1}{2}})
$$

uniformly over Θ for any h = 1, …, H, which together with (D.4), the fact that H is a fixed integer, and the definition of ψ_t(θ) implies that

$$
\frac{\sum_{t=1}^T \widehat{\psi}_t(\theta)}{\sqrt{T}} = \frac{\sum_{t=1}^T \psi_t(\theta)}{\sqrt{T}} + O_p(T^{-\frac{1}{2}}) \tag{D.7}
$$

uniformly over θ ∈ Θ. By the same arguments,

$$
\begin{aligned}
\frac{\sum_{t=1}^T \widehat{g}_t(\theta)}{\sqrt{T}}
&= \frac{\sum_{t=1}^T ((z_t-\bar{z})\otimes I_p)\,G_2' Y_{t+1}Y_{t+1}'\theta_*}{\sqrt{T}}\\
&= \frac{\sum_{t=1}^T ((z_t-\mu_z)\otimes I_p)\,G_2'\left\{Y_{t+1}Y_{t+1}'\theta_* - \mathrm{E}\left[Y_{t+1}Y_{t+1}'\theta_*\right]\right\}}{\sqrt{T}} + O_p\left(T^{-\frac{1}{2}}\right)\\
&= \frac{\sum_{t=1}^T g_t(\theta)}{\sqrt{T}} + O_p\left(T^{-\frac{1}{2}}\right)
\end{aligned} \tag{D.8}
$$

uniformly over θ ∈ Θ.


(i) Using (D.7) and (D.8), we have

$$
T^{-1}\sum_{t=1}^T \begin{pmatrix}\widehat{\psi}_t(\theta)\\ \widehat{g}_t(\theta)\end{pmatrix} = \begin{pmatrix}T^{-1}\sum_{t=1}^T \psi_t(\theta)\\ T^{-1}\sum_{t=1}^T g_t(\theta)\end{pmatrix} + O_p\left(T^{-1}\right) \tag{D.9}
$$

uniformly over θ ∈ Θ. For any h = 1, …, H,

$$
\begin{aligned}
\frac{\sum_{t=1}^T (z_{h,t}-\mu_{h,z})\left\{(\theta_*' Y_{t+1})^2 - \mathrm{E}\left[(\theta_*' Y_{t+1})^2\right]\right\}}{T}
&= \theta_*'\,\frac{\sum_{t=1}^T (z_{h,t}-\mu_{h,z})\left[Y_{t+1}Y_{t+1}' - \mathrm{E}(Y_{t+1}Y_{t+1}')\right]}{T}\,\theta_*\\
&= \theta_*'\,\mathrm{E}\left[(z_{h,t}-\mu_{h,z})\left\{Y_{t+1}Y_{t+1}' - \mathrm{E}\left[Y_{t+1}Y_{t+1}'\right]\right\}\right]\theta_* + O_p(T^{-\frac{1}{2}})
\end{aligned} \tag{D.10}
$$

uniformly over θ ∈ Θ, where the second equality is by Assumption 3.4 and the compactness of Θ. (D.10) implies that

$$
\frac{\sum_{t=1}^T \psi_t(\theta)}{T} = \mathrm{E}\left[(z_t-\mu_z)\left\{(\theta_*' Y_{t+1})^2 - \mathrm{E}\left[(\theta_*' Y_{t+1})^2\right]\right\}\right] + O_p(T^{-\frac{1}{2}})
= \mathrm{E}\left[\psi_t(\theta)\right] + O_p(T^{-\frac{1}{2}}) = \psi(\theta) + O_p(T^{-\frac{1}{2}}) \tag{D.11}
$$

uniformly over θ ∈ Θ. By the same arguments, we can show that

$$
\frac{\sum_{t=1}^T g_t(\theta)}{T} = \frac{\sum_{t=1}^T ((z_t-\mu_z)\otimes I_p)\,G_2'\left\{Y_{t+1}Y_{t+1}' - \mathrm{E}\left[Y_{t+1}Y_{t+1}'\right]\right\}}{T}\,\theta_*
= \mathrm{E}\left[g_t(\theta)\right] + O_p(T^{-\frac{1}{2}}) = g(\theta) + O_p(T^{-\frac{1}{2}}) \tag{D.12}
$$

uniformly over θ ∈ Θ. Collecting the results in (D.9), (D.11) and (D.12), we get

$$
T^{-1}\sum_{t=1}^T \begin{pmatrix}\widehat{\psi}_t(\theta)\\ \widehat{g}_t(\theta)\end{pmatrix} = \begin{pmatrix}\psi(\theta)\\ g(\theta)\end{pmatrix} + O_p(T^{-\frac{1}{2}}) \tag{D.13}
$$

uniformly over θ ∈ Θ. This proves the claim in (i).

(ii) By definition,

$$
T^{-1}\sum_{t=1}^T \frac{\partial \widehat{m}_t(\theta)}{\partial\theta'}
= \begin{pmatrix}\frac{\sum_{t=1}^T (z_t-\bar{z})\,\theta_*' Y_{t+1}Y_{t+1}'}{T}\\[4pt] \frac{\sum_{t=1}^T ((z_t-\bar{z})\otimes I_p)\,G_2' Y_{t+1}Y_{t+1}' G_2}{T}\end{pmatrix}
= \begin{pmatrix}\frac{\sum_{t=1}^T (z_t-\bar{z})\,\theta_*' Y_{t+1}Y_{t+1}'}{T}\\[4pt] H_T\end{pmatrix}, \tag{D.14}
$$

where we have proved that H_T = H + O_p(T^{-1/2}), and neither H_T nor H depends on θ. Hence, it is sufficient to show that

$$
\frac{1}{T}\sum_{t=1}^T\left[(z_t-\bar{z})\,\theta_*' Y_{t+1}Y_{t+1}'\right] = \mathrm{E}\left[(z_t-\mu_z)\,\theta_*' Y_{t+1}Y_{t+1}'\right] + O_p(T^{-\frac{1}{2}}) \tag{D.15}
$$


uniformly over θ ∈ Θ. Note that

$$
\frac{\sum_{t=1}^T (z_t-\bar{z})\,\theta_*' Y_{t+1}Y_{t+1}'}{T} = \frac{\sum_{t=1}^T (z_t-\mu_z)\,\theta_*' Y_{t+1}Y_{t+1}'}{T} - \frac{(\bar{z}-\mu_z)\sum_{t=1}^T \theta_*' Y_{t+1}Y_{t+1}'}{T} \tag{D.16}
$$

for any θ ∈ Θ. Under Assumption 3.4,

$$
\bar{z}-\mu_z = O_p(T^{-\frac{1}{2}}) \quad\text{and}\quad T^{-1}\sum_{t=1}^T Y_{t+1}Y_{t+1}' = \mathrm{E}\left[Y_{t+1}Y_{t+1}'\right] + O_p(T^{-\frac{1}{2}}), \tag{D.17}
$$

which combined with the compactness of Θ implies

$$
\sup_{\theta\in\Theta}\,\frac{(\bar{z}-\mu_z)\,\theta_*'\sum_{t=1}^T Y_{t+1}Y_{t+1}'}{T} = O_p(T^{-\frac{1}{2}}). \tag{D.18}
$$

For any h = 1, …, H,

$$
\frac{\sum_{t=1}^T (z_{h,t}-\mu_{h,z})\,\theta_*' Y_{t+1}Y_{t+1}'}{T} = \theta_*'\,\frac{\sum_{t=1}^T (z_{h,t}-\mu_{h,z})\,Y_{t+1}Y_{t+1}'}{T}. \tag{D.19}
$$

Under Assumption 3.4,

$$
\frac{\sum_{t=1}^T (z_{h,t}-\mu_{h,z})\,Y_{t+1}Y_{t+1}'}{T} = \mathrm{E}\left[(z_{h,t}-\mu_{h,z})\,Y_{t+1}Y_{t+1}'\right] + O_p(T^{-\frac{1}{2}}). \tag{D.20}
$$

Using (D.19), (D.20) and the compactness of Θ, we have

$$
\frac{\sum_{t=1}^T (z_{h,t}-\mu_{h,z})\,\theta_*' Y_{t+1}Y_{t+1}'}{T} = \mathrm{E}\left[(z_{h,t}-\mu_{h,z})\,\theta_*' Y_{t+1}Y_{t+1}'\right] + O_p(T^{-\frac{1}{2}}) \tag{D.21}
$$

uniformly over θ ∈ Θ. Collecting the results in (D.16), (D.18) and (D.21), we immediately get (D.15).

(iii) By definition,

$$
\psi(\theta) = \mathrm{E}\left[(z_t-\mu_z)\,\theta_*'\left[Y_{t+1}Y_{t+1}' - \mathrm{E}(Y_{t+1}Y_{t+1}')\right]\theta_*\right].
$$

For any h = 1, …, H,

$$
\mathrm{E}\left[(z_{h,t}-\mu_{h,z})\,\theta_*'\left[Y_{t+1}Y_{t+1}' - \mathrm{E}(Y_{t+1}Y_{t+1}')\right]\theta_*\right] = \theta_*'\,\mathrm{Cov}(z_{h,t},\,Y_{t+1}Y_{t+1}')\,\theta_*,
$$

where Cov(z_{h,t}, Y_{t+1}Y_{t+1}') is a finite real matrix under Assumption 3.4, which implies that ψ(θ) and ∂ψ(θ)/∂θ' are continuous in θ. By the definition of m(θ), we know that m(θ) is continuous in θ. Moreover, ∂g(θ)/∂θ' = H, which does not depend on θ. Hence we know that ∂m(θ)/∂θ' is also continuous in θ.


(iv) Using (D.9), we have

$$
T^{-\frac{1}{2}}\sum_{t=1}^T \widehat{m}_t(\theta_0) = \begin{pmatrix}T^{-\frac{1}{2}}\sum_{t=1}^T \psi_t(\theta_0)\\ T^{-\frac{1}{2}}\sum_{t=1}^T g_t(\theta_0)\end{pmatrix} + O_p(T^{-\frac{1}{2}})
= T^{-\frac{1}{2}}\sum_{t=1}^T m_t(\theta_0) + O_p(T^{-\frac{1}{2}}), \tag{D.22}
$$

which together with (D.2) and Assumption 3.4 implies the claim in (iv).

The asymptotic variance-covariance matrix Ω_m can be partitioned as

$$
\Omega_m = \begin{pmatrix}\Omega_\psi & \Omega_{\psi g}\\ \Omega_{g\psi} & \Omega_g\end{pmatrix}, \quad\text{where}
$$

$$
\Omega_\psi = \lim_{T\to\infty}\mathrm{Var}\left[T^{-\frac{1}{2}}\sum_{t=1}^T \psi_t(\theta_0)\right], \qquad
\Omega_g = \lim_{T\to\infty}\mathrm{Var}\left[T^{-\frac{1}{2}}\sum_{t=1}^T g_t(\theta_0)\right],
$$

$$
\Omega_{\psi g} = \lim_{T\to\infty}\mathrm{E}\left[T^{-1}\sum_{t=1}^T\sum_{s=1}^T \psi_t(\theta_0)\,g_s'(\theta_0)\right] = \Omega_{g\psi}'.
$$

D.1 GMM Estimator and Limit Theory

In this subsection, the weight matrix Wj,T for j = m, g or g∗ are known random weight matrixwith probability limits Wj for j = m, g or g∗, respectively, where Wj are positive definite matrices.Recall that the stacked/full GMM estimator is defined as

θm,T = arg minθ∈Θ

[T−1

T∑t=1

mt (θ)

]′Wm,T

[T−1

T∑t=1

mt (θ)

](D.23)

where Wm,T is an H (p+ 1)×H (p+ 1) weight matrix.Note that Assumptions 2.1.(i) and B.1 have been verified in Corollary D.1. We assume the

consistency of Wm,T , which implies that Assumption 2.1.(ii) holds. Moreover, we have shown thatH has full rank in Lemma 3.2. Hence, the following result is directly from Proposition 2.1(i).

Theorem D.1 Under Assumptions 3.1-3.6, the GMM estimator $\widehat{\theta}_{m,T}$ satisfies
\[
T^{1/2}(\widehat{\theta}_{m,T} - \theta_0) \to_d N(0, \Sigma_{\theta,m}),
\]
where $\Sigma_{\theta,m} \equiv (H'W_{m,22}H)^{-1} M_\theta' W_m \Omega_m W_m M_\theta (H'W_{m,22}H)^{-1}$, and $W_{m,22}$ denotes the last $Hp\times Hp$ submatrix of $W_m$. Moreover, with the choice of $W_{m,T} = \widehat{\Omega}_{m,T}^{-1} = \Omega_m^{-1} + o_p(1)$, we have
\[
\Sigma_{\theta,m}^{-1} = H'\big(\Omega_m^{-1}\big)_{22} H,
\]
where $\big(\Omega_m^{-1}\big)_{22}$ denotes the last $Hp\times Hp$ submatrix of $\Omega_m^{-1}$.
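The block partition of $\Omega_m$ and the efficient-variance formula of Theorem D.1 amount to simple block operations on matrices. The following is a minimal numerical sketch; the dimensions ($H = 3$, $p = 2$) and all matrices are hypothetical stand-ins, not quantities from the paper's data.

```python
import numpy as np

# Hypothetical dimensions: H = 3 instruments, p = 2 parameters, so m_t stacks
# psi_t (H x 1) and g_t (Hp x 1) into an H(p+1) x 1 moment vector.
H_dim, p = 3, 2
k = H_dim * (p + 1)          # total number of moments
kg = H_dim * p               # number of Jacobian-based moments

rng = np.random.default_rng(0)
A = rng.standard_normal((k, k))
Omega_m = A @ A.T + k * np.eye(k)   # positive definite stand-in for Omega_m

# Partition conformably with (psi_t, g_t): Omega_psi is H x H, Omega_g is Hp x Hp.
Omega_psi = Omega_m[:H_dim, :H_dim]
Omega_g = Omega_m[H_dim:, H_dim:]

# Efficient stacked-GMM variance from Theorem D.1:
# Sigma_{theta,m}^{-1} = H' (Omega_m^{-1})_{22} H, with (Omega_m^{-1})_{22}
# the last Hp x Hp block of Omega_m^{-1}. H_mat is a stand-in for H.
H_mat = rng.standard_normal((kg, p))
Omega_m_inv_22 = np.linalg.inv(Omega_m)[H_dim:, H_dim:]
Sigma_theta_m = np.linalg.inv(H_mat.T @ Omega_m_inv_22 @ H_mat)
```

Note that $(\Omega_m^{-1})_{22}$ is a block of the inverse, not the inverse of the block $\Omega_g$; the two differ whenever $\Omega_{\psi g} \neq 0$.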


We next consider the GMM estimator using the Jacobian-based moment restrictions only, i.e.,
\[
\widehat{\theta}_{g,T} = -\big(\widehat{H}_T'W_{g,T}\widehat{H}_T\big)^{-1}\widehat{H}_T'W_{g,T}\widehat{S}_T, \quad (D.24)
\]
where
\[
\widehat{S}_T \equiv T^{-1}\sum_{t=1}^T ((z_t - \bar{z})\otimes I_p)\,G_2'Y_{t+1}Y_{t+1}'\,l_n. \quad (D.25)
\]
Intuitively, one could use the same arguments as in the proof of Theorem 2.1 to derive the asymptotic normality of $\widehat{\theta}_{g,T}$. However, the closed-form expression of $\widehat{\theta}_{g,T}$ makes the compactness assumption on $\Theta$ unnecessary. In the following, we give a different and more straightforward proof of the $\sqrt{T}$-normality of $\widehat{\theta}_{g,T}$. By the parametrization $\theta_* = G_2\theta + l_n$, $\theta_*$ is automatically in $\mathcal{N}$ for any $\theta \in \mathbb{R}^{n-1}$. Given the closed-form expression of $\widehat{\theta}_{g,T}$, we can allow $\Theta = \mathbb{R}^{n-1}$, so that $\theta_0$ is in the interior of $\Theta$ and Assumption 3.6 is automatically satisfied.
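Because (D.24) is a closed form, the estimator is a single linear solve rather than a numerical optimization. A minimal sketch with hypothetical inputs: by the decomposition in the proof below, in the noiseless limit where $\widehat{S}_T = -\widehat{H}_T\theta_0$ exactly, the formula must return $\theta_0$.

```python
import numpy as np

# Sketch of the closed-form Jacobian-GMM estimator in (D.24):
#   theta_hat = -(H_T' W H_T)^{-1} H_T' W S_T.
# All inputs below are hypothetical stand-ins, not the paper's data.
rng = np.random.default_rng(1)
Hp, p = 6, 2                          # Hp moment rows, p parameters

H_T = rng.standard_normal((Hp, p))    # stand-in for hat{H}_T (full column rank)
W = np.eye(Hp)                        # any positive definite weight matrix
theta0 = np.array([0.5, -1.0])

# In the noiseless limit, S_T = -H_T @ theta0 (no sampling-error term),
# so the closed form should recover theta0 exactly.
S_T = -H_T @ theta0
theta_hat = -np.linalg.solve(H_T.T @ W @ H_T, H_T.T @ W @ S_T)
```

With sampling noise added to `S_T`, the same line yields the $\sqrt{T}$-consistent estimator whose limit theory Theorem D.2 establishes.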

Theorem D.2 Under Assumptions 3.1-3.4 and (3.4), the Jacobian-GMM estimator $\widehat{\theta}_{g,T}$ satisfies
\[
T^{1/2}(\widehat{\theta}_{g,T} - \theta_0) \to_d N(0, \Sigma_{\theta,g}),
\]
where $\Sigma_{\theta,g} \equiv (H'W_gH)^{-1}H'W_g\Omega_gW_gH(H'W_gH)^{-1}$. Moreover, with the choice of $W_{g,T} = \widehat{\Omega}_{g,T}^{-1} = \Omega_g^{-1} + o_p(1)$, we have
\[
\Sigma_{\theta,g} = \big(H'\Omega_g^{-1}H\big)^{-1}.
\]

Proof of Theorem D.2. By definition, $l_n = \theta_*^0 - G_2\theta_0$, which means that
\[
\widehat{S}_T = T^{-1}\sum_{t=1}^T ((z_t-\bar{z})\otimes I_p)G_2'Y_{t+1}Y_{t+1}'\theta_*^0
- T^{-1}\sum_{t=1}^T ((z_t-\bar{z})\otimes I_p)G_2'Y_{t+1}Y_{t+1}'G_2\theta_0
= -\widehat{H}_T\theta_0 + T^{-1}\sum_{t=1}^T \widehat{g}_t(\theta_0). \quad (D.26)
\]


It is clear that
\[
\frac{\sum_{t=1}^T \widehat{g}_t(\theta_0)}{T}
= \frac{\sum_{t=1}^T ((z_t-\bar{z})\otimes I_p)G_2'Y_{t+1}Y_{t+1}'\theta_*^0}{T}
\]
\[
= \frac{\sum_{t=1}^T ((z_t-\mu_z)\otimes I_p)G_2'Y_{t+1}Y_{t+1}'\theta_*^0}{T}
- ((\bar{z}-\mu_z)\otimes I_p)\frac{\sum_{t=1}^T G_2'Y_{t+1}Y_{t+1}'\theta_*^0}{T}
\]
\[
= \frac{\sum_{t=1}^T ((z_t-\mu_z)\otimes I_p)G_2'Y_{t+1}Y_{t+1}'\theta_*^0}{T}
- ((\bar{z}-\mu_z)\otimes I_p)G_2'E(Y_{t+1}Y_{t+1}')\theta_*^0 + O_p(T^{-1})
\]
\[
= \frac{\sum_{t=1}^T ((z_t-\mu_z)\otimes I_p)G_2'\big[Y_{t+1}Y_{t+1}' - E(Y_{t+1}Y_{t+1}')\big]\theta_*^0}{T} + O_p(T^{-1}), \quad (D.27)
\]
where the third equality is by Assumption 3.4. (D.27) implies that
\[
\frac{\sum_{t=1}^T \widehat{g}_t(\theta_0)}{\sqrt{T}} = \frac{\sum_{t=1}^T g_t(\theta_0)}{\sqrt{T}} + O_p(T^{-1/2}). \quad (D.28)
\]
Using Lemma D.1, the consistency of $W_{g,T}$, (D.28) and $T^{-1/2}\sum_{t=1}^T g_t(\theta_0) = O_p(1)$, we have
\[
T^{1/2}(\widehat{\theta}_{g,T}-\theta_0)
= -\big(\widehat{H}_T'W_{g,T}\widehat{H}_T\big)^{-1}\widehat{H}_T'W_{g,T}\Big[T^{-1/2}\sum_{t=1}^T \widehat{g}_t(\theta_0)\Big]
\]
\[
= -(H'W_gH)^{-1}H'W_g\Big[T^{-1/2}\sum_{t=1}^T g_t(\theta_0)\Big] + o_p(1) \quad (D.29)
\]
\[
\to_d N(0, \Sigma_{\theta,g}),
\]
where $H'W_gH$ is invertible because $H$ has full rank (as proved in Lemma 3.2) and $W_g$ is a positive definite matrix; the weak convergence is by Assumption 3.4 and the Continuous Mapping Theorem (CMT). The second claim is trivial and is therefore omitted.
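Theorem D.2 contrasts the sandwich variance for a generic $W_g$ with the efficient variance obtained at $W_{g,T} = \widehat{\Omega}_{g,T}^{-1}$. The standard GMM efficiency ordering, namely that the sandwich never falls below $(H'\Omega_g^{-1}H)^{-1}$, can be checked numerically; the matrices below are hypothetical stand-ins.

```python
import numpy as np

# Numerical check (with hypothetical matrices) of the GMM efficiency ordering:
#   (H'WH)^{-1} H'W Omega_g W H (H'WH)^{-1}  >=  (H' Omega_g^{-1} H)^{-1}
# in the positive semi-definite sense, for any positive definite W.
rng = np.random.default_rng(2)
Hp, p = 6, 2
H = rng.standard_normal((Hp, p))          # stand-in Jacobian, full column rank
A = rng.standard_normal((Hp, Hp))
Omega_g = A @ A.T + Hp * np.eye(Hp)       # positive definite stand-in
W = np.diag(rng.uniform(0.5, 2.0, Hp))    # an arbitrary PD weight matrix

HWH_inv = np.linalg.inv(H.T @ W @ H)
Sigma_sandwich = HWH_inv @ H.T @ W @ Omega_g @ W @ H @ HWH_inv
Sigma_efficient = np.linalg.inv(H.T @ np.linalg.inv(Omega_g) @ H)
gap = Sigma_sandwich - Sigma_efficient    # should be positive semi-definite
```

Equality holds exactly when $W \propto \Omega_g^{-1}$, which is why the efficient choice collapses the sandwich to $(H'\Omega_g^{-1}H)^{-1}$.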

The GMM estimator $\widehat{\theta}_{g^*,T}$ based on the modified moment functions is defined as
\[
\widehat{\theta}_{g^*,T} = -\big(\widehat{H}_T'W_{g^*,T}\widehat{H}_T\big)^{-1}\widehat{H}_T'W_{g^*,T}\big(\widehat{S}_T - \widehat{F}_T\big), \quad (D.30)
\]
where $\widehat{F}_T = \widehat{\Omega}_{g\psi,T}\widehat{\Omega}_{\psi,T}^{-1}\widehat{A}_T$; $\widehat{\Omega}_{g\psi,T}$ and $\widehat{\Omega}_{\psi,T}$ are consistent estimators of $\Omega_{g\psi}$ and $\Omega_\psi$ respectively; and
\[
\widehat{A}_T = T^{-1}\sum_{t=1}^T (z_t-\bar{z})\big[(G_2\widehat{\theta}_{g,T}+l_n)'Y_{t+1}Y_{t+1}'(G_2\widehat{\theta}_{g,T}+l_n)\big]. \quad (D.31)
\]
We next present the limit theory of the GMM estimator $\widehat{\theta}_{g^*,T}$.

Theorem D.3 Under Assumptions 3.1-3.4 and Condition (3.4), the modified GMM estimator $\widehat{\theta}_{g^*,T}$ satisfies
\[
T^{1/2}(\widehat{\theta}_{g^*,T}-\theta_0) \to_d N\Big(0, \big(H'(\Omega_m^{-1})_{22}H\big)^{-1}\Big)
\]


with the choice of $W_{g^*,T} = \big(\widehat{\Omega}_m^{-1}\big)_{22} + o_p(1)$, where $\big(\Omega_m^{-1}\big)_{22}$ denotes the last $pH\times pH$ submatrix of $\Omega_m^{-1}$.

Proof of Theorem D.3. By definition,
\[
\widehat{A}_T = T^{-1}\sum_{t=1}^T (z_t-\mu_z)\big[(G_2\widehat{\theta}_{g,T}+l_n)'Y_{t+1}Y_{t+1}'(G_2\widehat{\theta}_{g,T}+l_n)\big]
- (\bar{z}-\mu_z)(G_2\widehat{\theta}_{g,T}+l_n)'\frac{\sum_{t=1}^T Y_{t+1}Y_{t+1}'}{T}(G_2\widehat{\theta}_{g,T}+l_n). \quad (D.32)
\]
It is clear that
\[
(\bar{z}-\mu_z)(G_2\widehat{\theta}_{g,T}+l_n)'\frac{\sum_{t=1}^T Y_{t+1}Y_{t+1}'}{T}(G_2\widehat{\theta}_{g,T}+l_n)
= (\bar{z}-\mu_z)(G_2\widehat{\theta}_{g,T}+l_n)'E\big[Y_{t+1}Y_{t+1}'\big](G_2\widehat{\theta}_{g,T}+l_n) + O_p(T^{-1})
\]
\[
= (\bar{z}-\mu_z)(G_2\theta_0+l_n)'E\big[Y_{t+1}Y_{t+1}'\big](G_2\theta_0+l_n) + O_p(T^{-1})
= (\bar{z}-\mu_z)\,\theta_*^{0\prime}E\big[Y_{t+1}Y_{t+1}'\big]\theta_*^0 + O_p(T^{-1}), \quad (D.33)
\]
where the first and second equalities are by Assumption 3.4 and the $\sqrt{T}$-consistency of $\widehat{\theta}_{g,T}$. For any $h = 1, \ldots, H$,
\[
T^{-1}\sum_{t=1}^T (z_{h,t}-\mu_{h,z})\big[(G_2\widehat{\theta}_{g,T}+l_n)'Y_{t+1}Y_{t+1}'(G_2\widehat{\theta}_{g,T}+l_n)\big]
= (G_2\widehat{\theta}_{g,T}+l_n)'\frac{\sum_{t=1}^T (z_{h,t}-\mu_{h,z})Y_{t+1}Y_{t+1}'}{T}(G_2\widehat{\theta}_{g,T}+l_n)
\]
\[
= (G_2\theta_0+l_n)'\frac{\sum_{t=1}^T (z_{h,t}-\mu_{h,z})Y_{t+1}Y_{t+1}'}{T}(G_2\theta_0+l_n)
+ 2(\widehat{\theta}_{g,T}-\theta_0)'G_2'\frac{\sum_{t=1}^T (z_{h,t}-\mu_{h,z})Y_{t+1}Y_{t+1}'}{T}(G_2\theta_0+l_n)
\]
\[
+ (\widehat{\theta}_{g,T}-\theta_0)'G_2'\frac{\sum_{t=1}^T (z_{h,t}-\mu_{h,z})Y_{t+1}Y_{t+1}'}{T}G_2(\widehat{\theta}_{g,T}-\theta_0)
\]
\[
= \theta_*^{0\prime}\frac{\sum_{t=1}^T (z_{h,t}-\mu_{h,z})Y_{t+1}Y_{t+1}'}{T}\theta_*^0 + O_p(T^{-1}), \quad (D.34)
\]
where the last equality is by the $\sqrt{T}$-consistency of $\widehat{\theta}_{g,T}$, Assumption 3.4 and
\[
\frac{\sum_{t=1}^T (z_{h,t}-\mu_{h,z})Y_{t+1}Y_{t+1}'}{T}(G_2\theta_0+l_n)
= \frac{\sum_{t=1}^T (z_{h,t}-\mu_{h,z})Y_{t+1}Y_{t+1}'\theta_*^0}{T} = O_p(T^{-1/2}).
\]


Combining the results in (D.32), (D.33) and (D.34), we get
\[
\widehat{A}_T = \frac{\sum_{t=1}^T (z_t-\mu_z)\,\theta_*^{0\prime}\big[Y_{t+1}Y_{t+1}' - E(Y_{t+1}Y_{t+1}')\big]\theta_*^0}{T} + O_p(T^{-1})
= \frac{1}{T}\sum_{t=1}^T \psi_t(\theta_0) + O_p(T^{-1}). \quad (D.35)
\]
As $\frac{1}{T}\sum_{t=1}^T \psi_t(\theta_0) = O_p(T^{-1/2})$ and $\widehat{\Omega}_{g\psi,T}$, $\widehat{\Omega}_{\psi,T}$ are consistent, we have
\[
\widehat{F}_T = \widehat{\Omega}_{g\psi,T}\widehat{\Omega}_{\psi,T}^{-1}\widehat{A}_T
= \frac{\Omega_{g\psi}\Omega_\psi^{-1}}{T}\sum_{t=1}^T \psi_t(\theta_0) + o_p(T^{-1/2}), \quad (D.36)
\]
which together with (D.26) and (D.28) implies that
\[
\widehat{S}_T - \widehat{F}_T = -\widehat{H}_T\theta_0 + \frac{1}{T}\sum_{t=1}^T \big[g_t(\theta_0) - \Omega_{g\psi}\Omega_\psi^{-1}\psi_t(\theta_0)\big] + o_p(T^{-1/2}). \quad (D.37)
\]
Using (D.30) and (D.37), we get
\[
\widehat{\theta}_{g^*,T} = \theta_0 - \frac{\big(\widehat{H}_T'W_{g^*,T}\widehat{H}_T\big)^{-1}\widehat{H}_T'W_{g^*,T}}{T}\sum_{t=1}^T \big[g_t(\theta_0) - \Omega_{g\psi}\Omega_\psi^{-1}\psi_t(\theta_0)\big] + o_p(T^{-1/2})
\]
\[
= \theta_0 - \frac{(H'W_{g^*}H)^{-1}H'W_{g^*}}{T}\sum_{t=1}^T \big[g_t(\theta_0) - \Omega_{g\psi}\Omega_\psi^{-1}\psi_t(\theta_0)\big] + o_p(T^{-1/2}), \quad (D.38)
\]
where the second equality is by the consistency of $W_{g^*,T}$, Lemma D.1 and
\[
T^{-1}\sum_{t=1}^T \big[g_t(\theta_0) - \Omega_{g\psi}\Omega_\psi^{-1}\psi_t(\theta_0)\big] = O_p(T^{-1/2}).
\]
From (D.38), we deduce that
\[
\sqrt{T}(\widehat{\theta}_{g^*,T}-\theta_0) = -\frac{(H'W_{g^*}H)^{-1}H'W_{g^*}}{\sqrt{T}}\sum_{t=1}^T \big[g_t(\theta_0) - \Omega_{g\psi}\Omega_\psi^{-1}\psi_t(\theta_0)\big] + o_p(1)
\to_d N\Big(0, \big(H'(\Omega_m^{-1})_{22}H\big)^{-1}\Big),
\]
where the weak convergence is by Assumption 3.4 and the CMT.
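The proof works with the residualized moments $g_t(\theta_0) - \Omega_{g\psi}\Omega_\psi^{-1}\psi_t(\theta_0)$, whose long-run variance is the Schur complement $\Omega_g - \Omega_{g\psi}\Omega_\psi^{-1}\Omega_{\psi g}$; the limiting variance in Theorem D.3 instead uses $(\Omega_m^{-1})_{22}$. The two are tied together by the block-inverse identity $\big((\Omega_m^{-1})_{22}\big)^{-1} = \Omega_g - \Omega_{g\psi}\Omega_\psi^{-1}\Omega_{\psi g}$, which can be checked numerically on a hypothetical $\Omega_m$.

```python
import numpy as np

# Check of the block-inverse (Schur complement) identity underlying the
# efficiency claim: ((Omega_m^{-1})_{22})^{-1}
#                   = Omega_g - Omega_gpsi Omega_psi^{-1} Omega_psig.
# Omega_m below is a hypothetical positive definite stand-in.
rng = np.random.default_rng(3)
H_dim, Hp = 3, 6
k = H_dim + Hp
A = rng.standard_normal((k, k))
Omega_m = A @ A.T + k * np.eye(k)

Omega_psi = Omega_m[:H_dim, :H_dim]
Omega_psig = Omega_m[:H_dim, H_dim:]
Omega_gpsi = Omega_m[H_dim:, :H_dim]
Omega_g = Omega_m[H_dim:, H_dim:]

schur = Omega_g - Omega_gpsi @ np.linalg.inv(Omega_psi) @ Omega_psig
inv_22 = np.linalg.inv(Omega_m)[H_dim:, H_dim:]
```

This is why subtracting $\widehat{F}_T$ in (D.30) and then weighting by $(\widehat{\Omega}_m^{-1})_{22}$ reproduces the efficiency of the full stacked estimator in Theorem D.1.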

D.2 Over-identification Tests and Limit Theory

Three over-identification tests are considered in Section 2. The first test is based on the J-statistic
\[
J_{m,T} = \Big[T^{-1/2}\sum_{t=1}^T \widehat{m}_t(\widehat{\theta}_{m,T})\Big]'\widehat{\Omega}_m^{-1}\Big[T^{-1/2}\sum_{t=1}^T \widehat{m}_t(\widehat{\theta}_{m,T})\Big], \quad (D.39)
\]


where $\widehat{\Omega}_m$ is a consistent estimator of $\Omega_m$; this statistic tests the validity of the stacked moment conditions in (3.12). The second test is based on the J-statistic
\[
J_{g,T} = \Big[T^{-1/2}\sum_{t=1}^T \widehat{g}_t(\widehat{\theta}^*_{g,T})\Big]'\widehat{\Omega}_g^{-1}\Big[T^{-1/2}\sum_{t=1}^T \widehat{g}_t(\widehat{\theta}^*_{g,T})\Big], \quad (D.40)
\]
where $\widehat{\Omega}_g$ is a consistent estimator of $\Omega_g$ and
\[
\widehat{\theta}^*_{g,T} = -\big(\widehat{H}_T'\widehat{\Omega}_g^{-1}\widehat{H}_T\big)^{-1}\widehat{H}_T'\widehat{\Omega}_g^{-1}\widehat{S}_T;
\]
this statistic tests the validity of the Jacobian moment conditions in (3.11). The test based on
\[
J_{h,T} = \Big[T^{-1/2}\sum_{t=1}^T \widehat{m}_t(\widehat{\theta}_{g,T})\Big]'W_{h,T}\Big[T^{-1/2}\sum_{t=1}^T \widehat{m}_t(\widehat{\theta}_{g,T})\Big] \quad (D.41)
\]
is proposed to test the moment conditions in (3.12), where
\[
\widehat{\theta}_{g,T} = -\big(\widehat{H}_T'W_{g,T}\widehat{H}_T\big)^{-1}\widehat{H}_T'W_{g,T}\widehat{S}_T,
\]
$W_{g,T} = W_g + o_p(1)$ and $W_{h,T} = W_h + o_p(1)$, with $W_g$ and $W_h$ positive definite matrices.

We have derived the $\sqrt{T}$-normality of $\widehat{\theta}_{m,T}$ in Theorem D.1, which together with Corollary D.1 and standard arguments implies the weak convergence in (2.8), i.e.,
\[
J_{m,T} \to_d \chi^2\big(H(p+1)-p\big).
\]
Similarly, the $\sqrt{T}$-normality of $\widehat{\theta}_{g,T}$ derived in Theorem D.2, combined with Corollary D.1, enables us to invoke Theorem 2.2 to show that
\[
J_{h,T} \to_d B_{H(p+1)}'\Omega_m^{1/2}P'W_hP\,\Omega_m^{1/2}B_{H(p+1)},
\]
where $P \equiv I_{H(p+1)} - \mathrm{diag}\big(0_{H\times H},\, H(H'W_gH)^{-1}H'W_g\big)$ and $B_{H(p+1)}$ denotes an $H(p+1)\times 1$ standard normal random vector.

Theorem D.4 Under Assumptions 3.1-3.4 and (3.4), we have
\[
J_{g,T} \to_d \chi^2(Hp - p).
\]


Proof of Theorem D.4. For any $\theta$, we can write
\[
\frac{\sum_{t=1}^T \widehat{g}_t(\theta)}{\sqrt{T}}
= \frac{\sum_{t=1}^T ((z_t-\bar{z})\otimes I_p)G_2'Y_{t+1}Y_{t+1}'\theta_*}{\sqrt{T}}
\]
\[
= \frac{\sum_{t=1}^T ((z_t-\mu_z)\otimes I_p)G_2'Y_{t+1}Y_{t+1}'}{\sqrt{T}}(G_2\theta+l_n)
- ((\bar{z}-\mu_z)\otimes I_p)\frac{\sum_{t=1}^T G_2'Y_{t+1}Y_{t+1}'}{\sqrt{T}}(G_2\theta+l_n). \quad (D.42)
\]

Note that
\[
\frac{\sum_{t=1}^T ((z_t-\mu_z)\otimes I_p)G_2'Y_{t+1}Y_{t+1}'}{\sqrt{T}}(G_2\widehat{\theta}_{g,T}+l_n)
= \frac{\sum_{t=1}^T ((z_t-\mu_z)\otimes I_p)G_2'Y_{t+1}Y_{t+1}'}{\sqrt{T}}(G_2\theta_0+l_n)
\]
\[
+ \frac{\sum_{t=1}^T ((z_t-\mu_z)\otimes I_p)G_2'Y_{t+1}Y_{t+1}'G_2}{T}\sqrt{T}(\widehat{\theta}_{g,T}-\theta_0)
\]
\[
= \frac{\sum_{t=1}^T ((z_t-\mu_z)\otimes I_p)G_2'Y_{t+1}Y_{t+1}'\theta_*^0}{\sqrt{T}}
+ H\sqrt{T}(\widehat{\theta}_{g,T}-\theta_0) + O_p(T^{-1/2}), \quad (D.43)
\]

where the last equality is by the $\sqrt{T}$-consistency of $\widehat{\theta}_{g,T}$ and Lemma D.1. Next, we have
\[
((\bar{z}-\mu_z)\otimes I_p)\frac{\sum_{t=1}^T G_2'Y_{t+1}Y_{t+1}'}{\sqrt{T}}(G_2\widehat{\theta}_{g,T}+l_n)
= ((\bar{z}-\mu_z)\otimes I_p)\frac{\sum_{t=1}^T G_2'Y_{t+1}Y_{t+1}'}{\sqrt{T}}(G_2\theta_0+l_n)
\]
\[
+ ((\bar{z}-\mu_z)\otimes I_p)\frac{\sum_{t=1}^T G_2'Y_{t+1}Y_{t+1}'G_2}{T}\sqrt{T}(\widehat{\theta}_{g,T}-\theta_0)
\]
\[
= \sqrt{T}\,((\bar{z}-\mu_z)\otimes I_p)G_2'E(Y_{t+1}Y_{t+1}')\theta_*^0 + O_p(T^{-1/2}), \quad (D.44)
\]
where the last equality is by the $\sqrt{T}$-consistency of $\widehat{\theta}_{g,T}$ and
\[
\bar{z} - \mu_z = O_p(T^{-1/2}) \quad \text{and} \quad
\frac{\sum_{t=1}^T Y_{t+1}Y_{t+1}'}{T} = E(Y_{t+1}Y_{t+1}') + O_p(T^{-1/2}).
\]

Combining the results in (D.42), (D.43) and (D.44), we have
\[
\frac{\sum_{t=1}^T \widehat{g}_t(\widehat{\theta}_{g,T})}{\sqrt{T}}
= \frac{\sum_{t=1}^T g_t(\theta_0)}{\sqrt{T}} + H\sqrt{T}(\widehat{\theta}_{g,T}-\theta_0) + O_p(T^{-1/2}), \quad (D.45)
\]


which together with (D.29) in the proof of Theorem D.2 implies that
\[
\frac{\sum_{t=1}^T \widehat{g}_t(\widehat{\theta}^*_{g,T})}{\sqrt{T}}
= \Big[\Omega_g^{1/2} - H\big(H'\Omega_g^{-1}H\big)^{-1}H'\Omega_g^{-1/2}\Big]\Big[\Omega_g^{-1/2}T^{-1/2}\sum_{t=1}^T g_t(\theta_0)\Big] + o_p(1). \quad (D.46)
\]

By the consistency of $\widehat{\Omega}_g$ and (D.46), we deduce that
\[
J_{g,T} = \Big[T^{-1/2}\sum_{t=1}^T \widehat{g}_t(\widehat{\theta}^*_{g,T})\Big]'\widehat{\Omega}_g^{-1}\Big[T^{-1/2}\sum_{t=1}^T \widehat{g}_t(\widehat{\theta}^*_{g,T})\Big]
\]
\[
= \Big[\Omega_g^{-1/2}T^{-1/2}\sum_{t=1}^T g_t(\theta_0)\Big]'P_g\Big[\Omega_g^{-1/2}T^{-1/2}\sum_{t=1}^T g_t(\theta_0)\Big] + o_p(1), \quad (D.47)
\]
where $P_g = I_{Hp} - \Omega_g^{-1/2}H\big(H'\Omega_g^{-1}H\big)^{-1}H'\Omega_g^{-1/2}$ is an idempotent matrix with rank $Hp - p$, as $H$ has full rank $p$. Using (D.47) and Assumption 3.4, we deduce that
\[
J_{g,T} \to_d B_{Hp}'P_gB_{Hp} \sim \chi^2(Hp - p),
\]
where $B_{Hp}$ denotes an $Hp\times 1$ standard normal random vector.
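The $\chi^2(Hp - p)$ limit rests on $P_g$ being an idempotent matrix of rank $Hp - p$. A numerical check of both properties, with a hypothetical $\Omega_g$ and stand-in $H$:

```python
import numpy as np

# P_g = I - Omega_g^{-1/2} H (H' Omega_g^{-1} H)^{-1} H' Omega_g^{-1/2}
# should be idempotent with trace (= rank) equal to Hp - p.
rng = np.random.default_rng(4)
Hp, p = 6, 2
H = rng.standard_normal((Hp, p))          # stand-in for H, full column rank
A = rng.standard_normal((Hp, Hp))
Omega_g = A @ A.T + Hp * np.eye(Hp)       # positive definite stand-in

# Symmetric inverse square root Omega_g^{-1/2} via the eigendecomposition.
vals, vecs = np.linalg.eigh(Omega_g)
Om_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T

P_g = np.eye(Hp) - Om_inv_half @ H @ np.linalg.inv(
    H.T @ np.linalg.inv(Omega_g) @ H) @ H.T @ Om_inv_half
```

Since an idempotent symmetric matrix has eigenvalues in $\{0, 1\}$, $B_{Hp}'P_gB_{Hp}$ is a sum of $Hp - p$ independent squared standard normals, which is exactly the stated chi-square limit.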

E Supplementary Table and Simulation Results

Table E.1. Number of Moments and Degrees of Freedom of J-tests

For Figure 5.1, when n = 2 (p = 1) and theta_0 is estimated:

                 number of moments              degrees of freedom
              2 IVs (z_{t+1})  3 IVs (z_{t+1})  2 IVs (z_{t+1})  3 IVs (z_{t+1})
  J_{m,T}           4               6                3                5
  J_{g,T}           2               3                1                2
  J_{h,T}           4               6               N/A              N/A
  C1                2               3                1                2
  C2                2               3                2                3

For Figure 5.2, when n = 3 (p = 2) and (theta_{0,1}, theta_{0,2}) are estimated:

                 number of moments              degrees of freedom
              3 IVs (z_{t+1})  6 IVs (z_{t+1})  3 IVs (z_{t+1})  6 IVs (z_{t+1})
  J_{m,T}           9              18                7               16
  J_{g,T}           6              12                4               10
  J_{h,T}           9              18               N/A              N/A
  C1                3               6                1                4
  C2                3               6                3                6


Figure E.1. Finite Sample Properties of the GMM Estimators of θ0 under D1.1


Figure E.2. Finite Sample Properties of the GMM Estimators of θ0 under D1.2


Figure E.3. Finite Sample Properties of the GMM Estimators of θ0,1 under D2.1


Figure E.4. Finite Sample Properties of the GMM Estimators of θ0,2 under D2.1


Figure E.5. Finite Sample Properties of the GMM Estimators of θ0,1 under D2.2


Figure E.6. Finite Sample Properties of the GMM Estimators of θ0,2 under D2.2
