2003s-39 projection-based statistical inference in linear ... · consists here in using a variant...

52
Montréal Mai 2003 © 2003 Jean-Marie Dufour, Mohamed Taamouti. Tous droits réservés. All rights reserved. Reproduction partielle permise avec citation du document source, incluant la notice ©. Short sections may be quoted without explicit permission, if full credit, including © notice, is given to the source. Série Scientifique Scientific Series 2003s-39 Projection-Based Statistical Inference in Linear Structural Models with Possibly Weak Instruments Jean-Marie Dufour, Mohamed Taamouti

Upload: others

Post on 30-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Montréal Mai 2003

© 2003 Jean-Marie Dufour, Mohamed Taamouti. Tous droits réservés. All rights reserved. Reproduction partielle permise avec citation du document source, incluant la notice ©. Short sections may be quoted without explicit permission, if full credit, including © notice, is given to the source.

Série Scientifique Scientific Series

2003s-39

Projection-Based Statistical Inference in Linear Structural

Models with Possibly Weak Instruments

Jean-Marie Dufour, Mohamed Taamouti

Page 2: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

CIRANO

Le CIRANO est un organisme sans but lucratif constitué en vertu de la Loi des compagnies du Québec. Le financement de son infrastructure et de ses activités de recherche provient des cotisations de ses organisations-membres, d’une subvention d’infrastructure du ministère de la Recherche, de la Science et de la Technologie, de même que des subventions et mandats obtenus par ses équipes de recherche.

CIRANO is a private non-profit organization incorporated under the Québec Companies Act. Its infrastructure and research activities are funded through fees paid by member organizations, an infrastructure grant from the Ministère de la Recherche, de la Science et de la Technologie, and grants and research mandates obtained by its research teams.

Les organisations-partenaires / The Partner Organizations PARTENAIRE MAJEUR . Ministère du développement économique et régional [MDER] PARTENAIRES . Alcan inc. . Axa Canada . Banque du Canada . Banque Laurentienne du Canada . Banque Nationale du Canada . Banque Royale du Canada . Bell Canada . Bombardier . Bourse de Montréal . Développement des ressources humaines Canada [DRHC] . Fédération des caisses Desjardins du Québec . Gaz Métropolitain . Hydro-Québec . Industrie Canada . Ministère des Finances [MF] . Pratt & Whitney Canada Inc. . Raymond Chabot Grant Thornton . Ville de Montréal . École Polytechnique de Montréal . HEC Montréal . Université Concordia . Université de Montréal . Université du Québec à Montréal . Université Laval . Université McGill ASSOCIÉ AU : . Institut de Finance Mathématique de Montréal (IFM2) . Laboratoires universitaires Bell Canada . Réseau de calcul et de modélisation mathématique [RCM2] . Réseau de centres d’excellence MITACS (Les mathématiques des technologies de l’information et des systèmes complexes)

ISSN 1198-8177

Les cahiers de la série scientifique (CS) visent à rendre accessibles des résultats de recherche effectuée au CIRANO afin de susciter échanges et commentaires. Ces cahiers sont écrits dans le style des publications scientifiques. Les idées et les opinions émises sont sous l’unique responsabilité des auteurs et ne représentent pas nécessairement les positions du CIRANO ou de ses partenaires. This paper presents research carried out at CIRANO and aims at encouraging discussion and comment. The observations and viewpoints expressed are the sole responsibility of the authors. They do not necessarily represent positions of CIRANO or its partners.

Page 3: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Projection-Based Statistical Inference in Linear Structural Models with Possibly Weak Instruments *

Jean-Marie Dufour†, Mohamed Taamouti‡

Résumé / Abstract

L’une des questions les plus étudiées récemment en économétrie est celle des modèles présentant des problèmes de quasi non-identification ou d’instruments faibles. L’une des conséquences importantes de ce problème est la non validité de la théorie asymptotique standard [Dufour (1997, Econometrica), Staiger et Stock (1997, Econometrica), Wang et Zivot (1998, Econometrica), Stock et Wright (2000, Econometrica), Dufour et Jasiak (2001, International Economic Review)]. Le défi majeur dans ce cas consiste à trouver des méthodes d’inférence robustes à ce problème. Une solution possible consiste à utiliser la statistique d’Anderson-Rubin (1949, Ann. Math. Stat.). Nous mettons l’emphase sur les procédures de type Anderson-Rubin, car celles-ci sont robustes tant à la présence d’instruments faibles et à l’exclusion d’instruments. Cette dernière ne fournit cependant des tests exacts que pour les hypothèses spécifiant le vecteur entier des coefficients des variables endogènes dans un modèle structurel, et de façon correspondante, que des régions de confiance simultanées pour ces coefficients. Elle ne permet pas de tester des hypothèses spécifiant des coefficients individuels ou sur des transformations de ces coefficients. Ce problème peut être résolu en principe par des techniques de projection [Dufour (1997, Econometrica), Dufour et Jasiak (2001, International Economic Review)]. Cependant , ces techniques ne sont pas toujours faciles à appliquer et requièrent en général l’emploi de méthodes numériques. Dans ce texte, nous proposons une solution explicite complète au problème de la construction de régions de confiance par projection basées sur des statistiques de type Anderson-Rubin. Cette solution exploite les propriétés géométriques des “quadriques” et peut s’interpréter comme une extension des intervalles et ellipsoïdes de confiance usuels. Le calcul de ces régions ne requièrent que des techniques de moindres carrés. Nous étudions également par simulation le degré de conservatisme des régions de confiance obtenues par projection. Enfin, nous illustrons les méthodes proposées par trois applications différentes: la relation entre l’ouverture commerciale et la croissance, le rendement de l’éducation et une étude sur les rendement d’échelles dans l’économie américaine.

Mots clés : équations simultanées ; modèle structurel ; variable instrumentale; instruments faibles; intervalle de confiance ; test ; projection ; inférence simultanée ; inférence exacte; théorie asymptotique.

* The authors thank Craig Burnside for providing us his data, as well as Laurence Broze, John Cragg, Jean-Pierre Florens, Christian Gouriéroux, Joanna Jasiak, Frédéric Jouneau, Lynda Khalaf, Nour Meddahi, Benoît Perron, Tim Vogelsang and Eric Zivot for several useful comments. This work was supported by the Canada Research Chair Program (Chair in Econo-metrics, Université de Montréal), the Alexander-von-Humboldt Foundation (Germany), the Canadian Network of Centres of Excellence [program on Mathematics of Information Technology and Complex Systems (MITACS)], the Canada Council for the Arts (Killam Fellowship), the Natural Sciences and Engineering Research Council of Canada, the Social Sciences and Humanities Research Council of Canada, the Fonds de recherche sur la société et la culture (Québec), and the Fonds de recherche sur la nature et les technologies (Québec). One of the authors (Taamouti) was also supported by a Fellowship of the Canadian International Development Agency (CIDA). † Canada Research Chair Holder (Econometrics). Centre interuniversitaire de recherche en analyse des organisations (CIRANO), Centre interuniversitaire de recherche en économie quantitative (CIREQ), and Département de sciences économiques, Université de Montréal. Mailing address: Département de sciences économiques, Université de Montréal, C.P. 6128 succursale Centre-ville, Montréal, Québec, Canada H3C 3J7. Tel: 1 514 343 2400; Fax: 1 514 343 5831; e-mail: [email protected]. Web page: http://www.fas.umontreal.ca/SCECO/Dufour . ‡ INSEA, Rabat and CIREQ, Université de Montréal. Mailing address: INSEA., B.P. 6217, Rabat-Instituts, Rabat, Morocco. Tel: 212 7 77 09 26; Fax: 212 7 77 94 57. e-mail: [email protected].

Page 4: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

It is well known that standard asymptotic theory is not valid or is extremely unreliable in models

with identification problems or weak instruments [Dufour (1997, Econometrica), Staiger and Stock (1997, Econometrica), Wang and Zivot (1998, Econometrica), Stock and Wright (2000, Econometrica), Dufour and Jasiak (2001, International Economic Review)]. One possible way out consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows one to build exact tests and confidence sets only for the full vector of the coefficients of the endogenous explanatory variables in a structural equation, which in general does not allow for individual coefficients. This problem may in principle be overcome by using projection techniques [Dufour (1997, Econometrica), Dufour and Jasiak (2001, International Economic Review)]. Artypes are emphasized because they are robust to both weak instruments and instrument exclusion. However, these techniques can be implemented only by using costly numerical techniques. In this paper, we provide a complete analytic solution to the problem of building projection-based confidence sets from Anderson-Rubin-type confidence sets. The latter involves the geometric properties of “quadrics” and can be viewed as an extension of usual confidence intervals and ellipsoids. Only least squares techniques are required for building the confidence intervals. We also study by simulation how “conservative” projection-based confidence sets are. Finally, we illustrate the methods proposed by applying them to three different examples: the relationship between trade and growth in a cross-section of countries, returns to education, and a study of production functions in the U.S. economy.

Keyswords : Simultaneous equations; structural model; instrumental variable; weak instrument; confidence interval; testing; projection; simultaneous inference; exact inference; asymptotic theory.

Page 5: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Contents

List of Propositions and Theorems iv

1. Introduction 1

2. Framework 4

3. Quadric confidence sets 7

4. Geometry of quadric confidence sets 94.1. Nonsingular concentration matrix. . . . . . . . . . . . . . . . . . . . . . . . . 9

4.1.1. Positive definite concentration matrix. . . . . . . . . . . . . . . . . . . 104.1.2. Negative definite concentration matrix. . . . . . . . . . . . . . . . . . 104.1.3. Concentration matrix not positive or negative definite. . . . . . . . . . 10

4.2. Singular concentration matrix. . . . . . . . . . . . . . . . . . . . . . . . . . . 114.3. Necessary and sufficient condition for bounded quadric confidence set. . . . . . 114.4. Joint confidence sets forβ andγ . . . . . . . . . . . . . . . . . . . . . . . . . 12

5. Confidence sets for transformations ofβ 125.1. The projection approach. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125.2. Projection-based confidence sets for scalar linear transformations. . . . . . . . 13

5.2.1. Nonsingular concentration matrix. . . . . . . . . . . . . . . . . . . . . 145.2.2. Singular concentration matrix. . . . . . . . . . . . . . . . . . . . . . . 16

5.3. A Wald-type interpretation of the projection-based confidence sets. . . . . . . . 18

6. Monte Carlo evaluation 19

7. Empirical illustrations 297.1. Trade and growth. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297.2. Education and earnings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327.3. Returns to scale and externality spillovers in U.S. industry. . . . . . . . . . . . 34

8. Conclusion 36

A. Appendix: Proofs 38

iii

Page 6: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

List of Propositions and Theorems

4.1 Theorem :Necessary and sufficient condition for bounded confidence set. . . . . . 125.1 Theorem : Projection-based confidence sets for linear transformations when the con-

centration matrix is positive definite. . . . . . . . . . . . . . . . . . . . . . . 145.2 Corollary : Projection-based confidence interval for an individual coefficient. . . . 145.3 Theorem : Projection-based confidence sets for linear transformations when the con-

centration matrix has one negative eigenvalue. . . . . . . . . . . . . . . . . . 155.4 Theorem : Projection-based confidence sets for linear transformations when the con-

centration matrix has more than one negative eigenvalue. . . . . . . . . . . . . 155.5 Theorem :Projection-based confidence sets with a posssibly sigular concentration matrix 17

Proof of equation (4.13). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38Proof of Theorem4.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38Proof of Theorem5.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38Proof of Theorem5.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39Proof of Theorem5.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40Proof of Theorem5.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

List of Tables

1 Empirical coverage rate of TSLS-based Wald confidence sets. . . . . . . . . . . 202 Characteristics of AR projection-based confidence sets _Cij = 0 . . . . . . . . . 213 Characteristics of AR projection-based confidence sets _1 ≤ Cij ≤ 5 . . . . . . . 234 Characteristics of AR projection-based confidence sets _10 ≤ Cij ≤ 20 . . . . . 255 Comparison between AR and LR projection-based confidence sets when they are

bounded . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276 Power of tests induced by projection-based confidence sets _H0 : β1 = 0 . . . . 287 Confidence sets for the coefficients of the Frankel-Romer income-trade equation. 328 Projection-based confidence sets for the coefficients of the exogenous covariates in

the income-education equation. . . . . . . . . . . . . . . . . . . . . . . . . . . 349 Confidence sets for the returns to scale and externality coefficients in different U.S.

industries. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

List of Figures

1 Power of tests induced by projection-based confidence sets _H0 : β1 = 0.5 . . . 30

iv

Page 7: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

1. Introduction

One of the classic problems of econometrics consists in making inference on the coefficients ofstructural models. Such models typically involve endogenous explanatory variables (which can leadto endogeneity biases), the need to use instrumental variables, and the possibility that “structural pa-rameters of interest” may not be identifiable. Recently, the statistical problems raised by structuralmodelling have received new attention in view of the observation that proposed instruments areoften “weak”, i.e. poorly correlated with the relevant endogenous variables, which correspond tosituations where the structural parameters are close to being not identifiable (through the instrumentsused). The literature on so-called “weak instruments” problems is now considerable; see, for ex-ample, Nelson and Startz (1990a, 1990b), Buse (1992), Maddala and Jeong (1992), Bound, Jaeger,and Baker (1993, 1995), Angrist and Krueger (1995), Hall, Rudebusch, and Wilcox (1996), Dufour(1997), Shea (1997), Staiger and Stock (1997), Wang and Zivot (1998), Zivot, Startz, and Nelson(1998), Startz, Nelson, and Zivot (1999), Perron (1999), Chao and Swanson (2000), Hall and Peixe(2000), Stock and Wright (2000), Dufour and Jasiak (2001), Hahn and Hausman (2002a, 2002b),Kleibergen (2001, 2002), Moreira (2001, 2002), Stock and Yogo (2002) and Stock, Wright, andYogo (2002)].

In such contexts, several papers have documented by simulation and approximate asymptoticmethods the poor performance of standard asymptotically justified procedures [Nelson and Startz(1990a, 1990b), Buse (1992), Bound, Jaeger, and Baker (1993, 1995), Hall, Rudebusch, and Wilcox(1996), Staiger and Stock (1997), Zivot, Startz, and Nelson (1998), Dufour and Jasiak (2001)]. Themain difficulty here is that the finite-sample distributions of the relevant statistics (in particular, teststatistics) are very sensitive to unknown nuisance parameters; indeed, they can exhibit an arbitrarylarge sensitivity to such parameters. Further, limiting distributions are non-standard when identifica-tion conditions do not hold, while usual large-sample approximations do not converge uniformly, sothat the latter may be arbitrarily inaccurate in finite samples even when identification and standardregularity conditions obtain. The fact that standard asymptotic theory can be arbitrarily inaccu-rate in finite samples (of any size) is shown rigorously in Dufour (1997), where it is observed thatvalid confidence intervals in a standard linear structural equations model must be unbounded withpositive probability and Wald-type statistics have distributions which can deviate arbitrarily fromtheir large-sample distribution (even when identification holds). The fact that both finite-sampleand large-sample distributions exhibit strong dependence upon nuisance parameters has also beendemonstrated by other methods, such as finite-sample distributional theory [see Choi and Phillips(1992)] and local to nonidentification asymptotics [see Staiger and Stock (1997) and Wang andZivot (1998)].

As a result, it appears especially important in such problems to build tests and confidence setsbased on properly pivotal (or boundedly pivotal) functions, as well as to study inference proceduresfrom a finite-sample perspective. The fact that tests should be based on statistics whose distributionscan be bounded and that confidence sets should be obtained from pivotal statistics is, of course, a re-quirement of basic statistical theory [see Lehmann (1986)]. In the framework of linear simultaneousequations and in view of weak instrument problems, the importance of using pivotal functions forstatistical inference has been recently reemphasized by several authors [see Dufour (1997), Staiger

1

Page 8: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

and Stock (1997), Wang and Zivot (1998), Zivot, Startz, and Nelson (1998), Startz, Nelson, andZivot (1999), Dufour and Jasiak (2001), Stock and Wright (2000), Kleibergen (2001, 2002), Mor-eira (2001, 2002), and Stock, Wright, and Yogo (2002)]. In particular, this suggests that confidencesets should be built by inverting likelihood ratio (LR) and Lagrange multiplier (LM) type statis-tics, as opposed to the more usual method which consists in inverting Wald-type statistics (such asasymptotict-ratios).

In this paper, we wish to concentrate on procedures for which finite-sample pivotality obtainsunder standard assumptions. In view of the results in Dufour (1997), we consider this approachas the best guide to selecting test and confidence set procedures (even though asymptotic validitywill hold under weaker distributional assumptions). Useful pivotal functions are however difficultto find in structural models. The oldest one appears to be the statistic proposed by Anderson andRubin (1949, henceforth AR). The latter is a limited-information method which allows one to testan hypothesis setting the value of the full vector of the endogenous explanatory variable coefficientsin a linear structural equation; under usual parametric assumptions (error Gaussianity, instrumentstrict exogeneity) the distribution of the statistic is a central Fisher distribution, while under weaker(standard) assumptions it is asymptotically chi-square, irrespective of the presence of weak instru-ments.

Limited-informationmethods typically involve an efficiency loss with respect tofull-informationmethods, but do allow for a less complete specification of the model and more robustness. Indeed,the AR statistic enjoys several remarkableinvariance(or robustness) properties. Namely it is com-pletely robust (in finite samples) to the presence of weak instruments (robustness to weak instru-ments), to the exclusion of possibly relevant instruments (robustness to instrument exclusion), andmore generally to the distribution of explanatory endogenous variables (robustness to endogenousexplanatory variable distribution).1 More precisely, its finite-sample distribution (under the nullhypothesis) is completely unaffected by the presence of “weak instruments”, the exclusion of rel-evant instruments, and the error distribution in the reduced form for the explanatory endogenousvariables. We view all these features as important because it is typically difficult to know whether aset of instruments is globally weak (so that the resulting inference becomes unreliable) or whetherrelevant instruments have been excluded (which seems highly likely in most practical situations).As a result, tests and confidence sets based on the AR statistic remains valid irrespective whetherinstruments are weak or relevant instruments have been excluded. Extensions of the AR statisticswith the same basic robustness properties and a finite-sample distributional theory are also proposedin Dufour and Jasiak (1993, 2001).

Other potential pivots aimed at being robust to weak instruments have recently been suggestedby Wang and Zivot (1998), Kleibergen (2002) and Moreira (2002). These methods are closer tobeing full-information methods _ in the sense that they rely on a relatively specific formulation ofthe model for the endogenous explanatory variables of the model _ and thus may lead to power gainsunder the assumptions considered. But this will typically be at the expense of robustness. Further,only asymptotic distributional theories have been supplied for these statistics, so that the level ofthe procedures may not be controlled in finite samples. Indeed, it is easy to see that none of these

1We borrow the terminology “robust to weak instruments” from Stock, Wright, and Yogo (2002, p. 518). Robustnessto instrument exclusion appears to have been little discussed in the literature on weak instruments.

2

Page 9: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

statistics is pivotal in finite samples (i.e., their finite-sample distributions involve unknown nuisanceparameters) or robust to the exclusion of relevant instruments. It is clear that these statistics do notqualify as pivotal in finite samples.

An important practical shortcoming of the above methods is that they are designed to test hy-potheses of the formH0 : β = β0, whereβ is the coefficient vector forall the endogenous ex-planatory variables. In particular these statistics do not allow to test linear and nonlinear restrictionson the vectorβ. A general solution to this problem is theprojection technique described in Du-four (1990, 1997), Wang and Zivot (1998) and Dufour and Jasiak (2001).2 This problem was alsoconsidered by Choi and Phillips (1992), Stock and Wright (2000) and Kleibergen (2001). WhileChoi and Phillips (1992) did not propose an operational method for dealing with the problem, themethods considered by Stock and Wright (2000) and Kleibergen (2001) rely on the assumptionthat the structural parameters not involved in the restrictions are well identified and rely on large-sample approximations (which become invalid when the identification assumptions made do nothold). Consequently they are not robust to weak instruments. For these reasons, we shall focus hereon the projection approach.

The basic idea behind the projection technique is simple. Letθ be a multidimensional parametervector for which we can build a confidence setCθ(α) with level 1 − α : P[θ ∈ Cθ(α)] ≥ 1 − α.Now consider a transformation of interestg(θ) which takes its values inRm. For example,g(θ)could be one of the components ofθ. Then it is easy to see that the image setg[Cθ(α)] = {g(θ) ∈Rm : θ ∈ Cθ(α)} is a confidence set with level1 − α for g(θ), i.e. P

[g(θ) ∈ g[Cθ(α)]

]> 1 − α.

Such methods are also exploited in Abdelkhalek and Dufour (1998) and Dufour and Kiviet (1998)for completely different models. In general, however, the calculation ofg[Cθ(α)] is not simple andmay require the use of costly numerical methods [as done, for example, in Abdelkhalek and Dufour(1998), Dufour and Kiviet (1998) or Dufour and Jasiak (2001)].

In this paper, we study some general geometric features ofAR-type confidence sets and we pro-vide a complete explicit solution to the problem of building projection-based confidence sets fromsuch confidence sets. Wefirst observe thatAR-type confidence sets can be described asquadrics[see Shilov (1961, Chapter 11) and Pettofrezzo and Marcoantonio (1970)], a class of geometricfigures which covers as special cases the usual confidence intervals and ellipsoids but also includeshyperboloids and paraboloids. In particular, we give a simple necessary and sufficient conditionunder which such confidence sets are bounded (which indicates that the parameters considered areidentifiable). We use the projection technique to build confidence sets for components of the vectorof unknown parameters and for linear combinations of these components.

Second, using these results, we then derive simple explicit expressions for projection-based

2Another shortcoming of AR-type tests comes from the fact that power may decline as the number of instrumentsincreases, especially if they have little relevance. This indicates that the number of instruments should be kept as smallas possible. Because AR statistics are robust to the exclusion of instruments (even if they are relevant), this can be donerelatively easily. We discuss the problem of selecting optimal instruments and reducing the number of instruments in twocompanion papers [Dufour and Taamouti (2001b, 2001a)]. In the present paper, we focus on the problem of buildingprojection-based confidence sets, for agivenset of instruments. For results relevant to instrument selection, the readermay also consult Cragg and Donald (1993), Hall, Rudebusch, and Wilcox (1996), Shea (1997), Staiger and Stock (1997),Chao and Swanson (2000), Donald and Newey (2001), Hall and Peixe (2000), Hahn and Hausman (2002a, 2002b), Stockand Yogo (2002).

3

Page 10: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

confidence intervals in the case of individual structural coefficients (or linear transformations ofthese coefficients). Consequently, no search by nonlinear methods is anymore required. The explicitcalculation of the confidence sets thus makes the projection approach very attractive. When theprojection-based confidence intervals are bounded, they may be interpreted as confidence intervalsbased on k-class estimators [for a discussion of k-class estimators, see Davidson and MacKinnon(1993, page 649)] where the “standard error” is corrected in a way that depends on the level of thetest. The confidence interval for a linear combination of the parameters, sayw′β takes the usualform [w′β − σzα, w′β + σzα] with β a k-class type estimator ofβ.

Thirdly, we show that the confidence sets obtained in this way enjoy another important property,namelysimultaneityin the sense discussed by Miller (1981), Savin (1984) and Dufour (1989). Moreprecisely, projection-based confidence sets (or confidence intervals) can be viewed as Scheffé-typesimultaneous confidence sets _ which are widely used in analysis of variance _ so that the probabilitythat any number of the confidence statements made (for different functions of the parameter vector)hold jointly is controlled. Correspondingly, multiple hypotheses onβ can be tested without everlosing control the overall level of the tests,i.e. the probability of rejecting at least one true nullhypothesis onβ is not larger than the levelα. This can provide an important check on data mining.

Fourth, the methods discussed in this work are evaluated and compared on the basis of MonteCarlo simulations. In particular we analyze the conservatism of the projection-based confidencesets.

Fifth, in order to illustrate the projection approach, we present three empirical applications. Inthe first one, we study the relationship between standards of living and openness in the contextof an equation previously considered by Frankel and Romer (1999). The second application dealswith the famous problem of measuring returns to education using the model and data consideredby Angrist and Krueger (1995) and Bound, Jaeger, and Baker (1995), while in the third examplewe study returns to scale and externalities in various industrial sectors of the U.S. economy, using aproduction function specification previously considered by Burnside (1996).

In Section 2, we present the background model and statistical inference methods on the coef-ficient vector of the explanatory endogenous variables. Section 3 presents the simultaneous confi-dence sets. In Section 4, we discuss some general properties of quadric confidence sets and providea simple necessary and sufficient condition under which such sets are bounded. Section 5 providesexplicit projection-based confidence intervals for individual structural parameters and linear trans-formations of these parameters. We also discuss the simultaneity property of these confidence in-tervals. In Section 6, we report the results of our Monte Carlo simulations, while Section 7 presentsthe empirical applications. Finally, Section 8 concludes.

2. Framework

We consider here the standard simultaneous equations model (SEM):

y = Y β + X1γ + u , (2.1)

Y = X1Π1 + X2Π2 + V , (2.2)

4

Page 11: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

wherey andY areT × 1 andT × G matrices of endogenous variables,X1 andX2 areT × k1

andT × k2 matrices of exogenous variables,β andγ areG × 1 andk1 × 1 vectors of unknowncoefficients,Π1 andΠ2 arek1×G andk2×G matrices of unknown coefficients,u = (u1, . . . , uT )′

is a vector of structural disturbances, andV = [V1, . . . , VT ]′ is aT × G matrix of reduced-formdisturbances. Further,

X = [X1, X2] is a full-column rankT × k matrix (2.3)

wherek = k1 + k2. Finally, to get a finite-sample distributional theory for the test statistics, weshall use the following assumption on the conditional distribution ofu givenX :

u |X ∼ N[0, σ2

u(X)IT

](2.4)

whereσ2u(X) is a positive scalar parameter which may depend onX (but not onβ or γ). This

means that, conditional onX, the disturbancesu1, . . . , uT are i.i.d. Gaussian. In particular, it isclear (2.4) holds under the following standard assumptions:

u andX are independent; (2.5)

u ∼ N[0, σ2

u IT

]. (2.6)

(2.5) may be interpreted as the strict exogeneity ofX with respect tou.Note that the distribution ofV is not otherwise restricted; in particular, the vectorsV1, . . . , VT

need not follow a Gaussian distribution and may be heteroskedastic. Below, we shall also considerthe situation where the reduced-form equation forY includes a third set of instrumentsX3 whichare not used in the estimation:

Y = X1Π1 + X2Π2 + X3Π3 + V (2.7)

whereX3 is aT ×k3 matrix of explanatory variables (not necessarily strictly exogenous); in partic-ular,X3 may be unobservable. We view this situation as important because, in practice, it is quiterare that one can consider all the relevant instruments have been or should be used.

In such a model, we are generally interested in making inference onβ andγ. In Dufour (1997),it is shown that if the model is unidentified (the matrixΠ2 does not have its maximal rank) anyvalid confidence set forβ or γ must be unbounded with positive probability. This is due to thefact that such a model may be unidentified and holds indeed even if identification restrictions areimposed. This result explains many recent findings about the performance of standard asymptoticstatistics when the instrumentsX2 are weakly correlated with the endogenous explanatory variablesY . The usual approach, which consists in inverting Wald-type statistics to obtain confidence sets,is not valid in these situations since the resulting confidence sets are bounded with probability 1.This is related to the fact that finite-sample distributions of such statistics are not pivotal and followdistributions which depend heavily on nuisance parameters.

Choi and Phillips (1992) considered the same model where they suppose that a subset of param-eters are not identified. They derive exact and asymptotic distributions of the instrumental variables

5

Page 12: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

estimator and the Wald statistic. The analytic expressions obtained are complex and differ fromcommonly known ones. Staiger and Stock (1997) considered the same model but assumed that theelements of the matrixΠ2 tend to 0 asT increases (Π2 = C/

√T , whereC is a fixed matrix). They

derive the asymptotic distributions of different statistics, including two-stage least squares (2SLS)limited information maximum likelihood (LIML) and the Wald-type statistics based on these esti-mators. In conformity with the results in Dufour (1997), these distributions depend on nuisance pa-rameters and are not pivotal. Wang and Zivot (1998) derived [under the same assumption as Staigerand Stock,i.e. Π2 = C/

√T ] the asymptotic distributions of likelihood ratio(LR) and Lagrange

multiplier (LM) type statistics based on maximum likelihood and GMM estimation methods. Asbefore, these distributions depend on nuisance parameters and are not pivotal. These derivationsprovide useful insights for understanding the poor performance of asymptotic approximations re-ported in previous work, but they do not solve the statistical inference problem in these models.

A first solution to this problem [see Dufour (1997) and Staiger and Stock (1997)] consists inusing the Anderson-Rubin statistic [Anderson and Rubin (1949)]. This test is based on the simpleidea that ifβ is specified, model (2.1)-(2.2) can be reduced to a simple linear regression equation.More precisely, if we consider the hypothesisH0 : β = β0 in equation (2.1), we can write:

y − Y β0 = X1θ1 + X2θ2 + ε (2.8)

whereθ1 = γ + Π1(β − β0), θ2 = Π2(β − β0) andε = u + V (β − β0). Equation (2.8) satisfiesall the conditions of the linear regression model. We can testH0 by testingH

′0 : θ2 = 0 using the

standardF -statisticH′0 [denotedAR(β0)]. With the additional assumptions (2.3) - (2.4), we have

underH0 :

AR(β0) =(y − Y β0)′[M(X1)−M(X)](y − Y β0)/k2

(y − Y β0)′M(X)(y − Y β0)/(T − k)∼ F (k2, T − k) (2.9)

where for any full rank matrixB, M(B) = I − P (B) andP (B) = B(B′B)−1B′ is the projectionmatrix on the space spanned by the columns ofB. The distributional result in (2.9) holds irrespec-tive on the rank of the matrixΠ2, which means that tests based onAR(β0) are robust to weakinstruments. It is also interesting to note that this distribution is not affected by the distribution ofV ; in other words,AR(β0) is robust to the distribution of the endogenous explanatory variablesY.

Another important feature ofAR(β0) comes from the fact that (2.9) also obtains under the widermodel (2.7), because in this case:

y − Y β0 = X1θ1 + X2θ2 + X3θ3 + ε (2.10)

whereθ1 = γ + Π1(β − β0), θ2 = Π2(β − β0), θ3 = Π3(β − β0) andε = u + V (β − β0). Sinceθ2 = 0 andθ3 = 0 underH0, it is straightforward to see that the null distribution ofAR(β0) isF (k2, T − k) [under the assumptions (2.1), (2.7), (2.3) and (2.4)]. As a result, the validity of thetest based onAR(β0) is unaffected by the fact that potentially relevant instruments are not takeninto account. For this reason, we will say it isrobust to instrument exclusion. Furthermore, thedistribution ofX3 is irrelevant to the null distribution ofAR(β0), so thatX3 does not have to be

6

Page 13: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

strictly exogenous. Even more generally, we could also assume thatY obeys a general nonlinearmodel of the form:

Y = g(X1, X2, X3, V, Π) (2.11)

whereg(·) is a possibly unspecified nonlinear function andΠ is an unknown parameter matrix.Since, underH0,

y − Y β0 = X1θ1 + ε ,

the coefficientθ2 in the regression (2.8) must be zero, and (2.9) still holds.A confidence set forβ with level1− α can be obtained by inverting the statisticAR(β0) :

Cβ(α) = {β0 : AR(β0) ≤ Fα(k2, T − k)} (2.12)

whereFα(k2, T − k) is the1 − α quantile of theF distribution withk2 andT − k degrees offreedom. This confidence set is exact and does not require an identification assumption. WhenG = 1, this set has an explicit form solution involving a quadratic inequation _ i.e.Cβ(α) ={β0 : aβ2

0 + bβ0 + c ≤ 0} wherea, b and c are simple functions of the data and the criticalvalueFα(k2, T − k) _ andCβ(α) is unbounded ifF (Π2 = 0) < Fα, whereF (Π2 = 0) is theF -test forH0 : Π2 = 0 in equation (2.2); see Dufour and Jasiak (2001) and Zivot, Startz, andNelson (1998) for details. Further, Monte Carlo simulations [Maddala (1974), Dufour and Jasiak(2001)] indicate that the AR-based test behaves well in terms of power (as long as the number ofk2 of additional instruments is not unduly large). This test also remains asymptotically valid underweaker distributional assumptions, in the sense that the asymptotic null distribution ofAR(β0) isχ2(k2)/k2 [see Dufour and Jasiak (2001) and Staiger and Stock (1997)].

Below, we shall also consider two alternative statistics proposed by Wang and Zivot (1998). Thefirst one is anLR−type statistic and the second is anLM−type statistic. Under the assumptions(2.1)-(2.6) and additional regularity conditions on the asymptotic behavior of the instruments [de-scribed by Wang and Zivot (1998)], these two statistics followχ2(k2) distributions asymptoticallywhen the model is exactly identified(k2 = G), and are bounded by aχ2(k2) distribution when themodel is over-identified(k2 > G). To test H0 : β = β0, these statistics are:

LRLIML(β0) = T [ln(k(β0))− ln[k(βLIML)] , (2.13)

LM2SLS(β0) =T (y − Y β0)′P [P [M(X1)X2]Y ](y − Y β0)

(y − Y β0)′M(X1)(y − Y β0), (2.14)

where

k(β0) =(y − Y β0)′M(X1)(y − Y β0)(y − Y β0)′M(X)(y − Y β0)

.

Asymptotic and conservative confidence sets forβ can be obtained by inverting the latter tests.However, it is easy to see that these statistics are not generally robust to instrument exclusion.3

A common shortcoming of all these tests is that they require one to specify the entire vectorβ.

3We do not study here the tests proposed by Kleibergen (2002) and Moreira (2002), because it does not appear thatthe associated confidence sets can be covered by the theory described in this paper (in terms of quadrics). Furthermore,these procedures are not robust to instrument exclusion.

7

Page 14: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

In particular, they do not allow for general hypotheses of the formH0 : g(β) = 0, whereg(β) maybe any transformation ofβ, such asg(β) = βi − βi

0, whereβi is any scalar component ofβ.In this paper, we deal with this problem by studying the characteristics of the confidence sets

obtained by inverting such statistics, and we use them to derive confidence sets for the componentsof β or linear combinations of these components. We first show that the confidence sets based onthe statisticsAR, LR andLM can be expressed in terms of a quadratic-linear form involving amatrix A, a vectorb and a scalarc. These sets (replacing the inequality by an equality) are knownasquadrics [Shilov (1961, Chapter 11), Pettofrezzo and Marcoantonio (1970, Chapters 9-10)]. Wewill then study the different possible cases as functions ofA, b andc, and we will derive analyticexpressions for projection-based confidence intervals in the case of linear transformations of modelparameters.

3. Quadric confidence sets

Let us first consider theAR statistic. A simple algebraic calculation shows that the inequality

AR(β0) ≤ Fα(k2, T − k)

may be written in the following simple form:

β′0Aβ0 + b′β0 + c ≤ 0 (3.1)

whereA = Y ′HY, b = −2Y ′Hy, c = y′Hy and

H ≡ HAR = M(X1)−M(X)[1 +

k2Fα(k2, T − k)T − k

]. (3.2)

We can thus write:Cβ(α) = {β0 : β′0Aβ0 + b′β0 + c ≤ 0} . (3.3)

If β is scalar, this set is the solution of a quadratic inequation:

Cβ(α) = {β0 : aβ20 + bβ0 + c ≤ 0}. (3.4)

Depending on the values ofa, b andc, this set may take several forms (a closed interval, a semi-openinterval, a union of two semi-open intervals, the setR of all possible values, or the empty set); seeDufour and Jasiak (2001), and Zivot, Startz, and Nelson (1998).

If we use the statisticLRLIML(β0) or LM2SLS(β0) instead ofAR, we get analogous confi-dence sets which only differ through theH matrix. ForLRLIML(β0), this matrix takes the form

HLR = M(X1)−M(X) k(βLIML) exp[χ2α(k2)/T ] (3.5)

while, for LM2SLS(β0), it is

HLM = P[P [M(X1)X2]Y

]−M(X1)[χ2α(k2)/T ] . (3.6)

8

Page 15: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

For theAR andLR statistics, the matrixA can be written:

A = Y ′M(X1)Y − Y ′M(X)Y (1 + fα)

wherefα = k2Fα(k2, T −k)/(T − k) for AR andfα = exp[χ2α(k2)/T ]k(βLIML)− 1 for theLR

statistic. ClearlyA is symmetric and a typical diagonal element of this matrix is

Aii = Y ′i M(X1)Yi − Y ′

i M(X)Yi (1 + fα) , (3.7)

which is acorrected difference between the sum of squared residuals from the regression ofYi onX1 and the sum of squared residuals from the regression ofYi on X = [X1, X2]. This differencemay be viewed as a measure of the importance ofX2 in explainingYi, i.e. the relevance ofX2 as aninstrument forYi. A necessary condition for matrixA to be positive definite is that the instrumentsX2 should provide sufficient additional explanatory power forY (with respect toX1). Similarly,c = y′Hy is a corrected difference between the sum of squared residuals from the regression ofyonX1 and the sum of squared residuals from the regression ofy onX = [X1, X2]. For the vectorb, a typical element is given by

bi = −2{[M(X1)Yi]′[M(X1)y]− [M(X)Yi]′[M(X)y](1 + fα)}. (3.8)

The first term [multiplied by−1/(2T )] is the sample covariance between the residuals of the re-gression ofYi onX1 and the residuals of the regression ofy onX1, while the second term gives thesame covariance withX1 replaced byX = [X1, X2].

4. Geometry of quadric confidence sets

The locus of points that satisfy an equation of the form

β′Aβ + b′β + c = 0 , (4.1)

whereA is a symmetricG ×G matrix, b is aG × 1 vector andc is a scalar, is known in the math-ematical literature as aquadricsurface [Shilov (1961, Chapter 11), Pettofrezzo and Marcoantonio(1970)]. Consequently, we shall call a confidence set of the form

Cβ = {β : β′Aβ + b′β + c ≤ 0} (4.2)

a quadric confidence set. A quadric is characterized by the sum a quadratic form(β′Aβ) andan affine transformation(b′β + c). Depending on the values ofA, b and c, it may take severalforms. In this section, we examine some general properties of quadric confidence sets, especiallythe conditions under which such sets are bounded or unbounded. In particular, we will see that theeigenvalues of theA matrix play a central role in these properties and that larger eigenvalues areassociated with more “concentrated” (or “smaller”) quadric confidence sets. For these reasons, wecall A theconcentration matrixof the quadric.

In the sequel of this section, it will be convenient to distinguish between two basic cases: the

9

Page 16: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

one whereA is nonsingular, and the one where it is singular. We adopt the convention that an emptyset is bounded.

4.1. Nonsingular concentration matrix

If A is nonsingular, we can write:

β′Aβ + b′β + c =(β +

12A−1b

)′A

(β +

12A−1b

)−

(14b′A−1b− c

)

=(β − β

)′A

(β − β

)− d (4.3)

whereβ = −12A−1b andd = 1

4b′A−1b− c . SinceA is a real symmetric matrix, we have:

A = P ′DP (4.4)

whereP is an orthogonal matrix andD is a diagonal matrix whose elements are the eigenvalues ofA. Inequality (3.1) may then be reexpressed as

λ1z21 + λ2z

22 + · · · + λGz2

G ≤ d (4.5)

where theλi’s are the eigenvalues ofA andz = P (β − β). The transformationz = P (β − β)represents a translation followed by a rotation ofβ, so it is clear that

Cβ is bounded⇔ Cz is bounded (4.6)

where

Cβ ≡ {β : β′Aβ + b′β + c ≤ 0} = {β :(β − β

)′A

(β − β

) ≤ d}= {β : λ1z

21 + λ2z

22 + · · · + λGz2

G ≤ d andz = P (β − β)}, (4.7)

Cz ≡ {z : λ1z21 + λ2z

22 + · · · + λGz2

G ≤ d} . (4.8)

Again it will be convenient to distinguish between three cases according to the signs of the eigen-values ofA, namely: (1) all the eigenvalues ofA are positive(λi > 0, i = 1, . . . , G), i.e. Ais positive definite; (2) all the eigenvalues ofA are negative(λi < 0, i = 1, . . . , G), i.e. A isnegative definite; (3)A has both positive and negative values,i.e. A is neither positive nor negativedefinite.

4.1.1. Positive definite concentration matrix

If λi > 0, i = 1, . . . , G, the inequality (4.5) can be reexpressed as

(z1

γ1

)2

+ · · · +(

zG

γG

)2

≤ d (4.9)

10

Page 17: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

whereγi =√

1/λi, i = 1, . . . , G. If d = 0, we haveCz = {0} andCβ reduces to{β}. Ifd < 0, Cz andCβ are empty. Ifd > 0, Cz is the area inside an ellipsoid, hence it is a compact set.Consequently,Cz andCβ are bounded.

4.1.2. Negative definite concentration matrix

If λi < 0, i = 1, . . . , G, the setCz is the set of all values ofz that satisfy

(z1

γ1

)2

+ · · · +(

zG

γG

)2

≥ −d (4.10)

whereγi =√−1/λi, or equivalently, the setnot insidethe open ellipsoid defined by

(z1

γ1

)2

+ · · · +(

zG

γG

)2

< −d . (4.11)

Cz andCβ are thus unbounded sets. In particular, ifd ≥ 0, we haveCβ = Cz = RG.

4.1.3. Concentration matrix not positive or negative definite

If A has both positive and negative eigenvalues, we can assume, without loss of generality, thatλi > 0 for i = 1 , . . . , p, andλi < 0 , for i = p + 1 , . . . , G, where1 ≤ p < G. Inequality (4.5)may then be rewritten:

(z1

γ1

)2

+ · · · +(

zp

γp

)2

−(

zp+1

γp+1

)2

− · · · −(

zG

γG

)2

− d ≤ 0 (4.12)

wherep is the number of positive eigenvalues ofA, γi =√

1/λi for i = 1 , . . . , p, andγi =√−1/λi for i = p + 1 , . . . , G.In this caseCz and Cβ are unbounded. This is easy to see: for arbitrary given values of

z1, , . . . , zp andd, it is clear that inequality (4.12) will hold if any of the valueszi, p+1 ≤ i ≤ G, issmall enough (as|zi| → ∞). Consequently, each component ofz is unbounded inCz and similarlyfor each component ofβ in Cβ.

4.2. Singular concentration matrix

We now consider the case whereA is singular with rankr (r < G).Without loss of generality, wecan assume that the firstr diagonal elements ofD in the decompositionA = P ′DP (the firstreigenvalues ofA) used in (4.5) are different from zero, while theG− r other ones are equal to zero.Then we can write (see the details in the Appendix):

β′Aβ + b′β + c =r∑

i=1

λiz2i +

G∑

i=r+1

δizi − d (4.13)

11

Page 18: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

where theλi are the eigenvalues ofA (λi 6= 0, i = 1, . . . , r), δ = Pb, z = Pβ + µ and

d = −c +r∑

i=1

δ2i /(4λi) , µi =

{δi/(2λi) , if λi 6= 0 ,0 , otherwise.

(4.14)

In the new space given by the transformationz = Pβ + µ, Cz may take many forms following thenumber of non-zero eigenvalues and their signs. However, this set will always be unbounded. From(4.13) it is clear that we can make anyzi, i = r + 1, . . . , G, arbitrarily large with an opposite signof its coefficientδi, so that the inequality (4.5) will hold.

4.3. Necessary and sufficient condition for bounded quadric confidence set

We can now deduce the conditions under whichCβ is bounded. According to results in Dufour(1997), a valid confidence setCβ for β (with level1− α) in model (2.1)-(2.6) must be unboundedwith positive probability for any parameter configuration, a probability that should be large(closeto 1− α) when the matrixΠ2 does not have full rank (or is close to have full column rank). Giventhe complicated expressions of the random matrixA, the random vectorb and the random scalarc,it seems difficult to evaluate this probability. In the following proposition, we give an easy-to-verifynecessary and sufficient condition for a confidence set of the formCβ to be bounded.

Theorem 4.1 NECESSARY AND SUFFICIENT CONDITION FOR BOUNDED CONFIDENCE SET.The setCβ in (4.2) is bounded if and only if the matrixA is positive definite.

Proofs are provided in the Appendix. It is of interest to note here that the case whereA issingular is unlikely to be met with AR-type confidence sets such as those described in Section 3,because in this case we haveA = Y ′HY, whereY andH, are aT × G and aT × T matricesrespectively. IfY follows an absolutely continuous distribution (as assumed in Section 2),A willbe nonsingular with probability one as soon as the rank ofH is greater than or equal toG.

4.4. Joint confidence sets forβ and γ

Finally, we note that the above results also apply to the problem of building joint confidence sets forβ and a subvectorγ1 of γ. This can be done by using an appropriate extension of the AR procedure[see Dufour and Jasiak (2001)]. LetX1 = [X11, X12], γ = (γ′1, γ′2)

′ andΠ1 = [Π11, Π12] whereX11, γ1 andΠ11 have dimensionsT × k11, k11 × 1 andk11 × G respectively(k11 ≤ k1). From(2.1) - (2.2), we can write:

y − Y β0 −X11γ10 = X11θ11 + X12θ12 + X2θ2 + ξ (4.15)

whereθ11 = Π11(β − β0) + γ1 − γ10, θ11 = Π11(β − β0) + γ2, θ2 = Π2(β − β0) andξ =V (β − β0) + u. We can test

H0 : (β, γ1) = (β0, γ10) (4.16)

12

Page 19: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

by testingH ′0 : θ11 = 0 andθ2 = 0, and obtain a joint confidence set forβ andγ1 by inverting the

correspondingF -test. After a similar simple calculation, we obtain the same form as before:

C(β, γ1)(α) = {(β′0, γ′10)′ : (β′0, γ

′10)A1(β′0, γ

′10)

′ + b′1(β′0, γ

′10)

′ + c1 ≤ 0} (4.17)

whereA1 = [Y, X11]′H1[Y, X11], b1 = −2[Y, X11]′H1y, c1 = y′H1y and

H1 = M(X12)−[1 +

k11 + k2

T − kFα(k11 + k2, T − k)

]M(X) . (4.18)

5. Confidence sets for transformations ofβ

5.1. The projection approach

The projection technique is a general approach that may be applied in different contexts. Given aconfidence setCθ(α) with level 1 − α for the vector of parametersθ, this method enables one todeduce confidence sets for general transformationsg in Rm of this vector. Sincex ∈ E ⇒ g(x) ∈g(E) for any setE, we have

P[θ ∈ Cθ(α)] ≥ 1− α ⇒ P[g(θ) ∈ g [Cθ(α)]

] ≥ 1− α (5.1)

whereg [Cθ(α)] = {x ∈ Rm : ∃ θ ∈ Cθ(α), g(θ) = x}. Henceg [Cθ(α)] is a conservativeconfidence set forg(θ) with level1− α.

Even if g(θ) is scalar, the projection-based confidence set is not necessarily an interval. How-ever, it is easy to see that

P[gL(α) ≤ g(θ) ≤ gU (α)] > 1− α (5.2)

where gL(α) = inf{g(θ0), θ0 ∈ Cθ(α)} and gU (α) = sup{g(θ0), θ0 ∈ Cθ(α)}; see Du-four (1997), Abdelkhalek and Dufour (1998) or Dufour and Jasiak (2001). ThusIU (α) =[gL(α), gU (α)

]\{−∞, +∞} is a confidence interval (CI) with level1 − α for g(θ), where it isassumed that−∞and+∞ are not admissible. This interval is not bounded whengL(α) or gU (α)is infinite.

It is worth noting that we obtain in this waysimultaneous confidence setsfor any number oftransformations ofβ: g1(β), g2(β), . . . , gn(β). The setCg1(β)(α)×Cg2(β)(α)× · · · ×Cgn(β)(α)whereCgi(β)(α) is the projection-based confidence set forgi(β), i = 1, . . . , n, is a simultaneous

confidence set for the vector(g1(β), g2(β), . . . , gn(β)

)′with level greater than or equal to1− α.

More generally, if{ga(β) : a ∈ A} is a set of functions ofβ,whereA is is some index set, then

P[ga(β) ∈ ga [Cβ(α)] for all a ∈ A

] ≥ 1− α . (5.3)

If these confidence intervals are used to test different hypotheses, an unlimited number of hypothesescan be tested without losing control of then overall level. The confidence sets obtained in this wayare simultaneousin the sense of Scheffé. For further discussion of simultaneous inference, thereader may consult Miller (1981), Savin (1984), and Dufour (1989).

13

Page 20: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

In this section, we build confidence sets forg(β) by “projecting” the setCβ(α).4 We study twoparticular transformations:g(β) = w′β (a linear combination of the components ofβ) andg(β) =βi (the projection on the axisβi). We also show that the confidence intervalIU (α) may involve asizeable loss of information when the two optimization problems have unbounded solutions [i.e., ifgL(α) = −∞ andgU (α) = +∞] while the appropriate projection is a proper subset ofR, hencethe importance of studying the setCβ before choosing the way to project it.

If the aim is to testH0 : g(β) = 0, we can easily deduce fromCβ(α) a conservative test. Thelatter consists in rejectingH0 when all vectorsβ0 that satisfyH0 are rejected by the AR test, orequivalently when the minimum ofAR(β0) subject to the constraint (s.c.)g(β) = 0 is larger thanFα(k2, T − k), i.e.whenmin{AR(β) : g(β) = 0} ≥ Fα(k2, T − k).

5.2. Projection-based confidence sets for scalar linear transformations

We consider now a general confidence set of the form

Cβ = {β0 : β′0Aβ0 + b′β0 + c ≤ 0} (5.4)

wherec is a real scalar,A is a symmetricG×G matrix, andb is aG× 1 vector. By definition, theassociated projection-based confidence interval for the scalar functiong(β) = w′β, wherew is anonzeroG× 1 vector, is:

Cw′β ≡ g[Cβ] = {δ0 : δ0 = w′β0 whereβ′0Aβ0 + b′β0 + c ≤ 0} . (5.5)

To study the characteristics ofCw′β, we shall distinguish again between the case whereA is non-singular and the case where it is singular.

5.2.1. Nonsingular concentration matrix

When the concentration matrix is nonsingular, all the eigenvalues ofA are different from 0. Usingthe transformationz = P (β − β), Cw′β may then be written:

Cw′β = {w′β0 : λ1z21 + λ2z

22 + · · · + λGz2

G ≤ d andz = P (β0 − β)}.

Further,w′β = w′P ′Pβ = w′P ′P (β − β) + w′P ′P β = a′z + w′β (5.6)

wherea = Pw. Setting

Ca′z = {a′z : λ1z21 + λ2z

22 + · · · + λGz2

G ≤ d} , (5.7)

it is then easy to see that, forx ∈ R,

x ∈ Cw′β ⇔ x− w′β ∈ Ca′z , (5.8)

4SinceC(β, γ1)(α) [in (4.17)] andCβ(α), have the same form, projections fromC(β, γ1)(α) can be computed in thesame way.

14

Page 21: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

hence:Cw′β = R ⇔ Ca′z = R . We will now distinguish three cases depending on the numberof negative eigenvalues: (1) all the eigenvalues ofA are positive (i.e., A is positive-definite); (2)Ahas exactly one negative eigenvalue; (3)A has at least two negative eigenvalues.

WhenA is positive definite,Cβ is a bounded set and, correspondingly, its imageg[Cβ] by thecontinuous functiong(β) = w′β is also bounded [see Abdelkhalek and Dufour (1998, Proposition2)]. The following proposition provides an explicit form for the projection-based confidence setCw′β.

Theorem 5.1 PROJECTION-BASED CONFIDENCE SETS FOR LINEAR TRANSFORMATIONS WHEN

THE CONCENTRATION MATRIX IS POSITIVE DEFINITE. Let Cβ be the set defined in(5.4), d ≡14b′A−1b − c and letw be a nonzero vector inRG. If the matrixA is positive definite andd ≥ 0,then

Cw′β =[w′β −

√d (w′A−1w) , w′β +

√d (w′A−1w)

](5.9)

whereβ = −12A−1b. If d < 0, Cw′β is empty.

In the special case wherew = ei = (δ1i, δ2i, . . . , δGi)′, with δji = 1 if j = i andδji = 0otherwise, the setCw′β is a confidence interval for the componentβi. This set is given by thefollowing corollary, which is a direct consequence of Proposition5.1.

Corollary 5.2 PROJECTION-BASED CONFIDENCE INTERVAL FOR AN INDIVIDUAL COEFFI-CIENT. Let Cβ be the set defined in(5.4) and d ≡ 1

4b′A−1b − c . Suppose the matrixA in(5.4) is positive definite. If the matrixA is positive definite andd ≥ 0, then

Cβi=

[βi −

√d (A−1)ii , βi +

√d (A−1)ii

]

whereβi = −(A−1)i.b/2 is the i-th element ofβ = −12

′A−1b, (A−1)i. is the i-th row of A−1,

(A−1)ii is thei-th element of the diagonal ofA−1, and(A−1)ii > 0 . Further, if d < 0, thenCβiis

empty.

It is interesting to note the relationship with Scheffé-type confidence sets. The confidence setfor β is based on theF -test ofH0 : θ2 = Π2(β − β0) = 0 in the regression equation:

y − Y β0 = X1θ1 + X2θ2 + ε .

Following Scheffé (1959) [see also Savin (1984)], thisF -test is equivalent to the test which doesnot rejectH0 when all hypotheses of the formH0(a) : a′θ2 = 0 are not rejected by the criterion|t(a)| > S(α), for all k2× 1 non-zero vectorsa, wheret(a) is thet-statistic forH0(a) andS(α) =√

k2 Fk2,T−k(α). Sincea′θ2 = w(β − β0) wherew = Π2a, this entails that no hypothesis of theform H ′

0(w) : w′β = w′β0, wherew = Π2a, is rejected. The projection-based confidence set forw′β can be viewed as a Scheffé-type simultaneous confidence interval forw′β.

Let us now consider the case whereA has exactly one negative eigenvalue. The basic result inthis case is given by the following proposition.

15

Page 22: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Theorem 5.3 PROJECTION-BASED CONFIDENCE SETS FOR LINEAR TRANSFORMATIONS WHEN

THE CONCENTRATION MATRIX HAS ONE NEGATIVE EIGENVALUE. LetCβ be the set defined in(5.4), d ≡ 1

4b′A−1b − c, w ∈ RG\{0}, and suppose the matrixA is nonsingular with exactly onenegative eigenvalue. Ifw′A−1w < 0 andd < 0, then

Cw′β =]−∞ , w′β −

√d (w′A−1w)

]∪

[w′β +

√d (w′A−1w) , +∞

[. (5.10)

If w′A−1w > 0 or if w′A−1w ≤ 0 andd ≥ 0, thenCw′β = R. If w′A−1w = 0 andd < 0, thenCw′β = R\{w′β}.

It is interesting to note here thatCw′β can remain informative, even if it is unbounded. Inparticular, if we want to testH0 : w′β = r and consider as a decision rule which rejectsH0 whenr /∈ Cw′β, H0 will be rejected for all values ofr in the interval

(w′β −

√d(w′A−1w) , w′β +√

d(w′A−1w)). In this case,gL(α) = −∞ andgU (α) = ∞, so thatIU (α) = R an uninformative

set, while in fact the true projection-based confidence set is a proper subset ofR.Finally, we consider the case whereA has at least two negative eigenvalues. We cover here the

case where the matrixA is negative definite, or not negative definite with at least 2 negative eigenval-ues. In this case the projection-based confidence set for any linear combination of the componentsof β is equal to the real line, thus uninformative. This is stated in the following proposition.

Theorem 5.4 PROJECTION-BASED CONFIDENCE SETS FOR LINEAR TRANSFORMATIONS WHEN

THE CONCENTRATION MATRIX HAS MORE THAN ONE NEGATIVE EIGENVALUE. LetCβ be theset defined in(5.4) andw ∈ RG\{0}. If the matrixA in (5.4) is nonsingular and admits at leasttwo negative eigenvalues, thenCw′β = R.

5.2.2. Singular concentration matrix

We will now study the case where the concentration matrixA can be singular. This can occur, forexample, if the system studied involves identities. Sincew 6= 0, we can assume without loss ofgenerality that the first component ofw (denotedw1) is different from zero. It will be convenient toconsider a nonsingular transformation ofβ :

δ =[

δ1

δ2

]=

[w′βR2β

]= Rβ , R =

[w′

R′2

]=

(w1 w′20 IG−1

)(5.11)

wherew′ = [w1, w′2] andR2 = [0, IG−1] is a (G− 1)×G matrix. If β = (β1, β2, . . . , βG)′, it isclear form this notation thatδ2 = (β2, . . . , βG)′. We study the problem of building a confidenceset forδ1.

The quadric form which definesCβ in (4.2) may be written:

β′Aβ + b′β + c = δ′Aδ + b′δ + c ≡ Q(δ) (5.12)

16

Page 23: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

whereA = (R−1)′AR−1, b = (R−1)′b, so that

Cw′β = Cδ1 = {δ1 : δ = (δ1, δ′2)′ satisfiesQ(δ) ≤ 0} . (5.13)

On partitioningA andb conformably withδ = (δ1, δ′2)′, we have:

A =(

a11 A′21

A21 A22

), b =

(b1

b2

)(5.14)

whereA22 has dimension(G − 1) × (G − 1) and, by convention, we setA = [a11] andb = [b1]whenG = 1. It is easy to see that:a11 = a11/w2

1, A21 = 1w2

1[w1A21 − a11w2],

A22 =1

w21

[a11w2w′2 − w1A21w

′2 − w1w2A

′21 + w2

1A22] , b =1w1

(b1

−b1w2 + w1b2

).

We can then write:

Q(δ) = a11δ21 + b1δ1 + c + δ′2A22δ2 + [2A21δ1 + b2]′δ2 (5.15)

where, by convention, the two last terms of (5.15) simply disappear whenG = 1. For G ≥ 1, letr2 = rank(A22), where0 ≤ r2 ≤ G− 1, and consider the spectral decomposition:

A22 = P2D2P′2 , D2 = diag(d1, . . . , dG−1) (5.16)

whered1, . . . , dG−1 are the eigenvalues ofA22 andP2 is an orthogonal matrix. Without loss ofgenerality, we can assume that

di 6= 0, if 1 ≤ i ≤ r2 ,= 0, if i > r2 .

(5.17)

Let us also define (whenever the objects considered exist)

δ2 = P ′2δ2 , A21 = P ′

2A21 , b2 = P ′2b2 , D2∗ = diag(d1, . . . , dr2) , (5.18)

and denote byδ2∗, A21∗ andb2∗ the vectors obtained by taking the firstr2 components ofδ2, A21

andb2 respectively:

δ2∗ = P ′21δ2 , A21∗ = P ′

21A21 , b2∗ = P ′21b2 , P2 = [P21, P22] (5.19)

whereP21 andP22 have dimensions(G− 1)× r2 and(G− 1)× (G− 1− r2) respectively. WhenA may be singular, the form of the setCδ1 is given by the following theorem.

Theorem 5.5 PROJECTION-BASED CONFIDENCE SETS WITH A POSSSIBLY SIGULAR CONCEN-TRATION MATRIX . Under the assumptions and notations(5.12)− (5.19), the setCδ1 takes one ofthe three following forms:

17

Page 24: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

(a) if A22 is positive semidefinite andA22 6= 0, then

Cδ1 = {δ1 : a1δ21 + b1δ1 + c1 ≤ 0} ∪ S1 (5.20)

wherea1 = a11 − A′21A+22A21 , b1 = b1 − A′21A

+22b2 , c1 = c− 1

4 b′2A+22b2 , A+

22 is the Moore-Penrose inverse ofA22 , and

S1 ={ ∅ , if rank(A22) = G− 1 ,{δ1 : P ′

22(2A21δ1 + b2) 6= 0} , if 1 ≤ rank(A22) < G− 1 ;

(b) if G = 1 or A22 = 0 , then

Cδ1 = {δ1 : a11δ21 + b1δ1 + c ≤ 0} ∪ S2 (5.21)

where

S2 ={ ∅ , if G = 1 ,{δ1 : 2A21δ1 + b2 6= 0} , if G > 1 andA22 = 0 ;

(c) if A22 is not positive semidefinite andA22 6= 0, thenCδ1 = R .

In all the cases covered by the latter theorem the joint confidence setCβ is unbounded ifA issingular [by Theorem4.1]. However, we can see from Theorem5.5 that confidence intervals forsome parameters (or linear transformations ofβ) can be bounded. This depends on the values ofthe coefficients of the second-order polynomials in (5.20) and (5.21). Specifically, it is easy to seethat the quadratic setCδ1 = {δ1 : a1δ

21 + b1δ1 + c1 ≤ 0} in (5.20) can have the following forms:

setting∆1 ≡ b21 − 4a1c1,

Cδ1 =

[−b1−

√∆1

2a1, −b1+

√∆1

2a1

], if a1 > 0 and∆1 ≥ 0 ,

]−∞ , −b1+

√∆1

2a1

]∪

[−b1−

√∆1

2a1, ∞

[, if a1 < 0 and∆1 ≥ 0 ,

]−∞ , − c1/b1

], if a1 = 0 andb1 > 0 ,[− c1/b1 , ∞[

, if a1 = 0 andb1 < 0 ,

R , if (a1 < 0 and∆1 < 0)or (a1 = b1 = 0 andc1 ≤ 0) ,

∅ , if (a1 > 0 and∆1 < 0)or (a1 = b1 = 0 andc1 > 0) .

(5.22)

Of course, a similar result holds for the quadratic set in (5.21).The results presented in sections 5.2.1-5.2.2 are important for two main reasons. First, they

allow one to obtain confidence sets in situations where no other solution has been proposed todate in the literature. Second, the explicit expressions found avoid one the use of costly numericalmethods as used in the papers cited previously. This is much more important given the nature of theproblems to be solved numerically. We tried many of the standard software as GAUSS and GAMS,

18

Page 25: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

and they seem to have difficulties to find the solutions, unless the starting point is chosen near thesolution (which is naturally unknown). However, Fortran-based IMSL routines appear to performquite well.

5.3. A Wald-type interpretation of the projection-based confidence sets

When the eigenvalues of the matrixA are positive and the projection-based confidence set forw′βis bounded, it is interesting to note that the form of this confidence set (see Proposition5.1) issimilar to the standard form:[θ − σz(α), θ + σz(α)]. Sinceθ = w′β, the corresponding estimatorof β is β = −(1/2)A−1b .The estimated variance of the estimator should be a scalar (sayσ2)times the matrixA−1, σ2A−1, and since the confidence interval has level greater than or equal1− α,

√d/σ should correspond to a quantile of an order greater than or equal1− α of the statistic∣∣(w′β − w′β)/[σ2(w′A−1w)]1/2

∣∣ . ReplacingA andb by their expressions, the estimatorβ may bewritten:

β = (Y ′HY )−1Y ′Hy.

β may be interpreted as an instrumental variables estimator. Indeed, multiplying (2.1) by(HY )′,we get

Y ′Hy = Y ′HY β + Y ′Hu.

Taking the matrixHY as a matrix of instrumental variables forY, we get:

βIV = (Y ′HY )−1Y ′Hy = β.

HY is asymptotically uncorrelated with the disturbancesu under Assumption (5.23) bellow. More-over, whenCβ is obtained from inverting theAR statistic, then under the usual assumptions,

(X ′X

T,X ′uT

,X ′VT

)p−→

T→∞(QXX , 0, 0) ,

X ′u√T

L−→T→∞

N(0, σ2uQXX) , (5.23)

it is easy to show that ifΠ2 is of full rank. Then

√T (β − β) L−→

T→∞N

[0, σ2

u plimT→∞

( 1T

A)−1]

(5.24)

where plimT→∞

1T A = Π ′

2

[QX2X2 −QX2X1Q

−1X1X1

Q′X2X1

]Π2 andQXiXj = plim

T→∞1T X ′

iXj .

On developing the expression ofβ, we may also write:

β = {Y ′[M(X1)− (1 + fα)M(X)]Y }−1Y ′[M(X1)− (1 + fα)M(X)]y.

This is the expression of the well-known Theil’s k-class estimator [see Davidson and MacKinnon(1993, page 649)] withk = 1 + fα, and sincefα tends to 0 whenT becomes large,β is asymptoti-cally equivalent to the two stage least squares estimator. The later may be written:

β2SLS = {Y ′[M(X1)−M(X)]Y }−1Y ′[M(X1)−M(X)]y.

19

Page 26: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Hence whenΠ2 is of full rank and the eigenvalues ofA are positive, the projection-based confidenceset forw′β may be interpreted as a Wald-type confidence interval based on the statistic (which isasymptotically pivotal):

T = (w′β − w′β)/√

σ2u(w′A−1w) .

6. Monte Carlo evaluation

In this section, we study projection-based statistical inference through Monte Carlo simulations. Weespecially focus on the evaluation of the degree of conservatism of the projection-based confidencesets (CS) and we compare the confidence sets obtained on the basis of different statistics. Thesestatistics are the Anderson-Rubin statistic (AR) given by (2.9), the asymptotic AR statistic(ARS)given by (2.9) but without assumption (2.6) (it follows asymptotically aχ2(k2)/k2 distribution) andthe LR and LM statistics proposed by Wang and Zivot (1998) and given by (2.13) - (2.14). We alsostudy the behavior of the Wald statistic based on 2SLS.

The data generating process is:

y = Y1β1 + Y2β2 + X1γ + u , (6.1)

(Y1, Y2) = X2Π2 + X1Π1 + (V1, V2) , (6.2)

(ut, V1t, V2t)′i.i.d.∼ N(0, Σ) , Σ =

1 .8 .8.8 1 .3.8 .3 1

, (6.3)

with k1 = 1, G = 2, β1 = 12 , β2 = 1 , γ = 2 , andΠ1 = (0.1, 0.2) . The correlation coefficient

r betweenu andVi (i = 1, 2) is set equal to0.8, the variablesY1 andY2 are endogenous and theinstrumental variablesX2 are necessary. The matrixΠ2 is such thatΠ2 = C/

√T . We consider

three different sample sizesT = 50, 100, 200. The number of instruments (k2) varies from 2to 40. All simulations are based on 10000 replications. Table 2 presents the results forC = 0(complete unidentification), Table 3 presents the results for a matrixC with componentscij suchthat1 < cij < 5 (weak identification), and Table 4 withcij such that10 ≤ cij ≤ 20. The nominalconfidence level for all tables is95%.

We begin with the behavior of the classical Wald statistic (Table 1), As expected from theresults in Dufour (1997), its real coverage rate may reach 0 when the instruments are very poor.The only case where it behaves well is when identification holds and the number of instruments issmall compared to the sample size. This shows how crucial is the need for alternative valid pivotalstatistics.

For the exact AR statistic, no size distortion, even very small, is observed. The main observationis that the coverage rate of the projection-based confidence sets forβ1 decreases ask2 increases andtends to the exact confidence level1− α of the confidence set forβ. 5 Thus the projection-basedconfidence sets become less conservative as the number of relevant instruments increases.This

5Recall that theoretically, this rate is always greater than or equal to the confidence level of the set from which theprojection is done.

20

Page 27: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Table 1. Empirical coverage rate of 2SLS-based Wald confidence sets

T k Cij = 0 1 ≤ Cij ≤ 5 10 ≤ Cij ≤ 20

2 56.13 97.40 94.463 25.10 94.05 93.714 9.19 89.06 93.685 3.82 84.49 93.65

50 10 0.03 78.28 93.3315 0.00 78.99 93.1620 0.00 77.14 92.8830 0.00 68.47 93.3540 0.00 67.84 92.302 55.22 97.68 95.073 24.53 94.33 94.434 10.52 89.45 95.165 3.81 87.16 94.16

100 10 0.03 83.88 94.4415 0.00 81.40 94.1220 0.00 72.29 94.1930 0.00 61.47 93.2040 0.00 45.48 93.662 55.53 97.85 95.323 24.55 94.68 94.804 10.32 90.33 94.955 4.10 89.19 94.64

200 10 0.04 83.99 94.7515 0.00 81.28 94.1420 0.00 71.71 94.3230 0.00 62.26 93.8940 0.00 54.99 93.76

suggests use of a number of relevant instruments as large as possible. But on the other hand, as notedby Dufour and Taamouti (2001b) and Kleibergen (2002), a large number of instruments will induceloss of power for the Anderson-Rubin test forβ. Obviously, this should not be true in the extremecase ofC = 0, X2 vanishes from equation (6.2).

The proportions of unbounded confidence sets and confidence sets equal to the real line arenearly zero when identification holds (Table 4). When we approach nonidentification (tables 3 and2), these proportions become large but decrease as the number of instruments increases. This ispredictable according to the results in Dufour (1997). It is natural when the components ofΠ2

approach 0 to get an unbounded confidence set, forβ is not identified in this case and the set ofpossible values is large.

The statistic ARS behaves in the same way as the statistic AR, except when the sample size issmall with respect to the number of instruments. In this case we observe a size distortion, in thesense that the empirical coverage rate forβ becomes smaller than the nominal level (95%).

For the LR and LM statistics, the main observation is that they produce confidence sets muchmore conservative than those based on AR or ARS, and unlike the AR statistic, the degree of conser-vatism of the resulting confidence sets increases with the number of instrumentsk2. The coverage

21

Page 28: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Tabl

e2.

Cha

ract

eris

tics

ofA

Ran

dA

RS

proj

ectio

n-ba

sed

confi

denc

ese

ts_

Cij

=0

AR

AR

ST

kC

over

age

Cov

erag

eU

nbou

nded

Em

pty

CS

=R

Cov

erag

eC

over

age

Unb

ound

edE

mpt

yC

S=R

rate

forβ

rate

forβ

1C

SC

Sra

tefo

rβra

tefo

rβ1

CS

CS

295

.06

100.

0099

.98

0.00

99.9

494

.07

99.9

999

.98

0.00

99.8

93

95.5

199

.98

99.9

70.

0099

.86

94.4

399

.97

99.9

60.

0099

.72

494

.61

99.9

799

.94

0.00

99.8

193

.01

99.8

899

.89

0.00

99.6

25

94.6

999

.92

99.9

40.

0099

.70

92.7

799

.86

99.8

70.

0099

.30

50

1094

.74

99.8

899

.90

0.00

99.6

191

.28

99.7

499

.72

0.00

98.9

115

94.7

799

.94

99.9

30.

0099

.64

89.0

899

.59

99.6

10.

0098

.32

2095

.12

99.9

199

.94

0.00

99.6

686

.79

99.2

899

.43

0.02

97.2

230

95.1

699

.89

99.8

90.

0099

.48

80.9

898

.18

98.1

90.

1293

.52

4095

.01

99.8

199

.85

0.00

99.3

769

.41

94.7

894

.33

0.68

82.7

92

94.7

399

.98

99.9

80.

0099

.88

94.3

899

.98

99.9

70.

0099

.83

394

.79

99.9

599

.96

0.00

99.7

894

.15

99.9

499

.95

0.00

99.7

54

95.1

599

.96

99.9

50.

0099

.74

94.3

499

.89

99.8

80.

0099

.63

595

.30

99.9

899

.98

0.00

99.8

094

.32

99.9

799

.97

0.00

99.7

5100

1095

.34

99.8

899

.87

0.00

99.6

893

.71

99.8

299

.84

0.00

99.5

015

94.7

599

.92

99.9

00.

0199

.68

92.1

599

.81

99.8

30.

0199

.22

2094

.72

99.9

699

.93

0.00

99.6

291

.35

99.7

599

.78

0.00

98.8

730

95.1

999

.94

99.9

40.

0099

.73

90.4

599

.67

99.7

00.

0098

.67

4094

.45

99.9

299

.89

0.00

99.5

687

.95

99.4

199

.38

0.03

97.7

22

94.9

999

.99

99.9

80.

0099

.83

94.8

199

.98

99.9

80.

0099

.83

395

.10

99.9

499

.93

0.00

99.7

694

.91

99.9

399

.93

0.00

99.7

44

94.9

999

.94

99.9

30.

0099

.68

94.6

199

.94

99.9

10.

0099

.65

594

.95

99.9

399

.90

0.00

99.6

794

.43

99.9

299

.89

0.00

99.6

1200

1095

.08

99.9

299

.94

0.00

99.6

694

.22

99.9

099

.93

0.00

99.5

715

94.9

899

.95

99.9

20.

0099

.70

94.0

099

.89

99.9

00.

0099

.58

2095

.08

99.9

599

.93

0.00

99.6

693

.55

99.9

099

.89

0.00

99.4

430

94.8

899

.95

99.9

30.

0099

.63

92.7

999

.92

99.9

10.

0099

.30

4095

.14

99.8

899

.86

0.00

99.4

992

.27

99.7

599

.72

0.00

99.0

0

22

Page 29: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Tabl

e2

(con

tinue

d).

Cha

ract

eris

tics

ofLR

and

LMpr

ojec

tion-

base

dco

nfide

nce

sets

_C

ij=

0

LMLR

Tk

Cov

erag

eC

over

age

Unb

ound

edE

mpt

yC

S=R

Cov

erag

eC

over

age

Unb

ound

edE

mpt

yC

S=R

rate

forβ

rate

forβ

1C

SC

Sra

tefo

rβra

tefo

rβ1

CS

CS

295

.08

100.

0099

.98

0.00

99.9

494

.05

99.9

999

.98

0.00

99.8

93

95.7

799

.98

99.9

70.

0099

.91

95.0

699

.97

99.9

70.

0099

.80

495

.10

99.9

799

.98

0.00

99.9

294

.76

99.9

699

.99

0.00

99.8

65

95.4

899

.94

99.9

50.

0099

.92

95.3

699

.94

99.9

40.

0099

.82

50

1096

.57

99.9

799

.95

0.00

99.9

597

.16

99.9

999

.99

0.00

99.9

215

97.7

699

.99

99.9

90.

0099

.99

97.6

410

0.00

99.9

90.

0099

.94

2098

.86

100.

0099

.99

0.00

99.9

997

.92

100.

0010

0.00

0.00

99.9

630

99.9

710

0.00

100.

000.

0010

.00

96.9

999

.97

99.9

50.

0099

.83

4010

0.00

100.

0010

0.00

0.00

100.

0089

.49

99.5

099

.50

0.00

97.4

72

94.7

499

.98

99.9

80.

0099

.88

94.3

899

.98

99.9

70.

0099

.83

394

.99

99.9

699

.96

0.00

99.8

795

.10

99.9

799

.97

0.00

99.8

84

95.3

899

.96

99.9

50.

0099

.85

96.1

599

.98

99.9

70.

0099

.85

595

.64

99.9

899

.98

0.00

99.9

396

.38

100.

0010

0.00

0.00

99.9

0100

1096

.20

99.9

099

.89

0.00

99.8

698

.20

99.9

899

.95

0.00

99.9

315

95.9

999

.95

99.9

30.

0099

.90

98.6

810

0.00

100.

000.

0099

.99

2096

.75

100.

0099

.99

0.00

99.9

999

.37

100.

0010

0.00

0.00

100.

0030

98.1

099

.99

99.9

80.

0099

.98

99.6

510

0.00

100.

000.

0010

0.00

4098

.77

100.

0099

.99

0.00

99.9

999

.60

100.

0010

0.00

0.00

99.9

92

94.9

999

.99

99.9

80.

0099

.83

94.8

199

.98

99.9

80.

0099

.83

395

.16

99.9

499

.93

0.00

99.8

695

.41

99.9

699

.94

0.00

99.8

24

95.1

199

.94

99.9

30.

0099

.79

96.1

499

.98

99.9

70.

0099

.84

595

.11

99.9

599

.91

0.00

99.8

196

.59

99.9

899

.98

0.00

99.8

8200

1095

.58

99.9

599

.95

0.00

99.9

098

.47

99.9

899

.99

0.00

99.9

615

95.7

499

.97

99.9

50.

0099

.92

99.2

010

0.00

100.

000.

0010

0.00

2096

.18

99.9

799

.96

0.00

99.9

399

.66

100.

0010

0.00

0.00

100.

0030

96.2

799

.98

99.9

70.

0099

.96

99.8

410

0.00

100.

000.

0010

0.00

4097

.19

99.9

599

.93

0.00

99.9

299

.98

100.

0010

0.00

0.00

100.

00

23

Page 30: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Tabl

e3.

Cha

ract

eris

tics

ofA

Ran

dA

RS

proj

ectio

n-ba

sed

confi

denc

ese

ts_

1≤

Cij≤

5

AR

AR

ST

kC

over

age

Cov

erag

eU

nbou

nded

Em

pty

CS

=R

Cov

erag

eC

over

age

Unb

ound

edE

mpt

yC

S=R

rate

forβ

rate

forβ

1C

SC

Sra

tefo

rβra

tefo

rβ1

CS

CS

295

.14

98.7

198

.50

0.00

94.2

194

.16

98.3

998

.20

0.00

93.2

03

95.0

897

.99

98.0

80.

0092

.16

93.8

697

.40

97.6

20.

0090

.05

495

.03

97.7

098

.10

0.02

90.6

793

.41

96.8

997

.21

0.03

87.8

75

95.1

097

.47

87.4

80.

1170

.46

93.3

696

.36

83.7

80.

264

.22

50

1095

.24

96.8

257

.95

0.59

34.7

591

.52

94.3

446

.57

1.40

24.3

515

95.0

596

.38

12.2

71.

724.

1089

.37

91.8

95.

624.

491.

4220

95.2

096

.33

28.7

51.

4715

.85

86.9

089

.30

13.1

35.

725.

5530

95.2

296

.08

16.8

62.

056.

5580

.72

83.2

52.

6612

.11

0.55

4094

.74

95.3

346

.47

1.70

24.1

768

.03

70.1

84.

7722

.67

0.86

294

.94

98.6

598

.46

0.00

94.0

394

.48

98.3

898

.31

0.00

93.5

53

94.9

897

.89

98.0

30.

0091

.95

94.3

197

.62

97.7

90.

0091

.15

494

.92

97.7

297

.40

0.00

90.6

194

.06

97.2

596

.90

0.00

88.9

55

95.0

697

.63

85.8

70.

0968

.36

94.3

297

.26

83.9

30.

1265

.18

100

1094

.76

96.9

622

.03

1.16

9.29

93.1

195

.75

17.9

91.

826.

9815

95.0

096

.40

6.02

1.81

2.35

92.7

594

.56

4.17

3.03

1.26

2094

.87

96.3

23.

642.

130.

8191

.84

93.8

22.

023.

550.

3130

94.7

195

.80

3.90

2.39

1.10

89.4

991

.44

1.65

5.54

0.40

4094

.99

95.8

43.

822.

740.

4888

.40

90.0

31.

057.

140.

072

94.9

098

.60

98.6

40.

0094

.49

94.6

598

.58

98.5

70.

0094

.25

395

.03

98.1

398

.19

0.00

92.6

894

.71

97.9

998

.10

0.00

92.2

14

95.0

097

.86

97.7

10.

0091

.20

94.6

897

.70

97.5

70.

0090

.54

595

.02

97.3

788

.50

0.08

73.8

894

.65

97.1

087

.63

0.10

72.7

6200

1095

.11

97.1

528

.79

0.83

14.5

594

.40

96.6

826

.76

0.96

16.1

415

94.8

796

.50

11.2

41.

614.

6393

.74

95.6

69.

571.

983.

8420

95.0

096

.25

11.7

81.

804.

5693

.60

95.2

99.

642.

393.

4730

95.0

396

.10

1.04

2.44

0.19

92.9

194

.31

0.65

3.59

0.08

4095

.44

96.3

30.

212.

510.

0192

.66

94.0

10.

094.

070.

01

24

Page 31: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Tabl

e3

(con

tinue

d).

Cha

ract

eris

tics

ofLR

and

LMpr

ojec

tion-

base

dco

nfide

nce

sets

_1≤

Cij≤

5

LMLR

Tk

Cov

erag

eC

over

age

Unb

ound

edE

mpt

yC

S=R

Cov

erag

eC

over

age

Unb

ound

edE

mpt

yC

S=R

rate

forβ

rate

forβ

1C

SC

Sra

tefo

rβra

tefo

rβ1

CS

CS

295

.17

98.7

198

.50

0.00

94.2

194

.16

98.3

898

.20

0.00

93.1

73

97.3

899

.17

98.1

60.

0094

.78

95.4

698

.20

98.1

20.

0092

.36

498

.62

99.5

298

.27

0.00

94.9

696

.10

98.2

298

.57

0.00

91.9

95

99.3

499

.78

88.6

90.

0080

.11

97.7

599

.08

92.0

90.

0078

.53

50

1010

0.00

100.

0064

.74

0.00

52.9

199

.75

99.8

780

.05

0.00

61.1

715

100.

0010

0.00

21.0

70.

0013

.61

99.9

799

.99

44.2

20.

0027

.38

2010

0.00

100.

0053

.08

0.00

45.1

799

.97

99.9

768

.65

0.00

53.6

330

100.

0010

0.00

87.3

00.

0082

.73

99.9

899

.99

55.5

70.

0038

.66

4010

0.00

100.

0010

0.00

0.00

100.

0099

.78

99.8

559

.82

0.00

40.3

92

94.9

498

.65

98.4

60.

0094

.03

94.4

898

.38

98.3

10.

0093

.55

397

.18

99.0

098

.08

0.00

94.4

695

.66

98.3

198

.50

0.00

93.0

94

98.2

899

.35

97.4

80.

0094

.29

96.8

098

.63

98.3

00.

0093

.14

599

.35

99.8

186

.37

0.00

76.2

898

.46

99.3

792

.18

0.00

79.4

1100

1099

.92

100.

0023

.82

0.00

14.4

099

.92

99.9

752

.26

0.00

33.9

015

100.

0010

0.00

7.74

0.00

4.39

99.9

999

.99

36.9

50.

0023

.87

2010

0.00

100.

005.

640.

002.

5410

0.00

100.

0036

.80

0.00

19.2

530

100.

0010

0.00

7.88

0.00

4.64

100.

0010

0.00

53.7

10.

0036

.68

4010

0.00

100.

0011

.78

0.00

6.84

100.

0010

0.00

62.2

40.

0040

.59

294

.90

98.6

098

.64

0.00

94.4

994

.65

98.5

898

.56

0.00

94.2

53

97.1

299

.05

98.2

00.

0094

.98

96.0

398

.55

98.6

60.

0094

.09

498

.20

99.4

197

.72

0.00

94.7

396

.89

98.7

898

.80

0.00

94.4

25

99.3

799

.86

88.7

50.

0080

.45

98.3

399

.39

93.9

80.

0084

.38

200

1099

.93

99.9

929

.84

0.00

20.0

499

.97

99.9

861

.02

0.00

43.6

915

100.

0010

.00

12.2

80.

007.

6499

.99

100.

0050

.27

0.00

35.2

720

100.

0010

0.00

13.2

10.

007.

9510

0.00

100.

0062

.28

0.00

44.8

230

100.

0010

0.00

1.42

0.00

0.65

100.

0010

0.00

40.7

30.

0023

.27

4010

0.00

100.

000.

340.

000.

0910

0.00

100.

0031

.24

0.00

16.0

0

25

Page 32: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Tabl

e4.

Cha

ract

eris

tics

ofA

Ran

dA

RS

proj

ectio

n-ba

sed

confi

denc

ese

ts_

10≤

Cij≤

20

AR

AR

ST

kC

over

age

Cov

erag

eU

nbou

nded

Em

pty

CS

=R

Cov

erag

eC

over

age

Unb

ound

edE

mpt

yC

S=R

rate

forβ

rate

forβ

1C

SC

Sra

tefo

rβra

tefo

rβ1

CS

CS

295

.14

98.5

20.

000.

000.

0094

.02

98.1

60.

000.

000.

003

94.9

297

.89

0.00

0.59

0.00

93.7

497

.34

0.00

0.69

0.00

495

.11

97.7

80.

000.

750.

0093

.33

96.8

90.

001.

190.

005

95.1

497

.45

0.00

1.21

0.00

93.2

996

.29

0.00

1.67

0.00

50

1094

.99

96.5

30.

002.

170.

0091

.29

93.6

50.

004.

130.

0015

94.8

096

.19

0.00

2.85

0.00

88.6

891

.24

0.00

6.43

0.00

2094

.89

95.9

30.

003.

130.

0086

.50

88.7

00.

009.

130.

0030

94.9

695

.79

0.00

3.56

0.00

80.4

382

.38

0.00

15.7

80.

0040

95.2

795

.70

0.00

3.84

0.00

69.3

870

.94

0.00

27.4

30.

002

95.1

098

.56

0.00

0.00

0.00

94.6

598

.44

0.00

0.00

0.00

395

.02

97.8

10.

000.

480.

0094

.52

97.6

00.

000.

580.

004

95.2

297

.91

0.00

0.73

0.00

94.4

797

.51

0.00

0.91

0.00

594

.69

97.1

10.

001.

230.

0093

.76

96.5

70.

001.

510.

00100

1095

.08

96.7

70.

001.

920.

0093

.53

95.5

70.

002.

700.

0015

94.6

896

.12

0.00

2.59

0.00

92.1

794

.25

0.00

3.99

0.00

2095

.34

96.3

50.

002.

570.

0092

.17

93.9

70.

004.

450.

0030

94.7

295

.78

0.00

3.37

0.00

89.8

591

.50

0.00

6.88

0.00

4094

.71

95.5

50.

003.

740.

0087

.60

89.0

70.

009.

290.

002

94.6

798

.60

0.00

0.00

0.00

94.4

398

.54

0.00

0.00

0.00

395

.05

98.1

10.

000.

410.

0094

.76

97.9

60.

000.

490.

004

95.0

497

.56

0.00

0.93

0.00

94.5

797

.42

0.00

1.02

0.00

595

.15

97.4

90.

001.

100.

0094

.75

97.2

70.

001.

230.

00200

1094

.96

96.8

30.

001.

880.

0094

.22

96.2

80.

002.

230.

0015

94.5

996

.20

0.00

2.62

0.00

93.5

295

.29

0.00

3.24

0.00

2094

.87

96.1

00.

002.

960.

0093

.53

94.9

30.

003.

660.

0030

95.1

796

.10

0.00

2.92

0.00

93.1

894

.40

0.00

4.12

0.00

4095

.34

96.1

50.

003.

270.

0092

.53

93.1

70.

005.

200.

00

26

Page 33: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Tabl

e4

(con

tinue

d).

Cha

ract

eris

tics

ofLR

and

LMpr

ojec

tion-

base

dco

nfide

nce

sets

_10≤

Cij≤

20

LMLR

Tk

Cov

erag

eC

over

age

Unb

ound

edE

mpt

yC

S=R

Cov

erag

eC

over

age

Unb

ound

edE

mpt

yC

S=R

rate

forβ

rate

forβ

1C

SC

Sra

tefo

rβra

tefo

rβ1

CS

CS

295

.15

98.5

30.

000.

000.

0093

.98

98.1

40.

000.

000.

003

98.1

499

.45

0.00

0.00

0.00

97.4

099

.26

0.00

0.00

0.00

499

.30

99.8

40.

000.

000.

0098

.75

99.6

40.

000.

000.

005

99.6

999

.97

0.00

0.00

0.00

99.3

099

.86

0.00

0.00

0.00

50

1099

.99

100.

000.

000.

000.

0099

.99

100.

000.

000.

000.

0015

100.

0010

0.00

0.00

0.00

0.00

100.

0010

0.00

0.00

0.00

0.00

2010

0.00

100.

000.

000.

000.

0010

0.00

100.

000.

000.

000.

0030

100.

0010

0.00

0.00

0.00

0.00

100.

0010

0.00

0.00

0.00

0.00

4010

0.00

100.

000.

000.

000.

0010

0.00

100.

000.

000.

000.

002

95.1

098

.56

0.00

0.00

0.00

94.6

498

.44

0.00

0.00

0.00

398

.08

99.3

90.

000.

000.

0097

.72

99.2

80.

000.

000.

004

99.1

999

.81

0.00

0.00

0.00

98.9

799

.77

0.00

0.00

0.00

599

.65

99.8

90.

000.

000.

0099

.44

99.8

40.

000.

000.

00100

1099

.99

100.

000.

000.

000.

0099

.99

100.

000.

000.

000.

0015

100.

0010

0.00

0.00

0.00

0.00

100.

0010

0.00

0.00

0.00

0.00

2010

0.00

100.

000.

000.

000.

0010

0.00

100.

000.

000.

000.

0030

100.

0010

0.00

0.00

0.00

0.00

100.

0010

0.00

0.00

0.00

0.00

4010

0.00

100.

000.

000.

000.

0010

0.00

100.

000.

000.

000.

002

94.6

798

.60

0.00

0.00

0.00

94.4

398

.54

0.00

0.00

0.00

397

.98

99.4

50.

000.

000.

0097

.79

99.3

90.

000.

000.

004

99.1

199

.76

0.00

0.00

0.00

98.9

699

.74

0.00

0.00

0.00

599

.71

99.9

50.

000.

000.

0099

.60

99.9

10.

000.

000.

00200

1010

0.00

100.

000.

000.

000.

0010

0.00

100.

000.

000.

000.

0015

100.

0010

0.00

0.00

0.00

0.00

100.

0010

0.00

0.00

0.00

0.00

2010

0.00

100.

000.

000.

000.

0010

0.00

100.

000.

000.

000.

0030

100.

0010

0.00

0.00

0.00

0.00

100.

0010

0.00

0.00

0.00

0.00

4010

0.00

100.

000.

000.

000.

0010

0.00

100.

000.

000.

000.

00

27

Page 34: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Table 5. Comparison between AR and LR projection-based confidence setswhen they are bounded

1 ≤ Cij ≤ 5 10 ≤ Cij ≤ 20

T k2 AR shorter CI mean length AR shorter CI mean lengththan LR (%) AR LR than LR (%) AR LR

2 0.00 9.80 13.28 0.00 0.53 0.513 38.65 25.85 15.59 45.37 0.43 0.454 59.68 20.89 31.69 68.54 0.59 0.655 71.75 82.79 62.85 80.47 0.49 0.57

50 10 91.32 17.96 23.62 95.65 0.44 0.5815 96.24 6.83 11.22 97.39 0.35 0.4920 94.14 16.07 17.61 97.59 0.35 0.5130 87.98 7.30 14.94 93.66 0.35 0.5140 53.66 13.66 11.54 67.12 0.49 0.592 0.00 13.05 12.88 0.00 0.62 0.613 44.21 16.37 15.93 59.74 0.49 0.524 69.88 17.00 23.77 82.57 0.58 0.665 85.97 16.48 16.16 92.01 0.43 0.50

100 10 99.20 6.04 14.87 99.65 0.36 0.4815 99.79 4.71 10.85 99.92 0.28 0.4020 100.00 4.78 23.20 99.98 0.33 0.5030 99.96 3.85 31.25 100.00 0.28 0.4640 100.00 8.67 17.75 100.00 0.27 0.472 0.00 13.59 43.78 0.00 0.53 0.523 56.82 33.94 18.59 70.54 0.49 0.524 88.33 41.99 259.61 91.35 0.55 0.625 95.67 21.27 15.42 96.71 0.40 0.47

200 10 99.86 7.82 14.02 99.93 0.32 0.4315 100.00 7.90 14.17 100.00 0.28 0.4020 100.00 5.30 24.65 100.00 0.23 0.3530 100.00 2.14 11.72 100.00 0.24 0.4140 100.00 1.61 20.78 100.00 0.24 0.43

rate of the confidence sets based on LM and LR statistics are always greater than 98.5% and ap-proaches rapidly 100% ask2 increases. This is predictable since the LM and LR based confidencesets are doubly conservative, by majorization of their distribution and by projection. Even in theexact identification case, the statistic LR have a small size downward distortion.

In Table 5, we present a comparison between AR and LR projection-based confidence sets whenthey are bounded. The first column gives the percentages of AR confidence sets shorter than LRones, columns 2 and 3 give the mean length of these intervals. We do not consider the caseC = 0,since the intervals are nearly always unbounded in this case. As can be seen clearly, except in thecase of exact identification, the AR-based confidence sets are shorter than the LR-based ones.

As we may expect the high coverage rate of the LM and LR-based confidence sets inducespower loss for the test that rejectsH0 : β1 = β0

1 when the projection-based confidence set forβ1 excludesβ0

1. This is shown in Table 6 and Figure 1 where we present estimates ofP [rejectingH0 : β1 = 0.5 |β1 = βi

1] with a decision rule consisting of rejectingH0 if 0.5 is excluded from the

28

Page 35: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Tabl

e6.

Pow

erof

test

sin

duce

dby

proj

ectio

n-ba

sed

confi

denc

ese

ts_

H0

:β1

=0

;T=

100

k2

=2

β1

AR

AR

SLM

LR-0

.51.

001.

001.

001.

00-0

.41.

001.

001.

001.

00-0

.31.

001.

001.

001.

00-0

.21.

001.

001.

001.

00-0

.11.

001.

001.

001.

000.

00.

990.

990.

990.

990.

10.

950.

950.

950.

950.

20.

760.

770.

760.

770.

30.

380.

390.

380.

390.

40.

090.

100.

090.

100.

50.

010.

010.

010.

010.

60.

090.

090.

090.

090.

70.

350.

360.

350.

360.

80.

710.

720.

710.

720.

90.

910.

910.

910.

911.

00.

980.

980.

980.

981.

11.

001.

001.

001.

001.

21.

001.

001.

001.

001.

31.

001.

001.

001.

001.

41.

001.

001.

001.

001.

51.

001.

001.

001.

00

k2

=4

AR

AR

SLM

LR1.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

000.

970.

970.

940.

950.

780.

800.

670.

700.

380.

400.

230.

260.

090.

100.

020.

030.

020.

030.

000.

000.

080.

090.

030.

030.

290.

320.

188

0.19

0.63

0.66

0.52

0.52

0.87

0.89

0.82

0.81

0.97

0.97

0.95

0.95

0.99

0.99

0.99

0.99

1.00

1.00

1.00

1.00

1.00

1.00

1.00

1.00

1.00

1.00

1.00

1.00

1.00

1.00

1.00

1.00

k2

=10

AR

AR

SLM

LR1.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

000.

990.

990.

960.

970.

740.

820.

580.

630.

150.

230.

120.

150.

000.

010.

040.

050.

000.

000.

110.

140.

000.

010.

480.

540.

120.

140.

880.

900.

610.

620.

990.

990.

930.

931.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

00

k2

=20

AR

AR

SLM

LR1.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

000.

910.

980.

870.

910.

120.

330.

210.

280.

000.

000.

040.

070.

000.

000.

180.

260.

000.

000.

750.

820.

100.

180.

990.

990.

760.

821.

001.

000.

991.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

001.

00

29

Page 36: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Figure 1. Power of tests induced by projection-based confidence setsH0 : β1 = 0.5

30

Page 37: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

confidence set forβ1. The theoretical size is 95%. The value of the alternative varies from−0.5 to1.5 with increments of0.1.

For k2 = 2, the three tests have the same power but ask2 increases, it appears clearly that theLM and LR based tests have less power and tends to reject few often. On the other hand, whenk2

increases, the test ARS appears to be powerful but in fact its size becomes greater thanα.

7. Empirical illustrations

In this section we illustrate the statistical inference methods discussed in the previous sectionsthrough three empirical applications related to important issues in the macroeconomic and labor eco-nomics literature. The first one concerns the relation between growth and trade examined throughcross-country data on a large sample of countries, the second one considers the widely studied prob-lem of returns to education, and the third application is about the returns to scale and externalityspillovers in U.S. industry.

7.1. Trade and growth

A large number of cross-country studies in the macroeconomics literature have looked at the re-lationship between standards of living and openness. Recent literature includes Irwin and Tervio(2002), Frankel and Romer (1996, 1999), Harrison (1996), Mankiw, Romer, and Weil (1992) andthe survey of Rodrik (1995). Despite the great effort that has been devoted to studying this issue,there is little persuasive evidence concerning the effect of openness on income even if many studiesconclude that openness has been conductive to higher growth.

Estimating the impact of openness on income through a cross-country regression raises twobasic difficulties. The first one consists in finding an appropriate indicator of openness. The mostcommonly used one is the trade share (ratio of imports or exports to GDP). The second problem isthe endogeneity of this indicator. Frankel and Romer (1999) argue that the trade share should beviewed as an endogenous variable, and similarly for the other indicators such as trade policies.

As a solution to this problem, Frankel and Romer (1999) proposed to use IV methods to estimatethe income-trade relationship. The equation studied is given by

yi = a + bTi + c1Ni + c2Ai + ui (7.1)

whereyi is log income per person in countryi, Ti the trade share (measured as the ratio of importsand exports to GDP),Ni the logarithm of population, andAi the logarithm of country area. Thetrade shareTi can be viewed as endogenous, and to take this into account, the authors used aninstrument constructed on the basis of geographic characteristics [see Frankel and Romer (1999,equation (6), page 383)].

The data used lists for each country, the trade share in 1985, the area and population (1985), andits income per person in 1985.6 The authors focus on two samples. The first is the full 150 countries

6The data set and its sources are given in the appendix of Frankel and Romer (1999).

31

Page 38: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Table 7. Confidence sets for the coefficients of the Frankel-Romer income-trade equationA. Bivariate joint confidence sets (size= 95%)

θ Joint confidence set(95%)

(b, c1) θ′(

1.78 −16.36−16.36 257.85

)θ +

( −2.23 , −34.50)θ + 0.19 ≤ 0

(b, c2) θ′(

3.83 −34.58−34.58 386.87

)θ +

( −10.6 , 69.17)θ + 2.13 ≤ 0

(b, a) θ′(

38.41 33.3433.35 29.52

)θ +

( −611.55 , −537.47)θ + 2445.58 ≤ 0

B. Projection-based individual confidence intervals (size≥ 95%)

Coefficient Projection-based confidence setsIV-based Wald-type confidence setsOpenness [−0.21 , 6.18] [−0.01 , 3.95]

Population [−0.01 , 0.52] [−0.01 , 0.37]

Area [−0.14 , 0.49] [−0.11 , 0.29]

Constant [2.09 , 9.38] [0.56 , 9.36]

covered by the Penn World Table, and the second sample is the 98-country sample considered byMankiw, Romer, and Weil (1992).

In this paper we consider the sample of 150 countries. For this sample, it is not clear how“weak” the instruments are. TheF -statistic of the first stage regression

Ti = α + βZi + γ1Ni + γ2Ai + εi (7.2)

is about 13 [see Frankel and Romer (1999, Table 2, page 385)].To draw inference on the coefficients of the structural equation (7.1), we can use the Anderson-

Rubin method in two ways. First if we are interested only in the coefficient of trade share, we caninvert the AR test forH0 : b = b0 to obtain a quadratic confidence set forb. On the other hand, ifwe want to build confidence sets for the other parameters of (7.1), we must first use the AR test toobtain a joint confidence set forb and each one of the other parameters and then use the projectionapproach to obtain confidence sets for each one of these parameters.7 As assumed in the literature,the observations are considered to be homoskedastic and uncorrelated but not necessarily normal,we use the asymptotic AR test with aχ2 distribution. The results are as follows.

The95% quadratic confidence set for the coefficient of trade shareb is given by:

Cb(α) = {b : 0.963b2 − 4.754b + 1.274 ≤ 0} = [0.284 , 4.652] . (7.3)

7We can not use the AR test to build directly confidence sets for the coefficients of the exogenous variables.

32

Page 39: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Thep-value of the Anderson-Rubin test forH0 : b = 0 is 0.0244, this means a significant positiveimpact of trade on income at the usual5% level. The IV estimation of this coefficient is1.97 with astandard error of0.99, yielding the confidence interval

(bIV − 2σbIV, bIV + 2σbIV

) = [−0.01 , 3.95], (7.4)

which is not very different from the AR-based confidence set. Interestingly, in contrast with (7.3),(7.4) does not exclude zero and may suggest thatb is not significantly different from zero.

The joint confidence sets obtained by applying the methods of section 4.4 to each pair obtainedby putting the trade share coefficient and each one of the other coefficients of (7.1) are given in Table7A. All the confidence sets are bounded, a natural outcome since we do not have a serious problemof identification in this model. From this confidence sets we can obtain projection-based confidenceintervals for each one of the parameters. Using Proposition5.1, we obtain the results presented inTable 7B. Even if 0 is included in the confidence sets for the coefficient of openness it is likely thatthe true value of the coefficient is positive.AR-projection-based confidence sets are conservative sowhen the level of the joint confidence set is95% it is likely that the level of the projection is close to98% (see the simulations in Section 6), but if we compare them to those obtained fromt-statistics,they are not really larger.

7.2. Education and earnings

The second application considers the well known problem of returns to education. Since the work ofAngrist and Krueger (1991), a lot of research has been done on this problem; see, for example, An-grist and Krueger (1995), Angrist, Imbens, and Krueger (1999), Bound, Jaeger, and Baker (1995).The central equation in this work is a relationship where the log weekly earning is explained by thenumber of years of education and several other covariates (age, age squared, year of birth, region,...). Education can be viewed as an endogenous variable, so Angrist and Krueger (1991) proposedto use the birth quarter as an instrument, for individuals born during the first quarter of the year startschool at an older age, and can therefore drop out after completing less schooling than individualsborn near the end of the year. Consequently, individuals born at the beginning of the year are likelyto earn less than those born during the rest of the year. Other versions of this IV regression take asinstruments interactions between the birth quarter and regional and/or birth year dummies.

It is well documented that the instrument set used by Angrist and Krueger (1991) is weak andexplain very little of the variation in education; see Bound, Jaeger, and Baker (1995). Consequently,standard IV-based inference is quite unreliable. We shall now apply the methods developed in thispaper to this relationship. The model considered is the following:

y = β0 + β1E +k1∑

i=1

γiXi + u , E = π0 +k2∑

i=1

πiZi +k2∑

i=1

φiXi + v ,

wherey is log-weekly earnings,E is the number of years of education (possibly endogenous),Xcontains the exogenous covariates [age, age squared, marital status, race, standard metropolitanstatistical area (SMSA), 9 dummies for years of birth, and 8 dummies for division of birth].Z

33

Page 40: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Table 8. Projection-based confidence sets for the coefficients of the exogenous covariates in theincome-education equation (size= 95%)

Covariate CS for education CS for covariate Wald CS covariateConstant [−0.86076934, 0.77468002] [−4.4353178, 16.836347] [4.121, 5.600]

Age [−0.86076841, 0.77467914] [−0.12099477, 0 .06963698] [−0.031, 0.002]

Age squared [−.86076865, 0.77467917] [−0.00772368, 0.00748569] [−0.001, 0.002]

Marital status R R [0.234, 0.263]

SMSA R R [0.120, 0.240]

Race R R [−0.352, −0.173]

Year 1 [−0.86076899, 0.77467898] [−0.72434684, 1.1399276] [−0.002, 0.187]

Year 2 [−0.86076919, 0.7746792] [−0.64290291, 1.0246588] [0.003, 0.172]

Year 3 [−0.86076854, 0.77467918] [−0.51469586, 0.84369807] [0.008, 0.154]

Year 4 [−.86076758, 0.77467916] [−0.4042831, 0.69265631] [0.013, 0.141]

Year 5 [−0.86076725, 0.77467906] [−0.28675828, 0.52165559] [0.015, 0.123]

Year 6 [−0.8607684., 0.77467903] [−0.2206811, 0.39879656] [0.007, 0.0980]

Year 7 R R [0.008, 0.080]

Year 8 [−0.86768146, 0.78338792] [−0.08312128, 0.17409244] [0.005, 0.0581]

Year 9 [−0.86076735, 0 .77467921] [−0.04610583, 0.1050552] [0.005, 0.038]

Division 1 R R [−0.150, −0.081]

Division 2 R R [−0.094, −0.015]

Division 3 R R [−0.048, 0.073]

Division 4 R R [−0.153,−0.067]

Division 5 R R [−0.205,−0.080]

Division 6 R R [−0.265,−0.074]

Division 7 R R [−0.161,−0.051]

Division 8 R R [−0.111,−0.075]

contains 30 dummies obtained by interacting the quarter of birth with the year of birth.β1 measuresthe return to education. The data set consists of the 5% public-use sample of the 1980 US censusfor men born between 1930 and 1939. The sample size is 329509 observations.

Since the instruments are likely to be weak, it appears important to use a method which isrobust to weak instruments. We consider her the AR procedure. If we were only interested bythe coefficient of education, we could compute the quadratic confidence set forβ1. But if we arealso interested in other coefficients, for example the age coefficient (say,γ1), the only way to geta confidence interval is to compute the AR joint confidence set for(β1, γ1) and then deduce byprojection a confidence set for age. Obviously since the instruments are weak, we should expectlarge, if not completely uninformative, intervals. Table 8 gives projection-based confidence setsfor the coefficients of education and different covariates. For each covariateXi, we computed theAR joint confidence set with education (a confidence set for(β1, γi) and then project to obtain aconfidence set forβ1 (column 2) and a confidence set forγi (column 3). The last column gives Wald-based confidence sets for each covariate obtained by 2SLS estimation of the equation of education.As expected many of the valid confidence sets are unbounded while Wald-type confidence sets arealways bounded but likely invalid.

For the coefficientβ1 measuring returns to education, the AR-based quadratic confidence inter-

34

Page 41: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

val of confidence level 95% is given by

AR_ICα(β1) = [−0.86, 0.77] . (7.5)

It is bounded but too large to provide relevant information on the magnitude of returns to education.The 2SLS estimate forβ1 is 0.06 with a standard error of 0.023 yielding the Wald-type confidenceintervalW_ICα(β1) = [0.0031, 0.1167].

7.3. Returns to scale and externality spillovers in U.S. industry

One of the widely studied problems in recent macroeconomics literature is the extent of returnsto scale and externalities in the U.S. industry. Recent work on these issues includes Hall (1990),Caballero and Lyons (1989, 1992), Basu and Fernald (1995, 1997) and Burnside (1996). The resultsof these researches and many others have important implications on many fields of macroeconomics,such as growth and business cycle models.

Burnside (1996) presents a short survey of different specifications of the production functionadopted in this literature. One of these specifications considers the following equation:

Yit = F (Kit, Lit, Eit,Mit) (7.6)

where, for each industryi and each periodt, Yit is the gross output, Kit is the amount of capitalservices used,Lit is the amount of labor,Eit is energy used, andMit is the quantity of materials. Ifwe assume thatF is a differentiable function and homogeneous of degreeρ, we get the followingregression equation [see Burnside (1996)]:

∆yit = ρ∆xit + ∆ait (7.7)

where∆yit is the growth rate of the output,∆xit is a weighted average of the inputs and∆ait

represents technological changes.8 In this specification,ρ is the coefficient that measures the extentof returns to scale. Returns to scale are increasing, constant or decreasing depending on whetherρ > 1, ρ = 1 or ρ < 1.

To identify simultaneously the effects of externalities between industries, Caballero and Lyons(1992) added to the previous regression equation the aggregated industrial output as a measure ofthis effect. Burnside (1996) suggested a variable based on inputs rather than output, arguing bythe fact that the first measure may induce spurious externalities for industries with a large output.Adopting the later suggestion, the previous regression equation becomes:

∆yit = ρ∆xit + η∆xt + uit (7.8)

where∆xt is the cost shares weighted average of the∆xit [Burnside (1996, equation (2.8))] anduit = ∆ait. The coefficientη measures the externalities effect.

To estimate this equation, Hall (1990) proposed a set of instruments that was used in most sub-sequent researches. These instruments include the growth rate of military purchases, the growth rate

8The weights are the production cost shares of each input.

35

Page 42: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

of world oil price, a dummy variable representing the political party of the President of Unites Statesand one lag of each of these variables. Estimation methods used include ordinary least squares, twostages least squares and three stages least squares.

The regressions are performed using panel data on two-digit SIC (Standard Industrial Classifi-cation) code level manufacturing industries. This classification includes 21 industries. The data setis described in detail by Jorgenson, Gollop, and Fraumeni (1987) and contains information on grossoutput, labor input, stock of capital, energy use, and materials inputs.

These regressions are interesting as an application for the statistical inference methods devel-oped in this paper because the instruments used appear to be weak and may induce identificationproblems. These instruments have been studied in detail by Burnside (1996) who showed on thebasis of calculations ofR2 and partialR2 [Shea (1997)], that these instruments are weak. A validmethod to draw inference onρ (returns to scale) andη (externalities) then consists in using an ex-tension of the Anderson-Rubin approach [as suggested in Dufour and Jasiak (2001)] to build a jointconfidence set for(ρ, η)′ and then build through projection individual confidence intervals forρ andη.9

Given this identification problem, we expect unbounded confidence sets. Using the same dataset as Burnside (1996), we obtained the results presented in Table 9. This table presents the 2SLSestimates and the confidence sets for the returns to scale coefficients and externalities coefficients in21 U.S. manufacturing industries over the period 1953-1984. The projection based confidence setsare obtained from joint confidence sets for(ρ, η) of level90%.10

The average estimation over all industries of the coefficientsρ andη are of the same order asthose obtained by Burnside (1996).11 Only 7 among 21 confidence sets are bounded. For industries19 (stone, clay and glass) and 26 (instruments), the returns to scale are increasing. For industry 15(chemicals), the returns to scale are decreasing. For industries 9 (textile mill products), 12 (furnitureand fixtures), 13 (paper and allied), and 23 (electrical machinery) the hypothesis of constant returnsto scale is rejected with a significance level smaller than or equal 10%. For industry 10 (apparel) theconfidence set is empty which may be explained by the fact that the data does not support the model.For industries 7 (food and kindred products), 8 (tobacco), 11 (lumber and wood), 16 (petroleum andcoal products), 17 (rubber and miscellaneous plastics), 18 (leather), and 24 (motor vehicles), theconfidence sets are equal toR and thus provide no information onρ andη.

8. Conclusion

Recent research in econometrics has shown that weak instruments are quite widespread and shouldbe carefully addressed. Techniques which are robust to weak instruments typically require oneto consider first joint inference problem on all or, at least, some subvector of model parameters.This leads to the problem of drawing inference on individual coefficients (or lower dimensional

9As reported in Caballero and Lyons (1989), there is no evidence of serial correlation from either the Durbin-Watsonstatistic or the Ljung-BoxQ statistic.

10We usedχ2 as asymptotic distribution for the Anderson-Rubin statistic instead of the Fisher distribution valid undernormality and independence assumption.

11The small differences may be due to the use of TSLS instead of 3SLS.

36

Page 43: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

Table 9. Confidence sets for the returns to scale and externality coefficients in different U.S.industries (size≥ 90%)

Returns to scale ExternalitiesIndustry 2SLS Confidence set 2SLS Confidence set

7: Food & kindred products 0.99 R -0.06 R8: Tobacco 1.06 R 0.28 R9: Textile mill products 0.61 ]−∞, 0.56] ∪ [2.23, ∞[ 0.20 R10: Apparel 1.09 ∅ -0.05 ∅11: Lumber & wood 0.86 R -0.08 R12: Furniture and fixtures 1.13 ]−∞, 0.58] ∪ [1.77, ∞[ -0.01 ]−∞, −0.73] ∪ [0.55, ∞[

13: Paper and allied 0.54 ]−∞, 0.74] ∪ [4.56, ∞[ 0.61 ]−∞, −4.51] ∪ [0.45, ∞[

14: Printing; publishing 0.93 [−1.2, 4.23] 0.23 [−0.11, 1.05]

15: Chemicals 0.22 [−7.36, 0.54] 1.06 [0.85, 11.7]

16: Petroleum & coal products 0.34 R 0.29 R17: Rubber & misc. plastics 1.29 R -0.31 R18: Leather 0.39 R 0.01 R19: Stone, clay, glass 1.21 [1, 3.34] -0.03 [−3.16, 0.15]

20: Primary metal 0.79 [0.46, 1.01] 0.42 [−0.37, 1.51]

21: Fabricated metal 0.80 ]−∞, 2.25] ∪ [1.15, ∞[ 0.30 ]−∞, −0.13] ∪ [4.21, ∞[

22: Machinery, non-electrical 1.16 [0.73, 1.81] 0.02 [−1.41, 0.76]

23: Electrical machinery 1.17 ]−∞, 0.29] ∪ [2.47, ∞[ 0.05 ]−∞, 1.16] ∪ [1.72, ∞[

24: Motor vehicles 1.23 R -0.12 R25: Transportation equipment 1.07 [0.64, 1.55] 0.10 [−0.36, 1.6]

26: Instruments 1.38 [1.19, 3.29] -0.07 [−1.5, 0.38]

27: Misc. manufacturing 1.5 ]−∞, −88.7] ∪ [0.48, ∞[ -0.51 ]−∞, 0.12] ∪ [102.1, ∞[

Mean 0.94 0.11

37

Page 44: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

subvectors). In this paper, we considered this problem from a finite-sample limited-informationviewpoint and focused on AR-type tests and confidence sets. Important reasons for this choice stemfrom the fact that AR statistics are free of nuisance parameters in finite samples under standardassumptions, as well as robust to excluded instruments and more generally to the specification of amodel for the endogenous explanatory variables.

We observed that AR-type confidence sets belong to a class of sets defined through quadriccurves (which include ellipsoids as a special case). A simple condition for deciding whether suchconfidence sets are bounded was derived. On observing that a projection technique does providefinite-sample confidence sets for individual coefficients in such contexts (indeed, the only procedurefor which a finite-sample theory is currently available), we derived a complete analytic solution tothe problem of building projection-based confidence sets for individual structural coefficients (or lin-ear combinations of the latter) when the joint confidence set has a quadric structure. The confidencesets so obtained turn out to be as easy to compute as standard Wald-type 2SLS-based confidenceintervals. When they take the form of a closed interval, they can be interpreted as Wald-type con-fidence intervals based on k-class estimators using a pseudo “standard error” which depend on theAR critical value. The confidence sets so obtained have the additional feature of being simultaneousin the sense of Scheffé, so an unlimited number of such confidence sets can be built without losingcontrol of their overall level. We also provided simulation results showing that projection-basedconfidence sets perform reasonably well in terms of accuracy. Since no alternative finite-sampleprocedure that enjoys the same robustness features appears to be available, it is currently the bestsolution to the problem at hand.

We think that analytical results presented here on quadric confidence sets can be useful in othercontexts involving, for example, errors-in-variables models [see Dufour and Jasiak (2001)], nonlin-ear models [see Dufour and Taamouti (2001b)] and dynamic models. Such extensions would gobeyond the scope of the present paper. Another issue not discussed here consists in choosing theinstruments to be used for the purpose of performing AR-type tests. It is easy to see that the powerof AR-type tests may decline as the number of instruments increases, especially if they have littlerelevance. We study in detail the problem of selecting optimal instruments and reducing the numberof instruments in two companion papers [Dufour and Taamouti (2001b, 2001a)].

38

Page 45: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

A. Appendix: Proofs

PROOF OF EQUATION(4.13) Settingx = Pβ andδ = Pb, we can write:

β′Aβ + b′β + c = β′P ′DPβ + b′P ′Pβ + c = (Pβ)′D(Pβ) + (Pb)′(Pβ) + c

=r∑

i=1

λix2i +

G∑

i=1

δixi + c =r∑

i=1

λi(x2i +

δixi

λi) +

G∑

i=r+1

δixi + c

=r∑

i=1

λi(xi +δi

2λi)2 +

G∑

i=r+1

δixi + c−r∑

i=1

δ2i

4λi=

r∑

i=1

λiz2i +

G∑

i=r+1

δizi − d

whered = −(c−

r∑i=1

δ2i /(4λi)

), z = Pβ + u, with ui = δi

2λi, if λi 6= 0, andui = 0, otherwise.

PROOF OFTHEOREM4.1 The proof of this proposition is a direct consequence of the study of thecharacteristics ofCβ in Section 4. If the eigenvalues ofA are all positive (see Section 4.1.1),Cβ

is bounded. If the eigenvalues ofA are negative,Cβ is unbounded (see Section 4.1.2). If they aredifferent from 0 but of different signs,Cβ is always unbounded whatever the sign ofd (see Section4.1.3). IfA is singular,Cβ is unbounded by (4.13); see Section 4.2.

PROOF OFTHEOREM 5.1 Consider again the decompositionA = P′DP as in (4.4). By (5.8),

we have, for anyx0 ∈ R, x0 ∈ Cw′β ⇔x0 − w′β ∈ Ca′z , wherea = Pw. Let x = x0 − w′β . Bydefinition,x ∈ Ca′z iff (if and only if) there is a vectorz ∈ RG such that

z′Dz ≤ d anda′z = x . (A.1)

Further, there is az verifying (A.1) iff the solution of the problem

minz

z′Dz s.c.a′z = x (A.2)

verifies the constraint (A.1). Ifd < 0, it is clear there is no solution verifying (A.1) _ forD ispositive definite _ and consequentlyCa′z = Cw′β = ∅. Let d ≥ 0. The Lagrangian of the problem(A.2) isL = z′Dz+µ(x−a′z) . SinceD is positive definite, the first order conditions are necessaryand sufficient. These are:

2Dz = µa , a′z = x,

hence

µ =2x

a′D−1a, z =

x

a′D−1aD−1a , z′Dz =

12µx =

x2

a′D−1a.

Thus

x ∈ Ca′z ⇔ x2

a′D−1a≤ d ⇔ |x| ≤

√d (a′D−1a) ⇔ |x0 − w′β | ≤

√d (a′D−1a) .

39

Page 46: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

On noting thata′D−1a = w′A−1w, this entails that the confidence set forw′β is given by (5.9).

PROOF OFTHEOREM5.3 As in the proof of Proposition5.1, we consider again the decomposition(4.4), the equivalencex0 ∈ Cw′β ⇔x0 − w′β ∈ Ca′z , and we setx = x0 − w′β anda = Pw .Now, x ∈ Ca′z iff there is a value ofz ∈ RG such that

a′z = a1z1 + · · ·+ aG−1zG−1 + aGzG = x , (A.3)

z′Dz = λ1z21 + · · ·+ λG−1z

2G−1 − |λG|z2

G ≤ d , (A.4)

where (without loss of generality) we assume thatλG is the negative eigenvalue. Leta(G) =(a1, a2, . . . , aG−1)′, z(G) = (z1, z2, . . . , zG−1)′, andD(G) = diag(λ1, λ2, . . . , λG−1)′.

If aG = 0, thena(G) 6= 0 (becausew 6= 0 entailsa 6= 0), andw′A−1w = a′D−1a > 0. In thiscase, for anyx ∈ R, we can choosez such thata1z1 + · · ·+ aG−1zG−1 = x andzG is sufficientlylarge to ensure that (A.4) holds. HenceCa′z = R andCw′β = R .

We will now suppose thataG 6= 0. Then, the conditions (A.3) - (A.4) are equivalent to:

zG = (x− a′(G)z(G))/aG , (A.5)

|λG|(

x− a′(G)z(G)

aG

)2

≥ −d + z′(G)D(G)z(G) , (A.6)

where the latter inequality can also be written as[|λG| s2

(G) − a2G(z′(G)D(G)z(G))

]− 2|λG|s(G)x +

[|λG|x2 + da2G

] ≥ 0 (A.7)

wheres(G) = a′(G)z(G). Since (A.5) always allows one to obtain (A.3) once the vectorz(G) isgiven, a necessary and sufficient condition forx ∈ Ca′z is the existence of a vectorz(G) whichsatisfies inequality (A.7). Further, such a vectorz(G) does exist iff we can find a values suchthat the supremum (with respect toz(G)) of the left-hand side of (A.7) subject to the restrictiona′(G)z(G) = s is larger than zero. Consequently, we consider the problem:

minz

z′(G)D(G)z(G) s.c.a′(G)z(G) = s (A.8)

wheres is some real number. SinceD(G) is positive definite, the first order conditions are necessaryand sufficient to characterize a solution of (A.8). The Lagrangian for this problem is given byL = z′(G)D(G)z(G) − µ(a′(G)z(G) − s) and the corresponding first order conditions are:

2D(G)z(G) = µa(G) , a′(G)z(G) = s , (A.9)

hence

µ =2s

a′(G)D−1(G)a(G)

, z(G) =s

a′(G)D−1(G)a(G)

D−1(G)a(G) , z′(G)D(G)z(G) =

s2

a′(G)D−1(G)a(G)

40

Page 47: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

wherea′(G)D−1(G)a(G) > 0. Substituting the solution of (A.8) into (A.7), we get:

q s2 − (2|λG|x)s +(|λG|x2 + da2

G

) ≥ 0 (A.10)

whereq = |λG| −[a2

G/a′(G)D−1(G)a(G)

]= δG(w′A−1w) andδG ≡ |λG|/a′(G)D

−1(G)a(G) > 0. Thus,

x ∈ Ca′z iff there is a value ofs such that (A.10) holds. The discriminant of this second degreeequation is

∆ = 4λ2Gx2 − 4Q

(|λG|x2 + da2G

)= 4δGa2

G

[x2 − d(w′A−1w)

].

We will now consider in turn each possible case for the signs ofw′A−1w andd .If w′A−1w > 0, thenq > 0 and, for anyx, we can find a (sufficiently large) value ofs such that(A.10) will hold. Consequently,Ca′z = Cw′β = R . Thus,w′A−1w > 0 entailsCa′z = Cw′β = R ,irrespective of the value ofaG (the caseaG = 0 was considered at the beginning of the proof).If w′A−1w < 0 andd < 0, thenq < 0 and (A.10) has a (real) solution iff∆ ≥ 0 or, equivalently,x2 ≥ d (w′A−1w) > 0 . Consequently,

Ca′z =]−∞ , −

√d (w′A−1w)

] ∪ [√d (w′A−1w), +∞ [

, (A.11)

Cw′β =]−∞ , w′β −

√d (w′A−1w)

] ∪ [w′β +

√d (w′A−1w), +∞ [

. (A.12)

If w′A−1w = 0 andd < 0, (A.10) can be satisfied for anyx 6= 0, henceCa′z = R\{0} andCw′β = R\{w′β}.Finally, if d ≥ 0, (A.10) is satisfied for anyx (on takings = 0), and we haveCa′z = Cw′β = R.All possible cases have been covered.

PROOF OFTHEOREM 5.4 We need to show thatCa′z = R. To see this, letλi1 andλi2 be the twonegative eigenvalues of the matrixA, and (without loss of generality) supposea1 6= 0. For any realx, we will show thatx ∈ Ca′z , which entails thatCw′β = Ca′z = R.If λi1 or λi2 is associated withz1 (say it isλi1), we can set set the components ofz such that:(1) z1 =

(x− ai2zi2

)/a1 ; (2) zi = 0, for i > 1, i 6= i2 ; (3) λ1z

21 + λi2z

2i2≤ d . Sinceλi1 andλi2

are negative,zi2 does exist. The vectorz verifies (4.5) anda′z = x, hencex ∈ Ca′z.If none ofλi1 andλi2 is associated withz1, we can setz so that:(1) z1 = x/a1 ; (2) zi = 0, fori 6= i1, i 6= i2 andi > 1 ; (3) λi1z

2i1

+ λi2z2i2≤ d− λ1 (x/a1)

2 andai1zi1 + ai2zi2 = 0 . Sinceλi1

andλi2 are negative, appropriate values ofzi1 andzi2 always exist, hencex ∈ Ca′z.

PROOF OF THEOREM 5.5 (a) Consider first the case whereA22 is positive semidefinite withA22 6= 0. To cover this situation, it will be convenient to distinguish between 2 subcases: (a.1)r2 = G− 1; (a.2) 1 ≤ r2 < G− 1 .(a.1) If r2 = G − 1, A22 is positive definite. From (5.15), we can writeQ(δ) = Q(δ1, δ2).δ1 ∈ Cδ1 iff the following condition holds: (1) ifQ(δ1, δ2) has a minimum with respect toδ2,the minimal value is less than or equal to zero, and (2) ifQ(δ1, δ2) does not have a minimumwith respect toδ2, the infimum is less than than zero. To check this, we consider the problem ofminimizing Q(δ1, δ2) with respect toδ2. The first and second order derivatives ofQ with respect

41

Page 48: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

to δ2 are:∂Q

∂δ2= 2A22δ2 + 2A21δ1 + b2 = 0 ,

∂2Q

∂δ2∂δ′2= 2A22 . (A.13)

Here the Hessian2A22 is positive definite, so that there is a unique minimum obtained by setting∂Q/∂δ2 = 0 :

δ2 = −12A−1

22

[2A21δ1 + b2

]= −A−1

22 A21δ1 − 12A−1

22 b2 . (A.14)

On settingδ2 = δ2 in Q(δ1, δ2), we get (after some algebra) the minimal value:

Q(δ1, δ2) = a1δ21 + b1δ1 + c1 (A.15)

wherea1 = a11 − A′21A−122 A21 , b1 = b1 − A′21A

−122 b2 , c1 = c − 1

4 b′2A−122 b2 . In this case, we also

haveA−122 = A+

22, and (5.20) holds withS1 = ∅.(a.2) If 1 ≤ r2 < G− 1, we get, using (5.15), (5.17) - (5.19):

Q(δ) = a11δ21 + b1δ1 + c + δ

′2D2δ2 +

[2A21δ1 + b2

]′δ2

= a11δ21 + b1δ1 + c + δ

′2∗D2∗δ2∗ +

[2A21∗δ1 + b2∗

]′δ2∗ +

[P ′

22(2A21δ1 + b2)]′

δ22

whereδ2∗ = P ′21δ2 , δ22 = P ′

22δ2 , andD2∗ is a positive definite matrix. We will now distinguishbetween two further cases: (i)P ′

22(2A21δ1 + b2) = 0 , and (ii)P ′22(2A21δ1 + b2) 6= 0 .

(i) If P ′22(2A21δ1 + b2) = 0, Q(δ) takes the form:

Q(δ) = a11δ21 + b1δ1 + c + δ

′2∗D2∗δ2∗ + [2A21∗δ1 + b2∗]′δ2∗ . (A.16)

By an argument similar to the one used for (a.1), we can see that

δ1 ∈ Cδ1 iff a1δ21 + b1δ1 + c1 ≤ 0 (A.17)

wherea1 = a11 − A′21∗D−12∗ A21∗ , b1 = b1 − A′21∗D

−12∗ b2∗ , c1 = c − 1

4 b′2∗D−12∗ b2∗. Further, since

A22 = P2D2P′2, the Moore-Penrose inverse ofA22 is [see Harville (1997, Chapter 20)]:

A+22 = P2

[D−1

2∗ 00 0

]P ′

2 = [P21, P22][

D−12∗ 00 0

][P21, P22]′ = P21D

−12∗ P ′

21 , (A.18)

hence

A′21∗D−12∗ A21∗ = A′21P21D

−12∗ P ′

21A21 = A′21A+22A21 , (A.19)

A′21∗D−12∗ b2∗ = A′21P21D

−12∗ P ′

21b2 = A′21A+22b2 , (A.20)

b′2∗D−12∗ b2∗ = b′2P21D

−12∗ P ′

21b2 = b′2A+22b2 . (A.21)

(ii) If P ′22(2A21δ1 + b2) 6= 0, then for any value ofδ1 we can chooseδ22 so thatQ(δ1, δ2) < 0,

42

Page 49: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

which entails thatδ1 ∈ Cδ1 . Putting together the conclusions drawn in (i) and (ii) above, we see that

Cδ1 = {δ1 : P ′22(2A21δ1 + b2) = 0 anda1δ

21 + b1δ1 + c1 ≤ 0} ∪ {δ1 : P ′

22(2A21δ1 + b2) 6= 0}= {δ1 : a1δ

21 + b1δ1 + c1 ≤ 0} ∪ {δ1 : P ′

22(2A21δ1 + b2) 6= 0} (A.22)

and (5.20) holds withS1 = {δ1 : P ′22(2A21δ1 + b2) 6= 0} . This completes the proof of part (a) of

the theorem.(b) If G = 1 or A22 = 0 , we can write:

Q(δ1, δ2) = a11δ21 + b1δ1 + c + [2A21δ1 + b2]′δ2 (A.23)

where we setA21 = b2 = 0 whenG = 1. If 2A21δ1 + b2 = 0, we see immediately that:δ1 ∈ Cδ1

iff a11δ21 + b1δ1 +c ≤ 0. Of course, this obtains automatically whenG = 1. If 2A21δ1 + b2 6= 0, we

can chooseδ2 so thatQ(δ1, δ2) < 0, irrespective of the value ofδ1. Part (b) of the theorem followson putting together these two observations.(c) If A22 is not positive semidefinite andA22 6= 0, this means we can find a vectorδ20 such thatδ′20A22δ20 ≡ q0 < 0 . Now, for any scalar∆0, we have:

Q(δ1, ∆0δ20) = a11δ21 + b1δ1 + c + ∆2

0 q0 + ∆0[2A21δ1 + b2]′δ20 . (A.24)

Sinceq0 < 0 , we can choose∆0 sufficiently large to haveQ(δ1, ∆0δ20) < 0, irrespective of thevalue ofδ1. This entails that all values ofδ1 belong toCδ1 , henceCδ1 = R .

43

Page 50: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

References

ABDELKHALEK , T., AND J.-M. DUFOUR (1998): “Statistical Inference for Computable General Equilibrium Models,with Application to a Model of the Moroccan Economy,”Review of Economics and Statistics, LXXX, 520–534.

ANDERSON, T. W., AND H. RUBIN (1949): “Estimation of the Parameters of a Single Equation in a Complete Systemof Stochastic Equations,”Annals of Mathematical Statistics, 20, 46–63.

ANGRIST, J. D., G. W. IMBENS, AND A. B. KRUEGER(1999): “Jackknife Instrumental Variables Estimation,”Journalof Applied Econometrics, 14, 57–67.

ANGRIST, J. D.,AND A. B. KRUEGER(1991): “Does Compulsory School Attendance Affect Schooling and Earning?,”Quarterly Journal of Economics, CVI, 979–1014.

ANGRIST, J. D.,AND A. B. KRUEGER(1995): “Split-Sample Instrumental Variables Estimates of the Return to School-ing,” Journal of Business and Economic Statistics, 13, 225–235.

BASU, S.,AND J. G. FERNALD (1995): “Are Apparent Productive Spillovers a Figment of Specification Error?,”Journalof Monetary Economics, 36, 165–188.

(1997): “Returns to Scale in U.S. Production : Estimates and Implications,”Journal of Political Economy,105(2), 249–283.

BOUND, J., D. A. JAEGER, AND R. BAKER (1993): “The Cure can be Worse than the Disease: A Cautionary Tale Re-garding Instrumental Variables,” Technical Working Paper 137, National Bureau of Economic Research, Cambridge,MA.

BOUND, J., D. A. JAEGER, AND R. M. BAKER (1995): “Problems With Instrumental Variables Estimation When theCorrelation Between the Instruments and the Endogenous Explanatory Variable Is Weak,”Journal of the AmericanStatistical Association, 90, 443–450.

BURNSIDE, C. (1996): “Production Function Regressions, Returns to Scale, and Externalities,”Journal of MonetaryEconomics, 37, 177–201.

BUSE, A. (1992): “The Bias of Instrumental Variables Estimators,”Econometrica, 60, 173–180.

CABALLERO , R. J.,AND R. K. LYONS (1989): “The Role of External Economies in US Manufacturing,” DiscussionPaper 3033, National Bureau of Economic Research, Cambridge, Massachusetts.

(1992): “External Effects in U.S. Procyclical Productivity,”Journal of Monetary Economics, 29, 209–225.

CHAO, J., AND N. R. SWANSON (2000): “Bias and MSE of the IV Estimators under Weak Identification,” Discussionpaper, Department of Economics, University of Maryland.

CHOI, I., AND P. C. B. PHILLIPS (1992): “Asymptotic and Finite Sample Distribution Theory for IV Estimators andTests in Partially Identified Structural Equations,”Journal of Econometrics, 51, 113–150.

CRAGG, J. G.,AND S. G. DONALD (1993): “Testing Identifiability and Specification in Instrumental Variable Models,”Econometric Theory, 9, 222–240.

DAVIDSON, R., AND J. G. MACK INNON (1993):Estimation and Inference in Econometrics. Oxford University Press,New York.

DONALD , S. G.,AND W. K. NEWEY (2001): “Choosing the Number of Instruments,”Econometrica, 69, 1161–1191.

DUFOUR, J.-M. (1989): “Nonlinear Hypotheses, Inequality Restrictions, and Non-Nested Hypotheses: Exact Simulta-neous Tests in Linear Regressions,”Econometrica, 57, 335–355.

(1990): “Exact Tests and Confidence Sets in Linear Regressions with Autocorrelated Errors,”Econometrica, 58,475–494.

(1997): “Some Impossibility Theorems in Econometrics, with Applications to Structural and Dynamic Models,”Econometrica, 65, 1365–1389.

DUFOUR, J.-M., AND J. JASIAK (1993): “Finite Sample Inference Methods for Simultaneous Equations and Modelswith Unobserved and Generated Regressors,” Discussion paper, C.R.D.E., Université de Montréal, 38 pages.

44

Page 51: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

DUFOUR, J.-M., AND J. JASIAK (2001): “Finite Sample Limited Information Inference Methods for Structural Equa-tions and Models with Generated Regressors,”International Economic Review, 42, 815–843.

DUFOUR, J.-M., AND J. F. KIVIET (1998): “Exact Inference Methods for First-Order Autoregressive Distributed LagModels,”Econometrica, 66, 79–104.

DUFOUR, J.-M., AND M. TAAMOUTI (2001a): “On Methods for Selecting Instruments,” Discussion paper, C.R.D.E.,Université de Montréal.

(2001b): “Point-Optimal Instruments and Generalized Anderson-Rubin Procedures for Nonlinear Models,” Dis-cussion paper, C.R.D.E., Université de Montréal.

FRANKEL , J. A., AND D. ROMER (1996): “Trade and Growth: An Empirical Investigation,” Discussion Paper 5476,National Bureau of Economic Research, Cambridge, Massachusetts.

(1999): “Does Trade Cause Growth?,”The American Economic Review, 89(3).

HAHN , J., AND J. HAUSMAN (2002a): “A New Specification Test for the Validity of Instrumental Variables,”Econo-metrica, 70, 163–189.

(2002b): “Notes on Bias in Estimators for Simultaneous Equation Models,”Economics Letters, 75, 237–241.

HALL , A. R., AND F. P. M. PEIXE (2000): “A Consistent Method for the Selection of Relevant Instruments,” Discussionpaper, Department of Economics, North Carolina State University, Raleigh, North Carolina.

HALL , A. R., G. D. RUDEBUSCH, AND D. W. WILCOX (1996): “Judging Instrument Relevance in Instrumental Vari-ables Estimation,”International Economic Review, 37, 283–298.

HALL , R. E. (1990): “Invariance Properties of Solow’s Productivity Residual,” inGrowth, Productivity, Employment,ed. by P. Diamond, pp. 71–112. The MIT Press, Cambridge, Massachusetts.

HARRISON, A. (1996): “Openness and Growth: A Time-Series, Cross-Country Analysis for Developing Countries,”Journal of Developement Economics, 48, 419–447.

HARVILLE , D. A. (1997):Matrix Algebra from a Statistician’s Perspective. Springer-Verlag, New York.

IRWIN, A. D., AND M. TERVIO (2002): “Does Trade Raise Income? Evidence from the Twentieth Century,”Journal ofInternational Economics, 58, 1–18.

JORGENSON, D. W., F. M. GOLLOP, AND B. M. FRAUMENI (1987):Productivity and U.S. Economic Growth. HarvardUniversity Press, Cambridge, Massachusetts.

KLEIBERGEN, F. (2001): “Testing Subsets of Structural Coefficients in the IV Regression Model,” Discussion paper,Department of Quantitative Economics, University of Amsterdam.

(2002): “Pivotal Statistics for Testing Structural Parameters in Instrumental Variables Regression,”Economet-rica, 70(5), 1781–1803.

LEHMANN , E. L. (1986):Testing Statistical Hypotheses, 2nd edition. John Wiley & Sons, New York.

MADDALA , G. S.(1974): “Some Small Sample Evidence on Tests of Significance in Simultaneous Equations Models,”Econometrica, 42, 841–851.

MADDALA , G. S., AND J. JEONG (1992): “On the Exact Small Sample Distribution of the Instrumental VariableEstimator,”Econometrica, 60, 181–183.

MANKIW , G. N., D. ROMER, AND D. N. WEIL (1992): “A Contribution to the Empirics of Economic Growth,”Quarterly Journal of Economics, 107(2), 407–37.

M ILLER , JR., R. G.(1981):Simultaneous Statistical Inference. Springer-Verlag, New York, second edn.

MOREIRA, M. J. (2001): “Tests With Correct Size When Instruments Can Be Arbitrarily Weak,” Discussion paper,Department of Economics, Harvard University, Cambridge, Massachusetts.

(2002): “A Conditional Likelihood Ratio Test for Structural Models,” Discussion paper, Department of Eco-nomics, Harvard University, Cambridge, Massachusetts.

NELSON, C. R.,AND R. STARTZ (1990a): “The Distribution of the Instrumental Variable Estimator and itst-ratio Whenthe Instrument is a Poor One,”Journal of Business, 63, 125–140.

45

Page 52: 2003s-39 Projection-Based Statistical Inference in Linear ... · consists here in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows

(1990b): “Some Further Results on the Exact Small Properties of the Instrumental Variable Estimator,”Econo-metrica, 58, 967–976.

PERRON, B. (1999): “Semi-Parametric Weak Instrument Regressions with an Application to the Risk Return Trade-Off,”Discussion Paper 0199, C.R.D.E., Université de Montréal.

PETTOFREZZO, A. J.,AND M. L. M ARCOANTONIO (1970):Analytic Geometry with Vectors. Scott, Fosman and Com-pany, Glenview, Illinois.

RODRIK, D. (1995): “Trade and Industrial Policy Reform,” inHandbook of Development Economics, ed. by J. Behrman,andT. N. Srinivasan, vol. 3A. Elsevier Science, Amsterdam.

SAVIN , N. E. (1984): “Multiple Hypothesis Testing,” inHandbook of Econometrics, Volume 2, ed. by Z. Griliches,andM. D. Intrilligator, chap. 14, pp. 827–879. North-Holland, Amsterdam.

SCHEFFÉ, H. (1959):The Analysis of Variance. John Wiley & Sons, New York.

SHEA, J. (1997): “Instrument Relevance in Multivariate Linear Models: A Simple Measure,”Review of Economics andStatistics, LXXIX, 348–352.

SHILOV, G. E.(1961):An Introduction to the Theory of Linear Spaces. Prentice Hall, Englewood Cliffs, New Jersey.

STAIGER, D., AND J. H. STOCK (1997): “Instrumental Variables Regression with Weak Instruments,”Econometrica,65, 557–586.

STARTZ, R., C. R. NELSON, AND E. ZIVOT (1999): “Improved Inference for the Instrumental Variable Estimator,”Discussion paper, Department of Economics, University of Washington.

STOCK, J. H.,AND J. H. WRIGHT (2000): “GMM with Weak Identification,”Econometrica, 68, 1097–1126.

STOCK, J. H., J. H. WRIGHT, AND M. YOGO (2002): “A Survey of Weak Instruments and Weak Identification inGeneralized Method of Moments,”Journal of Business and Economic Statistics, 20(4), 518–529.

STOCK, J. H.,AND M. YOGO (2002): “Testing for Weak Instruments in Linear IV Regression,” Discussion Paper 284,N.B.E.R., Harvard University, Cambridge, Massachusetts.

WANG, J.,AND E. ZIVOT (1998): “Inference on Structural Parameters in Instrumental Variables Regression with WeakInstruments,”Econometrica, 66, 1389–1404.

ZIVOT, E., R. STARTZ, AND C. R. NELSON (1998): “Valid Confidence Intervals and Inference in the Presence of WeakInstruments,”International Economic Review, 39, 1119–1144.

46