supplementary materials for weak instruments in iv ...regression: theory and practice isaiah...

14
Supplementary Materials for Weak Instruments in IV Regression: Theory and Practice Isaiah Andrews, James Stock, and Liyang Sun August 2, 2018 This supplement accompanies the review article “Weak Instruments in IV Regression: Theory and Practice” by Isaiah Andrews, James Stock, and Liyang Sun. Section A describes the collec- tion of American Economic Review (AER) articles and specifications for tabulations reported in the review article. Section B discusses the calibration of simulations to these results, with an empha- sis on estimation of variance-covariance matrix for the reduced-form and first-stage estimates from our collected linear instrumental variables (IV) specifications. Section (C) provides details on size simulations in calibrations to AER specifications. Section (D) presents the results underlying the discussion of AR confidence sets in Section 5.1 of the review article. Section E overviews available Stata implementations of the weak-IV robust procedures discussed in the main text. A Publication Selection Criterion To examine the practical relevance of weak instrument issues, we select recent publications in the American Economic Review (AER). We first find articles published in the AER between January 2014 to June 2018 with the keyword “instrument” in their abstract, excluding those which did not relate to instrumental variables. We exclude articles published in the May issue. There are 22 articles that meet such criteria. We then exclude five articles that do not estimate linear instrumental variables (IV) model. Table 1 lists the resulting 17 articles. From these 17 articles, we collect all IV specifications reported in their main text (12 IV specifications are excluded because only first-stage and reduced-form estimates are reported). This yields a total of 230 specifications, which is the “AER sample” in the main text. For each specification, we record the number of endogenous regressors p and the number of instruments k. In Table 2, we report summary statistics on p and k for these specifications. In one article only, all specifications have multiple endogenous regressors (p> 1). While two other articles contain a few specifications with multiple endogenous regressors, 211 of the 230 specifications collected have a single endogenous regressor (p =1), which are the focus of our discussion. Among 1

Upload: others

Post on 04-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Supplementary Materials for Weak Instruments in IV ...Regression: Theory and Practice Isaiah Andrews, James Stock, and Liyang Sun August 2, 2018 ... We do not use the “main specification”

Supplementary Materials for Weak Instruments in IV

Regression: Theory and Practice

Isaiah Andrews, James Stock, and Liyang Sun

August 2, 2018

This supplement accompanies the review article “Weak Instruments in IV Regression: Theoryand Practice” by Isaiah Andrews, James Stock, and Liyang Sun. Section A describes the collec-tion of American Economic Review (AER) articles and specifications for tabulations reported in thereview article. Section B discusses the calibration of simulations to these results, with an empha-sis on estimation of variance-covariance matrix for the reduced-form and first-stage estimates fromour collected linear instrumental variables (IV) specifications. Section (C) provides details on sizesimulations in calibrations to AER specifications. Section (D) presents the results underlying thediscussion of AR confidence sets in Section 5.1 of the review article. Section E overviews availableStata implementations of the weak-IV robust procedures discussed in the main text.

A Publication Selection Criterion

To examine the practical relevance of weak instrument issues, we select recent publications in theAmerican Economic Review (AER). We first find articles published in the AER between January 2014to June 2018 with the keyword “instrument” in their abstract, excluding those which did not relateto instrumental variables. We exclude articles published in the May issue. There are 22 articlesthat meet such criteria. We then exclude five articles that do not estimate linear instrumentalvariables (IV) model. Table 1 lists the resulting 17 articles. From these 17 articles, we collect all IVspecifications reported in their main text (12 IV specifications are excluded because only first-stageand reduced-form estimates are reported). This yields a total of 230 specifications, which is the “AERsample” in the main text.

For each specification, we record the number of endogenous regressors p and the number ofinstruments k. In Table 2, we report summary statistics on p and k for these specifications. Inone article only, all specifications have multiple endogenous regressors (p > 1). While two otherarticles contain a few specifications with multiple endogenous regressors, 211 of the 230 specificationscollected have a single endogenous regressor (p = 1), which are the focus of our discussion. Among

1

Page 2: Supplementary Materials for Weak Instruments in IV ...Regression: Theory and Practice Isaiah Andrews, James Stock, and Liyang Sun August 2, 2018 ... We do not use the “main specification”

specifications with a single endogenous regressor, there are 101 just-identified specifications (p = k =

1), found in 12 articles.For simulations, we rely on the subset of 8 articles and 124 specifications for which we could obtain

a full and non-singular variance-covariance matrix for the reduced-form and first-stage estimates,either from the published results or from posted replication data and code. All specifications used forsimulation happen to have only a single endogenous regressor (p = 1). For details on our calibrationof simulations to these data, see Section B.

For our tabulations on reported first-stage F-statistics, we rely on the subset of 14 articles and108 specifications that report F-statistics and have a single endogenous regressor. Among thesespecifications, 56 specifications are just-identified, found in 10 articles.

We recognize that not all specifications are important. To distinguish specifications that are mostimportant in each article, we code specifications as “main specifications” for each paper based on thefollowing criteria:

1. The specification is in the first table of IV estimates.

2. If in this table the outcome and endogenous variable(s) are the same across specifications i.e.the only difference is sample restriction / control variables, then we code the last specificationas “main specification”. If in this table outcome and endogenous variable(s) differ across speci-fications, restrict specifications to those with outcome and endogenous variables referenced inthe abstract or introduction of the article, then code the last specification in this restrictedsubset as “main specification”.

3. If 1) and 2) do not give a unique specification, then we code all specifications at the end of 2)as “main specifications.”

We do not use the “main specification” variable in our analysis in the main text, but report thisvariable in our replication files in case it is of interest to other researchers.

B Details on Simulation Calibration

Consider the IV model

Yi

= Xi

� +W 0i

+ "i

, (1)

Xi

= Z 0i

⇡ +W 0i

� + Vi

, (2)

for a scalar outcome Yi

, a scalar endogenous regressors Xi

, a k ⇥ 1 vector of instrumental variablesZi

, and an r ⇥ 1 vector of exogenous regressors Wi

. A linear transformation of (1) and (2) is

Yi

= Z 0i

� +W 0i

⌧ + Ui

, (3)

Xi

= Z 0i

⇡ +W 0i

� + Vi

,

with � = ⇡�.

2

Page 3: Supplementary Materials for Weak Instruments in IV ...Regression: Theory and Practice Isaiah Andrews, James Stock, and Liyang Sun August 2, 2018 ... We do not use the “main specification”

Define (�̂, ⇡̂) as the coefficients on Zi

from the reduced-form and first-stage regressions of Yi

andX

i

, respectively, on (Zi

,Wi

). By the Frisch-Waugh theorem these are the same as the coefficients fromregressing Y

i

and Xi

on Z?i

, the part of Zi

orthothogonal to Wi

. Under mild regularity conditions(and, in the time-series case, stationarity), (�̂, ⇡̂) are consistent and asymptotically normal in thesense that

pn

�̂ � �

⇡̂ � ⇡

!!

d

N (0,⌃⇤) (4)

for

⌃⇤ =

⌃⇤

��

⌃⇤�⇡

⌃⇤⇡�

⌃⇤⇡⇡

!=

Q

Z

?Z

? 0

0 QZ

?Z

?

!�1

⇤⇤

Q

Z

?Z

? 0

0 QZ

?Z

?

!�1

where QZ

?Z

? = E[Z?i

Z?0

i

] and

⇤⇤ = limn!1

V ar

0

@

1pn

X

i

Ui

Z?0

i

,1pn

X

i

Vi

Z?0

i

!01

A .

Under standard assumptions, the sample-analog estimator Q̂Z

?Z

? = 1n

PZ?i

Z?0

i

will be consistentfor Q

Z

?Z

? , and we can construct consistent estimators ⇤̂⇤ for ⇤⇤ depending on the assumptionsimposed on the data generating process (for example whether we allow heteroskedasticity, clustering,or time-series dependence). One can then form consistent estimators for the asymptotic variancematrix ⌃⇤.

These results imply that for the two stage least square estimator,

�̂2SLS

=⇣⇡̂0Q̂

Z

?Z

? ⇡̂⌘�1

⇡̂0Q̂Z

?Z

? �̂,

assuming that IV model is correctly specified and the instruments are strong, so � = ⇡� and ⇡ takesa fixed non-zero value,

pn(�̂2SLS

� �) !d

N(0,⌃⇤�,2SLS

)

for⌃⇤

�,2SLS

=⇡0Q

Z

?Z

?(⌃⇤��

� �(⌃⇤�⇡

+ ⌃⇤⇡�

) + �2⌃⇤⇡⇡

)QZ

?Z

?⇡

(⇡0QZ

?Z

?⇡)2, (5)

which simplifies to 1⇡

2⌃⇤��

� 2 �

3⌃⇤�⇡

+ �

2

4⌃⇤⇡⇡

for p = k = 1. If we plug in consistent estimators forall the terms in the expression we obtain a consistent estimator ⌃̂⇤

�,2SLS

for ⌃⇤�,2SLS

.For reasons discussed in the main text, motivated by the asymptotic approximation (4), we

consider the case where the reduced-form and first-stage regression coefficients are jointly normal

�̂

⇡̂

!⇠ N

!,⌃

!(6)

for

⌃ =

��

⌃�⇡

⌃⇡�

⌃⇡⇡

!=

1

n

⌃⇤

��

⌃⇤�⇡

⌃⇤⇡�

⌃⇤⇡⇡

!.

3

Page 4: Supplementary Materials for Weak Instruments in IV ...Regression: Theory and Practice Isaiah Andrews, James Stock, and Liyang Sun August 2, 2018 ... We do not use the “main specification”

The estimated variance matrix for (�̂, ⇡̂) is thus ⌃̂ = 1n

⌃̂⇤. For our simulation exercise, we calibratethe normal model (6) to IV specifications in the AER sample, with ⇡ set to the estimate ⇡̂ in the data,� set to ⇡̂�̂2SLS

, and ⌃ set to the estimated variance matrix for (�̂, ⇡̂) under the same assumptionsused by the original authors. Most articles report estimates (�̂, ⇡̂) and their standard error, whosesquared terms are variance estimates of �̂ and ⇡̂. However, none of the articles reports the covarianceestimate between �̂ and ⇡̂. So we replicate the covariance estimate for as many specifications aspossible, as well as any parameter estimate that is not reported in the article.

B.1 Calibration by Direct Calculation

With replication data and code, we can directly estimate ⌃ by jointly estimating (�̂, ⇡̂) in a seeminglyunrelated regression under the same assumptions used by the original authors. Appropriate estimates⌃̂ are then generated automatically by standard statistical software e.g. suest or avar in Stata.1

We are able to obtain (�̂, ⇡̂) and its estimated variance matrix this way for 116 specifications from7 articles. We exclude one specification from simulation because ⌃̂ is singular. For overidentifiedspecifications, we also obtain Q̂

Z

?Z

? .

B.2 Calibration Based on 2SLS Standard Error

For specifications with p = k = 1, the 2SLS squared standard error is estimated as

⌃̂�,2SLS

=1

⇡̂2⌃̂

��

� 2�̂

⇡̂3⌃̂

�⇡

+�̂2

⇡̂4⌃̂

⇡⇡

.

Thus, we can solve for ⌃̂�⇡

using ⌃̂�,2SLS

, �̂, ⇡̂, ⌃̂��

and ⌃̂⇡⇡

. We are able to obtain (�̂, ⇡̂) and itsestimated variance matrix for 12 specifications found in Lundborg et al. (2017). We exclude threespecifications from simulation because ⌃̂ is singular.

B.3 First-stage F Statistics

To facilitate the discussion on usage of first-stage F-statistics in detecting weak instruments, for eachspecification we record the following features of each specification:

1. whether first-stage F-statistic is reported;

2. what type of the reported first-stage F-statistics is computed; We can infer this based onlabels used by authors in text, or by replicating the reported F-statistics to infer authors’choice of F-statistics. If no explicit discussion on F-statistic is found in text or replicationdata is not available, we calculate F = 1

k

⇡̂0⌃̂�1⇡⇡

⇡̂ based on reported first-stage estimates ⇡̂ and⌃̂

⇡⇡

. For an estimate ⌃̂⇡⇡

that does not assume homoskedasticity, this would match with non-homoskedasticity-robust F-statistic FR. For an estimate ⌃̂

⇡⇡

that assumes homoskedasticity,1Some articles report first-stage and reduced-form variances based on Stata’s regress routine, which applies a

degree-of-freedom adjustment to the variance estimate of �̂ and ⇡̂ by default. In contrast, Stata’s linear IV regression

routines do not apply such adjustment by default. For these articles, we replicate the variance matrix estimate for

(�̂, ⇡̂) with the degree-of-freedom adjustment for consistency.

4

Page 5: Supplementary Materials for Weak Instruments in IV ...Regression: Theory and Practice Isaiah Andrews, James Stock, and Liyang Sun August 2, 2018 ... We do not use the “main specification”

this would match with the traditional, non-robust F-statistic FN . We categorize specificationsthat do not match with our calculation as “unknown”. For specifications specifications thatreport robust ⌃̂

⇡⇡

, but computed FR does not match with reported F-statistic, it could beeither that reported F-statistic is FN or authors use a different degree-of-freedom adjustmentin calculating FR.

3. what label is used by authors in text for their reported first-stage F-statistics; If there isno mention of the type of first-stage F-statistics, we categorize these specifications as “nodiscussion”

4. whether any weak-instruments robust methods are used.

In Table 3, we summarize the distribution of first-stage F statistics.As an alternative to the robust and non-robust F-statistics in non-homoskedastic settings, Olea

and Pflueger (2013) proposed the effective F statistic. The effective F statistic can be written

FEff =⇡̂0Q̂

Z

?Z

? ⇡̂

tr⇣⌃̂

⇡⇡

Q̂Z

?Z

?

⌘ (7)

The effective F-statistic is equivalent to non-robust F-statistic in homoskedastic settings. The effec-tive F-statistic is equivalent to robust F-statistic in non-homoskedastic settings only when k = 1.

Under the normal model (6) with known variance, the average value of the effective F-statistic is⇡

0QZ?Z?⇡

tr

⇣⌃⇡⇡QZ?Z?

⌘ + 1.

C Details on Size Simulations

In this section we describe how we calculate sizes of various tests using simulations based on ourAER sample. Based on the normal model (6) with known variance, under a given null hypothesisH0 : � = �0, we have g(�0) = �̂ � ⇡̂�0 ⇠ N(0,⌦(�0)) for

⌦ (�0) = ⌃��

� �0⌃�⇡

� �0⌃⇡�

+ �20⌃⇡⇡

.

Denote by �̂2SLS

the 2SLS estimate and ⌃⇤�,2SLS

its asymptotic variance estimate based on Expres-sion (5). Define the t-statistic as t(�0) =

pn �̂2SLS��0

⌃̂⇤1/2�,2SLS

. The size-↵ t-test of H0 : � = �0 is

�t

(�0) = 1{t(�0) > c1�↵

} (8)

where c1�↵

is the 1�↵ quantile of standard normal distribution. Besides the t-test, we also considert-test after screening on the first-stage effective F-statistic. In this case, we only evaluate the t-testwhen FEff � 10,

�screen

(�0) =

8<

:�t

(�0), FEff > 10

undefined FEff < 10. (9)

5

Page 6: Supplementary Materials for Weak Instruments in IV ...Regression: Theory and Practice Isaiah Andrews, James Stock, and Liyang Sun August 2, 2018 ... We do not use the “main specification”

Define the AR statistic as AR(�0) = g(�0)0⌦(�0)�1g(�0). The size-↵ AR test of H0 : � = �0 is

�AR

(�0) = 1{AR(�0) > �2k,1�↵

} (10)

where �2k,1�↵

is the 1� ↵ quantile of �2k

distribution. Lastly, we consider the two-step test

�twostep

(�0) =

8<

:�t

(�0), FEff > 10

�AR

(�0), FEff < 10. (11)

In other results (not reported, but available in replication files) we also considered cutoffs based onthe Olea and Pflueger (2013) critical values.

We are also interested in testing correct specification H0 : � � ⇡�0 = 0. Define the J-statistic asJ = g(�̂2SGMM

)0⌦(�̂2SGMM

)�1g(�̂2SGMM

) where �̂2SGMM

=⇣⇡̂0⌦(�̂2SLS

)�1⇡̂⌘�1

⇡̂0⌦(�̂2SLS

)�1�̂

is the efficient two-step GMM estimator, with first-step estimator being the 2SLS estimator �̂2SLS

.The size-↵ over-identification test is

�overid

= 1{J > �2k�1,1�↵

} (12)

where �2k�1,1�↵

is the 1� ↵ quantile of �2k�1 distribution.

We calculate the size of tests with respect to H0 : � = �0 for all 124 specifications that we replicatebased on simulations as described in Section C.1 and C.2. We report these results in the main text.For over-identification test, we focus on 90 out of these 124 specifications that are over-identified.We calculate the size based on simulations calibrated to AER sample as described in Section C.1 andreport the results in Figure 1 and 2. These 90 specifications are collected from three articles.

6

Page 7: Supplementary Materials for Weak Instruments in IV ...Regression: Theory and Practice Isaiah Andrews, James Stock, and Liyang Sun August 2, 2018 ... We do not use the “main specification”

0 5 10 15 20 25 30 35 40 45 50

Average Effective F-Statistic

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Siz

e o

f 5%

Ove

ridentif

icatio

n T

est

Figure 1: Rejection probability for nominal 5% over-identification test, plotted against average first-stage effective F-statistic in calibrations to AER sample. Limited to the 81 out of 90 over-identifiedspecifications with average F smaller than 50.

0 5 10 15 20 25 30 35 40 45 50

Average Effective F-Statistic

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Siz

e o

f 5%

Ove

ridentif

icatio

n T

est

, S

creened o

n F

Figure 2: Rejection probability for nominal 5% over-identification test after screening on the first-stage effective F-statistic, plotted against average first-stage effective F-statistic in calibrations toAER sample. Limited to the 81 out of 90 over-identified specifications with average F smaller than50.

7

Page 8: Supplementary Materials for Weak Instruments in IV ...Regression: Theory and Practice Isaiah Andrews, James Stock, and Liyang Sun August 2, 2018 ... We do not use the “main specification”

C.1 Simulations calibrated to AER sample

In simulation runs s = 1, . . . , S for S = 10000, we draw (�̂⇤, ⇡̂⇤) from the normal model (6) calibratedto IV specifications from the AER sample, with ⇡ set to the estimate ⇡̂ in the data, � set to ⇡̂�̂2SLS

,and ⌃ set to the estimated variance matrix for (�̂, ⇡̂) under the same assumptions used by the originalauthors. The null is thus set to �0 = �̂2SLS

. We calculate the t-statistic t⇤(�0) based on the 2SLS

estimate at each draw �̂⇤2SLS

=⇣⇡̂⇤0

Q̂Z

?Z

? ⇡̂⇤⌘�1

⇡̂⇤0Q̂

Z

?Z

? �̂⇤ and its asymptotic variance estimate

⌃̂⇤�,2SLS

. We calculate the AR statistic AR⇤(�0) based on the moment function at each draw g(�0)⇤

and its variance ⌦(�0). We calculate the J-statistic J⇤ based on g(�̂⇤2SGMM

)⇤ and ⌦(�̂⇤2SGMM

). Thesize of each test is calculated as the average of �⇤

j

(�0) across simulation runs. We also record themedian of t⇤(�0) across simulation runs.

C.2 Bayesian exercise

To account for uncertainty in estimating (�,⇡), we adopt a Bayesian approach consistent with thenormal model (6). Specifically, we calculate the posterior distribution on (�,⇡) after observingestimates (�̂, ⇡̂) based on the normal likelihood from (6) with ⌃ set to the estimated variance matrixfor (�̂, ⇡̂) and a flat prior. Then in d = 1, . . . , D for D = 1000, we draw (�̃, ⇡̃) based on the posteriordistribution

N

�̂

⇡̂

!,⌃

!(13)

for (�,⇡) under a flat prior, setting � =⇣⇡

0Q

Z

?Z

?⇡⌘�1

⇡0Q

Z

?Z

?�. To obtain a posterior distribution

on the size of each test, at each posterior draw (�̃, ⇡̃) where �̃ = ⇡̃�̃, we calculate the size of eachtest by simulations as described above where the null is set to �0 = �̃. Note that we set �̃ = ⇡̃�̃ toensure that our simulation designs are consistent with the IV model.

D AR Confidence Sets in Applications

In this section, we provide the underlying results for the discussion of AR test in Section 5.1. Fork = p = 1, the 5% AR confidence set can be calculated analytically by solving the following quadraticinequality

(�̂ � ⇡̂�0)2 �2

1,1�↵

(⌃̂��

� �0⌃̂�⇡

� �0⌃̂⇡�

+ �20⌃̂⇡⇡

)

) (⇡̂2 � �21,1�↵

⌃̂⇡⇡

)�20 + (2�2

1,1�↵

⌃̂�⇡

� 2�̂⇡̂)�0 + �̂2 � �21,1�↵

⌃̂�

0 (14)

where �21,1�↵

is the 1 � ↵ quantile of �21 distribution. If (⇡̂2 � �2

1,1�↵

⌃̂⇡

) < 0, the confidence set isunbounded, which happens when we cannot reject ⇡ = 0 based on a 5% t-test.

In Table 4, we list the AR confidence set for 34 just-identified specifications out of 124 specifica-tions that we replicate.

8

Page 9: Supplementary Materials for Weak Instruments in IV ...Regression: Theory and Practice Isaiah Andrews, James Stock, and Liyang Sun August 2, 2018 ... We do not use the “main specification”

E Weak-IV Robust Procedures in Stata

In replicating IV specifications in our AER sample, we noticed that the ivreg2 suite, describedin Baum et al. (2007), remains a common toolkit for linear IV estimation. ivreg2 implements theStock and Yogo (2005) weak instrument test and reports confidence sets for coefficients on endogenousregressors based on the t-test.

As discussed in the main text, the Stock and Yogo (2005) weak instrument test is only validin the homoskedastic case. The weak instrument test of Olea and Pflueger (2013) is robust toheteroskedasticity, autocorrelation, and clustering. It is thus the preferred test for detecting weakinstruments in the over-identified, non-homoskedastic setting. A recent Stata package weakivtest

by Pflueger and Wang (2015) implements this test. It computes the effective F-statistic and tabulatescritical values based on Olea and Pflueger (2013).

The weak instrument test of Olea and Pflueger (2013) concerns only the Nagar (1959) biasapproximation, not size distortions in conventional inference procedures (t-tests), though as discussedin the text, in the k = 1 case one can use the Olea and Pflueger (2013) effective F-statistic, or theKleibergen-Paap statistic reported by ivreg2, along with the Stock and Yogo (2005) critical valuesto test for size distortions. Our review paper discusses several tests robust to weak instruments andexplains how to construct a level 1�↵ confidence set based on test inversion. In just-identified models,the AR test is efficient and thus recommended. In over-identified models with a single endogenousregressor and homoskedastic errors, the CLR test has good properties. Except for the AR test andthe Kleibergen score test, these robust tests require simulations to calculate their critical values inmany cases, which can be computationally costly. Below we describe several recent Stata packagesthat augment ivreg2 in terms of inference with weak instruments.2

For the p = 1 and homoskedastic case, the Stata package condivreg by Mikusheva and Poi (2006)computes the AR, Kleibergen score, and CLR confidence sets. This routine implements algorithmsproposed by Mikusheva (2010) that allow one to construct confidence sets by quickly and accuratelyinverting these tests without having to use grid search.

For the non-homoskedastic and p � 1 case, the Stata package weakiv computes the AR, Kleiber-gen score, and CLR confidence sets based on grid search. Finlay and Magnusson (2009) describe aprevious version of this package called rivtest. Together with simulations for critical values of theCLR test, the grid search can be computationally demanding.

As an alternative to a two-step confidence set based on first-stage F-statistic, Andrews (2018)propose a two-step weak-instruments-robust confidence set. The Stata package twostepweakiv bySun (forthcoming) computes such confidence sets based on grid search. While twostepweakiv doesnot need to simulate critical values for most cases, the grid search alone can be computationallydemanding.

For the p > 1 cases, one may be interested in subvector inference. weakiv implements thetraditional projection method and twostepweakiv implements the refined projection method basedon Chaudhuri and Zivot (2011).

To summarize, in homoskedastic settings, ivreg2 conducts valid weak instrument tests and2ivreg2 performs the Anderson-Rubin (AR) test for the null H0 : � = 0, but does not calculate a confidence set.

9

Page 10: Supplementary Materials for Weak Instruments in IV ...Regression: Theory and Practice Isaiah Andrews, James Stock, and Liyang Sun August 2, 2018 ... We do not use the “main specification”

condivreg calculates weak-instrument-robust confidence sets. In non-homoskedastic settings, weakivtestconducts valid weak instrument tests, but does not guarantee valid inference. For k = 1 one canuse the effective F-statistic reported by weakivtest, or the Kleibergen-Paap statistic reported byivreg2, along with the Stock and Yogo (2005) critical values to test for size distortions. To constructrobust confidence sets, weakiv or twostepweakiv should be used.3

References

Andrews, Isaiah, “Valid Two-Step Identification-Robust Confidence Sets for GMM,” Review ofEconomics and Statistics, 2018, 100 (2), 337–348.

Baum, Christopher F, Mark E. Schaffer, and Steven Stillman, “Enhanced routines forinstrumental variables/generalized method of moments estimation and testing,” Stata Journal,December 2007, 7 (4), 465–506.

Chaudhuri, Saraswata and Eric Zivot, “A new method of projection-based inference in GMMwith weakly identified nuisance parameters,” Journal of Econometrics, 2011, 164 (2), 239–251.

Finlay, Keith and Leandro Magnusson, “Implementing weak-instrument robust tests for a gen-eral class of instrumental-variables models,” Stata Journal, 2009, 9 (3), 398–421.

Lundborg, Petter, Erik Plug, and Astrid Würtz Rasmussen, “Can Women Have Childrenand a Career? IV Evidence from IVF Treatments,” American Economic Review, June 2017, 107(6), 1611–1637.

Mikusheva, Anna, “Robust confidence sets in the presence of weak instruments,” Journal of Econo-metrics, 2010, 157 (2), 236 – 247.

and Brian P. Poi, “Tests and confidence sets with correct size when instruments are potentiallyweak,” Stata Journal, September 2006, 6 (3), 335–347.

Nagar, A. L., “The Bias and Moment Matrix of the General k-Class Estimators of the Parametersin Simultaneous Equations,” Econometrica, 1959, 27 (4), 575–595.

Olea, José Luis Montiel and Carolin Pflueger, “A Robust Test for Weak Instruments,” Journalof Business & Economic Statistics, July 2013, 31 (3), 358–369.

Pflueger, Carolin E. and Su Wang, “A robust test for weak instruments in Stata,” Stata Journal,March 2015, 15 (1), 216–225.

Stock, James H. and Motohiro Yogo, “Testing for Weak Instruments in Linear IV Regression,”in Donald W. K. Andrews and James H. Stock, eds., Identification and Inference for EconometricModels: Essays in Honor of Thomas Rothenberg, Cambridge University Press, 2005, pp. 80–108.

Sun, Liyang, “Implementing valid two-step identification-robust confidence sets for linearinstrumental-variables models,” Stata Journal, forthcoming.3Unfortunately, weakivtest and twostepweakiv are only compatible with the syntax of ivreg2. weakiv is also

compatible with the syntax of other State IV model packages including xtivreg2, which accommodates fixed effects.

10

Page 11: Supplementary Materials for Weak Instruments in IV ...Regression: Theory and Practice Isaiah Andrews, James Stock, and Liyang Sun August 2, 2018 ... We do not use the “main specification”

Table 2: Summary statistics

(a) unweighted

p # specifications % just-identified % over-identified Avg. k | over-identified1 211 48% 52% 21.952 5 40% 60% 48.673 2 0% 100% 584 12 0% 100% 8

(b) weighted by the inverse of the number of specifications in each article

p # articles % just-identified % over-identified Avg. k | over-identified1 15.63 74.23% 25.78% 23.012 0.19 26.19% 73.80% 72.683 0.18 0% 100% 584 1 0% 100% 8

Notes: In panel (a), we tabulate the distribution of p and k by specifications. The sample consists of230 specifications. In panel (b), we tabulate the distribution of p and k by specifications, weightedby the inverse of the number of specifications in each article so that each article receives the sameweight.

11

Page 12: Supplementary Materials for Weak Instruments in IV ...Regression: Theory and Practice Isaiah Andrews, James Stock, and Liyang Sun August 2, 2018 ... We do not use the “main specification”

Tabl

e1:

Sele

cted

publ

icat

ions

Art

icle

IDIs

sue

Tit

le#

ofsp

ecifi

cati

ons

Rep

licat

ion

files

avai

labl

e1

104

(1)

Imm

igra

tion

and

the

Diff

usio

nof

Tech

nolo

gy:

The

Hug

ueno

tD

iasp

ora

inP

russ

ia11

1

210

4(1

0)G

erm

anJe

wis

hE

mig

res

and

US

Inve

ntio

n10

13

104

(11)

Stru

ctur

alTr

ansf

orm

atio

n,th

eM

ism

easu

rem

ent

ofP

rodu

ctiv

ityG

row

th,a

ndth

eC

ost

Dis

ease

ofSe

rvic

es40

1

410

4(6

)C

ompu

lsor

yE

duca

tion

and

the

Ben

efits

ofSc

hool

ing

301

510

4(6

)T

heW

age

Effe

cts

ofO

ffsho

ring

:E

vide

nce

from

Dan

ish

Mat

ched

Wor

ker-

Firm

Dat

a12

0

610

4(7

)M

afia

and

Pub

licSp

endi

ng:

Evi

denc

eon

the

Fisc

alM

ulti

plie

rfr

oma

Qua

si-e

xper

imen

t20

1

710

5(1

2)M

edia

Influ

ence

son

Soci

alO

utco

mes

:T

heIm

pact

ofM

TV

’s16

and

Pre

gnan

ton

Teen

Chi

ldbe

arin

g8

0

810

5(2

)T

heIm

pact

ofth

eG

reat

Mig

rati

onon

Mor

talit

yof

Afr

ican

Am

eric

ans:

Evi

denc

efr

omth

eD

eep

Sout

h6

0

910

5(3

)C

redi

tSu

pply

and

the

Pri

ceof

Hou

sing

31

1010

5(4

)M

edic

are

Part

D:A

reIn

sure

rsG

amin

gth

eLo

wIn

com

eSu

bsid

yD

esig

n6

1

1110

6(3

)H

owD

oE

lect

rici

tySh

orta

ges

Affe

ctIn

dust

ry?

Evi

denc

efr

omIn

dia

80

1210

6(6

)C

hart

ers

wit

hout

Lott

erie

s:Te

stin

gTa

keov

ers

inN

ewO

rlea

nsan

dB

osto

n40

0

1310

7(2

)T

heIm

pact

ofFa

mily

Inco

me

onC

hild

Ach

ieve

men

t:E

vide

nce

from

the

Ear

ned

Inco

me

Tax

Cre

dit:

Com

men

t4

0

1410

7(6

)C

anW

omen

Hav

eC

hild

ren

and

aC

aree

r?IV

Evi

denc

efr

omIV

FTr

eatm

ents

120

1510

7(9

)V

irtu

alcl

assr

oom

s:ho

won

line

colle

geco

urse

saff

ect

stud

ent

succ

ess

40

1610

8(2

)E

xpor

tD

esti

nati

ons

and

Inpu

tP

rice

s11

017

108

(3)

Dis

enta

nglin

gth

eeff

ects

ofa

bank

ing

cris

is:

evid

ence

from

Ger

man

firm

san

dco

unties

50

Not

es:

we

reco

rdw

heth

erre

plic

atio

nsfil

esar

eav

aila

ble

tore

plic

ate

alls

peci

ficat

ions

cont

aine

din

anar

ticl

e.Fo

raco

mpl

ete

listo

fspe

cific

atio

ns,

plea

sese

ere

plic

atio

nfil

es.

12

Page 13: Supplementary Materials for Weak Instruments in IV ...Regression: Theory and Practice Isaiah Andrews, James Stock, and Liyang Sun August 2, 2018 ... We do not use the “main specification”

Table 3: Summary statistics on F statistics for specifications with p = 1

(a) unweighted; level of observation is specification

k # spec % report F Avg. F % F > 10 % F > SY cutoffs SY cutoffs % use Avg. Frobust tests

1 101 55.44% 3483.23 89.29% 92.86% 8.96 0%2 28 78.57% 33.75 86.36% 68.18% 11.59 0%3 30 60% 49.27 88.89% 88.89% 12.83 100% 49.2716 4 100% 3.96 0% 0% 27.99 100% 3.9621 4 0% 33.97 0%23 4 100% 57.1 100% 100% 36.37 0%28 16 0% 42.37 0%59 20 0% 44.78 0%100 4 100% 2.88 0% 0% 44.78 100% 2.88total 211 51.18% 1823.572 82.41% 18% 35.16

(b) weighted by the inverse of the number of specifications in each article

k # articles % report F Avg. F % F > 10 % F > SY cutoffs SY cutoffs % use Avg. Frobust tests

1 11.6 64.99% 2228.85 89.31% 91.96% 8.96 0%2 1.2 87.5% 24.02 85.71% 66.67% 11.59 0%3 1 60% 49.27 88.89% 88.89% 12.83 100% 49.2716 0.36 100% 3.96 0% 0% 27.99 100% 3.9621 0.1 0% 33.97 0%23 0.1 100% 57.1 100% 100% 36.37 0%28 0.4 0% 42.37 0%59 0.5 0% 44.78 0%100 0.36 100% 2.88 0% 0% 44.78 100% 2.88total 15.63 64.09% 1683.87 82.53% 11.06% 24.15

Notes: Column (3), (4), (5) are conditional on reporting F statistics. Column (8) is conditional onusing robust tests. There are four specifications report F statistics being >614. We take them to be614. The SY cutoffs ensure the maximal size is 15% of a 5% Wald test. Tabulation is only availablefor k 30. For k > 30, we use the cutoff for k = 30, which is 44.78. Among the 16 articles with anyspecifications with p = 1, there are 14 articles that report some F statistics, one reports p-value andthe other none.

13

Page 14: Supplementary Materials for Weak Instruments in IV ...Regression: Theory and Practice Isaiah Andrews, James Stock, and Liyang Sun August 2, 2018 ... We do not use the “main specification”

Tabl

e4:

AR

confi

denc

ese

tsfo

rju

st-id

enti

fied

spec

ifica

tion

s

Art

icle

IDTa

ble

Rep

orte

dC

onst

ruct

edIV

esti

mat

eIV

stan

dard

erro

r5%

AR

confi

denc

ese

tre

fere

nce

Fst

atis

tic

Fst

atis

tic

14Ta

ble

3Pa

nelA

(1)

3842

730

102.

25-7

0088

2054

[-74

104.

73,-

6605

2.54

]14

Tabl

e3

Pane

lA(2

)38

427

3010

2.25

-0.0

70.

01[-

0.08

,-0.

06]

14Ta

ble

3Pa

nelA

(3)

3842

730

102.

25-5

.91

0.19

[-6.

19,-

5.44

]14

Tabl

e3

Pane

lB(1

)62

8164

00-2

9378

5285

[-39

726.

2,-

1900

2.77

]14

Tabl

e3

Pane

lB(2

)62

8164

00-0

.04

0.01

[-0.

06,-

0.02

]14

Tabl

e3

Pane

lB(3

)62

8164

001.

470.

36[0

.79

,2.1

8]

14Ta

ble

3Pa

nelB

(4)

6281

6400

-26.

854.

45[-

35.8

7,-

18.4

1]

14Ta

ble

3Pa

nelC

(1)

2273

2061

.16

-306

7510

546

[-51

361.

01,-

9982

.12

]14

Tabl

e3

Pane

lC(2

)22

7320

61.1

6-0

.02

0.02

[-0.

06,0

.03

]10

Tabl

e8

(1)

33.9

110

.50.

540.

24[0

.22

,1.5

8]

10Ta

ble

8(2

)26

.68

7.99

0.62

0.32

[0.0

5,1

.98

]10

Tabl

e8

(3)

46.3

516

.21

0.69

0.24

[0.3

3,1

.5]

10Ta

ble

8(4

)43

.38

7.99

0.63

0.34

[0.1

2,2

.26

]10

Tabl

e8

(5)

44.2

816

.75

0.75

0.25

[0.3

8,1

.62

]10

Tabl

e8

(6)

40.1

88.

170.

70.

31[0

.22

,2.2

]9

Tabl

e5

(1)

10.1

39.

970.

140.

07[0

.04

,0.4

5]

9Ta

ble

5(2

)10

.31

10.3

10.

130.

06[0

.04

,0.4

]9

Tabl

e5

(3)

8.59

8.6

0.12

0.06

[0.0

4,0

.41

]2

Tabl

e4

(1)

20.8

20.8

121

8.71

60.6

1[1

07.4

,373

.95

]2

Tabl

e4

(2)

18.2

518

.25

170.

1457

.99

[63.

91,3

24.1

9]

2Ta

ble

4(3

)9.

799.

7925

.72

8.75

[14.

07,6

7.3

]2

Tabl

e4

(4)

8.99

8.99

17.1

46.

91[7

.18

,49.

31]

2Ta

ble

6(4

)21

.65

21.8

5-0

.02

0.01

[-0.

04,-

0.01

]1

Tabl

e4

(3)

3.66

83.

673.

481.

16(-1

,-58

.44

][[1

.55,1

)1

Tabl

e4

(5)

4.79

24.

793.

381.

14[1

.64

,19.

16]

1Ta

ble

5(3

)5.

736

5.74

1.67

0.85

[-0.

46,5

.94

]1

Tabl

e5

(6)

15.3

515

.35

0.07

0.04

[-0.

01,0

.16

]1

Tabl

e6

Pane

lC(1

)N

A4.

21.

631.

01[-

1.33

,18.

2]

1Ta

ble

6Pa

nelC

(2)

NA

5.48

1.11

1.21

[-2.

57,6

.87

]1

Tabl

e6

Pane

lC(3

)N

A5.

81.

530.

83[-

0.63

,5.6

2]

1Ta

ble

6Pa

nelC

(4)

NA

6.09

1.7

0.75

[-0.

34,4

.84

]1

Tabl

e6

Pane

lC(5

)N

A6.

72.

070.

52[0

.95

,4.5

2]

1Ta

ble

6Pa

nelC

(6)

NA

2.88

3.08

1.9

(-1

,-6.

14][

[-2.

29,1

)1

Tabl

e6

Pane

lC(7

)N

A5.

761.

840.

97[-

0.71

,6.4

4]

Not

es:

Art

icle

IDre

fers

toar

ticl

eid

enti

ficat

ion

inTa

ble

1.Ta

ble

refe

renc

ere

fers

toth

eta

ble

cont

aini

ngIV

esti

mat

esin

the

corr

espo

ndin

gar

ticl

e.W

eco

nstr

uct

Fst

atis

tic

for

each

spec

ifica

tion

by⇡| (⌃

⇡⇡

/N)�

1⇡/k.

14