Big Data in Economics
TRANSCRIPT
Randomized Experiments
• The goal of randomized experiments is to identify…
The causal effect!
The advantage of causal predictors, as described by statisticians:
“The advantage of causal predictors compared with non-causal
predictors is that their influence on the target variable remains
invariant under different changes of the environment.”
(Peters, Bühlmann and Meinshausen 2016, Journal of the Royal Statistical Society)
• Correlation is not causation!
Randomized Experiments
• The gold standard for estimating a causal effect is a randomized experiment.
• The validity of a randomized experiment depends on:
1. Randomization.
2. A well-constructed control group.
1) What to randomize on:
1. Randomize eligibility
2. Randomize after acceptance into the program
3. Randomize incentives for take-up
Randomize after acceptance into the program:
– R=1 if randomized in (treatment group)
– R=0 if randomized out (control group)
– D indicates whether someone applied to the program and is subject to randomization [here D=1 for all people who are in the randomization]
– Random assignment implies:
• For treatment group: E(Y1|X, D=1,R=1) = E(Y1|X, D=1)
• For control group: E(Y0|X, D=1, R=0) = E(Y0|X, D=1)
The experiment gives the effect of treatment on the treated: TTE = E(Y1-Y0|X, D=1)
What to Take into Account when
Conducting a Randomized Experiment?
2) Power calculations
• Def: power of the design is the probability that, for a given effect
size and statistical significance level, we will be able to reject the
hypothesis of zero effect.
• Design choices that affect the “power” of an experiment:
– Sample size
– Minimum size of the effect that the researcher wants to be able to detect
– Multiple treatment groups
– Partial compliance and drop out
– Control variables (important to know how much of the residual variance they absorb)
• Standard software exists for the single-site case (e.g. the “power” command in Stata); a Python sketch follows below this list.
• Multi-site power analyses get complicated
– Need to know the impact variation and correlations across sites
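A minimal sketch in Python of the single-site calculation, using statsmodels’ power tools; the effect size, significance level, and power target below are illustrative:

    from statsmodels.stats.power import TTestIndPower

    # sample size per arm needed to detect a 0.2 SD effect at the 5%
    # significance level with 80% power, in a two-group comparison of means
    analysis = TTestIndPower()
    n_per_arm = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.8)
    print(n_per_arm)  # roughly 394 subjects per arm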
What to Take into Account when
Conducting a Randomized Experiment?
3) Choosing the sites in multi-site experiments
• External validity: choose sites at random
• Realistic impacts: choose sites that are representative
• Efficacy: choose sites that will best implement the
treatment
• Avoid contamination: choose sites with little or no contact with one another
Examples of Randomized Experiments
• Large-scale experiments, e.g. in the US/Canada:
– US National JTPA (Job Training Partnership Act) Study,
Tennessee class size experiment (STAR)
• More recently, randomized experiments in developing
countries:
– Small experiments addressing very specific questions, for
example microfinance experiments by Dean Karlan, education
experiments (e.g. schooling inputs) by Michael Kremer and
Esther Duflo, etc.
– Example of a large-scale and very successful conditional cash
transfer program: Progresa/Oportunidades in Mexico (1997-
2003)
Example: the STAR Experiment
(Stock and Watson Ch. 13)
Tennessee Project STAR (Student-Teacher Achievement Ratio):
A 4-year US study with an overall budget of $12 million, covering 79 Tennessee public schools for a single cohort of students in kindergarten through third grade in the years 1985-89.
Upon entering the school system a student was randomly
assigned to one of three groups:
Regular class (22 – 25 students).
Regular class + full-time teacher’s aide.
Small class (13 – 17 students).
Students in regular classes were re-randomized after the first year between regular and regular + aide classes.
Y = Stanford achievement test scores.
“Natural” (or Quasi-) Experiments
A quasi-experiment or natural experiment: “nature” provides random events that can be used as a source of exogenous variation.
Treatment (D) is “as if” randomly assigned.
Example:
Effect of changes in minimum wage on employment.
D = change in minimum wage law in some States (it changes
only in some States, thus State is “as if” randomly assigned).
The natural random event operates as an instrumental variable:
Relevance: it is strongly correlated with the treatment D (so
much that it defines the treatment!).
Exogeneity: it does not affect the outcome Y other than via the treatment D.
“Natural” (or Quasi-) Experiments
Idea of quasi-experiments follows that of “real” randomized
experiments:
find an exogenous source of variation (i.e. a variable that affects participation but not the outcome directly)
Important to understand the source of variation that helps
to identify the treatment effect
“Natural” (or Quasi-) Experiments
Disadvantage: nature provides only a small number of random events…
Advantage: when nature does provide random events, they can usefully be exploited. Example: Card, D. and Krueger, A.
(1994) “Minimum Wages and Employment: A Case Study of
the Fast-Food Industry in New Jersey and Pennsylvania”,
American Economic Review, Vol. 84, No. 4, pp. 772-793.
Regression Analysis of Experiments
for the Differences Estimator
• In an ideal randomized controlled experiment the
treatment D is randomly assigned:
Y=a+b*D+u (1)
• If D is randomly assigned, then u and D are independently distributed, so E(u|D)=0 (conditional mean independence). Then dE(Y|D)/dD = b is the average causal effect of D on Y, and OLS of (1) gives an unbiased estimate of the causal effect of D on Y.
• When the treatment is binary, the causal effect b is the difference in mean outcomes between the treatment and control groups. This difference in means is the differences estimator.
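A minimal sketch on simulated data illustrating the equivalence; the data-generating process and effect size are illustrative:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 1_000
    D = rng.integers(0, 2, size=n)          # randomly assigned binary treatment
    Y = 1.0 + 0.5 * D + rng.normal(size=n)  # true causal effect b = 0.5

    # differences estimator: difference in mean outcomes, treated vs control
    diff_means = Y[D == 1].mean() - Y[D == 0].mean()

    # OLS of Y on a constant and D recovers exactly the same number as b-hat
    ols = sm.OLS(Y, sm.add_constant(D)).fit()
    print(diff_means, ols.params[1])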
Regression Analysis of Experiments
for the Differences Estimator
We can add covariates X to the model: Y=a+b*D+c*X+u (2)
Advantages of adding the covariates X:
1. Check if randomization worked: if D is randomly assigned, the
OLS estimates of b in model (1) and (2) (that is with and without
the covariates X) should be similar – if they aren’t, this suggests
that D was not randomly assigned
• NOTE: to check directly for randomization, we can regress the treatment indicator, D, on the covariates X, and do an F-test (see the sketch after this list).
2. Increases efficiency: smaller standard errors
3. Adjust for conditional randomization (apply conditional randomization if interested in treatment effects for different groups; for example school effects if randomization was within but not across schools).
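A minimal sketch of the direct randomization check on simulated data; the covariates and sample size are illustrative:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 1_000
    X = rng.normal(size=(n, 3))     # baseline covariates
    D = rng.integers(0, 2, size=n)  # treatment indicator

    # regress D on the covariates; the regression F-test checks that all
    # covariate coefficients are jointly zero, as randomization implies
    balance = sm.OLS(D, sm.add_constant(X)).fit()
    print(balance.fvalue, balance.f_pvalue)  # large p-value: consistent with randomization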
Problems
with Randomized Experiments
• Randomization per se does not assure that the treatment
and the control group are perfectly comparable.
– In any given RCT, nothing ensures that other causal factors are
balanced across the groups at the point of randomization (Deaton
and Cartwright 2017).
• Randomization per se only means that, on average across repeated experiments, the estimated effect of the treatment equals the true effect.
– Unbiasedness says that, if we were to repeat the trial many times, we would be right on average. Yet we are almost never in such a situation, and with only one trial (as is virtually always the case) unbiasedness does nothing to prevent our single estimate from being very far away from the truth (Deaton and Cartwright 2017).
Solvable Problems
with Randomized Experiments
1. Drop-out of treatment: some subjects in the treatment
group may drop out before completing the program.
2. Contamination bias: some subjects in the control group
get treatment.
Two Solutions to Drop-out of Treatment
and Contamination Bias
1. Define treatment as “intent-to-treat” or “offer of
treatment”: focus on those who were invited to be
treated, whether or not they actually agreed to be
treated.
2. Treatment assignment can be used as an instrument:
• Wald estimator: IV when the instrument is a binary variable.
Wald Estimator to Solve Drop-out of
Treatment and Contamination Bias
Start with some notation:
– Initial random assignment: R=0/1
– Decision to participate: D=0/1
Drop out of treatment: R=1 and D=0
Contamination bias: R=0 and D=1
p0=P(D=1|R=0), p1=P(D=1|R=1)
Observe R, D, p0, p1, Y0 if D=0 and Y1 if D=1
E(Y|R=0)=E(Y1|R=0)*p0 + E(Y0|R=0)*(1-p0)
E(Y|R=1)=E(Y1|R=1)*p1 + E(Y0|R=1)*(1-p1)
Because of Randomization:
E(Y1|R=1)=E(Y1|R=0)=E(Y1) (same for Y0)
Therefore:
E(Y|R=1) – E(Y|R=0)= E(Y1)*(p1-p0) – E(Y0)*(p1-p0)
ATE= E(Y1) – E(Y0)= [E(Y|R=1)-E(Y|R=0)]/(p1-p0)
[Wald estimator]
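A minimal sketch of the Wald estimator on simulated data with both drop-out and contamination; the take-up rates and the true effect are illustrative:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 10_000
    R = rng.integers(0, 2, size=n)          # random assignment
    u = rng.random(size=n)
    D = np.where(R == 1, u < 0.8, u < 0.1)  # drop-out (p1=0.8), contamination (p0=0.1)
    Y = 1.0 + 2.0 * D + rng.normal(size=n)  # true effect = 2

    p1 = D[R == 1].mean()
    p0 = D[R == 0].mean()
    itt = Y[R == 1].mean() - Y[R == 0].mean()  # intent-to-treat difference
    print(itt / (p1 - p0))                     # Wald estimate, close to 2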
Unsolvable Problems
with Randomized Experiments
• Not implementable: e.g. effect of a merger on a firm’s outputs – we cannot force a firm to merge.
• Costs are too high.
• Ethical considerations: e.g. all poor households should
receive a given income subsidy.
• Estimates would only be available after many years: e.g.
effect of healthy diet on longevity.
Threats to Internal Validity of Experiments
A) Threats to internal validity (ability to estimate causal effects
within the study population)
1. Failure to randomize (or imperfect randomization). Randomization changes the nature of the program (e.g. greater recruitment needs may lead to a change in acceptance standards); ethical problems and political opposition to randomization (e.g. the poorest should get the program first).
2. Failure to follow treatment protocol (“partial compliance”):
• Some controls get treatment, some treated drop out of the program.
• Differential attrition (e.g. in job training program, controls who find
jobs move out of town).
3. Experimental effects:
• Experimenter bias: treatment is associated with “extra effort”.
• The experiment is perceived differently by the researcher and by the
subjects. Subject behavior is affected by taking part in an experiment
(Hawthorne effect).
4. Validity of the instruments (in quasi-experiments).
Threats to internal validity imply that Cov(D,u)≠0, so the differences estimator is biased.
Threats to External Validity of Experiments
B) Threats to external validity (ability to estimate causal effects
that are valid for other populations and settings)
• Non-representative sample.
• Non-representative “treatment” (that is program or policy).
• General equilibrium effects (effect of a program can depend on its
scale), and peer effects.
• Experiments involving human subjects typically need informed
consent: no guarantee that inferences for populations that give
consent generalize to populations that do not.
• Which aspects of the treatment are responsible for the effect?
External validity can never be guaranteed, neither in randomized
experiments nor in studies using observational data.
Internal and External Validity
of Randomized Experiments
• The threats to the internal and external validity of an experiment are
different from the threats to the internal and external validity of an
OLS regression using observational data (OVB, sample selection bias, reverse causality, wrong functional form, and measurement error).
• The threats for experiments refer to estimating the causal effects
with an experiment:
– Threats to internal validity (ability to estimate causal effects within
the study population)
– Threats to external validity (ability to estimate causal effects that
are valid for other populations and settings)
Still Lots of Benefits
of Experiments
• Combine several experiments to estimate heterogeneous
treatment effects:
– Run several experiments in settings that differ in population, nature of treatments, treatment rate, etc. to assess the credibility of generalizing the results to other settings. Ex.: Meager (2016) considers data from 7 randomized experiments on microfinance programs and finds consistency across studies.
• They reveal facts that are sometimes in contrast with
simple mean comparisons – two famous examples:
1. Effectiveness of job training programs on earnings (positive only
when assessed with randomized experiments -> why???)
2. Effectiveness of reducing class size on students’ test scores
(positive only when assessed with randomized experiments,
famous Tennessee Star Program -> why???)
Experiments
versus Observational Studies
• Internal validity is more problematic in observational studies, external
validity is more problematic in experiments:
– Observational studies with large samples are more representative of the overall population but run the risk of omitted variable bias.
– Experiments with small samples have little external validity due to
differences between the sample and the target population.
• Combine experiments and observational studies: ex. search for
treatments with large effects that should be detected even in
observational studies, and use experiments to study the effects of
specific treatments.
• To estimate a causal effect it is necessary to have a theory to establish:
– Which variables in addition to the treatment affect Y.
– How to control for these variables.
Machine Learning
and Randomized Experiments
• Ludwig, Mullainathan and Spiess (2019):
RCTs are costly in terms of both time and dollars: the Negative Income Tax experiments cost $60 million; Congress set aside $100 million for the Moving to Opportunity experiment, which has taken 25 years.
Pre-analysis plans have been used to specify, in advance, the size of the control groups, the variables to control for, subgroups, and functional forms. However, the arbitrariness of these choices can undermine the credibility of the results.
Idea: use an ML index that predicts the outcome of interest from the full vector of all controls in order to: assess balance and test for differences between the treatment and control distributions at baseline; test whether treatment effects are heterogeneous; and test whether all heterogeneity is captured by the included controls (a minimal sketch of the balance check follows).
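A minimal sketch of the balance-check part of this idea on simulated data; the random-forest predictor, the in-sample fit, and the two-sample test below are simplifications of the paper’s procedure:

    import numpy as np
    from scipy.stats import ks_2samp
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(3)
    n, k = 2_000, 20
    X = rng.normal(size=(n, k))                      # full vector of controls
    D = rng.integers(0, 2, size=n)                   # random assignment
    y = X @ rng.normal(size=k) + rng.normal(size=n)  # outcome of interest

    # ML index: predicted outcome from all controls
    index = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y).predict(X)

    # balance at baseline: the index should be distributed alike in both arms
    print(ks_2samp(index[D == 1], index[D == 0]).pvalue)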
Machine Learning and Control Groups:
(Varian 2016, PNAS)
• Varian, H. (2016) “Causal Inference in Economics and Marketing”, PNAS, Vol. 113, No. 27, pp. 7310-7315.
Introduction to causal inference in Economics written for readers
familiar with machine learning methods.
Discussion of how machine learning techniques can be
useful for developing better estimates of the counterfactuals.
Machine Learning and Control Groups:
(Varian 2016, PNAS)
Two main types of questions:
1. Quantify how a given treatment affects the subjects:
– Examples: effect of a drug on health outcomes; effect of class size
on students’ learning; effect of an ad campaign on consumers’
spending.
This is the classic treatment-control group comparison; machine learning can help by building counterfactuals through prediction.
2. Quantify how a given treatment affects the “experimenter”:
– Example: if I increase ad expenditure by x%, how many extra sales
will I get?
– The answer depends on how consumers respond to the ad, but we
do not need to model how they respond.
• Example: we care about an increase in the number of visits to our website rather than how this happened (more clicks on a given ad, more search queries, etc.)
Machine learning is essential to build a predictive model for the
counterfactual.
Machine Learning and Control Groups:
Example of Type 2 Question
(Varian 2016, PNAS)
• Example of an ad campaign.
– Research question: if I increase ad expenditure by x%, how many
extra sales will I get?
• The advertiser increases ad spend for a given period of time and would like to compare the amount of sales after the increase with what would have happened to sales without the increase in ad expenditure.
– NOTE: this differs from “pure prediction problems” where causal
inference is not necessary.
• How to compute the counterfactual?
– With a predictive model using data from before treatment.
Machine Learning and Control Groups:
Example of Type 2 Question
(Varian 2016, PNAS)
• For type 1 questions (effect of treatment on subjects):
– Treated and untreated (control) groups.
– Comparison of outcomes between treated and control groups.
• For type 2 questions (effect of treatment on experimenter):
– All subjects are treated for a given period. One unit of analysis over time.
– 4 step process (TTTC) to build and use a predictive model:
1. TRAIN: machine learning tools to tune model’s parameters.
2. TEST: apply the model to a test set to check how well it performs.
3. TREAT: apply the model during the treatment period to predict the
counterfactual.
4. COMPARE: compare what actually happened to the treated to the
prediction (given by the model) of what would have happened without the
treatment.
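A minimal sketch of the four TTTC steps on simulated daily sales data; the seasonal features, the linear model, and the size of the lift are all illustrative:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(4)
    day = np.arange(200)
    sales = 100 + 0.3 * day + 10 * np.sin(2 * np.pi * day / 7) + rng.normal(0, 3, 200)
    sales[160:] += 15.0  # simulated lift from the ad campaign (days 160-199)

    features = np.column_stack([day,
                                np.sin(2 * np.pi * day / 7),
                                np.cos(2 * np.pi * day / 7)])
    pre, test, treat = slice(0, 140), slice(140, 160), slice(160, 200)

    model = LinearRegression().fit(features[pre], sales[pre])              # 1. TRAIN
    test_mae = np.abs(model.predict(features[test]) - sales[test]).mean()  # 2. TEST
    counterfactual = model.predict(features[treat])                        # 3. TREAT
    effect = (sales[treat] - counterfactual).mean()                        # 4. COMPARE
    print(test_mae, effect)  # effect should be close to the simulated lift of 15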
Machine Learning and Control Groups:
Example of Type 2 Question
(Varian 2016, PNAS)
• The TTTC process is a generalization of the classic treatment-control approach to experiments.
• Key difference:
– Classic approach requires a control group, which provides an
estimate of the counterfactual.
– TTTC allows constructing a predictive model of the counterfactual
even if we do NOT have a true control group. One unit of analysis
over time.
• NOTE: TTTC estimates only the TTE (average effect of
treatment on the treated).
Different Approaches
of Program Evaluation
1. Run an experiment and use simple differences
estimator.
2. Use observational data to construct the counterfactual
a. Selection on observables:
Unconfoundedness assumption: we assume to observe all X variables
that affect both participation decision and outcome.
• Differences-in-Differences (DID)
• Matching
• Regression discontinuity
b. Selection on unobservables
We assume participation depends on unobserved variables.
• Instrumental variable estimation
• Control function approach
Differences Estimator
• Differences estimator is the simple difference in mean
outcomes (Y) between treatment and control.
• Problem 1: time-constant unobserved differences between
treated and untreated that are correlated with outcomes.
Ex: effect of job training program on earnings. Those
who participate in the program are more motivated to
work, so would earn more even without the training
program, thus the effect of the program is overestimated.
• Solution to problem 1: compare outcome of participants
before and after “treatment” using panel data.
• Problem 2: time-trends (e.g. business cycles).
Ex: if recession after treatment, underestimation of treatment effect.
Differences-in-Differences Estimator
• Differences-in-Differences estimator (DID): differences
out time-constant differences between treatment and
control and time-trends by comparing treated and
untreated before and after the program.
• Data requirement: to implement DID it is necessary to
have panel data where each unit of analysis (individual,
firm, state) is observed for at least two consecutive
periods.
Differences-in-Differences Estimator (from Stock and Watson)
The DID estimator adjusts for pre-experimental differences by subtracting off each subject’s pre-experimental value of Y:
– Y(i, before) = value of Y for subject i before the experiment
– Y(i, after) = value of Y for subject i after the experiment
– ΔY(i) = Y(i, after) – Y(i, before) = change over the course of the experiment
The DID estimator differences out:
– time-constant (level) differences, by computing ΔY(i) for each subject,
– and time trends, by comparing treated and untreated before and after the program:
DID estimate of b = (Ybar(treat, after) – Ybar(treat, before)) – (Ybar(control, after) – Ybar(control, before))
Differences-in-Differences Estimator
Main assumption of DID:
Counterfactual LEVELS for treated and non-treated can be different,
but have the same TIME VARIATION – COMMON TREND Assumption:
E(Y0(t1)-Y0(t0)|D=1)= E(Y0(t1)-Y0(t0)|D=0)
In the absence of the treatment, the change in treated outcome would
have been the same as the change in non-treated outcome
i.e. changes in the economy or life-cycle that are unrelated to the
treatment affect the two groups in a similar way
What is NOT allowed are unobserved time-varying effects that affect the
treatment and the control group differently.
Differences-in-Differences Estimator
and Machine Learning
• We can use machine learning to construct the
counterfactuals.
• DID can be combined with TTTC:
– TTTC builds a predictive model for the outcome in the absence of the treatment: it predicts the missing potential outcome under no treatment.
– When estimating DID, we can build a predictive model for the
group that did not receive the treatment using the same 4 step
process that we discussed for TTTC:
1. TRAIN: machine learning tools to tune parameters.
2. TEST: apply the model to a test set to check how well it performs.
3. TREAT: apply the model to the treated units to predict the counterfactual.
4. COMPARE: compare what actually happened to the treated to the prediction given
by the model of what would have happened without the treatment.
Differences-in-Differences Estimator:
Summing Up
DID differences out time-constant differences between treatment and
control and time-trends by comparing treated and untreated before and
after the treatment.
Validity assumption: COMMON TREND – absent the treatment, the
change in treated outcome would have been the same as the change in
non-treated outcome.
What is NOT allowed are unobserved time-varying effects that affect
treatment and control differently.
DID identifies the TTE; however, if the assignment to the treatment is
random, we can also estimate ATE.
Of course, as for the differences estimator, we can control for additional
covariates with the same advantages that we discussed.
Regression Analysis of Experiments
for DID Estimator
• Data requirement: to implement DID it is necessary to
have panel data where each unit of analysis (individual,
firm, state) is observed for at least two consecutive
periods.
Brief Review of Panel Data
• Panel data with k regressors
{X1(it),…, Xk(it), Y(it)},
i=1,…, n (number of entities),
t=1,…, T (number of time periods)
• Another term for panel data is longitudinal data.
• Balanced panel: no missing observations.
• Unbalanced panel: some entities (unit of analysis) are not
observed for some time periods.
Why are Panel Data Useful?
Two main cases:
1. Control for entity fixed effects: effects that vary across
entities (unit of analysis), but do not vary over time.
2. Control for time fixed effects: effects that vary over
time, but do not vary across units of analysis.
Why are Panel Data Useful?
Entity Fixed Effects
With panel data we can control for factors that:
• Vary across entities (unit of analysis), but do not vary
over time.
• Are unobserved or unmeasured – and therefore cannot
be included in the regression.
• Could cause omitted variable bias if they were omitted.
• Example: Can alcohol taxes reduce traffic deaths?
(Chapter 10 in Stock and Watson)
Panel Data and Omitted Variable Bias
• Why are there more traffic deaths in States that have higher alcohol taxes?
• Other factors determine the traffic fatality rate, such as:
– Density of cars on the road
– “Culture” around drinking and driving
• These omitted factors could cause omitted variable bias.
• Example: traffic density
1. High traffic density means more traffic deaths
2. States with higher traffic density have higher alcohol taxes
Two conditions for omitted variable bias are satisfied. Specifically,
“high taxes” could reflect “high traffic density”, so the OLS
coefficient would be biased upwards.
Panel data allow eliminating omitted variable bias when the omitted
variables are constant over time within a given unit of analysis
(States in the example here).
Consider the panel data model
FatalityRate(it)=a+b*BeerTax(it)+c*Z(i)+u(it),
where Z(i) is a factor that does not change over time (e.g. traffic density), at least during the years for which we have data. Suppose Z(i) is not observed, so its omission could result in omitted variable bias.
The effect of Z(i) can be eliminated using T=2 years.
• Key idea: Any change in the fatality rate from 1982 to 1988 cannot
be caused by Z(i), because Z(i) (by assumption) does not change
between 1982 and 1988.
• Estimate the difference in fatality rate as a function of the difference
in beer tax using OLS.
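A minimal sketch of the T=2 “changes” regression; the data layout and column names (state, year, fatality_rate, beer_tax) are hypothetical:

    import pandas as pd
    import statsmodels.formula.api as smf

    # assumed long-format panel: one row per (state, year), years 1982 and 1988
    def changes_estimator(panel: pd.DataFrame):
        wide = panel.pivot(index="state", columns="year",
                           values=["fatality_rate", "beer_tax"])
        diffs = pd.DataFrame({
            "d_fatal": wide["fatality_rate"][1988] - wide["fatality_rate"][1982],
            "d_tax": wide["beer_tax"][1988] - wide["beer_tax"][1982],
        })
        # differencing removes the time-constant Z(i); OLS on changes estimates b
        return smf.ols("d_fatal ~ d_tax", data=diffs).fit()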
Panel Data with T>2
• What if you have more than 2 time periods (T>2)?
• For i=1,…,n and t=1,…, T
Y(it)=a+b*X(it)+c*Z(i)+u(it)
we can rewrite this in two useful ways:
1. “Fixed Effects” regression model
Y(it)=a(i)+b*X(it)+u(it)
intercept a(i) is unique for each State, slope b is the same in all
States
2. “n-1 binary regressors” regression model
Y(it)=a+b*X(it)+c2*D2(i)+c3*D3(i)+…+cn*Dn(i)+u(it)
where D2(i)=1{i=2}, i.e. D2(i) equals 1 if observation i is from State 2 (and 0 otherwise), and similarly for D3(i),…,Dn(i)
Panel Data with T≥2
Three estimation methods:
1. “n-1” binary regressors” OLS regression
2. “Entity-demeaned” OLS regression
3. “Changes” specification (if and only if T=2)
These three methods produce identical estimates of the regression
coefficients and identical standard errors.
Method 1. Y(it)=a+b*X(it)+c2*D2(i)+c3*D3(i)+…+cn*Dn(i)+u(it)
- First create the binary variables, D2(i),…, Dn(i)
- Then estimate above equation by OLS
- Inference (hypothesis tests, confidence intervals) is as usual
(using heteroscedasticity-robust standard errors)
- Impractical when n is very large
Method 2. Y(it)= a(i) +b*X(it)+u(it)
- First construct the demeaned variables
- Then estimate above equation by OLS
- Inference (hypothesis tests, confidence intervals) is as usual
(using heteroscedasticity-robust standard errors)
- This is like the “changes” approach (method 3), but Y(it) is deviated from the state average instead of from Y(i1)
Estimation can be done easily in Stata:
• “areg” automatically demeans the data (useful when n large)
• The estimated intercept is the average of the n-1 dummy variables
(no clear interpretation)
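A minimal sketch on simulated data showing that Methods 1 and 2 give the same slope; standard errors coincide once the demeaned regression’s degrees of freedom are corrected, which canned routines such as areg do automatically:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(5)
    n, T = 50, 5
    entity = np.repeat(np.arange(n), T)
    alpha = rng.normal(size=n)                  # entity fixed effects
    x = alpha[entity] + rng.normal(size=n * T)  # X correlated with the fixed effect
    y = 2.0 * x + alpha[entity] + rng.normal(size=n * T)
    df = pd.DataFrame({"y": y, "x": x, "entity": entity})

    # Method 1: n-1 entity dummies
    m1 = smf.ols("y ~ x + C(entity)", data=df).fit()

    # Method 2: entity-demeaned OLS
    dm = df.copy()
    dm[["y", "x"]] = dm.groupby("entity")[["y", "x"]].transform(lambda s: s - s.mean())
    m2 = smf.ols("y ~ x - 1", data=dm).fit()

    print(m1.params["x"], m2.params["x"])  # identical slope estimates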
Why are Panel Data Useful?
Time Fixed Effects
An omitted variable might vary over time but not across units (ex. States):
• Safer cars (air bags, etc); changes in national laws
• These produce intercepts that change over time
• Let these changes (“safer cars”) be denoted by the variable S(t),
which changes over time but not across States
• The resulting population regression model is:
Y(it)=a+b*X(it)+c*S(t)+u(it)
The intercept varies from one year to the next, m(1982)=a+c*S(1982)
Again, two formulations for time fixed effects:
1. “Binary regressor” formulation: “T-1 binary regressors” OLS
regression
2. “Time effects” formulation: “Year demeaned” OLS regression (deviate
Y(it) and X(it) from year averages), then estimate by OLS
Time and Entity Fixed Effects
or “Back to DID”
Y(it)=a(t)+b*T(it)+m(i)+u(it),
where T(it)=1 if in treatment group and after treatment, 0 otherwise
or
Y(it)=a+ b*D(it)*Z(it) +c*Z(it)+d*D(it)+u(it),
where D(it)=1 if in treatment group, 0 otherwise
Z(it)=1 if in “after” period, 0 in “before” period
D(it)*Z(it)=1 if in treatment group in the “after” period (interaction effect)
b is the DID estimator
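A minimal sketch of the interaction regression on simulated data; the group sizes and coefficient values are illustrative:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(6)
    n = 2_000
    D = rng.integers(0, 2, size=n)  # treatment-group indicator
    Z = rng.integers(0, 2, size=n)  # "after"-period indicator
    # group-level difference (3), common trend (1), true DID effect b = 2
    Y = 1.0 + 3.0 * D + 1.0 * Z + 2.0 * D * Z + rng.normal(size=n)
    df = pd.DataFrame({"Y": Y, "D": D, "Z": Z})

    did = smf.ols("Y ~ D * Z", data=df).fit()  # D * Z expands to D + Z + D:Z
    print(did.params["D:Z"])                   # DID estimate of b, close to 2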
Time and Entity Fixed Effects:
Estimation
Different equivalent ways to allow for both entity and time
fixed effects:
• Differences and intercept (T=2 only)
• Entity (or time) demeaning and T-1 time (or n-1 entity) indicators
• T-1 time indicators and n-1 entity indicators
• Entity and time demeaning
• Under the fixed effects regression assumptions, which are
basically extensions of the least squares assumptions, the
OLS fixed effects estimator of b is normally distributed.
• BUT there are difficulties associated with computing
standard errors that do not come up with cross-sectional
data.
• In Appendix 1 and 2:
1. Fixed effects regression assumptions.
2. Standard errors for fixed effects regression.
3. Proof of consistency and normality of fixed effects
estimator.
Additions to DID
• Possible to use repeated cross-sections instead of panel
data under certain conditions, e.g. stable group composition
over time (see Meyer 1995, and Abadie 2005).
• Caveats and extensions:
– Endogenous treatment (Besley/Case 2000): DID assumptions exclude the possibility that a State increases the alcohol tax because of a high rate of traffic fatalities in the past.
– Parallel trends conditional on X: trends can be different in treated and control groups if the distribution of X is different (Abadie 2005: “Semi-parametric DID”), a mix of “diffs-in-diffs” and “matching” methods.
– Bertrand et al. (2004) propose a solution for the case in which residual autocorrelation over time is not accounted for, so that the variance may be underestimated (use a heteroscedasticity- and autocorrelation-consistent asymptotic variance).
Fixed-Effects Regression Assumptions
Under the fixed-effects regression assumptions (stated in Appendix 1), the FE estimator is consistent and normally distributed.
Variance-Covariance Matrix
• In general, we would like to allow the error terms to be correlated
over time for a given entity, and this makes the formula for the
asymptotic variance complicated.
• You can also allow for heteroskedasticity. Then you can compute
the “heteroskedasticity- and autocorrelation-consistent asymptotic
variance”.
• You can also compute “clustered standard errors” because there is
a grouping, or “cluster”, within which the error term is possibly
correlated, but outside of which (across groups) it is not. For
example, you can allow for correlation of the errors for individuals
within the same family but not between families.
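A minimal sketch of clustered standard errors in Python’s statsmodels, with entities as the clusters; the persistence parameters are illustrative:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(7)
    n, T = 200, 5

    def ar1(rho):
        # draws serially correlated within each entity over its T periods
        e = rng.normal(size=(n, T))
        for t in range(1, T):
            e[:, t] = rho * e[:, t - 1] + e[:, t]
        return e.ravel()

    entity = np.repeat(np.arange(n), T)
    x = ar1(0.9)            # regressor persistent within entity
    y = 2.0 * x + ar1(0.9)  # errors autocorrelated within entity
    df = pd.DataFrame({"y": y, "x": x, "entity": entity})

    model = smf.ols("y ~ x + C(entity)", data=df)
    usual = model.fit()
    clust = model.fit(cov_type="cluster", cov_kwds={"groups": df["entity"]})
    print(usual.bse["x"], clust.bse["x"])  # clustered SE allows within-entity correlation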