Big Data in Economics
TRANSCRIPT
Randomized Experiments
• The goal of randomized experiments is to identify…
The causal effect!
The advantage of causal predictors, as described by statisticians:
“The advantage of causal predictors compared with non-causal
predictors is that their influence on the target variable remains
invariant under different changes of the environment.”
(Peters, Bühlmann and Meinshausen 2016, Journal of the Royal Statistical Society)
• Correlation is not causation!
Randomized Experiments
• The gold standard for estimating a causal effect is a randomized experiment.
• The validity of a randomized experiment depends on:
1. Randomization.
2. A well-constructed control group.
1) What to randomize on:
1. Randomize eligibility
2. Randomize after acceptance into the program
3. Randomize incentives for take-up
Randomize after acceptance into the program:
– R=1 if randomized in (treatment group)
– R=0 if randomized out (control group)
– D indicates whether someone applied to the program and is subject to randomization [here D=1 for all people who are in the randomization]
– Random assignment implies:
• For treatment group: E(Y1|X, D=1,R=1) = E(Y1|X, D=1)
• For control group: E(Y0|X, D=1, R=0) = E(Y0|X, D=1)
The experiment gives the effect of treatment on the treated: TTE = E(Y1-Y0|X, D=1)
What to Take into Account when
Conducting a Randomized Experiment?
2) Power calculations
• Def: power of the design is the probability that, for a given effect
size and statistical significance level, we will be able to reject the
hypothesis of zero effect.
• Design choices that affect the “power” of an experiment:
– Sample size
– Minimum size of the effect that the researcher wants to be able to detect
– Multiple treatment groups
– Partial compliance and drop out
– Control variables (important to know how much of the residual variance they absorb)
• Standard software exists for the single-site case (e.g. the “power” command in Stata); a Python sketch follows below this list.
• Multi-site power analyses get complicated
– Need to know the impact variation and correlations across sites
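A minimal sketch in Python of the single-site calculation, using statsmodels’ power tools; the effect size, significance level, and power target below are illustrative:

    from statsmodels.stats.power import TTestIndPower

    # sample size per arm needed to detect a 0.2 SD effect at the 5%
    # significance level with 80% power, in a two-group comparison of means
    analysis = TTestIndPower()
    n_per_arm = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.8)
    print(n_per_arm)  # roughly 394 subjects per arm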
What to Take into Account when
Conducting a Randomized Experiment?
3) Choosing the sites in multi-site experiments
• External validity: choose sites at random
• Realistic impacts: choose sites that are representative
• Efficacy: choose sites that will best implement the
treatment
• Avoid contamination: choose sites with little or no contact with one another
Examples of Randomized Experiments
• Large-scale experiments, e.g. in the US/Canada:
– US National JTPA (Job Training Partnership Act) Study,
Tennessee class size experiment (STAR)
• More recently, randomized experiments in developing
countries:
– Small experiments addressing very specific questions, for
example microfinance experiments by Dean Karlan, education
experiments (e.g. schooling inputs) by Michael Kremer and
Esther Duflo, etc.
– Example of a large-scale and very successful conditional cash
transfer program: Progresa/Oportunidades in Mexico (1997-
2003)
Example: the STAR Experiment
(Stock and Watson Ch. 13)
Tennessee Project STAR (Student-Teacher Achievement Ratio):
A 4-year US study with an overall budget of $12 million, covering 79 Tennessee public schools for a single cohort of students in kindergarten through third grade in the years 1985-89.
Upon entering the school system a student was randomly
assigned to one of three groups:
Regular class (22 – 25 students).
Regular class + full-time teacher’s aide.
Small class (13 – 17 students).
Students in regular classes were re-randomized after the first year between regular and regular + aide classes.
Y = Stanford achievement test scores.
“Natural” (or Quasi-) Experiments
A quasi-experiment or natural experiment: “nature” provides random events that can be used as a source of exogenous variation.
Treatment (D) is “as if” randomly assigned.
Example:
Effect of changes in minimum wage on employment.
D = change in minimum wage law in some States (it changes
only in some States, thus State is “as if” randomly assigned).
The natural random event operates as an instrumental variable:
Relevance: it is strongly correlated with the treatment D (so
much that it defines the treatment!).
Exogeneity: it does not affect the outcome Y other than via the treatment D.
“Natural” (or Quasi-) Experiments
Idea of quasi-experiments follows that of “real” randomized
experiments:
find an exogenous source of variation (i.e. a variable that affects participation but not the outcome directly)
Important to understand the source of variation that helps
to identify the treatment effect
“Natural” (or Quasi-) Experiments
Disadvantage: nature provides only a small number of random events…
Advantage: when nature does provide random events, they can usefully be exploited. Example: Card, D. and Krueger, A.
(1994) “Minimum Wages and Employment: A Case Study of
the Fast-Food Industry in New Jersey and Pennsylvania”,
American Economic Review, Vol. 84, No. 4, pp. 772-793.
Regression Analysis of Experiments
for the Differences Estimator
• In an ideal randomized controlled experiment the
treatment D is randomly assigned:
Y=a+b*D+u (1)
• If D is randomly assigned, then u and D are independently distributed, so E(u|D)=0 (conditional mean independence). Then dE(Y|D)/dD = b is the average causal effect of D on Y, and OLS of (1) gives an unbiased estimate of the causal effect of D on Y.
• When the treatment is binary, the causal effect b is the difference in mean outcomes between the treatment and control groups. This difference in means is the differences estimator.
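A minimal sketch on simulated data illustrating the equivalence; the data-generating process and effect size are illustrative:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 1_000
    D = rng.integers(0, 2, size=n)          # randomly assigned binary treatment
    Y = 1.0 + 0.5 * D + rng.normal(size=n)  # true causal effect b = 0.5

    # differences estimator: difference in mean outcomes, treated vs control
    diff_means = Y[D == 1].mean() - Y[D == 0].mean()

    # OLS of Y on a constant and D recovers exactly the same number as b-hat
    ols = sm.OLS(Y, sm.add_constant(D)).fit()
    print(diff_means, ols.params[1])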
Regression Analysis of Experiments
for the Differences Estimator
We can add covariates X to the model: Y=a+b*D+c*X+u (2)
Advantages of adding the covariates X:
1. Check if randomization worked: if D is randomly assigned, the
OLS estimates of b in model (1) and (2) (that is with and without
the covariates X) should be similar – if they aren’t, this suggests
that D was not randomly assigned
• NOTE: to check directly for randomization, we can regress the treatment indicator, D, on the covariates X, and do an F-test (see the sketch after this list).
2. Increases efficiency: smaller standard errors
3. Adjust for conditional randomization (apply conditional randomization if interested in treatment effects for different groups; for example school effects if randomization was within but not across schools).
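A minimal sketch of the direct randomization check on simulated data; the covariates and sample size are illustrative:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 1_000
    X = rng.normal(size=(n, 3))     # baseline covariates
    D = rng.integers(0, 2, size=n)  # treatment indicator

    # regress D on the covariates; the regression F-test checks that all
    # covariate coefficients are jointly zero, as randomization implies
    balance = sm.OLS(D, sm.add_constant(X)).fit()
    print(balance.fvalue, balance.f_pvalue)  # large p-value: consistent with randomization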
Problems
with Randomized Experiments
• Randomization per se does not assure that the treatment
and the control group are perfectly comparable.
– In any given RCT, nothing ensures that other causal factors are
balanced across the groups at the point of randomization (Deaton
and Cartwright 2017).
• Randomization per se only means that, on average across repeated experiments, the estimated effect of the treatment equals the true effect.
– Unbiasedness says that, if we were to repeat the trial many times, we would be right on average. Yet we are almost never in such a situation, and with only one trial (as is virtually always the case) unbiasedness does nothing to prevent our single estimate from being very far away from the truth (Deaton and Cartwright 2017).
Solvable Problems
with Randomized Experiments
1. Drop-out of treatment: some subjects in the treatment
group may drop out before completing the program.
2. Contamination bias: some subjects in the control group
get treatment.
Two Solutions to Drop-out of Treatment
and Contamination Bias
1. Define treatment as “intent-to-treat” or “offer of
treatment”: focus on those who were invited to be
treated, whether or not they actually agreed to be
treated.
2. Treatment assignment can be used as an instrument:
• Wald estimator: IV when the instrument is a binary variable.
Wald Estimator to Solve Drop-out of
Treatment and Contamination Bias
Start with some notation:
– Initial random assignment: R=0/1
– Decision to participate: D=0/1
Drop out of treatment: R=1 and D=0
Contamination bias: R=0 and D=1
p0=P(D=1|R=0), p1=P(D=1|R=1)
Observe R, D, p0, p1, Y0 if D=0 and Y1 if D=1
E(Y|R=0)=E(Y1|R=0)*p0 + E(Y0|R=0)*(1-p0)
E(Y|R=1)=E(Y1|R=1)*p1 + E(Y0|R=1)*(1-p1)
Because of Randomization:
E(Y1|R=1)=E(Y1|R=0)=E(Y1) (same for Y0)
Therefore:
E(Y|R=1) – E(Y|R=0)= E(Y1)*(p1-p0) – E(Y0)*(p1-p0)
ATE= E(Y1) – E(Y0)= [E(Y|R=1)-E(Y|R=0)]/(p1-p0)
[Wald estimator]
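A minimal sketch of the Wald estimator on simulated data with both drop-out and contamination; the take-up rates and the true effect are illustrative:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 10_000
    R = rng.integers(0, 2, size=n)          # random assignment
    u = rng.random(size=n)
    D = np.where(R == 1, u < 0.8, u < 0.1)  # drop-out (p1=0.8), contamination (p0=0.1)
    Y = 1.0 + 2.0 * D + rng.normal(size=n)  # true effect = 2

    p1 = D[R == 1].mean()
    p0 = D[R == 0].mean()
    itt = Y[R == 1].mean() - Y[R == 0].mean()  # intent-to-treat difference
    print(itt / (p1 - p0))                     # Wald estimate, close to 2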
Unsolvable Problems
with Randomized Experiments
• Not implementable: e.g. effect of a merger on a firm’s outputs – we cannot force a firm to merge.
• Costs are too high.
• Ethical considerations: e.g. all poor households should
receive a given income subsidy.
• Estimates would only be available after many years: e.g.
effect of healthy diet on longevity.
Threats to Internal Validity of Experiments
A) Threats to internal validity (ability to estimate causal effects
within the study population)
1. Failure to randomize (or imperfect randomization). Randomization changes the nature of the program (e.g. greater recruitment needs may lead to a change in acceptance standards); ethical problems and political opposition to randomization (e.g. the poorest should get the program first).
2. Failure to follow treatment protocol (“partial compliance”):
• Some controls get treatment, some treated drop out of the program.
• Differential attrition (e.g. in job training program, controls who find
jobs move out of town).
3. Experimental effects:
• Experimenter bias: treatment is associated with “extra effort”.
• The experiment is perceived differently by the researcher and by the
subjects. Subject behavior is affected by taking part in an experiment
(Hawthorne effect).
4. Validity of the instruments (in quasi-experiments).
Threats to internal validity imply that Cov(D,u)≠0, so the differences estimator is biased.
Threats to External Validity of Experiments
B) Threats to external validity (ability to estimate causal effects
that are valid for other populations and settings)
• Non-representative sample.
• Non-representative “treatment” (that is program or policy).
• General equilibrium effects (effect of a program can depend on its
scale), and peer effects.
• Experiments involving human subjects typically need informed
consent: no guarantee that inferences for populations that give
consent generalize to populations that do not.
• Which aspects of the treatment are responsible for the effect?
External validity can never be guaranteed, neither in randomized
experiments nor in studies using observational data.
Internal and External Validity
of Randomized Experiments
• The threats to the internal and external validity of an experiment are
different from the threats to the internal and external validity of an
OLS regression using observational data (OVB, sample selection bias, reverse causality, wrong functional form, and measurement error).
• The threats for experiments refer to estimating the causal effects
with an experiment:
– Threats to internal validity (ability to estimate causal effects within
the study population)
– Threats to external validity (ability to estimate causal effects that
are valid for other populations and settings)
Still Lots of Benefits
of Experiments
• Combine several experiments to estimate heterogeneous
treatment effects:
– Run several experiments in settings that differ in population, nature of treatments, treatment rate, etc. to assess the credibility of generalizing the results to other settings. Ex.: Meager (2016) considers data from 7 randomized experiments on microfinance programs and finds consistency across studies.
• They reveal facts that are sometimes in contrast with
simple mean comparisons – two famous examples:
1. Effectiveness of job training programs on earnings (positive only
when assessed with randomized experiments -> why???)
2. Effectiveness of reducing class size on students’ test scores
(positive only when assessed with randomized experiments,
famous Tennessee Star Program -> why???)
Experiments
versus Observational Studies
• Internal validity is more problematic in observational studies, external
validity is more problematic in experiments:
– Observational studies with large samples are more representative of the overall population but run the risk of omitted variable bias.
– Experiments with small samples have little external validity due to
differences between the sample and the target population.
• Combine experiments and observational studies: ex. search for
treatments with large effects that should be detected even in
observational studies, and use experiments to study the effects of
specific treatments.
• To estimate a causal effect it is necessary to have a theory to establish:
– Which variables in addition to the treatment affect Y.
– How to control for these variables.
Machine Learning
and Randomized Experiments
• Ludwig, Mullainathan and Spiess (2019):
RCTs are costly in terms of both time and dollars: the Negative Income Tax experiments cost $60 million; Congress set aside $100 million for the Moving to Opportunity experiment, which has taken 25 years.
Pre-analysis plans have been used to specify, in advance, the size of the control groups, the variables to control for, subgroups, and functional forms. However, the arbitrariness of these choices can undermine the credibility of the results.
Idea: use an ML index that predicts the outcome of interest from the full vector of all controls in order to: assess balance and test for differences between the treatment and control distributions at baseline; test whether treatment effects are heterogeneous; and test whether all heterogeneity is captured by the included controls (a minimal sketch of the balance check follows).
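A minimal sketch of the balance-check part of this idea on simulated data; the random-forest predictor, the in-sample fit, and the two-sample test below are simplifications of the paper’s procedure:

    import numpy as np
    from scipy.stats import ks_2samp
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(3)
    n, k = 2_000, 20
    X = rng.normal(size=(n, k))                      # full vector of controls
    D = rng.integers(0, 2, size=n)                   # random assignment
    y = X @ rng.normal(size=k) + rng.normal(size=n)  # outcome of interest

    # ML index: predicted outcome from all controls
    index = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y).predict(X)

    # balance at baseline: the index should be distributed alike in both arms
    print(ks_2samp(index[D == 1], index[D == 0]).pvalue)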
Machine Learning and Control Groups:
(Varian 2016, PNAS)
• Varian, H. (2016) “Causal Inference in Economics and Marketing”, PNAS, Vol. 113, No. 27, pp. 7310-7315.
Introduction to causal inference in Economics written for readers
familiar with machine learning methods.
Discussion of how machine learning techniques can be
useful for developing better estimates of the counterfactuals.
Machine Learning and Control Groups:
(Varian 2016, PNAS)
Two main types of questions:
1. Quantify how a given treatment affects the subjects:
– Examples: effect of a drug on health outcomes; effect of class size
on students’ learning; effect of an ad campaign on consumers’
spending.
This is the classic treatment-control group comparison; machine learning can help by building counterfactuals through prediction.
2. Quantify how a given treatment affects the “experimenter”:
– Example: if I increase ad expenditure by x%, how many extra sales
will I get?
– The answer depends on how consumers respond to the ad, but we
do not need to model how they respond.
• Example: we care about an increase in the number of visits to our website rather than how this happened (more clicks on a given ad, more search queries, etc.)
Machine learning is essential to build a predictive model for the
counterfactual.
Machine Learning and Control Groups:
Example of Type 2 Question
(Varian 2016, PNAS)
• Example of an ad campaign.
– Research question: if I increase ad expenditure by x%, how many
extra sales will I get?
• The advertiser increases ad spend for a given period of time and would like to compare the amount of sales after the increase with what would have happened to sales without the increase in ad expenditure.
– NOTE: this differs from “pure prediction problems” where causal
inference is not necessary.
• How to compute the counterfactual?
– With a predictive model using data from before treatment.
Machine Learning and Control Groups:
Example of Type 2 Question
(Varian 2016, PNAS)
• For type 1 questions (effect of treatment on subjects):
– Treated and untreated (control) groups.
– Comparison of outcomes between treated and control groups.
• For type 2 questions (effect of treatment on experimenter):
– All subjects are treated for a given period. One unit of analysis over time.
– 4 step process (TTTC) to build and use a predictive model:
1. TRAIN: machine learning tools to tune model’s parameters.
2. TEST: apply the model to a test set to check how well it performs.
3. TREAT: apply the model during the treatment period to predict the
counterfactual.
4. COMPARE: compare what actually happened to the treated to the
prediction (given by the model) of what would have happened without the
treatment.
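A minimal sketch of the four TTTC steps on simulated daily sales data; the seasonal features, the linear model, and the size of the lift are all illustrative:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(4)
    day = np.arange(200)
    sales = 100 + 0.3 * day + 10 * np.sin(2 * np.pi * day / 7) + rng.normal(0, 3, 200)
    sales[160:] += 15.0  # simulated lift from the ad campaign (days 160-199)

    features = np.column_stack([day,
                                np.sin(2 * np.pi * day / 7),
                                np.cos(2 * np.pi * day / 7)])
    pre, test, treat = slice(0, 140), slice(140, 160), slice(160, 200)

    model = LinearRegression().fit(features[pre], sales[pre])              # 1. TRAIN
    test_mae = np.abs(model.predict(features[test]) - sales[test]).mean()  # 2. TEST
    counterfactual = model.predict(features[treat])                        # 3. TREAT
    effect = (sales[treat] - counterfactual).mean()                        # 4. COMPARE
    print(test_mae, effect)  # effect should be close to the simulated lift of 15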
Machine Learning and Control Groups:
Example of Type 2 Question
(Varian 2016, PNAS)
• The TTTC process is a generalization of the classic treatment-control approach to experiments.
• Key difference:
– Classic approach requires a control group, which provides an
estimate of the counterfactual.
– TTTC allows constructing a predictive model of the counterfactual
even if we do NOT have a true control group. One unit of analysis
over time.
• NOTE: TTTC estimates only the TTE (average effect of
treatment on the treated).
Different Approaches
of Program Evaluation
1. Run an experiment and use simple differences
estimator.
2. Use observational data to construct the counterfactual
a. Selection on observables:
Unconfoundedness assumption: we assume to observe all X variables
that affect both participation decision and outcome.
• Differences-in-Differences (DID)
• Matching
• Regression discontinuity
b. Selection on unobservables
We assume participation depends on unobserved variables.
• Instrumental variable estimation
• Control function approach
Differences Estimator
• Differences estimator is the simple difference in mean
outcomes (Y) between treatment and control.
• Problem 1: time-constant unobserved differences between
treated and untreated that are correlated with outcomes.
Ex: effect of job training program on earnings. Those
who participate in the program are more motivated to
work, so would earn more even without the training
program, thus the effect of the program is overestimated.
• Solution to problem 1: compare outcome of participants
before and after “treatment” using panel data.
• Problem 2: time-trends (e.g. business cycles).
Ex: if recession after treatment, underestimation of treatment effect.
Differences-in-Differences Estimator
• Differences-in-Differences estimator (DID): differences
out time-constant differences between treatment and
control and time-trends by comparing treated and
untreated before and after the program.
• Data requirement: to implement DID it is necessary to
have panel data where each unit of analysis (individual,
firm, state) is observed for at least two consecutive
periods.
Differences-in-Differences Estimator (from Stock and Watson)
The DID estimator adjusts for pre-experimental differences by subtracting off each subject’s pre-experimental value of Y:
– Y(i, before) = value of Y for subject i before the experiment
– Y(i, after) = value of Y for subject i after the experiment
– ΔY(i) = Y(i, after) – Y(i, before) = change over the course of the experiment
The DID estimator differences out:
– time-constant (level) differences, by computing ΔY(i) for each subject,
– and time trends, by comparing treated and untreated before and after the program:
DID estimate of b = (Ybar(treat, after) – Ybar(treat, before)) – (Ybar(control, after) – Ybar(control, before))
Differences-in-Differences Estimator
Main assumption of DID:
Counterfactual LEVELS for treated and non-treated can be different,
but have the same TIME VARIATION – COMMON TREND Assumption:
E(Y0(t1)-Y0(t0)|D=1)= E(Y0(t1)-Y0(t0)|D=0)
In the absence of the treatment, the change in treated outcome would
have been the same as the change in non-treated outcome
i.e. changes in the economy or life-cycle that are unrelated to the
treatment affect the two groups in a similar way
What is NOT allowed are unobserved time-varying effects that affect the
treatment and the control group differently.
Differences-in-Differences Estimator
and Machine Learning
• We can use machine learning to construct the
counterfactuals.
• DID can be combined with TTTC:
– TTTC builds a predictive model for the outcome in the absence of the treatment: it predicts the missing potential outcome under no treatment.
– When estimating DID, we can build a predictive model for the
group that did not receive the treatment using the same 4 step
process that we discussed for TTTC:
1. TRAIN: machine learning tools to tune parameters.
2. TEST: apply the model to a test set to check how well it performs.
3. TREAT: apply the model to the treated units to predict the counterfactual.
4. COMPARE: compare what actually happened to the treated to the prediction given
by the model of what would have happened without the treatment.
Differences-in-Differences Estimator:
Summing Up
DID differences out time-constant differences between treatment and
control and time-trends by comparing treated and untreated before and
after the treatment.
Validity assumption: COMMON TREND – absent the treatment, the
change in treated outcome would have been the same as the change in
non-treated outcome.
What is NOT allowed are unobserved time-varying effects that affect
treatment and control differently.
DID identifies the TTE; however, if the assignment to the treatment is
random, we can also estimate ATE.
Of course, as for the differences estimator, we can control for additional
covariates with the same advantages that we discussed.
Regression Analysis of Experiments
for DID Estimator
• Data requirement: to implement DID it is necessary to
have panel data where each unit of analysis (individual,
firm, state) is observed for at least two consecutive
periods.
Brief Review of Panel Data
• Panel data with k regressors
{X1(it),…, Xk(it), Y(it)},
i=1,…, n (number of entities),
t=1,…, T (number of time periods)
• Another term for panel data is longitudinal data.
• Balanced panel: no missing observations.
• Unbalanced panel: some entities (unit of analysis) are not
observed for some time periods.
Why are Panel Data Useful?
Two main cases:
1. Control for entity fixed effects: effects that vary across
entities (unit of analysis), but do not vary over time.
2. Control for time fixed effects: effects that vary over
time, but do not vary across units of analysis.
Why are Panel Data Useful?
Entity Fixed Effects
With panel data we can control for factors that:
• Vary across entities (unit of analysis), but do not vary
over time.
• Are unobserved or unmeasured – and therefore cannot
be included in the regression.
• Could cause omitted variable bias if they were omitted.
• Example: Can alcohol taxes reduce traffic deaths?
(Chapter 10 in Stock and Watson)
Panel Data and Omitted Variable Bias
• Why are there more traffic deaths in States that have higher alcohol taxes?
• Other factors determine the traffic fatality rate, such as:
– Density of cars on the road
– “Culture” around drinking and driving
• These omitted factors could cause omitted variable bias.
• Example: traffic density
1. High traffic density means more traffic deaths
2. States with higher traffic density have higher alcohol taxes
Two conditions for omitted variable bias are satisfied. Specifically,
“high taxes” could reflect “high traffic density”, so the OLS
coefficient would be biased upwards.
Panel data allow eliminating omitted variable bias when the omitted
variables are constant over time within a given unit of analysis
(States in the example here).
Consider the panel data model
FatalityRate(it)=a+b*BeerTax(it)+c*Z(i)+u(it),
where Z(i) is a factor that does not change over time (e.g. traffic density), at least during the years for which we have data. Suppose Z(i) is not observed, so its omission could result in omitted variable bias.
The effect of Z(i) can be eliminated using T=2 years.
• Key idea: Any change in the fatality rate from 1982 to 1988 cannot
be caused by Z(i), because Z(i) (by assumption) does not change
between 1982 and 1988.
• Estimate the difference in fatality rate as a function of the difference
in beer tax using OLS.
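A minimal sketch of the T=2 “changes” regression; the data layout and column names (state, year, fatality_rate, beer_tax) are hypothetical:

    import pandas as pd
    import statsmodels.formula.api as smf

    # assumed long-format panel: one row per (state, year), years 1982 and 1988
    def changes_estimator(panel: pd.DataFrame):
        wide = panel.pivot(index="state", columns="year",
                           values=["fatality_rate", "beer_tax"])
        diffs = pd.DataFrame({
            "d_fatal": wide["fatality_rate"][1988] - wide["fatality_rate"][1982],
            "d_tax": wide["beer_tax"][1988] - wide["beer_tax"][1982],
        })
        # differencing removes the time-constant Z(i); OLS on changes estimates b
        return smf.ols("d_fatal ~ d_tax", data=diffs).fit()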
Panel Data with T>2
• What if you have more than 2 time periods (T>2)?
• For i=1,…,n and t=1,…, T
Y(it)=a+b*X(it)+c*Z(i)+u(it)
we can rewrite this in two useful ways:
1. “Fixed Effects” regression model
Y(it)=a(i)+b*X(it)+u(it)
intercept a(i) is unique for each State, slope b is the same in all
States
2. “n-1 binary regressors” regression model
Y(it)=a+b*X(it)+c2*D2(i)+c3*D3(i)+…+cn*Dn(i)+u(it)
where D2(i)=1{i=2}, i.e. D2(i) equals 1 if observation i is from State 2 (and 0 otherwise), and similarly for D3(i),…,Dn(i)
Panel Data with T≥2
Three estimation methods:
1. “n-1” binary regressors” OLS regression
2. “Entity-demeaned” OLS regression
3. “Changes” specification (if and only if T=2)
These three methods produce identical estimates of the regression
coefficients and identical standard errors.
Method 1. Y(it)=a+b*X(it)+c2*D2(i)+c3*D3(i)+…+cn*Dn(i)+u(it)
- First create the binary variables, D2(i),…, Dn(i)
- Then estimate above equation by OLS
- Inference (hypothesis tests, confidence intervals) is as usual
(using heteroscedasticity-robust standard errors)
- Impractical when n is very large
Method 2. Y(it)= a(i) +b*X(it)+u(it)
- First construct the demeaned variables
- Then estimate above equation by OLS
- Inference (hypothesis tests, confidence intervals) is as usual
(using heteroscedasticity-robust standard errors)
- This is like the “changes” approach (method 3), but Y(it) is deviated from the state average instead of from Y(i1)
Estimation can be done easily in Stata:
• “areg” automatically demeans the data (useful when n large)
• The estimated intercept is the average of the n-1 dummy variables
(no clear interpretation)
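A minimal sketch on simulated data showing that Methods 1 and 2 give the same slope; standard errors coincide once the demeaned regression’s degrees of freedom are corrected, which canned routines such as areg do automatically:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(5)
    n, T = 50, 5
    entity = np.repeat(np.arange(n), T)
    alpha = rng.normal(size=n)                  # entity fixed effects
    x = alpha[entity] + rng.normal(size=n * T)  # X correlated with the fixed effect
    y = 2.0 * x + alpha[entity] + rng.normal(size=n * T)
    df = pd.DataFrame({"y": y, "x": x, "entity": entity})

    # Method 1: n-1 entity dummies
    m1 = smf.ols("y ~ x + C(entity)", data=df).fit()

    # Method 2: entity-demeaned OLS
    dm = df.copy()
    dm[["y", "x"]] = dm.groupby("entity")[["y", "x"]].transform(lambda s: s - s.mean())
    m2 = smf.ols("y ~ x - 1", data=dm).fit()

    print(m1.params["x"], m2.params["x"])  # identical slope estimates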
Why are Panel Data Useful?
Time Fixed Effects
An omitted variable might vary over time but not across units (ex. States):
• Safer cars (air bags, etc); changes in national laws
• These produce intercepts that change over time
• Let these changes (“safer cars”) be denoted by the variable S(t),
which changes over time but not across States
• The resulting population regression model is:
Y(it)=a+b*X(it)+c*S(t)+u(it)
The intercept varies from one year to the next, m(1982)=a+c*S(1982)
Again, two formulations for time fixed effects:
1. “Binary regressor” formulation: “T-1 binary regressors” OLS
regression
2. “Time effects” formulation: “Year demeaned” OLS regression (deviate
Y(it) and X(it) from year averages), then estimate by OLS
Time and Entity Fixed Effects
or “Back to DID”
Y(it)=a(t)+b*T(it)+m(i)+u(it),
where T(it)=1 if in treatment group and after treatment, 0 otherwise
or
Y(it)=a+ b*D(it)*Z(it) +c*Z(it)+d*D(it)+u(it),
where D(it)=1 if in treatment group, 0 otherwise
Z(it)=1 if in “after” period, 0 in “before” period
D(it)*Z(it)=1 if in treatment group in the “after” period (interaction effect)
b is the DID estimator
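A minimal sketch of the interaction regression on simulated data; the group sizes and coefficient values are illustrative:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(6)
    n = 2_000
    D = rng.integers(0, 2, size=n)  # treatment-group indicator
    Z = rng.integers(0, 2, size=n)  # "after"-period indicator
    # group-level difference (3), common trend (1), true DID effect b = 2
    Y = 1.0 + 3.0 * D + 1.0 * Z + 2.0 * D * Z + rng.normal(size=n)
    df = pd.DataFrame({"Y": Y, "D": D, "Z": Z})

    did = smf.ols("Y ~ D * Z", data=df).fit()  # D * Z expands to D + Z + D:Z
    print(did.params["D:Z"])                   # DID estimate of b, close to 2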
Time and Entity Fixed Effects:
Estimation
Different equivalent ways to allow for both entity and time
fixed effects:
• Differences and intercept (T=2 only)
• Entity (or time) demeaning and T-1 time (or n-1 entity) indicators
• T-1 time indicators and n-1 entity indicators
• Entity and time demeaning
• Under the fixed effects regression assumptions, which are
basically extensions of the least squares assumptions, the
OLS fixed effects estimator of b is normally distributed.
• BUT there are difficulties associated with computing
standard errors that do not come up with cross-sectional
data.
• In Appendix 1 and 2:
1. Fixed effects regression assumptions.
2. Standard errors for fixed effects regression.
3. Proof of consistency and normality of fixed effects
estimator.
Additions to DID
• Possible to use repeated cross-sections instead of panel
data under certain conditions, e.g. stable group composition
over time (see Meyer 1995, and Abadie 2005).
• Caveats and extensions:
– Endogenous treatment (Besley/Case 2000): DID assumptions exclude the possibility that a State increases the alcohol tax because of a high rate of traffic fatalities in the past.
– Parallel trends conditional on X: trends can be different in treated and control groups if the distribution of X is different (Abadie 2005: “Semi-parametric DID”), a mix of “diffs-in-diffs” and “matching” methods.
– Bertrand et al. (2004) propose a solution for the case in which residual autocorrelation over time is not accounted for, so that the variance may be underestimated (use a heteroscedasticity- and autocorrelation-consistent asymptotic variance).
Fixed-Effects Regression Assumptions
Under the fixed-effects regression assumptions (stated in Appendix 1), the FE estimator is consistent and normally distributed.
Variance-Covariance Matrix
• In general, we would like to allow the error terms to be correlated
over time for a given entity, and this makes the formula for the
asymptotic variance complicated.
• You can also allow for heteroskedasticity. Then you can compute
the “heteroskedasticity- and autocorrelation-consistent asymptotic
variance”.
• You can also compute “clustered standard errors” because there is
a grouping, or “cluster”, within which the error term is possibly
correlated, but outside of which (across groups) it is not. For
example, you can allow for correlation of the errors for individuals
within the same family but not between families.
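A minimal sketch of clustered standard errors in Python’s statsmodels, with entities as the clusters; the persistence parameters are illustrative:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(7)
    n, T = 200, 5

    def ar1(rho):
        # draws serially correlated within each entity over its T periods
        e = rng.normal(size=(n, T))
        for t in range(1, T):
            e[:, t] = rho * e[:, t - 1] + e[:, t]
        return e.ravel()

    entity = np.repeat(np.arange(n), T)
    x = ar1(0.9)            # regressor persistent within entity
    y = 2.0 * x + ar1(0.9)  # errors autocorrelated within entity
    df = pd.DataFrame({"y": y, "x": x, "entity": entity})

    model = smf.ols("y ~ x + C(entity)", data=df)
    usual = model.fit()
    clust = model.fit(cov_type="cluster", cov_kwds={"groups": df["entity"]})
    print(usual.bse["x"], clust.bse["x"])  # clustered SE allows within-entity correlation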