potential outcomes framework

Applied Econometrics for Economic Research CIES/INEI


The Potential Outcomes Framework or the Neyman-Rubin-Holland Model

Stanislao Maldonado [1], University of California, Berkeley

January, 2010

1. Notation

These notes are based on Holland (1986), Angrist and Pischke (2009) and Morgan and Winship (2007). The model was originally proposed by Neyman (1923) and further developed by Rubin (1974). We introduce the basic terminology here:

• $i$ is an index for individuals in a population.

• $D_i$ is the treatment, i.e. the potential cause whose effect we want to estimate.

o $D_i = 1$ if individual $i$ has been exposed to treatment.

o $D_i = 0$ if individual $i$ has not been exposed to treatment.

• $Y_i(D_i)$ is the outcome, i.e. the effect we want to attribute to the treatment.

o $Y_i(1)$ is the outcome in case of treatment.

o $Y_i(0)$ is the outcome in case of no treatment.

Note that the outcome for each individual can be written as follows:

$$Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0) \tag{1}$$

Or simply:

$$
Y_i =
\begin{cases}
Y_i(1) & \text{if } D_i = 1 \\
Y_i(0) & \text{if } D_i = 0
\end{cases}
$$
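As a minimal sketch of equation (1), the following Python snippet applies the switching rule to a small population. The `Unit` class, the function name and all numbers are hypothetical and serve only as illustration; in real data only one of the two potential outcomes would ever be recorded for each unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One individual i, with both potential outcomes written down."""
    y0: int  # Y_i(0): outcome without treatment
    y1: int  # Y_i(1): outcome with treatment
    d: int   # D_i: 1 if treated, 0 otherwise


def observed_outcome(u: Unit) -> int:
    """Equation (1): Y_i = D_i * Y_i(1) + (1 - D_i) * Y_i(0)."""
    return u.d * u.y1 + (1 - u.d) * u.y0


# Hypothetical values chosen only for illustration.
population = [Unit(10, 12, 1), Unit(8, 9, 0), Unit(6, 10, 1), Unit(7, 7, 0)]
observed = [observed_outcome(u) for u in population]
print(observed)  # [12, 8, 10, 7]
```

For treated units the observed value is $Y_i(1)$ and $Y_i(0)$ is never seen; for untreated units the reverse holds.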

2. The fundamental problem of causal inference

Definition 1: Causal Effect. For every individual $i$, the causal effect of $D_i = 1$ is $\Delta_i = Y_i(1) - Y_i(0)$.

Problem: we don’t observe the same unit in both treatment states.

Proposition 1: Fundamental problem of causal inference (Holland 1986). It is not possible to observe, for the same individual $i$, the values $D_i = 1$ and $D_i = 0$ as well as the values $Y_i(1)$ and $Y_i(0)$. Therefore, it is not possible to estimate the effect of $D_i$ on $Y_i$ for each individual $i$.

[1] Ph.D. student, Department of Agricultural and Resource Economics. E-mail: [email protected]



Table 1. The Fundamental Problem of Causal Inference

Group              Y(1)               Y(0)
-----------------  -----------------  -----------------
Treatment (D=1)    Observable as Y    Counterfactual
Control (D=0)      Counterfactual     Observable as Y

We are required to think in terms of “counterfactuals”, i.e. what would have happened to a treated individual had he or she not received the treatment, and vice versa.

3. Solutions to the fundamental problem of causal inference

Holland (1986) suggests two types of solutions: a) the scientific solution and b) the statistical solution.

The statistical solution is based on estimating the average effect of the treatment instead of doing so at an individual level.

The first such average effect is known as the average treatment effect (ATE):

$$
\begin{aligned}
ATE = E[\Delta_i] &= E[Y_i(1) - Y_i(0)] \\
&= E[Y_i(1)] - E[Y_i(0)]
\end{aligned}
\tag{2}
$$

This average effect is still not estimable without further assumptions on the relationship between the potential outcomes $Y_i(1)$ and $Y_i(0)$ and the treatment $D_i$.

Notice that we can have a conditional version of this parameter:

$$
\begin{aligned}
ATE(X) = E[\Delta_i \mid X] &= E[Y_i(1) - Y_i(0) \mid X] \\
&= E[Y_i(1) \mid X] - E[Y_i(0) \mid X]
\end{aligned}
\tag{2'}
$$

More interesting for economists is the average treatment effect on the treated (ATT):

$$
\begin{aligned}
ATT = E[\Delta_i \mid D_i = 1] &= E[Y_i(1) - Y_i(0) \mid D_i = 1] \\
&= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 1]
\end{aligned}
\tag{3}
$$

As in the previous case, we cannot estimate this parameter without further assumptions.

The conditional version:



$$
\begin{aligned}
ATT(X) = E[\Delta_i \mid X, D_i = 1] &= E[Y_i(1) - Y_i(0) \mid X, D_i = 1] \\
&= E[Y_i(1) \mid X, D_i = 1] - E[Y_i(0) \mid X, D_i = 1]
\end{aligned}
\tag{3'}
$$

We can also define a parameter called the average treatment effect for the untreated (ATU):

$$
\begin{aligned}
ATU = E[\Delta_i \mid D_i = 0] &= E[Y_i(1) - Y_i(0) \mid D_i = 0] \\
&= E[Y_i(1) \mid D_i = 0] - E[Y_i(0) \mid D_i = 0]
\end{aligned}
\tag{4}
$$

The conditional version:

$$
\begin{aligned}
ATU(X) = E[\Delta_i \mid X, D_i = 0] &= E[Y_i(1) - Y_i(0) \mid X, D_i = 0] \\
&= E[Y_i(1) \mid X, D_i = 0] - E[Y_i(0) \mid X, D_i = 0]
\end{aligned}
\tag{4'}
$$

Other parameters of interest in the literature:

• Local average treatment effect (LATE)

• Marginal treatment effect (MTE)

As mentioned before, we can extend these parameters by conditioning on a set of covariates X .
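To illustrate how ATE, ATT and ATU can differ, the sketch below computes all three on a toy population. The data are hypothetical, and both potential outcomes are listed for every unit, which is exactly what the fundamental problem of causal inference rules out in practice. The helper `mean` and the variable names are assumptions of this example.

```python
from fractions import Fraction

# Hypothetical toy population: (Y_i(0), Y_i(1), D_i).
population = [(10, 14, 1), (8, 11, 1), (6, 7, 0), (5, 6, 0), (7, 8, 0)]


def mean(values):
    """Exact arithmetic mean of a non-empty list."""
    return sum(values, Fraction(0)) / len(values)


# Individual causal effects Delta_i = Y_i(1) - Y_i(0), paired with D_i.
effects = [(y1 - y0, d) for y0, y1, d in population]

ate = mean([delta for delta, _ in effects])            # E[Delta_i]
att = mean([delta for delta, d in effects if d == 1])  # E[Delta_i | D_i = 1]
atu = mean([delta for delta, d in effects if d == 0])  # E[Delta_i | D_i = 0]
share_treated = mean([d for _, d in effects])          # P(D_i = 1)

print(ate, att, atu)  # 2 7/2 1
```

In this example the treated units have larger effects than the untreated ones, so ATT > ATE > ATU. By the law of total expectation, ATE is the average of ATT and ATU weighted by the shares of treated and untreated units.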

4. The selection problem

A simple way to estimate ATT is by using the mean difference in outcomes (MDO) or naïve estimator:

$$
\begin{aligned}
MDO &= E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] \\
&= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]
\end{aligned}
\tag{5}
$$

In general, this provides a biased estimate of ATT:

$$
\begin{aligned}
MDO &= E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] \\
&= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] \\
&= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 1] + E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] \\
&= ATT + \underbrace{\bigl\{ E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] \bigr\}}_{\text{selection bias}}
\end{aligned}
\tag{6}
$$

ATT can be consistently estimated using the naïve estimator when there is no selection bias.
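The decomposition of the naïve estimator into ATT plus selection bias can be checked numerically. The sketch below uses a hypothetical toy population in which units with a high $Y_i(0)$ select into treatment; the numbers and names are assumptions made for illustration only.

```python
from fractions import Fraction

# Hypothetical toy population: (Y_i(0), Y_i(1), D_i).
# Treated units have higher Y_i(0), so selection bias is positive.
population = [(10, 14, 1), (8, 11, 1), (6, 7, 0), (5, 6, 0), (7, 8, 0)]


def mean(values):
    """Exact arithmetic mean of a non-empty list."""
    return sum(values, Fraction(0)) / len(values)


treated = [(y0, y1) for y0, y1, d in population if d == 1]
control = [(y0, y1) for y0, y1, d in population if d == 0]

# Mean difference in observed outcomes: E[Y|D=1] - E[Y|D=0].
mdo = mean([y1 for _, y1 in treated]) - mean([y0 for y0, _ in control])

# ATT = E[Y(1) - Y(0) | D=1]; needs the unobservable Y(0) of the treated.
att = mean([y1 - y0 for y0, y1 in treated])

# Selection bias = E[Y(0)|D=1] - E[Y(0)|D=0].
selection_bias = mean([y0 for y0, _ in treated]) - mean([y0 for y0, _ in control])

print(mdo, att, selection_bias)  # 13/2 7/2 3
```

Here the naïve estimator (13/2) overstates ATT (7/2) by exactly the selection bias (3).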

Where does selection bias come from?

• Open bias:



$$E[Y_i(1) \mid D_i = 1] \neq E[Y_i(1)] \;\wedge\; E[Y_i(1) \mid X, D_i = 1] = E[Y_i(1) \mid X] \tag{7}$$

The selection process is based on observables.

• Hidden bias:

$$E[Y_i(1) \mid X, D_i = 1] \neq E[Y_i(1) \mid X] \;\wedge\; E[Y_i(1) \mid X, \varepsilon_i, D_i = 1] = E[Y_i(1) \mid X, \varepsilon_i] \tag{8}$$

The selection process is based on unobservables.
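When selection is based on observables only, comparing treated and untreated units within values of $X$ removes the bias. The sketch below illustrates this with a simple stratification on a binary covariate. This is one possible technique, swapped in here for illustration rather than taken from these notes, and the data and names are hypothetical.

```python
from fractions import Fraction

# Hypothetical toy data: (X, Y_i(0), Y_i(1), D_i).
# Within each value of X the potential outcomes do not vary with D_i,
# so selection operates only through the observable X.
population = [
    (0, 4, 5, 1), (0, 4, 5, 0), (0, 4, 5, 0), (0, 4, 5, 0),
    (1, 10, 13, 1), (1, 10, 13, 1), (1, 10, 13, 1), (1, 10, 13, 0),
]


def mean(values):
    """Exact arithmetic mean of a non-empty list."""
    return sum(values, Fraction(0)) / len(values)


def naive(rows):
    """Difference in mean observed outcomes between treated and controls."""
    treated = [y1 for _, _, y1, d in rows if d == 1]   # observed Y = Y(1)
    control = [y0 for _, y0, _, d in rows if d == 0]   # observed Y = Y(0)
    return mean(treated) - mean(control)


att = mean([y1 - y0 for _, y0, y1, d in population if d == 1])
naive_overall = naive(population)

# Compare treated and controls within each stratum of X, then average the
# stratum differences over the distribution of X among the treated.
treated_x = [x for x, _, _, d in population if d == 1]
stratified = sum(
    Fraction(treated_x.count(x), len(treated_x))
    * naive([row for row in population if row[0] == x])
    for x in sorted(set(treated_x))
)

print(naive_overall, stratified, att)  # 11/2 5/2 5/2
```

The overall comparison is biased because treated units are concentrated in the high-outcome stratum, while the stratified comparison recovers ATT. With hidden bias, no such adjustment on $X$ alone would suffice.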

5. Some conceptual issues

Some issues to have in mind when working with this model:

5.1. The stable unit treatment value assumption (SUTVA)

This assumption requires that the potential outcomes of each individual be unaffected by potential changes in the treatment exposure of other individuals (Morgan and Winship 2007, section 2.4).

One way to understand SUTVA: no general equilibrium effects due to the treatment.

5.2. “No causation without manipulation”

Critical issue: understanding causality in this framework depends on the ability to define the potential outcomes correctly.

Poorly defined treatments are those that could not be manipulated, even in principle.

Example:

• She scored highly on the exam because she is female.

• She scored highly on the exam because she studied.

• She scored highly on the exam because her teacher tutored her.

In which cases are the potential outcomes correctly defined?

6. The Experimental Ideal

Key idea of this course: how to bring our research strategy as close as possible to an experiment in which the treatment is randomly assigned.

Angrist and Pischke (2009): random assignment is the most credible and influential research design because it solves the “selection problem”.

Recall from Table 1,



$$
\begin{aligned}
E[Y_i \mid D_i = 1] &= E[Y_i(1) \mid D_i = 1] \\
E[Y_i \mid D_i = 0] &= E[Y_i(0) \mid D_i = 0]
\end{aligned}
$$

The key question is whether:

$$E[Y_i(0) \mid D_i = 0] = E[Y_i(0) \mid D_i = 1] \tag{9}$$

And also:

$$E[Y_i(1) \mid D_i = 1] = E[Y_i(1) \mid D_i = 0] \tag{10}$$

Comments:

• Generally, neither of these conditions holds with observational data, because of selection.

• There is an important case in which these conditions are met. That is the case of a randomized experiment.

• In an experimental design, the treatment $D_i$ is randomly assigned. Because of that, the treatment $D_i$ is independent of (or orthogonal to) the potential outcomes $Y_i(1)$ and $Y_i(0)$. Therefore,

$$E[Y_i(0) \mid D_i = 0] = E[Y_i(0) \mid D_i = 1] = E[Y_i(0)] \tag{11}$$

$$E[Y_i(1) \mid D_i = 1] = E[Y_i(1) \mid D_i = 0] = E[Y_i(1)] \tag{12}$$

Then, we can compute ATE by simply computing;

$$
\begin{aligned}
ATE = E[\Delta_i] &= E[Y_i(1) - Y_i(0)] = E[Y_i(1)] - E[Y_i(0)] \\
&= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] = E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]
\end{aligned}
\tag{13}
$$
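The role of random assignment can be illustrated by brute force. The sketch below assumes a design not spelled out in these notes: complete randomization in which exactly three of six units are treated and every such assignment is equally likely. It enumerates all assignments and averages the difference in means across them. The potential outcomes are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical potential outcomes (Y_i(0), Y_i(1)) for six units.
units = [(10, 14), (8, 11), (6, 7), (5, 6), (7, 8), (9, 9)]
n_treated = 3  # assumed design: exactly three units receive treatment


def mean(values):
    """Exact arithmetic mean of a non-empty list."""
    return sum(values, Fraction(0)) / len(values)


ate = mean([y1 - y0 for y0, y1 in units])


def naive(treated_ids):
    """Difference in mean observed outcomes for one assignment."""
    treated = [y1 for i, (_, y1) in enumerate(units) if i in treated_ids]
    control = [y0 for i, (y0, _) in enumerate(units) if i not in treated_ids]
    return mean(treated) - mean(control)


# All C(6, 3) = 20 assignments are equally likely under this design,
# so the expectation of the estimator is a simple average over them.
assignments = list(combinations(range(len(units)), n_treated))
estimates = [naive(set(a)) for a in assignments]
expected_naive = mean(estimates)

print(ate, expected_naive)  # 5/3 5/3
```

Any single assignment can give an estimate above or below ATE, but on average over the randomization the difference in means equals ATE.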

7. Naïve estimation of treatment effects with observational data

Without experimental data, we need to rely on assumptions. In particular, we need to assume or argue that our treatment is “as good as randomly assigned”. We write this condition as follows:

$$\{Y_i(1), Y_i(0)\} \perp D_i \tag{14}$$

As we will see later, one way to do that is by arguing that the treatment is ignorable after conditioning on a set of covariates. This is known as selection on observables.



The critical assumption can be written as follows:

$$\{Y_i(1), Y_i(0)\} \perp D_i \mid X \tag{15}$$

Correspondingly, there are also techniques based on the selection on unobservables, particularly the instrumental variables approach. The assumption is written as:

$$\{Y_i(1), Y_i(0)\} \perp D_i \mid X, \varepsilon_i \tag{16}$$

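Let’s consider the estimation of treatment effects with a random sample from a population. We can rewrite the naïve estimator of the treatment effect in terms of sample means, where $E_N[\cdot]$ denotes an average over the $N$ sampled observations and lowercase $y_i$, $d_i$ denote the sampled values:

$$\Delta_{NAIVE} = E_N[y_i \mid d_i = 1] - E_N[y_i \mid d_i = 0] \tag{17}$$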

Assume that an autonomous fixed treatment selection regime prevails and π is the proportion of the population of interest that takes the treatment.

In observational studies, there is no guarantee that the naïve estimator is going to converge to any of the parameters defined earlier.

For instance, we can decompose ATE in the following way:

$$
\begin{aligned}
E[\Delta_i] ={}& \bigl\{ \pi E[Y_i(1) \mid D_i = 1] + (1 - \pi) E[Y_i(1) \mid D_i = 0] \bigr\} \\
& - \bigl\{ \pi E[Y_i(0) \mid D_i = 1] + (1 - \pi) E[Y_i(0) \mid D_i = 0] \bigr\}
\end{aligned}
\tag{18}
$$

Comments:

• ATE is a function of five unknowns: the proportion of the population self-selected into the treatment, and four conditional means of the potential outcomes.

• Without additional assumptions, we can consistently estimate three of these five unknowns from a random sample of the population.

• In particular, we have that the following sample means converge in probability to the true population parameters:

$$
\begin{aligned}
E_N[d_i] &\xrightarrow{p} \pi \\
E_N[y_i \mid d_i = 1] &\xrightarrow{p} E[Y_i(1) \mid D_i = 1] \\
E_N[y_i \mid d_i = 0] &\xrightarrow{p} E[Y_i(0) \mid D_i = 0]
\end{aligned}
$$

• Without imposing additional assumptions there is no way to compute the remaining unknowns.
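To make the counting of unknowns concrete, the sketch below separates the three quantities that observed data deliver from the two counterfactual means that they do not. The toy population and all names are hypothetical; the counterfactual means can be computed here only because the example writes down both potential outcomes for every unit.

```python
from fractions import Fraction

# Hypothetical toy population: (Y_i(0), Y_i(1), D_i).
population = [(10, 14, 1), (8, 11, 1), (6, 7, 0), (5, 6, 0), (7, 8, 0)]


def mean(values):
    """Exact arithmetic mean of a non-empty list."""
    return sum(values, Fraction(0)) / len(values)


# What a researcher actually sees: D_i and the observed outcome Y_i.
data = [(d, d * y1 + (1 - d) * y0) for y0, y1, d in population]

# Three quantities that can be computed from observed data alone.
pi_hat = mean([d for d, _ in data])            # share treated
ey1_d1 = mean([y for d, y in data if d == 1])  # E[Y(1) | D = 1]
ey0_d0 = mean([y for d, y in data if d == 0])  # E[Y(0) | D = 0]

# Two counterfactual means that observed data never reveal.
ey1_d0 = mean([y1 for _, y1, d in population if d == 0])  # E[Y(1) | D = 0]
ey0_d1 = mean([y0 for y0, _, d in population if d == 1])  # E[Y(0) | D = 1]

# ATE as a weighted combination of the four conditional means.
ate = (pi_hat * ey1_d1 + (1 - pi_hat) * ey1_d0) - (
    pi_hat * ey0_d1 + (1 - pi_hat) * ey0_d0
)

print(pi_hat, ey1_d1, ey0_d0, ey1_d0, ey0_d1, ate)  # 2/5 25/2 6 7 9 2
```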

Now, let’s discuss the bias of the naïve estimator as an estimator of ATE. After a bit of algebra, it can be shown that:



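$$
\begin{aligned}
E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] ={}& E[\Delta_i] \\
& + \bigl\{ E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] \bigr\} \\
& + (1 - \pi) \bigl\{ E[\Delta_i \mid D_i = 1] - E[\Delta_i \mid D_i = 0] \bigr\}
\end{aligned}
\tag{19}
$$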

This expression suggests that the naïve estimator includes the ATE plus two terms:

• $E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]$, which is known as the “baseline bias”; and

• $(1 - \pi) \bigl\{ E[\Delta_i \mid D_i = 1] - E[\Delta_i \mid D_i = 0] \bigr\}$, which is known as the “differential treatment effect bias”.
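The algebra behind (19) can be sketched in three steps. This is one possible route, using only the decomposition of ATE in (18) and the definition $\Delta_i = Y_i(1) - Y_i(0)$. First add and subtract $E[Y_i(0) \mid D_i = 1]$, then write $E[\Delta_i]$ as a weighted average of the two conditional effects, and finally solve for the effect on the treated:

```latex
\begin{aligned}
E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]
  &= E[\Delta_i \mid D_i = 1]
   + \bigl\{ E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] \bigr\} \\
E[\Delta_i]
  &= \pi \, E[\Delta_i \mid D_i = 1] + (1 - \pi) \, E[\Delta_i \mid D_i = 0] \\
\Rightarrow \quad E[\Delta_i \mid D_i = 1]
  &= E[\Delta_i] + (1 - \pi) \bigl\{ E[\Delta_i \mid D_i = 1] - E[\Delta_i \mid D_i = 0] \bigr\}
\end{aligned}
```

Substituting the third line into the first gives ATE plus the baseline bias plus the differential treatment effect bias.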

It should be clear that in order to get an unbiased and consistent estimate of ATE from a random sample of a population, we have to rely on assumptions about the counterfactuals.

Consider the following:

$$\text{A.1:} \quad E[Y_i(1) \mid D_i = 1] = E[Y_i(1) \mid D_i = 0] \tag{20}$$

$$\text{A.2:} \quad E[Y_i(0) \mid D_i = 1] = E[Y_i(0) \mid D_i = 0] \tag{21}$$

By assuming A.1 and A.2, we can compute the remaining unknowns in equation (18). In such a situation ATE = ATT = ATU. Consider the following cases:

• A.1 is true but A.2 is not: the naïve estimator is biased and inconsistent for ATE but unbiased and consistent for ATU.

• A.2 is true but A.1 is not: the naïve estimator is biased and inconsistent for ATE but unbiased and consistent for ATT.
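A short sketch of why each assumption alone identifies a different parameter: start from the population limit of the naïve estimator and replace the observable mean that the assumption lets us swap.

```latex
\begin{aligned}
\text{Under A.1:} \quad
E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]
  &= E[Y_i(1) \mid D_i = 0] - E[Y_i(0) \mid D_i = 0]
   = E[\Delta_i \mid D_i = 0] = ATU \\
\text{Under A.2:} \quad
E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]
  &= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 1]
   = E[\Delta_i \mid D_i = 1] = ATT
\end{aligned}
```

When both assumptions hold, the two lines coincide, so ATT = ATU and both equal ATE.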

8. Final comments

In most of the cases faced by social scientists, both assumptions are hard to believe when only non-experimental data is available.

We need to find some source of exogenous variation in the data in order to be able to estimate a causal relationship.

The “beauty” of modern econometrics lies in finding ways to do exactly this.
