
EPSE 581C: Causal Inference for Applied Researchers

Ed Kroc

University of British Columbia

ed.kroc@ubc.ca

May 27, 2019

Ed Kroc (UBC) Causal Inference May 27, 2019 1 / 50


Last time

More model misspecification and (some of) its effects

Consistency and unbiasedness of estimators


Today

Even more model misspecification and (some of) its effects

The Neyman-Rubin causal model

Introduction to matching


Model misspecification: Ex. 1

Misspecified model on LEFT; properly specified model on RIGHT:

[Figure: two panels plotting y against x, for x between 0 and 1.]


Model misspecification: Ex. 1

Misspecified model on LEFT; properly specified model on RIGHT:

[Figure: residuals vs. fitted values for the misspecified model, mod.w (left), and the properly specified model, mod.r (right).]

Clear evidence of model misspecification in residuals vs. fitted plot!


Model misspecification: Ex. 1

Misspecified model on LEFT; properly specified model on RIGHT:

[Figure: two panels plotting y against x, for x between 0 and 1.]

$\widehat{\text{ACE}}_{\text{wrong}}(X = 0.5) = \hat{\beta}'_T = 0.515$

$\widehat{\text{ACE}}_{\text{right}}(X = 0.5) = \hat{\beta}_T + 0.5\,\hat{\beta}_{TX} = 0.008 + 0.5 \times 0.999 = 0.508$

Not too bad. . ., but what if the misspecification was worse?
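In both examples, the conditional ACE comes from differencing the properly specified model's predictions at $T = 1$ and $T = 0$, so only the terms involving $T$ survive. A minimal Python sketch of that arithmetic, assuming the treatment enters the model only through $\beta_T T + \beta_{TX} T X + \beta_{TX^2} T X^2$ (the quadratic term appears only in Ex. 2); the function name `conditional_ace` is hypothetical, and only the Ex. 1 coefficients are taken from the slide:

```python
def conditional_ace(x, b_t, b_tx=0.0, b_tx2=0.0):
    """Conditional ACE at covariate value x, assuming treatment T enters the
    model only as b_t*T + b_tx*T*X + b_tx2*T*X**2.

    Every term without T cancels in E(Y | T=1, X=x) - E(Y | T=0, X=x).
    """
    return b_t + b_tx * x + b_tx2 * x**2


# Ex. 1, properly specified model (coefficient estimates from the slide).
ace_right = conditional_ace(0.5, b_t=0.008, b_tx=0.999)
# Ex. 1, misspecified model: no interaction terms, so the ACE is just b_t.
ace_wrong = conditional_ace(0.5, b_t=0.515)

print(f"right: {ace_right:.4f}, wrong: {ace_wrong:.3f}")
```

The same function covers Ex. 2 by supplying `b_tx2`; the slide reports only the final value for that example, so its individual coefficients are not reproduced here.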


Model misspecification: Ex. 2

Misspecified model on LEFT; properly specified model on RIGHT:

[Figure: two panels plotting y against x, for x between 0 and 1.]


Model misspecification: Ex. 2

Misspecified model on LEFT; properly specified model on RIGHT:

[Figure: residuals vs. fitted values for the misspecified model, mod.w2 (left), and the properly specified model, mod.r (right).]

Clear evidence of model misspecification in residuals vs. fitted plot!


Model misspecification: Ex. 2

Misspecified model on LEFT; properly specified model on RIGHT:

[Figure: two panels plotting y against x, for x between 0 and 1.]

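$\widehat{\text{ACE}}_{\text{wrong}}(X = 0.5) = \hat{\beta}'_T = -0.152$

$\widehat{\text{ACE}}_{\text{right}}(X = 0.5) = \hat{\beta}_T + 0.5\,\hat{\beta}_{TX} + (0.5)^2\,\hat{\beta}_{TX^2} = -0.527$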

The misspecified model's ACE estimate is roughly 3.5 times too small in magnitude (-0.152 vs. -0.527).


Model misspecification: ignore fit statistics

Notice: fit statistics are useless here.

That is, misspecified models can still “fit” the data very well.

Good enough for explanatory modelling.

Not good enough for causal modelling!

Ignore all fit statistics when performing causal modelling, including:

Goodness-of-fit F-tests

$R^2$ statistics

Information criterion statistics (AIC, BIC, DIC, etc.)
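A small simulation makes the point concrete. This is a minimal Python/NumPy sketch with a hypothetical data-generating process (not the one behind Ex. 1 or Ex. 2): $Y = 1 + X + T X^2 + \delta$ with $T$ randomly assigned, so the true $\text{ACE}(X = 0.5) = 0.25$. The misspecified model omits the treatment-by-covariate terms, still reaches a high $R^2$, and yet its treatment coefficient estimates a different quantity:

```python
import numpy as np

rng = np.random.default_rng(581)
n = 1_000
x = rng.uniform(0, 1, n)
t = rng.integers(0, 2, n)                      # randomized binary treatment
y = 1 + x + t * x**2 + rng.normal(0, 0.05, n)  # hypothetical DGP


def ols(design, y):
    """Return OLS coefficients and R^2 (design includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2


ones = np.ones(n)
# Misspecified: Y ~ 1 + T + X; the ACE estimate is the T coefficient.
b_wrong, r2_wrong = ols(np.column_stack([ones, t, x]), y)
# Properly specified: Y ~ 1 + T + X + T*X + T*X^2.
b_right, r2_right = ols(np.column_stack([ones, t, x, t * x, t * x**2]), y)

ace_wrong = b_wrong[1]  # targets E(X^2) = 1/3 here, since T is independent of X
ace_right = b_right[1] + 0.5 * b_right[3] + 0.25 * b_right[4]
print(f"wrong model: R^2 = {r2_wrong:.3f}, ACE(0.5) = {ace_wrong:.3f}")
print(f"right model: R^2 = {r2_right:.3f}, ACE(0.5) = {ace_right:.3f}")
```

Only the comparison with the known truth exposes the problem; the wrong model's $R^2$ on its own gives no warning.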


Model misspecification: ignore statistical significance

Notice: the statistical significance of model coefficient estimates is irrelevant here.

Recall numerical Ex. 1:

All estimates significant for the misspecified model

In the properly specified model, the intercept ($\hat{\beta}_0$) and marginal treatment ($\hat{\beta}_T$) estimates are not statistically significant.

Recall numerical Ex. 2:

All estimates significant for the misspecified model

In the properly specified model, the intercept ($\hat{\beta}_0$) and marginal first-order treatment ($\hat{\beta}_T$) estimates are not statistically significant.


Model misspecification: bigger sample size will never fix the problem

It is common “wisdom” that the more data you have, the better you will be able to quantify your effects of interest.

This is true for explanatory/descriptive and predictive modelling, but false for causal modelling with a misspecified model: more data only makes the estimate more precise about the wrong value.
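To illustrate, here is a minimal Python/NumPy sketch under a hypothetical data-generating process, $Y = 1 + X + T X^2 + \delta$ with randomized $T$ (true $\text{ACE}(X = 0.5) = 0.25$). As $n$ grows, the misspecified fit $Y \sim T + X$ settles ever more tightly on the wrong value; the helper `wrong_ace_estimate` is illustrative only:

```python
import numpy as np


def wrong_ace_estimate(n, rng):
    """Simulate the hypothetical DGP Y = 1 + X + T*X^2 + noise and return the
    T coefficient from the misspecified fit Y ~ 1 + T + X."""
    x = rng.uniform(0, 1, n)
    t = rng.integers(0, 2, n)
    y = 1 + x + t * x**2 + rng.normal(0, 0.05, n)
    design = np.column_stack([np.ones(n), t, x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


rng = np.random.default_rng(2019)
estimates = {n: wrong_ace_estimate(n, rng) for n in (100, 1_000, 10_000, 100_000)}
for n, est in estimates.items():
    # The estimates settle near E(X^2) = 1/3, not the true ACE(0.5) = 0.25.
    print(f"n = {n:>7}: estimated ACE = {est:.3f}")
```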


Unbiasedness of estimators

Generally, an estimator $\hat{\theta}$ for some population parameter $\theta$ of a random variable of interest $X$ is called unbiased if:

$E(\hat{\theta}) = \theta$

In words, an estimator is unbiased for its estimand (what it is trying to estimate) if, on average, the estimator equals the estimand.

Example: In a random sample, the sample mean, $\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} X_i$, is an unbiased estimator of the population mean, $\theta = E(X)$.
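Unbiasedness is a statement about the average over repeated samples, which a Monte Carlo simulation can approximate. A minimal Python/NumPy sketch, assuming a normal population with $\mu = 3$ and $\sigma = 2$ purely for illustration; the sample variance with divisor $n$ is included as a contrasting biased estimator (its expectation is $\frac{n-1}{n}\sigma^2$):

```python
import numpy as np

rng = np.random.default_rng(13)
mu, sigma, n, reps = 3.0, 2.0, 10, 100_000

# Each row is one random sample of size n from the (assumed) population.
samples = rng.normal(mu, sigma, size=(reps, n))

sample_means = samples.mean(axis=1)           # theta_hat = (1/n) * sum(X_i)
var_divisor_n = samples.var(axis=1, ddof=0)   # divides by n: biased
var_divisor_n1 = samples.var(axis=1, ddof=1)  # divides by n - 1: unbiased

# Averaging each estimator over many samples approximates its expectation.
print(f"mean of sample means:     {sample_means.mean():.3f}  (mu = {mu})")
print(f"mean of var, divisor n:   {var_divisor_n.mean():.3f}  (sigma^2 = {sigma**2})")
print(f"mean of var, divisor n-1: {var_divisor_n1.mean():.3f}")
```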


Consistency of estimators

Generally, an estimator $\hat{\theta}$ is called consistent if, as the sample size increases without bound, the sample value of $\hat{\theta}$ approaches a single number, $a$:

for all $\varepsilon > 0$, $\lim_{n \to \infty} \Pr(|\hat{\theta} - a| > \varepsilon \mid S_n) = 0$,

where $S_n$ denotes a random sample of size $n$.

If an estimator is both unbiased and consistent, then not only does its average value equal the true estimand of interest, but as we increase the sample size, the estimator becomes more and more precise about this true value.

That is, such an estimator is both accurate and precise as sample size increases.

(ML, OLS) estimates of regression parameters are consistent (under standard regularity conditions), but when the functional form of the regression model is misspecified they converge to the best-fitting values within the wrong model, so they are generally biased for the true effects no matter how large the sample.
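The consistency definition can be checked by simulation for the sample mean, where $a = \mu$. A minimal Python/NumPy sketch estimating $\Pr(|\hat{\theta} - \mu| > \varepsilon)$ at increasing $n$; the normal population and $\varepsilon = 0.2$ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(14)
mu, sigma, eps, reps = 3.0, 2.0, 0.2, 2_000


def prob_far(n):
    """Monte Carlo estimate of Pr(|theta_hat - mu| > eps) for samples of size n,
    where theta_hat is the sample mean."""
    means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    return np.mean(np.abs(means - mu) > eps)


probs = {n: prob_far(n) for n in (10, 100, 1_000)}
for n, p in probs.items():
    print(f"n = {n:>5}: estimated Pr(|mean - mu| > {eps}) = {p:.3f}")
```

For a misspecified regression the same shrinking happens, but around the best-fitting value within the wrong model rather than around the causal quantity of interest.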


Model misspecification: omitted variables

So far, we have only focused on model misspecification where the functional form of the covariates is misspecified, but our models always contained all explanatory variables.

In practical non-experimental research, we will always be missing some confounders; we can’t measure everything, or even know everything we should be measuring!

Detecting important omitted variables can be very difficult.

Residual plots are still the way to go, but they will not always reveal omitted variable bias.

This is why the exchangeability of treatment is so important in an RD design: treatment is “as good as” randomly assigned near the threshold; thus, the biasing effects of omitted variables should be negligible (near the threshold).


Model misspecification with omitted variables

True data-generating process, where $\text{Corr}(X, W) = 0.42$:

$Y = 10 + X + X^2 + X \cdot W + \delta$

Misspecified model 1: $Y = \beta_0 + \beta_X X + \varepsilon$

Model fit:


Model misspecification with omitted variables


[Figure: Misspecified first-order model. Left: y against x; right: residuals(mod1) vs. fitted(mod1).]


Model misspecification with omitted variables


Obvious missing curvature in plots of model 1.

Misspecified model 2:

$Y = \beta_0 + \beta_X X + \beta_{X^2} X^2 + \varepsilon$

Model fit:


Model misspecification with omitted variables


[Figure: Misspecified second-order model. Left: y against x; right: residuals(mod2) vs. fitted(mod2).]


Model misspecification with omitted variables


The apparent heteroskedasticity in the plots of model 2 is actually caused by the omitted variable $W$.

Misspecified model 3:

$Y = \beta_0 + \beta_X X + \beta_W W + \beta_{XW} X \cdot W + \varepsilon$

Model fit:


Model misspecification with omitted variables


[Figure: Left, Scatterplot of Covariates: w against x. Right, Misspecified first-order interaction model: residuals(mod3) vs. fitted(mod3).]


Model misspecification with omitted variables


[Figure: 3D plot of x, w vs. y.]


Model misspecification with omitted variables


The residuals still indicate missing curvature, at least near the extreme fitted values.

Correctly specified (overspecified) model:

$Y = \beta_0 + \beta_X X + \beta_{X^2} X^2 + \beta_W W + \beta_{XW} X \cdot W + \varepsilon$

Model fit:


Model misspecification with omitted variables


[Figure: Correctly specified model: residuals(mod4) vs. fitted(mod4).]


Model misspecification

It is often impossible to assess model misspecification in the published literature.

Notice how wildly the model estimates varied between the various misspecified models; such instability across specifications is often a good sign that model misspecification is a concern.
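Refitting the four models on simulated data makes this instability concrete. A minimal Python/NumPy sketch: the equation for $Y$ is the data-generating process from the slides, but the distributions of $X$, $W$ and $\delta$ are assumptions, chosen only so that $\text{Corr}(X, W) \approx 0.42$. The correctly specified model 4 should recover roughly $(10, 1, 1, 0, 1)$, while the $X$ coefficient swings wildly across the misspecified models:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000
# Assumed covariate distributions, chosen only so that Corr(X, W) is about 0.42.
x = rng.uniform(0, 6, n)
w = 0.5 * x + rng.normal(0, np.sqrt(3.5), n)
# Data-generating process from the slides (noise distribution assumed).
y = 10 + x + x**2 + x * w + rng.normal(0, 0.5, n)


def fit(*columns):
    """OLS of y on an intercept plus the given columns; returns coefficients."""
    design = np.column_stack([np.ones(n), *columns])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


models = {
    "model 1 (X)": fit(x),
    "model 2 (X, X^2)": fit(x, x**2),
    "model 3 (X, W, X*W)": fit(x, w, x * w),
    "model 4 (X, X^2, W, X*W)": fit(x, x**2, w, x * w),
}

print(f"Corr(X, W) = {np.corrcoef(x, w)[0, 1]:.2f}")
for name, beta in models.items():
    print(f"{name:<26} intercept and slopes: {np.round(beta, 2)}")
```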


Model misspecification: ugly example

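True data-generating process, where the $W_i$ are correlated with $X$:

$Y = 10 + X + X^2 + W_1 + 0.02\,X \cdot W_1 + 0.01\,W_2 + 0.02\,W_3 \cdot W_4 + 0.01\,W_3 \cdot X^2 + \delta$

Misspecified model:

$Y = \beta_0 + \beta_X X + \beta_{X^2} X^2 + \beta_W W_1 + \beta_{XW} X \cdot W_1 + \varepsilon$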

[Figure: Rez vs. Fitted, ugly data: residuals(modu) vs. fitted(modu).]


Model misspecification: ugly example


Maybe some evidence of misspecification? The residuals don’t really look that bad, though.

Model fit:

The parameter estimates are still quite biased.


Regression discontinuity design

Recall the RD design:

In an RD design, we can estimate the ACE at the fixed threshold.

Model misspecification is an issue here, since we fit separate regression models on either side of the threshold in order to construct an estimate of the ACE at the threshold.

The major challenge with RD designs is how to specify the proper model(s).

Very common advice: use nonparametric methods that allow the data to determine the functional form of the model.

However, there are major problems with this:

Nonparametric methods are prone to overfitting.

Overreliance on (possibly confounded) response data away from the threshold.


Regression discontinuity design

Best advice for fitting an RD design:

Alternative:

Gelman & Imbens, JBusEconStat (2018): use only first- or second-order parametric models.

Criticism: still an overreliance on (possibly confounded) response data away from the threshold.

My advice:

Recall: we have exchangeability of units over treatment near the fixed threshold.

So fit your regression model as locally as possible.

First- or second-order models will usually be sufficient locally (Taylor series).

Rely only (or mostly) on unconfounded response data near the threshold.

Problem: we need a lot more data near the threshold in order to fit the model.
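The "fit as locally as possible" advice can be sketched as separate straight-line fits within a window around the threshold. A minimal Python/NumPy sketch using a simple rectangular window (not the kernel weighting or automatic bandwidth selection of dedicated RD software); the helper `rd_local_linear` and the simulated data, with a true jump of 2 at $X = 0$, are hypothetical:

```python
import numpy as np


def rd_local_linear(x, y, cutoff, bandwidth):
    """Estimate the jump in E(Y | X) at `cutoff` by fitting a separate straight
    line on each side, using only units within `bandwidth` of the cutoff
    (a rectangular window)."""
    fitted_at_cutoff = []
    for side in (x < cutoff, x >= cutoff):
        near = side & (np.abs(x - cutoff) <= bandwidth)
        design = np.column_stack([np.ones(near.sum()), x[near] - cutoff])
        beta, *_ = np.linalg.lstsq(design, y[near], rcond=None)
        fitted_at_cutoff.append(beta[0])  # intercept = fitted value at the cutoff
    below, above = fitted_at_cutoff
    return above - below


# Hypothetical data: T = 1{X >= 0}, curved outcome, true jump of 2 at X = 0.
rng = np.random.default_rng(31)
x = rng.uniform(-3, 3, 20_000)
t = (x >= 0).astype(float)
y = np.sin(x) + 0.3 * x**2 + 2.0 * t + rng.normal(0, 0.5, x.size)

for h in (3.0, 1.0, 0.3):
    print(f"bandwidth {h}: estimated jump = {rd_local_linear(x, y, 0.0, h):.2f}")
```

With the widest window, curvature far from the cutoff biases the jump well away from 2; the narrow window recovers it but uses only a small fraction of the data, which is exactly the data-hunger problem noted above.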


The Neyman-Rubin causal model

We have relied on formalizing the problem of causal inference mathematically using the language of counterfactuals (potential outcomes).

Such a formulation was introduced by Neyman in 1923.

Subsequently, Rubin, Holland, and others extended Neyman’s ideas beyond the framework of randomized, controlled experiments.

Today, the Neyman-Rubin causal model is the industry standard in most areas of social and health science (although Pearl’s framework is gaining popularity).


The Neyman-Rubin causal model

In the counterfactual formulation of causality, we imagine that every individual $i$ has a pair of hypothetical “potential outcomes”:

$Y_i(1)$ is the outcome for sample unit $i$ if they are given the treatment ($T = 1$),

$Y_i(0)$ is the outcome for sample unit $i$ if they are not given the treatment ($T = 0$).

Fundamental problem of (counterfactual) causal inference: for any individual $i$, we can never simultaneously observe $Y_i(0)$ and $Y_i(1)$.

In particular, we can never know the individual causal effect of treatment:

$\text{ICE}_i(T) := Y_i(1) - Y_i(0).$


The Neyman-Rubin causal model

Since it is never possible to observe the ICE, we instead focus on the average causal effect:

$\text{ACE}(T) := E(Y_i(1) - Y_i(0)) = E(Y_i(1)) - E(Y_i(0)).$

If assignment to treatment depends on some covariate(s) $X$, we may write

$\text{ACE}(T \mid X) := E(Y_i(1) - Y_i(0) \mid X).$

It is then possible to estimate the ACE (unconditional or conditional) if all sample units are exchangeable over treatment; i.e., if sample units differ, on average, only in the treatment they receive.

Random assignment to treatment will ensure such exchangeability.


The stable unit treatment value assumption (SUTVA)

One key assumption of the Neyman-Rubin causal model (NRCM) is called the stable unit treatment value assumption (SUTVA):

The counterfactual (potential outcome) of one sample unit should be unaffected by the particular assignment of treatments to the other sample units.

More simply, whatever treatment one sample unit receives should not affect the outcome of any other sample unit.

Such an assumption should hold in a tightly controlled experiment, e.g. where sample units are not allowed to interact with each other.

But the SUTVA assumption may fail if sample units are not isolated.


The stable unit treatment value assumption (SUTVA)

Example where the SUTVA assumption fails:

We randomly assign treatment or placebo (double-blinded) to 100 high blood pressure patients.

However, suppose two of these patients (Boris and Doris) live in the same household.

Unbeknownst to anyone, Boris receives the treatment, while Doris receives the placebo.

Suppose an unknown side effect of the drug is that it tends to make patients very tired.

Boris becomes very tired on the treatment, causing Doris to have to do more for both of them (e.g. cook, clean, shop, etc.)

The extra stress causes Doris’ blood pressure to rise.

Thus, our estimate of the ACE of treatment may be inflated: Doris’ blood pressure went up on placebo because of the drug’s effect on Boris, which makes the placebo group look worse than it should.


The stable unit treatment value assumption (SUTVA)

Three possible fixes to the Boris and Doris problem:

(1) Ensure that all sample units are isolated by design (e.g. do not enrol patients who know each other).

(2) Decompose the causal effect of treatment into two parts: $Y_{\text{Boris}}(1 \mid T_{\text{Doris}} = 1)$ vs. $Y_{\text{Boris}}(0 \mid T_{\text{Doris}} = 1)$ and $Y_{\text{Boris}}(1 \mid T_{\text{Doris}} = 0)$ vs. $Y_{\text{Boris}}(0 \mid T_{\text{Doris}} = 0)$.

(3) Control for the confounding variable: $Y_i(1 \mid i \sim j)$ vs. $Y_i(0 \mid i \sim j)$, where $i \sim j$ indicates that units $i$ and $j$ are linked (e.g. share a household).

(1) is by far the best choice.

(2) and (3) resolve the issue by complicating the analysis; they will also require considerably more data, since we have more effects (dependencies) to disentangle.


The fundamental problem of causal inference

Rubin (and Holland) call the fact that we can never simultaneously observe $Y_i(0)$ and $Y_i(1)$ for any individual $i$ the fundamental problem of causal inference.

Rubin goes one step further and characterizes this as a problem of missing data.

Subject    Y_i(0)   Y_i(1)   ICE = Y_i(1) - Y_i(0)
Anya       ?        -2       ?
Boris      ?        -5       ?
Doris      5        ?        ?
Natasha    -1       ?        ?
Pyotr      ?        1        ?
Vladimir   1        ?        ?


The fundamental problem of causal inference

These data allow us to estimate the following conditional expectations:

$E(Y_i \mid T = 1), \quad E(Y_i \mid T = 0).$

For the data in the table, we have

$\hat{E}(Y_i \mid T = 0) = 1.67, \quad \hat{E}(Y_i \mid T = 1) = -2.$


The fundamental problem of causal inference

Recall that we really want the ACE:

$\text{ACE} = E(Y_i(1)) - E(Y_i(0))$

If treatment is randomly assigned to patients, patients stay on their assigned treatment, and the SUTVA holds, then:

$\widehat{\text{ACE}} = \hat{E}(Y_i \mid T = 1) - \hat{E}(Y_i \mid T = 0) = -2 - 1.67 = -3.67$
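The calculation treats the table as a missing-data problem: each conditional mean uses only the entries actually observed. A minimal Python sketch (standard library only) of that difference in means; it estimates the ACE only under the assumptions just stated (randomization, adherence to assigned treatment, SUTVA):

```python
# Observed outcomes from the table; None marks the unobservable counterfactual
# (Rubin's missing-data view of the fundamental problem).
table = {
    "Anya": (None, -2),
    "Boris": (None, -5),
    "Doris": (5, None),
    "Natasha": (-1, None),
    "Pyotr": (None, 1),
    "Vladimir": (1, None),
}


def mean(values):
    return sum(values) / len(values)


# E(Y | T = t) is estimated from the units that actually received treatment t.
e_y_t0 = mean([y0 for y0, _ in table.values() if y0 is not None])
e_y_t1 = mean([y1 for _, y1 in table.values() if y1 is not None])
ace_hat = e_y_t1 - e_y_t0
print(f"E(Y|T=0) = {e_y_t0:.2f}, E(Y|T=1) = {e_y_t1:.2f}, ACE-hat = {ace_hat:.2f}")
```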



The assignment to treatment mechanism

We have already argued that randomization of treatment allows for an unconfounded estimate of the ACE (assuming patients stay on their assigned treatment; we will later see a technique that relaxes this assumption: instrumental variables).

We have also seen that a deterministic assignment of treatment can allow for an unconfounded estimate of the ACE locally (RD design).

In practice, true randomization can be very difficult to achieve:blinding of patients and researchers/doctors helps.


The assignment to treatment mechanism

There are many ways the assignment to treatment mechanism can become confounded: e.g. side effects of treatment affect women more than men, older people are susceptible to more side effects, busier people are less likely to follow our directions, etc.

Note: self-selection or preferential sampling into the study are not examples of confounded assignment to treatment mechanisms. They certainly compromise the generalizability of our inferences, but they do not directly affect the quality of the causal inferences we can make on the sample.

Oftentimes, the assignment to treatment mechanism will be confounded due to ethical considerations.


The assignment to treatment mechanism

Classic example (Rubin): “The perfect doctor”.

The perfect doctor knows (a priori) what the best treatment is for every individual patient.

Thus, the perfect doctor assigns the best treatment to each individual patient. The perfect doctor has complete counterfactual information:

Subject    Y_i(0)   Y_i(1)   ICE = Y_i(1) - Y_i(0)
Anya       -1       -2       -1
Boris      -2       -5       -3
Doris      5        2        -3
Natasha    -1       0        1
Pyotr      0        1        1
Vladimir   1        -1       -2
Average    0.33     -0.83    true ACE = -1.17


The assignment to treatment mechanism

Classic example (Rubin): “The perfect doctor”.

Based on this complete information, the perfect doctor gives each patient the treatment with the lower (better) outcome, making the following assignments to treatment:

Subject    Y_i(0)   Y_i(1)   ICE = Y_i(1) - Y_i(0)
Anya       ?        -2       ?
Boris      ?        -5       ?
Doris      ?        2        ?
Natasha    -1       ?        ?
Pyotr      0        ?        ?
Vladimir   ?        -1       ?
Average    -0.5     -1.5     estimated ACE = -1

Notice: the estimate of the ACE is now distorted (biased): -1 instead of the true -1.17. The perfect doctor is great for individual patients, but bad for science/inference.
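A minimal Python sketch (standard library only) reproducing both numbers from the complete table. The rule "lower outcome is better" is inferred from the doctor's assignments (each patient receives the arm with the smaller value), not stated on the slide:

```python
# Complete potential outcomes (Y_i(0), Y_i(1)) from the perfect doctor's table;
# in practice we never observe both columns.
full = {
    "Anya": (-1, -2),
    "Boris": (-2, -5),
    "Doris": (5, 2),
    "Natasha": (-1, 0),
    "Pyotr": (0, 1),
    "Vladimir": (1, -1),
}

# True ACE: the average of the individual causal effects Y_i(1) - Y_i(0).
ace_true = sum(y1 - y0 for y0, y1 in full.values()) / len(full)

# The perfect doctor treats exactly those patients whose Y_i(1) is lower.
treated = {name for name, (y0, y1) in full.items() if y1 < y0}
observed_t1 = [full[name][1] for name in full if name in treated]
observed_t0 = [full[name][0] for name in full if name not in treated]
ace_doctor = (sum(observed_t1) / len(observed_t1)
              - sum(observed_t0) / len(observed_t0))

print(f"treated: {sorted(treated)}")
print(f"true ACE = {ace_true:.2f}, perfect-doctor estimate = {ace_doctor:.2f}")
```

Because assignment depends on the potential outcomes themselves, the treated and untreated groups are not exchangeable, and the naive difference in means drifts away from the true ACE.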


Matching

In observational or quasi-experimental research designs, we usually do not have the ability to assign treatment at all; we simply observe outcomes based on assignment to treatment mechanisms that we cannot control/manipulate (e.g. self-selection to treatment, or comorbidities (covariates) increasing the likelihood of assignment to treatment).

The most widely used way to attempt to correct for this confounding of the assignment to treatment mechanism is some kind of matching of sample units.

Generally speaking, the idea is to match up sample units from one treatment group to another based on how similar they are over all measured covariates.

Matched sample units of different treatments can then mimic counterfactuals, assuming no omitted confounders.


Matching: example

Revisit the blood pressure example, but this time assignment to treatment is not randomized: the first three patients to enrol receive the drug, and the last three receive the placebo.

Subject    Sex   Baseline blood pressure   Y_i(0)   Y_i(1)   ICE = Y_i(1) - Y_i(0)
Anya       F     150                       ?        -2       ?
Boris      M     170                       ?        -5       ?
Doris      F     180                       ?        2        ?
Natasha    F     150                       -1       ?        ?
Pyotr      M     170                       0        ?        ?
Vladimir   M     180                       1        ?        ?
Average          166.7                     0        -1.67    estimated ACE = -1.67


We can match (drug, placebo) pairs over baseline blood pressure, but not fully over sex: Doris (F, 180) can only be paired with Vladimir (M, 180). Are females more likely to enrol before males? Are there other possible (omitted) confounders?
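A minimal Python sketch (standard library only) of exact one-to-one matching on the covariates in this table. The helper `exact_match` is hypothetical, not a library function; the average of the matched differences describes the drug patients who found a partner, so dropping an unmatched patient changes who the estimate describes:

```python
from collections import defaultdict

# (name, sex, baseline blood pressure, treated?, observed outcome), from the table.
units = [
    ("Anya", "F", 150, True, -2),
    ("Boris", "M", 170, True, -5),
    ("Doris", "F", 180, True, 2),
    ("Natasha", "F", 150, False, -1),
    ("Pyotr", "M", 170, False, 0),
    ("Vladimir", "M", 180, False, 1),
]


def exact_match(units, key):
    """Pair each treated unit with an unused control that has the same key.

    Returns the matched (treated, control) name pairs and the mean of the
    matched outcome differences; treated units without a match are dropped.
    """
    controls = defaultdict(list)
    for name, sex, bp, treated, y in units:
        if not treated:
            controls[key(sex, bp)].append((name, y))
    pairs, diffs = [], []
    for name, sex, bp, treated, y in units:
        pool = controls.get(key(sex, bp))
        if treated and pool:
            control_name, control_y = pool.pop(0)  # match without replacement
            pairs.append((name, control_name))
            diffs.append(y - control_y)
    estimate = sum(diffs) / len(diffs) if diffs else float("nan")
    return pairs, estimate


pairs_bp, est_bp = exact_match(units, key=lambda sex, bp: bp)
pairs_both, est_both = exact_match(units, key=lambda sex, bp: (sex, bp))
print(pairs_bp, round(est_bp, 2))      # all three drug patients matched
print(pairs_both, round(est_both, 2))  # Doris (F, 180) has no match and is dropped
```

Matching on baseline blood pressure alone reproduces the -1.67 above, because the two groups happen to have identical blood pressure profiles; also requiring a match on sex leaves Doris without a partner.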


Matching: problems

Three fundamental problems of matching:

(1) We can never measure all possible confounders, so corrections are always imperfect.

Issue (1) can never be addressed by matching.

(2) Matching on many covariates simultaneously requires a lot of data (curse of dimensionality).

Propensity score matching addresses problem (2), but relies on a regression framework that is susceptible to all the usual issues with model misspecification.

(3) Matching on multiple covariates requires that we observe enough sample units in each possible subcategory/stratum so that we can find a “match”.

A fix that’s not really a fix for (3): restrict inferences to subgroups where enough data exist.
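A back-of-the-envelope sketch of problems (2) and (3): with $k$ binary covariates there are $2^k$ strata, and a stratum only supports a match if it contains at least one treated and one control unit. A minimal Python sketch, assuming (unrealistically) that units fall into strata uniformly at random and that treatment is an independent fair coin; real covariates are usually more clustered, so treat the numbers as illustrative only. The helper `frac_matchable_strata` is hypothetical:

```python
def frac_matchable_strata(n, k, p_treat=0.5):
    """Expected fraction of the 2**k strata defined by k binary covariates that
    contain at least one treated AND one control unit.

    Assumes each of the n units falls into a stratum uniformly at random and
    receives treatment independently with probability p_treat.
    """
    cell = 2.0 ** -k                            # chance a unit lands in a given stratum
    no_treated = (1 - cell * p_treat) ** n      # stratum has no treated unit
    no_control = (1 - cell * (1 - p_treat)) ** n
    empty = (1 - cell) ** n                     # stratum has no units at all
    return 1 - no_treated - no_control + empty  # inclusion-exclusion


fractions = {k: frac_matchable_strata(1_000, k) for k in (2, 5, 10, 15)}
for k, frac in fractions.items():
    print(f"k = {k:>2} ({2**k:>5} strata): {frac:.3f} of strata allow a match")
```

Even with 1,000 units, only a small minority of strata can supply a match once $k$ reaches 10.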


Next time

Propensity scores and propensity score matching
