TRANSCRIPT
Simplifying Causal Inference
Gary King
Institute for Quantitative Social Science, Harvard University
(Talk at the Center for Population and Development Studies, Harvard University, 11/8/2012)
Gary King (Harvard, IQSS) Matching Methods 1 / 58
Overview
Problem: Model dependence (review)
Solution: Matching to preprocess data (review)
Problem: Many matching methods & specifications
Solution: The Space Graph helps us choose
Problem: The most commonly used method can increase imbalance!
Solution: Other methods do not share this problem
(Coarsened Exact Matching is simple, easy, and powerful)
Lots of insights revealed in the process
Gary King (Harvard, IQSS) Matching Methods 2 / 58
Model Dependence Example (Replication: Doyle and Sambanis, APSR 2000)
Data: 124 Post-World War II civil wars
Dependent variable: peacebuilding success
Treatment variable: multilateral UN peacekeeping intervention (0/1)
Control vars: war type, severity, duration; development status; etc.
Counterfactual question: UN intervention switched for each war
Data analysis: Logit model
The question: How model dependent are the results?
Gary King (Harvard, IQSS) Matching Methods 3 / 58
Two Logit Models, Apparently Similar Results
                   Original “Interactive” Model      Modified Model
Variables          Coeff     SE      P-val           Coeff     SE      P-val
Wartype            −1.742    .609    .004            −1.666    .606    .006
Logdead            −.445     .126    .000            −.437     .125    .000
Wardur             .006      .006    .258            .006      .006    .342
Factnum            −1.259    .703    .073            −1.045    .899    .245
Factnum2           .062      .065    .346            .032      .104    .756
Trnsfcap           .004      .002    .010            .004      .002    .017
Develop            .001      .000    .065            .001      .000    .068
Exp                −6.016    3.071   .050            −6.215    3.065   .043
Decade             −.299     .169    .077            −.284     .169    .093
Treaty             2.124     .821    .010            2.126     .802    .008
UNOP4              3.135     1.091   .004            .262      1.392   .851
Wardur*UNOP4       —         —       —               .037      .011    .001
Constant           8.609     2.157   .000            7.978     2.350   .000
N                  122                               122
Log-likelihood     −45.649                           −44.902
Pseudo R2          .423                              .433
Gary King (Harvard, IQSS) Matching Methods 4 / 58
Doyle and Sambanis: Model Dependence
Gary King (Harvard, IQSS) Matching Methods 5 / 58
Model Dependence: A Simpler Example (King and Zeng, 2006: fig. 4, Political Analysis)
What to do?
Preprocess I: Eliminate extrapolation region
Preprocess II: Match (prune bad matches) within interpolation region
Model remaining imbalance
Gary King (Harvard, IQSS) Matching Methods 6 / 58
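The "eliminate the extrapolation region" step can be sketched in a crude one-dimensional form: drop control units whose covariate value falls outside the range spanned by the treated units. This is only a toy stand-in for the real common-support check, and all data below are hypothetical.

```python
# Hypothetical data: treated units' education (years), and control
# units as (education, outcome) pairs.
treated_x = [16, 18, 20, 22, 24]
controls = [(12, 3.1), (14, 2.8), (17, 4.0), (21, 5.2), (28, 9.9)]

# Keep only controls inside the treated units' covariate range,
# a simple proxy for the interpolation region.
lo, hi = min(treated_x), max(treated_x)
kept = [(x, y) for (x, y) in controls if lo <= x <= hi]
# Controls at 12, 14, and 28 years lie outside that range and are pruned.
```

In more than one dimension the interpolation region is usually defined by a convex hull or by covariate overlap, not a simple interval, but the pruning logic is the same.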
Matching within the Interpolation Region (Ho, Imai, King, Stuart, 2007: fig. 1, Political Analysis)
[Figure: scatterplot of Outcome against Education (years, 12–28), built up over successive slides: treated units (T) appear first, control units (C) are added, and poorly matched controls are pruned until only controls within the interpolation region remain.]
Matching reduces model dependence, bias, and variance
Gary King (Harvard, IQSS) Matching Methods 7–15 / 58
How Matching Works
Notation:
  Yi  Dependent variable
  Ti  Treatment variable (0/1, or more general)
  Xi  Pre-treatment covariates
Treatment effect for treated (Ti = 1) observation i:
  TEi = Yi(Ti = 1) − Yi(Ti = 0) = observed − unobserved
Estimate Yi(Ti = 0) with Yj from matched (Xi ≈ Xj) controls:
  Yi(Ti = 0) = Yj(Tj = 0), or a model Yi(Ti = 0) = g0(Xj)
Prune unmatched units to improve balance (so X is unimportant)
QoI: Sample Average Treatment effect on the Treated,
  SATT = (1/nT) Σ_{i ∈ {Ti = 1}} TEi
or the Feasible Sample Average Treatment effect on the Treated (FSATT)
Gary King (Harvard, IQSS) Matching Methods 16 / 58
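The SATT estimator on this slide can be sketched directly: each treated unit's unobserved Yi(Ti = 0) is replaced by the observed outcome of its matched control, and the resulting TEi are averaged over the treated units. The matched pairs below are hypothetical values, not data from the talk.

```python
# Hypothetical matched pairs: (Y_i for a treated unit, Y_j for its
# matched control). The control's outcome stands in for the treated
# unit's unobserved counterfactual Y_i(T_i = 0).
matched_pairs = [
    (7.0, 5.0),
    (6.0, 6.5),
    (9.0, 7.5),
]

te = [y1 - y0 for (y1, y0) in matched_pairs]  # TE_i = Y_i(1) - Y_i(0)
satt = sum(te) / len(te)                      # average over treated units
# satt = (2.0 - 0.5 + 1.5) / 3 = 1.0
```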
Method 1: Mahalanobis Distance Matching
1. Preprocess (matching):
   Distance(Xi, Xj) = sqrt((Xi − Xj)′ S⁻¹ (Xi − Xj))
   Match each treated unit to the nearest control unit
   Control units: not reused; pruned if unused
   Prune matches if Distance > caliper
2. Estimation: difference in means or a model
Gary King (Harvard, IQSS) Matching Methods 17 / 58
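The Mahalanobis step can be sketched for two covariates (education, age) with made-up data: compute the sample covariance matrix S over all units, then greedily match each treated unit to its nearest unused control, honoring the "not reused" rule above. This is a minimal sketch, not the full algorithm (no caliper, greedy rather than optimal pairing).

```python
def mean(v):
    return sum(v) / len(v)

def cov2(data):
    # 2x2 sample covariance matrix of [(x1, x2), ...]
    xs, ys = [d[0] for d in data], [d[1] for d in data]
    mx, my = mean(xs), mean(ys)
    n = len(data) - 1
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in data) / n
    return sxx, sxy, syy

def mahalanobis(a, b, S):
    # sqrt((a - b)' S^{-1} (a - b)), inverting the 2x2 matrix directly
    sxx, sxy, syy = S
    det = sxx * syy - sxy * sxy
    ixx, ixy, iyy = syy / det, -sxy / det, sxx / det
    dx, dy = a[0] - b[0], a[1] - b[1]
    return (dx * (ixx * dx + ixy * dy) + dy * (ixy * dx + iyy * dy)) ** 0.5

# Hypothetical (education, age) units
treated = [(16, 30), (20, 45)]
controls = [(15, 32), (24, 70), (19, 44)]
S = cov2(treated + controls)

matches, used = {}, set()
for t in treated:
    j = min((j for j in range(len(controls)) if j not in used),
            key=lambda j: mahalanobis(t, controls[j], S))
    matches[t] = controls[j]  # nearest unused control; controls not reused
    used.add(j)
```

Note the design choice: because S encodes the covariates' correlation, the "nearest" control under Mahalanobis distance can differ from the nearest control in raw Euclidean terms.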
Mahalanobis Distance Matching
[Figure: scatterplot of Age against Education (years, 12–28), built up over successive slides: treated units (T) appear first, control units (C) are added, each treated unit is matched to its nearest control, and unused or distant controls are pruned.]
Gary King (Harvard, IQSS) Matching Methods 18–24 / 58
Method 2: Propensity Score Matching
1. Preprocess (matching):
   Reduce the k elements of X to a scalar πi ≡ Pr(Ti = 1 | X) = 1 / (1 + e^(−Xiβ))
   Distance(Xi, Xj) = |πi − πj|
   Match each treated unit to the nearest control unit
   Control units: not reused; pruned if unused
   Prune matches if Distance > caliper
2. Estimation: difference in means or a model
Gary King (Harvard, IQSS) Matching Methods 25 / 58
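The propensity-score step can be sketched with hypothetical data and a hypothetical, already-estimated coefficient vector beta (in practice beta comes from a logistic regression of T on X). Each treated unit is matched to the nearest unused control on |πi − πj|, as on the slide.

```python
import math

# Hypothetical logit coefficients: intercept, education, age.
beta = [-6.0, 0.25, 0.02]

def pscore(x):
    # pi_i = Pr(T_i = 1 | X_i) = 1 / (1 + exp(-X_i beta))
    z = beta[0] + beta[1] * x[0] + beta[2] * x[1]
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical (education, age) units
treated = [(20, 45), (24, 60)]
controls = [(12, 30), (19, 44), (23, 58)]

matches, used = {}, set()
for t in treated:
    j = min((j for j in range(len(controls)) if j not in used),
            key=lambda j: abs(pscore(t) - pscore(controls[j])))
    matches[t] = controls[j]  # nearest unused control on |pi_i - pi_j|
    used.add(j)
```

The key reduction is that matching happens on the one-dimensional score, not on the k covariates themselves; that simplification is exactly what the Space Graph discussion later scrutinizes.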
Propensity Score Matching
[Figure: scatterplot of Age against Education (years, 12–28) alongside a one-dimensional propensity score scale from 0 to 1; successive slides project the treated (T) and control (C) units onto the score, match each treated unit to the nearest control on |πi − πj|, and prune unused controls, leaving the matched sample in covariate space.]
Gary King (Harvard, IQSS) Matching Methods 26–33 / 58
Method 3: Coarsened Exact Matching
1 Preprocess (Matching)
Temporarily coarsen X as much as you’re willing
e.g., Education (grade school, high school, college, graduate)Easy to understand, or can be automated as for a histogram
Apply exact matching to the coarsened X , C (X )
Sort observations into strata, each with unique values of C(X )Prune any stratum with 0 treated or 0 control units
Pass on original (uncoarsened) units except those pruned
2 Estimation Difference in means or a model
Need to weight controls in each stratum to equal treatedsCan apply other matching methods within CEM strata (inherit CEM’sproperties)
Gary King (Harvard, IQSS) Matching Methods 34 / 58
Method 3: Coarsened Exact Matching
1 Preprocess (Matching)
Temporarily coarsen X as much as you’re willing
e.g., Education (grade school, high school, college, graduate)Easy to understand, or can be automated as for a histogram
Apply exact matching to the coarsened X , C (X )
Sort observations into strata, each with unique values of C(X )Prune any stratum with 0 treated or 0 control units
Pass on original (uncoarsened) units except those pruned
2 Estimation Difference in means or a model
Need to weight controls in each stratum to equal treatedsCan apply other matching methods within CEM strata (inherit CEM’sproperties)
Gary King (Harvard, IQSS) Matching Methods 34 / 58
Coarsened Exact Matching
[Figure, slides 35-41: scatterplot of treated (T) and control (C) units by Education (12-28 years) and Age (20-80). Education is coarsened into bins (HS, BA, MA, PhD, 2nd PhD) and Age into bins (drinking age; "don't trust anyone over 30"; the big 40; senior discounts; retirement; old). Strata lacking either treated or control units are pruned, and the remaining units are restored to their original uncoarsened values.]
Gary King (Harvard, IQSS) Matching Methods 35-41 / 58
The Bias-Variance Trade Off in Matching
Bias (& model dependence) = f(imbalance, importance, estimator); we measure imbalance instead
Variance = f(matched sample size, estimator); we measure matched sample size instead
Bias-variance trade off → imbalance-n trade off
Measuring Imbalance
Classic measure: difference of means (for each variable)
Better measure (difference of multivariate histograms):
L1(f, g; H) = (1/2) ∑_{ℓ1···ℓk ∈ H(X)} |f_{ℓ1···ℓk} − g_{ℓ1···ℓk}|
Gary King (Harvard, IQSS) Matching Methods 42 / 58
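The L1 measure above is half the sum of absolute differences between the treated and control groups' relative multivariate histograms, so it runs from 0 (identical distributions) to 1 (no overlap). A sketch under my own assumptions (the function name `l1_imbalance` and the `bins` callable that maps a covariate tuple to its histogram cell are illustrative):

```python
from collections import Counter

def l1_imbalance(treated, control, bins):
    """L1(f, g; H): half the sum over histogram cells of
    |f_cell - g_cell|, where f and g are the treated and control
    relative frequencies on a common binning H."""
    ft = Counter(bins(x) for x in treated)
    fc = Counter(bins(x) for x in control)
    nt, nc = len(treated), len(control)
    cells = set(ft) | set(fc)  # every cell occupied by either group
    return 0.5 * sum(abs(ft[c] / nt - fc[c] / nc) for c in cells)
```

Identical distributions give 0; completely disjoint ones give 1.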
Comparing Matching Methods
MDM & PSM: Choose matched n, match, check imbalance
CEM: Choose imbalance, match, check matched n
Best practice: iterate
But given the matched solution, the matching method is irrelevant
Our idea: Identify the frontier of lowest imbalance for each given n,and choose a matching solution
Gary King (Harvard, IQSS) Matching Methods 43 / 58
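The frontier idea above can be illustrated as a simple Pareto sweep: among candidate matching solutions, keep only those not dominated by a larger solution with lower imbalance. This sketch is my own illustration (the name `imbalance_frontier` and the `(n, imbalance)` input format are assumptions, not the authors' software):

```python
def imbalance_frontier(solutions):
    """Frontier of lowest imbalance for each matched sample size.

    solutions: list of (n, imbalance) pairs for candidate matching
    solutions. Returns the non-dominated pairs, largest n first.
    """
    frontier = []
    best = float("inf")
    for n, imb in sorted(solutions, reverse=True):  # largest n first
        if imb < best:  # strictly improves on every larger solution
            best = imb
            frontier.append((n, imb))
    return frontier
```

A researcher can then pick a point on the frontier, trading matched sample size against imbalance explicitly rather than accepting whatever one method happens to produce.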
A Space Graph: Foreign Aid Shocks & Conflict
King, Nielsen, Coberley, Pope, and Wells (2012)
[Figure: three panels plotting imbalance against N of matched sample (2500 down to 0) under three imbalance metrics: Mahalanobis discrepancy, L1, and difference in means. Lines compare raw data, random pruning, "best practices" PSM, PSM, MDM, and CEM; points mark the published PSM solution and the published PSM with a 1/4 sd caliper.]
Gary King (Harvard, IQSS) Matching Methods 44 / 58
A Space Graph: Healthways DataKing, Nielsen, Coberley, Pope, and Wells (2012)
Gary King (Harvard, IQSS) Matching Methods 45 / 58
A Space Graph: Called/Not Called DataKing, Nielsen, Coberley, Pope, and Wells (2012)
Gary King (Harvard, IQSS) Matching Methods 46 / 58
A Space Graph: FDA Drug Approval TimesKing, Nielsen, Coberley, Pope, and Wells (2012)
Gary King (Harvard, IQSS) Matching Methods 47 / 58
A Space Graph: Job Training (Lalonde Data)King, Nielsen, Coberley, Pope, and Wells (2012)
Gary King (Harvard, IQSS) Matching Methods 48 / 58
A Space Graph: Simulated Data — Mahalanobis
[Figure: three panels (MDM with 1, 2, and 3 covariates) plotting L1 against N of matched sample (500 down to 0) under high, medium, and low starting imbalance.]
Gary King (Harvard, IQSS) Matching Methods 49 / 58
A Space Graph: Simulated Data — CEM
[Figure: three panels (CEM with 1, 2, and 3 covariates) plotting L1 against N of matched sample (500 down to 0) under high, medium, and low starting imbalance.]
Gary King (Harvard, IQSS) Matching Methods 50 / 58
A Space Graph: Simulated Data — Propensity Score
[Figure: three panels (PSM with 1, 2, and 3 covariates) plotting L1 against N of matched sample (500 down to 0) under high, medium, and low starting imbalance.]
Gary King (Harvard, IQSS) Matching Methods 51 / 58
PSM Approximates Random Matching in Balanced Data
[Figure: scatterplot of units on Covariate 1 vs. Covariate 2 (both on −2 to 2). CEM and MDM pair each treated unit with its nearest control, while PSM's matches are scattered across the space, approximating random matching.]
Gary King (Harvard, IQSS) Matching Methods 52 / 58
CEM Weights and Nonparametric Propensity Score
CEM weight: wi = mTi / mCi (plus normalization)
CEM pscore: Pr(Ti = 1 | Xi) = mTi / (mTi + mCi)
(mTi and mCi are the numbers of treated and control units in unit i's stratum)
CEM:
Gives a better pscore than PSM
Doesn't match based on crippled information
Gary King (Harvard, IQSS) Matching Methods 53 / 58
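The nonparametric propensity score above is just the within-stratum proportion of treated units. A minimal sketch, assuming each unit already carries a CEM stratum label (the function name `cem_pscore` and the input format are my own, not the authors' software):

```python
from collections import Counter

def cem_pscore(strata_labels, treated):
    """Nonparametric propensity score from CEM strata:
    Pr(T=1 | X) = mT / (mT + mC) within each unit's stratum."""
    m_t = Counter(s for s, t in zip(strata_labels, treated) if t)
    m_c = Counter(s for s, t in zip(strata_labels, treated) if not t)
    return [m_t[s] / (m_t[s] + m_c[s]) for s in strata_labels]
```

Unlike a first-stage logit, this estimate uses the strata directly rather than a scalar summary of X, so it does not discard ("cripple") the covariate information before matching.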
Destroying CEM with PSM's Two Step Approach
[Figure: scatterplot of units on Covariate 1 vs. Covariate 2 (both on −2 to 2) contrasting CEM matches with the matches produced by feeding the CEM-generated propensity score into PSM, which scatters the pairings across the space.]
Gary King (Harvard, IQSS) Matching Methods 54 / 58
Conclusions
Propensity score matching:
The problem:
Imbalance can be worse than in the original data
Can increase imbalance when removing the worst matches
Approximates random matching in well-balanced data (and random matching increases imbalance)
The cause: unnecessary first-stage dimension reduction
Implications:
Balance checking required
Adjusting for potentially irrelevant covariates with PSM: mistake
Adjusting experimental data with PSM: mistake
Reestimating the propensity score after eliminating noncommon support: mistake
1/4 sd caliper on the propensity score: mistake
In four data sets and many simulations: CEM > Mahalanobis > Propensity Score (your performance may vary)
CEM and Mahalanobis do not have PSM's problems
You can easily check with the Space Graph
Gary King (Harvard, IQSS) Matching Methods 55 / 58
For papers, software (for R, Stata, & SPSS), tutorials, etc.
http://GKing.Harvard.edu/cem
Gary King (Harvard, IQSS) Matching Methods 56 / 58
Data where PSM Works Reasonably Well — PSM & MDM
[Figure: three scatterplots on Covariate 1 vs. Covariate 2 (both on −4 to 4): unmatched data (L1 = 0.685), PSM (L1 = 0.452), and MDM (L1 = 0.448).]
Gary King (Harvard, IQSS) Matching Methods 57 / 58
Data where PSM Works Reasonably Well — CEM
[Figure: three scatterplots on Covariate 1 vs. Covariate 2 (both on −4 to 4): bad CEM (L1 = 0.661, retaining 100% of the treated units), better CEM (L1 = 0.188, 100% of the treated units), and even better CEM (L1 = 0.095, 72% of the treated units).]
Gary King (Harvard, IQSS) Matching Methods 58 / 58