Analyzing Observational Data: Focus on Propensity Scores

Page 1: Analyzing Observational Data: Focus on Propensity Scores

Arlene Ash

QMC - Third Tuesday, September 21, 2010

(as amended, Sept 23)

Analyzing Observational Data: Focus on Propensity Scores

Page 2: Analyzing Observational Data: Focus on Propensity Scores

The Problem

• Those with the intervention and those without have markedly different values for important measured risk factors &

• Outcome is related to the risk factors that are imbalanced between the groups &

• It is not clear how the risk factors and outcome are related

• Why may standard analyses be misleading?

Page 3: Analyzing Observational Data: Focus on Propensity Scores

[Figure: True and Modeled Relationship Between Risk and Outcome. Horizontal axis: Risk (0 to 2.0); vertical axis: Outcome (0 to 1.0).]

Page 4: Analyzing Observational Data: Focus on Propensity Scores

Is Imbalance in Risk a Problem?

• If we correctly model the relationship between risk factors and outcome, we correctly estimate effect of the intervention

• With many risk factors, hard to know if the relationship between risk factors and outcome is correctly modeled

• Propensity score - a way to reduce the effect of imbalance in measured risk when models may be inadequate

Page 5: Analyzing Observational Data: Focus on Propensity Scores

Propensity Score Method (Key Idea)

• The propensity score (PS) for an observation is the probability that the observation is “exposed” or “got the intervention”

• Use the PS model in pre-processing the data

– To draw a sub-sample where the exposed and non-exposed groups are fairly balanced on risk factors. Then

– Use standard techniques to analyze the sub-sample
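As a rough illustration of what "the probability that the observation is exposed" means in practice, the sketch below fits a one-covariate logistic regression by plain gradient ascent and reads off a propensity score for each observation. The toy data, the function name fit_propensity, the learning rate, and the step count are all hypothetical and chosen only for illustration. A real analysis would use many risk factors and a statistics package.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_propensity(x, exposed, lr=0.1, steps=5000):
    """Fit P(exposed = 1 | x) = sigmoid(a + b*x) by gradient ascent on the
    average log-likelihood. Returns the coefficients (a, b)."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for xi, ei in zip(x, exposed):
            resid = ei - sigmoid(a + b * xi)  # observed minus predicted
            grad_a += resid
            grad_b += resid * xi
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

# Hypothetical toy data: one risk factor, exposure more common at high values.
risk = [0.1, 0.4, 0.5, 0.9, 1.1, 1.3, 1.6, 1.8]
exposed = [0, 0, 1, 0, 1, 0, 1, 1]

a, b = fit_propensity(risk, exposed)
ps = [sigmoid(a + b * xi) for xi in risk]  # propensity score per observation
```

Each value in ps lies strictly between 0 and 1, and here it rises with the risk factor because exposure is more common at higher values in the toy data.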

Page 6: Analyzing Observational Data: Focus on Propensity Scores

Simple Propensity Score Approach

• Estimate a model to predict the “probability of intervention/exposure”

– This is “the propensity score”

• Divide the population into PS quintiles

• Create a subsample by taking equal numbers of exposed and unexposed observations from each quintile

• Use a subsequent regression model to estimate the effect of the intervention in the subsample
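The quintile and sub-sampling steps can be sketched as follows, assuming the propensity scores have already been estimated. The function name quintile_balanced_sample, the rank-based quintile cut, and the simulated records are assumptions made for illustration, not details taken from the slides.

```python
import random

def quintile_balanced_sample(records, seed=0):
    """records: list of (ps, exposed) pairs, with exposed in {0, 1}.
    Returns a subsample holding equal numbers of exposed and unexposed
    observations from each propensity-score quintile."""
    rng = random.Random(seed)
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    sample = []
    for q in range(5):
        # Rank-based quintile: the q-th fifth of the PS-sorted records.
        stratum = ordered[q * n // 5:(q + 1) * n // 5]
        exposed = [r for r in stratum if r[1] == 1]
        unexposed = [r for r in stratum if r[1] == 0]
        k = min(len(exposed), len(unexposed))  # size of the smaller group
        sample += rng.sample(exposed, k) + rng.sample(unexposed, k)
    return sample

# Hypothetical data: exposure becomes more likely as the score rises.
data_rng = random.Random(1)
records = []
for _ in range(500):
    score = data_rng.random()
    records.append((score, 1 if data_rng.random() < score else 0))

subsample = quintile_balanced_sample(records)
n_exposed = sum(e for _, e in subsample)
```

By construction, exactly half of the subsample is exposed within every quintile, so the two groups have similar PS distributions before any regression is run.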

Page 7: Analyzing Observational Data: Focus on Propensity Scores

Propensity Score Sampling Example

PS Quintile   # Cases   # Controls   # Sampled
Lowest        12        81           24
2nd           30        67           60
Middle        44        38           76
4th           53        15           30
Highest       78        8            16
Total         217       209          206
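The "# Sampled" column is consistent with taking, in each quintile, twice the size of the smaller group (all of the smaller group plus an equal number from the larger one). That reading is inferred from the numbers rather than stated on the slide. A short check:

```python
# (quintile, cases, controls) as listed in the sampling example
table = [
    ("Lowest", 12, 81),
    ("2nd", 30, 67),
    ("Middle", 44, 38),
    ("4th", 53, 15),
    ("Highest", 78, 8),
]

# Equal numbers from each group: twice the smaller count per quintile.
sampled = {name: 2 * min(cases, controls) for name, cases, controls in table}
total_sampled = sum(sampled.values())
```

Under that assumption, the per-quintile counts come out as 24, 60, 76, 30, and 16, for a total of 206.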

Page 8: Analyzing Observational Data: Focus on Propensity Scores

Propensity Score Sampling Example: Treatments for Drug Abusers

• Patients seeking substance abuse detoxification in Boston receive either

– Residential detoxification: lasts about one week, plus encouragement for post-detox treatment, or

– Acupuncture: acute (daily) detox plus 3-6 months of maintenance with acupuncture and motivational counseling

Page 9: Analyzing Observational Data: Focus on Propensity Scores

Data

• From Boston’s publicly-funded substance abuse treatment system

• All cases discharged from residential detox or acupuncture between 1/93 and 9/94

• Client classified (only once) as residential or acupuncture based on the modality of first discharge

Page 10: Analyzing Observational Data: Focus on Propensity Scores

Outcome

• Is client re-admitted to detox within 6 months? (Y/N)

• Study question: Are acupuncture clients more likely to be re-admitted than residential detox clients?

– Exposure = assigned to acupuncture

Page 11: Analyzing Observational Data: Focus on Propensity Scores

Client Characteristics Available At Time Of Admission

• Gender
• Race/ethnicity
• Age
• Education
• Employment status
• Income
• Health insurance status
• Living situation
• Prior mental health treatment
• Primary drug
• Substance abuse treatment history

Page 12: Analyzing Observational Data: Focus on Propensity Scores

Residential Detox & Acupuncture Cases: % with Various Characteristics

Characteristic            Residential (n = 6,907)   Acupuncture (n = 1,104)
Gender: female            29                        33
Race/ethnicity: Black     46                        46
  Hispanic                12                        10
  White                   41                        43
Education: HS grad        56                        59
  College graduate        4                         13

Page 13: Analyzing Observational Data: Focus on Propensity Scores

Characteristics of Residential Detox & Acupuncture Clients (2)

Characteristic            Residential (n = 6,907)   Acupuncture (n = 1,104)
Employment: unemployed    86.8                      43.2
Insurance: uninsured      65.4                      52.3
  Medicaid                28.2                      21.2
  Private insurance       3.0                       15.4
Lives: with child         9.5                       19.3
  In shelter              30.3                      2.9

Page 14: Analyzing Observational Data: Focus on Propensity Scores

Characteristics of Residential Detox & Acupuncture Clients (3)

Characteristic                  Residential (n = 6,907)   Acupuncture (n = 1,104)
Prior mental health treatment   12.3                      27.8
Primary drug: alcohol           42.3                      32.4
  Cocaine                       16.2                      16.6
  Crack                         15.9                      20.2
  Heroin                        24.6                      19.0

Page 15: Analyzing Observational Data: Focus on Propensity Scores

Characteristics of Residential Detox & Acupuncture Clients (4)

Substance abuse admits in the last year

Characteristic              Residential (n = 6,907)   Acupuncture (n = 1,104)
Residential detox: 0        56.7                      81.0
  1                         20.2                      12.1
  2+                        23.1                      7.0
Short-term residential: 0   76.2                      94.8
Long-term residential: 0    80.5                      93.5
Outpatient: None            80.6                      54.3
Acupuncture: None           95.9                      90.1

Page 16: Analyzing Observational Data: Focus on Propensity Scores

Results Of Standard Analysis

Percentage of clients re-admitted to detox within 6 months

• Among 1,104 acupuncture cases, 18% re-admitted

• Among 6,907 residential detox cases, 36% re-admitted

• Raw odds ratio = 0.40

From a multivariable stepwise logistic regression model:

• Odds ratio for acupuncture: 0.71 (CI = 0.53-0.95)
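The raw odds ratio can be approximately reproduced from the two re-admission percentages. The sketch below uses the rounded figures from the slide, which give about 0.39. The reported 0.40 presumably comes from the unrounded counts, which are not shown here.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

p_acupuncture = 0.18  # re-admitted among 1,104 acupuncture cases
p_residential = 0.36  # re-admitted among 6,907 residential detox cases

raw_or = odds(p_acupuncture) / odds(p_residential)
print(round(raw_or, 2))  # 0.39 with these rounded inputs
```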

Page 17: Analyzing Observational Data: Focus on Propensity Scores

What’s the Worry? How Do We Address It?

• Given how different the two groups are, can we trust a model to correctly estimate the effect of acupuncture?

• PS methods generalize (long-standing) matching-within-strata methods that work well with 1 or 2 predictors

• PS can address imbalances in many important predictors simultaneously

• Both traditional and PS matching allow for

– A pooled estimate (across all strata) or

– When N is large enough, stratum-specific estimates

Page 18: Analyzing Observational Data: Focus on Propensity Scores

Propensity Score Application

• Use stepwise logistic regression to build a model to predict whether a client “is exposed” (i.e., receives acupuncture)

• Select sub-samples of exposed and non-exposed with similar distributions of the “propensity score” (predicted probability of being exposed)

• Model (as before) on the sub-sample
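The slides do not say exactly how the sub-samples were selected. One common way to get exposed and non-exposed groups with similar PS distributions is greedy 1:1 nearest-neighbor matching within a caliper, sketched below. The function name greedy_match, the caliper of 0.05, and the example scores are hypothetical.

```python
def greedy_match(exposed_ps, unexposed_ps, caliper=0.05):
    """Pair each exposed score with the closest unused unexposed score,
    keeping the pair only if the scores differ by at most `caliper`.
    Returns a list of (exposed_index, unexposed_index) pairs."""
    available = set(range(len(unexposed_ps)))
    pairs = []
    for i, ps in enumerate(exposed_ps):
        if not available:
            break
        # Closest remaining unexposed observation on the propensity score.
        j = min(available, key=lambda k: abs(unexposed_ps[k] - ps))
        if abs(unexposed_ps[j] - ps) <= caliper:
            pairs.append((i, j))
            available.remove(j)  # each unexposed observation is used once
    return pairs

# Hypothetical scores: the last exposed client has no close counterpart.
exposed_ps = [0.10, 0.35, 0.60, 0.95]
unexposed_ps = [0.05, 0.12, 0.33, 0.58, 0.70]
pairs = greedy_match(exposed_ps, unexposed_ps)
```

In this toy example the exposed observation with a score of 0.95 goes unmatched, which mirrors how only some of the exposed clients in a real data set can be matched.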

Page 19: Analyzing Observational Data: Focus on Propensity Scores

Sampling Results

• Able to match 740 who received acupuncture (out of 1,104) with 740 people who did not (out of 6,907)

• The risk factors in this subsample of 1,480 are much more balanced between the two groups

Page 20: Analyzing Observational Data: Focus on Propensity Scores

Characteristics of Clients in Subsample (vs. Full Sample)

Characteristic              Residential   Acupuncture
College graduate            7% (4%)       7% (13%)
Employed                    41% (13%)     42% (57%)
Private insurance           9% (3%)       6% (15%)
Lives with child or adult   72% (55%)     77% (76%)
Lives in shelter            5% (30%)      4% (3%)
Prior mental health Rx      21% (12%)     21% (28%)

Page 21: Analyzing Observational Data: Focus on Propensity Scores

Comparing Standard and Propensity Score Findings

From the multivariable model fit to all cases:

Odds ratio for acupuncture: 0.71

95% confidence interval: 0.53-0.95

From the multivariable model fit to the more comparable sub-sample:

OR for acupuncture: 0.61

95% CI: 0.39-0.94
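One way to compare the precision of the two estimates is to back out an approximate standard error of the log odds ratio from each reported interval. The sketch below assumes the intervals are Wald-type, that is, symmetric on the log scale with a 1.96 multiplier. The slides do not state how the intervals were computed, so this is only an approximation.

```python
import math

def log_or_se(lower, upper, z=1.96):
    """Approximate standard error of log(OR) from a 95% CI, assuming the
    interval is symmetric on the log scale (Wald-type)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

se_full = log_or_se(0.53, 0.95)  # model fit to all cases
se_sub = log_or_se(0.39, 0.94)   # model fit to the matched sub-sample
```

Under that assumption the sub-sample estimate is less precise (a standard error of roughly 0.22 against 0.15), as expected when far fewer cases are analyzed.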

Page 22: Analyzing Observational Data: Focus on Propensity Scores

Summary

• In this case, results were similar - Why? Original model was very good (C-statistic = 0.96)

• What we learned from the PS analysis:

– Could find a subset of (about 10% of) patients who got residential detox who look very similar to those who got acupuncture

– Skeptics were more receptive to findings from the PS analysis

Page 23: Analyzing Observational Data: Focus on Propensity Scores

Which X’s Belong in the PS Model?

The goal is to estimate the effect of exposure E on outcome Y

• Confounders (Brookhart’s X1 variables)?

– Directly affect both E and Y

• Simple predictors (X2s)?

– Affect Y but not E

• Simple selectors (X3s)?

– Affect E but not Y

Page 24: Analyzing Observational Data: Focus on Propensity Scores

Example

The goal is to estimate the effect of E = CABG surgery on Y = 30-day mortality following admission for a heart attack

– Confounder (e.g., disease severity)

– Simple predictors (e.g., home support)

– Simple selectors, aka “instrumental variables” (e.g., random assignment)

Page 25: Analyzing Observational Data: Focus on Propensity Scores

Variable type     Directly affects               Belongs in which model
                  Outcome (Y)    Exposure (E)    PS     Subsequent regression
X1 Confounder     1              1               Yes    Yes
X2 Predictor      1              0               ?      Yes
X3 Selector       0              1               No     ?

? = inclusion should neither harm nor help

Page 26: Analyzing Observational Data: Focus on Propensity Scores

Discussion

• The “pre-processing” that occurs when sub-sampling to create “PS-balanced” comparison groups protects against bias from confounding variables

• Putting selector variables in the PS model will hurt accuracy (by reducing the numbers of good matches) without making the groups more comparable

• Subsequent regression improves accuracy