Analyzing Observational Data: Focus on Propensity Scores

Arlene Ash

QMC - Third Tuesday, September 21, 2010 (as amended, Sept 23)




The Problem

• Those with the intervention and those without have markedly different values for important measured risk factors &

• Outcome is related to the risk factors that are imbalanced between the groups &

• It is not clear how the risk factors and outcome are related

• Why may standard analyses be misleading?

[Figure: True and Modeled Relationship Between Risk and Outcome. Horizontal axis: Risk (0 to 2.0); vertical axis: Outcome (0 to 1.0)]


Is Imbalance in Risk a Problem?

• If we correctly model the relationship between risk factors and outcome, we correctly estimate the effect of the intervention

• With many risk factors, it is hard to know whether the relationship between risk factors and outcome is correctly modeled

• Propensity score: a way to reduce the effect of imbalance in measured risk when models may be inadequate
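The concern on the slides above can be made concrete with a toy example (made-up data, not from the talk): the exposure has no effect at all, exposed observations sit at higher risk, and the outcome rises non-linearly with risk. `adjusted_effect` is a hypothetical helper; it computes the ordinary least-squares exposure coefficient through the pooled within-group slope, which is algebraically the same as regressing the outcome on an intercept, an exposure dummy, and the adjustment variable.

```python
# Toy illustration with made-up data (not from the talk): exposure has NO
# effect, exposed observations sit at higher risk, and the outcome depends
# on risk non-linearly (outcome = risk ** 3 in both groups).


def mean(values):
    return sum(values) / len(values)


def adjusted_effect(x0, y0, x1, y1):
    """Exposure coefficient c from the OLS fit y = a + b*x + c*E.

    Group 0 is unexposed (E = 0) and group 1 is exposed (E = 1). With an
    intercept and an exposure dummy, the OLS slope b is the pooled
    within-group slope, and c is the outcome gap left after adjusting for x.
    """
    mx0, my0, mx1, my1 = mean(x0), mean(y0), mean(x1), mean(y1)
    sxy = (sum((x - mx0) * (y - my0) for x, y in zip(x0, y0))
           + sum((x - mx1) * (y - my1) for x, y in zip(x1, y1)))
    sxx = (sum((x - mx0) ** 2 for x in x0)
           + sum((x - mx1) ** 2 for x in x1))
    b = sxy / sxx
    return (my1 - my0) - b * (mx1 - mx0)


n = 1000
# Evenly spaced risk values: unexposed spread over 0-1, exposed over 0.5-2.
risk0 = [(i + 0.5) / n for i in range(n)]
risk1 = [0.5 + 1.5 * (i + 0.5) / n for i in range(n)]
y0 = [r ** 3 for r in risk0]
y1 = [r ** 3 for r in risk1]

# Misspecified adjustment (linear in risk) vs. correct adjustment (risk ** 3).
risk_cubed0 = [r ** 3 for r in risk0]
risk_cubed1 = [r ** 3 for r in risk1]
effect_linear = adjusted_effect(risk0, y0, risk1, y1)
effect_correct = adjusted_effect(risk_cubed0, y0, risk_cubed1, y1)

print(f"linear-in-risk adjustment: {effect_linear:+.2f}")
print(f"cubic adjustment:          {effect_correct:+.2f}")
```

With these inputs the linear adjustment reports an exposure effect of about -0.4 even though the true effect is zero, while the cubic adjustment returns essentially 0. The bias comes entirely from a wrongly shaped model being applied over very different risk ranges in the two groups.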


Propensity Score Method (Key Idea)

• The propensity score (PS) for an observation is its probability, given its measured risk factors, of being “exposed” (i.e., of getting the intervention)

• Use the PS model to pre-process the data:
  – Draw a subsample in which the exposed and non-exposed groups are fairly balanced on risk factors, then
  – Use standard techniques to analyze the subsample
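In symbols, writing E for the exposure indicator, X for the vector of measured risk factors, and β for the coefficients of a logistic PS model (notation assumed here, not taken from the slides), the propensity score, a common model for it, and its standard balancing property are:

```latex
\begin{aligned}
e(x) &= \Pr(E = 1 \mid X = x) \\
\log \frac{e(x)}{1 - e(x)} &= \beta_0 + \beta^{\top} x \\
X &\perp\!\!\!\perp E \mid e(X)
\end{aligned}
```

The logistic form is one common modeling choice rather than a requirement. The last line says that among observations sharing the same score, the distribution of measured risk factors does not depend on exposure, which is why comparing exposed and unexposed observations with similar scores balances those factors.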


Simple Propensity Score Approach

• Estimate a model to predict the “probability of intervention/exposure”; this is “the propensity score”

• Divide the population into PS quintiles

• Create a subsample by taking equal numbers of exposed and unexposed observations from each quintile

• Use a subsequent regression model to estimate the effect of the intervention in the subsample
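A minimal sketch of the quintile-based sampling step, assuming the propensity scores have already been estimated. The function name `ps_stratified_sample`, its parameters, and the toy data are hypothetical. Quintiles are cut on the whole population by PS rank; within each quintile every member of the smaller group is kept, along with an equal-size random draw from the larger group.

```python
import random


def ps_stratified_sample(ps, exposed, n_strata=5, seed=0):
    """Return sorted indices of a PS-stratified, balanced subsample.

    ps      -- estimated propensity score for each observation
    exposed -- True/False exposure flag for each observation
    Observations are cut into n_strata equal-size groups by PS rank
    (quintiles when n_strata=5). Within each stratum, every member of the
    smaller group is kept, plus an equal-size random draw from the larger.
    """
    rng = random.Random(seed)
    n = len(ps)
    order = sorted(range(n), key=lambda i: ps[i])
    strata = [[] for _ in range(n_strata)]
    for rank, i in enumerate(order):
        strata[rank * n_strata // n].append(i)

    chosen = []
    for members in strata:
        cases = [i for i in members if exposed[i]]
        controls = [i for i in members if not exposed[i]]
        m = min(len(cases), len(controls))
        chosen += rng.sample(cases, m) + rng.sample(controls, m)
    return sorted(chosen)


# Hypothetical toy data: ten observations with made-up scores.
ps = [0.10, 0.15, 0.20, 0.30, 0.35, 0.50, 0.55, 0.70, 0.80, 0.90]
exposed = [False, False, True, False, True, True, False, True, True, True]
subsample = ps_stratified_sample(ps, exposed)
print(subsample)
```

In the toy data only the second and fourth strata contain both exposed and unexposed observations, so the subsample is indices 2, 3, 6 and 7; strata holding only one kind of observation contribute nothing. The subsequent regression would then be fit on those indices.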


Propensity Score Sampling Example

PS Quintile   # Cases   # Controls   # Sampled
Lowest        12        81           24
2nd           30        67           60
Middle        44        38           76
4th           53        15           30
Highest       78        8            16
Total         217       209          206
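The “# Sampled” column matches taking, in each quintile, equal numbers of cases and controls up to the smaller of the two counts, i.e. 2 × min(# Cases, # Controls). A quick check of that reading against the table's counts:

```python
# Counts from the table: (quintile, # cases, # controls)
table = [
    ("Lowest", 12, 81),
    ("2nd", 30, 67),
    ("Middle", 44, 38),
    ("4th", 53, 15),
    ("Highest", 78, 8),
]

# Each quintile contributes min(cases, controls) of each group.
sampled = {name: 2 * min(cases, controls) for name, cases, controls in table}
total_sampled = sum(sampled.values())

for name, cases, controls in table:
    print(f"{name:8s} cases={cases:3d} controls={controls:3d} sampled={sampled[name]:3d}")
print("total sampled:", total_sampled)  # 206, matching the table's Total row
```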


Propensity Score Sampling Example: Treatments for Drug Abusers

• Patients seeking substance abuse detoxification in Boston receive either
  – Residential detoxification: lasts ~ one week + encouragement for post-detox treatment, or
  – Acupuncture: acute (daily) detox + 3-6 months of maintenance with acupuncture and motivational counseling


Data

• From Boston’s publicly funded substance abuse treatment system

• All cases discharged from residential detox or acupuncture between 1/93 and 9/94

• Each client is classified (only once) as residential or acupuncture, based on the modality of the client's first discharge


Outcome

• Is client re-admitted to detox within 6 months? (Y/N)

• Study question: Are acupuncture clients more likely to be re-admitted than residential detox clients?
  – Exposure = assigned to acupuncture


Client Characteristics Available At Time Of Admission

• Gender
• Race/ethnicity
• Age
• Education
• Employment status
• Income
• Health insurance status
• Living situation
• Prior mental health treatment
• Primary drug
• Substance abuse treatment history


Residential Detox & Acupuncture Cases: % with Various Characteristics

Characteristic            Residential (n = 6,907)   Acupuncture (n = 1,104)
Gender: female            29                        33
Race/ethnicity: black     46                        46
  Hispanic                12                        10
  White                   41                        43
Education: HS grad        56                        59
  College graduate        4                         13


Characteristics of Residential Detox & Acupuncture Clients (2)

Characteristic (%)        Residential (n = 6,907)   Acupuncture (n = 1,104)
Employment: unemployed    86.8                      43.2
Insurance: uninsured      65.4                      52.3
  Medicaid                28.2                      21.2
  Private insurance       3.0                       15.4
Lives: with child         9.5                       19.3
  In shelter              30.3                      2.9


Characteristics of Residential Detox & Acupuncture Clients (3)

Characteristic (%)               Residential (n = 6,907)   Acupuncture (n = 1,104)
Prior mental health treatment    12.3                      27.8
Primary drug: alcohol            42.3                      32.4
  Cocaine                        16.2                      16.6
  Crack                          15.9                      20.2
  Heroin                         24.6                      19.0


Characteristics of Residential Detox & Acupuncture Clients (4)

Characteristic (%)                          Residential (n = 6,907)   Acupuncture (n = 1,104)
Substance abuse admits in the last year:
  Residential detox: 0                      56.7                      81.0
  Residential detox: 1                      20.2                      12.1
  Residential detox: 2+                     23.1                      7.0
  Short-term residential: 0                 76.2                      94.8
  Long-term residential: 0                  80.5                      93.5
  Outpatient: none                          80.6                      54.3
  Acupuncture: none                         95.9                      90.1


Results Of Standard Analysis

Percentage of clients re-admitted to detox within 6 months:

• Among 1,104 acupuncture cases, 18% re-admitted

• Among 6,907 residential detox cases, 36% re-admitted

• Raw odds ratio = 0.40

From a multivariable stepwise logistic regression model:

• Odds ratio for acupuncture: 0.71 (95% CI = 0.53-0.95)
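As a check on the raw odds ratio, converting the two rounded re-admission percentages to odds and taking their ratio gives about 0.39; the small gap from the reported 0.40 is consistent with rounding of the two percentages.

```python
def odds(p):
    """Convert a proportion to odds."""
    return p / (1 - p)


p_acupuncture = 0.18  # re-admitted within 6 months (rounded, from the slide)
p_residential = 0.36

raw_or = odds(p_acupuncture) / odds(p_residential)
print(f"raw odds ratio = {raw_or:.2f}")
```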


What’s the Worry? How Do We Address It?

• Given how different the two groups are, can we trust a model to correctly estimate the effect of acupuncture?

• PS methods generalize (long-standing) matching-within-strata methods that work well with 1 or 2 predictors

• PS can address imbalances in many important predictors simultaneously

• Both traditional and PS matching allow for
  – A pooled estimate (across all strata), or
  – When N is large enough, stratum-specific estimates
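The slides do not say how the pooled estimate is computed. One standard choice for a binary outcome such as re-admission is the Mantel-Haenszel pooled odds ratio, sketched below with made-up counts for two strata; the stratum-specific estimates are simply the per-stratum odds ratios. The helper names are hypothetical.

```python
def stratum_or(a, b, c, d):
    """Odds ratio within a single 2x2 stratum."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across 2x2 strata.

    Each stratum is (a, b, c, d):
      a = exposed with outcome,    b = exposed without outcome,
      c = unexposed with outcome,  d = unexposed without outcome.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


# Made-up counts for two PS strata, each with a within-stratum odds ratio of 2.
strata = [(20, 10, 10, 10), (40, 20, 5, 5)]
stratum_estimates = [stratum_or(*s) for s in strata]
pooled = mantel_haenszel_or(strata)
print("stratum-specific:", stratum_estimates)
print(f"pooled (Mantel-Haenszel): {pooled:.3f}")
```

When every stratum has the same odds ratio, as in this toy input, the pooled estimate equals that common value; with differing strata it is a weighted compromise, with larger strata counting more.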


Propensity Score Application

• Use stepwise logistic regression to build a model to predict whether a client “is exposed” (i.e., receives acupuncture)

• Select sub-samples of exposed and non-exposed with similar distributions of the “propensity score” (predicted probability of being exposed)

• Model (as before) on the sub-sample
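A minimal sketch of fitting a PS model, under simplifying assumptions: a fixed set of covariates (no stepwise selection), plain gradient ascent on the logistic log-likelihood instead of a statistics package, and one made-up binary covariate. `fit_logistic` and `predict` are hypothetical helpers. With a single binary covariate, the fitted scores should reproduce the observed exposure rate in each covariate group.

```python
import math


def predict(beta, x):
    """Predicted probability of exposure (the propensity score) for covariates x."""
    z = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, e, lr=1.0, n_iter=5000):
    """Fit logit P(E = 1 | x) = b0 + b1*x1 + ... by gradient ascent on the
    mean log-likelihood. X is a list of covariate lists; e is a list of 0/1."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)  # beta[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for x, y in zip(X, e):
            resid = y - predict(beta, x)
            grad[0] += resid
            for j, xj in enumerate(x, start=1):
                grad[j] += resid * xj
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Made-up data: one binary covariate, unemployed (1) vs. employed (0).
# Among 10 employed clients 6 are exposed; among 10 unemployed clients 2 are.
X = [[0]] * 10 + [[1]] * 10
e = [1] * 6 + [0] * 4 + [1] * 2 + [0] * 8
beta = fit_logistic(X, e)
ps_employed = predict(beta, [0])    # should approach 6/10
ps_unemployed = predict(beta, [1])  # should approach 2/10
print(f"PS if employed: {ps_employed:.3f}, if unemployed: {ps_unemployed:.3f}")
```

In practice a statistics package's logistic regression routine would do the fitting; the point here is only that each client's propensity score is the model's predicted probability of exposure.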


Sampling Results

• Able to match 740 who received acupuncture (out of 1,104) with 740 people who did not (out of 6,907)

• The risk factors in this subsample of 1,480 are much more balanced between the two groups
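The slides do not specify the matching algorithm. One common approach, sketched here as an assumption, is greedy 1:1 nearest-neighbor matching on the PS within a caliper. Exposed units with no sufficiently close control are left unmatched, which is one way a study ends up matching only part of its exposed group (here, 740 of 1,104). The function name, the highest-PS-first ordering, and the caliper of 0.05 are illustrative choices.

```python
def greedy_match(ps_exposed, ps_unexposed, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on the PS, without replacement.

    Returns (exposed_index, unexposed_index) pairs whose scores differ by at
    most `caliper`. Exposed units with no available control inside the
    caliper stay unmatched.
    """
    available = set(range(len(ps_unexposed)))
    pairs = []
    # Visit exposed units from highest to lowest PS (one common convention).
    for i in sorted(range(len(ps_exposed)), key=lambda k: -ps_exposed[k]):
        if not available:
            break
        j = min(available, key=lambda k: abs(ps_unexposed[k] - ps_exposed[i]))
        if abs(ps_unexposed[j] - ps_exposed[i]) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


# Hypothetical scores: the exposed unit with PS 0.9 has no close control.
pairs = greedy_match([0.9, 0.6, 0.3], [0.1, 0.28, 0.62, 0.65])
print(pairs)
```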


Characteristics of Clients in Subsample (vs. Full Sample)

Subsample percentages, with full-sample percentages in parentheses:

Characteristic               Residential   Acupuncture
College graduate             7% (4%)       7% (13%)
Employed                     41% (13%)     42% (57%)
Private insurance            9% (3%)       6% (15%)
Lives with child or adult    72% (55%)     77% (76%)
Lives in shelter             5% (30%)      4% (3%)
Prior mental health Rx       21% (12%)     21% (28%)
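The slides compare percentages directly. A widely used summary of the same comparison is the standardized difference for a binary characteristic, (p1 - p0) / sqrt((p1(1 - p1) + p0(1 - p0)) / 2); as a rule of thumb, values below about 0.1 are often read as good balance. Applied to the employment figures (about 13% vs. 57% employed in the full sample, 41% vs. 42% in the subsample), it shows the shift. The helper name is hypothetical.

```python
import math


def std_diff_binary(p_exposed, p_unexposed):
    """Standardized difference between two proportions."""
    pooled_sd = math.sqrt((p_exposed * (1 - p_exposed)
                           + p_unexposed * (1 - p_unexposed)) / 2)
    return (p_exposed - p_unexposed) / pooled_sd


# Share employed, acupuncture (exposed) vs. residential, from the slides.
full = std_diff_binary(0.57, 0.13)
sub = std_diff_binary(0.42, 0.41)
print(f"full sample: {full:.2f}, matched subsample: {sub:.2f}")
```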


Comparing Standard and Propensity Score Findings

From the multivariable model fit to all cases:
• Odds ratio for acupuncture: 0.71
• 95% confidence interval: 0.53-0.95

From the multivariable model fit to the more comparable subsample:
• OR for acupuncture: 0.61
• 95% CI: 0.39-0.94
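The two intervals can be compared on the log-odds scale. Assuming each 95% CI is a Wald-type interval, symmetric on the log scale (an assumption; the slides do not say), the point estimate is the geometric midpoint of the interval and the standard error is the log-width divided by 2 × 1.96. The helper name `or_summary` is hypothetical.

```python
import math

Z = 1.959964  # standard-normal quantile for a two-sided 95% interval


def or_summary(lower, upper, z=Z):
    """Back out the point estimate and log-scale standard error from a
    95% CI for an odds ratio, assuming symmetry on the log scale."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    point = math.exp((log_lo + log_hi) / 2)  # geometric midpoint of the CI
    se = (log_hi - log_lo) / (2 * z)
    return point, se


full_point, full_se = or_summary(0.53, 0.95)
sub_point, sub_se = or_summary(0.39, 0.94)
print(f"all cases: OR ~ {full_point:.2f}, SE(log OR) ~ {full_se:.3f}")
print(f"subsample: OR ~ {sub_point:.2f}, SE(log OR) ~ {sub_se:.3f}")
```

The recovered point estimates (about 0.71 and 0.61) match the slides, and the subsample's log-scale standard error is roughly 1.5 times the full-sample one, the precision cost of analyzing 1,480 clients instead of 8,011.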


Summary

• In this case, the results were similar. Why? The original model was very good (C-statistic = 0.96)

• What we learned from the PS analysis:
  – We could find a subset (about 10%) of the patients who got residential detox who look very similar to those who got acupuncture
  – Skeptics were more receptive to findings from the PS analysis
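The C-statistic quoted above measures how well a model's predicted probabilities separate observations with and without the modeled event: it is the share of (event, non-event) pairs in which the event has the higher prediction, with ties counted as one half (equivalently, the area under the ROC curve). A direct pairwise sketch, with a hypothetical helper name and toy data:

```python
def c_statistic(scores, outcomes):
    """Concordance (C) statistic for predicted scores and 0/1 outcomes."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for s_event in events:
        for s_non in non_events:
            if s_event > s_non:
                concordant += 1.0   # event ranked above non-event
            elif s_event == s_non:
                concordant += 0.5   # ties count one half
    return concordant / (len(events) * len(non_events))


print(c_statistic([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

A value of 0.5 means the predictions separate the groups no better than chance and 1.0 means perfect separation. The pairwise loop is quadratic in sample size, which is fine for illustration but slow for large datasets.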


Which X’s Belong in the PS Model?

The goal is to estimate the effect of exposure E on outcome Y

• Confounders (Brookhart’s X1 variables)?
  – Directly affect both E and Y

• Simple predictors (X2s)?
  – Affect Y but not E

• Simple selectors (X3s)?
  – Affect E but not Y


Example

The goal is to estimate the effect of E = CABG surgery on Y = 30-day mortality following admission for a heart attack
  – Confounder (e.g., disease severity)
  – Simple predictors (e.g., home support)
  – Simple selectors, aka “instrumental variables” (e.g., random assignment)


Variable type    Directly affects             Belongs in which model
                 Outcome (Y)   Exposure (E)   PS    Subsequent regression
X1 Confounder    1             1              Yes   Yes
X2 Predictor     1             0              ?     Yes
X3 Selector      0             1              No    ?

? = inclusion should neither harm nor help


Discussion

• The “pre-processing” that occurs when subsampling to create “PS-balanced” comparison groups protects against bias from measured confounding variables

• Putting selector variables in the PS model will hurt accuracy (by reducing the number of good matches) without making the groups more comparable

• Subsequent regression improves accuracy