Simulation of Propensity Scoring Methods

Dee H. Wu, Ph.D

University of Oklahoma Health Sciences Center, Oklahoma City, OK

Page 2:

WHAT IS PROPENSITY SCORING?

• In certain clinical trials or observational studies, proper random assignment of treatment and control groups is not always possible, so selection bias may become an issue.

• Efforts to address nonrandom assignment include a class of methods known as 'propensity scoring' (Rosenbaum and Rubin, 1983), which aim to reduce bias in the estimation of treatment effects when assignment is not random.

Page 3:

But what about ANCOVA as an alternative?

[Figure: scatter plot of y versus x1, with points marked by Treat (0 or 1)]

What about the common problem of heterogeneity of regression?

Page 4:

THEORY SECTION: Problem Statement in Pictures

[Figure sequence: Y = outcome plotted against the covariates, built up in steps]

• Some covariates are highly correlated with treatment assignment.

• Together they form a vector that is highly correlated with treatment assignment.

• For simplicity in drawing, we show just two composite vectors: a set that highly impacts the treatment selection and one that doesn't. (We actually keep multiple covariates in the problem.)

• Y = outcome is disregarded for the moment.

• Now generate a propensity score (a scalar function). [Figure: values between 0 and 1 plotted along a horizontal axis running from 0 to 450]

• Group (match) the observations into strata I, II, III, IV, and V, in order of increasing probability of treatment.

• Now we can do our analysis on groups.

Page 5:

[Figure: Y = outcome plotted against X = covariates]

Treatment may not be random on any of these.

But in a quasi-experiment, some of these variables will produce a greater likelihood of being in the treatment group (the green ones).

An experiment needs:

1. a hypothesis for a causal relationship;
2. a control group and a treatment group;
3. to eliminate confounding variables that might mess up the experiment and prevent displaying the causal relationship; and
4. to have larger groups with a carefully sorted constituency, preferably randomized, in order to keep accidental differences from fouling things up.

It is a quasi-experiment when we don't have all of the conditions above.
http://writing.colostate.edu/guides/research/experiment/pop3e.cfm

Page 6:

[Figure: Y = outcome plotted against X = covariates]

Treatment may not be random on any of these.

But in a quasi-experiment, some of these variables will produce a greater likelihood of being in the treatment group (the green ones).

Components of collapsed exposure variables that have less influence over the ability to be in the group that gets treatment.

The green ones are the components that influence the likelihood of being in the treatment group.

Page 7:

[Figure: Y = outcome plotted against X = covariates]

Treatment may not be random on any of these.

This is no good, because ANCOVA-like methods only help push distributions along the covariate axis, and they have strict requirements of LINEARITY and HOMOGENEITY OF REGRESSION!

Components of collapsed exposure variables that have less influence over the ability to be in the group that gets treatment.

The green ones are the components that influence the likelihood of being in the treatment group.

[Figure: scatter plot of y versus x1, with points marked by Treat (0 or 1). Annotation: slide to the means along the regression axis.]

Page 8:

[Figure: Y = outcome plotted against X = covariates]

Treatment may not be random on any of these.

Propensity methods provide a new composite variable to identify groups of variables that have similar covariate patterns. (I don't know if this is a derived variable, i.e. a linear combination, or the original covariates.)

In any case, the GREEN ones are used to generate the propensity score, and stratification with these variables will help to reduce covariate imbalance in the analysis (i.e., because these variables predict the likelihood of treatment, those that have a similar likelihood will get matched under stratification).

Components of collapsed covariates that have less influence over the ability to be in the group that gets treatment.

The green ones are the components that influence the likelihood of being in the treatment group.

Along the blue ones, the likelihood of being in the treatment group is more random.

Page 9:

Y = outcome is IGNORED.

X = covariates (green and blue).

Make a new axis for analysis, which is the treatment group, and ignore the output axis. Get the probabilities of treatment from the logistic regression analysis: $\operatorname{logit} P(T = A) = \sum_i a_i X_i$.

[Figure: probability of being in Treatment A (0 to 1) plotted against observations (0 to 450), with Treatment A and Treatment B marked]

Page 10:

Y = outcome is IGNORED.

X = covariates (green and blue).

Make a new axis for analysis, which is the treatment group, and ignore the output axis.

Get the probabilities of treatment from the logistic regression analysis: $\operatorname{logit} P(T = A) = \sum_i a_i X_i$.

[Figure: probability of being in Treatment A (0 to 1) plotted against observations (0 to 450), with Treatment A and Treatment B marked and strata I, II, III, IV, and V overlaid]

Stratify the set into groups (maybe quintiles?).

NOTE: EACH STRATUM WILL HAVE SOME MEMBERS who got treatment A and some who got treatment B.

Page 11:

Y = outcome

X = covariates (green and blue)

Probability of being in Treatment A

G = stratify the set into groups (maybe quintiles?)

WITHIN EACH STRATUM, do the analysis. (Is this done as a grouped GLM? What is the model statement for this?)

model Y = T Q Q*T; test for the interaction and then remove it.

Variables:
T is the treatment (A or B)
X are the covariates
Y is the outcome
G is the stratum group (written Q in the model statement)

Page 12:

Some Examples (Mixtures based on one single covariate)

[Figure: four example distributions, in panels numbered 1 to 4]

Page 13:

proc corr data=in;
  var y x1 x2 x3 x4 propensity;
run;

Pearson Correlation Coefficients, N = 500
Prob > |r| under H0: Rho=0

                   y        x1        x2        x3        x4  propensity

y            1.00000   0.61654   0.27374   0.47571   0.59396     0.66762
                        <.0001    <.0001    <.0001    <.0001      <.0001

x1           0.61654   1.00000   0.03578   0.35222   0.02003     0.92131
              <.0001              0.4247    <.0001    0.6550      <.0001

x2           0.27374   0.03578   1.00000   0.00116   0.02229     0.22981
              <.0001    0.4247              0.9794    0.6190      <.0001

x3           0.47571   0.35222   0.00116   1.00000  -0.00268     0.38666
              <.0001    <.0001    0.9794              0.9523      <.0001

x4           0.59396   0.02003   0.02229  -0.00268   1.00000     0.05991
              <.0001    0.6550    0.6190    0.9523                0.1810

propensity   0.66762   0.92131   0.22981   0.38666   0.05991     1.00000
              <.0001    <.0001    <.0001    <.0001    0.1810

Page 14:

3. Calculate a propensity score based on a linear combination (weights sum to 1) of the covariates x1 to x4, along with a small random normal error term. The variable 'propensity' was generated by a linear combination of the covariates, but it could have been generated by any function. (I just made up a linear relationship from the covariates.)

propensity = 0.79*x1 + 0.15*x2 + 0.05*x3 + 0.01*x4 + 0.3*Normal(0,1);
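The step above can be sketched in Python as a stand-in for the slide's SAS-style expression. The weights and the 0.3 noise scale come from the slide; drawing the four covariates as independent standard normals, the sample size of 500, and the seed are assumptions made only so the sketch runs on its own.

```python
import random

# Weights taken from the slide; they sum to 1.
WEIGHTS = (0.79, 0.15, 0.05, 0.01)
NOISE_SD = 0.3  # scale of the random normal error term on the slide


def propensity_score(x, rng):
    """Linear combination of the four covariates plus normal noise."""
    signal = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return signal + NOISE_SD * rng.gauss(0.0, 1.0)


rng = random.Random(2007)  # arbitrary seed, for reproducibility only
# Assumption: covariates are independent standard normals.
covariates = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(500)]
propensity = [propensity_score(x, rng) for x in covariates]
```

Because x1 carries a weight of 0.79, it dominates the score, which is consistent with the high x1 correlation reported in the PROC CORR output.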

Page 15:

4. Assign each observation to a treatment group based on the propensity score. Note that we use a standard random normal probability with mean 0 and std = 1 to assign the treatment group (i.e., x1 has a large influence on setting the treatment group).

Note that we used the inverse probit transformation to control the treatment/control ratio (proportions) and to generate the proper threshold for classifying the treatment covariate.
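The exact assignment rule is not preserved on the slide, so the following Python sketch is only one plausible reading: the inverse probit (standard normal quantile) turns a desired treated fraction into a cutoff on the propensity score, and observations above the cutoff are treated. The function name `assign_treatment`, the stand-in scores, and the 50% target are all assumptions.

```python
import random
from statistics import NormalDist, mean, stdev


def assign_treatment(propensity, treated_fraction):
    """Return 0/1 treatment flags; scores above the cutoff are treated."""
    # Inverse probit: the standard normal quantile that leaves
    # `treated_fraction` of the probability mass above it.
    z_cut = NormalDist().inv_cdf(1.0 - treated_fraction)
    # Assumption: the scores are roughly normal, so the cutoff is placed
    # z_cut standard deviations from their mean.
    cutoff = mean(propensity) + z_cut * stdev(propensity)
    return [1 if p > cutoff else 0 for p in propensity]


rng = random.Random(2007)  # arbitrary seed
# Stand-in propensity scores, for illustration only.
propensity = [rng.gauss(0.0, 1.0) for _ in range(500)]
treat = assign_treatment(propensity, treated_fraction=0.5)
share = sum(treat) / len(treat)  # realized treated proportion
```

Raising or lowering `treated_fraction` moves the cutoff, which is how the treatment/control ratio would be controlled under this reading.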

Page 16:

5. Create the observation vector Y for each of the test cases:

Y = a1*x1 + a2*x2 + a3*x3 + a4*x4 + b*T + c1*T*x1 + c2*T*x2 + c3*T*x3 + c4*T*x4 + 0.4*Norm(0,1)

where a1=0.28, a2=0.13, a3=0.23, a4=0.352; b=0.4; c1=c2=c3=0.2, c4=0.02
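The outcome equation above translates directly into code. This Python sketch uses the coefficients from the slide; the function name `outcome` and the way the random generator is passed in are illustrative choices, not the author's SAS.

```python
import random

A = (0.28, 0.13, 0.23, 0.352)  # main-effect coefficients a1..a4
B = 0.4                        # treatment coefficient b
C = (0.2, 0.2, 0.2, 0.02)      # treatment-by-covariate coefficients c1..c4
NOISE_SD = 0.4                 # scale of the Norm(0,1) error term


def outcome(x, t, rng):
    """Y for one observation with covariates x = (x1..x4) and treatment t (0 or 1)."""
    main = sum(a * xi for a, xi in zip(A, x))
    interaction = t * sum(c * xi for c, xi in zip(C, x))
    return main + B * t + interaction + NOISE_SD * rng.gauss(0.0, 1.0)


rng = random.Random(2007)  # arbitrary seed
y_control = outcome([0.5, -0.2, 1.0, 0.3], 0, rng)
y_treated = outcome([0.5, -0.2, 1.0, 0.3], 1, rng)
```

With all covariates at zero, the expected treated-minus-control difference is exactly b = 0.4, which is the value the later estimates are compared against.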

Page 17:

[Figure: scatter plot of y versus x1, with points marked by Treat (0 or 1); the treated group is shown in red]

[Figure: scatter plot of y versus x2, with points marked by Treat (0 or 1); low dependence on x2]

The treatment and control groups are plotted against x1. In many applications, several covariates may be associated with treatment assignment, not just a single one like x1. x1 can represent a 'modeled' linear combination of covariates. x2 shows more randomness.

Page 18:

The Model (we evaluate up to first-order interactions)

Include interaction terms

Page 19:
Page 20:
Page 21:

Now for the SAS

Page 22:

Generate random distributions (this is a bimodal one)
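The slide's generating code is not preserved, so here is a minimal Python sketch of one common way to get a bimodal covariate: a two-component normal mixture. The mode locations, spread, mixing weight, and the function name `bimodal_sample` are all assumptions chosen for illustration.

```python
import random


def bimodal_sample(n, rng, centers=(-1.5, 1.5), sd=0.5, weight=0.5):
    """Draw n values from a mixture of two normals with the given centers."""
    values = []
    for _ in range(n):
        # Pick the first component with probability `weight`.
        center = centers[0] if rng.random() < weight else centers[1]
        values.append(rng.gauss(center, sd))
    return values


rng = random.Random(2007)  # arbitrary seed
x1 = bimodal_sample(500, rng)
```

Changing `weight` away from 0.5 or using unequal spreads gives the lopsided shapes that the deck describes as asymmetric.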

Page 23:

Actually, I had used another program to generate these distributions and then imported them as tab-delimited text files.

Tab-delimited: the ASCII code for tab is '09'.

Page 24:

[Figure: scatter plot of y versus x2, with points marked by Treat (0 or 1)]

Page 25:

Distribution of Y versus confounder x1 for dataset 4
13:25 Friday, September 21, 2007

The MEANS Procedure

Analysis Variable : y

Treat   N Obs         Mean      Std Dev
0         311   -0.2349022    0.4766791
1         189    0.6163011    0.5493193

Page 26:

[Figure: values between 0 and 1 plotted along a horizontal axis running from 0 to 450]

PROC LOGISTIC options:

EVENT='category' | keyword
specifies the event category for the binary response model. PROC LOGISTIC models the probability of the event category.

OUT=SAS-data-set
names the output data set. If you omit the OUT= option, the output data set is created and given a default name using the DATAn convention.

PREDPROBS=
requests individual, cumulative, or cross validated predicted probabilities.

P=name
names the variable containing the predicted probabilities. For the events/trials syntax or single-trial syntax with binary response, it is the predicted event probability.
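The PROC LOGISTIC call itself is not preserved, so the following is a Python stand-in rather than the author's code: a minimal logistic regression with one covariate, fitted by plain gradient ascent, that produces a predicted treatment probability for each observation (the role played by the variable named with P=, called phat later in the deck). The simulated data, learning rate, and step count are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, t, steps=500, rate=0.5):
    """Fit P(t=1 | x) = sigmoid(b0 + b1*x) by gradient ascent
    on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, ti in zip(x, t):
            resid = ti - sigmoid(b0 + b1 * xi)
            g0 += resid
            g1 += resid * xi
        b0 += rate * g0 / n
        b1 += rate * g1 / n
    return b0, b1


rng = random.Random(2007)  # arbitrary seed
# Assumption: treatment becomes more likely as x grows, with some noise.
x = [rng.gauss(0.0, 1.0) for _ in range(400)]
t = [1 if xi + rng.gauss(0.0, 1.0) > 0 else 0 for xi in x]

b0, b1 = fit_logistic(x, t)
phat = [sigmoid(b0 + b1 * xi) for xi in x]  # estimated propensity scores
```

A real analysis would include all covariates and use a tested routine; the single-covariate loop is only meant to show where the predicted probabilities come from.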

Page 27:

This is the naive approach
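The code for this slide is not preserved. Assuming the naive approach means comparing raw group means without any adjustment, this Python sketch shows why it misleads when a covariate drives both treatment and outcome. The coefficients 0.28 and 0.4 echo a1 and b from the deck; the treatment rule, the single covariate, and the sample size are simplifying assumptions.

```python
import random
from statistics import mean

TRUE_EFFECT = 0.4  # treatment coefficient b from the deck

rng = random.Random(2007)  # arbitrary seed
rows = []
for _ in range(5000):
    x1 = rng.gauss(0.0, 1.0)
    # Assumption: treatment is much more likely when x1 is large.
    t = 1 if x1 + 0.3 * rng.gauss(0.0, 1.0) > 0 else 0
    y = 0.28 * x1 + TRUE_EFFECT * t + 0.4 * rng.gauss(0.0, 1.0)
    rows.append((t, y))

# Naive estimate: difference of raw group means, ignoring x1.
naive = (mean(y for t, y in rows if t == 1)
         - mean(y for t, y in rows if t == 0))
```

Because treated observations also tend to have larger x1, the naive difference absorbs part of the x1 effect and overstates the treatment effect.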

Page 28:

Rank the variable phat, making the new variable psquintile.
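The ranking step can be sketched in Python as follows. The label "Rank for Variable phat" in the later output suggests a rank-based grouping into five levels numbered 0 to 4; the helper name `quintile_ranks` and its handling of ties (by sort order) are assumptions and may differ from what the SAS procedure does.

```python
def quintile_ranks(scores, groups=5):
    """Return a group index 0..groups-1 for each score,
    with the lowest scores in group 0."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for position, i in enumerate(order):
        ranks[i] = position * groups // len(scores)
    return ranks


phat = [0.9, 0.1, 0.5, 0.3, 0.7]
psquintile = quintile_ranks(phat)
```

With 500 observations this yields five groups of 100, matching the row totals in the frequency table of psquintile by Treat.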

Page 29:

Distribution of Y versus confounder x1 for dataset 1
10:31 Sunday, September 23, 2007

The FREQ Procedure

Table of psquintile by Treat

psquintile (Rank for Variable phat)     Treat

Frequency |
Percent   |
Row Pct   |
Col Pct   |       0 |       1 |   Total
----------+---------+---------+
0         |     100 |       0 |     100
          |   20.00 |    0.00 |   20.00
          |  100.00 |    0.00 |
          |   40.00 |    0.00 |
----------+---------+---------+
1         |      92 |       8 |     100
          |   18.40 |    1.60 |   20.00
          |   92.00 |    8.00 |
          |   36.80 |    3.20 |
----------+---------+---------+
2         |      47 |      53 |     100
          |    9.40 |   10.60 |   20.00
          |   47.00 |   53.00 |
          |   18.80 |   21.20 |
----------+---------+---------+
3         |      10 |      90 |     100
          |    2.00 |   18.00 |   20.00
          |   10.00 |   90.00 |
          |    4.00 |   36.00 |
----------+---------+---------+
4         |       1 |      99 |     100
          |    0.20 |   19.80 |   20.00
          |    1.00 |   99.00 |
          |    0.40 |   39.60 |
----------+---------+---------+
Total           250       250       500

Source             DF    Type III SS    Mean Square   F Value   Pr > F
psquintile          4    49.49029688    12.37257422    117.18   <.0001
Treat               1     0.94311653     0.94311653      8.93   0.0029
psquintile*Treat    3     0.22326479     0.07442160

For X1

Source             DF    Type III SS    Mean Square   F Value   Pr > F
psquintile          4    49.49029688    12.37257422    117.18   <.0001
Treat               1     0.94311653     0.94311653      8.93   0.0029
psquintile*Treat    3     0.22326479     0.07442160

For X3

Page 30:

[Figure: Y = outcome plotted by quintile; stratum IV is labeled]

Page 31:

Quintile Method (no interaction)

Page 32:

Naïve Method

Page 33:

Quintile Method
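The SAS code for this slide is not preserved, and the ANOVA output elsewhere in the deck suggests the author fitted a model with psquintile and Treat as factors. As a simpler stand-in, this Python sketch computes the classic stratified estimate: the treated-minus-control mean difference within each quintile, averaged with weights proportional to stratum size. The function name and the tiny data set are hypothetical.

```python
from statistics import mean


def stratified_effect(rows):
    """rows: iterable of (stratum, treat, y) with treat in {0, 1}.
    Returns the size-weighted average of the within-stratum
    treated-minus-control mean differences."""
    by_stratum = {}
    for stratum, treat, y in rows:
        by_stratum.setdefault(stratum, {0: [], 1: []})[treat].append(y)
    total = 0.0
    weight = 0
    for groups in by_stratum.values():
        if not groups[0] or not groups[1]:
            continue  # no within-stratum comparison is possible
        n = len(groups[0]) + len(groups[1])
        total += n * (mean(groups[1]) - mean(groups[0]))
        weight += n
    if weight == 0:
        raise ValueError("no stratum contains both treated and control rows")
    return total / weight


# Hypothetical data: (stratum, treat, y).
rows = [(0, 0, 1.0), (0, 1, 1.5),
        (1, 0, 2.0), (1, 0, 2.2), (1, 1, 2.5), (1, 1, 2.7),
        (2, 1, 9.0)]  # stratum 2 has no controls and is skipped
effect = stratified_effect(rows)
```

Skipping strata without both groups matters in practice: in the frequency table of psquintile by Treat, the lowest quintile contains no treated observations at all.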

Page 34:

Surrogate phat for covariates

Page 35:

IPTW
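IPTW stands for inverse probability of treatment weighting. The slide's code is not preserved, so this Python sketch shows the standard normalized form as an assumption about what was intended: treated observations are weighted by 1/phat, controls by 1/(1 - phat), and the weighted outcome means are compared. The function name and example rows are hypothetical.

```python
def iptw_effect(rows):
    """rows: iterable of (treat, y, phat) with 0 < phat < 1.
    Returns the difference of the inverse-probability-weighted
    outcome means (treated minus control)."""
    num1 = den1 = num0 = den0 = 0.0
    for treat, y, phat in rows:
        if treat == 1:
            w = 1.0 / phat          # weight for a treated observation
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - phat)  # weight for a control observation
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0


# Hypothetical rows: (treat, y, phat).
rows = [(1, 2.0, 0.5), (1, 4.0, 0.5), (0, 1.0, 0.5), (0, 2.0, 0.5)]
effect = iptw_effect(rows)
```

When every phat equals 0.5, all weights are equal and the result reduces to the plain difference in means; the weights only change the answer when treatment probability varies across observations. Scores very close to 0 or 1 produce very large weights, which is a known practical weakness of the method.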

Page 36:

Results spew

FORMAT variable-1 <. . . variable-n> format

Page 37:

Results and Discussion:

Distribution  Description  Naive  Quintile Method  Full Model  Error Quint  Error Full  Delta Quint  Delta Full
1             lump         1.621  0.399            0.375       -0.3%        -6.3%       0.001        -0.025
2             lump         1.452  0.411            0.471       2.7%         17.8%       0.011        0.071
3             lump broad   1.133  0.324            0.378       -19.0%       -5.5%       -0.076       -0.022
4             asymmetric   1.642  0.67             0.28        67.5%        -30.0%      0.27         -0.12

[Figure: the four example distributions, in panels numbered 1 to 4]

Y = a1*x1 + a2*x2 + a3*x3 + a4*x4 + b*T + c1*T*x1 + c2*T*x2 + c3*T*x3 + c4*T*x4 + 0.4*Norm(0,1)

where a1=0.28, a2=0.13, a3=0.23, a4=0.352; b=0.4; c1=c2=c3=0.2, c4=0.02

Yellow highlighting marks the method with the better performance.
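The Delta and Error columns appear to measure each estimate against the true treatment coefficient b = 0.4: Delta is the estimate minus b, and Error is Delta as a percentage of b. That reading is an inference from the numbers rather than something the slide states. A short Python check, using the Quintile Method and Full Model estimates from the table:

```python
B_TRUE = 0.4  # treatment coefficient b used to generate Y

# distribution number: (quintile method estimate, full model estimate)
estimates = {
    1: (0.399, 0.375),
    2: (0.411, 0.471),
    3: (0.324, 0.378),
    4: (0.67, 0.28),
}


def delta_and_error(estimate, truth=B_TRUE):
    """Return (estimate - truth, percent error relative to truth)."""
    delta = estimate - truth
    return delta, 100.0 * delta / truth


for dist, (quint, full) in estimates.items():
    dq, eq = delta_and_error(quint)
    df, ef = delta_and_error(full)
    print(f"{dist}: quintile {dq:+.3f} ({eq:+.1f}%), full {df:+.3f} ({ef:+.1f}%)")
```

Under this reading the quintile method is closer to 0.4 for distributions 1 and 2, while the full model is closer for distributions 3 and 4.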

Page 38:

Discussion

• Note that in cases where the propensity scores behave "well" and cover a wide range of values, the propensity quintile method outperformed the full regression model:

  model y = treat x1 x2 x3 x4 treat*x1 treat*x2 treat*x3 treat*x4

• However, when the propensity scores are more asymmetrically distributed or mixed, the propensity method was less useful. As we taught ourselves, the efficacy of the propensity scoring technique depended on its ability to create a well-behaved function, and on the 'matching' technique. Matching can be improved with better classification schemes; we performed a simple quintile stratification.

Page 39:

And now for something... completely different

Page 40:

$Y_i = \alpha + \tau W_i + \beta' X_i + \varepsilon_i$

iiii ACXMY +++= 'τα

(see next page)

Page 41:

$Y_i = \alpha + \tau W_i + \beta' X_i + \varepsilon_i$

• $Z_i$ is an observed variable that affects selection into the treatment.

• $W_i = 1$ for individuals in the treated group; $W_i = 0$ for those in the control group.

Page 42:

Futures

• General boosting methods (in R), i.e., fancier matching methods.

• True multivariate distributions (to simulate covariates better, use a multivariate model). I think there are some IML routines out there; I didn't have time to play with them.

Page 43:

Conclusion:

• Our approach of demonstrating the propensity scoring technique by pictorial description rather than mathematical formulation was better received by faculty and students, helped our understanding, and built collaboration.

• This demonstration provided a tool that students and faculty can use to better understand the propensity technique.

• This work can be extended to the multiple varieties of discriminant/classification schemes available for matching. Regardless, users of the technique should beware of its pitfalls and challenges once they have understood the base method.