More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data

Page 1: More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data

This presentation contains confidential and proprietary information of Caremark and cannot be reproduced, distributed, or printed without written permission from Caremark.

©2006 Caremark. All rights reserved.

More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data: Adherence Dimension and Boosted Regression

M. Christopher Roebuck & Joshua N. Liberman

American Society of Health Economists
Inaugural Conference
Madison, Wisc.
June 6, 2006

Page 2: More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data

Caremark proprietary and confidential information. Not for distribution.

Predictive Modeling

- Forecasts health services utilization and costs of insurance plan members
- Identifies candidates for disease and therapy management interventions
- Infers disease state, severity and statistical/econometric methods employed, depending on the classification system used
- Includes well-known claims- and diagnosis-based “groupers,” such as the chronic disease score, adjusted clinical groups, diagnostic cost groups and episode risk groups

Page 3: More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data

Study Background

- Pharmacy claims are less costly and contain fewer coding errors than medical data
- Pharmacy health dimensions (PHD): pharmacy-based risk index that categorizes a year of prescription data into 62 disease indicators
- Previous study of PHD accuracy at predicting prospective total annual healthcare costs that used several econometric techniques to deal with skewness and kurtosis¹
- This study is an extension of that work

Source: 1. Powers, C.A., C.M. Meyer, M.C. Roebuck, and B. Vaziri. 2005. “Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data: A Comparison of Alternative Econometric Cost Modeling Techniques.” Medical Care 43(11): 1065-1072.

Page 4: More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data

Study Objectives

- Examine relationship between plan participant adherence to drug therapy and future total healthcare costs by augmenting PHD with an adherence dimension of predictors
- Evaluate use of boosted regression modeling as an alternative to other econometric approaches for predicting (commonly) skewed and kurtotic healthcare cost data

Page 5: More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data

About Adherence

- Adherence is defined as “the extent to which a person’s behavior – taking medication, following a diet and/or executing lifestyle changes – corresponds with agreed recommendations from a healthcare provider.”¹
- “Poor adherence to the treatment of chronic diseases is a worldwide problem of striking magnitude. The impact of poor adherence grows as the burden of chronic disease grows.”¹
- Adherence to drug therapy is measured as both:
  - Compliance: the extent to which a plan participant takes medicine as prescribed (e.g., medication possession ratio).
  - Persistence: the extent to which a plan participant follows the prescribed length of therapy (e.g., length of continuous therapy in days).

Source: 1. World Health Organization 2003. “Adherence to Long-term Therapies – Evidence for action.”
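
One common way to derive the two measures above from pharmacy fill records can be sketched as follows. This is an illustration only: the deck does not give the formulas actually used, so the record layout (fill date, days supplied), the cap of the medication possession ratio at 1.0, and the 30-day permissible gap are all assumptions.

```python
from datetime import date, timedelta


def medication_possession_ratio(fills, period_start, period_end):
    """Days supplied within the period divided by days in the period.

    The cap at 1.0 is an assumption, not something stated in the deck.
    """
    period_days = (period_end - period_start).days + 1
    supplied = sum(days for fill_date, days in fills
                   if period_start <= fill_date <= period_end)
    return min(supplied / period_days, 1.0)


def days_persistent(fills, period_end, max_gap_days=30):
    """Days from the first fill until supply lapses for more than max_gap_days.

    max_gap_days=30 is a hypothetical permissible gap.
    """
    fills = sorted(fills)
    if not fills:
        return 0
    first_fill = fills[0][0]
    covered_until = first_fill  # first day NOT covered by supply on hand
    for fill_date, days in fills:
        if (fill_date - covered_until).days > max_gap_days:
            break  # therapy is treated as discontinued at this gap
        # Early refills extend coverage from the current end of supply.
        covered_until = max(covered_until, fill_date) + timedelta(days=days)
    end = min(covered_until, period_end + timedelta(days=1))
    return (end - first_fill).days


# Hypothetical member: two fills about a month apart, then a long gap.
example_fills = [(date(2003, 1, 1), 30), (date(2003, 2, 5), 30), (date(2003, 6, 1), 30)]
mpr = medication_possession_ratio(example_fills, date(2003, 1, 1), date(2003, 12, 31))
persistence = days_persistent(example_fills, date(2003, 12, 31))
```

In this sketch the June fill still counts toward compliance but not toward persistence, which is why the two measures can move independently.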

Page 6: More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data

About Boosted Regression

- Came out of computational learning, called “boosting”¹
- Expands into generalized linear models based on the Gradient Boosting Machine²,³ that recast the algorithm in a likelihood framework
- Fits a regression tree to residuals from previously fitted regression tree (beginning first with a “guess” of the response variable)
- Updates regression tree sequentially to include all previously estimated regression trees, until the following two parameters (specified a priori) are reached:
  - Number of “splits” (or N-way interactions)
  - Maximum number of iterations

Sources:
1. Freund, Y. and R. E. Schapire. 1997. “A decision-theoretic generalization of online learning and an application to boosting.” Journal of Computer and System Sciences 55(1): 119-139.
2. Friedman, J.H. 2001. “Greedy function approximation: a gradient boosting machine.” Annals of Statistics 29(5): 1189-1232.
3. Friedman, J.H. 2002. “Stochastic gradient boosting.” Computational Statistics and Data Analysis 38(4): 367-378.
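
The residual-fitting loop described above can be sketched in a few lines. This is a simplified illustration, not the implementation used in the study: it assumes squared-error loss, a single predictor, one-split trees ("stumps"), and no bagging, and the function names and default values are hypothetical.

```python
def fit_stump(x, residuals):
    """Return (threshold, left_mean, right_mean) for the best single split.

    Assumes x holds at least two distinct values.
    """
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(x, y, max_iter=200, shrink=0.1):
    """Fit a sequence of stumps, each to the residuals left by the earlier ones."""
    initial = sum(y) / len(y)  # the initial "guess": the mean response
    fitted = [initial] * len(y)
    stumps = []
    for _ in range(max_iter):  # stop at the maximum number of iterations
        residuals = [yi - fi for yi, fi in zip(y, fitted)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        # Add a shrunken copy of the new stump to the running prediction.
        fitted = [fi + shrink * (left_mean if xi <= threshold else right_mean)
                  for xi, fi in zip(x, fitted)]
    return initial, shrink, stumps


def predict(model, xi):
    """Sum the initial guess and all shrunken stump contributions."""
    initial, shrink, stumps = model
    return initial + shrink * sum(lm if xi <= t else rm for t, lm, rm in stumps)


# Toy data with a step between x=3 and x=4.
toy_model = boost([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
```

A smaller shrinkage value slows how quickly each tree's contribution accumulates, so it generally needs more iterations to reach the same fit.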

Page 7: More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data

Data

- Utilized integrated medical and pharmacy claims data from a large (N=369,985) U.S. health plan
- Studied 2003 and 2004 data, which allowed for a baseline/follow-up design
- Included plan participants continuously eligible for pharmacy benefits for the entire study period
- Allowed for no other exclusions or restrictions (e.g., all ages and all claims remained in the study)
- Partitioned data randomly into 70% training and 30% validation samples
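
A random 70/30 partition of the kind described in the last bullet might look like the sketch below. The function name, the use of member identifiers, and the seed value are assumptions for illustration; the deck does not describe how the partition was drawn.

```python
import random


def split_train_validation(member_ids, train_fraction=0.7, seed=1):
    """Shuffle member IDs reproducibly and cut them into two disjoint samples."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(member_ids)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical IDs 0..999 standing in for plan participants.
train_ids, validation_ids = split_train_validation(range(1000))
```

Splitting by member rather than by claim keeps all of a participant's records in the same sample.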

Page 8: More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data

Methods

- Estimated five multivariate predictive models of total annual healthcare costs (pharmacy and medical) in the follow-up year, separately for four conditions:
  - Diabetes, congestive heart failure, hypercholesterolemia, hypertension
- Included independent variables:
  - Continuous measure of baseline pharmacy costs
  - 14 age/gender categories
  - 62 PHD disease indicators
  - Average co-pay per day supplied
  - Percent mail service days supplied
  - Four adherence dimension variables:
    - Compliance and compliance²
    - Days persistent
    - Number of different drugs

Page 9: More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data

Methods

Included five econometric modeling techniques:

- Ordinary least squares (OLS)
- Robust regression
- Two-part model (probit/OLS)
- Two-part model (probit/GLM; gamma, log link)
- Boosted regression with Stata command syntax:

  boost THC2004_T50 $RHS $OTH, influence distribution(normal) trainfraction(0.7) maxiter(1000) seed(1) bag(0.5) predict(HATS`m') interaction(3) shrink(0.01)
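
For the two-part models in the list above, a prediction is usually formed as the probability of any cost times the expected cost given that cost is positive. The sketch below shows that combination step for the probit/GLM (log link) variant. The function names and the example index values are hypothetical, and the deck does not say how the second part of the probit/OLS variant was specified or retransformed.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, used to turn a probit index into a probability."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_part_prediction(probit_index, log_mean_index):
    """Expected cost = P(cost > 0) * E[cost | cost > 0]."""
    p_any_cost = normal_cdf(probit_index)          # part 1: probit
    cost_if_positive = math.exp(log_mean_index)    # part 2: GLM with log link
    return p_any_cost * cost_if_positive


# Hypothetical member: probit index 0 (50% chance of any cost),
# conditional mean cost of $10,000.
expected_cost = two_part_prediction(0.0, math.log(10_000))
```

The two indices would each be a linear combination of the independent variables with separately estimated coefficients.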

Page 10: More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data

Results

Descriptive statistics

                                Diabetes      Congestive      Hyper-            Hypertension
                                              Heart Failure   cholesterolemia
                                (N=13,202)    (N=22,243)      (N=33,597)        (N=60,028)
Total 2004 healthcare costs (medical + pharmacy)
  Minimum                       $0            $0              $0                $0
  Maximum                       $1,719,645    $3,398,680      $3,398,680        $3,398,680
  Mean                          $17,551       $16,127         $13,366           $13,612
  Median                        $4,640        $3,673          $3,726            $3,204
Mean compliance                 0.80          0.72            0.80              0.83
Mean days persistent            300           244             278               300
Mean number of different drugs  1.88          1.18            1.21              1.70

Page 11: More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data

Results

Adherence dimension coefficient estimates from OLS model of prospective total annual healthcare costs (untruncated)

                            Diabetes    Congestive      Hyper-            Hypertension
                                        Heart Failure   cholesterolemia
Days persistent             -$6         -$14***         -$6*              -$16***
Compliance                  $6,686      $20,562**       -$4,390           $13,936***
Compliance²                 -$9,406     -$18,149***     $851              -$12,851***
Number of different drugs   $370        $9,328***       $177              $1,199***

***p<0.01; **p<0.05; *p<0.10
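
Because compliance enters the model with both a linear and a squared term, the implied association with cost depends on the level of compliance: the slope is b1 + 2·b2·c, and it changes sign at c* = -b1 / (2·b2). The sketch below applies that arithmetic to the untruncated coefficients for the two conditions where both terms are statistically significant. It is a reading aid under the assumption that compliance is measured on a 0 to 1 scale, not an analysis reported in the deck.

```python
def marginal_effect(b_linear, b_squared, compliance):
    """Slope of predicted cost with respect to compliance at a given level."""
    return b_linear + 2.0 * b_squared * compliance


def turning_point(b_linear, b_squared):
    """Compliance level at which the slope changes sign."""
    return -b_linear / (2.0 * b_squared)


# (Compliance, Compliance²) coefficients from the untruncated OLS table.
chf = (20562.0, -18149.0)           # congestive heart failure
hypertension = (13936.0, -12851.0)

print(round(turning_point(*chf), 3))           # 0.566
print(round(turning_point(*hypertension), 3))  # 0.542

# Slope evaluated at the mean compliance reported in the descriptive statistics.
chf_slope_at_mean = marginal_effect(*chf, 0.72)
hypertension_slope_at_mean = marginal_effect(*hypertension, 0.83)
```

Under these coefficients the slope is negative at the reported mean compliance for both conditions, and positive only below roughly 0.55.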

Page 12: More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data

Results

Adherence dimension coefficient estimates from OLS model of prospective total annual healthcare costs (truncated at $50,000)

                            Diabetes    Congestive      Hyper-            Hypertension
                                        Heart Failure   cholesterolemia
Days persistent             -$2         -$3***          -$4***            -$4***
Compliance                  $330        $3,882**        $2,234            $3,819***
Compliance²                 -$1,361     -$4,015***      -$2,314*          -$3,821***
Number of different drugs   $356**      $1,920***       $585***           $658***

***p<0.01; **p<0.05; *p<0.10

Page 13: More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data

Results

Validation sample summary results from predictive models of prospective total annual healthcare costs (untruncated)

Model                               Metric  Diabetes  Congestive     Hyper-           Hypertension
                                                      Heart Failure  cholesterolemia
OLS                                 R²      .020      .026           .036             .025
                                    MAPE*   20,559    19,547         15,223           16,383
Robust                              R²      .016      .018           .031             .023
                                    MAPE    15,131    14,464         11,354           11,836
Two-Part: Probit-OLS                R²      .021      .031           .027             .028
                                    MAPE    16,371    15,591         13,915           14,730
Two-Part: Probit-GLM (gamma/log)    R²      .013      .020           .001             .007
                                    MAPE    16,261    18,940         15,755           18,215
Boosted                             R²      .005      .017           .005             .007
                                    MAPE    17,322    19,844         15,777           16,800

* Mean absolute prediction error

Page 14: More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data

Results

Validation sample summary results from predictive models of prospective total annual healthcare costs (truncated at $50,000)

Model                               Metric  Diabetes  Congestive     Hyper-           Hypertension
                                                      Heart Failure  cholesterolemia
OLS                                 R²      .123      .142           .117             .120
                                    MAPE    8,816     8,086          7,455            7,496
Robust                              R²      .093      .119           .098             .107
                                    MAPE    7,450     6,815          6,266            6,118
Two-Part: Probit-OLS                R²      .085      .085           .094             .092
                                    MAPE    8,122     7,484          7,126            7,246
Two-Part: Probit-GLM (gamma/log)    R²      .065      .011           .004             .006
                                    MAPE    8,136     8,353          7,912            8,264
Boosted                             R²      .132      .148           .124             .131
                                    MAPE    8,769     8,061          7,453            7,467
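
The $50,000 truncation and the two validation metrics in these tables can be sketched as below. The deck does not state which definition of R² was computed on the validation sample (for example, 1 - SSE/SST versus the squared correlation between actual and predicted costs), so the version here is an assumption, as are the function names.

```python
def truncate(costs, cap=50_000):
    """Top-code each cost at the cap so extreme outliers carry less weight."""
    return [min(c, cap) for c in costs]


def mean_absolute_prediction_error(actual, predicted):
    """Average absolute gap between actual and predicted cost (MAPE in the tables)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 - SSE/SST; assumed definition, can be negative out of sample."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted costs for four members.
actual_costs = truncate([0, 4_000, 12_000, 900_000])
predicted_costs = [2_000, 5_000, 9_000, 30_000]
mape = mean_absolute_prediction_error(actual_costs, predicted_costs)
r2 = r_squared(actual_costs, predicted_costs)
```

Note that MAPE here is an error in dollars, not a percentage, which matches how the tables report it.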

Page 15: More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data

Conclusions

- Increased compliance was associated with a decrease in next year’s total healthcare costs
- Each additional day of persistent therapy was associated with a decrease of between $6 and $16 in next year’s total healthcare costs
- The magnitude of this association varied, as expected, by disease state
- The number of different drugs – filled within a given year and indicated for that disease state – increased next year’s total healthcare costs, likely signifying:
  - Treatment resistance/failure
  - Therapeutic aggressiveness/intensity
  - Disease severity

Page 16: More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data

Conclusions cont.

- PHD provided classification and predictive power similar to other prescription-only risk-adjustment groupers
- Robust regression, as expected, always returned the lowest mean absolute prediction error
- While boosted regression did offer higher R² in the truncated models, overfitting was evident in the untruncated models (Note: this study did not attempt to respecify the boosting parameters)
- Unfortunately, the currently available user-written Stata command BOOST does not output the regression tree structure for application in other samples
- Boosting is useful in uncovering important interaction terms

Page 17: More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data

Limitations

- The potential endogeneity of the adherence measures was not examined.
- Non-adherent behavior may not alter next year’s total healthcare costs, but may affect future periods’ total healthcare costs.
- The study sample was drawn from a single national health plan, and the results are therefore not generalizable.
- Need to consider other measures of accuracy (e.g., positive predictive value).
- Need to tweak boosting specification to reduce overfitting.