
Modelling Personalized Screening: a Step Forward on Risk Assessment Methods

Validating Prediction Models

Inmaculada Arostegui

Universidad del País Vasco UPV/EHU
Red de Investigación en Servicios de Salud en Enfermedades Crónicas - REDISSEC

Basque Center for Applied Mathematics - BCAM

38th Annual Conference of the ISCB, Vigo, 9-13 July 2017

I. Arostegui (UPV/EHU) SY2:Validating Prediction Models 1 / 29

Outline

1 Introduction and Motivation

2 CPRs: Validation process

3 Application to eCOPD evolution

4 Discussion


Introduction and Motivation

Prediction models and clinical practice

Predicting the prognosis of a disease is necessary for screening, prevention and choice of treatment

The probabilities of diagnostic and prognostic outcomes condition the decision-making process

“Evidence-based medicine” applies the scientific method to medical practice

Towards “shared decision-making” on choices for diagnostic tests and therapeutic interventions

↓
Clinical prediction rules (CPRs) may provide the evidence-based input for shared decision-making in clinical practice



Motivating data: The IRYSS–COPD Study

COPD is a leading chronic condition in many countries

Exacerbation of COPD (eCOPD) often requires assessment in an emergency department (ED) and hospitalization

• Severe exacerbations lead to death or intubation
• Moderate exacerbations require an adjustment of the therapy

Exacerbations play a major role in the burden of COPD, its evolution, and its cost

Physicians must rely largely on their experience and the patient’s personal criteria for gauging how an eCOPD will evolve

A clinical prediction rule for eCOPD evolution would allow physicians to make better-informed decisions about treatment

Goal: the development of clinical prediction rules (scores) for risk stratification of patients with eCOPD



Goal

A method for developing validated clinical prediction rules (scores) for risk stratification, and for making them available as easy-to-use tools for the clinical decision-making process

↓

development
validated scores
stratification
easy-to-use tools


CPRs: Validation process - General overview

Step-by-step process

1 Modeling: Model development and validation

2 Scoring: Score development and validation

3 Stratification: Score categorization and validation


CPRs: Validation process - Model development and validation

Modeling: Development

In general:

• Outcome
• k predictors
• Model

In our case:
• Binary outcome
• Continuous and categorical predictors

↓
Logistic regression model

• Selection of predictors
• Model discrimination: area under the receiver operating characteristic (ROC) curve (AUC)
• Model calibration: calibration plot & Hosmer-Lemeshow (H-L) test
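The discrimination and calibration measures above can be illustrated with a minimal pure-Python sketch (an illustration, not the authors' code). It assumes predicted probabilities `p` from an already fitted logistic model and 0/1 outcomes `y`; all function names are hypothetical. The AUC is computed as the Mann-Whitney probability that a random event receives a higher predicted risk than a random non-event. The H-L statistic uses g groups of roughly equal size formed by sorting on predicted risk (g = 10 by default) and a chi-square reference with g - 2 degrees of freedom. The closed-form chi-square tail used here is only valid for an even number of degrees of freedom, which is why `g` must be even.

```python
import math


def auc(y, p):
    """AUC as P(score of a random event > score of a random non-event); ties count 1/2."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(chi2_df > x); closed form, even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    term = total = 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i  # (x/2)^i / i!
        total += term
    return math.exp(-x / 2) * total


def hosmer_lemeshow(y, p, g=10):
    """H-L statistic over g groups of (roughly) equal size sorted by predicted risk.

    Returns (statistic, p-value) with g - 2 degrees of freedom. Needs len(y) >= g
    and no group whose mean predicted risk is exactly 0 or 1.
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for k in range(g):
        idx = order[k * n // g:(k + 1) * n // g]
        n_k = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        pbar = expected / n_k
        stat += (observed - expected) ** 2 / (n_k * pbar * (1 - pbar))
    return stat, chi2_sf_even_df(stat, g - 2)
```

For example, `hosmer_lemeshow(y, p)` returns the statistic and its p-value; a large p-value (such as the p = 0.3131 reported for the eCOPD model) gives no evidence of lack of fit.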



Modeling: Validation

1 Predictors:
• Relationship predictor-outcome
• Missing values

2 Selection of predictors: stability of the predictors assessed with internal bootstrap validation

3 Overestimation of the AUC:
• The same data were used for modeling (logistic regression) and discrimination (AUC) purposes
• Consequently, the AUC is optimistically biased
• An optimism correction for the AUC is proposed: bootstrap bias-correction method (Harrell, 2001)

4 Split validation: Application to a different sample



Predictors

Relationship predictor-outcome (logistic function):
• Linear
• Non-linear
  • Smooth functions (generalized additive models, GAM)
  • Categorize predictor: look for optimal categorization

Missing values:
• Ignore (drop subjects)
• Imputation techniques
• Consider a missing category
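Two of the missing-value strategies listed above can be sketched in a few lines of Python, assuming a categorical predictor stored as a list with `None` for missing values; the function names and level labels are hypothetical. Complete-case analysis simply drops the subjects with `None`, whereas the missing-category approach keeps them and adds an indicator for an extra "missing" level (imputation is not shown).

```python
def complete_cases(values):
    """'Ignore' strategy: indices of subjects with an observed value."""
    return [i for i, v in enumerate(values) if v is not None]


def dummies_with_missing_category(values, levels):
    """'Missing category' strategy: dummy coding with an extra level for None.

    The first entry of `levels` is the reference category (all dummies 0).
    Returns (column names, rows of 0/1 indicators).
    """
    categories = list(levels) + ["missing"]
    rows = []
    for v in values:
        label = "missing" if v is None else v
        if label not in categories:
            raise ValueError(f"unknown level: {v!r}")
        rows.append([1 if label == c else 0 for c in categories[1:]])
    return categories[1:], rows
```

The dummy columns can then enter the logistic model like any other categorical predictor, so no subject is discarded.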



Selection of predictors: Step 1

STEP 1: Variable selection

Derivation sample: variables with p-value < 0.20 $(X_1, \dots, X_n)$

Generation of 2000 bootstrap samples* (Subsample 1, …, Subsample 2000), each fitted with a model (Model 1, …, Model 2000), giving the coefficient vectors $(\beta_1^{1}, \dots, \beta_n^{1}), \dots, (\beta_1^{2000}, \dots, \beta_n^{2000})$

[Figure: kernel density plots of the bootstrap coefficient estimates for two predictors (N = 2000, bandwidth = 0.07097; N = 1997, bandwidth = 0.1757)]

• If $0 \in \mathrm{CI}_{80\%}(\beta_i) = (p_{10}, p_{90})$ of the bootstrap distribution of $\beta_i$, $X_i$ was not considered for Step 2.0
• If $0 \notin \mathrm{CI}_{80\%}(\beta_i) = (p_{10}, p_{90})$, $X_i$ was considered for Step 2.0

*Bootstrap samples: subsamples drawn with replacement, of the same size as the derivation sample
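The Step 1 decision rule can be sketched as follows, assuming the 2000 bootstrap fits have already been run and their coefficients collected in a matrix `boot_coefs` (one row per bootstrap sample, one column per candidate variable). The helper names are hypothetical. The percentile interval uses linear interpolation between order statistics, which matches R's default `quantile()` (type 7) and NumPy's default `percentile`; the slide does not say which percentile definition was used.

```python
def percentile(xs, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    s = sorted(xs)
    h = (len(s) - 1) * q / 100
    lo = int(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def step1_screen(boot_coefs, level=80):
    """Keep the indices i whose bootstrap percentile CI for beta_i excludes 0.

    boot_coefs[b][i] is the coefficient of X_i in the model fitted to bootstrap
    sample b (2000 samples on the slide; any number works here).
    """
    tail = (100 - level) / 2  # 10 -> (p10, p90) for an 80% interval
    keep = []
    for i in range(len(boot_coefs[0])):
        column = [row[i] for row in boot_coefs]
        lo, hi = percentile(column, tail), percentile(column, 100 - tail)
        if not lo <= 0 <= hi:
            keep.append(i)
    return keep
```

The 80% level makes this first screen deliberately permissive; the stricter 95% rule is applied in Step 2.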



Selection of predictors: Step 2

STEP 2: Model building (Step 2.j, j = 1, 2, …)

Risk factors associated with the outcome in Step 2.(j−1): $(X_{r_j}, \dots, X_{s_j})$, $1 \le r_j < s_j \le n$

Generation of 2000 NEW bootstrap samples (Subsample 1, …, Subsample 2000), fitted as Model 1, …, Model 2000, giving the coefficients $\beta_i^{b}$, $i \in \{r_j, \dots, s_j\}$, $b = 1, \dots, 2000$

• If $0 \in \mathrm{CI}_{95\%}(\beta_i) = (p_{2.5}, p_{97.5})$, $X_i$ was not considered for Step 2.(j+1)
• If $0 \notin \mathrm{CI}_{95\%}(\beta_i) = (p_{2.5}, p_{97.5})$, $X_i$ was considered for Step 2.(j+1)

Step 2.j is repeated until all the variables in the model verify $0 \notin \mathrm{CI}_{95\%}(\beta_i)$, $i \in \{r_j, \dots, s_j\}$ ⇒ FINAL MODEL
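The Step 2 iteration can be sketched as a loop that refits on new bootstrap samples, drops every variable whose 95% percentile interval contains 0, and stops once the variable set no longer changes. This is an illustrative sketch, not the authors' implementation: `boot_fit` is a hypothetical callback that would resample the derivation data and refit the logistic model with the current variables, and `toy_boot_fit` is an artificial stand-in used only so that the example runs.

```python
def percentile(xs, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    s = sorted(xs)
    h = (len(s) - 1) * q / 100
    lo = int(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def step2_model_building(variables, boot_fit, level=95, max_iter=50):
    """Repeat Step 2.j until every retained variable's bootstrap CI excludes 0.

    boot_fit(current, j) must refit the model on fresh bootstrap samples using
    only the variables in `current` and return rows of coefficients aligned
    with `current` (one row per bootstrap sample).
    """
    tail = (100 - level) / 2  # 2.5 -> (p2.5, p97.5) for a 95% interval
    current = list(variables)
    for j in range(1, max_iter + 1):
        rows = boot_fit(current, j)
        kept = []
        for i, name in enumerate(current):
            column = [row[i] for row in rows]
            lo, hi = percentile(column, tail), percentile(column, 100 - tail)
            if not lo <= 0 <= hi:
                kept.append(name)
        if kept == current:  # every variable verifies 0 not in CI_95%: final model
            return current
        current = kept
    raise RuntimeError("selection did not stabilise within max_iter steps")


def toy_boot_fit(current, j, n_boot=200):
    """Hypothetical stand-in for refitting: spreads each coefficient evenly
    around a fixed 'true' effect. A real version would resample the data and
    fit a logistic regression on each bootstrap sample."""
    effects = {"X1": 1.0, "X2": 0.6, "X3": 0.0}
    return [[effects[v] + (b / (n_boot - 1) - 0.5) for v in current]
            for b in range(n_boot)]
```

With the toy coefficients, `step2_model_building(["X1", "X2", "X3"], toy_boot_fit)` drops `X3` in Step 2.1 (its interval straddles 0) and returns `["X1", "X2"]` once Step 2.2 leaves the set unchanged.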



AUC correction

Step 1: Fit the logistic regression model on the original sample $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ and compute the corresponding AUC, $\mathrm{AUC}_{\mathrm{app}}$.

Step 2: For $b = 1, \dots, B$, generate the bootstrap resample $\{(\mathbf{x}^*_{ib}, y^*_{ib})\}_{i=1}^{N}$ by drawing a random sample of size $N$ with replacement from the original sample.

Step 3: Fit the logistic regression model to the bootstrap resample and compute the corresponding AUC, $\mathrm{AUC}^{b}_{\mathrm{boot}}$.

Step 4: Obtain the predicted probabilities for the original sample from the logistic regression model fitted in Step 3 and compute the AUC, $\mathrm{AUC}^{b}_{o}$.

The optimism $O$ of the original AUC is calculated as

$$O = \frac{1}{B} \sum_{b=1}^{B} \left( \mathrm{AUC}^{b}_{\mathrm{boot}} - \mathrm{AUC}^{b}_{o} \right)$$

and the bias-corrected AUC is then computed as $\mathrm{AUC}_{\mathrm{app}} - O$.
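The four steps can be sketched end to end in pure Python. This is a minimal illustration under stated assumptions, not the authors' code: the logistic model is fitted by unpenalized Newton-Raphson, the data come from a hypothetical `simulate()` function with three standard-normal predictors (the third pure noise), and `B` defaults to 200 resamples because the slide leaves it unspecified. All function names are hypothetical.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def auc(y, p):
    """AUC: P(event scored above non-event), ties count 1/2."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, max_iter=25, tol=1e-10):
    """Unpenalized ML logistic regression by Newton-Raphson; intercept added."""
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X'WX
        for z, yi in zip(Z, y):
            mu = sigmoid(sum(bj * zj for bj, zj in zip(beta, z)))
            for a in range(k):
                grad[a] += (yi - mu) * z[a]
                for c in range(k):
                    info[a][c] += mu * (1.0 - mu) * z[a] * z[c]
        step = solve(info, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def predict(beta, X):
    return [sigmoid(beta[0] + sum(bj * xj for bj, xj in zip(beta[1:], row))) for row in X]


def optimism_corrected_auc(X, y, B=200, seed=2017):
    """Returns (AUC_app, optimism O, AUC_app - O) following Steps 1-4."""
    rng = random.Random(seed)
    n = len(y)
    auc_app = auc(y, predict(fit_logistic(X, y), X))  # Step 1
    total = 0.0
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # Step 2: resample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        beta_b = fit_logistic(Xb, yb)  # Step 3
        # AUC_boot (on the resample) minus AUC_o (Step 4, on the original sample)
        total += auc(yb, predict(beta_b, Xb)) - auc(y, predict(beta_b, X))
    optimism = total / B
    return auc_app, optimism, auc_app - optimism


def simulate(n=200, seed=1):
    """Hypothetical data: 3 standard-normal predictors, the third is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(3)]
        X.append(row)
        y.append(1 if rng.random() < sigmoid(-1.0 + row[0] + 0.5 * row[1]) else 0)
    return X, y
```

For instance, `optimism_corrected_auc(*simulate())` returns the apparent AUC, the estimated optimism and the corrected AUC. Because each bootstrap model is scored on the data it was fitted to, the optimism is typically positive, so the corrected AUC is usually lower than the apparent one.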

CPRs: Validation process - Score development and validation

Scoring: Development

Step 1: Estimate the parameters of the model
$$f(y) = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n$$

Step 2: Determine reference values for each category $j$ of each predictor $X_i$ ($W_{ij}$)
• Dichotomous predictor: reference values are 0/1
• Continuous predictor ($X_i$): categorize into $k$ contiguous classes $(X_{i1}, X_{i2}, \dots, X_{ik})$

Step 3: Determine the reference value of the base category for each predictor ($W_{i\mathrm{REF}}$)

Step 4: Set the number of regression units that corresponds to 1 point in the score ($B$)

Step 5: Weight each category of each predictor by its significance level ($b_{ij}$)
• $p > 0.1 \Rightarrow b_{ij} = 0$
• $0.05 < p < 0.1 \Rightarrow b_{ij} = 0.6$
• $0.01 < p < 0.05 \Rightarrow b_{ij} = 1$
• $0.001 < p < 0.01 \Rightarrow b_{ij} = 1.2$
• $p < 0.001 \Rightarrow b_{ij} = 1.4$

Step 6: Determine the number of points for each category of each predictor ($S_{ij}$)
$$S_{ij} = \frac{b_{ij}\, \beta_i \,(W_{ij} - W_{i\mathrm{REF}})}{B}$$

Sullivan et al., Statistics in Medicine, 2004.
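Steps 5 and 6 can be sketched as follows. The p-value bands and the formula for $S_{ij}$ come from the slide; two details are assumptions: the handling of a p-value exactly equal to a band boundary, and rounding $S_{ij}$ to the nearest integer (common in points-based scores, but not stated on the slide). The function names and the numbers in the usage example are hypothetical, not taken from the eCOPD model.

```python
def pvalue_weight(p):
    """b_ij from the slide's p-value bands. Behaviour exactly at 0.1, 0.05,
    0.01 and 0.001 is not specified on the slide and is an assumption here."""
    if p > 0.1:
        return 0.0
    if p > 0.05:
        return 0.6
    if p > 0.01:
        return 1.0
    if p > 0.001:
        return 1.2
    return 1.4


def points(beta_i, w_ij, w_ref, p_ij, B):
    """S_ij = b_ij * beta_i * (W_ij - W_iREF) / B, rounded to the nearest integer
    (assumed rounding; Python's round() sends exact .5 ties to the even integer)."""
    return round(pvalue_weight(p_ij) * beta_i * (w_ij - w_ref) / B)
```

For example, with a hypothetical age coefficient of 0.05 per year, a category midpoint of 75 years against a reference of 45 years, p < 0.001 and $B = 0.5$ (10 years of age), `points(0.05, 75, 45, 0.0005, 0.5)` gives 1.4 × 0.05 × 30 / 0.5 = 4.2, rounded to 4 points.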



Scoring: Validation

1 Comparing AUC(model) vs. AUC(score): DeLong test (DeLong et al., Biometrics, 1988)

2 Optimism correction for the AUC: bootstrap bias-correction of the overestimation (Harrell, 2001)
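The DeLong comparison can be sketched in pure Python for two scores measured on the same patients (for example, the model's predicted probabilities and the integer score). This is a minimal sketch of the nonparametric DeLong et al. (1988) variance, written from the structural-component formulation rather than taken from any package; the function names are hypothetical. The two-sided p-value uses the normal approximation.

```python
import math


def psi(a, b):
    """Mann-Whitney kernel: 1 if a > b, 1/2 if tied, 0 otherwise."""
    return 1.0 if a > b else 0.5 if a == b else 0.0


def delong_test(y, s1, s2):
    """Two-sided DeLong test for two correlated AUCs computed on the same subjects.

    Returns (AUC1, AUC2, z, p-value).
    """
    pos = [i for i, v in enumerate(y) if v == 1]
    neg = [i for i, v in enumerate(y) if v == 0]
    m, n = len(pos), len(neg)

    def components(s):
        v10 = [sum(psi(s[i], s[j]) for j in neg) / n for i in pos]  # one per event
        v01 = [sum(psi(s[i], s[j]) for i in pos) / m for j in neg]  # one per non-event
        return sum(v10) / m, v10, v01

    def var(d):  # sample variance
        mean = sum(d) / len(d)
        return sum((x - mean) ** 2 for x in d) / (len(d) - 1)

    a1, v10_1, v01_1 = components(s1)
    a2, v10_2, v01_2 = components(s2)
    # Var(AUC1 - AUC2) = Var(V10_1 - V10_2) / m + Var(V01_1 - V01_2) / n
    v = (var([x - z for x, z in zip(v10_1, v10_2)]) / m
         + var([x - z for x, z in zip(v01_1, v01_2)]) / n)
    z = (a1 - a2) / math.sqrt(v)
    return a1, a2, z, math.erfc(abs(z) / math.sqrt(2))
```

The test is undefined when the two scores rank every event/non-event pair identically (the variance of the difference is 0), for example when one score is a strictly increasing transformation of the other.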


CPRs: Validation process - Score categorization

Stratification: Categorization method

Let $Y$ be a dichotomous response variable and $X$ the continuous score we want to categorize.

Look for the vector of $k$ optimal cut points $v = (x_1, \dots, x_k)$ using genetic algorithms. The aim is to maximize the AUC of the model

$$P(Y = 1 \mid X_{\mathrm{cat}_k}) = \frac{\exp\left(\beta_0 + \sum_{l=1}^{k} \beta_l \mathbf{1}\{X_{\mathrm{cat}_k} = l\}\right)}{1 + \exp\left(\beta_0 + \sum_{l=1}^{k} \beta_l \mathbf{1}\{X_{\mathrm{cat}_k} = l\}\right)}$$

where $X_{\mathrm{cat}_k}$ is the categorized score taking $k + 1$ values ($l = 0, \dots, k$).

The arguments used in developing the genetic algorithm:
• AUC: function to be maximized
• $k$: number of parameters to be estimated
• Range of the score $X$ in which we look for the cut points

Barrio et al., Statistical Methods in Medical Research, 2015.
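The cut-point search can be sketched with a small genetic algorithm in pure Python. The operators (binary tournament selection, arithmetic crossover, Gaussian mutation, elitism) and all tuning constants are illustrative choices, not those of Barrio et al., and every function name is hypothetical. The sketch relies on one property of the model above: with a single categorical predictor coded by $k$ dummies plus an intercept, the maximum-likelihood fitted probability in each category equals the observed event rate in that category, so the AUC of the fitted model can be computed from those rates without running the logistic fit for every candidate.

```python
import bisect
import random


def auc(y, s):
    """Rank-based (Mann-Whitney) AUC with mid-ranks for ties."""
    order = sorted(range(len(s)), key=lambda i: s[i])
    ranks = [0.0] * len(s)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and s[order[j + 1]] == s[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    n1 = sum(y)
    n0 = len(y) - n1
    r1 = sum(r for r, yi in zip(ranks, y) if yi == 1)
    return (r1 - n1 * (n1 + 1) / 2) / (n1 * n0)


def categorized_auc(x, y, cuts):
    """AUC of the logistic model on X_cat_k (k cut points -> k + 1 categories).

    The ML fitted probability of each category is its observed event rate,
    so the model AUC is the AUC of those rates.
    """
    cats = [bisect.bisect_right(cuts, xi) for xi in x]  # category l = 0..k
    events = [0] * (len(cuts) + 1)
    counts = [0] * (len(cuts) + 1)
    for c, yi in zip(cats, y):
        events[c] += yi
        counts[c] += 1
    rate = [e / c if c else 0.0 for e, c in zip(events, counts)]
    return auc(y, [rate[c] for c in cats])


def ga_cut_points(x, y, k, lo, hi, pop_size=30, generations=40,
                  mut_rate=0.3, seed=0):
    """Simple genetic algorithm maximizing categorized_auc over k cut points in [lo, hi]."""
    rng = random.Random(seed)
    sd = (hi - lo) / 20  # mutation step size (illustrative)

    def repair(c):  # keep cut points inside [lo, hi] and in increasing order
        return sorted(min(max(v, lo), hi) for v in c)

    def pick(pop, fit):  # binary tournament selection
        i, j = rng.sample(range(len(pop)), 2)
        return pop[i] if fit[i] >= fit[j] else pop[j]

    pop = [repair(rng.uniform(lo, hi) for _ in range(k)) for _ in range(pop_size)]
    for _ in range(generations):
        fit = [categorized_auc(x, y, c) for c in pop]
        ranked = sorted(range(pop_size), key=lambda i: fit[i], reverse=True)
        new_pop = [pop[i] for i in ranked[:max(1, pop_size // 10)]]  # elites survive
        while len(new_pop) < pop_size:
            p1, p2 = pick(pop, fit), pick(pop, fit)
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]  # arithmetic crossover
            child = [v + rng.gauss(0, sd) if rng.random() < mut_rate else v
                     for v in child]  # Gaussian mutation
            new_pop.append(repair(child))
        pop = new_pop
    fit = [categorized_auc(x, y, c) for c in pop]
    best = max(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]
```

For a score on the 0-27 range, `ga_cut_points(x, y, k=3, lo=0, hi=27)` would search for three cut points (four categories), where `x` and `y` stand for hypothetical score values and outcomes.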



Risk stratification

Continuous score: $X$; after categorization: $X_{\mathrm{cat}_k}$ ($k = 4$)

↓
4 risk categories: low - moderate - high - very high

Comparing AUC($X_{\mathrm{cat}_4}$) vs. AUC($X$): DeLong test

Optimism correction for the AUC: Modified Harrell’s proposal

Evaluation of the integrated discrimination improvement (IDI)

Steyerberg et al., Epidemiology, 2010.
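The IDI can be sketched as the difference in discrimination slopes (mean predicted risk in events minus mean predicted risk in non-events) between two models. Which model plays the role of `p_old` and which of `p_new` is an assumption here (for example, the model on the continuous score $X$ versus the model on the categorized score); the function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def idi(y, p_old, p_new):
    """Integrated discrimination improvement of p_new over p_old:
    (mean p_new - mean p_old among events) - (mean p_new - mean p_old among non-events)."""
    ev = [i for i, v in enumerate(y) if v == 1]
    ne = [i for i, v in enumerate(y) if v == 0]
    gain_events = mean([p_new[i] - p_old[i] for i in ev])
    gain_nonevents = mean([p_new[i] - p_old[i] for i in ne])
    return gain_events - gain_nonevents
```

A positive IDI means the new model raises predicted risk in events and lowers it in non-events on average; an IDI close to 0 indicates that the two models discriminate about equally well.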


Application to eCOPD evolution - Data

Description of the IRYSS-COPD Study

Prospective cohort of patients with eCOPD (n = 2487)

Outcome: Short-term mortality

Potential predictors: 16 clinical variables collected from medical records and direct interview (age, baseline FEV1%, dyspnea, comorbidities, arterial blood gases, ...)

Goal: the development of a clinical prediction rule for short-term mortality of patients with eCOPD

Quintana et al., BMC Health Services Research, 2011.


Application to eCOPD evolution - Methods

Modeling – Scoring – Stratification – Implementation


Application to eCOPD evolution - Results

Model development and validation

AUC (Model) = 0.85, 95% CI = (0.77 - 0.93)
H-L test: p = 0.3131



Scoring: development and validation

Score: 0 – 27

AUC (Score) = 0.84, 95% CI = (0.76 - 0.93)
DeLong test (score vs. model): p = 0.564






Risk stratification (Subsample 2)

AUC (Score) = 0.84, 95% CI = (0.77 - 0.91)
AUC (Categorical Score) = 0.84, 95% CI = (0.78 - 0.91)

DeLong test (categorical vs. score): p = 0.608





Application to eCOPD evolution - Computer tool: PrEveCOPD

Implementation: PrEveCOPD App

Windows (installable application and web application)

Available at: http://www.ehu.eus/es/web/biostit/prevecop



Android: Available at Google Play


Discussion

Validation step-by-step

1 Modeling: Proper validation of a prediction model can lead to better and more stable discrimination ability

2 Scoring: A prediction model can be summarized into a valid and easy-to-obtain clinical prediction rule (score)

3 Stratification: Categorization of the score allows for valid stratification of patients by risk

4 Implementation: An easy-to-use computer application can guide the medical decision process in clinical practice



Conclusions

1 The proposed methodology as a whole allows for valid stratification of patients with eCOPD by their risk of short-term mortality

2 The PrEveCOPD computer tool can guide the medical decision process on the patient's arrival at the ED



Is it finished?

External validation: checks that the CPR performs well across samples from different but related source populations (transportability)

1 Relatedness of original (derivation) and new (validation) samples

2 Assessment of the CPR’s performance in the new study

3 Interpretation of the results: correction of poor performance if necessary

External validation is missing! Waiting for a new sample



Thank you!

