Modelling Personalized Screening: a Step Forward on Risk Assessment Methods
Validating Prediction Models
Inmaculada Arostegui
Universidad del País Vasco UPV/EHU
Red de Investigación en Servicios de Salud en Enfermedades Crónicas - REDISSEC
Basque Center for Applied Mathematics - BCAM
38th Annual Conference of the ISCB
Vigo, 9-13 July 2017
I. Arostegui (UPV/EHU) SY2:Validating Prediction Models 1 / 29
Outline
1 Introduction and Motivation
2 CPRs: Validation process
3 Application to eCOPD evolution
4 Discussion
Introduction and Motivation
Prediction models and clinical practice
Predicting the prognosis of a disease is necessary for screening, prevention and choice of treatment
The probabilities of diagnostic and prognostic outcomes condition the decision-making process
“Evidence-based medicine” applies the scientific method to medical practice
Towards “shared decision-making” on choices for diagnostic tests and therapeutic interventions
↓
Clinical prediction rules may provide the evidence-based input for shared decision-making in clinical practice
Motivating data: The IRYSS–COPD Study
COPD is a leading chronic condition in many countries
Exacerbation of COPD (eCOPD) often requires assessment in an ED and hospitalization
• Severe exacerbations lead to death or intubation
• Moderate exacerbations require an adjustment of the therapy
Exacerbations play a major role in the burden of COPD, its evolution, and its cost
Physicians must rely largely on their experience and the patient’s personal criteria for gauging how an eCOPD will evolve
A clinical prediction rule for eCOPD evolution would allow physicians to make better informed decisions about treatment
Goal: The development of clinical prediction rules (scores) for risk stratification of patients with eCOPD
Goal
A method for developing validated clinical prediction rules (scores) for risk stratification and making them available as easy to use tools for the clinical decision-making process
↓
development, validated scores, stratification, easy to use tools
CPRs: Validation process General overview
Step-by-step process
1 Modeling: Model development and validation
2 Scoring: Score development and validation
3 Stratification: Score categorization and validation
CPRs: Validation process Model development and validation
Modeling: Development
In general:
• Outcome
• k predictors
• Model
In our case:
• Binary outcome
• Continuous and categorical predictors
↓
Logistic regression model
• Selection of predictors
• Model discrimination: Area under the receiver operating characteristic (ROC) curve (AUC)
• Model calibration: Calibration plot & H-L test
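As an illustration of the discrimination measure named above, the AUC can be computed as the proportion of event/non-event pairs in which the event receives the higher predicted risk (ties count one half). This is a minimal sketch in plain Python; the function name `auc` and the toy data are assumptions for illustration, not part of the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    scores: predicted probabilities (or any risk score), one per subject.
    labels: observed binary outcomes (1 = event, 0 = no event).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    # Count concordant pairs; a tie counts as half a concordant pair.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predicted probabilities (hypothetical, not IRYSS-COPD data).
probs = [0.10, 0.20, 0.35, 0.40, 0.80, 0.90]
died = [0, 0, 1, 0, 1, 1]
print(auc(probs, died))
```

The pairwise count is quadratic in the sample size, which is acceptable for a sketch; a rank-based version would be preferable for large cohorts.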
Modeling: Validation
1 Predictors:
• Relationship predictor-outcome
• Missing values
2 Selection of predictors: Stability of the predictors with internal bootstrap validation
3 Overestimation of the AUC:
• Same data were used for modeling (logistic regression) and discrimination (AUC) purposes
• Consequently, AUC is biased
• Optimism correction for the AUC is proposed: bootstrap bias-correction method
Harrell, 2001.
4 Split validation: Application to a different sample
Predictors
Relationship predictor-outcome (logistic function)
• Linear
• Non linear
    • Smooth functions (GAM)
    • Categorize predictor: Look for optimal categorization
Missing values
• Ignore (drop out subjects)
• Imputation techniques
• Consider missing category
Selection of predictors: Step 1
STEP 1: Variable selection
Derivation sample: variables with p-value < 0.20 ($X_1, \dots, X_n$)
Generation of 2000 bootstrap samples*: Subsample 1, …, Subsample 2000
Model 1, …, Model 2000, with coefficients $(\beta_1^{1}, \dots, \beta_n^{1}), \dots, (\beta_1^{2000}, \dots, \beta_n^{2000})$
[Figure: two density plots (N = 2000, bandwidth = 0.07097; N = 1997, bandwidth = 0.1757); vertical axis: Density]
• If $0 \in CI_{80\%}(\beta_i) = (p_{10}, p_{90})$, $X_i$ was not considered for Step 2.0
• If $0 \notin CI_{80\%}(\beta_i) = (p_{10}, p_{90})$, $X_i$ was considered for Step 2.0
*Bootstrap samples: subsamples with replacement (of the same size as the derivation sample)
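The Step 1 rule can be sketched as a percentile interval check on the bootstrap coefficients of one predictor. The sketch assumes the coefficients have already been collected from the bootstrap models; the names `percentile` and `keep_predictor`, the linear interpolation between order statistics, and treating an endpoint equal to 0 as "inside" are all assumptions made for illustration.

```python
def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def keep_predictor(boot_coefs, level=80):
    """True if 0 lies outside the bootstrap percentile interval of a coefficient.

    level=80 gives the interval (p10, p90) used in Step 1.
    """
    tail = (100 - level) / 2.0
    low = percentile(boot_coefs, tail)
    high = percentile(boot_coefs, 100 - tail)
    return not (low <= 0.0 <= high)


# Hypothetical bootstrap coefficients for two predictors.
clearly_positive = [0.50 + 0.01 * i for i in range(100)]
around_zero = [-0.50 + 0.01 * i for i in range(100)]
print(keep_predictor(clearly_positive), keep_predictor(around_zero))
```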
Selection of predictors: Step 2
STEP 2: Model building
Step 2.j (j = 1, 2, …)
Risk factors associated with the outcome in Step 2.(j-1): $(X_{r_j}, \dots, X_{s_j})$, $1 \le r_j < s_j \le n$
Generation of 2000 NEW bootstrap samples: Subsample 1, …, Subsample 2000
Model 1, …, Model 2000, with coefficients $(\beta_1^{1}, \dots, \beta_n^{1}), \dots, (\beta_1^{2000}, \dots, \beta_n^{2000})$
• If $0 \in CI_{95\%}(\beta_i) = (p_{2.5}, p_{97.5})$, $X_i$ was not considered for Step 2.(j+1)
• If $0 \notin CI_{95\%}(\beta_i) = (p_{2.5}, p_{97.5})$, $X_i$ was considered for Step 2.(j+1)
Step 2.j is repeated until all the variables in the model verify $0 \notin CI_{95\%}(\beta_i)$, $i \in \{r_j, \dots, s_j\}$
FINAL MODEL
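The repetition of Step 2.j can be sketched as a loop that drops predictors until none is removed. The model fitting is hidden behind `bootstrap_coefs`, a hypothetical callable that refits the model on fresh bootstrap samples and returns the coefficients per predictor; it, the `max_iter` safety cap, and the interpolated percentile are assumptions of this sketch, not details given on the slide.

```python
def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def build_final_model(candidates, bootstrap_coefs, level=95, max_iter=50):
    """Repeat Step 2.j until 0 is outside the interval of every predictor.

    candidates: predictor names that survived Step 1.
    bootstrap_coefs: hypothetical callable; given a list of predictor names it
        returns {name: [coefficient in each bootstrap model]}.
    """
    tail = (100 - level) / 2.0
    current = list(candidates)
    for _ in range(max_iter):
        if not current:
            return current
        draws = bootstrap_coefs(current)
        kept = []
        for name in current:
            low = percentile(draws[name], tail)
            high = percentile(draws[name], 100 - tail)
            if not (low <= 0.0 <= high):
                kept.append(name)
        if kept == current:  # nothing was dropped: final model reached
            return kept
        current = kept
    return current


def fake_bootstrap(names):
    """Stand-in for refitting 200 bootstrap models (toy numbers only)."""
    table = {
        "age": [0.2 + 0.001 * i for i in range(200)],
        "noise": [-0.1 + 0.001 * i for i in range(200)],
    }
    return {name: table[name] for name in names}


print(build_final_model(["age", "noise"], fake_bootstrap))
```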
AUC correction
Step 1: Fit the logistic regression model on the basis of the original sample $\{(x_i, y_i)\}_{i=1}^{N}$ and compute the corresponding AUC, $AUC_{app}$.
Step 2: For $b = 1, \dots, B$, generate the bootstrap resample $\{(x^{*}_{ib}, y^{*}_{ib})\}_{i=1}^{N}$ by drawing a random sample of size $N$ with replacement from the original sample.
Step 3: Fit the logistic regression model to the bootstrap resample and compute the corresponding AUC, $AUC^{b}_{boot}$.
Step 4: Obtain the predicted probabilities for the original sample based on the fitted logistic regression model obtained in Step 3 and compute the AUC, $AUC^{b}_{o}$.
The optimism $O$ of the original AUC is calculated as follows
$$O = \frac{1}{B} \sum_{b=1}^{B} \left( AUC^{b}_{boot} - AUC^{b}_{o} \right)$$
and the bias-corrected AUC is then computed as $AUC_{app} - O$.
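Steps 1 to 4 can be sketched in plain Python as follows. To keep the example self-contained, the logistic regression fit is replaced by a generic `fit` callable, and a toy one-predictor rule (`fit_direction`) stands in for it; both are assumptions of this sketch, as is skipping resamples that contain only one outcome class, for which the AUC is undefined.

```python
import random


def auc(scores, labels):
    """AUC as the share of concordant event/non-event pairs (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def optimism_corrected_auc(xs, ys, fit, n_boot=200, seed=0):
    """Return (AUC_app, optimism O, AUC_app - O).

    fit: callable (xs, ys) -> predict, where predict(x) gives a risk score.
    """
    rng = random.Random(seed)
    n = len(ys)
    predict = fit(xs, ys)
    auc_app = auc([predict(x) for x in xs], ys)            # Step 1
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]         # Step 2
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(by)) < 2:
            continue  # only one outcome class: AUC undefined, skip resample
        predict_b = fit(bx, by)                            # Step 3
        auc_boot = auc([predict_b(x) for x in bx], by)
        auc_orig = auc([predict_b(x) for x in xs], ys)     # Step 4
        diffs.append(auc_boot - auc_orig)
    if not diffs:
        raise ValueError("no usable bootstrap resample")
    optimism = sum(diffs) / len(diffs)
    return auc_app, optimism, auc_app - optimism


def fit_direction(xs, ys):
    """Toy stand-in model: orient one predictor by the sign of the mean difference."""
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    sign = 1.0 if mean1 >= mean0 else -1.0
    return lambda x: sign * x


# Perfectly separable toy data: the model cannot overfit, so O is zero.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(optimism_corrected_auc(xs, ys, fit_direction))
```

With a real logistic regression in place of `fit_direction`, the optimism is typically positive, so the corrected AUC falls below the apparent one.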
CPRs: Validation process Score development and validation
Scoring: Development
Step 1: Estimate the parameters of the model
$$f(y) = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$$
Step 2: Determine reference values for each category $j$ of each predictor $X_i$ ($W_{ij}$)
• Dichotomous predictor: reference values are 0/1
• Continuous predictor ($X_i$): Categorize in $k$ contiguous classes ($X_{i1}, X_{i2}, \dots, X_{ik}$)
Step 3: Determine the reference value of the base category for each predictor ($W_{iREF}$)
Step 4: Set the number of regression units that reflects 1 point in the score ($B$)
Step 5: Weight each category of each predictor by its significance level ($b_{ij}$)
• $p > 0.1 \Rightarrow b_{ij} = 0$
• $0.05 < p < 0.1 \Rightarrow b_{ij} = 0.6$
• $0.01 < p < 0.05 \Rightarrow b_{ij} = 1$
• $0.001 < p < 0.01 \Rightarrow b_{ij} = 1.2$
• $p < 0.001 \Rightarrow b_{ij} = 1.4$
Step 6: Determine the number of points for each category of each predictor ($S_{ij}$)
$$S_{ij} = \frac{b_{ij}\, \beta_i\, (W_{ij} - W_{iREF})}{B}$$
Sullivan et al., Statistics in Medicine, 2004.
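Steps 5 and 6 amount to a small calculation, sketched below. Two choices are assumptions rather than statements from the slide: a p-value falling exactly on a threshold is assigned to the lower-weight class, and the result is rounded to an integer with Python's built-in `round` (the slide's formula shows no rounding). All numeric inputs in the example are hypothetical.

```python
def significance_weight(p):
    """Weight b_ij from the p-value of a category (Step 5 thresholds)."""
    if p > 0.1:
        return 0.0
    if p > 0.05:
        return 0.6
    if p > 0.01:
        return 1.0
    if p > 0.001:
        return 1.2
    return 1.4


def category_points(beta, w, w_ref, unit, p):
    """Points S_ij = b_ij * beta_i * (W_ij - W_iREF) / B, rounded to an integer.

    beta: regression coefficient of the predictor.
    w, w_ref: reference values of this category and of the base category.
    unit: B, the number of regression units worth one point.
    """
    return round(significance_weight(p) * beta * (w - w_ref) / unit)


# Hypothetical predictor: coefficient 0.05 per year of age, category
# reference value 75, base category 55, one point per 0.25 regression units.
print(category_points(beta=0.05, w=75, w_ref=55, unit=0.25, p=0.0005))
```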
Scoring: Validation
1 Comparing AUC(model) vs. AUC(score): DeLong test
DeLong et al., Biometrics, 1988.
2 Optimism correction for the AUC: Bootstrap bias-correction of the overestimation
Harrell, 2001.
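The comparison of two AUCs measured on the same patients can be sketched with the nonparametric DeLong approach: each AUC is decomposed into per-subject components, whose covariances give the variance of the AUC difference. The function below is a plain-Python sketch under that reading; its name, return format, and the handling of the zero-variance case are assumptions, and the toy data are hypothetical.

```python
import math


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for two correlated AUCs on the same subjects.

    Returns (auc_a, auc_b, z, p_value).
    """
    def psi(x, y):
        return 1.0 if x > y else 0.5 if x == y else 0.0

    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(pos), len(neg)
    if m < 2 or n < 2:
        raise ValueError("need at least two events and two non-events")

    aucs, v10, v01 = [], [], []
    for scores in (scores_a, scores_b):
        # Per-event and per-non-event components of the AUC.
        v10_s = [sum(psi(scores[i], scores[j]) for j in neg) / n for i in pos]
        v01_s = [sum(psi(scores[i], scores[j]) for i in pos) / m for j in neg]
        v10.append(v10_s)
        v01.append(v01_s)
        aucs.append(sum(v10_s) / m)

    def cov(u, mu_u, v, mu_v):
        return sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / (len(u) - 1)

    def s_entry(r, c):  # entry (r, c) of the covariance matrix of the two AUCs
        return (cov(v10[r], aucs[r], v10[c], aucs[c]) / m
                + cov(v01[r], aucs[r], v01[c], aucs[c]) / n)

    var_diff = s_entry(0, 0) + s_entry(1, 1) - 2.0 * s_entry(0, 1)
    diff = aucs[0] - aucs[1]
    if var_diff <= 0.0:  # degenerate: the difference has no estimated variability
        z = 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    else:
        z = diff / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return aucs[0], aucs[1], z, p_value


# Hypothetical predictions of a model and of a score for six patients.
outcome = [0, 0, 0, 1, 1, 1]
model_probs = [0.1, 0.2, 0.6, 0.4, 0.8, 0.9]
score_values = [0.3, 0.5, 0.2, 0.1, 0.7, 0.6]
print(delong_test(model_probs, score_values, outcome))
```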
CPRs: Validation process Score categorization
Stratification: Categorization method
Let $Y$ be a dichotomous response variable and $X$ the continuous score which we want to categorize
Look for the vector of $k$ optimal cut points $v = (x_1, \dots, x_k)$ by using genetic algorithms
The aim is to maximize the AUC of the model
$$P(Y = 1 \mid X_{cat_k}) = \frac{\exp\left(\beta_0 + \sum_{l=1}^{k} \beta_l 1_{\{X_{cat_k} = l\}}\right)}{1 + \exp\left(\beta_0 + \sum_{l=1}^{k} \beta_l 1_{\{X_{cat_k} = l\}}\right)}$$
where $X_{cat_k}$ is the categorized score taking $k + 1$ values ($l = 0, \dots, k$)
The arguments used in developing the genetic algorithm:
• AUC function to be maximized
• $k$ number of parameters to be estimated
• Range of the score $X$ in which we look for the cut points
Barrio et al., Statistical Methods in Medical Research, 2015.
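The objective being maximized can be illustrated on a small scale. The sketch below deliberately swaps the genetic algorithm for an exhaustive search over observed score values, which is only feasible for small samples and small k. It relies on the fact that a logistic model with one indicator per category reproduces the observed event rate of each category, so those rates are used directly as predicted probabilities. Function names and data are hypothetical.

```python
from itertools import combinations


def auc(scores, labels):
    """AUC as the share of concordant event/non-event pairs (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def categorize(x, cuts):
    """Category index 0..k: the number of cut points that x exceeds."""
    return sum(x > c for c in cuts)


def categorized_auc(xs, ys, cuts):
    """AUC of the logistic model with one indicator per category of the score."""
    cats = [categorize(x, cuts) for x in xs]
    rate = {}
    for c in set(cats):
        outcomes = [y for cat, y in zip(cats, ys) if cat == c]
        rate[c] = sum(outcomes) / len(outcomes)  # fitted probability in category c
    return auc([rate[c] for c in cats], ys)


def best_cut_points(xs, ys, k):
    """Exhaustive search (not a genetic algorithm) over k-subsets of score values."""
    candidates = sorted(set(xs))[:-1]  # a cut at the maximum leaves a class empty
    best = max(combinations(candidates, k),
               key=lambda cuts: categorized_auc(xs, ys, cuts))
    return list(best), categorized_auc(xs, ys, best)


# Toy score where every patient scoring above 4 has the event.
score = list(range(10))
event = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(best_cut_points(score, event, k=1))
```

The number of candidate combinations grows quickly with k and with the number of distinct score values, which is one reason a heuristic optimizer is attractive for this problem.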
Risk stratification
Continuous score: $X$
After categorization: $X_{cat_k}$ ($k = 4$)
↓
4 risk categories: low - moderate - high - very high
Comparing AUC($X_{cat_4}$) vs. AUC($X$): DeLong test
Optimism correction for the AUC: Modified Harrell’s proposal
Evaluation of the integrated discrimination improvement (IDI)
Steyerberg et al., Epidemiology, 2010.
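The IDI mentioned above is commonly computed as a difference of discrimination slopes, where the slope of a model is the mean predicted risk among events minus the mean predicted risk among non-events. A minimal sketch follows, assuming predicted probabilities from both models are available for the same patients; the function names and numbers are hypothetical.

```python
def discrimination_slope(probs, labels):
    """Mean predicted risk among events minus mean among non-events."""
    events = [p for p, y in zip(probs, labels) if y == 1]
    nonevents = [p for p, y in zip(probs, labels) if y == 0]
    return sum(events) / len(events) - sum(nonevents) / len(nonevents)


def idi(probs_new, probs_ref, labels):
    """Integrated discrimination improvement of the new model over the reference."""
    return (discrimination_slope(probs_new, labels)
            - discrimination_slope(probs_ref, labels))


# Hypothetical predicted risks from a reference model and a new model.
outcome = [0, 0, 1, 1]
reference = [0.4, 0.4, 0.6, 0.6]
new = [0.2, 0.2, 0.8, 0.8]
print(idi(new, reference, outcome))
```

A value near zero indicates that the two sets of predictions separate events from non-events about equally well.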
Application to eCOPD evolution Data
Description of the IRYSS-COPD Study
Prospective cohort of patients with eCOPD (n = 2487)
Outcome: Short-term mortality
Potential predictors: 16 clinical variables collected from medical records and direct interview (age, baseline FEV1%, dyspnea, comorbidities, arterial blood gases, ...)
Goal: The development of a clinical prediction rule for short-term mortality of patients with eCOPD
Quintana et al., BMC Health Services Research, 2011.
Application to eCOPD evolution Methods
Modeling – Scoring – Stratification – Implementation
Application to eCOPD evolution Results
Model development and validation
AUC (Model) = 0.85, CI95% = (0.77 - 0.93)
H-L test: p = 0.3131
Scoring: development and validation
Score: 0 – 27
AUC (Score) = 0.84, CI95% = (0.76 - 0.93)
DeLong test (score vs. model): p = 0.564
Risk stratification
Subsample 2
AUC (Score) = 0.84, CI95% = (0.77 - 0.91)
AUC (Categorical Score) = 0.84, CI95% = (0.78 - 0.91)
DeLong test (categorical vs. score): p = 0.608
Application to eCOPD evolution Computer tool: PrEveCOPD
Implementation: PrEveCOPD App
Windows (installable version and web application)
Available at: http://www.ehu.eus/es/web/biostit/prevecop
Android: Available at Google Play
Discussion
Validation step-by-step
1 Modeling: Proper validation of a prediction model can lead to better and more stable discrimination ability
2 Scoring: A prediction model can be summarized into a valid and easy to obtain clinical prediction rule (score)
3 Stratification: Categorization of the score allows for valid stratification of patients by risk
4 Implementation: An easy to use computer application can guide the medical decision process in clinical practice
Conclusions
1 The proposed methodology as a whole allows for valid stratification of patients with eCOPD by their risk of short-term mortality
2 The PrEveCOPD computer tool can guide the medical decision process at the patient's ED arrival
Is it finished?
External validation
The CPR performs well across samples from different but related source populations (transportability)
1 Relatedness of original (derivation) and new (validation) samples
2 Assessment of the CPR’s performance in the new study
3 Interpretation of the results: Correction of poor performance if necessary
External validation is missing! Waiting for a new sample
Thank you!