logistic regression & prediction score 16-17-12-09 · nizam a. allied regression analysis and...

15/12/52

Logistic regression analysis & developing a clinical prediction score

Ammarin Thakkinstian, Ph.D.
Section for Clinical Epidemiology and Biostatistics (SCEB)

• Part I– Logistic regression analysis

• Part II– Developing a clinical prediction score

Objective

• Construct the logit equation
• Estimate the probability of an event, the adjusted odds ratio and its 95% confidence interval
• Interpret the results of logistic regression analysis
• Assess goodness of fit of the logit model & diagnostic measures

Objective

• Develop a prediction score model using the logit equation & ROC curve analysis
• Calibrate the cut-off or threshold
• Validate a prediction score model

Reference

• Pagano M, Gauvreau K. Principles of Biostatistics. California: Duxbury Press; 1993: 379-424.
• Kleinbaum DG, Kupper LL, Muller KE, Nizam A. Applied Regression Analysis and Other Multivariable Methods, 3rd edition. Washington: Duxbury Press; 1998: 39-212.
• Hosmer DW, Lemeshow S. Applied Logistic Regression, 2nd edition. New York: John Wiley & Sons, Inc; 2000.

Outline of talk

• Construct logistic equation
  – Simple logistic model
  – Multiple logistic model
• Model selection
  – Assessing goodness of fit of the model
  – Diagnostic measures
• Creating a clinical prediction score
  – Derivation phase
  – Validation phase

When do we apply the logistic equation?

• Assessing association between factors and an outcome, in which
  – Outcome
    • Dichotomous only
    • DM/non-DM, HT/non-HT, CKD/non-CKD, retinopathy/non-retinopathy
  – Factors
    • Can be either continuous or categorical variables

Example I. Factors associated with acute stroke

• Design: Case-control study
• Outcome variable: Case vs control
  – A case is a patient who is diagnosed with haemorrhagic or ischaemic stroke
  – A control is a subject who has no history of stroke

• Variables of interest
  – Age, gender, BMI, waist-hip ratio
  – Smoking, alcohol consumption
  – Physical activity
  – History of disease
    • DM
    • HT
    • High cholesterol, LDL, HDL, triglycerides
• Variables (cont.)
  – Genetic factors
    • Tissue-type plasminogen activator (t-PA)
    • R353Q polymorphism of the Factor VII gene
    • Platelet glycoprotein (GP 1bα) gene
      – Thr/Met & Kozak polymorphisms

Example II. Factors associated with retinopathy in type 2 diabetic patients

• Design
  – Cross-sectional study
• Outcome
  – Retinopathy vs non-retinopathy
• Variables
  – Demographic data
    • Age, gender, BMI/waist-hip ratio, smoking, alcohol
  – History of disease
    • HT
    • Abnormal lipid profile
  – Clinical data
    • SBP/DBP
    • Kidney function (GFR or Cr)
    • HbA1c
    • Medication
      – ACE-I, ARB

Example III. Risk factors of chronic kidney disease (CKD)

• Design
  – Cross-sectional study
• Outcome
  – CKD versus non-CKD
• Variables
  – Age, gender, BMI/waist-hip ratio
  – Alcohol consumption
  – Smoking
  – Exercise & physical activity
  – History of illness
    • DM, HT, abnormal lipid profile, kidney stone
  – Medication used
    • NSAID, cyclo-oxygenase type 2 inhibitor (Cox-2), traditional medicine

Example IV. A clinical decision rule to prioritize polysomnography (PSG) in patients with suspected sleep apnea

• Design
  – Prospective data collection on consecutive patients referred to a sleep centre.
  – All consecutive new patients from February 2001 to April 2003 were included in the study. Data from February 2001 to December 2002 were used to derive the decision rule, whereas data collected from January 2003 to April 2003 were used for validation of the rule.
• Setting
  – The Newcastle Sleep Disorders Centre, University of Newcastle, NSW, Australia.
• Patients
  – Consecutive adult patients who had been scheduled for initial diagnostic PSG.
• Study objectives
  – To derive and validate a clinical decision rule that can help to prioritize patients who are on waiting lists for PSG.
• Variables

Association between age & sleep apnea

[Figure: Scatter plot of age and SA. x-axis: age (20 to 80). Group annotations as extracted: "mean=531" and "mean=430".]

Group   Age     SA    Non-SA   N     Mean (P)
1       < 30    22    53       75    0.29
2       30-44   146   99       245   0.60
3       45-60   225   79       304   0.74
4       60+     176   37       213   0.83
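The "Mean (P)" column is simply the mean of the 0/1 outcome in each age group, i.e. the proportion with SA. A minimal sketch in Python (not part of the original Stata workflow) that recomputes it from the counts in the table; the names `age_groups` and `proportion_with_sa` are illustrative only.

```python
# Counts copied from the age-group table: (age group, SA, non-SA)
age_groups = [
    ("< 30", 22, 53),
    ("30-44", 146, 99),
    ("45-60", 225, 79),
    ("60+", 176, 37),
]

def proportion_with_sa(sa, non_sa):
    """Mean of the 0/1 outcome in a group, i.e. the proportion with SA."""
    return sa / (sa + non_sa)

proportions = {
    label: proportion_with_sa(sa, non_sa) for label, sa, non_sa in age_groups
}

for label, p in proportions.items():
    print(f"{label:>6}: P(SA) = {p:.2f}")
```

The proportions rise with age and always stay between 0 and 1, which is the motivation for modelling E(Y|X) with a logistic rather than a linear function.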

[Figure: Probability of having SA according to age group. x-axis: age group (<30, 37.5, 47.5, >60).]

• Mean value of SA given age group
• E(Y|X)
• Expected value (mean) of SA given X

0 ≤ E(Y|X) ≤ 1

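Logit equation:

\[
E(Y|X) = P = \frac{e^{\beta_0 + \sum_{j=1}^{k} \beta_j X_j}}{1 + e^{\beta_0 + \sum_{j=1}^{k} \beta_j X_j}}
\]

\[
1 - P = \frac{1}{1 + e^{\beta_0 + \sum_{j=1}^{k} \beta_j X_j}}
\]

\[
\ln\left[\frac{P}{1-P}\right] = \beta_0 + \sum_{j=1}^{k} \beta_j X_j
\]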
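The logit transformation and its inverse can be sketched in a few lines of Python. This is an illustration only, not the author's code; the function names `logit` and `inv_logit` are assumptions made for the example.

```python
import math

def logit(p):
    """Log odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(linear_predictor):
    """Probability implied by a linear predictor b0 + sum(bj * Xj).

    The result always lies strictly between 0 and 1 for moderate inputs,
    which is why the logit scale suits a dichotomous outcome.
    """
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Round trip: a probability converted to the logit scale and back
p_back = inv_logit(logit(0.29))
```

A linear predictor of 0 corresponds to a probability of 0.5; positive values give probabilities above 0.5 and negative values below.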

Simple logistic regression

• Fit equation

\[
\ln\left[\frac{P}{1-P}\right] = b_0 + b_1\,\mathrm{snore}
\]

Performing analysis in STATA

xi: logit SA i.snore, nolog
i.snore           _Isnore_1-2         (naturally coded; _Isnore_2 omitted)

Logistic regression                               Number of obs   =        837
                                                  LR chi2(1)      =      86.63
                                                  Prob > chi2     =     0.0000
Log likelihood = -481.49775                       Pseudo R2       =     0.0825

------------------------------------------------------------------------------
          SA |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Isnore_1 |   1.571043   .1717837     9.15   0.000     1.234354    1.907733
       _cons |  -.3846743   .1440453    -2.67   0.008    -.6669979   -.1023508
------------------------------------------------------------------------------

Interpretation

• Patients with a history of snoring have a logit (log odds) of sleep apnea that is 1.57 higher than that of patients without a history of snoring.

Interpretation

• The logit of sleep apnea for patients with and without a history of snoring is therefore given by

\[
\ln[\mathrm{odds}(SA \mid \mathrm{snore}+)] = -0.38 + 1.57
\]

\[
\ln[\mathrm{odds}(SA \mid \mathrm{snore}-)] = -0.38
\]

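Interpretation

\[
\ln[\mathrm{odds}(SA \mid \mathrm{snore}+)] - \ln[\mathrm{odds}(SA \mid \mathrm{snore}-)] = -0.38 + 1.57 + 0.38 = 1.57
\]

\[
\ln\left[\frac{\mathrm{odds}(SA \mid \mathrm{snore}+)}{\mathrm{odds}(SA \mid \mathrm{snore}-)}\right] = 1.57
\]

\[
\frac{\mathrm{odds}(SA \mid \mathrm{snore}+)}{\mathrm{odds}(SA \mid \mathrm{snore}-)} = \exp(1.57)
\]

\[
OR = \exp(1.57) \approx 4.81
\]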
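The odds ratio and its 95% confidence interval follow from exponentiating the snoring coefficient and its confidence limits. A minimal sketch in Python, using the coefficient and standard error printed in the Stata output for `_Isnore_1`; the critical value 1.96 is the usual normal approximation and the variable names are illustrative.

```python
import math

# Values copied from the Stata output for _Isnore_1
coef = 1.571043
std_err = 0.1717837
z_crit = 1.96  # approximate 97.5th percentile of the standard normal

odds_ratio = math.exp(coef)
ci_low = math.exp(coef - z_crit * std_err)
ci_high = math.exp(coef + z_crit * std_err)

print(f"OR = {odds_ratio:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

The limits agree with exponentiating the interval that Stata reports on the logit scale (1.234354 to 1.907733).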

Testing association

\[
H_0: \beta_1 = 0 \quad \text{or} \quad OR = 1
\]

• Wald test

\[
z = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{1.571043}{0.1717837} = 9.15
\]
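The Wald statistic for snoring can be checked by hand from the Stata output. The sketch below, in Python, divides the coefficient by its standard error and obtains a two-sided p-value from the standard normal distribution via `math.erfc`; the variable names are illustrative.

```python
import math

# Values copied from the Stata output for _Isnore_1
coef = 1.571043
std_err = 0.1717837

# Wald z statistic: estimate divided by its standard error
z = coef / std_err

# Two-sided p-value under the standard normal: P(|Z| > |z|)
p_value = math.erfc(abs(z) / math.sqrt(2.0))

print(f"z = {z:.2f}, p = {p_value:.3g}")
```

The z value matches the 9.15 reported in the output, and the p-value is far below 0.001, consistent with the "0.000" that Stata prints.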

Testing association

• Likelihood ratio test

\[
G = -2[LL_0 - LL_1] = -2(-524.8 + 481.5) = 86.6
\]

\[
G \sim \chi^2 \text{ with } 2 - 1 = 1 \text{ df}
\]
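The likelihood ratio statistic can be reproduced from the two log likelihoods shown on the slide. A minimal Python sketch follows; the p-value uses the identity that a chi-square variable with 1 df is a squared standard normal, so this shortcut applies to 1 df only. Variable names are illustrative.

```python
import math

ll_null = -524.8    # log likelihood of the model without snore (from the slide)
ll_model = -481.5   # log likelihood of the model with snore (from the slide)

# Likelihood ratio statistic
g = -2.0 * (ll_null - ll_model)

# Upper-tail probability of chi-square with 1 df (valid for 1 df only)
p_value = math.erfc(math.sqrt(g / 2.0))

# McFadden pseudo R2, as reported by Stata
pseudo_r2 = 1.0 - ll_model / ll_null

print(f"G = {g:.1f}, p = {p_value:.3g}, pseudo R2 = {pseudo_r2:.4f}")
```

G agrees with the "LR chi2(1) = 86.63" line of the Stata output up to rounding of the log likelihoods, and the pseudo R2 agrees with the reported 0.0825.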

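Estimate probability of having event

\[
\ln\left[\frac{p}{1-p}\right] = -0.38 + 1.57(1) = 1.19
\]

Taking the anti-logarithm of both sides,

\[
\frac{p}{1-p} = e^{1.19} = 3.29
\]

Solve for p:

\[
p = 3.29 - 3.29p
\]

\[
4.29p = 3.29
\]

\[
\hat{p} = \frac{3.29}{4.29} = 0.77
\]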
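The same calculation can be done for both snoring groups directly from the fitted coefficients. A minimal sketch in Python, using the unrounded coefficients from the Stata output; the function name `predicted_probability` is an assumption for illustration.

```python
import math

# Coefficients copied from the Stata output
b0 = -0.3846743   # _cons
b1 = 1.571043     # _Isnore_1

def predicted_probability(snore):
    """Predicted probability of SA for snore = 1 (yes) or 0 (no)."""
    log_odds = b0 + b1 * snore
    odds = math.exp(log_odds)       # anti-logarithm of the logit
    return odds / (1.0 + odds)      # solve odds = p / (1 - p) for p

p_snore = predicted_probability(1)
p_no_snore = predicted_probability(0)

print(f"P(SA | snore)    = {p_snore:.2f}")
print(f"P(SA | no snore) = {p_no_snore:.2f}")
```

For snorers this reproduces the 0.77 derived by hand; for non-snorers only the intercept contributes.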

Multiple logistic regression

• Multiple factors are associated with the outcome of interest
• Osteoporotic hip fracture
  – Age, BMI, use of corticosteroid, alcohol consumption, calcium intake, etc.
• CKD
  – Age, gender, BMI, use of NSAID, diabetes, HT, cholesterol
• SA
  – Age, gender, BMI, snore, stop breathing, etc.

• Consider > 1 factor simultaneously
• Cumulative factors can predict the event better than one factor
• Control confounding effects, i.e., assess the effect of each factor while controlling for the other factors

\[
\log\left[\frac{D}{1-D}\right] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \dots + \beta_p x_p
\]

Steps of analysis

• Model selection
  – Include only variables that explain the event of interest well
    • Clinical significance
    • Statistical significance
  – Not too many (but not too few) variables

Model selection

• i) Univariate analysis
  • age_gr, sex, BMI_gr, snore, stop_bre, choking, awake_re, kick_leg, accident, smoker, alcohol, ht, dm, allergie

TABLE 1. Patients' characteristics between SA and non-SA groups

Factors                        SA n = (%)    Non-SA n = (%)    P value
Age, mean (SD)
  < 30
  30 - 44
  45 - 59
Gender
  Male
  Female
BMI, mean (SD)
  < 25
  25 - 29.9
  30 - 39.9
Snoring
  Yes
  No
Stopping breathing
  Yes
  No
Choking
  Yes
  No
Waking up refreshed
  Yes
  Sometime
  No
Leg kicking
  Yes
  Sometime
  No
Accident due to sleepiness
  Yes
  No
ESS score, median (range)
Smoking
  Yes
  Ex-smoker
  No
Alcohol consumption
  Yes
  No
Hypertension
  Yes
  No
Diabetes mellitus
  Yes
  No
Allergy
  Yes
  No

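Model selection

• ii) Multivariate analysis by simultaneously entering variables with p < 0.15 into the model

\[
\log\left[\frac{SA}{1-SA}\right] = \beta_0 + \beta_1\,\mathrm{Stop\_bre} + \beta_2\,\mathrm{Agegr}_2 + \beta_3\,\mathrm{Agegr}_3 + \beta_4\,\mathrm{Agegr}_4 + \beta_5\,\mathrm{Sex} + \beta_6\,\mathrm{BMIgr}_2 + \beta_7\,\mathrm{BMIgr}_3 + \beta_8\,\mathrm{BMIgr}_4 + \beta_9\,\mathrm{Snore} + \dots + \beta_p x_p
\]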

Confounder versus Interaction

• Confounders
  • Crude OR versus adjusted OR

Effect modifier

Model selection

– Backward
– Forward

Performance of the model

• Goodness of fit (Calibration)
  • How similar are the predicted and observed outcomes?

Model classification

• How well does the model discriminate SA from non-SA subjects?
• Assign the cut-off/threshold
• Construct 2x2 or kx2 tables
• Estimate predictive values
  – Sensitivity
  – Specificity
  – PPV, NPV
  – Accuracy
  – Area under ROC
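The classification measures listed above all come from one 2x2 table at a chosen cut-off. The Python sketch below illustrates the arithmetic. The cell counts are not reported in the slides: they are back-calculated, as an assumption, from the 569 SA and 268 non-SA patients in the age-group table and from the sensitivity (91.92%) and specificity (50.00%) reported by `roctab` at the cutpoint 3.890326.

```python
def classification_measures(tp, fn, fp, tn):
    """Standard measures from a 2x2 table of test result versus true status."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "lr_pos": sensitivity / (1.0 - specificity),
        "lr_neg": (1.0 - sensitivity) / specificity,
    }

# Back-calculated counts (assumption, see lead-in): 569 SA, 268 non-SA
measures = classification_measures(tp=523, fn=46, fp=134, tn=134)

for name, value in measures.items():
    print(f"{name:>11}: {value:.4f}")
```

With these counts the accuracy and the likelihood ratios agree with the first row of the `roctab` output (78.49% correctly classified, LR+ 1.8383, LR- 0.1617), which supports the back-calculation.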

Model classification

• Area under the ROC
  – A summary statistic that tells us whether the logit model can discriminate disease from non-disease subjects.
  – The ROC curve plots sensitivity versus 1 - specificity (false positive rate) for the whole range of estimated probabilities.

[Figure: ROC curve. x-axis: 1 - Specificity (0.00 to 1.00); y-axis: Sensitivity. Area under ROC curve = 0.8101]
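One way to read the area under the ROC curve is as the probability that a randomly chosen diseased subject has a higher predicted score than a randomly chosen non-diseased subject, with ties counted as one half. The Python sketch below computes it this way. The scores are hypothetical values made up for illustration, not data from the study, and `auc_from_scores` is an illustrative name.

```python
def auc_from_scores(scores_diseased, scores_non_diseased):
    """Area under the ROC curve as the proportion of concordant pairs.

    Each (diseased, non-diseased) pair counts 1 if the diseased subject
    has the higher score, 0.5 if the scores tie, and 0 otherwise.
    """
    concordant = 0.0
    for s_d in scores_diseased:
        for s_n in scores_non_diseased:
            if s_d > s_n:
                concordant += 1.0
            elif s_d == s_n:
                concordant += 0.5
    return concordant / (len(scores_diseased) * len(scores_non_diseased))

# Hypothetical scores, for illustration only
sa_scores = [5.4, 4.1, 6.2, 3.9]
non_sa_scores = [3.2, 4.1, 2.8]

auc = auc_from_scores(sa_scores, non_sa_scores)
print(f"AUC = {auc:.3f}")
```

An area of 0.5 means the score discriminates no better than chance, and 1.0 means perfect separation of the two groups.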

Interpretation of ROC

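Diagnostic measures

• Outliers
  - Pearson's chi-square residual

\[
r(y_j, \hat{\pi}_j) = \frac{y_j - m_j \hat{\pi}_j}{\sqrt{m_j \hat{\pi}_j (1 - \hat{\pi}_j)}}
\]

where m_j is the number of subjects with covariate pattern j, y_j is the number of events among them, and \hat{\pi}_j is the predicted probability.

Residual sum of squares

\[
\chi^2 = \sum_{j=1}^{J} r(y_j, \hat{\pi}_j)^2 = \sum_{j=1}^{J} \frac{(y_j - m_j \hat{\pi}_j)^2}{m_j \hat{\pi}_j (1 - \hat{\pi}_j)}
\]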
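The Pearson residual for one covariate pattern is the observed minus expected number of events, scaled by the binomial standard deviation. A minimal Python sketch follows; the covariate patterns in `patterns` are hypothetical numbers chosen for illustration, and the function name is an assumption.

```python
import math

def pearson_residual(y, m, pi_hat):
    """Pearson residual for a covariate pattern.

    y: observed number of events, m: number of subjects in the pattern,
    pi_hat: predicted probability of the event (0 < pi_hat < 1).
    """
    expected = m * pi_hat
    return (y - expected) / math.sqrt(m * pi_hat * (1.0 - pi_hat))

# Hypothetical covariate patterns: (y, m, pi_hat)
patterns = [(5, 10, 0.5), (8, 10, 0.5), (1, 12, 0.3)]

residuals = [pearson_residual(y, m, p) for y, m, p in patterns]

# Pearson chi-square: sum of squared residuals over all patterns
pearson_chi2 = sum(r ** 2 for r in residuals)
```

A pattern whose observed count equals its expected count has a residual of 0; large absolute residuals flag poorly fitted patterns.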

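Outliers

- Deviance residual

\[
d(y_j, \hat{\pi}_j) = \pm \left\{ 2 \left[ y_j \ln\left( \frac{y_j}{m_j \hat{\pi}_j} \right) + (m_j - y_j) \ln\left( \frac{m_j - y_j}{m_j (1 - \hat{\pi}_j)} \right) \right] \right\}^{1/2}
\]

The sign is that of (y_j - m_j \hat{\pi}_j).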
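The deviance residual can be computed pattern by pattern in the same way as the Pearson residual. The Python sketch below follows the standard definition, taking the sign from observed minus expected and treating the terms with y = 0 or y = m as zero; the inputs in the usage lines are hypothetical and the function name is illustrative.

```python
import math

def deviance_residual(y, m, pi_hat):
    """Deviance residual for a covariate pattern.

    y: observed number of events, m: number of subjects in the pattern,
    pi_hat: predicted probability of the event (0 < pi_hat < 1).
    """
    expected = m * pi_hat
    # Terms with y = 0 or y = m contribute zero (limit of x * ln x as x -> 0)
    term_events = y * math.log(y / expected) if y > 0 else 0.0
    term_non_events = (
        (m - y) * math.log((m - y) / (m * (1.0 - pi_hat))) if y < m else 0.0
    )
    # Guard against tiny negative values caused by floating-point rounding
    magnitude = math.sqrt(max(0.0, 2.0 * (term_events + term_non_events)))
    return magnitude if y >= expected else -magnitude

# Hypothetical patterns, for illustration only
d_high = deviance_residual(8, 10, 0.5)   # more events than expected
d_low = deviance_residual(0, 1, 0.2)     # single subject without the event
```

Like the Pearson residual, it is 0 when observed and expected counts agree, and its sign shows whether the pattern had more or fewer events than predicted.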

Outliers

• Leverage h_jj values
  • Reflects the distance of X_j from the centre (mean)
  • The higher the h_jj, the greater the distance of X_j from the mean

\[
H = V^{1/2} X (X'VX)^{-1} X' V^{1/2}
\]

where V is a J x J diagonal matrix with elements

\[
v_j = m_j \hat{\pi}(x_j) \left[ 1 - \hat{\pi}(x_j) \right]
\]

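Influence of outliers

• Influence on predicted value of Y
  • Including/excluding the pattern(s) that are outliers would change the Y values
  • Pearson residual change

\[
\Delta\chi^2_j = \frac{r_j^2}{1 - h_j}
\]

  • Deviance residual change

\[
\Delta D_j = \frac{d_j^2}{1 - h_j}
\]

Here h_j denotes the leverage h_jj of covariate pattern j.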

Influence on estimated coefficients

\[
\Delta\hat{\beta}_j = (\hat{\beta} - \hat{\beta}_{(-j)})' (X'VX) (\hat{\beta} - \hat{\beta}_{(-j)}) = \frac{r_j^2 h_j}{(1 - h_j)^2}
\]

[Figure: Delta Pearson chi-square versus predicted probability. x-axis: Pr(SA) (0 to 1).]

[Figure: Delta D versus probability. x-axis: Pr(SA) (0 to 1).]

[Figure: Delta B versus probability. x-axis: Pr(SA) (0 to 1).]

Create scoring scheme using coefficients of each variable

Factors                 Coefficients    Score for individual
Stopping breathing                      ……
  Yes
  No
Age                                     ……
  > 60                  2.2
  45 - 59               1.5
  30 - 44               1.0
  < 30                  0
BMI                                     ……
  > 40                  2.3
  30 - 39.9             1.5
  25 - 29.9             1.1
  < 25                  0
Snoring                                 ……
  Yes
  No
Gender                                  ……
  Male                  1.1
  Female                0
Total score                             ……

Calculate score

gen score_full = _b[_cons] + ///
    _b[_Istop_bre_1]*_Istop_bre_1 + ///
    _b[_Iage_gr_2]*_Iage_gr_2 + ///
    _b[_Iage_gr_3]*_Iage_gr_3 + ///
    _b[_Iage_gr_4]*_Iage_gr_4 + ///
    _b[_IBMI_gr_29]*_IBMI_gr_29 + ///
    _b[_IBMI_gr_39]*_IBMI_gr_39 + ///
    _b[_IBMI_gr_40]*_IBMI_gr_40 + ///
    _b[_Isex_2]*_Isex_2 + ///
    _b[_Isnore_1]*_Isnore_1
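The same additive logic can be sketched outside Stata. The Python example below sums the coefficients from the scoring scheme for one patient. It is only an illustration under stated assumptions: the coefficients for stopping breathing and snoring are not legible in the scoring table, so they must be passed in by the caller, and the values used in the usage lines are placeholders rather than study results. The reference categories are assumed to be age < 30, BMI < 25 and female gender, and the intercept is left out.

```python
# Coefficients copied from the scoring scheme; reference categories score 0
AGE_POINTS = {"<30": 0.0, "30-44": 1.0, "45-59": 1.5, ">60": 2.2}
BMI_POINTS = {"<25": 0.0, "25-29.9": 1.1, "30-39.9": 1.5, ">40": 2.3}
MALE_POINTS = 1.1

def prediction_score(age_group, bmi_group, male, stop_breathing, snoring,
                     stop_breathing_points, snoring_points):
    """Sum the coefficient-based points for one patient.

    stop_breathing_points and snoring_points must be supplied by the
    caller because they are not shown in the scoring table.
    """
    score = AGE_POINTS[age_group] + BMI_POINTS[bmi_group]
    if male:
        score += MALE_POINTS
    if stop_breathing:
        score += stop_breathing_points
    if snoring:
        score += snoring_points
    return score

# Placeholder values for the two missing coefficients (NOT from the slides)
example_score = prediction_score(
    age_group="45-59", bmi_group="30-39.9", male=True,
    stop_breathing=True, snoring=True,
    stop_breathing_points=1.0, snoring_points=1.0,
)
```

Keeping the missing coefficients as explicit arguments avoids baking guessed values into the scheme.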

Discrimination performance

roctab SA score, detail

------------------------------------------------------------------------------
                                            Correctly
Cutpoint          Sensitivity  Specificity  Classified       LR+        LR-
------------------------------------------------------------------------------
( >= 3.890326 )      91.92%       50.00%      78.49%      1.8383     0.1617
( >= 3.895265 )      91.74%       51.12%      78.73%      1.8768     0.1616
( >= 3.896797 )      89.28%       54.48%      78.14%      1.9612     0.1968
( >= 3.940307 )      89.28%       55.22%      78.38%      1.9939     0.1941
( >= 3.990148 )      88.93%       55.22%      78.14%      1.9861     0.2005
( >= 4.049621 )      88.58%       55.22%      77.90%      1.9782     0.2069
( >= 4.051153 )      87.70%       57.09%      77.90%      2.0437     0.2155
( >= 4.090991 )      87.35%       57.46%      77.78%      2.0534     0.2202
( >= 5.355929 )      55.54%       85.07%      64.99%      3.7209     0.5226
( >= 5.440022 )      48.51%       88.43%      61.29%      4.1934     0.5823
( >= 5.441554 )      48.33%       89.93%      61.65%      4.7972     0.5746
( >= 5.455751 )      48.15%       89.93%      61.53%      4.7798     0.5765
( >= 5.474413 )      47.28%       90.30%      61.05%      4.8731     0.5839
( >= 5.635747 )      40.95%       91.42%      57.11%      4.7715     0.6459
( >= 5.649945 )      40.77%       91.42%      56.99%      4.7510     0.6479
( >= 5.651477 )      38.66%       92.91%      56.03%      5.4537     0.6602
( >= 5.67371 )       37.79%       92.91%      55.44%      5.3298     0.6696
( >= 5.867904 )      36.73%       93.66%      54.96%      5.7905     0.6755
( >= 5.883634 )      36.03%       93.66%      54.48%      5.6797     0.6830
( >= 6.137812 )      22.85%       95.90%      46.24%      5.5664     0.8046
( >= 6.237287 )      18.45%       96.64%      43.49%      5.4950     0.8438
------------------------------------------------------------------------------

                  ROC                    -Asymptotic Normal-
    Obs          Area     Std. Err.     [95% Conf. Interval]
--------------------------------------------------------
    837        0.8101       0.0165       0.77763     0.84249

Model selection based on model classification

• ROC curve analysis
• Comparing area under ROC curves

Calibrate cutoff

• Score's distribution
  – Tertile, quantile
• Youden index
  – Sen + Spec - 1
• LR+
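As an illustration of the Youden index criterion, the Python sketch below computes Sen + Spec - 1 for each cutpoint listed in the `roctab` output and picks the largest. Only the rows shown on the slides are included, so the result is the best of the listed cutpoints, not necessarily of the full output. Variable and function names are illustrative.

```python
# (cutpoint, sensitivity %, specificity %) as listed in the roctab output
roc_rows = [
    (3.890326, 91.92, 50.00), (3.895265, 91.74, 51.12),
    (3.896797, 89.28, 54.48), (3.940307, 89.28, 55.22),
    (3.990148, 88.93, 55.22), (4.049621, 88.58, 55.22),
    (4.051153, 87.70, 57.09), (4.090991, 87.35, 57.46),
    (5.355929, 55.54, 85.07), (5.440022, 48.51, 88.43),
    (5.441554, 48.33, 89.93), (5.455751, 48.15, 89.93),
    (5.474413, 47.28, 90.30), (5.635747, 40.95, 91.42),
    (5.649945, 40.77, 91.42), (5.651477, 38.66, 92.91),
    (5.67371, 37.79, 92.91), (5.867904, 36.73, 93.66),
    (5.883634, 36.03, 93.66), (6.137812, 22.85, 95.90),
    (6.237287, 18.45, 96.64),
]

def youden_index(sens_pct, spec_pct):
    """Youden index J = sensitivity + specificity - 1, on the 0 to 1 scale."""
    return (sens_pct + spec_pct) / 100.0 - 1.0

# Row with the largest Youden index among the listed cutpoints
best_cutpoint, best_sens, best_spec = max(
    roc_rows, key=lambda row: youden_index(row[1], row[2])
)
best_j = youden_index(best_sens, best_spec)

print(f"Best listed cutpoint: score >= {best_cutpoint} (J = {best_j:.4f})")
```

The Youden index weights sensitivity and specificity equally; a cut-off chosen on LR+ or on the score's tertiles would favour a different trade-off.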

Validation

• Internal validation
  – Data are from the same setting
    • Split data
    • Bootstrap
    • Period
• External validation
  – Generalization
  – Data are from a different setting
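Validation by period, as in Example IV, amounts to splitting the data at a calendar date: patients referred up to December 2002 form the derivation set and those referred from January 2003 form the validation set. A minimal Python sketch of such a split follows; the patient records and the field name `referred` are hypothetical.

```python
from datetime import date

# Last day of the derivation period described in Example IV
DERIVATION_END = date(2002, 12, 31)

def split_by_period(records):
    """Split records into derivation and validation sets by referral date."""
    derivation = [r for r in records if r["referred"] <= DERIVATION_END]
    validation = [r for r in records if r["referred"] > DERIVATION_END]
    return derivation, validation

# Hypothetical patient records, for illustration only
patients = [
    {"id": 1, "referred": date(2001, 2, 14)},
    {"id": 2, "referred": date(2002, 12, 31)},
    {"id": 3, "referred": date(2003, 1, 2)},
    {"id": 4, "referred": date(2003, 4, 20)},
]

derivation_set, validation_set = split_by_period(patients)
```

The score is developed on the derivation set only, and its discrimination and calibration are then assessed on the validation set without refitting.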
