april 4
![Page 1: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/1.jpg)
April 4
• Logistic Regression– Lee Chapter 9
– Cody and Smith 9:F
![Page 2: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/2.jpg)
HRT Use and Polyps
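| | Case (Polyps) | Control (No Polyps) | Total |
|---|---|---|---|
| HRT Use | 72 | 175 | 247 |
| No HRT Use | 102 | 114 | 216 |
| Total | 174 | 289 | 463 |

RO for HRT use (case v control):

$$RO = \frac{72/102}{175/114} = 0.46$$

$$\chi^2 = \frac{463\,(72 \cdot 114 - 175 \cdot 102)^2}{(174)(289)(247)(216)} = 16.04$$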
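The arithmetic on this slide can be checked with a short sketch. The function names `relative_odds` and `chi_square_2x2` are made up for this illustration; the chi-square uses the standard shortcut formula for a 2x2 table without continuity correction, which is assumed to be what the slide intended.

```python
# Cell counts from the HRT Use and Polyps table
a, b = 72, 175    # HRT use: cases, controls
c, d = 102, 114   # no HRT use: cases, controls

def relative_odds(a, b, c, d):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (a / c) / (b / d)

def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

ro = relative_odds(a, b, c, d)
chi2 = chi_square_2x2(a, b, c, d)
print(f"RO = {ro:.2f}, chi-square = {chi2:.2f}")
```

Rounded to two decimals, both values agree with the figures on the slide.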
![Page 3: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/3.jpg)
Inference for binary data
• Relative risk, odds ratios, 2x2 tables are limited
  – Can’t adjust for many confounders
  – Limited to categorical predictors
  – Can’t look at multiple variables simultaneously
• Logistic regression
  – Adjust for many confounders
  – Study continuous predictors
  – Model interactions
![Page 4: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/4.jpg)
Linear regression model
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$

Y = dependent variable
$X_i$ = independent variables
Y is continuous, normally distributed
Model the mean response (Y) based on the predictors
$\beta_0$ is the mean of Y when all Xs are 0
$\beta_i$ is the increase in the mean of Y for a 1 unit increase in $X_i$
![Page 5: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/5.jpg)
New regression model?
$$Y \overset{?}{=} \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$
Y = binary outcome (0 or 1)
Xi = independent variables
Would like to use this type of model for a binary outcome variable
![Page 6: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/6.jpg)
Draw a line ?
![Page 7: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/7.jpg)
What if you had multiple observations at each Score (or you grouped scores)

| Score | Proportion Dying |
|---|---|
| < 10 | 1/10 = 0.10 |
| 11-20 | 4/15 = 0.27 |
| 21-30 | 5/15 = 0.33 |
| 31-40 | 8/16 = 0.50 |

[Plot: proportion dying versus score group]
![Page 8: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/8.jpg)
Possibilities for Y
$$Y \overset{?}{=} \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$

Y = probability of Y = 1 (Problem: Y is bounded between 0 and 1)
Y = odds of Y = 1
Y = log (odds of Y = 1) – Has good properties
![Page 9: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/9.jpg)
Probability, Odds, Log Odds
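| Probability | Odds | Log(Odds) |
|---|---|---|
| 0.01 | 0.01 | -4.60 |
| 0.10 | 0.11 | -2.20 |
| 0.20 | 0.25 | -1.39 |
| 0.30 | 0.43 | -0.85 |
| 0.40 | 0.67 | -0.41 |
| 0.50 | 1.00 | 0.00 |
| 0.60 | 1.50 | 0.41 |
| 0.70 | 2.33 | 0.85 |
| 0.80 | 4.00 | 1.39 |
| 0.90 | 9.00 | 2.20 |
| 0.99 | 99.00 | 4.60 |

Probability: bounded between 0 and 1
Odds: extreme values
Log(Odds): less extreme values, and symmetric about $\pi$ = 0.5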
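A minimal sketch of how the three scales relate, using only the standard library. The helper names `odds` and `log_odds` are introduced here for illustration.

```python
import math

def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def log_odds(p):
    """Convert a probability to log odds (the logit)."""
    return math.log(odds(p))

# Print probability, odds, and log odds side by side
for p in (0.01, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.99):
    print(f"{p:4.2f}  {odds(p):6.2f}  {log_odds(p):6.2f}")
```

The symmetry is easy to see in the output: the log odds at p and at 1 - p differ only in sign.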
![Page 10: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/10.jpg)
![Page 11: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/11.jpg)
Nearly a straight line for middle values of P
![Page 12: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/12.jpg)
Logistic regression equation
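Model log odds of outcome as a linear function of one or more variables

The model is:

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots$$

$\pi$ = probability of the outcome
$x_i$ = predictors, independent variables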
![Page 13: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/13.jpg)
A Little Math
• The natural LOG and exponential (EXP) functions are inverse functions of each other
  – LOG(a) = b ⇔ EXP(b) = a
  – LOG(1) = 0 ⇔ EXP(0) = 1
  – LOG(.5) = -0.693 ⇔ EXP(-.693) = .5
  – LOG(1.5) = .405 ⇔ EXP(.405) = 1.5

The LOG values (left) will be logistic regression betas. The EXP values (right) will be the odds ratios.

Note: Calculators and Excel use LN for natural logarithm
![Page 14: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/14.jpg)
A Little Math
• LOG function
  – Takes values [0 to +infinity] → returns values [-infinity to +infinity]
• EXP function
  – Takes values [-infinity to +infinity] → returns values [0 to +infinity]
![Page 15: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/15.jpg)
A Little Math
• Properties of LOG function
  – log (a*b) = log (a) + log (b)
  – log (a/b) = log (a) – log (b)
• Properties of EXP function
  – exp (a+b) = exp(a) * exp(b)
  – exp (a-b) = exp(a)/exp(b)
Differences in log odds
Odds Ratios
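A small numeric illustration of why these identities matter here: a difference in log odds, once exponentiated, is a ratio of odds. The two odds values below are arbitrary choices for the example, not data from the lecture.

```python
import math

# Two arbitrary odds values, chosen only for illustration
odds_a, odds_b = 1.5, 0.5

# log(a/b) = log(a) - log(b): the log of an odds ratio is a difference in log odds
diff_in_log_odds = math.log(odds_a) - math.log(odds_b)

# exp(a - b) = exp(a) / exp(b): exponentiating the difference recovers the odds ratio
odds_ratio = math.exp(diff_in_log_odds)

print(diff_in_log_odds, odds_ratio)
```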
![Page 16: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/16.jpg)
(ODDS)
![Page 17: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/17.jpg)
These will be typical betas from the logistic regression model
These will be the odds ratios
![Page 18: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/18.jpg)
Logistic regression – single binary covariate
We need to use a dummy variable to code for men and women
x = 1 for women, 0 for men
What do the betas mean? What is odds ratio, women versus men?
The model is:

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x$$
![Page 19: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/19.jpg)
Odds for Men and Women
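For men:

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x = \beta_0 + \beta_1 (0) = \beta_0$$

For women:

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x = \beta_0 + \beta_1 (1) = \beta_0 + \beta_1$$

After some algebra, the odds ratio is equal to:

$$\frac{\text{odds for women}}{\text{odds for men}} = \exp(\beta_1)$$

$\beta_1$ is the difference in log odds between men and women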
![Page 20: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/20.jpg)
Example - risk of CVD for men vs. women
log(odds) = $\beta_0 + \beta_1 x$
= -2.5504 - 1.0527*x
For females: log(odds) = -2.5504 - 1.0527(1) = -3.6031
For males: log(odds) = -2.5504 - 1.0527(0) = -2.5504
Difference in log odds (females minus males) = -1.0527
exp($\beta_1$) = odds ratio for women vs. men
Here, exp($\beta_1$) = exp(-1.0527) = 0.35
Women have 65% lower odds of the outcome than men (OR < 1)
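The same calculation as a minimal sketch, using the coefficients quoted on this slide. The function names and the extra conversion back to a probability are additions for illustration.

```python
import math

# Coefficients from the CVD example (x = 1 for women, 0 for men)
b0, b1 = -2.5504, -1.0527

def log_odds(x):
    """Log odds of CVD under the fitted model."""
    return b0 + b1 * x

def probability(x):
    """Invert the logit: p = 1 / (1 + exp(-log odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds(x)))

# The odds ratio is the exponentiated difference in log odds
odds_ratio = math.exp(log_odds(1) - log_odds(0))

print(f"log odds: women {log_odds(1):.4f}, men {log_odds(0):.4f}")
print(f"odds ratio (women vs. men): {odds_ratio:.2f}")
print(f"probability: women {probability(1):.4f}, men {probability(0):.4f}")
```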
![Page 21: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/21.jpg)
Note
• Odds ratio from 2 x 2 table
• EXP($\beta$) from logistic regression for binary risk factor
• These will be equal
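One way to see the equality, sketched with the HRT and polyps counts from the earlier slide. With a single binary covariate the model has one parameter per group, so the maximum likelihood estimates can be written directly from the observed odds in each group; no iterative fitting is needed. Treating polyps as the outcome and HRT use as the covariate is a choice made for this illustration.

```python
import math

# HRT and polyps counts: outcome = polyps (case), x = 1 for HRT use
cases_hrt, controls_hrt = 72, 175
cases_no_hrt, controls_no_hrt = 102, 114

# Closed-form estimates for the single-binary-covariate model
b0 = math.log(cases_no_hrt / controls_no_hrt)    # log odds when x = 0
b1 = math.log(cases_hrt / controls_hrt) - b0     # difference in log odds

# Odds ratio straight from the 2x2 table (cross-product ratio)
table_or = (cases_hrt * controls_no_hrt) / (controls_hrt * cases_no_hrt)

print(f"exp(b1) = {math.exp(b1):.4f}, 2x2 odds ratio = {table_or:.4f}")
```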
![Page 22: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/22.jpg)
Multiple logistic regression model
$$\log(\text{odds}) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$

log(odds) = logarithm of the odds for the outcome, dependent variable
$X_i$ = predictors, independent variables
$\beta_i$ = log(OR) associated with either
• exposure (for categorical predictors)
• a 1 unit increase in predictor (for continuous)
OR adjusted for other variables in model
![Page 23: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/23.jpg)
Interpretation of coefficients - continuous predictors
Example - effect of age on risk of death in 10 years
log(odds) = -8.2784+ 0.1026*age
$\beta_0$ = -8.2784, $\beta_1$ = 0.1026
exp($\beta_1$) = exp(0.1026) = 1.108
A one year increase in age is associated with an odds ratio of death of 1.108 (assumption that this is true for any 2 consecutive ages)
This is an increase of approximately 11% in the odds (= 1.108 - 1)
![Page 24: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/24.jpg)
Interpretation of coefficients - continuous predictors
What about a 5 year increase in age?
Multiply coefficient by the change you want to look at:
exp(5*$\beta_1$) = exp(5*0.1026) = 1.67
A five year increase in age is associated with an odds ratio of death of 1.67
This is an increase of 67% in the odds
Note: exp(5*$\beta_1$) does not equal 5*exp($\beta_1$)
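A quick sketch of the scaling rule, using the age coefficient from the example. The helper `odds_ratio_for_change` is a name invented for this illustration.

```python
import math

b1 = 0.1026  # age coefficient from the 10-year death example

def odds_ratio_for_change(beta, delta):
    """Odds ratio for a change of `delta` units in a continuous predictor."""
    return math.exp(beta * delta)

or_1_year = odds_ratio_for_change(b1, 1)
or_5_years = odds_ratio_for_change(b1, 5)

print(f"1 year: {or_1_year:.3f}, 5 years: {or_5_years:.2f}")
# The 5-year odds ratio is the 1-year odds ratio raised to the 5th power,
# not 5 times the 1-year odds ratio.
print(f"or_1_year ** 5 = {or_1_year ** 5:.2f}, 5 * or_1_year = {5 * or_1_year:.2f}")
```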
![Page 25: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/25.jpg)
Parameter Estimation
• How do we come up with estimates for $\beta_i$?
• Can’t use least squares since outcome is not continuous
• Use Maximum Likelihood Estimation (MLE)
![Page 26: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/26.jpg)
Maximum Likelihood Estimation
• Choose parameter estimates that maximize the probability of observing the data you observed.
• Example: estimating a proportion $\pi$
  – Observe 7/10 have characteristic
  – P = 0.70 is estimate
  – P = 0.70 is MLE of $\pi$ (Why?)
  – Which value of $\pi$ maximizes the probability of getting 7 of 10?
  – Answer: 0.70
![Page 27: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/27.jpg)
MLE Simple Example
• Wish to estimate a proportion $\pi$
• Sample n = 2
  – Observe 1 of 2 have characteristic
  – $L = 2\pi(1 - \pi)$
  – What value of $\pi$ maximizes L?
  – Answer: $\pi$ = 0.5, which is p = 1/2
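The two proportion examples can be checked numerically with a simple grid search over candidate values. This is only a sketch: the binomial likelihood is assumed for both examples, and `binomial_likelihood` and `grid_mle` are names introduced here. It requires Python 3.8 or later for `math.comb`.

```python
import math

def binomial_likelihood(pi, successes, n):
    """Probability of observing `successes` out of `n` when the true proportion is pi."""
    return math.comb(n, successes) * pi ** successes * (1 - pi) ** (n - successes)

def grid_mle(successes, n, steps=1000):
    """Return the grid value of pi in [0, 1] that maximizes the likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda pi: binomial_likelihood(pi, successes, n))

print(grid_mle(7, 10))  # 7 of 10 have the characteristic
print(grid_mle(1, 2))   # 1 of 2 have the characteristic
```

In both cases the maximizing value is the observed proportion, which is why the sample proportion is the MLE.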
![Page 28: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/28.jpg)
Fitted regression line
Curve based on:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$$

$\beta_0$ affects location
$\beta_1$ affects curvature
![Page 29: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/29.jpg)
Inference for multiple logistic regression
• Collect data, choose model, estimate $\beta_0$ and the $\beta_i$s
• Describe odds ratios, exp($\beta_i$), in statistical terms.
  – How confident are we of our estimate?
  – Is the odds ratio different from one only due to chance?
Not interested in inference for $\beta_0$ (related to overall probability of outcome)
![Page 30: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/30.jpg)
Confidence Intervals for logistic regression coefficients
• General form of 95% CI: Estimate ± 1.96*SE
  – $b_i$ estimate, provided by SAS
  – SE is complicated, provided by SAS
    • Related to variability of our data and sample size
![Page 31: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/31.jpg)
95% Confidence Intervals for the odds ratio
• Based on transforming the 95% confidence interval for the parameter estimate:

$$\left(e^{\,b_i - 1.96\,SE},\; e^{\,b_i + 1.96\,SE}\right)$$

• Supplied automatically by SAS
• Look to see if the interval contains 1. If it does not: “We have a statistically significant association between the predictor and the outcome controlling for all other covariates”
• Equivalent to a hypothesis test: reject Ho: OR = 1 at alpha = 0.05 when 1 is not in the interval
![Page 32: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/32.jpg)
Hypothesis test for individual logistic regression coefficient
• Null and alternative hypotheses
  – Ho: $\beta_i = 0$, Ha: $\beta_i \neq 0$
• Test statistic: $\chi^2 = (b_i / SE)^2$, supplied by SAS
• p-values are supplied by SAS
• If p<0.05, “there is a statistically significant association between the predictor and outcome variable controlling for all other covariates” at alpha = 0.05
![Page 33: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/33.jpg)
PROC LOGISTIC
    PROC LOGISTIC DATA = dataset;
      MODEL outcome = list of x variables;
    RUN;

• CLASS statement allows for categorical variables with many groups (>2)
![Page 34: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/34.jpg)
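    DATA temp;
      INPUT apache death @@;          /* @@ reads several observations per line */
      xdeath = 2;                     /* recode so that death is the first level */
      IF death = 1 THEN xdeath = 1;   /* 1 = died, 2 = survived */
    DATALINES;
    0 0 2 0 3 0 4 0 5 0
    6 0 7 0 8 0 9 0 10 0
    11 0 12 0 13 0 14 0 15 0
    16 0 17 1 18 1 19 0 20 0
    21 1 22 1 23 0 24 1 25 1
    26 1 27 0 28 1 29 1 30 1
    31 1 32 1 33 1 34 1 35 1
    36 1 37 1 38 1 41 0
    ;
    PROC LOGISTIC DATA=temp;
      MODEL xdeath = apache;          /* models the probability that xdeath = 1 */
    RUN;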
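For readers without SAS, the fit can be approximated in plain Python. This is a rough sketch, not a description of SAS internals: it applies Newton-Raphson steps to the log likelihood (for the logit model this coincides with Fisher scoring), it models the probability of death = 1 directly instead of recoding to xdeath, and the data are transcribed by hand from the DATALINES above. The function name `fit_logistic` and the fixed iteration count are assumptions made for the example.

```python
import math

# APACHE scores and deaths transcribed from the DATALINES above
apache = [0] + list(range(2, 39)) + [41]
died = {17, 18, 21, 22, 24, 25, 26, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38}
death = [1 if score in died else 0 for score in apache]

def fit_logistic(x, y, iterations=25):
    """Maximum likelihood fit of log(odds) = b0 + b1*x by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            # Score vector (gradient of the log likelihood)
            g0 += yi - p
            g1 += (yi - p) * xi
            # Information matrix (negative Hessian)
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system: information * step = score
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_logistic(apache, death)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}, odds ratio = {math.exp(b1):.3f}")
```

If the transcription is faithful, the estimates should land close to what PROC LOGISTIC reports for the same data.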
![Page 35: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/35.jpg)
The LOGISTIC Procedure
Model Information
| | |
|---|---|
| Data Set | WORK.TEMP |
| Response Variable | xdeath |
| Number of Response Levels | 2 |
| Number of Observations | 39 |
| Model | binary logit |
| Optimization Technique | Fisher's scoring |

Response Profile

| Ordered Value | xdeath | Total Frequency |
|---|---|---|
| 1 | 1 | 18 |
| 2 | 2 | 21 |
Probability modeled is xdeath=1.
![Page 36: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/36.jpg)
The LOGISTIC Procedure
Analysis of Maximum Likelihood Estimates
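| Parameter | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| Intercept | 1 | -4.3861 | 1.3687 | 10.2686 | 0.0014 |
| apache | 1 | 0.2034 | 0.0605 | 11.3093 | 0.0008 |

Odds Ratio Estimates

| Effect | Point Estimate | 95% Wald Lower Limit | 95% Wald Upper Limit |
|---|---|---|---|
| apache | 1.226 | 1.089 | 1.380 |

Point estimate = EXP(0.2034)
Lower limit = EXP(0.2034 – 1.96*.0605)
Upper limit = EXP(0.2034 + 1.96*.0605)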
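The odds ratio, its confidence limits, and the Wald chi-square can be recomputed from the printed estimate and standard error. A minimal sketch follows; because the inputs are rounded to four decimals, the chi-square comes out slightly different from the SAS value in the last digits.

```python
import math

# Estimate and standard error for apache, as printed in the output above
estimate, se = 0.2034, 0.0605
z = 1.96  # normal quantile for a 95% interval

odds_ratio = math.exp(estimate)
lower = math.exp(estimate - z * se)
upper = math.exp(estimate + z * se)
wald_chi_square = (estimate / se) ** 2

print(f"OR = {odds_ratio:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
print(f"Wald chi-square = {wald_chi_square:.2f}")
# The interval excludes 1, which matches the small p-value for apache.
print("interval contains 1:", lower <= 1.0 <= upper)
```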
![Page 37: April 4](https://reader034.vdocuments.site/reader034/viewer/2022051516/568138b2550346895da0716e/html5/thumbnails/37.jpg)
TOMHS – bpstudy sas dataset
• Variable CLINICAL (1=yes, 0 =no) indicates whether patient had a CVD event
• Run logistic regression separately for age and gender to determine if:
  – Age is related to CVD
    • What is the odds ratio associated with a 1 year increase in age?
    • What is the odds ratio associated with a 5 year increase in age?
  – Gender is related to CVD
    • What is the odds ratio of CVD (women versus men)?
• Run logistic regression for age and gender together
• Note: Download dataset from web-page or use dataset on SATURN