

Stats 330: Lecture 21


Plan of the day

In today’s lecture we continue our discussion of the logistic regression model.

• Topics covered
  – Probabilities, odds and log-odds
  – Inference for coefficients, probabilities and log-odds
  – Calculating them in R

• Reference: Coursebook, section 5.2.1


Probabilities, Odds and Log Odds

• If E is an event, the probability that E occurs is written P(E).

• The odds on E occurring are the ratio

P(E)/(1-P(E))

• The log-odds is the logarithm of the odds
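As a quick numerical illustration (not from the original slides), here is how the three quantities are related in R, for a hypothetical probability of 0.2:

p <- 0.2              # a hypothetical probability P(E)
odds <- p / (1 - p)   # odds on E: 0.25
log(odds)             # log-odds (logit): about -1.386
qlogis(p)             # the same logit, via R's built-in quantile function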


For the logistic regression model

• Binary response Y=0/1, covariate x

• Let E be the event that Y = 1, and let π denote the probability P(E). Then

π = exp(β0 + β1x) / [1 + exp(β0 + β1x)]

1 - π = 1 - exp(β0 + β1x) / [1 + exp(β0 + β1x)] = 1 / [1 + exp(β0 + β1x)]


Odds & log-odds

Odds:

π / (1 - π) = [exp(β0 + β1x) / (1 + exp(β0 + β1x))] / [1 / (1 + exp(β0 + β1x))]
            = exp(β0 + β1x)

Log-odds (logits):

log(odds) = log(π / (1 - π)) = log(exp(β0 + β1x)) = β0 + β1x


Logistic regression model

Probability form:   π = exp(β0 + β1x) / (1 + exp(β0 + β1x))

Odds form:          π / (1 - π) = exp(β0 + β1x)

Log-odds form:      log(π / (1 - π)) = β0 + β1x
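A minimal numerical check (not on the original slides) that the three forms describe the same model, using the coefficient estimates from the CHD example that appears later in the lecture:

b0 <- -5.2784; b1 <- 0.1103; x <- 45         # estimates from the CHD example
p <- exp(b0 + b1*x) / (1 + exp(b0 + b1*x))   # probability form: about 0.42
p / (1 - p)                                  # odds form, equals exp(b0 + b1*x): about 0.73
log(p / (1 - p))                             # log-odds form, equals b0 + b1*x: about -0.31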


Interpretation of β1

• If x is increased by 1, the odds become
  exp(β0 + β1(x + 1)) = exp(β0 + β1x) exp(β1) = old odds × exp(β1)
  β1 measures the effect of a unit increase in x on the odds (multiplies them by exp(β1))

• If x is increased by 1, the log-odds become
  β0 + β1(x + 1) = β0 + β1x + β1 = old log-odds + β1
  β1 measures the effect of a unit increase in x on the log-odds (adds β1)
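For example, in the CHD fit used later in the lecture the coefficient of age is about 0.1103, so each extra year of age multiplies the estimated odds of CHD by exp(0.1103):

exp(0.1103)   # about 1.12: a unit increase in age raises the odds by roughly 12%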


Estimating probabilities and log-odds

• Given a fitted model, and a value of x, how can we estimate the probability π?

• In practical terms, how can we estimate the probability a person of a given age has CHD?

• Example: If age is 45, what is π = P(CHD)?

• Use the estimates of β0 and β1: the estimate of β0 is -5.2784 and the estimate of β1 is 0.1103


Hand Calculations

• Estimated probability is
  exp(-5.2784 + 0.1103 × 45) / (1 + exp(-5.2784 + 0.1103 × 45)) = 0.4221

• Estimated odds are 0.4221 / (1 - 0.4221) = 0.7304

• Estimated log-odds (logit) is log(0.7304) = -0.3142
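The same hand calculation can be reproduced in R (a sketch using the rounded estimates above):

eta <- -5.2784 + 0.1103 * 45   # estimated log-odds (linear predictor): about -0.31
exp(eta) / (1 + exp(eta))      # estimated probability: about 0.42 (same as plogis(eta))
exp(eta)                       # estimated odds: about 0.73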


Calculations using R

> predict(chd.glm, data.frame(age=45), type="response")
[1] 0.4221367

Calculates probability

> predict(chd.glm, data.frame(age=45))
[1] -0.314008

Calculates log-odds


Plotting estimated probability: grouped approach

# Group the data by age: r = number of CHD cases, n = number of subjects at each age
grouped.chd.df <- data.frame(
  g.age = sort(unique(chd.df$age)),
  r = as.vector(tapply(chd.df$chd, chd.df$age, sum)),
  n = as.vector(tapply(chd.df$chd, chd.df$age, length)))

# Plot the observed proportions and overlay the fitted probability curve
attach(grouped.chd.df)
plot(g.age, r/n, xlab="age", ylab="r/n")
grouped.chd.glm <- glm(cbind(r, n - r) ~ g.age, family=binomial,
                       data=grouped.chd.df)
est.prob <- predict(grouped.chd.glm, grouped.chd.df, type="response")
lines(g.age, est.prob, lwd=2, col="blue")


[Plot: observed proportions r/n versus age (20 to 70), with the fitted probability curve overlaid.]


Ungrouped approach

# Plot the 0/1 responses and overlay the fitted probability curve
plot(chd.df$age, chd.df$chd, xlab="age", ylab="CHD")
chd.glm <- glm(chd ~ age, family=binomial, data=chd.df)
est.prob <- predict(chd.glm, data.frame(age=sort(chd.df$age)),
                    type="response")
lines(sort(chd.df$age), est.prob, lwd=3, col="blue")

Note: age needs to be in ascending order so that lines() draws the curve correctly.


[Plot: CHD (0/1) versus age (20 to 70), with the fitted probability curve overlaid.]


Inference for coefficients and probabilities

• Provided we have sufficient data, the estimated coefficients are approximately normally distributed, much as in linear regression
  – (in linear regression they are exactly normal under the model assumptions)

• The maximum likelihood method gives us a way of computing standard errors for the coefficients and the estimated probabilities; we skip the (complicated) mathematical details
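In R these standard errors can be read off the fitted model object (a sketch, assuming the chd.glm fit used throughout this lecture):

sqrt(diag(vcov(chd.glm)))       # standard errors of the coefficient estimates
summary(chd.glm)$coefficients   # estimates, standard errors, z values and p-values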


Testing for a zero coefficient

• To test whether a coefficient is zero we use the z-statistic (the Wald statistic reported by summary) and its p-value, just as we used the t-statistic in linear regression – the tests are interpreted in the same way

• (in the case of a single covariate, this tests that there is no relationship between the covariate and the response)
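As a hand check against the summary output on the next slide, the z value for age is simply the estimate divided by its standard error:

0.1103 / 0.0240               # z value for age: about 4.60
2 * pnorm(-0.1103 / 0.0240)   # two-sided p-value: about 4e-06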


CHD example

> summary(chd.glm)

Call:
glm(formula = chd ~ age, family = binomial, data = chd.df)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -5.2784     1.1296  -4.673 2.97e-06 ***
age           0.1103     0.0240   4.596 4.30e-06 ***
---

Both p-values are small: we need both the intercept and the covariate.


Confidence intervals

Take the form (Wald intervals)

Estimate ± 1.96 × standard error

e.g. for β1 (the coefficient of age), we get 0.1103 ± 1.96 × 0.0240,

i.e. 0.1103 ± 0.04704, or (0.0633, 0.1573)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -5.2784     1.1296  -4.673 2.97e-06 ***
age           0.1103     0.0240   4.596 4.30e-06 ***
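The same Wald interval can be computed directly in R (a sketch; confint.default gives Wald-type intervals for all coefficients at once):

0.1103 + c(-1, 1) * 1.96 * 0.0240   # Wald 95% CI for the age coefficient: (0.0633, 0.1573)
confint.default(chd.glm)            # Wald intervals for both coefficients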


Confidence intervals (2)

Or, use the confint function (LR intervals)

> confint(chd.glm)
Waiting for profiling to be done...
                  2.5 %     97.5 %
(Intercept) -7.68700761 -3.2196722
age          0.06638715  0.1612957

> confint(chd.glm, level=0.99)
Waiting for profiling to be done...
                  0.5 %     99.5 %
(Intercept) -8.53291031 -2.6281457
age          0.05368102  0.1791288


Confidence intervals for probabilities

Calculated with the predict function (like prediction intervals in linear regression)

Form is Estimate ± 1.96 × standard error

Example: 0.4221 ± 1.96 × 0.0578, i.e. 0.4221 ± 0.11328

> predict(chd.glm, data.frame(age=45), type="response", se=T)
$fit
[1] 0.4221367

$se.fit
[1] 0.05780285

$residual.scale
[1] 1
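Combining the two pieces of output gives the interval (a sketch, computed directly on the probability scale as on this slide):

pr <- predict(chd.glm, data.frame(age=45), type="response", se.fit=TRUE)
pr$fit + c(-1, 1) * 1.96 * pr$se.fit   # approximate 95% CI for the probability: about (0.31, 0.54)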


Confidence intervals for log-odds

Calculated with the predict function (like prediction intervals in linear regression)

Form is Estimate ± 1.96 × standard error

Example: -0.314008 ± 1.96 × 0.2369578, i.e. -0.3140 ± 0.4644

> predict(chd.glm, data.frame(age=45), se=TRUE)
$fit
[1] -0.314008

$se.fit
[1] 0.2369578

$residual.scale
[1] 1
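An aside not on the original slides: because the normal approximation is usually better on the log-odds scale, the endpoints of this interval can be mapped back to probabilities with the inverse logit (plogis), giving an alternative interval for the probability:

lp <- predict(chd.glm, data.frame(age=45), se.fit=TRUE)
ci <- lp$fit + c(-1, 1) * 1.96 * lp$se.fit   # 95% CI for the log-odds: about (-0.78, 0.15)
plogis(ci)                                   # back-transformed CI for the probability: about (0.31, 0.54)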
