© Department of Statistics 2012 STATS 330 Lecture 23: Slide 2
Plan of the day
In today’s lecture we continue our discussion of the multiple logistic regression model
Topics covered
– Models and submodels
– Residuals for multiple logistic regression
– Diagnostics in multiple logistic regression
– No analogue of R2
Reference: Coursebook, section 5.2.3
Comparison of models
• Suppose model 1 and model 2 are two models, with model 2 a submodel of model 1
• If Model 2 is in fact correct, then the difference in the deviances will have approximately a chi-squared distribution
• df equals the difference in df of the separate models
• Approximation OK for grouped and ungrouped data
Example: kyphosis data
• Is age alone an adequate model?
> age.glm<-glm(Kyphosis~Age+I(Age^2), family=binomial, data=kyphosis.df)

Null deviance: 83.234 on 80 degrees of freedom
Residual deviance: 72.739 on 78 degrees of freedom
AIC: 78.739
Full model has deviance 54.428 on 76 df
Chisq is 72.739 - 54.428 = 18.311 on 78 - 76 = 2 df

> 1-pchisq(18.311,2)
[1] 0.0001056372

Highly significant: need at least one of Start and Number
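The p-value above can be reproduced by hand: for 2 degrees of freedom the chi-squared survival function has the closed form exp(−x/2). An illustrative Python check (the lecture itself uses R's pchisq):

```python
import math

# Deviances of the age-only model and the full model (values from the slides)
dev_age, dev_full = 72.739, 54.428
chisq = dev_age - dev_full            # 18.311 on 78 - 76 = 2 df

# For 2 df the chi-squared survival function is exp(-x/2),
# so the p-value of the deviance test is:
p_value = math.exp(-chisq / 2)
print(p_value)                        # about 0.0001056, matching 1-pchisq(18.311, 2)
```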
Anova in R
> anova(age.glm, kyphosis.glm, test="Chi")
Analysis of Deviance Table

Model 1: Kyphosis ~ Age + I(Age^2)
Model 2: Kyphosis ~ Age + I(Age^2) + Start + Number
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1        78     72.739
2        76     54.428  2   18.311 0.0001056 ***
Two-model form of anova: comparing the two fitted models directly
Residuals
• Two kinds of residuals
– Pearson residuals
  • useful for grouped data only
  • similar to residuals in linear regression: actual minus fitted value
– Deviance residuals
  • useful for grouped and ungrouped data
  • measure the contribution of each covariate pattern to the deviance
Pearson residuals
Pearson residual for pattern i is
    ( r_i − n_i π̂_i ) / sqrt( n_i π̂_i (1 − π̂_i) )

where r_i is the number of successes and n_i the number of trials for pattern i, and π̂_i is the probability predicted by the model.
Standardized to have approximately unit variance, so big if more than 2 in absolute value
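As a concrete illustration of the formula, here is a small Python version (the function name is ours, not from the coursebook; R's residuals(fit, type="pearson") computes this for every pattern):

```python
import math

def pearson_residual(r, n, pihat):
    """Pearson residual for a covariate pattern with r successes
    out of n trials and fitted probability pihat."""
    return (r - n * pihat) / math.sqrt(n * pihat * (1 - pihat))

# A pattern with 5 successes in 10 trials and fitted probability 0.3:
# residual is (5 - 3)/sqrt(2.1), roughly 1.38 -- not flagged as big.
print(pearson_residual(5, 10, 0.3))
```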
Deviance residuals (i)
• For grouped data, the deviance is

    deviance = Σ_{i=1}^{M} d_i^2

  where

    d_i = ± sqrt( 2 [ r_i log( r_i / (n_i π̂_i) ) + (n_i − r_i) log( (n_i − r_i) / (n_i (1 − π̂_i)) ) ] )

  and d_i is +ve if r_i > n_i π̂_i, and −ve otherwise.
Deviance residuals (ii)
• Thus, the deviance can be written as the sum of squares of M quantities d_1, …, d_M, one for each covariate pattern
• Each di is the contribution to the deviance from the ith covariate pattern
• If deviance residual is big (more than about 2 in magnitude), then the covariate pattern has a big influence on the likelihood, and hence the estimates
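The signed square-root formula can be sketched in Python (illustrative only; in R, residuals(fit, type="deviance") does this for you):

```python
import math

def deviance_residual(r, n, pihat):
    """Deviance residual for a pattern with r successes out of n trials;
    uses the convention 0*log(0) = 0, and takes the sign of r - n*pihat."""
    fit = n * pihat
    term = 0.0
    if r > 0:
        term += r * math.log(r / fit)
    if r < n:
        term += (n - r) * math.log((n - r) / (n - fit))
    sign = 1.0 if r > fit else -1.0
    return sign * math.sqrt(max(2 * term, 0.0))
```

For the same pattern as before (5 successes in 10 trials, fitted probability 0.3), the deviance residual is about 1.32, slightly smaller than the Pearson residual of about 1.38.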
Calculating residuals

> pearson.residuals<-residuals(budworm.glm, type="pearson")
> deviance.residuals<-residuals(budworm.glm, type="deviance")
> par(mfrow=c(1,2))
> plot(pearson.residuals, ylab="residuals", main="Pearson")
> abline(h=0,lty=2)
> plot(deviance.residuals, ylab="residuals", main="Deviance")
> abline(h=0,lty=2)
[Figure: index plots of the Pearson (left) and deviance (right) residuals for the budworm data]
Diagnostics: outlier detection
• Large residuals indicate covariate patterns poorly fitted by the model
• Large Pearson residuals indicate a poor match between the “maximum model probabilities” and the logistic model probabilities, for grouped data
• Large deviance residuals indicate influential points
• Example: budworm data
Diagnostics: detecting non-linear regression functions
• For a single x, plot the logits of the maximal model probabilities against x
• For multiple x’s, plot Pearson residuals against fitted probabilities, against individual x’s
• If the data has most n_i's equal to 1, so it can't be grouped, try gam (cf. the kyphosis data)
Example: budworms
• Plot Pearson residuals versus dose; the plot shows a curve
[Figure: Pearson residuals vs dose for the budworm data; the points follow a curve]
Diagnostics: influential points
Will look at 3 diagnostics
– Hat matrix diagonals
– Cook's distance
– Leave-one-out deviance change
Example: vaso-constriction data
Data from a study of reflex vaso-constriction (narrowing of the blood vessels) of the skin of the fingers
– Can be caused by a sharp intake of breath
Example: vaso-constriction data
Variables measured:
Response: 1 = vaso-constriction occurs, 0 = doesn't occur
Volume: volume of air breathed in
Rate: rate of intake of breath
Data

   Volume  Rate Response
1    3.70 0.825        1
2    3.50 1.090        1
3    1.25 2.500        1
4    0.75 1.500        1
5    0.80 3.200        1
6    0.70 3.500        1
7    0.60 0.750        0
8    1.10 1.700        0
9    0.90 0.750        0
10   0.90 0.450        0
11   0.80 0.570        0
12   0.55 2.750        0
13   0.60 3.000        0
. . .  39 obs in all
Plot of data
> plot(Rate, Volume, type="n", cex=1.2)
> text(Rate, Volume, 1:39, col=ifelse(Response==1, "red", "blue"), cex=1.2)
> text(2.3, 3.5, "blue: no VS", col="blue", adj=0, cex=1.2)
> text(2.3, 3.0, "red: VS", col="red", adj=0, cex=1.2)
[Figure: plot of Volume versus Rate with observation ID numbers shown; red = VS, blue = no VS]
Note points 4 and 18
Enhanced residual plots

> vaso.glm = glm(Response ~ log(Volume) + log(Rate), family=binomial, data=vaso.df)
> pear.r<-residuals(vaso.glm, type="pearson")
> dev.r<-residuals(vaso.glm, type="deviance")
> par(mfrow=c(1,2))
> plot(pear.r, ylab="residuals", main="Pearson", type="n")
> text(pear.r, cex=0.7)
> abline(h=0,lty=2)
> abline(h=2,lty=2,lwd=2)
> abline(h=-2,lty=2,lwd=2)
> plot(dev.r, ylab="residuals", main="Deviance", type="h")
> text(dev.r, cex=0.7)
> abline(h=0,lty=2)
> abline(h=2,lty=2,lwd=2)
> abline(h=-2,lty=2,lwd=2)
[Figure: index plots of Pearson (left) and deviance (right) residuals for the vaso-constriction data, with observation numbers and ±2 reference lines]
Diagnostics: Hat matrix diagonals
• Can define hat matrix diagonals (HMD’s) pretty much as in linear models
• HMD big if HMD > 3p/M (M = no. of covariate patterns)
• Draw index plot of HMD’s
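The 3p/M rule is straightforward to apply to the vector of hat values; a minimal sketch (hypothetical helper name, not part of the course code):

```python
def high_leverage(hmd, p, M):
    """Indices (0-based) of covariate patterns whose hat matrix
    diagonal exceeds the rule-of-thumb cutoff 3p/M."""
    cutoff = 3 * p / M
    return [i for i, h in enumerate(hmd) if h > cutoff]

# For the vaso-constriction model p = 3 and M = 39, so the cutoff is
# 9/39, about 0.23 -- the dashed line abline(h=3*3/39) on the next slide.
```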
Plotting HMD’s
> HMD<-hatvalues(vaso.glm)
> plot(HMD, ylab="HMD's", type="h")
> text(HMD, cex=0.7)
> abline(h=3*3/39, lty=2)
[Figure: index plot of the hat matrix diagonals (HMD's), with the cutoff 3×3/39 ≈ 0.23 marked]
Obs 31 high-leverage
Hat matrix diagonals
• In ordinary regression, the hat matrix diagonals measure how “outlying” the covariates for an observation are
• In logistic regression, the HMD's measure the same thing, but are down-weighted according to the estimated probability for the observation. The weights get small if the probability is close to 0 or 1.
• In the vaso-constriction data, points 1,2,17 had very small weights, since the probabilities are close to 1 for these points.
[Figure: plot of Volume versus Rate with observation ID numbers shown (as before); red = VS, blue = no VS]
Note points 1,2,17
Diagnostics: Cook's distance

• Can define an analogue of Cook's distance for each point:

    CD = (Pearson resid)^2 × HMD / (p × (1 − HMD)^2)

  p = number of coefficients (p = k+1, where k is the number of covariates)
• CD big if more than about the 10% quantile of the chi-squared distribution on k+1 df, divided by k+1
• Calculate with qchisq(0.1,k+1)/(k+1)
• But not that reliable as a measure
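The formula combines quantities we already have; an illustrative Python version (the lecture uses R's cooks.distance(vaso.glm), and the function name here is ours):

```python
def cook_glm(pearson_res, hmd, p):
    """Cook's distance analogue for logistic regression:
    CD = (Pearson residual)^2 * HMD / (p * (1 - HMD)^2)."""
    return pearson_res**2 * hmd / (p * (1 - hmd)**2)

# A point with Pearson residual 2 and HMD 0.2 in a 3-coefficient model:
print(cook_glm(2.0, 0.2, 3))    # 0.8/1.92, about 0.417
```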
Cook's D: calculating and plotting

p<-3
CD<-cooks.distance(vaso.glm)
plot(CD, ylab="Cook's D", type="h", main="index plot of Cook's distances")
text(CD, cex=0.7)
bigcook<-qchisq(0.1,p)/p
abline(h=bigcook, lty=2)
[Figure: index plot of Cook's distances, with the cutoff qchisq(0.1, p)/p marked]
Points 4 and 18 influential
Diagnostics: leave-one-out deviance change
• If the ith covariate pattern is left out, the change in the deviance is approximately
    (Dev. res)^2 + (Pearson res)^2 × HMD/(1 − HMD)
Big if more than about 4
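An illustrative Python version of this approximation (it mirrors the R calculation of Dev.change on the next slide):

```python
def deviance_change(dev_res, pearson_res, hmd):
    """Approximate change in deviance when a covariate pattern is left
    out: d^2 + r^2 * h/(1 - h); values above about 4 flag influence."""
    return dev_res**2 + pearson_res**2 * hmd / (1 - hmd)

# Deviance residual 1.5, Pearson residual 2, HMD 0.2:
print(deviance_change(1.5, 2.0, 0.2))   # 2.25 + 4*0.25 = 3.25, below the cutoff
```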
Deviance change: calculating and plotting
> dev.r<-residuals(vaso.glm, type="deviance")
> Dev.change<-dev.r^2 + pear.r^2*HMD/(1-HMD)
> plot(Dev.change, ylab="Deviance change", type="h")
> text(Dev.change, cex=0.7)
> bigdev<-4
> abline(h=bigdev, lty=2)
[Figure: index plot of the leave-one-out deviance changes, with the cutoff 4 marked]
4 and 18 influential
All together
influenceplots(vaso.glm)
[Figure: the four influenceplots panels: index plot of deviance residuals (points 4, 18 flagged), leverage plot (point 31 flagged), Cook's distance plot (4, 18), and deviance changes plot (4, 18), each against observation number]
Should we delete points?
• How influential are the 3 points?
• Can delete each in turn and examine changes in coefficients, predicted probabilities
• First, coefficients:

Deleting:      None     31      4     18   All 3
(Intercept)  -2.875 -3.041 -5.206 -4.758 -24.348
log(Volume)   5.179  4.966  8.468  7.671  39.142
log(Rate)     4.562  4.765  7.455  6.880  31.642
Should we delete points (2)?
• Next, fitted probabilities:
• Conclusion: points 4 and 18 have a big effect.
                        delete points
Fitted at    None    31     4    18  4 and 18  All 3
point 31    0.722 0.627 0.743 0.707     0.996  0.996
point 4     0.075 0.073 0.010 0.015     0.000  0.000
point 18    0.106 0.100 0.018 0.026     0.000  0.000
Should we delete points (3)?
• Should we delete?
• They could be genuine – no real evidence they are wrong
• If we delete them, we increase the regression coefficients and make the fitted probabilities more extreme
• This overstates the predictive ability of the model
Residuals for ungrouped data
• If all cases have distinct covariate patterns, then the residuals lie along two curves (corresponding to success and failure) and have little or no diagnostic value.
• Thus, there is a pattern even if everything is OK.
Formulas
• Pearson residuals: for ungrouped data, residual for i th case is
    sqrt( (1 − π̂_i)/π̂_i )      if y_i = 1
    −sqrt( π̂_i/(1 − π̂_i) )     if y_i = 0
Formulas (cont)
• Deviance residuals: for ungrouped data, residual for i th case is
    sqrt( 2 |log π̂_i| )          if y_i = 1
    −sqrt( 2 |log(1 − π̂_i)| )    if y_i = 0
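Both ungrouped formulas are easy to sketch in Python (illustrative only; R's residuals(fit, type=...) handles this):

```python
import math

def pearson_ungrouped(y, pihat):
    """Pearson residual for a single 0/1 case with fitted probability pihat."""
    if y == 1:
        return math.sqrt((1 - pihat) / pihat)
    return -math.sqrt(pihat / (1 - pihat))

def deviance_ungrouped(y, pihat):
    """Deviance residual for a single 0/1 case."""
    if y == 1:
        return math.sqrt(2 * abs(math.log(pihat)))
    return -math.sqrt(2 * abs(math.log(1 - pihat)))

# A success fitted with probability 0.8 gives a small residual;
# a failure fitted with probability 0.8 gives a larger (negative) one.
print(pearson_ungrouped(1, 0.8), pearson_ungrouped(0, 0.8))   # 0.5 -2.0
```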
Use of plot function
> plot(kyphosis.glm)

[Figure: the four default diagnostic panels: residuals vs fitted, normal Q-Q plot, scale-location plot, and Cook's distance plot; observations 43, 46, and 77 flagged]
Analogue of R2?
• There is no satisfactory analogue of R2 for logistic regression.
• For the “small m big n” situation we can use the residual deviance, since we can obtain an approximate p-value.
• For other situations we can use the Hosmer-Lemeshow statistic (next slide)
Hosmer-Lemeshow statistic
• How can we judge goodness of fit for ungrouped data?
• Can use the Hosmer-Lemeshow statistic, which groups the data into cases having similar fitted probabilities
– Sort the cases in increasing order of fitted probabilities
– Divide into 10 (almost) equal groups
– Do a chi-square test to see if the number of successes in each group matches the estimated probability
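The grouping idea above can be sketched as follows. This is a simplified illustration; the course's HLstat function may differ in details such as tie handling and how the p-value is computed:

```python
def hosmer_lemeshow(y, p, g=10):
    """Hosmer-Lemeshow statistic: sort cases by fitted probability,
    split into g (almost) equal groups, and compare observed with
    expected successes in each group.  Assumes each group's average
    fitted probability is strictly between 0 and 1."""
    pairs = sorted(zip(p, y))                        # sort cases by fitted probability
    n = len(pairs)
    stat = 0.0
    for k in range(g):
        group = pairs[k * n // g:(k + 1) * n // g]   # (almost) equal-sized groups
        ng = len(group)
        if ng == 0:
            continue
        obs = sum(yi for _, yi in group)             # observed successes
        expd = sum(pi for pi, _ in group)            # expected successes
        pbar = expd / ng                             # average fitted probability
        stat += (obs - expd) ** 2 / (ng * pbar * (1 - pbar))
    return stat
```

The statistic is then referred to a chi-squared distribution (conventionally on g − 2 degrees of freedom) to obtain a p-value.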
Kyphosis data
Class 1 Class 2 Class 3 Class 4 Class 5
Observed 0’s 9 8 8 7 8
Observed 1’s 0 0 0 1 0
Total obs 9 8 8 8 8
Expected 1’s 0.022 0.082 0.199 0.443 0.776
Class 6 Class 7 Class 8 Class 9 Class 10
Observed 0’s 8 5 5 3 3
Observed 1’s 0 3 3 5 5
Total obs 8 8 8 8 8
Expected 1’s 1.023 1.639 2.496 3.991 6.328
Note: Expected = Total.obs x average prob
Divide probs into 10 classes: lowest 10%, next 10%, …
In R, using the kyphosis data
> HLstat(kyphosis.glm)
Value of HL statistic = 6.498
P-value = 0.592
Result of fitting model
A p-value of less than 0.05 indicates problems. No problem indicated for the kyphosis data – logistic appears to fit OK.
The function HLstat is in the "330 functions".