
© Department of Statistics 2012 STATS 330 Lecture 23: Slide 1

Stats 330: Lecture 23

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 2

Plan of the day

In today’s lecture we continue our discussion of the multiple logistic regression model

Topics covered:
– Models and submodels
– Residuals for Multiple Logistic Regression
– Diagnostics in Multiple Logistic Regression
– No analogue of R2

Reference: Coursebook, section 5.2.3

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 3

Comparison of models

• Suppose model 1 and model 2 are two models, with model 2 a submodel of model 1

• If Model 2 is in fact correct, then the difference in the deviances will have approximately a chi-squared distribution

• df equals the difference in df of the separate models

• Approximation OK for grouped and ungrouped data

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 4

Example: kyphosis data

• Is age alone an adequate model?

> age.glm <- glm(Kyphosis ~ Age + I(Age^2), family=binomial, data=kyphosis.df)

Null deviance: 83.234  on 80  degrees of freedom
Residual deviance: 72.739  on 78  degrees of freedom
AIC: 78.739

Full model has deviance 54.428 on 76 df

Chisq is 72.739 - 54.428 = 18.311 on 78 - 76 = 2 df

> 1 - pchisq(18.311, 2)
[1] 0.0001056372

Highly significant: we need at least one of Start and Number.

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 5

Anova in R

> anova(age.glm, kyphosis.glm, test="Chi")
Analysis of Deviance Table

Model 1: Kyphosis ~ Age + I(Age^2)
Model 2: Kyphosis ~ Age + I(Age^2) + Start + Number
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1        78     72.739
2        76     54.428  2   18.311 0.0001056 ***

Two-model form of anova: the two fitted models are compared directly.

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 6

Residuals

• Two kinds of residuals
  – Pearson residuals
    • useful for grouped data only
    • similar to residuals in linear regression: actual minus fitted value
  – Deviance residuals
    • useful for grouped and ungrouped data
    • measure the contribution of each covariate pattern to the deviance

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 7

Pearson residuals

The Pearson residual for covariate pattern $i$ is

$$r_i^{(P)} = \frac{r_i - n_i\hat\pi_i}{\sqrt{n_i\hat\pi_i(1-\hat\pi_i)}},$$

where $\hat\pi_i$ is the probability predicted by the model.

Standardized to have approximately unit variance, so big if more than 2 in absolute value
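As a check on this formula, here is a small sketch (assuming the grouped budworm fit used later, where covariate pattern i has $r_i$ successes out of $n_i$) that recomputes the residuals by hand:

# Recompute grouped Pearson residuals from the formula above.
pi.hat <- fitted(budworm.glm)        # predicted probabilities
n <- budworm.glm$prior.weights       # group sizes n_i
r <- n * budworm.glm$y               # observed successes r_i ($y holds r_i/n_i)
by.hand <- (r - n*pi.hat) / sqrt(n*pi.hat*(1 - pi.hat))
max(abs(by.hand - residuals(budworm.glm, type="pearson")))   # effectively 0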

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 8

Deviance residuals (i)

• For grouped data, the deviance is

$$\text{deviance} = 2\sum_{i=1}^{M}\left[ r_i\log\frac{r_i}{n_i\hat\pi_i} + (n_i - r_i)\log\frac{n_i - r_i}{n_i(1-\hat\pi_i)} \right] = \sum_{i=1}^{M} d_i^2,$$

where

$$d_i = \pm\left\{ 2\left[ r_i\log\frac{r_i}{n_i\hat\pi_i} + (n_i - r_i)\log\frac{n_i - r_i}{n_i(1-\hat\pi_i)} \right] \right\}^{1/2},$$

and $d_i$ is +ve if $r_i \geq n_i\hat\pi_i$, and -ve otherwise.

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 9

Deviance residuals (ii)

• Thus, the deviance can be written as the sum of squares of $M$ quantities $d_1, \dots, d_M$, one for each covariate pattern

• Each di is the contribution to the deviance from the ith covariate pattern

• If deviance residual is big (more than about 2 in magnitude), then the covariate pattern has a big influence on the likelihood, and hence the estimates
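A one-line check of this decomposition (assuming the grouped budworm fit used on the next slide):

# The squared deviance residuals sum to the residual deviance.
d <- residuals(budworm.glm, type="deviance")
c(sum(d^2), deviance(budworm.glm))   # the two numbers agree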

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 10

Calculating residuals

> pearson.residuals <- residuals(budworm.glm, type="pearson")
> deviance.residuals <- residuals(budworm.glm, type="deviance")
> par(mfrow=c(1,2))
> plot(pearson.residuals, ylab="residuals", main="Pearson")
> abline(h=0, lty=2)
> plot(deviance.residuals, ylab="residuals", main="Deviance")
> abline(h=0, lty=2)

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 11

[Figure: index plots of the Pearson residuals (left panel, "Pearson") and the deviance residuals (right panel, "Deviance") for the budworm data.]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 12

Diagnostics: outlier detection

• Large residuals indicate covariate patterns poorly fitted by the model

• Large Pearson residuals indicate a poor match between the “maximum model probabilities” and the logistic model probabilities, for grouped data

• Large deviance residuals indicate influential points

• Example: budworm data

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 13

Diagnostics: detecting non-linear regression functions

• For a single x, plot the logits of the maximal model probabilities against x

• For multiple x’s, plot the Pearson residuals against the fitted probabilities and against the individual x’s

• If the data have most $n_i$’s equal to 1, so they cannot be grouped, try gam (cf. the kyphosis data)
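A minimal sketch of the gam check, assuming the mgcv package and the kyphosis.df data frame used earlier (the choice of smooth and linear terms here is illustrative):

# Fit a smooth term for Age to see whether its effect is non-linear.
library(mgcv)
kyph.gam <- gam(Kyphosis ~ s(Age) + Start + Number,
                family=binomial, data=kyphosis.df)
plot(kyph.gam)   # a curved smooth for Age suggests a term such as I(Age^2)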

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 14

Example: budworms

• Plot Pearson residuals versus dose, plot shows a curve

[Figure: "Pearson residuals vs dose" — the Pearson residuals plotted against budworm.df$dose; the points follow a clear curve.]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 15

Diagnostics: influential points

We will look at three diagnostics:

– Hat matrix diagonals
– Cook’s distance
– Leave-one-out deviance change

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 16

Example: vaso-constriction data

Data from a study of reflex vaso-constriction (narrowing of the blood vessels) of the skin of the fingers
– Can be caused by a sharp intake of breath

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 17

Example: vaso-constriction data

Variables measured:

Response: 1 = vaso-constriction occurs, 0 = does not occur

Volume: volume of air breathed in

Rate: rate of intake of breath

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 18

Data

   Volume  Rate Response
1    3.70 0.825        1
2    3.50 1.090        1
3    1.25 2.500        1
4    0.75 1.500        1
5    0.80 3.200        1
6    0.70 3.500        1
7    0.60 0.750        0
8    1.10 1.700        0
9    0.90 0.750        0
10   0.90 0.450        0
11   0.80 0.570        0
12   0.55 2.750        0
13   0.60 3.000        0
. . .            (39 obs in all)

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 19

Plot of data

> plot(Rate, Volume, type="n", cex=1.2)
> text(Rate, Volume, 1:39, col=ifelse(Response==1, "red", "blue"), cex=1.2)
> text(2.3, 3.5, "blue: no VS", col="blue", adj=0, cex=1.2)
> text(2.3, 3.0, "red: VS", col="red", adj=0, cex=1.2)

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 20

[Figure: "Plot of volume versus rate, with ID numbers shown" — Volume plotted against Rate, each case labelled 1-39; red = VS, blue = no VS. Note points 4 and 18.]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 21

Enhanced residual plots

> vaso.glm <- glm(Response ~ log(Volume) + log(Rate), family=binomial, data=vaso.df)
> pear.r <- residuals(vaso.glm, type="pearson")
> dev.r <- residuals(vaso.glm, type="deviance")
> par(mfrow=c(1,2))
> plot(pear.r, ylab="residuals", main="Pearson", type="n")
> text(pear.r, cex=0.7)
> abline(h=0, lty=2)
> abline(h=2, lty=2, lwd=2)
> abline(h=-2, lty=2, lwd=2)
> plot(dev.r, ylab="residuals", main="Deviance", type="h")
> text(dev.r, cex=0.7)
> abline(h=0, lty=2)
> abline(h=2, lty=2, lwd=2)
> abline(h=-2, lty=2, lwd=2)

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 22

[Figure: index plots of the Pearson (left) and deviance (right) residuals for vaso.glm, each point labelled with its case number; reference lines at 0 and ±2.]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 23

Diagnostics: Hat matrix diagonals

• Can define hat matrix diagonals (HMD’s) pretty much as in linear models

• An HMD is big if HMD > 3p/M (p = number of coefficients, M = number of covariate patterns)

• Draw index plot of HMD’s

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 24

Plotting HMD’s

> HMD <- hatvalues(vaso.glm)
> plot(HMD, ylab="HMD's", type="h")
> text(HMD, cex=0.7)
> abline(h=3*3/39, lty=2)

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 25

[Figure: index plot of the hat matrix diagonals, with the cutoff line at 3p/M = 9/39; observation 31 is high-leverage.]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 26

Hat matrix diagonals

• In ordinary regression, the hat matrix diagonals measure how “outlying” the covariates for an observation are

• In logistic regression, the HMD’s measure the same thing, but are down-weighted according to the estimated probability for the observation. The weights get small if the probability is close to 0 or 1.

• In the vaso-constriction data, points 1,2,17 had very small weights, since the probabilities are close to 1 for these points.
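A sketch of how to inspect this: the weights component of a fitted glm holds the IWLS working weights, which for a binary logistic fit are roughly $\hat\pi(1-\hat\pi)$.

# Fitted probabilities and working weights for the flagged points.
p.hat <- fitted(vaso.glm)
w <- vaso.glm$weights                      # approximately p.hat*(1 - p.hat)
round(cbind(p.hat, w)[c(1, 2, 17), ], 3)   # near-1 probabilities, tiny weights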

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 27

[Figure: the plot of Volume versus Rate again, with ID numbers shown; red = VS, blue = no VS. Note points 1, 2 and 17.]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 28

Diagnostics: Cook's distance

• Can define an analogue of Cook’s distance for each point:

$$CD_i = \frac{(\text{Pearson residual})_i^2 \; h_i}{p\,(1 - h_i)^2},$$

where $h_i$ is the hat matrix diagonal (HMD) and $p$ is the number of coefficients.

• CD is big if it is more than about the 10% quantile of the chi-squared distribution on $k+1$ df, divided by $k+1$ (here $k+1 = p$)

• Calculate the cutoff with qchisq(0.1, k+1)/(k+1)

• But it is not that reliable as a measure

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 29

Cook's D: calculating and plotting

> p <- 3
> CD <- cooks.distance(vaso.glm)
> plot(CD, ylab="Cook's D", type="h", main="index plot of Cook's distances")
> text(CD, cex=0.7)
> bigcook <- qchisq(0.1, p)/p
> abline(h=bigcook, lty=2)

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 30

[Figure: "index plot of Cook's distances", with the cutoff line; points 4 and 18 stand out as influential.]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 31

Diagnostics: leave-one-out deviance change

• If the $i$th covariate pattern is left out, the change in the deviance is approximately

$$(\text{deviance residual})_i^2 + \frac{(\text{Pearson residual})_i^2 \, h_i}{1 - h_i}$$

• Big if more than about 4

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 32

Deviance change: calculating and plotting

> dev.r <- residuals(vaso.glm, type="deviance")
> Dev.change <- dev.r^2 + pear.r^2*HMD/(1-HMD)
> plot(Dev.change, ylab="Deviance change", type="h")
> text(Dev.change, cex=0.7)
> bigdev <- 4
> abline(h=bigdev, lty=2)

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 33

[Figure: "Deviance change" index plot, with the cutoff line at 4; points 4 and 18 are influential.]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 34

All together

influenceplots(vaso.glm)

[Figure: the four influenceplots panels — "Index plot of deviance residuals" (points 4 and 18 flagged), "Leverage plot" (point 31 flagged), "Cook's Distance Plot" (points 4 and 18 flagged) and "Deviance Changes Plot" (points 4 and 18 flagged).]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 35

Should we delete points?

• How influential are the 3 points?

• Can delete each in turn and examine changes in coefficients, predicted probabilities

• First, the coefficients:

Deleting:      None      31       4      18    All 3
(Intercept)  -2.875  -3.041  -5.206  -4.758  -24.348
log(Volume)   5.179   4.966   8.468   7.671   39.142
log(Rate)     4.562   4.765   7.455   6.880   31.642
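A sketch of the refits behind this table (assuming the row numbers of vaso.df match the plotted IDs):

# Refit the model leaving out the influential points and compare coefficients.
coef(vaso.glm)                                   # all 39 points
coef(update(vaso.glm, subset = -31))             # without point 31
coef(update(vaso.glm, subset = -c(4, 18)))       # without points 4 and 18
coef(update(vaso.glm, subset = -c(4, 18, 31)))   # without all three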

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 36

Should we delete points (2)?

• Next, the fitted probabilities:

                        delete points
Fitted at    None     31      4     18  4 and 18  All 3
point 31    0.722  0.627  0.743  0.707     0.996  0.996
point 4     0.075  0.073  0.010  0.015     0.000  0.000
point 18    0.106  0.100  0.018  0.026     0.000  0.000

• Conclusion: points 4 and 18 have a big effect.
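The fitted probabilities come from predicting back at the deleted points' covariate values; a sketch for one column of the table:

# Probabilities at points 31, 4 and 18 after refitting without 4 and 18.
fit.drop <- update(vaso.glm, subset = -c(4, 18))
predict(fit.drop, newdata = vaso.df[c(31, 4, 18), ], type="response")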

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 37

Should we delete points (3)?

• Should we delete?

• They could be genuine – no real evidence they are wrong

• If we delete them, we increase the regression coefficients and make the fitted probabilities more extreme

• This would overstate the predictive ability of the model

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 38

Residuals for ungrouped data

• If all cases have distinct covariate patterns, then the residuals lie along two curves (corresponding to success and failure) and have little or no diagnostic value.

• Thus, there is a pattern even if everything is OK.
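This is easy to see by plotting; a sketch using the ungrouped kyphosis fit:

# Residuals vs fitted probabilities: the points fall on two curves,
# one for the successes and one for the failures, whatever the fit quality.
plot(fitted(kyphosis.glm), residuals(kyphosis.glm, type="pearson"),
     xlab="fitted probability", ylab="Pearson residual")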

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 39

Formulas

• Pearson residuals: for ungrouped data, the residual for the $i$th case is

$$r_i = \begin{cases} \sqrt{\dfrac{1-\hat\pi_i}{\hat\pi_i}}, & y_i = 1, \\[2ex] -\sqrt{\dfrac{\hat\pi_i}{1-\hat\pi_i}}, & y_i = 0. \end{cases}$$

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 40

Formulas (cont)

• Deviance residuals: for ungrouped data, the residual for the $i$th case is

$$d_i = \begin{cases} \sqrt{2\,|\log \hat\pi_i|}, & y_i = 1, \\[1ex] -\sqrt{2\,|\log(1-\hat\pi_i)|}, & y_i = 0. \end{cases}$$

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 41

Use of plot function

> plot(kyphosis.glm)

[Figure: the four standard plot() panels for kyphosis.glm — "Residuals vs Fitted", "Normal Q-Q plot", "Scale-Location plot" and "Cook's distance plot" — with cases 77, 43 and 46 flagged (and case 25 in the Cook's distance panel).]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 42

Analogue of R2?

• There is no satisfactory analogue of R2 for logistic regression.

• For the “small m, big n” situation (grouped data with large group sizes) we can use the residual deviance, since we can obtain an approximate p-value; see the sketch after this list.

• For other situations we can use the Hosmer-Lemeshow statistic (next slide)
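For grouped data the deviance check is a one-liner; a sketch using the budworm fit:

# Approximate goodness-of-fit p-value from the residual deviance
# (valid for grouped data with reasonably large group sizes).
1 - pchisq(deviance(budworm.glm), df.residual(budworm.glm))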

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 43

Hosmer-Lemeshow statistic

• How can we judge goodness of fit for ungrouped data?

• Can use the Hosmer-Lemeshow statistic, which groups the data into cases having similar fitted probabilities:
  – Sort the cases in increasing order of fitted probability
  – Divide them into 10 (almost) equal groups
  – Do a chi-square test to see if the number of successes in each group matches the estimated probability
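A minimal hand-rolled version of the statistic, as a sketch only (the helper name hl.sketch and the 0/1 response vector y are assumptions; the course function HLstat does this properly):

# Hosmer-Lemeshow: group cases by fitted probability, then compare
# observed and expected numbers of successes in each group.
hl.sketch <- function(fit, y, g = 10) {
  p <- fitted(fit)
  grp <- cut(rank(p, ties.method="first"), g)   # ~equal-sized groups
  obs  <- tapply(y, grp, sum)      # observed successes per group
  expd <- tapply(p, grp, sum)      # expected successes per group
  n    <- tapply(p, grp, length)
  stat <- sum((obs - expd)^2 / (expd * (1 - expd/n)))
  c(statistic = stat, p.value = 1 - pchisq(stat, g - 2))
}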

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 44

Kyphosis data

Class:           1      2      3      4      5      6      7      8      9     10
Observed 0's     9      8      8      7      8      8      5      5      3      3
Observed 1's     0      0      0      1      0      0      3      3      5      5
Total obs        9      8      8      8      8      8      8      8      8      8
Expected 1's  0.022  0.082  0.199  0.443  0.776  1.023  1.639  2.496  3.991  6.328

Note: Expected 1's = Total obs x average prob in the class. The probs are divided into 10 classes: lowest 10%, next 10%, and so on.

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 45

In R, using the kyphosis data

> HLstat(kyphosis.glm)          # result of fitting the model
Value of HL statistic = 6.498
P-value = 0.592

A p-value of less than 0.05 indicates problems. No problem indicated for the kyphosis data – logistic appears to fit OK.

The function HLstat is in the “330 functions”