
© Department of Statistics 2012 STATS 330 Lecture 23: Slide 1

Stats 330: Lecture 23

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 2

Plan of the day

In today’s lecture we continue our discussion of the multiple logistic regression model

Topics covered:
– Models and submodels
– Residuals for Multiple Logistic Regression
– Diagnostics in Multiple Logistic Regression
– No analogue of R2

Reference: Coursebook, section 5.2.3

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 3

Comparison of models

• Suppose model 1 and model 2 are two models, with model 2 a submodel of model 1

• If Model 2 is in fact correct, then the difference in the deviances will have approximately a chi-squared distribution

• df equals the difference in df of the separate models

• Approximation OK for grouped and ungrouped data

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 4

Example: kyphosis data

• Is age alone an adequate model?

> age.glm <- glm(Kyphosis ~ Age + I(Age^2), family=binomial, data=kyphosis.df)

Null deviance: 83.234  on 80  degrees of freedom
Residual deviance: 72.739  on 78  degrees of freedom
AIC: 78.739

Full model has deviance 54.428 on 76 df

Chisq is 72.739 - 54.428 = 18.311 on 78 - 76 = 2 df

> 1 - pchisq(18.311, 2)
[1] 0.0001056372

Highly significant: we need at least one of Start and Number.

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 5

Anova in R

> anova(age.glm, kyphosis.glm, test="Chi")
Analysis of Deviance Table

Model 1: Kyphosis ~ Age + I(Age^2)
Model 2: Kyphosis ~ Age + I(Age^2) + Start + Number
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1        78     72.739
2        76     54.428  2   18.311 0.0001056 ***

Two-model form of anova: the two fitted models are compared directly.

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 6

Residuals

• Two kinds of residuals
  – Pearson residuals
    • useful for grouped data only
    • similar to residuals in linear regression: actual minus fitted value
  – Deviance residuals
    • useful for grouped and ungrouped data
    • measure the contribution of each covariate pattern to the deviance

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 7

Pearson residuals

The Pearson residual for covariate pattern $i$ is

$$r_i^{(P)} = \frac{r_i - n_i\hat\pi_i}{\sqrt{n_i\hat\pi_i(1-\hat\pi_i)}},$$

where $\hat\pi_i$ is the probability predicted by the model.

Standardized to have approximately unit variance, so big if more than 2 in absolute value
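As a check on this formula, here is a small sketch (assuming the grouped budworm fit used later, where covariate pattern i has $r_i$ successes out of $n_i$) that recomputes the residuals by hand:

# Recompute grouped Pearson residuals from the formula above.
pi.hat <- fitted(budworm.glm)        # predicted probabilities
n <- budworm.glm$prior.weights       # group sizes n_i
r <- n * budworm.glm$y               # observed successes r_i ($y holds r_i/n_i)
by.hand <- (r - n*pi.hat) / sqrt(n*pi.hat*(1 - pi.hat))
max(abs(by.hand - residuals(budworm.glm, type="pearson")))   # effectively 0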

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 8

Deviance residuals (i)

• For grouped data, the deviance is

$$\text{deviance} = 2\sum_{i=1}^{M}\left[ r_i\log\frac{r_i}{n_i\hat\pi_i} + (n_i - r_i)\log\frac{n_i - r_i}{n_i(1-\hat\pi_i)} \right] = \sum_{i=1}^{M} d_i^2,$$

where

$$d_i = \pm\left\{ 2\left[ r_i\log\frac{r_i}{n_i\hat\pi_i} + (n_i - r_i)\log\frac{n_i - r_i}{n_i(1-\hat\pi_i)} \right] \right\}^{1/2},$$

and $d_i$ is +ve if $r_i \geq n_i\hat\pi_i$, and -ve otherwise.

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 9

Deviance residuals (ii)

• Thus, the deviance can be written as the sum of squares of $M$ quantities $d_1, \dots, d_M$, one for each covariate pattern

• Each di is the contribution to the deviance from the ith covariate pattern

• If deviance residual is big (more than about 2 in magnitude), then the covariate pattern has a big influence on the likelihood, and hence the estimates
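A one-line check of this decomposition (assuming the grouped budworm fit used on the next slide):

# The squared deviance residuals sum to the residual deviance.
d <- residuals(budworm.glm, type="deviance")
c(sum(d^2), deviance(budworm.glm))   # the two numbers agree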

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 10

Calculating residuals

> pearson.residuals <- residuals(budworm.glm, type="pearson")
> deviance.residuals <- residuals(budworm.glm, type="deviance")
> par(mfrow=c(1,2))
> plot(pearson.residuals, ylab="residuals", main="Pearson")
> abline(h=0, lty=2)
> plot(deviance.residuals, ylab="residuals", main="Deviance")
> abline(h=0, lty=2)

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 11

[Figure: index plots of the Pearson residuals (left panel, "Pearson") and the deviance residuals (right panel, "Deviance") for the budworm data.]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 12

Diagnostics: outlier detection

• Large residuals indicate covariate patterns poorly fitted by the model

• Large Pearson residuals indicate a poor match between the “maximum model probabilities” and the logistic model probabilities, for grouped data

• Large deviance residuals indicate influential points

• Example: budworm data

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 13

Diagnostics: detecting non-linear regression functions

• For a single x, plot the logits of the maximal model probabilities against x

• For multiple x’s, plot the Pearson residuals against the fitted probabilities and against the individual x’s

• If the data have most $n_i$’s equal to 1, so they cannot be grouped, try gam (cf. the kyphosis data)
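A minimal sketch of the gam check, assuming the mgcv package and the kyphosis.df data frame used earlier (the choice of smooth and linear terms here is illustrative):

# Fit a smooth term for Age to see whether its effect is non-linear.
library(mgcv)
kyph.gam <- gam(Kyphosis ~ s(Age) + Start + Number,
                family=binomial, data=kyphosis.df)
plot(kyph.gam)   # a curved smooth for Age suggests a term such as I(Age^2)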

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 14

Example: budworms

• Plot Pearson residuals versus dose, plot shows a curve

[Figure: "Pearson residuals vs dose" — the Pearson residuals plotted against budworm.df$dose; the points follow a clear curve.]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 15

Diagnostics: influential points

We will look at three diagnostics:

– Hat matrix diagonals
– Cook’s distance
– Leave-one-out deviance change

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 16

Example: vaso-constriction data

Data from a study of reflex vaso-constriction (narrowing of the blood vessels) of the skin of the fingers
– Can be caused by a sharp intake of breath

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 17

Example: vaso-constriction data

Variables measured:

Response: 1 = vaso-constriction occurs, 0 = does not occur

Volume: volume of air breathed in

Rate: rate of intake of breath

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 18

Data

   Volume  Rate Response
1    3.70 0.825        1
2    3.50 1.090        1
3    1.25 2.500        1
4    0.75 1.500        1
5    0.80 3.200        1
6    0.70 3.500        1
7    0.60 0.750        0
8    1.10 1.700        0
9    0.90 0.750        0
10   0.90 0.450        0
11   0.80 0.570        0
12   0.55 2.750        0
13   0.60 3.000        0
. . .            (39 obs in all)

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 19

Plot of data

> plot(Rate, Volume, type="n", cex=1.2)
> text(Rate, Volume, 1:39, col=ifelse(Response==1, "red", "blue"), cex=1.2)
> text(2.3, 3.5, "blue: no VS", col="blue", adj=0, cex=1.2)
> text(2.3, 3.0, "red: VS", col="red", adj=0, cex=1.2)

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 20

[Figure: "Plot of volume versus rate, with ID numbers shown" — Volume plotted against Rate, each case labelled 1-39; red = VS, blue = no VS. Note points 4 and 18.]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 21

Enhanced residual plots

> vaso.glm <- glm(Response ~ log(Volume) + log(Rate), family=binomial, data=vaso.df)
> pear.r <- residuals(vaso.glm, type="pearson")
> dev.r <- residuals(vaso.glm, type="deviance")
> par(mfrow=c(1,2))
> plot(pear.r, ylab="residuals", main="Pearson", type="n")
> text(pear.r, cex=0.7)
> abline(h=0, lty=2)
> abline(h=2, lty=2, lwd=2)
> abline(h=-2, lty=2, lwd=2)
> plot(dev.r, ylab="residuals", main="Deviance", type="h")
> text(dev.r, cex=0.7)
> abline(h=0, lty=2)
> abline(h=2, lty=2, lwd=2)
> abline(h=-2, lty=2, lwd=2)

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 22

[Figure: index plots of the Pearson (left) and deviance (right) residuals for vaso.glm, each point labelled with its case number; reference lines at 0 and ±2.]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 23

Diagnostics: Hat matrix diagonals

• Can define hat matrix diagonals (HMD’s) pretty much as in linear models

• An HMD is big if HMD > 3p/M (p = number of coefficients, M = number of covariate patterns)

• Draw index plot of HMD’s

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 24

Plotting HMD’s

> HMD <- hatvalues(vaso.glm)
> plot(HMD, ylab="HMD's", type="h")
> text(HMD, cex=0.7)
> abline(h=3*3/39, lty=2)

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 25

[Figure: index plot of the hat matrix diagonals, with the cutoff line at 3p/M = 9/39; observation 31 is high-leverage.]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 26

Hat matrix diagonals

• In ordinary regression, the hat matrix diagonals measure how “outlying” the covariates for an observation are

• In logistic regression, the HMD’s measure the same thing, but are down-weighted according to the estimated probability for the observation. The weights get small if the probability is close to 0 or 1.

• In the vaso-constriction data, points 1,2,17 had very small weights, since the probabilities are close to 1 for these points.
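A sketch of how to inspect this: the weights component of a fitted glm holds the IWLS working weights, which for a binary logistic fit are roughly $\hat\pi(1-\hat\pi)$.

# Fitted probabilities and working weights for the flagged points.
p.hat <- fitted(vaso.glm)
w <- vaso.glm$weights                      # approximately p.hat*(1 - p.hat)
round(cbind(p.hat, w)[c(1, 2, 17), ], 3)   # near-1 probabilities, tiny weights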

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 27

[Figure: the plot of Volume versus Rate again, with ID numbers shown; red = VS, blue = no VS. Note points 1, 2 and 17.]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 28

Diagnostics: Cook's distance

• Can define an analogue of Cook’s distance for each point:

$$CD_i = \frac{(\text{Pearson residual})_i^2 \; h_i}{p\,(1 - h_i)^2},$$

where $h_i$ is the hat matrix diagonal (HMD) and $p$ is the number of coefficients.

• CD is big if it is more than about the 10% quantile of the chi-squared distribution on $k+1$ df, divided by $k+1$ (here $k+1 = p$)

• Calculate the cutoff with qchisq(0.1, k+1)/(k+1)

• But it is not that reliable as a measure

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 29

Cook's D: calculating and plotting

> p <- 3
> CD <- cooks.distance(vaso.glm)
> plot(CD, ylab="Cook's D", type="h", main="index plot of Cook's distances")
> text(CD, cex=0.7)
> bigcook <- qchisq(0.1, p)/p
> abline(h=bigcook, lty=2)

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 30

[Figure: "index plot of Cook's distances", with the cutoff line; points 4 and 18 stand out as influential.]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 31

Diagnostics: leave-one-out deviance change

• If the $i$th covariate pattern is left out, the change in the deviance is approximately

$$(\text{deviance residual})_i^2 + \frac{(\text{Pearson residual})_i^2 \, h_i}{1 - h_i}$$

• Big if more than about 4

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 32

Deviance change: calculating and plotting

> dev.r <- residuals(vaso.glm, type="deviance")
> Dev.change <- dev.r^2 + pear.r^2*HMD/(1-HMD)
> plot(Dev.change, ylab="Deviance change", type="h")
> text(Dev.change, cex=0.7)
> bigdev <- 4
> abline(h=bigdev, lty=2)

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 33

[Figure: "Deviance change" index plot, with the cutoff line at 4; points 4 and 18 are influential.]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 34

All together

influenceplots(vaso.glm)

[Figure: the four influenceplots panels — "Index plot of deviance residuals" (points 4 and 18 flagged), "Leverage plot" (point 31 flagged), "Cook's Distance Plot" (points 4 and 18 flagged) and "Deviance Changes Plot" (points 4 and 18 flagged).]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 35

Should we delete points?

• How influential are the 3 points?

• Can delete each in turn and examine changes in coefficients, predicted probabilities

• First, the coefficients:

Deleting:      None      31       4      18    All 3
(Intercept)  -2.875  -3.041  -5.206  -4.758  -24.348
log(Volume)   5.179   4.966   8.468   7.671   39.142
log(Rate)     4.562   4.765   7.455   6.880   31.642
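A sketch of the refits behind this table (assuming the row numbers of vaso.df match the plotted IDs):

# Refit the model leaving out the influential points and compare coefficients.
coef(vaso.glm)                                   # all 39 points
coef(update(vaso.glm, subset = -31))             # without point 31
coef(update(vaso.glm, subset = -c(4, 18)))       # without points 4 and 18
coef(update(vaso.glm, subset = -c(4, 18, 31)))   # without all three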

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 36

Should we delete points (2)?

• Next, the fitted probabilities:

                        delete points
Fitted at    None     31      4     18  4 and 18  All 3
point 31    0.722  0.627  0.743  0.707     0.996  0.996
point 4     0.075  0.073  0.010  0.015     0.000  0.000
point 18    0.106  0.100  0.018  0.026     0.000  0.000

• Conclusion: points 4 and 18 have a big effect.
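The fitted probabilities come from predicting back at the deleted points' covariate values; a sketch for one column of the table:

# Probabilities at points 31, 4 and 18 after refitting without 4 and 18.
fit.drop <- update(vaso.glm, subset = -c(4, 18))
predict(fit.drop, newdata = vaso.df[c(31, 4, 18), ], type="response")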

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 37

Should we delete points (3)?

• Should we delete?

• They could be genuine – no real evidence they are wrong

• If we delete them, we increase the regression coefficients and make the fitted probabilities more extreme

• This would overstate the predictive ability of the model

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 38

Residuals for ungrouped data

• If all cases have distinct covariate patterns, then the residuals lie along two curves (corresponding to success and failure) and have little or no diagnostic value.

• Thus, there is a pattern even if everything is OK.
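This is easy to see by plotting; a sketch using the ungrouped kyphosis fit:

# Residuals vs fitted probabilities: the points fall on two curves,
# one for the successes and one for the failures, whatever the fit quality.
plot(fitted(kyphosis.glm), residuals(kyphosis.glm, type="pearson"),
     xlab="fitted probability", ylab="Pearson residual")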

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 39

Formulas

• Pearson residuals: for ungrouped data, the residual for the $i$th case is

$$r_i = \begin{cases} \sqrt{\dfrac{1-\hat\pi_i}{\hat\pi_i}}, & y_i = 1, \\[2ex] -\sqrt{\dfrac{\hat\pi_i}{1-\hat\pi_i}}, & y_i = 0. \end{cases}$$

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 40

Formulas (cont)

• Deviance residuals: for ungrouped data, the residual for the $i$th case is

$$d_i = \begin{cases} \sqrt{2\,|\log \hat\pi_i|}, & y_i = 1, \\[1ex] -\sqrt{2\,|\log(1-\hat\pi_i)|}, & y_i = 0. \end{cases}$$

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 41

Use of plot function

> plot(kyphosis.glm)

[Figure: the four standard plot() panels for kyphosis.glm — "Residuals vs Fitted", "Normal Q-Q plot", "Scale-Location plot" and "Cook's distance plot" — with cases 77, 43 and 46 flagged (and case 25 in the Cook's distance panel).]

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 42

Analogue of R2?

• There is no satisfactory analogue of R2 for logistic regression.

• For the “small m, big n” situation (grouped data with large group sizes) we can use the residual deviance, since we can obtain an approximate p-value; see the sketch after this list.

• For other situations we can use the Hosmer-Lemeshow statistic (next slide)
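For grouped data the deviance check is a one-liner; a sketch using the budworm fit:

# Approximate goodness-of-fit p-value from the residual deviance
# (valid for grouped data with reasonably large group sizes).
1 - pchisq(deviance(budworm.glm), df.residual(budworm.glm))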

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 43

Hosmer-Lemeshow statistic

• How can we judge goodness of fit for ungrouped data?

• Can use the Hosmer-Lemeshow statistic, which groups the data into cases having similar fitted probabilities:
  – Sort the cases in increasing order of fitted probability
  – Divide them into 10 (almost) equal groups
  – Do a chi-square test to see if the number of successes in each group matches the estimated probability
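A minimal hand-rolled version of the statistic, as a sketch only (the helper name hl.sketch and the 0/1 response vector y are assumptions; the course function HLstat does this properly):

# Hosmer-Lemeshow: group cases by fitted probability, then compare
# observed and expected numbers of successes in each group.
hl.sketch <- function(fit, y, g = 10) {
  p <- fitted(fit)
  grp <- cut(rank(p, ties.method="first"), g)   # ~equal-sized groups
  obs  <- tapply(y, grp, sum)      # observed successes per group
  expd <- tapply(p, grp, sum)      # expected successes per group
  n    <- tapply(p, grp, length)
  stat <- sum((obs - expd)^2 / (expd * (1 - expd/n)))
  c(statistic = stat, p.value = 1 - pchisq(stat, g - 2))
}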

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 44

Kyphosis data

Class:           1      2      3      4      5      6      7      8      9     10
Observed 0's     9      8      8      7      8      8      5      5      3      3
Observed 1's     0      0      0      1      0      0      3      3      5      5
Total obs        9      8      8      8      8      8      8      8      8      8
Expected 1's  0.022  0.082  0.199  0.443  0.776  1.023  1.639  2.496  3.991  6.328

Note: Expected 1's = Total obs x average prob in the class. The probs are divided into 10 classes: lowest 10%, next 10%, and so on.

© Department of Statistics 2012 STATS 330 Lecture 23: Slide 45

In R, using the kyphosis data

> HLstat(kyphosis.glm)          # result of fitting the model
Value of HL statistic = 6.498
P-value = 0.592

A p-value of less than 0.05 indicates problems. No problem indicated for the kyphosis data – logistic appears to fit OK.

The function HLstat is in the “330 functions”