© Department of Statistics 2012 STATS 330 Lecture 27: Slide 1 Stats 330: Lecture 27

Upload: antonia-michael

Post on 15-Dec-2015


Plan of the day

In today’s lecture we apply Poisson regression to the analysis of one- and two-dimensional contingency tables.

Topics
– Contingency tables
– Sampling models
– Equivalence of Poisson and multinomial
– Correspondence between interactions and independence

Contingency tables

• Contingency tables arise when we classify a number of individuals into categories using one or more criteria.

• The result is a cross-tabulation or contingency table
– 1 criterion gives a 1-dimensional table
– 2 criteria give a 2-dimensional table
– … and so on.

Example: one dimension

Income distribution of New Zealanders aged 15+, 2006 census

Income band                 Count
$5,000 or Less            383,574
$5,001 - $10,000          226,800
$10,001 - $20,000         615,984
$20,001 - $30,000         434,958
$30,001 - $50,000         666,372
$50,001 or More           511,803
Not Stated                320,892
Total Personal Income   3,160,371

Example 2: 2 dimensions

New Zealanders aged 15+ by annual income and sex, 2006 census

Income band                  Male      Female       Total
$10,000 and under         232,620     377,754     610,374
$10,001 - $20,000         232,884     383,097     615,981
$20,001 - $40,000         407,466     431,565     839,031
$40,001 - $70,000         331,320     212,139     543,459
$70,001 - $100,000         90,177      34,938     125,115
$100,001 or More           82,701      22,821     105,522
Not Stated                144,420     176,472     320,892
Total Personal Income   1,521,591   1,638,783   3,160,374

Censuses and samples

• Sometimes tables come from a census

• Sometimes the table comes from a random sample from a population
– In this case we often want to draw inferences about the population on the basis of the sample, e.g. is income independent of sex?


Example: death by falling

The following is a random sample of 16,976 persons who met their deaths in fatal falls, selected from a larger population of deaths from this cause. The falls classified by month are

Jan 1688 July 1406

Feb 1407 Aug 1446

Mar 1370 Sep 1322

Apr 1309 Oct 1363

May 1341 Nov 1410

Jun 1388 Dec 1526


The question

• Question: if we regard this as a random sample from several years, are the months in which death occurs equally likely?

• Or is one month more likely than the others?

• To answer this, we use the idea of maximum likelihood. First, we must detour into sampling theory in order to work out the likelihood

Sampling models

• There are two common models used for contingency tables

• The multinomial sampling model assumes that a fixed number of individuals from a random sample are classified with fixed probabilities of being assigned to the different “cells”.

• The Poisson sampling model assumes that the table entries have independent Poisson distributions


Multinomial sampling

Multinomial model

• Suppose a table has M “cells”

• n individuals are classified independently (a.k.a. sampling with replacement)

• Each individual has probability $\pi_i$ of being in cell i

• Every individual is classified into exactly 1 cell, so $\pi_1 + \pi_2 + \dots + \pi_M = 1$

• Let $Y_i$ be the number in the ith cell, so $Y_1 + Y_2 + \dots + Y_M = n$

Multinomial model (cont)

$Y_1, \dots, Y_M$ have a multinomial distribution

$$\Pr(Y_1 = y_1, \dots, Y_M = y_M) = \frac{n!}{y_1!\,y_2! \cdots y_M!}\,\pi_1^{y_1}\pi_2^{y_2} \cdots \pi_M^{y_M}$$

This is the maximal model, as in logistic regression, making no assumptions about the probabilities.
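As a quick numerical check of the formula, the sketch below evaluates the multinomial probability term by term and compares it with R's built-in `dmultinom`. The three-cell counts and probabilities are made up purely for illustration; they do not come from the lecture.

```r
# Hypothetical 3-cell example (not from the lecture): n = 4 individuals
y <- c(1, 2, 1)          # cell counts y_1, y_2, y_3
p <- c(0.2, 0.5, 0.3)    # cell probabilities pi_1, pi_2, pi_3 (sum to 1)
n <- sum(y)

# Multinomial probability written out as n!/(y_1!...y_M!) * prod(pi_i^y_i)
by_hand <- factorial(n) / prod(factorial(y)) * prod(p^y)

# The same probability from R's built-in multinomial density
built_in <- dmultinom(y, size = n, prob = p)

c(by_hand = by_hand, built_in = built_in)
```

Both entries should agree, since `dmultinom` implements the same formula.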

Log-likelihood

$$\begin{aligned}
\log \Pr(Y_1 = y_1, \dots, Y_M = y_M) &= \log\left[\frac{n!}{y_1!\,y_2! \cdots y_M!}\,\pi_1^{y_1} \cdots \pi_M^{y_M}\right] \\
&= y_1 \log \pi_1 + \dots + y_M \log \pi_M + \text{const}
\end{aligned}$$

MLEs

• If there are no restrictions on the $\pi$’s (except that they add to one), the log-likelihood is maximised when $\pi_i = y_i/n$

• These are the MLEs for the maximal model

• Substitute these into the log-likelihood to get the maximum value of the maximal log-likelihood. Call this $\log L_{\max}$
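One way to see why the maximum occurs at $\pi_i = y_i/n$ is a Lagrange-multiplier argument. The multiplier $\lambda$ below is introduced only for this sketch and does not appear in the slides.

```latex
\begin{aligned}
&\text{Maximise } \ell(\pi) = \sum_{i=1}^{M} y_i \log \pi_i
  \quad \text{subject to } \sum_{i=1}^{M} \pi_i = 1. \\
&\frac{\partial}{\partial \pi_i}
  \Big[\ell(\pi) - \lambda\Big(\sum_{k=1}^{M} \pi_k - 1\Big)\Big]
  = \frac{y_i}{\pi_i} - \lambda = 0
  \;\Rightarrow\; \pi_i = \frac{y_i}{\lambda}. \\
&\sum_{i=1}^{M} \pi_i = 1
  \;\Rightarrow\; \lambda = \sum_{i=1}^{M} y_i = n
  \;\Rightarrow\; \hat{\pi}_i = \frac{y_i}{n}, \qquad
  \log L_{\max} = \sum_{i=1}^{M} y_i \log\frac{y_i}{n} + \text{const}.
\end{aligned}
```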

Deviance

• Suppose we have some model for the probabilities: this will specify the form of the probabilities, perhaps as functions of other parameters. In our example, the model says that each probability is 1/12.

• Let $\log L_{\text{mod}}$ be the maximum value of the log-likelihood when the probabilities are given by the model.

• As for logistic regression, we define the deviance as $D = 2\log L_{\max} - 2\log L_{\text{mod}}$

Deviance: Testing adequacy of the model

• If n is large (all cells more than 5) and M is small, and if the model is true, the deviance will have (approximately) a chi-square distribution with M-k-1 df, where k is the number of unknown parameters in the model.

• Thus, we regard the model as adequate (we do not reject it) if the deviance p-value is more than 0.05.

• This is almost the same as the situation in logistic regression with strongly grouped data.

• These tests are an alternative form of the chi-square tests used in stage 2.

Example: death by falling

• Are deaths equally likely in all months? In statistical terms, is $\pi_i = 1/12$, i=1,2,…,12?

• $\log L_{\max}$ is, up to a constant,

$$\log L_{\max} = y_1 \log(y_1/n) + \dots + y_{12} \log(y_{12}/n)$$

Calculated in R by
> y<-c(1688,1407,1370,1309,1341,1388,
       1406,1446,1322,1363,1410,1526)
> sum(y*log(y/sum(y)))
[1] -42143.23

Example: death by falling

Our model is $\pi_i = 1/12$, i=1,2,…,12 (all months equally likely). This completely specifies the probabilities, so k=0, M-k-1=11. $\log L_{\text{mod}}$ is, up to a constant,

$$\log L_{\text{mod}} = y_1 \log(1/12) + \dots + y_{12} \log(1/12)$$

> sum(y*log(1/12))
[1] -42183.78
# now calculate deviance
> D<-2*sum(y*log(y/sum(y)))-2*sum(y*log(1/12))
> D
[1] 81.09515
> 1-pchisq(D,11)
[1] 9.05831e-13

Model is not plausible!
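The calculation above can be wrapped in a small reusable function. This is only a sketch: `multinom_deviance_test` is a hypothetical helper name, not something from the lecture or from any R package, and it assumes the hypothesised probabilities are fully specified (k = 0) and that no cell count is zero.

```r
# Monthly counts of fatal falls, copied from the slides
y <- c(1688, 1407, 1370, 1309, 1341, 1388,
       1406, 1446, 1322, 1363, 1410, 1526)

# Hypothetical helper (an assumption of this sketch): deviance test of a
# fully specified set of cell probabilities p0, so k = 0 and df = M - 1.
multinom_deviance_test <- function(y, p0) {
  stopifnot(length(y) == length(p0), all(y > 0), abs(sum(p0) - 1) < 1e-8)
  n <- sum(y)
  logLmax <- sum(y * log(y / n))   # maximal model, up to a constant
  logLmod <- sum(y * log(p0))      # hypothesised model, same constant
  D  <- 2 * logLmax - 2 * logLmod
  df <- length(y) - 1
  list(deviance = D, df = df,
       p.value = pchisq(D, df, lower.tail = FALSE))
}

res <- multinom_deviance_test(y, rep(1/12, 12))
res
```

Using `lower.tail = FALSE` rather than `1 - pchisq(...)` avoids loss of precision when the p-value is very small.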

[Figure: bar plot of the proportion of deaths in each month (Jan to Dec); vertical axis runs from 0.00 to 0.08]

> Months<-c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
> barplot(y/sum(y),names.arg=Months)

Poisson model

• Assume that each cell is Poisson with a different mean: this is just a Poisson regression model, with a single explanatory variable “month”

• This is the maximal model

• The null model is the model with all means equal

• Null model deviance will give us a test that all months have the same mean

R-code for Poisson regression

> months<-c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
> months.fac<-factor(months,levels=months)
> falls.glm<-glm(y~months.fac,family=poisson)
> summary(falls.glm)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    7.43130    0.02434 305.317  < 2e-16 ***
months.facFeb -0.18208    0.03610  -5.044 4.56e-07 ***
months.facMar -0.20873    0.03636  -5.740 9.46e-09 ***
months.facApr -0.25428    0.03683  -6.904 5.04e-12 ***
months.facMay -0.23013    0.03658  -6.291 3.15e-10 ***
months.facJun -0.19568    0.03623  -5.401 6.64e-08 ***
months.facJul -0.18280    0.03611  -5.063 4.13e-07 ***
months.facAug -0.15474    0.03583  -4.318 1.57e-05 ***
months.facSep -0.24440    0.03673  -6.655 2.84e-11 ***
months.facOct -0.21386    0.03642  -5.873 4.29e-09 ***
months.facNov -0.17995    0.03608  -4.988 6.10e-07 ***
months.facDec -0.10089    0.03532  -2.856  0.00429 **

    Null deviance:  8.1095e+01  on 11  degrees of freedom
Residual deviance: -7.5051e-14  on  0  degrees of freedom

> D
[1] 81.09515

Same as before! Why????

General principle:

• To every Poisson model for the cell counts, there corresponds a multinomial model, obtained by conditioning on the table total.

• Suppose the Poisson means are $\mu_1, \dots, \mu_M$. The cell probabilities in the multinomial sampling model are related to the means in the Poisson sampling model by the relationship

$$\pi_i = \mu_i/(\mu_1 + \dots + \mu_M)$$

• We can estimate the parameters in the multinomial model by fitting the Poisson model and using the relationship above.

• We can test hypotheses about the multinomial model by testing the equivalent hypothesis in the Poisson regression model.
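The principle can be checked numerically on the falls data. The sketch below refits the maximal Poisson model, rescales the fitted means to sum to one, and compares them with the multinomial MLEs $y_i/n$; it also compares the Poisson null deviance with the multinomial deviance D. The names `pi.from.poisson` and `pi.multinomial` are introduced here for illustration only.

```r
# Monthly fatal-fall counts from the slides
y <- c(1688, 1407, 1370, 1309, 1341, 1388,
       1406, 1446, 1322, 1363, 1410, 1526)
months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
months.fac <- factor(months, levels = months)

# Maximal Poisson model: one mean per month
falls.glm <- glm(y ~ months.fac, family = poisson)

# Estimated Poisson means, rescaled so that they sum to one
mu.hat <- fitted(falls.glm)
pi.from.poisson <- unname(mu.hat / sum(mu.hat))

# MLEs from the maximal multinomial model
pi.multinomial <- y / sum(y)

# Multinomial deviance for "all months equally likely"
D <- 2 * sum(y * log(y / sum(y))) - 2 * sum(y * log(1/12))

all.equal(pi.from.poisson, pi.multinomial, tolerance = 1e-6)  # proportions agree
all.equal(falls.glm$null.deviance, D, tolerance = 1e-6)       # deviances agree
```

The deviances agree because the null Poisson model fits every mean as n/12, so the extra Poisson terms cancel and only $\sum y_i \log(y_i/(n/12))$ remains.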

Example: death by falling

• Poisson means are

$$\begin{aligned}
\mu_{\text{Jan}} &= \exp(\text{Int}) \\
\mu_{\text{Feb}} &= \exp(\text{Int} + \delta_{\text{Feb}}) \\
\mu_{\text{Mar}} &= \exp(\text{Int} + \delta_{\text{Mar}}) \\
&\;\;\vdots \\
\mu_{\text{Dec}} &= \exp(\text{Int} + \delta_{\text{Dec}})
\end{aligned}$$

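Relationship between Poisson means and multinomial probs

$$\begin{aligned}
\pi_1 &= \frac{\mu_1}{\mu_1 + \mu_2 + \dots + \mu_{12}} = \frac{1}{1 + e^{\delta_{\text{Feb}}} + e^{\delta_{\text{Mar}}} + \dots + e^{\delta_{\text{Dec}}}} \\
\pi_2 &= \frac{\mu_2}{\mu_1 + \mu_2 + \dots + \mu_{12}} = \frac{e^{\delta_{\text{Feb}}}}{1 + e^{\delta_{\text{Feb}}} + e^{\delta_{\text{Mar}}} + \dots + e^{\delta_{\text{Dec}}}} \\
&\;\;\vdots \\
\pi_{12} &= \frac{\mu_{12}}{\mu_1 + \mu_2 + \dots + \mu_{12}} = \frac{e^{\delta_{\text{Dec}}}}{1 + e^{\delta_{\text{Feb}}} + e^{\delta_{\text{Mar}}} + \dots + e^{\delta_{\text{Dec}}}}
\end{aligned}$$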

Relationship between Poisson parameters and multinomial probs – alternative form

$$\frac{\pi_j}{\pi_1} = \exp(\delta_j), \quad j = 2, \dots, 12$$

or

$$\log\frac{\pi_j}{\pi_1} = \delta_j, \quad j = 2, \dots, 12$$
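This relationship can be checked directly on the falls data: each estimated month coefficient should equal the log of that month's sample proportion divided by January's. A minimal sketch, with `delta.hat` and `log.ratio` as illustrative names:

```r
# Monthly fatal-fall counts from the slides
y <- c(1688, 1407, 1370, 1309, 1341, 1388,
       1406, 1446, 1322, 1363, 1410, 1526)
months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
months.fac <- factor(months, levels = months)
falls.glm <- glm(y ~ months.fac, family = poisson)

# Estimated month coefficients (everything except the intercept)
delta.hat <- unname(coef(falls.glm)[-1])

# Log ratios of the multinomial MLEs, relative to January
pi.hat <- y / sum(y)
log.ratio <- log(pi.hat[-1] / pi.hat[1])

# The two columns should agree row by row (Feb, ..., Dec)
round(cbind(delta.hat, log.ratio), 5)
```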

Testing all months equal

• Clearly, all months have equal probabilities if and only if all the deltas are zero.

• Thus, testing for equal months in the multinomial model is the same as testing for deltas all zero in the Poisson model.

• This is done using the null model deviance, or, equivalently, the anova table.

Recall: The null deviance was 8.1095e+01 on 11 degrees of freedom, with a p-value of 9.05831e-13, so the months are not equally likely.

2x2 tables

• Relationship between Snoring and Nightmares:

• Data from a random sample, classified according to these 2 factors

              Nightmare = Frequently   Nightmare = Occasionally
Snorer = N                        11                         82
Snorer = Y                        12                         74

Parametrising the 2x2 table of Poisson means

              Nightmare = frequently                Nightmare = occasionally
Snorer = N    $\mu_1 = \exp(\text{Int})$            $\mu_2 = \exp(\text{Int} + \beta)$
Snorer = Y    $\mu_3 = \exp(\text{Int} + \alpha)$   $\mu_4 = \exp(\text{Int} + \alpha + \beta + \delta)$

$\alpha$: main effect of Snoring
$\beta$: main effect of Nightmare
$\delta$: Snoring/nightmare interaction

Parameterising the 2x2 table of probabilities

              Nightmare = frequently   Nightmare = occasionally   All
Snorer = N    $\pi_1$                  $\pi_2$                    $\pi_1 + \pi_2$
Snorer = Y    $\pi_3$                  $\pi_4$                    $\pi_3 + \pi_4$
All           $\pi_1 + \pi_3$          $\pi_2 + \pi_4$            1

The “All” column is the marginal distn of Snorer; the “All” row is the marginal distn of Nightmare.

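Relationship between multinomial probabilities and Poisson parameters

$$\frac{\pi_3}{\pi_1} = \frac{\mu_3}{\mu_1} = \frac{\exp(\text{Int} + \alpha)}{\exp(\text{Int})} = \exp(\alpha), \quad \text{or } \log\frac{\pi_3}{\pi_1} = \alpha$$

$$\frac{\pi_2}{\pi_1} = \frac{\mu_2}{\mu_1} = \frac{\exp(\text{Int} + \beta)}{\exp(\text{Int})} = \exp(\beta), \quad \text{or } \log\frac{\pi_2}{\pi_1} = \beta$$

$$\frac{\pi_4}{\pi_1} = \frac{\mu_4}{\mu_1} = \frac{\exp(\text{Int} + \alpha + \beta + \delta)}{\exp(\text{Int})} = \exp(\alpha + \beta + \delta), \quad \text{or } \log\frac{\pi_4}{\pi_1} = \alpha + \beta + \delta$$

Here $\alpha$ and $\beta$ are the Snoring and Nightmare main effects and $\delta$ is their interaction in the Poisson model.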

Independence

• Events A and B are independent if the conditional probability that A occurs, given B occurs, is the same as the unconditional probability of A occurring, i.e. P(A|B)=P(A).

• Since P(A|B)=P(A and B)/P(B), this is equivalent to P(A and B) = P(A) P(B)

Independence in a 2 x 2 table

• Independence of nightmares and snoring means

P(Snoring and frequent nightmares) = P(Snoring) x P(frequent nightmares)

(plus similar equations for the other 3 cells)

For cell 1 (non-snorers with frequent nightmares) this reads $\pi_1 = (\pi_1 + \pi_2)(\pi_1 + \pi_3)$

In fact, this equation implies the other 3 for a 2 x 2 table

Independence in a 2x2 table (cont)

Three equivalent conditions for independence in the 2 x 2 table

1. $\pi_1 = (\pi_1 + \pi_2)(\pi_1 + \pi_3)$

2. $\pi_1\pi_4 = \pi_2\pi_3$

3. $\delta = 0$ (i.e. zero interaction in the Poisson model)

Thus, we can test independence in the multinomial model by testing for zero interaction in the Poisson regression.

Math stuff

$$\begin{aligned}
\pi_1 &= (\pi_1 + \pi_2)(\pi_1 + \pi_3) \\
&= \pi_1^2 + \pi_1\pi_2 + \pi_1\pi_3 + \pi_2\pi_3 \\
&= \pi_1(\pi_1 + \pi_2 + \pi_3) + \pi_2\pi_3 \\
&= \pi_1(1 - \pi_4) + \pi_2\pi_3
\end{aligned}$$

so that $\pi_1 = \pi_1 - \pi_1\pi_4 + \pi_2\pi_3$, i.e. $\pi_1\pi_4 = \pi_2\pi_3$

More math stuff

$$\begin{aligned}
\pi_1\pi_4 = \pi_2\pi_3 \quad &\text{implies} \quad \frac{\pi_4}{\pi_1} = \frac{\pi_2}{\pi_1} \cdot \frac{\pi_3}{\pi_1} \\
&\text{implies} \quad \exp(\alpha + \beta + \delta) = \exp(\beta)\exp(\alpha) \\
&\text{implies} \quad \exp(\alpha)\exp(\beta)\exp(\delta) = \exp(\alpha)\exp(\beta) \\
&\text{implies} \quad \exp(\delta) = 1 \\
&\text{implies} \quad \delta = 0
\end{aligned}$$

Odds ratio

• Prob of not being a snorer for “frequent nightmares” population is $\pi_1/(\pi_1 + \pi_3)$

• Prob of being a snorer for “frequent nightmares” population is $\pi_3/(\pi_1 + \pi_3)$

• Odds of not being a snorer for “frequent nightmares” population is $\{\pi_1/(\pi_1 + \pi_3)\}/\{\pi_3/(\pi_1 + \pi_3)\} = \pi_1/\pi_3$

• Similarly, the odds of not being a snorer for “occasional nightmares” population is $\pi_2/\pi_4$

Odds ratio (2)

• Odds ratio (OR) is the ratio of these 2 odds. Thus

$$OR = \frac{\pi_1/\pi_3}{\pi_2/\pi_4} = \frac{\pi_1\pi_4}{\pi_2\pi_3}$$

• $OR = \exp(\delta)$, $\log(OR) = \delta$, where $\delta$ is the interaction in the Poisson model

• OR = 1 (i.e. log(OR) = 0) if and only if snoring and nightmares are independent

• Acts like a “correlation coefficient” between factors, with 1 corresponding to no relationship

• Interchanging rows (or columns) changes OR to 1/OR
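These facts can be illustrated with the sample version of the odds ratio, which replaces the probabilities by the observed counts from the snoring table. In the sketch below, `odds_ratio` is a hypothetical helper written for this example, not a function from the lecture or from base R.

```r
# 2x2 snoring/nightmare counts from the slides
# rows: Snorer = N, Y; columns: Nightmare = Frequently, Occasionally
tab <- matrix(c(11, 82,
                12, 74), nrow = 2, byrow = TRUE,
              dimnames = list(snorer = c("N", "Y"),
                              nightmare = c("Frequently", "Occasionally")))

# Hypothetical helper: sample cross-product ratio n1*n4 / (n2*n3)
odds_ratio <- function(t) (t[1, 1] * t[2, 2]) / (t[1, 2] * t[2, 1])

or.hat      <- odds_ratio(tab)
or.row.swap <- odds_ratio(tab[2:1, ])   # interchange the rows
or.col.swap <- odds_ratio(tab[, 2:1])   # interchange the columns

c(OR = or.hat, rows.swapped = or.row.swap, cols.swapped = or.col.swap,
  log.OR = log(or.hat))
```

Swapping rows or columns inverts the ratio, and `log(or.hat)` is the quantity that the Poisson regression reports as the interaction estimate.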

Odds ratio (3)

• Get a CI for the OR by
– Getting a CI for $\delta$
– “Exponentiating”

• CI for $\delta$ is estimate +/- 1.96 standard error

• Get estimate, standard error from Poisson regression summary

Example: snoring

> y<-c(11,82,12,74)
> nightmares<-factor(rep(c("F","O"),2))
> snore<-factor(rep(c("N","Y"),c(2,2)))
> snoring.glm<-glm(y~nightmares*snore,family=poisson)
> anova(snoring.glm, test="Chisq")
Analysis of Deviance Table

Model: poisson, link: log

Response: y

                 Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                                 3    111.304
nightmares        1  110.850         2      0.454    <2e-16 ***
snore             1    0.274         1      0.180    0.6008
nightmares:snore  1    0.180         0      0.000    0.6713

No evidence of association between nightmares and snoring.

Odds ratio

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)         2.39790    0.30151   7.953 1.82e-15 ***
nightmaresO         2.00882    0.32110   6.256 3.95e-10 ***
snoreY              0.08701    0.41742   0.208    0.835
nightmaresO:snoreY -0.18967    0.44716  -0.424    0.671

Estimate of $\delta$ (the interaction) is -0.18967, standard error is 0.44716, so CI for $\delta$ is -0.18967 +/- 1.96 * 0.44716, or (-1.066, 0.6868)

CI for $OR = \exp(\delta)$ is (exp(-1.066), exp(0.6868)) = (0.3443, 1.987)
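The same interval can be pulled out of the fitted model rather than typed in by hand. A minimal sketch, reusing the data and model from the snoring example; the names `est`, `se`, `ci.log.or` and `ci.or` are introduced here for illustration.

```r
# Snoring data and model, as in the slides
y <- c(11, 82, 12, 74)
nightmares <- factor(rep(c("F", "O"), 2))
snore <- factor(rep(c("N", "Y"), c(2, 2)))
snoring.glm <- glm(y ~ nightmares * snore, family = poisson)

# Interaction row of the coefficient table
coefs <- coef(summary(snoring.glm))
est <- coefs["nightmaresO:snoreY", "Estimate"]
se  <- coefs["nightmaresO:snoreY", "Std. Error"]

# Approximate 95% CI for the interaction, then exponentiate for the OR
ci.log.or <- est + c(-1, 1) * 1.96 * se
ci.or <- exp(ci.log.or)

round(c(OR = exp(est), lower = ci.or[1], upper = ci.or[2]), 4)
```

Because the interval for the odds ratio contains 1, it agrees with the anova conclusion of no evidence of association.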

The general I x J table

• Example: In the 1996 General Social Survey, the National Opinion Research Center collected data on the following variables from 2726 respondents:
– Education: Less than high school, High school, Bachelors or graduate
– Religious belief: Fundamentalist, moderate or liberal.

• Cross-classification is

Education and Religious belief

                                 Religious belief
Education                        Fundamentalist   Moderate   Liberal   Total
Less than high school                       178        138       108     424
High school or junior college               570        648       442    1660
Bachelor or graduate                        138        252       252     642
Total                                       886       1038       802    2726

Testing independence in the general 2-dimensional table

• We have a two-dimensional table, with the rows corresponding to Education, having I=3 levels, and the columns corresponding to Religious Belief, having J=3 levels.

• Under the Poisson sampling model, let $\mu_{ij}$ be the mean count for the i,j cell

• Split $\log \mu_{ij}$ into overall level, main effects, interactions as usual

$$\log(\mu_{ij}) = \text{Int} + \alpha_i + \beta_j + (\alpha\beta)_{ij}$$

Testing independence in the general 2-dimensional table (ii)

• Under the corresponding multinomial sampling model, let $\pi_{ij}$ be the probability that a randomly chosen individual has level i of Education and level j of Religious belief, and so is classified into the i,j cell of the table.

• Then, as before

$$\log\frac{\pi_{ij}}{\pi_{11}} = \log\frac{\mu_{ij}}{\mu_{11}} = \alpha_i + \beta_j + (\alpha\beta)_{ij}$$

Testing independence (iii)

• Let $\pi_{i+} = \pi_{i1} + \dots + \pi_{iJ}$ be the marginal probabilities: the probability that Education=i.

• Let $\pi_{+j}$ be the same thing for Religious belief

• The factors are independent if $\pi_{ij} = \pi_{i+}\pi_{+j}$

• This is equivalent to $(\alpha\beta)_{ij} = 0$ for all i,j.


Testing independence (iv)

Now we can state the principle:

We can test independence in the multinomial sampling model by fitting a Poisson regression with interacting factors, and testing for zero interaction.

Doing it in R

counts = c(178, 570, 138, 138, 648, 252, 108, 442, 252)
example.df = data.frame(y = counts,
    expand.grid(education = c("LessThanHighSchool", "HighSchool", "Bachelor"),
                religion = c("Fund", "Mod", "Lib")))

levels(example.df$education) = c("LessThanHighSchool", "HighSchool", "Bachelor")
levels(example.df$religion) = c("Fund", "Mod", "Lib")

example.glm = glm(y~education*religion, family=poisson, data=example.df)
anova(example.glm, test="Chisq")

The data

> example.df
    y          education religion
1 178 LessThanHighSchool     Fund
2 570         HighSchool     Fund
3 138           Bachelor     Fund
4 138 LessThanHighSchool      Mod
5 648         HighSchool      Mod
6 252           Bachelor      Mod
7 108 LessThanHighSchool      Lib
8 442         HighSchool      Lib
9 252           Bachelor      Lib

The analysis

> example.glm = glm(y~education*religion, family=poisson, data=example.df)
> anova(example.glm, test="Chisq")
Analysis of Deviance Table

Model: poisson, link: log

Response: y

Terms added sequentially (first to last)

                   Df Deviance Resid. Df Resid. Dev  P(>|Chi|)
NULL                                   8    1009.20
education           2   908.18         6     101.02 6.179e-198
religion            2    31.20         4      69.81  1.675e-07
education:religion  4    69.81         0 -2.047e-13  2.488e-14

In the education:religion row, Df is the degrees of freedom, Deviance is the chi-square statistic (Chisq) and P(>|Chi|) is the p-value.

Small p-value is evidence against independence
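An equivalent route to the same test is to fit only the main effects (the independence model) and read off its residual deviance, which should equal the interaction deviance in the table above. The sketch below also checks that the fitted counts under this model are row total times column total over n. Object names such as `indep.glm`, `fit.mat` and `expected` are illustrative choices, not from the lecture.

```r
# Education x religion counts from the slides (education varies fastest)
counts <- c(178, 570, 138, 138, 648, 252, 108, 442, 252)
edu.levels <- c("LessThanHighSchool", "HighSchool", "Bachelor")
rel.levels <- c("Fund", "Mod", "Lib")
example.df <- data.frame(
  y = counts,
  education = factor(rep(edu.levels, times = 3), levels = edu.levels),
  religion  = factor(rep(rel.levels, each = 3), levels = rel.levels)
)

# Main-effects-only Poisson model, i.e. all interactions set to zero
indep.glm <- glm(y ~ education + religion, family = poisson, data = example.df)

G2 <- deviance(indep.glm)        # deviance of the independence model
df <- df.residual(indep.glm)     # (I-1)(J-1) degrees of freedom
p.value <- pchisq(G2, df, lower.tail = FALSE)

# Expected counts under independence: row total x column total / n
tab <- matrix(counts, nrow = 3, dimnames = list(edu.levels, rel.levels))
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
fit.mat <- matrix(fitted(indep.glm), nrow = 3)

c(G2 = G2, df = df, p.value = p.value)
max(abs(fit.mat - expected))     # should be numerically zero
```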

Odds ratios

• In a general I x J table, the relationship between the factors is described by a set of (I-1) x (J-1) odds ratios, defined by

$$OR_{ij} = \frac{\pi_{ij}\,\pi_{11}}{\pi_{i1}\,\pi_{1j}}, \quad i = 2, \dots, I, \; j = 2, \dots, J.$$

Interpretation

• Consider the sub-table of all individuals having levels 1 and i of education, and levels 1 and j of religious belief

           Column 1       Column j
Row 1      $\pi_{11}$     $\pi_{1j}$
Row i      $\pi_{i1}$     $\pi_{ij}$

Interpretation (cont)

• Odds of being Education 1 at level 1 of Religious belief are $\pi_{11}/\pi_{i1}$

• Odds of being Education 1 at level j of Religious belief are $\pi_{1j}/\pi_{ij}$

• Odds ratio is $(\pi_{11}/\pi_{i1})/(\pi_{1j}/\pi_{ij}) = (\pi_{ij}\pi_{11})/(\pi_{i1}\pi_{1j})$

Odds Ratio facts

Main facts

1. A and B are independent iff all the odds ratios are equal to 1

2. $\log(OR_{ij}) = (\alpha\beta)_{ij}$

3. Estimate $OR_{ij}$ by exponentiating the corresponding interaction in the Poisson summary

4. Get CI by exponentiating CI for interaction

Numerical example: odds ratios

               Fund   Mod                    Lib
Less than HS   *      *                      *
High school    *      1.466 = exp(0.38278)   1.278 = exp(0.24533)
Bachelor       *      2.355 = exp(0.85671)   3.010 = exp(1.10183)

Coefficients:
                                Estimate Std. Error
educationHighSchool:religionMod  0.38278    0.12713
educationBachelor:religionMod    0.85671    0.15517
educationHighSchool:religionLib  0.24533    0.13746
educationBachelor:religionLib    1.10183    0.16153
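Because the model with interactions fits the table exactly, the estimated odds ratios can also be computed straight from the observed counts as cross-product ratios against the first row and first column. A short sketch, with `or.hat` as an illustrative name:

```r
# Education x religion counts from the slides
edu.levels <- c("LessThanHighSchool", "HighSchool", "Bachelor")
rel.levels <- c("Fund", "Mod", "Lib")
tab <- matrix(c(178, 570, 138, 138, 648, 252, 108, 442, 252), nrow = 3,
              dimnames = list(education = edu.levels, religion = rel.levels))

# Sample version of OR_ij = (n_ij * n_11) / (n_i1 * n_1j) for i, j >= 2
or.hat <- (tab[-1, -1] * tab[1, 1]) / outer(tab[-1, 1], tab[1, -1])

round(or.hat, 3)        # estimated odds ratios
round(log(or.hat), 5)   # should match the interaction estimates above
```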

Confidence interval

Estimate of log(OR Bach/Lib) is 1.10183, standard error is 0.16153, so CI is 1.10183 +/- 1.96 * 0.16153, or (0.7852312, 1.4184288)

CI for OR is (exp(0.7852312), exp(1.4184288)) = (2.192914, 4.130625)

The odds of having a bachelor’s degree versus no high school are about 3 times higher for liberals than for fundamentalists
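Intervals for all four odds ratios can be obtained in one step. The sketch below uses `confint.default`, which gives Wald intervals based on the normal quantile 1.959964 rather than the rounded 1.96, so the last digits may differ very slightly from the hand calculation. The name `or.table` is introduced here for illustration.

```r
# Education x religion data and the model with interaction, as in the slides
counts <- c(178, 570, 138, 138, 648, 252, 108, 442, 252)
edu.levels <- c("LessThanHighSchool", "HighSchool", "Bachelor")
rel.levels <- c("Fund", "Mod", "Lib")
example.df <- data.frame(
  y = counts,
  education = factor(rep(edu.levels, times = 3), levels = edu.levels),
  religion  = factor(rep(rel.levels, each = 3), levels = rel.levels)
)
example.glm <- glm(y ~ education * religion, family = poisson, data = example.df)

# Wald intervals (estimate +/- normal quantile * standard error)
wald.ci <- confint.default(example.glm, level = 0.95)

# Keep the interaction rows and exponentiate to get odds ratios
inter <- grep(":", rownames(wald.ci), fixed = TRUE)
or.table <- cbind(OR = exp(coef(example.glm)[inter]), exp(wald.ci[inter, ]))

round(or.table, 3)
```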