contingency tables - faculty.nps.edu

Contingency Tables. Professor Ron Fricker, Naval Postgraduate School, Monterey, California. 8/25/12. Reading Assignment: None


Page 1: Contingency Tables - faculty.nps.edu

Contingency Tables

Professor Ron Fricker
Naval Postgraduate School
Monterey, California

8/25/12

Reading Assignment: None

Page 2: Contingency Tables - faculty.nps.edu

Goals for this Lecture

•  Understand and be able to conduct tests for discrete contingency table data
   –  One-way chi-square goodness-of-fit tests
      •  Homogeneity
      •  Other distributions
   –  Two-way chi-square tests
      •  Independence
      •  Homogeneity
•  All assuming simple random sampling (SRS) and no finite population correction (fpc)

Page 3: Contingency Tables - faculty.nps.edu

One-Way Classifications

•  Each item is classified into one (and only one) of k categories (cells)
   –  Denote the counts as $x_1, x_2, \ldots, x_k$ with $x_1 + x_2 + \cdots + x_k = n$

[Figure: a random sample of size n is drawn from the population and each item is classified into Category 1, Category 2, …, Category k, with cell frequencies $x_1, x_2, \ldots, x_k$]

Page 4: Contingency Tables - faculty.nps.edu

One-Way Tables in R

•  Just use table() or xtabs() on one variable
   –  E.g., tabulating Q1 in the New Student Survey:

* Data from 2008 survey of NPS new students
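The survey output from the slide is not preserved here, so the following is only a minimal sketch of a one-way table. The data frame `survey`, the column `Q1`, and every response value are hypothetical stand-ins, not the NPS New Student Survey data.

```r
# Hypothetical responses to a five-point Likert item (NOT the NPS survey data)
likert_levels <- c("Strongly Disagree", "Disagree", "Neutral",
                   "Agree", "Strongly Agree")
q1 <- factor(c("Agree", "Neutral", "Agree", "Strongly Agree", "Disagree",
               "Agree", "Neutral", "Strongly Agree", "Agree", "Neutral"),
             levels = likert_levels)
survey <- data.frame(Q1 = q1)

# Counts per category; factor levels fix the order and keep empty categories
one_way <- table(survey$Q1)

# The same counts through the formula interface
one_way_x <- xtabs(~ Q1, data = survey)

print(one_way)
print(one_way_x)
```

Storing the responses as a factor with explicit levels keeps categories with zero counts in the table, which matters later when expected counts are compared cell by cell.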

Page 5: Contingency Tables - faculty.nps.edu

Two-Way Contingency Tables

•  A two-way contingency table (or cross tabulation) gives counts for every combination of the levels of two variables

                        Variable 2
                     “A”       “B”
Variable 1   “X”    # or %    # or %    # or %
             “Y”    # or %    # or %    # or %
                    # or %    # or %

•  The (“X”, “B”) cell is the number or percent of obs that are both “X” and “B”
•  The “Y” row margin is the number or percent of obs that are “Y”

Page 6: Contingency Tables - faculty.nps.edu

Two-Way Tables in R

•  Just use table() or xtabs() on two variables
   –  E.g., tabulating Q1 by gender in the New Student Survey:

* Data from 2008 survey of NPS new students
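As a rough illustration of a two-way tabulation, the sketch below cross-tabulates two columns. The data frame `survey`, its columns `Q1` and `Gender`, and all values are invented for the example and are not the survey data.

```r
# Hypothetical data (NOT the NPS survey): one row per respondent
survey <- data.frame(
  Q1     = c("Agree", "Disagree", "Agree", "Agree",
             "Disagree", "Agree", "Disagree", "Agree"),
  Gender = c("F", "M", "M", "F", "F", "M", "M", "M")
)

# Cross tabulation: rows are Q1 levels, columns are Gender levels
two_way   <- table(survey$Q1, survey$Gender)
two_way_x <- xtabs(~ Q1 + Gender, data = survey)

# Counts with row and column totals appended
print(addmargins(two_way_x))

# Column percentages: distribution of Q1 within each gender
print(prop.table(two_way_x, margin = 2))
```

`addmargins()` and `prop.table()` give the "# or %" views of the same table: counts with totals, or proportions within a chosen margin.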

Page 7: Contingency Tables - faculty.nps.edu

Higher-Way Tables in R

•  Just keep adding variables…
   –  E.g., Q1 by gender by country:

* Data from 2008 survey of NPS new students
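A three-way table follows the same pattern. In this sketch the data frame `survey` and the columns `Q1`, `Gender`, and `Country` are hypothetical, with made-up values; `ftable()` is used only to print the three-dimensional result as a flat table.

```r
# Hypothetical data (NOT the NPS survey): one row per respondent
survey <- data.frame(
  Q1      = c("Agree", "Disagree", "Agree", "Agree",
              "Disagree", "Agree", "Disagree", "Agree"),
  Gender  = c("F", "M", "M", "F", "F", "M", "M", "M"),
  Country = c("US", "US", "Intl", "US", "Intl", "US", "US", "Intl")
)

# Adding a third variable to the formula yields a 3-dimensional table
three_way <- xtabs(~ Q1 + Gender + Country, data = survey)

# ftable() flattens the array into a single printable table
print(ftable(three_way))
```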

Page 8: Contingency Tables - faculty.nps.edu

One-Way Goodness-of-Fit Test

•  Have counts for k categories, $x_1, x_2, \ldots, x_k$, with $x_1 + x_2 + \cdots + x_k = n$
•  (Unknown) population cell probabilities denoted $p_1, p_2, \ldots, p_k$ with $p_1 + p_2 + \cdots + p_k = 1$
•  Estimate each cell probability from the observed counts:
$$\hat{p}_i = x_i / n, \quad i = 1, 2, \ldots, k$$
•  The hypotheses to be tested are
$$H_0: p_1 = p_1^*,\ p_2 = p_2^*,\ \ldots,\ p_k = p_k^*$$
$$H_a: p_i \ne p_i^* \text{ for at least one } i$$

Page 9: Contingency Tables - faculty.nps.edu

Goodness-of-Fit Test for Homogeneity

•  Null hypothesis is that every category is equally likely:
$$p_i^* = 1/k, \quad i = 1, 2, \ldots, k$$
   –  I.e., the distribution of category characteristics is homogeneous in the population
•  If the null is true, in each cell (in a perfect world) we would expect to observe counts
$$e_i = n p_i^*$$
•  So, how to do a statistical test that assesses how “far away” the expected counts $e_i$ are from the observed counts $x_i$?

Page 10: Contingency Tables - faculty.nps.edu

Answer: Chi-square Test

•  Idea: Look at how far off the table counts are from what is expected under the null
$$X^2 = \sum_{i=1}^{k} \frac{(\text{observed}_i - \text{expected}_i)^2}{\text{expected}_i} = \sum_{i=1}^{k} \frac{(x_i - e_i)^2}{e_i}$$
•  Reject if the chi-square statistic is too large
   –  Assess “too large” using the chi-square distribution

Page 11: Contingency Tables - faculty.nps.edu

Conducting the Test

•  First calculate the $X^2$ statistic
•  Then calculate the p-value:
$$p\text{-value} = \Pr(\chi^2_{k-1} \ge X^2)$$
•  $\chi^2_{k-1}$ is the chi-square distribution with $k-1$ degrees of freedom
•  Reject the null if p-value < $\alpha$, for some pre-determined significance level $\alpha$
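The two steps can be traced by hand in R. This is a sketch under stated assumptions: the counts in `x` are hypothetical, the null is homogeneity ($p_i^* = 1/k$), and $\alpha = 0.05$ is chosen only for illustration.

```r
# Hypothetical observed counts for k = 5 categories (not real survey data)
x <- c(18, 22, 25, 20, 15)
n <- sum(x)                 # sample size, 100
k <- length(x)              # number of categories

# Null of homogeneity: every category has probability 1/k
p_star <- rep(1 / k, k)
e <- n * p_star             # expected counts, 20 in every cell

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
X2 <- sum((x - e)^2 / e)    # 2.9 for these counts

# p-value: upper tail of the chi-square distribution with k - 1 df
p_value <- pchisq(X2, df = k - 1, lower.tail = FALSE)   # about 0.575

alpha <- 0.05               # illustrative significance level
reject <- p_value < alpha   # FALSE here: do not reject the null
```

`lower.tail = FALSE` asks `pchisq()` for $\Pr(\chi^2_{k-1} \ge X^2)$ directly, which avoids computing one minus a value close to 1.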

Page 12: Contingency Tables - faculty.nps.edu

Example

•  In Excel:
•  In R, use the chisq.test() function
   –  Default is the GoF test for homogeneity

* Data from 2008 survey of NPS new students; remember, here we are assuming SRS and no fpc, which is actually not true for this data
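Since the slide's R output is not preserved, here is a small sketch of the default call. The counts in `x` are hypothetical, not the survey data; with a single vector and no `p` argument, `chisq.test()` tests against equal cell probabilities.

```r
# Hypothetical observed counts for five categories (not real survey data)
x <- c(18, 22, 25, 20, 15)

# With one vector and no p argument, the null is equal probabilities (1/k each)
gof <- chisq.test(x)

print(gof)
gof$statistic   # Pearson X^2
gof$parameter   # degrees of freedom, k - 1
gof$p.value     # Pr(chi-square with k - 1 df >= X^2)
gof$expected    # expected counts under the null, n / k in each cell
```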

Page 13: Contingency Tables - faculty.nps.edu

Goodness-of-Fit Test for Other Distributions

•  Homogeneity is just a special case
•  Can test against any hypothesized probabilities $p_i^*$ as long as
$$\sum_{i=1}^{k} p_i^* = 1$$
•  Might have some theory that says what the distribution should be, for example
•  Remember, don’t look at the data first and then specify the probabilities…

Page 14: Contingency Tables - faculty.nps.edu

Example

•  In Excel:
•  In R, again use the chisq.test() function
   –  Now, add a vector for the probabilities

* Data from 2008 survey of NPS new students; remember, here we are assuming SRS and no fpc, which is actually not true for this data
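A sketch of the same call with hypothesized probabilities supplied through the `p` argument. Both the counts `x` and the probabilities `p_star` are invented for illustration; in practice `p_star` would come from theory and be fixed before looking at the data.

```r
# Hypothetical observed counts and hypothesized probabilities (illustrative only)
x      <- c(10, 25, 40, 25)
p_star <- c(0.10, 0.20, 0.40, 0.30)   # must sum to 1

# Guard against a mis-specified null before running the test
stopifnot(isTRUE(all.equal(sum(p_star), 1)))

# Goodness-of-fit test against the specified distribution
gof <- chisq.test(x, p = p_star)

print(gof)
gof$expected    # n * p_star: 10, 20, 40, 30
```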

Page 15: Contingency Tables - faculty.nps.edu

A Note

•  Pearson chi-square test depends on all cells having sufficiently large expected counts:
$$e_i = n p_i^* \ge 5$$
   –  If not, collapse across some categories
   –  E.g., count and probability for “Strongly Disagree” and “Disagree” aggregated

* Data from 2008 survey of NPS new students; remember, here we are assuming SRS and no fpc, which is actually not true for this data
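One way to carry out the collapsing is sketched below. The counts `x` and probabilities `p_star` are hypothetical; the point is only that the count and the hypothesized probability for the merged categories must be summed together.

```r
# Hypothetical counts and hypothesized probabilities (illustrative only)
x <- c("Strongly Disagree" = 2, "Disagree" = 6, "Neutral" = 20,
       "Agree" = 32, "Strongly Agree" = 20)
p_star <- c(0.05, 0.10, 0.25, 0.35, 0.25)

n <- sum(x)
e <- n * p_star            # first expected count is 4, below the threshold of 5

# Merge the first two categories: add their counts AND their probabilities
x_c <- c("Strongly Disagree or Disagree" = sum(x[1:2]), x[3:5])
p_c <- c(sum(p_star[1:2]), p_star[3:5])
e_c <- n * p_c             # every expected count is now at least 5

gof <- chisq.test(x_c, p = p_c)
print(gof)
```

Collapsing reduces the number of categories from 5 to 4, so the degrees of freedom drop from 4 to 3.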

Page 16: Contingency Tables - faculty.nps.edu

Some Notation for Two-Way Contingency Tables

•  Table has r rows and c columns
•  Observed cell counts are $x_{ij}$, with
$$\sum_{i=1}^{r} \sum_{j=1}^{c} x_{ij} = n$$
•  Denote row sums:
$$x_{i\bullet} = \sum_{j=1}^{c} x_{ij}, \quad i = 1, \ldots, r$$
•  Denote column sums:
$$x_{\bullet j} = \sum_{i=1}^{r} x_{ij}, \quad j = 1, \ldots, c$$

Page 17: Contingency Tables - faculty.nps.edu

Chi-square Test for Independence

•  Independence means the probability of being in any cell is the product of the row and column probabilities

                        Variable 2
                     “A”              “B”
Variable 1   “X”    Pr(X) x Pr(A)    Pr(X) x Pr(B)    Pr(X)
             “Y”    Pr(Y) x Pr(A)    Pr(Y) x Pr(B)    Pr(Y)
                    Pr(A)            Pr(B)

•  Pr(Y) is the probability that a random obs is a “Y”
•  Pr(X) x Pr(B) is the probability that an obs is both “X” and “B”

Page 18: Contingency Tables - faculty.nps.edu

The Hypotheses

•  Independence means, for all cells in the table, $p_{ij} = p_{i\bullet} p_{\bullet j}$ where
   –  $p_{i\bullet}$ is the probability of having row i characteristic
   –  $p_{\bullet j}$ is the probability of having column j characteristic
•  The hypotheses to be tested are
$$H_0: p_{ij} = p_{i\bullet} p_{\bullet j}, \quad i = 1, 2, \ldots, r;\ j = 1, 2, \ldots, c$$
$$H_a: p_{ij} \ne p_{i\bullet} p_{\bullet j}, \text{ for some } i \text{ and } j$$

Page 19: Contingency Tables - faculty.nps.edu

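Chi-square Test Statistic

•  Test statistic:
$$X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(x_{ij} - e_{ij})^2}{e_{ij}}$$
•  Under the null, the expected count is calculated as
$$e_{ij} = n \hat{p}_{ij} = n \hat{p}_{i\bullet} \hat{p}_{\bullet j} = n \times \frac{x_{i\bullet}}{n} \times \frac{x_{\bullet j}}{n} = \frac{x_{i\bullet} x_{\bullet j}}{n}$$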
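The expected counts and the statistic can be computed directly from the row and column sums. In this sketch the 2 x 2 table `x` and its labels are hypothetical; `outer()` forms every product of a row total with a column total.

```r
# Hypothetical 2 x 2 table of observed counts (illustrative only)
x <- matrix(c(30, 20,
              10, 40),
            nrow = 2, byrow = TRUE,
            dimnames = list(Var1 = c("X", "Y"), Var2 = c("A", "B")))

n       <- sum(x)          # total count, 100
row_tot <- rowSums(x)      # row sums: 50, 50
col_tot <- colSums(x)      # column sums: 40, 60

# Expected count in cell (i, j): row total times column total, divided by n
e <- outer(row_tot, col_tot) / n

# Pearson chi-square statistic summed over all r * c cells
X2 <- sum((x - e)^2 / e)

# Degrees of freedom and p-value for the test of independence
df      <- (nrow(x) - 1) * (ncol(x) - 1)
p_value <- pchisq(X2, df = df, lower.tail = FALSE)
```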

Page 20: Contingency Tables - faculty.nps.edu

Conducting the Test

•  Now, proceed as with the goodness-of-fit test
   –  Except degrees of freedom are $\nu = (r-1)(c-1)$
•  Large values of the chi-square statistic are evidence that the null is false
•  We’ll let R do the p-value calculation
   –  Reject the null if p-value < $\alpha$, for some pre-determined significance level $\alpha$

Page 21: Contingency Tables - faculty.nps.edu

Example: Mobile Learning Survey

•  In the mobile learning devices survey, is there an association between those who own a smartphone and those who own a PDA?
   –  “Do you own a smartphone (such as iPhone, Android, and Blackberry)?” (yes/no)
   –  “Do you own a PDA (such as iPad, Zune HD, iPod Touch, Palm, excluding previously mentioned devices)?” (yes/no)
•  Conclusion: The two sets of responses are not independent, so yes there is an association

* Data from 2010 mobile learning devices survey of NPS students (again, assuming SRS and no fpc)
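The survey counts themselves are not preserved in this transcript, so the sketch below uses invented counts in a table named `owns` purely to show the mechanics. One detail worth knowing: for 2 x 2 tables `chisq.test()` applies Yates' continuity correction by default, so `correct = FALSE` is needed to reproduce the plain Pearson statistic.

```r
# Hypothetical ownership counts (NOT the 2010 survey data)
owns <- matrix(c(40, 60,
                 20, 80),
               nrow = 2, byrow = TRUE,
               dimnames = list(Smartphone = c("Yes", "No"),
                               PDA        = c("Yes", "No")))

# Plain Pearson chi-square test of independence
ind <- chisq.test(owns, correct = FALSE)

# Default behaviour for 2 x 2 tables: Yates' continuity correction applied
ind_yates <- chisq.test(owns)

print(ind)
ind$expected                        # row total * column total / n
ind$p.value < 0.05                  # TRUE for these made-up counts
```

The corrected statistic is always a little smaller than the uncorrected one, which makes the default test slightly more conservative.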

Page 22: Contingency Tables - faculty.nps.edu

What’s the Connection?

•  Those who do not own a smartphone are also slightly more likely not to own a PDA
•  Similarly, those who own a smartphone are slightly more likely to own a PDA
   –  Perhaps not a big surprise…

* Data from 2010 mobile learning devices survey of NPS students (again, assuming SRS and no fpc, and data cleaned up for convenience)

Page 23: Contingency Tables - faculty.nps.edu

Chi-square Test for Homogeneity

•  The question: Is the distribution of a variable (say on a Likert scale) the same for two or more row categories?
•  Idea: Each row is a population, and under the null the proportion that falls in each column category is the same for every row
•  Good news: Calculation is exactly the same as the test for independence!

Page 24: Contingency Tables - faculty.nps.edu

Example: Mobile Learning Survey

•  In the mobile learning devices survey, is the age distribution different for resident and DL students?
•  Sure looks different, so let’s test it formally:

* Data from 2010 mobile learning devices survey of NPS students (again, assuming SRS and no fpc, and data cleaned up for convenience)
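Because the calculation matches the test for independence, the same `chisq.test()` call applies. The table `age`, its age brackets, and all counts below are hypothetical placeholders, not the survey results.

```r
# Hypothetical age-by-status counts (NOT the 2010 survey data)
age <- matrix(c(40, 35, 15, 10,
                15, 30, 35, 20),
              nrow = 2, byrow = TRUE,
              dimnames = list(Status = c("Resident", "DL"),
                              Age    = c("<30", "30-34", "35-39", "40+")))

# Row proportions: the age distribution within each student population
print(prop.table(age, margin = 1))

# Test of homogeneity: same computation as the test of independence
hom <- chisq.test(age)

print(hom)
hom$parameter   # (r - 1)(c - 1) = 3 degrees of freedom
hom$expected    # check that every expected count is at least 5
```

Comparing the two rows of `prop.table(age, margin = 1)` is the informal "sure looks different" check; the test then formalizes it.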

Page 25: Contingency Tables - faculty.nps.edu

What We Have Just Learned

•  Discussed tests for contingency tables
   –  One-way chi-square goodness-of-fit tests
      •  Homogeneity
      •  Other distributions
   –  Two-way chi-square tests
      •  Independence
      •  Homogeneity
•  All can be useful for analyzing Likert scale and other categorical survey data
•  Next class, we will learn how to modify these tests for complex sampling situations