Contingency Table Analysis

Upload: justice-priddy

Post on 31-Mar-2015


• contingency tables show frequencies produced by cross-classifying observations

• e.g., pottery described simultaneously according to vessel form & surface decoration

        polished  burnished  matte
bowl       47        28         3
jar        30        42         8
olla        6        45        25

• most statistical tests for tables are designed for analyzing 2 dimensions
  – only examine the interaction of two variables at one time…

• most efficient when used with nominal data
  – using ratio data means recoding data to a lower scale of measurement (ordinal)
  – means ignoring some of the information originally available…


• still, you might do this, particularly if you are interested in association between metric and non-metric variables

• e.g.: variation in pot size vs. surface decoration…

• may decide to divide pot size into ordinal classes…

[Figure: pot size divided into ordinal classes, either large / medium / small or large / small]

                 rim diameter:
slip:            small   large
specular           4      13
non-specular      15      18

• other options may let you retain more of the original information content

[Figure: panels labeled non-specular slip and specular slip]

• could use a “t-test” to test the equality of the means

• makes full use of ratio data…

• why do we work with contingency tables?

        polished  burnished  matte
bowl       47        28         3
jar        30        42         8
olla        6        45        25

• because we think there may be some kind of interaction between the variables…

• basic question: can the state of one variable be predicted from the state of another variable?

• if not, they are independent

        polished  burnished  matte
bowl       47        28         3
jar        30        42         8
olla        6        45        25


expected counts

• a baseline to which observed counts can be compared

• counts that would occur by random chance if the variables are independent, over the long run

• for any cell E = (col total * row total)/table total
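The expected-count formula can be sketched in R, the tool these slides already use. This is only an illustration: the object names `obs` and `expected` are made up here, and the counts are the pottery table shown earlier.

```r
# Observed counts: vessel form (rows) by surface decoration (columns)
obs <- matrix(c(47, 28,  3,
                30, 42,  8,
                 6, 45, 25),
              nrow = 3, byrow = TRUE,
              dimnames = list(form = c("bowl", "jar", "olla"),
                              surface = c("polished", "burnished", "matte")))

# E = (row total * column total) / table total, for every cell at once
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

round(expected, 1)
```

The expected table has the same row and column totals as the observed one, which is a quick way to check the arithmetic.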

observed:
         M     F    Total
PP       4     1      5     45%
Pot      1     5      6     55%
Total    5     6     11
        45%   55%

expected:
         M     F    Total
PP      2.3   2.7     5
Pot     2.7   3.3     6
Total    5     6     11

significance

• = probability of getting, by chance, a table as or more deviant than the observed table, if the variables are independent
  – ‘deviant’ defined in terms of expected table

• no causality is necessarily implied by the outcome
  – but, causality may well be the reason for observed association…
  – e.g.: grave goods and sex

Fisher’s Exact Test

• just for 2 x 2 tables

• useful where chi-square test is inappropriate

• gives the exact probability of all tables
  – with the same marginal totals
  – as or more deviant than the observed table…

 a  b        4  1
 c  d        1  5

P = (a+b)!(a+c)!(b+d)!(c+d)! / (N! a! b! c! d!)

P = 5! 5! 6! 6! / (11! 4! 1! 1! 5!) = 5 * 6! 6! / 11!

P = 5 * 6! / (11*10*9*8*7) = 3600 / 55440

P = .065
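The hand calculation can be checked in R. A minimal sketch, assuming the 2 x 2 table above; the variable names are illustrative only. The factorial formula is the hypergeometric probability of the table, so base R's `dhyper()` should return the same value.

```r
tab <- matrix(c(4, 1,
                1, 5), nrow = 2, byrow = TRUE)

r <- rowSums(tab)   # row totals: a+b, c+d
k <- colSums(tab)   # column totals: a+c, b+d
N <- sum(tab)

# Factorial formula from the slide
p_fact <- prod(factorial(c(r, k))) / (factorial(N) * prod(factorial(tab)))

# Same probability as a hypergeometric density:
# a "successes" in a draw of (a+c) from (a+b) successes and (c+d) failures
p_hyper <- dhyper(tab[1, 1], r[1], r[2], k[1])

round(c(factorial = p_fact, hypergeometric = p_hyper), 3)
```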

P = .065

 a  b        4  1
 c  d        1  5

use R (or Excel) if the counts aren’t too large…

> fisher.test(x)

 0  5 |  5       1  4 |  5       2  3 |  5
 5  1 |  6       4  2 |  6       3  3 |  6
 5  6 | 11       5  6 | 11       5  6 | 11

 3  2 |  5       4  1 |  5       5  0 |  5
 2  4 |  6       1  5 |  6       0  6 |  6
 5  6 | 11       5  6 | 11       5  6 | 11

table (a b / c d)          P
 0 5 / 5 1               0.013
 1 4 / 4 2               0.162
 2 3 / 3 3               0.433
 3 2 / 2 4               0.325
 4 1 / 1 5               0.065   (observed)
 5 0 / 0 6               0.002

 2.3 2.7 / 2.7 3.3               (expected)

• P = 0.065+0.002 = 0.067 or

• P = 0.067+0.013 = 0.080

• 2-tailed test = 0.067+0.013 = 0.080

• 1-tailed test = 0.065+0.002 = 0.067

         M     F
PP       4     1      5
Pot      1     5      6
         5     6     11

in R:

> fisher.test(x, alternative = "two.sided")

> fisher.test(x, alternative = "greater")   [i.e.: H1: odds ratio > 1]
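For completeness, here is a sketch of how `x` might be built before calling `fisher.test()`. The dimension names are assumptions added for readability. The last lines rebuild the two p-values from the probabilities of the six possible tables.

```r
x <- matrix(c(4, 1,
              1, 5), nrow = 2, byrow = TRUE,
            dimnames = list(goods = c("PP", "Pot"), sex = c("M", "F")))

two_sided <- fisher.test(x, alternative = "two.sided")$p.value
one_sided <- fisher.test(x, alternative = "greater")$p.value

# Probabilities of the six tables with these margins (a = 0, 1, ..., 5)
p_tables <- dhyper(0:5, 5, 6, 5)
p_obs    <- p_tables[5]                       # observed table has a = 4

by_hand_two <- sum(p_tables[p_tables <= p_obs * (1 + 1e-7)])  # as or less probable
by_hand_one <- sum(p_tables[5:6])                             # a >= 4

round(c(two_sided, by_hand_two, one_sided, by_hand_one), 3)
```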

Chi-square Statistic

• an aggregate measure (i.e., based on the entire table)

• the greater the deviation from expected values, the larger the chi-square statistic (it grows with the square of the deviation!)

• one could devise others that would place less emphasis on large deviations, e.g. |o-e|/e

$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

• X2 is distributed approximately in accord with the X2 probability distribution

• X2 probabilities are traditionally found in a table showing threshold values from a cumulative probability distribution (CPD)
  – need degrees of freedom
  – df = (r-1)*(c-1)

• just use R…

observed counts:

                 Status:
Ritual arch.:    low    intermed.   high
altar              7       20        16      43
no altar          18       22         8      48
                  25       42        24      91

expected counts (e.g., altar / high = (43*24)/91 = 11.3):

                 low    intermed.   high
altar            11.8     19.8      11.3     43
no altar         13.2     22.2      12.7     48
                  25       42        24      91

chi-square contributions (e.g., altar / low = (7-11.8)^2 / 11.8 = 2.0):

                 low    intermed.   high
altar             2.0      0.0       1.9     3.9
no altar          1.8      0.0       1.7     3.5
                  3.7      0.0       3.6     7.3

X2 = 7.3, P = .025
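The worked example can be reproduced with base R's `chisq.test()`. A sketch, assuming the altar/status counts above; the object names are illustrative.

```r
altar <- matrix(c( 7, 20, 16,
                  18, 22,  8), nrow = 2, byrow = TRUE,
                dimnames = list(ritual = c("altar", "no altar"),
                                status = c("low", "intermed.", "high")))

res <- chisq.test(altar)

round(res$expected, 1)                        # expected counts

# Per-cell contributions (O - E)^2 / E; they sum to the statistic
contrib <- (altar - res$expected)^2 / res$expected
round(contrib, 1)

x2 <- unname(res$statistic)
df <- unname(res$parameter)                   # (r-1)*(c-1)
p  <- res$p.value
round(c(X2 = x2, df = df, P = p), 3)
```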

X2 assumptions & problems

• must be based on counts:
  – not percentages, ratios or weighted data

• fails to give reliable results if expected counts are too low:

obs.                  exp.
 2  3 |  5            2.27  2.73
 3  3 |  6            2.73  3.27
 5  6

P(X2) = 0.74 (X2 = 0.11)

P(Fisher’s) = 1.0

rules of thumb

1. no expected counts less than 5
   – almost certainly too stringent

2. no expected counts less than 2, and 80% of expected counts > 5
   – more relaxed (but more realistic)

collapsing tables

• can often combine columns/rows to increase expected counts that are too low
  – may increase or reduce interpretability
  – may create or destroy structure in the table

• no clear guidelines
  – avoid simply trying to identify the combination of cells that produces a “significant” result

obs. counts:

  8     3     6     2   |  19
  6     1     6     5   |  18
  6     4     5     4   |  19
  3    12     8     3   |  26
 23    20    25    14   |  82

exp. counts:

 5.3   4.6   5.8   3.2  |  19
 5.0   4.4   5.5   3.1  |  18
 5.3   4.6   5.8   3.2  |  19
 7.3   6.3   7.9   4.4  |  26
 23    20    25    14   |  82

obs. counts (columns 1+2 and 3+4 combined):

 11     8   |  19
  7    11   |  18
 10     9   |  19
 15    11   |  26
 43    39   |  82

exp. counts:

 10.0   9.0 |  19
  9.4   8.6 |  18
 10.0   9.0 |  19
 13.6  12.4 |  26
 43    39   |  82


• chi-square is basically a measure of significance

• it is not a good measure of strength of association

• can help you decide if a relationship exists, but not how strong it is

17  13                34  26
13  17                26  34
n = 60                n = 120

X2 = 1.07             X2 = 2.13
P = .30               P = .14
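A short R sketch of the same comparison, with illustrative names. Note that `correct = FALSE` switches off the Yates continuity correction that `chisq.test()` applies to 2 x 2 tables by default, so the values match the uncorrected statistics on the slide.

```r
small <- matrix(c(17, 13,
                  13, 17), nrow = 2, byrow = TRUE)
large <- 2 * small              # same proportions, twice the sample size

res_small <- chisq.test(small, correct = FALSE)
res_large <- chisq.test(large, correct = FALSE)

x2_small <- unname(res_small$statistic)
x2_large <- unname(res_large$statistic)

# Doubling every count doubles X2 and lowers P, although the
# pattern in the table has not changed
round(c(x2_small, res_small$p.value, x2_large, res_large$p.value), 2)
```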

• also, chi-square is a ‘global statistic’

• says nothing (directly) about which parts of a table may be ‘driving’ a large chi-square statistic

• ‘chi-square contributions’ from individual cells can help:

                 low    intermed.   high
altar             2.0      0.0       1.9     3.9
no altar          1.8      0.0       1.7     3.5
                  3.7      0.0       3.6     7.3

Monte Carlo test of X2 significance

• based on simulated generation of cell-counts under imposed conditions of independence

• randomly assign counts to cells:

 23    14     8   |  45
 15     6    13   |  34
 38    20    21   |  79


• significance is simply the proportion of outcomes that produced a X2 statistic >= observed

• not based on any assumptions about the distribution of the X2 statistic

• overcomes the problems associated with small expected frequencies
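One way to try this in R is the `simulate.p.value` argument of `chisq.test()`. This is a sketch using the 2 x 3 table from the previous slide. R's built-in simulation draws random tables with the observed row and column totals held fixed, which may differ in detail from the procedure the slides have in mind. The seed and the number of replicates `B` are arbitrary choices.

```r
set.seed(1)                       # arbitrary seed, for repeatability only

tab <- matrix(c(23, 14,  8,
                15,  6, 13), nrow = 2, byrow = TRUE)

# Proportion of simulated tables whose X2 is >= the observed X2
mc   <- chisq.test(tab, simulate.p.value = TRUE, B = 9999)

# Usual approximation from the chi-square distribution, for comparison
asym <- chisq.test(tab)

c(monte_carlo = mc$p.value, chi_square = asym$p.value)
```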

G Test

• a measure of significance for any r x c table

• look up critical values of G2 in an ordinary chi-square table; figure out degrees of freedom the same way

• conforms to chi-square distribution better than the chi-square statistic

$$G = 2 \sum_{i=1}^{k} O_i \log_e\left(\frac{O_i}{E_i}\right)$$

an R function for G2

gsq.test <- function(obs) {
  # degrees of freedom, as for the chi-square test
  df <- (nrow(obs) - 1) * (ncol(obs) - 1)
  # expected counts under independence
  exp <- chisq.test(obs)$expected
  # G2 statistic; empty cells contribute 0
  G <- 2 * sum(ifelse(obs > 0, obs * log(obs / exp), 0))
  # upper-tail probability of G under the chi-square distribution
  pchisq(G, df, lower.tail = FALSE)
}
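As a usage sketch, the G statistic can be computed for the altar/status table used in the chi-square example. The helper `g2_test()` below is a hypothetical, self-contained variant written for this illustration. It returns the statistic and degrees of freedom as well as the probability.

```r
g2_test <- function(obs) {
  expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
  # cells with a zero count contribute nothing to the sum
  terms <- ifelse(obs > 0, obs * log(obs / expected), 0)
  G  <- 2 * sum(terms)
  df <- (nrow(obs) - 1) * (ncol(obs) - 1)
  list(G2 = G, df = df, p.value = pchisq(G, df, lower.tail = FALSE))
}

altar <- matrix(c( 7, 20, 16,
                  18, 22,  8), nrow = 2, byrow = TRUE)

out <- g2_test(altar)
out
```

For this table G comes out slightly larger than the chi-square statistic, and the conclusion at alpha = .05 is the same.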

Measures of Association

Phi-Square (Φ2)

• an attempt to remove the effects of sample size that makes chi-square inappropriate for measuring association

• divide chi-square by n: Φ2 = X2/n

• limits:
  0: variables are independent
  1: perfect association in a 2x2 table; no upper limit in larger tables

17  13                34  26
13  17                26  34
n = 60                n = 120

Φ2 = 0.018            Φ2 = 0.018

Cramer’s V

• also a measure of strength of association

• an attempt to standardize phi-square (i.e., control the lack of an upper boundary in tables larger than 2x2 cells)

• V = sqrt(Φ2/m), where m = min(r-1, c-1) (i.e., the smaller of rows-1 or columns-1)

• limits: 0-1 for any size table; 1 = highest possible association
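Both measures are easy to compute from the chi-square statistic. A sketch with a hypothetical helper `assoc()`, applied to the two 2 x 2 tables from the sample-size example and to the 3 x 3 pottery table. The uncorrected chi-square is used throughout.

```r
assoc <- function(tab) {
  x2   <- unname(chisq.test(tab, correct = FALSE)$statistic)
  phi2 <- x2 / sum(tab)                        # phi-square = X2 / n
  m    <- min(nrow(tab) - 1, ncol(tab) - 1)
  c(X2 = x2, phi2 = phi2, V = sqrt(phi2 / m))  # Cramer's V
}

small <- matrix(c(17, 13,
                  13, 17), nrow = 2, byrow = TRUE)
pots  <- matrix(c(47, 28,  3,
                  30, 42,  8,
                   6, 45, 25), nrow = 3, byrow = TRUE)

a_small <- assoc(small)
a_large <- assoc(2 * small)    # X2 doubles, phi-square and V do not change
a_pots  <- assoc(pots)

round(rbind(a_small, a_large, a_pots), 3)
```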

Yule’s Q

• for 2x2 tables only

• Q = (ad-bc)/(ad+bc)

 a  b
 c  d

Yule’s Q

• often used to assess the strength of presence / absence association

• range is –1 (perfect negative association) to 1 (perfect positive association); values near 0 indicate a lack of association

                   Bone needles
                    +      -
Male burial   +    12     14
              -    16      3

Q = -.72

Yule’s Q

• not sensitive to marginal changes (unlike Phi2)

• multiply a row or column by a constant; cancels out…

            jars   ollas                  jars   ollas
Source A     19     10        Source A     19     20
Source B      6     15        Source B      6     30

(Q=.65 for both tables)
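The Q values on these slides can be reproduced with a few lines of R. The function name `yule_q()` is made up for this sketch.

```r
yule_q <- function(tab) {
  ad <- tab[1, 1] * tab[2, 2]
  bc <- tab[1, 2] * tab[2, 1]
  (ad - bc) / (ad + bc)
}

needles <- matrix(c(12, 14,
                    16,  3), nrow = 2, byrow = TRUE)
source1 <- matrix(c(19, 10,
                     6, 15), nrow = 2, byrow = TRUE)
source2 <- matrix(c(19, 20,
                     6, 30), nrow = 2, byrow = TRUE)  # ollas column doubled

q_needles <- yule_q(needles)
q_source1 <- yule_q(source1)
q_source2 <- yule_q(source2)

round(c(q_needles, q_source1, q_source2), 2)
```

Doubling a column multiplies both ad and bc by the same constant, which is why the second and third values agree.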

Yule’s Q

• can’t distinguish between different degrees of ‘complete’ association

• can’t distinguish between ‘complete’ and ‘absolute’ association

       M    F              M    F              M    F
RHS   60   20       RHS   60   10       RHS   60    0
LHS    0   20       LHS    0   30       LHS    0   40
n = 100             n = 100             n = 100


“odds” ratio

• easiest with 2 x 2 tables

• what are the ‘odds’ of a man being buried on his right side, compared to those of a woman??

• if there is a strong level of association between sex and burial position, the odds should be quite different…

 a  b
 c  d

$$\text{odds ratio} = \frac{a/c}{b/d}$$

         M     F
RHS     29    14     43
LHS     11    33     44
        40    47     87

29/11 = 2.64

14/33 = 0.42

(29/11)/(14/33) = 6.21

• if there is no association, the odds ratio = 1

• departures from 1 range between 0 and infinity
  >1 = ‘positive association’
  <1 = ‘negative association’
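The same arithmetic in R, as a sketch with illustrative names. Working from the unrounded odds gives 6.21, whereas dividing the rounded values 2.64 and 0.42 would give a slightly different figure.

```r
burial <- matrix(c(29, 14,
                   11, 33), nrow = 2, byrow = TRUE,
                 dimnames = list(side = c("RHS", "LHS"), sex = c("M", "F")))

odds_m <- burial["RHS", "M"] / burial["LHS", "M"]   # odds of RHS burial, men
odds_f <- burial["RHS", "F"] / burial["LHS", "F"]   # odds of RHS burial, women

odds_ratio <- odds_m / odds_f                       # same as (a*d)/(b*c)

round(c(odds_m = odds_m, odds_f = odds_f, odds_ratio = odds_ratio), 2)
```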

Goodman and Kruskal’s Tau (τ)

• “proportional reduction of error”

• how are the probabilities of correctly assigning cases to one set of categories improved by the knowledge of another set of categories??

Goodman and Kruskal’s Tau (τ)

• limits are 0-1; 1=perfect association

• same results as Phi2 w/ 2x2 table

• sensitive to margin differences

• asymmetric
  – get different results predicting row assignments based on columns than from column assignments based on rows

τ = [P(error|rule 1) - P(error|rule 2)] / P(error|rule 1)

• rule 1: random assignments to one variable are made with no knowledge of 2nd variable

• rule 2: random assignments to one variable are made with knowledge of 2nd variable

rule 1 (column totals only):

       B1    B2
A1
A2
        6    14     20

rule 2 (cell counts known):

       B1    B2
A1      6     0
A2      0    14
        6    14     20
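A minimal sketch of tau in R, assuming the usual proportional-assignment reading of the two rules: under rule 1 cases are assigned to columns in proportion to the column totals, and under rule 2 in proportion to the cell counts within each row. The function name `gk_tau()` is hypothetical. As written it predicts columns from rows and assumes no row total is zero.

```r
gk_tau <- function(tab) {
  n <- sum(tab)
  # rule 1: assign to columns at random, in proportion to column totals
  err1 <- 1 - sum((colSums(tab) / n)^2)
  # rule 2: within each row, assign in proportion to that row's cells
  err2 <- 1 - sum(tab^2 / rowSums(tab)) / n
  (err1 - err2) / err1
}

perfect <- matrix(c(6,  0,
                    0, 14), nrow = 2, byrow = TRUE)
burial  <- matrix(c(29, 14,
                    11, 33), nrow = 2, byrow = TRUE)

tau_perfect <- gk_tau(perfect)      # perfect association
tau_burial  <- gk_tau(burial)

# For a 2 x 2 table tau should agree with phi-square (X2 / n)
phi2_burial <- unname(chisq.test(burial, correct = FALSE)$statistic) / sum(burial)

c(tau_perfect = tau_perfect, tau_burial = tau_burial, phi2_burial = phi2_burial)
```

Calling `gk_tau(t(tab))` predicts rows from columns instead, which is where the asymmetry shows up in larger tables.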

Table Standardization

• even very large and highly significant X2 (or G2) statistics don’t necessarily mean that all parts of the table are equally “deviant” (and therefore interesting)

• usually need to do other things to highlight loci of association or ‘interaction’

• which cells diverge the most from expected values?

• very difficult to decide when both row and column totals are unequal…


Percent standardization

• highly intuitive approach, easy to interpret

• often used to control the effects of sample-size variation

• have to decide if it makes better sense to standardize based on rows, or on columns

• usually, you want to standardize whatever it is you want to compare
  – i.e., if you want to compare columns, base percents on column totals

• you may decide to make two tables, one standardized on rows, the other on columns…

MNIs:

          Site
Fauna      A     B     C
bear       2     1     0      3
moose     15     5    10     30
coyote     2     0     0      2
rabbit    16     8    12     36
dog        2     3     0      5
deer      16     8     7     31
          53    25    29    107

column percents:

          Site
Fauna      A      B      C
bear       3.8    4.0    0.0
moose     28.3   20.0   34.5
coyote     3.8    0.0    0.0
rabbit    30.2   32.0   41.4
dog        3.8   12.0    0.0
deer      30.2   32.0   24.1
         100    100    100

row percents:

          Site
Fauna      A      B      C
bear      66.7   33.3    0.0    100
moose     50.0   16.7   33.3    100
coyote   100.0    0.0    0.0    100
rabbit    44.4   22.2   33.3    100
dog       40.0   60.0    0.0    100
deer      51.6   25.8   22.6    100
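Both standardizations can be produced with base R's `prop.table()`. A sketch using the MNI counts above; the object names are illustrative.

```r
mni <- matrix(c( 2, 1,  0,
                15, 5, 10,
                 2, 0,  0,
                16, 8, 12,
                 2, 3,  0,
                16, 8,  7), ncol = 3, byrow = TRUE,
              dimnames = list(fauna = c("bear", "moose", "coyote",
                                        "rabbit", "dog", "deer"),
                              site  = c("A", "B", "C")))

# margin = 2: each column sums to 100 (compare sites)
col_pct <- round(100 * prop.table(mni, margin = 2), 1)

# margin = 1: each row sums to 100 (compare taxa)
row_pct <- round(100 * prop.table(mni, margin = 1), 1)

col_pct
row_pct
```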

Binomial Probabilities

• P(n,k,p): “probability of k successes in n trials, with p probability of success in any one trial”

observed:            expected:
 5   3               3.7   4.3
 1   4               2.3   2.7
n = 13               n = 13

n = 13

k = 5

p = 3.7/13

Binomial Probabilities

• in R:

> dbinom(k, n, p)   # probability of exactly k successes

> pbinom(k, n, p)   # cumulative probability of k or fewer successes

• easy to build into a function…
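One possible way to wrap this in a function, as a sketch only: `cell_binom()` is a hypothetical name, and the choice to report three probabilities (exactly k, at most k, at least k) is an assumption about what is useful here. The example uses the top-left cell of the 2 x 2 table from the previous slide, with p taken as the expected count divided by n.

```r
cell_binom <- function(k, n, p) {
  c(exactly  = dbinom(k, n, p),
    at_most  = pbinom(k, n, p),
    at_least = pbinom(k - 1, n, p, lower.tail = FALSE))
}

obs      <- matrix(c(5, 3,
                     1, 4), nrow = 2, byrow = TRUE)
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
n        <- sum(obs)

# top-left cell: k = 5 observed, p = expected share of the table
res <- cell_binom(k = obs[1, 1], n = n, p = expected[1, 1] / n)
round(res, 3)
```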

K-S test for cumulative percents

[Figure: percent and cumulative percent plots, vertical axes scaled 10-100]

Cumulative Percent Graph

[Figure: cumulative percent plots, vertical axis scaled 10-100]

• some useful statistical measures (ordinal or ratio scale)

• can be misleading when used with nominal data

• good for comparing data sets

Percentages

         Sites
Types     A     B     C
1         5     5     5
2        45     0    30
3         5    48     5
4         5     5     5
5         5     5     5
6         5     5     5
7        20     5    35
8         5    22     5
9         5     5     5
        100   100   100

Cumulative Percents

         Sites
Types     A     B     C
1         5     5     5
2        50     5    35
3        55    53    40
4        60    58    45
5        65    63    50
6        70    68    55
7        90    73    90
8        95    95    95
9       100   100   100

[Figure: cumulative percent curves for sites A, B and C across types 1-9]
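The cumulative table is just a running sum down each column of the percentage table. A short R sketch with illustrative object names:

```r
pct <- matrix(c( 5,  5,  5,
                45,  0, 30,
                 5, 48,  5,
                 5,  5,  5,
                 5,  5,  5,
                 5,  5,  5,
                20,  5, 35,
                 5, 22,  5,
                 5,  5,  5), ncol = 3, byrow = TRUE,
              dimnames = list(type = as.character(1:9),
                              site = c("A", "B", "C")))

# running total down each column (one column per site)
cum_pct <- apply(pct, 2, cumsum)
cum_pct
```

Because the running sum depends on the order of the rows, reordering the types changes the curves. This is the reason such graphs can mislead with nominal categories.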

[Figure: cumulative percent curves for sites A, B and C, first with types in the order 1-9, then with types reordered 1, 5, 3, 4, 2, 6, 7, 8, 9]

K-S test

• find Dmax:
  – maximum difference between 2 cumulative proportion distributions
  – compare to critical value for chosen sig. level

• critical value = C*((n1+n2)/(n1*n2))^.5
  – alpha = .05, C = 1.36
  – alpha = .01, C = 1.63
  – alpha = .001, C = 1.95

example 2

• mortuary data (Shennan, p. 56+)

• burials characterized according to 2 wealth (poor vs. wealthy) and 6 age categories (infant to old age)

             Rich   Poor
Infans I       6     23
Infans II      8     21
Juvenilis     11     25
Adultus       29     36
Maturus       19     27
Senilis        3      4
Total         76    136


• burials for younger age-classes appear to be more numerous among the poor

• can this be explained away as an example of random chance?

or

• do poor burials constitute a different population, with respect to age-classes, than rich burials?

• we can get a visual sense of the problem using a cumulative frequency plot:

[Figure: cumulative frequency curves (0 to 1) for rich and poor burials across the age classes Infans I, Infans II, Juvenilis, Adultus, Maturus, Senilis]


• K-S test (Kolmogorov-Smirnov test) assesses the significance of the maximum divergence between two cumulative frequency curves

H0:dist1=dist2

• an equation based on the theoretical distribution of differences between cumulative frequency curves provides a critical value for a specific alpha level

• observed differences beyond this value can be regarded as significant at that alpha level

• if alpha = .05, the critical value =

1.36*sqrt((n1+n2)/(n1*n2))

1.36*sqrt((76+136)/(76*136)) = 0.195

• the observed value = 0.178

• 0.178 < 0.195; don’t reject H0

[Figure: cumulative frequency curves for rich and poor burials across the age classes Infans I to Senilis, with Dmax = .178 marked]
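The K-S calculation for the burial data can be sketched in R as follows. The object names are illustrative, and the constant 1.36 is the alpha = .05 value given earlier.

```r
burials <- matrix(c( 6, 23,
                     8, 21,
                    11, 25,
                    29, 36,
                    19, 27,
                     3,  4), ncol = 2, byrow = TRUE,
                  dimnames = list(age = c("Infans I", "Infans II", "Juvenilis",
                                          "Adultus", "Maturus", "Senilis"),
                                  wealth = c("Rich", "Poor")))

n <- colSums(burials)                          # n1 = 76, n2 = 136

# cumulative proportions within each wealth group
cum_prop <- sweep(apply(burials, 2, cumsum), 2, n, "/")

# largest gap between the two cumulative curves
d_max <- max(abs(cum_prop[, "Rich"] - cum_prop[, "Poor"]))

# critical value at alpha = .05
crit <- 1.36 * sqrt(sum(n) / prod(n))

round(c(Dmax = d_max, critical = crit), 3)
d_max > crit                                   # TRUE would mean: reject H0
```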

example 2

statement/question: “Oil exploration should be allowed in coastal California…”

                    age <= 30   age > 30
strongly disagree       8          7
mildly disagree         5          9
disagree                6          6
no opinion              0          1
agree                   2          2
mildly agree            1          3
strongly agree          2          3

example 3

• survey data: 100 sites

• broken down by location and time:

           early   late   Total
piedmont     31     19      50
plain        19     31      50
Total        50     50     100

• we can do a chi-square test of independence of the two variables time and location

• H0: time & location are independent

• alpha = .05

[Figure: diagrams relating time and location under H0 and under H1]

• chi-square values reflect accumulated differences between observed and expected cell-counts

• expected cell counts are based on the assumptions inherent in the null hypothesis

• if the H0 is correct, cell values should reflect an “even” distribution of marginal totals

           early   late   Total
piedmont     25     25      50
plain        25     25      50
Total        50     50     100

• chi-square = Σ((o-e)^2/e)

• observed chi-square = 4.84 (with Yates’ continuity correction for a 2 x 2 table; 5.76 without it)

• we need to compare it to the “critical value” in a chi-square table:

• chi-square = Σ((o-e)^2/e)

• observed chi-square = 4.84

• chi-square table: critical value (alpha = .05, 1 df) is 3.84; observed chi-square (4.84) > 3.84

• we can reject H0

• H1: time & location are not independent
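A sketch of the same test in R, with illustrative names. For 2 x 2 tables `chisq.test()` applies Yates' continuity correction by default, which is what yields 4.84 here. With `correct = FALSE` the plain sum of (o-e)^2/e is 5.76. Either value exceeds the critical value.

```r
sites <- matrix(c(31, 19,
                  19, 31), nrow = 2, byrow = TRUE,
                dimnames = list(location = c("piedmont", "plain"),
                                period   = c("early", "late")))

corrected   <- chisq.test(sites)                   # Yates-corrected (default)
uncorrected <- chisq.test(sites, correct = FALSE)  # plain sum of (o-e)^2/e

critical <- qchisq(0.95, df = 1)                   # alpha = .05, 1 df

x2_corrected   <- unname(corrected$statistic)
x2_uncorrected <- unname(uncorrected$statistic)

round(c(corrected = x2_corrected, uncorrected = x2_uncorrected,
        critical = critical), 2)
```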


• what does this mean?

  early late Total

piedmont 31 19 50

plain 19 31 50

Total 50 50 100