the chi-square test for association

1

The Chi-Square Test for AssociationMath 137

Fall‘13L. Burger

-The Chi Square Test• A statistical method used to determine goodness

of fit– Goodness of fit refers to how close the observed data

are to those predicted from a hypothesis• Note:– The chi square test does not prove that a hypothesis is

correct• It evaluates to what extent the data and the hypothesis have

a good fit

3

-Determine The Hypothesis:Whether There is an Association or Not

• Ho : The two variables are independent (not matching up well!)

• Ha : The two variables are associated (variables matching good --- called a ‘good fit!’)

Understanding the Chi-Square Distribution

• You see that there is a range here: if the results were perfect you get a chi-square value of 0 (because obs = exp). This rarely happens: most experiments give a small chi-square value (the hump in the graph).

• Note that all the values are greater than 0: that's because we squared the (obs - exp) term: squaring always gives a non-negative number.

• Sometimes you get really wild results, with obs very different from exp: the long tail on the graph. Really odd things occasionally do happen by chance alone (for instance, you might win the lottery).

Critical Chi-Square• Critical values for chi-square are found on tables,

sorted by degrees of freedom and probability levels. Be sure to use p = 0.05.

• If your calculated chi-square value is greater than the critical value from the table, you “reject the null hypothesis”.

• If your chi-square value is less than the critical value, you “fail to reject” the null hypothesis (that is, you accept that your theory about the expected ratio is correct).

6

-Calculating Test Statistics• Contrasts observed frequencies in each cell of a

contingency table with expected frequencies.• The expected frequencies represent the number of

cases that would be found in each cell if the null hypothesis were true ( i.e. the nominal variables are unrelated).

• Expected frequency of two unrelated events is product of the row and column frequency divided by number of cases.

Fe= Fr Fc / N

7

-Calculating Test Statistics

e

eo

FFF 2

2 )(

8

4. Calculating Test Statistics

e

eo

FFF 2

2 )(

Observedfrequencies

Expe

cted

frequ

ency

Expected

frequency

9

-Determine Degrees of Freedom

df = (R-1)(C-1)

Num

ber of

levels in column

variable

Num

ber of levels in row

variable

10

-Compare computed test statistic against a tabled/critical value

• The computed value of the Pearson chi- square statistic is compared with the critical value to determine if the computed value is improbable

• The critical tabled values are based on sampling distributions of the Pearson chi-square statistic

• If calculated 2 is greater than 2 table value, reject Ho

Example 1: One-dimensional

• Suppose we want to know how people in a particular area will vote in general and go around asking them.

• How will we go about seeing what’s really going on?

Republican Democrat Other

20 30 10

-Determine Hypotheses

• Ho : There is no difference between what is observed and what is expected.

• : There is an association between the observed and expected frequencies.

• Solution: chi-square analysis to determine if our outcome is different from what would be expected if there was no preference

22 ( )O E

E

• Plug in to formula

Republican Democrat Other

Observed 20 30 10Expected 20 20 20

2 2 2(20 20) (30 20) (10 20)20 20 20

• Reject H0

• The district will probably vote democratic, there is association between what is observed and what is expected.

• However…

2

2.05

(2) 10

5.99

Conclusion of Example 1• Note that all we really can conclude is that our data

is different from the expected outcome given a situation– Although it would appear that the district will vote

democratic, really we can only conclude they were not responding by chance

– Regardless of the position of the frequencies we’d have come up with the same result

– In other words, it is a non-directional test regardless of the prediction

16

Example 2-Two Dimensional

• Suppose a researcher is interested in voting preferences on gun control issues.

• A questionnaire was developed and sent to a random sample of 90 voters.

• The researcher also collects information about the political party membership of the sample of 90 respondents.

17

Bivariate Frequency Table or Contingency Table

Favor Neutral Oppose f row

Democrat 10 10 30 50

Republican 15 15 10 40

f column 25 25 40 n = 90

18





f column 25 25 40 n = 90

Observe

d

frequ

encie

s

19





f column 25 25 40 n = 90

Row

frequency

20





f column 25 25 40 n = 90Column frequency

21

-Determine The Hypothesis

• Ho : There is no difference between D & R in their opinion on gun control issue.

• Ha : There is an association between responses to the gun control survey and the party membership in the population.

22



Democrat fo =10fe =13.9

fo =10fe =13.9

fo =30fe=22.2

50

Republican fo =15fe =11.1

fo =15fe =11.1

fo =10fe =17.8

40

f column 25 25 40 n = 90

23




fo =10fe =13.9

fo =30fe=22.2

50


fo =15fe =11.1

fo =10fe =17.8

40

f column 25 25 40 n = 90

= 50*25/90

24




fo =10fe =13.9

fo =30fe=22.2

50


fo =15fe =11.1

fo =10fe =17.8

40

f column 25 25 40 n = 90

= 40* 25/90

25


8.17)8.1710(

11.11)11.1115(

11.11)11.1115(

2.22)2.2230(

89.13)89.1310(

89.13)89.1310(

222

2222

= 11.03

26

-Determine Degrees of Freedom

df = (R-1)(C-1) =(2-1)(3-1) = 2

27

-Compare computed test statistic against a tabled/critical value

• α = 0.05• df = 2• Critical tabled value = 5.991• Test statistic, 11.03, exceeds critical value• Null hypothesis is rejected

28

-Conclusion of Example 2

-Democrats & Republicans differ significantly in their opinions on gun control issues

• Ho : There is no difference between D & R in their opinion on gun control issue.

• Ha : There is an association between responses to the gun control survey and the party membership in the population.

the chi-square test for association

Documents

chisquare test

calculated chisquare

table value

small chisquare value

chisquare distributionyou

null hypothesis

expected frequencies

computed test statistic