the chi-square test for association
DESCRIPTION
The Chi-Square Test for Association. Math 137 Fall‘13 L. Burger. -The Chi Square Test. A statistical method used to determine goodness of fit Goodness of fit refers to how close the observed data are to those predicted from a hypothesis Note: - PowerPoint PPT PresentationTRANSCRIPT
1
The Chi-Square Test for AssociationMath 137
Fall‘13L. Burger
-The Chi Square Test• A statistical method used to determine goodness
of fit– Goodness of fit refers to how close the observed data
are to those predicted from a hypothesis• Note:– The chi square test does not prove that a hypothesis is
correct• It evaluates to what extent the data and the hypothesis have
a good fit
3
-Determine The Hypothesis:Whether There is an Association or Not
• Ho : The two variables are independent (not matching up well!)
• Ha : The two variables are associated (variables matching good --- called a ‘good fit!’)
Understanding the Chi-Square Distribution
• You see that there is a range here: if the results were perfect you get a chi-square value of 0 (because obs = exp). This rarely happens: most experiments give a small chi-square value (the hump in the graph).
• Note that all the values are greater than 0: that's because we squared the (obs - exp) term: squaring always gives a non-negative number.
• Sometimes you get really wild results, with obs very different from exp: the long tail on the graph. Really odd things occasionally do happen by chance alone (for instance, you might win the lottery).
Critical Chi-Square• Critical values for chi-square are found on tables,
sorted by degrees of freedom and probability levels. Be sure to use p = 0.05.
• If your calculated chi-square value is greater than the critical value from the table, you “reject the null hypothesis”.
• If your chi-square value is less than the critical value, you “fail to reject” the null hypothesis (that is, you accept that your theory about the expected ratio is correct).
6
-Calculating Test Statistics• Contrasts observed frequencies in each cell of a
contingency table with expected frequencies.• The expected frequencies represent the number of
cases that would be found in each cell if the null hypothesis were true ( i.e. the nominal variables are unrelated).
• Expected frequency of two unrelated events is product of the row and column frequency divided by number of cases.
Fe= Fr Fc / N
7
-Calculating Test Statistics
e
eo
FFF 2
2 )(
8
4. Calculating Test Statistics
e
eo
FFF 2
2 )(
Observedfrequencies
Expe
cted
frequ
ency
Expected
frequency
9
-Determine Degrees of Freedom
df = (R-1)(C-1)
Num
ber of
levels in column
variable
Num
ber of levels in row
variable
10
-Compare computed test statistic against a tabled/critical value
• The computed value of the Pearson chi- square statistic is compared with the critical value to determine if the computed value is improbable
• The critical tabled values are based on sampling distributions of the Pearson chi-square statistic
• If calculated 2 is greater than 2 table value, reject Ho
Example 1: One-dimensional
• Suppose we want to know how people in a particular area will vote in general and go around asking them.
• How will we go about seeing what’s really going on?
Republican Democrat Other
20 30 10
-Determine Hypotheses
• Ho : There is no difference between what is observed and what is expected.
• : There is an association between the observed and expected frequencies.
• Solution: chi-square analysis to determine if our outcome is different from what would be expected if there was no preference
22 ( )O E
E
• Plug in to formula
Republican Democrat Other
Observed 20 30 10Expected 20 20 20
2 2 2(20 20) (30 20) (10 20)20 20 20
• Reject H0
• The district will probably vote democratic, there is association between what is observed and what is expected.
• However…
2
2.05
(2) 10
5.99
Conclusion of Example 1• Note that all we really can conclude is that our data
is different from the expected outcome given a situation– Although it would appear that the district will vote
democratic, really we can only conclude they were not responding by chance
– Regardless of the position of the frequencies we’d have come up with the same result
– In other words, it is a non-directional test regardless of the prediction
16
Example 2-Two Dimensional
• Suppose a researcher is interested in voting preferences on gun control issues.
• A questionnaire was developed and sent to a random sample of 90 voters.
• The researcher also collects information about the political party membership of the sample of 90 respondents.
17
Bivariate Frequency Table or Contingency Table
Favor Neutral Oppose f row
Democrat 10 10 30 50
Republican 15 15 10 40
f column 25 25 40 n = 90
18
Bivariate Frequency Table or Contingency Table
Favor Neutral Oppose f row
Democrat 10 10 30 50
Republican 15 15 10 40
f column 25 25 40 n = 90
Observe
d
frequ
encie
s
19
Bivariate Frequency Table or Contingency Table
Favor Neutral Oppose f row
Democrat 10 10 30 50
Republican 15 15 10 40
f column 25 25 40 n = 90
Row
frequency
20
Bivariate Frequency Table or Contingency Table
Favor Neutral Oppose f row
Democrat 10 10 30 50
Republican 15 15 10 40
f column 25 25 40 n = 90Column frequency
21
-Determine The Hypothesis
• Ho : There is no difference between D & R in their opinion on gun control issue.
• Ha : There is an association between responses to the gun control survey and the party membership in the population.
22
-Calculating Test Statistics
Favor Neutral Oppose f row
Democrat fo =10fe =13.9
fo =10fe =13.9
fo =30fe=22.2
50
Republican fo =15fe =11.1
fo =15fe =11.1
fo =10fe =17.8
40
f column 25 25 40 n = 90
23
-Calculating Test Statistics
Favor Neutral Oppose f row
Democrat fo =10fe =13.9
fo =10fe =13.9
fo =30fe=22.2
50
Republican fo =15fe =11.1
fo =15fe =11.1
fo =10fe =17.8
40
f column 25 25 40 n = 90
= 50*25/90
24
-Calculating Test Statistics
Favor Neutral Oppose f row
Democrat fo =10fe =13.9
fo =10fe =13.9
fo =30fe=22.2
50
Republican fo =15fe =11.1
fo =15fe =11.1
fo =10fe =17.8
40
f column 25 25 40 n = 90
= 40* 25/90
25
-Calculating Test Statistics
8.17)8.1710(
11.11)11.1115(
11.11)11.1115(
2.22)2.2230(
89.13)89.1310(
89.13)89.1310(
222
2222
= 11.03
26
-Determine Degrees of Freedom
df = (R-1)(C-1) =(2-1)(3-1) = 2
27
-Compare computed test statistic against a tabled/critical value
• α = 0.05• df = 2• Critical tabled value = 5.991• Test statistic, 11.03, exceeds critical value• Null hypothesis is rejected
28
-Conclusion of Example 2
-Democrats & Republicans differ significantly in their opinions on gun control issues
• Ho : There is no difference between D & R in their opinion on gun control issue.
• Ha : There is an association between responses to the gun control survey and the party membership in the population.