MATH& 146
Lesson 25
Section 3.3
The Goodness of Fit Test
Goodness of Fit Test
Consider the following questions
• Given a sample of cases that can be classified into
several groups, determine if the sample is
representative of the general population.
• Evaluate whether data resemble a particular
distribution, such as a normal distribution or a
skewed distribution.
Each of these scenarios can be addressed using the
same statistical test: the chi-square goodness of fit
test.
The Hypotheses
In the goodness of fit test, the counts in each bin of
the observed data are compared to the counts in
each bin of some expected distribution. That is,
H0: The data is behaving according to the
expected distribution.
HA: The data is not behaving according to the
expected distribution.
The Hypotheses
In statistical notation, the hypotheses are
\[
\begin{aligned}
H_0 &: p_{\text{category 1}} = p_1,\; p_{\text{category 2}} = p_2,\; \ldots,\; p_{\text{category } k} = p_k \\
H_A &: \text{At least one proportion is different than expected.}
\end{aligned}
\]
Expected Value
To find the expected value, we multiply the
probability by the number of trials.
That is, E = np.
For example, if we flip a coin 80 times, we would
expect 40 heads and 40 tails.
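The rule E = np can be sketched in a few lines of Python. This is only an illustration: the function name `expected_counts` is an assumption made for this sketch, not something from the lesson.

```python
def expected_counts(n, probabilities):
    """Return the expected count E = n * p for each category."""
    return [n * p for p in probabilities]


# Coin example from the lesson: 80 flips, heads and tails equally likely.
coin = expected_counts(80, [0.5, 0.5])
print(coin)  # [40.0, 40.0]
```

The same call works for any number of categories, as long as the probabilities add up to 1.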
Example 1
A bag of Hershey's Miniatures was opened and the
number of each candy was counted. If there were
132 candies total, then how many of each of the
four candies should you expect?
Brand            Observed   Expected
Krackel          33
Mr. Goodbar      32
Milk Chocolate   37
Dark Chocolate   30
Total            132        132
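As a worked step, and assuming the null model is that all four candies are equally likely (the uniform hypothesis is only stated explicitly in Example 3), the rule E = np gives:

\[
E = np = 132 \times \tfrac{1}{4} = 33 \quad \text{for each of the four candies.}
\]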
Chi-Square Test for One-Way Table
Suppose we are to evaluate whether there is
convincing evidence that a set of observed counts
O1, O2, ..., Ok in k categories are unusually
different from what might be expected under a null
hypothesis.
Call the expected counts that are based on the null
hypothesis E1, E2, ..., Ek.
Chi-Square Test for One-Way Table
If each expected count is at least 5 and the null
hypothesis is true, then the test statistic below
follows a chi-square distribution with k – 1 degrees
of freedom:
\[
\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \cdots + \frac{(O_k - E_k)^2}{E_k}
\]
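A minimal sketch of this test statistic in Python, applied to the candy counts from Example 1. The function name `chi_square_statistic` is an assumption for illustration, and the expected counts assume each of the four candies is equally likely.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Candy counts from Example 1: Krackel, Mr. Goodbar, Milk Chocolate, Dark Chocolate.
observed = [33, 32, 37, 30]
# Assumption: all four candies equally likely, so E = 132 / 4 = 33 each.
expected = [33, 33, 33, 33]

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 4))  # 0.7879
```

Adding up terms that were each rounded to four decimal places first gives 0.7878 instead; the small difference is rounding only.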
Example 2
Calculate the chi-square test statistic.
Brand            Observed   Expected
Krackel          33         33
Mr. Goodbar      32         33
Milk Chocolate   37         33
Dark Chocolate   30         33
Total            132        132
Conditions
There are three conditions that must be checked
before performing a chi-square test:
1) Independence: Each case that contributes a
count to the table must be independent of all the
other cases in the table.
2) Sample size / distribution: Each particular
scenario (i.e. cell count) must have at least five (5)
expected cases.
3) Number of groups: The proportions of at least
three (3) groups are being tested.
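Two of these conditions can be checked mechanically from the expected counts; independence depends on how the data were collected and has to be judged from the study design. The sketch below is illustrative only, and the function name `check_conditions` and its parameters are assumptions.

```python
def check_conditions(expected_counts, min_expected=5, min_groups=3):
    """Return a list of problems found; an empty list means both checks pass.

    Independence is NOT checked here; it cannot be verified from counts alone.
    """
    problems = []
    if len(expected_counts) < min_groups:
        problems.append(
            f"only {len(expected_counts)} groups; need at least {min_groups}"
        )
    for i, e in enumerate(expected_counts, start=1):
        if e < min_expected:
            problems.append(
                f"group {i} has expected count {e}, below {min_expected}"
            )
    return problems


print(check_conditions([33, 33, 33, 33]))  # []
```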
Chi-Square Test for One-Way Table
The p-value for this test statistic is the area of the upper tail of this
chi-square distribution, which can be found with the χ²cdf command. We
consider the upper tail because larger values of χ² would provide greater
evidence against the null hypothesis.
df stands for the degrees of freedom and, for this
test, is the number of groups minus one: df = k – 1.
p-value = χ²cdf(χ² test statistic, BIG, df)
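For readers without a calculator at hand, the same upper-tail area can be approximated in plain Python by integrating the chi-square density from the test statistic up to a large upper bound, which mirrors the role of BIG above. This is a rough numerical sketch, not the calculator's algorithm; the names `chi2_pdf`, `chi2_upper_tail`, `width`, and `steps` are assumptions, and the fixed integration width is only adequate for modest degrees of freedom.

```python
import math


def chi2_pdf(x, df):
    """Density of the chi-square distribution with df degrees of freedom (x > 0)."""
    half = df / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2 ** half * math.gamma(half))


def chi2_upper_tail(statistic, df, width=200.0, steps=40_000):
    """Approximate the area to the right of `statistic` with Simpson's rule.

    The upper limit statistic + width plays the role of BIG.
    `steps` must be even for Simpson's rule.
    """
    if statistic <= 0:
        return 1.0  # the whole distribution lies to the right of zero
    upper = statistic + width
    h = (upper - statistic) / steps
    total = chi2_pdf(statistic, df) + chi2_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * chi2_pdf(statistic + i * h, df)
    return total * h / 3


# 7.815 is the familiar 5% critical value for 3 degrees of freedom.
print(round(chi2_upper_tail(7.815, 3), 3))  # 0.05
```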
Example 3
a) State the hypotheses for testing if the four
candies are uniformly distributed in the bag.
b) Check the conditions and calculate p-value.
Based on the p-value, what is your conclusion?
Brand            Observed   Expected   (O – E)²/E
Krackel          33         33         0.0000
Mr. Goodbar      32         33         0.0303
Milk Chocolate   37         33         0.4848
Dark Chocolate   30         33         0.2727
Total            132        132        0.7878
Example 4
A professor using an open source introductory
statistics book predicts that 60% of the students will
purchase a hard copy of the book, 25% will print it out
from the web, and 15% will read it online. At the end
of the semester he asks his students to complete a
survey where they indicate what format of the book
they used.
Of the 126 students, 71 said they bought a hard copy
of the book, 30 said they printed it out from the web,
and 25 said they read it online.
Example 4 continued
a) State the hypotheses for testing if the
professor's predictions were inaccurate.
b) How many students did the professor expect to
buy the book, print the book, and read the book
exclusively online?
Format                 Observed   Expected
purchase a hard copy   71
print it out           30
read it online         25
Total                  126
Example 4 continued
c) This is an appropriate setting for a chi-square
test. List the conditions required for a test and
verify they are satisfied.
d) Calculate the chi-squared statistic, the degrees
of freedom associated with it, and the p-value.
Format                 Observed   Expected
purchase a hard copy   71         75.6
print it out           30         31.5
read it online         25         18.9
Total                  126        126
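The calculation for this example can be sketched as follows. The variable names are assumptions made for illustration. Because there are three groups, df = 2, and for that special case the upper-tail area of the chi-square distribution has the simple closed form exp(-x/2); the shortcut does not apply to other degrees of freedom.

```python
import math

# Professor's predicted proportions and the observed survey counts.
predicted = {"purchase a hard copy": 0.60, "print it out": 0.25, "read it online": 0.15}
observed = {"purchase a hard copy": 71, "print it out": 30, "read it online": 25}

n = sum(observed.values())  # 126 students
expected = {fmt: n * p for fmt, p in predicted.items()}  # E = np per format

statistic = sum(
    (observed[fmt] - expected[fmt]) ** 2 / expected[fmt] for fmt in observed
)
df = len(observed) - 1  # 3 groups, so df = 2

# Valid for df = 2 only: upper-tail area is exp(-x / 2).
p_value = math.exp(-statistic / 2)

print(round(statistic, 2), df, round(p_value, 2))  # 2.32 2 0.31
```

With a p-value of roughly 0.31, which is well above a conventional 0.05 significance level (an assumed threshold; the lesson does not fix one), these data would not give convincing evidence that the professor's predictions were inaccurate.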
Example 4 continued
e) Based on the p-value calculated in part (d),
what is the conclusion of the hypothesis test?
Interpret your conclusion in this context.
Example 5
Absenteeism of college students from math classes is
a major concern to math instructors because missing
class appears to increase the drop rate. Three
statistics instructors wondered whether the absentee
rate was the same for every day of the school week.
Example 5 continued
They took a sample of absent students from three of
their statistics classes during one week of the term.
The results of the survey appear in the table.
Run a goodness-of-fit test to determine if the absentee
rate is the same throughout the week.
Day         Absences Observed   Absences Expected
Monday      26
Tuesday     19
Wednesday   17
Thursday    18
Friday      40
Total       120                 120
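One way to carry out this test in Python is sketched below. The variable names are assumptions. With five days, df = 4, and for an even number of degrees of freedom the upper-tail area has a closed form: exp(-x/2) times the sum of (x/2)^i / i! for i from 0 to df/2 - 1. That formula is a standard property of the chi-square distribution, used here in place of the calculator command.

```python
import math

observed = {"Monday": 26, "Tuesday": 19, "Wednesday": 17, "Thursday": 18, "Friday": 40}

n = sum(observed.values())          # 120 absences in total
expected_each = n / len(observed)   # same rate every day: 120 / 5 = 24

statistic = sum((o - expected_each) ** 2 / expected_each for o in observed.values())
df = len(observed) - 1              # 5 days, so df = 4

# Closed form for EVEN df only.
half = statistic / 2
p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

print(round(statistic, 2), df, round(p_value, 3))  # 15.42 4 0.004
```

A p-value of roughly 0.004 is small, so under a conventional 0.05 significance level (an assumed threshold) the data would suggest that the absentee rate is not the same on every day of the week.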
Example 6
Many people know the mathematical
constant π is approximately 3.14. But
that’s not exact. To be more precise,
here are 20 decimal places:
3.14159265358979323846. Still not
exact, though.
In fact, the actual value is irrational, a
decimal that goes on forever without
any repeating pattern.
Digit Count
0 99,959
1 99,758
2 100,026
3 100,229
4 100,230
5 100,359
6 99,548
7 99,800
8 99,985
9 100,106
Example 6 continued
The table shows the number of times
each digit appears in the first million
digits.
Test the hypothesis that the digits 0
through 9 are uniformly distributed in
the decimal representation of π.
(To help you out, most of the
calculations can be found on the next
slide.)
Example 6 continued
Digit   Observed    Expected    (O – E)²/E
0       99,959      100,000     0.0168
1       99,758      100,000     0.5856
2       100,026     100,000     0.0068
3       100,229     100,000     0.5244
4       100,230     100,000     0.5290
5       100,359     100,000     1.2888
6       99,548      100,000     2.0430
7       99,800      100,000     0.4000
8       99,985      100,000     0.0023
9       100,106     100,000     0.1124
Total   1,000,000   1,000,000   5.5091
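The remaining step, turning the statistic into a p-value with df = 10 - 1 = 9, can be sketched in Python without any external library. The helper `chi2_upper_tail` below is an assumption for illustration; it builds the upper-tail area from a known starting value (df = 1 or df = 2) and a standard recurrence that adds (x/2)^(n/2) e^(-x/2) / Γ(n/2 + 1) each time the degrees of freedom increase by 2.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail area of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        tail, n = math.exp(-half), 2              # exact tail for df = 2
    else:
        tail, n = math.erfc(math.sqrt(half)), 1   # exact tail for df = 1
    while n < df:
        # Moving from n to n + 2 degrees of freedom adds this term.
        tail += half ** (n / 2) * math.exp(-half) / math.gamma(n / 2 + 1)
        n += 2
    return tail


# Digit counts 0 through 9 from the table (first million digits).
counts = [99959, 99758, 100026, 100229, 100230, 100359, 99548, 99800, 99985, 100106]
expected = sum(counts) / len(counts)  # uniform digits: 100,000 each

statistic = sum((o - expected) ** 2 / expected for o in counts)
df = len(counts) - 1
p_value = chi2_upper_tail(statistic, df)

print(round(statistic, 4), df, round(p_value, 2))
```

The statistic matches the table total of 5.5091, and the p-value comes out at roughly 0.79. That is far above a conventional 0.05 threshold (an assumed level), so these counts would give no evidence against the digits being uniformly distributed.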