MATH& 146 Lesson 25 Section 3.3 The Goodness of Fit Test 1


Post on 01-Jun-2020


MATH& 146

Lesson 25

Section 3.3

The Goodness of Fit Test


Goodness of Fit Test

Consider the following questions:

• Given a sample of cases that can be classified into several groups, determine if the sample is representative of the general population.

• Evaluate whether data resemble a particular distribution, such as a normal distribution or a skewed distribution.

Each of these scenarios can be addressed using the same statistical test: the chi-square goodness of fit test.

The Hypotheses

In the goodness of fit test, the counts in each bin of the observed data are compared to the counts in each bin of some expected distribution. That is,

H0: The data are behaving according to the expected distribution.

HA: The data are not behaving according to the expected distribution.

The Hypotheses

In statistical notation, the hypotheses are

$$
\begin{aligned}
H_0 &: \; p_{\text{category } 1} = p_1, \quad p_{\text{category } 2} = p_2, \quad \ldots, \quad p_{\text{category } k} = p_k \\
H_A &: \; \text{At least one proportion is different than expected.}
\end{aligned}
$$

Expected Value

To find the expected value, we multiply the probability by the number of trials.

That is, E = np, where n is the number of trials and p is the probability.

For example, if we flip a fair coin 80 times, we would expect 40 heads and 40 tails.
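The rule E = np can be applied one category at a time. The short Python sketch below is only an illustration; the helper name `expected_counts` is an assumption introduced here and is not part of the lesson.

```python
def expected_counts(n, probabilities):
    """Return the expected count E = n * p for each category."""
    return [n * p for p in probabilities]

# 80 flips of a fair coin: heads and tails each have probability 0.5.
coin = expected_counts(80, [0.5, 0.5])
print(coin)  # [40.0, 40.0]
```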


Example 1

A bag of Hershey's Miniatures was opened and the number of each candy was counted. If there were 132 candies total, then how many of each of the four candies should you expect?

Brand            Observed   Expected
Krackel          33
Mr. Goodbar      32
Milk Chocolate   37
Dark Chocolate   30
Total            132        132

Chi-Square Test for One-Way Table

Suppose we are to evaluate whether there is convincing evidence that a set of observed counts $O_1, O_2, \ldots, O_k$ in k categories are unusually different from what might be expected under a null hypothesis.

Call the expected counts that are based on the null hypothesis $E_1, E_2, \ldots, E_k$.

Chi-Square Test for One-Way Table

If each expected count is at least 5 and the null hypothesis is true, then the test statistic below follows a chi-square distribution with k – 1 degrees of freedom:

$$
\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \cdots + \frac{(O_k - E_k)^2}{E_k}
$$

Example 2

Calculate the chi-square test statistic.

Brand            Observed   Expected
Krackel          33         33
Mr. Goodbar      32         33
Milk Chocolate   37         33
Dark Chocolate   30         33
Total            132        132
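One way to carry out this calculation is sketched below in Python. The function name `chi_square_statistic` is an assumption made for this illustration; it simply adds up (O – E)²/E over the four brands in the table.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [33, 32, 37, 30]   # Krackel, Mr. Goodbar, Milk Chocolate, Dark Chocolate
expected = [33, 33, 33, 33]   # 132 candies spread evenly over 4 brands

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 4))  # 0.7879
```

The unrounded total is 26/33 ≈ 0.7879. If each contribution is rounded to four decimal places before adding, the total comes out as 0.7878 instead.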


Conditions

There are three conditions that must be checked before performing a chi-square test:

1) Independence: Each case that contributes a count to the table must be independent of all the other cases in the table.

2) Sample size / distribution: Each particular scenario (i.e., cell count) must have at least five (5) expected cases.

3) Number of groups: The proportions of at least three (3) groups are being tested.
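Conditions 2 and 3 can be checked directly from the expected counts. The sketch below is a hypothetical helper (the name `check_count_conditions` and its parameters are assumptions); independence depends on how the data were collected, so it has to be judged from the study design rather than from code.

```python
def check_count_conditions(expected, min_expected=5, min_groups=3):
    """Check the sample-size and number-of-groups conditions.

    Independence cannot be verified from the counts alone,
    so it is deliberately not checked here.
    """
    enough_per_cell = all(e >= min_expected for e in expected)
    enough_groups = len(expected) >= min_groups
    return enough_per_cell and enough_groups

print(check_count_conditions([33, 33, 33, 33]))  # True: four groups, 33 expected in each
print(check_count_conditions([40, 40]))          # False: only two groups
```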


Chi-Square Test for One-Way Table

The p-value for this test statistic can be found with the χ²cdf command, which gives the area of the upper tail of this chi-square distribution. We consider the upper tail because larger values of χ² would provide greater evidence against the null hypothesis.

p-value = χ²cdf(χ² test statistic, BIG, df)

Here BIG stands for a very large number used as the upper bound. df stands for the degrees of freedom and, for this test, is the number of groups minus one: df = k – 1.
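For readers without a calculator at hand, the same upper-tail area can be sketched with Python's standard library. This is an illustration under assumptions, not the calculator's own algorithm: the function name `chi2_upper_tail` is made up here, and it relies on a standard recurrence for chi-square tail areas that works only for whole-number degrees of freedom.

```python
import math

def chi2_upper_tail(x, df):
    """Area to the right of x under a chi-square curve with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-y)              # exact tail for df = 2
    else:
        a, tail = 0.5, math.erfc(math.sqrt(y))   # exact tail for df = 1
    # Each pass through the loop raises the degrees of freedom by 2.
    while a < df / 2.0:
        tail += math.exp(-y) * y ** a / math.gamma(a + 1.0)
        a += 1.0
    return tail

# Candy data: test statistic of about 0.7878 with df = 4 - 1 = 3.
p_value = chi2_upper_tail(0.7878, 3)
print(round(p_value, 2))  # 0.85
```

Because the function already measures the area from the statistic outward, it plays the role of χ²cdf with BIG as the upper bound.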


Example 3

a) State the hypotheses for testing if the four candies are uniformly distributed in the bag.

b) Check the conditions and calculate the p-value. Based on the p-value, what is your conclusion?

Brand            Observed   Expected   (O – E)²/E
Krackel          33         33         0.0000
Mr. Goodbar      32         33         0.0303
Milk Chocolate   37         33         0.4848
Dark Chocolate   30         33         0.2727
Total            132        132        0.7878

Example 4

A professor using an open source introductory statistics book predicts that 60% of the students will purchase a hard copy of the book, 25% will print it out from the web, and 15% will read it online. At the end of the semester he asks his students to complete a survey where they indicate what format of the book they used.

Of the 126 students, 71 said they bought a hard copy of the book, 30 said they printed it out from the web, and 25 said they read it online.

Example 4 continued

a) State the hypotheses for testing if the professor's predictions were inaccurate.

b) How many students did the professor expect to buy the book, print the book, and read the book exclusively online?

Format                 Observed   Expected
purchase a hard copy   71
print it out           30
read it online         25
Total                  126

Example 4 continued

c) This is an appropriate setting for a chi-square test. List the conditions required for a test and verify they are satisfied.

d) Calculate the chi-squared statistic, the degrees of freedom associated with it, and the p-value.

Format                 Observed   Expected
purchase a hard copy   71         75.6
print it out           30         31.5
read it online         25         18.9
Total                  126        126
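The expected counts in this table come from multiplying 126 by the predicted percentages. The Python sketch below reproduces them and then works through part (d); the variable names are assumptions chosen for this illustration. It uses the fact that, with exactly 2 degrees of freedom, the upper-tail area of a chi-square distribution equals exp(–χ²/2), so no calculator command is needed in this special case.

```python
import math

n = 126
predicted = {"purchase a hard copy": 0.60, "print it out": 0.25, "read it online": 0.15}
observed = {"purchase a hard copy": 71, "print it out": 30, "read it online": 25}

# Expected count for each format: E = n * p.
expected = {fmt: n * p for fmt, p in predicted.items()}

statistic = sum((observed[f] - expected[f]) ** 2 / expected[f] for f in observed)
df = len(observed) - 1

# Closed form that holds only when df == 2.
p_value = math.exp(-statistic / 2)

print({f: round(e, 1) for f, e in expected.items()})
print(round(statistic, 2), df, round(p_value, 2))  # 2.32 2 0.31
```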


Example 4 continued

e) Based on the p-value calculated in part (d), what is the conclusion of the hypothesis test? Interpret your conclusion in this context.

Example 5

Absenteeism of college students from math classes is a major concern to math instructors because missing class appears to increase the drop rate. Three statistics instructors wondered whether the absentee rate was the same for every day of the school week.

Example 5 continued

They took a sample of absent students from three of their statistics classes during one week of the term. The results of the survey appear in the table.

Run a goodness-of-fit test to determine if the absentee rate is the same throughout the week.

Day         Absences Observed   Absences Expected
Monday      26
Tuesday     19
Wednesday   17
Thursday    18
Friday      40
Total       120                 120
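As a worked sketch, assume "the same rate every day" means the 120 absences are spread evenly over the five days. Then the expected count and the test statistic are

$$
E = \frac{120}{5} = 24 \ \text{per day}, \qquad
\chi^2 = \frac{(26-24)^2 + (19-24)^2 + (17-24)^2 + (18-24)^2 + (40-24)^2}{24} = \frac{370}{24} \approx 15.42,
$$

with $df = 5 - 1 = 4$. The upper-tail area beyond 15.42 with 4 degrees of freedom is roughly 0.004, a small p-value that points toward absences not being spread evenly across the week.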


Example 6

Many people know the mathematical constant π is approximately 3.14. But that’s not exact. To be more precise, here are 20 decimal places: 3.14159265358979323846. Still not exact, though.

In fact, the actual value is irrational, a decimal that goes on forever without any repeating pattern.

Example 6 continued

The table shows the number of times each digit appears in the first million digits.

Test the hypothesis that the digits 0 through 9 are uniformly distributed in the decimal representation of π.

(To help you out, most of the calculations can be found on the next slide.)

Digit   Count
0       99,959
1       99,758
2       100,026
3       100,229
4       100,230
5       100,359
6       99,548
7       99,800
8       99,985
9       100,106

Example 6 continued

Digit   Observed    Expected    (O – E)²/E
0       99,959      100,000     0.0168
1       99,758      100,000     0.5856
2       100,026     100,000     0.0068
3       100,229     100,000     0.5244
4       100,230     100,000     0.5290
5       100,359     100,000     1.2888
6       99,548      100,000     2.0430
7       99,800      100,000     0.4000
8       99,985      100,000     0.0023
9       100,106     100,000     0.1124
Total   1,000,000   1,000,000   5.5091
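The table's last column and total can be re-derived in a few lines of Python, and the remaining step, the p-value, can be approximated numerically. The sketch below is one possible approach under stated assumptions, not the lesson's method: the function names are made up here, and the upper-tail area is found by integrating the chi-square density from 0 to the statistic with composite Simpson's rule and subtracting from 1.

```python
import math

counts = [99959, 99758, 100026, 100229, 100230,
          100359, 99548, 99800, 99985, 100106]   # digits 0 through 9

n = sum(counts)
expected = n / len(counts)                        # uniform digits: 100,000 each
statistic = sum((o - expected) ** 2 / expected for o in counts)
df = len(counts) - 1

def chi2_density(t, df):
    """Chi-square probability density at t >= 0 (assumes df >= 3)."""
    half = df / 2.0
    return t ** (half - 1.0) * math.exp(-t / 2.0) / (2.0 ** half * math.gamma(half))

def upper_tail_by_simpson(x, df, steps=2000):
    """1 minus the area from 0 to x, via composite Simpson's rule (steps must be even)."""
    h = x / steps
    total = chi2_density(0.0, df) + chi2_density(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_density(i * h, df)
    return 1.0 - total * h / 3.0

p_value = upper_tail_by_simpson(statistic, df)
print(round(statistic, 4), df)  # 5.5091 9
print(round(p_value, 2))        # 0.79
```

A p-value this large gives no convincing evidence against uniformity, so the counts are consistent with the digits of π being uniformly distributed.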
