Chapter 16: Analysis of Categorical Data

Upload: fay-bishop

Post on 18-Jan-2018


DESCRIPTION

In Chapter 5, the binomial distribution was used to analyze experiments or trials that had only two possible outcomes. An extension of this problem is a multinomial distribution, in which more than two possible outcomes can occur. The χ² goodness-of-fit test is used to analyze probabilities of multinomial distribution trials along a single dimension. It compares the expected (theoretical) frequencies of categories from a population distribution with the observed (actual) frequencies to determine whether there is a difference between what was expected and what was observed.

TRANSCRIPT


Learning Objectives

LO1 Use the chi-square goodness-of-fit test to analyze probabilities of multinomial distribution trials along a single dimension.

LO2 Use the chi-square test of independence to perform contingency analysis.

χ² Goodness-of-Fit Test

• In Chapter 5, the binomial distribution was used to analyze experiments or trials that had only two possible outcomes.

• An extension of this problem is a multinomial distribution, in which more than two possible outcomes can occur.

• The χ² goodness-of-fit test is used to analyze probabilities of multinomial distribution trials along a single dimension.

• It compares the expected (theoretical) frequencies of categories from a population distribution with the observed (actual) frequencies to determine whether there is a difference between what was expected and what was observed.

Formulating the Test of Hypothesis

• Hypothesize
– Step 1: State the hypotheses

• Test
– Step 2: Identify the appropriate statistical test for the problem
– Step 3: Set the α value
– Step 4: Determine the degrees of freedom
– Step 5: Determine the expected frequencies
– Step 6: Calculate the observed value of chi-square

• Action
– Step 7: Make the decision to reject or not reject the null hypothesis

• Business Implication
– Use the information to answer research questions

Small Expected Values of a Category

• When the expected value of a category is small, a large chi-square value can be obtained erroneously, leading to a Type I error.

• To control for this potential error, the chi-square goodness-of-fit test should not be used when any of the expected frequencies is less than 5.

• If the observed data produce expected values of less than 5, it may be possible to combine adjacent categories (when meaningful) to create larger frequencies.
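The rule about combining adjacent categories can be sketched in Python. This is only an illustration under stated assumptions: the helper name `pool_small_categories`, the left-to-right merging strategy, and the counts are all made up here, and whether two categories can meaningfully be combined remains the analyst's call.

```python
def pool_small_categories(observed, expected, minimum=5.0):
    """Merge adjacent categories until every expected frequency >= minimum.

    Hypothetical helper: scans left to right, accumulating categories into a
    running group and closing the group once its expected count reaches the
    minimum. A leftover tail that is still too small is folded into the
    previous group.
    """
    pooled_obs, pooled_exp = [], []
    run_obs, run_exp = 0.0, 0.0
    for o, e in zip(observed, expected):
        run_obs += o
        run_exp += e
        if run_exp >= minimum:
            pooled_obs.append(run_obs)
            pooled_exp.append(run_exp)
            run_obs, run_exp = 0.0, 0.0
    if run_exp > 0:  # leftover tail below the minimum
        if pooled_exp:
            pooled_obs[-1] += run_obs
            pooled_exp[-1] += run_exp
        else:
            pooled_obs.append(run_obs)
            pooled_exp.append(run_exp)
    return pooled_obs, pooled_exp


# Made-up counts for illustration only (not taken from the slides).
obs = [12, 20, 9, 3, 1]
exp = [10.0, 22.0, 8.0, 3.5, 1.5]
new_obs, new_exp = pool_small_categories(obs, exp)
print(new_obs, new_exp)
```

Note that pooling reduces the number of categories k, so the degrees of freedom should be recomputed from the pooled table.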

χ² Goodness-of-Fit Test

• The test statistic for a chi-square goodness-of-fit test is computed as:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}, \qquad df = k - 1 - c$$

where $f_o$ = frequency of observed values, $f_e$ = frequency of expected values, $k$ = number of categories, and $c$ = number of parameters estimated from the sample data.
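As a rough illustration of the goodness-of-fit statistic, χ² = Σ (fo − fe)² / fe with df = k − 1 − c, here is a short Python sketch. The function names and the example counts are assumptions made for illustration; they do not come from the slides.

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def goodness_of_fit_df(k, c=0):
    """Degrees of freedom: k categories, c parameters estimated from the sample."""
    return k - 1 - c


# Hypothetical example: 60 observations expected to split evenly over 3 categories.
observed = [25, 15, 20]
expected = [20.0, 20.0, 20.0]
chi2 = chi_square_statistic(observed, expected)
df = goodness_of_fit_df(k=3)
print(chi2, df)
```

The statistic is then compared with a critical value from a chi-square table for the chosen α and the computed degrees of freedom.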

Milk Sales Data for Demonstration Problem 16.1

Month        Litres of Milk
January      1,610
February     1,585
March        1,649
April        1,590
May          1,540
June         1,397
July         1,410
August       1,350
September    1,495
October      1,564
November     1,602
December     1,655
TOTAL        18,447

Hypotheses and Decision Rules for Demonstration Problem 16.1

$H_0$: The monthly figures for milk sales are uniformly distributed.

$H_a$: The monthly figures for milk sales are not uniformly distributed.

$\alpha = 0.01$

$df = k - 1 - c = 12 - 1 - 0 = 11$

$\chi^2_{0.01,\,11} = 24.725$

If $\chi^2_{Cal} > 24.725$, reject $H_0$.

If $\chi^2_{Cal} \le 24.725$, do not reject $H_0$.

Calculations for Demonstration Problem 16.1

$f_e = \frac{18{,}447}{12} = 1{,}537.25$

Month        f_o       f_e         (f_o − f_e)² / f_e
January      1,610     1,537.25     3.44
February     1,585     1,537.25     1.48
March        1,649     1,537.25     8.12
April        1,590     1,537.25     1.81
May          1,540     1,537.25     0.00
June         1,397     1,537.25    12.80
July         1,410     1,537.25    10.53
August       1,350     1,537.25    22.81
September    1,495     1,537.25     1.16
October      1,564     1,537.25     0.47
November     1,602     1,537.25     2.73
December     1,655     1,537.25     9.02
Totals       18,447    18,447      74.37

$\chi^2_{Cal} = 74.37$
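The milk sales calculation can be reproduced with a few lines of Python. This is a sketch rather than part of the original slides; the variable names are illustrative, and the critical value 24.725 is copied from the decision rule instead of being computed. Because the code sums unrounded terms, its total may differ from the tabulated 74.37 in the second decimal place.

```python
# Monthly milk sales in litres, January through December (from the slide table).
sales = [1610, 1585, 1649, 1590, 1540, 1397, 1410, 1350, 1495, 1564, 1602, 1655]

total = sum(sales)                 # 18,447 litres
expected = total / len(sales)      # uniform null: same expected value each month

# Per-month contribution (fo - fe)^2 / fe, then the chi-square statistic.
contributions = [(fo - expected) ** 2 / expected for fo in sales]
chi2 = sum(contributions)

df = len(sales) - 1 - 0            # k - 1 - c, with no parameters estimated
critical = 24.725                  # table value for alpha = .01, df = 11 (from the slides)

reject_h0 = chi2 > critical
print(f"fe = {expected:.2f}, chi2 = {chi2:.2f}, df = {df}, reject H0: {reject_h0}")
```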

• The observed chi-square value of 74.37 is greater than the critical value of 24.725.

• The decision is to reject the null hypothesis. The data provide enough evidence to indicate that the distribution of milk sales is not uniform.

Bank Customer Arrival Data for Demonstration Problem 16.2

Number of Arrivals    Observed Frequencies
0                      7
1                     18
2                     25
3                     17
4                     12
≥5                     5

Hypotheses and Decision Rules for Demonstration Problem 16.2

$H_0$: The frequency distribution is Poisson.

$H_a$: The frequency distribution is not Poisson.

$\alpha = 0.05$

$df = k - 1 - c = 6 - 1 - 1 = 4$

$\chi^2_{0.05,\,4} = 9.4877$

If $\chi^2_{Cal} > 9.4877$, reject $H_0$.

If $\chi^2_{Cal} \le 9.4877$, do not reject $H_0$.

Calculations for Demonstration Problem 16.2: Estimating the Mean Arrival Rate

Number of Arrivals (X)    Observed Frequencies (f)    f·X
0                          7                            0
1                         18                           18
2                         25                           50
3                         17                           51
4                         12                           48
≥5                         5                           25
Total                     84                          192

Mean arrival rate: $\lambda = \frac{\sum f \cdot X}{\sum f} = \frac{192}{84} \approx 2.3$ customers per minute
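The mean arrival rate is the frequency-weighted average of the arrival counts, Σ f·X / Σ f. A minimal Python sketch follows, assuming (as the f·X column on the slide does) that the open-ended "5 or more" class is scored as exactly 5; the variable names are illustrative.

```python
# Number of arrivals per interval and how often each count was observed.
arrivals = [0, 1, 2, 3, 4, 5]          # last class is "5 or more", scored as 5
frequencies = [7, 18, 25, 17, 12, 5]

n = sum(frequencies)                                        # total intervals observed
total_arrivals = sum(x * f for x, f in zip(arrivals, frequencies))
mean_rate = total_arrivals / n                              # estimated Poisson mean

print(n, total_arrivals, round(mean_rate, 1))
```

Because this rate is estimated from the sample, it counts as one estimated parameter (c = 1) when the degrees of freedom are computed.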

Calculations for Demonstration Problem 16.2: Poisson Probabilities for λ = 2.3

Number of Arrivals (X)    Expected Probabilities P(X)    Expected Frequencies n·P(X)
0                         0.1003                          8.42
1                         0.2306                         19.37
2                         0.2652                         22.28
3                         0.2033                         17.08
4                         0.1169                          9.82
≥5                        0.0838                          7.04

$n = \sum f = 84$
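The expected probabilities come from the Poisson formula P(X) = e^(−λ) λ^X / X! with λ = 2.3, and the last class takes whatever probability remains. The sketch below uses only the Python standard library; the helper name `poisson_pmf` is an assumption for illustration.

```python
import math

lam = 2.3   # estimated mean arrival rate (from the slides)
n = 84      # total number of observed intervals


def poisson_pmf(x, lam):
    """Poisson probability of exactly x arrivals."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


# Probabilities for 0..4 arrivals, then P(X >= 5) as the complement.
probs = [poisson_pmf(x, lam) for x in range(5)]
probs.append(1.0 - sum(probs))

# Expected frequencies are n times each probability.
expected = [n * p for p in probs]

for label, p, e in zip(["0", "1", "2", "3", "4", ">=5"], probs, expected):
    print(f"{label:>3}  {p:.4f}  {e:.2f}")
```

All six expected frequencies exceed 5, so no categories need to be combined before running the test.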

χ² Calculations for Demonstration Problem 16.2

Number of Arrivals (X)    Observed Frequencies (f_o)    Expected Frequencies n·P(X) (f_e)    (f_o − f_e)² / f_e
0                          7                              8.42                                0.24
1                         18                             19.37                                0.10
2                         25                             22.28                                0.33
3                         17                             17.08                                0.00
4                         12                              9.82                                0.48
≥5                         5                              7.04                                0.59
Total                     84                             84.00                                1.74

$\chi^2_{Cal} = 1.74$
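The chi-square statistic for the bank arrival data can be checked with the short Python sketch below. The observed and expected frequencies and the critical value 9.4877 are taken from the slides; the variable names are illustrative, and the degrees of freedom assume one estimated parameter (the Poisson mean).

```python
# Observed and expected frequencies for 0, 1, 2, 3, 4, and 5-or-more arrivals.
observed = [7, 18, 25, 17, 12, 5]
expected = [8.42, 19.37, 22.28, 17.08, 9.82, 7.04]

# Chi-square statistic: sum of (fo - fe)^2 / fe over the six classes.
chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

df = len(observed) - 1 - 1     # k - 1 - c, with c = 1 for the estimated mean
critical = 9.4877              # table value quoted on the slides

reject_h0 = chi2 > critical
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {reject_h0}")
```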

• The observed chi-square value of 1.74 is less than the critical value of 9.4877.

• The decision is not to reject the null hypothesis. The data do not provide enough evidence to indicate that the distribution of bank arrivals is not Poisson.

χ² Test of Independence

• Used to analyze the frequencies of two variables with multiple categories to determine whether the two variables are independent.

• Qualitative variables; nominal data

χ² Test of Independence: Investment Example

• Where do you reside?
A. Large town B. Medium town C. Small town D. Rural area

• Which type of financial investment are you most likely to make today?
E. Stocks F. Bonds G. Treasury bills

Contingency Table

                      Type of Financial Investment
Geographic Region     E       F       G       Total
A                                     O13     nA
B                                             nB
C                                             nC
D                                             nD
Total                 nE      nF      nG      N

χ² Test of Independence: Investment Example

Contingency Table

                      Type of Financial Investment
Geographic Region     E       F       G       Total
A                             e12             nA
B                                             nB
C                                             nC
D                                             nD
Total                 nE      nF      nG      N

If A and F are independent, $P(A \cap F) = P(A) \cdot P(F)$.

$P(A) = \frac{n_A}{N}, \qquad P(F) = \frac{n_F}{N}$

$P(A \cap F) = \frac{n_A}{N} \cdot \frac{n_F}{N}$

$e_{AF} = N \cdot P(A \cap F) = N \left(\frac{n_A}{N}\right)\left(\frac{n_F}{N}\right) = \frac{n_A \, n_F}{N}$

χ² Test of Independence: Formulas

Expected frequencies:

$e_{ij} = \frac{n_i \, n_j}{N}$

where: i = the row, j = the column, $n_i$ = the total of row i, $n_j$ = the total of column j, N = the total of all frequencies

Calculated (observed) χ²:

$\chi^2 = \sum \sum \frac{(f_o - f_e)^2}{f_e}$

where: df = (r − 1)(c − 1), r = the number of rows, c = the number of columns
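The two formulas for the test of independence, expected frequency = (row total × column total) / N and χ² summed over every cell with df = (r − 1)(c − 1), can be sketched in Python as follows. The function names and the 2 × 2 table are hypothetical and serve only to illustrate the mechanics.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    expected = expected_frequencies(table)
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2 x 2 table of observed counts (not from the slides).
table = [[10, 20],
         [30, 40]]
print(expected_frequencies(table))
chi2, df = chi_square_independence(table)
print(round(chi2, 4), df)
```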

χ² Test of Independence: Gasoline Preference Versus Income Category

Contingency Table for the Gas Consumer Example

Gasoline Preference Versus Income Category: Expected Frequencies

Observed frequencies, with expected frequencies in parentheses:

                        Type of Gasoline
Income                  Regular         Premium        Extra Premium    Total
Less than $30,000        85 (66.15)     16 (24.46)      6 (16.40)       107
$30,000 to $49,999      102 (87.78)     27 (32.46)     13 (21.76)       142
$50,000 to $99,999       36 (45.13)     22 (16.69)     15 (11.19)        73
At least $100,000        15 (38.95)     23 (14.40)     25 (9.65)         63
Total                   238             88             59               385

$e_{ij} = \frac{n_i \, n_j}{N}$

$e_{11} = \frac{(107)(238)}{385} = 66.15$

$e_{12} = \frac{(107)(88)}{385} = 24.46$

$e_{13} = \frac{(107)(59)}{385} = 16.40$
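The expected frequencies for the gasoline example follow directly from the row and column totals. The Python sketch below recomputes them from the observed counts shown on the slide; the variable names are illustrative, and results are rounded to two decimals only for display.

```python
# Observed counts: rows are income categories, columns are
# regular, premium, and extra premium gasoline.
observed = [
    [85, 16, 6],     # less than $30,000
    [102, 27, 13],   # $30,000 to $49,999
    [36, 22, 15],    # third income category
    [15, 23, 25],    # at least $100,000
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
N = sum(row_totals)

# e_ij = (row total i) * (column total j) / N
expected = [[r * c / N for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
```

Every expected frequency is above 5 (the smallest is about 9.65), so the test can proceed without combining rows or columns.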

Gasoline Preference Versus Income Category: χ² Calculation
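The χ² calculation for the gasoline example can be sketched in Python as below. The observed counts and the critical value 16.8119 come from the slides; the variable names are illustrative. Because the code does not round intermediate values, its result should land close to the 70.78 reported on the slides, though the second decimal may differ.

```python
# Observed counts: rows are income categories, columns are gasoline types.
observed = [
    [85, 16, 6],
    [102, 27, 13],
    [36, 22, 15],
    [15, 23, 25],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
N = sum(row_totals)

# Sum (fo - fe)^2 / fe over every cell, with fe = row total * column total / N.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / N
        chi2 += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (r - 1)(c - 1)
critical = 16.8119                                  # table value quoted on the slides

reject_h0 = chi2 > critical
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {reject_h0}")
```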

Gasoline Preference Versus Income Category

• The observed chi-square value of 70.78 is greater than the critical value of 16.8119.

• The decision is to reject the null hypothesis. The data provide enough evidence to indicate that the type of gasoline preferred is not independent of income.

Gasoline Preference Versus Income Category: Minitab Output

Important Points of Interest

• Chi-square tests indicate whether two distributions are the same or not. They do not tell you in what specific way they differ.

• The chi-square test of independence indicates whether two variables are independent or not. It does not tell you in which way they are dependent; that is, it does not reveal the nature of the relationship between the two variables.

• Chi-square techniques are an outgrowth of the binomial distribution and the inferential techniques for analyzing population proportions.

• Both the chi-square test of independence and the chi-square goodness-of-fit test require that expected values be greater than or equal to 5. If they are not, combine adjacent rows or columns until all expected values are 5 or greater.


COPYRIGHT

Copyright © 2014 John Wiley & Sons Canada, Ltd. All rights reserved. Reproduction or translation of this work beyond that permitted by Access Copyright (The Canadian Copyright Licensing Agency) is unlawful. Requests for further information should be addressed to the Permissions Department, John Wiley & Sons Canada, Ltd. The purchaser may make back-up copies for his or her own use only and not for distribution or resale. The author and the publisher assume no responsibility for errors, omissions, or damages caused by the use of these programs or from the use of the information contained herein.