statistics 203 -...


Upload: dinhdung

Post on 04-Apr-2019


Topics for Today

Introduction to Non-parametric Significance tests

1-Way Chi-square test

2-Way Chi-square test

Stat203 Page 1 of 23, Fall 2011 – Week 9 Lecture 3


Non-Parametric is a big word

But it makes life easier! (sometimes)

Recall, we had assumptions that were necessary to use the t-tests for comparing means and z-tests for comparing proportions.

We might need a non-parametric test if these assumptions (or conditions) are not met.

There are also some scientific questions about nominal or ordinal data that can’t be answered using the t-tests or z-tests.

First off, a recap of the assumptions/conditions necessary for the t-tests and z-tests we’ve looked at already.


Assumptions: t-tests for means

Hypotheses involving means of interval or ratio level data.

Assumptions required to use a t-test for one or more samples:

- Test variable(s) normally distributed, or the sample(s) large enough (i.e., n > 50) so that the sampling distribution of the mean is approximately normal

- Interval or Ratio level data


Assumptions: z-tests for proportions

Hypotheses involving 1 or 2 proportions.

Assumptions required to use a z-test:

- $n p_0 > 10$ and $n(1 - p_0) > 10$ (1-sample)

- $n_1 > 10$ & $n_2 > 10$ and $n_1 p_1 > 5$ & $n_2 p_2 > 5$ (2-samples)
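These conditions are simple enough to check mechanically. The sketch below encodes them exactly as stated on the slide; the function names and the sample sizes in the usage lines are made up for illustration.

```python
def one_sample_z_ok(n, p0):
    """Slide condition for a 1-sample z-test: n*p0 > 10 and n*(1 - p0) > 10."""
    return n * p0 > 10 and n * (1 - p0) > 10


def two_sample_z_ok(n1, p1, n2, p2):
    """Slide condition for a 2-sample z-test: both n > 10 and both n*p > 5."""
    return n1 > 10 and n2 > 10 and n1 * p1 > 5 and n2 * p2 > 5


# Hypothetical inputs, only to show the checks in use.
check_a = one_sample_z_ok(100, 0.5)          # 50 and 50 both exceed 10
check_b = one_sample_z_ok(40, 0.1)           # 40 * 0.1 = 4 is too small
check_c = two_sample_z_ok(30, 0.4, 25, 0.1)  # 25 * 0.1 = 2.5 is too small
print(check_a, check_b, check_c)
```

When a check fails, that is one of the situations where a non-parametric test becomes attractive.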


1-Way Chi-Square Test

The objective of this test is to determine how similar an observed set of frequencies (or relative frequencies), fo, is to an expected set of frequencies, fe.

A typical research hypothesis would indicate that individuals are more (or less) likely than expected to select some categories than others.

… and the most common research hypothesis is that the relative frequency of responses is similar for all categories, which has the following statistical hypotheses:

H0: fo = fe

Ha: fo ≠ fe


… but what are fo and fe?

We’ve seen fo before: it’s just the observed frequency (or relative frequency) for each category of a nominal or ordinal variable!

The new item is fe … think of this as some expected relative frequency, or %. What does that mean?

- What is the expected relative frequency of men and women going into a men’s room?

- What is the expected relative frequency of heads and tails out of 100 flips of a fair coin?

- In Vancouver, what is the expected relative frequency of raining and sunny days?
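For the coin question, the expected frequencies follow directly from the proportions assumed under H0. A small sketch of that idea, where `expected_frequencies` is an illustrative helper name and the equal-proportion split is the "no preference" assumption:

```python
def expected_frequencies(n, proportions):
    """Expected count per category: n times the proportion assumed under H0."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("expected proportions must sum to 1")
    return {category: n * p for category, p in proportions.items()}


# 100 flips of a fair coin: half heads, half tails.
coin = expected_frequencies(100, {"heads": 0.5, "tails": 0.5})
print(coin)  # {'heads': 50.0, 'tails': 50.0}

# k equally likely categories: each gets n / k.
four_way = expected_frequencies(80, {c: 0.25 for c in "ABCD"})
print(four_way)
```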


The Chi-Square Test Statistic

Here’s the formula that we’ll need to calculate the Chi-square test statistic:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

…so it’s a bit more complicated than the t-statistic for means and the z-statistic for proportions.
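The formula is less intimidating as a loop: for each category, square the gap between observed and expected, divide by expected, and add everything up. A minimal sketch, using invented coin-flip counts (58 heads and 42 tails out of 100) purely as an illustration:

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Hypothetical data: 100 flips, fair coin expected to give 50 / 50.
stat = chi_square_statistic([58, 42], [50, 50])
print(round(stat, 2))  # 2.56
```

Each term is never negative, so the statistic grows as the observed counts move away from the expected ones in either direction.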

Once we calculate this, we then look up the value in Table E. As with the t-distribution, though, we need a ‘degrees of freedom’ for this test statistic.

Unlike the t-test, the degrees of freedom for the Chi-square test is the number of categories (k) minus 1.

Example (1-way Chi-Square test): To determine whether dogs are colour blind, a student sets up an experiment where she provides food to a dog in 4 differently coloured dishes and records the colour of the dish the dog chooses to eat from first. She does this for a total of 80 dogs, randomly ordering the dishes each time. If dogs are truly colour blind, each colour dish should be selected about the same number of times.

The Chi-Square test allows us to formally test this research hypothesis.

Research Hypothesis:

Individuals:

Population:

Variable:

Parameter:

Statistical Hypotheses:

The observed frequency of each colour from the 80 dogs is below, as is the expected frequency if the dogs were colour blind:

Colour    fo    fe
Brown     25    20
Orange    18    20
Yellow    19    20
Green     18    20

And we have N = 80 and k = 4 (# of categories).

Now, let’s calculate our test statistic:
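Plugging the fo and fe values from the table into the Chi-square formula gives the following (a worked sketch to check your own arithmetic against):

$$\chi^2 = \frac{(25-20)^2}{20} + \frac{(18-20)^2}{20} + \frac{(19-20)^2}{20} + \frac{(18-20)^2}{20} = \frac{25 + 4 + 1 + 4}{20} = \frac{34}{20} = 1.7$$

with degrees of freedom $k - 1 = 4 - 1 = 3$.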


p-value:

Reject H0 at α = 0.05?

Conclusion:
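One way to check your answers to the prompts above is to compute them directly. The sketch below assumes the critical value 7.815, which is the standard Chi-square table value for 3 degrees of freedom at α = 0.05 (Table E should list the same number). `chi2_sf` is a helper written for this illustration; it uses the closed-form upper-tail formula for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        terms = sum(y ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-y) * terms
    m = (df - 1) // 2
    tail = sum(y ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, m + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


observed = [25, 18, 19, 18]   # Brown, Orange, Yellow, Green
expected = [20, 20, 20, 20]   # 80 dogs spread evenly over 4 dishes

chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(chi2, df)
critical = 7.815              # assumed table value: df = 3, alpha = 0.05
reject = chi2 > critical

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.3f}, reject H0: {reject}")
```

The statistic (1.7) is well below the critical value, so these data give no evidence against the hypothesis that the dogs choose all four colours equally often.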


2-Way Chi-Square Test

We used the 1-way test to determine whether the observed relative frequency distribution was different from some ‘expected’ distribution.

Note the similarity to a 1-sample test for a mean or proportion, where we are testing whether the mean or proportion is different from some ‘null’ value.

We can use a 2-way test to determine whether relative frequency distributions from two samples are the same or different from one another.

Note the similarity to a 2-sample test for means or proportions, where we are testing whether the means or proportions are different from one another.


Research questions that require a 2-way chi-square test are based on relative frequencies (like the 1-way test), but compare two or more populations (or samples).

- Is the relative frequency of sunny days in a year different between Vancouver and Seattle?

- Is the relative frequency of female students different between UBC and SFU?

- Is the relative frequency of job type (white vs blue vs service) the same for women and men?


Example (Q24, pg 338): A radio executive considering a switch in his station’s format collects data on the radio preferences of various age groups of 78 listeners. Does radio format preference differ by age group?

Research Hypothesis:

Individuals:

Populations:

Variables:

Parameters:

Statistical Hypotheses:


The observed frequency (fo) of age group and radio format preference is below:

                        Age Group
Format       Young Adult   Middle Age   Older Adult   Total
Music             14            10            3          27
News-talk          4            15           11          30
Sports             7             9            5          21
Total             25            34           19          78

And we have N = 78 and k = 9 (# of categories: 3 formats × 3 age groups).

but … we need fe! Here’s the formula for each cell, with row and column totals associated with that cell:

$$f_e = \frac{(\text{column total}) \times (\text{row total})}{\text{grand total}}$$
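The same rule can be applied to every cell at once. A minimal sketch using the observed radio-format table (rows are Music, News-talk, Sports; columns are Young Adult, Middle Age, Older Adult); `expected_counts` is an illustrative helper name:

```python
def expected_counts(observed):
    """fe for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


observed = [
    [14, 10, 3],   # Music
    [4, 15, 11],   # News-talk
    [7, 9, 5],     # Sports
]

expected = expected_counts(observed)
for row in expected:
    print([round(fe, 1) for fe in row])
```

Note that the expected counts in each row and column add up to the same totals as the observed counts.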


So, the table with fe is:

                        Age Group
Format       Young Adult   Middle Age   Older Adult   Total
Music         25*27/78      34*27/78     19*27/78      27
News-talk     25*30/78      34*30/78     19*30/78      30
Sports        25*21/78      34*21/78     19*21/78      21
Total            25            34           19         78

… which, after you do the arithmetic, gives:

                        Age Group
Format       Young Adult   Middle Age   Older Adult   Total
Music            8.7          11.8          6.6        27
News-talk        9.6          13.1          7.3        30
Sports           6.7           9.2          5.1        21
Total             25            34           19        78


… now that we have all the components, we can calculate our test statistic (try the arithmetic on your own):

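$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{28.1}{8.7} + \frac{3.2}{11.8} + \frac{13.0}{6.6} + \frac{31.4}{9.6} + \frac{3.6}{13.1} + \frac{13.7}{7.3} + \frac{0.09}{6.7} + \frac{0.04}{9.2} + \frac{0.01}{5.1} = 10.9$$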

p-value:


Reject H0 at α = 0.05?

Conclusion:
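To check your answers for the radio example, the whole test can be run in a few lines. This sketch relies on two facts that are standard for a 2-way table but are not stated on these slides: the degrees of freedom are (rows − 1) × (columns − 1), here 2 × 2 = 4, and the critical value for 4 degrees of freedom at α = 0.05 is 9.488. The p-value line uses a closed form that holds only for 4 degrees of freedom. Because the code keeps unrounded expected counts, it gives about 10.96 rather than the 10.9 obtained from the rounded table.

```python
import math

observed = [
    [14, 10, 3],   # Music
    [4, 15, 11],   # News-talk
    [7, 9, 5],     # Sports
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic: sum over all cells of (fo - fe)^2 / fe.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n
        chi2 += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability for exactly 4 degrees of freedom.
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

critical = 9.488   # assumed table value: df = 4, alpha = 0.05
reject = chi2 > critical

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.3f}, reject H0: {reject}")
```

Here the statistic exceeds the critical value, so H0 is rejected at α = 0.05: radio format preference appears to differ by age group.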


One snag …

There is still an assumption necessary for us to be able to use any of the Chi-Square tests.

All the expected frequencies (fe) in the table (i.e., the expected frequency for every category or cell) must be at least 5.
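A quick sketch of this check for the radio example, assuming the rule is applied to the expected frequencies (the usual textbook convention); `smallest_expected` is an illustrative helper name:

```python
def smallest_expected(observed):
    """Smallest expected count, (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return min(r * c / n for r in row_totals for c in col_totals)


radio = [
    [14, 10, 3],   # Music
    [4, 15, 11],   # News-talk
    [7, 9, 5],     # Sports
]

min_fe = smallest_expected(radio)
assumption_ok = min_fe >= 5
print(round(min_fe, 2), assumption_ok)
```

The smallest expected count is about 5.1 (Sports, Older Adult), so the radio table just satisfies the condition even though two observed cells (3 and 4) are below 5.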


Other names for Chi-square tests:

1-Way:
  o One-sample Chi-square
  o Chi-square goodness of fit

2-Way:
  o 2-sample Chi-square
  o 2x2 Chi-square
  o r by c Chi-square
  o Chi-square test for independence

Nice page describing how to do Chi-Square tests in SPSS.

http://academic.uofs.edu/department/psych/methods/cannon99/level2d.html


So, a decision tree for choosing hypothesis tests:

Single sample?

- Interval or Ratio data?
  o One-sample t-test

- Nominal or Ordinal data?
  o Proportion (ie: 2 categories)?
      1-sample z-test for proportions
  o Distribution (ie: several categories)?
      1-way Chi-square


Two samples?

- Interval or Ratio data?
  o Individuals measured twice/Matched?
      Paired t-test
  o Variances equal?
      2-sample t-test w/equal variances
  o Variances not equal?
      2-sample t-test w/unequal variances

- Nominal or Ordinal data?
  o Proportion (ie: 2 categories) & conditions met?
      2-sample z-test for proportions
  o Distribution (ie: several categories)?
      2-way Chi-square
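The decision tree can also be written as a small lookup function, which makes the branching explicit. This is only a sketch: the function name, argument names, and the choice to return `None` for combinations the tree does not spell out (for example, two proportions when the z-test conditions are not met) are assumptions made for illustration.

```python
def choose_test(samples, level, categories=2, paired=False,
                equal_variances=True, conditions_met=True):
    """Return the test named by the decision tree, or None if not covered."""
    numeric = level in ("interval", "ratio")
    categorical = level in ("nominal", "ordinal")

    if samples == 1:
        if numeric:
            return "One-sample t-test"
        if categorical:
            if categories == 2:
                return "1-sample z-test for proportions"
            return "1-way Chi-square"

    if samples == 2:
        if numeric:
            if paired:
                return "Paired t-test"
            if equal_variances:
                return "2-sample t-test w/equal variances"
            return "2-sample t-test w/unequal variances"
        if categorical:
            if categories == 2:
                if conditions_met:
                    return "2-sample z-test for proportions"
                return None  # this branch is not spelled out in the tree
            return "2-way Chi-square"

    return None


print(choose_test(1, "nominal", categories=4))   # dog-dish example
print(choose_test(2, "nominal", categories=3))   # radio-format example
print(choose_test(2, "ratio", paired=True))
```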


Today’s Topics

Chi-Square tests

- for comparing distributions of nominal or ordinal data

- 1-way: compare distribution in a single sample to some expected distribution

- 2-way: compare distributions for two populations

New Reading

Chapter 10 up to pg 352
