
Sociology 5811: Lecture 14: ANOVA 2

Copyright © 2005 by Evan Schofer

Do not copy or distribute without permission


Announcements

• Midterm next class

– Bring a calculator

– Bring pencil/eraser

• Today:

– Wrap up ANOVA

– Midterm review activities


Review: ANOVA

• ANOVA = “ANalysis Of VAriance”

– “Oneway ANOVA”: the simplest form

• ANOVA lets us test whether any group mean differs from the mean of all groups combined

• Answers: “Are all groups equal or not?”

• H0: All groups have the same population mean

– $\mu_1 = \mu_2 = \mu_3 = \mu_4$

• H1: One or more groups differ

• But it doesn’t identify which specific group(s) differ.


ANOVA: Concepts & Definitions

• The grand mean is the mean of all groups

– ex: mean of all entry-level workers = $8.70/hour

• The group mean is the mean of a particular sub-group of the population

• The effect ($\alpha_j$) of a group is the difference between that group’s mean and the grand mean

• The dependent variable is our variable of interest

– We explore whether its value “depends” on a case’s group
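The definitions above can be made concrete with a small Python sketch. The wage numbers and the group labels `A`, `B`, `C` are hypothetical and were chosen only to keep the arithmetic easy to follow. The grand mean here is the mean of all pooled observations.

```python
from statistics import mean

# Hypothetical hourly wages ($); illustrative numbers, not data from the lecture.
groups = {
    "A": [8.00, 8.50, 9.00],
    "B": [8.25, 8.75, 9.25],
    "C": [8.50, 9.00, 9.50],
}

# Grand mean: mean of every observation, pooled across all groups.
grand_mean = mean(y for ys in groups.values() for y in ys)

# Group means, and each group's effect alpha_j = group mean - grand mean.
group_means = {g: mean(ys) for g, ys in groups.items()}
effects = {g: m - grand_mean for g, m in group_means.items()}

print(f"grand mean = {grand_mean:.2f}")
for g in groups:
    print(f"group {g}: mean = {group_means[g]:.2f}, effect = {effects[g]:+.2f}")
```

The groups are the same size here, so the effects sum to zero.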


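Review: Sums of Squared Deviations

• The total deviation can be partitioned into $\alpha_j$ and $e_{ij}$ components:

– $Y_{ij} - \bar{Y} = \alpha_j + e_{ij}$, where $\alpha_j = \bar{Y}_j - \bar{Y}$ and $e_{ij} = Y_{ij} - \bar{Y}_j$

• The total variation (SStotal) is made up of:

– $\alpha_j$: between-group variation (SSbetween)

– $e_{ij}$: within-group variation (SSwithin)

– SStotal = SSbetween + SSwithin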
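To check the partition, the following sketch computes the three sums of squares directly from their definitions and confirms that SStotal = SSbetween + SSwithin. The data are hypothetical and use unequal group sizes, so the $n_j$ weights in SSbetween matter.

```python
from statistics import mean

# Hypothetical scores for J = 3 groups of unequal size (illustrative only).
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0, 9.0], [9.0, 10.0, 11.0]]

all_y = [y for ys in groups for y in ys]
grand_mean = mean(all_y)

# Total: squared deviation of every case from the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_y)

# Between: n_j times the squared group effect (group mean - grand mean).
ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups)

# Within: squared deviation of every case from its own group mean (the e_ij).
ss_within = 0.0
for ys in groups:
    m = mean(ys)
    ss_within += sum((y - m) ** 2 for y in ys)

print(ss_total, ss_between, ss_within)
assert abs(ss_total - (ss_between + ss_within)) < 1e-9
```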


Review: Sum of Squared Variance

• The sum of squares grows as N gets larger.

– To derive a more comparable measure, we “average” it, just as with the variance, i.e., divide by N-1

• For similar reasons, it is desirable to “average” the between- and within-group sums of squares

• The result is the “Mean Square” variance

– MSbetween and MSwithin


Sum of Squared Variance

• Choosing relevant denominators we get:

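$$MS_{Between} = \frac{\sum_{j=1}^{J} n_j \left(\bar{Y}_j - \bar{Y}\right)^2}{J - 1}$$

$$MS_{Within} = \frac{\sum_{j=1}^{J} \sum_{i=1}^{n_j} \left(Y_{ij} - \bar{Y}_j\right)^2}{N - J}$$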
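The mean-square definitions translate directly into code: each sum of squares is divided by its degrees of freedom, J-1 between groups and N-J within groups. This is a minimal sketch. `mean_squares` is a hypothetical helper name, and the data are made up.

```python
from statistics import mean

def mean_squares(groups):
    """Return (MS_between, MS_within) for a list of groups of observations."""
    J = len(groups)
    N = sum(len(ys) for ys in groups)
    grand_mean = mean([y for ys in groups for y in ys])
    # Between: sum over groups of n_j * (group mean - grand mean)^2, on J - 1 df.
    ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups)
    # Within: squared deviations of each case from its group mean, on N - J df.
    ss_within = 0.0
    for ys in groups:
        m = mean(ys)
        ss_within += sum((y - m) ** 2 for y in ys)
    return ss_between / (J - 1), ss_within / (N - J)

# Hypothetical data: J = 3 groups, N = 10 cases.
ms_between, ms_within = mean_squares([[4.0, 5.0, 6.0], [6.0, 7.0, 8.0, 9.0], [9.0, 10.0, 11.0]])
print(ms_between, ms_within)
```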


Mean Squares and Group Differences

[Figure panels: “MSbetween > MSwithin” and “MSbetween < MSwithin”]


Mean Squares and Group Differences

• Question: Which suggests that group means are quite different?

– MSbetween > MSwithin or MSbetween < MSwithin

• Answer: If between-group variance is greater than within-group variance, the groups are quite distinct

• It is unlikely that they came from populations with the same mean

• But if within-group variance is greater than between-group variance, the groups aren’t very different; they overlap a lot

• It is plausible that $\mu_1 = \mu_2 = \mu_3 = \mu_4$


The F Ratio

• The ratio of MSbetween to MSwithin is referred to as the F ratio:

$$F_{J-1,\,N-J} = \frac{MS_{Between}}{MS_{Within}}$$

• If MSbetween > MSwithin then F > 1

• If MSbetween < MSwithin then F < 1

• Higher F indicates that groups are more separate
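Here is a short sketch of the F ratio applied to two hypothetical data sets. In the first, the groups are well separated, so F > 1. In the second, they overlap heavily, so F < 1. `f_ratio` is a hypothetical helper name.

```python
from statistics import mean

def f_ratio(groups):
    """F = MS_between / MS_within for a list of groups of observations."""
    J = len(groups)
    N = sum(len(ys) for ys in groups)
    grand_mean = mean([y for ys in groups for y in ys])
    ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups)
    ss_within = 0.0
    for ys in groups:
        m = mean(ys)
        ss_within += sum((y - m) ** 2 for y in ys)
    return (ss_between / (J - 1)) / (ss_within / (N - J))

# Hypothetical data: tight, well-separated groups vs. wide, overlapping groups.
separated = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0, 9.0], [9.0, 10.0, 11.0]]
overlapping = [[2.0, 6.0, 10.0], [3.0, 7.0, 11.0], [1.0, 5.0, 9.0]]

F_sep = f_ratio(separated)
F_overlap = f_ratio(overlapping)
print(f"separated groups: F = {F_sep:.3f}; overlapping groups: F = {F_overlap:.3f}")
```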


The F Ratio

• The F ratio has a sampling distribution

– That is, estimates of F vary depending on exactly which sample you draw

• Again, this sampling distribution has known properties that can be looked up in a table

• The “F-distribution”

– Different from z & t!

• Statisticians have determined how much area falls under the curve for a given value of F…

• So, we can test hypotheses.
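The idea of a sampling distribution for F can be illustrated by simulation. The sketch below assumes H0 is true: three groups of 20 are drawn from the same normal population. It computes F for 2,000 simulated samples and counts how often F exceeds the α = .05 critical value. It relies on one fact: when the numerator degrees of freedom equal 2, the upper-tail area has a closed form, $P(F > f) = (1 + 2f/\nu)^{-\nu/2}$ with $\nu = N - J$. That means no table is needed here. For other numerator df you would use a table or a statistics library. The seed, group size, and number of repetitions are arbitrary choices.

```python
import math
import random

def avg(ys):
    """Arithmetic mean using an accurate float sum."""
    return math.fsum(ys) / len(ys)

def f_ratio(groups):
    """One-way ANOVA F = MS_between / MS_within for a list of groups."""
    J = len(groups)
    N = sum(len(ys) for ys in groups)
    grand_mean = avg([y for ys in groups for y in ys])
    ss_between = math.fsum(len(ys) * (avg(ys) - grand_mean) ** 2 for ys in groups)
    ss_within = 0.0
    for ys in groups:
        m = avg(ys)
        ss_within += math.fsum((y - m) ** 2 for y in ys)
    return (ss_between / (J - 1)) / (ss_within / (N - J))

def simulate_f(reps=2000, J=3, n=20, seed=1):
    """F ratios from `reps` samples in which H0 is true (every group is N(0, 1))."""
    rng = random.Random(seed)
    return [
        f_ratio([[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(J)])
        for _ in range(reps)
    ]

J, n = 3, 20
df2 = J * n - J  # N - J = 57
# Closed-form critical value for alpha = .05; valid only because J - 1 = 2.
f_crit = (df2 / 2) * (0.05 ** (-2 / df2) - 1)

fs = simulate_f(J=J, n=n)
share = sum(f > f_crit for f in fs) / len(fs)
print(f"critical F(2, {df2}) = {f_crit:.2f}; share of simulated F above it = {share:.3f}")
```

Because H0 is true in every simulated sample, roughly 5% of the F values should land beyond the critical value. That 5% is exactly the Type I error rate the test is designed to have.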


The F Ratio

• Assumptions required for hypothesis testing using an F-statistic

• 1. The J groups are drawn from normally distributed populations

• 2. The population variances of the groups are equal

• If these assumptions hold, the F statistic can be looked up in an F-distribution table

– Much like t distributions

• But there are 2 degrees of freedom: J-1 and N-J

– One for the number of groups (J-1, numerator), one for the sample size (N-J, denominator)


The F Ratio

• Example: Looking for wage discrimination within a firm

• The company has workers of three ethnic groups:

– Whites, African-Americans, Asian-Americans

• You observe in a sample of 200 employees:

– Y-bar (White) = $8.78/hour

– Y-bar (AfAm) = $8.52/hour

– Y-bar (AsianAm) = $8.91/hour


The F Ratio

• Suppose you calculate the following from your sample:

• F = 6.24

• Recall that N = 200, J = 3

– Degrees of freedom: J-1 = 2, N-J = 197

• If α = .05, the critical F value for (2, 197) is 3.00

– See Knoke, p. 514

• The observed F easily exceeds the critical value

– Thus, we can reject H0

• We can conclude that the groups do not all have the same population mean.
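The wage example can be checked with the same closed form that holds when the numerator df = 2, namely $P(F > f) = (1 + 2f/\nu)^{-\nu/2}$. This is a sketch, and the helper names are hypothetical. Computed exactly, the critical value for (2, 197) comes out at about 3.04, slightly above the tabled 3.00. The p-value for F = 6.24 is roughly .002, so the conclusion is unchanged.

```python
def f_tail_area_df1_2(f, df2):
    """P(F > f) for F with (2, df2) degrees of freedom; closed form, numerator df = 2 only."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

def f_critical_df1_2(alpha, df2):
    """Value with upper-tail area alpha under F(2, df2); inverts the formula above."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

N, J = 200, 3
df1, df2 = J - 1, N - J        # 2 and 197
F_obs = 6.24                   # observed F from the wage example

crit = f_critical_df1_2(0.05, df2)
p_value = f_tail_area_df1_2(F_obs, df2)
print(f"df = ({df1}, {df2}), critical F = {crit:.2f}, p = {p_value:.4f}")
print("reject H0" if F_obs > crit else "fail to reject H0")
```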


The F Ratio: Visually

• Rough sketch of the F-distribution

– N = 200, J = 3; degrees of freedom: J-1 = 2, N-J = 197

• If α = .05, the critical F value for (2, 197) is 3.00

[Figure: sketch of the F-distribution over F = 0 to 5, with the critical α area (“Reject H0”) in the upper tail]


Post-Hoc Tests

• Limitation: ANOVA doesn’t tell you which group(s) differ from each other

• Solution: “Post-Hoc Tests”

– e.g., Bonferroni, Scheffé, Tukey tests

• These tests provide pairwise comparisons of all groups

– Ex: Bonferroni corrects (reduces) α for the number of pairwise tests, to avoid inflating the Type I error rate

• Which one should you use?

– Bonferroni & Scheffé are very conservative

– For most purposes, they all lead to the same conclusions

• Consult an advanced stats text to learn the subtle differences.
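The Bonferroni idea is simple arithmetic. With J groups there are J(J-1)/2 pairwise comparisons, and each one is tested at α divided by that number. Below is a minimal sketch using the three groups from the GSS example that follows. The function name `bonferroni` is hypothetical.

```python
from itertools import combinations

def bonferroni(groups, alpha=0.05):
    """List every pair of groups and the Bonferroni per-comparison alpha."""
    pairs = list(combinations(groups, 2))
    return pairs, alpha / len(pairs)

pairs, per_test_alpha = bonferroni(["White", "Black", "Other Race"])

# Without correction, and if the tests were independent, the chance of at least
# one false rejection across all pairs would be 1 - (1 - alpha)^m.
uncorrected_risk = 1 - (1 - 0.05) ** len(pairs)

print(pairs)
print(f"{len(pairs)} comparisons, each tested at alpha = {per_test_alpha:.4f}")
print(f"uncorrected family-wise risk (independent tests) = {uncorrected_risk:.3f}")
```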


ANOVA: Example

• Example: GSS data: Schooling & Race

– Oneway ANOVA, plus descriptives and Scheffé post-hoc test

• Note: N = 2752, grand mean = 13.36

Descriptives
HIGHEST YEAR OF SCHOOL COMPLETED

                                                             95% Confidence Interval for Mean
              N      Mean    Std. Deviation   Std. Error    Lower Bound    Upper Bound
White         2170   13.52   2.934            .063          13.40          13.64
Black         398    12.47   2.882            .144          12.19          12.76
Other Race    184    13.42   3.307            .244          12.94          13.90
Total         2752   13.36   2.974            .057          13.25          13.47
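Several entries in the Descriptives table can be reproduced from the others, which makes a useful sanity check when reading output. This sketch recomputes three of them:

- N.
- The grand mean, as the n-weighted average of the group means.
- Each standard error, as s/√n.

It also approximates SSbetween from the rounded group means. The result (about 371) differs a little from the 368.803 in the ANOVA table because the means shown are rounded to two decimals.

```python
import math

# (N, mean, std. deviation) copied from the Descriptives table.
descriptives = {
    "White": (2170, 13.52, 2.934),
    "Black": (398, 12.47, 2.882),
    "Other Race": (184, 13.42, 3.307),
}

N = sum(n for n, _, _ in descriptives.values())
# The grand mean is the n-weighted average of the group means.
grand_mean = sum(n * m for n, m, _ in descriptives.values()) / N
# Standard error of each group mean: s / sqrt(n).
std_errors = {g: s / math.sqrt(n) for g, (n, _, s) in descriptives.items()}
# Between-group sum of squares, approximated from the rounded group means.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in descriptives.values())

print(N, round(grand_mean, 2), {g: round(se, 3) for g, se in std_errors.items()})
print(f"SS_between approx. {ss_between:.1f}")
```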


ANOVA: Example

• Example: GSS data: Schooling & Race

ANOVA
HIGHEST YEAR OF SCHOOL COMPLETED

                   Sum of Squares    df      Mean Square    F        Sig.
Between Groups         368.803          2      184.402       21.154   .000
Within Groups        23963.551       2749        8.717
Total                24332.354       2751

• Sum of Squares: between, within, and total

• Mean square variance: between and within

• Compare the F-value to the “critical F” in an F-table, or compare the p-value (“Sig.”) to α

• We reject H0 because F is big and the p-value is small.
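The mean squares and F in the ANOVA table follow directly from the sums of squares and degrees of freedom. The sketch below reproduces them. It also computes the p-value from the closed-form F(2, ν) tail area, which is valid here because J-1 = 2. The p-value is far below .0005, which is why SPSS displays it as .000.

```python
ss_between, ss_within = 368.803, 23963.551
df_between, df_within = 2, 2749

ms_between = ss_between / df_between     # 184.402 in the table
ms_within = ss_within / df_within        # 8.717 in the table
F = ms_between / ms_within               # 21.154 in the table

# With df_between = 2, the upper-tail area has a closed form:
# P(F > f) = (1 + 2f / df_within) ** (-df_within / 2)
p = (1 + 2 * F / df_within) ** (-df_within / 2)
print(f"MS_between = {ms_between:.3f}, MS_within = {ms_within:.3f}, F = {F:.3f}, p = {p:.1e}")
```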


ANOVA Example: Scheffé Test

Multiple Comparisons
Dependent Variable: HIGHEST YEAR OF SCHOOL COMPLETED
Scheffe

                                                                       95% Confidence Interval
(I) RACE      (J) RACE      Mean Difference (I-J)   Std. Error   Sig.    Lower Bound   Upper Bound
White         Black           1.05*                  .161         .000      .65          1.44
              Other Race       .10                   .227         .902     -.45           .66
Black         White          -1.05*                  .161         .000    -1.44          -.65
              Other Race      -.94*                  .263         .002    -1.59          -.30
Other Race    White           -.10                   .227         .902     -.66           .45
              Black            .94*                  .263         .002      .30          1.59

*. The mean difference is significant at the .05 level.

• The tests compare every pair of groups

• Compare the p-value (“Sig.”) to α

• Note: The White/Black difference is significant; the White/Other difference is not. The Black/Other difference is also significant.
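The Scheffé results can be approximately reproduced by hand. The sketch assumes the standard Scheffé criterion for pairwise comparisons: a difference is significant when |difference| / SE exceeds $\sqrt{(J-1)\,F_{crit}}$, and the 95% interval is the difference ± that multiplier × SE. With J = 3 and N = 2752, the multiplier is about 2.45. The sketch starts from the rounded differences and standard errors in the table, so the last digit of an interval bound can differ from the SPSS output.

```python
import math

J, N, alpha = 3, 2752, 0.05
df_within = N - J
# Critical F(2, df_within) at alpha; closed form valid because J - 1 = 2.
f_crit = (df_within / 2) * (alpha ** (-2 / df_within) - 1)
# Scheffé multiplier for pairwise comparisons.
scheffe_crit = math.sqrt((J - 1) * f_crit)

# (mean difference, std. error) from the Multiple Comparisons table.
comparisons = {
    "White - Black": (1.05, 0.161),
    "White - Other Race": (0.10, 0.227),
    "Black - Other Race": (-0.94, 0.263),
}

print(f"F_crit = {f_crit:.3f}, Scheffé multiplier = {scheffe_crit:.3f}")
for pair, (diff, se) in comparisons.items():
    lower, upper = diff - scheffe_crit * se, diff + scheffe_crit * se
    significant = abs(diff) / se > scheffe_crit
    print(f"{pair}: 95% CI ({lower:.2f}, {upper:.2f}), significant = {significant}")
```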


Comparison with T-Test

• T-test strategy: Determine the width of the sampling distribution of the difference in means…

• Use that info to assess probability that groups have same mean (difference in means = 0)

• ANOVA strategy

– Compute the F-ratio, which indicates which kind of deviation is larger: “between” vs. “within” group

• A high F-value indicates the groups are separate

• Note: For two groups, ANOVA and the (two-sided, pooled-variance) T-test produce identical results: F = t², with the same p-value.
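The two-group equivalence can be verified numerically. With two groups, the one-way ANOVA F equals the square of the pooled-variance t statistic. This sketch uses the boys' and girls' test scores from the example later in this lecture. `ss_dev` is a hypothetical helper name.

```python
import math
from statistics import mean

boys = [57.0, 64.0, 48.0]    # Group 1 scores
girls = [53.0, 87.0, 73.0]   # Group 2 scores

def ss_dev(ys):
    """Sum of squared deviations of ys from their own mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

n1, n2 = len(boys), len(girls)
N, J = n1 + n2, 2

# Pooled-variance two-sample t statistic.
pooled_var = (ss_dev(boys) + ss_dev(girls)) / (N - 2)
t = (mean(girls) - mean(boys)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# One-way ANOVA F statistic for the same two groups.
grand = mean(boys + girls)
ms_between = (n1 * (mean(boys) - grand) ** 2 + n2 * (mean(girls) - grand) ** 2) / (J - 1)
ms_within = (ss_dev(boys) + ss_dev(girls)) / (N - J)
F = ms_between / ms_within

print(f"t = {t:.3f}, t^2 = {t * t:.3f}, F = {F:.3f}")
```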


Bivariate Analyses

• Up until now, we have focused on a single variable: Y

• Even in the T-test for a difference in means and in ANOVA, we just talked about Y, measured across multiple groups…

• Alternatively, we can think of these as simple bivariate analyses

• Where group type is a “variable”

• Ex: Seeing if girls differ from boys on a test…

• … is equivalent to examining whether gender (a first variable) affects test score (a second variable).


2 Groups = Bivariate Analysis

Group 1: Boys          Group 2: Girls
Case   Score           Case   Score
1      57              1      53
2      64              2      87
3      48              3      73

Case   Gender   Score
1      0        57
2      0        64
3      0        48
4      1        53
5      1        87
6      1        73

2 Groups = Bivariate analysis of Gender and Test Score (Gender coded 0 = boy, 1 = girl)
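Once the data are stored as two variables, they can be split back into groups by the value of gender. Below is a minimal Python sketch, assuming the 0 = boy, 1 = girl coding shown in the table.

```python
from collections import defaultdict
from statistics import mean

# The six cases as (case, gender, score); gender 0 = boy, 1 = girl.
cases = [(1, 0, 57), (2, 0, 64), (3, 0, 48), (4, 1, 53), (5, 1, 87), (6, 1, 73)]

# Splitting the cases on the gender variable recovers the two original groups.
scores_by_gender = defaultdict(list)
for _case, gender, score in cases:
    scores_by_gender[gender].append(score)

for gender, scores in sorted(scores_by_gender.items()):
    print(gender, scores, round(mean(scores), 2))
```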


T-test, ANOVA, and Regression

• Both T-test and ANOVA illustrate fundamental concepts needed to understand “Regression”

• Relevant ANOVA concepts

– The idea of a “model”

– Partitioning variance

– A dependent variable

• Relevant T-test concepts

– Using the t-distribution for hypothesis tests

• Note: For many applications, regression will supersede the T-test and ANOVA

• But in some cases, they are still useful…
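One way to see the link to regression is through an ordinary least squares (OLS) regression of score on a 0/1 gender dummy. The intercept equals the mean of the group coded 0, and the slope equals the difference in group means, the same quantity the T-test examines. This sketch uses the boys/girls scores from this lecture and computes the OLS slope from its textbook formula.

```python
from statistics import mean

# Test scores with a 0/1 gender variable (0 = boy, 1 = girl).
gender = [0, 0, 0, 1, 1, 1]
score = [57, 64, 48, 53, 87, 73]

# Ordinary least squares for score = a + b * gender.
x_bar, y_bar = mean(gender), mean(score)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(gender, score)) / sum(
    (x - x_bar) ** 2 for x in gender
)
a = y_bar - b * x_bar

boys_mean = mean(s for g, s in zip(gender, score) if g == 0)
girls_mean = mean(s for g, s in zip(gender, score) if g == 1)
print(f"intercept = {a:.3f} (boys' mean {boys_mean:.3f})")
print(f"slope = {b:.3f} (difference in means {girls_mean - boys_mean:.3f})")
```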