Sociology 5811:Lecture 14: ANOVA 2
Copyright © 2005 by Evan Schofer
Do not copy or distribute without permission
Announcements
• Midterm next class
• Bring a Calculator
• Bring Pencil/Eraser
• Today:
• Wrap up ANOVA
• Midterm review activities
Review: ANOVA
• ANOVA = “ANalysis Of VAriance”
• “Oneway ANOVA”: The simplest form
• ANOVA lets us test to see if any group mean differs from the mean of all groups combined
• Answers: “Are all groups equal or not?”
• H0: All groups have the same population mean
– μ1 = μ2 = μ3 = μ4
• H1: One or more groups differ
• But, doesn’t distinguish which specific group(s) differ.
ANOVA: Concepts & Definitions
• The grand mean is the mean of all groups
• ex: mean of all entry-level workers = $8.70/hour
• The group mean is the mean of a particular sub-group of the population
• The effect (α) of a group is the difference between that group’s mean and the grand mean
• The dependent variable is our variable of interest
• We explore whether its value “depends” on a case’s group
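Review: Sum of Squared Deviation
• The total deviation can be partitioned into αj and eij components:
$$Y_{ij} - \bar{Y} = \alpha_j + e_{ij}, \qquad \alpha_j = \bar{Y}_j - \bar{Y}, \quad e_{ij} = Y_{ij} - \bar{Y}_j$$
• The total variance (SStotal) is made up of:
– αj: between group variance (SSbetween)
– eij: within group variance (SSwithin)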
– SStotal = SSbetween + SSwithin
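To see why the sums of squares add up exactly, it may help to square the partitioned deviation and sum over all cases. The sketch below assumes the usual notation (not spelled out on the slide): $\bar{Y}$ is the grand mean, $\bar{Y}_j$ is the mean of group $j$, and $n_j$ is the size of group $j$.

```latex
\begin{aligned}
\sum_{j=1}^{J}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y})^2
  &= \sum_{j=1}^{J}\sum_{i=1}^{n_j} \bigl[(\bar{Y}_j - \bar{Y}) + (Y_{ij} - \bar{Y}_j)\bigr]^2 \\
  &= \underbrace{\sum_{j=1}^{J} n_j (\bar{Y}_j - \bar{Y})^2}_{SS_{between}}
   + \underbrace{\sum_{j=1}^{J}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2}_{SS_{within}}
   + 2\sum_{j=1}^{J} (\bar{Y}_j - \bar{Y}) \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)
\end{aligned}
```

The last term is zero because deviations from a group's own mean always sum to zero within that group, which leaves SStotal = SSbetween + SSwithin.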
Review: Sum of Squared Variance
• The sum of squares grows as N gets larger.
• To derive a more comparable measure, we “average” it, just as with the variance: i.e., divide by N-1
• It is desirable, for similar reasons, to “average” the Sum of Squares between/within
• Result: the “Mean Square” variance
– MSbetween and MSwithin
Sum of Squared Variance
• Choosing relevant denominators we get:
$$MS_{between} = \frac{\sum_{j=1}^{J} n_j (\bar{Y}_j - \bar{Y})^2}{J - 1}$$

$$MS_{within} = \frac{\sum_{j=1}^{J} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2}{N - J}$$
Mean Squares and Group Differences
• MSbetween > MSwithin: [figure not reproduced]
• MSbetween < MSwithin: [figure not reproduced]
Mean Squares and Group Differences
• Question: Which suggests that group means are quite different:
– MSbetween > MSwithin or MSbetween < MSwithin
• Answer: If between group variance is greater than within, the groups are quite distinct
• It is unlikely that they came from a population with the same mean
• But, if within is greater than between, the groups aren’t very different – they overlap a lot
• It is plausible that μ1 = μ2 = μ3 = μ4
The F Ratio
• The ratio of MSbetween to MSwithin is referred to as the F ratio:
$$F_{J-1,\,N-J} = \frac{MS_{between}}{MS_{within}}$$
• If MSbetween > MSwithin then F > 1
• If MSbetween < MSwithin then F < 1
• Higher F indicates that groups are more separate
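As a rough illustration of how the mean squares and the F ratio fit together, here is a minimal Python sketch. The three groups and their hourly wages are made-up numbers chosen so the arithmetic is easy to follow, and `oneway_anova` is a hypothetical helper name, not something from the lecture.

```python
# Hypothetical hourly wages for three groups (illustrative numbers only).
groups = {
    "A": [8.0, 9.0, 10.0],
    "B": [6.0, 7.0, 8.0],
    "C": [10.0, 11.0, 12.0],
}


def mean(values):
    return sum(values) / len(values)


def oneway_anova(groups):
    """Return (MS_between, MS_within, F) for a dict of group -> list of scores."""
    all_values = [y for ys in groups.values() for y in ys]
    n_total = len(all_values)   # N
    n_groups = len(groups)      # J
    grand_mean = mean(all_values)

    # Between: squared effect of each group, weighted by group size.
    ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    # Within: squared deviation of each case from its own group mean.
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)

    ms_between = ss_between / (n_groups - 1)      # divide by J - 1
    ms_within = ss_within / (n_total - n_groups)  # divide by N - J
    return ms_between, ms_within, ms_between / ms_within


ms_between, ms_within, f_ratio = oneway_anova(groups)
print(ms_between, ms_within, f_ratio)  # 12.0 1.0 12.0
```

Here the group means (9, 7, 11) are far apart relative to the spread inside each group, so MSbetween is much larger than MSwithin and F is well above 1.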
The F Ratio
• The F ratio has a sampling distribution
• That is, estimates of F vary depending on exactly which sample you draw
• Again, this sampling distribution has known properties that can be looked up in a table
• The “F-distribution”
– Different from z & t!
• Statisticians have determined how much area falls under the curve for a given value of F…
• So, we can test hypotheses.
The F Ratio
• Assumptions required for hypothesis testing using an F-statistic
• 1. J groups are drawn from a normally distributed population
• 2. Population variances of groups are equal
• If these assumptions hold, the F statistic can be looked up in an F-distribution table
– Much like T distributions
• But, there are 2 degrees of freedom: J-1 and N-J
– One for number of groups, one for N.
The F Ratio
• Example: Looking for wage discrimination within a firm
• The company has workers of three ethnic groups:
• Whites, African-Americans, Asian-Americans
• You observe in a sample of 200 employees:
• Y-barWhite = $8.78 / hour
• Y-barAfAm = $8.52 / hour
• Y-barAsianAm = $8.91 / hour
The F Ratio
• Suppose you calculate the following from your sample:
• F = 6.24
• Recall that N = 200, J = 3
• Degrees of Freedom: J-1 = 2, N-J = 197
• If α = .05, the critical F value for 2, 197 is 3.00
• See Knoke, p. 514
• The observed F easily exceeds the critical value
• Thus, we can reject H0
• We can conclude that the groups do not all have the same population mean.
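The table lookup in this example can also be sketched in code. The snippet below relies on a special case: when the numerator degrees of freedom equal 2 (that is, J = 3), the upper-tail area of the F-distribution has the closed form (1 + 2F/d)^(-d/2), where d = N-J. The function names are hypothetical, and this shortcut does not apply for other values of J.

```python
def f_upper_tail_df2(f_value, df_within):
    """P(F > f_value) for an F distribution with (2, df_within) degrees of freedom.

    Valid only when the numerator df is exactly 2 (three groups).
    """
    return (1.0 + 2.0 * f_value / df_within) ** (-df_within / 2.0)


def f_critical_df2(alpha, df_within):
    """Critical F value that leaves area alpha in the upper tail (numerator df = 2)."""
    return (df_within / 2.0) * (alpha ** (-2.0 / df_within) - 1.0)


N, J = 200, 3
alpha = 0.05
f_observed = 6.24

df_between, df_within = J - 1, N - J          # 2 and 197
critical = f_critical_df2(alpha, df_within)
p_value = f_upper_tail_df2(f_observed, df_within)
reject_h0 = f_observed > critical

print(f"critical F = {critical:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

The computed critical value comes out slightly above the tabled 3.00. Printed F tables usually list only selected denominator degrees of freedom, so a small gap like this is to be expected, and the conclusion (reject H0) is the same either way.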
The F Ratio: Visually
• Rough sketch of F-distribution
• N = 200, J = 3; Degrees of Freedom: J-1 = 2, N-J = 197
• If α = .05, the critical F value for 2, 197 is 3.00
[Figure: F-distribution curve over F = 0 to 5; the critical α area in the upper tail is labeled “Reject H0”]
Post-Hoc Tests
• Limitation: ANOVA doesn’t tell you which group(s) differ from each other
• Solution: “Post-Hoc Tests”:
• e.g., Bonferroni, Scheffe, Tukey tests
• Tests provide pair-wise comparison of all groups
– Ex: Bonferroni
• Corrects (reduces) α for pair-wise tests, to avoid Type I error
– Which one should you use?
• Bonferroni & Scheffe are very conservative
• For most purposes, they all lead to the same conclusions
• Consult an advanced stats text to learn the subtle differences.
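To make the Bonferroni idea concrete, the sketch below counts the pair-wise comparisons among the three groups from the wage example and divides α by that count. The uncorrected family-wise error rate shown for comparison assumes the tests are independent, which pair-wise tests on shared groups are not, so treat that figure as a rough guide only.

```python
from itertools import combinations

groups = ["White", "African-American", "Asian-American"]
alpha = 0.05

# Every pair of groups gets its own comparison: J(J-1)/2 tests in total.
pairs = list(combinations(groups, 2))
n_tests = len(pairs)

# Bonferroni: each individual test must clear a stricter threshold.
per_test_alpha = alpha / n_tests

# Chance of at least one false positive if each test used alpha = .05
# (assumes independent tests; an approximation here).
uncorrected_familywise = 1 - (1 - alpha) ** n_tests

print(n_tests, round(per_test_alpha, 4), round(uncorrected_familywise, 4))
```

With three groups there are three comparisons, so each test is run at roughly α = .0167 instead of .05.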
ANOVA: Example
• Example: GSS data: Schooling & Race
• Oneway ANOVA, plus descriptives and Scheffe Post-hoc test
• Note: N = 2752, grand mean = 13.36
Descriptives: HIGHEST YEAR OF SCHOOL COMPLETED
(Lower Bound and Upper Bound give the 95% Confidence Interval for Mean)

Group        N      Mean    Std. Deviation   Std. Error   Lower Bound   Upper Bound
White        2170   13.52   2.934            .063         13.40         13.64
Black        398    12.47   2.882            .144         12.19         12.76
Other Race   184    13.42   3.307            .244         12.94         13.90
Total        2752   13.36   2.974            .057         13.25         13.47
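The descriptives alone are nearly enough to rebuild the ANOVA. The sketch below is an approximation, not the output of any statistics package: it plugs the rounded group sizes, means, and standard deviations from the table into the sum-of-squares formulas, so the results should be close to, but not exactly equal to, the reported ANOVA figures.

```python
# Summary statistics copied from the Descriptives table: (n, mean, std. deviation).
descriptives = {
    "White": (2170, 13.52, 2.934),
    "Black": (398, 12.47, 2.882),
    "Other Race": (184, 13.42, 3.307),
}

n_total = sum(n for n, _, _ in descriptives.values())  # N
n_groups = len(descriptives)                            # J

# Grand mean as the n-weighted average of the group means.
grand_mean = sum(n * m for n, m, _ in descriptives.values()) / n_total

# SS_between: each group's squared effect, weighted by group size.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in descriptives.values())

# SS_within: (n_j - 1) * s_j^2 recovers each group's sum of squared deviations.
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in descriptives.values())

ms_between = ss_between / (n_groups - 1)
ms_within = ss_within / (n_total - n_groups)
f_ratio = ms_between / ms_within

print(round(grand_mean, 2), round(ss_between, 1), round(ss_within, 1), round(f_ratio, 2))
```

Because the inputs are rounded to two or three decimals, the reconstructed F lands near the reported 21.154 rather than matching it digit for digit.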
ANOVA: Example
• Example: GSS data: Schooling & Race

ANOVA: HIGHEST YEAR OF SCHOOL COMPLETED

                 Sum of Squares   df     Mean Square   F        Sig.
Between Groups   368.803          2      184.402       21.154   .000
Within Groups    23963.551        2749   8.717
Total            24332.354        2751

• Sum of Squares: Between, within, and total
• Mean square variance: between and within
• Compare F-value to “critical f” in F-table. Or compare p-value (“Sig.”) to α
• We reject H0 because F is big, p-value is small.
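Each column of the ANOVA table follows from the ones to its left, which is easy to check by hand or in a few lines of code. The p-value line below uses a closed form that holds only because the between-groups df is 2; it is a shortcut for this example, not a general method.

```python
# Values copied from the ANOVA table.
ss_between, df_between = 368.803, 2
ss_within, df_within = 23963.551, 2749

ss_total = ss_between + ss_within        # should match the Total row
ms_between = ss_between / df_between     # Mean Square = Sum of Squares / df
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within         # F = MS_between / MS_within

# Upper-tail area of F with (2, df_within) df; valid only for numerator df = 2.
p_value = (1.0 + 2.0 * f_ratio / df_within) ** (-df_within / 2.0)

print(round(ms_between, 3), round(ms_within, 3), round(f_ratio, 3), p_value < 0.0005)
```

A p-value this small rounds to the “.000” shown in the Sig. column, which is well below α = .05.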
ANOVA Example: Scheffe Test

Multiple Comparisons
Dependent Variable: HIGHEST YEAR OF SCHOOL COMPLETED
Scheffe
(Lower Bound and Upper Bound give the 95% Confidence Interval)

(I) RACE     (J) RACE     Mean Difference (I-J)   Std. Error   Sig.   Lower Bound   Upper Bound
White        Black        1.05*                   .161         .000   .65           1.44
White        Other Race   .10                     .227         .902   -.45          .66
Black        White        -1.05*                  .161         .000   -1.44         -.65
Black        Other Race   -.94*                   .263         .002   -1.59         -.30
Other Race   White        -.10                    .227         .902   -.66          .45
Other Race   Black        .94*                    .263         .002   .30           1.59

*. The mean difference is significant at the .05 level.

• Tests compare every pair of groups
• Compare p-value (“Sig.”) to α
• Note: White/Black difference is significant; White/Other difference is not
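For readers curious where the Scheffe numbers come from, the sketch below rebuilds the standard errors and approximate p-values from the group sizes, the rounded group means, and MSwithin = 8.717. It assumes the standard Scheffe criterion for a pair-wise contrast, F = (difference / std. error)² / (J-1), compared against an F-distribution with J-1 and N-J degrees of freedom. The p-value formula is a closed form that works only because J-1 = 2 here, and `scheffe_pair` is a hypothetical helper name.

```python
import math
from itertools import combinations

# (n, mean) per group, copied from the Descriptives table (means are rounded).
groups = {"White": (2170, 13.52), "Black": (398, 12.47), "Other Race": (184, 13.42)}
ms_within, df_within = 8.717, 2749  # from the ANOVA table
n_groups = len(groups)              # J = 3


def scheffe_pair(name_i, name_j):
    """Return (mean difference, std. error, approximate Scheffe p-value) for one pair."""
    n_i, mean_i = groups[name_i]
    n_j, mean_j = groups[name_j]
    diff = mean_i - mean_j
    std_error = math.sqrt(ms_within * (1.0 / n_i + 1.0 / n_j))
    f_scheffe = (diff / std_error) ** 2 / (n_groups - 1)
    # Upper-tail area of F(2, df_within); valid only because J - 1 = 2.
    p_value = (1.0 + 2.0 * f_scheffe / df_within) ** (-df_within / 2.0)
    return diff, std_error, p_value


results = {pair: scheffe_pair(*pair) for pair in combinations(groups, 2)}
for (a, b), (diff, se, p) in results.items():
    print(f"{a} - {b}: diff = {diff:.2f}, se = {se:.3f}, p = {p:.3f}")
```

The standard errors should agree with the table to three decimals. The differences and p-values can be off in the last digit because the sketch starts from means rounded to two decimals.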
Comparison with T-Test
• T-test strategy: Determine the width of the sampling distribution of the difference in means…
• Use that info to assess probability that groups have same mean (difference in means = 0)
• ANOVA strategy
• Compute F-ratio, which indicates what kind of deviation is larger: “between” vs. “within” group
• High F-value indicates groups are separate
• Note: For two groups, ANOVA and T-test produce identical results.
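The two-group equivalence can be checked numerically: with the pooled-variance (equal variances) t-test, the F ratio equals t squared. The sketch below uses the six test scores from the lecture's boys/girls example; the helper names are hypothetical.

```python
import math

boys = [57, 64, 48]
girls = [53, 87, 73]


def mean(values):
    return sum(values) / len(values)


def sum_sq_dev(values):
    """Sum of squared deviations from the group's own mean."""
    m = mean(values)
    return sum((y - m) ** 2 for y in values)


n1, n2 = len(boys), len(girls)

# Pooled-variance t-test for the difference in means.
pooled_var = (sum_sq_dev(boys) + sum_sq_dev(girls)) / (n1 + n2 - 2)
t_stat = (mean(girls) - mean(boys)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Oneway ANOVA on the same two groups (J = 2, so J - 1 = 1).
grand_mean = mean(boys + girls)
ms_between = (n1 * (mean(boys) - grand_mean) ** 2
              + n2 * (mean(girls) - grand_mean) ** 2) / (2 - 1)
ms_within = pooled_var  # SS_within / (N - J) is the pooled variance
f_ratio = ms_between / ms_within

print(round(t_stat ** 2, 4), round(f_ratio, 4))
```

Both printed values should match, which is why the two tests give the same p-value when there are only two groups and equal variances are assumed.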
Bivariate Analyses
• Up until now, we have focused on a single variable: Y
• Even in T-test for difference in means & ANOVA, we just talked about Y – but for multiple groups…
• Alternately, we can think of these as simple bivariate analyses
• Where group type is a “variable”
• Ex: Seeing if girls differ from boys on a test…
• … is equivalent to examining whether gender (a first variable) affects test score (a second variable).
2 Groups = Bivariate Analysis

Group 1: Boys
Case   Score
1      57
2      64
3      48

Group 2: Girls
Case   Score
1      53
2      87
3      73

Combined (Gender: 0 = boy, 1 = girl)
Case   Gender   Score
1      0        57
2      0        64
3      0        48
4      1        53
5      1        87
6      1        73

2 Groups = Bivariate analysis of Gender and Test Score
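The reshaping shown above can be sketched in code, along with a preview of why it matters: once gender is coded 0/1, the least-squares slope of score on gender is simply the difference between the two group means. The slope and intercept formulas are the standard bivariate least-squares ones, included here as an illustration rather than as part of the lecture.

```python
boys = [57, 64, 48]
girls = [53, 87, 73]

# Stack the two groups into (gender, score) rows: 0 = boy, 1 = girl.
rows = [(0, score) for score in boys] + [(1, score) for score in girls]

x = [gender for gender, _ in rows]
y = [score for _, score in rows]
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Least-squares slope and intercept for score regressed on the gender dummy.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

mean_boys = sum(boys) / len(boys)
mean_girls = sum(girls) / len(girls)

print(round(intercept, 3), round(slope, 3), round(mean_girls - mean_boys, 3))
```

The intercept equals the boys' mean and the slope equals the girls' mean minus the boys' mean, so the "two groups" view and the "two variables" view describe the same comparison.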
T-test, ANOVA, and Regression
• Both T-test and ANOVA illustrate fundamental concepts needed to understand “Regression”
• Relevant ANOVA concepts
• The idea of a “model”
• Partitioning variance
• A dependent variable
• Relevant T-test concepts
• Using the t-distribution for hypothesis tests
• Note: For many applications, regression will supersede T-test, ANOVA
• But in some cases, they are still useful…