Analysis of variance (ANOVA)
Comparing the means of more than two groups
Example: Cost of mating in male fruit flies (Drosophila)
Treatments: place males with and without unmated (virgin) females
Five treatments ("groups"), randomly assigned
Response variable: longevity (days)
The data (n = 25 males per group):
GROUPS:
1preg:   46 42 65 46 58 42 48 58 50 80 63 65 70 70 72 97 46 56 70 70 72 76 90 76 92
1virgin: 21 40 44 54 36 40 56 60 48 53 60 60 65 68 60 81 81 48 48 56 68 75 81 48 68
8preg:   35 37 49 46 63 39 46 56 63 65 56 65 70 63 65 70 77 81 86 70 70 77 77 81 77
8virgin: 16 19 19 32 33 33 30 42 42 33 26 30 40 54 34 34 47 47 42 47 54 54 56 60 44
none:    40 37 44 47 47 47 68 47 54 61 71 75 89 58 59 62 79 96 58 62 70 72 75 96 75
Results: histograms of male longevity (n = 25 males per group)
Descriptive statistics
Treatment   mean (Ȳi)   median   si     ni
1preg       64.8        65       15.6   25
1virgin     56.8        56       14.9   25
8preg       63.4        65       14.5   25
8virgin     38.7        40       12.1   25
none        63.6        62       16.5   25
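These summary statistics can be reproduced directly from the raw data; a minimal sketch in Python (standard library only):

```python
from statistics import mean, median, stdev

# Longevity (days) of male fruit flies, n = 25 per treatment group
groups = {
    "1preg":   [46, 42, 65, 46, 58, 42, 48, 58, 50, 80, 63, 65, 70,
                70, 72, 97, 46, 56, 70, 70, 72, 76, 90, 76, 92],
    "1virgin": [21, 40, 44, 54, 36, 40, 56, 60, 48, 53, 60, 60, 65,
                68, 60, 81, 81, 48, 48, 56, 68, 75, 81, 48, 68],
    "8preg":   [35, 37, 49, 46, 63, 39, 46, 56, 63, 65, 56, 65, 70,
                63, 65, 70, 77, 81, 86, 70, 70, 77, 77, 81, 77],
    "8virgin": [16, 19, 19, 32, 33, 33, 30, 42, 42, 33, 26, 30, 40,
                54, 34, 34, 47, 47, 42, 47, 54, 54, 56, 60, 44],
    "none":    [40, 37, 44, 47, 47, 47, 68, 47, 54, 61, 71, 75, 89,
                58, 59, 62, 79, 96, 58, 62, 70, 72, 75, 96, 75],
}

# One row per treatment: mean, median, sample SD, sample size
summary = {name: (mean(y), median(y), stdev(y), len(y)) for name, y in groups.items()}
for name, (m, md, s, n) in summary.items():
    print(f"{name:8s} mean={m:5.2f}  median={md}  sd={s:4.1f}  n={n}")
```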
Hypothesis testing with ANOVA
Like a two-sample t-test, but to test differences between means of more than two groups
H0: μ1 = μ2 = μ3 = … = μk
HA: At least one of the μi is different
Hypothesis testing with ANOVA
Rejecting H0 in ANOVA indicates that at least one of the means μi is different from the others.
Scenarios being compared
[Figure: frequency distributions of Y under two scenarios: μ1 = μ2 = μ3, and not all μ's equal]
Why we compare variances
Under H0, the variance among group means should be
σ²Ȳ = σ²/n, or equivalently n σ²Ȳ = σ²,
where
• σ²Ȳ is the variance (squared SD) among group means Ȳi
• n is the sample size within groups
• σ² is the variance in Y within each group
If the null hypothesis is not true, the variance among groups should be larger than this
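A quick simulation (a sketch; the population mean, σ, group size, and number of groups are arbitrary choices) illustrates the claim: when every group is drawn from the same population, the variance among group means lands near σ²/n.

```python
import random
from statistics import mean, variance

random.seed(1)
sigma, n = 3.0, 25        # within-group SD and within-group sample size
num_groups = 2000         # many groups, all from the SAME population (H0 true)

# Each group mean is the average of n draws from Normal(50, sigma)
group_means = [mean(random.gauss(50, sigma) for _ in range(n))
               for _ in range(num_groups)]

# Variance among group means should estimate sigma^2 / n = 0.36
print(variance(group_means), sigma**2 / n)
```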
Variance within groups ("pooled sample variance")
Error sum of squares
SSerror = Σ si² (ni − 1)
Error degrees of freedom
dferror = Σ (ni − 1) = N − k,
where N is the total number of data points in all groups
Mean Square Error (MSerror)
MSerror = SSerror / dferror
Variance among groups
Groups sum of squares
SSgroups = Σ ni (Ȳi − Ȳ)²,
where Ȳ is the grand mean,
Ȳ = Σ ni Ȳi / N.
Degrees of freedom for groups
dfgroups = k − 1.
Mean Square Groups (MSgroups)
MSgroups = SSgroups / dfgroups.
F-ratio
F = MSgroups / MSerror
Under H0, MSgroups ≈ MSerror (except for chance), so under H0, F ≈ 1 (except for chance)
F test
Reject H0 if F is significantly larger than 1. Compare F with the critical value of the F distribution, F0.05(1),dfgroups,dferror.
Reject H0 if F ≥ F0.05(1),dfgroups,dferror
F distribution
ANOVA table for fruit fly experiment
Source   Sum of Squares   df    Mean Squares   F          P-value
Groups   11939.28         4     2984.8200      13.61195   <0.001
Error    26313.52         120   219.2793
Total    38252.80         124
F0.05(1),4 ,120 = 2.447
Reject H0. Conclusion: mean longevity is not equal in all groups
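The ANOVA table can be recomputed from the raw data in a few lines of Python; a sketch using only the standard library (scipy.stats.f_oneway would give the same F):

```python
from statistics import mean

# Longevity data (days), n = 25 males per treatment group,
# in the order 1preg, 1virgin, 8preg, 8virgin, none
groups = [
    [46, 42, 65, 46, 58, 42, 48, 58, 50, 80, 63, 65, 70,
     70, 72, 97, 46, 56, 70, 70, 72, 76, 90, 76, 92],
    [21, 40, 44, 54, 36, 40, 56, 60, 48, 53, 60, 60, 65,
     68, 60, 81, 81, 48, 48, 56, 68, 75, 81, 48, 68],
    [35, 37, 49, 46, 63, 39, 46, 56, 63, 65, 56, 65, 70,
     63, 65, 70, 77, 81, 86, 70, 70, 77, 77, 81, 77],
    [16, 19, 19, 32, 33, 33, 30, 42, 42, 33, 26, 30, 40,
     54, 34, 34, 47, 47, 42, 47, 54, 54, 56, 60, 44],
    [40, 37, 44, 47, 47, 47, 68, 47, 54, 61, 71, 75, 89,
     58, 59, 62, 79, 96, 58, 62, 70, 72, 75, 96, 75],
]

N = sum(len(y) for y in groups)                    # 125 data points in all
k = len(groups)                                    # 5 groups
grand_mean = sum(sum(y) for y in groups) / N

# SS_groups = sum of n_i * (group mean - grand mean)^2
ss_groups = sum(len(y) * (mean(y) - grand_mean) ** 2 for y in groups)
# SS_error = sum of squared deviations from each group's own mean
ss_error = sum((x - mean(y)) ** 2 for y in groups for x in y)

ms_groups = ss_groups / (k - 1)                    # df_groups = k - 1 = 4
ms_error = ss_error / (N - k)                      # df_error = N - k = 120
F = ms_groups / ms_error

print(round(ss_groups, 2), round(ss_error, 2), round(F, 2))
```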
Cost of mating in male fruit flies
Implication: sex is deadly (in males)
R2 ("R-squared")
R2 measures the fraction of the variation in Y that is "explained" by group differences.
Based on the fact that the total sum of squares is split into its two component parts, the sum of squares of error and the sum of squares of groups:
SStotal = SSgroups + SSerror
R² = SSgroups / SStotal.
Fruit flies
R² = 11939.28 / 38252.80 = 0.312
Conclusion: 31% of the variation in longevity is "explained" by treatment
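In code, using the sums of squares from the ANOVA table above:

```python
ss_groups, ss_error = 11939.28, 26313.52   # from the fruit fly ANOVA table
ss_total = ss_groups + ss_error            # 38252.80
r_squared = ss_groups / ss_total           # fraction of variation explained

print(round(r_squared, 3))                 # 0.312
```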
ANOVA vs two-sample t-test An ANOVA with k=2 is mathematically equivalent to a two-tailed, 2-sample t-test of H0: μ1 − μ2 = 0 HA: μ1 − μ2 ≠ 0
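A quick numerical check of this equivalence on a small made-up data set (the numbers are arbitrary): the one-way ANOVA F statistic equals the square of the pooled two-sample t statistic.

```python
from math import sqrt
from statistics import mean

# Two small, made-up samples
a = [5.0, 7.0, 6.0, 9.0, 8.0]
b = [4.0, 6.0, 5.0, 5.0, 3.0]
na, nb = len(a), len(b)
N, k = na + nb, 2

# Pooled two-sample t statistic
sp2 = (sum((x - mean(a))**2 for x in a) +
       sum((x - mean(b))**2 for x in b)) / (N - k)
t = (mean(a) - mean(b)) / sqrt(sp2 * (1/na + 1/nb))

# One-way ANOVA F statistic on the same two groups
gm = mean(a + b)
ms_groups = (na * (mean(a) - gm)**2 + nb * (mean(b) - gm)**2) / (k - 1)
F = ms_groups / sp2        # MS_error equals the pooled variance when k = 2

print(round(F, 6), round(t**2, 6))
```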
Assumptions of ANOVA
• The measurements in every group are a random sample from the corresponding population.
• The variable has a normal distribution in all k populations.
• The variance is the same in all k populations.
The robustness of ANOVA
• The ANOVA is fairly robust to deviations from the assumption of normality, particularly when sample size is large.
• This robustness stems from a property of sample means described by the Central Limit Theorem: the sampling distribution of means is approximately normal when sample size is large, even when the variable itself does not have a normal distribution.
• ANOVA is also fairly robust to departures from the assumption of equal variance in the k populations if the samples sizes are all equal or nearly so.
Strategies if assumptions of ANOVA not met and we can't rely on robustness property
• Transformations (e.g., log-transformation, square root transformation, and arcsine transformation).
• Nonparametric alternatives
Nonparametric alternatives to ANOVA: Kruskal-Wallis test
• Analogous to the Mann-Whitney U-test, but for more than two groups
• Based on ranks
• Test statistic H has an approximately χ2 distribution
H0: The population median is the same in all k groups. HA: At least one of the groups has a different median.
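As a sketch of running the test in practice (assuming scipy is available, which implements it as scipy.stats.kruskal), here on three of the fruit fly groups:

```python
from scipy.stats import kruskal   # scipy assumed available

longevity_1preg = [46, 42, 65, 46, 58, 42, 48, 58, 50, 80, 63, 65, 70,
                   70, 72, 97, 46, 56, 70, 70, 72, 76, 90, 76, 92]
longevity_8virgin = [16, 19, 19, 32, 33, 33, 30, 42, 42, 33, 26, 30, 40,
                     54, 34, 34, 47, 47, 42, 47, 54, 54, 56, 60, 44]
longevity_none = [40, 37, 44, 47, 47, 47, 68, 47, 54, 61, 71, 75, 89,
                  58, 59, 62, 79, 96, 58, 62, 70, 72, 75, 96, 75]

# H is compared to a chi-squared distribution with k - 1 = 2 df
H, p = kruskal(longevity_1preg, longevity_8virgin, longevity_none)
print(round(H, 2), p)
```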
ANOVA: which means are different? H0: μ1 = μ2 = μ3 = … = μk HA: At least one of the μi is different (but which?)
ANOVA: which means are different? Two approaches to addressing this question:
1. Planned comparisons
2. Unplanned comparisons
Planned comparisons
One or a very small number of focal comparisons that were planned at the time the study was designed.
Unplanned comparisons A comprehensive search for differences between groups
What's so important about the difference?
1. Planned comparisons: • No need to correct for multiple comparisons • Very few planned comparisons permitted
2. Unplanned comparisons
• Correction for multiple comparisons required because of rising Type 1 error rates
Planned confidence interval for the difference between two means
Formula for confidence interval similar to that for two-sample case:
(Ȳi − Ȳj) − t0.05(2),N−k SE < μi − μj < (Ȳi − Ȳj) + t0.05(2),N−k SE
except that we calculate SE using MSerror instead of the pooled sample variance for just the two samples
SE = √( MSerror (1/ni + 1/nj) ).
and we use dferror = N − k for degrees of freedom
Ȳi − Ȳj = 38.72 − 63.36 = −24.64
SE = 4.188, df = 120, t0.05(2),120 = 1.98 95% confidence interval:
−32.93 < μi − μj < −16.35
Example of unplanned comparisons The Tukey-Kramer method for testing
all pairs of means
Comparison          Ȳi − Ȳj   SE      q       critical q(0.05)   conclusion
1preg vs 8virgin    26.08     4.188   6.227   2.770              reject
1preg vs 1virgin    8.04      4.188   1.920   2.770              do not reject
1preg vs 8preg      1.44      4.188   0.344   2.770              do not reject
1preg vs none       1.24      4.188   0.296   2.770              do not reject
none vs 8virgin     24.84     4.188   5.931   2.770              reject
none vs 1virgin     6.80      4.188   1.624   2.770              do not reject
none vs 8preg       0.20      4.188   0.048   2.770              do not reject
8preg vs 8virgin    24.64     4.188   5.883   2.770              reject
8preg vs 1virgin    6.60      4.188   1.576   2.770              do not reject
1virgin vs 8virgin  18.04     4.188   4.307   2.770              reject
Group means in increasing order:
8virgin   1virgin   8preg   none    1preg
38.72     56.76     63.36   63.56   64.80
The Tukey-Kramer method for testing all pairs of means
With the Tukey-Kramer method, the probability of making at least one Type 1 error throughout the
course of testing all pairs of means is no greater than the significance level α.
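A sketch of one such comparison (assuming scipy ≥ 1.7 for scipy.stats.studentized_range; note the critical q used here is the studentized range divided by √2, matching the convention of the table above):

```python
from math import sqrt
from scipy.stats import studentized_range   # scipy >= 1.7 assumed

k, df_error, ms_error, n = 5, 120, 219.2793, 25
se = sqrt(ms_error * (1 / n + 1 / n))       # about 4.188

# Critical value; scipy tabulates the studentized range itself,
# so divide by sqrt(2) to match the q convention used above
q_crit = studentized_range.ppf(0.95, k, df_error) / sqrt(2)

# One comparison: 1preg vs 8virgin
q = abs(64.80 - 38.72) / se
print(round(q_crit, 3), round(q, 3), q > q_crit)
```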
Fixed vs random effects ANOVA
1. Fixed effects: With fixed effects, the treatments are chosen by the experimenter. They are repeatable and of direct interest.
2. Random effects: With random effects, the treatments are a random sample from a "population" of treatments.
For single-factor ANOVA, the F-test of the null and alternative hypotheses is the same for fixed and random effects
Example of random effects ANOVA: Flycatcher patch height in 2 different years
Patch height measurements
Bird   Height 1 (mm)   Height 2 (mm)      Bird   Height 1 (mm)   Height 2 (mm)
1      10.5            9.3                16     8.1             7.0
2      10.6            9.2                17     8.1             6.8
3      8.7             9.3                18     7.4             6.7
4      8.6             9.1                19     6.7             6.8
5      9.0             9.0                20     6.3             6.7
6      9.3             8.7                21     6.8             6.4
7      8.9             8.7                22     6.7             6.2
8      9.0             7.6                23     6.1             5.8
9      7.9             8.0                24     6.5             5.6
10     7.6             7.8                25     6.7             5.3
11     7.6             8.1                26     6.5             5.0
12     7.7             8.1                27     5.7             5.1
13     6.8             7.9                28     6.7             7.8
14     7.2             7.5                29     6.6             7.6
15     7.8             7.1                30     6.6             7.7
Random effects ANOVA
• Each bird is a "group", randomly sampled from a population of groups (birds)
• n = 2 measurements per group
• We don't care so much about the individual birds; they are just a sample
• What we really want to know is whether the trait varies in the population (and by how much)
H0: Patch height does not vary in the population
HA: Patch height varies in the population
Random effects ANOVA
Source   Sum of Squares   df    Mean Squares   F       P-value
Groups   82.494           29    2.844621       7.946   <0.001
Error    10.740           30    0.358000
Total    93.234           59
F0.05(1),29 ,30 = 1.847
Reject H0 Conclusion: patch height varies in the population
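This table can be recomputed from the paired measurements; a standard-library sketch in which each bird is a group of n = 2:

```python
from statistics import mean

# (Height 1, Height 2) in mm for each of 30 birds
birds = [
    (10.5, 9.3), (10.6, 9.2), (8.7, 9.3), (8.6, 9.1), (9.0, 9.0),
    (9.3, 8.7), (8.9, 8.7), (9.0, 7.6), (7.9, 8.0), (7.6, 7.8),
    (7.6, 8.1), (7.7, 8.1), (6.8, 7.9), (7.2, 7.5), (7.8, 7.1),
    (8.1, 7.0), (8.1, 6.8), (7.4, 6.7), (6.7, 6.8), (6.3, 6.7),
    (6.8, 6.4), (6.7, 6.2), (6.1, 5.8), (6.5, 5.6), (6.7, 5.3),
    (6.5, 5.0), (5.7, 5.1), (6.7, 7.8), (6.6, 7.6), (6.6, 7.7),
]

n = 2                                  # measurements per bird
k = len(birds)                         # 30 groups (birds)
N = n * k                              # 60 measurements in all
grand_mean = sum(h for pair in birds for h in pair) / N

# Among-bird and within-bird sums of squares
ss_groups = sum(n * (mean(pair) - grand_mean) ** 2 for pair in birds)
ss_error = sum((h - mean(pair)) ** 2 for pair in birds for h in pair)

F = (ss_groups / (k - 1)) / (ss_error / (N - k))
print(round(ss_groups, 3), round(ss_error, 3), round(F, 3))
```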
Repeatability
Repeatability is the fraction of the variance in the trait that is among groups,
Repeatability = s²A / (s²A + MSerror),
where the variance among groups is
s²A = (MSgroups − MSerror) / n.
Repeatability of patch height
s²A = (2.844621 − 0.358000) / 2 = 1.24331,
yielding
Repeatability = 1.24331 / (1.24331 + 0.358000) = 0.776.
Repeatability of patch height
Conclusion: about 78% of the variance in patch height in the population is among birds, whereas the remaining 22% is variance from measurement to measurement on the same birds.
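Plugging the mean squares from the random-effects ANOVA table into these formulas:

```python
ms_groups, ms_error, n = 2.844621, 0.358000, 2   # from the random-effects ANOVA table

s2_A = (ms_groups - ms_error) / n                # variance among birds
repeatability = s2_A / (s2_A + ms_error)         # fraction of variance among groups

print(round(s2_A, 5), round(repeatability, 3))
```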