One-Way Analysis of Variance
Nathaniel E. Helwig
Assistant Professor of Psychology and Statistics, University of Minnesota (Twin Cities)
Updated 04-Jan-2017
Copyright
Copyright © 2017 by Nathaniel E. Helwig
Outline of Notes
1) Categorical Predictors: Overview; Dummy coding; Effect coding
2) One-Way ANOVA: Model form & assumptions; Estimation & basic inference; Memory example (part 1)
3) Multiple Comparisons: Comparing means; Bonferroni correction; Tukey correction; Scheffé correction; Summary of corrections; Memory example (part 2)
Categorical Predictors
Categorical Predictors Overview
Categorical Variables (revisited)
Suppose that $X \in \{x_1, \ldots, x_g\}$ is a categorical variable with $g$ levels.
- Categorical variables are also called factors in the ANOVA context
- Example: sex ∈ {female, male} has two levels
- Example: drug ∈ {A, B, C} has three levels

To code a categorical variable (with $g$ levels) in a regression model, we need to include $g - 1$ different variables in the model.
- If we know $\mu = \frac{1}{g}\sum_{j=1}^{g}\mu_j$, where $\mu_j$ is the mean for the $j$-th factor level,
- then we know $\mu_g = g\left(\mu - \frac{1}{g}\sum_{j=1}^{g-1}\mu_j\right)$ by definition.
- There are only $g$ free parameters in total, so we cannot estimate $\mu$ and $\{\mu_j\}_{j=1}^{g}$ separately.
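As a quick illustration of the $g - 1$ coding variables, here is a minimal R sketch (the drug factor below is a made-up example, not data from these notes) showing that model.matrix() codes a 3-level factor with an intercept plus 2 columns:

# hypothetical 3-level factor: drug in {A, B, C}
drug <- factor(rep(c("A", "B", "C"), each = 2))
X <- model.matrix(~ drug)   # default (dummy) coding
X                           # intercept column plus g - 1 = 2 indicator columns
ncol(X) - 1                 # number of coding variables: 2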
Categorical Predictors Dummy Coding
Dummy Coding: Definition
Dummy coding uses $g - 1$ binary variables to code a factor:
$$x_{ij} = \begin{cases} 1 & \text{if the } i\text{-th observation is in the } j\text{-th level} \\ 0 & \text{otherwise} \end{cases}$$
for $i \in \{1, \ldots, n_j\}$ and $j \in \{1, \ldots, g-1\}$.

The regression model becomes
$$y_{ij} = b_0 + \sum_{j=1}^{g-1} b_j x_{ij} + e_{ij}$$
where $b_0 = \mu_g$ and $b_j = \mu_j - \mu_g$ for $j \in \{1, \ldots, g-1\}$.
Dummy Coding: Considerations
Dummy coding is useful for...
- the one-way ANOVA model (unique parameter for each factor level)
- comparing treatment groups to a clearly defined "control" group

Dummy coding is less useful when...
- you have g > 2 levels and/or the model is more complicated
- you do NOT have a clearly defined "control" or "reference" group
Dummy Coding: R Syntax
The contrasts function controls the coding scheme for a factor.
Use the contr.treatment option for dummy coding.
> x = factor(rep(letters[1:3], each=5))
> x
 [1] a a a a a b b b b b c c c c c
Levels: a b c
> contrasts(x) <- contr.treatment(nlevels(x))
> contrasts(x)
  2 3
a 0 0
b 1 0
c 0 1
Categorical Predictors Effect Coding
Effect Coding: Definition
Effect coding also uses $g - 1$ variables to code a factor:
$$x_{ij} = \begin{cases} 1 & \text{if the } i\text{-th observation is in the } j\text{-th level} \\ -1 & \text{if the } i\text{-th observation is in the } g\text{-th level} \\ 0 & \text{otherwise} \end{cases}$$
for $i \in \{1, \ldots, n_j\}$ and $j \in \{1, \ldots, g-1\}$.

The regression model becomes
$$y_{ij} = b_0 + \sum_{j=1}^{g-1} b_j x_{ij} + e_{ij}$$
where $b_0 = \mu$ and $b_j = \mu_j - \mu$ for $j \in \{1, \ldots, g\}$; note that $b_g = -\sum_{j=1}^{g-1} b_j$.
Effect Coding: Considerations
Effect coding is useful for...
- simple interpretation of $b_0$ as the overall mean
- comparing each group's effect to the overall mean

Effect coding is less useful when...
- the factor has g = 2 levels
- you do have a clearly defined "control" or "reference" group
Effect Coding: R Syntax
The contrasts function controls the coding scheme for a factor.
Use the contr.sum option for effect (deviation) coding.
> x = factor(rep(letters[1:3], each=5))
> x
 [1] a a a a a b b b b b c c c c c
Levels: a b c
> contrasts(x) <- contr.sum(nlevels(x))
> contrasts(x)
  [,1] [,2]
a    1    0
b    0    1
c   -1   -1
One-Way ANOVA Model
One-Way ANOVA Model Model Form and Assumptions
One-Way ANOVA Model (cell means form)
The One-Way Analysis of Variance (ANOVA) model has the form
$$y_{ij} = \mu_j + e_{ij}$$
for $i \in \{1, \ldots, n_j\}$ and $j \in \{1, \ldots, g\}$, where
- $y_{ij} \in \mathbb{R}$ is the real-valued response for the $i$-th subject in the $j$-th factor level
- $\mu_j \in \mathbb{R}$ is the real-valued population mean for the $j$-th factor level
- $e_{ij} \overset{iid}{\sim} N(0, \sigma^2)$ is a Gaussian error term
- $n_j$ is the number of subjects in the $j$-th factor level and $n = \sum_{j=1}^{g} n_j$
- $g$ is the number of factor levels

This implies that $y_{ij} \overset{ind}{\sim} N(\mu_j, \sigma^2)$.
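To make the model concrete, here is a small simulation sketch of the cell-means form; the means, error SD, and group sizes below are arbitrary values chosen for illustration, not data from these notes:

set.seed(1)
mu <- c(20, 25, 22)                     # hypothetical factor-level means (g = 3)
nj <- c(10, 10, 10)                     # subjects per level
sigma <- 4                              # common error SD (homogeneity of variance)
group <- rep(1:3, times = nj)           # level index for each subject
y <- rnorm(sum(nj), mean = mu[group], sd = sigma)   # y_ij = mu_j + e_ij
tapply(y, group, mean)                  # sample means estimate the mu_j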
One-Way ANOVA Model (dummy coding)
Using dummy coding, the one-way ANOVA model becomes
$$y_{ij} = b_0 + \sum_{j=1}^{g-1} b_j x_{ij} + e_{ij}$$
for $i \in \{1, \ldots, n_j\}$ and $j \in \{1, \ldots, g\}$, where
$$x_{ij} = \begin{cases} 1 & \text{if the } i\text{-th observation is in the } j\text{-th level} \\ 0 & \text{otherwise} \end{cases}$$
- $b_0 = \mu_g$ is the reference group mean
- $b_j = \mu_j - \mu_g$ for $j \in \{1, \ldots, g-1\}$
One-Way ANOVA Model (effect coding)
Using effect coding, the one-way ANOVA model becomes
$$y_{ij} = b_0 + \sum_{j=1}^{g-1} b_j x_{ij} + e_{ij}$$
for $i \in \{1, \ldots, n_j\}$ and $j \in \{1, \ldots, g\}$, where
$$x_{ij} = \begin{cases} 1 & \text{if the } i\text{-th observation is in the } j\text{-th level} \\ -1 & \text{if the } i\text{-th observation is in the } g\text{-th level} \\ 0 & \text{otherwise} \end{cases}$$
- $b_0 = \mu$ is the overall mean
- $b_j = \mu_j - \mu$ for $j \in \{1, \ldots, g\}$

Note that $b_g = -\sum_{j=1}^{g-1} b_j$ by definition.
One-Way ANOVA Model (matrix form)
In matrix form, the one-way ANOVA model is $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$:
$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{pmatrix} =
\begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1(g-1)} \\
1 & x_{21} & x_{22} & \cdots & x_{2(g-1)} \\
1 & x_{31} & x_{32} & \cdots & x_{3(g-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{n(g-1)}
\end{pmatrix}
\begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ \vdots \\ b_{g-1} \end{pmatrix} +
\begin{pmatrix} e_1 \\ e_2 \\ e_3 \\ \vdots \\ e_n \end{pmatrix}$$
where
- the definition of $x_{ij}$ and $\{b_j\}_{j=1}^{g-1}$ depends on the coding scheme
- $i \in \{1, \ldots, n\}$ and the second subscript on $y$ and $e$ is dropped

This implies that $\mathbf{y} \sim N(\mathbf{X}\mathbf{b}, \sigma^2 \mathbf{I}_n)$.
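A quick sketch of what this design matrix looks like in R, using the cond factor from the memory example later in these notes; the non-intercept columns depend on which contrasts are in effect:

cond <- factor(rep(c("fast", "normal", "slow"), 10))
head(model.matrix(~ cond))    # dummy (treatment) coding: intercept + 2 indicators
head(model.matrix(~ cond, contrasts.arg = list(cond = "contr.sum")))   # effect coding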
One-Way ANOVA Model (assumptions)
The fundamental assumptions of the one-way ANOVA model are:
1. $x_{ij}$ and $y_i$ are observed random variables (known constants)
2. $e_i \overset{iid}{\sim} N(0, \sigma^2)$ is an unobserved random variable
3. $b_0, b_1, \ldots, b_{g-1}$ are unknown constants
4. $(y_i \,|\, x_{i1}, \ldots, x_{i(g-1)}) \overset{ind}{\sim} N\!\left(b_0 + \sum_{j=1}^{g-1} b_j x_{ij},\; \sigma^2\right)$; note the homogeneity of variance

The interpretation of $b_j$ depends on the coding scheme:
- Dummy: $b_j$ is the difference between the $j$-th mean and the reference mean
- Effect: $b_j$ is the difference between the $j$-th mean and the overall mean
One-Way ANOVA Model Estimation and Basic Inference
Ordinary Least Squares (cell means form)
We want to find the factor level mean estimates (i.e., the $\mu_j$ terms) that minimize the ordinary least squares criterion
$$\mathrm{SSE} = \sum_{j=1}^{g} \sum_{i=1}^{n_j} (y_{ij} - \mu_j)^2$$
The least-squares estimates are the factor level means
$$\hat{\mu}_j = \bar{y}_j$$
where $\bar{y}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} y_{ij}$ is the sample mean of $Y$ for the $j$-th factor level.
Ordinary Least Squares (simple proof)
Note that we want to minimize
$$(\mathrm{SSE})_j = \sum_{i=1}^{n_j} (y_{ij} - \mu_j)^2 = \sum_{i=1}^{n_j} y_{ij}^2 - 2\mu_j \sum_{i=1}^{n_j} y_{ij} + n_j \mu_j^2$$
separately for each $j \in \{1, \ldots, g\}$.

Taking the derivative with respect to $\mu_j$, we have
$$\frac{d(\mathrm{SSE})_j}{d\mu_j} = -2\sum_{i=1}^{n_j} y_{ij} + 2 n_j \mu_j$$
and setting this to zero and solving for $\mu_j$ gives $\hat{\mu}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} y_{ij} = \bar{y}_j$.
Ordinary Least Squares (general case)
In general, we can use the regression approach
$$\mathrm{SSE} = \sum_{i=1}^{n} \left(y_i - b_0 - \sum_{j=1}^{g-1} b_j x_{ij}\right)^2 = \|\mathbf{y} - \mathbf{X}\mathbf{b}\|^2$$
where $i \in \{1, \ldots, n\}$ and $n = \sum_{j=1}^{g} n_j$; note that the second subscript on $Y$ is now dropped because there is only one summation.

The OLS solution has the form
$$\hat{\mathbf{b}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
which is the same as in the MLR model; note that ANOVA is MLR with categorical predictors!
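A sketch of this equivalence in R, using the memory-example data defined later in these notes: the normal-equations solution $(X'X)^{-1}X'y$ matches the lm() coefficients.

sync <- c(23,27,23,22,28,24,18,33,21,15,19,25,29,25,19,
          30,29,24,23,36,22,16,30,17,19,26,20,17,21,23)
cond <- factor(rep(c("fast","normal","slow"), 10))
X <- model.matrix(~ cond)                     # dummy-coded design matrix
b <- solve(crossprod(X), crossprod(X, sync))  # (X'X)^{-1} X'y
cbind(by_hand = drop(b), via_lm = coef(lm(sync ~ cond)))   # the two columns should agree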
Fitted Values and Residuals
SCALAR FORM: fitted values are given by
$$\hat{y}_i = \hat{b}_0 + \sum_{j=1}^{g-1} \hat{b}_j x_{ij}$$
and residuals are given by
$$\hat{e}_i = y_i - \hat{y}_i$$

MATRIX FORM: fitted values are given by
$$\hat{\mathbf{y}} = \mathbf{X}\hat{\mathbf{b}}$$
and residuals are given by
$$\hat{\mathbf{e}} = \mathbf{y} - \hat{\mathbf{y}}$$
ANOVA Sums-of-Squares: Scalar Form
In the one-way ANOVA model, the relevant sums-of-squares are
- Total: $\mathrm{SST} = \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{j=1}^{g}\sum_{i=1}^{n_j}(y_{ij} - \bar{y})^2$
- Between: $\mathrm{SSB} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 = \sum_{j=1}^{g} n_j(\bar{y}_j - \bar{y})^2$
- Within: $\mathrm{SSW} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{j=1}^{g}\sum_{i=1}^{n_j}(y_{ij} - \bar{y}_j)^2$

The corresponding degrees of freedom are
- SST: $df_T = n - 1$
- SSB: $df_B = g - 1$
- SSW: $df_W = n - g$
ANOVA Sums-of-Squares: Matrix Form
In MLR models, the relevant sums-of-squares are
$$\mathrm{SST} = \sum_{i=1}^{n}(y_i - \bar{y})^2 = \mathbf{y}'\left[\mathbf{I}_n - (1/n)\mathbf{J}\right]\mathbf{y}$$
$$\mathrm{SSB} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 = \mathbf{y}'\left[\mathbf{H} - (1/n)\mathbf{J}\right]\mathbf{y}$$
$$\mathrm{SSW} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \mathbf{y}'\left[\mathbf{I}_n - \mathbf{H}\right]\mathbf{y}$$
Note: $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is the hat matrix and $\mathbf{J}$ is an $n \times n$ matrix of ones.
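A sketch verifying these quadratic forms in R with the memory-example data; H and J are built explicitly here purely for illustration (lm()/anova() are the practical route):

sync <- c(23,27,23,22,28,24,18,33,21,15,19,25,29,25,19,
          30,29,24,23,36,22,16,30,17,19,26,20,17,21,23)
cond <- factor(rep(c("fast","normal","slow"), 10))
n <- length(sync)
X <- model.matrix(~ cond)
H <- X %*% solve(crossprod(X)) %*% t(X)   # hat matrix
J <- matrix(1, n, n)                      # n x n matrix of ones
SST <- drop(t(sync) %*% (diag(n) - J/n) %*% sync)
SSB <- drop(t(sync) %*% (H - J/n) %*% sync)
SSW <- drop(t(sync) %*% (diag(n) - H) %*% sync)
c(SST = SST, SSB = SSB, SSW = SSW)        # about 769.47, 233.87, 535.60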
Partitioning the Variance (same as MLR model)
We can partition the total variation in $y_i$ as
$$\begin{aligned}
\mathrm{SST} &= \sum_{i=1}^{n}(y_i - \bar{y})^2 \\
&= \sum_{i=1}^{n}(y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2 \\
&= \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + 2\sum_{i=1}^{n}(\hat{y}_i - \bar{y})(y_i - \hat{y}_i) \\
&= \mathrm{SSB} + \mathrm{SSW} + 2\sum_{i=1}^{n}(\hat{y}_i - \bar{y})\hat{e}_i \\
&= \mathrm{SSB} + \mathrm{SSW}
\end{aligned}$$
See the MLR notes for the proof.
Estimated Error Variance (Mean Squared Error)
An unbiased estimate of the error variance $\sigma^2$ is
$$\hat{\sigma}^2 = \mathrm{SSW}/(n - g) = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2/(n - g) = \frac{1}{n - g}\sum_{j=1}^{g}(n_j - 1)s_j^2$$
where $s_j^2 = \frac{1}{n_j - 1}\sum_{i=1}^{n_j}(y_{ij} - \bar{y}_j)^2$ is the $j$-th factor level's sample variance.

The estimate $\hat{\sigma}^2$ is the mean squared error (MSE) of the model, and it is the pooled estimate of the sample variance.
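A small check of this identity in R with the memory-example data: the pooled within-group variance equals the MSE reported by the fitted model (sigma() requires R >= 3.3).

sync <- c(23,27,23,22,28,24,18,33,21,15,19,25,29,25,19,
          30,29,24,23,36,22,16,30,17,19,26,20,17,21,23)
cond <- factor(rep(c("fast","normal","slow"), 10))
s2 <- tapply(sync, cond, var)                     # per-group sample variances s_j^2
nj <- tapply(sync, cond, length)
sum((nj - 1) * s2) / (sum(nj) - nlevels(cond))    # pooled estimate: 19.837
sigma(lm(sync ~ cond))^2                          # MSE from the fitted model: 19.837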
ANOVA Table and Overall F Test
We typically organize the SS information into an ANOVA table:

Source   SS              df      MS     F     p-value
SSB      Σᵢ(ŷᵢ − ȳ)²     g − 1   MSB    F*    p*
SSW      Σᵢ(yᵢ − ŷᵢ)²    n − g   MSW
SST      Σᵢ(yᵢ − ȳ)²     n − 1

where $\mathrm{MSB} = \frac{\mathrm{SSB}}{g-1}$, $\mathrm{MSW} = \frac{\mathrm{SSW}}{n-g}$, $F^* = \frac{\mathrm{MSB}}{\mathrm{MSW}} \sim F_{g-1,n-g}$, and $p^* = P(F_{g-1,n-g} > F^*)$.

The $F^*$-statistic and $p^*$-value test $H_0: b_1 = \cdots = b_{g-1} = 0$ versus $H_1: b_k \neq 0$ for some $k \in \{1, \ldots, g-1\}$.
This is equivalent to testing $H_0: \mu_j = \mu \;\forall j$ versus $H_1:$ not all $\mu_j$ are equal.
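A sketch of these calculations in R; the SS values are taken from the memory example worked later in these notes.

SSB <- 233.8667; SSW <- 535.6; g <- 3; n <- 30
MSB <- SSB / (g - 1)                            # 116.93
MSW <- SSW / (n - g)                            # 19.84
Fstar <- MSB / MSW                              # 5.89
pf(Fstar, df1 = g - 1, df2 = n - g, lower.tail = FALSE)   # p-value, about 0.0075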
One-Way ANOVA Model Memory Example (Part 1)
Memory Example: Data Description
Visual and auditory cues example from Hays (1994) Statistics.
Does a lack of visual/auditory synchrony affect memory?

A total of n = 30 college students participate in a memory experiment:
- watch a video of a person reciting 50 words
- try to remember the 50 words (record the number correct)

Randomly assign n_j = 10 subjects to one of g = 3 video conditions:
- fast: sound precedes lip movements in the video
- normal: sound is synced with lip movements in the video
- slow: lip movements in the video precede the sound
Memory Example: Descriptive Statistics
Number of correctly remembered words ($y_{ij}$):

Subject (i)     Fast (j = 1)   Normal (j = 2)   Slow (j = 3)
1               23             27               23
2               22             28               24
3               18             33               21
4               15             19               25
5               29             25               19
6               30             29               24
7               23             36               22
8               16             30               17
9               19             26               20
10              17             21               23
Sum of y_ij     212            274              218
Sum of y_ij^2   4738           7742             4810
Memory Example: OLS Estimation (by hand)
The least-squares estimates of $\mu_j$ are the sample means:
$$\hat{\mu}_1 = \bar{y}_1 = \frac{1}{10}\sum_{i=1}^{10} y_{i1} = 212/10 = 21.2$$
$$\hat{\mu}_2 = \bar{y}_2 = \frac{1}{10}\sum_{i=1}^{10} y_{i2} = 274/10 = 27.4$$
$$\hat{\mu}_3 = \bar{y}_3 = \frac{1}{10}\sum_{i=1}^{10} y_{i3} = 218/10 = 21.8$$
Memory Example: OLS Estimation (in R: by hand)
# define response and factor vectors
> sync = c(23,27,23,22,28,24,18,33,21,15,
           19,25,29,25,19,30,29,24,23,36,
           22,16,30,17,19,26,20,17,21,23)
> cond = factor(rep(c("fast","normal","slow"), 10))

# sum of sync for each level of cond
> tapply(sync, cond, sum)
  fast normal   slow
   212    274    218

# sum-of-squares of sync for each level of cond
> sumsq = function(x){ sum(x^2) }
> tapply(sync, cond, sumsq)
  fast normal   slow
  4738   7742   4810
Memory Example: OLS Estimation (in R: dummy pt. 1)
> smod = lm(sync ~ cond)
> summary(smod)$coef
            Estimate Std. Error    t value     Pr(>|t|)
(Intercept)     21.2   1.408440 15.0521126 1.183308e-14
condnormal       6.2   1.991835  3.1127073 4.350803e-03
condslow         0.6   1.991835  0.3012297 7.655470e-01

Note that...
- $\hat{b}_0 = \bar{y}_1 = 21.2$ is the mean of the fast condition
- $\hat{b}_1 = \bar{y}_2 - \bar{y}_1 = 27.4 - 21.2 = 6.2$ is the difference between the means of the normal and fast conditions
- $\hat{b}_2 = \bar{y}_3 - \bar{y}_1 = 21.8 - 21.2 = 0.6$ is the difference between the means of the slow and fast conditions
Memory Example: OLS Estimation (in R: dummy pt. 2)

> contrasts(cond)
       normal slow
fast        0    0
normal      1    0
slow        0    1
> contrasts(cond) <- contr.treatment(3, base=2)
> contrasts(cond)
       1 3
fast   1 0
normal 0 0
slow   0 1
> smod = lm(sync ~ cond)
> summary(smod)$coef
            Estimate Std. Error   t value     Pr(>|t|)
(Intercept)     27.4   1.408440 19.454146 2.052094e-17
cond1           -6.2   1.991835 -3.112707 4.350803e-03
cond3           -5.6   1.991835 -2.811478 9.072381e-03

Note that...
- $\hat{b}_0 = \bar{y}_2 = 27.4$ is the mean of the normal condition
- $\hat{b}_1 = \bar{y}_1 - \bar{y}_2 = 21.2 - 27.4 = -6.2$ is the difference between the means of the fast and normal conditions
- $\hat{b}_2 = \bar{y}_3 - \bar{y}_2 = 21.8 - 27.4 = -5.6$ is the difference between the means of the slow and normal conditions
Memory Example: OLS Estimation (in R: effect)

> contrasts(cond) <- contr.sum(3)
> contrasts(cond)
       [,1] [,2]
fast      1    0
normal    0    1
slow     -1   -1
> smod = lm(sync ~ cond)
> summary(smod)$coef
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 23.466667  0.8131633 28.858492 7.900265e-22
cond1       -2.266667  1.1499866 -1.971037 5.904989e-02
cond2        3.933333  1.1499866  3.420330 2.003601e-03

Note that...
- $\hat{b}_0 = \bar{y} = 23.47$ is the grand mean ($\bar{y} = \frac{212 + 274 + 218}{30} = \frac{704}{30} = 23.467$)
- $\hat{b}_1 = \bar{y}_1 - \bar{y} = 21.2 - 23.467 = -2.266667$ is the difference between the mean of the fast condition and the overall mean
- $\hat{b}_2 = \bar{y}_2 - \bar{y} = 27.4 - 23.467 = 3.933333$ is the difference between the mean of the normal condition and the overall mean
- Implicitly we have $\hat{b}_3 = -(\hat{b}_1 + \hat{b}_2) = -(-2.266667 + 3.933333) = \bar{y}_3 - \bar{y} = 21.8 - 23.467 = -1.67$, the difference between the mean of the slow condition and the overall mean
Memory Example: Sums-of-Squares (by hand)
Defining $n = \sum_{j=1}^{g} n_j = 30$, the relevant sums-of-squares are
$$\mathrm{SST} = \sum_{j=1}^{g}\sum_{i=1}^{n_j}(y_{ij} - \bar{y})^2 = \sum_{j=1}^{g}\sum_{i=1}^{n_j} y_{ij}^2 - \frac{1}{n}\left(\sum_{j=1}^{g}\sum_{i=1}^{n_j} y_{ij}\right)^2 = (4738 + 7742 + 4810) - (704^2/30) = 769.4667$$
$$\mathrm{SSW} = \sum_{j=1}^{g}\sum_{i=1}^{n_j}(y_{ij} - \bar{y}_j)^2 = \sum_{j=1}^{g}\sum_{i=1}^{n_j} y_{ij}^2 - \sum_{j=1}^{g}\frac{\left(\sum_{i=1}^{n_j} y_{ij}\right)^2}{n_j} = (4738 + 7742 + 4810) - ([212^2 + 274^2 + 218^2]/10) = 535.6$$
$$\mathrm{SSB} = \mathrm{SST} - \mathrm{SSW} = 769.4667 - 535.6 = 233.8667$$
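The same by-hand arithmetic as a short R sketch, working only from the column totals in the table above:

ysum <- c(212, 274, 218)           # per-condition sums of y_ij
ysq  <- c(4738, 7742, 4810)        # per-condition sums of y_ij^2
nj <- 10; n <- 30
SST <- sum(ysq) - sum(ysum)^2 / n           # 769.4667
SSW <- sum(ysq) - sum(ysum^2) / nj          # 535.6
SSB <- SST - SSW                            # 233.8667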
Memory Example: ANOVA Table (by hand)
Putting the sums-of-squares from the previous slide into an ANOVA table:

Source   SS         df   MS         F        p-value
SSB      233.8667    2   116.9333   5.8947   0.0075
SSW      535.6000   27    19.8370
SST      769.4667   29

Note that $F^* = 5.8947 \sim F_{2,27}$ and $P(F_{2,27} > F^*) = 0.0075$.

Assuming a typical $\alpha$ level (e.g., $\alpha = 0.01$ or $\alpha = 0.05$), we would reject the null hypothesis $H_0: \mu_j = \mu \;\forall j$.

We conclude that there is some mean difference on the response variable (number of remembered words) between the different conditions.
Memory Example: ANOVA Table (in R)
> sync = c(23,27,23,22,28,24,18,33,21,15,
           19,25,29,25,19,30,29,24,23,36,
           22,16,30,17,19,26,20,17,21,23)
> cond = factor(rep(c("fast","normal","slow"), 10))
> smod = lm(sync ~ cond)
> anova(smod)
Analysis of Variance Table

Response: sync
          Df Sum Sq Mean Sq F value   Pr(>F)
cond       2 233.87 116.933  5.8947 0.007513 **
Residuals 27 535.60  19.837
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple Comparisons
Multiple Comparisons Comparing Means
Limitations of Overall F Test
If we reject $H_0: \mu_j = \mu \;\forall j$, then we know that there are some mean differences, but we do not know where the mean differences exist.
- This assumes that g > 2
- Note that if g = 2 we have an independent samples t test

If g > 2, we need to perform follow-up tests (multiple comparisons) to determine where the mean differences occur in the data.
Linear Combinations of Factor Level Means
A linear combination $L$ of the factor level means has the form
$$L = \sum_{j=1}^{g} c_j \mu_j$$
where the $c_j$ are the coefficients defining the particular linear combination.

In practice we never know $\mu_j$, so we define
$$\hat{L} = \sum_{j=1}^{g} c_j \hat{\mu}_j$$
where $\hat{\mu}_j$ is our least-squares estimate of $\mu_j$.
Testing Linear Combinations
Remember that $\hat{\mu}_j = \bar{y}_j$, which implies that
$$V(\hat{\mu}_j) = V\!\left(\frac{1}{n_j}\sum_{i=1}^{n_j} y_{ij}\right) = \frac{1}{n_j^2} V\!\left(\sum_{i=1}^{n_j} y_{ij}\right) = \frac{\sum_{i=1}^{n_j} V(y_{ij})}{n_j^2} = \frac{n_j \sigma^2}{n_j^2} = \frac{\sigma^2}{n_j}$$
is the variance of $\hat{\mu}_j$, given that $V(y_{ij}) = \sigma^2$ from the assumptions.

This implies that
$$V(\hat{L}) = V\!\left(\sum_{j=1}^{g} c_j \hat{\mu}_j\right) = \sum_{j=1}^{g} c_j^2 V(\hat{\mu}_j) = \sigma^2 \sum_{j=1}^{g} c_j^2 / n_j$$
is the variance of any linear combination $\hat{L}$.
Testing Linear Combinations (continued)
To test $H_0: L = L^*$ versus $H_1: L \neq L^*$, use
$$t^* = \frac{\hat{L} - L^*}{\sqrt{\hat{V}(\hat{L})}} \sim t_{n-g}$$
where $\hat{V}(\hat{L}) = \hat{\sigma}^2 \sum_{j=1}^{g} c_j^2 / n_j$ uses the MSE to estimate $\sigma^2$.

If the population $\sigma^2$ is known, then use
$$Z^* = \frac{\hat{L} - L^*}{\sqrt{V(\hat{L})}} \sim N(0, 1)$$
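A sketch of this t test in R for one linear combination, using the memory-example data; the coefficient vector cvec below is a hypothetical contrast (normal versus the average of fast and slow), chosen only for illustration.

sync <- c(23,27,23,22,28,24,18,33,21,15,19,25,29,25,19,
          30,29,24,23,36,22,16,30,17,19,26,20,17,21,23)
cond <- factor(rep(c("fast","normal","slow"), 10))
fit  <- lm(sync ~ cond)
cvec <- c(-1/2, 1, -1/2)                  # hypothetical c_j (levels: fast, normal, slow)
mu   <- tapply(sync, cond, mean)
nj   <- tapply(sync, cond, length)
Lhat <- sum(cvec * mu)                    # estimated linear combination
varL <- sigma(fit)^2 * sum(cvec^2 / nj)   # MSE * sum(c_j^2 / n_j)
tstar <- (Lhat - 0) / sqrt(varL)          # testing H0: L = 0
2 * pt(abs(tstar), df = fit$df.residual, lower.tail = FALSE)   # two-sided p-value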
Contrasts and Pairwise Comparisons
A contrast is a linear combination of the factor level means such that the coefficients sum to zero, i.e., $\sum_{j=1}^{g} c_j = 0$.
- $\mu_1 - \mu_2$ is the mean difference of the first two levels
- $(\mu_1 + \mu_2)/2 - \mu_3$ is the mean of the first two levels minus the third level

A pairwise comparison is a contrast involving two factor level means:
- $\mu_1 - \mu_2$ is a pairwise comparison of the first two levels
- $\mu_1 - \mu_3$ is a pairwise comparison of the first and third levels
- $\mu_2 - \mu_3$ is a pairwise comparison of the second and third levels
Multiple Comparison Problem
If we test multiple linear combinations of factor level means, we need to worry about the Familywise Type I Error Rate (FWER).

FWER is the probability of making at least one Type I Error among all tested linear combinations.
- Single test: Type I Error = $P(\text{Reject } H_0 \mid H_0 \text{ true}) = \alpha$
- For $q$ independent tests with level $\alpha$: FWER $= 1 - (1 - \alpha)^q$

In general, the FWER depends on the number of tests and on whether or not the tests are independent of one another.
Multiple Comparisons Bonferroni Correction
Bonferroni’s Correction: Definition
Suppose we want to test $f$ linear combinations of factor level means.

According to Boole's inequality, for $f$ tests with level $\alpha^*$,
$$\mathrm{FWER} \leq \sum_{k=1}^{f} P(\text{Reject } H_{0k} \mid H_{0k} \text{ true}) = \sum_{k=1}^{f} \alpha^* = f\alpha^*$$
regardless of whether or not the tests are independent of one another.

Bonferroni's correction sets $\alpha^* = \alpha/f$ to ensure that FWER $\leq \alpha$.
Bonferroni’s Correction: Properties
Major strength: applicable to many situations (no assumptions)
Major weakness: overly conservative in some cases

Suppose we have f = 3 independent tests and want FWER ≤ 0.05:
- FWER $= 1 - (1 - \alpha^*)^3$
- Bonferroni: $\alpha^* = 0.05/3 = 0.0167$, so FWER $= 1 - (1 - 0.0167)^3 = 0.04917$

Suppose we have f = 10 independent tests and want FWER ≤ 0.05:
- FWER $= 1 - (1 - \alpha^*)^{10}$
- Bonferroni: $\alpha^* = 0.05/10 = 0.005$, so FWER $= 1 - (1 - 0.005)^{10} = 0.0489$
Multiple Comparisons Tukey Correction
Detour: Studentized Range Distribution
Assume the following...
- $z_k \overset{iid}{\sim} N(\mu, \sigma^2)$ for $k \in \{1, \ldots, r\}$
- $\hat{\sigma}^2$ is an estimate of $\sigma^2$ based on $\nu$ degrees of freedom
- $\hat{\sigma}^2$ is independent of the $z_k$ for $k \in \{1, \ldots, r\}$

The studentized range statistic is defined as
$$q_{r,\nu} = \frac{\mathrm{range}(z_k)}{\hat{\sigma}}$$
where $\mathrm{range}(z_k) = \max(z_k) - \min(z_k)$, and $(r, \nu)$ are the numerator and denominator degrees of freedom.

This is a one-sided probability distribution (similar to $F$), so we only reject if the observed $q > q^{(\alpha)}_{r,\nu}$, where $P(q_{r,\nu} > q^{(\alpha)}_{r,\nu}) = \alpha$.
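In R, the studentized range distribution is available through ptukey() and qtukey(); a quick sketch with the $(r, \nu) = (3, 27)$ values used in the memory example below:

qtukey(0.95, nmeans = 3, df = 27)    # upper 5% critical value, about 3.506
ptukey(3.506, nmeans = 3, df = 27)   # P(q <= 3.506), about 0.95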
Testing All Possible Pairwise Comparisons
We want to test all possible pairwise comparisons between means.
- $H_0: \mu_j - \mu_k = 0 \;\forall j,k$ versus $H_1:$ not all $\mu_j - \mu_k = 0$
- There are $g(g-1)/2$ unique pairwise comparisons

Note that the variance of a pairwise comparison $\hat{L} = \hat{\mu}_j - \hat{\mu}_k$ is
$$V(\hat{L}) = \sigma^2\left(\frac{1}{n_j} + \frac{1}{n_k}\right)$$
where $n_j$ and $n_k$ are the sample sizes of factor levels $j$ and $k$.

If $n_j = n^* \;\forall j$, this simplifies to $V(\hat{L}) = \sigma^2\left(\frac{2}{n^*}\right)$ for any pairwise comparison.
Tukey’s Honest Significant Difference (HSD) Test
Proposed by John Tukey (1953) for the balanced ANOVA, i.e., $n_j = n^* \;\forall j$.

The test statistic is defined as
$$q^* = \frac{\sqrt{2}\,\hat{L}}{\sqrt{\hat{V}(\hat{L})}} = \frac{\bar{y}_j - \bar{y}_k}{\sqrt{\hat{\sigma}^2/n^*}}$$
where $\hat{L} = \bar{y}_j - \bar{y}_k$, $\hat{V}(\hat{L}) = \hat{\sigma}^2\left(\frac{2}{n^*}\right)$, and $\hat{\sigma}^2$ is the MSE of the model.

Considering all pairwise comparisons, $q^* \sim q_{g,n-g}$, where $q_{g,n-g}$ is the studentized range distribution with $(g, n-g)$ degrees of freedom.
- Note that $\bar{y}_j \sim N(\mu_j, \sigma^2/n^*)$ for all $j \in \{1, \ldots, g\}$
- Under $H_0: \mu_j - \mu_k = 0 \;\forall j,k$, we have $\bar{y}_j \sim N(\mu, \sigma^2/n^*) \;\forall j$
Tukey’s HSD Test (continued)
To form a $100(1-\alpha)\%$ CI around the mean difference $\bar{y}_j - \bar{y}_k$, use
$$(\bar{y}_j - \bar{y}_k) \pm \frac{q^{(\alpha)}_{g,n-g}}{\sqrt{2}}\sqrt{\hat{V}(\hat{L})}$$
where $q^{(\alpha)}_{g,n-g}$ is the critical value from the studentized range distribution.

If you form all possible CIs around pairwise mean differences, you will control the FWER exactly at level $\alpha$ using Tukey's HSD test.
- This is more conservative than forming all CIs using $t_{n-g}$ critical values
- Example 95% CI: $q^{(.05)}_{3,27}/\sqrt{2} = 2.48$ and $t^{(.025)}_{27} = 2.05$
Tukey-Kramer Test
For the unbalanced ANOVA, i.e., $n_j \neq n_k$ for at least one pair $(j,k)$, use the HSD extension proposed by John Tukey (1953) and Clyde Kramer (1956).
- Called the Tukey-Kramer test or Tukey-Kramer procedure

To form a $100(1-\alpha)\%$ CI around the mean difference $\bar{y}_j - \bar{y}_k$, use
$$(\bar{y}_j - \bar{y}_k) \pm \frac{q^{(\alpha)}_{g,n-g}}{\sqrt{2}}\sqrt{\hat{V}(\hat{L})}$$
where $\hat{V}(\hat{L}) = \hat{\sigma}^2\left(\frac{1}{n_j} + \frac{1}{n_k}\right)$ is the estimated variance of $\hat{L} = \hat{\mu}_j - \hat{\mu}_k$.

If you form all possible CIs around pairwise mean differences, you will control the FWER below (but not exactly at) level $\alpha$ using Tukey-Kramer.
- See Hayter (1984) for a formal proof of the Tukey-Kramer conservativeness.
Multiple Comparisons Scheffé Correction
Testing All Possible Contrasts
We want to test all possible contrasts between factor level means:
- $H_0: \sum_{j=1}^{g} c_j\mu_j = 0$ for all $\mathbf{c} \in C$, where $C = \{\mathbf{c} = (c_1, \ldots, c_g) : \sum_{j=1}^{g} c_j = 0\}$ is the set of all contrasts
- $H_1: \sum_{j=1}^{g} c_j\mu_j \neq 0$ for some $\mathbf{c} \in C$

Note that $\mu_j = \mu \;\forall j \iff \sum_{j=1}^{g} c_j\mu_j = 0$ for all $\mathbf{c} \in C$.
- Proof of $\mu_j = \mu \;\forall j \implies \sum_{j=1}^{g} c_j\mu_j = 0$ for all $\mathbf{c} \in C$: if $\mu_j = \mu \;\forall j$, then $\sum_{j=1}^{g} c_j\mu_j = \mu\sum_{j=1}^{g} c_j = \mu(0) = 0$ for all $\mathbf{c} \in C$
- Proof of $\sum_{j=1}^{g} c_j\mu_j = 0$ for all $\mathbf{c} \in C \implies \mu_j = \mu \;\forall j$: if $\sum_{j=1}^{g} c_j\mu_j = 0$ for all $\mathbf{c} \in C$, then $\mu_j - \mu_k = 0$ for all $j,k$ (since every pairwise comparison is a contrast)
Contrasts and Overall ANOVA F Test
Remember that the overall F test (associated with the ANOVA table) is testing $H_0: \mu_j = \mu \;\forall j$ versus $H_1:$ not all $\mu_j$ are equal.

This is equivalent to testing $H_0: \sum_{j=1}^{g} c_j\mu_j = 0$ for all $\mathbf{c} \in C$ versus $H_1: \sum_{j=1}^{g} c_j\mu_j \neq 0$ for some $\mathbf{c} \in C$.
- See the previous slide for the proof of equivalence

Point: if we want to test all possible contrasts, we can use the $F_{g-1,n-g}$ distribution to control the FWER at level $\alpha$.
Scheffé’s Method
We want to test all possible contrasts between factor level means:
- $H_0: \sum_{j=1}^{g} c_j\mu_j = 0$ for all $\mathbf{c} \in C$, where $C = \{\mathbf{c} = (c_1, \ldots, c_g) : \sum_{j=1}^{g} c_j = 0\}$ is the set of all contrasts
- $H_1: \sum_{j=1}^{g} c_j\mu_j \neq 0$ for some $\mathbf{c} \in C$

To form a $100(1-\alpha)\%$ CI for a contrast $L = \sum_{j=1}^{g} c_j\mu_j$, use
$$\hat{L} \pm \sqrt{(g-1)F^{(\alpha)}_{g-1,n-g}}\sqrt{\hat{V}(\hat{L})}$$
where $\hat{V}(\hat{L}) = \hat{\sigma}^2\sum_{j=1}^{g} c_j^2/n_j$ is the estimated variance of $\hat{L} = \sum_{j=1}^{g} c_j\bar{y}_j$.

If you form CIs around all possible contrasts, you will control the FWER exactly at level $\alpha$ using Scheffé's method (see the proof in the Appendix).
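A sketch of a Scheffé interval for one contrast in R, following the formula above with the memory-example data; scheffe_ci is a hypothetical helper written only for illustration.

sync <- c(23,27,23,22,28,24,18,33,21,15,19,25,29,25,19,
          30,29,24,23,36,22,16,30,17,19,26,20,17,21,23)
cond <- factor(rep(c("fast","normal","slow"), 10))
scheffe_ci <- function(y, f, cvec, alpha = 0.05) {
  mu <- tapply(y, f, mean); nj <- tapply(y, f, length)
  g <- nlevels(f); n <- length(y)
  mse <- sigma(lm(y ~ f))^2                     # estimate of sigma^2 (the MSE)
  Lhat <- sum(cvec * mu)                        # estimated contrast
  half <- sqrt((g - 1) * qf(1 - alpha, g - 1, n - g)) * sqrt(mse * sum(cvec^2 / nj))
  c(lower = Lhat - half, upper = Lhat + half)
}
scheffe_ci(sync, cond, c(1, -1, 0))   # mu_fast - mu_normal: about (-11.36, -1.04)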
Multiple Comparisons Summary of Corrections
Summary of Multiple Comparisons
If we want to form $100(1-\alpha)\%$ CIs around all $L = \mu_j - \mu_k$, use
$$\hat{L} \pm C\sqrt{\hat{V}(\hat{L})}$$
where $\hat{V}(\hat{L}) = \hat{\sigma}^2\left(\frac{1}{n_j} + \frac{1}{n_k}\right)$ and $C$ is some critical value.

Each procedure uses a different critical value:
- No correction: $C = t^{(\alpha/2)}_{n-g}$
- Bonferroni: $C = t^{(\alpha^*/2)}_{n-g}$ with $\alpha^* = \alpha/[g(g-1)/2]$
- Tukey: $C = q^{(\alpha)}_{g,n-g}/\sqrt{2}$
- Scheffé: $C = \sqrt{(g-1)F^{(\alpha)}_{g-1,n-g}}$

Scheffé's critical value is ALWAYS larger than Tukey's value, because the set of all pairwise comparisons is a subset of the set of all contrasts.
Choosing Between Multiple Comparisons
You want CIs that are as narrow as possible and control FWER.
If you are interested in all pairwise comparisons, use Tukey-Kramer.
If you are interested in all possible contrasts, use Scheffé.
If you are interested in some subset of all pairwise comparisons (or of all contrasts), Bonferroni may be the most efficient approach.
Multiple Comparisons Memory Example (Part 2)
Memory Example: Pairwise Comparison Estimates
Suppose we want to test all $g(g-1)/2 = 3$ unique pairwise comparisons between factor level means:
- $L_1 = \mu_1 - \mu_2 = \mu_{fast} - \mu_{normal}$
- $L_2 = \mu_2 - \mu_3 = \mu_{normal} - \mu_{slow}$
- $L_3 = \mu_3 - \mu_1 = \mu_{slow} - \mu_{fast}$

The estimated pairwise comparisons are given by
- $\hat{L}_1 = \hat{\mu}_1 - \hat{\mu}_2 = 21.2 - 27.4 = -6.2$
- $\hat{L}_2 = \hat{\mu}_2 - \hat{\mu}_3 = 27.4 - 21.8 = 5.6$
- $\hat{L}_3 = \hat{\mu}_3 - \hat{\mu}_1 = 21.8 - 21.2 = 0.6$

and we know that $\hat{V}(\hat{L}) = \hat{\sigma}^2(2/n^*) = (19.8370)(2/10) = 3.9674$.
Memory Example: Pairwise Comparison CI Values
If we want to form 95% CIs around all three $L = \mu_j - \mu_k$, use
$$\hat{L} \pm C\sqrt{\hat{V}(\hat{L})} = \hat{L} \pm C\sqrt{3.9674}$$
where
- $C = t^{(.025)}_{27} = 2.0518$ with no correction
- $C = t^{(.008)}_{27} = 2.5525$ with the Bonferroni correction
- $C = q^{(.05)}_{3,27}/\sqrt{2} = 2.4794$ with the Tukey correction
- $C = \sqrt{2F^{(.05)}_{2,27}} = 2.5900$ with the Scheffé correction

Note that Tukey is best here (i.e., produces the narrowest CIs), followed by Bonferroni, and then Scheffé.
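These four critical values can be reproduced with base R quantile functions; a quick sketch:

qt(1 - .05/2, df = 27)                           # no correction: 2.0518
qt(1 - (.05/3)/2, df = 27)                       # Bonferroni, 3 comparisons: 2.5525
qtukey(1 - .05, nmeans = 3, df = 27) / sqrt(2)   # Tukey: 2.4794
sqrt(2 * qf(1 - .05, df1 = 2, df2 = 27))         # Scheffe: 2.5900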
Memory Example: Pairwise CIs (no correction)
Using no correction, the CI estimates are:
$$\hat{L}_1 \pm t^{(.025)}_{27}\sqrt{\hat{V}(\hat{L}_1)} = -6.2 \pm 2.0518\sqrt{3.9674} = [-10.2869;\ -2.1131]$$
$$\hat{L}_2 \pm t^{(.025)}_{27}\sqrt{\hat{V}(\hat{L}_2)} = 5.6 \pm 2.0518\sqrt{3.9674} = [1.5131;\ 9.6869]$$
$$\hat{L}_3 \pm t^{(.025)}_{27}\sqrt{\hat{V}(\hat{L}_3)} = 0.6 \pm 2.0518\sqrt{3.9674} = [-3.4869;\ 4.6869]$$
Memory Example: Pairwise CIs (Bonferroni)
Using the Bonferroni correction, the CI estimates are:
$$\hat{L}_1 \pm t^{(.008)}_{27}\sqrt{\hat{V}(\hat{L}_1)} = -6.2 \pm 2.5525\sqrt{3.9674} = [-11.2841;\ -1.1159]$$
$$\hat{L}_2 \pm t^{(.008)}_{27}\sqrt{\hat{V}(\hat{L}_2)} = 5.6 \pm 2.5525\sqrt{3.9674} = [0.5159;\ 10.6841]$$
$$\hat{L}_3 \pm t^{(.008)}_{27}\sqrt{\hat{V}(\hat{L}_3)} = 0.6 \pm 2.5525\sqrt{3.9674} = [-4.4841;\ 5.6841]$$
Memory Example: Pairwise CIs (Tukey)
Using the Tukey correction, the CI estimates are:
$$\hat{L}_1 \pm \frac{q^{(.05)}_{3,27}}{\sqrt{2}}\sqrt{\hat{V}(\hat{L}_1)} = -6.2 \pm 2.4794\sqrt{3.9674} = [-11.1386;\ -1.2614]$$
$$\hat{L}_2 \pm \frac{q^{(.05)}_{3,27}}{\sqrt{2}}\sqrt{\hat{V}(\hat{L}_2)} = 5.6 \pm 2.4794\sqrt{3.9674} = [0.6614;\ 10.5386]$$
$$\hat{L}_3 \pm \frac{q^{(.05)}_{3,27}}{\sqrt{2}}\sqrt{\hat{V}(\hat{L}_3)} = 0.6 \pm 2.4794\sqrt{3.9674} = [-4.3386;\ 5.5386]$$
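For reference, R's built-in TukeyHSD() reproduces these Tukey-corrected intervals directly (signs may differ because R reports each difference in its own level order):

sync <- c(23,27,23,22,28,24,18,33,21,15,19,25,29,25,19,
          30,29,24,23,36,22,16,30,17,19,26,20,17,21,23)
cond <- factor(rep(c("fast","normal","slow"), 10))
TukeyHSD(aov(sync ~ cond), conf.level = 0.95)
# e.g., normal - fast: 6.2 with interval about (1.26, 11.14), matching L1 above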
Memory Example: Pairwise CIs (Scheffé)
Using the Scheffé correction, the CI estimates are:
$$\hat{L}_1 \pm \sqrt{2F^{(.05)}_{2,27}}\sqrt{\hat{V}(\hat{L}_1)} = -6.2 \pm 2.59\sqrt{3.9674} = [-11.3589;\ -1.0411]$$
$$\hat{L}_2 \pm \sqrt{2F^{(.05)}_{2,27}}\sqrt{\hat{V}(\hat{L}_2)} = 5.6 \pm 2.59\sqrt{3.9674} = [0.4411;\ 10.7589]$$
$$\hat{L}_3 \pm \sqrt{2F^{(.05)}_{2,27}}\sqrt{\hat{V}(\hat{L}_3)} = 0.6 \pm 2.59\sqrt{3.9674} = [-4.5589;\ 5.7589]$$
Memory Example: Good use of Scheffé
Suppose we want to test four contrasts:
- $L_1 = \mu_1 - \mu_2 = \mu_{fast} - \mu_{normal}$
- $L_2 = \mu_2 - \mu_3 = \mu_{normal} - \mu_{slow}$
- $L_3 = \mu_3 - \mu_1 = \mu_{slow} - \mu_{fast}$
- $L_4 = \mu_2 - \frac{\mu_1 + \mu_3}{2} = \mu_{normal} - \frac{\mu_{slow} + \mu_{fast}}{2}$

If we want to form 95% CIs around all four $L_j$, use
$$\hat{L}_j \pm C\sqrt{\hat{V}(\hat{L}_j)}$$
where
- $C = t^{(.006)}_{27} = 2.6763$ using Bonferroni ($\alpha^* = .05/4 = .0125$)
- $C = \sqrt{2F^{(.05)}_{2,27}} = 2.59$ using Scheffé
Memory Example: Good use of Scheffé (continued)
Note that $\hat{L}_4 = 27.4 - \frac{21.8 + 21.2}{2} = 5.9$ and
$$\hat{V}(\hat{L}_4) = \hat{\sigma}^2\sum_{j=1}^{3}\frac{c_j^2}{n_j} = (19.8370)\left(\frac{1}{10} + \frac{(-1/2)^2}{10} + \frac{(-1/2)^2}{10}\right) = 2.9756$$

Using the Scheffé correction, the CI estimates are:
$$\hat{L}_1 \pm \sqrt{2F^{(.05)}_{2,27}}\sqrt{\hat{V}(\hat{L}_1)} = -6.2 \pm 2.59\sqrt{3.9674} = [-11.3589;\ -1.0411]$$
$$\hat{L}_2 \pm \sqrt{2F^{(.05)}_{2,27}}\sqrt{\hat{V}(\hat{L}_2)} = 5.6 \pm 2.59\sqrt{3.9674} = [0.4411;\ 10.7589]$$
$$\hat{L}_3 \pm \sqrt{2F^{(.05)}_{2,27}}\sqrt{\hat{V}(\hat{L}_3)} = 0.6 \pm 2.59\sqrt{3.9674} = [-4.5589;\ 5.7589]$$
$$\hat{L}_4 \pm \sqrt{2F^{(.05)}_{2,27}}\sqrt{\hat{V}(\hat{L}_4)} = 5.9 \pm 2.59\sqrt{2.9756} = [1.4322;\ 10.3678]$$
Appendix
Scheffé’s Method: Logic
Remember that the overall ANOVA F test has the form
$$F^* = \frac{\mathrm{MSB}}{\mathrm{MSW}} = \frac{\frac{1}{g-1}\sum_{j=1}^{g} n_j(\bar{y}_j - \bar{y})^2}{\hat{\sigma}^2} \sim F_{g-1,n-g} \text{ under } H_0$$
which implies that
$$S^2 = \frac{\mathrm{SSB}}{\mathrm{MSW}} = \frac{\sum_{j=1}^{g} n_j(\bar{y}_j - \bar{y})^2}{\hat{\sigma}^2} \sim (g-1)F_{g-1,n-g} \text{ under } H_0$$

Defining the test of a single contrast as $T_c = \frac{\hat{L}}{\sqrt{\hat{V}(\hat{L})}}$, note that
$$\sup_{\mathbf{c} \in C} T_c^2 = S^2$$
where sup denotes the supremum (i.e., the least upper bound).
Scheffé’s Method: Proof (part 1)
To prove the claim $\sup_{\mathbf{c} \in C} T_c^2 = S^2$, define the $n \times 1$ vector
$$\mathbf{a}' = \left(\frac{c_1}{n_1}\mathbf{1}'_{n_1} \;\; \frac{c_2}{n_2}\mathbf{1}'_{n_2} \;\; \cdots \;\; \frac{c_g}{n_g}\mathbf{1}'_{n_g}\right)$$
where the $c_j$ are contrast coefficients and $\mathbf{1}_{n_j}$ is an $n_j \times 1$ vector of ones.

Define $\mathbf{y}' = (\mathbf{y}'_1, \ldots, \mathbf{y}'_g)$ where $\mathbf{y}'_j = (y_{1j}, \ldots, y_{n_j j})$, and note that
$$\mathbf{a}'\mathbf{y} = \sum_{j=1}^{g}\frac{c_j}{n_j}\mathbf{1}'_{n_j}\mathbf{y}_j = \sum_{j=1}^{g} c_j\bar{y}_j = \hat{L}$$
$$\|\mathbf{a}\|^2 = \mathbf{a}'\mathbf{a} = \sum_{j=1}^{g}\left(\frac{c_j}{n_j}\right)^2\mathbf{1}'_{n_j}\mathbf{1}_{n_j} = \sum_{j=1}^{g} c_j^2/n_j$$
which implies that
$$T_c^2 = \frac{(\mathbf{a}'\mathbf{y})^2}{\hat{\sigma}^2\|\mathbf{a}\|^2}$$
for any contrast $\mathbf{c} \in C$.
Scheffé’s Method: Proof (part 2)
Now note that $\mathbf{a} = \mathbf{X}\mathbf{b}$ where
$$\mathbf{X} = \begin{pmatrix}
\mathbf{1}_{n_1} & \mathbf{0}_{n_1} & \cdots & \mathbf{0}_{n_1} \\
\mathbf{0}_{n_2} & \mathbf{1}_{n_2} & \cdots & \mathbf{0}_{n_2} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0}_{n_g} & \mathbf{0}_{n_g} & \cdots & \mathbf{1}_{n_g}
\end{pmatrix}
\qquad
\mathbf{b} = \begin{pmatrix} c_1/n_1 \\ c_2/n_2 \\ \vdots \\ c_g/n_g \end{pmatrix}$$
which implies that $\mathbf{a} = \mathbf{H}\mathbf{a}$, where $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is the hat matrix.

Also note that $\mathbf{a}'\mathbf{1}_n = \sum_{j=1}^{g}\frac{c_j}{n_j}\mathbf{1}'_{n_j}\mathbf{1}_{n_j} = \sum_{j=1}^{g} c_j = 0$, which implies that
$$\mathbf{a} = (\mathbf{H} - \mathbf{H}_0)\mathbf{a}$$
where $\mathbf{H}_0 = \frac{1}{n}\mathbf{1}_n\mathbf{1}'_n$ is the projection matrix for the constant space.
Scheffé’s Method: Proof (part 3)
Putting things together, we can write the single contrast test as
$$T_c^2 = \frac{(\mathbf{a}'\mathbf{y})^2}{\hat{\sigma}^2\|\mathbf{a}\|^2} = \frac{[\mathbf{a}'(\mathbf{H} - \mathbf{H}_0)\mathbf{y}]^2}{\hat{\sigma}^2\|\mathbf{a}\|^2}$$
for any contrast $\mathbf{c} \in C$, given that $\mathbf{a}' = \mathbf{a}'(\mathbf{H} - \mathbf{H}_0)$.

By the Cauchy-Schwarz inequality, we know that
$$(\mathbf{u}'\mathbf{v})^2 \leq \|\mathbf{u}\|^2\|\mathbf{v}\|^2$$
with equality holding only when $\mathbf{u} = w\mathbf{v}$ for some $w \neq 0$.
Scheffé’s Method: Proof (part 4)
Letting $\mathbf{u} = \mathbf{a}$ and $\mathbf{v} = (\mathbf{H} - \mathbf{H}_0)\mathbf{y}$, we have
$$T_c^2 = \frac{[\mathbf{a}'(\mathbf{H} - \mathbf{H}_0)\mathbf{y}]^2}{\hat{\sigma}^2\|\mathbf{a}\|^2} \leq \frac{\|\mathbf{a}\|^2\|(\mathbf{H} - \mathbf{H}_0)\mathbf{y}\|^2}{\hat{\sigma}^2\|\mathbf{a}\|^2} = \frac{\|(\mathbf{H} - \mathbf{H}_0)\mathbf{y}\|^2}{\hat{\sigma}^2} = S^2$$
If we define $\mathbf{a} = (\mathbf{H} - \mathbf{H}_0)\mathbf{y}$, then $T_c^2$ reaches its upper bound of $S^2$.

To control the FWER at level $\alpha$, note that
$$P(T_c^2 \leq S^2 \;\forall \mathbf{c} \in C) = P\!\left(\sup_{\mathbf{c} \in C} T_c^2 \leq S^2\right)$$
and we know that $S^2 \sim (g-1)F_{g-1,n-g}$ under $H_0$.
- Use $S = \sqrt{(g-1)F^{(\alpha)}_{g-1,n-g}}$ to form the $100(1-\alpha)\%$ CI for $T_c$.