anova: fixed effects
Statistics 203: Introduction to Regression and Analysis of Variance
ANOVA: fixed effects
Jonathan Taylor
Today
Categorical variables
Example: tool lifetime
Solution #1: stratification
Solution #2: qualitative predictors
More than two levels
Analysis of Variance models
One-way ANOVA
Extension of two sample t-test
ANOVA tables: One-way
Example: rehab surgery
Inference for linear combinations
Two-way ANOVA
Constraints on the parameters
Fitting model
Questions of interest
ANOVA table: Two-way (assuming nij = n)
ANOVA table: Two-way (continued)
Example: kidney failure
Caveats
Today
Qualitative / categorical variables. One & Two-way ANOVA models.
Categorical variables
Most variables we have looked at so far were continuous: height, rating, etc.
In many situations, we record a categorical variable: gender, state, country, etc.
How do we include this in our model?
Example: tool lifetime
Outcome: Y, lifetime of a cutting tool on a lathe.
Predictors: X1, lathe speed (revolutions per minute); T, tool type (A or B).
Goal: to study whether the effect of lathe speed differs depending on the tool type.
Solution #1: stratification
One solution is to “stratify” the data set by this categorical variable.
We could break the data set up into 2 groups by tool type and fit the model
$$Y_i = \beta_0 + \beta_1 X_{i,1} + \varepsilon_i$$
in each group. Problem: this results in very small samples in each group, so there are few degrees of freedom for estimating σ² in each group.
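A minimal sketch of stratification, assuming made-up data (the lecture's tool-lifetime data are not reproduced here): one ordinary least-squares line is fit per tool type, and each fit has only n_g − 2 residual degrees of freedom for its own estimate of σ². The helper name `fit_line` and the simulated speeds and lifetimes are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """OLS fit of y = b0 + b1 * x within one stratum.

    Returns (b0, b1, sigma2_hat, df), where df = n_g - 2 is the residual
    degrees of freedom available for estimating sigma^2 in that group.
    """
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = len(y) - X.shape[1]
    return beta[0], beta[1], resid @ resid / df, df

# Hypothetical data (NOT the lecture's tool-lifetime data): 10 tools per type.
rng = np.random.default_rng(0)
speed = np.tile(np.linspace(500.0, 1000.0, 10), 2)
tool = np.repeat(["A", "B"], 10)
life = np.where(tool == "A", 40 - 0.02 * speed, 30 - 0.01 * speed) + rng.normal(0, 1, 20)

# Stratify: a separate regression for each tool type.
fits = {t: fit_line(speed[tool == t], life[tool == t]) for t in ("A", "B")}
for t, (b0, b1, s2, df) in fits.items():
    print(f"tool {t}: slope {b1:.4f}, sigma^2 estimated on {df} df")
```

With 20 observations split in two, each σ² estimate rests on only 8 degrees of freedom, which is the problem the slide points out.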
Solution #2: qualitative predictors
IF it is reasonable to assume that σ² is the same for every observation,
THEN we can incorporate all observations into one model:
$$Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \beta_3 X_{i,1} X_{i,2} + \varepsilon_i$$
where
$$X_{i,2} = \begin{cases} 1 & \text{if } T_i = A,\\ 0 & \text{otherwise.} \end{cases}$$
This model estimates a different slope and intercept for each tool type:
for tool type A: slope = β1 + β3, intercept = β0 + β2;
for tool type B: slope = β1, intercept = β0.
Test for different slopes: H0 : β3 = 0. Test for different intercepts: H0 : β2 = 0. Test for different slope and intercept: H0 : β2 = β3 = 0. Here is the example
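A minimal sketch of the pooled model, assuming hypothetical noiseless data so that least squares recovers the chosen coefficients exactly. It builds the columns 1, X1, X2 = 1{T = A} and X1 · X2, then reads off the slope and intercept for each tool type as on the slide. The helper `design_matrix` and the values in `true_beta` are illustrative assumptions.

```python
import numpy as np

def design_matrix(x1, tool):
    """Columns: 1, X1, X2 = 1{T = A}, and the interaction X1 * X2."""
    x2 = (tool == "A").astype(float)
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])

# Hypothetical, noiseless data so the fit recovers the coefficients exactly.
speed = np.tile(np.linspace(500.0, 1000.0, 10), 2)
tool = np.repeat(["A", "B"], 10)
true_beta = np.array([30.0, -0.01, 10.0, -0.01])   # beta0, beta1, beta2, beta3
X = design_matrix(speed, tool)
life = X @ true_beta

beta, *_ = np.linalg.lstsq(X, life, rcond=None)
slope_A, intercept_A = beta[1] + beta[3], beta[0] + beta[2]   # tool type A
slope_B, intercept_B = beta[1], beta[0]                       # tool type B
df_resid = len(life) - X.shape[1]   # one pooled sigma^2 estimate on n - 4 df
```

Compared with stratifying, the single σ² estimate here uses n − 4 = 16 degrees of freedom rather than 8 per group.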
More than two levels
If our categorical variable has r levels (i.e. r different tool types t1, . . . , tr), then we need to add r − 1 indicator variables to X: for 1 ≤ j ≤ r − 1,
$$C_{i,j} = \begin{cases} 1 & \text{if } T_i = t_j,\\ 0 & \text{otherwise.} \end{cases}$$
Note: there are many ways to “code” the qualitative variable. The scheme above means that the mean in group r is β0 and the coefficients of the columns C_{i,j} represent differences from the mean of group r.
To look for different “slopes” for a given continuous predictor X, we need to add r − 1 more columns: for 1 ≤ j ≤ r − 1,
$$I_{i,j} = X_i \cdot C_{i,j}, \qquad 1 \le i \le n.$$
These are our first “real” interactions: taking some columns of a smaller X and multiplying them together (i.e. the C columns and X columns).
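The coding scheme above can be sketched as follows, assuming three hypothetical tool types with the last level t_r serving as the baseline. The helpers `treatment_coding` and `interaction_columns` are illustrative names, not part of any particular library.

```python
import numpy as np

def treatment_coding(labels, levels):
    """Columns C_{i,j} = 1{T_i = t_j} for j = 1..r-1; the last level t_r is the baseline."""
    labels = np.asarray(labels)
    return np.column_stack([(labels == t).astype(float) for t in levels[:-1]])

def interaction_columns(x, C):
    """I_{i,j} = X_i * C_{i,j}: one extra 'slope' column per non-baseline level."""
    return C * np.asarray(x, dtype=float)[:, None]

levels = ["t1", "t2", "t3"]                      # r = 3 hypothetical tool types
labels = ["t1", "t3", "t2", "t3", "t1"]
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
C = treatment_coding(labels, levels)             # shape (5, 2): r - 1 columns
I = interaction_columns(x, C)                    # shape (5, 2)
X = np.column_stack([np.ones(len(x)), x, C, I])  # full design: 2 + 2(r - 1) columns
```

Rows from the baseline level t3 have all-zero C and I entries, so their intercept and slope are the plain β0 and β1.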
Analysis of Variance models
Models with only qualitative variables.
One-way ANOVA: extension of the “two-sample” t-test. Example: in studying the effect of BP on heart disease we might consider overall health (Poor, Moderate, Good).
Two-way ANOVA: more than one qualitative variable. Example: include ethnicity as part of our study of the effect of BP on heart disease.
One-way ANOVA
Generalizes the two-sample t-test to more than two levels. One-way ANOVA model: observations (Y_ij), 1 ≤ i ≤ r, 1 ≤ j ≤ n_i: r groups and n_i samples in the i-th group.
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2).$$
Constraint: $\sum_{i=1}^{r} \alpha_i = 0$. Why a constraint? Otherwise, the model is unidentifiable: r + 1 parameters for only r means. We can find infinitely many choices of (µ, α1, . . . , αr) that yield the same mean for each Y_ij.
This particular constraint comes down to a different “coding” of the group levels (see C_{i,j} above). In this case, the αi's are differences from the “grand mean” µ.
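A small numerical illustration of the identifiability point, assuming three hypothetical group means. Shifting µ up by c and every αi down by c leaves the group means unchanged; imposing Σ αi = 0 pins down a single solution, with µ equal to the unweighted average of the group means.

```python
import numpy as np

# Hypothetical group means for r = 3 groups.
group_means = np.array([10.0, 12.0, 17.0])

# Without a constraint, (mu, alpha) and (mu + c, alpha - c) give the same means.
mu, alpha = 0.0, group_means.copy()
c = 5.0
assert np.allclose(mu + alpha, (mu + c) + (alpha - c))

# Imposing sum(alpha) = 0 picks a unique solution:
# mu is the (unweighted) average of the group means, alpha_i the deviations.
mu_hat = group_means.mean()
alpha_hat = group_means - mu_hat
```

Here µ̂ = 13 and α̂ = (−3, −1, 4), which sum to zero.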
Extension of two sample t-test
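Model is easy to fit: the fitted value for each observation is its group mean,
$$\widehat{Y}_{ij} = \overline{Y}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}.$$
Simplest question: is there any group effect?
H0 : α1 = · · · = αr = 0?
The test is an F-test of the full model vs. the reduced model. The reduced model just has an intercept.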
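A minimal sketch of this full-vs-reduced F-test, assuming hypothetical data for three groups of unequal size. The full model's residuals are deviations from group means and the reduced model's are deviations from the grand mean; `scipy.stats.f_oneway` is used only as a cross-check. The function name `one_way_f_test` is illustrative.

```python
import numpy as np
from scipy import stats

def one_way_f_test(groups):
    """F-test of H0: alpha_1 = ... = alpha_r = 0 as full vs. reduced model."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    y = np.concatenate(groups)
    n, r = len(y), len(groups)
    # Full model: fitted value of Y_ij is its group mean Ybar_i.
    sse_full = sum(((g - g.mean()) ** 2).sum() for g in groups)
    # Reduced model (intercept only): fitted value is the grand mean.
    sse_reduced = ((y - y.mean()) ** 2).sum()
    df_num, df_den = r - 1, n - r
    F = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    return F, stats.f.sf(F, df_num, df_den)

# Hypothetical data for r = 3 groups of unequal size.
groups = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0], [5.0, 6.0]]
F, p = one_way_f_test(groups)
F_ref, p_ref = stats.f_oneway(*groups)   # SciPy's built-in one-way ANOVA
```

The numerator SS_reduced − SS_full equals SSTR in the one-way ANOVA table, which is why the two computations agree.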
ANOVA tables: One-way
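$$\begin{array}{llll}
\text{Source} & \text{SS} & \text{df} & E(\text{MS}) \\ \hline
\text{Treatments} & SSTR = \sum_{i=1}^{r} n_i\left(\overline{Y}_{i\cdot} - \overline{Y}_{\cdot\cdot}\right)^2 & r - 1 & \sigma^2 + \dfrac{\sum_{i=1}^{r} n_i \alpha_i^2}{r-1} \\
\text{Error} & SSE = \sum_{i=1}^{r}\sum_{j=1}^{n_i}\left(Y_{ij} - \overline{Y}_{i\cdot}\right)^2 & \sum_{i=1}^{r} n_i - r & \sigma^2
\end{array}$$
Notation: $\overline{Y}_{i\cdot}$ is the i-th group mean and $\overline{Y}_{\cdot\cdot}$ is the overall mean. We see that under H0 : α1 = · · · = αr = 0, the expected values of MSTR and MSE are both σ². Under the normal model, SSTR and SSE are independent. Therefore, under H0,
$$F = \frac{MSTR}{MSE} = \frac{SSTR/df_{TR}}{SSE/df_E} \sim F_{df_{TR},\,df_E}.$$
Reject H0 at level α if $F > F_{1-\alpha,\,df_{TR},\,df_E}$.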
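The E(MS) column can be checked by simulation. This is a Monte Carlo sketch under stated assumptions: balanced groups (n = 6 in each of r = 3 groups), σ = 1, and hypothetical effects α = (−1, 0, 1), for which the table gives E(MSTR) = 1 + 6 · 2 / 2 = 7 and E(MSE) = 1. All settings are made up for illustration.

```python
import numpy as np

# Hypothetical settings: r = 3 groups, n = 6 per group (balanced), sigma = 1.
rng = np.random.default_rng(2024)
r, n, sigma, reps = 3, 6, 1.0, 20000
mu, alpha = 5.0, np.array([-1.0, 0.0, 1.0])       # sum(alpha) = 0

# Simulate reps data sets at once: array of shape (reps, r, n).
Y = mu + alpha[None, :, None] + sigma * rng.standard_normal((reps, r, n))
means = Y.mean(axis=2)                            # group means Ybar_i.
grand = Y.mean(axis=(1, 2))                       # grand means Ybar..
mstr = n * ((means - grand[:, None]) ** 2).sum(axis=1) / (r - 1)
mse = ((Y - means[:, :, None]) ** 2).sum(axis=(1, 2)) / (r * n - r)

# Compare Monte Carlo averages with the E(MS) column of the table.
e_mstr = sigma**2 + n * (alpha**2).sum() / (r - 1)   # = 1 + 6 * 2 / 2 = 7
e_mse = sigma**2
print(mstr.mean(), e_mstr, mse.mean(), e_mse)
```

Setting `alpha` to all zeros makes both averages approach σ², which is the fact the F-test relies on under H0.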
Example: rehab surgery
How does prior fitness affect recovery from surgery?
Observations: 24 subjects' recovery times.
Three fitness levels: below average, average, above average.
If you are in better shape before surgery, does it take less time to recover?
Inference for linear combinations
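Suppose we want to “infer” something about
$$\sum_{i=1}^{r} a_i(\mu + \alpha_i).$$
It is estimated by $\sum_{i=1}^{r} a_i \overline{Y}_{i\cdot}$, with
$$\operatorname{Var}\left(\sum_{i=1}^{r} a_i \overline{Y}_{i\cdot}\right) = \sigma^2 \sum_{i=1}^{r} \frac{a_i^2}{n_i}.$$
Usual confidence intervals, t-tests (with σ² estimated by MSE on $\sum_{i=1}^{r} n_i - r$ degrees of freedom).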
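A minimal sketch of the usual t-interval for a linear combination, assuming σ² is replaced by MSE with Σ n_i − r degrees of freedom. The function name `linear_combination_ci` and the two small data sets are hypothetical.

```python
import numpy as np
from scipy import stats

def linear_combination_ci(groups, a, level=0.95):
    """Estimate and t-interval for sum_i a_i (mu + alpha_i) in one-way ANOVA."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    a = np.asarray(a, dtype=float)
    n_i = np.array([len(g) for g in groups])
    means = np.array([g.mean() for g in groups])                       # Ybar_i.
    df_e = n_i.sum() - len(groups)
    mse = sum(((g - g.mean()) ** 2).sum() for g in groups) / df_e      # estimates sigma^2
    est = a @ means
    se = np.sqrt(mse * (a**2 / n_i).sum())   # Var formula with sigma^2 replaced by MSE
    q = stats.t.ppf(0.5 + level / 2, df_e)
    return est, (est - q * se, est + q * se)

g1, g2 = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0]   # hypothetical data
est, (lo, hi) = linear_combination_ci([g1, g2], [1, -1])
```

With two groups and a = (1, -1), the estimate divided by its standard error is the pooled two-sample t statistic (as computed by `scipy.stats.ttest_ind` with its default `equal_var=True`), which ties this slide back to the two-sample t-test.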
Two-way ANOVA
Second generalization: more than one grouping variable. Two-way ANOVA model: observations (Y_ijk), 1 ≤ i ≤ r, 1 ≤ j ≤ m, 1 ≤ k ≤ n_ij: r groups in the first grouping variable, m groups in the second and n_ij samples in the (i, j)-“cell”:
$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \qquad \varepsilon_{ijk} \sim N(0, \sigma^2).$$
Again: just a regression model. Main effects: α, β. Interaction effects (αβ): “second derivatives”.
Constraints on the parameters
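$$\sum_{i=1}^{r} \alpha_i = 0, \qquad \sum_{j=1}^{m} \beta_j = 0,$$
$$\sum_{j=1}^{m} (\alpha\beta)_{ij} = 0, \quad 1 \le i \le r, \qquad \sum_{i=1}^{r} (\alpha\beta)_{ij} = 0, \quad 1 \le j \le m.$$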
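Given a table of cell means, these constraints determine the parameters uniquely. A minimal sketch, assuming a hypothetical 2 × 3 table of cell means: µ is the average of all cell means, αi and βj are row and column averages minus µ, and (αβ)ij is what remains.

```python
import numpy as np

def two_way_effects(cell_means):
    """Decompose an r x m table of cell means under the sum-to-zero constraints."""
    M = np.asarray(cell_means, dtype=float)
    mu = M.mean()
    alpha = M.mean(axis=1) - mu                     # row effects, sum to 0
    beta = M.mean(axis=0) - mu                      # column effects, sum to 0
    ab = M - mu - alpha[:, None] - beta[None, :]    # interactions: rows and columns sum to 0
    return mu, alpha, beta, ab

M = [[1.0, 2.0, 3.0], [4.0, 6.0, 11.0]]             # hypothetical cell means
mu, alpha, beta, ab = two_way_effects(M)
```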
Fitting model
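Easy to fit:
$$\widehat{Y}_{ijk} = \overline{Y}_{ij\cdot} = \frac{1}{n_{ij}}\sum_{k=1}^{n_{ij}} Y_{ijk}.$$
Inference for combinations:
$$\operatorname{Var}\left(\sum_{i=1}^{r}\sum_{j=1}^{m} a_{ij}\overline{Y}_{ij\cdot}\right) = \sigma^2 \cdot \sum_{i=1}^{r}\sum_{j=1}^{m} \frac{a_{ij}^2}{n_{ij}}.$$
Usual t-tests, confidence intervals.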
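A minimal sketch of this variance formula for possibly unequal cell sizes, assuming σ² is estimated by the pooled within-cell MSE on Σ n_ij − rm degrees of freedom. The helper name and the small data set are hypothetical; the coefficients chosen form the 2 × 2 interaction contrast Ȳ11· − Ȳ12· − Ȳ21· + Ȳ22·.

```python
import numpy as np

def contrast_of_cell_means(cells, a):
    """cells[i][j]: observations Y_ijk in cell (i, j); a: r x m coefficients.
    Returns (sum a_ij * Ybar_ij., estimated variance, error df)."""
    r, m = len(cells), len(cells[0])
    a = np.asarray(a, dtype=float)
    ybar = np.array([[np.mean(cells[i][j]) for j in range(m)] for i in range(r)])
    n = np.array([[len(cells[i][j]) for j in range(m)] for i in range(r)])
    sse = sum(((np.asarray(cells[i][j], dtype=float) - ybar[i, j]) ** 2).sum()
              for i in range(r) for j in range(m))
    df_e = n.sum() - r * m
    mse = sse / df_e                        # estimate of sigma^2
    est = (a * ybar).sum()
    var_hat = mse * (a**2 / n).sum()        # sigma^2 * sum a_ij^2 / n_ij, sigma^2 -> MSE
    return est, var_hat, df_e

cells = [[[1.0, 3.0], [2.0, 2.0, 2.0]],     # hypothetical unbalanced 2 x 2 data
         [[5.0, 7.0], [4.0, 6.0]]]
est, var_hat, df_e = contrast_of_cell_means(cells, [[1, -1], [-1, 1]])
```

A t-interval then follows as est ± t_{1−α/2, df_e} · √var_hat.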
Questions of interest
Are there main effects for the grouping variables?
H0 : α1 = · · · = αr = 0, H0 : β1 = · · · = βm = 0.
Are there interaction effects?
H0 : (αβ)ij = 0, 1 ≤ i ≤ r, 1 ≤ j ≤ m.
ANOVA table: Two-way (assuming nij = n)
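$$\begin{array}{ll}
\text{Term} & \text{SS} \\ \hline
A & SSA = nm\sum_{i=1}^{r}\left(\overline{Y}_{i\cdot\cdot} - \overline{Y}_{\cdot\cdot\cdot}\right)^2 \\
B & SSB = nr\sum_{j=1}^{m}\left(\overline{Y}_{\cdot j\cdot} - \overline{Y}_{\cdot\cdot\cdot}\right)^2 \\
AB & SSAB = n\sum_{i=1}^{r}\sum_{j=1}^{m}\left(\overline{Y}_{ij\cdot} - \overline{Y}_{i\cdot\cdot} - \overline{Y}_{\cdot j\cdot} + \overline{Y}_{\cdot\cdot\cdot}\right)^2 \\
\text{Error} & SSE = \sum_{i=1}^{r}\sum_{j=1}^{m}\sum_{k=1}^{n}\left(Y_{ijk} - \overline{Y}_{ij\cdot}\right)^2
\end{array}$$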
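A minimal sketch of these four sums of squares, assuming a balanced design stored as an array of shape (r, m, n). In this balanced case the four terms add up to the total sum of squares about the grand mean, which the usage lines check on hypothetical random data. The function name `two_way_ss` is illustrative.

```python
import numpy as np

def two_way_ss(Y):
    """Y has shape (r, m, n): balanced design with n replicates per cell."""
    Y = np.asarray(Y, dtype=float)
    r, m, n = Y.shape
    grand = Y.mean()                   # Ybar_...
    row = Y.mean(axis=(1, 2))          # Ybar_i..
    col = Y.mean(axis=(0, 2))          # Ybar_.j.
    cell = Y.mean(axis=2)              # Ybar_ij.
    ssa = n * m * ((row - grand) ** 2).sum()
    ssb = n * r * ((col - grand) ** 2).sum()
    ssab = n * ((cell - row[:, None] - col[None, :] + grand) ** 2).sum()
    sse = ((Y - cell[:, :, None]) ** 2).sum()
    return ssa, ssb, ssab, sse

Y = np.random.default_rng(0).normal(size=(2, 3, 4))   # hypothetical data
ssa, ssb, ssab, sse = two_way_ss(Y)
ssto = ((Y - Y.mean()) ** 2).sum()                      # total SS
```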
ANOVA table: Two-way (continued)
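$$\begin{array}{llll}
 & \text{SS} & \text{df} & E(\text{MS}) \\ \hline
A & SSA & r - 1 & \sigma^2 + nm\dfrac{\sum_{i=1}^{r}\alpha_i^2}{r-1} \\
B & SSB & m - 1 & \sigma^2 + nr\dfrac{\sum_{j=1}^{m}\beta_j^2}{m-1} \\
AB & SSAB & (m-1)(r-1) & \sigma^2 + n\dfrac{\sum_{i=1}^{r}\sum_{j=1}^{m}(\alpha\beta)_{ij}^2}{(r-1)(m-1)} \\
\text{Error} & SSE & (n-1)mr & \sigma^2
\end{array}$$
Under H0 : (αβ)ij = 0 for all i, j, the expected values of MSAB and MSE are both σ², so use these for an F-test. Use
$$\frac{MSAB}{MSE} = \frac{SSAB/df_{AB}}{SSE/df_E} \sim F_{(m-1)(r-1),\,(n-1)mr}$$
to test H0. To test H0 : αi = 0 for all i, use
$$\frac{MSA}{MSE} = \frac{SSA/df_A}{SSE/df_E} \sim F_{r-1,\,(n-1)mr}.$$
To test H0 : βj = 0 for all j, use
$$\frac{MSB}{MSE} = \frac{SSB/df_B}{SSE/df_E} \sim F_{m-1,\,(n-1)mr}.$$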
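A minimal sketch of the three F-tests, assuming the sums of squares have already been computed for a balanced design with n observations per cell. The function name and the input numbers in the usage line are hypothetical; every test divides by MSE from the full model, as in the table.

```python
from scipy import stats

def two_way_f_tests(ssa, ssb, ssab, sse, r, m, n):
    """F statistics and p-values for the balanced two-way table (n per cell)."""
    df = {"A": r - 1, "B": m - 1, "AB": (r - 1) * (m - 1)}
    df_e = (n - 1) * m * r
    mse = sse / df_e
    out = {}
    for term, ss in (("A", ssa), ("B", ssb), ("AB", ssab)):
        F = (ss / df[term]) / mse       # every test uses MSE from the full model
        out[term] = (F, stats.f.sf(F, df[term], df_e))
    return out

# Hypothetical sums of squares for r = 2, m = 3, n = 3 (so df_E = 12, MSE = 1).
res = two_way_f_tests(8.0, 6.0, 4.0, 12.0, r=2, m=3, n=3)
```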
Example: kidney failure
Time of stay in hospital depends on weight gain between treatments and duration of treatment.
Two levels of duration, three levels of weight gain. Is there an interaction? Main effects? Here is the example
Caveats
Testing for main effects is NOT the same as the usual nested-model F-test. R uses SSE from the full model (including interactions) as the denominator.
This allows for interaction terms even when there are no main effects.
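One reading of this caveat, sketched under our own assumptions: on hypothetical balanced data whose cell means show pure interaction (equal row means, equal column means), compare the ANOVA-table F for A, which divides by MSE from the full model, with the F from comparing the nested additive models y ~ B and y ~ A + B, which pools SSAB into the error. The shortcut for the additive model's residual SS (SSAB + SSE) relies on the design being balanced.

```python
import numpy as np

# Hypothetical balanced data: r = 2, m = 3, n = 4. The cell-mean pattern has
# equal row means and equal column means: pure interaction, no main effects.
rng = np.random.default_rng(1)
r, m, n = 2, 3, 4
cell_pattern = np.array([[0.0, 1.0, 2.0], [2.0, 1.0, 0.0]])
Y = cell_pattern[:, :, None] + rng.standard_normal((r, m, n))

grand, row = Y.mean(), Y.mean(axis=(1, 2))
col, cell = Y.mean(axis=(0, 2)), Y.mean(axis=2)
ssa = n * m * ((row - grand) ** 2).sum()
ssab = n * ((cell - row[:, None] - col[None, :] + grand) ** 2).sum()
sse = ((Y - cell[:, :, None]) ** 2).sum()
df_a, df_ab, df_e = r - 1, (r - 1) * (m - 1), (n - 1) * m * r

# ANOVA-table test of A: denominator is MSE from the full model.
F_table = (ssa / df_a) / (sse / df_e)
# Nested-model test of y ~ B vs. y ~ A + B: the interaction SS is pooled into
# the error (valid shortcut here because the design is balanced).
F_additive = (ssa / df_a) / ((ssab + sse) / (df_ab + df_e))
print(F_table, F_additive)
```

Both statistics share the numerator MSA; they differ only in whether a large interaction inflates the denominator.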