STAT22200 Spring 2014 Handout 2
Yibi Huang
April 15, 2014
Chapter 3 Completely Randomized Designs
Handout 2 - 1
Definition of a Completely Randomized Design (CRD) (1)
An experiment has a completely randomized design if
- the number of treatments g is predetermined,
- the number of replicates n_i of treatment i is predetermined, for i = 1, ..., g, and
- every allocation of the N = n_1 + ... + n_g experimental units into g groups of sizes (n_1, ..., n_g) is equally likely.
Definition of a Completely Randomized Design (CRD) (2)
- Say we have 4 units: A, B, C, D, and 2 treatments with 2 units each. A CRD ensures that the following allocations occur equally likely:

      (AB, CD), (AC, BD), (AD, BC),
      (BC, AD), (BD, AC), (CD, AB).

- Suppose one tosses a coin for each of 20 patients. Those who get heads go to the treatment group; the others go to the control group. This is not a CRD, as the numbers of replicates in the two groups are random.
- If the patients draw lots, say, draw from 20 tickets in a hat, 10 of which are marked "treatment", it is a CRD.
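The drawing-lots scheme amounts to shuffling the units and cutting the shuffled list into groups of predetermined sizes. A minimal sketch in Python (the handout itself uses R; the function name `crd_allocate` is invented here for illustration):

```python
import random

def crd_allocate(units, group_sizes, seed=None):
    """Randomly split `units` into groups of the given fixed sizes.

    Shuffling and then slicing makes every allocation with these
    group sizes equally likely -- the defining property of a CRD.
    """
    assert sum(group_sizes) == len(units)
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups, start = [], 0
    for n in group_sizes:
        groups.append(shuffled[start:start + n])
        start += n
    return groups

# 20 patients, 10 to treatment and 10 to control (like drawing lots):
treatment, control = crd_allocate(range(1, 21), [10, 10], seed=1)
```

In contrast, tossing a coin per patient makes the group sizes themselves random, so that scheme is not a CRD.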
Means Model for a CRD Experiment

Consider an experiment with g treatments and n_i replicates (i.e., n_i experimental units) for treatment i, i = 1, 2, ..., g:

    y_{ij} = \mu_i + \varepsilon_{ij}              (means model)
           = \mu + \alpha_i + \varepsilon_{ij}     (effects model)

for i = 1, ..., g and j = 1, ..., n_i.

- \mu_i = mean response for the ith treatment
- \mu = overall mean
- \alpha_i = ith treatment effect
- \mu_i = \mu + \alpha_i
- The error terms \varepsilon_{ij} are assumed to be independent with mean 0 and constant variance \sigma^2. Sometimes we further assume that the errors are normal.

The means model and the effects model are just two expressions of a linear regression model with a dummy variable for each treatment, under different restrictions on the parameters.
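To make the two parameterizations concrete, here is a small simulation sketch in Python (the handout's code is in R; all parameter values below are invented for illustration):

```python
import random

# Hypothetical parameters for g = 3 treatments (not from the handout):
mu = 10.0                       # overall mean
alpha = [-1.0, 0.0, 1.0]        # treatment effects, summing to zero
mu_i = [mu + a for a in alpha]  # means-model parameters: mu_i = mu + alpha_i

# Generate y_ij = mu + alpha_i + eps_ij with eps_ij independent N(0, sigma^2):
rng = random.Random(42)
n_i = [5, 5, 5]
data = {i: [mu + alpha[i] + rng.gauss(0, 0.5) for _ in range(n_i[i])]
        for i in range(3)}
```

The same `data` could equally be generated as `rng.gauss(mu_i[i], 0.5)`: the two models describe identical distributions, differing only in how the group means are parameterized.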
Dot and Bar Notation
A dot (•) in a subscript means sum over that index, for example

    y_{i\bullet} = \sum_j y_{ij},  y_{\bullet j} = \sum_i y_{ij},  y_{\bullet\bullet} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} y_{ij}

A bar over a variable, along with a dot (•) in the subscript, means average over that index, for example

    \bar{y}_{i\bullet} = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij},  \bar{y}_{\bullet\bullet} = \frac{1}{N} \sum_{i=1}^{g} \sum_{j=1}^{n_i} y_{ij}
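The dot and bar notation is easy to check on a toy data set (a Python sketch with invented values; the handout's own computations are in R):

```python
# Toy unbalanced data: g = 2 groups with n_1 = 3, n_2 = 2 (illustrative values)
y = [[1.0, 2.0, 3.0],   # group 1
     [4.0, 6.0]]        # group 2

y_idot = [sum(grp) for grp in y]                      # y_{i.} = sum_j y_ij
y_dotdot = sum(y_idot)                                # y_{..} = sum over all i, j
ybar_idot = [s / len(grp) for s, grp in zip(y_idot, y)]  # group means ybar_{i.}
N = sum(len(grp) for grp in y)
ybar_dotdot = y_dotdot / N                            # grand mean ybar_{..}
```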
Parameter Estimation for the Means Model

Recall that the least squares estimates \hat{\mu}_i are the values of \mu_i that minimize the sum of squared deviations of the observations y_{ij} from their hypothesized means \mu_i under the model:

    SS = \sum_{j=1}^{n_1} (y_{1j} - \mu_1)^2 + \sum_{j=1}^{n_2} (y_{2j} - \mu_2)^2 + \cdots + \sum_{j=1}^{n_g} (y_{gj} - \mu_g)^2.

To minimize SS, we differentiate it with respect to each \mu_i and set the derivative equal to zero:

    \frac{\partial SS}{\partial \mu_i} = -2 \sum_{j=1}^{n_i} (y_{ij} - \mu_i) = -2 n_i (\bar{y}_{i\bullet} - \mu_i) = 0.

The least squares estimate of \mu_i is thus the sample mean of the observations in the corresponding treatment group:

    \hat{\mu}_i = \bar{y}_{i\bullet} = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij}.

Moreover, the least squares estimate \bar{y}_{i\bullet} of \mu_i is unbiased.
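The claim that the group sample mean minimizes the within-group sum of squares can be checked numerically (a Python sketch using a few of the 175° observations from the resin data later in the handout):

```python
def ss(values, m):
    """Sum of squared deviations of `values` from a candidate mean m."""
    return sum((v - m) ** 2 for v in values)

group = [2.04, 1.91, 2.00, 1.92]    # a few group observations
ybar = sum(group) / len(group)      # the least squares estimate

# SS is smallest at the sample mean; any perturbation increases it:
assert all(ss(group, ybar) < ss(group, ybar + d)
           for d in (-0.1, -0.01, 0.01, 0.1))
```

This is exactly the calculus argument above: SS is a convex parabola in each \mu_i with its minimum at \bar{y}_{i\bullet}.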
Parameter Estimation for the Effects Model
For the effects model
    y_{ij} = \mu + \alpha_i + \varepsilon_{ij},

since \mu + \alpha_i = \mu_i, regardless of the constraint on the parameters, the least squares estimates always satisfy

    \hat{\mu} + \hat{\alpha}_i = \bar{y}_{i\bullet} = sample mean of treatment group i.

Remark. The textbook uses the constraint \sum_{i=1}^{g} n_i \alpha_i = 0, because then the least squares estimates of \mu and the \alpha_i's have the simple form

    \hat{\mu} = \bar{y}_{\bullet\bullet},  \hat{\alpha}_i = \bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet}.
More on Parameter Estimations
- The fitted value for y_{ij} is \hat{y}_{ij} = \hat{\mu}_i = \bar{y}_{i\bullet}.
- The residual for y_{ij} is e_{ij} = y_{ij} - \hat{y}_{ij} = y_{ij} - \bar{y}_{i\bullet}.
- SSE = \sum_{i=1}^{g} \sum_{j=1}^{n_i} e_{ij}^2 = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\bullet})^2
- dfE = N - g, because there are N = n_1 + \cdots + n_g observations in total and g parameters (\mu_1, \ldots, \mu_g).
- The estimate of \sigma^2 is again the MSE:

      \hat{\sigma}^2 = MSE = \frac{SSE}{dfE} = \frac{1}{N-g} \sum_{i=1}^{g} \sum_{j=1}^{n_i} e_{ij}^2 = \frac{1}{N-g} \sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\bullet})^2.

- The MSE is an unbiased estimator of \sigma^2.
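The fitted values, residuals, SSE, and MSE above can be computed in a few lines. A Python sketch on toy grouped data (values invented for illustration; the handout's analyses use R):

```python
# Toy grouped data (illustrative): g = 2 groups, N = 5 observations
y = [[1.0, 2.0, 3.0], [4.0, 6.0]]
g = len(y)
N = sum(len(grp) for grp in y)
ybar = [sum(grp) / len(grp) for grp in y]   # fitted value for every y_ij in group i

residuals = [[yij - ybar[i] for yij in grp] for i, grp in enumerate(y)]
sse = sum(e ** 2 for grp in residuals for e in grp)
df_e = N - g          # N observations minus g estimated means
mse = sse / df_e      # estimate of sigma^2
```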
Sum of Squares (1)

As the means model y_{ij} = \mu_i + \varepsilon_{ij} is a regression model, the sum-of-squares identity SST = SSR + SSE is also valid. Start from

    y_{ij} - \bar{y}_{\bullet\bullet} = (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet}) + (y_{ij} - \bar{y}_{i\bullet}).

Squaring both sides, we get

    (y_{ij} - \bar{y}_{\bullet\bullet})^2 = (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2 + (y_{ij} - \bar{y}_{i\bullet})^2 + 2(\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})(y_{ij} - \bar{y}_{i\bullet}).

Summing over the indices, we get

    \underbrace{\sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{\bullet\bullet})^2}_{SST}
      = \underbrace{\sum_{i=1}^{g} \sum_{j=1}^{n_i} (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2}_{SSR = SSTrt}
      + \underbrace{\sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\bullet})^2}_{SSE}
      + 2 \underbrace{\sum_{i=1}^{g} \sum_{j=1}^{n_i} (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})(y_{ij} - \bar{y}_{i\bullet})}_{= 0, \text{ see next slide}}

In the context of experimental design, SSR is often referred to as SSTrt, the treatment sum of squares.
Sum of Squares (2)
Observe that

    \sum_{i=1}^{g} \sum_{j=1}^{n_i} (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})(y_{ij} - \bar{y}_{i\bullet}) = \sum_{i=1}^{g} (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet}) \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\bullet})

and

    \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\bullet}) = y_{i\bullet} - n_i \bar{y}_{i\bullet} = y_{i\bullet} - n_i \left( \frac{y_{i\bullet}}{n_i} \right) = 0,

and hence

    \sum_{i=1}^{g} \sum_{j=1}^{n_i} (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})(y_{ij} - \bar{y}_{i\bullet}) = 0.
    \underbrace{\sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{\bullet\bullet})^2}_{SST}
      = \underbrace{\sum_{i=1}^{g} \sum_{j=1}^{n_i} (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2}_{SSTrt = SSB}
      + \underbrace{\sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\bullet})^2}_{SSE = SSW}

- SST = total sum of squares
  - reflects the total variability in the data
- SSTrt = treatment sum of squares
  - reflects the variability between treatments
  - also called the between-groups sum of squares, denoted SSB
- SSE = error sum of squares
  - Observe that SSE = \sum_{i=1}^{g} (n_i - 1) s_i^2, in which

        s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\bullet})^2

    is the sample variance within treatment group i. So SSE reflects the variability within treatment groups.
  - also called the within-groups sum of squares, denoted SSW
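Both identities on this slide, SST = SSB + SSW and SSE = \sum_i (n_i - 1) s_i^2, can be verified numerically. A Python sketch on toy data (values invented for illustration):

```python
from statistics import variance

# Toy data (illustrative): g = 2 groups, N = 5 observations
y = [[1.0, 2.0, 3.0], [4.0, 6.0]]
N = sum(len(grp) for grp in y)
grand = sum(v for grp in y for v in grp) / N
means = [sum(grp) / len(grp) for grp in y]

sst = sum((v - grand) ** 2 for grp in y for v in grp)                    # total
ssb = sum(len(grp) * (m - grand) ** 2 for grp, m in zip(y, means))       # between
ssw = sum((v - m) ** 2 for grp, m in zip(y, means) for v in grp)         # within

assert abs(sst - (ssb + ssw)) < 1e-9                                     # SST = SSB + SSW
assert abs(ssw - sum((len(grp) - 1) * variance(grp) for grp in y)) < 1e-9
```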
Degrees of Freedom

Under the means model y_{ij} = \mu_i + \varepsilon_{ij}, with the \varepsilon_{ij} i.i.d. ~ N(0, \sigma^2), it can be shown that

    \frac{SSE}{\sigma^2} \sim \chi^2_{N-g}.

If we further assume that \mu_1 = \cdots = \mu_g = 0, then

    \frac{SST}{\sigma^2} \sim \chi^2_{N-1},  \frac{SSTrt}{\sigma^2} \sim \chi^2_{g-1},

and SSTrt is independent of SSE. Note that the degrees of freedom of the three chi-square distributions,

    dfT = N - 1,  dfTrt = g - 1,  dfE = N - g,

break down similarly:

    dfT = dfTrt + dfE,

just like SST = SSTrt + SSE.
ANOVA F-test and ANOVA Table

To test whether the treatments have different effects,

    H0: \mu_1 = \cdots = \mu_g   (no difference between treatments)
    Ha: the \mu_i's are not all equal   (some difference between treatments)

the test statistic is the F-statistic

    F = \frac{MSTrt}{MSE} = \frac{SSTrt/(g-1)}{SSE/(N-g)},

which, under H0, has an F distribution with g - 1 and N - g degrees of freedom.

    Source     | Sum of Squares | d.f.  | Mean Squares          | F0
    -----------|----------------|-------|-----------------------|-----------
    Treatments | SSTrt          | g - 1 | MSTrt = SSTrt/(g - 1) | MSTrt/MSE
    Errors     | SSE            | N - g | MSE = SSE/(N - g)     |
    Total      | SST            | N - 1 |                       |
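The table is pure arithmetic once the sums of squares are known. A Python sketch (the function name `anova_table` is invented here; the rounded inputs SSTrt ≈ 3.538 and SSE ≈ 0.294 are the resin-data values that appear later in the handout):

```python
def anova_table(ss_trt, ss_e, g, N):
    """Return the (df, MS, F) entries of a one-way ANOVA table."""
    df_trt, df_e = g - 1, N - g
    ms_trt, ms_e = ss_trt / df_trt, ss_e / df_e
    return {"Treatments": (df_trt, ms_trt, ms_trt / ms_e),
            "Errors":     (df_e, ms_e, None),
            "Total":      (df_trt + df_e, None, None)}

# Resin-data sums of squares (rounded), g = 5 treatments, N = 37 units:
table = anova_table(3.538, 0.294, 5, 37)
```

With these rounded inputs the F entry comes out near 96, matching the `aov()` output later in the handout up to rounding.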
A Heuristic Interpretation of the ANOVA F -Statistic (1)
    H0: \mu_1 = \cdots = \mu_g   (no difference between treatments)
    Ha: the \mu_i's are not all equal   (some difference between treatments)

1. Under H0, all observations y_{ij} have a common mean \mu, which can be estimated by the grand mean

       \bar{y}_{\bullet\bullet} = \frac{1}{N} \sum_{i=1}^{g} \sum_{j=1}^{n_i} y_{ij},

   and all g group means \bar{y}_{i\bullet} should be close to the grand mean \bar{y}_{\bullet\bullet}. Any deviation \bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet} from the grand mean should be considered evidence for Ha. We thus consider a test statistic of the form

       \sum_{i=1}^{g} (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2.
A Heuristic Derivation of the ANOVA F-Statistic (2)

2. The sample mean \bar{y}_{i\bullet} has variance \sigma^2 / n_i: the larger the group, the closer \bar{y}_{i\bullet} is to the grand mean. Hence we weight the squared deviations (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2 by the group sizes n_i:

       \sum_{i=1}^{g} n_i (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2 = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2,

   which is exactly SSTrt, the treatment sum of squares. Large values of SSTrt are evidence for Ha.

3. A large value of SSTrt can also be due to large variability in the observations, so we should compare SSTrt with the error size \sigma^2. As \sigma^2 is unknown, it is replaced by its estimate, the MSE:

       F = \frac{SSTrt/(g-1)}{MSE} = \frac{SSTrt/(g-1)}{SSE/(N-g)},

   which is the F-statistic we use.
Example — Resin Glue Failure Time — Background
- How do we measure the lifetime of things like computer disk drives, light bulbs, and glue bonds? E.g., a computer drive is claimed to have a lifetime of 800,000 hours (> 90 years). Clearly the manufacturer did not have disks on test for 90 years; how do they make such claims?
- Accelerated life test: parts under stress (higher load, higher temperature, etc.) will usually fail sooner than parts that are unstressed. By modeling the lifetimes of parts under various stresses, we can estimate (extrapolate to) the lifetime of unstressed parts.
- Example: resin glue failure time
Example — Resin Glue Failure Time1
- Goal: to estimate the lifetime (in hours) of an encapsulating resin for gold-aluminum bonds in integrated circuits (operating at 120°C)
- Method: accelerated life test
- Design: randomly assign 37 units to one of 5 different temperature stresses (in Celsius):

      175°, 194°, 213°, 231°, 250°

- Treatments: temperature in Celsius
- Response: Y = log10(time to failure in hours) of the tested material

1. Source: pp. 448-449, Accelerated Testing (Nelson 2004). Original data provided by Dr. Muhib Khan of AMD.
Example — Resin Glue Failure Time — Data
    Temperature (°C)  175   194   213   231   250
    Y                 2.04  1.66  1.53  1.15  1.26
                      1.91  1.71  1.54  1.22  0.83
                      2.00  1.42  1.38  1.17  1.08
                      1.92  1.76  1.31  1.16  1.02
                      1.85  1.66  1.35  1.21  1.09
                      1.96  1.61  1.27  1.28  1.06
                      1.88  1.55  1.26  1.17
                      1.90  1.66  1.38
Data file:http://users.stat.umn.edu/~gary/book/fcdae.data/exmpl3.2
Good and Bad Graphs

Usually we first examine the data graphically.

> resin = read.table("resin.txt", header=T)
> attach(resin)
> plot(temp,y)

[Scatter plot of y (about 0.8 to 2.0) against temp coded as 1-5]

Drawbacks of this plot:
- It is not properly labeled (it should display the variable names and units).
- What do temp values 1, 2, 3, 4, and 5 represent?
Good and Bad Graphs (2)
tempC = c(175, 194, 213, 231, 250)[temp]
plot(tempC, y, ylab="log10(Failure time in hours)",
     xlab="Temperature in Celsius")            # Plot A
plot(tempC, y, ylab="log10(Failure time in hours)",
     xlab="Temperature in Celsius", xaxt="n")
axis(side=1, at=c(175, 194, 213, 231, 250))    # Plot B

[Plot A and Plot B: scatter plots of log10(Failure time in hours) against Temperature in Celsius; Plot B marks the x-axis only at the design temperatures 175, 194, 213, 231, 250]

Which plot conveys more information?
Example — Resin Glue Failure Time — SSTrt

    Temperature (°C)    175    194    213    231    250
    y_ij                2.04   1.66   1.53   1.15   1.26
                        1.91   1.71   1.54   1.22   0.83
                        2.00   1.42   1.38   1.17   1.08
                        1.92   1.76   1.31   1.16   1.02
                        1.85   1.66   1.35   1.21   1.09
                        1.96   1.61   1.27   1.28   1.06
                        1.88   1.55   1.26   1.17
                        1.90   1.66   1.38
    n_i                 8      8      8      7      6
    \bar{y}_{i\bullet}  1.933  1.629  1.378  1.194  1.057

    \bar{y}_{\bullet\bullet} = \frac{1}{37} (2.04 + 1.91 + \cdots + 1.06) = 1.465

The between-groups sum of squares:

    SSTrt = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2 = \sum_{i=1}^{5} n_i (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2
          = 8(1.933 - 1.465)^2 + 8(1.629 - 1.465)^2 + 8(1.378 - 1.465)^2
          + 7(1.194 - 1.465)^2 + 6(1.057 - 1.465)^2 ≈ 3.54
Example — Resin Glue Failure Time — SSE
The within-groups sum of squares:

    \sum_{j=1}^{n_1} (y_{1j} - \bar{y}_{1\bullet})^2 = (2.04 - 1.933)^2 + (1.91 - 1.933)^2 + \cdots + (1.90 - 1.933)^2
    \sum_{j=1}^{n_2} (y_{2j} - \bar{y}_{2\bullet})^2 = (1.66 - 1.629)^2 + (1.71 - 1.629)^2 + \cdots + (1.66 - 1.629)^2
    ...
    \sum_{j=1}^{n_5} (y_{5j} - \bar{y}_{5\bullet})^2 = (1.26 - 1.057)^2 + (0.83 - 1.057)^2 + \cdots + (1.06 - 1.057)^2

So

    SSE = \sum_{j=1}^{n_1} (y_{1j} - \bar{y}_{1\bullet})^2 + \sum_{j=1}^{n_2} (y_{2j} - \bar{y}_{2\bullet})^2 + \cdots + \sum_{j=1}^{n_5} (y_{5j} - \bar{y}_{5\bullet})^2 ≈ 0.294
Example — Resin Glue Failure Time — F -statistic
- The observed F-statistic is

      F_0 = \frac{SSTrt/(g-1)}{SSE/(N-g)} = \frac{3.54/(5-1)}{0.29/(37-5)} ≈ 97.66

  (97.66 uses the rounded sums of squares; the exact value is 96.36, as in the R output on the next slide).
- The resulting F-statistic is very large compared to 1. In fact, the p-value is

      P(F_{4,32} ≥ 97.66) = 1.842 × 10^{-17} ≪ 0.001.

- The data exhibit strong evidence against the null hypothesis that all means are equal.
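The hand computations on the last three slides can be reproduced from the raw data. A Python sketch (the handout does this in R):

```python
# Resin failure-time data from the handout, grouped by temperature (Celsius):
resin = {
    175: [2.04, 1.91, 2.00, 1.92, 1.85, 1.96, 1.88, 1.90],
    194: [1.66, 1.71, 1.42, 1.76, 1.66, 1.61, 1.55, 1.66],
    213: [1.53, 1.54, 1.38, 1.31, 1.35, 1.27, 1.26, 1.38],
    231: [1.15, 1.22, 1.17, 1.16, 1.21, 1.28, 1.17],
    250: [1.26, 0.83, 1.08, 1.02, 1.09, 1.06],
}
g = len(resin)
N = sum(len(v) for v in resin.values())                    # 37 units
grand = sum(x for v in resin.values() for x in v) / N      # grand mean, ~1.465
means = {t: sum(v) / len(v) for t, v in resin.items()}     # group means

ss_trt = sum(len(v) * (means[t] - grand) ** 2 for t, v in resin.items())
ss_e = sum((x - means[t]) ** 2 for t, v in resin.items() for x in v)
f_stat = (ss_trt / (g - 1)) / (ss_e / (N - g))
# ss_trt ~ 3.538, ss_e ~ 0.294, f_stat ~ 96.4, matching the aov() output
```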
ANOVA F -Test in R
The command to get the ANOVA table in R is aov().
> resin = read.table("resin.txt", header=T)
> aov1 = aov(y ~ as.factor(temp), data=resin)
> summary(aov1)
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(temp) 4 3.538 0.8844 96.36 <2e-16 ***
Residuals 32 0.294 0.0092
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Limitation of ANOVA F -Tests
The ANOVA F-test merely tells us that the glue has different failure times at different temperatures. However, our goal is to predict the lifetime of the glue at a temperature of 120°C.
[Scatter plot of log10(Failure time) against Temperature (Celsius), 120-240; red crosses mark the group means]
Dose-Response Modeling

In some experiments, the treatments are associated with numerical levels z_i, such as drug dose, baking time, or temperature. We will refer to such levels as doses.

- The means model y_{ij} = \mu_i + \varepsilon_{ij} specifies no relationship between the treatment levels z_i and the response y, so it cannot be used to infer the response at a dose z other than those used in the experiment.
- With a quantitative treatment factor, experimenters are usually more interested in how the response is affected by the factor as a function of z_i:

      y_{ij} = f(z_i; \theta) + \varepsilon_{ij},

  e.g.,

      f(x_i; \beta_0, \beta_1) = \beta_0 + \beta_1 x_i;
      f(x_i; \beta_0, \beta_1, \beta_2) = \beta_0 + \beta_1 x_i + \beta_2 x_i^2; or
      f(x_i; \beta_0, \beta_1) = \beta_0 + \beta_1 \log(x_i).
    y_{ij} = f(z_i; \theta) + \varepsilon_{ij}

Advantages of dose-response modeling:
- less complex (fewer parameters)
- easier to interpret (sometimes)
- generalizable to doses not included in the experiment

Issues to consider:
- How do we choose the function f?
  - One commonly used family of functions is the polynomials:

        f(x_i; \beta) = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_k x_i^k,

    but polynomials are NOT always the best choice.
  - For simplicity, we choose the lowest-order polynomial that adequately fits the data.
- How do we assess how well f fits the data? ...... Goodness of fit
Polynomial Models
Let t_i denote the temperature in Celsius in treatment group i. Consider the following polynomial models for the resin glue data:

    Null Model:      y_{ij} = \mu + \varepsilon_{ij}
    Linear Model:    y_{ij} = \beta_0 + \beta_1 t_i + \varepsilon_{ij}
    Quadratic Model: y_{ij} = \beta_0 + \beta_1 t_i + \beta_2 t_i^2 + \varepsilon_{ij}
    Cubic Model:     y_{ij} = \beta_0 + \beta_1 t_i + \beta_2 t_i^2 + \beta_3 t_i^3 + \varepsilon_{ij}
    Quartic Model:   y_{ij} = \beta_0 + \beta_1 t_i + \beta_2 t_i^2 + \beta_3 t_i^3 + \beta_4 t_i^4 + \varepsilon_{ij}

- Every model is nested in the model below it. (Why?)
- Don't skip terms. If a higher-order term is significant, e.g., t_i^3, then all lower-order terms (1, t_i, t_i^2) have to be kept, even if they are not significant.
- Why no quintic or higher-order models?
In general, for an experiment with g treatment groups, if the treatment factor is numeric, one can fit a polynomial model of degree up to g - 1:

    y_{ij} = \beta_0 + \beta_1 x_i + \cdots + \beta_{g-1} x_i^{g-1} + \varepsilon_{ij}.

Question: For the resin glue data, what will happen if a quintic model (a polynomial of degree 5) is fitted?

    y_{ij} = \beta_0 + \beta_1 t_i + \beta_2 t_i^2 + \beta_3 t_i^3 + \beta_4 t_i^4 + \beta_5 t_i^5 + \varepsilon_{ij}

Answer: More than one polynomial of degree 5 passes through the 5 points (175, \hat{\mu}_1), (194, \hat{\mu}_2), (213, \hat{\mu}_3), (231, \hat{\mu}_4), and (250, \hat{\mu}_5). Thus the 6 coefficients \beta_0, \beta_1, \ldots, \beta_5 CANNOT be uniquely determined.

As a rule of thumb, for an experiment with g treatments, we can fit a model with at most g parameters.
Linear Model (1)
Let’s try to fit the linear model: yij = β0 + β1ti + εij .
In the data file, the temperature is coded as 1,2,3,4,5 rather thanthe actual values 175, 194, 213, 231, 250.
> temp
[1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4
[27] 4 4 4 4 4 5 5 5 5 5 5
Fitting the modellm(y ∼ temp, data = resin) directly will be wrong. We needto create a new variable: tempC.
> tempC = c(175, 194, 213, 231, 250)[temp]
> tempC
[1] 175 175 175 175 175 175 175 175 194 194 194 194 194 194
[15] 194 194 213 213 213 213 213 213 213 213 231 231 231 231
[29] 231 231 231 250 250 250 250 250 250
Now the variable tempC represents the temperature in Celsius.
Linear Model (2)
> lm1 = lm(y ~ tempC, data = resin)
> summary(lm1)
(... part of the output is omitted ...)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.9560075 0.1391174 28.44 <2e-16 ***
tempC -0.0118567 0.0006573 -18.04 <2e-16 ***
---
Residual standard error: 0.1031 on 35 degrees of freedom
Multiple R-squared: 0.9029, Adjusted R-squared: 0.9001
F-statistic: 325.4 on 1 and 35 DF, p-value: < 2.2e-16
- Fitted model: log10(failure time) = 3.956 - 0.01186 T
- The predicted log10(failure time) at 120°C is

      3.956 - 0.01186 × 120 ≈ 2.5332,

  and hence the failure time at 120°C is predicted to be

      10^{2.5332} ≈ 341 hours.
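The lm() coefficients above can be cross-checked with the closed-form simple-regression formulas, slope = Sxy/Sxx and intercept = \bar{y} - slope \cdot \bar{t}. A Python sketch on the resin data (a check of the arithmetic, not a replacement for R's full output):

```python
# Resin data flattened to (temperature, y) pairs, as in the handout's data table:
temps = [175]*8 + [194]*8 + [213]*8 + [231]*7 + [250]*6
ys = ([2.04, 1.91, 2.00, 1.92, 1.85, 1.96, 1.88, 1.90]
      + [1.66, 1.71, 1.42, 1.76, 1.66, 1.61, 1.55, 1.66]
      + [1.53, 1.54, 1.38, 1.31, 1.35, 1.27, 1.26, 1.38]
      + [1.15, 1.22, 1.17, 1.16, 1.21, 1.28, 1.17]
      + [1.26, 0.83, 1.08, 1.02, 1.09, 1.06])

n = len(temps)
tbar = sum(temps) / n
ybar = sum(ys) / n
sxy = sum((t - tbar) * (y - ybar) for t, y in zip(temps, ys))
sxx = sum((t - tbar) ** 2 for t in temps)
slope = sxy / sxx                    # ~ -0.011857, as in summary(lm1)
intercept = ybar - slope * tbar      # ~ 3.956
pred_120 = intercept + slope * 120   # ~ 2.533, so about 10**2.533 ~ 341 hours
```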
Linear Model (3)
The R command below gives the predicted log10(time) along with a 95% prediction interval.
> predict(lm1, newdata=data.frame(tempC=120),interval="prediction")
fit lwr upr
1 2.533201 2.289392 2.777011
[Scatter plot of log10(Failure time) against Temperature (Celsius) with the fitted regression line]

By superimposing the regression line on the scatter plot, we can see that y is slightly curved as a function of temperature. Using the linear model, the failure time at 120°C will be underestimated.
Quadratic Model
> lm2 = lm(y ~ tempC+I(tempC^2), data=resin)
> summary(lm2)
(... part of the output is omitted ...)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.4179987 1.1564331 6.415 2.51e-07 ***
tempC -0.0450981 0.0110542 -4.080 0.000258 ***
I((tempC)^2) 0.0000786 0.0000261 3.011 0.004879 **
---
Residual standard error: 0.09295 on 34 degrees of freedom
Multiple R-squared: 0.9233, Adjusted R-squared: 0.9188
F-statistic: 204.8 on 2 and 34 DF, p-value: < 2.2e-16
- Fitted model: log10(time) = 7.418 - 0.0451 T + 0.0000786 T^2
- The predicted log10(time) at 120°C is

      7.418 - 0.0451 × 120 + 0.0000786 × 120^2 ≈ 3.138.

  The predicted failure time at 120°C is 10^{3.138} ≈ 1374 hours.
Cubic & Quartic Model

> lm3 = lm(y ~ tempC+I(tempC^2)+I(tempC^3))
> summary(lm3)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.827e+00 1.299e+01 0.526 0.603
tempC -3.659e-02 1.865e-01 -0.196 0.846
I(tempC^2) 3.815e-05 8.860e-04 0.043 0.966
I(tempC^3) 6.357e-08 1.392e-06 0.046 0.964
Residual standard error: 0.09434 on 33 degrees of freedom
Multiple R-squared: 0.9233, Adjusted R-squared: 0.9164
F-statistic: 132.5 on 3 and 33 DF, p-value: < 2.2e-16
> lm4 = lm(y ~ tempC+I(tempC^2)+I(tempC^3)+I(tempC^4))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.699e-01 1.957e+02 0.005 0.996
tempC 7.573e-02 3.750e+00 0.020 0.984
I(tempC^2) -7.649e-04 2.679e-02 -0.029 0.977
I(tempC^3) 2.600e-06 8.459e-05 0.031 0.976
I(tempC^4) -2.988e-09 9.962e-08 -0.030 0.976
Residual standard error: 0.0958 on 32 degrees of freedom
Multiple R-squared: 0.9233, Adjusted R-squared: 0.9138
F-statistic: 96.36 on 4 and 32 DF, p-value: < 2.2e-16
Arrhenius Law

The Arrhenius rate law in thermodynamics says that the log of failure time is linear in the inverse of the absolute (Kelvin) temperature, which equals the Celsius temperature plus 273.15 degrees:

    Arrhenius Model: y_{ij} = \beta_0 + \frac{\beta_1}{T + 273.15} + \varepsilon_{ij}.
[Scatter plot of log10(Failure time) against 1/(Celsius temperature + 273.15), showing an approximately linear trend]
> lmarr = lm(y ~ I(1/(tempC+273.15)),data=resin)
> summary(lmarr)
(... some output is omitted ...)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.3120 0.3007 -14.34 3.2e-16 ***
I(1/(tempC + 273.15)) 2783.7764 144.6808 19.24 < 2e-16 ***
Residual standard error: 0.09724 on 35 degrees of freedom
Multiple R-squared: 0.9136, Adjusted R-squared: 0.9112
F-statistic: 370.2 on 1 and 35 DF, p-value: < 2.2e-16
[Scatter plot of log10(Failure time) against Temperature (Celsius) with the fitted quadratic, cubic, quartic, Arrhenius, and linear curves]
Data Can Distinguish Models Only at Design Points
For the resin glue data, in addition to the polynomial models and the Arrhenius model, we can consider many other models:

    y_{ij} = \beta_0 + \beta_1 \log(t_i) + \varepsilon_{ij},
    y_{ij} = \beta_0 + \beta_1 \exp(t_i) + \varepsilon_{ij},
    y_{ij} = \beta_0 + \beta_1 \sin(t_i) + \varepsilon_{ij},

or any strange function

    y_{ij} = f(t_i) + \varepsilon_{ij}.

However, as we only have observations at five values of t_i (175, 194, 213, 231, 250), the data cannot distinguish between the two models y_{ij} = f(t_i) + \varepsilon_{ij} and y_{ij} = g(t_i) + \varepsilon_{ij} as long as f(t) and g(t) coincide at t = 175, 194, 213, 231, 250, even if f and g behave differently elsewhere.
The Model that Fits the Data Best

If no restriction is placed on f, how well can the model y_{ij} = f(t_i) + \varepsilon_{ij} possibly fit the data? The least squares method will choose the f that minimizes

    \sum_i \sum_j (y_{ij} - f(t_i))^2.

Recall that, given a list of numbers x_1, x_2, \ldots, x_n, the c that minimizes \sum_{i=1}^{n} (x_i - c)^2 is the mean \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i. Thus the least squares method will choose the f with

    f(t_i) = \bar{y}_{i\bullet}.

The smallest SSE a model y_{ij} = f(t_i) + \varepsilon_{ij} can possibly achieve is therefore

    \sum_i \sum_j (y_{ij} - \bar{y}_{i\bullet})^2,

which is the SSE of the means model y_{ij} = \mu_i + \varepsilon_{ij}.

Conclusion: no other model can beat the means model in minimizing the SSE.
Goodness of Fit
As the means model fits the data best, we can assess the goodness of fit of a model y_{ij} = f(t_i) + \varepsilon_{ij} by comparing it with the means model:

    Full Model:    y_{ij} = \mu_i + \varepsilon_{ij}
    Reduced Model: y_{ij} = f(t_i) + \varepsilon_{ij}

This comparison is legitimate because any model y_{ij} = f(t_i) + \varepsilon_{ij} is nested in the means model y_{ij} = \mu_i + \varepsilon_{ij} (letting \mu_i = f(t_i)).

We can use the F-statistic below to compare a reduced model with a full model:

    F = \frac{(SSE_{reduced} - SSE_{full}) / (df_{reduced} - df_{full})}{SSE_{full} / df_{full}}
Goodness of Fit of the Linear Model
Since the linear model (reduced model) is nested in the means model (full model), using the F-statistic for model comparison we get
> lm1 = lm(y ~ tempC, data = resin) # linear model
> lmmeans = lm(y ~ as.factor(temp), data = resin) # means model
> anova(lm1,lmmeans)
Analysis of Variance Table
Model 1: y ~ tempC
Model 2: y ~ as.factor(temp)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 35 0.37206
2 32 0.29369 3 0.07837 2.8463 0.05303 .
The p-value 0.05303 is moderate evidence that the linear model does not fit the data very well.
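The anova(lm1, lmmeans) comparison applies the general F formula from the slide above. Plugging in the RSS and df values printed in the R output (a Python arithmetic check; the function name `partial_f` is invented here):

```python
def partial_f(sse_reduced, df_reduced, sse_full, df_full):
    """F statistic for comparing a reduced model nested in a full model."""
    return (((sse_reduced - sse_full) / (df_reduced - df_full))
            / (sse_full / df_full))

# From the anova() output: linear model (RSS 0.37206, df 35)
# vs. means model (RSS 0.29369, df 32):
f_lin = partial_f(0.37206, 35, 0.29369, 32)   # ~ 2.8463, as printed by R
```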
Goodness of Fit of the Quadratic Model
Since the quadratic model (reduced model) is also nested in the means model (full model), again using the F-statistic for model comparison we get
> lm2=lm(y ~ tempC+I((tempC)^2), data=resin) # quadratic model
> lmmeans = lm(y ~ as.factor(temp), data = resin) # means model
> anova(lm2,lmmeans)
Analysis of Variance Table
Model 1: y ~ tempC + I((tempC)^2)
Model 2: y ~ as.factor(temp)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 34 0.29372
2 32 0.29369 2 2.6829e-05 0.0015 0.9985
The large p-value 0.9985 shows that the quadratic model fits the data nearly as well as the best possible model. Thus the quadratic model seems appropriate for the data.
Shall We Consider a Cubic or a Quartic Model?
No. Because

    Quadratic ⊂ Cubic ⊂ Quartic ⊂ Means Model,

the cubic and quartic models cannot fit the data better than the means model does. As the quadratic model already fits nearly as well as the means model, the four models fit about equally well. In this case we simply choose the model of lowest complexity.
Be Cautious About Extrapolation
[Scatter plot of log10(Failure time) against Temperature (Celsius), 120-240, with the fitted quadratic, cubic, quartic, and linear curves]
Though the quadratic, cubic, and quartic models fit the 5 design points nearly equally well, their predicted values at 120°C are quite different:

    quadratic > cubic > quartic > linear
95% Prediction Intervals
[Four panels showing the data and 95% prediction bands for the linear, quadratic, cubic, and quartic models; x-axis: Temperature (Celsius), 120-240; y-axis: log10(Failure time), 1-5]
Prediction Intervals at 120°C

Observe that the length of the 95% prediction interval increases with the degree of the polynomial.
> predict(lm1, newdata=data.frame(tempC=120),interval="p")
fit lwr upr
1 2.533201 2.289392 2.777011
> predict(lm2, newdata=data.frame(tempC=120),interval="p")
fit lwr upr
1 3.138128 2.674383 3.601874
> lm3 = lm(y ~ tempC+I((tempC)^2)+I((tempC)^3))
> predict(lm3, newdata=data.frame(tempC=120),interval="p")
fit lwr upr
1 3.095342 1.132382 5.058303
> lm4 = lm(y ~ tempC+I((tempC)^2)+I((tempC)^3)+I((tempC)^4))
> predict(lm4, newdata=data.frame(tempC=120),interval="p")
fit lwr upr
1 2.917399 -9.330658 15.16546
Though the quadratic, cubic, and quartic models fit about equally well within the range of the data (175°C to 250°C), outside that range the reliability of their predictions differs drastically.
Since the Arrhenius model is nested in the means model, we can check its goodness of fit.
> anova(lmarr,lmmeans)
Analysis of Variance Table
Model 1: y ~ I(1/(tempC + 273.15))
Model 2: y ~ as.factor(temp)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 35 0.33093
2 32 0.29369 3 0.037239 1.3525 0.2749
The moderately large p-value 0.2749 tells us the Arrhenius model is acceptable relative to the best model.
> predict(lmarr, newdata=data.frame(tempC=120),interval="p")
fit lwr upr
1 2.76868 2.525909 3.011451
So the predicted failure time at 120°C is 10^{2.769} ≈ 587 hours, and the 95% prediction interval is (10^{2.526}, 10^{3.011}) ≈ (336, 1026) hours.
95% Prediction Interval Based on the Arrhenius Model
[Scatter plot of log10(Failure time) against Temperature (Celsius), 120-240, with the Arrhenius fit and its 95% prediction band]