Chapter 5 Introduction to Factorial Designs · hqxu/stat201a/ch5-8.pdf


Chapter 5 Introduction to Factorial Designs

Topics: Factorial Designs, main effects, interactions

5.3 The Two-Factor Factorial Design

The battery life experiment

• Two factors: Material type (qualitative) and Temperature (quantitative)

• The engineer can control the temperature during the experiment.

• However, when the battery is manufactured and shipped to the field, the engineer has no control over the temperature.

• Response: life (in hours) of the battery.

Material   Temperature (°F)
type       15                   70                   125
1          130, 155, 74, 180    34, 40, 80, 75       20, 70, 82, 58
2          150, 188, 159, 126   136, 122, 106, 115   25, 70, 58, 45
3          138, 110, 168, 160   174, 120, 150, 139   96, 104, 82, 60

Questions:

• What effects do material type and temperature have on life?

• Is there a choice of material that would give long life regardless of temperature (a robust product)?

This is a two-factor factorial design (or two-way layout).

• Factor A has a levels and factor B has b levels.

• There are ab combinations (treatments) in total, each replicated n times.

• The order in which the abn observations are taken is selected at random, so this design is a completely randomized design.


Comparison with randomized block design (RBD)

• RBD has one blocking factor and one experimental factor

• A 2-factor factorial studies two experimental factors

• In a 2-factor factorial, randomization is applied to all ab units.

• In a RBD, randomization is applied within each block.

• A block represents a restriction on randomization.

[Figure: scatter plots of battery life y versus material and versus temperature, and two interaction plots of mean response (versus temperature with one line per material, and versus material with one line per temperature).]

An interaction plot shows the mean responses of the treatments. It is a useful tool to see the relationship between two factors.

• If there is no interaction effect, the lines would be nearly parallel.
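The interaction plots are built from the nine treatment means. The following is a minimal sketch in Python with NumPy. The array layout `y[material, temperature, replicate]` and the names `cell_means` and `drop` were chosen for this example; they are not part of the notes. The sketch computes those means and the drop in mean life from 15°F to 125°F for each material. Unequal drops are what make the lines non-parallel.

```python
import numpy as np

# Battery life (hours) from the data table: y[i, j, k] is replicate k of
# material type i+1 at temperature temps[j].
temps = [15, 70, 125]
y = np.array([
    [[130, 155, 74, 180], [34, 40, 80, 75], [20, 70, 82, 58]],
    [[150, 188, 159, 126], [136, 122, 106, 115], [25, 70, 58, 45]],
    [[138, 110, 168, 160], [174, 120, 150, 139], [96, 104, 82, 60]],
], dtype=float)

# Treatment (cell) means: the points drawn in the interaction plot.
cell_means = y.mean(axis=2)
for i, row in enumerate(cell_means, start=1):
    # One plotted line per material, across the three temperatures.
    print(f"material {i}: " + ", ".join(f"{t}F -> {m:.2f}" for t, m in zip(temps, row)))

# Drop in mean life from 15F to 125F; equal drops would mean parallel lines.
drop = cell_means[:, 0] - cell_means[:, 2]
print("drop from 15F to 125F:", drop)
```

Here material 2 loses about 106 hours and material 3 about 58. This is consistent with the interaction plot and with material 3 being the more robust choice.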


Linear model (or effects model) for the two-factor factorial design is

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}, \quad i = 1, \dots, a;\ j = 1, \dots, b;\ k = 1, \dots, n,$$

where $y_{ijk}$ is the observation for the kth replicate of the ith level of factor A and the jth level of factor B, $\alpha_i$ is the ith main effect for A, $\beta_j$ is the jth main effect for B, $(\alpha\beta)_{ij}$ is the (i, j)th interaction effect between A and B, and $\epsilon_{ijk}$ are NID(0, $\sigma^2$) errors.

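The Zero-Sum Constraints

$$\sum_{i=1}^{a} \alpha_i = \sum_{j=1}^{b} \beta_j = 0, \qquad \sum_{i=1}^{a} (\alpha\beta)_{ij} = 0 \text{ for } j = 1, \dots, b, \qquad \sum_{j=1}^{b} (\alpha\beta)_{ij} = 0 \text{ for } i = 1, \dots, a$$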

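The ANOVA table

$$\begin{array}{llll}
\text{Source} & \text{DF} & \text{SS} & \text{MS} \\
A & a-1 & SS_A & MS_A = SS_A/(a-1) \\
B & b-1 & SS_B & MS_B = SS_B/(b-1) \\
A \times B & (a-1)(b-1) & SS_{AB} & MS_{AB} = SS_{AB}/((a-1)(b-1)) \\
\text{residual} & ab(n-1) & SS_E & MS_E = SS_E/(ab(n-1)) \\
\text{total} & abn-1 & SS_{\text{total}} &
\end{array}$$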

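The ANOVA decomposition:

$$\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} (y_{ijk} - \bar y_{\cdot\cdot\cdot})^2 = \sum_{i=1}^{a} nb(\bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot})^2 + \sum_{j=1}^{b} na(\bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b} n(\bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} (y_{ijk} - \bar y_{ij\cdot})^2$$

$$SS_{\text{total}} = SS_A + SS_B + SS_{AB} + SS_E.$$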


For the null hypothesis $H_0: \alpha_1 = \cdots = \alpha_a$ (i.e., no difference between the factor A main effects), reject $H_0$ at level $\alpha$ if

$$F = \frac{MS_A}{MS_E} = \frac{SS_A/(a-1)}{SS_E/(ab(n-1))} > F_{a-1,\,ab(n-1),\,\alpha}.$$

Test of the factor B main effects is similar. For the null hypothesis $H_0$: all $(\alpha\beta)_{ij}$ are equal (i.e., no interaction effect between A and B), reject $H_0$ at level $\alpha$ if

$$F = \frac{MS_{AB}}{MS_E} = \frac{SS_{AB}/((a-1)(b-1))}{SS_E/(ab(n-1))} > F_{(a-1)(b-1),\,ab(n-1),\,\alpha}.$$

For the battery experiment, the ANOVA table is

Df Sum Sq Mean Sq F value Pr(>F)

material 2 10684 5342 7.9114 0.001976 **

temperature 2 39119 19559 28.9677 1.909e-07 ***

material:temperature 4 9614 2403 3.5595 0.018611 *

Residuals 27 18231 675
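The entries of this table follow directly from the decomposition above. Below is a minimal sketch in Python with NumPy for a balanced two-factor design. The function name `two_way_anova` is illustrative. P-values are omitted so that the sketch does not depend on SciPy.

```python
import numpy as np

# Battery life (hours): y[i, j, k] = replicate k of material i+1 at the j-th
# temperature (15, 70, 125 F), copied from the data table.
y = np.array([
    [[130, 155, 74, 180], [34, 40, 80, 75], [20, 70, 82, 58]],
    [[150, 188, 159, 126], [136, 122, 106, 115], [25, 70, 58, 45]],
    [[138, 110, 168, 160], [174, 120, 150, 139], [96, 104, 82, 60]],
], dtype=float)

def two_way_anova(y):
    """Sums of squares, DF and F ratios for a balanced two-factor factorial.

    y has shape (a, b, n); factor A indexes axis 0 and factor B axis 1.
    """
    a, b, n = y.shape
    grand = y.mean()                   # ybar_...
    mean_i = y.mean(axis=(1, 2))       # ybar_i..
    mean_j = y.mean(axis=(0, 2))       # ybar_.j.
    mean_ij = y.mean(axis=2)           # ybar_ij.
    ss = {
        "A": n * b * np.sum((mean_i - grand) ** 2),
        "B": n * a * np.sum((mean_j - grand) ** 2),
        "AB": n * np.sum((mean_ij - mean_i[:, None] - mean_j[None, :] + grand) ** 2),
        "E": np.sum((y - mean_ij[:, :, None]) ** 2),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (n - 1)}
    mse = ss["E"] / df["E"]
    f = {src: (ss[src] / df[src]) / mse for src in ("A", "B", "AB")}
    return ss, df, f

ss, df, f = two_way_anova(y)
for src in ("A", "B", "AB", "E"):
    f_text = f"{f[src]:.4f}" if src in f else ""
    print(f"{src:>2}  df={df[src]:2d}  SS={ss[src]:9.2f}  F={f_text}")
```

The sums of squares round to 10684, 39119, 9614 and 18231, and the F ratios are about 7.91, 28.97 and 3.56. These agree with the table above.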

Both main effects are very significant. There is some moderate interaction between material type and temperature: the interaction effect is significant at the 5% level but not at the 1% level.

After rejecting the null hypotheses, it is usually of interest to compare the individual factor levels to discover the specific differences. Because the AB interaction is significant, one can compare levels of A (or B) at a fixed level of B (or A); one can also compare all ab treatments. See the text for details on multiple comparisons.

Conclusion:

• There is some moderate interaction between material type and temperature. The lower the temperature (within the experimental range), the longer the battery life.

• Batteries made from material type 3 tend to have a uniformly long life regardless of temperature.


Residual analysis for the battery experiment

[Figure: residuals vs fitted values; normal Q–Q plot of standardized residuals (observations 3, 4 and 9 labeled); residuals vs material; residuals vs temperature.]

The normal probability plot shows one potential low outlier. The residual plots show some mild inequality of variance, with the treatment combination of 15°F and material type 1 possibly having a larger variance than the others.

Overall, the problem is not severe enough to have a dramatic impact on the analysis and conclusions.
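These residual checks can also be read off numerically. In this model each fitted value is its cell mean, so each residual is $y_{ijk} - \bar y_{ij\cdot}$. The sketch below is in Python with NumPy, and the variable names are illustrative. It finds the cell with the largest within-treatment standard deviation and the most negative residual.

```python
import numpy as np

temps = [15, 70, 125]
# Battery life (hours): y[material, temperature, replicate], from the data table.
y = np.array([
    [[130, 155, 74, 180], [34, 40, 80, 75], [20, 70, 82, 58]],
    [[150, 188, 159, 126], [136, 122, 106, 115], [25, 70, 58, 45]],
    [[138, 110, 168, 160], [174, 120, 150, 139], [96, 104, 82, 60]],
], dtype=float)

fitted = y.mean(axis=2, keepdims=True)   # fitted value of each run = its cell mean
resid = y - fitted                        # e_ijk = y_ijk - ybar_ij.
cell_sd = y.std(axis=2, ddof=1)           # within-treatment standard deviation

i, j = np.unravel_index(np.argmax(cell_sd), cell_sd.shape)
print(f"largest SD {cell_sd[i, j]:.1f}: material {i + 1}, {temps[j]} F")
print("most negative residual:", resid.min())
print("SS_E =", np.sum(resid ** 2))
```

Both the largest spread (SD about 45) and the low outlier (a residual of −60.75, from the observation 74) fall in the cell for material 1 at 15°F.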


5.4 The General Factorial Design

Consider a three-factor factorial design

• A has a levels, B has b levels and C has c levels.

• There are abc treatments in total, each replicated n times.

• For each replicate, abc units are randomly assigned to the abc treatments.

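The linear model for the three-factor factorial design is:

$$y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \epsilon_{ijkl}, \quad i = 1, \dots, a;\ j = 1, \dots, b;\ k = 1, \dots, c;\ l = 1, \dots, n, \tag{19}$$

where $\alpha_i$, $\beta_j$ and $\gamma_k$ are the A, B and C main effects; $(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$ and $(\beta\gamma)_{jk}$ are the A × B, A × C and B × C two-factor interaction effects; $(\alpha\beta\gamma)_{ijk}$ is the A × B × C three-factor interaction effect; and $\epsilon_{ijkl}$ are NID(0, $\sigma^2$) errors. Let

$$y_{ijkl} = \hat\mu + \hat\alpha_i + \hat\beta_j + \hat\gamma_k + \widehat{(\alpha\beta)}_{ij} + \widehat{(\alpha\gamma)}_{ik} + \widehat{(\beta\gamma)}_{jk} + \widehat{(\alpha\beta\gamma)}_{ijk} + r_{ijkl},$$

where

$$\begin{aligned}
\hat\mu &= \bar y_{\cdot\cdot\cdot\cdot}, \\
\hat\alpha_i &= \bar y_{i\cdot\cdot\cdot} - \bar y_{\cdot\cdot\cdot\cdot}, \\
\hat\beta_j &= \bar y_{\cdot j\cdot\cdot} - \bar y_{\cdot\cdot\cdot\cdot}, \\
\hat\gamma_k &= \bar y_{\cdot\cdot k\cdot} - \bar y_{\cdot\cdot\cdot\cdot}, \\
\widehat{(\alpha\beta)}_{ij} &= \bar y_{ij\cdot\cdot} - \bar y_{i\cdot\cdot\cdot} - \bar y_{\cdot j\cdot\cdot} + \bar y_{\cdot\cdot\cdot\cdot}, \\
\widehat{(\alpha\gamma)}_{ik} &= \bar y_{i\cdot k\cdot} - \bar y_{i\cdot\cdot\cdot} - \bar y_{\cdot\cdot k\cdot} + \bar y_{\cdot\cdot\cdot\cdot}, \\
\widehat{(\beta\gamma)}_{jk} &= \bar y_{\cdot jk\cdot} - \bar y_{\cdot j\cdot\cdot} - \bar y_{\cdot\cdot k\cdot} + \bar y_{\cdot\cdot\cdot\cdot}, \\
\widehat{(\alpha\beta\gamma)}_{ijk} &= \bar y_{ijk\cdot} - \bar y_{ij\cdot\cdot} - \bar y_{i\cdot k\cdot} - \bar y_{\cdot jk\cdot} + \bar y_{i\cdot\cdot\cdot} + \bar y_{\cdot j\cdot\cdot} + \bar y_{\cdot\cdot k\cdot} - \bar y_{\cdot\cdot\cdot\cdot}, \\
r_{ijkl} &= y_{ijkl} - \bar y_{ijk\cdot}
\end{aligned}$$

are the estimates of the parameters under the zero-sum constraints.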


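The ANOVA table is

$$\begin{array}{lll}
\text{Source} & \text{Degrees of freedom} & \text{Sum of squares} \\
A & a-1 & \sum_{i=1}^{a} nbc\,\hat\alpha_i^2 \\
B & b-1 & \sum_{j=1}^{b} nac\,\hat\beta_j^2 \\
C & c-1 & \sum_{k=1}^{c} nab\,\hat\gamma_k^2 \\
A \times B & (a-1)(b-1) & \sum_{i=1}^{a}\sum_{j=1}^{b} nc\,\widehat{(\alpha\beta)}_{ij}^{\,2} \\
A \times C & (a-1)(c-1) & \sum_{i=1}^{a}\sum_{k=1}^{c} nb\,\widehat{(\alpha\gamma)}_{ik}^{\,2} \\
B \times C & (b-1)(c-1) & \sum_{j=1}^{b}\sum_{k=1}^{c} na\,\widehat{(\beta\gamma)}_{jk}^{\,2} \\
A \times B \times C & (a-1)(b-1)(c-1) & \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} n\,\widehat{(\alpha\beta\gamma)}_{ijk}^{\,2} \\
\text{residual} & abc(n-1) & \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{l=1}^{n} (y_{ijkl} - \bar y_{ijk\cdot})^2 \\
\text{total} & abcn-1 & \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{l=1}^{n} (y_{ijkl} - \bar y_{\cdot\cdot\cdot\cdot})^2
\end{array}$$

$$SS_{\text{total}} = SS_A + SS_B + SS_C + SS_{AB} + SS_{AC} + SS_{BC} + SS_{ABC} + SS_E$$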

The F statistic for testing $H_0: \alpha_1 = \cdots = \alpha_a$ (i.e., no difference among factor A main effects) is

$$F = \frac{MS_A}{MS_E} = \frac{SS_A/(a-1)}{SS_E/(abc(n-1))},$$

which has $a - 1$ and $abc(n-1)$ DF. The F statistics for other hypotheses are similar.
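The three-factor decomposition can be checked numerically. The sketch below, in Python with NumPy, computes every sum of squares in the table for a balanced design and verifies two things: the sums of squares add up to $SS_{\text{total}}$, and the DF add up to $abcn - 1$. The notes give no three-factor data set, so the responses here are simulated (hypothetical). The function name `three_way_anova_ss` is also illustrative.

```python
import numpy as np

def three_way_anova_ss(y):
    """Sums of squares and DF for a balanced a x b x c factorial.

    y has shape (a, b, c, n): y[i, j, k, l] is replicate l of treatment (i, j, k).
    """
    a, b, c, n = y.shape
    g = y.mean()                                    # ybar_....
    m_i = y.mean(axis=(1, 2, 3))[:, None, None]     # ybar_i...
    m_j = y.mean(axis=(0, 2, 3))[None, :, None]     # ybar_.j..
    m_k = y.mean(axis=(0, 1, 3))[None, None, :]     # ybar_..k.
    m_ij = y.mean(axis=(2, 3))[:, :, None]          # ybar_ij..
    m_ik = y.mean(axis=(1, 3))[:, None, :]          # ybar_i.k.
    m_jk = y.mean(axis=(0, 3))[None, :, :]          # ybar_.jk.
    m_ijk = y.mean(axis=3)                          # ybar_ijk.
    ss = {
        "A": n * b * c * np.sum((m_i - g) ** 2),
        "B": n * a * c * np.sum((m_j - g) ** 2),
        "C": n * a * b * np.sum((m_k - g) ** 2),
        "AB": n * c * np.sum((m_ij - m_i - m_j + g) ** 2),
        "AC": n * b * np.sum((m_ik - m_i - m_k + g) ** 2),
        "BC": n * a * np.sum((m_jk - m_j - m_k + g) ** 2),
        "ABC": n * np.sum((m_ijk - m_ij - m_ik - m_jk + m_i + m_j + m_k - g) ** 2),
        "E": np.sum((y - m_ijk[..., None]) ** 2),
    }
    df = {"A": a - 1, "B": b - 1, "C": c - 1,
          "AB": (a - 1) * (b - 1), "AC": (a - 1) * (c - 1), "BC": (b - 1) * (c - 1),
          "ABC": (a - 1) * (b - 1) * (c - 1), "E": a * b * c * (n - 1)}
    return ss, df

# Hypothetical data: simulated responses for a 2 x 3 x 4 design with n = 2.
rng = np.random.default_rng(0)
y = rng.normal(loc=50, scale=5, size=(2, 3, 4, 2))
ss, df = three_way_anova_ss(y)
ss_total = np.sum((y - y.mean()) ** 2)
print(np.isclose(sum(ss.values()), ss_total))   # the decomposition holds
print(sum(df.values()) == y.size - 1)           # DF add up to abcn - 1
mse = ss["E"] / df["E"]
print("F_A =", (ss["A"] / df["A"]) / mse)
```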


Chapter 6 The $2^k$ Factorial Design

Topics: main effects, interactions, half-normal quantile plot.

• A special and important case of the general k-factor factorial design.

• Each factor has two levels (low/high, −/+, 0/1, etc.)

6.2 The $2^2$ Design

The chemical process experiment

• To investigate the effect of reactant concentration (A) and the catalyst amount (B) on the conversion (yield y) in a chemical process.

• A: low (15%) and high (25%); B: low (1 lb) and high (2 lbs).

• The experiment is replicated 3 times.

• The order in which the runs are made is random.

         Replicate
A   B    1    2    3    Total
−   −    28   25   27   80
+   −    36   32   32   100
−   +    18   19   23   60
+   +    31   30   29   90

Notation

• The 4 treatment combinations are labeled as (1), a, b, ab.

• (1), a, b, ab also represent the total of all n replicates taken at the treatment combinations.


Main effects

$$A = \bar y(A = +) - \bar y(A = -) = [ab + a - b - (1)]/(2n).$$

$$B = \bar y(B = +) - \bar y(B = -) = [ab + b - a - (1)]/(2n).$$

Interaction effect

$$AB = \bar y(AB = +) - \bar y(AB = -) = [ab + (1) - a - b]/(2n).$$

The estimates are A = 8.33, B = −5, AB = 1.67.
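The effect formulas can be applied directly to the treatment totals in the Total column of the data table, with n = 3. A short Python sketch follows. The variable names `t_1`, `t_a`, `t_b` and `t_ab`, standing for (1), a, b and ab, are illustrative.

```python
# Treatment totals over the n = 3 replicates (Total column of the data table).
n = 3
t_1, t_a, t_b, t_ab = 80, 100, 60, 90      # (1), a, b, ab

A = (t_ab + t_a - t_b - t_1) / (2 * n)     # main effect of concentration
B = (t_ab + t_b - t_a - t_1) / (2 * n)     # main effect of catalyst amount
AB = (t_ab + t_1 - t_a - t_b) / (2 * n)    # interaction
print(f"A = {A:.2f}, B = {B:.2f}, AB = {AB:.2f}")   # A = 8.33, B = -5.00, AB = 1.67
```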

Linear model is

$$y = \beta_0 + \beta_1 x_A + \beta_2 x_B + \beta_3 x_A x_B + \epsilon,$$

where $x_A = \pm 1$, $x_B = \pm 1$ and $\epsilon$ is NID(0, $\sigma^2$) error.

The Model Matrix for a single replicate is

Treatment      Factorial effect
combination    I    A    B    AB
(1)            +1   −1   −1   +1
a              +1   +1   −1   −1
b              +1   −1   +1   −1
ab             +1   +1   +1   +1

• The model matrix X for all n replicates is orthogonal, i.e., $X^T X = 4n I_4$.

• Each column (except for I) represents a contrast.

• All the contrasts are orthogonal to each other.

The least squares estimates are

$$\hat\beta_0 = \bar y_{\cdot\cdot\cdot} = [ab + a + b + (1)]/(4n)$$

$$\hat\beta_1 = [ab + a - b - (1)]/(4n) = A/2$$

$$\hat\beta_2 = [ab + b - a - (1)]/(4n) = B/2$$

$$\hat\beta_3 = [ab + (1) - a - b]/(4n) = AB/2$$

• Factorial effects are twice the least squares estimates (under the zero-sum constraints)
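Both claims (orthogonality of X and effects equal to twice the coefficients) can be checked with an ordinary least squares fit. The sketch uses NumPy's `np.linalg.lstsq`. The runs are listed replicate by replicate in the order (1), a, b, ab; this layout was chosen for the example.

```python
import numpy as np

# One row per run, replicate by replicate, treatments in the order (1), a, b, ab.
xA = np.tile([-1, 1, -1, 1], 3)
xB = np.tile([-1, -1, 1, 1], 3)
y = np.array([28, 36, 18, 31,     # replicate 1
              25, 32, 19, 30,     # replicate 2
              27, 32, 23, 29],    # replicate 3
             dtype=float)

X = np.column_stack([np.ones(12), xA, xB, xA * xB])   # columns I, A, B, AB
print(X.T @ X)                                          # 4n * identity with n = 3

beta, *_ = np.linalg.lstsq(X, y, rcond=None)            # least squares fit
print("coefficients:", np.round(beta, 4))               # about 27.5, 4.1667, -2.5, 0.8333
print("effects A, B, AB:", np.round(2 * beta[1:], 4))   # about 8.3333, -5.0, 1.6667
```

The coefficients match the regression summary in the next part of the notes.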


The ANOVA table for the chemical process experiment is

Df Sum Sq Mean Sq F value Pr(>F)

A 1 208.333 208.333 53.1915 8.444e-05 ***

B 1 75.000 75.000 19.1489 0.002362 **

A:B 1 8.333 8.333 2.1277 0.182776

Residuals 8 31.333 3.917

The regression summary is

Estimate Std. Error t value Pr(>|t|)

(Intercept) 27.5000 0.5713 48.135 3.84e-11 ***

A 4.1667 0.5713 7.293 8.44e-05 ***

B -2.5000 0.5713 -4.376 0.00236 **

A:B 0.8333 0.5713 1.459 0.18278

Residual standard error: 1.979 on 8 degrees of freedom

Multiple R-Squared: 0.903,Adjusted R-squared: 0.8666

F-statistic: 24.82 on 3 and 8 DF, p-value: 0.0002093

• Both main effects are very significant and the interaction effect is not significant.

• Notice the connections between the two outputs?
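The two outputs are linked as follows. Because $X^TX = N I$ with N = 12 runs, each coefficient has standard error $\sqrt{MS_E/N}$. Each effect's one-DF sum of squares is $N\hat\beta^2$. As a result, the ANOVA F value equals the square of the regression t value: for example, $7.293^2 \approx 53.19$. A small arithmetic check in Python follows. It uses the residual SS of 31.333 (= 94/3) and the estimates reported above.

```python
import math

N, df_error = 12, 8
ss_error = 94 / 3                                   # residual SS 31.333 from the ANOVA table
estimates = {"A": 25 / 6, "B": -2.5, "A:B": 5 / 6}  # regression coefficients

mse = ss_error / df_error
se = math.sqrt(mse / N)             # standard error of every coefficient (orthogonal X)
results = {}
for name, b in estimates.items():
    ss = N * b ** 2                 # one-DF sum of squares for this effect
    t = b / se
    results[name] = (ss, ss / mse, t ** 2)
    print(f"{name:>3}: SS = {ss:7.3f}  F = {ss / mse:7.4f}  t^2 = {t ** 2:7.4f}")
print(f"standard error = {se:.4f}")
```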

We shall drop the interaction term. The final model is

Estimate Std. Error t value Pr(>|t|)

(Intercept) 27.500 0.606 45.377 6.13e-12 ***

A 4.167 0.606 6.875 7.27e-05 ***

B -2.500 0.606 -4.125 0.00258 **

Residual standard error: 2.099 on 9 degrees of freedom

Multiple R-Squared: 0.8772,Adjusted R-squared: 0.8499

F-statistic: 32.14 on 2 and 9 DF, p-value: 7.971e-05

• Note that the estimates are NOT changed.

The final model in terms of coded values is

$$\hat y = 27.5 + 4.167x_A - 2.5x_B$$

The final model in terms of actual factor values is

$$\hat y = 27.5 + 4.167\,\frac{\text{Conc} - 20}{5} - 2.5\,\frac{\text{Catalyst} - 1.5}{0.5}$$
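Expanding the products gives an equivalent equation in natural units. This arithmetic is added here for illustration, using the coefficients rounded as in the notes:

$$\hat y = 27.5 + \frac{4.167}{5}(\text{Conc} - 20) - \frac{2.5}{0.5}(\text{Catalyst} - 1.5) \approx 18.33 + 0.8333\,\text{Conc} - 5\,\text{Catalyst}.$$

As a check, take Conc = 15% and Catalyst = 1 lb ($x_A = x_B = -1$). Both forms then give $\hat y \approx 25.83$.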


Residual analysis for the final model: Are the assumptions reasonable?

Interaction and residual plots for the chemical process experiment

[Figure: AB and BA interaction plots (mean of y at −1 and 1), and residual plots for the final model: residuals vs fitted values and normal Q–Q plot of standardized residuals (observations 1, 5 and 7 labeled).]


6.5 A Single Replicate of the $2^k$ Design

• A single replicate of the $2^k$ design is an unreplicated $2^k$ factorial design.

• These designs are very widely used in practice for run size economy.

• Risks: modeling noise?

• Lack of replication causes potential problems in statistical testing

– Replication admits an estimate of “pure error”

– With no replication, fitting the full model results in zero degrees of freedom for error

• Potential solutions to this problem

– Pooling high-order interactions to estimate error

– Normal or half-normal quantile plotting of effects

– Other methods; see text, p. 234

The Filtration Rate Experiment

• A $2^4$ factorial was used to investigate the effects of four factors on the filtration rate of a chemical product.

• The factors are A = temperature, B = pressure, C = concentration of formaldehyde, D = stirring rate

• The 16 runs are made in random order.

• The goal is to maximize the filtration rate.


       Factor                  Filtration
Run    A    B    C    D        Rate (y)
1      −1   −1   −1   −1       45
2      1    −1   −1   −1       71
3      −1   1    −1   −1       48
4      1    1    −1   −1       65
5      −1   −1   1    −1       68
6      1    −1   1    −1       60
7      −1   1    1    −1       80
8      1    1    1    −1       65
9      −1   −1   −1   1        43
10     1    −1   −1   1        100
11     −1   1    −1   1        45
12     1    1    −1   1        104
13     −1   −1   1    1        75
14     1    −1   1    1        86
15     −1   1    1    1        70
16     1    1    1    1        96

The treatment contrast table (model matrix for the full model) is

Run  I   A   B   C   D   AB  AC  AD  BC  BD  CD  ABC ABD ACD BCD ABCD
1    1   −1  −1  −1  −1  1   1   1   1   1   1   −1  −1  −1  −1  1
2    1   1   −1  −1  −1  −1  −1  −1  1   1   1   1   1   1   −1  −1
3    1   −1  1   −1  −1  −1  1   1   −1  −1  1   1   1   −1  1   −1
4    1   1   1   −1  −1  1   −1  −1  −1  −1  1   −1  −1  1   1   1
5    1   −1  −1  1   −1  1   −1  1   −1  1   −1  1   −1  1   1   −1
6    1   1   −1  1   −1  −1  1   −1  −1  1   −1  −1  1   −1  1   1
7    1   −1  1   1   −1  −1  −1  1   1   −1  −1  −1  1   1   −1  1
8    1   1   1   1   −1  1   1   −1  1   −1  −1  1   −1  −1  −1  −1
9    1   −1  −1  −1  1   1   1   −1  1   −1  −1  −1  1   1   1   −1
10   1   1   −1  −1  1   −1  −1  1   1   −1  −1  1   −1  −1  1   1
11   1   −1  1   −1  1   −1  1   −1  −1  1   −1  1   −1  1   −1  1
12   1   1   1   −1  1   1   −1  1   −1  1   −1  −1  1   −1  −1  −1
13   1   −1  −1  1   1   1   −1  −1  −1  −1  1   1   1   −1  −1  1
14   1   1   −1  1   1   −1  1   1   −1  −1  1   −1  −1  1   −1  −1
15   1   −1  1   1   1   −1  −1  −1  1   1   1   −1  −1  −1  1   −1
16   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1

Factorial effect $\theta = \bar y(\theta = +1) - \bar y(\theta = -1)$
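All 15 factorial effects can be computed by forming each contrast column of the table and applying the definition above. A minimal sketch in Python with NumPy follows. The function name `factorial_effects` is illustrative, and runs are generated in standard order with A changing fastest, matching the data table.

```python
import itertools
import numpy as np

# Filtration rates for runs 1-16 in standard order (from the data table).
y = np.array([45, 71, 48, 65, 68, 60, 80, 65, 43, 100, 45, 104, 75, 86, 70, 96], dtype=float)

# Standard order: A changes fastest, D slowest.
runs = np.array([(a, b, c, d) for d, c, b, a in itertools.product([-1, 1], repeat=4)])

def factorial_effects(X, y, names="ABCD"):
    """theta = ybar(theta = +1) - ybar(theta = -1) for every main effect and interaction."""
    effects = {}
    for order in range(1, len(names) + 1):
        for combo in itertools.combinations(range(len(names)), order):
            col = X[:, list(combo)].prod(axis=1)     # contrast column, e.g. AC = A * C
            label = "".join(names[i] for i in combo)
            effects[label] = y[col == 1].mean() - y[col == -1].mean()
    return effects

effects = factorial_effects(runs, y)
for label, value in effects.items():
    print(f"{label:>4} {value:8.3f}")
```

The printed values reproduce the list of estimated factorial effects below, for example A = 21.625 and AC = −18.125.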


Estimated factorial effects are

A B C D AB AC AD BC

21.625 3.125 9.875 14.625 0.125 -18.125 16.625 2.375

BD CD ABC ABD ACD BCD ABCD

-0.375 -1.125 1.875 4.125 -1.625 -2.625 1.375

Q: Which factorial effects are significant?

Normal Quantile Plot of Factorial Effects:

• Plot ordered factorial effects against standard normal quantiles. Let $\theta_{(1)} \le \cdots \le \theta_{(I)}$ be the ordered effects.

– plot $\theta_{(i)}$ versus $\Phi^{-1}([i - 0.5]/I)$ for $i = 1, \dots, I$

• Large effects far from the line are important.

Half-Normal Quantile Plot of Factorial Effects:

• Plot ordered absolute factorial effects against standard half-normal quantiles

– plot $|\theta|_{(i)}$ versus $\Phi^{-1}(0.5 + 0.5[i - 0.5]/I)$ for $i = 1, \dots, I$.

• Large effects above the line are important.

• Half-normal quantile plot is preferred.
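The coordinates of the half-normal plot need only the standard normal inverse CDF. Python's standard library provides this as `statistics.NormalDist().inv_cdf`. Below is a sketch that pairs each ordered absolute effect with its quantile $\Phi^{-1}(0.5 + 0.5[i - 0.5]/I)$. It uses the 15 effects listed above; the function name `half_normal_points` is illustrative, and drawing the plot is left out.

```python
from statistics import NormalDist

# Estimated factorial effects of the filtration rate experiment (listed above).
effects = {"A": 21.625, "B": 3.125, "C": 9.875, "D": 14.625, "AB": 0.125,
           "AC": -18.125, "AD": 16.625, "BC": 2.375, "BD": -0.375, "CD": -1.125,
           "ABC": 1.875, "ABD": 4.125, "ACD": -1.625, "BCD": -2.625, "ABCD": 1.375}

def half_normal_points(effects):
    """(half-normal quantile, |effect|, label) triples, smallest |effect| first."""
    I = len(effects)
    ordered = sorted(effects.items(), key=lambda kv: abs(kv[1]))
    z = NormalDist()   # standard normal distribution
    return [(z.inv_cdf(0.5 + 0.5 * (i - 0.5) / I), abs(v), name)
            for i, (name, v) in enumerate(ordered, start=1)]

points = half_normal_points(effects)
for q, v, name in points[-5:]:   # the five largest effects
    print(f"{name:>4}  q = {q:.3f}  |effect| = {v:.3f}")
```

The five points at the top right are C, D, AD, AC and A. These are the effects labeled in the plots that follow.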

[Figure: normal quantile plot and half-normal quantile plot of the estimated factorial effects; the labeled effects are A, A:C, A:D, D and C.]


• A, AC, AD, D, C are important.

• Factor B does not appear in any significant effects.

Design Projection

• Projection onto factors A, C and D is a replicated $2^3$ factorial design (hidden replication).

• By projecting, we have an estimate of error.

The regression summary for the replicated 23 design:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 70.0625 1.1842 59.164 7.40e-12 ***

A 10.8125 1.1842 9.131 1.67e-05 ***

C 4.9375 1.1842 4.169 0.003124 **

D 7.3125 1.1842 6.175 0.000267 ***

A:C -9.0625 1.1842 -7.653 6.00e-05 ***

A:D 8.3125 1.1842 7.019 0.000110 ***

C:D -0.5625 1.1842 -0.475 0.647483

A:C:D -0.8125 1.1842 -0.686 0.512032

Residual standard error: 4.737 on 8 degrees of freedom

Multiple R-Squared: 0.9687,Adjusted R-squared: 0.9413

F-statistic: 35.35 on 7 and 8 DF, p-value: 2.119e-05
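The hidden replication can be seen by fitting the full $2^3$ model in A, C and D to the 16 runs and ignoring B. Each (A, C, D) setting then occurs twice, which leaves 8 residual DF. The sketch below uses NumPy's least squares routine.

```python
import itertools
import numpy as np

y = np.array([45, 71, 48, 65, 68, 60, 80, 65, 43, 100, 45, 104, 75, 86, 70, 96], dtype=float)
runs = np.array([(a, b, c, d) for d, c, b, a in itertools.product([-1, 1], repeat=4)])
xA, xC, xD = runs[:, 0], runs[:, 2], runs[:, 3]        # factor B is ignored

# Full 2^3 model in A, C, D; each of the 8 (A, C, D) settings appears twice.
X = np.column_stack([np.ones(16), xA, xC, xD, xA * xC, xA * xD, xC * xD, xA * xC * xD])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
df_resid = len(y) - X.shape[1]                         # 16 - 8 = 8
rse = np.sqrt(resid @ resid / df_resid)

names = ["(Intercept)", "A", "C", "D", "A:C", "A:D", "C:D", "A:C:D"]
for name, b in zip(names, beta):
    print(f"{name:>12} {b:9.4f}")
print(f"residual standard error: {rse:.3f} on {df_resid} DF")
```

The estimates and the residual standard error (4.737 on 8 DF) agree with the regression summary above. Each coefficient is half of the corresponding factorial effect.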

The final model is

$$\hat y = 70.0625 + 10.8125x_A + 4.9375x_C + 7.3125x_D - 9.0625x_Ax_C + 8.3125x_Ax_D$$

We shall perform residual analysis for the final model. Are the model and assumptions reasonable?

How to choose optimal factor settings to maximize filtration rate?

• Use the final model: A = , C = , D =

• Use interaction plots: A = , C = , D =


Interaction and residual plots for the filtration rate experiment

[Figure: AC and AD interaction plots (mean of y), residuals vs fitted values, and normal Q–Q plot of standardized residuals (observations 5, 7 and 14 labeled).]


Main effects and interaction plots for the filtration rate experiment

[Figure: main effect plots for A, C and D; interaction plots for AC, AD and CD; three views of the ACD interaction (A by CD, D by AC, C by AD), all showing the mean of y.]


Chapter 7 Blocking and Confounding in the $2^k$ Factorial

7.2 Blocking a Replicated $2^k$ Factorial Design

The chemical process experiment (c.f. §6.2) in 3 blocks

• Suppose only 4 experimental trials can be made from a single batch of raw material.

• Three batches of raw material are required to run all 3 replicates.

• Each batch of raw material is a block.

• Runs within the block are randomized.

• This is a randomized complete block design.

         Block
A   B    1     2     3     Total
−   −    28    25    27    80
+   −    36    32    32    100
−   +    18    19    23    60
+   +    31    30    29    90
Total    113   106   111   330

Linear model is

$$y = \beta_0 + \alpha_j + \beta_1 x_A + \beta_2 x_B + \beta_3 x_A x_B + \epsilon,$$

where $\alpha_j$ is the jth block effect, $x_A = \pm 1$, and $x_B = \pm 1$. The ANOVA table is

Df Sum Sq Mean Sq F value Pr(>F)

block 2 6.500 3.250 0.7852 0.4978348

A 1 208.333 208.333 50.3356 0.0003937 ***

B 1 75.000 75.000 18.1208 0.0053397 **

A:B 1 8.333 8.333 2.0134 0.2057101

Residuals 6 24.833 4.139

• Note that factorial estimates are NOT changed.

• The block effect is not significant.
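The block sum of squares comes from the block totals in the table: $SS_{\text{block}} = \sum_j T_j^2/4 - (\sum y)^2/12$. It is taken out of the residual SS of the unblocked analysis, which is why the residual drops from 31.333 on 8 DF to 24.833 on 6 DF. A plain-Python sketch follows; the variable names are illustrative.

```python
# Chemical process data by block (batch of raw material), from the table above.
# Rows: treatments (1), a, b, ab; columns: blocks 1, 2, 3.
data = [
    [28, 25, 27],
    [36, 32, 32],
    [18, 19, 23],
    [31, 30, 29],
]
n_trt, n_blk = len(data), len(data[0])
N = n_trt * n_blk
grand = sum(sum(row) for row in data)
cf = grand ** 2 / N                                   # correction factor (sum y)^2 / N

ss_total = sum(v ** 2 for row in data for v in row) - cf
ss_trt = sum(sum(row) ** 2 for row in data) / n_blk - cf    # A + B + AB together
block_totals = [sum(row[j] for row in data) for j in range(n_blk)]
ss_block = sum(t ** 2 for t in block_totals) / n_trt - cf
ss_error = ss_total - ss_trt - ss_block

print(block_totals)                                   # [113, 106, 111]
print(round(ss_block, 3), round(ss_trt, 3), round(ss_error, 3))
```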


7.3-7.4 Confounding in the $2^k$ Factorial Design

The Filtration Rate Experiment (c.f. §6.5) in 2 Blocks

• Suppose only 8 treatment combinations can be run from a single batch of raw material.

• Two batches of raw materials are needed for 16 runs.

• The batches are blocks.

• This is an incomplete block design.

       Factor                         Filtration
Run    A    B    C    D    Block      Rate (y)
1      −1   −1   −1   −1   1          25
2      1    −1   −1   −1   2          71
3      −1   1    −1   −1   2          48
4      1    1    −1   −1   1          45
5      −1   −1   1    −1   2          68
6      1    −1   1    −1   1          40
7      −1   1    1    −1   1          60
8      1    1    1    −1   2          65
9      −1   −1   −1   1    2          43
10     1    −1   −1   1    1          80
11     −1   1    −1   1    1          25
12     1    1    −1   1    2          104
13     −1   −1   1    1    1          55
14     1    −1   1    1    2          86
15     −1   1    1    1    2          70
16     1    1    1    1    1          76

• The original responses are

45, 71, 48, 65, 68, 60, 80, 65, 43, 100, 45, 104, 75, 86, 70, 96

• All responses in block 1 are 20 units lower than the responses in the original data.

Block effect = $\bar y(\text{block 1}) - \bar y(\text{block 2})$

• With only 16 runs (15 degrees of freedom besides the mean), we cannot estimate all 15 factorial effects and one block effect.

• One factorial effect is fully confounded with the block effect.


The treatment contrast table (model matrix) with block effect is

Run  I   A   B   C   D   AB  AC  AD  BC  BD  CD  ABC ABD ACD BCD ABCD Block
1    1   −1  −1  −1  −1  1   1   1   1   1   1   −1  −1  −1  −1  1    1
2    1   1   −1  −1  −1  −1  −1  −1  1   1   1   1   1   1   −1  −1   −1
3    1   −1  1   −1  −1  −1  1   1   −1  −1  1   1   1   −1  1   −1   −1
4    1   1   1   −1  −1  1   −1  −1  −1  −1  1   −1  −1  1   1   1    1
5    1   −1  −1  1   −1  1   −1  1   −1  1   −1  1   −1  1   1   −1   −1
6    1   1   −1  1   −1  −1  1   −1  −1  1   −1  −1  1   −1  1   1    1
7    1   −1  1   1   −1  −1  −1  1   1   −1  −1  −1  1   1   −1  1    1
8    1   1   1   1   −1  1   1   −1  1   −1  −1  1   −1  −1  −1  −1   −1
9    1   −1  −1  −1  1   1   1   −1  1   −1  −1  −1  1   1   1   −1   −1
10   1   1   −1  −1  1   −1  −1  1   1   −1  −1  1   −1  −1  1   1    1
11   1   −1  1   −1  1   −1  1   −1  −1  1   −1  1   −1  1   −1  1    1
12   1   1   1   −1  1   1   −1  1   −1  1   −1  −1  1   −1  −1  −1   −1
13   1   −1  −1  1   1   1   −1  −1  −1  −1  1   1   1   −1  −1  1    1
14   1   1   −1  1   1   −1  1   1   −1  −1  1   −1  −1  1   −1  −1   −1
15   1   −1  1   1   1   −1  −1  −1  1   1   1   −1  −1  −1  1   −1   −1
16   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1    1

• Block effect and ABCD are fully confounded.

Block effect = ABCD

• The estimate of ABCD is lost.

• Is this a good blocking scheme? Why?

Estimated effects are

A B C D AB AC AD BC

21.625 3.125 9.875 14.625 0.125 -18.125 16.625 2.375

BD CD ABC ABD ACD BCD Block+ABCD

-0.375 -1.125 1.875 4.125 -1.625 -2.625 -18.625

• Estimated block effect (+ABCD) is 1.375 − 20 = −18.625.
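The confounding can be reproduced from the data. In the sketch below, block 1 is taken to be the runs with ABCD = +1, as in the Block column above, and the 20-unit shift is applied to them. The main effect of A is unchanged, while the ABCD contrast absorbs the whole shift. The helper `effect` is illustrative.

```python
import itertools
import numpy as np

# Original filtration rates, runs 1-16 in standard order (A fastest).
y_orig = np.array([45, 71, 48, 65, 68, 60, 80, 65, 43, 100, 45, 104, 75, 86, 70, 96], dtype=float)
runs = np.array([(a, b, c, d) for d, c, b, a in itertools.product([-1, 1], repeat=4)])
abcd = runs.prod(axis=1)

# Block 1 is the set of runs with ABCD = +1; its responses are 20 units lower.
in_block1 = abcd == 1
y_blocked = y_orig - 20 * in_block1

def effect(col, y):
    """ybar(col = +1) - ybar(col = -1)."""
    return y[col == 1].mean() - y[col == -1].mean()

print("A   :", effect(runs[:, 0], y_orig), "->", effect(runs[:, 0], y_blocked))
print("ABCD:", effect(abcd, y_orig), "->", effect(abcd, y_blocked))
```

Each level of A contains four runs from each block, so the block shift cancels in the A contrast. It does not cancel in the ABCD contrast.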

Design question: To arrange a $2^k$ design in $2^p$ blocks of size $2^{k-p}$, which effects should be confounded with block effects? (Stats c225)


Chapter 8 Two-Level Fractional Factorial Designs

Topics: Alias, resolution, minimum aberration, supersaturated design

8.1 Introduction

• Motivation for fractional factorials is obvious: as the number of factors becomes large, the size of the designs grows very quickly

• Emphasis is on factor screening: to efficiently identify the factors with large effects

• There may be many variables (often because we don't know much about the system)

• Almost always run as unreplicated factorials, but often with center points

Why do fractional factorial designs work?

• The sparsity of effects principle

– There may be lots of factors, but few are important

– System is dominated by main e↵ects, low-order interactions

• The projection property

– Every fractional factorial contains full factorials in fewer factors

• Sequential experimentation

– Can add runs to a fractional factorial to resolve difficulties (or ambiguities) in interpretation


8.2 The One-Half Fraction of the $2^k$ Design

Notation: because the design has $2^k/2$ runs, it is referred to as a $2^{k-1}$ design

• Consider a really simple case, the $2^{3-1}$ design defined by I = ABC

• For the principal fraction, notice that the contrast for estimating the main effect A is exactly the same as the contrast used for estimating the BC interaction.

• This phenomenon is called aliasing, and it occurs in all fractional designs

• Aliases can be found directly from the columns in the table of + and − signs
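The aliasing is easy to see by building the principal fraction explicitly. Start from a full $2^2$ in A and B and set C = AB, which gives I = ABC. The A column is then identical to the BC column, and likewise for the other pairs. A sketch in Python with NumPy follows.

```python
import itertools
import numpy as np

# Basic design: full 2^2 in A and B (A changes fastest); generator C = AB, so I = ABC.
basic = np.array([(a, b) for b, a in itertools.product([-1, 1], repeat=2)])
A, B = basic[:, 0], basic[:, 1]
C = A * B
print(np.column_stack([A, B, C]))

# Every entry is +/-1, so a squared column is I and, e.g., BC = B * AB = A.
print("A == BC:", np.array_equal(A, B * C))
print("B == AC:", np.array_equal(B, A * C))
print("C == AB:", np.array_equal(C, A * B))
```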


Aliasing in the $2^{3-1}$ design

• A main effect is aliased with a two-factor interaction (or me = 2fi)

A = BC, B = AC, C = AB

• Aliases can be found from the defining relation I = ABC:

$$A = AI = A(ABC) = A^2BC = BC$$

$$B = BI = B(ABC) = AB^2C = AC$$

$$C = CI = C(ABC) = ABC^2 = AB$$

• Textbook notation for aliased effects:

$$[A] \to A + BC, \quad [B] \to B + AC, \quad [C] \to C + AB$$
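Multiplying effect words modulo squares is a symmetric difference of letter sets: a letter that appears in both words squares to I and drops out. The following Python sketch finds aliases this way. The helper names `multiply` and `aliases` are illustrative, and the sketch ignores signs, so it does not distinguish I = ABC from I = −ABC.

```python
def multiply(word1, word2):
    """Product of two effect words: a letter in both words squares to I and drops out."""
    return "".join(sorted(set(word1) ^ set(word2))) or "I"

def aliases(effect, defining_words):
    """Effects aliased with `effect` (signs are ignored in this sketch)."""
    return [multiply(effect, w) for w in defining_words]

for e in "ABC":
    print(f"[{e}] -> {e} + {' + '.join(aliases(e, ['ABC']))}")
# [A] -> A + BC
# [B] -> B + AC
# [C] -> C + AB
```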

The alternate fraction of the $2^3$

• I = −ABC is the defining relation

• Implies slightly different aliases: A = −BC, B = −AC, and C = −AB

• Both designs belong to the same family, defined by

I = ±ABC

• Suppose that after running the principal fraction, the alternate fraction was also run

• The two groups of runs can be combined to form a full factorial

– an example of sequential experimentation

Design resolution: the length of the shortest word

• Resolution III Designs: example $2^{3-1}_{III}$

I = ABC

– me = 2fi

• Resolution IV Designs: example $2^{4-1}_{IV}$

I = ABCD

– me = 3fi and 2fi = 2fi


• Resolution V Designs: example $2^{5-1}_{V}$

I = ABCDE

– me = 4fi and 2fi = 3fi

• Resolution R Designs: no p-factor effect is aliased with another effect containing fewer than R − p factors

Construction of a One-half Fraction

• write a full $2^{k-1}$ design as the basic design

• add the kth column according to the design generator $I = \pm ABC\cdots K$

Projection of Fractional Factorials

• Every fractional factorial contains full factorials in fewer factors

• A one-half fraction $2^{k-1}$ will project into a full factorial in any k − 1 of the original factors (if resolution R = k)
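The construction and projection properties above can be checked together. The sketch below builds a $2^{k-1}$ fraction by appending a kth column equal to ± the product of the k − 1 basic columns. It then confirms that, for the resolution IV $2^{4-1}$ with I = ABCD, every set of three columns contains all $2^3$ level combinations. The function name `half_fraction` is illustrative.

```python
import itertools
import numpy as np

def half_fraction(k, sign=1):
    """One-half fraction 2^(k-1): full factorial in k-1 basic factors plus a kth
    column equal to sign * (product of the basic columns), i.e. I = sign * ABC...K."""
    basic = np.array(list(itertools.product([-1, 1], repeat=k - 1)))
    last = sign * basic.prod(axis=1)
    return np.column_stack([basic, last])

design = half_fraction(4)               # 2^(4-1) with I = ABCD (resolution IV)
print(design.shape)                     # (8, 4)

# Projection: any k - 1 = 3 of the 4 columns should contain all 2^3 = 8 level combinations.
full = {}
for cols in itertools.combinations(range(4), 3):
    combos = {tuple(row) for row in design[:, list(cols)]}
    full[cols] = len(combos) == 8
print(full)
```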


Example 8-1 A half fraction of the filtration rate experiment

Recall: The estimated factorial effects from the full $2^4$ design are

A B C D AB AC AD BC

21.625 3.125 9.875 14.625 0.125 -18.125 16.625 2.375

BD CD ABC ABD ACD BCD ABCD

-0.375 -1.125 1.875 4.125 -1.625 -2.625 1.375

• [A] = A + BCD = 21.625 − 2.625 = 19
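The value 19 can be reproduced by running only half of the filtration data. The sketch assumes that the half fraction is the principal fraction I = +ABCD, i.e. the 8 runs with ABCD = +1; this is consistent with [A] = A + BCD above. Estimating A from those runs alone gives the sum of the two full-factorial estimates.

```python
import itertools
import numpy as np

y = np.array([45, 71, 48, 65, 68, 60, 80, 65, 43, 100, 45, 104, 75, 86, 70, 96], dtype=float)
runs = np.array([(a, b, c, d) for d, c, b, a in itertools.product([-1, 1], repeat=4)])

# Assumed: the principal half fraction I = ABCD, i.e. the 8 runs with ABCD = +1.
keep = runs.prod(axis=1) == 1
xA, y_half = runs[keep, 0], y[keep]

est_A = y_half[xA == 1].mean() - y_half[xA == -1].mean()   # estimates A + BCD
print("runs used:", np.flatnonzero(keep) + 1)
print("[A] =", est_A)
```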

• Interpretation of results often relies on making some assumptions

– Main effects are more likely to be important than two-factor interactions

• Ockham's razor: the simplest interpretation is usually the correct one

• Confirmation experiments can be important


• Adding the alternate fraction can resolve the ambiguity due to aliasing

Possible strategies for follow-up experimentation following a fractional factorial design


8.3 The One-Quarter Fraction of the $2^k$ Design

The injection molding process experiment:

• Parts manufactured in the process are showing excessive shrinkage.

• investigation of six factors: (A) mold temperature, (B) screw speed, (C) holding time, (D) cycle time, (E) gate size, (F) hold pressure

• objective: learn how each factor affects shrinkage and something about how the factors interact.

• A $2^{6-2}$ design with complete defining relation

I = ABCE = BCDF = ADEF

     Basic design                               Observed
Run  A   B   C   D   E = ABC   F = BCD   shrinkage (×10)
1    −   −   −   −   −         −         6
2    +   −   −   −   +         −         10
3    −   +   −   −   +         +         32
4    +   +   −   −   −         +         60
5    −   −   +   −   +         +         4
6    +   −   +   −   −         +         15
7    −   +   +   −   −         −         26
8    +   +   +   −   +         −         60
9    −   −   −   +   −         +         8
10   +   −   −   +   +         +         12
11   −   +   −   +   +         −         34
12   +   +   −   +   −         −         60
13   −   −   +   +   +         −         16
14   +   −   +   +   −         −         5
15   −   +   +   +   −         +         37
16   +   +   +   +   +         +         52

• generators: E = ABC and F = BCD

• Defining words: ABCE, BCDF, ADEF

• Resolution = shortest word length
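The complete defining relation can also be found numerically. A word belongs to it exactly when the elementwise product of its columns is +1 in every run. The sketch below, in Python with NumPy, searches all subsets of the six columns. This brute-force search was chosen for the example and is only practical for small designs.

```python
import itertools
import numpy as np

basic = np.array([(a, b, c, d) for d, c, b, a in itertools.product([-1, 1], repeat=4)])
A, B, C, D = basic.T
cols = {"A": A, "B": B, "C": C, "D": D, "E": A * B * C, "F": B * C * D}

# A word is in the defining relation when the product of its columns is +1 in every run.
words = [
    "".join(combo)
    for r in range(1, 7)
    for combo in itertools.combinations("ABCDEF", r)
    if np.all(np.prod([cols[f] for f in combo], axis=0) == 1)
]
print(words)                                       # ['ABCE', 'ADEF', 'BCDF']
print("resolution =", min(len(w) for w in words))  # 4
```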

47

• This is a resolution IV design, so me=3fi, 2fi=2fi

• Each effect is aliased with three other effects.

• Can estimate 15 factorial effects, one from each alias set

A, B, C, D, E, F, AB = CE, AC = BE, AD = EF,

AE = BC = DF, AF = DE, BD = CF, BF = CD, [ABD], [ABF]

Estimates of aliased effects

A B C D E F A:B A:C

13.875 35.625 -0.875 1.375 0.375 0.375 11.875 -1.625

A:D A:E A:F B:D B:F A:B:D A:B:F

-5.375 -1.875 0.625 -0.125 -0.125 0.125 -4.875

• $[A] \to A + BCE + DEF + ABCDF = 13.875$

• $[AB] \to AB + CE + ACDF + BDEF = 11.875$
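The aliased estimates can be reproduced from the shrinkage data. Build the basic design, add E = ABC and F = BCD, and take differences of means along each contrast column. The AB and CE columns turn out to be identical, so a single number estimates the whole alias chain. A sketch in Python with NumPy follows; the helper `estimate` is illustrative.

```python
import itertools
import numpy as np

# Basic design A-D in standard order; E = ABC and F = BCD as in the design table.
basic = np.array([(a, b, c, d) for d, c, b, a in itertools.product([-1, 1], repeat=4)])
A, B, C, D = basic.T
E, F = A * B * C, B * C * D
y = np.array([6, 10, 32, 60, 4, 15, 26, 60, 8, 12, 34, 60, 16, 5, 37, 52], dtype=float)  # shrinkage x 10

def estimate(col):
    """ybar(col = +1) - ybar(col = -1): estimates the whole alias chain of col."""
    return y[col == 1].mean() - y[col == -1].mean()

print(np.array_equal(A * B, C * E))      # True: AB and CE share one contrast column
print(estimate(A), estimate(B), estimate(A * B))
```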


[Figure: half-normal plot of the absolute effects (A:B, A and B labeled) and AB interaction plot of mean shrinkage.]

• [A], [B] and [AB] are important

• Since neither C nor E is important, it is reasonable to assume CE is negligible.

• Conclusion: Two factors (A, B) and the AB interaction are important

A final model with A, B and AB, which has $R^2 = 0.96$:

$$\hat y = 27.312 + 6.937x_A + 17.812x_B + 5.937x_Ax_B$$

Regression summary

Estimate Std. Error t value Pr(>|t|)

(Intercept) 27.312 1.138 23.996 1.65e-11 ***

A 6.937 1.138 6.095 5.38e-05 ***

B 17.812 1.138 15.649 2.39e-09 ***

A:B 5.937 1.138 5.216 0.000216 ***

Residual standard error: 4.553 on 12 degrees of freedom

Multiple R-Squared: 0.9626,Adjusted R-squared: 0.9533

• To minimize the response, choose A = and B =

• From the interaction plot, the key is to choose B =


Residual analysis

[Figure: residuals vs fitted values (observations 7, 13 and 16 labeled) and residuals vs holding time (C).]

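• there are some dispersion effects: the residuals are much more spread out at the high level of holding time (C) than at the low level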
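The dispersion effect can be quantified by splitting the residuals of the final model by the level of C. The sketch uses the unrounded coefficients (each is half of an estimated effect, e.g. 13.875/2 = 6.9375). It compares the residual standard deviation at C = −1 and C = +1.

```python
import itertools
import numpy as np

basic = np.array([(a, b, c, d) for d, c, b, a in itertools.product([-1, 1], repeat=4)])
A, B, C, D = basic.T
y = np.array([6, 10, 32, 60, 4, 15, 26, 60, 8, 12, 34, 60, 16, 5, 37, 52], dtype=float)

# Final model with unrounded coefficients (each one is half an estimated effect).
fitted = 27.3125 + 6.9375 * A + 17.8125 * B + 5.9375 * A * B
resid = y - fitted

spread = {}
for level in (-1, 1):
    spread[level] = resid[C == level].std(ddof=1)   # residual SD at this level of C
    print(f"C = {level:+d}: residual SD = {spread[level]:.2f}")
```

The residual SD is about 1.63 at low holding time and about 5.70 at high holding time. The labeled residuals of runs 7, 13 and 16 all come from runs with C = +1.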

Uses of the alternate fractions

I = ±ABCE, I = ±BCDF

Projection

• Any 4-factor subset of the original six variables that is not a word in the complete defining relation will result in a full factorial design

– Consider ABCD (full factorial)

– Consider ABCE (replicated half fraction)

– Consider ABCF (full factorial)
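These three projections can be checked by counting how often each level combination occurs in the projected columns. A full factorial gives 16 distinct combinations, each once. A replicated half fraction gives 8 distinct combinations, each twice. The helper `projection` is illustrative.

```python
import itertools
from collections import Counter
import numpy as np

basic = np.array([(a, b, c, d) for d, c, b, a in itertools.product([-1, 1], repeat=4)])
A, B, C, D = basic.T
cols = {"A": A, "B": B, "C": C, "D": D, "E": A * B * C, "F": B * C * D}

def projection(names):
    """(number of distinct level combinations, set of their repeat counts)."""
    counts = Counter(zip(*(cols[f] for f in names)))
    return len(counts), set(counts.values())

for names in ("ABCD", "ABCE", "ABCF"):
    print(names, projection(names))
```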

8.4 The General $2^{k-p}$ Fractional Factorial Design

• $2^{k-1}$ = one-half fraction, $2^{k-2}$ = one-quarter fraction, $2^{k-3}$ = one-eighth fraction, …, $2^{k-p}$ = $1/2^p$ fraction

• Add p columns to the basic design (a full $2^{k-p}$ design); select p independent generators

• Important to select generators so as to maximize resolution


Example: Two $2^{6-2}$ designs (Which design is better?)

• Design 1: E = ABC, F = BCD

• Design 2: E = ABCD, F = BCD

Resolution may not be sufficient.

Minimum aberration designs

• minimize the number of words in the defining relation that are of minimum length

• see Table 8-14 for k ≤ 15 factors and up to n = 128 runs

• see Xu (2009, Technometrics) for k up to 40–160 factors and n = 128–4096 runs.
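A common way to make this precise is the wordlength pattern $(A_3, A_4, \ldots)$, where $A_i$ counts the words of length $i$ in the defining relation; a minimum aberration design has the lexicographically smallest pattern. The sketch below compares two resolution IV $2^{7-2}$ designs whose generators are chosen purely for illustration (they are not taken from Table 8-14):

```python
from itertools import combinations


def multiply(*words):
    """Multiply effect words: letters occurring an even number of times cancel."""
    letters = set()
    for w in words:
        letters ^= set(w)
    return "".join(sorted(letters))


def wordlength_pattern(generators, k):
    """(A_3, ..., A_k): number of words of each length in the defining relation."""
    gen_words = [multiply(word, new) for new, word in generators.items()]
    lengths = [len(multiply(*s))
               for r in range(1, len(gen_words) + 1)
               for s in combinations(gen_words, r)]
    return tuple(lengths.count(length) for length in range(3, k + 1))


# Two illustrative 2^(7-2) designs, both of resolution IV
d1 = {"F": "ABC", "G": "ADE"}
d2 = {"F": "ABCD", "G": "ABDE"}
w1, w2 = wordlength_pattern(d1, 7), wordlength_pattern(d2, 7)
print("d1:", w1)
print("d2:", w2)
# Minimum aberration: compare patterns lexicographically; the smaller one is better
print("less aberration:", "d2" if w2 < w1 else "d1")
```

Both designs have resolution IV, but d2 has one word of length 4 (CEFG) against two for d1 (ABCF, ADEG), so d2 has less aberration.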

Other topics

• Projection: a design of resolution R contains full factorials in any R − 1 of the factors

• Blocking

– How to optimally arrange $2^{k-p}$ designs into $2^q$ blocks?

– see Xu (2006, Annals of Statistics), Xu and Lau (2006, JSPI), Xu and Mee (2010, JSPI) for minimum aberration blocked $2^{k-p}$ designs

Research problems:

• How to construct minimum aberration designs with ≥ 256 runs?

• How to construct minimum aberration blocked designs with ≥ 256 runs?


8.5 Resolution III Designs

• Main effects are aliased with two-factor interactions (me = 2fi)

• Often used for screening (5–7 variables in 8 runs, 9–15 variables in 16 runs, for example)

• A saturated design of N runs has k = N − 1 variables

A saturated $2^{7-4}_{\text{III}}$ design

• alias structure (ignoring 3fi or higher):

$[A] \rightarrow A + BD + CE + FG$, $[B] \rightarrow B + AD + CF + EG$,

$[C] \rightarrow C + AE + BF + DG$, $[D] \rightarrow D + AB + CG + EF$,

$[E] \rightarrow E + AC + BG + DF$, $[F] \rightarrow F + BC + AG + DE$,

$[G] \rightarrow G + CD + BE + AF$

• the fold-over technique can be used to de-alias all main effects from their two-factor interaction alias chains
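The alias chains and the effect of a full fold-over can be verified numerically. This sketch assumes the generators D = AB, E = AC, F = BC, G = ABC, which reproduce the chains listed; the helper names are illustrative only.

```python
from itertools import combinations, product

FACTORS = "ABCDEFG"
GENERATORS = {"D": "AB", "E": "AC", "F": "BC", "G": "ABC"}  # assumed generators


def base_design():
    """Saturated 2^(7-4) design: full 2^3 in A, B, C plus the generated columns."""
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        run = {"A": a, "B": b, "C": c}
        for new, word in GENERATORS.items():
            value = 1
            for letter in word:
                value *= run[letter]
            run[new] = value
        runs.append(run)
    return runs


def column(runs, word):
    """Contrast column of an effect word, e.g. 'BD' -> elementwise product of B and D."""
    col = []
    for run in runs:
        value = 1
        for letter in word:
            value *= run[letter]
        col.append(value)
    return col


def two_fi_aliases(runs, factor):
    """Two-factor interactions whose column is identical to the main-effect column."""
    target = column(runs, factor)
    return ["".join(p) for p in combinations(FACTORS, 2) if column(runs, "".join(p)) == target]


runs = base_design()
for f in FACTORS:
    print(f"[{f}] -> {f} + " + " + ".join(two_fi_aliases(runs, f)))

# Full fold-over: append the 8 runs with every sign reversed
folded = runs + [{f: -v for f, v in run.items()} for run in runs]
clear = all(
    sum(x * y for x, y in zip(column(folded, f), column(folded, "".join(p)))) == 0
    for f in FACTORS for p in combinations(FACTORS, 2)
)
print("main effects orthogonal to every 2fi after fold-over:", clear)
```

Reversing all signs flips every main-effect column but leaves every 2fi column unchanged, so in the combined 16 runs each main effect becomes orthogonal to every two-factor interaction.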

Plackett-Burman designs

• These are a different class of resolution III designs

• The number of runs, N, need only be a multiple of four

N = 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, . . .



• The designs with N = 12, 20, 24, 28, . . . (N not a power of 2) are called nonregular designs

• See the text for comments on the construction of Plackett-Burman designs
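A minimal sketch of the usual cyclic construction for N = 12: rows 1 to 11 are successive cyclic shifts of a generating row, and row 12 is all minus. The generating row is read off run 1 of the cast fatigue design table, and the shifts reproduce its remaining rows; the function name is our own.

```python
from itertools import combinations

# Generating row (11 columns), taken from run 1 of the 12-run design table
FIRST_ROW = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman_12():
    """Rows 1-11: cyclic left shifts of FIRST_ROW; row 12: all -1 (illustrative helper)."""
    n = len(FIRST_ROW)
    rows = [FIRST_ROW[i:] + FIRST_ROW[:i] for i in range(n)]
    rows.append([-1] * n)
    return rows


design = plackett_burman_12()
cols = list(zip(*design))
orthogonal = all(sum(x * y for x, y in zip(cols[i], cols[j])) == 0
                 for i, j in combinations(range(len(cols)), 2))
balanced = all(sum(col) == 0 for col in cols)
print(len(design), "runs x", len(cols), "columns; orthogonal:", orthogonal, "balanced:", balanced)
```

Each column contains six + and six − signs, and every pair of columns is orthogonal, even though 12 is not a power of 2.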

The Weld Repaired Cast Fatigue Experiment (Wu and Hamada, 2009)

• used a 12-run design to study the effects of seven factors on the fatigue life of weld repaired castings.

• The response is the logged lifetime of the casting.

• The goal of the experiment was to identify the factors that affect the casting lifetime.

Factor                    Level −       Level +
A. initial structure      as received   β treat
B. bead size              small         large
C. pressure treat         none          HIP
D. heat treat             anneal        solution treat/age
E. cooling rate           slow          rapid
F. polish                 chemical      mechanical
G. final treat            none          peen

        Factor
Run   A  B  C  D  E  F  G  8  9  10 11   Logged lifetime
1     +  +  −  +  +  +  −  −  −  +  −    6.058
2     +  −  +  +  +  −  −  −  +  −  +    4.733
3     −  +  +  +  −  −  −  +  −  +  +    4.625
4     +  +  +  −  −  −  +  −  +  +  −    5.899
5     +  +  −  −  −  +  −  +  +  −  +    7.000
6     +  −  −  −  +  −  +  +  −  +  +    5.752
7     −  −  −  +  −  +  +  −  +  +  +    5.682
8     −  −  +  −  +  +  −  +  +  +  −    6.607
9     −  +  −  +  +  −  +  +  +  −  −    5.818
10    +  −  +  +  −  +  +  +  −  −  −    5.917
11    −  +  +  −  +  +  +  −  −  −  +    5.863
12    −  −  −  −  −  −  −  −  −  −  −    4.809


> g=lm(y ~ A+B+C+D+E+F+G, dat); summary(g)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  5.73025    0.17114  33.482 4.75e-06 ***
A            0.16292    0.17114   0.952   0.3950
B            0.14692    0.17114   0.858   0.4390
C           -0.12292    0.17114  -0.718   0.5123
D           -0.25808    0.17114  -1.508   0.2060
E            0.07492    0.17114   0.438   0.6842
F            0.45758    0.17114   2.674   0.0556 .
G            0.09158    0.17114   0.535   0.6209

• Only F is significant at the 10% level.

• A model with F only has $R^2 = 0.45$ (it does not fit well).

• A model with F and D has $R^2 = 0.59$.
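Because the 12-run columns are orthogonal and balanced, each least squares coefficient is simply $x^T y / 12$, and the $R^2$ of a submodel is the sum of $12\hat\beta_j^2$ over its terms divided by the total sum of squares. A pure-Python sketch that recomputes these numbers from the design table (variable names are our own):

```python
# Cast fatigue data: signs of columns A-G from the design table, and logged lifetime
ROWS = [
    ("++-+++-", 6.058), ("+-+++--", 4.733), ("-+++---", 4.625), ("+++---+", 5.899),
    ("++---+-", 7.000), ("+---+-+", 5.752), ("---+-++", 5.682), ("--+-++-", 6.607),
    ("-+-++-+", 5.818), ("+-++-++", 5.917), ("-++-+++", 5.863), ("-------", 4.809),
]
FACTORS = "ABCDEFG"
N = len(ROWS)

X = {f: [1 if signs[i] == "+" else -1 for signs, _ in ROWS] for i, f in enumerate(FACTORS)}
y = [value for _, value in ROWS]
mean_y = sum(y) / N

# Orthogonal, balanced +/-1 columns: each least squares coefficient is x'y / N
coef = {f: sum(x * v for x, v in zip(X[f], y)) / N for f in FACTORS}
ss_total = sum((v - mean_y) ** 2 for v in y)


def r_squared(terms):
    """R^2 of a model in the given (mutually orthogonal) main effects."""
    return sum(N * coef[t] ** 2 for t in terms) / ss_total


print(f"intercept {mean_y:.5f}")
for f in FACTORS:
    print(f"{f} {coef[f]:+.5f}")
print(f"R^2 with F only: {r_squared('F'):.2f}; with F and D: {r_squared('FD'):.2f}")
```

This reproduces the intercept 5.73025, the coefficients in the R output, and $R^2 \approx 0.45$ and $0.59$.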

Aliasing

• The alias structure is complex in this PB design

• Every main effect is partially aliased with every 2fi not involving itself

$[A] \rightarrow A + \tfrac{1}{3}(-BC - BD - BE + BF + \cdots - FG)$

• Partial aliasing can greatly complicate analysis and interpretation

• More elaborate analysis shows that F and FG are significant

• A model with F and FG has $R^2 = 0.89$:

$\hat{y} = 5.7303 + 0.4576F - 0.4588FG$
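The partial aliasing coefficients are the inner products of a main-effect column with each 2fi column, divided by N = 12. The sketch below recomputes them for A and refits the F + FG model; since F, FG and the intercept are mutually orthogonal, the coefficients are again $x^T y / 12$. Data and column layout come from the design table; names are illustrative.

```python
from itertools import combinations

# Cast fatigue data: signs of columns A-G from the design table, and logged lifetime
ROWS = [
    ("++-+++-", 6.058), ("+-+++--", 4.733), ("-+++---", 4.625), ("+++---+", 5.899),
    ("++---+-", 7.000), ("+---+-+", 5.752), ("---+-++", 5.682), ("--+-++-", 6.607),
    ("-+-++-+", 5.818), ("+-++-++", 5.917), ("-++-+++", 5.863), ("-------", 4.809),
]
FACTORS = "ABCDEFG"
N = len(ROWS)
col = {f: [1 if s[i] == "+" else -1 for s, _ in ROWS] for i, f in enumerate(FACTORS)}
y = [v for _, v in ROWS]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def interaction(f, g):
    """Column of the two-factor interaction fg (elementwise product)."""
    return [a * b for a, b in zip(col[f], col[g])]


# Partial aliasing: coefficient of each 2fi (not involving A) in E[estimate of A]
alias_A = {f + g: dot(col["A"], interaction(f, g)) / N
           for f, g in combinations(FACTORS, 2) if "A" not in (f, g)}
for word, c in alias_A.items():
    print(f"{word}: {c:+.3f}")

# F, FG and the intercept are mutually orthogonal, so each coefficient is x'y / N
fg = interaction("F", "G")
b0 = sum(y) / N
bF = dot(col["F"], y) / N
bFG = dot(fg, y) / N
ss_total = sum((v - b0) ** 2 for v in y)
r2 = N * (bF ** 2 + bFG ** 2) / ss_total
print(f"intercept {b0:.4f}, F {bF:.4f}, FG {bFG:.4f}, R^2 {r2:.2f}")
```

All 15 coefficients for A have magnitude 1/3 (for example −1/3 for BC and +1/3 for BF), and the refit gives about 0.4576 for F, −0.4588 for FG and $R^2 \approx 0.89$.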

Projections onto 3 or 4 factors

• projection onto any 3 factors contains a full $2^3$ design (projectivity 3)

• regular resolution III fractions have projectivity 2.
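Projectivity can be checked by brute force: a design has projectivity t if every set of t columns contains all $2^t$ level combinations. A sketch comparing the 12-run PB design (built by cyclic shifts) with a regular $2^{7-4}_{\text{III}}$ design whose generators D = AB, E = AC, F = BC, G = ABC are chosen for illustration; the helper name is our own.

```python
from itertools import combinations, product


def covers_full_factorial(design, t):
    """True if every choice of t columns contains all 2^t level combinations."""
    cols = list(zip(*design))
    return all(len(set(zip(*(cols[c] for c in subset)))) == 2 ** t
               for subset in combinations(range(len(cols)), t))


# 12-run Plackett-Burman design: cyclic shifts of the generating row plus a row of minuses
FIRST_ROW = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]
pb12 = [FIRST_ROW[i:] + FIRST_ROW[:i] for i in range(11)] + [[-1] * 11]

# Regular saturated 2^(7-4): A, B, C, D = AB, E = AC, F = BC, G = ABC
reg8 = [[a, b, c, a * b, a * c, b * c, a * b * c] for a, b, c in product((-1, 1), repeat=3)]

for name, design in (("PB12", pb12), ("2^(7-4)", reg8)):
    print(name, {t: covers_full_factorial(design, t) for t in (2, 3)})
```

The regular design fails for three factors because, for example, the projection onto A, B, D (with D = AB) contains only 4 of the 8 combinations.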



8.7 Supersaturated Designs

• A design is saturated if the number of variables is k = N − 1, where N is the number of runs

• A design is supersaturated if the number of variables is k > N − 1

• Supersaturated designs are commonly used in screening experiments

– the goal is to identify sparse and dominant active effects at low cost

• potential applications in many areas such as industry, the medical sciences and engineering


Why does it work?

• Effect sparsity: the number of relatively important effects in a factorial experiment is small.

Construction

• Basic idea: Choose a design with small correlations among the columns

Analysis

• Very challenging due to the existence of complex aliasing among effects

• Cannot fit a linear regression model directly (with more parameters than runs, least squares has no unique solution)

• forward stepwise variable selection

– easy but has large type I and type II errors

• Bayesian variable selection

– flexible but not easy to use

• Dantzig selector

– a good tool and easy to use

Dantzig selector proposed by Candes and Tao (2007, Annals of Statistics)

• chooses the best subset of variables or active factors by solving a very simple convex program, which can be recast as a convenient linear program.

• successfully used in biomedical imaging, analog-to-digital data conversion and sensor networks, where the goals are to recover some sparse signals from some massive data.

Consider a general linear regression model

$$y = X\beta + \varepsilon,$$

where $X$ is an $n \times p$ model matrix. The model is supersaturated when $p > n$.



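The Dantzig selector is the solution to the $\ell_1$-regularization problem

$$\min_{\beta} \|\beta\|_{\ell_1} \quad \text{subject to} \quad \|X^T(y - X\beta)\|_{\ell_\infty} \le \delta, \qquad (20)$$

where for a vector $a$, $\|a\|_{\ell_1} = \sum_i |a_i|$ and $\|a\|_{\ell_\infty} = \max_i |a_i|$, and $\delta$ is a tuning parameter.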

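The Dantzig selector can be recast as a linear program

$$\min \sum_i u_i \quad \text{subject to} \quad -u \le \beta \le u \quad \text{and} \quad -\delta\mathbf{1} \le X^T(y - X\beta) \le \delta\mathbf{1}, \qquad (21)$$

where the optimization variables are $u, \beta \in \mathbb{R}^p$ and $\mathbf{1}$ is a $p$-dimensional vector of ones. This is equivalent to the standard linear program in matrix form

$$\min\ c^T x \quad \text{subject to} \quad Ax \ge b \quad \text{and} \quad x \ge 0, \qquad (22)$$

where

$$c = \begin{pmatrix} \mathbf{1} \\ \mathbf{0} \end{pmatrix}, \quad
A = \begin{pmatrix} X^TX & -X^TX \\ -X^TX & X^TX \\ 2I_p & -I_p \end{pmatrix}, \quad
b = \begin{pmatrix} -X^Ty - \delta\mathbf{1} \\ X^Ty - \delta\mathbf{1} \\ \mathbf{0} \end{pmatrix}, \quad
x = \begin{pmatrix} u \\ u + \beta \end{pmatrix}.$$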
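The linear program (21) can be handed to an off-the-shelf LP solver. The sketch below uses SciPy's `scipy.optimize.linprog` with the HiGHS backend and solves (21) directly in the variables $(u, \beta)$, rather than building the standard form (22) by hand; `dantzig_selector` is our own helper name. When $X$ has orthonormal columns the solution has a closed form (soft thresholding of the least squares estimates), which the usage lines compare against.

```python
import numpy as np
from scipy.optimize import linprog


def dantzig_selector(X, y, delta):
    """Solve LP (21): min sum(u) s.t. -u <= beta <= u, -delta <= X'(y - X beta) <= delta.

    Decision vector z = (u, beta); u >= 0 and beta is free. Hypothetical helper name.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    G, Xty = X.T @ X, X.T @ y
    I, Z = np.eye(p), np.zeros((p, p))
    A_ub = np.block([
        [-I, I],   #  beta - u <= 0
        [-I, -I],  # -beta - u <= 0
        [Z, -G],   #  X'y - G beta <= delta
        [Z, G],    #  G beta - X'y <= delta
    ])
    b_ub = np.concatenate([np.zeros(2 * p), delta - Xty, delta + Xty])
    c = np.concatenate([np.ones(p), np.zeros(p)])
    bounds = [(0, None)] * p + [(None, None)] * p
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[p:]


# Usage: with orthonormal columns the answer should equal soft thresholding of X'y
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((20, 5)))
y = Q @ np.array([2.0, -1.0, 0.0, 0.0, 0.3]) + 0.1 * rng.standard_normal(20)
beta_hat = dantzig_selector(Q, y, delta=0.5)
b = Q.T @ y  # least squares estimates, since Q'Q = I
soft = np.sign(b) * np.maximum(np.abs(b) - 0.5, 0.0)
print(np.round(beta_hat, 4))
print(np.round(soft, 4))
```

Writing $\beta$ as a free variable lets the solver handle the sign split that (22) does by hand through $x = (u, u + \beta)$; the two formulations have the same optimum.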

When $X$ has orthonormal columns ($X^TX = I$), the Dantzig selector has a simple closed form

$$\hat\beta_i = \max(|b_i| - \delta, 0)\,\operatorname{sign}(b_i),$$

where $b_i$ is the least squares estimate of $\beta_i$. Note that the Dantzig selector is shifted toward the origin, a soft-thresholding phenomenon. For an arbitrary $X$, the method continues to exhibit soft-thresholding behavior and, as a result, may underestimate the true values of the nonzero parameters.
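A small illustration of the closed form and of the shrinkage it causes, using hypothetical least squares estimates: every estimate larger than $\delta$ in magnitude is pulled toward zero by exactly $\delta$, and the rest are set to zero.

```python
def soft_threshold(b, delta):
    """Closed-form Dantzig selector for orthonormal X: max(|b_i| - delta, 0) * sign(b_i)."""
    out = []
    for bi in b:
        sign = (bi > 0) - (bi < 0)  # -1, 0 or +1
        out.append(max(abs(bi) - delta, 0.0) * sign)
    return out


b = [2.0, -0.7, 0.3, -1.5]  # hypothetical least squares estimates
beta = soft_threshold(b, delta=0.5)
print([round(v, 3) for v in beta])  # -> [1.5, -0.2, 0.0, -1.0]
```

The surviving estimates are all smaller in magnitude than the least squares values, which is the bias the two-stage procedure is meant to correct.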

A two-stage procedure for bias correction

1. Estimate $I = \{i : \beta_i \ne 0\}$ with $\hat I = \{i : |\hat\beta_i| > \gamma\}$ for some small $\gamma \ge 0$, with $\hat\beta$ the solution to the linear program (21).

2. Construct the least squares estimate $\hat\beta_{\hat I} = (X_{\hat I}^T X_{\hat I})^{-1} X_{\hat I}^T y$ and set the other coordinates to 0.

In other words, we rely on the Dantzig selector to estimate the model $I$ and then construct a new estimator by regressing $y$ onto the model $\hat I$.
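A minimal sketch of the two-stage procedure on simulated data, using SciPy's `linprog` for stage 1. The data, $\delta = 5$ and $\gamma = 0.5$ are hypothetical choices for illustration, not values from the notes, and the helper names are our own; `np.linalg.lstsq` gives the same estimate as $(X_I^T X_I)^{-1}X_I^T y$ when $X_I$ has full column rank.

```python
import numpy as np
from scipy.optimize import linprog


def dantzig_selector(X, y, delta):
    """LP (21): min sum(u) s.t. -u <= beta <= u, |X'(y - X beta)| <= delta (hypothetical helper)."""
    p = X.shape[1]
    G, Xty = X.T @ X, X.T @ y
    I, Z = np.eye(p), np.zeros((p, p))
    A_ub = np.block([[-I, I], [-I, -I], [Z, -G], [Z, G]])
    b_ub = np.concatenate([np.zeros(2 * p), delta - Xty, delta + Xty])
    c = np.concatenate([np.ones(p), np.zeros(p)])
    bounds = [(0, None)] * p + [(None, None)] * p
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[p:]


def two_stage(X, y, delta, gamma):
    """Stage 1: I = {i : |beta_i| > gamma}; stage 2: least squares on the columns in I."""
    beta_ds = dantzig_selector(X, y, delta)
    support = np.flatnonzero(np.abs(beta_ds) > gamma)
    beta = np.zeros(X.shape[1])
    if support.size:
        beta[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return beta_ds, support, beta


# Simulated (hypothetical) data: 2 active effects out of 8
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 8))
beta_true = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0])
y = X @ beta_true + 0.1 * rng.standard_normal(100)
beta_ds, support, beta_refit = two_stage(X, y, delta=5.0, gamma=0.5)
print("Dantzig estimate :", np.round(beta_ds, 3))
print("selected support :", support)
print("refitted estimate:", np.round(beta_refit, 3))
```

Stage 1 is only used to choose the model; the final coefficients come from ordinary least squares on the selected columns, so they are not shrunk toward zero.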


Example: The cast fatigue experiment. Consider two models:

• (a) a main effects model

– $X$ is a $12 \times 7$ matrix and orthogonal

• (b) a main effects plus 2-factor interactions model

– $X$ is a $12 \times (7 + 21)$ matrix and supersaturated

• Center the response $y$ and obtain the Dantzig selector $\hat\beta$ by solving the linear program (21) with $\delta$ varying from 0 to 6.

• Make a profile plot by plotting $\hat\beta$ against $\delta$

– (left) main effects, (right) main effects + 2-factor interactions

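[Figure: Dantzig selector profile plots of beta against delta (0 to 6). Left: main effects model, with F and D labeled. Right: main effects + 2-factor interactions model, with F, FG and AE labeled.]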

• For the main effects model, F is the most significant and D is moderately significant.

• When we entertain the two-factor interactions, F and FG are very significant and AE is moderately significant.
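For the main effects model the profile can be computed without an LP solver. Assuming $X$ holds the ±1 columns A to G of the design table (so $X^TX = 12I$ rather than $I$), the constraint in (21) splits into $|b_j - \beta_j| \le \delta/12$, so the Dantzig selector is soft thresholding of the least squares estimates at $\delta/12$. A pure-Python sketch under that assumption (names are our own):

```python
# Cast fatigue data: signs of columns A-G from the design table, and logged lifetime
ROWS = [
    ("++-+++-", 6.058), ("+-+++--", 4.733), ("-+++---", 4.625), ("+++---+", 5.899),
    ("++---+-", 7.000), ("+---+-+", 5.752), ("---+-++", 5.682), ("--+-++-", 6.607),
    ("-+-++-+", 5.818), ("+-++-++", 5.917), ("-++-+++", 5.863), ("-------", 4.809),
]
FACTORS = "ABCDEFG"
N = len(ROWS)
y = [v for _, v in ROWS]
ybar = sum(y) / N
yc = [v - ybar for v in y]  # centred response

# Least squares estimates b_j = x_j' y_c / N (orthogonal columns with x_j' x_j = N)
b = {f: sum((1 if s[i] == "+" else -1) * v for (s, _), v in zip(ROWS, yc)) / N
     for i, f in enumerate(FACTORS)}


def dantzig_main_effects(delta):
    """With X'X = N*I, LP (21) splits into |b_j - beta_j| <= delta / N per coordinate,
    whose l1-minimal solution is soft thresholding at delta / N."""
    t = delta / N
    return {f: (1 if bj > 0 else -1) * max(abs(bj) - t, 0.0) for f, bj in b.items()}


for delta in range(0, 7):
    beta = dantzig_main_effects(delta)
    active = {f: round(v, 3) for f, v in beta.items() if v != 0}
    print(delta, active)
```

Under this assumption D leaves the model at $\delta = 12|b_D| \approx 3.10$ and F at $\delta \approx 5.49$, so F is the last effect to vanish.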

Reference: Phoa, F. K. H., Pan, Y.-H. and Xu, H. (2009). Analysis of Supersaturated Designs via the Dantzig Selector. Journal of Statistical Planning and Inference, 139, 2362-2372.
