experiment design overview number of factors 1 2 k levels 2:min/max n - cat num regression...

33
Experiment Design Overview Number of factors 1 2 k levels levels levels 2:min/max n - cat n - cat n - cat num regression models 2 k repl interactions & errors 2 k-p weak interactio ns n x r pts 2 k r est. errors full factorial full factorial additive? Analysis to get: Main effects Interactions/errors % of variation explained Significance (CI or ANOVA)

Upload: bernice-ball

Post on 17-Jan-2016

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

Experiment Design Overview

Number of factors

12

k

levels levels levels

2:min/maxn - catn - cat

n - catnum

regression models 2k

repl

interactions& errors

2k-p

weakinteractions

n x r pts

2kr

est.errors

full factorial full factorial

additive?

Analysis to get:Main effectsInteractions/errors% of variation explainedSignificance (CI or ANOVA)

Page 2: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Two-Factor Full Factorial Design

Without Replications• Used when you have only two parameters• But multiple levels for each• Test all combinations of the levels of the two

parameters• At this point, without replicating any

observations• For factors A and B with a and b levels, ab

experiments required

Page 3: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

Experimental Design

(l1,0, l1,1, … , l1,n1-1) x (l2,0, l2,1, … , l2,n2-1)

2 different factors, each factor with ni

levels

(and possibly r replications -- defer)

Categorical levels

Factor 1 Factor 2

Page 4: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

What is This Design Good For?

• Systems that have two important factors• Factors are categorical• More than two levels for at least one factor• Examples -

– Performance of different processors under different workloads

– Characteristics of different compilers for different benchmarks

– Effects of different reconciliation topologies and workloads on a replicated file system

Page 5: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

What Isn’t This Design Good For?

• Systems with more than two important factors– Use general factorial design

• Non-categorical variables– Use regression

• Only two levels– Use 22 designs

Page 6: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Model For This Design

• yij is the observation

• is the mean response

• j is the effect of factor A at level j

• i is the effect of factor B at level i

• eij is an error term

• Sums of j’s and j’s are both zero

y eij j i ij

Page 7: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

What Are the Model’s Assumptions?

• Factors are additive

• Errors are additive

• Typical assumptions about errors– Distributed independently of factor levels– Normally distributed

• Remember to check these assumptions!

Page 8: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Computing the Effects

• Need to figure out , j, and j

• Arrange observations in two-dimensional matrix – With b rows and a columns

• Compute effects such that error has zero mean– Sum of error terms across all rows and

columns is zero

Page 9: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Two-Factor Full Factorial Example

• We want to expand the functionality of a file system to allow automatic compression

• We examine three choices -– Library substitution of file system calls– A new VFS– UCLA stackable layers

• Using three different benchmarks– With response time as the metric

Page 10: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Sample Data for Our Example

Library VFS LayersCompileBenchmark 94.3 89.5 96.2

EmailBenchmark 224.9 231.8 247.2

Web ServerBenchmark 733.5 702.1 797.4

Page 11: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Computing

• Averaging the jth column,

• By assumption, the error terms add to zero

• Also, the js add to zero, so

• Averaging rows produces • Averaging everything produces

yb b

ej j ii

iji

. 1 1

y j j. yi i.

y..

Page 12: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

So the Parameters Are . . .

y..

j jy y . ..

i iy y . ..

Page 13: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Sample Data for Our Example

Library VFS LayersCompileBenchmark 94.3 89.5 96.2 93.3

EmailBenchmark 224.9 231.8 247.2 234.6

Web ServerBenchmark 733.5 702.1 797.4 744.3

_____________________________________________________

350.9 341.1 380.3 357.4

Page 14: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Calculating Parameters for Our

Example• = grand mean = 357.4• j = (-6.5, -16.3, 22.8)• i = (-264.1, -122.8, 386.9)• So, for example, the model predicts that

the email benchmark using a special-purpose VFS will take 357.4 - 16.3 -122.8 seconds– Which is 218.3 seconds

Page 15: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Estimating Experimental Errors

• Similar to estimation of errors in previous designs

• Take the difference between the model’s predictions and the observations

• Calculate a Sum of Squared Errors

• Then allocate the variation

Page 16: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Allocating Variation

• Using the same kind of procedure we’ve used on other models,

• SSY = SS0 + SSA + SSB + SSE

• SST = SSY - SS0

• We can then divide the total variation between SSA, SSB, and SSE

Page 17: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Calculating SS0, SSA, and SSB

• a and b are the number of levels for the factors

SS ab0 2

SSA b jj

2

SSB a ii

2

Page 18: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Allocation of Variation For Our Example

• SSE = 2512• SSY = 1,858,390• SS0 = 1,149,827• SSA = 2489• SSB = 703,561• SST=708,562• Percent variation due to A - .35%• Percent variation due to B - 99.3%• Percent variation due to errors - .35%

Page 19: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Analysis of Variation

• Again, similar to previous models– With slight modifications

• As before, use an ANOVA procedure – With an extra row for the second factor– And changes in degrees of freedom

• But the end steps are the same– Compare F-computed to F-table– Compare for each factor

Page 20: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Analysis of Variation for Our Example

• MSE = SSE/[(a-1)(b-1)]=2512/[(2)(2)]=628• MSA = SSA/(a-1) = 2489/2 = 1244• MSB = SSB/(b-1) = 703,561/2 = 351,780• F-computed for A = MSA/MSE = 1.98• F-computed for B = MSB/MSE = 560• The 95% F-table value for A & B = 6.94• A is not significant, B is

Page 21: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Checking Our Results With Visual Tests

• As always, check if the assumptions made by this analysis are correct

• Using the residuals vs. predicted and quantile-quantile plots

Page 22: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Residuals Vs. Predicted Response

for Example

-30

-20

-10

0

10

20

30

40

0 100 200 300 400 500 600 700 800 900

Page 23: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

What Does This Chart Tell Us?

• Do we or don’t we see a trend in the errors?• Clearly they’re higher at the highest level of

the predictors• But is that alone enough to call a trend?

– Perhaps not, but we should take a close look at both the factors to see if there’s a reason to look further

– And take results with a grain of salt

Page 24: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Quantile-Quantile Plot for Example

-40

-30

-20

-10

0

10

20

30

40

-2 -1.5 -1 -0.5 0 0.5 1 1.5 2

Page 25: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Confidence Intervals for Effects

• Need to determine the standard deviation for the data as a whole

• From which standard deviations for the effects can be derived– Using different degrees of freedom for

each

• Complete table in Jain, pg. 351

Page 26: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Standard Deviations for Our Example

• se = 25

• Standard deviation of -

• Standard deviation of j -

• Standard deviation of i -

s s abe / / .25 3 3 8 3

s s a abj e 1 25 2 9 11 8.

s s b abi e 1 25 2 9 11 8.

Page 27: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Calculating Confidence Intervals for Our

Example• Just the file system alternatives shown

here• At 95% level, with 4 degrees of freedom• CI for library solution - (-39,26)• CI for VFS solution - (-49,16)• CI for layered solution - (-10,55)• So none of these solutions are 95%

significantly different than the mean

Page 28: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Looking a Little Closer

• Does this mean that none of the alternatives for adding the functionality are different?

• Use contrasts to checkcontrasts – any linear combination of effects whose coeff add up to zero (ch. 18.5)

Page 29: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

© 1998, Geoff Kuenning

Looking a Little Closer

• For example, is the library approach significantly better than layers?

– Using contrast of library-layers, the

confidence interval is (-58,-.5) (at 95%)

– So library approach is better, with this confidence

Page 30: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

Two-Factor Full Factorial Design With

Replications• Replicating a full factorial design allows

separating out the interactions between factors from experimental error

• Without replicating implies assumption that interactions were negligible and could be viewed as errors.

• Read Chapter 22

Page 31: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

(l1,0, l1,1, … , l1,n1-1) x (l2,0, l2,1, … , l2,n2-1) x …

x (lk,0, lk,1, … , lk,nk-1)

k different factors, each factor with ni levelsr replications

• Informal techniques just to answer which combo is of levels is best – rank responses

Factor 1

Factor k

Factor 2

Full Factorial Design With k Factors

Page 32: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

Experiment Design Overview

Number of factors

12

k

levels levels levels

2:min/maxn - catn - cat

n - catnum

regression models 2k

repl

interactions& errors

2k-p

weakinteractions

n x r pts

2kr

est.errors

full factorial full factorial

!additivelog transform

Jain ch 14,15

Jain ch 17

Jain ch 19

Jain ch 18

Jain ch 20 Jain ch 23

Jain ch 21

Jain ch 22

Page 33: Experiment Design Overview Number of factors 1 2 k levels 2:min/max n - cat num regression models2k2k repl interactions & errors 2 k-p weak interactions

For Discussion

Project Proposal1. Statement of hypothesis2. Workload decisions3. Metrics to be used4. Method