always be contented, be grateful, be understanding and be compassionate

Upload: jelani-simmons

Post on 01-Jan-2016

2

Blocking

• We will add a factor, even if it is not of interest, so that the prime factors are studied under more homogeneous conditions. This factor is called a “block”. Most of the time, the block does not interact with the prime factors.

• Popular block factors are “location”, “gender” and so on.

• An R×C two-factor design with one block factor is called a “randomized block design with R×C factorial structure”.

RBD Model (Section 15.2)

3

• A randomized (complete) block design is an experimental design for comparing t treatments (or levels) in b blocks. Treatments are randomly assigned to units within each block, without replication.

• The probability model of an RBD is the same as the two-way ANOVA model with no interaction term, so multiple comparisons can be conducted for each factor separately.
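
As a sketch of the model just described, it could be written as below. The slides do not show the notation, so the symbols are assumptions: \tau_i is the effect of treatment i, \beta_j is the effect of block j, and \varepsilon_{ij} is the random error.

```latex
y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij},
\qquad i = 1, \ldots, t, \quad j = 1, \ldots, b
```

There is no (\tau\beta)_{ij} term: with one observation per treatment-block cell, an interaction could not be separated from the error.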

4

For example, suppose that we are studying worker absenteeism as a function of the worker’s age, with age levels 25-30, 40-55, and 55-60. However, a worker’s gender may also affect his/her amount of absenteeism. Even though we are not particularly concerned with the impact of gender, we want to ensure that the gender factor does not pollute our conclusions about the effect of age. Moreover, it seems unlikely that “gender” interacts with “age”. We therefore include “gender” as a block factor.

O/L: Example 15.1

5

• Goal: To compare the effects of 3 different insecticides on a variety of string beans.

• Condition: It was necessary to use 4 different plots of land.

• Response of interest: the number of seedlings that emerged per row.

Data:

6

insecticide  plot  seedlings
1            1     56
1            2     48
1            3     66
1            4     62
2            1     83
2            2     78
2            3     94
2            4     93
3            1     80
3            2     72
3            3     83
3            4     85

Minitab >> General Linear Model; response: seedlings; model: insecticide, plot

7

General Linear Model: seedlings versus insecticide, plot

Analysis of Variance for seedlings, using Adjusted SS for Tests

Source       DF   Seq SS   Adj SS   Adj MS       F      P
insecticide   2  1832.00  1832.00   916.00  211.38  0.000
plot          3   438.00   438.00   146.00   33.69  0.000
Error         6    26.00    26.00     4.33
Total        11  2296.00

S = 2.08167   R-Sq = 98.87%   R-Sq(adj) = 97.92%

Unusual Observations for seedlings

Obs  seedlings      Fit  SE Fit  Residual  St Resid
 11    83.0000  86.0000  1.4720   -3.0000     -2.04 R

R denotes an observation with a large standardized residual.
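
The sums of squares and F ratios in the Minitab output can be checked by hand. The following is a minimal sketch in plain Python, not Minitab's own procedure; the function name `rbd_anova` and the dictionary keys are made up for illustration. It applies the two-way no-interaction decomposition to the Example 15.1 data.

```python
# Seedling counts from Example 15.1: rows = insecticides 1-3, columns = plots 1-4.
seedlings = [
    [56, 48, 66, 62],
    [83, 78, 94, 93],
    [80, 72, 83, 85],
]

def rbd_anova(y):
    """Sums of squares and F ratios for an unreplicated randomized block design."""
    t, b = len(y), len(y[0])                      # t treatments, b blocks
    grand = sum(map(sum, y)) / (t * b)            # grand mean
    trt_means = [sum(row) / b for row in y]
    blk_means = [sum(y[i][j] for i in range(t)) / t for j in range(b)]
    ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
    ss_blk = t * sum((m - grand) ** 2 for m in blk_means)
    ss_tot = sum((v - grand) ** 2 for row in y for v in row)
    ss_err = ss_tot - ss_trt - ss_blk             # error by subtraction
    ms_err = ss_err / ((t - 1) * (b - 1))
    return {
        "ss_trt": ss_trt, "ss_blk": ss_blk, "ss_err": ss_err, "ss_tot": ss_tot,
        "f_trt": (ss_trt / (t - 1)) / ms_err,
        "f_blk": (ss_blk / (b - 1)) / ms_err,
    }

result = rbd_anova(seedlings)
print(result)
```

The p-values are left out because they need an F-distribution routine, which plain Python does not provide.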

8

[Figure: Residual Plots for seedlings. Four panels: Normal Probability Plot (Percent vs. Residual); Versus Fits (Residual vs. Fitted Value); Histogram (Frequency vs. Residual); Versus Order (Residual vs. Observation Order).]

RBD with random blocks

• We would like to apply our conclusions to a large pool of blocks.

• We are able to sample blocks randomly.

• Example: Minitab unit 5
  – Goal: to study the differences among 3 appraisers in their appraised values
  – Blocks: 5 randomly selected properties

9

10

Latin Square Design (Section 15.3)

Example:

Three factors, A (block factor), B (block factor), and C (treatment factor), each at three levels. A possible arrangement:

      B1   B2   B3
A1    C1   C1   C1
A2    C2   C2   C2
A3    C3   C3   C3

11

Notice, first, that these designs are squares: all factors have the same number of levels, though there is no restriction on the nature of the levels themselves. Notice also that these squares are balanced: each letter (level) appears the same number of times; this ensures unbiased estimates of main effects.

How is this achieved in a square? Each treatment appears exactly once in every row and every column.

Notice, finally, that these designs are incomplete: of the 27 possible combinations of three factors each at three levels, we use only 9.

12

Example:

Three factors, A (block factor), B (block factor), and C (treatment factor), each at three levels, in a Latin Square design; nine combinations.

      B1   B2   B3
A1    C1   C2   C3
A2    C2   C3   C1
A3    C3   C1   C2
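
The 3 x 3 arrangement above is what a cyclic shift of the treatment list produces. A minimal sketch of that construction, together with a check of the "once in every row and column" property, follows. The function names are illustrative, and the cyclic construction is only one of many ways to build a Latin square.

```python
def cyclic_latin_square(m):
    """m x m square whose row i is the treatment list 1..m shifted left by i."""
    return [[(i + j) % m + 1 for j in range(m)] for i in range(m)]

def is_latin_square(square):
    """True if every treatment occurs exactly once in each row and each column."""
    m = len(square)
    symbols = set(range(1, m + 1))
    rows_ok = all(len(row) == m and set(row) == symbols for row in square)
    cols_ok = rows_ok and all(
        {square[i][j] for i in range(m)} == symbols for j in range(m)
    )
    return rows_ok and cols_ok

square = cyclic_latin_square(3)
for row in square:
    print(" ".join(f"C{c}" for c in row))

# The earlier arrangement, with C1 repeated across the whole first row,
# fails the check because treatments repeat within a row.
print(is_latin_square(square), is_latin_square([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))
```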

13

Example with 4 Levels per Factor

FACTORS
Automobiles       A   four levels
Tire positions    B   four levels
Tire treatments   C   four levels

VARIABLE
Lifetime of a tire (days)

        B1         B2         B3         B4
A1    C4  855    C3  877    C2  890    C1  997
A2    C1  962    C2  817    C3  845    C4  776
A3    C3  848    C4  841    C1  784    C2  776
A4    C2  831    C1  952    C4  806    C3  871

14

The Model for (Unreplicated) Latin Squares

Example: (p.965 for full descriptions of model terms)

Note that interaction terms are not present in the model.

Three factors, $\alpha$, $\beta$, and $\gamma$, each at $m$ levels:

$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \varepsilon_{ijk}, \qquad i = 1, \ldots, m; \quad j = 1, \ldots, m; \quad k = 1, \ldots, m$$

Same three assumptions: normality, constant variances, and randomness.

In factor notation: Y = A + B + C (+ e); the interaction terms AB, AC, BC, and ABC are left out.

15

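Putting in Estimates:

$$y_{ijk} = \bar{y}_{\cdots} + (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdots}) + (\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdots}) + (\bar{y}_{\cdot\cdot k} - \bar{y}_{\cdots}) + R$$

or, bringing $\bar{y}_{\cdots}$ to the left-hand side,

$$(y_{ijk} - \bar{y}_{\cdots}) = (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdots}) + (\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdots}) + (\bar{y}_{\cdot\cdot k} - \bar{y}_{\cdots}) + R,$$

where $R = y_{ijk} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot k} + 2\bar{y}_{\cdots}$.

In words, apart from the residual R:

Total variability among yields = Variability among yields associated with Rows + Variability among yields associated with Columns + Variability among yields associated with Inside Factor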

16

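Actually,

$$R = y_{ijk} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot k} + 2\bar{y}_{\cdots} = (y_{ijk} - \bar{y}_{\cdots}) - (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdots}) - (\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdots}) - (\bar{y}_{\cdot\cdot k} - \bar{y}_{\cdots}).$$

Residual (R) is an “interaction-like” term. (After all, there’s no replication!)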

17

The analysis of variance (omitting the mean squares, which are the ratios of second to third entries), and expectations of mean squares:

Source of variation | Sum of squares | Degrees of freedom | Expected value of mean square
Rows | $m \sum_{i=1}^{m} (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdots})^2$ | $m - 1$ | $\sigma^2 + m V_{\text{Rows}}$
Columns | $m \sum_{j=1}^{m} (\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdots})^2$ | $m - 1$ | $\sigma^2 + m V_{\text{Col}}$
Inside factor | $m \sum_{k=1}^{m} (\bar{y}_{\cdot\cdot k} - \bar{y}_{\cdots})^2$ | $m - 1$ | $\sigma^2 + m V_{\text{Inside factor}}$
Error | by subtraction | $(m - 1)(m - 2)$ | $\sigma^2$
Total | $\sum_{i} \sum_{j} \sum_{k} (y_{ijk} - \bar{y}_{\cdots})^2$ (over the $m^2$ observed cells) | $m^2 - 1$ |

18

The expected values of the mean squares immediately suggest the F ratios appropriate for testing null hypotheses on rows, columns and inside factor.
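
Spelled out, the F ratios suggested here take the form below. This is a sketch that follows from the degrees of freedom in the table; the subscript labels are illustrative. Under the null hypothesis of no effect for a factor, its mean square and the error mean square both estimate the error variance, so their ratio should be near 1.

```latex
F_{\text{Rows}} = \frac{MS_{\text{Rows}}}{MS_{\text{Error}}}, \qquad
F_{\text{Columns}} = \frac{MS_{\text{Columns}}}{MS_{\text{Error}}}, \qquad
F_{\text{Inside}} = \frac{MS_{\text{Inside factor}}}{MS_{\text{Error}}},
\qquad \text{each on } \bigl(m - 1,\ (m - 1)(m - 2)\bigr) \text{ degrees of freedom}
```

For a 4 x 4 square, each test is therefore on (3, 6) degrees of freedom.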

19

Our Example:

                      Tire Position
Auto.    B1        B2        B3        B4
A1     4  855    3  877    2  890    1  997
A2     1  962    2  817    3  845    4  776
A3     3  848    4  841    1  784    2  776
A4     2  831    1  952    4  806    3  871

(Inside factor = Tire Treatment; each cell shows the treatment number followed by the lifetime)

20

General Linear Model: Lifetime versus Auto, Postn, Trtmnt

Factor  Type   Levels  Values
Auto    fixed       4  1 2 3 4
Postn   fixed       4  1 2 3 4
Trtmnt  fixed       4  1 2 3 4

Analysis of Variance for Lifetime, using Adjusted SS for Tests

Source  DF  Seq SS  Adj SS  Adj MS     F      P
Auto     3   17567   17567    5856  2.17  0.192
Postn    3    4679    4679    1560  0.58  0.650
Trtmnt   3   26722   26722    8907  3.31  0.099
Error    6   16165   16165    2694
Total   15   65132

Unusual Observations for Lifetime

Obs  Lifetime      Fit  SE Fit  Residual  St Resid
 11   784.000  851.250  41.034   -67.250     -2.12 R
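
The Minitab sums of squares for the tire example can be reproduced from the Latin-square formulas. The sketch below is plain Python with illustrative names (`level_means`, `latin_square_anova`), not Minitab's internals. It computes each factor's sum of squares as m times the squared deviations of its level means, and obtains the error by subtraction.

```python
# (auto, position, treatment, lifetime) for the 16 tires in the example.
tires = [
    (1, 1, 4, 855), (1, 2, 3, 877), (1, 3, 2, 890), (1, 4, 1, 997),
    (2, 1, 1, 962), (2, 2, 2, 817), (2, 3, 3, 845), (2, 4, 4, 776),
    (3, 1, 3, 848), (3, 2, 4, 841), (3, 3, 1, 784), (3, 4, 2, 776),
    (4, 1, 2, 831), (4, 2, 1, 952), (4, 3, 4, 806), (4, 4, 3, 871),
]
FACTORS = (("auto", 0), ("position", 1), ("treatment", 2))

def level_means(data, col):
    """Mean lifetime at each level of the factor stored in tuple position col."""
    groups = {}
    for rec in data:
        groups.setdefault(rec[col], []).append(rec[3])
    return {level: sum(v) / len(v) for level, v in groups.items()}

def latin_square_anova(data):
    """Sums of squares and F ratios for an unreplicated m x m Latin square."""
    m = len(level_means(data, 0))
    grand = sum(rec[3] for rec in data) / len(data)
    ss = {
        name: m * sum((mean - grand) ** 2 for mean in level_means(data, col).values())
        for name, col in FACTORS
    }
    ss["total"] = sum((rec[3] - grand) ** 2 for rec in data)
    ss["error"] = ss["total"] - sum(ss[name] for name, _ in FACTORS)
    ms_error = ss["error"] / ((m - 1) * (m - 2))
    f = {name: (ss[name] / (m - 1)) / ms_error for name, _ in FACTORS}
    return ss, f

ss, f = latin_square_anova(tires)
print(ss)
print(f)
```

The values agree with the Minitab table once rounded to whole numbers (for example, 17566.5 for Auto is displayed as 17567).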

21

Minitab DATA ENTRY

VAR1  VAR2  VAR3  VAR4
855   1     1     4
962   2     1     1
848   3     1     3
831   4     1     2
877   1     2     3
817   2     2     2
.     .     .     .
.     .     .     .
.     .     .     .
871   4     4     3

22

Latin Square with REPLICATION

• Case One: using the same rows and columns for all Latin squares (same blocks).

• Case Two: using different rows and columns for different Latin squares (different blocks).

• Case Three: using the same rows but different columns for different Latin squares (same row blocks but different column blocks).

23

Treatment Assignments for n Replications

• Case One: repeat the same Latin square n times.

• Case Two: randomly select one Latin square for each replication.

• Case Three: randomly select one Latin square for each replication.

24

Example: n = 2, m = 4, trtmnt = A,B,C,D

Case One:

       column
row    1  2  3  4
1      A  B  C  D
2      B  C  D  A
3      C  D  A  B
4      D  A  B  C

       column
row    1  2  3  4
1      A  B  C  D
2      B  C  D  A
3      C  D  A  B
4      D  A  B  C

• Row = 4 tire positions; column = 4 cars

25

Case Two:

       column
row    1  2  3  4
1      A  B  C  D
2      B  C  D  A
3      C  D  A  B
4      D  A  B  C

       column
row    5  6  7  8
5      B  C  D  A
6      A  D  C  B
7      D  B  A  C
8      C  A  B  D

• Row = clinics; column = patients; letter = drugs for flu

26

Case Three:

          column                column
row    1  2  3  4            5  6  7  8
1      A  B  C  D            B  C  D  A
2      B  C  D  A            A  D  C  B
3      C  D  A  B            D  B  A  C
4      D  A  B  C            C  A  B  D

• Row = 4 tire positions; column = 8 cars

27

ANOVA for Case 1

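SSBR, SSBC, SSBIF are computed the same way as before, except that the multiplier changes. For rows, say,

$$m \sum_{i} (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdots})^2 \quad \text{becomes} \quad mn \sum_{i} (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdots})^2,$$

and the degrees of freedom for error become

$$(nm^2 - 1) - 3(m - 1) = nm^2 - 3m + 2.$$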
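
As a quick arithmetic check of the error degrees of freedom, here is a small sketch; the function name `error_df_case_one` is made up for illustration. With n = 1 it reduces to the unreplicated value (m - 1)(m - 2).

```python
def error_df_case_one(m, n):
    """Error degrees of freedom when one m x m Latin square is repeated n times."""
    total_df = n * m * m - 1      # nm^2 observations, minus one for the mean
    factor_df = 3 * (m - 1)       # rows, columns, and the inside factor
    return total_df - factor_df

print(error_df_case_one(4, 1))    # unreplicated 4 x 4 square
print(error_df_case_one(4, 2))    # the n = 2, m = 4 example
```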

28

ANOVA for other cases:

Using Minitab in the same way gives ANOVA tables for all cases.

1. SS: please refer to the book Statistical Principles of Research Design and Analysis by R. Kuehl.

2. DF: (number of levels – 1) for all terms except error. DF of error = total DF – the sum of the other DFs.

29

Three or More Factors

Notation:

• Y = response; A, B, C, … = input factors

• AB = interaction between A and B

• ABC = interaction between A, B, and C

• A term involving k factors has order k; e.g., AB is an order-2 term and ABC is an order-3 term.

30

• Full model = the model that includes all factors and all of their interactions, denoted as

(1) Two factors

A|B (= A+B+AB)

(2) Three factors

A|B|C (= A+B+C+AB+AC+BC+ABC)

(3) And so on.
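
The expansion of the bar notation can be generated mechanically. The sketch below uses `itertools.combinations` to list every term of a full model, lowest order first; `expand_full_model` is an illustrative name, not a Minitab function.

```python
from itertools import combinations

def expand_full_model(factors):
    """List every main effect and interaction implied by A|B|..., lowest order first."""
    return [
        "".join(combo)
        for order in range(1, len(factors) + 1)
        for combo in combinations(factors, order)
    ]

print(" + ".join(expand_full_model(["A", "B", "C"])))
# prints: A + B + C + AB + AC + BC + ABC
```

A full model on k factors has 2^k - 1 terms, which is why term-by-term selection becomes slow as k grows.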

31

ABC

C at level 1
             B
           low  high
A  low      10    12
   high     13    15

C at level 2
             B
           low  high
A  low      10    15
   high     15    15

C at level 3
             B
           low  high
A  low      10    13
   high     13    10

C at level 4
             B
           low  high
A  low      12    12
   high     12    12

Effects at each level of C:

C level    A      B      AB
1          3      2       0
2          2.5    2.5    -2.5
3          0      0      -3
4          0      0       0

Discussion of examples: Notice that at C levels 2 and 3 the interaction is as large as or larger than the main effects.
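
The A, B, and AB values at each level of C can be recomputed from the four 2 x 2 tables. The sketch below assumes the usual two-level definitions, which the slides do not state: a main effect is the mean response at the high level minus the mean at the low level, and the AB interaction is half the difference between the B effect at A high and the B effect at A low. These definitions reproduce the slide's numbers.

```python
# Responses at each level of C, laid out as
# [[A low & B low, A low & B high], [A high & B low, A high & B high]].
tables = {
    1: [[10, 12], [13, 15]],
    2: [[10, 15], [15, 15]],
    3: [[10, 13], [13, 10]],
    4: [[12, 12], [12, 12]],
}

def two_by_two_effects(cell):
    """Return (A effect, B effect, AB interaction) for one 2 x 2 table."""
    (ll, lh), (hl, hh) = cell
    a = (hl + hh) / 2 - (ll + lh) / 2        # A high minus A low, averaged over B
    b = (lh + hh) / 2 - (ll + hl) / 2        # B high minus B low, averaged over A
    ab = ((hh - hl) - (lh - ll)) / 2         # half the change in the B effect across A
    return a, b, ab

effects = {c: two_by_two_effects(cell) for c, cell in tables.items()}
for c, (a, b, ab) in effects.items():
    print(f"C level {c}: A = {a}, B = {b}, AB = {ab}")
```

Because the AB effect changes from one level of C to the next (0, -2.5, -3, 0), these tables also illustrate a three-factor ABC interaction.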

32

Example: Three factors each at two levels

The dependent variable is response rate of a direct mail offering. 2_to_3_design.mpj on class webpage

                      low          high
A   postage           3rd class    1st class
B   price             $9.95        $12.95
C   envelope size     #10          9 x 12

33

Backward Model Selection

1. Fit the full model and delete the most insignificant (largest p-value) highest-order term.

2. Refit the reduced model from step 1 and again delete the most insignificant highest-order term.

3. Repeat step 2 until all remaining highest-order terms are significant.

4. Repeat the same procedure (deleting the most insignificant term each time, until no insignificant terms remain) for the 2nd-highest order, then the 3rd-highest order, …, and finally the order-1 terms.

5. Determine the final model and check its assumptions.

34

Note.

If a term is in the current model, then all lower order terms involving factors in that term must not be deleted even if they are insignificant.

E.g., if ABC is significant (so it stays in the model), then A, B, C, AB, AC, and BC cannot be deleted.
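
The backward procedure and the hierarchy rule can be sketched as a loop. This is an illustrative outline, not Minitab's implementation. `fit_pvalues` is a hypothetical callable, to be supplied by the user, that refits a model and returns one p-value per term. The `TOY_P` values are made up purely to exercise the loop, and the 0.05 cutoff is an assumption.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set

Term = FrozenSet[str]   # frozenset("AB") stands for the A-by-B interaction

def is_protected(term: Term, model: Set[Term]) -> bool:
    """Hierarchy rule: a term stays if a higher-order term in the model contains it."""
    return any(term < other for other in model)

def backward_select(full_model: Iterable[Term],
                    fit_pvalues: Callable[[Set[Term]], Dict[Term, float]],
                    alpha: float = 0.05) -> Set[Term]:
    model = set(full_model)
    max_order = max(len(t) for t in model)
    for order in range(max_order, 0, -1):          # highest order first
        while True:
            pvals = fit_pvalues(model)             # refit after every deletion
            candidates = [t for t in model
                          if len(t) == order
                          and not is_protected(t, model)
                          and pvals[t] > alpha]
            if not candidates:
                break
            model.remove(max(candidates, key=lambda t: pvals[t]))
    return model

# Made-up p-values, for illustration only; a real fit would change after each deletion.
TOY_P = {frozenset("ABC"): 0.60, frozenset("AB"): 0.01, frozenset("AC"): 0.40,
         frozenset("BC"): 0.30, frozenset("A"): 0.50, frozenset("B"): 0.20,
         frozenset("C"): 0.70}

def toy_fit(model: Set[Term]) -> Dict[Term, float]:
    return {t: TOY_P[t] for t in model}

final = backward_select(TOY_P.keys(), toy_fit)
print(sorted("".join(sorted(t)) for t in final))
```

In the toy run, A and B are kept despite large p-values because AB remains in the model, which is the rule stated in the note above.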

35

Note.

The procedure of backward model selection can be very time-consuming if the number of factors, k, is large. In such cases, we delete all insignificant terms together when we are processing the order 4 or higher terms.

• Examples are in Minitab unit 11.