lec5.pdf


Upload: joy-aj

Post on 25-Dec-2015


STAT3010: Lecture 5


Notation and Examples (Section 9.2, Page 413)

To decide whether to reject or not reject the null hypothesis, we organize the calculations in an ANOVA table. The formulas that make up the ANOVA table are:

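Analysis of Variance Table

Source of Variation   Sums of Squares (SS)                                           Degrees of Freedom (df)   Mean Squares (MS)                    F
Between               $SS_b = \sum_j n_j (\bar{X}_{.j} - \bar{X}_{..})^2$            $k-1$                     $s_b^2 = MS_b = SS_b / (k-1)$        $F = MS_b / MS_w$
Within                $SS_w = \sum_j \sum_i (X_{ij} - \bar{X}_{.j})^2$               $N-k$                     $s_w^2 = MS_w = SS_w / (N-k)$
Total                 $SS_{total} = \sum_j \sum_i (X_{ij} - \bar{X}_{..})^2$         $N-1$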
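A useful check on any hand-computed ANOVA table is that the between and within rows add up to the total row. The identity below is a standard property of the one-way layout, stated in the notation of the table above; it is added here as a reminder and is not part of the original slide.

```latex
\[
SS_{total} = SS_b + SS_w ,
\qquad
(N - 1) = (k - 1) + (N - k)
\]
```

If the two sums of squares do not add up to the total (apart from rounding), one of the rows contains an arithmetic error.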

Example 9.3: Testing Difference in Mean Time to Pain Relief Among 3 Treatments

An investigator wishes to compare the average time to relief of headache pain under three distinct medications, call them Drugs A, B and C. Fifteen patients who suffer from chronic headaches are randomly selected for the investigation, and five subjects are randomly assigned to each treatment. The following data reflect times to relief (in minutes) after taking the assigned drug:

Drug A   Drug B   Drug C
30       25       15
35       20       20
40       30       25
25       25       20
35       30       20


Summary Statistics by Treatment

$\bar{x}_1 = 33$   $\bar{x}_2 = 26$   $\bar{x}_3 = 20$

$s_1^2 = 32.5$   $s_2^2 = 17.5$   $s_3^2 = 12.5$

$s_1 = 5.7$   $s_2 = 4.2$   $s_3 = 3.5$
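The summary statistics can be recomputed directly from the raw times listed in the data table for Example 9.3. The following is a minimal Python sketch using only the standard library; the variable names `times` and `summary` are illustrative and do not come from the notes.

```python
from statistics import mean, variance, stdev

# Times to relief (minutes) from Example 9.3, one list per drug.
times = {
    "A": [30, 35, 40, 25, 35],
    "B": [25, 20, 30, 25, 30],
    "C": [15, 20, 25, 20, 20],
}

# Sample size, sample mean, sample variance (n - 1 denominator),
# and sample standard deviation for each treatment.
summary = {
    drug: {"n": len(x), "mean": mean(x), "var": variance(x), "sd": stdev(x)}
    for drug, x in times.items()
}

for drug, s in summary.items():
    print(f"Drug {drug}: n={s['n']}  mean={s['mean']:.1f}  "
          f"s^2={s['var']:.1f}  s={s['sd']:.2f}")
```

The sample variances use the n - 1 denominator, which is the convention the within-groups mean square relies on.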

To test whether the true mean times to relief under the three different drugs are equal, we use a five-step procedure:

1. Set up the hypothesis.

2. Select the appropriate test statistic.

3. Compute the test statistic.


Analysis of Variance Table

Source of Variation   Sums of Squares (SS)   Degrees of Freedom (df)   Mean Squares (MS)   F
Between
Within
Total

4. Decision Rule.

5. Conclusion.
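The five steps can be traced numerically for Example 9.3. The sketch below is a minimal Python version of step 3 and the decision in step 4, applying the sums-of-squares formulas from the ANOVA table to the raw data. The function name `one_way_anova` is hypothetical, and the significance level of 0.05 with the tabled critical value F(2, 12) = 3.89 is an assumption made for illustration, since the notes do not state a level at this point.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a dict of samples."""
    all_obs = [x for obs in groups.values() for x in obs]
    n_total, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n_total
    means = {g: sum(obs) / len(obs) for g, obs in groups.items()}

    # Between-groups and within-groups sums of squares.
    ss_b = sum(len(obs) * (means[g] - grand_mean) ** 2
               for g, obs in groups.items())
    ss_w = sum((x - means[g]) ** 2
               for g, obs in groups.items() for x in obs)

    ms_b = ss_b / (k - 1)        # between mean square
    ms_w = ss_w / (n_total - k)  # within mean square
    return {
        "ss_b": ss_b, "df_b": k - 1, "ms_b": ms_b,
        "ss_w": ss_w, "df_w": n_total - k, "ms_w": ms_w,
        "ss_total": ss_b + ss_w, "df_total": n_total - 1,
        "F": ms_b / ms_w,
    }


times = {
    "A": [30, 35, 40, 25, 35],
    "B": [25, 20, 30, 25, 30],
    "C": [15, 20, 25, 20, 20],
}
table = one_way_anova(times)

# Assumed for illustration: alpha = 0.05, so the tabled critical
# value for an F distribution with (2, 12) df is 3.89.
F_CRITICAL = 3.89
reject_h0 = table["F"] >= F_CRITICAL

print(f"SS_b = {table['ss_b']:.2f}, SS_w = {table['ss_w']:.2f}, "
      f"F = {table['F']:.2f}, reject H0: {reject_h0}")
```

Under these assumptions the computed F exceeds the critical value, so the decision is to reject the null hypothesis of equal mean times to relief.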

This ANOVA procedure involves several calculations, as do many statistical procedures. The calculations are generally performed with statistical software on a computer, so we’ll use SAS to evaluate this same example.

SAS CODE:

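/* Page size and line size for the output listing */
options ps=62 ls=80;

/* Read the treatment label (character) and the time to relief in minutes */
data headache;
  input trt $ time;
  cards;
A 30
A 35
A 40
A 25
A 35
B 25
B 20
B 30
B 25
B 30
C 15
C 20
C 25
C 20
C 20
;
run;

/* List the data set */
proc print;
run;

/* One-way ANOVA of time by treatment (equal sample sizes) */
proc anova;
  class trt;
  model time=trt;
run;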

SAS OUTPUT:

The SAS System

Obs    trt    time
  1    A       30
  2    A       35
  3    A       40
  4    A       25
  5    A       35
  6    B       25
  7    B       20
  8    B       30
  9    B       25
 10    B       30
 11    C       15
 12    C       20
 13    C       25
 14    C       20
 15    C       20

The ANOVA Procedure

Class Level Information

Class    Levels    Values
trt           3    A B C

Number of Observations Read    15
Number of Observations Used    15

The ANOVA Procedure

Dependent Variable: time

                         Sum of
Source             DF    Squares        Mean Square    F Value    Pr > F
Model               2    423.3333333    211.6666667      10.16    0.0026
Error              12    250.0000000     20.8333333
Corrected Total    14    673.3333333

R-Square    Coeff Var    Root MSE    time Mean
0.628713    17.33299     4.564355    26.33333

Source    DF    Anova SS       Mean Square    F Value    Pr > F
trt        2    423.3333333    211.6666667      10.16    0.0026

Note: SAS has two procedures for analysis of variance applications. The first is the ANOVA procedure, which is used when the sample sizes are equal. The second is the GLM (general linear models) procedure, which can be used whether the sample sizes are equal or unequal. Since the sample sizes are equal in Example 9.3, we used the ANOVA procedure.

Example 9.5: Testing Difference in Mean Weight Gain Among 4 Different Diets

A study is developed to examine the effects of vitamin and milk supplements on infant weight gain. Four diet plans are considered: Diet A involves a regular diet plus the vitamin supplement, Diet B involves a regular diet plus the special milk formula, Diet C is our control diet (no restrictions), and Diet D involves a regular diet plus the vitamin and the special milk formula. Twenty infants are selected for the investigation and each is randomized to one of the four competing diet programs. The following table displays weight gains, measured in pounds, after 1 month on the assigned diet:

Diet A   Diet B   Diet C   Diet D
2.0      1.6      1.5      2.1
1.5      1.9      2.0      2.4
2.4      2.1      1.8      1.9
1.9      1.1      1.3      1.8
2.6      1.7      1.2      2.2

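1.) Set up the hypothesis.
2.) Use SAS to compute the ANOVA table; make a decision and conclusion based on your output.

SAS CODE:

/* Page size and line size for the output listing */
options ps=62 ls=80;

/* Read the diet label (character) and the weight gain in pounds */
data infants;
  input diet $ gain;
  cards;
A 2.0
A 1.5
A 2.4
A 1.9
A 2.6
B 1.6
B 1.9
B 2.1
B 1.1
B 1.7
C 1.5
C 2.0
C 1.8
C 1.3
C 1.2
D 2.1
D 2.4
D 1.9
D 1.8
D 2.2
;
run;

/* List the data set */
proc print;
run;

/* One-way ANOVA of gain by diet using the general linear models procedure */
proc glm;
  class diet;
  model gain=diet;
run;

SAS OUTPUT:

The SAS System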

Obs    diet    gain
  1    A        2.0
  2    A        1.5
  3    A        2.4
  4    A        1.9
  5    A        2.6
  6    B        1.6
  7    B        1.9
  8    B        2.1
  9    B        1.1
 10    B        1.7
 11    C        1.5
 12    C        2.0
 13    C        1.8
 14    C        1.3
 15    C        1.2
 16    D        2.1
 17    D        2.4
 18    D        1.9
 19    D        1.8
 20    D        2.2

The SAS System
The GLM Procedure

Class Level Information

Class    Levels    Values
diet          4    A B C D

Number of Observations Read    20
Number of Observations Used    20

The SAS System
The GLM Procedure

Dependent Variable: gain

                         Sum of
Source             DF    Squares       Mean Square    F Value    Pr > F
Model               3    1.09400000    0.36466667        2.92    0.0659
Error              16    1.99600000    0.12475000
Corrected Total    19    3.09000000

R-Square    Coeff Var    Root MSE    gain Mean
0.354045    19.09187     0.353200    1.850000

Source    DF    Type I SS     Mean Square    F Value    Pr > F
diet       3    1.09400000    0.36466667        2.92    0.0659

Source    DF    Type III SS   Mean Square    F Value    Pr > F
diet       3    1.09400000    0.36466667        2.92    0.0659

Decision:

Conclusion:
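One way to reach the decision for Example 9.5 is to recompute the F statistic from the raw weight gains and compare it with a tabled critical value. The sketch below is a minimal Python version; the significance level of 0.05 and the tabled critical value F(3, 16) = 3.24 are assumptions for illustration, and the p-value is copied from the SAS output rather than computed.

```python
from statistics import mean

# Weight gains (pounds) after 1 month, one list per diet (Example 9.5).
gains = {
    "A": [2.0, 1.5, 2.4, 1.9, 2.6],
    "B": [1.6, 1.9, 2.1, 1.1, 1.7],
    "C": [1.5, 2.0, 1.8, 1.3, 1.2],
    "D": [2.1, 2.4, 1.9, 1.8, 2.2],
}

k = len(gains)
n_total = sum(len(g) for g in gains.values())
grand_mean = mean([x for g in gains.values() for x in g])

# Between and within sums of squares, then the F ratio MS_b / MS_w.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in gains.values())
ss_within = sum((x - mean(g)) ** 2 for g in gains.values() for x in g)
f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))

ALPHA = 0.05          # assumed significance level
F_CRITICAL = 3.24     # tabled F(3, 16) value at alpha = 0.05
P_VALUE_SAS = 0.0659  # Pr > F reported in the SAS output

reject_h0 = f_stat >= F_CRITICAL
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}, "
      f"p-value below alpha: {P_VALUE_SAS < ALPHA}")
```

Both comparisons point the same way under the assumed level: F is below the critical value and the p-value exceeds 0.05, so the decision is not to reject the null hypothesis, and there is insufficient evidence to say that the mean weight gains differ among the four diets.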


Note: We always make conclusions based on the alternative hypothesis. Whether we reject or do not reject the null, we will always conclude on the alternative with “sufficient” or “insufficient” evidence to say that the means are not equal.

Fixed Versus Random Effects Models (Section 9.3, Page 424)

There are two types of analysis of variance applications: fixed effects models and random effects models.

Fixed Effects Models:

Random Effects Models:

Note: We will use only fixed effects models in the upcoming sections. The formulas given in these notes apply only to fixed effects models.


Evaluating Treatment Effects (Section 9.4, Page 424)

This section applies only when the decision is “reject $H_0$”. If an ANOVA is performed and it has been established that a significant difference in means exists, we then want to figure out how much of the variation in the data is due to the treatments.

We use the following statistic to find the ratio of variation due to the treatments ($SS_b$) to the total variation:
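The ratio described here is commonly written $\eta^2$ (eta squared). The symbol is an assumption, since the notes leave the formula blank. As a worked illustration, the values below are taken from the SAS output for Example 9.3:

```latex
\[
\eta^2 = \frac{SS_b}{SS_{total}}
       = \frac{423.33}{673.33}
       \approx 0.629
\]
```

This matches the R-Square value of 0.628713 reported by SAS for the same example, so roughly 63% of the variation in times to relief is attributable to the drug assigned.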


Multiple Comparisons Procedures (Section 9.5, Page 425)

Now that we know when to reject or not reject the null hypothesis, let’s consider some new comparisons. Let’s say we decide to reject the null hypothesis and conclude that not all means are equal. What if we wanted to know, specifically, which means aren’t equal? For example, in Example 9.3, we wanted to test whether the mean times to relief of three different headache medications differed:

$H_0: \mu_1 = \mu_2 = \mu_3$ versus $H_1$: the means are not all equal

And we came up with the decision to reject the null hypothesis. So, we are saying that the mean times to relief differ significantly for at least two of the headache medications. Suppose we are particularly interested in comparing only the first two medications:

$H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$

Or the first and third:

$H_0: \mu_1 = \mu_3$ versus $H_1: \mu_1 \neq \mu_3$

Tests of this type are called pairwise comparisons, since they involve pairs of treatment means.

It is, however, possible to construct more complicated comparisons. For example, compare the mean time to relief for patients assigned to either Drug A or B to the mean time to relief for patients assigned to Drug C.

Both pairwise (two-at-a-time) and more complicated comparisons are generally called contrasts.
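In symbols, a contrast is a weighted combination of the treatment means whose weights sum to zero. The block below writes the Drug A or B versus Drug C comparison in that form. The coefficient notation $c_j$ is an assumption introduced for illustration and does not appear in the notes.

```latex
\[
H_0: \sum_{j=1}^{k} c_j \mu_j = 0 , \qquad \sum_{j=1}^{k} c_j = 0
\]
\[
\text{A or B versus C:} \quad
H_0: \frac{\mu_1 + \mu_2}{2} - \mu_3 = 0 ,
\qquad (c_1, c_2, c_3) = \left(\tfrac{1}{2}, \tfrac{1}{2}, -1\right)
\]
```

A pairwise comparison is the special case with one weight equal to 1, one equal to -1, and the rest equal to 0.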


There are a number of statistical procedures for handling these applications, which are called multiple comparison procedures (MCPs). For pairwise (two-at-a-time) comparisons, we will be looking at two popular multiple comparison procedures, the Scheffe and Tukey procedures. Next class, we’ll look at a different method for more complicated contrasts.

Remember: These MCPs are used only when the ANOVA decision is to reject $H_0$ and we have concluded that the treatment means are significantly different.

The Scheffe Procedure

The Scheffe procedure is a multiple comparison procedure that controls the familywise error rate. This means that P(Type I error) is controlled (and equal to $\alpha$) over the family of all comparisons.

Recall: Type I error?

Note: The Scheffe procedure is most commonly used when more than a few contrasts are involved; however, it has lower statistical power than competing procedures.

Outline of the Scheffe Procedure:

1. Set up the hypotheses:

2. Compute the test statistic:


3. Decision Rule:

4. Conclusion. (We should all know how to write a conclusion by now!)
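The notes leave the details of each step blank. For reference, the block below gives the usual textbook form of the Scheffe test for a pairwise comparison, written with the ANOVA table notation ($s_w^2 = MS_w$, $k$ treatments, $N$ observations in total). Treat it as a sketch of the standard form and check it against Section 9.5 of the text before relying on it.

```latex
\[
H_0: \mu_i = \mu_j \quad \text{versus} \quad H_1: \mu_i \neq \mu_j
\]
\[
F = \frac{(\bar{X}_i - \bar{X}_j)^2}
         {s_w^2 \left( \dfrac{1}{n_i} + \dfrac{1}{n_j} \right)}
\]
\[
\text{Reject } H_0 \text{ if } F \geq (k - 1)\, F_{k-1,\, N-k}
\]
```

Here $F_{k-1,\,N-k}$ is the same tabled critical value used in the overall ANOVA; multiplying it by $k-1$ is what makes the procedure conservative.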

Okay, let’s do an example:

Example 9.7: Recall Example 9.3;

We compared the mean time to relief of headache pain under 3 competing medications and had the following hypotheses: $H_0: \mu_1 = \mu_2 = \mu_3$ versus $H_1$: the means are not all equal.

Analysis of Variance Table

Source of Variation   Sums of Squares (SS)   Degrees of Freedom (df)   Mean Squares (MS)   F
Between               423.329                2                         211.66              10.1598
Within                250                    12                        20.833
Total                 673.329                14

Since we don’t know which of the 3 treatment means differ, we now wish to compare the medications two at a time (i.e., pairwise comparisons).


Summary Statistics by Treatment

$n_1 = 5$   $n_2 = 5$   $n_3 = 5$

$\bar{x}_1 = 33$   $\bar{x}_2 = 26$   $\bar{x}_3 = 20$

$s_1 = 5.7$   $s_2 = 4.2$   $s_3 = 3.5$

Drug A versus Drug B:

1. Hypothesis:

2. Test Statistic:

3. Decision:

4. Conclusion:

Drug A versus Drug C:

1. Hypothesis:


2. Test Statistic:

3. Decision:

4. Conclusion:

Drug B versus Drug C:

1. Hypothesis:

2. Test Statistic:

3. Decision:

4. Conclusion:

Therefore, the Scheffe comparison procedure shows that $\mu_1 \neq \mu_3$.
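The three pairwise comparisons can be checked numerically. The sketch below is a minimal Python version that assumes the usual Scheffe form, F = (difference in sample means)^2 / (MS_w (1/n_i + 1/n_j)), with rejection when F is at least (k - 1) times the tabled F value. The significance level of 0.05 and the tabled value F(2, 12) = 3.89 are assumptions for illustration; the means, sample sizes, and within sum of squares come from Example 9.3.

```python
from itertools import combinations

# Summary statistics for Example 9.3 (Drugs A, B, C).
means = {"A": 33.0, "B": 26.0, "C": 20.0}
n = {"A": 5, "B": 5, "C": 5}
ms_within = 250 / 12  # SS_w / (N - k) from the ANOVA table
k = len(means)

# Assumed for illustration: alpha = 0.05, tabled F(2, 12) = 3.89.
F_TABLE = 3.89
critical = (k - 1) * F_TABLE  # Scheffe critical value

# Scheffe F statistic and decision for every pair of treatments.
results = {}
for i, j in combinations(means, 2):
    f_stat = (means[i] - means[j]) ** 2 / (ms_within * (1 / n[i] + 1 / n[j]))
    results[(i, j)] = (f_stat, f_stat >= critical)

for (i, j), (f_stat, reject) in results.items():
    print(f"Drug {i} vs Drug {j}: F = {f_stat:.2f}, reject H0: {reject}")
```

Under these assumptions only the Drug A versus Drug C comparison exceeds the critical value of 7.78, which agrees with the conclusion stated above.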