Chapter 9: The analysis of variance for simple experiments (single factor, unrelated groups designs).
Overview of experimental research
• Groups start off the same on every measure.
• During the experiment, groups are TREATED DIFFERENTLY.
• Responses thought to be affected by the different treatments are then measured.
• If the group means become different from each other, the differences may have been caused, in part, by the different ways the groups were treated.
• Determining whether the differences between group means result simply from sampling fluctuation or are (probably) due in part to the treatment differences is the job of the statistical analysis.
Let’s take that one point at a time. At the beginning of an experiment:
• Participants are randomly selected from a population. Then they are randomly assigned to treatment groups.
• Thus, at the beginning of the study, each treatment group is a random (sub)sample from a specific population.
Groups start off much the same in every possible way
• Since each treatment group is a random sample from the population, each group’s mean and variance will be similar to that of the population.
• That is, each group’s mean will be a best estimate of mu, the population mean.
• And the spread of scores around each group’s mean will yield a best estimate of sigma2 and sigma.
So: At the beginning of an experiment the treatment groups differ only because of random sampling fluctuation. When there are different people in each group, the random sampling fluctuation is caused by (1) random individual differences and (2) random measurement problems.
Sampling fluctuation is the product of the inherent variability of the data.
That is what is indexed by sigma2, the average squared distance of scores
from the population mean, mu.
To summarize:
• Since the group means and variances of random samples will be similar to those of the population, they will be similar to each other.
• This is true for any and all things you can measure.
• The only differences among the groups at the beginning of the study on any and all measures will be the mostly minor differences associated with random sampling fluctuation caused by the fact that there are different people in each group and that there are always random measurement problems (ID + MP).
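A quick way to see this is to simulate it. The sketch below (plain Python, with hypothetical numbers) draws three "treatment groups" at random from a single population before any treatment is applied; their means differ from each other only by sampling fluctuation.

```python
import random

random.seed(1)  # reproducible example

# A hypothetical population of 10,000 scores (mu = 100, sigma = 15).
population = [random.gauss(100, 15) for _ in range(10_000)]

# Randomly assign 30 participants to each of three groups.
groups = [random.sample(population, 30) for _ in range(3)]
means = [sum(g) / len(g) for g in groups]

# Each group mean is a good estimate of mu; the small differences
# among the means are pure sampling fluctuation (ID + MP).
print([round(m, 1) for m in means])
```

Run it a few times with different seeds: the group means jitter around mu, but no group is systematically different from another, because nothing has been done to them yet.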
The ultimate question
• If we then treat the groups differently, will the treatments make the groups more different from each other at the end of the experiment than if only sampling fluctuation created their differences?
In the simplest experiments (Ch 9)
• In the simplest experiments, the groups are exposed to treatments that vary on a single dimension.
• The dimension on which treatments of the groups vary is called the independent variable.
• We call the specific ways the groups are treated the “levels of the independent variable.”
The independent variable
• An independent variable can be any preplanned difference in the way groups are treated. Which kind of difference you choose relates to the experimental hypothesis, H1.
• For example, if you think you have a new medication for bipolar disorder, you would compare the effect of various doses of the new drug to placebo in a random sample of bipolar patients. Thus, the groups would differ in terms of the dose of drug.
• Proper experimental design would ensure that the difference in dose received is the only way the groups are systematically treated differently from each other.
Why is it called the “independent variable”?
Remember, we call the different treatments the “levels” of the independent variable.
Who gets which level is random. It is determined solely by the group to which a participant is randomly assigned.
• So, any difference in the way a person is treated during the experiment is unrelated to or “independent of” the infinite number of pre-existing differences that precluded causal statements in correlational research.
The dependent variable
• Relevant responses (called dependent variables) are then measured to see whether the independent variable caused differences among the treatment conditions beyond those expected given ordinary sampling fluctuation.
• That is, we want to see whether responses are related to (dependent on) the different levels of the independent variable to which the treatment groups were exposed.
Differences after the experiment among group means on the dependent variable may well be simple sampling fluctuation!
• The groups will always differ somewhat from each other on anything you measure due to sampling fluctuation.
• With 3 groups, one will score highest, one lowest, and one in the middle just by chance. With four groups, one will score highest, one lowest, with two in the middle, one higher than the other. Etc.
• So the simple fact that the groups differ somewhat is not enough to determine that the independent variable, the different ways the groups were treated, caused the differences.
• We have to determine whether the groups are more different than they should be if only sampling fluctuation is at work.
H0 & H1: If one is wrong, the other must be right.
• Either the independent variable would cause differences in responses (the dependent variable) in the population as a whole or it would not.
• H0: The different conditions embodied by the independent variable would have NO EFFECT if administered to the whole population.
• H1: The different conditions embodied by the independent variable would produce different responses if administered to the whole population.
The population can be expected to respond to the different levels of the IV similarly to the samples
• Remember, random samples are representative of the population from which they are drawn.
• If the different levels of the independent variable cause the groups to differ (more than they would from simple sampling fluctuation), the same thing should be true for the rest of the population.
For example:
• Say a new psychotherapy causes a random sample of anxious patients to become less anxious in comparison to treatment groups given more conventional approaches or pill placebo.
• Then, we would expect all anxious patients to respond better to the new treatment than to the ones to which it was compared.
However:
• As in the case of correlation, we don’t want to toss out treatments that we know work because the new treatment happens to do better in an experiment.
• We would want to be sure that the difference after treatment is not just a chance finding based on random sampling fluctuation.
The Null Hypothesis
• The null hypothesis (H0) states that the only reason that the treatment group means are different is sampling fluctuation. It says that the independent variable causes no systematic differences among the groups.
• A corollary: Try the experiment again and a different group will score highest, another lowest. If that is so, you should not generalize from which group in your study scored highest or lowest to the population from which the samples were drawn.
• Your best prediction remains that everyone will score at the mean on the dependent variable, that treatment condition will not predict response.
People often respond to random sampling fluctuation as if something was causing a difference.
• People take all kinds of food supplements because they believe the supplements (e.g., echinacea) will make colds go away more quickly.
• If you tried it and it worked, wouldn’t you tell your friends? Wouldn’t you try it again with your next cold?
• Having recovered quickly after taking something provides the evidence. After all, it’s what happened to you!
But did the food supplement really make a difference?
• To this point, NO food supplement has been shown to shorten colds when carefully tested.
• The mistake lay in taking random variation in the duration of a cold as evidence that the echinacea (or whatever) did something beneficial.
• That’s OK if it is just your pocketbook that is affected. But what if you were an FDA scientist? Wouldn’t people expect better evidence of efficacy before they gave the food supplement company an enormous amount of their money?
We call rejecting a true null hypothesis a “Type 1 Error.”
• The first rule in science is “Do not increase error.”
• Scientists don’t like to say something will make a difference when it isn’t true.
• So, before we toss away proven treatments or say that something will cause illness or health, we want to be fairly sure that we are not just responding to sampling fluctuation.
The scientist’s answer: test the null hypothesis
• So, as we did with correlation and regression, we assume that everything is equal, all treatments have the same effect, unless we can prove otherwise.
• The null hypothesis says that the treatments do not systematically differ; one is as good as another.
• As usual, we test the null hypothesis by asking it to make a prediction and then establishing a range of results for the test statistic consistent with that prediction.
• As usual, that range is a 95% CI for the test statistic.
The test statistic: F and t tests
• In Chapter 8, you learned to use Pearson’s r as a test statistic.
• When it fell outside a 95% confidence interval consistent with the null hypothesis, we rejected the null.
• In experimental research, we generally use the F and t statistics to test the null.
• When there are only two groups, t is used as the test statistic.
• When there are three or more groups, Fisher’s ratio (called the F statistic) is used as the test statistic.
Nonsignificant results
• Each actual t or F will either fall inside or outside the CI.95 that is consistent with the null hypothesis.
• Results inside the range consistent with the null are called nonsignificant. Results outside the 95% CI are called significant. One or the other must occur in each statistical analysis.
• If you get nonsignificant results, you have failed to reject the null and you may not extrapolate from the differences among your experimental (treatment) groups to the population.
You must go back to saying that your best prediction is that everyone will be equal and the differences among the treatments don’t matter.
If t or F falls outside the CI.95, you have statistically significant findings.
• If your results are statistically significant, then the results are not consistent with the notion that the between group differences are solely the product of sampling fluctuation.
• Since that is what the null says, you must declare the null false and reject it.
• If the experiment is well run, the differences in the way you treated the groups will be the only systematic difference among the groups.
• Getting statistically significant findings is important.
• If you get them, you must say, as a scientist, that the responses of the different treatment groups should be mirrored by the population as a whole were it exposed to the same conditions.
• Scientists tend to be cautious about making such statements, bracketing them with “more research is necessary” type phrases.
• But they still have to say it.
The Experimental Hypothesis (H1)
• Unlike the null, H1 is different in each experiment.
• The experimental hypothesis tells us the way(s) we must treat the groups differently and what to measure.
• Therefore, the experimental hypothesis tells us (in broad terms) how to design the experiment.
• For example, if we hypothesize that embarrassed people remember sad things better, we need to embarrass different groups to different degrees (not at all to a lot) and measure their memories for sad and happy events.
The Experimental Hypothesis
• The experimental hypothesis (H1) states that between group differences on the dependent variable are caused by the independent variable as well as by sampling fluctuation.
• If F or t is significant and the null is shown to be false, and the only systematic difference among the groups is how they were treated (the differing levels of the IV), then H1 must be right.
• In that case, we must extrapolate our findings to the rest of the population, assuming that they would respond as did our different treatment groups.
The F test
• In order to statistically test the null hypothesis, we are going to ask it to make a prediction about the relationship between two estimates of sigma2.
• In an F test, we compare these two different ways of calculating mean squares to estimate the population variance.
• To estimate sigma2 you always divide a sum of squares by its degrees of freedom.
• Remember, random sampling fluctuation is indexed by sigma2, the population variance.
Our two estimates of sigma2
One way to estimate sigma2 is to find the difference between each score and its group mean, square and sum those differences. This yields a sum of squares within group (SSW). To estimate sigma2 you divide SSW by degrees of freedom within group (dfW=n-k). This estimate of sigma2 is called the mean square within groups, MSW. You have been calculating it since Chapter 5.
The other way to estimate sigma2 is to square and sum the differences between each participant’s group mean and the overall mean. This yields a sum of squares between group and grand means (SSB). To estimate sigma2 you divide SSB by degrees of freedom between groups (dfB=k-1). This is called the mean square between groups, MSB. It is new.
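The two estimates can be written out directly. Below is a minimal sketch in Python of both computations, using the three-group stress/drinking data analyzed later in the chapter (group scores: 8, 10, 12; 12, 12, 15; 13, 17, 18):

```python
# The stress example's three groups (ounces consumed).
groups = [
    [8, 10, 12],   # no stress       (mean 10)
    [12, 12, 15],  # moderate stress (mean 13)
    [13, 17, 18],  # high stress     (mean 16)
]

n = sum(len(g) for g in groups)  # total participants (9)
k = len(groups)                  # number of groups (3)
grand_mean = sum(x for g in groups for x in g) / n
group_means = [sum(g) / len(g) for g in groups]

# SSW: squared distances of scores from their own group mean.
ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
# SSB: squared distances of each participant's group mean from the grand mean.
ss_between = sum(len(g) * (m - grand_mean) ** 2
                 for g, m in zip(groups, group_means))

ms_within = ss_within / (n - k)    # MSW, dfW = n - k = 6
ms_between = ss_between / (k - 1)  # MSB, dfB = k - 1 = 2

print(ss_within, ss_between)  # → 28.0 54.0
print(round(ms_between, 2), round(ms_within, 2))  # → 27.0 4.67
```

The ratio of the two mean squares, 27/(28/6) ≈ 5.79, is the F ratio computed in the worked example (reported there as 5.78 because MSW is rounded to 4.67 before dividing).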
What is indexed by sigma2 and its best estimate:MSW
• Sigma2 indexes random sampling fluctuation. It comprises individual differences and random measurement problems (ID + MP).
• MSW: Since everyone in a specific group is treated the same way, differences between participants’ scores and their own group mean, the basis of MSW, can only reflect ID + MP.
• Thus, MSW is always a good estimate of sigma2, the population variance, as both index ID + MP.
What is indexed by the mean square between groups (MSB)
• Since we treat the groups differently, the distance between each group’s mean and the overall mean can reflect the effects of the independent variable (as well as the effects of random individual differences and random measurement problems).
• Thus MSB = ID + MP + (?)IV
• If the independent variable pushes the group means apart, MSB will overestimate sigma2 and be larger than MSW.
Testing the Null Hypothesis (H0)
• H0 says that the IV has no effect.
• If H0 is true, groups differ from each other and from the overall mean only because of sampling fluctuation based on random individual differences and measurement problems (ID + MP).
• These are the same things that make scores differ from their own group means.
• So, according to H0, MSB and MSW are two ways of measuring the same thing (ID + MP) and are both good estimates of sigma2.
• Two measurements of the same thing should be about equal to each other and a ratio between them should be about equal to 1.00.
In simple experiments (Ch. 9), the ratio between MSB and MSW is the Fisher or F ratio.
In simple experiments, F=MSB/MSW.
H0 says F should be about 1.00.
The Experimental Hypothesis (H1)
• The experimental hypothesis says that the groups’ means will be made different from each other (pushed apart) by the IV, the independent variable (as well as by random individual differences and measurement problems).
• If the means are pushed apart, MSB will increase, reflecting the effects of the independent variable (as well as of the random factors). MSW will not.
• So MSB will be larger than MSW
• Therefore, H1 suggests that an F ratio comparing MSB to MSW should be larger than 1.00.
As usual, we set up 95% confidence intervals around the prediction of the null.
• In Ch. 9, the ratio MSB/MSW is called the F ratio.
• If the F ratio is about 1.00, the prediction of the null is correct.
• It is rare for the F ratio to be exactly 1.00.
• At some point, the ratio gets too different from 1.00 to be consistent with the null. We are only interested in the case where the ratio is greater than 1.00, which means that the means are further apart than the null suggests.
• The F table tells us when the difference among the means is too large to be explained as sampling fluctuation alone.
Analyzing the results of an experiment
An experiment
• Population: Male, self-selected, “social drinkers”
• Number of participants (9) and groups (3)
• Design: Single factor, unrelated groups
• Independent variable: Stress
  – Level 1: No stress
  – Level 2: Moderate stress
  – Level 3: High stress
• Dependent variable: Ounces consumed
• H0: Stress does not affect alcohol consumption.
• H1: Stress will cause increased alcohol consumption.
Computing MSW and MSB
Scores (ounces consumed) and group means:

Group 1 (no stress):        8  10  12   MX = 10
Group 2 (moderate stress): 12  12  15   MX = 13
Group 3 (high stress):     13  17  18   MX = 16

Grand mean: M = 13

Within-group deviations (X - MX):
Group 1: -2, 0, 2;  Group 2: -1, -1, 2;  Group 3: -3, 1, 2
Squared: 4, 0, 4;  1, 1, 4;  9, 1, 4

SSW = 28    dfW = n - k = 6    MSW = 28/6 = 4.67

Between-group deviations (MX - M): -3 for each score in Group 1, 0 for each in Group 2, +3 for each in Group 3.
Squared and summed: SSB = 9(3) + 0(3) + 9(3) = 54    dfB = k - 1 = 2    MSB = 54/2 = 27.00
CPE 9.2.1 - ANOVA summary table

Source                          SS   df     MS     F    p
Between groups (stress level)   54    2  27.00  5.78    ?
Within groups (error)           28    6   4.67
Divide MSB by MSW to calculate F. Then we need to look at the F table to determine significance.
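If SciPy is available, the same F can be obtained in one call with `scipy.stats.f_oneway` (a cross-check sketch, not part of the chapter's hand calculation):

```python
from scipy.stats import f_oneway

# The stress example: ounces consumed per group.
no_stress = [8, 10, 12]
moderate_stress = [12, 12, 15]
high_stress = [13, 17, 18]

f_stat, p_value = f_oneway(no_stress, moderate_stress, high_stress)
print(round(f_stat, 2))  # F ≈ 5.79 (the table's 5.78 rounds MSW to 4.67 first)
print(p_value < 0.05)    # True: significant at the .05 level
```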
Ratio of mean squares = F ratio

F = MSB / MSW

MSB: mean squares between groups, possibly affected by the independent variable.
MSW: mean squares within groups, not affected by the independent variable.

If the independent variable causes differences between the group means, then MSB will be larger than MSW. If the effect is large enough and/or there are enough degrees of freedom, the result may be a statistically significant F ratio.
The F Test
• The null predicts that we will find an F ratio close to 1.00, not an unusually large F ratio.
• The F table tells us whether the F ratio is significant.
• p<.05 means that we have found an F ratio large enough to occur in 5 or fewer samples in 100 when the null is true. If we find a larger F ratio than the null predicts, we have shown H0 to predict badly and we reject it.
• Results are statistically significant when you equal or exceed the critical value of F at p <.05.
Critical values in the F table
• The critical values in the F table depend on how good MSB and MSW are as estimates of sigma2.
• The better the estimates, the closer to 1.00 the null must predict that their ratio will fall.
• What makes estimates better??? DEGREES OF FREEDOM. Each degree of freedom corrects the sample statistic back towards its population parameter.
• Thus, the more degrees of freedom for MSW and MSB, the closer the critical value of F will be to 1.00.
Using the F table
So, to use the F table, you must specify the degrees of freedom (df) for the numerator and denominator of the F ratio.
In both Ch 9 and Ch 10 the denominator is MSW. As you know, dfW = n-k.
In Ch 9, the numerator is MSB and dfB=k-1.
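Instead of a printed table, the critical values can also be computed from the F distribution itself. A sketch using `scipy.stats.f`: the .95 and .99 quantiles with dfB in the numerator and dfW in the denominator are the .05 and .01 critical values.

```python
from scipy.stats import f

df_between, df_within = 2, 6  # k - 1 = 2 and n - k = 6 for the stress example

crit_05 = f.ppf(0.95, df_between, df_within)  # alpha = .05 critical value
crit_01 = f.ppf(0.99, df_between, df_within)  # alpha = .01 critical value
print(round(crit_05, 2), round(crit_01, 2))   # → 5.14 10.92, as in the table
```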
Critical values of F (for each df in the denominator, the top row is alpha = .05 and the bottom row is alpha = .01):

df in          Degrees of freedom in numerator
denominator      1      2      3      4      5      6      7      8
  3          10.13   9.55   9.28   9.12   9.01   8.94   8.88   8.84
             34.12  30.82  29.46  28.71  28.24  27.91  27.67  27.49
  4           7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04
             21.20  18.00  16.69  15.98  15.52  15.21  14.98  14.80
  5           6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82
             16.26  13.27  12.06  11.39  10.97  10.67  10.45  10.27
  6           5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15
             13.74  10.92   9.78   9.15   8.75   8.47   8.26   8.10
  7           5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73
             12.25   9.55   8.45   7.85   7.46   7.19   7.00   6.84
  8           5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44
             11.26   8.65   7.59   7.01   6.63   6.37   6.19   6.03
  9           5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23
             10.56   8.02   6.99   6.42   6.06   5.80   5.62   5.47
 36           4.41   3.26   2.86   2.63   2.48   2.36   2.28   2.21
              7.39   5.25   4.38   3.89   3.58   3.35   3.18   3.04
 40           4.08   3.23   2.84   2.61   2.45   2.34   2.26   2.19
              7.08   4.98   4.13   3.65   3.34   3.12   2.95   2.82
 60           4.00   3.15   2.76   2.52   2.37   2.25   2.17   2.10
              7.08   4.98   4.13   3.65   3.34   3.12   2.95   2.82
100           3.94   3.09   2.70   2.46   2.30   2.19   2.10   2.03
              6.90   4.82   3.98   3.51   3.20   2.99   2.82   2.69
400           3.86   3.02   2.62   2.39   2.23   2.12   2.03   1.96
              6.70   4.66   3.83   3.36   3.06   2.85   2.69   2.55
  ∞           3.84   2.99   2.60   2.37   2.21   2.09   2.01   1.94
              6.64   4.60   3.78   3.32   3.02   2.80   2.64   2.51
The degrees of freedom in the numerator are related to the number of different treatment groups. They relate to the mean square between groups: dfB = k - 1.
The degrees of freedom in the denominator are related to the number of subjects. They relate to the mean square within groups: dfW = n - k.
The critical values in the top rows are alpha = .05.
The critical values in the bottom rows are for bragging rights (p < .01).
In an experiment with 3 treatment groups, we have 2 df between groups (k - 1). If we have 9 subjects and 3 groups, we have 6 df within groups (n - k). The critical value of F at the .05 alpha level with (2, 6) df is 5.14: since F is the ratio of MSB to MSW, the variance estimate between groups must be 5.14 times larger than the variance estimate within groups to reach significance.
If we find an F ratio of 5.14 or larger, we reject the null hypothesis and declare that there is a treatment effect, significant at the .05 alpha level.
ANOVA summary table

Source                          SS   df     MS     F    p
Between groups (stress level)   54    2  27.00  5.78    ?
Within groups (error)           28    6   4.67
From the F table, the critical values with (2, 6) df are 5.14 at the .05 alpha level and 10.92 at the .01 level. 5.78 is as large as or larger than the critical value at the .05 alpha level (5.14), so it is statistically significant. It does not equal or exceed the critical value for .01.
F (2,6)=5.78, p<.05
Remember the pattern of critical values in the F table
If the null is true, as df increase, each mean square becomes a better estimate of sigma2 and the null must predict an F ratio closer and closer to 1.00. Whether an F ratio is significant depends on dfW and dfB as well as on the size of the ratio.
Notice the usual effect of consistency
The more dfB and dfW, ...
the better our estimates of sigma2, ...
the closer F should be to 1.00 when the null is true.
H0 says that F should be about 1.00.
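This pattern is easy to verify numerically. A sketch with `scipy.stats.f`: as the df in both the numerator and denominator grow, the .05 critical value of F shrinks toward 1.00.

```python
from scipy.stats import f

# .05 critical value with equal (and growing) df in the numerator
# and denominator; with (6, 6) df it matches the table's 4.28.
for df in (6, 60, 600):
    print(df, round(f.ppf(0.95, df, df), 2))
```

Each extra degree of freedom makes the two mean squares better estimates of sigma2, so the null can afford to predict a ratio closer and closer to 1.00.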
Now you do one
An experiment
• Population: Female, moderately depressed outpatients
• Number of participants (10) and groups (2)
• Design: Single factor, unrelated groups
• Independent variable: Dose of new drug, Feelbetter
  – Level 1: Placebo (n=5)
  – Level 2: Moderate dose of the drug (n=5)
• Dependent variable: HAM-D scores
• H0: Feelbetter does no more good than placebo.
• H1: The average response in the treatment groups will differ more than they would unless Feelbetter effectively helps depression.
Computing MSW and MSB
Scores (HAM-D) and group means:

Group 1 (placebo):    18  21  22  30  24   MX = 23
Group 2 (Feelbetter): 11  15  13  17  19   MX = 15

Grand mean: M = 19.00

Within-group deviations (X - MX):
Group 1: -5, -2, -1, 7, 1;  Group 2: -4, 0, -2, 2, 4
Squared: 25, 4, 1, 49, 1;  16, 0, 4, 4, 16

SSW = 120    dfW = n - k = 8    MSW = 120/8 = 15.00

Between-group deviations (MX - M): +4 for each score in Group 1, -4 for each score in Group 2.
Squared and summed: SSB = 16(5) + 16(5) = 160    dfB = k - 1 = 1    MSB = 160/1 = 160.00
ANOVA summary table

Source                       SS   df      MS      F    p
Between groups (drug dose)  160    1  160.00  10.67    ?
Within groups (error)       120    8   15.00
Divide MSB by MSW to calculate F. Then we need to look at the F table to determine significance.
From the F table, the critical values with (1, 8) df are 5.32 at alpha = .05 and 11.26 at alpha = .01.
ANOVA summary table

Source                       SS   df      MS      F     p
Between groups (drug dose)  160    1  160.00  10.67  <.05
Within groups (error)       120    8   15.00
F(1,8)=10.67, p<.05.
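As a cross-check, the same result from `scipy.stats.f_oneway` (a sketch; data from the Feelbetter example):

```python
from scipy.stats import f_oneway

placebo = [18, 21, 22, 30, 24]     # mean 23
feelbetter = [11, 15, 13, 17, 19]  # mean 15

f_stat, p_value = f_oneway(placebo, feelbetter)
print(round(f_stat, 2))  # → 10.67
print(p_value < 0.05)    # → True: significant at the .05 level
```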
Now do it as a t test. (t for 2, F for more)
Example 2 – t test summary table
Source                       SS   df      s      F    p
Between groups (drug dose)  160    1  12.65  10.67    ?
Within groups (error)       120    8   3.87
We need to look at the t table to determine significance.
df 1 2 3 4 5 6 7 8
.05 12.706 4.303 3.182 2.776 2.571 2.447 2.365 2.306
.01 63.657 9.925 5.841 4.604 4.032 3.707 3.499 3.355
df      9     10     11     12     13     14     15     16
.05  2.262  2.228  2.201  2.179  2.160  2.145  2.131  2.120
.01  3.250  3.169  3.106  3.055  3.012  2.997  2.947  2.921

df     17     18     19     20     21     22     23     24
.05  2.110  2.101  2.093  2.086  2.080  2.074  2.069  2.064
.01  2.898  2.878  2.861  2.845  2.831  2.819  2.807  2.797

df     25     26     27     28     29     30     40     60
.05  2.060  2.056  2.052  2.048  2.045  2.042  2.021  2.000
.01  2.787  2.779  2.771  2.763  2.756  2.750  2.704  2.660

df    100    200    500   1000   2000  10000
.05  1.984  1.972  1.965  1.962  1.961  1.960
.01  2.626  2.601  2.586  2.581  2.578  2.576
Example 2 – t test summary table

Source                       SS   df      s      F     p
Between groups (drug dose)  160    1  12.65  10.67  <.05
Within groups (error)       120    8   3.87
t = the square root of F. F and t always give you the same level of significance.
t(8)=3.266, p<.05.
The t Test: t for 2, F for More
• The t test is a special case of the F ratio.
• If there are only two levels (groups) of the independent variable, then
t = sB / sW = sqrt(MSB / MSW) = sqrt(F)
t table and F table: when there are only two groups, the df in the numerator is always 1, since dfB = k - 1 = 2 - 1 = 1.
Relationship between t and F tables
• Because there is always 1 df between groups, the t table is organized only by degrees of freedom within group (dfW).
• By the way, the values in the t table are the square root of the values in the first column of the F table.
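The t–F relationship can be verified directly with SciPy (a sketch using the Feelbetter data): the two-group t statistic squared equals the one-way ANOVA F statistic, and both tests give the same p value.

```python
from scipy.stats import f_oneway, ttest_ind

placebo = [18, 21, 22, 30, 24]
feelbetter = [11, 15, 13, 17, 19]

f_stat, f_p = f_oneway(placebo, feelbetter)
t_stat, t_p = ttest_ind(placebo, feelbetter)

print(round(t_stat, 3))       # → 3.266, as reported above
print(round(t_stat ** 2, 2))  # → 10.67, i.e., t squared equals F
```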
t table from Chapter 6 (shown above). Let’s look at these values.
t table from Chapter 6:

df      3      4      5
.05  3.182  2.776  2.571
.01  5.841  4.604  4.032

First column of the F table (1 df in the numerator):

df    .05    .01
3   10.13  34.12
4    7.71  21.20
5    6.61  16.26

With 1 df in the numerator, t = sqrt(F) and F = t squared. For example, sqrt(10.13) = 3.182 and sqrt(34.12) = 5.841.
In fact …
• It all fits together!
• The F table is related to the t table; The t table approaches the z table.
• Degrees of freedom!
• Alpha levels!
• Significance!
YOU NEVER TEST THE EXPERIMENTAL HYPOTHESIS STATISTICALLY.
• You can only examine the data in light of the experimental hypothesis after rejecting the null.
• Good research design makes the experimental hypothesis the only reasonable alternative to the null.
• Accepting the experimental hypothesis is based on good research design and logic, not statistical tests.
We have to go back to H0
• Until we can prove otherwise, we must assume the new drug is no better than placebo.
• If that is the case, all participants can be expected to score about the same, right at the mean of the population.
Are the groups too different from each other for only sampling fluctuation to be at work?
• The simple fact that the groups differ somewhat is not enough to determine that the independent variable, the different ways the groups were treated, caused any part of the differences between the groups.
• We have to determine whether the groups are more different than they should be if only sampling fluctuation is at work!