two sample and anova handout

7/28/2019 Two Sample and ANOVA Handout


Comparing Two Means

Often we have two unknown means and are interested in comparing them to each other. Usually the null hypothesis is

H0: no difference between the population means

There are a number of related testing procedures we will present. Which testing procedure you choose depends on your data.

We will present three basic procedures here:

Paired t-test for paired or matched data

Two-sample t-tests for comparing two independent groups. Two basic independent-sample tests will be presented:

    Equal variance t-tests: the two groups can be assumed to have equal variances.

    Unequal variance t-tests: the two groups are not assumed to have equal variances.

Paired t-test

When the means being compared come from observations that are naturally paired or matched, a paired t-test is used.

Examples: Before vs. after studies, also called longitudinal studies, produce paired data. Each patient contributes two paired observations: the before value and the after value.

Other types of studies can also produce paired data. One possibility would be a dental study where both opposing treatments are used in each patient, in randomly assigned half-mouths.

Computing a Paired t-test

To compute a paired t-test, focus on the within-pair differences (for example, after minus before).

Perform a t-test on the mean of the differences. To test if the means are different, the null hypothesis is

H0: $\mu_{\text{differences}} = 0$.

Note: Even though we are comparing two means, this is still considered a one-sample test, because it is carried out on the single sample of differences.

Example: fluoride varnish study

In ten at-risk children, fluoride varnish is applied in randomly assigned half-mouths. The remaining half-mouths are left untreated. The children are followed for two years and the new dmfs and locations are recorded:

patient   varnish   untreated   difference
1         2         3           -1
2         1         2           -1
3         0         1           -1
4         2         0            2
5         0         0            0
6         0         2           -2
7         2         5           -3
8         1         1            0
9         3         7           -4
10        5         4            1
mean      1.6       2.5         -0.90
sd                               1.79

To perform the paired t-test, compute a one-sample t-test on the last column, where H0: $\mu_{\text{differences}} = 0$.

$$T = \frac{-0.90 - 0}{1.79/\sqrt{10}} = -1.59$$

For a two-tailed test compare $|-1.59| = 1.59$ to $t_{9,\,.975} = 2.262$.

We do not reject since 1.59 < 2.262. The P-value is

$$P(|t_9| > |-1.59|) = 2P(t_9 > 1.59) = 0.15.$$
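The paired calculation above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the variable names are invented here, and the critical value 2.262 is copied from the text rather than computed.

```python
import math
import statistics

# New dmfs counts per half-mouth, copied from the fluoride varnish table.
varnish = [2, 1, 0, 2, 0, 0, 2, 1, 3, 5]
untreated = [3, 2, 1, 0, 0, 2, 5, 1, 7, 4]

# Within-pair differences (varnish minus untreated).
diffs = [v - u for v, u in zip(varnish, untreated)]

n = len(diffs)
mean_diff = statistics.mean(diffs)   # sample mean of the differences
sd_diff = statistics.stdev(diffs)    # sample SD (divides by n - 1)
se_diff = sd_diff / math.sqrt(n)     # standard error of the mean difference

# One-sample t statistic for H0: mean difference = 0.
t_stat = (mean_diff - 0) / se_diff

# t_{9, .975} = 2.262 is taken from the text, not computed here.
T_CRIT = 2.262
reject = abs(t_stat) > T_CRIT

print(f"mean = {mean_diff:.2f}, sd = {sd_diff:.2f}, T = {t_stat:.2f}, reject H0: {reject}")
```

The printed mean, sd, and T should agree with the hand calculation to two decimals.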

Comparing means of two independent samples

These are called two-sample tests.

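Our goal is usually to estimate $\mu_1 - \mu_2$ and the corresponding confidence intervals, and to perform hypothesis tests on H0: $\mu_1 - \mu_2 = 0$.

For each sample we compute the relevant statistics:

Sample 1       Sample 2
$n_1$          $n_2$
$\bar{X}_1$    $\bar{X}_2$
$s_1$          $s_2$

The obvious statistic to compare the two population means is $\bar{X}_1 - \bar{X}_2$.

Probability theory tells us that:

1. $\bar{X}_1 - \bar{X}_2$ is the best estimate of $\mu_1 - \mu_2$
2. the standard error is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
3. for large $n_1$ and $n_2$ (the second argument of N is the variance):

$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$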



In order to compute hypothesis tests and confidence intervals for $\mu_1 - \mu_2$ we will need to estimate the standard error of $\bar{X}_1 - \bar{X}_2$.

Two different estimation procedures are commonly used, depending on whether one feels it is reasonable to assume the two groups have similar variances.

RULES OF THUMB for deciding whether to use the equal variance or unequal variance formulas

1. For small samples, can use the equal variance formulas unless $s_1$ is twice as big as $s_2$, or the other way around.

2. If $n_1$ and $n_2$ > 80, can use the unequal variance formulae for the SE (it's easier to compute), and use the Normal distribution.

3. If you are unsure, the unequal variance formula will be the conservative choice (less power, but less likely to be incorrect).

4. The calculations are a snap with a computer program. If unsure about variance assumptions, compute the test both ways and see if there is a conflict.

Equal Variance case: $\sigma_1 = \sigma_2$

If it is reasonable to assume that $\sigma_1 = \sigma_2$, we can estimate the standard error more efficiently by combining the samples.

Standard Error of $\bar{X}_1 - \bar{X}_2$ is estimated by

$$SE(\bar{X}_1 - \bar{X}_2) = s_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

where the pooled standard deviation, $s_{pooled}$, is

$$s_{pooled} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}.$$

This pooled standard deviation is roughly the combined distance of observations from their respective means.

The T statistic

$$T_{equal} = \frac{\bar{X}_1 - \bar{X}_2}{SE(\bar{X}_1 - \bar{X}_2)},$$

has a t distribution with $n_1 + n_2 - 2$ degrees of freedom.

Example: Confidence Intervals for difference between means. Gum data from day 1.

Gum A                  Gum C
$n_1 = 25$             $n_2 = 40$
$\bar{X}_1 = -0.72$    $\bar{X}_2 = 2.63$
$s_1 = 5.37$           $s_2 = 3.80$

Assume equal variances (the larger SD is less than twice the smaller: $s_1/s_2 = 1.41 < 2$).

$$s_{pooled} = \sqrt{\frac{24 \cdot 5.37^2 + 39 \cdot 3.80^2}{25 + 40 - 2}} = 4.46,$$

$$SE(\bar{X}_1 - \bar{X}_2) = 4.46\sqrt{\frac{1}{25} + \frac{1}{40}} = 1.14,$$

so the 95% confidence interval is

$$(-0.72 - 2.63) \pm 2.00 \times 1.14 = (-5.63,\ -1.07)$$

Note: Since the confidence interval does not cover 0, this implies that a two-sided hypothesis test of H0: $\mu_1 - \mu_2 = 0$ would reject at level $\alpha = 0.05$.

Check: $|T| = |(-0.72 - 2.63)/1.14| = 2.94 > 2.00 = t_{63,\,.975}$.
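As a rough check of the gum example, the pooled standard deviation, standard error, confidence interval, and T statistic can be recomputed from the summary statistics. The sketch below uses only the Python standard library; the variable names are illustrative, and the critical value 2.00 is taken from the text rather than computed.

```python
import math

# Summary statistics from the gum example.
n1, mean1, s1 = 25, -0.72, 5.37   # Gum A
n2, mean2, s2 = 40, 2.63, 3.80    # Gum C

# Pooled standard deviation and its degrees of freedom.
df = n1 + n2 - 2
s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)

# Standard error of the difference between the sample means.
se = s_pooled * math.sqrt(1 / n1 + 1 / n2)

# 95% confidence interval; t_{63, .975} = 2.00 is taken from the text.
T_CRIT = 2.00
diff = mean1 - mean2
ci = (diff - T_CRIT * se, diff + T_CRIT * se)

# Equal-variance T statistic for H0: mu1 - mu2 = 0.
t_stat = diff / se

print(f"s_pooled = {s_pooled:.2f}, SE = {se:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), T = {t_stat:.2f}")
```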

SPSS output for Gum example:

T-Test

Group Statistics (change in DMFS)

gumtype   N    Mean      Std. Deviation   Std. Error Mean
A         25   -0.7200   5.36594          1.07319
C         40    2.6250   3.80073          0.60095

Independent Samples Test (change in DMFS)

Levene's Test for Equality of Variances: F = 0.924, Sig. = 0.340

t-test for Equality of Means

                              t        df      Sig. (2-tailed)   Mean Difference   Std. Error Difference
Equal variances assumed       -2.940   63      0.005             -3.345            1.138
Equal variances not assumed   -2.720   39.05   0.010             -3.345            1.230

95% Confidence Interval of the Difference

                              Lower      Upper
Equal variances assumed       -5.61840   -1.07160
Equal variances not assumed   -5.83279   -0.85721



Unequal Variance case: $\sigma_1 \neq \sigma_2$

If one is not sure that the variances are equal, it is usually safest to assume that they are not.

Standard Error of $\bar{X}_1 - \bar{X}_2$ is estimated by

$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$

The T statistic

$$T_{unequal} = \frac{\bar{X}_1 - \bar{X}_2}{SE(\bar{X}_1 - \bar{X}_2)},$$

has a t distribution with degrees of freedom that can be estimated by:

$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

Note: If $n_1$ and $n_2$ > 80, then can use the standard Normal distribution in place of t, which removes the necessity to estimate degrees of freedom.
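To make the unequal-variance formulas concrete, the sketch below applies them to the gum data, using the unrounded means and standard deviations from the SPSS Group Statistics table. The function name `unequal_variance_se_df` is made up for this illustration; the results should be close to the "Equal variances not assumed" row of the SPSS output (t = -2.720, df = 39.05, standard error 1.230).

```python
import math

def unequal_variance_se_df(s1, n1, s2, n2):
    """Return (standard error, estimated degrees of freedom) for the
    unequal-variance two-sample t-test."""
    v1 = s1**2 / n1   # estimated variance of the first sample mean
    v2 = s2**2 / n2   # estimated variance of the second sample mean
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df

# Gum data, values copied from the SPSS Group Statistics table.
mean_a, sd_a, n_a = -0.7200, 5.36594, 25
mean_c, sd_c, n_c = 2.6250, 3.80073, 40

se, df = unequal_variance_se_df(sd_a, n_a, sd_c, n_c)
t_unequal = (mean_a - mean_c) / se

print(f"SE = {se:.3f}, df = {df:.2f}, T = {t_unequal:.3f}")
```

Note that the estimated degrees of freedom are generally not an integer, which is why SPSS reports 39.05 rather than 63.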

Example: NHANES III data

807 participants who got both a dental exam and answered the chewing tobacco question. SPSS t-test output below.

Group Statistics (mean attachment loss)

currently chew tobacco   N     Mean   Std. Deviation
yes                      341   1.71   1.724
no                       466   1.50   1.381

Independent Samples Test (mean attachment loss)

Levene's Test for Equality of Variances: F = 5.682, Sig. = 0.02

t-test for Equality of Means

                              t       df      Sig. (2-tailed)   Mean Difference   Std. Error Difference
Equal variances assumed       1.980   805     0.048             0.22              0.
Equal variances not assumed   1.914   632.3   0.056             0.22              0.

95% Confidence Interval of the Difference

                              Lower    Upper
Equal variances assumed       0.002    0.431
Equal variances not assumed   -0.006   0.439

What to do?

In this case choose the unequal-variances results. They rely less on assumptions, plus the sample sizes are large enough so that SEM estimates are probably close to optimal even if the variances are equal.



ANOVA - Analysis of Variance

Extends independent-samples t test

Compares the means of groups of independent observations

Don't be fooled by the name. ANOVA does not compare variances.

Can compare more than two groups

ANOVA Null and Alternative Hypotheses

Say the sample contains K independent groups

ANOVA tests the null hypothesis

H0: $\mu_1 = \mu_2 = \dots = \mu_K$

That is, the group means are all equal

The alternative hypothesis is

H1: $\mu_i \neq \mu_j$ for some i, j

or, the group means are not all equal

Example: Accuracy of Implant Placement

Implants were placed in a manikin using placement guides of various widths. 15 implants were placed using each guide.

Error (discrepancies with a reference implant) was measured for each implant.

[Figure: Mean Error by Guide Width. x-axis: Guide Width (4mm, 6mm, 8mm); y-axis: Mean Implant Height Error (mm), ticks from 0.23 to 0.27.]

The overall mean of the entire sample was 0.248 mm.

This is called the grand mean, and is often denoted by $\bar{X}$.

If H0 were true then we'd expect the group means to be close to the grand mean.

[Figure: Mean Error by Guide Width, with $\bar{X}$ marked.]

The ANOVA test is based on the combined distances of the group means from $\bar{X}$.

If the combined distances are large, that indicates we should reject H0.



The ANOVA Statistic

To combine the differences from the grand mean we

    Square the differences
    Multiply by the numbers of observations in the groups
    Sum over the groups

$$SSB = 15(\bar{X}_{4mm} - \bar{X})^2 + 15(\bar{X}_{6mm} - \bar{X})^2 + 15(\bar{X}_{8mm} - \bar{X})^2$$

where the $\bar{X}_{*}$ are the group means.

SSB = Sum of Squares Between groups

Note: This looks a bit like a variance.

How big is big?

For the Implant Accuracy Data, SSB = 0.0047

Is that big enough to reject H0?

As with the t-test, we compare the statistic to the variability of the individual observations.

In ANOVA the variability is estimated by the Mean Square Error, or MSE

[Figure: Implant Height Error by Guide Width. x-axis: Guide Width (4mm, 6mm, 8mm); y-axis: Implant Height Error (mm), ticks from 0.1 to 0.5.]

MSE: Mean Square Error

The Mean Square Error is a measure of the variability after the group effects have been taken into account.

$$MSE = \frac{1}{N - K}\sum_{j}\sum_{i}\left(x_{ij} - \bar{X}_j\right)^2$$

where $x_{ij}$ is the ith observation in the jth group, $\bar{X}_j$ is the mean of the jth group, N is the total number of observations, and K is the number of groups.

Note that the variation of the means seems quite small compared to the variance of observations within groups.

Notes on MSE

If there are only two groups, the MSE is equal to the pooled estimate of variance used in the equal-variance t-test.

ANOVA assumes that all the group variances are equal.

ANOVA F-statistic

The ANOVA is based on the F statistic

$$F = \frac{SSB/(K - 1)}{MSE}$$

where K is the number of groups.

Under H0 the F statistic has an F distribution, with K-1 and N-K degrees of freedom (N is the total number of observations).
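The SSB, MSE, and F definitions above can be sketched as a small Python function. The raw implant measurements are not given in this handout, so the example runs on a tiny made-up data set (`toy_groups`, purely hypothetical) chosen so the arithmetic is easy to follow by hand; the function name `one_way_anova` is likewise just an illustrative choice.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (SSB, MSE, F) for a list of groups of observations."""
    k = len(groups)                               # number of groups, K
    n_total = sum(len(g) for g in groups)         # total observations, N
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Between groups: squared distance of each group mean from the
    # grand mean, weighted by the group size, summed over groups.
    ssb = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))

    # Within groups: squared distance of each observation from its own
    # group mean, divided by N - K to give the Mean Square Error.
    sse = sum((x - m) ** 2
              for g, m in zip(groups, group_means) for x in g)
    mse = sse / (n_total - k)

    f_stat = (ssb / (k - 1)) / mse
    return ssb, mse, f_stat

# Hypothetical toy data (NOT the implant data): three groups of three.
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
ssb, mse, f_stat = one_way_anova(toy_groups)
print(f"SSB = {ssb}, MSE = {mse}, F = {f_stat}")
```

For the toy data the group means are 2, 3, and 4 and the grand mean is 3, so SSB = 3(1) + 3(0) + 3(1) = 6, MSE = 6/6 = 1, and F = (6/2)/1 = 3, to be compared with an F distribution with 2 and 6 degrees of freedom.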



Implant Data: p-value

To get a p-value we compare our F statistic to an F(2, 42) distribution.

In our example

$$F = \frac{0.0047/2}{0.467/42} = 0.211$$

The p-value is

$$P(F_{(2,42)} > 0.211) = 0.81$$

[Figure: F(2,42) distribution, with the observed F = 0.211 marked and the area to its right, P = .81.]
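The implant p-value can be checked approximately without statistical software. The sketch below relies on a special case: when the numerator degrees of freedom equal 2, the upper tail of the F distribution has the closed form P(F > f) = (1 + 2f/d2)^(-d2/2). That shortcut does not apply to other numerator degrees of freedom, where a statistics package would be needed. The within-groups sum of squares (.466) is taken from the handout's ANOVA table, and the helper name is invented here.

```python
# Summary values for the implant data.
ssb = 0.0047      # Sum of Squares Between, from the text
ssw = 0.466       # within-groups sum of squares, from the ANOVA table
k = 3             # number of guide widths (4mm, 6mm, 8mm)
n_total = 45      # 15 implants per guide

mse = ssw / (n_total - k)
f_stat = (ssb / (k - 1)) / mse

def f_upper_tail_df1_equals_2(f, d2):
    """P(F > f) for an F distribution with (2, d2) degrees of freedom.

    Closed form valid ONLY when the numerator degrees of freedom are 2.
    """
    return (1 + 2 * f / d2) ** (-d2 / 2)

p_value = f_upper_tail_df1_equals_2(f_stat, n_total - k)
print(f"F = {f_stat:.3f}, p = {p_value:.2f}")
```

Because the inputs are rounded summary values, F may differ from the reported .211 in the third decimal place; the p-value should still round to .81.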

ANOVA Table

Results are often displayed using an ANOVA Table

                 Sum of Squares   df   Mean Square   F      Sig.
Between Groups   .005             2    .002          .211   .811
Within Groups    .466             42   .011
Total            .470             44

In this table, the Between Groups sum of squares is the Sum of Squares Between (SSB), the Within Groups mean square is the Mean Square Error (MSE), F is the F statistic, and Sig. is the p-value.

Post Hoc Tests

NHANES I data, women 40-60 yrs old. Compare cholesterol between periodontal groups.

                 Sum of Squares   df     Mean Square   F     Sig.
Between Groups   33383            3      11128         5.1   .002
Within Groups    4417119          2007   2201
Total            4450502          2010

The ANOVA shows good evidence (p = 0.002) that the means are not all the same.

Which means are different?

Can directly compare the subgroups using post hoc tests.

Least Significant Difference test

The simplest post hoc test is called the Least Significant Difference Test.

The computation is very similar to the equal-variance t-test.

Compute an equal-variance t-test, but replace the pooled variance ($s^2$) with the MSE.

                N     Mean    Std. Deviation
Healthy         802   221.5   46.2
Gingivitis      490   223.5   45.3
Periodontitis   347   227.3   48.9
Edentulous      372   232.4   48.8



Least Significant Difference Test: Examples

Here 2201 is the MSE from the cholesterol ANOVA table, and the degrees of freedom used are $n_1 + n_2 - 2$.

Compare Healthy group to Periodontitis group:

$$T = \frac{221.5 - 227.3}{\sqrt{2201\left(\frac{1}{802} + \frac{1}{347}\right)}} = -1.92$$

$$p = 2P(t_{1147} > 1.92) = 0.055$$

Compare Gingivitis group to Periodontitis group:

$$T = \frac{223.5 - 227.3}{\sqrt{2201\left(\frac{1}{490} + \frac{1}{347}\right)}} = -1.15$$

$$p = 2P(t_{835} > 1.15) = 0.25$$
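The two Least Significant Difference comparisons can be reproduced roughly as follows. One assumption to flag: the handout's p-values come from t distributions with 1147 and 835 degrees of freedom, while this sketch substitutes the standard Normal distribution (via `statistics.NormalDist`) so that it needs only the standard library. With degrees of freedom this large the two should agree to about two decimals. The function name `lsd_test` is illustrative.

```python
import math
from statistics import NormalDist

MSE = 2201  # Within Groups mean square from the cholesterol ANOVA table

# (N, mean cholesterol) for each periodontal group, from the handout.
groups = {
    "Healthy": (802, 221.5),
    "Gingivitis": (490, 223.5),
    "Periodontitis": (347, 227.3),
    "Edentulous": (372, 232.4),
}

def lsd_test(name1, name2):
    """Return (T, approximate two-sided p-value) for one LSD comparison."""
    n1, m1 = groups[name1]
    n2, m2 = groups[name2]
    # Equal-variance t-test with the pooled variance replaced by the MSE.
    se = math.sqrt(MSE * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    # Normal approximation to the t distribution (an assumption made here).
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p_approx

t_hp, p_hp = lsd_test("Healthy", "Periodontitis")
t_gp, p_gp = lsd_test("Gingivitis", "Periodontitis")
print(f"Healthy vs Periodontitis:    T = {t_hp:.2f}, p ~ {p_hp:.3f}")
print(f"Gingivitis vs Periodontitis: T = {t_gp:.2f}, p ~ {p_gp:.2f}")
```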

Post Hoc Tests: Multiple Comparisons

Post-hoc testing usually involves multiple comparisons.

For example, if the data contain 4 groups, then 6 different pairwise comparisons can be made.

[Figure: the four groups (Healthy, Gingivitis, Periodontitis, Edentulous), illustrating the 6 pairwise comparisons.]

Each time a hypothesis test is performed at significance level $\alpha$, there is probability $\alpha$ of rejecting in error.

Performing multiple tests increases the chances of rejecting in error at least once.

For example, if you did 6 independent hypothesis tests at the $\alpha = 0.05$ level and, in truth, H0 were true for all six, the probability that at least one test rejects H0 is 26%:

P(at least one rejection) = 1 - P(no rejections) = 1 - $.95^6$ = .26
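The 26% figure is quick to verify, and the same formula shows how fast the chance of a false rejection grows with the number of tests. This is a minimal sketch assuming independent tests with every null hypothesis true; the function name is made up for illustration.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false rejection among m independent
    tests, each run at level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m

fwer_six = familywise_error_rate(0.05, 6)
print(f"6 tests at alpha = 0.05: {fwer_six:.2f}")

# How the error rate grows with the number of tests.
for m in (1, 3, 6, 10):
    print(m, round(familywise_error_rate(0.05, m), 3))
```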

Bonferroni Correction for Multiple Comparisons

The Bonferroni correction is a simple way to adjust for the multiple comparisons.

Bonferroni Correction

    Perform each test at significance level $\alpha$.
    Multiply each p-value by the number of tests performed (capping the result at 1).

The overall significance level (chance of any of the tests rejecting in error) will be no greater than $\alpha$.


Example: Cholesterol Data post-hoc comparisons

Group 1         Group 2         Mean Difference       Least Significant     Bonferroni
                                (Group 1 - Group 2)   Difference p-value    p-value
Healthy         Gingivitis      -2.0                  .46                   1.0
Healthy         Periodontitis   -5.8                  .055                  .330
Healthy         Edentulous      -10.9                 .00021                .00126
Gingivitis      Periodontitis   -3.9                  .25                   1.0
Gingivitis      Edentulous      -8.9                  .0056                 .0336
Periodontitis   Edentulous      -5.1                  .147                  .88

Conclusion: The Edentulous group is significantly different from the Healthy group and the Gingivitis group (p < 0.05), after adjustment for multiple comparisons.
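The Bonferroni column of the cholesterol table can be reproduced from the Least Significant Difference p-values by multiplying each by the number of comparisons and capping at 1. The sketch below is one simple way to do this in Python; the dictionary layout and names are illustrative choices.

```python
# Least Significant Difference p-values from the cholesterol table.
lsd_p_values = {
    ("Healthy", "Gingivitis"): 0.46,
    ("Healthy", "Periodontitis"): 0.055,
    ("Healthy", "Edentulous"): 0.00021,
    ("Gingivitis", "Periodontitis"): 0.25,
    ("Gingivitis", "Edentulous"): 0.0056,
    ("Periodontitis", "Edentulous"): 0.147,
}

m = len(lsd_p_values)  # number of tests performed (6 pairwise comparisons)

# Bonferroni adjustment: multiply by the number of tests, cap at 1.
bonferroni = {pair: min(1.0, p * m) for pair, p in lsd_p_values.items()}

# Pairs that remain significant at the 0.05 level after adjustment.
significant = [pair for pair, p in bonferroni.items() if p < 0.05]

for pair, p in bonferroni.items():
    print(f"{pair[0]} vs {pair[1]}: {p:.5f}")
print("Significant after adjustment:", significant)
```

Only the Healthy vs Edentulous and Gingivitis vs Edentulous comparisons stay below 0.05, which matches the conclusion stated above.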
