7/28/2019 Two Sample and ANOVA Handout
Comparing Two Means
Often we have two unknown means and are interested in comparing them to each other. Usually the null hypothesis is
H0: no difference between the population means
There are a number of related testing procedures we will present. Which testing procedure you choose depends on your data.
We will present three basic procedures here:
1. Paired t-test, for paired or matched data.
2. Equal-variance two-sample t-test, for comparing two independent groups that can be assumed to have equal variances.
3. Unequal-variance two-sample t-test, for comparing two independent groups that are not assumed to have equal variances.
Paired t-test
When the means being compared come from observations that are naturally paired or matched, a paired t-test is used.
Examples: Before-vs-after studies (also called longitudinal studies) produce paired data. Each patient contributes two paired observations: the before value and the after value.
Other types of studies can also produce paired data. One possibility would be a dental study where both treatments being compared are used in each patient, in randomly assigned half-mouths.
Computing a Paired t-test
To compute a paired t-test, focus on the within-pair differences (for example, after - before).
Perform a t-test on the mean of the differences. To test whether the means are different, the null hypothesis is
H0: $\mu_d = 0$ (the mean difference is zero).
Note: Even though we are comparing two means,
this is still considered a one-sample test.
Example: fluoride varnish study
In ten at-risk children, fluoride varnish is applied in randomly assigned half-mouths. The remaining half-mouths are left untreated. The children are followed for two years, and the new dmfs and their locations are recorded:
patient  varnish  untreated  difference (varnish - untreated)
1        2        3          -1
2        1        2          -1
3        0        1          -1
4        2        0           2
5        0        0           0
6        0        2          -2
7        2        5          -3
8        1        1           0
9        3        7          -4
10       5        4           1
mean     1.6      2.5        -0.90
sd                            1.79
To perform the paired t-test, compute a one-sample t-test on the difference column, with H0: $\mu_d = 0$.
$$T = \frac{\bar d}{s_d/\sqrt{n}} = \frac{-0.90}{1.79/\sqrt{10}} = -1.59$$
For a two-tailed test, compare $|T| = |-1.59| = 1.59$ to $t_{9,.975} = 2.262$. We do not reject H0, since 1.59 < 2.262. The p-value is
$$P(|t_9| > |-1.59|) = 2P(t_9 > 1.59) = 0.15.$$
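As a check on the arithmetic above, here is a minimal, standard-library-only Python sketch of the paired t-test as a one-sample t-test on the differences. The helper `t_two_sided_p` is our own hypothetical function: it integrates the Student t density numerically with Simpson's rule so that the sketch runs without third-party packages, and it is not part of any library.

```python
import math
from statistics import mean, stdev


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value 2 * P(t_df > |t_stat|).

    Computed as 1 - 2 * (integral of the Student t density from 0 to |t_stat|),
    using Simpson's rule; `steps` must be even.
    """
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t_stat)
    h = b / steps
    total = density(0.0) + density(b)
    total += sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    return 1 - 2 * total * h / 3


def paired_t_test(x, y):
    """Paired t-test: a one-sample t-test on the within-pair differences x - y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    d_bar, s_d = mean(d), stdev(d)  # stdev uses the n - 1 denominator
    t_stat = d_bar / (s_d / math.sqrt(n))
    return d_bar, s_d, t_stat, n - 1, t_two_sided_p(t_stat, n - 1)


varnish = [2, 1, 0, 2, 0, 0, 2, 1, 3, 5]
untreated = [3, 2, 1, 0, 0, 2, 5, 1, 7, 4]

d_bar, s_d, t_stat, df, p = paired_t_test(varnish, untreated)
print(f"mean diff = {d_bar:.2f}, sd = {s_d:.2f}, T = {t_stat:.2f}, df = {df}, p = {p:.3f}")
```

If SciPy is installed, `scipy.stats.ttest_rel(varnish, untreated)` performs the same test and should agree with these numbers.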
Comparing means of two independent samples
These are called two-sample tests.
Our goal is usually to estimate $\mu_1 - \mu_2$ with a corresponding confidence interval, and to perform hypothesis tests of H0: $\mu_1 - \mu_2 = 0$.
For each sample we compute the relevant statistics:
              Sample 1      Sample 2
size          $n_1$         $n_2$
mean          $\bar X_1$    $\bar X_2$
SD            $s_1$         $s_2$
The obvious statistic to compare the two population means is $\bar X_1 - \bar X_2$.
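Probability theory tells us that:
1. $\bar X_1 - \bar X_2$ is the best estimate of $\mu_1 - \mu_2$;
2. the standard error of $\bar X_1 - \bar X_2$ is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$;
3. for large $n_1$ and $n_2$:
$$\bar X_1 - \bar X_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$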
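These facts can be checked by simulation. The Python sketch below (standard library only) repeatedly draws two independent Normal samples, records $\bar X_1 - \bar X_2$ each time, and compares the spread of those differences with $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. The population means and SDs are hypothetical values chosen only for illustration; the sample sizes mirror the gum example later in the handout.

```python
import math
import random
from statistics import fmean, stdev

random.seed(2019)  # fixed seed so reruns give identical numbers

# Hypothetical population values, chosen only for illustration
mu1, sigma1, n1 = 0.0, 5.0, 25
mu2, sigma2, n2 = 3.0, 4.0, 40


def sample_mean(mu, sigma, n):
    """Mean of n independent Normal(mu, sigma) draws."""
    return fmean(random.gauss(mu, sigma) for _ in range(n))


# Repeat the two-sample experiment many times, recording Xbar1 - Xbar2 each time
diffs = [sample_mean(mu1, sigma1, n1) - sample_mean(mu2, sigma2, n2) for _ in range(4000)]

sim_mean = fmean(diffs)
sim_se = stdev(diffs)
formula_se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
print(f"mean of differences: {sim_mean:.3f}  (mu1 - mu2 = {mu1 - mu2})")
print(f"SD of differences:   {sim_se:.3f}  (formula SE = {formula_se:.3f})")
```

The simulated SD of the differences should land within a few percent of the formula, and their average near $\mu_1 - \mu_2$.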
In order to compute hypothesis tests and confidence intervals for $\mu_1 - \mu_2$, we will need to estimate the standard error of $\bar X_1 - \bar X_2$.
Two different estimation procedures are commonly used, depending on whether one feels it is reasonable to assume that the two groups have similar variances.
RULES OF THUMB for deciding whether to use the equal-variance or unequal-variance formulas
1. For small samples, you can use the equal-variance formulas unless s1 is at least twice as big as s2, or the other way around.
2. If n1 and n2 > 80, you can use the unequal-variance formula for the SE (it's easier to compute) and use the Normal distribution.
3. If you are unsure, the unequal-variance formula will be the conservative choice (less power, but less likely to be incorrect).
4. The calculations are a snap with a computer program. If unsure about the variance assumptions, compute the test both ways and see if there is a conflict.
Equal Variance case: $\sigma_1 = \sigma_2$
If it is reasonable to assume that $\sigma_1 = \sigma_2$, we can estimate the standard error more efficiently by combining the samples.
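The standard error of $\bar X_1 - \bar X_2$ is estimated by
$$SE(\bar X_1 - \bar X_2) = s_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$
where the pooled standard deviation, $s_{pooled}$, is
$$s_{pooled} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}.$$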
This pooled standard deviation is roughly the typical distance of observations from their own group means, combining both groups.
The T statistic
$$T_{equal} = \frac{\bar X_1 - \bar X_2}{SE(\bar X_1 - \bar X_2)}$$
has, under H0, a t distribution with $n_1 + n_2 - 2$ degrees of freedom.
Example: Confidence interval for the difference between means. Gum data from day 1.
        Gum A    Gum C
n       25       40
mean    -0.72    2.63
SD      5.37     3.80
Assume equal variances ($s_1/s_2 = 5.37/3.80 \approx 1.41 < 2$).
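$$s_{pooled} = \sqrt{\frac{24(5.37)^2 + 39(3.80)^2}{25 + 40 - 2}} = 4.46,$$
$$SE(\bar X_1 - \bar X_2) = 4.46\sqrt{\frac{1}{25} + \frac{1}{40}} = 1.14,$$
so the 95% confidence interval is
$$(-0.72 - 2.63) \pm 2.00 \times 1.14 = (-5.63, -1.07).$$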
Note: Since the confidence interval does not cover 0, a two-sided hypothesis test of H0: $\mu_1 - \mu_2 = 0$ would reject at level $\alpha = 0.05$.
Check: $|T| = |(-0.72 - 2.63)/1.14| = 2.94 > 2.00 = t_{63,.975}$.
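A minimal Python sketch of the equal-variance calculation from summary statistics, reproducing the gum example. `pooled_t_from_summary` is a hypothetical helper name, and the critical value 2.00 is hard-coded from the text rather than computed.

```python
import math


def pooled_t_from_summary(n1, mean1, s1, n2, mean2, s2):
    """Equal-variance two-sample T from summary statistics.

    Returns (s_pooled, SE of the difference, T, degrees of freedom).
    """
    df = n1 + n2 - 2
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se = s_pooled * math.sqrt(1 / n1 + 1 / n2)
    t = (mean1 - mean2) / se
    return s_pooled, se, t, df


# Gum A vs Gum C, day 1 (rounded summary statistics from the example)
s_pooled, se, t, df = pooled_t_from_summary(25, -0.72, 5.37, 40, 2.63, 3.80)

t_crit = 2.00  # t_{63,.975} as used in the text (rounded), not computed here
diff = -0.72 - 2.63
ci = (diff - t_crit * se, diff + t_crit * se)
print(f"s_pooled = {s_pooled:.2f}, SE = {se:.2f}, T = {t:.2f}, df = {df}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

With SciPy, `scipy.stats.ttest_ind_from_stats(-0.72, 5.37, 25, 2.63, 3.80, 40, equal_var=True)` computes the same T statistic together with an exact p-value.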
SPSS output for Gum example:
T-Test
Group Statistics (change in DMFS)
gumtype  N   Mean     Std. Deviation  Std. Error Mean
A        25  -0.7200  5.36594         1.07319
C        40   2.6250  3.80073         0.60095
Independent Samples Test (change in DMFS)
Levene's Test for Equality of Variances: F = 0.924, Sig. = 0.340
t-test for Equality of Means:
                             t       df     Sig. (2-tailed)  Mean Difference  Std. Error Difference  95% CI Lower  95% CI Upper
Equal variances assumed      -2.940  63     0.005            -3.345           1.138                  -5.61840      -1.07160
Equal variances not assumed  -2.720  39.05  0.010            -3.345           1.230                  -5.83279      -0.85721
Unequal Variance case: $\sigma_1 \ne \sigma_2$
If one is not sure that the variances are equal, it is usually safest to assume that they are not.
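The standard error of $\bar X_1 - \bar X_2$ is estimated by
$$SE(\bar X_1 - \bar X_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
The T statistic
$$T_{unequal} = \frac{\bar X_1 - \bar X_2}{SE(\bar X_1 - \bar X_2)}$$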
has, under H0, approximately a t distribution with degrees of freedom that can be estimated by:
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$
Note: If n1 and n2 > 80, then you can use the standard Normal distribution in place of t, which removes the need to estimate the degrees of freedom.
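Here is a sketch of the unequal-variance SE, T statistic, and the degrees-of-freedom formula above (commonly called the Welch-Satterthwaite approximation), applied to the gum data using the unrounded SPSS summary statistics. It should reproduce the "Equal variances not assumed" row of the gum SPSS output (T = -2.720, df = 39.05, SE = 1.230). The function name is our own.

```python
import math


def welch_t_from_summary(n1, mean1, s1, n2, mean2, s2):
    """Unequal-variance T statistic, its SE, and the approximate degrees of freedom."""
    v1 = s1**2 / n1  # squared SE of the first sample mean
    v2 = s2**2 / n2  # squared SE of the second sample mean
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, se, df


# Gum A vs Gum C, using the SPSS group statistics
t, se, df = welch_t_from_summary(25, -0.72, 5.36594, 40, 2.625, 3.80073)
print(f"T = {t:.3f}, SE = {se:.3f}, df = {df:.2f}")
```

SciPy's `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` uses this same unequal-variance approach and also returns the p-value.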
Example: NHANES III data
807 participants who received a dental exam and answered the chewing tobacco question. SPSS t-test output below.
Group Statistics (mean attachment loss)
currently chew tobacco  N    Mean  Std. Deviation
yes                     341  1.71  1.724
no                      466  1.50  1.381
Independent Samples Test (mean attachment loss)
Levene's Test for Equality of Variances: F = 5.682, Sig. = 0.02
t-test for Equality of Means:
                             t      df     Sig. (2-tailed)  Mean Difference  95% CI Lower  95% CI Upper
Equal variances assumed      1.980  805    0.048            0.22             0.002         0.431
Equal variances not assumed  1.914  632.3  0.056            0.22             -0.006        0.439
What to do?
In this case, choose the unequal-variance results. They rely less on assumptions, plus the sample sizes are large enough that the SEM estimates are probably close to optimal even if the variances are equal.
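Rule of thumb 2 says that with n1 and n2 above 80 the Normal distribution can stand in for t. As a quick check, this standard-library sketch converts the two T statistics in the NHANES output to two-sided p-values with `statistics.NormalDist`. The results should be close to the SPSS Sig. values (0.056 and 0.048), which were computed from t distributions with 632.3 and 805 df.

```python
from statistics import NormalDist


def normal_two_sided_p(t_stat):
    """Two-sided p-value using the standard Normal in place of t (large n1, n2)."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


p_unequal = normal_two_sided_p(1.914)  # T from the "not assumed" row
p_equal = normal_two_sided_p(1.980)    # T from the "assumed" row
print(f"unequal variances: p = {p_unequal:.3f}; equal variances: p = {p_equal:.3f}")
```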
6/8/20
ANOVA - Analysis of Variance
ANOVA extends the independent-samples t-test: it compares the means of groups of independent observations, and it can compare more than two groups.
Don't be fooled by the name: ANOVA does not compare variances.
ANOVA Null and Alternative Hypotheses
Say the sample contains K independent groups.
ANOVA tests the null hypothesis
H0: $\mu_1 = \mu_2 = \cdots = \mu_K$
That is, the group means are all equal.
The alternative hypothesis is
H1: $\mu_i \ne \mu_j$ for some $i, j$
or, the group means are not all equal.
Example: Accuracy of Implant Placement
Implants were placed in a manikin using placement guides of various widths. 15 implants were placed using each guide. Error (discrepancy with a reference implant) was measured for each implant.
[Figure: Mean Error by Guide Width. x-axis: Guide Width (4mm, 6mm, 8mm); y-axis: Mean Implant Height Error (mm)]
The overall mean of the entire sample was 0.248 mm. This is called the grand mean, and is often denoted by $\bar{\bar X}$.
If H0 were true, then we'd expect the group means to be close to the grand mean.
[Figure: Mean Error by Guide Width, with the grand mean $\bar{\bar X}$ marked. x-axis: Guide Width (4mm, 6mm, 8mm); y-axis: Mean Implant Height Error (mm)]
The ANOVA test is based on the combined distances of the group means from $\bar{\bar X}$.
If the combined distances are large, that indicates we should reject H0.
The ANOVA Statistic
To combine the differences from the grand mean we
1. square the differences,
2. multiply by the numbers of observations in the groups,
3. sum over the groups:
$$SSB = 15(\bar X_{4mm} - \bar{\bar X})^2 + 15(\bar X_{6mm} - \bar{\bar X})^2 + 15(\bar X_{8mm} - \bar{\bar X})^2,$$
where $\bar X_{4mm}$, $\bar X_{6mm}$, $\bar X_{8mm}$ are the group means.
SSB = Sum of Squares Between groups.
Note: This looks a bit like a variance.
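The three steps above translate directly into code. This Python sketch computes SSB from group sizes and group means; `ssb_from_summary` is a hypothetical helper. The implant group means are not listed in this handout, so the example uses the cholesterol summaries from the NHANES I post hoc example later in the handout. Because those means are printed rounded to one decimal, the result (about 33,100) only approximately matches the 33383 in that ANOVA table, presumably because the table was computed from unrounded data.

```python
def ssb_from_summary(ns, means):
    """Sum of Squares Between groups from group sizes and group means."""
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    # Square each group's distance from the grand mean, weight by group size, sum
    return sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))


# Cholesterol example (NHANES I women); group summaries as printed (rounded)
ns = [802, 490, 347, 372]              # Healthy, Gingivitis, Periodontitis, Edentulous
means = [221.5, 223.5, 227.3, 232.4]
ssb = ssb_from_summary(ns, means)
print(f"SSB = {ssb:.0f}")
```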
How big is big?
For the Implant Accuracy Data, SSB = 0.0047. Is that big enough to reject H0?
As with the t-test, we compare the statistic to the variability of the individual observations. In ANOVA the variability is estimated by the Mean Square Error, or MSE.
[Figure: Implant Height Error by Guide Width. x-axis: Guide Width (4mm, 6mm, 8mm); y-axis: Implant Height Error (mm)]
Mean Square Error (MSE)
The Mean Square Error is a measure of the variability after the group effects have been taken into account:
$$MSE = \frac{1}{N-K}\sum_{j}\sum_{i}\left(x_{ij} - \bar X_j\right)^2,$$
where $x_{ij}$ is the $i$th observation in the $j$th group, $\bar X_j$ is the mean of group $j$, N is the total number of observations, and K is the number of groups.
Note that the variation of the means seems quite small compared to the variability of observations within groups.
[Figure: Implant Height Error by Guide Width. x-axis: Guide Width (4mm, 6mm, 8mm); y-axis: Implant Height Error (mm)]
Notes on MSE
If there are only two groups, the MSE is equal to the pooled estimate of variance used in the equal-variance t-test.
ANOVA assumes that all the group variances are
equal.
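Since $\sum_i (x_{ij} - \bar X_j)^2 = (n_j - 1)s_j^2$, the MSE can be computed from group sizes and SDs alone. The sketch below does that; with the two gum groups, $\sqrt{MSE}$ should match the $s_{pooled} = 4.46$ of the gum example, illustrating the first note. Applied to the rounded cholesterol SDs it gives roughly 2204, close to the 2201 in that ANOVA table. `mse_from_summary` is a hypothetical helper.

```python
import math


def mse_from_summary(ns, sds):
    """MSE = sum over groups of (n_j - 1) * s_j^2, divided by N - K."""
    n_total, k = sum(ns), len(ns)
    return sum((n - 1) * s**2 for n, s in zip(ns, sds)) / (n_total - k)


# Two groups: gum data (n = 25, 40; s = 5.37, 3.80); compare with s_pooled = 4.46
gum_mse = mse_from_summary([25, 40], [5.37, 3.80])
print(f"gum: MSE = {gum_mse:.2f}, sqrt(MSE) = {math.sqrt(gum_mse):.2f}")

# Four groups: cholesterol data, SDs as printed (rounded)
chol_mse = mse_from_summary([802, 490, 347, 372], [46.2, 45.3, 48.9, 48.8])
print(f"cholesterol: MSE = {chol_mse:.0f}")
```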
ANOVA F-statistic
The ANOVA is based on the F statistic
$$F = \frac{SSB/(K-1)}{MSE},$$
where K is the number of groups.
Under H0 the F statistic has an F distribution with K-1 and N-K degrees of freedom (N is the total number of observations).
Implant Data: p-value
To get a p-value we compare our F statistic to an F(2, 42) distribution. In our example
$$F = \frac{.0047/2}{.467/42} = 0.211.$$
The p-value is
$$P\left(F_{(2,42)} > 0.211\right) = .81.$$
[Figure: F(2,42) distribution, with the observed F = 0.211 marked; P = .81]
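When the numerator has 2 degrees of freedom, the upper tail of the F distribution has a simple closed form, $P(F_{(2,d)} > f) = (1 + 2f/d)^{-d/2}$, so this case can be checked in plain Python. The sketch recomputes F for the implant data from SSB = 0.0047 and the within-groups sum of squares .466 from the ANOVA table; the closed form is specific to numerator df = 2 and does not apply to other F distributions. Both function names are our own.

```python
def f_statistic(ssb, sse, k, n_total):
    """F = [SSB / (K - 1)] / MSE, where MSE = SSE / (N - K)."""
    return (ssb / (k - 1)) / (sse / (n_total - k))


def f_upper_tail_df1_is_2(f, df2):
    """P(F > f) for an F(2, df2) variable.

    Closed form (1 + 2f/df2)^(-df2/2); valid ONLY when the numerator df is 2.
    """
    return (1 + 2 * f / df2) ** (-df2 / 2)


# Implant data: K = 3 guides x 15 implants = 45 observations
F = f_statistic(ssb=0.0047, sse=0.466, k=3, n_total=45)
p = f_upper_tail_df1_is_2(F, df2=42)
print(f"F = {F:.3f}, p = {p:.2f}")
```

For other degrees of freedom, a statistics library (for example SciPy's `scipy.stats.f.sf`) is the practical route.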
ANOVA Table
Results are often displayed using an ANOVA Table:
                Sum of Squares  df  Mean Square  F     Sig.
Between Groups  .005            2   .002         .211  .811
Within Groups   .466            42  .011
Total           .470            44
Here the Between Groups Sum of Squares is the SSB, the Within Groups Mean Square is the MSE, F is the F statistic, and Sig. is the p-value.
Post Hoc Tests
NHANES I data, women 40-60 yrs old. Compare cholesterol between periodontal groups.
                Sum of Squares  df    Mean Square  F    Sig.
Between Groups  33383           3     11128        5.1  .002
Within Groups   4417119         2007  2201
Total           4450502         2010
The ANOVA shows good evidence (p = 0.002) that the means are not all the same.
Which means are different? We can directly compare the subgroups using post hoc tests.
Least Significant Difference test
The simplest post hoc test is called the Least Significant Difference (LSD) test. The computation is very similar to the equal-variance t-test: compute an equal-variance t-test, but replace the pooled variance ($s_{pooled}^2$) with the MSE.
               N    Mean   Std. Deviation
Healthy        802  221.5  46.2
Gingivitis     490  223.5  45.3
Periodontitis  347  227.3  48.9
Edentulous     372  232.4  48.8
Least Significant Difference Test: Examples
Compare Healthy group to Periodontitis group:
$$T = \frac{221.5 - 227.3}{\sqrt{2201\left(\frac{1}{802} + \frac{1}{347}\right)}} = -1.92, \qquad p = 2P(t_{1147} > 1.92) = 0.055$$
Compare Gingivitis group to Periodontitis group:
$$T = \frac{223.5 - 227.3}{\sqrt{2201\left(\frac{1}{490} + \frac{1}{347}\right)}} = -1.15, \qquad p = 2P(t_{835} > 1.15) = 0.25$$
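A sketch of the two LSD comparisons above, with the MSE of 2201 taken from the ANOVA table. The p-values here use the standard Normal (via `statistics.NormalDist`) as an approximation to the t distribution; this is our simplification, reasonable only because the degrees of freedom are over 800, so expect agreement with the text to about the second decimal.

```python
import math
from statistics import NormalDist

MSE = 2201  # Within Groups mean square from the cholesterol ANOVA table


def lsd_t(mean1, n1, mean2, n2, mse=MSE):
    """Equal-variance t statistic with the pooled variance replaced by the MSE."""
    return (mean1 - mean2) / math.sqrt(mse * (1 / n1 + 1 / n2))


def two_sided_p_normal(t_stat):
    """Two-sided p-value from the standard Normal (stand-in for t with large df)."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


t_hp = lsd_t(221.5, 802, 227.3, 347)  # Healthy vs Periodontitis
t_gp = lsd_t(223.5, 490, 227.3, 347)  # Gingivitis vs Periodontitis
for label, t in [("Healthy vs Periodontitis", t_hp), ("Gingivitis vs Periodontitis", t_gp)]:
    print(f"{label}: T = {t:.2f}, p = {two_sided_p_normal(t):.3f}")
```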
Post Hoc Tests: Multiple Comparisons
Post-hoc testing usually involves multiple
comparisons.
For example, if the data contain 4 groups (Healthy, Gingivitis, Periodontitis, Edentulous), then 6 different pairwise comparisons can be made.
Each time a hypothesis test is performed at significance level $\alpha$, there is probability $\alpha$ of rejecting in error (when H0 is true).
Performing multiple tests increases the chances of rejecting in error at least once.
For example, suppose you did 6 independent hypothesis tests at $\alpha = 0.05$, and in truth H0 were true for all six. The probability that at least one test rejects H0 is 26%:
P(at least one rejection) = 1 - P(no rejections) = $1 - .95^6 = .26$
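The count of 6 pairs and the 26% figure can be reproduced in a few lines of Python (this assumes, as the example does, that the 6 tests are independent):

```python
from math import comb

k = 4                          # number of groups
m = comb(k, 2)                 # number of pairwise comparisons: 4 choose 2
alpha = 0.05
fwer = 1 - (1 - alpha) ** m    # P(at least one false rejection) for m independent tests
print(f"{m} comparisons, P(at least one false rejection) = {fwer:.2f}")
```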
Bonferroni Correction for Multiple Comparisons
The Bonferroni correction is a simple way to adjust for multiple comparisons.
Perform each test at significance level $\alpha/m$, where m is the number of tests performed. Equivalently, multiply each p-value by m (capping the result at 1) and compare it to $\alpha$.
The overall significance level (chance of any of the tests rejecting in error) will then be at most $\alpha$.
Example: Cholesterol Data post-hoc comparisons
Group 1        Group 2        Mean Difference (Group 1 - Group 2)  LSD p-value  Bonferroni p-value
Healthy        Gingivitis     -2.0                                 .46          1.0
Healthy        Periodontitis  -5.8                                 .055         .330
Healthy        Edentulous     -10.9                                .00021       .00126
Gingivitis     Periodontitis  -3.9                                 .25          1.0
Gingivitis     Edentulous     -8.9                                 .0056        .0336
Periodontitis  Edentulous     -5.1                                 .147         .88
Conclusion: The Edentulous group is significantly different from the Healthy group and the Gingivitis group (p < 0.05), after adjustment for multiple comparisons.
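Applying the Bonferroni rule to the LSD p-values in the table reproduces the Bonferroni column and the conclusion. `bonferroni` is a hypothetical helper written for this sketch.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


comparisons = [
    "Healthy vs Gingivitis", "Healthy vs Periodontitis", "Healthy vs Edentulous",
    "Gingivitis vs Periodontitis", "Gingivitis vs Edentulous", "Periodontitis vs Edentulous",
]
lsd_p = [0.46, 0.055, 0.00021, 0.25, 0.0056, 0.147]  # LSD p-values from the table
adjusted = bonferroni(lsd_p)
for name, p_adj in zip(comparisons, adjusted):
    flag = "significant at 0.05" if p_adj < 0.05 else ""
    print(f"{name:28s} {p_adj:.5f} {flag}")
```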