workshop 4 -solutions
TRANSCRIPT
MAS230 : Biostatistical Methods
Workshop 4
1. Consider Data Set 75. These data come from Judge, M.D. et al (1984). “Thermal
shrinkage temperature of intramuscular collagen of bulls and steers,” Journal of
Animal Science 59: 706–9, and are reproduced in Samuels and Witmer (1999),
Statistics for Life Sciences, 2nd Edition, Prentice Hall, p. 357. The study is designed
to assess the effect of electrical stimulation of a beef carcass in terms of improving
the tenderness of the meat. In this test, beef carcasses were split in half. One
side was subjected to a brief electrical current while the other was an untreated
control. From each side a specimen of connective tissue (collagen) was taken, and
the temperature at which shrinkage occurred was determined. Increased tenderness
is related to a low shrinkage temperature.
Carry out analyses to assess the impact of electrical stimulation on the meat tender-
ness. Use both parametric and non-parametric methods and compare the results.
Suppose acceptable tenderness corresponds to a shrinkage temperature less than
69 degrees. How would you test to see if the proportions of acceptable tenderness
values differed under the two treatments? Use SPSS to create appropriate variables
to enable this to be tested and carry out the analysis. Don’t forget that the sample
sizes are small here.
To assess the impact of electrical stimulation on meat tenderness, we will examine
whether the mean temperature at which shrinkage occurs is different for the two
groups. Since one half of each carcass is assigned to the untreated group, and
the other half is assigned to the group that receives electrical stimulation, the two
samples are not independent. Consequently, the most appropriate analysis is a test
for paired differences.
We might consider a t-test for paired differences. This test is based on the hypothe-
ses
H0 : δ = 0
H1 : δ 6= 0
where δ represents the mean difference in temperature at which shrinkage occurs for
the untreated carcass halves and the carcass halves treated with electrical stimula-
tion. To carry out a t-test for paired differences the carcasses must be independent.
1
Without information about how the carcasses were obtained, we will assume that
this is the case. Additionally, the differences in temperature at which shrinkage
occurs must come from an approximately normal distribution. A Q-Q plot for dif-
ferences in temperature at which shrinkage occurs is given in Figure 1. The observed
Figure 1: Q-Q plot of differences in temperature at which shrinkage occurs.
differences fall quite close to the line, suggesting that the assumption of normality
of the differences is plausible, and we proceed with the t-test. Output for this test is
given in Figure 2. The test statistic is given by t = 0.404 on 14 degrees of freedom.
Figure 2: Output for t-test for mean difference in temperature at which shrinkage occurs
for untreated carcass halves and electrically stimulated carcass halves.
The corresponding p-value of 0.692 is not significant for any reasonable significance
level, so we do not reject the null hypothesis, meaning that we have insufficient
evidence to suggest that treating carcasses with electrical stimulation produces any
difference in the mean temperature at which shrinkage occurs.
If the assumptions of the t-test had not met, we might have considered a sign
test or Wilcoxon signed-rank test. Even though the t-test is the appropriate test,
2
here I provide analyses using both the sign test and Wilcoxon signed-rank test for
illustrative purposes. For both the sign test and Wilcoxon signed-rank test, we
would test the hypotheses
H0 : µ̃ = 0
H1 : µ̃ 6= 0
where µ̃ represents the median difference in temperature at which shrinkage oc-
curs for the untreated carcass halves and the carcass halves treated with elec-
trical stimulation. To carry out both tests, the carcasses must be independent.
We already addressed this requirement/assumption for the t-test. To carry out a
Wilcoxon signed-rank test, the differences would need to be approximately sym-
metric. Both a histogram and boxplot for differences are given in Figure 3. Neither
show strong evidence to suggest that differences are asymmetric, so either a sign
test or a Wilcoxon-signed rank test can be carried out. Output for the two tests
are given in Figure 4. The p-values for both tests (0.791 for the sign test and 0.801
for the Wilcoxon signed-rank test) exceed any reasonable significance level, so we
would not reject the null hypothesis, meaning that we have insufficient evidence to
conclude that there is a difference in the median temperature at which shrinkage
occurs for untreated carcass halves and electrically stimulated carcass halves.
For the secondary analysis of interest, we must convert temperatures at which
shrinkage occurs to binary variables, where a ‘1’ denotes a shrinkage temperature
less than 69 degrees, and a ‘0’ denotes a shrinkage temperature of 69 degrees or
more. (Note that we could also let ‘0’ denote a shrinkage temperature less than
69 degrees and ‘1’ denote a shrinkage temperature of 69 degrees or more, and our
analysis would produce identical results.) Our two samples are still dependent, and
we would like to determine whether or not the proportions of acceptable carcass
halves differs under the two treatments, so we are restricted to either a kappa test
or McNemar’s test. McNemar’s test is the appropriate test here, as we would like to
see whether or not the proportion of acceptable carcass halves changes after apply-
ing a treatment (electrical stimulation). Since the independence assumption for this
test was already addressed when we carried out the t-test, we proceed with the test.
Output for the test is given in Figure 5. The p-value of approximately 1 means that
we will not reject the null hypothesis for any reasonable significance level, meaning
that we do not have evidence to suggest that there is a difference in the proportion
of acceptable carcass halves for untreated carcass halves and electrically stimulated
carcass halves.
3
Figure 3: Histogram (top) and boxplot (bottom) for difference in temperature at which
shrinkage occurs for the untreated carcass halves and the carcass halves treated with
electrical stimulation.
4
Figure 4: Output for sign test (left) and Wilcoxon signed-rank test (right) for median
temperature at which shrinkage occurs for untreated carcass halves and electrically stim-
ulated carcass halves.
Figure 5: Output for McNemar’s test for difference in proportion of acceptable carcass
halves for untreated carcass halves and electrically stimulated carcass halves.
5
2. Refer to Data Set 76. These data come from Mochizuki, M. et al (1984). “Effects
of smoking on fetoplacentalmaternal system during pregnancy,” American J. Ob-
stet. Gyn. 149: 13–20. The study considered the effects of smoking during preg-
nancy by examining the placentas from 58 women after childbirth. Each mother was
classified as a non-, moderate or heavy smoker during pregnancy, and the outcome
measure was presence or absence of atrophied placental villi, finger-like structures
that protrude from the wall to increase absorption area.
Combine the two smoking classes to create a “smoker” class and carry out an ap-
propriate test for association of villi atrophy with smoking status. Given there are
three ordered classes of smoking (none < moderate < heavy) think about how you
might display such data.
We can consider either of two different statistical tests, both of which will produce
identical results. First, if villi atrophy is associated with smoking status, that means
that the probability of villi atrophy changes with a person’s smoking status. Thus,
we can carry out a test to determine if the proportion of smokers with villi atrophy
is the same as the proportion of non-smokers with villi atrophy. To do this, we
consider the statistical hypotheses
H0 : πs = πn.s.
H1 : πs 6= πn.s.
where πs denotes the population proportion of smokers with villi atrophy and πn.s.
denotes the population proportion of non-smokers with villi atrophy. By hand, we
can verify that
ns = 36
nn.s. = 22
p̂s =21
36= 0.583
p̂n.s. =4
22= 0.182
p̂ =21 + 4
36 + 22= 0.431
Before carrying out this test, it is important to determine whether or not the re-
quirements/assumptions of the test are met. Without detailed information about
the sampling mechanism, we will assume that the smokers and non-smokers in the
study were sampled independently of each other, so the sample of smokers is inde-
pendent of the sample of non-smokers, and individuals within each sample are in-
dependent. Additionally, we must confirm that nsp̂ > 5, ns (1− p̂) > 5, nn.s.p̂ > 5,
6
and nn.s. (1− p̂) > 5. Since
nsp̂ = 36 · 0.431 = 15.517 > 5
ns (1− p̂) = 36 (1− 0.431) = 20.483 > 5
nn.s.p̂ = 22 · 0.431 = 9.483 > 5
nn.s. (1− p̂) = 22 (1− 0.431) = 12.517 > 5
it is appropriate to use a normal approximation for our two sample test for inde-
pendent proportions. Then our test statistic is given by
z =p̂s − p̂n.s.√
p̂ (1− p̂)(
1ns
+ 1nn.s.
)=
0.583− 0.182√0.431 (1− 0.431)
(136
+ 122
)= 2.996
and the corresponding p-value is given by
P (Z > 2.996) + P (Z < −2.996) = 2 · 0.001367431 = 0.0027
Since our p-value of 0.0027 is less than any reasonable significance level, we have
sufficient evidence to reject the null hypothesis, meaning that we have evidence
to suggestion that the proportion of smokers with villi atrophy is different from
the proportion of non-smokers with villi atrophy. This means that villi atrophy is
associated with smoking status.
The second test we can consider is a chi-square test of independence. If villi atrophy
is not associated with smoking status, then villi atrophy and smoking status are
independent. Otherwise, they are dependent. To carry out this test, we consider
the hypotheses
H0 : smoking status and villi atrophy are independent
H1 : smoking status and villi atrophy are not independent
Output for the chi-square test of independence is given in Figure 6. The test statistic
is
χ2 = 8.976 , d.f. = (2− 1) · (2− 1) = 1
and the corresponding p-value is approximately 0.003. Note that these results
are identical to the two sample test for independent proportions. In particular,
7
Figure 6: Output for chi-square test of independence.
z2 = 2.9962 = 8.976 = χ2, and the p-values are identical (The difference between
the two is simply due to rounding for the p-value for the chi-square test of inde-
pendence). As we would expect, given the results of the two sample test for inde-
pendent proportions, we reject the null hypothesis, so we have evidence to suggest
that smoking status and villi atrophy are dependent, so incidence of villi atrophy
changes based on smoking status.
Other than the previously mentioned independence assumption, the only other as-
sumption for the chi-square test is that no more than 20% of expected cell counts
have a value less than 5. From the output, we see that no cell counts are less
than 5, so using the chi-square test of independence is appropriate. Note that the
output also gives the lowest expected cell count. In fact, this quantity will always
correspond to the smallest value of n1p̂, n1 (1− p̂), n2p̂, and n2 (1− p̂). Thus, if
this value is greater than 5, we know that the sample size requirements are met
to use the normal approximation for a two sample test for independent propor-
tions. At the same time, n1p̂, n1 (1− p̂), n2p̂, and n2 (1− p̂) give the expected cell
counts for a chi-square test. These further highlight that these two tests are in fact
identical except that they use different probability distributions (standard normal
distribution versus chi-square distribution). Consequently, SPSS does not include
a function for two sample tests for independent samples because a chi-square test
of independence does exactly the same thing.
To display our data, we note that our data are categorical, so a good way to graph
the data is through bar plots. In Figure 7, we give two similar barplots of smoking
status by villi atrophy. The plot on the top gives a comparison of smokers with
non-smokers, whereas the bottom gives a comparison across the different smoking
classes. These suggest that the likelihood of villi atrophy increases with smoking.
8
Figure 7: Bar plots of smoking status villi atrophy.
9
3. An environmental scientist studying the impact of pollution on species diversity
along two nearby rivers carried out a survey in which plots (quadrats) of size 30
metres by 20 metres were randomly chosen from along the banks of the rivers.
Within each quadrat the numbers of different tree species were recorded. The data
were as follows:
Valley River Ridge River
9 9 15 12 13 13 10 6 7 10
13 13 8 11 9 9 18 6 9 9
10 9 14 11 7 8 6 11
What would you conclude from these data in terms of differences in species diver-
sity? Think about the nature of the data, what might be the best way to compare
them, what assumptions are being made in the comparison, etc. Are there any
values which might need special consideration? What is their effect on the various
analyses if included or excluded?
Examining boxplots of species for each river, given by Figure 8, the value of 18
species recorded for the Ridge River is an outlier. Without a valid reason to omit
this data point (e.g., the value was clearly recorded wrong), it should remain in the
analysis. However, for instructional purposes we will carry out analyses including
and excluding this data point to see how much of an impact it has on the analyses.
We would like to test the hypotheses
H0 : µV = µR
H1 : µV 6= µR
where µV denotes the average number of species for the Valley River, and µR denotes
the average number of species for the Ridge River. We might consider carrying out
a t-test for two independent means. To carry out such a test, we must assume that
the two rivers are independent of each other and that sampled quadrats within each
river are independent of each other. Since the number of quadrats is not all that
large for either river, we must see if it is plausible that the number of species per
quadrat comes from an approximately normal distribution for each river. To do this,
we look at Q-Q plots for species numbers separately for the two rivers. Such plots
are given in Figure 9. We note that the points fall quite close to the line with the
lone exception being the outlier, so it seems plausible that the normality assumption
has been met. The third assumption that must be checked is that of equal variance
10
Figure 8: Boxplots of species per quadrat for the .
for number of species per quadrat for the two rivers. SPSS automatically carries out
Levene’s test with a t-test for two independent means, and output from Levene’s
test and the t-test is given in Figure 10. Levene’s test essentially tests
H0 : σ2V = σ2
R
H1 : σ2V 6= σ2
R
where σ2V denotes the population variance for number of species per quadrat for
the Valley River, and σ2R denotes the population variance for number of species
per quadrat for the Ridge River. This test produces a p-value of 0.696 which is
larger than any reasonable significance level, so we do not reject the null hypothesis,
meaning that we have insufficient evidence to conclude that the variances for number
of species per quadrat are different for the two rivers. Thus, it seems plausible that
our assumption of equal variances is met. With the necessary assumptions met, we
proceed with the t-test and note that we obtained a test statistic of t = 1.711 on
26 degrees of freedom. The corresponding p-value of 0.099 is not significant at the
α = 0.05 significance level, so we would not reject H0, and we cannot conclude that
there is a difference between the two rivers in terms of diversity of the species.
To assess the impact of the outlier on this result, we repeat the analysis with this
outlier omitted. Doing so, our Q-Q plot for the Ridge River is given by Figure 11
and more strongly suggests normality than before. Repeating Levene’s test on this
data, we obtain the output given in Figure 12. The p-value of 0.540 again suggests
11
Figure 9: Q-Q plots for species numbers per quadrat for the Valley River (top) and Ridge
River (bottom).
Figure 10: Output for Levene’s test and t-test for two independent means.
12
Figure 11: Q-Q plot for species numbers per quadrat for the Ridge River.
that it is plausible that the assumption of equal variances is met, so the necessary
requirements of the test are still met, and we repeat our t-test for the original data,
minus the outlier. The output is given in Figure 12. With the outlier removed, the
Figure 12: Output for Levene’s test and t-test for two independent means with outlier
removed.
test statistic is given by t = 2.838 with 25 degrees of freedom, and the p-value is
0.009. This p-value is smaller than most any significance levels we might consider, so
we reject the null hypothesis, meaning that we have evidence to suggest that there
is a difference between the two rivers in terms of diversity of the species. Thus, this
outlier has a rather significant impact on the results of our analysis.
Non-parametric tests are not as adversely affected by outliers, so we might have
considered a Wilcoxon rank-sum/Mann-Whitney U test instead of a t-test. This
test has only the minimal assumption that the two rivers are independent of each
other and sampled quadrats are independent of each other for each river, and the
13
hypotheses are given by
H0 : µ̃V = µ̃R
H1 : µ̃V 6= µ̃R
where µ̃V denotes the median number of species per quadrat for the Valley River,
and µ̃R denotes the median number of species per quadrat for the Ridge River.
Output for such a test on the full data set is given in Figure 13. The p-value of
0.058 means that we would not reject the null hypothesis at the α = 0.05 significance
level, so we have insufficient evidence to suggest that there is a difference in the
median number of species per quadrat for the Valley River and the Ridge River.
Figure 13: Output for Wilcoxon rank-sum/ Mann-Whitney U test.
If we remove the outlier, we obtain the results given in Figure 14. As we would
expect, we get a smaller p-value of 0.019 when we remove the outlier, and we
would reject the null hypothesis at the α = 0.05 significance level. Thus, we would
conclude that we have sufficient evidence to suggest that there is a difference in the
median number of species per quadrat for the Valley River and the Ridge River.
Figure 14: Output for Wilcoxon rank-sum/ Mann-Whitney U test with outlier removed.
Note that, with the outlier included in the analysis, the p-value produced by the
Wilcoxon rank-sum/Mann-Whitney U test is substantially smaller than that pro-
duced by the t-test. When we remove the outlier, we observe the opposite relation-
ship, as the p-value produced by the Wilcoxon rank-sum/Mann-Whitney U test is
larger than that produced by the t-test. This highlights important aspects of para-
metric and non-parametric tests. Parametric tests are quite sensitive to departures
14
from the assumed distributions of the samples (in the case of the t-test, normal
distributions), so outliers will have a significant impact on the results and tend to
inflate the p-value. Since non-parametric tests are not tied to specific distributional
assumptions for the samples, outliers have a much less significant impact, so they
tend to inflate the p-value much less when outliers are present. At the same time,
if there are no outliers present and the samples fit the distributional assumptions of
the parametric test, that test will virtually always produce a smaller p-value than
a non-parametric tests, as the minimal distribution assumptions of non-parametric
tests will make their estimates of the p-value more conservative.
15