week 13 comparing two populations, part iipersonal.psu.edu/acq/401/course.info/week13.pdf · week...


Week 13 Comparing Two Populations, Part II




Week 13 Objectives

Coverage of the topic of comparing two populations continues with new procedures and a new sampling design. The week concludes with a lab session. In particular:

1 The rank-sum test is presented.

2 The concept of paired data is introduced, and the paired-data T-test, the signed-rank test, and McNemar’s test are described.

3 Levene’s test and the F-test for comparing two variances are presented.

4 The lab session demonstrates the R implementation of test procedures for one-sample, two-sample, and regression problems, including checking the validity of assumptions.





1 The Rank-Sum Test Procedure

2 Paired Data

3 Comparing Two Variances

4 Lab 8: Hypothesis Testing with R

Hypothesis Testing in Regression

The “t.test” Command

The “wilcox.test” Command

The “prop.test” Command for Two Proportions





Motivation

If the sample sizes are small and the populations are non-normal, the T test is not valid. The Mann-Whitney-Wilcoxon rank-sum test (or rank-sum test for short), described below, can be used with both small and large sample sizes.

If the two populations are continuous, the null distribution of the TS is known exactly even with very small sample sizes. For discrete populations, the null distribution of the TS can be well approximated with much smaller sample sizes than those required by contrast-based procedures.

The rank-sum test has high power, especially if the two population distributions are heavy-tailed or skewed.





The Null Hypothesis and the TS

The rank-sum procedure tests $H_0^F: F_1 = F_2$, i.e., that the two population distributions are identical.

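Let $R_{ij}$ denote the (mid-)rank of observation $X_{ij}$ in the combined set of $N = n_1 + n_2$ observations, and set

$$W_1 = \sum_{j=1}^{n_1} R_{1j}, \quad \bar{R}_1 = \frac{W_1}{n_1}, \quad W_2 = \sum_{j=1}^{n_2} R_{2j}, \quad \bar{R}_2 = \frac{W_2}{n_2}.$$

Then, the Mann-Whitney-Wilcoxon TS is

$$\bar{R}_1 - \bar{R}_2 = \frac{N}{n_1 n_2}\left(W_1 - n_1\frac{N+1}{2}\right), \quad \text{or simply } W_1.$$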
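A minimal R sketch of these definitions, using small hypothetical samples (invented for illustration, with one tie so that mid-ranks appear). It computes $W_1$ with rank(), whose default ties.method = "average" assigns mid-ranks, and checks numerically that $\bar{R}_1 - \bar{R}_2$ agrees with the expression in terms of $W_1$ alone.

```r
# Hypothetical samples, chosen only to illustrate the computation (one tie: 4.7).
x1 <- c(3.1, 4.7, 2.2, 5.0)
x2 <- c(1.8, 4.7, 0.9, 2.6, 3.3)
n1 <- length(x1); n2 <- length(x2); N <- n1 + n2

# rank() assigns mid-ranks to ties by default (ties.method = "average").
r  <- rank(c(x1, x2))
W1 <- sum(r[1:n1])
W2 <- sum(r[(n1 + 1):N])
Rbar1 <- W1 / n1
Rbar2 <- W2 / n2

# Difference of mean ranks, and the equivalent expression in terms of W1 alone.
diff_mean_ranks <- Rbar1 - Rbar2
via_W1 <- N / (n1 * n2) * (W1 - n1 * (N + 1) / 2)
print(c(W1 = W1, diff = diff_mean_ranks, via_W1 = via_W1))
```

Because $W_1 + W_2 = N(N+1)/2$ always, $W_2$ carries no extra information, which is why $W_1$ alone can serve as the TS.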





The Standardized Rank-Sum TS and RR

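If there are no ties,

$$Z_{H_0} = \frac{W_1 - n_1(N+1)/2}{\sqrt{n_1 n_2 (N+1)/12}}.$$

If $H_0^F$ holds, $Z_{H_0} \stackrel{\cdot}{\sim} N(0,1)$ for $n_1, n_2 > 8$. The RRs are:

$H_a$: rejection region at level $\alpha$
$\mu_1 - \mu_2 > 0$: $Z_{H_0} \ge z_{\alpha}$
$\mu_1 - \mu_2 < 0$: $Z_{H_0} \le -z_{\alpha}$
$\mu_1 - \mu_2 \ne 0$: $|Z_{H_0}| \ge z_{\alpha/2}$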





Example. Data on sputum histamine levels from 9 allergic and 13 non-allergic individuals are given in http://personal.psu.edu/acq/401/Data/HistaminData.txt. Is there a difference between the two populations? Test at α = 0.01.

Solution. Here $R_{11} = 18$, $R_{12} = 11$, $R_{13} = 22$, $R_{14} = 19$, $R_{15} = 17$, $R_{16} = 21$, $R_{17} = 7$, $R_{18} = 20$, $R_{19} = 16$. Thus

$$W_1 = \sum_j R_{1j} = 151 \quad\text{and}\quad Z_{H_0} = \frac{151 - 9(23)/2}{\sqrt{9(13)(23)/12}} = 3.17.$$

Since $n_1, n_2 > 8$, the p-value is $2[1 - \Phi(3.17)] \approx 0.0015 < 0.01$. Thus the difference is significant at α = 0.01.
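The arithmetic in this solution can be reproduced in R from the nine ranks listed above. This sketch, like the formula for $Z_{H_0}$, assumes there are no ties, and uses pnorm() for Φ.

```r
# Ranks of the 9 allergic-group observations, as listed in the example.
r1 <- c(18, 11, 22, 19, 17, 21, 7, 20, 16)
n1 <- 9; n2 <- 13; N <- n1 + n2

W1 <- sum(r1)
z  <- (W1 - n1 * (N + 1) / 2) / sqrt(n1 * n2 * (N + 1) / 12)
p_two_sided <- 2 * (1 - pnorm(abs(z)))
print(c(W1 = W1, z = round(z, 3), p = signif(p_two_sided, 3)))
```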






Effect of Outliers

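In the above example, the t test does not reject at level 0.01.

With the data in data frame hi,

t.test(hi$Level ~ hi$Sample)

gives a p-value of 0.13, and

t.test(hi$Level ~ hi$Sample, var.equal = TRUE)

gives a p-value of 0.06.

In general, using a procedure when its underlying assumptions are violated can give misleading results. Here, outliers inflate the sample variances and weaken the t test, while the rank-sum test, which uses only the ranks, is much less affected.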
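To see this effect without the histamine file, here is a small sketch with hypothetical numbers: replacing one observation by a large outlier changes the Welch t test p-value a lot, while the rank-sum p-value is unchanged, because the ranks themselves are unchanged.

```r
# Hypothetical data: two well-separated samples, then the same data with one
# large outlier in the second sample.
x     <- c(1, 2, 3, 4, 5)
y     <- c(6, 7, 8, 9, 10)
y_out <- c(6, 7, 8, 9, 100)   # 10 replaced by an outlier

# Welch t test: the outlier inflates the variance of y_out.
p_t_clean   <- t.test(x, y)$p.value
p_t_outlier <- t.test(x, y_out)$p.value

# Rank-sum test: the ranks, and hence the p-value, are unchanged.
p_w_clean   <- wilcox.test(x, y)$p.value
p_w_outlier <- wilcox.test(x, y_out)$p.value

print(round(c(t_clean = p_t_clean, t_outlier = p_t_outlier,
              w_clean = p_w_clean, w_outlier = p_w_outlier), 4))
```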





Introduction, Motivation

Paired data arise when each experimental unit receives both of the treatments being compared. For example:

1 Compare the durability of two types of tires.
2 Compare two labs for the analysis of mercury content.
3 Compare two acne treatments, two cataract treatments, etc.

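Paired data are of the form $(X_{11}, X_{21}), \ldots, (X_{1n}, X_{2n})$. CIs and the TS are again based on $\bar{X}_1 - \bar{X}_2$, but now $\bar{X}_1$ and $\bar{X}_2$ are not independent, so the previous formulas do not apply. For example,

$$\sigma^2_{\bar{X}_1 - \bar{X}_2} = \sigma^2_{\bar{X}_1} + \sigma^2_{\bar{X}_2} - 2\,\mathrm{Cov}(\bar{X}_1, \bar{X}_2).$$

Similarly, the rank-sum test, which assumes two independent samples, is no longer valid.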





The paired data T-test

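While $\mathrm{Cov}(\bar{X}_1, \bar{X}_2)$ can be estimated, it is easier to use the differences

$$D_1 = X_{11} - X_{21}, \ldots, D_n = X_{1n} - X_{2n}.$$

$D_1, \ldots, D_n$ are independent, and $\bar{D} = \bar{X}_1 - \bar{X}_2$. Thus, $\sigma^2_{\bar{X}_1 - \bar{X}_2} = \sigma^2_{\bar{D}}$ can be estimated by $\hat{\sigma}^2_{\bar{D}} = S_D^2/n$, where

$$S_D^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} D_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} D_i\right)^2\right].$$

CIs and testing are based on the fact that

$$\frac{\bar{D} - \mu_D}{S_D/\sqrt{n}} \sim T_{n-1},$$

exactly if the differences are normal, and approximately if $n \ge 30$.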
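A short R sketch of the paired-data T statistic, on hypothetical measurements for six units: it checks the shortcut formula for $S_D^2$ against var(), and the hand-computed statistic against t.test(..., paired = TRUE).

```r
# Hypothetical paired measurements on n = 6 units (illustration only).
x1 <- c(12.1, 10.4, 13.3, 9.8, 11.7, 12.9)
x2 <- c(11.5, 10.9, 12.1, 9.1, 11.0, 12.2)
d  <- x1 - x2
n  <- length(d)

# Shortcut formula for S_D^2 from the slide; agrees with var(d).
s2_shortcut <- (sum(d^2) - sum(d)^2 / n) / (n - 1)

# Paired T statistic for H0: mu_D = 0, and the same value from t.test().
t_manual <- mean(d) / sqrt(s2_shortcut / n)
t_r <- unname(t.test(x1, x2, paired = TRUE)$statistic)
print(c(s2_shortcut = s2_shortcut, var_d = var(d), t_manual = t_manual, t_r = t_r))
```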





Example. A total of 12 water samples are analyzed for mercury content by labs A and B. The paired data yield $\bar{D} = \bar{X}_1 - \bar{X}_2 = -0.0167$ and $S_D = 0.02645$. Does lab B give, on average, higher concentration results than lab A? Test at α = 0.05.

Solution. Here $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 < 0$. Because $n < 30$, we must assume normality. Doing so, we have

$$T_{H_0} = \frac{\bar{D}}{S_D/\sqrt{n}} = \frac{-0.0167}{0.02645/\sqrt{12}} = -2.1865.$$

Since $T_{H_0} < -t_{0.05,11} = -1.796$, $H_0$ is rejected.
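The same test can be run in R as a one-sample T test on the differences. This sketch uses the 12 differences listed for these data in the signed-rank example; their mean and standard deviation agree with $\bar{D}$ and $S_D$ above up to rounding.

```r
# The 12 differences D_i = X_1i - X_2i (lab A minus lab B) from the signed-rank example.
d <- c(-0.0206, -0.0350, -0.0161, -0.0017, 0.0064, -0.0219,
       -0.0250, -0.0279, -0.0232, -0.0655, 0.0461, -0.0159)

res  <- t.test(d, mu = 0, alternative = "less")   # one-sample T test on the differences
crit <- -qt(0.95, df = length(d) - 1)             # -t_{0.05,11}
print(c(mean_d = mean(d), sd_d = sd(d), T = unname(res$statistic),
        crit = crit, p = res$p.value))
```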





It is important to be able to recognize paired data. For example, a study was conducted to see whether two cars, A and B, having very different wheel bases and turning radii, took the same time to parallel park. Seven drivers were randomly obtained, and the time required for each of them to parallel park each of the two cars was measured. The results are as follows:

Driver   1     2     3     4     5     6     7
Car A    19.0  21.8  16.8  24.2  22.0  34.7  23.8
Car B    17.8  20.2  16.2  41.4  21.4  28.4  22.7

Since each driver parks both cars, the two sets of times are paired by driver rather than independent.
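Because the times are paired by driver, a sketch of the analysis in R works with per-driver differences (paired = TRUE) rather than two independent samples. Setting exact = FALSE is an assumption made here so that wilcox.test uses the normal approximation and does not warn about possible ties among the differences. Note driver 4's difference of -17.2, far from the others.

```r
# Parking times from the table above, in driver order.
car_a <- c(19.0, 21.8, 16.8, 24.2, 22.0, 34.7, 23.8)
car_b <- c(17.8, 20.2, 16.2, 41.4, 21.4, 28.4, 22.7)

d <- car_a - car_b                    # one difference per driver
paired_res      <- t.test(car_a, car_b, paired = TRUE)
signed_rank_res <- wilcox.test(car_a, car_b, paired = TRUE, exact = FALSE)
print(d)
print(c(t_p = paired_res$p.value, signed_rank_p = signed_rank_res$p.value))
```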





The Signed-Rank test

1 Rank the absolute differences $|D_1|, \ldots, |D_n|$ from smallest to largest. Let $R_i$ denote the rank of $|D_i|$.

2 Assign to $R_i$ the sign of $D_i$, thus forming signed ranks.

3 Let $S_+$ be the sum of the ranks $R_i$ with positive sign, i.e., the sum of the positive signed ranks.

If $H_0$ holds, $\mu_{S_+} = \frac{n(n+1)}{4}$ and $\sigma^2_{S_+} = \frac{n(n+1)(2n+1)}{24}$.

If $H_0$ holds and $n > 10$, $S_+ \stackrel{\cdot}{\sim} N(\mu_{S_+}, \sigma^2_{S_+})$. The TS for testing $H_0: \mu_D = 0$ is

$$Z_{H_0} = \left(S_+ - \frac{n(n+1)}{4}\right) \Big/ \sqrt{\frac{n(n+1)(2n+1)}{24}}.$$

The RRs are the usual RRs of a Z-test.
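The null mean and variance of $S_+$ can be checked by brute force. Under $H_0$ each rank is equally likely to carry a + or a - sign, independently, so enumerating all $2^n$ sign patterns gives the exact null distribution. A sketch for n = 5 (an arbitrary small choice):

```r
# Under H0 each rank 1..n carries a + or - sign with probability 1/2, independently.
n <- 5
# Each row of `signs` is one of the 2^n equally likely sign patterns (1 = positive).
signs  <- as.matrix(expand.grid(rep(list(c(0, 1)), n)))
s_plus <- as.vector(signs %*% seq_len(n))   # S+ for every pattern

null_mean <- mean(s_plus)
null_var  <- mean((s_plus - null_mean)^2)   # population variance over all patterns
print(c(null_mean = null_mean, formula_mean = n * (n + 1) / 4,
        null_var = null_var, formula_var = n * (n + 1) * (2 * n + 1) / 24))
```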




Example (Mercury concentrations from Labs A and B)

The 12 differences $D_i$ and the ranks $R_i$ of their absolute values are given in the table below. Test $H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 < 0$ at α = 0.05.

$D_i$   -0.0206  -0.0350  -0.0161  -0.0017   0.0064  -0.0219
$R_i$         5       10        4        1        2        6
$D_i$   -0.0250  -0.0279  -0.0232  -0.0655   0.0461  -0.0159
$R_i$         8        9        7       12       11        3

Solution: Here $S_+ = 2 + 11 = 13$. With $n = 12$, $\mu_{S_+} = 12(13)/4 = 39$ and $\sigma^2_{S_+} = 12(13)(25)/24 = 162.5$. Thus

$$Z_{H_0} = \frac{13 - 39}{\sqrt{162.5}} = -2.04,$$

with p-value $= \Phi(-2.04) = 0.0207$.

Setting the differences in the object d, e.g., d = c(-0.0206, -0.0350, -0.0161, -0.0017, 0.0064, -0.0219, -0.0250, -0.0279, -0.0232, -0.0655, 0.0461, -0.0159), the command wilcox.test(d, alternative = "less") returns a p-value of 0.0212.
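A sketch that reproduces the hand computation in R: $S_+$ is computed from rank(abs(d)), followed by the normal-approximation p-value, and compared with wilcox.test. Its statistic V is the sum of the positive signed ranks, and its p-value here is exact (n < 50, no ties), which is why it differs slightly from 0.0207.

```r
# Differences from the example table (lab A minus lab B).
d <- c(-0.0206, -0.0350, -0.0161, -0.0017, 0.0064, -0.0219,
       -0.0250, -0.0279, -0.0232, -0.0655, 0.0461, -0.0159)
n <- length(d)

r <- rank(abs(d))              # ranks of |D_i|
s_plus <- sum(r[d > 0])        # sum of the positive signed ranks
z <- (s_plus - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
p_normal <- pnorm(z)           # lower-tail p-value for Ha: mu_D < 0

res <- wilcox.test(d, alternative = "less")   # exact p-value (no ties, n < 50)
print(c(S_plus = s_plus, z = z, p_normal = p_normal, p_exact = res$p.value))
```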





Two Proportions with paired data

Here each pair $(X_{1j}, X_{2j})$ can be either (1,1), (1,0), (0,1), or (0,0).

As an example, if n voters are asked, both before and after a presidential speech, whether or not they support a certain policy, then $X_{1j} = 1$ if the jth voter supports the policy before the speech and 0 otherwise, and $X_{2j} = 1$ if the same voter supports it after the speech and 0 otherwise.

Typically, however, the pairs $(X_{1j}, X_{2j})$ are not given. Instead, the data are presented in the following table format.





                 After
                 1     0
Before    1      Y1    Y2
          0      Y3    Y4

$Y_1$ is the number of (1,1) pairs, $Y_2$ is the number of (1,0) pairs, $Y_3$ is the number of (0,1) pairs, and $Y_4$ is the number of (0,0) pairs, with $Y_1 + \cdots + Y_4 = n$.





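A variation of the T statistic, used only for testing $H_0: p_1 - p_2 = 0$, is the McNemar test statistic:

$$MN = \frac{Y_2 - Y_3}{\sqrt{Y_2 + Y_3}}.$$

This is referred to N(0,1), so the RR for $H_a: p_1 > p_2$ is $MN > z_\alpha$; similarly for the other alternatives.

R's mcnemar.test uses the square of MN and refers it to a $\chi^2_1$ distribution (the statistic equals $MN^2$ only with correct = FALSE; by default, mcnemar.test applies a continuity correction). In this form only $H_a: p_1 \ne p_2$ can be tested, with p-value 1-pchisq(MN^2, 1).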





Example (McNemar’s test). Data on approval of the President’s performance in office in two surveys, one month apart, of 1600 voting-age Americans give $Y_1 = 794$, $Y_2 = 150$, $Y_3 = 86$, $Y_4 = 570$. Is there evidence, at α = 0.05, of a shift in public opinion? Report the p-value.

Solution. Here, $MN = (150 - 86)/\sqrt{150 + 86} = 4.166$. Since $|MN| > z_{0.025} = 1.96$, we conclude that there is evidence of a shift in public opinion. The R command 2*(1-pnorm(4.166)) returns a p-value of 3.10e-05.
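A sketch in R: the counts are arranged as the 2×2 before/after table described earlier, MN is computed directly, and mcnemar.test with correct = FALSE (no continuity correction) returns $MN^2$ and the same two-sided p-value.

```r
# 2x2 table: rows = before (1, 0), columns = after (1, 0); filled column-wise.
tab <- matrix(c(794, 86, 150, 570), nrow = 2,
              dimnames = list(Before = c("1", "0"), After = c("1", "0")))

y2 <- tab["1", "0"]   # (1,0) pairs: approve before, not after
y3 <- tab["0", "1"]   # (0,1) pairs: not before, approve after
mn <- (y2 - y3) / sqrt(y2 + y3)
p_two_sided <- 2 * (1 - pnorm(abs(mn)))

# Built-in chi-square form; correct = FALSE so the statistic equals MN^2.
res <- mcnemar.test(tab, correct = FALSE)
print(c(MN = mn, MN_sq = mn^2, chisq = unname(res$statistic),
        p_z = p_two_sided, p_chisq = res$p.value))
```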





Levene’s Test

It is based on the idea that if the variances are equal, the absolute deviations from the sample medians,

$$V_{1j} = |X_{1j} - \tilde{X}_1|, \; j = 1, \ldots, n_1, \quad\text{and}\quad V_{2j} = |X_{2j} - \tilde{X}_2|, \; j = 1, \ldots, n_2,$$

where $\tilde{X}_i$, $i = 1, 2$, is the median of the ith sample, come from populations with equal means and variances. Thus, equality of variances can be tested by testing the hypothesis

$$H_0: \mu_{V_1} = \mu_{V_2} \quad\text{vs}\quad H_a: \mu_{V_1} \ne \mu_{V_2}$$

using the two-sample t-test with pooled variance.





Example

The plasma vitamin C concentrations (µmol/l) of five randomly selected smokers and five randomly selected nonsmokers are:

Nonsmokers   41.48  41.71  41.98  41.68  41.18   s1 = 0.297
Smokers      40.42  40.68  40.51  40.73  40.91   s2 = 0.192

Test $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_a: \sigma_1^2 \ne \sigma_2^2$ at α = 0.05.

Solution. Here $\tilde{X}_1 = 41.68$ and $\tilde{X}_2 = 40.68$. Thus,

V1 values for Nonsmokers   0.20  0.03  0.30  0.00  0.50
V2 values for Smokers      0.26  0.00  0.17  0.05  0.23

The R commands x = c(0.20, 0.03, 0.30, 0.00, 0.50); y = c(0.26, 0.00, 0.17, 0.05, 0.23); t.test(x, y, var.equal = TRUE) give a p-value of 0.558. Thus, $H_0$ is not rejected.
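The V values above can also be computed from the raw concentrations rather than by hand. A minimal sketch of this median-based Levene test in R:

```r
# Raw vitamin C concentrations from the example.
nonsmokers <- c(41.48, 41.71, 41.98, 41.68, 41.18)
smokers    <- c(40.42, 40.68, 40.51, 40.73, 40.91)

# Absolute deviations from each sample's median.
v1 <- abs(nonsmokers - median(nonsmokers))
v2 <- abs(smokers - median(smokers))

# Pooled-variance two-sample t test on the deviations.
res <- t.test(v1, v2, var.equal = TRUE)
print(round(v1, 2)); print(round(v2, 2))
print(res$p.value)
```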





The F Test Under Normality

When the two samples have been drawn from normal populations, the exact distribution of $S_1^2/S_2^2$ is a multiple of an F distribution.

Theorem. Let $X_{11}, \ldots, X_{1n_1}$ be a random sample from a normal distribution with variance $\sigma_1^2$, let $X_{21}, \ldots, X_{2n_2}$ be another, independent sample from a normal distribution with variance $\sigma_2^2$, and let $S_1^2$ and $S_2^2$ denote the two sample variances. Then the rv

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$

has an F distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.
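A quick Monte Carlo check of the theorem, as a sketch: simulate many pairs of independent normal samples and compare the proportion of simulated F ratios above the $F_{4,4}$ 95th percentile with 0.05. The sample sizes, standard deviations, seed, and number of replications are arbitrary choices.

```r
set.seed(401)            # arbitrary seed, for reproducibility
n1 <- 5; n2 <- 5
sigma1 <- 3; sigma2 <- 1 # unequal variances on purpose; dividing by them removes the scale
reps <- 20000

f_stat <- replicate(reps, {
  s1 <- var(rnorm(n1, sd = sigma1))
  s2 <- var(rnorm(n2, sd = sigma2))
  (s1 / sigma1^2) / (s2 / sigma2^2)
})

# Compare the simulated upper-5% tail with the F(n1 - 1, n2 - 1) quantile.
crit <- qf(0.95, n1 - 1, n2 - 1)
tail_prop <- mean(f_stat > crit)
print(c(crit = crit, tail_prop = tail_prop))
```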





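The test statistic for $H_0: \sigma_1^2 = \sigma_2^2$ is:

$$F_{H_0} = \frac{S_1^2}{S_2^2}.$$

If the ratio differs sufficiently from 1, the null hypothesis is rejected. In particular, the RRs for testing $H_0: \sigma_1^2 = \sigma_2^2$ are

$H_a$: RR at level $\alpha$
$\sigma_1^2 > \sigma_2^2$: $F_{H_0} > F_{n_1-1, n_2-1; \alpha}$
$\sigma_1^2 < \sigma_2^2$: $F_{H_0}^{-1} > F_{n_2-1, n_1-1; \alpha}$
$\sigma_1^2 \ne \sigma_2^2$: either $F_{H_0} > F_{n_1-1, n_2-1; \alpha/2}$ or $F_{H_0}^{-1} > F_{n_2-1, n_1-1; \alpha/2}$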





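Example. Consider the data in the previous example, and assume the underlying populations are normal. The test statistic is

$$F_{H_0} = \frac{0.297^2}{0.192^2} = 2.40.$$

Since $n_1 = n_2 = 5$, the degrees of freedom are $\nu_1 = \nu_2 = 4$. By the formula for the p-value in p. 333, the p-value, found with 2*(1-pf(2.4, 4, 4)), is approximately 0.42, so $H_0$ is not rejected at α = 0.05.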
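In R, var.test() carries out this F test directly from the raw data. A sketch using the vitamin C data from the Levene example; with $n_1 = n_2 = 5$ the degrees of freedom are (4, 4), and the two-sided p-value comes out at about 0.42.

```r
nonsmokers <- c(41.48, 41.71, 41.98, 41.68, 41.18)
smokers    <- c(40.42, 40.68, 40.51, 40.73, 40.91)

f_h0 <- var(nonsmokers) / var(smokers)
df1 <- length(nonsmokers) - 1; df2 <- length(smokers) - 1
# Two-sided p-value; since f_h0 > 1, the upper tail is the smaller one.
p_manual <- 2 * (1 - pf(f_h0, df1, df2))

res <- var.test(nonsmokers, smokers)   # F test for equal variances
print(c(F = f_h0, p_manual = p_manual, p_var_test = res$p.value))
```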





Lab 8: Hypothesis Testing with R

Hypothesis Testing in Regression












The R commands

• If y and x contain the values of the response and the predictor, the basic commands for testing in regression are:

out = lm(y ~ x); summary(out); summary(aov(out))

summary(out) gives the estimated regression coefficients and their standard errors, the p-values for testing that each coefficient is zero, R², and also the F-test statistic and p-value for the model utility test. summary(aov(out)) gives the ANOVA table.






Illustration with Simulated Data

e = rnorm(50, 0, 5); x = runif(50, 0, 10); y = 25 - 3.4*x + e; out = lm(y ~ x)

For the data generated, the summary(out) output includes

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  25.9198     1.2812   20.23   <2e-16
x            -3.6963     0.2365  -15.63   <2e-16

Residual standard error: 5.236 on 48 degrees of freedom
Multiple R-squared: 0.8358, Adjusted R-squared: 0.8323
F-statistic: 244.2 on 1 and 48 DF, p-value: < 2.2e-16






Moreover, the summary(aov(out)) output includes

          Df Sum Sq Mean Sq F value Pr(>F)
x          1   6697    6697   244.2 <2e-16
Residuals 48   1316      27

The standard errors of the coefficients in the summary(out) output can be used for computing T statistics for other hypotheses regarding them.

For example, the T statistic for testing $H_0: \beta_1 = -3.4$ vs $H_a: \beta_1 \ne -3.4$ is

$$T_{H_0} = \frac{-3.6963 + 3.4}{0.2365} = -1.253,$$

with corresponding p-value $2(1 - G_{48}(1.253)) = 0.216$, where $G_{48}$ is the cdf of the $T_{48}$ distribution (in R, 2*(1-pt(1.253, 48))).

qqnorm(resid(out)); qqline(resid(out), col = 2) can be used to check the normality assumption.
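The same T statistic can be computed programmatically from coef(summary(out)), which returns the coefficient table as a matrix. A sketch with a fresh simulation; the seed is arbitrary, so the numbers will differ from the output shown above.

```r
set.seed(13)   # arbitrary seed; results differ from the slide's run
e <- rnorm(50, 0, 5); x <- runif(50, 0, 10); y <- 25 - 3.4 * x + e
out <- lm(y ~ x)

coefs  <- coef(summary(out))            # columns: Estimate, Std. Error, t value, Pr(>|t|)
b1     <- coefs["x", "Estimate"]
se_b1  <- coefs["x", "Std. Error"]
df_res <- df.residual(out)              # n - 2 = 48

# T statistic and two-sided p-value for H0: beta1 = -3.4.
beta1_0 <- -3.4
t_h0 <- (b1 - beta1_0) / se_b1
p_h0 <- 2 * (1 - pt(abs(t_h0), df_res))
print(c(estimate = b1, se = se_b1, T = t_h0, p = p_h0))
```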







The “t.test” Command












T-tests and T-intervals for one mean

Let x contain the data set. By default, the command t.test(x), which is equivalent to

t.test(x, mu = 0, alternative = "two.sided", conf.level = 0.95)

gives the t-statistic, the df, the p-value for testing $H_0: \mu = 0$ against the two-sided alternative, the 95% CI for µ, and the sample mean $\bar{X}$. To test $H_0: \mu = 8.5$, replace mu = 0 by mu = 8.5. For one-sided alternatives, use alternative = "less" or alternative = "greater". Note, however, that the CIs are then one-sided.
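A sketch checking, on hypothetical data, that t.test() reports the usual one-sample quantities: $T = (\bar{X} - \mu_0)/(S/\sqrt{n})$, its two-sided p-value, and the 95% T interval.

```r
# Hypothetical data, for illustration only.
x <- c(8.1, 9.4, 7.7, 10.2, 8.8, 9.9, 8.4, 9.1)
n <- length(x)
mu0 <- 8.5

t_manual  <- (mean(x) - mu0) / (sd(x) / sqrt(n))
p_manual  <- 2 * (1 - pt(abs(t_manual), n - 1))          # two-sided
ci_manual <- mean(x) + c(-1, 1) * qt(0.975, n - 1) * sd(x) / sqrt(n)

res <- t.test(x, mu = mu0)
print(rbind(manual = c(t_manual, p_manual, ci_manual),
            t.test = c(res$statistic, res$p.value, res$conf.int)))
```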






Example. Is there evidence that the average level of radiation is higher than the federal health standard of 10 W/cm²? Use the data in ExRadiationTestData.txt to test at α = 0.05. Also, report the p-value, and construct a 95% CI.

Solution. Reading the data set into the R object x, the command t.test(x, mu = 10, alternative = "greater") returns a p-value of 0.074. Thus, $H_0: \mu = 10$ cannot be rejected in favor of $H_a: \mu > 10$ at α = 0.05. Next, use t.test(x, mu = 10, alternative = "two.sided") to get a 95% CI of (9.773, 11.425).






Power and sample size calculations for H0 : µ = µ0

• First, install the package pwr using the command install.packages("pwr").

Then issue the command library(pwr) to load the package in the current R session.

The command for computing the power at a given $\mu_a$ with a given n, α and S value, for $H_a: \mu > \mu_0$, is

pwr.t.test(n, (µa − µ0)/S, α, power = NULL, "one.sample", "greater")

For $H_a: \mu < \mu_0$ and $H_a: \mu \ne \mu_0$, replace "greater" by "less" and "two.sided", respectively.






Example. For the testing problem $H_0: \mu = 10$ vs $H_a: \mu > 10$ with the ExRadiationTestData.txt data set, find the power at $\mu_a = 11$.

Solution. The commands length(x); sd(x) return n = 25 and S = 2.00 for this data set. The R command pwr.t.test(25, (11-10)/2.00, 0.05, power = NULL, "one.sample", "greater") returns a power of 0.78.

NOTE: Treating S as the true σ, the command 1-pnorm((10-11)/(2.00/sqrt(25)) + qnorm(0.95)) returns a power of 0.80, according to the formula in the teaching slides.
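If pwr is not installed, base R's power.t.test() (package stats) computes the same power for a one-sided, one-sample T test via the noncentral t distribution. A sketch with n = 25, S = 2.00, $\mu_0 = 10$, $\mu_a = 11$, which should give about 0.78, alongside the normal-approximation value of about 0.80 from the NOTE:

```r
n <- 25; S <- 2.00; mu0 <- 10; mua <- 11; alpha <- 0.05

# Base R: power of the one-sample, one-sided T test.
pw <- power.t.test(n = n, delta = mua - mu0, sd = S, sig.level = alpha,
                   type = "one.sample", alternative = "one.sided")$power

# Same quantity from the noncentral t distribution directly.
nu  <- n - 1
ncp <- sqrt(n) * (mua - mu0) / S
pw_nct <- 1 - pt(qt(1 - alpha, nu), nu, ncp = ncp)

# Normal-approximation power, treating S as the true sigma.
pw_z <- 1 - pnorm((mu0 - mua) / (S / sqrt(n)) + qnorm(1 - alpha))
print(round(c(power.t.test = pw, noncentral_t = pw_nct, z_approx = pw_z), 3))
```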






The command for computing the sample size needed to achieve a certain level of power at $\mu_a$ with a given α and S value, for $H_a: \mu > \mu_0$, is

pwr.t.test(n = NULL, (µa − µ0)/S, α, power(µa), "one.sample", "greater")

For $H_a: \mu < \mu_0$ and $H_a: \mu \ne \mu_0$, replace "greater" by "less" and "two.sided", respectively.






Example. For the testing problem $H_0: \mu = 10$ vs $H_a: \mu > 10$ with the ExRadiationTestData.txt data set, find the sample size needed to achieve power of 0.9 at $\mu_a = 11$.

Solution. The R command pwr.t.test(n = NULL, (11-10)/2.00, 0.05, 0.9, "one.sample", "greater") returns a sample size of 35.65, which is rounded up to 36.

NOTE: Treating S as the true σ, the command (2.00*(qnorm(.95)+qnorm(.9))/(10-11))^2 returns a sample size of 34.26, which is rounded up to 35, according to the formula in the teaching slides.
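Similarly, base R's power.t.test() solves for n when n is left unspecified. A sketch for the same problem; it should return about 35.65 (rounded up to 36), next to the normal-approximation value of about 34.26 (rounded up to 35):

```r
S <- 2.00; delta <- 11 - 10; alpha <- 0.05; target <- 0.9

# Base R solves for n when n is left unspecified (NULL).
n_t <- power.t.test(delta = delta, sd = S, sig.level = alpha, power = target,
                    type = "one.sample", alternative = "one.sided")$n

# Normal-approximation formula, treating S as the true sigma.
n_z <- (S * (qnorm(1 - alpha) + qnorm(target)) / delta)^2

print(c(n_t = n_t, n_t_rounded = ceiling(n_t), n_z = n_z, n_z_rounded = ceiling(n_z)))
```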






Two independent samples

The “t.test” command can also be used for comparing two means, both with independent and with paired data. The two samples can be in two separate columns (i.e., x and y), or combined in one column, say y, with a separate column, say x, indicating the sample membership of each observation. The default is to treat the two samples as independent, construct a 95% CI, and give the p-value for $H_a: \mu_1 - \mu_2 \ne 0$, without assuming $\sigma_1 = \sigma_2$. The command with these default options is:

t.test(x, y) # One sample in x, the other in y

t.test(y ~ x) # For values in y and sample index in x






For the pooled-variance T test and a 99% CI, do:

t.test(y ~ x, var.equal = TRUE, conf.level = 0.99)

and similarly if the two samples are in separate columns. To test a different null hypothesis, e.g., $H_0: \mu_1 - \mu_2 = 1.8$ vs $H_a: \mu_1 - \mu_2 < 1.8$, do:

t.test(y ~ x, mu = 1.8, alternative = "less")

and similarly if the two samples are in separate columns. Other options are alternative = "greater", or the default "two.sided".






Example. Use the R data set airquality to compare the ozone levels in May and August. Report the p-value, test at 0.05, and construct a 95% CI for $\mu_1 - \mu_2$, with and without the assumption of equal variances. [NOTE: Normality is violated; check with boxplot(Ozone ~ Month, data = airquality).]

Solution: Use:

y1 = airquality$Ozone; x1 = airquality$Month
x = y1[which(x1 == 5)]; y = y1[which(x1 == 8)]; t.test(x, y); t.test(x, y, var.equal = TRUE)

More advanced application (∗):

t.test(Ozone ~ Month, data = airquality, subset = Month %in% c(5, 8))
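A sketch confirming that the two-column form and the formula form give the same Welch test on airquality. The data set ships with R and has missing Ozone values; both forms drop them before testing.

```r
# airquality ships with base R (package datasets); Ozone contains NAs.
may <- airquality$Ozone[airquality$Month == 5]
aug <- airquality$Ozone[airquality$Month == 8]

welch_cols    <- t.test(may, aug)                          # two separate columns
welch_formula <- t.test(Ozone ~ Month, data = airquality,
                        subset = Month %in% c(5, 8))      # one column + group index
pooled        <- t.test(may, aug, var.equal = TRUE)

print(c(welch_cols = welch_cols$p.value, welch_formula = welch_formula$p.value,
        pooled = pooled$p.value))
```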






Paired Data

The basic command for testing and CI construction with paired data is

t.test(y ~ x, paired = TRUE)

and similarly if the two samples are in different columns, i.e., t.test(x, y, paired = TRUE). (Recent versions of R reject paired = TRUE in the formula interface; if so, use the two-column form.) Other options can be added as before. For example,

t.test(x, y, alternative = "less", mu = 1.8, paired = TRUE, conf.level = 0.9)

where alternative can also be "greater" or the default "two.sided". With paired data, equality of the two marginal variances is a non-issue, so you never need to use var.equal = TRUE.
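A paired T test is a one-sample T test on the differences. A sketch with R's built-in sleep data, which records the extra sleep of the same 10 patients under two drugs, stored in the same patient order for both groups:

```r
# sleep ships with base R: 'extra' for 10 patients under group "1" and group "2",
# with the same patient order (ID 1..10) in both groups.
g1 <- sleep$extra[sleep$group == "1"]
g2 <- sleep$extra[sleep$group == "2"]

paired  <- t.test(g1, g2, paired = TRUE)
one_smp <- t.test(g1 - g2)        # one-sample T test on the differences

print(c(paired_T = unname(paired$statistic), diff_T = unname(one_smp$statistic),
        paired_p = paired$p.value, diff_p = one_smp$p.value))
```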






Example. Two brands of motorcycle tires are to be compared for durability. Eight motorcycles are selected at random, and one tire from each brand is randomly assigned (front or back) on each motorcycle. The motorcycles are then run until the tires wear out. The data in motorcycleTiresLifetimes.txt are in km. Use the paired T-test procedure to test the hypothesis of equal average durability at level α = 0.05, and to construct a 90% CI for $\mu_1 - \mu_2$.

Solution: Read the data in tl and use:

x = tl$Brand1; y = tl$Brand2; t.test(x, y, paired = TRUE, conf.level = 0.9) # set x and y, then run the test and construct the CI







The “wilcox.test” Command












The Rank Sum Test

The “wilcox.test” command can be used to conduct both the rank-sum test and the signed-rank test. Again, the two samples can be in two separate columns, or combined in one column with a separate column indicating the sample membership of each observation. The default is to treat the two samples as independent, and give the p-value for testing equality of the two populations against the two-sided alternative, without constructing a CI:

wilcox.test(x, y) # One sample in x, the other in y

wilcox.test(y ~ x) # For values in y and sample index in x






To get a CI for the location difference, use:

wilcox.test(y ~ x, conf.int = TRUE, conf.level = 0.9)

[The description of this CI is not in the book.] To test different null and alternative hypotheses, use, for example:

wilcox.test(y ~ x, mu = 1.8, alternative = "less")

with alternative = "greater" for the other one-sided alternative. Similarly if the two samples are in different columns.
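Since the book does not describe this CI, here is a brief sketch of what wilcox.test reports. With no ties and small samples (the exact method), the point estimate labelled "difference in location" is the median of all pairwise differences $x_i - y_j$ (the Hodges-Lehmann estimate), and the interval endpoints are order statistics of those differences. The data below are hypothetical.

```r
# Hypothetical samples with no ties, so the exact method is used.
x <- c(1.1, 2.3, 3.8, 4.4, 5.9)
y <- c(0.2, 1.5, 2.9, 3.1)

res <- wilcox.test(x, y, conf.int = TRUE, conf.level = 0.9)

# Point estimate: median of all n1 * n2 pairwise differences x_i - y_j.
hl <- median(outer(x, y, "-"))
print(c(estimate = unname(res$estimate), hodges_lehmann = hl))
print(res$conf.int)
```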






Example. Use the R data set airquality to compare the ozone levels in May and August. [Check the data set with boxplot(Ozone ~ Month, data = airquality).]

Solution: Use

y1 = airquality$Ozone; x1 = airquality$Month

x = y1[which(x1 == 5)]; y = y1[which(x1 == 8)]; wilcox.test(x, y, conf.int = TRUE)

More advanced application (∗):

wilcox.test(Ozone ~ Month, data = airquality, subset = Month %in% c(5, 8))






Rank sum for paired data (Signed-Rank Test)

The basic command for the signed-rank test with paired data (without constructing a CI) is:

wilcox.test(x, y, paired = TRUE) # One sample in x, the other in y

wilcox.test(y ~ x, paired = TRUE) # For values in y and sample index in x

(Recent versions of R reject paired = TRUE in the formula interface; if so, use the two-column form.) Other options can be added as before. For example,

wilcox.test(x, y, alternative = "less", mu = 1.8, paired = TRUE, conf.int = TRUE, conf.level = 0.9)

with alternative = "greater" for the other one-sided alternative.
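The paired signed-rank test is the one-sample signed-rank test applied to the differences. A sketch with hypothetical paired data, chosen so that the differences have no ties or zeros (so exact p-values are used):

```r
# Hypothetical paired data; the differences x - y have no ties and no zeros.
x <- c(10.2, 9.8, 11.5, 12.1, 8.7, 11.0, 11.3)
y <- c(9.1, 10.1, 10.2, 10.5, 8.9, 9.6, 10.6)

paired   <- wilcox.test(x, y, paired = TRUE)
on_diffs <- wilcox.test(x - y)          # signed-rank test on the differences

print(c(V_paired = unname(paired$statistic), V_diffs = unname(on_diffs$statistic),
        p_paired = paired$p.value, p_diffs = on_diffs$p.value))
```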






Example. Two brands of motorcycle tires are to be compared for durability. Eight motorcycles are selected at random, and one tire from each brand is randomly assigned (front or back) on each motorcycle. The motorcycles are then run until the tires wear out. The data in http://personal.psu.edu/acq/401/Data/motorcycleTiresLifetimes.txt are in km. Use the signed-rank test procedure to test the hypothesis of equal durability at level α = 0.05, and to construct a 90% CI for the location difference.

Solution: Read the data in tl and use:

x = tl$Brand1; y = tl$Brand2; wilcox.test(x, y, paired = TRUE, conf.int = TRUE, conf.level = 0.9) # set x and y, then run the test and construct the CI








The “prop.test” Command for Two Proportions












Set the number of successes and the number of trials in x and n. For example, use

x=c(16,14); n=c(200,400)

if $X_1 = 16$, $X_2 = 14$, $n_1 = 200$, $n_2 = 400$. To test $H_0: p_1 - p_2 = 0$ vs the two-sided alternative, and construct a 95% CI for $p_1 - p_2$, use prop.test(x, n), or, equivalently:

prop.test(x, n, alternative = "two.sided", conf.level = 0.95)

Other alternative options are "less" or "greater". There is no option for testing other null hypotheses, e.g., $H_0: p_1 - p_2 = 0.1$.






Example. An article in Knee Surgery, Sports Traumatology, Arthroscopy (2005), Vol. 13, 273-279, reported results of arthroscopic meniscal repair with an absorbable screw. For tears greater than 25 millimeters, 10 of 18 repairs were successful, while for tears less than 25 millimeters, 22 of 30 were successful. Is there evidence that the success rates for the two types of tears are different? Test at α = 0.1, report the p-value, and construct a 90% confidence interval for $p_1 - p_2$.

Solution: Use

x=c(10,22); n=c(18,30); prop.test(x, n, conf.level = 0.9)
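A sketch of what prop.test computes for this example. By default prop.test applies a continuity correction (correct = TRUE), so the command above gives a somewhat larger p-value than the uncorrected version. With correct = FALSE, its X-squared statistic equals the square of the pooled two-proportion Z statistic.

```r
x <- c(10, 22); n <- c(18, 30)

# Turn off the default continuity correction to compare with the
# uncorrected pooled two-proportion Z statistic.
res <- prop.test(x, n, conf.level = 0.9, correct = FALSE)

p_hat  <- x / n
p_pool <- sum(x) / sum(n)
z <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * (1 / n[1] + 1 / n[2]))

print(c(z = z, z_sq = z^2, X_squared = unname(res$statistic), p = res$p.value))
```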

