Week 13: Comparing Two Populations, Part II
Source: personal.psu.edu/acq/401/course.info/week13.pdf


Post on 23-Jul-2020


Week 13 Objectives

Coverage of the topic of comparing two populations continues with new procedures and a new sampling design. The week concludes with a lab session. In particular:

1 The rank-sum test is presented.

2 The concept of paired data is introduced, and the paired-data T-test, the signed-rank test, and McNemar’s test are described.

3 Levene’s test and the F-test for comparing two variances are presented.

4 The lab session demonstrates the R implementation of test procedures for one-sample, two-sample, and regression problems, including checking the validity of assumptions.


1 The Rank-Sum Test Procedure

2 Paired Data

3 Comparing Two Variances

4 Lab 8: Hypothesis Testing with R

Hypothesis Testing in Regression

The “t.test” Command

The “wilcox.test” Command

The “prop.test” Command for Two Proportions


The Rank-Sum Test Procedure: Motivation

If the sample sizes are small and the populations are non-normal, the T test is not valid.

The Mann-Whitney-Wilcoxon rank-sum test (or rank-sum test for short), described next, can be used with both small and large sample sizes.

If the two populations are continuous, the null distribution of the test statistic (TS) is known even with very small sample sizes.

For discrete populations, the null distribution of the TS can be well approximated with much smaller sample sizes than those required by contrast-based procedures.

The rank-sum test has high power, especially if the two population distributions are heavy-tailed or skewed.


The Null Hypothesis and the TS

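The rank-sum procedure tests $H_0^F: F_1 = F_2$.

Let $R_{ij}$ denote the (mid-)rank of observation $X_{ij}$ in the combined set of $N = n_1 + n_2$ observations, and set

$$W_1 = \sum_{j=1}^{n_1} R_{1j}, \quad \bar{R}_1 = \frac{W_1}{n_1}, \quad W_2 = \sum_{j=1}^{n_2} R_{2j}, \quad \bar{R}_2 = \frac{W_2}{n_2}.$$

Then the Mann-Whitney-Wilcoxon TS is

$$\bar{R}_1 - \bar{R}_2 = \frac{N}{n_1 n_2}\left(W_1 - n_1\frac{N+1}{2}\right), \quad \text{or simply } W_1.$$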
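The rank sums and the identity between the two forms of the TS can be checked numerically. The following is a minimal sketch in base R; the two samples are hypothetical values chosen only for illustration and do not come from the slides.

```r
# Hypothetical data, for illustration only (not from the slides)
x1 <- c(12.1, 15.3, 9.8, 20.4, 17.6)
x2 <- c(8.2, 11.0, 9.1, 13.5, 7.7, 10.2)

n1 <- length(x1)
n2 <- length(x2)
N  <- n1 + n2

# (Mid-)ranks of all observations in the combined sample
r  <- rank(c(x1, x2))
W1 <- sum(r[1:n1])          # rank sum of sample 1
W2 <- sum(r[(n1 + 1):N])    # rank sum of sample 2

# Difference of the mean ranks, computed directly and through W1
diff_direct  <- W1 / n1 - W2 / n2
diff_formula <- N / (n1 * n2) * (W1 - n1 * (N + 1) / 2)

print(c(W1 = W1, W2 = W2, direct = diff_direct, formula = diff_formula))
```

Because $W_1 + W_2 = N(N+1)/2$ is fixed, the difference of mean ranks is a linear function of $W_1$, which is why $W_1$ alone can serve as the TS.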


The Standardized Rank-Sum TS and RR

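If there are no ties,

$$Z_{H_0} = \frac{W_1 - n_1(N+1)/2}{\sqrt{n_1 n_2 (N+1)/12}}.$$

If $H_0^F$ holds, $Z_{H_0} \overset{\cdot}{\sim} N(0,1)$ for $n_1, n_2 > 8$. The RRs are:

$H_a$                     Rejection region at level $\alpha$
$\mu_1 - \mu_2 > 0$       $Z_{H_0} \ge z_{\alpha}$
$\mu_1 - \mu_2 < 0$       $Z_{H_0} \le -z_{\alpha}$
$\mu_1 - \mu_2 \ne 0$     $|Z_{H_0}| \ge z_{\alpha/2}$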


Example

Data on sputum histamine levels from 9 allergic and 13 non-allergic individuals are given in http://personal.psu.edu/acq/401/Data/HistaminData.txt. Is there a difference between the two populations? Test at $\alpha = .01$.

Solution. Here $R_{11} = 18$, $R_{12} = 11$, $R_{13} = 22$, $R_{14} = 19$, $R_{15} = 17$, $R_{16} = 21$, $R_{17} = 7$, $R_{18} = 20$, $R_{19} = 16$. Thus

$$W_1 = \sum_j R_{1j} = 151 \quad \text{and} \quad Z_{H_0} = \frac{151 - 9(23)/2}{\sqrt{9(13)(23)/12}} = 3.17.$$

Since $n_1, n_2 > 8$, the p-value is $2[1 - \Phi(3.17)] = .0016$. Because this is below $\alpha = .01$, the difference is significant.
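The arithmetic of this example can be reproduced in base R from the nine ranks listed in the solution. This is only a sketch of the normal-approximation calculation; the variable names are illustrative.

```r
# Ranks of the 9 allergic observations in the combined sample (from the example)
r1 <- c(18, 11, 22, 19, 17, 21, 7, 20, 16)
n1 <- 9
n2 <- 13
N  <- n1 + n2

W1 <- sum(r1)
z  <- (W1 - n1 * (N + 1) / 2) / sqrt(n1 * n2 * (N + 1) / 12)

# Two-sided p-value from the standard normal approximation
p_value <- 2 * (1 - pnorm(abs(z)))

print(c(W1 = W1, z = z, p_value = p_value))
```

The p-value computed from the unrounded statistic can differ in the last digit from a value read off a normal table at 3.17.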


Effect of Outliers

In the above example, the t test does not reject at level 0.01.

With the data in the data frame hi,

t.test(hi$Level ~ hi$Sample)

gives a p-value of 0.13, and

t.test(hi$Level ~ hi$Sample, var.equal=TRUE)

gives a p-value of 0.06.

In general, using a procedure when its underlying assumptions are violated will give misleading results.


Paired Data: Introduction, Motivation

Paired data arise when each experimental unit receives each of the two treatments that are being compared.

1 Compare the durability of two types of tires.

2 Compare two labs for the analysis of mercury content.

3 Two acne treatments, two cataract treatments, etc.

Paired data are of the form $(X_{11}, X_{21}), \ldots, (X_{1n}, X_{2n})$.

CIs and the TS are again based on $\bar{X}_1 - \bar{X}_2$. But now $\bar{X}_1$ and $\bar{X}_2$ are not independent, so the previous formulas do not apply. For example,

$$\sigma^2_{\bar{X}_1 - \bar{X}_2} = \sigma^2_{\bar{X}_1} + \sigma^2_{\bar{X}_2} - 2\,\mathrm{Cov}(\bar{X}_1, \bar{X}_2).$$

Similarly, the rank-sum test is not valid now.


The paired data T-test

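While $\mathrm{Cov}(\bar{X}_1, \bar{X}_2)$ can be estimated, it is easier to use the differences

$$D_1 = X_{11} - X_{21}, \ \ldots, \ D_n = X_{1n} - X_{2n}.$$

$D_1, \ldots, D_n$ are independent, and $\bar{D} = \bar{X}_1 - \bar{X}_2$. Thus, $\sigma^2_{\bar{X}_1 - \bar{X}_2} = \sigma^2_{\bar{D}}$ can be estimated by $\hat{\sigma}^2_{\bar{D}} = S_D^2/n$, where

$$S_D^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} D_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} D_i\right)^2\right].$$

CIs and testing are based on the fact that

$$\frac{\bar{D} - \mu_D}{S_D/\sqrt{n}} \sim T_{n-1}$$

if normality holds, or approximately if $n \ge 30$.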


Example

A total of 12 water samples are analyzed for mercury content by labs A and B. The paired data yield $\bar{D} = \bar{X}_1 - \bar{X}_2 = -0.0167$ and $S_D = 0.02645$. Does lab B give, on average, higher concentration results than lab A? Test at $\alpha = 0.05$.

Solution. Here $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 < 0$. Because $n < 30$, we must assume normality. Doing so, we have

$$T_{H_0} = \frac{\bar{D}}{S_D/\sqrt{n}} = \frac{-0.0167}{0.02645/\sqrt{12}} = -2.1865.$$

Since $T_{H_0} < -t_{.05,11} = -1.796$, $H_0$ is rejected.
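The same test can be carried out in base R from the summary statistics alone. This is a minimal sketch that assumes only the rounded values quoted in the example, so the statistic may differ from the one above in the third decimal place.

```r
# Summary statistics quoted in the example (rounded)
d_bar <- -0.0167
s_d   <- 0.02645
n     <- 12

t_stat  <- d_bar / (s_d / sqrt(n))   # paired T statistic under H0: mu_D = 0
t_crit  <- qt(0.05, df = n - 1)      # lower 5% point of T_11, i.e. -t_{.05,11}
p_value <- pt(t_stat, df = n - 1)    # lower-tail p-value for Ha: mu_1 - mu_2 < 0

reject <- t_stat < t_crit
print(c(t_stat = t_stat, t_crit = t_crit, p_value = p_value))
```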


It is important to be able to recognize paired data. For example:

A study was conducted to see whether two cars, A and B, having very different wheel bases and turning radii, took the same time to parallel park. Seven drivers were randomly obtained, and the time required for each of them to parallel park each of the 2 cars was measured. The results are as follows:

        Driver
Car     1      2      3      4      5      6      7
A       19.0   21.8   16.8   24.2   22.0   34.7   23.8
B       17.8   20.2   16.2   41.4   21.4   28.4   22.7
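Since each driver parks both cars, the two rows are paired by driver. A sketch of how the paired analysis could look in base R, using the table values; it also confirms that the paired T-test is the one-sample T-test applied to the differences. The slides do not report a result for these data, so none is asserted here.

```r
# Parking times from the table: one entry per driver, same order in both vectors
A <- c(19.0, 21.8, 16.8, 24.2, 22.0, 34.7, 23.8)
B <- c(17.8, 20.2, 16.2, 41.4, 21.4, 28.4, 22.7)

d <- A - B                                  # within-driver differences

paired     <- t.test(A, B, paired = TRUE)   # two-vector form of the paired test
one_sample <- t.test(d, mu = 0)             # same test on the differences

# Statistic computed by hand from the differences
manual_t <- mean(d) / (sd(d) / sqrt(length(d)))

print(paired)
```

With only 7 pairs the T-test relies on normality of the differences, and the large difference for driver 4 suggests checking that assumption (or using the signed-rank test) before trusting the result.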


The Signed-Rank test

1 Rank the absolute differences $|D_1|, \ldots, |D_n|$ from smallest to largest. Let $R_i$ denote the rank of $|D_i|$.

2 Assign to $R_i$ the sign of $D_i$, thus forming signed ranks.

3 Let $S_+$ be the sum of the ranks $R_i$ with positive sign, i.e., the sum of the positive signed ranks.

If $H_0$ holds, $\mu_{S_+} = \frac{n(n+1)}{4}$ and $\sigma^2_{S_+} = \frac{n(n+1)(2n+1)}{24}$.

If $H_0$ holds and $n > 10$, $S_+ \overset{\cdot}{\sim} N(\mu_{S_+}, \sigma^2_{S_+})$.

The TS for testing $H_0: \mu_D = 0$ is

$$Z_{H_0} = \left(S_+ - \frac{n(n+1)}{4}\right) \Big/ \sqrt{\frac{n(n+1)(2n+1)}{24}}.$$

The RRs are the usual RRs of a Z-test.


Example (Mercury concentrations from Labs A and B)

The 12 differences $D_i$ and the ranks $R_i$ of their absolute values are given in the table below. Test $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 < 0$ at $\alpha = 0.05$.

Di   -0.0206  -0.0350  -0.0161  -0.0017   0.0064  -0.0219
Ri    5        10       4        1        2        6
Di   -0.0250  -0.0279  -0.0232  -0.0655   0.0461  -0.0159
Ri    8        9        7        12       11       3

Solution: Here $S_+ = 2 + 11 = 13$. Thus

$$Z_{H_0} = \frac{13 - 39}{\sqrt{162.5}} = -2.04,$$

with p-value $\Phi(-2.04) = 0.0207$.

Setting the differences in the object d, e.g., d=c(-0.0206, -0.0350, -0.0161, -0.0017, 0.0064, -0.0219, -0.0250, -0.0279, -0.0232, -0.0655, 0.0461, -0.0159), the command wilcox.test(d, alternative="less") returns a p-value of 0.0212.
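The steps of the signed-rank procedure can be traced in base R with the twelve differences from the table. This sketch computes $S_+$ and the normal-approximation p-value by hand and then calls wilcox.test, which for a small sample without ties uses the exact null distribution; that is why its p-value differs slightly from the normal one.

```r
# The 12 differences D_i from the table
d <- c(-0.0206, -0.0350, -0.0161, -0.0017, 0.0064, -0.0219,
       -0.0250, -0.0279, -0.0232, -0.0655, 0.0461, -0.0159)
n <- length(d)

r      <- rank(abs(d))       # ranks of the absolute differences
s_plus <- sum(r[d > 0])      # sum of the ranks attached to positive differences

mu_s  <- n * (n + 1) / 4
var_s <- n * (n + 1) * (2 * n + 1) / 24
z     <- (s_plus - mu_s) / sqrt(var_s)

p_normal <- pnorm(z)                          # lower-tail normal approximation
wt       <- wilcox.test(d, alternative = "less")

print(c(s_plus = s_plus, z = z, p_normal = p_normal, p_wilcox = wt$p.value))
```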


Two Proportions with paired data

Here each pair $(X_{1j}, X_{2j})$ can be (1,1), (1,0), (0,1), or (0,0).

As an example, if n voters are asked, both before and after a presidential speech, whether or not they support a certain policy, then $X_{1j} = 1$ or 0 according to whether the jth voter supports the policy or not before the speech, and $X_{2j} = 1$ or 0 according to whether the same voter supports it or not after the speech.

Typically, however, the pairs $(X_{1j}, X_{2j})$ are not given. Instead the data are presented in the following table format.


                After
                1       0
Before   1      Y1      Y2
         0      Y3      Y4

Y1 is the number of (1,1) pairs,
Y2 is the number of (1,0) pairs,
Y3 is the number of (0,1) pairs,
Y4 is the number of (0,0) pairs,
Y1 + · · · + Y4 = n


A variation of the T statistic, used only for testing $H_0: p_1 - p_2 = 0$, is the McNemar test statistic:

$$MN = \frac{Y_2 - Y_3}{\sqrt{Y_2 + Y_3}}.$$

This is referred to the N(0,1) distribution, so the RR for $H_a: p_1 > p_2$ is $MN > z_{\alpha}$. Similarly for the other $H_a$.

R uses the square of MN and refers it to a $\chi^2_1$ distribution.

In this form only $H_a: p_1 \ne p_2$ can be tested, with p-value 1-pchisq(MN^2, 1).


Example (McNemar’s test)

Data on approval of the President’s performance in office in two surveys, one month apart, of 1600 voting-age Americans, give $Y_1 = 794$, $Y_2 = 150$, $Y_3 = 86$, $Y_4 = 570$. Is there evidence, at $\alpha = 0.05$, of a shift in public opinion? Report the p-value.

Solution. Here $MN = (150 - 86)/\sqrt{150 + 86} = 4.166$. Since $|MN| > z_{0.025} = 1.96$, we conclude that there is evidence of a shift in public opinion. The R command 2*(1-pnorm(4.166)) returns a p-value of 3.10e-05.
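The two forms of McNemar's test (Z form and chi-square form) can be compared in base R with the counts from this example. The sketch below also calls mcnemar.test from the stats package; its default applies a continuity correction, so correct = FALSE is set here to match the uncorrected statistic $MN^2$. The table layout (rows = first survey, columns = second survey) is an assumption consistent with the definition of $Y_1, \ldots, Y_4$.

```r
# Counts from the example
Y1 <- 794; Y2 <- 150; Y3 <- 86; Y4 <- 570

MN      <- (Y2 - Y3) / sqrt(Y2 + Y3)
p_z     <- 2 * (1 - pnorm(abs(MN)))       # two-sided p-value, Z form
p_chisq <- 1 - pchisq(MN^2, df = 1)       # same p-value, chi-square form

# 2 x 2 table: rows = first survey, columns = second survey
tab <- matrix(c(Y1, Y2,
                Y3, Y4),
              nrow = 2, byrow = TRUE,
              dimnames = list(First = c("1", "0"), Second = c("1", "0")))

mt <- mcnemar.test(tab, correct = FALSE)  # uncorrected statistic equals MN^2

print(c(MN = MN, MN_sq = MN^2, p_z = p_z, p_mcnemar = mt$p.value))
```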


Comparing Two Variances: Levene’s Test

It is based on the idea that if the variances are equal, then

$$V_{1j} = |X_{1j} - \tilde{X}_1|, \ j = 1, \ldots, n_1, \quad \text{and} \quad V_{2j} = |X_{2j} - \tilde{X}_2|, \ j = 1, \ldots, n_2,$$

where $\tilde{X}_i$, $i = 1, 2$, is the median of the ith sample, correspond to populations with equal means and variances.

Thus, equality of variances can be tested by testing the hypothesis

$$H_0: \mu_{V_1} = \mu_{V_2} \quad \text{vs} \quad H_a: \mu_{V_1} \ne \mu_{V_2}$$

using the two-sample t-test with pooled variance.


Example

The plasma vitamin C concentrations (µmol/l) of five randomly selected smokers and five nonsmokers are:

Nonsmokers   41.48  41.71  41.98  41.68  41.18   s1 = 0.297
Smokers      40.42  40.68  40.51  40.73  40.91   s2 = 0.192

Test $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_a: \sigma_1^2 \ne \sigma_2^2$ at $\alpha = 0.05$.

Solution. Here $\tilde{X}_1 = 41.68$, $\tilde{X}_2 = 40.68$. Thus,

V1 values for Nonsmokers   0.20  0.03  0.30  0.00  0.50
V2 values for Smokers      0.26  0.00  0.17  0.05  0.23

The R commands

x=c(0.20, 0.03, 0.30, 0.00, 0.50); y=c(0.26, 0.00, 0.17, 0.05, 0.23); t.test(x, y, var.equal=T)

give a p-value of 0.558. Thus, $H_0$ is not rejected.
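The V values need not be typed by hand; they can be computed from the raw concentrations. A minimal base R sketch of the median-based Levene procedure described above, using the data of this example:

```r
# Raw data from the example
nonsmokers <- c(41.48, 41.71, 41.98, 41.68, 41.18)
smokers    <- c(40.42, 40.68, 40.51, 40.73, 40.91)

# Absolute deviations from each sample's median
v1 <- abs(nonsmokers - median(nonsmokers))
v2 <- abs(smokers - median(smokers))

# Pooled-variance two-sample t-test on the absolute deviations
lev <- t.test(v1, v2, var.equal = TRUE)

print(round(v1, 2))
print(round(v2, 2))
print(lev$p.value)
```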


The F Test Under Normality

When the two samples have been drawn from normal populations, the exact distribution of $S_1^2/S_2^2$ is a multiple of an F distribution.

Theorem. Let $X_{11}, \ldots, X_{1n_1}$ be a random sample from a normal distribution with variance $\sigma_1^2$, let $X_{21}, \ldots, X_{2n_2}$ be another sample from a normal distribution with variance $\sigma_2^2$, and let $S_1^2$ and $S_2^2$ denote the two sample variances. Then the rv

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$

has an F distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.


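The test statistic for $H_0: \sigma_1^2 = \sigma_2^2$ is

$$F_{H_0} = \frac{S_1^2}{S_2^2}.$$

If the ratio differs sufficiently from 1, the null hypothesis is rejected. In particular, the RRs for testing $H_0: \sigma_1^2 = \sigma_2^2$ are

$H_a$                          RR at level $\alpha$
$\sigma_1^2 > \sigma_2^2$      $F_{H_0} > F_{n_1-1,\,n_2-1;\,\alpha}$
$\sigma_1^2 < \sigma_2^2$      $F_{H_0}^{-1} > F_{n_2-1,\,n_1-1;\,\alpha}$
$\sigma_1^2 \ne \sigma_2^2$    either $F_{H_0} > F_{n_1-1,\,n_2-1;\,\alpha/2}$ or $F_{H_0}^{-1} > F_{n_2-1,\,n_1-1;\,\alpha/2}$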


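Example

Consider the data in the previous example, and assume the underlying populations are normal. The test statistic is

$$F_{H_0} = \frac{0.297^2}{0.192^2} \approx 2.40.$$

With $n_1 = n_2 = 5$, the degrees of freedom are $\nu_1 = \nu_2 = 4$. By the formula for the p-value in p. 333, the p-value, found with 2*(1-pf(2.4, 4, 4)), is 0.42, so $H_0$ is not rejected at $\alpha = 0.05$.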
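For reference, the F test can also be run directly on the raw data of the vitamin C example. The sketch below computes the statistic and a two-sided p-value by hand and compares them with var.test from the stats package; with five observations per sample, both degrees of freedom equal 4.

```r
# Raw data from the vitamin C example
nonsmokers <- c(41.48, 41.71, 41.98, 41.68, 41.18)
smokers    <- c(40.42, 40.68, 40.51, 40.73, 40.91)

f_stat <- var(nonsmokers) / var(smokers)   # S1^2 / S2^2
df1 <- length(nonsmokers) - 1
df2 <- length(smokers) - 1

# Two-sided p-value: twice the smaller tail probability
upper   <- 1 - pf(f_stat, df1, df2)
p_value <- 2 * min(upper, 1 - upper)

vt <- var.test(nonsmokers, smokers)        # built-in F test for two variances

print(c(F = f_stat, df1 = df1, df2 = df2, p_value = p_value))
```

The F test is sensitive to departures from normality, so with samples this small the result is best read alongside Levene's test.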

Lab 8: Hypothesis Testing with R

Hypothesis Testing in Regression

The R commands

• If y and x contain the values of the response and the predictor, the basic commands for testing in regression are:

out=lm(y~x); summary(out); summary(aov(out))

summary(out) gives the estimated regression coefficients and their standard errors, the p-values for testing that each coefficient is zero, $R^2$, and also the F-test statistic and p-value for the model utility test.

summary(aov(out)) gives the ANOVA table.


Illustration with Simulated Data

e=rnorm(50,0,5); x=runif(50,0,10); y=25 - 3.4*x+e; out=lm(y~x)

For the data generated, the summary(out) output includes

Coefficients:
              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   25.9198    1.2812       20.23     <2e-16
x             -3.6963    0.2365       -15.63    <2e-16

Residual standard error: 5.236 on 48 degrees of freedom
Multiple R-squared: 0.8358, Adjusted R-squared: 0.8323
F-statistic: 244.2 on 1 and 48 DF, p-value: < 2.2e-16


Moreover, the summary(aov(out)) output includes

            Df   Sum Sq   Mean Sq   F value   Pr(>F)
x           1    6697     6697      244.2     <2e-16
Residuals   48   1316     27

The standard errors of the coefficients in the summary(out) output can be used for computing T statistics for other hypotheses regarding them.

For example, the T statistic for testing $H_0: \beta_1 = -3.4$ vs $H_a: \beta_1 \ne -3.4$ is

$$T_{H_0} = \frac{-3.6963 + 3.4}{0.2365} = -1.253,$$

with corresponding p-value $2(1 - G_{48}(1.253)) = 0.216$, where $G_{48}$ denotes the CDF of the T distribution with 48 degrees of freedom.

qqnorm(resid(out)); qqline(resid(out), col=2) can be used to check the normality assumption.
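The test of $H_0: \beta_1 = -3.4$ can be computed directly from the fitted model object rather than by reading numbers off the printed output. The sketch below is one way to do it in base R; the seed is an arbitrary assumption (the slides do not fix one), so the simulated estimates will differ from those shown above. The last two lines repeat the calculation with the numbers printed on the slide.

```r
set.seed(13)                       # arbitrary seed (an assumption), for reproducibility
e <- rnorm(50, 0, 5)
x <- runif(50, 0, 10)
y <- 25 - 3.4 * x + e
out <- lm(y ~ x)

coefs <- summary(out)$coefficients # matrix of estimates, SEs, t values, p-values
b1  <- coefs["x", "Estimate"]
se1 <- coefs["x", "Std. Error"]
df  <- out$df.residual             # n - 2 = 48

beta1_null <- -3.4
t_stat  <- (b1 - beta1_null) / se1
p_value <- 2 * (1 - pt(abs(t_stat), df))

# Same calculation with the estimate and standard error printed on the slide
t_slide <- (-3.6963 + 3.4) / 0.2365
p_slide <- 2 * (1 - pt(abs(t_slide), 48))

print(c(t_stat = t_stat, p_value = p_value, t_slide = t_slide, p_slide = p_slide))
```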

The “t.test” Command

T-tests and T-intervals for one mean

Let x contain the data set. By default, the command t.test(x), which is equivalent to

t.test(x, mu=0, alternative="two.sided", conf.level=0.95)

gives the t-statistic, the df, the p-value for testing $H_0: \mu = 0$ against the two-sided alternative, the 95% CI for $\mu$, and the sample mean $\bar{X}$.

To test $H_0: \mu = 8.5$, replace mu=0 by mu=8.5.

For one-sided alternatives, use alternative="less" or alternative="greater". Note, however, that the CIs are then one-sided.
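The remark about one-sided CIs can be seen on a small simulated sample. The data, seed, and null value below are hypothetical and chosen only to illustrate the options of t.test.

```r
set.seed(401)                       # arbitrary seed, for reproducibility only
x <- rnorm(20, mean = 9, sd = 2)    # hypothetical sample of size 20

two_sided <- t.test(x, mu = 8.5)                            # default alternative
greater   <- t.test(x, mu = 8.5, alternative = "greater")   # Ha: mu > 8.5

print(two_sided$conf.int)   # two-sided 95% CI: both endpoints finite
print(greater$conf.int)     # one-sided 95% CI: lower bound and Inf
```

The test statistic is the same in both calls; only the p-value and the form of the CI change with the alternative.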


Example

Is there evidence that the average level of radiation is higher than the federal health standard of 10 W/cm²? Use the data in ExRadiationTestData.txt to test at $\alpha = 0.05$. Also, report the p-value, and construct a 95% CI.

Solution. Reading the data set into the R object x, the command t.test(x, mu=10, alternative="greater") returns a p-value of 0.074. Thus, $H_0: \mu = 10$ cannot be rejected in favor of $H_a: \mu > 10$ at $\alpha = 0.05$.

Next use t.test(x, mu=10, alternative="two.sided") to get a 95% CI of (9.773, 11.425).


Power and sample size calculations for H0 : µ = µ0

• First one needs to install the package pwr using the command install.packages("pwr")

Then issue the command library(pwr) to load the package in the current R session.

The command for computing the power at a given $\mu_a$ with given n, $\alpha$, and S values, for $H_a: \mu > \mu_0$, is

pwr.t.test(n, (µa − µ0)/S, α, power=NULL, "one.sample", "greater")

For $H_a: \mu < \mu_0$ and $H_a: \mu \ne \mu_0$, replace "greater" by "less" and "two.sided", respectively.


Example

For the testing problem $H_0: \mu = 10$ vs $H_a: \mu > 10$ with the ExRadiationTestData.txt data set, find the power at $\mu_a = 11$.

Solution. The commands length(x); sd(x) return n = 25 and S = 2.00 for this data set. The R command pwr.t.test(25, (11-10)/2.00, 0.05, power = NULL, "one.sample", "greater") returns a power of 0.78.

NOTE: Treating S as the true $\sigma$, the command 1-pnorm((10-11)/(2.00/sqrt(25)) + qnorm(0.95)) returns a power of 0.80 according to the formula in the teaching slides.
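Both power values can be reproduced without installing pwr. The sketch below uses the normal-approximation formula from the NOTE and, as a swapped-in alternative to pwr.t.test, the base R function power.t.test from the stats package, which for a one-sided one-sample test is based on the same noncentral T calculation. The variable names are illustrative.

```r
n     <- 25
s     <- 2.00    # sample standard deviation, treated as known sigma in the normal formula
mu0   <- 10
mua   <- 11
alpha <- 0.05

# Normal approximation (sigma treated as known)
power_normal <- 1 - pnorm((mu0 - mua) / (s / sqrt(n)) + qnorm(1 - alpha))

# Noncentral-T based power using base R (stats::power.t.test)
pw <- power.t.test(n = n, delta = mua - mu0, sd = s, sig.level = alpha,
                   type = "one.sample", alternative = "one.sided")
power_t <- pw$power

print(c(power_normal = power_normal, power_t = power_t))
```

The T-based power is a little lower than the normal one because estimating $\sigma$ by S costs some power at small n.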


The command for computing the sample size needed to achieve a certain level of power at $\mu_a$ with given $\alpha$ and S values, for $H_a: \mu > \mu_0$, is

pwr.t.test(n=NULL, (µa − µ0)/S, α, power(µa), "one.sample", "greater")

For $H_a: \mu < \mu_0$ and $H_a: \mu \ne \mu_0$, replace "greater" by "less" and "two.sided", respectively.


Example

For the testing problem $H_0: \mu = 10$ vs $H_a: \mu > 10$ with the ExRadiationTestData.txt data set, find the sample size needed to achieve power of 0.9 at $\mu_a = 11$.

Solution. The R command pwr.t.test(n=NULL, (11-10)/2.00, 0.05, 0.9, "one.sample", "greater") returns a sample size of 35.65, which is rounded up to 36.

NOTE: Treating S as the true $\sigma$, the command (2.00*(qnorm(.95)+qnorm(.9))/(10-11))**2 returns a sample size of 34.26, which is rounded up to 35, according to the formula in the teaching slides.
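The two sample-size calculations can be sketched in base R as well. As with the power example, power.t.test from the stats package is used here in place of pwr.t.test (an assumption made so that no extra package is needed); leaving n unspecified makes it solve for the sample size.

```r
s      <- 2.00
mu0    <- 10
mua    <- 11
alpha  <- 0.05
target <- 0.9     # desired power at mua

# Normal-approximation formula (sigma treated as known)
n_normal <- (s * (qnorm(1 - alpha) + qnorm(target)) / (mu0 - mua))^2

# T-based sample size: n is omitted, so power.t.test solves for it
ss  <- power.t.test(delta = mua - mu0, sd = s, sig.level = alpha,
                    power = target, type = "one.sample",
                    alternative = "one.sided")
n_t <- ss$n

print(c(n_normal = n_normal, n_t = n_t,
        n_normal_up = ceiling(n_normal), n_t_up = ceiling(n_t)))
```

Sample sizes are always rounded up, since rounding down would leave the power below the target.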


Two independent samples

The “t.test” command can also be used for comparing two means, both with independent and with paired data.

The two samples can be in two separate columns (i.e., x and y), or combined in one column, say y, with a separate column, say x, indicating the sample membership of each observation.

The default is to treat the two samples as independent, construct a 95% CI, and give the p-value for $H_a: \mu_1 - \mu_2 \ne 0$, without assuming $\sigma_1 = \sigma_2$. The commands with these default options are:

t.test(x, y) # One sample in x, the other in y

t.test(y ~ x) # For values in y and sample index in x


For the pooled variance T test and a 99% CI, do:

t.test(y ~ x, var.equal = TRUE, conf.level = 0.99)

and similarly if the two samples are in separate columns.

To test a different null hypothesis, e.g., $H_0: \mu_1 - \mu_2 = 1.8$ vs $H_a: \mu_1 - \mu_2 < 1.8$, do:

t.test(y ~ x, mu=1.8, alternative = "less")

and similarly if the two samples are in separate columns. Other options are alternative = "greater", or the default "two.sided".


Example

Use the R data set airquality to compare the ozone levels in May and August. Report the p-value, test at 0.05, and construct a 95% CI for $\mu_1 - \mu_2$, with and without the assumption of equal variances. [NOTE: Normality is violated; check with boxplot(Ozone ~ Month, data = airquality).]

Solution: Use:

y1=airquality$Ozone; x1=airquality$Month; x=y1[which(x1==5)]; y=y1[which(x1==8)]; t.test(x, y); t.test(x, y, var.equal = T)

More advanced application(∗):

t.test(Ozone ~ Month, data = airquality, subset = Month %in% c(5, 8))
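The commands of this solution can be collected into one short script. The sketch below uses the built-in airquality data set and only base R; the object names may, aug, welch, pooled, and by_formula are illustrative. Missing Ozone values are dropped by t.test itself.

```r
# Ozone readings for May (Month == 5) and August (Month == 8)
may <- airquality$Ozone[airquality$Month == 5]
aug <- airquality$Ozone[airquality$Month == 8]

welch  <- t.test(may, aug)                     # no equal-variance assumption (default)
pooled <- t.test(may, aug, var.equal = TRUE)   # pooled-variance T test

# Formula interface restricted to the two months of interest
by_formula <- t.test(Ozone ~ Month, data = airquality,
                     subset = Month %in% c(5, 8))

print(welch$conf.int)
print(pooled$conf.int)
print(c(p_welch = welch$p.value, p_pooled = pooled$p.value))
```

Since normality is doubtful for these data, the resulting p-values and CIs are best compared with those of the rank-sum test.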


Paired Data

The basic command for testing and CI construction with paired data is

t.test(y ~ x, paired = T)

and similarly if the two samples are in different columns. Other options can be added as before. For example,

t.test(y ~ x, alternative = c("two.sided", "less", "greater"), mu = 1.8, paired = T, conf.level = 0.9)

Here alternative should be set to one of the three listed values; if all three are passed as shown, R uses the first, "two.sided".

With paired data, equality of the two marginal variances is a non-issue, so you never need to use “var.equal=T”.


Example

Two brands of motorcycle tires are to be compared for durability. Eight motorcycles are selected at random and one tire from each brand is randomly assigned (front or back) on each motorcycle. The motorcycles are then run until the tires wear out. The data in motorcycleTiresLifetimes.txt are in km. Use the paired T-test procedure to test the hypothesis of equal average durability at level $\alpha = 0.05$, and to construct a 90% CI for $\mu_1 - \mu_2$.

Solution: Read the data into tl and use:

x=tl$Brand1; y=tl$Brand2; t.test(x, y, paired=T, conf.level=0.9) # set x and y and construct the test and CIs

The “wilcox.test” Command

The Rank Sum Test

The “wilcox.test” command can be used to conduct both the rank-sum test and the signed-rank test.
Again, the two samples can be in two separate columns, or combined in one column with a separate column indicating the sample membership of each observation.
The default is to treat the two samples as independent, and give the p-value for testing equality of the two populations against the two-sided alternative, without constructing a CI:

wilcox.test(x, y) # One sample in x, the other in y

wilcox.test(y ~ x) # For values in y and sample index in x

To get a CI for the location difference use:

wilcox.test(y ~ x, conf.int = TRUE, conf.level = 0.9)

[The description of this CI is not in the book.]
To test for different null and alternative hypotheses use:

wilcox.test(y ~ x, mu = 1.8, alternative = "less") # or alternative = "greater"

Similarly if the two samples are in different columns.

Example
Use the R data set airquality to compare the ozone levels in May and August. [Check data set with boxplot(Ozone ~ Month, data = airquality)]

Solution: Use

y1=airquality$Ozone; x1=airquality$Month

x=y1[which(x1==5)]; y=y1[which(x1==8)]; wilcox.test(x, y,conf.int = T)

More advanced application(∗):

wilcox.test(Ozone ~ Month, data = airquality, subset = Month %in% c(5, 8))
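The two call forms can be compared directly, and the pieces of the result pulled out by name. This is a minimal sketch assuming only the built-in airquality data; the names may, aug, rs_xy and rs_formula are illustrative. Warnings are suppressed because tied ozone values, if present, make R switch from the exact p-value to a normal approximation.

```r
may <- airquality$Ozone[airquality$Month == 5]
aug <- airquality$Ozone[airquality$Month == 8]

# Rank-sum test, samples in two separate vectors
rs_xy <- suppressWarnings(wilcox.test(may, aug, conf.int = TRUE))

# Same test through the formula interface
rs_formula <- suppressWarnings(
  wilcox.test(Ozone ~ Month, data = airquality,
              subset = Month %in% c(5, 8), conf.int = TRUE))

c(rs_xy$p.value, rs_formula$p.value)  # should coincide
rs_xy$conf.int                        # 95% CI (the default level) for the location shift
rs_xy$estimate                        # point estimate of the shift, May minus August
```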

Rank sum for paired data (Signed-Rank Test)

The basic command for the signed-rank test with paired data (without constructing a CI) is:

wilcox.test(x, y, paired = T) # One sample in x, the other in y

wilcox.test(y ~ x, paired = T) # For values in y and sample index in x

Other options can be added as before. For example,

wilcox.test(x, y, alternative = "less", mu = 1.8, paired = T, conf.int = T, conf.level = 0.9) # or alternative = "greater"
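To make the connection with the paired-data setup concrete, the sketch below checks that the paired call is the one-sample signed-rank test on the differences, and recomputes the statistic V by hand. The data are hypothetical (chosen so that the absolute differences have no ties or zeros), and the names d, sr_paired, sr_diff and v_by_hand are illustrative.

```r
# Hypothetical paired data (illustrative values only)
x <- c(52, 47, 60, 55, 49, 58, 51, 63)
y <- c(50, 48, 55, 51, 46, 50, 58, 54)
d <- x - y                                   # 2 -1 5 4 3 8 -7 9

sr_paired <- wilcox.test(x, y, paired = TRUE, conf.int = TRUE, conf.level = 0.9)
sr_diff   <- wilcox.test(d, conf.int = TRUE, conf.level = 0.9)

# V is the sum of the ranks of |d| that belong to positive differences
v_by_hand <- sum(rank(abs(d))[d > 0])

c(sr_paired$statistic, sr_diff$statistic, by_hand = v_by_hand)
c(sr_paired$p.value, sr_diff$p.value)
```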

Example
Two brands of motorcycle tires are to be compared for durability. Eight motorcycles are selected at random and one tire from each brand is randomly assigned (front or back) on each motorcycle. The motorcycles are then run until the tires wear out. The data in http://personal.psu.edu/acq/401/Data/motorcycleTiresLifetimes.txt are in km. Use the signed-rank test procedure to test the hypothesis of equal durability at level α = 0.05, and to construct a 90% CI for the location difference.

Solution: Read the data in tl and use:

x=tl$Brand1; y=tl$Brand2; wilcox.test(x, y, paired=T, conf.int =T, conf.level=0.9) # set x and y and construct the test and CIs

The “prop.test” Command for Two Proportions

Set the number of successes and the number of trials in x and n. For example, use

x=c(16,14); n=c(200,400)

if X1 = 16, X2 = 14, n1 = 200, n2 = 400.
To test H0 : p1 − p2 = 0 vs the two-sided alternative, and construct a 95% CI for p1 − p2, use prop.test(x, n), or, equivalently:

prop.test(x, n, alternative = "two.sided", conf.level = 0.95)

Other alternative options are “less”, or “greater”.
No option for testing other null hypotheses, e.g., H0 : p1 − p2 = 0.1.
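prop.test reports a chi-squared statistic rather than a Z statistic, and by default it applies a continuity correction. The following sketch, using the counts above, checks that the uncorrected statistic equals the square of the usual pooled two-sample Z statistic. The names p_hat, p_pool, z, uncorrected and corrected are illustrative.

```r
x <- c(16, 14); n <- c(200, 400)

# Pooled two-sample Z statistic for H0: p1 - p2 = 0
p_hat  <- x / n
p_pool <- sum(x) / sum(n)
z <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))

uncorrected <- prop.test(x, n, correct = FALSE)  # no continuity correction
corrected   <- prop.test(x, n)                   # default: with continuity correction

# X-squared without the correction should equal Z^2
c(z_squared = z^2, X_squared = unname(uncorrected$statistic))

# The correction shrinks the statistic, so its p-value is the larger one
c(uncorrected = uncorrected$p.value, corrected = corrected$p.value)
```

With small counts the two p-values can differ noticeably, so it is worth stating which version is being reported.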

Example
An article in Knee Surgery, Sports Traumatology, Arthroscopy (2005), Vol. 13, 273-279, reported results of arthroscopic meniscal repair with an absorbable screw. For tears greater than 25 millimeters, 10 of 18 repairs were successful, while for tears less than 25 millimeters, 22 of 30 were successful. Is there evidence that the success rates for the two types of tears are different? Test at α = 0.1, report the p-value, and construct a 90% confidence interval for p1 − p2.

Solution: Use

x=c(10,22); n=c(18,30); prop.test(x, n, conf.level = 0.9)
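The individual answers the example asks for can be read off the fitted object by name. This is a small sketch of that step; the object name fit is illustrative.

```r
x <- c(10, 22); n <- c(18, 30)
fit <- prop.test(x, n, conf.level = 0.9)

fit$estimate        # sample proportions 10/18 and 22/30
fit$p.value         # p-value for H0: p1 - p2 = 0 (two-sided)
fit$conf.int        # 90% CI for p1 - p2
fit$p.value < 0.1   # TRUE would mean rejecting H0 at alpha = 0.1
```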
