Comparison of 2 Population Means

• Goal: To compare 2 populations/treatments with respect to a numeric outcome

• Sampling Design: Independent Samples (Parallel Groups) vs Paired Samples (Crossover Design)

• Data Structure: Normal vs Non-normal

• Sample Sizes: Large (n1,n2>20) vs Small

Independent Samples

• Units in the two samples are different

• Sample sizes may or may not be equal

• Large-sample inference based on Normal Distribution (Central Limit Theorem)

• Small-sample inference depends on distribution of individual outcomes (Normal vs non-Normal)

Parameters/Estimates (Independent Samples)

• Parameter: $\mu_1 - \mu_2$

• Estimator: $\bar{Y}_1 - \bar{Y}_2$

• Estimated standard error: $\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$

• Shape of sampling distribution:

– Normal if data are normal

– Approximately normal if $n_1, n_2 > 20$

– Non-normal otherwise (typically)

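Large-Sample Test of $\mu_1 - \mu_2$

• Null hypothesis: The population means differ by $\Delta_0$ (which is typically 0): $H_0: \mu_1 - \mu_2 = \Delta_0$

• Alternative Hypotheses:

– 1-Sided: $H_A: \mu_1 - \mu_2 > \Delta_0$

– 2-Sided: $H_A: \mu_1 - \mu_2 \neq \Delta_0$

• Test Statistic:

$$z_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$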

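Large-Sample Test of $\mu_1 - \mu_2$

• Decision Rule:

– 1-sided alternative ($H_A: \mu_1 - \mu_2 > \Delta_0$)

• If $z_{obs} \ge z_{\alpha}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$

• If $z_{obs} < z_{\alpha}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$

– 2-sided alternative ($H_A: \mu_1 - \mu_2 \neq \Delta_0$)

• If $z_{obs} \ge z_{\alpha/2}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$

• If $z_{obs} \le -z_{\alpha/2}$ ==> Conclude $\mu_1 - \mu_2 < \Delta_0$

• If $-z_{\alpha/2} < z_{obs} < z_{\alpha/2}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$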

Large-Sample Test of $\mu_1 - \mu_2$

• Observed Significance Level (P-Value)

– 1-sided alternative ($H_A: \mu_1 - \mu_2 > \Delta_0$)

• $P = P(z \ge z_{obs})$ (From the std. Normal distribution)

– 2-sided alternative ($H_A: \mu_1 - \mu_2 \neq \Delta_0$)

• $P = 2P(z \ge |z_{obs}|)$ (From the std. Normal distribution)

• If P-Value $\le \alpha$ then reject the null hypothesis
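The large-sample test statistic and P-values can be computed directly from summary statistics. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` for the standard Normal distribution; the function name `large_sample_z_test` and its argument names are illustrative and do not come from the slides. The sample call uses the summary statistics of the Vitamin C example in these notes.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_z_test(ybar1, s1, n1, ybar2, s2, n2, delta0=0.0):
    """Return (z_obs, 1-sided P, 2-sided P) for H0: mu1 - mu2 = delta0.

    Illustrative helper (not from the slides); intended for large samples.
    """
    # Estimated standard error of ybar1 - ybar2 (variances not pooled)
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    z_obs = ((ybar1 - ybar2) - delta0) / se

    std_normal = NormalDist()  # standard Normal: mean 0, sd 1
    p_one_sided = 1.0 - std_normal.cdf(z_obs)               # P(Z >= z_obs)
    p_two_sided = 2.0 * (1.0 - std_normal.cdf(abs(z_obs)))  # 2 P(Z >= |z_obs|)
    return z_obs, p_one_sided, p_two_sided


# Vitamin C example: placebo (group 1) vs ascorbic acid (group 2)
z_obs, p1, p2 = large_sample_z_test(2.2, 0.12, 155, 1.9, 0.10, 208)
print(round(z_obs, 1), p2 < 0.05)
```

The 1-sided P-value here is for the alternative $\mu_1 - \mu_2 > \Delta_0$; for the opposite direction one would use the lower tail instead.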

Large-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$

• Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this rule would provide an interval that contains the true parameter value if it were applied over all possible samples

• Rule:

$$(\bar{y}_1 - \bar{y}_2) \pm z_{\alpha/2}\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$$

Large-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$

• For 95% Confidence Intervals, $z_{.025} = 1.96$

• Confidence Intervals and 2-sided tests give identical conclusions at the same $\alpha$-level:

– If entire interval is above 0, conclude $\mu_1 > \mu_2$

– If entire interval is below 0, conclude $\mu_1 < \mu_2$

– If interval contains 0, do not reject $\mu_1 = \mu_2$
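The confidence interval rule can be sketched in a few lines. This is a minimal illustration, assuming the critical value $z_{\alpha/2}$ is taken from `statistics.NormalDist().inv_cdf`; the function name `large_sample_ci` and the `conf_level` parameter are hypothetical names chosen for this example.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(ybar1, s1, n1, ybar2, s2, n2, conf_level=0.95):
    """Large-sample confidence interval for mu1 - mu2 (illustrative helper)."""
    alpha = 1.0 - conf_level
    # Upper alpha/2 critical value of the standard Normal (about 1.96 for 95%)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = ybar1 - ybar2
    return diff - z_crit * se, diff + z_crit * se


# Summary statistics from the Vitamin C example in these notes
lower, upper = large_sample_ci(2.2, 0.12, 155, 1.9, 0.10, 208)
if lower > 0:
    print("entire interval above 0: conclude mu1 > mu2")
elif upper < 0:
    print("entire interval below 0: conclude mu1 < mu2")
else:
    print("interval contains 0: do not reject mu1 = mu2")
```

Reading the decision off the interval mirrors the 2-sided test at the same $\alpha$-level, which is the equivalence stated in the bullets.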

Example: Vitamin C for Common Cold

• Outcome: Number of Colds During Study Period for Each Student

• Group 1: Given Placebo

• Group 2: Given Ascorbic Acid (Vitamin C)

$\bar{y}_1 = 2.2$, $s_1 = 0.12$, $n_1 = 155$

$\bar{y}_2 = 1.9$, $s_2 = 0.10$, $n_2 = 208$

Source: Pauling (1971)

2-Sided Test to Compare Groups

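• $H_0: \mu_1 - \mu_2 = 0$ (No difference in trt effects)

• $H_A: \mu_1 - \mu_2 \neq 0$ (Difference in trt effects)

• Test Statistic:

$$z_{obs} = \frac{(2.2 - 1.9) - 0}{\sqrt{\dfrac{(0.12)^2}{155} + \dfrac{(0.10)^2}{208}}} = \frac{0.3}{0.0119} = 25.3$$

• Decision Rule ($\alpha = 0.05$): Conclude $\mu_1 - \mu_2 > 0$ since $z_{obs} = 25.3 > z_{.025} = 1.96$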

95% Confidence Interval for $\mu_1 - \mu_2$

• Point Estimate: $\bar{y}_1 - \bar{y}_2 = 2.2 - 1.9 = 0.3$

• Estimated Std. Error: $\sqrt{\dfrac{(0.12)^2}{155} + \dfrac{(0.10)^2}{208}} = 0.0119$

• Critical Value: $z_{.025} = 1.96$

• 95% CI: $0.30 \pm 1.96(0.0119) \equiv 0.30 \pm 0.023 \equiv (0.277, 0.323)$. Entire interval > 0

Small-Sample Test for Normal Populations

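• Case 1: Common Variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$)

• Null Hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$

• Alternative Hypotheses:

– 1-Sided: $H_A: \mu_1 - \mu_2 > \Delta_0$

– 2-Sided: $H_A: \mu_1 - \mu_2 \neq \Delta_0$

• Test Statistic (where $S_p^2$ is a "pooled" estimate of $\sigma^2$):

$$t_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \qquad S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$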


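• Decision Rule: (Based on t-distribution with $\nu = n_1 + n_2 - 2$ df)

– 1-sided alternative ($H_A: \mu_1 - \mu_2 > \Delta_0$)

• If $t_{obs} \ge t_{\alpha,\nu}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$

• If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$

– 2-sided alternative ($H_A: \mu_1 - \mu_2 \neq \Delta_0$)

• If $t_{obs} \ge t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$

• If $t_{obs} \le -t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 < \Delta_0$

• If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$

• Observed Significance Level (P-Value)

• Special Tables Needed, Printed by Statistical Software Packages

– 1-sided alternative

• $P = P(t \ge t_{obs})$ (From the $t_{\nu}$ distribution)

– 2-sided alternative

• $P = 2P(t \ge |t_{obs}|)$ (From the $t_{\nu}$ distribution)

• If P-Value $\le \alpha$ then reject the null hypothesis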

Small-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$ - Normal Populations

• Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this rule would provide an interval that contains the true parameter value if it were applied over all possible samples

• Rule:

$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$$

• Interpretations same as for large-sample CI's
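The pooled-variance test and interval share the same ingredients ($S_p^2$, the standard error, and a t critical value), so they can be computed together. Below is a minimal sketch using only the Python standard library. Because the standard library has no t distribution, the critical value `t_crit` is assumed to be looked up in a t table (or obtained from statistical software) and passed in. The function name `pooled_t_summary` and the dictionary keys are illustrative.

```python
from math import sqrt


def pooled_t_summary(ybar1, s1, n1, ybar2, s2, n2, t_crit, delta0=0.0):
    """Pooled-variance t statistic and CI for mu1 - mu2 (illustrative helper).

    t_crit is the table value t_{alpha/2, n1+n2-2}, supplied by the caller.
    """
    df = n1 + n2 - 2
    # Pooled estimate of the common variance sigma^2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    diff = ybar1 - ybar2
    t_obs = (diff - delta0) / se
    return {
        "df": df,
        "sp2": sp2,
        "se": se,
        "t_obs": t_obs,
        "reject_two_sided": abs(t_obs) >= t_crit,
        "ci": (diff - t_crit * se, diff + t_crit * se),
    }


# Scalp wound closure example from these notes: t_{.025,29} = 2.045
result = pooled_t_summary(96.92, 7.51, 15, 96.31, 8.06, 16, t_crit=2.045)
print(result["df"], result["reject_two_sided"])
```

Passing the critical value in, rather than computing it, keeps the sketch dependency-free; a statistics package would normally supply both the critical value and the P-value.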

Small-Sample Inference for Normal Populations

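• Case 2: $\sigma_1^2 \neq \sigma_2^2$

• Don't pool variances:

$$S_{\bar{y}_1 - \bar{y}_2} = \sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$$

• Use "adjusted" degrees of freedom (Satterthwaite's Approximation):

$$\nu^* = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{\left(S_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(S_2^2/n_2\right)^2}{n_2 - 1}}$$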
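The unpooled standard error and Satterthwaite's adjusted degrees of freedom are straightforward to evaluate from the sample standard deviations and sizes. A minimal sketch follows; the function names `unpooled_se` and `satterthwaite_df` are illustrative, and the sample call reuses the scalp wound closure summary statistics from these notes purely as input values.

```python
from math import sqrt


def unpooled_se(s1, n1, s2, n2):
    """Standard error of ybar1 - ybar2 without pooling the variances."""
    return sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def satterthwaite_df(s1, n1, s2, n2):
    """Satterthwaite's approximate degrees of freedom (nu*)."""
    v1 = s1 ** 2 / n1  # estimated variance of ybar1
    v2 = s2 ** 2 / n2  # estimated variance of ybar2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Stapling (s=7.51, n=15) vs suturing (s=8.06, n=16)
nu_star = satterthwaite_df(7.51, 15, 8.06, 16)
print(nu_star, unpooled_se(7.51, 15, 8.06, 16))
```

When the two sample variances and sample sizes are similar, as in this call, $\nu^*$ lands close to the pooled value $n_1 + n_2 - 2$; it shrinks as the variances or sample sizes become more unequal. In practice $\nu^*$ is usually not an integer, and tables are entered with it rounded down.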

Example - Scalp Wound Closure

• Groups: Stapling (n1=15) / Suturing (n2=16)

• Outcome: Physician Reported VAS Score at 1-Year

              Stapling (i=1)   Suturing (i=2)
Mean          96.92            96.31
Std Dev       7.51             8.06
Sample Size   15               16

• Conduct a 2-sided test of whether mean scores differ

• Construct a 95% Confidence Interval for true difference

Source: Khan, et al (2002)

Example - Scalp Wound Closure

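$H_0: \mu_1 - \mu_2 = 0$ $H_A: \mu_1 - \mu_2 \neq 0$ ($\alpha = 0.05$)

$$S_p^2 = \frac{(15-1)(7.51)^2 + (16-1)(8.06)^2}{15 + 16 - 2} = 60.83$$

$$TS: \; t_{obs} = \frac{96.92 - 96.31}{\sqrt{60.83\left(\dfrac{1}{15} + \dfrac{1}{16}\right)}} = \frac{0.61}{2.80} = 0.22$$

$$RR: \; |t_{obs}| \ge t_{.025,29} = 2.045$$

$$95\%\ CI: \; 0.61 \pm 2.045(2.80) \equiv 0.61 \pm 5.73 \equiv (-5.12, 6.34)$$

No significant difference between 2 methods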

Small Sample Test to Compare Two Medians - Nonnormal Populations

• Two Independent Samples (Parallel Groups)

• Procedure (Wilcoxon Rank-Sum Test):

– Rank measurements across samples from smallest (1) to largest ($n_1 + n_2$). Ties take average ranks.

– Obtain the rank sum for each group ($T_1$, $T_2$)

– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T_2 \le T_0$

– 2-sided tests: Conclude $H_A: M_1 \neq M_2$ if $\min(T_1, T_2) \le T_0$

– Values of $T_0$ are given in many texts for various sample sizes and significance levels. P-values printed by statistical software packages.
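The ranking and rank-sum steps of the procedure can be sketched as follows. This is a minimal standard-library illustration; the function names `average_ranks` and `rank_sums` are hypothetical, and the critical value $T_0$ is still assumed to come from a published table. The data in the sample call are the AUC values from the Levocabastine example in these notes.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest; ties take the average rank."""
    ordered = sorted(values)

    def rank_of(v):
        first = ordered.index(v) + 1   # 1-based position of first occurrence
        ties = ordered.count(v)        # number of observations equal to v
        return first + (ties - 1) / 2.0

    return [rank_of(v) for v in values]


def rank_sums(sample1, sample2):
    """Return (T1, T2): rank sums of each group after ranking jointly."""
    ranks = average_ranks(list(sample1) + list(sample2))
    t1 = sum(ranks[:len(sample1)])
    t2 = sum(ranks[len(sample1):])
    return t1, t2


non_dialysis = [857, 567, 626, 532, 444, 357]
hemodialysis = [527, 740, 392, 514, 433, 392]
t1, t2 = rank_sums(non_dialysis, hemodialysis)

t0 = 26  # 2-sided table value for n1 = n2 = 6, as quoted in the example
print(t1, t2, min(t1, t2) <= t0)  # 45.0 33.0 False
```

The quadratic-time ranking is deliberate: it keeps the tie rule visible and is more than fast enough for the small samples this test is meant for.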

Example - Levocabastine in Renal Patients

• 2 Groups: Non-Dialysis/Hemodialysis ($n_1 = n_2 = 6$)

• Outcome: Levocabastine AUC (1 Outlier/Group)

AUC values, with ranks in parentheses:

Non-Dialysis   Hemodialysis
857 (12)       527 (7)
567 (9)        740 (11)
626 (10)       392 (2.5)
532 (8)        514 (6)
444 (5)        433 (4)
357 (1)        392 (2.5)
T1 = 45        T2 = 33

2-sided Test: Conclude Medians differ if $\min(T_1, T_2) \le 26$

Source: Zagornik, et al (1993)

Computer Output - SPSS

Ranks

AUC   GROUP          N    Mean Rank   Sum of Ranks
      Non-Dialysis   6    7.50        45.00
      Hemodialysis   6    5.50        33.00
      Total          12

Test Statistics(b)

                                  AUC
Mann-Whitney U                    12.000
Wilcoxon W                        33.000
Z                                 -.962
Asymp. Sig. (2-tailed)            .336
Exact Sig. [2*(1-tailed Sig.)]    .394(a)

a. Not corrected for ties.

b. Grouping Variable: GROUP
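The SPSS statistics can be tied back to the rank sums by hand. The worked check below assumes the usual relation between the Mann-Whitney U and the Wilcoxon rank sum W, and the tie-corrected Normal approximation with $N = n_1 + n_2 = 12$ and a single tie group of size $t = 2$ (the two observations equal to 392). These formulas are not stated on the slides, but they reproduce the reported values.

```latex
\begin{align*}
U &= W - \frac{n_2(n_2+1)}{2} = 33 - \frac{6(7)}{2} = 12 \\
E(W) &= \frac{n_2(n_1+n_2+1)}{2} = \frac{6(13)}{2} = 39 \\
\mathrm{Var}(W) &= \frac{n_1 n_2}{12}\left[(N+1) - \frac{\sum (t^3 - t)}{N(N-1)}\right]
   = \frac{36}{12}\left[13 - \frac{6}{132}\right] \approx 38.86 \\
Z &= \frac{W - E(W)}{\sqrt{\mathrm{Var}(W)}} = \frac{33 - 39}{\sqrt{38.86}} \approx -0.962
\end{align*}
```

With $Z = -0.962$ the 2-tailed Normal P-value is about 0.336, so neither the asymptotic nor the exact P-value gives evidence that the medians differ.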

Inference Based on Paired Samples (Crossover Designs)

• Setting: Each treatment is applied to each subject or pair (preferably in random order)

• Data: di is the difference in scores (Trt1-Trt2) for subject (pair) i

• Parameter: $\mu_D$ - Population mean difference

• Sample Statistics:

$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} \qquad s_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1} \qquad s_d = \sqrt{s_d^2}$$

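Test Concerning $\mu_D$

• Null Hypothesis: $H_0: \mu_D = \Delta_0$ (almost always 0)

• Alternative Hypotheses:

– 1-Sided: $H_A: \mu_D > \Delta_0$

– 2-Sided: $H_A: \mu_D \neq \Delta_0$

• Test Statistic:

$$t_{obs} = \frac{\bar{d} - \Delta_0}{s_d / \sqrt{n}}$$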

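Test Concerning $\mu_D$

• Decision Rule: (Based on t-distribution with $\nu = n - 1$ df)

– 1-sided alternative ($H_A: \mu_D > \Delta_0$)

• If $t_{obs} \ge t_{\alpha,\nu}$ ==> Conclude $\mu_D > \Delta_0$

• If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_D = \Delta_0$

– 2-sided alternative ($H_A: \mu_D \neq \Delta_0$)

• If $t_{obs} \ge t_{\alpha/2,\nu}$ ==> Conclude $\mu_D > \Delta_0$

• If $t_{obs} \le -t_{\alpha/2,\nu}$ ==> Conclude $\mu_D < \Delta_0$

• If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_D = \Delta_0$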

Confidence Interval for $\mu_D$

$$\bar{d} \pm t_{\alpha/2,\nu}\,\frac{s_d}{\sqrt{n}} \qquad (\nu = n - 1)$$
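The paired test statistic and confidence interval can be computed from the three summary numbers $\bar{d}$, $s_d$ and $n$. The sketch below uses only the standard library and, as with any t-based rule, assumes the critical value `t_crit` $= t_{\alpha/2,\,n-1}$ is read from a t table and passed in. The function name `paired_t_summary` is illustrative. The sample call uses the summary statistics from the contraceptive patch example in these notes.

```python
from math import sqrt


def paired_t_summary(dbar, sd, n, t_crit, delta0=0.0):
    """Paired t statistic, CI and 2-sided decision from summary statistics.

    Illustrative helper; t_crit is the table value t_{alpha/2, n-1}.
    """
    se = sd / sqrt(n)                 # standard error of the mean difference
    t_obs = (dbar - delta0) / se
    ci = (dbar - t_crit * se, dbar + t_crit * se)
    reject = abs(t_obs) >= t_crit     # 2-sided rejection region
    return t_obs, ci, reject


# O.C. minus EVRA ease-of-use scores: dbar = 1.77, s_d = 1.48, n = 13,
# with t_{.025,12} = 2.179
t_obs, ci, reject = paired_t_summary(1.77, 1.48, 13, t_crit=2.179)
print(round(t_obs, 2), reject)
```

Starting from raw paired data instead, one would first form the differences $d_i$ and then obtain $\bar{d}$ and $s_d$ with `statistics.mean` and `statistics.stdev`.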

Example - Evaluation of Transdermal Contraceptive Patch In Adolescents

• Subjects: Adolescent Females on O.C. who then received Ortho Evra Patch

• Response: 5-point scores on ease of use for each type of contraception (1=Strongly Agree)

• Data: di = difference (O.C.-EVRA) for subject i

• Summary Statistics:

$\bar{d} = 1.77$, $s_d = 1.48$, $n = 13$

Source: Rubinstein, et al (2004)

Example - Evaluation of Transdermal Contraceptive Patch In Adolescents

• 2-sided test for differences in ease of use ($\alpha = 0.05$)

• $H_0: \mu_D = 0$ $H_A: \mu_D \neq 0$

$$TS: \; t_{obs} = \frac{1.77}{1.48/\sqrt{13}} = \frac{1.77}{0.41} = 4.31$$

$$RR: \; |t_{obs}| \ge t_{.025,12} = 2.179$$

$$95\%\ CI: \; 1.77 \pm 2.179(0.41) \equiv 1.77 \pm 0.89 \equiv (0.88, 2.66)$$

Conclude mean scores are higher for O.C.; the girls find the patch easier to use (low scores are better)

Small-Sample Test For Nonnormal Data

• Paired Samples (Crossover Design)

• Procedure (Wilcoxon Signed-Rank Test)

– Compute Differences $d_i$ (as in the paired t-test) and obtain their absolute values (ignoring 0s)

– Rank the observations by $|d_i|$ (smallest=1), averaging ranks for ties

– Compute $T^+$ and $T^-$, the rank sums for the positive and negative differences, respectively

– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T^- \le T_0$

– 2-sided tests: Conclude $H_A: M_1 \neq M_2$ if $\min(T^+, T^-) \le T_0$

– Values of $T_0$ are given in many texts for various sample sizes and significance levels. P-values printed by statistical software packages.
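The signed-rank steps (difference, drop zeros, rank the absolute values with averaged ties, then sum ranks by sign) can be sketched as below. This is a minimal standard-library illustration; the function name `signed_rank_sums` is hypothetical, and $T_0$ is still assumed to come from a published table. The sample call uses the contrast-to-noise values from the MRI example in these notes (Previous minus New).

```python
def signed_rank_sums(x1, x2):
    """Return (T_plus, T_minus) for paired samples x1, x2 (d = x1 - x2)."""
    # Differences, ignoring pairs with zero difference
    diffs = [a - b for a, b in zip(x1, x2) if a != b]
    abs_sorted = sorted(abs(d) for d in diffs)

    def avg_rank(value):
        first = abs_sorted.index(value) + 1   # 1-based rank of first occurrence
        ties = abs_sorted.count(value)        # size of the tie group
        return first + (ties - 1) / 2.0

    t_plus = sum(avg_rank(abs(d)) for d in diffs if d > 0)
    t_minus = sum(avg_rank(abs(d)) for d in diffs if d < 0)
    return t_plus, t_minus


previous = [20, 31, 20, 19, 40, 28, 10]
new = [36, 37, 27, 32, 48, 40, 25]
t_plus, t_minus = signed_rank_sums(previous, new)

t0 = 2  # 2-sided table value for n = 7, alpha = 0.05, as quoted in the example
print(t_plus, t_minus, min(t_plus, t_minus) <= t0)  # 0 28.0 True
```

Which sample is subtracted from which only swaps the roles of $T^+$ and $T^-$; the 2-sided decision based on $\min(T^+, T^-)$ is unchanged.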

Example - New MRI for 3D Coronary Angiography

• Previous vs new Magnetization Prep Schemes (n=7)

• Response: Blood/Myocardium Contrast-Noise-Ratio

Subject   Previous   New   Diff=Pre-New   |Diff|   Rank(|Diff|)
A         20         36    -16            16       7
B         31         37    -6             6        1
C         20         27    -7             7        2
D         19         32    -13            13       5
E         40         48    -8             8        3
F         28         40    -12            12       4
G         10         25    -15            15       6

• All Differences are negative, $T^- = 1 + 2 + \dots + 7 = 28$, $T^+ = 0$

• From tables for 2-sided tests, $n = 7$, $\alpha = 0.05$, $T_0 = 2$

• Since $\min(0, 28) = 0 \le 2$, Conclude the scheme means differ

Source: Nguyen, et al (2004)

Computer Output - SPSS

Ranks

NEW - PREVIOUS   N      Mean Rank   Sum of Ranks
Negative Ranks   0(a)   .00         .00
Positive Ranks   7(b)   4.00        28.00
Ties             0(c)
Total            7

a. NEW < PREVIOUS

b. NEW > PREVIOUS

c. NEW = PREVIOUS

Test Statistics(b)

                         NEW - PREVIOUS
Z                        -2.366(a)
Asymp. Sig. (2-tailed)   .018

a. Based on negative ranks.

b. Wilcoxon Signed Ranks Test

Note that SPSS is taking NEW-PREVIOUS in top table
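The Z value in the output is consistent with the standard large-sample Normal approximation for the signed-rank statistic. As a worked check, assuming no ties and no continuity correction (the formula is not given on the slides), with $n = 7$ and the smaller rank sum $T = 0$:

```latex
\begin{align*}
E(T) &= \frac{n(n+1)}{4} = \frac{7(8)}{4} = 14 \\
\mathrm{Var}(T) &= \frac{n(n+1)(2n+1)}{24} = \frac{7(8)(15)}{24} = 35 \\
Z &= \frac{T - E(T)}{\sqrt{\mathrm{Var}(T)}} = \frac{0 - 14}{\sqrt{35}} \approx -2.366
\end{align*}
```

The corresponding 2-tailed Normal P-value is about 0.018, matching the "Asymp. Sig." line; with only 7 pairs the table-based rule using $T_0$ is the more appropriate small-sample procedure.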
