Comparison of 2 Population Means
• Goal: To compare 2 populations/treatments with respect to a numeric outcome
• Sampling Design: Independent Samples (Parallel Groups) vs Paired Samples (Crossover Design)
• Data Structure: Normal vs Non-normal
• Sample Sizes: Large (n1,n2>20) vs Small
Independent Samples
• Units in the two samples are different
• Sample sizes may or may not be equal
• Large-sample inference based on Normal Distribution (Central Limit Theorem)
• Small-sample inference depends on distribution of individual outcomes (Normal vs non-Normal)
Parameters/Estimates (Independent Samples)
• Parameter: $\mu_1 - \mu_2$
• Estimator: $\bar{Y}_1 - \bar{Y}_2$
• Estimated standard error: $\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$
• Shape of sampling distribution:
– Normal if data are normal
– Approximately normal if n1,n2>20
– Non-normal otherwise (typically)
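Large-Sample Test of $\mu_1 - \mu_2$
• Null hypothesis: The population means differ by $\Delta_0$ (which is typically 0): $H_0: \mu_1 - \mu_2 = \Delta_0$
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_1 - \mu_2 > \Delta_0$
– 2-Sided: $H_A: \mu_1 - \mu_2 \neq \Delta_0$
• Test Statistic:
$$z_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$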
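Large-Sample Test of $\mu_1 - \mu_2$
• Decision Rule:
– 1-sided alternative
• If $z_{obs} \ge z_\alpha$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$
• If $z_{obs} < z_\alpha$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$
– 2-sided alternative
• If $z_{obs} \ge z_{\alpha/2}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$
• If $z_{obs} \le -z_{\alpha/2}$ ==> Conclude $\mu_1 - \mu_2 < \Delta_0$
• If $-z_{\alpha/2} < z_{obs} < z_{\alpha/2}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$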
Large-Sample Test of $\mu_1 - \mu_2$
• Observed Significance Level (P-Value)
– 1-sided alternative
• $P = P(Z \ge z_{obs})$ (from the standard Normal distribution)
– 2-sided alternative
• $P = 2P(Z \ge |z_{obs}|)$ (from the standard Normal distribution)
• If P-Value $\le \alpha$, then reject the null hypothesis
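The test statistic, decision rule and P-value above can be computed from summary statistics alone. This is a minimal sketch assuming only the Python standard library (`math.erfc` supplies the standard Normal tail area); the function names `upper_tail_normal` and `large_sample_z_test` are hypothetical, not part of the slides. The usage line plugs in the summary statistics of the Vitamin C example that follows.

```python
import math


def upper_tail_normal(z):
    """P(Z >= z) for a standard Normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def large_sample_z_test(y1, s1, n1, y2, s2, n2, delta0=0.0, two_sided=True):
    """Large-sample z test of H0: mu1 - mu2 = delta0 from summary statistics.

    The 1-sided version tests HA: mu1 - mu2 > delta0. Returns (z_obs, p_value).
    """
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)          # estimated standard error
    z_obs = ((y1 - y2) - delta0) / se
    if two_sided:
        p_value = 2 * upper_tail_normal(abs(z_obs))  # P = 2P(Z >= |z_obs|)
    else:
        p_value = upper_tail_normal(z_obs)           # P = P(Z >= z_obs)
    return z_obs, p_value


# Vitamin C example: placebo (group 1) vs ascorbic acid (group 2)
z_obs, p_value = large_sample_z_test(2.2, 0.12, 155, 1.9, 0.10, 208)
print(f"z_obs = {z_obs:.1f}, p = {p_value:.3g}")
```

Rejecting when the P-value is at most $\alpha$ is equivalent to the critical-value rule, so either output can drive the decision.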
Large-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$
• Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this rule would provide an interval that contains the true parameter value if it were applied over all possible samples
• Rule:
$$(\bar{y}_1 - \bar{y}_2) \pm z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
Large-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$
• For 95% Confidence Intervals, $z_{.025} = 1.96$
• Confidence Intervals and 2-sided tests (with $\Delta_0 = 0$) give identical conclusions at the same $\alpha$-level:
– If the entire interval is above 0, conclude $\mu_1 > \mu_2$
– If the entire interval is below 0, conclude $\mu_1 < \mu_2$
– If the interval contains 0, do not reject $\mu_1 = \mu_2$
Example: Vitamin C for Common Cold
• Outcome: Number of Colds During Study Period for Each Student
• Group 1: Given Placebo
• Group 2: Given Ascorbic Acid (Vitamin C)
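• Summary Statistics:
– Group 1 (Placebo): $\bar{y}_1 = 2.2$, $s_1 = 0.12$, $n_1 = 155$
– Group 2 (Vitamin C): $\bar{y}_2 = 1.9$, $s_2 = 0.10$, $n_2 = 208$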
Source: Pauling (1971)
2-Sided Test to Compare Groups
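• $H_0: \mu_1 - \mu_2 = 0$ (No difference in treatment effects)
• $H_A: \mu_1 - \mu_2 \neq 0$ (Difference in treatment effects)
• Test Statistic:
$$z_{obs} = \frac{(2.2 - 1.9) - 0}{\sqrt{\dfrac{(0.12)^2}{155} + \dfrac{(0.10)^2}{208}}} = \frac{0.3}{0.0119} = 25.3$$
• Decision Rule ($\alpha = 0.05$): Conclude $\mu_1 - \mu_2 > 0$ since $z_{obs} = 25.3 > z_{.025} = 1.96$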
95% Confidence Interval for $\mu_1 - \mu_2$
• Point Estimate: $\bar{y}_1 - \bar{y}_2 = 2.2 - 1.9 = 0.3$
• Estimated Std. Error: $\sqrt{\dfrac{(0.12)^2}{155} + \dfrac{(0.10)^2}{208}} = 0.0119$
• Critical Value: $z_{.025} = 1.96$
• 95% CI: $0.30 \pm 1.96(0.0119) \equiv 0.30 \pm 0.023 \to (0.277,\ 0.323)$; the entire interval is > 0
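The interval arithmetic above can be checked in a few lines. A minimal sketch using only the standard library, with the critical value $z_{.025} = 1.96$ hard-coded as on the slide; the variable names are illustrative.

```python
import math

# Vitamin C example summary statistics (Pauling, 1971)
y1, s1, n1 = 2.2, 0.12, 155   # placebo
y2, s2, n2 = 1.9, 0.10, 208   # ascorbic acid
z_crit = 1.96                 # z_.025 for a 95% interval

diff = y1 - y2                                   # point estimate
se = math.sqrt(s1**2 / n1 + s2**2 / n2)          # estimated std. error
margin = z_crit * se
lower, upper = diff - margin, diff + margin
print(f"SE = {se:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the whole interval lies above 0, it agrees with the 2-sided z test rejecting $H_0$ at $\alpha = 0.05$.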
Small-Sample Test for Normal Populations
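• Case 1: Common Variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$)
• Null Hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_1 - \mu_2 > \Delta_0$
– 2-Sided: $H_A: \mu_1 - \mu_2 \neq \Delta_0$
• Test Statistic (where $S_p^2$ is a "pooled" estimate of $\sigma^2$):
$$t_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$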
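The pooled-variance statistic can be sketched directly from the two formulas above. This assumes summary statistics as inputs; the function name `pooled_t` is hypothetical. The usage line applies it to the Scalp Wound Closure summary statistics given later.

```python
import math


def pooled_t(y1, s1, n1, y2, s2, n2, delta0=0.0):
    """Pooled-variance two-sample t statistic (assumes sigma1^2 == sigma2^2).

    Returns (t_obs, sp2, df) with df = n1 + n2 - 2.
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled estimate of sigma^2
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))            # S_p * sqrt(1/n1 + 1/n2)
    t_obs = ((y1 - y2) - delta0) / se
    return t_obs, sp2, df


t_obs, sp2, df = pooled_t(96.92, 7.51, 15, 96.31, 8.06, 16)
print(f"Sp^2 = {sp2:.2f}, t_obs = {t_obs:.2f}, df = {df}")
```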
Small-Sample Test for Normal Populations
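• Decision Rule: (Based on t-distribution with $\nu = n_1 + n_2 - 2$ df)
– 1-sided alternative
• If $t_{obs} \ge t_{\alpha,\nu}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$
• If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$
– 2-sided alternative
• If $t_{obs} \ge t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$
• If $t_{obs} \le -t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 < \Delta_0$
• If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$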
Small-Sample Test for Normal Populations
• Observed Significance Level (P-Value): special tables are needed; statistical software packages print it
– 1-sided alternative
• $P = P(t \ge t_{obs})$ (from the t distribution)
– 2-sided alternative
• $P = 2P(t \ge |t_{obs}|)$ (from the t distribution)
• If P-Value $\le \alpha$, then reject the null hypothesis
Small-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$ - Normal Populations
• Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this rule would provide an interval that contains the true parameter value if it were applied over all possible samples
• Rule:
$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
• Interpretations same as for large-sample CIs
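Small-Sample Inference for Normal Populations
• Case 2: $\sigma_1^2 \neq \sigma_2^2$
• Don't pool variances; use the unpooled standard error in place of $S_p\sqrt{1/n_1 + 1/n_2}$:
$$S_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
• Use "adjusted" degrees of freedom (Satterthwaite's Approximation):
$$\nu^* = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}$$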
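The unpooled statistic and Satterthwaite's degrees of freedom can be sketched as follows, testing $\Delta_0 = 0$. The function name `welch_t` is hypothetical. For comparison only (the slides analyze that example with the pooled test), the usage line applies it to the Scalp Wound Closure summary statistics below.

```python
import math


def welch_t(y1, s1, n1, y2, s2, n2):
    """Unpooled t statistic (Delta_0 = 0) and Satterthwaite's approximate df."""
    v1, v2 = s1**2 / n1, s2**2 / n2           # per-group squared standard errors
    t_obs = (y1 - y2) / math.sqrt(v1 + v2)
    df_star = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_obs, df_star


t_obs, df_star = welch_t(96.92, 7.51, 15, 96.31, 8.06, 16)
print(f"t_obs = {t_obs:.3f}, nu* = {df_star:.1f}")
```

Here $\nu^* \approx 29$ and $t_{obs} \approx 0.22$, so the pooled and unpooled analyses agree closely for these data. With equal sample sizes and equal standard deviations $\nu^*$ reduces exactly to $n_1 + n_2 - 2$; a common conservative practice is to round $\nu^*$ down before consulting a t table.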
Example - Scalp Wound Closure
• Groups: Stapling (n1=15) / Suturing (n2=16)
• Outcome: Physician Reported VAS Score at 1-Year
               Stapling (i=1)   Suturing (i=2)
Mean           96.92            96.31
Std Dev        7.51             8.06
Sample Size    15               16
• Conduct a 2-sided test of whether mean scores differ
• Construct a 95% Confidence Interval for true difference
Source: Khan, et al (2002)
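Example - Scalp Wound Closure
• $H_0: \mu_1 - \mu_2 = 0$   $H_A: \mu_1 - \mu_2 \neq 0$   ($\alpha = 0.05$)
• Pooled variance: $S_p^2 = \dfrac{(15-1)(7.51)^2 + (16-1)(8.06)^2}{15 + 16 - 2} = 60.83$
• Test Statistic: $t_{obs} = \dfrac{96.92 - 96.31}{\sqrt{60.83\left(\frac{1}{15} + \frac{1}{16}\right)}} = \dfrac{0.61}{2.80} = 0.22$
• Rejection Region: $|t_{obs}| \ge t_{.025,29} = 2.045$
• 95% CI: $0.61 \pm 2.045(2.80) \equiv 0.61 \pm 5.73 \to (-5.12,\ 6.34)$
• No significant difference between the 2 methods: $|t_{obs}| = 0.22 < 2.045$, and the CI contains 0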
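The full calculation for this example can be reproduced as a check. A minimal sketch using only the standard library; since it has no t quantile function, the critical value $t_{.025,29} = 2.045$ is taken from the slide rather than computed.

```python
import math

# Scalp wound closure summary statistics (Khan, et al, 2002)
y1, s1, n1 = 96.92, 7.51, 15   # stapling
y2, s2, n2 = 96.31, 8.06, 16   # suturing
t_crit = 2.045                 # t_.025,29 from a t table (not computed here)

sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)   # pooled variance
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
diff = y1 - y2
t_obs = diff / se
reject = abs(t_obs) >= t_crit                                 # 2-sided rejection region
lower, upper = diff - t_crit * se, diff + t_crit * se
print(f"t_obs = {t_obs:.2f}, reject H0: {reject}, 95% CI = ({lower:.2f}, {upper:.2f})")
```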
Small Sample Test to Compare Two Medians - Nonnormal Populations
• Two Independent Samples (Parallel Groups)
• Procedure (Wilcoxon Rank-Sum Test):
– Rank measurements across samples from smallest (1) to largest (n1+n2). Ties take average ranks.
– Obtain the rank sum for each group (T1 , T2 )
– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T_2 \le T_0$
– 2-sided tests: Conclude $H_A: M_1 \neq M_2$ if $\min(T_1, T_2) \le T_0$
– Values of T0 are given in many texts for various sample sizes and significance levels. P-values printed by statistical software packages.
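The ranking step (with average ranks for ties) and the two rank sums can be sketched in plain Python. The helper names `average_ranks` and `rank_sums` are hypothetical; the usage lines apply them to the Levocabastine AUC data in the example that follows, with $T_0 = 26$ taken from that slide.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with the one at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sums(sample1, sample2):
    """Wilcoxon rank-sum statistics (T1, T2) from the combined ranking."""
    ranks = average_ranks(list(sample1) + list(sample2))
    return sum(ranks[: len(sample1)]), sum(ranks[len(sample1):])


non_dialysis = [857, 567, 626, 532, 444, 357]
hemodialysis = [527, 740, 392, 514, 433, 392]
t1, t2 = rank_sums(non_dialysis, hemodialysis)
print(t1, t2, min(t1, t2) <= 26)   # last value: conclude medians differ?
```

This gives T1 = 45 and T2 = 33, matching the slide; since min(T1, T2) = 33 > 26, the 2-sided test does not conclude that the medians differ.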
Example - Levocabastine in Renal Patients
AUC (combined rank)
Non-Dialysis   Hemodialysis
857 (12)       527 (7)
567 (9)        740 (11)
626 (10)       392 (2.5)
532 (8)        514 (6)
444 (5)        433 (4)
357 (1)        392 (2.5)
T1 = 45        T2 = 33
• 2 Groups: Non-Dialysis/Hemodialysis (n1 = n2 = 6)
• Outcome: Levocabastine AUC (1 Outlier/Group)
2-sided Test: Conclude medians differ if $\min(T_1, T_2) \le 26$
Source: Zagornik, et al (1993)
Computer Output - SPSS
Ranks (AUC)
GROUP          N    Mean Rank   Sum of Ranks
Non-Dialysis   6    7.50        45.00
Hemodialysis   6    5.50        33.00
Total          12

Test Statistics(b) (AUC)
Mann-Whitney U                    12.000
Wilcoxon W                        33.000
Z                                 -.962
Asymp. Sig. (2-tailed)            .336
Exact Sig. [2*(1-tailed Sig.)]    .394(a)
a. Not corrected for ties.
b. Grouping Variable: GROUP
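The Mann-Whitney U, Z and asymptotic P-value in this output can be approximately reproduced from W with the Normal approximation to the rank-sum distribution. A minimal sketch, assuming the usual tie-corrected variance $\frac{n_1 n_2}{12}\left[(N+1) - \frac{\sum (t^3 - t)}{N(N-1)}\right]$, where $t$ runs over the sizes of tied groups; the function name `rank_sum_normal_approx` is hypothetical.

```python
import math


def rank_sum_normal_approx(w, n1, n2, tie_sizes=()):
    """Normal approximation for a Wilcoxon rank sum W from the group of size n1.

    tie_sizes lists the size of each group of tied observations.
    Returns (U, z, two_sided_p).
    """
    n = n1 + n2
    u = w - n1 * (n1 + 1) / 2                     # Mann-Whitney U for the same group
    mean_w = n1 * (n + 1) / 2
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term)   # tie-corrected variance of W
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))          # 2 * P(Z >= |z|)
    return u, z, p


# W = 33 (hemodialysis), n1 = n2 = 6, one pair of tied values (the two 392s)
u, z, p = rank_sum_normal_approx(33, 6, 6, tie_sizes=[2])
print(f"U = {u:.0f}, Z = {z:.3f}, p = {p:.3f}")
```

This yields U = 12, Z ≈ -0.962 and P ≈ 0.336, in line with the SPSS output; the exact significance (.394) comes from the exact null distribution and is not reproduced here.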
Inference Based on Paired Samples (Crossover Designs)
• Setting: Each treatment is applied to each subject or pair (preferably in random order)
• Data: di is the difference in scores (Trt1-Trt2) for subject (pair) i
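• Parameter: $\mu_D$ - Population mean difference
• Sample Statistics:
$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}, \qquad s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}}, \qquad s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$$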
Test Concerning $\mu_D$
• Null Hypothesis: $H_0: \mu_D = 0$ (almost always 0)
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_D > 0$
– 2-Sided: $H_A: \mu_D \neq 0$
• Test Statistic:
$$t_{obs} = \frac{\bar{d}}{s_d / \sqrt{n}}$$
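The sample statistics and paired t statistic can be sketched from the raw differences $d_i$. This uses `statistics.stdev` from the standard library (divisor $n - 1$); the function name `paired_t` is hypothetical. Purely to illustrate the arithmetic, the usage lines reuse the Previous - New differences from the MRI example later in this section (which the slides analyze with the signed-rank test instead).

```python
import math
import statistics


def paired_t(diffs):
    """Paired t statistic for H0: mu_D = 0 from the per-subject differences d_i.

    Returns (d_bar, s_d, t_obs, df) with df = n - 1.
    """
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)          # sample SD with divisor n - 1
    se = s_d / math.sqrt(n)                # s_dbar = s_d / sqrt(n)
    return d_bar, s_d, d_bar / se, n - 1


diffs = [-16, -6, -7, -13, -8, -12, -15]
d_bar, s_d, t_obs, df = paired_t(diffs)
print(d_bar, s_d, round(t_obs, 2), df)
```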
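Test Concerning $\mu_D$
Decision Rule: (Based on t-distribution with $\nu = n - 1$ df)
– 1-sided alternative
• If $t_{obs} \ge t_{\alpha,\nu}$ ==> Conclude $\mu_D > 0$
• If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_D = 0$
– 2-sided alternative
• If $t_{obs} \ge t_{\alpha/2,\nu}$ ==> Conclude $\mu_D > 0$
• If $t_{obs} \le -t_{\alpha/2,\nu}$ ==> Conclude $\mu_D < 0$
• If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_D = 0$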
Confidence Interval for $\mu_D$
$$\bar{d} \pm t_{\alpha/2,\,n-1}\,\frac{s_d}{\sqrt{n}}$$
Example - Evaluation of Transdermal Contraceptive Patch In Adolescents
• Subjects: Adolescent Females on O.C. who then received Ortho Evra Patch
• Response: 5-point scores on ease of use for each type of contraception (1=Strongly Agree)
• Data: di = difference (O.C.-EVRA) for subject i
• Summary Statistics: $\bar{d} = 1.77$, $s_d = 1.48$, $n = 13$
Source: Rubinstein, et al (2004)
Example - Evaluation of Transdermal Contraceptive Patch In Adolescents
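• 2-sided test for differences in ease of use ($\alpha = 0.05$)
• $H_0: \mu_D = 0$   $H_A: \mu_D \neq 0$
• Test Statistic: $t_{obs} = \dfrac{1.77}{1.48/\sqrt{13}} = \dfrac{1.77}{0.41} = 4.31$
• Rejection Region: $|t_{obs}| \ge t_{.025,12} = 2.179$
• 95% CI: $1.77 \pm 2.179(0.41) \equiv 1.77 \pm 0.89 \to (0.88,\ 2.66)$
• Conclude mean scores are higher for O.C.; since low scores are better (1 = Strongly Agree), girls find the Patch easier to use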
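The paired-sample arithmetic for this example can be checked from the summary statistics alone. A minimal sketch; the critical value $t_{.025,12} = 2.179$ is taken from the slide because the standard library has no t quantile function.

```python
import math

d_bar, s_d, n = 1.77, 1.48, 13     # summary statistics (O.C. - EVRA differences)
t_crit = 2.179                     # t_.025,12 from a t table

se = s_d / math.sqrt(n)            # s_dbar
t_obs = d_bar / se
lower, upper = d_bar - t_crit * se, d_bar + t_crit * se
print(f"SE = {se:.2f}, t_obs = {t_obs:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```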
Small-Sample Test For Nonnormal Data
• Paired Samples (Crossover Design)
• Procedure (Wilcoxon Signed-Rank Test):
– Compute differences $d_i$ (as in the paired t-test) and obtain their absolute values, discarding any zero differences
– Rank the observations by $|d_i|$ (smallest = 1), averaging ranks for ties
– Compute $T^+$ and $T^-$, the rank sums for the positive and negative differences, respectively
– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T^- \le T_0$
– 2-sided tests: Conclude $H_A: M_1 \neq M_2$ if $\min(T^+, T^-) \le T_0$
– Values of T0 are given in many texts for various sample sizes and significance levels. P-values printed by statistical software packages.
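The signed-rank procedure above can be sketched in plain Python. The function name `signed_rank_sums` is hypothetical; the usage lines apply it to the MRI contrast-noise data in the example that follows, with differences taken as Previous - New.

```python
def signed_rank_sums(diffs):
    """Wilcoxon signed-rank sums (T_plus, T_minus); zero differences are dropped."""
    nonzero = [d for d in diffs if d != 0]
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of equal |d| values starting at sorted position i
        while j + 1 < len(order) and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average rank for ties
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    return t_plus, t_minus


previous = [20, 31, 20, 19, 40, 28, 10]
new = [36, 37, 27, 32, 48, 40, 25]
diffs = [p - q for p, q in zip(previous, new)]   # Diff = Pre - New
t_plus, t_minus = signed_rank_sums(diffs)
print(t_plus, t_minus)
```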
Example - New MRI for 3D Coronary Angiography
• Previous vs new Magnetization Prep Schemes (n=7)
• Response: Blood/Myocardium Contrast-Noise-Ratio
Subject   Previous   New   Diff=Pre-New   |Diff|   Rank(|Diff|)
A         20         36    -16            16       7
B         31         37    -6             6        1
C         20         27    -7             7        2
D         19         32    -13            13       5
E         40         48    -8             8        3
F         28         40    -12            12       4
G         10         25    -15            15       6
• All differences are negative: $T^- = 1 + 2 + \cdots + 7 = 28$, $T^+ = 0$
• From tables for 2-sided tests with $n = 7$, $\alpha = 0.05$: $T_0 = 2$
• Since $\min(0, 28) = 0 \le 2$, conclude the two schemes differ ($M_1 \neq M_2$)
Source: Nguyen, et al (2004)
Computer Output - SPSS
Ranks (NEW - PREVIOUS)
                 N      Mean Rank   Sum of Ranks
Negative Ranks   0(a)   .00         .00
Positive Ranks   7(b)   4.00        28.00
Ties             0(c)
Total            7
a. NEW < PREVIOUS
b. NEW > PREVIOUS
c. NEW = PREVIOUS

Test Statistics(b) (NEW - PREVIOUS)
Z                        -2.366(a)
Asymp. Sig. (2-tailed)   .018
a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test

Note that SPSS takes NEW - PREVIOUS in the top table, the reverse of the Pre - New differences used above, so its "Positive Ranks" sum (28) corresponds to $T^-$ in the hand calculation.
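The SPSS Z and asymptotic P-value can be approximately reproduced with the Normal approximation to the signed-rank sum. A minimal sketch, assuming the usual null mean $n(n+1)/4$ and variance $n(n+1)(2n+1)/24$ with no tie correction (there are no ties here); the function name `signed_rank_normal_approx` is hypothetical.

```python
import math


def signed_rank_normal_approx(t, n):
    """Normal approximation for a signed-rank sum T from n nonzero differences.

    No tie correction is applied. Returns (z, two_sided_p).
    """
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mean_t) / sd_t
    return z, math.erfc(abs(z) / math.sqrt(2))   # 2 * P(Z >= |z|)


z, p = signed_rank_normal_approx(0, 7)   # smaller rank sum is 0, n = 7
print(f"Z = {z:.3f}, p = {p:.3f}")
```

This gives Z ≈ -2.366 and P ≈ 0.018, matching the output above.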