independent 6.2 two samples dependent (paired,...

23
Ismor Fischer, 5/29/2012 6.2-1 1 X 2 X Null Distribution 6.2 Two Samples § 6.2.1 Means First assume that the samples are randomly selected from two populations that are independent, i.e., no relation exists between individuals of one population and the other, relative to the random variable, or any lurking or confounding variables that might have an effect on this variable. Model : Phase III Randomized Clinical Trial (RCT) Measuring the effect of treatment (e.g., drug) versus control (e.g., placebo) on a response variable X, to determine if there is any significant difference between them. Then… CLT So… 1 X 2 X ~ N µ 1 µ 2 , σ 1 2 n 1 + σ 2 2 n 2 Comments : Recall from 4.1 : If Y 1 and Y 2 are independent , then Var(Y 1 Y 2 ) = Var(Y 1 ) + Var(Y 2 ). If n 1 = n 2 , the samples are said to be (numerically) balanced. The null hypothesis H 0 : µ 1 µ 2 = 0 can be replaced by H 0 : µ 1 µ 2 = µ 0 if necessary, in order to compare against a specific constant difference µ 0 (e.g., 10 cholesterol points), with the corresponding modifications below. s.e. = σ 1 2 n 1 + σ 2 2 n 2 can be replaced by s.e. = s 1 2 n 1 + s 2 2 n 2 , provided n 1 30, n 2 30. Independent Dependent (Paired, Matched) Control Arm Treatment Arm Assume X 1 ~ N(µ 1 , σ 1 ) Assume X 2 ~ N(µ 2 , σ 2 ) Sample, size n 1 Sample, size n 2 1 X ~ N µ 1 , σ 1 n 1 2 X ~ N µ 2 , σ 2 n 2 H 0 : µ 1 µ 2 = 0 (There is no difference in mean response between the two populations.) X X 1 X 2 0

Upload: others

Post on 19-Aug-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-1

1X − 2X

Null Distribution

6.2 Two Samples

§ 6.2.1 Means

First assume that the samples are randomly selected from two populations that are independent, i.e., no relation exists between individuals of one population and the other, relative to the random variable, or any lurking or confounding variables that might have an effect on this variable. Model: Phase III Randomized Clinical Trial (RCT)

Measuring the effect of treatment (e.g., drug) versus control (e.g., placebo) on a response variable X, to determine if there is any significant difference between them.

Then… ↓ CLT ↓

So… 1X − 2X ~ N

µ1 − µ2, σ1

2

n1 + σ2

2

n2

Comments:

Recall from 4.1: If Y1 and Y2 are independent, then Var(Y1 − Y2) = Var(Y1) + Var(Y2).

If n1 = n2, the samples are said to be (numerically) balanced.

The null hypothesis H0: µ1 − µ2 = 0 can be replaced by H0: µ1 − µ2 = µ0 if necessary, in order to compare against a specific constant difference µ0 (e.g., 10 cholesterol points), with the corresponding modifications below.

s.e. = σ1

2

n1 + σ2

2

n2 can be replaced by s.e. =

s12

n1 +

s22

n2, provided n1 ≥ 30, n2 ≥ 30.

Independent

Dependent (Paired, Matched)

Control Arm Treatment Arm

Assume

X1 ~ N(µ1, σ1)

Assume

X2 ~ N(µ2, σ2)

Sample, size n1 Sample, size n2

1X ~ N

µ1, σ1

n1 2X ~ N

µ2, σ2

n2

H0: µ1 − µ2 = 0 (There is no difference in mean response between the two populations.)

X

X1 X2

0

Page 2: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-2

Test Statistic

Z = ( 1X − 2X ) − µ0

s1

2

n1 +

s22

n2

~ N(0, 1)

Example: X = “cholesterol level (mg/dL)” Test H0: µ1 − µ2 = 0 vs. HA: µ1 − µ2 ≠ 0 for significance at the α = .05 level.

s1

2

n1 =

120080 = 15,

s22

n2 =

60060 = 10 → s.e. =

s12

n1 +

s22

n2 = 25 = 5

95% Confidence Interval for µ1 − µ2

95% limits = 11 ± (1.96)(5) = 11 ± 9.8 ← margin of error ∴ 95% CI = (1.2, 20.8), which does not contain 0 ⇒ Reject H0. Drug works!

95% Acceptance Region for H0: µ1 − µ2 = 0

95% limits = 0 ± (1.96)(5) = ± 9.8 ← margin of error ∴ 95% AR = (−9.8, +9.8), which does not contain 11 ⇒ Reject H0. Drug works!

p-value = 2 P( 1X − 2X ≥ 11)

= 2 P

Z ≥ 11 − 0

5

= 2 P(Z ≥ 2.2) = 2(.0139) = .0278 < .05 = α ⇒ Reject H0. Drug works!

Placebo

Drug n1 = 80

1x = 240

s12 = 1200

n2 = 60

2x = 229

s22 = 600

→ 1x − 2x = 11

(1 − α) × 100% Acceptance Region for H0: µ1 − µ2 = µ0

µ0 − zα/2 s1

2

n1 +

s22

n2 , µ0 + zα/2

s12

n1 +

s22

n2

(1 − α) × 100% Confidence Interval for µ1 − µ2

1 2( )x x− − zα/2 s1

2

n1 +

s22

n2 , 1 2( )x x− + zα/2

s12

n1 +

s22

n2

Page 3: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-3

0.95

1X – 2X

0.025

–9.8 µ 1 – µ 2 = 0 9.8 11

0.025

0.0139 0.0139

1.2 20.8

0 is not in the 95% Confidence Interval

= (1.2, 20.8)

11 is not in the 95% Acceptance Region

= (–9.8, 9.8)

Null Distribution

1 2X X− ~ N(0, 5)

Page 4: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-4

Small samples: What if n1 < 30 and/or n2 < 30? Then use the t-distribution, provided… H0: σ1

2 = σ22 (equivariance, homoscedasticity)

Technically, this requires a formal test using the F-distribution; see next section (§ 6.2.2). However, an informal criterion is often used:

14 < F =

s12

s22 < 4 .

If equivariance is accepted, then the common value of σ12 and σ2

2 can be estimated by the weighted mean of s1

2 and s22, the pooled sample variance:

spooled2 =

df1 s12 + df2 s2

2 df1 + df2

, where df1 = n1 − 1 and df2 = n2 − 1,

i.e.,

spooled2 =

(n1 − 1) s12 + (n2 − 1) s2

2 n1 + n2 − 2 =

SSdf .

Therefore, in this case, we have s.e. = σ1

2

n1 + σ2

2

n2 estimated by

s.e. = spooled

2

n1 +

spooled2

n2

i.e., s.e. = spooled

2

1

n1 +

1n2

= spooled 1n1

+ 1n2

.

If equivariance (but not normality) is rejected, then an approximate t-test can be used, with the approximate degrees of freedom df given by

s1

2

n1 +

s22

n2

2

(s12/n1)2

n1 − 1 + (s2

2/n2)2

n2 − 1

.

This is known as the Smith-Satterwaithe Test. (Also used is the Welch Test.)

Page 5: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-5 Example: X = “cholesterol level (mg/dL)” Test H0: µ1 − µ2 = 0 vs. HA: µ1 − µ2 ≠ 0 for significance at the α = .05 level.

Pooled Variance

spooled2 =

(8 − 1)(775) + (10 − 1)(1175) 8 + 10 − 2 =

1600016 = 1000

↑ df

Note that spooled2 = 1000 is indeed between the variances s1

2 = 775 and s22 = 1175.

Standard Error

s.e. = 1000

1

8 + 110 = 15

Margin of Error = (2.120)(15) = 31.8 Critical Value t16, .025 = 2.120

Placebo

Drug

n1 = 8

1x = 230

s12 = 775

n2 = 10

2x = 200

s22 = 1175

→ 1x − 2x = 30

→ F = s12 / s2

2 = 0.66, which is between 0.25 and 4. Equivariance accepted ⇒ t-test

Page 6: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-6

Test Statistic

T = ( 1X − 2X ) − µ0

spooled2

1

n1 +

1n2

~ tdf

where df = n1 + n2 – 2

95% Confidence Interval for µ1 − µ2

95% limits = 30 ± 31.8 ← margin of error ∴ 95% CI = (−1.8, 61.8), which contains 0 ⇒ Accept H0.

95% Acceptance Region for H0: µ1 − µ2 = 0

95% limits = 0 ± 31.8 ← margin of error ∴ 95% AR = (−31.8, +31.8), which contains 30 ⇒ Accept H0.

p-value = 2 P( 1X − 2X ≥ 30)

= 2 P

T16 ≥ 30 − 0

15

= 2 P(T16 ≥ 2.0) = 2(.0314) = .0628 > .05 = α

⇒ Accept H0.

Once again, low sample size implies low power to reject the null hypothesis. The tests do not show significance, and we cannot conclude that the drug works, based on the data from these small samples. Perhaps a larger study is indicated…

(1 − α) × 100% Confidence Interval for µ1 − µ2

1 2( )x x− − tdf, α/2 spooled2

1

n1 +

1n2

, 1 2( )x x− + tdf, α/2 spooled2

1

n1 +

1n2

where df = n1 + n2 – 2

(1 − α) × 100% Acceptance Region for H0: µ1 − µ2 = µ0

µ0 − tdf, α/2 spooled2

1

n1 +

1n2

, µ0 + tdf, α/2 spooled2

1

n1 +

1n2

where df = n1 + n2 – 2

Page 7: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-7

Now consider the case where the two samples are dependent. That is, each observation in the first sample is paired, or matched, in a natural way on a corresponding observation in the second sample.

Examples:

Individuals may be matched on characteristics such as age, sex, race, and/or other variables that might confound the intended response.

Individuals may be matched on personal relations such as siblings (similar genetics, e.g., twin studies), spouses (similar environment), etc.

Observations may be connected physically (e.g., left arm vs. right arm), or connected in time (e.g., before treatment vs. after treatment).

Calculate the difference di = xi – yi of each matched pair of observations, thereby forming a single collapsed sample {d1, d2, d3, …, dn}, and apply the appropriate one-sample Z- or t- test to the equivalent null hypothesis H0: µD = 0.

Subtract…

Subtract…

Assume

X ~ N(µ1, σ1)

Assume

Y ~ N(µ2, σ2)

x1

x2

x3 . . .

xn

y1

y2

y3 . . .

yn

#

1

2

3 . . . n

Sample, size n

Sample, size n

D = X – Y ~ N(µ, σ)

where µD = µ1 – µ2

d1 = x1 – y1

d2 = x2 – y2

d3 = x3 – y3 . . .

dn = xn – yn

Sample, size n

H0: µ1 − µ2 = 0 H0: µD = 0

Page 8: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-8 Checks for normality include normal scores plot (probability plot, Q-Q plot), etc., just as with one sample. Remedies for non-normality include transformations (e.g., logarithmic or square root), or nonparametric tests.

Independent Samples: Wilcoxon Rank Sum Test (= Mann-Whitney U Test)

Dependent Samples: Sign Test, Wilcoxon Signed Rank Test (just as with one sample)

Page 9: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-9

Step-by-Step Hypothesis Testing Two Sample Means H0: µ1 – µ2 vs. 0

1 2 02 21 1 2 2

( )X XZn n

µ

σ σ

− −=

+

1 2

1 2 02 2

2 1 1 2 2

( )n n

Z X XT s n s n

µ+ −

− −=

+

1 2

2 21 1 2 2

1 2

1 2 02

1 2

( 1) ( 1)2

( )(1 1 )

n n

n s n sn n

X XTn n

µ+ −

− + −+ −

− −=

+

=

2pooled

2pooled

s

s

…GO TO PAGE 6.1-28

Use Z-test (with σ1, σ2)

Use t-test (with 2 2

1 2ˆ ˆσ σ= = 2

pooleds )

Use Z-test or t-test (with 11ˆ s=σ , 22ˆ s=σ )

Use a transformation, or a nonparametric test,

e.g., Wilcoxon Rank Sum Test

Independent or Paired?

Compute D = X1 – X2 for each i = 1, 2, …, n. Then calculate…

• sample mean ∑=

=n

iid

nd

1

1

• sample variance ∑=

−−

=n

iid dd

ns

1

22 )(1

1

… and GO TO “One Sample Mean” testing of H0: µD = 0, section 6.1.1.

Are X1 and X2 approximately

normally distributed (or mildly skewed)?

Are σ1, σ2 known?

Are n1 ≥ 30 and n2 ≥ 30?

Equivariance: σ12 = σ2

2?

Compute F = s12 / s2

2. Is 1/4 < F < 4?

Use an approximate t-test, e.g., Satterwaithe Test

No

Paired

Yes

No, or don’t know

Yes

No

Independent

Yes

Yes

No

Page 10: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-10

ν1 = 20, ν2 = 40

ν1 = 20, ν2 = 30

ν1 = 20, ν2 = 20

ν1 = 20, ν2 = 10

ν1 = 20, ν2 = 5

F-distribution

f(x) = 1

Β(ν1/2, ν2/2)

ν1

ν2

ν1/2

xν1/2 − 1

1 +

ν1ν2

x−ν1/2 −ν2/2

σ1 σ2

Independent groups

Sample, size n1

Calculate 21s

Sample, size n2

Calculate 22s

§ 6.2.2 Variances Suppose X1 ~ N(µ1, σ1) and X2 ~ N(µ2, σ2). Null Hypothesis H0: σ1

2 = σ22

versus

Alternative Hypothesis HA: σ12 ≠ σ2

2

Formal test: Reject H0 if the F-statistic is significantly different from 1. Informal criterion: Accept H0 if the F-statistic is between 0.25 and 4. Comment: Another test, more robust to departures from the normality assumption than the F-test, is Levene’s Test, a t-test of the absolute deviations of each sample. It can be generalized to more than two samples (see section 6.3.2).

Test Statistic

F = s1

2

s22 ~ F

ν1 ν2

where ν1 = n1 – 1 and ν2 = n2 – 1 are the corresponding numerator and denominator degrees of freedom, respectively.

Page 11: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-11 § 6.2.3 Proportions

Therefore, approximately…

π̂1 − π̂2 ~ N

π1 − π2 ,

π1 (1 − π1)n1

+ π2 (1 − π2)

n2 .

↑ standard error s.e. Confidence intervals are computed in the usual way, using the estimate

s.e. = 1 2

ˆ ˆ ˆ ˆ1 (1 )( )n n

π π π π+

− −1 1 2 2 ,

as follows:

POPULATION

Binary random variable I1 = 1 or 0, with

P(I1 = 1) = π1, P(I1 = 0) = 1 − π1

Binary random variable I2 = 1 or 0, with

P(I2 = 1) = π2, P(I2 = 0) = 1 − π2

INDEPENDENT SAMPLES n1 ≥ 30 n2 ≥ 30

Random Variable X1 = #(I1 = 1) ~ Bin(n1, π1)

Recall (assuming n1π1 ≥ 15, n1(1 – π1) ≥ 15):

~ N

π1, π1 (1 − π1)

n1 , approx.

Random Variable X2 = #(I2 = 1) ~ Bin(n2, π2)

Recall (assuming n2π2 ≥ 15, n2(1 – π2) ≥ 15):

~ N

π2, π2 (1 − π2)

n2 , approx. π̂1 =

X1n1

π̂ 2 = X2n2

Page 12: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-12

(1 − α) × 100% Confidence Interval for π1 − π2

(π̂1 − π̂2 ) − zα/2

1 2

ˆ ˆ ˆ ˆ1 (1 )( )n n

π π π π+

− −1 1 2 2 ‚ (π̂1 − π̂2 ) + zα/2 1 2

ˆ ˆ ˆ ˆ1 (1 )( )n n

π π π π+

− −1 1 2 2

Test Statistic for H0: π1 − π2 = 0

Z = (π̂1 − π̂2 ) − 0

π̂pooled (1 − π̂pooled ) 1n1

+ 1n2

~ N(0, 1)

Unlike the one-sample case, the same estimate for the standard error can also be used in computing the acceptance region for the null hypothesis H0: π1 − π2 = π0, as well as the test statistic for the p-value, provided the null value π0 ≠ 0. HOWEVER, if testing for equality between two proportions via the null hypothesis H0: π1 − π2 = 0, then their common value should be estimated by the more stable weighted mean of π̂1 and π̂ 2 , the pooled sample proportion:

π̂pooled = X1 + X2n1 + n2

= n1π̂1 + n2π̂ 2

n1 + n2 .

Substituting yields…

s.e.0 = π̂pooled (1 − π̂pooled )

n1 +

π̂pooled (1 − π̂pooled )n2

i.e.,

s.e.0 = π̂pooled (1 − π̂pooled ) 1n1

+ 1n2

. Hence…

(1 − α) × 100% Acceptance Region for H0: π1 − π2 = 0

0 − zα/2 π̂pooled (1 − π̂pooled )

1n1

+ 1n2

, 0 + zα/2 π̂pooled (1 − π̂pooled ) 1n1

+ 1n2

Page 13: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-13

PT + Supplement

PT only

n1 = 400

X1 = 332

n2 = 320

X2 = 244

H0: π1 − π2 = 0

Null Distribution N(0, 0.03)

0 0.0675 π̂1 − π̂ 2

Standard Normal Distribution

N(0, 1)

Z 0 2.25

.0122 .0122

Figure 1

−0.0675 −2.25

Example: Consider a group of 720 patients who undergo physical therapy for arthritis. A daily supplement of glucosamine and chondroitin is given to n1 = 400 of them in addition to the physical therapy; after four weeks of treatment, X1 = 332 show measurable signs of improvement (increased ROM, etc.). The remaining n2 = 320 patients receive physical therapy only; after four weeks, X2 = 244 show improvement. Does this difference represent a statistically significant treatment effect? Calculate the p-value, and form a conclusion at the α = .05 significance level.

H0: π1 − π2 = 0

vs. HA: π1 − π2 ≠ 0 at α = .05

↓ ↓

π̂1 = 332400 = 0.83, π̂ 2 =

244320 = 0.7625 ⇒ π̂1 − π̂ 2 = 0.0675

π̂pooled = 332 + 244400 + 320 =

576720 = 0.8

and thus 1 – π̂pooled = 144720 = 0.2

Therefore, p-value = 2 P(π̂1 − π̂2 ≥ 0.0675) = 2 P

Z ≥ 0.0675 − 0

0.03 = 2 P(Z ≥ 2.25) = 2(.0122) = .0244 .

Conclusion: As this value is smaller than α = .05, we can reject the null hypothesis that the two proportions are equal. There does indeed seem to be a moderately significant treatment difference between the two groups.

s.e.0 = (0.8)(0.2) 1

400 + 1

320 = 0.03

Page 14: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-14 Exercise: Instead of H0: π1 − π2 = 0 vs. HA: π1 − π2 ≠ 0, test the null hypothesis for a 5% difference, i.e., H0: π1 − π2 = .05 vs. HA: π1 − π2 ≠ .05, at α = .05 . [Note that the pooled proportion π̂pooled is no longer appropriate to use in the expression for the standard error under the null hypothesis, since H0 is not claiming that the two proportions π1 and π2 are equal (to a common value); see notes above.] Conclusion? Exercise: Instead of H0: π1 − π2 = 0 vs. HA: π1 − π2 ≠ 0, test the one-sided null hypothesis H0: π1 − π2 ≤ 0 vs. HA: π1 − π2 > 0 at α = .05 . Conclusion? Exercise: Suppose that in a second experiment, n1 = 400 patients receive a new drug that targets B-lymphocytes, while the remaining n2 = 320 receive a placebo, both in addition to physical therapy. After four weeks, X1 = 376 and X2 = 272 show improvement, respectively. Formally test the null hypothesis of equal proportions at the α = .05 level. Conclusion? Exercise: Finally suppose that in a third experiment, n1 = 400 patients receive “magnet therapy,” while the remaining n2 = 320 do not, both in addition to physical therapy. After four weeks, X1 = 300 and X2 = 240 show improvement, respectively. Formally test the null hypothesis of equal proportions at the α = .05 level. Conclusion? See…

Appendix > Statistical Inference > General Parameters and FORMULA TABLES.

IMPORTANT!

Page 15: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-15 Alternate Method: Chi-Squared (χ 2) Test As before, let the binary variable I = 1 for improvement, I = 0 for no improvement, with probability π and 1 − π, respectively. Now define a second binary variable J = 1 for the “PT + Drug” group, and J = 0 for the “PT only” group. Thus, there are four possible disjoint events: “I = 0 and J = 0,” “I = 0 and J = 1,” “I = 1 and J = 0,” and “I = 1 and J = 1.” The number of times these events occur in the random sample can be arranged in a 2 × 2 contingency table that consists of four cells (NW, NE, SW, and SE) as demonstrated below, and compared with their corresponding expected values based on the null hypothesis. Observed Values Group (J) PT + Drug PT only

Stat

us (I

) Improvement 332 244 576 Row marginal

totals No Improvement 68 76 144

400 320 720 Column marginal totals

versus…

Expected Values = Column total × Row totalTotal Sample Size n

under H0: π1 = π2 π̂pooled = 576/720 = 0.8 Group (J) PT + Drug PT only

Stat

us (I

) Improvement 400 × 576

720 = 320.0

320 × 576720 = 256.0

576

No Improvement 400 × 144

720 = 80.0

320 × 144720 =

64.0 144

400.0 320.0 720

Note: “Chi” is pronounced “kye”

Informal reasoning: Consider the first cell, improvement in the 400 patients of the “PT + Drug” group. The null hypothesis conjectures that the probability of improvement is equal in both groups, and this common value is estimated by the pooled proportion 576/720. Hence, the expected number (under H0) of improved patients in the “PT + Drug” group is 400 × 576/720, etc.

Note that, by construction,

H0: 320400 =

256320

= 576720 , the pooled proportion.

Page 16: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-16

5.0625

Figure 2

.0244

χ 21

Ideally, if all the observed values = all the expected values, then this statistic would = 0, and the corresponding p-value = 1. As it is,

Χ 2 =

(332 − 320)2

320 + (244 − 256)2

256 + (68 − 80)2

80 + (76 − 64)2

64 = 5.0625 on 1 df

Therefore, the p-value = P(χ 21 ≥ 5.0625) = .0244, as before. Reject H0.

Comments:

Chi-squared Test is valid, provided Expected Values ≥ 5. (Otherwise, the score is inflated.) For small expected values in a 2 × 2 table, defer to Fisher’s Exact Test.

Chi-squared statistic with Yates continuity correction to reduce spurious significance:

Χ 2 = Σ (|Obs − Exp| − 0.5)2

Exp all cells

Chi-squared Test is strictly for the two-sided H0: π1 − π2 = 0 vs. HA: π1 − π2 ≠ 0. It cannot be modified to a one-sided test, or to H0: π1 − π2 = π0 vs. HA: π1 − π2 ≠ π0.

Test Statistic for H0: π1 − π2 = 0

Χ 2 = Σ (Obs − Exp)2

Exp ~ χ 21

all cells

Note that

5.0625 = (± 2.25)2,

i.e., χ 2

1 = Z 2.

The two test statistics are mathematically equivalent! (Compare Figures 1 and 2.)

Page 17: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-17

How could we solve this problem using R? The code (which can be shortened a bit): # Lines preceded by the pound sign are read as comments, # and ignored by R.

# The following set of commands builds the 2-by-2 contingency table, # column by column (with optional headings), and displays it as # output (my boldface).

Tx.vs.Control = matrix(c(332, 68, 244, 76), ncol = 2, nrow = 2, dimnames = list("Status" = c("Improvement", "No Improvement"), "Group" = c("PT + Drug", "PT")))

Tx.vs.Control Group Status PT + Drug PT Improvement 332 244 No Improvement 68 76 # A shorter alternative that outputs a simpler table:

Improvement = c(332, 244) No_Improvement = c(68, 76) Tx.vs.Control = rbind(Improvement, No_Improvement)

Tx.vs.Control

[,1] [,2]

Improvement 332 244

No_Improvement 68 76

# The actual Chi-squared Test itself. Since using a correction # factor is the default, the F option specifies that no such # factor is to be used in this example.

chisq.test(Tx.vs.Control, correct = F) Pearson's Chi-squared test data: Tx.vs.Control X-squared = 5.0625, df = 1, p-value = 0.02445

Note how the output includes the Chi-squared test statistic, degrees of freedom, and p-value, all of which agree with our previous manual calculations.

Page 18: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-18 Application: Case-Control Study Design

Determines if an association exists between disease D and risk factor exposure E.

TIME PRESENT PAST

Given: Cases (D+) and Controls (D−) Investigate: Relation with E+ and E−

Chi-Squared Test

H0: π E+ | D+ = π E+ | D− Randomly select a sample of cases and controls, and categorize each member according to whether or not he/she was exposed to the risk factor.

SAMPLE

n1 cases D+

n2 controls D−

For each case (D+), there are 2 disjoint possibilities for exposure: E+ or E−.

For each control (D−), there are 2 disjoint possibilities for exposure: E+ or E−.

D+ D−

E+

E−

E+

E−

a b

c d

Calculate the χ21 statistic:

(a + b + c + d) (ad − bc)2

(a + c) (b + d) (a + b) (c + d)

McNemar’s Test

E+ E−

E+

E−

D+

D−

For each matched case-control ordered pair (D+, D−), there are 4 disjoint possibilities for exposure:

SAMPLE

n cases D+

n controls D−

E+ and E+ or

E− and E+ or

E+ and E− or

E− and E−

concordant pair

discordant pair

discordant pair

concordant pair

b

c

a

d

H0: π E+ | D+ = π E+ | D− Match each case with a corresponding control on age, sex, race, and any other confounding variables that may affect the outcome. Note that this requires a balanced sample: n1 = n2.

Calculate the χ21 statistic:

(b − c)2

b + c

See Appendix > Statistical Inference > Means and Proportions, One and Two Samples.

Page 19: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-19

To quantify the strength of association between D and E, we turn to the notion of…

Odds Ratios – Revisited

Recall: Alas, the probability distribution of the odds ratio OR is distinctly skewed to the right. However, its natural logarithm, ln(OR), is approximately normally distributed, which makes it more useful for conducting the Test of Association above. Namely… (1 − α) × 100% Confidence Limits for ln(OR)

e ln(OR ) ± (zα/2) s.e.

, where s.e. = 1a +

1b +

1c +

1d

POPULATION

Case-Control Studies:

OR = odds(Exposure | Disease)

odds(Exposure | No Disease) = P(E+ | D+) / P(E− | D+)P(E+ | D−) / P(E− | D−)

Cohort Studies:

OR = odds(Disease | Exposure)

odds(Disease | No Exposure) = P(D+ | E+) / P(D− | E+)P(D+ | E−) / P(D− | E−)

H0: OR = 1 ⇔ No association exists between D, E.

versus…

HA: OR ≠ 1 ⇔ An association exists between D, E.

SAMPLE, size n

D+ D− E+ a b

E− c d

OR = adbc

(1 − α) × 100% Confidence Limits for OR

Page 20: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-20

0.79 8.30 2.56 1

1.52 4.32 2.56 1

D+ D− E+ 40 50

E− 50 160

OR = (40)(160)(50)(50) = 2.56

D+ D− E+ 8 10

E− 10 32

OR = (8)(32)

(10)(10) = 2.56

Examples: Test H0: OR = 1 versus HA: OR ≠ 1 at the α = .05 significance level. ln(2.56) = 0.94

s.e. = 18 +

110 +

110 +

132 = 0.6 ⇒ 95% Margin of Error = (1.96)(0.6) = 1.176

95% Confidence Interval for ln(OR) = ( 0.94 − 1.176, 0.94 + 1.176 ) = ( −0.236, 2.116 )

and so… 95% Confidence Interval for OR = ( e−0.236, e2.116 ) = (0.79, 8.30)

Conclusion: As this interval does contain the null value OR = 1, we cannot reject the hypothesis of non-association at the 5% significance level. ln(2.56) = 0.94

s.e. = 140 +

150 +

150 +

1160 = 0.267 ⇒ 95% Margin of Error = (1.96)(0.267) = 0.523

95% Confidence Interval for ln(OR) = ( 0.94 − 0.523, 0.94 + 0.523 ) = ( 0.417, 1.463 )

and so… 95% Confidence Interval for OR = ( e0.417, e1.463 ) = (1.52, 4.32)

Conclusion: As this interval does not contain the null value OR = 1, we can reject the hypothesis of non-association at the 5% level. With 95% confidence, the odds of disease are between 1.52 and 4.32 times higher among the exposed than the unexposed. Comments:

If any of a, b, c, or d = 0, then use s.e. = 1a + 0.5 +

1b + 0.5 +

1c + 0.5 +

1d + 0.5 .

If OR < 1, this suggests that exposure might have a protective effect, e.g., daily calcium supplements (yes/no) and osteoporosis (yes/no).

Page 21: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-21

Summary Odds Ratio

Combining 2 × 2 tables corresponding to distinct strata. Examples:

Males Females All D+ D− D+ D− D+ D−

E+ 10 50 E+ 10 10 →

E+ 20 60

E− 10 150 E− 60 60 E− 70 210

1OR = 3 2OR = 1 OR = 1

Males Females All D+ D− D+ D− D+ D−

E+ 80 20 E+ 10 20 →

E+ 90 40

E− 20 10 E− 20 80 E− 40 90

1OR = 2 2OR = 2 OR = 5.0625

Males Females All D+ D− D+ D− D+ D−

E+ 60 100 E+ 50 10 →

E+ 110 110

E− 10 50 E− 100 60 E− 110 110

1OR = 3 2OR = 3 OR = 1

These examples illustrate the phenomenon known as Simpson’s Paradox. Ignoring a confounding variable (e.g., gender) may obscure an association that exists within each stratum, but not observed in the pooled data, and thus must be adjusted for. When is it acceptable to combine data from two or more such strata? How is the summary odds ratio ORsummary estimated? And how is it tested for association?

???

???

???

Page 22: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-22 In general…

Stratum 1 Stratum 2 D+ D− D+ D−

E+ a1 b1 E+ a2 b2

E− c1 d1 E− c2 d2

1OR = a1 d1b1 c1

2OR = a2 d2b2 c2

Example:

Males Females D+ D− D+ D−

E+ 10 20 E+ 40 50

E− 30 90 E− 60 90

1OR = 1.5 2OR = 1.2

Assuming that the Test of Homogeneity H0: OR1 = OR2 is conducted and accepted,

MHOR =

(10)(90)150 +

(40)(90)240

(20)(30)150 +

(50)(60)240

= 6 + 15

4 + 12.5 = 21

16.5 = 1.273 .

Exercise: Show algebraically that MHOR is a weighted average of 1OR and 2OR .

I. Calculate the estimates of OR1 and OR2 for each stratum, as shown.

II. Can the strata be combined? Conduct a “Breslow-Day” (Chi-squared) Test of Homogeneity for

H0: OR1 = OR2 .

III. If accepted, calculate the Mantel-Haenszel Estimate of ORsummary:

MHOR =

a1 d1n1

+ a2 d2

n2

b1 c1n1

+ b2 c2

n2

.

IV. Finally, conduct a Test of Association for the combined strata

H0: ORsummary = 1

either via confidence interval, or special χ 2-test (shown below).

Page 23: Independent 6.2 Two Samples Dependent (Paired, …pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/6...Ismor Fischer, 5/29/2012 6.2-1 X 1 − X 2 Null Distribution 6.2 Two Samples

Ismor Fischer, 5/29/2012 6.2-23

To conduct a formal Chi-squared Test of Association H0: ORsummary = 1, we calculate, for the 2 × 2 contingency table in each stratum i = 1, 2,…, s.

Observed # diseased

vs. Expected # diseased Variance

D+ D−

E+ ai bi R1i → E1i = R1i C1i

ni

Vi = R1i R2i C1i C2i

ni2 (ni − 1)

E− ci di R2i → E2i = R2i C1i

ni

C1i C2i ni

Therefore, summing over all strata i = 1, 2,…, s, we obtain the following:

Observed total, Diseased Expected total, Diseased Total Variance

Exposed: O1 = Σ ai Exposed: E1 = Σ E1i

Not Exposed: O2 = Σ ci Not Exposed: E2 = Σ E2i

and the formal test statistic for significance is given by

Χ 2 = (O1 − E1)2

V ~ χ 1

2.

This formulation will appear again in the context of the Log-Rank Test in the area of Survival Analysis (section 8.3). Example (cont’d):

For stratum 1 (males), E11 = (30)( 40)150

= 8 and V1 = 2

(30)(120)( 40)(110)150 (149)

= 4.725.

For stratum 2 (females), E12 = (90)(100)240

= 37.5 and V2 = 2

(90)(150)(100)(140)240 ( 239)

= 13.729.

Therefore, O1 = 50, E1 = 45.5, and V = 18.454, so that Χ 2 = 2(4.5)

18.454 = 1.097 on 1

degree of freedom, from which it follows that the null hypothesis H0: ORsummary = 1 cannot be rejected at the α = .05 significance level, i.e., there is not enough empirical evidence to conclude that an association exists between disease D and exposure E. Comment: This entire discussion on Odds Ratios OR can be modified to Relative Risk RR

(defined only for a cohort study), with the following changes: s.e. = 1a −

1R1

+ 1c −

1R2

,

as well as b replaced with row marginal R1, and d replaced with row marginal R2, in all other formulas. [Recall, for instance, that /OR ad bc= , whereas 2 1/RR aR R c= , etc.]

V = Σ Vi