CHAPTER 2 ST 544, D. Zhang
2 Contingency Tables
I. Probability Structure of a 2-way Contingency Table
I.1 Contingency tables
• X, Y: categorical variables. Y is usually random (except in a case-control study) and is the response; X can be random or fixed and usually acts like a covariate. X has I levels, Y has J levels.
• A contingency table for X,Y is an I × J table filled with data.
• For example,
              Y
          1    2    3
X    1   n11  n12  n13
     2   n21  n22  n23

or

              Y
          1    2
X    1   n11  n12
     2   n21  n22
     3   n31  n32
Slide 40
• For example, from a random sample of n = 1127 Americans, we have
the following contingency table:
Table 2.1. Cross classification of Belief in Afterlife by gender
Belief in afterlife
Yes No/Undecided
Gender Female 509 116
Male 398 104
• With a contingency table for X,Y , we would like to understand the
association between X and Y , the underlying probability structure of
the table, etc.
• For example, for the afterlife table, we would like to see if one gender
is more likely to believe in afterlife, or the overall proportion with belief
in afterlife in the population, etc.
Slide 41
I.2 Sampling schemes, types of studies, probability structure
• Sampling schemes - ways to get data (tables):
1. Multinomial sampling: From the population, we obtain a random
sample, then cross classify individuals to table cells.
? An example on belief in afterlife from n = 1127 Americans
Table 2.1. Cross classification of Belief in Afterlife by gender
Belief in afterlife
Yes No/Undecided
Gender Female 509 116
Male 398 104
? This is an example of Multinomial sampling.
? A study using this sampling method is called a cross-sectional study.
Slide 42
Row totals for Table 2.1: Female 625, Male 502.
? In general, a 2× 2 table from multinomial sampling
Y
1 2
X 1 n11 n12 n1+
2 n21 n22 n2+
n+1 n+2 n
where (n11, n12, n21, n22) are random variables that have a
multinomial distribution with sample size n
(n = n11 + n12 + n21 + n22) and probabilities
Y
1 2
X 1 π11 π12
2 π21 π22
(π11, π12, π21, π22) define the probability structure of the
contingency table.
Slide 43
The standard statistical model underlying analysis of contingency tables is to assume that (unconditional on the total count) the cell counts are independent Poisson random variables.
Once you impose a total cell count for the contingency table, or a row or column count, the resulting conditional distributions of the cell counts then become multinomial.
https://stats.stackexchange.com/questions/45479/pearsons-residuals
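? The πij's can be estimated by pij = nij/n.
? With multinomial sampling, we can estimate many relevant quantities:
$$\hat{P}[Y = 1] = \frac{n_{11} + n_{21}}{n} = \frac{n_{+1}}{n}$$
$$\hat{P}[X = 1] = \frac{n_{11} + n_{12}}{n} = \frac{n_{1+}}{n}$$
$$\hat{P}[Y = 1 \mid X = 1] = \frac{n_{11}}{n_{11} + n_{12}} = \frac{n_{11}}{n_{1+}}$$
$$\hat{P}[X = 1 \mid Y = 1] = \frac{n_{11}}{n_{11} + n_{21}} = \frac{n_{11}}{n_{+1}}, \ \ldots$$
? For the afterlife example, we estimate that
$$\hat{P}[\text{belief in afterlife}] = \frac{509 + 398}{1127} = 80\%$$
$$\hat{P}[\text{belief in afterlife} \mid \text{Female}] = \frac{509}{509 + 116} = 81\%$$
$$\hat{P}[\text{belief in afterlife} \mid \text{Male}] = \frac{398}{398 + 104} = 79\%, \ \ldots$$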
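The estimates above can be reproduced with a few lines of Python. This is a minimal sketch using only the counts in Table 2.1; the variable names are illustrative and not part of the course material.

```python
# Table 2.1 counts: rows = gender (Female, Male), columns = belief (Yes, No/Undecided).
table = [[509, 116],
         [398, 104]]

n = sum(sum(row) for row in table)                # total sample size
row_totals = [sum(row) for row in table]          # n_{i+}
col_totals = [sum(col) for col in zip(*table)]    # n_{+j}

# Joint probabilities p_ij = n_ij / n (estimable under multinomial sampling).
joint = [[nij / n for nij in row] for row in table]

# Marginal and conditional probabilities of belief in afterlife.
p_yes = col_totals[0] / n
p_yes_given_female = table[0][0] / row_totals[0]
p_yes_given_male = table[1][0] / row_totals[1]

print(f"P[belief] = {p_yes:.3f}")
print(f"P[belief | Female] = {p_yes_given_female:.3f}")
print(f"P[belief | Male] = {p_yes_given_male:.3f}")
```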
Slide 44
Column totals for Table 2.1: Yes 907, No/Undecided 220, Total 1,127.
1. Find joint prob; 2. Find marginal prob; 3. Find conditional prob.
2. Product-multinomial sampling on X: For example, in a clinical
trial for heart disease, we randomly assign 200 patients to
treatment 1 and 100 patients to treatment 2 and may obtain
potential data like the following:
Y
Better No Change Worse
Treatment 1 n11 n12 n13 200
Treatment 2 n21 n22 n23 100
Here we have
(n11, n12, n13) ⊥ (n21, n22, n23)
(n11, n12, n13) ∼ multinomial(200, (π1, π2, π3)), π1 + π2 + π3 = 1
(n21, n22, n23) ∼ multinomial(100, (τ1, τ2, τ3)), τ1 + τ2 + τ3 = 1
(π1, π2, π3) and (τ1, τ2, τ3) define the probability structure of this
contingency table.
Slide 45
? In general, the data looks like
Y
1 2 3
X 1 n11 n12 n13 n1+
2 n21 n22 n23 n2+
where n1+ and n2+, the sample sizes for X = 1 and X = 2, are
fixed.
(n11, n12, n13) ⊥ (n21, n22, n23)
(n11, n12, n13) ∼ multinom(n1+, (π1, π2, π3)), π1 + π2 + π3 = 1
(n21, n22, n23) ∼ multinom(n2+, (τ1, τ2, τ3)), τ1 + τ2 + τ3 = 1
? Since the likelihood of π’s and τ ’s is the product of the likelihood
of π’s and the likelihood of τ ’s, this sampling scheme is called
product-multinomial sampling on X.
? Clinical trials, cohort studies (prospective studies) all use this
sampling scheme.
Slide 46
Prospective study: Participants are enrolled into the study before they develop the disease or outcome in question.
? When X is also random (so has a distribution in the population),
(π1, π2, π3)’s defines the conditional distribution of Y given
X = 1
(τ1, τ2, τ3)’s defines the conditional distribution of Y given
X = 2.
? With product-multinomial sampling on X, we can only estimate
conditional probabilities of Y |X = x. Other probabilities are not
estimable. For example, we cannot estimate P [Y = 1].
Slide 47
3. Product multinomial sampling on Y:
If Y represents a rare event, then a prospective study is inefficient.
For example, if we would like to investigate the association between
smoking and lung cancer and conduct a prospective study
Lung Cancer
Yes No
Smoking Yes n11 n12 n1+
No n21 n22 n2+
then n11, n21 will be small unless n1+ and n2+ are very large.
This will yield an inefficient study.
Slide 48
? We may consider a design such as the following one:
Lung Cancer
Yes No
Smoking Yes n11 n12
No n21 n22
n+1 = 100 n+2 = 200
None of the cell counts will be small ⇒ efficient.
n11 ⊥ n12
n11 ∼ Bin(n+1, π1), π1 = P [smoking|case].
n12 ∼ Bin(n+2, π2), π2 = P [smoking|control].
? We can still investigate the association between smoking and
lung cancer using this design.
? This sampling scheme is product-multinomial on Y .
? The study is often called the case-control study.
Slide 49
? In general,
Lung Cancer
Yes No
Smoking Yes n11 n12
No n21 n22
n+1 n+2
where n+1, n+2, are all fixed.
n11 ⊥ n12
n11 ∼ Bin(n+1, π1), π1 = P [smoking|case].
n12 ∼ Bin(n+2, π2), π2 = P [smoking|control].
Slide 50
Estimates: $\hat{\pi}_1 = \frac{n_{11}}{n_{11} + n_{21}}$, $\hat{\pi}_2 = \frac{n_{12}}{n_{12} + n_{22}}$.
? Example of a case-control study on MI (Table 2.4)
Table 2.4. Case-Control Study on MI
Myocardial Infarction
Case Control
Ever Smoker Yes 172 173
No 90 346
262 519
where 262 is the sample size for MI cases, 519 is the sample size
for controls.
? From this study, we cannot estimate the quantities such as
P [MI]
P [Ever Smoking]
P [MI|Ever smokers]
P [MI|Never smokers] ...
Slide 51
• Note: Multinomial sampling ⇒ product-multinomial sampling.
For example, if we have data from a multinomial sampling with sample
size n:
Y
1 2
X 1 n11 n12
2 n21 n22
Y
1 2
X 1 π11 π12
2 π21 π22
Then we can view the data from product-multinomial sampling on X
or product-multinomial sampling on Y.
That is:
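$$n_{11} \mid n_{1+} \sim \mathrm{Bin}\left(n_{1+}, \frac{\pi_{11}}{\pi_{11} + \pi_{12}}\right) \ \perp \ n_{21} \mid n_{2+} \sim \mathrm{Bin}\left(n_{2+}, \frac{\pi_{21}}{\pi_{21} + \pi_{22}}\right)$$
Or
$$n_{11} \mid n_{+1} \sim \mathrm{Bin}\left(n_{+1}, \frac{\pi_{11}}{\pi_{11} + \pi_{21}}\right) \ \perp \ n_{12} \mid n_{+2} \sim \mathrm{Bin}\left(n_{+2}, \frac{\pi_{12}}{\pi_{12} + \pi_{22}}\right)$$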
Slide 52
I.3 Sensitivity & Specificity in Diagnostic Tests
• In a diagnostic test, X = true disease status, Y = test result. Then we
can form a 2× 2 table:
Y
Positive Negative
X Disease
No Disease
• Using data from multinomial sampling or product-multinomial
sampling on X, we can estimate
Sensitivity = P [Y = Positive|X = Disease] (True positive rate)
Specificity = P [Y = Negative|X = No disease] (True negative rate)
• 1-Sensitivity = False negative rate, 1-Specificity = False positive rate.
These two quantities tell us how accurate a test/device is.
Manufacturer of a test device usually provides these two measures.
Slide 53
Q: Find sensitivity and specificity.
The higher the sensitivity and specificity, the better the diagnostic test.
• However, a customer (or potential patient) may be more interested in
the following quantities:
P [X = Disease|Y = Positive] (PV+)
P [X = No disease|Y = Negative] (PV-)
• An accurate test may not yield high PV+ and/or PV-.
For example, assume a mammogram (for breast cancer, BR) has sensitivity = 0.86 and specificity = 0.88. If P[breast cancer] = 0.01, then
$$\begin{aligned}
PV+ &= P[X = BR \mid Y = +] = \frac{P[X = BR, Y = +]}{P[Y = +]} \\
&= \frac{P[Y = + \mid X = BR]\,P[X = BR]}{P[Y = + \mid X = BR]\,P[X = BR] + P[Y = + \mid X = \text{No BR}]\,P[X = \text{No BR}]} \\
&= \frac{0.86 \times 0.01}{0.86 \times 0.01 + (1 - 0.88) \times (1 - 0.01)} = 6.8\%
\end{aligned}$$
Similarly, PV- = 99.8% (without the test, P[No BR] = 0.99).
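The Bayes' rule calculation above can be checked numerically. The following is a small sketch; the function name `predictive_values` is hypothetical and simply wraps the formula shown on the slide.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PV+, PV-) from sensitivity, specificity and disease prevalence."""
    # P[Y = +] by the law of total probability.
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    # PV+ = P[disease | positive test]
    pv_pos = sensitivity * prevalence / p_positive
    # PV- = P[no disease | negative test]
    pv_neg = specificity * (1 - prevalence) / (1 - p_positive)
    return pv_pos, pv_neg

# Mammogram example from the slide: sensitivity 0.86, specificity 0.88, prevalence 0.01.
pv_pos, pv_neg = predictive_values(0.86, 0.88, 0.01)
print(f"PV+ = {pv_pos:.3f}, PV- = {pv_neg:.3f}")
```

Even with a fairly accurate test, PV+ stays low here because the disease is rare, so most positive results come from the large disease-free group.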
Slide 54
Positive Predictive Value (PV+) is the probability of disease in an individual with a positive test result. Negative Predictive Value (PV - ) is the probability of not having the disease when the test result is negative.
I.4 Independence of X and Y
• X and Y are random with the underlying probability structure
Y
1 2 J
X 1 π11 π12 . π1J
2 π21 π22 . π2J
. . . . .
I πI1 πI2 . πIJ
• X ⊥ Y
⇔ P[X = i, Y = j] = P[X = i] P[Y = j] for i = 1, 2, ..., I, j = 1, 2, ..., J
⇔ πij = πi+ π+j for i = 1, 2, ..., I, j = 1, 2, ..., J (πi+ = πi1 + πi2 + ... + πiJ, π+j = π1j + π2j + ... + πIj)
⇔ P[Y = j|X = i] = P[Y = j|X = k] for all i, j, k.
Slide 55
• When X and Y are random 2-level cat. variables, the underlying
probability structure is
Y
1 2
X 1 π11 π12
2 π21 π22
• X ⊥ Y ⇔ πij = πi+ π+j for i, j = 1, 2 (πi+ = πi1 + πi2, π+j = π1j + π2j)
We only need one of them, e.g. π11 = π1+ π+1
⇔ P[Y = 1|X = 1] = P[Y = 1|X = 2], i.e.
$$\pi_1 = \frac{\pi_{11}}{\pi_{1+}} = \frac{\pi_{21}}{\pi_{2+}} = \pi_2$$
Slide 56
II Comparing Proportions in 2× 2 Tables
II.1 Difference of proportions
• Given data from a multinomial sampling or product-multinomial
sampling on X
Y
1 2
X 1 n11 n12 n1+
2 n21 n22 n2+
we would like to make inference on π1 − π2 where
π1 = P [Y = 1|X = 1] is the success probability for row 1 and
π2 = P [Y = 1|X = 2] is the success probability for row 2.
• X ⊥ Y ⇔ π1 − π2 = 0.
Slide 57
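1. Estimate of π1 − π2:
$$p_1 - p_2 = \frac{n_{11}}{n_{1+}} - \frac{n_{21}}{n_{2+}}.$$
2. Estimated SE (standard error) of p1 − p2:
$$\widehat{SE}(p_1 - p_2) = \sqrt{\frac{p_1(1 - p_1)}{n_{1+}} + \frac{p_2(1 - p_2)}{n_{2+}}}$$
3. Large-sample (1 − α) CI for π1 − π2:
$$p_1 - p_2 \pm z_{\alpha/2}\,\widehat{SE}(p_1 - p_2).$$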
If this CI does not contain 0, we can reject H0 : X ⊥ Y at
significance level α.
Slide 58
Recall:
Critical value (α = 0.05): $z_{\alpha/2}$ = qnorm(0.975) = 1.959964
• Example: Aspirin and heart attack.
In a 5-yr study, 22,000+ physicians were randomized (blinded) to the
placebo/aspirin (one tablet every other day) group:
                         Myocardial infarction
                         Yes        No       Total
Treatment   Placebo      189      10,845    11,034
            Aspirin      104      10,933    11,037
1. Difference of MI probabilities between placebo and aspirin groups:
p1 − p2 = 189/11034− 104/11037 = 0.0171− 0.0094 = 0.0077.
2. $SE = \sqrt{0.0171(1 - 0.0171)/11034 + 0.0094(1 - 0.0094)/11037} = 0.0015$.
3. Large sample 95% CI of Difference of MI probabilities:
0.0077± 1.96× 0.0015 = [0.0048, 0.0106].
⇒ Physicians in the placebo group are more likely to develop MI.
Slide 59
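The three steps for the aspirin example can be sketched in Python as follows. The function name `diff_prop_ci` is illustrative. Because the slide rounds p1, p2 and the SE before combining them, the unrounded endpoints computed here differ from the slide's [0.0048, 0.0106] in the fourth decimal place.

```python
import math

def diff_prop_ci(n11, n1, n21, n2, z=1.959964):
    """Large-sample CI for pi1 - pi2 from row-1 and row-2 success counts and totals."""
    p1, p2 = n11 / n1, n21 / n2
    diff = p1 - p2
    # Estimated standard error of p1 - p2.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, se, (diff - z * se, diff + z * se)

# Aspirin study: 189 MI out of 11,034 on placebo, 104 MI out of 11,037 on aspirin.
diff, se, (lower, upper) = diff_prop_ci(189, 11034, 104, 11037)
print(f"p1 - p2 = {diff:.4f}, SE = {se:.4f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

The interval excludes 0, which agrees with the slide's conclusion that H0 : X ⊥ Y is rejected at the 0.05 level.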
II.2 Relative Risk
• When both π1 and π2 are close to zero (rare event), the difference
π1 − π2 may not be very meaningful.
For example,
Case 1: π1 = 0.01, π2 = 0.001⇒ π1 − π2 = 0.009
Case 2: π1 = 0.41, π2 = 0.401⇒ π1 − π2 = 0.009
The above cases have the same difference π1 − π2. However, the
meanings are totally different.
• For rare events, a more relevant measure for difference is the relative
risk (RR):
$$RR = \frac{\pi_1}{\pi_2}.$$
Slide 60
For example: (a) RR = 0.01/0.001 = 10; (b) RR = 0.41/0.401 = 1.022.
• Properties of the relative risk (RR):
1. 0 < RR < ∞.
2. π1 > π2 ⇔ RR > 1;
π1 = π2 ⇔ RR = 1;
π1 < π2 ⇔ RR < 1.
3. X ⊥ Y ⇔ RR = 1.
• Estimate of RR: Given the 2 × 2 table from multinomial sampling or product-multinomial sampling on X, RR can be estimated by
$$\widehat{RR} = \frac{p_1}{p_2}.$$
Slide 61
Recall: X ⊥ Y ⇔ π1 − π2 = 0, and RR = π1/π2.
• RR also has a nice interpretation. For the Aspirin Study, the RR
estimate is
$$\widehat{RR} = \frac{p_1}{p_2} = \frac{0.0171}{0.0094} = 1.82.$$
⇒ Physicians receiving the placebo are 82% more likely to develop MI
(over 5 yrs) than physicians receiving aspirin.
• The SE and CI for RR are complicated; Proc Freq calculates a CI for RR and other measures:
data table2_3;
  input group $ mi $ count @@;
  datalines;
placebo yes 189 placebo no 10845
aspirin yes 104 aspirin no 10933
;
title "Analysis of MI data";
proc freq data=table2_3 order=data;
  weight count;
  tables group*mi / norow nocol nopercent or;
run;
Slide 62
Output from the above SAS program:

The FREQ Procedure
Table of group by mi

group     mi
Frequency|yes     |no      | Total
---------+--------+--------+
placebo  |    189 |  10845 |  11034
---------+--------+--------+
aspirin  |    104 |  10933 |  11037
---------+--------+--------+
Total         293    21778    22071

Statistics for Table of group by mi
Odds Ratio and Relative Risks

Statistic                    Value     95% Confidence Limits
------------------------------------------------------------------
Odds Ratio                   1.8321     1.4400     2.3308
Relative Risk (Column 1)     1.8178     1.4330     2.3059
Relative Risk (Column 2)     0.9922     0.9892     0.9953

Sample Size = 22071

A 95% CI for RR is [1.43, 2.31]. We are 95% confident that physicians receiving the placebo are at least 43% and at most 131% more likely to develop MI (over 5 yrs) than physicians receiving aspirin.
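The slides do not give the formula behind the relative-risk interval. The sketch below assumes the usual large-sample interval built on the log scale, with estimated SE(log RR) = sqrt((1 − p1)/n11 + (1 − p2)/n21). That choice is an assumption, not something stated in the source, but it reproduces the limits printed for "Relative Risk (Column 1)".

```python
import math

def relative_risk_ci(n11, n1, n21, n2, z=1.959964):
    """Estimate RR = p1/p2 with a log-scale large-sample CI (assumed formula)."""
    p1, p2 = n11 / n1, n21 / n2
    rr = p1 / p2
    # Delta-method standard error of log(RR).
    se_log = math.sqrt((1 - p1) / n11 + (1 - p2) / n21)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Aspirin study: row 1 = placebo, row 2 = aspirin, column 1 = MI.
rr, rr_lower, rr_upper = relative_risk_ci(189, 11034, 104, 11037)
print(f"RR = {rr:.4f}, 95% CI = [{rr_lower:.4f}, {rr_upper:.4f}]")
```

Working on the log scale is what copes with the skewness of the sampling distribution of the sample relative risk.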
Slide 63
The sample relative risk has a sampling distribution that is highly skewed unless the sample sizes are quite large. Because of this, its confidence interval formula is rather complex.
II.3 Odds Ratio
• Odds of a probability π (of an event): if π = P(A), then
$$\omega = \frac{\pi}{1 - \pi} = \frac{\text{success prob}}{\text{failure prob}}$$
is called the odds of π (or of the event A). 0 < ω < ∞.
For example, if π = 0.75, then ω = 0.75/(1 − 0.75) = 3.
For a rare event (π ≈ 0), π ≈ ω.
• The event prob π is related to odds ω as:
$$\pi = \frac{\omega}{1 + \omega}.$$
For example, ω = 4, then π = 4/(1 + 4) = 0.8.
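The two conversions are inverses of each other, which a short Python sketch makes concrete. The helper names are illustrative only.

```python
def odds_from_prob(p):
    """Odds omega = p / (1 - p) for a probability 0 <= p < 1."""
    return p / (1 - p)

def prob_from_odds(omega):
    """Probability pi = omega / (1 + omega) for odds omega >= 0."""
    return omega / (1 + omega)

print(odds_from_prob(0.75))   # the slide's example, pi = 0.75
print(prob_from_odds(4))      # the slide's example, omega = 4
# For a rare event the odds are close to the probability.
print(odds_from_prob(0.01))
```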
Slide 64
When odds = 3.0, we expect to observe three successes for every one failure
• For the 2× 2 table
Y
1 2
X 1
2
the odds ratio between row 1 (π1 = P [Y = 1|X = 1]) and row 2
(π2 = P [Y = 1|X = 2]) is defined as
$$\theta = \frac{\text{odds}_1}{\text{odds}_2} = \frac{\pi_1/(1 - \pi_1)}{\pi_2/(1 - \pi_2)}.$$
• Properties of the odds ratio
1. 0 < θ < ∞.
2. π1 > π2 ⇔ θ > 1; π1 = π2 ⇔ θ = 1; π1 < π2 ⇔ θ < 1.
3. X ⊥ Y ⇔ θ = 1.
Slide 65
Values of θ farther from 1.0 in a given direction represent a stronger association.
When θ = 0.25, for example, the odds of success in row 1 are 0.25 times the odds of success in row 2, or equivalently 1/0.25 = 4.0 times as high in row 2 as in row 1.
• Given the 2 × 2 table from multinomial sampling or product-multinomial sampling on X:
              Y
          1    2
X    1   n11  n12  n1+
     2   n21  n22  n2+
the odds ratio θ can be estimated by
$$\hat{\theta} = \frac{p_1/(1 - p_1)}{p_2/(1 - p_2)} = \frac{(n_{11}/n_{1+})/(1 - n_{11}/n_{1+})}{(n_{21}/n_{2+})/(1 - n_{21}/n_{2+})} = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{n_{11}n_{22}}{n_{12}n_{21}}.$$
• var(log θ̂) can be estimated by
$$\widehat{\mathrm{var}}(\log\hat{\theta}) = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}.$$
Slide 66
Q: 95% CI for θ
• We can construct a (1 − α) CI for the true θ as follows:
1. Get a (1 − α) CI for log(θ):
$$\log\hat{\theta} \pm z_{\alpha/2}\,\widehat{SE}(\log\hat{\theta}).$$
2. Exponentiate both ends to get the CI for θ.
• For the Aspirin Study,
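$$\hat{\theta} = \frac{189 \times 10933}{10845 \times 104} = 1.8321 \ (\approx \widehat{RR})$$
$$\widehat{\mathrm{var}}(\log\hat{\theta}) = \frac{1}{189} + \frac{1}{10845} + \frac{1}{104} + \frac{1}{10933} = 0.01509$$
95% CI for log θ: $\log(1.8321) \pm 1.96\sqrt{0.01509} = [0.3647, 0.8462]$.
95% CI for θ: $[e^{0.3647}, e^{0.8462}] = [1.44, 2.33]$.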
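The two-step recipe (interval for log θ, then exponentiate) translates directly into code. This is a sketch with an illustrative function name; it uses the variance estimate 1/n11 + 1/n12 + 1/n21 + 1/n22 from the slides.

```python
import math

def odds_ratio_ci(n11, n12, n21, n22, z=1.959964):
    """Sample odds ratio with a large-sample CI built on the log scale."""
    theta = (n11 * n22) / (n12 * n21)
    # Estimated variance of log(theta-hat).
    var_log = 1 / n11 + 1 / n12 + 1 / n21 + 1 / n22
    half_width = z * math.sqrt(var_log)
    lower = math.exp(math.log(theta) - half_width)
    upper = math.exp(math.log(theta) + half_width)
    return theta, var_log, lower, upper

# Aspirin study cell counts (placebo/aspirin by MI yes/no).
theta, var_log, or_lower, or_upper = odds_ratio_ci(189, 10845, 104, 10933)
print(f"theta = {theta:.4f}, var(log theta) = {var_log:.5f}")
print(f"95% CI = [{or_lower:.4f}, {or_upper:.4f}]")
```

The result agrees with the "Odds Ratio" row of the Proc Freq output (1.8321, with limits 1.4400 and 2.3308).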
Slide 67
The estimated odds of MI for those taking placebo equal 1.83 times the estimated odds for those taking aspirin. The estimated odds were 83% higher for the placebo group.
We estimate that the odds of MI are at least 44% higher when taking placebo than when taking aspirin.
• Note 1: If we have multinomial sampling:
Y
1 2
X 1 n11 n12
2 n21 n22
Y
1 2
X 1 π11 π12
2 π21 π22
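the odds ratio θ can also be defined as
$$\theta = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}.$$
The MLEs of the πij's are π̂ij = nij/n ⇒ the same estimate of θ:
$$\hat{\theta} = \frac{\hat{\pi}_{11}\hat{\pi}_{22}}{\hat{\pi}_{12}\hat{\pi}_{21}} = \frac{n_{11}n_{22}}{n_{12}n_{21}}.$$
• Note 2: If some of the nij's are small, add 0.5 to each cell and then re-calculate θ̂ and var(log θ̂), e.g.
$$\hat{\theta} = \frac{(n_{11} + 0.5)(n_{22} + 0.5)}{(n_{12} + 0.5)(n_{21} + 0.5)}$$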
Slide 68
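• The relationship between θ and RR:
$$\theta = \frac{\pi_1/(1 - \pi_1)}{\pi_2/(1 - \pi_2)} = \frac{\pi_1}{\pi_2} \times \frac{1 - \pi_2}{1 - \pi_1} = RR \times \frac{1 - \pi_2}{1 - \pi_1}$$
1. RR = 1 ⇔ θ = 1 ⇔ X ⊥ Y.
2. π1 > π2 ⇔ θ > RR > 1.
3. π1 < π2 ⇔ θ < RR < 1.
4. When π1 ≈ 0 & π2 ≈ 0 (rare events), θ ≈ RR.
[Figure: number line starting at 0 with the points θ, RR, 1, RR, θ marked from left to right, i.e. θ < RR < 1 on one side of 1 and 1 < RR < θ on the other.]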
Slide 69
• The odds ratio for case-control studies:
? For the MI study (page 32)
Table 2.4. Case-Control Study on MI
Myocardial Infarction
Case Control
Ever Smoker Yes 172 173
No 90 346
262 519
we know that we cannot estimate π1 = P[MI|Ever smokers] and π2 = P[MI|Never smokers], and hence cannot estimate RR = π1/π2.
? However, we still want to assess the association between smoking and MI.
Slide 70
τ1 = P [Ever smoking|MI Case] τ2 = P [Ever smoking|MI Control]
? From the design, we can estimate
τ1 = P[Ever smoking|MI Case]: τ̂1 = 172/262 = 0.6565
τ2 = P[Ever smoking|MI Control]: τ̂2 = 173/519 = 0.3333
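and the odds ratio between τ1 and τ2,
$$\theta^* = \frac{\tau_1/(1 - \tau_1)}{\tau_2/(1 - \tau_2)}, \quad \text{estimated by} \quad \hat{\theta}^* = \frac{\hat{\tau}_1/(1 - \hat{\tau}_1)}{\hat{\tau}_2/(1 - \hat{\tau}_2)} = \frac{n_{11}n_{22}}{n_{12}n_{21}} = 3.82.$$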
? It can be shown that
$$\theta^* = \frac{\pi_1/(1 - \pi_1)}{\pi_2/(1 - \pi_2)} = \theta$$
So we can use a case-control study to make inference on θ!
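A quick numerical check on Table 2.4 illustrates why the case-control design still recovers θ: the odds ratio computed from the column-conditional proportions τ̂1, τ̂2 equals the cross-product ratio, and so does the ratio formed from the row proportions, even though those row proportions are not valid estimates of P[MI|smoking status] in this design. This is only an illustrative sketch, not a proof.

```python
# Table 2.4: rows = ever smoker (Yes, No), columns = MI (Case, Control).
n11, n12 = 172, 173
n21, n22 = 90, 346

# Column-conditional proportions, which the case-control design can estimate.
tau1 = n11 / (n11 + n21)          # P[ever smoker | case]
tau2 = n12 / (n12 + n22)          # P[ever smoker | control]
theta_star = (tau1 / (1 - tau1)) / (tau2 / (1 - tau2))

# Cross-product ratio.
cross_product = (n11 * n22) / (n12 * n21)

# Row proportions: NOT valid estimates of P[MI | smoking] here,
# but their odds ratio is algebraically the same number.
r1 = n11 / (n11 + n12)
r2 = n21 / (n21 + n22)
theta_rows = (r1 / (1 - r1)) / (r2 / (1 - r2))

print(round(theta_star, 2), round(cross_product, 2), round(theta_rows, 2))
```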
? The formula for var(log θ̂) is the same:
$$\widehat{\mathrm{var}}(\log\hat{\theta}) = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}.$$
Slide 71
? Therefore, for the MI case-control study (Table 2.4), the odds ratio of developing MI between ever smokers and never smokers is estimated as
$$\hat{\theta} = 3.82,$$
$$\widehat{\mathrm{var}}(\log\hat{\theta}) = \frac{1}{172} + \frac{1}{173} + \frac{1}{90} + \frac{1}{346} = 0.0256.$$
95% CI for log θ:
$$\log(3.82) \pm 1.96 \times \sqrt{0.0256} = [1.02665, 1.65385]$$
95% CI for θ: $[e^{1.02665}, e^{1.65385}] = [2.79, 5.23]$.
• Since MI is a rare event, RR ≈ θ, so
RR ≈ 3.82 ≈ 4.
That is, ever smokers are about 4 times as likely to develop MI as never smokers.
Slide 72
We estimate that the odds of MI are at least 179% higher for ever smokers than for never smokers.
III χ2 Test for Independence between X and Y (nominal)
Suppose X and Y are random and have the prob structure:
Y
1 2 J
X 1 π11 π12 . π1J
2 π21 π22 . π2J
. . . . .
I πI1 πI2 . πIJ
Given data {nij}’s from a multinomial sampling, we would like to test
H0 : πij = πij(θ), for i = 1, .., I, and j = 1, ..., J , where θ is a parameter
vector with dim(θ) = k.
If dim(θ) = 0, then πij ’s are totally known under H0.
Slide 73
https://academo.org/demos/dice-roll-statistics/
Consider the null hypothesis (H0) that cell probabilities in a two-way contingency table equal certain fixed values {πij}. For a sample of size n with cell counts {nij}, the values {μij = nπij} are called expected frequencies. They represent the expected values {E(nij)} when H0 is true. To judge whether the data contradict H0, we compare {nij} to {μij}. If H0 is true, nij should be close to μij in each cell.
III.1 General Pearson χ2 test and LRT
• Find the MLE θ̂ of θ under H0 and set μ̂ij = n πij(θ̂), where n = n++.
• If H0 is true and n is large, such that the μ̂ij's are reasonably large (μ̂ij ≥ 5), then the Pearson statistic
$$\chi^2 = \sum_{\text{all cells}} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} \overset{H_0}{\sim} \chi^2_{df}$$
where df = IJ − 1 − dim(θ).
Reject H0 at level α if $\chi^2 \ge \chi^2_{df,\alpha}$.
• LRT
$$G^2 = 2\sum_{\text{all cells}} n_{ij}\log\left(\frac{n_{ij}}{\hat{\mu}_{ij}}\right) \overset{H_0}{\sim} \chi^2_{df}.$$
• Calculation of df:
df = [# of unknown parameters under H1 ∪ H0] − [# of unknown parameters under H0].
Slide 74
For testing independence in r × c contingency tables, the approximate chi-squared sampling distributions of X2 and G2 have df = (r − 1)(c − 1).
The df value means: under H0, {πi+} and {π+j} determine the cell probabilities. There are r − 1 non-redundant row probabilities: because they sum to 1, the first r − 1 determine the last one through πr+ = 1 − (π1+ + · · · + πr−1,+). Similarly, there are c − 1 non-redundant column probabilities, so under H0 there are (r − 1) + (c − 1) parameters. The alternative hypothesis Ha states that there is not independence but does not specify a pattern for the rc cell probabilities. The probabilities are then constrained only to sum to 1, so there are rc − 1 non-redundant parameters. The value of df is the difference between the number of parameters under (Ha and H0) and under H0, or
df = (rc − 1) − [(r − 1) + (c − 1)] = rc − r − c + 1 = (r − 1)(c − 1).
Some χ2 distributions
Slide 75
III.2 Test of independence
• X ⊥ Y ⇔ H0 : πij = πi+π+j , i = 1, ..., I, j = 1, ..., J
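• The MLEs of the πi+'s and π+j's are
$$\hat{\pi}_{i+} = \frac{n_{i+}}{n}, \quad \hat{\pi}_{+j} = \frac{n_{+j}}{n}$$
• μ̂ij is equal to
$$\hat{\mu}_{ij} = n\hat{\pi}_{i+}\hat{\pi}_{+j} = \frac{n_{i+}n_{+j}}{n}$$
• Pearson χ2 and LRT:
$$\chi^2 = \sum_{\text{all cells}} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}}, \quad G^2 = 2\sum_{\text{all cells}} n_{ij}\log\left(\frac{n_{ij}}{\hat{\mu}_{ij}}\right) \overset{H_0}{\sim} \chi^2_{df}$$
df = IJ − 1 − (I − 1 + J − 1) = (I − 1)(J − 1).
Reject H0 : X ⊥ Y if χ2 or G2 ≥ $\chi^2_{df,\alpha}$.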
Slide 76
Note: For both test statistics, larger values provide stronger evidence against H0
For both test statistics: p-value = 1-pchisq(X2, df)
Q: Find X2 and G2, and then find the p-values.
• Note: With data {nij}’s from a multinomial sampling or
product-multinomial sampling on X, we can test H0 : X ⊥ Y by
testing
H0 : P [Y = j|X = i] = P [Y = j|X = k] for all i, j, k
(cond. dist. of Y given X is the same across all levels of X)
It can be shown that the Pearson χ2 and LRT test stats are the same
with the same null dist χ2(I−1)(J−1).
Slide 77
• Example: Gender gap in party identification
Y –Party Identification
Democrat Independent Republican Total
X – Gender Female 762 327 468 1557
Male 484 239 477 1200
1246 566 945 n = 2757
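Then $\hat{\mu}_{11} = 1557 \times 1246/2757 = 703.7$, $\hat{\mu}_{12} = 1557 \times 566/2757 = 319.6$, etc.
$$\Rightarrow \chi^2 = \frac{(762 - 703.7)^2}{703.7} + \frac{(327 - 319.6)^2}{319.6} + \cdots = 30.1$$
$$G^2 = 2\,(762\log(762/703.7) + 327\log(327/319.6) + \cdots) = 30.0$$
$$\chi^2_{2,0.05} = 5.99$$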
Both Pearson test and LRT reject H0 : X ⊥ Y at level 0.05.
Note: χ2 ≈ G2 even if H0 is likely not true.
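The calculation for the party identification table can be sketched in Python as follows. The function name is illustrative. The p-value line uses the closed form exp(−x/2) for the upper tail of a chi-squared distribution, which is valid only for df = 2 (the case here); for other df a statistics library would be needed.

```python
import math

def independence_tests(table):
    """Return (Pearson X2, LRT G2, df, expected counts) for an I x J table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected counts under independence: mu_ij = n_i+ * n_+j / n.
    expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]
    x2 = 0.0
    g2 = 0.0
    for obs_row, exp_row in zip(table, expected):
        for nij, mu in zip(obs_row, exp_row):
            x2 += (nij - mu) ** 2 / mu
            if nij > 0:  # a zero cell contributes nothing to G2
                g2 += 2 * nij * math.log(nij / mu)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, g2, df, expected

# Gender (Female, Male) by party (Democrat, Independent, Republican).
party = [[762, 327, 468],
         [484, 239, 477]]
x2, g2, df, expected = independence_tests(party)

# Upper-tail probability of chi-squared with df = 2 is exp(-x/2).
p_x2 = math.exp(-x2 / 2)
p_g2 = math.exp(-g2 / 2)
print(f"X2 = {x2:.4f}, G2 = {g2:.4f}, df = {df}")
print(f"p-values: {p_x2:.2e}, {p_g2:.2e}")
```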
Slide 78
This evidence of association would be rather unusual if the variables were truly independent. Both test statistics suggest that political party ID and gender are associated.
See Chap2 R codes for details
• SAS program for the example:
data table2_5;
  input gender $ party $ count @@;
  datalines;
female dem 762 female ind 327 female rep 468
male dem 484 male ind 239 male rep 477
;
title "Analysis of Party Identification data";
proc freq data=table2_5 order=data;
  weight count;
  tables gender*party / norow nocol nopercent chisq expected measures cmh;
run;
• Output from the above program:
Analysis of Party Identification data
The FREQ Procedure
Table of gender by party

gender    party
Frequency|
Expected |dem     |ind     |rep     | Total
---------+--------+--------+--------+
female   |    762 |    327 |    468 |   1557
         | 703.67 | 319.65 | 533.68 |
---------+--------+--------+--------+
male     |    484 |    239 |    477 |   1200
         | 542.33 | 246.35 | 411.32 |
---------+--------+--------+--------+
Total        1246      566      945     2757
Slide 79
Statistics for Table of gender by party

Statistic                      DF       Value      Prob
------------------------------------------------------
Chi-Square                      2     30.0701    <.0001
Likelihood Ratio Chi-Square     2     30.0167    <.0001
Mantel-Haenszel Chi-Square      1     28.9797    <.0001
Phi Coefficient                        0.1044
Contingency Coefficient                0.1039
Cramer’s V                             0.1044

Sample Size = 2757

Statistic                   Value       ASE
------------------------------------------------------
Gamma                      0.1710    0.0315
Kendall’s Tau-b            0.0964    0.0180
Stuart’s Tau-c             0.1078    0.0202
Somers’ D C|R              0.1097    0.0205
Somers’ D R|C              0.0848    0.0158
Pearson Correlation        0.1025    0.0190
Spearman Correlation       0.1016    0.0190

Summary Statistics for gender by party
Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic   Alternative Hypothesis     DF       Value      Prob
---------------------------------------------------------------
1           Nonzero Correlation         1     28.9797    <.0001
2           Row Mean Scores Differ      1     28.9797    <.0001
3           General Association         2     30.0592    <.0001
Slide 80
III.3 Cell residuals for a contingency table
• Under H0 : X ⊥ Y ,
$$\hat{\mu}_{ij} = \frac{n_{i+}n_{+j}}{n}.$$
• Calculate standardized Pearson residuals:
$$e^{st}_{ij} = \frac{n_{ij} - \hat{\mu}_{ij}}{\sqrt{\hat{\mu}_{ij}(1 - p_{i+})(1 - p_{+j})}}.$$
• Under H0 : X ⊥ Y, $E(e^{st}_{ij}) \approx 0$, $\mathrm{var}(e^{st}_{ij}) \approx 1$, and $e^{st}_{ij}$ behaves like a N(0, 1) variable.
• We can use $e^{st}_{ij}$ to check the departure from H0 : X ⊥ Y.
• For the Party Identification example, p1+ = 1557/2757 = 0.565, p+1 = 1246/2757 = 0.452
$$\Rightarrow e^{st}_{11} = \frac{762 - 703.7}{\sqrt{703.7(1 - 0.565)(1 - 0.452)}} = 4.50$$
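The residual formula can be applied to every cell at once. The sketch below (with an illustrative function name) computes all six standardized Pearson residuals for the party identification table directly from the counts.

```python
import math

def standardized_residuals(table):
    """Standardized Pearson residuals under the independence model."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, nij in enumerate(row):
            mu = row_totals[i] * col_totals[j] / n          # expected count
            p_row = row_totals[i] / n                       # p_{i+}
            p_col = col_totals[j] / n                       # p_{+j}
            res_row.append((nij - mu) / math.sqrt(mu * (1 - p_row) * (1 - p_col)))
        residuals.append(res_row)
    return residuals

party = [[762, 327, 468],
         [484, 239, 477]]
resid = standardized_residuals(party)
for label, row in zip(["Female", "Male"], resid):
    print(label, [round(r, 2) for r in row])
```

Cells with absolute residuals well above 2 are the ones that depart most from independence.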
Slide 81
P-value=2*pnorm(-4.5) = 6.795346e-06
Under H0, we expect about 5% of the standardized residuals to be farther from 0 than ±2 by chance alone.
Q: Find $e^{st}_{12}$.
• We can use Proc Genmod of SAS to get the standardized Pearson residuals:
proc genmod order=data;
  class gender party;
  model count = gender party / dist=poisson link=log residuals;
run;
• Part of the output:

                 Raw       Pearson    Deviance   Std Deviance   Std Pearson   Likelihood
Observation    Residual    Residual   Residual     Residual       Residual     Residual
     1        58.328618   2.1988558   2.1694814    4.4419109      4.5020535    4.4877799
     2        7.3547334   0.4113702   0.4098076    0.6967948      0.6994517    0.6985339
     3        -65.68335   -2.84324    -2.904774    -5.430995      -5.315946    -5.34911
     4        -58.32862   -2.504669   -2.551707    -4.586602      -4.502054    -4.528391
     5        -7.354733   -0.468583   -0.470944    -0.702976      -0.699452    -0.701036
     6        65.683351   3.2386734   3.157751     5.1831197      5.3159455    5.2670354
The observation order is for row 1, then row 2, etc.
Slide 82
• Put the standardized Pearson residuals in the original table:
                          Y – Party Identification
                      Democrat   Independent   Republican
X – Gender   Female      4.5         0.7          -5.3
             Male       -4.5        -0.7           5.3
We see from the table that the independence model does not fit the data well. There are significantly more Democrat females (fewer males) than predicted by the independence model, and significantly fewer Republican females (more males) than predicted by the model.
Slide 83
IV Testing Independence for Ordinal Data
IV.1 X,Y are both ordinal random cat. variables; Mantel-Haenszel M2
(CMH1)
• Assign scores u1 < u2 < · · · < uI to X and v1 < v2 < · · · < vJ to Y
                     Y
           1(v1)  ...  j(vj)  ...  J(vJ)
    1(u1)
X   i(ui)              πij
    I(uI)
• Want to test H0 : X ⊥ Y given data such as
Slide 84
Let u1 ≤ u2 ≤ · · · ≤ ur denote scores for the rows, and v1 ≤ v2 ≤ · · · ≤ vc denote scores for the columns, having the same ordering as the categories.
              Y
         v1   v2   v3
    u1    2    1    3
X   u2    1    2    1
    u3    1    1    2

⇒

Patient   X    Y
1         u1   v1
2         u1   v1
3         u1   v2
4         u1   v3
5         u1   v3
6         u1   v3
7         u2   v1
8         u2   v2
9         u2   v2
10        u2   v3
11        u3   v1
12        u3   v2
13        u3   v3
14        u3   v3
Slide 85
Q: Find X-bar and Y-bar.
• Pearson correlation coefficient describes linear relationship between X
and Y and can be used to test H0 : X ⊥ Y :
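$$r = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \cdot \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}},$$
where
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\sum_{i=1}^{I} n_{i+}u_i = \sum_{i=1}^{I} p_{i+}u_i = \bar{u}$$
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{n}\sum_{j=1}^{J} n_{+j}v_j = \sum_{j=1}^{J} p_{+j}v_j = \bar{v}$$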
Correlation falls between −1 and +1. Independence between the variables implies that its population value ρ = 0. The larger the value of |r|, the farther the data fall from independence in the linear dimension.
=⇒
$$r = \frac{\sum_{i=1}^{I}\sum_{j=1}^{J} p_{ij}(u_i - \bar{u})(v_j - \bar{v})}{\sqrt{\sum_{i=1}^{I} p_{i+}(u_i - \bar{u})^2 \sum_{j=1}^{J} p_{+j}(v_j - \bar{v})^2}}$$
• It can be shown that under H0 : X ⊥ Y
$$\sqrt{n - 1}\,r \overset{a}{\sim} N(0, 1), \quad M^2 = (n - 1)\,r^2 \overset{a}{\sim} \chi^2_1$$
This is the Mantel-Haenszel test for H0 : X ⊥ Y (cmh1 in SAS).
• Note: We don’t have to expand the data to calculate r. Proc Freq
calculates r and M2.
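As the note says, r and M2 can be computed from the table without expanding it to one record per subject. A minimal Python sketch follows (illustrative function name), applied to the party identification table with the default scores 1, 2 for gender and 1, 2, 3 for party. The p-value uses the identity P[χ²₁ ≥ x] = erfc(√(x/2)).

```python
import math

def mantel_haenszel_m2(table, u, v):
    """Correlation r between row scores u and column scores v, and M2 = (n - 1) r^2."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    u_bar = sum(ni * ui for ni, ui in zip(row_totals, u)) / n
    v_bar = sum(nj * vj for nj, vj in zip(col_totals, v)) / n
    # Weighted cross-products and sums of squares over the table cells.
    sxy = sum(table[i][j] * (u[i] - u_bar) * (v[j] - v_bar)
              for i in range(len(u)) for j in range(len(v)))
    sxx = sum(ni * (ui - u_bar) ** 2 for ni, ui in zip(row_totals, u))
    syy = sum(nj * (vj - v_bar) ** 2 for nj, vj in zip(col_totals, v))
    r = sxy / math.sqrt(sxx * syy)
    m2 = (n - 1) * r * r
    p_value = math.erfc(math.sqrt(m2 / 2))   # upper tail of chi-squared with df = 1
    return r, m2, p_value

party = [[762, 327, 468],
         [484, 239, 477]]
r, m2, p_value = mantel_haenszel_m2(party, u=[1, 2], v=[1, 2, 3])
print(f"r = {r:.4f}, M2 = {m2:.4f}, p-value = {p_value:.2e}")
```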
Slide 87
• How to choose scores {ui}’s for X and {vj}’s for Y :
1. Any increasing/decreasing seq is ok for {ui}’s and {vj}’s. They
have to be chosen before analyzing data.
2. Mid-rank. For example,
                 Y
            1     2     3     ni+    ui
       1    2     1     3      6     3.5
X      2    1     2     1      4     8.5
       3    1     1     2      4    12.5
     n+j    4     4     6
      vj   2.5   6.5  11.5

proc freq order=data;
  tables x*y / cmh1 scores=rank;
run;
3. The default is “1, 2, · · · , I” for X and “1, 2, · · · , J” for Y in SAS.
Slide 88
• Note 1: M2 only detects “linear trend” between X and Y , Pearson
χ2 and LRT G2 detects any deviation from indep.
• Note 2: Proc corr of SAS uses (as the default)
$$t = (n - 2)^{1/2}\left(\frac{r^2}{1 - r^2}\right)^{1/2}$$
to test H0 : ρ = 0 by comparing t to $t_{n-2}$. M2 and t2 are asymptotically equivalent under H0.
• From slide 80, M2 = 28.98 using 1,2 for gender and 1,2,3 for party
identification. Reject H0 : X ⊥ Y .
• Note 3: M2 is for a 2-sided test. We can use $\sqrt{n - 1}\,r$ for a one-sided test.
From slide 80, $\sqrt{n - 1}\,r = \sqrt{28.98} = 5.4$ ⇒ reject H0 : X ⊥ Y in favor of H1 : ρ > 0 (even though r is only about 0.1).
Slide 89
• Example: Mother’s alcohol consumption and infant malformation (Table 2.7 on p. 42)
Alcohol               Malformation
Consumption    Present (Y = 1)   Absent (Y = 0)
0                    48              17,066
< 1                  38              14,464
1 − 2                 5                 788
3 − 5                 1                 126
≥ 6                   1                  37
χ2 = 12.1 (p-value = 0.017), G2 = 6.2 (p-value = 0.185) ⇒ mixed results.
Assigned scores for alcohol consumption: 0, 0.5, 1.5, 4, 7 and 0/1 for absent/present
⇒ r = 0.0142, M2 = 6.6, p-value = $P[\chi^2_1 \ge M^2]$ = 0.01.
χ2, G2, M2 may not be valid ⇒ Exact test (later).
χ2, G2, M2 may not be valid ⇒ Exact test (later).
Slide 90
• SAS program:
data table2_7;
  input alcohol malform count @@;
  datalines;
0 1 48 0 0 17066
0.5 1 38 0.5 0 14464
1.5 1 5 1.5 0 788
4 1 1 4 0 126
7 1 1 7 0 37
;
title "Analysis of infant malformation data";
proc freq data=table2_7;
  weight count;
  tables alcohol*malform / measures chisq cmh;
run;
• Part of the output:
Statistics for Table of alcohol by malform

Statistic                      DF       Value      Prob
------------------------------------------------------
Chi-Square                      4     12.0821    0.0168
Likelihood Ratio Chi-Square     4      6.2020    0.1846
Mantel-Haenszel Chi-Square      1      6.5699    0.0104

Statistic                   Value       ASE
------------------------------------------------------
Pearson Correlation        0.0142    0.0106
Spearman Correlation       0.0033    0.0059
Slide 91
IV.2 Trend test for I × 2 and 2× J tables
• For an I × 2 table where X is an I-level ordinal variable and Y is a
2-level variable (such as the infant malformation table) from a
multinomial sampling or product-multinomial sampling on X:
Y
1 0
u1 n11 n12 n1+
X u2 n21 n22 n2+
...
uI nI1 nI2 nI+
we can assign scores to X and any scores (usually 0/1) to Y ⇒ M2.
Slide 92
• The Mantel-Haenszel M2 can be derived in a different way (taken
from Section 3.2.1)
Consider
πi = P [Y = 1|X = ui].
Assume a linear trend model for πi:
πi = α+ βui
Then H0 : X ⊥ Y =⇒ H0* : β = 0
An unbiased estimate of πi:
$$\hat{\pi}_i = \frac{n_{i1}}{n_{i+}} = p_i \quad (\text{sample proportion at } X = u_i)$$
The trend model implies the following linear model for pi:
$$p_i = \alpha + \beta u_i + \varepsilon_i,$$
Slide 93
var(εi) = πi(1 − πi)/ni+, which equals α(1 − α)/ni+ under H0* : β = 0
=⇒ WLS (weighted LS, weighted by sample size ni+) estimate of β:
$$\hat{\beta} = \frac{\sum_{i=1}^{I} n_{i+}(u_i - \bar{u})(p_i - p)}{\sum_{i=1}^{I} n_{i+}(u_i - \bar{u})^2},$$
where
$$\bar{u} = \frac{1}{n}\sum_{i=1}^{I} n_{i+}u_i \quad (\text{sample mean of } \{X_i\})$$
$$p = \frac{n_{+1}}{n} \quad (\text{pooled sample response rate})$$
var(β̂) under H0 can be estimated by
$$\widehat{\mathrm{var}}_{H_0}(\hat{\beta}) = \frac{p(1 - p)}{\sum_{i=1}^{I} n_{i+}(u_i - \bar{u})^2}.$$
Slide 94
WLS: weighted least squares
For testing H0* : β = 0, let’s use the Wald test
$$Z = \frac{\hat{\beta}}{\sqrt{\widehat{\mathrm{var}}_{H_0}(\hat{\beta})}}$$
Under H0 : X ⊥ Y, Z ∼ N(0, 1) or $Z^2 \sim \chi^2_1$.
• Z2 or Z is the Cochran-Armitage trend test.
It can be shown that $Z^2 = nr^2$. Remember $M^2 = (n - 1)r^2$
$$\Rightarrow Z^2 = \frac{n}{n - 1}M^2 \approx M^2$$
• SAS program:
title "Trend test of infant malformation data";
proc freq data=table2_7 order=data;
  weight count;
  tables alcohol*malform / trend;
run;
Slide 95
• Part of the output:
Statistics for Table of alcohol by malform

Cochran-Armitage Trend Test
--------------------------
Statistic (Z)            2.5632
One-sided Pr >  Z        0.0052
Two-sided Pr > |Z|       0.0104

Sample Size = 32574
• We see that Z = 2.5632. Both one-sided and 2-sided p-values are significant. Since Z > 0, we conclude that β > 0.
We can confirm the relationship:
$$Z^2 = \frac{n}{n - 1}M^2.$$
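The trend statistic can be computed directly from the WLS formulas on the preceding slides. The sketch below (illustrative names) uses the malformation counts with scores 0, 0.5, 1.5, 4, 7 and also checks the relationship between Z² and the M2 value reported by Proc Freq. The one-sided p-value uses the normal tail identity P[Z > z] = erfc(z/√2)/2.

```python
import math

def cochran_armitage(successes, totals, scores):
    """Return (beta_hat, Z) for the linear trend model pi_i = alpha + beta * u_i."""
    n = sum(totals)
    p = sum(successes) / n                                      # pooled response rate
    u_bar = sum(t * u for t, u in zip(totals, scores)) / n      # weighted mean score
    sxx = sum(t * (u - u_bar) ** 2 for t, u in zip(totals, scores))
    # WLS slope estimate, weighted by the row sample sizes.
    beta = sum(t * (u - u_bar) * (s / t - p)
               for s, t, u in zip(successes, totals, scores)) / sxx
    var_beta_h0 = p * (1 - p) / sxx                             # variance under H0
    return beta, beta / math.sqrt(var_beta_h0)

present = [48, 38, 5, 1, 1]
absent = [17066, 14464, 788, 126, 37]
totals = [a + b for a, b in zip(present, absent)]
scores = [0, 0.5, 1.5, 4, 7]

beta, z = cochran_armitage(present, totals, scores)
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
n = sum(totals)
m2_from_z = z * z * (n - 1) / n          # should match the reported M2 = 6.5699
print(f"Z = {z:.4f}, one-sided p = {p_one_sided:.4f}, M2 = {m2_from_z:.4f}")
```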
Slide 96
• For a 2× J table where X is nominal or ordinal variable, Y is an
ordinal variable with data {nij}’s from a multinomial sampling or
product-multinomial sampling on X
Y
v1 v2 · · · vJ
X 1 n11 n12 · · · n1J
2 n21 n22 · · · n2J
We have a situation similar to two sample t-test for comparing means of Y scores b/w X = 1 and X = 2. It can be shown that t2 ≈ M2 (M2 will be independent of the score choice for X).
If we use mid-ranks as the scores for Y , M2 is same as Mann-Whitney test.
Slide 97
IV.3 Tests for nominal-ordinal tables
• X – nominal, Y – ordinal, with data from multinomial sampling or
product-multinomial sampling on X, such as:
              Y
       v1    v2    v3
   1   n11   n12   n13   n1+
X  2   n21   n22   n23   n2+
   3   n31   n32   n33   n3+
• H0 : X ⊥ Y
  ⇓
  Cond. dists. of Y are the same across levels of X
  ⇓
  Mean scores of Y at X = i are the same across levels of X
• This is an ANOVA problem.
Slide 98
• We can use the ANOVA F-test to test X ⊥ Y:
$$F = \frac{SST/(I-1)}{SSE/(n-I)} \overset{H_0}{\sim} F_{I-1,\,n-I}$$
• Equivalently (for large n), we can use
$$\chi^2 = \frac{SST}{SSE^*/(n-1)} \overset{H_0}{\sim} \chi^2_{I-1}$$
where SSE* is the modified sum of squares of errors.
The test χ² is called cmh2 by SAS:
proc freq;
  weight count;
  tables x*y / cmh2;
run;
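As a rough illustration of the two statistics, the sketch below computes SST, SSE, and F directly from an I × J table of counts and a set of Y scores. Two things here are assumptions and not taken from the notes: the table is made up, and SSE* is taken to be the total sum of squares SST + SSE (the notes only call it "modified"). The function name `mean_score_tests` is hypothetical.

```python
def mean_score_tests(table, col_scores):
    """table[i][j] = count in row i, column j; col_scores[j] = score v_j for Y."""
    I = len(table)
    row_n = [sum(row) for row in table]
    n = sum(row_n)
    # Mean Y score within each level of X, and the overall mean score.
    row_mean = [sum(c * v for c, v in zip(row, col_scores)) / m
                for row, m in zip(table, row_n)]
    grand = sum(m * yb for m, yb in zip(row_n, row_mean)) / n
    sst = sum(m * (yb - grand) ** 2 for m, yb in zip(row_n, row_mean))  # between rows
    sse = sum(c * (v - yb) ** 2
              for row, yb in zip(table, row_mean)
              for c, v in zip(row, col_scores))                         # within rows
    f_stat = (sst / (I - 1)) / (sse / (n - I))
    # ASSUMPTION: SSE* is taken to be the total sum of squares SST + SSE.
    chi2 = sst / ((sst + sse) / (n - 1))
    return f_stat, chi2, (I - 1, n - I)

# Hypothetical 3 x 3 table (NOT from the notes) with Y scores 1, 2, 3.
table = [[10, 20, 30],
         [20, 20, 20],
         [30, 20, 10]]
f_stat, chi2, df = mean_score_tests(table, [1, 2, 3])
print(f"F={f_stat:.4f} on {df} df, chi2={chi2:.4f} on {df[0]} df")
```

The p-values are left out on purpose: the F and chi-square tail probabilities are not in the Python standard library, so a statistics package would be needed for them.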
Slide 99
SST: sum of squares for treatment
SSE: sum of squares for error
V. Exact Inference for Sparse Tables
V.1 Fisher’s exact test for 2× 2 tables
• X, Y – 2-level cat. variables with probability structure
            Y
        1     2
X   1   π11   π12
    2   π21   π22
• Want to test H0 : X ⊥ Y given data that, WLOG, we assume come from
multinomial sampling:
            Y
        1     2
X   1   n11   n12
    2   n21   n22
Slide 100
• When the {nij} are large, we can use the Pearson χ² or the LRT G² to test
H0 : X ⊥ Y.
• However, when some cell counts {nij} are small, the exact dist. of
χ² or G² under H0 may be far from $\chi^2_1$ ⟹ use of the asym. dist.
may give wrong conclusions.
• Fisher's tea example: Fisher's colleague, Muriel Bristol, claimed she
could tell whether tea or milk was added to the cup first.
                 Muriel's Guess
                 Milk   Tea
True   Milk       3      1      4
       Tea        1      3      4
                  4      4
Slide 101
• By the design of Fisher's tea example (both margins are fixed), Pearson χ² or G² can take at most
5 different values (there are only 5 possible tables).
Therefore, the $\chi^2_1$ approximation to the dist. of χ² or G² is very poor!
• Even if we assumed multinomial sampling, there would only be
$\binom{8+3}{3} = 165$ tables. Moreover, the nij's are small. The $\chi^2_1$ approximation
to Pearson χ² or G² would still be very poor.
• Let us develop an exact test for testing H0 : X ⊥ Y in this kind of
sparse 2 × 2 table.
• Assume multinomial sampling; we would like to test
H0 : θ = 1 (X ⊥ Y) vs. the one-sided alternative Ha : θ > 1.
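The two table counts quoted above (5 tables with both margins fixed, 165 under multinomial sampling with n = 8) can be checked by brute-force enumeration. This is a small sketch; the variable names are illustrative only.

```python
from math import comb

n = 8
# All 2x2 tables (n11, n12, n21, n22) of non-negative counts with total n.
multinomial_tables = [(a, b, c, n - a - b - c)
                      for a in range(n + 1)
                      for b in range(n + 1 - a)
                      for c in range(n + 1 - a - b)]

# Tables that also have row 1 total = 4 and column 1 total = 4 (the tea design).
fixed_margin_tables = [t for t in multinomial_tables
                       if t[0] + t[1] == 4 and t[0] + t[2] == 4]

print(len(multinomial_tables), comb(n + 3, 3))  # enumeration vs. the binomial coefficient
print(len(fixed_margin_tables))                 # tables with both margins fixed
```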
Slide 102
• With multinomial sampling, (n11, n12, n21, n22) are random variables
(only the sum n = n++ is fixed).
• Under H0 : θ = 1 (X ⊥ Y), πij = πi+π+j, and there are two unknown
parameters π1+, π+1. So the distribution of the data (n11, n12, n21, n22) is
unknown even under H0.
• It can be shown that under H0 : θ = 1 (X ⊥ Y), the conditional
distribution of n11 | n1+, n+1 is completely known:
$$P[n_{11} = t_0 \mid n_{1+}, n_{+1}] = \frac{\binom{n_{1+}}{t_0}\binom{n_{2+}}{n_{+1}-t_0}}{\binom{n}{n_{+1}}},$$
where t0 is the observed value of n11. This is a hypergeometric
distribution.
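A minimal sketch of this hypergeometric probability in Python, evaluated for the margins of the tea example (n1+ = 4, n+1 = 4, n = 8). The function name `n11_pmf` is hypothetical.

```python
from math import comb

def n11_pmf(t, n1p, np1, n):
    """P[n11 = t | n1+ = n1p, n+1 = np1] under H0 (hypergeometric)."""
    n2p = n - n1p
    # Outside the support the probability is zero.
    if t < 0 or t > n1p or np1 - t < 0 or np1 - t > n2p:
        return 0.0
    return comb(n1p, t) * comb(n2p, np1 - t) / comb(n, np1)

# Null distribution of n11 for the tea example.
for t in range(5):
    print(t, round(n11_pmf(t, 4, 4, 8), 3))
```

The printed values are 0.014, 0.229, 0.514, 0.229, 0.014 for n11 = 0, ..., 4, and they sum to 1.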
Slide 103
V.2 P-values of Fisher’s exact tests:
            Y
        1     2
X   1   n11   n12   n1+
    2   n21   n22   n2+
        n+1   n+2   n
• Simple algebra shows
$$\hat{\theta} = \frac{n_{11}n_{22}}{n_{12}n_{21}} = \frac{n_{11}(n_{+2} - n_{1+} + n_{11})}{(n_{1+} - n_{11})(n_{+1} - n_{11})} \nearrow n_{11}$$
(increasing in n11 when the margins are fixed)
⟹ larger $\hat{\theta}$ ⇔ larger n11
⟹ We should reject H0 in favor of Ha when n11 is large.
⟹ P-value = P[n11 ≥ t0 | n1+, n+1, H0] – one-sided Fisher's exact test.
Slide 104
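• For Fisher's tea example, the one-sided p-value is:
$$\begin{aligned}
\text{P-value} &= P[n_{11} \ge 3 \mid n_{1+}, n_{+1}, H_0] \\
&= P[n_{11} = 3 \mid n_{1+}, n_{+1}, H_0] + P[n_{11} = 4 \mid n_{1+}, n_{+1}, H_0] \\
&= \frac{\binom{4}{3}\binom{4}{1}}{\binom{8}{4}} + \frac{\binom{4}{4}\binom{4}{0}}{\binom{8}{4}} = 0.229 + 0.014 = 0.243
\end{aligned}$$
Mid P-value = 0.229/2 + 0.014 = 0.129.
Note: In this example, n1+, n+1 are naturally fixed.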
Slide 105
• Two-sided Fisher's exact test: H0 : θ = 1 (X ⊥ Y) vs. the two-sided
alternative Ha : θ ≠ 1.
Table   n11 = 0   n11 = 1   n11 = 2   n11 = 3   n11 = 4
Prob    0.014     0.229     0.514     0.229     0.014
• P-value of two-sided Fisher's exact test:
$$\text{P-value} = \sum_{n_{11}} P(n_{11})\, I\{P(n_{11}) \le P(t_0)\}$$
= sum of table probs that are ≤ the observed table prob.
For the tea example (t0 = 3):
p-value = P[n11 = 0] + P[n11 = 1] + P[n11 = 3] + P[n11 = 4] =
0.014 + 0.229 + 0.229 + 0.014 = 0.486.
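The one-sided, mid, and two-sided p-values defined above can be computed together from the hypergeometric null distribution. The sketch below is one way to do it with the standard library; `fisher_p_values` is a hypothetical helper name, and the small relative tolerance in the two-sided rule is an implementation choice to guard against floating-point ties.

```python
from math import comb

def fisher_p_values(t0, n1p, np1, n):
    """Return (right-sided, mid, two-sided) exact p-values for observed n11 = t0."""
    n2p = n - n1p
    lo, hi = max(0, np1 - n2p), min(n1p, np1)          # support of n11
    pmf = {t: comb(n1p, t) * comb(n2p, np1 - t) / comb(n, np1)
           for t in range(lo, hi + 1)}
    p_obs = pmf[t0]
    right = sum(p for t, p in pmf.items() if t >= t0)  # P[n11 >= t0]
    mid = right - p_obs / 2                            # half weight on the observed table
    # Two-sided: add up all tables no more likely than the observed one.
    two = sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-9))
    return right, mid, two

right, mid, two = fisher_p_values(3, 4, 4, 8)          # tea example: n11 = 3
print(round(right, 4), round(mid, 4), round(two, 4))
```

For the tea data this gives 0.2429, 0.1286, and 0.4857, matching the hand calculations up to rounding.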
Slide 106
• SAS program & output for Fisher's exact test:
data table2_8;
  input pour $ guess $ count @@;
  datalines;
milk milk 3 milk tea 1
tea milk 1 tea tea 3
;

title "Analysis of Fisher's tea data";
proc freq data=table2_8;
  weight count;
  tables pour*guess / norow nocol nopercent chisq;
  exact fisher or;
run;

The FREQ Procedure
Table of pour by guess
pour      guess
Frequency|milk    |tea     |  Total
---------+--------+--------+
milk     |      3 |      1 |      4
---------+--------+--------+
tea      |      1 |      3 |      4
---------+--------+--------+
Total           4        4        8

Statistics for Table of pour by guess
Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      2.0000    0.1573
Likelihood Ratio Chi-Square    1      2.0930    0.1480
Slide 107
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)         3
Left-sided Pr <= F          0.9857
Right-sided Pr >= F         0.2429

Table Probability (P)       0.2286
Two-sided Pr <= P           0.4857

Odds Ratio
-----------------------------------
Odds Ratio                   9.0000

Asymptotic Conf Limits
95% Lower Conf Limit         0.3666
95% Upper Conf Limit       220.9270

Exact Conf Limits
95% Lower Conf Limit         0.2117
95% Upper Conf Limit       626.2435

Sample Size = 8
Note: We can also obtain an exact CI for the true θ.
Slide 108
V.3 Fisher’s exact tests can be conservative
• For Fisher's tea example, the exact null distribution of
n11 | n1+, n+1 is:
Table   n11 = 0   n11 = 1   n11 = 2   n11 = 3   n11 = 4
Prob    0.014     0.229     0.514     0.229     0.014
• If we would like to construct a one-sided test at significance level 0.05
(the target type I error prob.), then we would only reject H0 : θ = 1 in favor
of Ha : θ > 1 when n11 = 4. Therefore, the actual type I error prob. is
P[n11 = 4 | H0, n1+, n+1] = 0.014 < 0.05.
So the test is very conservative!
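The gap between the target level and the actual type I error can be computed for any set of margins by growing the rejection region {n11 ≥ c} from the top until its null probability would exceed α. This is a sketch with a hypothetical function name, `actual_size`.

```python
from math import comb

def actual_size(n1p, np1, n, alpha=0.05):
    """Largest one-sided rejection region {n11 >= c} with null prob. <= alpha."""
    n2p = n - n1p
    lo, hi = max(0, np1 - n2p), min(n1p, np1)
    pmf = {t: comb(n1p, t) * comb(n2p, np1 - t) / comb(n, np1)
           for t in range(lo, hi + 1)}
    tail, size, cutoff = 0.0, 0.0, None
    for t in range(hi, lo - 1, -1):      # add tables from the largest n11 downward
        tail += pmf[t]
        if tail > alpha:
            break
        size, cutoff = tail, t
    return cutoff, size                  # cutoff is None if no table can be rejected

cutoff, size = actual_size(4, 4, 8)      # tea example margins
print(cutoff, round(size, 4))
```

For the tea margins the cutoff is 4 and the attained size is 1/70, about 0.014, well below the nominal 0.05.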
Slide 109
VI Association in Three-Way Tables
• X, Y – 2 categorical variables
The X, Y (marginal) association may not reflect a causal relation.
We need to adjust for a third variable Z, a confounding variable (related to both
X and Y).
For example,
X = second-hand smoking
Y = lung cancer
Z = age, may be related to X and Y
                            Lung Cancer
                            Yes    No
Second-Hand Smoking   Yes   π11    π12
                      No    π21    π22
Slide 110
For Chap2, skipped from here...
VI.1 Partial tables, conditional and marginal associations
• With 3 categorical variables X, Y and Z, at each level of Z there is an
XY table. Together, they form the partial tables.
• Each partial table provides information on the conditional association
between X and Y given Z = k.
• When collapsing the partial tables over Z, we get a 2-way XY (marginal)
table. This table provides information on the marginal association between
X and Y.
• We need to be aware that the conditional associations and the marginal
association may be different!
Slide 111
• Death penalty example (Table 2.10). Data from Florida, 1976-1987.
X = defendant's race (W, B), Y = death penalty (Yes, No).
              Y – Death Penalty
              Yes    No
X – Race  W   53     430
          B   15     176
Death penalty rate for W = $\hat{\pi}_1 = \frac{53}{53+430} = 0.11$
Death penalty rate for B = $\hat{\pi}_2 = \frac{15}{15+176} = 0.079$
$$\hat{\psi} = \frac{\hat{\pi}_1}{\hat{\pi}_2} = 1.40, \qquad \hat{\theta} = \frac{53 \times 176}{430 \times 15} = 1.45$$
⇒ White defendants are about 40% more likely to receive the death penalty
than black defendants.
• Maybe the race of victims (Z) affects the XY association?
Slide 112
When Z = White, the XY table is
              Y – Death Penalty
              Yes    No
X – Race  W   53     414    $\hat{\pi}_1$ = 11.3%
          B   11     37     $\hat{\pi}_2$ = 22.9%
When Z = Black, the XY table is
              Y – Death Penalty
              Yes    No
X – Race  W   0      16     $\hat{\pi}_1$ = 0%
          B   4      139    $\hat{\pi}_2$ = 2.8%
• We see that the conditional associations and the marginal association
between X and Y have different directions! This phenomenon is called
Simpson’s paradox.
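The reversal can be reproduced directly from the two partial tables. The sketch below collapses them over Z and prints the death penalty rates by defendant's race, conditionally and marginally; the dictionary layout and names are illustrative only.

```python
# Partial tables from the death penalty example: (yes, no) counts by
# victim's race Z and defendant's race X.
partial = {
    "white victims": {"W": (53, 414), "B": (11, 37)},
    "black victims": {"W": (0, 16), "B": (4, 139)},
}

def rate(yes, no):
    """Proportion receiving the death penalty."""
    return yes / (yes + no)

# Collapse over Z to get the marginal XY table.
marginal = {
    race: (sum(partial[z][race][0] for z in partial),
           sum(partial[z][race][1] for z in partial))
    for race in ("W", "B")
}

for z, tab in partial.items():
    print(z, {race: round(rate(*tab[race]), 3) for race in tab})
print("marginal", {race: round(rate(*marginal[race]), 3) for race in marginal})
```

Within each victim group the rate is higher for black defendants, while in the collapsed table it is higher for white defendants.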
Slide 113
• Reasons for Simpson's paradox:
Z is related to both X and Y.
1. There are more white victims than black victims.
2. Given Z = white, about 90% of defendants (X) are white.
3. Given Z = black, only about 10% of defendants (X) are
white.
4. More white defendants received the death penalty (X, Y are related).
Slide 114
VI.2 Conditional and marginal odds ratios
• When we have 2 × 2 × K tables for X, Y and Z, at Z = k the observed
table for XY is
            Y
        1      2
X   1   n11k   n12k
    2   n21k   n22k
Then we have K conditional odds ratios that estimate the conditional
associations between X and Y at Z = k:
$$\hat{\theta}_{XY(k)} = \frac{n_{11k}n_{22k}}{n_{12k}n_{21k}}.$$
Slide 115
The marginal XY table is
            Y
        1      2
X   1   n11+   n12+
    2   n21+   n22+
The marginal odds ratio estimates the marginal association between X
and Y:
$$\hat{\theta}_{XY} = \frac{n_{11+}n_{22+}}{n_{12+}n_{21+}}.$$
Slide 116
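• For the death penalty example,
$$\hat{\theta}_{XY} = 1.45$$
$$\hat{\theta}_{XY(1)} = \frac{53 \times 37}{11 \times 414} = 0.43$$
$$\hat{\theta}_{XY(2)} = \frac{0 \times 139}{4 \times 16} = 0$$
$$\hat{\theta}^{mod}_{XY(2)} = \frac{0.5 \times 139.5}{4.5 \times 16.5} = 0.94$$
(the modified estimate adds 0.5 to each cell to handle the zero count).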
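These four odds ratios can be recomputed with a small helper. The `add` argument is a hypothetical parameter name for the constant added to every cell (0.5 in the modified estimate above).

```python
def odds_ratio(n11, n12, n21, n22, add=0.0):
    """Sample odds ratio of a 2x2 table, optionally adding a constant to each cell."""
    # With add=0 this divides by zero if n12 or n21 is 0.
    return ((n11 + add) * (n22 + add)) / ((n12 + add) * (n21 + add))

theta_marginal = odds_ratio(53, 430, 15, 176)         # collapsed over victim's race
theta_white = odds_ratio(53, 414, 11, 37)             # Z = white victims
theta_black = odds_ratio(0, 16, 4, 139)               # Z = black victims
theta_black_mod = odds_ratio(0, 16, 4, 139, add=0.5)  # 0.5 added to each cell

print(round(theta_marginal, 2), round(theta_white, 2),
      round(theta_black, 2), round(theta_black_mod, 2))
```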
Slide 117
VI.3 Conditional and marginal independence
• If X and Y are independent at every level of Z, then X and Y are
called conditionally independent given Z.
If X, Y are 2-level variables, then X and Y are conditionally independent
⇔ θXY(k) = 1, k = 1, 2, ..., K.
• X and Y are marginally independent if they are independent in the marginal XY table (ignoring Z).
If X, Y are 2-level variables, then X and Y are marginally independent ⇔ θXY = 1.
Slide 118
• Example: Conditional independence ⇏ marginal independence.
At Z = 1:
            Y
        S     F
X   A   18    12
    B   12    8
θXY(1) = 1, A = B
At Z = 2:
            Y
        S     F
X   A   2     8
    B   8     32
θXY(2) = 1, A = B
Marginally,
            Y
        S     F
X   A   20    20
    B   20    40
θXY = 2 ⇒ A > B
Slide 119
• Example: Marginal independence ⇏ conditional independence.
At Z = 1:
            Y
        S     F
X   A   4     1
    B   9     6
θXY(1) = 8/3
At Z = 2:
            Y
        S     F
X   A   6     9
    B   1     4
θXY(2) = 8/3
Marginally,
            Y
        S     F
X   A   10    10
    B   10    10
θXY = 1 ⇒ A = B
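Both counterexamples can be verified by collapsing the partial tables and recomputing the odds ratios. The helper names `odds_ratio` and `collapse` in this sketch are hypothetical.

```python
def odds_ratio(table):
    """Odds ratio of a 2x2 table given as [[n11, n12], [n21, n22]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

def collapse(tables):
    """Sum a list of 2x2 partial tables cell by cell to get the marginal table."""
    return [[sum(t[i][j] for t in tables) for j in range(2)] for i in range(2)]

# Example 1: conditionally independent at each Z, but not marginally.
ex1 = [[[18, 12], [12, 8]], [[2, 8], [8, 32]]]
# Example 2: marginally independent, but not conditionally.
ex2 = [[[4, 1], [9, 6]], [[6, 9], [1, 4]]]

for name, tables in (("example 1", ex1), ("example 2", ex2)):
    conditional = [round(odds_ratio(t), 3) for t in tables]
    marginal = collapse(tables)
    print(name, "conditional:", conditional,
          "marginal table:", marginal, "marginal OR:", round(odds_ratio(marginal), 3))
```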
Slide 120
VI.4 Homogeneous association
• Assume X, Y are 2-level variables.
Homogeneous association (in terms of θ) – no interaction
⇕
θXY(1) = θXY(2) = · · · = θXY(K)
When the θXY(k) are not all the same, Z is called an effect modifier (there
is interaction).
• Note: Under homogeneous association, we cannot claim that the marginal odds ratio equals the common conditional one, i.e., that
θXY = θXY(1) = θXY(2) = · · · = θXY(K).
See the previous examples.
Slide 121