CHAPTER 2 ST 544, D. Zhang

2 Contingency Tables

I. Probability Structure of a 2-way Contingency Table

I.1 Contingency tables

• X, Y: categorical variables. Y is usually random (except in a case-control study) and is the response; X can be random or fixed and usually acts like a covariate. X has I levels, Y has J levels.

• A contingency table for X,Y is an I × J table filled with data.

• For example,

                 Y
             1     2     3
    X   1   n11   n12   n13
        2   n21   n22   n23

                 Y
             1     2
    X   1   n11   n12
        2   n21   n22
        3   n31   n32

Slide 40



• For example, from a random sample of n = 1127 Americans, we have the following contingency table:

Table 2.1. Cross classification of Belief in Afterlife by gender

                       Belief in afterlife
                       Yes    No/Undecided
    Gender   Female    509    116
             Male      398    104

• With a contingency table for X,Y , we would like to understand the

association between X and Y , the underlying probability structure of

the table, etc.

• For example, for the afterlife table, we would like to see if one gender

is more likely to believe in afterlife, or the overall proportion with belief

in afterlife in the population, etc.

Slide 41


I.2 Sampling schemes, types of studies, probability structure

• Sampling schemes - ways to get data (tables):

1. Multinomial sampling: From the population, we obtain a random

sample, then cross classify individuals to table cells.

⋆ An example on belief in afterlife from n = 1127 Americans:

Table 2.1. Cross classification of Belief in Afterlife by gender

                       Belief in afterlife
                       Yes    No/Undecided    Total
    Gender   Female    509    116             625
             Male      398    104             502

⋆ This is an example of multinomial sampling.

⋆ A study that uses this sampling method is called a cross-sectional study.

Slide 42


⋆ In general, a 2 × 2 table from multinomial sampling:

                 Y
             1     2
    X   1   n11   n12   n1+
        2   n21   n22   n2+
            n+1   n+2   n

where (n11, n12, n21, n22) are random variables that have a multinomial distribution with sample size n (n = n11 + n12 + n21 + n22) and probabilities

                 Y
             1     2
    X   1   π11   π12
        2   π21   π22

(π11, π12, π21, π22) define the probability structure of the contingency table.

Slide 43

The standard statistical model underlying analysis of contingency tables is to assume that (unconditional on the total count) the cell counts are independent Poisson random variables.

Once you impose a total cell count for the contingency table, or a row or column count, the resulting conditional distributions of the cell counts then become multinomial.

https://stats.stackexchange.com/questions/45479/pearsons-residuals


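⋆ The πij's can be estimated by pij = nij/n.

⋆ With multinomial sampling, we can estimate many relevant quantities:

$$\hat{P}[Y = 1] = \frac{n_{11} + n_{21}}{n} = \frac{n_{+1}}{n}$$

$$\hat{P}[X = 1] = \frac{n_{11} + n_{12}}{n} = \frac{n_{1+}}{n}$$

$$\hat{P}[Y = 1 \mid X = 1] = \frac{n_{11}}{n_{11} + n_{12}} = \frac{n_{11}}{n_{1+}}$$

$$\hat{P}[X = 1 \mid Y = 1] = \frac{n_{11}}{n_{11} + n_{21}} = \frac{n_{11}}{n_{+1}}, \quad \ldots$$

⋆ For the afterlife example, we estimate that

$$\hat{P}[\text{belief in afterlife}] = \frac{509 + 398}{1127} = 80\%$$

$$\hat{P}[\text{belief in afterlife} \mid \text{Female}] = \frac{509}{509 + 116} = 81\%$$

$$\hat{P}[\text{belief in afterlife} \mid \text{Male}] = \frac{398}{398 + 104} = 79\%, \quad \ldots$$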
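The estimates above can be reproduced with a few lines of Python. This is a minimal sketch that uses only the counts of Table 2.1; the variable names (`table`, `joint`, `p_yes`, and so on) are illustrative and not part of the course material.

```python
# Table 2.1 counts: rows = gender (Female, Male),
# columns = belief in afterlife (Yes, No/Undecided)
table = [[509, 116],
         [398, 104]]

n = sum(sum(row) for row in table)               # total sample size
row_totals = [sum(row) for row in table]         # n1+, n2+
col_totals = [sum(col) for col in zip(*table)]   # n+1, n+2

# Joint probability estimates p_ij = n_ij / n
joint = [[nij / n for nij in row] for row in table]

# Marginal estimate of P[belief in afterlife]
p_yes = col_totals[0] / n

# Conditional estimates P[belief in afterlife | gender]
p_yes_female = table[0][0] / row_totals[0]
p_yes_male = table[1][0] / row_totals[1]

print(f"n = {n}")
print(f"P[Yes]          = {p_yes:.3f}")
print(f"P[Yes | Female] = {p_yes_female:.3f}")
print(f"P[Yes | Male]   = {p_yes_male:.3f}")
```

All three kinds of probability (joint, marginal, conditional) are estimable here only because the whole table comes from one random sample, i.e. multinomial sampling.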

Slide 44

Column totals: 907 (Yes), 220 (No/Undecided); total 1,127.

1. Find joint prob; 2. Find marginal prob; 3. Find conditional prob.


2. Product-multinomial sampling on X: For example, in a clinical trial for heart disease, we randomly assign 200 patients to treatment 1 and 100 patients to treatment 2 and may obtain potential data like the following:

                               Y
                   Better   No Change   Worse
    Treatment 1     n11        n12       n13     200
    Treatment 2     n21        n22       n23     100

Here we have

$$(n_{11}, n_{12}, n_{13}) \perp (n_{21}, n_{22}, n_{23})$$

$$(n_{11}, n_{12}, n_{13}) \sim \text{multinomial}(200, (\pi_1, \pi_2, \pi_3)), \quad \pi_1 + \pi_2 + \pi_3 = 1$$

$$(n_{21}, n_{22}, n_{23}) \sim \text{multinomial}(100, (\tau_1, \tau_2, \tau_3)), \quad \tau_1 + \tau_2 + \tau_3 = 1$$

(π1, π2, π3) and (τ1, τ2, τ3) define the probability structure of this contingency table.

Slide 45



⋆ In general, the data look like

                 Y
             1     2     3
    X   1   n11   n12   n13   n1+
        2   n21   n22   n23   n2+

where n1+ and n2+, the sample sizes for X = 1 and X = 2, are fixed.

$$(n_{11}, n_{12}, n_{13}) \perp (n_{21}, n_{22}, n_{23})$$

$$(n_{11}, n_{12}, n_{13}) \sim \text{multinom}(n_{1+}, (\pi_1, \pi_2, \pi_3)), \quad \pi_1 + \pi_2 + \pi_3 = 1$$

$$(n_{21}, n_{22}, n_{23}) \sim \text{multinom}(n_{2+}, (\tau_1, \tau_2, \tau_3)), \quad \tau_1 + \tau_2 + \tau_3 = 1$$

⋆ Since the likelihood of the π's and τ's is the product of the likelihood of the π's and the likelihood of the τ's, this sampling scheme is called product-multinomial sampling on X.

⋆ Clinical trials and cohort studies (prospective studies) all use this sampling scheme.

Slide 46

Prospective study: Participants are enrolled into the study before they develop the disease or outcome in question.



⋆ When X is also random (so it has a distribution in the population),

(π1, π2, π3) defines the conditional distribution of Y given X = 1;

(τ1, τ2, τ3) defines the conditional distribution of Y given X = 2.

⋆ With product-multinomial sampling on X, we can only estimate conditional probabilities of Y | X = x. Other probabilities are not estimable, because the row totals are fixed by design and carry no information about the distribution of X. For example, we cannot estimate P[Y = 1].

Slide 47



3. Product-multinomial sampling on Y:

If Y represents a rare event, then a prospective study is inefficient. For example, if we would like to investigate the association between smoking and lung cancer and conduct a prospective study

                     Lung Cancer
                     Yes    No
    Smoking   Yes    n11    n12    n1+
              No     n21    n22    n2+

then n11 and n21 will be small unless n1+ and n2+ are very large. This will yield an inefficient study.

Slide 48



⋆ We may consider a design such as the following one:

                     Lung Cancer
                     Yes          No
    Smoking   Yes    n11          n12
              No     n21          n22
                     n+1 = 100    n+2 = 200

No cell count will be small ⇒ efficient.

$$n_{11} \perp n_{12}$$

$$n_{11} \sim \text{Bin}(n_{+1}, \pi_1), \quad \pi_1 = P[\text{smoking} \mid \text{case}]$$

$$n_{12} \sim \text{Bin}(n_{+2}, \pi_2), \quad \pi_2 = P[\text{smoking} \mid \text{control}]$$

⋆ We can still investigate the association between smoking and lung cancer using this design.

⋆ This sampling scheme is product-multinomial on Y.

⋆ Such a study is often called a case-control study.

Slide 49



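⋆ In general,

                     Lung Cancer
                     Yes    No
    Smoking   Yes    n11    n12
              No     n21    n22
                     n+1    n+2

where n+1 and n+2 are fixed.

$$n_{11} \perp n_{12}$$

$$n_{11} \sim \text{Bin}(n_{+1}, \pi_1), \quad \pi_1 = P[\text{smoking} \mid \text{case}]$$

$$n_{12} \sim \text{Bin}(n_{+2}, \pi_2), \quad \pi_2 = P[\text{smoking} \mid \text{control}]$$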

Slide 50

$$\hat{\pi}_1 = \frac{n_{11}}{n_{11} + n_{21}}, \qquad \hat{\pi}_2 = \frac{n_{12}}{n_{12} + n_{22}}$$


⋆ Example of a case-control study on MI (Table 2.4)

Table 2.4. Case-Control Study on MI

                         Myocardial Infarction
                         Case    Control
    Ever Smoker   Yes    172     173
                  No      90     346
                         262     519

where 262 is the sample size for MI cases and 519 is the sample size for controls.

⋆ From this study, we cannot estimate quantities such as

P[MI]

P[Ever Smoking]

P[MI | Ever smokers]

P[MI | Never smokers] ...

Slide 51



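• Note: Multinomial sampling ⇒ product-multinomial sampling.

For example, if we have data from multinomial sampling with sample size n:

                 Y                            Y
             1     2                      1     2
    X   1   n11   n12            X   1   π11   π12
        2   n21   n22                2   π21   π22

then we can view the data as coming from product-multinomial sampling on X or from product-multinomial sampling on Y.

That is:

$$n_{11} \mid n_{1+} \sim \text{Bin}\left(n_{1+}, \frac{\pi_{11}}{\pi_{11} + \pi_{12}}\right) \;\perp\; n_{21} \mid n_{2+} \sim \text{Bin}\left(n_{2+}, \frac{\pi_{21}}{\pi_{21} + \pi_{22}}\right)$$

or

$$n_{11} \mid n_{+1} \sim \text{Bin}\left(n_{+1}, \frac{\pi_{11}}{\pi_{11} + \pi_{21}}\right) \;\perp\; n_{12} \mid n_{+2} \sim \text{Bin}\left(n_{+2}, \frac{\pi_{12}}{\pi_{12} + \pi_{22}}\right)$$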

Slide 52



I.3 Sensitivity & Specificity in Diagnostic Tests

• In a diagnostic test, X = true disease status, Y = test result. Then we can form a 2 × 2 table:

                              Y
                     Positive   Negative
    X   Disease
        No Disease

• Using data from multinomial sampling or product-multinomial

sampling on X, we can estimate

Sensitivity = P [Y = Positive|X = Disease] (True positive rate)

Specificity = P [Y = Negative|X = No disease] (True negative rate)

• 1 − Sensitivity = false negative rate, 1 − Specificity = false positive rate.

Sensitivity and specificity tell us how accurate a test/device is.

The manufacturer of a test device usually provides these two measures.

Slide 53

Q: Find sensitivity and specificity.

The higher the sensitivity and specificity, the better the diagnostic test.



• However, a customer (or potential patient) may be more interested in the following quantities:

P[X = Disease | Y = Positive] (PV+)

P[X = No disease | Y = Negative] (PV-)

• An accurate test may not yield high PV+ and/or PV-.

For example, assume a mammogram (for breast cancer, BR) has sensitivity = 0.86 and specificity = 0.88. If P[breast cancer] = 0.01, then

$$\text{PV+} = P[X = \text{BR} \mid Y = +] = \frac{P[X = \text{BR}, Y = +]}{P[Y = +]}$$

$$= \frac{P[Y = + \mid X = \text{BR}]\,P[X = \text{BR}]}{P[Y = + \mid X = \text{BR}]\,P[X = \text{BR}] + P[Y = + \mid X = \text{No BR}]\,P[X = \text{No BR}]}$$

$$= \frac{0.86 \times 0.01}{0.86 \times 0.01 + (1 - 0.88) \times (1 - 0.01)} = 6.8\%$$

Similarly, PV- = 99.8% (without the test, P[No BR] = 0.99).
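The Bayes-rule calculation above can be wrapped in a small helper. This is a sketch only; the function name `predictive_values` and its argument names are hypothetical, chosen here for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PV+, PV-) computed from Bayes' rule."""
    # P[Y = +] by the law of total probability
    p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    p_neg = 1 - p_pos
    pv_pos = sensitivity * prevalence / p_pos
    pv_neg = specificity * (1 - prevalence) / p_neg
    return pv_pos, pv_neg

# Mammogram example from the slide
pv_pos, pv_neg = predictive_values(0.86, 0.88, 0.01)
print(f"PV+ = {pv_pos:.3f}, PV- = {pv_neg:.3f}")
```

Re-running the function with a larger prevalence shows how strongly PV+ depends on how common the disease is, even when sensitivity and specificity stay fixed.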

Slide 54

Positive Predictive Value (PV+) is the probability of disease in an individual with a positive test result. Negative Predictive Value (PV - ) is the probability of not having the disease when the test result is negative.



I.4 Independence of X and Y

• X and Y are random with the underlying probability structure

                     Y
             1     2    ...    J
    X   1   π11   π12   ...   π1J
        2   π21   π22   ...   π2J
        ...
        I   πI1   πI2   ...   πIJ

• X ⊥ Y

⇔ P[X = i, Y = j] = P[X = i] P[Y = j] for i = 1, 2, ..., I, j = 1, 2, ..., J

⇔ πij = πi+ π+j for i = 1, 2, ..., I, j = 1, 2, ..., J (πi+ = πi1 + πi2 + ... + πiJ, π+j = π1j + π2j + ... + πIj)

⇔ P[Y = j | X = i] = P[Y = j | X = k] for all i, j, k.

Slide 55



• When X and Y are random 2-level categorical variables, the underlying probability structure is

                 Y
             1     2
    X   1   π11   π12
        2   π21   π22

• X ⊥ Y ⇔ πij = πi+ π+j for i, j = 1, 2 (πi+ = πi1 + πi2, π+j = π1j + π2j)

We only need one of them, e.g. π11 = π1+ π+1

⇔ P[Y = 1 | X = 1] = P[Y = 1 | X = 2], i.e.

$$\pi_1 = \frac{\pi_{11}}{\pi_{1+}} = \frac{\pi_{21}}{\pi_{2+}} = \pi_2$$

Slide 56

Note that


II Comparing Proportions in 2× 2 Tables

II.1 Difference of proportions

• Given data from multinomial sampling or product-multinomial sampling on X

                 Y
             1     2
    X   1   n11   n12   n1+
        2   n21   n22   n2+

we would like to make inference on π1 − π2, where π1 = P[Y = 1 | X = 1] is the success probability for row 1 and π2 = P[Y = 1 | X = 2] is the success probability for row 2.

• X ⊥ Y ⇔ π1 − π2 = 0.

Slide 57



1. Estimate of π1 − π2:

$$p_1 - p_2 = \frac{n_{11}}{n_{1+}} - \frac{n_{21}}{n_{2+}}.$$

2. Estimated SE (standard error) of p1 − p2:

$$SE(p_1 - p_2) = \sqrt{\frac{p_1(1 - p_1)}{n_{1+}} + \frac{p_2(1 - p_2)}{n_{2+}}}$$

3. Large-sample (1 − α) CI for π1 − π2:

$$p_1 - p_2 \pm z_{\alpha/2}\, SE(p_1 - p_2).$$

If this CI does not contain 0, we can reject H0 : X ⊥ Y at significance level α.

Slide 58

Recall: the critical value for α = 0.05 is zα/2 = qnorm(0.975) = 1.959964 (in R).


• Example: Aspirin and heart attack.

In a 5-yr study, 22,000+ physicians were randomized (blinded) to the placebo/aspirin (one tablet every other day) group:

                          Myocardial infarction
                          Yes     No
    Treatment   Placebo   189     10,845    11,034
                Aspirin   104     10,933    11,037

1. Difference of MI probabilities between placebo and aspirin groups:

p1 − p2 = 189/11034 − 104/11037 = 0.0171 − 0.0094 = 0.0077.

2. $SE = \sqrt{0.0171(1 - 0.0171)/11034 + 0.0094(1 - 0.0094)/11037} = 0.0015.$

3. Large-sample 95% CI for the difference of MI probabilities:

0.0077 ± 1.96 × 0.0015 = [0.0048, 0.0106].

⇒ Physicians in the placebo group are more likely to develop MI.

Slide 59
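The three steps (estimate, SE, confidence interval) can be sketched in Python as follows. The helper name `diff_prop_ci` is hypothetical. Because the code carries full precision rather than the rounded values 0.0077 and 0.0015, its interval endpoints can differ from the slide's in the fourth decimal place.

```python
import math

def diff_prop_ci(n11, n1_plus, n21, n2_plus, z=1.959964):
    """Large-sample (Wald) CI for pi1 - pi2 from two independent binomials."""
    p1 = n11 / n1_plus
    p2 = n21 / n2_plus
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1_plus + p2 * (1 - p2) / n2_plus)
    return diff, se, (diff - z * se, diff + z * se)

# Aspirin study: MI counts and group sizes for placebo and aspirin
diff, se, (lower, upper) = diff_prop_ci(189, 11034, 104, 11037)
print(f"p1 - p2 = {diff:.4f}, SE = {se:.4f}")
print(f"95% CI  = [{lower:.4f}, {upper:.4f}]")
```

The interval excludes 0, which matches the conclusion that H0 : X ⊥ Y is rejected at the 5% level.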



II.2 Relative Risk

• When both π1 and π2 are close to zero (rare event), the difference

π1 − π2 may not be very meaningful.

For example,

Case 1: π1 = 0.01, π2 = 0.001⇒ π1 − π2 = 0.009

Case 2: π1 = 0.41, π2 = 0.401⇒ π1 − π2 = 0.009

The above cases have the same difference π1 − π2. However, the

meanings are totally different.

• For rare events, a more relevant measure of the difference is the relative risk (RR):

$$RR = \frac{\pi_1}{\pi_2}.$$

Slide 60

For example: (a) RR = 0.01/0.001 = 10; (b) RR = 0.41/0.401 = 1.022444.


• Properties of the relative risk (RR):

1. 0 < RR < ∞

2. π1 > π2 ⇔ RR > 1;
   π1 = π2 ⇔ RR = 1;
   π1 < π2 ⇔ RR < 1.

3. X ⊥ Y ⇔ RR = 1.

• Estimate of RR: Given the 2 × 2 table from multinomial sampling or product-multinomial sampling on X, RR can be estimated by

$$\widehat{RR} = \frac{p_1}{p_2}.$$

Slide 61



• RR also has a nice interpretation. For the Aspirin Study, the RR estimate is

$$\widehat{RR} = \frac{p_1}{p_2} = \frac{0.0171}{0.0094} = 1.82.$$

⇒ Physicians receiving the placebo are 82% more likely to develop MI (over 5 yrs) than physicians receiving aspirin.

• The SE and CI for RR are complicated. Proc Freq calculates a CI for RR and other measures:

    data table2_3;
      input group $ mi $ count @@;
      datalines;
    placebo yes 189 placebo no 10845
    aspirin yes 104 aspirin no 10933
    ;

    title "Analysis of MI data";
    proc freq data=table2_3 order=data;
      weight count;
      tables group*mi / norow nocol nopercent or;
    run;

Slide 62



Output from the above SAS program:

    The FREQ Procedure

    Table of group by mi

    group     mi
    Frequency|yes     |no      |  Total
    ---------+--------+--------+
    placebo  |    189 |  10845 |  11034
    ---------+--------+--------+
    aspirin  |    104 |  10933 |  11037
    ---------+--------+--------+
    Total         293    21778    22071

    Statistics for Table of group by mi
    Odds Ratio and Relative Risks

    Statistic                    Value    95% Confidence Limits
    ------------------------------------------------------------------
    Odds Ratio                   1.8321     1.4400     2.3308
    Relative Risk (Column 1)     1.8178     1.4330     2.3059
    Relative Risk (Column 2)     0.9922     0.9892     0.9953

    Sample Size = 22071

A 95% CI for RR is [1.43, 2.31]. We are 95% confident that physicians receiving the placebo are at least 43% and at most 131% more likely to develop MI (over 5 yrs) than physicians receiving aspirin.

Slide 63

The sample relative risk has a sampling distribution that is highly skewed unless the sample sizes are quite large. Because of this, its confidence interval formula is rather complex.
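Because the sampling distribution of the sample RR is skewed, a common approach is to build the interval on the log scale and exponentiate. The sketch below assumes the usual large-sample standard error of log(RR), sqrt(1/n11 − 1/n1+ + 1/n21 − 1/n2+); this formula is not stated on the slides, and the function name `relative_risk_ci` is hypothetical.

```python
import math

def relative_risk_ci(n11, n1_plus, n21, n2_plus, z=1.959964):
    """Estimate RR = p1/p2 and a large-sample CI built on the log scale."""
    p1 = n11 / n1_plus
    p2 = n21 / n2_plus
    rr = p1 / p2
    # Assumed large-sample SE of log(RR); not given in the slides
    se_log = math.sqrt(1 / n11 - 1 / n1_plus + 1 / n21 - 1 / n2_plus)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Aspirin study (placebo row first, MI = yes in column 1)
rr, lower, upper = relative_risk_ci(189, 11034, 104, 11037)
print(f"RR = {rr:.4f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

Under these assumptions the result should agree closely with the "Relative Risk (Column 1)" row of the Proc Freq output.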



II.3 Odds Ratio

• Odds of a probability π (of an event): if π = P(A), then

$$\omega = \frac{\pi}{1 - \pi} = \frac{\text{success prob}}{\text{failure prob}}$$

is called the odds of π (or of the event A). 0 < ω < ∞.

For example, if π = 0.75, then ω = 0.75/(1 − 0.75) = 3.

For a rare event (π ≈ 0), π ≈ ω.

• The event probability π is related to the odds ω by

$$\pi = \frac{\omega}{1 + \omega}.$$

For example, if ω = 4, then π = 4/(1 + 4) = 0.8.
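The two conversions are inverses of each other, which a short Python sketch makes concrete. The function names `odds` and `prob_from_odds` are illustrative only.

```python
def odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1 - p)

def prob_from_odds(w):
    """Probability of an event with odds w (requires w >= 0)."""
    return w / (1 + w)

print(odds(0.75))            # odds for pi = 0.75
print(prob_from_odds(4))     # probability for omega = 4
print(odds(0.001))           # for a rare event, odds are close to pi
```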

Slide 64

When odds = 3.0, we expect to observe three successes for every one failure



• For the 2 × 2 table

                 Y
             1     2
    X   1
        2

the odds ratio between row 1 (π1 = P[Y = 1 | X = 1]) and row 2 (π2 = P[Y = 1 | X = 2]) is defined as

$$\theta = \frac{\text{odds}_1}{\text{odds}_2} = \frac{\pi_1/(1 - \pi_1)}{\pi_2/(1 - \pi_2)}.$$

• Properties of the odds ratio:

1. 0 < θ < ∞.

2. π1 > π2 ⇔ θ > 1; π1 = π2 ⇔ θ = 1; π1 < π2 ⇔ θ < 1.

3. X ⊥ Y ⇔ θ = 1.

Slide 65

Values of θ farther from 1.0 in a given direction represent a stronger association.

When θ = 0.25, for example, the odds of success in row 1 are 0.25 times the odds of success in row 2, or equivalently 1/0.25 = 4.0 times as high in row 2 as in row 1.



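• Given the 2 × 2 table from multinomial sampling or product-multinomial sampling on X:

                 Y
             1     2
    X   1   n11   n12   n1+
        2   n21   n22   n2+

the odds ratio θ can be estimated by

$$\hat{\theta} = \frac{p_1/(1 - p_1)}{p_2/(1 - p_2)} = \frac{(n_{11}/n_{1+})/(1 - n_{11}/n_{1+})}{(n_{21}/n_{2+})/(1 - n_{21}/n_{2+})} = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{n_{11} n_{22}}{n_{12} n_{21}}.$$

• var(log θ̂) can be estimated by

$$\widehat{\text{var}}(\log \hat{\theta}) = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}.$$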

Slide 66

Q: 95% CI for θ



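• We can construct a (1 − α) CI for the true θ as follows:

1. Get a (1 − α) CI for log(θ):

$$\log \hat{\theta} \pm z_{\alpha/2}\, SE(\log \hat{\theta}).$$

2. Exponentiate both ends to get the CI for θ.

• For the Aspirin Study,

$$\hat{\theta} = \frac{189 \times 10933}{10845 \times 104} = 1.8321 \; (\approx \widehat{RR})$$

$$\widehat{\text{var}}(\log \hat{\theta}) = \frac{1}{189} + \frac{1}{10845} + \frac{1}{104} + \frac{1}{10933} = 0.01509$$

95% CI for log θ: $\log(1.8321) \pm 1.96\sqrt{0.01509} = [0.3647, 0.8462]$.

95% CI for θ: $[e^{0.3647}, e^{0.8462}] = [1.44, 2.33]$.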
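The two-step recipe (CI on the log scale, then exponentiate) is easy to check numerically. The following is a minimal Python sketch; the function name `odds_ratio_ci` is hypothetical.

```python
import math

def odds_ratio_ci(n11, n12, n21, n22, z=1.959964):
    """Sample odds ratio and large-sample CI via the log scale."""
    theta = (n11 * n22) / (n12 * n21)
    # Estimated SE of log(theta-hat): sqrt of the sum of reciprocal cell counts
    se_log = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    log_lower = math.log(theta) - z * se_log
    log_upper = math.log(theta) + z * se_log
    return theta, math.exp(log_lower), math.exp(log_upper)

# Aspirin study: rows = placebo, aspirin; columns = MI yes, MI no
theta, lower, upper = odds_ratio_ci(189, 10845, 104, 10933)
print(f"theta-hat = {theta:.4f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

The interval is not symmetric around the point estimate, because symmetry holds only on the log scale.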

Slide 67

Recall:

The estimated odds of MI for those taking placebo equal 1.83 times the estimated odds for those taking aspirin. The estimated odds were 83% higher for the placebo group.

We estimate that the odds of MI are at least 44% higher when taking placebo than when taking aspirin.


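• Note 1: If we have multinomial sampling:

                 Y                            Y
             1     2                      1     2
    X   1   n11   n12            X   1   π11   π12
        2   n21   n22                2   π21   π22

the odds ratio θ can also be defined as

$$\theta = \frac{\pi_{11} \pi_{22}}{\pi_{12} \pi_{21}}.$$

The MLEs of the πij's are $\hat{\pi}_{ij} = n_{ij}/n$ ⇒ the same estimate of θ:

$$\hat{\theta} = \frac{\hat{\pi}_{11} \hat{\pi}_{22}}{\hat{\pi}_{12} \hat{\pi}_{21}} = \frac{n_{11} n_{22}}{n_{12} n_{21}}.$$

• Note 2: If some of the nij's are small, add 0.5 to each cell and then re-calculate θ̂ and var(log θ̂), e.g.

$$\hat{\theta} = \frac{(n_{11} + 0.5)(n_{22} + 0.5)}{(n_{12} + 0.5)(n_{21} + 0.5)}$$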

Slide 68



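• The relationship between θ and RR:

$$\theta = \frac{\pi_1/(1 - \pi_1)}{\pi_2/(1 - \pi_2)} = \frac{\pi_1}{\pi_2} \times \frac{1 - \pi_2}{1 - \pi_1} = RR \times \frac{1 - \pi_2}{1 - \pi_1}$$

1. RR = 1 ⇔ θ = 1 ⇔ X ⊥ Y.

2. π1 > π2 ⇔ θ > RR > 1.

3. π1 < π2 ⇔ θ < RR < 1.

4. When π1 ≈ 0 and π2 ≈ 0 (rare events), θ ≈ RR.

Ordering on the number line: 0 < θ < RR < 1 when π1 < π2, and 1 < RR < θ when π1 > π2.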

Slide 69



• The odds ratio for case-control studies:

⋆ For the MI study (page 32)

Table 2.4. Case-Control Study on MI

                         Myocardial Infarction
                         Case    Control
    Ever Smoker   Yes    172     173
                  No      90     346
                         262     519

we know that we cannot estimate π1 = P[MI | Ever smokers] and π2 = P[MI | Never smokers], and hence cannot estimate RR = π1/π2.

⋆ However, we still want to assess the association between smoking and MI.

Slide 70

τ1 = P [Ever smoking|MI Case] τ2 = P [Ever smoking|MI Control]



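⋆ From the design, we can estimate

τ1 = P[Ever smoking | MI Case]: $\hat{\tau}_1 = 172/262 = 0.6565$

τ2 = P[Ever smoking | MI Control]: $\hat{\tau}_2 = 173/519 = 0.3333$

and the odds ratio between τ1 and τ2:

$$\theta^* = \frac{\tau_1/(1 - \tau_1)}{\tau_2/(1 - \tau_2)}: \qquad \hat{\theta}^* = \frac{\hat{\tau}_1/(1 - \hat{\tau}_1)}{\hat{\tau}_2/(1 - \hat{\tau}_2)} = \frac{n_{11} n_{22}}{n_{12} n_{21}} = 3.82.$$

⋆ It can be shown that

$$\theta^* = \frac{\pi_1/(1 - \pi_1)}{\pi_2/(1 - \pi_2)} = \theta$$

So we can use a case-control study to make inference on θ!

⋆ The formula for var(log θ̂) is the same:

$$\widehat{\text{var}}(\log \hat{\theta}) = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}.$$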

Slide 71



⋆ Therefore, for the MI case-control study, the odds ratio of developing MI between ever smokers and never smokers is estimated as

$$\hat{\theta} = 3.82.$$

$$\widehat{\text{var}}(\log \hat{\theta}) = \frac{1}{172} + \frac{1}{173} + \frac{1}{90} + \frac{1}{346} = 0.0256.$$

95% CI for log θ:

$$\log(3.82) \pm 1.96 \times \sqrt{0.0256} = [1.02665, 1.65385]$$

95% CI for θ: $[e^{1.02665}, e^{1.65385}] = [2.79, 5.227]$.

• Since MI is a rare event, RR ≈ θ, so

RR ≈ 3.82 ≈ 4.

That is, ever smokers are about 4 times as likely to develop MI as never smokers.

Slide 72

We estimate that the odds of MI are at least 179% higher for ever smokers than for never smokers.
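A short Python sketch can confirm that the odds ratio computed from the estimable quantities τ1 and τ2 equals the cross-product ratio, and reproduce the interval. Variable names are illustrative; since the code does not round intermediate values, its endpoints can differ slightly from the slide's.

```python
import math

# Table 2.4: rows = ever smoker (yes, no); columns = MI (case, control)
n11, n12 = 172, 173
n21, n22 = 90, 346

# Estimable conditional probabilities P[ever smoker | case / control]
tau1 = n11 / (n11 + n21)
tau2 = n12 / (n12 + n22)

# Odds ratio from the tau's and from the cross-product ratio
theta_from_tau = (tau1 / (1 - tau1)) / (tau2 / (1 - tau2))
theta_cross = (n11 * n22) / (n12 * n21)

# Large-sample 95% CI via the log scale
se_log = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
z = 1.959964
lower = math.exp(math.log(theta_cross) - z * se_log)
upper = math.exp(math.log(theta_cross) + z * se_log)

print(f"tau1 = {tau1:.4f}, tau2 = {tau2:.4f}")
print(f"theta-hat = {theta_cross:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```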



III χ2 Test for Independence between X and Y (nominal)

Suppose X and Y are random and have the probability structure:

                     Y
             1     2    ...    J
    X   1   π11   π12   ...   π1J
        2   π21   π22   ...   π2J
        ...
        I   πI1   πI2   ...   πIJ

Given data {nij} from multinomial sampling, we would like to test H0 : πij = πij(θ), for i = 1, ..., I and j = 1, ..., J, where θ is a parameter vector with dim(θ) = k.

If dim(θ) = 0, then the πij's are completely known under H0.

Slide 73

https://academo.org/demos/dice-roll-statistics/

Consider the null hypothesis (H0) that cell probabilities in a two-way contingency table equal certain fixed values {πij}. For a sample of size n with cell counts {nij}, the values {μij = nπij} are called expected frequencies. They represent the expected values {E(nij)} when H0 is true. To judge whether the data contradict H0, we compare {nij} to {μij}. If H0 is true, nij should be close to μij in each cell.



III.1 General Pearson χ2 test and LRT

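• Let $\hat{\theta}$ be the MLE of θ under H0, and let $\hat{\mu}_{ij} = n\pi_{ij}(\hat{\theta})$, where n = n++.

• If H0 is true and n is large enough that the $\hat{\mu}_{ij}$'s are reasonably large ($\hat{\mu}_{ij} \ge 5$), then the Pearson statistic

$$\chi^2 = \sum_{\text{all cells}} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} \overset{H_0}{\sim} \chi^2_{df}$$

where df = IJ − 1 − dim(θ).

Reject H0 at level α if $\chi^2 \ge \chi^2_{df,\alpha}$.

• LRT:

$$G^2 = 2 \sum_{\text{all cells}} n_{ij} \log\left(\frac{n_{ij}}{\hat{\mu}_{ij}}\right) \overset{H_0}{\sim} \chi^2_{df}.$$

• Calculation of df:

df = [# of unknown parameters under H1 ∪ H0] − [# of unknown parameters under H0].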

Slide 74

For testing independence in r × c contingency tables, the approximate chi-squared sampling distributions of X2 and G2 have df = (r − 1)(c − 1).

The df value means: under H0, {πi+} and {π+j} determine the cell probabilities. There are r − 1 non-redundant row probabilities: because they sum to 1, the first r − 1 determine the last one through πr+ = 1 − (π1+ + · · · + πr−1,+). Similarly, there are c − 1 non-redundant column probabilities, so under H0 there are (r − 1) + (c − 1) parameters. The alternative hypothesis Ha states that there is not independence but does not specify a pattern for the rc cell probabilities. The probabilities are then solely constrained to sum to 1, so there are rc − 1 non-redundant parameters. The value for df is the difference between the number of parameters under (Ha and H0) and under (H0), or df = (rc − 1) − [(r − 1) + (c − 1)] = rc − r − c + 1 = (r − 1)(c − 1).



[Figure: Some χ2 distributions]

Slide 75


III.2 Test of independence

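• X ⊥ Y ⇔ H0 : πij = πi+ π+j, i = 1, ..., I, j = 1, ..., J

• The MLEs of the πi+'s and π+j's are

$$\hat{\pi}_{i+} = \frac{n_{i+}}{n}, \qquad \hat{\pi}_{+j} = \frac{n_{+j}}{n}$$

• $\hat{\mu}_{ij}$ is equal to

$$\hat{\mu}_{ij} = n \hat{\pi}_{i+} \hat{\pi}_{+j} = \frac{n_{i+} n_{+j}}{n}$$

• Pearson χ2 and LRT:

$$\chi^2 = \sum_{\text{all cells}} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}}, \qquad G^2 = 2 \sum_{\text{all cells}} n_{ij} \log\left(\frac{n_{ij}}{\hat{\mu}_{ij}}\right) \overset{H_0}{\sim} \chi^2_{df}$$

df = IJ − 1 − (I − 1 + J − 1) = (I − 1)(J − 1).

Reject H0 : X ⊥ Y if χ2 or G2 ≥ $\chi^2_{df,\alpha}$.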

Slide 76

Note: For both test statistics, larger values provide stronger evidence against H0.

For both test statistics: p-value = 1 - pchisq(X2, df)

Q: Find X2 and G2, and then find the p-values.


• Note: With data {nij}’s from a multinomial sampling or

product-multinomial sampling on X, we can test H0 : X ⊥ Y by

testing

H0 : P [Y = j|X = i] = P [Y = j|X = k] for all i, j, k

(cond. dist. of Y given X is the same across all levels of X)

It can be shown that the Pearson χ2 and LRT test statistics are the same, with the same null distribution $\chi^2_{(I-1)(J-1)}$.

Slide 77



• Example: Gender gap in party identification

                                 Y – Party Identification
                          Democrat   Independent   Republican   Total
    X – Gender   Female      762         327           468       1557
                 Male        484         239           477       1200
                            1246         566           945       n = 2757

Then $\hat{\mu}_{11}$ = 1557 × 1246/2757 = 703.7, $\hat{\mu}_{12}$ = 1557 × 566/2757 = 319.6, etc.

$$\Rightarrow \chi^2 = \frac{(762 - 703.7)^2}{703.7} + \frac{(327 - 319.6)^2}{319.6} + \cdots = 30.1$$

$$G^2 = 2\left(762 \log(762/703.7) + 327 \log(327/319.6) + \cdots\right) = 30.0$$

$$\chi^2_{2,\,0.05} = 5.99$$

Both the Pearson test and the LRT reject H0 : X ⊥ Y at level 0.05.

Note: χ2 ≈ G2 even if H0 is likely not true.
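The expected counts and both test statistics for this example can be computed directly. The sketch below uses only the Python standard library; for the p-value it relies on the fact that a χ2 variable with exactly 2 degrees of freedom has survival function exp(−x/2), so that shortcut is specific to this table's df and is not a general-purpose routine.

```python
import math

# Party identification table: rows = Female, Male; columns = Dem, Ind, Rep
table = [[762, 327, 468],
         [484, 239, 477]]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Expected counts under independence: mu_ij = n_i+ * n_+j / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

cells = [(table[i][j], expected[i][j])
         for i in range(len(table)) for j in range(len(table[0]))]

x2 = sum((obs - mu) ** 2 / mu for obs, mu in cells)        # Pearson statistic
g2 = 2 * sum(obs * math.log(obs / mu) for obs, mu in cells)  # LRT statistic
df = (len(table) - 1) * (len(table[0]) - 1)

# Survival function of chi-square with df = 2 only: P[X >= x] = exp(-x/2)
p_x2 = math.exp(-x2 / 2)
p_g2 = math.exp(-g2 / 2)

print(f"X2 = {x2:.4f}, G2 = {g2:.4f}, df = {df}")
print(f"p-values: {p_x2:.2e} (X2), {p_g2:.2e} (G2)")
```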

Slide 78

This evidence of association would be rather unusual if the variables were truly independent. Both test statistics suggest that political party ID and gender are associated.

See Chap2 R codes for details


• SAS program for the example:

    data table2_5;
      input gender $ party $ count @@;
      datalines;
    female dem 762 female ind 327 female rep 468
    male dem 484 male ind 239 male rep 477
    ;

    title "Analysis of Party Identification data";
    proc freq data=table2_5 order=data;
      weight count;
      tables gender*party / norow nocol nopercent chisq expected measures cmh;
    run;

• Output from the above program:

    Analysis of Party Identification data

    The FREQ Procedure

    Table of gender by party

    gender    party
    Frequency|
    Expected |dem     |ind     |rep     |  Total
    ---------+--------+--------+--------+
    female   |    762 |    327 |    468 |   1557
             | 703.67 | 319.65 | 533.68 |
    ---------+--------+--------+--------+
    male     |    484 |    239 |    477 |   1200
             | 542.33 | 246.35 | 411.32 |
    ---------+--------+--------+--------+
    Total        1246      566      945     2757

Slide 79


    Statistics for Table of gender by party

    Statistic                     DF       Value      Prob
    ------------------------------------------------------
    Chi-Square                     2     30.0701    <.0001
    Likelihood Ratio Chi-Square    2     30.0167    <.0001
    Mantel-Haenszel Chi-Square     1     28.9797    <.0001
    Phi Coefficient                       0.1044
    Contingency Coefficient               0.1039
    Cramer's V                            0.1044

    Sample Size = 2757

    Statistic                              Value       ASE
    ------------------------------------------------------
    Gamma                                 0.1710    0.0315
    Kendall's Tau-b                       0.0964    0.0180
    Stuart's Tau-c                        0.1078    0.0202

    Somers' D C|R                         0.1097    0.0205
    Somers' D R|C                         0.0848    0.0158

    Pearson Correlation                   0.1025    0.0190
    Spearman Correlation                  0.1016    0.0190

    Summary Statistics for gender by party

    Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

    Statistic    Alternative Hypothesis    DF       Value      Prob
    ---------------------------------------------------------------
        1        Nonzero Correlation        1     28.9797    <.0001
        2        Row Mean Scores Differ     1     28.9797    <.0001
        3        General Association        2     30.0592    <.0001

Slide 80



III.3 Cell residuals for a contingency table

• Under H0 : X ⊥ Y,

$$\hat{\mu}_{ij} = \frac{n_{i+} n_{+j}}{n}.$$

• Calculate the standardized Pearson residuals:

$$e^{st}_{ij} = \frac{n_{ij} - \hat{\mu}_{ij}}{\sqrt{\hat{\mu}_{ij}(1 - p_{i+})(1 - p_{+j})}}.$$

• Under H0 : X ⊥ Y, $E(e^{st}_{ij}) \approx 0$, $\text{var}(e^{st}_{ij}) \approx 1$, and $e^{st}_{ij}$ behaves like a N(0, 1) variable.

• We can use $e^{st}_{ij}$ to check the departure from H0 : X ⊥ Y.

• For the Party Identification example, p1+ = 1557/2757 = 0.565, p+1 = 1246/2757 = 0.452

$$\Rightarrow e^{st}_{11} = \frac{762 - 703.7}{\sqrt{703.7(1 - 0.565)(1 - 0.452)}} = 4.50$$
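The same formula can be applied to every cell at once. The following Python sketch (names such as `standardized_residual` are illustrative) computes the full table of standardized Pearson residuals for the Party Identification data.

```python
import math

# Party identification table: rows = Female, Male; columns = Dem, Ind, Rep
table = [[762, 327, 468],
         [484, 239, 477]]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

def standardized_residual(i, j):
    """Standardized Pearson residual of cell (i, j) under independence."""
    mu = row_totals[i] * col_totals[j] / n      # expected count
    p_row = row_totals[i] / n                   # p_i+
    p_col = col_totals[j] / n                   # p_+j
    return (table[i][j] - mu) / math.sqrt(mu * (1 - p_row) * (1 - p_col))

residuals = [[standardized_residual(i, j) for j in range(3)]
             for i in range(2)]

for label, row in zip(("Female", "Male"), residuals):
    print(label.ljust(8), "  ".join(f"{e:6.2f}" for e in row))
```

In a table with two rows, the residuals in each column are equal in size and opposite in sign, so only one row carries independent information.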

Slide 81

P-value=2*pnorm(-4.5) = 6.795346e-06

Under H0, we expect about 5% of the standardized residuals to be farther from 0 than ±2 by chance alone.

Q: Find est12



• We can use Proc Genmod of SAS to get the standardized Pearson residuals:

    proc genmod order=data;
      class gender party;
      model count = gender party / dist=poisson link=log residuals;
    run;

• Part of the output:

                                                           Std         Std
                      Raw     Pearson    Deviance     Deviance     Pearson   Likelihood
    Observation  Residual    Residual    Residual     Residual    Residual     Residual
    1           58.328618   2.1988558   2.1694814    4.4419109   4.5020535    4.4877799
    2           7.3547334   0.4113702   0.4098076    0.6967948   0.6994517    0.6985339
    3           -65.68335    -2.84324   -2.904774    -5.430995   -5.315946     -5.34911
    4           -58.32862   -2.504669   -2.551707    -4.586602   -4.502054    -4.528391
    5           -7.354733   -0.468583   -0.470944    -0.702976   -0.699452    -0.701036
    6           65.683351   3.2386734    3.157751    5.1831197   5.3159455    5.2670354

The observations are ordered by row: all cells of row 1 first, then row 2, etc.

Slide 82



• Put the standardized Pearson residuals in the original table:

                                 Y – Party Identification
                          Democrat   Independent   Republican
    X – Gender   Female      4.5         0.7          -5.3
                 Male       -4.5        -0.7           5.3

We see from the table that the independence model does not fit the data well. There are significantly more Democrat females (and fewer Democrat males) than predicted by the independence model, and significantly fewer Republican females (and more Republican males) than predicted by the model.

Slide 83



IV Testing Independence for Ordinal Data

IV.1 X,Y are both ordinal random cat. variables; Mantel-Haenszel M2 (CMH1)

• Assign scores u1 < u2 < · · · < uI to X and v1 < v2 < · · · < vJ to Y

                        Y
               1(v1)   j(vj)   J(vJ)
        1(u1)
    X   i(ui)           πij
        I(uI)

• Want to test H0 : X ⊥ Y given data such as

Slide 84

Let u1 ≤ u2 ≤ · · · ≤ ur denote scores for the rows, and v1 ≤ v2 ≤ · · · ≤ vc denote scores for the columns, having the same ordering as the categories.



                 Y
             v1   v2   v3
        u1    2    1    3
    X   u2    1    2    1
        u3    1    1    2

⇒

    Patient    X     Y
       1       u1    v1
       2       u1    v1
       3       u1    v2
       4       u1    v3
       5       u1    v3
       6       u1    v3
       7       u2    v1
       8       u2    v2
       9       u2    v2
      10       u2    v3
      11       u3    v1
      12       u3    v2
      13       u3    v3
      14       u3    v3

Slide 85

Q: Find X-bar and Y-bar.



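• The Pearson correlation coefficient describes the linear relationship between X and Y and can be used to test H0 : X ⊥ Y:

$$r = \frac{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \cdot \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2}},$$

where

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{n} \sum_{i=1}^{I} n_{i+} u_i = \sum_{i=1}^{I} p_{i+} u_i = \bar{u}$$

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i = \frac{1}{n} \sum_{j=1}^{J} n_{+j} v_j = \sum_{j=1}^{J} p_{+j} v_j = \bar{v}$$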

The correlation falls between −1 and +1. Independence between the variables implies that its population value ρ = 0. The larger the value of |r|, the farther the data fall from independence in the linear dimension.


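⇒

$$r = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} p_{ij} (u_i - \bar{u})(v_j - \bar{v})}{\sqrt{\sum_{i=1}^{I} p_{i+} (u_i - \bar{u})^2 \, \sum_{j=1}^{J} p_{+j} (v_j - \bar{v})^2}}$$

• It can be shown that under H0 : X ⊥ Y

$$\sqrt{n - 1}\; r \overset{a}{\sim} N(0, 1), \qquad M^2 = (n - 1)\, r^2 \overset{a}{\sim} \chi^2_1$$

This is the Mantel-Haenszel test for H0 : X ⊥ Y (cmh1 in SAS).

• Note: We don't have to expand the data to calculate r. Proc Freq calculates r and M2.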

Slide 87



• How to choose scores {ui} for X and {vj} for Y:

1. Any increasing/decreasing sequence is OK for {ui} and {vj}. The scores have to be chosen before analyzing the data.

2. Mid-ranks. For example,

                  Y
              1     2     3     ni+    ui
         1    2     1     3      6     3.5
    X    2    1     2     1      4     8.5
         3    1     1     2      4    12.5
       n+j    4     4     6
        vj   2.5   6.5   11.5

    proc freq order=data;
      tables x*y / cmh1 scores=rank;
    run;

3. The default in SAS is "1, 2, · · · , I" for X and "1, 2, · · · , J" for Y.

Slide 88



• Note 1: M2 only detects a "linear trend" between X and Y, whereas the Pearson χ2 and the LRT G2 detect any deviation from independence.

• Note 2: Proc corr of SAS uses (as the default)

$$t = (n - 2)^{1/2} \left(\frac{r^2}{1 - r^2}\right)^{1/2}$$

to test H0 : ρ = 0 by comparing t to $t_{n-2}$. M2 and t2 are asymptotically equivalent under H0.

• From slide 80, M2 = 28.98 using 1,2 for gender and 1,2,3 for party identification. Reject H0 : X ⊥ Y.

• Note 3: M2 is for a 2-sided test. We can use $\sqrt{n - 1}\, r$ for a one-sided test.

From slide 80, $\sqrt{n - 1}\, r = \sqrt{28.98} = 5.4$ ⇒ reject H0 : X ⊥ Y in favor of H1 : ρ > 0 (even if r = 0.1).

Slide 89



• Example: Mother's alcohol consumption and infant malformation (Table 2.7 on p. 42)

    Alcohol                  Malformation
    Consumption    Present (Y = 1)    Absent (Y = 0)
    0                    48               17,066
    < 1                  38               14,464
    1−2                   5                  788
    3−5                   1                  126
    ≥ 6                   1                   37

χ2 = 12.1 (p-value = 0.017), G2 = 6.2 (p-value = 0.185) ⇒ mixed results.

Assigned scores for alcohol consumption: 0, 0.5, 1.5, 4, 7, and 0/1 for absent/present

⇒ r = 0.0142, M2 = 6.6, p-value = $P[\chi^2_1 \ge M^2]$ = 0.01.

χ2, G2, M2 may not be valid ⇒ Exact test (later).
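As noted earlier, r and M2 can be computed from the table and the scores without expanding the data to one row per subject. The sketch below does this in plain Python; the function name `mantel_haenszel_m2` is hypothetical, and the p-value uses the identity P[χ2 with 1 df ≥ x] = erfc(sqrt(x/2)), which holds only for 1 degree of freedom.

```python
import math

def mantel_haenszel_m2(table, u, v):
    """Correlation r between row scores u and column scores v, and M2 = (n-1) r^2."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    u_bar = sum(ni * ui for ni, ui in zip(row_totals, u)) / n
    v_bar = sum(nj * vj for nj, vj in zip(col_totals, v)) / n
    sxy = sum(table[i][j] * (u[i] - u_bar) * (v[j] - v_bar)
              for i in range(len(u)) for j in range(len(v)))
    sxx = sum(ni * (ui - u_bar) ** 2 for ni, ui in zip(row_totals, u))
    syy = sum(nj * (vj - v_bar) ** 2 for nj, vj in zip(col_totals, v))
    r = sxy / math.sqrt(sxx * syy)
    m2 = (n - 1) * r ** 2
    p_value = math.erfc(math.sqrt(m2 / 2))   # chi-square tail, 1 df only
    return r, m2, p_value

# Infant malformation data: columns = (present, absent)
malformation = [[48, 17066], [38, 14464], [5, 788], [1, 126], [1, 37]]
r, m2, p_value = mantel_haenszel_m2(malformation,
                                    u=[0, 0.5, 1.5, 4, 7], v=[1, 0])
print(f"r = {r:.4f}, M2 = {m2:.4f}, p-value = {p_value:.4f}")
```

Calling the same function on the party identification table with scores 1,2 and 1,2,3 gives the M2 value quoted from the Proc Freq output for that example.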

Slide 90



• SAS program:

    data table2_7;
      input alcohol malform count @@;
      datalines;
    0 1 48 0 0 17066
    0.5 1 38 0.5 0 14464
    1.5 1 5 1.5 0 788
    4 1 1 4 0 126
    7 1 1 7 0 37
    ;

    title "Analysis of infant malformation data";
    proc freq data=table2_7;
      weight count;
      tables alcohol*malform / measures chisq cmh;
    run;

• Part of the output:

    Statistics for Table of alcohol by malform

    Statistic                     DF       Value      Prob
    ------------------------------------------------------
    Chi-Square                     4     12.0821    0.0168
    Likelihood Ratio Chi-Square    4      6.2020    0.1846
    Mantel-Haenszel Chi-Square     1      6.5699    0.0104

    Statistic                              Value       ASE
    ------------------------------------------------------
    Pearson Correlation                   0.0142    0.0106
    Spearman Correlation                  0.0033    0.0059

Slide 91


IV.2 Trend test for I × 2 and 2× J tables

• For an I × 2 table where X is an I-level ordinal variable and Y is a 2-level variable (such as the infant malformation table) from multinomial sampling or product-multinomial sampling on X:

                  Y
              1     0
        u1   n11   n12   n1+
    X   u2   n21   n22   n2+
        ...
        uI   nI1   nI2   nI+

we can assign scores to X and any scores (usually 0/1) to Y ⇒ M2.

Slide 92



• The Mantel-Haenszel M2 can be derived in a different way (taken from Section 3.2.1).

Consider

πi = P[Y = 1 | X = ui].

Assume a linear trend model for πi:

πi = α + βui

Then H0 : X ⊥ Y ⇒ H0* : β = 0.

An unbiased estimate of πi:

$$\hat{\pi}_i = \frac{n_{i1}}{n_{i+}} = p_i \quad \leftarrow \text{sample proportion at } X = u_i$$

The trend model implies the following linear model for pi:

pi = α + βui + εi,

Slide 93



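var(εi) = πi(1 − πi)/ni+, which equals α(1 − α)/ni+ under H0* : β = 0

⇒ WLS (weighted LS, weighted by sample size ni+) estimate of β:

$$\hat{\beta} = \frac{\sum_{i=1}^{I} n_{i+} (u_i - \bar{u})(p_i - p)}{\sum_{i=1}^{I} n_{i+} (u_i - \bar{u})^2},$$

where

$$\bar{u} = \frac{1}{n} \sum_{i=1}^{I} n_{i+} u_i \quad \leftarrow \text{sample mean of } \{X_i\}$$

$$p = \frac{n_{+1}}{n} \quad \leftarrow \text{pooled sample response rate}$$

var(β̂) under H0 can be estimated by

$$\widehat{\text{var}}_{H_0}(\hat{\beta}) = \frac{p(1 - p)}{\sum_{i=1}^{I} n_{i+} (u_i - \bar{u})^2}.$$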

Slide 94

WLS: weighted least square


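For testing H∗0 : β = 0, let’s use the Wald test

$$Z = \frac{\hat\beta}{\sqrt{\widehat{\mathrm{var}}_{H_0}(\hat\beta)}}$$

Under H0 : X ⊥ Y, Z ∼ N(0, 1) or Z2 ∼ $\chi^2_1$ (approximately, for large n).

• Z2 or Z is the Cochran-Armitage trend test.

It can be shown that Z2 = nr2. Remember M2 = (n − 1)r2

$$\Rightarrow\; Z^2 = \frac{n}{n-1}M^2 \approx M^2$$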
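The identity Z2 = nr2 is stated without proof. One possible route is sketched here, assuming Y is scored 1/0 and writing S_uu and S_uy for the weighted sums (this shorthand is introduced only for the sketch and is not used elsewhere in the notes):

```latex
\begin{align*}
S_{uu} &= \sum_{i=1}^{I} n_{i+}(u_i-\bar u)^2, \qquad
S_{uy} = \sum_{i=1}^{I} n_{i+}(u_i-\bar u)(p_i-p),\\[4pt]
\sum_{\text{subjects}} (x-\bar x)(y-\bar y)
  &= \sum_{i=1}^{I} (u_i-\bar u)\,(n_{i1}-n_{i+}p) = S_{uy},\\[4pt]
\sum_{\text{subjects}} (y-\bar y)^2
  &= n_{+1}(1-p)^2 + (n-n_{+1})\,p^2 = n\,p(1-p),\\[4pt]
r &= \frac{S_{uy}}{\sqrt{S_{uu}\; n\,p(1-p)}},\\[4pt]
Z^2 &= \frac{\hat\beta^2}{\widehat{\mathrm{var}}_{H_0}(\hat\beta)}
     = \frac{(S_{uy}/S_{uu})^2}{p(1-p)/S_{uu}}
     = \frac{S_{uy}^2}{S_{uu}\,p(1-p)} = n\,r^2 .
\end{align*}
```

The only difference from M2 = (n − 1)r2 is the factor n versus n − 1, which is negligible for large n.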

• SAS program:

title "Trend test of infant malformation data";
proc freq data=table2_7 order=data;
  weight count;
  tables alcohol*malform / trend;
run;

Slide 95



• Part of the output:

Statistics for Table of alcohol by malform

Cochran-Armitage Trend Test
--------------------------
Statistic (Z)          2.5632
One-sided Pr >  Z      0.0052
Two-sided Pr > |Z|     0.0104

Sample Size = 32574

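• We see that Z = 2.5632. Both the one-sided and two-sided p-values are significant at the 0.05 level. Since Z > 0, we conclude that β > 0.

We can confirm the relationship

$$Z^2 = \frac{n}{n-1}M^2:$$

here Z2 = 2.5632² ≈ 6.5700 and (32574/32573) × 6.5699 ≈ 6.5701.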
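The trend statistic can also be rebuilt directly from the WLS formulas for β̂ and its null variance. The following is a minimal sketch, assuming the infant malformation counts and the scores 0, 0.5, 1.5, 4, 7; the value 6.5699 is copied from the earlier PROC FREQ listing, and the variable names are illustrative.

```python
from math import sqrt

# (score u_i, n_i1 = malformed, n_i+ = row total) for the infant malformation data
rows = [(0, 48, 17114), (0.5, 38, 14502), (1.5, 5, 793), (4, 1, 127), (7, 1, 38)]

n = sum(tot for _, _, tot in rows)
p = sum(y for _, y, _ in rows) / n                # pooled response rate n_{+1}/n
u_bar = sum(u * tot for u, _, tot in rows) / n    # weighted mean of the scores

s_uu = sum(tot * (u - u_bar) ** 2 for u, _, tot in rows)
beta_hat = sum(tot * (u - u_bar) * (y / tot - p) for u, y, tot in rows) / s_uu
var_h0 = p * (1 - p) / s_uu                       # estimated variance of beta_hat under H0
z = beta_hat / sqrt(var_h0)

m2_reported = 6.5699                              # Mantel-Haenszel chi-square from the listing
print(f"beta_hat = {beta_hat:.6f}, Z = {z:.4f}, Z^2 = {z * z:.4f}")
print(f"n/(n-1) * M2 = {n / (n - 1) * m2_reported:.4f}")
```

Up to rounding in the reported M2, the two printed quantities Z^2 and n/(n−1)·M2 should coincide.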

Slide 96



• For a 2 × J table where X is a nominal or ordinal variable and Y is an ordinal variable, with data {nij} from multinomial sampling or product-multinomial sampling on X:

                 Y
          v1     v2    · · ·    vJ
X    1    n11    n12   · · ·    n1J
     2    n21    n22   · · ·    n2J

This situation is similar to a two-sample t-test comparing the mean Y scores between X = 1 and X = 2. It can be shown that t2 ≈ M2 (M2 does not depend on the choice of scores for X).

If we use mid-ranks as the scores for Y, M2 is equivalent to the Mann-Whitney (Wilcoxon rank-sum) test.

Slide 97



IV.3 Tests for nominal-ordinal tables

• X – nominal, Y – ordinal with data from multinomial sampling or

product-multinomial sampling on X such as:

                 Y
          v1     v2     v3
     1    n11    n12    n13    n1+
X    2    n21    n22    n23    n2+
     3    n31    n32    n33    n3+

• H0 : X ⊥ Y
⇓
Conditional distributions of Y are the same across levels of X
⇓
Mean scores of Y at X = i are the same across levels of X

• This is an ANOVA problem.

Slide 98



• We can use the ANOVA F-test to test X ⊥ Y:

$$F = \frac{SST/(I-1)}{SSE/(n-I)} \;\overset{H_0}{\sim}\; F_{I-1,\,n-I}$$

• Equivalently (for large n), we can use

$$\chi^2 = \frac{SST}{SSE^*/(n-1)} \;\overset{H_0}{\sim}\; \chi^2_{I-1}$$

where SSE∗ is the modified sum of squares of errors.

The test χ2 is called cmh2 by SAS:

proc freq;
  weight count;
  tables x*y / cmh2;
run;

Slide 99

SST: sum of squares for treatment
SSE: sum of squares for error


V. Exact Inference for Sparse Tables

V.1 Fisher’s exact test for 2× 2 tables

• X, Y – 2-level categorical variables with probability structure

            Y
          1      2
X    1    π11    π12
     2    π21    π22

• We want to test H0 : X ⊥ Y given data which, WLOG, we assume come from multinomial sampling:

            Y
          1      2
X    1    n11    n12
     2    n21    n22

Slide 100



• When {nij}’s are large, we can use the Pearson χ2 or LRT G2 to test

H0 : X ⊥ Y .

• However, when some cell counts {nij} are small, the exact distribution of χ2 or G2 under H0 may be far from $\chi^2_1$ =⇒ using the asymptotic distribution may give wrong conclusions.

• Fisher’s tea example: Fisher’s colleague Muriel Bristol claimed she could tell whether tea or milk was added to the cup first.

                   Muriel’s Guess
                   Milk    Tea
True     Milk        3       1      4
         Tea         1       3      4
                     4       4

Slide 101



• By the design of Fisher’s tea example (both margins are fixed at 4), the Pearson χ2 or G2 can take at most 5 different values (there are only 5 possible tables). Therefore, the $\chi^2_1$ approximation to the distribution of χ2 or G2 is very poor!

• Even if we assumed multinomial sampling, there would only be $\binom{8+3}{3} = 165$ tables. Moreover, the nij’s are small. The $\chi^2_1$ approximation of Pearson χ2 or G2 will still be very poor.
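The two table counts quoted above (5 tables with both margins fixed, 165 under multinomial sampling with n = 8) can be checked by brute-force enumeration. This is only an illustrative sketch; the tuple layout (n11, n12, n21, n22) is a convention chosen here.

```python
from math import comb

n = 8
# Multinomial sampling: only the total n is fixed; choose n11, n12, n21, n22 is the rest
all_tables = [(a, b, c, n - a - b - c)
              for a in range(n + 1)
              for b in range(n + 1 - a)
              for c in range(n + 1 - a - b)]

# Tea-tasting design: row margin n1+ = 4 and column margin n+1 = 4 are both fixed
fixed_margin = [t for t in all_tables if t[0] + t[1] == 4 and t[0] + t[2] == 4]

print(len(all_tables), comb(n + 3, 3))   # number of tables vs. the binomial coefficient
print(len(fixed_margin), fixed_margin)
```

With fixed margins the whole table is determined by n11, which is why only n11 = 0, ..., 4 remain.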

• Let us develop an exact test for testing H0 : X ⊥ Y in this kind of sparse 2 × 2 table.

• Let us assume multinomial sampling; we would like to test H0 : θ = 1 (X ⊥ Y) vs. the one-sided alternative Ha : θ > 1.

Slide 102


• With multinomial sampling, (n11, n12, n21, n22) are random variables

(only the sum n = n++ is fixed).

• Under H0 : θ = 1 (X ⊥ Y), πij = πi+π+j, so there are two unknown parameters π1+, π+1. Hence the distribution of the data (n11, n12, n21, n22) is unknown even under H0.

• It can be shown that under H0 : θ = 1 (X ⊥ Y), the conditional distribution of n11 | n1+, n+1 is totally known:

$$P[n_{11} = t_0 \mid n_{1+}, n_{+1}] = \frac{\binom{n_{1+}}{t_0}\binom{n_{2+}}{n_{+1}-t_0}}{\binom{n}{n_{+1}}},$$

where t0 is the observed value of n11. This is a hypergeometric distribution.
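To make the hypergeometric formula concrete, here is a small sketch that evaluates it with Python's `math.comb`. The function name `hypergeom_pmf` and its argument names are made up for this illustration; the margins n1+ = n2+ = n+1 = 4 are those of the tea example.

```python
from math import comb

def hypergeom_pmf(t, n1p, n2p, np1):
    """P[n11 = t | n1+ = n1p, n2+ = n2p, n+1 = np1] under H0: theta = 1."""
    # Outside the feasible range of n11 the probability is zero
    if t < max(0, np1 - n2p) or t > min(n1p, np1):
        return 0.0
    return comb(n1p, t) * comb(n2p, np1 - t) / comb(n1p + n2p, np1)

# Null distribution of n11 for the tea example
null_dist = {t: hypergeom_pmf(t, 4, 4, 4) for t in range(5)}
for t, prob in null_dist.items():
    print(f"n11 = {t}: {prob:.3f}")
```

No unknown parameter appears in the function, which is the point of conditioning on both margins.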

Slide 103



V.2 P-values of Fisher’s exact tests:

            Y
          1      2
X    1    n11    n12    n1+
     2    n21    n22    n2+
          n+1    n+2    n

• Simple algebra shows

$$\hat\theta = \frac{n_{11}n_{22}}{n_{12}n_{21}} = \frac{n_{11}(n_{+2} - n_{1+} + n_{11})}{(n_{1+} - n_{11})(n_{+1} - n_{11})},$$

which is increasing in n11 when the margins are fixed

=⇒ larger θ̂ ⇔ larger n11

=⇒ We should reject H0 in favor of Ha : θ > 1 when n11 is large.

=⇒ P-value = P[n11 ≥ t0 | n1+, n+1, H0] – one-sided Fisher’s exact test.

Slide 104



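• For Fisher’s tea example, the one-sided p-value is:

$$\begin{aligned}
\text{P-value} &= P[n_{11} \ge 3 \mid n_{1+}, n_{+1}, H_0] \\
&= P[n_{11} = 3 \mid n_{1+}, n_{+1}, H_0] + P[n_{11} = 4 \mid n_{1+}, n_{+1}, H_0] \\
&= \frac{\binom{4}{3}\binom{4}{1}}{\binom{8}{4}} + \frac{\binom{4}{4}\binom{4}{0}}{\binom{8}{4}} = 0.229 + 0.014 = 0.243
\end{aligned}$$

Mid P-value = 0.229/2 + 0.014 = 0.129.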

Note: In this example, n1+, n+1 are naturally fixed.
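The one-sided and mid p-values for the tea data can be reproduced in a few lines. This is a sketch only: the helper `pmf` hard-codes the tea margins as default arguments, and the names are illustrative.

```python
from math import comb

def pmf(t, n1p=4, n2p=4, np1=4):
    # Hypergeometric null probability of n11 = t given the margins
    return comb(n1p, t) * comb(n2p, np1 - t) / comb(n1p + n2p, np1)

t0 = 3  # observed n11 in the tea example

# One-sided p-value: P[n11 >= t0]
p_one_sided = sum(pmf(t) for t in range(t0, 5))
# Mid p-value: half the probability of the observed table plus the more extreme ones
p_mid = 0.5 * pmf(t0) + sum(pmf(t) for t in range(t0 + 1, 5))

print(f"one-sided p = {p_one_sided:.3f}, mid p = {p_mid:.3f}")
```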

Slide 105


• Two-sided Fisher’s exact test: H0 : θ = 1 (X ⊥ Y) vs. the two-sided alternative Ha : θ ≠ 1.

Table    n11 = 0    n11 = 1    n11 = 2    n11 = 3    n11 = 4
Prob      0.014      0.229      0.514      0.229      0.014

• P-value of the two-sided Fisher’s exact test:

$$\text{P-value} = \sum_{n_{11}} P(n_{11})\, I\{P(n_{11}) \le P(t_0)\}$$

= sum of the table probabilities that are ≤ the observed table probability.

For the tea example (t0 = 3):

p-value = P[n11 = 0] + P[n11 = 1] + P[n11 = 3] + P[n11 = 4] = 0.014 + 0.229 + 0.229 + 0.014 = 0.486.
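The two-sided rule (sum all table probabilities no larger than the observed one) is sketched below for the tea data. Exact fractions are used so that the tie between n11 = 1 and n11 = 3 is compared without floating-point issues; that choice, and the names, are assumptions of this illustration.

```python
from fractions import Fraction
from math import comb

def pmf(t, n1p=4, n2p=4, np1=4):
    # Exact hypergeometric null probability of n11 = t given the margins
    return Fraction(comb(n1p, t) * comb(n2p, np1 - t), comb(n1p + n2p, np1))

t0 = 3  # observed n11
probs = {t: pmf(t) for t in range(5)}

# Keep every table whose null probability does not exceed that of the observed table
p_two_sided = sum(p for p in probs.values() if p <= probs[t0])
print(p_two_sided, round(float(p_two_sided), 3))
```

Only the table with n11 = 2 is excluded, since it is the single table more probable than the observed one.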

Slide 106


• SAS program & output for Fisher’s exact test:

data table2_8;
  input pour $ guess $ count @@;
  datalines;
milk milk 3 milk tea 1
tea milk 1 tea tea 3
;

title "Analysis of Fisher's tea data";
proc freq data=table2_8;
  weight count;
  tables pour*guess / norow nocol nopercent chisq;
  exact fisher or;
run;

The FREQ Procedure

Table of pour by guess

pour      guess

Frequency|milk    |tea     |  Total
---------+--------+--------+
milk     |      3 |      1 |      4
---------+--------+--------+
tea      |      1 |      3 |      4
---------+--------+--------+
Total           4        4        8

Statistics for Table of pour by guess

Statistic                     DF      Value     Prob
------------------------------------------------------
Chi-Square                     1     2.0000   0.1573
Likelihood Ratio Chi-Square    1     2.0930   0.1480

Slide 107


Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)         3
Left-sided Pr <= F          0.9857
Right-sided Pr >= F         0.2429

Table Probability (P)       0.2286
Two-sided Pr <= P           0.4857

Odds Ratio
-----------------------------------
Odds Ratio                  9.0000

Asymptotic Conf Limits
95% Lower Conf Limit        0.3666
95% Upper Conf Limit      220.9270

Exact Conf Limits
95% Lower Conf Limit        0.2117
95% Upper Conf Limit      626.2435

Sample Size = 8

Note: We can also obtain an exact CI for the true θ.
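The asymptotic limits in the listing can be approximated with the usual large-sample Wald interval on the log odds ratio, with standard error sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22). That PROC FREQ uses exactly this formula is an assumption here, although the numbers agree; the exact limits require the conditional distribution under θ ≠ 1 and are not reproduced in this sketch.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Tea-tasting table
n11, n12, n21, n22 = 3, 1, 1, 3

theta_hat = (n11 * n22) / (n12 * n21)                    # sample odds ratio
se_log = sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)     # SE of log(theta_hat)
z = NormalDist().inv_cdf(0.975)                          # about 1.96

lower = exp(log(theta_hat) - z * se_log)
upper = exp(log(theta_hat) + z * se_log)
print(f"theta_hat = {theta_hat:.4f}, 95% Wald CI = ({lower:.4f}, {upper:.4f})")
```

The very wide interval reflects how little information eight cups carry about θ.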

Slide 108


V.3 Fisher’s exact tests can be conservative

• For Fisher’s tea example, the exact null distribution of n11 | n1+, n+1 is:

Table    n11 = 0    n11 = 1    n11 = 2    n11 = 3    n11 = 4
Prob      0.014      0.229      0.514      0.229      0.014

• If we would like to construct a one-sided test at significance level 0.05

(target type I error prob), then we would only reject H0 : θ = 1 in favor

of Ha : θ > 1 when n11 = 4. Therefore, the actual type I error prob is

P [n11 = 4|H0, n1+, n+1] = 0.014 < 0.05.

So the test is very conservative!
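The conservativeness argument can be checked numerically: find the smallest cutoff c whose upper-tail probability is at most 0.05 and read off the attained size. A minimal sketch for the tea margins, with illustrative names:

```python
from math import comb

def pmf(t, n1p=4, n2p=4, np1=4):
    # Hypergeometric null probability of n11 = t given the margins
    return comb(n1p, t) * comb(n2p, np1 - t) / comb(n1p + n2p, np1)

alpha = 0.05
# Upper-tail probabilities P[n11 >= c]; c = 5 corresponds to never rejecting
tail = {c: sum(pmf(t) for t in range(c, 5)) for c in range(6)}

# One-sided test: reject H0 when n11 >= cutoff, with the smallest cutoff keeping size <= alpha
cutoff = min(c for c in range(6) if tail[c] <= alpha)
actual_size = tail[cutoff]
print(f"reject when n11 >= {cutoff}; actual type I error = {actual_size:.3f}")
```

Because n11 is discrete, the attainable sizes jump from about 0.014 straight to about 0.243, so no rejection region has size close to 0.05.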

Slide 109



VI Association in Three-Way Tables

• X, Y – 2 categorical variables

The (marginal) XY association may not reflect a causal relation.

We need to adjust for a 3rd variable Z, a confounding variable (related to both X and Y).

For example,

X = second hand smoking

Y = lung cancer

Z = age, may be related to X and Y

                                Lung Cancer
                                Yes     No
Second Hand Smoking     Yes     π11     π12
                        No      π21     π22

Slide 110



VI.1 Partial tables, conditional and marginal associations

• With 3 categorical variables X, Y and Z, at each level of Z there is an XY table. Together, these form the partial tables.

• Each partial table provides information on the conditional association between X and Y given Z = k.

• When collapsing the partial tables over Z, we get a 2-way XY (marginal) table. This table provides information on the marginal association between X and Y.

• We need to be aware that the conditional associations and marginal

association may be different!

Slide 111


• Death penalty example (Table 2.10). Data from Florida, 1976-1987.

X = defendant’s race (W, B), Y = death penalty (Yes, No).

                    Y – Death Penalty
                    Yes     No
X – Race     W       53    430
             B       15    176

Death penalty rate for W: $\hat\pi_1 = \frac{53}{53+430} = 0.110$

Death penalty rate for B: $\hat\pi_2 = \frac{15}{15+176} = 0.079$

$$\hat\psi = \frac{\hat\pi_1}{\hat\pi_2} \approx 1.40, \qquad \hat\theta = \frac{53 \times 176}{430 \times 15} = 1.45$$

⇒ White defendants are about 40% more likely to receive the death penalty than black defendants.

• Maybe the race of victims (Z) affects the XY association?

Slide 112


When Z = White, the XY table is

                    Y – Death Penalty
                    Yes     No
X – Race     W       53    414     π1 = 11.3%
             B       11     37     π2 = 22.9%

When Z = Black, the XY table is

                    Y – Death Penalty
                    Yes     No
X – Race     W        0     16     π1 = 0%
             B        4    139     π2 = 2.8%

• We see that the conditional associations and the marginal association between X and Y have different directions! This phenomenon is called Simpson’s paradox.
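The reversal can be verified directly from the counts. The sketch below collapses the two partial tables into the marginal table and compares death penalty rates by defendant race; the dictionary layout and names are illustrative only.

```python
# (death penalty yes, no) by defendant race, within each victim race
partial = {
    "white victim": {"W": (53, 414), "B": (11, 37)},
    "black victim": {"W": (0, 16), "B": (4, 139)},
}

def rate(cell):
    yes, no = cell
    return yes / (yes + no)

# Collapse over victim race to get the marginal XY table
marginal = {
    race: (sum(partial[z][race][0] for z in partial),
           sum(partial[z][race][1] for z in partial))
    for race in ("W", "B")
}

for z, table in partial.items():
    print(z, {race: round(rate(cell), 3) for race, cell in table.items()})
print("marginal", {race: round(rate(cell), 3) for race, cell in marginal.items()})
```

Within each victim-race table the rate is higher for black defendants, while the collapsed table shows the opposite ordering.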

Slide 113


• Reasons causing Simpson’s paradox:

Z is related to both X and Y .

1. More white victims than black victims.

2. Given Z =white, defendants (X) are about 90% likely to be white

3. Given Z =black, defendants (X) are only about 10% likely to be

white.

4. More white defendants received death penalty (X,Y are related).
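The percentages in the list above can be recomputed from the partial tables. The sketch also computes the death penalty rate by victim race, which is not quoted in the notes but is one way to see that Z is related to Y; treat that extra quantity as an illustration rather than part of the original argument.

```python
# Death penalty data: (victim race Z, defendant race X) -> (death penalty yes, no)
cells = {
    ("W", "W"): (53, 414), ("W", "B"): (11, 37),
    ("B", "W"): (0, 16),   ("B", "B"): (4, 139),
}

share_white_defendants = {}   # P[X = W | Z = z]
death_rate_by_victim = {}     # P[Y = yes | Z = z]
for z in ("W", "B"):
    n_white_def = sum(cells[(z, "W")])
    n_z = n_white_def + sum(cells[(z, "B")])
    yes_z = cells[(z, "W")][0] + cells[(z, "B")][0]
    share_white_defendants[z] = n_white_def / n_z
    death_rate_by_victim[z] = yes_z / n_z
    print(f"victim {z}: cases = {n_z}, "
          f"white defendants = {share_white_defendants[z]:.1%}, "
          f"death penalty rate = {death_rate_by_victim[z]:.1%}")
```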

Slide 114


VI.2 Conditional and marginal odds ratios

• When we have 2 × 2 × K tables for X, Y and Z, at Z = k the observed table for XY is

            Y
          1       2
X    1    n11k    n12k
     2    n21k    n22k

Then we have K conditional odds ratios that estimate the conditional associations between X and Y at Z = k:

$$\hat\theta_{XY(k)} = \frac{n_{11k}\,n_{22k}}{n_{12k}\,n_{21k}}.$$

Slide 115


The marginal XY table is

            Y
          1       2
X    1    n11+    n12+
     2    n21+    n22+

The marginal odds ratio estimates the marginal association between X and Y:

$$\hat\theta_{XY} = \frac{n_{11+}\,n_{22+}}{n_{12+}\,n_{21+}}.$$

Slide 116


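• For the death penalty example,

$$\hat\theta_{XY} = 1.45$$

$$\hat\theta_{XY(1)} = \frac{53 \times 37}{11 \times 414} = 0.43$$

$$\hat\theta_{XY(2)} = \frac{0 \times 139}{4 \times 16} = 0$$

$$\hat\theta^{\,mod}_{XY(2)} = \frac{0.5 \times 139.5}{4.5 \times 16.5} = 0.94$$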
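These four odds ratios, including the modified version that adds 0.5 to every cell of the table with a zero count, can be reproduced as follows. The function `odds_ratio` and its `add` argument are names chosen for this sketch.

```python
def odds_ratio(n11, n12, n21, n22, add=0.0):
    """Sample odds ratio; `add` is a constant added to every cell (e.g. 0.5)."""
    return ((n11 + add) * (n22 + add)) / ((n12 + add) * (n21 + add))

theta_marginal = odds_ratio(53, 430, 15, 176)        # collapsed over victim race
theta_1 = odds_ratio(53, 414, 11, 37)                # white victims
theta_2 = odds_ratio(0, 16, 4, 139)                  # black victims (zero cell)
theta_2_mod = odds_ratio(0, 16, 4, 139, add=0.5)     # with 0.5 added to each cell

print(f"marginal: {theta_marginal:.2f}")
print(f"Z = white victim: {theta_1:.2f}")
print(f"Z = black victim: {theta_2:.2f} (modified: {theta_2_mod:.2f})")
```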

Slide 117


VI.3 Conditional and marginal independence

• If X and Y are independent at every level of Z, then X and Y are called conditionally independent given Z.

If X, Y are 2-level variables, then X and Y are conditionally independent ⇔ θXY(k) = 1, k = 1, 2, ..., K.

• X and Y are marginally independent if they are independent in the marginal XY table (ignoring Z).

If X, Y are 2-level variables, then X and Y are marginally independent ⇔ θXY = 1.

Slide 118


• Example: Conditional independence ⇏ marginal independence.

At Z = 1:

            Y
          S     F
X    A    18    12
     B    12     8

θXY(1) = 1, A = B

At Z = 2:

            Y
          S     F
X    A     2     8
     B     8    32

θXY(2) = 1, A = B

Marginally,

            Y
          S     F
X    A    20    20
     B    20    40

θXY = 2 ⇒ A > B

Slide 119


• Example: Marginal independence ⇏ conditional independence.

At Z = 1:

            Y
          S     F
X    A     4     1
     B     9     6

θXY(1) = 8/3

At Z = 2:

            Y
          S     F
X    A     6     9
     B     1     4

θXY(2) = 8/3

Marginally,

            Y
          S     F
X    A    10    10
     B    10    10

θXY = 1 ⇒ A = B
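Both counterexamples can be checked mechanically by computing the conditional odds ratios and then collapsing over Z. The sketch uses exact fractions; the helper names `odds_ratio` and `collapse` and the nested-tuple layout ((n11, n12), (n21, n22)) are assumptions of this illustration.

```python
from fractions import Fraction

def odds_ratio(table):
    (a, b), (c, d) = table
    return Fraction(a * d, b * c)

def collapse(tables):
    # Sum the partial tables cell by cell to get the marginal XY table
    return tuple(tuple(sum(t[i][j] for t in tables) for j in (0, 1)) for i in (0, 1))

# Example 1: conditional independence without marginal independence
ex1 = [((18, 12), (12, 8)), ((2, 8), (8, 32))]
# Example 2: marginal independence without conditional independence
ex2 = [((4, 1), (9, 6)), ((6, 9), (1, 4))]

for name, tables in (("example 1", ex1), ("example 2", ex2)):
    conditional = [odds_ratio(t) for t in tables]
    marginal = odds_ratio(collapse(tables))
    print(name, "conditional:", [str(x) for x in conditional], "marginal:", marginal)
```

In both examples the conditional odds ratios are equal across Z, so the association is homogeneous, yet the marginal odds ratio differs from the common conditional value.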

Slide 120


VI.4 Homogeneous association

• Assume X,Y are 2-level variables.

Homogeneous association (in terms of θ) – no interaction

⇕

θXY(1) = θXY(2) = · · · = θXY(K)

When θXY (k) are not all the same, Z is called an effect modifier (there

is interaction).

• Note: Under homogeneous association, we cannot claim

θXY = θXY (1) = θXY (2) = · · · = θXY (K).

See previous examples.

Slide 121
