analysis of matched data hrp 261 02/02/04 chapter 9 agresti – read sections 9.1 and 9.2
Analysis of matched data
HRP 261 02/02/04
Chapter 9, Agresti: read sections 9.1 and 9.2
Pair Matching: Why match?
Pairing can control for extraneous sources of variability and increase the power of a statistical test.
Match 1 control to 1 case based on potential confounders, such as age, gender, and smoking.
Example
Johnson and Johnson (NEJM 287: 1122-1125, 1972) selected 85 Hodgkin’s patients who had a sibling of the same sex who was free of the disease and whose age was within 5 years of the patient’s. They presented the data as:

                Tonsillectomy   None
Hodgkin’s             41         44
Sib control           33         52

From John A. Rice, “Mathematical Statistics and Data Analysis.”
OR=1.47; chi-square=1.53 (NS)
Example
But several letters to the editor pointed out that those investigators had made an error by ignoring the pairings. These are not independent samples because the sibs are paired. It is better to analyze the data like this:

                            Control (sib)
                        None   Tonsillectomy
Case   None              37          7
       Tonsillectomy     15         26

From John A. Rice, “Mathematical Statistics and Data Analysis.”
OR=2.14; chi-square=2.91 (p=.09)
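The two analyses of the tonsillectomy data can be reproduced with a short sketch. The function names below are illustrative, not from the slides, and the unmatched chi-square is assumed to be the ordinary Pearson statistic without continuity correction, which reproduces the 1.53 quoted above.

```python
def unmatched_analysis(a, b, c, d):
    """Treat the data as two independent samples.

    Rows are cases and controls; columns are exposed and unexposed:
        cases:    a exposed, b unexposed
        controls: c exposed, d unexposed
    Returns (odds ratio, Pearson chi-square without continuity correction).
    """
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return odds_ratio, chi2


def matched_analysis(case_only, control_only):
    """Use only the discordant pairs of a 1:1 matched study.

    case_only:    pairs where only the case is exposed
    control_only: pairs where only the control is exposed
    Returns (odds ratio, McNemar chi-square).
    """
    odds_ratio = case_only / control_only
    chi2 = (case_only - control_only) ** 2 / (case_only + control_only)
    return odds_ratio, chi2


# Ignoring the pairing: 41/44 among patients, 33/52 among siblings.
print(unmatched_analysis(41, 44, 33, 52))
# Respecting the pairing: 15 pairs with only the patient exposed, 7 with only the sibling.
print(matched_analysis(15, 7))
```

The matched analysis never looks at the 26 + 37 concordant pairs, which is the point developed in the slides that follow.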
Pair Matching: Agresti example
Match each MI case to a control without MI based on age and gender.
Ask about history of diabetes to find out if diabetes increases your risk for MI.

                            MI controls
                        Diabetes   No diabetes   Total
MI cases   Diabetes         9          37          46
           No diabetes     16          82          98
           Total           25         119         144

Which cells are informative?
Just the discordant cells are informative!
Pair Matching
OR estimate comes only from discordant pairs!
The question is: among the discordant pairs, what proportion are discordant in the direction of the case vs. the direction of the control? If more discordant pairs “favor” the case, this indicates OR>1.
\[
P(\text{favors case} \mid \text{discordant pair}) =
\frac{P(E \mid D)\,P(\sim E \mid \sim D)}
     {P(E \mid D)\,P(\sim E \mid \sim D) + P(\sim E \mid D)\,P(E \mid \sim D)}
\]
P(E|D) P(~E|~D) = the probability of observing a case-control pair with only the case exposed
P(~E|D) P(E|~D) = the probability of observing a case-control pair with only the control exposed
P(“favors” case | discordant pair), with b = 37 pairs where only the case has diabetes and c = 16 pairs where only the control has diabetes:
\[
\hat{p} = \frac{b}{b+c} = \frac{37}{37+16} = \frac{37}{53}
\]
odds(“favors” case | discordant pair), with b = 37 pairs where only the case has diabetes and c = 16 pairs where only the control has diabetes:
\[
\widehat{OR} = \frac{b}{c} = \frac{37}{16}
\]
OR estimate comes only from discordant pairs!!
OR = 37/16 = 2.31
Makes sense!
McNemar’s Test
Null hypothesis: P(“favors” case | discordant pair) = .5 (note: equivalent to OR=1.0 or cell b=cell c)

Exact binomial p-value, using the 53 discordant pairs (37 favor the case, 16 favor the control):
\[
p\text{-value} = \binom{53}{37}(.5)^{37}(.5)^{16} + \binom{53}{38}(.5)^{38}(.5)^{15} + \binom{53}{39}(.5)^{39}(.5)^{14} + \dots
\]
By normal approximation to binomial:
\[
Z = \frac{37 - \frac{53}{2}}{\sqrt{53(.5)(.5)}} = \frac{10.5}{3.64} = 2.88; \quad p < .01
\]
McNemar’s Test: generally

                   Controls
                 exp   No exp
Cases   exp       a      b
        No exp    c      d

By normal approximation to binomial:
\[
Z = \frac{b - \frac{b+c}{2}}{\sqrt{(b+c)(.5)(.5)}}
  = \frac{\frac{b-c}{2}}{\sqrt{\frac{b+c}{4}}}
  = \frac{b-c}{\sqrt{b+c}}
\]
Equivalently:
\[
\chi^2_1 = \frac{(b-c)^2}{b+c}
\]
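A minimal sketch of McNemar’s test for the diabetes and MI pairs, using only the discordant counts b and c. The function name is hypothetical, and the exact p-value is assumed to be the one-sided upper tail, matching the binomial sum that starts at 37 of 53.

```python
from math import comb, sqrt


def mcnemar(b, c):
    """McNemar's test from the two discordant cells of a pair-matched table.

    b: pairs where only the case is exposed
    c: pairs where only the control is exposed
    Returns (exact one-sided binomial p-value, Z statistic, chi-square with 1 df).
    """
    n = b + c
    # P(X >= b) for X ~ Binomial(n, 0.5): the upper tail starting at the observed b.
    exact_p = sum(comb(n, k) for k in range(b, n + 1)) * 0.5 ** n
    # Normal approximation to the binomial under the null p = 0.5.
    z = (b - n / 2) / sqrt(n * 0.5 * 0.5)
    # Equivalent chi-square form; algebraically equal to z squared.
    chi2 = (b - c) ** 2 / n
    return exact_p, z, chi2


exact_p, z, chi2 = mcnemar(37, 16)
print(exact_p, z, chi2)
```

Squaring Z gives the chi-square form, so the two versions of the test always agree.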
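95% CI for difference in dependent proportions

Proportion exposed among cases = 46/144 = .32; proportion exposed among controls = 25/144 = .17.

\[
\operatorname{Var}(p_1 - p_2) = \operatorname{Var}(p_1) + \operatorname{Var}(p_2) - 2\operatorname{Cov}(p_1, p_2)
\]
\[
\operatorname{Var}\left(p^{cases}_{E|D} - p^{controls}_{E|\sim D}\right)
= \frac{p^{cases}_{E|D}\left(1 - p^{cases}_{E|D}\right)}{n}
+ \frac{p^{controls}_{E|\sim D}\left(1 - p^{controls}_{E|\sim D}\right)}{n}
- 2\operatorname{Cov}\left(p^{cases}_{E|D},\, p^{controls}_{E|\sim D}\right)
\]
\[
= \frac{(.32)(.68) + (.17)(.83) - 2(.06 \times .57 - .26 \times .11)}{144} = .0024
\]
\[
95\%\ \text{CI}: \ .32 - .17 = .15 \pm 1.96\sqrt{.0024} = (.05,\ .24)
\]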
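The interval can be checked from the four cell counts. This is a sketch under the assumption, read off the numeric line above, that the covariance term is (p_a p_d - p_b p_c)/n, where p_a, p_b, p_c, p_d are the cell proportions; the function name is illustrative.

```python
from math import sqrt


def paired_difference_ci(a, b, c, d, z=1.96):
    """CI for the difference in exposure proportions between matched cases and controls.

    a: both exposed, b: only case exposed, c: only control exposed, d: neither exposed.
    Returns (difference, variance, (lower, upper)).
    """
    n = a + b + c + d
    p_case = (a + b) / n        # proportion of cases exposed
    p_control = (a + c) / n     # proportion of controls exposed
    # Covariance between the two dependent proportions (assumed form, see lead-in).
    cov = ((a / n) * (d / n) - (b / n) * (c / n)) / n
    var = (p_case * (1 - p_case) + p_control * (1 - p_control)) / n - 2 * cov
    diff = p_case - p_control
    half_width = z * sqrt(var)
    return diff, var, (diff - half_width, diff + half_width)


diff, var, (lower, upper) = paired_difference_ci(9, 37, 16, 82)
print(round(diff, 2), round(var, 4), round(lower, 2), round(upper, 2))
```

Working with unrounded proportions gives the same variance (.0024) and the same interval limits to two decimals.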
Each pair is its own “age-gender” stratum

Example: concordant for exposure (cell “a” from before)

               Case (MI)   Control
Diabetes           1          1
No diabetes        0          0

The 144 pairs form four kinds of strata:

                            Diabetes            No diabetes
Stratum type             Case   Control      Case   Control     Number of strata
Both exposed               1       1           0       0            x 9
Only case exposed          1       0           0       1            x 37
Only control exposed       0       1           1       0            x 16
Neither exposed            0       0           1       1            x 82
Mantel-Haenszel for pair-matched data
We want to know the relationship between diabetes and MI controlling for age and gender.
Mantel-Haenszel methods apply.
RECALL: The Mantel-Haenszel Summary Odds Ratio

               Case   Control
Exposed          a       b
Not Exposed      c       d

\[
\widehat{OR}_{MH} = \frac{\sum_{i=1}^{k} \frac{a_i d_i}{T_i}}{\sum_{i=1}^{k} \frac{b_i c_i}{T_i}}
\]
Contribution of each kind of pair (T = 2 in every stratum):

                            Diabetes            No diabetes
Stratum type             Case   Control      Case   Control      ad/T    bc/T
Both exposed               1       1           0       0          0       0
Only case exposed          1       0           0       1          1/2     0
Only control exposed       0       1           1       0          0       1/2
Neither exposed            0       0           1       1          0       0
Mantel-Haenszel Summary OR
\[
\widehat{OR}_{MH} = \frac{\sum_{i=1}^{144} \frac{a_i d_i}{2}}{\sum_{i=1}^{144} \frac{b_i c_i}{2}}
= \frac{37 \times \frac{1}{2}}{16 \times \frac{1}{2}} = \frac{37}{16}
\]
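Treating every pair as its own stratum can be sketched directly: build the 144 tiny 2x2 tables and apply the Mantel-Haenszel formula. The tuple layout (a, b, c, d) follows the Exposed/Not Exposed by Case/Control table above; the function name is an assumption made for illustration.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio over a list of 2x2 strata.

    Each stratum is (a, b, c, d):
        a = exposed cases,   b = exposed controls,
        c = unexposed cases, d = unexposed controls.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# One stratum per matched pair in the diabetes and MI example.
pairs = (
    [(1, 1, 0, 0)] * 9      # both exposed
    + [(1, 0, 0, 1)] * 37   # only the case exposed
    + [(0, 1, 1, 0)] * 16   # only the control exposed
    + [(0, 0, 1, 1)] * 82   # neither exposed
)
print(mantel_haenszel_or(pairs))  # equals 37/16
```

The concordant strata add zero to both sums, so the result reduces to b/c.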
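Mantel-Haenszel Test Statistic (same as McNemar’s)

Recall:
\[
E(n_{11k}) = \frac{n_{1+k}\, n_{+1k}}{n_k}; \qquad
\operatorname{Var}(n_{11k}) = \frac{n_{1+k}\, n_{2+k}\, n_{+1k}\, n_{+2k}}{n_k^2 (n_k - 1)}
\]
Concordant cells contribute 0.
Discordant cells:
\[
E(n_{11k}) = \frac{(1)(1)}{2} = \frac{1}{2}; \qquad
\operatorname{Var}(n_{11k}) = \frac{(1)(1)(1)(1)}{2^2 (2 - 1)} = \frac{1}{4}
\]
\[
\chi^2_{CMH} = \frac{\left[\sum_k \left(n_{11k} - E(n_{11k})\right)\right]^2}{\sum_k \operatorname{Var}(n_{11k})}
= \frac{\left[.5(\text{case disc. cells}) - .5(\text{control disc. cells})\right]^2}{(.25)(\text{disc. cells})}
= \frac{[.5b - .5c]^2}{.25(b+c)}
= \frac{.25(b-c)^2}{.25(b+c)}
= \frac{(b-c)^2}{b+c}
\]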
Example: Salmonella Outbreak in France, 1993
From: “Large outbreak of Salmonella enterica serotype paratyphi B infection caused by a goats' milk cheese, France, 1993: a case finding and epidemiological study” BMJ 312: 91-94; Jan 1996.
Epidemic Curve
Matched Case Control Study
Case = Salmonella gastroenteritis.
Community controls (1:1) matched for:
•age group (< 1, 1-4, 5-14, 15-34, 35-44, 45-54, 55-64, or >= 65 years)
•gender
•city of residence
Results
In 2x2 table form: any goat’s cheese

                              Controls
                        Goat’s cheese   None   Total
Cases   Goat’s cheese        23          23      46
        None                  6           7      13
        Total                29          30      59

\[
\widehat{OR} = \frac{b}{c} = \frac{23}{6} = 3.8
\]
In 2x2 table form: Brand B Goat’s cheese

                                Controls
                          Goat’s cheese B   None   Total
Cases   Goat’s cheese B          8           24      32
        None                     2           25      27
        Total                   10           49      59

\[
\widehat{OR} = \frac{b}{c} = \frac{24}{2} = 12.0
\]
Each pair as its own stratum (Brand B):

                            Brand B               None
Stratum type             Case   Control      Case   Control     Number of strata
Both exposed               1       1           0       0            x 8
Only case exposed          1       0           0       1            x 24
Only control exposed       0       1           1       0            x 2
Neither exposed            0       0           1       1            x 25
8 concordant exposed:
\[
E(n_{11k}) = \frac{n_{1+k}\, n_{+1k}}{n_k} = \frac{2 \times 1}{2} = 1; \qquad
\text{Observed} - E(n_{11k}) = 1 - 1 = 0
\]
\[
\operatorname{Var}(n_{11k}) = \frac{n_{1+k}\, n_{2+k}\, n_{+1k}\, n_{+2k}}{n_k^2 (n_k - 1)} = \frac{2 \times 0 \times 1 \times 1}{4(2-1)} = 0
\]
Summary: 8 concordant-exposed pairs (=strata) contribute nothing to the numerator (observed-expected=0) and nothing to the denominator (variance=0).

25 concordant unexposed:
\[
E(n_{11k}) = \frac{0 \times 1}{2} = 0; \qquad
\text{Observed} - E(n_{11k}) = 0 - 0 = 0
\]
\[
\operatorname{Var}(n_{11k}) = \frac{0 \times 2 \times 1 \times 1}{4(2-1)} = 0
\]
Summary: 25 concordant-unexposed pairs contribute nothing to the numerator (observed-expected=0) and nothing to the denominator (variance=0).

24 discordant cells that favor the case:
\[
E(n_{11k}) = \frac{(1)(1)}{2} = \frac{1}{2}; \qquad
\text{Observed} - E(n_{11k}) = 1 - .5 = .5
\]
\[
\operatorname{Var}(n_{11k}) = \frac{1 \times 1 \times 1 \times 1}{4(2-1)} = \frac{1}{4}
\]
Summary: 24 discordant “case-exposed” pairs contribute +.5 each to the numerator (observed-expected= +.5) and .25 each to the denominator (variance= .25).

2 discordant cells that favor the control:
\[
E(n_{11k}) = \frac{(1)(1)}{2} = \frac{1}{2}; \qquad
\text{Observed} - E(n_{11k}) = 0 - .5 = -.5
\]
\[
\operatorname{Var}(n_{11k}) = \frac{1 \times 1 \times 1 \times 1}{4(2-1)} = \frac{1}{4}
\]
Summary: 2 discordant “control-exposed” pairs contribute -.5 each to the numerator (observed-expected= -.5) and .25 each to the denominator (variance= .25).
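\[
\chi^2_{CMH} = \frac{\left[8(0) + 25(0) + 24(.5) - 2(.5)\right]^2}{0 + 0 + 24(.25) + 2(.25)}
= \frac{\left[22(.5)\right]^2}{26(.25)}
= \frac{22^2}{26}
= \frac{(24 - 2)^2}{26}
= \frac{(b-c)^2}{b+c}
\]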
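The same bookkeeping can be run stratum by stratum for the Brand B data to confirm that the Mantel-Haenszel statistic collapses to McNemar’s. This is a sketch only; the function name and the (a, b, c, d) tuple layout (exposed cases, exposed controls, unexposed cases, unexposed controls) are assumptions for illustration.

```python
def cmh_chi2(strata):
    """Cochran-Mantel-Haenszel chi-square over a list of 2x2 strata.

    Each stratum is (a, b, c, d) with a = exposed cases (the n11 cell),
    b = exposed controls, c = unexposed cases, d = unexposed controls.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        row1, row2 = a + b, c + d   # exposed, unexposed totals
        col1, col2 = a + c, b + d   # case, control totals
        observed_minus_expected += a - row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    return observed_minus_expected ** 2 / variance


brand_b_pairs = (
    [(1, 1, 0, 0)] * 8      # concordant exposed: contributes 0 and 0
    + [(1, 0, 0, 1)] * 24   # only case exposed: +.5 and .25 each
    + [(0, 1, 1, 0)] * 2    # only control exposed: -.5 and .25 each
    + [(0, 0, 1, 1)] * 25   # concordant unexposed: contributes 0 and 0
)
print(cmh_chi2(brand_b_pairs), (24 - 2) ** 2 / (24 + 2))
```

Both printed values are 22 squared over 26, about 18.6.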
M:1 matched studies
One-to-one pair matching provides the most cost-effective design when cases and controls are equally scarce.
But when cases are the limiting factor, as with rare diseases, statistical power may be increased by selecting more than 1 control matched to each case.
But with diminishing returns…
M:1 matched studies
2:1 matched study of colorectal cancer.
Background: Carcinoembryonic antigen (CEA) is the classical tumor marker for colorectal cancer. This study investigated whether the plasma levels of carcinoembryonic antigen and/or CA 242 were elevated BEFORE clinical diagnosis of colorectal cancer.
From: Palmqvist R et al. Prediagnostic Levels of Carcinoembryonic Antigen and CA 242 in Colorectal Cancer: A Matched Case-Control Study. Diseases of the Colon & Rectum. 46(11):1538-1544, November 2003.
M:1 matched studies
Prediagnostic Levels of Carcinoembryonic Antigen and CA 242 in Colorectal Cancer: A Matched Case-Control Study
Study design: A so-called “nested case-control study.”
Idea: Study subjects who were members of an ongoing prospective cohort study in Sweden had given blood at baseline, when they had no disease. Years later, blood can be thawed and tested for the presence of prediagnostic antigens.
Key innovation: The cohort is large, the disease is rare, and it’s too costly to test everyone’s blood; so only test stored blood of cases and matched controls from the cohort.
M:1 matched studies
Two cancer-free controls were randomly selected for each case from the corresponding cohort at the time of diagnosis of the matched case.
Matched for:
•gender
•age at recruitment (±12 months)
•date of blood sampling (±2 months)
•fasting time (<4 hours, 4–8 hours, >8 hours)
2:1 matching:
•stratum=matching group
•3 subjects per stratum
•6 possible 2x2 tables…

                                                CEA +                  CEA -
Table                                      Case (CRC)  Controls   Case (CRC)  Controls
Everyone exposed; non-informative               1          2           0          0
Case exposed; 1 control unexposed               1          1           0          1
Case exposed; both controls unexposed           1          0           0          2
Case unexposed; both controls exposed           0          2           1          0
Case unexposed; 1 control exposed               0          1           1          1
Everyone unexposed; non-informative             0          0           1          2
Number of matched sets observed for each table:

                                                CEA +                  CEA -
Table                                      Case (CRC)  Controls   Case (CRC)  Controls   Count
Everyone exposed                                1          2           0          0          0
Case exposed; 1 control unexposed               1          1           0          1          2
Case exposed; both controls unexposed           1          0           0          2         12
Case unexposed; both controls exposed           0          2           1          0          0
Case unexposed; 1 control exposed               0          1           1          1          1
Everyone unexposed                              0          0           1          2        102
Represents all possible discordant tables (either 2 or 1 total exposed)

2 Tables with 2 exposed
                                                CEA +                  CEA -
                                           Case (CRC)  Controls   Case (CRC)  Controls
Case exposed; 1 control exposed                 1          1           0          1
Case unexposed; both controls exposed           0          2           1          0

13 Tables with 1 exposed
                                                CEA +                  CEA -
                                           Case (CRC)  Controls   Case (CRC)  Controls
Case exposed; both controls unexposed           1          0           0          2
Case unexposed; 1 control exposed               0          1           1          1
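2 Tables with 2 exposed

Write p_{E|D} for the probability that a case is exposed and p_{E|~D} for the probability that a control is exposed, with p_{~E|D} = 1 - p_{E|D} and p_{~E|~D} = 1 - p_{E|~D}.

Case exposed, 1 of the 2 controls exposed:
\[
P = p_{E|D} \binom{2}{1} p_{E|\sim D}\left(1 - p_{E|\sim D}\right)
\]
Case unexposed, both controls exposed:
\[
P = \left(1 - p_{E|D}\right) \binom{2}{2} p_{E|\sim D}^{2} \left(1 - p_{E|\sim D}\right)^{0}
\]
Therefore:
\[
P(\text{case exposed} \mid 2 \text{ total exposed})
= \frac{2\, p_{E|D}\, p_{E|\sim D}\left(1 - p_{E|\sim D}\right)}
       {2\, p_{E|D}\, p_{E|\sim D}\left(1 - p_{E|\sim D}\right) + \left(1 - p_{E|D}\right) p_{E|\sim D}^{2}}
\]
\[
= \frac{2\, p_{E|D}\, p_{\sim E|\sim D}}
       {2\, p_{E|D}\, p_{\sim E|\sim D} + p_{\sim E|D}\, p_{E|\sim D}}
= \frac{2\,\frac{p_{E|D}\, p_{\sim E|\sim D}}{p_{\sim E|D}\, p_{E|\sim D}}}
       {2\,\frac{p_{E|D}\, p_{\sim E|\sim D}}{p_{\sim E|D}\, p_{E|\sim D}} + 1}
= \frac{2\,OR}{2\,OR + 1}
\]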
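13 Tables with 1 exposed

Write p_{E|D} for the probability that a case is exposed and p_{E|~D} for the probability that a control is exposed, with p_{~E|D} = 1 - p_{E|D} and p_{~E|~D} = 1 - p_{E|~D}.

Case exposed, both controls unexposed:
\[
P = p_{E|D} \binom{2}{0} p_{E|\sim D}^{0} \left(1 - p_{E|\sim D}\right)^{2}
\]
Case unexposed, 1 of the 2 controls exposed:
\[
P = \left(1 - p_{E|D}\right) \binom{2}{1} p_{E|\sim D}\left(1 - p_{E|\sim D}\right)
\]
Therefore:
\[
P(\text{case exposed} \mid 1 \text{ total exposed})
= \frac{p_{E|D}\left(1 - p_{E|\sim D}\right)^{2}}
       {p_{E|D}\left(1 - p_{E|\sim D}\right)^{2} + 2\left(1 - p_{E|D}\right) p_{E|\sim D}\left(1 - p_{E|\sim D}\right)}
\]
\[
= \frac{p_{E|D}\, p_{\sim E|\sim D}}
       {p_{E|D}\, p_{\sim E|\sim D} + 2\, p_{\sim E|D}\, p_{E|\sim D}}
= \frac{\frac{p_{E|D}\, p_{\sim E|\sim D}}{p_{\sim E|D}\, p_{E|\sim D}}}
       {\frac{p_{E|D}\, p_{\sim E|\sim D}}{p_{\sim E|D}\, p_{E|\sim D}} + 2}
= \frac{OR}{OR + 2}
\]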
Summary
P(case exposed | 2 total exposed) = 2OR/(2OR+1)
P(case unexposed | 2 total exposed) = 1 - 2OR/(2OR+1)
P(case exposed | 1 total exposed) = OR/(OR+2)
P(case unexposed | 1 total exposed) = 1 - OR/(OR+2)
Therefore, we can make a likelihood equation for our data that is a function of the OR, and use MLE to solve for OR.
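Applying to example data
\[
P(\text{data} \mid OR)
= \left(\frac{2\,OR}{2\,OR+1}\right)^{2}
  \left(1 - \frac{2\,OR}{2\,OR+1}\right)^{0}
  \left(\frac{OR}{OR+2}\right)^{12}
  \left(1 - \frac{OR}{OR+2}\right)^{1}
\]
\[
= \left(\frac{2\,OR}{2\,OR+1}\right)^{2}
  \left(\frac{1}{2\,OR+1}\right)^{0}
  \left(\frac{OR}{OR+2}\right)^{12}
  \left(\frac{2}{OR+2}\right)^{1}
\]
A little complicated to solve further…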
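Although the likelihood is awkward to solve by hand, it is easy to maximize numerically. The sketch below is one possible approach, not the method used in the slides: it maximizes the log-likelihood with a golden-section search, which assumes the function has a single peak over the search range. All names and the search bounds (0.01 to 1000) are illustrative choices.

```python
from math import log


def log_likelihood(odds_ratio, case_exp_2=2, ctrl_exp_2=0, case_exp_1=12, ctrl_exp_1=1):
    """Conditional log-likelihood for 2:1 matched sets as a function of the OR.

    case_exp_2: sets with 2 total exposed where the case is exposed
    ctrl_exp_2: sets with 2 total exposed where both controls are exposed
    case_exp_1: sets with 1 total exposed where the case is exposed
    ctrl_exp_1: sets with 1 total exposed where a control is exposed
    """
    p2 = 2 * odds_ratio / (2 * odds_ratio + 1)   # P(case exposed | 2 total exposed)
    p1 = odds_ratio / (odds_ratio + 2)           # P(case exposed | 1 total exposed)
    return (case_exp_2 * log(p2) + ctrl_exp_2 * log(1 - p2)
            + case_exp_1 * log(p1) + ctrl_exp_1 * log(1 - p1))


def golden_section_max(f, low, high, tol=1e-6):
    """Locate the maximum of a single-peaked function on [low, high]."""
    ratio = (5 ** 0.5 - 1) / 2
    x1 = high - ratio * (high - low)
    x2 = low + ratio * (high - low)
    while high - low > tol:
        if f(x1) < f(x2):
            # The peak lies to the right of x1.
            low, x1 = x1, x2
            x2 = low + ratio * (high - low)
        else:
            # The peak lies to the left of x2.
            high, x2 = x2, x1
            x1 = high - ratio * (high - low)
    return (low + high) / 2


or_mle = golden_section_max(log_likelihood, 0.01, 1000.0)
print(or_mle)
```

For these data the maximum sits at an OR of roughly 25.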
Applying to example data
BD give a simpler, robust estimate of OR for 2:1 matching:
\[
\widehat{OR} =
\frac{1 \times (\#\text{ where 2 total exposed and case exposed}) + 2 \times (\#\text{ where 1 total exposed and case exposed})}
     {2 \times (\#\text{ where 2 total exposed and 2 controls exposed}) + 1 \times (\#\text{ where 1 total exposed and control exposed})}
= \frac{1(2) + 2(12)}{2(0) + 1(1)} = 26.0
\]
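The weighted-count estimate is simple enough to express as a one-line function. This is a minimal sketch with a hypothetical function name; the arguments are the four informative counts from the CEA example.

```python
def robust_or_2_to_1(case_exp_2, ctrl_exp_2, case_exp_1, ctrl_exp_1):
    """Weighted-count OR estimate for 2:1 matched sets.

    case_exp_2: sets with 2 total exposed where the case is exposed
    ctrl_exp_2: sets with 2 total exposed where both controls are exposed
    case_exp_1: sets with 1 total exposed where the case is exposed
    ctrl_exp_1: sets with 1 total exposed where a control is exposed
    """
    numerator = 1 * case_exp_2 + 2 * case_exp_1
    denominator = 2 * ctrl_exp_2 + 1 * ctrl_exp_1
    return numerator / denominator


print(robust_or_2_to_1(case_exp_2=2, ctrl_exp_2=0, case_exp_1=12, ctrl_exp_1=1))  # 26.0
```

The weights match what each table would contribute to the Mantel-Haenszel sums with T = 3 (1/3 or 2/3 per set), so the common factor of 3 cancels.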