Statistics for MPH: 5
11 October 2016
www.biostat.ku.dk/~pka/mph16
Confounding (Silva: 305-319, 327-331.)
Per Kragh Andersen
From the 4th week's statistics teaching:
– I should ideally
1. understand that a cohort study (in which everyone is followed for about the same length of time) lets us estimate risk, relative risk, odds and odds ratio
2. understand that a case-control study with “healthy” controls lets us estimate the odds ratio
From the 4th week's statistics teaching:
– I do not, however, necessarily need
1. to have understood how the confidence limits around the OR are obtained
Confounding
Do we always get a fair comparison between exposed and non-exposed?
Not necessarily: a randomly selected exposed person will (in the diagram below) typically be older than a randomly chosen non-exposed person.
This is a problem if age is a risk factor for the outcome.
[Diagram: the NON-EXPOSED and EXPOSED groups, each divided into Young and Old]
Confounding
A variable C is a potential confounder for the association
E → O
if it is
1. related to the exposure: E − C
2. an independent risk factor for the outcome: C → O
3. not a consequence of the exposure (not an “intermediate” variable): E → C → O
That is:
E − C
 ↘ ↙
  O
Confounding
Example: Age is a confounder for the study of mortality in California and Maine, since
1. the age distributions differ between the two states
2. age is a risk factor for mortality

         California             Maine
Age      Pop. in 1000     %     Pop. in 1000     %
< 15             5524    28              286    29
15-24            3558    18              168    17
25-34            2677    13              110    11
35-44            2359    12              109    11
45-54            2330    12              110    11
55-64            1704     9               94     9
65-74            1105     6               69     7
75+               696     3               46     5
Total           19953   100              992   100

Similarly when comparing stomach cancer rates between Cali and Birmingham.
Adjustment for confounding using stratification
Example (from Lauritzen, 1996)
Death sentences in 4863 murder cases in Florida, classified by colour of the alleged murderer.

            Sentence
Murderer    Death    Other    Total
White          72     2185     2257
Black          59     2547     2606
Total         131     4732     4863
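Risk of death sentence:
White murderer: $72/2257 = 3.2\%$
Black murderer: $59/2606 = 2.3\%$
$\mathrm{RR} = 3.2\% / 2.3\% = 1.41$ (computed from the unrounded risks)
$\mathrm{OR} = \frac{72 \cdot 2547}{59 \cdot 2185} = 1.42$ (white vs. black)
$\ln(\mathrm{OR}) = 0.352$
$L_1 = 0.352 - 1.96 \sqrt{\frac{1}{59} + \frac{1}{2547} + \frac{1}{72} + \frac{1}{2185}} = 0.0035$
$L_2 = 0.352 + 1.96 \sqrt{\frac{1}{59} + \frac{1}{2547} + \frac{1}{72} + \frac{1}{2185}} = 0.701$
95% conf. limits: from $\exp(L_1) = 1.00$ to $\exp(L_2) = 2.02$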
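The calculation above can be retraced in a few lines of Python. This is a minimal sketch, not part of the original slides; the function name `odds_ratio_ci` and its argument order are assumptions made for illustration.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with confidence limits for a single 2x2 table.

    a, b = outcome yes / no in the first group,
    c, d = outcome yes / no in the second group.
    (Hypothetical helper; the layout is an assumption.)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR): square root of the summed reciprocal cell counts
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Florida data: white murderers (72 death, 2185 other) vs. black (59, 2547)
or_hat, lower, upper = odds_ratio_ci(72, 2185, 59, 2547)
print(f"OR = {or_hat:.2f}, 95% limits from {lower:.2f} to {upper:.2f}")
```

The formula requires all four cells to be non-zero; a table with an empty cell (as in one of the strata later on) cannot be handled this way.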
Possible confounder: race of victim

                      Sentence
Victim    Murderer    Death    Other    Total
Black     White           0      111      111
          Black          11     2309     2320
          Total          11     2420     2431
White     White          72     2074     2146
          Black          48      238      286
          Total         120     2312     2432
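How do we see if “race of victim” is a confounder?

(1) Victim by murderer (is the confounder related to the exposure?)
          Murderer
Victim    white    black
white      2146      286
black       111     2320
OR = 156.8

(2) Victim by sentence (is the confounder a risk factor for the outcome?)
          Sentence
Victim    death    other
white       120     2312
black        11     2420
OR = 11.42

Separate analyses in strata defined by confounder (white murderer vs. black murderer):
Black victim: $\mathrm{OR} = \frac{0 \cdot 2309}{11 \cdot 111} = 0$
White victim: $\mathrm{OR} = \frac{72 \cdot 238}{48 \cdot 2074} = 0.172$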
Combined analysis over strata
(= stratified analysis)
(= the Mantel-Haenszel method)
We have a series (here two!) of two by two tables: one from each stratum.

             stratum 1              stratum k
             Outcome                Outcome
Exposed      a    b      · · ·      a    b
Unexposed    c    d      · · ·      c    d
               n                      n

In each stratum, we can estimate the odds ratio by
$\frac{a \cdot d}{b \cdot c} = \frac{a \cdot d / n}{b \cdot c / n}$
The Mantel-Haenszel estimator
A common odds ratio for all strata may be estimated by the Mantel-Haenszel estimator
$\mathrm{OR}_{MH} = \frac{\sum a \cdot d / n}{\sum b \cdot c / n}$,
where the sums are over the k strata. This is a weighted average of the separate OR-estimates.
In the example:
$\mathrm{OR}_{MH} = \frac{\frac{0 \cdot 2309}{2431} + \frac{72 \cdot 238}{2432}}{\frac{11 \cdot 111}{2431} + \frac{48 \cdot 2074}{2432}} = 0.170$
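As an illustration, the Mantel-Haenszel estimator can be coded directly from its definition. The sketch below is an assumption-laden example rather than the lecturer's own code: the function name `mantel_haenszel_or` and the tuple layout `(a, b, c, d)` are chosen here for convenience.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio.

    strata: list of (a, b, c, d) tuples, one 2x2 table per stratum,
    with a, b = outcome yes / no among exposed and
    c, d = outcome yes / no among unexposed (assumed layout).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Florida data, white vs. black murderer, stratified by race of victim
florida_strata = [
    (0, 111, 11, 2309),   # black victim
    (72, 2074, 48, 238),  # white victim
]
or_mh = mantel_haenszel_or(florida_strata)
print(f"OR_MH = {or_mh:.3f}")
```

Note that the stratum with a zero cell contributes without any problem, which is one practical advantage of summing before dividing.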
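The Mantel-Haenszel test
In each stratum, we can calculate:
OBServed $= a$
EXPected $= \frac{(a+b)(a+c)}{n} = E(a)$
SD $= \sqrt{\frac{(a+b)(c+d)(a+c)(b+d)}{n^2 (n-1)}} = SD(a)$
The combined Mantel-Haenszel test statistic is
$X^2_{MH} = \frac{\left(\sum a - \sum E(a)\right)^2}{\sum (SD(a))^2} \sim \chi^2_1$ under $H_0$
(where the sums are over the k strata). In the example:
$X^2_{MH} = \frac{\left((0 + 72) - \left(\frac{111 \cdot 11}{2431} + \frac{2146 \cdot 120}{2432}\right)\right)^2}{\frac{111 \cdot 11 \cdot 2320 \cdot 2420}{(2431)^2 \cdot 2430} + \frac{120 \cdot 2146 \cdot 286 \cdot 2312}{(2432)^2 \cdot 2431}} = \frac{(-34.39)^2}{12.32} = 96.0$, $P < 0.001$.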
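The test statistic can be computed in the same way. The following is a minimal sketch under stated assumptions: the name `mantel_haenszel_test` and the `(a, b, c, d)` layout are illustrative, and the P-value uses the identity that the upper tail of a chi-square distribution with 1 degree of freedom equals `erfc(sqrt(x/2))`.

```python
import math


def mantel_haenszel_test(strata):
    """Mantel-Haenszel chi-square test (1 df) for a series of 2x2 tables.

    strata: list of (a, b, c, d) tuples with a, b = exposed row and
    c, d = unexposed row (assumed layout).
    """
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        observed += a
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n**2 * (n - 1))
    x2 = (observed - expected) ** 2 / variance
    # P(chi-square with 1 df > x2), written with the complementary error function
    p_value = math.erfc(math.sqrt(x2 / 2))
    return x2, p_value


florida_strata = [
    (0, 111, 11, 2309),   # black victim
    (72, 2074, 48, 238),  # white victim
]
x2_mh, p_value = mantel_haenszel_test(florida_strata)
print(f"X2_MH = {x2_mh:.1f}, P = {p_value:.3g}")
```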
Interpretation:
$\mathrm{OR}_{MH}$ is an estimate of the association between exposure (colour of murderer) and outcome (use of death sentence), adjusted for the confounder (colour of victim).
$X^2_{MH}$ is a test statistic for no association between exposure and outcome, adjusted for the confounder.
Examination of confounding
Note that conditions 1. and 2., but NOT 3., can be checked (statistically) on the data at hand.
Degree of confounding is often examined using the change-in-estimate principle: how different are the adjusted and unadjusted (“marginal”) estimates?
Selection of confounders is often based on prior knowledge, and the structure of variables in the problem may be depicted in a ‘DAG’ (‘directed acyclic graph’).
Confidence limits for common odds ratio
1) Calculate $\ln(\mathrm{OR}_{MH})$, here: $\ln(0.170) = -1.772$
2) Calculate $L_1 = \ln(\mathrm{OR}_{MH}) - 1.96 \cdot SD$ and $L_2 = \ln(\mathrm{OR}_{MH}) + 1.96 \cdot SD$,
where $SD = \frac{|\ln(\mathrm{OR}_{MH})|}{\sqrt{X^2_{MH}}}$, here $SD = \frac{1.772}{\sqrt{96.0}} = 0.181$, that is:
$L_1 = -1.772 - 1.96 \cdot 0.181 = -2.126$,
$L_2 = -1.772 + 1.96 \cdot 0.181 = -1.418$
3) The desired 95% confidence limits are from $\exp(L_1) = 0.119$ to $\exp(L_2) = 0.242$.
Alternative (more complicated) formula for SD in Silva, p.327.
Exercise:
Analyse (using the Mantel-Haenszel method) the association between length of prenatal care and neonatal mortality based on the following data.
Is clinic a confounder for this association?
Mortality among newborns and amount of prenatal care (McNeil, 1996).

Clinic A    Prenatal care < 30 days
Dead        Yes     No    Total
Yes           3      4        7
No          176    293      469
Total       179    297      476

Clinic B    Prenatal care < 30 days
Dead        Yes     No    Total
Yes          17      2       19
No          197     23      220
Total       214     25      239
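Solution:
Clinic A: $\mathrm{OR} = \frac{3 \cdot 293}{4 \cdot 176} = 1.25$, Clinic B: $\mathrm{OR} = \frac{17 \cdot 23}{2 \cdot 197} = 0.99$.
MH-estimate
$\mathrm{OR}_{MH} = \frac{\frac{3 \cdot 293}{476} + \frac{17 \cdot 23}{239}}{\frac{4 \cdot 176}{476} + \frac{2 \cdot 197}{239}} = \frac{3.483}{3.128} = 1.113$
MH-test
$X^2_{MH} = \frac{\left((3 + 17) - \left(\frac{7 \cdot 179}{476} + \frac{19 \cdot 214}{239}\right)\right)^2}{\frac{7 \cdot 469 \cdot 179 \cdot 297}{476 \cdot 476 \cdot 475} + \frac{19 \cdot 220 \cdot 214 \cdot 25}{239 \cdot 239 \cdot 238}} = \frac{(20 - 19.64)^2}{1.621 + 1.644} = 0.040$, $P = 0.84$.
$\ln(\mathrm{OR}_{MH}) = \ln(1.113) = 0.108$, $SD(\ln \mathrm{OR}_{MH}) = \frac{0.108}{\sqrt{0.04}} = 0.540$
$L_1 = 0.108 - 1.96 \cdot 0.540 = -0.951$, $\exp(L_1) = 0.39$
$L_2 = 0.108 + 1.96 \cdot 0.540 = 1.16$, $\exp(L_2) = 3.21$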
The “marginal” 2 by 2 table ignoring clinic is

            Prenatal care < 30 days
Dead        Yes     No    Total
Yes          20      6       26
No          373    316      689
Total       393    322      715

The “marginal” OR ignoring clinic is $\frac{20 \cdot 316}{6 \cdot 373} = 2.82$, with 95% confidence limits from 1.10 to 7.25, quite different from the estimate adjusted for clinic (1.113).
This suggests that clinic is a confounder.
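The change-in-estimate comparison can be sketched as follows: collapse the clinic-specific tables into the marginal table, then set the marginal OR next to the Mantel-Haenszel OR. The helper names and the tuple layout (cells read row by row from the tables in the exercise) are assumptions for illustration.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table with cells read row by row."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio over a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Cells as laid out in the exercise: rows = dead yes / no,
# columns = prenatal care < 30 days yes / no
clinic_strata = [
    (3, 4, 176, 293),  # clinic A
    (17, 2, 197, 23),  # clinic B
]

# Marginal table: add the strata cell by cell, ignoring clinic
marginal_table = tuple(sum(cells) for cells in zip(*clinic_strata))

marginal_or = odds_ratio(*marginal_table)
adjusted_or = mantel_haenszel_or(clinic_strata)
print(f"marginal OR = {marginal_or:.2f}, adjusted OR_MH = {adjusted_or:.3f}")
```

How large a difference counts as important is a judgment call; the sketch only reports the two estimates.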
Most commonly used approach to confounder adjustment of OR
Logistic regression, see Silva, Section 14.6.
Using this method, it is also possible to estimate/test the effect of an exposure on the odds for an outcome, adjusted for other variables.
When is the stratified analysis sensible?
In the stratified analysis, we average the individual ORs from the separate strata.
This makes sense if the individual ORs are roughly the same in all strata (taking the random variation into account)
= if there is no interaction between exposure and stratification variable on the outcome
= if there is no effect-modification by the stratification variable of the relation between exposure and outcome
Tests for no interaction exist
but go beyond what we can cover in this course.
Example
Prevalence of myocardial infarction by systolic blood pressure and age (Israeli Ischemic Heart Disease Study, Kahn & Sempos, 1989).

Age ≥ 60      MI cases    MI negative    Total
SBP ≥ 140            9            115      124
SBP < 140            6             73       79
Total               15            188      203    OR = 0.95

Age < 60      MI cases    MI negative    Total
SBP ≥ 140           20            596      616
SBP < 140           21           1171     1192
Total               41           1767     1808    OR = 1.87

Interaction? (= Effect-modification?) Are the separate ORs, 0.95 and 1.87, different?
Can be tested using logistic regression or the “Breslow-Day” test (the answer is no).
Adjustment for confounding in cohort studies:
Stratified analysis of rates (and risks)
Male stomach cancer cases in Cali and Birmingham (Rate = rate per 100000 person-years):

         Cali                                              Birmingham
Age      Population  Person-years  No. of cancers   Rate   Population  Person-years  No. of cancers   Rate
0-44         524220       2621100              39    1.5      1683600       6734400              79    1.2
45-64         76304        381520             266   69.7       581500       2326000            1037   44.6
65+           22398        111990             315  281.3       291100       1164400            2352  202.0
Total        622922       3114610             620   19.9      2556200      10224800            3468   33.9

Crude RR $= \frac{19.9}{33.9} = 0.59$: not relevant due to age-confounding.
Age-adjustment?
Previously: standardisation.
Now: stratified (Mantel-Haenszel) analysis.
Mantel-Haenszel analysis
We have a series (here 3) of tables of the form:

                    Events    Person-years    Stratum no. 3
“Risk factor +”     a1        y1              a1 = 315,  y1 = 111990
“Risk factor −”     a2        y2              a2 = 2352, y2 = 1164400
Total               a         y               a = 2667,  y = 1276390

One table from each stratum.
The Mantel-Haenszel estimator
In each table, we may estimate the rate ratio by
$\frac{a_1 / y_1}{a_2 / y_2} = \frac{a_1 y_2}{a_2 y_1} = \frac{a_1 y_2 / y}{a_2 y_1 / y}$
A common rate ratio for all strata may be estimated by the Mantel-Haenszel estimator (sum over k strata; weighted average of separate RR-estimates):
$\mathrm{RR}_{MH} = \frac{\sum a_1 y_2 / y}{\sum a_2 y_1 / y}$.
In example:
$\mathrm{RR}_{MH} = \frac{\frac{39 \cdot 6734400}{9355500} + \frac{266 \cdot 2326000}{2707520} + \frac{315 \cdot 1164400}{1276390}}{\frac{79 \cdot 2621100}{9355500} + \frac{1037 \cdot 381520}{2707520} + \frac{2352 \cdot 111990}{1276390}} = 1.45$.
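The rate-ratio version of the estimator can be sketched in the same style as for odds ratios. The function name `mantel_haenszel_rr` and the tuple layout `(a1, y1, a2, y2)` are assumptions made for this illustration.

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel common rate ratio.

    strata: list of (a1, y1, a2, y2) tuples, where a1, y1 are events and
    person-years in the "risk factor +" group and a2, y2 those in the
    "risk factor -" group (assumed layout).
    """
    numerator = 0.0
    denominator = 0.0
    for a1, y1, a2, y2 in strata:
        y = y1 + y2
        numerator += a1 * y2 / y
        denominator += a2 * y1 / y
    return numerator / denominator


# Male stomach cancer, Cali ("+") vs. Birmingham ("-"), by age group
stomach_cancer = [
    (39, 2621100, 79, 6734400),     # 0-44
    (266, 381520, 1037, 2326000),   # 45-64
    (315, 111990, 2352, 1164400),   # 65+
]
rr_mh = mantel_haenszel_rr(stomach_cancer)
print(f"RR_MH = {rr_mh:.2f}")
```

The age-adjusted ratio lies above 1, whereas the crude ratio of 0.59 lay below 1: the adjustment reverses the direction of the comparison.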
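The Mantel-Haenszel test
In each stratum, we can calculate:
OBServed $= a_1$
EXPected $= y_1 \cdot \frac{a}{y} = y_1 \frac{a_1 + a_2}{y_1 + y_2} = E(a_1)$
Standard Deviation $= \sqrt{a \frac{y_1}{y} \frac{y_2}{y}} = \sqrt{(a_1 + a_2) \frac{y_1 y_2}{(y_1 + y_2)^2}} = SD(a_1)$
The combined Mantel-Haenszel test statistic is (sums over k strata):
$X^2_{MH} = \frac{\left(\sum a_1 - \sum E(a_1)\right)^2}{\sum (SD(a_1))^2} \sim \chi^2_1$ under $H_0$.
In the example:
$X^2_{MH} = \frac{\left((39 + 266 + 315) - \left(118 \cdot \frac{2621100}{9355500} + 1303 \cdot \frac{381520}{2707520} + 2667 \cdot \frac{111990}{1276390}\right)\right)^2}{118 \cdot \frac{2621100 \cdot 6734400}{(9355500)^2} + 1303 \cdot \frac{381520 \cdot 2326000}{(2707520)^2} + 2667 \cdot \frac{111990 \cdot 1164400}{(1276390)^2}} = \frac{(169.33)^2}{395.00} = 72.59 \sim \chi^2_1$, $P < 0.001$.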
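A corresponding sketch for the test with person-years denominators is given below. As before, the function name and the `(a1, y1, a2, y2)` layout are illustrative assumptions, and the P-value uses the identity that the chi-square (1 df) upper tail equals `erfc(sqrt(x/2))`.

```python
import math


def mantel_haenszel_rate_test(strata):
    """Mantel-Haenszel chi-square test (1 df) for stratified rates.

    strata: list of (a1, y1, a2, y2) tuples with events and person-years
    in the two exposure groups of each stratum (assumed layout).
    """
    observed = expected = variance = 0.0
    for a1, y1, a2, y2 in strata:
        a = a1 + a2
        y = y1 + y2
        observed += a1
        expected += a * y1 / y
        variance += a * y1 * y2 / y**2
    x2 = (observed - expected) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(x2 / 2))
    return x2, p_value


stomach_cancer = [
    (39, 2621100, 79, 6734400),     # 0-44
    (266, 381520, 1037, 2326000),   # 45-64
    (315, 111990, 2352, 1164400),   # 65+
]
x2_mh, p_value = mantel_haenszel_rate_test(stomach_cancer)
print(f"X2_MH = {x2_mh:.2f}, P = {p_value:.3g}")
```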
95% confidence limits for common rate ratio:
1) Calculate $\ln(\mathrm{RR}_{MH}) = \ln(1.45) = 0.373$
2) Calculate $L_1 = \ln(\mathrm{RR}_{MH}) - 1.96 \cdot SD$ and $L_2 = \ln(\mathrm{RR}_{MH}) + 1.96 \cdot SD$
Here: $SD = \frac{|\ln(\mathrm{RR}_{MH})|}{\sqrt{X^2_{MH}}} = \frac{0.373}{\sqrt{72.59}} = 0.044$.
That is: $L_1 = 0.373 - 1.96 \cdot 0.044 = 0.287$,
$L_2 = 0.373 + 1.96 \cdot 0.044 = 0.459$.
3) The desired 95% confidence limits are from $\exp(L_1) = 1.33$ to $\exp(L_2) = 1.58$.
Note: alternative (and more complicated) formula for SD in Silva, p.330.
Example:
British Doctors cohort study, comparing coronary deaths among smokers and non-smokers adjusted for age (McNeil, 1996)

             Deaths           Person-years
Age group    SM    Non-SM     SM        Non-SM      RR
35-44         32        2      52407     18790    5.74
45-54        104       12      43248     10673    2.14
55-64        206       28      28612      5710    1.47
65-74        186       28      12663      2585    1.36
75-84        102       31       5317      1462    0.90
Total        630      101     142247     39220    1.72

Crude RR $= \frac{630/142247}{101/39220} = 1.72$
Age-adjustment not obvious, RR varies among strata:
Effect-modification/interaction - yes!
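The stratum-specific and crude rate ratios in the table can be recomputed with a few lines of Python. This is only an illustrative sketch; the variable and function names are assumptions.

```python
def rate_ratio(a1, y1, a2, y2):
    """Rate ratio: (a1 events per y1 person-years) / (a2 per y2)."""
    return (a1 / y1) / (a2 / y2)


# (age group, deaths SM, person-years SM, deaths non-SM, person-years non-SM)
british_doctors = [
    ("35-44", 32, 52407, 2, 18790),
    ("45-54", 104, 43248, 12, 10673),
    ("55-64", 206, 28612, 28, 5710),
    ("65-74", 186, 12663, 28, 2585),
    ("75-84", 102, 5317, 31, 1462),
]

# One rate ratio per age stratum
stratum_rr = {
    age: rate_ratio(a1, y1, a2, y2) for age, a1, y1, a2, y2 in british_doctors
}

# Crude rate ratio: pool events and person-years over all age groups first
total_a1, total_y1, total_a2, total_y2 = (
    sum(row[i] for row in british_doctors) for i in range(1, 5)
)
crude_rr = rate_ratio(total_a1, total_y1, total_a2, total_y2)

for age, rr in stratum_rr.items():
    print(f"{age}: RR = {rr:.2f}")
print(f"crude: RR = {crude_rr:.2f}")
```

Listing the stratum-specific ratios side by side makes the steady decline with age visible, which is the reason a single common rate ratio is questionable here.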
If risks (“person-denominators”) are used instead of rates (“person-years-denominators”):

                 Events    No-events    Total
Risk factor +    a         c            a + c
Risk factor −    b         d            b + d

Exactly the same calculations may be used, replacing y1 by a + c and y2 by b + d.
See: Silva, p.329.
Exercise:
Consider the British Doctors Study, restricting attention to the two age groups 45-54 and 55-64, and compare smokers and non-smokers adjusting for age.
Data:

45-54     Deaths     PYRS        55-64     Deaths     PYRS
SM           104    43248        SM           206    28612
Non-SM        12    10673        Non-SM        28     5710
Total        116    53921        Total        234    34322
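Solution.
$\mathrm{RR}_{MH} = \frac{\frac{104 \cdot 10673}{53921} + \frac{206 \cdot 5710}{34322}}{\frac{12 \cdot 43248}{53921} + \frac{28 \cdot 28612}{34322}} = 1.66$
$X^2_{MH} = \frac{\left((104 + 206) - \left(116 \cdot \frac{43248}{53921} + 234 \cdot \frac{28612}{34322}\right)\right)^2}{116 \cdot \frac{43248 \cdot 10673}{(53921)^2} + 234 \cdot \frac{28612 \cdot 5710}{(34322)^2}} = 9.42 \sim \chi^2_1$, $P \approx 0.002$
$\ln(\mathrm{RR}_{MH}) = 0.509$, $SD(\ln(\mathrm{RR}_{MH})) = \frac{0.509}{\sqrt{9.42}} = 0.166$
$L_1 = 0.509 - 1.96 \times 0.166 = 0.184$, $L_2 = 0.509 + 1.96 \times 0.166 = 0.834$.
95% confidence interval from $\exp(L_1) = 1.20$ to $\exp(L_2) = 2.30$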
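The whole solution (estimate, test and test-based confidence limits) can be checked in one pass. The sketch below is an assumption-based illustration: the function name `mh_rate_analysis` and the `(a1, y1, a2, y2)` layout are hypothetical, and the confidence limits use the test-based SD described earlier.

```python
import math


def mh_rate_analysis(strata, z=1.96):
    """Mantel-Haenszel rate ratio, test statistic and test-based limits.

    strata: list of (a1, y1, a2, y2) tuples with events and person-years
    in the "risk factor +" and "risk factor -" groups (assumed layout).
    """
    numerator = denominator = 0.0
    observed = expected = variance = 0.0
    for a1, y1, a2, y2 in strata:
        a = a1 + a2
        y = y1 + y2
        numerator += a1 * y2 / y
        denominator += a2 * y1 / y
        observed += a1
        expected += a * y1 / y
        variance += a * y1 * y2 / y**2
    rr = numerator / denominator
    x2 = (observed - expected) ** 2 / variance
    # Test-based SD of ln(RR_MH)
    sd = abs(math.log(rr)) / math.sqrt(x2)
    lower = math.exp(math.log(rr) - z * sd)
    upper = math.exp(math.log(rr) + z * sd)
    return rr, x2, lower, upper


# British Doctors Study, smokers vs. non-smokers
exercise_strata = [
    (104, 43248, 12, 10673),  # age 45-54
    (206, 28612, 28, 5710),   # age 55-64
]
rr_mh, x2_mh, lower, upper = mh_rate_analysis(exercise_strata)
print(f"RR_MH = {rr_mh:.2f}, X2_MH = {x2_mh:.2f}, "
      f"95% limits from {lower:.2f} to {upper:.2f}")
```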