
Page 1: Statistik for MPH: 5

Statistik for MPH: 5

11 October 2016

www.biostat.ku.dk/~pka/mph16

Confounding (Silva: 305-319, 327-331.)

Per Kragh Andersen


Page 2: Statistik for MPH: 5

From the statistics teaching in week 4:

– I should

1. understand that from a cohort study (where everyone is followed for roughly the same length of time) one can estimate risk, relative risk, odds and odds ratio

2. understand that from a case-control study with “healthy” controls one can estimate the odds ratio

From the statistics teaching in week 4:

– on the other hand, I do not necessarily need

1. to have understood how the confidence limits around the OR are derived


Page 3: Statistik for MPH: 5

Confounding

Do we always get a fair comparison between exposed and non-exposed?

Not necessarily: a randomly selected exposed person will (in the diagram below) typically be older than a randomly chosen non-exposed person.

This is a problem if age is a risk factor for the outcome.

[Diagram: the NON-EXPOSED and EXPOSED groups, each split into young and old persons]


Page 4: Statistik for MPH: 5

Confounding

A variable C is a potential confounder for the association

E → O

if it is

1. related to the exposure: E − C

2. an independent risk factor for the outcome:

C → O

3. not a consequence of the exposure (not an “intermediate” variable on the path E → C → O)

That is:

E − C
 ↘ ↙
  O


Page 5: Statistik for MPH: 5

Confounding

Example: Age is a confounder for the study of mortality in California and Maine, since

1. the age distributions differ between the two states

2. age is a risk factor for mortality

                  California                  Maine
Age       Pop. in 1000     %    Pop. in 1000     %
< 15              5524    28             286    29
15-24             3558    18             168    17
25-34             2677    13             110    11
35-44             2359    12             109    11
45-54             2330    12             110    11
55-64             1704     9              94     9
65-74             1105     6              69     7
75+                696     3              46     5
Total            19953   100             992   100

The same problem arises when comparing stomach cancer rates between Cali and Birmingham.


Page 6: Statistik for MPH: 5

Adjustment for confounding using stratification

Example (from Lauritzen, 1996)

Death sentences in 4863 murder cases in Florida, partitioned by colour of the alleged murderer.

                     Sentence
Murderer     Death     Other     Total
White           72      2185      2257
Black           59      2547      2606
Total          131      4732      4863


Page 7: Statistik for MPH: 5

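Risk of death sentence:

White murderer: $72/2257 = 3.2\%$

Black murderer: $59/2606 = 2.3\%$

White vs. black murderer:

$$RR = \frac{72/2257}{59/2606} = 1.41, \qquad OR = \frac{72 \cdot 2547}{59 \cdot 2185} = 1.42$$

$\ln(OR) = 0.352$

$$L_1 = 0.352 - 1.96\sqrt{\frac{1}{59} + \frac{1}{2547} + \frac{1}{72} + \frac{1}{2185}} = 0.0035$$

$$L_2 = 0.352 + 1.96\sqrt{\frac{1}{59} + \frac{1}{2547} + \frac{1}{72} + \frac{1}{2185}} = 0.701$$

95% confidence limits: from $\exp(L_1) = 1.00$ to $\exp(L_2) = 2.02$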
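The calculation above can be checked with a few lines of Python. This is a minimal sketch: the function name `crude_2x2` and the cell layout (a, b = death/other for white murderers; c, d = death/other for black murderers) are our own choices, and the limits use the same log-OR standard error √(1/a + 1/b + 1/c + 1/d) as the slide.

```python
import math

def crude_2x2(a, b, c, d, z=1.96):
    """Crude analysis of one 2x2 table (hypothetical helper).

    a, b: outcome yes / no among the exposed
    c, d: outcome yes / no among the unexposed
    Returns (risk ratio, odds ratio, lower limit, upper limit), with the
    OR limits computed on the log scale.
    """
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return rr, odds_ratio, lower, upper

# Florida data: white vs. black murderer, death sentence vs. other
rr, or_, lo, hi = crude_2x2(72, 2185, 59, 2547)
print(f"RR = {rr:.2f}, OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Running it prints RR = 1.41, OR = 1.42 and limits 1.00 to 2.02, matching the hand calculation.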


Page 8: Statistik for MPH: 5

Possible confounder: race of victim

                              Sentence
Victim    Murderer     Death     Other     Total
Black     White            0       111       111
          Black           11      2309      2320
          Total           11      2420      2431
White     White           72      2074      2146
          Black           48       238       286
          Total          120      2312      2432


Page 9: Statistik for MPH: 5

How do we see if “race of victim” is a confounder?

(1) Is it related to the exposure? Victim by murderer:

                 Murderer
Victim        White     Black
White          2146       286
Black           111      2320

OR = 156.8

(2) Is it related to the outcome? Victim by sentence:

                 Sentence
Victim        Death     Other
White           120      2312
Black            11      2420

OR = 11.42

Separate analyses in strata defined by the confounder (white murderer vs. black murderer):

Black victim: $OR = \frac{0 \cdot 2309}{11 \cdot 111} = 0$

White victim: $OR = \frac{72 \cdot 238}{48 \cdot 2074} = 0.172$
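A short Python sketch of these checks, using a hypothetical helper `odds_ratio(a, b, c, d)` = a·d/(b·c) for a table with rows (a, b) and (c, d). The cell order for each table is our reading of the tables above.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio a*d / (b*c) for the 2x2 table with rows (a, b) and (c, d)."""
    return (a * d) / (b * c)

# (1) Victim's colour vs. exposure (murderer's colour)
#     rows: white / black victim; columns: white / black murderer
or_victim_murderer = odds_ratio(2146, 286, 111, 2320)

# (2) Victim's colour vs. outcome (sentence)
#     rows: white / black victim; columns: death / other
or_victim_sentence = odds_ratio(120, 2312, 11, 2420)

# Stratum-specific ORs: white vs. black murderer, death vs. other
or_black_victim = odds_ratio(0, 111, 11, 2309)
or_white_victim = odds_ratio(72, 2074, 48, 238)

print(round(or_victim_murderer, 1), round(or_victim_sentence, 2))
print(round(or_black_victim, 3), round(or_white_victim, 3))
```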


Page 10: Statistik for MPH: 5

Combined analysis over strata (= stratified analysis) (= the Mantel-Haenszel method)

We have a series (here two!) of 2 × 2 tables, one from each stratum:

               stratum 1                      stratum k
               Outcome                        Outcome
               Yes    No                      Yes    No
Exposed         a      b          · · ·        a      b
Unexposed       c      d          · · ·        c      d
                            n                             n

In each stratum, we can estimate the odds ratio by

$$\frac{a \cdot d}{b \cdot c} = \frac{a \cdot d / n}{b \cdot c / n}$$


Page 11: Statistik for MPH: 5

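The Mantel-Haenszel estimator

A common odds ratio for all strata may be estimated by the Mantel-Haenszel estimator

$$OR_{MH} = \frac{\sum a \cdot d / n}{\sum b \cdot c / n}$$

where the sums are over the k strata. This is a weighted average of the separate OR estimates (with weights $b \cdot c / n$).

In the example:

$$OR_{MH} = \frac{\frac{0 \cdot 2309}{2431} + \frac{72 \cdot 238}{2432}}{\frac{11 \cdot 111}{2431} + \frac{48 \cdot 2074}{2432}} = 0.170$$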
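The estimator translates directly into a sum over strata. A minimal Python sketch, assuming each stratum is entered as a tuple (a, b, c, d) in the layout of the generic table above; `mantel_haenszel_or` is a hypothetical name, not a library function.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio (hypothetical helper).

    tables: list of (a, b, c, d) tuples, one per stratum, where
    a, b = outcome yes / no among exposed and
    c, d = outcome yes / no among unexposed.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

# Strata: black victims, white victims (white vs. black murderer)
strata = [(0, 111, 11, 2309), (72, 2074, 48, 238)]
or_mh = mantel_haenszel_or(strata)
print(f"OR_MH = {or_mh:.3f}")
```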


Page 12: Statistik for MPH: 5

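The Mantel-Haenszel test

In each stratum, we can calculate:

OBServed $= a$

EXPected $= E(a) = \frac{(a+b)(a+c)}{n}$

$SD(a) = \sqrt{\frac{(a+b)(c+d)(a+c)(b+d)}{n^2(n-1)}}$

The combined Mantel-Haenszel test statistic is

$$X^2_{MH} = \frac{\left(\sum a - \sum E(a)\right)^2}{\sum SD(a)^2} \sim \chi^2_1 \text{ under } H_0$$

(where the sums are over the k strata). In the example:

$$X^2_{MH} = \frac{\left((0 + 72) - \left(\frac{111 \cdot 11}{2431} + \frac{2146 \cdot 120}{2432}\right)\right)^2}{\frac{111 \cdot 11 \cdot 2320 \cdot 2420}{2431^2 \cdot 2430} + \frac{120 \cdot 2146 \cdot 286 \cdot 2312}{2432^2 \cdot 2431}} = \frac{(-34.39)^2}{12.32} = 96.0, \quad P < 0.001.$$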
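The test statistic can be sketched the same way. The function below (hypothetical name `mantel_haenszel_chi2`) accumulates observed counts, expected counts and variances over the strata, with tables entered as (a, b, c, d) as before. As an addition not on the slide, it converts X²_MH to a P-value using the fact that for 1 degree of freedom P(χ² > x) = erfc(√(x/2)). No continuity correction is applied, matching the formula above.

```python
import math

def mantel_haenszel_chi2(tables):
    """Mantel-Haenszel chi-square (1 df, no continuity correction).

    tables: list of (a, b, c, d) per stratum, as in the generic table.
    Returns (chi2, p_value).
    """
    observed = expected = variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        observed += a
        expected += (a + b) * (a + c) / n                          # E(a)
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    chi2 = (observed - expected) ** 2 / variance
    # Upper tail of chi-square with 1 df: P(chi2_1 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

strata = [(0, 111, 11, 2309), (72, 2074, 48, 238)]
chi2, p = mantel_haenszel_chi2(strata)
print(f"X2_MH = {chi2:.1f}, P = {p:.2g}")
```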


Page 13: Statistik for MPH: 5

Interpretation: $OR_{MH}$ is an estimate of the association between exposure (colour of murderer) and outcome (use of the death sentence), adjusted for the confounder (colour of victim).

$X^2_{MH}$ is a test statistic for no association between exposure and outcome, adjusted for the confounder.

Examination of confounding

Note that conditions 1. and 2., but NOT 3., can be checked (statistically) on the data at hand.

The degree of confounding is often examined using the change-in-estimate principle: how much do the adjusted and unadjusted (“marginal”) estimates differ?

Selection of confounders is often based on prior knowledge, and the structure of the variables in the problem may be depicted in a ‘DAG’ (‘directed acyclic graph’).


Page 14: Statistik for MPH: 5

Confidence limits for common odds ratio

1) Calculate $\ln(OR_{MH})$, here: $\ln(0.170) = -1.772$

2) Calculate $L_1 = \ln(OR_{MH}) - 1.96 \cdot SD$ and $L_2 = \ln(OR_{MH}) + 1.96 \cdot SD$,

where $SD = \frac{|\ln(OR_{MH})|}{\sqrt{X^2_{MH}}}$,

here $SD = \frac{1.772}{\sqrt{96.0}} = 0.181$, that is: $L_1 = -1.772 - 1.96 \cdot 0.181 = -2.126$, $L_2 = -1.772 + 1.96 \cdot 0.181 = -1.418$

3) The desired 95% confidence limits are from $\exp(L_1) = 0.119$ to $\exp(L_2) = 0.242$.

Alternative (more complicated) formula for SD in Silva, p.327.


Page 15: Statistik for MPH: 5

Exercise: Analyse (using the Mantel-Haenszel method) the association between length of prenatal care and neonatal mortality based on the following data.

Is clinic a confounder for this association?

Mortality among newborns and amount of prenatal care (McNeil, 1996).

Clinic A        Prenatal care < 30 days
Dead              Yes       No     Total
Yes                 3        4         7
No                176      293       469
Total             179      297       476

Clinic B        Prenatal care < 30 days
Dead              Yes       No     Total
Yes                17        2        19
No                197       23       220
Total             214       25       239


Page 16: Statistik for MPH: 5

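Solution: Clinic A: $OR = \frac{3 \cdot 293}{4 \cdot 176} = 1.25$, Clinic B: $OR = \frac{17 \cdot 23}{2 \cdot 197} = 0.99$.

MH estimate

$$OR_{MH} = \frac{\frac{3 \cdot 293}{476} + \frac{17 \cdot 23}{239}}{\frac{4 \cdot 176}{476} + \frac{2 \cdot 197}{239}} = \frac{3.483}{3.128} = 1.113$$

MH test

$$X^2_{MH} = \frac{\left((3 + 17) - \left(\frac{7 \cdot 179}{476} + \frac{19 \cdot 214}{239}\right)\right)^2}{\frac{7 \cdot 469 \cdot 179 \cdot 297}{476 \cdot 476 \cdot 475} + \frac{19 \cdot 220 \cdot 214 \cdot 25}{239 \cdot 239 \cdot 238}} = \frac{(20 - 19.64)^2}{1.622 + 1.645} = 0.040, \quad P = 0.84.$$

$\ln(OR_{MH}) = \ln(1.113) = 0.108$, $SD(\ln OR_{MH}) = \frac{0.108}{\sqrt{0.040}} = 0.540$

$L_1 = 0.108 - 1.96 \cdot 0.540 = -0.950$, $\exp(L_1) = 0.39$

$L_2 = 0.108 + 1.96 \cdot 0.540 = 1.166$, $\exp(L_2) = 3.21$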
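The whole solution can be reproduced with one function that chains the three steps (estimate, test, test-based limits). This is a sketch only: `mh_analysis` is a hypothetical name, and the tables are re-entered with the exposure (prenatal care < 30 days) as rows, i.e. (dead, alive) for short care followed by (dead, alive) for longer care. Swapping rows and columns of a 2 × 2 table leaves both the OR and the MH test unchanged.

```python
import math

def mh_analysis(tables, z=1.96):
    """MH odds ratio, chi-square test (1 df) and test-based 95% limits.

    tables: list of (a, b, c, d) per stratum; a, b = outcome yes / no
    among exposed, c, d = outcome yes / no among unexposed.
    """
    num = den = obs = exp_a = var_a = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
        obs += a
        exp_a += (a + b) * (a + c) / n
        var_a += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    or_mh = num / den
    chi2 = (obs - exp_a) ** 2 / var_a
    p_value = math.erfc(math.sqrt(chi2 / 2))   # 1 df upper tail
    sd = abs(math.log(or_mh)) / math.sqrt(chi2)
    lower = math.exp(math.log(or_mh) - z * sd)
    upper = math.exp(math.log(or_mh) + z * sd)
    return or_mh, chi2, p_value, lower, upper

# Exposure: prenatal care < 30 days; outcome: neonatal death
clinic_a = (3, 176, 4, 293)
clinic_b = (17, 197, 2, 23)
or_mh, chi2, p, lower, upper = mh_analysis([clinic_a, clinic_b])
print(f"OR_MH = {or_mh:.3f}, X2 = {chi2:.3f}, P = {p:.2f}, CI {lower:.2f}-{upper:.2f}")
```

Without intermediate rounding the code gives X²_MH ≈ 0.039 and limits of roughly 0.38 to 3.26. The hand calculation rounds the intermediate values, which is why it arrives at 0.39 to 3.21.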


Page 17: Statistik for MPH: 5

The “marginal” 2 × 2 table ignoring clinic is

                Prenatal care < 30 days
Dead              Yes       No     Total
Yes                20        6        26
No                373      316       689
Total             393      322       715

The “marginal” OR ignoring clinic is $\frac{20 \cdot 316}{6 \cdot 373} = 2.82$, with 95% confidence limits from 1.10 to 7.25, quite different from the estimate adjusted for clinic (1.113).

This suggests that clinic is a confounder.
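The change-in-estimate comparison can be sketched as follows: collapse the two clinic tables into the marginal table by adding cells, then compare the crude OR with OR_MH. Expressing the difference as a percentage change is one common choice, not something prescribed by the slides. All names are hypothetical, and the tables are entered with the exposure as rows, as in the previous sketch.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio a*d / (b*c) for one 2x2 table."""
    return a * d / (b * c)

def mantel_haenszel_or(tables):
    """MH common OR over strata given as (a, b, c, d) tuples."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# (dead, alive) for care < 30 days, then (dead, alive) for longer care
clinic_a = (3, 176, 4, 293)
clinic_b = (17, 197, 2, 23)

# Ignoring clinic: add the strata cell by cell to get the marginal table
marginal = tuple(x + y for x, y in zip(clinic_a, clinic_b))
crude = odds_ratio(*marginal)
adjusted = mantel_haenszel_or([clinic_a, clinic_b])

# Relative change from the adjusted to the crude estimate, in percent
pct_change = 100 * (crude - adjusted) / adjusted
print(f"crude {crude:.2f}, adjusted {adjusted:.3f}, change {pct_change:.0f}%")
```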


Page 18: Statistik for MPH: 5

The most commonly used approach to confounder adjustment of the OR: logistic regression, see Silva, Section 14.6.

More on this later!

Using this method, it is also possible to estimate/test the effect of an exposure on the odds of an outcome, adjusted for other variables.


Page 19: Statistik for MPH: 5

When is the stratified analysis sensible?

In the stratified analysis, we average the individual ORs from the separate strata.

This makes sense if the individual ORs are roughly the same in all strata (taking the random variation into account),

= if there is no interaction between exposure and stratification variable on the outcome,

= if there is no effect modification by the stratification variable of the relation between exposure and outcome.


Page 20: Statistik for MPH: 5

Tests for no interaction exist but go beyond what we can cover in this course.

Example

Prevalence of myocardial infarction by systolic blood pressure and age (Israeli Ischemic Heart Disease Study, Kahn & Sempos, 1989).

Age ≥ 60      MI cases   MI negative   Total
SBP ≥ 140            9           115     124
SBP < 140            6            73      79
Total               15           188     203     OR = 0.95

Age < 60      MI cases   MI negative   Total
SBP ≥ 140           20           596     616
SBP < 140           21          1171    1192
Total               41          1767    1808     OR = 1.87

Interaction? (= Effect modification?) Are the separate ORs, 0.95 and 1.87, different?

Can be tested using logistic regression or the “Breslow-Day” test (the answer is no).
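As an illustration only (the test is outside the course), here is a minimal Python sketch of the classical Breslow-Day statistic without Tarone's correction, applied to the two MI strata. For each stratum it finds the expected top-left count A under the common OR_MH with all margins fixed, by solving a quadratic, and then sums (a − A)²/Var(a). The function names are hypothetical, and the P-value line assumes exactly two strata (K − 1 = 1 degree of freedom).

```python
import math

def mantel_haenszel_or(tables):
    """MH common OR; tables are (a, b, c, d) = exposed yes/no, unexposed yes/no."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

def breslow_day(tables):
    """Breslow-Day statistic (no Tarone correction), chi-square with K-1 df."""
    psi = mantel_haenszel_or(tables)
    stat = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        n1 = a + b  # exposed total
        m1 = a + c  # outcome-yes total
        # Expected A solves A * (n - n1 - m1 + A) = psi * (n1 - A) * (m1 - A),
        # i.e. qa*A^2 + qb*A + qc = 0 with the coefficients below
        qa = 1 - psi
        qb = (n - n1 - m1) + psi * (n1 + m1)
        qc = -psi * n1 * m1
        lo, hi = max(0, n1 + m1 - n), min(n1, m1)
        if abs(qa) < 1e-12:
            A = -qc / qb  # psi == 1: the equation is linear
        else:
            root = math.sqrt(qb * qb - 4 * qa * qc)
            candidates = ((-qb + root) / (2 * qa), (-qb - root) / (2 * qa))
            # Keep the root that gives non-negative expected cells
            A = next(r for r in candidates if lo - 1e-9 <= r <= hi + 1e-9)
        var = 1 / (1 / A + 1 / (n1 - A) + 1 / (m1 - A) + 1 / (n - n1 - m1 + A))
        stat += (a - A) ** 2 / var
    return stat

# Exposure SBP >= 140, outcome MI; strata: age >= 60, age < 60
mi_strata = [(9, 115, 6, 73), (20, 596, 21, 1171)]
bd_stat = breslow_day(mi_strata)
p_value = math.erfc(math.sqrt(bd_stat / 2))  # 1 df: valid for two strata only
print(f"Breslow-Day = {bd_stat:.2f}, P = {p_value:.2f}")
```

The statistic is small and the P-value well above 0.05, consistent with the answer "no" on the slide.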


Page 21: Statistik for MPH: 5

Adjustment for confounding in cohort studies: stratified analysis of rates (and risks)

Male stomach cancer cases in Cali and Birmingham:

                              Cali                                        Birmingham
Age     Population  Person-years  No. of   Rate per      Population  Person-years  No. of   Rate per
                                  cancers  100000 ys.                              cancers  100000 ys.
0-44        524220       2621100       39        1.5        1683600       6734400       79        1.2
45-64        76304        381520      266       69.7         581500       2326000     1037       44.6
65+          22398        111990      315      281.3         291100       1164400     2352      202.0
Total       622922       3114610      620       19.9        2556200      10224800     3468       33.9

Crude RR = 19.9/33.9 = 0.59, not relevant due to age confounding.

Age-adjustment?

Previously: standardisation.

Now: stratified (Mantel-Haenszel) analysis.


Page 22: Statistik for MPH: 5

Mantel-Haenszel analysis

We have a series (here 3) of tables of the form:

                   Events   Person-years     Stratum no. 3
“Risk factor +”      a1         y1           a1 = 315     y1 = 111990
“Risk factor −”      a2         y2           a2 = 2352    y2 = 1164400
Total                a          y            a = 2667     y = 1276390

One table from each stratum.


Page 23: Statistik for MPH: 5

The Mantel-Haenszel estimator

In each table, we may estimate the rate ratio by

$$\frac{a_1/y_1}{a_2/y_2} = \frac{a_1 y_2}{a_2 y_1} = \frac{a_1 y_2 / y}{a_2 y_1 / y}$$

A common rate ratio for all strata may be estimated by the Mantel-Haenszel estimator (sum over the k strata; a weighted average of the separate RR estimates):

$$RR_{MH} = \frac{\sum a_1 y_2 / y}{\sum a_2 y_1 / y}$$

In the example:

$$RR_{MH} = \frac{\frac{39 \cdot 6734400}{9355500} + \frac{266 \cdot 2326000}{2707520} + \frac{315 \cdot 1164400}{1276390}}{\frac{79 \cdot 2621100}{9355500} + \frac{1037 \cdot 381520}{2707520} + \frac{2352 \cdot 111990}{1276390}} = 1.45.$$
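A minimal Python sketch of RR_MH, assuming each stratum is entered as (a1, y1, a2, y2) with Cali as “risk factor +” and Birmingham as “risk factor −”; `mantel_haenszel_rr` is a hypothetical name.

```python
def mantel_haenszel_rr(strata):
    """MH rate ratio (hypothetical helper).

    strata: list of (a1, y1, a2, y2) tuples with events and person-years
    for "risk factor +" (a1, y1) and "risk factor -" (a2, y2).
    """
    num = sum(a1 * y2 / (y1 + y2) for a1, y1, a2, y2 in strata)
    den = sum(a2 * y1 / (y1 + y2) for a1, y1, a2, y2 in strata)
    return num / den

# Male stomach cancer: Cali ("+") vs. Birmingham ("-"), by age group
stomach = [
    (39, 2621100, 79, 6734400),      # 0-44
    (266, 381520, 1037, 2326000),    # 45-64
    (315, 111990, 2352, 1164400),    # 65+
]
rr_mh = mantel_haenszel_rr(stomach)
print(f"RR_MH = {rr_mh:.2f}")
```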


Page 24: Statistik for MPH: 5

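The Mantel-Haenszel test

In each stratum, we can calculate:

OBServed $= a_1$

EXPected $= E(a_1) = y_1 \cdot \frac{a}{y} = y_1 \frac{a_1 + a_2}{y_1 + y_2}$

Standard deviation $= SD(a_1) = \sqrt{a \frac{y_1}{y} \frac{y_2}{y}} = \sqrt{(a_1 + a_2) \frac{y_1 y_2}{(y_1 + y_2)^2}}$

The combined Mantel-Haenszel test statistic is (sums over the k strata):

$$X^2_{MH} = \frac{\left(\sum a_1 - \sum E(a_1)\right)^2}{\sum SD(a_1)^2} \sim \chi^2_1 \text{ under } H_0.$$

$$X^2_{MH} = \frac{\left((39 + 266 + 315) - \left(118 \frac{2621100}{9355500} + 1303 \frac{381520}{2707520} + 2667 \frac{111990}{1276390}\right)\right)^2}{118 \frac{2621100 \cdot 6734400}{9355500^2} + 1303 \frac{381520 \cdot 2326000}{2707520^2} + 2667 \frac{111990 \cdot 1164400}{1276390^2}} = \frac{(169.33)^2}{395.00} = 72.59 \sim \chi^2_1, \quad P < 0.001.$$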
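The rate version of the test follows the same pattern. A minimal Python sketch, assuming strata entered as (a1, y1, a2, y2); `mh_rate_test` is a hypothetical name, and the P-value uses erfc(√(X²/2)) for 1 degree of freedom.

```python
import math

def mh_rate_test(strata):
    """MH chi-square test (1 df) of a common rate ratio equal to 1.

    strata: list of (a1, y1, a2, y2) tuples, events and person-years
    for "risk factor +" (a1, y1) and "risk factor -" (a2, y2).
    """
    observed = expected = variance = 0.0
    for a1, y1, a2, y2 in strata:
        a, y = a1 + a2, y1 + y2
        observed += a1
        expected += y1 * a / y                 # E(a1)
        variance += a * (y1 / y) * (y2 / y)    # SD(a1) squared
    chi2 = (observed - expected) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail, 1 df
    return chi2, p_value

stomach = [
    (39, 2621100, 79, 6734400),      # 0-44
    (266, 381520, 1037, 2326000),    # 45-64
    (315, 111990, 2352, 1164400),    # 65+
]
chi2, p = mh_rate_test(stomach)
print(f"X2_MH = {chi2:.2f}, P = {p:.2g}")
```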


Page 25: Statistik for MPH: 5

95% confidence limits for common rate ratio

1) Calculate $\ln(RR_{MH}) = \ln(1.45) = 0.373$

2) Calculate $L_1 = \ln(RR_{MH}) - 1.96 \cdot SD$ and $L_2 = \ln(RR_{MH}) + 1.96 \cdot SD$

Here: $SD = \frac{|\ln(RR_{MH})|}{\sqrt{X^2_{MH}}} = \frac{0.373}{\sqrt{72.59}} = 0.044$.

That is: $L_1 = 0.373 - 1.96 \cdot 0.044 = 0.287$, $L_2 = 0.373 + 1.96 \cdot 0.044 = 0.459$.

3) The desired 95% confidence limits are from $\exp(L_1) = 1.33$ to $\exp(L_2) = 1.58$.

Note: alternative (and more complicated) formula for SD in Silva, p.330.


Page 26: Statistik for MPH: 5

Example: British Doctors cohort study, comparing coronary deaths among smokers and non-smokers, adjusted for age (McNeil, 1996)

                 Deaths              Person-years
Age group      SM     Non-SM        SM       Non-SM        RR
35-44          32          2     52407        18790      5.74
45-54         104         12     43248        10673      2.14
55-64         206         28     28612         5710      1.47
65-74         186         28     12663         2585      1.36
75-84         102         31      5317         1462      0.90
Total         630        101    142247        39220      1.72

Crude RR $= \frac{630/142247}{101/39220} = 1.72$

Age adjustment is not obvious, since the RR varies among strata: effect modification/interaction, yes!
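The heterogeneity is easy to see by recomputing the stratum-specific rate ratios together with the crude one. A minimal sketch with a hypothetical `rate_ratio` helper; the crude RR is obtained by adding deaths and person-years over the age groups.

```python
def rate_ratio(a1, y1, a2, y2):
    """Rate ratio (a1/y1) / (a2/y2) for exposed (1) vs. unexposed (2)."""
    return (a1 / y1) / (a2 / y2)

# British Doctors data from the table above:
# (deaths SM, person-years SM, deaths non-SM, person-years non-SM)
doctors = {
    "35-44": (32, 52407, 2, 18790),
    "45-54": (104, 43248, 12, 10673),
    "55-64": (206, 28612, 28, 5710),
    "65-74": (186, 12663, 28, 2585),
    "75-84": (102, 5317, 31, 1462),
}

stratum_rr = {age: rate_ratio(*row) for age, row in doctors.items()}
for age, rr in stratum_rr.items():
    print(f"{age}: RR = {rr:.2f}")

# Collapsing over age: add up each column, then compute the crude RR
totals = [sum(column) for column in zip(*doctors.values())]
crude_rr = rate_ratio(*totals)
print(f"Crude RR = {crude_rr:.2f}")
```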


Page 28: Statistik for MPH: 5

If risks (“person denominators”) are used instead of rates (“person-years denominators”):

                    Events   No events   Total
Risk factor +          a          c       a + c
Risk factor −          b          d       b + d

Exactly the same calculations may be used, replacing $a_1$ by $a$, $a_2$ by $b$, $y_1$ by $a + c$ and $y_2$ by $b + d$.

See: Silva, p.329.


Page 29: Statistik for MPH: 5

Exercise: Consider the British Doctors Study, restricting attention to the two age groups 45-54 and 55-64, and compare smokers and non-smokers adjusting for age.

Data:

                45-54                   55-64
            Deaths      PYRS        Deaths      PYRS
SM             104     43248           206     28612
Non-SM          12     10673            28      5710
Total          116     53921           234     34322


Page 30: Statistik for MPH: 5

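Solution.

$$RR_{MH} = \frac{\frac{104 \cdot 10673}{53921} + \frac{206 \cdot 5710}{34322}}{\frac{12 \cdot 43248}{53921} + \frac{28 \cdot 28612}{34322}} = 1.66$$

$$X^2_{MH} = \frac{\left((104 + 206) - \left(116 \cdot \frac{43248}{53921} + 234 \cdot \frac{28612}{34322}\right)\right)^2}{116 \cdot \frac{43248 \cdot 10673}{53921^2} + 234 \cdot \frac{28612 \cdot 5710}{34322^2}} = 9.42 \sim \chi^2_1, \quad P \approx 0.002$$

$\ln(RR_{MH}) = 0.509$, $SD(\ln(RR_{MH})) = \frac{0.509}{\sqrt{9.42}} = 0.166$

$L_1 = 0.509 - 1.96 \times 0.166 = 0.184$, $L_2 = 0.509 + 1.96 \times 0.166 = 0.834$.

95% confidence interval from $\exp(L_1) = 1.20$ to $\exp(L_2) = 2.30$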
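The complete solution (estimate, test and test-based confidence limits) can be reproduced with one function. This is a sketch only: `mh_rate_analysis` is a hypothetical name, the strata are entered as (deaths SM, PYRS SM, deaths non-SM, PYRS non-SM), and the P-value uses erfc(√(X²/2)) for 1 degree of freedom.

```python
import math

def mh_rate_analysis(strata, z=1.96):
    """MH rate ratio, chi-square test (1 df) and test-based 95% limits.

    strata: list of (a1, y1, a2, y2) tuples, events and person-years
    for the exposed (a1, y1) and the unexposed (a2, y2).
    """
    num = den = obs = expected = variance = 0.0
    for a1, y1, a2, y2 in strata:
        a, y = a1 + a2, y1 + y2
        num += a1 * y2 / y
        den += a2 * y1 / y
        obs += a1
        expected += a * y1 / y
        variance += a * y1 * y2 / y ** 2
    rr_mh = num / den
    chi2 = (obs - expected) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))
    sd = abs(math.log(rr_mh)) / math.sqrt(chi2)
    lower = math.exp(math.log(rr_mh) - z * sd)
    upper = math.exp(math.log(rr_mh) + z * sd)
    return rr_mh, chi2, p_value, lower, upper

# Age groups 45-54 and 55-64: smokers vs. non-smokers
strata = [(104, 43248, 12, 10673), (206, 28612, 28, 5710)]
rr_mh, chi2, p, lower, upper = mh_rate_analysis(strata)
print(f"RR_MH = {rr_mh:.2f}, X2 = {chi2:.2f}, P = {p:.3f}, CI {lower:.2f}-{upper:.2f}")
```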
