local model uncertainty and incomplete-data bias

1

Local model uncertainty and incomplete-data bias

S. Eguchi, ISM & GUAS

This talk is part of joint work with J. Copas, University of Warwick

2

Hidden Bias

Publication bias: not all studies are reviewed

Confounding: causal effect only partly explained

Measurement error: errors in the measure of exposure

3

Lung cancer & passive smoking

[Figure: odds ratio for each of the 30 studies; x-axis: odds ratio (0.3 to 10.0), y-axis: study (1 to 30)]

4

Passive smoke and lung cancer

Log relative risk estimates $\hat\theta_j$ ($j = 1, \ldots, 30$) from 30 2×2 tables

$$\hat\theta = \frac{\sum_j w_j \hat\theta_j}{\sum_j w_j} \qquad (w_j \text{ is the inverse variance weight})$$

The estimated relative risk is 1.24, with 95% confidence interval (1.13, 1.36)
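The pooled estimate on this slide is an inverse-variance weighted average of study-level log relative risks. A minimal sketch of that calculation follows. It assumes the usual fixed-effect standard error, the square root of 1 over the sum of weights, which the slide does not state. The three studies are made-up toy values, not the 30 passive-smoking studies, and the function names are hypothetical.

```python
import math

def pooled_log_rr(estimates, variances):
    """Inverse-variance weighted mean of log relative risks and its standard error."""
    weights = [1.0 / v for v in variances]           # w_j = 1 / var_j
    total = sum(weights)
    pooled = sum(w * t for w, t in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)                      # assumed fixed-effect standard error
    return pooled, se

def rr_with_ci(estimates, variances, z=1.96):
    """Back-transform the pooled log relative risk and its 95% interval."""
    pooled, se = pooled_log_rr(estimates, variances)
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)

# Toy inputs (illustrative only): log relative risks and their variances
toy_theta = [0.10, 0.30, 0.20]
toy_var = [0.04, 0.02, 0.01]

pooled, se = pooled_log_rr(toy_theta, toy_var)
rr, lo, hi = rr_with_ci(toy_theta, toy_var)
print(f"pooled RR = {rr:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Working on the log scale keeps the interval symmetric around the pooled log relative risk before it is exponentiated back to the ratio scale.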

Conventional analysis

[Figure: odds ratio for each of the 30 studies, with the pooled estimate 1.24 marked; x-axis: odds ratio (0.3 to 10.0), y-axis: study (1 to 30)]

6

Incomplete Data

z = (data on all studies, selection indicators)

y = (data on selected studies)

z = (response, treatment, potential confounders)

y = (response, treatment)

z = (disease status, true exposure, error)

y = (disease status, observed exposure)

y = h(z)
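The mapping y = h(z) can be made concrete with the publication-bias case, where z holds the data on all studies plus selection indicators and y keeps only the selected studies. The sketch below is a toy illustration under that reading. The function `h` and the numbers are hypothetical and only show that two different complete data sets can give the same observed data.

```python
def h(z):
    """Map complete data z = (values, selected) to the observed data y."""
    values, selected = z
    # Only the values of selected units survive; unselected values are lost.
    return [v for v, keep in zip(values, selected) if keep]

# Two complete data sets that differ only in the unselected third study
z1 = ([0.2, 0.5, -0.1], [True, True, False])
z2 = ([0.2, 0.5, 0.9], [True, True, False])

y1, y2 = h(z1), h(z2)
print(y1, y2)
```

Because both complete data sets lie in the same level set of h, nothing in y can distinguish them, which is the source of the hidden bias discussed in the talk.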

7

Level Sets of h(z)

1. One-to-one
2. Missing
3. Measurement error
4. Interval censor
5. Competing risk
6. Hidden confounder

8

Ignorable incompleteness

Let Y = h(Z) be a many-to-one mapping.

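If $Z$ has density $f_Z(z,\theta)$, then $Y$ has density

$$f_Y(y,\theta) = \int_{h^{-1}(y)} f_Z(z,\theta)\,dz$$

$Z$ is complete; $Y$ is incomplete

Data on $Z$ $\to$ MLE $\hat\theta_Z$

Data on $Y$ $\to$ MLE $\hat\theta_Y$

$f_Z$ is true $\Rightarrow$ $\mathrm{E}(\hat\theta_Y) = \mathrm{E}(\hat\theta_Z)$

$f_Z$ is wrong $\Rightarrow$ $\mathrm{E}(\hat\theta_Y) \ne \mathrm{E}(\hat\theta_Z)$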

9

Tubular Neighborhood

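[Figure: the model $M$ and the near-model $M_\varepsilon$ inside the tubular neighborhood $N_\varepsilon$]

Tubular neighborhood: $N_\varepsilon = \{ g_Y : \min_{\theta\in\Theta} \mathrm{KL}(f_Y(\cdot,\theta), g_Y) \le \tfrac{1}{2}\varepsilon^{2} \}$

Model: $M = \{ f_Y(y,\theta) : \theta\in\Theta \}$

Near-model: $M_\varepsilon = \{ g_Y(y,\theta,\varepsilon) : \theta\in\Theta \}$, with $\mathrm{KL}(M, M_\varepsilon) = \varepsilon^{2}/2$

Copas, Eguchi (2001)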

10

Mis-specification

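$$g_Z(z,\theta,\varepsilon) = f_Z(z,\theta)\exp\{\varepsilon\,u_Z(z,\theta)\}$$

$$\mathrm{E}_f(u_Z) = 0,\quad \mathrm{E}_f(u_Z^{2}) = 1,\quad \mathrm{E}_f(u_Z s_Z) = 0$$

$$\varepsilon = \{2\,\mathrm{KL}(f_Z, g_Z)\}^{1/2}$$

$u_Z$: "misspecification direction"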

11

Near model

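$$g_Y(y,\theta,\varepsilon) = f_Y(y,\theta)\exp\{\varepsilon\,u_Y(y,\theta)\}$$

where $u_Y(y,\theta) = \mathrm{E}[u_Z(z,\theta) \mid y, \theta]$

By $h: z \mapsto y = h(z)$

Model: $f_Z(z,\theta) \xrightarrow{\;h\;} f_Y(y,\theta)$

Near-model: $g_Z(z,\theta,\varepsilon) \xrightarrow{\;h\;} g_Y(y,\theta,\varepsilon)$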
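The passage from the complete-data near-model to the incomplete-data near-model can be checked numerically on a small discrete example. The sketch below rests on several assumptions that are not in the slides:

- a four-point support for z and a toy map h(z) = z // 2
- made-up probabilities
- a tilted density that is renormalised to sum to one, whereas the slide's exponential form is a first-order expression
- no score-orthogonality constraint, because the toy model has no parameter

It checks three things: that ε is close to the square root of twice the KL divergence, that the induced direction is the conditional expectation of u_Z given y, and that the divergence shrinks under h.

```python
import math

def standardise(f, v):
    """Centre and scale v so that E_f(u) = 0 and E_f(u^2) = 1."""
    mean = sum(p * x for p, x in zip(f, v))
    var = sum(p * (x - mean) ** 2 for p, x in zip(f, v))
    return [(x - mean) / math.sqrt(var) for x in v]

def tilt(f, u, eps):
    """Return g proportional to f * exp(eps * u), renormalised to sum to one."""
    raw = [p * math.exp(eps * ui) for p, ui in zip(f, u)]
    total = sum(raw)
    return [r / total for r in raw]

def kl(p, q):
    """Kullback-Leibler divergence KL(p, q) for strictly positive discrete p, q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def push_forward(p_z, h, n_y):
    """Distribution of y = h(z) when z has probabilities p_z."""
    p_y = [0.0] * n_y
    for z, p in enumerate(p_z):
        p_y[h(z)] += p
    return p_y

def h(z):
    return z // 2  # toy many-to-one map: {0, 1} -> 0 and {2, 3} -> 1

f_z = [0.1, 0.4, 0.3, 0.2]                      # toy complete-data model
u_z = standardise(f_z, [1.0, 0.0, 2.0, 0.5])    # toy misspecification direction
eps = 0.05
g_z = tilt(f_z, u_z, eps)

f_y = push_forward(f_z, h, 2)
g_y = push_forward(g_z, h, 2)

# Induced direction u_Y(y) = E_f[u_Z | y]
u_y = [sum(f_z[z] * u_z[z] for z in range(4) if h(z) == y) / f_y[y]
       for y in range(2)]

eps_from_kl = math.sqrt(2.0 * kl(f_z, g_z))     # should be close to eps
near_y = tilt(f_y, u_y, eps)                    # first-order form of g_Y

print(eps_from_kl, kl(f_y, g_y), kl(f_z, g_z))
```

The agreement between `g_y` and `near_y` is only first order in ε, so it degrades as ε grows.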

12

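Asymptotic bias

$$b = \mathrm{E}(\hat\theta_Y) - \mathrm{E}(\hat\theta_Z)$$

$$\max_{u_Z} \|b\|_{I_Y}^{2} = \varepsilon^{2}\,(1 - \lambda_{\min})$$

The bound is attained if and only if $u_Y(y,\theta) \propto s_Y(y,\theta)$.

$\lambda_{\min}$ is the smallest eigenvalue of the information loss $I_Y^{1/2} I_Z^{-1} I_Y^{1/2}$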

13

From pure misspecification

Unbiased perturbed $\xrightarrow{\;h\;}$ biased perturbed

14

The worst case

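Near-model $g_Y(y,\theta,\varepsilon,\omega)$; model $f_Y(y,\theta)$

If $u_Y(y,\theta,\omega) = \omega^{T} s_Y(y,\theta)$, then to first order $g_Y(y,\theta,\varepsilon,\omega) \approx f_Y(y,\theta+\varepsilon\omega)$ and

$$\theta^{*} + \varepsilon\omega = \arg\min_{\theta\in\Theta} \mathrm{KL}\big(g_Y(\cdot,\theta^{*},\varepsilon,\omega),\, f_Y(\cdot,\theta)\big)$$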

15

Nonignorable missingness

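$$f_Z(t, r; \theta) = f_T(t; \theta)\,\pi^{r}(1-\pi)^{1-r}$$

$z = (t, r)$, $y = (t_{(r)}, r)$, where $t_{(r)} = t \in \mathbb{R}^{p}$ if $r = 1$ and $t_{(r)}$ is missing if $r = 0$

$$\|b\|_{I_Y}^{2} \le \pi(1-\pi)^{2}\,\mathrm{var}_g\Big\{\log\frac{P_g(r=1 \mid t)}{P_g(r=0 \mid t)}\Big\}$$

The model ($f_Z$, $f_Y$) assumes MCAR or MAR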

16

Potential confounder

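$z = (t, x, c)$, $y = (t, x)$, where $t$, $x$, $c$ are the response, exposure, and confounder

$f_Y$: $t \mid x \sim N(\theta^{T}x, \sigma^{2})$

$f_Z$: $\mathrm{E}(t \mid x, c) = \theta^{T}x + \beta^{T}c$, $c \perp x$

$$\|b\|_{I_Y}^{2} \approx \mathrm{cor}^{2}(t, c \mid x)\,\mathrm{cor}^{2}(c, x)$$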

17

Problem in estimation of bias

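The nonignorable model

$$g_Y(y,\theta,\varepsilon) = f_Y(y,\theta)\exp\{\varepsilon\,u_Y(y,\theta) - \tfrac{1}{2}\varepsilon^{2}\sigma^{2}(\theta)\}$$

gives the worst case if $u_Y(y,\theta) = \omega^{T} s_Y(y,\theta)$.

However, $\omega$ is inestimable and untestable: the profile likelihood

$$PL(\varepsilon,\omega) = \max_{\theta\in\Theta} \sum_{i=1}^{n} \log\{g_Y(y_i,\theta,\varepsilon,\omega)\}$$

is flat at $\omega = 0$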

18

Heckman model for MNAR

$$g_{R \mid Y, X}(r = 1 \mid y, x) = \Phi\{\psi^{T}x + \omega\,\sigma^{-1}(y - \beta^{T}x)\}$$

$z = (t, x, r)$, $h(z) = (t_{(r)}, x, r)$

19

Sensitivity analysis

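The most sensitive model

$$g_Y(y,\theta,\omega) = f_Y(y,\theta)\exp\{\omega^{T}s_Y(y,\theta) - \tfrac{1}{2}\omega^{T} I_Y \omega\}$$

Estimating function of $\theta$ with fixed $\omega$

$$\frac{\partial}{\partial\theta}\log\{g_Y(y,\theta,\omega)\} = s_Y(y,\theta) + \frac{\partial s_Y(y,\theta)^{T}}{\partial\theta}\,\omega$$

$\hat\theta_Y$; $\{\hat\theta_{Y,\omega} : \omega^{T} I_Y \omega = \text{const.}\}$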

20

Scenarios A, B, C

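Inference from $y_1, \ldots, y_n$ using $f_Y$

$$C(k) = \{\theta : (\hat\theta_Y - \theta)^{T} I_Y (\hat\theta_Y - \theta) \le k r^{2}\}$$

Scenario A: $\varepsilon = 0$, $k_A = 1$

Scenario B: $\varepsilon \ge 0$, if we had observed $z_1, \ldots, z_n$ and had found $f_Z$ acceptable

Scenario C: $\varepsilon \ge 0$ unknown, $k_C \ge 1$

$k_A \le k_B \le k_C$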

21

Scenarios A and C

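$$C(k) = \{\theta : (\hat\theta_Y - \theta)^{T} I_Y (\hat\theta_Y - \theta) \le k r^{2}\}$$

Scenario A: $\varepsilon = 0$

$$I_Y^{1/2}(\hat\theta_Y - \theta) \sim N(0, I) \quad \text{under } f_Y$$

$k_A = 1$, $C(k_A)$ ?!

Scenario C: $\varepsilon \ge 0$ unknown

$$I_Y^{1/2}(\hat\theta_Y - \theta - b) \sim N(0, I) \quad \text{under } g_Y$$

$k_C \ge 1$, $C(k_C)$ ?!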

22

Scenario B

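If we could observe $z_1, \ldots, z_n$, we could have the MLE $\hat\theta_Z$ and

$$U = (I - \Lambda)^{-1/2} I_Y^{1/2}(\hat\theta_Y - \hat\theta_Z) \sim N(0, I) \quad \text{under } f_Y, \qquad \Lambda = I_Y^{1/2} I_Z^{-1} I_Y^{1/2}$$

$$S - S^{*}(U) \mid U \sim N(0, I) \quad \text{under } g_Y$$

$$S = I_Z^{1/2}(\hat\theta_Z - \theta) = I_Z^{1/2}\{(\hat\theta_Y - \theta) - (\hat\theta_Y - \hat\theta_Z)\}$$

Conditional confidence interval

$$C(u) = \{\theta : \|S^{*}(u)\|^{2} \le r^{2}\}$$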

23

Theorem

Let $C(k) = \{\theta : (\hat\theta_Y - \theta)^{T} I_Y (\hat\theta_Y - \theta) \le k r^{2}\}$.

Then $C(1) \subset \bigcup_{\|u\|^{2} \le r^{2}} C(u) \subset C(2)$.

[Figure: three panels, each with both axes running from -1.5 to 1.5]

24

Risk from passive smoke

25

Passive smoke and lung cancer

The estimated relative risk is 1.24, with 95% confidence interval (1.13, 1.36)

Square root rule: 95% confidence interval (1.08, 1.41)
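One way to read the square root rule is that the conventional interval is stretched by a factor of √2 around its centre, which corresponds to doubling the variance. The sketch below applies that reading on the log relative risk scale. Both the log scale and the use of the interval midpoint as the centre are assumptions, and `widen_ci` is a hypothetical helper. With the rounded inputs from the slide, the result comes out close to the quoted (1.08, 1.41), though not identical in the last digit.

```python
import math

def widen_ci(lower, upper, k=2.0):
    """Stretch a ratio-scale interval by sqrt(k) on the log scale, keeping its centre."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    centre = 0.5 * (log_lo + log_hi)
    half_width = 0.5 * (log_hi - log_lo) * math.sqrt(k)
    return math.exp(centre - half_width), math.exp(centre + half_width)

# Conventional interval quoted on the slide
wide_lo, wide_hi = widen_ci(1.13, 1.36)
print(f"root-2 interval: ({wide_lo:.3f}, {wide_hi:.3f})")
```

Setting `k=1.0` returns the conventional interval unchanged, so the same helper covers both cases.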

Root-2-rule

[Figure: odds ratio for each of the 30 studies, with the pooled estimate 1.24 marked; x-axis: odds ratio (0.3 to 10.0), y-axis: study (1 to 30)]

27

Present and Future

Does all this matter?

Statistics (missing data, response bias, censoring)

Biostatistics (drop-outs, compliance)

Epidemiology (confounding, measurement error)

Econometrics (identifiability, instruments)

Psychometrics (publication bias, SEM)

causality, counter-factuals, ...
