monte carlo maximum likelihood methods for estimating uncertainty arising from shared errors in...

Monte Carlo Maximum Likelihood Methods for Estimating Uncertainty

Arising from Shared Errors in Exposures in Epidemiological Studies

Daniel O. Stram

University of Southern California

Complex Dosimetry Systems: a Working Definition (my definition)

• A complex dosimetry system for the study of an environmental exposure is one in which no single best exposure estimate is provided

• Instead a distribution of possible true exposures is developed, together with a computer program that generates exposure replications from this distribution

– Generates doses conditional on input data

• Both shared and unshared errors are incorporated into the dose replications

• The statistician/epidemiologist treats this system as a “Black Box”, ie one that (s)he can manipulate, but doesn’t know (or care?) about its inner workings

• Some examples of epidemiological studies (of radiation) that use a complex dosimetry system to estimate doses

– Utah Thyroid Disease Cohort Study

– Hanford Thyroid Disease Study

– Colorado plateau Uranium Miners study

• In such studies limited or no direct measurements of individual dose exist. Instead a complex dose reconstruction (Utah, Hanford) or interpolation system (Colorado) is used to construct individual dose estimates or histories.

• Even when all subjects in the study have (radiation) badge measurements these may need adjustments to reflect temporal or geographical differences in monitoring technology

– Random errors and systematic biases exist for virtually any method

– Information about the size of random and systematic biases for each dosimeter type comes from only a few experiments

• Therefore there may considerable uncertainty in the systematic biases for any single dosimeter

– Systematic biases constitute shared error

Representation of uncertainty in complex dosimetry systems

• Uncertainty in the dose estimates produced by these systems is increasingly characterized using Monte-Carlo methods which yield many “realizations” of possible dose, rather than a single best estimate of dose for each subject.

• Part of the uncertainty of these estimates may be due to lack of knowledge of factors that influence simultaneously some or all the subjects’ doses

Dose estimation in the Hanford Thyroid Disease Study

• Reconstruction based on physical modeling and some measurements of

– Releases of I-131

– Deposition and pasture retention of I-131

– Pasture practices

– Milk transfer coefficients

– Individual consumption of milk

• Note that errors in most of these will affect doses for all individuals simultaneously

Colorado Plateau Underground Miners Study

• Dose estimates created using a complex exposure history / job history matrix

• PHS exposure history matrix consisted of interpolations of limited WLM measurements temporally and geographically

– Stram et al 1999 used a PHS developed hierarchy of mines within localities within districts and used a multilevel model to mimic temporal and geographical variation in dose.

• The 1999 analysis was based upon the ‘regression-substitution’ method in which E(true dose|all measurements) was computed for each mine-year, after fitting the lognormal multilevel model to the WLM measurements

• Errors in mine-year measurements are correlated by the interpolation system used, and many miners work in the same mines leading to correlated errors in the exposure history of each miner.

Pooled Nuclear Workers

• Multi-facility, multi-year study

– Each worker had badge measurements but the technologies changed through time and across facility.

• The systematic errors in each badge type are shared by all subjects working at the time the badge was in use

– For many but not all types of personal monitor some limited work (using phantoms, etc.) has been done to assess the relationship between true exposure and the badge measurement

• One important issue is whether the low dose-rate exposures of the N workers produce risks that are in line with those seen for the A-bomb

– upper confidence intervals that take account of shared dosimetry error needed

Monte-Carlo Dosimetry• Adopts a Bayesian framework

– Is Bayesian about sampling error in the experimental work (with badges), interpreted as giving posterior distributions

– Prior distributions for uncertain parameters (for N workers, the likely size of biases for unmeasured badges) using expert opinion

• For each replication the uncertain parameters are sampled from their distribution and combined with samples of other random factors (e.g. local meteorology for the Hanford or Utah studies) and with all relevant individual data for each subject (location, milk consumption, age, etc)

• Each set of random quantities is combined with individual data to form dose estimates for each individual

• Let us assume that the dose replications really may be regarded as samples from the distribution of true dose given all the individual data

– For retrospective dose-reconstruction systems this assumption may be a very large leap of faith

– For other studies using badge calibration (Workers) or measurement interpolation this may be considerably more solidly founded.

• Consider the sampling characteristics of frequentist inference concerning risk estimation. We want to know the influence of uncertainty on

– The power to detect an effect (of exposure on risk of disease) of a certain size

– Confidence limits on estimated risk parameters

An idealized dosimetry system• Assume each replication of dose is a

sample from the joint distribution

f(X1, X2,.., XN | W1, W2,…,WN)

of true dose given the “input data” Wi recorded for all subjects. Because many realizations, from f(X|W) are available we can calculate

Zi = E(Xi| W)

as the average over a very large number of realizations, Xri where Xr ~ f(X|W) r=1 …

How should an epidemiologist deal with the uncertainty in the random

replications of dose?

• We are interested in estimating parameters in the dose-response function for disease Di given true dose Xi , specifically the relationship

E(Di | Xi)

• parameterized by (dose response slope)

Simplifications of the disease model

• Assume a linear relation between D and X

E(Di | Xi) = a + b Xi (1)

• Linear models are of interest for at least two reasons

1. They may be important for radio-biological and radio protection reasons even for binary disease outcomes (where logistic regression models are the “standard”)

2. For “small” b it may be impossible to distinguish between linear and smooth nonlinear (e.g. logistic) dose response shapes

– A study with good power to detect a dose-response relationship may have very poor power to fully define the shape of the response

Berkson error models

• If the errors in the Z_i’s defined above are independent from one another then fitting model (1) is done by replacement of true X_i with Z_i.

• This is a Berkson error model in the sense that the truth is distributed around the measured value. Regression-substitution yields unbiased estimates.

• The classical error model has the measurement distributed around the truth. This produces risk estimates that are biased towards the null.

Impact of independent measurement error

• For either Berkson or Classical error models the most important effect of random error is loss of power to detect nonzero risk estimates

– If R2 is the squared correlation between true exposure X and measured exposure Z then it will take 1/R2 subjects to detect the same risk using Z as using true X.

Shared versus unshared dosimetry error

• A key distinction between the effects of shared versus unshared dosimetry error is their effect on the validity of sample variance estimates used to characterize the variability of the estimates

– Independent Berkson Errors: The “usual” estimate of the std error of the slope estimate remains valid despite the loss of power

– Independent Classical Errors: Again the “usual” estimate of the standard error of the slope estimates generally remains valid despite1. The loss of power2. The attenuation in the dose response parameter estimate

Dosimetry simplifications

• Adopt a generalization of the Berkson error model for the joint distribution of true Xi around its conditional mean Zi which incorporates both shared and unshared errors

SM is shared multiplicative error with mean 1

M,i is unshared multiplicative error with mean 1

SA is shared additive error with mean 0

A,i is unshared additive error with mean 0

SMi M i i SA AiX Z

• Under this shared and unshared multiplicative and additive (SUMA) error model we have E(X|W) = Z (the usual Berkson property) over the distribution of all four

• What happens when we fit

E(Di | Zi) = a* + b* Zi

If there are no measurement errors Var() = 0, we will have (for small values of b* )

(1)* ( )ˆ( )

( )

Var DVar b

NVar Z

Effects of shared and unshared errors on estimation

– We are interested in three questions regarding each error component in the SUMA model• What is its effect on study power?

• What is its effect on the validity of expression (1) for the variance of the estimate of b?

• How valid are the estimates of study power when they are based on expression (1)?

– Shared Additive error: has little effect on either the estimation of b or on the variability of the estimate

– Unshared Additive or Multiplicative errors: reduces the correlation, R, between Xi and Zi, thereby reducing study power, the reduction in study efficiency due to unshared measurement error is roughly proportional to R2

– however the validity of expression (1) for the variance of the estimator remains appropriate. Further the estimate of study power using (1) remains appropriate

• Averaging over the distribution of random SM we retain the “Berkson” property that

But with

2 2 ( )(2)

( )SM

Var Db

NVar Z

*ˆ( )E b b

*ˆ{ ( | , )}Sm SAE Var b

Effect of multiplicative shared error

* *ˆ ˆ( ) { ( ) | }SMVar b Var E b

• Notice that if b = 0 that the “naïve” estimate of the variance of

ignoring the shared error is equal to

the true variance of this parameter

• If |b| > 0, the naïve estimate of the variance is biased downward by

*b̂

2 2SMb

• We conclude

1. Ignoring shared error does not affect the validity of the test of the null hypothesis that b=0, because expression (2) = expression (1) when b=0

– More generally: non-differential ME weakens the power, but doesn’t invalidate the validity, of a test of association between disease and exposure

2. Ignoring shared error will overstate the power to detect a b>0, because (1) < (2) in this case

– Ignoring shared error will result in confidence limits that are too narrow

– However it is the upper confidence limit that is most affected.

• If the lower confidence limit ignoring shared error does not include zero, correcting for shared error will not cause it to include zero (because of conclusion 1)

How to incorporate shared ME directly into an analysis

• Multiple imputation

• “Full Parametric Bootstrap”

• Likelihood analysis with MCML

Multiple Imputation• It is tempting to try to quantify the uncertainty in

by regressing Di on each set of Xr and using the quantiles of the resulting as confidence limits for b

– This ignores the sampling variability of D

– Moreover the distribution of the slope estimates can be badly biased towards the null value. Essentially there is a reintroduction of classical error into the problem

• True multiple imputation requires sampling Xr from the distribution of X given both the input data W and the outcomes Di (not just W) to remove these biases

b̂ˆrb

“Full Parametric Bootstrap”

• A simulation experiment in which is used as the true value of the risk parameter and both doses and outcomes Di are simulated from a complete model

b̂

Monte-Carlo maximum likelihood• We can compute likelihood ratio tests as

follows

– For null a0 and b0 generate n samples of Xr from the distribution of X given W and D

– For any test values a and b compute the log likelihood ratio as

– If we use b0 = 0 then we don’t have to condition on D (so that we can use the dosimetry system directly)

0 0

( | ; , )1( , ) log )

( | ; , )r

nr

f a bl a b

n f a b

D X

D X

Once we compute the likelihood what do we do with it?

• We have a funny mishmash; we are being

– Bayesian about the doses

– Frequentist about the dose-response parameter

• Moreover we can’t really expect standard Frequentist asymptotic likelihood theory to hold

– Suppose the number of subjects ∞ then the distribution of will be dominated by the distribution of the shared multiplicative errors in the dosimetry system the distribution of which is arbitrary.

– Is it still reasonable to use chi-square approximations to the distribution of changes in log likelihood?

b̂

Other problems

• If shared multiplicative error is large then as |b-b0|

gets large the summands in (5)

become extremely variable

– Convergence of the average is incredibly slow

– Round-off error dominates the performance of the algorithm

0 0

( | ; , )

( | ; , )r

r

f a b

f a b

D X

D X

Application to the ORNL N-workers dataStayner et al in review

• Estimate a single risk parameter using (time dependent) total dose in a partial likelihood analysis

• Write a computer program that simulates the bias factors for the badges used in those facilities and re-links the risk sets

Three analyses

1. Compute the MCML Likelihood

– For each replication of doses compute the partial likelihoods over a 1-dimensional grid of risk parameters

– Average the partial likelihoods over the replications

– Pretend that the asymptotics still hold and compute a confidence interval

2. Compute FPB estimates of

• Compare these to the MCML confidence intervals

3. For each set of D computed in 2 compute a separate MCML confidence interval (more simulations from the dose distribution)

• Count the number of times that the standard frequentist confidence interval contains the true value of the risk parameter

b̂

0 2 4 6 8 10 12

-3-2

-10

Log Likelihood

beta

y

Uncorrected log likelihood

MCML log likelihood

Betahat_r

bhatx

Fre

qu

en

cy

-5 0 5 10 15

05

01

00

15

0

7.8% < LCI

6.8% < 0

3.3% > UCI

FPB simulations

Some observations

• The MCML widens the confidence interval on the high side more than the low side

– The 90 percent “asymptotic” lower CI for the MCML does not include 0.

• This is good because (1): the uncorrected CI did not include 0; and (2) we claim that correcting for measurement error shouldn’t affect the significance of a test of no association.

• Note that the two curves (corrected and MCML log likelihoods) are very close to parallel at b=0

– This implies that a score test of beta=0 wil be (nearly) identical using the corrected and uncorrected likelihoods using any significance criterion

• This observed result follows from Tosteson and Tsiatis 1988 on score tests for EIV problems

• The FPB on the other hand puts “significantly” more than 5 percent of the estimates < 0 (68 of 1,000) and significantly fewer of the estimates (33 of 1,000) above the MCML UCI.

• This may actually be a promising observation for the validity of the MCML confidence intervals

– They tend to be skewed to the right (not symmetric around the MLE) so more (than 3.3 percent) of the upper confidence limits and fewer (than 6.8 percent) of the lower confidence intervals should fail to contain the true value

– Simulations are now in progress

Validity of MCML CI

b̂

• Consider limiting case when n->∞ the slope estimate will be determined by the distribution of shared multiplicative errors

• Worst case would be the SUMA model

– Suppose that SM is distributed as log normal with arithmetic mean 1 (log mean -1/2 2)

– Then = b / SM is also distributed as log normal with mean parameter log(b) -1/2 2 and log variance 2

– Consider twice the change in log likelihood from true b to MLE

– This will be 1/2[log( )-(log(b)-1/2 2)]2 which is exactly 2

• Consider next a normal distribution for the shared multiplicative error

– Would make sense if the SM was itself a sum of many components

b̂

• For this model twice the change in log likelihood is of form

-2 log(|SM|) + 1/(2)(SM-1)2 + c

Where c = 2 log(1/2+1/2(1+4 2))

- [1/2(1+4 2)-1/2]2 / 2

• What is the distribution of this random variable?

– How close is it to a Chi Square w 1 df?

0.4 0.6 0.8 1.0

0.4

0.6

0.8

1.0

Change in likelihood,Std Dev=.1

Chisq CDF

Em

piric

al C

DF

0.4 0.6 0.8 1.0

0.4

0.6

0.8

1.0


Chisq CDF

Em

piric

al C

DF

0.4 0.6 0.8 1.0

0.4

0.6

0.8

1.0


Chisq CDF

Em

piric

al C

DF

0.4 0.6 0.8 1.0

0.2

0.4

0.6

0.8

1.0


Chisq CDF

Em

piric

al C

DF

Conclusions

• The MCML has promise but it is complicated

– But other methods (multiple imputation, etc, have complications of their own)

– Score tests of b=0 based on the average likelihood agree with analyses that ignore measurement errors

• Our application of the MCML method for partial likelihoods ignores the dilution effects described by Prentice (Biometrika 1982): but these are expected to be very small in most settings

• In shared error settings the asymptotics are not “correct” for ordinary frequentist calculations, but it seems to be hard to come up with situations where they fail drastically

Acknowledgements

• Leslie Stayner, Stephen Gilbert UIC/NIOSH

• Elisabeth Cardis, Martine Vrijheid, Isabelle Deltour, IARC

• Geoffrey Howe, Columbia

• Terri Kang, USC

monte carlo maximum likelihood methods for estimating uncertainty arising from shared errors in...

Documents

dose replications

individual dose estimates

dose estimation

wlm measurements errors

single best estimate

year measurements

realizations of possible

measurements of releases