STATISTICS IN MEDICINE, VOL. 16, 203–213 (1997)
SOME COMMENTS ON MISSPECIFICATION OF PRIORS IN BAYESIAN MODELLING OF MEASUREMENT ERROR PROBLEMS
SYLVIA RICHARDSON AND LAURENT LEBLOND
Institut National de la Santé et de la Recherche Médicale-U.170, 16 Avenue Paul Vaillant-Couturier, 94807 VILLEJUIF Cedex 07, France
SUMMARY
In this paper we discuss some aspects of misspecification of prior distributions in the context of Bayesian modelling of measurement error problems. A Bayesian approach to the treatment of common measurement error situations encountered in epidemiology has recently been proposed. Its implementation involves, first, the structural specification, through conditional independence relationships, of three submodels (a measurement model, an exposure model and a disease model) and, secondly, the choice of functional forms for the distributions involved in the submodels. We present some results indicating how the estimation of the regression parameters of interest, which is carried out using Gibbs sampling, can be influenced by a misspecification of the parametric shape of the prior distribution of exposure.
1. INTRODUCTION
Imperfectly observed covariate data are a common problem in many epidemiological studies. The fields of nutritional, environmental and occupational epidemiology abound with examples where surrogates, for example a food frequency questionnaire or a job title, are used to measure in a coarse way the risk factor of particular interest.¹ Consequently, the development of methods for correcting the effects of measurement errors has been an active area of research,²⁻⁷ particularly in the epidemiological context.⁸⁻¹⁴
The statistical treatment of measurement error problems can be seen as part of the wider framework of incomplete data problems.¹⁵ In that area, Bayesian modelling has made substantial contributions, as it proposes a unified treatment of all unobservables. This encompasses covariates that are missing or unknown, whether through the sampling protocol or because only surrogates are recorded, as well as, more broadly, grouped or censored data, latent variables or unobserved stages in a longitudinal process.¹⁶⁻¹⁸ We use the term 'missing covariate' to denote both missing and imperfectly measured covariates. In a Bayesian framework, all unobservables are considered as random quantities which are assigned a prior distribution expressing, in probabilistic terms, uncertainty or degrees of belief about their values. Statistical parameters are also treated as unobservables. Inference about the (few) parameters of interest is then based on their marginal posterior distribution, resulting from an integration with respect to the missing data. By explicitly introducing the probabilistic structure of the missing data into the model, one ensures that the uncertainty derived from the missing data is correctly propagated onto the estimation of the parameters of interest. Thus, even though missing covariates and parameters are formally treated symmetrically in the model building step, when it comes to inference the situation is no longer symmetric, as one is concerned with integrating over the structure of the missing data.
CCC 0277-6715/97/020203-11 © 1997 by John Wiley & Sons, Ltd.
The treatment of these problems by stochastic algorithms of the Markov chain Monte Carlo (MCMC) family¹⁹ is particularly well suited to this integration. Indeed, the missing data structure is fully exploited by this approach, which provides dependent, approximate samples from the joint posterior distribution of all unobservables given the data. From this joint posterior, any marginal posterior distribution of interest is immediately retrievable: one simply retains the sampled values of the quantity concerned.
In the measurement error problem, the key steps are:
(i) the specification of a measurement model linking surrogate measures Z and true covariates X;
(ii) the specification of an exposure model (prior distribution for X), also sometimes referred to as structural modelling in the literature.
The aim of this note is to present some results indicating how the estimation of the regression parameters of interest can be influenced by misspecification of the parametric form of the prior distribution of X. Clearly, regression parameters may be influenced by other types of misspecification, in particular of the measurement model. In this paper we restrict ourselves to misspecification of the prior distribution of X, as it is a key element of the Bayesian formulation for which there is often only weak prior information. Let us note that other approaches to measurement error problems, in particular along semi-parametric lines,³,⁵,⁶,²⁵ specifically try to avoid making assumptions about the distribution of X.
We start by summarizing a Bayesian approach to measurement error problems in epidemiology, based on conditional independence relationships, which has been outlined in a series of papers.¹⁷,²⁰,²¹ Using simulated data sets, this approach has been shown to deal successfully with a wide range of measurement error problems. Related approaches have been discussed by several authors.²²⁻²⁵
2. CONDITIONAL INDEPENDENCE MODELLING OF MEASUREMENT ERROR PROBLEMS
The structure of the measurement error problem in epidemiology can be formulated as follows. Risk factors (covariates) for each individual are to be related to the disease status (response variable) Y of that individual. However, for many or all individuals, while some risk factors C are known exactly, other risk factors X are unknown, although one or several surrogate measures Z of X are recorded. To model this situation we distinguish three submodels (following the terminology of Clayton²⁶):
(i) a disease model, which expresses the relationship between the risk factors C and X and the disease status Y;
(ii) a measurement model, which expresses the relationship between the surrogate measures Z and the true unknown risk factor X;
(iii) an exposure model, which specifies the distribution of the unknown risk factor X in the general population.
The underlying structure of these three submodels can be fundamentally characterized by expressing the following conditional independence assumptions:
disease model: $[Y_i \mid X_i, C_i, \beta]$ (1)
measurement model: $[Z_i \mid X_i, \lambda]$ (2)
exposure model: $[X_i \mid \pi]$ (3)
where the index i denotes individual i, $\beta$, $\lambda$ and $\pi$ denote model parameters, and $[U \mid V]$ generically denotes the conditional distribution of U given V. The variables in (1), (2) and (3) can be scalar or vector. Since we are in a Bayesian framework, prior distributions for $\beta$, $\lambda$ and $\pi$ are also required (denoted, respectively, by $[\beta]$, $[\lambda]$ and $[\pi]$). Equations (1), (2) and (3) are called model conditional distributions (model conditionals for short).
We also make additional conditional independence assumptions (the directed Markov assumption), which specify that the joint distribution of all the variables can be written as the product of all the model conditionals:
$$[\beta]\,[\lambda]\,[\pi] \prod_i [X_i \mid \pi] \prod_i [Z_i \mid X_i, \lambda] \prod_i [Y_i \mid X_i, C_i, \beta]. \qquad (4)$$
Equation (1) states that we place ourselves in the classical case where, conditionally on the true exposure being known, the surrogate measures $Z_i$ do not add any information on the disease status, a hypothesis also referred to in the epidemiological literature as 'non-differential error'. Equation (2) states that, conditionally on appropriately defined parameters $\lambda$ and the true exposure $X_i$, the surrogate measures $Z_i$ are independent amongst individuals. Equation (3) models the population distribution of the unknown risk factors amongst individuals in terms of parameters $\pi$.
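As a minimal sketch of how the factorization (4) is used in computation, the function below evaluates the logarithm of its data-dependent part (the model conditionals, with the prior terms $[\beta][\lambda][\pi]$ left out) as a sum of log model conditionals over individuals. The specific forms are assumptions for illustration only, borrowed from the simulation set-up of Section 3: a univariate normal exposure model with mean `mu_x` and variance `var_x`, a normal measurement model with precision `theta`, and a logistic disease model. All function names are hypothetical.

```python
import math


def softplus(t):
    """log(1 + exp(t)), computed without overflow."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def bernoulli_logit_logpmf(y, eta):
    """log P(Y = y) when Y ~ Bernoulli(expit(eta)); y must be 0 or 1."""
    return -softplus(-eta) if y == 1 else -softplus(eta)


def log_model_conditionals(X, Z, C, Y, beta, theta, mu_x, var_x):
    """Sum over i of log [X_i | pi] + log [Z_i | X_i, lambda] + log [Y_i | X_i, C_i, beta].

    Illustrative forms (assumptions, not the general model):
      exposure     X_i ~ N(mu_x, var_x)                       (pi = (mu_x, var_x))
      measurement  Z_i ~ N(X_i, 1/theta)                      (lambda = theta)
      disease      Y_i ~ Bernoulli(expit(b0 + b1 X_i + b2 C_i))
    """
    b0, b1, b2 = beta
    total = 0.0
    for x, z, c, y in zip(X, Z, C, Y):
        total += normal_logpdf(x, mu_x, var_x)                    # exposure model (3)
        total += normal_logpdf(z, x, 1.0 / theta)                 # measurement model (2)
        total += bernoulli_logit_logpmf(y, b0 + b1 * x + b2 * c)  # disease model (1)
    return total
```

Because the joint density factorizes, its logarithm is additive over individuals and over submodels; this is what makes each full conditional used by the Gibbs sampler depend only on a few terms.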
By specifying the conditional distribution of the surrogate Z given the true exposure X as in equation (2), we place ourselves in the Bayesian analogue of what is traditionally referred to as the 'classical error model', where the measurement error is independent of X. Another type of error model which has been considered in the literature⁸,¹⁴ is the Berkson error model, where equation (2) is replaced by $[X_i \mid Z_i, \lambda]$. With the Berkson error model, usually no model is specified for the marginal distribution of Z, and so the formulation of the measurement model implicitly conditions on the data.
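The practical difference between the two error structures can be seen in a small simulation, sketched below under assumed values (mean 2 and standard deviation 2 for whichever variable is drawn first, unit error standard deviation; the function names are hypothetical). Under classical error the surrogate is more variable than the true exposure, whereas under Berkson error the true exposure is more variable than the recorded value.

```python
import random
import statistics


def simulate_classical(n, rng, mean=2.0, sd_x=2.0, sd_err=1.0):
    """Classical error: draw the true X first, then Z = X + U with U independent of X."""
    xs = [rng.gauss(mean, sd_x) for _ in range(n)]
    zs = [x + rng.gauss(0.0, sd_err) for x in xs]
    return xs, zs


def simulate_berkson(n, rng, mean=2.0, sd_z=2.0, sd_err=1.0):
    """Berkson error: fix Z first (e.g. an assigned or group-level value),
    then X = Z + U with U independent of Z."""
    zs = [rng.gauss(mean, sd_z) for _ in range(n)]
    xs = [z + rng.gauss(0.0, sd_err) for z in zs]
    return xs, zs


rng = random.Random(1)
x_cl, z_cl = simulate_classical(50_000, rng)
x_bk, z_bk = simulate_berkson(50_000, rng)

# Classical error inflates the variance of the surrogate; Berkson error does the reverse.
print(round(statistics.variance(z_cl) / statistics.variance(x_cl), 2))  # roughly 5/4
print(round(statistics.variance(x_bk) / statistics.variance(z_bk), 2))  # roughly 5/4
```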
Equations (1) to (3) specify only the generic structure of the measurement error problem. To use our conditional modelling approach for a given epidemiological study, one must write down model equations corresponding to the type of study design (assessment of a gold standard and existence of a validation group; use of repeated measures, of several instruments or of ancillary risk factor information) and to the specific measurement instruments used in the study.
Models appropriate for studies in nutritional epidemiology have been discussed by several authors,⁹,¹⁰,¹⁷ and a measurement error structure common to many occupational studies has been detailed in Gilks and Richardson.²⁰ After the structural part, the functional part of the model equations has to be specified. This entails choosing particular parametric forms for the distributions involved in equations (1) to (3), as well as specifying the prior distributions of the parameters.
As indicated in the introduction, Bayesian estimation in the general framework outlined above can be carried out straightforwardly by Gibbs sampling. Gibbs sampling is a Markov chain Monte Carlo method for generating samples from the joint posterior distribution of the model parameters. It can be viewed as a special case of the general class of Markov chain sampling methods described by Hastings.²⁷ As the joint posterior distribution of interest (the target distribution) is highly multidimensional, direct simulation is not possible; instead, an irreducible Markov chain is constructed whose stationary distribution is the target distribution. In the Gibbs sampler, each transition updates one unobservable (or block of unobservables) at a time by drawing it from its full conditional distribution given the current values of all the others. The wide applicability of the algorithm to general statistical modelling was emphasized by Gelfand and Smith²⁸ and Gelfand et al.,²⁹ and has since been demonstrated by many authors. A general introduction and many examples of application, including measurement error problems, can be found in Gilks et al.³⁰ For computational details of the implementation of some measurement error models, the reader is referred to references 17 and 20.
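The defining feature of Gibbs sampling, updating each unobservable in turn from its full conditional distribution, can be illustrated on a toy target rather than on the measurement error model itself. The sketch below is a standard textbook example, not taken from the paper, and its names are hypothetical: it samples a standard bivariate normal with correlation `rho`, whose full conditionals are the univariate normals $N(\rho v, 1-\rho^2)$ and $N(\rho u, 1-\rho^2)$.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter, burn_in, seed=0):
    """Gibbs sampler for (U, V) standard bivariate normal with correlation rho.

    Full conditionals: U | V = v ~ N(rho * v, 1 - rho**2), and symmetrically for V | U.
    Returns the draws kept after the burn-in period.
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho ** 2)
    u, v = 0.0, 0.0                       # arbitrary starting point
    kept = []
    for t in range(n_iter):
        u = rng.gauss(rho * v, cond_sd)   # draw U from its full conditional given current V
        v = rng.gauss(rho * u, cond_sd)   # draw V given the freshly updated U
        if t >= burn_in:
            kept.append((u, v))
    return kept


def sample_corr(pairs):
    """Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    sab = sum((a - ma) * (b - mb) for a, b in pairs)
    saa = sum((a - ma) ** 2 for a, _ in pairs)
    sbb = sum((b - mb) ** 2 for _, b in pairs)
    return sab / math.sqrt(saa * sbb)


draws = gibbs_bivariate_normal(rho=0.6, n_iter=20_000, burn_in=500)
print(round(sample_corr(draws), 2))  # should be close to the target correlation 0.6
```

In a measurement error model the two normal draws are replaced by the full conditionals of each $X_i$ and of $\beta$, $\lambda$ and $\pi$; conditionals that are not of standard form, such as those of the logistic regression parameters, need a univariate sampling method other than direct simulation.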
3. INFLUENCE OF THE MISSPECIFICATION OF THE EXPOSURE MODEL
At each step of the approach we have outlined, conditional distributions need to be explicitly specified in a parametric way. While some of these parametric distributions arise naturally, such as the choice of the logistic model for the disease risk, other assumed distributional forms are more arbitrary. In particular, in some cases little is known about the distribution of the exposure X or about an appropriate model for it. In a context of radiation exposure through fall-out from nuclear tests, Thomas et al.²² preferred to use a discrete distribution with a variable number of atoms for modelling the exposure.
In this section we present a series of examples with the aim of investigating how misspecification of the exposure model influences the performance of our method of analysis. We have used simulated data sets throughout.
3.1. Design set-up
Two risk factors are involved in the disease model. The first one, X, is measured with error and the second one, C, is known accurately. We consider the case of a logistic link between risk factors and disease status. Specifically, we suppose that $Y_i$ follows a Bernoulli distribution with parameter $\alpha_i$, where $\operatorname{logit} \alpha_i = \beta_0 + \beta_1 X_i + \beta_2 C_i$.
Concerning the measurement process, we consider the simple case of an unbiased instrument Z, possibly recorded twice in a subgroup of individuals. We thus suppose that the measurement model conditional for the rth repeat $Z_{ir}$ of instrument Z is a normal distribution with mean $X_i$ and variance $\theta^{-1}$:
$$[Z_{ir} \mid X_i, \theta] \sim N(X_i, \theta^{-1}), \qquad r = 1, 2.$$
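The repeated-measures design is informative about the measurement precision $\theta$ even though $X_i$ is never observed: under the model above, $Z_{i1} - Z_{i2} \sim N(0, 2\theta^{-1})$, so $X_i$ cancels. The moment estimator below only illustrates that identity (the analysis described here estimates $\theta$ within the Gibbs sampler); the function name and the simulated values are assumptions.

```python
import math
import random
import statistics


def moment_estimate_theta(z1, z2):
    """Moment estimate of the measurement precision theta from duplicate readings.

    Under Z_ir = X_i + e_ir with independent e_ir ~ N(0, 1/theta),
    Var(Z_i1 - Z_i2) = 2/theta, and the unknown X_i cancels in the difference.
    """
    diffs = [a - b for a, b in zip(z1, z2)]
    return 2.0 / statistics.variance(diffs)


rng = random.Random(2)
theta = 0.9                                   # value used in the simulations below
err_sd = math.sqrt(1.0 / theta)
xs = [rng.gauss(2.0, 2.0) for _ in range(20_000)]
z1 = [rng.gauss(x, err_sd) for x in xs]
z2 = [rng.gauss(x, err_sd) for x in xs]
print(round(moment_estimate_theta(z1, z2), 2))  # close to 0.9 for this many pairs
```

With only 50 duplicated individuals, the relative standard error of a normal sample variance is about $\sqrt{2/49} \approx 0.2$, so $\theta$ is much less well determined than in this large illustration.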
Two cases of study design are considered:
Design 1: It comprises a main study of 1000 individuals in whom only the surrogate Z has been recorded. In addition to the main study, there is a validation group, planned in advance, of n = 50 individuals, that is, a group in which, besides Z, it is assumed that the true exposure X could be measured accurately by means of a gold standard.
Design 2: The main study is as above but, instead of a validation group, the design now includes a subgroup of n = 50 individuals in whom Z has been recorded independently twice.
3.2. Simulation set-up
For each design, a baseline case (a) where X is normally distributed and four cases (b) to (e) of non-Gaussian distributions for X are considered:
(a) $X \sim N(2;\, 4)$
(b) $X \sim 0.5\,N(0.26;\, 1.0) + 0.5\,N(3.73;\, 1.0)$
(c) $X \sim 0.5\,N(0.26;\, 2.22) + 0.5\,N(3.73;\, 0.74)$
(d) $X \sim \text{log-normal}(0.346;\, 0.832)$
(e) $X \sim \chi^2_2$.
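The five generating distributions can be sampled with Python's standard library as sketched below. Two readings are assumptions on our part: the second argument of $N(\cdot;\cdot)$ is taken to be a variance, and log-normal$(0.346; 0.832)$ is read as log-scale mean 0.346 and log-scale standard deviation 0.832, the reading that gives $E(X) = \exp(0.346 + 0.832^2/2) \approx 2.0$, consistent with the statement that the expected value of X is 2.0 in all cases. The $\chi^2_2$ variable is drawn as a gamma variable with shape 1 and scale 2.

```python
import math
import random
import statistics


def draw_exposure(case, rng):
    """One draw of the true exposure X under generating case 'a' to 'e'."""
    if case == "a":
        return rng.gauss(2.0, 2.0)                     # N(2; 4): standard deviation 2
    if case == "b":                                    # symmetric two-component mixture
        return rng.gauss(0.26 if rng.random() < 0.5 else 3.73, 1.0)
    if case == "c":                                    # asymmetric mixture (unequal variances)
        if rng.random() < 0.5:
            return rng.gauss(0.26, math.sqrt(2.22))
        return rng.gauss(3.73, math.sqrt(0.74))
    if case == "d":
        return rng.lognormvariate(0.346, 0.832)        # log X ~ N(0.346, 0.832**2)
    if case == "e":
        return rng.gammavariate(1.0, 2.0)              # chi-square, 2 df = Gamma(shape 1, scale 2)
    raise ValueError(f"unknown case: {case!r}")


rng = random.Random(3)
for case in "abcde":
    sample = [draw_exposure(case, rng) for _ in range(100_000)]
    print(case, round(statistics.mean(sample), 2), round(statistics.variance(sample), 2))
```

Under these readings, cases (a), (b), (d) and (e) also share a variance close to 4, while case (c) has a somewhat larger variance (about 4.5).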
For all the data sets, the disease status Y was generated assuming logistic regression parameters fixed at $\beta_0 = -0.8$, $\beta_1 = 0.9$, $\beta_2 = 1.2$. Throughout, we suppose that X and C are associated by a linear relation, $C = \gamma X + \varepsilon$, where $\varepsilon$ is a standard Gaussian variable independent of X and $\gamma = 0.375$. In our baseline case (a), this corresponds to a correlation between X and C equal to 0.6. Note that the expected value of X is 2.0 for all examples. For each design and each case (a) to (e), 10 data sets were simulated and analysed by Gibbs sampling.
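A data set from this set-up can be generated as sketched below; the function names are hypothetical, and the measurement precision $\theta = 0.9$ is taken from the true values in Tables I and II. In case (a) the stated correlation follows from $\operatorname{cov}(X, C) = \gamma \operatorname{var}(X) = 1.5$ and $\operatorname{var}(C) = \gamma^2 \operatorname{var}(X) + 1 = 1.5625$, giving $1.5/\sqrt{4 \times 1.5625} = 0.6$, which the simulation reproduces.

```python
import math
import random


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def simulate_dataset(draw_x, rng, n=1000, beta=(-0.8, 0.9, 1.2), gamma=0.375, theta=0.9):
    """Generate (X, C, Y, Z) for n individuals:
    C = gamma * X + eps with eps ~ N(0, 1); Y ~ Bernoulli(expit(b0 + b1 X + b2 C));
    Z ~ N(X, 1/theta), a single surrogate reading per individual."""
    b0, b1, b2 = beta
    err_sd = math.sqrt(1.0 / theta)
    X = [draw_x(rng) for _ in range(n)]
    C = [gamma * x + rng.gauss(0.0, 1.0) for x in X]
    Y = [1 if rng.random() < expit(b0 + b1 * x + b2 * c) else 0 for x, c in zip(X, C)]
    Z = [rng.gauss(x, err_sd) for x in X]
    return X, C, Y, Z


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


rng = random.Random(5)
X, C, Y, Z = simulate_dataset(lambda r: r.gauss(2.0, 2.0), rng, n=50_000)  # case (a)
print(round(corr(X, C), 2))  # close to 0.6
```

Supplying a non-Gaussian sampler as `draw_x` gives the misspecified scenarios; keeping X for 50 individuals, or adding a second reading of Z for them, gives designs 1 and 2 respectively.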
3.3. Prior distribution used in the Bayesian analysis
Each of the simulated data sets was analysed by Gibbs sampling, under the assumption that the exposure model for $(X, C)^{\mathsf T}$ was a bivariate normal distribution with mean $\mu$ and variance-covariance matrix $\Sigma$, with a vague normal prior distribution for $\mu$ centred at $(2.0, 2.0)^{\mathsf T}$ with precision matrix $\begin{pmatrix} 0.015 & 0.0 \\ 0.0 & 0.015 \end{pmatrix}$, and a Wishart prior distribution for $\Sigma$ with 5 degrees of freedom and identity scale matrix.
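Within the Gibbs sampler, the normal prior on $\mu$ is conjugate: given the current exposure vectors $W_i = (X_i, C_i)^{\mathsf T}$ and $\Sigma$, $\mu$ is drawn from a bivariate normal with precision $Q = P_0 + n\Sigma^{-1}$ and mean $Q^{-1}(P_0 m_0 + \Sigma^{-1}\sum_i W_i)$, where $m_0 = (2.0, 2.0)^{\mathsf T}$ and $P_0$ is the prior precision matrix above. The sketch below implements only this step with plain 2 × 2 arithmetic; the function names and the toy input are hypothetical, and the Wishart update for $\Sigma$ is omitted.

```python
import math
import random


def inv2(m):
    """Inverse of a 2 x 2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def matvec(m, v):
    """Product of a 2 x 2 matrix and a length-2 vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def draw_mu(W, sigma, rng, m0=(2.0, 2.0), p0=((0.015, 0.0), (0.0, 0.015))):
    """One conjugate Gibbs draw of the exposure-model mean mu.

    W: list of current (X_i, C_i) pairs; sigma: current 2 x 2 covariance matrix.
    Returns (draw, posterior_mean).
    """
    n = len(W)
    s_inv = inv2(sigma)
    q = tuple(tuple(p0[i][j] + n * s_inv[i][j] for j in range(2)) for i in range(2))
    total = (sum(w[0] for w in W), sum(w[1] for w in W))
    rhs = tuple(a + b for a, b in zip(matvec(p0, m0), matvec(s_inv, total)))
    cov = inv2(q)                                # posterior covariance Q^{-1}
    mean = matvec(cov, rhs)
    # Cholesky factor of the 2 x 2 posterior covariance
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 ** 2)
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    draw = (mean[0] + l11 * e1, mean[1] + l21 * e1 + l22 * e2)
    return draw, mean


sigma = ((4.0, 1.5), (1.5, 1.5625))  # covariance implied by case (a)
W = [(2.0, 0.75)] * 1000             # hypothetical current exposure values
draw, post_mean = draw_mu(W, sigma, random.Random(6))
```

With 1000 individuals the vague prior has almost no influence, so the posterior mean essentially equals the average of the current $W_i$.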
Hence, apart from case (a), the parametric shape of the prior distribution of the exposure used to carry out the Bayesian analysis is misspecified. For the regression parameters and the measurement precision we assumed $\beta_0 \sim N(-1.0;\, 4.0)$, $\beta_i \sim N(0.0;\, 4.0)$ for $i = 1, 2$, and $\theta \sim \text{gamma}(0.1;\, 0.1)$.
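The gamma prior on $\theta$ is likewise conjugate for the normal measurement model. Assuming the shape/rate parameterization for gamma$(0.1; 0.1)$ (the text does not say which is meant), the full conditional of $\theta$ given the current true exposures and all m surrogate readings is gamma with shape $0.1 + m/2$ and rate $0.1 + \frac12\sum_{i,r}(Z_{ir} - X_i)^2$. A sketch of this single Gibbs step, with hypothetical names:

```python
import math
import random


def draw_theta(readings, rng, a=0.1, b=0.1):
    """One Gibbs draw of the measurement precision theta.

    readings: iterable of (z, x) pairs, each a surrogate reading Z_ir and the current
    value of the corresponding true exposure X_i (observed in a validation group,
    otherwise the latest imputed value). Prior: Gamma(shape=a, rate=b).
    """
    m = 0
    ss = 0.0
    for z, x in readings:
        m += 1
        ss += (z - x) ** 2
    shape = a + m / 2.0
    rate = b + ss / 2.0
    return rng.gammavariate(shape, 1.0 / rate)  # random.gammavariate takes (shape, scale)


rng = random.Random(7)
true_theta = 0.9
pairs = []
for _ in range(10_000):
    x = rng.gauss(2.0, 2.0)
    pairs.append((rng.gauss(x, math.sqrt(1.0 / true_theta)), x))
draws = [draw_theta(pairs, rng) for _ in range(100)]
print(round(sum(draws) / len(draws), 2))  # close to 0.9
```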
3.4. Results
The results are presented in Table I for the design with a validation group (design 1) and in Table II for the design with repeated measures in a subgroup (design 2). For each case and each of the 10 data sets, we ran the Gibbs sampler for 5000 iterations, discarding the first 500 iterations as burn-in; good convergence behaviour of the Gibbs sampler in measurement error problems has been reported previously,¹⁷ and visual inspection of the sequences of parameter values confirmed this. We have summarized the marginal posterior distributions of the parameters of interest by reporting posterior means and posterior standard deviations averaged over the 10 simulations, as well as the empirical standard deviation between the posterior mean estimates of the 10 simulations.
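The three summaries reported in each table cell can be computed from stored chains as sketched below, assuming one list of sampled values of a given parameter per replicate data set; the function name and the toy chains are hypothetical.

```python
import statistics


def summarize_replicates(chains, burn_in=500):
    """Summaries over replicate data sets for one parameter.

    chains: one list of Gibbs draws per replicate (e.g. 10 chains of 5000 iterations).
    Returns (average posterior mean, average posterior SD,
             between-replicate SD of the posterior means).
    """
    post_means, post_sds = [], []
    for chain in chains:
        kept = chain[burn_in:]                  # discard the burn-in iterations
        post_means.append(statistics.mean(kept))
        post_sds.append(statistics.stdev(kept))
    return (statistics.mean(post_means),
            statistics.mean(post_sds),
            statistics.stdev(post_means))


# Toy chains: 500 burn-in values of 99.0, then 4500 values alternating k - 1 and k + 1.
toy = [[99.0] * 500 + [k - 1.0, k + 1.0] * 2250 for k in range(10)]
avg_mean, avg_sd, between_sd = summarize_replicates(toy)
```

In the toy input the burn-in values are deliberately far off; because they are discarded, the average posterior mean is exactly 4.5 and each posterior SD is close to 1.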
3.4.1. Standard case (data set (a))
As expected from previous simulations, the results show that our estimation method has performed satisfactorily for the two designs, with all the estimated posterior means of the parameters very close to the values set in the simulation.
Note that the exposure mean $\mu_X$ and the precision $\theta$ have also been well estimated. As expected, the design with a validation group leads to lower posterior standard deviations for all the parameters directly influenced by the measurement design, that is, $\beta_0$, $\beta_1$ and $\theta$. Note also that $\beta_2$, which corresponds to the covariate C measured without error but correlated with X, is well estimated with similar precision for the two designs.
3.4.2. Mixture case (data sets (b) and (c))
In data sets (b), the underlying true exposure distribution is symmetric and bimodal, with well-separated peaks, whereas in data sets (c) the mixture distribution is asymmetric (Figure 1). The corresponding shapes of the distribution of the surrogate Z are also shown in Figure 1. Note that the bimodality is clearly attenuated in Z, so that choosing a parametric distribution for X from the histogram of the Z's is not straightforward.
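The attenuation follows from the measurement model: if X is a normal mixture and $Z = X + e$ with $e \sim N(0, \theta^{-1})$ independent of X, then Z is the same mixture with each component variance increased by $\theta^{-1}$. The sketch below compares, for case (b) with $\theta = 0.9$ as in Figure 1, the density midway between the two component means with the density at the first component mean; this "dip ratio" is an ad hoc summary introduced here for illustration, not a quantity used in the paper.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, comps):
    """Density of a normal mixture given as a list of (weight, mean, variance)."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in comps)


def dip_ratio(comps):
    """Density midway between the first two component means divided by the density at
    the first component mean: near 0 means a deep trough, near 1 means almost no dip."""
    mid = 0.5 * (comps[0][1] + comps[1][1])
    return mixture_pdf(mid, comps) / mixture_pdf(comps[0][1], comps)


theta = 0.9
x_mix = [(0.5, 0.26, 1.0), (0.5, 3.73, 1.0)]            # case (b): distribution of X
z_mix = [(w, m, v + 1.0 / theta) for w, m, v in x_mix]  # implied distribution of Z

print(round(dip_ratio(x_mix), 2), round(dip_ratio(z_mix), 2))  # about 0.44 and 0.93
```

The trough between the two modes, pronounced for X, almost disappears for Z, which is why the histogram of the surrogate gives little guidance on the shape of the exposure distribution.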

Table I. Gibbs sampling analysis of a design with a validation subgroup (n = 50) under different generating distributions of the exposure X. Data sets (b) to (e) are analysed with a misspecified prior distribution for the exposure X

| Data sets | | $\beta_0$ | $\beta_1$ | $\beta_2$ | $\mu_X$ | $\theta$ |
|---|---|---|---|---|---|---|
| Parameter true value | | −0.8 | 0.9 | 1.2 | 2.0 | 0.9 |
| (a) normal | posterior mean | −0.77 (0.25) | 0.89 (0.11) | 1.24 (0.08) | 2.01 (0.07) | 0.94 (0.05) |
| | posterior SD | 0.16 (0.03) | 0.11 (0.02) | 0.12 (0.008) | 0.07 (0.002) | 0.09 (0.01) |
| (b) mixture | posterior mean | −1.13 (0.15) | 1.05 (0.12) | 1.18 (0.13) | 2.04 (0.11) | 0.89 (0.26) |
| | posterior SD | 0.23 (0.05) | 0.17 (0.04) | 0.13 (0.009) | 0.09 (0.004) | 0.16 (0.07) |
| (c) mixture | posterior mean | −0.85 (0.25) | 0.94 (0.16) | 1.31 (0.08) | 2.02 (0.05) | 1.02 (0.24) |
| | posterior SD | 0.19 (0.06) | 0.14 (0.05) | 0.14 (0.02) | 0.09 (0.002) | 0.19 (0.06) |
| (d) log-normal | posterior mean | −0.57 (0.13) | 0.67 (0.09) | 1.14 (0.15) | 1.95 (0.04) | 0.72 (0.13) |
| | posterior SD | 0.22 (0.05) | 0.16 (0.04) | 0.12 (0.009) | 0.08 (0.005) | 0.14 (0.03) |
| (e) chi-square | posterior mean | −0.65 (0.15) | 0.69 (0.09) | 1.15 (0.10) | 1.97 (0.11) | 0.88 (0.12) |
| | posterior SD | 0.19 (0.03) | 0.14 (0.04) | 0.11 (0.009) | 0.08 (0.004) | 0.17 (0.03) |

Each line summarizes results over 10 independent replications: mean and between-replication standard deviation (given in brackets).
(a) $(X, C)^{\mathsf T} \sim N\left(\begin{pmatrix} 2.0 \\ 1.5 \end{pmatrix};\, \begin{pmatrix} 4.0 & 1.5 \\ 1.5 & 1.56 \end{pmatrix}\right)$; (b) $X \sim 0.5\,N(0.26; 1.0) + 0.5\,N(3.73; 1.0)$; (c) $X \sim 0.5\,N(0.26; 2.22) + 0.5\,N(3.73; 0.74)$; (d) $X \sim \text{log-normal}(0.346; 0.832)$; (e) $X \sim \chi^2_2$.

Table II. Gibbs sampling analysis of a design with two repeated measures in a subgroup (n = 50) under different generating distributions of the exposure X. Data sets (b) to (e) are analysed with a misspecified prior distribution for the exposure X

| Data sets | | $\beta_0$ | $\beta_1$ | $\beta_2$ | $\mu_X$ | $\theta$ |
|---|---|---|---|---|---|---|
| Parameter true value | | −0.8 | 0.9 | 1.2 | 2.0 | 0.9 |
| (a) normal | posterior mean | −0.79 (0.12) | 0.90 (0.12) | 1.24 (0.11) | 2.02 (0.04) | 0.95 (0.17) |
| | posterior SD | 0.22 (0.04) | 0.17 (0.04) | 0.13 (0.01) | 0.09 (0.003) | 0.18 (0.04) |
| (b) mixture | posterior mean | −1.34 (0.37) | 1.25 (0.29) | 1.15 (0.06) | 2.00 (0.07) | 0.75 (0.11) |
| | posterior SD | 0.45 (0.24) | 0.36 (0.20) | 0.16 (0.03) | 0.09 (0.003) | 0.15 (0.04) |
| (c) mixture | posterior mean | −1.31 (0.56) | 1.35 (0.46) | 1.07 (0.16) | 2.00 (0.07) | 0.78 (0.20) |
| | posterior SD | 0.37 (0.22) | 0.32 (0.19) | 0.17 (0.04) | 0.10 (0.005) | 0.14 (0.05) |
| (d) log-normal | posterior mean | −0.38 (0.28) | 0.62 (0.25) | 1.16 (0.24) | 2.02 (0.05) | 0.90 (0.20) |
| | posterior SD | 0.20 (0.09) | 0.15 (0.09) | 0.13 (0.05) | 0.09 (0.005) | 0.19 (0.06) |
| (e) chi-square | posterior mean | −0.69 (0.21) | 0.75 (0.19) | 1.22 (0.14) | 1.99 (0.09) | 0.92 (0.25) |
| | posterior SD | 0.25 (0.12) | 0.19 (0.11) | 0.13 (0.02) | 0.09 (0.006) | 0.19 (0.04) |

Each line summarizes results over 10 independent replications: mean and between-replication standard deviation (given in brackets).
(a) $(X, C)^{\mathsf T} \sim N\left(\begin{pmatrix} 2.0 \\ 1.5 \end{pmatrix};\, \begin{pmatrix} 4.0 & 1.5 \\ 1.5 & 1.56 \end{pmatrix}\right)$; (b) $X \sim 0.5\,N(0.26; 1.0) + 0.5\,N(3.73; 1.0)$; (c) $X \sim 0.5\,N(0.26; 2.22) + 0.5\,N(3.73; 0.74)$; (d) $X \sim \text{log-normal}(0.346; 0.832)$; (e) $X \sim \chi^2_2$.

Figure 1. Histograms of the distribution of the surrogate Z, with $Z \sim N(X, (0.9)^{-1})$, for cases (b) to (e) of non-Gaussian distribution for the exposure X. The density of X is plotted as a full line and a smooth density estimate of Z as a broken line: (b) $X \sim 0.5\,N(0.26; 1.0) + 0.5\,N(3.73; 1.0)$; (c) $X \sim 0.5\,N(0.26; 2.22) + 0.5\,N(3.73; 0.74)$; (d) $X \sim \text{log-normal}(0.346; 0.832)$; (e) $X \sim \chi^2_2$.

Overall, we see some deterioration in the estimation of $\beta_0$ and $\beta_1$. The estimated posterior means are not as close to the set values, and the posterior standard deviations are noticeably increased, particularly for data sets (b). The regression parameter $\beta_1$ is overestimated, particularly for design 2, and there are also wider fluctuations between the results of the 10 replicated data sets.
As expected, the estimation of $\beta_2$ is not much affected by the misspecification, the correlation between X and C leading only to a slightly larger posterior standard deviation for $\beta_2$. Results for $\mu_X$ and $\theta$ are still satisfactory. Note that the misspecification of the distribution of X has increased the posterior standard deviation for $\theta$ in the validation design but not in the repeated measures design. This is because in design 1 the validation group provides information on both $\theta$ and $\pi$, whereas in design 2 the repeated measures carry only weak information on $\pi$. Thus $\theta$ is more precisely assessed in design 1, provided that there is no conflict between the values of X in the validation group and those generated in the main study with the help of the prior distribution of X.
3.4.3. Log-normal and chi-square cases (data sets (d) and (e))
There is again a deterioration in the estimation of $\beta_0$ and $\beta_1$, with underestimation of $\beta_1$ rather than overestimation as in cases (b) and (c). In contrast to the mixture cases (b) and (c), there is no clear pattern of increase in the posterior standard deviations of these parameters. Consequently, in the log-normal case, the posterior mean of $\beta_0$ is biased and, for design 2, more than two posterior standard deviations away from the set value of $-0.8$. As previously, the parameters $\beta_2$, $\mu_X$ and $\theta$ are well estimated, the misspecification being reflected only by increased fluctuations between the 10 data sets, in particular for $\theta$.
4. DISCUSSION
In this short note we have reviewed some aspects of the Bayesian approach to measurement error problems via the specification of conditional independence models and the implementation of stochastic simulation algorithms. The advantages of this approach over previously proposed methods have been discussed extensively in Richardson and Gilks.¹⁷ Of paramount importance is its flexibility, which enables the modelling of a wide range of measurement error situations without resorting to artificial simplifying assumptions. This has important implications for the design of future studies, and an important area for research is to develop guidelines for complex designs.
A first step in the construction of such models is the stipulation of suitable conditional independence assumptions. Careful thought has to be given to the implications of each of these assumptions in any particular context. For example, the conditional independence between the repeated measures of a surrogate given the true risk factor, assumed in design 2, would not hold if there were a systematic bias in the measurement instrument.
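One way in which such a systematic bias could arise is a person-specific error $b_i$ shared by both readings of individual i, so that $Z_{ir} = X_i + b_i + e_{ir}$. The sketch below (hypothetical names; error variance $1/0.9$ and bias standard deviation 1 are assumed values) shows that the two measurement errors are then correlated given $X_i$, which is exactly the conditional independence that design 2 relies on. A bias common to all individuals would, by contrast, shift every reading without creating such a correlation.

```python
import math
import random


def simulate_duplicates(n, rng, theta=0.9, bias_sd=0.0):
    """Return (X_i, Z_i1, Z_i2) with Z_ir = X_i + b_i + e_ir, e_ir ~ N(0, 1/theta),
    and a person-specific bias b_i ~ N(0, bias_sd**2) shared by both readings."""
    err_sd = math.sqrt(1.0 / theta)
    rows = []
    for _ in range(n):
        x = rng.gauss(2.0, 2.0)
        b = rng.gauss(0.0, bias_sd) if bias_sd > 0 else 0.0
        rows.append((x, x + b + rng.gauss(0.0, err_sd), x + b + rng.gauss(0.0, err_sd)))
    return rows


def error_correlation(rows):
    """Correlation between the two measurement errors Z_i1 - X_i and Z_i2 - X_i."""
    e1 = [z1 - x for x, z1, _ in rows]
    e2 = [z2 - x for x, _, z2 in rows]
    n = len(rows)
    m1, m2 = sum(e1) / n, sum(e2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(e1, e2))
    s11 = sum((a - m1) ** 2 for a in e1)
    s22 = sum((b - m2) ** 2 for b in e2)
    return s12 / math.sqrt(s11 * s22)


rng = random.Random(8)
print(round(error_correlation(simulate_duplicates(20_000, rng)), 2))               # near 0
print(round(error_correlation(simulate_duplicates(20_000, rng, bias_sd=1.0)), 2))  # near 1/(1 + 1/0.9)
```

Because $b_i$ cancels in $Z_{i1} - Z_{i2}$, the replicate differences carry no information about this component of error, so it cannot be detected from the repeated measures alone.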
At a second step, parametric distributions are specified, so misspecification can occur in a variety of ways. The influence of misspecification of the unknown exposure distribution on regression parameter estimates gives cause for concern, and we have centred our discussion on this problem. For the regression coefficient of X, we have shown some sensitivity to misspecification, the overall picture being one of moderate bias in the estimates and increased posterior standard deviations. On the other hand, misspecification of the prior distribution of X has little influence on the estimation of the regression coefficient of a covariate measured without error (even when correlated with X) or on the estimation of the precision of the measurement error model.
There is strong interest in being able to relax the fully parametric set-up, in a way which is not data dependent, while keeping the flexibility of conditional independence modelling and of the Bayesian approach. The use of flexible mixture distributions is a natural way to go in that direction. Indeed, mixtures of standard distributions are often used in a semi-parametric way to approximate distributions which are not easily modelled by standard parametric families. Gibbs sampling analysis of finite mixtures has been described by Diebolt and Robert,³¹ and the next stage of development of our modelling approach will be to incorporate the possibility of using a mixture model for the exposure distribution. Furthermore, advantage might be taken of recent developments in the class of MCMC algorithms³² which enable the use of mixtures with a variable number of components, thus increasing the flexibility of this semi-parametric approach. Finally, let us note that in recent work, Mallick and Gelfand³³ have used mixtures of beta distributions to model unknown link functions and have also applied these ideas in the context of measurement error problems.²⁵
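A minimal sketch of a data-augmentation Gibbs sampler for a finite normal mixture, in the spirit of the approach cited above, is given below for two components with known unit variances. The uniform prior on the weight, the $N(2, 100)$ priors on the component means and all names are assumptions made for illustration. Within the measurement error model, `xs` would be the current imputed exposures $X_i$, and one sweep of this sampler would take the place of the normal exposure-model update.

```python
import math
import random


def gibbs_two_normal_mixture(xs, n_iter=1000, burn_in=200, m0=2.0, p0=0.01, seed=0):
    """Gibbs sampler for x ~ w N(mu1, 1) + (1 - w) N(mu2, 1).

    Priors (assumed): w ~ Beta(1, 1); mu_k ~ N(m0, 1/p0) independently.
    Returns the kept draws of (w, mu1, mu2).
    """
    rng = random.Random(seed)
    mu = [min(xs), max(xs)]           # spread-out start to limit label switching
    w = 0.5
    kept = []
    for t in range(n_iter):
        # 1. Allocation: draw each observation's component label from its full conditional.
        sums, counts = [0.0, 0.0], [0, 0]
        for x in xs:
            d = 0.5 * ((x - mu[0]) ** 2 - (x - mu[1]) ** 2)  # log N(x; mu2,1) - log N(x; mu1,1)
            # P(label 1 | x) = w f1 / (w f1 + (1 - w) f2) = 1 / (1 + ((1 - w) / w) exp(d))
            p1 = 1.0 / (1.0 + (1.0 - w) / w * math.exp(min(d, 700.0)))
            k = 0 if rng.random() < p1 else 1
            sums[k] += x
            counts[k] += 1
        # 2. Weight: conjugate Beta update.
        w = rng.betavariate(1 + counts[0], 1 + counts[1])
        # 3. Component means: conjugate normal updates (unit component variance).
        for k in range(2):
            prec = p0 + counts[k]
            mu[k] = rng.gauss((p0 * m0 + sums[k]) / prec, 1.0 / math.sqrt(prec))
        if t >= burn_in:
            kept.append((w, mu[0], mu[1]))
    return kept


data_rng = random.Random(11)
xs = [data_rng.gauss(0.26 if data_rng.random() < 0.5 else 3.73, 1.0) for _ in range(500)]
draws = gibbs_two_normal_mixture(xs)
print([round(sum(d[j] for d in draws) / len(draws), 2) for j in range(3)])  # means of w, mu1, mu2
```

For less well-separated components or longer runs, the component labels can switch during sampling, and an identifiability constraint such as ordering the means is then commonly imposed.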
REFERENCES
1. Willett, W. 'An overview of issues related to the correction of non-differential exposure measurement error in epidemiologic studies', Statistics in Medicine, 8, 1031–1040 (1989).
2. Fuller, W. A. Measurement Error Models, Wiley, New York, 1987.
3. Carroll, R. J. and Wand, M. P. 'Semiparametric estimation in logistic measurement error models', Journal of the Royal Statistical Society, Series B, 53, 573–585 (1991).
4. Chesher, A. 'The effect of measurement error', Biometrika, 78, 451–462 (1991).
5. Pepe, M. S. and Fleming, T. R. 'A non-parametric method for dealing with mismeasured covariate data', Journal of the American Statistical Association, 86, 108–113 (1991).
6. Robins, J. M., Rotnitzky, A. and Zhao, L. P. 'Estimation of regression coefficients when some regressors are not always observed', Journal of the American Statistical Association, 89, 846–866 (1994).
7. Carroll, R. J., Ruppert, D. and Stefanski, L. A. Measurement Error in Nonlinear Models, Chapman & Hall, New York, 1995.
8. Armstrong, B. G. 'The effects of measurement errors on relative risk regression', American Journal of Epidemiology, 132, 1176–1184 (1990).
9. Rosner, B., Willett, W. C. and Spiegelman, D. 'Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error', Statistics in Medicine, 8, 1051–1069 (1989).
10. Rosner, B., Spiegelman, D. and Willett, W. C. 'Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error', American Journal of Epidemiology, 132, 734–745 (1990).
11. Pierce, D. A., Stram, D. O., Vaeth, M. and Schafer, D. W. 'The errors in variables problem: considerations provided by radiation dose-response analyses of the A-bomb survivor data', Journal of the American Statistical Association, 87, 351–359 (1992).
12. Duffy, S. W., Maximovitch, D. and Day, N. E. 'External validation, repeat determination and precision of risk estimation in misclassified exposure data in epidemiology', Journal of Epidemiology and Community Health, 46, 620–624 (1992).
13. Carroll, R. J., Gail, M. H. and Lubin, J. H. 'Case-control studies with errors in covariates', Journal of the American Statistical Association, 88, 421, 185–199 (1993).
14. Thomas, D., Stram, D. and Dwyer, J. 'Exposure measurement error: influence on exposure-disease relationship and methods of correction', Annual Review of Public Health, 14, 69–93 (1993).
15. Tanner, M. A. Tools for Statistical Inference, Springer Verlag, New York, 1993.
16. Gelfand, A. E. and Smith, A. F. M. 'Bayesian analysis of constrained parameters and truncated data problems using Gibbs sampling', Journal of the American Statistical Association, 87, 523–532 (1992).
17. Richardson, S. and Gilks, W. R. 'Conditional independence models for epidemiological studies with covariate measurement error', Statistics in Medicine, 12, 1703–1722 (1993).
18. Kirby, A. J. and Spiegelhalter, D. J. Statistical Modelling for the Precursors of Cervical Cancer, Case Studies in Biometry, N. Lange (ed), Wiley, New York, 1994.
19. Besag, J., Green, P. J., Higdon, D. and Mengersen, K. 'Bayesian computation and stochastic systems', Statistical Science, 10, 1, 3–41 (1995).
20. Gilks, W. R. and Richardson, S. 'Analysis of disease risks using ancillary risk factors, with application to job-exposure matrices', Statistics in Medicine, 11, 1443–1463 (1992).
21. Richardson, S. and Gilks, W. R. 'A Bayesian approach to measurement error problems in epidemiology using conditional independence models', American Journal of Epidemiology, 138, 6, 430–442 (1993).
22. Thomas, D. C., Gauderman, J. and Kerber, R. 'A non-parametric Monte Carlo approach to adjustment for covariate measurement errors in regression problems', Technical report, Department of Preventive Medicine, University of Southern California, 1991.
23. Stephens, D. A. and Dellaportas, P. 'Bayesian analysis of generalised linear models with covariate measurement error', in Bernardo, J. M., Berger, J. O., Dawid, A. P. and Smith, A. F. M. (eds), Bayesian Statistics 4, Oxford University Press, Oxford, 1992.
24. Schmid, C. H. and Rosner, B. 'A Bayesian approach to logistic regression models having measurement error following a mixture distribution', Statistics in Medicine, 12, 1141–1153 (1993).
25. Mallick, B. K. and Gelfand, A. E. 'Semiparametric errors in variables models: a Bayesian approach', Journal of Statistical Planning and Inference, 52, 307–321 (1996).
26. Clayton, D. G. 'Models for the analysis of cohort and case-control studies with inaccurately measured exposures', in Dwyer, J. H., Feinleib, N., Lippert, P. and Hoffmeister, H. (eds), Statistical Models for Longitudinal Studies of Health, Oxford University Press, New York, 1992.
27. Hastings, W. K. 'Monte-Carlo sampling methods using Markov chains and their applications', Biometrika, 57, 97–109 (1970).
28. Gelfand, A. E. and Smith, A. F. M. 'Sampling based approaches to calculating marginal densities', Journal of the American Statistical Association, 85, 398–409 (1990).
29. Gelfand, A. E., Hills, S. E., Racine-Poon, A. and Smith, A. F. M. 'Illustration of Bayesian inference in normal data models using Gibbs sampling', Journal of the American Statistical Association, 85, 972–985 (1990).
30. Gilks, W. R., Richardson, S. and Spiegelhalter, D. J. (eds) Markov Chain Monte Carlo in Practice, Chapman and Hall, London, 1996.
31. Diebolt, J. and Robert, C. P. 'Estimation of finite mixture distributions through Bayesian sampling', Journal of the Royal Statistical Society, Series B, 56, 163–175 (1994).
32. Green, P. J. 'Reversible jump MCMC computation and Bayesian model determination', Biometrika, 82, 4, 711–732 (1995).
33. Mallick, B. K. and Gelfand, A. E. 'Generalized linear models with unknown link functions', Biometrika, 81, 237–245 (1995).