ENVIRONMETRICS

Environmetrics 2003; 14: 129–147 (DOI: 10.1002/env.571)

Bayesian hierarchical models in ecological studies of health–environment effects

Sylvia Richardson*,† and Nicky Best

Department of Epidemiology and Public Health, Imperial College School of Medicine at St Mary’s, Norfolk Place, London W2 1PG, U.K.

SUMMARY

We describe Bayesian hierarchical models and illustrate their use in epidemiological studies of the effects of environment on health. The framework of Bayesian hierarchical models refers to a generic model building strategy in which unobserved quantities (e.g. statistical parameters, missing or mismeasured data, random effects, etc.) are organized into a small number of discrete levels with logically distinct and scientifically interpretable functions, and probabilistic relationships between them that capture inherent features of the data. It has proved to be successful for analysing many types of complex epidemiological and biomedical data. The general applicability of Bayesian hierarchical models has been enhanced by advances in computational algorithms, notably those belonging to the family of stochastic algorithms based on Markov chain Monte Carlo techniques.

In this article, we review different types of design commonly used in studies of environment and health, give details on how to incorporate the hierarchical structure into the different components of the model (baseline risk, exposure) and discuss the model specification at the different levels of the hierarchy with particular attention to the problem of aggregation (ecological) bias. Copyright © 2003 John Wiley & Sons, Ltd.

KEY WORDS: aggregation; Bayesian graphical models; ecological bias; exposure measurement error; spatial dependence; time series

1. INTRODUCTION

The study of the effects of environmental exposures on health presents many challenges. Among these

are the practical and methodological problems of collecting suitable data, and the need for

sophisticated statistical models to capture the complex nature of the underlying health–exposure

relationships and to acknowledge the type and quality of the available data. In this article, our aim is to

show how Bayesian hierarchical models provide a unifying framework for modelling such health–

environment relationships, and to illustrate how this framework may be elaborated to incorporate the

additional complexity demanded by non-standard features of the data. These include availability of

data at individual and group level and the associated inconsistencies between individual and aggregate level disease–exposure relationships, the need for exposure measurement models relating individual and ambient measures of exposure, and the occurrence of complex patterns of spatial and temporal dependence in exposure and outcome data and of missing data and unmeasured confounders.

Received 3 December 2001; Accepted 4 March 2002

*Correspondence to: Sylvia Richardson, Department of Epidemiology and Public Health, Imperial College School of Medicine at St Mary’s, Norfolk Place, London W2 1PG, U.K.

†E-mail: [email protected]

Contract/grant sponsor: U.K. Medical Research Council; contract/grant number: G9803841.

Contract/grant sponsor: U.K. Small Area Health Statistics Unit.

The article is organised as follows. In Section 2, we introduce the hierarchical modelling

framework and show how a graphical representation of the local dependence relations between model

quantities can facilitate construction of complex statistical models by combining several sub-graphs

representing different features of the substantive problem. In Section 3, we elaborate on the functional

specification of the relationships implicated in our hierarchical models, focusing on non-standard

aspects relating particularly to environment and health problems. We also draw on a recent example

from the literature to illustrate the practical use of hierarchical models for investigating environment–

health relationships in the context of air pollution. In Section 4, we make some concluding remarks

about the potential benefits of using a hierarchical Bayesian modelling strategy when faced with the

complex problem of studying health–environment effects.

2. BAYESIAN HIERARCHICAL MODELLING FRAMEWORK

The framework of Bayesian hierarchical modelling refers to a generic model building strategy in which

unobserved quantities (e.g. statistical parameters, missing or mismeasured data, random effects, etc.) are

organized into a small number of discrete levels with logically distinct and scientifically interpretable

functions and probabilistic relationships between them that capture inherent features of the data. It has

proved to be successful for analysing many types of complex epidemiological, biomedical, environmental

and other data, as illustrated by the case studies in Gilks et al. (1996) and the wide range of

examples in the literature (for example, Morris and Normand (1991); Wakefield (1996); Su et al. (2001);

Rosenberg et al. (1999)). The general applicability of Bayesian hierarchical models has been enhanced

by advances in computational algorithms, notably those belonging to the family of stochastic algorithms

based on Markov chain Monte Carlo (MCMC) techniques (see Green, 2001, for a recent review).

When specifying a hierarchical model, it is often convenient to start with a graphical representation

of the structural assumptions relating the quantities in the model. Such models are commonly referred

to as Bayesian graphical models and have become increasingly popular as ‘building blocks’ for

constructing complex statistical models of biological and other phenomena (Spiegelhalter, 1998).

These graphs consist of nodes representing the variables in the model, linked by directed or undirected

edges representing the dependence relationships between the variables. A graph where all the edges are

directed and where there is no loop is known as a directed acyclic graph (DAG). DAGs have been

extensively used in modelling situations where the relationships between the variables are asymmetric,

from ‘cause’ to ‘effect’. A unique and easily computed joint distribution can be written for any DAG

(see, for example, Lauritzen, 1996). There are other cases where the links between the variables are

symmetric. For example, in spatial epidemiology, one might simply want to formulate that incidence

rates in neighbouring areas are correlated. This type of symmetric dependence is encoded via an

undirected graph, where by convention, the absence of a link between two variables signifies that the

variables are conditionally independent given all the others in the graph. This latter type of graph is

referred to as a conditional independence graph, and extra conditions are required to guarantee that the

joint distribution of all the variables exists. When a particular context involves a mixture of both types of

links, the corresponding graphs are called chain graphs. In our review, we shall mostly display DAGs,

but chain graphs would be required for representing health–environment studies exploring the spatial

variability of outcomes and exposure (see Spiegelhalter et al., 1995, for more details of such graphs).

Throughout, we adopt the notation used by Spiegelhalter (1998), denoting variable quantities in the

model (whether observed or not) by circular nodes in the graph, and constants by rectangular nodes;

arrows indicate directed dependencies between nodes. Repetitive structures are known as ‘plates’ and

are represented by large rectangles enclosing the repeated nodes. Figure 1 shows some examples of

DAGs representing various environment–health models; these are discussed in more detail below.

2.1. Study designs

There are three main designs used to study relationships between environmental exposures and health,

depending on the type and resolution of the available data. We will refer to these as individual,

semi-ecological and ecological (or aggregate) designs. We first provide some basic notation and then

describe the generic structure of each design using graphical models.

Let $i = 1, \ldots, N_g$ denote individuals within groups $g = 1, \ldots, G$, where $g$ may index, for example, time units, geographical (spatial) units, socioeconomic groups, etc. We denote the total population at risk by $N$; health outcomes by $y_i$ or $Y_g$ for individuals or groups, respectively; the environmental exposure of interest by $X_i$ if measured at the individual level and by $Z_g$ if measured at the group level; the regression coefficient measuring the effect of exposure on disease risk by $\beta$; and the baseline risk (possibly adjusted for known risk factors such as age and sex) by $\lambda_{0i}$ (individual level) or $\lambda_{0g}$ (group level). Note that, for simplicity of presentation in this section, we assume that no other concomitant risk factor data are available, although multiple exposures are easily accommodated by extending $X_i$, $Z_g$ and $\beta$ to vector notation.

2.1.1. Individual design. This design is appropriate if both exposure and health outcome data are

measured at the individual level on the same set of subjects. Figure 1a shows the graph corresponding

to such a model. The functional form of the local dependence relationships implied by this graph may

be represented as

$$y_i \sim p\bigl(f(\lambda_{0i}, X_i, \beta)\bigr), \quad i = 1, \ldots, N$$

where $p(\cdot)$ is an appropriate probability distribution (taken to be Bernoulli if $y_i$ is dichotomous) and $f(\cdot)$ is a suitable link function specifying the disease risk $\lambda_i$ for individual $i$ as a function of the baseline risk $\lambda_{0i}$ and the exposure $X_i$ (see Section 3.1 for further details).

Figure 1. Graphical models (DAGs) illustrating the structure of basic environment–health relationships for (a) individual design; (b) semi-ecological design; (c) ecological design. Note that, in order for each graph to represent a full joint distribution, we also assume that the unknown quantities at the top of each graph (i.e. the exposure coefficient and baseline risk parameters) are given appropriate prior probability distributions (usually chosen to be minimally informative); however, for clarity we suppress these dependencies in the graphical representation.

2.1.2. Semi-ecological design. If health outcome data are measured at the individual level, but

measurements of the environmental exposure are only available at the group level (for example,

mean daily or annual ambient pollution concentrations, or spatially averaged exposures), then a semi-ecological design is appropriate. The baseline risk may be specified at either the individual or group level. Figure 1b shows the graph corresponding to such a model (illustrated assuming individual-level baseline risk). Again, the functional form of the local dependence relationships implied by this graph is

$$y_{ig} \sim p\bigl(f(\lambda_{0ig}, Z_g, \beta^{*})\bigr), \quad i = 1, \ldots, N_g;\ g = 1, \ldots, G$$

As before, $p(\cdot)$ will be taken as Bernoulli if $y_{ig}$ is dichotomous, and $f(\cdot)$ specifies the form of the functional dependence of the overall disease risk $\lambda_{ig}$ for individual $i$ in group $g$ on the baseline risk $\lambda_{0ig}$ and exposure $Z_g$. However, note that the coefficient $\beta^{*}$ representing the effect of exposure $Z_g$ is not necessarily the same as the coefficient in the individual-level model. The difference will depend on the relationship between the group-level measured exposure $Z_g$ and the (unobserved) individual exposures $X_{ig}$ in the group (see Section 3.4 for further details).

2.1.3. Ecological design. The ecological design is appropriate when both the health outcome data and

exposure data are only available at the group level (for example, counts of health events in space or

time, and average exposure measurements over space or time). Figure 1c shows the graph corresponding to such a model. The functional form of the local dependence relationships is given below:

$$Y_g \mid N_g \sim p\bigl(f(\lambda_{0g}, Z_g, \beta^{*})\bigr), \quad g = 1, \ldots, G$$

Here $p(\cdot)$ will typically be taken as binomial if the $Y_g$ represent counts of cases or, if the disease is rare, a Poisson approximation may be assumed. $f(\cdot)$ specifies the form of the functional dependence of the average disease risk $\lambda_g$ for group $g$ on the average baseline risk $\lambda_{0g}$ and the ecological exposure $Z_g$. The coefficient $\beta^{*}$ representing the effect of this exposure is not necessarily the same as in the previous two models. In this case, the difference depends on a number of factors (Greenland, 1992) including: (i) the relationship between $Z_g$ and $X_{ig}$ (see above and Section 3.4); (ii) the functional form of the underlying individual-level relationship $f(\lambda_{0ig}, X_{ig}, \beta)$ between disease risk and exposure (see Section 3.1); (iii) the presence of group-level confounders (which we may attempt to adjust for by modelling between-group variations in baseline risk; see Sections 2.2.1 and 3.2); and (iv) the presence of group-level effect modifiers (which we may attempt to adjust for by modelling between-group variations in the exposure coefficient; see Sections 2.2.2 and 3.3).
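As a rough illustration of the ecological design, the following Python sketch simulates group-level counts. It assumes a Poisson sampling model and an exponential (log-linear) form for the risk function, which is only one of the forms considered in Section 3.1; the function name, group sizes, baseline risks, exposures and coefficient value are all invented for illustration.

```python
import numpy as np


def simulate_ecological_counts(n_at_risk, baseline_risk, exposure, beta, rng):
    """Draw Y_g ~ Poisson(N_g * lambda_0g * exp(beta * Z_g)) for each group g.

    The exponential risk function is an assumption made for this sketch.
    """
    n_at_risk = np.asarray(n_at_risk, dtype=float)
    baseline_risk = np.asarray(baseline_risk, dtype=float)
    exposure = np.asarray(exposure, dtype=float)
    # Expected number of cases in each group.
    expected = n_at_risk * baseline_risk * np.exp(beta * exposure)
    return rng.poisson(expected), expected


# Hypothetical data for four groups.
rng = np.random.default_rng(0)
N_g = np.array([5000, 8000, 12000, 3000])              # populations at risk
lambda_0g = np.array([0.002, 0.0025, 0.0018, 0.003])   # group baseline risks
Z_g = np.array([0.5, 1.0, 1.5, 2.0])                   # group-level exposures
counts, expected = simulate_ecological_counts(N_g, lambda_0g, Z_g, beta=0.3, rng=rng)
```

Only the counts, the populations at risk and the group exposures would be available to the analyst in such a design; the baseline risks and the coefficient are the unknowns.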

2.2. Incorporating hierarchical structure

Many statistical applications involve multiple parameters that can be regarded as related or connected

in some way by the structure of the problem. For example, in the present context, we wish to estimate

the parameters $\Theta_k$ of an exposure–response relationship for each of a large number of individuals or groups (where $\Theta_k$ denotes unknown quantities of interest, including regression parameters, mismeasured or unobserved risk factors etc., and $k$ is a generic index representing individuals or groups). The $\Theta_k$ are likely to be similar (but not necessarily identical) across units $k$ since the same exposure and response are of interest in each case. Adjustment can be made for factors such as age and sex that are known to influence the relationship, and spatial and/or temporal proximity of the units may suggest that some of the $\Theta_k$ are likely to be more closely related than others. The dependence among the $\Theta_k$ can be represented by a joint probability model for these parameters. That is, we regard the unknown quantities $\Theta_k$ as drawn from a (prior) probability distribution that may depend on known covariates plus additional parameters $\phi$ that represent the overall mean and variances/covariances among the $\Theta_k$. This crucial step of specifying a probability distribution relating unknown quantities across units leads to a hierarchical or multilevel model (Good, 1987; Gelman et al., 1995; Breslow, 1990; Goldstein, 1995). The links between parameters implied by this joint distribution enable the estimates of $\Theta_k$ for any one unit to ‘borrow strength’ from information on related parameters for other units, as well as depending on the (often sparse) information contained in the data for unit $k$. This leads to improved parameter estimates over those obtained from non-hierarchical models that treat each unit independently, or pool all units together ignoring between-unit variability (Gelman et al., 1995).
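The idea of borrowing strength can be sketched numerically. The Python example below is a deliberately simplified illustration, not the models developed in this article: it assumes Gaussian unit-level estimates with known sampling variances and a known between-unit variance `tau2`, whereas a full Bayesian analysis would place priors on the overall mean and variance and estimate them jointly. The function name and the numbers are hypothetical.

```python
import numpy as np


def shrink_estimates(y, sampling_var, tau2):
    """Partially pool unit-level estimates towards an overall mean.

    Assumes y_k ~ N(theta_k, sampling_var_k) and theta_k ~ N(mu, tau2),
    with sampling_var and tau2 treated as known (a simplification).
    """
    y = np.asarray(y, dtype=float)
    sampling_var = np.asarray(sampling_var, dtype=float)
    # Marginally y_k ~ N(mu, sampling_var_k + tau2): precision-weighted mean.
    weights = 1.0 / (sampling_var + tau2)
    mu_hat = np.sum(weights * y) / np.sum(weights)
    # Conditional posterior mean of theta_k: weight on the unit's own data
    # is larger when its sampling variance is small relative to tau2.
    own_weight = tau2 / (tau2 + sampling_var)
    return own_weight * y + (1.0 - own_weight) * mu_hat, mu_hat


# Three hypothetical units; the third is estimated much less precisely.
y = np.array([0.0, 1.0, 4.0])
sampling_var = np.array([1.0, 1.0, 4.0])
pooled, mu_hat = shrink_estimates(y, sampling_var, tau2=1.0)
```

In this toy setting the imprecisely estimated third unit is pulled furthest towards the overall mean, which is the behaviour that lies between treating units independently and pooling them completely.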

Below we discuss how to extend the above models in hierarchical fashion by considering

elaboration of three different aspects of the basic model. Our focus will be on the full ecological

design, since this is a common design in environmental epidemiology and is the most challenging

statistically. However, similar hierarchical extensions to the individual and semi-ecological designs

are also possible.

2.2.1. Hierarchical modelling of baseline risk. The baseline risk of disease typically depends on the

age and sex of the individual, plus possibly other factors, for example, genetic susceptibility. The

baseline risk will therefore vary from individual to individual, a concept that is often termed ‘frailty’ in

the survival analysis literature. Baseline disease risk is also likely to vary from group to group. For

example, groups defined by small areas or socioeconomic categories are likely to share similar

lifestyle or cultural factors that may lead to between-group variations in $\lambda_{0g}$. Likewise, seasonal

effects may manifest as between-group variations in baseline risk if temporal groupings are used. If

data are available on any of these factors, these may be included as additional observed covariates in

the model. However, typically, many risk factors that influence baseline risk are unobserved, or are not

even known. Nonetheless, it may be reasonable to assume that the baseline risks are similar across

individuals or groups, in which case a hierarchical model specifying a joint probability distribution for

the unknown baseline across units is appropriate.

Figure 2a shows the part of the graph corresponding to a hierarchical model for baseline risk in the ecological study design. The parameters $\lambda_{0g}$ are usually interpreted as representing the inherent part of the risk due to the aging process as well as that of unobserved group-level risk factors. However, Knorr-Held and Besag (1998) show that this model also corresponds to the situation in which baseline risk varies within groups, providing that the individuals in each of the baseline risk ‘strata’ are distributed randomly within the group. If there is clustering (e.g. in space or time) of the factors influencing baseline risk within groups, then inclusion of a group level baseline risk parameter $\lambda_{0g}$ should be viewed as only a rough approximation to the ‘true’ model (see Wakefield et al., 2000, for further details). Specification of the joint probability distribution $p(\Lambda_0 \mid \phi_{\lambda})$, $\Lambda_0 = \{\lambda_{0g}\}_{g=1,\ldots,G}$, implied by the graph will be discussed in Section 3.2.

2.2.2. Hierarchical modelling of exposure risk. The effect of an environmental exposure on risk of

disease may be modified by a number of factors. For example, lifestyle factors such as diet or poor

housing conditions, or climatic factors such as temperature may interact with the exposure of interest,



leading to non-uniformity of the exposure–response relationship across values of the other factors. This

suggests that our statistical model should allow the coefficient representing risk of disease associated

with the exposure of interest to be different across groups. In general, such a model is not fully

identifiable unless repeated pairs of measurements of the exposure and response are available for each

group. In the absence of replicate data within groups, some authors have attempted to estimate group-specific coefficients $\beta^{*}_g$ by ‘borrowing strength’ across groups using a hierarchical model to induce similarity between the coefficients (King et al., 1999). In this case, the hierarchical model is used not only to improve the precision of the parameter estimates, but also to make the coefficients identifiable. The validity of the inference achieved therefore depends crucially on the appropriateness of the joint probability distribution assumed for the coefficients $\{\beta^{*}_g\}$. This issue is discussed further in Section 3.3.

A more satisfactory alternative may be to specify a priori subsets of groups $S_j \subset \{g;\ g = 1, \ldots, G\}$, $j = 1, \ldots, J < G$, that share a common value $\beta^{*}_j$ for the exposure coefficient, and

assume a hierarchical model for these coefficients across subsets. For example, if groups g represent

weeks or months, then it may be reasonable to allow the exposure coefficients to vary across quarters

or years. Such a model also depends crucially on the appropriateness of the prior choice of subsets but

has the advantage that the subset-specific coefficients for the exposure effect are fully identifiable,

since the exposure and response data for each group $g \in S_j$ form replicate pairs of measurements within the appropriate subset. Figure 2b shows the part of the graph corresponding to such a hierarchical model for the regression coefficient associated with exposure in the ecological study design. Specification of the joint distribution $p(\boldsymbol{\beta}^{*} \mid \phi_{\beta})$, $\boldsymbol{\beta}^{*} = \{\beta^{*}_j\}_{j=1,\ldots,J}$, implied by the graph is

discussed in Section 3.3.

2.2.3. Hierarchical modelling of exposure measurement error. Uncertainties in exposure assessment

remain one of the major constraints on studying environment–health relationships. The complex

pathways linking measurements of a particular environmental pollutant (say) to ambient concentrations, and in turn, to the internally absorbed (biological) dose present severe methodological challenges (see Briggs, 2000, for a discussion). This problem may be partly addressed by elaborating the graph to specify a hierarchical model relating the measured and ‘true’ exposures. Focusing on the ecological design, we might assume a joint probability distribution $p(Z \mid Z^{*}, \phi_z) \times p(Z^{*} \mid \phi^{*}_z)$ relating the observed and ‘true’ ecological exposures $Z$ and $Z^{*}$, respectively. Here $\phi_z$ represents the measurement error variances and covariances, while $\phi^{*}_z$ represents the mean and (co)variances of the distribution of the true ecological exposure across groups. Specification of this measurement error model is considered in Section 3.4, and the corresponding elaboration to the graphical model is shown in Figure 2c.

Figure 2. Graphical model illustrating hierarchical extensions to the basic ecological model of environment–health relationships: (a) hierarchical model for baseline risk; (b) hierarchical model of risk associated with exposure; (c) hierarchical model of exposure measurement error. The term $I_{g \in S_j}$ is an indicator variable of whether group $g$ belongs to subset $S_j$.

3. MODEL SPECIFICATION AT DIFFERENT LEVELS OF THE HIERARCHY

In the previous section, we have discussed the structural assumptions that underpin hierarchical models.

Here, we come to discuss the functional specifications of these models. In the context of studies of

health–environment effects, some functional specifications are straightforward and guided by the usual

considerations for expressing sources of variability in the generalized linear model framework. We

would like to focus our discussion on a number of non-standard aspects: the specification of aggregated

level rather than individual level dose–effect relationships, modelling of between-group variability of

baseline risk and risk due to exposure, and the need to build exposure measurement models. These

functional specifications will involve parameters common to all the groups, represented at the top of

Figure 2 by the nodes $\phi$. These parameters will themselves be given suitable prior distributions,

although we will not discuss these here. It is not our purpose to give details of the Bayesian estimation

framework that allows joint estimation of these parameters together with those which are group

specific. The necessary calculations are often numerically intractable, and so simulation approaches

based on MCMC techniques are generally the only feasible method. The reader is referred to Gilks et al.

(1996) and Green (2001) for further details. All the models discussed here may also be implemented

using the WinBUGS software (Spiegelhalter et al., 2001) for MCMC estimation.

3.1. Aggregating dose–effect relationship from the individual to the group level

Let us consider the following situation. At the individual level, we let $\lambda_{ig}$ be the risk that individual $i$ belonging to group $g$ and having received exposure $X_{ig}$ contracts a disease $D$. For ease of presentation, we assume a single exposure of interest. However, extension of the models discussed below to the case of multivariate exposures $X_{ig} = (X_{1ig}, X_{2ig}, \ldots, X_{pig})$ is straightforward. For an exposure $X$ that is common to all individuals in the group, as is the case in the semi-ecological model defined in Section 2.1.2, we have $X_{ig} = X_g$.

In general, $\lambda_{ig}$ is a function of the baseline risk and the exposure $X_{ig}$: $\lambda_{ig} = f(\lambda_{0ig}, X_{ig}, \beta)$. Note that here we have allowed the baseline risk to be individual specific. This could refer to intrinsic differences between individuals, in other words different frailties, and/or include the influence of latent (unmeasured) covariates. In many situations, the simplifying assumption that the baseline risk is constant over the group, i.e. that $\lambda_{0ig} = \lambda_{0g}$, is made.

As before, let Yg be the total number of cases in group g composed of Ng individuals i. The group g

could represent, for example, persons living in a specific location, like an electoral ward, over a period

of time. For the present, we do not need to state precisely how the group is defined with respect to

space and/or time.

In order to specify meaningfully a functional relation between the disease rate in the group, $E(Y_g)/N_g$, and a measure of exposure defined for the group, we investigate how an individual level model can be aggregated into a full ecological model. Several cases, corresponding to different functional forms for $f(\cdot)$, have been discussed in the literature. For each case, we express the aggregation problem in two forms. In the first one (semiparametric), we condition on the observed values of individual baseline risk and exposures and we simply aggregate by summing the risk $\lambda_{ig}$ over all the individuals in the group. In the second situation (model-based), we do not assume that we necessarily have individual level data observed. We model instead the joint distribution of baseline risk and exposure in the group: $p_g(\lambda_0, X)$. We thus derive the group disease rate by integrating $f(\lambda_0, X, \beta)$ with respect to that joint distribution. Throughout, we shall make the additional simplifying assumption that $\lambda_0$ and $X$ are independent, so that their joint distribution is simply the product $p_g(\lambda_0)\, p_g(X)$, where $p_g(\lambda_0)$ and $p_g(X)$ denote respectively the within-group distributions of $\lambda_0$ and $X$. Thus, we seek to evaluate

$$E(Y_g)/N_g = \iint f(\lambda_0, x, \beta)\, p_g(\lambda_0)\, p_g(x)\, d\lambda_0\, dx \tag{1}$$

for different functional forms of $f(\cdot)$.

3.1.1. Linear dose–effect relationship. Suppose that

$$f(\lambda_{0ig}, X_{ig}, \beta) = \lambda_{0ig} + \beta X_{ig} \tag{2}$$

where $\lambda_{0ig}$ represents the baseline risk of disease for individual $i$ and $\beta$ is the vector of regression coefficients of interest. Note that this formulation supposes that suitable constraints are operating on the range of $X_{ig}$ and $\beta$, so that the function specified in (2) still defines a risk. Then, we obtain that

$$E(Y_g) = \sum_{i \in g} \lambda_{0ig} + \beta \sum_{i \in g} X_{ig}$$

We can rewrite this as

$$E(Y_g) = N_g \left( \bar{\lambda}_{0g} + \beta \bar{X}_g \right) \tag{3}$$

where $\bar{\lambda}_{0g}$ represents the mean baseline risk and $\bar{X}_g$ equals a straightforward average of the individual exposures in group $g$. Thus, because of the linearity of Equation (2), the functional form of the aggregated dose–effect relationship is the same as the individual level one; in particular, the same coefficients $\beta$ relate both the individual level and the group exposure to the disease.

Alternatively, integrating (2) as in (1), we get

$$E(Y_g)/N_g = \int \lambda_0\, p_g(\lambda_0)\, d\lambda_0 + \beta \int x\, p_g(x)\, dx = E_g(\lambda_0) + \beta E_g(X) \tag{4}$$

a similar expression to (3).
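The exactness of the aggregation in the linear case is easy to confirm numerically. The following Python sketch uses simulated individual baseline risks and exposures for a single hypothetical group; the distributions, group size and coefficient value are arbitrary choices made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_g = 1000      # hypothetical number of individuals in group g
beta = 0.002    # hypothetical exposure coefficient

lambda_0 = rng.uniform(0.01, 0.02, size=n_g)  # individual baseline risks
x = rng.uniform(0.0, 5.0, size=n_g)           # individual exposures

# Semiparametric aggregation: sum the individual risks of Equation (2).
expected_cases = np.sum(lambda_0 + beta * x)

# Group-level form of Equation (3): means of baseline risk and exposure.
aggregated = n_g * (lambda_0.mean() + beta * x.mean())
```

The two quantities agree up to floating-point rounding, whatever the within-group distributions of baseline risk and exposure.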

3.1.2. Exponential dose–effect relationship. This is the most common form of dose–effect relationship used in epidemiology. We assume that

$$f(\lambda_{0ig}, X_{ig}, \beta) = \lambda_{0ig} \exp(\beta X_{ig}) \tag{5}$$

By summing (5) over the group, we obtain

$$E(Y_g) = \sum_{i \in g} \lambda_{0ig} \exp(\beta X_{ig}) \tag{6}$$

an expression that is not a simple function of $\exp(\beta \bar{X}_g)$. With reference to (1), we would obtain

$$E(Y_g)/N_g = \iint \lambda_0 \exp(\beta x)\, p_g(\lambda_0)\, p_g(x)\, d\lambda_0\, dx = E_g(\lambda_0)\, E_g(\exp(\beta X)) \tag{7}$$

again an expression that does not involve $E_g(X)$. To progress, different assumptions can be made.

(a) Small regression coefficient

When $\beta$ is small, we can linearize the exponential in (6) to obtain

$$E(Y_g) \approx \sum_{i \in g} \lambda_{0ig} + \beta \left( \sum_{i \in g} \lambda_{0ig} X_{ig} \right)$$

$$E(Y_g) \approx N_g \bar{\lambda}_{0g} \left( 1 + \beta \tilde{X}_g \right)$$

where $\tilde{X}_g = \left( \sum \lambda_{0ig} X_{ig} \right) / \sum \lambda_{0ig}$ is the baseline risk-weighted average exposure. Using again a first order approximation, we thus obtain

$$E(Y_g)/N_g \approx \bar{\lambda}_{0g} \exp\left( \beta \tilde{X}_g \right) \tag{8}$$

Comparing (5) and (8), we see that, to first order approximation, the regression coefficient $\beta$ also measures the effect of exposure at the group level, if the average group exposure is appropriately weighted. The equivalent linearization in (7) leads to

$$E(Y_g)/N_g \approx E_g(\lambda_0) \exp(\beta E_g(X)) \tag{9}$$

which has a form that again corresponds to expression (5). Note that (9) involves the unweighted group average exposure, $E_g(X)$, rather than the baseline-risk-weighted one because we have assumed independence between $\lambda_0$ and $X$.
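How small the coefficient needs to be for the first order approximation to hold can be explored numerically. The Python sketch below compares the exact sum in (6) with the approximation (8) for one simulated group; the helper name, the gamma-distributed exposures and the two coefficient values are assumptions made purely for illustration.

```python
import numpy as np


def exact_and_approx(lambda_0, x, beta):
    """Return the exact E(Y_g) of (6) and its first order approximation (8)."""
    exact = np.sum(lambda_0 * np.exp(beta * x))
    # Baseline risk-weighted average exposure.
    x_weighted = np.sum(lambda_0 * x) / np.sum(lambda_0)
    approx = lambda_0.size * lambda_0.mean() * np.exp(beta * x_weighted)
    return exact, approx


def relative_error(lambda_0, x, beta):
    exact, approx = exact_and_approx(lambda_0, x, beta)
    return abs(exact - approx) / exact


rng = np.random.default_rng(2)
lambda_0 = rng.uniform(0.001, 0.003, size=2000)  # hypothetical baseline risks
x = rng.gamma(shape=2.0, scale=1.0, size=2000)   # hypothetical exposures

err_small = relative_error(lambda_0, x, beta=0.05)
err_large = relative_error(lambda_0, x, beta=0.5)
```

With these invented inputs the approximation is close for the smaller coefficient and deteriorates as the coefficient grows, since the neglected terms are of second order in the coefficient and scale with the within-group spread of exposure.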

(b) Normally distributed exposure

If we now consider the case where $X$ is normally distributed with mean $\mu_g$ and variance $\sigma_g^2$, i.e. $p_g(X) = N(\mu_g, \sigma_g^2)$, then we obtain

$$E_g(\exp(\beta X)) = \exp\left( \beta \mu_g + 0.5\, \beta^2 \sigma_g^2 \right)$$

(Note that, in the case of multivariate exposures, the full variance–covariance matrix of the joint exposure distribution enters here.) In general, we can evaluate the expression $E_g(\exp(\beta X))$ for any distribution $p_g(X)$ for which the moment generating function (Laplace transform) is known explicitly, as was noted in Richardson et al. (1987). Substituting in (7) with $\mu_g = E_g(X)$, we obtain an aggregated form of (5) in the Gaussian case,

$$E(Y_g)/N_g = E_g(\lambda_0) \exp\left( 0.5\, \beta^2 \sigma_g^2 \right) \exp(\beta E_g(X)) \tag{10}$$

We see that in this case the aggregated form corresponding to (5) is also a function of the within-area variance of the exposure variable (or the within-area covariance matrix for multiple exposures). Neglecting this term thus leads to a biased functional relation at the group level. This bias is often referred to as specification bias (see Richardson and Monfort, 2000). Nevertheless, as discussed by Plummer and Clayton (1996), there are several situations where this bias can become negligible. This is the case if either (i) $\sigma_g^2$ is small or (ii) $\sigma_g^2$ hardly varies with $g$ and thus can be absorbed in a constant term. Essentially, for (i) to hold, the exposure has to be nearly uniform over the group, which is rarely the case, except if the group is fairly small, whilst it is difficult for (ii) to hold if the mean $\mu_g$ also varies between groups. Thus, it is important to have some knowledge of the within-area (co)variance of the exposure(s), and to input this into the specification of the dose–effect relationship at the group level. Note that our calculations in the Gaussian case can be easily extended to other within-area distributions for $X$ (Wakefield and Salway, 2001). However, in general, the moment generating function of $X$ will involve higher order moments of the within-area distribution of $X$, and these would be hard to estimate.
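The Gaussian result and the size of the neglected variance term can be checked by simulation. The Python sketch below uses arbitrary, hypothetical values for the coefficient and the within-group mean and standard deviation of exposure, and compares a Monte Carlo average with the closed form.

```python
import numpy as np

rng = np.random.default_rng(3)
beta, mu_g, sigma_g = 0.4, 2.0, 1.5   # hypothetical values
x = rng.normal(mu_g, sigma_g, size=1_000_000)

# Monte Carlo estimate of E_g(exp(beta * X)).
monte_carlo = np.exp(beta * x).mean()

# Closed form from the normal moment generating function.
closed_form = np.exp(beta * mu_g + 0.5 * beta**2 * sigma_g**2)

# Group-level relation that ignores the within-group variance.
naive = np.exp(beta * mu_g)

# Multiplicative factor lost when the variance term is neglected,
# i.e. exp(0.5 * beta^2 * sigma_g^2).
bias_factor = closed_form / naive
```

With these invented values the neglected factor is roughly 1.2, which would be absorbed into the constant term only if the within-group variance were the same in every group.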

3.2. Modelling the baseline risk

In this section, we discuss the modelling of the between-group variability of the average baseline risk:

$E_g(\lambda_0)$. To simplify the notation, we now let this random quantity be denoted by $\lambda_{0g}$, as in Section 2.

3.2.1. Exchangeability. The simplest form of between-group variability is to suppose that the $\lambda_{0g}$ are exchangeable between the groups, in other words that all the $\lambda_{0g}$ come from a common distribution and are independent. Since $\lambda_{0g}$ is positive, it is easier to work with its log-transform:

$$\log \lambda_{0g} \sim p(\phi_{\lambda}), \quad \text{independently for } g = 1, \ldots, G$$

Examples of commonly adopted parametric forms for $p(\phi_{\lambda})$ are a Gaussian or a Student t-distribution with a small chosen number of degrees of freedom; the latter distribution is advisable if outliers are suspected in the variability of the baseline risk. In both cases, $\phi_{\lambda}$ consists of a mean and a variance parameter, which will be given suitable weakly informative priors and estimated jointly with the $\{\lambda_{0g}\}$.

3.2.2. Spatial or temporal dependence. The simple exchangeable structure is appropriate if there is no

reason to suspect that some of the groups are more closely related than others in terms of their baseline

risk. In many ecological designs, though, the structure of the group renders this assumption

implausible. As mentioned previously, the groups often represent temporal or spatial units. In air

pollution studies, a common study design is to follow the population of one city on a daily, weekly or

monthly basis and to relate counts of chosen health events to levels of air pollution measured at

different monitoring stations. In geographical epidemiology studies, a particular geographical scale of

analysis is chosen first, like electoral wards or districts in the U.K., giving a set of predefined areas.

Then the map of health events per area cumulated over a time period is related to the spatial

distribution of environmental factors. Whether the units are indexed by time, $\mu_{0g} = \mu_{0t}$, or by space, $\mu_{0g} = \mu_{0s}$, it is clear that an appropriate dependence structure has to be built.



(a) Temporal structure

A flexible temporal structure for the $\log \mu_{0t}$ is to assume that they follow either a first order random walk, $\log \mu_{0t} = \log \mu_{0,t-1} + u_t$, or a second order random walk, $\log \mu_{0t} = 2 \log \mu_{0,t-1} - \log \mu_{0,t-2} + u_t$, where $u_t$ is assumed to be Gaussian white noise, i.e. independent and identically distributed Gaussian variables with mean 0 and variance $\sigma^2$. Both these models assume that neighbouring time points share some common component, the second order random walk being particularly well suited for predictions. The size of the variance $\sigma^2$ controls the smoothness of the temporal pattern, a small value corresponding to stronger time dependence. These models have been used in studies of time trends for cancer rates; see Knorr-Held and Besag (1998) and Fahrmeir and Lang (2001).
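The two random walks can be sketched numerically as follows. This is only an illustrative simulation, not part of any analysis in the paper: the function name, the zero starting values and the use of NumPy are assumptions made here.

```python
import numpy as np

def simulate_log_baseline(n_steps, sd, order=1, seed=0):
    """Simulate a log baseline risk path under a first or second order random walk.

    Hypothetical helper: the path is started from zero, so only its shape,
    not its level, is meaningful.
    """
    if order not in (1, 2):
        raise ValueError("order must be 1 or 2")
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sd, size=n_steps)  # Gaussian white noise
    path = np.cumsum(u)                    # first differences equal u
    if order == 2:
        path = np.cumsum(path)             # second differences equal u
    return path, u

# A small innovation standard deviation gives a smooth path, a large one a rough path.
smooth_path, _ = simulate_log_baseline(200, sd=0.01, order=1, seed=2)
rough_path, _ = simulate_log_baseline(200, sd=1.0, order=1, seed=2)
```

Cumulating the noise once reproduces the first order recursion, and cumulating it twice reproduces the second order one, which is why the second order walk yields visibly smoother trends for the same innovation variance.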

(b) Spatial structure

A spatial structure for $\log \mu_{0s}$ can be built along similar lines. Instead of using the natural time ordering, it relies on defining a neighbourhood structure between areas $s$ and $s'$, symbolized as $s \sim s'$, a common choice being to say that $s \sim s'$ when the areas $s$ and $s'$ are contiguous. Then, a commonly used model of spatial dependence, referred to as an intrinsic or conditional autoregressive (CAR) model, specifies the distribution of $\eta_s = \log \mu_{0s}$ by

$\eta_s \mid \eta_{s'}, s' \neq s \sim N(\bar{\eta}_s, \tau^2 / n_s)$

where $\tau^2$ is an unknown variance parameter, $\bar{\eta}_s = \sum_{s' \sim s} \eta_{s'} / n_s$, and $n_s$ denotes the number of neighbours of area $s$. The CAR model has been extensively used in disease mapping studies concerned with rare diseases after its introduction by Besag et al. (1991). The resulting estimates of the $\{\mu_{0s}\}$ borrow strength from the neighbouring areas and are smoothed towards a local mean. Note that to account for spatial and non-spatial structure in the $\{\log \mu_{0s}\}$, one could model their distribution as the sum of a CAR and an exchangeable model, as suggested by Besag et al. (1991) in the disease mapping context. The associated graph is a chain graph containing both directed and undirected links (see Spiegelhalter et al. (1995) and Bernardinelli et al. (1997) for illustrations).
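The conditional specification above can be made concrete with a small numerical sketch. The adjacency matrix, the values of the spatial effects and the function name are all hypothetical, chosen only to show how the local mean and the conditional variance are obtained from the neighbourhood structure.

```python
import numpy as np

def car_conditional(values, adjacency, s, variance):
    """Conditional mean and variance of the spatial effect in area s
    given its neighbours, under the intrinsic CAR model.

    `adjacency` is a symmetric 0/1 matrix with adjacency[s, k] = 1
    when areas s and k are contiguous (hypothetical input format).
    """
    neighbours = np.flatnonzero(adjacency[s])
    n_s = neighbours.size
    if n_s == 0:
        raise ValueError("area has no neighbours; the conditional is undefined")
    return values[neighbours].mean(), variance / n_s

# Four areas arranged on a line: 0 - 1 - 2 - 3 (illustrative values only).
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]])
values = np.array([0.0, 1.0, 2.0, 4.0])
mean_2, var_2 = car_conditional(values, adjacency, s=2, variance=0.5)
```

Area 2 has two neighbours, so its conditional mean is the average of their values and its conditional variance is half the variance parameter; an area with more neighbours is therefore smoothed more strongly towards its local mean.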

3.3. Modelling the effect of exposure across groups

We now turn to a discussion of models for between-group variability in the effect of exposure on

disease risk. As noted in Section 2.2.2, if we wish to allow group-specific exposure coefficients $\beta^*_g$ in ecological models, it becomes important to specify some form of hierarchical prior distribution for the $\{\beta^*_g\}$ to attempt to improve the identifiability of the model. Recent work on ecological inference in

sociological applications may provide some useful insight on this issue.

King et al. (1999) and Wakefield (2001) consider estimation of the cell probabilities of $2 \times 2$ tables where only the table margins are observed. In the present context, this corresponds to a scenario in which each $2 \times 2$ table represents a group $g$, with columns and rows representing disease status and a binary exposure, respectively (see Table 1). If individual-level data (equivalent in this case to the cell-specific counts $n_{\cdot\cdot g}$) are available, then estimation of the group-specific probabilities $\pi_{0g} = \Pr(\text{disease} \mid \text{unexposed})$ and $\pi_{1g} = \Pr(\text{disease} \mid \text{exposed})$ is straightforward, from which it is possible to obtain an exposure coefficient for individuals within group $g$, i.e. $\beta^*_g = f(\pi_{1g}) - f(\pi_{0g})$, where $f(\cdot)$ is an appropriate link function (usually log or logit). Note also that $\pi_{0g}$ corresponds to what we have termed the baseline risk of disease in group $g$, denoted $\mu_{0g}$ in previous sections. In ecological studies we only observe the margins of the underlying $2 \times 2$ table (i.e. total number of disease cases $Y_g$, total proportion exposed $Z_g$, and total population at risk $N_g$). Nonetheless, depending on the particular data set, there may be some information in the margins to allow the construction of bounds on the proportion of diseased individuals who are or are not exposed. For example, Wakefield (2001) cites the case in which one of the table margins is near zero. That is, if most or all individuals in a group are exposed (i.e. $Z_g \to 1$), then knowledge of the total number of diseased and disease-free individuals in the group provides considerable information about $\pi_{1g}$ (although virtually no information about $\pi_{0g}$). Likewise, if most or all individuals in a group are unexposed (i.e. $Z_g \to 0$), considerable information is available about $\pi_{0g}$ but not $\pi_{1g}$. In the worst case scenario, if exactly half the individuals in a group have the disease, and exactly half are exposed, then the margins contain no information about $\pi_{0g}$ and $\pi_{1g}$ since any combination of cell counts $n_{\cdot\cdot g}$ yielding row and column sums equal to $N_g/2$ is possible (Wakefield, 2001).
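The bounds implied by the margins can be computed directly. The sketch below is a simple counting argument written for illustration, with hypothetical function and argument names; it is not the estimation procedure of King et al. (1999) or Wakefield (2001).

```python
def margin_bounds(n_total, n_cases, n_exposed):
    """Bounds on Pr(disease | unexposed) and Pr(disease | exposed)
    implied by the margins of a single 2 x 2 table.

    Returns two (lower, upper) pairs; a pair is None when the
    corresponding exposure row is empty.
    """
    n_unexposed = n_total - n_exposed
    # The number of exposed cases is only known to lie in this range.
    low_exposed_cases = max(0, n_cases - n_unexposed)
    high_exposed_cases = min(n_cases, n_exposed)

    exposed = None
    if n_exposed > 0:
        exposed = (low_exposed_cases / n_exposed,
                   high_exposed_cases / n_exposed)
    unexposed = None
    if n_unexposed > 0:
        unexposed = ((n_cases - high_exposed_cases) / n_unexposed,
                     (n_cases - low_exposed_cases) / n_unexposed)
    return unexposed, exposed

# Nearly everyone exposed: narrow bounds for the exposed, none for the unexposed.
informative = margin_bounds(n_total=100, n_cases=30, n_exposed=95)
# Half diseased, half exposed: both sets of bounds are the whole interval [0, 1].
worst_case = margin_bounds(n_total=100, n_cases=50, n_exposed=50)
```

With 95 of 100 individuals exposed and 30 cases, the exposed probability is confined to the interval from 25/95 to 30/95 while the unexposed probability can be anything between 0 and 1, which mirrors the asymmetry described in the text.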

In order to facilitate estimation of $\pi_{0g}$ and $\pi_{1g}$ (and hence, in our case, $\beta^*_g$) for tables (groups) containing little information, King et al. (1999) propose a hierarchical model. The idea is to borrow strength across groups where the proportion of exposed individuals is close to 1 to estimate $\pi_{1g}$ for all $g = 1, \ldots, G$, and to borrow strength across groups where the proportion of exposed individuals is close to 0 to estimate $\pi_{0g}$ for all $g = 1, \ldots, G$. King et al. (1999) assume independent beta distributions for the joint distributions of $\pi_{jg}$, $j = 0, 1$; $g = 1, \ldots, G$. Alternatively, Wakefield (2001) considers both independent normal and Student t priors for the joint distributions of the logit-transformed probabilities $\pi_{jg}$, $j = 0, 1$; $g = 1, \ldots, G$, and discusses the extension to a bivariate normal prior distribution to incorporate dependence between $\pi_{0g}$ and $\pi_{1g}$. In the present context, it seems more natural to specify the hierarchical model directly on the regression coefficients $\beta^*_g$ and the baseline risks $\mu_{0g}$ than on the probabilities $\pi_{0g}$ and $\pi_{1g}$. For example, Assuncao et al. (2001) assume independent CAR distributions (see Section 3.2.2) for the $\{\mu_{0g}\}$ and $\{\beta^*_g\}$ in a spatial regression

model of human fertility rates (where g represents small areas). However, the scenario they consider is

a special case in which the data for each area (including the ‘exposure’ of interest) are available by

age-group. Hence there is replication of exposure and response data within areas, leading to a fully

identified model. Whether such a model would be estimable more generally is not clear.

In general, although the work by King et al. (1999) and Wakefield (2001) indicates that ecological

data may contain some information by which to estimate random coefficient models for categorical

exposures, it is not clear that this will be sufficient to derive useful estimates of group-varying effects

of environmental exposures on disease risk. It is rare to find environmental epidemiological examples

with close to 100 per cent of individuals in a group exposed or vice versa (the most information-rich

situations), so even using a hierarchical model to borrow strength across groups is unlikely to produce

reliable estimates of the group-specific regression coefficients. The examples presented by both

Wakefield and King et al. also suggest that the probabilities for some groups may be sensitive to the

choice of prior distribution at the second level. Furthermore, Wakefield notes that the case of a continuous exposure is even more difficult since knowledge of the average exposure for a group does not provide any information about the distribution of individual exposure values within that group.

Table 1. $2 \times 2$ table summarizing the potential data available on disease status and a binary exposure for group $g$. In an ecological study only the margins are observed

             No disease    Disease      Total
Unexposed    $n_{00g}$     $n_{10g}$
Exposed      $n_{01g}$     $n_{11g}$    $N_g Z_g$
Total                      $Y_g$        $N_g$

As noted in Section 2.2.2, a pragmatic solution to the problem of estimating group-varying effects of

exposure is to introduce an additional level of aggregation into the model and allow the exposure

coefficients to vary at this coarser resolution. Denoting the subsets of groups by $S_j \subset \{g; g = 1, \ldots, G\}$, $j = 1, \ldots, J < G$, as before, we may model the variability between exposure coefficients by assuming a joint distribution $p(\boldsymbol{\beta}^* \mid \phi_\beta)$ for $\boldsymbol{\beta}^* = \{\beta^*_j\}$. Appropriate distributional choices include the

exchangeable, random walk (for temporally defined subsets) and CAR (for spatially defined subsets)

distributions discussed in Section 3.2 in the context of modelling the baseline risk. This approach is

illustrated in Section 3.6, where we briefly discuss an example by Dominici et al. (2000). These authors

consider a hierarchical model with groups defined by time (daily counts of mortality and mean air

pollution concentration) nested within space (20 large cities in the U.S.A.). Each city thus represents a

spatially defined subset of temporal groups, and the authors then assume a hierarchical model for the

city-specific regression coefficients, representing risk of mortality associated with specific air pollutants.

If we are unwilling to assign groups to subsets a priori, a more flexible approach is to assume a

mixture model for the joint distribution of the group-specific coefficients. Such a model also assumes

that the groups $g$ belong to subsets $S_j$, $j = 1, \ldots, J \leq G$, with a common exposure effect $\beta^*_j$ for all groups $g \in S_j$. However, the number and composition of the subsets $S_j$ are assumed to be unknown a priori, and are instead estimated as part of the model. Such an approach has been used by Hurn et al.

(2002) for various applications, including to estimate the effect of daily ambient nitrogen dioxide

concentrations on risk of hospital admissions for circulatory and respiratory diseases.

3.4. Measurement error model between group average and ecological measures of environmental exposure

Whatever the form (4), (9) or (10) of the disease–exposure relationship at the group level, we have seen that the average group exposure $E_g(X)$ is involved. In the majority of studies, there is not enough information to estimate $E_g(X)$ for each group, and this random quantity is commonly replaced by a group level ecological surrogate, $Z_g$. For example, $Z_g$ could represent a recorded measure of ambient air pollution or a level of chlorine in drinking water supplied to a city or region. Thus, the investigation of the health effect of a specific environmental exposure relies in many cases on linking $E(Y_g)$ to $Z_g$ by a coefficient $\beta^*$ (see Figure 2), whereas the true interest lies in the more interpretable coefficient $\beta$ linking $E(Y_g)$ to $E_g(X)$ (modulo the within-area variability).

Modelling the different sources of variability between the recorded measures Zg of environmental

exposure and the relevant individual or group averaged exposure is a key component of any statistical

analysis of environmental effects. Such analyses typically involve building several model components

expressing links between individual $X_{ig}$ and true environmental exposure $Z^*_g$, and between measured $Z_g$ and true $Z^*_g$ environmental exposure, and combining these sub-models to derive the link between $E_g(X)$ and $E(Y_g)$.

Misclassification of exposure has long been recognized as a limitation of many epidemiological studies. Indeed, measurement error can distort the quantification of the dose–effect relationship investigated. The extent and the nature of the distortion depend on several factors including the study design, the type of error and the relationship between the outcome and the covariates; see, for example, the review by Thomas et al. (1993) or the general Bayesian framework outlined in

Richardson and Gilks (1993). Several types of error models have been traditionally considered. In

the classical measurement error formulation, the conditional distribution of the surrogate exposure



given the true exposure is specified. In doing this, one is essentially characterizing the measuring

instrument. In the Berkson error model, it is instead the conditional distribution of the true exposure

given the surrogate exposure that is specified. For example, if $Z^*_g$ is the ‘true’ mean ambient concentration of a particular pollutant in a city on a specific day and $Z_g$ are the recorded concentrations at one or several monitoring stations, then it might be appropriate to formulate a classical error model $p(Z_g \mid Z^*_g, \phi_z)$, where $\phi_z$ quantifies both the sensitivity of the recording instrument and the network coverage of the monitoring stations (see Zeger, 2000, for a detailed discussion of measurement error models in the air pollution context). On the other hand, the model between the exposure $X_{ig}$ of an individual to this pollutant and the true mean ambient concentration $Z^*_g$ is more of the Berkson type and would be formulated using a physiologically based dose absorption model and allowing for the individual’s activity patterns. In this case, it is $p(X_{ig} \mid Z^*_g, \theta)$, the conditional distribution of $X_{ig}$ given $Z^*_g$, that is modelled, where $\theta$ are the dose absorption parameters. From the discussion above, one can see that the relationship between $E_g(X)$ and $Z^*_g$ is a combination of the measurement error model relating

monitored and true ambient exposures, the physiologically based dose absorption model and some

group measure of time-activity data aiming to characterize the proportion of time spent in different

ambient environments (e.g. outdoors, in traffic, in the home, etc.). Thus it combines both classical and

Berkson components, each component requiring careful specification. Cases of measurement error

situations combining both Berkson and classical features have been discussed in the occupational

context of job-exposure matrices by Gilks and Richardson (1992), and Richardson (1996), and in the

environmental context of radon exposure by Reeves et al. (1998).
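The practical difference between the two error structures can be seen in a deliberately simplified simulation. The sketch assumes a linear Gaussian outcome model rather than the Poisson log-linear models of this paper, and every name and numerical value in it is hypothetical.

```python
import numpy as np

def fitted_slope(x, y):
    """Least-squares slope of y regressed on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def compare_error_models(n=200_000, beta=1.0, sd_exposure=1.0, sd_error=1.0, seed=0):
    """Slopes obtained when the surrogate replaces the true exposure
    under classical and under Berkson error (linear model, illustration only)."""
    rng = np.random.default_rng(seed)

    # Classical error: surrogate = true exposure + noise.
    x_true = rng.normal(0.0, sd_exposure, n)
    z_classical = x_true + rng.normal(0.0, sd_error, n)
    y_classical = beta * x_true + rng.normal(0.0, 0.5, n)

    # Berkson error: true exposure = surrogate + noise.
    z_berkson = rng.normal(0.0, sd_exposure, n)
    x_berkson = z_berkson + rng.normal(0.0, sd_error, n)
    y_berkson = beta * x_berkson + rng.normal(0.0, 0.5, n)

    return (fitted_slope(z_classical, y_classical),
            fitted_slope(z_berkson, y_berkson))

slope_classical, slope_berkson = compare_error_models()
```

With equal exposure and error variances, the classical slope is attenuated to roughly half the true coefficient, while the Berkson slope stays close to it in this linear setting. In non-linear models, such as the log-linear ones considered here, Berkson error is not necessarily harmless, which is one reason the text insists on specifying each component carefully.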

To estimate the parameters of these models, quantitative information on the measurement error has

to be introduced. This information may come from different sources: a priori external information on

the measurement instrument or purpose built validation or replication sub-studies. For example, the

relationship between indoor and outdoor air pollution levels can be studied if recording of the indoor

pollution is made in a representative sample of houses. Moreover, to quantify the link between group

exposure and environmental exposure, specifically designed surveys collecting time-activity data in

order to characterize the proportion of time spent in different environments for different age-groups

and categories of occupations will be needed. The information contributed by these sub-studies can

then be built as an additional part of the graph of the hierarchical model and provide the necessary

information to identify the relevant parameters of the measurement error model. For example, Figure 3

shows how part (c) of the graph in Figure 2 could be extended to incorporate data from a sub-study on

replicate measurements of ambient pollution concentrations in different environments, and data from a

time-activity sub-study measuring the amount of time spent by a sample of individuals in the various

environments. This illustrates how hierarchical models may be used to help construct more realistic

models of the complex relationships involved in studying the effect of the environment on health.

3.5. Summary of hierarchical structures

In summary, in full ecological designs involving counts of cases of a rare disease and a single environmental exposure, the following core 2-level hierarchy is usually specified:

$Y_g \sim \text{Poisson}(\mu_g)$
$\log(\mu_g) = \log(\mu_{0g}) + \beta^*_j Z^*_g$

accompanied by hierarchical substructures describing the variability of $\log(\mu_{0g})$, $\beta^*_j$ and $Z^*_g$.



For the baseline risk, we have described three different structures, the choice being dictated by the underlying dependence structure which is related to the constitution of the groups:

$\log(\mu_{0g}) \sim p(\phi_\mu)$, independently for $g = 1, \ldots, G$ (exchangeability)

or, when $g = t$:

$\log(\mu_{0t}) = \log(\mu_{0,t-1}) + u_t$, $u_t \sim p(\phi_\mu)$, independently (first order random walk)

or, when $g = s$:

$\log(\mu_{0s}) = \eta_s + u_s$, $\eta_s \sim$ spatial CAR model, $u_s \sim p(\phi_\mu)$, independently

For modelling the group-varying effect of exposure, we have described an additional level in terms of the subsets $S_j \subset \{g; g = 1, \ldots, G\}$, $j = 1, \ldots, J < G$: $\{\beta^*_j\} \sim p(\phi_\beta)$, in order to be able to identify the $\{\beta^*_j\}$. $p(\phi_\beta)$ may take the form of an exchangeable, random walk or spatial CAR distribution as appropriate, or, if the composition and number of subsets $S_j$ is unknown a priori, a mixture distribution may be specified.

To allow for measurement error between the true environmental level of the covariates $Z^*_g$ and the recorded levels $Z_g$, and/or the average group exposure $E_g(X)$, we have described a classical error model $p(Z_g \mid Z^*_g, \phi_z)$ and a combined Berkson and classical model $p(E_g(X) \mid Z^*_g, \theta, T_g)$.

Figure 3. Extension of part of the graphical model to include information from sub-studies to improve estimation of the relationship between $E_g(X)$ and $Z_g$. $Z_{ger}$ represents replicate measurements $r$, and $Z^*_{ge}$ the true value, of the ambient pollutant concentration in different environments $e$ within group $g$; $T_{ige}$ represents measurements of the amount of time individual $i$ in group $g$ spends in environment $e$; the dashed arrow between $X_{ige}$ and $E_g(X)$ represents a deterministic dependence as opposed to the stochastic relationships implied by the solid arrows.

3.6. Combined analyses of several data sets

In this section, we give an illustration of hierarchical modelling of health–environmental effects in the context of air pollution. In this domain, it is important to be able to combine the analyses of several data sets to increase the power of the analyses. Indeed, the relative risks associated with environmental exposure are often small and the results of separate studies will be associated with large uncertainties.

Combined analyses can be done straightforwardly by extending the hierarchical framework discussed

in previous sections to include a model of between-datasets variability. The resulting estimates will be

different from a simple pooled estimate that assumes no variability between studies. To illustrate the

power of the hierarchical modelling strategy, we discuss in this section a recent study published by

Dominici et al. (2000) that concerns the effect of air pollution on mortality. Specifically, the authors

analysed time series of daily air pollution and mortality in the 20 largest U.S. cities, using a two-stage

model building strategy:

1. Building the time series model for each city. Here, the data consist of the daily total number of

deaths $Y_t$ for an age group (the analysis is repeated for several age groups, but for simplicity of exposition, we do not include an age index). The data are thus aggregated by day, $t$, and will be further indexed by the city, $s$. The covariates of interest are the air pollution levels $Z_{ts} = (Z_{1ts}, Z_{2ts})$, where $Z_{1ts}$ is the recorded level of particulates (PM10) and $Z_{2ts}$ is the recorded ozone (O3) level. The aim is to study the

short term effect of these pollutants on mortality and not the chronic effects. Thus, there is a need to

account for time trends and seasonal patterns in mortality to avoid confounding the short term effects

by the long term trends that might be due to changes in the characteristics of the population. The

authors chose to model these effects by a flexible function of time, which we will denote globally by $S_{ts}$ and not detail further. Thus, for each city $s$, the following model is specified:

$Y_{ts} \sim \text{Poisson}(\mu_{ts})$
$\log(\mu_{ts}) = S_{ts} + \boldsymbol{\beta}^{*T}_s Z_{ts} + u_t$

where $u_t$ is Gaussian white noise, the time dependence in the daily counts having been absorbed in the flexible function of time $S_{ts}$.

If, as discussed previously, external information is available on the link between $Z_{ts}$ and the average group exposure, this information can be included at this stage.

2. Building the between city model for combining the data. A pooled analysis of the effect of air

pollution on mortality for the 20 cities would assume that the effects quantified by $\beta^*_s$ are the same for all the cities, $\beta^*_s = \beta^*$ for all $s$, and combine the separate estimates with weights inversely proportional to their variance. This simplifying assumption can be misleading as there are many sources of variability between the cities that can create variability of the $\beta^*_s$.

In a hierarchical framework, this variability is explicitly modelled; in particular, site-specific

explanatory variables Cs such as the percentage of people living in poor socio-economic conditions or

even the average pollution level in the study period are taken into account. The following model is thus

adopted at the next hierarchical level:

$\beta^*_s \sim N(\alpha^* + C_s, \Sigma)$, independently for each $s$    (11)

In (11), the intercept $\alpha^*$ represents a synthesis of the information from the different cities, and estimates the overall effects of air pollution on mortality after accounting for within and between site confounders. Moreover, the between-city variability $\Sigma$ is also quantified in the hierarchical analysis, and this is of interest per se. Dominici et al. (2000) further extend (11) to a multivariate normal distribution for $\boldsymbol{\beta}^* = \{\beta^*_s\}$ with spatially structured covariance matrix $\Sigma$ allowing the correlation between cities $s$ and $s'$ to depend on distance. Besides giving a sensible estimate of the overall effects, the hierarchical framework also leads to improved estimation of the parameters $\beta^*_s$ for each city

through shrinkage and borrowing of strength between the data sets. In their combined study, Dominici

et al. (2000) found an overall short term effect of PM10 after adjustment for within and between city

covariates.
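The contrast between pooling and hierarchical shrinkage can be sketched with a reduced version of the second-stage model: a single pollutant, no city-level covariates, and a between-city variance treated as known. These simplifications, the function name and the numbers are assumptions made for illustration; Dominici et al. (2000) fit the full model by Bayesian methods rather than by this plug-in calculation.

```python
import numpy as np

def pool_and_shrink(estimates, variances, between_var):
    """Overall effect and shrunken city-specific effects under a
    normal-normal two-stage model with known between-city variance.

    With between_var = 0 the overall effect is the usual pooled estimate
    weighted by inverse variance, and every city is shrunk onto it.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = 1.0 / (variances + between_var)
    overall = np.sum(weights * estimates) / np.sum(weights)
    # Share of each city estimate that is pulled towards the overall effect.
    shrinkage = variances / (variances + between_var)
    city_effects = shrinkage * overall + (1.0 - shrinkage) * estimates
    return overall, city_effects

# Two hypothetical cities with equally precise first-stage estimates.
overall, city_effects = pool_and_shrink([0.2, 0.6], [0.01, 0.01], between_var=0.01)
pooled, pooled_cities = pool_and_shrink([0.2, 0.6], [0.01, 0.01], between_var=0.0)
```

When the between-city variance equals the sampling variance, each city estimate moves halfway towards the overall effect; imprecise cities are pulled further and precise ones less, which is the borrowing of strength described above.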

4. DISCUSSION

There are many benefits of using a hierarchical Bayesian modelling strategy when faced with the

complex problem of studying health–environment effects. We regroup these under the following

headings, but note that there is some obvious intersection between these: (a) modular model

elaboration; (b) integration of different sources of information; (c) coherent propagation of

uncertainty; (d) borrowing of strength; (e) integrated treatment of information at different levels.

(a) The current trend of increasing sophistication and deployment of measurement instruments for quantifying environmental exposure has led to increasing availability of more abundant and better quality data. It is important that the analyst uses a framework where each new type of data, for example a new series of indoor measures of air pollution, can be treated in a modular fashion. This is exactly what hierarchical model building provides. Suppose that there is already a core model that can be represented by a DAG and that a new type of data becomes available. Firstly, a separate model is built to account for the specific characteristics of the new type of data. Subsequently, this ‘module’ is linked to the rest of the variables by extending the DAG of the original model around the pivotal variables that are common to the ‘module’ and the original graph. In our example of indoor pollution, it is the unobserved individual exposure $X_{ig}$ that is pivotal both to the core model and to the module describing the indoor pollution variability, as was illustrated in Figure 3.

(b) A related benefit of the modularity just described is that it renders possible the simultaneous integration of different sources of information. Indeed, by always building a joint model of all the variables that encompasses any number of different modules, all sources of information are integrated to contribute to the estimation of the dose–effect relationships of interest. For example, in the air pollution context, separate sub-studies might be available: (i) linking personal (badge-measured) and indoor exposure, (ii) relating indoor and outdoor exposure, and (iii) quantifying the proportion of time spent outdoors for different categories of people, all contributing to building the overall model for studying the effect of environmental air pollution on health at the group level.

(c) In parallel, the joint hierarchical model leads automatically to a correct propagation of all sources of uncertainty that have been quantified in each ‘module’ onto the estimation of the parameters of interest. We stress that this is not the case for methods that would proceed by ‘substitution’, for instance by replacing an unknown exposure $X_{ig}$ by an estimated one $\hat{X}_{ig}$ using an equation calibrated in a sub-study. In the present framework, unknown exposures are treated as random variables, their distribution is informed by the different sub-studies to which they are related and the associated uncertainty is then propagated to the distribution of the coefficients $\beta$ or $\beta^*$.

(d) One important aspect of hierarchical models that has been widely discussed in the previous sections is that they allow the borrowing of strength between different data sets, thus leading to improved and more stable estimates of the parameters of interest. The simplest model for borrowing strength is the exchangeable model, but we have also discussed several extensions that permit flexible modelling of dependence between the data sets, dependence that can extend in time, in space or through the sharing of a higher level grouping structure. This is an active area of research at the moment with natural extension to space–time or space–time–activity models.

(e) In some cases, health data might be available at an individual level, while contextual or environmental exposure variables are expressed at the aggregated level of geographical units. Thus, there is a need to build a framework that can, in principle, accommodate data observed at different scales. One framework that is useful for that purpose is that of point process models. In spatial epidemiology, one might consider that the location of the individuals at risk is being modelled by a baseline demographic process, on which is superimposed a disease process that ‘picks’ out the cases with a probability that is dependent both on individual level risk factors and area-level variables measuring individual exposure (Richardson, 2002). In a study of the effect of traffic pollution on respiratory disorders of children carried out by Best et al. (2000), a point process is incorporated in a hierarchical model, making it possible to account both for individual risk factors of the children and for environmental levels of exposure to air pollution. There is much scope for further work in this direction.

ACKNOWLEDGEMENTS

The authors would like to acknowledge support from the U.K. Medical Research Council (Career Establishment Grant G9803841) and the U.K. Small Area Health Statistics Unit. They are grateful to their colleagues, Peter Green, David Spiegelhalter and Jon Wakefield for stimulating discussions. SR thanks the organizers of the ISCEP conference for the invitation to speak.

REFERENCES

Assuncao RM, Potter JE, Cavenaghi SM. 2001. A Bayesian space varying parameter model applied to estimating fertility schedules. Technical Report, Departamento de Estatistica, UFMG, Brazil.

Bernardinelli L, Pascutto C, Best NG, Gilks WR. 1997. Disease mapping with errors in covariates. Statistics in Medicine 16: 741–752.

Besag J, York J, Mollie A. 1991. Bayesian image restoration, with two applications in spatial statistics (with discussion). Annals of the Institute of Statistical Mathematics 43: 1–59.

Best N, Ickstadt K, Wolpert R. 2000. Spatial Poisson regression for health and exposure data measured at disparate resolutions. Journal of the American Statistical Association 95: 1076–1088.

Breslow N. 1990. Biostatistics and Bayes (with discussion). Statistical Science 5: 269–298.

Briggs DB. 2000. Exposure assessment. In Spatial Epidemiology: Methods and Applications, Elliott P, Wakefield JC, Best NG, Briggs DB (eds). Oxford University Press: Oxford, UK; 335–359.

Dominici F, Samet JM, Zeger SL. 2000. Combining evidence on air pollution and daily mortality from the 20 largest US cities: a hierarchical modelling strategy. Journal of the Royal Statistical Society, Series A 163(3): 263–302.

Fahrmeir L, Lang S. 2001. Bayesian inference for generalized additive mixed models based on Markov random field priors. Journal of the Royal Statistical Society, Series C 50(2): 201–220.

Gelman A, Carlin JB, Stern HS, Rubin DB (eds). 1995. Bayesian Data Analysis. Chapman & Hall: London, UK.

Gilks W, Richardson S. 1992. Analysis of disease risks using ancillary risk factors, with application to job–exposure matrices. Statistics in Medicine 11: 1443–1463.

Gilks WR, Richardson S, Spiegelhalter DJ (eds). 1996. Markov chain Monte Carlo in Practice. Chapman & Hall: London, UK.

Goldstein H (ed.). 1995. Multilevel Models in Educational and Social Research, 2nd edn. Arnold: London, UK.

Good I. 1987. Hierarchical Bayesian and empirical Bayesian methods (with discussion). American Statistician 41.

Green PJ. 2001. A primer on Markov chain Monte Carlo. In Complex Stochastic Systems, Barndorff-Nielsen OE, Cox DR, Kluppelberg K (eds). Chapman & Hall: London, UK.

Greenland S. 1992. Divergent biases in ecologic and individual-level studies. Statistics in Medicine 11: 1200–1223.

Hurn M, Justel A, Robert CP. 2002. Estimating mixtures of regressions. J. Comput. Graph. Stat. (to appear).

King G, Rosen O, Tanner MA. 1999. Binomial-beta hierarchical models for ecological inference. Sociological Methods and Research 28: 61–90.

Knorr-Held L, Besag J. 1998. Modelling risk from a disease in time and space. Statistics in Medicine 17: 2045–2060.

Lauritzen SL. 1996. Graphical Models. Clarendon Press: Oxford, UK.

Morris CN, Normand SL. 1991. Hierarchical models for combining information and for meta-analyses. In Bayesian Statistics IV, Bernardo JM, Berger JO, Dawid AP, Smith AFM (eds). Oxford University Press: Oxford, UK; 321–344.

Plummer M, Clayton D. 1996. Estimation of population exposure in ecological studies (with discussion). Journal of the Royal Statistical Society, Series B 58: 113–126.

Reeves G, Cox D, Darby S, Whitley E. 1998. Some aspects of measurement error in explanatory variables for continuous and binary regression models. Statistics in Medicine 17: 2157–2177.

Richardson S. 1996. Measurement error. In Markov chain Monte Carlo in Practice. Chapman & Hall: London, UK; 401–417.

Richardson S. 2002. Spatial models in epidemiological applications. In Highly Structured Stochastic Systems, Green P, Hjort N, Richardson S (eds). Oxford University Press: Oxford, UK (in press).

Richardson S, Gilks W. 1993. Conditional independence models for epidemiological studies with covariate measurement error. Statistics in Medicine 12: 1703–1722.

Richardson S, Monfort C. 2000. Ecological correlation studies. In Spatial Epidemiology: Methods and Applications, Elliott P, Wakefield J, Best N, Briggs D (eds). Oxford University Press: Oxford, UK; 205–220.

Richardson S, Stucker I, Hemon D. 1987. Comparison of relative risks obtained in ecological and individual studies: some methodological considerations. International Journal of Epidemiology 16: 111–120.

Rosenberg MA, Andrews RW, Lenk PJ. 1999. A hierarchical Bayesian model for predicting the rate of nonacceptable in-patient hospital utilization. J. Bus. Econ. Stat. 17: 1–8.

Spiegelhalter DJ. 1998. Bayesian graphical modelling: a case study in monitoring health outcomes. Applied Statistics 47(1): 115–133.

Spiegelhalter DJ, Thomas A, Best NG. 1995. Computation on Bayesian graphical models. In Bayesian Statistics 5, Bernardo JM, Berger JO, Dawid AP, Smith AFM (eds). Oxford University Press: Oxford, UK; 407–425.

Spiegelhalter DJ, Thomas A, Best NG, Lunn D. 2001. WinBUGS Version 1.4 User Manual. Imperial College, London and MRC Biostatistics Unit, Cambridge. Available from www.mrc-bsu.cam.ac.uk/bugs.

Su ZM, Adkison MD, Van Alen BW. 2001. A hierarchical Bayesian model for estimating historical salmon escapement and escapement timing. Can. J. Fish. Aquat. Sci. 58: 1648–1662.

Thomas D, Stram D, Dwyer J. 1993. Exposure measurement error: influence on exposure–disease relationships and methods of correction. Annual Review of Public Health 14: 69–93.

Wakefield J, Salway R. 2001. A statistical framework for ecological and aggregate studies. Journal of the Royal Statistical Society 164(1): 119–137.

Wakefield JC. 1996. The Bayesian analysis of population pharmacokinetic models. Journal of the American Statistical Association 91: 62–75.

Wakefield JC. 2001. Ecological inference for $2 \times 2$ tables. Technical Report 12, Centre for Statistics and the Social Sciences. University of Washington: Seattle.

Wakefield JC, Best NG, Waller L. 2000. Bayesian approaches to disease mapping. In Spatial Epidemiology: Methods and Applications, Elliott P, Wakefield JC, Best NG, Briggs DB (eds). Oxford University Press: Oxford, UK; 104–127.

Zeger S. 2000. Exposure measurement error in time-series studies of air pollution: concepts and consequences. Environmental Health Perspectives 108: 419–426.


