Bayesian modeling of animal- and herd-level prevalences
A.J. Branscum^a,*, I.A. Gardner^b, W.O. Johnson^a
^a Department of Statistics, University of California, Davis, CA 95616, USA
^b Department of Medicine and Epidemiology, School of Veterinary Medicine, University of California, Davis, CA 95616, USA
Received 9 September 2003; received in revised form 11 June 2004; accepted 6 September 2004
Abstract
We reviewed Bayesian approaches for animal-level and herd-level prevalence estimation based on
cross-sectional sampling designs and demonstrated fitting of these models using the WinBUGS
software. We considered estimation of infection prevalence based on use of a single diagnostic test
applied to a single herd with binomial and hypergeometric sampling. We then considered multiple
herds under binomial sampling with the primary goal of estimating the prevalence distribution and
the proportion of infected herds. A new model is presented that can be used to estimate the herd-level
prevalence in a region, including the posterior probability that all herds are non-infected. Using this
model, inferences for the distribution of prevalences, mean prevalence in the region, and predicted
prevalence of herds in the region (including the predicted probability of zero prevalence) are also
available. In the models presented, both animal- and herd-level prevalences are modeled as mixture
distributions to allow for zero infection prevalences. (If mixture models for the prevalences were not
used, prevalence estimates might be artificially inflated, especially in herds and regions with low or
zero prevalence.) Finally, we considered estimation of animal-level prevalence based on pooled
samples.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Bayesian modeling; Prevalence estimation; Pooled samples; WinBUGS
www.elsevier.com/locate/prevetmed
Preventive Veterinary Medicine 66 (2004) 101–112
* Corresponding author. Tel.: +1 530 752 2361; fax: +1 530 752 7099.
E-mail address: [email protected] (A.J. Branscum).
0167-5877/$ – see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.prevetmed.2004.09.009
1. Introduction
Bayesian analysis has been used increasingly in veterinary epidemiologic studies. Branscum et al. (2005) and Suess et al. (2002) provide brief introductions to Bayesian methods and apply them to the estimation of test accuracy and to prevalence inference. Advantages of a Bayesian approach include the ability to incorporate prior information about model parameters into the analysis; exact inference (up to Monte Carlo error) without the need for large-sample approximations; and, notably, the availability of the WinBUGS software (Spiegelhalter et al., 1996), which provides a relatively straightforward means of fitting a wide range of Bayesian models (Dunson, 2001; Gardner, 2002). Branscum et al. (2005) gave an overview of using WinBUGS for Bayesian modeling and reviewed several models for estimating the sensitivity and specificity of multiple (possibly correlated) diagnostic tests.
In this paper, we considered Bayesian approaches for prevalence estimation based on cross-sectional sampling designs and discussed implementation of these models in WinBUGS wherever possible. Tu et al. (1999) considered prevalence estimation with covariates, and Erkanli et al. (1999) presented Bayesian approaches for prevalence estimation based on longitudinal data (some of which can also be implemented easily in WinBUGS); covariates and longitudinal data are not considered herein.
We reviewed the two models developed in Hanson et al. (2003a), the hierarchical
model of Hanson et al. (2003b), and the Bayesian approach to animal-level prevalence
inference based on pooled samples described in Cowling et al. (1999). The two
models developed by Hanson et al. (2003a) were used for estimating the animal-level
infection prevalence of a single herd based on the results of an imperfect diagnostic test
applied independently to each of n randomly sampled animals. The first model
(which assumes binomial sampling) is appropriate provided the herd size is much larger
than the sample size. Otherwise, a model based on hypergeometric sampling should be
used.
Next, we present a modified version of the hierarchical model of Hanson et al. (2003b) for prevalence inferences based on data from multiple herds. We also present a new model, similar to the model of Suess et al. (2002) but computationally easier to implement, designed to estimate herd-level prevalence from multiple herds sampled using a two-stage cluster-sampling design. The models that use data from multiple herds yield estimates of the distribution of prevalences in the population. Lastly, we illustrate a Bayesian approach to animal-level prevalence estimation based on pooled samples (Cowling et al., 1999).
The models were fitted using Markov chain Monte Carlo techniques, in particular the Gibbs sampler (Robert and Casella, 1999). Suess et al. (2002, pp. 160–161) gave a detailed explanation of this approach in the context of prevalence estimation. For the analyses presented in this paper, inferences were based on 50,000 iterations after discarding an initial burn-in of 5,000 iterations, and convergence was assessed by running multiple chains from different starting values (Gelman and Rubin, 1992). The WinBUGS code used to fit all models except the hypergeometric one (for which specialized code was used) is available on our website, http://www.epi.ucdavis.edu/diagnostictests/. The WinBUGS code can readily be adapted to other data sets.
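As a rough illustration of the multiple-chain convergence check cited above, the following Python sketch computes the basic potential scale reduction factor of Gelman and Rubin (1992) for one scalar parameter. It assumes each chain is a list of post-burn-in draws of equal length (for example, exported from WinBUGS); it is the textbook between/within-chain formula, not WinBUGS's own diagnostic code, and the function name `gelman_rubin` is hypothetical.

```python
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one scalar parameter.

    chains: list of m >= 2 chains, each a list of n >= 2 post-burn-in draws.
    """
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    # B: between-chain variance; W: average within-chain variance.
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    w = statistics.fmean([statistics.variance(c) for c in chains])
    # Pooled estimate of the marginal posterior variance.
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5


# Usage: two well-mixed chains versus two chains stuck in different regions.
print(gelman_rubin([[0.10, 0.12, 0.11, 0.13], [0.11, 0.12, 0.10, 0.13]]))
print(gelman_rubin([[0.10, 0.12, 0.11, 0.13], [0.40, 0.42, 0.41, 0.43]]))
```

Values near 1 are consistent with convergence; values well above 1 suggest the chains have not yet explored the same region and should be run longer.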
2. Methods for a single herd
2.1. Binomial sampling
Consider estimation of the infection prevalence for a single herd where y animals tested
positive out of n randomly sampled animals. If the herd size (N) is much larger than n, then
the sampling distribution of y is approximately binomial:
$$ y \mid p, Se, Sp \sim \mathrm{Bin}\big(n,\; p\,Se + (1-p)(1-Sp)\big), $$
where p is the prevalence of infection in the herd and Se and Sp are the sensitivity and
specificity, respectively, of the diagnostic test applied to each sampled animal. With the
specification of prior distributions for the model parameters, the Bayesian model is
complete.
We modeled uncertainty about the sensitivity (Se) and specificity (Sp) of the diagnostic test using independent beta prior distributions (Hanson et al., 2003a):
$$ Se \sim \mathrm{Beta}(a_{Se}, b_{Se}), \qquad Sp \sim \mathrm{Beta}(a_{Sp}, b_{Sp}). $$
A beta distribution provides a flexible means of modeling uncertainty about parameters
ranging from 0 to 1. The prior parameters of all beta distributions used in this paper were
determined based on the most-likely (modal) prior value of the parameter and an upper or
lower percentile for the parameter. These values typically are elicited from experts. The Beta
Buster software available at our website (http://www.epi.ucdavis.edu/diagnostictests/)
computes the two parameters of a beta distribution based on the mode and one of the two
percentiles.
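The elicitation step can be sketched in Python. This is not the Beta Buster code; it is a minimal stand-alone version, assuming the mode lies strictly between 0 and 1, that solves the mode equation $(a-1)/(a+b-2) = \text{mode}$ for a and then bisects on b until the beta CDF at the elicited percentile matches the target probability. The function names are hypothetical, and the CDF is approximated with the midpoint rule rather than an exact incomplete-beta routine.

```python
import math


def beta_cdf(x, a, b, steps=4000):
    """P(X <= x) for X ~ Beta(a, b), approximated by the midpoint rule on (0, x)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log1p(-t))
    return total * h


def beta_from_mode(mode, x, prob, b_max=1e4, iters=60):
    """Find Beta(a, b) with the given mode and P(X <= x) = prob.

    Use prob=0.95 with an upper percentile x > mode, or prob=0.05 with a
    lower percentile x < mode. Assumes 0 < mode < 1 and 0 < x < 1.
    """
    def a_of(b):
        # Solve mode = (a - 1) / (a + b - 2) for a; b >= 1 keeps a >= 1.
        return (1 + mode * (b - 2)) / (1 - mode)

    def gap(b):
        return beta_cdf(x, a_of(b), b) - prob

    lo, hi = 1.0, b_max          # b = 1 gives a = 1, i.e. the uniform prior
    lo_sign = gap(lo) > 0
    for _ in range(iters):       # bisection on b
        mid = 0.5 * (lo + hi)
        if (gap(mid) > 0) == lo_sign:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    return a_of(b), b


a, b = beta_from_mode(0.03, 0.15, 0.95)  # prevalence prior of Section 2.1.1
print(f"Beta({a:.2f}, {b:.2f})")
```

The prevalence prior of Section 2.1.1, Beta(1.80, 26.74) with mode 0.03 and 95th percentile 0.15, is a convenient check on the sketch. If SciPy is available, `scipy.stats.beta.cdf` could replace the hand-rolled integration.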
A beta prior distribution also may be used for the infection prevalence, but this would imply that $P(p = 0) = 0$. This approach was used by Johnson et al. (2004). However, if the herd is truly not infected, then $p = 0$, so we require a model that allows for this possibility. Hanson et al. (2003a) modeled the infection prevalence using a mixture distribution:
$$
\begin{aligned}
p &\sim \mathrm{Beta}(a_p, b_p) && \text{with probability } \tau,\\
p &= 0 && \text{with probability } 1 - \tau,
\end{aligned}
$$
where $\tau$ is the probability that the herd is infected. With this mixture distribution, the posterior probability that the herd is not infected can be computed, and the computation is performed easily in WinBUGS under binomial sampling. A beta prior distribution also can be used for $\tau$. Alternatively, $\tau$ can be set equal to an expert-specified constant ($\tau_0$).
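To see what the mixture prior implies before any data are observed, the following Python sketch draws prevalences from it. It uses the settings of the T. gondii example in Section 2.1.1 ($\tau_0 = 0.10$, Beta(1.80, 26.74)); the function name is hypothetical, and the sketch only simulates the prior, it does not fit the model.

```python
import random


def sample_mixture_prior(tau, a_p, b_p, size, seed=0):
    """Draw prevalences from the zero-inflated prior of Section 2.1:
    p ~ Beta(a_p, b_p) with probability tau, and p = 0 otherwise."""
    rng = random.Random(seed)
    return [rng.betavariate(a_p, b_p) if rng.random() < tau else 0.0
            for _ in range(size)]


# Prior settings from the T. gondii example: tau0 = 0.10, Beta(1.80, 26.74).
draws = sample_mixture_prior(tau=0.10, a_p=1.80, b_p=26.74, size=100_000)
share_zero = sum(d == 0.0 for d in draws) / len(draws)
print(f"prior P(p = 0) is about {share_zero:.3f}; the exact value is 1 - tau = 0.90")
```

The point mass at zero is what makes a posterior probability of "not infected" meaningful; under a pure beta prior the same simulation would never produce an exact zero.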
2.1.1. Prevalence of Toxoplasma gondii
Davies et al. (1998) determined the seroprevalence of T. gondii in finishing pigs for 28 herds in North Carolina, USA. We estimated the prevalence of T. gondii in one of the sampled herds, in which $y = 1$ pig tested positive out of $n = 91$ pigs with a modified agglutination test (MAT). The beta prior distributions used for the Se and Sp of the MAT had modal values of 0.70 and 0.99, respectively, with 5th percentiles of 0.55 and 0.95. These values were based on estimates from a previous study (Dubey et al., 1995) and field
evidence that the Sp of the MAT in grower-finisher pigs is much higher than that in sows,
which were used in the original study.
The scientifically most likely (prior) value for the probability that the herd was infected was taken to be 0.10; i.e., we set $\tau_0 = 0.10$. The infection prevalence, assuming the herd truly was infected, was modeled using a Beta(1.80, 26.74) distribution, which has a mode of 0.03 and a 95th percentile of 0.15. These values were elicited from one of the authors (Dr.
Ian Gardner) and were based on estimates from previous National Animal Health
Monitoring System studies in 1990, and changes in industry demographics and production
systems in the last decade that would decrease prevalence of infection in grower-finisher
pigs.
The posterior median and 95% probability interval (sometimes called a “credibility interval”) for the prevalence were 0 (0.00, 0.01). The posterior probability that the herd was infected was 0.03; that is, $P(p > 0 \mid y = 1, n = 91) = 0.03$. The median for p was zero because 97% of the Gibbs iterates for p were 0.
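The posterior probability of infection compares two marginal likelihoods: one for $p = 0$ and one averaged over the beta component. The following Python sketch makes that comparison explicit under a simplifying assumption that differs from the analysis above: Se and Sp are held fixed at their prior modes (0.70 and 0.99) instead of being given beta priors, so the integral over p is one-dimensional and can be done numerically. Its output is therefore only a rough cross-check and need not reproduce the Gibbs-sampler result; the function name is hypothetical.

```python
import math


def posterior_infected(y, n, tau, a_p, b_p, se, sp, steps=20_000):
    """P(p > 0 | y) for the mixture prior of Section 2.1 with Se and Sp FIXED."""
    def binom(prob):
        return math.comb(n, y) * prob ** y * (1 - prob) ** (n - y)

    # Not infected (p = 0): every positive result is a false positive.
    m0 = binom(1 - sp)
    # Infected: average the likelihood at apparent prevalence p*Se + (1-p)*(1-Sp)
    # over p ~ Beta(a_p, b_p), using the midpoint rule on (0, 1).
    log_norm = math.lgamma(a_p + b_p) - math.lgamma(a_p) - math.lgamma(b_p)
    h = 1.0 / steps
    m1 = 0.0
    for i in range(steps):
        p = (i + 0.5) * h
        density = math.exp(log_norm + (a_p - 1) * math.log(p) + (b_p - 1) * math.log1p(-p))
        m1 += binom(p * se + (1 - p) * (1 - sp)) * density * h
    return tau * m1 / (tau * m1 + (1 - tau) * m0)


post = posterior_infected(y=1, n=91, tau=0.10, a_p=1.80, b_p=26.74, se=0.70, sp=0.99)
print(f"P(herd infected | y = 1, n = 91), Se and Sp fixed: {post:.3f}")
```

A single positive out of 91 is almost exactly what a non-infected herd would produce with Sp = 0.99, which is why the data pull the probability of infection below its prior value of 0.10.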
2.2. Hypergeometric sampling
The distribution of the number of test-positive animals out of the n sampled is approximated well by a binomial distribution if $N \gg n$ (that is, if the herd size far exceeds the sample size). When the test outcomes for the sampled animals cannot be considered independent Bernoulli trials, the distribution of y (which is related to the hypergeometric distribution) is more complicated (Johnson et al., 2004), and the model is therefore difficult to implement in WinBUGS. However, the model has only three parameters: the Se and Sp of the test and the prevalence p, where $p = d/N$ for d infected animals in a herd of size N.
The freely available, user-friendly software BDFree (short for Bayesian Disease Freedom), developed by Johnson et al. (2004) and downloadable from http://www.epi.ucdavis.edu/diagnostictests, allows the user to input the data (y, n, N) and to specify beta prior distributions for Se and Sp. A discretized beta distribution (a discrete distribution on the set $\{0, 1/N, 2/N, \ldots, 1\}$ whose probabilities are obtained from a continuous beta distribution) is used for p (Hanson et al., 2003a). The output provided by BDFree includes the posterior median and 95% probability interval for p, as well as the posterior probability that p is no larger than a user-specified threshold $p_0$, namely $P(p \le p_0 \mid y, n, N)$.

We note that BDFree also can be used to fit the model in Section 2.1 for the special case $\tau = 1$. In this instance, a value of $p_0$ can be selected so that if $p \le p_0$, the herd can be considered “low risk”. If $P(p \le p_0 \mid y, n, N)$ is sufficiently large (say, 0.95), then the herd can be considered to have the disease well controlled.
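The structure of the finite-population calculation can be sketched in Python. This is not BDFree's algorithm; it is a minimal version under two stated assumptions: Se and Sp are fixed constants (BDFree treats them as uncertain), and each value $d/N$ receives the beta probability of the bin $[(d-0.5)/N, (d+0.5)/N]$ clipped to [0, 1] (one possible discretization). Given d infected animals, the number of infected animals in the sample is hypergeometric, and y is the sum of true positives among them and false positives among the rest. All function names are hypothetical.

```python
import math


def discretized_beta(N, a, b, sub=200):
    """Prior probabilities for p = d/N, d = 0, 1, ..., N (bin-mass discretization)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def pdf(x):
        return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))

    probs = []
    for d in range(N + 1):
        lo, hi = max(0.0, (d - 0.5) / N), min(1.0, (d + 0.5) / N)
        h = (hi - lo) / sub
        probs.append(h * sum(pdf(lo + (i + 0.5) * h) for i in range(sub)))
    total = sum(probs)  # renormalize away numerical-integration error
    return [q / total for q in probs]


def binom_pmf(k, n, prob):
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * prob ** k * (1 - prob) ** (n - k)


def likelihood(y, n, N, d, se, sp):
    """P(y positives | d infected among N), n animals sampled without replacement."""
    total = 0.0
    for x in range(max(0, n - (N - d)), min(n, d) + 1):  # infected animals sampled
        hyper = math.comb(d, x) * math.comb(N - d, n - x) / math.comb(N, n)
        # j true positives among the x infected, y - j false positives among the rest
        conv = sum(binom_pmf(j, x, se) * binom_pmf(y - j, n - x, 1 - sp)
                   for j in range(min(x, y) + 1))
        total += hyper * conv
    return total


def prob_p_at_most(p0, y, n, N, a, b, se, sp):
    """Posterior P(p <= p0 | y, n, N) under the discretized beta prior."""
    prior = discretized_beta(N, a, b)
    post = [prior[d] * likelihood(y, n, N, d, se, sp) for d in range(N + 1)]
    return sum(w for d, w in enumerate(post) if d / N <= p0) / sum(post)


# Hypothetical herd: N = 300 animals, n = 60 sampled, y = 1 test-positive.
print(prob_p_at_most(0.02, y=1, n=60, N=300, a=1.5, b=20.0, se=0.9, sp=0.99))
```

With a perfect test and a complete census ($N = n$, Se = Sp = 1), the posterior collapses onto $d = y$, which gives a simple sanity check of the bookkeeping.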
2.2.1. Prevalence of Johne’s disease
The prevalence of Johne’s disease (Mycobacterium avium subsp. paratuberculosis) in a
herd in the central valley of California, USA, was estimated by testing all $N = n = 333$
cows using a commercial ELISA kit (Idexx Laboratories, Westbrook, Maine). Ten of the
cows had positive test results.
The beta prior distributions for the Se and Sp of the ELISA were constructed using
expert information from Dr. Michael Collins, University of Wisconsin (Madison,
Wisconsin, USA). For the Se, a mode of 0.25 was used with 95th percentile of 0.30. A
modal value of 0.98 was used for the Sp with 5th percentile equal to 0.96. We used a
relatively diffuse discretized beta prior distribution for the prevalence with mode equal to
0.05 and 95th percentile equal to 0.30.
The posterior probability that the herd was infected was 0.991: $P(p > 0 \mid y = 10, n = 333, N = 333) = 0.991$. The posterior median and 95% probability interval for p were 0.064 (0.006, 0.161).
3. Methods for multiple herds
3.1. Estimation of the within-herd prevalence distribution
In the multiple-herd setting, suppose $n_t$ animals are sampled randomly from each of herds $t = 1, 2, \ldots, K$. We consider estimation of the prevalence distribution (PD), defined as the distribution of prevalences for herds in the super-population of all herds in the region. The model we present is a modified version of that proposed by Hanson et al. (2003b), and we estimated the PD based on the results of two dependent diagnostic tests applied to each sampled animal. The data for herd t are given by the vector $y_t = (y_{11t}, y_{12t}, y_{21t}, y_{22t})$, where $y_{11t}$ ($y_{22t}$) is the number of animals from herd t testing positive (negative) on both tests, and $y_{12t}$ ($y_{21t}$) is the number of animals from herd t testing positive (negative) on test 1 and negative (positive) on test 2. The complete data set consists of the independent vectors $y_1, y_2, \ldots, y_K$.
The data were assumed to have independent multinomial sampling distributions:
$$ y_t \sim \mathrm{Multinomial}\big(n_t, (p_{11t}, p_{12t}, p_{21t}, p_{22t})\big), $$
where
$$
\begin{aligned}
p_{11t} &= P_t(T_1^+, T_2^+) = p_t\,[Se_1 Se_2 + \mathrm{cov}_{D^+}] + (1-p_t)\,[(1-Sp_1)(1-Sp_2) + \mathrm{cov}_{D^-}],\\
p_{12t} &= P_t(T_1^+, T_2^-) = p_t\,[Se_1(1-Se_2) - \mathrm{cov}_{D^+}] + (1-p_t)\,[(1-Sp_1)Sp_2 - \mathrm{cov}_{D^-}],\\
p_{21t} &= P_t(T_1^-, T_2^+) = p_t\,[(1-Se_1)Se_2 - \mathrm{cov}_{D^+}] + (1-p_t)\,[Sp_1(1-Sp_2) - \mathrm{cov}_{D^-}],\\
p_{22t} &= P_t(T_1^-, T_2^-) = p_t\,[(1-Se_1)(1-Se_2) + \mathrm{cov}_{D^+}] + (1-p_t)\,[Sp_1 Sp_2 + \mathrm{cov}_{D^-}].
\end{aligned}
$$
Here, $Se_j$ and $Sp_j$ denote the sensitivity and specificity of test j, for $j = 1, 2$, and the two covariance terms are the covariances between the two tests for infected animals ($D^+$) and for non-infected animals ($D^-$):
$$ \mathrm{cov}_{D^+} = Se_{11} - Se_1 Se_2, \qquad \mathrm{cov}_{D^-} = Sp_{22} - Sp_1 Sp_2, $$
where $Se_{11} = P(T_1^+, T_2^+ \mid D^+)$ and $Sp_{22} = P(T_1^-, T_2^- \mid D^-)$ (Dendukuri and Joseph, 2001).
Note that if a single test were used rather than two, the data would be modeled as independent binomials, as in Section 3.2.
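The four cell probabilities can be checked numerically. The following Python sketch transcribes the formulas above for one herd and rejects covariances outside the admissible ranges given later in this section (Dendukuri and Joseph, 2001). It is an illustration of the likelihood, not the WinBUGS model code; the function name and the example accuracy values are hypothetical.

```python
def cell_probs(p, se1, se2, sp1, sp2, cov_pos=0.0, cov_neg=0.0):
    """Cell probabilities (p11, p12, p21, p22) for one herd with prevalence p and
    two conditionally dependent tests; cov_pos / cov_neg are the covariances
    between the tests in infected (D+) and non-infected (D-) animals."""
    # Admissible ranges for the covariances, as stated in the text.
    lo_pos, hi_pos = (se1 - 1) * (1 - se2), min(se1, se2) - se1 * se2
    lo_neg, hi_neg = (sp1 - 1) * (1 - sp2), min(sp1, sp2) - sp1 * sp2
    if not (lo_pos <= cov_pos <= hi_pos and lo_neg <= cov_neg <= hi_neg):
        raise ValueError("covariance outside its admissible range")
    q = 1 - p
    p11 = p * (se1 * se2 + cov_pos) + q * ((1 - sp1) * (1 - sp2) + cov_neg)
    p12 = p * (se1 * (1 - se2) - cov_pos) + q * ((1 - sp1) * sp2 - cov_neg)
    p21 = p * ((1 - se1) * se2 - cov_pos) + q * (sp1 * (1 - sp2) - cov_neg)
    p22 = p * ((1 - se1) * (1 - se2) + cov_pos) + q * (sp1 * sp2 + cov_neg)
    return p11, p12, p21, p22


cells = cell_probs(0.3, se1=0.9, se2=0.8, sp1=0.95, sp2=0.9, cov_pos=0.05, cov_neg=0.02)
print(cells, sum(cells))
# Series testing (Section 3.1.1): test-1 negatives are not retested, so the
# last two cells are observed only as their sum.
print(cells[0], cells[1], cells[2] + cells[3])
```

The covariance terms enter with alternating signs, so they cancel in the sum and the four cells always add to 1; they only shift probability between concordant and discordant results.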
As in the single-herd setting, the Ses and Sps of the two tests were modeled with independent beta prior distributions. The prevalences of the sampled herds were considered exchangeable and thus were assumed to be independent and identically distributed:
$$ p_1, p_2, \ldots, p_K \mid \mu, \psi \sim \mathrm{Beta}\big(\mu\psi, \psi(1-\mu)\big), $$
where $\mu$ is the average prevalence in the population and $\psi$ governs the variability of these prevalences about their mean; a large value of $\psi$ implies less heterogeneity among prevalences. The PD is this beta distribution. We modeled uncertainty about the mean of the prevalence distribution using
$$ \mu \sim \mathrm{Beta}(a_\mu, b_\mu), $$
and a gamma prior distribution was used to model $\psi$:
$$ \psi \sim \mathrm{Gamma}(a_\psi, b_\psi). $$
The constants $a_\psi$ and $b_\psi$ were determined from the expert-elicited most likely value for an upper percentile of the PD and a value that the expert is virtually certain this upper percentile does not exceed. For instance, the expert might believe that 90% of all herds in the population have prevalences <0.50 and might be 99.5% sure that the 90th percentile of the prevalence distribution is <0.60. The values $a_\psi$ and $b_\psi$ are computed from these two values using the method described in detail in Hanson et al. (2003b).
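The role of $\psi$ can be made concrete: $\mathrm{Beta}(\mu\psi, \psi(1-\mu))$ has mean $\mu$ and variance $\mu(1-\mu)/(\psi+1)$, so increasing $\psi$ with $\mu$ fixed narrows the PD. The following Python sketch (function name hypothetical) converts $(\mu, \psi)$ to the usual beta shape parameters and prints the spread for a few illustrative values of $\psi$.

```python
def pd_shape(mu, psi):
    """Shape parameters, mean and variance of Beta(mu * psi, psi * (1 - mu))."""
    a, b = mu * psi, psi * (1 - mu)
    mean = a / (a + b)                    # equals mu
    variance = mu * (1 - mu) / (psi + 1)  # shrinks as psi grows
    return a, b, mean, variance


for psi in (2, 10, 100):
    a, b, mean, var = pd_shape(0.2, psi)
    print(f"psi={psi:>3}: Beta({a:.1f}, {b:.1f}), mean={mean:.2f}, sd={var ** 0.5:.3f}")
```

This reparameterization separates location from spread, which is why the text elicits a prior for $\mu$ and, separately, an upper percentile of the PD to pin down $\psi$.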
The covariance between the test outcomes for infected animals satisfies
$$ (Se_1 - 1)(1 - Se_2) \le \mathrm{cov}_{D^+} \le \min(Se_1, Se_2) - Se_1 Se_2, $$
and for non-infected animals the covariance satisfies
$$ (Sp_1 - 1)(1 - Sp_2) \le \mathrm{cov}_{D^-} \le \min(Sp_1, Sp_2) - Sp_1 Sp_2 $$
(Dendukuri and Joseph, 2001). Because prior information about the two covariances is typically unavailable, uniform prior distributions over these ranges can be used for $\mathrm{cov}_{D^+}$ and $\mathrm{cov}_{D^-}$.
We note that our model, as presented, differs from that of Hanson et al. (2003b), who modeled the test-accuracy vectors $(Se_{11}, Se_{12}, Se_{21}, Se_{22})$ and $(Sp_{11}, Sp_{12}, Sp_{21}, Sp_{22})$ with independent Dirichlet distributions. Such an approach gives approximately equal weight to the prior information for the accuracy measures of both tests, so we did not consider it here. Our model also generalizes easily to allow for herds with zero infection prevalence by modeling the prevalences as independent with
$$
\begin{aligned}
p_t \mid \mu, \psi &\sim \mathrm{Beta}\big(\mu\psi, \psi(1-\mu)\big) && \text{with probability } \tau,\\
p_t &= 0 && \text{with probability } 1 - \tau,
\end{aligned}
$$
where $\tau$ denotes the proportion of infected herds in the population. With this generalization, one can compute the (posterior) predictive probability of zero infection prevalence for each sampled herd, or for a randomly selected herd from the population of herds. An estimate of the herd-level prevalence ($\tau$) is also available. This approach was used in the analysis described in Section 3.2.
A beta distribution might not model the PD adequately, for example if the PD is
multimodal. A useful model that generalizes the one described above involves a mixture of
Dirichlet processes (Antoniak, 1974). Mixtures of Dirichlet processes (MDP) have been
used to model the PD in a setting related to ours (Hanson et al., 2003b). The essence of this
model is that the class of models under consideration is allowed to include all possible PDs, allowing great flexibility in the shape of the PD. The family of models is “centered” on the single parametric family of $\mathrm{Beta}(\mu\psi, \psi(1-\mu))$ distributions. A weight $\alpha$ is selected by the user. If $\alpha$ is large, the PD is modeled as coming from the above beta family. If $\alpha$ is small (for example, $\alpha = 1$), the model is much more flexible, allowing for data-driven departures from this family. A parametric model and the MDP generalization were fitted to data on Johne’s disease in California in Section 3.2.1.
3.1.1. Prevalence distribution for brucellosis in cattle
We estimated the prevalence distribution for brucellosis in cattle in a region of Mexico where brucellosis was endemic. Twenty cattle herds were selected randomly, and between 8 and 147 cows were sampled randomly from each herd. Serum from each cow was subjected to two diagnostic tests for Brucella abortus: a buffered acidified plate-agglutination (BAPA) test and the Rivanol test. The tests were applied sequentially (in series): all sampled cows were tested with the BAPA, and only BAPA-positive cows were given the Rivanol test. The data are therefore $y_t = (y_{11t}, y_{12t}, y_{2\cdot t})$ where, for herd t, $y_{11t}$ denotes the number of cows (out of $n_t$) testing positive on both tests, $y_{12t}$ is the number of cows testing positive with the BAPA and then negative with the Rivanol test, and $y_{2\cdot t}$ is the number of cows testing negative with the BAPA. Note that the observed marginal total $y_{2\cdot t}$ is the sum of the two latent (unobserved) components $y_{21t}$ and $y_{22t}$ (that is, $y_{2\cdot t} = y_{21t} + y_{22t}$).
The data are modeled as independent multinomials:
$$ y_t \sim \mathrm{Multinomial}\big(n_t, (p_{11t}, p_{12t}, p_{21t} + p_{22t})\big). $$
The prior distributions used for the parameters of the prevalence distribution were described by Hanson et al. (2003b). The prior mode for the mean of the prevalence distribution was 0.25, and the corresponding 95th percentile was 0.35. Also, we set $\tau_0 = 1$.
We elicited prior estimates of the accuracy of the BAPA and Rivanol tests from Dr. Sharon Hietala, University of California, Davis. The modal values elicited for the Ses of the BAPA and Rivanol tests were 0.99 and 0.90, respectively, with corresponding 5th percentiles of 0.95 and 0.85. The modal values of the Sps for BAPA and Rivanol were 0.70 and 0.95, respectively, with 5th percentiles of 0.50 and 0.90. The two covariances were modeled with uniform prior distributions.
The posterior predictive PD is displayed in Fig. 1 and is highly right-skewed. Indeed, for a randomly selected herd with prevalence $p^*$, we computed the following (posterior) predictive probabilities:
$$
\begin{aligned}
P(p^* \le 0.10 \mid \{y_t\}) &= 0.431,\\
P(p^* \le 0.50 \mid \{y_t\}) &= 0.915,\\
P(p^* \ge 0.75 \mid \{y_t\}) &= 0.015,\\
P(p^* \ge 0.90 \mid \{y_t\}) &= 0.002.
\end{aligned}
$$
The interpretation is that about 92% of the herds in the super-population of all herds in the region are estimated to have prevalences <0.50, and only 1 in every 500 herds is estimated to have a prevalence >0.90. The posterior median and 95% probability interval
for the mean of the prevalence distribution are 0.190 (0.129, 0.264), so we are 95% sure that
the average prevalence is between 0.13 and 0.26.
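Predictive probabilities such as $P(p^* \le 0.10 \mid \{y_t\})$ are Monte Carlo averages: for each posterior draw of the PD parameters, a prevalence for a new herd is simulated and compared with the threshold. The following Python sketch assumes posterior draws of $(\tau, \mu, \psi)$ have been exported (with $\tau_0 = 1$, as here, every draw has $\tau = 1$). The function name is hypothetical, and the draws in the usage lines are made-up placeholders, not the brucellosis posterior.

```python
import random


def predictive_prob(draws, threshold, seed=1):
    """Monte Carlo estimate of P(p* <= threshold | data) for a new herd.

    draws: posterior draws (tau, mu, psi), e.g. exported from WinBUGS.
    One p* is simulated per draw: 0 with probability 1 - tau, otherwise
    Beta(mu * psi, psi * (1 - mu)).
    """
    rng = random.Random(seed)
    hits = 0
    for tau, mu, psi in draws:
        if rng.random() < tau:
            p_star = rng.betavariate(mu * psi, psi * (1 - mu))
        else:
            p_star = 0.0
        hits += p_star <= threshold
    return hits / len(draws)


# Placeholder draws for illustration only; these are NOT the brucellosis posterior.
fake_draws = [(1.0, 0.19, 3.0), (1.0, 0.17, 4.0), (1.0, 0.22, 2.5)] * 10_000
for c in (0.10, 0.50):
    print(f"P(p* <= {c:.2f}) is about {predictive_prob(fake_draws, c):.3f}")
```

Upper-tail probabilities such as $P(p^* \ge 0.75 \mid \{y_t\})$ follow as one minus the corresponding lower-tail estimate, because the beta component has no point masses.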
Each iteration of the Gibbs sampler yielded a sampled value from the posterior
distribution of the 20 prevalences. Using WinBUGS, these 20 values were ranked from
lowest to highest. We monitored the ranks of these sampled prevalences at each iteration.
Histograms of the distribution of ranks for each herd were obtained. Herds 15, 7, 19, and 9
were ranked highest with herd 15 having the largest infection prevalence (posterior median
of p15 was 0.82) while herds 12 and 16 were ranked among the lowest (each having
posterior median prevalences of about 0.02). (The original herd-test results are in Table 2 of
Hanson et al. (2003b).)
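The rank monitoring described above was done inside WinBUGS. If the draws of the 20 prevalences are exported instead, the same rank histograms can be tabulated with a short post-processing step; the following Python sketch is one way to do it (function name hypothetical, toy input rather than the brucellosis draws).

```python
from collections import Counter


def rank_histograms(samples):
    """Per-herd distribution of ranks across MCMC iterations.

    samples: one list of K sampled prevalences per iteration (position = herd).
    Rank 1 is the lowest prevalence at that iteration; ties keep herd order.
    Returns a list of K Counters mapping rank -> number of iterations.
    """
    k = len(samples[0])
    counts = [Counter() for _ in range(k)]
    for draw in samples:
        order = sorted(range(k), key=lambda herd: draw[herd])
        for rank, herd in enumerate(order, start=1):
            counts[herd][rank] += 1
    return counts


# Toy input: 3 iterations for 3 herds (not the brucellosis data).
toy = [[0.10, 0.50, 0.30], [0.20, 0.40, 0.60], [0.05, 0.90, 0.10]]
for herd, hist in enumerate(rank_histograms(toy), start=1):
    print(f"herd {herd}: {dict(sorted(hist.items()))}")
```

Ranking within each iteration, rather than ranking posterior medians once, keeps the uncertainty in the ordering: a herd whose rank histogram is spread out cannot be confidently placed among the highest or lowest.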
Despite the different choices of prior distributions, our results are very similar to those presented in Hanson et al. (2003b), who sampled the full conditional distributions using their own Fortran code rather than WinBUGS. Fitting the model in WinBUGS is advantageous because the full conditional distributions (which might not be standard distributions such as the normal) do not need to be specified by the user.
3.2. Estimation of herd-level prevalence
The model presented in Section 3.1 was developed with the goal of estimating the PD for the population or region. We now focus on estimating the herd-level prevalence ($\tau$) in a region. Suppose a two-stage cluster-sampling design is used in which K herds are first sampled randomly from the region, and $n_t$ animals are then sampled randomly from herd t ($t = 1, 2, \ldots, K$). Typically, K is large enough to ensure adequate coverage of the region, and the $n_t$ are relatively small.

The resulting data are the numbers of test-positive animals from each herd, denoted $y_1, y_2, \ldots, y_K$. We assume that the $y_t$ are independent and follow binomial distributions
$$ y_t \mid p_t, Se, Sp \sim \mathrm{Bin}\big(n_t,\; p_t\,Se + (1-p_t)(1-Sp)\big), $$
with independent beta prior distributions assumed for Se and Sp. In contrast, Suess et al. (2002) modeled the animal-level test outcomes using independent Bernoulli distributions
and incorporated the true latent infection status of each sampled animal into the analysis to facilitate the Gibbs sampler. Their approach was relatively computationally intensive and cannot be implemented in WinBUGS.

Fig. 1. Prior (dashed line) and posterior (solid line) prevalence distribution for brucellosis in Mexico.
As in the single-herd setting, the prevalences were modeled as a mixture of a point mass at zero and a continuous beta distribution on (0, 1):
$$
\begin{aligned}
p_t \mid \mu, \psi &\sim \mathrm{Beta}\big(\mu\psi, \psi(1-\mu)\big) && \text{with probability } \tau,\\
p_t &= 0 && \text{with probability } 1 - \tau.
\end{aligned}
$$
We modeled $\tau$ analogously to the prevalences to allow for the possibility that the infectious agent is completely absent from the region:
$$
\begin{aligned}
\tau &\sim \mathrm{Beta}(a_\tau, b_\tau) && \text{with probability } \gamma,\\
\tau &= 0 && \text{with probability } 1 - \gamma,
\end{aligned}
$$
where $\gamma$ is typically set equal to an expert-elicited constant ($\gamma_0$) but could also be modeled using a beta distribution.
The prior distributions for $\mu$ and $\psi$ were described in Section 3.1. Note that the model allows for the estimation of (i) the herd-level prevalence (HP), $\tau$; (ii) the prevalence distribution for herds; (iii) the mean of the prevalence distribution, $\mu$; (iv) the probability that the region is infection-free, $P(\tau = 0 \mid \{y_t\})$; and (v) the predicted probability that a randomly selected herd in the region is infection-free, $P(p^* = 0 \mid \{y_t\})$.
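The two-level mixture can be checked by simulating it. Under the prior, the region is infection-free with probability $1 - \gamma$, and a randomly selected herd has zero prevalence with probability $1 - \gamma\,E[\tau \mid \tau > 0] = 1 - \gamma\,a_\tau/(a_\tau + b_\tau)$. The following Python sketch draws from the prior only (it does not fit the model) and, as a simplification, holds $\mu$ and $\psi$ fixed; the function name and all numerical settings are hypothetical.

```python
import random


def simulate_region_prior(gamma, a_tau, b_tau, mu, psi, size, seed=2):
    """Draw (tau, p_star) pairs from the two-level mixture prior of Section 3.2.

    Region level: tau ~ Beta(a_tau, b_tau) with probability gamma, else tau = 0.
    Herd level:   p_star ~ Beta(mu * psi, psi * (1 - mu)) with probability tau,
                  else p_star = 0.
    Simplification: mu and psi are fixed here, whereas the paper gives them priors.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(size):
        tau = rng.betavariate(a_tau, b_tau) if rng.random() < gamma else 0.0
        p_star = rng.betavariate(mu * psi, psi * (1 - mu)) if rng.random() < tau else 0.0
        draws.append((tau, p_star))
    return draws


# Hypothetical prior settings, chosen only for illustration.
draws = simulate_region_prior(gamma=0.9, a_tau=3.0, b_tau=2.0, mu=0.1, psi=20.0, size=100_000)
region_free = sum(t == 0.0 for t, _ in draws) / len(draws)  # exact: 1 - gamma = 0.10
herd_free = sum(p == 0.0 for _, p in draws) / len(draws)    # exact: 1 - 0.9 * 3/5 = 0.46
print(region_free, herd_free)
```

The same simulation run on posterior draws of $(\tau, \mu, \psi)$ instead of prior settings gives Monte Carlo estimates of quantities (iv) and (v).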
3.2.1. Prevalence of Johne’s disease in California, USA
We estimated the HP and the PD for Johne’s disease in California using data from $K = 29$ randomly sampled herds in the central valley of California. From each herd, $n_t \equiv n = 60$ cows were sampled randomly. Each sampled cow was tested for
Johne’s disease using an ELISA. Prior information about the ELISA was described in
Section 2.2.1.
The modal value and 95th percentile for $\mu$ were determined as follows. We elicited a mode of 0.12 and a 95th percentile of 0.30 for the PD, i.e., for the prevalence ($p^*$) of a randomly selected herd from the infected sub-population, and computed a and b such that $p^* \sim \mathrm{Beta}(a, b)$. The modal value of $\mu$ was taken to be $a/(a + b)$. Noting that the 95th percentile for $\mu$ will be less than the 95th percentile of $p^*$, we conservatively used 0.30 as the 95th percentile of $\mu$.
A conservative value of 0.50 was used for the 99th percentile of our prior on
the 95th percentile of the PD. In other words, a priori we are 99% sure that 95% of all the
region’s prevalences are <0.50. A sensitivity analysis using modal values of (0.16, 0.20,
0.25) for the 95th percentile of the prevalence distribution, and using values of (0.35, 0.40,
0.45) for the 99th percentiles for our prior on the 95th percentile yielded relatively similar
results.
An MDP model was also used for the prevalences, as described in Section 3.1. We used $\alpha = 1$, which allows for greater flexibility in the shape of the PD.

We set $\gamma_0 = 1$ because Johne’s disease was known to be present in this population of dairy herds. The beta prior distribution used for $\tau$ had a modal value of 0.60 with a 95th percentile of 0.83 (also provided by Dr. Michael Collins).
The posterior median and 95% probability interval for the HP ($\tau$) were 0.59 (0.30, 0.85). Also, the predictive probabilities that a randomly selected herd in the region had prevalence ≤0.05 and ≤0.50 were:
$$
\begin{aligned}
P(p^* \le 0.05 \mid \{y_t\}) &= 0.52,\\
P(p^* \le 0.50 \mid \{y_t\}) &= 0.97,
\end{aligned}
$$
so it was predicted that 97% of the herds in the region had prevalences <0.50.
For infected herds, the parametric and semiparametric posterior predictive PDs for Johne’s disease in this region of California are displayed in Fig. 2; the PD has most of its mass below 0.60. For these data, the mode of the semiparametric estimate of the PD was greater than the mode of the parametric estimate, which demonstrates some of the effects of the parametric assumptions used for the PD. The prior and posterior prevalence distributions are similar because the data were consistent with the prior information incorporated into the analysis.
4. Prevalence inference from pooled samples
For rare infections, pooled testing often is used as a cost-efficient means of prevalence
estimation. Cowling et al. (1999) reviewed several frequentist and one Bayesian approach to
estimating animal-level prevalence from pooled samples. Bayesian approaches also were
considered by Joseph et al. (1995) and Tu et al. (1999). Here, we demonstrate a Bayesian
model for estimating animal-level prevalence from pooled samples using WinBUGS.
In particular, suppose we tested m pools with an imperfect diagnostic test, where each pool consists of samples from k animals. Let y denote the number of pools testing positive, P the probability that a pool tests positive, and $Se_p$ and $Sp_p$ the Se and Sp of the pooled test, respectively; $Se_p$ and $Sp_p$ need to be approximately equal to the individual-level Se and Sp. A pool contains infected material unless all k animals are uninfected, so $P = Se_p\,[1 - (1-p)^k] + (1 - Sp_p)(1-p)^k$, and the animal-level prevalence is therefore related to P by
$$ p = 1 - \left( \frac{Se_p - P}{Sp_p + Se_p - 1} \right)^{1/k}. $$
The data were assumed to follow a binomial distribution, $y \mid P \sim \mathrm{Bin}(m, P)$. Hence, the model is similar in form to the single-herd, binomial-sampling model presented in Section 2.1. We modeled p as before, using a mixture of a point mass at zero and a beta distribution on (0, 1).

Fig. 2. Estimated posterior prevalence distribution for Johne’s disease in California using the parametric model (solid line) and semiparametric model (dashed line). An estimate of the prior prevalence distribution is given by the dotted line.
4.1. Estimation of Salmonella enteritidis from pooled samples
We reproduced the Bayesian analysis in Cowling et al. (1999), which used data from Kinde et al. (1996). The data came from a California egg-producing ranch. The eggs were pooled into batches of size $k = 20$, and $y = 2$ of $m = 656$ pools tested positive for S. enteritidis by culture. We used the prior distributions provided in Cowling et al. (1999) for the Se (median 0.70) and Sp (median 0.999931) of culture. The ranch was known to be producing eggs contaminated with S. enteritidis, so we set $\tau = \tau_0 = 1$. The median bird-level prevalence was 0.000198. The analysis resulted in a posterior median for p of 0.0002034 with 95% probability interval (0.000043, 0.000555). These results agree with those obtained by Cowling et al. (1999).
We note that Cowling et al. (1999) did their analysis in Splus (Insightful Corp., Seattle,
Washington) and introduced latent data to facilitate the Gibbs sampler (that is, to have
recognizable full conditional distributions). In WinBUGS, these latent data are not needed.
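The pool-to-animal relation of Section 4 can be evaluated directly. The following Python sketch implements the forward relation (animal-level prevalence to probability of a positive pool) and its inverse; the function names are hypothetical. Plugging in the observed pool proportion $y/m = 2/656$ and the prior medians of Se and Sp gives only a crude point estimate, since it ignores all uncertainty; it is not the Bayesian posterior median reported above.

```python
def pool_positive_prob(p, k, se_p, sp_p):
    """P(pool of k animals tests positive | animal-level prevalence p).

    A pool is truly positive unless all k animals are uninfected."""
    all_negative = (1 - p) ** k
    return se_p * (1 - all_negative) + (1 - sp_p) * all_negative


def animal_prevalence(P, k, se_p, sp_p):
    """Invert the relation: p = 1 - ((Se_p - P) / (Sp_p + Se_p - 1)) ** (1 / k)."""
    ratio = (se_p - P) / (sp_p + se_p - 1)
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("P must lie between 1 - Sp_p and Se_p")
    return 1 - ratio ** (1 / k)


# Crude plug-in for Section 4.1: pool proportion 2/656 with the prior medians.
print(animal_prevalence(2 / 656, k=20, se_p=0.70, sp_p=0.999931))
```

The inverse is defined only when P lies between the false-positive rate $1 - Sp_p$ and $Se_p$; outside that range no prevalence in [0, 1] reproduces the observed pool proportion, which is one reason the full model places P on the binomial likelihood instead of inverting a point estimate.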
5. Conclusions
We reviewed Bayesian approaches for prevalence inferences in various settings. Most of
the models were fitted easily using the WinBUGS software. We also presented a new model
that allows for the estimation of the prevalence distribution in addition to estimating herd-
level prevalence in a region. Although not always explicitly mentioned, a sensitivity analysis
was conducted for each analysis to assess the influence of the prior distributions on the
resulting inferences. As a means to reduce the effects of modeling the PD parametrically (e.g.
with a beta distribution), or as part of a sensitivity analysis, a mixture of Dirichlet processes
centered on the family of beta distributions can be used to model the PD. Also, convergence
diagnostics should be used as a routine practice. Our companion paper (Branscum et al.,
2005) details Bayesian models for estimating measures of test accuracy.
Acknowledgements
The study was supported in part by the USDA-CSREES-NRI Competitive Grants
program award number 2001-35204-10874. We thank Dr. Michael Collins and Dr. Sharon
Hietala for providing prior information about test accuracy and prevalence.
References
Antoniak, C.E., 1974. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems.
Ann. Statist. 2, 1152–1174.
Branscum, A.J., Gardner, I.A., Johnson, W.O., 2005. Estimation of diagnostic test sensitivity and specificity
through Bayesian modeling. Prev. Vet. Med., submitted for publication.
Cowling, D.W., Gardner, I.A., Johnson, W.O., 1999. Comparison of methods for estimating individual-level
prevalence based on pooled samples. Prev. Vet. Med. 39, 211–225.
Davies, P.R., Morrow, W.E.M., Deen, J., Gamble, H.R., Patton, S., 1998. Seroprevalence of Toxoplasma gondii in
different production systems in North Carolina, USA. Prev. Vet. Med. 36, 67–76.
Dendukuri, N., Joseph, L., 2001. Bayesian approaches to modeling the conditional dependence between multiple
diagnostic tests. Biometrics 57, 158–167.
Dubey, J.P., Thulliez, P., Weigel, R.M., Andrews, C.D., Lind, P., Powell, E.C., 1995. Sensitivity and specificity of
various serologic tests for detection of Toxoplasma gondii infection in naturally infected sows. Am. J. Vet. Res.
56, 1030–1036.
Dunson, D.B., 2001. Commentary: practical advantages of Bayesian analysis of epidemiologic data. Am. J.
Epidemiol. 12, 1222–1226.
Erkanli, A., Soyer, R., Costello, E., 1999. Bayesian inference for prevalence in longitudinal two-phase studies.
Biometrics 55, 1145–1150.
Gardner, I.A., 2002. The utility of Bayes’ theorem and Bayesian inference in veterinary clinical practice and
research. Aust. Vet. J. 80, 758–761.
Gelman, A., Rubin, D.B., 1992. Inference from iterative simulation using multiple sequences. Stat. Sci. 7, 457–511.
Hanson, T.E., Johnson, W.O., Gardner, I.A., Georgiadis, M.P., 2003a. Determining the infection status of a herd. J.
Agric. Biol. Environ. Statist. 8, 469–485.
Hanson, T.E., Johnson, W.O., Gardner, I.A., 2003b. Hierarchical models for estimating herd prevalence and test
accuracy in the absence of a gold-standard. J. Agric. Biol. Environ. Statist. 8, 223–239.
Johnson, W.O., Su, C.-L., Gardner, I.A., Christensen, R., 2004. Sample size calculations for surveys to substantiate
freedom of populations from infectious agents. Biometrics 60, 165–171.
Joseph, L., Gyorkos, T.W., Coupal, L., 1995. Bayesian estimation of disease prevalence and the parameters of
diagnostic tests in the absence of a gold standard. Am. J. Epidemiol. 141, 263–272.
Kinde, H., Read, D.H., Chin, D.P., Bickford, A.A., Walker, R.L., Ardans, A., Breitmeyer, R.E., Willoughby, D.,
Little, H.E., Kerr, D., Gardner, I.A., 1996. Salmonella enteritidis, phage type 4 infection in a commercial layer
flock in southern California: bacteriologic and epidemiologic findings. Avian Dis. 40, 665–671.
Robert, C.P., Casella, G., 1999. Monte Carlo Statistical Methods. Springer-Verlag, New York.
Spiegelhalter, D., Thomas, A., Best, N., Gilks, W., 1996. BUGS: Bayesian inference using Gibbs sampling, version 0.50. MRC Biostatistics Unit, Cambridge. http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml.
Suess, E.A., Johnson, W.O., Gardner, I.A., 2002. Hierarchical Bayesian model for prevalence inferences and
determination of a country’s status for an animal pathogen. Prev. Vet. Med. 55, 155–171.
Tu, X.M., Kowalski, J., Jia, G., 1999. Bayesian analysis of prevalence with covariates using simulation-based
techniques: applications to HIV screening. Statist. Med. 18, 3059–3073.