Bayesian modeling of animal- and herd-level prevalences

A.J. Branscum a,*, I.A. Gardner b, W.O. Johnson a

a Department of Statistics, University of California, Davis, CA 95616, USA

b Department of Medicine and Epidemiology, School of Veterinary Medicine, University of California, Davis, CA 95616, USA

Received 9 September 2003; received in revised form 11 June 2004; accepted 6 September 2004

Abstract

We reviewed Bayesian approaches for animal-level and herd-level prevalence estimation based on

cross-sectional sampling designs and demonstrated fitting of these models using the WinBUGS

software. We considered estimation of infection prevalence based on use of a single diagnostic test

applied to a single herd with binomial and hypergeometric sampling. We then considered multiple

herds under binomial sampling with the primary goal of estimating the prevalence distribution and

the proportion of infected herds. A new model is presented that can be used to estimate the herd-level

prevalence in a region, including the posterior probability that all herds are non-infected. Using this

model, inferences for the distribution of prevalences, mean prevalence in the region, and predicted

prevalence of herds in the region (including the predicted probability of zero prevalence) are also

available. In the models presented, both animal- and herd-level prevalences are modeled as mixture

distributions to allow for zero infection prevalences. (If mixture models for the prevalences were not

used, prevalence estimates might be artificially inflated, especially in herds and regions with low or

zero prevalence.) Finally, we considered estimation of animal-level prevalence based on pooled

samples.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Bayesian modeling; Prevalence estimation; Pooled samples; WinBUGS

Preventive Veterinary Medicine 66 (2004) 101–112

* Corresponding author. Tel.: +1 530 752 2361; fax: +1 530 752 7099.

E-mail address: [email protected] (A.J. Branscum).

0167-5877/$ – see front matter © 2004 Elsevier B.V. All rights reserved.

doi:10.1016/j.prevetmed.2004.09.009

1. Introduction

In recent years, Bayesian analysis has been used increasingly in veterinary epidemiologic studies. Branscum et al. (2005) and Suess et al. (2002) provided brief introductions to Bayesian methods and applied those methods to estimating measures of test accuracy and to prevalence inference. Advantages of a Bayesian approach include the ability to incorporate prior information about model parameters into the analysis; exact inference (up to Monte Carlo error) without the need for large-sample approximations; and, notably, the availability of the WinBUGS software (Spiegelhalter et al., 1996), which provides a relatively straightforward means of fitting various Bayesian models (Dunson, 2001; Gardner, 2002). Branscum et al. (2005) gave an overview of using WinBUGS for Bayesian modeling and reviewed several models to estimate the sensitivity and specificity of multiple (possibly correlated) diagnostic tests.

In this paper, we considered Bayesian approaches for prevalence estimation based on

cross-sectional sampling designs and discussed implementation of these models in

WinBUGS, wherever possible. Tu et al. (1999) considered prevalence estimation with

covariates and Erkanli et al. (1999) presented Bayesian approaches for prevalence

estimation based on longitudinal data (some of which can also be done easily in

WinBUGS)—but we did not consider covariates or longitudinal data herein.

We reviewed the two models developed in Hanson et al. (2003a), the hierarchical

model of Hanson et al. (2003b), and the Bayesian approach to animal-level prevalence

inference based on pooled samples described in Cowling et al. (1999). The two

models developed by Hanson et al. (2003a) were used for estimating the animal-level

infection prevalence of a single herd based on the results of an imperfect diagnostic test

applied independently to each of n randomly sampled animals. The first model

(which assumes binomial sampling) is appropriate provided the herd size is much larger

than the sample size. Otherwise, a model based on hypergeometric sampling should be

used.

Next, a modified version of the hierarchical model of Hanson et al. (2003b) for

prevalence inferences based on data from multiple herds is presented. We also present a

new model—similar to the model presented in Suess et al. (2002) but easier to implement

computationally—designed to estimate herd-level prevalence based on data from multiple

herds obtained using a two-stage cluster-sampling design. The models using data from

multiple herds yielded estimates of the distribution of prevalences in the population. Lastly,

we illustrate a Bayesian approach to animal-level prevalence estimation based on pooled

samples (Cowling et al., 1999).

The models were fitted using Markov-chain Monte Carlo techniques, in particular the

Gibbs sampler (Robert and Casella, 1999). Suess et al. (2002, pp. 160–161) gave a detailed

explanation of this approach in the context of prevalence estimation. For the analyses

presented in this paper, inferences were based on 50,000 iterations after discarding an

initial burn-in of 5,000 iterations with convergence assessed by running multiple chains

from various starting values (Gelman and Rubin, 1992). The WinBUGS code used to fit all

models is available on our website, http://www.epi.ucdavis.edu/diagnostictests/ (except in

the hypergeometric case where specialized code was used). The WinBUGS code can be

altered readily to conform to different data.
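As an illustration of the multiple-chain convergence check cited above, the following minimal Python sketch computes the potential scale reduction factor of Gelman and Rubin (1992) for one scalar parameter. It is not the WinBUGS implementation; the function name and the simulated chains are hypothetical and serve only to show the calculation.

```python
import random
import statistics


def potential_scale_reduction(chains):
    """Gelman-Rubin R-hat for equal-length chains of one scalar parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    # B / n: sample variance of the chain means.
    between_over_n = statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


# Hypothetical chains: three that sample the same distribution,
# and three that are stuck around different values.
rng = random.Random(0)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
stuck = [[rng.gauss(shift, 1.0) for _ in range(2000)] for shift in (0.0, 5.0, 10.0)]
r_mixed = potential_scale_reduction(mixed)
r_stuck = potential_scale_reduction(stuck)
print(round(r_mixed, 3), round(r_stuck, 3))
```

Values close to 1 are consistent with chains that have reached a common distribution; values well above 1 suggest that more iterations or a longer burn-in are needed.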


2. Methods for a single herd

2.1. Binomial sampling

Consider estimation of the infection prevalence for a single herd where y animals tested

positive out of n randomly sampled animals. If the herd size (N) is much larger than n, then

the sampling distribution of y is approximately binomial:

$$y \mid p, Se, Sp \sim \mathrm{Bin}\bigl(n,\ p\,Se + (1 - p)(1 - Sp)\bigr)$$

where p is the prevalence of infection in the herd and Se and Sp are the sensitivity and

specificity, respectively, of the diagnostic test applied to each sampled animal. With the

specification of prior distributions for the model parameters, the Bayesian model is

complete.

We modeled uncertainty about the sensitivity (Se) and specificity (Sp) of the diagnostic

test using independent beta prior distributions (Hanson et al., 2003a):

$$Se \sim \mathrm{Beta}(a_{Se}, b_{Se}), \qquad Sp \sim \mathrm{Beta}(a_{Sp}, b_{Sp}).$$

A beta distribution provides a flexible means of modeling uncertainty about parameters

ranging from 0 to 1. The prior parameters of all beta distributions used in this paper were

determined based on the most-likely (modal) prior value of the parameter and an upper or

lower percentile for the parameter. These values typically are elicited from experts. The Beta

Buster software available at our website (http://www.epi.ucdavis.edu/diagnostictests/)

computes the two parameters of a beta distribution based on the mode and one of the two

percentiles.
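To make the mode-and-percentile elicitation concrete, here is a minimal Python sketch of one way such a calculation could be done. It is not the Beta Buster algorithm, whose internals are not described here; the function names are hypothetical, and the approach (fix the mode, then search over a + b by bisection using a numerically integrated CDF) is an assumption made for illustration. It requires a mode strictly between 0 and 1, which gives a, b > 1.

```python
import math


def beta_cdf(q, a, b, steps=1000):
    """P(X <= q) for X ~ Beta(a, b) with a, b > 1, by composite Simpson's rule."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def pdf(x):
        if x <= 0.0 or x >= 1.0:
            return 0.0  # the density vanishes at the endpoints when a, b > 1
        return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))

    h = q / steps
    total = pdf(0.0) + pdf(q)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3


def beta_from_mode_percentile(mode, q, prob):
    """Find (a, b) with the given mode such that P(X <= q) = prob."""

    def params(s):
        # s = a + b; fixing the mode determines how s is split between a and b.
        return 1 + mode * (s - 2), 1 + (1 - mode) * (s - 2)

    def gap(s):
        return beta_cdf(q, *params(s)) - prob

    lo, hi = 2.0 + 1e-9, 4.0
    gap_lo = gap(lo)
    while gap_lo * gap(hi) > 0:  # expand the interval until the root is bracketed
        hi *= 2
        if hi > 1e6:
            raise ValueError("no beta distribution matches these inputs")
    for _ in range(60):  # bisection on s
        mid = (lo + hi) / 2
        gap_mid = gap(mid)
        if gap_lo * gap_mid <= 0:
            hi = mid
        else:
            lo, gap_lo = mid, gap_mid
    return params((lo + hi) / 2)


# Example inputs: a mode of 0.03 and a 95th percentile of 0.15.
a, b = beta_from_mode_percentile(mode=0.03, q=0.15, prob=0.95)
print(round(a, 2), round(b, 2))
```

The same function handles a lower percentile, for example `beta_from_mode_percentile(0.70, 0.55, 0.05)` for a mode of 0.70 with a 5th percentile of 0.55.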

A beta prior distribution also may be used for the infection prevalence, but this would imply that $P(p = 0) = 0$. This approach was used by Johnson et al. (2004). However, if the herd is truly not infected, then $p = 0$, and so we require a model that allows for this. Hanson et al. (2003a) modeled the infection prevalence using a mixture distribution:

$$
\begin{aligned}
p &\sim \mathrm{Beta}(a_p, b_p) && \text{with probability } t, \\
p &= 0 && \text{with probability } 1 - t,
\end{aligned}
$$

where t is the probability that the herd is infected. With this mixture distribution,

computation of the posterior probability that the herd is not infected is possible and this

computation can be performed easily using WinBUGS under binomial-sampling schemes.

A beta prior distribution also can be used for t. Alternatively, t can be set equal to an expert-specified constant ($t_0$).
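The role of the point mass at zero can be illustrated outside WinBUGS. The Python sketch below estimates the posterior probability that the herd is infected by averaging the binomial likelihood over prior draws, once under the infected component and once under p = 0. This is a simple Monte Carlo shortcut chosen for illustration, not the Gibbs sampler used in the paper. The Se and Sp prior parameters in the example are hypothetical placeholders, not the elicited priors, so the output should not be compared with the results reported below.

```python
import math
import random


def binom_pmf(y, n, q):
    """Binomial probability of y successes in n trials with success probability q."""
    return math.comb(n, y) * q**y * (1 - q) ** (n - y)


def prob_herd_infected(y, n, t, prev_prior, se_prior, sp_prior, draws=20000, seed=1):
    """Monte Carlo estimate of P(p > 0 | y) under the point-mass mixture prior."""
    rng = random.Random(seed)
    like_infected = like_free = 0.0
    for _ in range(draws):
        p = rng.betavariate(*prev_prior)
        se = rng.betavariate(*se_prior)
        sp = rng.betavariate(*sp_prior)
        # Infected herd: positives arise from true and false positives.
        like_infected += binom_pmf(y, n, p * se + (1 - p) * (1 - sp))
        # Non-infected herd (p = 0): positives are false positives only.
        like_free += binom_pmf(y, n, 1 - sp)
    return t * like_infected / (t * like_infected + (1 - t) * like_free)


# Hypothetical Se and Sp priors, NOT the elicited ones from the paper.
SE_PRIOR, SP_PRIOR = (30.0, 13.0), (100.0, 2.0)
PREV_PRIOR = (1.80, 26.74)
few = prob_herd_infected(0, 91, 0.10, PREV_PRIOR, SE_PRIOR, SP_PRIOR)
many = prob_herd_infected(20, 91, 0.10, PREV_PRIOR, SE_PRIOR, SP_PRIOR)
print(few, many)
```

With no test-positive animals the posterior probability of infection falls below the prior value t, whereas a large number of positives pushes it toward 1.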

2.1.1. Prevalence of Toxoplasma gondii

Davies et al. (1998) determined the seroprevalence of T. gondii in finishing pigs for 28 herds in North Carolina, USA. We estimated the prevalence of T. gondii in one of the sampled herds where $y = 1$ pig tested positive out of $n = 91$ pigs with a modified agglutination test (MAT). The beta prior distributions used for the Se and Sp of the MAT had modal values of 0.70 and 0.99, respectively, with 5th percentiles of 0.55 and 0.95. These values were based on estimates from a previous study (Dubey et al., 1995) and field evidence that the Sp of the MAT in grower-finisher pigs is much higher than that in sows, which were used in the original study.

The scientifically most likely (prior) value for the probability that the herd was infected was taken to be 0.10; i.e., we set $t_0 = 0.10$. The infection prevalence, assuming the herd truly was infected, was modeled using a Beta(1.80, 26.74) distribution, which has a mode of 0.03 and a 95th percentile of 0.15. These values were elicited from one of the authors (Dr. Ian Gardner) and were based on estimates from previous National Animal Health Monitoring System studies in 1990 and on changes in industry demographics and production systems in the last decade that would decrease the prevalence of infection in grower-finisher pigs.

The posterior median and 95% probability interval (sometimes called a "credibility interval") for the prevalence were 0 (0.00, 0.01). Also, the posterior probability that the herd was infected was 0.03; that is, $P(p > 0 \mid y = 1, n = 91) = 0.03$. The median for p was zero because 97% of the Gibbs iterates for p were 0.

2.2. Hypergeometric sampling

The distribution of the number of test-positive animals out of the n sampled can be approximated well by a binomial distribution if $N \gg n$ (that is, if the herd size far exceeds the sample size). When the test outcomes for the sampled animals cannot be considered as independent Bernoulli trials, the distribution of y (which is related to the hypergeometric distribution) is more complicated (Johnson et al., 2004), and therefore the model is difficult to implement in WinBUGS. However, the model consists of only three parameters: the Se and Sp of the test and the prevalence (p, where for d infected animals in the herd, $p = d/N$).

The freely available, user-friendly software called BDFree (short for Bayesian Disease Freedom), developed by Johnson et al. (2004) and downloadable from http://www.epi.ucdavis.edu/diagnostictests, allows the user to input data (y, n, N) and to specify beta prior distributions for Se and Sp. A discretized beta distribution (a discrete distribution taking values in the set $\{0, 1/N, 2/N, \ldots, 1\}$, where the corresponding probabilities are obtained from a continuous beta distribution) is used for p (Hanson et al., 2003a). The output provided by BDFree includes the posterior median and 95% probability interval for p; it also provides the posterior probability that p is no larger than a user-specified threshold ($p_0$), namely $P(p \le p_0 \mid y, n, N)$.

We note that BDFree also can be used to fit the model in Section 2.1 for the special case of $t = 1$. In this instance, a value of $p_0$ can be selected so that if $p \le p_0$, then the herd can be considered "low risk". If $P(p \le p_0 \mid y, n, N)$ is sufficiently large (say, 0.95), then the herd can be considered to have the disease well controlled.
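The following Python sketch shows one plausible form of the sampling distribution of y when animals are drawn without replacement. It is not the BDFree code. It assumes that the number of truly infected animals in the sample is hypergeometric and that, given infection status, test outcomes are independent with the stated Se and Sp; the function names are hypothetical.

```python
import math


def hypergeom_pmf(x, N, d, n):
    """P(x infected animals in a sample of n drawn from N, of which d are infected)."""
    if x < max(0, n - (N - d)) or x > min(n, d):
        return 0.0
    return math.comb(d, x) * math.comb(N - d, n - x) / math.comb(N, n)


def binom_pmf(k, m, q):
    """Binomial probability of k successes in m trials."""
    if k < 0 or k > m:
        return 0.0
    return math.comb(m, k) * q**k * (1 - q) ** (m - k)


def prob_test_positives(y, N, d, n, se, sp):
    """P(y test-positive animals | d infected in a herd of N, sample size n)."""
    total = 0.0
    for x in range(max(0, n - (N - d)), min(n, d) + 1):
        # j true positives among the x infected, y - j false positives among the rest.
        inner = sum(
            binom_pmf(j, x, se) * binom_pmf(y - j, n - x, 1 - sp)
            for j in range(0, min(x, y) + 1)
        )
        total += hypergeom_pmf(x, N, d, n) * inner
    return total


# Illustration: likelihood of 10 positives when a whole herd of 333 is tested,
# using fixed (assumed) Se = 0.25 and Sp = 0.98 for a few candidate values of d.
likelihood = {d: prob_test_positives(10, 333, d, 333, 0.25, 0.98) for d in (0, 10, 40)}
print(likelihood)
```

Evaluating this function over d = 0, 1, ..., N and weighting by a discretized prior on p = d/N gives an unnormalized posterior for the prevalence when Se and Sp are held fixed.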

2.2.1. Prevalence of Johne’s disease

The prevalence of Johne's disease (Mycobacterium avium subsp. paratuberculosis) in a herd in the central valley of California, USA was estimated by testing all $N = n = 333$ cows using a commercial ELISA kit (Idexx Laboratories, Westbrook, Maine). Ten of the cows had positive test results.



The beta prior distributions for the Se and Sp of the ELISA were constructed using

expert information from Dr. Michael Collins, University of Wisconsin (Madison,

Wisconsin, USA). For the Se, a mode of 0.25 was used with 95th percentile of 0.30. A

modal value of 0.98 was used for the Sp with 5th percentile equal to 0.96. We used a

relatively diffuse discretized beta prior distribution for the prevalence with mode equal to

0.05 and 95th percentile equal to 0.30.

The posterior probability that the herd was infected was 0.991: $P(p > 0 \mid y = 10, n = 333, N = 333) = 0.991$. The posterior median and 95% probability interval for p were 0.064 (0.006, 0.161).

3. Methods for multiple herds

3.1. Estimation of the within-herd prevalence distribution

In the multiple-herd setting, suppose $n_t$ animals are sampled randomly from herds $t = 1, 2, \ldots, K$. We consider estimation of the prevalence distribution (PD), which is defined to be the distribution of prevalences for herds in the super-population of all herds in the region. The model we present is a modified version of that proposed by Hanson et al. (2003b) and we estimated the PD based on results of two dependent diagnostic tests applied to each sampled animal. The data for herd t are given by the vector $y_t = (y_{11t}, y_{12t}, y_{21t}, y_{22t})$, where $y_{11t}$ ($y_{22t}$) is the number of animals from herd t testing positive (negative) on both tests and $y_{12t}$ ($y_{21t}$) denotes the number of animals from herd t testing positive (negative) on test 1 and negative (positive) on test 2. The complete data set is comprised of the independent vectors $y_1, y_2, \ldots, y_K$.

The data were assumed to have independent multinomial sampling distributions:

$$y_t \sim \mathrm{multinomial}\bigl(n_t,\ (p_{11t}, p_{12t}, p_{21t}, p_{22t})\bigr)$$

where

$$
\begin{aligned}
p_{11t} &= P_t(T_1^+, T_2^+) = p_t[Se_1 Se_2 + \mathrm{cov}_{D^+}] + (1 - p_t)[(1 - Sp_1)(1 - Sp_2) + \mathrm{cov}_{D^-}] \\
p_{12t} &= P_t(T_1^+, T_2^-) = p_t[Se_1(1 - Se_2) - \mathrm{cov}_{D^+}] + (1 - p_t)[(1 - Sp_1)Sp_2 - \mathrm{cov}_{D^-}] \\
p_{21t} &= P_t(T_1^-, T_2^+) = p_t[(1 - Se_1)Se_2 - \mathrm{cov}_{D^+}] + (1 - p_t)[Sp_1(1 - Sp_2) - \mathrm{cov}_{D^-}] \\
p_{22t} &= P_t(T_1^-, T_2^-) = p_t[(1 - Se_1)(1 - Se_2) + \mathrm{cov}_{D^+}] + (1 - p_t)[Sp_1 Sp_2 + \mathrm{cov}_{D^-}].
\end{aligned}
$$

Here, $Se_j$ and $Sp_j$ denote the sensitivity and specificity of test j, for $j = 1, 2$, and the two covariance terms are the covariances between the two tests for infected animals ($D^+$) and for non-infected animals ($D^-$):

$$\mathrm{cov}_{D^+} = Se_{11} - Se_1 Se_2, \qquad \mathrm{cov}_{D^-} = Sp_{22} - Sp_1 Sp_2,$$

where $Se_{11} = P(T_1^+, T_2^+ \mid D^+)$ and $Sp_{22} = P(T_1^-, T_2^- \mid D^-)$ (Dendukuri and Joseph, 2001).

Note that if one test was used rather than two tests, then the data would be modeled as

independent binomials as in Section 3.2.
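The four multinomial cell probabilities for two dependent tests can be written directly as a small function. The Python sketch below follows the expressions above term by term; the function name and the numerical inputs are hypothetical and chosen only to show that the cells form a valid probability vector.

```python
def cell_probabilities(prev, se1, se2, sp1, sp2, cov_pos=0.0, cov_neg=0.0):
    """Return (p11, p12, p21, p22) for one herd with prevalence `prev`.

    cov_pos and cov_neg are the covariances between the two tests among
    infected and non-infected animals, respectively.
    """
    p11 = prev * (se1 * se2 + cov_pos) + (1 - prev) * ((1 - sp1) * (1 - sp2) + cov_neg)
    p12 = prev * (se1 * (1 - se2) - cov_pos) + (1 - prev) * ((1 - sp1) * sp2 - cov_neg)
    p21 = prev * ((1 - se1) * se2 - cov_pos) + (1 - prev) * (sp1 * (1 - sp2) - cov_neg)
    p22 = prev * ((1 - se1) * (1 - se2) + cov_pos) + (1 - prev) * (sp1 * sp2 + cov_neg)
    return p11, p12, p21, p22


# Hypothetical inputs for illustration only.
cells = cell_probabilities(
    prev=0.2, se1=0.99, se2=0.90, sp1=0.70, sp2=0.95, cov_pos=0.005, cov_neg=0.01
)
print(cells, sum(cells))
```

Because the covariance terms enter with opposite signs in adjacent cells, they cancel in the margins: p11 + p12 equals the probability that test 1 is positive regardless of the dependence between the tests.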


As in the single-herd setting, the Ses and Sps of the two tests were modeled with

independent beta prior distributions. The prevalences of the sampled herds were considered

exchangeable and thus were assumed to be independent and identically distributed:

$$p_1, p_2, \ldots, p_K \mid m, c \sim \mathrm{Beta}\bigl(mc,\ c(1 - m)\bigr)$$

where m is the average prevalence in the population and c is associated with the variability of these prevalences about their mean. A large value of c implies less heterogeneity among prevalences. The PD is this beta distribution. We modeled uncertainty about the mean of the prevalence distribution using:

$$m \sim \mathrm{Beta}(a_m, b_m)$$

and a gamma prior distribution was used to model c:

$$c \sim \mathrm{Gamma}(a_c, b_c).$$

The constants $a_c$ and $b_c$ were determined based on the expert-elicited most-likely value for an upper percentile of the PD and a value for which the expert is virtually certain that this upper percentile is not exceeded. For instance, the expert might believe that 90% of all herds in the population have prevalences <0.50 and might be 99.5% sure that the 90th percentile of the prevalence distribution is <0.60. The values $a_c$ and $b_c$ are computed from these two values using the method described in detail in Hanson et al. (2003b).
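A quick way to see what a given choice of hyperparameters implies is to simulate from the hierarchical prior. The Python sketch below draws m and c and then a set of herd prevalences from Beta(mc, c(1 - m)). The hyperparameter values are hypothetical, and the gamma prior is assumed here to be parameterized by shape and rate; Python's `gammavariate` takes shape and scale, hence the reciprocal.

```python
import random
import statistics


def sample_pd(m, c, n_herds, rng):
    """Draw herd prevalences from the Beta(m*c, c*(1 - m)) prevalence distribution."""
    return [rng.betavariate(m * c, c * (1 - m)) for _ in range(n_herds)]


def sample_prior_predictive(n_herds, mean_prior, c_shape, c_rate, rng):
    """Draw (m, c) from their priors, then herd prevalences given (m, c)."""
    m = rng.betavariate(*mean_prior)
    c = rng.gammavariate(c_shape, 1.0 / c_rate)  # gammavariate uses (shape, scale)
    return m, c, sample_pd(m, c, n_herds, rng)


rng = random.Random(3)
# Hypothetical hyperparameters for illustration only.
m, c, prevalences = sample_prior_predictive(20, (5.0, 15.0), 4.0, 0.4, rng)

# Effect of c on heterogeneity, holding the mean fixed at 0.25.
spread_small_c = statistics.pstdev(sample_pd(0.25, 2.0, 20000, rng))
spread_large_c = statistics.pstdev(sample_pd(0.25, 100.0, 20000, rng))
print(round(spread_small_c, 3), round(spread_large_c, 3))
```

The comparison at the end illustrates the statement that a large value of c implies less heterogeneity among prevalences.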

The covariance between the test outcomes for infected animals is such that $(Se_1 - 1)(1 - Se_2) \le \mathrm{cov}_{D^+} \le \min(Se_1, Se_2) - Se_1 Se_2$, and for non-infected animals the covariance satisfies $(Sp_1 - 1)(1 - Sp_2) \le \mathrm{cov}_{D^-} \le \min(Sp_1, Sp_2) - Sp_1 Sp_2$ (Dendukuri and Joseph, 2001). Because prior information about the two covariances is typically unavailable, uniform prior distributions over these ranges can be used for $\mathrm{cov}_{D^+}$ and $\mathrm{cov}_{D^-}$.
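The covariance bounds can be checked numerically. The Python sketch below computes the range stated above for a pair of accuracies and draws one covariance uniformly from it; the function names are hypothetical. The example uses sensitivities of 0.99 and 0.90 purely as illustrative point values.

```python
import random


def covariance_bounds(acc1, acc2):
    """Range for the covariance, as stated in the text.

    acc1 and acc2 are either both sensitivities or both specificities.
    """
    return (acc1 - 1) * (1 - acc2), min(acc1, acc2) - acc1 * acc2


def joint_given_status(acc1, acc2, cov):
    """Joint probabilities of (correct, correct), (correct, wrong),
    (wrong, correct), (wrong, wrong) for the two tests given true status."""
    return (
        acc1 * acc2 + cov,
        acc1 * (1 - acc2) - cov,
        (1 - acc1) * acc2 - cov,
        (1 - acc1) * (1 - acc2) + cov,
    )


lo, hi = covariance_bounds(0.99, 0.90)
rng = random.Random(7)
cov = rng.uniform(lo, hi)  # one draw from the uniform prior on the allowed range
joint = joint_given_status(0.99, 0.90, cov)
print(lo, hi, joint)
```

Any covariance inside the range leaves the four joint probabilities non-negative in this example, which is what makes a uniform prior over the range a convenient default.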

We note that, as presented, our model differs from that of Hanson et al. (2003b) in that they modeled the test-accuracy vectors $(Se_{11}, Se_{12}, Se_{21}, Se_{22})$ and $(Sp_{11}, Sp_{12}, Sp_{21}, Sp_{22})$ with independent Dirichlet distributions. However, such an approach gives approximately equal weight to the prior information for the test accuracy measures of both tests, so we did not consider it here. We also note that our model easily generalizes to allow for herds with zero infection prevalence by modeling the prevalences as independent:

$$
\begin{aligned}
p_t \mid m, c &\sim \mathrm{Beta}\bigl(mc,\ c(1 - m)\bigr) && \text{with probability } t, \\
p_t &= 0 && \text{with probability } 1 - t,
\end{aligned}
$$

where t denotes the proportion of infected herds in the population. With this general-

ization, one can compute the (posterior) predictive probability of zero infection prevalence

for each sampled herd, or for a randomly selected herd from the population of herds. An

estimate of the herd-level prevalence (t) is also available. This approach was used in the

analysis described in Section 3.2.

A beta distribution might not model the PD adequately, for example if the PD is

multimodal. A useful model that generalizes the one described above involves a mixture of

Dirichlet processes (Antoniak, 1974). Mixtures of Dirichlet processes (MDP) have been

used to model the PD in a setting related to ours (Hanson et al., 2003b). The essence of this model is that the class of models under consideration is allowed to include all possible PDs, thus allowing for great flexibility in terms of the shape of the PD. The family of models is "centered" on a single parametric family of $\mathrm{Beta}(mc,\ c(1 - m))$ distributions. A weight a is selected by the user. If a is large, the PD is modeled as coming from the above beta family. If a is small (for example, $a = 1$), then the model is much more flexible, allowing for data-driven departures from this family. A parametric model and the MDP generalization were fitted to data on Johne's disease in California in Section 3.2.1.

3.1.1. Prevalence distribution for brucellosis in cattle

We estimated the prevalence distribution for brucellosis in cattle in a region of Mexico

where brucellosis was endemic. Twenty cow herds were selected randomly and between 8 and

147 cows were sampled randomly from these herds. Serum from each cow was subjected to

two diagnostic tests for Brucella abortus: a buffered acidified plate-agglutination (BAPA) and

Rivanol. The tests were applied sequentially (in series) in that all sampled cows were tested with the BAPA and only BAPA-positive cows were given the Rivanol test. The data are therefore $y_t = (y_{11t}, y_{12t}, y_{2\cdot t})$ where, for herd t, $y_{11t}$ denotes the number of cows (out of $n_t$) testing positive on both tests, $y_{12t}$ is the number of cows testing positive with the BAPA and then testing negative with the Rivanol test, and $y_{2\cdot t}$ is the number of cows testing negative with the BAPA test. Note that the marginal observed total, $y_{2\cdot t}$, is the sum of the two latent (unobserved) components $y_{21t}$ and $y_{22t}$ (that is, $y_{2\cdot t} = y_{21t} + y_{22t}$).

The data are modeled as independent multinomials:

$$y_t \sim \mathrm{multinomial}\bigl(n_t,\ (p_{11t}, p_{12t}, p_{21t} + p_{22t})\bigr).$$

The prior distributions used for the parameters of the prevalence distribution

were described by Hanson et al. (2003b). The prior mode for the mean of the prevalence

distribution was 0.25 and the corresponding 95th percentile was 0.35. Also, we set $t_0 = 1$.

We elicited prior estimates of the accuracy of BAPA and Rivanol tests from Dr. Sharon

Hietala, University of California, Davis. The modal values elicited for the Ses of the BAPA

and Rivanol tests were 0.99 and 0.90, respectively, with corresponding 5th percentiles of

0.95 and 0.85. The modal values of the Sps for BAPA and Rivanol were 0.70 and 0.95,

respectively with 5th percentiles of 0.50 and 0.90. The two covariances were modeled with

uniform prior distributions.

The posterior predictive PD is displayed in Fig. 1 and is highly right skewed. Indeed, for a randomly selected herd with prevalence $p^*$, we computed the following (posterior) predictive probabilities:

$$
\begin{aligned}
P(p^* \le 0.10 \mid \{y_t\}) &= 0.431 \\
P(p^* \le 0.50 \mid \{y_t\}) &= 0.915 \\
P(p^* \ge 0.75 \mid \{y_t\}) &= 0.015 \\
P(p^* \ge 0.90 \mid \{y_t\}) &= 0.002.
\end{aligned}
$$

The interpretation is that about 92% of the herds in the super-population consisting of all

herds in the region are estimated to have prevalences <0.50, and only 1 in every 500 herds

is estimated to have a prevalence >0.90. The posterior median and 95% probability interval


for the mean of the prevalence distribution are 0.190 (0.129, 0.264), so we are 95% sure that

the average prevalence is between 0.13 and 0.26.

Each iteration of the Gibbs sampler yielded a sampled value from the posterior

distribution of the 20 prevalences. Using WinBUGS, these 20 values were ranked from

lowest to highest. We monitored the ranks of these sampled prevalences at each iteration.

Histograms of the distribution of ranks for each herd were obtained. Herds 15, 7, 19, and 9

were ranked highest with herd 15 having the largest infection prevalence (posterior median

of p15 was 0.82) while herds 12 and 16 were ranked among the lowest (each having

posterior median prevalences of about 0.02). (The original herd-test results are in Table 2 of

Hanson et al. (2003b).)
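The ranking step can be mimicked on any table of posterior draws. The Python sketch below ranks the herd prevalences within each MCMC iteration and tabulates how often each herd received each rank. It is a stand-alone illustration with made-up draws for three herds, not the WinBUGS code used for the analysis, and the function names are hypothetical.

```python
def rank_within_iteration(draw):
    """Rank 1 = lowest prevalence in this draw (ties broken by herd order)."""
    order = sorted(range(len(draw)), key=lambda herd: draw[herd])
    ranks = [0] * len(draw)
    for position, herd in enumerate(order, start=1):
        ranks[herd] = position
    return ranks


def rank_distribution(draws):
    """counts[h][r - 1] = number of draws in which herd h had rank r."""
    n_herds = len(draws[0])
    counts = [[0] * n_herds for _ in range(n_herds)]
    for draw in draws:
        for herd, rank in enumerate(rank_within_iteration(draw)):
            counts[herd][rank - 1] += 1
    return counts


# Made-up posterior draws: one row per iteration, one column per herd.
toy_draws = [
    [0.02, 0.80, 0.30],
    [0.05, 0.75, 0.20],
    [0.25, 0.90, 0.10],
]
counts = rank_distribution(toy_draws)
print(counts)
```

Dividing each row of `counts` by the number of iterations gives the posterior distribution of that herd's rank, which is what the rank histograms summarize.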

Despite the different choices of prior distributions, our results are very similar to

those presented in Hanson et al. (2003b)—who sampled the full conditional distributions

using their own code (in Fortran) rather than using WinBUGS. Fitting the model

in WinBUGS is advantageous because the full conditional distributions (which might

not be common distributions such as the normal distribution) do not need to be

specified.

3.2. Estimation of herd-level prevalence

The model presented in Section 3.1 was developed with the goal of estimating the PD for the population or region. We now focus on estimating the herd-level prevalence (t) in a region. Suppose a two-stage cluster-sampling design is used where K herds first are sampled randomly from the region, and subsequently $n_t$ animals are sampled randomly from herd t ($t = 1, 2, \ldots, K$). Typically, K is large enough to ensure adequate coverage of the region and the $n_t$ are relatively small.

Fig. 1. Prior (dashed line) and posterior (solid line) prevalence distribution for brucellosis in Mexico.

The resulting data are the numbers of test-positive animals from each herd, denoted as $y_1, y_2, \ldots, y_K$. We assume the data $\{y_t\}$ are independent and follow binomial distributions:

$$y_t \mid p_t, Se, Sp \sim \mathrm{Bin}\bigl(n_t,\ p_t Se + (1 - p_t)(1 - Sp)\bigr)$$

with independent beta prior distributions assumed for Se and Sp. In contrast, Suess et al. (2002) modeled the animal-level test outcomes using independent Bernoulli distributions and incorporated the true latent infection status of each sampled animal into the analysis to facilitate the Gibbs sampler. Their approach was relatively computationally intensive and cannot be implemented in WinBUGS.

As in the single-herd setting, the prevalences were modeled as a mixture of point mass at

zero and a continuous beta distribution on (0,1):

$$
\begin{aligned}
p_t \mid m, c &\sim \mathrm{Beta}\bigl(mc,\ c(1 - m)\bigr) && \text{with probability } t, \\
p_t &= 0 && \text{with probability } 1 - t.
\end{aligned}
$$

We modeled t analogously to the prevalences to allow for the possibility that the infectious agent is completely absent from the region:

$$
\begin{aligned}
t &\sim \mathrm{Beta}(a_t, b_t) && \text{with probability } g, \\
t &= 0 && \text{with probability } 1 - g,
\end{aligned}
$$

where g is typically set equal to an expert-elicited constant ($g_0$), but also could be modeled using a beta distribution.

The prior distributions for m and c were described in Section 3.1. Note that the model allows for the estimation of (i) the herd-level prevalence (HP), (ii) the prevalence distribution for herds, (iii) the mean of the prevalence distribution, m, (iv) the probability that the region is infection-free, $P(t = 0 \mid \{y_t\})$, and (v) the predicted probability that a randomly selected herd in the region is infection-free, $P(p^* = 0 \mid \{y_t\})$.
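The layered structure of this model (region, herd, animal) is easiest to see by simulating from it. The Python sketch below generates one data set from the generative model with m, c, Se and Sp held fixed; it does not fit the model. All numerical inputs are hypothetical, and the function name is an illustrative choice.

```python
import random


def simulate_region(n_herds, n_per_herd, g, t_prior, m, c, se, sp, rng):
    """Simulate herd-level prevalence t, herd prevalences p_t and test-positive counts y_t."""
    # Region level: with probability 1 - g the agent is absent, so t = 0.
    t = rng.betavariate(*t_prior) if rng.random() < g else 0.0
    prevalences, positives = [], []
    for _ in range(n_herds):
        # Herd level: infected with probability t, otherwise prevalence is exactly 0.
        infected = rng.random() < t
        p = rng.betavariate(m * c, c * (1 - m)) if infected else 0.0
        # Animal level: each sampled animal tests positive with probability q.
        q = p * se + (1 - p) * (1 - sp)
        y = sum(rng.random() < q for _ in range(n_per_herd))
        prevalences.append(p)
        positives.append(y)
    return t, prevalences, positives


rng = random.Random(5)
# Hypothetical inputs for illustration only.
t, prevalences, positives = simulate_region(
    n_herds=29, n_per_herd=60, g=1.0, t_prior=(6.0, 4.0),
    m=0.15, c=10.0, se=0.25, sp=0.98, rng=rng,
)
print(round(t, 2), positives)
```

Note that herds with p = 0 can still produce test-positive animals when Sp < 1, which is why the data alone cannot separate infected from non-infected herds without the mixture structure and prior information on test accuracy.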

3.2.1. Prevalence of Johne’s disease in California, USA

We estimated the HP and the PD for Johne's disease in California using data from $K = 29$ randomly sampled herds from the central valley of California. From each herd, $n_t \equiv n = 60$ cows were sampled randomly. Each sampled cow was tested for Johne's disease using an ELISA. Prior information about the ELISA was described in Section 2.2.1.

The modal value and 95th percentile for m were determined as follows. We elicited a mode of 0.12 and a 95th percentile of 0.30 for the PD, i.e., for the prevalence ($p^*$) of a randomly selected herd from the infected sub-population, and computed a and b where $p^* \sim \mathrm{Beta}(a, b)$. The modal value of m was taken to be $a/(a + b)$. Noting that the 95th percentile for m will be less than the 95th percentile of $p^*$, we used 0.30 for the 95th percentile of m to be conservative.

A conservative value of 0.50 was used for the 99th percentile of our prior on

the 95th percentile of the PD. In other words, a priori we are 99% sure that 95% of all the

region’s prevalences are <0.50. A sensitivity analysis using modal values of (0.16, 0.20,

0.25) for the 95th percentile of the prevalence distribution, and using values of (0.35, 0.40,

0.45) for the 99th percentiles for our prior on the 95th percentile yielded relatively similar

results.

An MDP model was also used for the prevalences, as described above. We used $a = 1$, which allows for greater flexibility in the shape of the PD.

We set $g_0 = 1$ because Johne's disease was known to be present in this population of dairy herds. The beta prior distribution used for t had a modal value of 0.60 with 95th percentile equal to 0.83 (also provided by Dr. Michael Collins).


The posterior median and 95% probability interval for HP (t) were 0.59 (0.30, 0.85). Also, the predictive probabilities that a randomly selected herd in the region had prevalence no larger than 0.05 and no larger than 0.50 were:

$$
\begin{aligned}
P(p^* \le 0.05 \mid \{y_t\}) &= 0.52 \\
P(p^* \le 0.50 \mid \{y_t\}) &= 0.97
\end{aligned}
$$

so it was predicted that 97% of the herds in the region had prevalences <0.50.

For infected herds, the parametric and semiparametric posterior predictive PD for

Johne’s disease in this region of California are displayed in Fig. 2 with the PD having

most of its mass below 0.60. For these data, the mode of the PD based on the semiparametric estimate was greater than the mode of the PD based on the parametric estimate, thereby demonstrating some of the effects of the parametric assumptions used for the PD. The prior and posterior prevalence distributions are similar because the data were consistent with the prior information incorporated into the analysis.

4. Prevalence inference from pooled samples

For rare infections, pooled testing often is used as a cost-efficient means of prevalence

estimation. Cowling et al. (1999) reviewed several frequentist and one Bayesian approach to

estimating animal-level prevalence from pooled samples. Bayesian approaches also were

considered by Joseph et al. (1995) and Tu et al. (1999). Here, we demonstrate a Bayesian

model for estimating animal-level prevalence from pooled samples using WinBUGS.

Fig. 2. Estimated posterior prevalence distribution for Johne's disease in California using the parametric model (solid line) and semiparametric model (dashed line). An estimate of the prior prevalence distribution is given by the dotted line.

In particular, suppose we tested m pools with an imperfect diagnostic test where each pool is comprised of samples from k animals. Let y denote the number of pools testing positive and P denote the probability that a pool tests positive. Then, the animal-level prevalence is related to the pool prevalence by:

$$p = 1 - \left(\frac{Se_p - P}{Sp_p + Se_p - 1}\right)^{1/k}$$

where $Se_p$ and $Sp_p$ denote the pooled test Se and Sp, respectively, and need to be approximately equal to the individual-level Se and Sp. The data were assumed to follow a binomial distribution, $y \mid P \sim \mathrm{Bin}(m, P)$. Hence, the model was similar in form to that of the single-herd, binomial-sampling model presented in Section 2.1. We modeled p as before, using a mixture of point mass at zero and a beta distribution over the range (0,1).
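The relationship between pool-level and animal-level quantities can be checked with a few lines of Python. The sketch below implements the expression for p given P and, as an assumption, its algebraic inverse (obtained by treating a pool as truly positive when at least one of its k animals is infected). The function names are hypothetical, and the example plugs in the observed proportion of positive pools as a point value for P rather than performing a Bayesian analysis.

```python
def pool_positive_prob(p, k, se_pool, sp_pool):
    """P(pool tests positive) for animal-level prevalence p and pool size k."""
    all_clean = (1 - p) ** k  # probability that no animal in the pool is infected
    return se_pool * (1 - all_clean) + (1 - sp_pool) * all_clean


def animal_prevalence(P, k, se_pool, sp_pool):
    """Animal-level prevalence implied by pool-positive probability P."""
    return 1 - ((se_pool - P) / (sp_pool + se_pool - 1)) ** (1 / k)


# Plug-in illustration: 2 positive pools out of 656, pools of size 20,
# with Se and Sp fixed at assumed point values of 0.70 and 0.999931.
p_hat = animal_prevalence(2 / 656, 20, 0.70, 0.999931)
print(p_hat)
```

In the Bayesian model, the same transformation is applied inside the sampler to each draw of p, Se and Sp, so the uncertainty in test accuracy propagates to the prevalence estimate.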

4.1. Estimation of Salmonella enteritidis from pooled samples

We re-created the Bayesian analysis in Cowling et al. (1999) that used data from Kinde et al. (1996). The data came from a California egg-producing ranch. The eggs were pooled into batches of size $k = 20$ and $y = 2$ pools out of $m = 656$ pools tested positive for S. enteritidis by culture. We used the prior distributions provided in Cowling et al. (1999) for the Se (median = 0.70) and Sp (median = 0.999931) of culture. The ranch was known to be producing eggs contaminated with S. enteritidis, so we set $t = t_0 = 1$. The median bird-level prevalence was 0.000198. The analysis resulted in a posterior median for p of 0.0002034 with 95% probability interval (0.000043, 0.000555). These results agree with those obtained by Cowling et al. (1999).

We note that Cowling et al. (1999) did their analysis in Splus (Insightful Corp., Seattle,

Washington) and introduced latent data to facilitate the Gibbs sampler (that is, to have

recognizable full conditional distributions). In WinBUGS, these latent data are not needed.

5. Conclusions

We reviewed Bayesian approaches for prevalence inferences in various settings. Most of

the models were fitted easily using the WinBUGS software. We also presented a new model

that allows for the estimation of the prevalence distribution in addition to estimating herd-

level prevalence in a region. Although not always explicitly mentioned, a sensitivity analysis

was conducted for each analysis to assess the influence of the prior distributions on the

resulting inferences. As a means to reduce the effects of modeling the PD parametrically (e.g.

with a beta distribution), or as part of a sensitivity analysis, a mixture of Dirichlet processes

centered on the family of beta distributions can be used to model the PD. Also, convergence

diagnostics should be used as a routine practice. Our companion paper (Branscum et al.,

2005) details Bayesian models for estimating measures of test accuracy.

Acknowledgements

The study was supported in part by the USDA-CSREES-NRI Competitive Grants

program award number 2001-35204-10874. We thank Dr. Michael Collins and Dr. Sharon

Hietala for providing prior information about test accuracy and prevalence.

References

Antoniak, C.E., 1974. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems.

Ann. Statist. 2, 1152–1174.


Branscum, A.J., Gardner, I.A., Johnson, W.O., 2005. Estimation of diagnostic test sensitivity and specificity

through Bayesian modeling. Prev. Vet. Med., submitted for publication.

Cowling, D.W., Gardner, I.A., Johnson, W.O., 1999. Comparison of methods for estimating individual-level

prevalence based on pooled samples. Prev. Vet. Med. 39, 211–225.

Davies, P.R., Morrow, W.E.M., Deen, J., Gamble, H.R., Patton, S., 1998. Seroprevalence of Toxoplasma gondii in

different production systems in North Carolina, USA. Prev. Vet. Med. 36, 67–76.

Dendukuri, N., Joseph, L., 2001. Bayesian approaches to modeling the conditional dependence between multiple

diagnostic tests. Biometrics 57, 158–167.

Dubey, J.P., Thulliez, P., Weigel, R.M., Andrews, C.D., Lind, P., Powell, E.C., 1995. Sensitivity and specificity of

various serologic tests for detection of Toxoplasma gondii infection in naturally infected sows. Am. J. Vet. Res.

56, 1030–1036.

Dunson, D.B., 2001. Commentary: practical advantages of Bayesian analysis of epidemiologic data. Am. J.

Epidemiol. 12, 1222–1226.

Erkanli, A., Soyer, R., Costello, E., 1999. Bayesian inference for prevalence in longitudinal two-phase studies.

Biometrics 55, 1145–1150.

Gardner, I.A., 2002. The utility of Bayes’ theorem and Bayesian inference in veterinary clinical practice and

research. Aust. Vet. J. 80, 758–761.

Gelman, A., Rubin, D.B., 1992. Inference from iterative simulation using multiple sequences. Stat. Sci. 7, 457–511.

Hanson, T.E., Johnson, W.O., Gardner, I.A., Georgiadis, M.P., 2003a. Determining the infection status of a herd. J.

Agric. Biol. Environ. Statist. 8, 469–485.

Hanson, T.E., Johnson, W.O., Gardner, I.A., 2003b. Hierarchical models for estimating herd prevalence and test

accuracy in the absence of a gold-standard. J. Agric. Biol. Environ. Statist. 8, 223–239.

Johnson, W.O., Su, C.-L., Gardner, I.A., Christensen, R., 2004. Sample size calculations for surveys to substantiate

freedom of populations from infectious agents. Biometrics 60, 165–171.

Joseph, L., Gyorkos, T.W., Coupal, L., 1995. Bayesian estimation of disease prevalence and the parameters of

diagnostic tests in the absence of a gold standard. Am. J. Epidemiol. 141, 263–272.

Kinde, H., Read, D.H., Chin, D.P., Bickford, A.A., Walker, R.L., Ardans, A., Breitmeyer, R.E., Willoughby, D.,

Little, H.E., Kerr, D., Gardner, I.A., 1996. Salmonella enteritidis, phage type 4 infection in a commercial layer

flock in southern California: bacteriologic and epidemiologic findings. Avian Dis. 40, 665–671.

Robert, C.P., Casella, G., 1999. Monte Carlo Statistical Methods. Springer-Verlag, New York.

Spiegelhalter, D., Thomas, A., Best, N., Gilks, W., 1996. BUGS: Bayesian inference using Gibbs sampling, version 0.50. MRC Biostatistics Unit, Cambridge. http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml.

Suess, E.A., Johnson, W.O., Gardner, I.A., 2002. Hierarchical Bayesian model for prevalence inferences and

determination of a country’s status for an animal pathogen. Prev. Vet. Med. 55, 155–171.

Tu, X.M., Kowalski, J., Jia, G., 1999. Bayesian analysis of prevalence with covariates using simulation-based

techniques: applications to HIV screening. Statist. Med. 18, 3059–3073.

