
STATISTICS IN MEDICINE
Statist. Med. 2004; 23:2227–2236 (DOI: 10.1002/sim.1792)

Robust Bayesian prediction of subject disease status and population prevalence using several similar diagnostic tests

Richard B. Evans∗;† and Keith Erlandson

Production Animal Medicine, College of Veterinary Medicine, Iowa State University, Ames, Iowa

SUMMARY

Sometimes several diagnostic tests are performed on the same population of subjects with the aim of assessing the disease status of individuals and the prevalence of the disease in the population, but no test is a reference test. Although the diagnostic tests may have the same biological underpinnings, test results may disagree for some specific animals. In that case, it may be difficult to determine disease status for individual subjects, and consequently population prevalence estimation becomes difficult. In this paper, we propose a robust method of estimating disease status and prevalence that uses heavy-tailed sampling distributions in a hierarchical model to protect inferences against the influence of conflicting observations. If a subject has a test outcome that is discordant with the other test results, then that outcome is downweighted in diagnosing the subject's disease status and in estimating disease prevalence. The amount of downweighting depends on the degree of conflict among the test results for the subject.
Copyright © 2004 John Wiley & Sons, Ltd.

KEY WORDS: heavy tails; Bayes; robust inference; disease prediction; bi-normal model

1. INTRODUCTION

It is common in epidemiological studies to diagnose the disease status of individual subjects and to estimate disease prevalence. Several diagnostic tests are often used in animal health-certification, disease-surveillance and eradication programmes [1–3], and none of them can be considered a definitive reference test. Although the tests are biologically similar, they may give conflicting results for some subjects, which affects inferences about the disease status of the subject concerned and about disease prevalence. McIntosh and Pepe [4] propose a method for combining several screening tests, each lacking sufficient sensitivity and specificity, with the aim of improving screening ability. They show that the risk score (the probability of disease status conditional on biomarkers) is an optimal quantity with which to assess the disease status of a subject.

∗Correspondence to: Richard B. Evans, Iowa State University, College of Veterinary Medicine, Room 1710, Ames, IA 50011, U.S.A.

†E-mail: [email protected]

Received April 2003
Copyright © 2004 John Wiley & Sons, Ltd.
Accepted November 2003

Page 2: Robust Bayesian prediction of subject disease status and population prevalence using several similar diagnostic tests

2228 R. B. EVANS AND K. ERLANDSON

We also use the risk score, but propose a robust model that downweights the effect of test results that are not consistent with the other results. For example, if two test results (using the same or different diagnostic tests) suggest that a subject is disease negative, but a third test result indicates disease positive, then the third result will be downweighted by the model.

The robust model works by combining two pieces of information: the data, and the property that the tests are biologically similar. It is assumed that if the two pieces of information conflict (i.e. there are inconsistent test results for a subject), then the conflict is resolved in favour of test consistency, rather than by rejecting the assumption that the tests are biologically similar and combining the inconsistent test results.

The model is motivated by a data set, presented in Section 4, which describes the analysis of outcomes from three enzyme-linked immunosorbent assay (ELISA) diagnostic tests on a set of 60 swine. We expect these tests to provide similar responses for individual subjects because the tests are biologically similar. However, for some swine the results are inconsistent, with one test suggesting one disease state while the other tests indicate the other state. We note that the discrepant outcomes are distributed among all three diagnostic tests, rather than one test consistently providing conflicting results.

Our robust method uses a model with heavy-tailed distributions. Heavy-tailed distributions are a class of distributions typically having more probability in the tails than normal distributions.

Dawid [5] proved a useful result under certain reasonable regularity conditions. Suppose that data are sampled from a heavy-tailed distribution, such as the Student t, conditional on a population mean parameter, and that the mean parameter has a lighter-tailed prior distribution (e.g. normal). Then conflict between the data and the prior mean is resolved in favour of the prior mean. That is, if the prior mean and the data are far apart, then posterior inference about the population mean will be weighted in favour of the prior, possibly to the exclusion of the data. In contrast, if the sampling distribution and the prior distribution have the same tail weight, then outlying data points could exert considerable influence on inference for the population mean. Dawid [5] also gave the reciprocal result for situations where the prior distribution is heavy tailed and the sampling distribution has lighter tails.

In addition, O’Hagan [2] presented two models that resolve conflict between the data and prior information using heavy-tailed distributions. He describes operating conditions for the heavy-tailed models and demonstrates their outlier-rejection properties with simulations. These models are robust in the sense that inference is not influenced by suspect information (e.g. outliers). Note that we are not necessarily concerned with outcomes that are outlying relative to other outcomes from the same diagnostic test, but rather with conflicting results for an individual subject.

For robust inference about disease status and prevalence we use the heavy-tailed models (and their robustness properties) in a mixture model that is similar to the Bayesian bi-normal model for diagnostic tests. The distribution of outcomes for a diagnostic test under the bi-normal model is a mixture of two normal distributions, one corresponding to the disease positive subjects and the other to the disease negative subjects. The mixing parameter is the prevalence. The bi-normal model has been used to model continuous and latent variable diagnostic tests at least as early as Green and Swets [6]. In a summary paper, Greiner et al. [7] describe the bi-normal model in the context of veterinary diagnostic tests with a known gold standard.
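Dawid's result can be illustrated with a one-parameter toy problem (our own construction, not an example from the paper): a location θ with a N(0, 1) prior and a single observation y, analysed once with a normal likelihood and once with a Student t likelihood on two degrees of freedom, both with unit scale. The sketch below computes the posterior mean of θ by summation over a grid. Under the normal likelihood the posterior mean is y/2 and follows the observation without limit; under the t likelihood it first increases with y and then falls back towards the prior mean as the conflict grows.

```python
import math

def normal_loglik(y, theta):
    # log N(y | theta, 1), up to an additive constant
    return -0.5 * (y - theta) ** 2

def t2_loglik(y, theta):
    # log Student t density (2 d.f., unit scale) of y at location theta, up to a constant
    return -1.5 * math.log1p((y - theta) ** 2 / 2.0)

def grid_posterior_mean(y, loglik, lo=-40.0, hi=40.0, n=40001):
    """Posterior mean of theta under a N(0, 1) prior, by summation over a uniform grid."""
    step = (hi - lo) / (n - 1)
    thetas = [lo + j * step for j in range(n)]
    logw = [-0.5 * th ** 2 + loglik(y, th) for th in thetas]
    top = max(logw)  # subtract the maximum log weight to avoid underflow
    w = [math.exp(lw - top) for lw in logw]
    return sum(th * wi for th, wi in zip(thetas, w)) / sum(w)

if __name__ == "__main__":
    for y in (1.0, 2.0, 5.0, 10.0, 20.0):
        print(f"y={y:5.1f}  normal: {grid_posterior_mean(y, normal_loglik):6.3f}"
              f"  t(2): {grid_posterior_mean(y, t2_loglik):6.3f}")
```

The bounded, eventually vanishing pull of a distant observation under the t likelihood is the property that the bi-Student t model relies on when one test result conflicts with the rest.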

Copyright © 2004 John Wiley & Sons, Ltd. Statist. Med. 2004; 23:2227–2236


ROBUST BAYESIAN PREDICTION USING SEVERAL SIMILAR DIAGNOSTIC TESTS 2229

Henkelman et al. [8] considered the problem of inference for receiver operator characteristic curves of several diagnostic tests with discrete, repeated measures data. They used a multivariate latent variable model, with the latent variable distributed as a mixture of two multivariate normal distributions. The dimension of the multivariate normals corresponds to the number of diagnostic tests. They do not have a gold standard and use maximum likelihood estimation for inference about parameters. Our problem is similar to that of Henkelman et al. [8], but here we consider continuous multivariate outcomes and develop a robust approach to inference. Also, we use a hierarchical model because it is difficult to use a multivariate normal distribution in more than a few dimensions (i.e. with more than a few diagnostic tests), owing to the difficulty of estimating a high-dimensional covariance matrix.

Instead of using a bi-normal likelihood, we replace the normal distributions with Student t distributions and call the result the bi-Student t model. The next stage is to model the biological similarity of the tests using a normal hierarchical model. This combination of a heavy-tailed likelihood and a lighter-tailed prior permits the automatic downweighting of conflicting results in the same fashion as described in Reference [5].

The layout of the paper is as follows. Section 2 describes the robust hierarchical model, Section 3 describes the results of constructed examples that illuminate the properties of the robust model, Section 4 presents the application of the model to a real example, and Section 5 presents the discussion.

2. THE ROBUST MODEL

There are two sources of correlation that require modelling. First, the tests are correlated, since they have the same biological underpinnings, and we model this correlation using a hierarchical model. Second, these are repeated measures data, and normally distributed random effects $\alpha_i$ ($i = 1, \dots, N$) are used to account for within-subject correlation [9].

We begin by describing the likelihood, which is conditional on the (unknown) true disease status of the subjects. For a single diagnostic test, the distribution of test outcomes is a mixture of two Student t distributions, one for non-diseased and the other for diseased subjects, which is analogous to the bi-normal model. Let $y_{it}$, $i = 1, \dots, N$, $t = 1, \dots, T$, represent the outcome for subject $i$ using test $t$, and let $d_i$ be a binary variable representing the disease status of subject $i$ ($d_i = 0$ for non-diseased and $d_i = 1$ for diseased). The $d_i$ are unknown, but the first stage of the model is conditional on disease status, $d_i = k$, $k = 0, 1$, namely,

$$y_{it} = \mu_{kt} + \alpha_i + \varepsilon_{it}, \qquad i = 1, \dots, N;\; t = 1, \dots, T;\; d_i = k \qquad (1)$$

where the $\varepsilon_{it}$ are independently distributed as Student t distributions with location 0, scale term $\sigma^2_{kt}$ and two degrees of freedom. Selecting two degrees of freedom provides greater protection against conflicting outcomes than would selecting more than two degrees of freedom. Alternatively, the degrees of freedom could be treated as a nuisance parameter with a prior distribution and integrated out of the model [1]. The random effects terms are assumed to be distributed as

$$\alpha_i \overset{\text{ind}}{\sim} N(0, \omega^2) \qquad (2)$$
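The choice of two degrees of freedom in (1) matters because the Student t on 2 degrees of freedom has much heavier tails than the normal (its variance is infinite). Its distribution function has the closed form F(x) = 1/2 + x / (2√(2 + x²)), so two-sided tail probabilities can be compared directly with those of the normal. The short sketch below does this for a unit-scale error; note that the scale term in (1) is a scale, not a variance. The function names are our own.

```python
import math

def t2_two_sided_tail(x, scale=1.0):
    """P(|e| > x) for e ~ Student t with 2 d.f., location 0 and the given scale (x >= 0).
    Uses the closed-form CDF F(z) = 1/2 + z / (2 * sqrt(2 + z^2))."""
    z = x / scale
    return 1.0 - z / math.sqrt(2.0 + z * z)

def normal_two_sided_tail(x, sd=1.0):
    """P(|e| > x) for e ~ N(0, sd^2), for x >= 0."""
    return math.erfc(x / (sd * math.sqrt(2.0)))

if __name__ == "__main__":
    for x in (1.0, 2.0, 3.0, 5.0):
        print(f"x={x}: t(2) tail={t2_two_sided_tail(x):.4f}  "
              f"normal tail={normal_two_sided_tail(x):.6f}")
```

At three scale units the t(2) tail probability is roughly 0.095, against about 0.0027 for the normal, so a result that far from its component mean is much less surprising under (1) than under a bi-normal model.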

Markov chains are used for inference, and when the $d_i$ are unknown the chains corresponding to the $\alpha_i$ and the $\mu_{kt}$ may not converge quickly. For example, suppose that a subject has data consistent with disease positive status (e.g. the outcomes for that subject are all relatively large). At any iteration in the chain both disease states may be consistent with the data: $\mu_{0t}$ with a large random effect, or $\mu_{1t}$ with a small random effect. The result is that the chains may not stabilize easily. One solution is to limit the size of the random effect, by restricting $\omega^2$, so that a subject cannot 'flip' to the other disease state.

Each Student t distribution has its own scale term $\sigma^2_{kt}$, and we use the prior distributions described on the BUGS project web FAQ page [10] on fitting mixture models, as they gave the most stable results; namely, we reparameterize as

$$\log(\sigma^{-2}_{0t}) = \frac{\gamma_t + \delta_t}{2}, \qquad \log(\sigma^{-2}_{1t}) = \frac{\gamma_t - \delta_t}{2} \qquad (3)$$

and

$$\gamma_t, \delta_t \overset{\text{ind}}{\sim} \text{uniform}(a_t, b_t), \qquad t = 1, \dots, T \qquad (4)$$

Finally,

$$\log(\omega^{-2}) \sim \text{uniform}(a, b) \qquad (5)$$
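To see what (3)–(5) imply, the sketch below maps the two uniform parameters of (4), labelled γ_t and δ_t here (an assumption about notation), to the two scale terms of test t, reading (3) as log σ0t^−2 = (γ_t + δ_t)/2 and log σ1t^−2 = (γ_t − δ_t)/2. It also reports the ranges of the scales and of the random-effect variance in (2) and (5) allowed by the hyperparameter values a_t = −3, b_t = 3, a = 3 and b = 5 that are used in Sections 3 and 4. The function names are hypothetical.

```python
import math

def scales_from_reparam(gamma_t, delta_t):
    """Scale terms (sigma2_0t, sigma2_1t) implied by the reparameterization (3):
    log(1/sigma2_0t) = (gamma_t + delta_t)/2, log(1/sigma2_1t) = (gamma_t - delta_t)/2."""
    return math.exp(-(gamma_t + delta_t) / 2.0), math.exp(-(gamma_t - delta_t) / 2.0)

def scale_ranges(a_t, b_t):
    """Ranges of sigma2_0t and sigma2_1t when gamma_t, delta_t ~ uniform(a_t, b_t)."""
    # (gamma + delta)/2 lies in (a_t, b_t); (gamma - delta)/2 lies in (-half, half)
    half = (b_t - a_t) / 2.0
    return (math.exp(-b_t), math.exp(-a_t)), (math.exp(-half), math.exp(half))

def random_effect_variance_range(a, b):
    """Range of the random-effect variance when log(1/variance) ~ uniform(a, b), as in (5)."""
    return math.exp(-b), math.exp(-a)

if __name__ == "__main__":
    print(scale_ranges(-3.0, 3.0))
    lo, hi = random_effect_variance_range(3.0, 5.0)
    print(f"random-effect sd between {math.sqrt(lo):.3f} and {math.sqrt(hi):.3f}")
```

With a_t = −3 and b_t = 3 both scales range from about 0.05 to 20, which is wide for standardized data; with a = 3 and b = 5 the random-effect standard deviation is confined to roughly 0.08 to 0.22, in keeping with the aim of restricting the random effect so that subjects cannot 'flip' disease state.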

For consistency with the FAQ page, the priors are placed on the precisions.

There are several approaches for determining hyperparameters. First, hyperparameter values may be constructed from information about the parameters provided by subject matter experts. For example, the expert may provide the endpoints of a set that he or she is 95 per cent confident contains the expected score for the disease negative subjects on a given test. Treating the set like a symmetric 95 per cent confidence interval provides the elicited standard deviation (one-fourth the length of the interval), that is, the expert's prior expectation for $\sigma_{0t}$. Then $a_t$ and $b_t$ are chosen so that the mean of the prior distribution corresponds to the elicited standard deviation. Second, if no expert information exists, hyperparameters can be selected to make the prior proper but essentially non-informative about the parameters; their influence on inference can be checked by sensitivity analysis. Finally, the data may be used to determine hyperparameter values using empirical Bayes techniques.

Next, we describe the hierarchical prior distribution that accounts for test correlation:

$$\mu_{kt} \mid \eta_k, \tau^2_k \overset{\text{ind}}{\sim} N(\eta_k, \tau^2_k), \qquad k = 0, 1;\; t = 1, \dots, T \qquad (6)$$

Let

$$\eta_k \overset{\text{ind}}{\sim} \text{locally uniform}, \qquad k = 0, 1 \qquad (7)$$

By modelling the $\mu_{kt}$, $k = 0, 1$, with common normal distributions, we express the prior belief that the diagnostic tests are positively correlated when applied to diseased subjects, and that they are also correlated when applied to non-diseased subjects.

Note that the likelihood has heavier tails than the prior distribution, so that conflicts between the prior distribution and the data are resolved in favour of the prior. Finally, the $\tau^2_k$ have proper, non-informative prior distributions given by inverse gamma distributions with a mean of 1 and a variance that is very large relative to the data.
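The expert-elicitation route for the hyperparameters, described earlier in this section, reduces to simple arithmetic. The sketch below is one possible reading, assuming that matching "the mean of the prior distribution" to the elicited standard deviation is done on the log-precision scale of (3): because both parameters in (4) are uniform(a_t, b_t), the prior mean of log σ0t^−2 is (a_t + b_t)/2, so the interval is centred at −2 log(sd). The half-width of the interval is a free choice, and the function names and example interval are hypothetical.

```python
import math

def elicited_sd(lower, upper):
    """Treat (lower, upper) as a symmetric 95% interval, so the sd is about width / 4."""
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    return (upper - lower) / 4.0

def uniform_bounds_for_sd(sd, half_width):
    """Choose (a_t, b_t) so that the prior mean of log(1/sigma2_0t), which is
    (a_t + b_t)/2 under (3) and (4), equals log(1/sd^2) = -2*log(sd)."""
    centre = -2.0 * math.log(sd)
    return centre - half_width, centre + half_width

if __name__ == "__main__":
    # Hypothetical expert interval on the standardized scale for one test.
    sd = elicited_sd(-1.6, 0.4)
    print(sd, uniform_bounds_for_sd(sd, half_width=2.0))
```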


In order to use (1)–(7), the outcomes for each test must be on the same scale. This is accomplished by standardizing each test outcome by subtracting the sample mean and dividing by the sample standard deviation. Thus, the data for each test will have sample mean zero and standard deviation one. Linear transformations do not change the accuracy properties of diagnostic tests. Also, for some tests larger outcomes may correspond to positive subjects, while the reverse holds for other tests. This reversal can be rectified by multiplying the corresponding test outcomes by negative one.

If we assume that larger test scores are indicative of positive status, then we include the order restriction $\mu_{1t} > \mu_{0t}$ in the model. The restriction may improve convergence of the Markov chains in the case of diagnostic tests that do a poor job of distinguishing disease positive and disease negative subjects, that is, when $\mu_{1t}$ and $\mu_{0t}$ are not sufficiently far apart relative to the scales of the bi-Student t distribution.

The model thus far has been conditional on disease status. We model the uncertainty about disease status with

$$d_i \mid \pi \overset{\text{ind}}{\sim} \text{Bernoulli}(\pi) \qquad (8)$$

and

$$\pi \sim \text{uniform}(0, 1) \qquad (9)$$
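Because the uniform(0, 1) prior in (9) is the Beta(1, 1) distribution and (8) is Bernoulli, the prior on the prevalence is conjugate. Given the disease indicators, the prevalence does not depend on the outcomes $y$, and its full conditional distribution, the standard Beta–Bernoulli update that a Gibbs sampler can use, is

$$\pi \mid d_1, \dots, d_N \sim \text{Beta}\Big(1 + \sum_{i=1}^{N} d_i,\; 1 + N - \sum_{i=1}^{N} d_i\Big), \qquad E(\pi \mid d_1, \dots, d_N) = \frac{1 + \sum_{i=1}^{N} d_i}{N + 2}$$

For example, with $N = 60$ and 30 positive indicators the conditional posterior mean is $31/62 = 0.5$; averaging over posterior draws of the $d_i$ gives the marginal posterior of $\pi$.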

The primary objects of inference are the disease status, $d_i$, and the disease prevalence, $\pi$. Inference for $\pi$ will be from the posterior distribution of $\pi$, which is estimated using Markov chain Monte Carlo (MCMC) methods.

For inference about the $d_i$, we will use the predictive probability $\Pr(d_i = 1 \mid y)$, where $y$ is the $N \times T$ ($N$ subjects and $T$ tests) matrix of data. Because $d_i$ is binary, this quantity determines the posterior distribution of $d_i$. Values of $\Pr(d_i = 1 \mid y)$ near zero suggest that subject $i$ is disease negative, values near 0.5 indicate uncertain disease status, and values near 1 suggest that subject $i$ is disease positive. In order to classify subjects as diseased or not, $\Pr(d_i = 1 \mid y)$ will need to be dichotomized with a cut-off. Greiner et al. [7] provide a summary of methods for selecting cut-offs for continuous diagnostic tests.
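Conditional on every model parameter, Bayes' rule gives the probability that a subject is diseased: the prevalence times the product of the disease-positive test densities, divided by that quantity plus one minus the prevalence times the product of the disease-negative densities. The paper's $\Pr(d_i = 1 \mid y)$ averages this over the posterior by MCMC; the sketch below only evaluates it at fixed, hypothetical parameter values (component means ±0.6, a common scale of 0.7, prevalence 0.5, loosely based on the standardized summaries reported in Section 3) and sets the random effect to zero. It compares normal and Student t (2 d.f.) components on the outcomes of subjects 60 and 1 from Table I and applies a 0.5 cut-off. Its numbers are therefore not the paper's.

```python
import math

def log_t_pdf(y, loc, scale, df=2.0):
    """Log density of a Student t with df degrees of freedom, location loc and scale scale."""
    z = (y - loc) / scale
    return (math.lgamma((df + 1.0) / 2.0) - math.lgamma(df / 2.0)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1.0) / 2.0 * math.log1p(z * z / df))

def log_normal_pdf(y, loc, scale):
    """Log density of a normal with mean loc and standard deviation scale."""
    z = (y - loc) / scale
    return -0.5 * math.log(2.0 * math.pi) - math.log(scale) - 0.5 * z * z

def risk_score(ys, mu0, mu1, s0, s1, prev, log_pdf, alpha=0.0):
    """Pr(d = 1 | ys, parameters) by Bayes' rule; tests independent given d and alpha."""
    l1 = math.log(prev) + sum(log_pdf(y - alpha, m, s) for y, m, s in zip(ys, mu1, s1))
    l0 = math.log1p(-prev) + sum(log_pdf(y - alpha, m, s) for y, m, s in zip(ys, mu0, s0))
    diff = l0 - l1
    # 1 / (1 + exp(diff)), arranged so that exp never overflows
    if diff > 0:
        e = math.exp(-diff)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(diff))

def classify(p, cutoff=0.5):
    """Dichotomize a risk score: 1 = classified disease positive."""
    return 1 if p >= cutoff else 0

# Hypothetical plug-in values, loosely based on the standardized summaries in Section 3.
MU0, MU1, SCALE, PREV = [-0.6] * 3, [0.6] * 3, [0.7] * 3, 0.5
SUBJECT_60 = [-2.54, -2.07, 0.6]   # outcomes of subject 60 in Table I
SUBJECT_1 = [-0.6, 2.40, 1.73]     # outcomes of subject 1 in Table I

if __name__ == "__main__":
    for name, f in (("bi-normal", log_normal_pdf), ("bi-Student t", log_t_pdf)):
        for label, ys in (("subject 60", SUBJECT_60), ("subject 1", SUBJECT_1)):
            p = risk_score(ys, MU0, MU1, SCALE, SCALE, PREV, f)
            print(f"{name:13s} {label}: p={p:.3f} class={classify(p)}")
```

With these made-up parameter values the normal-component risk score for subject 60 is essentially zero, whereas the t-component value stays well away from zero because the two extreme negative results are themselves treated as improbable; this mirrors, qualitatively, the behaviour of subject 60 discussed in Section 3.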

3. CONSTRUCTED EXAMPLES

In order to explore the robustness properties and operating characteristics of the bi-Student t hierarchical model, we predicted disease status and estimated prevalence for sets of hypothetical subjects generated from a model with known disease status for each individual. The sample size and number of diagnostic tests of the constructed examples were chosen to match a real example (presented in Section 4). The constructed examples have 60 subjects with outcomes generated from three bi-normal random effects models (the random effects accounting for within-subject correlation), representing three diagnostic tests evaluated on the same subjects. We expect disease positive subjects to have positive values and disease negative subjects to have negative values; however, there is substantial overlap of the mixture distributions, so these hypothetical tests do not separate disease positive and disease negative subjects very well.

The two constructed examples correspond to two disease prevalences (20 and 50 per cent); in both, 10 per cent of the subjects are simulated to have a conflicting observation for one of the three tests.
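A data set of this kind might be simulated as follows. This is a minimal sketch under stated assumptions, not the authors' generator: it works directly on roughly the standardized scale (component means ±0.6 and standard deviation 0.7, taken from the summaries reported in the next paragraph) instead of standardizing afterwards, the random-effect standard deviation of 0.1 is our assumption, and the conflicting subjects are drawn at random (half positive, half negative) rather than deliberately matched to strongly, moderately and weakly positive or negative subjects. Each conflicting subject has one outcome replaced by the mean of the other disease state, following the rule described in the next paragraph.

```python
import random

def simulate_constructed_example(n=60, n_tests=3, prevalence=0.5, conflict_frac=0.1,
                                 mean_pos=0.6, mean_neg=-0.6, sd=0.7, re_sd=0.1, seed=1):
    """Simulate bi-normal random-effects test outcomes with some conflicting results.

    All default values are illustrative assumptions, not the paper's settings.
    Returns (disease status list, outcome rows, indices of conflicting subjects)."""
    rng = random.Random(seed)
    n_pos = round(n * prevalence)
    d = [1] * n_pos + [0] * (n - n_pos)
    y = []
    for di in d:
        alpha = rng.gauss(0.0, re_sd)                 # subject-level random effect
        mean = mean_pos if di == 1 else mean_neg
        y.append([mean + alpha + rng.gauss(0.0, sd) for _ in range(n_tests)])
    # Split the conflicting subjects evenly between positives and negatives.
    n_conf = round(n * conflict_frac)
    pos_idx = [i for i in range(n) if d[i] == 1]
    neg_idx = [i for i in range(n) if d[i] == 0]
    conflicted = rng.sample(pos_idx, n_conf // 2) + rng.sample(neg_idx, n_conf - n_conf // 2)
    for j, i in enumerate(conflicted):
        t = j % n_tests                               # spread conflicts over all tests
        # Replace one outcome with the mean of the other disease state.
        y[i][t] = mean_neg if d[i] == 1 else mean_pos
    return d, y, conflicted

if __name__ == "__main__":
    d, y, conflicted = simulate_constructed_example(prevalence=0.2)
    for i in conflicted:
        print(i, d[i], [round(v, 2) for v in y[i]])
```

With prevalence 0.2 this yields 12 positive subjects and six conflicting subjects, three positive and three negative, the same split as the second constructed example.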


After standardizing the simulated data, the positive subjects for each diagnostic test have a mean of roughly 0.6 and a standard deviation of about 0.7. The negative subjects for each test have a mean of roughly −0.6 and a standard deviation of about 0.7.

Values are changed to create conflicting observations in the following way: some disease negative subjects get a value (for one test) consistent with disease positive status, and some disease positive subjects get a value consistent with disease negative status. Conflicting observations are generated by replacing a subject's outcome with approximately the mean of the normal distribution corresponding to the other disease state. The conflicting observations were spread over all three tests and distributed roughly evenly among the positive and negative subjects, so that a conflicting observation was combined with strongly positive (negative), moderately positive (negative) and weakly positive (negative) subjects.

The examples are analysed with the robust model (1)–(9) and with a bi-normal hierarchical model, which is analogous to the bi-Student t model except that the $\varepsilon_{it}$ in (1) are independently distributed as $N(0, \sigma^2_{kt})$. All other parts of the model are the same. The goals are to show that the robust model provides sensible inferences for $\pi$ and the $d_i$, and that these inferences are preferable to those of the bi-normal model.

The hierarchical model has several hyperparameters that must be specified. There is no expert information to aid in hyperparameter selection, so they were chosen to produce priors that have little influence on the posterior distributions. The hyperparameters in (4) and (5) are assigned the values $a_t = -3$, $b_t = 3$, $a = 3$ and $b = 5$. These values were chosen to give a wide range of values, and manipulating them (e.g. using $a_t = -2$) did not change inferences.

The posterior distributions of the $d_i$ and $\pi$ are estimated using Markov chain Monte Carlo methods and the WinBUGS software [10]. Markov chains were generated for all the parameters using a 'burn-in' of 5000 iterations, then collecting every 10th iteration (to reduce autocorrelation) for 50 000 more iterations. Convergence for all chains was assessed using Geweke's test, which compares the first part of each chain (after burn-in) to the last part of the chain. The test detected no statistical difference, suggesting that the chains had converged to the target distribution. Also, several different sets of initial values were used, and there was no practical difference in the results.

Table I gives the results for a descriptive subset of subjects for the first example (50 per cent prevalence and 10 per cent conflicting observations). The first four subjects do not have conflicting values and the last six do. The first two subjects are disease positive, the third is disease negative but has marginal outcomes, and the fourth is disease negative, as the data indicate. As expected, both models performed sensibly on these subjects. There is some discrepancy for subject 36, for whom the robust model is undecided ($\Pr(d_i = 1 \mid y; \text{robust}) = 0.56$), but $\Pr(d_i = 1 \mid y; \text{bi-normal}) = 0.17$, which is firmly disease negative. Note that for the remaining 50 subjects without conflicting outcomes, $\Pr(d_i = 1 \mid y; \text{robust})$ is generally consistent with the outcomes; for example, if a subject is disease positive but has all negative values, it has a smaller posterior probability of disease than other disease positive subjects.

The subjects (1, 15, 24, 38, 48, 60) with conflicting values are listed from most positive to most negative test results (for the non-conflicting data). The first three subjects are disease positive and the remaining three are disease negative, which means that $\Pr(d_i = 1 \mid y)$ should be higher for the first three subjects and lower for the last three. Using $\Pr(d_i = 1 \mid y) = 0.5$ as a cut-off, the robust model failed for subjects 24 and 38 but predicted correctly for the other four subjects. The bi-normal model was influenced by the conflicting data and predicted all six subjects to be disease negative.
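Geweke's diagnostic, used above to check convergence, compares the mean of an early segment of a chain with that of a late segment. The sketch below is a simplified version: it takes the first 10 per cent and the last 50 per cent of the stored draws (common defaults, assumed here because the segments used in the paper are not stated) and, unlike Geweke's original, estimates the variance of each segment mean as if the draws were independent. That is only a rough approximation, most defensible for well-thinned chains such as 50 000 iterations kept every 10th; a packaged implementation using spectral variance estimates is preferable in practice. The chains in the usage example are synthetic.

```python
import math
import random
import statistics

def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke z-score comparing the mean of the first 10% of a chain with
    the mean of the last 50%. Uses naive (independent-draws) variances of the means;
    Geweke's diagnostic proper uses spectral density estimates at frequency zero."""
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[n - int(last * n):]
    var_mean_a = statistics.variance(a) / len(a)
    var_mean_b = statistics.variance(b) / len(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(var_mean_a + var_mean_b)

if __name__ == "__main__":
    rng = random.Random(0)
    # 5000 stored draws, e.g. 50 000 iterations thinned by 10 (synthetic chains)
    stationary = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    drifting = [i / 1000.0 + rng.gauss(0.0, 1.0) for i in range(5000)]
    print(f"stationary: {geweke_z(stationary):.2f}  drifting: {geweke_z(drifting):.2f}")
```

A |z| well above 2 for the drifting chain flags that its early and late segments have different means, which is the failure the diagnostic is designed to catch.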


Table I. Effect of conflicting observations on Pr(d_i = 1 | y) for 50 per cent prevalence and 10 per cent conflicting observations. The first four rows contain results for subjects with consistent test scores and the last six rows correspond to the subjects with conflicting observations. Status is the true (simulated) disease status. The average positive subject test scores are 0.54, 0.64 and 0.73 for the three tests, and the average test scores for the negative subjects are close to the negative of these scores. The standard deviation for the positive and negative subjects for each test is about 0.73.

                                         Pr(d_i = 1 | y)
Subject  Status  Test 1  Test 2  Test 3  Bi-normal  Robust
10       1       1.03    1.13    1.04    0.99       0.99
20       1       0.61    0.53    0.51    0.98       0.99
36       0       −0.08   0.08    0.05    0.17       0.57
47       0       −0.73   −0.89   −0.92   0.0        0.0

1        1       −0.6    2.40    1.73    0.003      0.645
15       1       0.87    −0.6    0.85    0.005      0.888
24       1       0.25    0.05    −0.6    0          0.262
38       0       0.6     −0.01   −0.23   0.101      0.739
48       0       −0.76   0.6     −1.07   0          0.072
60       0       −2.54   −2.07   0.6     0          0.231

The estimated prevalence was 0.59 for the bi-normal model but 0.49 for the robust model, which was closer to the true value of 0.5. Posterior standard deviations for prevalence were about the same, 0.07 for each model.

Table I also illustrates an interesting property of the robust model. Subject 60 has two strongly negative outcomes and one positive outcome, so one would expect the positive outcome to be rejected and $\Pr(d_i = 1 \mid y) \approx 0$. But the negative values are in the far tails of the disease negative distributions (recall that the means are about −0.6 and the standard deviations about 0.7), so the negative values are themselves outliers and are downweighted, giving $\Pr(d_i = 1 \mid y) = 0.231$. In other words, the conflicting value (0.6) appears more reasonable to the model than do the two extreme values, so the extreme values are downweighted. However, most cut-offs would still classify subject 60 as negative.

Table II presents a subset of the results for the second example (20 per cent prevalence and 10 per cent conflicting observations). In this constructed data set, there were three conflicting observations among the 12 positive subjects and three among the 48 negative subjects. Thus, there are fewer reliable data for the disease positive part of the bi-Student t. The subjects are listed from 'most positive' to 'most negative' test results. The first three subjects are disease positive and the remaining three are disease negative. The robust model gives posterior probabilities that are consistent with downweighting the conflicting outcome. The bi-normal model failed for the disease negative subjects, with the conflicting outcome exerting considerable influence. As with the first example, $\Pr(d_i = 1 \mid y; \text{robust})$ is consistent with the data for the remaining 54 subjects.

The estimated prevalences were 0.46 for the bi-normal model and 0.36 for the robust model, both well above the true value of 0.2. However, inspection of the data suggests that the values of $\Pr(d_i = 1 \mid y)$ are consistent with the data, and that the models misclassify subjects in the overlapping tails of the true model. For example, disease negative subjects in the tail of the disease positive distribution are sensibly classified as disease positive. Also, the prior distribution for $\pi$ is flat on the range 0–1, and additional information about disease prevalence would provide better inferences for $\pi$.

4. ELISA EXAMPLE

In this section, we demonstrate the robust methodology using a real example of 60 market swine that were tested for Mycoplasma hyopneumoniae using three ELISA diagnostic tests. M. hyopneumoniae is a small, endemic bacterium that causes pneumonia and potentiates other respiratory diseases in swine. Two of the tests, the IDEXX (IDEXX Labs, Inc.) and the DAKO (Dako Corp.), are commercial products, and the third is the TWEEN 20 [11]. The outcomes are continuous measures representing the amount of antibody present in the serum, so we expect the tests to give about the same results.

For the IDEXX and TWEEN 20, larger values represent positive subjects, but the reverse is true for the DAKO. Also, the tests are not on the same scale, so they must be transformed to a common scale. First, the DAKO values were multiplied by negative one, and then the tests were standardized.

The hyperparameters in (4) and (5) are assigned values in the same fashion as in Section 3, namely $a_t = -3$, $b_t = 3$, $a = 3$ and $b = 5$.

The posterior distributions of the $d_i$ and $\pi$ are approximated using Markov chain Monte Carlo methods and the WinBUGS software [10]. Markov chains were generated, and convergence was assessed, as described in Section 3.

Table III is a selection of the transformed ELISA data and the posterior probability of disease status, $\Pr(d_i = 1 \mid y)$, for the bi-Student t model and, for comparison, the bi-normal model. Large positive data values are regarded as corresponding to disease positive subjects. The subjects with conflicting data have different predictive probabilities under the two models. The bi-normal model gave large probabilities to all of these subjects, but the robust model was less decisive with the conflicting results, with most values closer to 0.5 than to 0 or 1. Using a cut-off of 0.5, the models predicted differently for 10 subjects (some are reported in Table III).

The bi-normal model estimated the prevalence as 0.61, with a credible interval of (0.46, 0.74), and the bi-Student t estimate is 0.48 (0.35, 0.69). Experts familiar with this example believe that about half the animals are disease positive.
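The preprocessing applied to the ELISA data above (reversing the sign of the DAKO outcomes, then standardizing every test, as described in Section 2) is a per-test linear transformation. A minimal sketch, with a hypothetical function name and assuming the n − 1 sample standard deviation, since the paper does not say which denominator was used:

```python
import statistics

def orient_and_standardize(values, larger_is_positive=True):
    """Put one test on the common scale used by (1)-(7): flip the sign when small
    values indicate disease, then subtract the sample mean and divide by the
    sample standard deviation (n - 1 denominator assumed)."""
    oriented = list(values) if larger_is_positive else [-v for v in values]
    mean = statistics.fmean(oriented)
    sd = statistics.stdev(oriented)
    return [(v - mean) / sd for v in oriented]

if __name__ == "__main__":
    # A test on which small values indicate disease, as for the DAKO.
    print(orient_and_standardize([1.0, 2.0, 3.0], larger_is_positive=False))  # -> [1.0, 0.0, -1.0]
```

Because negation leaves the standard deviation unchanged, flipping before or after standardizing gives the same result; flipping first simply keeps the rule "larger means more likely positive" true at every step.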

Table II. Effect of conflicting observations on Pr(d_i = 1 | y) for 20 per cent prevalence and 10 per cent conflicting observations. Status is the true (simulated) disease status. The average positive subject test scores are 0.75, 0.7 and 0.7 for the three tests, and the average test scores for the negative subjects are approximately minus these. The standard deviations are about 0.8 for each test.

                                         Pr(d_i = 1 | y)
Subject  Status  Test 1  Test 2  Test 3  Bi-normal  Robust
1        1       −0.6    2.47    2.83    1          0.947
6        1       1.71    −0.6    1.42    1          0.985
12       1       0.93    0.66    −0.6    1          0.795
29       0       0.6     −0.04   −0.07   0.988      0.055
48       0       −0.8    0.6     −0.85   0.903      0.068
60       0       −1.98   −1.82   0.6     1          0.159


Table III. Selected results for the ELISA data. This table contains the standardized test outcomes (DAKO sign-reversed) and Pr(d_i = 1 | y) under the bi-normal model and the robust model.

                                   Pr(d_i = 1 | y)
Subject  TWEEN 20  IDEXX   DAKO    Bi-normal  Robust
16       0.91      −0.6    −0.77   0.989      0.307
21       2.31      −0.42   −0.33   0.999      0.459
24       −0.67     −0.95   2.92    0.904      0.1
25       −0.49     −0.24   1.49    0.896      0.436
44       1.51      −0.56   −0.44   0.999      0.414

Figure 1. Histograms of the transformed ELISA test data (panels: TWEEN 20, IDEXX, DAKO).

Since we do not know the true disease status of the swine, it is not possible to verify the bi-Student t distributional assumption. Moreover, the histograms of the standardized data in Figure 1 provide little information about the bi-modality of the distributions. Extensive simulations (not reported here) of the bi-normal and bi-Student t models have shown that, unless the mixture components are well separated, histograms of virtually every shape are possible.

A Bayes factor was used to compare the relative fit of the bi-normal and the bi-Student t models, and they had similar fit (Bayes factor equal to 1.03). This is not surprising, because a normal distribution with a large variance is an approximation to a t distribution. Both models were also fitted to natural log transformed data, but the results clearly indicated that the transformation was inappropriate.

5. DISCUSSION

This article proposes a robust method for prediction of disease status and estimation of disease prevalence. The key feature is the relationship between the likelihood (which contains the information in the data) and the prior distribution (which contains information about the biological similarity of the diagnostic tests). When the data disagree with the prior distribution (e.g. when the results for an individual conflict among tests, as for the last six subjects in Table I), the disagreement is resolved in favour of the prior distribution, and the spurious test results are downweighted.

If the tests are not biologically similar (e.g. if they are conditionally independent), then there may be many more discordant observations. While the robust model may still be used, these observations cannot be considered outliers, but rather valid results measuring different biological objects. In that case, $\tau^2_k$ in (6) would be large enough to account for the disparity in the tests, and the rejection properties of the hierarchical structure would disappear. There would be no 'outliers', and the predictive probabilities would not conclusively predict disease status.

The robust method may be applied to any problem with more than one diagnostic test. Sometimes there is no clear outlier; for example, one positive, one negative and one uncertain result, or two positive and two negative results. These uncertainties are reflected in $\Pr(d_i = 1 \mid y)$, which will be around 0.5, a conservative estimate for disease status. If these patterns persist in the data, then they should be incorporated into the model.

ACKNOWLEDGEMENTS

The authors thank the reviewers for their useful suggestions and comments.

REFERENCES

1. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. Chapman & Hall: New York, 1995.
2. O’Hagan A. Modelling with heavy tails. Bayesian Statistics 1988; 3:345–359.
3. Gardner I, Stryhn H, Lind P, Collins M. Conditional dependence between tests affects the diagnosis and surveillance of animal diseases. Preventive Veterinary Medicine 2000; 45:107–122.
4. McIntosh MW, Pepe MS. Combining several screening tests: optimality of the risk score. Biometrics 2002; 58(3):657–664.
5. Dawid AP. Posterior expectations for large observations. Biometrika 1973; 60:664–667.
6. Green DM, Swets JA. Signal Detection Theory and Psychophysics. Wiley, Inc.: New York, 1966.
7. Greiner M, Pfeiffer D, Smith RD. Principles and practical application of the receiver-operating characteristic analysis for diagnostic tests. Preventive Veterinary Medicine 2000; 45:23–41.
8. Henkelman M, Kay I, Bronskill MJ. Receiver operator characteristic analysis without truth. Medical Decision Making 1990; 10(1):24–29.
9. Qu Y, Tan M, Kutner M. Random effects models in latent class analysis for evaluating accuracy of diagnostic tests. Biometrics 1996; 52:797–810.
10. Spiegelhalter D, Thomas A, Best N, Gilks W. WinBUGS Version 1.3. MRC Biostatistics Unit, 1995.
11. Bereiter M, Young TF, Joo HS, Ross RF. Evaluation of the ELISA and comparison to the complement fixation test and radial immunodiffusion enzyme assay for detection of antibodies against Mycoplasma hyopneumoniae in swine serum. Veterinary Microbiology 1990; 25:177–192.
