Bayes in Biological Anthropology

Lyle W. Konigsberg* and Susan R. Frankenberg

Department of Anthropology, University of Illinois at Urbana-Champaign, Urbana, IL 61801

KEY WORDS Bayesian statistics; MCMC; OpenBUGS

ABSTRACT In this article, we both contend and illustrate that biological anthropologists, particularly in the Americas, often think like Bayesians but act like frequentists when it comes to analyzing a wide variety of data. In other words, while our research goals and perspectives are rooted in probabilistic thinking and rest on prior knowledge, we often proceed to use statistical hypothesis tests and confidence interval methods unrelated (or tenuously related) to the research questions of interest. We advocate for applying Bayesian analyses to a number of different bioanthropological questions, especially since many of the programming and computational challenges to doing so have been overcome in the past two decades. To facilitate such applications, this article explains Bayesian principles and concepts, and provides concrete examples of Bayesian computer simulations and statistics that address questions relevant to biological anthropology, focusing particularly on bioarchaeology and forensic anthropology. It also simultaneously reviews the use of Bayesian methods and inference within the discipline to date. This article is intended to act as a primer to Bayesian methods and inference in biological anthropology, explaining the relationships of various methods to likelihoods or probabilities and to classical statistical models. Our contention is not that traditional frequentist statistics should be rejected outright, but that there are many situations where biological anthropology is better served by taking a Bayesian approach. To this end it is hoped that the examples provided in this article will assist researchers in choosing from among the broad array of statistical methods currently available. Am J Phys Anthropol 57:153–184, 2013. © 2013 Wiley Periodicals, Inc.

“The Bayesian approach can have a clarifying effect on one’s thinking about evidence.” (Koehler and Saks, 1991, p 364)

Traditional training in statistical methods for those who go on to become practicing biological anthropologists has focused primarily on classical hypothesis testing. This is apparent in both textbooks geared toward anthropologists in general (Thomas, 1986; Madrigal, 1998; Bernard, 2011) and specialized texts for biological anthropologists (Slice, 2005; D’Août and Vereecke, 2011). While Bayes’ Theorem may be mentioned in passing in introductory statistics courses, this is typically restricted to examples of such limited interest that the student has little motivation to recall the theorem, and even less motivation to assume that there may be future value in having learned about Bayes’ Theorem. In Bayesian terms, the prior probability that the student will retain Bayes’ Theorem is quite low. In contrast, the student and eventual practitioner is likely to learn about confidence intervals, Type I and Type II errors in hypothesis testing, and P-values, and to blithely assume that what they have learned represents the near totality of what is available and useful within modern statistical practice. This represents an unfortunate omission of Bayesian methods and inference.

Bayesian methods and inference are particularly helpful for creating estimates and uncertainties about those estimates without asymptotic approximation, and for incorporating prior information with data to generate problem-specific distributions in a systematic and logical way. Such methods obey the likelihood principle (unlike classical inference), generate interpretable answers in terms of a probability distribution, readily accommodate missing data and complex parametric models, and allow comparison between models. This is not to say that Bayesian methods and inference are appropriate in all contexts: there is no single best practice for selecting prior distributions, and Bayesian methods often have high computational costs. However, these drawbacks do not explain why biological anthropologists in the Americas have largely chosen to ignore Bayesian methods, while these tools have become popular and useful elsewhere. Courgeau (2012) gives a very complete account of the use of both frequentist and Bayesian methods within the broader social sciences, and McGrayne’s (2011) popular history of Bayes’ Rule (another name for the theorem) gives insight into why Bayesian methods have only fairly recently come to the fore.

Grant sponsor: National Science Foundation; Grant number: BCS97–27386.

*Correspondence to: Lyle W. Konigsberg, University of Illinois Department of Anthropology, 109 Davenport Hall, MC-148, 607 S. Mathews Street, Urbana, Illinois 61801, USA. E-mail: [email protected]

DOI: 10.1002/ajpa.22397

Published online 4 November 2013 in Wiley Online Library (wileyonlinelibrary.com).

© 2013 WILEY PERIODICALS, INC.

AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 57:153–184 (2013)

In large measure, the recent increase in Bayesian applications across diverse fields and around the world has occurred because of the development of computer simulation methods and related software (Geyer, 1992; Gilks et al., 1996; Gamerman, 1997; Lunn et al., 2000, 2009; Brooks et al., 2011) that remove the computational burden from the user. Our goals here are to explain Bayesian principles in a way that makes their applicability understandable and straightforward, to provide concrete examples of computer simulations and statistics grounded in Bayes’ Theorem that address questions relevant to biological anthropology, and to simultaneously review how Bayesian methods and inference have been used in biological anthropology to date. We first review maximum likelihood estimation to establish some terminology, and then use a simple example of Bayes’ postulate (Bayes’ Theorem with a uniform prior) to examine: 1) the likelihood, prior and posterior, and 2) differences between highest posterior density (HPD) regions and confidence intervals. We move on to Bayes’ Theorem and how it can be used to 1) create new priors (sequential use), 2) generate predictive densities for new samples, 3) evaluate competing models (Bayes’ factor), and 4) estimate normally distributed parameters. We then delve into computer simulation, Bayesian statistics, and freeware applications, reserving the “nuts and bolts” of different methods for simulating values out of various distributions for the Appendix.

The final sections of the article illustrate various Bayesian methods using published and practical “toy” examples from bioarchaeology and from forensic anthropology. The bioarchaeology examples involve modeling mortality and accounting for uncertainty in age estimates in paleodemography, and using full posterior density distributions to address disease prevalence, specificity, and sensitivity in paleopathology. The forensic anthropology examples use Bayesian methods to address the analysis of commingled remains and issues of identification in closed population mass disasters. The forensics section also includes a discussion of the potential problems that arise when conditional probabilities are transposed in evidentiary settings and when prior probabilities are misinterpreted. We then conclude with a brief review of the frequentist–Bayesian debate and texts that focus on Bayesian inference.

MAXIMUM LIKELIHOOD ESTIMATION

We briefly review maximum likelihood estimation in this section as a prelude to examining Bayesian inference. Throughout this article we use a simple bioanthropological example based on Mays and Faerman’s (2001) data on sex identification for 13 infants from two Romano-British cemeteries. The sample size is small because sex identifications were made using ancient DNA (aDNA), but ultimately Mays and Faerman wanted to estimate the proportion of males among all infants from these two sites. The authors suspected that these infant burials were the result of infanticide, and thus that the proportion of males (which can be written as $\theta$) for all infants buried at the two sites would differ from the expected proportion among living neonates. Their aDNA identifications indicated that 9 of the 13 individuals were males and only four were females.

The likelihood of obtaining certain parameter values given observed outcomes lies at the core of statistical inference. If we actually knew the value of $\theta$, then we could find the binomial probability of getting 9 males out of 13 individuals. For example, if $\theta = 0.5$, then that probability is $p(x=9 \mid n=13, \theta=0.5) = 0.0873$. However, since we in fact do not know the value of $\theta$, we must estimate it. In maximum likelihood estimation, we can refer to the likelihood of a specific value for a parameter ($\theta$) conditional on the observed data (the fact that 9 of 13 individuals were observed to be males). The likelihood is defined as proportional to the probability of obtaining the data conditional on the specific value of $\theta$, or in the particular case from Mays and Faerman: $L(\theta \mid x=9, n=13) \propto p(x=9 \mid n=13, \theta)$. This definition of a likelihood follows Fisher’s (1922, p 310) succinct definition:

The likelihood that any parameter (or set of parameters) should have any assigned value (or set of values) is proportional to the probability that if this were so, the totality of observations should be that observed.

Figure 1 shows the likelihood of the possible parameter values (proportion of males) given 9 males observed out of 13 individuals as equal to $p(x=9 \mid n=13, \theta)$ from the binomial distribution, where:

$$p(x \mid n, \theta) = \binom{n}{x} \theta^{x} (1 - \theta)^{n - x}. \tag{1}$$
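As a minimal sketch of Eq. (1), the binomial probability and the shape of the likelihood can be checked with a few lines of Python. The function name `binom_pmf` and the grid spacing are illustrative choices, not part of the original analysis.

```python
from math import comb

def binom_pmf(x: int, n: int, theta: float) -> float:
    """Binomial probability of x successes in n trials, as in Eq. (1)."""
    return comb(n, x) * theta**x * (1.0 - theta) ** (n - x)

x, n = 9, 13  # 9 males among 13 sexed infants (Mays and Faerman, 2001)

# Probability of the data if theta were known to be 0.5
p_half = binom_pmf(x, n, 0.5)

# Likelihood evaluated over a grid of candidate theta values
# (the 0.001 spacing is an arbitrary, illustrative choice)
grid = [i / 1000 for i in range(1, 1000)]
likelihood = [binom_pmf(x, n, t) for t in grid]
theta_hat = grid[likelihood.index(max(likelihood))]

print(f"p(x=9 | n=13, theta=0.5) = {p_half:.4f}")
print(f"grid maximum of the likelihood near theta = {theta_hat:.3f}")
```

The grid maximum falls next to $9/13$, the value at which the likelihood curve peaks.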

Because the likelihood is only defined up to a multiplicative constant of proportionality (or an additive constant for the log-likelihood), we can re-scale Eq. (1) by dropping the binomial coefficient and writing the log-likelihood in place of the likelihood:

$$\ell(\theta \mid x, n) = x \ln(\theta) + (n - x) \ln(1 - \theta). \tag{2}$$

The first derivative of Eq. (2) is:

$$\frac{x - n\theta}{\theta (1 - \theta)}. \tag{3}$$

Fig. 1. Likelihood function for observing 9 males out of 13 individuals assuming a binomial distribution with $\theta$ as the proportion of males among all dead infants.

Setting Eq. (3) equal to zero and solving for $\theta$ gives $x/n$, the well-known maximum likelihood estimate. Taking the inverse of the negative of the second derivative of Eq. (2) (that is, of the first derivative of Eq. (3)) and evaluating that at $\theta = x/n$ gives:

$$\frac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}, \tag{4}$$

the variance of the estimate of $\theta$. If one is willing to assume that the normal distribution forms a reasonable approximation to the binomial, then $x/n \pm 1.96$ times the square root of Eq. (4) gives a 95% confidence interval. Later we will refer to this as the “asymptotic confidence interval.” We can also consider an “exact” confidence interval (Clopper and Pearson, 1934) that is “based on inverting equal-tailed binomial tests” (Agresti and Coull, 1998, p 119) and an approximate confidence interval that adds 1.92 males and 1.92 females to the counts and then applies the asymptotic normal equation (see Agresti and Coull, 1998 for the justification of this approach). The 95% asymptotic, exact, and Agresti–Coull confidence intervals for Mays and Faerman’s data are 0.4414–0.9432, 0.3857–0.9091, and 0.4204–0.8765, where the intervals were obtained using the package “binom” (Dorai-Raj, 2009) within the program “R” (R Development Core Team, 2013).
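The three intervals can be reproduced approximately without the R package. The following is a sketch in plain Python, not the code used for the published values: the function names are hypothetical, and the exact interval is found here by bisection on binomial tail sums rather than by whatever routine “binom” uses internally.

```python
from math import comb, sqrt
from statistics import NormalDist

def wald_interval(x, n, level=0.95):
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = x / n
    half = z * sqrt(p * (1 - p) / n)  # z times the square root of Eq. (4)
    return p - half, p + half

def agresti_coull_interval(x, n, level=0.95):
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n_adj = n + z**2                  # adds z^2/2 (about 1.92) to each count
    p_adj = (x + z**2 / 2) / n_adj
    half = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half, p_adj + half

def upper_tail(x, n, theta):
    """P(X >= x) for X ~ Binomial(n, theta); increases with theta."""
    return sum(comb(n, k) * theta**k * (1 - theta) ** (n - k)
               for k in range(x, n + 1))

def solve_increasing(f, target, iters=100):
    """Bisection on [0, 1] for f(theta) = target, with f increasing."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson_interval(x, n, level=0.95):
    alpha = 1 - level
    # Lower bound solves P(X >= x | theta) = alpha/2;
    # upper bound solves P(X <= x | theta) = alpha/2.
    lower = 0.0 if x == 0 else solve_increasing(
        lambda t: upper_tail(x, n, t), alpha / 2)
    upper = 1.0 if x == n else solve_increasing(
        lambda t: upper_tail(x + 1, n, t), 1 - alpha / 2)
    return lower, upper

wald = wald_interval(9, 13)
exact = clopper_pearson_interval(9, 13)
agresti_coull = agresti_coull_interval(9, 13)
for name, (lo, hi) in [("asymptotic", wald), ("exact", exact),
                       ("Agresti-Coull", agresti_coull)]:
    print(f"{name:14s} {lo:.4f} to {hi:.4f}")
```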

BAYES’ POSTULATE

Bayes’ postulate allows one to begin with an initial belief that events are equiprobable and then modify this belief after observing data. In his original example, Bayes (1763) attempted to determine the position of a tossed ball on a table by counting the number of times subsequently tossed balls fell to the left (as opposed to the right) of the initially tossed ball. Each tossed ball could fall anywhere from the extreme left side of the table, “0,” to the extreme right side, “1,” and was equally likely to be at any given point along the left to right continuum. This is a special case of Bayes’ Theorem in which the prior is a uniform distribution (whereas Bayes’ Theorem allows for many different prior distributions). As Stigler (1986, p 361) points out, it was really Laplace in his 1774 publication who most fully developed “‘Bayesian’ ideas” using the uniform prior, but the terms Bayes’ Theorem and “Bayesian” have stuck because of historical priority.

While Mays and Faerman took a frequentist approach to analyzing their data, the same data also can be used to infer the proportion of males for all infant deaths from these two Romano-British sites following a Bayesian approach. This problem is essentially no different than Bayes’ (1763) original example of tossed balls. In the Mays and Faerman example, $\theta$, the proportion of males among dead infants, takes the place of the position of the initially tossed ball. The initial proportion, like the initial ball, can be anywhere from “0” to “1,” and it is equally likely to be at any given point along the continuum. The counts of infants sexed as male and as female among the 13 sexed individuals take the place of Bayes’ subsequent balls falling to the left (as opposed to the right) of the initial ball. Four important components in using Bayes’ postulate (and Bayes’ Theorem) are the likelihood, the prior, the posterior, and the HPD. These are defined below with examples from the Mays and Faerman data.

The likelihood, prior, and posterior

Because the likelihood is only defined up to a multiplicative constant of proportionality (or an additive constant for the log-likelihood), we can consider the following re-scaling for Eq. (1):

$$L(\theta \mid x, n) = \binom{n}{x} \theta^{x} (1 - \theta)^{n - x} (n + 1). \tag{5}$$

This $n + 1$ rescaling, often called the normalized likelihood, follows from the fact that there are $n + 1$ possible states for the $x$ variable (from 0 to $n$).

The prior is simply the initial probability we assign to any particular value for $\theta$ prior to observing or at least analyzing our data (summarized in the likelihood) that 9 of 13 individuals were sexed as male. In his postulate, Bayes adopted a uniform prior such that $f(\theta_s) = 1$ for $0 \le \theta_s \le 1$, where we use the subscript “s” to mean a specific value of $\theta$ and $f(\cdot)$ to mean a probability density function. This is a “proper prior” in the sense that the prior integrates to 1.0 across the defined range of $\theta$ (from zero to one, inclusive). We will work with other probability density functions where particular values of the function rise above 1.0. This often causes confusion for readers used to dealing with probabilities (which are constrained between 0 and 1 inclusive) and not with probability density functions. Probability density functions are constrained only in that they cannot take negative values and that they must integrate to 1.0. The posterior is the probability we arrive at after evidence or data is taken into account. Bayes’ postulate allows us to reverse the conditioning on the probability shown in Eq. (1) so that:

$$f(\theta_s \mid x=9, n=13) = \frac{p(x=9 \mid n=13, \theta_s)}{\int_0^1 p(x=9 \mid n=13, \theta)\, d\theta}. \tag{6}$$

Equation (6) is equivalent to the scaled likelihood already given in Eq. (5). Equation (6) is referred to as a “posterior probability density function.” Because of the integration in the denominator, the posterior integrates to 1.0. For any given value of $\theta$ the function gives the probability density after (posterior to) modifying the prior by the likelihood. Figure 2 shows a plot of the entire posterior density for the proportion of males in our current example. As noted above, probability density functions can have values that exceed 1.0. This is certainly true for the case shown in Figure 2 where the highest density is equal to about 3.3 at the mode and where for $\theta$ values between about 0.48 and 0.86 the density is greater than 1.0.
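The properties just described can be checked numerically. The sketch below assumes the uniform prior of Bayes’ postulate, so the posterior is the binomial probability multiplied by $n + 1$; the midpoint-rule integration and the grid of 1,001 points are illustrative choices.

```python
from math import comb

x, n = 9, 13

def binom_pmf(k, m, theta):
    return comb(m, k) * theta**k * (1 - theta) ** (m - k)

def posterior_density(theta):
    """Posterior under a uniform prior: the normalized likelihood (n + 1) * p(x | n, theta)."""
    return (n + 1) * binom_pmf(x, n, theta)

# Midpoint-rule check that the posterior integrates to 1.0
steps = 100_000
area = sum(posterior_density((i + 0.5) / steps) for i in range(steps)) / steps

# Height of the density at the posterior mode x/n
peak = posterior_density(x / n)

# Grid points at which the density exceeds 1.0
above_one = [i / 1000 for i in range(1001) if posterior_density(i / 1000) > 1.0]

print(f"area under posterior = {area:.6f}")
print(f"density at mode      = {peak:.3f}")
print(f"density > 1 for theta from about {above_one[0]:.2f} to {above_one[-1]:.2f}")
```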

Of confidence and HPDs

At this point in the analysis, a frequentist would probably calculate a 95% confidence interval to characterize or evaluate the posterior density for $\theta$ (such as the three confidence intervals given in the previous section on maximum likelihood estimation) so we need to consider how the comparable problem is addressed in a Bayesian analysis. One issue both frequentists and Bayesians must deal with is the asymmetry of the posterior density function in Figure 2 such that the posterior mean (equal to 0.6667) is less than the posterior mode (equal to 9/13, or 0.6923). We can find left and right tail areas of 0.025 in order to try to establish a 95% confidence interval for $\theta$, but because of the asymmetry this will lead to including a region which has lower posterior density values. In Figure 2, the 95% confidence interval is shown collectively by the light gray and dark gray areas, where the light gray area represents the region with the lower density. We can address the issue of including a region with lower probability densities within the 95% confidence interval by instead forming the 95% HPD region. The 95% Bayesian HPD shown in Figure 3 for $\theta$ is from 0.436 to 0.885, with the excluded lower tail containing about 3.38% of the total posterior density and the excluded upper tail about 1.62%. This HPD has probability densities that range from a low of 0.577 on both sides to a high of 3.278 at the mode. In contrast, the equal-tailed interval shown in Figure 2 has probability densities that fall to as low as 0.454 on the left versus a low of 0.777 on the right. Ultimately, the HPD is narrower than the equal-tailed interval, which must be the case when there is asymmetry in the distribution.
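One way to sketch the comparison between the equal-tailed interval and the HPD is to search for the shortest interval that holds 95% of the posterior mass, which coincides with the HPD for a unimodal density. The code below is an illustrative implementation under that assumption, not the authors’ procedure; it uses the identity that, for integer shape parameters, the Be(a, b) cumulative distribution equals an upper binomial tail with a + b − 1 trials.

```python
from math import comb

x, n = 9, 13
a, b = x + 1, n - x + 1  # posterior Be(10, 5) under the uniform prior

def density(theta):
    coef = (a + b - 1) * comb(a + b - 2, a - 1)  # 1 / B(a, b) for integer a, b
    return coef * theta ** (a - 1) * (1 - theta) ** (b - 1)

def cdf(theta):
    """Be(a, b) CDF for integer a, b: P(Binomial(a + b - 1, theta) >= a)."""
    m = a + b - 1
    return sum(comb(m, k) * theta**k * (1 - theta) ** (m - k)
               for k in range(a, m + 1))

def quantile(p, iters=80):
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def interval(lower_tail, level=0.95):
    return quantile(lower_tail), quantile(min(1.0, lower_tail + level))

def width(lower_tail, level=0.95):
    lo, hi = interval(lower_tail, level)
    return hi - lo

def hpd(level=0.95, iters=60):
    """Shortest interval with mass `level`, by ternary search on the lower-tail area."""
    lo, hi = 0.0, 1.0 - level
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if width(m1, level) < width(m2, level):
            hi = m2
        else:
            lo = m1
    tail = (lo + hi) / 2
    return (tail,) + interval(tail, level)

eq_lo, eq_hi = interval(0.025)      # equal-tailed 95% interval
tail, hpd_lo, hpd_hi = hpd()        # 95% HPD and its excluded lower-tail area

print(f"equal-tailed: {eq_lo:.3f} to {eq_hi:.3f}, "
      f"end densities {density(eq_lo):.3f} and {density(eq_hi):.3f}")
print(f"HPD:          {hpd_lo:.3f} to {hpd_hi:.3f}, lower tail {tail:.4f}, "
      f"end densities {density(hpd_lo):.3f} and {density(hpd_hi):.3f}")
```

At the HPD the two end densities agree, which is what distinguishes it from the equal-tailed interval.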

The Bayesian HPD differs from the frequentist confidence interval in how it is interpreted, and the two are consequently not directly equivalent or comparable. The HPD is a probabilistic statement about the parameter, whereas a confidence interval is a probabilistic statement about potential replications of the “experiment.” As a consequence, confidence intervals do not mean what we commonly think them to mean. Kruschke (2010b, p 661) gives a succinct definition for the term “confidence interval” as used in frequentist statistical methods: “A confidence interval is merely the range of hypothetical parameter values we would not reject if we replicated the intended experiment many times.” Lee (2012, p xxi) in the preface to his book points out how the term “confidence interval” confused him when he “first learned a little statistics”:

. . .the statement that a 95% confidence interval for an unknown parameter ran from −2 to +2 sounded as if the parameter lay in that interval with 95% probability and yet I was warned that all I could say was that if I carried out similar procedures time after time then the unknown parameters would lie in the confidence intervals I constructed 95% of the time. It appeared that the books I looked at were not answering the questions that would naturally occur to a beginner, and that instead they answered rather recondite questions which no one was likely to want to ask.

Smith (1986, p 303) has made similar comments with regard to the difference between confidence intervals and Bayesian HPDs for the recombination fraction in genetic linkage analyses:

Careful books also warn that, though one would like to think that there is a 95% probability that the true value of the parameter will fall in this interval, this is not logically justified by the definition of “confidence interval.” The only substantial difference using Bayes is to show that this interval is in fact a “probability interval”; there is (approximately) a 95% probability that the true value lies in the interval.

Fig. 2. Posterior density function for the proportion of males out of all dead infants, where the prior density was a uniform density on 0–1, and the likelihood function was based on observing 9 males out of 13 individuals. The entire shaded (both light and dark gray) region is a frequentist-style 95% confidence interval for $\theta$. The light gray region has lower probability densities than the dark gray region.

Fig. 3. The 95% HPD region for $\theta$ is shown with gray shading. The vertical lines represent the 95% confidence interval (from Fig. 2) which, because of the asymmetry of the posterior, is shifted to the left and is slightly wider than the 95% HPD.

To fully understand how a confidence interval for the binomial operates it is best to look at the “coverage probability” (Vollset, 1993; Agresti and Coull, 1998; Brown et al., 2001) for the interval. The coverage probability is simply the proportion of times that $\theta$ is expected to fall within its nominal confidence interval. There are several different ways that frequentist confidence intervals could be assigned, including the three we mentioned in the section on maximum likelihood estimation. We follow Agresti and Coull (1998) in focusing on the Wald method (this is the asymptotic normal method), the exact method (Clopper and Pearson, 1934), and what they refer to as the modified Wald method, but that we refer to as Agresti–Coull following later publications (Brown et al., 2001; Miao and Gastwirth, 2004; Tobi et al., 2005).

Figure 4 shows plots of coverage probabilities against $\theta$ for the asymptotic, Agresti–Coull, and exact method 95% confidence intervals using a sample size of 13 following our example by Mays and Faerman (2001). The asymptotic method gives confidence intervals that are too narrow, having coverage probabilities at or slightly above 0.95 only in a tight band around $\theta$ values of 0.45 and 0.55 and at points 0.21 and 0.79 (coverage plots are symmetric around the center point of $\theta = 0.5$). At $\theta = 9/13$, the coverage probability for the 95% confidence interval (which runs from 0.441 to 0.943, as mentioned in the section on maximum likelihood estimation) is 0.922. Conversely, the exact method confidence intervals are conservative in that they have higher coverage probabilities than they should at a given confidence level. At $\theta = 9/13$, the coverage probability is 0.970 for the exact 95% confidence interval (from 0.386 to 0.909, again see the section on maximum likelihood estimation). The Agresti–Coull interval also gives a coverage probability of 0.970 at $\theta = 9/13$ (from 0.420 to 0.876), but the exact intervals are less conservative for other values of $\theta$.
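A coverage probability at a given $\theta$ can be computed directly by summing the binomial probabilities of every count $x$ whose interval contains $\theta$. The sketch below does this for the asymptotic (Wald) and Agresti–Coull intervals only; the function names are hypothetical, and the exact method is left out to keep the example short.

```python
from math import comb, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

def wald(x, n):
    p = x / n
    half = Z * sqrt(p * (1 - p) / n)
    return p - half, p + half

def agresti_coull(x, n):
    n_adj = n + Z**2
    p_adj = (x + Z**2 / 2) / n_adj
    half = Z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half, p_adj + half

def coverage(interval_fn, theta, n):
    """Sum of P(x | n, theta) over all x whose interval contains theta."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval_fn(x, n)
        if lo <= theta <= hi:
            total += comb(n, x) * theta**x * (1 - theta) ** (n - x)
    return total

theta = 9 / 13
cov_wald = coverage(wald, theta, 13)
cov_ac = coverage(agresti_coull, theta, 13)
print(f"asymptotic coverage at theta = 9/13:    {cov_wald:.3f}")
print(f"Agresti-Coull coverage at theta = 9/13: {cov_ac:.3f}")
```

Repeating the calculation over a grid of $\theta$ values would trace out curves of the kind plotted in Figure 4.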

Fig. 4. Plots of coverage probabilities from three methods of calculating 95% confidence intervals using a sample size of 13 individuals. The asymptotic method (normal approximation to the binomial) gives confidence intervals that are too narrow and coverage consequently is below the desired 95%.

Unlike the “coverage probabilities” for the three frequentist-style confidence intervals, which are too narrow or too broad, a Bayesian HPD has the proper coverage probability even at this rather small sample size of 13 individuals. Figure 5 shows a plot of the realized coverage against nominal HPD regions ranging from 0.50 to 0.99 in increments of 0.01 at a sample size of 13. This figure was drawn based on 100,000 simulations which we describe in some detail as it makes the Bayesian model more explicit. The first step in the simulation was to generate 100,000 values between 0 and 1 from the uniform density, which simulates sampling from the prior distribution. The second step was to generate a binomial variate at a sample size of 13 using the 100,000 uniform deviates. These 100,000 pairs of simulated $\theta$ and $x$ values (counts of from 0 to 13 males) then formed the “data” to be assessed. The third step was to loop through HPD values of from 0.50 to 0.99 in increments of 0.01, filling in a 14 row (from $x = 0$ to 13) by two column (lower and upper bounds) table of calculated intervals within that loop. Finally, the simulated $\theta$ values were compared to the tabled values at their paired $x$ value and the proportion of $\theta$ values within their HPDs was assessed. For example, a simulated $\theta$ of 0.1137 and $x$ value of two “males” is counted as being within the 66% to 99% HPDs but not within the 50–65% HPDs because the lower HPD boundary is at 0.1139 for the 65% HPD and at 0.1124 for the 66% HPD when there are 2 males out of 13 individuals.

Fig. 5. Plot of actual coverage against the nominal Bayesian HPD regions based on sample size of 13 individuals. The plot was produced using 100,000 simulated pairs of $\theta$ and $x$ (integer variable from 0 to 13), where $\theta$ was drawn from a uniform and $x$ was drawn from the binomial using the simulated $\theta$ value. Actual coverage was found empirically at nominal HPDs of from 0.50 to 0.99 in increments of 0.01. The diagonal line is the line of identity.
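The four simulation steps can be sketched as follows. This is a reduced illustration rather than the authors’ code: it uses 20,000 simulated pairs instead of 100,000, checks only three nominal levels, fixes an arbitrary random seed, and finds each HPD as the shortest interval of the required mass (an assumption that holds for unimodal or monotone posteriors such as these).

```python
import random
from math import comb

N = 13
BINOM = [comb(N + 1, k) for k in range(N + 2)]  # coefficients for Binomial(14, theta)

def posterior_cdf(theta, x):
    """CDF of Be(x + 1, N - x + 1), written as P(Binomial(N + 1, theta) >= x + 1)."""
    return sum(BINOM[k] * theta**k * (1 - theta) ** (N + 1 - k)
               for k in range(x + 1, N + 2))

def quantile(p, x, iters=40):
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if posterior_cdf(mid, x) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def hpd(x, level, iters=40):
    """Shortest interval with posterior mass `level` (ternary search on the lower tail)."""
    def bounds(tail):
        return quantile(tail, x), quantile(min(1.0, tail + level), x)
    lo, hi = 0.0, 1.0 - level
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        b1, b2 = bounds(m1), bounds(m2)
        if b1[1] - b1[0] < b2[1] - b2[0]:
            hi = m2
        else:
            lo = m1
    return bounds((lo + hi) / 2)

random.seed(1)               # arbitrary seed, for reproducibility only
n_sims = 20_000              # fewer than the 100,000 used for Figure 5
levels = (0.50, 0.80, 0.95)  # a subset of the nominal levels 0.50 to 0.99

# Steps 1 and 2: draw theta from the uniform prior, then x from Binomial(13, theta)
pairs = []
for _ in range(n_sims):
    theta = random.random()
    x = sum(random.random() < theta for _ in range(N))
    pairs.append((theta, x))

# Steps 3 and 4: table of HPD bounds for each x, then the share of theta values inside
realized = {}
for level in levels:
    table = [hpd(x, level) for x in range(N + 1)]
    inside = sum(table[x][0] <= theta <= table[x][1] for theta, x in pairs)
    realized[level] = inside / n_sims
    print(f"nominal {level:.2f}  realized {realized[level]:.3f}")
```

With more simulated pairs and the full set of nominal levels, the realized values should track the line of identity shown in Figure 5.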

BAYES’ THEOREM

Thus far, we have considered Bayes’ postulate and the roles of the likelihood, the prior, the posterior, and HPDs in Bayesian analysis. But how can we move from Bayes’ postulate to Bayes’ Theorem proper? This involves generalizing the prior distribution so that we can consider other prior distributions beyond the uniform. As pointed out above, Bayes’ postulate uses a particular prior density: the uniform distribution between zero and one. This prior can be written as the beta probability density function $\mathrm{Be}(1, 1)$, where the two “shape parameters” are equal to 1.0. This is a conjugate prior density, meaning that when “updated” by the likelihood, it produces a posterior density that has the same distributional form as the prior. The convenience of a conjugate prior is that the onus of integrating the denominator in Eq. (6) is removed, although in Bayes’ postulate the denominator is simply $1/(n + 1)$ (and see Stigler’s, 1982, p 252 comments about Bayes’ “scholium”). Raiffa and Schlaifer (1961, p xii), who first defined the term “conjugate prior,” noted:

. . .we can obtain a very tractable family of “conjugate” prior distributions by simply interchanging the roles of variables and parameters in the algebraic expression for the sample likelihood, and the posterior distribution will be a member of the same family as the prior. This procedure leads, for example, to the beta family of distributions in situations where the state is described by the parameter p of a Bernoulli process. . .

Following this procedure, we can consider prior densities other than the uniform, which moves us from Bayes’ postulate to Bayes’ Theorem or Rule (both monikers are used in the literature).

In our example, Bayes’ Theorem is:

$$
f(\theta_s \mid x = 9, n = 13) = \frac{p(x = 9 \mid n = 13, \theta_s)\, f(\theta_s)}{\int_0^1 p(x = 9 \mid n = 13, \theta)\, f(\theta)\, d\theta}, \tag{7}
$$

where f(θ) represents the prior density. If the prior density in Eq. (7) is a beta density function with “shape” parameters a and b, then the posterior density is Be(a + x, b + n − x). In maximum likelihood estimation we know that the mean and mode both equal x/n, so we can consider what values of a and b in the beta prior would reproduce the maximum likelihood mean or mode. If, as in Bayes’ postulate, the beta prior is Be(1, 1), then the posterior mean will be (1 + x)/(2 + n) and the posterior mode will be x/n. If the beta prior is Be(0, 0), then the posterior mean will be x/n but the posterior mode will be (x − 1)/(n − 2). The Be(0, 0) prior, known as Haldane’s prior (after Haldane, 1932), is an example of an improper prior because the density goes to infinity at the borders of 0 and 1 and does not integrate to a finite value. One could compromise between Be(1, 1) and Be(0, 0), and instead use Be(0.5, 0.5) as the prior, which gives a posterior mean of (0.5 + x)/(1 + n) and a posterior mode of (x − 0.5)/(n − 1). This latter prior is known as a Jeffreys prior (after Jeffreys, 1946).
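The posterior means and modes quoted for the three priors can be checked with a few lines of arithmetic. The sketch below is an illustration added here, not the authors' code; the function name `beta_posterior` and the dictionary layout are hypothetical, and the data (9 males out of 13) are taken from the running example.

```python
def beta_posterior(a, b, x, n):
    """Conjugate update of a Be(a, b) prior with x successes out of n trials.

    Returns the posterior shape parameters, the posterior mean, and the
    posterior mode (the mode formula assumes both posterior shapes exceed 1).
    """
    a_post = a + x
    b_post = b + n - x
    mean = a_post / (a_post + b_post)
    mode = (a_post - 1) / (a_post + b_post - 2)
    return a_post, b_post, mean, mode


# The three "uninformative" priors discussed in the text.
priors = {
    "Bayes (uniform)": (1.0, 1.0),
    "Haldane": (0.0, 0.0),
    "Jeffreys": (0.5, 0.5),
}

x, n = 9, 13  # 9 males out of 13 individuals
summaries = {name: beta_posterior(a, b, x, n) for name, (a, b) in priors.items()}

for name, (a_post, b_post, mean, mode) in summaries.items():
    print(f"{name:16s} Be({a_post}, {b_post})  mean={mean:.4f}  mode={mode:.4f}")
```

Under the uniform prior the mode equals x/n, and under Haldane's prior the mean equals x/n, as stated above.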

All three priors (Bayes’ original uniform, Haldane’s, and Jeffreys’) can be referred to as uninformative priors because they are rapidly “dominated” by the likelihood function after examining even a little bit of data. The existence of multiple uninformative priors raises a chief criticism made against Bayesian inference: if one expresses lack of knowledge about a parameter by using a uniform prior on one scale (for example a proportion, 0 ≤ θ ≤ 1), then this lack of knowledge should apply on a transformed scale (e.g., the arcsin transformed proportion on the interval from 0 to π/2). However, as we demonstrate below, there is scarcely any difference between using one uninformative prior and another once there is a modicum of data. Consider Jeffreys’ prior. We decide to transform θ in order to reduce the asymmetry in Figure 2. The arcsin transformation (Bartlett, 1937), φ = sin⁻¹√θ, is one such possible transformation. Figure 6 shows the posterior density for the same data as used in Figure 2, except that here the proportion of males is expressed in radians (from the arcsin transformation) rather than as a straight-scale proportion of males. Note that the asymmetry for the scaled likelihood in Figure 6 is much less than for Figure 2 (the light gray region is much reduced). Figure 6 also shows the uniform prior as a dashed line, which in the arcsin transformation has a density value of 2/π for any value between sin⁻¹√0 = 0 and sin⁻¹√1 = π/2. This is again a proper prior density in that 2/π × π/2 = 1.

Now suppose we want to move in the other direction from Figure 6 back to Figure 2 in order to see our analysis on the original proportion of males scale. If we undertake this task, then the uniform prior from the radian scale shown in Figure 6 becomes the Be(0.5, 0.5) beta distribution on the proportion of males scale, or Jeffreys’ prior again. Similarly, Haldane’s prior is a uniform distribution in the logit scale: log[θ/(1 − θ)]. Thus, Bayes’ original uniform prior is the uninformative prior for the binomial likelihood in the proportion scale, Jeffreys’ prior is uninformative in the arcsin transformed scale, and Haldane’s prior is uninformative in the logit scale.
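The change of variables behind these statements can be written out explicitly. The following short derivation is added here as an illustration and is not taken from the original text. It assumes only the uniform density 2/π on the arcsin scale, a flat (improper) density on the logit scale, and the standard change-of-variables rule; the symbol ψ for the logit is introduced for this sketch.

```latex
\begin{align*}
\phi &= \sin^{-1}\sqrt{\theta}, &
\frac{d\phi}{d\theta} &= \frac{1}{2\sqrt{\theta(1-\theta)}}, \\
f(\theta) &= \frac{2}{\pi}\left|\frac{d\phi}{d\theta}\right|
  = \frac{\theta^{0.5-1}(1-\theta)^{0.5-1}}{\pi}, &
B(0.5,\,0.5) &= \frac{\Gamma(0.5)^2}{\Gamma(1)} = \pi, \\
\psi &= \log\frac{\theta}{1-\theta}, &
\frac{d\psi}{d\theta} &= \frac{1}{\theta(1-\theta)}, \\
f(\theta) &\propto \left|\frac{d\psi}{d\theta}\right|
  = \theta^{0-1}(1-\theta)^{0-1}.
\end{align*}
```

The second line is the Be(0.5, 0.5) density (Jeffreys' prior), and the last line is the kernel of the improper Be(0, 0) density (Haldane's prior).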

Fig. 6. Posterior density for the arcsin transformed proportion of males (compare to Fig. 2).


If we have no prior knowledge about the proportion of males then it stands to reason that we also have no prior knowledge about the arcsin or logit transformed proportion of males, but in point of fact, there is scarcely any difference between using Jeffreys’ prior and the uniform prior once data is added. We show this in the “triplot” (O’Hagan, 2004) of the standardized likelihood, Jeffreys’ prior, and the posterior density in Figure 7. Note that under a uniform prior (on the proportion of males scale) the standardized likelihood and the posterior density coincide, as in Figures 1 and 2. In Figure 7 the Jeffreys’ prior has barely nudged the posterior density away from the standardized likelihood. The point of this exercise is to demonstrate that a uniform prior becomes non-uniform with transformation of a parameter, a valid critique of a Bayesian approach that has no practical effect when the likelihood dominates the prior.

Sequential use of Bayes’ Theorem

Thus far we have generalized the uniform prior to consider other types of uninformative priors, but we can also use this generalization to create an informative prior. Mays and Faerman’s (2001) study is not the only one to have used aDNA to assess the proportion of males among infants from Roman era sites. After examining their data, Mays and Faerman also included a count of three male infants and one female infant from the Beddingham Roman Villa (Waldron et al., 1999). This brought the total to 12 males and 5 females. Although they did not include aDNA data from the Late Roman Era site of Ashkelon (Faerman et al., 1998), this data (14 males and 5 females) could potentially be combined with the 12 males and 5 females to yield 26 total males and 10 females. Using Jeffreys’ prior, the posterior density would then be a Be(26.5, 10.5) distribution. The 95% HPD for the proportion of males from this distribution is from 0.558 to 0.845. Courgeau (2010, 2012, p 114–116) describes how Laplace used data on live births from Paris and London and “inverse probability” to assess the probability that the sex ratio at birth for Paris was higher than for London. As this probability was nearly zero, Laplace opined that London, at 0.513, had the higher proportion of male births. The 95% HPD for the 26 males and 10 females from Roman sites excludes this value of 0.513, so similarly we might suspect that the Roman era sites provide more male infant deaths than expected from modern data.

In the above paragraph, we approached the analysis as if we started with Jeffreys’ prior and then used the data on 36 infants (Faerman et al., 1998; Waldron et al., 1999; Mays and Faerman, 2001) to find the posterior density of Be(26.5, 10.5). But we can also use yesterday’s posterior as tomorrow’s prior, so we could treat the data in the order that it arrived in the literature. Starting with Jeffreys’ prior and the Faerman et al. (1998) data gives a posterior density of Be(14.5, 5.5), which we could use as a prior for Waldron et al.’s (1999) data to obtain a posterior of Be(17.5, 6.5). This in turn could be used as a prior for Mays and Faerman’s (2001) data to arrive at the posterior of Be(26.5, 10.5). Alternatively, we could take the data in reverse order, again beginning with Jeffreys’ prior but adding Mays and Faerman’s (2001) data to obtain a posterior of Be(9.5, 4.5), then adding Waldron et al.’s (1999) data to arrive at a posterior of Be(12.5, 5.5), and finally adding Faerman et al.’s (1998) data to again arrive at a posterior density of Be(26.5, 10.5). Figure 8 shows how these two paths both arrive at the same answer. Panel A shows data incorporated in ascending order of date of publication, whereas Panel B shows descending order; both panels should be read from the top down.
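The two update paths can be traced with a small script. This is a minimal sketch written for this passage rather than code from the article. The names `STUDIES`, `update`, and `sequential_posteriors` are hypothetical, and the counts are the male and female tallies quoted above.

```python
from itertools import permutations

JEFFREYS = (0.5, 0.5)

# (males, females) reported by each study, in order of publication.
STUDIES = {
    "Faerman et al. (1998)": (14, 5),
    "Waldron et al. (1999)": (3, 1),
    "Mays and Faerman (2001)": (9, 4),
}


def update(prior, males, females):
    """Beta-binomial conjugate update: add the counts to the shape parameters."""
    a, b = prior
    return (a + males, b + females)


def sequential_posteriors(order, prior=JEFFREYS):
    """Return the chain of (a, b) pairs, starting at the prior."""
    path = [prior]
    for name in order:
        path.append(update(path[-1], *STUDIES[name]))
    return path


ascending = sequential_posteriors(list(STUDIES))
descending = sequential_posteriors(list(STUDIES)[::-1])

# Every ordering of the three studies ends at the same posterior.
final_states = {sequential_posteriors(order)[-1] for order in permutations(STUDIES)}

print("ascending: ", ascending)
print("descending:", descending)
print("distinct final posteriors:", final_states)
```

Because the update only adds counts to the shape parameters, the order in which the studies are processed cannot affect the final posterior.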

Predictive density

The examples in the preceding section show how we can use a posterior density to create a new informative prior for analyzing additional data. We can also use a posterior density to generate probabilities of observing certain data values in a new sample. Forming the product of Eq. (1) with a posterior density and then integrating across θ from zero to one gives the predicted values if we were to obtain a new sample. For example, if we start with Be(26.5, 10.5) as our posterior density for θ (the proportion of males across the three studies from the previous section), and we then obtain a new dataset of 15 infants sexed by aDNA from a Roman site, the predictive density gives us the probabilities of observing 0, 1, 2, …, 14, or 15 males in the new sample. Using a and b for the “shape parameters” in the beta posterior density (26.5 and 10.5 in this example) we can write the joint probability for θ (the proportion of males) and for obtaining y number of males in a future sample of m individuals. This is the product of the binomial probability for obtaining y males out of m individuals given θ and the beta posterior density for θ. Integrating across θ from 0 to 1 gives the predictive probability distribution for y, which is:

$$
p(y) = \int_0^1 f(\theta, y)\, d\theta = \binom{m}{y} \frac{B(a + y,\ b + m - y)}{B(a, b)}, \tag{8}
$$

where B(a, b) is the beta function, the integral of the unnormalized beta density, θ^(a−1)(1 − θ)^(b−1), between 0 and 1. Equation (8) is the beta binomial distribution BB(y, a, b). Figure 9 shows our example using BB(15, 26.5, 10.5). While we will not have much further need to use the predictive distribution analytically in this article, it is a useful point of departure for demonstrating the Gibbs sampler (Casella and George, 1992). It also can be used to check on the reasonableness of a model. If the (posterior) predictive density does not do a good job of “predicting” the data originally used in the analysis, then the analysis itself is suspect.

Fig. 7. “Triplot” (O’Hagan, 2004) for the Bayesian analysis of the proportion of males where the data are 9 males out of 13 individuals and the prior is Be(0.5, 0.5) (Jeffreys’ prior). Note that the posterior has not moved much from the likelihood. Under a Be(1, 1) prior, the posterior and the likelihood would be identical.
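The predictive distribution in Eq. (8) is straightforward to evaluate numerically. The sketch below is an illustration added here, with hypothetical function names. It uses only the Python standard library and works on the log scale through `math.lgamma` to avoid overflow in the beta functions.

```python
import math


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(y, m, a, b):
    """Eq. (8): probability of y successes in m new trials given a Be(a, b) density."""
    return math.comb(m, y) * math.exp(log_beta(a + y, b + m - y) - log_beta(a, b))


m, a, b = 15, 26.5, 10.5  # new sample size and posterior shape parameters
predictive = [beta_binomial_pmf(y, m, a, b) for y in range(m + 1)]
predictive_mean = sum(y * p for y, p in enumerate(predictive))

for y, p in enumerate(predictive):
    print(f"{y:2d} males: {p:.4f}")
```

The probabilities sum to one, and the predictive mean equals m × a/(a + b), which gives a quick sanity check on the implementation.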

Bayes factor

While the posterior predictive density is useful for checking the reasonableness of a single model, the measure for choosing between competing models is the Bayes factor. Kass and Raftery (1995) give a useful exposition of Bayes factors, and Kruschke (2010a) provides a detailed example for the binomial distribution. If we have two competing models that potentially could have generated an observed data set, then we can form a ratio of the probabilities of observing the data under each of the two models. This ratio of probabilities is referred to as a Bayes factor. It is identical to a likelihood ratio only when the two models are specified such that the parameter(s) of interest is (are) point values. If instead the model gives the parameter value(s) across an interval, then the parameter value must be “integrated out,” which for a beta prior leads to the beta binomial, as we saw above. If the quotient of the probabilities of Model 1/Model 2 is greater than one, then the first model is more strongly supported by the data being considered. Various scales have been offered for translating Bayes factors into statements about relative plausibility of models. For example, Jeffreys (1939, p 357) classified Bayes factors (which he called “K”) into six grades, running from grade 0 for “null hypothesis supported” to grade 5 for evidence against the null is “decisive.”

In the example we have been following, let’s consider the simplest Bayes factor first: comparison of a uniform prior to a prior with a point mass of 1.0 on θ = 0.513. For a uniform prior, the probability of getting any count from 0 up to 13 males out of 13 individuals is 1/14, so the probability of the observed data of 9 males out of 13 individuals is 0.0714. The probability of getting 9 males out of 13 individuals if θ = 0.513 follows directly from Eq. (1), and is about 0.099. The Bayes factor comparing a model with θ = 0.513 to that with a uniform prior is therefore 0.099/0.0714 or about 1.4, which indicates that a model with a point mass at 0.513 does a somewhat better job of predicting the observed data than does a uniform prior.

Fig. 8. A: Sequential use of Bayes’ Theorem taking the data in the order in which they arrived in the literature. B: Sequential use of Bayes’ Theorem taking the data in the reverse order in which they arrived in the literature.

Fig. 9. Posterior predictive distribution for the number of male infant deaths out of 15 individuals where the posterior distribution is Be(26.5, 10.5).

Now let’s consider Bayes factors for models we created by sequentially using Bayes’ Theorem. The data of nine males and four females from Mays and Faerman’s (2001) study was the last to arrive in the literature, so we could take as one model a Be(17.5, 6.5) prior based on Jeffreys’ prior and the fact that 17 male and 6 female infants had been reported in the literature for Roman era infants. The beta distribution has a mean of a/(a + b) and variance of ab/[(a + b)²(a + b + 1)], so the Be(17.5, 6.5) distribution has a mean of about 0.729 (proportion of males) and standard deviation of about 0.0889. For an alternative model, we might presume that the mean proportion was 0.513 and the standard deviation was 0.1. This would correspond to a Be(12.3, 11.7) prior. The ratio of BB(9, 13, 17.5, 6.5)/BB(9, 13, 12.3, 11.7), which is the Bayes factor for comparing these two models, is about 1.671. A Bayes factor of 1.671 classifies as a grade 1, or “not worth more than a bare mention,” on Jeffreys’ scale. If, on the other hand, we consider all of the data from Faerman et al. (1998), Waldron et al. (1999), and Mays and Faerman (2001) under a Jeffreys prior and compare that to the probability of getting the same data if θ was exactly equal to 0.513, then we have BB(26, 36, 0.5, 0.5) in the numerator and the binomial probability for getting 26 males out of 36 individuals (with θ = 0.513) in the denominator. This gives a Bayes factor of about 3.5, which classifies on Jeffreys’ scale as a grade 2, or “evidence against the null is substantial.”
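The three Bayes factors discussed in this section can be reproduced from the binomial and beta binomial probabilities alone. The following is a minimal sketch added for illustration, not the authors' R code, and the function and variable names are hypothetical. A uniform prior is treated as Be(1, 1), so its marginal probability is a beta binomial term.

```python
import math


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def binomial_pmf(x, n, theta):
    """Probability of x successes in n trials at a fixed (point-mass) theta."""
    return math.comb(n, x) * theta**x * (1.0 - theta) ** (n - x)


def beta_binomial_pmf(x, n, a, b):
    """Marginal probability of x successes in n trials under a Be(a, b) prior."""
    return math.comb(n, x) * math.exp(log_beta(a + x, b + n - x) - log_beta(a, b))


# Point mass at 0.513 versus a uniform prior, 9 males out of 13.
bf_point_vs_uniform = binomial_pmf(9, 13, 0.513) / beta_binomial_pmf(9, 13, 1.0, 1.0)

# Be(17.5, 6.5) prior versus Be(12.3, 11.7) prior, 9 males out of 13.
bf_sequential = beta_binomial_pmf(9, 13, 17.5, 6.5) / beta_binomial_pmf(9, 13, 12.3, 11.7)

# Jeffreys prior versus point mass at 0.513, 26 males out of 36.
bf_jeffreys_vs_point = beta_binomial_pmf(26, 36, 0.5, 0.5) / binomial_pmf(26, 36, 0.513)

print(f"point mass vs uniform:     {bf_point_vs_uniform:.3f}")
print(f"Be(17.5,6.5) vs Be(12.3,11.7): {bf_sequential:.3f}")
print(f"Jeffreys vs point mass:    {bf_jeffreys_vs_point:.3f}")
```

With equal prior odds on the two models, each of these ratios is also the posterior odds for the model in the numerator.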

Interpretively, it is useful to look at the Bayes factor in a bit more detail. As Kass and Raftery (1995) show, Bayes’ Theorem can be used to manipulate the Bayes factor so that we have:

$$
\frac{p(M_1 \mid D)}{p(M_2 \mid D)} = \frac{p(D \mid M_1)}{p(D \mid M_2)} \times \frac{p(M_1)}{p(M_2)}, \tag{9}
$$

where the term on the left is the posterior odds for Model 1 conditioned on the observed data against Model 2 conditioned on the same data. The first term on the right side of the “equals” sign is the Bayes factor, or the odds of getting the data under Model 1 versus under Model 2. The final term is the prior odds placed on the two models. Typically, one assumes that p(M1) = p(M2) = 0.5, canceling out the final term, so that the posterior odds are equal to the Bayes factor. Equation (9) looks very similar to one that is frequently used in forensic work, where the posterior odds of two hypotheses are equal to the likelihood ratio times the prior odds for the hypotheses. In the forensic setting the two competing hypotheses are typically “guilty” versus “not guilty” (see, for example, Lucy, 2005, p 112–114). As we pointed out above, whenever models are such that parameters are exact points, then the Bayes factor is the same thing as a likelihood ratio.

A normally distributed parameter

Application of Bayes’ Theorem is certainly not limited to data that has a discrete distribution. In this section, we turn to the problem of estimating the stature for A.L. 288-1 (“Lucy”), a problem that can be addressed using either a “likelihoodist” approach (Sober, 2002) or a Bayesian approach. Konigsberg et al. (1998) gave the classical calibration estimator for stature from femur length as 104.6 + 3.4751 × Fem, which with A.L. 288-1’s femur length of 281 mm gives an estimated stature of 1,081 mm. Classical calibration in this context finds the stature at which the observed femur length is most likely to have occurred. They also gave an “integrated mean squared error” from this method equivalent to 2,369. In the Bayesian setting, we would refer to the integrated mean squared error as the “data variance,” as it reflects our uncertainty in the estimated stature due to the imperfect correlation between stature and femur length. The “data variance” is equal to Vstat(1/r² − 1), where Vstat is the variance of stature, and r is the correlation of stature with femur length within a reference sample (for which Konigsberg et al. used 2,053 modern humans). Note that when the correlation between femur length and stature is 1.0, the data variance for the estimated stature is 0. Putting the estimated stature and its data variance together we can write stat ~ N(1,081, 2,369), meaning that the data from A.L. 288-1’s femur length and the reference sample imply that A.L. 288-1’s estimated stature is normally distributed around a mean of 1,081 mm with a variance of 2,369.

Using a “likelihoodist” approach, we would stop at this point and use stat ~ N(1,081, 2,369) as our best estimate. In contrast, a Bayesian approach would combine this information from data with a prior density for stature. Lee (2012, p 40–42) gives a good presentation of how the normal distribution is the conjugate prior for a normal likelihood. For the purposes of illustration, we could use the reference sample stature distribution as the prior, in which case we have the normal distribution N(1,725, 7,270), where 1,725 mm is the mean stature and 7,270 is the variance of stature. This is not a reasonable prior in the “real world,” as we know that overall Lucy is quite small as compared to modern humans. Continuing with the Bayesian approach, we use the inverse of the data and prior variances, each of which is referred to as a “precision.” The posterior precision is simply the sum of the data precision and the prior precision, which gives 1/7,270 + 1/2,369 = 5.596 × 10⁻⁴, or a posterior standard deviation of 42.3 once we invert the posterior precision and take the square root. The posterior mean is equal to the weighted average of the data mean (1,081 mm) and the prior mean (1,725 mm), where the weights are the respective relative precisions (i.e., the data precision and the prior precision each divided by the posterior precision). This gives (4.221/5.596) × 1,081 + (1.375/5.596) × 1,725 = 1,239, which agrees with the estimate obtained using “inverse calibration” summarized in Konigsberg et al.’s (1998) Table 2. Because the reference sample size is large (n = 2,053), we ignore the fact that the variances, correlation, and means are estimated rather than known. Konigsberg et al. (2006) show how Gibbs sampling, a type of computer simulation, can be used to form the full posterior density of stature for an individual when only a small reference sample is available.
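The precision-weighted calculation can be written as a short function. This sketch is an added illustration that assumes known variances, as the text does. The name `normal_update` is hypothetical, and the inputs are the values quoted above.

```python
import math


def normal_update(data_mean, data_var, prior_mean, prior_var):
    """Combine a normal 'data' estimate with a normal prior (variances known).

    Returns the posterior mean and posterior standard deviation.
    """
    data_prec = 1.0 / data_var
    prior_prec = 1.0 / prior_var
    post_prec = data_prec + prior_prec
    post_mean = (data_prec * data_mean + prior_prec * prior_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


# Classical calibration estimate of stature (mm) from femur length (mm).
femur_length = 281.0
classical_stature = 104.6 + 3.4751 * femur_length

# Data: N(1081, 2369); prior from the modern reference sample: N(1725, 7270).
post_mean, post_sd = normal_update(1081.0, 2369.0, 1725.0, 7270.0)

print(f"classical calibration estimate: {classical_stature:.0f} mm")
print(f"posterior mean: {post_mean:.0f} mm, posterior SD: {post_sd:.1f} mm")
```

Replacing the prior mean and variance with values judged more reasonable for a small-bodied hominin would pull the posterior mean less far from the classical estimate.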

COMPUTER SIMULATION AND BAYESIAN STATISTICS

In the simple examples given above, we were able to analytically compute and use full posterior probability distributions to perform Bayesian inference. While using a conjugate prior on relatively simple models like the binomial and a univariate normal removes the onus of numerical integration, there are many more complicated problems that cannot be so easily handled. In cases where the prior and posterior do not take the same form, and/or when the probability distributions of interest are multivariate or otherwise complex, we may give up solving analytical equations and replace symbolic or numerical integration with computer simulation. This shift is responsible for much of the florescence of Bayesian analysis in the last two decades (Gilks et al., 1996; Lunn et al., 2000, 2009; Gelman et al., 2004; Sturtz et al., 2005; Kéry, 2010; Kruschke, 2010a,b; Ntzoufras, 2011; Kéry and Schaub, 2012), as we noted in the introduction. Complex Bayesian analyses use sampling techniques based on Monte Carlo methods to estimate, rather than calculate, the posterior distribution. In the Appendix, we illustrate five Bayesian simulation methods, the first of which, approximate Bayesian computation (ABC), uses acceptance sampling to approximate or build a posterior distribution. The four subsequent simulation methods (the Metropolis sampler, slice sampling, adaptive rejection sampling, and Gibbs sampling) are used within Markov Chain Monte Carlo (MCMC) methods to sample from conditional distributions.
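To give a flavor of acceptance sampling before turning to the Appendix, the sketch below applies it to the 9-males-out-of-13 example under a uniform prior. It is an illustration written for this passage and is not the Appendix code. The function name, the seed, and the number of draws are arbitrary choices. Because the data are a single discrete count, a simulated value is accepted only when it matches the observed count exactly, so no tolerance is needed.

```python
import random


def abc_rejection(x_obs=9, n=13, draws=100_000, seed=1):
    """Acceptance sampling for a binomial proportion under a uniform prior.

    Draw theta from the prior, simulate a count from the binomial, and keep
    theta only when the simulated count equals the observed count.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(draws):
        theta = rng.random()                                  # prior draw on (0, 1)
        x_sim = sum(rng.random() < theta for _ in range(n))   # binomial(n, theta) draw
        if x_sim == x_obs:
            accepted.append(theta)
    return accepted


accepted = abc_rejection()
abc_mean = sum(accepted) / len(accepted)

print(f"accepted {len(accepted)} of 100000 draws")
print(f"simulated posterior mean: {abc_mean:.3f} (analytical Be(10, 5) mean: {10 / 15:.3f})")
```

The accepted values approximate draws from the analytical Be(10, 5) posterior, and roughly 1 in 14 prior draws is accepted, which matches the marginal probability of the observed count under the uniform prior.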

Each method described in the Appendix has different strengths and weaknesses, and is useful and appropriate for different kinds of problems or questions. Conveniently, a number of freely available specialized software packages for applying MCMC methods behave as expert systems that attempt to utilize the most appropriate sampling scheme, thus removing the decision of which sampling simulation method is most appropriate from the user. Less conveniently, the expert systems do not typically reveal how they selected the most appropriate simulation method. Lunn et al. (2000, p 328) explain the hierarchical logic used by one of the expert systems, WinBUGS. While the software automatically chooses among simulation methods, we believe it is nevertheless important to understand how these different methods work. Our illustrations in the Appendix continue using the aDNA sexing for Roman Era infants example, even though we were able to apply direct analytical methods to it. The advantage of staying with this example is that computer simulation results in the Appendix can be compared to the analytical results.

Using OpenBUGS for paleoanthropological analysis

For all of the very modest examples of Bayes’ Theorem up to this point, we have been using the statistical computing and graphics environment, R (R Development Core Team, 2013). While it is certainly possible to build MCMC models in R, as reflected in recent years in the AJPA (Fürtbauer et al., 2013; Gillespie et al., 2013; Séguy et al., 2013), it is more typical for researchers to use one of the freely available packages to apply MCMC (Millard and Gowland, 2002; Barik et al., 2008; Matsuda et al., 2010; Babb et al., 2011; Matauschek et al., 2011; Yang et al., 2012; Gilmore, 2013; Muchlinski et al., 2013; Raaum et al., 2013; Zinner et al., 2013). Among the freely available packages are: BATWING, BayesTraits, BEAST, JAGS, MrBayes, OpenBUGS, Stan, STRUCTURE, and WinBUGS. Many of these packages are focused around phylogenetics or population structure, but others (JAGS, OpenBUGS, Stan, WinBUGS) are quite general. We give a brief example from one of these more general packages (OpenBUGS) using published summary statistics by Uhl et al. (2013). Uhl et al. give the vector of means and the variance–covariance matrix for log scale measurements of body mass, three humeral measurements, and three femoral measurements from 600 modern humans. They then proceed to use profile likelihood and Bayesian methods to estimate body mass for KNM WT-15000 (“Nariokotome boy”). The “profile likelihood” is simply the likelihood function “after removal of nuisance parameters” (Brown, 1993, p 2), which in this case are the mean measurements and variance–covariance terms within the reference sample. We revisit Uhl et al.’s (2013) analysis here starting from their summary statistics and using OpenBUGS in place of the direct calculations to reevaluate body mass estimates.

Figure 10 shows the OpenBUGS code for running this example, which is identical to the code for WinBUGS. Both OpenBUGS and WinBUGS are descendants of BUGS, but we use OpenBUGS (version 3.2.2 rev 1063, July 15, 2012) here because it is open source and under continuous development, whereas WinBUGS is no longer under development and is consequently “frozen” at version 1.4.3 (August 6, 2007). Figure 10 contains a “model” and “data.” The model first states two different priors, one of which can be commented out in order to reproduce either the profile likelihood results or the Bayesian results from Uhl et al. (2013). The remaining code gives the multivariate regression of log measurements (humerus minimum midshaft diameter, humerus epicondylar breadth, femur anterior-posterior midshaft diameter, and femur medial-lateral midshaft diameter) on log body mass for the 600 modern humans. The “data” statement contains the log measurements for KNM WT-15000 and “tau.” “Tau” is the inverse of the residual variance–covariance matrix among bone measurements after “regressing out” log body mass. This matrix as well as the regressions was found directly from the summary statistics from Uhl et al. (2013). With access to the raw data one could also model this matrix as having a Wishart prior with low prior precision (see Appendix 2 in the work by Konigsberg et al., 2006) in order to include the uncertainty in calculating the multivariate regression. With 600 cases this would have little effect.

Figure 11 shows the output from 10,000 iterations in OpenBUGS, under both an uninformative prior and an informative prior using data from 600 modern humans for four bone measurements for KNM WT-15000. Because the prior is conjugate to the likelihood (the multivariate normal), the sampling is direct and the program does not actually use MCMC sampling. As a consequence, the issues of “burn-in” and autocorrelations within the “chain” (which we take up for later analyses in OpenBUGS) can be ignored. In Figure 11 the full posterior densities are shown using kernel density estimation, and the 0.025, 0.5, and 0.975 quantiles from the 10,000 iterations are shown with vertical dashed lines. For comparison, the reported quantiles by Uhl et al. (2013) are shown using filled points.
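Because the model in Figure 10 is linear in log body mass with a known precision matrix, the conjugate posterior for log body mass is available in closed form. The sketch below is an added illustration, not the OpenBUGS program and not Uhl et al.'s calculation. It assumes the standard normal-normal result that the posterior precision is the prior precision plus b′τb, with the posterior mean weighted accordingly. The constants are copied from Figure 10, the function names are hypothetical, and no claim is made here that its output reproduces the quantiles plotted in Figure 11.

```python
import math

# Regression intercepts and slopes (log scale) and data, copied from Fig. 10.
INTERCEPTS = [2.1366, 3.5730, 2.9662, 2.6707]
SLOPES = [0.1816, 0.1311, 0.0958, 0.1538]
BONE = [2.815, 4.007, 3.199, 3.190]
TAU = [
    [153.12, -85.59, -51.90, -26.28],
    [-85.59, 283.78, -48.32, -61.65],
    [-51.90, -48.32, 160.36, -15.95],
    [-26.28, -61.65, -15.95, 153.25],
]


def posterior_ln_mass(prior_mean, prior_prec):
    """Posterior mean and precision of log body mass (assumed closed form)."""
    tau_b = [sum(TAU[i][j] * SLOPES[j] for j in range(4)) for i in range(4)]
    resid = [BONE[i] - INTERCEPTS[i] for i in range(4)]
    data_prec = sum(SLOPES[i] * tau_b[i] for i in range(4))   # b' tau b
    data_term = sum(resid[i] * tau_b[i] for i in range(4))    # b' tau (y - a)
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_term) / post_prec
    return post_mean, post_prec


def mass_quantiles(prior_mean, prior_prec):
    """0.025, 0.5, and 0.975 quantiles of body mass on the original scale."""
    mean, prec = posterior_ln_mass(prior_mean, prior_prec)
    sd = 1.0 / math.sqrt(prec)
    z = 1.959964  # standard normal 0.975 quantile
    return tuple(math.exp(mean + k * sd) for k in (-z, 0.0, z))


uninformative = mass_quantiles(0.0, 1.0e-99)
informative = mass_quantiles(4.0471, 12.61034)

print("uninformative prior (2.5%, 50%, 97.5%):", [round(q, 1) for q in uninformative])
print("informative prior   (2.5%, 50%, 97.5%):", [round(q, 1) for q in informative])
```

Exponentiating the normal quantiles of log mass gives the corresponding quantiles of mass because the exponential is monotone. This is why the posterior median of mass is the exponential of the posterior mean of log mass.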

The quantiles from OpenBUGS are virtually identical with the quantiles published by Uhl et al. (2013), differing by at most 0.8 kg. After examining an allometry diagnostic and z-scores from Darroch and Mosimann (1985) for six shape variables (the four here as well as the humerus maximum midshaft diameter and the vertical head diameter of the femur), Uhl et al. (2013) commented on the fact that the humerus maximum midshaft diameter for KNM WT-15000 appeared to be “too large.” We can examine this by looking at the (posterior) predictive distribution for this measurement in a model where all six variables depend on body mass and we take an uninformative prior for body mass. On 10,000 iterations, the predicted value for the humerus maximum midshaft diameter was greater than or equal to the observed value of 29.9 mm only 1.67% of the time, showing that the observed value of 29.9 mm is quite extreme if KNM WT-15000 followed the same allometry for this variable (relative to the remaining variables) as seen in modern humans.

BAYES IN BIOARCHAEOLOGY

As has hopefully become apparent from the studies cited thus far, biological anthropological applications of Bayesian methods to human genetic, epidemiological, and demographic questions are not restricted to living populations but can address archaeological and paleontological populations as well. More or less Bayesian approaches have become common in paleodemography over the past decade or so, and are also making an appearance in the paleopathology literature. Here, we examine the use of Bayesian inference in estimating the age-at-death structure for a bioarchaeological sample, using a toy example to illustrate how to estimate hazard parameters while accounting for uncertainty in skeletal age estimation at the same time. We also examine the use of Bayesian analysis to form full posterior densities to estimate disease prevalence for a bioarchaeological sample, using another toy example to illustrate exactly how Bayes’ Theorem can fit within paleopathology studies.

Bayesian modeling of mortality and uncertain age estimates

Konigsberg and Frankenberg (1992) noted that much of prior paleodemographic analysis had an ersatz Bayesian flavor that followed from conditioning “age on stage.” The problem with these previous approaches is that they essentially hid the prior distribution for age-at-death. This problem of a hidden prior was implicit in Bocquet-Appel and Masset’s (1982) insightful critique of paleodemography a decade earlier, where they noted that

Fig. 11. Posterior densities for body mass of KNM WT-15000 estimated from the humerus minimum midshaft diameter, humerus epicondylar breadth, femur anterior-posterior midshaft diameter, and femur medial-lateral midshaft diameter using 10,000 iterations in OpenBUGS (see Fig. 10 for code). Posterior densities are shown as kernel density plots using both an uninformative prior and an informative prior that incorporates body mass data on 600 modern humans. The dashed vertical lines are the 0.025, 0.5, and 0.975 quantiles from the 10,000 iterations, while the filled points are the quantile values reported by Uhl et al. (2013).

    Model
    {
      # Uninformative prior (low precision)
      ln.Mass ~ dnorm(0.0, 1.E-99)

      # Prior from data on 600 modern humans
      # ln.Mass ~ dnorm(4.0471, 12.61034)

      # Regressions (everything in log scale)
      pred[1] <- 2.1366 + 0.1816*ln.Mass
      pred[2] <- 3.5730 + 0.1311*ln.Mass
      pred[3] <- 2.9662 + 0.0958*ln.Mass
      pred[4] <- 2.6707 + 0.1538*ln.Mass

      bone[1:4] ~ dmnorm(pred[1:4], tau[,])

      Mass <- exp(ln.Mass)
    }

    Data
    list(bone=c(2.815, 4.007, 3.199, 3.190),
         tau=structure(.Data=c(153.12, -85.59, -51.9, -26.28,
                               -85.59, 283.78, -48.32, -61.65,
                               -51.9, -48.32, 160.36, -15.95,
                               -26.28, -61.65, -15.95, 153.25),
                       .Dim = c(4, 4)))

Fig. 10. OpenBUGS code for a portion of the analysis of body mass of KNM WT-15000 given by Uhl et al. (2013).


paleodemographic age-at-death distributions tended to mimic those of the reference sample on which age determination methods were based. Konigsberg and Herrmann (2002) used MCMC to fit hazard models to skeletal data, but they did not include a prior distribution for the hazard parameters. As a result, their approach was decidedly non-Bayesian and was instead a stochastic expectation-maximization approach (Diebolt and Ip, 1996) that led to maximum likelihood estimates of the hazard parameters. Possibly as an over-reaction to the somewhat Bayesian past of age estimation in paleodemography, the “Rostock volume” (Hoppa and Vaupel, 2002) took a decidedly frequentist approach to the field.

More recently, a number of applications in paleodemography have taken a more explicitly Bayesian approach (Chamberlain, 2000; Gowland and Chamberlain, 2002; Millard and Gowland, 2002; Bocquet-Appel and Bacro, 2008; Caussinus and Courgeau, 2010; Séguy et al., 2013). Of these, only Caussinus and Courgeau (2010) and Séguy et al. (2013) take a fully Bayesian approach by applying MCMC to produce the full posterior distributions for the proportions of individuals in each age class. The MCMC can also provide the posterior densities for age-at-death for each individual as a by-product. This is not trivial, as one would hope that bioarchaeological analyses would take into account the uncertainty in skeletal age estimation, as Konigsberg and Holman (1999) have argued is appropriate.

For the remainder of this section, we consider a “toy” example of obtaining the posterior density for the baseline (a) and senescent (b) components of mortality in a Gompertz model using a six-stage osteological age “indicator.” For the purposes of illustration, we make several simplifying assumptions. To model progression through an age “indicator” system with six ordered states we assume that the progression to the next highest state follows a log-normal distribution with means at 2.8, 3.2, 3.4, 4.0, and 4.6 and a common standard deviation of 0.45. This translates into modal transition ages of 13.4, 20.0, 24.5, 44.6, and 81.2 years on the straight scale. These transitions are based on Todd phase scores for 422 males from the Terry Anatomical collection where the six stages are Todd I–III, IV–V, VI, VII–VIII, IX, and X (Katz and Suchey’s, 1986 “T2” scoring). Next we assume a Gompertz mortality model starting at age 15 years with a = 0.001 and b = 0.14. Assuming that we have 200 skeletons each of whom aged according to the log-normal model and then died off following the specified Gompertz hazard, rounding to integers we should have obtained 5, 17, 18, 88, 61, and 11 skeletons in each of the six stages. Starting with these counts and the transition analysis parameters, the maximum likelihood estimates of the two hazard parameter values are a = 0.0012 and b = 0.1339.
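The modal transition ages and the expected stage counts in this toy setup can be approximated numerically. The sketch below is an added illustration under stated assumptions and is not the authors' code. It takes the mode of a log-normal as exp(μ − σ²), treats the probability of having passed each transition as a cumulative normal in log age, writes the Gompertz hazard as a·exp(b·t) with t measured in years past age 15, and integrates with a simple midpoint rule. All function names are hypothetical.

```python
import math

LOG_MEANS = [2.8, 3.2, 3.4, 4.0, 4.6]  # mean log ages of the five transitions
LOG_SD = 0.45                          # common log-scale standard deviation
A, B = 0.001, 0.14                     # Gompertz baseline and senescent parameters
START_AGE = 15.0                       # age at which the Gompertz model starts
N_SKELETONS = 200


def modal_transition_ages():
    """Mode of each log-normal transition distribution: exp(mu - sigma^2)."""
    return [math.exp(mu - LOG_SD**2) for mu in LOG_MEANS]


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def stage_probabilities(age):
    """Probability of being in each of the six stages at a given age."""
    passed = [std_normal_cdf((math.log(age) - mu) / LOG_SD) for mu in LOG_MEANS]
    bounds = [1.0] + passed + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(6)]


def gompertz_density(t):
    """Density of time to death t (years past START_AGE), hazard A*exp(B*t)."""
    growth = math.exp(B * t)
    return A * growth * math.exp(-(A / B) * (growth - 1.0))


def expected_stage_counts(upper=120.0, steps=24000):
    """Midpoint-rule integral of stage probability times age-at-death density."""
    width = upper / steps
    totals = [0.0] * 6
    for i in range(steps):
        t = (i + 0.5) * width
        weight = gompertz_density(t) * width
        for k, p in enumerate(stage_probabilities(START_AGE + t)):
            totals[k] += weight * p
    return [N_SKELETONS * total for total in totals]


modes = modal_transition_ages()
counts = expected_stage_counts()
print("modal transition ages:", [round(m, 1) for m in modes])
print("expected stage counts:", [round(c, 1) for c in counts])
```

Rounding the expected counts to integers gives the kind of observed phase data that the toy example starts from.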

Figure 12 shows the OpenBUGS code for this “toy” example. Soliman et al. (2012) show that the gamma distribution is an appropriate prior for the senescent component hazard in a Gompertz model, but they only do so after assuming a discrete prior for the baseline mortality. In their example they assume that values from 0.05 up to 0.50 in steps of 0.05 are a priori equally likely. A diffuse gamma distribution can also be used as a prior for the baseline mortality. Indeed, the example from “ReliaBUGS,” a package within OpenBUGS, which uses the function dgpz() for the Gompertz distribution, takes diffuse gamma distributions as priors for both parameters. While this works well when ages are known (as is the case in the example from “ReliaBUGS”) we found that it did not work when ages are estimated with considerable uncertainty. We consequently used the dloglik() function in OpenBUGS, which implements the “zeroes trick” used in WinBUGS. This simply requires writing the log-likelihood from the Gompertz model on each case, which is shown in the code as logLike[i].

The model shown in Figure 12 starts by making random draws from a gamma distribution with “shape” and “rate” parameters both equal to 1.0 to obtain values for the baseline and senescent mortality hazard parameters. Then for each of the 200 individuals, an age-at-death is simulated out of the current Gompertz model. In order for these simulated ages to represent actual ages-at-death, 15 years (the starting age of the Gompertz model) must be added to each. This age is then converted to the log scale as that is the scale for the “transition analysis.” Next, for each of the individuals the probability that they are in each of the six phases is found from the transition analysis based on their current log age (the “phi” in Fig. 12 is the standard normal cumulative distribution function). Finally, the observed phase for each individual is modeled as a draw from that individual’s current multinomial distribution.

Figure 12 also shows the “data” that underlie the model. These are the observed phases, the mean (log) transition ages, and the common (log) standard deviation of transition ages. OpenBUGS uses the slice sampler (see the Appendix) for sampling from the two hazard parameters as well as the 200 ages-at-death. Because the slice sampler is being used within Gibbs sampling there is considerable autocorrelation in the output and we consequently must discard the initial iterations and retain only some of the subsequent iterations. We “post-processed” the chain from OpenBUGS in the R package “coda” (for “convergence diagnosis and output analysis,” see Plummer et al., 2006). In the interest of time and space we do not report this here. Ultimately we ran 10,000 iterations in one chain as a “burn-in” and then retained 10,000 values of both of the hazard parameters (as well as some age distribution information described below). The 10,000 retained values were simulated using a “thinning” interval of 500 so that in actuality 5 million iterates were formed after the burn-in.
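To make the effect of burn-in and thinning concrete, the sketch below applies both to a simulated autocorrelated series. This is only a stand-in: the AR(1) series, its correlation of 0.95, the seed, and the helper names are assumptions made for illustration, and the chain is far shorter than the one described above. It does not reproduce the OpenBUGS output or the diagnostics run in “coda.”

```python
import random


def ar1_chain(n, rho, seed=1):
    """Simulate an AR(1) series as a stand-in for autocorrelated MCMC output."""
    rng = random.Random(seed)
    innov_sd = (1.0 - rho ** 2) ** 0.5  # keeps the marginal variance at 1
    x = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, innov_sd)
        out.append(x)
    return out


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


def burn_and_thin(xs, burn_in, thin):
    """Drop the first burn_in iterates, then keep every thin-th value."""
    return xs[burn_in::thin]


chain = ar1_chain(n=60_000, rho=0.95)
kept = burn_and_thin(chain, burn_in=10_000, thin=50)
raw_acf = lag1_autocorr(chain)
thinned_acf = lag1_autocorr(kept)
print(len(kept), round(raw_acf, 3), round(thinned_acf, 3))

# Bookkeeping for the run described in the text:
# 10,000 retained values at a thinning interval of 500.
post_burn_in_iterates = 10_000 * 500  # 5,000,000
```

The thinned series should show much weaker lag-1 autocorrelation than the raw series, which is the reason for retaining only every 500th iterate in the analysis described above.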

Figure 13 shows kernel density plots of the 10,000 retained values from the posterior distribution for the baseline and senescent components of mortality along with the maximum likelihood estimates drawn as log-normal distributions. Figure 13 also shows the prior distributions used in the MCMC procedure. Of particular interest in Figure 13 is the fact that maximum likelihood estimation leads to a far more peaked distribution for the baseline hazard than in MCMC. This is a result of the baseline hazard being near its lower boundary of 0.0. In contrast, the distributions for the senescent component of mortality are more comparable.

One great advantage of using this method to model the posterior densities of mortality is that it also allows us to form posterior densities of age-at-death for individuals. This can be done by “monitoring” the age iterates for a specific individual within each of the six phases. Figure 14 plots the kernel density estimates of these posterior age distributions along with the posterior age distributions obtained from the product of the maximum likelihood Gompertz model with the “transition analysis” model (see for example Fig. 1 in DiGangi et al., 2009).

These are converted to true densities by dividing within each phase by the integral of the function across age. The posterior densities obtained by the MCMC method are similar to the ones from integration, but the MCMC posterior densities for age-at-death are “heavier” in the tails because the MCMC method accounts for the fact that the hazard parameters are being estimated (the integration method assumes that both the baseline and the senescent hazard parameter values are known). While the assumption of known hazard parameter values may be justified in the forensic setting, it is rarely if ever justifiable in the bioarchaeological setting. Another feature not explored in this example is the ability to include uncertainty in the transition analysis parameters themselves. As the logit and probit models (the basis for transition analysis) are standard models used within MCMC programs, this could be added in real analyses.
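The “integration” route described above can be sketched in a few lines of Python. The code multiplies the Gompertz density (with hazard parameters treated as known) by the stage probabilities from the transition analysis and divides by the integral across age. The grid spacing, the upper age limit, and all function names are our own choices for illustration, and the midpoint rule is a simple stand-in for whatever numerical integration was actually used.

```python
import math

MU = [2.8, 3.2, 3.4, 4.0, 4.6]  # log-scale transition means (from the text)
SD = 0.45                        # common log-scale SD (from the text)
START_AGE = 15.0                 # the Gompertz model starts at age 15


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def stage_probs(age):
    """Probability of each of the six stages at a given age in years."""
    cdf = [phi((math.log(age) - m) / SD) for m in MU]
    return [1.0 - cdf[0]] + [cdf[j - 1] - cdf[j] for j in range(1, 5)] + [cdf[4]]


def gompertz_density(t, a, b):
    """Gompertz density at t years past START_AGE (same form as logLike[i])."""
    return math.exp(math.log(a) + b * t + a / b * (1.0 - math.exp(b * t)))


def posterior_age_given_stage(a, b, max_t=105.0, step=0.05):
    """Return the age grid, normalized densities per stage, and the integrals."""
    n = round(max_t / step)
    ages = [START_AGE + (i + 0.5) * step for i in range(n)]  # midpoint rule
    joint = [[] for _ in range(6)]
    for age in ages:
        f = gompertz_density(age - START_AGE, a, b)
        for k, p in enumerate(stage_probs(age)):
            joint[k].append(f * p)
    integrals = [sum(col) * step for col in joint]
    dens = [[v / integrals[k] for v in joint[k]] for k in range(6)]
    return ages, dens, integrals


ages, dens, integrals = posterior_age_given_stage(a=0.001, b=0.14)
expected_counts = [200 * z for z in integrals]
print([round(c, 1) for c in expected_counts])
```

Each normalizing integral is the marginal probability of observing that stage, so multiplying by 200 gives expected stage counts that can be compared with the 5, 17, 18, 88, 61, and 11 skeletons used as data in the toy example. Passing a = 0.0012 and b = 0.1339 instead gives the densities based on the maximum likelihood estimates.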

```
model{
   for(j in 1:2) {par[j] ~ dgamma(1,1)}
   for(i in 1:200) {
      # Gompertz prior
      age[i] ~ dunif(0,80)
      dummy[i] <- 0
      dummy[i] ~ dloglik(logLike[i])
      logLike[i] <- log(par[1]) + par[2]*age[i] + par[1]/par[2]*(1-exp(par[2]*age[i]))
      age.s[i] <- age[i] + 15
      ln.age[i] <- log(age.s[i])
      # Likelihood from "transition analysis"
      p[i,1] <- 1 - phi((ln.age[i]-mu[1])/SD)
      for(j in 2:5){
         p[i,j] <- phi((ln.age[i]-mu[j-1])/SD) - phi((ln.age[i]-mu[j])/SD)
      }
      p[i,6] <- phi((ln.age[i]-mu[5])/SD)
      # Phase has multinomial distribution
      phase[i] ~ dcat(p[i,1:6])
   }
}

Data
list(phase = c(1,1,1,1,1,
   2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
   3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,
   4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
   4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
   4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
   4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
   5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
   5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
   5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
   6,6,6,6,6,6,6,6,6,6,6),
   mu = c(2.8,3.2,3.4,4.0,4.6), SD = 0.45)
```

Fig. 12. OpenBUGS code for the paleodemographic example.

Bayesian analysis of paleopathology

Just as Bayesian approaches in paleodemography can incorporate the uncertainty in age estimates, similar approaches in paleopathology can incorporate the uncertainty due to small samples and/or to incomplete bony expression of particular disease states. Byers and Roberts (2003) show how to use “Bayes’ Theorem in Paleopathological Diagnosis” but they do not sketch how to conduct a Bayesian analysis. Specifically, they show how to calculate a posterior probability but do not show how to form the full posterior density for a parameter, such as prevalence. We illustrate how to carry out a Bayesian analysis that includes forming the full posterior density for prevalence using another “toy” example. We presume that there is some disease that produces a specific bone lesion with a frequency of 80% in affected individuals and that individuals who do not have the disease will never form these lesions. Out of a bioarchaeological sample of 43 individuals we observe four individuals who have the specific bone lesion.

Byers and Roberts show how to combine an informative prior (disease prevalence from previous studies) with likelihoods (in our case, the information that 80% of the diseased individuals will form the specific bone lesion) to form the posterior probability that an individual had a particular disease. But we can proceed more directly to estimating the prevalence in our small sample of 43 individuals. Given that the lesion can only be formed in those with the disease (with a probability of 0.8), and given that we observed 4 out of 43 individuals with the lesion, we can solve for the prevalence at about 0.1163, or that 5 out of the 43 individuals had the disease. But this algebraic estimate ignores the fact that we have uncertainty in the prevalence due both to the small sample size and the fact that we surmised that 80% of the sick would develop the lesion. All that we did was divide one proportion (4/43) that serves as the probability of observing the lesions by another proportion (0.8) that serves as the probability that a sick individual would express the lesion (the “sensitivity” of the lesion to the presence of the disease).

The picture changes if we now note that our statement that 80% of the sick will form the bone lesion was based on a small study of 50 skeletons from individuals known to have had the disease, and if we also account for the fact that our estimate of the lesion frequency in the bioarchaeological sample is based on only 43 skeletons. If we want to form the full posterior density for the disease prevalence in our bioarchaeological sample then we need to recognize that this is the ratio of a beta density for the lesion frequency in the bioarchaeological sample against a beta density for the lesion frequency in (sick) individuals in the reference sample. Figure 15 shows the posterior densities for disease prevalence assuming that a large reference sample is available for calculating the likelihood (i.e., the frequency of lesions in known sick individuals), that a sample of only 50 individuals is available, and that a sample of only five individuals is available. In all three cases, we assume Jeffreys’ prior for the probability that individuals in the bioarchaeological sample will have bone lesions (so the posterior density is Be(4.5, 39.5)) as well as for the probability that sick individuals in the reference sample will have skeletal lesions.

This latter assumption leads to a posterior density of Be(40.5, 10.5) when there are 50 individuals and Be(5.5, 1.5) when there are five individuals. When the reference sample size is large (even 100 skeletons is quite close to the asymptotic result) then the posterior density for prevalence in the bioarchaeological sample comes directly from the Be(4.5, 39.5) posterior density. Specifically, if θ is a variable representing the disease prevalence in the bioarchaeological sample while φ = 0.8θ is the transformation that leads to observed lesions, then the posterior density for prevalence is 0.8 × Be(φ; 4.5, 39.5). When the reference sample size is smaller, the posterior density for the prevalence in the bioarchaeological sample is a ratio of beta distributions (Pham-Gia, 2000) with Be(4.5, 39.5) in the numerator. Figure 15 shows that, at least for this example, the posterior density for prevalence in the bioarchaeological sample is not much affected by the reference sample size. This is encouraging, but the example is quite artificial, and it certainly does not account for the large biases that can occur if the disease is expressed differently in the bioarchaeological sample relative to the reference sample.
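One way to get a feel for these posterior densities without working through the analytical ratio-of-betas result is plain Monte Carlo simulation. The sketch below draws the lesion frequency from Be(4.5, 39.5) and divides it either by a fixed sensitivity of 0.8 (the asymptotic case) or by a draw from Be(40.5, 10.5) (the reference sample of 50). The seed, the number of draws, and the function names are arbitrary choices of ours, and simulation is a substitute for, not a reproduction of, the densities plotted in Figure 15.

```python
import random
import statistics

rng = random.Random(2013)  # arbitrary seed, used only for reproducibility


def prevalence_draws(n_draws, ref_alpha=None, ref_beta=None, sensitivity=0.8):
    """Draw prevalence = lesion frequency / sensitivity.

    If ref_alpha and ref_beta are given, the sensitivity is itself uncertain
    and is drawn from Be(ref_alpha, ref_beta); otherwise it is held fixed.
    """
    draws = []
    for _ in range(n_draws):
        lesion_freq = rng.betavariate(4.5, 39.5)  # 4 of 43 with Jeffreys' prior
        if ref_alpha is not None:
            sens = rng.betavariate(ref_alpha, ref_beta)
        else:
            sens = sensitivity
        draws.append(lesion_freq / sens)
    return draws


def summarize(draws):
    """Return the 2.5% point, the median, and the 97.5% point."""
    cuts = statistics.quantiles(draws, n=40)  # cut points every 2.5%
    return cuts[0], statistics.median(draws), cuts[-1]


point_estimate = (4 / 43) / 0.8                         # about 0.1163
asymptotic = summarize(prevalence_draws(20_000))
n50 = summarize(prevalence_draws(20_000, 40.5, 10.5))   # reference N = 50
print(round(point_estimate, 4), asymptotic, n50)
```

Note that a ratio of two beta variates is not strictly bounded above by one, so a real analysis would need to decide how to treat any draws exceeding one; with the values used here such draws should be very rare.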

Fig. 13. Comparison of posterior density for the baseline (panel A) and senescent (panel B) components of mortality (shown as kernel density estimates from 10,000 iterates in OpenBUGS and labeled “MCMC”) to the maximum likelihood estimates (MLE). The MLE density is drawn as a log-normal using the maximum likelihood estimate and its asymptotic standard error. The prior density for the MCMC is also shown.

The above example assumed a specificity of one such that individuals who do not have the disease will never form the bone lesion, but specificity often can be less than one (meaning that at least some disease-free individuals show the bone lesion). Boldsen (2001) considers such a case, presenting a novel application that uses constrained non-linear regression to estimate the prevalence of leprosy in three bioarchaeological samples and to assess the sensitivities and specificities of seven osteological “markers” that may be related to the presence of leprosy. His model is a special case of what is known as the Hui–Walter paradigm or model (Hui and Walter, 1980; Johnson et al., 2001; Toft et al., 2005; Berkvens et al., 2006). In one form of this model, two or more “diagnostic” tests with unknown sensitivities and specificities are applied across two or more samples with differing prevalence for the disease to which the tests relate. The data within this model form a three dimensional array with one dimension having a length of two (for presence versus absence of the osteological “marker”), one dimension having a length equal to the number of osteological “markers,” and one dimension having a length equal to the number of samples.

Fig. 14. Comparison of posterior density of age conditional on osteological stage from MCMC and from integration using the method of maximum likelihood.

In the Hui–Walter model there are 2T + P (Pouillot et al., 2002) parameters to estimate where T is the number of traits and P is the number of populations. The “two times” is for the sensitivity and specificity of each “test” (marker) while the addition of P is for the number of prevalences to be estimated. In Boldsen’s setting with three archaeological samples and seven osteological “markers,” there are 17 parameters to be estimated. The original Hui–Walter model assumes that the “markers” are independent conditional on the unknown disease statuses, and thus there are P(2^T − 1) degrees of freedom available from the data (Pouillot et al., 2002). Boldsen (2001) makes the stronger assumption that the “markers” are independent without the requirement of conditioning on unknown disease status. The degrees of freedom from the data in Boldsen’s analysis are PT = 21, which leaves four degrees of freedom after estimating the 17 parameters. Positive degrees of freedom are a necessary condition for the model to be “likelihood identifiable,” or in other words this is a necessary condition for there to be a unique local maximum to the likelihood function. But having positive degrees of freedom alone is not a sufficient condition. Using a method detailed by Jones et al. (2010), it is possible to show that Boldsen’s (2001) model is not likelihood identified, but if cross-classifications of pathologies are included in the data then there are identifiable models. Such models can be fit using the program “TAGS” (http://www.epi.ucdavis.edu/diagnostictests/QUERY.HTM) which uses the method of maximum likelihood. Joseph et al. (1995) described a Gibbs Sampler that can also be used to fit the Hui–Walter model. This model, as well as more complicated models that include dependence between pathologies, has been fit using MCMC software (Branscum et al., 2004; Engel et al., 2006; de Clare Bronsvoort et al., 2010). Such models could be fit in paleopathological analyses provided the cross-classifications of pathologies are available.
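The parameter and degrees-of-freedom counts quoted above are simple enough to check directly. The helper below is purely illustrative bookkeeping (the function name is ours); as the text stresses, a positive difference is necessary but not sufficient for identifiability.

```python
def hui_walter_counts(n_tests, n_pops):
    """Return (parameters, df with cross-classified data, df with marginal data)."""
    n_params = 2 * n_tests + n_pops               # sensitivities, specificities, prevalences
    df_cross_classified = n_pops * (2 ** n_tests - 1)
    df_marginal = n_pops * n_tests                 # one marker frequency per marker per sample
    return n_params, df_cross_classified, df_marginal


# Boldsen's setting: seven markers in three samples.
params, df_full, df_marginal = hui_walter_counts(n_tests=7, n_pops=3)
print(params, df_full, df_marginal, df_marginal - params)  # 17 381 21 4

# The classic two-test, two-population design has as many parameters as
# degrees of freedom.
print(hui_walter_counts(n_tests=2, n_pops=2))  # (6, 6, 4)
```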

WHY FORENSIC ANTHROPOLOGISTS SHOULD BE BAYESIANS, BUT SELDOM ARE

In our experience, forensic anthropologists are among the strongest adherents of classical hypothesis testing approaches. This seems puzzling on the surface, as the questions that arise in forensic anthropology rarely relate to questions of whether or not to reject a null hypothesis (see for example Taroni et al., 2010). Instead, such questions pertain to the relative strengths of competing hypotheses or to the characterization of posterior densities of one or more parameters of interest. The fascination with P-values in forensic anthropology is no doubt a result of past training as well as stasis in the field. The use of “subjective probability” (see in particular Chapter 2 by Courgeau, 2012) may make non-frequentist approaches appear un-scientific and generally unappealing, and the use of prior information (even when it is completely justifiable, as in Brenner and Weir, 2003) may appear problematic to forensic scientists. The Scientific Working Group for Forensic Anthropology’s (SWGANTH) “product” for statistical methods is relatively mute on the subject of Bayesian statistics, although it does note under “unacceptable practices” the “ad hoc formulation of priors when using Bayesian statistics.” Finally, MCMC seems to be off the table as far as forensic anthropologists are concerned. Computer simulation and the law seem to be an uncomfortable mix, despite the fact that MCMC can make complicated Bayesian networks easily interpretable using directed graphs.

To the above-mentioned problems we can add the current court position in the UK regarding the use of likelihood ratios and Bayesian inference for non-DNA expert testimony. The appeal in the case of “R v T” (names were redacted in the court proceedings) brought into question expert testimony on a match between a suspect’s shoe and footprints at a crime scene. The entire December 2012 issue of Law, Probability, and Risk was devoted to a discussion of this case. While opinions were divided among authors, Thompson (2012, p 347) rather emphatically stated that he considered the court’s position a “judicial blunder”:

I will say at the outset that I think R v T is an inept judicial opinion that creates bad law. The opinion went awry because the justices who wrote it misunderstood a key aspect of the evidence they were evaluating. The justices sought to achieve laudable goals, but their misunderstanding of basic principles of inductive logic, and particularly Bayes’ theorem, led them to exclude a type of expert evidence that, in general, is helpful and appropriate in favour of an alternative type . . . that is fundamentally inconsistent with the goals the court sought to achieve. The case has already received severe criticism and will inevitably come to be seen for what it is—a judicial blunder.

In light of these various problems and objections, we continue to be surprised by the extent to which at least some forensic anthropologists are “closet Bayesians.” Konigsberg et al. (2008) referred to the use of the percentile method for age estimation in forensic anthropology, where the percentiles of ages within stages in a reference sample are used to estimate ages in a target sample, as a “hidden Bayesian” approach. Konigsberg et al. (2009, p 84) in an article on “estimation and evidence in forensic anthropology: sex and race” stated:

We consider forensic anthropologists as being implicit Bayesians because they often do bring prior information to their cases, though this information is typically implicit, unstated, and not quantified.

One of our goals within forensic anthropology has been to make the use of Bayesian inference explicit (see for example Ross and Konigsberg, 2002). In this section, we give examples of how to make Bayesian inference explicit in the analysis of commingled remains and in victim identification in mass disasters involving closed populations. We also spend some time clarifying the role of prior information in forensic anthropology and the ways in which conditional probabilities may be incorrectly transposed and misinterpreted in trial settings.

Analysis of commingled remains

Fig. 15. Example of estimating the posterior density for disease prevalence in a bioarchaeological sample of 43 skeletons of which 4 show a particular diagnostic lesion that has a “penetrance” of 0.8 for those with the disease and which is never expressed in those without the disease. The asymptotic estimate assumes that the reference sample from which the penetrance is estimated is large, while the N = 50 and N = 5 lines assume reference samples at the stated sample sizes.

The analysis of commingled skeletal remains involves two related problems, one being the ability to pair left and right antimeres from the same individual and the other being the ability to re-associate different elements from the same individual. We begin with the problem of attempting to re-associate antimeres (Lyman, 2006; Byrd, 2008; Nikita and Lahr, 2011; O’Brien and Storlie, 2011) and then turn to the problem of re-associating different skeletal elements. We show in particular for the analysis of bilateral elements the danger of only using a Bayesian approach.

Re-associating bilateral elements. Previous studies that have dealt with the problem of re-associating paired elements have generally used one or more measurements on antimeres from a proposed pair and compared the asymmetry from those measurements to the distribution of asymmetry measurements obtained from a reference sample of paired bones. O’Brien and Storlie (2011) have recently taken an approach to this problem that is implicitly Bayesian, but they do not indicate that their “Refit Probability” is actually a posterior probability based on assuming an equal prior of pairing a given left bone (or alternatively a given right bone) to each of the right bones (or alternatively each of the left bones). Further, they refer to this posterior probability as being a likelihood, which further muddies the waters.

O’Brien and Storlie (2011) form vectors of differences between left and right homologous measurements on a given element within a reference sample of known pairs. They then use a multivariate normal density from this reference data to find the point densities of getting vectors of differences between all pairs of bones within their target sample and place these in a matrix of left bones against right bones. O’Brien and Storlie then convert this matrix into both a row-stochastic matrix (by dividing row elements by their associated row totals) and into a column-stochastic matrix (by dividing column elements by their associated column totals).

What O’Brien and Storlie have literally done in their two stochastic matrices is to find the posterior probability, conditioning on the left side bones, that the various right side bones came from the same individual, and conversely, conditioning on the right side bones, that the various left side bones came from the same individual. O’Brien and Storlie’s Bayesian approach is problematic both because the prior probabilities are not well justified and because their approach does not allow for the possibility that some recovered bones may not have their antimeres present in the sample. In the following section, we will assume equal priors not by conditioning on particular bones as O’Brien and Storlie have implicitly done, but rather by enumerating all the possible “configurations” (ways to group bones into individuals).

In O’Brien and Storlie’s approach some of the posterior probabilities for matches may be quite high, even though the likelihood (based on asymmetry measured for a possible pair) is quite low. This is a common problem from discriminant function analysis that is addressed by calculating a “typicality probability” (Konigsberg et al., 2009; Ousley et al., 2009; Ousley and Jantz, 2012). Here, the typicality probability is the probability of observing an amount of left/right asymmetry between two potential pairs that is equal to or greater than some value given the distribution of asymmetry between known pairs in the reference sample. Calculating typicality probabilities is a frequentist approach that is preferred here over O’Brien and Storlie’s ill-advised Bayesian approach. In the following section, we skirt the problem of having to pair antimeres and calculate typicality probabilities by assuming that there is absolutely no left/right asymmetry (or that only one side is under consideration), that there are no missing bones, and that the number of individuals is known.
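The row and column normalizations described above, and the way a normalized value can be high even when every underlying density is low, can be illustrated with a small made-up matrix. The density values below are entirely hypothetical and are not taken from O’Brien and Storlie (2011); the function names are ours.

```python
def row_stochastic(m):
    """Divide each entry by its row total (conditioning on the left bone)."""
    return [[v / sum(row) for v in row] for row in m]


def column_stochastic(m):
    """Divide each entry by its column total (conditioning on the right bone)."""
    col_totals = [sum(col) for col in zip(*m)]
    return [[v / col_totals[j] for j, v in enumerate(row)] for row in m]


# HYPOTHETICAL point densities: rows are left bones, columns are right bones.
# The third left bone fits none of the right bones well.
densities = [
    [9.0, 0.5, 0.1],
    [0.4, 6.0, 0.8],
    [1e-6, 2e-6, 4e-5],
]

by_left = row_stochastic(densities)
by_right = column_stochastic(densities)

# The third left bone receives a "posterior" of about 0.93 for the third
# right bone even though its density (4e-5) is tiny in absolute terms.
print(round(by_left[2][2], 3))
```

This is the situation that a typicality probability is meant to flag: the normalization alone cannot distinguish a good match from the least bad of several poor ones.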

Re-associating different elements. Adams and Byrd (2006) give an example of the analysis of “small-scale commingling” using data on the recovered remains of two US soldiers from a military helicopter that crashed in Vietnam during 1969. In an analysis of the humeri and femora they wish to determine whether a left humerus should be associated with the femora already associated with “Individual 1” or whether it should be associated with “Individual 2.” Adams and Byrd use inclusion versus exclusion within a 90% prediction interval for the regression of a composite humerus variable on a femoral measurement, where the regression is formed using data on 139 individuals. This approach does not allow us to make probabilistic statements, and it is not clear what percentage should be used in making statements of inclusion or of exclusion. Ideally one would want exclusion based on a very wide or high percentage interval and inclusion based on a very narrow or low percentage interval.

We can give the analysis a more probabilistic bent, though it requires access to summary statistics from the reference data not given in the original publication. The correlation coefficient and regression parameters provided in the original publication are insufficient to characterize the bivariate normal assumed in fitting the regression. We can, however, extract the raw data using Data Thief (Tummers, 2006) and Adams and Byrd’s plot in their Figure 1, and recalculate the regression in order to characterize the bivariate normal. While our regression is similar to the published version (our correlation was 0.906 versus Adams and Byrd’s 0.903, and our regression parameters were 0.853 and 1.213 versus Adams and Byrd’s 0.844 and 1.251) it is not identical. This is primarily because of overlapping points that were difficult to visualize. Still, the basic reference data are similar enough to prove instructive.

From the captured reference data we have means at 3.864 and 4.511, variances of 8.750 × 10⁻³ and 7.771 × 10⁻³, and a covariance of 7.467 × 10⁻³ for the femur and humerus variables in the reference data. Adams and Byrd give an observation for a humerus from the helicopter of 4.541, which is the log of the sum of two humeral head measurements, rather than using the average of the two log-scale measurements, which would be a Darroch and Mosimann size variable (Darroch and Mosimann, 1985; Jungers et al., 1995). They also give the log maximum diameter for the femoral head from Individual 1 and from Individual 2 of 3.826 and 3.867. Using the femoral head variable from Individual 1 to calculate the 90% prediction interval for the humerus variable (which we calculated for the recovered statistics as from 4.417 to 4.541), they (Adams and Byrd, 2006, p 67) then show that the actual humerus value of 4.541 is “located on the upper boundary of a 90% prediction interval.” Conversely, the observed humerus variable is within the 90% prediction interval (which we calculated as 4.451–4.576) based on Individual 2’s femur variable. The authors (Adams and Byrd, 2006, p 67) conclude that “Based on this objective technique, the humerus was included with the remains designated as Individual 2 via exclusionary sorting.” In other words, because they find that the humerus measurement is on the 90% prediction interval border when using Individual 1’s femur as the predictor, they feel that they can exclude Individual 1 as the source of the humerus. The humerus is then included with Individual 2 because there is no other individual remaining and because its value does fall within the 90% prediction interval based on Individual 2’s femur.

The above technique is certainly replicable, but its objectivity is debatable. One can fairly arbitrarily form many “inclusions” by making the prediction interval quite wide. A 95% interval would have included both Individual 1 and Individual 2 as possible sources, which is presumably the reason that Adams and Byrd instead used a 90% prediction interval. One can also arbitrarily form many “exclusions” by making the prediction interval narrower. A 53.3% prediction interval would have placed the humerus on the prediction interval border for Individual 2 and well above the border for Individual 1. The chief problem with the current approach is that it violates what is known as the “likelihood principle.”

Lee (2012, p 221) gives a very succinct and understandable definition of this principle:

The nub of the argument here is that in drawing any conclusion from an experiment only the actual observation x made (and not the other possible outcomes that might have occurred) is relevant. This is in contrast to methods, by which, for example, a null hypothesis is rejected because the probability of a value as large or larger than that actually observed is small. . .

In the current context, what is wanted is the probability of obtaining the observed data (a humerus value of 4.541) if the femur value was 3.867 (Individual 2’s value) versus the probability of getting the humerus value if the femur value was 3.826 (Individual 1’s value). It does us no good to concern ourselves with the concept of humerus measurements that are “more extreme” than the one actually observed.

In our analysis of the data from Adams and Byrd (2006), we consequently want to find the ratio of the probability of getting a humerus value of exactly 4.541 if the femur value was 3.867 to the probability of obtaining that humerus value if the femur value was 3.826. These probabilities can be found using a t distribution with N − 2 degrees of freedom. This ratio of probabilities (with Individual 2 as the predictor in the numerator and Individual 1 in the denominator) is 2.965. This ratio can be referred to as a likelihood ratio, as it is the ratio of the likelihood of obtaining the data (the humerus measurement) if Individual 2 was the source of the humerus divided by the likelihood of the data if Individual 1 was the source. As previously, the likelihood is proportional to the probability of obtaining the observed data given a certain hypothesis or set of parameter values. The likelihood ratio in this setting is the same thing as the Bayes factor we saw earlier. The likelihood ratio literally means that the datum (the humerus value) was 2.965 times more likely to have arisen from an individual with a femur measurement of 3.867 than from an individual with a femur measurement of 3.826.

Another way of stating Bayes’ Theorem, and one that is used quite frequently in forensic sciences, is that the posterior odds are equal to the likelihood ratio times the prior odds. The prior odds are straightforward in this example. Without recourse to any data or observations, the humerus in question has a probability of one-half of being from Individual 2 and one-half of being from Individual 1, so the prior odds are 1:1. As a consequence, the posterior odds are equal to the likelihood ratio, so the posterior odds are 2.965 “in favor” of the humerus being from Individual 2. The posterior odds can be converted to a posterior probability by dividing the odds by the quantity one plus the odds. This gives a posterior probability of about 0.75 that Individual 2 was the source of the humerus.
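The chain from summary statistics to likelihood ratio to posterior probability can be sketched as follows. This is a rough reconstruction under stated assumptions, not the original calculation: it uses only the rounded summary statistics quoted above, assumes the variances and covariance were computed with N − 1 denominators, and uses the usual regression predictive t density with N − 2 degrees of freedom. Because of the rounding, the ratio should land close to, but not exactly on, the 2.965 reported in the text.

```python
import math

N = 139                                   # reference sample size
MEAN_FEMUR, MEAN_HUMERUS = 3.864, 4.511
VAR_FEMUR, VAR_HUMERUS, COV = 8.750e-3, 7.771e-3, 7.467e-3


def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def predictive_density(humerus, femur):
    """Predictive density of a humerus value given a femur value."""
    slope = COV / VAR_FEMUR
    intercept = MEAN_HUMERUS - slope * MEAN_FEMUR
    df = N - 2
    # ASSUMPTION: summary variances use N - 1 denominators.
    resid_var = (VAR_HUMERUS - slope * COV) * (N - 1) / df
    sxx = VAR_FEMUR * (N - 1)
    se = math.sqrt(resid_var * (1 + 1 / N + (femur - MEAN_FEMUR) ** 2 / sxx))
    t = (humerus - (intercept + slope * femur)) / se
    return t_density(t, df) / se


HUMERUS = 4.541
likelihood_ratio = (predictive_density(HUMERUS, 3.867)      # Individual 2
                    / predictive_density(HUMERUS, 3.826))   # Individual 1
prior_odds = 1.0                                             # 1:1
posterior_odds = likelihood_ratio * prior_odds
posterior_prob = posterior_odds / (1.0 + posterior_odds)
print(round(likelihood_ratio, 2), round(posterior_prob, 2))
```

Changing `prior_odds` shows directly how the same likelihood ratio combines with unequal priors, which is the situation taken up in the three-individual example that follows.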

It is useful to extend this example to a situation where the prior odds are not 1:1, which we can do by imagining that the helicopter contained three individuals. We also introduce an additional complication that stature is known for the three individuals. This segues the example into the problem of forming identifications. For this example, we use summary statistics on stature, femur length, and humerus length from Konigsberg et al. (1998) to simulate data from three individuals and then use this simulated data together with the summary statistics to perform the analysis. We constructed two different simulations, an “easy” one in which the three resulting individuals had very different statures, and a “difficult” one in which Individuals 2 and 3 differ in stature by only about 3 cm.

Table 1 shows the results of two simulation runs, where the first is the easy case of very different statures, and the second is the difficult case where two individuals are of similar stature. The table also gives the predicted long bone lengths given the statures. In both cases we need to permute bones against individuals in order to generate posterior probabilities for various bone configurations (bone pairs against individuals). There are six ways to permute the humeri against the three individuals and likewise there are six ways to permute the femora, which leads to 36 (6 × 6) possible ways to configure the bones against individuals. Within each configuration we find the bivariate normal density value within individuals by finding the probability of getting the “assigned” bone lengths given the predicted (from stature) lengths and the residual variance–covariance matrix after “regressing out” stature. If we assume that the individuals are unrelated, then we can multiply these three probability values to get the likelihood of a particular configuration. We assume that all 36 configurations are a priori equally probable, so each receives a prior probability of 1/36.
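The enumeration of the 36 configurations can be sketched as below, using the observed and predicted lengths from the first simulation in Table 1. The residual standard deviations and correlation are HYPOTHETICAL placeholders, since the residual variance–covariance matrix from Konigsberg et al. (1998) is not given in the text, so the resulting posterior probabilities will not match those reported for Figure 16. The bone ordering and all names are likewise our own.

```python
import math
from itertools import permutations

# First simulation in Table 1: observed bones and stature-based predictions.
humeri = [313.0, 341.0, 381.0]
femora = [444.0, 478.0, 526.0]
predicted = [(311.0, 455.0), (339.0, 467.0), (369.0, 517.0)]  # (humerus, femur)

# HYPOTHETICAL residual SDs and correlation after regressing out stature.
SD_H, SD_F, RHO = 12.0, 14.0, 0.6


def bivariate_normal_density(dh, df):
    """Bivariate normal density of the humerus and femur residuals."""
    zh, zf = dh / SD_H, df / SD_F
    q = (zh * zh - 2.0 * RHO * zh * zf + zf * zf) / (1.0 - RHO ** 2)
    norm = 2.0 * math.pi * SD_H * SD_F * math.sqrt(1.0 - RHO ** 2)
    return math.exp(-0.5 * q) / norm


def configuration_posteriors():
    """Posterior probability of each of the 6 x 6 bone configurations."""
    configs = []
    for hum_perm in permutations(range(3)):
        for fem_perm in permutations(range(3)):
            like = 1.0
            for ind in range(3):  # individuals treated as independent
                dh = humeri[hum_perm[ind]] - predicted[ind][0]
                df = femora[fem_perm[ind]] - predicted[ind][1]
                like *= bivariate_normal_density(dh, df)
            configs.append((hum_perm, fem_perm, like))
    prior = 1.0 / len(configs)  # equal prior of 1/36
    total = sum(prior * like for _, _, like in configs)
    return [(h, f, prior * like / total) for h, f, like in configs]


posteriors = configuration_posteriors()
for hum_perm, fem_perm, post in sorted(posteriors, key=lambda c: c[2])[-3:]:
    print(hum_perm, fem_perm, round(post, 4))
```

With equal priors the 1/36 cancels in the normalization, but keeping it explicit makes it easy to substitute unequal priors or to set the likelihood of particular configurations to zero when additional evidence rules them out.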

TABLE 1. Two Simulations of Three Individuals Each (all values in mm)

Simulation   Individual   Stature   Humerus length   Femur length   Predicted humerus   Predicted femur
1            1            1660      313              444            311                 455
1            2            1742      341              478            339                 467
1            3            1869      381              526            369                 517

2            1            1757      335              464            338                 475
2            2            1818      365              499            350                 493
2            3            1846      357              488            355                 501

“Humerus length” and “Femur length” were simulated using the conditional distributions on stature (from the regressions on stature), while the predicted values are the point predictions from the regression of the bones on stature.
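The permutation procedure described above can be sketched in a few lines of Python using the Simulation 1 values from Table 1. This is only an illustration under stated assumptions: the residual variance–covariance matrix from Konigsberg et al. (1998) is not reproduced in this article, so the values `VAR_H`, `VAR_F`, and `COV_HF` below are hypothetical placeholders, and the resulting posterior probabilities will not match those reported for Figure 16.

```python
from itertools import permutations
from math import exp, pi, sqrt

# Simulation 1 values from Table 1 (mm).
humeri = {"A": 313, "B": 341, "C": 381}
femora = {"D": 444, "E": 478, "F": 526}
predicted = [(311, 455), (339, 467), (369, 517)]  # (humerus, femur) for Individuals 1-3

# HYPOTHETICAL residual variance-covariance matrix (mm^2); not taken from the article.
VAR_H, VAR_F, COV_HF = 100.0, 169.0, 78.0


def bivariate_normal_pdf(dh, df):
    """Density of the residual pair (dh, df) under a zero-mean bivariate normal."""
    det = VAR_H * VAR_F - COV_HF ** 2
    quad = (VAR_F * dh * dh - 2.0 * COV_HF * dh * df + VAR_H * df * df) / det
    return exp(-0.5 * quad) / (2.0 * pi * sqrt(det))


def configuration_posteriors():
    """Posterior probability of each of the 36 bone configurations (flat prior)."""
    likelihoods = {}
    for h_perm in permutations(humeri):        # 6 humerus orderings
        for f_perm in permutations(femora):    # 6 femur orderings
            like = 1.0
            for (pred_h, pred_f), h, f in zip(predicted, h_perm, f_perm):
                like *= bivariate_normal_pdf(humeri[h] - pred_h, femora[f] - pred_f)
            likelihoods[(h_perm, f_perm)] = like
    total = sum(likelihoods.values())          # the equal 1/36 priors cancel
    return {cfg: like / total for cfg, like in likelihoods.items()}


posteriors = configuration_posteriors()
best = max(posteriors, key=posteriors.get)
print(len(posteriors), best, round(posteriors[best], 4))
```

Because every configuration carries the same prior of 1/36, normalizing the likelihoods is sufficient; an unequal prior would simply multiply each likelihood before normalization.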


Figure 16 shows a plot of the posterior probabilities for each of the 36 configurations in ascending order. The top three posterior probabilities in ascending order are 0.0053, 0.0431, and 0.9517. We ignore all but the two configurations with the highest posterior probabilities. These are labeled in Figure 16 where the column to the left lists the humeri as A, B, and C, and the column to the right lists the femora as D, E, and F. Each row of these small text matrices represents an individual, with Individual 1 as the top row, Individual 2 as the middle, and Individual 3 as the bottom row. The configurations with the two highest posterior probabilities both have the humeri assigned in the same way but differ in whether femora D and E are assigned to Individuals 1 and 2, respectively (with a posterior probability of 0.9517), or to Individuals 2 and 1, respectively (with a posterior probability of 0.0431). On the basis of these results, one might decide to have humerus A and femur E typed for DNA, and find that these bones have different genotypes. This information now enters into the likelihood calculation so that any configuration that pairs A with E would have an overall likelihood of 0.0. This then increases the highest posterior probability from 0.9517 to 0.9945.
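The effect of a DNA exclusion on the posterior probabilities can be illustrated with a short renormalization sketch. It uses only the three largest posterior probabilities reported above and assumes, for illustration, that the remaining 33 configurations carry negligible probability and that the third-ranked configuration does not pair humerus A with femur E; the dictionary keys are hypothetical labels.

```python
def renormalize_after_exclusion(posteriors, excluded):
    """Drop excluded configurations and rescale the rest to sum to one."""
    kept = {k: v for k, v in posteriors.items() if k not in excluded}
    total = sum(kept.values())
    return {k: v / total for k, v in kept.items()}


# Three largest posteriors reported for Simulation 1. Treating all other
# configurations as negligible is an assumption made for this sketch.
top_three = {"D->1, E->2": 0.9517, "D->2, E->1": 0.0431, "third": 0.0053}

# Humerus A and femur E have different genotypes, so the configuration that
# places both with Individual 1 receives a likelihood of zero.
updated = renormalize_after_exclusion(top_three, excluded={"D->2, E->1"})
print(round(updated["D->1, E->2"], 4))
```

Under these assumptions the leading configuration rises to about 0.9945, in line with the value given in the text.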

Figure 17 shows a comparable plot of the posterior probabilities but this time from the “difficult” simulation where two of the three individuals have similar statures. Now there are four configurations that have posterior probabilities greater than 0.026, which in ascending order are 0.1525, 0.1974, 0.2171, and 0.3546. Note from Figure 17 that the configuration with the second highest posterior probability is indeed the correct configuration. Upon examining these four configurations one can see that humerus A and femur D are always assigned to Individual 1, but the pairing of humeri B and C with femora E and F, and their identification with either Individual 2 or Individual 3, cannot be resolved. To resolve this commingling one would minimally need DNA from one of the remaining humeri (B or C) and one of the remaining femora (E or F), and to form the identifications one would also need an ante-mortem DNA sample from Individual 2 or Individual 3.

DNA exclusions (eliminations) are simpler to work with than DNA inclusions (matches), so we presume here that the analyst obtained an ante-mortem DNA sample from Individual 3 and post-mortem samples from humerus B and femur F (which are paired together only in the fourth highest posterior probability). The lab results come back with a match between Individual 3 and femur F but an exclusion for humerus B against Individual 3. The match will typically be given as a likelihood ratio which is the inverse of the genotype frequency in the “population at large.” This likelihood ratio is based on the idea that the DNA match has a probability of 1.0 if the ante-mortem and post-mortem samples are from the same individual and a probability equal to the population frequency of the genotype if the samples are from different individuals.

For the moment we use this likelihood ratio for inclusion very conservatively, treating it only as the lack of an exclusion. In other words, we cannot necessarily use the DNA match between Individual 3 and femur F to assign femur F to this individual because we do not know whether the remaining individuals might have the same genotype. Using only exclusions, the configurations that previously had the 4th and 3rd highest posterior probabilities (shown at index 33 and 34 in Fig. 17) are now eliminated because both pair humerus B with femur F, and the configuration with the highest posterior probability (index 36 in Fig. 17) must be eliminated because it places humerus B with Individual 3. This leaves only the (previously second highest posterior probability) configuration at index 35, which now has a posterior probability of 0.9162. Using the inclusion of femur F with Individual 3 (in other words, jettisoning any configurations that do not have this femur with this individual), the posterior probability rises from 0.9162 to 0.9562.

Fig. 16. Sorted posterior probabilities for bone assignments to three individuals in Simulation 1, where A–C are humeri that (in reality) came from Individuals 1–3, and D–F are femora that (in reality) came from Individuals 1–3.

Fig. 17. The seven highest posterior probabilities for bone assignments to three individuals in Simulation 2, where A–C are humeri that (in reality) came from Individuals 1–3, and D–F are femora that (in reality) came from Individuals 1–3.


Identification in a “closed population” mass disaster

The previous section touched on the “victim identification” problem when dealing with commingled remains. Here we expand on victim identification, making the simplifying assumption that there is no commingling of remains so that we are solely attempting to make identifications. Rather than working directly with posterior probabilities as in the previous section, we will work with likelihood ratios and prior odds as this is much more common in the disaster victim and personal identification literature (Goodwin et al., 1999; Adams, 2003b; Brenner and Weir, 2003; Alonso et al., 2005; Christensen, 2005; Lin et al., 2006; Steadman et al., 2006; Prinz et al., 2007; Kaye, 2009; Budowle et al., 2011; Butler, 2011; Hartman et al., 2011; Abraham et al., 2012; Montelius and Lindblom, 2012; Jackson and Black, 2013). As mentioned above, likelihood ratios from DNA are typically reported as the inverse of the population frequency for the matched (between ante-mortem and post-mortem) genotype, although this is only possible when a “direct reference” ante-mortem sample is available (so that the numerator is equal to 1.0). Brenner and Weir (2003, p 174) define a direct reference as “a known biological relic of a victim” from which an ante-mortem DNA sample can be obtained. When the post-mortem sample is correctly matched to the direct reference sample then the likelihood is 1.0. This is simply a statement that the probability of getting the observed post-mortem DNA data is 1.0 if the source for that sample was the same individual that provided the ante-mortem sample. The transpose is not necessarily true. It does not necessarily follow that the post-mortem sample must be from the same individual that provided the ante-mortem reference sample because the two samples match. If another individual or individuals is/are a potential source for the post-mortem sample (because they also match the DNA profile) then the transpose is less than 1.0.

We frame this problem of victim identification in terms of calculating the prior probability of a correct identification and defining likelihood thresholds for identifications using an example from Adams (2003a). Adams studied the diversity of dental pathology data using methods very similar to those applied to mtDNA sequence data. He found based on dental records from 29,152 individuals that the most common pattern, present in about 13% of his cases, was for all 28 teeth (he excluded third molars) to be “virgin” (i.e., non-carious and unrestored). If we were dealing with a “closed population” disaster with 100 individuals then we might expect 13 of the individuals to have no dental pathology. Mundorff (2008, p 127) defines a “closed population” as one “where the number and names of the victims involved are known,” while Kontanis and Sledzik (2008, p 318) define it specifically within the transportation industry as one “where accurate passenger manifests are available soon after the accident.” If we further assume that we have complete and accurate ante-mortem and post-mortem data for all 100 individuals, then whenever we correctly match ante-mortem to post-mortem data for individuals with “virgin” teeth the likelihood will be 1.0, but the likelihood ratio will be 99/12 = 8.25. This is the inverse of the frequency of “virgin” teeth which is 12/99 rather than 13/100 because the denominator in the likelihood ratio excludes the current individual who already appears in the numerator with a likelihood of 1.0.

Prior probability of a correct identification. Following Budowle et al. (2011), we use $v$ to represent the number of, as yet, unidentified victims. Note that there is also precedent for using $v + 1$ to represent the number of as yet unidentified victims (Brenner and Weir, 2003). Using $v$ as the number of victims, the prior probability of a correct identification is $1/v$, while the prior probability of an incorrect identification is $(v - 1)/v$. The prior odds of an identification (i.e., the ratio of the prior probability of a correct identification to the prior probability of an incorrect identification) are:

$$\frac{1/v}{(v - 1)/v} = \frac{1}{v - 1}. \tag{10}$$

With each identification made, the number of unidentified victims can be decreased by one, a point also alluded to by Brenner and Weir (2003) when they indicate that the prior odds “continually increase as new victim identifications are made.”

Continuing with our example of a closed population disaster with 100 individuals, the prior odds of a correct identification are 1/99. When multiplied by the likelihood ratio for an ante-mortem to post-mortem match for “virgin” teeth this gives the posterior odds as 1/12. The posterior odds can in turn be converted to the easily interpretable posterior probability of (1/12)/(1 + 1/12) = 1/13. As there are thirteen individuals within the 100 individuals known to have had “virgin” teeth, a post-mortem match has a 1/13 chance of leading to a correct identification. This calculation is not in keeping with the more conservative way in which DNA evidence is ordinarily handled. There the denominator in the likelihood ratio is the “match probability” based on a database, such as using the product rule with the Combined DNA Index System (Budowle and Moretti, 1999; Budowle et al., 2001).

As an example of this more conservative approach, Steadman et al. (2006) found that a particular pattern of dental pathology observed in a forensic case was never observed in a database sample of 29,152 individuals. They used a match probability of 1/29,153 in analyzing their case, on the presumption that had one additional case been sampled it would have yielded a match. Now if we presume that this very rare dental pathology pattern is known to belong to one of the 100 victims but we have incomplete ante-mortem information from the remaining victims, under this more conservative approach we must allow for the possibility that there was another occurrence of the pattern. The likelihood ratio given an ante-mortem to post-mortem match is now 29,153, which when divided by 99 gives posterior odds of about 294.5 and this converts to a posterior probability of about 0.9966. In contrast, if we knew with surety from the ante-mortem data that there was only one individual among the 100 with this pattern of dental pathology, then the likelihood ratio would be infinite and the posterior probability on the identification would be 1.0.
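The two closed-population calculations above can be checked with exact rational arithmetic. The sketch below is illustrative only; the function names `prior_odds` and `odds_to_probability` are introduced here and do not come from the cited literature.

```python
from fractions import Fraction


def prior_odds(v):
    """Eq. (10): prior odds of a correct identification with v unidentified victims."""
    return Fraction(1, v - 1)


def odds_to_probability(odds):
    """Convert odds to a probability: odds / (1 + odds)."""
    return odds / (1 + odds)


v = 100

# "Virgin" dentition: 13 of the 100 victims share the pattern, so the
# likelihood ratio is the inverse of 12/99.
lr_virgin = Fraction(99, 12)
post_odds_virgin = lr_virgin * prior_odds(v)
prob_virgin = odds_to_probability(post_odds_virgin)
print(post_odds_virgin, prob_virgin)

# Rare pattern under the conservative database match probability of 1/29,153.
lr_rare = Fraction(29153, 1)
post_odds_rare = lr_rare * prior_odds(v)
prob_rare = odds_to_probability(post_odds_rare)
print(float(post_odds_rare), float(prob_rare))
```

As each victim is identified, calling `prior_odds` with a smaller `v` shows the prior odds increasing, which is the point made by Brenner and Weir (2003).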

More on priors and the problem of transposed conditionals in forensic science

We have not yet commented on the role of prior information in forensic anthropology, although this is clearly an area for potential confusion. Our use of the prior above, which is very common in the mass fatality identification literature for “closed” populations, is uncontroversial. But in expert testimony the introduction of prior odds for versus against an identification should be avoided at all costs. For the expert to “wander” into dealing with prior odds brings the expert into the position of commenting on the particulars of the identification external to the forensic anthropological evidence. As Koehler and Saks (1991, p 371) note: “Where presumptions exist, they should be provided by the law, not by expert witnesses.” Using only the likelihood ratio from the forensic anthropological evidence should protect the expert from this error, but unfortunately it is all too easy for the likelihood ratio to be misinterpreted by “transposing” the conditioning (Thompson and Schumann, 1987; Evett, 1995; National Research Council Committee on DNA Technology in Forensic Science, 1996; Foreman et al., 2003).

Evett (1995, p 129) gives a clear illustration of a transposed conditional probability:

The probability that an animal has four legs if it is a cow is one
does not mean the same thing as:
The probability that an animal is a cow if it has four legs is one.

A much more convoluted example can be found in a discussion in the 2003 movie “Pirates of the Caribbean: The Curse of the Black Pearl,” in which Murtogg demonstrates very circuitously to Mullroy that the probability that the ship the Black Pearl would have black sails is 1.0, while the probability that a ship with black sails would be the Black Pearl is much lower. Both misreading of a conditional probability and transposition of conditioning in a likelihood ratio can occur when a forensic anthropologist is acting as an expert witness. We give an extended example of likely scenarios under which such errors can occur when a forensic anthropologist is acting in this role.

Let’s presume that the skeleton of an unidentified individual is found in a clandestine grave. A partial DNA profile from the Combined DNA Index System (CODIS) is obtained from the individual and submitted to the FBI to be checked against the National Missing Person DNA Database. This produces a “hit” for a missing person sample obtained from a hairbrush (the DNA on which was confirmed using a biological sample from a sibling). The missing person’s disappearance was circumstantially linked to another individual who stands to be charged with murder if the identification can be “made.” As a consequence, the partial profile from the skeleton is checked against a larger database and ultimately a likelihood ratio of 10,000 is obtained.

The prosecution is concerned that this likelihood ratio may not be high enough, and on finding that dental records are available from the missing person the prosecutor obtains an expert witness to examine the dentition from the skeleton and the dental records of the missing person. The expert witness finds a perfect match, with all teeth being free of restorative work except the mandibular right first and second molars, which have occlusal fillings. On submitting this dental pattern to the Joint POW/MIA Accounting Command’s program Odontosearch II (http://www.jpac.pacom.mil/index.php?page=odontosearch), the expert witness finds that 29 out of 37,955 individuals have this particular pattern. The expert witness consequently reports a likelihood ratio of 1,308.8. At trial, the expert witness testifies that “the match is nearly 1,309 times more likely if the skeleton and the missing person represent the same individual than if they were two different individuals.” Wishing to clarify this statement, the prosecutor asks “so based on the dental evidence, it is nearly 1,309 times more likely that the skeleton is that of the missing person than that it is from someone else?” But one could only make such a leap from the likelihood of the data under two different hypotheses to the posterior odds of those hypotheses given the data by assuming that the prior odds on the identification are “evens” (1/1). The prosecutor would never have been asked to prosecute a case based on an identification which was as likely to be correct as incorrect. This form of transposition is referred to as the “prosecutor’s fallacy” (first so named by Thompson and Schumann, 1987) because that is the usual source for the error.

The defense attorney, who has had access to the expert witness’s report, has seen that the likelihood ratio was obtained by inverting the match probability. The defense attorney realizes that 29 out of 37,955 individuals in the Odontosearch database matched on dental pathology to the skeleton from the clandestine grave. The grave was located in a rural area just on the outskirts of a city with a population of one million individuals. The defense attorney consequently multiplies the frequency of the dental pathology pattern in Odontosearch, which is 7.64 × 10⁻⁴, by one million to arrive at a predicted 764 individuals from the city that would have had a matching pattern. From this standpoint, the defense attorney argues that the posterior odds that the skeleton from the clandestine grave represents the same individual as the missing person (as opposed to one of the 764 individuals presumed to have the same dental pattern) are 1:764. This form of logic was referred to by Thompson and Schumann as the “defense attorney’s fallacy” because that also was the usual source. This fallacy is not so much an outright transposition as it is a misreading of the evidence within the framework of the trial. Pointing to the many other individuals who potentially match the dental pattern ignores the fact that there is other evidence that moved the case forward. In making this argument the defense attorney has chosen to ignore the DNA evidence that was central to the initial putative identification.
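The arithmetic behind both fallacies can be laid out in a short sketch. The database counts and city population come from the example above; the list of alternative prior odds at the end is hypothetical and is included only to show how strongly the posterior probability depends on a quantity that the expert witness should not supply.

```python
def posterior_probability(likelihood_ratio, prior_odds):
    """Posterior probability of identity from a likelihood ratio and prior odds."""
    post_odds = likelihood_ratio * prior_odds
    return post_odds / (1.0 + post_odds)


matches, database_size = 29, 37955
match_probability = matches / database_size   # about 7.64e-4
dental_lr = 1.0 / match_probability           # about 1,308.8

# Prosecutor's fallacy: reading the likelihood ratio as posterior odds
# silently assumes prior odds of 1:1.
p_even_prior = posterior_probability(dental_lr, prior_odds=1.0)

# Defense attorney's fallacy: every matching city resident is treated as an
# equally likely source, and all other evidence is ignored.
city_population = 1_000_000
expected_matches = match_probability * city_population   # about 764

# Hypothetical prior odds, chosen only to show the sensitivity of the result.
for odds in (1.0, 1 / 100, 1 / 10_000):
    print(odds, round(posterior_probability(dental_lr, odds), 4))
```

The same likelihood ratio yields very different posterior probabilities across these priors, which is why the likelihood ratio alone cannot be restated as the odds that the identification is correct.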

The role of prior information can become particularly confusing in the forensic setting because there is almost always prior information that enters into the denominator of likelihood ratios. This can be seen in Schneider’s (2007, p 240) statement that “Match probabilities or likelihood ratios are based on assumptions about the ethnic origin of the suspect or an unknown perpetrator.” In an example of calculating a likelihood ratio, Evett and Weir (1998) write “because the victim described her assailant as Caucasian, it is appropriate to use estimates of allele proportions from a Caucasian database.” In contrast, the National Research Council report (1996, p 29) contains the following proscription, “Usually, the subgroup to which the suspect belongs is irrelevant, since we want to calculate the probability of a match on the assumption that the suspect is innocent and the evidence DNA was left by someone else.” In any event, it is important to understand that the assumption of a prior for the denominator of a likelihood ratio establishes the “population at large” that contains all individuals who could potentially match the evidentiary information. This is not the same thing as assuming a prior probability or prior odds for an identification. In a legal proceeding, this prior is in the domain of the court, while in disaster victim identification the prior may be well-defined for “closed population events” and much less clear for “open” events. In either case, the identification (at which point prior odds of the identification come into play) will be above the level of any one “identification modality.”

In a critique of Steadman et al. (2006), Ferreira and Andrade (2009) confuse the issue of a prior on an identification with a prior for the denominator calculation in a likelihood ratio. Steadman et al. (2006) gave the likelihood ratio as:

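$$\frac{P(\text{“sex”} \mid \text{correct ID})}{P(\text{“sex”} \mid \text{wrong ID})} = \frac{P(\text{“M”} \mid M)}{P(\text{“M”} \mid M) \times P(M) + P(\text{“M”} \mid F) \times P(F)} \tag{11}$$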

for a case with a putative identification to an individual who was a male, where $P(\text{“M”} \mid M)$ and $P(\text{“M”} \mid F)$ are, respectively, the probabilities that an actual male and that an actual female would be sexed as male. Steadman et al., on the basis of a study of the Terry Anatomical Collection, gave these probabilities as 0.9884 and 0.0194. They then gave the prior probability in the “population at large” of being male as being equal to the prior probability of being female (equal to 0.5). They took this prior after an examination of the sex ratio in the National Crime Information Center’s missing person database, ultimately arriving at a likelihood ratio of 1.9615. Ferreira and Andrade (2009) argued that the likelihood ratio in Eq. (11) should be:

$$\frac{P(\text{“sex”} \mid \text{correct ID})}{P(\text{“sex”} \mid \text{wrong ID})} = \frac{P(\text{“M”} \mid M)}{P(\text{“M”} \mid F)}. \tag{12}$$

This is only correct if the prior sex ratio for the “population at large” is such that the sample is entirely composed of females. In that case, and if osteologists were “perfect” at assessing sex from skeletal material, then the likelihood ratio would reach its maximum of infinity as opposed to the maximum of 2.0 under a prior sex ratio of 1:1.
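The contrast between Eqs. (11) and (12) can be verified numerically. In the sketch below, the function name `sex_likelihood_ratio` and its argument names are chosen here for illustration; the probabilities 0.9884 and 0.0194 are those reported by Steadman et al. (2006).

```python
def sex_likelihood_ratio(p_m_given_m, p_m_given_f, prior_male):
    """Eq. (11): likelihood ratio for a skeleton sexed as male, putative ID male."""
    denominator = p_m_given_m * prior_male + p_m_given_f * (1.0 - prior_male)
    return p_m_given_m / denominator


# Eq. (11) with a 1:1 prior sex ratio in the "population at large".
lr_eq11 = sex_likelihood_ratio(0.9884, 0.0194, prior_male=0.5)

# Eq. (12) is the special case in which the population at large is all female.
lr_eq12 = sex_likelihood_ratio(0.9884, 0.0194, prior_male=0.0)

# A "perfect" osteologist under a 1:1 sex ratio reaches the maximum of 2.0.
lr_perfect = sex_likelihood_ratio(1.0, 0.0, prior_male=0.5)

print(round(lr_eq11, 4), round(lr_eq12, 2), lr_perfect)
```

Setting `prior_male` to zero reproduces the ratio proposed by Ferreira and Andrade (2009), which makes explicit the assumption hidden in Eq. (12).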

CONCLUSIONS

It is our hope that the examples we have given in this article will spur biological anthropologists to move beyond any preliminary training they may have received concerning Bayes Theorem and consider using Bayesian approaches where appropriate in their future research. While a number of the examples we have examined, because they began with an uninformative prior, provide results that do not differ substantively from traditional maximum likelihood approaches, they do illustrate that a Bayesian approach provides a richer depiction of our current knowledge about parameters and models. In the example taken from Uhl et al. (2013) we were able to present the full posterior densities for body mass under an uninformative prior and under an informative prior (see Fig. 11) whereas the published analysis gave only the 0.025, 0.5, and 0.975 quantiles under these two priors. When a Bayesian approach produces rather different results from a traditional frequentist approach, it often does so because of problems with the assumptions underpinning the latter approach. In our paleodemographic example, maximum likelihood estimation produced an asymptotic (log normal) density for the baseline hazard parameter that was too peaked because the baseline hazard parameter lay near its boundary of 0.0. In this case, rejecting the Bayesian approach because it requires prior information (we used a gamma distribution with “shape” and “scale” values both equal to 1.0) and instead using maximum likelihood estimation leads to a baseline hazard confidence interval that is too narrow. This is not because of a failure on the part of maximum likelihood estimation, but is instead a result of pressing the method into a situation where an assumption (that a parameter is not on or near a boundary) is violated.

It has been a quarter of a century since Berger and Berry (1988, p 166) wrote:

. . .common usage of statistics seems to have become fossilized, mainly because of the view that standard statistics is the objective way to analyze data. Discarding this notion, and indeed embracing the need for subjectivity through Bayesian analysis, can lead to more flexible, powerful, and understandable analysis of data.

We are not advocating for dispensing with initial training in “standard statistics,” as knowledge of frequentist concepts often makes Bayesian concepts clearer, and frequentist analyses are sometimes useful and appropriate. The Nobel Laureate economist Sims (2007, p 2) in a summer seminar paper “Bayesian methods in applied econometrics, or, why econometrics should always and everywhere be Bayesian” wrote in regard to training that: “…I think that full understanding of what confidence regions and hypothesis tests actually are will lead to declining interest in constructing them.” On a much less pessimistic note concerning the future of frequentist approaches, Samaniego (2010) presents frequentist and Bayesian approaches as a threshold problem where the benefits of one approach over the other may reach a tipping point. Samaniego provides examples where the frequentist approach can win out over Bayesian approaches, although in the end he characterizes himself as “unabashedly a ‘Bayesian sympathizer’.” He (Samaniego, 2010, p 62) also takes the occasional jab at frequentist methods, writing: “…let’s take a look at the logical underpinnings of frequentist inference. Surprise, there are none!” But as he is quick to point out, a logically consistent method that produces poor answers is no boon.

There has been a long-standing debate over whether Bayesian inference can be taught effectively at an introductory level. The oft-cited exchange in The American Statistician’s “Teacher’s Corner” could be considered a bit of a draw, with Moore (1997) clearly supporting omitting Bayesian methods from such courses and Albert (1997) and Berry (1997) arguing strongly for their inclusion. The comments on these three articles were essentially split, with two authors (Freedman and Scheaffer) not supporting an extensive introduction of Bayesian methods, two (Lindley and Short) supporting introduction of such methods, and one author (Witmer) a bit on the fence. At that time, one of the chief impediments to teaching about Bayesian inference was the general lack of suitable textbooks and readily available software, with Albert’s (1996) and Berry’s (1996) introductory texts being about the only options. Since then Kruschke (2010a) has written an excellent introductory Bayesian text that incorporates the use of both R and BUGS, and there are now more advanced texts that focus on Bayesian inference and MCMC (Albert, 2009; Link and Barker, 2010; Christensen et al., 2011; Lesaffre and Lawson, 2012; Cowles, 2013). Examples in these more advanced texts do sometimes run to the anthropological, such as in Christensen et al.’s (2011, p 4) use of data on Ache armadillo hunting practices.

While training is an important issue, the ability for students to learn Bayesian methods ultimately should not be the criterion upon which we decide whether or not to incorporate such methods into our own research. Furthermore, the leap into Bayesian thinking may not be quite as great as anticipated. Stangl (1998, p 256–257) writes about her collaborations with “researchers trained in medicine and the social sciences,” which in theory includes the bulk of training that most biological anthropologists receive. She says of them:

While the statistical analyses they present in publications is nearly 100% classical, the statistical interpretations made in their day-to-day work is not. In daily conversations, debates, and statistical analyses, they rarely follow classical prescriptions for “legitimate” data analyses or give classical interpretations to their inference. In their day-to-day activities their thinking and the decisions they make based on this thinking are nearly 100% Bayesian. What appears on paper is not indicative of what goes on in their heads.

This brings us full circle to our original example using Mays and Faerman’s (2001) data of nine male and four female infant deaths. In the section on sequential use of Bayes Theorem, we mentioned that Mays and Faerman essentially stopped their (frequentist) analysis at twelve males and five females which included the four individuals from Waldron et al. (1999). Though the data from Faerman et al. (1998) were certainly available, Mays and Faerman mentioned the male bias in that data, and did not include this additional data that would have brought the number of total male infant deaths to 26 and female deaths to 10. The apparent reason for their exclusion of this earlier data from any further chi-square calculations was the possibility that the sample used in the Faerman et al. study was derived from a brothel, based on archaeological context (the remains were recovered around a bath house) and a male bias among the remains (which they interpreted as female babies having economic value in the presumed context). Mays and Faerman made the somewhat subjective decision that the Faerman et al. data came from a different context and consequently should not be included. What appear to be frequentist studies often have subjective elements within, or subjective decisions attached to, them. These subjective elements or decisions are better served by seeking fuller explanations through Bayesian modeling, since choosing a Bayesian approach forces the researcher to make the subjective information explicit.

ACKNOWLEDGMENTS

The authors thank the anonymous reviewers for their helpful comments on a previous draft. They also thank Trudy Turner both for her comments and for her patience when they pushed various deadlines to their utter limits. The reference sample data used for the paleodemographic example was obtained under National Science Foundation grant BCS97-27386 awarded to the first author.

APPENDIX

Approximate Bayesian computation

Approximate Bayesian computation (ABC) is not the oldest Bayesian Monte Carlo method, but it is a good place to start our illustrations because in its simplest form it is straightforward. Pritchard et al. (1999), following Tavaré et al. (1997), were among the first to use ABC, although the method was not referred to as such until a few years later (Beaumont et al., 2002). The method is a form of acceptance sampling that can be used to “build” the posterior distribution of one or more parameters. ABC involves four steps that are repeated (iterated) a large number of times: 1) possible parameter values are simulated from a prior distribution, 2) data are simulated based on the parameter values from the first step, 3) summary statistics from the simulated data are compared to the summary statistics from the actual data, and 4) if the summary statistics from the simulated data are “close enough” to the summary statistics from the actual data, the parameters simulated in the first step are accepted. An advantage of the ABC method is that it does not require calculating likelihoods. Turner and Van Zandt (2012) give an example of ABC using the binomial distribution which we present here using the Mays and Faerman (2001) data.

To apply ABC, we simulated 10,000 values from the posterior density by: 1) drawing a uniform random deviate between 0 and 1 (i.e., a value of θ from the prior), 2) simulating a random binomial deviate at a sample size of thirteen (the sample size in Mays and Faerman, 2001) using the θ value from the first step, and 3) accepting the proposed θ value from the first step if it produced nine “males.” The top graph in Figure A1 shows a density histogram of the 10,000 values as well as the Be(10, 5) posterior density. The bottom graph in Figure A1 shows the empirical cumulative distribution function (Wilk and Gnanadesikan, 1968) from the 10,000 simulated values along with the Be(10, 5) cumulative distribution function.

The simulation we just described is rather inefficient. With 14 possible values for the binomial variable (number of “males” from 0 to 13) and sampling out of the uniform distribution, the probability of simulating nine “males” is just 1/14, or 0.0714 (as we saw when discussing Bayes’ factor). Consequently, about 93% of the proposed θ values will be rejected. This means that to get 10,000 accepted θ values we would have to simulate about 140,000 initial values. One could relax the requirement that we must get nine “males” in order to accept a proposed θ value, and instead say that we would accept a θ value if it leads to the simulation of 8, 9, or 10 “males.” This has an acceptance probability of 3/14 so that about 21% of the simulated θ values would be accepted, but the posterior will be slightly heavier in the tails.
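The rejection scheme just described can be sketched in plain Python. This is a minimal illustration rather than the code used for Figure A1; the function name `abc_binomial`, its arguments, and the fixed random seed are choices made here, and the proposed parameter value is named `theta` in the code.

```python
import random


def abc_binomial(n_accept, n_trials=13, observed=9, tolerance=0, seed=1):
    """ABC rejection sampler for a binomial proportion under a uniform prior."""
    rng = random.Random(seed)
    accepted, proposals = [], 0
    while len(accepted) < n_accept:
        proposals += 1
        theta = rng.random()                                        # draw from the prior
        males = sum(rng.random() < theta for _ in range(n_trials))  # simulate the data
        if abs(males - observed) <= tolerance:                      # compare, then accept
            accepted.append(theta)
    return accepted, n_accept / proposals


draws, rate = abc_binomial(n_accept=2000)
posterior_mean = sum(draws) / len(draws)
print(round(posterior_mean, 3), round(rate, 3))
```

The accepted draws should have a mean near 10/15, the mean of the Be(10, 5) posterior, and an acceptance rate near 1/14. Calling the function with `tolerance=1` accepts 8, 9, or 10 “males” and raises the acceptance rate to roughly 3/14 at the cost of a slightly more diffuse posterior.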

For continuous data problems and multiple parameter problems, ABC could potentially reject virtually all simulated values, making it necessary to accept simulated values that are within a pre-established tolerance. For example, Pritchard et al. (1999) wanted to estimate multiple demographic parameters for past populations using modern Y chromosome microsatellite data. They used three summary statistics to compare the actual dataset to the dataset simulated using ABC: the average variance across eight loci for repeat numbers, the average heterozygosity, and the number of unique haplotypes. To test whether a simulated dataset should be accepted they formed the absolute difference between each pair of these three summary statistics (one calculated on the simulated dataset and the other on the actual dataset), divided each difference by its respective actual dataset value, and then accepted a simulation if all three scaled differences were less than 0.1. They found that if they used a value lower than 0.1 then the posterior densities for the demographic and genetic parameters did not change appreciably.
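The tolerance rule described above, in which each absolute difference is scaled by the observed value and compared to 0.1, can be written as a small acceptance function. This is a sketch only: the function name `accept_simulation` and all of the numerical summary statistics below are hypothetical values chosen for illustration, not figures from Pritchard et al. (1999).

```python
def accept_simulation(sim_stats, obs_stats, delta=0.1):
    """Return True if every simulated summary statistic lies within a
    relative tolerance `delta` of its observed counterpart."""
    scaled = [abs(s - o) / abs(o) for s, o in zip(sim_stats, obs_stats)]
    return all(d < delta for d in scaled)

# Hypothetical summary statistics (made up for illustration), in the order:
# variance in repeat number, heterozygosity, number of unique haplotypes.
observed = (1.15, 0.64, 300.0)
close_sim = (1.20, 0.62, 310.0)  # every statistic within 10% of observed
far_sim = (1.20, 0.62, 250.0)    # haplotype count is off by about 17%

print(accept_simulation(close_sim, observed))
print(accept_simulation(far_sim, observed))
```

Requiring all statistics to pass at once is what makes acceptance rare when many summaries or a small tolerance are used.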

ABC has been applied to a number of human genetic, epidemiological, and demographic questions, a few of which have been reported in the AJPA (see for example Boattini et al., 2013). Ray and Excoffier (2009) give a useful review of some such applications. A number of ABC packages for addressing genomic diversity and phylogeny are available in the “R” statistical/graphics program and as stand-alone programs. For example, coalescent simulations are widely available both as “stand-alone” programs (Laval and Excoffier, 2004; Ray et al., 2010) and within “R” as “rtree” in the contributed package “ape” (Paradis, 2012). Additionally, new ABC packages for addressing past population structure are proliferating. Tsutaya and Yoneda (2013) have recently written an ABC package in R called “WARN” (for “Weaning Age Reconstruction with Nitrogen isotope analysis”) that brings the ABC method into the realm of bioarchaeology. In general, ABC packages and programs tend to be quite specific in application. Available packages are useful for asking the same questions of different data, but if you are asking a new question, you must be prepared to write your own code to simulate data after sampling from prior distributions for the parameters. While ABC is useful for sampling from the posterior density for population genetic and demographic parameters, its use in model comparison has been called into question (Robert et al., 2011).

Metropolis sampler

Unlike the relatively recent ABC method, the Metropolis sampler dates back to the middle of the last century (Metropolis et al., 1953), and belongs to the class of Markov Chain Monte Carlo (MCMC) methods (Geyer, 1992; Gilks et al., 1996; Brooks et al., 2011) that have revolutionized Bayesian analysis. As one of the simplest of the MCMC methods, the Metropolis sampler is useful for explaining the simulation and sampling processes used by these methods. The first MC in MCMC references the fact that because these methods use the current value in order to simulate the next value, the sequence of simulated values follows a Markov chain. In a Markov chain, a future value is related probabilistically to the current value. The second MC in MCMC references the fact that these methods use Monte Carlo simulation, or repeated random sampling to obtain numerical results. In the Metropolis sampler the next value in the sequence is offered as a “proposal” value that may or may not be accepted. We start with the simplest of the Metropolis samplers: the independence Metropolis sampler. In our example we will use Jeffreys’ prior so that the effect of the prior can be seen in the Monte Carlo procedure.

The independence variant of the Metropolis sampler is so called because potential future values are drawn independently of the current value. The fact that future values are drawn independently of current values suggests that the independence sampler will not form a Markov chain, but because the independence sampler can “reject” a proposed value and stay at its current value it does form a Markov chain. To use the Metropolis sampler we need to be able to write the product of the likelihood and the prior up to a multiplicative constant. This gives:

$$
f(\theta) = \left[\theta^{x}(1-\theta)^{n-x}\right]\left[\theta^{-0.5}(1-\theta)^{-0.5}\right] = \theta^{x-0.5}(1-\theta)^{n-x-0.5}, \tag{A1}
$$

where the first bracketed term is from the likelihood and the second is from the prior. To start the Metropolis sampler we randomly draw a value from the uniform density, which we can write as θ_c for “theta current,” and find f(θ_c) from Eq. (A1). Now we pick another value from the uniform density, which we can write as θ_p for “theta proposal,” and find f(θ_p), again from Eq. (A1). We then form the ratio f(θ_p)/f(θ_c), and if the ratio is greater than 1.0 then we accept the proposed value θ_p as our new θ_c. What we have just described is simply a “maximizer,” and a very inefficient one at that. It will climb to the highest density and stay there without forming a sample of the posterior. In order to sample from the posterior density, we must allow the possibility for the sampler to make stochastic moves to lower densities. In order to do this, we draw from the uniform again whenever the f(θ_p)/f(θ_c) ratio is less than one, and if this drawn random deviate u is less than the ratio, then we accept the proposed value; otherwise the sampler stays at θ_c.
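The independence Metropolis sampler just described can be sketched as follows. This is an illustrative Python version, not the code used for Figure A2; the function names and the seed are assumptions, and the ratio from Eq. (A1) is computed on the log scale purely for numerical convenience.

```python
import math
import random

def log_f(theta, x=9, n=13):
    """Log of Eq. (A1): binomial likelihood times Jeffreys' prior,
    up to an additive constant. Returns -inf outside (0, 1)."""
    if theta <= 0.0 or theta >= 1.0:
        return -math.inf
    return (x - 0.5) * math.log(theta) + (n - x - 0.5) * math.log(1.0 - theta)

def independence_metropolis(n_iter, seed=1):
    rng = random.Random(seed)
    current = rng.random()            # starting value from the uniform
    chain = []
    for _ in range(n_iter):
        proposal = rng.random()       # drawn independently of `current`
        log_ratio = log_f(proposal) - log_f(current)
        # Accept outright if the ratio is at least 1; otherwise accept
        # with probability equal to the ratio.
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            current = proposal
        chain.append(current)         # a rejection repeats the current value
    return chain

chain = independence_metropolis(10_000)
chain_mean = sum(chain) / len(chain)  # Be(9.5, 4.5) has mean 9.5/14
print(round(chain_mean, 3))
```

Recording the current value after a rejection, rather than skipping the iteration, is what makes the output a sample from the posterior and not merely a list of accepted proposals.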

Figure A2 shows the results of using the independence Metropolis sampler on the Mays and Faerman (2001) data. Panel A shows 100 iterations through this algorithm, with the horizontal dashed line representing the posterior mode from the Be(9.5, 4.5) posterior distribution based on nine males and four females and Jeffreys’ prior of Be(0.5, 0.5). Panel B shows the ECDF from 10,000 simulations along with the Be(9.5, 4.5) cumulative distribution function. The independence sampler thus appears to be effective in sampling from the posterior density in our “toy” example, but it can be very ineffective in real-world practical applications because it may reject proposal values so frequently that the sampler gets stuck on various values. Konigsberg and Herrmann (2002, p 236) reported that they “had some success” in obtaining samples from the posterior distribution of age-at-death using the independence Metropolis sampler after observing an “age marker” and sampling from a uniform distribution of age. This method had success only because the amount of information from the “age marker” was paltry, so that a uniform distribution of age was an adequate proposal density.

Fig. A1. ABC used to obtain the posterior density for θ (proportion of male infant deaths) when 9 out of 13 infant deaths were male. The top graph shows the histogram density for 10,000 draws and the Be(10, 5) posterior. The bottom graph shows the empirical cumulative distribution function (as open points) from the 10,000 draws against the Be(10, 5) cumulative distribution function.

A more flexible and useful form of the Metropolis algorithm is the random walk Metropolis sampler. Kruschke (2010a) devotes an entire chapter of his book to presenting a random walk Metropolis sampler using the uniform prior. In this method, a random “jump” from a symmetric distribution centered on zero is added to the current value rather than independently sampling a proposal value. Figure A3 shows an example of the random walk sampler that is much like the previous example except that a random “jump” from a normal distribution with a mean of zero is added to the current value in order to form the proposal value. The acceptance ratio f(θ_p)/f(θ_c) is calculated as described above, a random uniform number is drawn, and if that number is less than the acceptance ratio then the proposal value is accepted. If the current θ value is near zero or one then the “jump” may be to a proposed value that is less than zero or greater than one, in which case the proposed value is automatically rejected.
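A random walk version of the sampler differs from the independence sampler only in how the proposal is formed. The sketch below is an illustration in Python rather than the code behind Figure A3; the starting value of 0.5, the seed, and the function names are assumptions introduced here. It records the acceptance rate so that the effect of the jump standard deviation can be inspected.

```python
import math
import random

def log_f(theta, x=9, n=13):
    """Log of Eq. (A1), up to an additive constant."""
    return (x - 0.5) * math.log(theta) + (n - x - 0.5) * math.log(1.0 - theta)

def random_walk_metropolis(n_iter, jump_sd, start=0.5, seed=1):
    rng = random.Random(seed)
    current = start
    chain = []
    n_accepted = 0
    for _ in range(n_iter):
        # Proposal = current value plus a normal "jump" centered on zero.
        proposal = current + rng.gauss(0.0, jump_sd)
        # Proposals outside (0, 1) are rejected automatically.
        if 0.0 < proposal < 1.0:
            log_ratio = log_f(proposal) - log_f(current)
            if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
                current = proposal
                n_accepted += 1
        chain.append(current)
    return chain, n_accepted / n_iter

rates, means = {}, {}
for sd in (0.2, 0.01, 2.0):
    chain, rates[sd] = random_walk_metropolis(20_000, sd)
    means[sd] = sum(chain) / len(chain)
    print(sd, round(rates[sd], 2), round(means[sd], 3))
```

A very high acceptance rate paired with a small jump, or a very low acceptance rate paired with a large jump, both indicate a poorly tuned sampler.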

Figure A3 shows 250 iterations of a Metropolis random walk with a standard deviation of 0.2 for the jump distribution in the top panel and the same procedure but with standard deviations of 0.01 and of 2.0 in the middle and bottom panels, respectively. The selection of an appropriate “spread” for the jump distribution is an important consideration, as the 0.01 standard deviation is too small and leads to very frequent acceptance of “baby steps” through the posterior distribution. Thus, the random walk has not even entered the area around the posterior mode in the middle panel of Figure A3 (where the standard deviation is 0.01). Conversely, the standard deviation of 2.0 is far too large so that the proposed “jump” is almost always rejected, and as a consequence the Metropolis sampler gets stuck for numerous consecutive iterations, as shown in the bottom panel of Figure A3. One of the many advantages of using specialized software (such as JAGS, OpenBUGS, and WinBUGS) to run the Metropolis sampler is that these programs automatically determine “jump widths” and can “tune” the widths so that the sampler works more effectively, removing this house-keeping chore from the user’s responsibility. The advantage of Metropolis samplers overall is that they can be used to sample from very general distributions, including ones that have no boundaries, or ones that have only a single boundary such as at zero but that are “open” to the left (down to negative infinity) or to the right (up to infinity).

Fig. A2. A: One hundred iterations of a Metropolis independence sampler from a Be(9.5, 4.5) distribution that represents a Be(0.5, 0.5) prior and 9 males sampled out of 13 individuals. The dashed line is the mode from the Be(9.5, 4.5) distribution. B: The empirical cumulative distribution function (as open points) from the 10,000 iterations of the Metropolis independence sampler shown in panel A plotted against the Be(9.5, 4.5) cumulative distribution function.

Fig. A3. Two hundred and fifty iterations of a random walk Metropolis sampler under three different standard deviations for the “jump” distribution. The first panel, with a standard deviation of 0.2, adequately explores the posterior density. The second panel has a standard deviation which is too small and consequently the sampler creeps through the posterior density taking many “baby steps.” The third panel has a standard deviation which is too large and consequently the sampler gets “stuck” at the same value for many consecutive iterations.

Slice sampling

While the Metropolis sampler can adequately sample from conditional distributions, it is comparatively inefficient, and often too simple to deal effectively with large datasets and multiple parameter situations. To counter the inefficiency of the Metropolis sampler and to make Gibbs sampling (discussed in a later section) easier, Neal (2003) introduced a different method of sampling from densities (including posterior densities) that he refers to as “slice sampling.” Although we only consider univariate slice sampling here, Neal also described the multivariate form. Slice sampling has the advantage that it always produces a new step through the space. Neal described two possible ways of performing slice sampling, the first being a “stepping-out” process followed by “shrinkage,” and the second being a “doubling” process. We describe the (first) stepping-out and shrinkage method here. Slice sampling is most easily explained, and hopefully best understood, using a graphical representation like the one we provide in Figure A4. All eight panels in this figure show the posterior density Be(9.5, 4.5) from which we are trying to sample, and proceeding in order through panels A through H represents the sequential steps in “stepping-out” and then “shrinkage.”

In panel A of Figure A4 we have a current value of θ_c equal to 0.6. This value of θ has a particular density value which we find from Eq. (A1), and then represent on the graph as a filled point. Next we pick a new point, shown as an unfilled point, from a uniform distribution between 0 and the density value at our current θ (the filled point). The “slice” is represented in panel A by the dotted line drawn across the density at the height of the randomly chosen unfilled point (the “slice height”). The “slice” is not actually seen or known, nor do we have a full picture of the density from which we are sampling. All we can do is evaluate points using Eq. (A1). Throughout the slice sampling shown in Figure A4 we will be using a “window” with a width of 0.1 along the abscissa to represent the size, or area, of individual steps that we will take in the “stepping-out” process.

As shown in panel B, this window is randomly placed such that it covers θ_c. In this particular example, the left boundary for the window was randomly placed at 0.576, and the right boundary is consequently at 0.676. Now we evaluate the densities (shown as filled points in panel B) at these two values on the abscissa (0.576 and 0.676) to determine whether the density values fall above or below the slice height. Slice height is again shown with a dotted line, and we have hatched the rectangular window up to the slice height. Panel B and all subsequent panels also show θ_c with a heavy vertical line. Because both the left and right densities fall above the slice height, we need to “step-out” in both directions.

Fig. A4. Steps in “slice” (Neal, 2003) sampling using the “stepping-out” and “shrinkage” approach. Every panel shows the Be(9.5, 4.5) posterior density from which we are sampling. A: Starting point at θ_c = 0.6 with the density value represented by a filled point. A new point (unfilled) is simulated on a uniform to establish the “slice,” shown with dashed line. B: A “window” with a width of 0.1 is randomly placed around the starting point (here from 0.576 to 0.676) and the density is evaluated at both sides of the window (shown with filled points) to determine whether they fall above the slice. Both do, so “stepping-out” will be necessary in both directions. C: “Stepping-out” one window to the left produces a density value at the far left that is below the slice height, so that “stepping-out” to the left is terminated. D: Two “steps” to the right are necessary before the right-most density is below the slice height. E: The final sampling window from θ = (0.476, 0.876) after “stepping-out” is complete but prior to any “shrinkage.” F: A proposal θ value of 0.54 is simulated on the uniform between 0.476 and 0.876 and found to have a lower density than the slice height. The proposed value is consequently rejected. G: The window is shrunk to the right using the rejected point so that there is a new window from 0.54 to 0.876. H: A new proposal value of 0.747 is simulated and found to have a density above the slice height. This value is now accepted to be the current value of θ.

Panel C shows “stepping-out” to the left. The original rectangle (from 0.576 to 0.676) is shown without hatching while the step to the left, down the window-width of 0.1 to a value of 0.476, is shown with a hatched rectangle. The density at a θ value of 0.476 is less than the slice height, so we are done stepping out to the left. Panel D shows that two steps to the right must be taken in order to obtain a “right-hand” density that is less than the slice height. Panel E shows the final sampling window after “stepping-out” is complete; this is the region from which we will sample a new potential value of θ. We “stepped-out” one window of 0.1 to the left from 0.576 (the initial left boundary) and two windows to the right from 0.676 (the original right boundary), so the region for sampling θ runs from 0.476 to 0.876.

Panels F through H show the process of “shrinkage” that leads to finally sampling a new value for θ. In panel F we randomly sample a value from the uniform between 0.476 and 0.876, obtaining a value of 0.54. The density at 0.54 (shown with a filled point) is below the slice height and consequently the value cannot be accepted. As the value of 0.54 is less than the current θ value of 0.6, the shaded region will need to shrink to the right (toward the current θ value). Panel G shows the new boundary and new sampling window that runs from 0.54 to 0.876. We now randomly sample another value from the uniform, this time between 0.54 and 0.876, obtaining a value of 0.747, as shown in panel H. This value gives a density above the slice height and is consequently accepted as the next value of θ. If, on the other hand, we had randomly sampled a θ value between 0.829 and 0.876, then we would have rejected that value and shrunk the shaded region to the left, shifting the right boundary to whatever value was drawn. As can be seen in panel H, the density from which we are sampling falls below the slice height (the top of the hatched sampling window) in the region between 0.829 and 0.876. Now the whole process can begin anew using θ_c = 0.747 as the new current value.
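The stepping-out and shrinkage steps walked through above can be collected into a single loop. The following Python sketch is an illustration written for this passage, not Neal’s (2003) code or the code behind Figure A4; the function names and seed are assumptions, and the slice height is handled on the log scale so that only Eq. (A1) needs to be evaluated.

```python
import math
import random

def log_f(theta, x=9, n=13):
    """Log of Eq. (A1); -inf outside (0, 1) so stepping-out stops there."""
    if theta <= 0.0 or theta >= 1.0:
        return -math.inf
    return (x - 0.5) * math.log(theta) + (n - x - 0.5) * math.log(1.0 - theta)

def slice_sample(n_iter, start=0.6, width=0.1, seed=1):
    rng = random.Random(seed)
    current = start
    chain = []
    for _ in range(n_iter):
        # Slice height: uniform between 0 and f(current), on the log scale.
        log_height = log_f(current) + math.log(1.0 - rng.random())
        # Randomly place a window of the given width so it covers `current`.
        left = current - width * rng.random()
        right = left + width
        # Stepping-out: widen until both ends fall below the slice height.
        while log_f(left) > log_height:
            left -= width
        while log_f(right) > log_height:
            right += width
        # Shrinkage: sample within the window, pulling in the boundary on
        # the side of each rejected point until a point is accepted.
        while True:
            proposal = left + (right - left) * rng.random()
            if log_f(proposal) > log_height:
                current = proposal
                break
            if proposal < current:
                left = proposal
            else:
                right = proposal
        chain.append(current)
    return chain

chain = slice_sample(20_000)
chain_mean = sum(chain) / len(chain)  # Be(9.5, 4.5) has mean 9.5/14
print(round(chain_mean, 3))
```

Unlike the Metropolis samplers, every pass through the outer loop ends with a newly accepted value, because the window always shrinks toward the current point, which lies above the slice height.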

Slice sampling has the advantage that over the initial sampling events, the window width can be “tuned” so that it is appropriate for the target density. This is automatically handled within the software that implements “slice” sampling for MCMC analyses. The method is fairly easy to implement for univariate distributions, can be applied to multivariate distributions, and can adapt to some extent to dependencies between variables.

Adaptive rejection sampling

Adaptive rejection sampling is another sampling method that is more efficient than the Metropolis sampler and that is useful in applications of Gibbs sampling (described in the next section). It can be used to sample from any density which is log concave (down), though the restriction to log concave densities can be removed by adding a Metropolis step (Gilks et al., 1995). A log concave density is one whose logarithm is concave; such a density is necessarily unimodal, although not every unimodal density is log concave. The method is extensively described by Gilks (1992), Gilks and Wild (1992), and Gilks (1996). Figure A5 shows graphically how adaptive rejection sampling works, using the same density as in Figure A4 but now calculated as a log density. We use the simpler derivative free version (Gilks, 1992) as opposed to the tangent version (Gilks and Wild, 1992) of adaptive rejection sampling in this illustration. This figure also makes extensive use of Wayne Zhang’s “ARS” R code (available from http://actuaryzhang.com/seminar/seminar.html) which he modeled after Gilks’ C code. As in the preceding illustration, separate panels, here A/B and C–F, are used to show sequential steps of the sampling process.
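That the Be(9.5, 4.5) density used in this illustration satisfies the log concavity requirement can be verified directly from its second derivative. The lines below are a brief check added for illustration; the additive constant is the log of the normalizing constant and plays no role.

```latex
\begin{align*}
\log f(\theta) &= 8.5\,\log\theta + 3.5\,\log(1-\theta) + \text{const}, \\
\frac{d^{2}}{d\theta^{2}}\,\log f(\theta) &= -\frac{8.5}{\theta^{2}} - \frac{3.5}{(1-\theta)^{2}} < 0
\quad \text{for } 0 < \theta < 1 .
\end{align*}
```

Because the second derivative is negative everywhere on the interval, any chord between two evaluated points lies below the log density, which is the property the hulls in Figure A5 rely on.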

Panel A shows the same density we were dealing with in slice sampling, but the density is drawn on a log scale. We start with four initial arbitrary θ values of 0.1, 0.25, 0.5, and 0.9 for which we can evaluate the log densities, which are then plotted as four filled points. A “lower hull,” shown with a dashed line, can be formed by “connecting the dots” and placing vertical drop lines at the first and last points. An “upper hull,” shown with heavy solid lines, is then drawn by continuing the lower hull lines to either side of an intervening segment and finding the points of intersection. For example, the upper hull above the log densities at 0.5 and 0.9 is drawn by continuing the dashed line between the points at 0.25 and 0.5 and continuing the dashed vertical line from the point at 0.9 until they intersect. Similarly, the upper hull above the log density between the points at 0.25 and 0.5 is drawn by continuing the dashed lines between the points at 0.1 and 0.25 and between the points at 0.5 and 0.9. Panel A also shows a vertical line at a θ value of 0.687, which is a proposal value simulated from the upper hull. It is relatively easy to sample proposal values because the upper hull is piecewise exponential.

Fig. A5. Adaptive rejection sampling by the derivative free method. A: The same density shown in Figure A4 is shown here as a log density. The log density is evaluated at four initial points (0.1, 0.25, 0.5, and 0.9) in order to draw an “upper hull” (heavy continuous line) and a “lower hull” (heavy dashed line). A proposal value of 0.687 (light vertical line) is simulated from the piecewise exponential upper hull. B: Same view as A, but in the straight scale. The narrow vertical rectangle is at the proposal value of 0.687, while the short horizontal line crossing the rectangle is a random uniform deviate that is used in two tests, the first of which is shown here. If the deviate falls between a 0 density and the density at the lower hull (shaded in the rectangle) then the proposal value is accepted. This test failed. C: Under the second test, if the deviate falls within the density region between the two hulls (shaded in the rectangle) the proposal value is accepted. This second step of the acceptance test also failed. D: The rejected proposal value calculations are used to “adapt” the upper and lower hulls. E: A proposal value of 0.883 is also rejected, leading to further “adaptation.” F: The proposed value of 0.778 is accepted because the associated random deviate fell between 0 and the lower hull.

Panel B shows the same graph but in the original density scale; here the hulls are exponentials, instead of the line segments shown in the log scale of Panel A. The upper hull above the density between 0.5 and 0.9 now extends beyond the cut-off we picked for scaling purposes. Had we included all of the upper hull, the density function would be compressed to a nearly flat line. The narrow rectangle at the proposed θ value of 0.687 will be used to test whether the proposed θ value can be accepted. This is potentially a two-step acceptance test whereby evaluation continues to the second step only if the proposed value fails the first step, a measure intended to cut down on the computations required. The short horizontal line crossing the rectangle represents a random uniform deviate between 0 and 1 that we have scaled so that it is appropriate for the graph. The very small filled portion of the rectangle is from a density value of 0.0 up to the lower hull, whereas the open part of the rectangle continues from the lower hull up to the upper hull.
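The hull construction from the four starting points, and the size of the filled portion of the rectangle at the proposal value of 0.687, can be reproduced numerically. The Python sketch below is not Zhang’s or Gilks’ code and covers only the evaluation of the two hulls at one point, not the full sampler; the helper names `chord` and `hulls` are assumptions, and the unnormalized log density from Eq. (A1) is used because the normalizing constant cancels in the ratio of the hulls.

```python
import math

def log_f(theta, x=9, n=13):
    """Log of Eq. (A1), up to an additive constant."""
    return (x - 0.5) * math.log(theta) + (n - x - 0.5) * math.log(1.0 - theta)

def chord(p, q, t):
    """Height at t of the straight line through points p and q."""
    (x0, y0), (x1, y1) = p, q
    return y0 + (y1 - y0) * (t - x0) / (x1 - x0)

def hulls(points, t):
    """Lower and upper hull of the log density at t, where t lies between
    the first and last of at least three sorted (theta, log density) points."""
    # Index of the segment [points[i], points[i + 1]] that contains t.
    i = max(k for k in range(len(points) - 1) if points[k][0] <= t)
    lower = chord(points[i], points[i + 1], t)
    # Upper hull: extend the neighbouring chords over this segment and take
    # the lower of the two (only one neighbour exists at either end).
    extended = []
    if i >= 1:
        extended.append(chord(points[i - 1], points[i], t))
    if i + 2 < len(points):
        extended.append(chord(points[i + 1], points[i + 2], t))
    return lower, min(extended)

points = [(t, log_f(t)) for t in (0.1, 0.25, 0.5, 0.9)]
proposal = 0.687
lower, upper = hulls(points, proposal)
# Fraction of the rectangle (0 up to the upper hull) lying below the lower
# hull, i.e. the chance that the first acceptance test succeeds.
first_test_prob = math.exp(lower - upper)
print(round(lower, 2), round(log_f(proposal), 2), round(upper, 2))
print(round(first_test_prob, 3))
```

With only four starting points the two hulls are far apart at this proposal value, so the first test succeeds only a few percent of the time; adding rejected points to the hulls is what raises that fraction in later iterations.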

If the random uniform deviate (the short horizontal line crossing the rectangle) falls within the filled portion of the rectangle, then the proposed θ value is accepted. This does not occur in our example in Panel B, and we must proceed to the second part of the acceptance test, which involves shifting the rectangle up to the lower hull so that the filled portion now represents the density region between the two hulls, as shown in Panel C. Once again our random uniform deviate does not fall within the filled portion of the rectangle, and the proposed θ value cannot be accepted. We then use the rejected proposal value calculation to “adapt” the upper and lower hulls by inserting the evaluated log density as a new point at the θ value of 0.687 and redrawing the hulls, as shown in Panel D. Panel D represents the “adaptive” step of the sampler in that the upper and lower hulls now more narrowly bracket the log density function (compare panels A and D). We then start the process of sampling another proposed value over again. Panel E shows a new proposed θ value of 0.883 that also was not accepted, and as a consequence the hulls once again “adapt.” Finally, panel F shows a θ value of 0.778 that was accepted.

Gibbs sampling

Gibbs sampling is the process of sampling sequentially from full conditional densities so that it is possible to evaluate the posterior density for anything of interest. It is an MCMC method appropriate to multivariate contexts and multiple parameter problems, and as mentioned in the above sections, incorporates additional types of sampling (such as slice or sampling from a known distribution) within its iterations. Thus far in our illustrations using the Mays and Faerman (2001) data, we have been trying to estimate a single parameter, θ, or the proportion of males. The Gibbs sampler is not appropriate for estimating θ here, because this is a univariate problem, but we can use Gibbs sampling to find the posterior predictive distribution (which we also arrived at analytically). Casella and George (1992) give as one of their examples of Gibbs sampling the evaluation of the predictive (posterior) density from a binomial model. As they point out, Gibbs sampling is not at all necessary here because the predictive density can be found directly from the beta binomial distribution. With that said, it is instructive to look at the Gibbs sampler in this context so that it can be readily compared to the known answer. The full conditional distributions for x (the number of males) and θ (the proportion of males) are:

$$
\begin{aligned}
x &\sim \mathrm{bin}(n, \theta) \\
\theta &\sim \mathrm{Be}(x + a,\; n - x + b),
\end{aligned}
\tag{A2}
$$

where the “~” symbol (referred to variously as a tilde or a “twiddle”) means “is distributed as.” For our example we use a and b values of 9.5 and 4.5, respectively, which correspond to the observed data of nine males and four females and a Jeffreys prior of Be(0.5, 0.5), and then simulate 50,000 values starting at a θ value of 0.5. The x values were simulated using the rbinom function in R, while the θ values were simulated using the rbeta function also in R. Figure A6 shows the predictive distribution for the number of males out of 13 individuals, where the vertical lines are from the proportion of simulated x values that fall into each count of males (from 0 to 13) and the unfilled points are from the beta binomial distribution. This figure illustrates that the simulated results from the Gibbs sampler and the analytical results from the beta binomial distribution match.

Fig. A6. Comparison of results from 50,000 iterations of the Gibbs sampler to obtain the posterior predictive distribution (vertical lines) to the analytical result from the beta binomial distribution (unfilled points).
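The alternating draws in Eq. (A2) and the comparison to the beta binomial distribution can be sketched as follows. The analysis described above used rbinom and rbeta in R; this Python version is an illustrative stand-in using only the standard library, and its function names and seed are assumptions introduced here.

```python
import math
import random

def gibbs_predictive(n_iter=50_000, n=13, a=9.5, b=4.5, theta_start=0.5, seed=1):
    """Alternate between the two full conditionals in Eq. (A2) and return
    the proportion of iterations falling at each count of males."""
    rng = random.Random(seed)
    theta = theta_start
    counts = [0] * (n + 1)
    for _ in range(n_iter):
        # x | theta ~ bin(n, theta), simulated as n Bernoulli trials
        x = sum(rng.random() < theta for _ in range(n))
        # theta | x ~ Be(x + a, n - x + b)
        theta = rng.betavariate(x + a, n - x + b)
        counts[x] += 1
    return [c / n_iter for c in counts]

def log_beta(p, q):
    """Log of the beta function B(p, q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

def beta_binomial_pmf(x, n=13, a=9.5, b=4.5):
    """Analytical predictive probability of x males out of n."""
    return math.comb(n, x) * math.exp(log_beta(x + a, n - x + b) - log_beta(a, b))

simulated = gibbs_predictive()
analytic = [beta_binomial_pmf(x) for x in range(14)]
max_gap = max(abs(s - t) for s, t in zip(simulated, analytic))
print(round(max_gap, 4))
```

The largest gap between the simulated proportions and the analytical probabilities should be small, reflecting only Monte Carlo error and the mild autocorrelation of the chain.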

LITERATURE CITED

Abraham J, Kwan P, Champod C, Lennard C, Roux C. 2012. An AFIS candidate list centric fingerprint likelihood ratio model based on morphometric and spatial analyses (MSA). In: Yang J, Xie SJ, editors. New trends and developments in biometrics. Rijeka, Croatia: InTech. p 221–250.

Adams BJ. 2003a. The diversity of adult dental patterns in the United States and the implications for personal identification. J Forensic Sci 48:497–504.

Adams BJ. 2003b. Establishing personal identification based on specific patterns of missing, filled, and unrestored teeth. J Forensic Sci 48:487–496.

Adams BJ, Byrd JE. 2006. Resolution of small-scale commingling: a case report from the Vietnam War. Forensic Sci Int 156:63–69.

Agresti A, Coull BA. 1998. Approximate is better than “exact” for interval estimation of binomial proportions. Am Stat 52:119–126.

Albert J. 1997. Teaching Bayes’ rule: a data-oriented approach. Am Stat 51:247–253.

Albert J. 2009. Bayesian computation with R. New York, NY: Springer.

Albert JH. 1996. Bayesian computation using Minitab. Belmont, CA: Duxbury Press.

Alonso A, Martín P, Albarrán C, de Simon P, Iturralde M, Fernandez-Rodriguez A, Atienza I, Capilla J, Garcia-Hirschfeld J, Martinez P. 2005. Challenges of DNA profiling in mass disaster investigations. Croat Med J 46:540.

Babb PL, Fernandez Duque E, Baiduc CA, Gagneux P, Evans S, Schurr TG. 2011. mtDNA diversity in Azara’s owl monkeys (Aotus azarai azarai) of the Argentinean Chaco. Am J Phys Anthropol 146:209–224.

Barik S, Sahani R, Prasad B, Endicott P, Metspalu M, Sarkar B, Bhattacharya S, Annapoorna P, Sreenath J, Sun D. 2008. Detailed mtDNA genotypes permit a reassessment of the settlement and population structure of the Andaman Islands. Am J Phys Anthropol 136:19–27.

Bartlett M. 1937. Sub-sampling for attributes. Suppl J R Stat Soc 4:131–135.

Bayes T. 1763. An essay toward solving a problem in the doctrine of chances. Philos T R Soc Lond 53:370–418.

Beaumont MA, Zhang W, Balding DJ. 2002. Approximate Bayesian computation in population genetics. Genetics 162:2025–2035.

Berger JO, Berry DA. 1988. Statistical analysis and the illusion of objectivity. Am Sci 76:159–165.

Berkvens D, Speybroeck N, Praet N, Adel A, Lesaffre E. 2006. Estimating disease prevalence in a Bayesian framework using probabilistic constraints. Epidemiology 17:145–153.

Bernard HR. 2011. Research methods in anthropology. Lanham, MD: AltaMira Press.

Berry DA. 1996. Statistics: a Bayesian perspective. Belmont, CA: Duxbury Press.

Berry DA. 1997. Teaching elementary Bayesian statistics with real applications in science. Am Stat 51:241–246.

Boattini A, Castrì L, Sarno S, Useli A, Cioffi M, Sazzini M, Garagnani P, De Fanti S, Pettener D, Luiselli D. 2013. mtDNA variation in East Africa unravels the history of afro-asiatic groups. Am J Phys Anthropol 150:375–385.

Bocquet-Appel J-P, Bacro JN. 2008. Estimation of an age distribution with its confidence intervals using an iterative Bayesian procedure and a bootstrap sampling approach. In: Bocquet-Appel JP, editor. Recent advances in paleodemography. Dordrecht, The Netherlands: Springer. p 63–82.

Bocquet-Appel J-P, Masset C. 1982. Farewell to paleodemography. J Hum Evol 11:321–333.

Boldsen JL. 2001. Epidemiological approach to the paleopathological diagnosis of leprosy. Am J Phys Anthropol 115:380–387.

Branscum A, Gardner I, Johnson W. 2004. Bayesian modeling of animal- and herd-level prevalences. Prev Vet Med 66:101–112.

Brenner CH, Weir BS. 2003. Issues and strategies in the DNA identification of World Trade Center victims. Theor Popul Biol 63:173–178.

Brooks S, Gelman A, Jones G, Meng X-L, editors. 2011. Hand-book of Markov chain Monte Carlo. Boca Raton, FL: Chap-man and Hall/CRC.

Brown LD, Cai TT, DasGupta A. 2001. Interval estimation for abinomial proportion. Stat Sci 16:101–117.

Brown PJ. 1993. Measurement, regression, and calibration.New York: Oxford University Press.

Budowle B, Ge J, Chakraborty R, Gill-King H. 2011. Use ofprior odds for missing persons identifications. InvestigativeGenetics 2:15.

Budowle B, Moretti TR. 1999. Genotype profiles for six popula-tion groups at the 13 CODIS short tandem repeat core lociand other PCR-based loci. Forensic Sci Commun 1:73–88.

Budowle B, Shea B, Niezgoda S, Chakraborty R. 2001. CODISSTR loci data from 41 sample populations. J Forensic Sci 46:453–489.

Butler JM. 2011. Advanced topics in forensic DNA typing:methodology. Waltham, MA: Academic Press.

Byers SN, Roberts CA. 2003. Bayes’ theorem in paleopathologi-cal diagnosis. Am J Phys Anthropol 121:1–9.

Byrd JE. 2008. Models and methods for osteometric sorting. In:Adams BJ, Byrd JE, editors. Recovery, analysis, and identifi-cation of commingled human remains. Totowa, NJ: Springer.p 199–220.

Casella G, George EI. 1992. Explaining the Gibbs sampler. AmStat 46:167–174.

Caussinus H, Courgeau D. 2010. Estimating age without meas-uring it: a new method in paleodemography. Population (Eng-lish edition) 65:117–144.

Chamberlain A. 2000. Problems and prospects in paleodemogra-phy. In: Cox M, Mays S, editors. Human osteology in archae-ology and forensic science. London, UK: Greenwich MedicalMedia. p 101–115.

Christensen AM. 2005. Testing the reliability of frontal sinusesin positive identification. J Forensic Sci 50:18–22.

Christensen R, Johnson WO, Branscum AJ, Hanson TE. 2011.Bayesian ideas and data analysis: an introduction for scien-tists and statisticians. Boca Raton, FL: CRC Press.

Clopper C, Pearson ES. 1934. The use of confidence or fiduciallimits illustrated in the case of the binomial. Biometrika 26:404–413.

Courgeau D. 2010. Dispersion of measurements in demography:a historical view. Electron J Hist Probab Stat 6:1–19.

Courgeau D. 2012. Probability and social science: methodologi-cal relationships between the two approaches. New York, NY:Springer.

Cowles MK. 2013. Applied Bayesian statistics: with R and OpenBUGS examples. New York, NY: Springer.

D’Août K, Vereecke EE, editors. 2011. Primate locomotion: linking field and laboratory research. New York, NY: Springer.

Darroch JN, Mosimann JE. 1985. Canonical and principal components of shape. Biometrika 72:241–252.

de Clare Bronsvoort BM, von Wissmann B, Fevre EM, Handel IG, Picozzi K, Welburn SC. 2010. No gold standard estimation of the sensitivity and specificity of two molecular diagnostic protocols for Trypanosoma brucei spp. in western Kenya. PLoS One 5:e8628.

Diebolt J, Ip EHS. 1996. Stochastic EM: method and application. In: Gilks WR, Richardson S, Spiegelhalter DJ, editors. Markov chain Monte Carlo in practice. New York, NY: Chapman and Hall. p 259–273.

DiGangi EA, Bethard JD, Kimmerle EH, Konigsberg LW. 2009. A new method for estimating age at death from the first rib. Am J Phys Anthropol 138:164–176.

Dorai-Raj S. 2009. Binom: binomial confidence intervals for several parameterizations. R package version 1.0–5. URL: http://CRAN.R-project.org/package=binom. Accessed October 9, 2013.

Engel B, Swildens B, Stegeman A, Buist W, De Jong M. 2006. Estimation of sensitivity and specificity of three conditionally dependent diagnostic tests in the absence of a gold standard. J Agric Biol Environ Stat 11:360–380.

Evett I. 1995. Avoiding the transposed conditional. Sci Justice 35:127–131.

Evett IW, Weir BS. 1998. Interpreting DNA evidence: statistical genetics for forensic scientists. Sunderland, MA: Sinauer Associates.

Faerman M, Bar-Gal GK, Filon D, Greenblatt CL, Stager L, Oppenheim A, Smith P. 1998. Determining the sex of infanticide victims from the late Roman era through ancient DNA analysis. J Archaeol Sci 25:861–865.

Ferreira M, Andrade M. 2009. A note on Dawnie Wolfe Steadman, Bradley J. Adams, and Lyle W. Konigsberg, statistical basis for positive identification in forensic anthropology. Int J Acad Res 1:23–26.

Fisher RA. 1922. On the mathematical foundations of theoretical statistics. Phil Trans R Soc London Ser A 222:309–368.

Foreman L, Champod C, Evett I, Lambert J, Pope S. 2003. Interpreting DNA evidence: a review. Int Stat Rev 71:473–495.

Fürtbauer I, Heistermann M, Schülke O, Ostner J. 2013. Brief communication: female fecal androgens prior to the mating season reflect readiness to conceive in reproductively quiescent wild macaques. Am J Phys Anthropol 151:311–315.

Gamerman D. 1997. Markov Chain Monte Carlo: stochastic simulation for Bayesian inference. New York, NY: Chapman & Hall/CRC. p 1–512.

Gelman A, Carlin JB, Stern HS, Rubin DB. 2004. Bayesian data analysis. Boca Raton, FL: Chapman & Hall/CRC.

Geyer CJ. 1992. Practical Markov chain Monte Carlo. Stat Sci 7:473–483.

Gilks WR. 1992. Derivative-free adaptive rejection sampling for Gibbs sampling. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, editors. Bayesian statistics 4. New York: Oxford University Press. p 641–649.

Gilks WR. 1996. Full conditional distributions. In: Gilks WR, Richardson S, Spiegelhalter DJ, editors. Markov chain Monte Carlo in practice. New York, NY: Chapman and Hall. p 75–88.

Gilks WR, Best NG, Tan KKC. 1995. Adaptive rejection Metropolis sampling within Gibbs sampling. Appl Stat 44:455–472.

Gilks WR, Richardson S, Spiegelhalter DJ. 1996. Markov chain Monte Carlo in practice. New York, NY: Chapman and Hall.

Gilks WR, Wild P. 1992. Adaptive rejection sampling for Gibbs sampling. Appl Stat 41:337–348.

Gillespie TR, Barelli C, Heistermann M. 2013. Effects of social status and stress on patterns of gastrointestinal parasitism in wild White Handed Gibbons (Hylobates lar). Am J Phys Anthropol 150:602–608.

Gilmore CC. 2013. A comparison of antemortem tooth loss in human hunter-gatherers and non-human catarrhines: implications for the identification of behavioral evolution in the human fossil record. Am J Phys Anthropol 151:252–264.

Goodwin W, Linacre A, Vanezis P. 1999. The use of mitochondrial DNA and short tandem repeat typing in the identification of air crash victims. Electrophoresis 20:1707–1711.

Gowland RL, Chamberlain AT. 2002. A Bayesian approach to ageing perinatal skeletal material from archaeological sites: implications for the evidence for infanticide in Roman-Britain. J Archaeol Sci 29:677–685.

Haldane J. 1932. A note on inverse probability. Math Proc Cambridge 28:55–61.

Hartman D, Drummer O, Eckhoff C, Scheffer J, Stringer P. 2011. The contribution of DNA to the disaster victim identification (DVI) effort. Forensic Sci Int 205:52–58.

Hoppa RD, Vaupel JW. 2002. Paleodemography: age distributions from skeletal samples. Cambridge; New York: Cambridge University Press.

Hui SL, Walter SD. 1980. Estimating the error rates of diagnostic tests. Biometrics 36:167–171.

Jackson G, Black S. 2013. Use of data to inform expert evaluative opinion in the comparison of hand images—the importance of scars. Int J Legal Med (early view: 10.1007/s00414-013-0828-5).

Jeffreys H. 1939. Theory of probability. Oxford, UK: Clarendon Press.

Jeffreys H. 1946. An invariant form for the prior probability in estimation problems. Proc R Soc London Series A Math Phys Sci 186:453–461.

Johnson WO, Gastwirth JL, Pearson LM. 2001. Screening without a “gold standard”: the Hui-Walter paradigm revisited. Am J Epidemiol 153:921–924.

Jones G, Johnson WO, Hanson TE, Christensen R. 2010. Identifiability of models for multiple diagnostic testing in the absence of a gold standard. Biometrics 66:855–863.

Joseph L, Gyorkos TW, Coupal L. 1995. Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard. Am J Epidemiol 141:263–272.

Jungers WL, Falsetti AB, Wall CE. 1995. Shape, relative size, and size-adjustments in morphometrics. Yb Phys Anthropol 38:137–161.

Kass RE, Raftery AE. 1995. Bayes factors. J Am Stat Assoc 90:773–795.

Katz D, Suchey JM. 1986. Age determination of the male os pubis. Am J Phys Anthropol 69:427–435.

Kaye DH. 2009. Identification, individualization and uniqueness: what’s the difference? Law Probab Risk 8:85–94.

Kéry M. 2010. Introduction to WinBUGS for ecologists: a Bayesian approach to regression, ANOVA and related analyses. Burlington, MA: Academic Press.

Kéry M, Schaub M. 2012. Bayesian population analysis using WinBUGS: a hierarchical perspective. New York, NY: Elsevier.

Koehler J, Saks M. 1991. What DNA ‘fingerprinting’ can teach the law about the rest of forensic science. Cardozo Law Rev 13:361–372.

Konigsberg L, Holman D. 1999. Estimation of age at death from dental emergence and implications for studies of prehistoric somatic growth. In: Hoppa RD, FitzGerald CM, editors. Human growth in the past: studies from bones and teeth. New York, NY: Cambridge University Press. p 264–289.

Konigsberg LW, Algee-Hewitt BFB, Steadman DW. 2009. Estimation and evidence in forensic anthropology: sex and race. Am J Phys Anthropol 139:77–90.

Konigsberg LW, Frankenberg SR. 1992. Estimation of age structure in anthropological demography. Am J Phys Anthropol 89:235–256.

Konigsberg LW, Hens SM, Jantz LM, Jungers WL. 1998. Stature estimation and calibration: Bayesian and maximum likelihood perspectives in physical anthropology. Yb Phys Anthropol 41:65–92.

Konigsberg LW, Herrmann NP. 2002. Markov chain Monte Carlo estimation of hazard model parameters in paleodemography. In: Hoppa RD, Vaupel JW, editors. Paleodemography: age distributions from skeletal samples. New York, NY: Cambridge University Press. p 222–242.

Konigsberg LW, Herrmann NP, Wescott DJ, Kimmerle EH. 2008. Estimation and evidence in forensic anthropology: age-at-death. J Forensic Sci 53:541–557.

Konigsberg LW, Ross AH, Jungers WL. 2006. Estimation and evidence in forensic anthropology: stature. In: Schmitt A, Cunha E, Pinheiro J, editors. Forensic anthropology and medicine: complementary sciences from recovery to cause of death. Totowa, NJ: Humana Press. p 317–331.

Kontanis EJ, Sledzik PS. 2008. Resolving commingling issues during the medicolegal investigation of mass fatality incidents. In: Adams BJ, Byrd JE, editors. Recovery, analysis, and identification of commingled human remains. Totowa, NJ: Springer. p 317–336.

Kruschke J. 2010a. Doing Bayesian data analysis: a tutorial introduction with R and BUGS. Burlington, MA: Academic Press.

Kruschke JK. 2010b. Bayesian data analysis. Wiley Interdiscip Rev Cogn Sci 1:658–676.


Laval G, Excoffier L. 2004. SIMCOAL 2.0: a program to simulate genomic diversity over large recombining regions in a subdivided population with a complex history. Bioinformatics 20:2485–2487.

Lee PM. 2012. Bayesian statistics: an introduction. West Sussex, UK: Wiley.

Lesaffre E, Lawson AB. 2012. Bayesian biostatistics. West Sussex, UK: Wiley.

Lin T-H, Myers EW, Xing EP. 2006. Interpreting anonymous DNA samples from mass disasters—probabilistic forensic inference using genetic markers. Bioinformatics 22:e298–e306.

Link W, Barker R. 2010. Bayesian inference with ecological applications. San Diego, CA: Elsevier.

Lucy D. 2005. Introduction to statistics for forensic scientists. Hoboken, NJ: Wiley.

Lunn D, Spiegelhalter D, Thomas A, Best N. 2009. The BUGS project: evolution, critique and future directions. Stat Med 28:3049–3067.

Lunn DJ, Thomas A, Best N, Spiegelhalter D. 2000. WinBUGS—a Bayesian modelling framework: concepts, structure, and extensibility. Stat Comput 10:325–337.

Lyman RL. 2006. Identifying bilateral pairs of deer (Odocoileus sp.) bones: how symmetrical is symmetrical enough? J Archaeol Sci 33:1256–1265.

Madrigal L. 1998. Statistics for anthropology. Cambridge, UK: Cambridge University Press.

Matauschek C, Roos C, Heymann EW. 2011. Mitochondrial phylogeny of tamarins (Saguinus, Hoffmannsegg 1807) with taxonomic and biogeographic implications for the S. nigricollis species group. Am J Phys Anthropol 144:564–574.

Matsuda I, Kubo T, Tuuga A, Higashi S. 2010. A Bayesian analysis of the temporal change of local density of proboscis monkeys: implications for environmental effects on a multilevel society. Am J Phys Anthropol 142:235–245.

Mays S, Faerman M. 2001. Sex identification in some putative infanticide victims from Roman Britain using ancient DNA. J Archaeol Sci 28:555–559.

McGrayne SB. 2011. The theory that would not die: how Bayes’ rule cracked the enigma code, hunted down Russian submarines, & emerged triumphant from two centuries of controversy. New Haven, CT: Yale University Press.

Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. 1953. Equation of state calculations by fast computing machines. J Chem Phys 21:1087–1092.

Miao W, Gastwirth JL. 2004. The effect of dependence on confidence intervals for a population proportion. Am Stat 58:124–130.

Millard AR, Gowland RL. 2002. A Bayesian approach to the estimation of the age of humans from tooth development and wear. Archeol Calcolatori 13:197–210.

Montelius K, Lindblom B. 2012. DNA analysis in disaster victim identification. Forensic Sci Med Pathol 8:140–147.

Moore DS. 1997. Bayes for beginners? Some reasons to hesitate. Am Stat 51:254–261.

Muchlinski MN, Durham EL, Smith TD, Burrows AM. 2013. Comparative histomorphology of intrinsic vibrissa musculature among primates: implications for the evolution of sensory ecology and “face touch”. Am J Phys Anthropol 150:301–312.

Mundorff AZ. 2008. Anthropologist-directed triage: three distinct mass fatality events involving fragmentation of human remains. In: Adams BJ, Byrd JE, editors. Recovery, analysis, and identification of commingled human remains. Totowa, NJ: Springer. p 123–144.

National Research Council Committee on DNA Technology in Forensic Science. 1996. The evaluation of forensic DNA evidence: an update. Washington, DC: National Academies Press.

Neal RM. 2003. Slice sampling. Ann Stat 31:705–741.

Nikita E, Lahr MM. 2011. Simple algorithms for the estimation of the initial number of individuals in commingled skeletal remains. Am J Phys Anthropol 146:629–636.

Ntzoufras I. 2011. Bayesian modeling using WinBUGS. New York, NY: Wiley.

O’Hagan AO. 2004. Bayesian statistics: principles and benefits. In: van Boekel MA, Stein A, van Bruggen AHC, editors. Bayesian statistics and quality modelling in the agro-food production chain. Dordrecht, NL: Kluwer Academic. p 31–45.

O’Brien M, Storlie CB. 2011. An alternative bilateral refitting model for zooarchaeological assemblages. J Taphonomy 9:245–268.

Ousley S, Jantz R, Freid D. 2009. Understanding race and human variation: why forensic anthropologists are good at identifying race. Am J Phys Anthropol 139:68–76.

Ousley SD, Jantz RL. 2012. Fordisc 3 and statistical methods for estimating sex and ancestry. In: Dirkmaat DC, editor. A companion to forensic anthropology. London, UK: Wiley-Blackwell. p 311–329.

Paradis E. 2012. Analysis of phylogenetics and evolution with R. New York, NY: Springer.

Pham-Gia T. 2000. Distributions of the ratios of independent beta variables and applications. Commun Stat A Theor 29:2693–2715.

Plummer M, Best N, Cowles K, Vines K. 2006. CODA: Convergence diagnosis and output analysis for MCMC. R News 6:7–11.

Pouillot R, Gerbier G, Gardner IA. 2002. “TAGS”, a program for the evaluation of test accuracy in the absence of a gold standard. Prev Vet Med 53:67–81.

Prinz M, Carracedo A, Mayr W, Morling N, Parsons T, Sajantila A, Scheithauer R, Schmitter H, Schneider PM. 2007. DNA Commission of the International Society for Forensic Genetics (ISFG): recommendations regarding the role of forensic genetics for disaster victim identification (DVI). Forensic Sci Int Genetics 1:3–12.

Pritchard JK, Seielstad MT, Perez-Lezaun A, Feldman MW. 1999. Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol Biol Evol 16:1791–1798.

R Development Core Team. 2013. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, URL http://www.R-project.org. Accessed October 9, 2013.

Raaum RL, Al Meeri A, Mulligan CJ. 2013. Culture modifies expectations of kinship and sex-biased dispersal patterns: a case study of patrilineality and patrilocality in tribal Yemen. Am J Phys Anthropol 150:526–538.

Raiffa H, Schlaifer R. 1961. Applied statistical decision theory. Boston, MA: Harvard Business School.

Ray N, Currat M, Foll M, Excoffier L. 2010. SPLATCHE2: a spatially explicit simulation framework for complex demography, genetic admixture and recombination. Bioinformatics 26:2993–2994.

Ray N, Excoffier L. 2009. Inferring past demography using spatially explicit population genetic models. Hum Biol 81:141–157.

Robert CP, Cornuet J-M, Marin J-M, Pillai NS. 2011. Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA 108:15112–15117.

Ross AH, Konigsberg LW. 2002. New formulae for estimating stature in the Balkans. J Forensic Sci 47:165–167.

Samaniego FJ. 2010. A comparison of the Bayesian and frequentist approaches to estimation. New York, NY: Springer.

Schneider PM. 2007. Scientific standards for studies in forensic genetics. Forensic Sci Int 165:238–243.

Séguy I, Caussinus H, Courgeau D, Buchet L. 2013. Estimating the age structure of a buried adult population: a new statistical approach applied to archaeological digs in France. Am J Phys Anthropol 150:170–183.

Sims CA. 2007. Bayesian methods in applied econometrics, or, why econometrics should always and everywhere be Bayesian. Unpublished paper available at http://sims.princeton.edu/yftp/EmetSoc607/AppliedBayes.pdf. Accessed October 9, 2013.

Slice DE, editor. 2005. Modern morphometrics in physical anthropology. New York, NY: Kluwer Academic / Plenum.

Smith C. 1986. The development of human linkage analysis. Ann Hum Genet 50:293–311.

Sober E. 2002. Bayesianism-its scope and limits. Proc Br Acad Press 113:21–38.


Soliman AA, Abd-Ellah AH, Abou-Elheggag NA, Abd-Elmougod GA. 2012. Estimation of the parameters of life for Gompertz distribution using progressive first-failure censored data. Comput Stat Data Anal 56:2471–2485.

Stangl D. 1998. Classical and Bayesian paradigms: can we teach both? In: Pereira-Mendoza L, Kea LS, Kee TW, Wong W, editors. Proceedings of the Fifth International Conference on Teaching Statistics. International Statistics Institute. p 251–258.

Steadman DW, Adams BJ, Konigsberg LW. 2006. Statistical basis for positive identification in forensic anthropology. Am J Phys Anthropol 131:15–26.

Stigler SM. 1982. Thomas Bayes’s Bayesian inference. J R Stat Soc Ser A 145:250–258.

Stigler SM. 1986. Laplace’s 1774 memoir on inverse probability. Stat Sci 1:359–363.

Sturtz S, Ligges U, Gelman AE. 2005. R2WinBUGS: a package for running WinBUGS from R. J Stat Softw 12:1–16.

Taroni F, Bozza S, Biedermann A, Garbolino P, Aitken C. 2010. Data analysis in forensic science: a Bayesian decision perspective. West Sussex, UK: Wiley.

Tavaré S, Balding DJ, Griffiths R, Donnelly P. 1997. Inferring coalescence times from DNA sequence data. Genetics 145:505–518.

Thomas DH. 1986. Refiguring anthropology: first principles of probability and statistics. Prospect Heights, IL: Waveland Press.

Thompson WC. 2012. Discussion paper: hard cases make bad law—reactions to R v T. Law Probab Risk 11:347–359.

Thompson WC, Schumann EL. 1987. Interpretation of statistical evidence in criminal trials: the prosecutor’s fallacy and the defense attorney’s fallacy. Law Hum Behav 11:167.

Tobi H, van den Berg PB, de Jong van den Berg L. 2005. Small proportions: what to report for confidence intervals? Pharmacoepidemiol Drug Safety 14:239–247.

Toft N, Jørgensen E, Højsgaard S. 2005. Diagnosing diagnostic tests: evaluating the assumptions underlying the estimation of sensitivity and specificity in the absence of a gold standard. Prev Vet Med 68:19–33.

Tsutaya T, Yoneda M. 2013. WARN: an R package for quantitative reconstruction of weaning ages in archaeological populations using bone collagen nitrogen isotope ratios. arXiv preprint, available from http://arxiv.org/pdf/1304.2468.pdf. Accessed October 9, 2013.

Tummers B. 2006. DataThief III. URL: http://datathief.org/. Accessed October 9, 2013.

Turner BM, Van Zandt T. 2012. A tutorial on approximate Bayesian computation. J Math Psychol 56:69–85.

Uhl NM, Rainwater CW, Konigsberg LW. 2013. Testing for size and allometric differences in fossil hominin body mass estimation. Am J Phys Anthropol 151:215–229.

Vollset SE. 1993. Confidence intervals for a binomial proportion. Stat Med 12:809–824.

Waldron T, Taylor GM, Rudling D. 1999. Sexing of Romano-British baby burials from the Beddingham and Bignor villas. Sussex Archaeol Collections 137:71–79.

Wilk MB, Gnanadesikan R. 1968. Probability plotting methods for the analysis of data. Biometrika 55:1–17.

Yang M, Yang Y, Cui D, Fickenscher G, Zinner D, Roos C, Brameier M. 2012. Population genetic structure of Guizhou snub-nosed monkeys (Rhinopithecus brelichi) as inferred from mitochondrial control region sequences, and comparison with R. roxellana and R. bieti. Am J Phys Anthropol 147:1–10.

Zinner D, Wertheimer J, Liedigk R, Groeneveld LF, Roos C. 2013. Baboon phylogeny as inferred from complete mitochondrial genomes. Am J Phys Anthropol 150:133–140.
