Hierarchical Bayesian Modelling for Saccharomyces cerevisiae population dynamics

Aymé Spor a,*, Christine Dillmann a, Shaoxiao Wang a,b, Dominique de Vienne a, Delphine Sicard a, Eric Parent c

a Univ Paris-Sud, UMR 0320/UMR 8120 Génétique Végétale, F-91190 Gif-sur-Yvette, France
b Department of Biochemistry and Molecular Biology, Louisiana State University Health Sciences Center, Shreveport, LA 71130, USA
c UMR MIA 518, INRA/AgroParisTech, ENGREF, 19 avenue du Maine, F-75015 Paris, France

Article info

Article history: Received 10 March 2009; received in revised form 6 April 2010; accepted 14 May 2010

Keywords: Hierarchical Bayesian Modelling; Saccharomyces cerevisiae; Population dynamics

Abstract

Hierarchical Bayesian Modelling is a powerful yet under-used approach to model and evaluate the risks associated with the development of pathogens in the food industry, to predict exotic invasions, species extinctions and the development of emerging diseases, or to assess chemical risks. Modelling the population dynamics of Saccharomyces cerevisiae while considering its biodiversity and other sources of variability is crucial for selecting strains that meet industrial needs. Using this approach, we studied the population dynamics of S. cerevisiae, the domesticated yeast, widely encountered in the food industry, notably in breweries, wineries, bakeries and distilleries. We relied on a logistic equation to estimate the key variables of population growth, but we also took into account factors able to affect them, namely environmental effects, genetic diversity and measurement errors. Our probabilistic approach allowed us: (i) to model the dynamical behaviour of strains in a given condition under some uncertainty, (ii) to measure environmental effects and (iii) to evaluate genetic variability of the growth key variables.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Biological data, whatever the field of research, are mostly dynamical or spatial, i.e. they are functions of time and/or of spatial coordinates. The challenge for the biologist is to explain the variations of traits by the variation of explanatory variables. Whenever the observations are collected at different time or space points from the same biological sample, they become dependent because they are jointly related through time or space. Statistical analysis consists in modelling (depicting the various sources of variation) and inference (estimating the parameters of the model). Historically, statistical analysis has been developed from a frequentist point of view: the parameters are considered to have a fixed value, and estimates of this value are sought via various statistical procedures of inference (moment adjustment, maximum likelihood estimates, etc.). Most statistical toolboxes that are available to biologists are designed according to the frequentist approach. However, they are generally restricted to the analysis of linear models, i.e. to the cases where the response is linear through time and/or space. When dealing with non-linear processes, the problem becomes much more complex and requires sophisticated statistical tools which are usually not mastered by biologists. As a result, they have to use non-optimal methods for properly analyzing or even simply detecting differences between two curves, for instance population growth curves.

The Bayesian approach is another way of analyzing biological data. Given uncertainty on parameter values, a so-called prior probability distribution is assigned to the parameter in a modelling step, possibly taking into account previous knowledge on the parameter. Bayesian inference can be interpreted as formulating a probabilistic judgment about the unknowns of the model given the observed data (updating the prior into a posterior distribution).
Because Bayesians traditionally put more emphasis on the modelling process, the Bayesian statistical framework provides an easy way of thinking about biological problems. Unlike the frequentist estimation techniques, dealing with complex models (non-linearity, dependence) does not bring much additional difficulty to the Bayesian inferential algorithms.

International Journal of Food Microbiology 142 (2010) 25–35. Journal homepage: www.elsevier.com/locate/ijfoodmicro. * Corresponding author. E-mail address: [email protected] (A. Spor). 0168-1605/$ see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.ijfoodmicro.2010.05.012

Hierarchical Bayesian Modelling (HBM) is a probabilistic, adaptable and efficient framework for modelling dynamical processes by taking into account multiple sources of variation. This type of model is not restricted to specific problems and can be generically applied to a vast range of dynamical and spatial systems. Hierarchical statistical modelling has the potential to match high-dimensional problems through conditional decomposition into a series of probabilistically linked simpler substructures (Clark and Gelfand, 2006). Hierarchical statistical models are made of three layers (Wikle, 2003). First, an experimental data level specifies the distribution of the observables at hand given the parameters and the underlying processes. Second, a latent process level depicts the various hidden biological mechanisms that make sense of the data. For example, in this article, the latent
    process level describes the population growth process with a logistic model. Third, a parameter level identifies the fixed quantities that would be sufficient, were they known, to mimic the behaviour of the system and to produce new data statistically similar to the ones already collected. Sources of variation on the parameters can be added: some authors (Bernier et al., 2000; Nauta, 2000, 2002) make the distinction between variability (i.e. uncertainty by essence that cannot be reduced by additional information) and uncertainty (or uncertainty by ignorance that should decrease as the sample size increases). In this paper, variability will describe changes with respect to biotic or abiotic variation, while uncertainty accounts for measurement errors and model imperfections (Shorten et al., 2004).

    In predictive microbiology, HBM has been used particularly for risk assessment and food shelf-life estimation. It is convenient for predicting pathogenic bacterial behaviour in case of contamination, because it makes it possible to quantify separately the effects of environmental factors (temperature, pH and available resources), genetic variation and measurement uncertainty. Some population dynamic models using Bayesian inference are now available and describe, for example, the effect of temperature on the growth of Listeria monocytogenes, Salmonella, Escherichia coli, Clostridium perfringens and Bacillus cereus on pork meats, milk, seafoods or egg products (Delignette-Muller et al., 2006; Membré et al., 2005; Pouillot et al., 2003) or the development of a growing population of bacterial cells from an inoculum of dormant spores (Barker et al., 2005).

    In ecology, this probabilistic framework is increasingly used to examine population dynamics because it can easily take into account multiple sources of stochasticity (such as space, time and individual heterogeneities), while in standard statistical models only process errors are routinely included (Calder et al., 2003; Clark, 2003). HBM has much to offer, including more precise parameter estimation (Calder et al., 2003), and it is increasingly used to predict exotic invasions, extinction risk or the development of emerging diseases. For example, HBM has been successfully implemented to model the dynamics of the invasive Eurasian Collared-Dove (Hooten et al., 2007), to estimate species richness and spatial occupancy (Kéry and Royle, 2008), to describe the various failures in a Dynamic Energy Budget mechanism for ecotoxicological Daphnid data (Billoir et al., 2008) or to predict the relative abundance of House Finches over the eastern United States (Wikle, 2003).

    The yeast Saccharomyces cerevisiae, a common biological model in genetics, genomics and physiology, has been exploited since the Neolithic period to produce fermented beverages and bread dough. Because of consumers' reluctance about genetically modified organisms, it seems unrealistic to improve strains by genetic engineering. Another strategy is to exploit the present natural biodiversity of yeast, which requires characterizing strains, searching for physiological traits suitable for industrial purposes, and planning genetic resource management, because it is not possible to give the same maintenance effort to all strains.

    Bakers need to develop strains with hyper-osmolarity resistance, brewers need strains with high fermentation rates and short lag phases, and oenologists need strains tolerant to ethanol for completing fermentation (Boekhout and Robert, 2003). These different properties have been shown to be related to population dynamic characteristics and to interact with the environment. The population dynamics of S. cerevisiae depends both on the genetic background of the strains and on environmental factors such as temperature (Beltran et al., 2002) or glucose content of the medium (Spor et al., 2008). The latter study demonstrated a strong impact of the food-processing use of strains on population dynamic key variables (Spor et al., 2008). Similarly, Domizio et al. (2007) described a close relationship between wine attributes and Saccharomyces spp. population dynamics. Thus, predicting population growth and modelling genetic and non-genetic variation would help yeast genetic resource management and the selection of industrial starter strains.

    We used HBM to describe S. cerevisiae population dynamics. The experimental data consisted of population size counts over time for 12 S. cerevisiae strains grown in three culture media. The latent process relied on a logistic equation depending on three population parameters, which divides the population growth into two phases: an exponential growth from an initial population of size N0 with an intrinsic growth rate r, followed by a decrease of the population growth which leads to a stationary phase, characterized by a maximum population size K, also called carrying capacity in ecology. The latent process model described differences in these key variables with respect to both environmental effects (glucose content in the culture medium) and genetic variation between strains. Finally, the uncertainty related to measurement errors was described.

    2. Materials and methods

    2.1. Principle of the Bayesian inference

    Bayesian inference, or model learning, is the process of updating prior beliefs about unknowns by probabilistic machinery based upon the relationships in the model and the observations recorded about the situation.

    By contrast with the classical approach, which begins with a hypothesis test that proposes a specific value for an unknown parameter, Bayesian inference proposes a prior distribution p(θ) for this parameter, which represents the beliefs originally encoded in the model. Data x1, x2, …, xn are collected and the likelihood f(x1, x2, …, xn | θ) is calculated given the parameter values (as in the frequentist case).

    Then the probabilities of all the other variables that are connected to the variable representing the new data are updated. Bayes's theorem is used to calculate the posterior distribution g(θ | x1, x2, …, xn). After inference, the updated probabilities reflect the new levels of belief in (or probabilities of) all possible outcomes encoded in the model.
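The updating step described above can be illustrated with a minimal sketch, assuming a single unknown θ, a normal prior and a normal likelihood with known noise. The function names, the grid and all numerical values are hypothetical and chosen only for illustration; they are not taken from the study, which relied on MCMC rather than on a grid.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution N(mean, sd) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def grid_posterior(data, grid, prior_mean, prior_sd, noise_sd):
    """Normalised posterior weights on a grid: prior p(theta) times likelihood f(x | theta)."""
    log_weights = []
    for theta in grid:
        # Log prior plus the sum of the log likelihoods of the observations.
        log_w = math.log(normal_pdf(theta, prior_mean, prior_sd))
        for x in data:
            log_w += math.log(normal_pdf(x, theta, noise_sd))
        log_weights.append(log_w)
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_weights)
    weights = [math.exp(w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical observations of a growth rate (illustrative values only).
data = [0.012, 0.009, 0.011]
grid = [i * 0.0005 for i in range(1, 81)]  # candidate theta values, 0.0005 to 0.04
posterior = grid_posterior(data, grid, prior_mean=0.01, prior_sd=0.0057, noise_sd=0.002)
posterior_mean = sum(theta * w for theta, w in zip(grid, posterior))
```

With these illustrative numbers the posterior mean lies between the prior mean and the sample mean, much closer to the latter because the three observations carry far more information than the wide prior.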

    2.2. Data

    The experimental data used to develop this model have been published in Spor et al. (2008). Strain origin, culture medium composition and population size measurements are detailed in the Material and method section of Spor et al. (2008). In summary, 12 strains stemming from three industrial origins (winery, brewery and bakery) were grown in three media differing by their glucose concentration (0.25%, 1% and 15%). Every two hours, samples were taken, diluted and plated to estimate population size. Three biological replicates were performed for each medium-by-strain combination, each time starting with a new inoculum. The population size was expressed in CFU/mL (Colony Forming Units). The experimental data are also called observations in the Bayesian setting.

    2.3. Model

    Our aim was to construct a population dynamic model capable of correctly predicting the population size N_{s,m,t} of strain s in medium m over time t. Fig. 1 illustrates the corresponding Directed Acyclic Graph, which shows the conditional dependence between nodes. In this framework, parameters and observations can be considered either as logical or as stochastic nodes of the model. Logical nodes are deterministic functions of other nodes, and stochastic nodes are described by probability laws. The description of the nodes is given in Table 1.

    2.4. Description of the latent process

    We assumed that S. cerevisiae population growth follows a logistic equation. This equation is classically used in ecology to model microbial as well as animal population dynamics, and is central in the mathematical definition of the famous r and K strategies in ecology (MacArthur and Wilson, 1967). Two types of logistic equation could be considered: with or without lag phase. Because fresh medium was inoculated after an overnight pre-culture, we used the logistic model without lag phase. Thus population size followed:

    $$N\left([K, r, N0]_{s,m}, t\right) = \frac{K_{s,m}\, N0_{s,m}\, e^{r_{s,m} t}}{K_{s,m} + N0_{s,m}\left(e^{r_{s,m} t} - 1\right)} \qquad (1)$$

    where N([K, r, N0]_{s,m}, t), the population size at time t, depends on the variables K, r and N0 of strain s in the medium m. K is the carrying capacity (maximum population size) expressed in CFU/mL, N0 is the initial population size, also expressed in CFU/mL, and r is the intrinsic growth rate (equivalent to the maximum rate of increase of the population, in min⁻¹).

    Fig. 1. Directed Acyclic Graph (DAG) of the model. Data (Y_{s,m,t}) are denoted by rectangles, covariates (t) by double rectangles and latent variables by ellipses. Solid arrows correspond to stochastic dependences between nodes while broken arrows indicate logical links. Dotted blue rectangles illustrate the embedded levels of the modelling (timepoint, strain and medium).

    2.5. Sources of variability on parameters

    Our aim was to estimate posterior distributions of the latent key variables K, r and N0 for each strain in each glucose condition. In the literature of system analysis, they are commonly named population dynamic parameters, which turns out to be a rather inappropriate term in a statistical modelling framework since, contrary to statistical parameters, they vary as latent (i.e. unobserved) random variables depending on explanatory factors or on groupings of the data. Modelling the variability consists in defining how the environment, as well as the genetic variation between strains, would affect population dynamic key variables. In this context, variations can be described by normal distributions N(μ, sd) defined by two parameters, the mean μ and the standard deviation sd. A fixed effect would be an effect that changes the mean μ of a latent variable K, r or N0, while a random effect would change their standard deviation sd. This introduces additional correlations between individuals of the same group, i.e. individuals of the same strain. The degree of resemblance is tuned by the standard deviation sd of the latent variable.

    Table 1
    Description of the links between nodes.

    Node          Type         Definition (a)
    Y_{s,m,t}     Stochastic   N(N_{s,m,t}, σ N_{s,m,t})
    N_{s,m,t}     Logical      Eq. (1)
    K_{s,m}       Stochastic   N(Kmean_m, Ksd_m)
    r_{s,m}       Stochastic   N(rmean_m, rsd_m)
    N0_{s,m}      Stochastic   N(N0mean_m, N0sd_m)
    ε^N_{s,m,t}   Stochastic   N(0, σ N_{s,m,t})

    (a) N(a, b): normal distribution with expected value a and standard deviation b.

    In our case, two sources of variation could affect population dynamics: the glucose concentration in the medium and genetic differences between strains. As each culture condition may affect yeast population dynamics in a specific manner, the medium effect was considered to be fixed and was described by a mean value for each parameter, Kmean_m, rmean_m and N0mean_m, in each glucose condition m. Note that in the case of N0mean_m, there is no causal relationship between the glucose content of the medium and this parameter. However, the experiments in the 15% glucose condition were performed by a different experimenter from those performed in the 1% and 0.25% glucose conditions. The variation of the N0mean_m parameter therefore represents the variation of the inoculum conditional on the experimenter. The mean values of the population dynamic latent variables K, r and N0 were assumed to be the same for all strains in a given glucose condition. The differences between strains were considered as a genetic random effect, statistically described by the standard deviations Ksd_m, rsd_m and N0sd_m of the normal distributions.

    Mathematically, for each strain s in each medium m, we chose to draw the latent key variables (K, r and N0) from independent normal distributions with parameters Kmean_m, rmean_m and N0mean_m as expected values. The other parameters, the standard deviations Ksd_m, rsd_m and N0sd_m, rule the range of variation of the variables around their mean in each glucose condition m:

    $$K_{s,m} \sim \mathcal{N}(Kmean_m, Ksd_m), \qquad r_{s,m} \sim \mathcal{N}(rmean_m, rsd_m), \qquad N0_{s,m} \sim \mathcal{N}(N0mean_m, N0sd_m).$$

    In other words, there is a random effect ε^K_{s,m} = K_{s,m} − Kmean_m, corresponding to the different behaviours of two strains s and s′ in a given glucose condition m (cov(ε^K_{s,m}, ε^K_{s′,m}) = 0), while there is correlation between data stemming from the same strain (cov(ε^K_{s,m}, ε^K_{s′,m}) = (Ksd_m)² when s = s′). This covariation gives the dependence structure of the model.

    Note that we explicitly allowed for genotype-by-environment interactions because the standard deviation of the latent variables depended on the environment m.
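The latent process of Sections 2.4 and 2.5 can be sketched as a forward simulation: strain-level key variables are drawn from normal distributions specific to a medium and then passed through Eq. (1). This is a minimal sketch under stated assumptions. The function names, the dictionary layout, the 24 h duration and every numerical value are hypothetical, and the rejection of non-positive draws is an added simplification that keeps the simulated curves meaningful; it is not described in the text for the latent variables.

```python
import math
import random


def logistic_size(K, r, N0, t):
    """Population size at time t from Eq. (1), the logistic model without lag phase."""
    growth = math.exp(r * t)
    return K * N0 * growth / (K + N0 * (growth - 1.0))


def draw_positive_normal(rng, mean, sd):
    """Draw from N(mean, sd), redrawing until the value is positive (assumption of this sketch)."""
    while True:
        value = rng.gauss(mean, sd)
        if value > 0.0:
            return value


def draw_strains(rng, n_strains, medium):
    """Draw (K, r, N0) for each strain of one medium from independent normal distributions."""
    return [
        (
            draw_positive_normal(rng, medium["Kmean"], medium["Ksd"]),
            draw_positive_normal(rng, medium["rmean"], medium["rsd"]),
            draw_positive_normal(rng, medium["N0mean"], medium["N0sd"]),
        )
        for _ in range(n_strains)
    ]


# Illustrative medium-level values only (CFU/mL for K and N0, per minute for r).
medium = {"Kmean": 90e6, "Ksd": 20e6, "rmean": 0.007, "rsd": 0.001,
          "N0mean": 2.5e6, "N0sd": 1e6}
rng = random.Random(1)
strains = draw_strains(rng, n_strains=12, medium=medium)

# One theoretical curve per strain, sampled every two hours over a hypothetical 24 h.
times = list(range(0, 1441, 120))
curves = [[logistic_size(K, r, N0, t) for t in times] for K, r, N0 in strains]
```

Because the standard deviations are attached to the medium, repeating the draw with another dictionary changes both the mean behaviour and the spread between strains, which is how the model accommodates genotype-by-environment interactions.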

    2.6. Description of the uncertainty related to measurement errors

    If we consider a strain s in a culture condition m at a time t, the observation Y_{s,m,t} is written:

    $$Y_{s,m,t} = N_{s,m,t} + \varepsilon^{N}_{s,m,t}$$

    where ε^N_{s,m,t} corresponds to the residual error around the theoretical law of N_{s,m,t}. The three replicates for each medium-by-strain combination were pooled, so that ε^N encompasses both technical and biological variations. This model was chosen because the variations between the three replicates of the same strain in the same medium were very low. In such a model, the variability between the replicates is taken into account, but we neglect the dependencies between data points belonging to the same replicate.

    As the inspection of the observations revealed that the experimental error increased with the population size, we chose to draw the error from a normal distribution, centred at 0, with a standard deviation equal to σN:

    $$\varepsilon^{N}_{s,m,t} \sim \mathcal{N}(0, \sigma N_{s,m,t}),$$

    where σ is the residual standard deviation multiplier of the model.

    2.7. Prior distributions

    The prior distributions of the means Kmean_m, rmean_m and N0mean_m were drawn from normal distributions (Table 2). The prior distribution for Kmean was chosen to be as wide and as flat as possible because the culture media covered a wide range of glucose conditions, and the carrying capacity Kmean should reflect the nutrient content of the medium. From the literature, the prior distribution for rmean was chosen with a mean value fixed at 0.01 min⁻¹ and a relatively large standard deviation (Wloch et al., 2001). Finally, the prior distributions for N0mean were centred at 1×10⁶, with a standard deviation allowing values up to 5×10⁶, because from 1 to 5×10⁶ cells were inoculated in a fresh culture at the beginning of the experiments.

    The prior distributions of the standard deviations of the model (Ksd, rsd, N0sd and σ) were chosen to favour large values. The underlying assumptions were that (i) there is variation between strains in a given environment (Ksd, rsd, N0sd) and (ii) the measurement error is large (σ). WinBUGS (the software used to perform the Bayesian inference, described in the next section) works with precision parameters (Kprec, rprec, N0prec and σ⁻²), which are the reciprocal of the square of the standard deviations. Precision parameters were drawn from G(10⁻³, 10⁻³), where G(a, b) is a Gamma distribution with shape parameter a and scale parameter b (Table 2). Setting a = b = 10⁻³ is a common Bayesian practice for picking non-informative precision priors.

    Table 2
    Prior distributions used for the parameters.

    Parameter   Distribution (a)                    P0.025 (b)   Median        P0.975 (c)
    Kmean_m     N(70×10⁶, 70.7×10⁶)                 …            …             207.62×10⁶
    rmean_m     N(0.01, 5.7×10⁻³)                   …            0.01          0.02
    N0mean_m    N(1×10⁶, 2.24×10⁶)                  …            …             …
    Ksd_m       Kprec_m (d) ~ G(0.001, 0.001)       0            7.92×10⁻³⁰⁵   3.04×10⁻⁸
    rsd_m       rprec_m (d) ~ G(0.001, 0.001)       0            7.92×10⁻³⁰⁵   3.04×10⁻⁸
    N0sd_m      N0prec_m (d) ~ G(0.001, 0.001)      0            7.92×10⁻³⁰⁵   3.04×10⁻⁸
    σ           σ⁻² (d) ~ G(0.001, 0.001)           0            7.92×10⁻³⁰⁵   3.04×10⁻⁸

    (a) N(a, b): normal distribution with expected value a and standard deviation b; G(a, b): Gamma distribution with shape parameter a and scale parameter b. Note that for Kmean, rmean and N0mean, we draw only in the positive part of normal distributions because the parameters can only be positive in our conditions. Note that the actual means of the normal distributions truncated to positive values for Kmean, rmean and N0mean are respectively 89.18×10⁶, 0.01 and 2.26×10⁶.
    (b) 2.5th percentile.
    (c) 97.5th percentile.
    (d) WinBUGS deals with precision parameters, i.e. the reciprocal of the square of the standard deviation.

    2.8. Bayesian inference

    Bayesian inferences of parameter values were performed using WinBUGS software (© MRC Biostatistics Unit; Spiegelhalter et al., 2003). After an adaptation phase (also called burn-in phase; Gilks et al., 1996) of 4000 iterations, the convergence of the Markov chain Monte Carlo (MCMC) algorithm was checked by visual inspection of the good mixing of three independent chains starting at three different initial values for each parameter. Inferences were made on the following 15000 iterations after the burn-in phase.

    2.9. Empirical posterior distributions

    Altogether, our model comprises 19 parameters: Kmean, rmean, N0mean, Ksd, rsd and N0sd for each of the 3 culture media, and σ. The model also comprises 36 latent variables (K_{s,m}, r_{s,m} and N0_{s,m}). Note that the biological and technical variability due to replicate data points is reflected by the posterior distribution of the latent variables. The posterior Monte Carlo samples were used directly to evaluate the statistics related to the parameters and the latent dynamic population variables (posterior means, standard deviations, medians and 95% credibility intervals). Joint posterior distributions of parameters and latent variables were studied using the function pairs under R software. The precision parameters obtained from WinBUGS were transformed into standard deviations, Ksd, rsd, N0sd and σ, to have the same unit for the variability and for the mean of the population dynamic key variables.

    3. Results

    A Bayesian approach was used for estimating population dynamic key variables in yeast, relying on a modelling framework in which the population size N depends not only on the parameters of a logistic function (K the carrying capacity, r the intrinsic growth rate and N0 the initial population size), but also on fixed and random factors related to environmental variation, to genetic differences between lines and to measurement error (see the modelling scheme in Fig. 1).
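The measurement model of Section 2.6 and the post-processing of Section 2.9 can be sketched in a few lines. This is a sketch under stated assumptions rather than the authors' code: the function names are hypothetical, the samples are simulated observations standing in for posterior Monte Carlo draws, and the percentile rule is a simple empirical one.

```python
import math
import random


def precision_to_sd(precision):
    """Convert a precision (reciprocal of the variance) into a standard deviation."""
    return 1.0 / math.sqrt(precision)


def simulate_observation(rng, n_theoretical, sigma):
    """Draw Y = N + error, with error ~ N(0, sigma * N), so the error grows with N."""
    return n_theoretical + rng.gauss(0.0, sigma * n_theoretical)


def summarise(samples):
    """Mean, median and empirical 2.5th and 97.5th percentiles of a list of samples."""
    ordered = sorted(samples)
    n = len(ordered)

    def percentile(p):
        # Simple empirical percentile: the value at rank p * n, capped at the last index.
        return ordered[min(n - 1, int(p * n))]

    return {
        "mean": sum(ordered) / n,
        "median": percentile(0.5),
        "p2.5": percentile(0.025),
        "p97.5": percentile(0.975),
    }


# Illustrative values only: a precision of 100 corresponds to sigma = 0.1.
rng = random.Random(0)
sigma = precision_to_sd(100.0)
observations = [simulate_observation(rng, 50e6, sigma) for _ in range(2000)]
summary = summarise(observations)
```

The interval between the two percentiles plays the role of the 95% credibility interval when the samples are posterior draws. Converting precisions to standard deviations first keeps the spread in the same unit as the means, as noted in Section 2.9.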

    3.1. Efficiency of HBM

    This modelling scheme was efficient for studying S. cerevisiae population dynamics. For each strain/medium combination, growth modelling allowed us to estimate the key variables K, r and N0, and to predict the resulting growth curves as shown in Fig. 2. As expected, the interval N_{s,m} ± 2σN (σ is the residual standard deviation of the model) included the majority of the experimental data points, indicating that both the model used and the way we described experimental error seemed to be relevant to describe S. cerevisiae population dynamics. Striking genotype-by-environment interactions could be observed for the carrying capacity, since the strain with the highest K value in 15% glucose (Fig. 2A) has the smallest value in 1% glucose (Fig. 2C).

    The comparisons between prior and posterior distributions of Kmean, rmean, Ksd and rsd are shown in Fig. 3. The prior distribution of Kmean was very flat and uninformative, whereas the three posterior distributions (one for each medium) were very narrow, with distinct means, even for Kmean_{1%} and Kmean_{0.25%} (Fig. 3A). For the rmean distributions, differences between prior and posterior distributions were smaller, probably because the prior distribution for rmean was chosen from relevant literature. Note that choosing a uniform uninformative prior distribution gave the same posterior distributions. Posterior distributions were narrower than the prior, and were distinct between media even though they largely overlapped. Posterior distributions of Ksd parameters were all Gamma-like distributions despite their quite different shapes (Fig. 3C). Finally, posterior distributions of rsd in the three different media merged and were quite different from the prior one, which indicates a similar genetic variability of the intrinsic growth rate in the three different culture conditions.

    Empirical posterior distributions are shown in Table 3, and illustrated in Fig. 4 for the 15% glucose conditions. The distributions of Kmean, rmean and N0mean were roughly symmetric, except for N0mean_{0.25%}, whereas posterior distributions of standard deviation parameters (Ksd, rsd and N0sd) were slightly skewed to the right.

    3.2. Environmental and genetic effects on population dynamics

    The environment and the genetic differences between strains had a strong effect on population dynamics. Descriptive statistics of empirical posterior distributions are given in Table 3. Kmean mean values increased when glucose increased in the medium (Kmean_{0.25%} = 35.57×10⁶, Kmean_{1%} = 42.33×10⁶ and Kmean_{15%} = 96.02×10⁶), and rmean mean values decreased when the environment was richer (rmean_{0.25%} = 1.13×10⁻², rmean_{1%} = 8.65×10⁻³ and rmean_{15%} = 6.86×10⁻³). The differences between the N0mean mean values reflect experimental variations in the cell density at the beginning of the kinetics: in the 15% glucose medium, more cells were inoculated (N0mean_{15%} = 2.71×10⁶) than in the two other media (N0mean_{1%} = 0.47×10⁶ and N0mean_{0.25%} = 0.34×10⁶).

    The standard deviations Ksd and rsd directly reflect the genetic variability of the population dynamic latent variables K and r among our collection of strains in a given medium. Descriptive statistics for standard deviation parameters are given in Table 3. The variability of the carrying capacity was about 2 times higher in the 15% glucose medium (Ksd15% = 21.46×10^6) than in the 1% glucose medium (Ksd1% = 12.76×10^6), and about 3 times higher than in the 0.25% glucose medium (Ksd0.25% = 7.85×10^6), indicating that genetic variability between strains depended on the medium. In other words, we found genotype × environment interactions for K. In contrast, the genetic variability of the intrinsic growth rate (rsd) was high but robust with regard to environmental changes (rsd ≈ 1.44×10^-2 in the three media). The variability of the initial population size was more than 10 times higher in the 15% glucose medium than in the two other media (1.08×10^6 for N0sd15% vs. 0.14×10^6 and 6.45×10^4 respectively for N0sd1% and N0sd0.25%). Finally, the residual standard deviation multiplier of the model was estimated at 1.398. Thus the residual around the theoretical law of population dynamics Ns,m,t is drawn in a normal distribution centered on 0 with a standard deviation equal to 1.39 × Ns,m,t.

    Fig. 2. Modelling the population dynamics of two strains grown in two culture media. Red solid diamonds represent experimental data. Black curves represent the evolution of the modelled population size Ns,m over time from K, r and N0 estimates. Blue dot dashed and blue dotted curves represent respectively one and two residual standard deviations of the model around Ns,m (the uncertainty related to the measurement). A and C represent respectively a given strain in the 15% and 1% glucose media, while B and D represent another strain grown respectively in the 15% and 1% glucose media.

    Fig. 3. Comparison of prior and posterior distributions of population dynamic parameters Kmean (A), rmean (B), Ksd (C) and rsd (D). Red dot dashed, black dotted and blue curves represent respectively posterior distributions in 15%, 1% and 0.25% media. Green dashed curves represent prior distributions.

    Table 3
    Descriptive statistics of empirical posterior distributions of parameters.

    Parameter                 Mean          S.D.          P0.025 (a)    Median        P0.975 (b)
    Kmean15%                  96.02×10^6    6.76×10^6     82.35×10^6    96.01×10^6    109.5×10^6
    Kmean1%                   42.33×10^6    4.06×10^6     34.29×10^6    42.28×10^6    50.56×10^6
    Kmean0.25%                35.57×10^6    3.06×10^6     29.71×10^6    35.48×10^6    41.93×10^6
    rmean15%                  6.86×10^-3    3.36×10^-3    4.55×10^-4    6.81×10^-3    1.37×10^-2
    rmean1%                   8.65×10^-3    3.34×10^-3    2.21×10^-3    8.63×10^-3    1.52×10^-2
    rmean0.25%                1.13×10^-2    3.39×10^-3    4.59×10^-3    1.13×10^-2    1.79×10^-2
    N0mean15%                 2.71×10^6     0.35×10^6     2.02×10^6     2.70×10^6     3.41×10^6
    N0mean1%                  0.47×10^6     0.10×10^6     0.37×10^6     0.45×10^6     0.75×10^6
    N0mean0.25%               0.34×10^6     0.09×10^6     0.19×10^6     0.33×10^6     0.56×10^6
    Ksd15%                    21.46×10^6    5.31×10^6     13.79×10^6    20.57×10^6    34.07×10^6
    Ksd1%                     12.76×10^6    3.23×10^6     8.09×10^6     12.21×10^6    20.42×10^6
    Ksd0.25%                  7.85×10^6     3.37×10^6     0.53×10^6     7.79×10^6     14.84×10^6
    rsd15%                    1.44×10^-2    3.33×10^-3    9.53×10^-3    1.37×10^-2    2.24×10^-2
    rsd1%                     1.43×10^-2    3.26×10^-3    9.5×10^-3     1.37×10^-2    2.22×10^-2
    rsd0.25%                  1.44×10^-2    3.28×10^-3    9.53×10^-3    1.38×10^-2    2.22×10^-2
    N0sd15%                   1.08×10^6     0.32×10^6     0.60×10^6     1.03×10^6     1.85×10^6
    N0sd1%                    0.14×10^6     0.18×10^6     2.98×10^4     8.75×10^4     0.78×10^6
    N0sd0.25%                 6.45×10^4     3.94×10^4     2.05×10^4     5.41×10^4     0.17×10^6
    Residual S.D. multiplier  1.39          2.63×10^-2    1.34          1.39          1.44

    (a) 2.5th percentile.
    (b) 97.5th percentile.

    3.3. Joint distributions and prediction

    Over all strains in a given medium, no correlation was observed when studying the joint posterior distributions of the parameters of the model, as illustrated by the correlation coefficients and the smooth lines in Fig. 4. Parameter joint distributions are given for illustration in the 15% glucose medium, but the lack of correlation is also valid in the two other media. On the other hand, for a given strain s in a given condition m, key latent variables K and r were negatively correlated, as illustrated in Fig. 5. The variance of the joint posterior distribution of the latent variables Ks,m and rs,m stems from microenvironmental variations and reflects the variability between two replicates of the same strain in the same environment. From a prediction point of view, these results lead to different ways of drawing latent variables. When correlations are detected, it becomes necessary to draw jointly a K value and an r value in their empirical joint posterior distribution. With our data set, taking into account joint distributions becomes particularly important to model different replicates of a given strain s in a given condition m.
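    The joint drawing of a K value and an r value can be sketched as follows. This is only an illustration under stated assumptions: the joint posterior of one strain is summarized by a bivariate normal distribution, the correlation rho = -0.6 and the means and standard deviations are hypothetical placeholders, and the function names draw_joint and sample_correlation are ours.

```python
import math
import random

def draw_joint(mean_k, sd_k, mean_r, sd_r, rho, rng):
    """Draw one (K, r) pair from a bivariate normal with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    k = mean_k + sd_k * z1
    # Mixing z1 into the second coordinate induces the correlation rho.
    r = mean_r + sd_r * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return k, r

def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
# Hypothetical strain-level values; rho = -0.6 is an assumed negative correlation.
pairs = [draw_joint(96.0e6, 7.0e6, 7.0e-3, 1.0e-3, -0.6, rng) for _ in range(5000)]
print(round(sample_correlation(pairs), 2))
```

    Drawing K and r independently from their marginal distributions would ignore this dependence and produce combinations of K and r that the joint posterior makes unlikely.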

    A major interest of Bayesian modelling is its predictive capacity. From our modelling achievement, it becomes possible to predict the typical behaviour of any strain grown in our different glucose conditions, as illustrated in Fig. 6. We first simulated population dynamics in 15% and 1% glucose media by drawing 30 values of K and r in the Kmean and rmean empirical joint posterior distributions illustrated in Fig. 4. The 30 growth curves obtained represent the mean typical behaviour in each medium under the assumption that all strains behave like a hypothetical average one (Fig. 6A and B) and reflect the effect of the culture medium. Then, to demonstrate the effect of genetic variability on population dynamics, we drew 30 K and r values respectively in N(Kmean, Ksd) and in N(rmean, rsd) with respect to the joint posterior distributions of these parameters. As illustrated in Fig. 6, taking into account the genetic variability between strains led to more variable behaviours than predicted by the sole effect of the culture medium on population dynamic key variables.
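    The two-stage simulation described above can be sketched as follows for the 15% glucose medium. This is a minimal illustration, not the code used for Fig. 6. It assumes the usual closed-form solution of the logistic equation, approximates the posterior distributions of Kmean and rmean by independent normal laws built from the means and standard deviations of Table 3, plugs in the posterior means of Ksd and rsd, and redraws non-positive values. The time grid and the helper names are hypothetical.

```python
import math
import random
import statistics

def logistic(t, k, r, n0):
    """Closed-form logistic curve: population size at time t."""
    return k / (1.0 + (k / n0 - 1.0) * math.exp(-r * t))

def positive_gauss(rng, mean, sd):
    """Redraw until positive (an assumption of this sketch, not of the paper)."""
    while True:
        x = rng.gauss(mean, sd)
        if x > 0.0:
            return x

rng = random.Random(42)
N0 = 1.0e6                      # common initial density, as in the Fig. 6 caption
times = range(0, 2001, 100)     # hypothetical time grid, in the time unit of r

# Table 3, 15% glucose: posterior mean and S.D. of Kmean, rmean; means of Ksd, rsd
KMEAN, KMEAN_SD, KSD = 96.02e6, 6.76e6, 21.46e6
RMEAN, RMEAN_SD, RSD = 6.86e-3, 3.36e-3, 1.44e-2

average_k, strain_k, curves = [], [], []
for _ in range(30):
    k_mean = positive_gauss(rng, KMEAN, KMEAN_SD)   # "average strain" carrying capacity
    r_mean = positive_gauss(rng, RMEAN, RMEAN_SD)
    k = positive_gauss(rng, k_mean, KSD)            # add between-strain variability
    r = positive_gauss(rng, r_mean, RSD)
    average_k.append(k_mean)
    strain_k.append(k)
    curves.append([logistic(t, k, r, N0) for t in times])

# Spread of carrying capacities without and with genetic variability
print(statistics.stdev(average_k) < statistics.stdev(strain_k))
```

    The carrying capacities drawn with the between-strain standard deviation are much more dispersed than those of the hypothetical average strain, which is the effect shown by the grey curves of Fig. 6.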

    4. Discussion

    We developed a probabilistic model to describe population dynamics of 12 different strains grown in three culture media differing by their glucose content. To our knowledge, none of the previous studies relied on a Bayesian framework to model S. cerevisiae population dynamics.

    4.1. Differences between frequentist and Bayesian approaches

    Data presented here had previously been analyzed by fitting a population dynamic model using a frequentist approach. Population key variables were then recovered and an explicative secondary model (ANOVA) had been used to determine if the forcing factors (in our case, medium variation and strain variability) did have a significant effect on the estimated parameter values. With this method, uncertainty caused by the lack of fit of the population dynamic model to the biological data was not taken into account. Yet we had just estimates, and less confidence should be granted to values obtained with fewer points. In addition, non-linear relationships may prevent a rapid and unbiased convergence of such estimates (the speed of convergence depends on the number of data points necessary to reach a given estimate, and the bias corresponds to the differences between estimates obtained from different experiments). This strategy can lead to rough approximations, and even to wrong conclusions in some extreme cases. However, for this particular study, the two approaches produced similar estimates.

    Within the frequentist context, a better strategy could have been to reconstruct an all-inclusive analysis of variance based on a global non-linear procedure (Mc Culloch et al., 2008; Molenberghs and Verbeke, 2006; Muller and Stewart, 2006). There are now powerful EM algorithms (McLachlan and Krishnan, 1996) for the inference of a broad range of non-linear models (e.g. multi-level models, mean-dispersion models, longitudinal models with individual evolutions ruled by differential equations, etc.). However, in this kind of model, tests for significant effects of controlling factors need to be adapted and developed specifically for each type of non-linear model. These methods may be powerful in terms of precision of parameter estimators, but are reserved to highly skilled statisticians since very few user-friendly software packages are available. For instance, SAS Nlmix procedures, Monolix and R Nlme routines cannot presently deal with a statistical model like the one we have used. Moreover, from a practical point of view, the classical frequentist framework encounters obstacles for the treatment of missing data, which either should be deleted or replaced by approximated values. In Bayesian settings, missing data are just considered as latent variables estimated after consideration of the data. No particular treatment of missing values is then needed for the inference.

    Fig. 4. Empirical joint posterior distributions of parameters Kmean15%, rmean15%, Ksd15%, rsd15% and the residual standard deviation multiplier, corresponding adjusted distributions and correlation coefficients. Empirical joint posterior distributions are shown on the lower panel of the figure (under the diagonal). The diagonal contains adjusted posterior densities of parameters Kmean15%, rmean15%, Ksd15%, rsd15% and the residual standard deviation multiplier. Correlation coefficients between parameters are given in the upper panel of the figure (over the diagonal).

    Fig. 5. Empirical joint posterior distributions of latent key variables K and r, corresponding adjusted distributions and correlation coefficients for one strain in the three environmental conditions. Empirical joint posterior distributions are shown on the lower panel of each sub-figure (under the diagonal). The diagonal contains adjusted posterior densities of variables K and r. Correlation coefficients are given in the upper panel of each sub-figure (over the diagonal).

    Fig. 6. Modelling the typical population dynamics of S. cerevisiae strains in two different environments. Population dynamics was modelled according to probability laws obtained from empirical posterior distribution adjustments. A. 30 growth curves (red lines) were simulated in 15% glucose medium, with the same initial density (N0 = N0mean = 10^6 cell/mL), and Kmean15% and rmean15% drawn in the empirical joint posterior distributions. The 30 grey lines represent the additional information brought after incorporating genetic variability (genetic differences among strains) on population dynamic key variables. For each grey line, K and r are respectively drawn in N(Kmean15%, Ksd15%) and in N(rmean15%, rsd15%) taking into account correlations related to empirical joint posterior distributions. B. Same simulation conditions as in A, except that the glucose concentration was 1%.

    A fundamental advantage of the Bayesian approach is the possibility to combine various sources of information to formulate hypotheses on parameters of interest, and thus to define their prior distributions. In the classical frequentist framework, hypotheses on parameter distributions are not related to expert knowledge or to previous experiments, while in the Bayesian framework, posterior distributions of a first study could be recycled as prior distributions for a second one. The Bayesian approach can be viewed as a statistical learning machinery that progressively updates the state of knowledge about a specific phenomenon by processing data from a particular field of research.
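    The recycling of a posterior distribution as the prior of a later study can be illustrated with the simplest conjugate case. This is a toy sketch, unrelated to the model of this paper: it assumes a normal mean with known observation standard deviation, and the data, the vague prior and the function name update_normal are hypothetical.

```python
def update_normal(prior_mean, prior_sd, data, obs_sd):
    """Posterior of a normal mean with known observation S.D. (conjugate update)."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = len(data) / obs_sd ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + sum(data) / obs_sd ** 2) / post_prec
    return post_mean, post_prec ** -0.5

# Hypothetical measurements of one quantity from two successive studies
study_1 = [9.8, 10.4, 10.1]
study_2 = [10.6, 10.2]

# Study 1 starts from a vague prior; its posterior becomes the prior of study 2.
mean_1, sd_1 = update_normal(0.0, 100.0, study_1, obs_sd=0.5)
mean_2, sd_2 = update_normal(mean_1, sd_1, study_2, obs_sd=0.5)

# Analysing both studies at once from the vague prior gives the same answer.
mean_all, sd_all = update_normal(0.0, 100.0, study_1 + study_2, obs_sd=0.5)
print(abs(mean_2 - mean_all) < 1e-9, abs(sd_2 - sd_all) < 1e-9)
```

    In this conjugate setting the sequential analysis and the pooled analysis coincide, and the posterior standard deviation shrinks as data accumulate.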

    4.2. Bayesian modelling, a thinking framework for biologists

    In statistical terms, we developed a hierarchical non-linear mixed-effects model with heteroscedastic variance. The conception of this model was quite simple and followed the natural way of thinking of biologists. The first step was to imagine the phenomenological process in action. Microbial populations are known to grow in 3 phases: lag, exponential and stationary. The classical mathematical description of this dynamics is a logistic equation summarized by 3 population dynamic variables: K, r and N0. This is the deterministic part of the model. Then, variability could be added to the process by drawing these latent variables in probabilistic distributions. The way these variables are drawn reflects directly how biologists understand the process and its behavioural similarities. We could have considered the effect of glucose concentration on population dynamic key variables as a random effect, but we chose to model it as a fixed effect because we had only three culture media, not homogeneously distributed (15%, 1% and 0.25%) and not representative of a typical environmental effect. Statistically, this means there is an average value for each parameter in each glucose condition (Kmean, rmean and N0mean) and that these three experimental situations are considered as independent. Then we introduced a random effect due to genetic variability, described by the standard deviation of the law from which the population latent variables are drawn (Ksd, rsd and N0sd). The last step was to incorporate uncertainty in measurements. Since measurement errors increased with population size, which was due to the higher number of dilutions required to plate the same number of cells, we simply chose to draw the residual in a normal distribution centered on 0 with a standard deviation equal to the residual standard deviation multiplier times Ns,m,t, i.e. the standard deviation of the error was proportional to the population size.
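    The three steps of this construction (deterministic logistic curve, random strain effect within a fixed medium effect, measurement error proportional to population size) can be read as a generative recipe, sketched below. This is not the WinBUGS model used for inference. It assumes the usual closed-form solution of the logistic equation, plugs in the posterior means of Table 3 as if they were known constants, redraws non-positive latent variables, and uses a hypothetical time grid. Gaussian noise with such a large multiplier can yield negative simulated counts, which the sketch does not truncate.

```python
import math
import random

def logistic(t, k, r, n0):
    """Deterministic part: closed-form logistic growth curve."""
    return k / (1.0 + (k / n0 - 1.0) * math.exp(-r * t))

def positive_gauss(rng, mean, sd):
    """Redraw until positive (a simplification of this sketch)."""
    while True:
        x = rng.gauss(mean, sd)
        if x > 0.0:
            return x

# Fixed medium effect: one (mean, sd) pair per parameter and per medium (Table 3 means)
MEDIA = {
    "15%":   {"K": (96.02e6, 21.46e6), "r": (6.86e-3, 1.44e-2), "N0": (2.71e6, 1.08e6)},
    "1%":    {"K": (42.33e6, 12.76e6), "r": (8.65e-3, 1.43e-2), "N0": (0.47e6, 0.14e6)},
    "0.25%": {"K": (35.57e6, 7.85e6),  "r": (1.13e-2, 1.44e-2), "N0": (0.34e6, 6.45e4)},
}
MULTIPLIER = 1.39   # residual S.D. multiplier (posterior mean in Table 3)

def simulate_strain(rng, medium, times):
    """Random strain effect, then noisy counts with S.D. proportional to N."""
    p = MEDIA[medium]
    k, r, n0 = (positive_gauss(rng, *p[name]) for name in ("K", "r", "N0"))
    truth = [logistic(t, k, r, n0) for t in times]
    observed = [n + rng.gauss(0.0, MULTIPLIER * n) for n in truth]
    return truth, observed

rng = random.Random(0)
truth, observed = simulate_strain(rng, "15%", range(0, 1501, 150))
```

    Calling simulate_strain repeatedly for the same medium mimics different strains, whereas changing the medium key switches to another set of fixed effects.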

    Our approach borrowed many concepts from the seminal paper of Delignette-Muller et al. (2006) regarding Bayesian modelling of growth curves for risk assessment. Specifically: (i) we employed the well-known logistic equation as a primary growth model; (ii) using their words to separate "uncertainty" and "variability" of a model, we tried to account for the main sources of variability (random effects stemming from genetic differences between strains) and uncertainty (essentially measurement errors and only partial knowledge about parameters); and (iii) the same powerful MCMC techniques were used to compute the posterior distribution of the unknowns. However, major differences in the modelling assumptions were made in our article, since the main sources of variability and uncertainty were different. In Delignette-Muller et al. (2006), the researchers had to consider growth parameters as a function of changing conditions over a large range, since in addition to their own 61 curves, 35 others were taken from 10 publications. As all our data have been obtained specifically for this study under controlled conditions, we did not need such a secondary growth model. Similarly, due to the specific features of our strains data, we did not need to introduce a lag time in the model. In that sense, our model is simpler. Nevertheless, we had to relax the assumption of a homogeneous measurement error variance over each growth curve described in Delignette-Muller et al. (2006). Thus, we proposed here a more general way to deal with measurement uncertainties. Because we had the same strains replicated in different conditions, our model allowed us to take genotype × environment interactions explicitly into account, which was not the case in Delignette-Muller et al. (2006). Concerning the applications, Delignette-Muller et al. (2006) used parameter estimations that were performed separately on one bacterial species and on the total microbial flora to predict the results of the competition between the bacterial species of interest and the total flora. Because we were interested in predicting the environmental conditions that favour yeast growth, we used our model to predict the range of variation for growth in different environments.

    Thus the construction of the model was quite intuitive. Moreover, we would like to emphasize that Bayesian approaches are very powerful and accessible to model any dynamical or spatial processes, such as the impact of any environmental abiotic (temperature, oxidative stress, poisoning) or biotic factors, the competition between two or more micro-organism species or strains, etc. A free dedicated software, WinBUGS (MRC Biostatistics Unit; Spiegelhalter et al., 2003), is available and a library has been developed under R (library BRugs, http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/2.9/BRugs_0.5-3.zip, the R Development Core Team), giving scientists the opportunity to develop their own models.

    4.3. Additional improvements

    Another possible approach could have been functional data analysis (Ramsay and Silverman, 2005), which aims at explaining the variations of a response function of time by the information of other explanatory variables, such as, in our case, the culture media or the strains. The analysis consists in converting the data into a system of basis functions which are combined linearly to define the actual functions. Then, a collection of statistical techniques is applied to the parameters of the basis functions. Statistical packages are available for R and Matlab. This approach is really efficient for answering questions like "How do the curves vary from one condition to another?" or "Are the differences between the curves significant?". However, because parameter estimation is performed on basic mathematical functions, and not on the actual response curves, the estimated parameters may lack biological meaning. In the case of population dynamic analysis, the key variables of the latent process, K, r and N0, have a direct biological meaning.

    Various types of equations could have been used to model S. cerevisiae population dynamics. We chose to use a logistic equation, because it is parsimonious (three variables having a biological meaning are sufficient) and possesses an analytical solution which properly describes S. cerevisiae population dynamics. A more sophisticated population dynamic model based on ordinary differential equations, as described in Billoir et al. (2008), could have been used. In this type of modelling, the specific add-on module PKBUGS developed by Lunn can be used, as in pharmacokinetics, whenever the ordinary differential equation models with unknown coefficients cannot be analytically solved (Lunn et al., 1999).
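    The practical value of an analytical solution can be illustrated by comparing it with a numerical integration of the same differential equation. The sketch below assumes the standard logistic form dN/dt = r N (1 - N/K) and a classical fourth-order Runge-Kutta scheme; the step size and end time are hypothetical, and the parameter values are the 15% glucose posterior means of Table 3.

```python
import math

def logistic_closed_form(t, k, r, n0):
    """Analytical solution of dN/dt = r * N * (1 - N / K) with N(0) = n0."""
    return k / (1.0 + (k / n0 - 1.0) * math.exp(-r * t))

def logistic_rk4(t_end, k, r, n0, step=1.0):
    """Integrate the same ODE numerically with the classical Runge-Kutta scheme."""
    def rate(n):
        return r * n * (1.0 - n / k)

    n = n0
    for _ in range(int(round(t_end / step))):
        k1 = rate(n)
        k2 = rate(n + 0.5 * step * k1)
        k3 = rate(n + 0.5 * step * k2)
        k4 = rate(n + step * k3)
        n += step * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return n

# Posterior means for the 15% glucose medium (Table 3); the end time is hypothetical
K, R, N0 = 96.02e6, 6.86e-3, 2.71e6
exact = logistic_closed_form(1000.0, K, R, N0)
numeric = logistic_rk4(1000.0, K, R, N0)
print(abs(numeric - exact) / exact)
```

    With the closed form, each evaluation of the curve inside an MCMC iteration costs a single expression, whereas an ODE-based model requires a solver call of this kind for every proposed parameter value.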

    Competing models could have been designed and compared via their posterior probabilities (e.g. with Bayes factors; Kass and Raftery, 1995) or according to their predictive ability on a test sample (predictive posterior checks; Gelman et al., 2003). The hypothesis of conditional independence between media could be relaxed. For instance, a continuous covariation of the parameters with the environments could be added. A random strain origin effect independent from glucose concentration could have been declared, and variance estimates could have been compared to the estimates of the present model. In addition, a more complex model would assume prior correlations between population dynamic features with a 3-dimensional random effect instead of three independent random effects for each of these. This would help take into account biological adjustments in the life-history strategy of strains, such as an increased value of r to compensate for a below-average K. Other refinements may affect the measurement error structure: at very low additional computation costs, a two-parameter proportional error, in which a second parameter is added to the residual standard deviation multiplier applied to Ns,m,t, could be studied in the model.

    It is also relatively straightforward to develop the same type of model for more than two factors, provided that enough data are collected to convey sufficient information to update the additional parameter priors involved in the analysis. However, when many factors interplay, we can no longer trust the common microbiological good sense to focus on a single model, and many competing models can be designed. When the number of possible models gets large (and this number grows very rapidly with the number of factors), the Bayesian model selection problem may become tricky and refined MCMC strategies (clearly out of the scope of this paper) should be used to select the appropriate model (reversible jump MCMC as in Green, 1994, or stochastic search as in chapter 4 of Marin and Robert, 2007).
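    As a rough illustration of how quickly the model space grows, the candidate models can be enumerated in the simplest case where each factor is either included in or excluded from the model. Interactions, which would make the count grow even faster, are ignored, and the factor names are hypothetical.

```python
from itertools import combinations

def candidate_models(factors):
    """All subsets of factors, each subset being one candidate model."""
    models = []
    for size in range(len(factors) + 1):
        models.extend(combinations(factors, size))
    return models

# Hypothetical factor names; only main effects are counted here
for n_factors in range(1, 7):
    factors = ["factor_%d" % i for i in range(1, n_factors + 1)]
    print(n_factors, len(candidate_models(factors)))
```

    The count doubles with every additional factor, so an exhaustive comparison of all candidates soon becomes impractical.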

    4.4. A measure of G*E interactions

    Genotype-by-environment interactions reflect the way a genotype behaves across different environments. In classical quantitative genetics, an ANOVA is performed on the trait considered, and the G*E effect is inferred when (i) the ranking of the genotypes changes according to the medium or (ii) the inter-individual variance varies in the different environmental conditions (Lynch and Walsh, 1997).

    Bayesian inference provides us with both the empirical posterior distributions of growth key variables (Ks,m, rs,m, N0s,m) of each strain in each glucose condition, and the standard deviation (Ksdm, rsdm, N0sdm) of those variables in each glucose condition, which reflects strain variability in each condition. Then comparisons of the ranking of the average growth key variables of the strains could be performed (for example rank correlation of parameter values in two different environments), as well as comparisons of posterior distributions of standard deviations of parameters among culture conditions (using for example Kolmogorov–Smirnov tests).
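    Both comparisons can be sketched with a few lines of code. The example below is only illustrative: the strain values and posterior draws are hypothetical, the Spearman formula used assumes no ties, and only the Kolmogorov-Smirnov statistic is computed, not its p-value.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation for two equally long samples without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the ECDFs."""
    def ecdf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x))
               for x in sample_a + sample_b)

# Hypothetical posterior means of K (in 10^6 cells) for five strains in two media
k_rich = [120.0, 95.0, 80.0, 101.0, 70.0]
k_poor = [40.0, 38.0, 30.0, 36.0, 33.0]
print(spearman(k_rich, k_poor))

# Hypothetical posterior draws of Ksd (in 10^6 cells) in two media
ksd_rich = [21.0, 19.5, 24.2, 20.3, 22.8]
ksd_poor = [7.9, 8.4, 6.1, 9.0, 7.2]
print(ks_statistic(ksd_rich, ksd_poor))
```

    A rank correlation below 1 indicates that the ranking of the strains changes between media (criterion i), while a large Kolmogorov-Smirnov statistic indicates that the between-strain standard deviation differs between media (criterion ii).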

    In our case, the means of the empirical posterior distributions of Ksd are very different in the three culture media while the means of rsd stay constant. The between-strain genetic differences for carrying capacity are increased in rich media compared to poor ones, while genetic variation of the intrinsic growth rate stays robust towards environmental changes. This is a good example of standing genetic variation expressed only under certain environmental conditions (Ksd) and canalization (rsd) (Waddington, 1942). HBM would be powerful for providing better predictions of the genetic diversity of population dynamic key variables in relation to environmental variation.

    Last but not least, another fundamental aspect of the Bayesian approach is the possibility to study the joint posterior distributions of latent variables and parameters of interest. Studying the joint posterior distributions of Kmean and rmean in each glucose condition should inform us about the possible existence of a genetic trade-off between carrying capacity and intrinsic growth rate in S. cerevisiae populations. We did not detect it in our data. However, we showed that modelling the population growth of a given strain s in a given condition m requires taking into account the conditional dependence between key latent variables K and r. In other words, the different possible ways a given strain can grow in a given condition (because of the microenvironmental variations, within-population variations and stochasticity between technical replicates) are constrained by a trade-off between K and r (Novak et al., 2006).

    4.5. Further prospects

    The next step would be to extend this model and to classify strains according to their population dynamic behaviour. By defining a fixed effect of the industrial origin of strains (as we did for the glucose effect), and modelling population growth in typical conditions that maximize the differences between strains coming from different industrial origins, we could determine the adjusted distributions of population dynamic key variables. Then, when testing a novel natural strain in this environment, it would be possible to assign a measure of "goodness" of this strain in a specific industrial condition by classifying it according to its population dynamics. Another possible application could be the choice of typical growing conditions in order to maximize the differences between industrial origins. A medium that seems appropriate to maximize differences between strains coming from different industrial origins would be a medium in which all strains are able to grow (not too stressful) but in which strains could exhibit quantitative variation for population dynamic parameters. For example, in our experiment, the 15% glucose medium typically maximizes the variance for K between strains (Fig. 6). However, from a statistical perspective, it is rather a challenge to plan in practice an optimal design, which would maximize beforehand the expected differences between strains coming from different industrial origins. In the Bayesian setting (Müller, 1999), a utility function has to be elicited from the end-users and burdensome high-dimensional maximization and integration have to be performed (Amzal et al., 2006).

    Acknowledgements

    The authors would like to thank Yves Rousselle, Adrienne Ressayre and Thibault Nidelet for critical comments and programming assistance. We are also grateful to reviewers for their critical comments. This work was supported by a PhD fellowship of the Ministère de l'Enseignement Supérieur et de la Recherche to Aymé Spor, and indirectly benefited from a grant by the French Agence Nationale de la Recherche (ANR Project ADAPTALEVURE no. NT05-4_45721).

    References

    Amzal, B., Bois, F., Parent, E., Robert, C.P., 2006. Bayesian optimal design via interacting particle systems. Journal of the American Statistical Association 101 (474), 773–785.

    Barker, G.C., Malakar, P.K., Peck, M.W., 2005. Germination and growth from spores: variability and uncertainty in the assessment of food borne hazards. International Journal of Food Microbiology 100, 67–76.

    Beltran, G., Torija, M.J., Novo, M., Ferrer, N., Poblet, M., Guillamon, J.M., Rozes, N., Mas, A., 2002. Analysis of yeast populations during alcoholic fermentation: a six year follow-up study. Systematic and Applied Microbiology 25, 287–293.

    Bernier, J., Parent, E., Boreux, J.J., 2000. Statistique pour l'environnement. Lavoisier, Paris.

    Billoir, E., Delignette-Muller, M.L., Péry, A.R., Charles, S., 2008. A Bayesian approach to analyzing ecotoxicological data. Environmental Science & Technology 42 (23), 8978–8984.

    Boekhout, T., Robert, V., 2003. Yeasts in Food. Woodhead Publishing Limited, Cambridge, England.

    Calder, C., Lavine, M., Müller, P., Clark, J.S., 2003. Incorporating multiple sources of stochasticity into dynamic population models. Ecology 84, 1395–1402.

    Clark, J.S., 2003. Uncertainty and variability in demography and population growth: a hierarchical approach. Ecology 84, 1370–1381.

    Clark, J.S., Gelfand, A.E., 2006. Hierarchical Modeling for the Environmental Sciences: Statistical Methods and Applications. Oxford University Press Inc., New York.

    Delignette-Muller, M.L., Cornu, M., Pouillot, R., Denis, J.B., 2006. Use of Bayesian modeling in risk assessment: application to growth of Listeria monocytogenes and food flora in cold-smoked salmon. International Journal of Food Microbiology 106, 195–208.

    Domizio, P., Lencioni, L., Ciani, M., Di Blasi, S., Pontremolesi, C., Sabatelli, M.P., 2007. Spontaneous and inoculated yeast populations dynamics and their effect on organoleptic characters of Vinsanto wine under different process conditions. International Journal of Food Microbiology 115, 281–289.

    Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 2003. Bayesian Data Analysis. Chapman & Hall/CRC.

    Gilks, W.R., Richardson, S., Spiegelhalter, D.J., 1996. Markov Chain Monte Carlo in Practice. Chapman & Hall, London.

    Green, P.J., 1994. Reversible jump MCMC computation and Bayesian model determination. Tech. Report. University of Bristol.

    Hooten, M.B., Wikle, C.K., Dorazio, R.M., Royle, J.A., 2007. Hierarchical spatiotemporal matrix models for characterizing invasions. Biometrics 63, 558–567.

    Kass, R.E., Raftery, A.E., 1995. Bayes factors. Journal of the American Statistical Association 90, 773–795.

    Kéry, M., Royle, J.A., 2008. Hierarchical Bayes estimation of species richness and occupancy in spatially replicated surveys. Journal of Applied Ecology 45, 589–598.

    Lunn, D.J., Wakefield, J., Thomas, A., Best, N., Spiegelhalter, D., 1999. PKBugs User Guide. Dept. Epidemiology and Public Health, Imperial College School of Medicine, London.

    Lynch, M., Walsh, B., 1997. Genetics and Analysis of Quantitative Traits. Sinauer Associates, Inc., Sunderland, Massachusetts.

    MacArthur, R., Wilson, E.O., 1967. The Theory of Island Biogeography. Princeton University Press.

    Marin, J.M., Robert, C.P., 2007. Bayesian Core: a Practical Approach to Computational Bayesian Statistics. Springer, New York.

    Mc Culloch, C.E., Searle, S.R., Neuhaus, J.M., 2008. Generalized, Linear and Mixed Models. Wiley-Interscience.

    McLachlan, G.J., Krishnan, T., 1996. The EM Algorithm and Extensions. Wiley-Interscience.

    Membre, J.M., Leporq, B., Vialette, M., Mettler, E., Perrier, L., Thuault, D., Zwietering, M., 2005. Temperature effect on bacterial growth rate: quantitative microbiology approach including cardinal values and variability estimates to perform growth simulations on/in food. International Journal of Food Microbiology 100, 179–186.

    Molenberghs, G., Verbeke, G., 2006. Models for Discrete Longitudinal Data. Springer, Berlin/Heidelberg.

    Müller, P., 1999. Simulation-based optimal design. Bayesian Statistics 6, 459–474.

    Muller, K.E., Stewart, P.W., 2006. Linear Model Theory: Univariate, Multivariate, and Mixed Models. Wiley-Interscience.

    Nauta, M.J., 2000. Separation of uncertainty and variability in quantitative microbial risk assessment models. International Journal of Food Microbiology 57, 9–18.

    Nauta, M.J., 2002. Modeling bacterial growth in quantitative microbiological risk assessment: is it possible? International Journal of Food Microbiology 73, 297–304.

    Novak, M., Pfeiffer, T., Lenski, R.E., Sauer, U., Bonhoeffer, S., 2006. Experimental tests for an evolutionary trade-off between growth rate and yield in E. coli. The American Naturalist 168, 242–251.

    Pouillot, R., Albert, I., Cornu, M., Denis, J.B., 2003. Estimation of uncertainty and variability in bacterial growth using Bayesian inference. Application to Listeria monocytogenes. International Journal of Food Microbiology 81, 87–104.

    Ramsay, J., Silverman, B.W., 2005. Functional Data Analysis. Springer, New York.

    Shorten, P.R., Membre, J.M., Pleasants, A.B., Kubaczka, M., Soboleva, T.K., 2004. Partitioning of the variance in the growth parameters of Erwinia carotovora on vegetable products. International Journal of Food Microbiology 93, 195–208.

    Spiegelhalter, D.J., Thomas, A., Best, N., Lunn, D., 2003. WinBUGS Version 1.4 User Manual. MRC Biostatistics Unit, Cambridge, UK.

    Spor, A., Wang, S., Dillmann, C., de Vienne, D., Sicard, D., 2008. "Ant" and "grasshopper" life-history strategies in Saccharomyces cerevisiae. PLoS ONE 3, e1579.

    Waddington, C.H., 1942. Canalization of development and the inheritance of acquired characters. Nature 150, 563–565.

    Wikle, C.K., 2003. Hierarchical Bayesian models for predicting the spread of ecological processes. Ecology 84, 1382–1394.

    Wloch, D.M., Szafraniec, K., Borts, R.H., Korona, R., 2001. Direct estimate of the mutation rate and the distribution of fitness effects in the yeast Saccharomyces cerevisiae. Genetics 159 (2), 441–452.

    A. Spor et al. / International Journal of Food Microbiology 142 (2010) 25–35