Hierarchical Bayesian Modelling for Saccharomyces cerevisiae population dynamics

Aymé Spor a,*, Christine Dillmann a, Shaoxiao Wang a,b, Dominique de Vienne a, Delphine Sicard a, Eric Parent c

a Univ Paris-Sud, UMR 0320/UMR 8120 Génétique Végétale, F-91190 Gif-sur-Yvette, France
b Department of Biochemistry and Molecular Biology, Louisiana State University Health Sciences Center, Shreveport, LA 71130, USA
c UMR MIA 518, INRA/AgroParisTech, ENGREF, 19 avenue du Maine, F-75015 Paris, France
* Corresponding author. E-mail address: [email protected] (A. Spor).

International Journal of Food Microbiology 142 (2010) 25-35
Journal homepage: www.elsevier.com/locate/ijfoodmicro
0168-1605/$ - see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.ijfoodmicro.2010.05.012

Article history: Received 10 March 2009. Received in revised form 6 April 2010. Accepted 14 May 2010.

Keywords: Hierarchical Bayesian Modelling; Saccharomyces cerevisiae; Population dynamics

Abstract

Hierarchical Bayesian Modelling is a powerful but under-used approach to model and evaluate the risks associated with the development of pathogens in the food industry, to predict exotic invasions, species extinctions and the development of emerging diseases, or to assess chemical risks. Modelling the population dynamics of Saccharomyces cerevisiae while considering its biodiversity and other sources of variability is crucial for selecting strains that meet industrial needs. Using this approach, we studied the population dynamics of S. cerevisiae, the domesticated yeast, widely encountered in the food industry, notably in brewery, vinery, bakery and distillery. We relied on a logistic equation to estimate the key variables of population growth, but we also took into account factors able to affect them, namely environmental effects, genetic diversity and measurement errors. Our probabilistic approach allowed us: (i) to model the dynamical behaviour of strains in a given condition under some uncertainty, (ii) to measure environmental effects and (iii) to evaluate the genetic variability of the growth key variables.

1. Introduction

Biological data, whatever the field of research, are mostly dynamical or spatial, i.e. they are functions of time and/or of spatial coordinates. The challenge of the biologist is to explain the variations of traits by the variation of explanatory variables. Whenever the observations are collected at different time or space points from the same biological sample, they become dependent because they are jointly related through time or space. Statistical analysis consists in modelling (depicting the various sources of variation) and inference (estimating the parameters of the model). Historically, statistical analysis has been developed from a frequentist point of view: the parameters are considered to have a fixed value, and estimates of this value are searched via various statistical procedures of inference (moment adjustment, maximum likelihood estimates, etc.). Most statistical toolboxes that are available to biologists are designed according to the frequentist approach. However, they are generally restricted to the analysis of linear models, i.e. to the cases where the response is linear through time and/or space. When dealing with non-linear processes, the problem becomes much more complex, and requires sophisticated statistical tools which are usually not mastered by biologists. As a result, they have to use non-optimal methods for properly analyzing or even simply detecting differences between two curves, for instance population growth curves.

The Bayesian approach is another way of analyzing biological data. Given uncertainty on parameter values, a so-called prior probability distribution is assigned to the parameter in a modelling step, possibly taking into account previous knowledge on the parameter. Bayesian inference can be interpreted as formulating a probabilistic judgment about the unknowns of the model given the observed data (updating the prior into a posterior distribution). Because Bayesians traditionally put more emphasis on the modelling process, the Bayesian statistical framework provides an easy way of thinking about biological problems. Unlike the frequentist estimation techniques, dealing with complex models (non-linearity, dependence) does not bring much additional difficulty to the Bayesian inferential algorithms.

Hierarchical Bayesian Modelling (HBM) is a probabilistic, adaptable and efficient framework for modelling dynamical processes by taking into account multiple sources of variation. This type of model is not restricted to specific problems and can be generically applied to a vast extent of dynamical and spatial systems. Hierarchical statistical modelling has the potential to match high dimension problems through conditional decomposition into a series of probabilistically linked simpler substructures (Clark and Gelfand, 2006). Hierarchical statistical models are made of three layers (Wikle, 2003). First, an experimental data level specifies the distribution of the observables at hand given the parameters and the underlying processes. Second, a latent process level depicts the various hidden biological mechanisms that make sense of the data. For example in this article, the latent
process level describes the population growth process with a logistic model. Third, a parameter level identifies the fixed quantities that would be sufficient, were they known, to mimic the behaviour of the system and to produce new data statistically similar to the ones already collected. Sources of variation on the parameters can be added: some authors (Bernier et al., 2000; Nauta, 2000, 2002) make the distinction between variability (i.e. uncertainty by essence that cannot be reduced by additional information) and uncertainty (or uncertainty by ignorance that should decrease as the sample size increases). In this paper, variability will describe changes with respect to biotic or abiotic variation, while uncertainty accounts for measurement errors and model imperfections (Shorten et al., 2004).
In predictive microbiology, HBM has been particularly used for risk assessment and food shelf-life estimation. It is convenient for predicting pathogenic bacterial behaviour in case of contamination, because it makes it possible to quantify separately the effects of environmental factors (temperature, pH and available resources), genetic variation and measurement uncertainty. Some population dynamic models using Bayesian inference are now available and describe, for example, the effect of temperature on the growth of Listeria monocytogenes, Salmonella, Escherichia coli, Clostridium perfringens and Bacillus cereus on pork meats, milk, seafoods or egg products (Delignette-Muller et al., 2006; Membré et al., 2005; Pouillot et al., 2003) or the development of a growing population of bacterial cells from an inoculum of dormant spores (Barker et al., 2005).
In ecology, this probabilistic framework is increasingly used to examine population dynamics because it can easily take into account multiple sources of stochasticity (such as space, time and individual heterogeneities), while in standard statistical models, only process errors are routinely included (Calder et al., 2003; Clark, 2003). HBM has much to offer, including more precise parameter estimation (Calder et al., 2003), and it is increasingly used to predict exotic invasions, extinction risk or the development of emerging diseases. For example, HBM has been successfully implemented to model the dynamics of the invasive Eurasian Collared-Dove (Hooten et al., 2007), to estimate species richness and spatial occupancy (Kéry and Royle, 2008), to describe the various failures in a Dynamic Energy Budget mechanism for ecotoxicological Daphnid data (Billoir et al., 2008) or to predict the relative abundance of House Finches over the eastern United States (Wikle, 2003).
The yeast Saccharomyces cerevisiae, a common biological model in genetics, genomics and physiology, has been exploited since the Neolithic period to produce fermented beverages and bread dough. Because of consumers' reluctance about genetically modified organisms, it seems unrealistic to improve strains by genetic engineering. Another strategy is to exploit the present natural biodiversity of yeast, which requires characterizing strains, searching for physiological traits suitable for industrial purposes, and planning genetic resource management because it is not possible to give the same maintenance effort to all strains.
Bakers need to develop strains with hyper-osmolarity resistance, brewers strains with high fermentation rates and short lag phases, and oenologists strains tolerant to ethanol for completing fermentation (Boekhout and Robert, 2003). These different properties have been shown to be related to population dynamic characteristics and to interact with the environment. The population dynamics of S. cerevisiae depends both on the genetic background of the strains and on environmental factors such as temperature (Beltran et al., 2002) or glucose content of the medium (Spor et al., 2008). The latter study demonstrated a strong impact of the food-processing use of strains on population dynamic key variables (Spor et al., 2008). Similarly, Domizio et al. (2007) described a close relationship between wine attributes and Saccharomyces spp. population dynamics. Thus, predicting population growth and modelling genetic and non-genetic variation would help yeast genetic resource management and the selection of industrial starter strains.

We used HBM to describe S. cerevisiae population dynamics. The experimental data consisted of population size counts over time for 12 S. cerevisiae strains grown in three culture media. The latent process relied on a logistic equation depending on three population parameters, which divides the population growth into two phases: an exponential growth from an initial population of size N0 with an intrinsic growth rate r, followed by a decrease of the population growth which leads to a stationary phase, characterized by a maximum population size K, also called carrying capacity in ecology. The latent process model described differences in these key variables with respect to both environmental effects (glucose content in the culture medium) and genetic variation between strains. Finally, the uncertainty related to measurement errors was described.
2. Materials and methods
2.1. Principle of the Bayesian inference
Bayesian inference, or model learning, is the process of updating prior beliefs about unknowns by probabilistic machinery based upon the relationships in the model and the observations recorded about the situation.
By contrast with the classical approach, which begins with a hypothesis test that proposes a specific value for an unknown parameter θ, Bayesian inference proposes a prior distribution p(θ) for this parameter, which represents the beliefs originally encoded in the model. Data x1, x2, …, xn are collected and the likelihood f(x1, x2, …, xn | θ) is calculated given the parameter values (as in the frequentist case).

Then the probabilities of all the other variables that are connected to the variable representing the new data are updated. Bayes's theorem is used to calculate the posterior distribution g(θ | x1, x2, …, xn), which is proportional to f(x1, x2, …, xn | θ) p(θ). After inference, the updated probabilities reflect the new levels of belief in (or probabilities of) all possible outcomes encoded in the model.
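As an illustration of this updating step, the following minimal Python sketch approximates a posterior on a grid of candidate values. It is not the WinBUGS procedure used in this study: the data values, the grid, the fixed error standard deviation and all function names are hypothetical, chosen only to show how prior and likelihood combine through Bayes's theorem.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def grid_posterior(grid, prior, data, noise_sd):
    """Return posterior weights over `grid` for the mean of normally distributed data.

    The posterior weight of each theta is proportional to prior(theta) * f(data | theta).
    """
    unnormalised = []
    for theta, p in zip(grid, prior):
        likelihood = 1.0
        for x in data:
            likelihood *= normal_pdf(x, theta, noise_sd)
        unnormalised.append(p * likelihood)
    total = sum(unnormalised)
    return [w / total for w in unnormalised]

# Hypothetical example: infer a growth rate (min^-1) from three noisy values.
grid = [i * 0.0005 for i in range(1, 61)]            # candidates 0.0005 .. 0.03
prior = [normal_pdf(g, 0.01, 0.0057) for g in grid]  # prior centred on 0.01
prior_total = sum(prior)
prior = [p / prior_total for p in prior]             # normalise on the grid
data = [0.012, 0.014, 0.013]                         # made-up observations
posterior = grid_posterior(grid, prior, data, noise_sd=0.002)

prior_mean = sum(g * w for g, w in zip(grid, prior))
posterior_mean = sum(g * w for g, w in zip(grid, posterior))
```

With these made-up numbers the posterior mean moves from the prior mean towards the average of the observations, which is the behaviour described above.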
2.2. Data
The experimental data used to develop this model have been published in Spor et al. (2008). Strain origin, culture medium composition and population size measurements are detailed in the Materials and methods section of Spor et al. (2008). In summary, 12 strains stemming from three industrial origins (vinery, brewery and bakery) were grown in three media differing in their glucose concentration (0.25%, 1% and 15%). Every two hours, samples were taken, diluted and plated to estimate population size. Three biological replicates were performed for each medium-by-strain combination, each time starting with a new inoculum. The population size was expressed in CFU/mL (Colony Forming Units). The experimental data are also called observations in the Bayesian setting.
2.3. Model
Our aim was to construct a population dynamic model capable of correctly predicting the population size Ns,m,t of strain s in medium m over time t. Fig. 1 illustrates the corresponding Directed Acyclic Graph, which points out the conditional dependence between nodes. In this framework, parameters and observations can be considered either as logical or as stochastic nodes of the model. Logical nodes are deterministic functions of other nodes, and stochastic nodes are described by probability laws. The description of the nodes is given in Table 1.
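The node structure just described can be written down as a small dependency table. The Python sketch below is an illustration under assumptions rather than a transcription of Fig. 1: the node names are hypothetical plain-text labels, the parent lists are inferred from the definitions in Table 1, and the standard graphlib module is used only to confirm that the graph is acyclic and to obtain an evaluation order.

```python
from graphlib import TopologicalSorter

# Each node maps to (type, parents). Names are illustrative labels, not WinBUGS code.
# The standard deviations are listed as stochastic nodes here for simplicity;
# WinBUGS actually handles them through precision parameters.
NODES = {
    "Kmean_m":  ("stochastic", []),
    "Ksd_m":    ("stochastic", []),
    "rmean_m":  ("stochastic", []),
    "rsd_m":    ("stochastic", []),
    "N0mean_m": ("stochastic", []),
    "N0sd_m":   ("stochastic", []),
    "sigma":    ("stochastic", []),
    "t":        ("covariate", []),
    "K_sm":     ("stochastic", ["Kmean_m", "Ksd_m"]),
    "r_sm":     ("stochastic", ["rmean_m", "rsd_m"]),
    "N0_sm":    ("stochastic", ["N0mean_m", "N0sd_m"]),
    "N_smt":    ("logical", ["K_sm", "r_sm", "N0_sm", "t"]),  # logistic equation
    "Y_smt":    ("stochastic", ["N_smt", "sigma"]),           # observed counts
}

def evaluation_order(nodes):
    """Return the nodes in an order where every parent precedes its children."""
    graph = {name: set(parents) for name, (_, parents) in nodes.items()}
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(graph).static_order())

order = evaluation_order(NODES)
logical_nodes = [name for name, (kind, _) in NODES.items() if kind == "logical"]
```

In this sketch the only logical node is the modelled population size, since it is a deterministic function of K, r, N0 and t; every other node carries a probability law or is an observed covariate.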
2.4. Description of the latent process
We assumed that S. cerevisiae population growth follows a logistic equation. This equation is classically used in ecology to model microbial as well as animal population dynamics, and is central in the mathematical definition of the famous r and K strategies in ecology (MacArthur and Wilson, 1967). Two types of logistic equation could be considered: with or without lag-phase. Because fresh medium was inoculated after an overnight pre-culture, we used the logistic model without lag-phase. Thus population size followed:

\[ N([K, r, N0]_{s,m}, t) = \frac{K_{s,m}\,{N0}_{s,m}\,e^{r_{s,m} t}}{K_{s,m} + {N0}_{s,m}\left(e^{r_{s,m} t} - 1\right)} \qquad (1) \]

where N([K,r,N0]s,m,t), the population size at time t, depends on the variables K, r and N0 of strain s in the medium m. K is the carrying capacity (maximum population size) expressed in CFU/mL, N0 is the initial population size also expressed in CFU/mL and r is the intrinsic growth rate (equivalent to the maximum rate of increase of the population, in min^-1).

Fig. 1. Directed Acyclic Graph (DAG) of the model. Data (Ys,m,t) are denoted by rectangles, covariates by double rectangles (t) and latent variables by ellipses. Solid arrows correspond to stochastic dependences between nodes while broken arrows indicate logical links. Dotted blue rectangles illustrate the embedded levels of the modelling (timepoint, strain and medium).

2.5. Sources of variability on parameters

Our aim was to estimate posterior distributions of the latent key variables K, r and N0 for each strain in each glucose condition. In the system analysis literature, these are commonly named population dynamic parameters, which turns out to be a rather inappropriate term in a statistical modelling framework since, contrary to statistical parameters, they vary as latent (i.e. unobserved) random variables depending on explanatory factors or groupings of the data. Modelling the variability consists in defining how the environment, as well as the genetic variation between strains, would affect population dynamic key variables. In this context, variations can be described by normal distributions N(μ, sd) defined by two parameters, the mean μ and the standard deviation sd. A fixed effect would be an effect that changes the mean μ of a latent variable K, r or N0, while a random effect would change their standard deviation sd. This introduces additional correlations between individuals of the same group, i.e. individuals of the same strain. The degree of resemblance will be tuned by the standard deviation sd of the latent variable.

Table 1. Description of the links between nodes.

Node        Type         Definition (a)
Ys,m,t      Stochastic   N(Ns,m,t, σ√Ns,m,t)
Ns,m,t      Logical      Eq. (1)
Ks,m        Stochastic   N(Kmeanm, Ksdm)
rs,m        Stochastic   N(rmeanm, rsdm)
N0s,m       Stochastic   N(N0meanm, N0sdm)
εNs,m,t     Stochastic   N(0, σ√Ns,m,t)

(a) N(a, b), normal distribution with expected value a and standard deviation b.

In our case, two sources of variation could affect population dynamics: the glucose concentration in the medium and genetic differences between strains. As each culture condition may affect yeast population dynamics in a specific manner, the medium effect was considered to be fixed and was described by a mean value for each parameter Kmeanm, rmeanm and N0meanm in each glucose condition m. Note that in the case of N0meanm, there is no causal relationship between the glucose content of the medium and this parameter. However, the experiments in the 15% glucose condition were performed by a different experimenter from those performed in the 1% and 0.25% glucose conditions. The variation of the N0meanm parameter therefore represents the variation of the inoculum conditional on the experimenter. The mean values of the population dynamic latent variables K, r and N0 were assumed to be the same for all strains in a given glucose condition. The differences between strains were considered as a genetic random effect, statistically described by the standard deviations Ksdm, rsdm and N0sdm of the normal distributions.
Mathematically, for each strain s in each medium m, we chose to draw the latent key variables (K, r and N0) from independent normal distributions with parameters Kmeanm, rmeanm and N0meanm as expected values. The other parameters, the standard deviations Ksdm, rsdm and N0sdm, rule the range of variation for the variables around their mean in each glucose condition m:

\[ K_{s,m} \sim N(\mathrm{Kmean}_m, \mathrm{Ksd}_m), \quad r_{s,m} \sim N(\mathrm{rmean}_m, \mathrm{rsd}_m), \quad {N0}_{s,m} \sim N(\mathrm{N0mean}_m, \mathrm{N0sd}_m). \]

In other words, there is a random effect εKs,m = Ks,m − Kmeanm, corresponding to the different behaviours of two strains s and s′ in a given glucose condition m (cov(εKs,m, εKs′,m) = 0), while there is correlation between data stemming from the same strain (cov(εKs,m, εKs′,m) = (Ksdm)^2 when s = s′). This covariation gives the dependence structure of the model.

Note that we explicitly allowed for genotype-by-environment interactions because the standard deviation of the latent variables depended on the environment m.
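To make Eq. (1) and the strain-level draws concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the WinBUGS model: the function names are hypothetical, population sizes are expressed in units of 10^6 CFU/mL, the means and standard deviations are the posterior means reported in Table 3 for the 15% glucose medium, and non-positive draws are simply redrawn so that the latent variables stay positive (a simplification introduced here).

```python
import math
import random

def logistic_size(K, r, N0, t):
    """Population size at time t (min) for the logistic model without lag phase, Eq. (1)."""
    growth = math.exp(r * t)
    return K * N0 * growth / (K + N0 * (growth - 1.0))

def draw_positive_normal(rng, mean, sd):
    """Draw from N(mean, sd), redrawing until the value is positive (an assumption)."""
    while True:
        value = rng.gauss(mean, sd)
        if value > 0.0:
            return value

def draw_strain(rng, medium):
    """Draw the latent key variables (K, r, N0) of one strain in one medium."""
    return {
        "K": draw_positive_normal(rng, medium["Kmean"], medium["Ksd"]),
        "r": draw_positive_normal(rng, medium["rmean"], medium["rsd"]),
        "N0": draw_positive_normal(rng, medium["N0mean"], medium["N0sd"]),
    }

# Posterior means from Table 3, 15% glucose; sizes in units of 10^6 CFU/mL.
MEDIUM_15 = {"Kmean": 96.02, "Ksd": 21.46, "rmean": 6.86e-3, "rsd": 1.44e-2,
             "N0mean": 2.71, "N0sd": 1.08}

rng = random.Random(1)                    # fixed seed for reproducibility
strains = [draw_strain(rng, MEDIUM_15) for _ in range(12)]
times = [120 * i for i in range(13)]      # one point every two hours for 24 h
curves = [[logistic_size(s["K"], s["r"], s["N0"], t) for t in times]
          for s in strains]
```

Each simulated curve starts at its own N0 and levels off towards its own K, so the spread between curves reflects the between-strain standard deviations of the medium.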
2.6. Description of the uncertainty related to measurement errors

If we consider a strain s in a culture condition m at a time t, the observation Ys,m,t is written:

\[ Y_{s,m,t} = N_{s,m,t} + \varepsilon_{N_{s,m,t}} \]

where εNs,m,t corresponds to the residual error around the theoretical law of Ns,m,t. The three replicates for each medium-by-strain combination were pooled, so that ε encompasses both technical and biological variations. This model was chosen because the variations between the three replicates of the same strain in the same medium were very low. In such a model, the variability between the replicates is taken into account, but we neglect the dependencies between data points belonging to the same replicate.

As inspection of the observations revealed that the experimental error increased with the population size, we chose to draw the error from a normal distribution, centered at 0, with a standard deviation equal to σ√N:

\[ \varepsilon_{N_{s,m,t}} \sim N\left(0, \sigma\sqrt{N_{s,m,t}}\right), \]

where σ is the residual standard deviation multiplier of the model.

2.7. Prior distributions

The prior distributions of the means Kmeanm, rmeanm and N0meanm have been drawn in normal distributions (Table 2). The prior distribution for Kmean has been chosen as wide and as flat as possible because the culture media covered a wide range of glucose conditions, and the carrying capacity Kmean should reflect the nutrient content of the medium. From the literature, the prior distribution for rmean has been chosen with a mean value fixed at 0.01 min^-1 and a relatively large standard deviation (Wloch et al., 2001). Finally, the prior distributions for N0mean have been centred at 1×10^6, with a standard deviation allowing values up to 5×10^6 to be reached, because from 1×10^6 to 5×10^6 cells were inoculated in a fresh culture at the beginning of the experiments.

The prior distributions of the standard deviations of the model (Ksd, rsd, N0sd and σ) were chosen to favour large values. The underlying assumptions were (i) there is variation between strains in a given environment (Ksd, rsd, N0sd) and (ii) the measurement error is large (σ). WinBUGS (the software used to perform the Bayesian inference, described in the next paragraph) works with precision parameters (Kprec, rprec, N0prec and σ^-2), which are the reciprocal of the square of the standard deviations. Precision parameters were drawn in G(10^-3, 10^-3), where G(a, b) is a Gamma distribution with shape parameter a and scale parameter b (Table 2). Setting a = b = 10^-3 is a common Bayesian practice for picking non-informative precision priors.

Table 2. Prior distributions used for the parameters. Cells marked … could not be recovered.

Parameter   Distribution (a)                    P0.025 (b)   Median          P0.975 (c)
Kmeanm      N(70×10^6, 70.7×10^6)               …            …               207.62×10^6
rmeanm      N(0.01, 5.7×10^-3)                  …            0.01            0.02
N0meanm     N(1×10^6, 2.24×10^6)                …            …               5.284×10^6
Ksdm        Kprecm (d) ~ G(0.001, 0.001)        0            7.92×10^-305    3.04×10^8
rsdm        rprecm (d) ~ G(0.001, 0.001)        0            7.92×10^-305    3.04×10^8
N0sdm       N0precm (d) ~ G(0.001, 0.001)       0            7.92×10^-305    3.04×10^8
σ           σ^-2 (d) ~ G(0.001, 0.001)          0            7.92×10^-305    3.04×10^8

(a) N(a, b), normal distribution with expected value a and standard deviation b; G(a, b), Gamma distribution with shape parameter a and scale parameter b. Note that for Kmean, rmean and N0mean, we draw only in the positive part of normal distributions because these parameters can only be positive in our conditions. Note that the actual means of normal distribution truncated for positive values for Kmean, rmean and N0mean are respectively: 89.18×10^6, 0.01 and 2.26×10^6.
(b) 2.5th percentile.
(c) 97.5th percentile.
(d) WinBUGS deals with precision parameters, i.e. the reciprocal of the square of the standard deviation.

2.8. Bayesian inference

Bayesian inferences of parameter values were performed using the WinBUGS software (© MRC Biostatistics Unit; Spiegelhalter et al., 2003). After an adaptation phase (also called burn-in phase (Gilks et al., 1996)) of 4000 iterations, the convergence of the Markov Chain Monte Carlo (MCMC) algorithm was checked by visual inspection of the good mixing of three independent chains starting at three different initial values for each parameter. Inferences were made on the following 15000 iterations after the burn-in phase.

2.9. Empirical posterior distributions

Altogether, our model comprises 19 parameters: Kmean, rmean, N0mean, Ksd, rsd, N0sd for each of the 3 culture media and σ. The model also comprises 36 latent variables (Ks,m, rs,m and N0s,m). Note that the biological and technical variability due to replicate datapoints is reflected by the posterior distribution of the latent variables. The posterior Monte Carlo samples have been directly used to evaluate the statistics related to the parameters and the latent dynamic population variables (posterior means, standard deviations, medians and 95% credibility intervals). Joint posterior distributions of parameters and latent variables were studied using the function pairs under R software. The precision parameters obtained from WinBUGS have been transformed into standard deviations, Ksd, rsd, N0sd and σ, to have the same unit for the variability and for the mean of the population dynamic key variables.

3. Results

A Bayesian approach was used for estimating population dynamic key variables in yeast, relying on a modelling framework in which the population size N depends not only on the parameters of a logistic function (K the carrying capacity, r the intrinsic growth rate and N0 the initial population size), but also on fixed and random factors related to environmental variation, to genetic differences between lines and to measurement error (see the modelling scheme in Fig. 1).
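The observation layer of this modelling scheme can be simulated forward. The Python sketch below is a hedged illustration, not the authors' code: it assumes that population sizes are expressed in units of 10^6 CFU/mL and that the residual standard deviation is σ√N, and it uses the posterior mean σ = 1.39 together with hypothetical values of K, r and N0. It also shows the precision-to-standard-deviation conversion applied to WinBUGS output. All function names are made up for the example.

```python
import math
import random

def logistic_size(K, r, N0, t):
    """Logistic model without lag phase, Eq. (1)."""
    growth = math.exp(r * t)
    return K * N0 * growth / (K + N0 * (growth - 1.0))

def precision_to_sd(precision):
    """Convert a precision (1 / sd^2) into a standard deviation."""
    return 1.0 / math.sqrt(precision)

def simulate_observation(rng, N, sigma):
    """Draw Y = N + eps with eps ~ N(0, sigma * sqrt(N))."""
    return N + rng.gauss(0.0, sigma * math.sqrt(N))

def coverage(rng, K, r, N0, sigma, times, replicates):
    """Fraction of simulated observations within N +/- 2 * sigma * sqrt(N)."""
    inside = 0
    total = 0
    for t in times:
        N = logistic_size(K, r, N0, t)
        half_width = 2.0 * sigma * math.sqrt(N)
        for _ in range(replicates):
            y = simulate_observation(rng, N, sigma)
            inside += abs(y - N) <= half_width
            total += 1
    return inside / total

sigma = precision_to_sd(1.0 / 1.39 ** 2)   # recovers sigma = 1.39
rng = random.Random(7)                     # fixed seed for reproducibility
times = [120 * i for i in range(13)]       # every two hours for 24 h
fraction = coverage(rng, K=96.0, r=0.007, N0=2.7, sigma=sigma,
                    times=times, replicates=200)
```

Under a normal error model, roughly 95% of simulated observations are expected to fall within two residual standard deviations of the latent curve, which gives a reference point for judging how many real data points lie inside the same band.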
3.1. Efficiency of HBM
This modelling scheme was efficient for studying the S. cerevisiae population dynamics. For each strain/medium, growth modelling allowed us to estimate the key variables K, r and N0, and to predict the resulting growth curves as shown in Fig. 2. As expected, the interval Ns,m ± 2σ√N (σ is the residual standard deviation of the model) included the majority of the experimental data points, indicating that both the model used and the way we described experimental error seemed to be relevant to describe the S. cerevisiae population dynamics. Striking genotype-by-environment interactions could be observed for the carrying capacity, since the strain with the highest K value in 15% glucose (Fig. 2A) has the smallest value in 1% glucose (Fig. 2C).
The comparisons between prior and posterior distributions of Kmean, rmean, Ksd and rsd are shown in Fig. 3. The prior distribution of Kmean was very flat and uninformative, whereas the three posterior distributions (one for each medium) were very narrow, with distinct means, even for Kmean1% and Kmean0.25% (Fig. 3A). For the rmean distributions, differences between prior and posterior distributions were smaller, probably because the prior distribution for rmean was chosen from relevant literature. Note that choosing a uniform uninformative prior distribution gave the same posterior distributions. Posterior distributions were tighter than the prior, and were distinct between media even though they largely overlapped. Posterior distributions of Ksd parameters were all Gamma-like distributions despite their quite different shapes (Fig. 3C). Finally, posterior distributions of rsd in the three different media merged and were quite different from the prior one, which indicates a similar genetic variability of the intrinsic growth rate in the three different culture conditions.
Empirical posterior distributions are shown in Table 3, and illustrated in Fig. 4 for the 15% glucose conditions. The distributions of Kmean, rmean and N0mean were roughly symmetric, except for N0mean0.25%, whereas posterior distributions of standard deviation parameters (Ksd, rsd and N0sd) were slightly skewed to the right.
3.2. Environmental and genetic effects on population dynamics
The environment and the genetic differences between strains had a strong effect on population dynamics. Descriptive statistics of empirical posterior distributions are given in Table 3. Kmean mean values increased when glucose increased in the medium (Kmean0.25% = 35.57×10^6, Kmean1% = 42.33×10^6 and Kmean15% = 96.02×10^6), and rmean mean values decreased when the environment was richer (rmean0.25% = 1.13×10^-2, rmean1% = 8.65×10^-3 and rmean15% = 6.86×10^-3). The differences between the N0mean mean values reflect experimental variations in the cell density at the beginning of the kinetics: in the 15% glucose medium, more cells were inoculated (N0mean15% = 2.71×10^6) than in the two other media (N0mean1% = 0.47×10^6 and N0mean0.25% = 0.34×10^6).
The standard deviations Ksd and rsd directly reflect the genetic variability of the population dynamic latent variables K and r among our collection of strains in a given medium. Descriptive statistics for standard deviation parameters are given in Table 3. The variability of the carrying capacity was about 2 times higher in the 15% glucose medium (Ksd15% = 21.46×10^6) than in the 1% glucose medium (Ksd1% = 12.76×10^6), and about 3 times higher than in the 0.25% glucose medium (Ksd0.25% = 7.85×10^6), indicating that genetic variability between strains depended on the medium. In other words, we found genotype × environment interactions for K. By contrast, the genetic variability of the intrinsic growth rate (rsd) was high but robust with regard to environmental changes (rsd ≈ 1.44×10^-2 in the three media). The variability of the initial population size was more than 10 times higher in the 15% glucose medium than in the two other media (1.08×10^6 for N0sd15% vs. 0.14×10^6 and 6.45×10^4 respectively for N0sd1% and N0sd0.25%).

Finally, the residual standard deviation multiplier of the model, σ, has been estimated to 1.398. Thus the residual error around the theoretical law of population dynamic Ns,m,t is drawn from a normal distribution centered on 0 with a standard deviation equal to 1.39√Ns,m,t.

Fig. 2. Modelling the population dynamics of two strains grown in two culture media. Red solid diamonds represent experimental data. Black curves represent the evolution of the modelled population size Ns,m over time from K, r and N0 estimates. Blue dot dashed and blue dotted curves represent respectively Ns,m ± (σ√N) and Ns,m ± (2σ√N), where σ represents the residual standard deviation of the model (or the uncertainty related to the measurement). A and C represent a given strain in the 15% and 1% glucose media respectively, while B and D represent another strain grown in the 15% and 1% glucose media respectively.

Fig. 3. Comparison of prior and posterior distributions of population dynamic parameters Kmean (A), rmean (B), Ksd (C) and rsd (D). Red dot dashed, black dotted and blue curves represent respectively posterior distributions in 15%, 1% and 0.25% media. Green dashed curves represent prior distributions.

Table 3. Descriptive statistics of empirical posterior distributions of parameters.

Parameter     Mean          S.D.          P0.025 (a)    Median        P0.975 (b)
Kmean15%      96.02×10^6    6.76×10^6     82.35×10^6    96.01×10^6    109.5×10^6
Kmean1%       42.33×10^6    4.06×10^6     34.29×10^6    42.28×10^6    50.56×10^6
Kmean0.25%    35.57×10^6    3.06×10^6     29.71×10^6    35.48×10^6    41.93×10^6
rmean15%      6.86×10^-3    3.36×10^-3    4.55×10^-4    6.81×10^-3    1.37×10^-2
rmean1%       8.65×10^-3    3.34×10^-3    2.21×10^-3    8.63×10^-3    1.52×10^-2
rmean0.25%    1.13×10^-2    3.39×10^-3    4.59×10^-3    1.13×10^-2    1.79×10^-2
N0mean15%     2.71×10^6     0.35×10^6     2.02×10^6     2.70×10^6     3.41×10^6
N0mean1%      0.47×10^6     0.10×10^6     0.37×10^6     0.45×10^6     0.75×10^6
N0mean0.25%   0.34×10^6     0.09×10^6     0.19×10^6     0.33×10^6     0.56×10^6
Ksd15%        21.46×10^6    5.31×10^6     13.79×10^6    20.57×10^6    34.07×10^6
Ksd1%         12.76×10^6    3.23×10^6     8.09×10^6     12.21×10^6    20.42×10^6
Ksd0.25%      7.85×10^6     3.37×10^6     0.53×10^6     7.79×10^6     14.84×10^6
rsd15%        1.44×10^-2    3.33×10^-3    9.53×10^-3    1.37×10^-2    2.24×10^-2
rsd1%         1.43×10^-2    3.26×10^-3    9.5×10^-3     1.37×10^-2    2.22×10^-2
rsd0.25%      1.44×10^-2    3.28×10^-3    9.53×10^-3    1.38×10^-2    2.22×10^-2
N0sd15%       1.08×10^6     0.32×10^6     0.60×10^6     1.03×10^6     1.85×10^6
N0sd1%        0.14×10^6     0.18×10^6     2.98×10^4     8.75×10^4     0.78×10^6
N0sd0.25%     6.45×10^4     3.94×10^4     2.05×10^4     5.41×10^4     0.17×10^6
σ             1.39          2.63×10^-2    1.34          1.39          1.44

(a) 2.5th percentile.
(b) 97.5th percentile.

3.3. Joint distributions and prediction

Over all strains in a given medium, no correlation was observed when studying the joint posterior distributions of the parameters of the model, as illustrated by the correlation coefficients and the smooth lines in Fig. 4. Parameter joint distributions are given for illustration in the 15% glucose medium, but the lack of correlation is also valid in the two other media. On the other hand, for a given strain s in a given condition m, key latent variables K and r were negatively correlated, as illustrated in Fig. 5. The variance of the joint posterior distribution of the latent variables Ks,m and rs,m stems from microenvironmental variations and reflects the variability between two replicates of the same strain in the same environment. From a prediction point of view, these results lead to different ways of drawing latent variables. When correlations are detected, it becomes necessary to draw a K value and an r value jointly from their empirical joint posterior distribution. With our data set, taking into account joint distributions becomes particularly important to model different replicates of a given strain s in a given condition m.
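Drawing K and r jointly amounts to resampling whole rows of the MCMC output instead of sampling each column separately. The following Python sketch illustrates the idea on synthetic, negatively correlated samples; the numbers are invented for the example and are not posterior samples from this study, and the helper names are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def draw_jointly(rng, samples, size):
    """Resample complete (K, r) pairs, which preserves their correlation."""
    return [rng.choice(samples) for _ in range(size)]

def draw_independently(rng, samples, size):
    """Resample K and r from separate rows, which destroys their correlation."""
    ks = [rng.choice(samples)[0] for _ in range(size)]
    rs = [rng.choice(samples)[1] for _ in range(size)]
    return list(zip(ks, rs))

rng = random.Random(3)                     # fixed seed for reproducibility
# Synthetic stand-in for MCMC output: r decreases as K increases, plus noise.
samples = []
for _ in range(2000):
    K = rng.gauss(96.0, 5.0)
    r = 0.007 - 0.0002 * (K - 96.0) + rng.gauss(0.0, 0.0005)
    samples.append((K, r))

joint = draw_jointly(rng, samples, 2000)
independent = draw_independently(rng, samples, 2000)
rho_joint = pearson([k for k, _ in joint], [r for _, r in joint])
rho_independent = pearson([k for k, _ in independent],
                          [r for _, r in independent])
```

In this synthetic setting the joint draws keep a strongly negative correlation while the independent draws lose it, which is why independent draws would produce (K, r) combinations that the posterior considers implausible.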
A major interest of Bayesian modelling is its predictive capacity. From our modelling achievement, it becomes possible to predict the typical behaviour of any strain grown in our different glucose conditions, as illustrated in Fig. 6. We first simulated population dynamics in 15% and 1% glucose media by drawing 30 values of K and r from the Kmean and rmean empirical joint posterior distributions illustrated in Fig. 4. The 30 growth curves obtained represent the mean typical behaviour in each medium under the assumption that all strains behave like a hypothetical average one (Fig. 6A and B) and reflect the effect of the culture medium. Then, to demonstrate the effect of genetic variability on population dynamics, we drew 30 K and r values respectively from N(Kmean, Ksd) and from N(rmean, rsd) with respect to the joint posterior distributions of these parameters. As illustrated in Fig. 6, taking into account the genetic variability between strains led to more variable behaviours than predicted by the sole effect of the culture medium on population dynamic key variables.
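The two prediction scenarios can be contrasted numerically. The Python sketch below is only an approximation of the procedure described above: instead of the empirical joint posterior samples (not reproduced here), it assumes independent normal draws built from the posterior means and standard deviations of Table 3 for the 15% glucose medium, works in units of 10^6 CFU/mL, fixes N0 at its posterior mean, and redraws non-positive values. Function names are hypothetical.

```python
import math
import random
import statistics

def logistic_size(K, r, N0, t):
    """Logistic model without lag phase, Eq. (1)."""
    growth = math.exp(r * t)
    return K * N0 * growth / (K + N0 * (growth - 1.0))

def positive_normal(rng, mean, sd):
    """Draw from N(mean, sd), redrawing non-positive values (an assumption)."""
    while True:
        value = rng.gauss(mean, sd)
        if value > 0.0:
            return value

# Table 3, 15% glucose: (posterior mean, posterior S.D.) of Kmean and rmean.
KMEAN, RMEAN = (96.02, 6.76), (6.86e-3, 3.36e-3)
# Posterior means of Ksd, rsd and N0mean in the same medium.
KSD, RSD, N0 = 21.46, 1.44e-2, 2.71

def final_sizes(rng, n_curves, with_genetic_variation, t_end=1440.0):
    """Population sizes after t_end minutes for n_curves simulated strains."""
    sizes = []
    for _ in range(n_curves):
        K = positive_normal(rng, *KMEAN)   # typical strain: medium effect only
        r = positive_normal(rng, *RMEAN)
        if with_genetic_variation:         # add between-strain variability
            K = positive_normal(rng, K, KSD)
            r = positive_normal(rng, r, RSD)
        sizes.append(logistic_size(K, r, N0, t_end))
    return sizes

rng = random.Random(11)                    # fixed seed for reproducibility
medium_only = final_sizes(rng, 30, with_genetic_variation=False)
with_strains = final_sizes(rng, 30, with_genetic_variation=True)
spread_medium = statistics.stdev(medium_only)
spread_strains = statistics.stdev(with_strains)
```

Comparing spread_medium with spread_strains gives a rough numerical counterpart to the visual comparison of the simulated growth curves; with only 30 draws per scenario the two values are themselves noisy.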
4. Discussion
We developed a probabilistic model to describe the population dynamics of 12 different strains grown in three culture media differing in their glucose content. To our knowledge, no previous study has relied on a Bayesian framework to model S. cerevisiae population dynamics.
4.1. Differences between frequentist and Bayesian approaches
Data presented here had previously been analyzed by fitting a population dynamic model using a frequentist approach. Population key variables were then recovered and an explanatory secondary model (ANOVA) was used to determine whether the forcing factors (in our case, medium variation and strain variability) had a significant effect on the estimated parameter values. With this method, uncertainty caused by the lack of fit of the population dynamic model to the biological data was not taken into account. Yet we had only estimates, and less confidence should be granted to values obtained with fewer points. In addition, non-linear relationships may prevent rapid and unbiased convergence of such estimates (the speed of convergence depends on the number of data points necessary to reach a given estimate, and the bias corresponds to the differences between estimates obtained from different experiments). This strategy can lead to rough approximations, and even to wrong conclusions in some extreme cases. However, for this particular study, the two approaches produced similar estimates.
Fig. 4. Empirical joint posterior distributions of parameters Kmean15%, rmean15%, Ksd15%, rsd15% and [...], corresponding adjusted distributions and correlation coefficients. Empirical joint posterior distributions are shown on the lower panel of the figure (under the diagonal). The diagonal contains adjusted posterior densities of parameters Kmean15%, rmean15%, Ksd15%, rsd15% and [...]. Correlation coefficients between parameters are given in the upper panel of the figure (over the diagonal).
Within the frequentist context, a better strategy could have been to reconstruct an all-inclusive analysis of variance based on a global non-linear procedure (Mc Culloch et al., 2008; Molenberghs and Verbeke, 2006; Muller and Stewart, 2006). There are now powerful EM algorithms (McLachlan and Krishnan, 1996) for the inference of a broad range of non-linear models (e.g. multi-level models, mean-dispersion models, longitudinal models with individual evolutions ruled by differential equations, etc.). However, in this kind of model, tests for significant effects of controlling factors need to be adapted and developed specifically for each type of non-linear model. These methods may be powerful in terms of precision of parameter estimators, but are reserved for highly skilled statisticians since very few user-friendly software packages are available. For instance, SAS Nlmix procedures, Monolix and R Nlme routines cannot presently deal with a statistical model like the one we have used. Moreover, from a practical point of view, the classical frequentist framework encounters obstacles in the treatment of missing data, which should either be deleted or replaced by approximated values. In Bayesian settings, missing data are simply considered as latent variables estimated after consideration of the data. No particular treatment of missing values is then needed for the inference.
Fig. 5. Empirical joint posterior distributions of latent key variables K and r, corresponding adjusted distributions and correlation coefficients for one strain in the three environmental conditions. Empirical joint posterior distributions are shown on the lower panel of each sub-figure (under the diagonal). The diagonal contains adjusted posterior densities of variables K and r. Correlation coefficients are given in the upper panel of each sub-figure (over the diagonal).
Fig. 6. Modelling the typical population dynamics of S. cerevisiae strains in two different environments. Population dynamics was modelled according to probability laws obtained from empirical posterior distribution adjustments. A. 30 growth curves (red lines) were simulated in 15% glucose medium, with the same initial density (N0 = N0mean = 10^6 cells/mL), and Kmean15% and rmean15% drawn in the empirical joint posterior distributions. The 30 grey lines represent the additional information brought after incorporating genetic variability (genetic differences among strains) on population dynamic key variables. For each grey line, K and r are respectively drawn in N(Kmean15%, Ksd15%) and in N(rmean15%, rsd15%), taking into account correlations related to empirical joint posterior distributions. B. Same simulation conditions as in A, except that the glucose concentration was 1%.
A fundamental advantage of the Bayesian approach is the possibility to combine various sources of information to formulate hypotheses on parameters of interest, and thus to define their prior distributions. In the classical frequentist framework, hypotheses on parameter distributions are not related to expert knowledge or to previous experiments, while in the Bayesian framework, posterior distributions from a first study can be recycled as prior distributions for a second one. The Bayesian approach can be viewed as a statistical learning machinery that progressively updates the state of knowledge about a specific phenomenon by processing data from a particular field of research.
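The idea of recycling a posterior as the next prior can be illustrated with a deliberately simple conjugate case, sketched here in Python: a normal prior on a mean with a known observation standard deviation. This is a toy stand-in chosen because the update has a closed form; it is not the hierarchical model of this paper, and the data values, the observation standard deviation and the function name `update_normal_mean` are all hypothetical.

```python
import math


def update_normal_mean(prior_mean, prior_sd, data, obs_sd):
    """Posterior of a normal mean with known observation sd (conjugate update)."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = len(data) / obs_sd ** 2
    post_precision = prior_precision + data_precision
    sample_mean = sum(data) / len(data)
    post_mean = (prior_precision * prior_mean
                 + data_precision * sample_mean) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)


# Hypothetical growth-rate measurements from two successive studies.
study_1 = [0.41, 0.38, 0.44, 0.40]
study_2 = [0.36, 0.39, 0.42]
OBS_SD = 0.03            # assumed known measurement sd
prior = (0.50, 0.20)     # vague initial prior (mean, sd)

after_1 = update_normal_mean(*prior, study_1, OBS_SD)
# The posterior of the first study becomes the prior of the second one.
after_2 = update_normal_mean(*after_1, study_2, OBS_SD)
# Reference: a single update on the pooled data.
pooled = update_normal_mean(*prior, study_1 + study_2, OBS_SD)

print("after study 1:", after_1)
print("after study 2:", after_2)
print("pooled       :", pooled)
```

In this conjugate setting, updating sequentially gives the same posterior as analysing both studies at once, and the posterior standard deviation shrinks at each step.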
4.2. Bayesian modelling, a thinking framework for biologists
In statistical terms, we developed a hierarchical non-linear mixed-effects model with heteroscedastic variance. The conception of this model was quite simple and followed the natural way of thinking of biologists. The first step was to imagine the phenomenological process in action. Microbial populations are known to grow in 3 phases: lag, exponential and stationary. The classical mathematical description of these dynamics is a logistic equation summarized by 3 population dynamic variables: K, r and N0. This is the deterministic part of the model. Then, variability was added to the process by drawing these latent variables from probability distributions. The way these variables are drawn directly reflects how biologists understand the process and its behavioural similarities. We could have considered the effect of glucose concentration on population dynamic key variables as a random effect, but we chose to model it as a fixed effect because we had only three culture media, not evenly spaced (15%, 1% and 0.25%) and not representative of a typical environmental effect. Statistically, this means that there is an average value for each parameter in each glucose condition (Kmean, rmean and N0mean) and that these three experimental situations are considered as independent. We then introduced a random effect due to genetic variability, described by the standard deviation of the law from which the population latent variables are drawn (Ksd, rsd and N0sd). The last step was to incorporate uncertainty in measurements. Since measurement errors increased with population size, owing to the higher number of dilutions required to plate the same number of cells, we simply chose to draw the residual error in N(0, Ns,m,t), i.e. the standard deviation of the error was proportional to the population size.
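The three layers described above (fixed medium effect, random strain effect, measurement error growing with population size) can be written as a small generative sketch in Python. All numerical values are hypothetical, the proportionality constant `SIGMA` is our own assumption since the text only states that the error standard deviation is proportional to population size, and the function names are illustrative rather than taken from the authors' WinBUGS code.

```python
import math
import random


def logistic(t, K, r, N0):
    """Deterministic part: analytical solution of the logistic equation."""
    return K / (1.0 + (K / N0 - 1.0) * math.exp(-r * t))


# Fixed effect of the medium: one mean and one between-strain sd per medium
# (hypothetical values).
MEDIA = {
    "15%": {"K_mean": 2.0e8, "K_sd": 4.0e7, "r_mean": 0.40, "r_sd": 0.04,
            "N0_mean": 1.0e6, "N0_sd": 1.0e5},
    "1%": {"K_mean": 6.0e7, "K_sd": 5.0e6, "r_mean": 0.38, "r_sd": 0.04,
           "N0_mean": 1.0e6, "N0_sd": 1.0e5},
}
SIGMA = 0.1  # assumed proportionality constant of the measurement error


def draw_strain(medium, rng):
    """Random effect: latent K, r and N0 of one strain in one medium."""
    p = MEDIA[medium]
    return {"K": rng.gauss(p["K_mean"], p["K_sd"]),
            "r": rng.gauss(p["r_mean"], p["r_sd"]),
            "N0": rng.gauss(p["N0_mean"], p["N0_sd"])}


def simulate_counts(latent, times, sigma, rng):
    """Observations: error sd proportional to the expected population size."""
    counts = []
    for t in times:
        n = logistic(t, **latent)
        counts.append(rng.gauss(n, sigma * n))
    return counts


def log_likelihood(counts, latent, times, sigma):
    """Gaussian log-likelihood of the counts given the latent variables."""
    total = 0.0
    for y, t in zip(counts, times):
        n = logistic(t, **latent)
        sd = sigma * n
        total += -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((y - n) / sd) ** 2
    return total


rng = random.Random(42)
times = [0, 4, 8, 12, 16, 20, 24, 30, 36, 48]  # assumed unit: hours
latent = draw_strain("15%", rng)
counts = simulate_counts(latent, times, SIGMA, rng)

ll_true = log_likelihood(counts, latent, times, SIGMA)
wrong = dict(latent, K=0.5 * latent["K"])  # same strain with K halved
ll_wrong = log_likelihood(counts, wrong, times, SIGMA)
print(f"log-likelihood, generating values: {ll_true:.1f}")
print(f"log-likelihood, K halved:          {ll_wrong:.1f}")
```

An MCMC sampler would explore the latent variables and hyperparameters using a likelihood of this kind together with the priors; the sketch only shows how the layers fit together and why a badly chosen K is strongly penalised by the plateau observations.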
Our approach borrowed many concepts from the seminal paper of Delignette-Muller et al. (2006) on Bayesian modelling of growth curves for risk assessment. Specifically: (i) we used the well-known logistic equation as a primary growth model; (ii) following their distinction between the uncertainty and the variability of a model, we tried to account for the main sources of variability (random effects stemming from genetic differences between strains) and of uncertainty (essentially measurement errors and only partial knowledge about parameters); and (iii) we used the same powerful MCMC techniques to obtain the posterior distributions of the unknowns. However, our modelling assumptions differed in major ways, since the main sources of variability and uncertainty were different. In Delignette-Muller et al. (2006), the researchers had to consider growth parameters as a function of conditions changing over a large range, since in addition to their own 61 curves, 35 others were taken from 10 publications. As all our data were obtained specifically for this study under controlled conditions, we did not need such a secondary growth model. Similarly, owing to the specific features of our strain data, we did not need to introduce a lag time in the model. In that sense, our model is simpler. Nevertheless, we had to relax the assumption, made in Delignette-Muller et al. (2006), of a homogeneous measurement error variance over each growth curve. Thus, we proposed here a more general way to deal with measurement uncertainties. Because we had the same strains replicated in different conditions, our model allowed us to take genotype-environment interactions explicitly into account, which was not the case in Delignette-Muller et al. (2006). Concerning the applications, Delignette-Muller et al. (2006) used parameter estimates obtained separately for one bacterial species and for the total microbial flora to predict the results of the competition between the bacterial species of interest and the total flora. Because we were interested in predicting the environmental conditions that favour yeast growth, we used our model to predict the range of variation for growth in different environments.
Thus the construction of the model was quite intuitive. Moreover, we would like to emphasize that Bayesian approaches are very powerful and accessible for modelling any dynamical or spatial process, such as the impact of any abiotic (temperature, oxidative stress, poisoning) or biotic environmental factor, the competition between two or more micro-organism species or strains, etc. A free dedicated software, WinBUGS (MRC Biostatistics Unit; Spiegelhalter et al., 2003), is available, and a library has been developed under R (library BRugs, http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/2.9/BRugs_0.5-3.zip, the R Development Core Team), giving scientists the opportunity to develop their own models.
4.3. Additional improvements
Another possible approach could have been functional data analysis (Ramsay and Silverman, 2005), which aims at explaining the variations of a response function of time by the information of other explanatory variables, such as, in our case, the culture media or the strains. The analysis consists in converting the data into a system of basis functions which are combined linearly to define the actual functions. Then, a collection of statistical techniques is applied to the parameters of the basis functions. Statistical packages are available for R and Matlab. This approach is really efficient for answering questions like "How do the curves vary from one condition to another?" or "Are the differences between the curves significant?". However, because parameter estimation is performed on basis functions, and not on the actual response curves, the estimated parameters may lack biological meaning. In the case of population dynamic analysis, the key variables of the latent process, K, r and N0, have a direct biological meaning.
Various types of equations could have been used to model S. cerevisiae population dynamics. We chose to use a logistic equation, because it is parsimonious (three variables having a biological meaning are sufficient) and possesses an analytical solution which properly describes S. cerevisiae population dynamics. A more sophisticated population dynamic model based on ordinary differential equations, as described in Billoir et al. (2008), could have been used. In this type of modelling, the specific add-on module PKBUGS developed by Lunn can be used, as in pharmacokinetics, whenever the ordinary differential equation models with unknown coefficients cannot be analytically solved (Lunn et al., 1999).
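To make the distinction between an analytical solution and numerical integration concrete, the following Python sketch integrates the logistic differential equation with a classical fourth-order Runge-Kutta scheme and compares the result with the closed form. The Runge-Kutta integrator is a generic stand-in for what an ODE solver would do when no closed form exists; it is not the PKBUGS implementation, and the parameter values and step size are hypothetical.

```python
import math


def logistic_closed_form(t, K, r, N0):
    """Analytical solution of dN/dt = r*N*(1 - N/K) with N(0) = N0."""
    return K / (1.0 + (K / N0 - 1.0) * math.exp(-r * t))


def logistic_rk4(t_end, K, r, N0, n_steps):
    """Integrate the same equation numerically with fixed-step RK4."""
    def growth(n):
        return r * n * (1.0 - n / K)

    h = t_end / n_steps
    n = N0
    for _ in range(n_steps):
        k1 = growth(n)
        k2 = growth(n + 0.5 * h * k1)
        k3 = growth(n + 0.5 * h * k2)
        k4 = growth(n + h * k3)
        n += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return n


# Hypothetical parameter values; time unit assumed to be hours.
K, r, N0 = 2.0e8, 0.40, 1.0e6
t_end = 30.0

exact = logistic_closed_form(t_end, K, r, N0)
numeric = logistic_rk4(t_end, K, r, N0, n_steps=3000)
relative_error = abs(numeric - exact) / exact
print(f"closed form: {exact:.6g}, RK4: {numeric:.6g}, "
      f"relative error: {relative_error:.2e}")
```

For the logistic equation the two agree closely, which is why the closed form is the cheaper choice inside an MCMC loop; the numerical route only becomes necessary for models without an analytical solution.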
Competing models could have been designed and compared via their posterior probabilities (e.g. with Bayes factors; Kass and Raftery, 1995) or according to their predictive ability on a test sample (posterior predictive checks; Gelman et al., 2003). The hypothesis of conditional independence between media could be relaxed. For instance, a continuous covariation of the parameters with the environments could be added. A random strain origin effect independent from glucose concentration could have been declared, and variance estimates could have been compared to the estimates of the present model. In addition, a more complex model would assume prior correlations between population dynamic features, with a 3-dimensional random effect instead of three independent random effects for each of these. This would help to take into account biological adjustments in the life-history strategy of strains, such as an increased value of r to compensate for a below-average K. Other refinements may affect the measurement error structure: at very low additional computation costs, a two-parameter proportional error of the form N(0, [...] Ns,m,t [...]) could be studied in the model ([...] is the second parameter).
It is also relatively straightforward to develop the same type of model for more than two factors, provided that enough data are collected to convey sufficient information to update the priors of the additional parameters involved in the analysis. However, when many factors interplay, we can no longer trust common microbiological good sense to focus on a single model, and many competing models can be designed. When the number of possible models gets large (and this number grows very rapidly with the number of factors), the Bayesian model selection problem may become tricky, and refined MCMC strategies (clearly beyond the scope of this paper) should be used to select the appropriate model (reversible jump MCMC as in Green, 1994, or stochastic search as in chapter 4 of Marin and Robert, 2007).
4.4. A measure of G*E interactions
Genotype-by-environment interactions reflect the way a genotype behaves across different environments. In classical quantitative genetics, an ANOVA is performed on the trait considered, and the G*E effect is inferred when (i) the ranking of the genotypes changes according to the medium or (ii) the inter-individual variance varies in the different environmental conditions (Lynch and Walsh, 1997).
Bayesian inference provides us with both the empirical posterior distributions of growth key variables (Ks,m, rs,m, N0s,m) of each strain in each glucose condition, and the standard deviation (Ksdm, rsdm, N0sdm) of those variables in each glucose condition, which reflects strain variability in each condition. Comparisons of the ranking of the average growth key variables of the strains could then be performed (for example, rank correlation of parameter values in two different environments), as well as comparisons of posterior distributions of standard deviations of parameters among culture conditions (using, for example, Kolmogorov–Smirnov tests).
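Both comparisons mentioned above are easy to sketch in Python. The example below computes a Spearman rank correlation between strain-level values in two media and the two-sample Kolmogorov–Smirnov statistic between two sets of posterior draws. It is a minimal illustration: all numbers are hypothetical, ties are assumed absent in the ranking, only the KS statistic is computed (no p-value), and the helper names are ours.

```python
import bisect
import math


def ranks(values):
    """Ranks from 1 to n (assumes there are no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def ks_statistic(a, b):
    """Largest gap between the empirical distribution functions of a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in a_sorted + b_sorted:
        fa = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(fa - fb))
    return gap


# Hypothetical posterior means of K for five strains in two media.
k_15 = [2.1e8, 1.5e8, 2.6e8, 1.8e8, 2.3e8]
k_1 = [6.0e7, 6.4e7, 5.5e7, 6.2e7, 5.8e7]
rho = spearman(k_15, k_1)

# Hypothetical posterior draws of the between-strain sd in two media.
ksd_15 = [3.9e7, 4.1e7, 4.3e7, 4.0e7]
ksd_1 = [0.40e7, 0.50e7, 0.45e7, 0.55e7]
rsd_15 = [0.040, 0.042, 0.038, 0.041]
rsd_1 = [0.039, 0.041, 0.043, 0.040]
d_ksd = ks_statistic(ksd_15, ksd_1)
d_rsd = ks_statistic(rsd_15, rsd_1)

print(f"rank correlation of K between media: {rho:.2f}")
print(f"KS statistic for Ksd: {d_ksd:.2f}, for rsd: {d_rsd:.2f}")
```

In this made-up data set the strain ranking for K is fully reversed between media and the Ksd draws do not overlap at all, whereas the rsd draws largely overlap; real posterior samples would of course contain many more draws.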
In our case, the means of the empirical posterior distributions of Ksd are very different in the three culture media, while the means of rsd stay constant. The between-strain genetic differences for carrying capacity are increased in rich media compared to poor ones, while genetic variation of the intrinsic growth rate stays robust towards environmental changes. This is a good example of standing genetic variation expressed only under certain environmental conditions (Ksd) and of canalization (rsd) (Waddington, 1942). HBM would be powerful for providing better predictions of the genetic diversity of population dynamic key variables in relation to environmental variation.
Last but not least, another fundamental aspect of the Bayesian approach is the possibility to study the joint posterior distributions of latent variables and parameters of interest. Studying the joint posterior distributions of Kmean and rmean in each glucose condition should inform us about the possible existence of a genetic trade-off between carrying capacity and intrinsic growth rate in S. cerevisiae populations. We did not detect it in our data. However, we showed that modelling the population growth of a given strain s in a given condition m requires taking into account the conditional dependence between the key latent variables K and r. In other words, the different possible ways a given strain can grow in a given condition (because of microenvironmental variations, within-population variations and stochasticity between technical replicates) are constrained by a trade-off between K and r (Novak et al., 2006).
4.5. Further prospects
The next step would be to extend this model and to classify strains according to their population dynamic behaviour. By defining a fixed effect of the industrial origin of strains (as we did for the glucose effect), and modelling population growth in typical conditions that maximize the differences between strains coming from different industrial origins, we could determine the adjusted distributions of population dynamic key variables. Then, when testing a novel natural strain in this environment, it would be possible to assign a measure of "goodness" of this strain in a specific industrial condition by classifying it according to its population dynamics. Another possible application could be the choice of typical growing conditions in order to maximize the differences between industrial origins. A medium that seems appropriate to maximize differences between strains coming from different industrial origins would be a medium in which all strains are able to grow (not too stressful) but in which strains could exhibit quantitative variation for population dynamic parameters. For example, in our experiment, the 15% glucose medium typically maximizes the variance for K between strains (Fig. 6). However, from a statistical perspective, it is rather a challenge to plan in practice an optimal design, which would maximize beforehand the expected differences between strains coming from different industrial origins. In the Bayesian setting (Müller, 1999), a utility function has to be elicited from the end-users and burdensome high-dimensional maximization and integration have to be performed (Amzal et al., 2006).
Acknowledgements
The authors would like to thank Yves Rousselle, Adrienne Ressayre and Thibault Nidelet for critical comments and programming assistance. We are also grateful to reviewers for their critical comments. This work was supported by a PhD fellowship of the Ministère de l'Enseignement Supérieur et de la Recherche to Aymé Spor, and indirectly benefited from a grant by the French Agence Nationale de la Recherche (ANR Project ADAPTALEVURE no. NT05-4_45721).
References
Amzal, B., Bois, F., Parent, E., Robert, C.P., 2006. Bayesian optimal design via interacting particle systems. Journal of the American Statistical Association 101 (474), 773–785.
Barker, G.C., Malakar, P.K., Peck, M.W., 2005. Germination and growth from spores: variability and uncertainty in the assessment of food borne hazards. International Journal of Food Microbiology 100, 67–76.
Beltran, G., Torija, M.J., Novo, M., Ferrer, N., Poblet, M., Guillamon, J.M., Rozes, N., Mas, A., 2002. Analysis of yeast populations during alcoholic fermentation: a six year follow-up study. Systematic and Applied Microbiology 25, 287–293.
Bernier, J., Parent, E., Boreux, J.J., 2000. Statistique pour l'environnement. Lavoisier, Paris.
Billoir, E., Delignette-Muller, M.L., Péry, A.R., Charles, S., 2008. A Bayesian approach to analyzing ecotoxicological data. Environmental Science & Technology 42 (23), 8978–8984.
Boekhout, T., Robert, V., 2003. Yeasts in Food. Woodhead Publishing Limited, Cambridge, England.
Calder, C., Lavine, M., Müller, P., Clark, J.S., 2003. Incorporating multiple sources of stochasticity into dynamic population models. Ecology 84, 1395–1402.
Clark, J.S., 2003. Uncertainty and variability in demography and population growth: a hierarchical approach. Ecology 84, 1370–1381.
Clark, J.S., Gelfand, A.E., 2006. Hierarchical Modeling for the Environmental Sciences: Statistical Methods and Applications. Oxford University Press Inc., New York.
Delignette-Muller, M.L., Cornu, M., Pouillot, R., Denis, J.B., 2006. Use of Bayesian modeling in risk assessment: application to growth of Listeria monocytogenes and food flora in cold-smoked salmon. International Journal of Food Microbiology 106, 195–208.
Domizio, P., Lencioni, L., Ciani, M., Di Blasi, S., Pontremolesi, C., Sabatelli, M.P., 2007. Spontaneous and inoculated yeast populations dynamics and their effect on organoleptic characters of Vinsanto wine under different process conditions. International Journal of Food Microbiology 115, 281–289.
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 2003. Bayesian Data Analysis. Chapman & Hall/CRC.
Gilks, W.R., Richardson, S., Spiegelhalter, D.J., 1996. Markov Chain Monte Carlo in Practice. Chapman & Hall, London.
Green, P.J., 1994. Reversible jump MCMC computation and Bayesian model determination. Tech. Report. University of Bristol.
Hooten, M.B., Wikle, C.K., Dorazio, R.M., Royle, J.A., 2007. Hierarchical spatiotemporal matrix models for characterizing invasions. Biometrics 63, 558–567.
Kass, R.E., Raftery, A.E., 1995. Bayes factors. Journal of the American Statistical Association 90, 773–795.
Kéry, M., Royle, J.A., 2008. Hierarchical Bayes estimation of species richness and occupancy in spatially replicated surveys. Journal of Applied Ecology 45, 589–598.
Lunn, D.J., Wakefield, J., Thomas, A., Best, N., Spiegelhalter, D., 1999. PKBugs User Guide. Dept. Epidemiology and Public Health, Imperial College School of Medicine, London.
Lynch, M., Walsh, B., 1997. Genetics and Analysis of Quantitative Traits. Sinauer Associates, Inc., Sunderland, Massachusetts.
MacArthur, R., Wilson, E.O., 1967. The Theory of Island Biogeography. Princeton University Press.
Marin, J.M., Robert, C.P., 2007. Bayesian Core: a Practical Approach to Computational Bayesian Statistics. Springer, New York.
Mc Culloch, C.E., Searle, S.R., Neuhaus, J.M., 2008. Generalized, Linear and Mixed Models. Wiley-Interscience.
McLachlan, G.J., Krishnan, T., 1996. The EM Algorithm and Extensions. Wiley-Interscience.
Membre, J.M., Leporq, B., Vialette, M., Mettler, E., Perrier, L., Thuault, D., Zwietering, M., 2005. Temperature effect on bacterial growth rate: quantitative microbiology approach including cardinal values and variability estimates to perform growth simulations on/in food. International Journal of Food Microbiology 100, 179–186.
Molenberghs, G., Verbeke, G., 2006. Models for Discrete Longitudinal Data. Springer, Berlin/Heidelberg.
Müller, P., 1999. Simulation-based optimal design. Bayesian Statistics 6, 459–474.
Muller, K.E., Stewart, P.W., 2006. Linear Model Theory: Univariate, Multivariate, and Mixed Models. Wiley-Interscience.
Nauta, M.J., 2000. Separation of uncertainty and variability in quantitative microbial risk assessment models. International Journal of Food Microbiology 57, 9–18.
Nauta, M.J., 2002. Modeling bacterial growth in quantitative microbiological risk assessment: is it possible? International Journal of Food Microbiology 73, 297–304.
Novak, M., Pfeiffer, T., Lenski, R.E., Sauer, U., Bonhoeffer, S., 2006. Experimental tests for an evolutionary trade-off between growth rate and yield in E. coli. The American Naturalist 168, 242–251.
Pouillot, R., Albert, I., Cornu, M., Denis, J.B., 2003. Estimation of uncertainty and variability in bacterial growth using Bayesian inference. Application to Listeria monocytogenes. International Journal of Food Microbiology 81, 87–104.
Ramsay, J., Silverman, B.W., 2005. Functional Data Analysis. Springer, New York.
Shorten, P.R., Membre, J.M., Pleasants, A.B., Kubaczka, M., Soboleva, T.K., 2004. Partitioning of the variance in the growth parameters of Erwinia carotovora on vegetable products. International Journal of Food Microbiology 93, 195–208.
Spiegelhalter, D.J., Thomas, A., Best, N., Lunn, D., 2003. WinBUGS Version 1.4 User Manual. MRC Biostatistics Unit, Cambridge, UK.
Spor, A., Wang, S., Dillmann, C., de Vienne, D., Sicard, D., 2008. "Ant" and "grasshopper" life-history strategies in Saccharomyces cerevisiae. PLoS ONE 3, e1579.
Waddington, C.H., 1942. Canalization of development and the inheritance of acquired characters. Nature 150, 563–565.
Wikle, C.K., 2003. Hierarchical Bayesian models for predicting the spread of ecological processes. Ecology 84, 1382–1394.
Wloch, D.M., Szafraniec, K., Borts, R.H., Korona, R., 2001. Direct estimate of the mutation rate and the distribution of fitness effects in the yeast Saccharomyces cerevisiae. Genetics 159 (2), 441–452.