on parameter estimation in population models
TRANSCRIPT
ARTICLE IN PRESS
0040-5809/$ - se
doi:10.1016/j.tp
�CorrespondE-mail addr
uq.edu.au (T. T
Theoretical Population Biology 70 (2006) 498–510
www.elsevier.com/locate/tpb
On parameter estimation in population models
J.V. Ross�, T. Taimre, P.K. Pollett
Department of Mathematics, University of Queensland, Qld. 4072, Australia
Received 4 January 2006
Available online 12 August 2006
Abstract
We describe methods for estimating the parameters of Markovian population processes in continuous time, thus increasing their utility
in modelling real biological systems. A general approach, applicable to any finite-state continuous-time Markovian model, is presented,
and this is specialised to a computationally more efficient method applicable to a class of models called density-dependent Markov
population processes. We illustrate the versatility of both approaches by estimating the parameters of the stochastic SIS logistic model
from simulated data. This model is also fitted to data from a population of Bay checkerspot butterfly (Euphydryas editha bayensis),
allowing us to assess the viability of this population.
r 2006 Elsevier Inc. All rights reserved.
Keywords: Markov chains; Cross-entropy method; Density dependence; Euphydryas editha bayensis; Stochastic SIS logistic model
1. Introduction
Continuous-time Markov chains have been proposed astheoretical models for an array of biological systems. Forexample, they have been used to describe the evolution ofpopulations (Bartlett, 1949), the spread of epidemics(Bailey, 1957; Bartlett, 1956) and rumours (Daley andKendall, 1965), and competition between species (Reuter,1961). Whilst they are ubiquitous in the applied probabilityliterature, their application has been limited, partly due toa lack of clear statistical procedures for fitting the models.This is most evident in the ecological literature, wherediscrete-time Markov chains predominate, perhaps due inpart to a misconception that a discrete-time model isneeded if data are collected at discrete-time points (say atthe end of each breeding season). We address the statisticallimitations by providing straightforward methods forparameter estimation, so enabling this rich family ofmodels to be used in modelling real biological systems.
We begin by reviewing the general approach to para-meter estimation in the context of continuous-time Markovchains, when the process is observed at successive (but not
e front matter r 2006 Elsevier Inc. All rights reserved.
b.2006.08.001
ing author.
esses: [email protected] (J.V. Ross), ttaimre@maths.
aimre), [email protected] (P.K. Pollett).
necessarily equally spaced) time points (for example, sets ofabundancy data collected at various times). This should becontrasted with the case where the process is observedcontinuously; in this situation different techniques apply(see for example Section 4 of Feigin, 1976). We thendescribe an approach which is simpler in terms ofcomputational implementation, and which is suitable forparameter estimation in a wide class of population modelscalled (asymptotically) density-dependent1 Markov pro-cesses (Kurtz, 1970). Density-dependent processes haveappeared in a variety of biological and physical contexts:chemical kinetics (Kurtz, 1973; Pollett and Vassallo, 1992;Gillespie, 2000), epidemics (Nasell, 2002; Clancy, 2005;Clancy et al., 2001), metapopulations (Barbour andPugliese, 2004, 2005; Ross, 2006a,b), parasitology (Pollett,1990) and telecommunications (Gibbens et al., 1990;Pollett, 1992). Whatever the context, our approach canbe used, but we focus here on ecological models. Themethod is based on an approximation to the full likelihoodof the process, resulting in a substantial decrease incomputational complexity. This is achieved using someremarkable results of Kurtz (1970, 1971) and Barbour
1In its common usage in ecology, ‘density dependence’ refers to a
population whose per capita rate of change is a function of population
size. Its use here is a more literal ‘dependence on the population density’ of
a specific form (see Definition 2 in Appendix A).
ARTICLE IN PRESSJ.V. Ross et al. / Theoretical Population Biology 70 (2006) 498–510 499
(1974, 1976, 1980a,b), which provide (i) a natural way ofidentifying an approximating deterministic model thatmore closely approximates the original model as the sizeof the system increases, and (ii) a Gaussian diffusioncharacterizing fluctuations about the unique trajectory ofthe deterministic model. If the population process startsclose to a deterministic equilibrium, this diffusion is anOrnstein–Uhlenbeck (OU) process, and the likelihoodfunction for observations at successive time points has aparticularly simple form. It is this likelihood we exploit forour parameter estimation procedure.
We illustrate both techniques by fitting the stochastic SISlogistic model to simulated data. We also fit this model todata from a population of Bay checkerspot butterfly(Euphydryas editha bayensis), and use the fitted model toassess the viability of the population.
The remainder of the paper is organised as follows. Inthe next section we define Markov population processes.This is followed in Section 3 with an account of a generalapproach to estimating the parameters of continuous-timeMarkov chains with bounded transition rates. Section 4summarises properties of density-dependent models, whichare illustrated with reference to the stochastic logisticmodel. Several properties of density-dependent models areexploited in Section 5, where our computationally simplermethod is described. In Section 6 we illustrate bothapproaches to parameter estimation using the stochasticlogistic model, fitting it to both simulated and real data.We also investigate the bias of our estimates and provideerror bounds for them in Section 6. A general discussion ofour numerical results is given in Section 7, and this isfollowed by a summary of the paper in Section 8.
2. Markov population processes
Markov population processes have been used extensivelyto model populations consisting of individuals of differentcategories or types, or where the individuals occupydifferent sites. We represent the state at time t by a vectormðtÞ ¼ ðm1ðtÞ;m2ðtÞ; . . . ;mkðtÞÞ, where miðtÞ is the numberof individuals at the ith site at time t, and we assume thatðmðtÞ; tX0Þ is a continuous-time Markov chain whose statespace S is contained in Zk (the k-dimensional integerlattice). The evolution of such a Markov chain is governedby transition rates Q ¼ ðqðm; nÞ;m; n 2 SÞ, with qðm; nÞrepresenting the rate of transition from state m tostate n, for nam, and qðm;mÞ ¼ �qðmÞ, whereqðmÞ :¼
Pnam qðm; nÞ ðo1Þ, being the total rate out of
state m. The model is specified by writing down Q. Thejumps of a Markov population process can be of threekinds: the arrival of a new individual (birth or immigra-tion), the departure of an existing individual from thesystem (death or emigration), or the transfer of anindividual from one site to another (migration orpredator–prey competition). Accordingly, we use thefollowing definition (Kingman, 1969):
Definition 1. A Markov chain on a subset S of Zk will becalled a Markov population process if, for any m, the onlyvalues of n for which qðm; nÞ is non-zero are those with
ni ¼ mi þ 1; nj ¼ mj ðjaiÞ ðarrival at iÞ,
ni ¼ mi � 1; nj ¼ mj ðjaiÞ ðdeparture from iÞ,
ni ¼ mi � 1; nj ¼ mj þ 1; nk ¼ mk ðkai; jÞ
ðtransfer from i to jÞ.
Thus, Markov population processes encompass birth–death processes, predator–prey systems, metapopulationsand many other population processes. Our secondapproach to parameter estimation, to be outlined inSection 5, works best for Markov population processes.We note that the approximation can be used for processeswith larger than unit changes in population size, but themethod will become less accurate in such situations and forlarge changes in population size it will be renderedineffective. Consequently, our method of parameterestimation cannot be used for populations that experience,for example, catastrophes.
We will assume that Q is regular, so that the there is aunique transition function PðtÞ with entries pijðtÞ corre-sponding to the probability that the process moves fromstate i to state j in time t. Regularity is guaranteed if Q isbounded in the sense that qðmÞpb, for some constant b, acondition that is trivially satisfied when there are finitelymany states. In this case we may write PðtÞ ¼ expðQtÞ,where exp is the matrix exponential. (For further details,see Anderson, 1991; Norris, 1997.) In most cases thetransition function cannot be evaluated explicitly. How-ever, the behaviour of the process can often be investigatedby working directly with the transition rates or, for morecomplex processes, by way of numerical computationprocedures, or analytical approximations such as thediffusion approximation adopted here.
3. A general approach to parameter estimation
The general approach to parameter estimation, whichmakes use of a numerical procedure for computing thematrix exponential, is straightforward (see for exampleMode and Sleeman, 2000 for a general discussion), butapparently not widely known in the ecology literature,perhaps due in part to the infeasibility of the method whenthe population ceiling is large.We will suppose that there is a parameter (or vector of
parameters) y, contained in some parameter space Y, thatmust be estimated. We will allow the dependence on y to bemade explicit in our notation by writing QðyÞ for thetransition rates and Pðy; tÞ ¼ expðQðyÞtÞ ¼ ðpijðy; tÞ; i; j 2SÞ for the transition function. We will also write piðy; tÞ forthe probability that the process is in state i at time t;pðy; tÞ ¼ ðpiðy; tÞ; i 2 SÞ is usually taken to be the stationaryor the quasi-stationary distribution. Given a set of n
ARTICLE IN PRESSJ.V. Ross et al. / Theoretical Population Biology 70 (2006) 498–510500
observations ik ¼ mðtkÞ ðk ¼ 1; . . . ; nÞ of the state of theprocess at times ð0pÞ t1o � � �otn, the likelihood ofobserving them is
LðyÞ ¼ pi1ðy; t1Þ
Yn
k¼2
pik�1;ikðy; tk � tk�1Þ. (1)
We may then calculate the maximum likelihood estimator(MLE) by of the parameters y by maximising the likelihood(1) over Y for the given observations. As noted previously,the transition function cannot usually be evaluatedexplicitly, but progress can be made by computing thetransition probabilities numerically. This, combined with anumerical search algorithm over the parameter space Y,allows us to compute the desired MLE.
To compute the required matrix exponentials, we useEXPOKIT, a software package whose algorithms make useof Krylov subspace methods (Sidje, 1998). Any one of arange of numerical optimisation techniques can be used tomaximise (1). We use the Cross-Entropy (CE) Method
(Rubinstein and Kroese, 2004), which has proved to beparticularly effective for maximising the likelihood func-tions that we consider. We will see that this combinationprovides a useful tool for fitting continuous-time Markovchains to real systems, provided that the parameter spaceand the maximum population size is not too large. Weimplement this approach in Section 6.
4. Density-dependent models
Our second approach, to be outlined in Section 5, issuitable for parameter estimation in a wide class ofstochastic population models known as density-dependentmodels, which we shall describe and illustrate here.
Imagine now that our population model is indexed byN40, which will usually be a parameter of the model. Inany case N will be related to the size of the system. Fordefiniteness we may think of N as being the populationceiling (maximum population size). Thus, we have a family
fmNð�Þg of Markov chains indexed by N, where mN ð�Þ takesvalues in SN (contained in Zk) and has transition ratesQN ¼ ðqN ðm; nÞ; m; n 2 SNÞ. Our model is then calleddensity-dependent (Kurtz, 1970) if the transition rates takethe form
qNðm;mþ lÞ ¼ Nfm
N; l
� �; la0, (2)
for a suitable function f, so that the transition rates of thecorresponding ‘density process’ X N ð�Þ, defined byX N ðtÞ ¼ mNðtÞ=N, tX0, depend on the present state m
only through the density m=N. A formal and slightly moregeneral definition, where property (2) is realised asympto-
tically (for large N), is given in Appendix A (Definition 2).To illustrate the idea, we will consider a version of
Feller’s (1939) stochastic logistic model, which allows one tomodel the dynamics of a population inhabiting a finiteregion that can support a maximum of N individuals, andaccounts for density dependence (in the ecological sense
noted in Section 1) in both the birth and death transitionrates. The version we shall consider is the stochastic SIS
(Susceptible–Infective–Susceptible) logistic model, whichwas one of the earliest stochastic models for the spreadof infections that do not confer any long lasting immunity,and where individuals become susceptible again afterinfection (Weiss and Dishon, 1971). It provides thesimplest population model that incorporates ecologicaldensity dependence and individual-level demography. Ithas been applied, not only in epidemiology, but also in thepropagation of rumours (Bartholomew, 1976), in chemicalreaction kinetics (Oppenheim et al., 1977), and inmetapopulation ecology (Levins, 1969). It is a continu-ous-time Markov chain ðmðtÞ; tX0Þ taking values in S ¼
f0; 1; . . . ;Ng with non-zero transition rates
qðm;mþ 1Þ ¼ lm
NðN �mÞ ðm ¼ 1; 2; . . . ;N � 1Þ,
qðm;m� 1Þ ¼ mm ðm ¼ 1; 2; . . . ;NÞ,
where l and m are both positive. We can see immediatelythat the model is density dependent with f ðx;þ1Þ ¼ lxð1�xÞ and f ðx;�1Þ ¼ mx, being the rate of increase and the rateof decrease, respectively, when the population density is x.Furthermore, if we let DX ¼ X Nðtþ DtÞ � X N ðtÞ, thenEðDX jX N ðtÞ ¼ xÞ ’ F ðxÞDt, where F ðxÞ ¼ lxð1� xÞ � mx.Consequently, a natural deterministic model emerges forthe population density:
dx
dt¼ F ðxÞ where F ðxÞ ¼ lxð1� xÞ � mx. (3)
This is of course the classical Verhulst model (Verhulst,1838) (which has been reinvented several times since it firstappeared in 1838). Indeed, for any density-dependentmodel we can identify a deterministic analogue:
dx
dt¼ F ðxÞ where F ðxÞ ¼
Xl
lf ðx; lÞ. (4)
Our intuition tells us that X N ð�Þ should behave moredeterministically as N becomes large and that indeed thedeterministic trajectory should be ‘tracked’ by the processwhen N is large. Kurtz (1970) established this rigorously byproving a functional law of large numbers (stated in itsmost general form as Theorem 3 of Appendix A) that holdsunder mild conditions which will apply in most biologicalcontexts.What makes this approximation procedure so powerful
is our ability to model precisely the fluctuations of thedensity process X N ð�Þ about the deterministic trajectory.We will summarise results for the ‘equilibrium case’, wherewe model the fluctuations about a steady state xeq of (4),that is, F ðxeqÞ ¼ 0, and consider the family fZNð�Þg definedby ZN ðtÞ ¼
ffiffiffiffiffiNpðX N ðtÞ � xeqÞ. Our intuition, based on the
Central Limit Theorem, tells us that, for large N, ZN ðtÞ
should have an approximate normal (Gaussian) distribu-tion (with zero mean and a variance that depends on t).This is indeed the case, but we can be far more precise byidentifying an approximating process. Kurtz (1970) proved
ARTICLE IN PRESSJ.V. Ross et al. / Theoretical Population Biology 70 (2006) 498–510 501
that our scaled density process ZN ð�Þ converges to an OUprocess Zð�Þ whose parameters can be determined simplyfrom the original model. This is given in Appendix A with aformal statement of a functional central limit theorem(Corollary 5).
Much can be deduced from properties of the limitingprocess Zð�Þ. For example, in the one-dimensional case, wemay conclude that, for N large, X NðtÞ has an approximatenormal distribution with
EðX N ðtÞÞ ’ xeq þ eBtðX Nð0Þ � xeqÞ,
where B ¼ F 0ðxeqÞ and VarðX NðtÞÞ ’ s2t =N, with
s2t ¼GðxeqÞ
2Bðe2Bt � 1Þ where GðxÞ ¼
Xl
l2f ðx; lÞ.
We will consider only the case where xeq is asymptoticallystable, that is Bo0, so that, as t!1, EðZðtÞÞ ! 0 andVarðZðtÞÞ ! s2 :¼V=ð�2BÞ, where V ¼ GðxeqÞ. Again, fulldetails are given in Appendix A. The special covariancestructure of the OU process will be revealed and exploitedin the next section, where our computationally simplerparameter estimation method is described.
To illustrate the diffusion approximation technique,consider again the logistic model, but in the mostinteresting case l4m, where there is drift away from theextinction state 0. Under this assumption, (3) has twoequilibria, both in ½0; 1�: 0 (unstable) and xst ¼ 1� r, wherer ¼ m=l (asymptotically stable). We note, in the context ofepidemiology, r is the equilibrium fraction of susceptibleindividuals and is also the inverse of the basic reproductionratio R0 (Diekmann et al., 1990). Furthermore, (3) has theunique solution
xðtÞ ¼xstx0
x0 þ ðxst � x0Þe�lxstt; tX0, (5)
where x0 ¼ xð0Þ is the initial point of the trajectory.It is possible to write down the full diffusion approxima-
tion for the population density X Nð�Þ ¼ mð�Þ=N about thedeterministic path (5) by way of Theorem 4 of Appendix A,but we will not pursue this here. Rather, we will considerthe OU approximation about the asymptotically stableequilibrium xst. Since F 0ðxÞ ¼ lðxst � 2xÞ, we have (localdrift) B ¼ F 0ðxstÞ ¼ �lxst and, since GðxÞ ¼ F ðxÞ þ 2mx,we have (local variance) V ¼ GðxstÞ ¼ 2mxst ¼ 2lxstr. Weconclude that, for N large, X N ðtÞ has an approximatenormal distribution with
EðX NðtÞÞ ’ xst þ e�atðX N ð0Þ � xstÞ;
VarðX NðtÞÞ ’ rð1� e�2atÞ=N;
where a ¼ lxst ¼ l� m ¼ lð1� rÞ ð40Þ. Hence, for alltX0,
EðmðtÞÞ ’ Nð1� rÞ þ e�atðmð0Þ �Nð1� rÞÞ;
VarðmðtÞÞ ’ Nrð1� e�2atÞ:
5. A computationally simpler approach
Our parameter estimation method uses the (stable) one-dimensional OU approximation outlined in the previoussection. The use of diffusion approximations for parameterestimation has been considered in the statistical literature(see for example Feigin, 1976; McNeil and Weiss, 1977),but once again the application of the idea to real biologicalmodelling appears to be limited. Recall that we are given aset of n observations ðmðt1Þ; . . . ;mðtnÞÞ of the state of theprocess at times ð0pÞ t1o � � �otn. If we denote thecorresponding observations of the density process byðX Nðt1Þ; . . . ;X NðtnÞÞ, then, when N is large, this vector willhave an approximate normal distribution, with a covar-iance structure that can be determined explicitly fromproperties of the limiting OU process.If we start the OU process in equilibrium, that is,
Zð0Þ�Normalð0;s2Þ, where s2 ¼ V=ð�2BÞ, or equivalently(for large N) X Nð0Þ�Normalðxeq; s2=NÞ, then it willbe strongly stationary. Therefore, since CovðZðsÞ;Zðsþ tÞÞ ¼ s2 expðBjtjÞ, we may approximate CovðX N ðsÞ;X N ðsþ tÞÞ by
cðtÞ :¼ cð0Þ expðBjtjÞ, (6)
where cð0Þ ¼ s2=N. Hence, for large N, we know explicitlythe correlation structure of the vector ðX Nðt1Þ; . . . ;X NðtnÞÞ,and hence its likelihood function:
f ðxÞ ¼1ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffið2pÞnjCj
p exp �1
2ðx�mÞC�1ðx�mÞ0
� �, (7)
where m ¼ ðm1;m2; . . . ;mnÞ, mi ¼ xeq for all i ¼ 1; 2; . . . ; n,and
C ¼
c1 c1;2 c1;3 � � � c1;n
c1;2 c2 c2;3 � � � c2;n
..
. ... ..
. ... ..
.
c1;n c2;n c3;n � � � cn
0BBBBB@
1CCCCCA,
where ci ¼ s2=N and ci;iþs ¼ ðs2=NÞ expðBðtiþs � tiÞÞ. Theinversion of C, and calculation of its determinant, can bedone explicitly, and this permits us to write down the (log)likelihood explicitly (see Appendix B). The form thatshould be used in practice is given in (10) (or (11)).We can therefore evaluate the (joint) MLEs for the
parameters of the model, which are the values thatmaximise (7). Explicit calculation of the MLEs is notpractical in most situations. Therefore, a numericaloptimisation procedure will be required in most cases tofind the parameters which maximise the likelihood func-tion. The CE method (see Rubinstein and Kroese, 2004) isan ideal recent approach, but any one of several numericaloptimisation procedures should be effective. We illustratethis approach in Section 6 by using the CE method to
ARTICLE IN PRESSJ.V. Ross et al. / Theoretical Population Biology 70 (2006) 498–510502
estimate the parameters of the stochastic logistic modelusing both simulated and real data.
Remarks. (1) In many situations the maximum populationsize N is unknown. It can be estimated using theapproximation NX Nð0Þ�NormalðNxeq;Ns2Þ. The jointMLEs are then calculated with N as an additionalunknown parameter.
(2) The OU approximation is achieved by letting themaximum population size tend to infinity. Thus, the OUapproximation, and hence the parameter estimationprocedure itself, are best for large population sizes. Thisis precisely the situation in which the first method becomesinfeasible.
(3) We emphasise that unequally spaced sampling of theprocess is no obstacle to the success of the method, as canbe seen from the covariance structure (6).
Finally, we note that any approach to estimating theparameters of an OU process may be implemented. Forexample, one may use the theory of estimation forcontinuous-time autoregressive processes (Fan et al.,1999). See also Franco (2003) for a computationallyefficient maximum likelihood method. Additionally, inthe approach presented above, it is often possible todetermine an equation for the optimal value of oneparameter, conditional on knowing the other parameters.In such cases we may use this equation to reduce thedimensionality of our search by one, thus increasing thecomputational efficiency of the method.
6. The stochastic logistic model
In this section we illustrate the methods presented inSections 3 and 5, by estimating the parameters of thestochastic logistic model from both simulated dataand data for a population of Bay checkerspot butterfly(E. editha bayensis) (Harrison et al., 1991). We considersituations in which the maximum population size is knownand unknown, where it is ‘small’ and where it is ‘large’, andsituations where we have a reasonable sample of data, andothers where we have minimal data.
The general approach: For this model, the parametervector y is ðl; mÞ, or ðl;m;NÞ, depending upon whether N isknown or unknown, respectively.
The computationally simpler approach: We assume thatX N ð0Þ is normally distributed with mean 1� m=l andvariance m=ðNlÞ, so that the OU process is stronglystationary and, in particular,
CovðX NðsÞ;X N ðsþ tÞÞ ’ cðtÞ ¼m
Nlexpð�ðl� mÞjtjÞ.
Furthermore, the likelihood ðX Nðt1Þ;X Nðt2Þ; . . . ;X NðtnÞÞ
has the approximate Gaussian density (7) whose para-meters have entries mi ¼ 1� m=l, ci ¼ m=ðNlÞ and ci;iþs ¼
m=ðNlÞ expð�ðl� mÞðtiþs � tiÞÞ (i ¼ 1; 2; . . . ; n).Experience shows that in some situations it is better,
from a numerical perspective, to use the log-likelihood lðxÞ,
that is
lðxÞ ¼ �n
2lnð2pÞ �
1
2lnðjCjÞ �
1
2ðx�mÞC�1ðx�mÞ0.
(Again, the form that should be used in practice is givenin (11).)
6.1. Simulated data
In all cases discussed below, our simulated data sets weregenerated in the following way. We simulated thestochastic logistic model with chosen parameters (as listedin each section) to generate data over 119 units of time(years). We then observed the population size at yearlyintervals. We took as our data set the required number n ofobservations, after discarding the first 20 observations(corresponding to 19 years of data), to allow the influenceof the initial conditions to dissipate. Also, unless otherwisespecified, the results presented are from 20 runs of theappropriate CE algorithm (see Appendix C).In the tables presented, ‘It’ refers to the number of
iterations taken by the CE algorithm and ‘S’ represents thevalue of the (log-)likelihood. The ‘min’, ‘median’, ‘mean’and ‘max’ in the tables refer to the value of the (log)likelihood S.
The general approach: The parameters used were N ¼ 50,l ¼ 0:8 and m ¼ 0:4, and n ¼ 40. We used a CE algorithm,with normal updating (Rubinstein and Kroese, 2004) tomaximise the likelihood given by (1). We considered the(log-)likelihood as a function of the ‘natural’ parameters,N, r ¼ m=l and a ¼ l� m. In the case where N was known,we used the log-likelihood, and in the case where it wasunknown, we used the likelihood itself. We used EX-POKIT to evaluate (1) with initial distribution taken to bethe quasi-stationary distribution, and began the CEalgorithm with 1000 initial parameter estimates for ðl;mÞ,drawn uniformly from the triangle (0,0)–(0,5)–(5,5). Notethat, for the example considered, a wider range of initialparameters resulted in numerical precision problems whenusing EXPOKIT (for Matlab) to evaluate the likelihood.Also, for each parameter estimate, a transition rate matrixhad to be created and passed to EXPOKIT for evaluation.In our case, we could take advantage of the sparsestructure of Q, which EXPOKIT could exploit. Even so,experience showed that when N became quite large, theevaluation of the likelihood at a parameter estimate couldbecome too costly for our algorithm to complete inreasonable time (on a 3GHz Pentium 4 with 1Gb ofRAM). Hence, it would appear that the general procedureis currently feasible only when N is moderate.The CE parameters were updated on the basis of the best
25 parameter estimates, meaning that the best 25 parameterestimates of the 1000 were used to obtain a new samplingdistribution for the next iteration. The algorithm wasstopped when all of the standard deviations were belowe ¼ 10�7, or when the algorithm hit the maximumiterations ceiling of 250. Note that we did allow ‘illegal’
ARTICLE IN PRESSJ.V. Ross et al. / Theoretical Population Biology 70 (2006) 498–510 503
parameter values, such as negative N, r41 or ao0.Further, we did not attempt to evaluate the likelihood inthese situations; rather, we applied a penalty, beingquadratic in the parameters. This guided the samplingdistribution of the CE algorithm into the region ofallowable parameters if it strayed. Finally, if the optimisa-tion procedure terminated due to a numerical precisionerror, the procedure was rerun.
N known: We obtained the following results:
N ¼ 2000
Known N
Unknown N
MinMeanMediaMax
MinMeanMediaMax
n ¼ 10
N l
––
n ––
1654.771654.76
n 1654.781654.71 1
m
661.143 32747.609 37794.560 39687.940 34
782.968 30589.403 23609.114 23081.070 42
n ¼ 40
It S bl bm Min 8 �109.108 1.14739 0.566295 Mean 8.65 �109.108 1.14739 0.566295 Median 9 �109.108 1.14739 0.566295 Max 9 �109.108 1.14739 0.566294N unknown: When N was unknown, we introduced anadditional sampling distribution for N. Using the samedata set as above, we sampled 1000 initial populationceilings uniformly from the maximum observed populationsize (which in this case was 32) and 250. (Note that thisceiling should be large enough to encompass the truepopulation size.) All of the other parameters remained thesame.
n ¼ 40
It S bN bl bm Min 14 �107.29 40 1.63349 0.605665 Mean 15 �107.249 39.15 1.67561 0.598932 Median 17.5 �107.242 39 1.68304 0.597744 Max 15 �107.242 39 1.68304 0.597744n ¼ 40 n ¼ 100
N l m N l m
8.464 – 0.906201 0.448219 – 0.906262 0.4520531.422 – 0.905708 0.447823 – 0.906262 0.4520534.748 – 0.905563 0.447709 – 0.906262 0.4520531.778 – 0.905564 0.447709 – 0.906262 0.452053
6.883 1654.03 1.17260 0.45642 1854.30 0.978558 0.4496911.008 1653.21 1.17329 0.456330 1853.45 0.979045 0.4496558.733 1653.10 1.17329 0.456270 1853.30 0.979232 0.4496843.709 1653.07 1.17347 0.456343 1853.27 0.979215 0.449679
The computationally simpler approach: In this case, theCE algorithm took 50,000 samples per iteration, with thesampling parameters updated from the best 100parameter estimates. We initialised the algorithm with
pairs ðl;mÞ drawn uniformly from the triangle(0,0)–(1000,0)–(1000,1000). In the case that N was un-known, we drew N uniformly between the highest valuefound in the sample and 10; 000. The algorithm wasstopped when the largest of the standard deviations of thesampling distributions dropped below the thresholde ¼ 10�7. The larger search space and number of samplesper iteration in the CE algorithm were now practical owingto the low cost of evaluating the (log-)likelihood for a trialset of parameters.The following tables show results for data simulated
using parameters ðN; l;mÞ ¼ ð50; 0:8; 0:4Þ.N known:
n ¼ 40
It S bl bm Min 23 47.5405 1.10683 0.559313 Mean 21.95 47.5464 1.10654 0.557900 Median 22.5 47.5437 1.10674 0.558667 Max 8 47.5582 1.10432 0.552027N unknown:
n ¼ 40
It S bN bl bm Min 7 2:57901e� 047 38.8803 1.61317 0.596472 Mean 6.85 2:57912e� 047 38.9045 1.60636 0.592581 Median 7 2:57913e� 047 38.8875 1.60156 0.590422 Max 7 2:57915e� 047 38.9036 1.59894 0.589929The following tables show results for data simulatedusing parameters ðN; l;mÞ ¼ ð2000; 0:8; 0:4Þ.
N known, unequal spacing: The results below are forunequal spacing of observations; we used the following
ARTICLE IN PRESSJ.V. Ross et al. / Theoretical Population Biology 70 (2006) 498–510504
spacing (in years) between observations:
ð2; 2; 2; 2; 3; 2; 1; 1; 2; 1; 2; 2; 2; 3; 2; 1; 2; 1; 2; 3; 2; 1; 2; 3; 3; 2; 1; 1; 2Þ.
n ¼ 30
It S bl bm Min 24 87.7777 1.21099 0.600365 Mean 19.7 87.8001 1.20623 0.597228 Median 24.5 87.7991 1.2081 0.59832 Max 9 87.8055 1.20298 0.5952676.2. The Bay checkerspot butterfly of Jasper Ridge
We now consider the time series of the Jasper Ridge(JRH) population of Bay checkerspot butterfly, E. editha
bayensis. The population was sampled yearly between 1960and 1986 (Harrison et al., 1991). An analysis of this dataset is included to demonstrate the application of themethod to real data. We do not claim that the stochasticSIS logistic model is the most appropriate for modellingthe butterfly population in question. Furthermore, weassume that the population is in stationarity.
N unknown: The following results were obtained usingthe computationally simpler approach:
n ¼ 27
It S bN bl bm Min 8 8.57701e� 088 280156 245.645 245.215 Mean 8 8.57701e� 088 280156 245.645 245.215 Median 8 8:57701e� 088 280156 245.645 245.215 Max 8 8.57701e� 088 280156 245.645 245.2152In particular with respect to the predictions in Foley (1994).
Observe that N was estimated (consistently) to be wellabove our (arbitrary) cap of 10,000. The actual procedureused to obtain this estimate involved first taking into accountthat the optimisation favoured solutions with N outside thedefault initial range, and subsequently increasing the initialrange until the estimates fell within this range. (In this case,the ‘cap’ of the initial range was increased to 500,000.)
The Bay checkerspot butterfly is now extinct on JasperRidge, with the JRH population thought to have goneextinct in 1998 due to a combination of extensive habitatloss and a period of increased climatic variability beginningin 1972 (McLaughlin et al., 2002). We can evaluate theexpected time to extinction using the stochastic logisticmodel and parameter estimates derived from our procedureusing data from a time when the population was extant.For a general Markov chain with transition ratesQ ¼ ðqðm; nÞ; m; n 2 SÞ, whose state space S (possiblyinfinite) includes a subset A which is reached withprobability 1, the expected time ti it takes to reach A
starting in state i is the minimal non-negative solution toXj2S
qði; jÞtj þ 1 ¼ 0; ieA,
with ti ¼ 0 for i 2 A. This result can be found in most textson Markov chains (for a recent exposition see Norris,1997). Mangel and Tier (1994), in their paper ‘Four factsevery conservation biologist should know about persis-tence’, encourage their readers to use it: Fact 2 ‘There is asimple and direct method for the computation of persis-tence times that virtually all biologists can use’. The resultreduces the problem of computing persistence times to thatof solving a system of linear equations, for which a host ofnumerical methods exist. For any stochastic birth-and-death process, with sets of (population-size dependent)birth and death rates fljg and fmjg, respectively, ti isgiven by
ti ¼Xi
j¼1
1
mjpj
XN
k¼j
pk,
where (the potential coefficients) pj ðjX1Þ are given by p1 ¼1 and pj ¼
Pjk¼2 ðlk�1=mkÞ for jX2. This formula is also
valid in the infinite-state case, replacing N by 1. For thestochastic logistic model we have
ti ¼1
m
Xi
j¼1
XN�j
k¼0
1
j þ k
Yk�1l¼0
N � j � l
Nr
� �.
(This expression admits further simplification, but thisform reflects the algorithm used to evaluate ti, the productbeing evaluated recursively, and the sums evaluated in sucha way as to minimise round-off error.) Whilst the aboveexpression for ti does not pose any significant numericalproblems, an explicit expression for the time to extinction isoften preferred. By evaluating the factorials as gammaintegrals, and using Cauchy’s method to estimate theseintegrals, Pollett (2001) obtained the following explicitasymptotic formula:
ti�rð1� riÞ
mð1� rÞ2e�ð1�rÞ
r
� �N ffiffiffiffiffiffi2pN
rðN largeÞ.
The expected time to extinction for the butterfly populationusing this approach, and setting the initial population sizeto the average population size (that is i ¼ 553) isapproximately 3:39
ffiffiffipp
years. It is interesting to note thatthis is quite close2 to the observed extinction time (over therecorded population range), 12 years after the lastmeasurements present in the data we used.
6.3. Bias
Here we examine (numerically) the bias of our compu-tationally simpler approach using simulated data from thestochastic logistic model with parameters ðN; l; mÞ ¼ð2000; 0:8; 0:4Þ. We applied the technique and obtainestimates for the parameters, for both N known and N
unknown. The procedure was repeated 10,000 times(obtaining new simulated data each time). Any data set
ARTICLE IN PRESSJ.V. Ross et al. / Theoretical Population Biology 70 (2006) 498–510 505
for which a45 had the CE algorithm rerun on it (since thisindicated a numerically spurious result, rather than, say thedata set actually producing such a strange result; in mostcases, if a45 then a4100). Using this process, we arrivedat a set of values that are reasonably accurate indicators ofthe actual MLEs for the 10,000 data sets.
Note that the tables presented in this section are nolonger ordered by likelihood, but rather by the actualvalues for the parameters in question.
n ¼ 100
bN bl bm Min – 0.41054 0.20499 Median – 0.80304 0.40186 Mean – 0.81481 0.40785 Max – 1.59760 0.81427Min
1406 0.41324 0.20583 Median 1924 0.84405 0.40645 Mean 1951 0.86762 0.41219 Max 3158 9.11500 4.16980In order to assess the bias of the technique we performeda sign test on each of the estimated parameters, with thefollowing hypotheses:
H0 : The estimator is unbiased:
HA : Not H0:
The sign test was chosen because of the departure fromnormality observed in our estimates, and a normalapproximation was used to obtain our probabilities. Fora decision about biasedness, we took our critical p-value tobe 0:01. With this in mind, we obtained the following:
N known:
l
mT
1.34 2.22 p 0.180245 0.026419 Biased No NoN unknown:
N
l mT
�26.12 21.1 6.76 p o0:000001 o0:000001 o0:000001 Biased Yes Yes YesObserve that, when N is known, we cannot reject thehypothesis that the estimators are unbiased using the signtest. However, we can strongly reject the hypothesis whenN is unknown, concluding that the (joint) estimators in thiscase are biased.
6.4. Error bounds
MLEs are asymptotically normally distributed withmean y and covariance matrix given by the inverse of theFisher information matrix. This can be estimated by
IðyÞ� ��1
¼ �Eq2lðyÞ
qy qy0
" #( )�1.
However, the second derivatives of the log-likelihood areoften too complicated for their exact expected valuesto be calculated in practice. A second estimator widelyused is
IðyÞ� ��1
¼ �q2lðyÞ
qy qy0
!�1,
this being the inverse of (the negative of) the matrix ofsecond derivatives of the log-likelihood, evaluated at theMLE. For our model we have
Iððl; mÞÞ� ��1
¼ �
q2l
ql2
q2l
ql qm
q2l
qm ql
q2l
qm2
0BBBBB@
1CCCCCA�1
¼q2l
ql2
q2l
qm2�
q2l
ql qm
q2l
qm ql
!�1 �q2l
qm2q2l
ql qm
q2l
qm ql�q2l
ql2
0BBBBB@
1CCCCCA.
Hence, we must compute the second derivatives of the log-likelihood given in Appendix B. This can be done exactly.However, the formulæ are rather cumbersome and will notbe written out here. We use this approach to calculate theerror bounds presented below.Often it will be difficult to compute the matrix of second
derivatives. In such cases a third estimator may be useful,which only requires first derivatives of the log-likelihood.This estimator is
IðyÞ� ��1
¼Xn
i¼1
bgi bg0i !�1
,
where
bgi ¼qlðxi;byÞ
qby .
We illustrate the error bounds by plotting results ofcalculations for the stochastic logistic model, with a set ofsimulated data (with 100 equally spaced observations, andparameters ðN; l;mÞ ¼ ð2000; 0:8; 0:4Þ), and the JRH data(taking our estimate of N to be the true value).Note that in Fig. 2 the confidence ellipse has been
rotated, so that the x-axis is the line y ¼ r�1x. Without
ARTICLE IN PRESSJ.V. Ross et al. / Theoretical Population Biology 70 (2006) 498–510506
rotation, the 95% confidence region appears to be a line,suggesting that the ratio m=l is estimated with far greaterconfidence than the difference l� m.
7. Discussion
When N is known, we see that the general approachproduces reasonable estimates, bl ¼ 1:14739 andbm ¼ 0:566294, of the true parameter values l ¼ 0:8 andm ¼ 0:4. When N is unknown, we once again get reasonableestimates, bN ¼ 39, bl ¼ 1:68304 and bm ¼ 0:597744, to thetrue parameter values N ¼ 50, l ¼ 0:8 and m ¼ 0:4. Asalready noted, even with a small population ceiling ofN ¼ 50, and a small initial search space for this parameter,namely the interval ½32; 250�, the EXPOKIT functionmexpv was unable to complete calculations to the defaulttolerances in many cases, and hence errors were returned.
It can be seen that our approach enables the use of amuch wider search space, owing to the low computationalcost of evaluating the approximation’s (log-)likelihood.The parameter space can be expanded from values up to 5to values up to 1000, a significant increase. This suggeststhat the new procedure permits more uncertainty in thetrue parameter values than does the general approach.Likewise, the maximum population size can be increased,so that the default search space for this parameter is now½32; 10; 000�, once again allowing more uncertainty in thevalue of the maximum population size for the species beingmodelled. This also allows more freedom in modellingpopulations that are only weakly density dependent (in theecological sense mentioned in Section 1) using thestochastic logistic model. This will be discussed in moredetail later in this section with reference to the Baycheckerspot butterfly. The sample size used for the CEmethod can also be significantly increased, in our examplefrom 1000 to 50,000, thus allowing more accuratenumerical maximisation of the likelihood from the CEmethod. For the same data set used to illustrate the generalapproach, the computationally simpler approach producesreasonable estimates (bl ¼ 1:10432 and bm ¼ 0:552027) whenN is known. When N is unknown, we again achievereasonable estimates ( bN ¼ 39, bl ¼ 1:59894 and bm ¼0:589929) with a reasonably significant overestimate ofthe birth parameter. We note the close agreement of theestimates produced using our computationally simpleapproach with those produced via the general approach.The simpler approach works best for large populationsizes. The general approach should be used in cases when itcan be practically implemented, although the simplerapproach can still provide adequate estimates of para-meters in these cases.
We now consider situations in which the generalapproach is infeasible to implement, and demonstrate therobustness of the simpler approach. We use the parametervalues N ¼ 2000, l ¼ 0:8 and m ¼ 0:4. Assuming N isknown, we see that extremely accurate estimates areproduced when we have 40 and 100 observations of data,
being bl ¼ 0:905564 and bm ¼ 0:447709, and bl ¼ 0:906262and bm ¼ 0:452053, respectively. However, in the case ofonly 10 observations, our approach fails to producereasonable parameter estimates. In this case, the ratio rwas estimated well, but the difference a ¼ l� m was not,resulting in unreasonable estimates for both l and m. Thedifference a is the mean-reversion parameter, and this isknown to be difficult to estimate precisely (see for exampleFranco, 2003). This also demonstrates that an adequatesample of data is required to produce accurate parameterestimates. The sample size required for a particular model,and unknown parameter set, should be investigatednumerically through a simulation study. For the stochasticlogistic model, we find that somewhere between 15 and 20observations are required to produce reasonably accurateestimates for most simulated data. A similar result can beseen in the N unknown case. The parameter estimates areseen to improve with an increasing number of observa-tions. For the case n ¼ 40 the estimates are bN ¼ 1653, bl ¼1:17347 and bm ¼ 0:456343, improving to bN ¼ 1853, bl ¼0:979215 and bm ¼ 0:449679 when n ¼ 100. All of theseestimates are reasonably accurate and all are consistentlyachieved. Finally, we demonstrate the robustness of ourapproach to unequal observation intervals, producingparameter estimates of bl ¼ 1:20298 and bm ¼ 0:595267. Thisis an important feature of the procedure, in particular topopulation modelling where equally spaced data are notalways available.We also fitted the stochastic logistic model to population
data of the Bay checkerspot butterfly using our procedure.The parameter estimates produced were bN ¼ 280; 156, bl ¼245:645 and bm ¼ 245:215. Harrison et al. (1991) found thatthe observed data are consistent with the assumption ofdensity independence or only weak density dependence(both in the ecological sense). Our estimates support theconclusion of Harrison et al. (1991): since the estimates forl and m are so close, the population tends to be small, inparticular relative to the large population ceiling N, andhence qðm;mþ 1Þ ’ lm over a wide range of populationsizes. Consequently, the population does not exhibit astrong downward trend in size following deviations aboveits carrying capacity. This result indicates a degree offlexibility in the logistic model and the proposed estimationprocedure when N is unknown, because, while the modelallows for a tendency to decline at large population sizes, itdoes not require this tendency to be strong. As a widevariety of density-dependent processes may be analysedusing this approach, it appears to provide a robust methodfor assessing whether populations are strongly limited bytheir carrying capacity. We additionally evaluated theexpected time to extinction of this population and foundour estimates to be reasonably close to the reported time ofextinction (McLaughlin et al., 2002). Care should be takenwhen the estimates of l and m are close together, as we havefor this data set, because if the true difference a (being themean-reversion parameter) is 0, or negative, the assump-tions of our method are not valid, and the method will
ARTICLE IN PRESS
-600 -400 -200 0 200 400 600 800 1000 1200-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
Per
pend
icul
ar D
ispl
acem
ent f
rom
y=
ρ-1x 50%
95%Estimate
Position along y=ρ-1x
Fig. 2. Confidence regions for the butterfly data, with N ¼ 280; 156 taken
as known.
J.V. Ross et al. / Theoretical Population Biology 70 (2006) 498–510 507
return a positive estimate of a (which will be typically veryclose to 0).
We also investigated the bias of our approach. We foundthat in the N-known case our procedure could not be saidto produce biased estimates of birth and death parameters,on the basis of a sign test over 10,000 sets of simulateddata. In the N-unknown case, it appears that our approachproduces biased estimates. From our investigation, for thestochastic logistic model, it appears that the populationceiling estimate bN is generally underestimated, and thebirth and death parameter estimates bl and bm are generallyoverestimated. Note that, over all data sets considered, theCE algorithm occasionally produced outlying estimates(whether this was caused by the algorithm or a particularquality of the data set in question, we are unsure).However, these outliers do not adversely affect the testused, since the sign test only records whether the estimatesare above or below the true parameters, and ignoresmagnitude. However, the median estimates for l and mshow a remarkable correspondence to the true parametervalues. In any particular situation, we recommend that anumerical simulation study be performed to determine theextent of any bias. It can then be compensated for whenusing the parameter estimates for subsequent analyses ofthe population.
We can calculate approximate confidence regions forestimates of parameter vectors produced using ourapproach. We have depicted two such regions in Figs. 1and 2, the first for 100 observations of a simulated process,and the second for our model of the butterfly population.From Fig. 1, we see that there appears to be moreconfidence in the ratio m=l than in the difference a ¼ l� m.As a consequence, the confidence region is long andnarrow. For the stochastic logistic model, this appears tobe the general pattern. The parameter pairs which fallinside the 95% confidence region range from ðl;mÞ �ð0:5; 0:25Þ to ðl; mÞ � ð1:3; 0:65Þ, and happens to include the
0.2 0.3 0.4 0.5 0.6 0.70.4
0.5
0.6
0.7
0.8
0.9
1
1.1
1.2
1.3
1.4
µ
λ
50%95%EstimateTrue
Fig. 1. Confidence regions for simulated data (n ¼ 100), with N ¼ 2000
(known).
true parameter pair. The confidence region for theestimates of parameters from the butterfly data is extremelynarrow, indicating high confidence in the ratio r, andabysmal confidence in the difference a. The 95% con-fidence region, in this case, extends into negative parameterpairs, which are illegal for the model. This case highlightsthe importance of considering estimates in conjunctionwith their confidence regions. We note that more data, aswell as less variance in that data, will tend to yield smallerconfidence regions.Finally, we note that the use of the general approach will
commonly become computationally infeasible as thedimensionality of the model increases. We are currentlyinvestigating the use of our parameter estimation techniquefor such models, employing the multi-dimensional diffu-sion approximation. In such situations it is also frequentlythe case that we can only observe one (or more generally asubset) of these dimensions. Our procedure may be used insuch situations. However, obviously as more parametersare required to be estimated, the effectiveness of ourprocedure will be reduced. We are currently investigatingthis type of data, along with the extension of our techniqueto handle data from transient dynamics, that is insituations where our assumption of stationarity is notvalid.
8. Summary
We have presented two approaches to estimating theparameters of continuous-time Markov chains, thusincreasing their utility in modelling real biological systems.The first approach is applicable when both the parameterspace and the maximum population size are not too large.The second approach can be used whenever the rates of theoriginal Markov chain have a particular, but quite generalform, and is applicable in many situations where thegeneral approach is infeasible. We illustrated its versatility
ARTICLE IN PRESSJ.V. Ross et al. / Theoretical Population Biology 70 (2006) 498–510508
by estimating the parameters of the stochastic logisticmodel from simulated data. We also fitted this model to a,now extinct, population of the Bay checkerspot butterfly(E. editha bayensis) and used the fitted model to predictretrospectively the extinction time of the population. Thiswas found to be close to the witnessed time of extinction.The new approach to parameter estimation is applicable toa wide class of models, in particular those appropriate formodelling many biological systems. For this reason, weanticipate that our methods will open up new opportunitiesfor implementing continuous-time Markov chains asmodels for real populations, and for performing subse-quent population viability analyses using these models.
Acknowledgments
The authors thank Ben Cairns and Dirk Kroese forvaluable discussions. The support of the AustralianResearch Council Centre of Excellence for Mathematicsand Statistics of Complex Systems is gratefully acknowl-edged.
Appendix A. Density dependence
In a series of papers, Kurtz (1970, 1971) and Barbour(1974, 1976, 1980a,b) established a suite of diffusionapproximation techniques for density-dependent Markovchains. We summarise their main results here, but in theslightly more general context of asymptotic density-dependent processes outlined in Pollett (1990, 2001).
Suppose that we are given a family fmN ð�Þg ofcontinuous-time Markov chains indexed by N40, wheremNð�Þ takes values in SN , a subset of Zk, and has transitionrates QN ¼ ðqN ðm; nÞ; m; n 2 SNÞ.
Definition 2. Suppose that there is a subset E of Rk (k-dimensional Euclidean space) and a family ff N ;N40g ofcontinuous functions, with f N : E � Zk ! R, such that
qNðm;mþ lÞ ¼ Nf N
m
N; l
� �; la0.
Then, the family of Markov chains is asymptoticallydensity dependent if, additionally, there exists a functionF : E! R such that fF Ng, given by FN ðxÞ ¼
Pl lf N ðx; lÞ,
x 2 E, converges (pointwise) to F on E. It is called simplydensity dependent if f N , and hence F N , are the same forall N.
Now define a ‘density process’ X Nð�Þ byX N ðtÞ ¼ mNðtÞ=N, tX0. The following functional law oflarge numbers establishes convergence of the familyfX N ð�Þg to the unique trajectory of an appropriateapproximating deterministic model.
Theorem 3. Suppose that f Nð�; lÞ is bounded for each l and
N, that F is Lipschitz continuous on E and that fF Ng
converges uniformly to F on E. Then, if limN!1X Nð0Þ ¼ x0,the density process X N ð�Þ converges uniformly in probability
on ½0; t� to xð�Þ, the unique (deterministic) trajectory
satisfying xð0Þ ¼ x0 and
d
dsxðsÞ ¼ F ðxðsÞÞ; xðsÞ 2 E; s 2 ½0; t�. (8)
The following functional central limit law establishesthat, for large N, the fluctuations about the deterministictrajectory follow a Gaussian diffusion, provided that mild‘second-order’ conditions are satisfied.
Theorem 4. Suppose f Nð�; lÞ is bounded for each l and N,that F is Lipschitz continuous and has uniformly continuous
first derivative on E, and that
limN!1
supx2E
ffiffiffiffiffiNpjF NðxÞ � F ðxÞj ¼ 0.
Suppose also that the sequence fGNg, where
GN ðxÞ ¼X
l
l2f N ðx; lÞ; x 2 E,
converges uniformly to G, where G is uniformly continuous
on E. Let x0 2 E and let xð�Þ be the unique trajectory
satisfying xð0Þ ¼ x0 and (8). Then, if
limN!1
ffiffiffiffiffiNpðX N ð0Þ � x0Þ ¼ z, (9)
the family of processes fZNð�Þg, defined by
ZN ðsÞ ¼ffiffiffiffiffiNpðX N ðsÞ � xðsÞÞ; 0pspt,
converges weakly in D½0; t� (the space of right-continuous,left-hand limit functions on ½0; t�) to a Gaussian diffusion Zð�Þ
with initial value Zð0Þ ¼ z and with mean and variance given
by ms :¼EðZðsÞÞ ¼Msz, where Ms ¼ expðR s
0 Bu duÞ and
Bs ¼ F 0ðxðsÞÞ, and VarðZðsÞÞ ¼ s2s , where
s2s ¼M2s
Z s
0
M�2u GðxðuÞÞ du.
The functional central limit theorem tells us that, forlarge N, the scaled density process ZNð�Þ can be approxi-mated over finite time intervals by the Gaussian diffusionZð�Þ. In particular, for all s40, X NðsÞ has an approximatenormal distribution with VarðX NðsÞÞ ’ s2s=N. We wouldusually take x0 ¼ X Nð0Þ, thus giving EðX NðsÞÞ ’ xðsÞ.
In the important special case where the initial point x0 ofthe deterministic trajectory is chosen as an equilibriumpoint of (8), we can be far more precise about theapproximating diffusion. If we set x0 ¼ xeq in Theorem 4,where xeq satisfies F ðxeqÞ ¼ 0, then, under the conditions ofthat theorem, we arrive at the following corollary:
Corollary 5. If xeq satisfies F ðxeqÞ ¼ 0 then, under the
conditions of Theorem 4, the family fZN ð�Þg, defined by
ZN ðsÞ ¼ffiffiffiffiffiNpðX N ðsÞ � xeqÞ, 0pspt, converges weakly in
D½0; t� to an OU process Zð�Þ with initial value Zð0Þ ¼ z,local drift B ¼ F 0ðxeqÞ and local variance V ¼ GðxeqÞ. In
particular, ZðsÞ is normally distributed with mean ms ¼ eBsz
and variance s2s ¼ ðV=ð2BÞÞðe2Bs � 1Þ.
We conclude that, for N large, X NðsÞ has an approximatenormal distribution with VarðX NðsÞÞ ’ s2s=N. A ‘working
ARTICLE IN PRESSJ.V. Ross et al. / Theoretical Population Biology 70 (2006) 498–510 509
approximation’ for the mean (that is, for a fixed value of N)is obtain by setting z equal to
ffiffiffiffiffiNpðX Nð0Þ � xeqÞ (refer to
Eq. (9)). We get
EðX N ðsÞÞ ’ xeq þ eBsðX N ð0Þ � xeqÞ.
Note that this corollary implies that the distribution of a(finite) sequence of observations of our scaled densityprocess, that is ðZN ðt1Þ; . . . ;ZNðtnÞÞ with ð0pÞ t1o � � �otn,converges to a multivariate normal distribution. Conse-quently, we may approximate the joint distribution of a(finite) sequence of observations of our density processX N ð�Þ by a multivariate normal distribution.
In the context of population models xeq will usually beasymptotically stable, that is Bo0. However, it should beemphasised that it need not be for each of the aboveconclusions to hold. Indeed, the OU approximation isoften very accurate in describing the fluctuations aboutcentres and unstable equilibria (see Barbour, 1976). In theasymptotically stable case the OU process Zð�Þ hasstationary mean 0 and variance s2 ¼ V=ð�2BÞ, and hence(approximately, for large N) X N ðsÞ�Normalðxeq;s2=NÞ.However, while it is intuitively reasonable to imagine thatthe long-term behaviour of the density process X Nð�Þ will beapproximated by that of the diffusion, a precise statementcannot be deduced immediately from the theorems statedabove, for the behaviour (as t!1) depends on the wholeprocess, rather than its behaviour over finite time intervals.
Appendix B. The OU (log-)likelihood
In this section we summarise results from Rybicki (1994)that allow us to invert explicitly, and calculate thedeterminant of, the time-dependent covariance matrix C
of the OU process. This permits us to write the (log)likeli-hood explicitly, and consequently note that only OðnÞ
operations are required to evaluate it.Write the likelihood function (7) as
f ðxÞ ¼1ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffið2pÞnjCj
p exp �1
2yC�1y0
� �,
with y ¼ x�m. It turns out (Rybicki and Press, 1995) that
yC�1y0 ¼N
s2Xn
i¼1
ðyi � ri�1yi�1Þ2
1� r2i�1,
where s2 ¼ V=ð�2BÞ and
detðCÞ ¼s2
N
� �n Yn�1i¼1
ð1� r2i Þ,
where
rk ¼0 if k ¼ 0;
exp½Bðtkþ1 � tkÞ� if 1pkpn� 1:
(
This allows us to write
f ðxÞ ¼2ps2
N
� ��n=2 Yn�1i¼1
1� r2i
!�1=2
� exp �N
2s2Xn
i¼1
ðyi � ri�1yi�1Þ2
1� r2i�1
" #, ð10Þ
and
lðxÞ ¼ �n
2log
2ps2
N
� ��
1
2
Xn�1i¼1
logð1� r2i Þ
�N
2s2Xn
i¼1
ðyi � ri�1yi�1Þ2
1� r2i�1. ð11Þ
We emphasise that the above formulæ hold in general,and should be used to evaluate the (log-)likelihood(instead of inverting C and calculating its determinantnumerically).
Appendix C. The CE method
We describe here the algorithm used to maximise the(log-)likelihood (for further details and applications of theCE method, see Rubinstein and Kroese, 2004). Thealgorithm is used with our simpler approach based on theOU approximation. However, a slight change in para-meters, coupled with the (numerical) computation of thetrue likelihood (say using EXPOKIT) instead of thelikelihood in Step 3, will result in an algorithm formaximising the likelihood in the general approach outlinedin Section 3.
Algorithm.
1.
Set Ns ¼ 50; 000, Ne ¼ 100, e ¼ 10�7, a ¼ 1000,maxits ¼ 5000, t ¼ 0.2.
Draw Ns pairs ðli;miÞ uniformly from the triangle(0,0)–ða; 0Þ–ða; aÞ and transform them into ða;rÞ pairs.If N is unknown, draw Ns samples Ni uniformly from½maxðxÞ; 10; 000�.3.
Set t ¼ tþ 1. If N is known, calculate li ¼ lðx; li; miÞ foreach i, and if N is unknown, calculate li ¼ f ðx; li;mi;NiÞfor each i.
4. Locate a set E of Ne indices i for which lkXlj for allk 2 E, and jeE.
5. Calculate mr, sr, ma, sa, (mN , sN) as the means andstandard deviations of rk, ak (and Nk), where k 2 E.
6. Draw Ns pairs (or triples) of unknown parameters fromindependent normal distributions with means andstandard deviations calculated in the previous step.
7.
If the largest s is greater than e and tomaxits thenreturn to Step 3; otherwise output bl ¼ ak=ð1� rkÞ, bm ¼akrk=ð1� rkÞ (and bN ¼ Nk), where k is an index suchthat lkXlj for all j.ARTICLE IN PRESSJ.V. Ross et al. / Theoretical Population Biology 70 (2006) 498–510510
References
Anderson, W., 1991. Continuous-time Markov Chains: An Applications-
Oriented Approach. Springer, New York.
Bailey, N.T.J., 1957. The Mathematical Theory of Epidemics. Griffin,
London.
Barbour, A.D., 1974. On a functional central limit theorem for Markov
population processes. Adv. Appl. Probab. 6, 21–39.
Barbour, A.D., 1976. Quasi-stationary distributions in Markov popula-
tion processes. Adv. Appl. Probab. 8, 296–314.
Barbour, A.D., 1980a. Equilibrium distributions Markov population
processes. Adv. Appl. Probab. 12, 591–614.
Barbour, A.D., 1980b. Density-dependent Markov population processes.
In: Jager, W., Rost, H., Tautu, P. (Eds.), Biological Growth and
Spread, Lecture Notes in Biomathematics, vol. 38. Springer, Berlin,
pp. 36–49.
Barbour, A.D., Pugliese, A., 2004. Asymptotic behaviour of a metapo-
pulation model. J. Math. Biol. 49, 468–500.
Barbour, A.D., Pugliese, A., 2005. Asymptotic behavior of a metapopula-
tion model. Ann. Appl. Probab. 15, 1306–1338.
Bartholomew, D.J., 1976. Continuous time diffusion models with random
duration of interest. J. Math. Sociol. 4, 187–199.
Bartlett, M.S., 1949. Some evolutionary stochastic processes. J. Roy.
Statist. Soc. Ser. B 11, 211–229.
Bartlett, M.S., 1956. Deterministic and stochastic models for recurrent
epidemics. Proceedings of the Third Berkeley Symposium on Mathe-
matics Statistical Probability, vol. 4, pp. 81–109.
Clancy, D., 2005. A stochastic SIS infection model incorporating indirect
transmission. J. Appl. Probab. 42, 726–737.
Clancy, D., O’Neill, P.D., Pollett, P.K., 2001. Approximations for the
long-term behaviour of an open-population epidemic model. Metho-
dol. Comput. Appl. Probab. 3, 75–95.
Daley, D.J., Kendall, D.G., 1965. Stochastic rumours. J. Inst. Math. Appl.
1, 42–55.
Diekmann, O., Heesterbeek, J.A.P., Metz, J.A.J., 1990. On the definition
and the computation of the basic reproduction ratio R0 in models for
infectious diseases in heterogenous populations. J. Math. Biol. 28,
365–382.
Fan, H., Soderstrom, T., Mossberg, M., Carlsson, B., Zou, Y., 1999.
Estimation of continuous-time AR process parameters from discrete-
time data. IEE Trans. Signal Process. 47, 1232–1244.
Feigin, P.D., 1976. Maximum likelihood estimation for continuous-time
stochastic processes. Adv. Appl. Probab. 8, 712–736.
Feller, W., 1939. Die grundlagen der volterraschen theorie des kampfes
ums dasein in wahrscheinlichkeitsteoretischer behandlung. Acta
Biotheoret. 5, 11–40.
Foley, P., 1994. Predicting extinction times from environmental stochas-
ticity and carrying capacity. Cons. Biol. 8, 124–137.
Franco, J.C.G., 2003. Maximum likelihood estimation of mean reverting
processes: hwww.investmentscience.com/Content/howtoArticles/
MLE_for_OR_mean_reverting.pdfi (downloaded on 16th May 2006).
Gibbens, R.J., Hunt, P.J., Kelly, F.P., 1990. Bistability in communications
networks. In: Grimmett, G.R., Welsh, D.J.A. (Eds.), Disorder in
Physical Systems. Oxford University Press, Oxford, pp. 113–127.
Gillespie, D.T., 2000. The chemical Langevin equation. J. Chem. Phys.
113, 297–306.
Harrison, S., Quinn, J.F., Baughman, J.F., Murphy, D.D., Ehrlich, P.R.,
1991. Estimating the effects of scientific study on two butterfly
populations. Amer. Nat. 137, 227–243.
Kingman, J.F.C., 1969. Markov population processes. J. Appl. Probab. 6,
1–18.
Kurtz, T., 1970. Solutions of ordinary differential equations as limits of
pure jump Markov processes. J. Appl. Probab. 7, 49–58.
Kurtz, T., 1971. Limit theorems for sequences of jump Markov processes
approximating ordinary differential processes. J. Appl. Probab. 8,
344–356.
Kurtz, T.G., 1973. The relationship between stochastic and deterministic
models in chemical reactions. J. Chem. Phys. 57, 2976–2978.
Levins, R., 1969. Some demographic and genetic consequences of
environmental heterogeneity for biological control. Bull. Entom. Soc.
Amer. 15, 237–240.
Mangel, M., Tier, C., 1994. Four facts every conservation biologist should
know about persistence. Ecology 75, 607–614.
McLaughlin, J.F., Hellmann, J.J., Boggs, C.L., Ehrlich, P.R., 2002.
Climate change hastens population extinctions. Proc. Nat. Acad. Sci.
99, 6070–6074.
McNeil, D.R., Weiss, G.H., 1977. A large population approach to
estimation of parameters in Markov population models. Biometrika
64, 553–558.
Mode, C.J., Sleeman, C.K., 2000. Stochastic Processes in Epidemiology,
HIV/AIDS, Other Infectious Diseases and Computers. World
Scientific, Singapore.
Nasell, I., 2002. Stochastic models of some endemic infections. Math.
Biosci. 179, 1–19.
Norris, J., 1997. Markov Chains. Cambridge University Press, Cam-
bridge.
Oppenheim, I., Shuler, K.E., Weiss, G.H., 1977. Stochastic theory of
nonlinear rate processes with multiple stationary states. Physica 88A,
191–214.
Pollett, P.K., 1990. On a model for interference between searching insect
parasites. J. Austral. Math. Soc. Ser. B 32, 133–150.
Pollett, P.K., 1992. Diffusion approximations for a circuit switching
network with random alternative routing. Austral. Telecom. Res. 25,
45–52.
Pollett, P.K., 2001. Diffusion approximations for ecological models. In:
Ghassemi, F. (Ed.), Proceedings of the International Congress on
Modelling and Simulation, vol. 2, Modelling and Simulation Society of
Australia and New Zealand, Canberra, Australia, pp. 843–848.
Pollett, P.K., Vassallo, A., 1992. Diffusion approximations for some
simple chemical reaction schemes. Adv. Appl. Probab. 24,
875–893.
Reuter, G.E.H., 1961. Competition processes. Proceedings of the Fourth
Berkeley Symposium on Mathematics Statistical Probability, vol. 2,
pp. 421–430.
Ross, J.V., 2006a. A stochastic metapopulation model accounting for
habitat dynamics. J. Math. Biol. 52, 788–806.
Ross, J.V., 2006b. Stochastic models for mainland-island metapopula-
tions in static and dynamic landscapes. Bull. Math. Biol. 68,
417–449.
Rubinstein, R.Y., Kroese, D.P., 2004. The Cross-Entropy Method: A
Unified Approach to Combinatorial Optimization, Monte-Carlo
Simulation, and Machine Learning. Springer, New York.
Rybicki, G.B., 1994. Unpublished notes: notes on Gaussian random
functions with exponential correlation functions (Ornstein–Uhlenbeck
process). From hhttp://www.lanl.gov/DLDSTP/fast/i.
Rybicki, G.B., Press, W.H., 1995. A class of fast methods for processing
irregularly sampled or otherwise inhomogeneous one-dimensional
data. Phys. Rev. Lett. 74, 1060–1063.
Sidje, R.B., 1998. EXPOKIT. A software package for computing matrix
exponentials. ACM Trans. Math. Software 24, 130–156.
Verhulst, P.F., 1838. Notice sur la loi que la population suit dans son
accroisement. Corr. Math. Phys. X, 113–121.
Weiss, G.H., Dishon, M., 1971. On the asymptotic behaviour of the
stochastic and deterministic models of an epidemic. Math. Biosci. 11,
261–265.