on parameter estimation in population models



ARTICLE IN PRESS

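0040-5809/$ - see front matter © 2006 Elsevier Inc. All rights reserved.

doi:10.1016/j.tpb.2006.08.001

*Corresponding author. E-mail addresses: [email protected] (J.V. Ross), [email protected] (T. Taimre), [email protected] (P.K. Pollett).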

Theoretical Population Biology 70 (2006) 498–510

www.elsevier.com/locate/tpb

On parameter estimation in population models

J.V. Ross*, T. Taimre, P.K. Pollett

Department of Mathematics, University of Queensland, Qld. 4072, Australia

Received 4 January 2006

Available online 12 August 2006

Abstract

We describe methods for estimating the parameters of Markovian population processes in continuous time, thus increasing their utility in modelling real biological systems. A general approach, applicable to any finite-state continuous-time Markovian model, is presented, and this is specialised to a computationally more efficient method applicable to a class of models called density-dependent Markov population processes. We illustrate the versatility of both approaches by estimating the parameters of the stochastic SIS logistic model from simulated data. This model is also fitted to data from a population of Bay checkerspot butterfly (Euphydryas editha bayensis), allowing us to assess the viability of this population.

© 2006 Elsevier Inc. All rights reserved.

Keywords: Markov chains; Cross-entropy method; Density dependence; Euphydryas editha bayensis; Stochastic SIS logistic model

1. Introduction

Continuous-time Markov chains have been proposed as theoretical models for an array of biological systems. For example, they have been used to describe the evolution of populations (Bartlett, 1949), the spread of epidemics (Bailey, 1957; Bartlett, 1956) and rumours (Daley and Kendall, 1965), and competition between species (Reuter, 1961). Whilst they are ubiquitous in the applied probability literature, their application has been limited, partly due to a lack of clear statistical procedures for fitting the models. This is most evident in the ecological literature, where discrete-time Markov chains predominate, perhaps due in part to a misconception that a discrete-time model is needed if data are collected at discrete time points (say at the end of each breeding season). We address the statistical limitations by providing straightforward methods for parameter estimation, so enabling this rich family of models to be used in modelling real biological systems.

We begin by reviewing the general approach to parameter estimation in the context of continuous-time Markov chains, when the process is observed at successive (but not necessarily equally spaced) time points (for example, sets of abundance data collected at various times). This should be contrasted with the case where the process is observed continuously; in this situation different techniques apply (see for example Section 4 of Feigin, 1976). We then describe an approach which is simpler in terms of computational implementation, and which is suitable for parameter estimation in a wide class of population models called (asymptotically) density-dependent$^1$ Markov processes (Kurtz, 1970). Density-dependent processes have appeared in a variety of biological and physical contexts: chemical kinetics (Kurtz, 1973; Pollett and Vassallo, 1992; Gillespie, 2000), epidemics (Nåsell, 2002; Clancy, 2005; Clancy et al., 2001), metapopulations (Barbour and Pugliese, 2004, 2005; Ross, 2006a,b), parasitology (Pollett, 1990) and telecommunications (Gibbens et al., 1990; Pollett, 1992). Whatever the context, our approach can be used, but we focus here on ecological models. The method is based on an approximation to the full likelihood of the process, resulting in a substantial decrease in computational complexity. This is achieved using some remarkable results of Kurtz (1970, 1971) and Barbour (1974, 1976, 1980a,b), which provide (i) a natural way of identifying an approximating deterministic model that more closely approximates the original model as the size of the system increases, and (ii) a Gaussian diffusion characterizing fluctuations about the unique trajectory of the deterministic model. If the population process starts close to a deterministic equilibrium, this diffusion is an Ornstein–Uhlenbeck (OU) process, and the likelihood function for observations at successive time points has a particularly simple form. It is this likelihood we exploit for our parameter estimation procedure.

$^1$ In its common usage in ecology, ‘density dependence’ refers to a population whose per capita rate of change is a function of population size. Its use here is a more literal ‘dependence on the population density’ of a specific form (see Definition 2 in Appendix A).

We illustrate both techniques by fitting the stochastic SIS logistic model to simulated data. We also fit this model to data from a population of Bay checkerspot butterfly (Euphydryas editha bayensis), and use the fitted model to assess the viability of the population.

The remainder of the paper is organised as follows. In the next section we define Markov population processes. This is followed in Section 3 with an account of a general approach to estimating the parameters of continuous-time Markov chains with bounded transition rates. Section 4 summarises properties of density-dependent models, which are illustrated with reference to the stochastic logistic model. Several properties of density-dependent models are exploited in Section 5, where our computationally simpler method is described. In Section 6 we illustrate both approaches to parameter estimation using the stochastic logistic model, fitting it to both simulated and real data. We also investigate the bias of our estimates and provide error bounds for them in Section 6. A general discussion of our numerical results is given in Section 7, and this is followed by a summary of the paper in Section 8.

2. Markov population processes

Markov population processes have been used extensively to model populations consisting of individuals of different categories or types, or where the individuals occupy different sites. We represent the state at time $t$ by a vector $m(t) = (m_1(t), m_2(t), \dots, m_k(t))$, where $m_i(t)$ is the number of individuals at the $i$th site at time $t$, and we assume that $(m(t),\ t \ge 0)$ is a continuous-time Markov chain whose state space $S$ is contained in $\mathbb{Z}^k$ (the $k$-dimensional integer lattice). The evolution of such a Markov chain is governed by transition rates $Q = (q(m,n);\ m, n \in S)$, with $q(m,n)$ representing the rate of transition from state $m$ to state $n$, for $n \ne m$, and $q(m,m) = -q(m)$, where $q(m) := \sum_{n \ne m} q(m,n)$ $(< \infty)$ is the total rate out of state $m$. The model is specified by writing down $Q$. The jumps of a Markov population process can be of three kinds: the arrival of a new individual (birth or immigration), the departure of an existing individual from the system (death or emigration), or the transfer of an individual from one site to another (migration or predator–prey competition). Accordingly, we use the following definition (Kingman, 1969):

Definition 1. A Markov chain on a subset $S$ of $\mathbb{Z}^k$ will be called a Markov population process if, for any $m$, the only values of $n$ for which $q(m,n)$ is non-zero are those with

$$
\begin{aligned}
& n_i = m_i + 1,\ n_j = m_j\ (j \ne i) && \text{(arrival at } i\text{)},\\
& n_i = m_i - 1,\ n_j = m_j\ (j \ne i) && \text{(departure from } i\text{)},\\
& n_i = m_i - 1,\ n_j = m_j + 1,\ n_k = m_k\ (k \ne i, j) && \text{(transfer from } i \text{ to } j\text{)}.
\end{aligned}
$$

Thus, Markov population processes encompass birth–death processes, predator–prey systems, metapopulations and many other population processes. Our second approach to parameter estimation, to be outlined in Section 5, works best for Markov population processes. We note that the approximation can be used for processes with larger than unit changes in population size, but the method will become less accurate in such situations and for large changes in population size it will be rendered ineffective. Consequently, our method of parameter estimation cannot be used for populations that experience, for example, catastrophes.

We will assume that $Q$ is regular, so that there is a unique transition function $P(t)$ with entries $p_{ij}(t)$ corresponding to the probability that the process moves from state $i$ to state $j$ in time $t$. Regularity is guaranteed if $Q$ is bounded in the sense that $q(m) \le b$, for some constant $b$, a condition that is trivially satisfied when there are finitely many states. In this case we may write $P(t) = \exp(Qt)$, where $\exp$ is the matrix exponential. (For further details, see Anderson, 1991; Norris, 1997.) In most cases the transition function cannot be evaluated explicitly. However, the behaviour of the process can often be investigated by working directly with the transition rates or, for more complex processes, by way of numerical computation procedures, or analytical approximations such as the diffusion approximation adopted here.
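As a small illustration of evaluating $P(t) = \exp(Qt)$ numerically when $Q$ is bounded, the sketch below uses uniformisation, a standard technique chosen here for self-containment; it is not the routine used in the paper, and the function name `transition_matrix` and the two-state example rates are assumptions made for this illustration.

```python
import math

def transition_matrix(Q, t, tol=1e-12, max_terms=10_000):
    """Approximate P(t) = exp(Qt) for a bounded rate matrix Q (list of rows)."""
    n = len(Q)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    bound = max(-Q[i][i] for i in range(n))   # constant b with q(m) <= b
    if bound == 0.0 or t == 0.0:
        return identity
    # A = I + Q/b is a stochastic matrix because every exit rate is <= b.
    A = [[identity[i][j] + Q[i][j] / bound for j in range(n)] for i in range(n)]
    weight = math.exp(-bound * t)             # Poisson(bt) probability of 0 jumps
    power = identity                          # A**0
    P = [[weight * power[i][j] for j in range(n)] for i in range(n)]
    total = weight
    for k in range(1, max_terms + 1):
        if 1.0 - total <= tol:                # remaining Poisson mass is negligible
            break
        power = [[sum(power[i][l] * A[l][j] for l in range(n)) for j in range(n)]
                 for i in range(n)]
        weight *= bound * t / k               # Poisson(bt) probability of k jumps
        total += weight
        for i in range(n):
            for j in range(n):
                P[i][j] += weight * power[i][j]
    return P

# Hypothetical two-state chain: rate 0 -> 1 is `up`, rate 1 -> 0 is `down`.
up, down = 0.3, 0.7
Q = [[-up, up], [down, -down]]
P = transition_matrix(Q, 2.0)
```

For this two-state chain the result can be checked against the closed form $p_{00}(t) = \frac{\text{down}}{\text{up}+\text{down}} + \frac{\text{up}}{\text{up}+\text{down}} e^{-(\text{up}+\text{down})t}$. The plain series underflows when $bt$ is very large, so this sketch is only suitable for small examples.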

3. A general approach to parameter estimation

The general approach to parameter estimation, which makes use of a numerical procedure for computing the matrix exponential, is straightforward (see for example Mode and Sleeman, 2000 for a general discussion), but apparently not widely known in the ecology literature, perhaps due in part to the infeasibility of the method when the population ceiling is large.

We will suppose that there is a parameter (or vector of parameters) $\theta$, contained in some parameter space $\Theta$, that must be estimated. We will allow the dependence on $\theta$ to be made explicit in our notation by writing $Q(\theta)$ for the transition rates and $P(\theta, t) = \exp(Q(\theta)t) = (p_{ij}(\theta, t);\ i, j \in S)$ for the transition function. We will also write $p_i(\theta, t)$ for the probability that the process is in state $i$ at time $t$; $p(\theta, t) = (p_i(\theta, t);\ i \in S)$ is usually taken to be the stationary or the quasi-stationary distribution. Given a set of $n$ observations $i_k = m(t_k)$ $(k = 1, \dots, n)$ of the state of the process at times $(0 \le)\ t_1 < \dots < t_n$, the likelihood of observing them is

$$L(\theta) = p_{i_1}(\theta, t_1) \prod_{k=2}^{n} p_{i_{k-1}, i_k}(\theta, t_k - t_{k-1}). \tag{1}$$

We may then calculate the maximum likelihood estimator (MLE) $\hat{\theta}$ of the parameters $\theta$ by maximising the likelihood (1) over $\Theta$ for the given observations. As noted previously, the transition function cannot usually be evaluated explicitly, but progress can be made by computing the transition probabilities numerically. This, combined with a numerical search algorithm over the parameter space $\Theta$, allows us to compute the desired MLE.

To compute the required matrix exponentials, we use EXPOKIT, a software package whose algorithms make use of Krylov subspace methods (Sidje, 1998). Any one of a range of numerical optimisation techniques can be used to maximise (1). We use the Cross-Entropy (CE) Method (Rubinstein and Kroese, 2004), which has proved to be particularly effective for maximising the likelihood functions that we consider. We will see that this combination provides a useful tool for fitting continuous-time Markov chains to real systems, provided that the parameter space and the maximum population size are not too large. We implement this approach in Section 6.
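The structure of the likelihood (1) can be seen on a toy example. The sketch below assumes a hypothetical two-state chain whose transition probabilities are known in closed form, so no matrix exponential routine is needed; the observation sequence, the rate names `a` and `b`, and the crude grid search (standing in for the CE method) are all illustrative assumptions, not the paper's setup.

```python
import math

def two_state_p(i, j, a, b, t):
    """Closed-form p_ij(t) for a two-state chain with rates 0->1 = a, 1->0 = b."""
    s = a + b
    stationary = (b / s, a / s)
    delta = 1.0 if i == j else 0.0
    return stationary[j] + (delta - stationary[j]) * math.exp(-s * t)

def log_likelihood(theta, states, times):
    """Logarithm of (1): an initial term plus one term per successive pair."""
    a, b = theta
    stationary = (b / (a + b), a / (a + b))
    ll = math.log(stationary[states[0]])      # stationary initial law (assumed)
    for k in range(1, len(states)):
        dt = times[k] - times[k - 1]          # spacing need not be equal
        ll += math.log(two_state_p(states[k - 1], states[k], a, b, dt))
    return ll

# Hypothetical observations at unequally spaced times.
states = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0]
times = [0.0, 1.0, 2.5, 3.0, 4.0, 6.0, 6.5, 8.0, 9.0, 10.0]

# Crude grid search over (a, b) in place of a proper optimiser.
grid = [0.1 * k for k in range(1, 31)]
best = max(((a, b) for a in grid for b in grid),
           key=lambda theta: log_likelihood(theta, states, times))
```

For a population model, `two_state_p` would be replaced by entries of a numerically computed $P(\theta, t)$, which is where the cost of the general approach arises.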

4. Density-dependent models

Our second approach, to be outlined in Section 5, is suitable for parameter estimation in a wide class of stochastic population models known as density-dependent models, which we shall describe and illustrate here.

Imagine now that our population model is indexed by $N > 0$, which will usually be a parameter of the model. In any case $N$ will be related to the size of the system. For definiteness we may think of $N$ as being the population ceiling (maximum population size). Thus, we have a family $\{m_N(\cdot)\}$ of Markov chains indexed by $N$, where $m_N(\cdot)$ takes values in $S_N$ (contained in $\mathbb{Z}^k$) and has transition rates $Q_N = (q_N(m,n);\ m, n \in S_N)$. Our model is then called density-dependent (Kurtz, 1970) if the transition rates take the form

$$q_N(m, m+l) = N f\!\left(\frac{m}{N}, l\right), \qquad l \ne 0, \tag{2}$$

for a suitable function $f$, so that the transition rates of the corresponding ‘density process’ $X_N(\cdot)$, defined by $X_N(t) = m_N(t)/N$, $t \ge 0$, depend on the present state $m$ only through the density $m/N$. A formal and slightly more general definition, where property (2) is realised asymptotically (for large $N$), is given in Appendix A (Definition 2).

To illustrate the idea, we will consider a version of Feller’s (1939) stochastic logistic model, which allows one to model the dynamics of a population inhabiting a finite region that can support a maximum of $N$ individuals, and accounts for density dependence (in the ecological sense noted in Section 1) in both the birth and death transition rates. The version we shall consider is the stochastic SIS (Susceptible–Infective–Susceptible) logistic model, which was one of the earliest stochastic models for the spread of infections that do not confer any long lasting immunity, and where individuals become susceptible again after infection (Weiss and Dishon, 1971). It provides the simplest population model that incorporates ecological density dependence and individual-level demography. It has been applied, not only in epidemiology, but also in the propagation of rumours (Bartholomew, 1976), in chemical reaction kinetics (Oppenheim et al., 1977), and in metapopulation ecology (Levins, 1969). It is a continuous-time Markov chain $(m(t),\ t \ge 0)$ taking values in $S = \{0, 1, \dots, N\}$ with non-zero transition rates

$$q(m, m+1) = \lambda \frac{m}{N}(N - m) \qquad (m = 1, 2, \dots, N-1),$$

$$q(m, m-1) = \mu m \qquad (m = 1, 2, \dots, N),$$

where $\lambda$ and $\mu$ are both positive. We can see immediately that the model is density dependent with $f(x, +1) = \lambda x(1 - x)$ and $f(x, -1) = \mu x$, being the rate of increase and the rate of decrease, respectively, when the population density is $x$. Furthermore, if we let $\Delta X = X_N(t + \Delta t) - X_N(t)$, then $E(\Delta X \mid X_N(t) = x) \simeq F(x)\,\Delta t$, where $F(x) = \lambda x(1 - x) - \mu x$. Consequently, a natural deterministic model emerges for the population density:

$$\frac{dx}{dt} = F(x) \quad \text{where} \quad F(x) = \lambda x(1 - x) - \mu x. \tag{3}$$

This is of course the classical Verhulst model (Verhulst, 1838) (which has been reinvented several times since it first appeared in 1838). Indeed, for any density-dependent model we can identify a deterministic analogue:

$$\frac{dx}{dt} = F(x) \quad \text{where} \quad F(x) = \sum_{l} l\, f(x, l). \tag{4}$$
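The deterministic analogue (4) is straightforward to evaluate once the function $f$ is written down. The following sketch represents $f$ as a dictionary from jump sizes to rate functions (a representation assumed here for illustration), builds $F$ for the SIS logistic model with the illustrative values $\lambda = 0.8$, $\mu = 0.4$, and integrates the differential equation with a simple Euler scheme.

```python
def drift(f, x):
    """F(x) = sum over jump sizes l of l * f(x, l), as in (4)."""
    return sum(l * rate(x) for l, rate in f.items())

lam, mu = 0.8, 0.4   # illustrative birth and death parameters

# SIS logistic model: f(x, +1) = lam * x * (1 - x), f(x, -1) = mu * x.
sis = {+1: lambda x: lam * x * (1.0 - x), -1: lambda x: mu * x}

def euler_path(f, x0, t_end, dt=1e-3):
    """Integrate dx/dt = F(x) from x(0) = x0 with fixed-step Euler updates."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        x += dt * drift(f, x)
    return x

# Starting from density 0.1, the trajectory approaches 1 - mu/lam = 0.5.
x_final = euler_path(sis, 0.1, 100.0)
```

The fixed-step Euler scheme is used only to keep the example short; any standard ODE solver would serve.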

Our intuition tells us that $X_N(\cdot)$ should behave more deterministically as $N$ becomes large and that indeed the deterministic trajectory should be ‘tracked’ by the process when $N$ is large. Kurtz (1970) established this rigorously by proving a functional law of large numbers (stated in its most general form as Theorem 3 of Appendix A) that holds under mild conditions which will apply in most biological contexts.

What makes this approximation procedure so powerful is our ability to model precisely the fluctuations of the density process $X_N(\cdot)$ about the deterministic trajectory. We will summarise results for the ‘equilibrium case’, where we model the fluctuations about a steady state $x_{\mathrm{eq}}$ of (4), that is, $F(x_{\mathrm{eq}}) = 0$, and consider the family $\{Z_N(\cdot)\}$ defined by $Z_N(t) = \sqrt{N}\,(X_N(t) - x_{\mathrm{eq}})$. Our intuition, based on the Central Limit Theorem, tells us that, for large $N$, $Z_N(t)$ should have an approximate normal (Gaussian) distribution (with zero mean and a variance that depends on $t$). This is indeed the case, but we can be far more precise by identifying an approximating process. Kurtz (1970) proved that our scaled density process $Z_N(\cdot)$ converges to an OU process $Z(\cdot)$ whose parameters can be determined simply from the original model. This is given in Appendix A with a formal statement of a functional central limit theorem (Corollary 5).

Much can be deduced from properties of the limiting process $Z(\cdot)$. For example, in the one-dimensional case, we may conclude that, for $N$ large, $X_N(t)$ has an approximate normal distribution with

$$E(X_N(t)) \simeq x_{\mathrm{eq}} + e^{Bt}\,(X_N(0) - x_{\mathrm{eq}}),$$

where $B = F'(x_{\mathrm{eq}})$ and $\mathrm{Var}(X_N(t)) \simeq \sigma_t^2 / N$, with

$$\sigma_t^2 = \frac{G(x_{\mathrm{eq}})}{2B}\left(e^{2Bt} - 1\right) \quad \text{where} \quad G(x) = \sum_{l} l^2 f(x, l).$$

We will consider only the case where $x_{\mathrm{eq}}$ is asymptotically stable, that is $B < 0$, so that, as $t \to \infty$, $E(Z(t)) \to 0$ and $\mathrm{Var}(Z(t)) \to \sigma^2 := V/(-2B)$, where $V = G(x_{\mathrm{eq}})$. Again, full details are given in Appendix A. The special covariance structure of the OU process will be revealed and exploited in the next section, where our computationally simpler parameter estimation method is described.

To illustrate the diffusion approximation technique, consider again the logistic model, but in the most interesting case $\lambda > \mu$, where there is drift away from the extinction state 0. Under this assumption, (3) has two equilibria, both in $[0, 1]$: 0 (unstable) and $x_{\mathrm{st}} = 1 - \rho$, where $\rho = \mu/\lambda$ (asymptotically stable). We note, in the context of epidemiology, $\rho$ is the equilibrium fraction of susceptible individuals and is also the inverse of the basic reproduction ratio $R_0$ (Diekmann et al., 1990). Furthermore, (3) has the unique solution

$$x(t) = \frac{x_{\mathrm{st}}\, x_0}{x_0 + (x_{\mathrm{st}} - x_0)\, e^{-\lambda x_{\mathrm{st}} t}}, \qquad t \ge 0, \tag{5}$$

where $x_0 = x(0)$ is the initial point of the trajectory.

It is possible to write down the full diffusion approximation for the population density $X_N(\cdot) = m(\cdot)/N$ about the deterministic path (5) by way of Theorem 4 of Appendix A, but we will not pursue this here. Rather, we will consider the OU approximation about the asymptotically stable equilibrium $x_{\mathrm{st}}$. Since $F'(x) = \lambda(x_{\mathrm{st}} - 2x)$, we have (local drift) $B = F'(x_{\mathrm{st}}) = -\lambda x_{\mathrm{st}}$ and, since $G(x) = F(x) + 2\mu x$, we have (local variance) $V = G(x_{\mathrm{st}}) = 2\mu x_{\mathrm{st}} = 2\lambda x_{\mathrm{st}} \rho$. We conclude that, for $N$ large, $X_N(t)$ has an approximate normal distribution with

$$E(X_N(t)) \simeq x_{\mathrm{st}} + e^{-\alpha t}\,(X_N(0) - x_{\mathrm{st}}), \qquad \mathrm{Var}(X_N(t)) \simeq \rho\,(1 - e^{-2\alpha t})/N,$$

where $\alpha = \lambda x_{\mathrm{st}} = \lambda - \mu = \lambda(1 - \rho)$ $(> 0)$. Hence, for all $t \ge 0$,

$$E(m(t)) \simeq N(1 - \rho) + e^{-\alpha t}\,(m(0) - N(1 - \rho)), \qquad \mathrm{Var}(m(t)) \simeq N\rho\,(1 - e^{-2\alpha t}).$$
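The closing formulas of this section translate directly into code. The sketch below evaluates the approximate mean and variance of $m(t)$ and the deterministic trajectory (5); the function names are assumptions for this illustration, and the parameter values $(N, \lambda, \mu) = (50, 0.8, 0.4)$ are simply those used for the simulated data in Section 6.

```python
import math

def sis_ou_moments(N, lam, mu, m0, t):
    """Approximate mean and variance of m(t) for the SIS logistic model (lam > mu)."""
    rho = mu / lam                 # rho = mu / lambda
    alpha = lam - mu               # alpha = lambda * (1 - rho)
    centre = N * (1.0 - rho)       # equilibrium population size N * x_st
    mean = centre + math.exp(-alpha * t) * (m0 - centre)
    var = N * rho * (1.0 - math.exp(-2.0 * alpha * t))
    return mean, var

def deterministic_density(lam, mu, x0, t):
    """Closed-form solution (5) of the deterministic model (3)."""
    x_st = 1.0 - mu / lam
    return x_st * x0 / (x0 + (x_st - x0) * math.exp(-lam * x_st * t))

# With N = 50, lam = 0.8, mu = 0.4 the long-run mean and variance both tend to 25.
mean_late, var_late = sis_ou_moments(50, 0.8, 0.4, m0=10, t=200.0)
x_late = deterministic_density(0.8, 0.4, x0=0.1, t=200.0)
```

At $t = 0$ the mean equals the starting value and the variance is zero, as expected for a process started from a known state.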

5. A computationally simpler approach

Our parameter estimation method uses the (stable) one-dimensional OU approximation outlined in the previous section. The use of diffusion approximations for parameter estimation has been considered in the statistical literature (see for example Feigin, 1976; McNeil and Weiss, 1977), but once again the application of the idea to real biological modelling appears to be limited. Recall that we are given a set of $n$ observations $(m(t_1), \dots, m(t_n))$ of the state of the process at times $(0 \le)\ t_1 < \dots < t_n$. If we denote the corresponding observations of the density process by $(X_N(t_1), \dots, X_N(t_n))$, then, when $N$ is large, this vector will have an approximate normal distribution, with a covariance structure that can be determined explicitly from properties of the limiting OU process.

If we start the OU process in equilibrium, that is, $Z(0) \sim \mathrm{Normal}(0, \sigma^2)$, where $\sigma^2 = V/(-2B)$, or equivalently (for large $N$) $X_N(0) \sim \mathrm{Normal}(x_{\mathrm{eq}}, \sigma^2/N)$, then it will be strongly stationary. Therefore, since $\mathrm{Cov}(Z(s), Z(s+t)) = \sigma^2 \exp(B|t|)$, we may approximate $\mathrm{Cov}(X_N(s), X_N(s+t))$ by

$$c(t) := c(0) \exp(B|t|), \tag{6}$$

where $c(0) = \sigma^2/N$. Hence, for large $N$, we know explicitly the correlation structure of the vector $(X_N(t_1), \dots, X_N(t_n))$, and hence its likelihood function:

$$f(x) = \frac{1}{\sqrt{(2\pi)^n |C|}} \exp\!\left(-\frac{1}{2}\,(x - \boldsymbol{\mu})\, C^{-1} (x - \boldsymbol{\mu})'\right), \tag{7}$$

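where $\boldsymbol{\mu} = (\mu_1, \mu_2, \dots, \mu_n)$, $\mu_i = x_{\mathrm{eq}}$ for all $i = 1, 2, \dots, n$, and

$$
C = \begin{pmatrix}
c_1 & c_{1,2} & c_{1,3} & \cdots & c_{1,n} \\
c_{1,2} & c_2 & c_{2,3} & \cdots & c_{2,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
c_{1,n} & c_{2,n} & c_{3,n} & \cdots & c_n
\end{pmatrix},
$$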

where $c_i = \sigma^2/N$ and $c_{i,i+s} = (\sigma^2/N) \exp(B(t_{i+s} - t_i))$. The inversion of $C$, and calculation of its determinant, can be done explicitly, and this permits us to write down the (log) likelihood explicitly (see Appendix B). The form that should be used in practice is given in (10) (or (11)).

We can therefore evaluate the (joint) MLEs for the parameters of the model, which are the values that maximise (7). Explicit calculation of the MLEs is not practical in most situations. Therefore, a numerical optimisation procedure will be required in most cases to find the parameters which maximise the likelihood function. The CE method (see Rubinstein and Kroese, 2004) is an ideal recent approach, but any one of several numerical optimisation procedures should be effective. We illustrate this approach in Section 6 by using the CE method to estimate the parameters of the stochastic logistic model using both simulated and real data.
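One way to evaluate the logarithm of (7) without forming or inverting $C$ is to use the fact that a stationary OU process sampled at discrete times is a Gaussian Markov sequence, so the joint density factorises into one-step conditional normals. The sketch below takes that route; it is a standard identity rather than a transcription of the paper's (10) or (11), which are not reproduced in this section, and the function name and example values are assumptions.

```python
import math

def ou_log_likelihood(x, times, x_eq, c0, B):
    """Log of (7) when Cov = c0 * exp(B * |t|) with B < 0 (stationary OU)."""
    def normal_logpdf(value, mean, var):
        return -0.5 * (math.log(2.0 * math.pi * var) + (value - mean) ** 2 / var)

    # First observation: stationary law Normal(x_eq, c0).
    ll = normal_logpdf(x[0], x_eq, c0)
    for k in range(1, len(x)):
        phi = math.exp(B * (times[k] - times[k - 1]))   # correlation over the gap
        cond_mean = x_eq + phi * (x[k - 1] - x_eq)      # pulled back towards x_eq
        cond_var = c0 * (1.0 - phi * phi)
        ll += normal_logpdf(x[k], cond_mean, cond_var)
    return ll

# Hypothetical densities observed at unequally spaced times.
example = ou_log_likelihood([0.45, 0.55, 0.52], [0.0, 1.0, 3.0],
                            x_eq=0.5, c0=0.01, B=-0.4)
```

The cost is linear in the number of observations, and unequal spacing enters only through the gaps `times[k] - times[k - 1]`.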

Remarks. (1) In many situations the maximum population size $N$ is unknown. It can be estimated using the approximation $N X_N(0) \sim \mathrm{Normal}(N x_{\mathrm{eq}}, N\sigma^2)$. The joint MLEs are then calculated with $N$ as an additional unknown parameter.

(2) The OU approximation is achieved by letting the maximum population size tend to infinity. Thus, the OU approximation, and hence the parameter estimation procedure itself, are best for large population sizes. This is precisely the situation in which the first method becomes infeasible.

(3) We emphasise that unequally spaced sampling of the process is no obstacle to the success of the method, as can be seen from the covariance structure (6).

Finally, we note that any approach to estimating the parameters of an OU process may be implemented. For example, one may use the theory of estimation for continuous-time autoregressive processes (Fan et al., 1999). See also Franco (2003) for a computationally efficient maximum likelihood method. Additionally, in the approach presented above, it is often possible to determine an equation for the optimal value of one parameter, conditional on knowing the other parameters. In such cases we may use this equation to reduce the dimensionality of our search by one, thus increasing the computational efficiency of the method.

6. The stochastic logistic model

In this section we illustrate the methods presented in Sections 3 and 5, by estimating the parameters of the stochastic logistic model from both simulated data and data for a population of Bay checkerspot butterfly (E. editha bayensis) (Harrison et al., 1991). We consider situations in which the maximum population size is known and unknown, where it is ‘small’ and where it is ‘large’, and situations where we have a reasonable sample of data, and others where we have minimal data.

The general approach: For this model, the parameter vector $\theta$ is $(\lambda, \mu)$, or $(\lambda, \mu, N)$, depending upon whether $N$ is known or unknown, respectively.

The computationally simpler approach: We assume that $X_N(0)$ is normally distributed with mean $1 - \mu/\lambda$ and variance $\mu/(N\lambda)$, so that the OU process is strongly stationary and, in particular,

$$\mathrm{Cov}(X_N(s), X_N(s+t)) \simeq c(t) = \frac{\mu}{N\lambda} \exp(-(\lambda - \mu)|t|).$$

Furthermore, the vector $(X_N(t_1), X_N(t_2), \dots, X_N(t_n))$ has the approximate Gaussian density (7) whose parameters have entries $\mu_i = 1 - \mu/\lambda$, $c_i = \mu/(N\lambda)$ and $c_{i,i+s} = \mu/(N\lambda) \exp(-(\lambda - \mu)(t_{i+s} - t_i))$ $(i = 1, 2, \dots, n)$.

Experience shows that in some situations it is better, from a numerical perspective, to use the log-likelihood $l(x)$, that is

$$l(x) = -\frac{n}{2} \ln(2\pi) - \frac{1}{2} \ln(|C|) - \frac{1}{2}\,(x - \boldsymbol{\mu})\, C^{-1} (x - \boldsymbol{\mu})'.$$

(Again, the form that should be used in practice is given in (11).)
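For small data sets the log-likelihood $l(x)$ can also be evaluated in its dense form, building $C$ from the covariance $c(t)$ of the SIS logistic model and using a Cholesky factorisation for the determinant and the quadratic form. The sketch below is a plain-Python illustration under that assumption; the function names are hypothetical, and the paper's own practical form (11) is not reproduced here.

```python
import math

def sis_covariance(times, N, lam, mu):
    """Matrix C with entries mu/(N*lam) * exp(-(lam - mu) * |t_i - t_j|)."""
    c0, alpha = mu / (N * lam), lam - mu
    return [[c0 * math.exp(-alpha * abs(s - t)) for t in times] for s in times]

def gaussian_log_likelihood(x, mean, C):
    """l(x) = -n/2 ln(2 pi) - 1/2 ln|C| - 1/2 (x - mean) C^{-1} (x - mean)'."""
    n = len(x)
    L = [[0.0] * n for _ in range(n)]          # Cholesky factor, C = L L'
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    d = [xi - mean for xi in x]
    y = []                                      # forward substitution: L y = d
    for i in range(n):
        y.append((d[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    quad = sum(v * v for v in y)                # equals d C^{-1} d'
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * log_det - 0.5 * quad

# Hypothetical densities at unequally spaced times, with (N, lam, mu) = (50, 0.8, 0.4).
times = [0.0, 2.0, 4.0, 7.0]
C = sis_covariance(times, N=50, lam=0.8, mu=0.4)
ll = gaussian_log_likelihood([0.46, 0.52, 0.55, 0.48], 1.0 - 0.4 / 0.8, C)
```

This dense evaluation costs order $n^3$ operations, which is acceptable for the sample sizes considered here but wasteful compared with an explicit inverse.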

6.1. Simulated data

In all cases discussed below, our simulated data sets were generated in the following way. We simulated the stochastic logistic model with chosen parameters (as listed in each section) to generate data over 119 units of time (years). We then observed the population size at yearly intervals. We took as our data set the required number $n$ of observations, after discarding the first 20 observations (corresponding to 19 years of data), to allow the influence of the initial conditions to dissipate. Also, unless otherwise specified, the results presented are from 20 runs of the appropriate CE algorithm (see Appendix C).

In the tables presented, ‘It’ refers to the number of iterations taken by the CE algorithm and ‘S’ represents the value of the (log-)likelihood. The ‘min’, ‘median’, ‘mean’ and ‘max’ in the tables refer to the value of the (log-)likelihood S.
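The data-generation recipe above can be sketched with a direct (Gillespie-style) simulation of the SIS logistic chain. The simulation method, the starting state, the seed and the function name are assumptions made for this illustration; the text does not say how the authors' simulations were implemented.

```python
import random

def simulate_sis_yearly(N, lam, mu, m0, years, rng):
    """Simulate the SIS logistic chain; record the state at times 0, 1, ..., years."""
    t, m = 0.0, m0
    observations = []
    next_obs = 0
    while next_obs <= years:
        up = lam * m * (N - m) / N          # q(m, m+1)
        down = mu * m                       # q(m, m-1)
        total = up + down
        # Holding time is exponential with rate `total`; state 0 is absorbing.
        t_next = t + rng.expovariate(total) if total > 0 else float("inf")
        while next_obs <= years and next_obs < t_next:
            observations.append(m)          # state is m on [t, t_next)
            next_obs += 1
        t = t_next
        if total > 0:
            m += 1 if rng.random() * total < up else -1
    return observations

rng = random.Random(1)                      # fixed seed for reproducibility
obs = simulate_sis_yearly(N=50, lam=0.8, mu=0.4, m0=25, years=119, rng=rng)
n = 40
data = obs[20:20 + n]                       # discard the first 20 observations
```

Observing over 119 years at yearly intervals gives 120 values, so discarding the first 20 leaves 100 from which the required $n$ are taken.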

The general approach: The parameters used were $N = 50$, $\lambda = 0.8$ and $\mu = 0.4$, and $n = 40$. We used a CE algorithm, with normal updating (Rubinstein and Kroese, 2004) to maximise the likelihood given by (1). We considered the (log-)likelihood as a function of the ‘natural’ parameters, $N$, $\rho = \mu/\lambda$ and $\alpha = \lambda - \mu$. In the case where $N$ was known, we used the log-likelihood, and in the case where it was unknown, we used the likelihood itself. We used EXPOKIT to evaluate (1) with initial distribution taken to be the quasi-stationary distribution, and began the CE algorithm with 1000 initial parameter estimates for $(\lambda, \mu)$, drawn uniformly from the triangle (0,0)–(0,5)–(5,5). Note that, for the example considered, a wider range of initial parameters resulted in numerical precision problems when using EXPOKIT (for Matlab) to evaluate the likelihood. Also, for each parameter estimate, a transition rate matrix had to be created and passed to EXPOKIT for evaluation. In our case, we could take advantage of the sparse structure of $Q$, which EXPOKIT could exploit. Even so, experience showed that when $N$ became quite large, the evaluation of the likelihood at a parameter estimate could become too costly for our algorithm to complete in reasonable time (on a 3 GHz Pentium 4 with 1 Gb of RAM). Hence, it would appear that the general procedure is currently feasible only when $N$ is moderate.

The CE parameters were updated on the basis of the best 25 parameter estimates, meaning that the best 25 parameter estimates of the 1000 were used to obtain a new sampling distribution for the next iteration. The algorithm was stopped when all of the standard deviations were below $\varepsilon = 10^{-7}$, or when the algorithm hit the maximum iterations ceiling of 250. Note that we did allow ‘illegal’ parameter values, such as negative $N$, $\rho > 1$ or $\alpha < 0$. Further, we did not attempt to evaluate the likelihood in these situations; rather, we applied a penalty, being quadratic in the parameters. This guided the sampling distribution of the CE algorithm into the region of allowable parameters if it strayed. Finally, if the optimisation procedure terminated due to a numerical precision error, the procedure was rerun.
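The CE iteration described above (sample, keep the best few, refit the sampling distribution, stop when the standard deviations are small) can be sketched as follows. This is a generic illustration with independent normal sampling distributions and a toy objective whose maximum is placed at $(0.8, 0.4)$; the function names, the starting mean and spread, and the exact form of the penalty are assumptions, and the authors' actual algorithm is the one in their Appendix C.

```python
import math
import random

def cross_entropy_maximise(score, mean, std, n_samples=1000, n_elite=25,
                           eps=1e-7, max_iter=250, rng=None):
    """Maximise `score` with independent normal sampling distributions."""
    rng = rng or random.Random()
    mean, std = list(mean), list(std)
    for iteration in range(1, max_iter + 1):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        samples.sort(key=score, reverse=True)
        elite = samples[:n_elite]               # best parameter estimates
        for d in range(len(mean)):              # normal updating from the elite
            column = [p[d] for p in elite]
            mean[d] = sum(column) / n_elite
            std[d] = math.sqrt(sum((v - mean[d]) ** 2 for v in column) / n_elite)
        if max(std) < eps:                      # all standard deviations small
            break
    return mean, iteration

def toy_score(p):
    """Stand-in for a log-likelihood, with a quadratic penalty for illegal values."""
    lam, mu = p
    if lam <= 0 or mu <= 0:
        return -1e3 - lam * lam - mu * mu       # penalised, not evaluated
    return -((lam - 0.8) ** 2 + (mu - 0.4) ** 2)

estimate, iterations = cross_entropy_maximise(
    toy_score, mean=[2.5, 2.5], std=[2.0, 2.0], rng=random.Random(0))
```

In an actual fit, `toy_score` would be replaced by the (log-)likelihood of the data, with the penalty steering the sampler back towards allowable parameter values.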

N known: We obtained the following results:

n = 40

          It      S           $\hat{\lambda}$   $\hat{\mu}$
Min       8       -109.108    1.14739           0.566295
Mean      8.65    -109.108    1.14739           0.566295
Median    9       -109.108    1.14739           0.566295
Max       9       -109.108    1.14739           0.566294

N unknown: When $N$ was unknown, we introduced an additional sampling distribution for $N$. Using the same data set as above, we sampled 1000 initial population ceilings uniformly between the maximum observed population size (which in this case was 32) and 250. (Note that this ceiling should be large enough to encompass the true population size.) All of the other parameters remained the same.

n = 40

          It      S           $\hat{N}$   $\hat{\lambda}$   $\hat{\mu}$
Min       14      -107.29     40          1.63349           0.605665
Mean      15      -107.249    39.15       1.67561           0.598932
Median    17.5    -107.242    39          1.68304           0.597744
Max       15      -107.242    39          1.68304           0.597744

The computationally simpler approach: In this case, the CE algorithm took 50,000 samples per iteration, with the sampling parameters updated from the best 100 parameter estimates. We initialised the algorithm with pairs $(\lambda, \mu)$ drawn uniformly from the triangle (0,0)–(1000,0)–(1000,1000). In the case that $N$ was unknown, we drew $N$ uniformly between the highest value found in the sample and 10,000. The algorithm was stopped when the largest of the standard deviations of the sampling distributions dropped below the threshold $\epsilon = 10^{-7}$. The larger search space and number of samples per iteration in the CE algorithm were now practical owing to the low cost of evaluating the (log-)likelihood for a trial set of parameters.

The following tables show results for data simulated using parameters $(N, \lambda, \mu) = (50, 0.8, 0.4)$.

N known:

n = 40

          It      S          $\hat{\lambda}$   $\hat{\mu}$
Min       23      47.5405    1.10683           0.559313
Mean      21.95   47.5464    1.10654           0.557900
Median    22.5    47.5437    1.10674           0.558667
Max       8       47.5582    1.10432           0.552027

N unknown:

n = 40

          It      S              $\hat{N}$   $\hat{\lambda}$   $\hat{\mu}$
Min       7       2.57901e-047   38.8803     1.61317           0.596472
Mean      6.85    2.57912e-047   38.9045     1.60636           0.592581
Median    7       2.57913e-047   38.8875     1.60156           0.590422
Max       7       2.57915e-047   38.9036     1.59894           0.589929

The following tables show results for data simulated using parameters $(N, \lambda, \mu) = (2000, 0.8, 0.4)$.

N = 2000

                      n = 10                           n = 40                            n = 100
                      N         λ          μ           N         λ          μ            N         λ          μ
Known N    Min        –         661.143    328.464     –         0.906201   0.448219     –         0.906262   0.452053
           Mean       –         747.609    371.422     –         0.905708   0.447823     –         0.906262   0.452053
           Median     –         794.560    394.748     –         0.905563   0.447709     –         0.906262   0.452053
           Max        –         687.940    341.778     –         0.905564   0.447709     –         0.906262   0.452053
Unknown N  Min        1654.77   782.968    306.883     1654.03   1.17260    0.45642      1854.30   0.978558   0.449691
           Mean       1654.76   589.403    231.008     1653.21   1.17329    0.456330     1853.45   0.979045   0.449655
           Median     1654.78   609.114    238.733     1653.10   1.17329    0.456270     1853.30   0.979232   0.449684
           Max        1654.71   1081.070   423.709     1653.07   1.17347    0.456343     1853.27   0.979215   0.449679

N known, unequal spacing: The results below are for unequal spacing of observations; we used the following spacing (in years) between observations:

(2, 2, 2, 2, 3, 2, 1, 1, 2, 1, 2, 2, 2, 3, 2, 1, 2, 1, 2, 3, 2, 1, 2, 3, 3, 2, 1, 1, 2).

n = 30

          It      S          $\hat{\lambda}$   $\hat{\mu}$
Min       24      87.7777    1.21099           0.600365
Mean      19.7    87.8001    1.20623           0.597228
Median    24.5    87.7991    1.2081            0.59832
Max       9       87.8055    1.20298           0.595267

6.2. The Bay checkerspot butterfly of Jasper Ridge

We now consider the time series of the Jasper Ridge (JRH) population of Bay checkerspot butterfly, E. editha bayensis. The population was sampled yearly between 1960 and 1986 (Harrison et al., 1991). An analysis of this data set is included to demonstrate the application of the method to real data. We do not claim that the stochastic SIS logistic model is the most appropriate for modelling the butterfly population in question. Furthermore, we assume that the population is in stationarity.

N unknown: The following results were obtained using the computationally simpler approach:

n = 27

          It    S              $\hat{N}$   $\hat{\lambda}$   $\hat{\mu}$
Min       8     8.57701e-088   280156      245.645           245.215
Mean      8     8.57701e-088   280156      245.645           245.215
Median    8     8.57701e-088   280156      245.645           245.215
Max       8     8.57701e-088   280156      245.645           245.215

Footnote 2: In particular with respect to the predictions in Foley (1994).

Observe that $N$ was estimated (consistently) to be well above our (arbitrary) cap of 10,000. The actual procedure used to obtain this estimate involved first taking into account that the optimisation favoured solutions with $N$ outside the default initial range, and subsequently increasing the initial range until the estimates fell within this range. (In this case, the ‘cap’ of the initial range was increased to 500,000.)

The Bay checkerspot butterfly is now extinct on Jasper Ridge, with the JRH population thought to have gone extinct in 1998 due to a combination of extensive habitat loss and a period of increased climatic variability beginning in 1972 (McLaughlin et al., 2002). We can evaluate the expected time to extinction using the stochastic logistic model and parameter estimates derived from our procedure using data from a time when the population was extant. For a general Markov chain with transition rates $Q = (q(m,n);\ m,n \in S)$, whose state space $S$ (possibly infinite) includes a subset $A$ which is reached with probability 1, the expected time $\tau_i$ it takes to reach $A$ starting in state $i$ is the minimal non-negative solution to

$$\sum_{j \in S} q(i,j)\,\tau_j + 1 = 0, \qquad i \notin A,$$

with $\tau_i = 0$ for $i \in A$. This result can be found in most texts on Markov chains (for a recent exposition see Norris, 1997). Mangel and Tier (1994), in their paper ‘Four facts every conservation biologist should know about persistence’, encourage their readers to use it: Fact 2 ‘There is a simple and direct method for the computation of persistence times that virtually all biologists can use’. The result reduces the problem of computing persistence times to that of solving a system of linear equations, for which a host of numerical methods exist. For any stochastic birth-and-death process, with sets of (population-size dependent) birth and death rates $\{\lambda_j\}$ and $\{\mu_j\}$, respectively, $\tau_i$ is given by

$$\tau_i = \sum_{j=1}^{i} \frac{1}{\mu_j \pi_j} \sum_{k=j}^{N} \pi_k,$$

where (the potential coefficients) $\pi_j$ ($j \geq 1$) are given by $\pi_1 = 1$ and $\pi_j = \prod_{k=2}^{j} (\lambda_{k-1}/\mu_k)$ for $j \geq 2$. This formula is also valid in the infinite-state case, replacing $N$ by $\infty$. For the stochastic logistic model we have

$$\tau_i = \frac{1}{\mu} \sum_{j=1}^{i} \sum_{k=0}^{N-j} \frac{1}{j+k} \prod_{l=0}^{k-1} \left( \frac{N-j-l}{N\rho} \right).$$

(This expression admits further simplification, but this form reflects the algorithm used to evaluate $\tau_i$, the product being evaluated recursively, and the sums evaluated in such a way as to minimise round-off error.) Whilst the above expression for $\tau_i$ does not pose any significant numerical problems, an explicit expression for the time to extinction is often preferred. By evaluating the factorials as gamma integrals, and using Cauchy’s method to estimate these integrals, Pollett (2001) obtained the following explicit asymptotic formula:

$$\tau_i \sim \frac{\rho\,(1-\rho^i)}{\mu\,(1-\rho)^2} \left( \frac{e^{-(1-\rho)}}{\rho} \right)^{N} \sqrt{\frac{2\pi}{N}} \qquad (N \text{ large}).$$

The expected time to extinction for the butterfly population using this approach, and setting the initial population size to the average population size (that is, $i = 553$), is approximately $3.39\sqrt{\pi}$ years. It is interesting to note that this is quite close (see footnote 2) to the observed extinction time (over the recorded population range), 12 years after the last measurements present in the data we used.
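The two expressions for the expected extinction time in this subsection can be sketched in code. The block below is illustrative only: the function names are hypothetical, and the birth and death rates passed to the general formula ($\lambda_j = \lambda j (1 - j/N)$, $\mu_j = \mu j$) are the usual stochastic logistic rates, assumed here rather than quoted from this section. The first function evaluates the double sum with the product built up recursively, the second evaluates the general birth-and-death formula with potential coefficients, and the third evaluates the asymptotic formula.

```python
import math


def extinction_time_exact(i, N, lam, mu):
    """Double-sum expression for tau_i in the stochastic logistic model."""
    rho = mu / lam
    total = 0.0
    for j in range(1, i + 1):
        term = 1.0   # empty product (k = 0)
        inner = 0.0
        for k in range(0, N - j + 1):
            inner += term / (j + k)
            term *= (N - j - k) / (N * rho)   # extend the product to l = k
        total += inner
    return total / mu


def extinction_time_birth_death(i, birth, death, N):
    """General birth-and-death formula using potential coefficients pi_j."""
    pi = [0.0] * (N + 1)
    pi[1] = 1.0
    for j in range(2, N + 1):
        pi[j] = pi[j - 1] * birth(j - 1) / death(j)
    tail = [0.0] * (N + 2)               # tail[j] = pi_j + ... + pi_N
    for k in range(N, 0, -1):
        tail[k] = tail[k + 1] + pi[k]
    return sum(tail[j] / (death(j) * pi[j]) for j in range(1, i + 1))


def extinction_time_asymptotic(i, N, lam, mu):
    """Explicit large-N approximation to tau_i."""
    rho = mu / lam
    log_power = N * (-(1.0 - rho) - math.log(rho))   # log of (e^{-(1-rho)}/rho)^N
    prefactor = rho * (1.0 - rho ** i) / (mu * (1.0 - rho) ** 2)
    return prefactor * math.exp(log_power) * math.sqrt(2.0 * math.pi / N)


if __name__ == "__main__":
    # Butterfly estimates from the table above, starting from i = 553.
    print(extinction_time_asymptotic(553, 280156, 245.645, 245.215))
```

With the butterfly estimates the asymptotic function evaluates to roughly 6 years, in line with $3.39\sqrt{\pi}$. The exact double sum costs on the order of $iN$ operations, so in pure Python it is only practical for moderate $N$.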

6.3. Bias

Here we examine (numerically) the bias of our computationally simpler approach using simulated data from the stochastic logistic model with parameters $(N, \lambda, \mu) = (2000, 0.8, 0.4)$. We applied the technique and obtained estimates for the parameters, for both $N$ known and $N$ unknown. The procedure was repeated 10,000 times (obtaining new simulated data each time). Any data set for which $\alpha > 5$ had the CE algorithm rerun on it (since this indicated a numerically spurious result, rather than, say, the data set actually producing such a strange result; in most cases, if $\alpha > 5$ then $\alpha > 100$). Using this process, we arrived at a set of values that are reasonably accurate indicators of the actual MLEs for the 10,000 data sets.

Note that the tables presented in this section are no longer ordered by likelihood, but rather by the actual values for the parameters in question.

n = 100

                       $\hat{N}$   $\hat{\lambda}$   $\hat{\mu}$
N known     Min        –           0.41054           0.20499
            Median     –           0.80304           0.40186
            Mean       –           0.81481           0.40785
            Max        –           1.59760           0.81427
N unknown   Min        1406        0.41324           0.20583
            Median     1924        0.84405           0.40645
            Mean       1951        0.86762           0.41219
            Max        3158        9.11500           4.16980

In order to assess the bias of the technique we performed a sign test on each of the estimated parameters, with the following hypotheses:

$H_0$: The estimator is unbiased.

$H_A$: Not $H_0$.

The sign test was chosen because of the departure from normality observed in our estimates, and a normal approximation was used to obtain our probabilities. For a decision about biasedness, we took our critical p-value to be 0.01. With this in mind, we obtained the following:

N known:

          λ          μ
T         1.34       2.22
p         0.180245   0.026419
Biased    No         No

N unknown:

          N            λ            μ
T         -26.12       21.1         6.76
p         < 0.000001   < 0.000001   < 0.000001
Biased    Yes          Yes          Yes

Observe that, when $N$ is known, we cannot reject the hypothesis that the estimators are unbiased using the sign test. However, we can strongly reject the hypothesis when $N$ is unknown, concluding that the (joint) estimators in this case are biased.
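A minimal sketch of a sign test with a normal approximation follows. The exact form of the statistic used for the tables above is not given in the text, so the standardisation below (no continuity correction, ties discarded, two-sided p-value) is an assumption, and the function names are hypothetical. Under that assumption the two-sided p-values for $T = 1.34$ and $T = 2.22$ agree with the tabulated values.

```python
import math


def two_sided_p(t):
    """Two-sided p-value for a standard normal statistic t."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def sign_test(estimates, true_value):
    """Sign test of H0: estimates fall above or below true_value equally often.

    Returns (T, p), where T is the standardised count of estimates above
    the true value (normal approximation to the binomial).
    """
    above = sum(1 for e in estimates if e > true_value)
    below = sum(1 for e in estimates if e < true_value)
    m = above + below                    # ties are discarded
    if m == 0:
        raise ValueError("all estimates equal the true value")
    t = (above - m / 2.0) / math.sqrt(m / 4.0)
    return t, two_sided_p(t)


if __name__ == "__main__":
    print(two_sided_p(1.34), two_sided_p(2.22))
```

Because only the signs of the deviations enter the statistic, a handful of extreme outlying estimates has no more influence than any other estimate on the same side of the true value.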

6.4. Error bounds

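MLEs are asymptotically normally distributed with mean $\theta$ and covariance matrix given by the inverse of the Fisher information matrix. This can be estimated by

$$[I(\theta)]^{-1} = \left\{ -E\left[ \frac{\partial^2 l(\theta)}{\partial\theta\,\partial\theta'} \right] \right\}^{-1}.$$

However, the second derivatives of the log-likelihood are often too complicated for their exact expected values to be calculated in practice. A second estimator widely used is

$$[I(\theta)]^{-1} = \left( -\frac{\partial^2 l(\theta)}{\partial\theta\,\partial\theta'} \right)^{-1},$$

this being the inverse of (the negative of) the matrix of second derivatives of the log-likelihood, evaluated at the MLE. For our model we have

$$[I((\lambda,\mu))]^{-1} = -\begin{pmatrix} \dfrac{\partial^2 l}{\partial\lambda^2} & \dfrac{\partial^2 l}{\partial\lambda\,\partial\mu} \\[2ex] \dfrac{\partial^2 l}{\partial\mu\,\partial\lambda} & \dfrac{\partial^2 l}{\partial\mu^2} \end{pmatrix}^{-1} = \left( \frac{\partial^2 l}{\partial\lambda^2}\,\frac{\partial^2 l}{\partial\mu^2} - \frac{\partial^2 l}{\partial\lambda\,\partial\mu}\,\frac{\partial^2 l}{\partial\mu\,\partial\lambda} \right)^{-1} \begin{pmatrix} -\dfrac{\partial^2 l}{\partial\mu^2} & \dfrac{\partial^2 l}{\partial\lambda\,\partial\mu} \\[2ex] \dfrac{\partial^2 l}{\partial\mu\,\partial\lambda} & -\dfrac{\partial^2 l}{\partial\lambda^2} \end{pmatrix}.$$

Hence, we must compute the second derivatives of the log-likelihood given in Appendix B. This can be done exactly. However, the formulæ are rather cumbersome and will not be written out here. We use this approach to calculate the error bounds presented below.

Often it will be difficult to compute the matrix of second derivatives. In such cases a third estimator may be useful, which only requires first derivatives of the log-likelihood. This estimator is

$$[I(\theta)]^{-1} = \left( \sum_{i=1}^{n} \hat{g}_i\,\hat{g}_i' \right)^{-1},$$

where

$$\hat{g}_i = \frac{\partial l(x_i; \hat{\theta})}{\partial\hat{\theta}}.$$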
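As an illustration of the second estimator, the sketch below approximates the second derivatives of a two-parameter log-likelihood by central finite differences and then applies the explicit 2×2 inverse written out above. Finite differences are a stand-in chosen here for brevity; the error bounds in this paper were computed from exact second derivatives. The function names and the step size `h` are hypothetical.

```python
def hessian_2d(loglik, theta, h=1e-4):
    """Central finite-difference second derivatives of loglik(a, b) at theta."""
    a, b = theta
    f = loglik
    d_aa = (f(a + h, b) - 2.0 * f(a, b) + f(a - h, b)) / h ** 2
    d_bb = (f(a, b + h) - 2.0 * f(a, b) + f(a, b - h)) / h ** 2
    d_ab = (f(a + h, b + h) - f(a + h, b - h)
            - f(a - h, b + h) + f(a - h, b - h)) / (4.0 * h ** 2)
    return d_aa, d_ab, d_bb


def covariance_from_hessian(d_aa, d_ab, d_bb):
    """Inverse of the negative Hessian: returns (var_a, cov_ab, var_b)."""
    det = d_aa * d_bb - d_ab * d_ab
    return -d_bb / det, d_ab / det, -d_aa / det


if __name__ == "__main__":
    # Toy quadratic log-likelihood with maximum at (1, 2); not the OU likelihood.
    def toy(a, b):
        u, v = a - 1.0, b - 2.0
        return -0.5 * (2.0 * u * u + 2.0 * u * v + 3.0 * v * v)

    print(covariance_from_hessian(*hessian_2d(toy, (1.0, 2.0))))
```

For the toy quadratic the negative Hessian is the matrix with rows (2, 1) and (1, 3), so the output should be close to (0.6, -0.2, 0.4). In practice `loglik` would be the OU log-likelihood of Appendix B viewed as a function of $(\lambda, \mu)$ and evaluated at the MLE.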

We illustrate the error bounds by plotting results of calculations for the stochastic logistic model, with a set of simulated data (with 100 equally spaced observations, and parameters $(N, \lambda, \mu) = (2000, 0.8, 0.4)$), and the JRH data (taking our estimate of $N$ to be the true value).

Note that in Fig. 2 the confidence ellipse has been rotated, so that the x-axis is the line $y = \rho^{-1}x$. Without rotation, the 95% confidence region appears to be a line, suggesting that the ratio $\mu/\lambda$ is estimated with far greater confidence than the difference $\lambda - \mu$.

7. Discussion

When $N$ is known, we see that the general approach produces reasonable estimates, $\hat{\lambda} = 1.14739$ and $\hat{\mu} = 0.566294$, of the true parameter values $\lambda = 0.8$ and $\mu = 0.4$. When $N$ is unknown, we once again get reasonable estimates, $\hat{N} = 39$, $\hat{\lambda} = 1.68304$ and $\hat{\mu} = 0.597744$, of the true parameter values $N = 50$, $\lambda = 0.8$ and $\mu = 0.4$. As already noted, even with a small population ceiling of $N = 50$, and a small initial search space for this parameter, namely the interval $[32, 250]$, the EXPOKIT function mexpv was unable to complete calculations to the default tolerances in many cases, and hence errors were returned.

It can be seen that our approach enables the use of a much wider search space, owing to the low computational cost of evaluating the approximation’s (log-)likelihood. The parameter space can be expanded from values up to 5 to values up to 1000, a significant increase. This suggests that the new procedure permits more uncertainty in the true parameter values than does the general approach. Likewise, the maximum population size can be increased, so that the default search space for this parameter is now $[32, 10{,}000]$, once again allowing more uncertainty in the value of the maximum population size for the species being modelled. This also allows more freedom in modelling populations that are only weakly density dependent (in the ecological sense mentioned in Section 1) using the stochastic logistic model. This will be discussed in more detail later in this section with reference to the Bay checkerspot butterfly. The sample size used for the CE method can also be significantly increased, in our example from 1000 to 50,000, thus allowing more accurate numerical maximisation of the likelihood from the CE method. For the same data set used to illustrate the general approach, the computationally simpler approach produces reasonable estimates ($\hat{\lambda} = 1.10432$ and $\hat{\mu} = 0.552027$) when $N$ is known. When $N$ is unknown, we again achieve reasonable estimates ($\hat{N} = 39$, $\hat{\lambda} = 1.59894$ and $\hat{\mu} = 0.589929$) with a reasonably significant overestimate of the birth parameter. We note the close agreement of the estimates produced using our computationally simpler approach with those produced via the general approach. The simpler approach works best for large population sizes. The general approach should be used in cases when it can be practically implemented, although the simpler approach can still provide adequate estimates of parameters in these cases.

We now consider situations in which the general approach is infeasible to implement, and demonstrate the robustness of the simpler approach. We use the parameter values $N = 2000$, $\lambda = 0.8$ and $\mu = 0.4$. Assuming $N$ is known, we see that extremely accurate estimates are produced when we have 40 and 100 observations of data, being $\hat{\lambda} = 0.905564$ and $\hat{\mu} = 0.447709$, and $\hat{\lambda} = 0.906262$ and $\hat{\mu} = 0.452053$, respectively. However, in the case of only 10 observations, our approach fails to produce reasonable parameter estimates. In this case, the ratio $\rho$ was estimated well, but the difference $\alpha = \lambda - \mu$ was not, resulting in unreasonable estimates for both $\lambda$ and $\mu$. The difference $\alpha$ is the mean-reversion parameter, and this is known to be difficult to estimate precisely (see for example Franco, 2003). This also demonstrates that an adequate sample of data is required to produce accurate parameter estimates. The sample size required for a particular model, and unknown parameter set, should be investigated numerically through a simulation study. For the stochastic logistic model, we find that somewhere between 15 and 20 observations are required to produce reasonably accurate estimates for most simulated data. A similar result can be seen in the $N$ unknown case. The parameter estimates are seen to improve with an increasing number of observations. For the case $n = 40$ the estimates are $\hat{N} = 1653$, $\hat{\lambda} = 1.17347$ and $\hat{\mu} = 0.456343$, improving to $\hat{N} = 1853$, $\hat{\lambda} = 0.979215$ and $\hat{\mu} = 0.449679$ when $n = 100$. All of these estimates are reasonably accurate and all are consistently achieved. Finally, we demonstrate the robustness of our approach to unequal observation intervals, producing parameter estimates of $\hat{\lambda} = 1.20298$ and $\hat{\mu} = 0.595267$. This is an important feature of the procedure, in particular to population modelling where equally spaced data are not always available.

We also fitted the stochastic logistic model to population data of the Bay checkerspot butterfly using our procedure. The parameter estimates produced were $\hat{N} = 280{,}156$, $\hat{\lambda} = 245.645$ and $\hat{\mu} = 245.215$. Harrison et al. (1991) found that the observed data are consistent with the assumption of density independence or only weak density dependence (both in the ecological sense). Our estimates support the conclusion of Harrison et al. (1991): since the estimates for $\lambda$ and $\mu$ are so close, the population tends to be small, in particular relative to the large population ceiling $N$, and hence $q(m, m+1) \simeq \lambda m$ over a wide range of population sizes. Consequently, the population does not exhibit a strong downward trend in size following deviations above its carrying capacity. This result indicates a degree of flexibility in the logistic model and the proposed estimation procedure when $N$ is unknown, because, while the model allows for a tendency to decline at large population sizes, it does not require this tendency to be strong. As a wide variety of density-dependent processes may be analysed using this approach, it appears to provide a robust method for assessing whether populations are strongly limited by their carrying capacity. We additionally evaluated the expected time to extinction of this population and found our estimates to be reasonably close to the reported time of extinction (McLaughlin et al., 2002). Care should be taken when the estimates of $\lambda$ and $\mu$ are close together, as we have for this data set, because if the true difference $\alpha$ (being the mean-reversion parameter) is 0, or negative, the assumptions of our method are not valid, and the method will return a positive estimate of $\alpha$ (which will be typically very close to 0).

Fig. 2. Confidence regions for the butterfly data, with $N = 280{,}156$ taken as known. (Horizontal axis: position along $y = \rho^{-1}x$; vertical axis: perpendicular displacement from $y = \rho^{-1}x$; the 50% and 95% regions and the estimate are shown.)

We also investigated the bias of our approach. We found that in the $N$-known case our procedure could not be said to produce biased estimates of birth and death parameters, on the basis of a sign test over 10,000 sets of simulated data. In the $N$-unknown case, it appears that our approach produces biased estimates. From our investigation, for the stochastic logistic model, it appears that the population ceiling estimate $\hat{N}$ is generally underestimated, and the birth and death parameter estimates $\hat{\lambda}$ and $\hat{\mu}$ are generally overestimated. Note that, over all data sets considered, the CE algorithm occasionally produced outlying estimates (whether this was caused by the algorithm or a particular quality of the data set in question, we are unsure). However, these outliers do not adversely affect the test used, since the sign test only records whether the estimates are above or below the true parameters, and ignores magnitude. However, the median estimates for $\lambda$ and $\mu$ show a remarkable correspondence to the true parameter values. In any particular situation, we recommend that a numerical simulation study be performed to determine the extent of any bias. It can then be compensated for when using the parameter estimates for subsequent analyses of the population.

We can calculate approximate confidence regions for estimates of parameter vectors produced using our approach. We have depicted two such regions in Figs. 1 and 2, the first for 100 observations of a simulated process, and the second for our model of the butterfly population. From Fig. 1, we see that there appears to be more confidence in the ratio $\mu/\lambda$ than in the difference $\alpha = \lambda - \mu$. As a consequence, the confidence region is long and narrow. For the stochastic logistic model, this appears to be the general pattern. The parameter pairs which fall inside the 95% confidence region range from $(\lambda, \mu) \approx (0.5, 0.25)$ to $(\lambda, \mu) \approx (1.3, 0.65)$, and happen to include the true parameter pair. The confidence region for the estimates of parameters from the butterfly data is extremely narrow, indicating high confidence in the ratio $\rho$, and abysmal confidence in the difference $\alpha$. The 95% confidence region, in this case, extends into negative parameter pairs, which are illegal for the model. This case highlights the importance of considering estimates in conjunction with their confidence regions. We note that more data, as well as less variance in that data, will tend to yield smaller confidence regions.

Fig. 1. Confidence regions for simulated data ($n = 100$), with $N = 2000$ (known). (Horizontal axis: $\mu$; vertical axis: $\lambda$; the 50% and 95% regions, the estimate and the true parameter pair are shown.)

Finally, we note that the use of the general approach will commonly become computationally infeasible as the dimensionality of the model increases. We are currently investigating the use of our parameter estimation technique for such models, employing the multi-dimensional diffusion approximation. In such situations it is also frequently the case that we can only observe one (or more generally a subset) of these dimensions. Our procedure may be used in such situations. However, obviously as more parameters are required to be estimated, the effectiveness of our procedure will be reduced. We are currently investigating this type of data, along with the extension of our technique to handle data from transient dynamics, that is in situations where our assumption of stationarity is not valid.

8. Summary

We have presented two approaches to estimating the parameters of continuous-time Markov chains, thus increasing their utility in modelling real biological systems. The first approach is applicable when both the parameter space and the maximum population size are not too large. The second approach can be used whenever the rates of the original Markov chain have a particular, but quite general form, and is applicable in many situations where the general approach is infeasible. We illustrated its versatility by estimating the parameters of the stochastic logistic model from simulated data. We also fitted this model to a now extinct population of the Bay checkerspot butterfly (E. editha bayensis) and used the fitted model to predict retrospectively the extinction time of the population. This was found to be close to the witnessed time of extinction. The new approach to parameter estimation is applicable to a wide class of models, in particular those appropriate for modelling many biological systems. For this reason, we anticipate that our methods will open up new opportunities for implementing continuous-time Markov chains as models for real populations, and for performing subsequent population viability analyses using these models.

Acknowledgments

The authors thank Ben Cairns and Dirk Kroese for valuable discussions. The support of the Australian Research Council Centre of Excellence for Mathematics and Statistics of Complex Systems is gratefully acknowledged.

Appendix A. Density dependence

In a series of papers, Kurtz (1970, 1971) and Barbour (1974, 1976, 1980a,b) established a suite of diffusion approximation techniques for density-dependent Markov chains. We summarise their main results here, but in the slightly more general context of asymptotic density-dependent processes outlined in Pollett (1990, 2001).

Suppose that we are given a family $\{m_N(\cdot)\}$ of continuous-time Markov chains indexed by $N > 0$, where $m_N(\cdot)$ takes values in $S_N$, a subset of $\mathbb{Z}^k$, and has transition rates $Q_N = (q_N(m,n);\ m,n \in S_N)$.

Definition 2. Suppose that there is a subset $E$ of $\mathbb{R}^k$ ($k$-dimensional Euclidean space) and a family $\{f_N, N > 0\}$ of continuous functions, with $f_N : E \times \mathbb{Z}^k \to \mathbb{R}$, such that

$$q_N(m, m+l) = N f_N\!\left( \frac{m}{N}, l \right), \qquad l \neq 0.$$

Then, the family of Markov chains is asymptotically density dependent if, additionally, there exists a function $F : E \to \mathbb{R}$ such that $\{F_N\}$, given by $F_N(x) = \sum_l l\, f_N(x, l)$, $x \in E$, converges (pointwise) to $F$ on $E$. It is called simply density dependent if $f_N$, and hence $F_N$, are the same for all $N$.

Now define a ‘density process’ $X_N(\cdot)$ by $X_N(t) = m_N(t)/N$, $t \geq 0$. The following functional law of large numbers establishes convergence of the family $\{X_N(\cdot)\}$ to the unique trajectory of an appropriate approximating deterministic model.

Theorem 3. Suppose that $f_N(\cdot, l)$ is bounded for each $l$ and $N$, that $F$ is Lipschitz continuous on $E$ and that $\{F_N\}$ converges uniformly to $F$ on $E$. Then, if $\lim_{N\to\infty} X_N(0) = x_0$, the density process $X_N(\cdot)$ converges uniformly in probability on $[0, t]$ to $x(\cdot)$, the unique (deterministic) trajectory satisfying $x(0) = x_0$ and

$$\frac{d}{ds} x(s) = F(x(s)), \qquad x(s) \in E,\ s \in [0, t]. \tag{8}$$

The following functional central limit law establishes that, for large $N$, the fluctuations about the deterministic trajectory follow a Gaussian diffusion, provided that mild ‘second-order’ conditions are satisfied.

Theorem 4. Suppose $f_N(\cdot, l)$ is bounded for each $l$ and $N$, that $F$ is Lipschitz continuous and has uniformly continuous first derivative on $E$, and that

$$\lim_{N\to\infty} \sup_{x \in E} \sqrt{N}\, |F_N(x) - F(x)| = 0.$$

Suppose also that the sequence $\{G_N\}$, where

$$G_N(x) = \sum_{l} l^2 f_N(x, l), \qquad x \in E,$$

converges uniformly to $G$, where $G$ is uniformly continuous on $E$. Let $x_0 \in E$ and let $x(\cdot)$ be the unique trajectory satisfying $x(0) = x_0$ and (8). Then, if

$$\lim_{N\to\infty} \sqrt{N}\, (X_N(0) - x_0) = z, \tag{9}$$

the family of processes $\{Z_N(\cdot)\}$, defined by

$$Z_N(s) = \sqrt{N}\, (X_N(s) - x(s)), \qquad 0 \leq s \leq t,$$

converges weakly in $D[0, t]$ (the space of right-continuous, left-hand limit functions on $[0, t]$) to a Gaussian diffusion $Z(\cdot)$ with initial value $Z(0) = z$ and with mean and variance given by $\mu_s := E(Z(s)) = M_s z$, where $M_s = \exp\left( \int_0^s B_u\, du \right)$ and $B_s = F'(x(s))$, and $\mathrm{Var}(Z(s)) = \sigma_s^2$, where

$$\sigma_s^2 = M_s^2 \int_0^s M_u^{-2}\, G(x(u))\, du.$$

The functional central limit theorem tells us that, for large $N$, the scaled density process $Z_N(\cdot)$ can be approximated over finite time intervals by the Gaussian diffusion $Z(\cdot)$. In particular, for all $s > 0$, $X_N(s)$ has an approximate normal distribution with $\mathrm{Var}(X_N(s)) \simeq \sigma_s^2/N$. We would usually take $x_0 = X_N(0)$, thus giving $E(X_N(s)) \simeq x(s)$.

In the important special case where the initial point $x_0$ of the deterministic trajectory is chosen as an equilibrium point of (8), we can be far more precise about the approximating diffusion. If we set $x_0 = x_{eq}$ in Theorem 4, where $x_{eq}$ satisfies $F(x_{eq}) = 0$, then, under the conditions of that theorem, we arrive at the following corollary:

Corollary 5. If $x_{eq}$ satisfies $F(x_{eq}) = 0$ then, under the conditions of Theorem 4, the family $\{Z_N(\cdot)\}$, defined by $Z_N(s) = \sqrt{N}\,(X_N(s) - x_{eq})$, $0 \leq s \leq t$, converges weakly in $D[0, t]$ to an OU process $Z(\cdot)$ with initial value $Z(0) = z$, local drift $B = F'(x_{eq})$ and local variance $V = G(x_{eq})$. In particular, $Z(s)$ is normally distributed with mean $\mu_s = e^{Bs} z$ and variance $\sigma_s^2 = (V/(2B))(e^{2Bs} - 1)$.

We conclude that, for $N$ large, $X_N(s)$ has an approximate normal distribution with $\mathrm{Var}(X_N(s)) \simeq \sigma_s^2/N$. A ‘working approximation’ for the mean (that is, for a fixed value of $N$) is obtained by setting $z$ equal to $\sqrt{N}\,(X_N(0) - x_{eq})$ (refer to Eq. (9)). We get

$$E(X_N(s)) \simeq x_{eq} + e^{Bs}\,(X_N(0) - x_{eq}).$$

Note that this corollary implies that the distribution of a (finite) sequence of observations of our scaled density process, that is $(Z_N(t_1), \ldots, Z_N(t_n))$ with $(0 \leq)\ t_1 < \cdots < t_n$, converges to a multivariate normal distribution. Consequently, we may approximate the joint distribution of a (finite) sequence of observations of our density process $X_N(\cdot)$ by a multivariate normal distribution.

In the context of population models $x_{eq}$ will usually be asymptotically stable, that is $B < 0$. However, it should be emphasised that it need not be for each of the above conclusions to hold. Indeed, the OU approximation is often very accurate in describing the fluctuations about centres and unstable equilibria (see Barbour, 1976). In the asymptotically stable case the OU process $Z(\cdot)$ has stationary mean 0 and variance $\sigma^2 = V/(-2B)$, and hence (approximately, for large $N$) $X_N(s) \sim \mathrm{Normal}(x_{eq}, \sigma^2/N)$. However, while it is intuitively reasonable to imagine that the long-term behaviour of the density process $X_N(\cdot)$ will be approximated by that of the diffusion, a precise statement cannot be deduced immediately from the theorems stated above, for the behaviour (as $t \to \infty$) depends on the whole process, rather than its behaviour over finite time intervals.
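To make the quantities in Corollary 5 concrete, the sketch below evaluates them for the stochastic logistic model. It assumes the usual rates $q(m, m+1) = \lambda m (1 - m/N)$ and $q(m, m-1) = \mu m$, which give $F(x) = \lambda x(1-x) - \mu x$ and $G(x) = \lambda x(1-x) + \mu x$; these forms are not restated in this appendix, so they should be read as assumptions, and the function names are hypothetical.

```python
import math


def ou_parameters(lam, mu):
    """Equilibrium, drift, local variance and stationary variance of the OU limit.

    Assumes F(x) = lam*x*(1 - x) - mu*x and G(x) = lam*x*(1 - x) + mu*x.
    """
    rho = mu / lam
    x_eq = 1.0 - rho                                   # non-zero root of F
    B = lam * (1.0 - 2.0 * x_eq) - mu                  # F'(x_eq)
    V = lam * x_eq * (1.0 - x_eq) + mu * x_eq          # G(x_eq)
    sigma2 = V / (-2.0 * B)                            # stationary variance of Z
    return x_eq, B, V, sigma2


def working_mean(x0, s, lam, mu):
    """Working approximation E(X_N(s)) ~ x_eq + exp(B s) (x0 - x_eq)."""
    x_eq, B, _, _ = ou_parameters(lam, mu)
    return x_eq + math.exp(B * s) * (x0 - x_eq)


if __name__ == "__main__":
    print(ou_parameters(0.8, 0.4))
    print(working_mean(0.3, 5.0, 0.8, 0.4))
```

Under these assumptions $B = -(\lambda - \mu) = -\alpha$ and $\sigma^2 = \rho$, so for $(\lambda, \mu) = (0.8, 0.4)$ the stationary approximation is $X_N(s) \sim \mathrm{Normal}(0.5,\ 0.5/N)$.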

Appendix B. The OU (log-)likelihood

In this section we summarise results from Rybicki (1994) that allow us to invert explicitly, and calculate the determinant of, the time-dependent covariance matrix $C$ of the OU process. This permits us to write the (log-)likelihood explicitly, and consequently note that only $O(n)$ operations are required to evaluate it.

Write the likelihood function (7) as

$$f(x) = \frac{1}{\sqrt{(2\pi)^n |C|}} \exp\left( -\frac{1}{2}\, y\, C^{-1} y' \right),$$

with $y = x - m$. It turns out (Rybicki and Press, 1995) that

$$y\, C^{-1} y' = \frac{N}{\sigma^2} \sum_{i=1}^{n} \frac{(y_i - r_{i-1} y_{i-1})^2}{1 - r_{i-1}^2},$$

where $\sigma^2 = V/(-2B)$ and

$$\det(C) = \left( \frac{\sigma^2}{N} \right)^{n} \prod_{i=1}^{n-1} (1 - r_i^2),$$

where

$$r_k = \begin{cases} 0 & \text{if } k = 0, \\ \exp[B(t_{k+1} - t_k)] & \text{if } 1 \leq k \leq n-1. \end{cases}$$

This allows us to write

$$f(x) = \left( \frac{2\pi\sigma^2}{N} \right)^{-n/2} \left( \prod_{i=1}^{n-1} (1 - r_i^2) \right)^{-1/2} \exp\left[ -\frac{N}{2\sigma^2} \sum_{i=1}^{n} \frac{(y_i - r_{i-1} y_{i-1})^2}{1 - r_{i-1}^2} \right], \tag{10}$$

and

$$l(x) = -\frac{n}{2} \log\left( \frac{2\pi\sigma^2}{N} \right) - \frac{1}{2} \sum_{i=1}^{n-1} \log(1 - r_i^2) - \frac{N}{2\sigma^2} \sum_{i=1}^{n} \frac{(y_i - r_{i-1} y_{i-1})^2}{1 - r_{i-1}^2}. \tag{11}$$

We emphasise that the above formulæ hold in general, and should be used to evaluate the (log-)likelihood (instead of inverting $C$ and calculating its determinant numerically).
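A minimal sketch of evaluating (11) in a single pass over the observations is given below. The function name and argument layout are hypothetical; `y` holds the centred observations $y_i$, `times` the observation times $t_i$, and $B < 0$ is assumed so that each $r_k$ lies strictly between 0 and 1.

```python
import math


def ou_loglik(y, times, B, sigma2, N):
    """O(n) evaluation of the OU log-likelihood (11).

    y      : centred observations y_1, ..., y_n
    times  : increasing observation times t_1, ..., t_n
    B      : local drift (negative), sigma2 = V / (-2B), N : scaling parameter
    """
    n = len(y)
    s2 = sigma2 / N                      # stationary variance of the density process
    ll = -0.5 * n * math.log(2.0 * math.pi * s2)
    r = 0.0                              # r_0 = 0
    prev = 0.0                           # y_0 is multiplied by r_0 = 0, so its value is irrelevant
    for i in range(n):
        if i > 0:
            r = math.exp(B * (times[i] - times[i - 1]))
            ll -= 0.5 * math.log(1.0 - r * r)
            prev = y[i - 1]
        resid = y[i] - r * prev
        ll -= resid * resid / (2.0 * s2 * (1.0 - r * r))
    return ll


if __name__ == "__main__":
    print(ou_loglik([0.02, -0.01, 0.03], [0.0, 1.0, 3.0], -0.4, 0.5, 2000))
```

Unequal spacing needs no special treatment, since each gap $t_{k+1} - t_k$ enters only through its own $r_k$.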

Appendix C. The CE method

We describe here the algorithm used to maximise the (log-)likelihood (for further details and applications of the CE method, see Rubinstein and Kroese, 2004). The algorithm is used with our simpler approach based on the OU approximation. However, a slight change in parameters, coupled with the (numerical) computation of the true likelihood (say using EXPOKIT) instead of the likelihood in Step 3, will result in an algorithm for maximising the likelihood in the general approach outlined in Section 3.

Algorithm.

1. Set $N_s = 50{,}000$, $N_e = 100$, $\epsilon = 10^{-7}$, $a = 1000$, maxits $= 5000$, $t = 0$.

2. Draw $N_s$ pairs $(\lambda_i, \mu_i)$ uniformly from the triangle (0,0)–$(a, 0)$–$(a, a)$ and transform them into $(\alpha, \rho)$ pairs. If $N$ is unknown, draw $N_s$ samples $N_i$ uniformly from $[\max(x), 10{,}000]$.

3. Set $t = t + 1$. If $N$ is known, calculate $l_i = l(x; \lambda_i, \mu_i)$ for each $i$, and if $N$ is unknown, calculate $l_i = f(x; \lambda_i, \mu_i, N_i)$ for each $i$.

4. Locate a set $E$ of $N_e$ indices $i$ for which $l_k \geq l_j$ for all $k \in E$, and $j \notin E$.

5. Calculate $\mu_\rho$, $\sigma_\rho$, $\mu_\alpha$, $\sigma_\alpha$, ($\mu_N$, $\sigma_N$) as the means and standard deviations of $\rho_k$, $\alpha_k$ (and $N_k$), where $k \in E$.

6. Draw $N_s$ pairs (or triples) of unknown parameters from independent normal distributions with means and standard deviations calculated in the previous step.

7. If the largest $\sigma$ is greater than $\epsilon$ and $t <$ maxits then return to Step 3; otherwise output $\hat{\lambda} = \alpha_k/(1 - \rho_k)$, $\hat{\mu} = \alpha_k \rho_k/(1 - \rho_k)$ (and $\hat{N} = N_k$), where $k$ is an index such that $l_k \geq l_j$ for all $j$.
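The steps above can be sketched as a small generic maximiser. This is an illustration under stated assumptions rather than the authors' implementation: the function names, the toy objective and the much smaller default sample sizes are hypothetical, the population standard deviation is used in Step 5, the best point of the last scored population is returned, and the quadratic penalty for inadmissible parameters is omitted.

```python
import random
import statistics


def to_lambda_mu(alpha, rho):
    """Back-transform of Step 7: (alpha, rho) -> (lambda, mu)."""
    return alpha / (1.0 - rho), alpha * rho / (1.0 - rho)


def cross_entropy_maximise(score, init_sampler, n_samples=500, n_elite=50,
                           eps=1e-6, max_its=200, seed=0):
    """Maximise score(point) with independent normal sampling distributions."""
    rng = random.Random(seed)
    samples = [init_sampler(rng) for _ in range(n_samples)]      # Step 2
    dim = len(samples[0])
    for _ in range(max_its):
        ranked = sorted(samples, key=score, reverse=True)        # Step 3
        elite = ranked[:n_elite]                                 # Step 4
        cols = [[e[d] for e in elite] for d in range(dim)]
        means = [sum(c) / len(c) for c in cols]                  # Step 5
        stds = [statistics.pstdev(c) for c in cols]
        if max(stds) < eps:                                      # Step 7
            break
        samples = [tuple(rng.gauss(m, s) for m, s in zip(means, stds))
                   for _ in range(n_samples)]                    # Step 6
    return ranked[0]


if __name__ == "__main__":
    # Toy objective with maximum at (3, -1); stands in for the log-likelihood.
    def toy(p):
        return -(p[0] - 3.0) ** 2 - (p[1] + 1.0) ** 2

    def init(rng):
        return (rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0))

    print(cross_entropy_maximise(toy, init))
```

To use this for the estimation problem, `score` would evaluate the OU log-likelihood of Appendix B at a sampled $(\alpha, \rho)$ pair (or triple including $N$), and `to_lambda_mu` would be applied to the returned point.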


References

Anderson, W., 1991. Continuous-time Markov Chains: An Applications-

Oriented Approach. Springer, New York.

Bailey, N.T.J., 1957. The Mathematical Theory of Epidemics. Griffin,

London.

Barbour, A.D., 1974. On a functional central limit theorem for Markov

population processes. Adv. Appl. Probab. 6, 21–39.

Barbour, A.D., 1976. Quasi-stationary distributions in Markov popula-

tion processes. Adv. Appl. Probab. 8, 296–314.

Barbour, A.D., 1980a. Equilibrium distributions Markov population

processes. Adv. Appl. Probab. 12, 591–614.

Barbour, A.D., 1980b. Density-dependent Markov population processes.

In: Jager, W., Rost, H., Tautu, P. (Eds.), Biological Growth and

Spread, Lecture Notes in Biomathematics, vol. 38. Springer, Berlin,

pp. 36–49.

Barbour, A.D., Pugliese, A., 2004. Asymptotic behaviour of a metapo-

pulation model. J. Math. Biol. 49, 468–500.

Barbour, A.D., Pugliese, A., 2005. Asymptotic behavior of a metapopula-

tion model. Ann. Appl. Probab. 15, 1306–1338.

Bartholomew, D.J., 1976. Continuous time diffusion models with random

duration of interest. J. Math. Sociol. 4, 187–199.

Bartlett, M.S., 1949. Some evolutionary stochastic processes. J. Roy.

Statist. Soc. Ser. B 11, 211–229.

Bartlett, M.S., 1956. Deterministic and stochastic models for recurrent

epidemics. Proceedings of the Third Berkeley Symposium on Mathe-

matics Statistical Probability, vol. 4, pp. 81–109.

Clancy, D., 2005. A stochastic SIS infection model incorporating indirect

transmission. J. Appl. Probab. 42, 726–737.

Clancy, D., O’Neill, P.D., Pollett, P.K., 2001. Approximations for the

long-term behaviour of an open-population epidemic model. Metho-

dol. Comput. Appl. Probab. 3, 75–95.

Daley, D.J., Kendall, D.G., 1965. Stochastic rumours. J. Inst. Math. Appl.

1, 42–55.

Diekmann, O., Heesterbeek, J.A.P., Metz, J.A.J., 1990. On the definition

and the computation of the basic reproduction ratio R0 in models for

infectious diseases in heterogenous populations. J. Math. Biol. 28,

365–382.

Fan, H., Soderstrom, T., Mossberg, M., Carlsson, B., Zou, Y., 1999.

Estimation of continuous-time AR process parameters from discrete-

time data. IEE Trans. Signal Process. 47, 1232–1244.

Feigin, P.D., 1976. Maximum likelihood estimation for continuous-time

stochastic processes. Adv. Appl. Probab. 8, 712–736.

Feller, W., 1939. Die grundlagen der volterraschen theorie des kampfes

ums dasein in wahrscheinlichkeitsteoretischer behandlung. Acta

Biotheoret. 5, 11–40.

Foley, P., 1994. Predicting extinction times from environmental stochas-

ticity and carrying capacity. Cons. Biol. 8, 124–137.

Franco, J.C.G., 2003. Maximum likelihood estimation of mean reverting

processes: hwww.investmentscience.com/Content/howtoArticles/

MLE_for_OR_mean_reverting.pdfi (downloaded on 16th May 2006).

Gibbens, R.J., Hunt, P.J., Kelly, F.P., 1990. Bistability in communications networks. In: Grimmett, G.R., Welsh, D.J.A. (Eds.), Disorder in Physical Systems. Oxford University Press, Oxford, pp. 113–127.

Gillespie, D.T., 2000. The chemical Langevin equation. J. Chem. Phys. 113, 297–306.

Harrison, S., Quinn, J.F., Baughman, J.F., Murphy, D.D., Ehrlich, P.R., 1991. Estimating the effects of scientific study on two butterfly populations. Amer. Nat. 137, 227–243.

Kingman, J.F.C., 1969. Markov population processes. J. Appl. Probab. 6, 1–18.

Kurtz, T., 1970. Solutions of ordinary differential equations as limits of pure jump Markov processes. J. Appl. Probab. 7, 49–58.

Kurtz, T., 1971. Limit theorems for sequences of jump Markov processes approximating ordinary differential processes. J. Appl. Probab. 8, 344–356.

Kurtz, T.G., 1973. The relationship between stochastic and deterministic models in chemical reactions. J. Chem. Phys. 57, 2976–2978.

Levins, R., 1969. Some demographic and genetic consequences of environmental heterogeneity for biological control. Bull. Entom. Soc. Amer. 15, 237–240.

Mangel, M., Tier, C., 1994. Four facts every conservation biologist should know about persistence. Ecology 75, 607–614.

McLaughlin, J.F., Hellmann, J.J., Boggs, C.L., Ehrlich, P.R., 2002. Climate change hastens population extinctions. Proc. Nat. Acad. Sci. 99, 6070–6074.

McNeil, D.R., Weiss, G.H., 1977. A large population approach to estimation of parameters in Markov population models. Biometrika 64, 553–558.

Mode, C.J., Sleeman, C.K., 2000. Stochastic Processes in Epidemiology, HIV/AIDS, Other Infectious Diseases and Computers. World Scientific, Singapore.

Nasell, I., 2002. Stochastic models of some endemic infections. Math. Biosci. 179, 1–19.

Norris, J., 1997. Markov Chains. Cambridge University Press, Cambridge.

Oppenheim, I., Shuler, K.E., Weiss, G.H., 1977. Stochastic theory of nonlinear rate processes with multiple stationary states. Physica 88A, 191–214.

Pollett, P.K., 1990. On a model for interference between searching insect parasites. J. Austral. Math. Soc. Ser. B 32, 133–150.

Pollett, P.K., 1992. Diffusion approximations for a circuit switching network with random alternative routing. Austral. Telecom. Res. 25, 45–52.

Pollett, P.K., 2001. Diffusion approximations for ecological models. In: Ghassemi, F. (Ed.), Proceedings of the International Congress on Modelling and Simulation, vol. 2, Modelling and Simulation Society of Australia and New Zealand, Canberra, Australia, pp. 843–848.

Pollett, P.K., Vassallo, A., 1992. Diffusion approximations for some simple chemical reaction schemes. Adv. Appl. Probab. 24, 875–893.

Reuter, G.E.H., 1961. Competition processes. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol. 2, pp. 421–430.

Ross, J.V., 2006a. A stochastic metapopulation model accounting for habitat dynamics. J. Math. Biol. 52, 788–806.

Ross, J.V., 2006b. Stochastic models for mainland-island metapopulations in static and dynamic landscapes. Bull. Math. Biol. 68, 417–449.

Rubinstein, R.Y., Kroese, D.P., 2004. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning. Springer, New York.

Rybicki, G.B., 1994. Unpublished notes: notes on Gaussian random functions with exponential correlation functions (Ornstein–Uhlenbeck process). From <http://www.lanl.gov/DLDSTP/fast/>.

Rybicki, G.B., Press, W.H., 1995. A class of fast methods for processing irregularly sampled or otherwise inhomogeneous one-dimensional data. Phys. Rev. Lett. 74, 1060–1063.

Sidje, R.B., 1998. EXPOKIT. A software package for computing matrix exponentials. ACM Trans. Math. Software 24, 130–156.

Verhulst, P.F., 1838. Notice sur la loi que la population suit dans son accroissement. Corr. Math. Phys. X, 113–121.

Weiss, G.H., Dishon, M., 1971. On the asymptotic behaviour of the stochastic and deterministic models of an epidemic. Math. Biosci. 11, 261–265.