
An improved regulatory sampling method for mapping and representing plant disease from a limited number of samples


Epidemics 4 (2012) 68–77


Journal homepage: www.elsevier.com/locate/epidemics


W. Luo a,c,∗, S. Pietravalle a, S. Parnell b, F. van den Bosch b, T.R. Gottwald c, M.S. Irey d, S.R. Parker e

a The Food and Environment Research Agency, Sand Hutton, York YO41 1LZ, UK
b Centre for Mathematical and Computational Biology, Rothamsted Research, Harpenden, Herts. AL5 2JQ, UK
c United States Department of Agriculture, Agricultural Research Service, U.S. Horticultural Research Laboratory, 2001 South Rock Road, Ft. Pierce, FL 34945, USA
d US Sugar Corporation, Clewiston, FL 33440, USA
e Syngenta Crop Protection UK, CPC4, Capital Park, Fulbourn, Cambridge CB21 5XE, UK

Article info

Article history:
Received 9 November 2011
Received in revised form 2 February 2012
Accepted 23 February 2012
Available online 3 March 2012

Keywords:
Dispersal gradients
Disease spread
Huanglongbing disease
Indicator kriging
Strawberry powdery mildew
Uncertainty analysis

Abstract

A key challenge for plant pathologists is to develop efficient methods to describe spatial patterns of disease spread accurately from a limited number of samples. Knowledge of disease spread is essential for informing and justifying plant disease management measures. A mechanistic modelling approach is adopted for disease mapping which is based on disease dispersal gradients and consideration of host pattern. The method is extended to provide measures of uncertainty for the estimates of disease at each host location. In addition, improvements have been made to increase computational efficiency by better initialising the disease status of unsampled hosts and speeding up the optimisation process of the model parameters. These improvements facilitate the practical use of the method by providing information on: (a) mechanisms of pathogen dispersal, (b) distance and pattern of disease spread, and (c) prediction of infection probabilities for unsampled hosts. Two data sets of disease observations, Huanglongbing (HLB) of citrus and strawberry powdery mildew, were used to evaluate the performance of the new method for disease mapping. The result showed that our method gave better estimates of precision for unsampled hosts, compared to both the original method and spatial interpolation. This enables decision makers to understand the spatial aspects of disease processes, and thus formulate regulatory actions accordingly to enhance disease control.

∗ Corresponding author. Tel.: +44 0 1904 462094; fax: +44 0 1904 462111. E-mail address: [email protected] (W. Luo).
1755-4365/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.epidem.2012.02.001

Introduction

The invasion of an area by non-indigenous plant pathogens can have devastating consequences on local plant populations, in both agricultural and natural landscapes. Knowledge of the spatial patterns of disease spread can provide useful information on the growth and spread of pathogen populations which is necessary for the design of effective management plans. In the past, intensive sampling techniques were often used to monitor the mean incidence and the pattern of disease spread (Kish, 1995). However, this approach is costly, laborious and time consuming, especially for large scale surveys. A number of researchers (Ferrandino, 2004; Harrison, 1981; Xu and Ridout, 1998) have argued that due to temporal and monetary constraints, a key challenge for plant pathologists is to develop efficient methods to describe spatial patterns of disease spread accurately from a limited number of samples. To meet the challenge, various statistical methods that quantitatively define the spatial pattern of disease have been developed.

Geostatistics have been proposed in plant pathology to analyse the spatial pattern of epidemics (Chellemi et al., 1988; Stein et al., 1994; Wu et al., 2001). In geostatistics, the spatial correlation structure is usually examined and described by a semivariogram, which quantifies spatial dependence of the variable of interest, by describing its spatial variation as a function of distance. Kriging is one of the common geostatistics techniques, which makes use of the structural properties of the semivariogram to interpolate the variable of interest (Krige, 1966). It is useful in both creating surface maps to reveal broad disease patterns from sparse observations and making predictions at unsampled locations from nearby sampled points. Furthermore, it provides variance of error to measure the uncertainty of the prediction, which reflects the spatial correlation and the sample location pattern (Myers, 1991). Although geostatistics has several advantages in characterising the disease pattern, it does not explicitly account for the epidemiological mechanisms that determine disease spread. The build-up of an epidemic depends on pathogen dispersal, and understanding the spatial characteristics of dispersal is essential for reliable assessment of the risks of disease spread (Gregory, 1968). Many pathogens travel only a short distance from the infection source and so are greatly influenced by the distances between discrete patches of host (e.g. fields or plants). Hence, the rate of disease spread can be influenced by the host pattern. For example, an aggregated pattern of hosts can cause the rate of disease spread to differ from that observed in a uniform host pattern (Caraco et al., 2001; Gosme and Lucas, 2009), because potentially rare long-range dispersal events are then required. Given precise information on an initial infection source and dispersal gradients, analytic and simulation-based approaches can be applied to model disease spread (Ferrandino, 1993; Frantzen and van den Bosch, 2000; Sackett and Mundt, 2005). However, there are practical difficulties in knowing the locations of infectious hosts and acquiring empirical dispersal gradients accurately (from physical principles or field experiments) due to the stochastic nature of dispersal processes (Skelsey et al., 2009).

Integrating host and disease characteristics is of great importance in the construction of disease risk models (Magarey and Sutton, 2007). It is evident that a more mechanistic approach is required to understand the spread of disease through pathogen dispersal. Here, we adopt an epidemiological approach (Parnell et al., 2009, 2011), which incorporates host pattern and pathogen dispersal processes explicitly to generate maps of disease distribution. Our study aims to make key improvements to this method in order to enable the use of an epidemiologically informed (mechanistic) approach in the field. Specifically we achieve this by:

1. Improving the computational efficiency of the original epidemiological approach.

2. Providing explicit measurements of uncertainty in disease predictions, a key consideration in decision making. Here, uncertainty is defined as the observed variability in the model predictions. This variation is the result of the stochastic process in the model iterations.

In addition, we evaluate the performance of the new method against geostatistical algorithms, by comparing prediction accuracy and disease pattern recovery using two different sets of disease data.

Material and methods

First, we provide a systematic overview of the original method, considering the estimation of disease pattern and examining potential problems in modelling the epidemic parameters. We then propose the new method for disease mapping with the incorporation of uncertainty analysis. Finally, we describe the actual disease data used for performance assessment.

Review of original method

The method originally described by Parnell et al. (2009) provides an alternative to methods such as geostatistics for determining the spatial pattern of disease at unsampled locations. To do this, the method focuses on the analysis of disease dispersal gradients and the estimation of infection probability across the host population. The disease dispersal gradient is quantified numerically with explicit account of the spatial structure of the studied host population. The original method (highlighted by the solid box in Fig. 1) can be broken down into three general steps:

Fig. 1. Schematic representation of the improved method for disease mapping.

• Sample inputs: The spatial coordinates of the host population (N) are collected and a proportion of the hosts is randomly selected for sampling to avoid biased measurement. The infection status of each sampled host is assessed as either diseased (1) or healthy (0). Each unsampled host is then arbitrarily assigned as either diseased or healthy. Under most circumstances, all the unsampled hosts are set as healthy for simplicity.

• Determining “best” dispersal function: An exponential model is used to construct the dispersal gradient. To proceed, an unsampled host is randomly selected and the level of disease at host i is calculated as:

$$y_i = a \sum_{j \neq i} p_j \exp(-b d_{ij}). \tag{1}$$

Assuming that the absolute rate of infection progress is proportional to the healthy tissue and follows a negative exponential shape, the infection probability $P_i$ is back-transformed by

$$P_i = 1 - \exp(-y_i).$$

Here $d_{ij}$ is the distance from host location i to host location j, and $p_j$ is the infection probability of host j. The infection probability of the sampled hosts is equivalent to the measured infection status, 1 for diseased and 0 for healthy. Positive parameters a and b are used to describe the shape of the exponential curve, with a representing the magnitude of the source and b measuring the steepness of the gradient.

The validity of updating $P_i$ is evaluated by calculating the map accuracy for all sampled hosts. The evaluation process is implemented in a similar manner to the “jack-knife” method or cross-validation (Efron and Gong, 1983). This is achieved by taking each sampled host sequentially and estimating its infection probability from all the remaining hosts, including both sampled and unsampled. The difference between the observed and estimated infection probability is used to form the test statistic for map accuracy. Although other test statistics could be used (e.g. the deviance, Parnell et al., 2009), the sum of absolute error (SAE) was used for binary observations in this study. The changed $P_i$ is accepted or rejected depending on whether the test statistic, SAE, is decreased or not.

The process of selecting unsampled hosts to update is repeated until reaching a steady state where no more significant improvement is found. The method is iterated for a reasonable range of dispersal parameters (a, b) with fixed step size, and the optimum dispersal function is indicated by the result with the minimum test statistic value.

• Disease mapping: The model is run once with the determined dispersal parameters to estimate the infection probability for all unsampled hosts.
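As a rough illustration of the steps above, the following Python sketch evaluates Eq. (1), the back-transformation to an infection probability, and the jack-knife style SAE over sampled hosts. It is a minimal sketch under stated assumptions rather than the authors' implementation: the function names, the NumPy array layout (`coords` as an N × 2 array, `p` as the current vector of infection probabilities) and the toy two-host data are all hypothetical.

```python
import numpy as np


def infection_probability(i, coords, p, a, b):
    """Estimate P_i for host i from all other hosts (Eq. (1) plus back-transformation)."""
    d = np.linalg.norm(coords - coords[i], axis=1)  # distances d_ij to every host
    others = np.arange(len(p)) != i                 # the sum runs over j != i
    y_i = a * np.sum(p[others] * np.exp(-b * d[others]))
    return 1.0 - np.exp(-y_i)


def sum_absolute_error(coords, p, sampled_idx, observed, a, b):
    """SAE: each sampled host is re-estimated in turn from the remaining hosts."""
    return sum(
        abs(obs - infection_probability(i, coords, p, a, b))
        for i, obs in zip(sampled_idx, observed)
    )


# Hypothetical toy data: host 0 sampled diseased, host 1 sampled healthy, 1 unit apart.
coords = np.array([[0.0, 0.0], [1.0, 0.0]])
p = np.array([1.0, 0.0])
print(infection_probability(1, coords, p, a=1.0, b=0.5))
print(sum_absolute_error(coords, p, [0, 1], [1, 0], a=1.0, b=0.5))
```

In the full procedure an unsampled host would be picked at random, its entry in `p` replaced by the value from `infection_probability`, and the change kept only if `sum_absolute_error` decreases.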

Optimisation improvement and development of uncertainty analysis

Although the usefulness of the original method has been tested in the previous study (Parnell et al., 2011), opportunities exist to improve its efficiency, incorporate stochasticity in the parameter optimisation and uncertainty in the predicted maps. Improvements of the method can be expressed in the following three phases (highlighted by dashed box in Fig. 1).

Initialisation

The initial infection probability assumed for each of the unsampled hosts influences the resulting map accuracy and processing time. The original method adopts a simple approach, setting a fixed infection probability (0 as default value) for all unsampled hosts, which may be computationally inefficient.

In this study, spatial interpolation methods are used to initialise the infection probabilities for the unsampled hosts more closely to the converged steady state of the system (i.e. the true spatial pattern of disease). On average, values at points close together in space are more likely to be similar than points further apart. For simple data manipulation, two commonly used deterministic interpolation methods are applied for initialisation: nearest neighbour (NN), also known as Thiessen polygons (Thiessen, 1911), and inverse distance weighting (IDW) (Shepard, 1968). The weighting factor in IDW is inversely proportional to any power of the distance, and the choice of weighting function can significantly affect the interpolation results (Isaaks and Srivastava, 1989). Three common power parameters 0.5, 1 and 2 are tested in this paper for the IDW method. In order to minimise unnecessary computational burden, the initialisation methods were compared using a subset of 20 samples selected randomly from the dispersal parameters space. For each of those, the method was run until its steady state and the test statistic (SAE) was calculated. The initialisation method with the lowest average SAE was then selected for the parameter optimisation stage.

Parameter optimisation

In the original method, prediction of the disease status of an unmeasured host considers the status of all the remaining hosts based on the dispersal function. The dispersal function assumes that each host has a local influence that diminishes exponentially with distance. Consequently hosts that are widely separated from the prediction location have negligible effect on estimating the infection probability. To avoid unnecessary computational burden, formula (1) can be rewritten with the extra condition:

$$y_i = a \sum_{j \neq i} p_j \exp(-b d_{ij}) \quad \text{where } d_{ij} \le \log(a/\varepsilon_C)/b. \tag{2}$$

Here $\varepsilon_C$ is the minimum limit set for the distant host, and $\varepsilon_C$ is fixed as small as 0.001. Hence, distant hosts with contribution less than $\varepsilon_C$ are excluded from the calculation. In addition, if prior knowledge of the maximum pathogen transport distance can be obtained, this could be utilised to inform the initial upper threshold for $d_{ij}$ in formula (2). The upper threshold for $d_{ij}$ is also used as the search radius of IDW for initialisation. An infection probability is initialised as 0 if no neighbour is found within the search radius.

According to the second step of the original method (Fig. 1), a stochastic process is involved in selecting the unsampled host to update. However, this stochastic process itself greatly influences the predicted infection probabilities of unsampled hosts. The potential problem is illustrated by the simple simulated disease data (Fig. 2). Fig. 2 shows a series of labelled hosts at regular intervals along a one-dimensional field where three of the hosts are sampled. Using hypothetical dispersal parameters (a = 0.5, b = 0.2), the original method was repeatedly applied 1000 times to predict the infection probability for each of the unsampled hosts. The SAE at steady state and the corresponding estimated map are recorded for each repetition. The varied performance of all the repetitions, which range from 1.30 to 1.42 in terms of SAE, is shown by the frequencies in Table 1. About 1% of the repetitions produced results with SAE larger than 1.35 while the others gave results with great similarity. To evaluate the variability of these repetitions, the best and the worst estimated maps are presented and compared (Fig. 2). The simulated data has a clearly defined symmetry pattern and the best estimated map preserved this property for the estimates at the unsampled hosts. However, the structure for the worst estimated map is somewhat irregular and difficult to define. It shows a striking difference of estimated infection probability for hosts 3 and 5. One possible explanation is that the algorithm was trapped and stopped in a local steady state.

Fig. 2. Simulated disease data (open circle: unsampled host; solid dark circle: sampled diseased; solid grey circle: sampled healthy) for demonstration of the potential problem for the original method. The estimated infection probability is shown inside the square brackets.

Table 1
The frequency of resulting SAE for the 1000 repetitions.

| SAE | 1.30 | 1.31 | 1.32 | 1.33 | 1.34 | 1.35 | 1.40 | 1.41 | 1.42 |
|---|---|---|---|---|---|---|---|---|---|
| Frequency | 919 | 38 | 22 | 5 | 4 | 3 | 2 | 5 | 2 |

Different permutation/ordering of the unsampled hosts selected to update can lead to a high degree of uncertainty for the output. Hence, the performance of each dispersal parameter combination is not evaluated sufficiently by a single run of the algorithm. In this study, the algorithm is repeated a reasonable number of times, depending on sample size, to investigate the variation and uncertainty associated with the stochastic selection process of the method. For each dispersal parameter combination, 5% of repetitions with the largest test statistic are treated as outliers and removed from the analysis, with the assumption that they are more likely to have originated from a local steady state optimum. The averaged test statistic from the retained repetitions is used as a measure of the general performance of the selected dispersal parameter combination.

To find the global best solution of dispersal parameters, the prior method used a simple but exhaustive grid search strategy to enumerate all possible solutions for the dispersal parameter space. However, this search scheme is computationally inefficient. Instead, we propose a multistage searching approach which is more flexible and less demanding on computation. We apply a global preliminary search with a large grid size to provide crude estimates, and then grid search on a finer mesh centred at the best estimate. This refined searching process continues until no obvious improvement is found on the current estimates (or the searching step size is reduced to a specified tolerance). The greatest precision possible for the dispersal parameter can be obtained through this searching technique.

Uncertainty analysis

Even with optimised dispersal parameters, the disease mapping results have uncertainty, which originates, in part, from the stochastic process used to select the unsampled host to update. Uncertainty analysis is accomplished in a simple way by repeating the disease mapping process with optimal dispersal parameters a large number of times. For consistency, the 5% of repetitions with the largest test statistic are removed from the analysis. For the remaining repetitions, the mean and variance of the estimated infection probability for each unsampled host are calculated. Further adaptive management options (e.g. advice on further sampling design) can be better formulated with consideration of uncertainty for disease mapping. It is worthwhile to point out that there exist other sources of uncertainty which can have some influence on the mapping results. These include, amongst others, the sampling design, the measurement accuracy, or the structure of the simulation model in relation to the epidemic.
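The trimming and summarising step described for the uncertainty analysis could be sketched as follows. This is an illustrative sketch only: the function name, the array layout (one row of estimated probabilities per repetition) and the use of the population variance are assumptions, and the repetitions themselves are taken as given rather than generated by the mapping algorithm.

```python
import numpy as np


def summarise_repetitions(prob_maps, sae, trim_fraction=0.05):
    """Drop the repetitions with the largest SAE, then summarise the rest per host.

    prob_maps: array of shape (n_repetitions, n_unsampled_hosts)
    sae:       array of shape (n_repetitions,), test statistic of each repetition
    """
    prob_maps = np.asarray(prob_maps, dtype=float)
    sae = np.asarray(sae, dtype=float)
    n_drop = int(np.floor(trim_fraction * len(sae)))   # e.g. 5% treated as outliers
    keep = np.argsort(sae)[: len(sae) - n_drop]        # indices of retained repetitions
    kept = prob_maps[keep]
    # Per-host mean and variance, plus the averaged test statistic of retained runs.
    return kept.mean(axis=0), kept.var(axis=0), sae[keep].mean()


# Hypothetical example: 19 identical repetitions and one poor run with a large SAE.
maps = np.vstack([np.tile([0.2, 0.8], (19, 1)), [[1.0, 0.0]]])
stats = np.array([1.30] * 19 + [1.42])
mean_p, var_p, mean_sae = summarise_repetitions(maps, stats)
print(mean_p, var_p, mean_sae)
```

The per-host variance (or its square root) is the quantity that would be mapped alongside the mean infection probability; the averaged SAE of the retained runs is the score used to compare dispersal parameter combinations.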

Data

The performance of our method was tested against two actual disease data sets: citrus huanglongbing from a commercial planting in Florida, USA and strawberry powdery mildew from a field experiment in the UK. Huanglongbing (HLB), presumptively caused by Candidatus Liberibacter asiaticus, is spread by a psyllid vector, and is the most serious and destructive disease of citrus in over 40 different Asian, African, South and North American countries (Bové, 2006; Gottwald, 2010). HLB is a major new quarantine threat to Florida and affects all commercial citrus species (Halbert and Manjunath, 2004). The data were collected by US Sugar Corporation as part of an initiative to monitor the temporal and spatial dynamics of HLB. Fig. 3a shows the distribution of HLB disease within the chosen block of citrus in south Florida (1235 hosts; the maximum distance between the hosts is approximately 550 m). A full assessment of disease status was made for the entire host population and coordinates of each host were also recorded.

Powdery mildew, caused by the fungal pathogen Podosphaera aphanis, is one of the most economically important and destructive diseases affecting cultivated strawberries worldwide (Jordan and Hunter, 1972). Fig. 4a shows a snapshot of powdery mildew in a strawberry field experiment in the UK. For the entire field, two visual assessments were made weekly for the appearance of the sporulating pathogen throughout the growing season. In order to describe the initial disease spread within the field, the assessment conducted 4 weeks after disease appearance was used for the current study.

Fig. 3. (a) The completed survey of HLB disease in a chosen block, Florida; (b) the sample input for the new method by selecting 20% of the observations randomly (markers distinguish sampled diseased hosts from sampled healthy hosts).

For both disease data sets, the sample inputs (Figs. 3b and 4b) were generated by selecting 20% (typical sampling efforts) of the observations randomly, and the validation data were constructed from the remaining observations. To evaluate the relative abilities of the two methods to accurately map disease, the mean error (ME), mean of absolute error (MAE) and kappa statistic (Cohen, 1960) were calculated on the validation data, and the prediction error plots were examined. The ME was used to detect bias: ME is equal to zero if the predictions are completely unbiased, i.e. centred on the measurement values. The MAE was used to compare methods by examining how closely predictions correspond to the measured values: the smaller the MAE, the better the match. The kappa statistic is used to measure the magnitude of agreement between classification results. Kappa varies between 0 and 1, with 1 indicating perfect agreement and 0 indicating agreement equivalent to chance. For the calculation of the kappa statistic, the estimated infection probabilities are classified as healthy or infected status (0 or 1) using the typical threshold value of 0.5. Our evaluation examines both the predictive performance of the new method, and the ability to estimate the true disease pattern. We therefore compared outputs from our method with those obtained by a commonly used geostatistics method. Interpolation maps of the probability of disease occurrence at the unsampled sites were produced by indicator kriging (IK), which was developed to handle distribution of binary data (Journel, 1983). In order to identify the possible spatial correlation of the sample data, three common models (Spherical, Gaussian and Exponential) were implemented to fit the semivariogram, and the best fit model was determined with the lowest Akaike information criterion (Akaike, 1974). In addition, to test the stability of the original method, it was run 100 times, with the best and worst results recorded for comparison.
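The three validation measures can be written down compactly. The sketch below is illustrative rather than the authors' code: the function name is hypothetical, the error is taken as predicted minus observed (an assumed sign convention, under which a positive ME means over-estimation), and a probability exactly equal to the threshold is classified as infected (also an assumption).

```python
import numpy as np


def evaluate_map(predicted_prob, observed, threshold=0.5):
    """Return (ME, MAE, kappa) for predicted infection probabilities on validation hosts."""
    p = np.asarray(predicted_prob, dtype=float)
    o = np.asarray(observed, dtype=int)
    err = p - o                              # assumed sign: predicted minus observed
    me = err.mean()                          # bias
    mae = np.abs(err).mean()                 # closeness of predictions to observations
    cls = (p >= threshold).astype(int)       # classify as infected (1) or healthy (0)
    p_obs = np.mean(cls == o)                # observed agreement
    p_exp = (np.mean(cls == 1) * np.mean(o == 1)
             + np.mean(cls == 0) * np.mean(o == 0))  # agreement expected by chance
    kappa = (p_obs - p_exp) / (1.0 - p_exp) if p_exp < 1.0 else float("nan")
    return me, mae, kappa


# Hypothetical validation data for four hosts.
print(evaluate_map([0.9, 0.2, 0.6, 0.1], [1, 0, 0, 0]))
```

Cohen's kappa as computed here corrects the raw agreement for the agreement expected from the marginal class frequencies alone, which matters when most hosts are healthy.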

Results

A full analysis is provided for the HLB disease data, where the results are represented in an expository way for the settings of the new method. For brevity, only the key result is summarised for the strawberry dataset.

HLB disease data

Fig. 4. (a) The completed survey of strawberry powdery mildew in a field experiment, UK; (b) the sample input for the new method by selecting 20% of the observations randomly (markers distinguish sampled diseased hosts from sampled healthy hosts).

Based on preliminary analysis of the disease and host pattern, we defined an initial dispersal parameter space (a∈(0,5], b∈(0,0.22]) which covers the majority of reasonable shapes of disease gradient. Then, we selected 20 dispersal parameter samples (based on the size of the spatial grid) to compare the different initialisation methods. The summary statistics of their accuracy and efficiency are shown in Table 2. The NN initialisation had the lowest mean SAE (93.32), indicating that it was the most accurate method. There was an obvious trend related to SAE that decreased with the power of IDW, although none of the IDW initialisations provided satisfactory results (Table 2). Note that unrealistic dispersal parameters will result in the same maximum SAE (equal to the number of healthy sampled hosts, i.e. 170) for all initialisation methods. The NN initialisation method was also the most computationally efficient. Computation time was more than 20% less than the Fixed(0) method. Overall, both assessment measures consistently identified NN as the best method for initialising infection probability for all unsampled hosts.
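For readers unfamiliar with the two deterministic interpolators compared above, a minimal sketch of NN and IDW initialisation is given below. The function name and argument layout are hypothetical, and applying the search radius to NN as well as to IDW is an assumption made here for simplicity; the default radius is unlimited.

```python
import numpy as np


def initialise_unsampled(coords_unsampled, coords_sampled, status_sampled,
                         method="nn", power=2.0, radius=np.inf):
    """Initial infection probabilities for unsampled hosts from sampled hosts."""
    cu = np.asarray(coords_unsampled, dtype=float)
    cs = np.asarray(coords_sampled, dtype=float)
    s = np.asarray(status_sampled, dtype=float)
    # Pairwise distances, shape (n_unsampled, n_sampled).
    d = np.linalg.norm(cu[:, None, :] - cs[None, :, :], axis=2)
    init = np.zeros(len(cu))                 # stays 0 when no neighbour is in range
    for k in range(len(cu)):
        near = d[k] <= radius
        if not near.any():
            continue
        dk, sk = d[k, near], s[near]
        if method == "nn":
            init[k] = sk[np.argmin(dk)]      # status of the nearest sampled host
        elif np.any(dk == 0):
            init[k] = sk[dk == 0].mean()     # coincident location: copy its status
        else:
            w = dk ** (-power)               # weights inversely proportional to d^power
            init[k] = np.sum(w * sk) / np.sum(w)
    return init


# Hypothetical transect: sampled hosts at x = 0 (diseased) and x = 4 (healthy).
sampled = [[0.0, 0.0], [4.0, 0.0]]
unsampled = [[1.0, 0.0], [3.0, 0.0], [100.0, 0.0]]
print(initialise_unsampled(unsampled, sampled, [1, 0], method="nn"))
print(initialise_unsampled(unsampled, sampled, [1, 0], method="idw", power=1.0, radius=5.0))
```

NN returns only 0 or 1, whereas IDW produces intermediate values that depend on the chosen power.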

Following initialisation, the multistage searching approach was used to identify the optimal dispersal parameters. First, a crude grid search was performed at 10 × 10 arrayed check sample points with equal distance interval for each dispersal parameter (Fig. 5a). Next, a refined grid search was made over a narrow search range surrounding the best point found in the crude search grid (Fig. 5b), until no further improvements were made. For each dispersal parameter combination, the averaged test statistic of SAE over 100 repetitions was illustrated by two-dimensional surface plots (Fig. 5). A reasonably smooth concave surface of SAE was produced at the crude grid searching step, and the minimum mean SAE (43.16) was found at the dispersal grid parameters a = 1, b = 0.2. The refined grid search offered a further 4.5% improvement for SAE, and the minimum SAE (41.12) was detected using the dispersal grid parameters a = 1, b = 0.212. The dispersal parameters so determined characterise the dispersal gradient, indicating spread of the HLB pathogen up to 32.6 m from infected hosts.
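The coarse-to-fine search can be sketched generically as below. This is a hedged illustration, not the procedure actually coded by the authors: the function name, the stopping rule on step size, and the choice of a refinement window of one coarse step either side of the current best point are assumptions. In practice `objective(a, b)` would return the averaged SAE over the retained repetitions; a simple quadratic stands in for it here.

```python
import numpy as np


def multistage_grid_search(objective, a_bounds, b_bounds, n_points=10,
                           min_step=1e-3, max_stages=10):
    """Repeated grid search, each stage on a finer mesh centred at the best estimate."""
    a_lo, a_hi = a_bounds
    b_lo, b_hi = b_bounds
    best_a, best_b, best_score = None, None, np.inf
    for _ in range(max_stages):
        a_grid = np.linspace(a_lo, a_hi, n_points)
        b_grid = np.linspace(b_lo, b_hi, n_points)
        scores = np.array([[objective(a, b) for b in b_grid] for a in a_grid])
        ia, ib = np.unravel_index(np.argmin(scores), scores.shape)
        if scores[ia, ib] < best_score:
            best_a, best_b, best_score = a_grid[ia], b_grid[ib], scores[ia, ib]
        a_step = a_grid[1] - a_grid[0]
        b_step = b_grid[1] - b_grid[0]
        if max(a_step, b_step) < min_step:   # step size reduced to the tolerance
            break
        # Next stage: one current step either side of the best point, within bounds.
        a_lo, a_hi = max(a_bounds[0], best_a - a_step), min(a_bounds[1], best_a + a_step)
        b_lo, b_hi = max(b_bounds[0], best_b - b_step), min(b_bounds[1], best_b + b_step)
    return best_a, best_b, best_score


# Stand-in objective with a known minimum at a = 1.3, b = 0.17 (hypothetical values).
a_hat, b_hat, score = multistage_grid_search(
    lambda a, b: (a - 1.3) ** 2 + (b - 0.17) ** 2, (0.5, 5.0), (0.02, 0.22))
print(a_hat, b_hat, score)
```

Each stage evaluates only `n_points` squared combinations, so a handful of stages reaches a resolution that a single exhaustive grid would need many more evaluations to match.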

Table 2
Summary statistics on the performance and efficiency of different initialisation methods over the 20 simulated dispersal parameter samples. Here, “Fixed(0)” refers to initialising an infection probability of 0 for all unsampled hosts.

| Initialisation method | Min SAE | Max SAE | Mean SAE | Total computation time (s) |
|---|---|---|---|---|
| Fixed(0) | 46.81 | 170 | 94.65 | 32 |
| NN | 46.37 | 170 | 93.32 | 25 |
| IDW(0.5) | 50.71 | 170 | 102.07 | 43 |
| IDW(1) | 50.32 | 170 | 101.43 | 39 |
| IDW(2) | 49.52 | 170 | 99.62 | 36 |

It is instructive to examine how well the determined dispersal function predicts the infection probability for the unsampled hosts, given limited sampled data. Using the determined dispersal model as input parameters, the disease mapping process is repeated 1000 times. Fig. 6a and b represents the spatial distribution of mean estimated infection probability and the associated uncertainty with the prediction. HLB disease, which is spatially heterogeneous often with edge effects (Gottwald, 2010), was found not to spread evenly throughout the planting. Disease occurrence at the north and south borders was much greater than in the centre of the field. The map of infection probability captured the general pattern of disease spread, but failed to identify isolated infections. The overall standard deviation of the predictions is low, with high confidence in predictions for the hosts located in the middle of the block. The uncertainty is determined by not only the number of hosts that are sampled within the true dispersal range of the disease, but also the spatial structure of the host distribution. For IK, the semivariogram fitted with the spherical model gave the best measure of the spatial variability among sampled hosts. The suitable range of spatial dependence was found at 35.1 m. This range of spatial dependence is very close to the distance of influence determined by the dispersal function, but there is a striking difference in the resulting infection probability map (Fig. 6c). The IK estimation map also captured the broad disease distribution, but a much wider disease distribution pattern was estimated, particularly in the north area of the block. In addition, isolated infected sample points exert an influence in all directions, which diminishes evenly, leading to characteristic ‘hot spot’ patterns in the estimated infection map. Apart from creating probability maps from categorical samples, IK also provides estimation errors to measure the interpolation uncertainty. Compared with the new method, the standard deviations for IK had a slightly larger range (Fig. 6d). In general, the standard deviations for IK were smallest in the neighbourhood of the more intensely sampled hosts, and largest for the areas with the least samples. They were also large in some regions near the edge, i.e. northwest of the block.

ME, MAE and kappa statistic were used to examine the qualityof all the estimated infection maps including the best and worst

outputs from original epidemiological model (Table 3). Of all themethods, the ME for all unsampled hosts were generally small,with IK resulting in the smallest. However, the advantage of MEunbiased property for IK interpolation comes at a high cost, i.e.,
Page 6: An improved regulatory sampling method for mapping and representing plant disease from a limited number of samples

W. Luo et al. / Epidemics 4 (2012) 68–77 73

F de grs

tomiswi

Fa

ig. 5. Multistage searching approach for dispersal parameter optimisation. (a) Cruearch at a smaller step size (0.1 and 0.004 for a and b, respectively).

he infection probability of healthy unsampled hosts was seri-usly overestimated. In terms of MAE and kappa statistic, our newethod showed clear superiority over the other methods. The orig-

nal method yielded inconsistent results for each run with a kappa

tatistic ranging from 0.471 to 0.537. The best-selected outputas equivalent to the new method, while the worst output lies

n between the new method and IK.

ig. 6. The estimated probability of occurrence of HLB disease (a) and associated uncertalso used to estimate the infection probability (c) of HLB disease and standard deviation (

id search at large step size (0.5 and 0.02 for a and b, respectively); (b) Refined grid

Powdery mildew data

The strawberry powdery mildew data were collected at asmaller spatial scale than those for HLB. The distances between

any two strawberry hosts range between 0.3 m for the two clos-est hosts and 12.3 m for the two hosts furthest apart. To analysedisease spread within this considerably smaller distance range, the

inty of prediction (b) from the new method. For comparison, indictor kriging wasd) of prediction error.

Page 7: An improved regulatory sampling method for mapping and representing plant disease from a limited number of samples

74 W. Luo et al. / Epidemics 4 (2012) 68–77

dispersal parameter space was set with greater flexibility for the exponential power (a ∈ (0, 5], b ∈ (0, 10]) to include most reasonable shapes of disease gradient for strawberry powdery mildew. Again, NN was identified as the best initialisation method, and the best dispersal parameters (a = 4.9, b = 6.2) were determined by the multistage grid search procedure. The optimal dispersal parameters indicated that an infected host only influenced surrounding hosts within 1.37 m, suggesting that across-row disease spread was very unlikely. Fig. 7a and b shows the estimated infection probability map of strawberry powdery mildew and its estimated uncertainty. For visual comparison, the best interpolated map of infection probability for powdery mildew (Fig. 7c) was also generated via the IK method, as well as the best estimation error standard deviation (Fig. 7d). Compared with IK, our new method generated a more accurate infection probability map that represented the spatial features of the disease pattern well. Although both methods were affected by isolated infection points, IK was substantially poorer, with serious over-estimation around those points (e.g. middle area of second row and top area of third row). For the new method, the standard deviation of infection prediction error ranged from 0 to 0.35, which was much smaller than that for IK. In particular, the standard deviations remained small over most areas, but increased rapidly near the edge of each disease cluster. The bias on estimation of infection probability for unsampled hosts is relatively small for all methods, with IK giving the least (Table 4). For MAE and the kappa statistic, our method was ranked the most accurate in estimation of infection probability.

Table 3. Evaluation result of ME, MAE and kappa statistic for all estimated infection maps on unsampled HLB hosts.

Method                            ME       MAE     Kappa statistic
Indicator kriging                 0.023    0.295   0.413
New method                        −0.067   0.219   0.542
Best output (original method)     0.068    0.223   0.537
Worst output (original method)    −0.080   0.268   0.471

Table 4. Evaluation result of ME, MAE and kappa statistic for all estimated infection maps on unsampled strawberry powdery mildew hosts.

Method                            ME       MAE     Kappa statistic
Indicator kriging                 0.031    0.275   0.534
New method                        −0.051   0.213   0.611
Best output (original method)     −0.053   0.220   0.596
Worst output (original method)    −0.046   0.262   0.552

Fig. 7. The estimated probability of occurrence of strawberry powdery mildew (a) and associated uncertainty of prediction (b) from the new method. For comparison, indicator kriging was also used to estimate the infection probability (c) of strawberry powdery mildew and standard deviation (d) of prediction error.
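The three evaluation statistics reported in Tables 3 and 4 can be illustrated with a minimal sketch. The function name `evaluate`, the sign convention for ME (predicted minus observed) and the 0.5 threshold used to turn probabilities into infected/healthy classes for the kappa statistic are assumptions made here for illustration; they are not stated in the text.

```python
def evaluate(pred_prob, observed, threshold=0.5):
    """Return (ME, MAE, kappa) for a set of unsampled hosts.

    pred_prob: estimated infection probabilities in [0, 1]
    observed:  true binary disease status (1 = infected, 0 = healthy)
    threshold: assumed cut-off for classifying a host as infected
    """
    n = len(observed)
    errors = [p - y for p, y in zip(pred_prob, observed)]

    # Mean error (bias) and mean absolute error.
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n

    # Cohen's kappa on the thresholded predictions.
    pred = [1 if p >= threshold else 0 for p in pred_prob]
    p_obs = sum(1 for c, y in zip(pred, observed) if c == y) / n
    frac_pred = sum(pred) / n
    frac_true = sum(observed) / n
    p_exp = frac_pred * frac_true + (1 - frac_pred) * (1 - frac_true)
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else 1.0
    return me, mae, kappa
```

A map can therefore have a small ME (errors of opposite sign cancel) while still scoring poorly on MAE and kappa, which is the pattern reported for IK.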

Discussion

Predicting plant disease risk at an early stage of an epidemic is difficult because of the limited spatial distribution usually available, but nevertheless is important and helpful for applications in plant disease management (Nelson et al., 1999). The development of modelling techniques for spatially referenced samples has provided the opportunity to obtain enough information to characterise disease epidemics spatially at an early stage and incorporate this information into management programs at a range of spatial scales. For instance, Gibson (1997a,b) recently developed a novel approach for spatio-temporal analysis of spatially referenced diseased plants when a sequence of disease maps is available. Although our approach only focuses explicitly on the spatial aspect of the epidemic, the aim of optimising parameters from a disease dispersal function is similar to the approach developed by Gibson (1997a,b). Whereas that approach relies on Markov Chain Monte Carlo to maximise the likelihood of transitions between states, our approach concentrates on the disease spread process between different hosts within one temporal state. Similarly to the approach by Gibson (1997a,b), our improved method accounts for both the characteristics of the sampled disease pattern and the spatially distributed host locations. Apart from the capability to detect and quantify the pathogen dispersal gradient, the model can also be used to estimate infection probabilities for unsampled hosts. This allows the spread of a pathogen to be visualised even at the early stage of an epidemic, provided an adequate sample size of observations can be applied. We first discuss some important methodological steps (Fig. 1) built into the improved method, then summarise their performance on disease modelling, and point out the potential applications for disease control or future sampling design.

Methodological steps

The method described by Parnell et al. (2009) has been improved to (i) better initialise the disease status of unsampled hosts; (ii) speed up the optimisation process; and (iii) provide a measure of uncertainty for the disease estimates. These improvements represent an important step in facilitating the implementation of this type of mechanistic approach to disease mapping in the field.

Correct initialisation of the unsampled host status can significantly reduce the number of steps, and thus the computation time, required to reach the steady state and is therefore important in facilitating the application of the method in practice. Two spatial interpolation methods (NN and IDW) were proposed for initialisation, and NN was identified as most appropriate for both data sets examined.
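The two initialisation options can be sketched as follows, assuming that NN denotes nearest-neighbour interpolation and IDW inverse distance weighting. The function names, the data layout (tuples of coordinates and binary status) and the IDW exponent of 2 are illustrative assumptions, not details taken from the method itself.

```python
import math

def nn_init(sampled, targets):
    """Nearest-neighbour initialisation.

    sampled: list of (x, y, status) for sampled hosts, status in {0, 1}
    targets: list of (x, y) for unsampled hosts
    Each unsampled host takes the status of its closest sampled host.
    """
    initial = []
    for tx, ty in targets:
        nearest = min(sampled, key=lambda s: math.hypot(s[0] - tx, s[1] - ty))
        initial.append(float(nearest[2]))
    return initial

def idw_init(sampled, targets, power=2.0):
    """Inverse-distance-weighted initialisation (assumed exponent `power`).

    Each unsampled host takes a weighted mean of sampled statuses, with
    weights proportional to distance ** -power.
    """
    initial = []
    for tx, ty in targets:
        num = den = 0.0
        exact = None
        for sx, sy, status in sampled:
            d = math.hypot(sx - tx, sy - ty)
            if d == 0.0:
                exact = float(status)  # coincident location: copy status
                break
            w = d ** -power
            num += w * status
            den += w
        initial.append(exact if exact is not None else num / den)
    return initial
```

NN returns a binary starting map, whereas IDW returns values between 0 and 1 that blend nearby infected and healthy samples.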

During the optimisation process, in order to improve computational efficiency, a realistic constraint was imposed on the search radius used to update the disease probability of unsampled hosts, as more distant hosts are unlikely to contribute significantly to that likelihood. Further, the exhaustive grid search algorithm used in the original method is computationally expensive because the dispersal kernel must be evaluated at many points within the grid for each parameter. Local optimisation methods, such as steepest descent, have been used widely to speed up optimisation in various applications (Rosenbrock, 1960). However, the success of such a scheme is heavily dependent upon the choice of the starting point, and it often gives locally optimal solutions of varying quality. Although slightly more computationally demanding than local optimisation, the improved method seeks a global solution of a constrained optimisation model by using a multistage search technique. The performance of a multistage search technique is less sensitive to the distribution of the starting points, so it is especially effective in epidemiological modelling situations where initial dispersal parameters are difficult to determine. The multistage search technique also guarantees a better solution in the subsequent iteration, and is likely to find the global optimal solution given that the crude search identifies the attraction basin of the true solution (Sirola, 2006).
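A crude-then-refined grid search of this kind might look like the sketch below. The `objective` callable (lower is better), the number of stages and the factor by which the step shrinks are hypothetical choices for illustration; the actual objective and refinement schedule of the improved method are not given in this section.

```python
def frange(lo, hi, step):
    """Inclusive list of grid values from lo to hi with the given step."""
    n = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n + 1)]

def grid_search(objective, a_values, b_values):
    """Exhaustively evaluate objective(a, b) and return (score, a, b)."""
    best = None
    for a in a_values:
        for b in b_values:
            score = objective(a, b)
            if best is None or score < best[0]:
                best = (score, a, b)
    return best

def multistage_search(objective, a_range, b_range, a_step, b_step,
                      stages=3, shrink=10.0):
    """Crude grid first, then finer grids centred on the best point so far."""
    (a_lo, a_hi), (b_lo, b_hi) = a_range, b_range
    best = None
    for _ in range(stages):
        candidate = grid_search(objective,
                                frange(a_lo, a_hi, a_step),
                                frange(b_lo, b_hi, b_step))
        # Keep the best point seen so far, so a later stage is never worse.
        if best is None or candidate[0] < best[0]:
            best = candidate
        _, a, b = best
        # Next window: one current step either side, clipped to the bounds.
        a_lo, a_hi = max(a_range[0], a - a_step), min(a_range[1], a + a_step)
        b_lo, b_hi = max(b_range[0], b - b_step), min(b_range[1], b + b_step)
        a_step, b_step = a_step / shrink, b_step / shrink
    return best
```

With three stages this evaluates a few hundred parameter combinations, whereas a single exhaustive grid at the finest resolution over the full range would need many thousands.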

Uncertainty analysis provides a basis to judge the reliability of the disease estimates when making decisions. Depending on the amount of uncertainty in the results, it can assist the decision between the need for early action for quarantine precautions and the collection of additional data. The improved method provides an estimated standard deviation for the estimates at unsampled hosts (Figs. 6b and 7b), a benefit in addressing the uncertainty caused by the stochastic process in selecting unsampled hosts to update. Despite the additional steps needed to estimate the uncertainty of the model predictions, the improved method was still found to be more computationally efficient than the original method and also allowed for larger sets of data to be analysed.
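One plausible way to obtain such a per-host standard deviation is to repeat the stochastic estimation several times and summarise the spread of the estimates at each unsampled host. This is an assumption made for illustration; the text does not spell out how the standard deviation is computed, and `summarise_runs` is a hypothetical helper.

```python
import statistics

def summarise_runs(runs):
    """Summarise repeated stochastic estimates per unsampled host.

    runs: list of runs, each a list of estimated infection probabilities
          (one value per unsampled host, same host order in every run).
    Returns (means, sds): per-host mean and sample standard deviation.
    """
    per_host = list(zip(*runs))  # transpose: one tuple of estimates per host
    means = [statistics.mean(values) for values in per_host]
    sds = [statistics.stdev(values) for values in per_host]
    return means, sds
```

Hosts whose estimate is stable across runs receive a standard deviation near zero, while hosts whose estimate depends on the update order show larger values.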

It is important to note that the sampling design can cause a large amount of additional uncertainty in the disease predictions. The prediction obtained in this study is the result of one realisation of random sampling. Thus, the information on the spatial arrangement of infected hosts obtained from such a sampling explicitly depends on the locations of the sample points. Inferences about disease pattern can change if different sampling locations are used. The under-estimation of our method for strawberry powdery mildew in the top right corner of the field (Fig. 7a) is mainly because, by chance, all the infected hosts in that area were unsampled. A number of procedures for sampling design (such as random sampling, cluster sampling, sampling placed on a regular grid and systematic stratified sampling) have been compared and evaluated in previous studies (Hughes and Madden, 1996; Kish, 1995; Ferrandino, 2004). These show that the performance of various sampling schemes is highly sensitive to the spatial structure of the disease. Empirically, where spatial structure was highly pronounced, a small sampling scheme gave prediction accuracy equivalent to that of a more detailed sampling scheme (Parnell et al., 2011). For complex disease patterns, the extra effort of detailed sampling is worthwhile.

Performance and applications

The optimal dispersal parameters were determined from the multistage search procedure (Fig. 5). This method provided an explicit graphical representation of the performance of each dispersal parameter combination, allowing the prediction accuracy to be investigated systematically. The form of the dispersal function and its parameters can be used to characterise how far the pathogen can spread from an infected host, and how strong the effect is; this information provides clues to the underlying processes of disease spread. Because the pathosystems differ, HLB, which is transmitted by infective citrus psyllids (Halbert and Manjunath, 2004; Gottwald, 2010), spreads much further than P. aphanis, the pathogen causing powdery mildew of strawberry. Knowledge of dispersal distance can be important for informing disease management (Gregory, 1968), because epidemics driven by long-range dispersal may be far more difficult to control than those driven by short-range dispersal. The increasing understanding of dispersal parameters enables us to define epidemiological strategies to minimise disease spread accordingly (McCartney and Fitt, 1998). Most such strategies aim to reduce dispersal parameter a, that is, the strength of the infection (Minogue, 1986). This relies on applications of protectant/systemic fungicides or eradication of the sources of initial inoculum to reduce disease intensity (Berger, 1977). Dispersal distance could be used to implement appropriate planting strategies. For pathogens travelling only short distances from the source, such as strawberry powdery mildew, choosing appropriate plant spacing or row pattern is an effective means of preventing disease spread in the field (Legard et al., 2000). In combination with the effect of the environment on monocyclic processes in P. aphanis, better knowledge of disease dispersal can be obtained to design efficient management methods for powdery mildew of strawberry (Sombardier et al., 2009).
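How a pair of dispersal parameters translates into a distance of influence can be sketched under two explicit assumptions: that the gradient has the exponential form a·exp(−b·d), with a the strength at the source and b the rate of decline with distance d, and that influence is treated as negligible below a threshold of 0.001. Neither the functional form nor the threshold is stated in this section. With these assumed choices the reported powdery mildew optimum (a = 4.9, b = 6.2) gives a distance close to the 1.37 m quoted in the results, which is suggestive but not a confirmation.

```python
import math

def kernel(d, a, b):
    """Assumed exponential dispersal gradient: strength a, decay rate b."""
    return a * math.exp(-b * d)

def effective_distance(a, b, threshold=1e-3):
    """Distance at which the assumed kernel falls to `threshold`.

    Solves a * exp(-b * d) = threshold for d; returns 0 if the kernel is
    already at or below the threshold at the source.
    """
    if a <= threshold:
        return 0.0
    return math.log(a / threshold) / b

# Reported optimum for strawberry powdery mildew: a = 4.9, b = 6.2.
d_pm = effective_distance(4.9, 6.2)  # about 1.37 under these assumptions
```

Under this form, reducing a lowers the kernel everywhere by the same factor, while increasing b shortens the distance over which an infected host matters.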

The precision of estimated disease incidence depends not only on the quality of the dispersal function determined, but also on the density and distribution of the host population. Figs. 6a and 7a show how effectively the improved method recovered the observed spatial distribution pattern using a sample of limited size. It is interesting to note that, because of the way it is derived (unbiased estimate), IK has the lowest ME. On the other hand, IK scored poorly for both the MAE and kappa statistics; the improved method was therefore more accurate overall than IK, not only in predicting areas showing regular and dense disease pattern, but also for areas where the disease pattern was irregular and less pronounced. From a theoretical point of view, IK does not explicitly account for the influence of the host distribution pattern, whereas our proposed method was constructed to reflect the mechanisms of disease spread. In addition, our method requires a less stringent assumption of stationarity for spatial autocorrelation compared with kriging interpolation techniques and could therefore be developed to analyse spatial disease epidemics in more general situations. Similarly to the optimal dispersal parameters, the estimated risk map can be used to formulate strategies for disease control. Based on the estimated disease risk for immediately surrounding areas, disease risk could be mitigated by either destroying alternate host plants near fields or changing the time and location of planting. The comparison statistics (Tables 3 and 4) further confirmed that the improved method was capable of producing estimates of greater ‘precision’ for unsampled hosts than any other method tested in this study. In particular, our proposed method appears to represent the subpopulation of healthy unsampled hosts more closely, while IK greatly overestimated disease spread. In comparison, the original epidemiological model had inconsistent performance between runs, and such a degree of uncertainty can create a risk of inappropriate action whenever a suboptimal run is inadvertently chosen as a basis for decision making. Isolated diseased hosts are evident in both data sets, and the improved method did not predict these points, because they are distributed at random. It is difficult to explain the sudden appearance of infected hosts in places far from areas of known pathogen establishment. It is possibly caused by alternative pathways of pathogen dispersal, such as animal- or human-mediated dispersal, which could lead to long-distance physical transport and establishment of the pathogen in an unexpected location. Clearly, even a sound epidemiological model of pathogen dispersal would have difficulties in predicting the isolated infection sites that occur due to random processes, especially when they are very rare (Aylor, 2003), unless the mechanism is understood, characterised, and incorporated into the model.

Conclusion and extension

In summary, the improved method is able to characterise the appropriate dispersal gradient and predict infection risk successfully for unsampled hosts without reference to the source of inoculum. It can help decision makers understand the spatial aspects of disease processes, and formulate disease control and/or regulatory actions accordingly to handle disease in an appropriate way. Although the improved method was focused exclusively on the binary epidemic data in this study, it could be applied directly to continuous percentage data (disease severity) or other quantitative measures of disease such as count data. In addition, since dispersal is a universal phenomenon in ecology, work on the spread of plant pathogens from our method may form the basis of studies for a variety of other systems of practical interest, including spread of arthropod pests, weeds and invasive animal species.

In the current study only an exponential model has been used to fit the dispersal gradient. However, the nature of the gradient may be different for different plant pathogen systems, depending on factors such as the mechanism of dispersal, the host structure and local environmental conditions (Aylor, 1999; McCartney and Fitt, 1985). In order to better characterise the shapes of the gradients, other dispersal functions (e.g. modified power law, Cauchy distribution) with non-exponentially bounded tails could be considered in the future and the sensitivity of the approach to the choice of the dispersal gradient could be investigated further. Also, attention has been restricted to pathogen dispersal that is homogeneous across the entire area. In reality, the pathogen dispersal may vary in response to different hosts (i.e. susceptible or resistant) and anisotropic weather conditions (e.g. disease spread is associated with wind direction (Sackett and Mundt, 2005; Willocquet et al., 2008)). Further improvement of disease spread mapping can be achieved by incorporating more complex interactions of these variables into models of dispersal.

Acknowledgments

The work was supported by Defra and USDA. The authors would like to thank Dr. Andrew G.S. Cuthbertson (Fera) and two anonymous reviewers for useful comments on the structure of the paper.

References

Akaike, H., 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723.

Aylor, D.E., 1999. Biophysical scaling and the passive dispersal of fungus spores: Relationship to integrated pest management strategies. Agricultural and Forest Meteorology 97, 275–292.

Aylor, D.E., 2003. Spread of plant disease on a continental scale: role of aerial dispersal of pathogens. Ecology 84, 1989–1997.

Berger, R.D., 1977. Application of epidemiological principles to achieve plant disease control. Annual Review of Phytopathology 15, 165–181.

Bové, J.M., 2006. Huanglongbing: a destructive, newly-emerging, century-old disease of citrus. Journal of Plant Pathology 88, 7–37.

Caraco, T., Duryea, M.C., Glavanakov, S., Maniatty, W., Szymanski, B.K., 2001. Host spatial heterogeneity and the spread of vector-borne infection. Theoretical Population Biology 59, 185–206.

Chellemi, D.O., Rohrbach, K.G., Yost, R.S., Sonoda, R.M., 1988. Analysis of the spatial pattern of plant pathogens and diseased plants using geostatistics. Phytopathology 78, 221–226.

Cohen, J., 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurements 20, 37–46.


Efron, B., Gong, G., 1983. A leisurely look at the bootstrap, the jackknife, and cross-validation. The American Statistician 37, 36–48.

Ferrandino, F.J., 1993. Dispersive epidemic waves: I. Focus expansion within a linear planting. Phytopathology 83, 795–802.

Ferrandino, F.J., 2004. Measuring spatial aggregation in binary epidemics: Correlative analysis and the advantage of fractal-based sampling. Phytopathology 94, 1215–1227.

Frantzen, J., van den Bosch, F., 2000. Spread of organisms: can travelling and dispersive waves be distinguished? Basic and Applied Ecology 1, 83–91.

Gibson, G.J., 1997a. Investigating mechanisms of spatiotemporal epidemic spread using stochastic models. Phytopathology 87, 139–146.

Gibson, G.J., 1997b. Markov chain Monte Carlo methods for fitting spatiotemporal epidemic stochastic models in plant pathology. Journal of the Royal Statistical Society Series C 46, 215–233.

Gosme, M., Lucas, P., 2009. Disease spread across multiple scales in a spatial hierarchy: effect of host spatial structure, and of inoculum quantity and repartition. Phytopathology 99, 833–839.

Gottwald, T.R., 2010. Current epidemiological understanding of citrus huanglongbing. Annual Review of Phytopathology 48, 119–139.

Gregory, P.H., 1968. Interpreting plant disease dispersal gradients. Annual Review of Phytopathology 6, 189–212.

Halbert, S.E., Manjunath, K.L., 2004. Asian citrus psyllids (Sternorrhyncha: Psyllidae) and greening disease of citrus: A literature review and assessment of risk in Florida. Florida Entomologist 87, 330–353.

Harrison, B.D., 1981. Plant virus ecology: ingredients, interactions and environmental influences. Annals of Applied Biology 99, 195–209.

Hughes, G., Madden, L.V., 1996. Cluster sampling for disease incidence data. Phytopathology 86, 132–137.

Isaaks, E.H., Srivastava, R.M., 1989. An Introduction to Applied Geostatistics. Oxford University Press, New York.

Jordan, V.W.L., Hunter, T., 1972. The effects of glass cloche and coloured polyethylene tunnels on microclimate, growth, yield and disease severity of strawberry plants. The Journal of Horticultural Science and Biotechnology 47, 419–426.

Journel, A.G., 1983. Nonparametric estimation of spatial distributions. Mathematical Geology 15, 445–468.

Kish, L., 1995. Survey Sampling. John Wiley and Sons, Inc., New York.

Krige, D.G., 1966. Two-dimensional weighted average trend surfaces for ore-evaluation. Journal of the South African Institute of Mining and Metallurgy 66, 13–38.

Legard, D.E., Xiao, C.L., Mertely, J.C., Chandler, C.K., 2000. Effects of plant spacing and cultivar on incidence of Botrytis fruit rot in annual strawberry. Plant Disease 84, 531–538.

Magarey, R.D., Sutton, T.B., 2007. How to create and deploy infection models for plant pathogens. In: Ciancio, A., Mukerji, K.G. (Eds.), Integrated Management of Plant Pests and Diseases, Vol. I. General Concepts in Integrated Pest and Disease Management. Springer, Netherlands, pp. 3–26.

McCartney, H.A., Fitt, B.D.L., 1985. Construction of dispersal models. In: Gilligan, C.A. (Ed.), Advances in Plant Pathology, Vol. 3. Mathematical Modelling of Crop Disease. Academic Press, London, pp. 107–143.

McCartney, H.A., Fitt, B.D.L., 1998. Dispersal of foliar fungal plant pathogens: mechanisms, gradients and spatial patterns. In: Jones, D.G. (Ed.), Plant Disease Epidemiology. Kluwer Academic Publishers, London, pp. 138–160.

Minogue, K.P., 1986. Disease gradients and the spread of disease. In: Leonard, K.J., Fry, W.E. (Eds.), Plant Disease Epidemiology: Population Dynamics and Management, vol. 1. Macmillan Publication, New York, pp. 285–310.

Myers, D.E., 1991. Interpolation and estimation with spatially located data. Chemometrics and Intelligent Laboratory Systems 11, 209–228.

Nelson, M.R., Orum, T.V., Jaime-Garcia, R., 1999. Application of geographic information systems and geostatistics in plant disease epidemiology and management. Plant Disease 83, 308–319.

Parnell, S., Gottwald, T.R., Irey, M.S., van den Bosch, F., 2009. A numerical optimization method to estimate the spatial distribution of an epidemic. Journal of Agricultural Science 147, 731–742.

Parnell, S., Gottwald, T.R., Irey, M.S., Luo, W., van den Bosch, F., 2011. A stochastic optimisation method to estimate the spatial distribution of a pathogen from a sample. Phytopathology 101, 1184–1190.

Rosenbrock, H.H., 1960. An automatic method for finding the greatest or least value of a function. Computer Journal 3, 175–184.

Sackett, K.E., Mundt, C.C., 2005. The effects of dispersal gradient and pathogen life cycle components on epidemic velocity in computer simulations. Phytopathology 95, 992–1000.

Shepard, D., 1968. A two dimensional interpolation function for irregularly spaced data. In: Proceedings of the 23rd National Conference ACM, pp. 517–523.

Sombardier, A., Savary, S., Blancard, D., Jolivet, J., Willocquet, L., 2009. Effects of leaf surface and temperature on monocyclic processes in Podosphaera aphanis, causing powdery mildew of strawberry. Canadian Journal of Plant Pathology 31, 439–448.

Sirola, N., 2006. Exhaustive global grid search in computing receiver position from modular satellite range measurements. Journal of Physics: Conference Series 52, 73–82.

Skelsey, P., Rossing, W.A.H., Kessel, G.J.T., van der Werf, W., 2009. Scenario approach for assessing the utility of dispersal information in decision support for aerially spread plant pathogens, applied to Phytophthora infestans. Phytopathology 99, 887–895.

Stein, A., Kocks, C.G., Zadoks, J.C., Frinking, H.D., Ruissen, M.A., Myers, D.E., 1994. A geostatistical analysis of the spatiotemporal development of the downy mildew epidemics in cabbage. Phytopathology 84, 1227–1239.

Thiessen, A.H., 1911. Precipitation averages for large areas. Monthly Weather Review 39, 1082–1084.

Willocquet, L., Sombardier, A., Blancard, D., Jolivet, J., Savary, S., 2008. Spore dispersal and disease gradients in strawberry powdery mildew. Canadian Journal of Plant Pathology 30, 434–441.

Wu, B.M., van Bruggen, A.H.C., Subbarao, K.V., Pennings, G.G.H., 2001. Spatial analysis of lettuce downy mildew using geostatistics and Geographic Information Systems. Phytopathology 91, 134–142.

Xu, X.M., Ridout, M.S., 1998. Effects of initial epidemic conditions, sporulation rate, and spore dispersal gradient on the spatio-temporal dynamics of plant disease epidemics. Phytopathology 88, 1000–1012.