
Page 1: [IEEE 2014 Third International Conference on Agro-Geoinformatics - Beijing, China (2014.8.11-2014.8.14)] 2014 The Third International Conference on Agro-Geoinformatics - Forecasting

Forecasting of Powdery Mildew disease with multi-sources of remote sensing information

Jingcheng Zhang 1,2,3,4, Lin Yuan 1,4, Chenwei Nie 1, Liguang Wei 1, Guijun Yang 1,2,3*

1 Research center for Information Technology in Agriculture, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China

3 Key Laboratory for Information Technologies in Agriculture, the Ministry of Agriculture, 100097, P.R.China [email protected]

2 National Engineering Research Center for Information Technology in Agriculture, Beijing, 100097, P.R.China

4 Institute of Agriculture Remote Sensing and Information System Application, Zhejiang University, Hangzhou 310029, China

[email protected]

Abstract—Powdery mildew (PM) is a common disease of winter wheat that causes severe yield loss in China. To control the disease effectively, it is important to develop a disease forecasting model at a regional scale. In this study, remotely sensed data that reflect crop vigor and habitat traits were adopted as candidate inputs for model development, including various vegetation indices, land surface temperature and a plant drought index. Based on a correlation analysis, a total of 9 remotely sensed variables at specific growth stages that responded significantly to PM were identified as explanatory variables. To obtain ground truth of PM occurrence, a field campaign was conducted in a suburban area of Beijing in 2010. From the remote sensing data and the corresponding ground truth data, the PM forecasting model was established by logistic regression analysis. The validation showed that the disease risk map reflected the general spatial distribution pattern of PM occurrence in the study area, with an overall accuracy of 72%. To facilitate disease control practices, the map of disease probability was converted to a binary map (presence/absence) using a thresholding method. This study illustrates the potential of remote sensing information in PM forecasting.

Keywords—Powdery mildew; Winter wheat; Land surface temperature; Vegetation index; Logistic regression

I. INTRODUCTION

As 10-16% of the global harvest is lost to crop diseases, disease forecasting models, alongside the breeding of disease-resistant cultivars, are of great importance in guiding spray practices, which are the dominant means of preventing disease epidemics. Papastamati et al. (2002) presented a novel model for predicting the daily progress of light leaf spot epidemics on winter oilseed rape by associating inoculum concentrations with temperature and rain duration [1]. Nutter et al. (2002) established a spatial forecasting system to predict the occurrence of Stewart's disease of corn in the US based on geospatially-referenced disease and weather data [2]. Several more disease forecasting models have been established for other crops, e.g., bean, potato, corn and sorghum [3,4].

Given that most disease forecasting models are driven by meteorological data, their predictive capability is strongly influenced by the density and representativeness of meteorological stations. Remote sensing observations, which provide spatially continuous information, offer a chance to overcome this limitation and develop a more comprehensive disease forecasting model. In vegetation remote sensing, satellite images are commonly used to map both the plant's biophysical status and its habitat traits. The visible and near infrared (NIR) bands correlate well with some important biochemical parameters (e.g., pigment concentration) and biophysical parameters (e.g., leaf area index and biomass), so this spectral information can be used to retrieve and map these parameters. Moreover, the thermal infrared (TIR) band of satellite images can be used to retrieve the land surface temperature (LST), which is an important indicator of the plant's habitat traits [5]. Since the susceptibility of plants to disease is closely associated with both plant vigor and habitat traits, it is possible to determine the suitability for disease development from these variables.

As a severe disease of winter wheat, powdery mildew (PM) endangers both the yield and the grain quality of wheat worldwide. To assist disease control, in this study we attempted to use multi-spectral satellite images to develop a disease forecasting model. Data from the environment and disaster reduction small satellites (HuanJing-1A/B), a routinely operational source with a short revisit interval (<4 days), were adopted as the remote sensing data. The objectives of this study were: 1. to identify remotely sensed features for PM forecasting; 2. to develop the PM forecasting model based on logistic regression analysis; and 3. to discuss the implementation of the PM forecasting model in practice.

II. EXPERIMENTS AND DATA PROCESSING

A. Study Site

The study site is located in the north of the North China Plain, in a suburban area of Beijing, China (39.78º N, 116.73º E, Fig. 1). From October through the following June, PM is a frequent disease of winter wheat in this area given the warm and humid climate. Because fungicide spraying is best implemented at the booting stage, to better assist disease prevention practice we only included remotely sensed data and meteorological data acquired no later than the booting stage.

B. Datasets and data preprocessing

(1) Remote sensing data

A total of 5 HJ-CCD images and 3 HJ-IRS images were acquired at three phases in 2010: the late tillering stage (May 1), the jointing stage (May 13) and the early grain filling stage (May 20). The dates and the path/row numbers of the scenes used are summarized in Table 1. All HJ-CCD and HJ-IRS images used were cloud-free. Prior to feature extraction, both the HJ-CCD and HJ-IRS data were preprocessed with radiometric calibration, atmospheric correction and geometric correction. To avoid spectral confusion from objects other than winter wheat, PM was forecast only on the winter wheat planting area, which was extracted from the HJ-CCD images with a decision tree classification.

Fig. 1. Map of study area for powdery mildew forecasting

TABLE I. PATH AND ROW NUMBERS OF THE HJ-CCD AND HJ-IRS SCENES USED IN THIS STUDY

Dataset Year Path (P) and Row (R) numbers

May 1 (P1) May 13 (P2) May 20 (P3)

HJ-CCD 2010 P456, R64 P1, R64

P457, R68 P456, R68 P1, R68

HJ-IRS 2010 P2, R63 P4, R63 P4, R63

(2) Field campaign of disease infection

For the field campaign on disease occurrence, a total of 90 points were surveyed in 2010, a year in which PM broke out in the field. We investigated not only the occurrence of the disease but also its severity. The disease index (DI) was used to indicate the disease severity [6]. The field survey was carried out in the middle of the grain filling stage (May 25), when the symptoms of PM became identifiable in the field. The entire dataset was randomly split into 60% and 40% subsets for model calibration and validation, respectively.
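The random 60%/40% partition of the survey points can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name, the fixed seed and the use of integer point identifiers are assumptions made here for reproducibility.

```python
import random


def split_calibration_validation(point_ids, calibration_fraction=0.6, seed=0):
    """Randomly split survey points into calibration and validation subsets.

    The seed and the identifiers are illustrative assumptions; the paper only
    states that the split was random with a 60%/40% ratio.
    """
    shuffled = list(point_ids)
    random.Random(seed).shuffle(shuffled)  # shuffle a copy, leave the input intact
    n_calibration = round(len(shuffled) * calibration_fraction)
    return shuffled[:n_calibration], shuffled[n_calibration:]


# 90 surveyed points, identified here simply as 0..89
calibration, validation = split_calibration_validation(range(90))
print(len(calibration), len(validation))
```

With 90 surveyed points this gives 54 calibration points and 36 validation points.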

C. Extraction of remotely sensed features

To establish the PM forecasting model at a regional scale, three forms of remote sensing data were used to reflect the biophysical condition of plants as well as the condition of their surrounding habitat.

(1) Vegetation indices for reflecting plants’ vigor

Given that plant vigor is associated with the occurrence of PM, a total of 13 spectral features were selected, comprising the original reflectance of 4 bands and 9 vegetation indices (VIs) that reflect the biochemical and biophysical status of plants (Table 2). All these features at the 3 phases were used as candidate input variables. A screening step was then applied to identify the features and phases most sensitive to PM for model establishment.

TABLE II. DEFINITIONS OF THE SPECTRAL FEATURES USED IN THIS STUDY

Spectral feature | Definition | Formula

RB | Original reflectance of the HJ-CCD blue band | -

RG | Original reflectance of the HJ-CCD green band | -

RR | Original reflectance of the HJ-CCD red band | -

RNIR | Original reflectance of the HJ-CCD near-infrared band | -

SR | Simple ratio | RNIR/RR

NDVI | Normalized difference vegetation index | (RNIR-RR)/(RNIR+RR)

GNDVI | Green normalized difference vegetation index | (RNIR-RG)/(RNIR+RG)

TVI | Triangular vegetation index | 0.5[120(RNIR-RG)-200(RR-RG)]

SAVI | Soil adjusted vegetation index | (1+L)*(RNIR-RR)/(RNIR+RR+L); L=0.5

OSAVI | Optimized soil adjusted vegetation index | 1.16*(RNIR-RR)/(RNIR+RR+0.16)

MSR | Modified simple ratio | (RNIR/RR-1)/((RNIR/RR)^0.5+1)

NLI | Non-linear vegetation index | (RNIR^2-RR)/(RNIR^2+RR)

RDVI | Re-normalized difference vegetation index | (RNIR-RR)/(RNIR+RR)^0.5
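The index definitions in Table 2 translate directly into code. The sketch below computes the 9 VIs for a single pixel from green, red and NIR reflectance; the function name and the sample reflectance values are illustrative assumptions, and the OSAVI numerator is taken as RNIR-RR, which is the standard form of that index.

```python
import math


def vegetation_indices(r_g, r_r, r_nir, soil_factor=0.5):
    """Compute the Table 2 vegetation indices for one pixel.

    r_g, r_r, r_nir are green, red and near-infrared reflectance (0-1).
    soil_factor is the SAVI adjustment L (0.5 in Table 2).
    """
    sr = r_nir / r_r
    return {
        "SR": sr,
        "NDVI": (r_nir - r_r) / (r_nir + r_r),
        "GNDVI": (r_nir - r_g) / (r_nir + r_g),
        "TVI": 0.5 * (120 * (r_nir - r_g) - 200 * (r_r - r_g)),
        "SAVI": (1 + soil_factor) * (r_nir - r_r) / (r_nir + r_r + soil_factor),
        "OSAVI": 1.16 * (r_nir - r_r) / (r_nir + r_r + 0.16),
        "MSR": (sr - 1) / (math.sqrt(sr) + 1),
        "NLI": (r_nir ** 2 - r_r) / (r_nir ** 2 + r_r),
        "RDVI": (r_nir - r_r) / math.sqrt(r_nir + r_r),
    }


# Made-up reflectance values typical of a green canopy
indices = vegetation_indices(r_g=0.08, r_r=0.05, r_nir=0.45)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

For these sample values NDVI is 0.8 and SR is 9; applied per pixel and per phase, the same function yields the candidate VI layers.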

(2) Land surface temperature for delineating plants’ habitat characteristics

Given that the LST is closely related to the respiration and evapotranspiration of plants, it is an efficient indicator of habitat characteristics in vegetated areas [5]. In our study, the LST was calculated from the calibrated emissivity of the TIR band using a single-channel method [7]. The LST at all 3 phases was included as an input variable, given the important physical association between the LST and disease occurrence.

(3) Perpendicular Drought Index for indicating plants’ water content

Considering that the occurrence of PM is also related to the plants' water content [8], the perpendicular drought index (PDI) was adopted to reflect this information in this study. The PDI was proposed by Ghulam et al. (2007) for drought monitoring and takes advantage of the reflective and absorptive features of the canopy and bare soil in the NIR and Red spectral domain [9]. Considering that the influence of plant water content on disease occurrence has a latent period, only the PDI at Phase 1 (PDI-P1) and Phase 2 (PDI-P2) were adopted as input variables for model establishment.
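The paper does not restate the PDI formula. As a hedged illustration, the form commonly given in the PDI literature is PDI = (R_red + M * R_nir) / sqrt(M^2 + 1), where M is the slope of the soil line in NIR-Red space. The function name and the slope value below are assumptions; in practice M is estimated from bare-soil pixels of the scene.

```python
import math


def perpendicular_drought_index(r_red, r_nir, soil_line_slope):
    """PDI as the perpendicular distance from the line through the origin
    that is normal to the soil line in NIR-Red space.

    soil_line_slope (M) is scene specific; any value used here is an assumption.
    """
    return (r_red + soil_line_slope * r_nir) / math.sqrt(soil_line_slope ** 2 + 1)


# Made-up reflectance values and an assumed soil-line slope of 1.0
pdi = perpendicular_drought_index(r_red=0.1, r_nir=0.3, soil_line_slope=1.0)
print(round(pdi, 4))
```

Larger PDI values correspond to drier surfaces, which is why the index is used here as a proxy for plant water status.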

III. METHODOLOGY

The establishment of the PM forecasting model included two major steps. First, the sensitivity of all candidate variables to the disease was estimated through a correlation analysis. Then, using the variables most sensitive to PM, the forecasting model was established by logistic regression. The entire workflow of the modeling process is given in Fig. 2.

A. Identification of appropriate VIs

To identify appropriate remotely sensed features for establishing the PM forecasting model, a correlation analysis was performed between the DI and each VI (Table 2). Only the VIs significantly correlated with the DI were selected as explanatory variables. In this analysis, each VI at each phase was treated as an independent feature.
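The screening step can be sketched as a Pearson correlation between each candidate feature and the DI, followed by a significance check. This is an illustrative sketch only: the feature names and values are made up, and significance is judged by comparing the t statistic of r against a critical value supplied by the caller (2.776 is the two-sided 5% critical value for 4 degrees of freedom), not by the exact p-values the authors would have obtained from a statistics package.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_features(features, disease_index, t_critical):
    """Keep features whose correlation with DI exceeds the critical t value."""
    n = len(disease_index)
    kept = {}
    for name, values in features.items():
        r = pearson_r(values, disease_index)
        # t statistic of r with n - 2 degrees of freedom; unbounded if |r| = 1
        t = math.inf if abs(r) >= 1.0 else abs(r) * math.sqrt((n - 2) / (1 - r * r))
        if t > t_critical:
            kept[name] = r
    return kept


# Made-up values for six survey points
di = [0, 10, 20, 30, 40, 50]
candidates = {
    "NDVI-P3": [0.80, 0.74, 0.71, 0.62, 0.60, 0.51],
    "RNIR-P1": [0.40, 0.45, 0.38, 0.46, 0.39, 0.44],
}
selected = screen_features(candidates, di, t_critical=2.776)
print(selected)
```

In this toy example only the NDVI-P3 feature passes, because its correlation with the DI is strongly negative while the RNIR-P1 values are essentially unrelated to it.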

B. Logistic regression disease forecasting model

In the disease forecasting process, a binary logistic regression analysis was used to relate the disease occurrence information to the identified remotely sensed variables. For model calibration, areas under disease infection were assigned a value of 1, whereas disease-free areas were assigned a value of 0. The logistic equation has the form:

$$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right) \tag{1}$$

where p is the infection probability, which can be expressed as:

$$p = \frac{\exp\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_i x_i\right)}{1 + \exp\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_i x_i\right)} \tag{2}$$

where x1 to xi are the explanatory variables, β0 is the intercept parameter, and β1 to βi are the slope parameters. The forecasting model was calibrated using the training set introduced in the section "Datasets and data preprocessing". The goodness-of-fit of the logistic model was evaluated through the Hosmer-Lemeshow test. Once developed, the model is able to predict the probability of PM occurrence for a given pixel based on Eq. (2), which allows us to make predictions for the reserved 40% of survey points.
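Evaluating the fitted model for one pixel amounts to computing the linear predictor and passing it through the logistic function. The sketch below shows only this prediction step; the function name and the coefficient values are hypothetical, since the paper does not report the fitted coefficients.

```python
import math


def infection_probability(x, intercept, slopes):
    """Probability p from the logistic model for one pixel.

    x: explanatory variable values; intercept: beta_0; slopes: beta_1..beta_i.
    """
    z = intercept + sum(b * v for b, v in zip(slopes, x))
    # Two algebraically equal forms, chosen to avoid overflow in exp()
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


# Hypothetical coefficients and inputs, for illustration only
p = infection_probability(x=[1.0, 2.0], intercept=-0.5, slopes=[0.3, -0.2])
print(round(p, 4))
print(round(math.log(p / (1 - p)), 4))  # logit(p) recovers the linear predictor
```

Taking the logit of the returned probability gives back the linear predictor (here -0.6), which is the relationship between the two equations of this section.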

Fig. 2. Workflow of disease forecasting based on remote sensing data.

IV. RESULTS AND DISCUSSION

A. Remotely sensed features for PM forecasting

The sensitivities of the remotely sensed variables to PM were examined with a Pearson correlation analysis (Table 3). The responses of the features were stronger at a late stage than at an early stage, which reflects the epidemic progress of the disease. To identify the most efficient spectral features for establishing the PM forecasting model, only the 4 features whose disease sensitivity reached a significance level of p-value<0.001 were retained as explanatory variables, namely RG, RR, NDVI and NLI at phase 3. All these variables are physically associated with chlorophyll concentration or biomass, which are indicators of plant vigor. Apart from these spectral features, the LST at all three phases and the PDI at phase 1 and phase 2 were also included in the model, which led to a total of 9 input variables.

TABLE III. SUMMARY OF CORRELATION BETWEEN FEATURES AND DISEASE SEVERITY OF POWDERY MILDEW AT EACH PHASE

VIs Phase 1 Phase 2 Phase 3

RB * **

RG * ***

RR ** ***

RNIR

SR ** **

NDVI ** ***

GNDVI * **

TVI *

SAVI **

OSAVI * **

MSR ** **

NLI ** ***

RDVI **

*, ** and *** indicate the correlations are significant at 0.950, 0.990 and 0.999 confidence level, respectively.

B. PM forecasting with the logistic regression model

Logistic regression analysis has been successfully adopted for estimating disease occurrence probability in empirical disease forecasting models [10]. Based on the selected remotely sensed variables described above, the logistic model of PM occurrence probability was established in this study. One advantage of the logistic regression model is that the standardized coefficients of the variables provide a straightforward interpretation of their impacts. As shown in Fig. 3, the LST at phase 3 (LST-P3) and the PDI at phase 2 (PDI-P2) had higher coefficients than the other variables, indicating that the parameters associated with habitat information play a key role in determining the PM infection probability. The probability map of PM occurrence in 2010 was produced and is shown in Fig. 4. From the map, we observed that the southern part of the study area had a higher chance of being infected by PM than the northern part, which is in good agreement with the pattern found in the field survey. In addition, the survey records from the local Plant Protection Agency showed a similar trend to the forecasting result.

Fig. 3. The standardized coefficients of each variable in the logistic regression model.

Fig. 4. Predicted probability of powdery mildew in our study area

For the Hosmer-Lemeshow test, the nonsignificant p-value (greater than 0.05) indicated an adequate fit of the model. To facilitate practical implementation of the forecasting model, the probability value was converted to a binary result (infected or healthy) by applying a threshold, which allowed validation against the independent field survey sample. Because a conservative (low) threshold implies a large area requiring treatment, and to avoid a subjective choice of threshold, we calculated the overall accuracy, omission error and commission error under thresholds from 5% to 95% with a step of 5% (Fig. 5). As the threshold increased, the commission error tended to decrease whereas the omission error tended to increase. The overall accuracy varied with the threshold and peaked (72%) at a threshold of 30%, where a trade-off between the commission error and the omission error was also achieved (Fig. 5). Therefore, in this study, 30% was adopted as the threshold for converting the estimated probability to a binary result.
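The threshold sweep described above can be sketched as follows. The error definitions are assumptions made for this illustration: omission error is taken as the share of observed infected points predicted healthy, and commission error as the share of predicted infected points observed healthy. The probabilities and observations are made-up values, not the survey data.

```python
def accuracy_at_threshold(probabilities, observed, threshold):
    """Return (overall accuracy, omission error, commission error).

    observed uses 1 for infected and 0 for healthy; a point is predicted
    infected when its probability is at or above the threshold.
    """
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    pairs = list(zip(predicted, observed))
    tp = sum(1 for pr, ob in pairs if pr == 1 and ob == 1)
    tn = sum(1 for pr, ob in pairs if pr == 0 and ob == 0)
    fp = sum(1 for pr, ob in pairs if pr == 1 and ob == 0)
    fn = sum(1 for pr, ob in pairs if pr == 0 and ob == 1)
    overall = (tp + tn) / len(pairs)
    omission = fn / (tp + fn) if tp + fn else 0.0
    commission = fp / (tp + fp) if tp + fp else 0.0
    return overall, omission, commission


# Made-up validation sample
probabilities = [0.10, 0.20, 0.35, 0.40, 0.60, 0.80]
observed = [0, 0, 1, 0, 1, 1]

# Sweep cut-off values from 5% to 95% in steps of 5%
thresholds = [k * 5 / 100 for k in range(1, 20)]
results = {t: accuracy_at_threshold(probabilities, observed, t) for t in thresholds}
best = max(thresholds, key=lambda t: results[t][0])
print(best, results[best])
```

Selecting the cut-off by overall accuracy alone, as done here, ignores the cost asymmetry between missed infections and unnecessary spraying; the omission and commission errors returned for each threshold allow that trade-off to be inspected separately.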

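Fig. 5. Accuracy indicators under varied cut-off values (axes: area in hm2 and accuracy against cut-off value; series: area with risk, overall accuracy, omission error and commission error)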

V. CONCLUSIONS

This study has demonstrated that multi-source remotely sensed data can be used for forecasting the occurrence probability of PM at a regional scale. A variety of remotely sensed parameters were identified to represent the plants' vigor and the fields' habitat characteristics. Based on logistic regression analysis, a PM forecasting model was established that provides the spatial distribution of the disease occurrence probability. With an independent validation sample, the prediction accuracies were assessed under varied probability thresholds. The model yielded a generally satisfactory result, with an overall accuracy of 72%. In future, it is recommended to incorporate more forms of information, particularly those strongly associated with pathogen distribution and cultivar resistance, to develop a more comprehensive disease forecasting model. The PM forecasting model based on remotely sensed data provides a framework for integrating multi-source data for disease forecasting and management. To further improve the reliability and stability of the model's forecasting accuracy, it is important to include meteorological data in the model, and more studies on this point are necessary.

Acknowledgment

This work was subsidized by National Natural Science Foundation of China (41271412), Beijing Natural Science Foundation (4132029) and Prior Sci-Tech Program for Scientific Activity of Overseas staff. The authors are grateful to Mr. Weiguo Li and Mrs. Hong Chang for their help in field data collection.

References

[1] K. Papastamati, F. Van Den Bosch, S. J. Welham, B. D. L. Fitt, N. Evans, J.M. Steed, “Modelling the daily progress of light leaf spot epidemics on winter oilseed rape (Brassica napus), in relation to Pyrenopeziza brassicae inoculum concentrations and weather factors.” Ecological Modelling, Vol. 148(2), pp. 169-189, 2002.

[2] F.W. Nutter Jr, R.R. Rubsam, S.E. Taylor, J.A. Harri, P.D. Esker, “Use of geospatially-referenced disease and weather data to improve site-specific forecasts for Stewart's disease of corn in the US corn belt.” Computers and Electronics in Agriculture, Vol. 37(1), pp. 7-14, 2002.


[3] M.S. Saharan and G.S. Saharan, “Influence of weather factors on the incidence of Alternaria blight of cluster bean (Cyamopsis tetragonoloba (L.) Taub.) on varieties with different susceptibilities. ” Crop Protection, Vol. 23(12), pp. 1223-1227, 2004.

[4] R.N. Strange and P.R. Scott, “Plant Disease: A Threat to Global Food Security.” Annual Review of Phytopathology, Vol. 40, pp. 83–116, 2005.

[5] R. Nemani, H. Hashimoto, P. Votava, F. Melton, W.L. Wang, A. Michaelis, L. Mutch, C. Milesi, S. Hiatt, M. White, “Monitoring and forecasting ecosystem dynamics using the Terrestrial Observation and Prediction System (TOPS).” Remote Sensing of Environment. Vol. 113(7), pp. 1497-1509, 2009.

[6] W. J. Huang, W. L. David, Z. Niu, Y. J. Zhang, L.Y. Liu, J. H.Wang, “Identification of yellow rust in wheat using in-situ spectral reflectance measurements and airborne hyperspectral imaging.” Precision Agriculture, Vol. 8, pp. 187-197, 2009.

[7] J.C. Jiménez‐Muñoz and J.A. Sobrino, “A generalized single‐channel method for retrieving land surface temperature from remote sensing data.” Journal of Geophysical Research: Atmospheres (1984–2012), Vol. 108(D22), 2003.

[8] B.M. Cooke, D.G.Jones, B. Kaye, “The Epidemiology of Plant Diseases. ” Springer, Netherland, 2006.

[9] A. Ghulam, Q. Qin, Z. Zhan, “Designing of the perpendicular drought index. ” Environmental Geology, Vol. 52(6), pp.1045-1052, 2007.

[10] V. Machault, C. Vignolles, F. Pagès, L. Gadiaga, Y.M. Tourre, A. Gaye, C. Sokhna, J.F. Trape, J.P. Lacaux, C. Rogier. “Risk Mapping of Anopheles gambiae sl Densities Using Remotely-Sensed Environmental and Meteorological Data in an Urban Area: Dakar, Senegal. ” PLoS One. Vol. 7(11), e50674, 2012.