Forecasting Drought Using the Agricultural Reference Index for Drought (ARID): A Case Study

PREM WOLI

Mississippi State University, Mississippi State, Mississippi

JAMES JONES AND KEITH INGRAM

University of Florida, Gainesville, Florida

JOEL PAZ

Mississippi State University, Mississippi State, Mississippi

(Manuscript received 25 April 2012, in final form 19 October 2012)

ABSTRACT

Drought forecasting can aid in developing mitigation strategies and minimizing economic losses. Drought
may be forecast using a drought index, which is an indicator of drought. The agricultural reference index for
drought (ARID) was used as a tool to investigate the possibility of using climate indices (CIs) as predictors to
improve the current level of forecasting, which is El Nino–Southern Oscillation (ENSO) based. The
performances of models based on linear regression (LR), artificial neural networks (ANN), adaptive
neuro-fuzzy inference systems (ANFIS), and autoregressive moving averages (ARMA) were
compared with that of the ENSO approach. Monthly values of ARID spanning 56 yr were computed for five
locations in the southeastern United States, and monthly values of the CIs having significant connections with
weather in this region were obtained. For the ENSO approach, the ARID values were separated into three
ENSO phases and averaged by phase. For the ARMA models, monthly time series of ARID were used. For
the ANFIS, ANN, and LR models, ARID was predicted 1, 2, and 3 months ahead using the past values of the
first principal component of the CIs. Model performances were assessed with the Nash–Sutcliffe index.
Results indicated that drought forecasting could be improved for the southern part of the region using ANN
models and CIs. The ANN outperformed the other models for most locations in the region. The CI-based
models and the ENSO approach performed better during the winter, whereas the efficiency of ARMA models
depended on precipitation periodicities. All models performed better for southern locations. The CIs showed
good potential for use in forecasting drought, especially for southern locations in the winter.

1. Introduction

Drought is a creeping natural hazard, causing tre-

mendous economic and social losses worldwide. In the

United States, about $7 billion is lost every year through

drought (FEMA 1995; NSTC 2005). Although drought

cannot be prevented, its losses can be minimized by

mitigation strategies if it is forecast well in advance.

Because it evolves slowly, its mitigation plans can be

carried out more effectively than those of other natural

hazards. The benefits of reliable forecasting are tremen-

dous (Campbell 1973).

Drought may be forecast using drought variables

(Hastenrath and Greischar 1993; Chiew et al. 1998; Cordery

and McCall 2000) or drought indicators (Nicholls 1985;

Mishra and Desai 2005; Cancelliere et al. 2007; Vasiliades

and Loukas 2008). A drought variable is a key element

that defines drought, such as precipitation, soil moisture,

or evapotranspiration. A drought indicator is a measure

used to quantify and characterize drought conditions.

For forecasting a meteorological drought, forecasting
precipitation may suffice. For forecasting an agricultural
drought (the shortage of water in the root zone to meet
the consumptive use of plants), however, forecasting all
associated variables is required. This is difficult, because
these variables are controlled by several plant and soil
processes. Therefore, for agricultural drought, the use of
a drought indicator, such as a drought index, is more
appropriate than using a drought variable.

Corresponding author address: Prem Woli, Mississippi State
University, 130 Creelman St., Mississippi State, MS 39762-9632.

E-mail: [email protected]

APRIL 2013 WOLI ET AL. 427

DOI: 10.1175/WAF-D-12-00036.1

© 2013 American Meteorological Society

Various methods exist for forecasting a drought index

(Mishra and Singh 2011), one of which is to use its own

past values (Panu and Sharma 2002; Mishra and Desai

2005; Mendicino et al. 2008). This approach involves

quantifying a relationship between the present and past

values of the index by fitting the data to a time series

model, such as an autoregressive moving average

(ARMA; Box and Jenkins 1970). Another approach is

to predict the index as a function of the past values of its

predictor variables. Teleconnection indices may be used

as drought predictors (Mishra and Singh 2010, 2011)

because atmospheric circulation patterns have been

found to influence droughts (Stahl and Demuth 1999).

The linkage between changes in the atmosphere in one

place and effects on the weather in another, distant place

is called teleconnection. The indices of large-scale oce-

anic and atmospheric teleconnection patterns, or oscil-

lations, are called teleconnection or climate indices.

Although climate indices are generally calculated on the

basis of measurements from a localized area, they can be

statistically related with climate variables of other areas

around the globe (Bond et al. 2007).

Because teleconnection patterns affect weather variables
such as temperature and precipitation, the key
inputs of a drought index, a plausible relationship is
likely to exist between a drought index and climate indices.
Some of the major teleconnection patterns that
are reported to have significant impacts on the weather
of the southeastern United States are the Atlantic
multidecadal oscillation (AMO; Gershunov and Barnett
1998b; Enfield et al. 2001; Stevens 2007), the North Atlantic
Oscillation (NAO; Rogers 1984; Yin 1994; Hagemeyer
2006; Stevens 2007), the Pacific decadal oscillation (PDO;
Gershunov and Barnett 1998b; Enfield et al. 2001; Stevens
2007), the Pacific–North American (PNA) pattern
(Leathers et al. 1991; Yin 1994; Konrad 1998; Hagemeyer
2006; Martinez et al. 2009), sea surface temperature
(SST) in the Nino-3.4 region (i.e., the east-central
tropical Pacific: 5°N–5°S, 170°–120°W) (Trenberth 1997;
Gershunov and Barnett 1998b; Schmidt et al. 2001;
Hanley et al. 2003; Hagemeyer 2006), and the SST over
the region of 4°S–4°N, 150°–90°W (Hansen et al. 1998).

The AMO index is an index of an ongoing series
of long-duration changes in the SST of the North Atlantic
Ocean over 0°–70°N. The NAO index is the difference
in sea level pressure between two stations
situated close to the Icelandic low and Azores high
pressure centers. The PNA index is a linear combination
of the normalized geopotential height anomalies at the
700-hPa level at four locations: Hawaii (20°N, 160°W);
the North Pacific Ocean (45°N, 165°W); Alberta, Canada
(55°N, 115°W); and the Gulf Coast of the United States
(30°N, 85°W). The PDO index is the spatial average of
the monthly SST of the Pacific Ocean north of 20°N. The
Nino-3.4 index is the departure of monthly SST from
its long-term mean over the Nino-3.4 region. The
Japan Meteorological Agency (JMA) index is a 5-month
running mean of spatially averaged SST anomalies over
the tropical Pacific region (4°S–4°N, 150°–90°W).

The most commonly used approach for forecasting

precipitation or drought in the southeastern United

States is based on the El Nino–Southern Oscillation

(ENSO). The strong teleconnection between ENSO and

the weather conditions in this region has enabled skillful

forecasting of seasonal temperature and precipitation

up to 1 yr in advance (Steinemann 2006; Brolley et al.

2007). The drought-forecasting ability of the ENSO

approach, however, varies across months and locations

because the impact of ENSO varies across regions and

seasons (Ropelewski and Halpert 1986; Gershunov and

Barnett 1998a; Smith et al. 1998; Brolley et al. 2007).

That is, the ENSO-based approach might produce
plausible forecasts for some months of the year and
inaccurate forecasts for other months. This situation
necessitates the search for other approaches. The study
presented here explored whether climate indices could
produce more accurate forecasts than those produced
using ENSO for the months or locations for which the
ENSO approach has limited predictive skill.

For quantifying predictor–predictand relationships,

various statistical models have been used, such as linear

regression (LR; Kumar and Panu 1997; Panu and

Sharma 2002; Wood and Lettenmaier 2006; Vasiliades

and Loukas 2008), artificial neural networks (ANN;

Wilpen et al. 1994; Zhang et al. 1998; Panu and Sharma

2002; Vasiliades and Loukas 2008), and the adaptive

neuro-fuzzy inference system (ANFIS; Nayak et al.

2004; Chang and Chang 2006; Bacanli et al. 2008; Firat

and Gungor 2008). An ARMA model can model the

periodicity and nonstationarity of time series data effec-

tively (Modarres 2006; Mishra et al. 2007) and can pro-

vide systematic procedures for modeling the behavior

of uncertain systems (Mishra and Desai 2005). An LR

model can explicitly address causal relationships and can

show optimal results for a linear mechanism (Belsley

et al. 1980). An ANN model is able to map nonlinear,

complex, and arbitrary input–output relationships

(Hornik et al. 1989; Smith 1996; Zhang et al. 1998; Kim

and Valdes 2003; Samarasinghe 2007). An ANFIS

(Zadeh 1965; Jang 1993) is a promising tool for modeling
complex, nonlinear systems (Marce et al. 2004; Nayak
et al. 2004; Bacanli et al. 2008; Altun et al. 2009) and has
the ability to draw conclusions from vague, ambiguous,
incomplete, and imprecise information (Lee et al. 2007).

428 WEATHER AND FORECASTING VOLUME 28

Several researchers have used these models, individually
or in combination, to forecast various hydrological variables.
Some researchers, such as Cutore et al. (2009),
have used ANN to forecast a hydrological drought, which
is an expression of precipitation shortfall on surface or
subsurface water supply. For forecasting an agricultural
drought, however, no study has assessed all of these
methods using the climate indices mentioned
above. The current study examined the performance of
ANFIS, ANN, ARMA, and LR models relative to that of
the ENSO approach in forecasting the agricultural reference
index for drought (ARID; Woli et al. 2012), which
is a computationally simple, physically and physiologically
sound, and generally applicable drought index that
can characterize an agricultural drought better than many
other drought indices that are applied to agricultural
systems (Woli et al. 2012). Although ARID is a simple
and generic drought index, it is applicable to a wide range
of conditions (Woli et al. 2013).

The ARID is based on the reference grass (Allen et al.
2005) and is expressed as the ratio of plant water deficit
to plant water need as

$$\mathrm{ARID}_i = 1 - \frac{T_i}{\mathrm{ET}_{o,i}}, \tag{1}$$

where the subscript $i$ stands for the $i$th day, $T$ is
transpiration (mm day$^{-1}$), and $\mathrm{ET}_o$ is the reference grass
evapotranspiration (mm day$^{-1}$). Values of ARID fall
between 0, indicating no water stress, and 1, indicating
full water stress. Whereas $\mathrm{ET}_o$ is computed using the
United Nations Food and Agriculture Organization Irrigation
and Drainage Paper 56 (FAO-56) Penman–Monteith
model (Allen et al. 1998), $T$ is estimated as

$$T_i = \min(\alpha \zeta \Theta_{\mathrm{ada},i-1},\ \mathrm{ET}_{o,i}), \tag{2}$$

where $T_i$ is the transpiration in day $i$ (mm day$^{-1}$), $\alpha$ is the
water uptake coefficient, $\zeta$ is the root-zone depth (mm),
and $\Theta_{\mathrm{ada},i-1}$ is plant available water content after deep
drainage at the end of the previous day (mm mm$^{-1}$). The
available water content in day $i$, $\Theta_{\mathrm{ada},i}$, is estimated as the
ratio of the amount of available water in the root zone in
day $i$, $W_i$ (mm), to the root-zone depth: $\Theta_{\mathrm{ada},i} = W_i/\zeta$.
The $W_i$ is estimated from a simple soil water balance:

$$W_i = W_{i-1} + P_i + I_i - T_i - D_i - R_i, \tag{3}$$

where subscripts $i$ and $i - 1$ denote the current and
previous day, respectively, and $P_i$, $I_i$, $T_i$, $D_i$, and $R_i$ are
precipitation, irrigation, transpiration, deep drainage,
and surface runoff that occurred in day $i$ (mm day$^{-1}$),
respectively. The deep drainage for day $i$ is estimated as

$$D_i = \beta \zeta (\Theta_{\mathrm{bda},i} - \theta_m), \tag{4}$$

where $\beta$ is the drainage coefficient, $\Theta_{\mathrm{bda},i}$ is the available soil
water content in day $i$ before drainage (mm mm$^{-1}$), and
$\theta_m$ is plant available water capacity (mm mm$^{-1}$). Using
the USDA-SCS (1972) method, ARID estimates daily
surface runoff (mm day$^{-1}$) as

$$R_i = \frac{(P_i - I_a)^2}{P_i - I_a + S}, \tag{5}$$

where $I_a$ is initial abstraction (mm day$^{-1}$) and $S$ is potential
maximum retention (mm day$^{-1}$), which is computed
as $S = 25\,400/\eta$ (where $\eta$ is the runoff curve
number).
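The daily bookkeeping in Eqs. (1)–(5) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `arid_daily`, every default parameter value, the initial soil water, and the order in which runoff and drainage are removed are assumptions made only for illustration. Runoff is assumed to be zero unless precipitation exceeds the initial abstraction, and drainage is assumed to be zero when the soil is below capacity.

```python
def arid_daily(precip, eto, alpha=0.1, zeta=400.0, beta=0.5,
               theta_m=0.13, s=100.0, ia=20.0, irrigation=None):
    """Return a list of daily ARID values (0 = no stress, 1 = full stress).

    precip, eto: daily precipitation and reference ET (mm/day).
    All default parameter values are illustrative placeholders,
    not values taken from the paper.
    """
    w = theta_m * zeta  # assumed start: root zone at capacity (mm)
    arid = []
    for day, (p, et) in enumerate(zip(precip, eto)):
        irr = irrigation[day] if irrigation is not None else 0.0
        # Eq. (2): alpha * zeta * Theta equals alpha * W because Theta = W / zeta
        t = min(alpha * w, et)
        # Eq. (5): runoff, assumed zero unless rain exceeds the initial abstraction
        r = (p - ia) ** 2 / (p - ia + s) if p > ia else 0.0
        # Water before drainage, then Eq. (4); drainage assumed zero below capacity
        w_before = w + p + irr - t - r
        d = max(beta * zeta * (w_before / zeta - theta_m), 0.0)
        # Eq. (3): end-of-day water balance, floored at zero
        w = max(w_before - d, 0.0)
        # Eq. (1)
        arid.append(1.0 - t / et if et > 0 else 0.0)
    return arid
```

Under these assumptions, a rain-free spell with constant reference evapotranspiration produces ARID values that start at 0 and rise as the root zone dries.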

2. Materials and methods

a. Sites, data, and general computations

Of the five forecasting methods assessed—ANFIS,

ANN, LR, ARMA, and ENSO—only the first three

methods used climate indices as the predictors of ARID.

Because ARMA is a time series model, it only uses the

past values of the predictand to predict future values.

Although ENSO is also teleconnection based (JMA),

the ENSO approach does not directly use the values of

the JMA index and the other climate indices used by

ANFIS, ANN, and LR. The predictors of ARID for the

climate index–based methods were chosen, and further

computations and analyses were made in the following

steps (which are also summarized in Table 1):

1) Through a literature review, six large-scale oceanic–

atmospheric oscillations having significant impacts

on the weather of the southeastern United States

were identified: AMO, NAO, JMA, Nino-3.4,

PDO, and PNA.

2) Five locations in the southeastern United States
were selected by considering latitude and
the availability of historical weather data:
Miami (25.5°N, 80.5°W), Bartow (27.9°N, 81.8°W),
and Live Oak (30.3°N, 82.9°W) in Florida and Plains
(32.7°N, 84.2°W) and Blairsville (34.8°N, 83.9°W) in
Georgia.

3) Daily weather data spanning 56 yr (1951–2006) for
the respective locations were obtained on the
Internet from the Florida Automated Weather
Network and the Georgia Automated Environmental
Monitoring Network.

4) The monthly values of the indices of the oscillations

mentioned in step 1, called climate indices, for the years

mentioned in step 3 were obtained from the Internet.

5) From the weather data mentioned in step 3, daily

values of ARID were computed for each of the five


locations, which later were converted to monthly

values by averaging the daily values over a month.

For instance, a monthly value for January was

computed by averaging all 31 daily values of this

month. Computations were carried out monthly

because most of the climate indices used in the study

recorded only monthly values. Moreover, a monthly

time scale is appropriate for monitoring drought

from the viewpoints of farm management and

crop-growth-stage sensitivity to water deficit (Panu

and Sharma 2002).

6) For each climate index, the dataset obtained in step

4 was separated by month, thus creating 12 subsets,

one for each month. The monthly separation of the

dataset was carried out by assuming that the degree

of linkage between large-scale teleconnection pat-

terns and the weather in this region might vary

across months.

7) To assess the predictability of ARID for different
lead times, ARID was forecast with 1–3-month lead
time ($t + 1$, $t + 2$, and $t + 3$) using the current (time $t$)
values of its predictors. The predictors’ lag time
remained the same ($t$) while predicting ARID at
different lead times. To forecast ARID at three lead
times, three series of ARID were created and each
series individually was used. As in step 6, 12 subsets of
ARID at each lead-time forecast (ARID$_{t+1}$, ARID$_{t+2}$,
or ARID$_{t+3}$) were created for each location.

8) The correlation between a climate index and ARID
at each lead time was examined for each climate
index, month, and location. When the correlation
was significant at the 90% confidence level ($\alpha = 0.1$),
the climate index was selected. For the significant
cases, the absolute values of the Pearson’s
correlation coefficient $|r|$ were greater than 0.22.
When both AMO and NAO were significantly
correlated with ARID, NAO was chosen from the
two; when ARID was not significantly correlated
with NAO but was with AMO, the latter was used. In
a similar way, when both Nino-3.4 and JMA were
significantly correlated with ARID, Nino-3.4 was
chosen; JMA was used when ARID was not significantly
correlated with Nino-3.4 but was with JMA.
These specific selections were made because AMO
and NAO are similar and so are Nino-3.4 and JMA.
In addition, NAO has a stronger influence than
AMO in the southeastern United States, and Nino-3.4
has a stronger influence than JMA (J. O’Brien
2009, personal communication). Following the
above procedures, climate indices, the predictors
of ARID, were selected for each month, location,
and lead-time forecast (Table 2).

9) The climate indices selected for any of the five

locations in the region for a month were considered

to be the predictors of ARID for the entire region

for that month (the last column of Table 2). The

climate indices selected for the southeastern United

States for each month and lead-time forecast are

presented in Table 3.

10) The cross correlations among the six climate indices

mentioned in step 1 were examined (Table 4).

Examining the correlations was necessary because

regression techniques require the predictors to be

independent (Cordery and McCall 2000).

11) Because most of the climate indices were significantly
cross correlated at the significance level of 0.05
(Table 4), principal component analysis was performed
on the climate indices chosen for each month
and lead-time forecast (Table 3), and the first principal
component, hereinafter referred to as $X$, was used
as the predictor of ARID. The $X$ was chosen because
(i) the total variance it explained was generally
reasonably good (Table 3) and (ii) its use made the
number of predictors consistent (i.e., 1) across months
and lead-time forecasts, whereas the number of
selected climate indices differed among months
and lead times, ranging from 1 to 4 (Table 3).

TABLE 1. Steps involved in selecting the predictors of ARID and making further computations.

| Step | Activity |
|---|---|
| 1 | Large-scale oscillations were identified for the southeastern United States |
| 2 | Five locations in the region were selected |
| 3 | Historical weather data spanning 1951–2006 were obtained |
| 4 | Monthly values of the CIs were obtained |
| 5 | Monthly values of ARID were computed for each location |
| 6 | For each CI, the dataset was separated by month |
| 7 | For each location and lead time, the ARID dataset was separated by month |
| 8 | CI–ARID correlation was examined for each index, month, location, and lead time; if significant at $\alpha = 0.1$, the index was selected (Tables 2 and 3) |
| 9 | The CIs selected for any location for a month were considered to be the predictors of ARID for the entire region for that month |
| 10 | Cross correlations among the CIs were examined (Table 4) |
| 11 | PCA was performed on the chosen CIs (Table 3), and the first principal component was used as the predictor of ARID |
| 12 | Models were evaluated using the Nash–Sutcliffe index, and the predictability of ARID was assessed using the RMSE |

12) Last, the predictive accuracy of each model was
evaluated using the modeling efficiency [ME, also
called the Nash–Sutcliffe index (Nash and Sutcliffe
1970)], which is a commonly used measure for
evaluating hydrometeorological models (Sevat
and Dezetter 1991; ASCE Task Committee on
Definition of Criteria for Evaluation of Watershed
Models of the Watershed Management Committee,
Irrigation and Drainage Division 1993; Legates and
McCabe 1999; Moriasi et al. 2007). The ME was
computed as

$$\mathrm{ME} = 1 - \left[\sum_{i=1}^{n} (\mathrm{ARID} - \widehat{\mathrm{ARID}})^2 \Big/ \sum_{i=1}^{n} (\mathrm{ARID} - \overline{\mathrm{ARID}})^2\right], \tag{6}$$

where ARID, $\widehat{\mathrm{ARID}}$, and $\overline{\mathrm{ARID}}$ are the observed
(i.e., originally computed), predicted, and mean
observed values of ARID, respectively, and $n$ is
the number of observations. Values of ME range
from $-\infty$ to 1. Whereas a negative value indicates
that the observed mean is a better predictor than
the model, a positive value signifies that using the
model estimate is better than using the observed
mean. An ME of 0 indicates that the model estimate
is as accurate as the mean of the observed
data, whereas an ME of 1 corresponds to a perfect
match of the modeled values to the observed data.
In fact, the closer the ME is to 1, the more accurate
the model is.

TABLE 2. CIs selected as predictors of ARID at lead-time $t + 1$ (1-month-ahead forecasting) for various months and locations in the
southeastern United States. The CIs were significantly correlated with ARID at the 90% confidence level ($\alpha = 0.1$). A hyphen indicates that no CI was selected.

| Month | Miami | Bartow | Live Oak | Plains | Blairsville | Region |
|---|---|---|---|---|---|---|
| Jan | Nino-3.4 | Nino-3.4 | Nino-3.4 | JMA, PDO, PNA | - | Nino-3.4, PDO, PNA |
| Feb | Nino-3.4, PDO, PNA | Nino-3.4, PDO | Nino-3.4, PDO, PNA | Nino-3.4 | Nino-3.4 | Nino-3.4, PDO, PNA |
| Mar | AMO, Nino-3.4, PDO | Nino-3.4 | - | NAO, JMA | NAO, Nino-3.4, PNA | NAO, Nino-3.4, PDO, PNA |
| Apr | JMA, PDO | JMA | PDO | - | - | JMA, PDO |
| May | - | - | - | - | JMA | JMA |
| Jun | AMO, PNA | - | PDO | PDO | Nino-3.4, PDO | AMO, Nino-3.4, PDO, PNA |
| Jul | PNA | - | - | JMA | JMA | JMA, PNA |
| Aug | AMO | NAO, Nino-3.4, PDO | NAO | NAO | Nino-3.4 | NAO, Nino-3.4, PDO |
| Sep | Nino-3.4 | Nino-3.4 | - | - | - | Nino-3.4 |
| Oct | Nino-3.4 | JMA | Nino-3.4, PDO | NAO, Nino-3.4, PDO | NAO | NAO, Nino-3.4, PDO |
| Nov | Nino-3.4, PDO | Nino-3.4, PDO, PNA | AMO, Nino-3.4, PDO | NAO, Nino-3.4, PDO | - | NAO, Nino-3.4, PDO, PNA |
| Dec | - | Nino-3.4 | - | - | - | Nino-3.4 |

TABLE 3. CIs used as the predictor of ARID for various months and lead-time forecasts ($t + 1$, $t + 2$, and $t + 3$) for the southeastern United
States. Here, ‘‘X var’’ is the percent of total variance explained by the first principal component $X$.

| Month ($t$) | CIs ($t + 1$) | X var ($t + 1$) | CIs ($t + 2$) | X var ($t + 2$) | CIs ($t + 3$) | X var ($t + 3$) |
|---|---|---|---|---|---|---|
| Jan | Nino-3.4, PDO, PNA | 72 | Nino-3.4, PDO, PNA | 72 | Nino-3.4 | 100 |
| Feb | Nino-3.4, PDO, PNA | 72 | AMO, Nino-3.4, PDO, PNA | 60 | PDO, PNA | 85 |
| Mar | NAO, Nino-3.4, PDO, PNA | 62 | AMO, JMA, PDO | 77 | JMA | 100 |
| Apr | JMA, PDO | 82 | PNA | 100 | NAO, Nino-3.4, PDO, PNA | 55 |
| May | JMA | 100 | NAO, Nino-3.4, PDO | 60 | JMA, PNA | 70 |
| Jun | AMO, Nino-3.4, PDO, PNA | 48 | JMA, PNA | 63 | AMO, Nino-3.4 | 60 |
| Jul | JMA, PNA | 65 | AMO, Nino-3.4, PDO, PNA | 52 | NAO, PNA | 65 |
| Aug | NAO, Nino-3.4, PDO | 60 | PNA | 100 | AMO, Nino-3.4, PDO | 60 |
| Sep | Nino-3.4 | 100 | AMO, Nino-3.4, PDO | 65 | AMO, Nino-3.4, PDO | 65 |
| Oct | NAO, Nino-3.4, PDO | 60 | AMO, Nino-3.4, PDO, PNA | 48 | Nino-3.4, PNA | 65 |
| Nov | NAO, Nino-3.4, PDO, PNA | 55 | NAO, Nino-3.4 | 70 | AMO, JMA | 64 |
| Dec | Nino-3.4 | 100 | NAO, PDO | 60 | PDO | 100 |
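Steps 8 and 11 of the procedure above (screening the climate indices by their correlation with ARID and then reducing the retained indices to one predictor) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. The function names are hypothetical, the fixed cutoff of 0.22 is used as a stand-in for the significance test, and standardizing the indices before the principal component analysis is an assumption because the text does not say whether this was done.

```python
import numpy as np

def select_indices(ci_matrix, arid_lead, names, r_crit=0.22):
    """Keep indices whose |r| with lead-time ARID exceeds r_crit (step 8)."""
    keep = []
    for j, name in enumerate(names):
        r = np.corrcoef(ci_matrix[:, j], arid_lead)[0, 1]
        if abs(r) > r_crit:
            keep.append(name)
    # Preference rules from step 8: NAO over AMO, Nino-3.4 over JMA
    if "NAO" in keep and "AMO" in keep:
        keep.remove("AMO")
    if "Nino-3.4" in keep and "JMA" in keep:
        keep.remove("JMA")
    return keep

def first_principal_component(ci_matrix):
    """Return PC1 scores and the fraction of variance PC1 explains (step 11)."""
    ci_matrix = np.asarray(ci_matrix, dtype=float)
    # Standardization is an assumption; the sign of PC1 is arbitrary
    z = (ci_matrix - ci_matrix.mean(axis=0)) / ci_matrix.std(axis=0, ddof=1)
    if z.shape[1] == 1:
        return z[:, 0], 1.0  # a single index explains all of its own variance
    eigval, eigvec = np.linalg.eigh(np.cov(z, rowvar=False))
    # eigh returns eigenvalues in ascending order, so the last one is PC1
    return z @ eigvec[:, -1], eigval[-1] / eigval.sum()
```

With a single retained index the explained variance is 100%, which is consistent with the entries of 100 in the ‘‘X var’’ columns of Table 3 wherever only one CI is listed.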

The predictability of ARID for different lead times

was assessed by comparing the 1-, 2-, or 3-month lead-

time forecasts of ARID with observed values using the

root-mean-square error (RMSE). The comparisons

were carried out for each model, month, and location.
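The two evaluation measures used in this section, ME [Eq. (6)] and RMSE, can be written in a few lines of Python. The function names are illustrative only, and the RMSE is assumed to take its usual form because the text does not spell it out.

```python
import numpy as np

def modeling_efficiency(observed, predicted):
    """Nash-Sutcliffe modeling efficiency as described for Eq. (6)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = np.sum((observed - predicted) ** 2)
    spread = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / spread

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))
```

A forecast equal to the observed mean gives an ME of 0, a perfect forecast gives an ME of 1, and anything worse than the mean gives a negative ME, matching the interpretation given in step 12.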

b. Computations for ANN models

For ANN, a three-layer multilayer perceptron

(MLP) was used (Fig. 1) because it is the most widely

used network and provides a general framework for

nonlinear input–output mapping (Hornik et al. 1989;

Zhang et al. 1998; Samarasinghe 2007; Abbasi 2009).

A single-hidden-layer MLP was chosen because one

hidden layer is sufficient to approximate any complex

nonlinear function (Cybenko 1989; Hornik et al. 1989;

Zhang et al. 1998; Maier and Dandy 2000). To examine

how the number of nodes in a hidden layer of the net-

work would affect the forecasting ability of the net-

work and to decide on the number of nodes, we carried

out a sensitivity analysis on the number of nodes. The
ability of the network was measured in terms of
prediction error (RMSE): the lower the RMSE,
the better the forecasting ability. For this analysis, the
network was run with the number of nodes in the
hidden layer ranging from 1 to 10. The results showed

that increasing the number of nodes in a hidden layer

from 1 did not significantly improve the performance of

the network having one input node (Table 5). For sim-

plicity, therefore, we used just one node in the hidden

layer. Several other researchers, such as De Groot and
Wurtz (1991), Chakraborty et al. (1992), and Tang and
Fishwick (1993), also recommend using one node in
the hidden layer for a network that has just one
input node.

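TABLE 4. Correlations and associated P values among the CIs used to forecast ARID for the southeastern United States.

Correlation

| | AMO | NAO | JMA | Nino-3.4 | PDO | PNA |
|---|---|---|---|---|---|---|
| AMO | 1 | −0.17 | 0.08 | 0.08 | 0.02 | 0.12 |
| NAO | −0.17 | 1 | 0.01 | 0.02 | 0.02 | 0.03 |
| JMA | 0.08 | 0.01 | 1 | 0.83 | 0.44 | 0.16 |
| Nino-3.4 | 0.08 | 0.02 | 0.83 | 1 | 0.41 | 0.12 |
| PDO | 0.02 | 0.02 | 0.44 | 0.41 | 1 | 0.36 |
| PNA | 0.12 | 0.03 | 0.16 | 0.12 | 0.36 | 1 |

P value

| | AMO | NAO | JMA | Nino-3.4 | PDO | PNA |
|---|---|---|---|---|---|---|
| AMO | 1 | $1 \times 10^{-5}$ | 0.04 | 0.04 | 0.56 | $2 \times 10^{-3}$ |
| NAO | $1 \times 10^{-5}$ | 1 | 0.70 | 0.68 | 0.69 | 0.49 |
| JMA | 0.04 | 0.70 | 1 | ~0 | ~0 | $2 \times 10^{-5}$ |
| Nino-3.4 | 0.04 | 0.68 | ~0 | 1 | ~0 | $2 \times 10^{-3}$ |
| PDO | 0.56 | 0.69 | ~0 | ~0 | 1 | ~0 |
| PNA | $2 \times 10^{-3}$ | 0.49 | $2 \times 10^{-5}$ | $2 \times 10^{-3}$ | ~0 | 1 |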

FIG. 1. The architecture of an MLP neural network with one input layer, one hidden layer, and

one output layer, each with one neuron.


The MLP network estimated ARID with the following
steps (Fig. 1):

1) The network-input $X$ was fed into the hidden neuron,
and its weighted input $p$ was computed, using its layer
weight $a$ and bias weight $b$ as

$$p = aX + b. \tag{7}$$

2) In the hidden neuron, its output $h$ was computed
using $p$ in a hyperbolic tangent input transfer function
(Klimasauskas 1991; Maier and Dandy 1998;
Samarasinghe 2007) as

$$h = \frac{\exp(p) - \exp(-p)}{\exp(p) + \exp(-p)}. \tag{8}$$

3) The $h$ was then fed into the output neuron, and its
weighted input $q$ was computed, using its layer
weight $\alpha$ and bias weight $\beta$, as

$$q = \alpha h + \beta. \tag{9}$$

4) In the output neuron, the network output ($\widehat{\mathrm{ARID}}$)
was computed using $q$ in a linear output transfer
function (Rumelhart et al. 1995; Zhang et al. 1998;
Sahoo et al. 2005):

$$\widehat{\mathrm{ARID}} = q. \tag{10}$$

Step 4 concluded forward processing in the network,
and, with this, $\widehat{\mathrm{ARID}}$ was compared with the target
value (ARID) to compute the error $\varepsilon$, which was then
backpropagated through the network to update the
parameters $a$, $b$, $\alpha$, and $\beta$. For training, the gradient descent
method was used because it is efficient and popular
(Zhang et al. 1998; ASCE Task Committee on Application
of Artificial Neural Networks in Hydrology 2000;
Mishra and Desai 2006; Samarasinghe 2007). Learning
rate and momentum were set to 0.01 and 0.9, respectively,
assuming that the input–output relationships were
complex (Tang et al. 1991; Hagan et al. 1996; Principe
et al. 1999; Mishra and Desai 2006; MathWorks 2012).
For training, batch learning was used because it is the
most preferred method, forces the search in the
direction of the true gradient (Maier and Dandy 2000),
and avoids instability (Samarasinghe 2007).
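The forward pass in Eqs. (7)–(10) and batch gradient descent with momentum can be sketched for the one-input, one-hidden-node network as follows. The learning rate and momentum are those stated in the text. The mean-squared-error loss, the random initialization, the number of epochs, and the function names are assumptions made for illustration, and the sketch is not the toolbox routine used in the study.

```python
import numpy as np

def forward(x, a, b, alpha, beta):
    """Eqs. (7)-(10): hidden tanh neuron followed by a linear output neuron."""
    p = a * x + b          # Eq. (7)
    h = np.tanh(p)         # Eq. (8)
    q = alpha * h + beta   # Eqs. (9) and (10): network output
    return h, q

def mse(x, y, params):
    """Mean squared error of the network for a parameter vector."""
    _, q = forward(x, *params)
    return float(np.mean((q - y) ** 2))

def train(x, y, lr=0.01, momentum=0.9, epochs=2000, seed=0):
    """Batch gradient descent with momentum on the assumed MSE loss."""
    rng = np.random.default_rng(seed)
    params = rng.normal(scale=0.5, size=4)  # a, b, alpha, beta (assumed init)
    velocity = np.zeros(4)
    for _ in range(epochs):
        a, b, alpha, beta = params
        h, q = forward(x, a, b, alpha, beta)
        dq = 2.0 * (q - y) / len(x)           # dLoss/dq for every sample
        dp = dq * alpha * (1.0 - h ** 2)      # backpropagated through tanh
        grad = np.array([np.sum(dp * x), np.sum(dp),
                         np.sum(dq * h), np.sum(dq)])
        velocity = momentum * velocity - lr * grad
        params = params + velocity
    return params
```

Because the whole training set contributes to each update, the step follows the full-batch gradient, which is the property the text cites as the reason for preferring batch learning.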

Before feeding data into the network, each monthly

dataset of 56 input–output combinations was divided

into three subsets: 44 for training, 11 for testing, and 1 for

validation (Granger 1993). The testing set was used to

avoid overfitting the data, and the validation set was

used to evaluate the model. The performance of an

ANN model was evaluated applying the leave-one-out

technique of cross validation. Last, values of RMSE and

ME were computed for each month, location, and lead-

time forecast using the model-estimated and original

values of ARID.
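The partition described above can be sketched as a generator that holds out one year for validation and divides the remaining 55 into 44 training and 11 testing cases. The text does not state how the training and testing members were chosen, so the random shuffle, the seed, and the function name below are assumptions.

```python
import numpy as np

def leave_one_out_splits(n=56, n_train=44, seed=0):
    """Yield (train_idx, test_idx, validation_idx) for each held-out case."""
    rng = np.random.default_rng(seed)
    for held_out in range(n):
        rest = np.array([i for i in range(n) if i != held_out])
        rest = rng.permutation(rest)  # assumed: random assignment
        yield rest[:n_train], rest[n_train:], held_out
```

Each of the 56 passes would train on the first subset, stop training according to the error on the second, and record the forecast for the held-out year. The 56 held-out forecasts would then feed the RMSE and ME computations.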

c. Computations for ANFIS models

This study used the Sugeno-type ANFIS (Takagi and

Sugeno 1985) because it is computationally efficient and

is well suited for modeling nonlinear systems (Jang 1993;

Lee et al. 2007; MathWorks 2012). The first-order Su-

geno type was used because higher orders often add an

unwanted level of complexity (Lee et al. 2007). To

partition the input space, grid partitioning was used

because it gives more precise modeling (Altun et al.

2009). For training, the hybrid optimization method—

the combination of backpropagation and least squares

error—was used because of its high learning speed (Jang

1993; Lee et al. 2007). The ANFIS network estimated

ARID with the following steps (Fig. 2):

1) In the node of layer 1, the input $X$ was fuzzified using
a generalized bell-shaped (Marja and Esa 1997; Lee
et al. 2007; MathWorks 2012) input membership
function $f$, a curve that maps each element in the input
space to a membership value between 0 and 1 as

$$f_i = \frac{1}{1 + \left|(X - c_i)/e_i\right|^{2d_i}}, \tag{11}$$

where the subscript $i$ stands for the $i$th rule and $c$, $d$,
and $e$ are antecedent parameters that are updated
through backpropagation.

2) In the node of layer 2, firing strengths, also called
weights $w$, were computed for rule $i$. For the single-input
variable $X$, the same values of $f$ were used for
$w$ as follows,

$$w_i = f_i. \tag{12}$$

3) In the node of layer 3, the normalized weight $w_i^*$ of
rule $i$ was computed as the ratio of the $i$th rule’s
weight to the sum of the weights of all $n$ rules as

$$w_i^* = w_i \Big/ \sum_{i=1}^{n} w_i. \tag{13}$$

4) An output membership function $g$ was computed
from the network-input $X$ and the consequent parameters
that are updated through least squares
error, $\gamma$ and $\delta$, as

$$g_i = \gamma_i X + \delta_i. \tag{14}$$

5) In the node of layer 4, the fuzzy values were
defuzzified back to crisp output values as

$$O_i = w_i^* g_i, \tag{15}$$

where $O_i$ is the output contributed by the $i$th rule
toward the overall ANFIS output.

6) Last, $\widehat{\mathrm{ARID}}$, the overall output of the model, was
computed in the fifth layer by summing up the
outputs contributed by all rules:

$$\widehat{\mathrm{ARID}} = \sum_{i=1}^{n} O_i. \tag{16}$$

With the conclusion of forward processing, $\widehat{\mathrm{ARID}}$
was compared with the target value (ARID) to
compute the error $\varepsilon$, which was then backpropagated
through the network to update the antecedent
parameters (step 1).

TABLE 5. The RMSE values associated with 1-month-lead-time
forecasting of ARID for various months for Miami, using different
numbers of neurons in a hidden layer of the neural network having
one input node.

| Month | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Jan | 0.15 | 0.21 | 0.20 | 0.20 | 0.21 | 0.20 | 0.22 | 0.21 | 0.22 | 0.20 |
| Feb | 0.16 | 0.17 | 0.19 | 0.17 | 0.18 | 0.15 | 0.19 | 0.15 | 0.14 | 0.17 |
| Mar | 0.11 | 0.14 | 0.14 | 0.12 | 0.11 | 0.13 | 0.13 | 0.13 | 0.12 | 0.14 |
| Apr | 0.10 | 0.11 | 0.13 | 0.10 | 0.09 | 0.09 | 0.09 | 0.11 | 0.13 | 0.10 |
| May | 0.11 | 0.12 | 0.13 | 0.10 | 0.12 | 0.14 | 0.17 | 0.13 | 0.11 | 0.14 |
| Jun | 0.07 | 0.06 | 0.10 | 0.11 | 0.07 | 0.11 | 0.06 | 0.12 | 0.14 | 0.07 |
| Jul | 0.05 | 0.07 | 0.06 | 0.07 | 0.08 | 0.08 | 0.08 | 0.09 | 0.08 | 0.08 |
| Aug | 0.06 | 0.05 | 0.08 | 0.10 | 0.07 | 0.09 | 0.11 | 0.08 | 0.08 | 0.10 |
| Sep | 0.06 | 0.07 | 0.06 | 0.07 | 0.06 | 0.08 | 0.05 | 0.07 | 0.08 | 0.07 |
| Oct | 0.12 | 0.13 | 0.11 | 0.10 | 0.11 | 0.12 | 0.13 | 0.13 | 0.14 | 0.11 |
| Nov | 0.17 | 0.17 | 0.18 | 0.19 | 0.16 | 0.17 | 0.15 | 0.17 | 0.14 | 0.18 |
| Dec | 0.16 | 0.19 | 0.14 | 0.18 | 0.17 | 0.20 | 0.16 | 0.19 | 0.16 | 0.19 |

The column headings 1–10 give the number of neurons in the hidden layer.

Before feeding the data into the system, each monthly

dataset of 56 input–output combinations was divided

into three subsets: a training set (44), a testing set (11),

and a validation set (1). An ANFIS model was evaluated

by applying the leave-one-out cross-validation tech-

nique and computing RMSE and ME for each month,

location, and lead-time forecast using the estimated and

original values of ARID.
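The forward pass through the five ANFIS layers [Eqs. (11)–(16)] can be sketched for a single input as follows. Training by the hybrid method is not shown. The function name is hypothetical, the names `gamma` and `delta` stand for the two consequent parameters of Eq. (14), and the number of rules is set simply by the length of the parameter arrays, which is an assumption about how the rules would be stored.

```python
import numpy as np

def anfis_forward(x, c, d, e, gamma, delta):
    """Single-input, first-order Sugeno ANFIS output for one value of X."""
    c, d, e = (np.asarray(v, dtype=float) for v in (c, d, e))
    gamma = np.asarray(gamma, dtype=float)
    delta = np.asarray(delta, dtype=float)
    f = 1.0 / (1.0 + np.abs((x - c) / e) ** (2.0 * d))  # Eq. (11)
    w = f                                                # Eq. (12)
    w_norm = w / w.sum()                                 # Eq. (13)
    g = gamma * x + delta                                # Eq. (14)
    o = w_norm * g                                       # Eq. (15)
    return float(o.sum())                                # Eq. (16)
```

With two rules centered at −1 and 1 and constant consequents of 0.2 and 0.8, an input midway between the centers fires both rules equally and returns 0.5, while an input at −1 returns a value close to 0.2.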

d. Computations for LR models

The $m$-month-ahead forecasts of ARID were generated
as follows after estimating the coefficients $u$ and $y$
by regressing the lead-time values of ARID (ARID$_{t+m}$)
on the current values of $X$ ($X_t$):

$$\mathrm{ARID}_{t+m} = u X_t + y, \tag{17}$$

where $t$ is the month and $m$ is the lead time. Values of $m$
ranged from 1 to 3 months.

For regression and forecasting, respectively, training

and validation datasets were used. An LR model was

evaluated for each month, location, and lead-time

forecast using the leave-one-out technique of cross val-

idation and the RMSE and ME values.
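A leave-one-out evaluation of Eq. (17) can be sketched with an ordinary least squares fit. The function name is hypothetical, and the sketch assumes that all 55 remaining years are used to estimate the two coefficients for each held-out year, which the text does not state explicitly.

```python
import numpy as np

def loo_linear_forecasts(x_t, arid_lead):
    """Leave-one-out forecasts of lead-time ARID from the predictor X."""
    x_t = np.asarray(x_t, dtype=float)
    arid_lead = np.asarray(arid_lead, dtype=float)
    forecasts = np.empty_like(arid_lead)
    for k in range(len(arid_lead)):
        keep = np.ones(len(arid_lead), dtype=bool)
        keep[k] = False  # hold out year k
        # polyfit with degree 1 returns [slope, intercept], i.e. u and y in Eq. (17)
        slope_u, intercept_y = np.polyfit(x_t[keep], arid_lead[keep], 1)
        forecasts[k] = slope_u * x_t[k] + intercept_y
    return forecasts
```

The returned forecasts can be compared with the original ARID values to obtain the RMSE and ME for a given month, location, and lead time.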

e. Computations for ARMA models

For each location, 12 time series of ARID were cre-

ated, one for each month. For instance, one time series

comprised the ARID values from January 1951 through

December 2005, another series comprised the values

from February 1951 through January 2006, and the 12th

series comprised the values fromDecember 1951 through

November 2006. The time series ending in December, for

instance, was used to make forecasts for January 2006

(1 month ahead), February 2006 (2 months ahead), and

March 2006 (3 months ahead). Similarly, the series

ending in January was used to predict for February 2006

(1 month ahead), March 2006 (2 months ahead), and

FIG. 2. The architecture of an ANFIS with one input variable X.


April 2006 (3 months ahead), and so on. The monthly

time series were created with the assumption that weather

conditions vary across months. From each monthly

series, 16 subseries were created by truncating the

monthly series by 0–15 yr. For instance, the first subseries

of the series ending in December (mentioned above)

ended in December 2005, the second in December 2004,

and the 16th in December 1990. Accordingly, 16 ARMA

models were developed for each month. The subseries

were created for model evaluation purposes (to create

n observations), and 16 was chosen for n by consider-

ing 40 yr as the minimum length of the time series re-

quired to reflect any long-term periodicity in the data

(56 − 40 = 16).
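The truncation scheme can be illustrated with a short sketch. It assumes, following the example in the text, that the first subseries is the full series ending in 2005 and that each later subseries ends one year earlier; the function name is hypothetical.

```python
def subseries_end_years(last_end_year, n_subseries):
    """End years of the subseries, from the full series to the shortest."""
    return [last_end_year - k for k in range(n_subseries)]


# Series ending in December: 16 subseries ending in 2005, 2004, ..., 1990.
end_years = subseries_end_years(2005, 16)
print(end_years[0], end_years[-1], len(end_years))
```

Each subseries yields one set of 1-, 2-, and 3-month-ahead forecasts, so the 16 subseries provide 16 forecast and observation pairs per month for evaluation.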

The ARMA models were developed for each month

with the assumption that the weather conditions in this

region vary across months. The monthly ARMA models

would be consistent and thus be easy to compare with

the other methods for which the variation across months

in the degree of linkage between large-scale teleconnection

patterns and the weather in this region is assumed. A

seasonally integrated ARMA model, called the seasonal autoregressive integrated moving average (SARIMA) model, can satisfactorily describe a time series that exhibits nonstationarity not only across seasons but also

within a season (Mishra and Singh 2011).

To observe the relationships between the present and

past values of ARID, autocorrelation functions (ACF)

and partial autocorrelation functions (PACF) were

calculated for each of the 16 time series of each month

and location. In each series, the values of ACF and

PACF exhibited periodicities corresponding to the

positive correlation between values separated by 12

months, indicating a need for seasonal differencing

(Fig. 3a). Thus, each time series was seasonally differenced as $\mathrm{ARID}_t - \mathrm{ARID}_{t-12}$. To study the behavior of

the seasonally differenced series, its ACF and PACF

values were calculated and analyzed to find any correla-

tion among the ARID values. Because the ACF and

PACF values of the differenced series did not exhibit any

periodicities (Fig. 3b), no further differencing was

needed. Thus, the $D$ in an integrated ARMA model, ARIMA($j$, $d$, $k$) $\times$ ($J$, $D$, $K$)$_s$, was set to 1. The parameters $j$, $d$, $k$, $J$, $D$, $K$, and $s$ in the model denote the nonseasonal autoregressive (AR) model of order $j$, the nonseasonal difference of order $d$, the nonseasonal moving average (MA) model of order $k$, the seasonal AR model of order $J$, the seasonal difference of order $D$, the seasonal MA model of order $K$, and the seasonal lag $s$, respectively. The characteristics of the ACF and PACF of the seasonally differenced series showed a strong peak at a lag $s$ of 12 months in the ACF. Smaller peaks appeared at $s = 24$ and 36 in the ACF, combined with peaks at $s = 12$, 24, 36, and 48 in the PACF (Fig. 3a). The cutting off of the ACF after lag $s = 12$ months (a seasonal lag of 1 yr) and the tailing off of the PACF of the seasonally differenced series indicated a seasonal MA of order $K = 1$. Therefore, the orders of $J$, $D$, and $K$ for the seasonal part were 0, 1, and 1, respectively. The same orders were found for each series, month, and location. For the nonseasonal component, the ACF and PACF values of the seasonally differenced series within seasonal lags $l = 1, 2, \ldots, 11$ did not show any periodicities. Therefore, no nonseasonal differencing was needed ($d = 0$). Because the

FIG. 3. The (left) ACF and (right) PACF of (a) ARID and (b) differenced ARID for Miami in January.


nonseasonal behavior of ACF and PACF was the same

in each series for each month and location, the order of

$d$ was also the same for all cases. For the order of $j$ and $k$, the tailing off of ACF and the cutting off of PACF after lag $l = 1$ indicated a nonseasonal AR of order $j = 1$ and a nonseasonal MA of order $k = 0$ (Fig. 3b). Thus, the most appropriate ARMA model for each month and location was ARIMA(1, 0, 0) $\times$ (0, 1, 1)$_{12}$. Because

the ARMA model included a seasonal component

and differencing, it is hereinafter referred to as the

SARIMA model.
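The identification steps above rest on two operations: seasonal differencing at lag 12 and inspection of the sample ACF. A minimal sketch of both is given here on a synthetic monthly series; the data and function names are hypothetical, and the PACF and the model fitting itself are not reproduced.

```python
from math import cos, pi


def seasonal_difference(series, lag=12):
    """Return series[t] - series[t - lag] for every t with a full lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sample_acf(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((v - mean) ** 2 for v in series)
    acf = []
    for k in range(max_lag + 1):
        ck = sum((series[t] - mean) * (series[t - k] - mean)
                 for t in range(k, n))
        acf.append(ck / c0)
    return acf


# Synthetic 10-yr monthly series: an annual cycle plus a small perturbation.
monthly = [0.5 + 0.4 * cos(2 * pi * (t % 12) / 12) + 0.01 * ((7 * t) % 5 - 2)
           for t in range(120)]

acf_raw = sample_acf(monthly, 24)          # strong positive value at lag 12
differenced = seasonal_difference(monthly)
acf_diff = sample_acf(differenced, 24)     # annual cycle removed

print(round(acf_raw[12], 2), len(differenced))
```

In a library such as statsmodels, the selected model would correspond to `order=(1, 0, 0)` and `seasonal_order=(0, 1, 1, 12)` in `SARIMAX`; the paper does not state which software was used.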

Using 16 SARIMA models for each month, ARID

was forecast with 1-, 2-, and 3-month lead times. The

SARIMA models were then evaluated for each location,

month, and lead-time forecast using the RMSE and ME

as goodness-of-fit measures.

f. Computations for the ENSO approach

To characterize the ENSO episodes, generally the

JMA index is used because it is more sensitive to

El Niño and La Niña conditions than are the other indices (Trenberth 1997; Hanley et al. 2003). When the index is ≥0.5°C (≤−0.5°C) for six consecutive months, including October–December, an El Niño (La Niña) episode is defined (Hanley et al. 2003). The episode then

lasts from October through the following September.

The episode for all other values of the index is termed

neutral. This study, however, used the modified version

of the index because it gives more significant results than

does the original index (Keeling 2008; Gerard-Marchant

and Stooksbury 2010; Royce et al. 2011). In the modified

JMA index, the El Niño (La Niña) episode stops as soon

as the temperature conditions are no longer met

(Gerard-Marchant and Stooksbury 2010).

The monthly time series of ARID were separated into

three ENSO categories. In each category, the ARID

values were further separated into 12 months and individually averaged. The monthly average value of ARID

in a particular month and ENSO phase was considered

to be the forecast of ARID for that month during that

ENSO phase.
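The ENSO approach amounts to a climatology conditioned on phase and month. A minimal sketch follows, assuming each record carries a calendar month, an ENSO phase label, and an ARID value; the record layout, the labels, and the numbers are hypothetical.

```python
from collections import defaultdict


def enso_phase_forecast(records):
    """Mean ARID for each (phase, month) pair, used as the forecast."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for month, phase, arid in records:
        sums[(phase, month)] += arid
        counts[(phase, month)] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical records: (month number, ENSO phase, ARID).
records = [
    (1, "El Nino", 0.2),
    (1, "El Nino", 0.4),
    (1, "La Nina", 0.6),
    (1, "neutral", 0.5),
    (7, "neutral", 0.3),
]

forecast = enso_phase_forecast(records)
print(forecast[("El Nino", 1)])  # mean of the two El Nino Januaries
```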

g. Probabilistic forecasting

A probabilistic forecast is more "honest" than a deterministic forecast (Krzysztofowicz 2001) and is never

wrong (Mason 2002). Although deterministic forecasts

are more useful, they are more difficult to produce.

Therefore, most forecasts are probabilistic.

On the basis of ME, the three best-performing models

were selected for making the probabilistic forecasts of

ARID. From the 56 values of prediction error $\varepsilon$ (16 in the case of SARIMA), computed as estimated ARID minus original ARID, a cumulative distribution function was created, from which the probability of $\varepsilon$ falling between the negative and positive values of the prediction error threshold $E$, denoted as $P(-E < \varepsilon < E)$, was computed as $P(\varepsilon < E)$ minus $P(\varepsilon < -E)$. The $P(-E < \varepsilon < E)$ value was then used as the probability of $(F - E) < \mathrm{ARID} < (F + E)$, where $F$ is the deterministic forecast made by a model. Such computations were made for each location, month, and model. For $E$, 0.1 was

used to indicate a small error.
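The probability computation above can be sketched with an empirical cumulative distribution function. The error values are hypothetical and the function names are illustrative only.

```python
def empirical_cdf(errors, x):
    """Fraction of prediction errors strictly below x."""
    return sum(1 for e in errors if e < x) / len(errors)


def prob_within(errors, threshold):
    """P(error < E) minus P(error < -E) for threshold E."""
    return empirical_cdf(errors, threshold) - empirical_cdf(errors, -threshold)


# Hypothetical prediction errors (estimated ARID minus original ARID).
errors = [-0.30, -0.12, -0.05, 0.0, 0.04, 0.08, 0.15, 0.25]

p = prob_within(errors, 0.1)
forecast = 0.26
print(f"P({forecast - 0.1:.2f} < ARID < {forecast + 0.1:.2f}) = {p:.2f}")
```

Here four of the eight errors fall inside the threshold, so the deterministic forecast of 0.26 is accompanied by a probability of 0.50 that ARID lies within 0.1 of it.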

3. Results and discussion

a. Comparing the forecasting models

The modeling efficiency of each of the four models

(ANFIS, ANN, LR, and SARIMA) varied depending

on the month and location (Fig. 4). The variation in the

efficiencies of the models gave an indication that, for

a specific month at a specific location, one specific model

may be more appropriate than the others. In general, the

efficiencies of the CI-based models (ANFIS, ANN, and

LR) and the ENSO approach were large during the

winter and small during the summer in most of the lo-

cations. Furthermore, these models indicated an ac-

ceptable level of performance during the winter, as their

ME values were generally positive. During the summer,

they indicated an unacceptable level of performance, as

their ME values were mostly negative. This seasonal

difference was possibly due to the stronger signals of

ENSO and large-scale teleconnections during the winter

than in the summer. The time series–based model,

SARIMA, however, did not show this pattern. Its effi-

ciency largely depended on the periodicity of the time

series data. For instance, SARIMA was highly efficient

for a month that received a large amount of precip-

itation in each year of the time series.

In general, the efficiency of each model, including the

ENSO approach, was better in southern locations, such

as Miami, Bartow, and Live Oak, than in the northern

ones, such as Plains and Blairsville (Fig. 4). The mod-

eling efficiencies decreased in the direction of south to

north. Miami, in southern Florida, had the largest effi-

ciencies, whereas Blairsville, in northern Georgia,

showed the lowest efficiencies. This differential effi-

ciency probably arose because of stronger ENSO and

teleconnection signals and larger precipitation period-

icities in the south than in the north.

In terms of modeling efficiency, ANN generally

demonstrated the best performance for southern loca-

tions such as Miami, Bartow, and Live Oak. Of the five

approaches compared, the performances of LR and

ANFIS were generally poor for all locations. The poor

efficiency of LR was likely the result of its being a linear


model, such that it could not represent the nonlinearity

that possibly exists between ARID and climate indices.

For ANFIS, even though it was supposed to account for

nonlinearity in a system, it performed poorly. The poor

performance was most probably due to limited data

(Costa Branco and Dente 2001; Marce et al. 2004; Sahoo

et al. 2005). The total number of parameters associated

with an ANFIS model was 15, whereas the number of

samples in the training dataset was only 44. Because the

training data points were fewer than 4 times the number

of model parameters, a threshold value (Marce et al.

2004), the small training dataset probably caused over-

fitting so that the generality was lost. Because of limited

data, ANFIS probably could not generate proper fuzzy

rules and derive all the complex information of the

system properly. The use of ANN, on the other hand,

does not necessarily require a large sample size (Zhang

et al. 1998). The ANN models perform well even with

a sample size of less than 50 (Kang 1991). Being a uni-

versal approximator because of its ability to approxi-

mate any nonlinear relationship between inputs and

outputs to any degree of accuracy (Hornik et al. 1989;

Smith 1996; Zhang et al. 1998; Samarasinghe 2007),

ANN performed better than the other CI-based

models—namely, ANFIS and LR. The better perfor-

mance of ANN for southern locations than for northern

locations indicated that the relationships between cli-

mate indices and ARID are stronger in the southern

locations. Probably because of weak teleconnections

between the precipitation pattern in northern locations

and the climate indices, ANN performed poorly for

Plains and Blairsville. Generally, ENSO- and CI-based

methods—namely, ANFIS, ANN, and LR—performed

poorly for the northern part of the region. The perfor-

mance of CI-based methods was even poorer than that

of the ENSO approach, indicating that the signals of the

large-scale teleconnection patterns represented by the

climate indices used in this study are even weaker than

those of the ENSO (i.e., JMA) in northern locations.

Also in southern locations, the ENSO approach per-

formed better than the other models for some months:

February and April for Miami; January, April, and

October for Bartow; and July–October for Live Oak.

These results indicated that the ENSO signal is stronger

in these months and locations than are the tele-

connections represented by the climate indices. For the

other months, signals represented by the climate indices

seemed stronger than that of the ENSO. The efficiency

of SARIMA models also varied depending upon month

and location. Generally, the performance of SARIMA

was poorer than those of ANN and ENSO for most of

the months in southern locations. The poor performance

probably resulted because a SARIMA model assumes

that time series are generated from linear processes,

whereas the drought time series are often nonlinear and

seasonal (Granger and Terasvirta 1993; Zhang et al.

1998). A SARIMA model generally performs better

when periodicity exists in the data, as the periodicity is

taken into account by the model by differencing the data

FIG. 4. Values of ME for various models associated with 1-month-ahead forecasting across months for (a) Miami, (b) Bartow, (c) Live Oak, (d) Plains, and (e) Blairsville. Note that J = January, ..., D = December indicate the 12 months.


(Mishra et al. 2007). Probably because of the lack of

seasonality in the data in these months, the SARIMA

models performed poorly. In the northern region,

however, especially for Plains, SARIMA performed

better than the other models. Even in southern loca-

tions, the performance was better than those of the

other approaches for some months: January, August,

September, and December for Miami; June and November

for Bartow; and April and June for Live Oak. The better

performance in these months was probably due to the ex-

istence of seasonality and nonstationarity in the time se-

ries data. For instance, December and January in Miami

are dry months as they usually receive less precipitation,

whereas August and September are wet months during the hurricane season. Therefore, SARIMA models performed

better when periodicity existed in the time series.

Results indicated that ARID may be forecast more

accurately for several months in most locations by using

one or more models than by using the current ENSO

approach (Table 6). For instance, whereas the fore-

casting of ARID could be improved for January in Mi-

ami by using the time series–based SARIMA models,

the index could be better forecast for March and May–

December by using the CI-based ANN models. Simi-

larly, forecasting for December in Miami could be

improved with any of the following models: ANFIS,

ANN, LR, or SARIMA. Forecasting of ARID could

not be improved with the CI-based models for the other

months, probably because of the weaker signals of the

large-scale teleconnections than that of the ENSO.

Similarly, the SARIMA models could not generate

more accurate forecasts than that of ENSO because of

the lack of strong periodicity in the time series data in

these months. Table 6 also shows the best method in

terms of modeling efficiency for each month–location

combination. In general, ANN had the highest efficiency

for southern locations such as Miami, Bartow, and Live

Oak. The SARIMA model performed best for a mid-

northern location such as Plains, whereas for Blairsville,

the northernmost location in the region, the perfor-

mance of the ENSO approach was the highest of all

methods. The probable causes of these phenomena are

explained in the preceding paragraph.

b. Lead-time forecasts

The RMSE associated with 1-, 2-, and 3-month-ahead

forecasting for each month and location varied depending on the models used. For the CI-based models (ANFIS, ANN, and LR), the occurrence of $\mathrm{RMSE}_{t+1} < \mathrm{RMSE}_{t+2} < \mathrm{RMSE}_{t+3}$ was less than 25% (Fig. 5). The terms $\mathrm{RMSE}_{t+1}$, $\mathrm{RMSE}_{t+2}$, and $\mathrm{RMSE}_{t+3}$ denote the RMSE associated with 1-, 2-, and 3-month-ahead forecasting, respectively. The percentage of occurrence shows the number of times $\mathrm{RMSE}_{t+1} < \mathrm{RMSE}_{t+2} < \mathrm{RMSE}_{t+3}$ occurred out of the total 60 cases (5 locations $\times$ 12 months). For SARIMA, the occurrence of $\mathrm{RMSE}_{t+1} < \mathrm{RMSE}_{t+2} < \mathrm{RMSE}_{t+3}$ was 50% (i.e., 30 out of 60). Although $\mathrm{RMSE}_{t+1} < \mathrm{RMSE}_{t+2} < \mathrm{RMSE}_{t+3}$ did not

occur for 100% of the cases with each model, the results

gave an indication that short-term forecasting (e.g., for

1 month in advance) can generally make more accurate

estimations than can long-term forecasting (e.g., for 2 or

3 months in advance). The SARIMA models produced

less accurate forecasts than did the CI-based models for

the months that were more distant because of weaker

autocorrelations in the time series data with increasing

lag times. The occurrence of $\mathrm{RMSE}_{t+1} < \mathrm{RMSE}_{t+2} < \mathrm{RMSE}_{t+3}$ for fewer cases with the CI-based models than

TABLE 6. The relative ME values, computed as the ME of a method divided by the ME of the ENSO approach, showing the performances of various methods relative to those of the ENSO approach for different months and locations. Relative ME values of greater (less) than 1 indicate better (worse) performance than that of the ENSO approach. The largest (boldface) value in a month–location grid (four grid cells) corresponds to the best method for that grid. For the grids without boldface values, the ENSO approach was the best. Here, A = ANFIS, N = ANN, R = LR, and S = SARIMA.

| Month | Miami A | Miami N | Miami R | Miami S | Bartow A | Bartow N | Bartow R | Bartow S | Live Oak A | Live Oak N | Live Oak R | Live Oak S | Plains A | Plains N | Plains R | Plains S | Blairsville A | Blairsville N | Blairsville R | Blairsville S |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jan | 0.6 | 0.6 | 0.5 | 1.1 | 0.6 | 0.7 | 0.8 | 0.0 | 0.3 | 1.5 | 0.4 | −0.7 | −0.8 | −1.0 | −1.9 | 4.0 | −1.6 | −4.7 | −3.4 | 0.0 |
| Feb | 0.5 | 0.5 | 0.3 | 0.6 | 0.5 | 1.1 | 0.6 | 0.5 | 3.5 | 3.7 | 2.8 | 1.5 | −1.6 | 0.3 | −0.3 | 1.9 | −4.8 | −3.7 | −2.9 | −3.2 |
| Mar | 1.4 | 1.4 | 1.4 | 0.2 | 0.5 | 1.1 | 0.8 | 0.6 | 2.6 | 2.5 | 2.8 | −0.6 | 0.8 | 0.9 | 0.2 | −0.5 | −0.9 | −1.3 | −1.5 | −1.6 |
| Apr | 0.5 | 0.8 | 0.2 | 0.1 | 0.6 | 0.5 | 0.3 | 0.8 | −2.6 | 3.6 | −1.5 | 7.1 | −8.7 | −0.8 | −4.4 | 9.7 | 0.1 | 0.9 | 0.3 | 0.6 |
| May | 14.0 | 11.6 | 2.1 | −36.2 | 0.4 | 3.0 | −0.3 | −14.9 | −7.2 | 3.1 | −5.4 | −18.3 | −5.7 | −2.5 | −3.6 | −1.3 | −24.9 | −16.6 | −18.0 | −9.2 |
| Jun | −13.4 | 7.9 | 2.4 | −4.7 | −11.5 | 2.9 | −2.5 | 5.9 | −3.4 | −2.1 | −2.4 | 2.7 | −65.6 | −22.4 | −32.3 | 0.2 | −4.4 | −1.2 | −1.8 | −0.8 |
| Jul | −0.5 | 4.1 | 0.4 | −3.2 | −4.3 | 1.6 | −1.8 | −8.0 | −2.0 | −2.3 | −2.7 | −0.7 | −8.7 | −1.8 | −5.6 | 3.9 | −2.6 | 0.3 | −1.2 | −0.9 |
| Aug | −5.4 | 1.0 | 0.1 | 12.8 | −19.1 | 2.6 | −13.0 | −31.3 | −7.8 | −3.9 | −5.1 | −2.0 | −2.1 | −0.3 | −2.3 | −1.4 | −1.3 | −0.2 | −1.1 | 0.5 |
| Sep | 16.0 | 5.8 | −3.1 | 84.5 | −2.9 | 1.8 | −2.8 | −5.5 | −4.1 | −0.8 | −2.7 | 0.4 | −4.8 | −4.3 | −6.9 | 5.0 | −4.2 | −0.5 | −1.8 | −0.2 |
| Oct | −2.2 | 2.8 | 0.2 | 0.9 | −12.1 | −4.7 | −5.9 | −39.5 | −0.8 | −0.1 | −0.1 | −1.8 | 15.3 | −17.2 | 5.9 | 17.8 | −1.6 | −0.5 | −2.8 | −4.1 |
| Nov | 0.9 | 4.4 | 0.8 | 0.0 | 0.1 | 1.8 | 0.1 | 2.2 | 2.4 | 4.7 | 4.6 | −1.0 | 0.8 | 2.1 | 1.5 | 2.4 | −0.5 | 0.0 | −1.1 | 1.4 |
| Dec | 2.2 | 2.5 | 2.2 | 2.9 | 0.5 | 1.8 | 0.3 | 1.4 | 1.2 | 3.3 | 1.2 | 2.2 | 0.4 | 2.7 | 1.4 | 0.0 | −2.4 | −2.9 | −2.9 | 0.0 |


with the SARIMA models gave an impression that, al-

though ARID is slightly more correlated with climate

indices with shorter lags, the correlations between

ARID and the climate indices with longer lags are also

possible.
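The percentage of occurrence discussed in this section can be computed as in the sketch below, which counts the month and location cases whose RMSE increases strictly with lead time. The RMSE triples are hypothetical and the function name is illustrative.

```python
def percent_increasing(rmse_by_case):
    """Percentage of cases with RMSE(t+1) < RMSE(t+2) < RMSE(t+3)."""
    hits = sum(1 for r1, r2, r3 in rmse_by_case if r1 < r2 < r3)
    return 100.0 * hits / len(rmse_by_case)


# Hypothetical (1-, 2-, 3-month-ahead) RMSE values for four cases.
cases = [
    (0.10, 0.12, 0.15),  # error grows with lead time
    (0.20, 0.18, 0.25),  # not monotonic
    (0.05, 0.07, 0.09),  # error grows with lead time
    (0.30, 0.30, 0.31),  # tie, so not strictly increasing
]

print(percent_increasing(cases))
```

In the study the same count would run over 60 cases (5 locations and 12 months) per model.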

c. Probabilistic forecasting

The distribution of $\varepsilon$ in terms of the box-and-whiskers

plots for each of the 12 months, five locations, and the

three methods (ANN, ENSO, and SARIMA) whose

performances were better than those of the others as

shown by the ME values presented in section 3a are

presented in Fig. 6. In line with the modeling efficiency

results, the performance of each method in terms of

producing small errors varied depending on months and

locations. For instance, the interquartile range of errors

for Miami in January was the smallest with the ENSO

approach, whereas that for Miami in March was the

smallest with ANN. Similarly, ANN produced the

smallest errors for Miami in June and July, whereas

SARIMA produced the smallest errors in August and

September.

The probability of $\varepsilon$ falling between −0.1 and 0.1, denoted as $P(-0.1 < \varepsilon < 0.1)$ and computed for each location, month, and method, is presented in Table 7. From this table, the probabilistic forecasts of ARID can be made. For instance, $P(-0.1 < \varepsilon < 0.1)$ for Miami in January as estimated by the ENSO approach is 0.39. If the forecast $F$ made by this approach for Miami in January were 0.26, $P(F - 0.1 < \mathrm{ARID} < F + 0.1)$, that is, $P(0.16 < \mathrm{ARID} < 0.36)$, for this month and location would be 0.39. That is, there would be a 39% chance that the actual ARID would fall inside 0.26 ± 0.1.

FIG. 5. The percentage of occurrence of $\mathrm{RMSE}_{t+1} < \mathrm{RMSE}_{t+2} < \mathrm{RMSE}_{t+3}$ for different forecasting models, where $\mathrm{RMSE}_{t+1}$, $\mathrm{RMSE}_{t+2}$, and $\mathrm{RMSE}_{t+3}$ denote the RMSE values associated with 1-, 2-, and 3-month-ahead forecasting, respectively. There were 60 cases in total (5 locations $\times$ 12 months), and therefore 50% means 30 cases.

FIG. 6. Box-and-whiskers plots of errors, computed as the difference between the original and model-estimated values of ARID, for

various months, locations, and models. Note that E = ENSO, N = ANN, and S = SARIMA.


Similarly, if the forecast made by an ANN model for

Bartow in July were 0.27, $P(F - 0.1 < \mathrm{ARID} < F + 0.1)$, that is, $P(0.17 < \mathrm{ARID} < 0.37)$, for this month and location would be 0.82 (Table 7).

The large probability values in Table 7 may be due to

extremely wet conditions, extremely dry conditions, or

good teleconnections between the large-scale oceanic–

atmospheric signals (climate indices) and ARID in this

region. Under extremely wet conditions, almost all values

of ARID are 0, such as those in January, February, and

December for Plains and Blairsville (Figs. 7d,e). Under

extremely dry conditions, values of ARID concentrate

at 1. Thus, the width of the prediction error is almost

zero, and $P(-0.1 < \varepsilon < 0.1)$ is high in both severely

wet and dry conditions. When the distribution of ARID

is not close to 0 or 1—such as in July, August, and

September for Miami (Fig. 7a), July and August for

Bartow (Fig. 7b); and February for Live Oak (Fig. 7c)—

the large value of $P(-0.1 < \varepsilon < 0.1)$ indicates a strong

teleconnection between climate indices and ARID and

good performance of a forecasting model such as ANN.

Figure 7 also illustrates 75% probability that the mean

value of ARID as forecast by an ANN model falls inside

a certain range. For January in Miami, for instance,

there is 75% chance that the forecast of ARID falls in-

side 0.30 and 0.63 (Fig. 7). The 75% probability was

computed as the difference between the 87.5th percen-

tile and the 12.5th percentile.
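The 75% range described above is bounded by the 12.5th and 87.5th percentiles. A minimal sketch using the Python standard library follows; the sample values are hypothetical, and the percentile interpolation method used in the study is not stated, so the library default is assumed.

```python
from statistics import quantiles


def central_75_interval(values):
    """Return the 12.5th and 87.5th percentiles of the values."""
    # n=8 gives seven cut points at 12.5%, 25%, ..., 87.5%.
    cuts = quantiles(values, n=8)
    return cuts[0], cuts[-1]


# Hypothetical forecast values spread evenly between 0 and 1.
values = [i / 100 for i in range(101)]

low, high = central_75_interval(values)
inside = sum(1 for v in values if low < v < high) / len(values)
print(round(low, 4), round(high, 4), round(inside, 2))
```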

4. Conclusions

This study attempted to verify whether the use of

climate indices (namely, AMO, NAO, JMA, Niño-3.4, PDO, and PNA) could improve the current level of

drought forecasting, which is essentially ENSO based,

for the southeastern United States and also examined

the performance of ANFIS, ANN, ARMA, and LR

models relative to that of the ENSO approach in fore-

casting ARID, an agricultural drought index.

The results indicated that improved forecasting may

be possible for several months in most locations of the

region with the use of climate indices and forecasting

models such as ANN and SARIMA. For instance,

ARID forecasting could be improved for January in

Miami using the SARIMA models, whereas the index

could be better forecast for March–December using the

ANN models.

The results showed that the climate indices used in

this study have good potential for use in drought fore-

casting for both winter and summer months, especially

for the southern part of the region. Thus, the climate

indices can potentially contribute to a better forecast of

droughts.

The performance of the ANFIS, ANN, ENSO, LR,

and SARIMA models varied depending upon month and

location. In general, climate index–based models and

the ENSO approach performed better during the winter

than during the summer because of stronger signals of

the ENSO and large-scale teleconnections in the winter

than in the summer. The efficiency of SARIMA models

depended largely on the periodicity of precipitation in

the data. The ANN models generally outperformed the

other models for the southern part of the region,

whereas the ENSO approach performed better than the

other models for the northern part. The SARIMA

models were best for the central part of the region. In

general, LR and ANFIS performed poorly, as the for-

mer could not represent the nonlinearity between

ARID and the climate indices and the latter lost gen-

erality mainly because of limited data. The relatively

better performance of ANN was due to its ability to

approximate the nonlinear relationship between input

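TABLE 7. The $P(-0.1 < \varepsilon < 0.1)$ values as estimated by various models for different months and locations in the southeastern United States. Here, E = ENSO, N = ANN, and S = SARIMA.

| Month | Miami E | Miami N | Miami S | Bartow E | Bartow N | Bartow S | Live Oak E | Live Oak N | Live Oak S | Plains E | Plains N | Plains S | Blairsville E | Blairsville N | Blairsville S |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jan | 0.39 | 0.29 | 0.16 | 0.23 | 0.14 | 0.36 | 0.44 | 0.48 | 0.56 | 0.94 | 0.91 | 0.96 | 0.99 | 0.98 | 0.96 |
| Feb | 0.32 | 0.34 | 0.36 | 0.30 | 0.30 | 0.16 | 0.45 | 0.64 | 0.44 | 0.89 | 0.88 | 0.92 | 0.99 | 0.98 | 0.96 |
| Mar | 0.32 | 0.38 | 0.36 | 0.34 | 0.27 | 0.20 | 0.39 | 0.43 | 0.24 | 0.53 | 0.55 | 0.52 | 0.94 | 0.91 | 0.84 |
| Apr | 0.49 | 0.48 | 0.60 | 0.33 | 0.36 | 0.28 | 0.31 | 0.34 | 0.32 | 0.37 | 0.29 | 0.40 | 0.56 | 0.54 | 0.52 |
| May | 0.45 | 0.46 | 0.28 | 0.35 | 0.39 | 0.40 | 0.35 | 0.38 | 0.44 | 0.37 | 0.36 | 0.32 | 0.49 | 0.45 | 0.40 |
| Jun | 0.52 | 0.50 | 0.32 | 0.42 | 0.50 | 0.36 | 0.46 | 0.41 | 0.40 | 0.39 | 0.38 | 0.28 | 0.46 | 0.41 | 0.40 |
| Jul | 0.67 | 0.75 | 0.60 | 0.72 | 0.82 | 0.60 | 0.53 | 0.54 | 0.64 | 0.51 | 0.38 | 0.52 | 0.38 | 0.34 | 0.28 |
| Aug | 0.59 | 0.63 | 0.68 | 0.67 | 0.64 | 0.56 | 0.49 | 0.52 | 0.52 | 0.42 | 0.43 | 0.40 | 0.44 | 0.45 | 0.36 |
| Sep | 0.80 | 0.77 | 0.84 | 0.46 | 0.46 | 0.32 | 0.46 | 0.36 | 0.40 | 0.34 | 0.36 | 0.20 | 0.35 | 0.27 | 0.32 |
| Oct | 0.38 | 0.36 | 0.28 | 0.34 | 0.32 | 0.12 | 0.33 | 0.29 | 0.32 | 0.30 | 0.30 | 0.32 | 0.18 | 0.38 | 0.28 |
| Nov | 0.22 | 0.23 | 0.16 | 0.24 | 0.30 | 0.24 | 0.24 | 0.27 | 0.20 | 0.28 | 0.38 | 0.32 | 0.51 | 0.68 | 0.76 |
| Dec | 0.32 | 0.32 | 0.28 | 0.23 | 0.23 | 0.32 | 0.20 | 0.20 | 0.04 | 0.73 | 0.73 | 0.84 | 0.97 | 0.98 | 0.96 |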


and output. The climate index–based models performed

worse than the ENSO approach for northern locations,

indicating that the signals of the large-scale tele-

connection patterns represented by the climate indices

are even weaker than those of ENSO (i.e., JMA) in the

northern part of the region. In general, the performance

of each model decreased in the direction of south to

north as a result of weaker ENSO and teleconnection

signals and less-distinct precipitation periodicities in the

north than in the south.

Acknowledgments. The authors thank the USDA

National Institute for Food and Agriculture and the

NOAA Regional Integrated Sciences and Assessments

program for supporting this work through grants.

REFERENCES

Abbasi, B., 2009: A neural network applied to estimate process

capability of non-normal processes. Expert Syst. Appl., 36,

3093–3100.

Allen, R. G., L. S. Pereira, D. Raes, and M. Smith, 1998: Crop

evapotranspiration: Guidelines for computing crop water re-

quirements. FAO Irrigation and Drainage Paper 56, 326 pp.

——, I. A. Walter, R. L. Elliott, T. A. Howell, D. Itenfisu, M. E.

Jensen, and R. L. Snyder, 2005: The ASCE Standardized

Reference Evapotranspiration Equation. American Society of

Civil Engineers, 216 pp.

Altun, S., A. B. Goktepe, A. M. Ansal, and C. Akguner, 2009:

Simulation of torsional shear test results with neuron-fuzzy

control system. Soil Dyn. Earthquake Eng., 29, 253–260.

ASCE Task Committee on Application of Artificial Neural Net-

works in Hydrology, 2000: Artificial neural networks in

hydrology. II: Hydrologic applications. J. Hydrol. Eng., 5, 124–137.

ASCE Task Committee on Definition of Criteria for Evaluation of

Watershed Models of the Watershed Management Commit-

tee, Irrigation and Drainage Division, 1993: Criteria for

evaluation of watershed models. J. Irrig. Drain. Eng., 119, 429–442.

Bacanli, U. G., M. Firat, and F. Dikbas, 2008: Adaptive neuron-

fuzzy inference system for drought forecasting. Stochastic

Environ. Res. Risk Assess., 23, 1143–1154.

Belsley, D. A., E. Kuh, and R. E. Welsch, 1980: Regression Di-

agnostics: Identifying Influential Data and Sources of Collin-

earity. Wiley-Interscience, 292 pp.

Bond, P., R. Dada, and G. Erion, 2007: Climate Change, Carbon

Trading and Civil Society: Negative Returns on South African

Investments. Rozenberg Publishers, 190 pp.

Box, G. E. P., and G. M. Jenkins, 1970: Time Series Analysis,

Forecasting, and Control. Holden-Day, 553 pp.

Brolley, J. M., J. J. O’Brien, J. Schoof, and D. Zierden, 2007: Ex-

perimental drought threat forecast for Florida. Agric. For.

Meteor., 145, 84–96.

Campbell, K. O., 1973: The future role of agriculture in the

Australian economy. The Environmental, Economic, and

Social Significance of Drought, J. V. Lovett, Ed., Angus and

Robertson, 3–14.

Cancelliere, A., G. D. Mauro, B. Bonaccorso, and G. Rossi, 2007:

Drought forecasting using the standardized precipitation in-

dex. Water Resour. Manage., 21, 801–819.

Chakraborty, K., K. Mehrotra, C. K. Mohan, and S. Ranka, 1992:

Forecasting the behavior of multivariate time series using

neural networks. Neural Networks, 5, 961–970.

Chang, F. J., and Y. T. Chang, 2006: Adaptive neuron-fuzzy in-

ference system for prediction of water level in reservoir. Adv.

Water Resour., 29, 1–10.

Chiew, F. H. S., T. C. Piechota, J. A. Dracup, and T. A. McMahon, 1998: El Niño/Southern Oscillation and Australian rainfall, stream flow, and drought: Links and potential for forecasting. J. Hydrol., 204, 138–149.

FIG. 7. Distributions of ARID with the mean and the 12.5th- and 87.5th-percentile values as forecast by ANN models across months for (a) Miami, (b) Bartow, (c) Live Oak, (d) Plains, and (e) Blairsville. Also shown are values of $P(-0.1 < \text{prediction error} < 0.1)$.

Cordery, I., and M. McCall, 2000: A model for forecasting drought

from teleconnections. Water Resour. Res., 36, 763–768.

Costa Branco, P. J., and J. A. Dente, 2001: Fuzzy systems modeling in practice. Fuzzy Sets Syst., 121, 73–93.

Cutore, P., G. D. Mauro, and A. Cancelliere, 2009: Forecasting

Palmer index using neural networks and climatic indexes.

J. Hydrol. Eng., 14, 588–595.

Cybenko, G., 1989: Approximation by superposition of a sigmoidal

function. Math. Control Signals Syst., 2, 303–314.

De Groot, C., and D. Wurtz, 1991: Analysis of univariate time

series with connectionist nets: A case study of two classical

examples. Neurocomputing, 3, 177–192.

Enfield, D. B., A. M. Mestas-Nunez, and P. J. Trimble, 2001: The

Atlantic multidecadal oscillation and its relation to rainfall

and river flows in the continental U.S. Geophys. Res. Lett., 28,

2077–2080.

FEMA, 1995: National Mitigation Strategy: Partnerships for

building safer communities. Federal Emergency Management

Agency Rep., 45 pp. [Available online at http://www.fas.org/

irp/agency/dhs/fema/mitigation.pdf.]

Firat, M., and M. Gungor, 2008: Hydrological time-series modeling

using adaptive neuron-fuzzy inference system. Hydrol. Pro-

cesses, 22, 2122–2132.

Gerard-Marchant, P., and D. Stooksbury, 2010: Impact of El Nino/

Southern Oscillation on low-flows in south Georgia, USA.

Southeast. Geogr., 50, 218–243.

Gershunov, A., and T. P. Barnett, 1998a: ENSO influence on in-

traseasonal extreme rainfall and temperature frequencies in

the contiguous United States: Observations and model results.

J. Climate, 11, 1575–1586.

——, and ——, 1998b: Interdecadal modulation of ENSO tele-

connections. Bull. Amer. Meteor. Soc., 79, 2715–2725.

Granger, C. W. J., 1993: Strategies for modeling nonlinear time

series relationships. Econ. Rec., 69, 233–238.

——, and T. Terasvirta, 1993: Modeling Nonlinear Economic Re-

lationships. Oxford University Press, 198 pp.

Hagan, M. T., H. P. Demuth, and M. Beale, 1996: Neural Network

Design. PWS Publishing, 736 pp.

Hagemeyer, B. C., 2006: ENSO, PNA and NAO scenarios for ex-

treme storminess, rainfall and temperature variability during

the Florida dry season. Preprints, 18th Conf. on Climate Var-

iability and Change, Atlanta, GA, Amer. Meteor. Soc., P2.4.

[Available online at http://ams.confex.com/ams/pdfpapers/

98077.pdf.]

Hanley, D. E., M. A. Bourassa, J. J. O’Brien, S. R. Smith, and E. R.

Spade, 2003: A quantitative evaluation of ENSO indices.

J. Climate, 16, 1249–1258.

Hansen, J. W., A. W. Hodges, and J. W. Jones, 1998: ENSO in-

fluences on agriculture in the southeastern United States.

J. Climate, 11, 404–411.

Hastenrath, S., and L. Greischar, 1993: Further work on the pre-

diction of Northeast Brazil rainfall anomalies. J. Climate, 6,

743–758.

Hornik, K., M. Stinchcombe, and H. White, 1989: Multi-layer feedforward networks are universal approximators. Neural Networks, 2, 359–366.

Jang, J. R., 1993: ANFIS: Adaptive-network-based fuzzy inference

system. IEEE Trans. Syst. Man Cybern., 23, 665–685.

Kang, S. Y., 1991: An investigation of the use of feedforward neural

networks for forecasting. Ph.D. dissertation, Kent State Uni-

versity, 162 pp.

Keeling, T. B., 2008: Modified JMA ENSO index and its im-

provements to ENSO classification. M.S. thesis, Dept. of

Meteorology, The Florida State University, 117 pp. [Available

online at etd.lib.fsu.edu/theses_1/available/etd-12222008-094254/

unrestricted/KeelingTThesis.pdf.]

Kim, T., and J. B. Valdes, 2003: Nonlinear model for drought

forecasting based on a conjunction of wavelet transforms and

neural networks. J. Hydrol. Eng., 8, 319–328.

Klimasauskas, C. C., 1991: Applying neural networks. Part III:

Training a neural network. PC AI, 5, 20–24.

Konrad, C. E., 1998: Intramonthly indices of the Pacific/North

American teleconnection pattern and temperature regimes

over United States. Theor. Appl. Climatol., 60, 11–19.

Krzysztofowicz, R., 2001: The case for probabilistic forecasting in

hydrology. J. Hydrol., 249, 2–9.

Kumar, V., and U. Panu, 1997: Predictive assessment of severity of

agricultural droughts based on agro-climatic factors. J. Amer.

Water Resour. Assoc., 33, 1255–1264.

Leathers, D. J., B. Yarnal, and M. A. Palecki, 1991: The Pacific/

North American pattern and United States climate. Part I:

Regional temperature and precipitation associations. J. Cli-

mate, 4, 517–528.

Lee, S. H., R. J. Howlett, C. Crua, and S. D. Walters, 2007: Fuzzy

logic and neuron-fuzzy modeling of diesel spray penetration:

A comparative study. J. Intell. Fuzzy Syst.: Appl. Eng. Technol.,

18, 43–56.

Legates, D. R., and G. J. McCabe, 1999: Evaluating the use of

goodness-of-fit measures in hydrologic and hydroclimatic

model validation. Water Resour. Res., 35, 233–241.

Maier, H. R., and G. C. Dandy, 1998: The effect of internal param-

eters and geometry on the performance of back-propagation

neural networks: An empirical study. Environ. Modell. Software,

13, 193–209.

——, and ——, 2000: Neural networks for the prediction and

forecasting of water resources variables: A review of modeling

issues and applications. Environ. Modell. Software, 15, 101–

124.

Marce, R., M. Comerma, J. C. Garcia, and J. Armengol, 2004:

A neuron-fuzzy modeling tool to estimate fluvial nutrient

loads in watersheds under time-varying human impact. Limnol.

Oceanogr. Methods, 2, 342–355.

Marja, M., and M. Esa, 1997: Neuro-fuzzy modelling of spectroscopic

data. Part A—Modelling of dye solutions. J. Soc. Dyers

Colour., 113, 13–17.

Martinez, C. J., G. A. Baigorria, and J. W. Jones, 2009: Use of

climate indices to predict corn yields in southeast USA. Int.

J. Climatol., 29, 1680–1691.

Mason, S. J., 2002: Analyzing and communicating forecast quality

and probability. Asian Disaster Preparedness Center Asian

Climate Training Manual, Module 3.3, 9 pp. [Available online at

http://www.adpc.net/ece/ACT_man/ACT-3.3-ForecastQuality.

pdf.]

MathWorks, cited 2012: Fuzzy Logic Toolbox user’s guide,

R2012b. [Available online at http://www.mathworks.com/

access/helpdesk/help/pdf_doc/fuzzy/fuzzy.pdf.]

Mendicino, G., A. Senatore, and P. Versace, 2008: A groundwater

resource index for drought monitoring and forecasting in

a Mediterranean climate. J. Hydrol., 357, 282–302.

Mishra, A. K., and V. R. Desai, 2005: Drought forecasting using

stochastic models. Stochastic Environ. Res. Risk Assess., 19,

326–339.

——, and ——, 2006: Drought forecasting using feed-forward re-

cursive neural network. Ecol. Modell., 198, 127–138.


——, and V. P. Singh, 2010: A review of drought concepts. J. Hydrol.,

391, 202–216.

——, and ——, 2011: Drought modeling—A review. J. Hydrol.,

403, 157–175.

——, V. R. Desai, and V. P. Singh, 2007: Drought forecasting using

a hybrid stochastic and neural network model. J. Hydrol. Eng.,

12, 626–638.

Modarres, R., 2006: Streamflow drought time series forecasting.

Stochastic Environ. Res. Risk Assess., 21, 223–233.

Moriasi, D. N., J. G. Arnold, M. W. van Liew, R. L. Bingner, R. D.

Harmel, and T. L. Veith, 2007: Model evaluation guidelines

for systematic quantification of accuracy in watershed simu-

lations. Trans. ASABE, 50, 885–900.

Nash, J. E., and J. V. Sutcliffe, 1970: River flow forecasting through

conceptual models. Part I: A discussion of principles. J. Hydrol.,

10, 282–290.

Nayak, P. C., K. P. Sudheer, D. M. Rangan, and K. S. Ramasastri,

2004: A neuron-fuzzy computing technique for modeling hydrological

time series. J. Hydrol., 291, 52–66.

Nicholls, N., 1985: Towards the prediction of major Australian

droughts. Aust. Meteor. Mag., 33, 161–166.

NSTC, 2005: Grand challenges for disaster reduction. National Science

and Technology Council Rep., 25 pp. [Available online at http://

www.sdr.gov/docs/SDRGrandChallengesforDisasterReduction.

pdf.]

Panu, U. S., and T. C. Sharma, 2002: Challenges in drought re-

search: Some perspectives and future directions. Hydrol. Sci.,

47, S19–S30.

Principe, J. C., N. R. Euliano, and W. C. Lefebvre, 1999: Neural and

Adaptive Systems: Fundamentals through Simulations. John

Wiley and Sons, 672 pp.

Rogers, J. C., 1984: The association between the North Atlantic

Oscillation and the Southern Oscillation in the Northern

Hemisphere. Mon. Wea. Rev., 112, 1999–2015.

Ropelewski, C. F., and M. S. Halpert, 1986: North American precipitation

and temperature patterns associated with the El

Nino/Southern Oscillation (ENSO). Mon. Wea. Rev., 114, 2352–2362.

Royce, F. S., C. W. Fraisse, and G. A. Baigorria, 2011: ENSO

classification indices and summer crop yields in the southeastern

USA. Agric. For. Meteor., 151, 817–826.

Rumelhart, D. E., R. Durbin, R. Golden, and Y. Chauvin, 1995:

Backpropagation: The basic theory. Backpropagation:

Theory, Architectures, and Applications, Y. Chauvin and

D. E. Rumelhart, Eds., Lawrence Erlbaum Associates, 1–

34.

Sahoo, G. B., C. Ray, and H. F. Wade, 2005: Pesticide prediction in

ground water in North Carolina domestic wells using artificial

neural networks. Ecol. Modell., 183, 29–46.

Samarasinghe, S., 2007: Neural Networks for Applied Sciences and

Engineering. Auerbach Publications, 570 pp.

Schmidt, N., E. K. Lipp, J. B. Rose, and M. E. Luther, 2001: ENSO

influences on seasonal rainfall and river discharge in Florida.

J. Climate, 14, 615–628.

Servat, E., and A. Dezetter, 1991: Selection of calibration objective

function in the context of rainfall-runoff modeling in a Suda-

nese savannah area. Hydrol. Sci. J., 36, 307–330.

Smith, M., 1996: Neural Networks for Statistical Modeling. International

Thomson Computer Press, 235 pp.

Smith, S. R., P. M. Green, A. P. Leonardi, and J. J. O’Brien, 1998:

Role of multiple-level tropospheric circulations in forcing

ENSO winter precipitation anomalies. Mon. Wea. Rev., 126,

3102–3116.

Stahl, K., and S. Demuth, 1999: Methods for regional classification

of stream flow drought series: Cluster analysis. University of

Freiburg Assessment of the Regional Impact of Droughts in

Europe Tech. Rep. 1, 8 pp. [Available online at www.hydrology.

uni-freiburg.de/forsch/aride/navigation/publications/pdfs/

aride-techrep1.pdf.]

Steinemann, A. C., 2006: Using climate forecasts for drought

management. J. Appl. Meteor. Climatol., 45, 1353–1361.

Stevens, K. A., 2007: Statistical associations between large scale

climate oscillations and mesoscale surface meteorological

variability in the Apalachicola-Chattahoochee-Flint River

basin. M.S. thesis, Dept. of Meteorology, The Florida State

University, 113 pp. [Available online at http://diginole.lib.fsu.

edu/cgi/viewcontent.cgi?article=4838&context=etd.]

Takagi, T., and M. Sugeno, 1985: Fuzzy identification of systems

and its applications to modeling and control. IEEE Trans.

Syst. Man Cybern., 15, 116–132.

Tang, Z., and P. A. Fishwick, 1993: Feedforward neural nets as

models for time series forecasting. ORSA J. Comput., 5, 374–

385.

——, C. Almeida, and P. A. Fishwick, 1991: Time series forecasting

using neural networks vs. Box–Jenkins methodology. Simu-

lation, 57, 303–310.

Trenberth, K. E., 1997: The definition of El Nino. Bull. Amer.

Meteor. Soc., 78, 2771–2777.

USDA-SCS, 1972: Estimation of direct runoff from storm rainfall.

Hydrology, Section 4, National Engineering Handbook,

USDA Soil Conservation Service, 24 pp.

Vasiliades, L., and A. Loukas, 2008: Meteorological drought

forecasting models. Geophysical Research Abstracts, Vol. 10,

Abstract EGU2008-A-08458. [Available online at http://cosis.

net/abstracts/EGU2008/08458/EGU2008-A-08458.pdf.]

Wilpen, L. G., D. Nagin, and J. Szczypula, 1994: Comparative

study of artificial neural network and statistical models for

predicting student grade point averages. Int. J. Forecasting,

10, 17–34.

Woli, P., J. W. Jones, K. T. Ingram, and C. W. Fraisse, 2012: Agricultural

reference index for drought (ARID). Agron. J., 104,

287–300.

——, ——, and ——, 2013: Assessing the Agricultural Reference

Index for Drought (ARID) using uncertainty and sensitivity

analyses. Agron. J., 105, 150–160.

Wood, A. W., and D. P. Lettenmaier, 2006: A test bed for new

seasonal hydrologic forecasting approaches in the western

United States. Bull. Amer. Meteor. Soc., 87, 1699–1712.

Yin, Z. Y., 1994: Moisture condition in the south-eastern USA and

teleconnection patterns. Int. J. Climatol., 14, 947–967.

Zadeh, L. A., 1965: Fuzzy sets. Inf. Control, 8, 338–353.

Zhang, G., B. E. Patuwo, and M. Y. Hu, 1998: Forecasting with

artificial neural networks: The state of the art. Int. J. Forecasting,

14, 35–62.

