forecasting drought using the agricultural reference index for drought (arid): a case study
TRANSCRIPT
Forecasting Drought Using the Agricultural Reference Index forDrought (ARID): A Case Study
PREM WOLI
Mississippi State University, Mississippi State, Mississippi
JAMES JONES AND KEITH INGRAM
University of Florida, Gainesville, Florida
JOEL PAZ
Mississippi State University, Mississippi State, Mississippi
(Manuscript received 25 April 2012, in final form 19 October 2012)
ABSTRACT
Drought forecasting can aid in developing mitigation strategies and minimizing economic losses. Drought
may be forecast using a drought index, which is an indicator of drought. The agricultural reference index for
drought (ARID) was used as a tool to investigate the possibility of using climate indices (CIs) as predictors to
improve the current level of forecasting, which is El Nino–Southern Oscillation (ENSO) based. The per-
formances of models that are based on linear regression (LR), artificial neural networks (ANN), adaptive
neuron-fuzzy inference systems (ANFIS), and autoregressive moving averages (ARMA) models were
compared with that of the ENSO approach. Monthly values of ARID spanning 56 yr were computed for five
locations in the southeasternUnited States, andmonthly values of the CIs having significant connections with
weather in this region were obtained. For the ENSO approach, the ARID values were separated into three
ENSO phases and averaged by phase. For the ARMAmodels, monthly time series of ARID were used. For
the ANFIS, ANN, and LRmodels, ARID was predicted 1, 2, and 3 months ahead using the past values of the
first principal component of the CIs. Model performances were assessed with the Nash–Sutcliffe index. Re-
sults indicated that drought forecasting could be improved for the southern part of the region using ANN
models and CIs. The ANN outperformed the other models for most locations in the region. The CI-based
models and theENSOapproach performed better during thewinter, whereas the efficiency ofARMAmodels
depended on precipitation periodicities. All models performed better for southern locations. The CIs showed
good potential for use in forecasting drought, especially for southern locations in the winter.
1. Introduction
Drought is a creeping natural hazard, causing tre-
mendous economic and social losses worldwide. In the
United States, about $7 billion is lost every year through
drought (FEMA 1995; NSTC 2005). Although drought
cannot be prevented, its losses can be minimized by
mitigation strategies if it is forecast well in advance.
Because it evolves slowly, its mitigation plans can be
carried out more effectively than those of other natural
hazards. The benefits of reliable forecasting are tremen-
dous (Campbell 1973).
Drought may be forecast using drought variables
(Hastenrath andGreischar 1993;Chiewet al. 1998;Cordery
and McCall 2000) or drought indicators (Nicholls 1985;
Mishra and Desai 2005; Cancelliere et al. 2007; Vasiliades
and Loukas 2008). A drought variable is a key element
that defines drought, such as precipitation, soil moisture,
or evapotranspiration. A drought indicator is a measure
used to quantify and characterize drought conditions.
For forecasting a meteorological drought, forecasting
precipitation may suffice. For forecasting an agricultural
drought (the shortage of water in the root zone to meet
the consumptive use of plants), however, forecasting all
associated variables is required. This is difficult, because
Corresponding author address: Prem Woli, Mississippi State
University, 130 Creelman St., Mississippi State, MS 39762-9632.
E-mail: [email protected]
APRIL 2013 WOL I ET AL . 427
DOI: 10.1175/WAF-D-12-00036.1
� 2013 American Meteorological Society
these variables are controlled by several plant and soil
processes. Therefore, for agricultural drought, the use of
a drought indicator, such as a drought index, is more
appropriate than using a drought variable.
Various methods exist for forecasting a drought index
(Mishra and Singh 2011), one of which is to use its own
past values (Panu and Sharma 2002; Mishra and Desai
2005; Mendicino et al. 2008). This approach involves
quantifying a relationship between the present and past
values of the index by fitting the data to a time series
model, such as an autoregressive moving average
(ARMA; Box and Jenkins 1970). Another approach is
to predict the index as a function of the past values of its
predictor variables. Teleconnection indices may be used
as drought predictors (Mishra and Singh 2010, 2011)
because atmospheric circulation patterns have been
found to influence droughts (Stahl and Demuth 1999).
The linkage between changes in the atmosphere in one
place and effects on the weather in another, distant place
is called teleconnection. The indices of large-scale oce-
anic and atmospheric teleconnection patterns, or oscil-
lations, are called teleconnection or climate indices.
Although climate indices are generally calculated on the
basis of measurements from a localized area, they can be
statistically related with climate variables of other areas
around the globe (Bond et al. 2007).
Because teleconnection patterns affect weather vari-
ables such as temperature and precipitation, the key
inputs of a drought index, a plausible relationship is
likely to exist between a drought index and climate in-
dices. Some of the major teleconnection patterns that
are reported to have significant impacts on the weather
of the southeastern United States are the Atlantic
multidecadal oscillation (AMO; Gershunov and Barnett
1998b; Enfield et al. 2001; Stevens 2007), theNorthAtlantic
Oscillation (NAO; Rogers 1984; Yin 1994; Hagemeyer
2006; Stevens 2007), the Pacific decadal oscillation (PDO;
(Gershunov and Barnett 1998b; Enfield et al. 2001; Stevens
2007), the Pacific–North American (PNA) pattern
(Leathers et al. 1991; Yin 1994; Konrad 1998; Hagemeyer
2006; Martinez et al. 2009), sea surface temperature
(SST) in the Nino-3.4 region (i.e., the east-central
tropical Pacific: 58N–58S, 1708–1208W) (Trenberth 1997;
Gershunov and Barnett 1998b; Schmidt et al. 2001;
Hanley et al. 2003; Hagemeyer 2006), and the SST over
the region of 48S–48N, 1508–908W (Hansen et al. 1998).
TheAMO index is basically an index of ongoing series
of long-duration changes in the SST of the North At-
lantic Ocean over 08–708N. The NAO index is the dif-
ference in sea level pressure between two stations
situated close to the Icelandic low and Azores high
pressure centers. The PNA index is a linear combination
of the normalized geopotential height anomalies at the
700-hPa level at four locations: Hawaii (208N, 1608W);
the North Pacific Ocean (458N, 1658W); Alberta, Canada
(558N, 1158W); and the Gulf Coast of the United States
(308N, 858W). The PDO index is the spatial average of
themonthly SST of the Pacific Ocean north of 208N. The
Nino-3.4 index is the departure in monthly SST from
their long-term means over the Nino-3.4 region. The
JapanMeteorological Agency (JMA) index is a 5-month
running mean of spatially averaged SST anomalies over
the tropical Pacific region (48S–48N, 1508–908W).
The most commonly used approach for forecasting
precipitation or drought in the southeastern United
States is based on the El Nino–Southern Oscillation
(ENSO). The strong teleconnection between ENSO and
the weather conditions in this region has enabled skillful
forecasting of seasonal temperature and precipitation
up to 1 yr in advance (Steinemann 2006; Brolley et al.
2007). The drought-forecasting ability of the ENSO
approach, however, varies across months and locations
because the impact of ENSO varies across regions and
seasons (Ropelewski and Halpert 1986; Gershunov and
Barnett 1998a; Smith et al. 1998; Brolley et al. 2007).
That is, the ENSO-based approach might produce
plausible forecasts for some months of the year and in-
accurate forecasts for other months. This situation ne-
cessitates the search for other approaches. The study
presented here explored whether climate indices could
produce more accurate forecasts than those produced
using ENSO for the months or locations for which the
predictability of ENSO is not good.
For quantifying predictor–predictand relationships,
various statistical models have been used, such as linear
regression (LR; Kumar and Panu 1997; Panu and
Sharma 2002; Wood and Lettenmaier 2006; Vasiliades
and Loukas 2008), artificial neural networks (ANN;
Wilpen et al. 1994; Zhang et al. 1998; Panu and Sharma
2002; Vasiliades and Loukas 2008), and the adaptive
neuro-fuzzy inference system (ANFIS; Nayak et al.
2004; Chang and Chang 2006; Bacanli et al. 2008; Firat
and Gungor 2008). An ARMA model can model the
periodicity and nonstationarity of time series data effec-
tively (Modarres 2006; Mishra et al. 2007) and can pro-
vide systematic procedures for modeling the behavior
of uncertain systems (Mishra and Desai 2005). An LR
model can explicitly address causal relationships and can
show optimal results for a linear mechanism (Belsley
et al. 1980). An ANN model is able to map nonlinear,
complex, and arbitrary input–output relationships
(Hornik et al. 1989; Smith 1996; Zhang et al. 1998; Kim
and Valdes 2003; Samarasinghe 2007). An ANFIS
(Zadeh 1965; Jang 1993) is a promising tool formodeling
complex, nonlinear systems (Marce et al. 2004; Nayak
et al. 2004; Bacanli et al. 2008; Altun et al. 2009) and has
428 WEATHER AND FORECAST ING VOLUME 28
the ability to draw conclusions from vague, ambiguous,
incomplete, and imprecise information (Lee et al. 2007).
Several researchers have used thesemodels, individually
or in combination, to forecast various hydrological vari-
ables. Some researchers, such as Cutore et al. (2009),
have usedANN to forecast a hydrological drought, which
is an expression of precipitation shortfall on surface or
subsurface water supply. For forecasting an agricultural
drought, however, no study has assessed all of these
methods using a number of climate indices mentioned
above. The curent study examined the performance of
ANFIS,ANN,ARMA, and LRmodels relative to that of
the ENSO approach in forecasting the agricultural ref-
erence index for drought (ARID;Woli et al. 2012), which
is a computationally simple, physically and physiologi-
cally sound, and generally applicable drought index that
can characterize an agricultural drought better thanmany
other drought indices that are applied to agricultural
systems (Woli et al. 2012). Although ARID is a simple
and generic drought index, it is applicable to a wide range
of conditions (Woli et al. 2013).
TheARID is based on the reference grass (Allen et al.
2005) and is expressed as the ratio of plant water deficit
to plant water need as
ARIDi 5 12 (Ti/ETo,i) , (1)
where the subscript i stands for the ith day, T is tran-
spiration (mm day21), and ETo is the reference grass
evapotranspiration (mm day21). Values of ARID fall
between 0, indicating no water stress, and 1, indicating
full water stress. Whereas ETo is computed using the
United Nations Food and Agriculture Organization Ir-
rigation and Drainage Paper 56 (FAO-56) Penman–
Monteith model (Allen et al. 1998), T is estimated as
Ti5min(azQada,i21, ETo,i) , (2)
whereTi is the transpiration in day i (mm day21), a is the
water uptake coefficient, z is the root-zone depth (mm),
and Qada,i21 is plant available water content after deep
drainage at the end of the previous day (mm mm21). The
available water content in day i Qada,i is estimated as the
ratio of the amount of available water in the root zone in
day i—Wi (mm)—to the root-zone depth: Qada,i 5Wi/z.
The Wi is estimated from a simple soil water balance:
Wi 5Wi211Pi 1 Ii 2Ti 2Di 2Ri , (3)
where subscripts i and i 2 1 denote the current and
previous day, respectively, and Pi, Ii, Ti, Di, and Ri are
precipitation, irrigation, transpiration, deep drainage,
and surface runoff that occurred in day i (mm day21),
respectively. The deep drainage for day i is estimated as
Di 5bz(Qbda,i 2 um) , (4)
where b is drainage coefficient, Qbda,i is the available soil
water content in day i before drainage (mm mm21), and
um is plant available water capacity (mm mm21). Using
the USDA-SCS (1972) method, ARID estimates daily
surface runoff (mm day21) as
Ri 5 (Pi2 Ia)2/(Pi 2 Ia1 S) , (5)
where Ia is initial abstraction (mm day21) and S is po-
tential maximum retention (mm day21), which is com-
puted as S 5 25 400/h (where h is the runoff curve
number).
2. Materials and methods
a. Sites, data, and general computations
Of the five forecasting methods assessed—ANFIS,
ANN, LR, ARMA, and ENSO—only the first three
methods used climate indices as the predictors of ARID.
Because ARMA is a time series model, it only uses the
past values of the predictand to predict future values.
Although ENSO is also teleconnection based (JMA),
the ENSO approach does not directly use the values of
the JMA index and the other climate indices used by
ANFIS, ANN, and LR. The predictors of ARID for the
climate index–based methods were chosen, and further
computations and analyses were made in the following
steps (which are also summarized in Table 1):
1) Through a literature review, six large-scale oceanic–
atmospheric oscillations having significant impacts
on the weather of the southeastern United States
were identified: AMO, NAO, JMA, Nino-3.4,
PDO, and PNA.
2) Five locations in the southeastern United States—
Miami (25.58N, 80.58W), Bartow (27.98N, 81.88W),
and LiveOak (30.38N, 82.98W) in Florida and Plains
(32.78N, 84.28W) and Blairsville (34.88N, 83.98W) in
Georgia—were selected by considering latitude and
the availability of historical weather data.
3) Daily weather data spanning 56 yr (1951–2006) for
the respective locations were obtained on the
Internet from the Florida Automated Weather
Network and theGeorgia Automated Environmen-
tal Monitoring Network.
4) The monthly values of the indices of the oscillations
mentioned in step 1, called climate indices, for the years
mentioned in step 3 were obtained from the Internet.
5) From the weather data mentioned in step 3, daily
values of ARID were computed for each of the five
APRIL 2013 WOL I ET AL . 429
locations, which later were converted to monthly
values by averaging the daily values over a month.
For instance, a monthly value for January was
computed by averaging all 31 daily values of this
month. Computations were carried out monthly
because most of the climate indices used in the study
recorded only monthly values. Moreover, a monthly
time scale is appropriate for monitoring drought
from the viewpoints of farm management and
crop-growth-stage sensitivity to water deficit (Panu
and Sharma 2002).
6) For each climate index, the dataset obtained in step
4 was separated by month, thus creating 12 subsets,
one for each month. The monthly separation of the
dataset was carried out by assuming that the degree
of linkage between large-scale teleconnection pat-
terns and the weather in this region might vary
across months.
7) To assess the predictability of ARID for different
lead times, ARIDwas forecast with 1–3-month lead
time (t1 1, t1 2, and t1 3) using the current (time t)
values of its predictors. The predictors’ lag time
remained the same (t) while predicting ARID at
different lead times. To forecast ARID at three lead
times, three series of ARID were created and each
series individually was used. As in step 6, 12 subsets of
ARIDat each lead-time forecast (ARIDt11,ARIDt12,
or ARIDt13) were created for each location.
8) The correlation between a climate index andARID
at each lead time was examined for each climate
index, month, and location. When the correlation
was significant at the 90% confidence level (a 50.1), the climate index was selected. For the signif-
icant cases, the absolute values of the Pearson’s
correlation coefficient jrj were greater than 0.22.
When both AMO and NAO were significantly
correlated with ARID, NAO was chosen from the
two; when ARID was not significantly correlated
withNAObut waswithAMO, the latter was used. In
a similar way, when both Nino-3.4 and JMA were
significantly correlated with ARID, Nino-3.4 was
chosen; JMA was used when ARID was not signif-
icantly correlated with Nino-3.4 but was with JMA.
These specific selections were made because AMO
and NAO are similar and so are Nino-3.4 and JMA.
In addition, NAO has a stronger influence than
AMO in the southeastern United States, and Nino-
3.4 has a stronger influence than JMA (J. O’Brien
2009, personal communication). Following the
above procedures, climate indices, the predictors
of ARID, were selected for each month, location,
and lead-time forecast (Table 2).
9) The climate indices selected for any of the five
locations in the region for a month were considered
to be the predictors of ARID for the entire region
for that month (the last column of Table 2). The
climate indices selected for the southeasternUnited
States for each month and lead-time forecast are
presented in Table 3.
10) The cross correlations among the six climate indices
mentioned in step 1 were examined (Table 4).
Examining the correlations was necessary because
regression techniques require the predictors to be
independent (Cordery and McCall 2000).
11) Because most of the climate indices were signifi-
cantly cross correlated at the significance level of 0.05
(Table 4), principal component analysis was per-
formed on the climate indices chosen for each month
and lead-time forecast (Table 3), and the first princi-
pal component, hereinafter referred to asX, was used
as the predictor of ARID. TheXwas chosen because
(i) the total variance it explained was generally
reasonably good (Table 3) and (ii) its use made the
number of predictors across months and lead-time
TABLE 1. Steps involved in selecting the predictors of ARID and making further computations.
Step Activity
1 Large-scale oscillations were identified for the southeastern United States
2 Five locations in the region were selected
3 Historical weather data spanning 1951–2006 were obtained
4 Monthly values of the CIs were obtained
5 Monthly values of ARID were computed for each location
6 For each CI, the dataset was separated by month
7 For each location and lead time, the ARID dataset was separated by month
8 CI–ARID correlation was examined for each index, month, location, and lead time; if significant at a 5 0.1, the index was
selected (Tables 2 and 3)
9 The CIs selected for any location for a month were considered to be the predictors of ARID for the entire region
for that month
10 Cross correlations among the CIs were examined (Table 4)
11 PCA was performed on the chosen CIs (Table 3), and the first principal component was used as the predictor of ARID
12 Models were evaluated using the Nash–Sutcliffe index, and the predictability of ARID was assessed using the RMSE
430 WEATHER AND FORECAST ING VOLUME 28
forecasts consistent (i.e., 1) because the number of
climate indices was different for different months
and lead times, ranging from 1 to 4 (Table 3).
12) Last, the predictive accuracy of each model was
evaluated using the modeling efficiency [ME, also
called the Nash–Sutcliffe index (Nash and Sutcliffe
1970)], which is a commonly used measure for
evaluating hydrometeorological models (Sevat
and Dezetter 1991; ASCE Task Committee on
Definition of Criteria for Evaluation of Watershed
Models of theWatershedManagement Committee,
Irrigation and Drainage Division 1993; Legates and
McCabe 1999; Moriasi et al. 2007). The ME was
computed as
ME5 12
��n
i51
(ARID2 ARID)2.�n
i51
(ARID2ARID)2�, (6)
where ARID, ARID, and ARID are the observed
(i.e., originally computed), predicted, and mean
observed values of ARID, respectively, and n is
the number of observations. Values of ME range
from 2‘ to 1. Whereas a negative value indicates
that the observed mean is a better predictor than
TABLE 2. CIs selected as predictors of ARID at lead-time t 1 1 (1-month-ahead forecasting) for various months and locations in the
southeastern United States. The CIs were significantly correlated with ARID at the 90% confidence level (a 5 0.1).
Location
Month Miami Bartow Live Oak Plains Blairsville Region
Jan Nino-3.4 Nino-3.4 Nino-3.4 JMA, PDO, PNA — Nino-3.4, PDO, PNA
Feb Nino-3.4, PDO,
PNA
Nino-3.4, PDO Nino-3.4, PDO, PNA Nino-3.4 Nino-3.4 Nino-3.4, PDO, PNA
Mar AMO, Nino-3.4,
PDO
Nino-3.4 — NAO, JMA NAO, Nino-3.4,
PNA
NAO, Nino-3.4,
PDO, PNA
Apr JMA, PDO JMA PDO — — JMA, PDO
May — — — — JMA JMA
Jun AMO, PNA — PDO PDO Nino-3.4, PDO AMO, Nino-3.4,
PDO, PNA
Jul PNA — — JMA JMA JMA, PNA
Aug AMO NAO, Nino-3.4, PDO NAO NAO Nino-3.4 NAO, Nino-3.4, PDO
Sep Nino-3.4 Nino-3.4 — — — Nino-3.4
Oct Nino-3.4 JMA Nino-3.4, PDO NAO, Nino-3.4,
PDO
NAO NAO, Nino-3.4, PDO
Nov Nino-3.4, PDO Nino-3.4, PDO, PNA AMO, Nino-3.4, PDO NAO, Nino-3.4,
PDO
— NAO, Nino-3.4,
PDO, PNA
Dec — Nino-3.4 — — — Nino-3.4
TABLE 3. CIs used as the predictor of ARID for various months and lead-time forecast (t1 1, t1 2, and t1 3) for the southeasternUnited
States. Here, ‘‘X var’’ is the percent of total variance explained by the first principal component X.
t 1 1 forecasting t 1 2 forecasting t 1 3 forecasting
Month (t) CIs X var CIs X var CIs X var
Jan Nino-3.4, PDO, PNA 72 Nino-3.4, PDO, PNA 72 Nino-3.4 100
Feb Nino-3.4, PDO, PNA 72 AMO, Nino-3.4, PDO, PNA 60 PDO, PNA 85
Mar NAO, Nino-3.4, PDO, PNA 62 AMO, JMA, PDO 77 JMA 100
Apr JMA, PDO 82 PNA 100 NAO, Nino-3.4, PDO, PNA 55
May JMA 100 NAO, Nino-3.4, PDO 60 JMA, PNA 70
Jun AMO, Nino-3.4, PDO, PNA 48 JMA, PNA 63 AMO, Nino-3.4 60
Jul JMA, PNA 65 AMO, Nino-3.4, PDO, PNA 52 NAO, PNA 65
Aug NAO, Nino-3.4, PDO 60 PNA 100 AMO, Nino-3.4, PDO 60
Sep Nino-3.4 100 AMO, Nino-3.4, PDO 65 AMO, Nino-3.4, PDO 65
Oct NAO, Nino-3.4, PDO 60 AMO, Nino-3.4, PDO, PNA 48 Nino-3.4, PNA 65
Nov NAO, Nino-3.4, PDO, PNA 55 NAO, Nino-3.4 70 AMO, JMA 64
Dec Nino-3.4 100 NAO, PDO 60 PDO 100
APRIL 2013 WOL I ET AL . 431
the model, a positive value signifies that using the
model estimate is better than using the observed
mean. An ME of 0 indicates that the model esti-
mate is as accurate as the mean of the observed
data, whereas an ME of 1 corresponds to a perfect
match of the modeled values to the observed data.
In fact, the closer the ME is to 1, the more accurate
is the model.
The predictability of ARID for different lead times
was assessed by comparing the 1-, 2-, or 3-month lead-
time forecasts of ARID with observed values using the
root-mean-square error (RMSE). The comparisons
were carried out for each model, month, and location.
b. Computations for ANN models
For ANN, a three-layer multilayer perceptron
(MLP) was used (Fig. 1) because it is the most widely
used network and provides a general framework for
nonlinear input–output mapping (Hornik et al. 1989;
Zhang et al. 1998; Samarasinghe 2007; Abbasi 2009).
A single-hidden-layer MLP was chosen because one
hidden layer is sufficient to approximate any complex
nonlinear function (Cybenko 1989; Hornik et al. 1989;
Zhang et al. 1998; Maier and Dandy 2000). To examine
how the number of nodes in a hidden layer of the net-
work would affect the forecasting ability of the net-
work and to decide on the number of nodes, we carried
out a sensitivity analysis on the number of nodes. The
ability of the network was measured in terms of pre-
diction errors (RMSE): the lower the RMSE value is ,
the better is the forecasting ability. For this, the net-
work was run with different numbers of nodes in a
hidden layer ranging from 1 to 10. The results showed
that increasing the number of nodes in a hidden layer
from 1 did not significantly improve the performance of
the network having one input node (Table 5). For sim-
plicity, therefore, we used just one node in the hidden
layer. Several other researchers, such as De Groot and
Wurtz (1991), Chakraborty et al. (1992), and Tang and
Fishwick (1993), are also of the opinion that one node in
the hidden layer be used for a network that has just one
input node.
TABLE 4. Correlations and associated P values among the CIs used to forecast ARID for the southeastern United States.
AMO NAO JMA Nino-3.4 PDO PNA
Correlation
AMO 1 20.17 0.08 0.08 0.02 0.12
NAO 20.17 1 0.01 0.02 0.02 0.03
JMA 0.08 0.01 1 0.83 0.44 0.16
Nino-3.4 0.08 0.02 0.83 1 0.41 0.12
PDO 0.02 0.02 0.44 0.41 1 0.36
PNA 0.12 0.03 0.16 0.12 0.36 1
P value
AMO 1 1 3 1025 0.04 0.04 0.56 2 3 1023
NAO 1 3 1025 1 0.70 0.68 0.69 0.49
JMA 0.04 0.70 1 ;0 ;0 2 3 1025
Nino-3.4 0.04 0.68 ;0 1 ;0 2 3 1023
PDO 0.56 0.69 ;0 ;0 1 ;0
PNA 2 3 1023 0.49 2 3 1025 2 3 1023 ;0 1
FIG. 1. The architecture of an MLP neural network with one input layer, one hidden layer, and
one output layer, each with one neuron.
432 WEATHER AND FORECAST ING VOLUME 28
TheMLP network estimatedARIDwith the following
steps (Fig. 1):
1) The network-inputXwas fed into the hidden neuron,
and its weighted input pwas computed, using its layer
weight a and bias weight b as
p5 aX1 b . (7)
2) In the hidden neuron, its output h was computed
using p in a hyperbolic tangent input transfer func-
tion (Klimasauskas 1991; Maier and Dandy 1998;
Samarasinghe 2007) as
h5 [exp(p)2 exp(2p)]/[exp(p)1 exp(2p)] . (8)
3) The h was then fed into the output neuron, and its
weighted input q was computed, using its layer
weight a and bias weight b, as
q5ah1b . (9)
4) In the output neuron, the network output (ARID)
was computed using q in a linear output transfer
function (Rumelhart et al. 1995; Zhang et al. 1998;
Sahoo et al. 2005):
ARID5 q . (10)
Step 4 concluded forward processing in the network,
and, with this, ARID was compared with the target
value (ARID) to compute error «, which was then
backpropagated through the network to update the pa-
rameters a, b, a, andb. For training, the gradient descent
method was used because it is efficient and popular
(Zhang et al. 1998; ASCE Task Committee on Applica-
tion of Artificial Neural Networks in Hydrology 2000;
Mishra and Desai 2006; Samarasinghe 2007). Learning
rate andmomentumwere set to 0.01 and 0.9, respectively,
assuming that the input–output relationships were
complex (Tang et al. 1991; Hagan et al. 1996; Principe
et al. 1999; Mishra and Desai 2006; MathWorks 2012).
For training, batch learning was used because it is the
most preferred method, forces the search in the di-
rection of the true gradient (Maier and Dandy 2000),
and avoids instability (Samarasinghe 2007).
Before feeding data into the network, each monthly
dataset of 56 input–output combinations was divided
into three subsets: 44 for training, 11 for testing, and 1 for
validation (Granger 1993). The testing set was used to
avoid overfitting the data, and the validation set was
used to evaluate the model. The performance of an
ANN model was evaluated applying the leave-one-out
technique of cross validation. Last, values of RMSE and
ME were computed for each month, location, and lead-
time forecast using the model-estimated and original
values of ARID.
c. Computations for ANFIS models
This study used the Sugeno-type ANFIS (Takagi and
Sugeno 1985) because it is computationally efficient and
is well suited formodeling nonlinear systems (Jang 1993;
Lee et al. 2007; MathWorks 2012). The first-order Su-
geno type was used because higher orders often add an
unwanted level of complexity (Lee et al. 2007). To
partition the input space, grid partitioning was used
because it gives more precise modeling (Altun et al.
2009). For training, the hybrid optimization method—
the combination of backpropagation and least squares
error—was used because of its high learning speed (Jang
1993; Lee et al. 2007). The ANFIS network estimated
ARID with the following steps (Fig. 2):
1) In the node of layer 1, the inputXwas fuzzified using
a generalized bell-shaped (Marja and Esa 1997; Lee
et al. 2007; MathWorks 2012) input membership
function f, a curve that maps each element in the input
space to a membership value between 0 and 1 as
fi 5 1/h11 fabs[(X2 ci)/ei]g2dii , (11)
where the subscript i stands for the ith rule and c, d,
and e are antecedent parameters that are updated
through backpropagation.
2) In the node of layer 2, firing strengths, also called
weights w, were computed for rule i. For the single-
input variable X, the same values of f were used for
w as follows,
TABLE 5. The RMSE values associated with 1-month-lead-time
forecasting of ARID for various months for Miami, using different
numbers of neurons in a hidden layer of the neural network having
one input node.
No. of neurons
Month 1 2 3 4 5 6 7 8 9 10
Jan 0.15 0.21 0.20 0.20 0.21 0.20 0.22 0.21 0.22 0.20
Feb 0.16 0.17 0.19 0.17 0.18 0.15 0.19 0.15 0.14 0.17
Mar 0.11 0.14 0.14 0.12 0.11 0.13 0.13 0.13 0.12 0.14
Apr 0.10 0.11 0.13 0.10 0.09 0.09 0.09 0.11 0.13 0.10
May 0.11 0.12 0.13 0.10 0.12 0.14 0.17 0.13 0.11 0.14
Jun 0.07 0.06 0.10 0.11 0.07 0.11 0.06 0.12 0.14 0.07
Jul 0.05 0.07 0.06 0.07 0.08 0.08 0.08 0.09 0.08 0.08
Aug 0.06 0.05 0.08 0.10 0.07 0.09 0.11 0.08 0.08 0.10
Sep 0.06 0.07 0.06 0.07 0.06 0.08 0.05 0.07 0.08 0.07
Oct 0.12 0.13 0.11 0.10 0.11 0.12 0.13 0.13 0.14 0.11
Nov 0.17 0.17 0.18 0.19 0.16 0.17 0.15 0.17 0.14 0.18
Dec 0.16 0.19 0.14 0.18 0.17 0.20 0.16 0.19 0.16 0.19
APRIL 2013 WOL I ET AL . 433
wi 5 fi . (12)
3) In the node of layer 3, the normalized weight wi* of
rule i was computed as the ratio of the ith rule’s
weight to the sum of the weights of all n rules as
wi*5wi
��n
i51
wi . (13)
4) An output membership function g was computed
from the network-input X and the consequent pa-
rameters that are updated through least squares
error, g and d, as
gi 5 giX1 di . (14)
5) In the node of layer 4, the fuzzy values were
defuzzified back to crisp output values as
Oi5wi*gi , (15)
where Oi is the output contributed by the ith rule
toward the overall ANFIS output.
6) Last, ARID, the overall output of the model, was
computed in the fifth layer by summing up the
outputs contributed by all rules:
ARID5 �n
i51
Oi . (16)
With the conclusion of forward processing, ARID
was compared with the target value (ARID) to
compute error «, which was then backpropagated
through the network to update the antecedent pa-
rameters (step 1).
Before feeding the data into the system, each monthly
dataset of 56 input–output combinations was divided
into three subsets: a training set (44), a testing set (11),
and a validation set (1). AnANFISmodel was evaluated
by applying the leave-one-out cross-validation tech-
nique and computing RMSE and ME for each month,
location, and lead-time forecast using the estimated and
original values of ARID.
d. Computations for LR models
The m-month-ahead forecasts of ARID were gener-
ated as follows after estimating the coefficients u and y
by regressing the lead-time values of ARID (ARIDt1m)
and the current values of X (Xt):
ARIDt1m 5 uXt 1 y , (17)
where t is the month andm is the lead time. Values ofm
ranged from 1 to 3 months.
For regression and forecasting, respectively, training
and validation datasets were used. An LR model was
evaluated for each month, location, and lead-time
forecast using the leave-one-out technique of cross val-
idation and the RMSE and ME values.
e. Computations for ARMA models
For each location, 12 time series of ARID were cre-
ated, one for each month. For instance, one time series
comprised the ARID values from January 1951 through
December 2005, another series comprised the values
from February 1951 through January 2006, and the 12th
series comprised the values fromDecember 1951 through
November 2006. The time series ending in December, for
instance, was used to make forecasts for January 2006
(1 month ahead), February 2006 (2 months ahead), and
March 2006 (3 months ahead). Similarly, the series
ending in January was used to predict for February 2006
(1 month ahead), March 2006 (2 months ahead), and
FIG. 2. The architecture of an ANFIS with one input variable X.
434 WEATHER AND FORECAST ING VOLUME 28
April 2006 (3 months ahead), and so on. The monthly
time series were createdwith the assumption that weather
conditions vary across months. From each monthly
series, 16 subseries were created by truncating the
monthly series by 1–15 yr. For instance, the first subseries
of the series ending in December (mentioned above)
ended in December 2005, the second in December 2004,
and the 16th in December 1990. Accordingly, 16 ARMA
models were developed for each month. The subseries
were created for model evaluation purposes (to create
n observations), and 16 was chosen for n by consider-
ing 40 yr as the minimum length of the time series re-
quired to reflect any long-term periodicity in the data
(56 2 40 5 16).
The ARMA models were developed for each month
with the assumption that the weather conditions in this
region vary across months. The monthly ARMAmodels
would be consistent and thus be easy to compare with
the other methods for which the variation across months
in the degree of linkage between large-scale teleconnection
patterns and the weather in this region is assumed. A
seasonally integratedARMAmodel, called the seasonal
autoregressive integrated moving average (SARIMA)
model can satisfactorily describe a time series that ex-
hibits nonstationarity not only across seasons but also
within a season (Mishra and Singh 2011).
To observe the relationships between the present and
past values of ARID, autocorrelation functions (ACF)
and partial autocorrelation functions (PACF) were
calculated for each of the 16 time series of each month
and location. In each series, the values of ACF and
PACF exhibited periodicities corresponding to the
positive correlation between values separated by 12
months, indicating a need for seasonal differencing
(Fig. 3a). Thus, each time series was seasonally differ-
enced as ARIDt 2 ARIDt212. To study the behavior of
the seasonally differenced series, its ACF and PACF
values were calculated and analyzed to find any correla-
tion among the ARID values. Because the ACF and
PACF values of the differenced series did not exhibit any
periodicities (Fig. 3b), no further differencing was
needed. Thus, the D in an integrated ARMA model,
ARIMA( j, d, k) 3 (J, D, K)s, was set to 1. The parame-
ters j, d, k, J, D, K, and s in the model denote the non-
seasonal autoregressive (AR) model of order j, the
nonseasonal difference of order d, the nonseasonal
moving average (MA)model of order k, the seasonal AR
model of order J, the seasonal difference of orderD, the
seasonal MA model of order K, and the seasonal lag s,
respectively. The characteristics of theACFandPACFof
the seasonally differenced series showed a strong peak at
a lag s of 12 months in the ACF. Smaller peaks appeared
at s5 24 and 36 in the ACF, combined with peaks at s512, 24, 36, and 48 in the PACF (Fig. 3a). The cutting off of
ACF after lag s5 12 months (a seasonal lag of 1 yr) and
the tailing off of the PACF of the seasonally differenced
series indicated a seasonalMAof orderK5 1. Therefore,
the orders of J, D, and K for the seasonal part were 0, 1,
and 1, respectively. The same orders were found for each
series, month, and location. For the nonseasonal com-
ponent, the ACF and PACF values of the seasonally
differenced series within seasonal lags l 5 1, 2, . . . , 11
did not show any periodicities. Therefore, no non-
seasonal differencing was needed (d 5 0). Because the
FIG. 3. The (left) ACF and (right) PACF of (a) ARID and (b) differenced ARID for Miami in January.
APRIL 2013 WOL I ET AL . 435
nonseasonal behavior of ACF and PACF was the same
in each series for each month and location, the order of
d was also the same for all cases. For the order of j and
k, the tailing off of ACF and the cutting off of PACF
after lag l5 1 indicated a nonseasonal AR of order j5 1
and a nonseasonal MA of order k 5 0 (Fig. 3b). Thus,
the most appropriate ARMA model for each month
and location was ARIMA(1, 0, 0)3 (0, 1, 1)12. Because
the ARMA model included a seasonal component
and differencing, it is hereinafter referred to as the
SARIMA model.
Using 16 SARIMA models for each month, ARID
was forecast with 1-, 2-, and 3-month lead times. The
SARIMAmodels were then evaluated for each location,
month, and lead-time forecast using the RMSE and ME
as goodness-of-fit measures.
f. Computations for the ENSO approach
To characterize the ENSO episodes, generally the
JMA index is used because it is more sensitive to
El Nino and La Nina conditions than are the other in-
dices (Trenberth 1997; Hanley et al. 2003). When the
index is $0.58C (#20.58C) for six consecutive months,
including October–December, an El Nino (La Nina)
episode is defined (Hanley et al. 2003). The episode then
lasts from October through the following September.
The episode for all other values of the index is termed
neutral. This study, however, used the modified version
of the index because it givesmore significant results than
does the original index (Keeling 2008; Gerard-Marchant
and Stooksbury 2010; Royce et al. 2011). In the modified
JMA index, the El Nino (La Nina) episode stops as soon
as the temperature conditions are no longer met
(Gerard-Marchant and Stooksbury 2010).
Themonthly time series of ARIDwere separated into
three ENSO categories. In each category, the ARID
values were further separated into 12 months and in-
dividually averaged. Themonthly average value ofARID
in a particular month and ENSO phase was considered
to be the forecast of ARID for that month during that
ENSO phase.
g. Probabilistic forecasting
A probabilistic forecast is more ‘‘honest’’ than a de-
terministic forecast (Krzysztofowicz 2001) and is never
wrong (Mason 2002). Although deterministic forecasts
are more useful, they are more difficult to produce.
Therefore, most forecasts are probabilistic.
On the basis of ME, the three best-performing models
were selected for making the probabilistic forecasts of
ARID. From the 56 values of prediction error « (16 in
the case of SARIMA), computed as estimated ARID
minus original ARID, a cumulative distribution function
was created, from which the probability of « falling in-
side the negative and positive values of the prediction
error threshold E, denoted as P(2E , « , E), was
computed as P(«, E) minus P(«, 2E). The P(2E,«,E) valuewas then used as the probability of (F2E),ARID , (F 1 E), where F is the deterministic fore-
cast made by a model. Such computations were made
for each location, month, and model. For E, 0.1 was
used to indicate a small error.
3. Results and discussion
a. Comparing the forecasting models
The modeling efficiency of each of the four models
(ANFIS, ANN, LR, and SARIMA) varied depending
on the month and location (Fig. 4). The variation in the
efficiencies of the models gave an indication that, for
a specific month at a specific location, one specific model
may bemore appropriate than the others. In general, the
efficiencies of the CI-based models (ANFIS, ANN, and
LR) and the ENSO approach were large during the
winter and small during the summer in most of the lo-
cations. Furthermore, these models indicated an ac-
ceptable level of performance during the winter, as their
ME values were generally positive. During the summer,
they indicated an unacceptable level of performance, as
their ME values were mostly negative. This seasonal
difference was possibly due to the stronger signals of
ENSO and large-scale teleconnections during the winter
than in the summer. The time series–based model,
SARIMA, however, did not show this pattern. Its effi-
ciency largely depended on the periodicity of the time
series data. For instance, SARIMA was highly efficient
for a month that received a large amount of precip-
itation in each year of the time series.
In general, the efficiency of each model, including the
ENSO approach, was better in southern locations, such
as Miami, Bartow, and Live Oak, than in the northern
ones, such as Plains and Blairsville (Fig. 4). The mod-
eling efficiencies decreased in the direction of south to
north. Miami, in southern Florida, had the largest effi-
ciencies, whereas Blairsville, in northern Georgia,
showed the lowest efficiencies. This differential effi-
ciency probably arose because of stronger ENSO and
teleconnection signals and larger precipitation period-
icities in the south than in the north.
In terms of modeling efficiency, ANN generally
demonstrated the best performance for southern loca-
tions such as Miami, Bartow, and Live Oak. Of the five
approaches compared, the performances of LR and
ANFIS were generally poor for all locations. The poor
efficiency of LR was likely the result of its being a linear
436 WEATHER AND FORECAST ING VOLUME 28
model, such that it could not represent the nonlinearity
that possibly exists between ARID and climate indices.
For ANFIS, even though it was supposed to account for
nonlinearity in a system, it performed poorly. The poor
performance was most probably due to limited data
(Costa Branco andDente 2001; Marce et al. 2004; Sahoo
et al. 2005). The total number of parameters associated
with an ANFIS model was 15, whereas the number of
samples in the training dataset was only 44. Because the
training data points were fewer than 4 times the number
of model parameters, a threshold value (Marce et al.
2004), the small training dataset probably caused over-
fitting so that the generality was lost. Because of limited
data, ANFIS probably could not generate proper fuzzy
rules and derive all the complex information of the
system properly. The use of ANN, on the other hand,
does not necessarily require a large sample size (Zhang
et al. 1998). The ANN models perform well even with
a sample size of less than 50 (Kang 1991). Being a uni-
versal approximator because of its ability to approxi-
mate any nonlinear relationship between inputs and
outputs to any degree of accuracy (Hornik et al. 1989;
Smith 1996; Zhang et al. 1998; Samarasinghe 2007),
ANN performed better than the other CI-based
models—namely, ANFIS and LR. The better perfor-
mance of ANN for southern locations than for northern
locations indicated that the relationships between cli-
mate indices and ARID are stronger in the southern
locations. Probably because of weak teleconnections
between the precipitation pattern in northern locations
and the climate indices, ANN performed poorly for
Plains and Blairsville. Generally, ENSO- and CI-based
methods—namely, ANFIS, ANN, and LR—performed
poorly for the northern part of the region. The perfor-
mance of CI-based methods was even poorer than that
of the ENSO approach, indicating that the signals of the
large-scale teleconnection patterns represented by the
climate indices used in this study are even weaker than
those of the ENSO (i.e., JMA) in northern locations.
Also in southern locations, the ENSO approach per-
formed better than the other models for some months:
February and April for Miami; January, April, and
October for Bartow; and July–October for Live Oak.
These results indicated that the ENSO signal is stronger
in these months and locations than are the tele-
connections represented by the climate indices. For the
other months, signals represented by the climate indices
seemed stronger than that of the ENSO. The efficiency
of SARIMAmodels also varied depending upon month
and location. Generally, the performance of SARIMA
was poorer than those of ANN and ENSO for most of
themonths in southern locations. The poor performance
probably resulted because a SARIMA model assumes
that time series are generated from linear processes,
whereas the drought time series are often nonlinear and
seasonal (Granger and Terasvirta 1993; Zhang et al.
1998). A SARIMA model generally performs better
when periodicity exists in the data, as the periodicity is
taken into account by themodel by differencing the data
FIG. 4. Values of ME for various models associated with
1-month-ahead forecasting across months for (a) Miami, (b)
Bartow, (c) Live Oak, (d) Plains, and (e) Blairsville. Note that J5January, . . . , D 5 December indicate the 12 months.
APRIL 2013 WOL I ET AL . 437
(Mishra et al. 2007). Probably because of the lack of
seasonality in the data in these months, the SARIMA
models performed poorly. In the northern region,
however, especially for Plains, SARIMA performed
better than the other models. Even in southern loca-
tions, the performance was better than those of the
other approaches for some months: January, August,
September, andDecember forMiami; June andNovember
for Bartow; and April and June for Live Oak. The better
performance in these months was probably due to the ex-
istence of seasonality and nonstationarity in the time se-
ries data. For instance, December and January in Miami
are dry months as they usually receive less precipitation,
whereas August and September are wet months, the pe-
riod of hurricane. Therefore, SARIMAmodels performed
better when periodicity existed in the time series.
Results indicated that ARID may be forecast more
accurately for several months in most locations by using
one or more models than by using the current ENSO
approach (Table 6). For instance, whereas the fore-
casting of ARID could be improved for January in Mi-
ami by using the time series–based SARIMA models,
the index could be better forecast for March and May–
December by using the CI-based ANN models. Simi-
larly, forecasting for December in Miami could be
improved with any of the following models: ANFIS,
ANN, LR, or SARIMA. Forecasting of ARID could
not be improved with the CI-basedmodels for the other
months, probably because of the weaker signals of the
large-scale teleconnections than that of the ENSO.
Similarly, the SARIMA models could not generate
more accurate forecasts than that of ENSO because of
the lack of strong periodicity in the time series data in
these months. Table 6 also shows the best method in
terms of modeling efficiency for each month–location
combination. In general, ANNhad the highest efficiency
for southern locations such as Miami, Bartow, and Live
Oak. The SARIMA model performed best for a mid-
northern location such as Plains, whereas for Blairsville,
the northernmost location in the region, the perfor-
mance of the ENSO approach was the highest of all
methods. The probable causes of these phenomena are
explained in the preceding paragraph.
b. Lead-time forecasts
The RMSE associated with 1-, 2-, and 3-month-ahead
forecasting for eachmonth and location varied depending
on the models used. For the CI-based models—ANFIS,
ANN, andLR—theoccurrenceofRMSEt11,RMSEt12,RMSEt13 was less than 25% (Fig. 5). The terms
RMSEt11, RMSEt12, and RMSEt13 denote the RMSE
associated with 1-, 2-, and 3-month-ahead forecasting,
respectively. The percentage of occurrence shows the
number of times RMSEt11 , RMSEt12 , RMSEt13
occurred out of the total 60 cases (5 locations 3 12
months). For SARIMA, the occurrence of RMSEt11 ,RMSEt12 , RMSEt13 was 50% (i.e., 30 out of 60).
Although RMSEt11 , RMSEt12 , RMSEt13 did not
occur for 100% of the cases with each model, the results
gave an indication that short-term forecasting (e.g., for
1 month in advance) can generally make more accurate
estimations than can long-term forecasting (e.g., for 2 or
3 months in advance). The SARIMA models produced
less accurate forecasts than did the CI-based models for
the months that were more distant because of weaker
autocorrelations in the time series data with increasing
lag times. The occurrence of RMSEt11 , RMSEt12 ,RMSEt13 for fewer cases with the CI-based models than
TABLE 6. The relative ME values, computed as the ME of a method divided by the ME of the ENSO approach, showing the perfor-
mances of various methods relative to those of the ENSO approach for different months and locations. Relative ME values of greater
(less) than 1 indicate better (worse) performance than that of the ENSO approach. The largest (boldface) value in a month–location grid
(four grid cells) corresponds to the best method for that grid. For the grids without boldface values, the ENSO approach was the best.
Here, A 5 ANFIS, N 5 ANN, R 5 LR, and S 5 SARIMA.
Miami Bartow Live Oak Plains Blairsville
Month A N R S A N R S A N R S A N R S A N R S
Jan 0.6 0.6 0.5 1.1 0.6 0.7 0.8 0.0 0.3 1.5 0.4 20.7 20.8 21.0 21.9 4.0 21.6 24.7 23.4 0.0
Feb 0.5 0.5 0.3 0.6 0.5 1.1 0.6 0.5 3.5 3.7 2.8 1.5 21.6 0.3 20.3 1.9 24.8 23.7 22.9 23.2
Mar 1.4 1.4 1.4 0.2 0.5 1.1 0.8 0.6 2.6 2.5 2.8 20.6 0.8 0.9 0.2 20.5 20.9 21.3 21.5 21.6
Apr 0.5 0.8 0.2 0.1 0.6 0.5 0.3 0.8 22.6 3.6 21.5 7.1 28.7 20.8 24.4 9.7 0.1 0.9 0.3 0.6
May 14.0 11.6 2.1 236.2 0.4 3.0 20.3 214.9 27.2 3.1 25.4 218.3 25.7 22.5 23.6 21.3 224.9 216.6 218.0 29.2
Jun 213.4 7.9 2.4 24.7 211.5 2.9 22.5 5.9 23.4 22.1 22.4 2.7 265.6 222.4 232.3 0.2 24.4 21.2 21.8 20.8
Jul 20.5 4.1 0.4 23.2 24.3 1.6 21.8 28.0 22.0 22.3 22.7 20.7 28.7 21.8 25.6 3.9 22.6 0.3 21.2 20.9
Aug 25.4 1.0 0.1 12.8 219.1 2.6 213.0 231.3 27.8 23.9 25.1 22.0 22.1 20.3 22.3 21.4 21.3 20.2 21.1 0.5
Sep 16.0 5.8 23.1 84.5 22.9 1.8 22.8 25.5 24.1 20.8 22.7 0.4 24.8 24.3 26.9 5.0 24.2 20.5 21.8 20.2
Oct 22.2 2.8 0.2 0.9 212.1 24.7 25.9 239.5 20.8 20.1 20.1 21.8 15.3 217.2 5.9 17.8 21.6 20.5 22.8 24.1
Nov 0.9 4.4 0.8 0.0 0.1 1.8 0.1 2.2 2.4 4.7 4.6 21.0 0.8 2.1 1.5 2.4 20.5 0.0 21.1 1.4
Dec 2.2 2.5 2.2 2.9 0.5 1.8 0.3 1.4 1.2 3.3 1.2 2.2 0.4 2.7 1.4 0.0 22.4 22.9 22.9 0.0
438 WEATHER AND FORECAST ING VOLUME 28
with the SARIMA models gave an impression that, al-
though ARID is slightly more correlated with climate
indices with shorter lags, the correlations between
ARID and the climate indices with longer lags are also
possible.
c. Probabilistic forecasting
The distribution of « in terms of the box-and-whiskers
plots for each of the 12 months, five locations, and the
three methods (ANN, ENSO, and SARIMA) whose
performances were better than those of the others as
shown by the ME values presented in section 3a are
presented in Fig. 6. In line with the modeling efficiency
results, the performance of each method in terms of
producing small errors varied depending on months and
locations. For instance, the interquartile range of errors
for Miami in January was the smallest with the ENSO
approach, whereas that for Miami in March was the
smallest with ANN. Similarly, ANN produced the
smallest errors for Miami in June and July, whereas
SARIMA produced the smallest errors in August and
September.
The probability of « falling inside 20.1 and 0.1, de-
noted as P(20.1 , « , 0.1) and computed for each lo-
cation, month, and method, is presented in Table 7.
From this table, the probabilistic forecasts of ARID can
be made. For instance, P(20.1 , « , 0.1) for Miami in
January as estimated by the ENSO approach is 0.39. If
the forecast F made by this approach for Miami in
January were 0.26, P(F 2 0.1 , ARID , F 1 0.1); that
is, P(0.16 , ARID , 0.36) for this month and location
would be 0.39. That is, there would be 39% chance that
the forecast of ARID would fall inside 0.26 6 0.1.
FIG. 5. The percentage of occurrence of RMSEt11,RMSEt12,RMSEt13 for different forecasting models, where RMSEt11,
RMSEt12, and RMSEt13 denote the RMSE values associated with
1-, 2-, and 3-month-ahead forecasting, respectively. There were 60
cases in total (5 locations3 12 months), and therefore 50%means
30 cases.
FIG. 6. Box-and-whiskers plots of errors, computed as the difference between the original and model-estimated values of ARID, for
various months, locations, and models. Note that E 5 ENSO, N 5 ANN, and S 5 SARIMA.
APRIL 2013 WOL I ET AL . 439
Similarly, if the forecast made by an ANN model for
Bartow in July were 0.27,P(F2 0.1,ARID, F1 0.1);
that is, P(0.17 , ARID , 0.37) for this month and lo-
cation would be 0.82 (Table 7).
The large probability values in Table 7 may be due to
extremely wet conditions, extremely dry conditions, or
good teleconnections between the large-scale oceanic–
atmospheric signals (climate indices) and ARID in this
region.Under extremelywet conditions, almost all values
of ARID are 0, such as those in January, February, and
December for Plains and Blairsville (Figs. 7d,e). Under
extremely dry conditions, values of ARID concentrate
at 1. Thus, the width of the prediction error is almost
zero, and P(20.1 , « , 0.1) is high in both severely
wet and dry conditions. When the distribution of ARID
is not close to 0 or 1—such as in July, August, and
September for Miami (Fig. 7a), July and August for
Bartow (Fig. 7b); and February for Live Oak (Fig. 7c)—
the large value of P(20.1 , « , 0.1) indicates a strong
teleconnection between climate indices and ARID and
good performance of a forecasting model such as ANN.
Figure 7 also illustrates 75% probability that the mean
value of ARID as forecast by an ANNmodel falls inside
a certain range. For January in Miami, for instance,
there is 75% chance that the forecast of ARID falls in-
side 0.30 and 0.63 (Fig. 7). The 75% probability was
computed as the difference between the 87.5th percen-
tile and the 12.5th percentile.
4. Conclusions
This study attempted to verify whether the use of
climate indices—namely, AMO, NAO, JMA, Nino-3.4,
PDO, and PNA—could improve the current level of
drought forecasting, which is essentially ENSO based,
for the southeastern United States and also examined
the performance of ANFIS, ANN, ARMA, and LR
models relative to that of the ENSO approach in fore-
casting ARID, an agricultural drought index.
The results indicated that improved forecasting may
be possible for several months in most locations of the
region with the use of climate indices and forecasting
models such as ANN and SARIMA. For instance,
ARID forecasting could be improved for January in
Miami using the SARIMA models, whereas the index
could be better forecast for March–December using the
ANN models.
The results showed that the climate indices used in
this study have good potential for use in drought fore-
casting for both winter and summer months, especially
for the southern part of the region. Thus, the climate
indices can potentially contribute to a better forecast of
droughts.
The performance of the ANFIS, ANN, ENSO, LR,
and SARIMAmodels varied depending uponmonth and
location. In general, climate index–based models and
the ENSO approach performed better during the winter
than during the summer because of stronger signals of
the ENSO and large-scale teleconnections in the winter
than in the summer. The efficiency of SARIMA models
depended largely on the periodicity of precipitation in
the data. The ANN models generally outperformed the
other models for the southern part of the region,
whereas the ENSO approach performed better than the
other models for the northern part. The SARIMA
models were best for the central part of the region. In
general, LR and ANFIS performed poorly, as the for-
mer could not represent the nonlinearity between
ARID and the climate indices and the latter lost gen-
erality mainly because of limited data. The relatively
better performance of ANN was due to its ability to
approximate the nonlinear relationship between input
TABLE 7. The P(20.1 , « , 0.1) values as estimated by various models for different months and locations in the southeastern United
States. Here, E 5 ENSO, N 5 ANN, and S 5 SARIMA.
Miami Bartow Live Oak Plains Blairsville
Month E N S E N S E N S E N S E N S
Jan 0.39 0.29 0.16 0.23 0.14 0.36 0.44 0.48 0.56 0.94 0.91 0.96 0.99 0.98 0.96
Feb 0.32 0.34 0.36 0.30 0.30 0.16 0.45 0.64 0.44 0.89 0.88 0.92 0.99 0.98 0.96
Mar 0.32 0.38 0.36 0.34 0.27 0.20 0.39 0.43 0.24 0.53 0.55 0.52 0.94 0.91 0.84
Apr 0.49 0.48 0.60 0.33 0.36 0.28 0.31 0.34 0.32 0.37 0.29 0.40 0.56 0.54 0.52
May 0.45 0.46 0.28 0.35 0.39 0.40 0.35 0.38 0.44 0.37 0.36 0.32 0.49 0.45 0.40
Jun 0.52 0.50 0.32 0.42 0.50 0.36 0.46 0.41 0.40 0.39 0.38 0.28 0.46 0.41 0.40
Jul 0.67 0.75 0.60 0.72 0.82 0.60 0.53 0.54 0.64 0.51 0.38 0.52 0.38 0.34 0.28
Aug 0.59 0.63 0.68 0.67 0.64 0.56 0.49 0.52 0.52 0.42 0.43 0.40 0.44 0.45 0.36
Sep 0.80 0.77 0.84 0.46 0.46 0.32 0.46 0.36 0.40 0.34 0.36 0.20 0.35 0.27 0.32
Oct 0.38 0.36 0.28 0.34 0.32 0.12 0.33 0.29 0.32 0.30 0.30 0.32 0.18 0.38 0.28
Nov 0.22 0.23 0.16 0.24 0.30 0.24 0.24 0.27 0.20 0.28 0.38 0.32 0.51 0.68 0.76
Dec 0.32 0.32 0.28 0.23 0.23 0.32 0.20 0.20 0.04 0.73 0.73 0.84 0.97 0.98 0.96
440 WEATHER AND FORECAST ING VOLUME 28
and output. The climate index–based models performed
worse than the ENSO approach for northern locations,
indicating that the signals of the large-scale tele-
connection patterns represented by the climate indices
are even weaker than those of ENSO (i.e., JMA) in the
northern part of the region. In general, the performance
of each model decreased in the direction of south to
north as a result of weaker ENSO and teleconnection
signals and less-distinct precipitation periodicities in the
north than in the south.
Acknowledgments. The authors thank the USDA
National Institute for Food and Agriculture and the
NOAA Regional Integrated Sciences and Assessments
program for supporting this work through grants.
REFERENCES
Abbasi, B., 2009: A neural network applied to estimate process
capability of non-normal processes. Expert Syst. Appl., 36,
3093–3100.
Allen, R. G., L. S. Pereira, D. Raes, and M. Smith, 1998: Crop
evapotranspiration: Guidelines for computing crop water re-
quirements. FAO Irrigation and Drainage Paper 56, 326 pp.
——, I. A. Walter, R. L. Elliott, T. A. Howell, D. Itenfisu, M. E.
Jensen, and R. L. Snyder, 2005: The ASCE Standardized
Reference Evapotranspiration Equation. American Society of
Civil Engineers, 216 pp.
Altun, S., A. B. Goktepe, A. M. Ansal, and C. Akguner, 2009:
Simulation of torsional shear test results with neuron-fuzzy
control system. Soil Dyn. Earthquake Eng., 29, 253–260.
ASCE Task Committee on Application of Artificial Neural Net-
works in Hydrology, 2000: Artificial neural networks in
hydrology. II: Hydrologic applications. J. Hydrol. Eng., 5, 124–137.
ASCE Task Committee on Definition of Criteria for Evaluation of
Watershed Models of the Watershed Management Commit-
tee, Irrigation and Drainage Division, 1993: Criteria for
evaluation of watershedmodels. J. Irrig. Drain. Eng., 119, 429–
442.
Bacanli, U. G., M. Firat, and F. Dikbas, 2008: Adaptive neuron-
fuzzy inference system for drought forecasting. Stochastic
Environ. Res. Risk Assess., 23, 1143–1154.
Belsley, D. A., E. Kuh, and R. E. Welsch, 1980: Regression Di-
agnostics: Identifying Influential Data and Sources of Collin-
earity. Wiley-Interscience, 292 pp.
Bond, P., R. Dada, and G. Erion, 2007: Climate Change, Carbon
Trading and Civil Society: Negative Returns on South African
Investments. Rozenberg Publishers, 190 pp.
Box, G. E. P., and G. M. Jenkins, 1970: Time Series Analysis,
Forecasting, and Control. Holden-Day, 553 pp.
Brolley, J. M., J. J. O’Brien, J. Schoof, and D. Zierden, 2007: Ex-
perimental drought threat forecast for Florida. Agric. For.
Meteor., 145, 84–96.
Campbell, K. O., 1973: The future role of agriculture in the
Australian economy. The Environmental, Economic, and
Social Significance of Drought, J. V. Lovett, Ed., Angus and
Robertson, 3–14.
Cancelliere, A., G. D. Mauro, B. Bonaccorso, and G. Rossi, 2007:
Drought forecasting using the standardized precipitation in-
dex. Water Resour. Manage., 21, 801–819.
Chakraborty, K., K. Mehrotra, C. K. Mohan, and S. Ranka, 1992:
Forecasting the behavior of multivariate time series using
neural networks. Neural Networks, 5, 961–970.
Chang, F. J., and Y. T. Chang, 2006: Adaptive neuron-fuzzy in-
ference system for prediction of water level in reservoir. Adv.
Water Resour., 29, 1–10.Chiew, F. H. S., T. C. Piechota, J. A. Dracup, and T. A. McMahon,
1998: El Nino/Southern Oscillation and Australian rainfall,
FIG. 7. Distributions of ARIDwith the mean and the 12.5th- and
87.5th-percentile values as forecast by ANNmodels across months
for (a)Miami, (b) Bartow, (c) LiveOak, (d) Plains, and (e) Blairsville.
Also shown are values of P(20.1 , prediction error , 0.1).
APRIL 2013 WOL I ET AL . 441
stream flow, and drought: Links and potential for forecasting.
J. Hydrol., 204, 138–149.
Cordery, I., and M. McCall, 2000: A model for forecasting drought
from teleconnections. Water Resour. Res., 36, 763–768.
Costa Branco, P. J., and J. A. Dente, 2001: Fuzzy systemsmodeling
in practice. Fuzzy Sets Syst., 121, 73–93.Cutore, P., G. D. Mauro, and A. Cancelliere, 2009: Forecasting
Palmer index using neural networks and climatic indexes.
J. Hydrol. Eng., 14, 588–595.
Cybenko, G., 1989: Approximation by superposition of a sigmoidal
function. Math. Control Signals Syst., 2, 303–314.De Groot, C., and D. Wurtz, 1991: Analysis of univariate time
series with connectionist nets: A case study of two classical
examples. Neurocomputing, 3, 177–192.
Enfield, D. B., A. M. Mestas-Nunez, and P. J. Trimble, 2001: The
Atlantic multidecadal oscillation and its relation to rainfall
and river flows in the continental U.S.Geophys. Res. Lett., 28,
2077–2080.
FEMA, 1995: National Mitigation Strategy: Partnerships for
building safer communities. Federal Emergency Management
Agency Rep., 45 pp. [Available online at http://www.fas.org/
irp/agency/dhs/fema/mitigation.pdf.]
Firat, M., andM. Gungor, 2008: Hydrological time-series modeling
using adaptive neuron-fuzzy inference system. Hydrol. Pro-
cesses, 22, 2122–2132.
Gerard-Marchant, P., and D. Stooksbury, 2010: Impact of El Nino/
Southern Oscillation on low-flows in south Georgia, USA.
Southeast. Geogr., 50, 218–243.
Gershunov, A., and T. P. Barnett, 1998a: ENSO influence on in-
traseasonal extreme rainfall and temperature frequencies in
the contiguous United States: Observations andmodel results.
J. Climate, 11, 1575–1586.
——, and ——, 1998b: Interdecadal modulation of ENSO tele-
connections. Bull. Amer. Meteor. Soc., 79, 2715–2725.
Granger, C. W. J., 1993: Strategies for modeling nonlinear time
series relationships. Econ. Rec., 69, 233–238.
——, and T. Terasvirta, 1993: Modeling Nonlinear Economic Re-
lationships. Oxford University Press, 198 pp.
Hagan, M. T., H. P. Demuth, and M. Beale, 1996: Neural Network
Design. PWS Publishing, 736 pp.
Hagemeyer, B. C., 2006: ENSO, PNA and NAO scenarios for ex-
treme storminess, rainfall and temperature variability during
the Florida dry season. Preprints, 18th Conf. on Climate Var-
iability and Change, Atlanta, GA, Amer. Meteor. Soc., P2.4.
[Available online at http://ams.confex.com/ams/pdfpapers/
98077.pdf.]
Hanley, D. E., M. A. Bourassa, J. J. O’Brien, S. R. Smith, and E. R.
Spade, 2003: A quantitative evaluation of ENSO indices.
J. Climate, 16, 1249–1258.
Hansen, J. W., A. W. Hodges, and J. W. Jones, 1998: ENSO in-
fluences on agriculture in the southeastern United States.
J. Climate, 11, 404–411.
Hastenrath, S., and L. Greischar, 1993: Further work on the pre-
diction of Northeast Brazil rainfall anomalies. J. Climate, 6,
743–758.
Hornik, K., M. Stichcombe, and H. White, 1989: Multi-layer feed-
forward networks are universal approximators. Neural Net-
works, 2, 359–366.
Jang, J. R., 1993: ANFIS: Adaptive-network-based fuzzy inference
system. IEEE Trans. Syst. Man Cybern., 23, 665–685.
Kang, S. Y., 1991:An investigation of the use of feedforward neural
networks for forecasting. Ph.D. dissertation, Kent State Uni-
versity, 162 pp.
Keeling, T. B., 2008: Modified JMA ENSO index and its im-
provements to ENSO classification. M.S. thesis, Dept. of
Meteorology, The Florida State University, 117 pp. [Available
online at etd.lib.fsu.edu/theses_1/available/etd-12222008-094254/
unrestricted/KeelingTThesis.pdf.]
Kim, T., and J. B. Valdes, 2003: Nonlinear model for drought
forecasting based on a conjunction of wavelet transforms and
neural networks. J. Hydrol. Eng., 8, 319–328.Klimasauskas, C. C., 1991: Applying neural networks. Part III:
Training a neural network. PC AI, 5, 20–24.
Konrad, C. E., 1998: Intramonthly indices of the Pacific/North
American teleconnection pattern and temperature regimes
over United States. Theor. Appl. Climatol., 60, 11–19.
Krzysztofowicz, R., 2001: The case for probabilistic forecasting in
hydrology. J. Hydrol., 249, 2–9.Kumar, V., and U. Panu, 1997: Predictive assessment of severity of
agricultural droughts based on agro-climatic factors. J. Amer.
Water Resour. Assoc., 33, 1255–1264.
Leathers, D. J., B. Yarnal, and M. A. Palecki, 1991: The Pacific/
North American pattern and United States climate. Part I:
Regional temperature and precipitation associations. J. Cli-
mate, 4, 517–528.
Lee, S. H., R. J. Howlett, C. Crua, and S. D. Walters, 2007: Fuzzy
logic and neuron-fuzzy modeling of diesel spray penetration:
A comparative study. J. Intell. Fuzzy Syst.: Appl. Eng. Tech-
nol., 18, 43–56.Legates, D. R., and G. J. McCabe, 1999: Evaluating the use of
goodness-of-fit measures in hydrologic and hydroclimatic
model validation. Water Resour. Res., 35, 233–241.
Maier, H. R., and G. C. Dandy, 1998: The effect of internal param-
eters and geometry on the performance of back-propagation
neural networks:An empirical study.Environ.Modell. Software,
13, 193–209.
——, and ——, 2000: Neural networks for the prediction and
forecasting of water resources variables: A review of modeling
issues and applications. Environ. Modell. Software, 15, 101–
124.
Marce, R., M. Comerma, J. C. Garcia, and J. Armengo, 2004:
A neuron-fuzzy modeling tool to estimate fluvial nutrient
loads in watersheds under time-varying human impact.Limnol.
Oceanogr. Methods, 2, 342–355.Marja, M., and M. Esa, 1997: Neuro-fuzzy modelling of spectro-
scopic data. Part A—Modelling of dye solutions. J. Soc. Dyers
Colour., 113, 13–17.
Martinez, C. J., G. A. Baigorria, and J. W. Jones, 2009: Use of
climate indices to predict corn yields in southeast USA. Int.
J. Climatol., 29, 1680–1691.
Mason, S. J., 2002: Analyzing and communicating forecast quality
and probability. Asian Disaster Preparedness Center Asian
Climate TrainingManual,Module 3.3, 9 pp. [Available online at
http://www.adpc.net/ece/ACT_man/ACT-3.3-ForecastQuality.
pdf.]
MathWorks, cited 2012: Fuzzy Logic Toolbox user’s guide,
R2012b. [Available online at http://www.mathworks.com/
access/helpdesk/help/pdf_doc/fuzzy/fuzzy.pdf.]
Mendicino, G., A. Senatore, and P. Versace, 2008: A groundwater
resource index for drought monitoring and forecasting in
a Mediterranean climate. J. Hydrol., 357, 282–302.
Mishra, A. K., and V. R. Desai, 2005: Drought forecasting using
stochastic models. Stochastic Environ. Res. Risk Assess., 19,
326–339.
——, and ——, 2006: Drought forecasting using feed-forward re-
cursive neural network. Ecol. Modell., 198, 127–138.
442 WEATHER AND FORECAST ING VOLUME 28
——, andV. P. Singh, 2010: A review of drought concepts. J. Hydrol.,
391, 202–216.
——, and ——, 2011: Drought modeling—A review. J. Hydrol.,
403, 157–175.——, V. R. Desai, and V. P. Singh, 2007: Drought forecasting using
a hybrid stochastic and neural networkmodel. J. Hydrol. Eng.,
12, 626–638.
Modarres, R., 2006: Streamflow drought time series forecasting.
Stochastic Environ. Res. Risk Assess., 21, 223–233.
Moriasi, D. N., J. G. Arnold, M. W. van Liew, R. L. Bingner, R. D.
Harmel, and T. L. Veith, 2007: Model evaluation guidelines
for systematic quantification of accuracy in watershed simu-
lations. Trans. ASABE, 50, 885–900.
Nash, J. E., and J. V. Sutcliffe, 1970: River flow forecasting through
conceptualmodels. Part I: A discussion of principles. J. Hydrol.,
10, 282–290.
Nayak, P. C., K. P. Sudheer, D. M. Rangan, and K. S. Ramasastri,
2004: A neuron-fuzzy computing technique for modeling hy-
drological time series. J. Hydrol., 291, 52–66.Nicholls, N., 1985: Towards the prediction of major Australian
droughts. Aust. Meteor. Mag., 33, 161–166.
NSTC, 2005: Grand challenges for disaster reduction. National Science
and Technology Council Rep., 25 pp. [Available online at http://
www.sdr.gov/docs/SDRGrandChallengesforDisasterReduction.
pdf.]
Panu, U. S., and T. C. Sharma, 2002: Challenges in drought re-
search: Some perspectives and future directions. Hydrol. Sci.,
47, S19–S30.
Principe, J. C., N. R. Euliano, andW. C. Lefebvre, 1999:Neural and
Adaptive Systems: Fundamentals through Simulations. John
Wiley and Sons, 672 pp.
Rogers, J. C., 1984: The association between the North Atlantic
Oscillation and the Southern Oscillation in the Northern
Hemisphere. Mon. Wea. Rev., 112, 1999–2015.Ropelewski, C. F., and M. S. Halpert, 1986: North American pre-
cipitation and temperature patterns associated with the El
Nino/South Oscillation (ENSO). Mon. Wea. Rev., 115, 2352–2362.
Royce, F. S., C. W. Fraisse, and G. A. Baigorria, 2011: ENSO
classification indices and summer crop yields in the south-
eastern USA. Agric. For. Meteor., 151, 817–826.Rumelhart, D. E., R. Durbin, R. Golden, and Y. Chauvin, 1995:
Backpropagation: The basic theory. Backpropagation:
Theory, Architectures, and Applications, Y. Chauvin and
D. E. Rumelhart, Eds., Lawrence Erlbaum Associates, 1–
34.
Sahoo, G. B., C. Ray, andH. F. Wade, 2005: Pesticide prediction in
ground water in North Carolina domestic wells using artificial
neural networks. Ecol. Modell., 183, 29–46.Samarasinghe, S., 2007: Neural Networks for Applied Sciences and
Engineering. Auerbach Publications, 570 pp.
Schmidt, N., E. K. Lipp, J. B. Rose, andM. E. Luther, 2001: ENSO
influences on seasonal rainfall and river discharge in Florida.
J. Climate, 14, 615–628.
Sevat, E., and A. Dezetter, 1991: Selection of calibration objective
function in the context of rainfall-runoff modeling in a Suda-
nese savannah area. Hydrol. Sci. J., 36, 307–330.
Smith, M., 1996: Neural Networks for Statistical Modeling. In-
ternational Thompson Computer Press, 235 pp.
Smith, S. R., P. M. Green, A. P. Leonardi, and J. J. O’Brien, 1998:
Role of multiple-level troposphereic circulations in forcing
ENSO winter precipitation anomalies. Mon. Wea. Rev., 126,
3102–3116.
Stahl, K., and S. Demuth, 1999: Methods for regional classification
of stream flow drought series: Cluster analysis. University of
Freiburg Assessment of the Regional Impact of Droughts in
Europe Tech. Rep. 1, 8 pp. [Available online at www.hydrology.
uni-freiburg.de/forsch/aride/navigation/publications/pdfs/
aride-techrep1.pdf.]
Steinemann, A. C., 2006: Using climate forecasts for drought
management. J. Appl. Meteor. Climatol., 45, 1353–1361.
Stevens, K. A., 2007: Statistical associations between large scale
climate oscillations and mesoscale surface meteorological
variability in the Apalachicola-Chattahoochee-Flint River
basin. M.S. thesis, Dept. of Meteorology, The Florida State
University, 113 pp. [Available online at http://diginole.lib.fsu.
edu/cgi/viewcontent.cgi?article54838&context5etd.]
Takagi, T., and M. Sugeno, 1985: Fuzzy identification of systems
and its applications to modeling and control. IEEE Trans.
Syst. Man Cybern., 15, 116–132.Tang, Z., and P. A. Fishwick, 1993: Feedforward neural nets as
models for time series forecasting.ORSA J. Comput., 5, 374–
385.
——,C. Almeida, and P. A. Fishwick, 1991: Time series forecasting
using neural networks vs. Box–Jenkins methodology. Simu-
lation, 57, 303–310.
Trenberth, K. E., 1997: The definition of El Nino. Bull. Amer.
Meteor. Soc., 78, 2771–2777.
USDA-SCS, 1972: Estimation of direct runoff from storm rainfall.
Hydrology, Section 4, National Engineering Handbook,
USDA Soil Conservation Service, 24 pp.
Vasiliades, L., and A. Loukas, 2008: Meteorological drought
forecasting models. Geophysical Research Abstracts, Vol. 10,
Abstract EGU2008-A-08458. [Available online at http://cosis.
net/abstracts/EGU2008/08458/EGU2008-A-08458.pdf.]
Wilpen, L. G., D. Nagin, and J. Szczypula, 1994: Comparative
study of artificial neural network and statistical models for
predicting student grade point averages. Int. J. Forecasting,
10, 17–34.
Woli, P., J. W. Jones, K. T. Ingram, and C. W. Fraisse, 2012: Ag-
ricultural reference index for drought (ARID).Agron. J., 104,
287–300.
——, ——, and ——, 2013: Assessing the Agricultural Reference
Index for Drought (ARID) using uncertainty and sensitivity
analyses. Agron. J., 105, 150–160.
Wood, A. W., and D. P. Lettenmaier, 2006: A test bed for new
seasonal hydrologic forecasting approaches in the western
United States. Bull. Amer. Meteor. Soc., 87, 1699–1712.
Yin, Z. Y., 1994: Moisture condition in the south-eastern USA and
teleconnection patterns. Int. J. Climatol., 14, 947–967.
Zadeh, L. A., 1965: Fuzzy sets. Inf. Control, 8, 338–353.
Zhang, G., B. E. Patuwo, and M. Y. Hu, 1998: Forecasting with
artificial neural networks: The state of art. Int. J. Forecasting,
14, 35–62.
APRIL 2013 WOL I ET AL . 443