
Developing a fuzzy neural network-based support vector regression (FNN-SVR) for regionalizing nitrate concentration in groundwater

Seiyed Mossa Hosseini & Najmeh Mahjouri

Received: 12 June 2013 / Accepted: 21 January 2014
© Springer International Publishing Switzerland 2014

Abstract The aim of this study is to develop a fuzzy neural network-based support vector regression model (FNN-SVR) for mapping crisp-input and fuzzy-output variables. In this model, an artificial neural network (ANN) estimator based on the multilayer perceptron (MLP) is considered as the kernel function of the SVR, whereas asymmetric triangular fuzzy H-level sets are assumed for the model parameters, including the weights and biases of the ANN model. A genetic algorithm (GA) with real coding is implemented to optimize the model parameters during the training phase. To evaluate the efficiency and applicability of the proposed model, it is applied to simulating and regionalizing nitrate concentration in the Karaj Aquifer in Iran. The goodness-of-fit criteria indicate that the FNN-SVR performs better than benchmark models such as geostatistical techniques and traditional SVR models with linear, quadratic, polynomial, and Gaussian kernel functions for modeling nitrate concentrations in groundwater.

Keywords Fuzzy set theory · Support vector regression · Genetic algorithm · Nitrate concentration · Groundwater quality · Artificial neural network

Introduction

In the past decade, computational intelligence techniques such as artificial neural networks (ANNs), evolutionary algorithms (EAs), and support vector regressions (SVRs) have proved efficient in the modeling, simulation, and optimization of complex systems (Maier and Dandy 2000; Ovaska 2005; ASCE Task Committee 2000; Yu et al. 2006; Kholghi and Hosseini 2009). Recently, support vector regression (SVR) has been successfully applied in water resources engineering and management for purposes such as runoff prediction (Dibike et al. 2001; Asefa et al. 2006; Yu and Liong 2007), flood forecasting (Liong and Sivapragasam 2002; Yu et al. 2006), lake and aquifer water level prediction (Asefa et al. 2005; Khan and Coulibaly 2006), groundwater monitoring network design (Asefa et al. 2004), soil moisture prediction (Gill et al. 2006), groundwater quality and quantity simulation (Krishna et al. 2008; Yoon et al. 2011; Nikoo and Mahjouri 2013), and reproducing groundwater contamination data at unsampled sites (Kecman 2001; Asefa and Kemblowski 2002). The first support vector machine was developed in the early 1960s by Vapnik, a Russian mathematician. It was based on the structural risk minimization (SRM) principle of statistical learning theory, and it gained popularity due to its attractive features and promising empirical performance (Vapnik 1995; Kalteh 2013). In comparison with ANNs, SVRs are able to learn more effectively when using scarce and incomplete hydrologic data (Babovic et al. 2000; Khan and Coulibaly 2006; Gill et al. 2007; Behzad et al. 2010). This advantage stems from two outstanding features: their excellent generalization capability on unseen data (testing phase) and their sparse representation, in which not all training exemplars become support vectors (Schölkopf and Smola 2002). Liong and Sivapragasam (2002) detailed seven facts about the superiority of SVRs over ANNs in simulating and predicting complex phenomena in hydrogeology and water resources while holding all the strengths of ANNs.

Environ Monit Assess
DOI 10.1007/s10661-014-3650-8

S. M. Hosseini (*)
Natural Geography Department, University of Tehran, Tehran 14155-6465, Iran
e-mail: [email protected]

N. Mahjouri
Faculty of Civil Engineering, K.N. Toosi University of Technology, Tehran, Iran
e-mail: [email protected]

Details of SVR have been documented by many authors (e.g., Vapnik 1999; Kecman 2001); therefore, only a brief introduction is given here. The optimization model formulated by Vapnik (1995) is shown in Eqs. 1 and 2. The goal is to find a function f(x) = w·Φ(x) + b with a small w that deviates as little as possible (by at most ε) from the actual targets (Zi) for all N training data. The kernel function simplifies the learning process by transforming the representation of the data in the input space into a linear/nonlinear representation in a higher dimensional space called the feature space (Jiang and Zhao 2013).

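$$
\underset{w,\,b,\,\xi_i,\,\xi_i^{*}}{\text{Minimize}} \quad \frac{1}{2}\lVert w\rVert^{2} + F\sum_{i=1}^{N}\left(\xi_i + \xi_i^{*}\right) \tag{1}
$$

Subject to

$$
\begin{cases}
Z_i - \left(w^{T}\Phi(x_i) + b\right) \le \varepsilon + \xi_i\\
\left(w^{T}\Phi(x_i) + b\right) - Z_i \le \varepsilon + \xi_i^{*}\\
\xi_i,\ \xi_i^{*} \ge 0 \qquad \forall i = 1, 2, \ldots, N
\end{cases} \tag{2}
$$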

In the above equations, Φ is the given transformation function (kernel function), which maps the input vector of data (x) from the input space to a higher dimensional feature space. ξi and ξi* are slack variables that determine the amount by which samples with an error greater than ε are penalized. The constant F defines the trade-off between the flatness of f(x) and the tolerated error. For example, if F tends to infinity, the objective becomes minimizing the error regardless of the flatness (Cherkassky and Yunqian 2002). The term ‖w‖² in the objective function of Eq. 1 is a capacity control factor that avoids overfitting in the high-dimensional feature space.
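As a rough illustration of Eqs. 1 and 2, the sketch below evaluates the primal objective for a fixed model, using the smallest slack values that keep each sample feasible. It assumes an identity feature map, Φ(x) = x, purely for simplicity; the function and argument names are illustrative and do not come from the paper.

```python
def svr_primal_objective(w, b, X, Z, eps, F):
    """Evaluate Eq. 1 for fixed (w, b) with the minimal feasible slacks of Eq. 2.

    Assumption: identity feature map, Phi(x) = x (illustrative only).
    """
    total_slack = 0.0
    for x, z in zip(X, Z):
        pred = sum(wj * xj for wj, xj in zip(w, x)) + b
        xi = max(0.0, z - pred - eps)        # target lies above the eps-tube
        xi_star = max(0.0, pred - z - eps)   # target lies below the eps-tube
        total_slack += xi + xi_star
    flatness = 0.5 * sum(wj * wj for wj in w)
    return flatness + F * total_slack
```

A large F makes the slack term dominate, which matches the remark that the error is then minimized regardless of flatness.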

SVRs have some weaknesses: the user must tune the kernel function type and the model parameters to reach the final estimates, and identifying optimal parameter values is largely a trial-and-error process (Sivapragasam et al. 2001; Asefa et al. 2006). SVRs also suffer from overfitting and underfitting problems, and overfitting is more damaging than underfitting (Han et al. 2007). In addition, SVRs are still in their infancy in hydrogeological applications and need further investigation. The available hydrogeological data are often noisy, uncertain, imprecise, and incomplete. Therefore, the input parameters cannot be completely represented by a few field measurements. This makes it important to carry out an uncertainty analysis that captures the variability of the predictions and supports better decisions (Varma and Babu 2009).

Following Shrestha and Solomatine (2006), there are four different approaches to account for the uncertainty associated with a model's output. The first approach predicts the output of the model in a probabilistic sense by characterizing the model parameters themselves through probability distributions defined on the basis of Bayesian theory (Krzysztofowicz 1999). The second approach estimates the uncertainty associated with the model's output by analyzing the statistical properties of the model's errors in reproducing historically observed data and thus determining the corresponding confidence intervals. The third approach is based on resampling techniques, generally known as ensemble methods or the Monte Carlo method. Finally, the fourth approach is based on fuzzy number theory. The first and third approaches require an a priori definition of the probability distributions of the parameters. The second requires explicit assumptions about the errors (e.g., normality and homoscedasticity), whereas the fourth entails defining a criterion for characterizing the different credibility levels (H-levels) used to construct the fuzzy number (Shrestha and Solomatine 2006). The fuzzy approach to evaluating spatial uncertainty in groundwater provides useful information for spatial uncertainty assessments in a nonprobabilistic sense (Tutmez et al. 2006) and allows information on various parameters to be integrated into the modeling and evaluation process (Dahiy et al. 2007), but it does not reflect natural uncertainties, which are directly induced by the uncertainties in the model parameters (Kumar and Schuhmacher 2005).


Hao and Chiang (2008) developed a fuzzy regression analysis model based on support vector learning techniques and suggested that the developed model could perform automatic and accurate control in fuzzy regression analysis tasks. Wu (2011) developed a triangular fuzzy robust wavelet support vector regression machine (TFRWm-SVM) to forecast nonlinear fuzzy systems with fuzzy input and output numbers. He concluded that the TFRWm-SVM was effective in dealing with uncertain data and finite samples. Chuang (2008) presented an interval support vector regression network model, which could handle interval input and output data.

Jeng and Lee (1999) proposed a fuzzy SVR method in which the weighting vector obtained from the SVM was regarded as an initial weighting vector in a fuzzy neural network (FNN). Juang and Hsieh (2009) proposed a Takagi–Sugeno fuzzy system-based SVR model (TSFS-SVR) that hybridizes a clustering algorithm and a linear SVR for function regression problems. In this model, the weights are estimated by a combination of fuzzy clustering and linear SVR. Representative simulation results with some nonlinear mathematical functions showed that the TSFS-SVR achieved a better performance than the Gaussian kernel-based SVR and other regression models.

Hao and Chiang (2007) incorporated fuzzy set theory with a symmetric membership function into an SVR with typical kernel functions and tested their model using some numerical examples. They indicated that fuzzy set theory could provide an effective means of accounting for the vague nature of the parameters in the SVR.

Hybridizing the ANN and fuzzy methods offers advantages over conventional modeling, including the ability to handle large amounts of noisy data from dynamic and nonlinear systems, especially when the underlying physical relationships are not fully understood (Jang et al. 1997; Aqil et al. 2007).

Lin and Pai (2010) presented a fuzzy support vector regression (FSVR) model that calculates fuzzy upper and lower bounds and then makes predictions by a fuzzy H-level set (H-cut) for forecasting indices of business cycles. Hong and Hwang (2003a, b) proposed a support vector fuzzy regression machine model for modifying convex optimization problems of multivariate fuzzy linear regression models. Empirical results indicated that the developed model derived satisfying solutions efficiently. Jeng et al. (2003) developed a support vector interval regression network to efficiently handle interval output data. Yao and Yu (2006) developed a fuzzy regression based on asymmetric support vector machines, which overcame limitations of traditional nonlinear fuzzy regression and could effectively be used for parameter estimation.

Alvisi and Franchini (2011) developed a fuzzy neural network model in which the model parameters were represented by fuzzy numbers; thus, the output of this model at each time step was a fuzzy number, which was expected to express the total uncertainty connected with the forecasted variable.

This paper combines the capabilities of fuzzy set theory (FST), ANN, and the genetic algorithm (GA) with the SVR in a new model called fuzzy neural network-based support vector regression (FNN-SVR). A multilayer perceptron ANN is embedded as a kernel function in the fuzzy SVR structure to map crisp inputs to fuzzy outputs, whereas previous works used conventional kernel functions such as linear, polynomial, Gaussian, and logarithmic kernels for mapping input–output data in surface and groundwater systems. An asymmetric triangular fuzzy H-level membership function is used to fuzzify the model parameters, including the weights and biases of the ANN model. A real-coded genetic algorithm is utilized to optimize these parameters. The problem of regionalizing the nitrate concentration in monitoring wells over an aquifer is considered to demonstrate the efficiency and applicability of the proposed model. The results of the FNN-SVR model are compared with those of benchmark models such as geostatistical approaches (i.e., Kriging and co-Kriging) and traditional SVR.

FNN-SVR model structure

Incorporating fuzzy set theory to the SVR model

In the literature, many researchers have developed fuzzy regression models in which the parameters and outputs of the models are fuzzy numbers (Tanaka et al. 1982; Yen et al. 1999; Kacprzyk and Fedrizzi 1992; Zahraie and Hosseini 2009). In this section, we fuzzify the parameters of the SVR model, namely the weight vector and the bias term (Eq. 1). The fuzzified parameters make the SVR model more flexible.


In the fuzzy SVR, the outputs are considered to be fuzzy variables $\tilde{Z}_i = (z_i^{c}, z_i^{s})$ with asymmetric triangular membership function $\mu_{\tilde{Z}_i}(z_i)$, where $z_i^{c}$ and $z_i^{s}$ are the center and the spread of $\tilde{Z}_i$. The membership function of $\tilde{Z}_i$ is calculated as follows (Zahraie and Hosseini 2009):

$$
\mu_{\tilde{Z}_i}(z) = 1 - \frac{\lvert z - z_i^{c}\rvert}{k_i\, z_i^{s}} \tag{3}
$$

where $k_i$ is the skewness factor of $\tilde{Z}_i$, which equals unity for a symmetric membership function (Fig. 1).
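A minimal sketch of the membership function in Eq. 3 follows. Clipping negative values to zero is an assumption added here (Eq. 3 as printed does not show it), and the function name is illustrative.

```python
def triangular_membership(z, center, spread, k=1.0):
    """Membership degree of z following Eq. 3: 1 - |z - center| / (k * spread).

    Assumption: values outside the support are clipped to 0.
    """
    if spread <= 0 or k <= 0:
        raise ValueError("spread and k must be positive")
    return max(0.0, 1.0 - abs(z - center) / (k * spread))
```

With k = 1 the function is symmetric about the center; a larger k widens the support and raises the membership of a given z.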

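Each nonfuzzy input vector $x = [x_1, x_2, \ldots, x_N]^{T}$ is transformed by the kernel function $\Phi(x) = [\varphi(x)_1, \varphi(x)_2, \ldots, \varphi(x)_n]^{T}$. In the fuzzy weight vector $W^{*} = (w, v)$, w and v are respectively the center and the spread vectors of $W^{*}$, and n represents the dimension of the feature space. In Eq. 3, $\tilde{Z}_i$ is approximated by $Z_i^{*} = W^{*}\cdot\Phi(x_i) + B^{*}$, where $B^{*} = (b, d)$ is the bias term and b and d are respectively its center and spread, and $\mu_{\tilde{Z}_i^{*}}(z)$ can be defined as follows:

$$
\mu_{\tilde{Z}_i^{*}}(z) =
\begin{cases}
1 - \dfrac{\left\lvert z - \left(w^{T}\Phi(x_i) + b\right)\right\rvert}{k_i\left(v^{T}\lvert\Phi(x_i)\rvert + d\right)} & x_i \ne 0,\ \forall z\\[2ex]
1 - \dfrac{1}{k_i} & x_i = 0,\ z = 0\\[2ex]
0 & x_i = 0,\ z \ne 0
\end{cases} \tag{4}
$$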

The fitting degree of the fuzzy model is defined by H = min_i(h_i) for i = 1, 2, …, N, where h_i is the fitting degree of the ith training datum and N is the number of training data sets (Fig. 2).

The total ambiguity of the fuzzy output is defined as a linear combination of the spreads of the model parameters. To have an acceptable regression model, this term should be minimized. In this regard, the second term in the objective function of Eq. 1 is replaced by $\sum_{j=1}^{n} v_j + d$. Similar to the traditional SVR approach (Eqs. 1 and 2), the fuzzy model needs to minimize the capacity control term ‖w‖². In addition, the constraints of the SVR model (Eqs. 1 and 2) can be rewritten so that all fitting degrees (h_i) of the training data set are greater than or equal to the fitting degree of the fuzzy model (H). Thus, the SVR model given in Eqs. 1 and 2 is rewritten as follows:

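$$
\underset{w,\,v,\,b,\,d}{\text{Minimize}} \quad \frac{1}{2}\lVert w\rVert^{2} + F\left(\sum_{j=1}^{n} v_j + d\right) \tag{5}
$$

Subject to $h_i \ge H$ for i = 1, 2, …, N.

As suggested by Tanaka et al. (1982), the constraint $h_i \ge H$ can be expanded as follows:

$$
\begin{cases}
\left(w^{T}\Phi(x_i) + b\right) + (1 - H)\left(v^{T}\lvert\Phi(x_i)\rvert + d\right) \ge Z_i\\
-\left(w^{T}\Phi(x_i) + b\right) + (1 - H)\left(v^{T}\lvert\Phi(x_i)\rvert + d\right) \ge -Z_i
\end{cases} \tag{6}
$$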

According to Fig. 1, as H and consequently h_i change, the center values of the fuzzy coefficients remain constant, but the spread of the fuzzy coefficients increases when H increases. The centroid defuzzification method can be used to convert the fuzzy numbers $\tilde{Z}_i^{*}$ to the corresponding crisp outputs $Z_i^{*}$. In this paper, to reduce the model dimensionality, all skewness factors of the fuzzy variables (k_i) are considered to be equal to k.
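The step from the fitting-degree constraint h_i ≥ H to the pair of inequalities in Eq. 6 can be sketched as follows. The sketch assumes a symmetric membership function (k = 1) and a positive spread; the shorthand symbols c_i and s_i are introduced here for the center and spread of the estimated output and are not part of the original notation.

```latex
\begin{aligned}
c_i &= w^{T}\Phi(x_i) + b, \qquad s_i = v^{T}\lvert\Phi(x_i)\rvert + d > 0,\\[1ex]
h_i &= 1 - \frac{\lvert Z_i - c_i\rvert}{s_i} \;\ge\; H
\;\Longleftrightarrow\;
\lvert Z_i - c_i\rvert \;\le\; (1 - H)\,s_i\\[1ex]
&\Longleftrightarrow\;
\begin{cases}
c_i + (1 - H)\,s_i \;\ge\; Z_i,\\
-c_i + (1 - H)\,s_i \;\ge\; -Z_i.
\end{cases}
\end{aligned}
```

Read this way, a larger H shrinks the admissible band (1 − H) s_i for a given spread, so the spreads must grow for the same data to remain feasible.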

The main difference between the SVR model (Eqs. 1 and 2) and its corresponding fuzzy model (Eqs. 5 and 6) is that the first model searches for a function that has at most ε-deviation from the observed Zi, while the fuzzy SVR model attempts to find a fuzzy function whose fuzzy parameters achieve a fitting degree (H) with respect to the fuzzy desired targets $\tilde{Z}_i$.

MLP as a kernel function of the SVR model

A suitable choice of a kernel function (Φ(x)) for the SVR allows the data to become separable in the feature space despite being inseparable in the original input space (Han et al. 2007). Among the different forms of kernel functions, the Gaussian radial basis function (RBF) is usually a reasonable and superior choice due to its flexibility and fewer parameters in comparison with other typical kernels such as linear and polynomial (Dibike 2000; Karatzoglou et al. 2006; Hua and Zhang 2006; Szidarovszky et al. 2007). Sivapragasam et al. (2001) introduced a combination of RBF and linear kernels and found that this new kernel is more robust.

Fig. 1 Membership function for the asymmetric triangular fuzzy number ($\tilde{Z}_i$: fuzzy number; $\mu_{\tilde{Z}_i}(z_i)$: membership function of $\tilde{Z}_i$; $z_i^{c}$: the center of $\tilde{Z}_i$; $z_i^{s}$: the spread of $\tilde{Z}_i$; $k_i$: the skewness factor of $\tilde{Z}_i$)

In this paper, an MLP-based neural network function is considered as the kernel function of the fuzzy SVR model in Eq. 6. It will be shown that applying the MLP-ANN as a kernel function in the fuzzy SVR model is more efficient than the typical kernel functions frequently used in the literature. A good review of the applications of ANNs in hydrologic systems modeling has been provided by the American Society of Civil Engineers (ASCE) Task Committee (ASCE Task Committee 2000). In this section, only a brief review of the MLP formulation is given. An MLP with one hidden layer of sigmoid neurons and an output layer of linear neurons provides a general framework for approximating most nonlinear functions. An MLP can mathematically be represented as follows (Kumar and Yadav 2011):

$$
\tilde{Z}_i = f^{(2)}\!\left(b_k^{(2)} + \sum_{j=1}^{m} w_{kj}^{(2)} \cdot f^{(1)}\!\left(b_j^{(1)} + \sum_{i=1}^{l} w_{ij}^{(1)} \cdot x_i\right)\right) \tag{7}
$$

where x_i is the ith nodal value in the input layer, $\tilde{Z}_i$ is the ith estimated output corresponding to x_i, b_j is the bias of the jth node in the hidden layer, b_k is the bias of the kth output variable, w_ij is the weight of the connector between the ith node in the input layer and the jth node in the hidden layer, w_kj is the weight of the connector between the jth node in the hidden layer and the kth output, l is the number of nodes in the input layer (number of input variables), m is the number of nodes in the hidden layer, and f(1) and f(2) are the activation functions in the hidden and output layers, respectively. In Eq. 7, the weights between nodes in different layers and the bias terms are considered as fuzzy numbers and compose the decision variables of the training process of the FNN-SVR model. The fuzzified form of Eq. 7 is considered as the kernel function of Eq. 6. Thus, the structure of the FNN-SVR model is as follows:

$$
\underset{w,\,v,\,b,\,d,\,F,\,k,\,H}{\text{Minimize}} \quad
J = \frac{1}{2}\lVert w\rVert^{2} + F \cdot (1 + k) \cdot \left[\sum_{j=1}^{m} v_{kj}^{(2)} + \sum_{j=1}^{m}\sum_{i=1}^{l} v_{ji}^{(1)} + \sum_{i=1}^{m} d_i^{(1)} + d_k^{(2)}\right] \tag{8}
$$

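Subject to:

$$
f^{(2)}\!\left(b_k^{(2)} + \sum_{j=1}^{m} w_{kj}^{(2)} \cdot f^{(1)}\!\left(b_j^{(1)} + \sum_{i=1}^{l} w_{ij}^{(1)} \cdot x_i\right)\right)
+ (1 - H) \cdot f^{(2)}\!\left(d_k^{(2)} + \sum_{j=1}^{m} v_{kj}^{(2)} \cdot f^{(1)}\!\left(d_j^{(1)} + \sum_{i=1}^{l} v_{ij}^{(1)} \cdot \lvert x_i\rvert\right)\right) \ge Z_i \tag{9}
$$

$$
-f^{(2)}\!\left(b_k^{(2)} + \sum_{j=1}^{m} w_{kj}^{(2)} \cdot f^{(1)}\!\left(b_j^{(1)} + \sum_{i=1}^{l} w_{ij}^{(1)} \cdot x_i\right)\right)
+ (1 - H) \cdot f^{(2)}\!\left(k \cdot d_k^{(2)} + k \cdot \sum_{j=1}^{m} v_{kj}^{(2)} \cdot f^{(1)}\!\left(k \cdot d_j^{(1)} + k \cdot \sum_{i=1}^{l} v_{ij}^{(1)} \cdot \lvert x_i\rvert\right)\right) \ge -Z_i \tag{10}
$$

$$
\forall i = 1, 2, \ldots, N; \qquad F, H, v, d > 0; \qquad k \ge 1
$$

Fig. 2 Fitting degree of $\tilde{Z}_i^{*}$ for observed fuzzy $\tilde{Z}_i$ with H-level factor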
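The following sketch shows one way to evaluate the MLP of Eq. 7 for a single output node and to check the constraints of Eqs. 9 and 10 for one sample. It is an illustrative reading of the equations, not the authors' MATLAB implementation: the storage layout (`w1[j][i]` for the weight from input i to hidden node j), the sigmoid hidden activation with a linear output, and all function names are assumptions.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def mlp_output(x, w1, b1, w2, b2, f1=sigmoid, f2=lambda a: a):
    """Eq. 7 for one output: f2(b2 + sum_j w2[j] * f1(b1[j] + sum_i w1[j][i] * x[i]))."""
    hidden = [f1(b1[j] + sum(w1[j][i] * x[i] for i in range(len(x))))
              for j in range(len(b1))]
    return f2(b2 + sum(w2[j] * hidden[j] for j in range(len(hidden))))

def satisfies_constraints(x, z, center, spread, H, k):
    """Check Eqs. 9 and 10 for one sample (x, z).

    center = (w1, b1, w2, b2) holds the centers of the fuzzy parameters;
    spread = (v1, d1, v2, d2) holds their spreads.
    """
    abs_x = [abs(xi) for xi in x]
    c = mlp_output(x, *center)
    v1, d1, v2, d2 = spread
    upper = mlp_output(abs_x, v1, d1, v2, d2)
    # Eq. 10 multiplies every spread parameter by the skewness factor k
    lower = mlp_output(abs_x,
                       [[k * v for v in row] for row in v1],
                       [k * d for d in d1],
                       [k * v for v in v2],
                       k * d2)
    return (c + (1 - H) * upper >= z) and (-c + (1 - H) * lower >= -z)
```

In a training loop, a candidate parameter set would be accepted only if `satisfies_constraints` holds for every training sample, or the violations would be added to the objective as a penalty; the paper does not state which option was used.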

Fig. 3 Location of monitoring wells in Karaj aquifer (a) and a scatter plot of NO3− concentration in 175 monitoring wells during the years 2000 to 2005 (b). Values in panel a indicate the well numbers

Fig. 4 The percentages of monitoring wells showing nitrate concentration less than the drinking water quality standards

Fig. 5 Nitrate concentration versus groundwater level during 2000–2005 for well #47


Coupling the fuzzy SVR model with genetic algorithm

In the training process of an ordinary ANN for a predetermined network, the weights and biases need to be optimized. To prevent the solutions from being trapped in local optima, a meta-heuristic optimization algorithm can be combined with the ANN (Shevade et al. 2000; Schölkopf and Smola 2002; Wu and Chau 2006; Kazemi and Hosseini 2011; Morshed and Kaluarachchi 2012). The genetic algorithm (GA), a meta-heuristic global optimization technique, imitates the natural selection of chromosomes with better fitness to survive in the environment. GA has been applied to many complex optimization problems to obtain optimal or near-optimal solutions (Goldberg 1989; Winston and Venkataramanan 2003). In this paper, a GA with real coding is implemented to determine the optimal parameters of the FNN-SVR model. According to previous works such as Rogers (1996) and Chang and Chen (2004), a GA with real coding has some advantages over a binary-coded GA. The decision variables of the proposed model include the fuzzy parameters of the weights (v, w), the bias terms (b, d), and the parameters F, H, and k. The number of genes in each chromosome varies with the number of neurons in the hidden layer of the MLP model.

Fig. 6 Convergence trend of GA to optimize the FNN-SVR parameters considering 3, 5, and 10 nodes in the hidden layer (GA settings shown in the figure: population size = 20; selection type = stochastic selection; elite count = 2; crossover type (rate) = scattered (0.8); mutation type (rate) = uniform (0.01))

Fig. 7 Observed values of nitrate concentrations versus simulated values by FNN-SVR model considering (a) 3, (b) 5, and (c) 10 nodes in the hidden layer
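A real-coded GA of the kind described above can be sketched as follows. The defaults mirror the settings reported in the paper (population 20, elite count 2, scattered crossover at 0.8, uniform mutation at 0.01), but binary tournament selection is used here as a simple stand-in for the stochastic selection operator, and the function name, box bounds, and seed are illustrative assumptions.

```python
import random

def real_coded_ga(fitness, bounds, pop_size=20, elite_count=2,
                  crossover_rate=0.8, mutation_rate=0.01,
                  generations=100, seed=0):
    """Minimize `fitness` over a box; returns (best_vector, best_fitness_history)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        next_pop = [ind[:] for ind in pop[:elite_count]]  # elitism
        while len(next_pop) < pop_size:
            # binary tournament selection (stand-in for stochastic selection)
            p1 = min(rng.sample(pop, 2), key=fitness)
            p2 = min(rng.sample(pop, 2), key=fitness)
            if rng.random() < crossover_rate:
                # scattered crossover: each gene comes from a randomly chosen parent
                child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            else:
                child = p1[:]
            # uniform mutation: redraw a gene uniformly within its bounds
            child = [rng.uniform(lo, hi) if rng.random() < mutation_rate else g
                     for g, (lo, hi) in zip(child, bounds)]
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=fitness), history
```

Because the elite individuals are copied unchanged, the best fitness recorded in `history` never worsens from one generation to the next, which is the behavior a convergence plot such as Fig. 6 displays.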

Problem statement

The performance of the proposed methodology is assessed by applying it to regionalizing nitrate concentration in the Karaj aquifer, Iran. The limited amount of precipitation (annual average of 300 mm), the lack of a permanent river in the area, and rapid population growth in the last decade (1.2 million people in 1995 and 2.0 million people in 2010) have led to overexploitation of the aquifer, which is the only available water resource in the area. During the last decade, a considerable drawdown has been observed in the average groundwater level (0.8 m/year). Also, intensive use of fertilizers by the agricultural sector, discharge of wastewater through septic wells, and leakage from landfills have resulted in a drastic deviation of nitrate contamination from the standards. In Fig. 3, the locations of the monitoring wells as well as a scatter plot of nitrate concentration in 175 wells during 6 years (2000 to 2005) are depicted. The nitrate concentration data in the monitoring wells have been gathered by the Water and Wastewater Company of the Tehran Province.

Figure 4 illustrates the percentage of monitoring wells in which nitrate concentration meets the standard of 10 ppm. Nitrate in drinking water beyond the allowed level can be a potential cause of blue baby syndrome in infants and of the development of cancer (Winneberger 1982; Ward et al. 2005). This fact shows the necessity of having a reliable and accurate monitoring system. In this paper, the performance of the proposed FNN-SVR in providing the spatial distribution of nitrate concentration over the aquifer is evaluated.

In the considered aquifer, a good negative correlation is observed between the groundwater level in the monitoring wells and the corresponding nitrate concentrations (−0.63 on average). As an example, Fig. 5 shows the dependency of nitrate concentration on groundwater level during 2000–2005 for one of the wells.

Fig. 8 Fuzzified results of nitrate concentration in monitoring wells simulated by the FNN-SVR model with five nodes in the hidden layer

Table 1 Evaluation criteria of FNN-SVR outputs with different kernel functions

Kernel function          No. of parameters   SE     CE
MLP (with five nodes)    55                  0.31   0.70
Linear                   8                   1.26   −0.02
Quadratic                14                  0.59   −0.04
Polynomial (order 3)     22                  0.62   −0.14
Polynomial (order 5)     38                  0.76   −0.64
Gaussian                 8                   0.41   0.54


Results and discussion

The inputs of the FNN-SVR model are the locations of the monitoring wells (x and y) as well as the groundwater level in the corresponding wells (Fig. 3). All input data of the FNN-SVR are standardized so that they range between zero and one. The leave-one-out cross validation technique (LOOCV) is applied for both the training and testing phases. In the LOOCV method, the existing data are divided into a training set of N−1 data and a testing set containing the single remaining datum. This procedure is repeated N = 175 times (N is the number of data), providing N candidate forecasts that cover the variability in sampling and parameter estimation with the fixed set of potential predictors (Cawley and Talbot 2004).
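The standardization and leave-one-out procedure described above can be sketched generically as below. The `fit` and `predict` callables are placeholders for any regression model (they are not functions from the paper), and min-max scaling is assumed as the way inputs are mapped to the range zero to one.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / (hi - lo) for v in values]

def leave_one_out(X, Z, fit, predict):
    """Return one held-out prediction per sample (N separate model fits)."""
    preds = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]   # all samples except the ith
        Z_train = Z[:i] + Z[i + 1:]
        model = fit(X_train, Z_train)
        preds.append(predict(model, X[i]))
    return preds
```

For example, with a trivial "mean of the training targets" model, `leave_one_out([[0], [1], [2]], [1.0, 2.0, 3.0], lambda X, Z: sum(Z) / len(Z), lambda m, x: m)` gives one prediction per held-out well.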

The FNN-SVR model has been programmed in the MATLAB environment (Release #14). Two evaluation criteria have been considered to compute the goodness of fit of the model:

– Coefficient of efficiency (CE)

$$
CE = 1.0 - \frac{\sum_{i=1}^{N}\left(C_i^{obs} - C_i^{sim}\right)^{2}}{\sum_{i=1}^{N}\left(C_i^{obs} - \overline{C}^{obs}\right)^{2}} \tag{11}
$$

– Standard error (SE)

$$
SE = \frac{\sqrt{\dfrac{1}{N}\sum_{i=1}^{N}\left(C_i^{obs} - C_i^{sim}\right)^{2}}}{\overline{C}^{obs}} \tag{12}
$$

where $C_i^{obs}$ and $C_i^{sim}$ represent the ith observed and calculated values, respectively; $\overline{C}^{obs}$ represents the mean of the observed values; and N is the number of observed data.
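The two criteria in Eqs. 11 and 12 translate directly into code. The sketch below uses plain Python lists; the function names are illustrative.

```python
import math

def coefficient_of_efficiency(obs, sim):
    """Eq. 11: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def standard_error(obs, sim):
    """Eq. 12: root mean squared error divided by the mean observed value."""
    mean_obs = sum(obs) / len(obs)
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))
    return rmse / mean_obs
```

A perfect simulation gives CE = 1 and SE = 0, while a model that always predicts the observed mean gives CE = 0; negative CE values, such as those reported for some kernels in Table 1, indicate a fit worse than the observed mean.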

In the FNN-SVR model, the effect of the number of nodes in the hidden layer on the model output is evaluated. For ease of comparison, 3, 5, and 10 nodes are investigated here, for which the number of parameters to be optimized is 35, 55, and 105, respectively. Stochastic selection, uniform mutation, and scattered crossover (Holland 1992) are used in this paper. The optimum values for crossover probability, mutation probability, elite count, and population size are obtained through trial and error as 0.8, 0.01, 2, and 20, respectively. As an example, Fig. 6 illustrates the convergence trend of the developed GA considering 3, 5, and 10 nodes in the hidden layer. According to this figure, the number of generations needed for the GA to converge increases with the number of nodes in the hidden layer (i.e., 45, 75, and 100 generations for 3, 5, and 10 nodes, respectively). Nitrate concentration in the aquifer is simulated using the optimized parameters corresponding to 3, 5, and 10 nodes in the hidden layer. The simulation results are depicted in Fig. 7.

Fig. 9 Histograms of original (a) and transformed (b) nitrate concentrations (horizontal axis: nitrate conc. (mg/l); vertical axis: frequency). Panel annotations: (a) α = 0.05, AD = 0.962, P value = 0.015, status = non-significance; (b) α = 0.05, AD = 0.660, P value = 0.082, status = significance

Fig. 10 Histogram of residuals of the fitted polynomial (Eq. 13) (horizontal axis: residuals; vertical axis: frequency). Panel annotation: α = 0.05, AD = 0.336, P value = 0.496, status = significance
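The parameter counts quoted for 3, 5, and 10 hidden nodes (35, 55, and 105) can be reproduced by a simple tally. The decomposition below is an inference rather than something the paper spells out: it assumes three inputs (x, y, and groundwater level), one output, a center and a spread for every weight and bias, and the three extra decision variables F, H, and k.

```python
def n_decision_variables(m, l=3, n_outputs=1, n_extra=3):
    """Count GA decision variables for an MLP with l inputs and m hidden nodes.

    Assumed breakdown: each weight and bias has a center and a spread
    (factor 2), plus n_extra scalar parameters (F, H, k).
    """
    weights = l * m + m * n_outputs   # input-to-hidden and hidden-to-output
    biases = m + n_outputs            # hidden-layer and output-layer biases
    return 2 * (weights + biases) + n_extra

for m in (3, 5, 10):
    print(m, n_decision_variables(m))
```

Under these assumptions the count is 10m + 5, so each additional hidden node adds ten genes to the chromosome.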

It can be inferred that better results are obtained by using five nodes in the hidden layer. The optimized values of the regularization parameter (F), the degree of fitting of the fuzzy model (H), and the skewness factor of the fuzzy parameters (k) for five nodes in the hidden layer are 10.20, 0.476, and 0.13, respectively. Figure 8 depicts the simulated fuzzy values of the nitrate concentration using the selected FNN-SVR model. According to this figure, the simulated upper and lower fuzzy outputs almost cover the observed nitrate concentrations. These simulated fuzzy values can be defuzzified using the centroid method.
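For a triangular fuzzy number, the centroid method reduces to averaging the three vertices. The sketch below assumes the fuzzy output is described by its center and separate left and right spreads; how the skewness factor k maps onto those two spreads is left to the caller, since this section does not restate it.

```python
def centroid_defuzzify(center, left_spread, right_spread):
    """Centroid of a triangular fuzzy number with vertices
    (center - left_spread, center, center + right_spread)."""
    lower = center - left_spread
    upper = center + right_spread
    return (lower + center + upper) / 3.0
```

For a symmetric triangle the crisp value equals the center, while unequal spreads shift it toward the longer side by one third of the difference between the spreads.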

In this paper, the performance of the FNN-SVR model is also evaluated using the traditional linear, quadratic, Gaussian, and polynomial (orders 3 and 5) kernel functions. The number of FNN-SVR parameters that need to be optimized for each kernel function is given in Table 1. The goodness-of-fit criteria (SE and CE) of the FNN-SVR model with the different kernel functions are also shown in Table 1. As seen in this table, the FNN-SVR model with the MLP as the kernel function performs best among the kernel functions considered.

Table 2 Variogram parameters and the goodness-of-fit criteria for nitrate, groundwater level, and cross variable

Variogram parameters: C0, A, C0+C1. Goodness-of-fit criteria: R², RMSE, C1/(C0+C1).

Variable                       Variogram type   C0 (mg/l)²   A (m)      C0+C1 (mg/l)²   R²      RMSE         C1/(C0+C1)
Nitrate                        Gaussian         0.0292       12332.21   0.1450          0.967   1.80×10⁻⁵    0.800
                               Spherical        0.0148       15610.00   0.1496          0.958   2.05×10⁻⁵    0.901
                               Exponential      0.0128       63300.00   0.3196          0.956   2.08×10⁻⁵    0.960
                               Linear           0.0162       8768.97    0.1206          0.966   1.82×10⁻⁵    0.865
Groundwater level              Gaussian         0.000        7430.00    0.0004          0.997   2.00×10⁻¹¹   0.997
                               Spherical        0.000        10950.00   0.0004          0.976   2.41×10⁻¹⁰   0.997
                               Exponential      0.000        42690.00   0.0008          0.966   2.72×10⁻¹⁰   0.999
                               Linear           0.000        1.728      4.64×10⁻⁸       0.943   2.45×10⁻⁸    1.00
Cross variable of nitrate      Gaussian         0.000        36546.27   0.020           0.933   1.37×10⁻⁷    1.00
and groundwater level          Spherical        0.000        21100.00   0.0039          0.792   2.45×10⁻⁷    0.977
                               Exponential      0.000        63300.00   0.0065          0.769   2.46×10⁻⁷    0.998
                               Linear           0.000        4.422      3.77×10⁻⁷       0.809   1.61×10⁻⁶    −25.51

Fig. 11 Scatter plots of simulated nitrate concentrations obtained using Kriging (a) and co-Kriging (b) models versus corresponding observed values


To investigate the effectiveness of the proposed methodology, its results are compared with those obtained from geostatistical approaches such as Kriging and co-Kriging, which are well-known tools for handling the spatial distribution of hydrogeological data (Ma et al. 1999; Szidarovszky et al. 2007; Kholghi and Hosseini 2009). More detailed discussions of these techniques are given by Journel and Huijbregts (1978) and Isaaks and Srivastava (1989). To apply the Kriging and co-Kriging techniques, a normality test is conducted on the observed data (nitrate concentrations and water levels in monitoring wells). The Anderson–Darling goodness-of-fit test, used here as a test of normality, shows that for the original data the P value falls below the significance level of 5 % (AD = 0.962 and P value = 0.015). Therefore, the null hypothesis of a normal distribution is rejected, and the original data need to be transformed (see Fig. 9). After a logarithmic transformation, the data are approximately normal (AD = 0.660 and P value = 0.082).

To satisfy the second-order stationarity condition ofnitrate data over the space, a second degree polynomialfunction is fitted to the original data as follows:

$$
C = -0.197\,x^{2} + 0.178\,y^{2} + 0.417\,x\,y - 1.671\,x - 0.106\,y + 3.275 \tag{13}
$$

where C is the nitrate concentration vector and x and y are the longitude and latitude of the sampling points, respectively. The Anderson–Darling test with a significance level of 5 % supports the normality of the regression residuals (AD = 0.336 and P value = 0.496), as shown in Fig. 10. After the normality and second-order stationarity tests of the data, variography analysis is conducted.

The empirical variogram, which is a function of the distance lag, can be approximated by theoretical models. In this study, exponential, spherical, Gaussian, and linear variograms are used, which are mathematically expressed as follows:

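– Exponential model

$$
\gamma(h; C_0, C_1, a) = C_0 + C_1\left[1 - e^{-h/a}\right] \tag{14}
$$

– Spherical model

$$
\gamma(h; C_0, C_1, a) =
\begin{cases}
C_0 + C_1\left(1.5 \cdot \dfrac{h}{a} - 0.5 \cdot \left(\dfrac{h}{a}\right)^{3}\right), & h \le a\\[2ex]
C_0 + C_1, & h > a
\end{cases} \tag{15}
$$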

Table 3 The values of goodness-of-fit criteria for simulating nitrate concentration using geostatistical techniques

Model        SE     CE
Kriging      0.39   0.57
Co-Kriging   0.42   0.50

Fig. 12 Box plots of observed and simulated values of nitrate concentration by different models (vertical axis: nitrate conc. (mg/l); categories: Observed, FNN-SVR (MLP5), Kriging, Co-Kriging, Linear Poly., Quadratic Poly., Poly (Order 3), Poly (Order 5), Gaussian; markers: max, upper quintile, lower quintile, mean, median, min)


– Gaussian model

$$
\gamma(h; C_0, C_1, a) = C_0 + C_1\left[1 - e^{-h^{2}/a^{2}}\right] \tag{16}
$$

– Linear model

$$
\gamma(h; C_0, C_1, a) = C_0 + h\left(\frac{C_1}{a}\right) \tag{17}
$$

where C0 is the nugget variance, C1 represents the structural variance, and a is the range parameter. The term C0+C1 denotes the sill parameter of the variogram. The mathematical models of Eqs. 14 to 17 are fitted to the empirical variograms obtained from the original data sets of nitrate, groundwater level (GWL), and the cross variable of nitrate and GWL.

Three evaluation criteria, namely the coefficient of determination (R²), the root mean square error (RMSE), and the ratio of the spatial structure variance (C1) to the sill (C0+C1), are considered to determine the goodness of fit of the fitted variogram to the observed values (Table 2). The ratio C1/(C0+C1) equals 1.0 for a variogram without a nugget effect. The results given in Table 2 show that the Gaussian variogram is the best-fitted theoretical model for the empirical variograms of nitrate concentration and groundwater level and for the cross variogram of nitrate concentration and groundwater level data. The simulated values of nitrate concentration are obtained using the Kriging and co-Kriging methods based on the leave-one-out cross validation technique. The scatter plots of the results are illustrated in Fig. 11.
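The four theoretical variograms (Eqs. 14 to 17) and the structure-to-sill ratio used as a goodness-of-fit criterion can be written compactly as below. The function names are illustrative; the check at the end uses the nitrate Gaussian parameters reported in Table 2.

```python
import math

def exponential(h, c0, c1, a):
    return c0 + c1 * (1.0 - math.exp(-h / a))               # Eq. 14

def spherical(h, c0, c1, a):
    if h > a:
        return c0 + c1                                       # sill beyond the range
    r = h / a
    return c0 + c1 * (1.5 * r - 0.5 * r ** 3)                # Eq. 15

def gaussian(h, c0, c1, a):
    return c0 + c1 * (1.0 - math.exp(-(h * h) / (a * a)))    # Eq. 16

def linear(h, c0, c1, a):
    return c0 + h * c1 / a                                   # Eq. 17

def structure_ratio(c0, sill):
    """C1 / (C0 + C1), where sill = C0 + C1."""
    return (sill - c0) / sill

# Nitrate, Gaussian model in Table 2: C0 = 0.0292, C0 + C1 = 0.1450
print(round(structure_ratio(0.0292, 0.1450), 3))
```

All four models return the nugget C0 at zero lag, and the spherical model reaches the sill exactly at h = a, which is a quick sanity check when fitting.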

The R² values for the two geostatistic approaches (0.54 and 0.57) are lower than that obtained by the FNN-SVR model (0.71). The values of the SE and CE goodness-of-fit criteria for the Kriging and co-Kriging techniques are also calculated and given in Table 3. This table reveals that the FNN-SVR model outperforms the geostatistic techniques in terms of the SE and CE criteria (see also Table 1).
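The leave-one-out procedure used to obtain the simulated values can be sketched as below. To keep the example short and self-contained, inverse distance weighting stands in for Kriging; it is not the interpolator used in the study. The well coordinates and nitrate values are hypothetical, and the residual sign convention (observed minus simulated) is an assumption.

```python
import math


def idw_predict(train, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y); a stand-in for Kriging."""
    num = 0.0
    den = 0.0
    for xi, yi, zi in train:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return zi  # exact hit on a sampled location
        w = 1.0 / d ** power
        num += w * zi
        den += w
    return num / den


def leave_one_out(samples, predict):
    """Predict each sample from all the others, one at a time."""
    predictions = []
    for i, (x, y, _) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # drop the i-th sample
        predictions.append(predict(train, x, y))
    return predictions


# Hypothetical wells: (x, y, nitrate concentration in mg/l).
wells = [(0.0, 0.0, 20.0), (1.0, 0.0, 35.0), (0.0, 1.0, 30.0),
         (1.0, 1.0, 50.0), (0.5, 0.5, 33.0)]

simulated = leave_one_out(wells, idw_predict)
residuals = [z - s for (_, _, z), s in zip(wells, simulated)]
for (_, _, z), s in zip(wells, simulated):
    print(f"observed {z:6.2f}  simulated {s:6.2f}")
```

Because every prediction is made without the sample it is compared against, the resulting residuals measure interpolation error at unsampled locations rather than fit to the training data.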

The FNN-SVR model and the benchmark models (Kriging, co-Kriging, and SVR with different kernel functions) are compared in terms of how well they preserve the main statistical characteristics of the original data, such as the minimum, maximum, mean, median, and upper and lower quintiles, using the box plots shown in Fig. 12. The results show that the FNN-SVR model preserves the main statistics of the original data better than the other models. However, other models (e.g., Kriging and co-Kriging) provide acceptable results in terms of some statistics.

Fig. 13 Probability density function of residuals obtained using different models. Horizontal axis: residuals (mg/l), −135 to 135; vertical axis: probability density function, 0.000 to 0.025. Curves: FNN-SVR (MLP5), Kriging, Co-Kriging, Linear Poly., Quadratic Poly., Poly (Order 3), Poly (Order 5), Gaussian.

Table 4 Mean and standard deviation of residuals of different models

Model Mean Standard deviation

FNN-SVR (MLP5) 1.68 16.84

Kriging 3.74 17.77

Co-Kriging −14.24 26.10

SVR (linear) 16.29 26.77

SVR (quadratic) −30.92 33.17

SVR (polynomial order 3) −40.60 35.77

SVR (polynomial order 5) 16.49 28.54

SVR (Gaussian) 10.01 21.48

The probability density function (PDF) of the model residuals and the corresponding mean and standard deviation are computed and shown in Fig. 13 and Table 4, respectively. Ideally, the mean and standard deviation of the residuals should both be equal to zero. The results show that the PDF of the FNN-SVR residuals is symmetrical and has the smallest absolute mean (1.68) and standard deviation (16.84) among the models.
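The comparison of residual statistics can be reproduced directly from the values reported in Table 4. In this sketch the helper `residual_summary` is hypothetical, and it assumes residuals are observed minus simulated and that the sample standard deviation is used; the text specifies neither. The dictionary holds the Table 4 values unchanged.

```python
import statistics


def residual_summary(observed, simulated):
    """Mean and sample standard deviation of observed minus simulated."""
    residuals = [o - s for o, s in zip(observed, simulated)]
    return statistics.mean(residuals), statistics.stdev(residuals)


# (mean, standard deviation) of residuals, as reported in Table 4.
table4 = {
    "FNN-SVR (MLP5)": (1.68, 16.84),
    "Kriging": (3.74, 17.77),
    "Co-Kriging": (-14.24, 26.10),
    "SVR (linear)": (16.29, 26.77),
    "SVR (quadratic)": (-30.92, 33.17),
    "SVR (polynomial order 3)": (-40.60, 35.77),
    "SVR (polynomial order 5)": (16.49, 28.54),
    "SVR (Gaussian)": (10.01, 21.48),
}

# The ideal model has residual mean and standard deviation of zero,
# so rank by absolute mean (bias) and by standard deviation (spread).
least_biased = min(table4, key=lambda m: abs(table4[m][0]))
least_spread = min(table4, key=lambda m: table4[m][1])
print("smallest |mean|:", least_biased)
print("smallest std   :", least_spread)
```

Both rankings select FNN-SVR (MLP5), with Kriging second on both criteria.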

Summary and conclusion

In this paper, a new model named FNN-SVR was developed and verified for regionalizing nitrate concentration in groundwater systems. In this model, an ANN was applied as the kernel function of the SVR model. To evaluate the efficiency of the FNN-SVR model, the obtained results were compared with those of some well-known models, such as conventional SVR with different kernel functions and Kriging-based methods. The results showed that the proposed model can outperform these models in the regionalization of nitrate concentration in groundwater systems, owing to the capability of the ANN to learn complex and nonlinear relations and the good generalization performance of the SVR. In addition, the application of fuzzy H-level sets can improve the reliability of predictions. Although the FNN-SVR model outperforms the Kriging method for regionalizing the nitrate concentration in groundwater, Kriging can provide the standard error of estimations, which can be used in developing optimal groundwater quality and quantity monitoring networks. This is a special advantage of Kriging compared to the FNN-SVR.

The FNN-SVR can provide the required data for groundwater quality zoning and groundwater vulnerability assessment. The results of this model can also be used as main inputs of entropy-based techniques for the assessment and redesign of groundwater systems.

References

Alvisi, S., & Franchini, M. (2011). Fuzzy neural networks for water level and discharge forecasting with uncertainty. Environmental Modelling & Software, 26, 523–537.

Aqil, M., Kita, I., Yano, A., & Nishiyama, S. (2007). A comparative study of artificial neural networks and neuro-fuzzy in continuous modeling of the daily and hourly behaviour of runoff. Journal of Hydrology, 337, 22–34.

Asefa, T., & Kemblowski, M. W. (2002). Support vector machines approximation of flow and transport models in initial groundwater contamination network design. EOS. Transactions of the American Geophysical Union, Fall Meeting 2002, abstract #H72D-0882.

Asefa, T., Kemblowski, M. W., Urroz, G., McKee, M., & Khalil, A. (2004). Support vectors-based groundwater head observation networks design. Water Resources Research. doi:10.1029/2004WR003304.

Asefa, T., Kemblowski, M. W., Lall, U., & Urroz, G. (2005). Support vector machines for nonlinear state space reconstruction: application to the Great Salt Lake time series. Water Resources Research. doi:10.1029/2004WR003785.

Asefa, T., Kemblowski, M., McKee, M., & Khalil, A. (2006). Multi-time scale stream flow predictions: the support vector machines approach. Journal of Hydrology, 318, 7–16.

Babovic, V., Keijzer, M., & Bundzel, M. (2000). From global to local modelling: a case study in error correction of deterministic models, hydroinformatics. USA: Iowa Institute of Hydraulic Research, Iowa.

Behzad, M., Asghari, K., & Coppola, E. A. (2010). Comparative study of SVMs and ANNs in aquifer water level prediction. Journal of Computing in Civil Engineering ASCE, 24, 408–413.

Cawley, G. C., & Talbot, N. L. C. (2004). Fast exact leave-one-out cross-validation of sparse least-squares support vector machines. Neural Networks, 17(10), 1467–1475.

Chang, F. J., & Chen, L. (2004). Real-coded genetic algorithm for rule-based flood control reservoir management. Water Resources Management, 12(3), 185–198.

Cherkassky, V., & Yunqian, M. Y. (2002). Practical selection of SVM parameters and noise estimation for SVM regression. Neural Networks, 17, 113–126. doi:10.1016/S0893-6080(03)00169-2.

Chuang, C. C. (2008). Extended support vector interval regression networks for interval input–output data. Information Sciences, 178, 871–891.

Dahiy, S., Singh, B., Gaur, S., Garg, V. K., & Kushwaha, H. S. (2007). Analysis of groundwater quality using fuzzy synthetic evaluation. Journal of Hazardous Materials, 147, 938–946.

Dibike, Y. B. (2000). Machine learning paradigms for rainfall-runoff modeling, hydroinformatics. USA: Iowa Institute of Hydraulic Research, Iowa.

Dibike, Y. B., Velickov, S., Solomatine, D., & Abbott, M. B. (2001). Model induction with support vector machines: introduction and applications. Journal of Computing in Civil Engineering, 15(3), 208–216.

Gill, M. K., Asefa, T., Kemblowski, M. W., & McKee, M. (2006). Soil moisture prediction using support vector machines. Journal of the American Water Resources Association, 42(4), 1033–1046.

Gill, M. K., Asefa, T., Kaheil, Y., & McKee, M. (2007). Effect of missing data on performance of learning algorithms for hydrologic predictions: implications to an imputation technique. Water Resources Research. doi:10.1029/2006WR005298.

Goldberg, D. E. (1989). A comparative analysis of selection scheme used in genetic algorithms. Foundations of genetic algorithms (pp. 69–93). San Mateo: Morgan Kaufmann.

Han, D., Chan, L., & Zhu, N. (2007). Flood forecasting using support vector machines. Journal of Hydroinformatics, 9(4), 267–276.



Hao, P. Y., & Chiang, J. H. (2007). A fuzzy model of support vector regression machine. International Journal of Fuzzy Systems, 9(1), 45–50.

Hao, P. Y., & Chiang, J. H. (2008). Fuzzy regression analysis by support vector learning approach. IEEE Transactions on Fuzzy Systems, 16(2), 428–441.

Holland, J. H. (1992). Genetic algorithms. Scientific American, 267(1), 66–72.

Hong, D. H., & Hwang, C. (2003a). Support vector fuzzy regression machines. Fuzzy Sets and Systems, 138(2), 271–281.

Hong, D. H., & Hwang, C. (2003b). Support vector fuzzy regression machines. Fuzzy Sets and Systems, 138, 271–281.

Hua, Z. S., & Zhang, B. (2006). A hybrid support vector machines and logistic regression approach for forecasting intermittent demand of spare parts. Applied Mathematics and Computation, 181, 1035–1048.

Isaaks, E. H., & Srivastava, R. M. (1989). An introduction to applied geostatistics (p. 561). Oxford: Oxford University Press.

Jang, J. S. R., Sun, C. T., & Mizutani, E. (1997). Neuro-fuzzy and soft computing, a computational approach to learning and machine intelligence. USA, NJ: Prentice Hall.

Jeng, J. T., & Lee, T. T. (1999). Support vector machines for the fuzzy neural networks. In: Proc. IEEE International Conference on System, Man and Cybernetics (IEEE SMC’99), 115–120.

Jeng, J. T., Chuang, C. C., & Su, S. F. (2003). Support vector interval regression networks for interval regression analysis. Fuzzy Sets and Systems, 138, 283–300.

Jiang, B. T., & Zhao, F. Y. (2013). Combination of support vector regression and artificial neural networks for prediction of critical heat flux. International Journal of Heat and Mass Transfer, 62, 481–494.

Journel, A. G., & Huijbregts, C. J. (1978). Mining Geostatistics. San Francisco: New York Academic Press.

Juang, C. F., & Hsieh, C. D. (2009). TS-fuzzy system-based support vector regression. Fuzzy Sets and Systems, 160, 2486–2504.

Kacprzyk, J., & Fedrizzi, M. (1992). Fuzzy regression analysis. Heidelberg: Physica-Verlag.

Kalteh, A. M. (2013). Monthly river flow forecasting using artificial neural network and support vector regression models coupled with wavelet transform. Computers & Geosciences, 54, 1–8.

Karatzoglou, A., Meyer, D., & Hornik, K. (2006). Support vector machine in R. Journal of Statistical Software, 15(9), 1–28.

Kazemi, S. M., & Hosseini, S. M. (2011). Comparison of spatial interpolation methods for estimating heavy metals in sediments of Caspian Sea. Expert Systems with Applications, 38, 1632–1649.

Kecman, V. (2001). Learning and soft computing: support vector machines, neural networks and fuzzy logic models. Cambridge, MA: The MIT Press.

Khan, M. S., & Coulibaly, P. (2006). Application of support vector machine in lake water level prediction. Journal of Hydrologic Engineering, 11, 199–205.

Kholghi, M., & Hosseini, S. M. (2009). Comparison of groundwater level estimation using neuro-fuzzy and ordinary Kriging. Environmental Modeling and Assessment. doi:10.1007/s10666-008-9174-2.

Krishna, B., Satyaji, R. Y. R., & Vijaya, T. (2008). Modelling groundwater levels in an urban coastal aquifer using artificial neural networks. Hydrological Processes, 22, 1180–1188.

Krzysztofowicz, R. (1999). Bayesian theory of probabilistic forecasting via deterministic hydrologic model. Water Resources Research, 35(9), 2739–2750.

Kumar, V., & Schuhmacher, M. (2005). Fuzzy uncertainty analysis in system modeling. European Symposium on Computer Aided Process Engineering, 391–396.

Kumar, M., & Yadav, N. (2011). Multilayer perceptrons and radial basis function neural network methods for the solution of differential equations: a survey. Computers and Mathematics with Applications, 62, 3796–3811.

Lin, K. P., & Pai, P. F. (2010). A fuzzy support vector regression model for business cycle predictions. Expert Systems with Applications, 37, 5430–5435.

Liong, S. Y., & Sivapragasam, C. (2002). Flood stage forecasting with support vector machines. Journal of the American Water Resources Association, 38(1), 173–196.

Ma, T. S., Sophocleous, M., & Yu, Y. S. (1999). Geostatistical applications in ground-water modeling in south–central Kansas. Journal of Hydrologic Engineering, 4(1), 57–64.

Maier, H. R., & Dandy, G. C. (2000). Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications. Environmental Modeling and Software, 101–124.

Morshed, J., & Kaluarachchi, J. J. (2012). Parameter estimation using artificial neural network and genetic algorithm for free-product migration and recovery. Water Resources Research, 34(5), 1101–1113.

Nikoo, M. R., & Mahjouri, N. (2013). Water quality zoning using probabilistic support vector machines: two case studies. Water Resources Management, 27, 2577–2594.

Ovaska, S. J. (2005). Computationally intelligent hybrid systems. Piscataway, New Jersey: Wiley-IEEE Press.

Rogers, D. (1996). Some theory and examples of genetic function approximation with comparison to evolutionary techniques. In J. Devillers (Ed.), Genetic algorithms in molecular modeling (pp. 87–107). London: Academic Press.

Schölkopf, B., & Smola, A. J. (2002). Learning with kernels: support vector machines, regularization, optimization, and beyond (p. 626). Cambridge: MIT Press.

Shevade, S. K., Keerthi, S. S., Bhattacharyya, C., & Murthy, K. R. K. (2000). Improvements to the SMO algorithm for SVM regression. IEEE Transactions on Neural Networks, 11, 1188–1193.

Shrestha, D. L., & Solomatine, D. P. (2006). Experiments with AdaBoost.RT, an improved boosting scheme for regression. Neural Computation, 18, 1678–1710.

Sivapragasam, C., Liong, S. Y., & Pasha, M. F. K. (2001). Rainfall and runoff forecasting with SSA–SVM approach. Journal of Hydroinformatics, 3(3), 141–152.

Szidarovszky, F., Coppola, E. A., Long, J., Hall, A. D., & Poulton, M. M. (2007). A hybrid artificial neural network-numerical model for ground water problems. Ground Water, 45(5), 590–600.

Tanaka, H., Uejima, S., & Asai, K. (1982). Linear regression analysis with fuzzy model. IEEE Transactions on Systems Man and Cybernetics, 12(6), 903–907.

Task Committee, A. S. C. E. (2000). Artificial neural networks in hydrology. II: hydrologic applications. Journal of Hydrologic Engineering, ASCE, 5(2), 124–137.



Tutmez, B., Hatipoglu, Z., & Kaymak, U. (2006). Modelling electrical conductivity of groundwater using an adaptive neurofuzzy inference system. Computers & Geosciences, 32, 421–433.

Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer-Verlag.

Vapnik, V. (1999). An overview of statistical learning theory. IEEE Transactions on Neural Networks, 10(5), 988–999.

Varma, M., & Babu, B. R. (2009). More generality in efficient multiple kernel learning. In proceedings of the 26th annual international conference on machine learning (ICML) (pp. 1065–1072). New York: ACM.

Ward, M. H., deKok, T. M., Levallois, P., Brender, J., Gulis, G., Nolan, B. T., et al. (2005). Workgroup report: drinking-water nitrate and health—recent findings and research needs. Environmental Health Perspectives, 113(11), 1607–1614.

Winneberger, J. H. (1982). Nitrogen, public health and the environment. Ann Arbor, Michigan: Ann Arbor Science Publishers, Inc.

Winston, W. L., & Venkataramanan, M. (2003). Introduction to Mathematical Programming. Pacific Grove: Brooks/Cole.

Wu, Q. (2011). The complex fuzzy system forecasting model based on triangular fuzzy robust wavelet m-support vector machine. Expert Systems with Applications, 38, 14478–14489.

Wu, C. L., & Chau, K. W. (2006). A flood forecasting neural network model with genetic algorithm. International Journal of Environment and Pollution, 28(3/4), 261–272.

Yao, C. C., & Yu, P. T. (2006). Fuzzy regression based on asymmetric support vector machines. Applied Mathematics and Computation, 182, 175–193.

Yen, K. K., Goshray, S., & Roig, G. (1999). A linear regression model using triangular fuzzy number coefficients. Fuzzy Sets and Systems, 106, 166–177.

Yoon, H., Jun, S. C., Hyun, Y., Bae, G. O., & Lee, K. K. (2011). A comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer. Journal of Hydrology, 396, 128–138.

Yu, X. Y., & Liong, S. Y. (2007). Forecasting of hydrology time series with ridge regression in feature space. Journal of Hydrology, 332(3–4), 290–302.

Yu, P. S., Chen, S. T., & Chang, I. F. (2006). Support vector regression for real-time flood stage forecasting. Journal of Hydrology, 328(3–4), 704–716.

Zahraie, B., & Hosseini, S. M. (2009). Development of reservoir operation policies considering variable agricultural water demands. Expert Systems with Applications, 36, 4980–4987.

