Forecasting Black River Flow Using Feedforward Artificial Neural Networks
GRADUATE PROJECT REPORT
Submitted to the Faculty of the Department of Computing Sciences
Texas A&M University-Corpus Christi
Corpus Christi, Texas
In Partial Fulfillment of the Requirements for the Degree of
Master of Science in Computer Science
By
Sarvani Thota
Spring 2018
Committee Members
Dr. Alaa Sheta
Committee Chairperson
Dr. Ajay K. Katangur
Committee Member
ABSTRACT
River flow estimation plays a significant role in countries where agriculture is central to the economy. In this work, Artificial Neural Network (ANN) and Auto Regressive (AR) models are used to solve the river flow forecasting problem for the Black River at Elyria, OH. For these modeling techniques, both single attribute and multiple attribute datasets are considered. The forecasting task is treated as a time series modeling problem, for which artificial neural networks are known to be well suited. A comparison is made between the multi attribute model and the single attribute delay model, where the latter is designed based on model order selection. Simulation results indicate that the neural network model outperforms the auto-regression model for both the multi attribute model and the single attribute delay model. Hence, the neural network model is recommended as a tool for river flow forecasting.
TABLE OF CONTENTS
CHAPTER Page
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
TABLE OF CONTENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Supervised learning . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Unsupervised learning . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Historical Background and Related Works . . . . . . . . . . . . . . . . 4
1.4 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Research Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 Research Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2 MODELING TECHNIQUES . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1 Traditional modeling . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.1 Linear Regression Model (LR) . . . . . . . . . . . . . . . . . . . . 11
2.1.2 Auto Regression Model (AR) . . . . . . . . . . . . . . . . . . . . . 13
2.2 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . 16
2.2.1 Neural Network Components . . . . . . . . . . . . . . . . . . . . . . 16
2.2.1.1 Types of Neural Networks . . . . . . . . . . . . . . . . . . . . . 18
2.2.2 Network Structure . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.2.1 Design of Network Structure . . . . . . . . . . . . . . . . . . . . 20
2.2.3 Backpropagation Learning algorithm . . . . . . . . . . . . . . . . . 20
3 SYSTEM DESIGN AND IMPLEMENTATION . . . . . . . . . . . . . . . . 24
3.1 Data Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Model Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.1 Model Structure Selection . . . . . . . . . . . . . . . . . . . . . . 28
3.3 Evaluation Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4 IMPLEMENTATION AND RESULTS . . . . . . . . . . . . . . . . . . . . . 32
4.1 AR Implementation for Single Attribute delay Model . . . . . . . . . . 32
4.2 AR Implementation for Multiple Attribute Model . . . . . . . . . . . . 33
4.3 ANN Implementation for Single Attribute delay Model . . . . . . . . . . 34
4.4 ANN Implementation for Multiple Attribute Model . . . . . . . . . . . . 35
5 CONCLUSION AND FUTURE WORK . . . . . . . . . . . . . . . . . . . . 41
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
LIST OF TABLES
TABLE Page
1 Related Research work on River Forecasting . . . . . . . . . . . . . . . 4
2 NN Parameters for Single Attribute delay Model . . . . . . . . . . . . . 35
3 NN Parameters for Multiple Attribute Model . . . . . . . . . . . . . . . 37
4 Results for ANN and AR models of the Black Water River for Single Attribute delay Model . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5 Results for ANN and AR models of the Black Water River for Multiple Attribute Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
LIST OF FIGURES
FIGURE Page
2.1 Three-layer perceptron . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.1 Black river at Ohio state . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Proposed Model development procedure . . . . . . . . . . . . . . . . . 25
3.3 Data collected between 2011 and 2013 as inputs . . . . . . . . . . . . . 25
3.4 Discharge rate collected between 2011 and 2013 . . . . . . . . . . . . . 26
3.5 System model based on Multiple Attributes . . . . . . . . . . . . . . . . 28
3.6 System model for Single Attribute delay Model . . . . . . . . . . . . . . 29
4.1 Actual and Predicted flow using AR model for Single Attribute delay Model 33
4.2 Actual and Predicted flow using AR for Multiple Attribute Model . . . . 34
4.3 NN convergence curve over 1000 epochs for Single Attribute delay Model 36
4.4 Neural Network Structure for Multiple Attribute Model . . . . . . . . . . 37
4.5 Actual and Predicted flow using ANN for Single Attribute delay Model . 38
4.6 Actual and Predicted flow using ANN for Multiple Attribute Model . . . 39
4.7 Neural Network convergence over 5 iterations in Multiple Attribute Model 40
CHAPTER 1
INTRODUCTION
Forecasting the flow rate accurately from collected data is a challenging problem for engineers, and it is important for both short-term and long-term decisions. The two broad categories of forecasting techniques [4] are quantitative methods (the objective approach) and qualitative methods (the subjective approach). Quantitative forecasting methods are based on the analysis of historical data, under the assumption that past patterns in the data can be used to forecast future data points. Qualitative forecasting techniques employ the judgment of experts in the specified field to generate forecasts. The approach taken in this work is quantitative. The forecast does not depend on the stream flow rate alone; it considers numerous parameters such as temperature, humidity, and precipitation. Accurate and timely forecasting of river flooding provides ample time for the authorities to take flood protection measures such as evacuation. Because river flow depends on attributes like temperature, rainfall, conductivity, and dissolved oxygen, the problem is considered non-linear in nature, and forecasting such non-linear data is a challenging task, as it is based on time series. Therefore, the data considered in this work is both multi-variable and univariate.
Furthermore, the study of the life cycle of water is known as hydrology. The most basic process in this cycle is evaporation, which eventually returns water to streams in the form of rainfall; excess rainfall, however, leads to floods. The flow rates collected at each station are essential to many activities, such as taking flood protection measures, assessing how much water can be extracted from a river for water supply or irrigation, protecting agricultural land, dam construction, and reservoir operations. Since precise estimation of the flow rate is very important in this cycle, models that deal with meteorological, hydrological, and geological variables should be improved. Controlling water and operating water structures would then be efficiently possible. As a part of solving
such problems, we adopt machine learning techniques [22]. Nowadays machine learning
techniques are widely used in many applications. Machine learning is a field of computer
science that gives computer systems the ability to learn from data, without being explicitly
programmed. They learn from previous computations to produce reliable, repeatable decisions and results. The two most widely adopted machine learning methods are supervised learning and unsupervised learning, though other methods of machine learning exist as well.
1.1 Supervised learning
Supervised learning algorithms are trained on examples, taking inputs and predicting the outputs. For instance, a data point could be labeled either "F" (failures) or "R" (runs). The learning algorithm receives a set of input and output values, and it learns by comparing its actual output with the correct output to discover the error. The algorithm then learns from the examples presented to it and extracts useful information from the raw data. Through strategies like regression, prediction, and classification, supervised learning uses labeled examples to predict the values of unlabeled data. Supervised learning is commonly utilized in cases where the learned model can predict future trends.
1.2 Unsupervised learning
Unsupervised learning is used on data that has no historical labels: the system is not told the "right answer." The objective is to explore the data and discover some structure within it. Unsupervised learning works well on transactional data. Commonly used machine learning algorithms include the Naive Bayes classifier, K-means clustering, linear regression, logistic regression, artificial neural networks, genetic programming, and fuzzy logic.
Therefore, both supervised and unsupervised learning algorithms can be used to pre-
dict the flow rate. Training can be given on the available extensive records of river flow
and other climatic data, which could be used to predict flow. There are many practical situ-
ations where the primary concern is with making accurate predictions at specific locations.
In such cases, it is preferred to implement a simple black box [1] model to identify a direct
mapping between the inputs and outputs. In the class of black box models, machine learning methods such as the Auto Regression (AR) model and the Artificial Neural Network (ANN) are used to predict flow from known meteorological and hydrological time series. In this work, AR and ANN models for flow prediction are implemented, and graphs of the actual and estimated flow rates are plotted for both models. In addition, metrics such as the mean absolute percentage error (MAPE), mean square error (MSE), and variance accounted for (VAF) are calculated for each technique and compared to know
which model performs better in prediction. The main aim of the project is to model time series prediction from historical records, which is necessary to solve the water management problems below.
Forecasting the water flow will solve many problems like:
• Estimate how much water can be stored for future use.

• Analyze how much water can be released to other canals for irrigation, to improve the yield in agriculture and fishing.

• Redirect the overflow of the river to protect people and property around it from destruction.
• Find appropriate places on the river for installing hydroelectric plants for generating electricity.

Table 1: Related Research Work on River Forecasting

Ref   Authors                                Title                                              River
[7]   Y. B. Dibike and D. P. Solomatln       River Flow Forecasting Using Artificial            Apure river basin
                                             Neural Networks
[20]  A. F. Sheta and M. S. El-Sherif        Optimal prediction of the Nile River flow          Nile River
                                             using neural networks
[11]  Ozgur Kisi                             Daily River Flow Forecasting Using Artificial      Black water river,
                                             Neural Networks and Auto-Regressive Models         Gila river
[19]  Linda See and Stan Openshaw            Applying soft computing approaches to river        Ouse River
                                             level forecasting                                  catchment
[1]   Imen Aichouri, Azzedine Hani, Nabil    River flow model using artificial neural           Seybouse basin
      Bougherira, Larbi Djabri, Hicham       networks
      Chaffai and Sami Lallahem
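The evaluation criteria mentioned earlier (MAPE, MSE, and VAF) can be sketched as follows. This is an illustrative Python implementation using the standard formulas for each metric; the report's own experiments are carried out in MATLAB, and the flow values below are hypothetical.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

def mse(actual, predicted):
    """Mean squared error."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean((actual - predicted) ** 2)

def vaf(actual, predicted):
    """Variance accounted for, in percent: 100 * (1 - var(error) / var(actual))."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * (1.0 - np.var(actual - predicted) / np.var(actual))

# Hypothetical flow values, for illustration only.
flow_true = [120.0, 135.0, 150.0, 110.0]
flow_pred = [118.0, 140.0, 148.0, 112.0]
print(mape(flow_true, flow_pred), mse(flow_true, flow_pred), vaf(flow_true, flow_pred))
```

A lower MAPE and MSE, and a VAF closer to 100%, indicate a better-fitting model.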
1.3 Historical Background and Related Works
In the past, many authors proposed various soft computing techniques for prediction. Some of these important works are summarized in Table 1.
In the research "River Flow Forecasting Using Artificial Neural Networks" by Y. B. Dibike and D. P. Solomatln [7], a comparison is made between two types of ANN architectures using data for the Apure river basin: a multi-layer perceptron network (MLP) and a radial basis function network (RBF). Observations reveal that the performance of these networks, when compared with a conceptual rainfall-runoff model, was slightly better for this river flow forecasting problem.
A. F. Sheta and M. S. El-Sherif [20] developed two models for forecasting the Nile River flow in Egypt: a traditional linear autoregressive (AR) model and a feedforward neural network (NN) model. The network with the minimum normalized root mean square error during training and testing was selected as the optimal network for forecasting. The performance of both the AR and NN models was tested using a set of measurements recorded at Dongola station in Egypt, and a significant reduction in error was achieved with the NN model.
Ozgur Kisi [11] makes a comparison between neural network and auto-regressive (AR) models, giving both numerical and graphical comparisons for river flow prediction. Sums of square errors (SSEs) and correlation statistics were used to evaluate the models' performance in this study. Results showed that ANNs were able to produce better results than AR models when given the same data inputs and a similar model structure [17].
Forecasting daily river flows and infilling missing data were investigated using adaptive neuro-fuzzy (ANFIS) and artificial neural network (ANN) techniques by Ozgur Kisi and Ozgur Ozturk. In this work, ANN and ANFIS models are compared for predicting river flow on a daily basis, and three different applications are employed. The first part deals with the estimation of upstream and downstream station flow data separately. The second part focuses on the evaluation of missing downstream station flow data by using the upstream station data. In the third part, the flow data of the downstream station are estimated by using data from both stations.
Linda See and Stan Openshaw [19] evaluate one of many possible improvements to conventional flood prediction that can be achieved using soft computing technologies. A methodology is sketched in which the forecast data set is divided into subsets with a series of neural networks before training. These networks are then recombined using a rule-based fuzzy logic model optimized with a genetic algorithm. The methodology is demonstrated using historical time-series data from the catchment area of the Ouse River in northern England. The model predictions are evaluated using global performance statistics; moreover, specific flood-related measures are assessed and compared with benchmarks from a statistical model and naive forecasts. The overall results show that this methodology is a well-functioning, cost-effective solution that can be easily integrated into existing operational flood forecasting and alert systems.
Imen Aichouri et al. [1] use ANN to model the precipitation-runoff relationship in a catchment area with a semi-arid Mediterranean climate in Algeria. The performance of the developed neural network-based model was compared to several linear regression models using the same observed data. Accordingly, the neural network strategy can be applied to different hydrological frameworks where other models are inappropriate. Artificial neural network models show a good ability to model hydrological processes; they are useful and powerful tools for dealing with complex problems compared to other traditional models. The results show that in semi-arid Mediterranean regions, where rain and runoff are very irregular, a neural network precipitation-runoff model achieves an overall improvement compared with many other hydrological approaches. The ANN approach could be a very useful and accurate tool for solving problems in water resource studies and management.
Based on the studies mentioned earlier, ANN and AR models are successful in predicting daily river flow [3], either through direct comparisons or by creating hybrid neural networks [5]. However, no existing model compares the two models considered here: the multiple attribute model and the single attribute delay model. In this work, we compare the results produced by the two models to determine the best way of predicting the flow. Moreover, the existing models do not focus on time-series prediction, so the proposed model also predicts the flow based on a time series, i.e., on a daily basis. Both graphical and numerical comparisons are produced using MATLAB. As future work, the model can be extended to dynamic problems such as rainfall modeling, temperature forecasting, forecasting ozone concentrations [12], the influence of other gases on surface ozone [21], and determining the content of toxic gases in the air [16].
1.4 Motivation
Water management plays a vital role in daily life. If water is not managed properly, it may lead to problems like water scarcity, flooding, insufficient water for agriculture, and improper dam construction. Floods affect the lives of people living on the banks of a river and reduce the agricultural yield, which in turn affects the country's economy. According to National Geographic USA reports, by 2025 an estimated 1.8 billion people will live in areas plagued by water scarcity, with two-thirds of the world's population living in water-stressed regions [24]. Asia is the most flood-affected region, accounting for nearly 50% of flood-related fatalities in the last quarter of the 20th century. Therefore, it is very important to address the problems of water management.
1.5 Research Challenges
The data used to estimate the output plays a vital role in each technique. Having an output variable, flow rate, that depends on many attributes like temperature, humidity, pressure, and precipitation is a great challenge. Likewise, predicting the water flow from a single attribute, discharge flow, using a time series model is also challenging. As the data considered in this model has values in different ranges, the data must be normalized into an optimal range. Furthermore, the details of the two types of datasets considered are discussed where the dataset is explained. The dataset preparation phase is also crucial, as it involves the concept of model order selection. Using discharge flow as the attribute, we build a dataset having three inputs and one output following model order selection.
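The dataset-preparation step described above can be sketched as follows: a discharge series is turned into a supervised dataset where each sample has three lagged values as inputs and the next value as output. This is an illustrative Python fragment (the actual work is done in MATLAB), and the discharge values shown are hypothetical.

```python
import numpy as np

def make_delay_dataset(series, order=3):
    """Turn a 1-D series into (X, y): each row of X holds `order`
    consecutive past values, and y holds the value that follows them."""
    series = np.asarray(series, dtype=float)
    X = np.column_stack([series[i:len(series) - order + i] for i in range(order)])
    y = series[order:]
    return X, y

# Hypothetical discharge values, for illustration only.
discharge = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
X, y = make_delay_dataset(discharge, order=3)
# Row t of X is [y(t-3), y(t-2), y(t-1)]; y[t] is the value to predict.
```

Choosing `order` here corresponds exactly to the model order selection problem discussed next.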
Model order selection begins with choosing a model structure, which is usually the first step towards estimation. There are various possibilities for the structure: state-space, transfer functions, and polynomial forms such as ARX, ARMAX, OE, BJ, etc. Without detailed prior knowledge of the system, such as its noise characteristics and the presence of feedback, the choice of a reasonable structure may not be obvious. Also, for a given choice of structure, the order of the model needs to be specified before the corresponding parameters are estimated. The choice of model order is additionally affected by the amount of lag, and there are many ways to determine the time delay from input to output. In the proposed work, the best model order is also determined.
1.6 Research Objectives
Recently, a lot of research has focused on the modeling and identification of linear systems. However, all systems are non-linear to some extent, and the presence of non-linear distortions can cause significant errors in the process of system modeling. The fact that the behavior of most dynamic systems is non-linear has made models such as artificial neural networks, regression models, fuzzy logic, and neuro-fuzzy systems a suitable choice for the modeling and identification of many real-life systems. This research aims to investigate the use of Artificial Neural Network (ANN) and traditional model algorithms to develop a prediction/forecasting model for river flow and natural meteorological systems. The primary aim is to construct a model that can accurately solve the river flow prediction problem without the need for physical insight or extensive prior knowledge about these systems, beyond historical measurements.
CHAPTER 2
MODELING TECHNIQUES
In this chapter, we discuss two modeling techniques, the autoregression model and artificial neural networks, for solving dynamic modeling problems like river forecasting. Under traditional modeling, we discuss the types of regression models and how the autoregression model works when there are multiple attributes; under artificial neural networks, we discuss the types of neural networks and how the backpropagation learning algorithm tunes the weights. We also explain how an ANN is able to learn from data samples and derive knowledge from the raw data.
2.1 Traditional modeling
The regression model [8] is one of the most critical and extensively utilized tools in machine learning and statistics. It enables predictions from data by learning the connection between the features of the data and some observed, continuous-valued response, and it is used in various applications.

Machine learning, and more specifically the field of predictive modeling, is primarily concerned with minimizing the error rate of a model, i.e., making the most accurate predictions possible, at the expense of explainability. Linear regression was developed in the field of statistics and is studied as a model for understanding the relationship between input and output numerical variables, but it has been borrowed by machine learning. It is both a statistical algorithm and a machine learning algorithm.
2.1.1 Linear Regression Model (LR)
Linear regression is a linear model, i.e., a model that assumes a linear relationship [25] between the input variables (x) and the single output variable (y); x is the independent variable and y is the dependent variable. More specifically, y can be estimated from a linear combination of the input variables. If there is a single input variable (x), the method is called simple linear regression. When there are multiple input variables predicting the output as a combination of all the variables, it is called multiple linear regression.
In linear regression analysis, the relations are modeled using linear predictor functions whose unknown parameters are estimated from the data; such models are therefore called linear models. Commonly, the mean of y given the value of x is assumed to be a linear function of x, although the median or some other quantile of the conditional distribution of y given x may also be expressed as a linear function of x. Linear regression thus focuses on the conditional probability distribution of y given x.

Linear regression was the first form of regression analysis to be studied and used in many practical applications. These models most often find their optimal parameters by the least squares estimation approach, although there are multiple other ways in which they can be fitted, and the least squares approach can also be used to fit models that are not linear models.
The linear equation assigns one scale factor, called a coefficient and represented by beta (β), to each input value. One additional coefficient is also added, giving the line an additional degree of freedom; it is known as the intercept or the bias coefficient. For example, in a simple regression problem where only x and y are measured, the form of the model would be:
y = β0 + β1 ∗ x (2.1)
In solving real-world problems, we have many input values x, which means many coefficients are used as the number of inputs increases. In multiple regression models, because of the potentially large number of predictors, it is more efficient to use matrices to define the regression model and the subsequent analyses.
Consider the following simple linear regression function in Equation 2.2.
yi = β0 + β1xi + εi; ∀i = 1, 2, 3, ...n (2.2)
For i = 1, . . . , n, the obtained n Equations are 2.3, 2.4 and 2.5:
y1 = β0 + β1 ∗ x1 + ε1. (2.3)
y2 = β0 + β1 ∗ x2 + ε2. (2.4)
...
yn = β0 + β1 ∗ xn + εn. (2.5)
Therefore, these equations can be written in matrix format as Equation 2.6:

\[
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
\tag{2.6}
\]
In short notation, the simple linear regression function reduces to Y = Xβ + ε. In the above equations, the yi, i = 1, 2, ..., n, are called the response variables, measured variables, or dependent variables. In each dataset, the decision as to which variable is modeled as the dependent variable and which as the independent variables rests on the presumption that the other variables cause the selected one. The xi, i = 1, 2, ..., n, are called the explanatory variables, input variables, or independent variables; usually a constant is incorporated as one of the regressors. The element β0 is the intercept and β1 is the slope coefficient. Multiple linear regression is the extension of simple linear regression in which there are many predictor variables (X). Nearly all real-world problems involve multiple linear regression.
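The matrix form Y = Xβ + ε leads directly to the least-squares estimate of β. The following is a minimal Python sketch (illustrative, not the report's MATLAB code) using noise-free data so that the fit is exact:

```python
import numpy as np

# Noise-free data generated from y = 2 + 3x, so the fit recovers it exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x

# Design matrix X with a leading column of ones for the intercept beta_0.
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution of Y = X beta + eps.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # approximately [2.0, 3.0]
```

Adding more columns to the design matrix gives multiple linear regression with no change to the solver call.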
2.1.2 Auto Regression Model (AR)
Autoregressive (AR) and autoregressive integrated moving average (ARIMA) models are important in the stochastic modeling of hydrological data. Such models are of value in handling what might be described as the short-run problem: modeling the seasonal variability in a stochastic flow series. In recent years their importance for practical water resource problems has been overshadowed by more sophisticated types of models that are designed to preserve long-time dependencies, perhaps of the order of decades, in hydrological series. Although the long-run problem is crucial, the short-run problem, on the order of months to a few years, is also important.
An autoregression model [11] is a time series model that uses observations from previous time steps as input to a regression equation to predict the value at the next time step. Models that use time itself as a predictor are non-adaptive, or fixed until re-estimation occurs, and tend to exhibit auto-correlated residuals, so they should be avoided as the presumed model. Autoregression, by contrast, is a straightforward idea that can result in accurate forecasts on a range of time series problems. Like linear regression, it models an output value as a linear combination of input values. If the output depends only on the previous value, i.e., yt on yt−1, then the AR model is written as Equation 2.7:
yt = β0 + β1yt−1 + εt. (2.7)
where t is the measurement at a specific time. If the idea is to predict the present value based on the past two values, i.e., on yt−1 and yt−2, then the AR model is:
yt = β0 + β1yt−1 + β2yt−2 + εt. (2.8)
If the prediction is performed using yt−1 and yt−2, the model is called AR(2), since the past two values are used in prediction. Hence, in general, the AR model can be written in the form of Equation 2.9:

\[
y_t = \beta_0 + \sum_{i=1}^{n} \beta_i y_{t-i} + \varepsilon_t \tag{2.9}
\]
where β0, β1, ..., βn are parameters and yt−1, yt−2, ..., yt−n are previous values in time.
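An AR(p) model of this form can be estimated by ordinary least squares on lagged values. Below is a hedged Python sketch (illustrative only, not the report's MATLAB implementation), fitted on a synthetic series whose true parameters are known:

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares estimate of (beta_0, beta_1, ..., beta_p) for an AR(p) model."""
    series = np.asarray(series, dtype=float)
    # Row t holds [1, y_{t-1}, ..., y_{t-p}], used to predict y_t.
    X = np.column_stack(
        [np.ones(len(series) - p)]
        + [series[p - i:len(series) - i] for i in range(1, p + 1)]
    )
    y = series[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Synthetic series from y_t = 1 + 0.5*y_{t-1} + noise; AR(1) should recover
# coefficients near [1.0, 0.5].
rng = np.random.default_rng(0)
ys = [0.0]
for _ in range(500):
    ys.append(1.0 + 0.5 * ys[-1] + rng.normal(0.0, 0.1))
beta = fit_ar(ys, p=1)
```

Trying several values of p and comparing the resulting errors is one simple route to the model order selection discussed in Chapter 1.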
There are two broadly used linear time series models: the Autoregressive (AR) and Moving Average (MA) [9] [13] models. Combining these two, the Autoregressive Moving Average (ARMA) and Autoregressive Integrated Moving Average (ARIMA) models have been proposed in the literature [15].
The ARMA model is a tool for understanding and, possibly, forecasting future values in such a series. The model comprises two parts, an autoregressive (AR) part and a moving average (MA) part. The AR part involves regressing the variable on its own lagged (past) values. The MA part involves modeling the error term as a linear combination of error terms occurring contemporaneously and at various times in the past. This model is usually referred to as ARMA(p, q), as in Equation 2.12, where p is the order of the autoregressive part, as in Equation 2.10, and q is the order of the moving average part, as in Equation 2.11. The notation AR(p) refers to the autoregressive model of order p. The AR(p) model is written as:
\[
y_t = c + \sum_{i=1}^{p} \beta_i y_{t-i} + \varepsilon_t. \tag{2.10}
\]
where βi, i = 1, 2, ..., p, are parameters, c is a constant, and εt is noise. Some constraints are required on the values of the parameters so that the model remains stationary. The notation MA(q) refers to the moving average model of order q:
\[
y_t = \mu + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}. \tag{2.11}
\]
where θ1, θ2, ..., θq are parameters, µ is the expectation, and εt−i, i = 1, 2, ..., q, are error terms or noise. Therefore, the notation ARMA(p, q) refers to the model with p autoregressive terms and q moving-average terms; this model contains the AR(p) and MA(q) models:
\[
y_t = c + \varepsilon_t + \sum_{i=1}^{p} \beta_i y_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}. \tag{2.12}
\]
ARMA is suitable when a system is a function of a series of unobserved shocks (the MA, or moving average, part) as well as its own past behavior, for example in applications like stock price prediction.
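To make the ARMA(p, q) recursion concrete, the sketch below simulates an ARMA(1, 1) series directly from the special case of Equation 2.12 with p = q = 1. This is illustrative Python (not part of the report's MATLAB work), and the parameter values are arbitrary:

```python
import numpy as np

def simulate_arma11(c, beta1, theta1, n, seed=0):
    """Generate n values of y_t = c + beta1*y_{t-1} + eps_t + theta1*eps_{t-1},
    the ARMA(1, 1) special case of the general ARMA(p, q) equation."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, 1.0, size=n)  # the unobserved shock series
    y = np.zeros(n)
    y[0] = c + eps[0]
    for t in range(1, n):
        y[t] = c + beta1 * y[t - 1] + eps[t] + theta1 * eps[t - 1]
    return y

# Arbitrary illustrative parameters; |beta1| < 1 keeps the series stationary.
series = simulate_arma11(c=0.5, beta1=0.6, theta1=0.3, n=200)
```

The stationarity constraint on the AR coefficient mirrors the parameter constraints mentioned for Equation 2.10.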
2.2 Artificial Neural Networks
ANNs are computing systems inspired by biological neural networks that learn by examples. These systems have no prior notion of any objects, images, or data; they are quantitative models which learn to associate input and output patterns adaptively with the use of learning algorithms. ANNs can help in making decisions about examples in the same way that decision trees can, and just as with decision trees, the examples first need to be converted to a set of feature values.
2.2.1 Neural Network Components
A neural network structure is designed using concepts from the biological nervous system. An ANN is a collection of connected units or nodes, called artificial neurons, each of which performs some function; the network of these nodes is loosely modeled on the human brain. Like the human nervous system, the ANN is organized in layers: an input layer, a hidden layer, and an output layer, where training and testing of the data take place through the neurons in each layer. An example of a three-layer perceptron with three inputs, one output, and a hidden layer of four neurons is shown in Figure 2.1. The artificial neurons (activation units) present in each layer receive a signal, process it, and forward it to the next neuron; this process continues until the output layer is reached. This type of network, in which data flows in one direction (forward), is known as a feed-forward network. The data is divided into training and testing parts, and validation is performed on a sample of the data to examine whether the system has learned from the examples. In general, the function of each layer is:
• Input Layer - It contains those units (artificial neurons) that receive input from the
outside world through which the network will learn, recognize, or otherwise process.
• Output Layer - Contains the units that present the network's response, reflecting what it has learned about a task.
• Hidden Layer - These units are located between the input and output layers. The hidden layer's job is to transform the input into something the output units can use, by way of the activation units. Most neural networks are fully connected: each hidden neuron is connected to every neuron in the previous layer (input) and in the next layer (output).
Figure 2.1: Three-layer perceptron
In ANN implementations, the signal passed between artificial neurons is a real number, and the output of each neuron is computed by a non-linear function of the sum of its inputs. Neurons and connections typically have weights that are adjusted as learning proceeds. ANNs may also have a threshold, such that a signal is sent only if the aggregate signal crosses that threshold. The signal emanating from the output node(s) is then the network's solution to the input problem. In this way, an ANN is designed to solve problems in a manner analogous to the human brain. An ANN can be implemented as a single-layer perceptron, which has a single layer of output nodes; the inputs are fed directly to the outputs through a series of weights. A multi-layer perceptron, as shown in Figure 2.1, is trained in a feed-forward manner with a variety of learning techniques, the most popular being back propagation. Each neuron in one layer has direct connections to the neurons of the subsequent layer, as shown in the figure. In many applications, the units of these networks apply a sigmoid function as an activation function; sign and step activation functions are also commonly used. A multi-layer neural network can compute a continuous output instead of a step or sign function. A common choice is the so-called logistic function:
f(x) = 1 / (1 + e^(−x))   (2.13)
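The logistic function of Equation 2.13 is a one-liner; a minimal sketch:

```python
import math

def logistic(x):
    """Logistic (sigmoid) activation of Equation 2.13: f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# The output is continuous and bounded in (0, 1), with f(0) = 0.5
midpoint = logistic(0.0)
```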
2.2.1.1 Types of Neural Networks
There are many types of neural networks. Some of the types commonly used in machine learning are:
• FeedForward Neural Network (FFNN): The FFNN is one of the simplest forms of ANN, in which the data moves in a single direction [23]. The data flows in through the input nodes and comes out at the output nodes. This type of neural network may or may not have hidden layers. In simple terms, the data can only move forward; there is no back propagation, and usually a classifying activation function is used.
• Radial Basis Function Neural Network (RBF): An RBF network considers the distance of a point with respect to a center [10]. RBF networks have two layers: in the inner layer, the features are combined with the radial basis function, and the outputs of these units are then combined to compute the output of the network.
• Recurrent Neural Network (RNN): The Recurrent Neural Network [10] works on the principle of saving the output of a layer and feeding it back to the input to help predict the outcome of the layer. Here, the first layer is formed like a feed forward neural network, with the product of the sum of the weights and the features. Once this is computed, the recurrent process starts: from one time step to the next, each neuron retains some of the information it had in the previous time step. This makes each neuron behave like a memory cell while performing computations. In this process, we let the neural network work on forward propagation and remember what information it needs for future use. If the prediction is wrong, the learning rate and error correction are used to make small changes, so that the network gradually works towards making the right prediction during backpropagation.
• Convolutional Neural Network (CNN): CNNs are similar to feed forward neural networks, in that the neurons have learnable weights and biases [18]. Their main applications have been in signal and image processing, and they have largely taken over from classical techniques, such as those in OpenCV, in the field of computer vision.
2.2.2 Network Structure
The importance of the network design, that is, the arrangement of neurons and synapses, is not to be underestimated. The number of neurons and the number of hidden layers are very important for extracting the hidden relations in the data.
2.2.2.1 Design of Network Structure
Neural network architecture can be defined as the design that executes the functions
of a neural network. All network structures are based on the concepts of neurons, inter-
connections and transfer functions. Network architectures are organized in different ways
according to their applications. The most common architecture is the multi-layer feed
forward network.
A neural network structure should be designed so that it can solve the problems of the application. All the ANN components should be decided on, and the resulting architecture should be able to solve the problems. The main aspects considered in designing an ANN structure are:
• Deciding how many neurons each layer in the architecture should contain.
• Deciding how many hidden layers there should be.
• Deciding which transfer (activation) function to use at each layer.
• Determining the weights for the inputs.
2.2.3 Backpropagation Learning algorithm
Backpropagation was proposed in the 1970’s as a general optimization method acting
for performing automatic differentiation of complex nested functions. It is the generaliza-
tion of the Widrow-Hoff learning rule to multiple-layer networks and nonlinear differen-
tiable transfer functions [14]. Its a procedure to repeatedly adjust the weights to minimize
the difference between actual output and desired output. Also, an algorithm used in artifi-
cial neural networks to calculate the error contribution of each neuron after a batch of data
is being processed. This is used by adapting optimization algorithm to adjust the weight
21
of each neuron, completing the learning process for that case. Hidden Layers are neu-
ron nodes stacked in between inputs and outputs, allowing neural networks to learn more
complicated features.
• Backpropagation requires a known, desired output for each input value, therefore, it
is called supervised learning method.
• The weights that are changed when we backpropagate to the input layer follow var-
ious mathematical functions.
• Choose an optimal function to calculate weights that are to be given to the input
layer in each iteration so that the error rate is minimized.
Consider a neural network made up of n inputs connected to a single output unit through n hidden neurons. The output of the network is determined by calculating a weighted sum of its inputs and comparing this value with a threshold. If the net input is greater than the threshold, the output is 1; otherwise, it is 0. First, if we have m input data (x1, x2, ..., xm), we call these m features. A feature is just one variable we consider as having an influence on a specific outcome. Secondly, when we multiply each of the m features by a weight (w1, w2, ..., wm) and sum them all together, this is a dot product. Mathematically, we can summarize the computation performed by the input unit as Equation 2.14.
WX = w1x1 + w2x2 + ... + wnxn = Σ_{i=1}^{n} w_i x_i   (2.14)

where w1, w2, ..., wn are the weights given to the input layer and x1, x2, ..., xn are the inputs at the input layer.
Next, perform the dot product of the input data set X: x1, x2, ..., xm with the weight set W^1: w^1_1, w^1_2, ..., w^1_n; the weights have superscript 1 since they connect to the first hidden neuron h^1_1 of the hidden layer h^1. The dot-product summation, with a bias added, gives the result z. So, at each hidden layer a bias is added to the dot product of the weights and inputs, as in Equation 2.15.
z = Σ_{i=1}^{n} w_i x_i + b   (2.15)
Now z is fed to the activation function. The activation function can be an identity, sign, or step function. The output from the first hidden neuron in the hidden layer is then h^1_1 = f(z). The same step is repeated for all n hidden neurons, giving the n hidden outputs. The outputs from all n hidden neurons (h^1_1, h^1_2, ..., h^1_n) are then used as inputs to calculate the final output. For the final output, we perform a dot product of the hidden layer outputs h^1: (h^1_1, h^1_2, h^1_3, ..., h^1_n) with the hidden layer weights W^h. As the final output is a single value, the weight set is W^h: (w^h_1, w^h_2, ..., w^h_n), with n weights for the n hidden layer inputs. Then the bias is added to obtain the result z:
z = Σ_{i=1}^{n} w^h_i h^1_i + bias   (2.16)
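The two steps above (Equations 2.15 and 2.16) amount to one forward pass through a network like the one in Figure 2.1. A minimal sketch with toy weights (the weight values are arbitrary, purely for illustration):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, W1, b1, w_h, b_out):
    """Forward pass for a 3-input, 4-hidden-neuron, 1-output perceptron:
    h^1_i = f(sum_j w^1_ij * x_j + b_i)   (Equation 2.15, per hidden neuron)
    z     = sum_i w^h_i * h^1_i + bias    (Equation 2.16)
    y     = f(z)                          (output activation)"""
    h = [logistic(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    z = sum(w * hi for w, hi in zip(w_h, h)) + b_out
    return logistic(z)

x = [0.5, -1.0, 2.0]                      # three inputs
W1 = [[0.1, 0.2, -0.1],                   # one weight row per hidden neuron
      [0.3, -0.2, 0.1],
      [-0.1, 0.1, 0.2],
      [0.2, 0.1, -0.3]]
b1 = [0.0, 0.1, -0.1, 0.0]
w_h = [0.5, -0.4, 0.3, 0.2]               # hidden-to-output weights
y = forward(x, W1, b1, w_h, b_out=0.1)
```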
Now the output z is given to an activation function f(z), so that the output at the output layer is y = f(z). After the output is calculated, the error rate is written as Equation 2.17.

E_total = (1/2) [target − output]^2   (2.17)
Now the chain rule is applied to adjust the weights at each layer using derivatives. When the partial derivative of the total error is taken with respect to a term that the output does not depend on, the corresponding contribution of (1/2)[target − output]^2 becomes zero, since the derivative of a constant is zero. Using the delta rule, the weights are adjusted at each layer. The delta rule is a gradient descent learning rule for updating the weights of the inputs to artificial neurons in a single-layer neural network. So, for a neuron j with activation function g(x), the delta rule for j's ith weight w_ji is written as Equation 2.18.

∆w_ji = α (t_j − y_j) g′(h_j) x_i   (2.18)

where α is a small constant called the learning rate, g(x) is the neuron's activation function, t_j is the target output, h_j = Σ_i x_i w_ji is the weighted sum of the neuron's inputs, y_j = g(h_j) is the actual output, and x_i is the ith input.
In simplified form, the delta rule for a neuron is:

∆w_ji = α (t_j − y_j) x_i   (2.19)

If mse > θ (where θ is a chosen threshold value), the data is retrained; otherwise training stops. The weights are adjusted until the desired error rate is reached.
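The simplified delta rule of Equation 2.19 can be sketched for a single linear neuron; repeated updates drive the output toward the target (the inputs and target below are made up for illustration):

```python
def delta_update(w, x, target, alpha=0.1):
    """One delta-rule step (Equation 2.19): w_i <- w_i + alpha*(t - y)*x_i,
    where y = sum_i w_i * x_i is the neuron output (identity activation)."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + alpha * (target - y) * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0]
for _ in range(50):                     # repeated updates shrink the error
    w = delta_update(w, [1.0, 2.0], target=3.0)
output = w[0] * 1.0 + w[1] * 2.0        # approaches the target 3.0
```

With this learning rate and input, the error halves at every step, so the output converges quickly to the target.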
CHAPTER 3
SYSTEM DESIGN AND IMPLEMENTATION
In this chapter, we describe two modeling techniques, ANN and AR. These two techniques are commonly used for prediction/forecasting in real-life problems, and the Black River data is used to evaluate the proposed models. The Black River, used here for analysis and forecasting, is a tributary of Lake Erie, about 19 km long, in northern Ohio in the United States, as shown in Figure 3.1; via Lake Erie, the Niagara River and Lake Ontario, it is part of the watershed of the St. Lawrence River, which flows to the Atlantic Ocean. The Black River drains an area of 470 square miles (1,217 square kilometers) [6].
Figure 3.1: Black river at Ohio state
The proposed model structure for implementing the multiple attribute model and the single attribute delay model is shown in Figure 3.2. For convenience, and to obtain accurate results, the process is sub-categorized into three phases, as in Figure 3.2: data preprocessing, model design, and validation testing. In the preprocessing stage, data normalization is used to bring the available dataset into a range that fits the problem. The processed data is fed as input in the model development stage, and finally, we test the results acquired from the previous stage.
Figure 3.2: Proposed Model development procedure
The dataset consists of 6 attributes, collected during the years 2011, 2012 and 2013 at site USGS 04200500. The attributes are temperature, dissolved oxygen, pH of the water, specific conductance, turbidity and the discharge rate [24]. After visualizing the data in the dataset, it is evident that many outliers are present, as shown in Figure 3.3 and Figure 3.4.
Figure 3.3: Data collected between 2011 and 2013 as inputs
Figure 3.4: Discharge rate collected between 2011 and 2013
3.1 Data Normalization
The dataset considered is historical data with six attributes. These attributes have a wide range of scales. Since the attributes represent different features, such as temperature, pH, dissolved oxygen, and turbidity, the dataset has a wide range of values with outliers, so data normalization is needed to obtain accurate prediction results with the proposed model. Data whose values are not on the same scale may not give the desired output. In the data normalization stage, all the variables are transformed to a specific range, or onto the same scale. We can normalize towards a more linear, more robust relationship using the mean and standard deviation of each attribute, which amounts to standardizing the numeric attributes: data standardization is the technique of rescaling one or more attributes so that they have a mean value of 0 and a standard deviation of 1.
In the proposed model, normalization is performed by a function called FeatureNormalize(X). FeatureNormalize(X) returns a normalized version of X, where the standard deviation of each feature is 1 and the mean value of each feature is 0. This is often a good preprocessing technique when working with learning algorithms. The function returns M, S, and norm as outputs. First, for each feature dimension, the mean of the feature is computed and subtracted from the dataset, and the mean value is stored in M. Each feature's standard deviation is also computed and stored in S, as in Equations 3.1, 3.2 and 3.3. Next, each feature value is normalized using the stored M and S, and the normalized values are stored in the variable norm.
M = mean(X) (3.1)
S = std(X) (3.2)
norm = (X − M) / S   (3.3)
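A sketch of FeatureNormalize(X) following Equations 3.1–3.3 (an illustrative reimplementation mirroring the description above, not the report's MATLAB code):

```python
import statistics

def feature_normalize(X):
    """Return M (per-feature means), S (per-feature standard deviations) and
    norm, the version of X where each feature has mean 0 and std 1."""
    cols = list(zip(*X))                       # one tuple per feature column
    M = [statistics.mean(c) for c in cols]     # Equation 3.1
    S = [statistics.stdev(c) for c in cols]    # Equation 3.2
    norm = [[(v - m) / s for v, m, s in zip(row, M, S)]
            for row in X]                      # Equation 3.3
    return M, S, norm

M, S, Xn = feature_normalize([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```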
3.2 Model Design
In the model development stage, after preprocessing the dataset, the data is fed as input to the proposed models, namely ANN and AR. We have developed two models, a multiple attribute model and a single attribute delay model, for predicting the discharge flow. In the multiple attribute model, more than one attribute is fed as input to the algorithm, whereas in the single attribute delay model, the discharge flow is predicted from a single attribute based on model order selection.
• Multiple Attribute Model:
The data used for this model is the data collected in 2011, 2012 and 2013. The collected data has several attributes: temperature, specific conductance, dissolved oxygen, pH, turbidity and discharge rate. In this model, the discharge rate depends on the other 5 attributes. The model structure for the multiple attribute model is shown in Figure 3.5; the inputs to the model are the five attributes, and the output is the discharge rate.
Figure 3.5: System model based on Multiple Attributes
• Single Attribute Delay Model:
This model is based on a time series model, which uses the data collected as an average value per day. In this case, only the daily flow rate is considered. A dataset is designed using model order selection, with some delay in each column; training is then performed using both techniques, and the flow rate is tested. To find y(t) for a particular day, the model uses the input-output sequence of past measurements: y(t−1), the previous day's flow; y(t−2), the flow 2 days before; and y(t−3), the flow 3 days before, as shown in Figure 3.6. Hence, the values of both the inputs and the output are discharge rate values.
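Constructing the single attribute delay dataset from a daily flow series can be sketched as follows (the flow values are dummy numbers for illustration):

```python
def make_delay_dataset(flow, order=3):
    """Pair the lagged flows [y(t-1), y(t-2), y(t-3)] (inputs) with y(t)
    (output) for every day t that has a full history of `order` days."""
    rows = []
    for t in range(order, len(flow)):
        inputs = [flow[t - k] for k in range(1, order + 1)]
        rows.append((inputs, flow[t]))
    return rows

rows = make_delay_dataset([10.0, 12.0, 11.0, 13.0, 14.0], order=3)
```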
3.2.1 Model Structure Selection
Model order selection is the process of determining the order of the model and the input delay. There are a few techniques for selecting and configuring the optimal model structure, lags and delays. Estimating a model from measurement data requires choosing a model structure (for example, state-space or transfer function) and its order (e.g., number of zeros) ahead of time. This choice is affected by prior knowledge about the system being modeled, but it can also be motivated by an analysis of the data itself. Choosing a model structure is usually the first step towards its estimation.
There are various possibilities for the structure: state-space, transfer functions, and polynomial forms such as ARX, ARMAX, OE, BJ, etc. Likewise, for a given choice of structure, the order of the model should be specified before estimating the parameters. The choice of model order is also affected by the amount of delay. Different options are available for determining the time delay from input to output, such as using a non-parametric estimate of the impulse response, or using the state-space model estimator N4SID with many different orders and finding the delay of the 'best' one. The best model order is then chosen for predicting the output in the single attribute delay model.
3.3 Evaluation Criteria
To check the performance of each model, we adopted evaluation parameters such as the mean square error (MSE), the mean absolute percentage error (MAPE) and the variance accounted for (VAF), as in Equations 3.4, 3.5 and 3.6.
The mean squared error (MSE) in Equation 3.4 measures the average of the squares of the errors or deviations, i.e., the difference between the estimator and what is estimated; the difference occurs because of randomness. The MSE is a loss function corresponding to the expected value of the squared (quadratic) error loss. It is a measure of the quality of an estimator, is always non-negative, and values closer to zero are better.
The mean absolute percentage error (MAPE) in Equation 3.5, also known as the mean absolute percentage deviation (MAPD), is a measure of the prediction accuracy of a forecasting method, for example in time series prediction.
The variance accounted for (VAF) in Equation 3.6 measures how much of the variance of the observed output is explained by the model. It is based on the variance, the expectation of the squared deviation of a random variable from its mean, which measures how far a set of (random) numbers is spread out from its mean value.
An error e is computed as the difference between the actual and predicted values y_i and ŷ_i, as in Equations 3.4, 3.5 and 3.6. These evaluation criteria are also provided by the data mining software tool Weka (Waikato Environment for Knowledge Analysis) [2], a popular machine learning tool developed in Java. The modeling techniques are implemented in MATLAB.
MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)^2   (3.4)

MAPE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i| / y_i × 100%   (3.5)

VAF = [1 − var(y − ŷ) / var(y)] × 100%   (3.6)
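Equations 3.4–3.6 translate directly into code; a sketch using the population variance for VAF (the normalization choice cancels in the ratio as long as it is consistent), with made-up flow values for illustration:

```python
import statistics

def mse(y, yhat):
    """Equation 3.4: mean of the squared prediction errors."""
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def mape(y, yhat):
    """Equation 3.5: mean absolute percentage error (y_i must be nonzero)."""
    return 100.0 * sum(abs(a - b) / abs(a) for a, b in zip(y, yhat)) / len(y)

def vaf(y, yhat):
    """Equation 3.6: percentage of the variance of y accounted for."""
    resid = [a - b for a, b in zip(y, yhat)]
    return 100.0 * (1.0 - statistics.pvariance(resid) / statistics.pvariance(y))

actual = [100.0, 120.0, 90.0, 110.0]
predicted = [98.0, 121.0, 95.0, 108.0]
scores = (mse(actual, predicted), mape(actual, predicted), vaf(actual, predicted))
```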
The above parameters MSE, VAF and MAPE are measured for both the single attribute delay model and the multiple attribute model using the ANN and AR models, and the results are evaluated.
CHAPTER 4
IMPLEMENTATION AND RESULTS
We have designed two models to predict the discharge flow rate of the Black River using the ANN and AR algorithms, and computed results for the single attribute delay model and the multiple attribute model. The dataset was preprocessed to handle the outliers using the normalization technique discussed in Chapter 3. The data is divided into 70% training and 30% testing for both techniques.
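The 70/30 split can be sketched as a simple chronological cut (the report does not state the exact split indices, so the rounding here is an assumption):

```python
def split_70_30(data):
    """Split a dataset chronologically into 70% training and 30% testing."""
    cut = int(len(data) * 0.7)
    return data[:cut], data[cut:]

train, test = split_70_30(list(range(919)))   # 919 daily samples, as in the text
```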
4.1 AR Implementation for Single Attribute delay Model
An AR model is implemented for predicting the flow of the Black River using a single attribute, the discharge flow. The AR model's performance is estimated using the MSE, minimizing the error between the actual flow and the predicted flow. A model order of 3 is chosen in implementing this model. The developed AR model is given by Equation 4.1. The results for the actual and predicted values are shown in Figure 4.1, with the actual flow as a solid line and the estimated flow as a dotted line, based on the AR(3) model for training and testing of the Black River flow.
y_t = 0.0012 + 0.9963 y_{t−1} − 0.3709 y_{t−2} + 0.0012 y_{t−3}   (4.1)
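The fitted AR(3) model of Equation 4.1 can be applied as a one-step-ahead predictor; this is a direct transcription of the reported coefficients, with inputs assumed to be flows on the model's normalized scale:

```python
def ar3_predict(y_tm1, y_tm2, y_tm3):
    """One-step-ahead flow prediction with the AR(3) coefficients of
    Equation 4.1: y_t = 0.0012 + 0.9963*y(t-1) - 0.3709*y(t-2) + 0.0012*y(t-3)."""
    return 0.0012 + 0.9963 * y_tm1 - 0.3709 * y_tm2 + 0.0012 * y_tm3

def ar3_forecast(flow, steps):
    """Iterate the predictor to forecast several days ahead from the last
    three observed flows."""
    history = list(flow)
    for _ in range(steps):
        history.append(ar3_predict(history[-1], history[-2], history[-3]))
    return history[len(flow):]
```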
4.2 AR Implementation for Multiple Attribute Model
For the AR algorithm, the number of training examples used is 645, for 1000 iterations. The developed AR model for the multiple attribute model is given by Equation 4.2.

y(k) = 0.0047 − 0.0227 temp + 0.5918 cdt − 0.6826 oxygen + 0.3425 pH − 0.0047 turbidity   (4.2)
The experiment is executed over a number of iterations, varying the error rate, model order, number of iterations, and the training and testing samples. The best results are recorded after tuning the parameters that affect the performance of the model. The parameters MSE, VAF and MAPE are recorded for this model. The results for the actual and predicted values for this model are shown in Figure 4.2.
Figure 4.1: Actual and Predicted flow using AR model for Single Attribute delay Model
Figure 4.2: Actual and Predicted flow using AR for Multiple Attribute Model
4.3 ANN Implementation for Single Attribute delay Model
A feed forward neural network is implemented on the daily values. The daily dataset is of size 919 × 4. It is built based on model order selection with an order of 3, i.e., with a delay of 0. One hidden layer is not sufficient to extract the knowledge in the data for this model, so the model is executed with 2 hidden layers, 3 inputs and 1 output layer. The neural network structure used to predict the daily values is shown in Table 2. Other attributes, such as the performance goal, Marquardt adjustment parameter, maximum validation failures, and minimum performance gradient, are defined in the training and testing phase of the network. For the daily dataset of this model, the results are as shown in Figure 4.3 and Figure 4.5.
Table 2: NN Parameters for Single Attribute delay Model
Inputs 3
Hidden Layers 2
Hidden neurons 20,1
Maximum epochs 1000
4.4 ANN Implementation for Multiple Attribute Model
In the multiple attribute model we try to predict the discharge rate using the other five attributes. The size of this dataset is 919 × 6. As shown in Table 3, the NN structure has 30 activation units in the first hidden layer and 20 activation units in the second hidden layer, which is considered optimal. When the network has only one hidden layer with a small number of hidden neurons, the performance of the NN is not efficient; therefore, more than one hidden layer is used to predict the output. Figure 4.4 shows an example structure with 1 input layer, 2 hidden layers and 1 output layer, with 10 neurons in the first hidden layer and 1 neuron in the second hidden layer. The numbers of hidden neurons in the hidden layers were initially chosen arbitrarily; after a number of runs, it was observed that 30 neurons in hidden layer 1 and 20 neurons in hidden layer 2 are optimal for achieving the best MSE. In the first hidden layer the transfer function used is tansig, and in the second layer the transfer function is purelin. The sum of the weighted inputs and the bias forms the input to these transfer functions.
The results for the actual and predicted values using ANN in this model are shown in Figure 4.6. When the experiment is run for 5 iterations, the convergence graph plotted between epochs and MSE for the 5 iterations is shown in Figure 4.7.

Figure 4.3: NN convergence curve over 1000 epochs for Single Attribute delay Model

Accordingly, for the single attribute delay model, the best values of the parameters MSE, VAF and MAPE obtained with the ANN and AR techniques are compared in Table 4. It is observed that ANN is the best at predicting the flow rate with the single attribute delay model. For the multiple attribute model, the best values of MSE, VAF and MAPE for the two techniques are compared in Table 5. The actual and predicted flow values for both training and testing cases are shown for the ANN and AR techniques. When MSE, VAF and MAPE are compared, ANN is observed to predict the flow values better than AR for both models.
Figure 4.4: Neural Network Structure for Multiple Attribute Model
Table 3: NN Parameters for Multiple Attribute Model
Inputs 5
Hidden Layers 2
Hidden neurons 30,20
Maximum epochs 1000
Figure 4.5: Actual and Predicted flow using ANN for Single Attribute delay Model
Table 4: Results for ANN and AR models of the Black Water River for Single Attribute
delay Model
          ANN                  AR
      Training   Testing   Training   Testing
VAF    68.759    61.259     59.262    61.513
MSE    0.2641    0.5287     0.3445    0.5235
MAPE   0.2173    0.2841     0.2952    0.2907
Figure 4.6: Actual and Predicted flow using ANN for Multiple Attribute Model
Table 5: Results for ANN and AR models of the Black Water River for Multiple Attribute
Model
          ANN                  AR
      Training   Testing   Training   Testing
VAF    97.351    91.65      62.80     33.246
MSE    0.037     0.128      0.1354    1.0897
MAPE   0.115     0.135      0.333     0.299
Figure 4.7: Neural Network convergence over 5 iterations in Multiple Attribute Model
CHAPTER 5
CONCLUSION AND FUTURE WORK
In this study, a detailed comparison is presented between the ANN and AR models in predicting the water flow of the Black River, Ohio. The comparisons reveal that ANN gives better predictions, besides having many other advantages. As the models are implemented on both the historical data and the time series based data, it is observed that ANN can be adapted for predicting the discharge rate. The performance of the two proposed models was compared for both training and testing cases. In addition, the comparison is made considering parameters such as MSE, VAF, and MAPE.
As part of future work, a comparison can be made in predicting weekly and monthly values. Other modeling techniques, such as genetic algorithms and adaptive neuro-fuzzy inference systems, can also be implemented. In addition, a hybrid neural network can be implemented in which the weights are tuned by a genetic algorithm and given to the neural network for better training and testing. Other parameters for measuring the performance, such as correlation, regression and accuracy, can also be implemented. Further, the model can be extended to solve other real-world problems like ozone layer depletion, rainwater problems, etc.
REFERENCES
[1] AICHOURI, I., HANI, A., BOUGHERIRA, N., DJABRI, L., CHAFFAI, H., AND
LALLAHEM, S. River flow model using artificial neural networks. Energy Procedia
74 (2015), 1007 – 1014. The International Conference on Technologies and Materi-
als for Renewable Energy, Environment and Sustainability TMREES15.
[2] ALJAHDALI, S., SHETA, A. F., AND DEBNATH, N. C. Estimating software ef-
fort and function point using regression, support vector machine and artificial neural
networks models. In 2015 IEEE/ACS 12th International Conference of Computer
Systems and Applications (AICCSA) (Nov 2015), pp. 1–8.
[3] ATIYA, A. F., EL-SHOURA, S. M., SHAHEEN, S. I., AND EL-SHERIF, M. S. A
comparison between neural-network forecasting techniques-case study: river flow
forecasting. IEEE Transactions on Neural Networks 10, 2 (Mar 1999), 402–409.
[4] BOX, G. E. P., AND JENKINS, G. Time Series Analysis, Forecasting and Control.
Holden-Day, Incorporated, 1990.
[5] CHEN, X., CHAU, K., AND BUSARI, A. A comparative study of population-based
optimization algorithms for downstream river flow forecasting by a hybrid neural
network model. Engineering Applications of Artificial Intelligence 46 (2015), 258 –
268.
[6] WIKIPEDIA CONTRIBUTORS. Black River (Ohio) — Wikipedia, The Free Encyclopedia, 2018. [Online; accessed 2-May-2018].
[7] DIBIKE, Y., AND SOLOMATINE, D. River flow forecasting using artificial neural
networks. Physics and Chemistry of the Earth, Part B: Hydrology, Oceans and At-
mosphere 26, 1 (2001), 1 – 7.
[8] DOAN, T., AND KALITA, J. Selecting machine learning algorithms using regres-
sion models. In 2015 IEEE International Conference on Data Mining Workshop
(ICDMW) (Nov 2015), pp. 1498–1505.
[9] BOX, G. E. P., AND JENKINS, G. Time series analysis, forecasting and control.
[10] GRAVES, A., R. MOHAMED, A., AND HINTON, G. Speech recognition with deep
recurrent neural networks. In 2013 IEEE International Conference on Acoustics,
Speech and Signal Processing (May 2013), pp. 6645–6649.
[11] KISI, O., AND ÖZTÜRK, Ö. Forecasting river flows and estimating missing data using soft computing techniques. International Congress on River Basin Management (2007).
[12] KOVAC-ANDRIC, E., SHETA, A., FARIS, H., AND GAJDOSIK, M. S. Forecasting ozone concentrations in the east of Croatia using nonparametric neural network models. Journal of Earth System Science 125, 5 (Jul 2016), 997–1006.
[13] HIPEL, K. W., AND MCLEOD, A. I. Time series modelling of water resources and environmental systems.
[14] LI, J., CHENG, J., SHI, J., AND HUANG, F. Brief introduction of back propagation (BP) neural network algorithm and its improvement. In Jin, D., and Lin, S. (eds), Advances in Computer Science and Information Engineering, Advances in Intelligent and Soft Computing, vol. 169. Springer, Berlin, Heidelberg, 2012.
[15] MCDONALD, S., COLEMAN, S., MCGINNITY, T. M., AND LI, Y. A hybrid fore-
casting approach using arima models and self-organising fuzzy neural networks for
capital markets. In The 2013 International Joint Conference on Neural Networks
(IJCNN) (Aug 2013), pp. 1–7.
[16] MONDAL, B., MEETEI, M., DAS, J., CHAUDHURI, C. R., AND SAHA, H. Quan-
titative recognition of flammable and toxic gases with artificial neural network using
metal oxide gas sensors in embedded platform. Engineering Science and Technology,
an International Journal 18, 2 (2015), 229 – 234.
[17] NIGAM, R., AND NIGAM, S. M. S. K. The river runoff forecast based on the modeling of time series. Russian Meteorology and Hydrology 39, 11 (Nov 2014), 750–761.
[18] O'SHEA, K., AND NASH, R. An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458 (2015).
[19] SEE, L., AND OPENSHAW, S. Applying soft computing approaches to river level
forecasting. Hydrological Sciences Journal 44, 5 (1999), 763–778.
[20] SHETA, A. F., AND EL-SHERIF, M. S. Optimal prediction of the nile river flow
using neural networks. In Neural Networks, 1999. IJCNN ’99. International Joint
Conference on (1999), vol. 5, pp. 3438–3441 vol.5.
[21] SHETA, A. F., AND FARIS, H. Influence of nitrogen-di-oxide, temperature and rel-
ative humidity on surface ozone modeling process using multigene symbolic regres-
sion genetic programming. International Journal of Advanced Computer Science
and Applications 6, 6 (2015).
[22] SINGH, A., THAKUR, N., AND SHARMA, A. A review of supervised machine
learning algorithms. In 2016 3rd International Conference on Computing for Sus-
tainable Global Development (INDIACom) (March 2016), pp. 1310–1315.
[23] TANG, Y., AND SALAKHUTDINOV, R. R. Learning stochastic feedforward neu-
ral networks. In Advances in Neural Information Processing Systems 26, C. J. C.
Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, Eds. Curran
Associates, Inc., 2013, pp. 530–538.
[24] WATER-RESOURCES DATA. United States Geological Survey, USGS water data for the nation, 1994.
[25] ZOU, K. H., TUNCALI, K., AND SILVERMAN, S. G. Correlation and simple linear
regression. Radiology 227, 3 (2003), 617–628. PMID: 12773666.