Chapter 4
Predicting Stock Market Index
using Fusion of Machine Learning
Techniques
The study focuses on the task of predicting future values of stock market indices.
Two indices, namely CNX Nifty and S&P BSE Sensex, from the Indian stock markets
are selected for experimental evaluation. Experiments are based on 10 years of
historical data of these two indices. The predictions are made for 1 to 10, 15 and
30 days in advance. A two stage fusion approach is proposed in this study. The first
stage employs SVR for preparing data for the second stage. The second stage of the
fusion approach uses ANN, Random Forest (RF) and SVR, resulting in SVR-ANN,
SVR-RF and SVR-SVR fusion prediction models. The prediction performance of these
hybrid models is compared with the single stage scenarios where ANN, RF and SVR
are used individually. Ten technical indicators are selected as the inputs to each of
the prediction models.
4.1 Introduction and Literature Review
Prediction of stock prices is a classic problem. The efficient market hypothesis states
that it is not possible to predict stock prices and that stock prices follow a random
walk. Technical analysts, however, believe that most information about a stock is
reflected in its recent prices, so that if trends in the movements are observed, future
prices can be predicted. In addition, stock market movements are affected by many
macro-economic factors such as political events, firms' policies, general economic
conditions, commodity price indices, bank rates, exchange rates, investors' expectations,
institutional investors' choices, movements of other stock markets, and the psychology
of investors (Miao, Chen, and Zhao). Stock indices are derived from the prices of
stocks with high market capitalization, and various technical parameters are used to
extract statistical information from these prices. As a result, indices give an overall
picture of the economy and depend on a wide range of factors.
There are several different approaches to time series modelling. Traditional statistical
models, including moving average, exponential smoothing, and ARIMA, are linear in
their predictions of future values (Rao and Gabr; Hsieh; Bollerslev). Extensive research
has resulted in numerous prediction applications using ANN, Fuzzy Logic, Genetic
Algorithms (GA) and other techniques (Lee and Tong; Hadavandi, Shavandi, and
Ghanbari; Zarandi, Hadavandi, and Turksen). ANN and SVR are the two machine
learning algorithms most widely used for predicting stock prices and stock market
index values. Each algorithm has its own way of learning patterns.
(Zhang and Wu) combined a backpropagation neural network with an Improved
Bacterial Chemotaxis Optimization (IBCO). They demonstrated the ability of their
approach to predict a stock index for both the short term (next day) and the long
term (15 days), and simulation results exhibited its superior performance. A
combination of data preprocessing methods, GA and the Levenberg-Marquardt (LM)
algorithm for learning feed forward neural networks was proposed in (Asadi et al.).
They used data preprocessing methods such as data transformation and selection
of input variables to improve the accuracy of the model. The results showed that
the proposed approach was able to cope with the fluctuations of stock market values
and also yielded good prediction accuracy. The Artificial Fish Swarm Algorithm
(AFSA) was introduced in (Shen et al.) to train a Radial Basis Function Neural
Network (RBFNN). Their experiments on the stock indices of the Shanghai Stock
Exchange indicated that an RBFNN optimized by AFSA was an easy-to-use algorithm
with considerable accuracy. (Hadavandi, Ghanbari, and Abbasian-Naghneh) proposed
a hybrid artificial intelligence model for stock exchange index forecasting, a combination
of GA and a feed forward neural network.
The Support Vector Machine (SVM), introduced by (Vapnik), has gained popularity
and is regarded as a state-of-the-art technique for regression and classification
applications. (Kazem et al.) proposed a forecasting model based on chaotic mapping,
the firefly algorithm, and SVR to predict stock market prices. The SVR-CFA model,
newly introduced in their study, was compared with SVR-GA, SVR-CGA (Chaotic
GA), SVR-FA (Firefly Algorithm), ANN and ANFIS models, and the results showed
that the SVR-CFA model performed better than the other models. (Pai et al.)
developed a Seasonal Support Vector Regression (SSVR) model to forecast seasonal
time series data. Hybrid Genetic Algorithm and Tabu Search (GA/TS) algorithms
were applied to select three parameters of the SSVR models. They also applied two
other forecasting models, ARIMA and SVR, to the same data sets. Empirical results
indicated that SSVR outperformed both the SVR and ARIMA models in terms of
forecasting accuracy. By integrating GA-based optimal time-scale feature extraction
with SVM, (Huang and Wu) developed a novel hybrid prediction model that operated
at multiple time-scale resolutions and utilized a flexible nonparametric regressor to
predict future evolutions of various stock indices. In comparison with neural networks,
pure SVMs and traditional GARCH models, the proposed model performed the best,
with a significant reduction in root-mean-squared error. A study of financial time
series prediction using ensemble learning algorithms in (Cheng, Xu, and Wang)
suggested that ensemble algorithms are powerful in improving the performances of
base learners. The study by (Aldin, Dehnavr, and Entezari) evaluated the effectiveness
of technical indicators, such as Moving Average, RSI, CCI and MACD, in predicting
movements of the Tehran Exchange Price Index (TEPIX).
This study focuses on the task of predicting future values of stock market indices.
The predictions are made for 1 to 10, 15 and 30 days in advance. A two stage fusion
approach involving SVR in the first stage is proposed. The second stage of the fusion
approach uses ANN, Random Forest and SVR, resulting in SVR-ANN, SVR-RF and
SVR-SVR prediction models. The prediction performance of these hybrid models is
compared with the single stage scenarios where ANN, RF and SVR are used
individually.
4.2 Single Stage Approach
The basic idea of single stage approach is illustrated in Figure 4.1. It can be seen
that for the prediction task of n-day ahead of time, inputs to prediction models are
ten technical indicators describing tth-day while the output is (t + n)th-day’s closing
price. These technical indicators which are used as inputs are summarized in Table
3.4. The prediction models which are employed in this study are described in the
following sub-sections.
Figure 4.1: General architecture of the single stage approach for predicting n days ahead of time
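The input-output framing described above can be sketched as follows. This is a minimal illustration with toy data; the helper name `make_single_stage_dataset` is hypothetical and not from the thesis.

```python
import numpy as np

def make_single_stage_dataset(indicators, close, n):
    """Pair day-t technical indicators with the day-(t+n) closing price.

    indicators: (T, 10) array, one row of ten technical indicators per day.
    close:      (T,) array of closing prices.
    n:          prediction horizon in days.
    """
    X = indicators[:-n]   # features describing day t
    y = close[n:]         # target: closing price of day t + n
    return X, y

# Toy illustration: 5 days of data, 2-day horizon.
ind = np.arange(50, dtype=float).reshape(5, 10)
close = np.array([100.0, 101.0, 99.0, 102.0, 103.0])
X, y = make_single_stage_dataset(ind, close, n=2)
# X holds indicators for days 0..2; y holds closes for days 2..4
```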
4.2.1 Artificial Neural Network
Three layer feed forward back propagation ANN similar to that shown in Figure 3.1
is employed in this study (Mehrotra, Mohan, and Ranka Han, Kamber, and Pei).
The only difference is that, the transfer function of the neuron, in the output layer,
is linear. This neuron in the output layer predicts closing price/value instead of the
up/down movement as was the case in the previous chapter. Input layer has ten
neurons, one for each of the selected technical parameters. The value of the index
which is to be predicted is represented by the single neuron in the output layer.
Adaptive gradient descent is used as the weight update algorithm. A tan-sigmoid is
used as the transfer function of the neurons of the hidden layer. The output of the
model is a continuous value, signifying the predicted value of the index. The reason
behind using adaptive gradient descent is to allow learning rate to change during the
training process. It may improve the performance of the gradient descent algorithm.
In adaptive gradient descent, first, the initial network output and error are calculated.
The current learning rate is used to calculate new weights and biases at each epoch.
Based on these new weights and biases, new outputs and errors are calculated. If the
new error exceeds the old error by more than a predefined ratio (1.04, in this study),
the new weights and biases are discarded and the learning rate is decreased (to 70%
of its current value, in this study). Otherwise, new weights and biases are kept and
the learning rate is increased (by 5% of the current value, in the experiments reported
in this thesis).
This procedure ensures that the learning rate is increased only to the extent that
the network can learn without large increases in error, yielding a near-optimal
learning rate for the local terrain. As long as stable learning is assured, the learning
rate is increased; when it becomes too high to assure a decrease in error, it is
decreased until stable learning resumes.
The number of neurons in the hidden layer and the number of epochs are considered
the parameters of the model. A comprehensive set of experiments is carried out by
varying the parameter values as shown in Table 4.1.
Table 4.1: ANN parameters and their values tested

Parameters                              Values
Number of Hidden Layer Neurons (n)      10, 20, ..., 100
Epochs (ep)                             1000, 2000, ..., 10000
4.2.2 Support Vector Regression
The SVR uses the same principles as the SVM for classification, with only a few
minor differences (Vapnik). First of all, because output is a real number, it becomes
very difficult to predict the information at hand, which has infinite possibilities. In
the case of regression, a margin of tolerance ε is set in approximation to the SVM.
Up until the threshold ε, the error is considered 0. However, the main idea is always
the same: to minimize error, individualizing the hyper plane which maximizes the
margin, considering that, part of the error is tolerated (Parrella).
The basic concepts of SVR which are discussed here can also be found in
(Cristianini and Shawe-Taylor Kecman) and (Huang and Tsai). Assume that xi ∈
Rd, i = 1, 2, · · · ,m forms a set of input vectors with corresponding response vari-
able yi ∈ R, i = 1, 2, · · · ,m. SVR builds the linear regression function as shown in
Equation 4.1.
f(x, w) = w^T x + b    (4.1)

Equation 4.2 shows Vapnik's linear ε-insensitive loss function.

|y − f(x, w)|_ε = { 0,                   if |y − f(x, w)| ≤ ε
                  { |y − f(x, w)| − ε,   otherwise             (4.2)
Based on this, the linear regression f(x, w) is estimated by simultaneously minimizing
||w||^2 and the sum of the linear ε-insensitive losses, as shown in Equation 4.3. The
constant c controls the trade-off between the approximation error and the weight
vector norm ||w||.

R = (1/2) ||w||^2 + c Σ_{i=1}^{m} |y_i − f(x_i, w)|_ε    (4.3)
Minimizing the risk R is equivalent to minimizing the risk shown in Equation 4.4
under the constraints given in Equations 4.5, 4.6 and 4.7. Here, ξ_i and ξ*_i are
slack variables, one for exceeding the target value by more than ε and the other for
being more than ε below the target.

R = (1/2) ||w||^2 + c Σ_{i=1}^{m} (ξ_i + ξ*_i)    (4.4)

(w^T x_i + b) − y_i ≤ ε + ξ_i    (4.5)

y_i − (w^T x_i + b) ≤ ε + ξ*_i    (4.6)

ξ_i, ξ*_i ≥ 0,  i = 1, 2, ..., m    (4.7)
As with the SVM, the above constrained optimization problem is solved using
Lagrangian theory and the Karush-Kuhn-Tucker conditions to obtain the desired
weight vector of the regression function. SVR can map the input vectors x_i ∈ R^d
into a high dimensional feature space Φ(x_i) ∈ H, where the mapping Φ(·) is
performed implicitly through a kernel function K(x_i, x_j). The polynomial and
radial basis kernel functions are used here; they are shown in Equations 4.8 and 4.9
respectively.

Polynomial Function:  K(x_i, x_j) = (x_i · x_j + 1)^d    (4.8)

Radial Basis Function:  K(x_i, x_j) = exp(−γ ||x_i − x_j||^2)    (4.9)

Here, d is the degree of the polynomial function and γ is the constant of the radial
basis function. The choice of kernel function, the degree of the kernel function (d)
in the case of the polynomial kernel, the gamma in the kernel function (γ) in the
case of the radial basis kernel, and the regularization constant (c) are considered the
parameters of SVR. A comprehensive set of experiments is carried out by varying the
parameter values as shown in Table 4.2.
Table 4.2: SVR parameters and their values tested

Parameters                        Value(s)
Degree of Kernel Function (d)     1, 2, 3, 4
Gamma in Kernel Function (γ)      0, 0.5, 1, 1.5, 2, 2.5, 3, 4, 5, 10, 20, 50, 100
Regularization Parameter (c)      1
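As a minimal sketch of fitting an ε-insensitive SVR with the RBF kernel of Equation 4.9, the following uses scikit-learn on stand-in data. scikit-learn is an assumed implementation choice (the thesis does not name its software), and the synthetic data merely stands in for the ten technical indicators and closing prices.

```python
import numpy as np
from sklearn.svm import SVR

# Stand-in data: rows of ten "technical indicators" and a linear target.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))
y_train = X_train @ rng.normal(size=10) + 5.0

# RBF-kernel SVR; gamma and C mirror parameters of Table 4.2.
model = SVR(kernel="rbf", gamma=2.0, C=1.0, epsilon=0.1)
model.fit(X_train, y_train)
y_pred = model.predict(X_train[:5])   # predicted closing values
```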
4.2.3 Random Forest
It is already discussed in section 3.4.3. The only difference in the implementation
here is that instead of classification tree, regression tree is used as the base learner of
the ensemble.
Number of trees (ntrees) in the ensemble is considered as the parameter of
Random Forest. Experiments are carried out with 50, 100 and 150 number of trees.
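A minimal sketch of the Random Forest regressor just described, again using scikit-learn as an assumed implementation: regression trees are the base learners and ntrees is the only tuned parameter, swept over the values used in the experiments.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Stand-in data: ten "technical indicators" per day and a noisy target.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=300)

for ntrees in (50, 100, 150):    # ntrees values tested in the experiments
    rf = RandomForestRegressor(n_estimators=ntrees, random_state=0)
    rf.fit(X, y)
preds = rf.predict(X[:3])
```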
4.3 Two Stage Fusion Approach
The basic idea of the two stage fusion approach is illustrated in Figure 4.2. The first
stage employs SVRs to prepare the inputs for the prediction models employed in the
second stage.
Figure 4.2: General architecture of the two stage fusion approach for predicting n days ahead of time
Details about the inputs and outputs of these SVRs, for the task of predicting n days
ahead of time, are depicted in Figure 4.3.
Figure 4.3: Details of the two stage fusion approach for predicting n days ahead of time
It is to be noticed that the inputs to the SVRs in the first stage describe the tth day,
while the outputs of this stage describe the (t + n)th day in terms of the ten technical
indicators. These outputs from the first stage serve as the inputs to the prediction
models in the second stage. The prediction models in the second stage therefore have
to identify a mapping from technical parameters describing the (t + n)th day to the
(t + n)th day's closing price. This is different from the single stage approach, where
the prediction models have to identify a mapping from technical parameters describing
the tth day to the (t + n)th day's closing price. The final output in both approaches
is the closing value of the (t + n)th day. As shown in Figures 4.2 and 4.3, ANN, SVR
and Random Forest are employed as the prediction models in the second stage. A
comprehensive set of experiments is carried out for each of the prediction models in
the second stage by varying the parameter values in the same manner as in the single
stage approach.
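The two stage idea can be sketched as follows: ten first-stage SVRs map day-t indicators to estimated day-(t+n) indicators, and a second-stage regressor (here a Random Forest, giving the SVR-RF variant) maps those estimates to the day-(t+n) closing price. scikit-learn is an assumed implementation choice and the data is synthetic stand-in material.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 5                                  # prediction horizon in days
ind = rng.normal(size=(400, 10))       # ten indicators per day (stand-in)
close = ind @ rng.normal(size=10)      # stand-in closing prices

# Stage 1: one SVR per indicator, mapping day t -> day t+n indicators.
X1, Y1 = ind[:-n], ind[n:]
stage1 = [SVR(kernel="rbf", gamma=2.0).fit(X1, Y1[:, j]) for j in range(10)]
est = np.column_stack([m.predict(X1) for m in stage1])

# Stage 2: predicted day-(t+n) indicators -> day-(t+n) closing price.
stage2 = RandomForestRegressor(n_estimators=150, random_state=0)
stage2.fit(est, close[n:])
pred = stage2.predict(est[-1:])        # forecast from the latest row
```

The key design point mirrors the text: stage 2 is trained on the stage-1 *estimates* of the (t+n)th-day indicators, not on the actual tth-day indicators.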
Table 4.3: Best parameter combination reported by parameter tuning experiments
for each of the SVRs in first stage of two stage fusion approach
SVR Kernel Function Gamma(γ)
SVR-1 RBF 2
SVR-2 RBF 2
SVR-3 RBF 2
SVR-4 RBF 100
SVR-5 RBF 100
SVR-6 RBF 10
SVR-7 RBF 4
SVR-8 RBF 100
SVR-9 RBF 2
SVR-10 RBF 1.5
However, parameter tuning experiments are performed for each of the SVRs in the
first stage to decide the best combination of parameter values. A parameter tuning
data set is formed from 20% of the entire data set, and is further divided into a
training set and a testing set. The training set consists of 80% of the parameter
tuning data set, while the remainder forms the testing set. By means of experiments
on these training and testing sets, the best combination of parameter values for each
of the SVRs in the first stage is identified. In this study, these experiments are called
the parameter tuning experiments for the SVRs in the first stage. The possible values
considered for each of the parameters of these SVRs are the same as those shown in
Table 4.2.
The results of the parameter tuning experiments for each of the SVRs in the first
stage show that transformation of the input space through the RBF kernel performs
better than transformation through the polynomial kernel. The best parameter
combinations reported by the parameter tuning experiments for each of the SVRs in
the first stage are summarized in Table 4.3. It is to be noticed that the aim of the
parameter tuning experiments is to identify the best parameter combination for each
of the SVRs in the first stage, so that the error in the statistical parameters to be
used as inputs to the prediction models in the second stage is minimized. During the
overall stock market value prediction experiments, the SVRs in the first stage are
used with the parameters determined during the parameter tuning experiments.
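The tuning protocol above can be sketched like this: 20% of the data forms the tuning set, split 80/20 into train/test, and the γ with the lowest test error is kept. scikit-learn is an assumed implementation; the grid sweeps the nonzero γ values of Table 4.2, and the variable names are illustrative.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 10))        # stand-in indicator data
y = X @ rng.normal(size=10)           # stand-in targets

tune_X, tune_y = X[:100], y[:100]     # 20% parameter tuning data set
tr_X, te_X = tune_X[:80], tune_X[80:] # 80/20 train/test split of it
tr_y, te_y = tune_y[:80], tune_y[80:]

best_gamma, best_err = None, np.inf
for gamma in (0.5, 1, 1.5, 2, 2.5, 3, 4, 5, 10, 20, 50, 100):
    m = SVR(kernel="rbf", gamma=gamma, C=1.0).fit(tr_X, tr_y)
    err = mean_squared_error(te_y, m.predict(te_X))
    if err < best_err:
        best_gamma, best_err = gamma, err
```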
4.4 Experimental Evaluation
4.4.1 Data for Experimentation
This study uses ten years of historical data, from January 2003 to December 2012,
of two stock market indices, CNX Nifty and S&P BSE Sensex, which have high
trading volumes. The ten technical indicators used are calculated from the close,
high, low and opening prices of these indices. All the data is obtained from the
http://www.nseindia.com/ and http://www.bseindia.com/ websites.
4.4.2 Evaluation Measures
Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), relative
Root Mean Squared Error (rRMSE) and Mean Squared Error (MSE) are used to
evaluate the performance of the prediction models; their formulas are shown in
Equations 4.10, 4.11, 4.12 and 4.13. It is to be noticed that MAPE is measured
in %.
MAPE = (1/n) Σ_{t=1}^{n} (|A_t − F_t| / |A_t|) × 100    (4.10)

MAE = (1/n) Σ_{t=1}^{n} |A_t − F_t|    (4.11)

rRMSE = sqrt( (1/n) Σ_{t=1}^{n} ((A_t − F_t) / A_t)^2 )    (4.12)

MSE = (1/n) Σ_{t=1}^{n} (A_t − F_t)^2    (4.13)

where A_t is the actual value and F_t is the forecast value.
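The four measures can be computed directly as below; this is a small illustrative sketch, with MAE taken as the absolute error in index points (consistent with the magnitudes reported in the results tables) and MAPE in percent.

```python
import numpy as np

# Error measures of Equations 4.10-4.13 (a: actual, f: forecast).
def mape(a, f):
    return np.mean(np.abs(a - f) / np.abs(a)) * 100   # in %

def mae(a, f):
    return np.mean(np.abs(a - f))                     # in index points

def rrmse(a, f):
    return np.sqrt(np.mean(((a - f) / a) ** 2))

def mse(a, f):
    return np.mean((a - f) ** 2)

actual = np.array([100.0, 200.0, 400.0])
forecast = np.array([110.0, 190.0, 400.0])
# mape -> 5.0 (%), mae -> 20/3, mse -> 200/3
```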
4.4.3 Results and Discussions
Three prediction models, namely ANN, SVR and RF, are used in the single stage
approach. In the two stage fusion approach, the prediction models used are SVR-ANN,
SVR-SVR and SVR-RF. For both approaches, prediction experiments for 1 to 10, 15
and 30 days ahead of time are carried out. Results for CNX Nifty are shown in
Tables 4.4 to 4.15. Similar results for S&P BSE Sensex are depicted in Tables 4.16
to 4.27.
It is important to notice that for each of the prediction tasks and prediction models,
a comprehensive set of experiments is carried out over the different possible
combinations of model parameters. The values reported in the tables are the best
parameter combinations, those exhibiting the minimum prediction error.
It is evident from the results that, as predictions are made for more days in advance,
the error values increase, as may be expected of any prediction system. The proposed
two stage fusion models SVR-ANN and SVR-RF outperform the ANN and RF models
for almost all prediction tasks on both data sets. SVR-SVR outperforms SVR for all
the prediction tasks except those up to 3 or 4 days in advance.
Tables 4.28 and 4.29 compare the performance of the single stage models to the two
stage fusion models for CNX Nifty. The reported values in these tables are averaged
over all 12 prediction tasks (1 to 10, 15 and 30 days in advance). Similar results for
S&P BSE Sensex are summarized in Tables 4.30 and 4.31. Tables 4.28 and 4.30
compare the single stage prediction models to the two stage fusion models on the
basis of average prediction performance, while Tables 4.29 and 4.31 show the
percentage improvement in performance achieved by the two stage fusion prediction
models over the single stage prediction models.
Table 4.4: Prediction performance of 1-day ahead of time (for CNX Nifty)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 3000 1.91 102.06 2.48 17745.90
SVR-ANN 7000 1.50 79.05 1.93 10006.12
γ MAPE MAE rRMSE MSE
SVR 4.00 0.99 52.48 1.26 4427.05
SVR-SVR 5.00 1.47 77.63 1.87 9614.30
ntrees MAPE MAE rRMSE MSE
Random Forest 50 1.36 72.45 1.68 8086.79
SVR-Random Forest 150 1.29 69.01 1.64 7710.16
Table 4.5: Prediction performance of 2-days ahead of time (for CNX Nifty)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 3000 1.92 101.79 2.42 16399.21
SVR-ANN 10000 1.66 87.82 2.13 12299.75
γ MAPE MAE rRMSE MSE
SVR 2.50 1.40 74.15 1.78 8748.11
SVR-SVR 5.00 1.61 85.14 2.09 12104.73
ntrees MAPE MAE rRMSE MSE
Random Forest 50 1.80 95.69 2.24 14206.36
SVR-Random Forest 150 1.55 82.34 1.96 10832.60
Table 4.6: Prediction performance of 3-days ahead of time (for CNX Nifty)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 6000 2.16 113.79 2.75 20668.78
SVR-ANN 4000 1.86 98.25 2.38 15736.44
γ MAPE MAE rRMSE MSE
SVR 0.00 1.76 93.13 2.22 13556.21
SVR-SVR 5.00 1.86 98.58 2.39 15833.27
ntrees MAPE MAE rRMSE MSE
Random Forest 150 2.12 112.78 2.67 20043.69
SVR-Random Forest 100 1.93 102.07 2.42 16355.13
Table 4.7: Prediction performance of 4-days ahead of time (for CNX Nifty)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 4000 2.49 131.92 3.10 26636.56
SVR-ANN 2000 2.12 112.14 2.72 20455.22
γ MAPE MAE rRMSE MSE
SVR 0.00 2.08 109.66 2.59 18445.44
SVR-SVR 5.00 2.06 108.64 2.62 19013.75
ntrees MAPE MAE rRMSE MSE
Random Forest 50 2.40 127.34 2.97 24734.84
SVR-Random Forest 100 2.12 112.55 2.69 20067.88
Table 4.8: Prediction performance of 5-days ahead of time (for CNX Nifty)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 2000 2.85 151.81 3.57 36016.34
SVR-ANN 2000 2.32 121.91 2.96 23562.53
γ MAPE MAE rRMSE MSE
SVR 4.00 2.34 123.77 2.92 23455.13
SVR-SVR 5.00 2.26 119.59 2.86 22583.42
ntrees MAPE MAE rRMSE MSE
Random Forest 100 2.62 139.01 3.30 30370.98
SVR-Random Forest 50 2.39 126.86 3.00 24975.76
Table 4.9: Prediction performance of 6-days ahead of time (for CNX Nifty)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 7000 2.80 149.35 3.47 34308.21
SVR-ANN 5000 2.48 130.52 3.16 26803.00
γ MAPE MAE rRMSE MSE
SVR 5.00 2.57 135.44 3.22 28502.25
SVR-SVR 5.00 2.46 130.04 3.08 26297.36
ntrees MAPE MAE rRMSE MSE
Random Forest 50 2.78 147.98 3.55 35171.38
SVR-Random Forest 50 2.61 138.29 3.23 28782.80
Table 4.10: Prediction performance of 7-days ahead of time (for CNX Nifty)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 9000 3.02 160.20 3.83 41725.69
SVR-ANN 9000 2.65 139.58 3.36 30552.47
γ MAPE MAE rRMSE MSE
SVR 10.00 2.74 144.93 3.48 33231.86
SVR-SVR 5.00 2.61 137.81 3.33 30462.37
ntrees MAPE MAE rRMSE MSE
Random Forest 50 3.08 164.00 3.96 44032.01
SVR-Random Forest 100 2.84 150.38 3.55 34948.60
Table 4.11: Prediction performance of 8-days ahead of time (for CNX Nifty)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 3000 3.01 160.14 3.80 40747.70
SVR-ANN 8000 2.82 149.03 3.60 35377.97
γ MAPE MAE rRMSE MSE
SVR 0.00 2.90 153.08 3.69 37239.22
SVR-SVR 4.00 2.77 145.99 3.55 34515.93
ntrees MAPE MAE rRMSE MSE
Random Forest 50 3.33 177.80 4.25 51119.66
SVR-Random Forest 50 2.90 153.03 3.68 37168.41
Table 4.12: Prediction performance of 9-days ahead of time (for CNX Nifty)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 1000 3.36 178.61 4.22 50724.35
SVR-ANN 4000 3.00 157.63 3.82 39622.51
γ MAPE MAE rRMSE MSE
SVR 0.00 3.08 162.50 3.92 41815.60
SVR-SVR 4.00 2.94 154.96 3.77 38621.40
ntrees MAPE MAE rRMSE MSE
Random Forest 150 3.58 190.76 4.56 58407.62
SVR-Random Forest 50 3.04 160.58 3.87 41177.90
Table 4.13: Prediction performance of 10-days ahead of time (for CNX Nifty)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 6000 3.54 188.73 4.61 60592.10
SVR-ANN 7000 3.23 170.40 4.15 46788.00
γ MAPE MAE rRMSE MSE
SVR 0.00 3.24 170.61 4.15 46764.19
SVR-SVR 4.00 3.11 163.74 3.99 43197.66
ntrees MAPE MAE rRMSE MSE
Random Forest 150 3.73 198.62 4.81 64653.82
SVR-Random Forest 150 3.26 172.23 4.15 47132.72
Table 4.14: Prediction performance of 15-days ahead of time (for CNX Nifty)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 8000 4.05 215.26 5.06 71431.74
SVR-ANN 7000 3.75 195.95 4.87 62369.57
γ MAPE MAE rRMSE MSE
SVR 0.00 4.04 212.36 5.09 69934.62
SVR-SVR 4.00 3.83 201.51 4.87 63747.90
ntrees MAPE MAE rRMSE MSE
Random Forest 50 4.12 217.61 5.49 82312.31
SVR-Random Forest 50 3.82 201.15 4.86 63754.16
Table 4.15: Prediction performance of 30-days ahead of time (for CNX Nifty)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 7000 5.02 267.49 6.21 109479.02
SVR-ANN 7000 4.56 237.59 5.79 86912.03
γ MAPE MAE rRMSE MSE
SVR 0.00 5.32 278.37 6.82 124246.62
SVR-SVR 4.00 4.94 258.41 6.26 103710.57
ntrees MAPE MAE rRMSE MSE
Random Forest 150 5.26 276.87 6.96 130770.09
SVR-Random Forest 50 4.88 255.23 6.19 101094.69
Table 4.16: Prediction performance of 1-day ahead of time (for S&P BSE Sensex)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 3000 1.78 313.92 2.31 166090.16
SVR-ANN 7000 1.55 272.71 1.96 118395.09
γ MAPE MAE rRMSE MSE
SVR 4.00 0.98 172.47 1.25 47558.47
SVR-SVR 0.50 1.48 260.05 1.89 108137.61
ntrees MAPE MAE rRMSE MSE
Random Forest 100 1.25 221.91 1.60 81098.60
SVR-Random Forest 50 1.23 216.02 1.55 73483.60
Table 4.17: Prediction performance of 2-days ahead of time (for S&P BSE Sensex)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 7000 1.92 338.77 2.40 179261.33
SVR-ANN 3000 1.69 296.41 2.20 146339.70
γ MAPE MAE rRMSE MSE
SVR 10.00 1.38 241.46 1.75 93134.99
SVR-SVR 0.50 1.59 278.92 2.07 131058.24
ntrees MAPE MAE rRMSE MSE
Random Forest 100 1.66 292.50 2.08 134559.91
SVR-Random Forest 150 1.62 285.74 2.01 125881.22
Table 4.18: Prediction performance of 3-days ahead of time (for S&P BSE Sensex)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 1000 2.15 378.59 2.75 233075.75
SVR-ANN 9000 2.02 354.01 2.64 212027.62
γ MAPE MAE rRMSE MSE
SVR 0.00 1.75 306.11 2.21 148141.01
SVR-SVR 0.50 1.85 324.22 2.37 171209.68
ntrees MAPE MAE rRMSE MSE
Random Forest 100 2.01 353.69 2.54 198459.92
SVR-Random Forest 150 1.89 332.20 2.35 169694.03
Table 4.19: Prediction performance of 4-days ahead of time (for S&P BSE Sensex)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 4000 2.28 399.99 2.87 247592.09
SVR-ANN 9000 2.27 392.53 2.99 258336.42
γ MAPE MAE rRMSE MSE
SVR 0.00 2.05 358.97 2.56 199730.19
SVR-SVR 0.50 2.04 357.32 2.60 205740.59
ntrees MAPE MAE rRMSE MSE
Random Forest 100 2.42 426.06 3.04 285116.07
SVR-Random Forest 100 2.11 370.79 2.64 215054.35
Table 4.20: Prediction performance of 5-days ahead of time (for S&P BSE Sensex)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 2000 2.70 476.36 3.42 363471.55
SVR-ANN 8000 2.30 403.21 2.92 259749.88
γ MAPE MAE rRMSE MSE
SVR 2.50 2.31 405.27 2.90 255505.25
SVR-SVR 0.50 2.23 392.30 2.83 244955.34
ntrees MAPE MAE rRMSE MSE
Random Forest 50 2.68 473.25 3.34 344949.70
SVR-Random Forest 100 2.32 409.54 2.89 256949.33
Table 4.21: Prediction performance of 6-days ahead of time (for S&P BSE Sensex)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 7000 2.70 478.29 3.37 357530.68
SVR-ANN 6000 2.43 429.90 3.14 312341.99
γ MAPE MAE rRMSE MSE
SVR 4.00 2.53 443.13 3.18 308063.70
SVR-SVR 0.50 2.42 425.68 3.04 282869.87
ntrees MAPE MAE rRMSE MSE
Random Forest 50 2.92 516.14 3.68 419168.80
SVR-Random Forest 150 2.50 439.29 3.10 293822.81
Table 4.22: Prediction performance of 7-days ahead of time (for S&P BSE Sensex)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 9000 2.82 497.40 3.63 412421.92
SVR-ANN 5000 2.53 445.19 3.23 320178.09
γ MAPE MAE rRMSE MSE
SVR 4.00 2.69 472.27 3.42 356444.96
SVR-SVR 1.50 2.55 447.80 3.26 324401.97
ntrees MAPE MAE rRMSE MSE
Random Forest 100 3.20 566.55 4.04 508254.64
SVR-Random Forest 150 2.69 473.19 3.33 341307.44
Table 4.23: Prediction performance of 8-days ahead of time (for S&P BSE Sensex)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 6000 3.12 546.67 3.93 465739.08
SVR-ANN 7000 2.64 459.58 3.54 367846.18
γ MAPE MAE rRMSE MSE
SVR 0.00 2.84 497.85 3.61 396295.99
SVR-SVR 1.50 2.71 475.37 3.48 368319.67
ntrees MAPE MAE rRMSE MSE
Random Forest 150 3.38 596.67 4.29 569631.81
SVR-Random Forest 100 2.88 507.31 3.58 392375.92
Table 4.24: Prediction performance of 9-days ahead of time (for S&P BSE Sensex)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 9000 3.17 554.86 4.05 496313.24
SVR-ANN 6000 2.87 499.37 3.74 411681.57
γ MAPE MAE rRMSE MSE
SVR 0.00 3.02 529.53 3.84 444071.59
SVR-SVR 0.50 2.87 502.33 3.68 408835.92
ntrees MAPE MAE rRMSE MSE
Random Forest 50 3.49 616.32 4.46 615743.91
SVR-Random Forest 150 3.10 545.80 3.86 454853.43
Table 4.25: Prediction performance of 10-days ahead of time (for S&P BSE Sensex)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 8000 3.45 603.90 4.40 585260.94
SVR-ANN 5000 2.72 474.62 3.60 387086.13
γ MAPE MAE rRMSE MSE
SVR 10.00 3.19 557.98 4.10 505260.72
SVR-SVR 0.50 3.00 525.45 3.87 449987.02
ntrees MAPE MAE rRMSE MSE
Random Forest 100 3.62 637.70 4.60 648907.41
SVR-Random Forest 150 3.19 561.07 4.06 497755.22
Table 4.26: Prediction performance of 15-days ahead of time (for S&P BSE Sensex)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 5000 3.90 681.64 4.84 700906.09
SVR-ANN 6000 3.58 624.50 4.54 612524.19
γ MAPE MAE rRMSE MSE
SVR 10.00 3.94 688.47 4.96 735083.38
SVR-SVR 0.50 3.69 644.66 4.68 651935.80
ntrees MAPE MAE rRMSE MSE
Random Forest 50 4.16 730.56 5.52 916603.04
SVR-Random Forest 150 3.80 664.37 4.81 692641.93
Table 4.27: Prediction performance of 30-days ahead of time (for S&P BSE Sensex)
Prediction Models Parameters Error Measures
ep MAPE MAE rRMSE MSE
ANN 8000 4.83 839.92 6.28 1152684.67
SVR-ANN 1000 4.32 745.01 5.58 891384.72
γ MAPE MAE rRMSE MSE
SVR 0.00 5.26 913.05 6.75 1341763.55
SVR-SVR 0.50 4.74 822.34 5.99 1047397.92
ntrees MAPE MAE rRMSE MSE
Random Forest 150 5.29 926.53 6.73 1357838.40
SVR-Random Forest 50 4.67 809.42 5.94 1022031.89
Table 4.28: Average prediction performance for CNX Nifty
Prediction Model MAPE MAE rRMSE MSE
ANN 3.01 160.10 3.79 43872.97
SVR-ANN 2.66 139.99 3.41 34207.13
SVR 2.71 142.54 3.43 37530.53
SVR-SVR 2.66 140.17 3.39 34975.22
Random Forest (RF) 3.02 160.08 3.87 46992.46
SVR-RF 2.72 143.64 3.44 36166.73
Table 4.29: Performance improvement in (%) (Single stage models vs. Two stage
fusion models) for CNX Nifty
Models under Comparison MAPE MAE rRMSE MSE
ANN vs. SVR-ANN 11.57 12.56 10.22 22.03
SVR vs. SVR-SVR 1.66 1.66 1.12 6.81
RF vs. SVR-RF 9.81 10.27 11.2 23.04
Table 4.30: Average prediction performance for S&P BSE Sensex
Prediction Model MAPE MAE rRMSE MSE
ANN 2.90 509.19 3.69 446695.60
SVR-ANN 2.58 449.75 3.34 358157.63
SVR 2.66 465.55 3.38 402606.42
SVR-SVR 2.60 454.69 3.31 366287.80
Random Forest (RF) 3.01 529.80 3.83 508104.95
SVR-RF 2.67 467.92 3.35 378614.56
It can be observed that the performance of the SVR-ANN and SVR-RF models
improves significantly over that of the ANN and RF models, while the SVR-SVR
model exhibits a moderate improvement over the SVR model. These results
demonstrate the effectiveness of our proposal. A comparison of the prediction
performance of all the models on both stock market indices reveals that the SVR-ANN
model performs the best overall.
Table 4.31: Performance improvement in (%) (Single stage models vs. Two stage
fusion models) for S&P BSE Sensex
Models under Comparison MAPE MAE rRMSE MSE
ANN vs. SVR-ANN 11.2 11.67 9.42 19.82
SVR vs. SVR-SVR 2.41 2.33 1.88 9.02
RF vs. SVR-RF 11.31 11.68 12.7 25.48
Figure 4.4 shows the actual value of the CNX Nifty together with the values predicted
by the ANN and SVR-ANN models for the task of predicting 5 days ahead of time.
This visual representation also validates the effectiveness of the proposed two stage
fusion approach, as do the visual representations for the other prediction tasks (not
shown here).
The improved performance of the two stage fusion approach over the single stage approach can be explained as follows. In the two stage fusion approach, the prediction models in the second stage identify a transformation from the technical parameters describing the (t + n)th day to the (t + n)th day's closing price, whereas in the single stage approach, the prediction models must identify a transformation from the technical parameters describing the tth day to the (t + n)th day's closing price.
The additional stage in the two stage fusion approach takes the responsibility of preparing data for the prediction models in the second stage: it transforms the opening, closing, low and high prices of the tth day into technical parameters representing the (t + n)th day. This may reduce the prediction error, as the prediction models in the second stage now predict from (predicted) technical parameters of the (t + n)th day rather than from actual technical parameters of the tth day.
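Under this interpretation, the two stage pipeline can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data; the indicator construction, horizon and hyper-parameters are placeholders rather than the chapter's actual configuration.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_days, horizon = 500, 5  # horizon = n, the number of days ahead

# Synthetic stand-ins for the real price series (illustrative only).
close = np.cumsum(rng.normal(0, 1, n_days)) + 100
ohlc = np.column_stack([close + rng.normal(0, 0.5, n_days) for _ in range(4)])
# Placeholder "technical indicators" derived from the close price.
indicators = np.column_stack([close * f for f in (0.98, 1.02, 1.0)])

# Stage 1: one SVR per indicator, mapping day-t OHLC features to that
# indicator's value on day t+n.
X1 = ohlc[:-horizon]
pred_ind = np.empty((len(X1), indicators.shape[1]))
for j in range(indicators.shape[1]):
    m = SVR(kernel="rbf").fit(X1, indicators[horizon:, j])
    pred_ind[:, j] = m.predict(X1)

# Stage 2: map the predicted day-(t+n) indicators to the day-(t+n) close
# (RandomForestRegressor here; ANN or SVR would give SVR-ANN / SVR-SVR).
stage2 = RandomForestRegressor(n_estimators=50, random_state=0)
stage2.fit(pred_ind, close[horizon:])

print(stage2.predict(pred_ind[-1:]))  # n-day-ahead closing-price estimate
```

The key point of the sketch is that the second stage never sees day-t indicators directly: its inputs are the stage-1 estimates of the day-(t + n) indicators.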
Figure 4.4: Prediction performance comparison of ANN and SVR-ANN for predicting
5 days ahead for CNX Nifty
4.5 Conclusions
This study focuses on the task of predicting future values of stock market indices. Experiments are carried out on ten years of historical data of two indices, namely CNX Nifty and S&P BSE Sensex, from Indian stock markets. The predictions are made for 1 to 10, 15 and 30 days in advance.
A review of the existing literature on the topic revealed that existing methods for predicting a stock market index's value/price use a single stage prediction approach. In these methods, the values of the technical/statistical parameters of the tth day are used as inputs to predict the (t + n)th day's closing price/value (t is the current day). In such scenarios, as the value of n increases, the predictions are based on increasingly older values of the statistical parameters and therefore become less accurate. This makes it clear that a two stage prediction scheme, which can bridge this gap and minimize the error stage-wise, may be helpful.
Some studies on this topic have tried to hybridize various machine learning techniques, but none has attempted to bridge the identified gap; in these studies, one machine learning technique is generally used to tune the design parameters of another.
A two stage fusion approach involving support vector regression (SVR) in the first stage and ANN, random forest and SVR in the second stage is proposed in this chapter to address the identified problem. Experiments are carried out with both single stage and two stage fusion prediction models. The results show that the two stage hybrid models perform better than the single stage prediction models. The performance improvement is significant when ANN and RF are hybridized with SVR, and moderate when SVR is hybridized with itself. The best overall prediction performance is achieved by the SVR-ANN model.
The two stage prediction scheme is a significant research contribution of this chapter, as it provides a new way of feeding adequate information to prediction models. To accomplish this, machine learning methods are cascaded in two stages: the first stage uses SVR to predict future values of the statistical parameters, which are then fed as inputs to the prediction models in the second stage. The experimental results are promising and demonstrate the usefulness of the proposed approach. Moreover, the approach is adaptable to other prediction tasks such as weather forecasting, energy consumption forecasting and GDP forecasting; this generalizability makes the proposal a significant contribution to the research.