2013 BRICS Congress on Computational Intelligence & 11th Brazilian Congress on Computational Intelligence
Combination of Biased Artificial Neural Network Forecasters
Thaíze F. Oliveira∗, Ricardo T. A. de Oliveira†, Paulo Renato A. Firmino‡, Paulo S. G. de Mattos Neto§, and Tiago A. E. Ferreira¶
Department of Statistics and Informatics, Federal Rural University of Pernambuco, 52171-900, Recife, Pernambuco, Brazil
∗ [email protected] † [email protected] ‡ [email protected] § [email protected] ¶ [email protected]
Abstract—Artificial neural networks (ANN) have been paramount for modeling and forecasting time series phenomena. In this context, it has been usual to suppose that each ANN model generates a white noise as prediction error. However, mostly because of disturbances not captured by each model, it is still possible that this supposition is violated. On the other hand, adopting a single ANN model may lead to statistical bias and underestimation of uncertainty. The present paper introduces a two-step maximum likelihood method for correcting and combining ANN models. Applications involving single ANN models for the Dow Jones Industrial Average Index and S&P500 series illustrate the usefulness of the proposed framework.
Keywords—Time Series Forecasting Models; Unbiased Forecasts; Maximum Likelihood Estimation; Linear Combination of Forecasts.
I. INTRODUCTION
Approaches based on artificial neural networks (ANN) for time series forecasting have produced convincing results in recent decades [1]–[4]. Accordingly, it has been usual to spend considerable computational resources in order to select the best ANN model [1], [3], [4]. Therefore, much of the ANN literature implicitly assumes that there is a true ANN model for the time series description, leaving only its parameters to be inferred. In other terms, this modeling perspective deals only with parameter uncertainty and neglects model uncertainty.
However, it might be very difficult to find a unique and
true model for a given time series. Some authors [5] have
even pointed out that adopting just one model may lead
to statistical bias and underestimation of uncertainty. With
those arguments in mind, model uncertainty seems to be
present in any time series analysis.
Currently, model uncertainty research has been at the vanguard of time series analysis. Under this reasoning, some authors [6]–[8] have rejected the hypothesis that a unique and true model is achievable. Instead, these researchers have taken up the challenge of combining diverse models in order to present aggregated forecasts. Accordingly, reviews from the model uncertainty literature [9]–[14] have emphasized the linear combination of forecasters (LCF). In LCF the combined estimator is given by a weighted average of single models, where the weight of each model is usually a function of the variance of its residuals and its correlation with the other models.
Generally, LCF are based on the assumption that the residuals of each single model are caused by random shocks, characterizing white noises: unpredictable, independent, and unbiased terms. However, mostly because of the heterogeneity of the phenomenon under study, or even due to disturbances not captured by the models, these suppositions may still be violated when applying ANN models [15], [16], for instance.
The present paper illustrates cases involving ANN models for the Dow Jones Industrial Average Index (DJ) and S&P500 series, where the white noise supposition is violated, and suggests a model uncertainty approach to overcome the problem. Specifically, a two-step LCF model is presented.
II. METHODOLOGY
The proposed framework comprises three parts (see Fig. 1): the Classical Approach, the Correction Procedure, and the Aggregation Procedure. The first step, i.e. the classical approach, is not the object of study of this paper. In this step, each single forecasting model is elaborated. In fact, the single models' forecasts must be seen as input for the proposed approach. Thus, the single models can be considered black boxes whose parameters and structure are neglected.
In the Correction Procedure step, the residuals of each ANN model are modeled via a recursive ARIMA-based algorithm. The purpose of this step is to capture any trend of the phenomenon of interest not enveloped by the original predictive models. Therefore, if necessary, the prediction of the error series is used to correct the respective estimates of the ANN forecaster for the future values of the series (Forecast_k^c in Fig. 1).
Finally, in the Aggregation Procedure step, the forecasts of the corrected models are combined by the maximum likelihood (ML) estimation method. In this procedure, each model receives a weight proportional to its statistical efficiency and to its independence with regard to the remaining predictors. The weights are used to combine the models according to their performance, where the larger the weight (in modulus), the better the prediction. The input forecasters are required to be unbiased, reinforcing the importance of the correction phase, and the output of this procedure is the combination of the corrected single models' forecasts (Forecast^a in Fig. 1).
978-1-4799-3194-1/13 $31.00 © 2013 IEEE
DOI 10.1109/BRICS-CCI-CBIC.2013.92
These steps are presented as follows. Considering k ANN models, let u_t, U_{t,i}, and E_{t,i} (i = 1, 2, ..., k) be, in this order, a time series to be predicted at time t, the ith estimator (ANN forecaster) of u_t, and its respective random error (bias). In order to perform the correction and combination steps, two error structures are taken into account:

E_{t,i} = U_{t,i} − u_t,  (1)

namely the additive error model, and

E_{t,i} = U_{t,i} / u_t,  (2)

the so-called multiplicative error model.
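As a minimal sketch (the helper names are ours, not from the paper), the two error structures can be computed from a forecast series and the observed series as follows:

```python
def additive_error(forecast, actual):
    """E_{t,i} = U_{t,i} - u_t (Equation 1)."""
    return [f - a for f, a in zip(forecast, actual)]

def multiplicative_error(forecast, actual):
    """E_{t,i} = U_{t,i} / u_t (Equation 2); assumes a strictly positive series."""
    return [f / a for f, a in zip(forecast, actual)]
```

For a positive series, taking logarithms turns the multiplicative structure into an additive one, which is exploited in Subsection II-A3.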
A. Correcting ANN models
1) Recursive ARIMA algorithm: The present paper
deals with biased predictive models and proposes the
adjustment of (log-)linear relationships in the residuals
of each model, under the supposition of constant vari-
ability, by means of the ARIMA formalism. The bias of
each model is modeled according to a recursive maxi-
mum likelihood (ML)-based ARIMA algorithm in order
to achieve a white noise behavior in the residuals (or
errors) of prediction. At each iteration of the algorithm the
best ML-ARIMA adjustment of the remaining residuals
is determined according to a given information criterion
(e.g. Akaike) until an ARIMA(p = 0, d = 0, q = 0)
is achieved (where p, d, and q respectively represent the
autoregressive, integrated, and moving-average orders of
the model).
Let ARIMA_a(p, d, q) be the representation of the recursive ARIMA model related to the residuals of a given ANN model, represented by e, where the model index is suppressed for clarity. Under this notation, a ARIMA sub-models are involved, in such a way that the remaining error is a random noise, i.e. the (a + 1)th sub-model is an ARIMA(0, 0, 0). Thus, consider a recursion involving a iterations until a random noise is achieved. The order of the jth fitted model is (p_j, d_j, q_j). In this way, if a = 0 then the initial forecaster error is already considered a random noise. Mathematically,

E_t = μ_t + ε_t = Σ_{j=1}^{a} m_{t,j} + ε_t,  (3)

where ε_t ∼ Normal(0, σ) and m_{t,j} is the mean value of the ARIMA model adjusted to the jth remaining error. For instance, considering the residuals e of a given predictor, a = 2, and an ARIMA_2(p = (1, 0), d = (1, 1), q = (1, 0)) adjustment, one has E_t = Σ_{j=1}^{2} m_{t,j} + ε_t, where m_{t,1} = θ_0^{(e)} + e_{t−1} + φ^{(e)}(e_{t−1} − e_{t−2}) + θ_1^{(e)} τ_{t−1} is the ARIMA(1, 1, 1) estimate of E_t in the light of e. In other words, m_{t,1} is the best ARIMA model adjusted to the data e according to a given information criterion, its parameters (θ_0^{(e)}, θ_1^{(e)}, φ^{(e)}) are inferred via ML estimation, and τ_t is the remaining residual of this adjustment. In turn, τ_t = e_t − m_{t,1} and m_{t,2} = θ_0^{(τ)} + τ_{t−1} + ε_t is the mean value of the ARIMA(0, 1, 0) adjusted to the sample error τ = (τ_1, τ_2, ..., τ_t, ..., τ_n), selected based on the adopted information criterion. Finally, the best model to adjust ε_t is an ARIMA(0, 0, 0), characterizing ε_t as a random noise.
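The recursion above can be sketched in Python. The snippet is a simplified stand-in, not the authors' implementation: it peels off AR(1) structure by least squares and uses the lag-1 autocorrelation as a crude white-noise check, whereas the paper fits full ML-ARIMA sub-models selected by an information criterion (in practice one would use a library such as statsmodels for that step).

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation, used here as a crude white-noise check."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = float(np.dot(x, x))
    return float(np.dot(x[1:], x[:-1]) / denom) if denom > 0 else 0.0

def fit_ar1(x):
    """Least-squares AR(1) with intercept: x_t ~ c + phi * x_{t-1}."""
    X = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    (c, phi), *_ = np.linalg.lstsq(X, x[1:], rcond=None)
    # Fitted mean values; the first point has no predecessor, so use the mean.
    return np.concatenate([[x.mean()], c + phi * x[:-1]])

def recursive_correct(residuals, tol=0.1, max_iter=5):
    """Peel off linear structure until the residuals look like white noise.

    Returns the list of fitted mean components (the m_{t,j} of Equation 3)
    and the remaining noise (the epsilon_t of Equation 3).
    """
    remaining = np.asarray(residuals, dtype=float)
    components = []
    for _ in range(max_iter):
        if abs(lag1_autocorr(remaining)) < tol:
            break  # proxy for having reached an ARIMA(0, 0, 0)
        fitted = fit_ar1(remaining)
        components.append(fitted)
        remaining = remaining - fitted
    return components, remaining
```

Each element of `components` plays the role of one m_{t,j} term in Equation 3, and `remaining` corresponds to the final random noise ε_t.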
2) Additive and normally distributed errors: Regarding the additive error structure in Equation 1, the following model is obtained:

U_{t,i} = E_{t,i} + u_t.  (4)

It is then supposed that U_{t,i} ∼ Normal(μ_{t,i} + u_t, σ_i), where μ_{t,i} is the mean value of E_{t,i}, adjusted via the proposed recursive ML-ARIMA model (Subsection II-A1), and σ_i is the standard deviation of the random noise resulting from the recursive ML-ARIMA model. In this way, E_{t,i} ∼ Normal(μ_{t,i}, σ_i) and one can then consider the approximation

U_{t,i} ∼ Normal(μ_{t,i} + u_t, σ_i),  (5)

for t > n, where n is the length of the observed time series.
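Under the additive structure, the correction step therefore amounts to subtracting the fitted bias μ_{t,i} from each forecast; a one-line sketch (names are illustrative, not from the paper):

```python
def correct_additive(forecasts, mu_hat):
    """Unbiased estimates y_{t,i} = U_{t,i} - mu_{t,i} (cf. Equation 5)."""
    return [u - m for u, m in zip(forecasts, mu_hat)]
```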
3) Multiplicative and log-normally distributed errors: Now, focusing on the cases where E_{t,i} is a multiplicative error (Equation 2) and is supposed to follow a lognormal distribution, one has E'_{t,i} ∼ Normal(μ'_{t,i}, σ'_i) and

U'_{t,i} ∼ Normal(μ'_{t,i} + u'_t, σ'_i),  (6)

where E'_{t,i} = log(E_{t,i}), U'_{t,i} = log(U_{t,i}), μ'_{t,i} is the mean value of log(E_{t,i}), and u'_t = log(u_t).
As stated by Cryer [17], in the time series context it has been usual to take the natural logarithm when increased dispersion seems to be associated with higher levels of the series, or even when the level of the series is changing roughly exponentially. Such behavior can suggest multiplicative error structures instead of additive ones. Thus, in the case of multiplicative error-based models, it is sufficient to work on the logarithm scale of the residual time series under study. In turn, it is also worthwhile to mention that this (log-)normality reasoning is in accordance with the basic framework of time series analysis: the assumption of independent and identically distributed (iid) normal (for additive) or lognormal (for multiplicative) errors underlies the usual formalisms, such as ARIMA models.
B. Combining the unbiased ANN models
From the unbiased estimates of u_t based on the ith additive model (Equation 5), y_{t,i} = u_{t,i} − μ_{t,i}, i = 1, 2, ..., k, the joint distribution of the unbiased forecasts of u_t might be a multivariate normal distribution:

g(y_t) = (1 / ((2π)^{k/2} √det(Σ))) exp( −(1/2) (y_t − u_t)^T Σ^{−1} (y_t − u_t) )  (7)
Figure 1. Architecture of the proposed approach. The Classical Approach corresponds to usual forecasting methods (from the time lags the method generates the forecast of the series). The Correction Procedure represents the phase of forecaster error modeling (this phase provides the basis for unbiasing the forecasters). The Aggregation Procedure is the phase where the forecasts of the several unbiased models are aggregated.
where y_t = (u_{t,1} − μ_{t,1}, u_{t,2} − μ_{t,2}, ..., u_{t,k} − μ_{t,k})^T, u_t = (u_t, u_t, ..., u_t)^T, and Σ = [σ_{ij}], i, j = 1, 2, ..., k, is the covariance matrix for the ANN residuals. It is straightforward to show that the ML unbiased estimate for u_t in the light of y_t is given by the linear combination

u_t^{ML} = Σ_{i=1}^{k} w_i · y_{t,i},  (8)

where

w_i = (Σ_{j=1}^{k} a_{ij}) / (Σ_{i=1}^{k} Σ_{j=1}^{k} a_{ij})

and a_{ij} is the jth element of the ith row of the inverse matrix Σ^{−1}.
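Equation 8 is straightforward to compute with numpy. The sketch below assumes the covariance matrix Σ has already been estimated from the corrected residuals; the function names are ours, not from the paper:

```python
import numpy as np

def ml_weights(cov):
    """w_i = (sum_j a_ij) / (sum_i sum_j a_ij), where A is the inverse of Sigma."""
    a = np.linalg.inv(np.asarray(cov, dtype=float))
    return a.sum(axis=1) / a.sum()

def ml_combine(cov, unbiased_forecasts):
    """ML combined estimate u_t^ML = sum_i w_i * y_{t,i} (Equation 8)."""
    return float(np.dot(ml_weights(cov), unbiased_forecasts))
```

For a diagonal Σ the weights reduce to inverse-variance weighting, and for k = 2 they reproduce Equation 9 below.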
It is easy to see that the ML combined estimate for u_t in the light of two ANN estimates y_{t,1} and y_{t,2} is given by the linear combination

u_t^{ML} = w_1 · y_{t,1} + w_2 · y_{t,2},  (9)

where

w_1 = (σ_2^2 − σ_{12}) / (σ_1^2 + σ_2^2 − 2σ_{12})  and  w_2 = (σ_1^2 − σ_{12}) / (σ_1^2 + σ_2^2 − 2σ_{12}).
Therefore, the greater the variance of Y_{t,k}, the lesser the weight of the kth ANN model (in modulus). In the same fashion, the greater the dependency between models Y_{t,s} and Y_{t,r}, the lower their weights in the combined forecaster when k > 2.
In turn, regarding multiplicative models, one must remember that Equation 8 operates on the logarithm scale of the time series. Thus, in this case one obtains the combined unbiased estimator for log(u_t) rather than for u_t. Consequently, exponentiating u'^{ML}_t leads to an estimator for u_t, which is the median of the combined model [18].
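This remark about the median can be verified numerically: exponentiating the mean of a normal on the log scale recovers the median, not the mean, of the implied lognormal distribution. A small illustrative check (the values are arbitrary, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 0.3, 0.5  # a hypothetical combined estimate on the log scale
samples = np.exp(rng.normal(mu, sigma, 200_000))

# exp(mu) tracks the sample median, while the sample mean is larger,
# near exp(mu + sigma**2 / 2)
median_gap = abs(np.median(samples) - np.exp(mu))
mean_gap = abs(samples.mean() - np.exp(mu + sigma**2 / 2))
```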
III. CASE STUDIES
The performance of the proposed approach is illustrated
in this section by using two forecasting ANN models.
The former single model is an ANN of type Multilayer
Perceptron (MLP). The MLP architecture used here is
composed of 3 nodes in the input layer, 5 nodes in the hidden layer, and 1 node in the output layer (one-step-ahead prediction), and it is trained via the backpropagation algorithm [19]. The resulting model is named the ANN model hereafter. The latter single model is obtained via the TAEF methodology [1] and is named the TAEF model. The TAEF models result from the search for the optimal ANN to solve the time series forecasting problem. In this case, the ANN parameters (such as the architecture and the best training algorithm) are determined via a genetic algorithm. This combination constitutes a hybrid intelligent system for time series forecasting.
Besides the ANN and TAEF single models, the present paper also studies the performance of the proposed LCF in relation to the well-known simple average (SA) strategy. The SA has been one of the main LCF techniques in time series forecasting, possibly due to its simplicity of implementation and its accuracy.
A. The real world phenomena
Two phenomena are considered. The former is the Dow Jones Industrial Average Index series (DJ), represented by the time series involving 350 daily observations from 9th April 2002 to 26th August 2003. The latter is the monthly S&P500 series (SP), consisting of 87 observations from March 1996 to August 2003. In order to avoid computational problems when elaborating the ANN and TAEF models, the values of both series, DJ and SP, have been normalized. To evaluate the performance of the proposed approach, the last 12 points of each series were left for prediction via the single model strategies (ANN and TAEF), the ML combined estimator, and the SA strategy.
IV. RESULTS
To decide between additive and multiplicative error models, a goodness-of-fit test has been performed on the remaining residuals of the recursive ARIMA models adjusted to both the ANN and TAEF predictors' errors when approaching the DJ and SP series. On this basis, additive error models were considered for DJ and multiplicative ones for SP.
The first rows of Table I summarize the respective adherence of the normal and lognormal distributions to the remaining residuals of the recursive ARIMA model adjusted to the ANN and TAEF predictors, correspondingly. In this perspective, it is also important to emphasize the order of the adjusted recursive ARIMA models. The need for at least one iteration of the algorithm indicates the presence of components neglected by the single ANN predictors, making each one biased. Actually, in the case of the TAEF adjustment to the SP series, two iterations were required to achieve a random noise behavior in the residuals.
In both cases, the resulting time-dependent error modeling has presented the TAEF models as the best ones in terms of statistical efficiency. One can see this by examining the standard deviation of the residuals of the unbiased TAEF and ANN estimates for both the DJ and SP series.
Regarding the resulting LCF weights, one can see the discrepancies between ANN and TAEF. The smaller statistical efficiency of the former (see the standard deviations) is reflected in its weight. There is still evidence of a contribution from ANN, since otherwise its weight would equal zero. In terms of mean squared error (MSE), the ML aggregated estimates outperform the single models and the SA model when predicting the DJ and SP series. It must be highlighted that under the SA strategy, ANN and TAEF receive the same weight.
The study of the ratios between the mean squared errors (MSE) of the alternative estimates (ANN, TAEF, SA, and the proposed ML combination) has revealed that the ML combined estimates outperformed the single ones and SA. For instance, regarding the DJ series, the MSE of the ML combined estimates is about 17.5% of that related to TAEF, while for the SP series this ratio reaches 2.0%. Figures 2 and 3 show the behaviour of the ML combined estimates, SA, and the single ANN predictions in contrast with the respective real series. One can see the contribution of the first ML step in unbiasing the TAEF estimates. Such a model, though biased, has approached the trend of the real series.
Figure 2. ANN, TAEF, proposed combined estimate and SA for the DJ series.
Therefore, the ML combination weights provide a quantitative argument for ranking the single models, placing TAEF at a superior level to ANN for the DJ and SP series. In this context, it is worthwhile to recall the widely known SA strategy, where the weight of the single models is constant. It can lead to less attractive results, as can be seen in Table I.
V. CONCLUSION
This paper has presented a framework for improving and combining ANN time series models. Firstly, for each ANN model, the eventual violation of the supposition of independent and identically distributed random noises is suppressed by means of a recursive ARIMA-based algorithm. Secondly, an ML combined estimator is considered.
Table I. SUMMARY OF THE UNBIASED SINGLE AND COMBINED MODELS IN FACE OF DJ AND SP SERIES.

Time Series                                  | DJ                   | SP
Error structure                              | Additive and         | Multiplicative and
                                             | normally distributed | lognormally distributed
p-values (Kolmogorov-Smirnov test)   ANN     | 0.6963               | 0.3150
                                     TAEF    | 0.8404               | 0.4659
(p, d, q) of recursive ARIMA         ANN     | (0, 1, 3)            | (0, 1, 0)
                                     TAEF    | (0, 0, 10)           | ((6, 0), (1, 1), (0, 1))
Unbiased error standard deviation    ANN     | 0.0404               | 0.0115
                                     TAEF    | 0.0023               | 0.0024
ML weights                           ANN     | 0.0049               | 0.0558
                                     TAEF    | 0.9951               | 0.9442
MSE                                  ANN     | 1.1 · 10^-3          | 9.8 · 10^-4
                                     TAEF    | 1.2 · 10^-5          | 1.7 · 10^-3
                                     ML      | 2.1 · 10^-6          | 3.4 · 10^-5
                                     SA      | 2.8 · 10^-4          | 1.3 · 10^-3
Figure 3. ANN, TAEF, proposed combined estimate and SA for the SP series.
The resulting algorithm can be easily implemented in R, for example. Case studies involving a relatively small number of predictors and moderately sized series indicate the usefulness of the resulting two-step ML combined estimator in relation to the more established single ANN models and the SA combined model.
ACKNOWLEDGMENT
The authors would like to thank FACEPE, CNPq and UFRPE for their support.
REFERENCES
[1] T. A. E. Ferreira, G. C. Vasconcelos, and P. J. L. Adeodato, “A new intelligent system methodology for time series forecasting with artificial neural networks,” Neural Processing Letters, vol. 28, pp. 113–129, 2008.
[2] G. Zhang, B. Eddy Patuwo, and M. Y. Hu, “Forecasting with artificial neural networks: The state of the art,” International Journal of Forecasting, vol. 14, pp. 35–62, 1998.
[3] C. Gerald and S. Dimitri, “Knowledge-based modularization and global optimization of artificial neural network models in hydrological forecasting,” Neural Networks, vol. 20, no. 4, pp. 528–536, 2007.
[4] D.-X. Niu, H.-F. Shi, and D. D. Wu, “Short-term load forecasting using bayesian neural networks learned by hybrid monte carlo algorithm,” Applied Soft Computing, vol. 12, no. 6, pp. 1822–1827, 2012.
[5] S. P. Neuman, “Maximum likelihood bayesian averaging of uncertain model predictions,” Stochastic Environmental Research and Risk Assessment, vol. 17, pp. 291–305, 2003.
[6] L. Yu, S. Wang, and K. K. Lai, “A novel nonlinear ensemble forecasting model incorporating glar and ann for foreign exchange rates,” Computers & Operations Research, vol. 32, pp. 2523–2541, 2005.
[7] R. Dell’Aquila and E. Ronchetti, “Stock and bond return predictability: the discrimination power of model selection criteria,” Computational Statistics & Data Analysis, vol. 50, pp. 1478–1495, 2006.
[8] A. Amendola and G. Storti, “A gmm procedure for combining volatility forecasts,” Computational Statistics & Data Analysis, vol. 52, pp. 3047–3060, 2008.
[9] R. T. Clemen, “Combining forecasts: A review and annotated bibliography,” International Journal of Forecasting, vol. 5, pp. 559–583, 1989.
[10] D. I. Jeong and Y.-O. Kim, “Combining single-value streamflow forecasts: a review and guidelines for selecting techniques,” Journal of Hydrology, vol. 377, pp. 284–299, 2009.
[11] K. F. Wallis, “Combining forecasts: forty years later,” Applied Financial Economics, vol. 21, pp. 33–41, 2011.
[12] L. Rodrigues, F. Doblas-Reyes, and C. Coelho, “Multi-model calibration and combination of tropical seasonal sea surface temperature forecasts,” Climate Dynamics, pp. 1–20, 2013. [Online]. Available: http://dx.doi.org/10.1007/s00382-013-1779-8
[13] V. Genre, G. Kenny, A. Meyler, and A. Timmermann, “Combining expert forecasts: Can anything beat the simple average?” International Journal of Forecasting, vol. 29, no. 1, pp. 108–121, 2013. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S016920701200088X
[14] A. Timmermann, Forecast Combinations, ser. Handbook of Economic Forecasting. Elsevier, December 2006, vol. 1, ch. 4, pp. 135–196.
[15] P. S. de Mattos Neto, A. R. Lima Junior, and T. A. Ferreira, “Time series forecasting using a perturbative intelligent system,” in Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, ser. GECCO ’10. New York, NY, USA: ACM, 2010, pp. 1477–1478. [Online]. Available: http://doi.acm.org/10.1145/1830483.1830755
[16] R. Sitte and J. Sitte, “Neural networks approach to the random walk dilemma of financial time series,” Applied Intelligence, vol. 16, no. 3, pp. 163–171, May 2002.
[17] J. D. Cryer and K.-S. Chan, Time Series Analysis with Applications in R, 2nd ed. New York: Springer, 2008.
[18] N. T. Longford, “Inference with the lognormal distribution,” Journal of Statistical Planning and Inference, vol. 139, pp. 2329–2340, 2009.
[19] S. Haykin, Neural Networks: A Comprehensive Foundation. New Jersey: Prentice Hall, 1999.