2013 BRICS Congress on Computational Intelligence & 11th Brazilian Congress on Computational Intelligence
Combination of Biased Artificial Neural Network Forecasters
Thaíze F. Oliveira∗, Ricardo T. A. de Oliveira†, Paulo Renato A. Firmino‡, Paulo S. G. de Mattos Neto§, and Tiago A. E. Ferreira¶
Department of Statistics and Informatics, Federal Rural University of Pernambuco, 52171-900, Recife, Pernambuco, Brazil
∗ [email protected] † [email protected] ‡ [email protected] § [email protected] ¶ [email protected]
Abstract—Artificial neural networks (ANN) have been paramount for modeling and forecasting time series phenomena. In this context, it has been usual to suppose that each ANN model generates a white noise as prediction error. However, mostly because of disturbances not captured by each model, it is still possible that this supposition is violated. On the other hand, adopting a single ANN model may lead to statistical bias and underestimation of uncertainty. The present paper introduces a two-step maximum likelihood method for correcting and combining ANN models. Applications involving single ANN models for the Dow Jones Industrial Average Index and S&P500 series illustrate the usefulness of the proposed framework.
Keywords—Time Series Forecasting Models; Unbiased Forecasts; Maximum Likelihood Estimation; Linear Combination of Forecasts.
I. INTRODUCTION
Approaches based on artificial neural networks (ANN) for time series forecasting have produced convincing results in recent decades [1]–[4]. Accordingly, it has been usual to spend considerable computational resources in order to select the best ANN model [1], [3], [4]. Therefore, much of the ANN literature implicitly assumes that there is a true ANN model for the time series description, leaving only its parameters to be inferred. In other terms, this modeling perspective deals only with parameter uncertainty and neglects model uncertainty.
However, it might be very difficult to find a unique and
true model for a given time series. Some authors [5] have
even pointed out that adopting just one model may lead
to statistical bias and underestimation of uncertainty. With
those arguments in mind, model uncertainty seems to be
present in any time series analysis.
Currently, model uncertainty research has been at the vanguard of time series analysis. Under this reasoning, some authors [6]–[8] have rejected the hypothesis that a unique and true model is achievable. Instead, these researchers have taken up the challenge of combining diverse models in order to present aggregated forecasts. Accordingly, reviews from the model uncertainty literature [9]–[14] have emphasized the linear combination of forecasters (LCF). In LCF the combined estimator is given by a weighted average of single models, where the weight of each model is usually a function of the variance of its residuals and its correlation with the other models.
Generally, LCF are based on the assumption that the residuals of each single model are caused by random shocks, characterizing white noises: unpredictable, independent, and unbiased terms. However, mostly because of the heterogeneity of the phenomenon under study, or even due to disturbances not captured by the models, these suppositions may still be violated when applying ANN models [15], [16], for instance.
The present paper illustrates cases involving ANN models for the Dow Jones Industrial Average Index (DJ) and S&P500 series, where the white noise supposition is violated, and suggests a model uncertainty approach to overcome the problem. Specifically, a two-step LCF model is presented.
II. METHODOLOGY
The proposed framework comprises three parts (see Fig. 1): the Classical Approach, the Correction Procedure, and the Aggregation Procedure. The first step, i.e. the classical approach, is not the object of study of this paper. In this step, each single forecasting model is elaborated. In fact, the single models' forecasts must be seen as input for the proposed approach. Thus, the single models can be considered black boxes whose parameters and structure are neglected.
In the Correction Procedure step, the residuals of each ANN model are modeled via a recursive ARIMA-based algorithm. The purpose of this step is to capture any trend of the phenomenon of interest not enveloped by the original predictive models. Therefore, if necessary, the prediction of the error series is used to correct the respective estimates of the ANN forecaster for the future values of the series (Forecast_k^c in Fig. 1).
Finally, in the Aggregation Procedure step, the forecasts of the corrected models are combined by the maximum likelihood (ML) estimation method. In this procedure, each model receives a weight proportional to its statistical efficiency and to its independence with regard to the remaining predictors. The weights are used to combine the models according to their performance, where the larger the weight (in modulus), the better the prediction. The input forecasters are required to be unbiased, reinforcing the importance of the correction phase, and the output of this procedure is the combination of the corrected single models' forecasts (Forecast^a in Fig. 1).
978-1-4799-3194-1/13 $31.00 © 2013 IEEE
DOI 10.1109/BRICS-CCI-CBIC.2013.92
These steps are presented as follows. Considering k ANN models, let u_t, U_{t,i}, and E_{t,i} (i = 1, 2, ..., k) be, in this order, a time series to be predicted at time t, the ith estimator (ANN forecaster) of u_t, and its respective random error (bias). In order to perform the correction and combination steps, two error structures are taken into account:

E_{t,i} = U_{t,i} − u_t,  (1)

namely the additive error model, and

E_{t,i} = U_{t,i} / u_t,  (2)

the so-called multiplicative error model.
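As a minimal sketch (the helper names are ours, not from the paper), the two error structures can be computed from a forecast series and the observed series as follows:

```python
def additive_error(forecast, actual):
    """E_{t,i} = U_{t,i} - u_t (Equation 1)."""
    return [f - a for f, a in zip(forecast, actual)]

def multiplicative_error(forecast, actual):
    """E_{t,i} = U_{t,i} / u_t (Equation 2); assumes a strictly positive series."""
    return [f / a for f, a in zip(forecast, actual)]
```

For a positive series, taking logarithms turns the multiplicative structure into an additive one, which is exploited in Subsection II-A3.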
A. Correcting ANN models
1) Recursive ARIMA algorithm: The present paper
deals with biased predictive models and proposes the
adjustment of (log-)linear relationships in the residuals
of each model, under the supposition of constant vari-
ability, by means of the ARIMA formalism. The bias of
each model is modeled according to a recursive maxi-
mum likelihood (ML)-based ARIMA algorithm in order
to achieve a white noise behavior in the residuals (or
errors) of prediction. At each iteration of the algorithm the
best ML-ARIMA adjustment of the remaining residuals
is determined according to a given information criterion
(e.g. Akaike) until an ARIMA(p = 0, d = 0, q = 0)
is achieved (where p, d, and q respectively represent the
autoregressive, integrated, and moving-average orders of
the model).
Let ARIMA_a(p, d, q) be the representation of the recursive ARIMA model related to the residuals of a given ANN model, represented by e, where the model index is suppressed for clarity. Under this notation, a ARIMA sub-models are involved, in such a way that the remaining error is a random noise, i.e. the (a + 1)th sub-model is an ARIMA(0, 0, 0). Thus, consider a recursion involving a iterations until a random noise is achieved. The order of the jth fitted model is (p_j, d_j, q_j). In this way, if a = 0 then the initial forecaster error is already considered a random noise. Mathematically,

E_t = μ_t + ε_t = Σ_{j=1}^{a} m_{t,j} + ε_t,  (3)

where ε_t ∼ Normal(0, σ) and m_{t,j} is the mean value of the ARIMA model adjusted to the jth remaining error. For instance, considering the residuals e of a given predictor, a = 2, and an ARIMA_2(p = (1, 0), d = (1, 1), q = (1, 0)) adjustment, one has E_t = Σ_{j=1}^{2} m_{t,j} + ε_t, where m_{t,1} = θ_0^{(e)} + e_{t−1} + φ^{(e)}(e_{t−1} − e_{t−2}) + θ_1^{(e)} τ_{t−1} is the ARIMA(1, 1, 1) estimate of E_t in the light of e. In other words, m_{t,1} is the best ARIMA model adjusted to the data e according to a given information criterion, its parameters (θ_0^{(e)}, θ_1^{(e)}, φ^{(e)}) are inferred via ML estimation, and τ_t is the remaining residual of this adjustment. In turn, τ_t = e_t − m_{t,1} and m_{t,2} = θ_0^{(τ)} + τ_{t−1} + ε_t is the mean value of the ARIMA(0, 1, 0) adjusted to the sample error τ = (τ_1, τ_2, ..., τ_t, ..., τ_n), selected based on the adopted information criterion. Finally, the best model to adjust ε_t is an ARIMA(0, 0, 0), characterizing ε_t as a random noise.
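The recursion above can be sketched in Python. The snippet is a simplified stand-in, not the authors' implementation: it peels off AR(1) structure by least squares and uses the lag-1 autocorrelation as a crude white-noise check, whereas the paper fits full ML-ARIMA sub-models selected by an information criterion (in practice one would use a library such as statsmodels for that step).

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation, used here as a crude white-noise check."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = float(np.dot(x, x))
    return float(np.dot(x[1:], x[:-1]) / denom) if denom > 0 else 0.0

def fit_ar1(x):
    """Least-squares AR(1) with intercept: x_t ~ c + phi * x_{t-1}."""
    X = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    (c, phi), *_ = np.linalg.lstsq(X, x[1:], rcond=None)
    # Fitted mean values; the first point has no predecessor, so use the mean.
    return np.concatenate([[x.mean()], c + phi * x[:-1]])

def recursive_correct(residuals, tol=0.1, max_iter=5):
    """Peel off linear structure until the residuals look like white noise.

    Returns the list of fitted mean components (the m_{t,j} of Equation 3)
    and the remaining noise (the epsilon_t of Equation 3).
    """
    remaining = np.asarray(residuals, dtype=float)
    components = []
    for _ in range(max_iter):
        if abs(lag1_autocorr(remaining)) < tol:
            break  # proxy for having reached an ARIMA(0, 0, 0)
        fitted = fit_ar1(remaining)
        components.append(fitted)
        remaining = remaining - fitted
    return components, remaining
```

Each element of `components` plays the role of one m_{t,j} term in Equation 3, and `remaining` corresponds to the final random noise ε_t.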
2) Additive and normally distributed errors: Regarding the additive error structure in Equation 1, the following model is obtained:

U_{t,i} = E_{t,i} + u_t.  (4)

It is then supposed that U_{t,i} ∼ Normal(μ_{t,i} + u_t, σ_i), where μ_{t,i} is the mean value of E_{t,i}, adjusted via the proposed recursive ML-ARIMA model (Subsection II-A1), and σ_i is the standard deviation of the random noise resulting from the recursive ML-ARIMA model. In this way, E_{t,i} ∼ Normal(μ_{t,i}, σ_i) and one can then consider the approximation

U_{t,i} ∼ Normal(μ_{t,i} + u_t, σ_i),  (5)

for t > n, where n is the length of the observed time series.
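Under the additive structure, the correction step therefore amounts to subtracting the fitted bias μ_{t,i} from each forecast; a one-line sketch (names are illustrative, not from the paper):

```python
def correct_additive(forecasts, mu_hat):
    """Unbiased estimates y_{t,i} = U_{t,i} - mu_{t,i} (cf. Equation 5)."""
    return [u - m for u, m in zip(forecasts, mu_hat)]
```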
3) Multiplicative and log-normally distributed errors: Now, focusing on the cases where E_{t,i} is a multiplicative error (Equation 2) and is supposed to follow a lognormal distribution, one has E'_{t,i} ∼ Normal(μ'_{t,i}, σ'_i) and

U'_{t,i} ∼ Normal(μ'_{t,i} + u'_t, σ'_i),  (6)

where E'_{t,i} = log(E_{t,i}), U'_{t,i} = log(U_{t,i}), μ'_{t,i} is the mean value of log(E_{t,i}), and u'_t = log(u_t).
As stated by Cryer [17], in the time series context it has been usual to take the natural logarithm when increased dispersion seems to be associated with higher levels of the series, or even when the level of the series is changing roughly exponentially. Such behavior can suggest multiplicative error structures instead of additive ones. Thus, in the case of multiplicative error-based models, it is sufficient to work on the logarithm scale of the residual time series under study. In turn, it is also worthwhile to mention that this (log-)normality reasoning is in accordance with the basic framework of time series analysis: the assumption of independent and identically distributed (iid) normal (for additive) or lognormal (for multiplicative) errors underlies the usual formalisms, such as ARIMA models.
B. Combining the unbiased ANN models
From the unbiased estimates of u_t based on the ith additive model (Equation 5), y_{t,i} = u_{t,i} − μ_{t,i}, i = 1, 2, ..., k, the joint distribution of the unbiased forecasts of u_t might be a multivariate normal distribution:

g(y_t) = (1 / ((2π)^{k/2} √det(Σ))) exp( −(1/2) (y_t − u_t)^T Σ^{−1} (y_t − u_t) )  (7)
Figure 1. Architecture of the proposed approach. The Classical Approach corresponds to usual forecasting methods (from the time lags the method generates the forecast of the series). The Correction Procedure represents the phase of forecaster error modeling (this phase provides the basis for unbiasing the forecasters). The Aggregation Procedure is the phase where the forecasts of the several unbiased models are aggregated.
where y_t = (u_{t,1} − μ_{t,1}, u_{t,2} − μ_{t,2}, ..., u_{t,k} − μ_{t,k})^T, u_t = (u_t, u_t, ..., u_t)^T, and Σ = [σ_{ij}], i, j = 1, 2, ..., k, is the covariance matrix for the ANN residuals. It is straightforward to show that the ML unbiased estimate for u_t in the light of y_t is given by the linear combination

u_t^{ML} = Σ_{i=1}^{k} w_i · y_{t,i},  (8)

where

w_i = (Σ_{j=1}^{k} a_{ij}) / (Σ_{i=1}^{k} Σ_{j=1}^{k} a_{ij})

and a_{ij} is the jth element of the ith row of the inverse matrix Σ^{−1}.
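Equation 8 is straightforward to compute with numpy. The sketch below assumes the covariance matrix Σ has already been estimated from the corrected residuals; the function names are ours, not from the paper:

```python
import numpy as np

def ml_weights(cov):
    """w_i = (sum_j a_ij) / (sum_i sum_j a_ij), where A is the inverse of Sigma."""
    a = np.linalg.inv(np.asarray(cov, dtype=float))
    return a.sum(axis=1) / a.sum()

def ml_combine(cov, unbiased_forecasts):
    """ML combined estimate u_t^ML = sum_i w_i * y_{t,i} (Equation 8)."""
    return float(np.dot(ml_weights(cov), unbiased_forecasts))
```

For a diagonal Σ the weights reduce to inverse-variance weighting, and for k = 2 they reproduce Equation 9 below.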
It is easy to see that the ML combined estimate for u_t in the light of two ANN estimates y_{t,1} and y_{t,2} is given by the linear combination

u_t^{ML} = w_1 · y_{t,1} + w_2 · y_{t,2},  (9)

where

w_1 = (σ_2^2 − σ_{12}) / (σ_1^2 + σ_2^2 − 2σ_{12})  and  w_2 = (σ_1^2 − σ_{12}) / (σ_1^2 + σ_2^2 − 2σ_{12}).
Therefore, the greater the variance of Y_{t,k}, the lesser the weight of the kth ANN model (in modulus). In the same fashion, the greater the dependency between models Y_{t,s} and Y_{t,r}, the lower their weights in the combined forecaster when k > 2.
In turn, regarding multiplicative models, one must remember that Equation 8 operates on the logarithm scale of the time series. Thus, in this case one obtains the combined unbiased estimator for log(u_t) rather than for u_t. Consequently, exponentiating u'^{ML}_t leads to an estimator for u_t, which is the median of the combined model [18].
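This remark about the median can be verified numerically: exponentiating the mean of a normal on the log scale recovers the median, not the mean, of the implied lognormal distribution. A small illustrative check (the values are arbitrary, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 0.3, 0.5  # a hypothetical combined estimate on the log scale
samples = np.exp(rng.normal(mu, sigma, 200_000))

# exp(mu) tracks the sample median, while the sample mean is larger,
# near exp(mu + sigma**2 / 2)
median_gap = abs(np.median(samples) - np.exp(mu))
mean_gap = abs(samples.mean() - np.exp(mu + sigma**2 / 2))
```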
III. CASE STUDIES
The performance of the proposed approach is illustrated
in this section by using two forecasting ANN models.
The former single model is an ANN of type Multilayer
Perceptron (MLP). The MLP architecture used here is
composed of 3 nodes in the input layer, 5 nodes in the hidden layer, and 1 node in the output layer (one-step-ahead prediction), and it is trained via the backpropagation algorithm [19]. The resulting model is named the ANN model hereafter. The latter single model is obtained via the TAEF methodology [1] and is named the TAEF model. The TAEF models result from the search for the optimal ANN to solve the time series forecasting problem. In this case, the ANN parameters (such as the architecture and the best training algorithm) are determined via a genetic algorithm. This combination constitutes a hybrid intelligent system for time series forecasting.
Besides the ANN and TAEF single models, the present paper also studies the performance of the proposed LCF in relation to the well-known simple average (SA) strategy. The SA has been one of the main LCF techniques in time series forecasting, possibly due to its simplicity of implementation and its accuracy.
A. The real world phenomena
Two phenomena are considered. The former is the Dow Jones Industrial Average Index series (DJ), represented by the time series involving 350 daily observations from 9th April 2002 to 26th August 2003. The latter is the monthly S&P500 series (SP), consisting of 87 observations from March 1996 to August 2003. In order to avoid computational problems when elaborating the ANN and TAEF models, the values of both series, DJ and SP, have been normalized. To evaluate the performance of the proposed approach, the last 12 points of each series were left for prediction via the single model strategies (ANN and TAEF), the ML combined estimator, and the SA strategy.
IV. RESULTS
To decide between additive and multiplicative error models, a goodness-of-fit test has been performed on the remaining residuals of the recursive ARIMA models adjusted to both the ANN and TAEF predictors' errors when approaching the DJ and SP series. On this basis, additive error models were considered for DJ and multiplicative ones for SP.
The first rows of Table I summarize the respective adherence of the normal and lognormal distributions to the remaining residuals of the recursive ARIMA model adjusted to the ANN and TAEF predictors, correspondingly. In this perspective, it is also important to emphasize the order of the adjusted recursive ARIMA models. The need for at least one iteration of the algorithm indicates the presence of components neglected by the single ANN predictors, making each one biased. Actually, in the case of the TAEF adjustment to the SP series, two iterations were required to achieve a random noise behavior in the residuals.
In both cases, the resulting time-dependent error modeling has presented the TAEF models as the best ones in terms of statistical efficiency. One can see this by examining the standard deviation of the residuals of the unbiased TAEF and ANN estimates for both the DJ and SP series.
Regarding the resulting LCF weights, one can see the discrepancies between ANN and TAEF. The smaller statistical efficiency of the former (see the standard deviations) is reflected in its weight. There is still evidence of a contribution from ANN, since otherwise its weight would equal zero. In terms of mean squared error (MSE), the ML aggregated estimates outperform the single models and the SA model when predicting the DJ and SP series. It must be highlighted that under the SA strategy, ANN and TAEF receive the same weight.
The study of the ratios between the mean squared errors (MSE) of the alternative estimates (ANN, TAEF, SA, and the proposed ML combination) has revealed that the ML combined estimates outperformed the single ones and SA. For instance, regarding the DJ series, the MSE of the ML combined estimates is about 17.5% of that related to TAEF, while for the SP series this ratio reaches 2.0%. Figures 2 and 3 show the behaviour of the ML combined estimates, SA, and the single ANN predictions in contrast with the respective real series. One can see the contribution of the first ML step in unbiasing the TAEF estimates. Such a model, though biased, has approached the trend of the real series.
Figure 2. ANN, TAEF, proposed combined estimate and SA for the DJ series.
Therefore, the ML combination weights provide a quantitative argument for ranking the single models, placing TAEF at a superior level to ANN for the DJ and SP series. In this context, it is worthwhile to recall the widely known SA strategy, where the weight of the single models is constant. It can lead to less attractive results, as can be seen in Table I.
V. CONCLUSION
This paper has presented a framework for improving and combining ANN time series models. Firstly, for each ANN model, the eventual violation of the supposition of independent and identically distributed random noises is suppressed by means of a recursive ARIMA-based algorithm. Secondly, an ML combined estimator is considered.
Table I. SUMMARY OF THE UNBIASED SINGLE AND COMBINED MODELS IN FACE OF DJ AND SP SERIES.

Time Series                                  | DJ                   | SP
Error structure                              | Additive and         | Multiplicative and
                                             | normally distributed | lognormally distributed
p-values (Kolmogorov-Smirnov test)   ANN     | 0.6963               | 0.3150
                                     TAEF    | 0.8404               | 0.4659
(p, d, q) of recursive ARIMA         ANN     | (0, 1, 3)            | (0, 1, 0)
                                     TAEF    | (0, 0, 10)           | ((6, 0), (1, 1), (0, 1))
Unbiased error standard deviation    ANN     | 0.0404               | 0.0115
                                     TAEF    | 0.0023               | 0.0024
ML weights                           ANN     | 0.0049               | 0.0558
                                     TAEF    | 0.9951               | 0.9442
MSE                                  ANN     | 1.1 · 10^-3          | 9.8 · 10^-4
                                     TAEF    | 1.2 · 10^-5          | 1.7 · 10^-3
                                     ML      | 2.1 · 10^-6          | 3.4 · 10^-5
                                     SA      | 2.8 · 10^-4          | 1.3 · 10^-3
Figure 3. ANN, TAEF, proposed combined estimate and SA for the SP series.
The resulting algorithm can be easily implemented in R, for example. Case studies involving a relatively small number of predictors and moderately sized series indicate the usefulness of the resulting two-step ML combined estimator in relation to the more established single ANN models and the SA combined model.
ACKNOWLEDGMENT
The authors would like to thank FACEPE, CNPq and UFRPE for their support.
REFERENCES
[1] T. A. E. Ferreira, G. C. Vasconcelos, and P. J. L. Adeodato, “A new intelligent system methodology for time series forecasting with artificial neural networks,” Neural Processing Letters, vol. 28, pp. 113–129, 2008.
[2] G. Zhang, B. Eddy Patuwo, and M. Y. Hu, “Forecasting with artificial neural networks: The state of the art,” International Journal of Forecasting, vol. 14, pp. 35–62, 1998.
[3] C. Gerald and S. Dimitri, “Knowledge-based modularization and global optimization of artificial neural network models in hydrological forecasting,” Neural Networks, vol. 20, no. 4, pp. 528–536, 2007.
[4] D.-X. Niu, H.-F. Shi, and D. D. Wu, “Short-term load forecasting using bayesian neural networks learned by hybrid monte carlo algorithm,” Applied Soft Computing, vol. 12, no. 6, pp. 1822–1827, 2012.
[5] S. P. Neuman, “Maximum likelihood bayesian averaging of uncertain model predictions,” Stochastic Environmental Research and Risk Assessment, vol. 17, pp. 291–305, 2003.
[6] L. Yu, S. Wang, and K. K. Lai, “A novel nonlinear ensemble forecasting model incorporating glar and ann for foreign exchange rates,” Computers & Operations Research, vol. 32, pp. 2523–2541, 2005.
[7] R. Dell’Aquila and E. Ronchetti, “Stock and bond return predictability: the discrimination power of model selection criteria,” Computational Statistics & Data Analysis, vol. 50, pp. 1478–1495, 2006.
[8] A. Amendola and G. Storti, “A gmm procedure for combining volatility forecasts,” Computational Statistics & Data Analysis, vol. 52, pp. 3047–3060, 2008.
[9] R. T. Clemen, “Combining forecasts: A review and annotated bibliography,” International Journal of Forecasting, vol. 5, pp. 559–583, 1989.
[10] D. I. Jeong and Y.-O. Kim, “Combining single-value streamflow forecasts: a review and guidelines for selecting techniques,” Journal of Hydrology, vol. 377, pp. 284–299, 2009.
[11] K. F. Wallis, “Combining forecasts: forty years later,” Applied Financial Economics, vol. 21, pp. 33–41, 2011.
[12] L. Rodrigues, F. Doblas-Reyes, and C. Coelho, “Multi-model calibration and combination of tropical seasonal sea surface temperature forecasts,” Climate Dynamics, pp. 1–20, 2013. [Online]. Available: http://dx.doi.org/10.1007/s00382-013-1779-8
[13] V. Genre, G. Kenny, A. Meyler, and A. Timmermann, “Combining expert forecasts: Can anything beat the simple average?” International Journal of Forecasting, vol. 29, no. 1, pp. 108–121, 2013. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S016920701200088X
[14] A. Timmermann, Forecast Combinations, ser. Handbook of Economic Forecasting. Elsevier, December 2006, vol. 1, ch. 4, pp. 135–196.
[15] P. S. de Mattos Neto, A. R. Lima Junior, and T. A. Ferreira, “Time series forecasting using a perturbative intelligent system,” in Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, ser. GECCO ’10. New York, NY, USA: ACM, 2010, pp. 1477–1478. [Online]. Available: http://doi.acm.org/10.1145/1830483.1830755
[16] R. Sitte and J. Sitte, “Neural networks approach to the random walk dilemma of financial time series,” Applied Intelligence, vol. 16, no. 3, pp. 163–171, May 2002.
[17] J. D. Cryer and K.-S. Chan, Time Series Analysis with Applications in R, 2nd ed. New York: Springer, 2008.
[18] N. T. Longford, “Inference with the lognormal distribution,” Journal of Statistical Planning and Inference, vol. 139, pp. 2329–2340, 2009.
[19] S. Haykin, Neural Networks: A Comprehensive Foundation. New Jersey: Prentice Hall, 1999.