
NATCOR: Forecasting & Predictive Analytics

Lecture 3: Exponential Smoothing

John Boylan

Lancaster Centre for Forecasting Department of Management Science

Methods and Models

Forecasting Method A (numerical) procedure for generating a forecast. eg Take the average of all observations up to (and including) time t, as a forecast for time t+1.

Forecasting Model A statistical description of the data generating process. eg All observations are centred around an unchanging mean (μ) with a normally distributed i.i.d. noise term ($\varepsilon_t$) with zero mean and constant variance (V).

Slide 2 NATCOR – Exponential Smoothing

$y_t = \mu + \varepsilon_t, \qquad \varepsilon_t \sim N(0, V)$

$\hat{y}_{t+1|t} = \frac{1}{t} \sum_{i=1}^{t} y_i$

Link between Models and Methods

Heuristic Methods (No Link) These are methods that have been designed without reference to statistical models and have no link to such models. eg Simple Moving Averages (see later slides).

Model-Based Methods (Linked) These are methods which do link to an explicit statistical model and give the ‘best’ forecast if the model holds. eg Simple Exponential Smoothing (see later slides). eg “Average of all observations” – links to the model on the previous slide.


$y_t = \mu + \varepsilon_t$

Arithmetic Mean


$\hat{y}_{t+1|t} = \frac{1}{t} \sum_{i=1}^{t} y_i$

• Gives equal weight to all observations.
• Has longest possible ‘memory’.
• Reduces noise, as the random fluctuations tend to cancel out.
• The more data is available, the longer the average, and the better the estimation of the mean level.
• If the model $y_t = \mu + \varepsilon_t$ has held in the past, and continues to hold over the forecast horizon, then the Arithmetic Mean is the ‘best’ forecast for time t+1.

What forecast should be used for time t+2?
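A minimal Python sketch of the Arithmetic Mean method may help here. The function name and the sample numbers are illustrative assumptions, not taken from the lecture. Each forecast uses only the observations available at the time it is made. Under the constant-mean model the same value would also serve as the forecast for t+2 and beyond.

```python
def mean_forecast(history):
    """Arithmetic Mean forecast for t+1, given observations y_1..y_t."""
    if not history:
        raise ValueError("need at least one observation")
    return sum(history) / len(history)


# Illustrative data (made up for this sketch).
y = [90, 85, 83, 92]

# One-step-ahead forecasts y_hat(2|1), y_hat(3|2), ...:
# the forecast made at time t uses y[:t] only.
one_step = [mean_forecast(y[:t]) for t in range(1, len(y) + 1)]
print(one_step)
```

The first forecast equals the first observation and the second is the average of the first two, matching the definition of the method.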

Forecasting with the Arithmetic Mean

[Figure: Arithmetic Mean forecasts for a monthly series; x-axis Month, y-axis Units]

• Forecast becomes more stable as time progresses.
• If model holds, forecast accuracy depends on level of “noise” in the model error term.

1-step-ahead forecasts: $\hat{y}_{2|1}, \hat{y}_{3|2}, \ldots, \hat{y}_{84|83}$, where $\hat{y}_{2|1} = y_1$ and $\hat{y}_{3|2} = (y_1 + y_2)/2$.

1 to 12-step-ahead forecasts: $\hat{y}_{85|84}, \hat{y}_{86|84}, \ldots, \hat{y}_{96|84}$

Arithmetic Mean and Outliers

[Figure: Actuals and Arithmetic Mean forecasts for a series containing an outlier]


• The Arithmetic Mean becomes more robust to outliers as the length of history grows.

• The “weight” given to the outlier is only 1/t, where t is the length of the history used in calculating the mean.
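The 1/t weight can be checked numerically. The sketch below is illustrative only: the helper names, the level of 100 and the outlier of 200 are assumptions chosen for the example. It compares the mean of a flat history with the mean of the same history whose last value is an outlier.

```python
def mean_forecast(history):
    """Arithmetic Mean of all observations so far."""
    return sum(history) / len(history)


def outlier_impact(level, outlier, t):
    """Change in the Arithmetic Mean of t observations when the last one is an outlier."""
    clean = [level] * t
    contaminated = [level] * (t - 1) + [outlier]
    return mean_forecast(contaminated) - mean_forecast(clean)


# The shift in the forecast is (outlier - level) / t, so it shrinks as t grows.
for t in (10, 100):
    print(t, outlier_impact(level=100.0, outlier=200.0, t=t))
```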


Arithmetic Mean and Level Shifts


• The Arithmetic Mean is poor at handling level shifts.
• The method has a long memory.
• It cannot “forget” the previous level and adjust to the new level within a reasonable period of time.

[Figure: Actuals and Arithmetic Mean forecasts for a series with a level shift; the point where the level shift occurs is marked]

Random Walk Model


$y_t = y_{t-1} + \varepsilon_t$

• Mean level no longer constant (see graph).
• The next noise term ($\varepsilon_{t+1}$) is not forecastable at time $t$.
• ‘Best’ forecast of $y_{t+1}$ is to use the latest observation ($y_t$).

Naïve Forecast


$\hat{y}_{t+1|t} = y_t$

• Naïve does not “filter” the noise - it copies the noise.
• Arithmetic Mean good at filtering noise but unresponsive to level shifts. The Naïve method is the opposite.
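As a rough illustration of the Naïve method, the sketch below produces one-step-ahead forecasts for a made-up series with a level shift. The function name and data are assumptions for this example. The forecast catches up with the shift after one period, but every forecast also carries the previous period's noise.

```python
def naive_forecast(history):
    """Naive forecast for t+1: simply the latest observation y_t."""
    return history[-1]


# Made-up series: level near 100, shifting to about 150 at the fourth period.
y = [100, 102, 98, 150, 151, 149]

# forecasts[k] is the forecast for y[k + 1], made using y[:k + 1] only.
forecasts = [naive_forecast(y[:t]) for t in range(1, len(y))]
errors = [actual - forecast for actual, forecast in zip(y[1:], forecasts)]
print(forecasts)
print(errors)
```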

[Figure: Naïve forecasts for the monthly series; x-axis Month, y-axis Units]

Alternative Approach: Simple Moving Averages


• Gives equal weight to all of the last N observations in the average:

$\hat{y}_{t+1|t} = \frac{1}{N} \sum_{i=t-N+1}^{t} y_i$

• ‘Memory’ depends on length of Simple Moving Average.
• Unlike Arithmetic Mean and Naïve methods, the Simple Moving Average has a parameter (N) that needs to be determined.
• Higher N values filter noise better but respond more slowly to level shifts.
• Method is not model-based but may still perform more accurately than some model-based methods (eg Naïve).
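The Simple Moving Average can be sketched in a few lines of Python. The function name and sample data are assumptions for illustration. With N = 1 the sketch reproduces the Naïve forecast, and with N equal to the full history length it reproduces the Arithmetic Mean.

```python
def sma_forecast(history, n):
    """Simple Moving Average forecast for t+1: mean of the last n observations."""
    if n < 1 or len(history) < n:
        raise ValueError("need 1 <= n <= len(history)")
    return sum(history[-n:]) / n


# Illustrative data (made up for this sketch).
y = [90, 85, 83, 92, 98]

print(sma_forecast(y, 3))       # average of 83, 92 and 98
print(sma_forecast(y, 1))       # same as the Naive forecast
print(sma_forecast(y, len(y)))  # same as the Arithmetic Mean
```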

Difference between Simple and Centred Moving Average


• Simple Moving Average (SMA) of length 3 takes the average of the first three observations as a forecast for the fourth period.

• Centred Moving Average (CMA) of length 3 takes the average of the first three observations as an estimate of the underlying model at the second period.
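The difference is only in which period the average is attached to. In the sketch below (function names, period numbering from 1, and data are assumptions for this example), the average of the first three observations appears as the SMA forecast for period 4 and as the CMA estimate for period 2.

```python
def sma_forecasts(y, n):
    """SMA(n) forecasts as {period: forecast}; the first forecast is for period n + 1."""
    return {t + 1: sum(y[t - n:t]) / n for t in range(n, len(y) + 1)}


def cma_estimates(y, n):
    """Centred MA (odd n only) as {period: estimate}, attached to the middle period."""
    if n % 2 == 0:
        raise ValueError("this sketch handles odd n only")
    half = n // 2
    return {c + 1: sum(y[c - half:c + half + 1]) / n
            for c in range(half, len(y) - half)}


y = [90, 85, 83, 92, 98]  # made-up observations for periods 1 to 5

print(sma_forecasts(y, 3))  # keys 4, 5, 6: forecasts for future periods
print(cma_estimates(y, 3))  # keys 2, 3, 4: estimates within the history
```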

Simple Moving Average

Effect of Length of SMA


[Figure: Actuals with SMA(6), SMA(12) and SMA(24) forecasts; x-axis Month, y-axis Units]

• Different lengths of SMA may produce quite different forecasts.
• Best choice of length depends on whether it is more important to filter noise or respond to level shifts.

SMA and Outliers

[Figure: Actuals with SMA(6), SMA(12) and SMA(24) forecasts for a series containing an outlier]

• Robustness of SMA to outliers depends on length of SMA.
• The longer the SMA, the more robust is the forecast to outlying observations.

SMA and Level Shifts

[Figure: Actuals with SMA(6), SMA(12) and SMA(24) forecasts for a series with a level shift]

• Adaptation of SMA to level shifts depends on length of SMA.
• It will take N periods for an SMA to fully adapt to a new level (where N is the length of the SMA).
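The N-period adaptation can be illustrated on a noise-free step series. The sketch below is an assumption-laden toy: the function names, the levels of 100 and 200, and the series lengths are all chosen for the example. It counts how many observations at the new level are needed before the SMA forecast equals that level.

```python
def sma_forecast(history, n):
    """Simple Moving Average forecast for t+1: mean of the last n observations."""
    return sum(history[-n:]) / n


def periods_to_adapt(n, old=100.0, new=200.0, pre=30, post=40):
    """Number of post-shift observations needed before SMA(n) equals the new level."""
    y = [old] * pre + [new] * post
    for k in range(1, post + 1):
        # Forecast made after seeing k observations at the new level.
        if sma_forecast(y[:pre + k], n) == new:
            return k
    return None  # not adapted within the simulated history


for n in (6, 12, 24):
    print(n, periods_to_adapt(n))
```

Until the window contains only post-shift observations, the forecast is a blend of the old and new levels.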

Choice of Length (Order) of SMA


• Best length of SMA is not known in advance.
• Time series graph may give some clues, but the best length of SMA cannot be determined from this alone.
• Need to compare accuracy of SMA using different lengths.

We experiment on past data, but only using data that would have been available at the time to calculate our forecasts.

Issues to resolve:
1. What error measure?
2. How many steps ahead?
3. Over what time period?

Error Measures (h-step-ahead forecasts)

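Mean Squared Error (MSE)

$MSE = \frac{1}{m} \sum_{j=0}^{m-1} e_{t+h+j}^2 = \frac{1}{m} \sum_{j=0}^{m-1} (y_{t+h+j} - \hat{y}_{t+h+j|t+j})^2$

Mean Absolute Error (MAE)

$MAE = \frac{1}{m} \sum_{j=0}^{m-1} |e_{t+h+j}| = \frac{1}{m} \sum_{j=0}^{m-1} |y_{t+h+j} - \hat{y}_{t+h+j|t+j}|$

Mean Absolute Percentage Error (MAPE)

$MAPE = \frac{100}{m} \sum_{j=0}^{m-1} \left| \frac{e_{t+h+j}}{y_{t+h+j}} \right| = \frac{100}{m} \sum_{j=0}^{m-1} \left| \frac{y_{t+h+j} - \hat{y}_{t+h+j|t+j}}{y_{t+h+j}} \right|$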
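The three error measures can be sketched as follows, given paired lists of actuals and their h-step-ahead forecasts. The function names and the sample numbers are assumptions for illustration. Note that MAPE is undefined when any actual is zero, so the sketch raises an error in that case.

```python
def mse(actuals, forecasts):
    """Mean Squared Error over paired actuals and forecasts."""
    return sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals)


def mae(actuals, forecasts):
    """Mean Absolute Error over paired actuals and forecasts."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


def mape(actuals, forecasts):
    """Mean Absolute Percentage Error; undefined if any actual is zero."""
    if any(a == 0 for a in actuals):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actuals, forecasts)) / len(actuals)


# Made-up actuals and forecasts for four periods.
actuals = [100, 110, 90, 120]
forecasts = [90, 100, 100, 110]

print(mse(actuals, forecasts), mae(actuals, forecasts), mape(actuals, forecasts))
```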

1. Choice of Error Measure to determine length of SMA


• Most common choice is MSE.

• MSE is the error measure used in time series theory to link models to methods which are ‘optimal’ (Minimum Mean Square Error, MMSE) for that model.

• This is what was meant by ‘best’ forecast in earlier slides.

• MSE also links to the AIC measure for model selection (discussed later).

• However, results can be sensitive to outlying observations.

2. Choice of Forecast Horizon (h) to determine length of SMA


• Most common choice is one-step-ahead.

• If we are only interested in (say) 3-step-ahead errors, then we may minimise MSE for 3-step-ahead forecasts.

• Often, we are interested in 1-step, 2-step and 3-step-ahead errors (say). Then minimising MSE for 1-step-ahead forecasts ‘stands in’ for the other two horizons.

• Alternative approaches, taking into account all the relevant horizons, are currently being researched by the Lancaster Centre for Forecasting.

3. Choice of Time Period over which to determine length of SMA


[Figure: US exports of upper and lining leather (Units by Month, Jan77 to Jan87), split into In-sample and Out-of-sample at the Forecast origin; legend: Data, In-sample forecast, Out-of-sample forecast, Forecast origin]

• Divide history into ‘in-sample’ (training set) and ‘out-of-sample’ (test set).

• Use in-sample to determine length of SMA.
• Use out-of-sample to compare SMA with other methods.
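One way this procedure might look in code is sketched below. Everything here is an illustrative assumption: the function names, the candidate lengths, the made-up series and the choice of 1-step-ahead MSE as the criterion. Each candidate is scored on the same in-sample target periods, and the chosen length is then evaluated on the test set with rolling 1-step-ahead forecasts.

```python
def sma_one_step_mse(y, n, start, end):
    """MSE of 1-step-ahead SMA(n) forecasts for the targets y[start:end].

    The forecast for index t uses only y[t-n:t], i.e. data available before t.
    """
    if start < n:
        raise ValueError("start must be at least n so a full window exists")
    squared = [(y[t] - sum(y[t - n:t]) / n) ** 2 for t in range(start, end)]
    return sum(squared) / len(squared)


def choose_sma_length(y, in_sample_size, candidates):
    """Pick the SMA length with the lowest in-sample 1-step-ahead MSE."""
    # Score every candidate on the same target periods so the MSEs are comparable.
    start = max(candidates)
    scores = {n: sma_one_step_mse(y, n, start, in_sample_size) for n in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Made-up series: level near 100, shifting to about 150 after ten periods.
y = [102, 98, 101, 99, 103, 97, 100, 102, 98, 100,
     151, 149, 152, 148, 150, 153, 147, 150, 149, 151,
     150, 148, 152, 150]
in_sample_size = 20

best_n, in_sample_scores = choose_sma_length(y, in_sample_size, candidates=(1, 2, 4, 6))
# Rolling 1-step-ahead evaluation of the chosen length on the test set.
test_mse = sma_one_step_mse(y, best_n, in_sample_size, len(y))
print(best_n, round(test_mse, 2))
```

The test-set MSE is not used to pick the length; it is kept back so the chosen SMA can be compared fairly with other methods.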

Example Series


• Open Exponential Smoothing Exercise spreadsheet at first tab (Data Visualisation) for these series in Columns A and C.
• Two additional series (High Noise, and High Noise with Level Shift) are in Columns B and D.

[Figure: two monthly example series, Jan 2012 to Oct 2015: Medium Noise, and Medium Noise with Level Shift]

Fixed Forecasts and Rolling Forecasts in Out-of-Sample


[Figure: US exports of upper and lining leather (Units by Month, Jan77 to Jan87), showing In-sample and Out-of-sample periods; legend: Data, In-sample forecast, Out-of-sample forecast, Forecast origin]

• Graph shows fixed forecasts, made at the Forecast Origin (ie one 1-step-ahead forecast, one 2-step-ahead forecast, etc).

• Rolling forecasts are made at the Origin, then at the Origin plus one period, Origin plus two periods etc.
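The distinction can be sketched with an SMA as the forecasting method. The function names and data are assumptions for this example, and the sketch assumes the SMA forecast is held flat across all horizons when made from a single origin.

```python
def sma_forecast(history, n):
    """Simple Moving Average forecast: mean of the last n observations."""
    return sum(history[-n:]) / n


def fixed_forecasts(y, origin, horizon, n):
    """Forecasts for 1..horizon steps ahead, all made once at the origin."""
    forecast = sma_forecast(y[:origin], n)
    # Assumption: a level-only method gives the same value at every horizon.
    return [forecast] * horizon


def rolling_one_step(y, origin, horizon, n):
    """1-step-ahead forecasts made at the origin, origin + 1, origin + 2, ..."""
    return [sma_forecast(y[:origin + j], n) for j in range(horizon)]


# Made-up series; the first three observations are treated as in-sample.
y = [10, 12, 14, 16, 18, 20]

print(fixed_forecasts(y, origin=3, horizon=3, n=2))
print(rolling_one_step(y, origin=3, horizon=3, n=2))
```

Rolling forecasts use each new actual as it arrives, so they yield several errors per horizon, whereas fixed forecasts yield one error per horizon.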

Split between In-Sample and Out-of-Sample


Fixed Forecasts

• Accuracy will be assessed for all forecast horizons out-of-sample, with each forecast made at the Forecast Origin.

• So, out-of-sample length should be set to be equal to the longest forecast horizon.

Rolling Forecasts

Trade off between:

1. Longer in-sample lengths allow more accurate assessment of the optimal parameter (length of SMA).

2. Longer out-of-sample lengths allow for more accurate comparisons of different methods if using Rolling Forecasts.

Data Splitting in EXCEL


Data Splitting Exercise

• Open spreadsheet at 2nd tab (2. Data Splitting)

• Experiment with different “In-sample sizes” (Cell K2, or use the slider bar below) for both:
  • Medium Noise
  • Medium Noise with Level Shift.

• What effect would changing the “In-sample size” have on estimation of length of SMA in the Training Set and evaluation of forecast accuracy in the Test Set?

Simple Exponential Smoothing (SES)


Suppose data does not have seasonality or systematic trend. Data may have outliers and/or level shifts.

Exponential Smoothing adjusts the last forecast by a fraction (α) of the last forecast error:

$\hat{y}_{t+1|t} = \hat{y}_{t|t-1} + \alpha e_t$

Example
• Previous Forecast = 100
• Previous Actual = 90
• Previous Error = -10
• Smoothing Constant (α) = 0.2
• New Forecast = 100 + (0.2 x (-10)) = 98

SES: Error Correction & Standard Forms

Error Correction Form

$\hat{y}_{t+1|t} = \hat{y}_{t|t-1} + \alpha e_t$

Standard Form

Substitute for the error expression, $e_t = y_t - \hat{y}_{t|t-1}$, in Error Correction Form:

$\hat{y}_{t+1|t} = \hat{y}_{t|t-1} + \alpha y_t - \alpha \hat{y}_{t|t-1}$

$\hat{y}_{t+1|t} = \alpha y_t + (1 - \alpha) \hat{y}_{t|t-1}$

This is a weighted average of the last actual and last forecast.
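The equivalence of the two forms can be checked numerically. The sketch below uses the worked example from the SES slide (previous forecast 100, previous actual 90, α = 0.2); the function names are assumptions for illustration.

```python
def ses_error_correction(prev_forecast, actual, alpha):
    """Error Correction Form: adjust the last forecast by alpha times its error."""
    error = actual - prev_forecast
    return prev_forecast + alpha * error


def ses_standard(prev_forecast, actual, alpha):
    """Standard Form: weighted average of the last actual and the last forecast."""
    return alpha * actual + (1 - alpha) * prev_forecast


new_a = ses_error_correction(prev_forecast=100, actual=90, alpha=0.2)
new_b = ses_standard(prev_forecast=100, actual=90, alpha=0.2)
print(round(new_a, 6), round(new_b, 6))  # both forms give the new forecast of 98
```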

Calculation of SES


• Initialise Forecast in period 2 by using Naïve method.
• Can then optimise α (0 ≤ α ≤ 1).
• Alternatively, can optimise both Initial Forecast and α.

Period  Actual  SES(0.3)  Sqd Error  SES(0.7)  Sqd Error
1       90
2       85      90.0      25.0       90.0      25.0
3       83      88.5      30.3       86.5      12.3
4       92      86.9      26.5       84.1      63.2
5       98      88.4      92.3       89.6      70.3
6       81      91.3      105.6      95.5      209.8
7       94      88.2      33.7       85.3      74.9
8       150     89.9      3607.7     91.4      3433.5
9       86      108.0     482.0      132.4     2154.9
10      90      101.4     129.2      99.9      98.5
11      104     98.0                 93.0
12      96      98.0                 93.0

Overall MSE               563.4                764.7
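The SES columns of the table can be reproduced with a short sketch. The function names are assumptions. The table's "Overall MSE" figures are matched when the period-2 error (which comes from the naïve initialisation) is left out of the average; this is inferred from the numbers rather than stated on the slide.

```python
def ses_forecasts(y, alpha):
    """1-step-ahead SES forecasts with naive initialisation.

    forecasts[k] is the forecast for period k + 2 (periods numbered from 1).
    """
    forecasts = [y[0]]  # forecast for period 2 is the period-1 actual
    for actual in y[1:]:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts


def in_sample_mse(y, alpha, skip_first=True):
    """MSE of the 1-step-ahead errors, optionally dropping the period-2 error."""
    pairs = list(zip(y[1:], ses_forecasts(y, alpha)))  # periods 2..len(y)
    if skip_first:
        pairs = pairs[1:]
    return sum((actual - forecast) ** 2 for actual, forecast in pairs) / len(pairs)


actuals = [90, 85, 83, 92, 98, 81, 94, 150, 86, 90]  # periods 1 to 10 from the table

for alpha in (0.3, 0.7):
    final_forecast = ses_forecasts(actuals, alpha)[-1]  # used for periods 11 and 12
    print(alpha, round(in_sample_mse(actuals, alpha), 1), round(final_forecast, 1))
```

The last forecast is carried forward unchanged to periods 11 and 12, which is why the table shows the same value in both rows.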

SES in EXCEL


SES Exercise

• Make sure you have the Solver Add-In (File, Options, Add-Ins, Solver Add-In, OK).
• Open spreadsheet at 6th tab (6. Exponential Smoothing).
• Select “Medium Noise with Level Shift” at 2nd tab and then return to 6th tab. Input 24 to Cell X3 (In Sample Size).
• Initialise forecast (naïve) in Cell C3.
• Calculate Training Set 1-step-ahead forecasts (C4:C26).
• Note that Test Set forecasts are all the same as C26.
• Experiment with different alpha values (Cell P3).
• Optimise alpha, and check Test Set accuracy.
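Outside Excel, the optimisation step could be approximated with a simple grid search, as in the sketch below. This is a stand-in for Solver, not a description of what Solver does. The function names, the grid resolution and the use of the ten actuals from the Calculation of SES table as training data are assumptions for illustration.

```python
def ses_one_step_mse(y, alpha):
    """In-sample MSE of 1-step-ahead SES forecasts with naive initialisation."""
    forecast = y[0]  # forecast for period 2
    squared = []
    for actual in y[1:]:
        squared.append((actual - forecast) ** 2)
        forecast = alpha * actual + (1 - alpha) * forecast
    # The period-2 error is included; it is the same for every alpha,
    # so it does not change which alpha minimises the MSE.
    return sum(squared) / len(squared)


def optimise_alpha(y, steps=100):
    """Grid search over alpha in [0, 1]; a crude stand-in for a numerical solver."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda a: ses_one_step_mse(y, a))


training = [90, 85, 83, 92, 98, 81, 94, 150, 86, 90]  # periods 1 to 10 of the SES table

best_alpha = optimise_alpha(training)
print(best_alpha, round(ses_one_step_mse(training, best_alpha), 1))
```

A finer grid or a proper optimiser would refine the value, and Test Set accuracy should still be checked separately, as the exercise asks.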

How SES addresses Noise


• Low smoothing constants (alpha values) filter noise.
• High smoothing constants have little filtering effect.
• BUT: high smoothing constants react more quickly to level shifts.

SES and Trended Series


[Figure: UK Gross Domestic Product: chained volume measures (GDP by Year, 1948 to 2018), with SES forecasts in two panels: Alpha = 0.2 and Alpha = 0.7]

• With a low alpha, SES does not keep up with trend and produces a poor forecast.

• With a high alpha, SES keeps up better, but is not filtering the noise well and produces a forecast that could be improved.

SES and Seasonal Series


[Figure: UK Hourly Electricity Demand (Demand by Day, from 10/26/08), with SES forecasts in two panels: Alpha = 0.2 and Alpha = 0.7]

• With a low alpha, seasonality is not captured.
• With a high alpha, the noise is not smoothed AND the seasonal pattern is out by one period.
• In both cases, the forecasts are poor.

Is SES a Model-Based Method?


• It is sometimes stated that SES is an ‘ad hoc’ or ‘heuristic’ method, lacking a model-based foundation.

• This is wrong!

• It is true that when SES was first proposed, the method lacked a model foundation.

• Since then, two model forms have been found to underpin SES:
  • ARIMA(0,1,1) Model
  • State Space Local Level Model

• Model formulations become useful when looking at a whole family of Exponential Smoothing models (including trend and seasonality).
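For reference, the two models are commonly written as below. The slides name the models but do not give their equations, so the notation (θ for the moving-average parameter, $\ell_t$ for the level) and the choice of the single-source-of-error form of the local level model are assumptions here. In both cases the one-step-ahead forecast reduces to the SES recursion with smoothing constant α.

```latex
% ARIMA(0,1,1), with moving-average parameter \theta
y_t = y_{t-1} + \varepsilon_t - \theta \varepsilon_{t-1}, \qquad \alpha = 1 - \theta

% Local level model (single-source-of-error form), with level \ell_t
y_t = \ell_{t-1} + \varepsilon_t
\ell_t = \ell_{t-1} + \alpha \varepsilon_t

% In the local level model the one-step-ahead forecast is the current level:
\hat{y}_{t+1|t} = \ell_t = \hat{y}_{t|t-1} + \alpha e_t
```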

Summary


• Arithmetic Mean robust to outliers but very slow to respond to level shifts.

• Naïve responds immediately to level shifts but does not filter noise.

• Simple Moving Average (SMA) may be a good compromise but is not part of a wider family of model-based methods.

• Simple Exponential Smoothing (SES) allows suitable weights to be identified for past data and is part of a wider family of model-based methods.
