
NATCOR: Forecasting & Predictive Analytics

Lecture 3: Exponential Smoothing

John Boylan

Lancaster Centre for Forecasting Department of Management Science

Methods and Models

Forecasting Method: A (numerical) procedure for generating a forecast. eg Take the average of all observations up to (and including) time t as a forecast for time t+1:

$$\hat{y}_{t+1|t} = \frac{1}{t}\sum_{i=1}^{t} y_i$$

Forecasting Model: A statistical description of the data generating process. eg All observations are centred around an unchanging mean (μ) with a normally distributed i.i.d. noise term ($\varepsilon_t$) with zero mean and constant variance (V):

$$y_t = \mu + \varepsilon_t, \qquad \varepsilon_t \sim N(0, V)$$

Slide 2 NATCOR – Exponential Smoothing

Link between Models and Methods

Heuristic Methods (No Link): These are methods that have been designed without reference to statistical models and have no link to such models. eg Simple Moving Averages (see later slides).

Model-Based Methods (Linked): These are methods which do link to an explicit statistical model and give the ‘best’ forecast if the model holds. eg Simple Exponential Smoothing (see later slides). eg “Average of all observations” links to the model on the previous slide:

$$y_t = \mu + \varepsilon_t$$

Arithmetic Mean

$$\hat{y}_{t+1|t} = \frac{1}{t}\sum_{i=1}^{t} y_i$$

• Gives equal weight to all observations.
• Has the longest possible ‘memory’.
• Reduces noise, as the random fluctuations tend to cancel out.
• The more data is available, the longer the average, and the better the estimation of the mean level.
• If the model $y_t = \mu + \varepsilon_t$ has held in the past, and continues to hold over the forecast horizon, then the Arithmetic Mean is the ‘best’ forecast for time t+1.

What forecast should be used for time t+2?
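A minimal Python sketch of the one-step-ahead Arithmetic Mean forecast, assuming the history is a plain list of numbers; the function name `arithmetic_mean_forecasts` is hypothetical. It updates the mean recursively instead of re-summing, which gives the same values and makes explicit that the newest observation receives weight 1/t.

```python
def arithmetic_mean_forecasts(y):
    """Return one-step-ahead Arithmetic Mean forecasts.

    Element k (0-based) is y_hat[t+1|t] with t = k + 1, i.e. the mean of the
    first t observations, used as the forecast for period t + 1.
    """
    forecasts = []
    mean = 0.0
    for t, value in enumerate(y, start=1):
        # Recursive update: the newest observation gets weight 1/t,
        # so a single unusual value matters less as the history grows.
        mean += (value - mean) / t
        forecasts.append(mean)
    return forecasts


print(arithmetic_mean_forecasts([90, 85, 83, 92]))  # [90.0, 87.5, 86.0, 87.5]
```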

Forecasting with the Arithmetic Mean

[Figure: Units against Month, with Arithmetic Mean forecasts]

• Forecast becomes more stable as time progresses.
• If the model holds, forecast accuracy depends on the level of “noise” in the model error term.

One-step-ahead forecasts: $\hat{y}_{2|1}, \hat{y}_{3|2}, \ldots, \hat{y}_{84|83}$, with $\hat{y}_{2|1} = y_1$ and $\hat{y}_{3|2} = (y_1 + y_2)/2$.

1- to 12-step-ahead forecasts: $\hat{y}_{85|84}, \hat{y}_{86|84}, \ldots, \hat{y}_{96|84}$.
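Under the constant-mean model every future observation has the same expected value μ, so the mean at the forecast origin is a natural forecast for every horizon. This is one way to read the 1- to 12-step-ahead forecasts made at month 84, and it also suggests an answer to the earlier question about a forecast for time t+2. A sketch under that assumption (the helper name is hypothetical), using `statistics.fmean` from the Python standard library (3.8+):

```python
from statistics import fmean


def fixed_origin_mean_forecasts(y, origin, horizon):
    """Forecasts y_hat[origin+h | origin] for h = 1..horizon.

    Only y_1..y_origin are used; under y_t = mu + eps_t the same estimate
    of mu serves as the forecast for every horizon h.
    """
    level = fmean(y[:origin])
    return [level] * horizon


# The fifth value is ignored because it lies after the origin.
print(fixed_origin_mean_forecasts([90, 85, 83, 92, 98], origin=4, horizon=3))
# [87.5, 87.5, 87.5]
```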

Arithmetic Mean and Outliers

[Figure: Actuals and Arithmetic Mean forecasts, with an Outlier marked]

• The Arithmetic Mean becomes more robust to outliers as the length of history grows.

• The “weight” given to the outlier is only 1/t, where t is the length of the history used in calculating the mean.

Arithmetic Mean and Level Shifts

• The Arithmetic Mean is poor at handling level shifts.
• The method has a long memory.
• It cannot “forget” the previous level and adjust to the new level within a reasonable period of time.

[Figure: Actuals and Arithmetic Mean forecasts, with the point where the Level Shift occurs marked]

Random Walk Model

$$y_t = y_{t-1} + \varepsilon_t$$

• Mean level no longer constant (see graph).
• The next noise term ($\varepsilon_{t+1}$) is not forecastable at time t.
• ‘Best’ forecast of $y_{t+1}$ is to use the latest observation ($y_t$).

Naïve Forecast

$$\hat{y}_{t+1|t} = y_t$$

• Naïve does not “filter” the noise; it copies the noise.
• The Arithmetic Mean is good at filtering noise but unresponsive to level shifts. The Naïve method is the opposite.

[Figure: Units against Month, with Naïve forecasts]
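A small comparison of the two extremes on a hypothetical noise-free series with a single level shift (illustrative data, not from the slides). The helper names are hypothetical; `statistics.fmean` is from the standard library.

```python
from statistics import fmean


def naive_forecasts(y):
    """One-step-ahead Naive forecasts: element k is y_hat[t+1|t] = y_t for t = k + 1."""
    return list(y)


def mean_forecasts(y):
    """One-step-ahead Arithmetic Mean forecasts: element k is the mean of y_1..y_(k+1)."""
    return [fmean(y[: k + 1]) for k in range(len(y))]


# Hypothetical noise-free series: level 100 for periods 1-5, then 200 for periods 6-10
series = [100] * 5 + [200] * 5
print(naive_forecasts(series)[-1])  # forecast for period 11: 200
print(mean_forecasts(series)[-1])   # forecast for period 11: 150.0
```

The Naïve forecast has already moved to the new level, while the mean is still halfway between the two levels. With noise added, the Naïve forecast would also copy every random fluctuation, which the mean would largely smooth out.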

Alternative Approach: Simple Moving Averages

• Gives equal weight to all of the last N observations in the average:

$$\hat{y}_{t+1|t} = \frac{1}{N}\sum_{i=t-N+1}^{t} y_i$$

• ‘Memory’ depends on the length of the Simple Moving Average.
• Unlike the Arithmetic Mean and Naïve methods, the Simple Moving Average has a parameter (N) that needs to be determined.
• Higher N values filter noise better but respond more slowly to level shifts.
• The method is not model-based but may still perform more accurately than some model-based methods (eg Naïve).
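A minimal sketch of the SMA(N) formula as a sliding window, assuming a list of numbers and 0-based target indices; the function name `sma_forecasts` is hypothetical. Keeping a running window sum avoids re-adding N values at every step.

```python
def sma_forecasts(y, n):
    """One-step-ahead SMA(n) forecasts.

    Returns {k: forecast} where k is the 0-based index of the target period
    (period k + 1) and the forecast is the mean of the n observations just
    before it, y[k-n] .. y[k-1]. The last key, len(y), is the forecast for the
    period after the data end.
    """
    if not 1 <= n <= len(y):
        raise ValueError("need 1 <= n <= len(y)")
    window_sum = sum(y[:n])
    forecasts = {n: window_sum / n}
    for t in range(n, len(y)):
        # Slide the window one period: add the newest value, drop the oldest.
        window_sum += y[t] - y[t - n]
        forecasts[t + 1] = window_sum / n
    return forecasts


print(sma_forecasts([90, 85, 83, 92, 98], n=3))
```

With n = 3 the first key is 3, i.e. the average of the first three observations is the forecast for the fourth period.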

Difference between Simple and Centred Moving Average

• Simple Moving Average (SMA) of length 3 takes the average of the first three observations as a forecast for the fourth period.

• Centred Moving Average (CMA) of length 3 takes the average of the first three observations as an estimate of the underlying model at the second period.

[Figure: Simple Moving Average]
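The two averages are computed the same way; the difference is which period the result is attached to. A sketch with 1-indexed period keys (function names hypothetical), using the first five actuals from the SES calculation table in this lecture as illustrative data:

```python
def cma3(y):
    """Centred MA of length 3: {period t: (y_(t-1) + y_t + y_(t+1)) / 3}, 1-indexed.

    This estimates the underlying level at period t itself.
    """
    return {t: (y[t - 2] + y[t - 1] + y[t]) / 3 for t in range(2, len(y))}


def sma3(y):
    """Simple MA of length 3: {period t+1: (y_(t-2) + y_(t-1) + y_t) / 3}, 1-indexed.

    The same kind of average is used as a forecast for the next period.
    """
    return {t + 1: (y[t - 3] + y[t - 2] + y[t - 1]) / 3 for t in range(3, len(y) + 1)}


y = [90, 85, 83, 92, 98]
print(cma3(y)[2], sma3(y)[4])  # both 86.0: same average, different periods
```

A CMA needs future observations (y_{t+1}), so it can describe history but cannot be used directly as a forecast.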

Effect of Length of SMA

[Figure: Units against Month: Actuals with SMA(6), SMA(12) and SMA(24) forecasts]

• Different lengths of SMA may produce quite different forecasts.
• The best choice of length depends on whether it is more important to filter noise or to respond to level shifts.

SMA and Outliers

[Figure: Actuals with SMA(6), SMA(12) and SMA(24) forecasts on a series containing an outlier]

• Robustness of SMA to outliers depends on the length of the SMA.
• The longer the SMA, the more robust the forecast is to outlying observations.

SMA and Level Shifts

[Figure: Actuals with MA(6), MA(12) and MA(24) forecasts on a series with a level shift]

• Adaptation of SMA to level shifts depends on the length of the SMA.
• It will take N periods for an SMA to fully adapt to a new level (where N is the length of the SMA).
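A quick check of the N-period claim on a hypothetical noise-free step from 100 to 200 (illustrative data; with noise, "fully adapted" would only hold on average). Exact equality with the new level is safe here only because the series has no noise. Function names are hypothetical.

```python
def sma_forecast(y, n, t):
    """SMA(n) forecast for period t+1 made at time t (1-indexed): mean of y_(t-n+1)..y_t."""
    if t < n:
        raise ValueError("need at least n observations")
    return sum(y[t - n:t]) / n


def periods_to_adapt(y, n, shift_period, new_level):
    """Number of post-shift observations needed before SMA(n) forecasts new_level."""
    for t in range(max(shift_period, n), len(y) + 1):
        if sma_forecast(y, n, t) == new_level:
            return t - shift_period + 1
    return None  # not enough data after the shift


# Hypothetical noise-free series: level 100 for periods 1-10, 200 from period 11
series = [100.0] * 10 + [200.0] * 20
for n in (3, 6, 12):
    print(n, periods_to_adapt(series, n, shift_period=11, new_level=200.0))
```

Each line prints N twice: the SMA forecast reaches the new level only once all N observations in its window come from after the shift.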

Choice of Length (Order) of SMA

• The best length of SMA is not known in advance.
• A time series graph may give some clues, but the best length of SMA cannot be determined from this alone.
• We need to compare the accuracy of SMAs of different lengths.

We experiment on past data, but only using data that would have been available at the time to calculate our forecasts.

Issues to resolve:
1. What error measure?
2. How many steps ahead?
3. Over what time period?

Error Measures (h-step-ahead forecasts)

Here m is the number of h-step-ahead forecasts assessed, made at origins $t, t+1, \ldots, t+m-1$, and $e_{t+h+j} = y_{t+h+j} - \hat{y}_{t+h+j|t+j}$.

Mean Squared Error (MSE)

$$\text{MSE} = \frac{1}{m}\sum_{j=0}^{m-1} e_{t+h+j}^2 = \frac{1}{m}\sum_{j=0}^{m-1}\left(y_{t+h+j} - \hat{y}_{t+h+j|t+j}\right)^2$$

Mean Absolute Error (MAE)

$$\text{MAE} = \frac{1}{m}\sum_{j=0}^{m-1}\left|e_{t+h+j}\right| = \frac{1}{m}\sum_{j=0}^{m-1}\left|y_{t+h+j} - \hat{y}_{t+h+j|t+j}\right|$$

Mean Absolute Percentage Error (MAPE)

$$\text{MAPE} = \frac{100}{m}\sum_{j=0}^{m-1}\left|\frac{e_{t+h+j}}{y_{t+h+j}}\right| = \frac{100}{m}\sum_{j=0}^{m-1}\left|\frac{y_{t+h+j} - \hat{y}_{t+h+j|t+j}}{y_{t+h+j}}\right|$$
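The three measures in Python, as a minimal sketch taking paired lists of actuals and forecasts for one horizon (the function name is hypothetical):

```python
def error_measures(actuals, forecasts):
    """Return (MSE, MAE, MAPE %) for paired actuals y and forecasts y_hat.

    Errors are e = y - y_hat, as in the formulas; m is the number of pairs.
    MAPE is undefined when an actual is zero.
    """
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("need equal-length, non-empty sequences")
    errors = [y - f for y, f in zip(actuals, forecasts)]
    m = len(errors)
    mse = sum(e * e for e in errors) / m
    mae = sum(abs(e) for e in errors) / m
    mape = 100 / m * sum(abs(e / y) for e, y in zip(errors, actuals))
    return mse, mae, mape


mse, mae, mape = error_measures([100, 80], [90, 100])
print(mse, mae, round(mape, 2))  # 250.0 15.0 17.5
```

Squaring makes the error of -20 contribute four times as much to MSE as the error of 10, so a single large error can dominate MSE.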

1. Choice of Error Measure to determine length of SMA

• Most common choice is MSE.

• MSE is the error measure used in time series theory to link models to methods which are ‘optimal’ (Minimum Mean Square Error, MMSE) for that model.

• This is what was meant by ‘best’ forecast in earlier slides.

• MSE also links to the AIC measure for model selection (discussed later).

• However, results can be sensitive to outlying observations.

2. Choice of Forecast Horizon (h) to determine length of SMA

• Most common choice is one-step-ahead.

• If we are only interested in (say) 3-step-ahead errors, then we may minimise MSE for 3-step-ahead forecasts.

• Often, we are interested in 1-step, 2-step and 3-step-ahead errors (say). Then minimising MSE for 1-step-ahead forecasts ‘stands in’ for the other two horizons.

• Alternative approaches, taking into account all the relevant horizons, are currently being researched by the Lancaster Centre for Forecasting.

3. Choice of Time Period over which to determine length of SMA

[Figure: US exports of upper and lining leather (Units against Month, Jan 1977 to Jan 1987), divided into In-sample and Out-of-sample periods; legend: Data, In-sample forecast, Out-of-sample forecast, Forecast origin]

• Divide history into ‘in-sample’ (training set) and ‘out-of-sample’ (test set).

• Use in-sample to determine length of SMA.
• Use out-of-sample to compare SMA with other methods.
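A sketch of choosing the SMA length on the in-sample by one-step-ahead MSE and then checking it out-of-sample, on a hypothetical noise-free series with a level shift. Function names are hypothetical, and scoring every candidate on the same target periods is a design choice made here so the comparison is like-for-like.

```python
def sma_one_step_errors(y, n, start, end):
    """One-step-ahead SMA(n) errors for 0-based target indices start..end-1.

    Each forecast uses only the n observations before its target, i.e. data
    that would have been available at the time.
    """
    return [y[k] - sum(y[k - n:k]) / n for k in range(start, end)]


def mse(errors):
    return sum(e * e for e in errors) / len(errors)


def choose_sma_length(y, in_sample, lengths):
    """Pick the SMA length with the lowest in-sample one-step MSE.

    All candidates are scored on the same target periods (from index
    max(lengths) up to the end of the in-sample).
    """
    first = max(lengths)
    scores = {n: mse(sma_one_step_errors(y, n, first, in_sample)) for n in lengths}
    return min(scores, key=scores.get), scores


# Hypothetical noise-free series with a level shift; periods 1-12 are in-sample
y = [10.0] * 6 + [20.0] * 8
best, scores = choose_sma_length(y, in_sample=12, lengths=(1, 3, 6))
test_mse = mse(sma_one_step_errors(y, best, 12, len(y)))  # out-of-sample check
print(best, test_mse)
```

Without noise the shortest length wins, because there is nothing to filter; on noisy data a longer SMA may well be selected.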

Example Series

• Open the Exponential Smoothing Exercise spreadsheet at the first tab (Data Visualisation) for these series in Columns A and C.

• Two additional series, High Noise and High Noise with Level Shift, are in Columns B and D.

[Figure: Medium Noise (monthly, Jan 2012 to Oct 2015)]

[Figure: Medium Noise with Level Shift (monthly, Jan 2012 to Oct 2015)]

Fixed Forecasts and Rolling Forecasts in Out-of-Sample

[Figure: US exports of upper and lining leather (Units against Month, Jan 1977 to Jan 1987), divided into In-sample and Out-of-sample periods; legend: Data, In-sample forecast, Out-of-sample forecast, Forecast origin]

• The graph shows fixed forecasts, made at the Forecast Origin (ie one 1-step-ahead forecast, one 2-step-ahead forecast, etc.).

• Rolling forecasts are made at the Origin, then at the Origin plus one period, the Origin plus two periods, etc.

Split between In-Sample and Out-of-Sample

Fixed Forecasts

• Accuracy will be assessed for all forecast horizons out-of-sample, with each forecast made at the Forecast Origin.

• So, the out-of-sample length should be set equal to the longest forecast horizon.

Rolling Forecasts

Trade-off between:

1. Longer in-sample lengths, which allow a more accurate assessment of the optimal parameter (length of SMA).

2. Longer out-of-sample lengths, which allow more accurate comparisons of different methods when using Rolling Forecasts.
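A sketch contrasting the two schemes with an SMA(3) on hypothetical data (periods 1 to 5 in-sample, 6 to 8 out-of-sample); the function names are hypothetical.

```python
def sma_level(y, n, origin):
    """SMA(n) of the n observations up to and including period `origin` (1-indexed)."""
    if origin < n:
        raise ValueError("origin must be at least n")
    return sum(y[origin - n:origin]) / n


def fixed_forecasts(y, n, origin, max_h):
    """Forecasts all made once, at the origin: {target period: forecast}, h = 1..max_h."""
    level = sma_level(y, n, origin)
    return {origin + h: level for h in range(1, max_h + 1)}


def rolling_forecasts(y, n, origin, h):
    """h-step-ahead forecasts made at origin, origin+1, ... while the target is observed."""
    return {o + h: sma_level(y, n, o) for o in range(origin, len(y) - h + 1)}


y = [10, 12, 11, 13, 15, 14, 16, 18]  # hypothetical data; periods 1-5 in-sample
print(fixed_forecasts(y, n=3, origin=5, max_h=3))  # {6: 13.0, 7: 13.0, 8: 13.0}
print(rolling_forecasts(y, n=3, origin=5, h=1))    # {6: 13.0, 7: 14.0, 8: 15.0}
```

The fixed scheme yields one error per horizon, which is why its out-of-sample length only needs to match the longest horizon. Rolling origins yield three 1-step-ahead errors from the same three periods, so a longer out-of-sample gives more errors to compare methods on.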

Data Splitting in EXCEL

Data Splitting Exercise

• Open the spreadsheet at the 2nd tab (2. Data Splitting).

• Experiment with different “In-sample sizes” (Cell K2, or use the slider bar below) for both:
  • Medium Noise
  • Medium Noise with Level Shift.

• What effect would changing the “In-sample size” have on estimation of length of SMA in the Training Set and evaluation of forecast accuracy in the Test Set?

Simple Exponential Smoothing (SES)

Suppose the data do not have seasonality or a systematic trend. The data may have outliers and/or level shifts.

Exponential Smoothing adjusts the last forecast by a fraction (α) of the last forecast error:

$$\hat{y}_{t+1|t} = \hat{y}_{t|t-1} + \alpha e_t$$

Example
• Previous Forecast = 100
• Previous Actual = 90
• Previous Error = -10
• Smoothing Constant (α) = 0.2
• New Forecast = 100 + (0.2 × (-10)) = 98
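The same update as a one-line Python function (hypothetical name), reproducing the worked example:

```python
def ses_update(previous_forecast, actual, alpha):
    """Error-correction form of SES: new forecast = old forecast + alpha * error."""
    error = actual - previous_forecast  # e_t = y_t - y_hat[t|t-1]
    return previous_forecast + alpha * error


print(ses_update(previous_forecast=100, actual=90, alpha=0.2))  # 98.0
```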

SES: Error Correction & Standard Forms

Slide 25 MSCI 523 – Exponential Smoothing

Error Correction Form

$$\hat{y}_{t+1|t} = \hat{y}_{t|t-1} + \alpha e_t, \qquad e_t = y_t - \hat{y}_{t|t-1}$$

Standard Form

Substitute for the error expression in Error Correction Form:

$$\hat{y}_{t+1|t} = \hat{y}_{t|t-1} + \alpha y_t - \alpha \hat{y}_{t|t-1}$$

$$\hat{y}_{t+1|t} = \alpha y_t + (1-\alpha)\hat{y}_{t|t-1}$$

This is a weighted average of the last actual and last forecast.
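Substituting the Standard Form into itself repeatedly shows where the name comes from: the weights on past actuals decline geometrically. This is a short derivation from the Standard Form, assuming the recursion starts from an initial forecast $\hat{y}_{2|1}$ (for example the Naïve value $y_1$):

```latex
\begin{aligned}
\hat{y}_{t+1|t} &= \alpha y_t + (1-\alpha)\,\hat{y}_{t|t-1} \\
&= \alpha y_t + \alpha(1-\alpha)\, y_{t-1} + (1-\alpha)^2\,\hat{y}_{t-1|t-2} \\
&= \sum_{j=0}^{t-2} \alpha(1-\alpha)^j\, y_{t-j} + (1-\alpha)^{t-1}\,\hat{y}_{2|1}
\end{aligned}
```

The weights sum to one, and for 0 < α < 1 each observation carries less weight than the one after it; a smaller α spreads the weight further back, giving a longer memory.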

Calculation of SES

• Initialise the forecast in period 2 by using the Naïve method.
• Can then optimise α (0 ≤ α ≤ 1).
• Alternatively, can optimise both the initial forecast and α.

| Period | Actual | SES(0.3) | Sqd Error | SES(0.7) | Sqd Error |
|---|---|---|---|---|---|
| 1 | 90 | | | | |
| 2 | 85 | 90.0 | 25.0 | 90.0 | 25.0 |
| 3 | 83 | 88.5 | 30.3 | 86.5 | 12.3 |
| 4 | 92 | 86.9 | 26.5 | 84.1 | 63.2 |
| 5 | 98 | 88.4 | 92.3 | 89.6 | 70.3 |
| 6 | 81 | 91.3 | 105.6 | 95.5 | 209.8 |
| 7 | 94 | 88.2 | 33.7 | 85.3 | 74.9 |
| 8 | 150 | 89.9 | 3607.7 | 91.4 | 3433.5 |
| 9 | 86 | 108.0 | 482.0 | 132.4 | 2154.9 |
| 10 | 90 | 101.4 | 129.2 | 99.9 | 98.5 |
| 11 | 104 | 98.0 | | 93.0 | |
| 12 | 96 | 98.0 | | 93.0 | |
| Overall MSE | | | 563.4 | | 764.7 |

The Overall MSE values are the averages of the squared errors for periods 3 to 10. Periods 11 and 12 show forecasts without squared errors; both equal the forecast made at period 10.
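A Python sketch that reproduces the table, assuming (as the figures indicate) that periods 1 to 10 are in-sample, the Overall MSE averages periods 3 to 10, and periods 11 and 12 reuse the forecast made at period 10. Function and constant names are hypothetical.

```python
def ses_forecasts(y, alpha):
    """One-step-ahead SES forecasts in Standard Form.

    f[k] is the forecast for period k + 1 (1-indexed); f[0] is None since
    period 1 has no forecast. The forecast for period 2 is initialised with
    the Naive method (f[1] = y_1), and the final element is the forecast for
    the period after the last actual.
    """
    f = [None, y[0]]
    for t in range(1, len(y)):
        f.append(alpha * y[t] + (1 - alpha) * f[t])  # alpha*y_t + (1-alpha)*y_hat[t|t-1]
    return f


actuals = [90, 85, 83, 92, 98, 81, 94, 150, 86, 90, 104, 96]
IN_SAMPLE = 10  # periods 1-10 in-sample; periods 11-12 get the forecast made at period 10


def table_summary(alpha):
    f = ses_forecasts(actuals[:IN_SAMPLE], alpha)
    sq_errors = [(actuals[k] - f[k]) ** 2 for k in range(2, IN_SAMPLE)]  # periods 3-10
    return sum(sq_errors) / len(sq_errors), f[IN_SAMPLE]


for alpha in (0.3, 0.7):
    mse, forecast_11 = table_summary(alpha)
    print(alpha, round(mse, 1), round(forecast_11, 1))
```

This should print 563.4 and 98.0 for α = 0.3, and 764.7 and 93.0 for α = 0.7, matching the table. The single outlier in period 8 contributes most of each MSE.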

SES in EXCEL

SES Exercise

• Make sure you have the Solver Add-In (File, Options, Add-Ins, Solver Add-In, OK).
• Open the spreadsheet at the 6th tab (6. Exponential Smoothing).
• Select “Medium Noise with Level Shift” at the 2nd tab and then return to the 6th tab. Input 24 to Cell X3 (In Sample Size).
• Initialise the forecast (Naïve) in Cell C3.
• Calculate the Training Set 1-step-ahead forecasts (C4:C26).
• Note that the Test Set forecasts are all the same as C26.
• Experiment with different alpha values (Cell P3).
• Optimise alpha, and check Test Set accuracy.
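Outside Excel, the optimisation step can be approximated in a few lines. This does not reproduce what Solver does internally; as a swapped-in stand-in it runs a plain grid search over α in steps of 0.01, scoring in-sample one-step MSE the same way as the worked SES table (Naïve start, errors from period 3). Function names are hypothetical, and the data are periods 1 to 10 of that table.

```python
def ses_in_sample_mse(y, alpha, first_scored=2):
    """In-sample one-step MSE of SES, started with a Naive forecast for period 2.

    Squared errors are averaged from 0-based index `first_scored` (period 3 by
    default) to the end of y.
    """
    forecast = y[0]  # forecast for period 2
    sq_errors = []
    for k in range(1, len(y)):
        error = y[k] - forecast
        if k >= first_scored:
            sq_errors.append(error * error)
        forecast += alpha * error  # error-correction form
    return sum(sq_errors) / len(sq_errors)


def optimise_alpha(y, steps=100):
    """Grid search over alpha = 0, 1/steps, ..., 1; returns the alpha with lowest MSE."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda a: ses_in_sample_mse(y, a))


training = [90, 85, 83, 92, 98, 81, 94, 150, 86, 90]
best = optimise_alpha(training)
print(best, round(ses_in_sample_mse(training, best), 1))
```

As in the spreadsheet, once α is fixed the last in-sample forecast would be used unchanged for every Test Set period.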

How SES addresses Noise

• Low smoothing constants (alpha values) filter noise.
• High smoothing constants have little filtering effect.
• BUT: high smoothing constants react more quickly to level shifts.
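To put a number on the filtering effect, assume the constant-mean model $y_t = \mu + \varepsilon_t$ with i.i.d. $\varepsilon_t \sim N(0, V)$, a long history (so the initial forecast's weight is negligible) and $0 < \alpha \le 1$. Writing the SES forecast as an exponentially weighted sum of past actuals gives its variance about μ:

```latex
\hat{y}_{t+1|t} \approx \sum_{j=0}^{\infty} \alpha(1-\alpha)^j\, y_{t-j}
\;\Rightarrow\;
\operatorname{Var}\!\left(\hat{y}_{t+1|t}\right)
= \alpha^2 V \sum_{j=0}^{\infty} (1-\alpha)^{2j}
= \frac{\alpha^2 V}{1-(1-\alpha)^2}
= \frac{\alpha V}{2-\alpha}
```

For α = 0.2 this is V/9, for α = 0.7 it is 7V/13, and for α = 1 (the Naïve method) it is V, so all of the noise passes through to the forecast.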

SES and Trended Series

[Figure: UK Gross Domestic Product: chained volume measures (GDP against Year, 1948 to 2018), with SES forecasts for Alpha = 0.2]

[Figure: UK Gross Domestic Product: chained volume measures (GDP against Year, 1948 to 2018), with SES forecasts for Alpha = 0.7]

• With a low alpha, SES does not keep up with trend and produces a poor forecast.

• With a high alpha, SES keeps up better, but is not filtering the noise well and produces a forecast that could be improved.
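A small illustration of the lag, using a hypothetical noise-free linear trend rather than the GDP data. With slope b, the one-step SES error satisfies $e_{t+1} = (1-\alpha)e_t + b$, so it settles at $b/\alpha$: a low α leaves a large, persistent lag. The function name is hypothetical.

```python
def ses_final_error(y, alpha):
    """Last one-step-ahead SES error y_t - y_hat[t|t-1], with a Naive start (y_hat[2|1] = y_1)."""
    forecast = y[0]
    error = 0.0
    for value in y[1:]:
        error = value - forecast
        forecast += alpha * error  # error-correction form
    return error


slope = 10.0
trend = [100.0 + slope * t for t in range(200)]  # hypothetical noise-free linear trend
for alpha in (0.2, 0.7):
    # The error converges to slope / alpha: 50.0 for alpha = 0.2, about 14.29 for 0.7
    print(alpha, round(ses_final_error(trend, alpha), 2), round(slope / alpha, 2))
```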

SES and Seasonal Series

[Figure: UK Hourly Electricity Demand (Demand against Day, late October 2008), with SES forecasts for Alpha = 0.2 and Alpha = 0.7]

• With a low alpha, seasonality is not captured.
• With a high alpha, the noise is not smoothed AND the seasonal pattern is out by one period.
• In both cases, the forecasts are poor.

Is SES a Model-Based Method?

• It is sometimes stated that SES is an ‘ad hoc’ or ‘heuristic’ method, lacking a model-based foundation.

• This is wrong!

• It is true that when SES was first proposed, the method lacked a model foundation.

• Since then, two model forms have been found to underpin SES:
  • ARIMA(0,1,1) Model
  • State Space Local Level Model

• Model formulations become useful when looking at a whole family of Exponential Smoothing models (including trend and seasonality).
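For reference, one common way of writing the two model forms, in standard textbook notation rather than notation taken from these slides ($\ell_t$ is the level, θ the moving-average coefficient). In this single-source-of-error local level model the one-step forecast is $\hat{y}_{t+1|t} = \ell_t$, and the level update is exactly the SES error-correction form with $e_t = \varepsilon_t$. Differencing the same model gives the ARIMA(0,1,1) form with θ = 1 − α.

```latex
\begin{aligned}
&\text{Local level (state space):} && y_t = \ell_{t-1} + \varepsilon_t, \qquad \ell_t = \ell_{t-1} + \alpha\,\varepsilon_t \\
&\text{ARIMA}(0,1,1): && y_t - y_{t-1} = \varepsilon_t - \theta\,\varepsilon_{t-1}, \qquad \theta = 1-\alpha
\end{aligned}
```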

Summary

• The Arithmetic Mean is robust to outliers but very slow to respond to level shifts.

• Naïve responds immediately to level shifts but does not filter noise.

• Simple Moving Average (SMA) may be a good compromise but is not part of a wider family of model-based methods.

• Simple Exponential Smoothing (SES) allows suitable weights to be identified for past data and is part of a wider family of model-based methods.