
5 - Linear Regression Models (Sections 5.5 - 5.9)

Rob Hyndman (with Deppa edits)

February 26, 2021

Table of Contents

5.5 - Selecting Predictors
    Adjusted $R^2$ or $R^2_{adj}$
    Example 5.1 - U.S. Beverage Sales
    Example 5.2 - Growth Rates of Personal Consumption & Personal Income in the US
    Example 5.3 - Average Daily Sales (Fastenal)
    Cross-Validation
    Example 5.1 - U.S. Beverage Sales (cont'd)
    Akaike's Information Criterion (AIC)
    Corrected Akaike's Information Criterion (AICc)
    Schwarz's Bayesian Information Criterion (BIC)
    Best subset regression
    Stepwise Regression
    Beware of inference after selecting predictors
5.6 - Forecasting with Regression Models
    Ex-ante versus ex-post forecasts
    Example 5.4 - Australian Quarterly Beer Production
    Scenario-Based Forecasting
    Example 5.2 - Growth Rates of Personal Consumption & Personal Income in the US (cont'd)
    Building a Predictive Regression Model with Time Series Predictors (Looking Ahead)
    Prediction Intervals
    Example 5.2 - Growth Rates of Personal Consumption & Personal Income in the US (cont'd)
    Example 5.1 - U.S. Beverage Sales (cont'd)
5.8 More Nonlinear Terms
    Transformations
    Log-log model
    Example 5.5 - Liquor Store Sales (A Big Example)

5.5 - Selecting Predictors

When there are many possible predictors, we need some strategy for selecting the best predictors to use in a regression model.

A common approach that is not recommended is to plot the forecast variable against a particular predictor and if there is no noticeable relationship, drop that predictor from the model. This is invalid because it is not always possible to see the relationship from a scatterplot, especially when the effects of other predictors have not been accounted for.

Another common approach which is also invalid is to do a multiple linear regression on all the predictors and disregard all variables whose $p$-values are greater than 0.05. To start with, statistical significance does not always indicate predictive value. Even if forecasting is not the goal, this is not a good strategy, because the $p$-values can be misleading when two or more predictors are correlated with each other (see Section 5.9).

Instead, we will use a measure of predictive accuracy. Five such measures are introduced in this section.

Adjusted $R^2$ or $R^2_{adj}$

Computer output for a regression will always give the $R^2$ value, discussed in Section 5.1. However, it is not a good measure of the predictive ability of a model. Imagine a model which produces forecasts that are exactly 20% of the actual values. In that case, the $R^2$ value would be 1 (indicating perfect correlation), but the forecasts are not close to the actual values (they are 80% off)!
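A tiny numeric illustration of this point (made-up values, not from the text):

y = c(100, 120, 140, 160) # hypothetical actual values
fc = 0.2 * y # forecasts that are exactly 20% of the actuals
cor(y, fc)^2 # R-squared is exactly 1, yet every forecast is 80% too small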

In addition, $R^2$ does not allow for "degrees of freedom" to be taken into account. Adding any variable or term to the model will increase the value of $R^2$, even if that variable is irrelevant. For these reasons, forecasters should not use $R^2$ to determine whether a model will give good predictions.

An equivalent idea is to select the model which gives the minimum sum of squared errors (SSE), also called the residual sum of squares (RSS), given by

$$\text{SSE} = \sum_{t=1}^{T} e_t^2.$$

Minimizing the SSE is equivalent to maximizing $R^2$, since

$$R^2 = 1 - \frac{\text{SSE}}{\text{SST}},$$

where $\text{SST} = \sum_{t=1}^{T}(y_t - \bar{y})^2$ is the total sum of squares. Maximizing $R^2$ will always choose the model with the most variables, and so is not a valid way of selecting predictors.

An alternative which is designed to overcome these problems is the adjusted $R^2$, or $R^2_{adj}$, given by

$$R^2_{adj} = 1 - (1 - R^2)\,\frac{T-1}{T-k-1},$$

where $T$ is the number of observations in the time series, $k$ is the number of terms in our model besides the intercept, and $R^2$ is the usual R-square. The adjusted $R^2$ will no longer necessarily increase with each added predictor. Using this measure, the best model will be the one with the largest value of $R^2_{adj}$.

Maximizing the adjusted $R^2$ is equivalent to minimizing the standard error of the regression, $\hat{\sigma}_e$, which is given by

$$\hat{\sigma}_e = \sqrt{\frac{1}{T-k-1}\sum_{t=1}^{T} e_t^2}.$$

Maximizing the adjusted $R^2$ works quite well as a method of selecting predictors, although it does tend to err on the side of selecting too many predictors.
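As a quick sketch (not from the text), $R^2_{adj}$ can be computed directly from this definition for any fitted lm/tslm object and checked against what summary() reports:

adj_r2 = function(fit) {
  r2 = summary(fit)$r.squared # the usual R-square
  n = length(residuals(fit)) # number of observations (T in the formula above)
  k = length(coef(fit)) - 1 # terms besides the intercept
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}
# adj_r2(fit) should match summary(fit)$adj.r.squared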

Example 5.1 - U.S. Beverage Sales

In our work with these data in the previous R Markdown file we fit models with seasonal (monthly) dummy variables and long-term trends that were linear, quadratic, and cubic in time. How do these models compare on the basis of the adjusted $R^2$?

require(fpp2)

## Loading required package: fpp2

## Loading required package: ggplot2

## Loading required package: forecast

## Loading required package: fma

## Loading required package: expsmooth

Beverage = read.csv(file="http://course1.winona.edu/bdeppa/FIN%20335/Datasets/US%20Beverage%20Sales.csv")
names(Beverage)

## [1] "Time" "Month" "Year" "Date" "Sales"

head(Beverage)

## Time Month Year Date Sales
## 1 1 1 1992 1/1/1992 3519
## 2 2 2 1992 2/1/1992 3803
## 3 3 3 1992 3/1/1992 4332
## 4 4 4 1992 4/1/1992 4251
## 5 5 5 1992 5/1/1992 4661
## 6 6 6 1992 6/1/1992 4811

BevSales = ts(Beverage$Sales,start=1992,frequency=12)
BevSales = BevSales/monthdays(BevSales)

autoplot(BevSales) + ggtitle("Monthly US Beverage Sales") + xlab("Year") + ylab("Beverage Sales")

mod1 = tslm(BevSales~trend+season)
mod2 = tslm(BevSales~poly(trend,2)+season)
mod3 = tslm(BevSales~poly(trend,3)+season)
summary(mod1)

##
## Call:
## tslm(formula = BevSales ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.1738 -4.1825 -0.0461 4.0747 18.5953
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.084e+02 1.762e+00 61.525 < 2e-16 ***
## trend 4.124e-01 8.867e-03 46.505 < 2e-16 ***
## season2 2.127e+01 2.252e+00 9.445 < 2e-16 ***
## season3 2.094e+01 2.252e+00 9.299 < 2e-16 ***
## season4 2.843e+01 2.252e+00 12.624 < 2e-16 ***
## season5 3.870e+01 2.252e+00 17.183 < 2e-16 ***
## season6 4.849e+01 2.253e+00 21.525 < 2e-16 ***
## season7 3.049e+01 2.253e+00 13.534 < 2e-16 ***
## season8 3.763e+01 2.253e+00 16.704 < 2e-16 ***
## season9 3.290e+01 2.253e+00 14.601 < 2e-16 ***
## season10 2.162e+01 2.254e+00 9.593 < 2e-16 ***
## season11 2.310e+01 2.254e+00 10.247 < 2e-16 ***
## season12 1.348e+01 2.254e+00 5.981 1.31e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.168 on 167 degrees of freedom
## Multiple R-squared: 0.9456, Adjusted R-squared: 0.9417
## F-statistic: 241.9 on 12 and 167 DF, p-value: < 2.2e-16

summary(mod2)

##
## Call:
## tslm(formula = BevSales ~ poly(trend, 2) + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.2721 -4.2924 -0.4943 4.2403 15.3598
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 145.704 1.544 94.337 < 2e-16 ***
## poly(trend, 2)1 287.466 5.992 47.973 < 2e-16 ***
## poly(trend, 2)2 20.457 5.979 3.421 0.000784 ***
## season2 21.279 2.183 9.746 < 2e-16 ***
## season3 20.954 2.183 9.598 < 2e-16 ***
## season4 28.449 2.183 13.030 < 2e-16 ***
## season5 38.721 2.183 17.733 < 2e-16 ***
## season6 48.505 2.184 22.213 < 2e-16 ***
## season7 30.508 2.184 13.970 < 2e-16 ***
## season8 37.652 2.184 17.239 < 2e-16 ***
## season9 32.916 2.184 15.069 < 2e-16 ***
## season10 21.629 2.185 9.901 < 2e-16 ***
## season11 23.102 2.185 10.573 < 2e-16 ***
## season12 13.483 2.185 6.170 5.04e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.979 on 166 degrees of freedom
## Multiple R-squared: 0.9492, Adjusted R-squared: 0.9452
## F-statistic: 238.5 on 13 and 166 DF, p-value: < 2.2e-16

summary(mod3)

##
## Call:
## tslm(formula = BevSales ~ poly(trend, 3) + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.6894 -3.5073 -0.2272 3.2039 12.7256
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 146.213 1.301 112.387 < 2e-16 ***
## poly(trend, 3)1 287.750 5.042 57.071 < 2e-16 ***
## poly(trend, 3)2 20.457 5.031 4.066 7.38e-05 ***
## poly(trend, 3)3 42.150 5.057 8.335 2.88e-14 ***
## season2 21.185 1.837 11.532 < 2e-16 ***
## season3 20.767 1.837 11.304 < 2e-16 ***
## season4 28.169 1.837 15.331 < 2e-16 ***
## season5 38.349 1.838 20.868 < 2e-16 ***
## season6 48.042 1.838 26.136 < 2e-16 ***
## season7 29.953 1.839 16.290 < 2e-16 ***
## season8 37.004 1.839 20.119 < 2e-16 ***
## season9 32.176 1.840 17.487 < 2e-16 ***
## season10 20.797 1.841 11.298 < 2e-16 ***
## season11 22.177 1.842 12.041 < 2e-16 ***
## season12 12.464 1.843 6.764 2.22e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.031 on 165 degrees of freedom
## Multiple R-squared: 0.9642, Adjusted R-squared: 0.9612
## F-statistic: 317.8 on 14 and 165 DF, p-value: < 2.2e-16

Comparing the values, we see that despite being the most complex model with the largest number of terms, the cubic long-term trend model has the highest adjusted $R^2$ (0.9612).
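A compact way to pull these values out side by side (a small sketch using the three models fit above):

sapply(list(linear = mod1, quadratic = mod2, cubic = mod3),
       function(m) summary(m)$adj.r.squared)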

Example 5.2 - Growth Rates of Personal Consumption & Personal Income in the US

The dataset uschange in the fpp2 library contains percentage changes in quarterly personal consumption expenditure, personal disposable income, production, savings and the unemployment rate for the US, 1970 to 2016. These data are also available on FRED. They are currently stored as a single multivariate time series object with five columns. For the purposes of developing regression models with these data we will later convert this dataset to a data frame.

require(fpp2)
mod1 = tslm(Consumption ~ Income + Production + Savings + Unemployment, data = uschange)
mod2 = tslm(Consumption ~ Income + Savings + Unemployment, data = uschange)
summary(mod1)

##
## Call:
## tslm(formula = Consumption ~ Income + Production + Savings +
## Unemployment, data = uschange)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.88296 -0.17638 -0.03679 0.15251 1.20553
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.26729 0.03721 7.184 1.68e-11 ***
## Income 0.71449 0.04219 16.934 < 2e-16 ***
## Production 0.04589 0.02588 1.773 0.0778 .
## Savings -0.04527 0.00278 -16.287 < 2e-16 ***
## Unemployment -0.20477 0.10550 -1.941 0.0538 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3286 on 182 degrees of freedom
## Multiple R-squared: 0.754, Adjusted R-squared: 0.7486
## F-statistic: 139.5 on 4 and 182 DF, p-value: < 2.2e-16

summary(mod2)

##
## Call:
## tslm(formula = Consumption ~ Income + Savings + Unemployment,
## data = uschange)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.82491 -0.17737 -0.02716 0.14406 1.25913
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.281017 0.036607 7.677 9.47e-13 ***
## Income 0.730497 0.041455 17.622 < 2e-16 ***
## Savings -0.045990 0.002766 -16.629 < 2e-16 ***
## Unemployment -0.341346 0.072526 -4.707 4.96e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3305 on 183 degrees of freedom
## Multiple R-squared: 0.7497, Adjusted R-squared: 0.7456
## F-statistic: 182.7 on 3 and 183 DF, p-value: < 2.2e-16

Again we see that the more complicated model, having more terms, has the larger adjusted $R^2$ value, but only slightly.

Example 5.3 - Average Daily Sales (Fastenal)

When exploring the use of an Intervention Term in modeling the monthly average daily sales for Fastenal between 2004-2013, we fit several increasingly complex models to this time series.

Fastenal = read.csv("http://course1.winona.edu/bdeppa/FIN%20335/Datasets/Fastenal%20Sales%20(2004-2013).csv")
names(Fastenal)

## [1] "Time" "Month" "Month.Num" ## [4] "Year" "NumBDays" "AvSalesPD" ## [7] "Total.Sales" "Total.Fastner" "Total.Nonfastner"

TotSales = Fastenal$Total.Sales
TotSales = ts(TotSales,start=2004,frequency=12)
TotSales = TotSales/1000000
AvgSalesDay = TotSales/Fastenal$NumBDays
autoplot(AvgSalesDay) + xlab("Year") + ylab("Avg Sales Per Business Day (millions)")

# Create intervention term (pre vs. post 2009)
t = time(AvgSalesDay)
t

## Jan Feb Mar Apr May Jun Jul
## 2004 2004.000 2004.083 2004.167 2004.250 2004.333 2004.417 2004.500
## 2005 2005.000 2005.083 2005.167 2005.250 2005.333 2005.417 2005.500
## 2006 2006.000 2006.083 2006.167 2006.250 2006.333 2006.417 2006.500
## 2007 2007.000 2007.083 2007.167 2007.250 2007.333 2007.417 2007.500
## 2008 2008.000 2008.083 2008.167 2008.250 2008.333 2008.417 2008.500
## 2009 2009.000 2009.083 2009.167 2009.250 2009.333 2009.417 2009.500
## 2010 2010.000 2010.083 2010.167 2010.250 2010.333 2010.417 2010.500
## 2011 2011.000 2011.083 2011.167 2011.250 2011.333 2011.417 2011.500
## 2012 2012.000 2012.083 2012.167 2012.250 2012.333 2012.417 2012.500
## 2013 2013.000 2013.083 2013.167 2013.250 2013.333 2013.417 2013.500
## Aug Sep Oct Nov Dec
## 2004 2004.583 2004.667 2004.750 2004.833 2004.917
## 2005 2005.583 2005.667 2005.750 2005.833 2005.917
## 2006 2006.583 2006.667 2006.750 2006.833 2006.917
## 2007 2007.583 2007.667 2007.750 2007.833 2007.917
## 2008 2008.583 2008.667 2008.750 2008.833 2008.917
## 2009 2009.583 2009.667 2009.750 2009.833 2009.917
## 2010 2010.583 2010.667 2010.750 2010.833 2010.917
## 2011 2011.583 2011.667 2011.750 2011.833 2011.917
## 2012 2012.583 2012.667 2012.750 2012.833 2012.917
## 2013 2013.583 2013.667 2013.750 2013.833 2013.917

d2009 = as.numeric(t>2009)
d2009

## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [36] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
## [71] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [106] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

td = ts(d2009,start=2004)

# No shift
fast.mod0 = tslm(AvgSalesDay~trend+season)
summary(fast.mod0)

##
## Call:
## tslm(formula = AvgSalesDay ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7329 -0.3189 0.1807 0.4858 1.0286
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.318742 0.256926 16.809 <2e-16 ***
## trend 0.059361 0.001954 30.372 <2e-16 ***
## season2 -0.002059 0.330024 -0.006 0.9950
## season3 0.265279 0.330041 0.804 0.4233
## season4 0.186757 0.330070 0.566 0.5727
## season5 0.306577 0.330111 0.929 0.3551
## season6 0.458594 0.330163 1.389 0.1677
## season7 0.192444 0.330226 0.583 0.5613
## season8 0.419758 0.330302 1.271 0.2065
## season9 0.633968 0.330388 1.919 0.0577 .
## season10 0.433746 0.330486 1.312 0.1922
## season11 -0.142484 0.330596 -0.431 0.6673
## season12 -0.716150 0.330718 -2.165 0.0326 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7379 on 107 degrees of freedom
## Multiple R-squared: 0.8989, Adjusted R-squared: 0.8876
## F-statistic: 79.27 on 12 and 107 DF, p-value: < 2.2e-16

# Shift only, no changes in trend or seasonal pattern
fast.mod1 = tslm(AvgSalesDay~trend+td+season)
summary(fast.mod1)

##
## Call:
## tslm(formula = AvgSalesDay ~ trend + td + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.69920 -0.16692 0.02304 0.18605 0.67554
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.607945 0.136701 26.393 < 2e-16 ***
## trend 0.089980 0.002007 44.836 < 2e-16 ***
## td -2.433156 0.138574 -17.558 < 2e-16 ***
## season2 0.210638 0.168155 1.253 0.213093
## season3 0.447356 0.168047 2.662 0.008975 **
## season4 0.338214 0.167963 2.014 0.046583 *
## season5 0.427416 0.167903 2.546 0.012347 *
## season6 0.548814 0.167867 3.269 0.001454 **
## season7 0.252044 0.167855 1.502 0.136185
## season8 0.448739 0.167867 2.673 0.008702 **
## season9 0.632329 0.167903 3.766 0.000273 ***
## season10 0.401488 0.167963 2.390 0.018598 *
## season11 -0.205361 0.168047 -1.222 0.224402
## season12 -0.809646 0.168155 -4.815 4.92e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.375 on 106 degrees of freedom
## Multiple R-squared: 0.9741, Adjusted R-squared: 0.971
## F-statistic: 307.1 on 13 and 106 DF, p-value: < 2.2e-16

# Shift and trend change after intervention - no change in seasonal pattern
fast.mod2 = tslm(AvgSalesDay~trend*td+season)
summary(fast.mod2)

##
## Call:
## tslm(formula = AvgSalesDay ~ trend * td + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.35700 -0.15728 -0.00371 0.16385 0.63783
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.961557 0.120477 32.882 < 2e-16 ***
## trend 0.078574 0.002228 35.265 < 2e-16 ***
## td -3.937620 0.230027 -17.118 < 2e-16 ***
## season2 0.213737 0.136364 1.567 0.120030
## season3 0.449836 0.136277 3.301 0.001317 **
## season4 0.340074 0.136208 2.497 0.014089 *
## season5 0.428655 0.136160 3.148 0.002139 **
## season6 0.549433 0.136130 4.036 0.000103 ***
## season7 0.252044 0.136121 1.852 0.066890 .
## season8 0.448119 0.136130 3.292 0.001356 **
## season9 0.631089 0.136160 4.635 1.03e-05 ***
## season10 0.399628 0.136208 2.934 0.004111 **
## season11 -0.207841 0.136277 -1.525 0.130231
## season12 -0.812746 0.136364 -5.960 3.41e-08 ***
## trend:td 0.024054 0.003209 7.496 2.19e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3041 on 105 degrees of freedom
## Multiple R-squared: 0.9831, Adjusted R-squared: 0.9809
## F-statistic: 437.6 on 14 and 105 DF, p-value: < 2.2e-16

# Shift, trend, and seasonal pattern change after 2009
fast.mod3 = tslm(AvgSalesDay~trend*td*season-trend:td:season-trend:season)
summary(fast.mod3)

##
## Call:
## tslm(formula = AvgSalesDay ~ trend * td * season - trend:td:season -
## trend:season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.26289 -0.15615 0.02302 0.13448 0.60532
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.895098 0.146694 26.553 < 2e-16 ***
## trend 0.078120 0.002313 33.769 < 2e-16 ***
## td -3.796127 0.308574 -12.302 < 2e-16 ***
## season2 0.247407 0.190161 1.301 0.196426
## season3 0.430885 0.190035 2.267 0.025656 *
## season4 0.408567 0.189936 2.151 0.034034 *
## season5 0.482849 0.189866 2.543 0.012620 *
## season6 0.611788 0.189823 3.223 0.001745 **
## season7 0.407616 0.189809 2.148 0.034326 *
## season8 0.521845 0.189823 2.749 0.007166 **
## season9 0.715122 0.189866 3.766 0.000289 ***
## season10 0.529715 0.189936 2.789 0.006402 **
## season11 -0.063630 0.190035 -0.335 0.738495
## season12 -0.617889 0.190161 -3.249 0.001606 **
## trend:td 0.025009 0.003359 7.445 4.59e-11 ***
## td:season2 -0.107348 0.283770 -0.378 0.706065
## td:season3 -0.002158 0.283591 -0.008 0.993945
## td:season4 -0.177094 0.283451 -0.625 0.533630
## td:season5 -0.148545 0.283352 -0.524 0.601345
## td:season6 -0.164915 0.283292 -0.582 0.561867
## td:season7 -0.351400 0.283272 -1.241 0.217877
## td:season8 -0.187757 0.283292 -0.663 0.509101
## td:season9 -0.208420 0.283352 -0.736 0.463836
## td:season10 -0.300577 0.283451 -1.060 0.291671
## td:season11 -0.328874 0.283591 -1.160 0.249118
## td:season12 -0.430216 0.283770 -1.516 0.132856
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3135 on 94 degrees of freedom
## Multiple R-squared: 0.984, Adjusted R-squared: 0.9797
## F-statistic: 230.8 on 25 and 94 DF, p-value: < 2.2e-16

Here we see that the most highly parameterized model (fast.mod3) does not have the best adjusted $R^2$; the simpler shift-plus-trend-change model (fast.mod2) does, 0.9809 versus 0.9797.

Cross-Validation

Section 3.4 introduced time series cross-validation as a general and useful tool for determining the predictive ability of a forecasting model. The training/test approach is probably the best way to cross-validate a forecast model, and we will explore this approach using regression forecast models in some examples below. Before doing this, however, we will examine a simple (but not necessarily effective) way to cross-validate a regression model.

For regression models, it is also possible to use classical leave-one-out cross-validation to select predictors (Bergmeir, Hyndman, and Koo 2018). This is faster and makes more efficient use of the data. The procedure uses the following steps:

1. Remove observation $t$ from the data set, and fit the model using the remaining data. Then compute the error ($e^*_t = y_t - \hat{y}^*_t$) for the omitted observation. It is important to note that this is not the same as the residual, because observation $t$ was not used in fitting the model to obtain our estimate. The $*$ notation means that observation $t$ was not used to obtain the fitted or predicted value $\hat{y}^*_t$.

2. Repeat step 1 for $t = 1, 2, \dots, T$.

3. Compute the MSE from $e^*_1, \dots, e^*_T$; we usually call this the PRESS statistic, however the author calls it CV.

Although this looks like a time-consuming procedure, there are very fast methods of calculating CV, so that it takes no longer than fitting one model to the full data set. The equation for computing CV efficiently is given in Section 5.7.

Under this criterion, the "best" model is the one with the smallest value of CV. The forecast library has a function CV() that computes this for a model fit to a time series.
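The fast calculation mentioned above can be sketched using the leverage (hat) values, so that no model is ever refit; this sketch should reproduce the CV column that CV() reports below:

press_cv = function(fit) {
  e = residuals(fit) # ordinary residuals
  h = hatvalues(fit) # leverages from the hat matrix
  mean((e / (1 - h))^2) # leave-one-out MSE (the PRESS statistic divided by T)
}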

Example 5.1 - U.S. Beverage Sales (cont’d)

require(fpp2)
autoplot(BevSales) + ggtitle("Monthly US Beverage Sales") + xlab("Year") + ylab("Beverage Sales")

mod1 = tslm(BevSales~trend+season)
mod2 = tslm(BevSales~poly(trend,2)+season)
mod3 = tslm(BevSales~poly(trend,3)+season)

CV(mod1)

## CV AIC AICc BIC AdjR2
## 41.0164330 669.4626000 672.0080545 714.1639959 0.9416903

CV(mod2)

## CV AIC AICc BIC AdjR2
## 38.7391876 659.1969387 662.1237680 707.0912915 0.9452032

CV(mod3)

## CV AIC AICc BIC AdjR2
## 27.8693500 597.9424541 601.2798774 649.0297637 0.9612062

The leave-one-out cross-validation suggests the model with cubic polynomial trend and seasonal (monthly) dummy variables has the best performance. You can see that the CV function also returns other measures that can be used to choose between rival models, one of them being the adjusted $R^2$ which we discussed above.

A better approach to cross-validation is to form train/test sets and use a model fit to the training data to forecast the test set. We will do this for the beverage sales data by using the last 24 months (2 years) of the time series as our test set. Notice in fitting the quadratic and cubic polynomial trends to this time series we are specifying the quadratic ($t^2$) and cubic ($t^3$) terms explicitly (rather than using the poly() function). This allows us to forecast ahead, whereas the poly() approach does not. The I(trend^2) notation is just how we have to specify that we want $t^2$ in our long-term trend model.

BevSales.train = head(BevSales,156)
BevSales.test = tail(BevSales,24)
mod1.train = tslm(BevSales.train~trend+season)
mod2.train = tslm(BevSales.train~trend+I(trend^2)+season)
mod3.train = tslm(BevSales.train~trend+I(trend^2)+I(trend^3)+season)
mod1.fc = forecast(mod1.train,h=24)
mod2.fc = forecast(mod2.train,h=24)
mod3.fc = forecast(mod3.train,h=24)

# Examine forecast tables
mod1.fc

## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2005 170.8216 163.3789 178.2643 159.3949 182.2483
## Feb 2005 191.4749 184.0322 198.9176 180.0482 202.9016
## Mar 2005 192.2831 184.8405 199.7258 180.8564 203.7098
## Apr 2005 199.6390 192.1963 207.0816 188.2123 211.0657
## May 2005 209.0176 201.5749 216.4603 197.5909 220.4443
## Jun 2005 219.8826 212.4399 227.3252 208.4558 231.3093
## Jul 2005 202.6008 195.1581 210.0434 191.1740 214.0275
## Aug 2005 209.6405 202.1978 217.0831 198.2137 221.0672
## Sep 2005 206.0697 198.6271 213.5124 194.6430 217.4964
## Oct 2005 194.7819 187.3392 202.2246 183.3552 206.2086
## Nov 2005 195.3979 187.9553 202.8406 183.9712 206.8247
## Dec 2005 186.7769 179.3342 194.2196 175.3502 198.2036
## Jan 2006 175.3590 167.8931 182.8249 163.8967 186.8214
## Feb 2006 196.0123 188.5464 203.4782 184.5500 207.4747
## Mar 2006 196.8206 189.3547 204.2865 185.3582 208.2829
## Apr 2006 204.1764 196.7105 211.6423 192.7140 215.6388
## May 2006 213.5551 206.0892 221.0210 202.0927 225.0174
## Jun 2006 224.4200 216.9541 231.8859 212.9576 235.8823
## Jul 2006 207.1382 199.6723 214.6041 195.6758 218.6005
## Aug 2006 214.1779 206.7120 221.6438 202.7155 225.6402
## Sep 2006 210.6072 203.1413 218.0731 199.1448 222.0695
## Oct 2006 199.3193 191.8534 206.7852 187.8570 210.7817
## Nov 2006 199.9354 192.4695 207.4013 188.4730 211.3977
## Dec 2006 191.3144 183.8485 198.7803 179.8520 202.7767

mod2.fc

## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2005 169.6785 162.1327 177.2243 158.0931 181.2639
## Feb 2005 190.2851 182.7302 197.8401 178.6857 201.8846
## Mar 2005 191.0467 183.4822 198.6112 179.4327 202.6608
## Apr 2005 198.3559 190.7815 205.9302 186.7267 209.9851
## May 2005 207.6879 200.1033 215.2725 196.0430 219.3328
## Jun 2005 218.5062 210.9110 226.1013 206.8451 230.1673
## Jul 2005 201.1777 193.5717 208.7838 189.4999 212.8556
## Aug 2005 208.1708 200.5534 215.7881 196.4756 219.8659
## Sep 2005 204.5534 196.9245 212.1823 192.8404 216.2663
## Oct 2005 193.2189 185.5780 200.8598 181.4876 204.9502
## Nov 2005 193.7883 186.1351 201.4414 182.0381 205.5385
## Dec 2005 185.1206 177.4548 192.7864 173.3510 196.8902
## Jan 2006 173.6527 165.9505 181.3549 161.8273 185.4782
## Feb 2006 194.2527 186.5353 201.9701 182.4039 206.1015
## Mar 2006 195.0076 187.2745 202.7407 183.1347 206.8805
## Apr 2006 202.3101 194.5609 210.0593 190.4125 214.2077
## May 2006 211.6354 203.8697 219.4012 199.7125 223.5584
## Jun 2006 222.4471 214.6644 230.2297 210.4980 234.3961
## Jul 2006 205.1119 197.3119 212.9120 193.1362 217.0876
## Aug 2006 212.0983 204.2805 219.9162 200.0953 224.1014
## Sep 2006 208.4743 200.6382 216.3104 196.4432 220.5053
## Oct 2006 197.1331 189.2784 204.9878 185.0734 209.1928
## Nov 2006 197.6958 189.8220 205.5696 185.6069 209.7848
## Dec 2006 189.0215 181.1282 196.9148 176.9027 201.1403

mod3.fc

## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2005 177.1093 170.2837 183.9349 166.6294 187.5892
## Feb 2005 198.1581 191.3047 205.0115 187.6355 208.6807
## Mar 2005 199.3795 192.4961 206.2628 188.8109 209.9481
## Apr 2005 207.1662 200.2505 214.0818 196.5480 217.7843
## May 2005 216.9934 210.0430 223.9437 206.3219 227.6648
## Jun 2005 228.3245 221.3369 235.3121 217.5959 239.0531
## Jul 2005 211.5266 204.4991 218.5541 200.7367 222.3165
## Aug 2005 219.0679 211.9977 226.1381 208.2124 229.9234
## Sep 2005 216.0165 208.9007 223.1323 205.0910 226.9419
## Oct 2005 205.2656 198.1012 212.4300 194.2655 216.2656
## Nov 2005 206.4363 199.2202 213.6524 195.3568 217.5157
## Dec 2005 198.3876 191.1166 205.6586 187.2238 209.5514
## Jan 2006 188.5876 181.1583 196.0169 177.1809 199.9944
## Feb 2006 209.8445 202.3489 217.3400 198.3360 221.3530
## Mar 2006 211.2765 203.7108 218.8421 199.6603 222.8926
## Apr 2006 219.2763 211.6366 226.9160 207.5465 231.0061
## May 2006 229.3192 221.6014 237.0369 217.4694 241.1689
## Jun 2006 240.8685 233.0685 248.6685 228.8925 252.8445
## Jul 2006 224.2913 216.4048 232.1778 212.1825 236.4001
## Aug 2006 232.0559 224.0785 240.0332 219.8076 244.3041
## Sep 2006 229.2302 221.1576 237.3028 216.8356 241.6247
## Oct 2006 218.7076 210.5352 226.8800 206.1599 231.2553
## Nov 2006 220.1091 211.8323 228.3859 207.4011 232.8171
## Dec 2006 212.2938 203.9080 220.6796 199.4184 225.1692

# Plot the forecasts
plot(mod1.fc)

plot(mod2.fc)

plot(mod3.fc)

# Calculate the RSS/SSE, i.e. sum of the squared forecast errors, for the 24 forecast values
sum((BevSales.test - mod1.fc$mean)^2)

## [1] 2912.576

sum((BevSales.test - mod2.fc$mean)^2)

## [1] 3752.086

sum((BevSales.test - mod3.fc$mean)^2)

## [1] 1316.556

# Use the accuracy(f,y) function in the forecast library
accuracy(mod1.fc,BevSales.test)

## ME RMSE MAE MPE MAPE
## Training set 1.833358e-16 5.278594 4.312035 -0.09767203 2.556686
## Test set 8.719652e+00 11.016230 9.572466 4.00298255 4.491185
## MASE ACF1 Theil's U
## Training set 0.6644744 0.6133073 NA
## Test set 1.4750943 0.1963101 0.6714005

accuracy(mod2.fc,BevSales.test)

## ME RMSE MAE MPE MAPE
## Training set -3.642919e-16 5.254507 4.261748 -0.09283479 2.520439
## Test set 1.041931e+01 12.503477 11.073829 4.81475260 5.190392
## MASE ACF1 Theil's U
## Training set 0.6567253 0.6081548 NA
## Test set 1.7064508 0.2275217 0.7633889

accuracy(mod3.fc,BevSales.test)

## ME RMSE MAE MPE MAPE
## Training set -5.457672e-16 4.621882 3.811008 -0.07177486 2.328146
## Test set -4.133244e+00 7.406517 5.963064 -2.10902412 2.948918
## MASE ACF1 Theil's U
## Training set 0.5872673 0.4837070 NA
## Test set 0.9188941 0.1024153 0.4223368

The quadratic model has quite poor predictive performance, while the cubic polynomial long-term trend model performs best. Using split-sample cross-validation, at this point we would opt for the model with the cubic long-term trend and seasonal (monthly) dummy variables.

Akaike’s Information Criterion (AIC)

A method closely related to CV is Akaike's Information Criterion (AIC), which we define as

$$\text{AIC} = T\log\left(\frac{\text{SSE}}{T}\right) + 2(k+2),$$

where $T$ is the number of observations used for estimation and $k$ is the number of predictors/terms in the model.

Different computer packages use slightly different definitions for the AIC, although they should all lead to the same model being selected. The $k+2$ part of the equation occurs because there are $k+2$ parameters in the model: the $k$ coefficients for the predictors/terms, one more for the intercept, and one for the variance of the residuals. The idea here is to penalize the fit of the model (SSE) by the number of parameters that need to be estimated.

The model with the minimum value of the AIC is often the best model for forecasting. For large values of $T$, minimizing the AIC is equivalent to minimizing the CV value.

Corrected Akaike’s Information Criterion (AICc)

For small values of $T$, the AIC tends to select too many predictors, and so a bias-corrected version of the AIC has been developed,

$$\text{AIC}_c = \text{AIC} + \frac{2(k+2)(k+3)}{T-k-3}.$$

As with the AIC, the AICc should be minimized.

Schwarz’s Bayesian Information Criterion (BIC)

A related measure is Schwarz's Bayesian Information Criterion (BIC),

$$\text{BIC} = T\log\left(\frac{\text{SSE}}{T}\right) + (k+2)\log(T).$$

As with the AIC measures, minimizing the BIC is intended to give the "best" model. The model chosen by the BIC is either the same as that chosen by the AIC, or one with fewer terms. This is because the BIC penalizes the number of parameters more heavily than the AIC. For large values of $T$, minimizing BIC is similar to leave-$v$-out cross-validation when $v = T\left[1 - 1/(\log(T)-1)\right]$.

Many statisticians like to use the BIC because it has the feature that if there is a true underlying model, the BIC will select that model given enough data. However, in reality there is rarely if ever a true underlying model, and even if there were, selecting that model will not necessarily give the best forecasts (because the parameter estimates may not be accurate). Consequently, we prefer to use the AICc, AIC, or CV statistics, which have forecasting as their objective (and which give equivalent models for large $T$).
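To make the definitions concrete, here is a sketch that computes AIC, AICc and BIC for a fitted model straight from the formulas above (other software may differ by an additive constant, but the model rankings agree):

ic_by_definition = function(fit) {
  e = residuals(fit)
  n = length(e) # observations used for estimation (T in the formulas)
  k = length(coef(fit)) - 1 # predictors/terms besides the intercept
  sse = sum(e^2)
  aic = n * log(sse / n) + 2 * (k + 2)
  aicc = aic + 2 * (k + 2) * (k + 3) / (n - k - 3)
  bic = n * log(sse / n) + (k + 2) * log(n)
  c(AIC = aic, AICc = aicc, BIC = bic)
}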

Table 5.1: All 16 possible models for forecasting US consumption with 4 predictors (1 = predictor included, 0 = excluded).

Income  Production  Savings  Unemployment     CV   AIC  AICc   BIC  AdjR2
     1           1        1             1  0.116  -409  -409  -390  0.749
     1           0        1             1  0.116  -408  -408  -392  0.746
     1           1        1             0  0.118  -407  -407  -391  0.745
     1           0        1             0  0.129  -389  -389  -376  0.716
     1           1        0             1  0.278  -243  -243  -227  0.386
     1           0        0             1  0.283  -238  -238  -225  0.365
     1           1        0             0  0.289  -236  -236  -223  0.359
     0           1        1             1  0.293  -234  -234  -218  0.356
     0           1        1             0  0.300  -229  -229  -216  0.334
     0           1        0             1  0.303  -226  -226  -213  0.324
     0           0        1             1  0.306  -225  -224  -212  0.318
     0           1        0             0  0.314  -220  -219  -210  0.296
     0           0        0             1  0.314  -218  -218  -208  0.288
     1           0        0             0  0.372  -185  -185  -176  0.154
     0           0        1             0  0.414  -164  -164  -154  0.052
     0           0        0             0  0.432  -155  -155  -149  0.000

The best model contains all four predictors. However, a closer look at the results reveals some interesting features. There is clear separation between the models in the first four rows and the ones below. This indicates that Income and Savings are both more important variables than Production and Unemployment. Also, the first two rows have almost identical values of CV, AIC and AICc. So we could possibly drop the Production variable and get very similar forecasts. Note that Production and Unemployment are highly (negatively) correlated, as shown in Figure 5.5, so most of the predictive information in Production is also contained in the Unemployment variable.

require(GGally)

## Loading required package: GGally

## ## Attaching package: 'GGally'

## The following object is masked from 'package:fma':
##
## pigs

USchange = data.frame(uschange)
ggpairs(USchange)

mod1 = lm(Consumption ~ Income + Production + Savings + Unemployment, data = USchange)
summary(mod1)

##
## Call:
## lm(formula = Consumption ~ Income + Production + Savings + Unemployment,
## data = USchange)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.88296 -0.17638 -0.03679 0.15251 1.20553
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.26729 0.03721 7.184 1.68e-11 ***
## Income 0.71449 0.04219 16.934 < 2e-16 ***
## Production 0.04589 0.02588 1.773 0.0778 .
## Savings -0.04527 0.00278 -16.287 < 2e-16 ***
## Unemployment -0.20477 0.10550 -1.941 0.0538 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3286 on 182 degrees of freedom
## Multiple R-squared: 0.754, Adjusted R-squared: 0.7486
## F-statistic: 139.5 on 4 and 182 DF, p-value: < 2.2e-16

mod1.step = step(mod1)

## Start: AIC=-411.3
## Consumption ~ Income + Production + Savings + Unemployment
##
## Df Sum of Sq RSS AIC
## <none> 19.652 -411.30
## - Production 1 0.3396 19.992 -410.09
## - Unemployment 1 0.4068 20.059 -409.47
## - Savings 1 28.6413 48.293 -245.16
## - Income 1 30.9656 50.618 -236.37

summary(mod1.step)

##
## Call:
## lm(formula = Consumption ~ Income + Production + Savings + Unemployment,
## data = USchange)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.88296 -0.17638 -0.03679 0.15251 1.20553
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.26729 0.03721 7.184 1.68e-11 ***
## Income 0.71449 0.04219 16.934 < 2e-16 ***
## Production 0.04589 0.02588 1.773 0.0778 .
## Savings -0.04527 0.00278 -16.287 < 2e-16 ***
## Unemployment -0.20477 0.10550 -1.941 0.0538 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3286 on 182 degrees of freedom
## Multiple R-squared: 0.754, Adjusted R-squared: 0.7486
## F-statistic: 139.5 on 4 and 182 DF, p-value: < 2.2e-16

Best subset regression

Where possible, all potential regression models should be fitted (as was done in the example above) and the best model should be selected based on one of the measures discussed. This is known as “best subsets” regression or “all possible subsets” regression.

It is recommended that one of CV, AIC or AICc be used for this purpose. If the value of $T$ is large enough, they will all lead to the same model.

While $R^2_{adj}$ is very widely used, and has been around longer than the other measures, its tendency to select too many predictor variables makes it less suitable for forecasting than CV, AIC or AICc. Likewise, the tendency of the BIC to select too few variables makes it less suitable for forecasting than CV, AIC or AICc.

Stepwise Regression

If there are a large number of predictors, it is not possible to fit all possible models. For example, 40 predictors leads to $2^{40}$ (more than one trillion) possible models! Consequently, a strategy is required to limit the number of models to be explored.

An approach that works quite well is backwards stepwise regression:

1. Start with the model containing all potential predictors.

2. Remove one predictor at a time. Keep the model if it improves the measure of predictive accuracy.

3. Iterate until no further improvement.

If the number of potential predictors is too large, then backwards stepwise regression will not work and forward stepwise regression can be used instead. This procedure starts with a model that includes only the intercept. Predictors are added one at a time, and the one that most improves the measure of predictive accuracy is retained in the model. The procedure is repeated until no further improvement can be achieved.

Alternatively for either the backward or forward direction, a starting model can be one that includes a subset of potential predictors. In this case, an extra step needs to be included. For the backwards procedure we should also consider adding a predictor with each step, and for the forward procedure we should also consider dropping a predictor with each step. These are referred to as hybrid procedures.
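In R, the forward and hybrid procedures can both be sketched with the step() function (which selects by AIC); the setup below for the uschange predictors used earlier is an illustration, not the text's own code:

null.mod = lm(Consumption ~ 1, data = USchange) # intercept-only starting model
full.scope = ~ Income + Production + Savings + Unemployment # largest model allowed
forward.mod = step(null.mod, scope = full.scope, direction = "forward")
hybrid.mod = step(null.mod, scope = full.scope, direction = "both") # hybrid procedure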

It is important to realize that any stepwise approach is not guaranteed to lead to the best possible model, but it almost always leads to a good model. For further details see James et al. (2014).

Beware of inference after selecting predictors

We do not discuss statistical inference of the predictors in this book (e.g., looking at $p$-values associated with each predictor). If you do wish to look at the statistical significance of the predictors, beware that any procedure involving selecting predictors first will invalidate the assumptions behind the $p$-values. The procedures we recommend for selecting predictors are helpful when the model is used for forecasting; they are not helpful if you wish to study the effect of any predictor on the forecast variable.

5.6 - Forecasting with Regression Models

Recall that predictions of $y$ can be obtained using the equation

$$\hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 x_{1,t} + \hat{\beta}_2 x_{2,t} + \cdots + \hat{\beta}_k x_{k,t},$$

which comprises the estimated coefficients and ignores the error in the regression equation. Plugging in the values of the predictor variables for $t = 1, \dots, T$ returned the fitted (training-sample) values of $y$. What we are interested in here is forecasting future, yet unobserved, values of $y$.

Ex-ante versus ex-post forecasts

When using regression models for time series data, we need to distinguish between the different types of forecasts that can be produced, depending on what is assumed to be known when the forecasts are computed.

Ex ante forecasts are those that are made using only the information that is available in advance. For example, ex-ante forecasts for the percentage change in US consumption for quarters following the end of the sample should only use information that was available up to and including 2016 Q3. These are genuine forecasts, made in advance using whatever information is available at the time. Therefore, in order to generate ex-ante forecasts, the model requires future values (forecasts) of the predictors. To obtain these we can use one of the simple methods introduced in Section 3.1 or more sophisticated pure time series approaches that follow in Chapters 7 and 8. Alternatively, forecasts from some other source, such as a government agency, may be available and can be used.

Ex post forecasts are those that are made using later information on the predictors. For example, ex post forecasts of consumption may use the actual observations of the predictors, once these have been observed. These are not genuine forecasts, but are useful for studying the behaviour of forecasting models.

The model from which ex-post forecasts are produced should not be estimated using data from the forecast period. That is, ex-post forecasts can assume knowledge of the predictor variables (the $x$ variables), but should not assume knowledge of the data that are to be forecast (the $y$ variable).

A comparative evaluation of ex-ante forecasts and ex-post forecasts can help to separate out the sources of forecast uncertainty. This will show whether forecast errors have arisen due to poor forecasts of the predictor or due to a poor forecasting model.

Example 5.4 - Australian Quarterly Beer Production

Normally, we cannot use actual future values of the predictor variables when producing ex-ante forecasts because their values will not be known in advance. However, the special predictors introduced in Section 5.4 are all known in advance, as they are based on calendar variables (e.g., seasonal dummy variables or public holiday indicators) or deterministic functions of time (e.g. time trend). In such cases, there is no difference between ex ante and ex post forecasts.

beersub = window(ausbeer,start=1992)
beer.mod1 = tslm(beersub~trend+season)
summary(beer.mod1)

##
## Call:
## tslm(formula = beersub ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.903 -7.599 -0.459 7.991 21.789
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 441.80044 3.73353 118.333 < 2e-16 ***
## trend -0.34027 0.06657 -5.111 2.73e-06 ***
## season2 -34.65973 3.96832 -8.734 9.10e-13 ***
## season3 -17.82164 4.02249 -4.430 3.45e-05 ***
## season4 72.79641 4.02305 18.095 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.23 on 69 degrees of freedom
## Multiple R-squared: 0.9243, Adjusted R-squared: 0.9199
## F-statistic: 210.7 on 4 and 69 DF, p-value: < 2.2e-16

checkresiduals(beer.mod1)

##
## Breusch-Godfrey test for serial correlation of order up to 8
##
## data: Residuals from Linear regression model
## LM test = 9.3083, df = 8, p-value = 0.317

beer.fc = forecast(beer.mod1)
autoplot(beer.fc) + ggtitle("Forecasts of Beer Production (ML) using Linear Regression") + xlab("Year") + ylab("Beer Production (ML)")

Scenario-Based Forecasting

In this setting, the forecaster assumes possible scenarios for the predictor variables that are of interest. For example, a US policy maker may be interested in comparing the predicted change in consumption when there is a constant growth of 1% and 0.5% respectively for income and savings, with no change in the unemployment rate, versus a respective decline of 1% and 0.5%, for each of the four quarters following the end of the sample. The resulting forecasts are calculated below and shown in Figure 5.18. We should note that prediction intervals for scenario-based forecasts do not include the uncertainty associated with the future values of the predictor variables. They assume that the values of the predictors are known in advance.

Example 5.2 - Growth Rates of Personal Consumption & Personal Income in the US (cont’d)

require(fpp2)
mod1 = tslm(Consumption~Income+Savings+Unemployment,data=uschange)
summary(mod1)

##
## Call:
## tslm(formula = Consumption ~ Income + Savings + Unemployment,
## data = uschange)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.82491 -0.17737 -0.02716 0.14406 1.25913
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.281017 0.036607 7.677 9.47e-13 ***
## Income 0.730497 0.041455 17.622 < 2e-16 ***
## Savings -0.045990 0.002766 -16.629 < 2e-16 ***
## Unemployment -0.341346 0.072526 -4.707 4.96e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3305 on 183 degrees of freedom
## Multiple R-squared: 0.7497, Adjusted R-squared: 0.7456
## F-statistic: 182.7 on 3 and 183 DF, p-value: < 2.2e-16

checkresiduals(mod1)

##
## Breusch-Godfrey test for serial correlation of order up to 8
##
## data: Residuals from Linear regression model
## LM test = 15.958, df = 8, p-value = 0.04299

upX = cbind(Income=c(1,1,1,1), Savings=c(0.5,0.5,0.5,0.5), Unemployment=c(0,0,0,0))

upX = data.frame(upX)
upX

## Income Savings Unemployment
## 1 1 0.5 0
## 2 1 0.5 0
## 3 1 0.5 0
## 4 1 0.5 0

forecast.up = forecast(mod1,newdata=upX)
forecast.up

## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016 Q4 0.9885187 0.5619394 1.415098 0.3341497 1.642888
## 2017 Q1 0.9885187 0.5619394 1.415098 0.3341497 1.642888
## 2017 Q2 0.9885187 0.5619394 1.415098 0.3341497 1.642888
## 2017 Q3 0.9885187 0.5619394 1.415098 0.3341497 1.642888

downX = cbind(Income=c(-1,-1,-1,-1), Savings=c(-.5,-.5,-.5,-.5), Unemployment=c(.25,.25,.25,.25))
downX = data.frame(downX)
downX

## Income Savings Unemployment
## 1 -1 -0.5 0.25
## 2 -1 -0.5 0.25
## 3 -1 -0.5 0.25
## 4 -1 -0.5 0.25

forecast.down = forecast(mod1,newdata=downX)
forecast.down

## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016 Q4 -0.5118215 -0.9454555 -0.07818759 -1.177012 0.1533693
## 2017 Q1 -0.5118215 -0.9454555 -0.07818759 -1.177012 0.1533693
## 2017 Q2 -0.5118215 -0.9454555 -0.07818759 -1.177012 0.1533693
## 2017 Q3 -0.5118215 -0.9454555 -0.07818759 -1.177012 0.1533693

neutralX = cbind(Income=c(0,0,0,0), Savings=c(0,0,0,0), Unemployment=c(0,0,0,0))
neutralX = data.frame(neutralX)
forecast.neutral = forecast(mod1,newdata=neutralX)
forecast.neutral

## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016 Q4 0.2810169 -0.146696 0.7087297 -0.375091 0.9371247
## 2017 Q1 0.2810169 -0.146696 0.7087297 -0.375091 0.9371247
## 2017 Q2 0.2810169 -0.146696 0.7087297 -0.375091 0.9371247
## 2017 Q3 0.2810169 -0.146696 0.7087297 -0.375091 0.9371247

autoplot(uschange[,"Consumption"]) + ylab("% Changes in Consumption") + xlab("Year") + autolayer(forecast.up,PI=TRUE,series="Increase") + autolayer(forecast.down,PI=TRUE,series="Decrease") + autolayer(forecast.neutral,PI=TRUE,series="Neutral") + guides(colour=guide_legend(title="Scenario"))

The example above illustrates the difficulty (and opportunity) of forecasting one time series using other time series as predictors. Here we have modeled the percent change in consumption using the time series that represent the percent changes in income, savings, and unemployment. We cannot forecast future values of consumption without "knowing" the values of the predictor time series. Of course we cannot know future values of the predictor time series (income, savings & unemployment) any more than we can know the values of the response time series (consumption). However, we can use scenario forecasting as above to explore the forecast for the percent change in consumption under different possible future scenarios.

Building a Predictive Regression Model with Time Series Predictors (Looking Ahead)

The great advantage of regression models is that they can be used to capture important relationships between the forecast variable of interest and the predictor variables. A major challenge however (as is stated above), is that in order to generate ex-ante forecasts the model requires future values of each predictor. If scenario based forecasting is of interest then these models are extremely useful. However, if ex-ante forecasting is the main focus, obtaining forecasts of the predictors can be very challenging (in many cases generating forecasts for the predictor variables can be more challenging than forecasting directly the forecast variable without using predictors).

An alternative formulation is to use as predictors their lagged values. Assuming that we are interested in generating an $h$-step ahead forecast, we write

$$y_{t+h} = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \cdots + \beta_k x_{k,t} + \varepsilon_{t+h}$$

for $h = 1, 2, \dots$.

Alternatively, we can express the predictors as being lagged by $h$ time units. For example, if $h = 1$ then we are using values of the predictor time series immediately preceding the time we wish to forecast. For seasonal data with frequency 12 (monthly data) we might use $h = 12$. Using these ideas we could express the model as

$$y_t = \beta_0 + \beta_1 x_{1,t-1} + \cdots + \beta_k x_{k,t-1} + \varepsilon_t$$

or

$$y_t = \beta_0 + \beta_1 x_{1,t-h} + \cdots + \beta_k x_{k,t-h} + \varepsilon_t.$$

Furthermore, as we are building models that use past values of the predictor time series to forecast ahead, we could also include lagged values of the response series in our model.

In any of the formulations above, the predictor set is formed by values of the $x$'s that are observed $h$ time periods prior to observing $y$. Therefore, when the estimated model is projected into the future, i.e., beyond the end of the sample $T$, all predictor values are available.

Including lagged values of the predictors not only makes the model operational for easily generating forecasts, it also makes the model intuitively appealing. For example, the effect of a policy change with the aim of increasing production may not have an instantaneous effect on consumption expenditure. It is most likely that this will happen with a lagged effect. We touched upon this in Section 5.4 when briefly introducing distributed lags as predictors. Several directions for generalising regression models to better incorporate the rich dynamics observed in time series are discussed in Section 9. A small sketch of the lagged-predictor idea follows.
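Here is illustrative code (not from the text) that builds one-quarter ($h = 1$) lags of the uschange predictors with stats::lag() and ts.intersect(), then fits the regression on the lagged values:

# Regress consumption on predictor values one quarter earlier (h = 1).
lagged = ts.intersect(Consumption = uschange[,"Consumption"],
                      IncomeLag1 = stats::lag(uschange[,"Income"], -1), # x at time t-1
                      SavingsLag1 = stats::lag(uschange[,"Savings"], -1),
                      dframe = TRUE)
lag.mod = lm(Consumption ~ IncomeLag1 + SavingsLag1, data = lagged)
# To forecast consumption at time T+1 we only need the predictor values at
# time T, which are already observed.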

Prediction Intervals

With each forecast for the change in consumption in Figure 5.18, 95% and 80% prediction intervals are also included. The general formulation for calculating prediction intervals for multiple regression models is presented in Section 5.7. As this involves some advanced matrix algebra we will not cover it here. Rather, below we cover the case of calculating a prediction interval for a simple regression, where a forecast can be generated using the equation

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x.$$

Assuming that the regression errors (i.e. residuals) are approximately normally distributed, an approximate 95% prediction interval associated with this forecast is given by

$$\hat{y} \pm 1.96\,\hat{\sigma}_e\sqrt{1 + \frac{1}{T} + \frac{(x - \bar{x})^2}{(T-1)s_x^2}},$$

where $T$ is the total number of observations, $\bar{x}$ is the mean of the observed $x$ values, $s_x$ is the standard deviation of the observed $x$ values, and $\hat{\sigma}_e$ is the standard error of the regression given by Equation (5.3) in the text. Similarly, an 80% prediction interval can be obtained by replacing 1.96 with 1.28. Other prediction confidence levels can be obtained by using the appropriate standard normal table value. When using the forecast function in R, more complicated formulae are used to estimate the prediction intervals; for example, the formula above would not handle a case where we have used a trend and season dummy variables in our model.
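To make the formula concrete, here is a sketch that applies the 95% interval by hand to the consumption-on-income regression used in the next example; forecast() uses a t-quantile rather than 1.96, so the two agree only approximately:

x = as.numeric(uschange[,"Income"])
y = as.numeric(uschange[,"Consumption"])
fit = lm(y ~ x)
n = length(y) # T in the formula above
sig.e = summary(fit)$sigma # standard error of the regression
x0 = 10 # an extreme income scenario
yhat = unname(coef(fit)[1] + coef(fit)[2] * x0)
half = 1.96 * sig.e * sqrt(1 + 1/n + (x0 - mean(x))^2 / ((n - 1) * var(x)))
c(lower = yhat - half, point = yhat, upper = yhat + half)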

One key feature of all prediction intervals is that they get wider as we attempt to forecast further into the future.

Example 5.2 - Growth Rates of Personal Consumption & Personal Income in the US (cont’d)

Here we will consider the simple linear regression of the percent change in consumption on the percent change in income alone.

con.mod1 = tslm(Consumption~Income,data=uschange)
summary(con.mod1)

##
## Call:
## tslm(formula = Consumption ~ Income, data = uschange)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.40845 -0.31816 0.02558 0.29978 1.45157
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.54510 0.05569 9.789 < 2e-16 ***
## Income 0.28060 0.04744 5.915 1.58e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6026 on 185 degrees of freedom
## Multiple R-squared: 0.159, Adjusted R-squared: 0.1545
## F-statistic: 34.98 on 1 and 185 DF, p-value: 1.577e-08

mean(uschange[,"Income"])

## [1] 0.7176268

The estimated equation is given by

$$\hat{y}_t = 0.5451 + 0.2806\,x_t.$$

Assuming that for each of the next four quarters the percent change in personal income takes its historical mean value of $\bar{x} = 0.72\%$, consumption is forecast to increase by 0.75%, and the corresponding 95% and 80% prediction intervals are $(-0.45, 1.94)$ and $(-0.03, 1.52)$ respectively. If we assume an extreme increase of 10% in income, then the prediction intervals are considerably wider, because $x = 10$ is very far away from the mean $\bar{x} = 0.72$.

h = 4
newx = rep(mean(uschange[,"Income"]),h)
newx

## [1] 0.7176268 0.7176268 0.7176268 0.7176268

newx = data.frame(Income=newx)
fc.mean = forecast(con.mod1,newdata=newx)
newx10 = rep(10,h)
newx10

## [1] 10 10 10 10

newx10 = data.frame(Income=newx10)
fc.10 = forecast(con.mod1,newdata=newx10)
fc.mean

## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016 Q4 0.7464708 -0.03063903 1.523581 -0.44557 1.938512
## 2017 Q1 0.7464708 -0.03063903 1.523581 -0.44557 1.938512
## 2017 Q2 0.7464708 -0.03063903 1.523581 -0.44557 1.938512
## 2017 Q3 0.7464708 -0.03063903 1.523581 -0.44557 1.938512

fc.10

## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016 Q4 3.351116 2.389508 4.312725 1.876065 4.826167
## 2017 Q1 3.351116 2.389508 4.312725 1.876065 4.826167
## 2017 Q2 3.351116 2.389508 4.312725 1.876065 4.826167
## 2017 Q3 3.351116 2.389508 4.312725 1.876065 4.826167

autoplot(uschange[,"Consumption"]) + xlab("Year") + ylab("% Chg Consumption") + autolayer(fc.mean,PI=TRUE,series="Average Increase") + autolayer(fc.10,PI=TRUE,series="Extreme Increase (x = 10%)") + guides(colour=guide_legend(title="Scenario"))

Example 5.1 - U.S. Beverage Sales (cont’d)

Let’s consider forecasting with our model for the monthly U.S. beverage sales that used a cubic polynomial in time and dummy variables for month.

bev.mod3 = tslm(BevSales~trend + I(trend^2) + I(trend^3) + season)
summary(bev.mod3)

##
## Call:
## tslm(formula = BevSales ~ trend + I(trend^2) + I(trend^3) + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.6894 -3.5073 -0.2272 3.2039 12.7256
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.037e+02 1.933e+00 53.664 < 2e-16 ***
## trend 8.605e-01 7.345e-02 11.715 < 2e-16 ***
## I(trend^2) -7.109e-03 9.416e-04 -7.551 2.80e-12 ***
## I(trend^3) 2.851e-05 3.421e-06 8.335 2.88e-14 ***
## season2 2.118e+01 1.837e+00 11.532 < 2e-16 ***
## season3 2.077e+01 1.837e+00 11.304 < 2e-16 ***
## season4 2.817e+01 1.837e+00 15.331 < 2e-16 ***
## season5 3.835e+01 1.838e+00 20.868 < 2e-16 ***
## season6 4.804e+01 1.838e+00 26.136 < 2e-16 ***
## season7 2.995e+01 1.839e+00 16.290 < 2e-16 ***
## season8 3.700e+01 1.839e+00 20.119 < 2e-16 ***
## season9 3.218e+01 1.840e+00 17.487 < 2e-16 ***
## season10 2.080e+01 1.841e+00 11.298 < 2e-16 ***
## season11 2.218e+01 1.842e+00 12.041 < 2e-16 ***
## season12 1.246e+01 1.843e+00 6.764 2.22e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.031 on 165 degrees of freedom
## Multiple R-squared: 0.9642, Adjusted R-squared: 0.9612
## F-statistic: 317.8 on 14 and 165 DF, p-value: < 2.2e-16

The fitted model is given by

$$\hat{y}_t = 103.7 + 0.8605\,t - 0.007109\,t^2 + 0.00002851\,t^3 + \text{(monthly dummy terms)}$$

with $R^2 = 0.9642$. Let's forecast ahead 3 years, i.e. $h = 36$ months.

fc.3yr = forecast(bev.mod3,h=36)
fc.3yr

##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2007       195.6290 188.6520 202.6060 184.9228 206.3352
## Feb 2007       217.9113 210.9129 224.9096 207.1723 228.6502
## Mar 2007       218.6081 211.5870 225.6293 207.8342 229.3821
## Apr 2007       227.1412 220.0957 234.1867 216.3298 237.9525
## May 2007       238.4701 231.3987 245.5416 227.6190 249.3213
## Jun 2007       249.3289 242.2297 256.4280 238.4352 260.2225
## Jul 2007       232.4237 225.2951 239.5522 221.4849 243.3624
## Aug 2007       240.6767 233.5169 247.8364 229.6900 251.6633
## Sep 2007       237.0680 229.8751 244.2609 226.0305 248.1055
## Oct 2007       226.9266 219.6986 234.1546 215.8352 238.0179
## Nov 2007       229.5622 222.2970 236.8273 218.4138 240.7105
## Dec 2007       221.1239 213.8195 228.4282 209.9153 232.3324
## Jan 2008       209.9523 202.5324 217.3722 198.5664 221.3381
## Feb 2008       232.4488 224.9822 239.9155 220.9913 243.9064
## Mar 2008       233.3620 225.8461 240.8779 221.8289 244.8951
## Apr 2008       242.1135 234.5458 249.6811 230.5009 253.7260
## May 2008       253.6629 246.0408 261.2849 241.9668 265.3589
## Jun 2008       264.7441 257.0649 272.4233 252.9604 276.5278
## Jul 2008       248.0634 240.3243 255.8025 236.1878 259.9390
## Aug 2008       256.5430 248.7412 264.3448 244.5712 268.5148
## Sep 2008       253.1630 245.2956 261.0304 241.0905 265.2355
## Oct 2008       243.2522 235.3163 251.1882 231.0746 255.4299
## Nov 2008       246.1206 238.1131 254.1281 233.8331 258.4081
## Dec 2008       237.9171 229.8350 245.9992 225.5151 250.3191
## Jan 2009       226.9824 218.7204 235.2444 214.3043 239.6604
## Feb 2009       249.7178 241.3716 258.0641 236.9106 262.5251
## Mar 2009       250.8720 242.4381 259.3058 237.9302 263.8137
## Apr 2009       259.8664 251.3415 268.3914 246.7849 272.9480
## May 2009       271.6609 263.0413 280.2805 258.4342 284.8876
## Jun 2009       282.9892 274.2715 291.7070 269.6118 296.3666
## Jul 2009       266.5577 257.7382 275.3773 253.0241 280.0913
## Aug 2009       275.2885 266.3635 284.2136 261.5931 288.9840
## Sep 2009       272.1618 263.1277 281.1960 258.2990 286.0247
## Oct 2009       262.5064 253.3594 271.6534 248.4703 276.5425
## Nov 2009       265.6321 256.3685 274.8958 251.4171 279.8472
## Dec 2009       257.6881 248.3041 267.0721 243.2883 272.0879

autoplot(fc.3yr) + xlab("Year") + ylab("Monthly Beverage Sales") + ggtitle("Monthly U.S. Beverage Sales with 3-yr Forecast")

Though it is not visually obvious, the prediction intervals do get wider. We can check this numerically by calculating the width of each prediction interval. The attributes lower and upper contain both the 80% (in column 1) and 95% (in column 2) prediction interval limits. To access the first column we use [,1] and for the second column we use [,2].

Lower95 = fc.3yr$lower[,2]
Upper95 = fc.3yr$upper[,2]
WidthPred95 = Upper95 - Lower95
WidthPred95

##           Jan      Feb      Mar      Apr      May      Jun      Jul
## 2007 21.41236 21.47785 21.54787 21.62264 21.70236 21.78725 21.87751
## 2008 22.77169 22.91516 23.06625 23.22518 23.39217 23.56743 23.75118
## 2009 25.35607 25.61453 25.88348 26.16307 26.45345 26.75479 27.06720
##           Aug      Sep      Oct      Nov      Dec
## 2007 21.97337 22.07504 22.18273 22.29666 22.41704
## 2008 23.94361 24.14493 24.35532 24.57498 24.80406
## 2009 27.39082 27.72576 28.07214 28.43004 28.79957

autoplot(WidthPred95)

5.8 More Nonlinear Terms

In the U.S. beverage sales examples above we fit models that included quadratic and cubic trend terms. In this section we consider other ways to introduce terms, or to transform variables, in order to handle nonlinear trends in a time series regression model.

Transformations

The simplest way of modeling nonlinear relationships in a time series regression model is to transform the response variable to be forecast and/or the predictor variable before estimating the regression model. As discussed previously, the most commonly employed transformation in the Box-Cox family is the log transformation ($\lambda = 0$). The log transformation is particularly good when we have monotonic nonlinear trends in our time series and, more importantly, when the variation in the time series increases over time.
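
As a minimal sketch (assuming y is a ts object whose seasonal swings grow with the level of the series; the name y is illustrative only), we can plot the original and log scales side by side to judge whether the transformation stabilizes the variation:

library(forecast)
library(ggplot2)

# Plot the original and log-transformed series in separate panels;
# if the variation is multiplicative, the log scale (Box-Cox with
# lambda = 0) should show roughly constant-width seasonal swings.
autoplot(cbind(Original = y, Logged = log(y)), facets = TRUE) +
  xlab("Year") + ylab("")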

Log-log model

In the log-log model both the response and the predictor variable are log-transformed:

$$\log y_t = \beta_0 + \beta_1 \log x_t + \varepsilon_t$$

In this model, the slope $\beta_1$ can be interpreted as an elasticity: it represents the %-change in the response time series $y_t$ associated with a 1% increase in the predictor time series $x_t$. Other useful forms can also be specified.

The log-linear form is specified by transforming only the forecast variable, and the linear-log form is obtained by transforming the predictor only. The slope parameter interpretation for the log-linear model is as follows: if $x$ increases by 1 unit then the response changes by

$$100\,(e^{\beta_1} - 1)\%$$

The slope parameter interpretation for the linear-log model also changes accordingly but we will not discuss it here. If we need to later, we certainly will.
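
As a small illustration of the log-linear slope conversion (the helper pct_change below is our own throwaway function, not part of any package):

# Percent change in the response associated with a 1-unit increase
# in x when the fitted model is log(y) ~ x with slope beta
pct_change = function(beta) 100*(exp(beta)-1)
pct_change(0.05)

## [1] 5.12711

A slope of 0.05 on the log scale therefore corresponds to roughly a 5.13% increase on the original scale.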

Example 5.5 - Liquor Store Sales (A Big Example)

In this example we consider the monthly U.S. liquor store sales time series we have examined previously. We will use a log-linear model based on time information only to forecast monthly liquor sales for a portion of this time series.

Liquor = read.csv(file="http://course1.winona.edu/bdeppa/FIN%20335/Datasets/US%20Liquor%20Sales.csv")
names(Liquor)

## [1] "Time" "Month" "Year" "Liquor.Sales"

LiqSales = ts(Liquor$Liquor.Sales,start=1980,frequency=12)
LiqSales = LiqSales/monthdays(LiqSales)
autoplot(LiqSales) + xlab("Year") + ylab("Monthly Liquor Sales")

Liq.sub = window(LiqSales,end=c(1989,12))
Liq.sub

##           Jan      Feb      Mar      Apr      May      Jun      Jul
## 1980 15.48387 16.10345 16.58065 16.83333 17.22581 18.20000 17.38710
## 1981 16.83871 18.07143 18.00000 17.93333 19.51613 19.43333 19.58065
## 1982 19.51613 19.17857 18.54839 19.60000 21.16129 20.76667 21.32258
## 1983 20.74194 20.10714 19.87097 21.50000 22.67742 22.80000 23.58065
## 1984 22.16129 21.68966 22.16129 23.53333 24.32258 25.80000 26.61290
## 1985 22.93548 24.75000 25.48387 25.13333 25.77419 27.46667 27.54839
## 1986 23.87097 25.46429 25.51613 25.60000 27.29032 29.46667 28.58065
## 1987 25.67742 26.78571 26.90323 27.93333 29.09677 29.83333 31.03226
## 1988 27.93548 27.75862 29.19355 29.10000 33.03226 32.83333 33.83871
## 1989 29.51613 30.50000 29.74194 32.16667 32.70968 34.66667 36.67742
##           Aug      Sep      Oct      Nov      Dec
## 1980 17.45161 18.36667 17.32258 19.46667 27.54839
## 1981 20.12903 19.00000 19.64516 22.50000 27.77419
## 1982 21.54839 20.10000 20.61290 22.30000 29.51613
## 1983 23.29032 22.60000 23.00000 24.16667 31.90323
## 1984 24.35484 25.03333 25.25806 26.80000 36.74194
## 1985 26.12903 26.60000 26.03226 27.73333 36.83871
## 1986 28.32258 27.10000 27.09677 29.46667 40.16129
## 1987 31.93548 29.40000 30.19355 33.23333 42.09677
## 1988 33.35484 31.70000 32.58065 33.86667 44.45161
## 1989 33.09677 33.06667 33.93548 35.20000 47.38710

autoplot(Liq.sub) + xlab("Year") + ylab("Monthly Liquor Sales per Day") + ggtitle("Monthly Liquor Sales per Day (1980-1989)")

autoplot(log(Liq.sub)) + xlab("Year") + ylab("log(Liquor Sales) per Day")

logmod1 = tslm(log(Liq.sub)~trend+season)
summary(logmod1)

## 
## Call:
## tslm(formula = log(Liq.sub) ~ trend + season)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.08428 -0.02066 -0.00528  0.01828  0.07867 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.771e+00  1.086e-02 255.282  < 2e-16 ***
## trend       5.850e-03  8.258e-05  70.844  < 2e-16 ***
## season2     1.908e-02  1.394e-02   1.368  0.17412    
## season3     1.947e-02  1.395e-02   1.396  0.16560    
## season4     4.438e-02  1.395e-02   3.182  0.00192 ** 
## season5     9.292e-02  1.395e-02   6.662 1.21e-09 ***
## season6     1.184e-01  1.395e-02   8.490 1.29e-13 ***
## season7     1.277e-01  1.395e-02   9.154 4.18e-15 ***
## season8     1.009e-01  1.396e-02   7.231 7.51e-11 ***
## season9     7.111e-02  1.396e-02   5.094 1.52e-06 ***
## season10    7.374e-02  1.396e-02   5.281 6.81e-07 ***
## season11    1.437e-01  1.397e-02  10.288  < 2e-16 ***
## season12    4.209e-01  1.397e-02  30.122  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03118 on 107 degrees of freedom
## Multiple R-squared:  0.9846, Adjusted R-squared:  0.9828 
## F-statistic: 568.6 on 12 and 107 DF, p-value: < 2.2e-16

checkresiduals(logmod1)

## 
## Breusch-Godfrey test for serial correlation of order up to 24
## 
## data:  Residuals from Linear regression model
## LM test = 55.129, df = 24, p-value = 0.0003007

Despite the fact that this model exhibits some lack of fit, i.e. the linear trend misses some curvature, we will interpret the estimated model coefficients in light of the fact that the log-transformation was used on the monthly liquor sales. Consider the estimated coefficient for trend, i.e. $\hat{\beta}_1 = 0.00585$. Using the above formula for the interpretation of $\beta_1$ in a log-linear model we have

$$100\,(e^{0.00585} - 1)\% = 0.587\%$$

Thus the estimated increase in liquor sales per month is $0.587\%$. We now consider the increase in liquor sales in December vs. the reference month, which is January. Here the estimated coefficient for December (season12) is $0.4209$, thus the percent increase in sales in December relative to January is given by

$$100\,(e^{0.4209} - 1)\% \approx 52.3\%$$

Thus liquor sales in December are approximately $52.3\%$ higher compared to January. The code below shows how to use R as a calculator to do these calculations.

100*(exp(.00585)-1)

## [1] 0.5867145

100*(exp(.421)-1)

## [1] 52.34843

How do we forecast the monthly liquor sales in the original scale for the next two years ($h = 24$), i.e. not in the log-scale? As the log-transformation corresponds to $\lambda = 0$ in the Box-Cox power transformation family, we simply need to add lambda=0 as an argument when fitting our model using the tslm function and when making the forecast using the forecast function. We also need to specify, when making the forecast, whether we want to bias-adjust our forecasts.
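
For reference, the bias adjustment works as follows: when $\lambda = 0$ the naively back-transformed point forecast $e^{\hat{w}_h}$ (where $\hat{w}_h$ is the $h$-step forecast on the log scale) estimates the median of the original series rather than the mean. The adjusted forecast (see Hyndman's fpp2, Section 3.2) inflates it using the log-scale forecast variance $\sigma_h^2$:

$$\hat{y}_h = e^{\hat{w}_h}\left[1 + \frac{\sigma_h^2}{2}\right]$$

Because $\sigma_h^2$ is small here, the adjusted and unadjusted point forecasts below differ only slightly.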

logmod1 = tslm(Liq.sub~trend+season,lambda=0)
summary(logmod1)

## 
## Call:
## tslm(formula = Liq.sub ~ trend + season, lambda = 0)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.08428 -0.02066 -0.00528  0.01828  0.07867 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.771e+00  1.086e-02 255.282  < 2e-16 ***
## trend       5.850e-03  8.258e-05  70.844  < 2e-16 ***
## season2     1.908e-02  1.394e-02   1.368  0.17412    
## season3     1.947e-02  1.395e-02   1.396  0.16560    
## season4     4.438e-02  1.395e-02   3.182  0.00192 ** 
## season5     9.292e-02  1.395e-02   6.662 1.21e-09 ***
## season6     1.184e-01  1.395e-02   8.490 1.29e-13 ***
## season7     1.277e-01  1.395e-02   9.154 4.18e-15 ***
## season8     1.009e-01  1.396e-02   7.231 7.51e-11 ***
## season9     7.111e-02  1.396e-02   5.094 1.52e-06 ***
## season10    7.374e-02  1.396e-02   5.281 6.81e-07 ***
## season11    1.437e-01  1.397e-02  10.288  < 2e-16 ***
## season12    4.209e-01  1.397e-02  30.122  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03118 on 107 degrees of freedom
## Multiple R-squared:  1, Adjusted R-squared:  1 
## F-statistic: 4.112e+05 on 12 and 107 DF, p-value: < 2.2e-16

## Notice the model summary is the same as the one fit above.
sales.fc = forecast(logmod1,h=24,lambda=0,biasadj=F)
sales.fc

##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 1990       32.43476 31.07734 33.85147 30.37160 34.63807
## Feb 1990       33.25350 31.86181 34.70597 31.13826 35.51243
## Mar 1990       33.46162 32.06123 34.92319 31.33314 35.73469
## Apr 1990       34.50687 33.06273 36.01409 32.31191 36.85095
## May 1990       36.43600 34.91113 38.02748 34.11832 38.91112
## Jun 1990       37.59708 36.02361 39.23927 35.20555 40.15107
## Jul 1990       38.17044 36.57297 39.83767 35.74243 40.76338
## Aug 1990       37.37871 35.81438 39.01137 35.00107 39.91787
## Sep 1990       36.49380 34.96650 38.08780 34.17244 38.97284
## Oct 1990       36.80466 35.26435 38.41224 34.46353 39.30482
## Nov 1990       39.70350 38.04188 41.43770 37.17798 42.40058
## Dec 1990       52.69431 50.48901 54.99594 49.34245 56.27387
## Jan 1991       34.79368 33.32992 36.32173 32.56902 37.17030
## Feb 1991       35.67197 34.17126 37.23859 33.39116 38.10858
## Mar 1991       35.89523 34.38513 37.47165 33.60014 38.34709
## Apr 1991       37.01650 35.45922 38.64217 34.64972 39.54495
## May 1991       39.08593 37.44159 40.80248 36.58683 41.75573
## Jun 1991       40.33145 38.63472 42.10270 37.75272 43.08633
## Jul 1991       40.94651 39.22390 42.74477 38.32845 43.74339
## Aug 1991       40.09720 38.41032 41.85817 37.53345 42.83608
## Sep 1991       39.14793 37.50098 40.86720 36.64487 41.82196
## Oct 1991       39.48140 37.82042 41.21532 36.95701 42.17821
## Nov 1991       42.59107 40.79927 44.46155 39.86786 45.50029
## Dec 1991       56.52668 54.14862 59.00918 52.91245 60.38779

sales.fc.ba = forecast(logmod1,h=24,lambda=0,biasadj=T)
sales.fc.ba

##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 1990       32.45299 31.07734 33.85147 30.37160 34.63807
## Feb 1990       33.27220 31.86181 34.70597 31.13826 35.51243
## Mar 1990       33.48044 32.06123 34.92319 31.33314 35.73469
## Apr 1990       34.52627 33.06273 36.01409 32.31191 36.85095
## May 1990       36.45649 34.91113 38.02748 34.11832 38.91112
## Jun 1990       37.61822 36.02361 39.23927 35.20555 40.15107
## Jul 1990       38.19190 36.57297 39.83767 35.74243 40.76338
## Aug 1990       37.39973 35.81438 39.01137 35.00107 39.91787
## Sep 1990       36.51431 34.96650 38.08780 34.17244 38.97284
## Oct 1990       36.82535 35.26435 38.41224 34.46353 39.30482
## Nov 1990       39.72582 38.04188 41.43770 37.17798 42.40058
## Dec 1990       52.72394 50.48901 54.99594 49.34245 56.27387
## Jan 1991       34.81345 33.32992 36.32173 32.56902 37.17030
## Feb 1991       35.69224 34.17126 37.23859 33.39116 38.10858
## Mar 1991       35.91563 34.38513 37.47165 33.60014 38.34709
## Apr 1991       37.03753 35.45922 38.64217 34.64972 39.54495
## May 1991       39.10814 37.44159 40.80248 36.58683 41.75573
## Jun 1991       40.35437 38.63472 42.10270 37.75272 43.08633
## Jul 1991       40.96977 39.22390 42.74477 38.32845 43.74339
## Aug 1991       40.11999 38.41032 41.85817 37.53345 42.83608
## Sep 1991       39.17017 37.50098 40.86720 36.64487 41.82196
## Oct 1991       39.50383 37.82042 41.21532 36.95701 42.17821
## Nov 1991       42.61527 40.79927 44.46155 39.86786 45.50029
## Dec 1991       56.55880 54.14862 59.00918 52.91245 60.38779

autoplot(sales.fc) + xlab("Year") + ylab("Liquor Sales") + ggtitle("Monthly Liquor Sales Per Day (1980-1989) with 2-yr Forecasts")

As we know what actually happened in 1990-1991, we can compare these forecasts to what actually occurred.

yact = window(LiqSales,start=1990,end=c(1991,12))
length(yact)

## [1] 24

## Next 24 months of actual liquor sales per day
error = sales.fc$mean - yact
RMSEP = sqrt(mean(error^2))
RMSEP

## [1] 2.48675

MAEP = mean(abs(error))
MAEP

## [1] 2.285425

MAPEP = mean(abs(error)/yact)*100
MAPEP

## [1] 6.409714

error.ba = sales.fc.ba$mean - yact
RMSEP = sqrt(mean(error.ba^2))
RMSEP

## [1] 2.506994

MAEP = mean(abs(error.ba))
MAEP

## [1] 2.305786

MAPEP = mean(abs(error.ba)/yact)*100
MAPEP

## [1] 6.465179

How does the forecast performance differ between the bias-adjusted and non-adjusted forecasts?
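
One way to organize the comparison is to line the two sets of test-set metrics up side by side; a minimal sketch using the accuracy function from the forecast package:

# Test-set accuracy for the unadjusted vs. bias-adjusted forecasts
rbind(Unadjusted   = accuracy(sales.fc$mean, yact)[1, c("RMSE","MAE","MAPE")],
      BiasAdjusted = accuracy(sales.fc.ba$mean, yact)[1, c("RMSE","MAE","MAPE")])

From the output above, the unadjusted (median) forecasts happen to sit slightly closer to the actuals (e.g. RMSEP 2.487 vs. 2.507), so the bias adjustment does not pay off for this particular series.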

As there was curvature present in a plot of the residuals vs. the fitted values from our model, perhaps we should consider adding a squared trend term to our current model.

logmod2 = tslm(Liq.sub~poly(trend,2)+season,lambda=0)
summary(logmod2)

## 
## Call:
## tslm(formula = Liq.sub ~ poly(trend, 2) + season, lambda = 0)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.062946 -0.020640 -0.003957  0.016998  0.067941 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      3.125513   0.008782 355.884  < 2e-16 ***
## poly(trend, 2)1  2.220006   0.027881  79.623  < 2e-16 ***
## poly(trend, 2)2 -0.149822   0.027744  -5.400 4.11e-07 ***
## season2          0.018952   0.012407   1.527 0.129622    
## season3          0.019238   0.012408   1.551 0.124002    
## season4          0.044071   0.012409   3.552 0.000573 ***
## season5          0.092568   0.012411   7.459 2.52e-11 ***
## season6          0.118061   0.012413   9.511 7.06e-16 ***
## season7          0.127346   0.012415  10.257  < 2e-16 ***
## season8          0.100561   0.012418   8.098 1.01e-12 ***
## season9          0.070802   0.012421   5.700 1.09e-07 ***
## season10         0.073510   0.012425   5.917 4.08e-08 ***
## season11         0.143577   0.012429  11.552  < 2e-16 ***
## season12         0.420922   0.012433  33.855  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02774 on 106 degrees of freedom
## Multiple R-squared:  1, Adjusted R-squared:  1 
## F-statistic: 4.661e+05 on 13 and 106 DF, p-value: < 2.2e-16

checkresiduals(logmod2)

## 
## Breusch-Godfrey test for serial correlation of order up to 24
## 
## data:  Residuals from Linear regression model
## LM test = 52.079, df = 24, p-value = 0.000764

Let’s compare the forecast accuracy for the monthly sales per day in 1990-1991 for this model to our linear trend model logmod1.

sales.fc2 = forecast(logmod2,h=24,lambda=0,biasadj=F)
autoplot(sales.fc2) + xlab("Year") + ylab("Liquor Sales") +
  ggtitle("Monthly Liquor Sales Per Day (1980-1989) with 2-yr Forecasts")

error = sales.fc2$mean - yact
RMSEP = sqrt(mean(error^2))
RMSEP

## [1] 1.032942

MAEP = mean(abs(error))
MAEP

## [1] 0.8213816

MAPEP = mean(abs(error)/yact)*100
MAPEP

## [1] 2.335869

CV(logmod1)

##            CV           AIC          AICc           BIC         AdjR2 
##  1.094043e-03 -8.180711e+02 -8.140711e+02 -7.790462e+02  9.999759e-01

CV(logmod2)

##            CV           AIC          AICc           BIC         AdjR2 
##  8.727766e-04 -8.452350e+02 -8.406197e+02 -8.034227e+02  9.999804e-01

## CV, AIC, AICc, BIC, and Adj-R2 are all better for quadratic trend model!!!

Alternatively we can also use the accuracy function to compute the different performance metrics by supplying both the forecasted values and the actual values of the time series.

accuracy(sales.fc2$mean,yact)

##                  ME     RMSE       MAE       MPE     MAPE       ACF1
## Test set -0.2995717 1.032942 0.8213816 -1.052936 2.335869 -0.3161224
##          Theil's U
## Test set 0.1758503
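
If we instead pass the entire forecast object rather than just its mean, accuracy reports the training-set (residual-based) metrics and the test-set metrics together in a single table:

# Both rows at once: "Training set" and "Test set" (vs. the 1990-1991 actuals)
accuracy(sales.fc2, yact)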
