
ARIMA part 2


Rob Hyndman (with Deppa additions)

April 5, 2021

Table of Contents

8.2 - Backshift Notation for Differencing
8.3 - Autoregressive Models (AR(p))
8.4 - Moving Average Models (MA(q))
8.5 - Non-seasonal ARIMA Models
Example 8.1 - Google Closing Price
Example 8.2 - Murders per 100,000 Women in U.S. (1950-2004)
Example 8.3 - Number of Small Construction Loans per Week
Understanding ARIMA Models
ACF and PACF Plots
Example 8.3 (cont’d)
Example 8.2 (cont’d)
Using the ACF and PACF to help choose ARIMA(p,d,q)
Example 8.3 (cont’d)
Example 8.4 - Internet Usage per Minute
Example 8.5 - U.S. Monthly Industrial Production Index (2010-present)
Example 8.6 - Ramsey County Unemployment
8.6 - Estimation and Order Selection
Maximum Likelihood Estimation
Information Criteria (AIC, AICc, and BIC)
8.7 - ARIMA Modeling in R (auto.arima)
Hyndman-Khandakar algorithm for automatic ARIMA modelling
Choosing Your Own Model
Modelling Procedure
Example 8.7 - Electrical Equipment Orders (seasonally adjusted)
8.8 - Forecasting (read this section in the online text)

8.2 - Backshift Notation for Differencing

The backshift operator $B$ is a useful notational device when working with time series lags, which are used in forming differences: $By_t = y_{t-1}$. A power of $B$ denotes the lag, i.e. $B^k y_t = y_{t-k}$ means lag $k$.

Below are some common situations:

$$By_t = y_{t-1}, \qquad B^2 y_t = y_{t-2}, \qquad B^{12} y_t = y_{t-12} \text{ (for monthly data, the same month one year earlier).}$$

In other words, $B^k$, operating on $y_t$, has the effect of shifting the data back $k$ time periods. Two applications of $B$ is equivalent to $B^2$, i.e. lag 2: $B(By_t) = B^2 y_t = y_{t-2}$.

The backshift operator is convenient for describing the process of differencing discussed in Section 8.1. A first difference can be written as

$$y_t' = y_t - y_{t-1} = y_t - By_t = (1-B)y_t.$$

Notice that the first difference is represented by $(1-B)$. Similarly, if second-order differences have to be computed, then

$$y_t'' = y_t - 2y_{t-1} + y_{t-2} = (1 - 2B + B^2)y_t = (1-B)^2 y_t.$$

In general, a $d$th-order difference can be written as

$$(1-B)^d y_t.$$

Backshift notation is very useful when combining differences, as the operator can be treated using basic algebraic rules. In particular, terms involving $B$ can be multiplied together. For example, in a few of the examples in Section 8.1, we first performed a seasonal difference, i.e. $y_t - y_{t-12}$ for a monthly time series, followed by a first difference. This situation can be written as

$$(1-B)(1-B^{12})y_t = (1 - B - B^{12} + B^{13})y_t = y_t - y_{t-1} - y_{t-12} + y_{t-13},$$

the same result we obtained earlier.
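As a quick numerical check of this algebra (not part of the original handout), the R sketch below compares diff(diff(y, lag = 12)) with the expanded backshift expression on a monthly series; AirPassengers is used purely as an illustration.

y = AirPassengers                         # any monthly series will do (illustration only)

# Seasonal difference followed by a first difference: (1-B)(1-B^12) y_t
d1 = diff(diff(y, lag = 12))

# The expanded form y_t - y_{t-1} - y_{t-12} + y_{t-13}, computed directly
n = length(y)
d2 = y[14:n] - y[13:(n-1)] - y[2:(n-12)] + y[1:(n-13)]

all.equal(as.numeric(d1), d2)             # TRUE: the two forms agree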

8.3 - Autoregressive Models (AR(p))

In a multiple regression model, we forecast a time series of interest using a linear combination of predictors. For example, if we fit a multiple regression model that has a quadratic time trend plus monthly dummy variables, i.e. tslm(y~poly(trend,2)+season), as we did in Chapter 4, then we are assuming the following multiple regression model:

$$y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3\,\text{Feb}_t + \beta_4\,\text{Mar}_t + \dots + \beta_{13}\,\text{Dec}_t + \varepsilon_t.$$

In this model, all of the predictors/terms in our model are based on the time index $t$.

In an autoregressive model, we forecast the variable of interest using a linear combination of past values of the time series. The term autoregression indicates that it is a regression of the response against previous values of itself.

Thus, an autoregressive model of order $p$ can be written as

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t,$$

where $\varepsilon_t \sim WN(0, \sigma^2)$, i.e. $\varepsilon_t$ is white noise. This is like a multiple regression model but with lagged values of the response as predictors/terms. We refer to this as an AR($p$) model, an autoregressive model of order $p$.

Autoregressive models are remarkably flexible at handling a wide range of different time series patterns. In the two series in the figure below, we show series from an AR(1) model and an AR(2) model. Changing the parameters $\phi_1, \dots, \phi_p$ results in different time series patterns. The variance of the error term $\varepsilon_t$ will only change the scale of the series, NOT the patterns.
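If you want to generate series like those in the figure yourself, arima.sim() can be used; a minimal sketch is below (the parameter values and seed are illustrative choices, not necessarily those used to draw the figure).

library(fpp2)
set.seed(123)

ar1.sim = arima.sim(model = list(ar = 0.8), n = 100)           # AR(1) with phi1 = 0.8
ar2.sim = arima.sim(model = list(ar = c(1.3, -0.7)), n = 100)  # AR(2) with phi1 = 1.3, phi2 = -0.7

autoplot(ar1.sim) + ggtitle("Simulated AR(1), phi1 = 0.8")
autoplot(ar2.sim) + ggtitle("Simulated AR(2), phi1 = 1.3, phi2 = -0.7")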

For an AR(1) model:

· when $\phi_1 = 0$, $y_t$ is equivalent to white noise with mean $c$;

· when $\phi_1 = 1$ and $c = 0$, $y_t$ is equivalent to a random walk;

· when $\phi_1 = 1$ and $c \neq 0$, $y_t$ is equivalent to a random walk with drift;

· when $\phi_1 < 0$, $y_t$ tends to oscillate between positive and negative values.

We normally restrict autoregressive models to stationary data, in which case some constraints on the values of the parameters are required.

· For an AR(1) model: $-1 < \phi_1 < 1$.

· For an AR(2) model: $-1 < \phi_2 < 1$, $\phi_1 + \phi_2 < 1$, $\phi_2 - \phi_1 < 1$.

When $p \ge 3$, the restrictions are much more complicated. R takes care of these restrictions automatically when estimating the model parameters, i.e. the parameter estimates will satisfy the necessary constraints for our chosen autoregressive order $p$.

8.4 - Moving Average Models (MA(q))

Rather than using past values of the forecast variable in a regression, a moving average model uses past forecast errors in a regression-like model:

$$y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q},$$

where $\varepsilon_t \sim WN(0, \sigma^2)$, i.e. $\varepsilon_t$ is white noise. We refer to this as an MA($q$) model, a moving average model of order $q$. Of course, we do not observe the values of $\varepsilon_t$, so it is not really a regression in the usual sense.

Notice that each value of $y_t$ can be thought of as a weighted moving average of the past few forecast errors. However, moving average models SHOULD NOT be confused with the moving average smoothing we discussed in Chapter 6. A moving average model is used for forecasting future values, while moving average smoothing (Ch. 6) is used for estimating the trend-cycle component of past values.

The figure above shows some data from an MA(1) model and from an MA(2) model. Changing the parameters $\theta_1, \dots, \theta_q$ results in different time series patterns. As with autoregressive models, the variance of the error term $\varepsilon_t$ will only change the scale of the time series, not the patterns exhibited.

It is possible to write any stationary AR($p$) model as an MA($\infty$) model. For example, using repeated substitution, we can demonstrate this for an AR(1) model:

$$\begin{aligned} y_t &= \phi_1 y_{t-1} + \varepsilon_t \\ &= \phi_1(\phi_1 y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t \\ &= \phi_1^2 y_{t-2} + \phi_1 \varepsilon_{t-1} + \varepsilon_t \\ &= \phi_1^3 y_{t-3} + \phi_1^2 \varepsilon_{t-2} + \phi_1 \varepsilon_{t-1} + \varepsilon_t \\ &\;\;\vdots \end{aligned}$$

Provided $-1 < \phi_1 < 1$, the value of $\phi_1^k$ will get smaller as $k$ gets larger. So eventually we obtain

$$y_t = \varepsilon_t + \phi_1 \varepsilon_{t-1} + \phi_1^2 \varepsilon_{t-2} + \phi_1^3 \varepsilon_{t-3} + \cdots,$$

an MA($\infty$) process.
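This can be verified numerically with the base R function ARMAtoMA(), which returns the MA($\infty$) weights implied by an ARMA model; a small sketch with $\phi_1 = 0.6$ (an arbitrary illustrative value) is shown below.

ARMAtoMA(ar = 0.6, lag.max = 8)   # weights on e_{t-1}, e_{t-2}, ... in the MA(infinity) form
0.6^(1:8)                         # identical: phi1^1, phi1^2, ..., exactly as the derivation says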

The reverse result holds if we impose some constraints on the MA parameters. Then the MA model is called “invertible”. That is, we can write any invertible MA($q$) process as an AR($\infty$) process. Invertible models are not simply introduced to enable us to convert from MA models to AR models. They also have some desirable mathematical properties.

For example, consider the MA(1) process $y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1}$. In its AR($\infty$) representation, the most recent error can be written as a linear function of current and past observations:

$$\varepsilon_t = \sum_{j=0}^{\infty} (-\theta_1)^j y_{t-j}.$$

When $|\theta_1| > 1$, the weights increase as lags increase, so the more distant the observations the greater their influence on the current error. When $|\theta_1| = 1$, the weights are constant in size, and the distant observations have the same influence as the recent observations. As neither of these situations makes any sense, we require $|\theta_1| < 1$, so the most recent observations have higher weight than observations from the distant past. Thus, the process is invertible when $|\theta_1| < 1$.

The invertibility constraints for other models are similar to the stationarity constraints.

· For an MA(1) model: $-1 < \theta_1 < 1$.

· For an MA(2) model: $-1 < \theta_2 < 1$, $\theta_1 + \theta_2 > -1$, $\theta_1 - \theta_2 < 1$.

For $q \ge 3$, more complicated conditions hold. Again, R will take care of these constraints automatically when estimating the models.

8.5 - Non-seasonal ARIMA Models

If we combine differencing with autoregression and a moving average model, we obtain a non-seasonal ARIMA model. ARIMA is an acronym for AutoRegressive Integrated Moving Average (in this context, “integration” is the reverse of differencing). The full model can be written as

$$y_t' = c + \phi_1 y_{t-1}' + \dots + \phi_p y_{t-p}' + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} + \varepsilon_t,$$

where $y_t'$ is the differenced series, keeping in mind it may have to be differenced more than once. The “predictors” on the right hand side include both lagged values of $y_t'$ and lagged errors. We call this an ARIMA($p,d,q$) model, where

· $p$ = order of the autoregressive (AR) part

· $d$ = degree of differencing involved to make the series near stationary, i.e. $(1-B)^d y_t$ in general

· $q$ = order of the moving average (MA) part.

The same stationarity and invertibility conditions that are used for autoregressive and moving average models also apply to an ARIMA model.

Many of the models we have already discussed are special cases of the ARIMA model, as shown in the following table.

White noise: ARIMA(0,0,0)
Random walk: ARIMA(0,1,0) with no constant
Random walk with drift: ARIMA(0,1,0) with a constant
Autoregression, AR(p): ARIMA(p,0,0)
Moving average, MA(q): ARIMA(0,0,q)

Once we start combining components in this way to form more complicated models, it is much easier to work with the backshift notation. For example, the general ARIMA equation above can be written in backshift notation as

$$(1 - \phi_1 B - \dots - \phi_p B^p)(1-B)^d y_t = c + (1 + \theta_1 B + \dots + \theta_q B^q)\varepsilon_t.$$

R uses a slightly different parameterization:

$$(1 - \phi_1 B - \dots - \phi_p B^p)(y_t' - \mu) = (1 + \theta_1 B + \dots + \theta_q B^q)\varepsilon_t,$$

where $y_t' = (1-B)^d y_t$ and $\mu$ is the mean of $y_t'$. To convert to the form given above, set $c = \mu(1 - \phi_1 - \dots - \phi_p)$.
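As a small illustration of this conversion (a sketch with a simulated AR(2); the seed and parameter values are arbitrary choices, not from the handout):

library(fpp2)
set.seed(1)

y = arima.sim(model = list(ar = c(0.5, 0.2)), n = 300) + 10   # stationary AR(2) with mean 10

fit = Arima(y, order = c(2, 0, 0), include.mean = TRUE)
phi = coef(fit)[c("ar1", "ar2")]
mu  = coef(fit)["mean"]            # R reports the mean mu, not the constant c

unname(mu * (1 - sum(phi)))        # c = mu*(1 - phi1 - phi2); should be near 10*(1 - 0.7) = 3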

Selecting appropriate values for $p$, $d$, and $q$ can be difficult. However, the auto.arima() function in R will do it for you automatically. In Section 8.7, we will learn how this function works, along with some methods for choosing these values yourself.

Example 8.1 - Google Closing Price

goog200 = ts(goog200,start=1)
autoplot(goog200) + xlab("Day") + ylab("Google Closing Price")

ggAcf(goog200)

# Compute the first order difference and assign it to googdiff
googdiff = diff(goog200)
autoplot(googdiff) + xlab("Day") + ylab("Change in Closing Price from Previous Day")

ggAcf(googdiff)

goog.arima = auto.arima(goog200)
summary(goog.arima)

## Series: goog200
## ARIMA(0,1,0) with drift
##
## Coefficients:
##        drift
##       0.6967
## s.e.  0.4373
##
## sigma^2 estimated as 38.25: log likelihood=-644.45
## AIC=1292.91 AICc=1292.97 BIC=1299.5
##
## Training set error measures:
##                       ME     RMSE      MAE         MPE      MAPE    MASE
## Training set 0.001960665 6.153549 3.807244 -0.01512911 0.8591933 1.01779
##                     ACF1
## Training set -0.06043606

goog.fc = forecast(goog.arima,h=14)
autoplot(goog.fc)

For the Google stock prices, an ARIMA(0,1,0) model with drift (i.e. with a constant term $c \neq 0$) is suggested by auto.arima().

Example 8.2 - Murders per 100,000 Women in U.S. (1950-2004)

autoplot(wmurders) + xlab("Year") + ylab("Murders per 100,000 women")

ggtsdisplay(wmurders)

ggtsdisplay(diff(wmurders))

ggtsdisplay(diff(diff(wmurders)))

ndiffs(wmurders)

## [1] 2

wm.arima = auto.arima(wmurders)
summary(wm.arima)

## Series: wmurders
## ARIMA(1,2,1)
##
## Coefficients:
##          ar1     ma1
##      -0.2434 -0.8261
## s.e.  0.1553  0.1143
##
## sigma^2 estimated as 0.04632: log likelihood=6.44
## AIC=-6.88 AICc=-6.39 BIC=-0.97
##
## Training set error measures:
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
##                    ACF1
## Training set 0.02176343

wm.fc = forecast(wm.arima,h=5)
autoplot(wm.fc)

wm.fc

##      Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 2005       2.470660 2.194836 2.746484 2.0488240 2.892496
## 2006       2.363106 1.986351 2.739862 1.7869082 2.939304
## 2007       2.252833 1.765391 2.740276 1.5073540 2.998313
## 2008       2.143222 1.546366 2.740078 1.2304099 3.056035
## 2009       2.033450 1.323819 2.743081 0.9481634 3.118737

Here an ARIMA(1,2,1) model is recommended.

Example 8.3 - Number of Small Construction Loans per Week

A bank has recorded the number of small construction loan applications per week for two years.

Loans = read.csv(file="http://course1.winona.edu/bdeppa/FIN%20335/Datasets/Loan%20Applications.csv")
names(Loans)

## [1] "Week" "LoanApps"

LoanApps = ts(Loans$LoanApps,start=0,frequency=52)
autoplot(LoanApps) + xlab("Year") + ylab("Num of Loan Apps") + ggtitle("Weekly Small Construction Loan Applications")

ggtsdisplay(LoanApps)

Notice that there appears to be a slight downward trend in the number of weekly loan applications over time. This suggests that the time series is not stationary. Remember, a stationary time series has a mean and variance that are roughly constant over time, which does not appear to be the case here. Thus, we should consider differencing as part of the ARIMA modeling process.

autoplot(diff(LoanApps)) + ggtitle("First Difference = (1-B)yt")

ggtsdisplay(diff(LoanApps))

autoplot(diff(diff(LoanApps))) + ggtitle("Second Order Difference = (1-B)(1-B)yt")

ggtsdisplay(diff(diff(LoanApps)))

ndiffs(LoanApps)

## [1] 1

We will use the auto.arima() function to identify the “optimal” values for $(p, d, q)$. In the next section we will examine some guidelines for choosing these values by examining the ACF and PACF.

Loans.ARIMA = auto.arima(LoanApps)
summary(Loans.ARIMA)

## Series: LoanApps
## ARIMA(1,1,0)
##
## Coefficients:
##          ar1
##      -0.5774
## s.e.  0.0813
##
## sigma^2 estimated as 43.29: log likelihood=-339.9
## AIC=683.8 AICc=683.92 BIC=689.07
##
## Training set error measures:
##                      ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -0.1181553 6.515974 5.120321 -0.8195575 7.647203 0.5220719
##                     ACF1
## Training set -0.01568254

checkresiduals(Loans.ARIMA)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,0)
## Q* = 80.278, df = 102, p-value = 0.9449
##
## Model df: 1. Total lags used: 103

Here an ARIMA(1,1,0) model was chosen, i.e. an AR(1) model with first-order differencing. We can then make forecasts for the next $h$ time periods (e.g. $h = 10$).

loan.fc = forecast(Loans.ARIMA,h=10)
loan.fc

##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2.000000       60.69051 52.25848 69.12253 47.79483 73.58618
## 2.019231       62.02395 52.86980 71.17809 48.02389 76.02400
## 2.038462       61.25405 50.09912 72.40899 44.19405 78.31406
## 2.057692       61.69857 49.57380 73.82333 43.15534 80.24180
## 2.076923       61.44192 48.04901 74.83483 40.95923 81.92461
## 2.096154       61.59010 47.24201 75.93819 39.64659 83.53361
## 2.115385       61.50454 46.15271 76.85638 38.02594 84.98315
## 2.134615       61.55394 45.31963 77.78826 36.72570 86.38218
## 2.153846       61.52542 44.42170 78.62915 35.36753 87.68331
## 2.173077       61.54189 43.62881 79.45496 34.14621 88.93757

autoplot(loan.fc) + ggtitle("Weekly Loans with a 10-week Forecast") + xlab("Year")

Understanding ARIMA Models

The auto.arima() function is very useful, but anything automated can be a little dangerous, and it is worth understanding something of the behavior of the models even when you rely on an automatic procedure to choose the model for you.

The constant $c$ has an important effect on the long-term forecasts obtained from these models.

· If $c = 0$ and $d = 0$, then the long-term forecasts will go to zero.

· If $c = 0$ and $d = 1$, the long-term forecasts will go to a non-zero constant.

· If $c = 0$ and $d = 2$, the long-term forecasts will follow a straight line.

· If $c \neq 0$ and $d = 0$, the long-term forecasts will go to the mean of the data.

· If $c \neq 0$ and $d = 1$, the long-term forecasts will follow a straight line.

· If $c \neq 0$ and $d = 2$, the long-term forecasts will follow a quadratic trend.

The value of $d$ also has an effect on the prediction intervals: the higher the value of $d$, the more rapidly the prediction intervals increase in size, i.e., width. For $d = 0$, the long-term forecast standard deviation will go to the standard deviation of the historical data, so the prediction intervals will not change in width.
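A small sketch of two of these cases using the Google closing prices from Example 8.1 (assuming goog200 from the fpp2 package is available): with $d = 1$, including a drift term gives straight-line long-term forecasts, while dropping the constant gives forecasts that settle at a non-zero constant (the last observation). The widening intervals also illustrate the effect of $d > 0$.

library(fpp2)

fit.drift   = Arima(goog200, order = c(0,1,0), include.drift = TRUE)      # c != 0, d = 1
fit.nodrift = Arima(goog200, order = c(0,1,0), include.constant = FALSE)  # c  = 0, d = 1

autoplot(forecast(fit.drift, h = 50))    # point forecasts follow a straight line
autoplot(forecast(fit.nodrift, h = 50))  # point forecasts flatten out at the last observed value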

The value of $p$ is important if the data show cycles. To obtain cyclic forecasts, it is necessary to have $p \ge 2$, along with some additional conditions on the parameters. For an AR(2) model, cyclic behaviour occurs if $\phi_1^2 + 4\phi_2 < 0$. In that case, the average period of the cycles is

$$\frac{2\pi}{\arccos\!\left(-\phi_1(1-\phi_2)/(4\phi_2)\right)}.$$
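For example, with the illustrative AR(2) coefficients $\phi_1 = 1.3$ and $\phi_2 = -0.7$ used in the simulation sketch earlier, the condition is met and the average cycle length can be computed directly:

phi1 = 1.3; phi2 = -0.7
phi1^2 + 4*phi2 < 0                       # TRUE, so the forecasts will be cyclic
2*pi / acos(-phi1*(1 - phi2)/(4*phi2))    # average period, roughly 9.5 time steps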

ACF and PACF Plots

It is usually not possible to tell, simply from a time plot, what values of $p$ and $q$ are appropriate for the data. However, it is sometimes possible to use the ACF plot, and the closely related PACF plot, to determine appropriate values for $p$ and $q$.

Recall the ACF plot shows the autocorrelations, which measure the strength and direction of the linear relationship between $y_t$ and $y_{t-k}$ for different values of $k$. Now if $y_t$ and $y_{t-1}$ are correlated, then $y_{t-1}$ and $y_{t-2}$ must also be correlated. However, then $y_t$ and $y_{t-2}$ might be correlated, simply because they are both connected to $y_{t-1}$, rather than because of any new information contained in $y_{t-2}$ that could be used in forecasting $y_t$.

To overcome this problem, we can use partial autocorrelations. These measure the strength of the relationship between $y_t$ and $y_{t-k}$ after removing the effects of lags $1, 2, 3, \dots, k-1$. Thus the first partial autocorrelation is identical to the first autocorrelation, because there is nothing between them to remove. Each partial autocorrelation can be estimated as the last coefficient in an autoregressive model. Specifically, $\alpha_k$, the $k$th partial autocorrelation coefficient, is equal to the estimate of $\phi_k$ in an AR($k$) model. In practice, there are more efficient algorithms for computing $\alpha_k$ than fitting all of these autoregressions, but they give the same results.
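To see this equivalence concretely, the sketch below (assuming the LoanApps series from Example 8.3 has been created) compares the lag-2 partial autocorrelation with the last coefficient of a fitted AR(2); the two agree closely, though not exactly, because the estimation methods differ.

library(fpp2)

pacf(LoanApps, plot = FALSE)$acf[2]      # alpha_2, the lag-2 partial autocorrelation

fit.ar2 = Arima(LoanApps, order = c(2, 0, 0))
coef(fit.ar2)["ar2"]                     # phi_2, the last coefficient of an AR(2) fit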

The ACF and PACF for the loan application data in Example 8.3 are shown below. The partial autocorrelations have the same critical values of $\pm 1.96/\sqrt{T}$ as the ordinary autocorrelations in the ACF plot.

Example 8.3 (cont’d)

ggAcf(LoanApps) + ggtitle("ACF Plot for Loan Applications")

ggPacf(LoanApps) + ggtitle("PACF Plot for Loan Applications")

The ACF and PACF for the twice differenced female murders per 100,000 women in the U.S. are shown below.

Example 8.2 (cont’d)

ggAcf(diff(diff(wmurders)))

ggPacf(diff(diff(wmurders)))

Using the ACF and PACF to help choose ARIMA(p,d,q)

If the data are from an ARIMA($p,d,0$) or ARIMA($0,d,q$) model, then the ACF and PACF plots can be helpful in determining the value of $p$ or $q$. If $p$ and $q$ are both positive, then the plots do not necessarily help in finding suitable values of $p$ and $q$.

The data may be well modeled by an ARIMA($p,d,0$) model if the ACF and PACF plots of the differenced data show the following patterns:

· the ACF is exponentially decaying or sinusoidal;

· there is a significant spike at lag $p$ in the PACF, but none beyond lag $p$.

If the data follow an ARIMA($0,d,q$) model, then the ACF and PACF plots of the differenced data will generally show the following patterns:

· the PACF is exponentially decaying or sinusoidal;

· there is a significant spike at lag $q$ in the ACF, but none beyond lag $q$.

Example 8.3 (cont’d)

Consider again the loan applications data. We saw that the first-differenced time series, i.e. $(1-B)y_t$, was reasonably stationary, thus we can use the ACF and PACF of this differenced time series to choose possible values for $p$ or $q$.

ggAcf(diff(LoanApps))

ggPacf(diff(LoanApps))

The ACF has significant spikes at lags 1 and 2 and seems to decay somewhat exponentially. The PACF has a significant spike at lag 1 and none thereafter. Thus, using the guidelines above, an ARIMA(1,1,0) model is suggested, i.e. an AR(1) model applied to the first-differenced time series.

Example 8.4 - Internet Usage per Minute

The time series WWWusage consists of the number of users connected to the Internet through a server, recorded once per minute. This time series is available when the fpp2 library is loaded.

autoplot(WWWusage) + xlab("Time (minutes)") + ylab("Number of Users") + ggtitle("Internet Users per Minute")

ggtsdisplay(diff(WWWusage))

While the first differenced series does not look perfectly stationary, the differenced series does appear to have a constant mean and variation. Also we see the ACF and PACF have properties that are desirable when choosing the order of the ARIMA model. Here we see that the ACF decays sinusoidally and the PACF cuts off after lag 3. Thus, using the guidelines above, a reasonable model for this series might be ARIMA(3,1,0). We can use the function Arima() to fit a specific model through the use of the order=c(p,d,q) option. An example of this is shown below.

usage.arima = Arima(WWWusage,order=c(3,1,0))
summary(usage.arima)

## Series: WWWusage
## ARIMA(3,1,0)
##
## Coefficients:
##         ar1     ar2    ar3
##      1.1513 -0.6612 0.3407
## s.e. 0.0950  0.1353 0.0941
##
## sigma^2 estimated as 9.656: log likelihood=-252
## AIC=511.99 AICc=512.42 BIC=522.37
##
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.230588 3.044632 2.367157 0.2748377 1.890528 0.5230995
##                      ACF1
## Training set -0.003095066

usage.auto = auto.arima(WWWusage)
summary(usage.auto)

## Series: WWWusage
## ARIMA(1,1,1)
##
## Coefficients:
##         ar1    ma1
##      0.6504 0.5256
## s.e. 0.0842 0.0896
##
## sigma^2 estimated as 9.995: log likelihood=-254.15
## AIC=514.3 AICc=514.55 BIC=522.08
##
## Training set error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.3035616 3.113754 2.405275 0.2805566 1.917463 0.5315228
##                     ACF1
## Training set -0.01715517

A couple of things to notice with this example:

· Our guidelines suggest an ARIMA(3,1,0) model, whereas auto.arima chooses ARIMA(1,1,1).

· The AIC and AICc suggest the ARIMA(3,1,0) model is better, whereas the BIC suggests the ARIMA(1,1,1) model.

· In many cases there are several potential ARIMA models that will be reasonable.

fc1 = forecast(usage.arima,h=24)
fc2 = forecast(usage.auto,h=24)
autoplot(WWWusage) + autolayer(fc1$mean,series="ARIMA(3,1,0)") + autolayer(fc2$mean,series="ARIMA(1,1,1)") + guides(colour=guide_legend("ARIMA Order"))

autoplot(fc1) + ggtitle("Internet Users per Minutes - ARIMA(3,1,0) Forecast (h=24)")

There are subtle differences between the forecasts. We could potentially use cross-validation to help choose between several ARIMA models. We can also see from the plot of the forecasts for the ARIMA(3,1,0) model that the prediction intervals get very wide as the forecast horizon increases.
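One possible way to carry out that cross-validation (a sketch using tsCV() from the forecast package; note it re-estimates the model for every training window, so it can be slow):

library(fpp2)

f310 = function(x, h) forecast(Arima(x, order = c(3,1,0)), h = h)
f111 = function(x, h) forecast(Arima(x, order = c(1,1,1)), h = h)

e1 = tsCV(WWWusage, f310, h = 1)   # one-step-ahead CV errors, ARIMA(3,1,0)
e2 = tsCV(WWWusage, f111, h = 1)   # one-step-ahead CV errors, ARIMA(1,1,1)

sqrt(mean(e1^2, na.rm = TRUE))     # CV RMSE for ARIMA(3,1,0)
sqrt(mean(e2^2, na.rm = TRUE))     # CV RMSE for ARIMA(1,1,1)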

Example 8.5 - U.S. Monthly Industrial Production Index (2010-present)

The seasonally adjusted monthly industrial production index (2012 = 100) is available on FRED (https://fred.stlouisfed.org/series/INDPRO). We will use ARIMA methods to forecast this index for the next 12 months.

IPI = read.csv(file="http://course1.winona.edu/bdeppa/FIN%20335/Datasets/Industrial%20Production%20Index%20(2010-present).csv")
names(IPI)

## [1] "DATE" "INDPRO"

Indus = ts(IPI$INDPRO,start=2010,frequency=12)
Indus

##           Jan      Feb      Mar      Apr      May      Jun      Jul
## 2010  91.6849  92.0073  92.6104  92.9600  94.2988  94.4319  94.8360
## 2011  95.9363  95.5147  96.4651  96.1325  96.3347  96.5889  97.1011
## 2012  99.3672  99.6443  99.1594  99.9173 100.0956 100.0478 100.3144
## 2013 100.8779 101.4265 101.8186 101.6950 101.7517 101.9486 101.4460
## 2014 102.7053 103.6016 104.5893 104.7423 105.0571 105.4084 105.5609
## 2015 105.8772 105.4193 105.0856 104.5604 104.0675 103.6891 104.2443
## 2016 103.0314 102.3429 101.5415 101.7479 101.6011 101.9476 102.1435
## 2017 102.5393 102.1574 102.7236 103.7148 103.7121 103.7710 103.6206
## 2018 105.5132 106.5109
##           Aug      Sep      Oct      Nov      Dec
## 2010  95.1423  95.3500  95.0991  95.1376  96.0339
## 2011  97.6642  97.6132  98.2931  98.2325  98.7699
## 2012  99.9036  99.8944 100.1192 100.6100 100.9267
## 2013 102.1758 102.6774 102.5438 102.8625 103.1747
## 2014 105.4797 105.7908 105.8154 106.6630 106.5032
## 2015 104.1318 103.7281 103.3569 102.7323 102.2696
## 2016 102.0654 101.9304 102.0557 101.8293 102.7877
## 2017 103.1956 103.1760 104.7870 105.2726 105.7214
## 2018

autoplot(Indus) + xlab("Year") + ylab("Industrial Production Index (2012 = 100)") + ggtitle("U.S. Monthly Industrial Production Index (2010 - present)")

ggtsdisplay(Indus)

Clearly this time series is not stationary. Rather than use auto.arima to find a suitable ARIMA model, we will use differencing and ACF/PACF plots in an attempt to find a model using the guidelines above. We will then compare our choice with that obtained from auto.arima.

require(tseries)

## Loading required package: tseries

ggtsdisplay(diff(Indus))

kpss.test(diff(Indus))

##
## KPSS Test for Level Stationarity
##
## data: diff(Indus)
## KPSS Level = 0.44123, Truncation lag parameter = 2, p-value =
## 0.05938

ndiffs(Indus)

## [1] 1

The ACF/PACF of the differenced series both have few significant lags, and none within the first few. It appears that lag 6 and lag 24 are significant, though neither is large. Also keep in mind that we expect about 1 in 20 lags to be significant by chance (5% significance). Thus, we might simply opt for an ARIMA(0,1,0) model with drift (i.e. add a constant term $c$ to the model) to handle the trend in this time series. The KPSS test for stationarity provides marginal evidence of stationarity (p-value ≈ 0.059). The ndiffs command confirms that first-order differencing, $(1-B)y_t$, should be sufficient.

indus.arima = Arima(Indus,order=c(0,1,0),include.drift=TRUE)
summary(indus.arima)

## Series: Indus
## ARIMA(0,1,0) with drift
##
## Coefficients:
##       drift
##      0.1528
## s.e. 0.0484
##
## sigma^2 estimated as 0.2299: log likelihood=-65.81
## AIC=135.61 AICc=135.74 BIC=140.76
##
## Training set error measures:
##                        ME      RMSE       MAE         MPE      MAPE
## Training set 0.0009340001 0.4745112 0.3832641 0.003114376 0.3778526
##                   MASE       ACF1
## Training set 0.1574995 0.05761287

checkresiduals(indus.arima)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,0) with drift
## Q* = 28.763, df = 23, p-value = 0.1883
##
## Model df: 1. Total lags used: 24

Box.test(residuals(indus.arima),type="Lj")

##
## Box-Ljung test
##
## data: residuals(indus.arima)
## X-squared = 0.33535, df = 1, p-value = 0.5625

indus.fc = forecast(indus.arima,h=12)
autoplot(indus.fc) + xlab("Year") + ylab("Industrial Production Index (2012 = 100)")

Let’s use auto.arima on the same time series and see what it gives.

indus.auto = auto.arima(Indus)
summary(indus.auto)

## Series: Indus
## ARIMA(0,1,0) with drift
##
## Coefficients:
##       drift
##      0.1528
## s.e. 0.0484
##
## sigma^2 estimated as 0.2299: log likelihood=-65.81
## AIC=135.61 AICc=135.74 BIC=140.76
##
## Training set error measures:
##                        ME      RMSE       MAE         MPE      MAPE
## Training set 0.0009340001 0.4745112 0.3832641 0.003114376 0.3778526
##                   MASE       ACF1
## Training set 0.1574995 0.05761287

Here we can see the same model was chosen by the auto.arima function.

Example 8.6 - Ramsey County Unemployment

In this example we revisit the monthly Ramsey County unemployment rates from 2000 to the present that we first considered in Chapter 2. As this time series is seasonal, we will first use X-13 or SEATS to obtain a seasonally adjusted time series. We will then use a non-seasonal ARIMA model to forecast the underlying trend without the seasonality.

Ramsey = read.csv(file="http://course1.winona.edu/bdeppa/FIN%20335/Datasets/Ramsey%20Unemployment.csv")
names(Ramsey)

## [1] "DATE" "Month" "Year" "UNRATE"

Unemp = ts(Ramsey$UNRATE,start=2000,frequency=12)
autoplot(Unemp) + xlab("Year") + ylab("Unemployment Rate (%)") + ggtitle("Unemployment Rate in Ramsey County, MN (2000-present)")

First we will create a seasonally adjusted version of this time series.

require(seasonal)

## Loading required package: seasonal

ram.seas = seas(Unemp)
Unemp.SA = seasadj(ram.seas)
autoplot(Unemp.SA) + xlab("Year") + ylab("Seasonally Adjusted Unemployment Rate (%)") + ggtitle("Seasonally Adjusted Unemployment Rate - Ramsey County,MN")

We will now use ARIMA to forecast the seasonally-adjusted unemployment rate for the next 12 months.

ggtsdisplay(diff(Unemp.SA))

ndiffs(Unemp.SA)

## [1] 2

ggtsdisplay(diff(diff(Unemp.SA)))

The first-differenced series does not appear stationary, as there are numerous significant lags. The ndiffs command suggests second-order differencing is needed to achieve near stationarity. Unfortunately the ACF/PACF plots do not give clear evidence that our model is either purely AR or MA. The ACF appears to cut off at lag 1, but there are several lags past 1 that are significant. The PACF sort of decays exponentially, but there are numerous lags in the first 5 or so that are significant. This usually indicates that both $p$ and $q$ are non-zero. A good first step is to use $p = q = 1$ and then try increasing each by 1, individually or together. Here that would mean comparing the models ARIMA(1,2,1), ARIMA(2,2,1), ARIMA(1,2,2), and ARIMA(2,2,2). Of these, the first model has the lowest AIC, AICc, and BIC.

mod1 = Arima(Unemp.SA,order=c(1,2,1))
mod2 = Arima(Unemp.SA,order=c(2,2,1))
mod3 = Arima(Unemp.SA,order=c(1,2,2))
mod4 = Arima(Unemp.SA,order=c(2,2,2))
summary(mod1)

## Series: Unemp.SA
## ARIMA(1,2,1)
##
## Coefficients:
##          ar1     ma1
##      -0.2914 -0.8595
## s.e.  0.0733  0.0456
##
## sigma^2 estimated as 0.03591: log likelihood=52.14
## AIC=-98.29 AICc=-98.17 BIC=-88.2
##
## Training set error measures:
##                        ME      RMSE       MAE         MPE     MAPE
## Training set -0.004055533 0.1877164 0.1270995 -0.05321314 2.738322
##                   MASE         ACF1
## Training set 0.1809917 -0.005542127

summary(mod2)

## Series: Unemp.SA
## ARIMA(2,2,1)
##
## Coefficients:
##          ar1     ar2     ma1
##      -0.2916 -0.0003 -0.8594
## s.e.  0.0885  0.0855  0.0569
##
## sigma^2 estimated as 0.03608: log likelihood=52.14
## AIC=-96.29 AICc=-96.09 BIC=-82.84
##
## Training set error measures:
##                        ME      RMSE       MAE         MPE     MAPE
## Training set -0.004055007 0.1877164 0.1271034 -0.05321738 2.738383
##                   MASE         ACF1
## Training set 0.1809973 -0.005454118

summary(mod3)

## Series: Unemp.SA
## ARIMA(1,2,2)
##
## Coefficients:
##          ar1     ma1    ma2
##      -0.2909 -0.8599 0.0004
## s.e.  0.1912  0.1933 0.1825
##
## sigma^2 estimated as 0.03608: log likelihood=52.14
## AIC=-96.29 AICc=-96.09 BIC=-82.84
##
## Training set error measures:
##                        ME      RMSE    MAE         MPE     MAPE      MASE
## Training set -0.004055276 0.1877164 0.1271 -0.05321224 2.738322 0.1809924
##                     ACF1
## Training set -0.00555466

summary(mod4)

## Series: Unemp.SA
## ARIMA(2,2,2)
##
## Coefficients:
##          ar1     ar2    ma1     ma2
##      -1.2572 -0.3107 0.1099 -0.8208
## s.e.  0.0787  0.0731 0.0577  0.0571
##
## sigma^2 estimated as 0.03586: log likelihood=53.22
## AIC=-96.44 AICc=-96.15 BIC=-79.64
##
## Training set error measures:
##                        ME      RMSE       MAE         MPE     MAPE
## Training set -0.003984072 0.1867173 0.1269675 -0.05224814 2.738366
##                   MASE        ACF1
## Training set 0.1808037 0.002054258

unemp.fc = forecast(mod1,h=12)
autoplot(unemp.fc) + xlab("Year") + ylab("Seasonally Adjusted Unemployment Rate (%)")

Let’s compare this model to one chosen using auto.arima().

AAmod = auto.arima(Unemp.SA)

summary(AAmod)

## Series: Unemp.SA

## ARIMA(0,2,3)

##

## Coefficients:

## ma1 ma2 ma3

## -1.1587 0.3889 -0.1364

## s.e. 0.0686 0.1085 0.0766

##

## sigma^2 estimated as 0.03585: log likelihood=52.78

## AIC=-97.57 AICc=-97.38 BIC=-84.12

##

## Training set error measures:

## ME RMSE MAE MPE MAPE

## Training set -0.004294191 0.1871193 0.1265878 -0.05448621 2.738976

## MASE ACF1

## Training set 0.1802631 -0.0002198912

We see that the model we chose, ARIMA(1,2,1), is superior to the one selected by auto.arima based on the AIC, AICc, and BIC.

8.6 - Estimation and Order Selection

Maximum Likelihood Estimation

Once the model order has been identified (i.e., the values of $p$, $d$, and $q$), we need to estimate the parameters $c, \phi_1, \dots, \phi_p, \theta_1, \dots, \theta_q$. When R estimates the ARIMA model, it uses maximum likelihood estimation (MLE). This technique finds the values of the parameters which maximize the probability of obtaining the data that we have observed. For ARIMA models, MLE is very similar to the least squares estimates that would be obtained by minimizing

$$\sum_{t=1}^{T} \varepsilon_t^2.$$

(For the regression models considered in Chapter 5, MLE gives exactly the same parameter estimates as least squares estimation.) Note that ARIMA models are much more complicated to estimate than regression models, and different software will give slightly different answers as they use different methods of estimation, and different optimization algorithms.

In practice, R will report the value of the log likelihood of the data; that is, the logarithm of the probability of the observed data coming from the estimated model. For given values of $p$, $d$, and $q$, R will try to maximize the log likelihood when finding parameter estimates.

Information Criteria (AIC, AICc, and BIC)

Akaike’s Information Criterion (AIC), which was useful in selecting predictors for regression, is also useful for determining the order of an ARIMA model. It can be written as

$$\text{AIC} = -2\log(L) + 2(p + q + k + 1),$$

where $L$ is the likelihood of the data, $k = 1$ if $c \neq 0$ and $k = 0$ if $c = 0$. Note that the last term in parentheses is the number of parameters in the model (including $\sigma^2$, the variance of the residuals).

For ARIMA models, the corrected AIC can be written as

$$\text{AICc} = \text{AIC} + \frac{2(p + q + k + 1)(p + q + k + 2)}{T - p - q - k - 2},$$

and the Bayesian Information Criterion (BIC) can be written as

$$\text{BIC} = \text{AIC} + [\log(T) - 2](p + q + k + 1).$$

As with other modeling methods, good models are obtained by minimizing the AIC, AICc or BIC. Our preference has been and continues to be the AICc.

It is important to note that these information criteria tend not to be good guides for selecting the appropriate order of differencing ($d$) of a model, but only for selecting the values of $p$ and $q$. This is because the differencing changes the data on which the likelihood is computed, making the AIC values between models with different orders of differencing not comparable. So we need to use some other approach to choose $d$, and then we can use the AICc to select $p$ and $q$.
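As a sketch of this idea in R: fix $d$ first (here $d = 2$, the value used for the seasonally adjusted unemployment series Unemp.SA in Example 8.6), then compare AICc over a small grid of $p$ and $q$; each fit is wrapped in try() in case a particular model fails to converge.

library(fpp2)

grid = expand.grid(p = 0:2, q = 0:2)
grid$AICc = apply(grid, 1, function(pq) {
  fit = try(Arima(Unemp.SA, order = c(pq["p"], 2, pq["q"])), silent = TRUE)
  if (inherits(fit, "try-error")) NA else fit$aicc
})
grid[order(grid$AICc), ]           # candidate (p,q) combinations ranked by AICc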

8.7 - ARIMA Modeling in R (auto.arima)

The auto.arima() function in R uses a variation of the Hyndman-Khandakar algorithm (R. J. Hyndman and Khandakar 2008), which combines unit root tests, minimization of the AICc and MLE to obtain an ARIMA model. The algorithm follows these steps.

Hyndman-Khandakar algorithm for automatic ARIMA modelling

1. The number of differences $0 \le d \le 2$ is determined using repeated KPSS tests (tseries library - kpss.test).

2. The values of $p$ and $q$ are then chosen by minimizing the AICc after differencing the time series $d$ times. Rather than considering every possible combination of $p$ and $q$, the algorithm uses a stepwise search to traverse the model space.

a. Four initial models are fitted:

· ARIMA(2,d,2)

· ARIMA(0,d,0)

· ARIMA(1,d,0)

· ARIMA(0,d,1)

A constant is included unless $d = 2$. If $d \le 1$, an additional model is also fitted:

· ARIMA(0,d,0) without a constant

b. The best model (with the smallest AICc value) fitted in step (a) is set to be the “current model”.

c. Variations on the current model are considered:

· vary $p$ and/or $q$ from the current model by $\pm 1$;

· include/exclude the constant $c$ from the current model. The best model considered so far (either the current model or one of these variations) becomes the new current model.

d. Repeat Step 2(c) until no lower AICc can be found.

The arguments to auto.arima() provide for many variations on the algorithm. What is described here is the default behavior.

The default procedure uses some approximations to speed up the search. These approximations can be avoided with the argument approximation=FALSE. It is possible that the minimum AICc model will not be found due to these approximations, or because of the use of a stepwise procedure. A much larger set of models will be searched if the argument stepwise=FALSE is used. See the help file for a full description of the arguments.
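For example, applied to the internet-usage series from Example 8.4 (this only changes documented auto.arima() options; the result may or may not differ from the default stepwise search):

library(fpp2)

fit.full = auto.arima(WWWusage, stepwise = FALSE, approximation = FALSE)
summary(fit.full)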

Choosing Your Own Model

If you want to choose the model yourself, use the Arima() function in R, as was done in a few of the examples above. There is another function arima() in R which also fits an ARIMA model. However, it does not allow for the constant $c$ unless $d = 0$, and it does not return everything required for other functions in the forecast package to work. Finally, it does not allow the estimated model to be applied to new data (which is useful for checking forecast accuracy). Consequently, it is recommended that Arima() always be used instead.
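A sketch of that last point using the Google series from Example 8.1: fit on a training window, then pass the fitted model to Arima() via the model argument so the same coefficients are applied to new data without re-estimation (the 150/50 split is an arbitrary illustrative choice).

library(fpp2)

train = window(goog200, end = 150)
test  = window(goog200, start = 151)

fit.train = Arima(train, order = c(0,1,0), include.drift = TRUE)
fit.test  = Arima(test, model = fit.train)   # same coefficients, applied to the test data

accuracy(fit.test)   # one-step forecast accuracy on the test set, with no re-estimation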

Modelling Procedure

When fitting an ARIMA model to a set of (non-seasonal) time series data, the following procedure provides a useful general approach.

1. Plot the data and identify any unusual observations.

2. If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.

3. If the data are non-stationary, take first differences of the data until the data are stationary.

4. Examine the ACF/PACF: Is an ARIMA($p,d,0$) or an ARIMA($0,d,q$) model appropriate?

5. Try your chosen model(s), and use the AICc to search for a better model.

6. Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a Portmanteau test of the residuals, i.e. checkresiduals(). If they do not look like white noise, try a modified model.

7. Once the residuals look like white noise, calculate forecasts.

The Hyndman-Khandakar algorithm only takes care of steps 3-5. So even if you use it, you will still need to take care of the other steps yourself.
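As a compact R illustration of how these steps map onto functions used throughout these notes (WWWusage stands in here for a generic non-seasonal series; this sketch is not a substitute for the judgment required in steps 1-7):

library(fpp2)

y = WWWusage
autoplot(y)                          # 1. plot the data, look for unusual observations
# 2. no transformation here; the variance looks stable
ndiffs(y)                            # 3. how many differences are needed for stationarity?
ggtsdisplay(diff(y))                 # 4. inspect the ACF/PACF of the differenced series
fit = Arima(y, order = c(3,1,0))     # 5. fit candidate model(s); compare AICc across them
checkresiduals(fit)                  # 6. residuals should look like white noise
forecast(fit, h = 10)                # 7. once they do, compute forecasts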

The process for ARIMA modelling is summarized in the diagram below:

Example 8.7 - Electrical Equipment Orders (seasonally adjusted)

The time series elecequip in the fpp2 library contains an index of electrical equipment orders. More specifically, it is an index of the monthly manufacture of electrical equipment (computer, electronic, and optical products) from January 1996 to March 2012. The data are adjusted by working days; Euro area (17 countries, 2005=100).

require(fpp2)
require(seasonal)
# Use X-13 SEATS to decompose the time series
ee.seas = seas(elecequip)
autoplot(ee.seas,facet=T)

# Extract seasonally adjusted time series from X-13 SEATS results
ee.sa = seasadj(ee.seas)
autoplot(ee.sa) + xlab("Year") + ylab("Electrical Equipment Orders (2005=100)") + ggtitle("Seasonally Adjusted Electrical Equipment Orders")

Now that we have a seasonally adjusted time series, we will use first differencing in an attempt to achieve stationarity.

ggtsdisplay(diff(ee.sa))

1. The time plot shows some sudden changes, particularly the big drop in 2008/2009. These changes are due to the global economic environment. Otherwise there is nothing unusual about the time plot and there appears to be no need to do any data adjustments.

2. There is no evidence of changing variance, so we will not do a Box-Cox transformation.

3. The data are clearly non-stationary, as the series wanders up and down for long periods. Consequently, we will take a first difference of the data. The differenced data are plotted above. These look stationary, and so we will not consider further differencing.

4. The PACF is suggestive of an AR(3) model. So an initial candidate model is an ARIMA(3,1,0). There are no other obvious candidate models.

5. We can fit an ARIMA(3,1,0) model along with variations including ARIMA(4,1,0), ARIMA(2,1,0), ARIMA(3,1,1), etc. Of these, the ARIMA(3,1,1) has a slightly smaller AICc value.

fit1 = Arima(ee.sa,order=c(3,1,0))
fit2 = Arima(ee.sa,order=c(4,1,0))
fit3 = Arima(ee.sa,order=c(2,1,0))
fit4 = Arima(ee.sa,order=c(3,1,1))
fit1

## Series: ee.sa
## ARIMA(3,1,0)
##
## Coefficients:
##          ar1    ar2    ar3
##      -0.2788 0.0307 0.2268
## s.e.  0.0699 0.0726 0.0697
##
## sigma^2 estimated as 8.374: log likelihood=-480.02
## AIC=968.03 AICc=968.24 BIC=981.1

fit2

## Series: ee.sa
## ARIMA(4,1,0)
##
## Coefficients:
##          ar1    ar2    ar3    ar4
##      -0.3088 0.0271 0.2632 0.1284
## s.e.  0.0713 0.0719 0.0719 0.0708
##
## sigma^2 estimated as 8.275: log likelihood=-478.39
## AIC=966.78 AICc=967.1 BIC=983.12

fit3

## Series: ee.sa
## ARIMA(2,1,0)
##
## Coefficients:
##          ar1     ar2
##      -0.2877 -0.0357
## s.e.  0.0719  0.0717
##
## sigma^2 estimated as 8.791: log likelihood=-485.16
## AIC=976.33 AICc=976.45 BIC=986.13

fit4

## Series: ee.sa
## ARIMA(3,1,1)
##
## Coefficients:
##         ar1    ar2    ar3     ma1
##      0.2345 0.1791 0.2477 -0.5556
## s.e. 0.1555 0.0775 0.0721  0.1519
##
## sigma^2 estimated as 8.213: log likelihood=-477.68
## AIC=965.35 AICc=965.67 BIC=981.69

checkresiduals(fit4)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,1,1)
## Q* = 23.589, df = 20, p-value = 0.2608
##
## Model df: 4. Total lags used: 24

6. The ACF plot of the residuals from the ARIMA(3,1,1) model shows that all autocorrelations are within the threshold limits, indicating that the residuals are behaving like white noise. A Portmanteau test returns a large p-value, also suggesting that the residuals are white noise.

7. Two-year ($h = 24$ month) forecasts from the chosen model are shown below.

autoplot(forecast(fit4,h=24))+ xlab("Year") + ylab("Seasonally-Adjusted Index")

8.8 - Forecasting (read this section in the online text)
