Chapter 3, Part IV: The Box-Jenkins Approach to Model Building
The ARMA models have been found to be quite useful for describing stationary nonseasonal time series. A partial explanation for this fact is provided by Wold's Theorem: "Any stationary series can be expressed as the sum of two components: a perfectly forecastable series and a moving average of possibly infinite order." In practice, the only perfectly forecastable aspect of an economic series is the seasonal component, if any. Thus, nonseasonal series can always be represented by an MA($\infty$) model, which in turn can usually be approximated by an ARMA(p,q) model with $p + q$ small (i.e., with a small number of parameters). Thus, the ARMA models can typically provide an accurate yet parsimonious description of stationary nonseasonal series.
In fact, most economic series are nonstationary and have a seasonal component. This does not degrade the usefulness of ARMA models, however, since the raw data may typically be processed (often by some form of differencing) to produce an approximately stationary, nonseasonal series. This series may be forecast by fitting an appropriate ARMA model. Forecasts of the original series may then be obtained by reversing the processing operation.
Specifically, the processing proceeds as follows. Seasonal components may be removed by a technique called "seasonal differencing", discussed in Chapter 4. Nonstationarity can often be classified as a "trend in mean" or a "trend in variance". Trends in mean can usually be handled by ordinary differencing. An example is the series $x_t = (a + bt) + \varepsilon_t$. Trends in variance can often be converted into trends in mean by taking logarithms, as with the series $x_t = \exp(a + bt)\,\exp(\varepsilon_t)$. The trend in mean of $\log x_t$ can then be removed by differencing. Since the techniques just described are reasonably effective, we can safely assume that our data (after being suitably processed) forms a stationary nonseasonal time series.
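The processing steps just described can be sketched in a few lines of Python. This is only an illustration under assumed values: the trend parameters, noise scales, and the helper name `undifference` are hypothetical and do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(200)
a, b = 1.0, 0.05  # hypothetical trend parameters

# Trend in mean: x_t = (a + b t) + eps_t
x = (a + b * t) + rng.normal(scale=1.0, size=t.size)
dx = np.diff(x)  # dx_t = b + eps_t - eps_{t-1}: no trend remains

# Trend in variance: y_t = exp(a + b t) * exp(eps_t)
y = np.exp(a + b * t) * np.exp(rng.normal(scale=0.1, size=t.size))
dlogy = np.diff(np.log(y))  # take logs first, then difference


def undifference(first_value, diffs):
    """Reverse first differencing, given the first value of the original series."""
    return np.concatenate(([first_value], first_value + np.cumsum(diffs)))


# Reversing the processing recovers the original series.
x_back = undifference(x[0], dx)
y_back = np.exp(undifference(np.log(y[0]), dlogy))
```

Forecasts made for the differenced series would be carried back to the original scale in the same way, by cumulative summation (and exponentiation, if logarithms were taken).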
What Is Model Building?
So far, in our discussions of forecasting for stationary series, we have assumed that the series actually obeys an ARMA(p,q) model, that the model orders (i.e., $p$ and $q$) are known, and that the corresponding parameter values are known as well. In practice, we will simply have a series of data values, and none of these assumptions will be valid. Indeed, it is highly doubtful that our stationary series obeys an exact ARMA model. The main justification for using such a model is not that we believe it actually holds, but instead that we believe it can provide an accurate, parsimonious description of the data, as discussed above. Still, some important questions remain: What are the appropriate values for $(p, q)$, and how should we estimate the corresponding parameter values? Box and Jenkins refer to these respectively as the identification and estimation stages of model building. We will describe how these two stages are implemented. Note that once a model has been identified and its parameters estimated, the result is taken to be the true model and forecasts are obtained accordingly. It is worth remembering, however, that the fitted model is almost certainly not identical to the true model. This can result in a type of forecasting error (essentially ignored by most authors) which cannot be easily gauged, and which can in fact be quite devastating. As a minimum protection against such problems, we must check that the fitted model is (or at least seems to be) adequate. Such diagnostic checking is the final stage of the Box-Jenkins approach, and will be described.
Model Identification: The Correlogram and Partial Correlogram
The class of ARMA models is quite large, and in practice we must decide which of these models is most appropriate for the data at hand, $x_1, x_2, \ldots, x_n$. The correlogram and partial correlogram are two simple diagrams which can help us to make this decision (i.e., to "identify the model").
We first describe the correlogram, since it is conceptually the simplest. The theoretical correlogram is a plot of the theoretical autocorrelations
$$\rho_k = \mathrm{corr}(x_t, x_{t-k})$$
against $k$. The sample correlogram is a plot against $k$ of the estimated autocorrelations
$$r_k = \sum_{t=k+1}^{n} (x_t - \bar{x})(x_{t-k} - \bar{x}) \Big/ \sum_{t=1}^{n} (x_t - \bar{x})^2 .$$
If the series were actually MA(q), its theoretical correlogram would "cut off" (i.e., take the value zero) for $k > q$. Thus, we would expect that the sample correlogram would have a similar (though not identical) shape to the theoretical correlogram, and would therefore stay reasonably close to zero for $k > q$. Reversing this reasoning, we get the following rule: If the correlogram seems to cut off for $k > q$, then the appropriate model is MA(q).
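As a rough illustration, the estimated autocorrelations $r_k$ can be computed directly from their definition. The function name `sample_acf` and the simulated MA(1) series (with an assumed coefficient of 0.8) are hypothetical choices made for this sketch.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Estimated autocorrelations r_1, ..., r_max_lag, as defined in the text."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    n = len(d)
    denom = np.sum(d * d)
    # Numerator for lag k: sum over t = k+1..n of (x_t - xbar)(x_{t-k} - xbar)
    return np.array([np.sum(d[k:] * d[:n - k]) / denom
                     for k in range(1, max_lag + 1)])


# Simulated MA(1) series x_t = eps_t + 0.8 eps_{t-1} (assumed example values).
rng = np.random.default_rng(0)
eps = rng.normal(size=5001)
x = eps[1:] + 0.8 * eps[:-1]

r = sample_acf(x, 5)
print(np.round(r, 3))  # r_1 should be clearly nonzero; r_2..r_5 should be near zero
```

For this series the theoretical value is $\rho_1 = 0.8 / (1 + 0.8^2) \approx 0.49$, with $\rho_k = 0$ for $k > 1$, so the sample correlogram should appear to cut off after lag 1.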
For AR(p) models, the autocorrelations $\rho_k$ are approximately (for large enough $k$) $\rho_k = A \lambda^k$, where $|\lambda| < 1$. Thus, for $k$ large (say $k \ge p$), the correlogram would be expected to decline steadily (if $\lambda > 0$) or be bounded by a pair of declining curves (if $\lambda < 0$). This pattern of decline can often be distinguished from the "cutoff" described earlier, and should be taken as evidence that the correct model is not MA. To actually identify an AR model, however, we need a diagram which will have a more distinctive shape when the series is actually AR. The partial correlogram is such a diagram.
To define partial correlations, suppose we fit an AR(k) model to our data:
$$x_t = a_{k1} x_{t-1} + a_{k2} x_{t-2} + \cdots + a_{kk} x_{t-k} + \varepsilon_t .$$
Then $a_{kk}$ is the estimate of the coefficient of $x_{t-k}$ when a $k$'th order AR is fitted. Rewriting this as
$$x_t - [a_{k1} x_{t-1} + \cdots + a_{k(k-1)} x_{t-(k-1)}] = a_{kk} x_{t-k} + \varepsilon_t ,$$
we see that $a_{kk}$ is a plausible estimate of the correlation between $x_{t-k}$ and that part of $x_t$ which cannot be forecast from $x_{t-1}, \ldots, x_{t-(k-1)}$. $a_{kk}$ is called the partial correlation between $x_t$ and $x_{t-k}$. It is the estimated correlation between $x_t$ and $x_{t-k}$ after the effects of all intermediate $x$'s on this correlation are taken out.
Clearly, if the series is actually AR(p), then the theoretical partial correlations $a_{kk}$ will be zero for $k > p$. Thus, we can use the partial correlogram (i.e., a plot of the estimated partial correlation coefficients) to identify AR models: If the partial correlogram cuts off for $k > p$, then the appropriate model is AR(p).
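The definition above suggests one direct way to estimate the partial correlations: fit AR(1), AR(2), ..., AR(K) in turn and keep the last coefficient of each fit. The sketch below does this by ordinary least squares, which is an assumption (the text does not say how each AR(k) is fitted). The function name `sample_pacf` and the simulated AR(2) coefficients are hypothetical.

```python
import numpy as np


def sample_pacf(x, max_lag):
    """Estimate a_kk for k = 1..max_lag by least-squares fits of AR(k) models."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    n = len(d)
    out = []
    for k in range(1, max_lag + 1):
        # Column j holds the lag-j values x_{t-j}; rows run over t = k..n-1.
        X = np.column_stack([d[k - j:n - j] for j in range(1, k + 1)])
        y = d[k:]
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        out.append(coef[-1])  # a_kk: coefficient on the longest lag
    return np.array(out)


# Simulated AR(2): x_t = 0.5 x_{t-1} + 0.3 x_{t-2} + eps_t (assumed example values).
rng = np.random.default_rng(0)
n, burn = 5000, 200
eps = rng.normal(size=n + burn)
x = np.zeros(n + burn)
for t in range(2, n + burn):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + eps[t]
x = x[burn:]  # discard start-up values

pacf = sample_pacf(x, 6)
print(np.round(pacf, 3))  # lags 3..6 should be near zero for an AR(2) series
```

For this series the partial correlogram should appear to cut off after lag 2, with the lag-2 value close to the assumed coefficient 0.3.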
There is an interesting duality (symmetry) between the properties of the correlogram and partial correlogram for pure AR and pure MA models. The behavior of a given diagram for a given model type is the same as the behavior of the other diagram for the other model type. We have already seen some evidence of this: The correlogram for an MA model and the partial correlogram for an AR model both cut off. As we know, the correlogram for an AR model dies down (but does not cut off). It can be shown that the partial correlogram for an MA model dies down as well.
A still unanswered question is how we can identify a mixed ARMA model. In this case, it can be shown that the correlogram and partial correlogram both die down (but do not cut off). Thus, if both diagrams die down, we can conclude that the appropriate model is ARMA. Unfortunately, though, the diagrams do not in this case help us to decide on the order $(p, q)$ of the mixed model.
The following table summarizes the behavior of the diagrams.
Behavior of Correlogram and Partial Correlogram for Various Models
------------------------------------------------
Model     Correlogram     Partial Correlogram
------------------------------------------------
AR        Dies Down       Cuts Off
MA        Cuts Off        Dies Down
ARMA      Dies Down       Dies Down
------------------------------------------------
After examining the correlogram and partial correlogram in the light of the properties described above, we should be able to select a few models which seem appropriate. (Unfortunately, the observed patterns are often not so clear as to unambiguously point to a single model.) Another guiding principle in model identification is that of parsimony: The total number of parameters in the model should be as small as possible (i.e., 3 or less, in the view of Box and Jenkins), subject to the restriction that the model provide an adequate description of the data. If two models appear to fit the data equally well, the one with fewer parameters will always be preferred. Indeed, in this case the one with fewer parameters will almost certainly produce the better forecasts. One reason is that we can obtain more precise (stable) parameter estimates if the number of parameters is small.
Besides facilitating the identification of models for stationary series, the correlogram can also diagnose nonstationarity. If a series is nonstationary (and needs to be differenced to produce a stationary series) then the theoretical autocorrelations will be nearly 1 for all $k$. Thus, if the estimated correlogram fails to die down (or dies down very slowly), then the series should be differenced. If the estimated correlogram for the differenced series still fails to die down, then the series should be differenced once more. Note, however, that economic series typically need to be differenced only once. If the series needs to be differenced $d$ times before an ARMA(p,q) model can be identified, the original series is said to be an integrated mixed autoregressive-moving average series, denoted ARIMA(p,d,q).
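A small simulation may help show what "fails to die down" looks like. The sketch below uses a random walk, which is an ARIMA(0,1,0) series, as an assumed example. The helper name `lag_autocorr` is hypothetical.

```python
import numpy as np


def lag_autocorr(x, k):
    """Estimated autocorrelation r_k of the series x."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.sum(d[k:] * d[:len(d) - k]) / np.sum(d * d))


rng = np.random.default_rng(1)
walk = np.cumsum(rng.normal(size=1000))  # random walk: needs one difference (d = 1)

r_raw = [lag_autocorr(walk, k) for k in range(1, 11)]
r_diff = [lag_autocorr(np.diff(walk), k) for k in range(1, 11)]

print(np.round(r_raw, 2))   # stays close to 1: the correlogram fails to die down
print(np.round(r_diff, 2))  # near zero: one difference is enough here
```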
The model identification method just described is the one advocated by Box and Jenkins, and Granger (among others). Its usefulness has been amply demonstrated on actual data, economic and otherwise. It is the method that we will use in this course. The method does have some serious drawbacks, however: It is not entirely objective, its implementation requires careful examination of the data by a knowledgeable and experienced analyst, and it may fail to unambiguously identify a model. Since the publication of Box-Jenkins and Granger, several objective methods have been proposed and tested. These methods automatically select a model without any intervention from the user. Although there is no universal agreement as to the superiority of the objective methods compared to the Box-Jenkins method, the potential advantages of a high-quality automated method are quite strong. Still, if an experienced analyst is available, considerable insight may be gained through examination of the correlogram and partial correlogram, even if an automated method is ultimately used. We will discuss the new methods more fully if time permits.
Estimation
In the last section, we described ways of choosing an appropriate model. Strictly speaking, however, "model identification" consists merely of selecting the form of the model, but not the numerical values of its parameters. Suppose, for example, we have decided to fit an AR(1) model $x_t = a x_{t-1} + \varepsilon_t$. Since the value of the parameter $a$ is not known, it must somehow be estimated from the data. Here, we describe methods of estimating the parameters of ARMA models.
For pure AR models, there exist simple estimation techniques, since there is a linear relationship between the autocorrelations and the AR parameters. This relationship can be inverted, and then the theoretical autocorrelations can be replaced by their estimates, to yield estimates of the AR parameters. In the AR(1) case, for example, we know that $\rho_1 = a$. Thus, we may estimate $a$ by $\hat{a} = r_1$. In general, for the AR(p) model
$$x_t = a_1 x_{t-1} + a_2 x_{t-2} + \cdots + a_p x_{t-p} + \varepsilon_t$$
we obtain a system of linear equations called the Yule-Walker equations by multiplying both sides by $x_{t-k}$ ($k = 1, \ldots, p$), taking expectations and then normalizing. The $k$'th equation in the system is
$$\rho_k = a_1 \rho_{k-1} + a_2 \rho_{k-2} + \cdots + a_p \rho_{k-p} ,$$
where $\rho_0 = 1$ and $\rho_{-j} = \rho_j$. The estimates $\hat{a}_1, \ldots, \hat{a}_p$ of the AR parameters are obtained by solving this linear system, thereby obtaining a formula for $a_1, \ldots, a_p$ in terms of $\rho_1, \ldots, \rho_p$, and then replacing $\rho_1, \ldots, \rho_p$ by their estimates $r_1, \ldots, r_p$ in this formula. This procedure is equivalent to solving the system
$$r_k = \hat{a}_1 r_{k-1} + \hat{a}_2 r_{k-2} + \cdots + \hat{a}_p r_{k-p} \qquad (k = 1, \ldots, p)$$
for $\hat{a}_1, \ldots, \hat{a}_p$. The resulting values are called the Yule-Walker estimates. It can be shown that the Yule-Walker estimated AR parameters always correspond to a stationary AR model.
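The Yule-Walker system is small enough to solve directly. The following is a minimal sketch using NumPy's linear solver. The function name `yule_walker` and the AR(2) check values are hypothetical, and the conventions $r_0 = 1$ and $r_{-j} = r_j$ are used to fill in the coefficient matrix.

```python
import numpy as np


def yule_walker(r, p):
    """Solve r_k = sum_j a_j r_{k-j}, k = 1..p, for a_1..a_p.

    `r` holds the estimated autocorrelations r_1, r_2, ... (r[0] is r_1).
    """
    rho = np.concatenate(([1.0], np.asarray(r[:p], dtype=float)))  # rho[j] = r_j, r_0 = 1
    # Row k, column j of the system matrix is r_{|k-j|}.
    R = np.array([[rho[abs(k - j)] for j in range(1, p + 1)]
                  for k in range(1, p + 1)])
    return np.linalg.solve(R, rho[1:])


# Check against theoretical autocorrelations of an assumed AR(2) model
# with a_1 = 0.5 and a_2 = 0.3: rho_1 = a_1 / (1 - a_2), rho_2 = a_1 rho_1 + a_2.
rho1 = 0.5 / 0.7
rho2 = 0.5 * rho1 + 0.3
a_hat = yule_walker([rho1, rho2], 2)
print(a_hat)  # recovers the assumed coefficients, up to rounding
```

In practice the inputs would be the estimated autocorrelations $r_1, \ldots, r_p$ computed from the data rather than theoretical values.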
The situation for MA models is considerably more complicated. The theoretical relationship between the parameters and autocorrelations is not linear. For example, in the MA(1) model $x_t = \varepsilon_t + b \varepsilon_{t-1}$, we have
$$\rho_1 = \frac{b}{1 + b^2} .$$
In this case, we get a quadratic equation for $b$, namely $\rho_1 b^2 + (-1) b + \rho_1 = 0$, which has the two solutions
$$b = \frac{1 \pm \sqrt{1 - 4\rho_1^2}}{2\rho_1} .$$
It can be shown that $|\rho_1| \le .5$ for any MA(1) model, so the solutions will both be real. The corresponding estimates of $b$ are
$$\hat{b} = \frac{1 \pm \sqrt{1 - 4 r_1^2}}{2 r_1} ,$$
and two problems arise here. First, there is no guarantee that $1 - 4 r_1^2 \ge 0$, since the estimate $r_1$ may exceed .5 in magnitude. Second, how do we decide which of the two solutions to use?
To answer this second question we must define invertibility. An MA model is said to be invertible if it can be represented as (i.e., "inverted to") a stationary infinite-order autoregression, AR($\infty$). Consider, for example, the MA(1) model $x_t = \varepsilon_t + b \varepsilon_{t-1}$. If we consider this as a difference equation for $\varepsilon_t$, we obtain the solution
$$\varepsilon_t = x_t - b x_{t-1} + b^2 x_{t-2} + \cdots + (-b)^k x_{t-k} + \cdots .$$
If $|b| > 1$, an explosive series results and the current $\varepsilon_t$ cannot be estimated from past $x_t$. Thus, to be useful for forecasting, the MA model must be invertible. For the MA(q) model, the invertibility condition is that the root of largest magnitude of the equation
$$z^q + b_1 z^{q-1} + \cdots + b_q = 0$$
should have magnitude less than one.
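The invertibility condition can be checked numerically by finding the roots of the polynomial above. This is a minimal sketch using `numpy.roots`. The function name `is_invertible` and the example coefficients are hypothetical.

```python
import numpy as np


def is_invertible(b):
    """Check the MA(q) invertibility condition for coefficients b = [b_1, ..., b_q].

    The model is x_t = eps_t + b_1 eps_{t-1} + ... + b_q eps_{t-q}.
    """
    # Coefficients of z^q + b_1 z^(q-1) + ... + b_q, highest power first.
    poly = np.concatenate(([1.0], np.asarray(b, dtype=float)))
    roots = np.roots(poly)
    return bool(np.all(np.abs(roots) < 1.0))


print(is_invertible([0.5]))        # MA(1) with |b| < 1
print(is_invertible([2.0]))        # MA(1) with |b| > 1
print(is_invertible([0.5, 0.06]))  # z^2 + 0.5 z + 0.06 = (z + 0.2)(z + 0.3)
```

For the MA(1) case the single root is $z = -b$, so the check reduces to $|b| < 1$, in agreement with the explosive behavior noted above when $|b| > 1$.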
Returning now to the issue of which solution to choose for $b$ in the MA(1) case, it can be shown that of the two possible solutions, only one gives an invertible model. (The product of the two roots of the quadratic is 1, so the solutions are reciprocals of each other, and apart from the boundary case $|b| = 1$ exactly one has magnitude less than one.) Estimation for MA(q) models proceeds similarly. From the expressions for $\rho_1, \ldots, \rho_q$, we obtain a system of nonlinear equations for the parameters $b_1, \ldots, b_q$. This system will have many solutions, but only one will give an invertible model. Computer programs for fitting MA models will always choose this invertible model.
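For the MA(1) case, the choice of the invertible solution can be written out explicitly. The sketch below is illustrative only. The function name `ma1_moment_estimate` is hypothetical, and raising an error when $1 - 4r_1^2 < 0$ is one possible way to handle that case, not something prescribed by the text.

```python
import math


def ma1_moment_estimate(r1):
    """Invertible solution of r1 * b^2 - b + r1 = 0 for the MA(1) parameter b."""
    if r1 == 0.0:
        return 0.0  # zero lag-1 autocorrelation corresponds to b = 0
    disc = 1.0 - 4.0 * r1 * r1
    if disc < 0.0:
        raise ValueError("|r1| > 0.5: no real solution for b")
    # The two solutions are reciprocals; the "minus" root has magnitude <= 1.
    return (1.0 - math.sqrt(disc)) / (2.0 * r1)


b_hat = ma1_moment_estimate(0.4)
print(b_hat, 1.0 / b_hat)  # invertible solution and its non-invertible reciprocal
```

With $r_1 = 0.4$ the two solutions of the quadratic are 0.5 and 2, and the function returns the first.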
Estimation for mixed ARMA models proceeds by nonlinear methods. The programs used will
always choose a stationary, invertible model.
The methods just described are those given in Granger. All of these exploit the connection between the autocorrelations and the parameters. In fact, there exist many other estimation techniques, including the very popular maximum likelihood method. This method assumes that the innovations are normally distributed, and then exploits this assumption as fully as possible. Another popular method is least squares, in which the sum of squared errors of the fitted model (i.e., the sum of squares of the estimated innovations) is made as small as possible. Assuming normal innovations, the maximum likelihood and least squares methods are generally superior to the method described in Granger, particularly when the model is near the nonstationarity boundary (i.e., when the largest root of the stationarity equation has magnitude close to 1).
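To make the least squares idea concrete, here is a rough sketch for the MA(1) model. It rests on two assumptions not taken from the text: the estimated innovations are computed recursively starting from $\varepsilon_0 = 0$, and the minimization is done by a simple grid search over invertible values of $b$. The function names and the simulated series are hypothetical.

```python
import numpy as np


def ma1_innovations(x, b):
    """Estimated innovations eps_t = x_t - b * eps_{t-1}, started at eps_0 = 0."""
    eps = np.zeros(len(x))
    prev = 0.0
    for t, xt in enumerate(x):
        prev = xt - b * prev
        eps[t] = prev
    return eps


def ma1_least_squares(x, grid=None):
    """Value of b on the grid that minimizes the sum of squared innovations."""
    if grid is None:
        grid = np.linspace(-0.99, 0.99, 199)  # invertible region, step 0.01
    sse = [np.sum(ma1_innovations(x, b) ** 2) for b in grid]
    return float(grid[int(np.argmin(sse))])


# Simulated MA(1) series with an assumed true value b = 0.6.
rng = np.random.default_rng(0)
e = rng.normal(size=2001)
x = e[1:] + 0.6 * e[:-1]

b_ls = ma1_least_squares(x)
print(b_ls)  # should lie close to the assumed value 0.6
```

A production routine would use a numerical optimizer rather than a grid, but the objective being minimized is the same sum of squared estimated innovations.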
Diagnostic Checking
Once a model has been identified and estimated, it is usually taken to be the true model and forecasts can be obtained accordingly. As mentioned earlier, it is virtually certain that the estimated model is not the true model. To protect against disastrous forecasting errors, the least we can do is to check that the fitted model is a satisfactory one. This is done by the use of diagnostic checks. If we had a large amount of data, it would be feasible to break the data into two parts, identify and estimate the model on the first part and check the quality of the forecasts on the second part. This method, known as cross-validation, gives one of the few ways of obtaining an honest estimate of forecasting error. Unfortunately, there is typically not enough data for cross-validation to be used, so that models are identified, estimated, and diagnostically checked on the same data set. The most commonly used method is to examine the correlogram of the residuals from the fitted model to see if the residuals are white noise (as they should be, if the model is correct). For example, the Box-Pierce test statistic is based on the sum of squares of the residual autocorrelations. If this test statistic exceeds some critical value (found in a table), then the model in question is declared to be inadequate. Unfortunately, this test is not very likely to flag inadequately fitting models. Furthermore, even if a model is not found to be inadequate, the method provides no assessment of the probable contribution to forecast error due to the identification and estimation stages, and due to the difference between the identified and actual models.
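As an illustration, the Box-Pierce statistic in its usual form is $Q = n \sum_{k=1}^{m} r_k^2$, where $r_k$ are the residual autocorrelations and $m$ is the number of lags examined. The sketch below computes it. The function name `box_pierce`, the choice $m = 10$, and the simulated series are hypothetical. When the residuals come from a fitted ARMA(p,q) model, $Q$ is usually compared with a chi-square critical value on $m - p - q$ degrees of freedom. Here no model is fitted, so $m$ degrees of freedom are used.

```python
import numpy as np


def box_pierce(residuals, m):
    """Box-Pierce statistic Q = n * sum of squared autocorrelations at lags 1..m."""
    e = np.asarray(residuals, dtype=float)
    d = e - e.mean()
    n = len(d)
    denom = np.sum(d * d)
    r = np.array([np.sum(d[k:] * d[:n - k]) / denom for k in range(1, m + 1)])
    return float(n * np.sum(r ** 2))


rng = np.random.default_rng(0)
white = rng.normal(size=1000)  # what residuals from an adequate model should resemble

# An AR(1) series (assumed a = 0.8) treated as "residuals" of an inadequate model.
ar = np.zeros(1000)
for t in range(1, 1000):
    ar[t] = 0.8 * ar[t - 1] + white[t]

CRITICAL_5PCT_10DF = 18.307  # tabulated 5% chi-square critical value, 10 d.f.
q_white = box_pierce(white, 10)
q_ar = box_pierce(ar, 10)
print(q_white, q_ar, CRITICAL_5PCT_10DF)
```

The strongly autocorrelated series gives a value of $Q$ far above the critical value, while a white noise series would exceed it only about 5% of the time.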