the box-jenkins approach to model building

Chapter 3, Part IV: The Box-Jenkins Approach to Model Building

The ARMA models have been found to be quite useful for describing stationary nonseasonal time series. A partial explanation for this fact is provided by Wold's Theorem: "Any stationary series can be expressed as the sum of two components: a perfectly forecastable series and a moving average of possibly infinite order." In practice, the only perfectly forecastable aspect of an economic series is the seasonal component, if any. Thus, stationary nonseasonal series can always be represented by an MA(∞) model, which in turn can usually be approximated by an ARMA(p,q) model with p + q small (i.e., with a small number of parameters). Consequently, the ARMA models can typically provide an accurate yet parsimonious description of stationary nonseasonal series.

In practice, however, most economic series are nonstationary and have a seasonal component. This does not degrade the usefulness of ARMA models, since the raw data may typically be processed (often by some form of differencing) to produce an approximately stationary, nonseasonal series. This series may be forecast by fitting an appropriate ARMA model. Forecasts of the original series may then be obtained by reversing the processing operation.

Specifically, the processing proceeds as follows. Seasonal components may be removed by a technique called "seasonal differencing", discussed in Chapter 4. Nonstationarity can often be classified as a "trend in mean" or a "trend in variance". Trends in mean can usually be handled by ordinary differencing. An example is the series $x_t = (a + bt) + \varepsilon_t$, whose first difference $x_t - x_{t-1} = b + \varepsilon_t - \varepsilon_{t-1}$ no longer has a trend. Trends in variance can often be converted into trends in mean by taking logarithms, as with the series $x_t = \exp(a + bt)\exp(\varepsilon_t)$, for which $\log x_t = a + bt + \varepsilon_t$. The trend in mean of $\log x_t$ can then be removed by differencing. Since the techniques just described are reasonably effective, we can usually assume that our data (after being suitably processed) form a stationary nonseasonal time series.
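The two transformations just described can be sketched in a few lines. This is a minimal illustration, not a general preprocessing routine: the helper name `difference` is our own, and the noise term is left out so that the removed trend is easy to see.

```python
import math


def difference(xs, lag=1):
    """Return the differences x_t - x_{t-lag} (lag=1 is ordinary differencing)."""
    return [xs[t] - xs[t - lag] for t in range(lag, len(xs))]


a, b, n = 2.0, 0.5, 10

# Trend in mean: x_t = a + b*t. The first differences are constant (equal to b).
linear = [a + b * t for t in range(n)]
print(difference(linear))

# Trend in variance: x_t = exp(a + b*t). Take logs first, then difference;
# the result is again constant (equal to b, up to floating-point rounding).
exponential = [math.exp(a + b * t) for t in range(n)]
log_diff = difference([math.log(x) for x in exponential])
print(log_diff)
```

The `lag` argument is a hypothetical hook: setting it to the seasonal period would give a seasonal difference, which Chapter 4 treats properly.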

What Is Model Building?

So far, in our discussions of forecasting for stationary series, we have assumed that the series actually obeys an ARMA(p,q) model, that the model orders (i.e., p and q) are known, and that the corresponding parameter values are known as well. In practice, we will simply have a series of data values, and none of these assumptions will be valid. Indeed, it is highly doubtful that our stationary series obeys an exact ARMA model. The main justification for using such a model is not that we believe it actually holds, but instead that we believe it can provide an accurate, parsimonious description of the data, as discussed above. Still, some important questions remain: What are the appropriate values for (p, q), and how should we estimate the corresponding parameter values? Box and Jenkins refer to these respectively as the identification and estimation stages of model building. We will describe how these two stages are implemented. Note that once a model has been identified and its parameters estimated, the result is taken to be the true model and forecasts are obtained accordingly. It is worth remembering, however, that the fitted model is almost certainly not identical to the true model. This can result in a type of forecasting error (essentially ignored by most authors) which cannot be easily gauged, and which can in fact be quite devastating. As a minimum protection against such problems, we must check that the fitted model is (or at least seems to be) adequate. Such diagnostic checking is the final stage of the Box-Jenkins approach, and will also be described.

Model Identification: The Correlogram and Partial Correlogram

The class of ARMA models is quite large, and in practice we must decide which of these models is most appropriate for the data at hand, $x_1, x_2, \ldots, x_n$. The correlogram and partial correlogram are two simple diagrams which can help us to make this decision (i.e., to "identify the model").

We first describe the correlogram, since it is conceptually the simpler of the two. The theoretical correlogram is a plot of the theoretical autocorrelations

$$\rho_k = \operatorname{corr}(x_t, x_{t-k})$$

against k. The sample correlogram is a plot against k of the estimated autocorrelations

$$r_k = \frac{\sum_{t=k+1}^{n} (x_t - \bar{x})(x_{t-k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2},$$

where $\bar{x}$ is the sample mean of $x_1, \ldots, x_n$.
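The sample autocorrelation formula translates directly into code. The sketch below is plain Python with 0-based indexing, so the sum over $t = k+1, \ldots, n$ becomes `range(k, n)`; the function name `sample_acf` is our own choice.

```python
def sample_acf(xs, max_lag):
    """Return [r_0, r_1, ..., r_max_lag], the sample autocorrelations of xs."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    denom = sum(d * d for d in dev)  # sum over t = 1..n of (x_t - xbar)^2
    acf = []
    for k in range(max_lag + 1):
        # sum over t = k+1..n of (x_t - xbar)(x_{t-k} - xbar), in 0-based indices
        num = sum(dev[t] * dev[t - k] for t in range(k, n))
        acf.append(num / denom)
    return acf


print(sample_acf([1, 2, 3, 4, 5], 2))  # [1.0, 0.4, -0.1]
```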

If the series were actually MA(q), its theoretical correlogram would "cut off" (i.e., take the value zero) for k > q. Thus, we would expect the sample correlogram to have a similar (though not identical) shape to the theoretical correlogram, and therefore to stay reasonably close to zero for k > q. Reversing this reasoning, we get the following rule: If the correlogram seems to cut off for k > q, then the appropriate model is MA(q).
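A small simulation illustrates the cutoff rule. The sketch below, which assumes standard normal innovations and uses an arbitrary seed, generates an MA(1) series $x_t = \varepsilon_t + b\varepsilon_{t-1}$ with b = 0.6, whose theoretical autocorrelations are $\rho_1 = b/(1+b^2) \approx 0.44$ and $\rho_k = 0$ for k > 1. The ±2/√n band printed alongside is only a rough white-noise rule of thumb for judging "reasonably close to zero"; it is an assumption of this sketch, not part of the rule stated above.

```python
import random


def sample_acf(xs, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (the r_k formula in the text)."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def simulate_ma1(b, n, seed=0):
    """Simulate x_t = e_t + b*e_{t-1} with standard normal innovations e_t."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [e[t] + b * e[t - 1] for t in range(1, n + 1)]


b, n = 0.6, 5000
r = sample_acf(simulate_ma1(b, n), 6)
band = 2 / n ** 0.5  # rough +/- 2/sqrt(n) band for "close to zero"

print("theoretical rho_1 =", round(b / (1 + b * b), 3))
for k in range(1, 7):
    status = "outside" if abs(r[k]) > band else "inside"
    print(f"lag {k}: r_k = {r[k]:+.3f} ({status} band)")
```

Because $r_k$ is noisy, an occasional lag beyond q may fall slightly outside such a band even for a true MA(q) series; the pattern across lags matters more than any single value.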

For AR(p) models, the autocorrelations $\rho_k$ are approximately (for large enough k) $\rho_k = A\lambda^k$, where $|\lambda| < 1$. Thus, for k large (say k ≥ p), the correlogram would be expected to decline steadily (if $\lambda > 0$) or be bounded by a pair of declining curves (if $\lambda < 0$). This pattern of decline can often be distinguished from the "cutoff" described earlier, and should be taken as evidence that the correct model is not MA. To actually identify an AR model, however, we need a diagram which will have a more distinctive shape when the series is actually AR. The partial correlogram is such a diagram.
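For an AR(1) model $x_t = a x_{t-1} + \varepsilon_t$ the approximation is exact, with A = 1 and λ = a, so $\rho_k = a^k$. The sketch below (standard normal innovations, an arbitrary seed, and a burn-in period to forget the zero starting value, all assumptions of the sketch) compares sample autocorrelations with $a^k$ for a = 0.7, where the decline is steady, and for a = −0.7, where the signs alternate between the declining curves $\pm 0.7^k$.

```python
import random


def sample_acf(xs, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (the r_k formula in the text)."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def simulate_ar1(a, n, seed=0, burn_in=500):
    """Simulate x_t = a*x_{t-1} + e_t, discarding the first burn_in values."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = a * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out


results = {}
for a in (0.7, -0.7):
    r = sample_acf(simulate_ar1(a, 5000), 5)
    results[a] = r
    print(f"a = {a:+}")
    for k in range(1, 6):
        print(f"  lag {k}: r_k = {r[k]:+.3f}   a^k = {a ** k:+.3f}")
```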

To define partial correlations, suppose we fit an AR(k) model to our data:

$$x_t = a_{k1} x_{t-1} + a_{k2} x_{t-2} + \cdots + a_{kk} x_{t-k} + \varepsilon_t.$$

Then $a_{kk}$ is the estimate of the coefficient of $x_{t-k}$ when a k'th order AR is fitted. Rewriting this as

$$x_t - [a_{k1} x_{t-1} + \cdots + a_{k(k-1)} x_{t-(k-1)}] = a_{kk} x_{t-k} + \varepsilon_t,$$

we see that $a_{kk}$ is a plausible estimate of the correlation between $x_{t-k}$ and that part of $x_t$ which cannot be forecast from $x_{t-1}, \ldots, x_{t-(k-1)}$. $a_{kk}$ is called the partial correlation between $x_t$ and $x_{t-k}$. It is the estimated correlation between $x_t$ and $x_{t-k}$ after the effects of all intermediate x's on this correlation are taken out.

If the series is actually AR(p), then the theoretical partial correlations $a_{kk}$ will be zero for k > p, since an AR(p) model is also an AR(k) model for any k > p, with zero coefficients beyond lag p. Thus, we can use the partial correlogram (i.e., a plot of the estimated partial correlation coefficients against k) to identify AR models: If the partial correlogram cuts off for k > p, then the appropriate model is AR(p).
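The partial correlations can be computed from the autocorrelations without refitting each AR(k) from scratch. The sketch below uses the Durbin-Levinson recursion, a standard technique substituted here for brute-force refitting: it solves the Yule-Walker equations (discussed under estimation) for k = 1, 2, ... in turn and records the last coefficient $a_{kk}$ of each fit. Feeding it theoretical autocorrelations gives the theoretical partial correlogram; feeding it sample autocorrelations $r_k$ gives a sample partial correlogram, under the assumption that each AR(k) is fitted by Yule-Walker. The AR(2) parameters in the example are arbitrary.

```python
def pacf_from_acf(rho):
    """Partial correlations [a_11, a_22, ..., a_mm] from rho = [1, rho_1, ..., rho_m].

    Durbin-Levinson recursion: phi holds a_{k1}, ..., a_{kk} for the current k.
    """
    phi, pacf = [], []
    for k in range(1, len(rho)):
        num = rho[k] - sum(phi[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * rho[j + 1] for j in range(k - 1))
        akk = num / den
        # a_{kj} = a_{k-1,j} - a_kk * a_{k-1,k-j} for j = 1, ..., k-1
        phi = [phi[j] - akk * phi[k - 2 - j] for j in range(k - 1)] + [akk]
        pacf.append(akk)
    return pacf


def ar2_acf(a1, a2, m):
    """Theoretical autocorrelations rho_0..rho_m of x_t = a1 x_{t-1} + a2 x_{t-2} + e_t."""
    rho = [1.0, a1 / (1.0 - a2)]
    for k in range(2, m + 1):
        rho.append(a1 * rho[k - 1] + a2 * rho[k - 2])
    return rho


pacf = pacf_from_acf(ar2_acf(0.5, 0.3, 6))
for k, value in enumerate(pacf, start=1):
    print(f"a_{k}{k} = {value:+.4f}")  # nonzero for k <= 2, zero (to rounding) after
```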

There is an interesting duality (symmetry) between the properties of the correlogram and partial correlogram for pure AR and pure MA models. The behavior of a given diagram for a given model type is the same as the behavior of the other diagram for the other model type. We have already seen some evidence of this: The correlogram for an MA model and the partial correlogram for an AR model both cut off. As we know, the correlogram for an AR model dies down (but does not cut off). It can be shown that the partial correlogram for an MA model dies down as well.
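The dying down of the MA partial correlogram can be checked numerically. For the MA(1) model $x_t = \varepsilon_t + b\varepsilon_{t-1}$ the theoretical autocorrelations are $\rho_1 = b/(1+b^2)$ and $\rho_k = 0$ for k ≥ 2, and a known closed form for its partial correlations is $a_{kk} = -(-b)^k(1-b^2)/(1-b^{2(k+1)})$. The sketch below computes the partial correlations with the Durbin-Levinson recursion (a standard way to obtain them from autocorrelations), with b = 0.6 chosen arbitrarily, and shows that they never vanish but shrink geometrically, alternating in sign when b > 0.

```python
def pacf_from_acf(rho):
    """Partial correlations [a_11, ..., a_mm] from rho = [1, rho_1, ..., rho_m] (Durbin-Levinson)."""
    phi, pacf = [], []
    for k in range(1, len(rho)):
        num = rho[k] - sum(phi[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * rho[j + 1] for j in range(k - 1))
        akk = num / den
        phi = [phi[j] - akk * phi[k - 2 - j] for j in range(k - 1)] + [akk]
        pacf.append(akk)
    return pacf


b, m = 0.6, 8
rho = [1.0, b / (1 + b * b)] + [0.0] * (m - 1)  # MA(1): correlogram cuts off after lag 1
pacf = pacf_from_acf(rho)

for k, value in enumerate(pacf, start=1):
    closed_form = -((-b) ** k) * (1 - b * b) / (1 - b ** (2 * (k + 1)))
    print(f"k = {k}: a_kk = {value:+.5f}   closed form = {closed_form:+.5f}")
```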


A still unanswered question is how we can identify a mixed ARMA model. In this case, it can be shown that the correlogram and partial correlogram both die down (but do not cut off). Thus, if both diagrams die down, we can conclude that the appropriate model is ARMA. Unfortunately, though, in this case the diagrams do not help us to decide on the order (p, q) of the mixed model.

The following table summarizes the behavior of the diagrams.

Behavior of Correlogram and Partial Correlogram for Various Models

Model    Correlogram    Partial Correlogram
AR       Dies Down      Cuts Off
MA       Cuts Off       Dies Down
ARMA     Dies Down      Dies Down

After examining the correlogram and partial correlogram in the light of the properties described above, we should be able to select a few models which seem appropriate. (Unfortunately, the observed patterns are often not so clear as to point unambiguously to a single model.) Another guiding principle in model identification is that of parsimony: The total number of parameters in the model should be as small as possible (e.g., 3 or less, in the view of Box and Jenkins), subject to the restriction that the model provide an adequate description of the data. If two models appear to fit the data equally well, the one with fewer parameters should be preferred. Indeed, in this case the one with fewer parameters will typically produce better forecasts. One reason is that we can obtain more precise (stable) parameter estimates if the number of parameters is small.

Besides facilitating the identification of models for stationary series, the correlogram can also be used to diagnose nonstationarity. If a series is nonstationary (and needs to be differenced to produce a stationary series), then its autocorrelations will typically remain close to 1 over many lags k. Thus, if the estimated correlogram fails to die down (or dies down very slowly), then the series should be differenced. If the estimated correlogram for the differenced series still fails to die down, then the series should be differenced once more. Note, however, that economic series typically need to be differenced only once. If the series needs to be differenced d times before an ARMA(p,q) model can be identified, the original series is said to be an integrated mixed autoregressive-moving average series, denoted ARIMA(p,d,q).
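The slowly dying correlogram of a series that needs differencing is easy to reproduce. The sketch below assumes a random walk $x_t = x_{t-1} + \varepsilon_t$ (the simplest series that must be differenced once, i.e. an ARIMA(0,1,0)) with standard normal steps and an arbitrary seed. Its sample autocorrelations stay near 1 over the first ten lags, while those of the differenced series, which is white noise, are near zero.

```python
import random


def sample_acf(xs, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (the r_k formula in the text)."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def difference(xs):
    """First differences x_t - x_{t-1}."""
    return [xs[t] - xs[t - 1] for t in range(1, len(xs))]


rng = random.Random(1)
walk, level = [], 0.0
for _ in range(2000):
    level += rng.gauss(0.0, 1.0)  # random walk: x_t = x_{t-1} + e_t
    walk.append(level)

r_walk = sample_acf(walk, 10)
r_diff = sample_acf(difference(walk), 10)
for k in range(1, 11):
    print(f"lag {k:2d}: original {r_walk[k]:+.3f}   differenced {r_diff[k]:+.3f}")
```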

The model identification method just described is the one advocated by Box and Jenkins, and by Granger (among others). Its usefulness has been amply demonstrated on actual data, economic and otherwise. It is the method that we will use in this course. The method does have some serious drawbacks, however: It is not entirely objective, its implementation requires careful examination of the data by a knowledgeable and experienced analyst, and it may fail to identify a model unambiguously. Since the publication of Box-Jenkins and Granger, several objective methods have been proposed and tested. These methods automatically select a model without any intervention from the user. Although there is no universal agreement as to the superiority of the objective methods compared to the Box-Jenkins method, the potential advantages of a high-quality automated method are quite strong. Still, if an experienced analyst is available, considerable insight may be gained through examination of the correlogram and partial correlogram, even if an automated method is ultimately used. We will discuss the new methods more fully if time permits.

Estimation

In the last section, we described ways of choosing an appropriate model. Strictly speaking, however, "model identification" consists merely of selecting the form of the model, but not the numerical values of its parameters. Suppose, for example, we have decided to fit an AR(1) model $x_t = a x_{t-1} + \varepsilon_t$. Since the value of the parameter a is not known, it must somehow be estimated from the data. Here, we describe methods of estimating the parameters of ARMA models.

For pure AR models, there exist simple estimation techniques, since there is a linear relationship between the autocorrelations and the AR parameters. This relationship can be inverted, and then the theoretical autocorrelations can be replaced by their estimates, to yield estimates of the AR parameters. In the AR(1) case, for example, we know that $\rho_1 = a$. Thus, we may estimate a by $\hat{a} = r_1$. In general, for the AR(p) model

$$x_t = a_1 x_{t-1} + a_2 x_{t-2} + \cdots + a_p x_{t-p} + \varepsilon_t,$$

we obtain a system of linear equations called the Yule-Walker equations by multiplying both sides by $x_{t-k}$ (k = 1, ..., p), taking expectations and then normalizing (dividing by the variance of $x_t$). The k'th equation in the system is

$$\rho_k = a_1 \rho_{k-1} + a_2 \rho_{k-2} + \cdots + a_p \rho_{k-p},$$

where $\rho_0 = 1$ and $\rho_{-j} = \rho_j$. The estimates $\hat{a}_1, \ldots, \hat{a}_p$ of the AR parameters are obtained by solving this linear system, thereby obtaining a formula for $a_1, \ldots, a_p$ in terms of $\rho_1, \ldots, \rho_p$, and then replacing $\rho_1, \ldots, \rho_p$ by their estimates $r_1, \ldots, r_p$ in this formula. This procedure is equivalent to solving the system

$$r_k = \hat{a}_1 r_{k-1} + \hat{a}_2 r_{k-2} + \cdots + \hat{a}_p r_{k-p} \quad (k = 1, \ldots, p)$$

for $\hat{a}_1, \ldots, \hat{a}_p$. The resulting values are called the Yule-Walker estimates. It can be shown that the Yule-Walker estimated AR parameters always correspond to a stationary AR model.
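The Yule-Walker procedure can be sketched as follows: build the p × p system $r_k = \sum_j \hat a_j r_{|k-j|}$ (using $r_0 = 1$ and $r_{-j} = r_j$) and solve it. The helper names, the small Gaussian-elimination solver, and the simulated AR(2) example ($a_1 = 0.5$, $a_2 = 0.3$, standard normal innovations, arbitrary seed) are all assumptions of this sketch; in practice a library linear solver would replace `solve_linear`.

```python
import random


def sample_acf(xs, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (the r_k formula in the text)."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def solve_linear(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [list(row) + [y[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def yule_walker(r, p):
    """Yule-Walker estimates [a_1, ..., a_p] from r = [r_0 = 1, r_1, ..., r_p]."""
    R = [[r[abs(k - j)] for j in range(p)] for k in range(p)]  # entries r_{|k-j|}
    return solve_linear(R, [r[k] for k in range(1, p + 1)])


# Simulate x_t = 0.5 x_{t-1} + 0.3 x_{t-2} + e_t (after a burn-in) and re-estimate.
rng = random.Random(2)
x, burn_in, n = [0.0, 0.0], 500, 5000
for _ in range(burn_in + n):
    x.append(0.5 * x[-1] + 0.3 * x[-2] + rng.gauss(0.0, 1.0))
series = x[-n:]

a_hat = yule_walker(sample_acf(series, 2), 2)
print("Yule-Walker estimates:", [round(v, 3) for v in a_hat])
```

For p = 1 the system reduces to $\hat a = r_1$, matching the AR(1) case in the text.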

The situation for MA models is considerably more complicated. The theoretical relationship between the parameters and autocorrelations is not linear. For example, for the MA(1) model $x_t = \varepsilon_t + b\varepsilon_{t-1}$, we have

$$\rho_1 = \frac{b}{1 + b^2}.$$

In this case, we get a quadratic equation for b, namely $\rho_1 b^2 + (-1)b + \rho_1 = 0$, which (for $\rho_1 \neq 0$) has the two solutions

$$b = \frac{1 \pm \sqrt{1 - 4\rho_1^2}}{2\rho_1}.$$

It can be shown that $|\rho_1| \le 0.5$ for any MA(1) model, so the solutions will both be real. The corresponding estimates of b are

$$\hat{b} = \frac{1 \pm \sqrt{1 - 4r_1^2}}{2r_1},$$

and two problems arise here. First, there is no guarantee that $1 - 4r_1^2 > 0$, since the sample value $r_1$ can exceed 0.5 in magnitude even when $\rho_1$ does not. Second, how do we decide which of the two solutions to use?
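The moment estimator for the MA(1) parameter is a direct use of the quadratic formula. The sketch below (function name ours) returns both real solutions, an empty list when $1 - 4r_1^2 < 0$ so that no real solution exists, and the single solution b = 0 in the degenerate case $r_1 = 0$, which the formula does not cover. With b = 0.6, $\rho_1 = 0.6/1.36$, and the two solutions come out as 0.6 and 1/0.6, reciprocals of each other.

```python
import math


def ma1_moment_solutions(r1):
    """Real solutions b of r1*b**2 - b + r1 = 0, i.e. of r1 = b / (1 + b**2)."""
    if r1 == 0:
        return [0.0]                 # the equation reduces to -b = 0
    disc = 1 - 4 * r1 * r1
    if disc < 0:
        return []                    # |r1| > 0.5: no real MA(1) matches this r1
    root = math.sqrt(disc)
    return [(1 + root) / (2 * r1), (1 - root) / (2 * r1)]


print(ma1_moment_solutions(0.6 / 1.36))   # approximately [1.667, 0.6]
print(ma1_moment_solutions(0.7))          # [] because 1 - 4 * 0.49 < 0
```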

To answer this second question we must define invertibility. An MA model is said to be invertible if it can be represented as (i.e., "inverted to") a stationary infinite-order autoregression, AR(∞). Consider, for example, the MA(1) model $x_t = \varepsilon_t + b\varepsilon_{t-1}$. If we consider this as a difference equation for $\varepsilon_t$ and solve it by repeated substitution, we obtain the solution

$$\varepsilon_t = x_t - b x_{t-1} + b^2 x_{t-2} - \cdots + (-b)^k x_{t-k} + \cdots.$$

If $|b| > 1$, the weights $(-b)^k$ grow without bound, so an explosive series results and the current $\varepsilon_t$ cannot be estimated from current and past values of x. Thus, to be useful for forecasting, the MA model must be invertible. For the MA(q) model $x_t = \varepsilon_t + b_1\varepsilon_{t-1} + \cdots + b_q\varepsilon_{t-q}$, the invertibility condition is that the root of largest magnitude of the equation $z^q + b_1 z^{q-1} + \cdots + b_q = 0$ should have magnitude less than one.

Returning now to the issue of which solution to choose for b in the MA(1) case, it can be shown that of the two possible solutions, only one gives an invertible model: the product of the two roots of the quadratic is $\rho_1/\rho_1 = 1$, so the solutions are of the form b and 1/b, and when $|\rho_1| < 0.5$ exactly one of them has magnitude less than one. Estimation for MA(q) models proceeds similarly. From the expressions for $\rho_1, \ldots, \rho_q$, we obtain a system of nonlinear equations for the parameters $b_1, \ldots, b_q$. This system will have many solutions, but only one will give an invertible model. Computer programs for fitting MA models will always choose this invertible model.
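Why the invertible solution is the useful one can be seen by running the inversion recursion $\varepsilon_t = x_t - b\varepsilon_{t-1}$ on simulated data. The sketch below assumes an MA(1) series with b = 0.6, standard normal innovations, and an arbitrary seed, and starts the recursion with the unknown $\varepsilon_0$ set to 0 (a convention chosen for the sketch). The error in the recovered innovations is multiplied by −b at every step, so it dies out for the invertible value b = 0.6 and explodes for the other moment solution 1/0.6. The hypothetical helper `invertible_solution` simply keeps the solution with |b| < 1.

```python
import random


def invertible_solution(solutions):
    """Return the unique solution with |b| < 1, or None if there is not exactly one."""
    inside = [b for b in solutions if abs(b) < 1]
    return inside[0] if len(inside) == 1 else None


def recover_innovations(xs, b):
    """Invert x_t = e_t + b*e_{t-1} recursively: e_t = x_t - b*e_{t-1}, with e_0 taken as 0."""
    e_prev, out = 0.0, []
    for x in xs:
        e_prev = x - b * e_prev
        out.append(e_prev)
    return out


b_true, n = 0.6, 100
rng = random.Random(3)
e = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]         # e_0, ..., e_n
x = [e[t] + b_true * e[t - 1] for t in range(1, n + 1)]  # x_1, ..., x_n

b_hat = invertible_solution([1 / b_true, b_true])
good = recover_innovations(x, b_hat)
bad = recover_innovations(x, 1 / b_true)

print("chosen b:", b_hat)
print("final error, invertible b:   ", abs(good[-1] - e[-1]))
print("final error, noninvertible b:", abs(bad[-1] - e[-1]))
```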

Estimation for mixed ARMA models proceeds by nonlinear methods. The programs used will

always choose a stationary, invertible model.

The methods just described are those given in Granger. All of them exploit the connection between the autocorrelations and the parameters. In fact, there exist many other estimation techniques, including the very popular maximum likelihood method. This method assumes that the innovations are normally distributed, and then exploits this assumption as fully as possible. Another popular method is least squares, in which the sum of squared errors of the fitted model (i.e., the sum of squares of the estimated innovations) is made as small as possible. Assuming normal innovations, the maximum likelihood and least squares methods are generally superior to the method described in Granger, particularly when the model is near the nonstationarity boundary (i.e., when the largest root of the stationarity equation has magnitude close to 1).
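Least squares can be sketched for the MA(1) case. The version below is conditional least squares, a common simplification adopted here: the innovations are reconstructed with the recursion $\hat\varepsilon_t = x_t - b\hat\varepsilon_{t-1}$ starting from $\hat\varepsilon_0 = 0$, and b is chosen from a grid on the invertible region (−1, 1) to minimize $\sum_t \hat\varepsilon_t^2$. A coarse grid search stands in for the numerical optimizer a real program would use; the series is assumed to have mean zero, and the simulated data (b = 0.6, n = 3000, arbitrary seed) are illustrative only.

```python
import random


def conditional_sum_of_squares(xs, b):
    """Sum of squared innovations from e_t = x_t - b*e_{t-1}, with e_0 set to 0."""
    e_prev, total = 0.0, 0.0
    for x in xs:
        e_prev = x - b * e_prev
        total += e_prev * e_prev
    return total


def fit_ma1_least_squares(xs, step=0.01):
    """Grid-search the invertible region (-1, 1) for the b minimizing the sum of squares."""
    count = int(round(1.98 / step)) + 1
    grid = [-0.99 + i * step for i in range(count)]   # -0.99, ..., 0.99
    return min(grid, key=lambda b: conditional_sum_of_squares(xs, b))


rng = random.Random(4)
e = [rng.gauss(0.0, 1.0) for _ in range(3001)]
x = [e[t] + 0.6 * e[t - 1] for t in range(1, 3001)]   # MA(1) with b = 0.6

b_hat = fit_ma1_least_squares(x)
print("least-squares estimate of b:", round(b_hat, 2))
```

Maximum likelihood under normal innovations differs mainly in how it treats the unknown starting values; for long series the two estimates are usually close.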

Diagnostic Checking

Once a model has been identified and estimated, it is usually taken to be the true model and forecasts are obtained accordingly. As mentioned earlier, it is virtually certain that the estimated model is not the true model. To protect against disastrous forecasting errors, the least we can do is to check that the fitted model is a satisfactory one. This is done by the use of diagnostic checks. If we had a large amount of data, it would be feasible to break the data into two parts, identify and estimate the model on the first part, and check the quality of the forecasts on the second part. This method, known as cross-validation, gives one of the few ways of obtaining an honest estimate of forecasting error. Unfortunately, there is typically not enough data for cross-validation to be used, so models are identified, estimated, and diagnostically checked on the same data set. The most commonly used check is to examine the correlogram of the residuals from the fitted model to see whether the residuals are white noise (as they should be, if the model is correct). For example, the Box-Pierce test statistic is based on the sum of squares of the residual autocorrelations. If this test statistic exceeds some critical value (found in a table), then the model in question is declared to be inadequate. Unfortunately, this test is not very likely to flag inadequately fitting models (it has low power). Furthermore, even if a model is not found to be inadequate, the method provides no assessment of the probable contribution to forecast error due to the identification and estimation stages, and due to the difference between the identified and actual models.
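The Box-Pierce statistic is $Q = n\sum_{k=1}^{m} r_k^2$, where the $r_k$ are the sample autocorrelations of the n residuals and m is a chosen number of lags; if the fitted ARMA(p,q) model is adequate, Q is approximately chi-square with m − p − q degrees of freedom. The sketch below computes Q and compares it with a critical value supplied by the user, as one would read it from a table. The residuals are hypothetical: simulated white noise standing in for the residuals of a fitted AR(2), with m = 10, so there are 8 degrees of freedom and the 5% critical value is 15.507.

```python
import random


def sample_acf(xs, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (the r_k formula in the text)."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def box_pierce(residuals, m):
    """Box-Pierce statistic Q = n * (r_1^2 + ... + r_m^2) of the residual autocorrelations."""
    r = sample_acf(residuals, m)
    return len(residuals) * sum(v * v for v in r[1:])


def declared_inadequate(residuals, m, critical_value):
    """True if Q exceeds the tabulated critical value (chi-square, m - p - q df)."""
    return box_pierce(residuals, m) > critical_value


rng = random.Random(5)
residuals = [rng.gauss(0.0, 1.0) for _ in range(500)]  # hypothetical residuals
q_stat = box_pierce(residuals, 10)
print(f"Q = {q_stat:.2f}, inadequate at 5%: {declared_inadequate(residuals, 10, 15.507)}")
```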
