
Automatica, Vol. 19, No. 4, pp. 447-448, 1983. Printed in Great Britain.

0005-1098/83 $3.00 + 0.00. Pergamon Press Ltd.

© 1983 International Federation of Automatic Control

Technical Communique

Linear Estimation of ARMA Processes*

E. J. HANNAN† and L. KAVALIERIS†

* Received 11 November 1982. This paper was recommended for publication by editor A. H. Levis.
† Department of Statistics, I.A.S., Australian National University, P.O. Box 4, Canberra City, A.C.T. 2601, Australia.

Key Words--Autoregressive-moving average; parameter estimation; almost sure convergence; martingale difference.

Abstract--The method of estimating ARMA parameters first discussed by Mayne and Firoozan is investigated. It is shown that if, at the first stage, the order of the fitted autoregression is allowed to depend on the number of time points, in a reasonable manner, then it will still be true that the final estimate of the parameter vector will converge, almost surely, to the true value. This is to be compared to the result in the original paper where the order is fixed and it is shown that, as the sample size increases, the estimate converges to a value which, if the order of the autoregression is high enough, will be arbitrarily near to the true value. Some comments are made on other extensions, on the law of the iterated logarithm, on the central limit theorem and on the choice of the order of the fitted autoregression. The innovation sequence need not be Gaussian and for the convergence result only a natural condition relating to prediction need be imposed.

1. Introduction

WE FOLLOW Mayne and Firoozan (1982) (which will henceforth be referred to as MF) and consider the ARMA system

$A^0(q^{-1})y_k = C^0(q^{-1})e_k, \qquad E(e_k) = 0, \quad E(e_j e_k) = \delta_{jk}\sigma^2.$ (1)

Here $y_k$ is observed and

$A^0(z) = 1 + a_1^0 z + a_2^0 z^2 + \cdots + a_n^0 z^n,$

$C^0(z) = 1 + c_1^0 z + \cdots + c_n^0 z^n.$

It is assumed that $A^0(z)$, $C^0(z)$ are coprime polynomials with all zeros in $|z| > 1$. It is also assumed that either $a_n^0$ or $c_n^0$ is not zero. This last requirement is not explicitly stated in MF but is needed for their theorems. These authors discuss a method whose first three steps were originally suggested by Durbin (1960), as they point out. In the first step a polynomial $\hat\alpha(z) = \sum_{j=0}^p \hat\alpha_j z^j$, $\hat\alpha_0 = 1$, of degree $p$ is determined so as to minimize

$\frac{1}{N}\sum_{k=1}^{N} \{\alpha(q^{-1})y_k\}^2, \qquad \alpha_0 = 1, \quad y_k = 0, \ k \le 0.$ (2)

We call $\hat\sigma_p^2$ the minimized value of (2). Then $\hat e_k = \hat\alpha(q^{-1})y_k$ is formed, with (say) initial values zero, and at the third step

$\frac{1}{N}\sum_{k=1}^{N} \{A(q^{-1})y_k - C(q^{-1})\hat e_k\}^2$ (3)

is minimized with respect to $a_1, \ldots, a_n, c_1, \ldots, c_n$ to obtain $\tilde A_1(z)$, $\tilde C_1(z)$. We shall call $\hat\sigma_n^2$ the minimized value of (3). New sequences $\tilde y_k$, $\tilde e_k$ are formed satisfying $\tilde C_1(q^{-1})\tilde y_k = y_k$, $\tilde C_1(q^{-1})\tilde e_k = \hat e_k$. Then (3) is minimized again with $\tilde y_k$, $\tilde e_k$ replacing $y_k$, $\hat e_k$ to obtain $\tilde A_2(z)$, $\tilde C_2(z)$, which are the final estimates. The last step may be iterated, as MF indicate. Call $\theta^0$ the vector of system parameters, $a_1^0, \ldots, a_n^0, c_1^0, \ldots, c_n^0$, and $\hat\theta = \hat\theta(N, p)$ the estimate obtained by the above procedure.



MF show that $\hat\theta(N, p) \to \bar\theta(p)$, almost surely (a.s.), as $N \to \infty$, and $\bar\theta(p) \to \theta^0$ as $p \to \infty$. This is, of course, a result that indicates the usefulness of the method. In practice, however, $p$ is likely to be chosen in relation to $N$, so that it would be preferable to consider $\hat\theta(N, p(N)) = \hat\theta(N)$, let us say, and to show that, as $N$ increases, $\hat\theta(N) \to \theta^0$, a.s., for a suitable range of functions $p(N)$. For example $p(N)$ could be chosen to minimize

$\mathrm{BIC}(p) = \log \hat\sigma_p^2 + p \log N / N, \qquad 0 < p < U(N)$ (4)

where the upper limit, $U(N)$, is suitably chosen.
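As a rough illustration (our code, reusing the hypothetical fit_long_ar sketch above), minimizing (4) amounts to a scan over candidate orders:

```python
import numpy as np

def select_p(y, U):
    """Choose p minimizing BIC(p) = log(sigma2_p) + p log(N)/N, p = 1..U."""
    N = len(y)
    bic = [np.log(fit_long_ar(y, p)[1]) + p * np.log(N) / N
           for p in range(1, U + 1)]
    return 1 + int(np.argmin(bic))

# e.g. with U(N) = (log N)^b, b = 2:
# p_hat = select_p(y, U=int(np.log(len(y)) ** 2))
```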

In this connection we mention a result from Hannan and Kavalieris (1983). To state that result we need some conditions on the $e_k$, which will also be used later. Rather than impose a normal distribution on them it is more generally required that

$E\{e_k \mid F_{k-1}\} = 0$, a.s. (5)

where $F_k$ is the $\sigma$-algebra determined by $e_j$ (or equivalently $y_j$), $j \le k$. The condition (5) is 'natural' in that, for (1), it is equivalent to the statement that the best linear predictor is the best predictor, in the least squares sense. In addition to (5) we need

$E\{e_k^4\} < \infty, \qquad E\{e_k^2 \mid F_{-\infty}\} = \sigma^2$, a.s. (6)

Both of these are fairly innocuous, the second because it merely says that $e_k^2$ is purely nondeterministic and is not influenced by the infinitely far past. Of course (5) and (6) hold if $e_k$ is Gaussian. Let $\rho_0$ be the modulus of a zero of $C^0(z)$ nearest to $|z| = 1$. Thus $\rho_0 > 1$. If $U(N)$ in (4) is $(\log N)^b$, $b < \infty$, and $p(N)$ minimizes (4), then $p(N)/\{\log N/(2\log\rho_0)\} \to 1$, a.s. [Of course $U(N)$ may be much too large, for $b$ large, for practical considerations, but is needed in (4) for the truth of the result.] However, (4) often gives $p(N)$ values that are too small for the values of $N$ met in practice and in particular are much smaller than $\log N/(2\log\rho_0)$ (see Hannan and Kavalieris, 1983). In general we may consider a sequence

$p(N) \to \infty$, a.s.; $\qquad p(N) = O\{(N/\log N)^{1/2}\}$, a.s. (7)

The upper limit here is much too large for practical use and is introduced only because it is the largest value that makes the following result true.

Theorem. If $y_k$ is generated by (1) subject to the conditions stated below that equation, and (5)-(7) hold, then $\hat\theta(N) \to \theta^0$, a.s.

The proof of this theorem will be given in Section 2. Here we mention two further results without proof. Let us assume

$d \log N < \liminf_{N\to\infty} p(N), \qquad \limsup_{N\to\infty} p(N) < (\log N)^b, \quad b < \infty$ (8)

where $d > (2\log\rho_0)^{-1}$. The lower limit can be arrived at, a.s., by choosing $p_1(N)$ to minimize (4) and putting $p(N) = c\,p_1(N)$, $c > 1$. Then

$|\hat\theta(N) - \theta^0| = O\{(\log\log N/N)^{1/2}\}$, a.s.

In MF a central limit theorem is proved, namely that $N^{1/2}\{\hat\theta(N, p) - \bar\theta(p)\}$ is asymptotically normal with mean zero and a covariance matrix that, as $p \to \infty$, converges to that of


the maximum likelihood estimator in its limiting distribution. For an analogous theorem in the present context the second part of (6) must be strengthened to

$E\{e_k^2 \mid F_{k-1}\} = \sigma^2$, a.s. (9)

Of course (9) implies the second part of (6), and (9) necessarily holds if the $e_k$ are Gaussian. Under (5)-(7) and (9), $N^{1/2}\{\hat\theta(N) - \theta^0\}$ has a distribution converging to the multivariate normal with zero mean and the covariance matrix of the maximum likelihood estimator on Gaussian assumptions.

To conclude this section we mention two further extensions. One would be to eliminate the assumption that $n$ is known. In Hannan and Rissanen (1982) this is discussed. [Theorems 1 and 2 there stated are false unless $\log T$ in (1.11) of that paper is replaced by $(\log T)^{1+\delta}$, $\delta > 0$.] The procedure there described differs slightly from that in MF but is essentially the same, save that calculations are recursive in $p$ and $n$ and both are determined from the data. Thus a criterion of the form

$\log \hat\sigma_n^2 + n(\log N)^{1+\delta}/N, \qquad n < (\log N)^a, \quad a < \infty$ (10)

where $\hat\sigma_n^2$ is the minimized value in (3), is used also to choose $n$. The theorems of this paper continue to hold.
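A sketch of this choice of $n$, in the same illustrative spirit (our code; sigma2_n recomputes the minimized value of (3) for each candidate $n$, and delta and n_max are user-supplied):

```python
import numpy as np

def sigma2_n(y, e, n):
    """Minimized value of (3) for a given n."""
    N = len(y)
    Y = np.column_stack([y[n - j:N - j] for j in range(1, n + 1)])
    E = np.column_stack([e[n - j:N - j] for j in range(1, n + 1)])
    W = np.hstack([Y, -E])
    theta, *_ = np.linalg.lstsq(W, -(y[n:] - e[n:]), rcond=None)
    r = (y[n:] - e[n:]) + W @ theta
    return np.mean(r ** 2)

def select_n(y, e, n_max, delta=0.1):
    """Choose n minimizing (10): log sigma2_n + n (log N)^(1+delta) / N."""
    N = len(y)
    pen = np.log(N) ** (1 + delta) / N
    crit = [np.log(sigma2_n(y, e, n)) + n * pen for n in range(1, n_max + 1)]
    return 1 + int(np.argmin(crit))
```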

A more relevant extension again would be to eliminate the assumption that (1) holds for the $y_k$ and merely to consider the problem of choosing $p$ and then $n$ so as to obtain a suitable approximation to the true system generating $y_k$. For example the purpose may be to attain an estimate that permits an optimally accurate prediction when the effects of errors of estimation in the functions $A(z)$, $C(z)$ are taken into account. The procedures of MF or of Hannan and Rissanen (1982) could still be used, of course, but now, in relation to the latter, it is conceivable that $\log N$ in (4) and $(\log N)^{1+\delta}$ in (10) might be replaced by smaller quantities. No doubt this topic will be further discussed in future.

2. Proof of the theorem

The coefficient vector $\hat\alpha = (\hat\alpha_1, \ldots, \hat\alpha_p)^T$ of the polynomial $\hat\alpha(z)$ minimizing (2) is given by

$\hat\alpha = -\hat R^{-1}\hat r$ (11)

where

$\hat r = (\hat r_1, \ldots, \hat r_p)^T, \qquad \hat R_{ij} = \hat r_{i-j}, \quad i, j = 1, \ldots, p, \qquad \hat r_i = \frac{1}{N}\sum_{k=1}^{N} y_k y_{k-i}.$

If we replace $\hat r_i$ in (11) by the true autocorrelations $r_i$ and define $r = (r_1, \ldots, r_p)^T$, $R_{ij} = r_{i-j}$, we obtain a coefficient vector $\bar\alpha = (\bar\alpha_1, \ldots, \bar\alpha_p)^T$. For convenience we shall write $Q(N)$ for $(\log N/N)^{1/2}$ in what follows. By some elementary manipulation we obtain

$\{I - R^{-1}(\hat R - R)\}(\hat\alpha - \bar\alpha) = -R^{-1}\{(\hat r - r) + (\hat R - R)\bar\alpha\}.$ (12)

Hannan and Kavalieris (1983) gives

$\sup_i |\hat r_i - r_i| = O(Q(N))$, a.s. (13)

The largest eigenvalue of $R^{-1}$ is bounded, because $C^0(z) \ne 0$ for $|z| \le 1$ ensures that the smallest eigenvalue of $R$ is bounded away from zero; then, using (13), $R^{-1}(\hat R - R)$ is a matrix with elements uniformly $O(Q(N))$. Because $p = o(1/Q(N))$, the matrix $I - R^{-1}(\hat R - R)$ has (in particular) its smallest eigenvalue bounded away from zero. Using (13), and noting that the coefficients $\bar\alpha_j$ decrease geometrically, $(\hat r - r) + (\hat R - R)\bar\alpha = O(Q(N))$ and we can conclude

$\sup_{1 \le j \le p} |\hat\alpha_j - \bar\alpha_j| = O(Q(N))$, a.s. (14)
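Since $\hat R$ is Toeplitz, (11) can be solved by a Levinson-type recursion in $O(p^2)$ operations rather than $O(p^3)$; a small illustrative implementation (ours, assuming scipy is available) is:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def long_ar_yule_walker(y, p):
    """alpha_hat = -R_hat^{-1} r_hat as in (11). The common scale of the
    sample second moments cancels, so covariances serve as well as
    correlations here."""
    N = len(y)
    g = np.array([y[i:] @ y[:N - i] / N for i in range(p + 1)])
    # first column and first row of the symmetric Toeplitz matrix R_hat
    return solve_toeplitz((g[:p], g[:p]), -g[1:])
```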

The vector $\hat\theta_1 = (\hat a_1, \ldots, \hat a_n, \hat c_1, \ldots, \hat c_n)^T$ minimizing (3) is determined by

$\hat\theta_1 = -\hat S^{-1}\hat s, \qquad \hat S = \frac{1}{N}\sum_{k=1}^{N} \hat w_k \hat w_k^T, \quad \hat s = \frac{1}{N}\sum_{k=1}^{N} (y_k - \hat e_k)\hat w_k$ (15)

where $\hat w_k$ is the vector of regressor variables $(y_{k-1}, \ldots, y_{k-n}, -\hat e_{k-1}, \ldots, -\hat e_{k-n})^T$. Define $w_k = (y_{k-1}, \ldots, y_{k-n}, -e_{k-1}, \ldots, -e_{k-n})^T$ and $S = E(w_k w_k^T)$, $s = E\{(y_k - e_k)w_k\}$. To prove strong consistency of the estimate $\hat\theta_1$ it is sufficient to prove

$\hat S = S + o(1), \qquad \hat s = s + o(1)$, a.s. (16)

since it is easily seen that $\theta^0 = -S^{-1}s$ and $S$ is nonsingular. We examine the elements of $\hat S$ and $\hat s$, which are of the form

$\frac{1}{N}\sum_{k=1}^{N} \hat e_{k-i} y_{k-j} = \hat\gamma_0 \sum_{s=0}^{p} \bar\alpha_s r_{j-i-s} + o(1)$, a.s. (17)

$\frac{1}{N}\sum_{k=1}^{N} \hat e_{k-i} \hat e_{k-j} = \hat\gamma_0 \sum_{s,t=0}^{p} \bar\alpha_s \bar\alpha_t r_{j-i+t-s} + o(1)$, a.s. (18)
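In code, (15) is simply the normal-equations form of the third-step regression; the following sketch (ours, building the regressor rows $\hat w_k^T$ directly) makes that explicit:

```python
import numpy as np

def theta1_normal_equations(y, e, n):
    """theta_1_hat = -S_hat^{-1} s_hat as in (15); agrees with the
    direct least-squares minimization of (3)."""
    N = len(y)
    Y = np.column_stack([y[n - j:N - j] for j in range(1, n + 1)])
    E = np.column_stack([e[n - j:N - j] for j in range(1, n + 1)])
    W = np.hstack([Y, -E])               # row k is w_hat_k^T
    S_hat = W.T @ W / N
    s_hat = W.T @ (y[n:] - e[n:]) / N
    return -np.linalg.solve(S_hat, s_hat)
```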

In going to the limiting quantities in (17) and (18) we have made use only of (13) and (14). As $y_k$ is ergodic, $\hat\gamma_0 = E(y_k^2) + o(1)$ and hence $\hat\gamma_0/E(y_k^2) = 1 + o(1)$, and we may replace terms such as $\hat\gamma_0 r_i$ in (17) and (18) by $E(y_k y_{k-i})$. If $\pi_k$, $k = 0, 1, 2, \ldots$, is defined by

$\sum_{k=0}^{\infty} \pi_k z^k = A^0(z)/C^0(z), \qquad \pi_0 = 1,$

then we have (Baxter, 1962)

$\sum_{k=1}^{p} |\bar\alpha_k - \pi_k| \le C \sum_{k=p+1}^{\infty} |\pi_k|.$

We may, therefore, replace $\bar\alpha_j$ by $\pi_j$ in (17) and (18) and replace the sums by infinite ones, introducing a discrepancy $o(p^{-1})$. Hence (17) and (18) become

$E\Big\{\Big(\sum_{s=0}^{\infty} \pi_s y_{k-i-s}\Big) y_{k-j}\Big\} + o(1) = E(e_{k-i} y_{k-j}) + o(1)$, a.s. (19)

$E\Big\{\Big(\sum_{s=0}^{\infty} \pi_s y_{k-i-s}\Big)\Big(\sum_{t=0}^{\infty} \pi_t y_{k-j-t}\Big)\Big\} + o(1) = E(e_{k-i} e_{k-j}) + o(1)$, a.s. (20)

Furthermore, from (13) and the fact that $\hat\gamma_0$ converges to $E(y_k^2)$ we have

$\hat\gamma_i = E(y_k y_{k-i}) + o(1)$, a.s.,

so we have proved (16). To prove strong consistency of the final estimate $\hat\theta(N)$ the same argument as above can be used, replacing $y_k$ and $\hat e_k$ by the filtered quantities $\tilde y_k$ and $\tilde e_k$ throughout, and in (19) and (20) replacing $e_k$ by $\bar e_k$, where $\tilde C_1(q^{-1})\bar e_k = e_k$. We omit the details since they are straightforward.

References

Baxter, G. (1962). An asymptotic result for the finite predictor. Math. Scand., 10, 137.
Durbin, J. (1960). The fitting of time-series models. Rev. Inst. Int. Statist., 28, 233.
Hannan, E. J. and J. Rissanen (1982). Recursive estimation of ARMA order. Biometrika, 69, 81.
Hannan, E. J. and L. Kavalieris (1983). The convergence of autocorrelations and autoregressions. Austral. J. Statist., 25.
Mayne, D. Q. and F. Firoozan (1982). Linear identification of ARMA processes. Automatica, 18, 461.