16. Serial Correlation

Hayashi pp. 365-412

Advanced Econometrics I, Autumn 2010, Serial Correlations 1

Introduction

The serial correlation discussion here extends our earlier GMM discussion.

The extension involves incorporating serially correlated moment conditions.

This, however, necessitates generalising the CLT to serially correlated processes.

The generalisation is possible under certain conditions that restrict the degree of serial correlation.

These conditions are most transparent for the class of stochastic processes called linear processes.


Introduction (cont’d)

Recall (from Ch. 1) the OLS assumptions, in particular:

(a) Strict exogeneity

$$E(\varepsilon_i \mid x) = 0 \quad (i = 1, 2, \ldots, n)$$

– in a time-series context, this assumption means that the error term is orthogonal to the past, current, and future regressors

– for most time-series models this condition is not satisfied

– the finite-sample theory based on strict exogeneity is therefore rarely applicable in time-series contexts

– (however, the estimator possesses good large-sample properties without strict exogeneity.)



– A first-order autoregressive process (AR(1)) is the clearest example of a violation of the strict exogeneity assumption:

$$y_i = \beta y_{i-1} + \varepsilon_i \quad (i = 1, 2, \ldots, n)$$

– suppose, as strict exogeneity requires, that the regressor for observation $i$, $y_{i-1}$, is orthogonal to the error term for $i$, so that $E(y_{i-1}\varepsilon_i) = 0$. Then

$$\begin{aligned}
E(y_i\varepsilon_i) &= E[(\beta y_{i-1} + \varepsilon_i)\varepsilon_i] \\
&= \beta E(y_{i-1}\varepsilon_i) + E(\varepsilon_i^2) \\
&= E(\varepsilon_i^2) \quad (\text{since } E(y_{i-1}\varepsilon_i) = 0 \text{ by hypothesis})
\end{aligned}$$

– ⇒ unless the error term is always zero, $E(y_i\varepsilon_i)$ is not zero

– but $y_i$ is the regressor for observation $i + 1$ ⇒ the regressor is not orthogonal to the past error term ⇒ violation of the strict exogeneity assumption
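The argument above can be checked numerically. The following is a minimal simulation sketch, assuming Gaussian errors and the illustrative values β = 0.5, σ = 1 (neither is specified in the notes); it compares the sample analogues of $E(y_{i-1}\varepsilon_i)$ and $E(y_i\varepsilon_i)$.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed so the run is reproducible
beta, sigma, n = 0.5, 1.0, 200_000    # illustrative values, not from the notes

eps = rng.normal(0.0, sigma, size=n)
y = np.empty(n)
y[0] = eps[0]                         # start-up value; its effect dies out quickly
for i in range(1, n):
    y[i] = beta * y[i - 1] + eps[i]

# Sample analogue of E(y_{i-1} eps_i): regressor vs. its own period's error
same_period = np.mean(y[:-1] * eps[1:])
# Sample analogue of E(y_i eps_i): next period's regressor vs. the past error
past_error = np.mean(y * eps)

print(f"E(y_(i-1) eps_i) ~ {same_period:.3f}")
print(f"E(y_i eps_i)     ~ {past_error:.3f}  (sigma^2 = {sigma**2})")
```

The first average should be close to zero and the second close to σ², in line with the derivation.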

(b) Spherical error variance

$$E(\varepsilon_i^2 \mid x) = \sigma^2 > 0 \quad (i = 1, 2, \ldots, n)$$

$$E(\varepsilon_i\varepsilon_j \mid x) = 0 \quad (i, j = 1, 2, \ldots, n;\ i \neq j)$$

– $E(\varepsilon_i\varepsilon_j \mid x) = 0$ ⇒ the covariance in the joint distribution of $(\varepsilon_i, \varepsilon_j)$ conditional on $x$ is zero

– in the context of time-series models, this states that there is no serial correlation in the error term.



Recall also (from Ch. 2) that:

- if the index of a sequence of random variables $\{z_i\}$ $(i = 1, 2, \ldots)$ represents time, $t$, the stochastic process is called a time series.

- a stochastic process $\{z_i\}$ $(i = 1, 2, \ldots)$ is (strictly) stationary if, for any given finite integer $r$ and for any set of subscripts $i_1, i_2, \ldots, i_r$, the joint distribution of $(z_i, z_{i_1}, z_{i_2}, \ldots, z_{i_r})$ depends only on $i_1 - i, i_2 - i, \ldots, i_r - i$ but not on $i$.

- a stochastic process $\{z_i\}$ is weakly (or covariance) stationary if: (i) $E(z_i)$ does not depend on $i$, and (ii) $\mathrm{Cov}(z_i, z_{i-j})$ exists, is finite, and depends only on $j$ but not on $i$.


Modelling Serial Correlation

A white noise process $\{\varepsilon_t\}$ is a zero-mean covariance-stationary process with no serial correlation:

$$E(\varepsilon_t) = 0, \qquad E(\varepsilon_t^2) = \sigma^2 > 0, \qquad E(\varepsilon_t\varepsilon_{t-j}) = 0 \ \text{ for } j \neq 0.$$

Linear processes: a very important class of covariance-stationary processes can be created by taking a moving average of a white noise process.

The current value of a linear process can depend on possibly infinitely many past values of a white noise process.


Modelling Serial Correlation (cont’d)

q-th order moving-average process (MA(q)): a process $\{y_t\}$ is called MA(q) if it can be written as a weighted average of the current and most recent $q$ values of a white noise process:

$$y_t = \mu + \theta_0\varepsilon_t + \theta_1\varepsilon_{t-1} + \ldots + \theta_q\varepsilon_{t-q} \quad \text{with } \theta_0 = 1.$$

Serial correlation in MA(q) processes dies out completely after $q$ lags.

Infinite-order moving-average process (MA(∞)): an MA(∞) process is one where $y_t$ depends on the infinite past:

$$y_t = \mu + \psi_0\varepsilon_t + \psi_1\varepsilon_{t-1} + \ldots = \mu + \sum_{j=0}^{\infty} \psi_j\varepsilon_{t-j},$$

where $\{\psi_j\}$ is a sequence of real numbers.



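Let $\{y_t\}$ be an ergodic-stationary time series with $E[y_t] = \mu$ and finite variance.

The Wold decomposition theorem implies that $y_t$ has the following representation:

$$y_t = \mu + \sum_{j=0}^{\infty} \psi_j\varepsilon_{t-j} = \mu + \varepsilon_t + \psi_1\varepsilon_{t-1} + \ldots$$

$$\psi_0 = 1, \qquad \sum_{j=0}^{\infty} \psi_j^2 < \infty, \qquad \varepsilon_t \sim \mathrm{MDS}(0, \sigma^2)$$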



According to the Wold representation:

- $y_t$ has a linear structure, hence the Wold representation is often called the linear representation of $y_t$

- $\psi = (\psi_0, \psi_1, \ldots)$ is the infinite vector of moving-average weights

- $\sum_{j=0}^{\infty} \psi_j^2 < \infty$ is called square-summability and controls the memory of the process

- square-summability ⇒ $|\psi_j| \to 0$ as $j \to \infty$ at a sufficiently fast rate.



Variance

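$$\begin{aligned}
\gamma_0 = \mathrm{var}(y_t) &= \mathrm{var}\left(\sum_{j=0}^{\infty} \psi_j\varepsilon_{t-j}\right) \\
&= \sum_{j=0}^{\infty} \psi_j^2\,\mathrm{var}(\varepsilon_t) \quad (\text{the } \varepsilon_{t-j} \text{ are serially uncorrelated}) \\
&= \sigma^2 \sum_{j=0}^{\infty} \psi_j^2 < \infty
\end{aligned}$$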



Autocovariances

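$$\begin{aligned}
\gamma_j &= E[(y_t - \mu)(y_{t-j} - \mu)] \\
&= E\left[\left(\sum_{k=0}^{\infty} \psi_k\varepsilon_{t-k}\right)\left(\sum_{h=0}^{\infty} \psi_h\varepsilon_{t-h-j}\right)\right] \\
&= E[(\psi_0\varepsilon_t + \psi_1\varepsilon_{t-1} + \ldots + \underbrace{\psi_j\varepsilon_{t-j}} + \ldots) \times (\underbrace{\psi_0\varepsilon_{t-j}} + \psi_1\varepsilon_{t-j-1} + \ldots)] \\
&= \sigma^2 \sum_{k=0}^{\infty} \psi_{j+k}\psi_k, \qquad j = 0, 1, 2, \ldots
\end{aligned}$$

Only products of terms with matching time subscripts (such as the two marked terms) have nonzero expectation.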
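The autocovariance formula $\gamma_j = \sigma^2\sum_k \psi_{j+k}\psi_k$ translates directly into a few lines of code. The sketch below assumes a finite list of weights (so for an MA(∞) process it is a truncation); the function name `autocovariances` and the numerical weights are illustrative, not from the notes.

```python
import numpy as np

def autocovariances(psi, sigma2, max_lag):
    """Return [gamma_0, ..., gamma_max_lag] for y_t = mu + sum_k psi[k] * eps_{t-k}.

    psi holds finitely many MA weights (psi[0] is the weight on eps_t); for an
    MA(infinity) process this is a truncation, so the result is approximate.
    """
    psi = np.asarray(psi, dtype=float)
    n = len(psi)
    gammas = []
    for j in range(max_lag + 1):
        if j < n:
            # gamma_j = sigma^2 * sum_k psi_{j+k} * psi_k
            gammas.append(sigma2 * np.dot(psi[j:], psi[: n - j]))
        else:
            gammas.append(0.0)   # no overlapping weights beyond lag n - 1
    return np.array(gammas)

# MA(2) with illustrative weights: autocovariances vanish beyond lag q = 2
print(autocovariances([1.0, 0.4, -0.3], sigma2=2.0, max_lag=4))
```

For an MA(q) input the returned values are exactly zero beyond lag q, which is the "dies out completely after q lags" property.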



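Ergodicity requires that the weights be absolutely summable:

$$\sum_{j=0}^{\infty} |\psi_j| < \infty$$

This is stronger than square-summability: $\sum_{j=0}^{\infty} |\psi_j| < \infty$ implies $\sum_{j=0}^{\infty} \psi_j^2 < \infty$, but not conversely.

Absolute summability of the weights also implies absolutely summable autocovariances, $\sum_{j=0}^{\infty} |\gamma_j| < \infty$.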



Example: MA(1) process

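$$y_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}, \quad |\theta| < 1, \quad \varepsilon_t \sim \mathrm{iid}(0, \sigma^2)$$

Then

$$\psi_1 = \theta, \qquad \psi_k = 0 \ \text{ for } k > 1$$

$$E[y_t] = \mu$$

$$\gamma_0 = E[(y_t - \mu)^2] = \sigma^2(1 + \theta^2)$$

$$\gamma_1 = E[(y_t - \mu)(y_{t-1} - \mu)] = \sigma^2\theta$$

$$\gamma_k = 0, \quad k > 1,$$

which shows that

$$\sum_{j=0}^{\infty} \psi_j^2 = 1 + \theta^2 < \infty, \qquad \sum_{j=0}^{\infty} |\gamma_j| = \sigma^2(1 + \theta^2 + |\theta|) < \infty$$

⇒ $\{y_t\}$ is both weakly stationary and ergodic.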



Example: AR(1) process

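Mean-adjusted form:

$$y_t - \mu = \phi(y_{t-1} - \mu) + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{WN}(0, \sigma^2), \quad |\phi| < 1, \quad E[y_t] = \mu$$

Regression form:

$$y_t = c + \phi y_{t-1} + \varepsilon_t, \quad c = \mu(1 - \phi)$$

Solution by recursive substitution:

$$\begin{aligned}
y_t - \mu &= \phi^{t+1}(y_{-1} - \mu) + \phi^t\varepsilon_0 + \ldots + \phi\varepsilon_{t-1} + \varepsilon_t \\
&= \phi^{t+1}(y_{-1} - \mu) + \sum_{i=0}^{t} \phi^i\varepsilon_{t-i} \\
&= \phi^{t+1}(y_{-1} - \mu) + \sum_{i=0}^{t} \psi_i\varepsilon_{t-i}, \qquad \psi_i = \phi^i
\end{aligned}$$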
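The recursive-substitution result can be verified numerically. The sketch below assumes the start-up value $y_{-1} = \mu$, so that the $\phi^{t+1}(y_{-1} - \mu)$ term drops out, and uses illustrative parameter values that are not from the notes; it compares the recursion with the closed-form sum.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, phi, T = 2.0, 0.8, 50          # illustrative values, not from the notes
eps = rng.normal(size=T + 1)       # eps_0, ..., eps_T
y_init = mu                        # assume y_{-1} = mu, so the phi^(t+1) term is zero

# Recursion: y_t - mu = phi * (y_{t-1} - mu) + eps_t, for t = 0, ..., T
y = np.empty(T + 1)
prev = y_init
for t in range(T + 1):
    y[t] = mu + phi * (prev - mu) + eps[t]
    prev = y[t]

# Closed form: y_t - mu = sum_{i=0}^{t} phi^i * eps_{t-i}
closed = np.array([
    mu + sum(phi**i * eps[t - i] for i in range(t + 1))
    for t in range(T + 1)
])

print(np.max(np.abs(y - closed)))  # difference is at floating-point rounding level
```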



Stability and Stationarity Conditions

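If $|\phi| < 1$, then

$$\lim_{j\to\infty} \phi^j = \lim_{j\to\infty} \psi_j = 0, \qquad \lim_{j\to\infty} \phi^j(y_{-1} - \mu) = 0,$$

and the stationary solution (Wold form) for the AR(1) becomes

$$y_t = \mu + \sum_{j=0}^{\infty} \phi^j\varepsilon_{t-j} = \mu + \sum_{j=0}^{\infty} \psi_j\varepsilon_{t-j}, \qquad \psi_j = \phi^j.$$

This is a stable (non-explosive) solution.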


Modelling Serial Correlation - Lag operator

The lag operator $L$, defined by the relation $L^j x_t = x_{t-j}$, enables compact expression of the operation of taking a weighted average of successive values of a process.

Properties of L

- $LC = C$: the lag of a constant is a constant

- the distributive law holds

$$(L^i + L^j)y_t = L^i y_t + L^j y_t = y_{t-i} + y_{t-j}$$

- the associative law of multiplication holds

$$L^i L^j y_t = L^i(L^j y_t) = y_{t-j-i}$$

similarly

$$L^i L^j y_t = L^{i+j} y_t = y_{t-i-j}$$

note $L^0 y_t = y_t$

- lead operator: $L$ raised to a negative power

$$L^{-i} y_t = y_{t+i}$$

- For $|\phi| < 1$, the infinite sum

$$(1 + \phi L + \phi^2 L^2 + \phi^3 L^3 + \ldots)\,y_t = \frac{y_t}{1 - \phi L}$$

proof: multiply each side by $(1 - \phi L)$:

$$(1 - \phi L)(1 + \phi L + \phi^2 L^2 + \phi^3 L^3 + \ldots)\,y_t = y_t$$

Given that $|\phi| < 1$, $\phi^n L^n y_t \to 0$ as $n \to \infty$, so the remainder term vanishes.

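- For $|\phi| > 1$, the infinite sum

$$[1 + (\phi L)^{-1} + (\phi L)^{-2} + (\phi L)^{-3} + \ldots]\,y_t = \frac{-\phi L\,y_t}{1 - \phi L}$$

Thus

$$\frac{y_t}{1 - \phi L} = -(\phi L)^{-1} \sum_{i=0}^{\infty} (\phi L)^{-i} y_t.$$

proof: multiply each side by $(1 - \phi L)$:

$$(1 - \phi L)[1 + (\phi L)^{-1} + (\phi L)^{-2} + (\phi L)^{-3} + \ldots]\,y_t = -\phi L\,y_t$$

$$\Rightarrow [1 - \phi L + (\phi L)^{-1} - 1 + (\phi L)^{-2} - (\phi L)^{-1} + (\phi L)^{-3} - (\phi L)^{-2} + \ldots]\,y_t = -\phi L\,y_t$$

The terms cancel in pairs and, since $|\phi| > 1$,

$$\phi^{-n} L^{-n} y_t \to 0 \ \text{ as } n \to \infty$$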


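It is straightforward to use lag operators to solve linear difference equations.

E.g. consider the first-order equation:

$$y_t = \phi_0 + \phi_1 y_{t-1} + \varepsilon_t$$

where $|\phi_1| < 1$. Using $L$ we can write this as

$$y_t = \phi_0 + \phi_1 L y_t + \varepsilon_t \quad\Rightarrow\quad y_t = \frac{\phi_0 + \varepsilon_t}{1 - \phi_1 L}$$

Property 1 (the lag of a constant is a constant) ⇒ $L\phi_0 = \phi_0$, so that

$$\frac{\phi_0}{1 - \phi_1 L} = \phi_0 + \phi_1\phi_0 + \phi_1^2\phi_0 + \ldots = \frac{\phi_0}{1 - \phi_1}$$

Property 5 (the infinite sum for $|\phi| < 1$) ⇒

$$\frac{\varepsilon_t}{1 - \phi_1 L} = \varepsilon_t + \phi_1\varepsilon_{t-1} + \phi_1^2\varepsilon_{t-2} + \ldots = \sum_{i=0}^{\infty} \phi_1^i\varepsilon_{t-i}$$

Thus,

$$y_t = \frac{\phi_0}{1 - \phi_1} + \sum_{i=0}^{\infty} \phi_1^i\varepsilon_{t-i}$$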



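An AR(1) process satisfies the following stochastic difference equation:

$$y_t = c + \phi y_{t-1} + \varepsilon_t, \quad\text{or}\quad y_t - \phi y_{t-1} = c + \varepsilon_t, \quad\text{or}\quad (1 - \phi L)y_t = c + \varepsilon_t,$$

where $\{\varepsilon_t\} \sim \mathrm{WN}(0, \sigma^2)$.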



AR(1) in Lag Operator Notation:

$$(1 - \phi L)(y_t - \mu) = \varepsilon_t$$

If $|\phi| < 1$, then

$$(1 - \phi L)^{-1} = \sum_{j=0}^{\infty} \phi^j L^j = 1 + \phi L + \phi^2 L^2 + \ldots$$

so that $(1 - \phi L)^{-1}(1 - \phi L) = 1$.
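The inverse relation can be illustrated by treating lag polynomials as coefficient arrays, since multiplying two lag polynomials corresponds to convolving their coefficients. This is a minimal sketch with an illustrative φ = 0.6 and a truncation at 40 terms (both are assumptions, not from the notes).

```python
import numpy as np

phi, n = 0.6, 40                      # illustrative values; requires |phi| < 1

# Coefficients on L^0, L^1, ..., L^(n-1) of the truncated inverse: phi^j
inverse_trunc = phi ** np.arange(n)
# Coefficients of (1 - phi L)
one_minus_phi_L = np.array([1.0, -phi])

# Multiplying lag polynomials is convolution of their coefficient sequences
product = np.convolve(one_minus_phi_L, inverse_trunc)

print(product[:3])    # leading coefficients: 1 on L^0, then (numerically) zero
print(product[-1])    # remainder term -phi^n on L^n, which vanishes as n grows
```

The only nonzero coefficients are the leading 1 and the truncation remainder $-\phi^n$, which is why the infinite sum inverts $(1 - \phi L)$ exactly when $|\phi| < 1$.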



Finding the Wold form:

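$$\begin{aligned}
y_t - \mu &= (1 - \phi L)^{-1}(1 - \phi L)(y_t - \mu) = (1 - \phi L)^{-1}\varepsilon_t \\
&= \sum_{j=0}^{\infty} \phi^j L^j\varepsilon_t \\
&= \sum_{j=0}^{\infty} \phi^j\varepsilon_{t-j} \\
&= \sum_{j=0}^{\infty} \psi_j\varepsilon_{t-j}, \qquad \psi_j = \phi^j
\end{aligned}$$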



Calculating moments: use stationarity properties

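$$E[y_t] = E[y_{t-j}] \ \text{ for all } j$$

$$\mathrm{cov}(y_t, y_{t-j}) = \mathrm{cov}(y_{t-k}, y_{t-k-j}) \ \text{ for all } k, j$$

Mean of AR(1)

$$\begin{aligned}
E[y_t] &= c + \phi E[y_{t-1}] + E[\varepsilon_t] \\
&= c + \phi E[y_t] \\
\Rightarrow E[y_t] &= \frac{c}{1 - \phi} = \mu
\end{aligned}$$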



Variance of AR(1)

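$$\begin{aligned}
\gamma_0 = \mathrm{var}(y_t) &= E[(y_t - \mu)^2] \\
&= E[(\phi(y_{t-1} - \mu) + \varepsilon_t)^2] \\
&= \phi^2 E[(y_{t-1} - \mu)^2] + 2\phi E[(y_{t-1} - \mu)\varepsilon_t] + E[\varepsilon_t^2] \\
&= \phi^2 E[(y_{t-1} - \mu)^2] + 0 + \sigma^2 \\
&= \phi^2\gamma_0 + \sigma^2 \\
\Rightarrow \gamma_0 &= \frac{\sigma^2}{1 - \phi^2}
\end{aligned}$$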



Autocovariances and Autocorrelations:

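Multiply $y_t - \mu$ by $y_{t-j} - \mu$ and take expectations. For $j \geq 1$:

$$\begin{aligned}
\gamma_j &= E[(y_t - \mu)(y_{t-j} - \mu)] \\
&= E[\phi(y_{t-1} - \mu)(y_{t-j} - \mu)] + E[\varepsilon_t(y_{t-j} - \mu)] \\
&= \phi\gamma_{j-1} \quad (\text{by stationarity, and since } E[\varepsilon_t(y_{t-j} - \mu)] = 0) \\
\Rightarrow \gamma_j &= \phi^j\gamma_0 = \frac{\phi^j\sigma^2}{1 - \phi^2}
\end{aligned}$$

Autocorrelations:

$$\rho_j = \frac{\gamma_j}{\gamma_0} = \frac{\phi^j\gamma_0}{\gamma_0} = \phi^j = \psi_j$$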


Asymptotic Properties of Linear Processes

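LLN for Linear Processes. Assume

$$y_t = \mu + \psi(L)\varepsilon_t = \mu + \sum_{j=0}^{\infty} \psi_j\varepsilon_{t-j}, \qquad \psi(L) = \sum_{j=0}^{\infty} \psi_j L^j, \qquad \varepsilon_t \sim \mathrm{MDS}(0, \sigma^2)$$

and that $\psi(L)$ is 1-summable, that is,

$$\sum_{j=0}^{\infty} j|\psi_j| = 1|\psi_1| + 2|\psi_2| + \ldots < \infty$$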


Asymptotics (cont’d)

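Then

$$\hat{\mu} = \frac{1}{T}\sum_{t=1}^{T} y_t \xrightarrow{p} E[y_t] = \mu$$

$$\hat{\gamma}_j = \frac{1}{T}\sum_{t=j+1}^{T} (y_t - \hat{\mu})(y_{t-j} - \hat{\mu}) \xrightarrow{p} \mathrm{cov}(y_t, y_{t-j}) = \gamma_j$$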
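The sample mean and sample autocovariance estimators are easy to compute directly. The sketch below is an illustration on a simulated AR(1) with assumed parameter values (µ = 1, φ = 0.5, σ = 1, Gaussian errors; none of these come from the notes), where the population values are known in closed form. The function names are hypothetical.

```python
import numpy as np

def sample_mean(y):
    """mu_hat = (1/T) * sum_t y_t."""
    return float(np.mean(y))

def sample_autocov(y, j):
    """gamma_hat_j = (1/T) * sum_{t=j+1}^{T} (y_t - mu_hat)(y_{t-j} - mu_hat)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    d = y - y.mean()
    return float(np.dot(d[j:], d[: T - j]) / T)

# Simulated AR(1) with illustrative parameters (not from the notes)
rng = np.random.default_rng(2)
mu, phi, sigma, T = 1.0, 0.5, 1.0, 200_000
eps = rng.normal(0.0, sigma, size=T)
y = np.empty(T)
y[0] = mu + eps[0]
for t in range(1, T):
    y[t] = mu + phi * (y[t - 1] - mu) + eps[t]

gamma0 = sigma**2 / (1 - phi**2)          # theoretical variance of the AR(1)
print(sample_mean(y), "vs", mu)
for j in range(3):
    print(j, sample_autocov(y, j), "vs", phi**j * gamma0)
```

With a large T the estimates should sit close to µ and to $\phi^j\sigma^2/(1-\phi^2)$, as the LLN suggests.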



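CLT for Linear Processes. Assume, as for the LLN,

$$y_t = \mu + \psi(L)\varepsilon_t = \mu + \sum_{j=0}^{\infty} \psi_j\varepsilon_{t-j}, \qquad \psi(L) = \sum_{j=0}^{\infty} \psi_j L^j,$$

that $\psi(L)$ is 1-summable, and that

$$\psi(1) = \sum_{j=0}^{\infty} \psi_j \neq 0$$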



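Then

$$\sqrt{T}(\hat{\mu} - \mu) \xrightarrow{d} N(0, \mathrm{LRV})$$

where LRV is the long-run variance:

$$\begin{aligned}
\mathrm{LRV} &= \sum_{j=-\infty}^{\infty} \gamma_j \\
&= \gamma_0 + 2\sum_{j=1}^{\infty} \gamma_j, \quad \text{since } \gamma_j = \gamma_{-j} \\
&= \sigma^2\psi(1)^2
\end{aligned}$$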



Intuition behind the LRV formula

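Consider

$$\mathrm{var}(\sqrt{T}\,\bar{y}) = \mathrm{var}\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{T} y_t\right) = \frac{1}{T}\,\mathrm{var}\left(\sum_{t=1}^{T} y_t\right)$$

Using the fact that

$$\sum_{t=1}^{T} y_t = \mathbf{1}'\mathbf{y}, \qquad \mathbf{1} = (1, \ldots, 1)', \qquad \mathbf{y} = (y_1, \ldots, y_T)'$$

it follows that

$$\mathrm{var}\left(\sum_{t=1}^{T} y_t\right) = \mathrm{var}(\mathbf{1}'\mathbf{y}) = \mathbf{1}'\,\mathrm{var}(\mathbf{y})\,\mathbf{1}$$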



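Now

$$\mathrm{var}(\mathbf{y}) = E[(\mathbf{y} - \mu\mathbf{1})(\mathbf{y} - \mu\mathbf{1})'] =
\begin{bmatrix}
\gamma_0 & \gamma_1 & \gamma_2 & \cdots & \gamma_{T-1} \\
\gamma_1 & \gamma_0 & \gamma_1 & \cdots & \gamma_{T-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\gamma_{T-1} & \gamma_{T-2} & \gamma_{T-3} & \cdots & \gamma_0
\end{bmatrix} = \Gamma,$$

where $\gamma_j = \mathrm{cov}(y_t, y_{t-j})$ and $\gamma_j = \gamma_{-j}$.

Thus,

$$\mathrm{var}\left(\sum_{t=1}^{T} y_t\right) = \mathbf{1}'\,\mathrm{var}(\mathbf{y})\,\mathbf{1} = \mathbf{1}'\Gamma\mathbf{1}$$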



Now $\mathbf{1}'\Gamma\mathbf{1}$ is the sum of all elements in the $T \times T$ matrix $\Gamma$.

This sum may be computed by summing across the rows, down the columns, or along the diagonals.

Given the banded diagonal structure of $\Gamma$, it is most convenient to sum along the diagonals, so that

$$\mathbf{1}'\Gamma\mathbf{1} = T\gamma_0 + 2(T - 1)\gamma_1 + 2(T - 2)\gamma_2 + \ldots + 2\gamma_{T-1}$$



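Then

$$\begin{aligned}
\frac{1}{T}\mathbf{1}'\Gamma\mathbf{1} &= \gamma_0 + 2\,\frac{T - 1}{T}\gamma_1 + 2\,\frac{T - 2}{T}\gamma_2 + \ldots + 2\,\frac{1}{T}\gamma_{T-1} \\
&= \gamma_0 + 2\sum_{j=1}^{T-1}\left(1 - \frac{j}{T}\right)\gamma_j
\end{aligned}$$

As $T \to \infty$, it can be shown that

$$\frac{1}{T}\mathbf{1}'\Gamma\mathbf{1} \to \gamma_0 + 2\sum_{j=1}^{\infty} \gamma_j = \mathrm{LRV}$$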
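The diagonal-sum identity and the limit can be checked numerically. The sketch below assumes a stationary AR(1) with illustrative values φ = 0.5 and σ² = 1 (not from the notes), for which $\gamma_j = \phi^j\sigma^2/(1-\phi^2)$ and $\psi(1) = \sum_j \phi^j = 1/(1-\phi)$; it builds Γ explicitly and compares the three quantities.

```python
import numpy as np

phi, sigma2, T = 0.5, 1.0, 500           # illustrative AR(1) values, not from the notes
gamma = sigma2 / (1 - phi**2) * phi ** np.arange(T)   # gamma_0, ..., gamma_{T-1}

# Gamma[s, t] = gamma_{|s - t|}: the T x T autocovariance (Toeplitz) matrix
lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
Gamma = gamma[lags]

ones = np.ones(T)
quad_form = ones @ Gamma @ ones / T      # (1/T) 1' Gamma 1

j = np.arange(1, T)
diag_sum = gamma[0] + 2 * np.sum((1 - j / T) * gamma[1:])   # sum along the diagonals

lrv = sigma2 / (1 - phi) ** 2            # sigma^2 psi(1)^2 with psi(1) = 1/(1 - phi)
print(quad_form, diag_sum, lrv)
```

The first two numbers agree up to rounding, and both approach the third as T grows because the weights $(1 - j/T)$ tend to one for every fixed j.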



Remark

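Since $\gamma_j = \gamma_{-j}$, $\frac{1}{T}\mathbf{1}'\Gamma\mathbf{1}$ may also be rewritten as a single two-sided sum (the $j = 0$ term is $\gamma_0$):

$$\frac{1}{T}\mathbf{1}'\Gamma\mathbf{1} = \sum_{j=-(T-1)}^{T-1}\left(1 - \frac{|j|}{T}\right)\gamma_j$$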



Example: MA(1) Process

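$$y_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}; \quad |\theta| < 1, \quad \varepsilon_t \sim \mathrm{iid}(0, \sigma^2)$$

We have seen that (see slide 14)

$$\psi(L) = 1 + \theta L$$

$$\gamma_0 = \sigma^2(1 + \theta^2), \qquad \gamma_1 = \sigma^2\theta$$

Then

$$\begin{aligned}
\mathrm{LRV} &= \gamma_0 + 2\sum_{j=1}^{\infty} \gamma_j \\
&= \sigma^2(1 + \theta^2) + 2\sigma^2\theta \\
&= \sigma^2(1 + \theta)^2 \\
&= \sigma^2\psi(1)^2
\end{aligned}$$

Remarks

1. If $\theta = 0$, then $\mathrm{LRV} = \sigma^2$

2. If $\theta = -1$, then $\psi(1) = 0 \Rightarrow \mathrm{LRV} = \sigma^2\psi(1)^2 = 0$

This motivates the condition that $\psi(1) \neq 0$ in the CLT for stationary and ergodic linear processes.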
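As a further worked case (added here for illustration, using the AR(1) results derived earlier), the same two routes to the long-run variance can be followed for the stationary AR(1) with $|\phi| < 1$, where $\psi_j = \phi^j$ and $\gamma_j = \phi^j\gamma_0$:

```latex
\begin{aligned}
\psi(1) &= \sum_{j=0}^{\infty} \phi^j = \frac{1}{1 - \phi}
  \quad\Rightarrow\quad \sigma^2\psi(1)^2 = \frac{\sigma^2}{(1 - \phi)^2}, \\[4pt]
\gamma_0 + 2\sum_{j=1}^{\infty} \gamma_j
  &= \gamma_0\left(1 + \frac{2\phi}{1 - \phi}\right)
   = \gamma_0\,\frac{1 + \phi}{1 - \phi}
   = \frac{\sigma^2}{1 - \phi^2}\cdot\frac{1 + \phi}{1 - \phi}
   = \frac{\sigma^2}{(1 - \phi)^2}.
\end{aligned}
```

Both routes give the same value, and since $\psi(1) = 1/(1-\phi)$ is never zero, the CLT condition holds for every stationary AR(1).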


Estimating Long-Run Variance


$$\mathrm{LRV} = \sum_{j=-\infty}^{\infty} \gamma_j = \gamma_0 + 2\sum_{j=1}^{\infty} \gamma_j = \sigma^2\psi(1)^2$$

There are two types of estimators of the LRV:

• Parametric (assumes a parametric model for yt)

• Nonparametric (does not assume a parametric model for yt)
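The notes do not name a specific estimator. As one common nonparametric choice, a Bartlett-kernel (Newey-West type) estimator replaces the infinite sum with a weighted sum of sample autocovariances up to a truncation lag M. The sketch below is an assumption-laden illustration: the function name `bartlett_lrv`, the bandwidth M = 10, and the simulated MA(1) data are all hypothetical choices.

```python
import numpy as np

def bartlett_lrv(y, bandwidth):
    """Bartlett-kernel (Newey-West type) estimate of the long-run variance.

    LRV_hat = gamma_hat_0 + 2 * sum_{j=1}^{M} (1 - j/(M+1)) * gamma_hat_j,
    where M = bandwidth is a user-chosen truncation lag (an assumption here).
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    d = y - y.mean()
    lrv = np.dot(d, d) / T                       # gamma_hat_0
    for j in range(1, bandwidth + 1):
        gamma_j = np.dot(d[j:], d[: T - j]) / T  # gamma_hat_j
        lrv += 2.0 * (1.0 - j / (bandwidth + 1)) * gamma_j
    return float(lrv)

# MA(1) check with illustrative values: true LRV = sigma^2 * (1 + theta)^2 = 2.25
rng = np.random.default_rng(3)
theta, T = 0.5, 100_000
eps = rng.normal(size=T + 1)
y = eps[1:] + theta * eps[:-1]
print(bartlett_lrv(y, bandwidth=10))
```

With a finite bandwidth the Bartlett weights shrink each autocovariance slightly, so the estimate tends to fall somewhat below the true value; a parametric alternative would instead fit a model (for example an AR or MA) and plug its estimates into $\sigma^2\psi(1)^2$.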


Incorporating Serial correlation in GMM

Recall that the moment condition $g_t$ in our GMM discussion is a $K$-dimensional vector defined as $x_t \cdot \varepsilon_t$ (the product of the $K$-dimensional vector of instruments $x_t$ and the scalar error term $\varepsilon_t$).

We also saw that:

i. the mean of $g_t$ is zero (by the orthogonality assumption)

ii. the matrix $S$, defined to be the asymptotic variance of $\bar{g}$ ($\equiv \frac{1}{T}\sum_{t=1}^{T} g_t$), was the variance of $g_t$ (by the assumption that $g_t$ is an m.d.s. with finite second moments)

Serial correlation was ruled out by the second assumption (Assumption 3.5) in our earlier discussion.



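The CLT we looked at earlier is a generalisation that allows for serial correlation in $\{g_t\}$ by relaxing Assumption 3.5.

The relaxed assumption restricts the degree of serial correlation and requires the long-run covariance matrix of $\{g_t\}$ to be nonsingular.

Then

$$\sqrt{T}\,\bar{g} \xrightarrow{d} N(0, \mathrm{LRV}),$$

where

$$\mathrm{LRV} = \sum_{j=-\infty}^{\infty} \Gamma_j = \Gamma_0 + \sum_{j=1}^{\infty} (\Gamma_j + \Gamma_j')$$

and $\Gamma_j$ is the $j$-th order autocovariance matrix

$$\Gamma_j = E(g_t g_{t-j}') \quad (j = 0, \pm 1, \pm 2, \ldots)$$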



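Or

$$\Gamma_0 = E(g_t g_t') = E[x_t x_t'\varepsilon_t^2]$$

$$\Gamma_j = E(g_t g_{t-j}') = E[x_t x_{t-j}'\varepsilon_t\varepsilon_{t-j}]$$

Comparing GMM with and without serial correlation, we can conclude that:

• $S\ (\equiv \mathrm{Avar}(\bar{g})) = \Gamma_0$ in the absence of serial correlation (under Assumption 3.5)

• $S\ (\equiv \mathrm{Avar}(\bar{g})) = \mathrm{LRV} = \sum_{j=-\infty}^{\infty} \Gamma_j$ in the presence of serial correlation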
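The contrast between the two cases can be made concrete with sample analogues. The sketch below is illustrative only: it assumes a Bartlett-weighted truncated sum as the estimator of $\sum_j \Gamma_j$ (the notes do not prescribe one), and the function name `hac_covariance`, the bandwidth, and the synthetic data are all hypothetical.

```python
import numpy as np

def hac_covariance(g, bandwidth):
    """Bartlett-weighted estimate of S = sum_j Gamma_j from a T x K array g.

    Row t of g is g_t' (for example x_t * e_t with e_t a residual).
    Gamma_hat_j = (1/T) * sum_{t=j+1}^{T} g_t g_{t-j}'.
    """
    g = np.asarray(g, dtype=float)
    T = g.shape[0]
    S = g.T @ g / T                                  # Gamma_hat_0
    for j in range(1, bandwidth + 1):
        gamma_j = g[j:].T @ g[: T - j] / T           # Gamma_hat_j
        weight = 1.0 - j / (bandwidth + 1)           # Bartlett weight
        S += weight * (gamma_j + gamma_j.T)
    return S

# Toy input (purely synthetic): K = 2 instruments, serially correlated errors
rng = np.random.default_rng(4)
T = 5_000
x = np.column_stack([np.ones(T), rng.normal(size=T)])
u = rng.normal(size=T + 1)
e = u[1:] + 0.5 * u[:-1]                             # MA(1) error term
g = x * e[:, None]                                   # g_t = x_t * e_t

print(hac_covariance(g, bandwidth=0))   # Gamma_hat_0 only: no serial correlation allowed
print(hac_covariance(g, bandwidth=8))   # adds weighted autocovariance terms
```

Setting the bandwidth to zero reproduces the estimate of $\Gamma_0$ used when serial correlation is ruled out; a positive bandwidth adds the $\Gamma_j + \Gamma_j'$ terms.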

