TRANSCRIPT
Today's Agenda
1. MLE (Simple Introduction), GARCH estimation
2. Kalman Filtering
3. The Delta Method
4. Empirical Portfolio Choice
5. Wold Decomposition of Stationary Processes
1 Maximum Likelihood Estimation
(Preliminaries for GARCH/Stochastic Volatility & Kalman Filtering)
• Suppose we have the series {Y₁, Y₂, ..., Y_T} with a joint density f_{Y_T,...,Y_1}(θ) that depends on some parameters θ (such as means, variances, etc.)
• We observe a realization of Y_t.
• If we make some functional assumptions on f, we can think of f as the probability of having observed this particular sample, given the parameters θ.
• The maximum likelihood estimate (MLE) of θ is the value of the parameters θ for which this sample is most likely to have been observed.
• In other words, θ̂_MLE is the value that maximizes f_{Y_T,...,Y_1}(θ).
• Q: But how do we know what f, the true density of the data, is?
• A: We don't.
• Usually, we assume that f is normal, but this is strictly for simplicity. The fact that we have to make distributional assumptions limits the use of MLE in many financial applications.
• Recall that if the Y_t are independent over time, then
f_{Y_T,...,Y_1}(θ) = f_{Y_T}(θ_T) f_{Y_{T-1}}(θ_{T-1}) ... f_{Y_1}(θ_1) = Π_{i=1}^T f_{Y_i}(θ_i)
• Sometimes it is more convenient to take the log of the likelihood function; then
ℒ(θ) = log f_{Y_T,...,Y_1}(θ) = Σ_{i=1}^T log f_{Y_i}(θ)
• However, in most time series applications, the independence assumption is untenable. Instead, we use a conditioning trick.
• Recall that f_{Y_2,Y_1} = f_{Y_2|Y_1} f_{Y_1}
• In a similar fashion, we can write
f_{Y_T,...,Y_1}(θ) = f_{Y_T|Y_{T-1},...,Y_1}(θ) f_{Y_{T-1}|Y_{T-2},...,Y_1}(θ) ... f_{Y_1}(θ)
• The log likelihood can be expressed as
ℒ(θ) = log f_{Y_T,...,Y_1}(θ) = Σ_{i=1}^T log f_{Y_i|Y_{i-1},...,Y_1}(θ_i)
• Example: the log-likelihood of an AR(1) process
Y_t = c + φY_{t-1} + ε_t
• Suppose that ε_t is iid N(0, σ²).
• Recall that E(Y_t) = c/(1−φ) and Var(Y_t) = σ²/(1−φ²).
• Since Y_t is a linear function of the ε_t's, it is also Normal (a sum of normals is normal).
• Therefore, the unconditional density of Y_t is Normal.
• Result: if Y₁ and Y₂ are jointly Normal, then the conditional (and marginal) densities are also normal.
• Therefore, f_{Y_2|Y_1} is N(c + φy₁, σ²), or
f_{Y_2|Y_1} = (1/√(2πσ²)) exp[ −(y₂ − c − φy₁)² / (2σ²) ]
• Similarly, f_{Y_3|Y_2} is N(c + φy₂, σ²), or
f_{Y_3|Y_2} = (1/√(2πσ²)) exp[ −(y₃ − c − φy₂)² / (2σ²) ]
Then, the log likelihood can be written as
ℒ(θ) = log f_{Y_1} + Σ_{t=2}^T log f_{Y_t|Y_{t-1}}
= −(1/2) log(2π) − (1/2) log(σ²/(1−φ²)) − {y₁ − c/(1−φ)}² / (2σ²/(1−φ²))
  − ((T−1)/2) log(2π) − ((T−1)/2) log(σ²) − Σ_{t=2}^T (y_t − c − φy_{t-1})² / (2σ²)
• The unknown parameters are collected in θ = (c, φ, σ).
• We can maximize ℒ(θ) with respect to all those parameters and find the estimates that maximize the probability of having observed such a sample:
max_θ ℒ(θ)
• Sometimes, we can even impose constraints (such as |φ| < 1).
• Q: Is it necessary to impose the constraint σ² > 0?
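As a concrete sketch (not part of the original notes), the exact AR(1) log-likelihood above can be evaluated directly; the function below assumes θ = (c, φ, σ²):

```python
import numpy as np

def ar1_exact_loglik(theta, y):
    """Exact AR(1) log-likelihood: the first observation contributes its
    unconditional N(c/(1-phi), sigma2/(1-phi^2)) density, the remaining
    observations their conditional N(c + phi*y_{t-1}, sigma2) densities."""
    c, phi, sigma2 = theta
    T = len(y)
    m0 = c / (1.0 - phi)                    # unconditional mean E(Y_t)
    v0 = sigma2 / (1.0 - phi ** 2)          # unconditional variance Var(Y_t)
    ll = -0.5 * np.log(2 * np.pi * v0) - (y[0] - m0) ** 2 / (2 * v0)
    e = y[1:] - c - phi * y[:-1]            # conditional forecast errors
    ll += -0.5 * (T - 1) * np.log(2 * np.pi * sigma2) - np.sum(e ** 2) / (2 * sigma2)
    return ll
```

Handing this function to a numerical optimizer, with |φ| < 1 and σ² > 0 imposed, gives the MLE.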
• Note: if we drop the first observation, then we can write (setting c = 0) the FOC:
−Σ_{t=2}^T ∂/∂φ [ (y_t − φy_{t-1})² / (2σ²) ] = 0
Σ_{t=2}^T y_{t-1}(y_t − φy_{t-1}) = 0
φ̂ = Σ_{t=2}^T y_{t-1}y_t / Σ_{t=2}^T y²_{t-1}
• RESULT: in the univariate linear regression case, OLS, GMM, and MLE are equivalent!
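A quick numerical check of this equivalence (a sketch with simulated data, not from the notes): the closed-form φ̂ from the FOC coincides with the no-intercept OLS slope of y_t on y_{t-1}.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
y = np.zeros(T)
for t in range(1, T):                  # simulate a stationary AR(1) with c = 0
    y[t] = 0.6 * y[t - 1] + rng.normal()

# conditional-MLE first-order condition: phi_hat = sum y_{t-1}y_t / sum y_{t-1}^2
phi_mle = np.sum(y[:-1] * y[1:]) / np.sum(y[:-1] ** 2)

# OLS of y_t on y_{t-1} without an intercept gives the identical number
phi_ols = np.linalg.lstsq(y[:-1, None], y[1:], rcond=None)[0][0]
```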
• To summarize the maximum likelihood principle:
(a) Make a distributional assumption about the data
(b) Use conditioning to write the joint likelihood function
(c) For convenience, work with the log-likelihood function
(d) Maximize the likelihood function with respect to the parameters
• There are some subtle points.
• We had to specify the unconditional distribution of the first observation.
• We had to make an assumption about the dependence in the series.
• But sometimes, MLE is the only way to go.
• MLE is particularly appealing if we know the distribution of the series. Most other deficiencies can be circumvented.
• Now, you will ask: what are the properties of θ̂_MLE? More specifically, is it consistent? What is its distribution, where
θ̂_MLE = argmax_θ ℒ(θ)
• Yes, θ̂_MLE is a consistent estimator of θ.
• As you probably expect, the asymptotic distribution of θ̂_MLE is normal.
• Result:
T^{1/2}(θ̂_MLE − θ) →a N(0, V)
V = [ −∂²ℒ(θ)/∂θ∂θ′ |_{θ̂_MLE} ]⁻¹
or
V = [ Σ_{t=1}^T l_t(θ̂_MLE; y) l_t(θ̂_MLE; y)′ ]⁻¹
where l_t(θ̂_MLE; y) = ∂ log f_t/∂θ (θ̂_MLE; y) is the score of observation t.
• But we will not dwell on proving those properties.
Another Example: the log-likelihood of an AR(1)+ARCH(1) process
Y_t = c + φY_{t-1} + u_t
• where u_t = √(h_t)·v_t
• ARCH(1) is: h_t = ω + a·u²_{t-1}
where v_t is iid with mean 0 and E(v²_t) = 1.
• GARCH(1,1): suppose we specify h_t as h_t = ω + β·h_{t-1} + a·u²_{t-1}
• Recall that E(Y_t) = c/(1−φ) and Var(Y_t) = σ²/(1−φ²).
• Since Y_t is a linear function of the ε_t's, it is also Normal (a sum of normals is normal).
• Therefore, the unconditional density of Y_t is Normal.
• Result: if Y₁ and Y₂ are jointly Normal, then the conditional (and marginal) densities are also normal.
• Therefore, f_{Y_2|Y_1} is N(c + φy₁, h₂), or for the ARCH(1)
f_{Y_2|Y_1} = (1/√(2π(ω + a·u₁²))) exp[ −(y₂ − c − φy₁)² / (2(ω + a·u₁²)) ]
• Similarly, f_{Y_3|Y_2} is N(c + φy₂, h₃), or
f_{Y_3|Y_2} = (1/√(2π(ω + a·u₂²))) exp[ −(y₃ − c − φy₂)² / (2(ω + a·u₂²)) ]
Then, the conditional log likelihood can be written as
ℒ(θ|y₁) = Σ_{t=2}^T log f_{Y_t|Y_{t-1}}
= −((T−1)/2) log(2π) − (1/2) Σ_{t=2}^T log(ω + a·u²_{t-1}) − Σ_{t=2}^T (y_t − c − φy_{t-1})² / (2(ω + a·u²_{t-1}))
• The unknown parameters are collected in θ = (c, φ, ω, a).
• We can maximize ℒ(θ) with respect to all those parameters and find the estimates that maximize the probability of having observed such a sample:
max_θ ℒ(θ)
• Example: mle_arch.m
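mle_arch.m is not reproduced here; the Python sketch below shows how the conditional log-likelihood above could be coded (handling of the initial condition is a simplifying assumption: the sum starts where a lagged residual is available).

```python
import numpy as np

def arch1_loglik(theta, y):
    """AR(1)+ARCH(1) conditional log-likelihood, theta = (c, phi, omega, a).
    Residuals u_t = y_t - c - phi*y_{t-1}; variances h_t = omega + a*u_{t-1}^2."""
    c, phi, omega, a = theta
    u = y[1:] - c - phi * y[:-1]        # u_2, ..., u_T
    h = omega + a * u[:-1] ** 2         # h_3, ..., h_T
    resid = u[1:]
    return np.sum(-0.5 * np.log(2 * np.pi * h) - resid ** 2 / (2 * h))
```

Maximizing this over (c, φ, ω, a), with ω > 0 and a ≥ 0 imposed, is the estimation step.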
• Similarly for GARCH(1,1):
ℒ(θ|y₁) = Σ_{t=2}^T log f_{Y_t|Y_{t-1}}
= −((T−1)/2) log(2π) − (1/2) Σ_{t=2}^T log(h_t) − Σ_{t=2}^T (y_t − c − φy_{t-1})² / (2h_t)
where h_t = ω + β·h_{t-1} + a·u²_{t-1}
• To construct h_t, we have to filter the {u_{t-1}} series.
• For given u_t's, h₀, and ω, β, and a, we construct h_t.
• The h_t will allow us to evaluate the likelihood ℒ(θ|y₁).
• Optimize ℒ(θ|y₁) with respect to all the parameters, given the initial conditions.
• This recursive feature of the GARCH makes it harder to estimate with GMM.
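The filtering step can be sketched as follows (a minimal illustration; h₀ is an initial condition supplied by the user):

```python
import numpy as np

def garch_filter(u, omega, beta, a, h0):
    """Build h_t = omega + beta*h_{t-1} + a*u_{t-1}^2 recursively
    from the residual series u, starting at h0."""
    h = np.empty(len(u))
    h[0] = h0
    for t in range(1, len(u)):
        h[t] = omega + beta * h[t - 1] + a * u[t - 1] ** 2
    return h
```

The resulting {h_t} feed the log-likelihood above; every trial value of (ω, β, a) requires re-running the filter, which is the recursive feature that complicates GMM estimation.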
2 Kalman Filtering
• History: Kalman (1963) paper
• Problem: we have a missile that we want to guide to its proper target.
• The trajectory of the missile IS observable from the control center.
• Most other circumstances, such as weather conditions, possible interception methods, etc., are NOT observable, but can be forecast.
• We want to guide the missile to its proper destination.
• In finance the setup is very similar, but the problem is different.
• In the missile case, the parameters of the system are known. The interest is, given those parameters, to control the missile to its proper destination.
• In finance, we want to estimate the parameters of the system. We are usually not concerned with a control problem, because there are very few instruments we can use as controls (although there are counter-examples).
2.1 Setup (Hamilton CH 13)
y_t = A′x_t + H′z_t + w_t
z_t = F z_{t-1} + v_t
where
• y_t is the observable variable (think "returns").
• The first equation, the y_t equation, is called the "space" or "observation" equation.
• z_t is the unobservable variable (think "volatility" or "state of the economy").
• The second equation, the z_t equation, is called the "state" equation.
• x_t is a vector of exogenous (or predetermined) variables (we can set x_t = 0 for now).
• v_t and w_t are iid and assumed to be uncorrelated at all lags: E(w_t v′_t) = 0.
• Also E(v_t v′_t) = Q and E(w_t w′_t) = R.
• The system of equations is known as a state-space representation.
• Any time series can be written in a state-space representation.
• In standard engineering problems, it is assumed that we know the parameters A, H, F, Q, R.
• The problem is to give impulses x_t such that, given the states z_t, the missile is guided as closely to target as possible.
• In finance, we want to estimate the unknown parameters A, H, F, Q, R in order to understand where the system is going, given the states z_t. There is little attempt at guiding the system. In fact, we usually assume that x_t = 1 and A = E(Y_t), or even that x_t = 0.
• Note: any time series can be written as a state space.
• Example: AR(2): Y_{t+1} − μ = φ₁(Y_t − μ) + φ₂(Y_{t-1} − μ) + ε_{t+1}
• State equation:
(Y_{t+1} − μ ; Y_t − μ) = [φ₁ φ₂ ; 1 0] (Y_t − μ ; Y_{t-1} − μ) + (ε_{t+1} ; 0)
• Observation equation:
y_t = μ + [1 0] (Y_t − μ ; Y_{t-1} − μ)
• There are other state-space representations of Y_t. Can you write down another one?
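As a sanity check (a sketch, not from the notes), iterating the state equation with μ = 0 reproduces the AR(2) recursion shock for shock:

```python
import numpy as np

phi1, phi2 = 0.5, 0.3
F = np.array([[phi1, phi2],         # state transition matrix
              [1.0,  0.0]])
H = np.array([1.0, 0.0])            # observation picks the first state element

rng = np.random.default_rng(3)
T = 200
eps = rng.normal(size=T)

z = np.zeros(2)                     # state z_t = (Y_t, Y_{t-1})'
y_ss = np.empty(T)
for t in range(T):
    z = F @ z + np.array([eps[t], 0.0])
    y_ss[t] = H @ z

y = np.zeros(T)                     # direct AR(2) simulation, same shocks
for t in range(T):
    y[t] = phi1 * (y[t - 1] if t >= 1 else 0.0) \
         + phi2 * (y[t - 2] if t >= 2 else 0.0) + eps[t]
```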
• As a first step, we will assume that A, H, F, Q, R are known.
• Our goal is to find the best linear forecast of the state (unobserved) vector z_t. Such a forecast is needed in control problems (to take decisions) and in finance (state of the economy, forecasts of unobserved volatility).
• The forecasts will be denoted by
z_{t+1|t} = E(z_{t+1}|y_t, ..., x_t, ...)
and we assume that we are only taking linear projections of z_{t+1} on y_t, ..., x_t, .... Nonlinear Kalman filters exist, but the results are a bit more complicated.
• The Kalman filter calculates the forecasts z_{t+1|t} recursively, starting with z_{1|0}, then z_{2|1}, ..., until z_{T|T-1}.
• Since z_{t|t-1} is a forecast, we can ask how good a forecast it is.
• Therefore, we define
P_{t|t-1} = E[(z_t − z_{t|t-1})(z_t − z_{t|t-1})′],
which is the MSE of the recursive forecast z_{t|t-1}.
• The Kalman filter can be broken down into 5 steps.
1. Initialization of the recursion. We need z_{1|0}. Usually, we take z_{1|0} to be the unconditional mean, or z_{1|0} = E(z₁). (Q: how can we estimate E(z₁)?) The associated error with this forecast is
P_{1|0} = E[(z_{1|0} − z₁)(z_{1|0} − z₁)′]
2. Forecasting y_t (intermediate step). The ultimate goal is to calculate z_{t|t-1}, but we do that recursively. We will first need to forecast the value of y_t based on available information:
E(y_t|x_t, z_t) = A′x_t + H′z_t
From the law of iterated expectations,
E_{t-1}(E_t(y_t)) = E_{t-1}(y_t) = A′x_t + H′z_{t|t-1}
The error from this forecast is
y_t − y_{t|t-1} = H′(z_t − z_{t|t-1}) + w_t
with MSE
E[(y_t − y_{t|t-1})(y_t − y_{t|t-1})′] = E[H′(z_t − z_{t|t-1})(z_t − z_{t|t-1})′H] + E[w_t w′_t]
= H′P_{t|t-1}H + R
3. Updating step (z_{t|t})
• Once we observe y_t, we can update our forecast of z_t, denoting it by z_{t|t}, before making the new forecast z_{t+1|t}.
• We do this by calculating E(z_t|y_t, x_t, ...) = z_{t|t}:
z_{t|t} = z_{t|t-1} + E[(z_t − z_{t|t-1})(y_t − y_{t|t-1})′] · {E[(y_t − y_{t|t-1})(y_t − y_{t|t-1})′]}⁻¹ (y_t − y_{t|t-1})
• We can write this a bit more intuitively as
z_{t|t} = z_{t|t-1} + β(y_t − y_{t|t-1})
where β is the OLS coefficient from regressing (z_t − z_{t|t-1}) on (y_t − y_{t|t-1}).
• The bigger the relationship between the two forecasting errors, the bigger the correction must be.
• It can be shown that
z_{t|t} = z_{t|t-1} + P_{t|t-1}H(H′P_{t|t-1}H + R)⁻¹(y_t − A′x_t − H′z_{t|t-1})
• This updated forecast uses the old forecast z_{t|t-1} and the just-observed values of y_t and x_t.
4. Forecast z_{t+1|t}
• Once we have an update of the old forecast, we can produce a new forecast, the forecast z_{t+1|t}:
E_t(z_{t+1}) = E(z_{t+1}|y_t, x_t, ...) = E(F z_t + v_{t+1}|y_t, x_t, ...) = F·E(z_t|y_t, x_t, ...) + 0 = F z_{t|t}
• We can use the above equation to write
E_t(z_{t+1}) = F{z_{t|t-1} + P_{t|t-1}H(H′P_{t|t-1}H + R)⁻¹(y_t − A′x_t − H′z_{t|t-1})}
= F z_{t|t-1} + F P_{t|t-1}H(H′P_{t|t-1}H + R)⁻¹(y_t − A′x_t − H′z_{t|t-1})
• We can also derive a recursion for the forecast-error matrix:
P_{t+1|t} = F[P_{t|t-1} − P_{t|t-1}H(H′P_{t|t-1}H + R)⁻¹H′P_{t|t-1}]F′ + Q
5. Go to step 2, until we reach T. Then we are done.
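The five steps can be collected into one recursion. The sketch below sets x_t = 0 so that A drops out (an assumption of this illustration, not of the notes) and also accumulates the Gaussian log-likelihood used in the estimation section:

```python
import numpy as np

def kalman_filter(y, F, H, Q, R, z0, P0):
    """Kalman filter for y_t = H z_t + w_t, z_t = F z_{t-1} + v_t
    (H here plays the role of H' in the notes; x_t = 0).
    y: (T, k) array. Returns z_{t|t}, P_{t|t}, and the log-likelihood."""
    z_pred, P_pred = z0, P0                       # step 1: z_{1|0}, P_{1|0}
    z_filt, P_filt, loglik = [], [], 0.0
    for t in range(len(y)):
        # step 2: forecast y_t; its MSE is S = H P_{t|t-1} H' + R
        err = y[t] - H @ z_pred
        S = H @ P_pred @ H.T + R
        k = S.shape[0]
        loglik += -0.5 * (k * np.log(2 * np.pi)
                          + np.log(np.linalg.det(S))
                          + err @ np.linalg.solve(S, err))
        # step 3: update z_{t|t} with gain K = P_{t|t-1} H' S^{-1}
        K = P_pred @ H.T @ np.linalg.inv(S)
        z_up = z_pred + K @ err
        P_up = P_pred - K @ H @ P_pred
        z_filt.append(z_up)
        P_filt.append(P_up)
        # step 4: forecast z_{t+1|t} and P_{t+1|t}
        z_pred = F @ z_up
        P_pred = F @ P_up @ F.T + Q
    return np.array(z_filt), np.array(P_filt), loglik
```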
• Summary: the Kalman filter produces
• the optimal forecasts z_{t+1|t} and y_{t+1|t} (optimal within the class of linear forecasts).
• We need some initialization assumptions.
• We need to know the parameters of the system, i.e., A, H, F, Q, R.
• Now, we need to find a way to estimate the parameters A, H, F, Q, R.
• By far, the most popular method is MLE.
• Aside: simulation methods allow getting away from the restrictive assumptions on ε_t.
2.2 Estimation of Kalman Filters (MLE)
• Suppose that z₁ and the shocks (w_t, v_t) are jointly normally distributed.
• Under such an assumption, we can make the very strong claim that the forecasts z_{t+1|t} and y_{t+1|t} are optimal among any functions of x_t, y_{t-1}, .... In other words, if we have normal errors, we cannot produce better forecasts using the past data than the Kalman forecasts!
• If the errors are normal, then all variables in the linear system have a normal distribution.
• More specifically, the distribution of y_t conditional on x_t and y_{t-1}, ... is normal, or
y_t|x_t, y_{t-1}, ... ~ N(A′x_t + H′z_{t|t-1}, H′P_{t|t-1}H + R)
• Therefore, we can specify the likelihood function of y_t|x_t, y_{t-1} as we did above:
f_{y_t|x_t,y_{t-1}} = (2π)^{-n/2} |H′P_{t|t-1}H + R|^{-1/2} exp[ −(1/2)(y_t − A′x_t − H′z_{t|t-1})′(H′P_{t|t-1}H + R)⁻¹(y_t − A′x_t − H′z_{t|t-1}) ]
• The problem is to maximize
max_{A,H,F,Q,R} Σ_{t=1}^T log f_{y_t|x_t,y_{t-1}}
• Words of wisdom:
• This maximization problem can easily become unmanageable, even using modern computers. The problem is that searching for a global max is very tricky.
• A possible solution is to impose as many restrictions as possible and then relax them one by one.
• A second solution is to write a model that gives theoretical restrictions.
• Recall that there is more than one state-space representation of an AR process. This implies that some of the parameters in the state-space system are not identified. In other words, more than one value of the parameters (different combinations) can give rise to the same likelihood function.
• Then, which likelihood do we choose?
• We have to impose restrictions so that we have an exactly identified problem.
2.3 Applications in Finance
• Anytime we have unobservable state variables:
• Filtering expected returns (Pastor and Stambaugh (JF, 2008))
• Filtering variance (Brandt and Kang (JFE, 2007))
• Interpolation of data: Bernanke and Kuttner (JME?)
• Time-varying parameters: time-varying betas (Ghysels (JF, 1998))
3 Kalman Smoother
• For purely forecasting purposes, we need
z_{t|t-1} = E(z_t|I_{t-1})
where I_{t-1} = {y_{t-1}, y_{t-2}, ..., y₁, x_{t-1}, ..., x₁} and the corresponding error P_{t|t-1} = E[(z_t − z_{t|t-1})²].
• But if we want to model a process (understand its properties), we might want to incorporate all the available information in I_T = {y_T, y_{T-1}, ..., y₁, x_T, ..., x₁}.
• In other words, we might want to estimate
z_{t|T} = E(z_t|I_T)
There is definitely a look-ahead bias here, but that is the point. We want to include all available information in order to get a better glimpse into the properties of z_t!
• Recall that from the KF we have the sequences {z_{t+1|t}}, {z_{t|t}}, {P_{t+1|t}}, {P_{t|t}}.
• Suppose someone tells you the correct value of z_{t+1} at time t. How can you improve upon the best forecast z_{t|t}? It turns out that we do the same updating as in step 3 of the KF:
E(z_t|z_{t+1}, I_t) = z_{t|t} + E[(z_t − z_{t|t})(z_{t+1} − z_{t+1|t})′] · {E[(z_{t+1} − z_{t+1|t})(z_{t+1} − z_{t+1|t})′]}⁻¹ (z_{t+1} − z_{t+1|t})
• We can write this a bit more intuitively as
E(z_t|z_{t+1}, I_t) = z_{t|t} + J_t(z_{t+1} − z_{t+1|t})
where
J_t = E[(z_t − z_{t|t})(z_{t+1} − z_{t+1|t})′] · {E[(z_{t+1} − z_{t+1|t})(z_{t+1} − z_{t+1|t})′]}⁻¹ = P_{t|t}F′P⁻¹_{t+1|t}
• Because the process is Markovian, E(z_t|z_{t+1}, I_t) = E(z_t|z_{t+1}, I_T). We can't do better than that! Hence,
E(z_t|z_{t+1}, I_T) = z_{t|t} + J_t(z_{t+1} − z_{t+1|t})
• Last step. We can show that
E(z_t|I_T) = z_{t|T} = z_{t|t} + J_t(z_{t+1|T} − z_{t+1|t})
• Hence, the KS algorithm is, after we obtain the KF sequences {z_{t+1|t}}, {z_{t|t}}, {P_{t+1|t}}, {P_{t|t}}:
1. Start at the end, z_{T|T}.
2. Compute J_{T-1} = P_{T-1|T-1}F′P⁻¹_{T|T-1}.
3. Compute z_{T-1|T} = z_{T-1|T-1} + J_{T-1}(z_{T|T} − z_{T|T-1}).
4. Use z_{T-1|T} to compute z_{T-2|T}, and so on.
5. We can compute the associated MSE as
P_{t|T} = P_{t|t} + J_t(P_{t+1|T} − P_{t+1|t})J′_t
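The backward pass can be sketched as follows (an illustrative implementation; the filtered and predicted arrays come from a Kalman-filter run, and J_t = P_{t|t}F′P⁻¹_{t+1|t}):

```python
import numpy as np

def kalman_smoother(z_filt, P_filt, z_pred, P_pred, F):
    """z_filt[t], P_filt[t] hold z_{t|t}, P_{t|t};
    z_pred[t], P_pred[t] hold z_{t+1|t}, P_{t+1|t}.
    Returns the full-sample estimates z_{t|T} and P_{t|T}."""
    n = len(z_filt)
    z_sm = z_filt.copy()                 # z_{T|T} starts the recursion
    P_sm = P_filt.copy()
    for t in range(n - 2, -1, -1):
        J = P_filt[t] @ F.T @ np.linalg.inv(P_pred[t])   # smoother gain J_t
        z_sm[t] = z_filt[t] + J @ (z_sm[t + 1] - z_pred[t])
        P_sm[t] = P_filt[t] + J @ (P_sm[t + 1] - P_pred[t]) @ J.T
    return z_sm, P_sm
```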
4 Time-Varying Parameters
• An example of a time-varying-parameter model:
r_{t+1} = α + β_t x_t + ε_{t+1}
β_{t+1} = β_t + v_{t+1}
• Q: Which equation is the observation equation and which is the state equation?
• Note that this does not fit within the KF setup:
y_t = A′x_t + H′z_t + w_t
z_t = F z_{t-1} + v_t
• We need the generalization
y_t = A(x_t) + H(x_t)′z_t + w_t
z_{t+1} = F(x_t)z_t + v_{t+1}
• Note that it is F(x_t), and not F(x_{t+1}), in the state equation!
• Now we have to assume (we didn't have to do it earlier!) that
(w_t ; v_{t+1}) | x_t, I_{t-1} ~ N( (0 ; 0), [R(x_t) 0 ; 0 Q(x_t)] )
• Before, we had linearity in all variables. Now we don't.
• Given the conditional normal assumption, we can show that
(z_t ; y_t) | x_t, I_{t-1} ~ N( (z_{t|t-1} ; A(x_t) + H(x_t)′z_{t|t-1}), V )
V = [ P_{t|t-1}  P_{t|t-1}H(x_t) ; H(x_t)′P_{t|t-1}  H(x_t)′P_{t|t-1}H(x_t) + R(x_t) ]
where {z_{t|t-1}}, {z_{t|t}}, {P_{t|t-1}}, {P_{t|t}} are obtained from the KF procedure above.
• Notice that, conditional on x_t, the time-varying parameters are fixed.
• Estimation is easy (MLE), given the assumption.
• TVP example:
r_t = λ_t β_t + w_t
β_{t+1} − β̄ = F(β_t − β̄) + v_{t+1}
We have a CAPM with time-varying βs in mind (λ_t being the observable factor return).
• If we assume that
(w_t ; v_{t+1}) | x_t, I_{t-1} ~ N( (0 ; 0), [σ² 0 ; 0 Q] )
then we are within the KF framework.
• Substituting the state variable z_t = (β_t − β̄) into the space equation, we can write
r_t = λ_t β̄ + λ_t z_t + w_t
• We can plug in the MLE estimator directly.
• Note: we can allow AR(p) dynamics in the state equation quite easily.
• Example: Wells, C., The Kalman Filter in Finance, Springer Netherlands.
• Example: Ludvigson and Ng (JFE, 2007)
m_{t+1} = a′F_t + b′Z_t + ε_{t+1}
VOL_{t+1} = c′F_t + d′Z_t + u_{t+1}
where VOL_{t+1} is the realized volatility in month t+1 (observable).
5 Brandt and Kang (JFE, 2004):
r_{t+1} = μ_t + σ_t u_{t+1}
(ln μ_t ; ln σ_t) = d + A (ln μ_{t-1} ; ln σ_{t-1}) + ε_t
The Delta Method
• We estimate y = βx + ε and obtain β̂, but we are interested in a function g(β), where g(·) is some non-linear function.
• Example: we have a forecast of the volatility, σ̂_t, and want to test its economic significance.
• Statistical measure of fit: MSE = E{(σ̂_t − σ_t)²}
• Economic measure of fit: C(σ̂_t, S_t, K, r, T), compared to C(σ_t, S_t, K, r, T), where C(·) is the BS call-option formula.
• We want to know whether C(σ̂_t, S_t, K, r, T) − C(σ_t, S_t, K, r, T) is economically and statistically different from zero.
• The Delta method
• If we have a consistent, asymptotically normal estimator
√T(θ̂ − θ) →d N(0, V)
and g(·) is differentiable, then
√T(g(θ̂) − g(θ)) →d N(0, D′VD)
D = ∂g/∂θ |_θ
• Sketch of the proof: from the mean-value theorem, we can write
g(θ̂) = g(θ) + ∂g′/∂θ |_{θ_M} (θ̂ − θ)
where θ_M lies between θ̂ and θ. Since θ̂ →p θ, then θ_M →p θ and ∂g′/∂θ |_{θ_M} →p ∂g′/∂θ |_θ (continuous mapping theorem).
Then, we can write
√T(g(θ̂) − g(θ)) = ∂g′/∂θ |_{θ_M} · √T(θ̂ − θ)
• ∂g′/∂θ |_{θ_M} →p ∂g′/∂θ |_θ
• √T(θ̂ − θ) →d N(0, V)
• Slutsky theorem: ∂g′/∂θ |_{θ_M} · √T(θ̂ − θ) →d [∂g′/∂θ |_θ] N(0, V)
• Or
√T(g(θ̂) − g(θ)) →d N(0, (∂g′/∂θ |_θ) V (∂g/∂θ |_θ))
• Example: we run the regression (s.e. in parentheses)
y_t = α + βx_t + ε_t = 0.1 + 1.1 x_t + ε_t
                      (0.04)  (0.3)
• A test of β = 1 yields t = (1.1 − 1)/0.3 = 0.33.
• We are interested in ln(β̂) and testing under the null ln(β) = 0. From the delta method, we know that
√T(ln(β̂) − ln(β)) →d N(0, D²V)
where D = 1/1.1 = 0.91 and V = 0.3² = 0.09, or
√T(0.095 − 0) →d N(0, 0.91² × 0.09)
and a test of ln(β) = 0 gives t = 0.095/0.273 ≈ 0.35, close to the t-statistic for β = 1.
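Redoing the arithmetic of this example (note that D²V = (1/1.1)² × 0.09 gives a delta-method standard error of about 0.273):

```python
import numpy as np

beta_hat, se = 1.1, 0.3

t_beta = (beta_hat - 1) / se          # direct t-test of beta = 1
D = 1 / beta_hat                      # dg/dbeta at beta_hat for g(beta) = ln(beta)
se_log = abs(D) * se                  # delta-method standard error of ln(beta_hat)
t_log = np.log(beta_hat) / se_log     # t-test of ln(beta) = 0
```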
6 Empirical Portfolio Choice: Mean-Variance Implementation
• The solution to the mean-variance problem
min_x var(r_{p,t+1}) = x′Σx  s.t.  E(r_p) = x′μ = μ*
is
x* = (μ*/(μ′Σ⁻¹μ)) Σ⁻¹μ = λΣ⁻¹μ
• Now we have to rely on econometrics to implement the solution.
• Two-step approach:
• Solve the model
• Estimate the parameters and plug them in!
• PLUG-IN APPROACH:
• We continue with the assumption that returns are i.i.d.
• Then we can estimate
μ̂ = (1/T) Σ_{t=1}^T r_{t+1}
Σ̂ = (1/(T − N − 2)) Σ_{t=1}^T (r_{t+1} − μ̂)(r_{t+1} − μ̂)′
• We plug the estimates into the optimal solution:
x̂* = (1/γ) Σ̂⁻¹μ̂
• Under the normality assumption, this estimator is unbiased, or
E(x̂*) = (1/γ) E(Σ̂⁻¹) E(μ̂)
• In the univariate case, we can show by the delta method that
Var(x̂*) = (1/γ²)(μ/σ²)² [ var(μ̂)/μ² + var(σ̂²)/σ⁴ ]
• Example
• Suppose we have 10 years of monthly data, or T = 120.
• Suppose we have a stock with μ = 0.06 and σ = 0.15.
• Suppose that γ = 5.
• Note that
x̂* = (1/γ)(μ̂/σ̂²) = 0.06/(5 × 0.15²) = 0.533
• Very close to the usual 60/40 advice of financial advisors!
• With i.i.d. returns, the standard errors of the mean and variance are
se(μ̂) = σ/√T = 0.15/√120 = 0.014
se(σ̂²) = √2·σ²/√T = √2 × 0.15²/√120 = 0.003
• Plugging all of these into the formula for Var(x̂*), we obtain a standard error of
se(x̂*) = 0.14
• We can test hypotheses as with every other parameter of interest.
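The example's numbers can be reproduced directly (the 0.14 reported above is the standard error of x̂*):

```python
import numpy as np

mu, sigma, gamma, T = 0.06, 0.15, 5, 120

x_star = mu / (gamma * sigma ** 2)      # plug-in weight, about 0.533

var_mu = sigma ** 2 / T                 # sampling variance of the sample mean
var_s2 = 2 * sigma ** 4 / T             # sampling variance of the sample variance
var_x = x_star ** 2 * (var_mu / mu ** 2 + var_s2 / sigma ** 4)
se_x = np.sqrt(var_x)                   # delta-method standard error of x*
```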
• Estimating Σ is very problematic:
• Many parameters to estimate.
• Suppose we have 500 assets in the portfolio. We have 125,250 unique elements to estimate.
• In general, for N assets, we have N(N + 1)/2 unique elements to estimate!
• We need Σ⁻¹: small estimation errors in Σ̂ result in a very different Σ̂⁻¹.
• Solution: shrink the matrix
Σ̂_s = δS + (1 − δ)Σ̂
where
δ ≈ (1/T)(A − B)/C
A = Σ_i Σ_j asy var(√T σ̂_{i,j})
B = Σ_i Σ_j asy cov(√T σ̂_{i,j}, √T s_{i,j})
C = Σ_i Σ_j (σ̂_{i,j} − s_{i,j})²
where S (with elements s_{i,j}) is often taken to be I. For more discussion, see Ledoit and Wolf (2003).
• We can also shrink the weights directly:
x_s = δx₀ + (1 − δ)x*
• This approach is often used in applied work.
• Problem with shrinkage: it is ad hoc. There is no economic justification for it or for δ.
• Bayesian framework
• Economic constraints [Jagannathan and Ma (JF, 2003)]
• Another solution: factor models. For stock i,
r_{i,t} = α_i + β_i f_{m,t} + ε_{i,t}
• We can take variances to show that
Σ_r = σ²_m ββ′ + Σ_ε
where β is the vector of betas and Σ_ε is a diagonal matrix with diagonal elements the variances of ε_{i,t}.
• Now the problem is reduced significantly!
• What about time variation in β and Σ?
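A sketch of the dimension reduction (the loadings and variances below are made-up numbers, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 500
beta = rng.normal(1.0, 0.3, size=N)        # hypothetical factor loadings
sigma_m2 = 0.05 ** 2                       # hypothetical market variance
idio = rng.uniform(0.01, 0.05, size=N)     # hypothetical idiosyncratic variances

# Sigma_r = sigma_m^2 * beta beta' + diagonal idiosyncratic matrix
Sigma_r = sigma_m2 * np.outer(beta, beta) + np.diag(idio)

n_unrestricted = N * (N + 1) // 2          # free elements in an unrestricted Sigma
n_factor = 2 * N + 1                       # betas, idio variances, sigma_m^2
```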
7 Wold Decomposition: Stationary Processes
• Q: Isn't the AR(1) (or ARMA(p,q)) model restrictive?
• No, because of the Wold decomposition result.
Wold's (1938) Theorem: any zero-mean covariance stationary process Y_t can be represented in the form
Y_t = Σ_{j=0}^∞ ψ_j ε_{t-j} + κ_t
where ψ₀ = 1 and Σ_{j=0}^∞ ψ²_j < ∞ (square summable). The term ε_t is white noise and represents the linear projection error of Y_t on lagged Y_t's:
ε_t = Y_t − E(Y_t|Y_{t-1}, Y_{t-2}, ...).
The term κ_t is uncorrelated with ε_{t-j} for any j and is a purely deterministic term.
• Can we estimate all the ψ_j in the Wold decomposition?
• The stationary (|φ| < 1) AR(1) model can be written as
Y_t = φY_{t-1} + ε_t
(1 − φL)Y_t = ε_t
Y_t = (1 − φL)⁻¹ε_t = Σ_{j=0}^∞ φ^j ε_{t-j}
or ψ_j = φ^j. This is the restriction for the AR(1) model.
• The stationary ARMA(1,1) model can be written as
Y_t = φY_{t-1} + ε_t + θε_{t-1}
(1 − φL)Y_t = (1 + θL)ε_t
Y_t = ε_t/(1 − φL) + θε_{t-1}/(1 − φL) = ε_t + Σ_{j=1}^∞ φ^{j-1}(φ + θ)ε_{t-j}
or ψ_j = φ^{j-1}(φ + θ).
• And so on.
• Another interesting process: fractional differencing
Y_t = (1 − L)^{-d} ε_t
• where d is a number between 0 and 0.5.
• It can be shown (Granger and Joyeux (1980), and Hosking (1981)) that
Y_t = Σ_{j=0}^∞ ψ_j ε_{t-j}
ψ_j = (1/j!)(d + j − 1)(d + j − 2)(d + j − 3)...(d + 1)d
ψ_j ≈ (j + 1)^{d-1} for large j
• Plot of ψ_j for the fractional model with d = 0.25 and for the AR(1) with φ = 0.5 and φ = 0.95:
[Figure: ψ_j plotted against lags j = 0, ..., 100 (vertical axis 0 to 1), with curves for d = 0.25, φ = 0.5, and φ = 0.95.]
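The ψ_j sequences behind this plot are easy to compute (a sketch; note that the large-j approximation in the text holds up to a constant of proportionality):

```python
import numpy as np

def frac_psi(d, J):
    """Wold coefficients of (1-L)^{-d}: psi_0 = 1 and
    psi_j = psi_{j-1} * (d + j - 1) / j, which is the recursive
    form of the product formula (1/j!)(d+j-1)...(d+1)d."""
    psi = np.empty(J)
    psi[0] = 1.0
    for j in range(1, J):
        psi[j] = psi[j - 1] * (d + j - 1) / j
    return psi

psi_frac = frac_psi(0.25, 100)
psi_ar = 0.5 ** np.arange(100)     # AR(1) with phi = 0.5: psi_j = phi^j

# hyperbolic vs geometric decay: far out in the lags, the
# fractional coefficients dominate even though both start at 1
```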
• There is a similar representation in the spectral domain.
Spectral Representation Theorem [e.g., Cramer and Leadbetter (1967)]: any covariance stationary process Y_t with absolutely summable autocovariances can be represented as
Y_t = μ + ∫₀^π [α(ω)cos(ωt) + δ(ω)sin(ωt)] dω
where α(·) and δ(·) are zero-mean random variables for any fixed frequency ω ∈ [0, π]. Also, for any frequencies 0 < ω₁ < ω₂ < ω₃ < ω₄ < π, ∫_{ω₁}^{ω₂} α(ω)dω is uncorrelated with ∫_{ω₃}^{ω₄} α(ω)dω, and ∫_{ω₁}^{ω₂} δ(ω)dω is uncorrelated with ∫_{ω₃}^{ω₄} δ(ω)dω.
• A different (but equivalent) way of looking at a time series.