TRANSCRIPT
Today's Agenda
1. MLE (Simple Introduction), GARCH estimation
2. Kalman Filtering
3. The Delta Method
4. Empirical Portfolio Choice
5. Wold Decomposition of Stationary Processes
1 Maximum Likelihood Estimation
(Preliminaries for GARCH/Stochastic Volatility & Kalman Filtering)
• Suppose we have the series {Y₁, Y₂, ..., Y_T} with a joint density f_{Y_T,...,Y_1}(θ) that depends on some parameters θ (such as means, variances, etc.)
• We observe a realization of Y_t.
• If we make some functional assumptions on f, we can think of f as the probability of having observed this particular sample, given the parameters θ.
• The maximum likelihood estimate (MLE) of θ is the value of the parameters θ for which this sample is most likely to have been observed.
• In other words, θ̂_MLE is the value that maximizes f_{Y_T,...,Y_1}(θ).
• Q: But how do we know what f, the true density of the data, is?
• A: We don't.
• Usually, we assume that f is normal, but this is strictly for simplicity. The fact that we have to make distributional assumptions limits the use of MLE in many financial applications.
• Recall that if the Y_t are independent over time, then
f_{Y_T,...,Y_1}(θ) = f_{Y_T}(θ_T) f_{Y_{T-1}}(θ_{T-1}) ... f_{Y_1}(θ_1) = Π_{i=1}^T f_{Y_i}(θ_i)
• Sometimes it is more convenient to take the log of the likelihood function; then
ℒ(θ) = log f_{Y_T,...,Y_1}(θ) = Σ_{i=1}^T log f_{Y_i}(θ)
• However, in most time series applications, the independence assumption is untenable. Instead, we use a conditioning trick.
• Recall that f_{Y_2,Y_1} = f_{Y_2|Y_1} f_{Y_1}
• In a similar fashion, we can write
f_{Y_T,...,Y_1}(θ) = f_{Y_T|Y_{T-1},...,Y_1}(θ) f_{Y_{T-1}|Y_{T-2},...,Y_1}(θ) ... f_{Y_1}(θ)
• The log likelihood can be expressed as
ℒ(θ) = log f_{Y_T,...,Y_1}(θ) = Σ_{i=1}^T log f_{Y_i|Y_{i-1},...,Y_1}(θ_i)
• Example: the log-likelihood of an AR(1) process
Y_t = c + φY_{t-1} + ε_t
• Suppose that ε_t is iid N(0, σ²).
• Recall that E(Y_t) = c/(1−φ) and Var(Y_t) = σ²/(1−φ²).
• Since Y_t is a linear function of the ε_t's, it is also Normal (a sum of normals is normal).
• Therefore, the unconditional density of Y_t is Normal.
• Result: if Y₁ and Y₂ are jointly Normal, then the conditional (and marginal) densities are also normal.
• Therefore, f_{Y_2|Y_1} is N(c + φy₁, σ²), or
f_{Y_2|Y_1} = (1/√(2πσ²)) exp[ −(y₂ − c − φy₁)² / (2σ²) ]
• Similarly, f_{Y_3|Y_2} is N(c + φy₂, σ²), or
f_{Y_3|Y_2} = (1/√(2πσ²)) exp[ −(y₃ − c − φy₂)² / (2σ²) ]
Then, the log likelihood can be written as
ℒ(θ) = log f_{Y_1} + Σ_{t=2}^T log f_{Y_t|Y_{t-1}}
= −(1/2) log(2π) − (1/2) log(σ²/(1−φ²)) − {y₁ − c/(1−φ)}² / (2σ²/(1−φ²))
  − ((T−1)/2) log(2π) − ((T−1)/2) log(σ²) − Σ_{t=2}^T (y_t − c − φy_{t-1})² / (2σ²)
• The unknown parameters are collected in θ = (c, φ, σ).
• We can maximize ℒ(θ) with respect to all those parameters and find the estimates that maximize the probability of having observed such a sample:
max_θ ℒ(θ)
• Sometimes, we can even impose constraints (such as |φ| < 1).
• Q: Is it necessary to impose the constraint σ² > 0?
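As a concrete sketch (not part of the original notes), the exact AR(1) log-likelihood above can be evaluated directly; the function below assumes θ = (c, φ, σ²):

```python
import numpy as np

def ar1_exact_loglik(theta, y):
    """Exact AR(1) log-likelihood: the first observation contributes its
    unconditional N(c/(1-phi), sigma2/(1-phi^2)) density, the remaining
    observations their conditional N(c + phi*y_{t-1}, sigma2) densities."""
    c, phi, sigma2 = theta
    T = len(y)
    m0 = c / (1.0 - phi)                    # unconditional mean E(Y_t)
    v0 = sigma2 / (1.0 - phi ** 2)          # unconditional variance Var(Y_t)
    ll = -0.5 * np.log(2 * np.pi * v0) - (y[0] - m0) ** 2 / (2 * v0)
    e = y[1:] - c - phi * y[:-1]            # conditional forecast errors
    ll += -0.5 * (T - 1) * np.log(2 * np.pi * sigma2) - np.sum(e ** 2) / (2 * sigma2)
    return ll
```

Handing this function to a numerical optimizer, with |φ| < 1 and σ² > 0 imposed, gives the MLE.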
• Note: if we drop the first observation, then we can write (setting c = 0) the FOC:
−Σ_{t=2}^T ∂/∂φ [ (y_t − φy_{t-1})² / (2σ²) ] = 0
Σ_{t=2}^T y_{t-1}(y_t − φy_{t-1}) = 0
φ̂ = Σ_{t=2}^T y_{t-1}y_t / Σ_{t=2}^T y²_{t-1}
• RESULT: in the univariate linear regression case, OLS, GMM, and MLE are equivalent!
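A quick numerical check of this equivalence (a sketch with simulated data, not from the notes): the closed-form φ̂ from the FOC coincides with the no-intercept OLS slope of y_t on y_{t-1}.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
y = np.zeros(T)
for t in range(1, T):                  # simulate a stationary AR(1) with c = 0
    y[t] = 0.6 * y[t - 1] + rng.normal()

# conditional-MLE first-order condition: phi_hat = sum y_{t-1}y_t / sum y_{t-1}^2
phi_mle = np.sum(y[:-1] * y[1:]) / np.sum(y[:-1] ** 2)

# OLS of y_t on y_{t-1} without an intercept gives the identical number
phi_ols = np.linalg.lstsq(y[:-1, None], y[1:], rcond=None)[0][0]
```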
• To summarize the maximum likelihood principle:
(a) Make a distributional assumption about the data
(b) Use conditioning to write the joint likelihood function
(c) For convenience, work with the log-likelihood function
(d) Maximize the likelihood function with respect to the parameters
• There are some subtle points.
• We had to specify the unconditional distribution of the first observation.
• We had to make an assumption about the dependence in the series.
• But sometimes, MLE is the only way to go.
• MLE is particularly appealing if we know the distribution of the series. Most other deficiencies can be circumvented.
• Now, you will ask: what are the properties of θ̂_MLE? More specifically, is it consistent? What is its distribution, where
θ̂_MLE = argmax_θ ℒ(θ)
• Yes, θ̂_MLE is a consistent estimator of θ.
• As you probably expect, the asymptotic distribution of θ̂_MLE is normal.
• Result:
T^{1/2}(θ̂_MLE − θ) →a N(0, V)
V = [ −∂²ℒ(θ)/∂θ∂θ′ |_{θ̂_MLE} ]⁻¹
or
V = [ Σ_{t=1}^T l_t(θ̂_MLE; y) l_t(θ̂_MLE; y)′ ]⁻¹
where l_t(θ̂_MLE; y) = ∂ log f_t/∂θ (θ̂_MLE; y) is the score of observation t.
• But we will not dwell on proving those properties.
Another Example: the log-likelihood of an AR(1)+ARCH(1) process
Y_t = c + φY_{t-1} + u_t
• where u_t = √(h_t)·v_t
• ARCH(1) is: h_t = ω + a·u²_{t-1}
where v_t is iid with mean 0 and E(v²_t) = 1.
• GARCH(1,1): suppose we specify h_t as h_t = ω + β·h_{t-1} + a·u²_{t-1}
• Recall that E(Y_t) = c/(1−φ) and Var(Y_t) = σ²/(1−φ²).
• Since Y_t is a linear function of the ε_t's, it is also Normal (a sum of normals is normal).
• Therefore, the unconditional density of Y_t is Normal.
• Result: if Y₁ and Y₂ are jointly Normal, then the conditional (and marginal) densities are also normal.
• Therefore, f_{Y_2|Y_1} is N(c + φy₁, h₂), or for the ARCH(1)
f_{Y_2|Y_1} = (1/√(2π(ω + a·u₁²))) exp[ −(y₂ − c − φy₁)² / (2(ω + a·u₁²)) ]
• Similarly, f_{Y_3|Y_2} is N(c + φy₂, h₃), or
f_{Y_3|Y_2} = (1/√(2π(ω + a·u₂²))) exp[ −(y₃ − c − φy₂)² / (2(ω + a·u₂²)) ]
Then, the conditional log likelihood can be written as
ℒ(θ|y₁) = Σ_{t=2}^T log f_{Y_t|Y_{t-1}}
= −((T−1)/2) log(2π) − (1/2) Σ_{t=2}^T log(ω + a·u²_{t-1}) − Σ_{t=2}^T (y_t − c − φy_{t-1})² / (2(ω + a·u²_{t-1}))
• The unknown parameters are collected in θ = (c, φ, ω, a).
• We can maximize ℒ(θ) with respect to all those parameters and find the estimates that maximize the probability of having observed such a sample:
max_θ ℒ(θ)
• Example: mle_arch.m
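mle_arch.m is not reproduced here; the Python sketch below shows how the conditional log-likelihood above could be coded (handling of the initial condition is a simplifying assumption: the sum starts where a lagged residual is available).

```python
import numpy as np

def arch1_loglik(theta, y):
    """AR(1)+ARCH(1) conditional log-likelihood, theta = (c, phi, omega, a).
    Residuals u_t = y_t - c - phi*y_{t-1}; variances h_t = omega + a*u_{t-1}^2."""
    c, phi, omega, a = theta
    u = y[1:] - c - phi * y[:-1]        # u_2, ..., u_T
    h = omega + a * u[:-1] ** 2         # h_3, ..., h_T
    resid = u[1:]
    return np.sum(-0.5 * np.log(2 * np.pi * h) - resid ** 2 / (2 * h))
```

Maximizing this over (c, φ, ω, a), with ω > 0 and a ≥ 0 imposed, is the estimation step.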
• Similarly for GARCH(1,1):
ℒ(θ|y₁) = Σ_{t=2}^T log f_{Y_t|Y_{t-1}}
= −((T−1)/2) log(2π) − (1/2) Σ_{t=2}^T log(h_t) − Σ_{t=2}^T (y_t − c − φy_{t-1})² / (2h_t)
where h_t = ω + β·h_{t-1} + a·u²_{t-1}
• To construct h_t, we have to filter the {u_{t-1}} series.
• For given u_t's, h₀, and ω, β, and a, we construct h_t.
• The h_t will allow us to evaluate the likelihood ℒ(θ|y₁).
• Optimize ℒ(θ|y₁) with respect to all the parameters, given the initial conditions.
• This recursive feature of the GARCH makes it harder to estimate with GMM.
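The filtering step can be sketched as follows (a minimal illustration; h₀ is an initial condition supplied by the user):

```python
import numpy as np

def garch_filter(u, omega, beta, a, h0):
    """Build h_t = omega + beta*h_{t-1} + a*u_{t-1}^2 recursively
    from the residual series u, starting at h0."""
    h = np.empty(len(u))
    h[0] = h0
    for t in range(1, len(u)):
        h[t] = omega + beta * h[t - 1] + a * u[t - 1] ** 2
    return h
```

The resulting {h_t} feed the log-likelihood above; every trial value of (ω, β, a) requires re-running the filter, which is the recursive feature that complicates GMM estimation.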
2 Kalman Filtering
• History: Kalman (1963) paper
• Problem: we have a missile that we want to guide to its proper target.
• The trajectory of the missile IS observable from the control center.
• Most other circumstances, such as weather conditions, possible interception methods, etc., are NOT observable, but can be forecast.
• We want to guide the missile to its proper destination.
• In finance the setup is very similar, but the problem is different.
• In the missile case, the parameters of the system are known. The interest is, given those parameters, to control the missile to its proper destination.
• In finance, we want to estimate the parameters of the system. We are usually not concerned with a control problem, because there are very few instruments we can use as controls (although there are counter-examples).
2.1 Setup (Hamilton CH 13)
y_t = A′x_t + H′z_t + w_t
z_t = F z_{t-1} + v_t
where
• y_t is the observable variable (think "returns").
• The first equation, the y_t equation, is called the "space" or "observation" equation.
• z_t is the unobservable variable (think "volatility" or "state of the economy").
• The second equation, the z_t equation, is called the "state" equation.
• x_t is a vector of exogenous (or predetermined) variables (we can set x_t = 0 for now).
• v_t and w_t are iid and assumed to be uncorrelated at all lags: E(w_t v′_t) = 0.
• Also E(v_t v′_t) = Q and E(w_t w′_t) = R.
• The system of equations is known as a state-space representation.
• Any time series can be written in a state-space representation.
• In standard engineering problems, it is assumed that we know the parameters A, H, F, Q, R.
• The problem is to give impulses x_t such that, given the states z_t, the missile is guided as closely to target as possible.
• In finance, we want to estimate the unknown parameters A, H, F, Q, R in order to understand where the system is going, given the states z_t. There is little attempt at guiding the system. In fact, we usually assume that x_t = 1 and A = E(Y_t), or even that x_t = 0.
• Note: any time series can be written as a state space.
• Example: AR(2): Y_{t+1} − μ = φ₁(Y_t − μ) + φ₂(Y_{t-1} − μ) + ε_{t+1}
• State equation:
(Y_{t+1} − μ ; Y_t − μ) = [φ₁ φ₂ ; 1 0] (Y_t − μ ; Y_{t-1} − μ) + (ε_{t+1} ; 0)
• Observation equation:
y_t = μ + [1 0] (Y_t − μ ; Y_{t-1} − μ)
• There are other state-space representations of Y_t. Can you write down another one?
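As a sanity check (a sketch, not from the notes), iterating the state equation with μ = 0 reproduces the AR(2) recursion shock for shock:

```python
import numpy as np

phi1, phi2 = 0.5, 0.3
F = np.array([[phi1, phi2],         # state transition matrix
              [1.0,  0.0]])
H = np.array([1.0, 0.0])            # observation picks the first state element

rng = np.random.default_rng(3)
T = 200
eps = rng.normal(size=T)

z = np.zeros(2)                     # state z_t = (Y_t, Y_{t-1})'
y_ss = np.empty(T)
for t in range(T):
    z = F @ z + np.array([eps[t], 0.0])
    y_ss[t] = H @ z

y = np.zeros(T)                     # direct AR(2) simulation, same shocks
for t in range(T):
    y[t] = phi1 * (y[t - 1] if t >= 1 else 0.0) \
         + phi2 * (y[t - 2] if t >= 2 else 0.0) + eps[t]
```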
• As a first step, we will assume that A, H, F, Q, R are known.
• Our goal is to find the best linear forecast of the state (unobserved) vector z_t. Such a forecast is needed in control problems (to take decisions) and in finance (state of the economy, forecasts of unobserved volatility).
• The forecasts will be denoted by
z_{t+1|t} = E(z_{t+1}|y_t, ..., x_t, ...)
and we assume that we are only taking linear projections of z_{t+1} on y_t, ..., x_t, .... Nonlinear Kalman filters exist, but the results are a bit more complicated.
• The Kalman filter calculates the forecasts z_{t+1|t} recursively, starting with z_{1|0}, then z_{2|1}, ..., until z_{T|T-1}.
• Since z_{t|t-1} is a forecast, we can ask how good a forecast it is.
• Therefore, we define
P_{t|t-1} = E[(z_t − z_{t|t-1})(z_t − z_{t|t-1})′],
which is the MSE of the recursive forecast z_{t|t-1}.
• The Kalman filter can be broken down into 5 steps.
1. Initialization of the recursion. We need z_{1|0}. Usually, we take z_{1|0} to be the unconditional mean, or z_{1|0} = E(z₁). (Q: how can we estimate E(z₁)?) The associated error with this forecast is
P_{1|0} = E[(z_{1|0} − z₁)(z_{1|0} − z₁)′]
2. Forecasting y_t (intermediate step). The ultimate goal is to calculate z_{t|t-1}, but we do that recursively. We will first need to forecast the value of y_t based on available information:
E(y_t|x_t, z_t) = A′x_t + H′z_t
From the law of iterated expectations,
E_{t-1}(E_t(y_t)) = E_{t-1}(y_t) = A′x_t + H′z_{t|t-1}
The error from this forecast is
y_t − y_{t|t-1} = H′(z_t − z_{t|t-1}) + w_t
with MSE
E[(y_t − y_{t|t-1})(y_t − y_{t|t-1})′] = E[H′(z_t − z_{t|t-1})(z_t − z_{t|t-1})′H] + E[w_t w′_t]
= H′P_{t|t-1}H + R
3. Updating step (z_{t|t})
• Once we observe y_t, we can update our forecast of z_t, denoting it by z_{t|t}, before making the new forecast z_{t+1|t}.
• We do this by calculating E(z_t|y_t, x_t, ...) = z_{t|t}:
z_{t|t} = z_{t|t-1} + E[(z_t − z_{t|t-1})(y_t − y_{t|t-1})′] · {E[(y_t − y_{t|t-1})(y_t − y_{t|t-1})′]}⁻¹ (y_t − y_{t|t-1})
• We can write this a bit more intuitively as
z_{t|t} = z_{t|t-1} + β(y_t − y_{t|t-1})
where β is the OLS coefficient from regressing (z_t − z_{t|t-1}) on (y_t − y_{t|t-1}).
• The bigger the relationship between the two forecasting errors, the bigger the correction must be.
• It can be shown that
z_{t|t} = z_{t|t-1} + P_{t|t-1}H(H′P_{t|t-1}H + R)⁻¹(y_t − A′x_t − H′z_{t|t-1})
• This updated forecast uses the old forecast z_{t|t-1} and the just-observed values of y_t and x_t.
4. Forecast z_{t+1|t}
• Once we have an update of the old forecast, we can produce a new forecast, the forecast z_{t+1|t}:
E_t(z_{t+1}) = E(z_{t+1}|y_t, x_t, ...) = E(F z_t + v_{t+1}|y_t, x_t, ...) = F·E(z_t|y_t, x_t, ...) + 0 = F z_{t|t}
• We can use the above equation to write
E_t(z_{t+1}) = F{z_{t|t-1} + P_{t|t-1}H(H′P_{t|t-1}H + R)⁻¹(y_t − A′x_t − H′z_{t|t-1})}
= F z_{t|t-1} + F P_{t|t-1}H(H′P_{t|t-1}H + R)⁻¹(y_t − A′x_t − H′z_{t|t-1})
• We can also derive a recursion for the forecast-error matrix:
P_{t+1|t} = F[P_{t|t-1} − P_{t|t-1}H(H′P_{t|t-1}H + R)⁻¹H′P_{t|t-1}]F′ + Q
5. Go to step 2, until we reach T. Then we are done.
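The five steps can be collected into one recursion. The sketch below sets x_t = 0 so that A drops out (an assumption of this illustration, not of the notes) and also accumulates the Gaussian log-likelihood used in the estimation section:

```python
import numpy as np

def kalman_filter(y, F, H, Q, R, z0, P0):
    """Kalman filter for y_t = H z_t + w_t, z_t = F z_{t-1} + v_t
    (H here plays the role of H' in the notes; x_t = 0).
    y: (T, k) array. Returns z_{t|t}, P_{t|t}, and the log-likelihood."""
    z_pred, P_pred = z0, P0                       # step 1: z_{1|0}, P_{1|0}
    z_filt, P_filt, loglik = [], [], 0.0
    for t in range(len(y)):
        # step 2: forecast y_t; its MSE is S = H P_{t|t-1} H' + R
        err = y[t] - H @ z_pred
        S = H @ P_pred @ H.T + R
        k = S.shape[0]
        loglik += -0.5 * (k * np.log(2 * np.pi)
                          + np.log(np.linalg.det(S))
                          + err @ np.linalg.solve(S, err))
        # step 3: update z_{t|t} with gain K = P_{t|t-1} H' S^{-1}
        K = P_pred @ H.T @ np.linalg.inv(S)
        z_up = z_pred + K @ err
        P_up = P_pred - K @ H @ P_pred
        z_filt.append(z_up)
        P_filt.append(P_up)
        # step 4: forecast z_{t+1|t} and P_{t+1|t}
        z_pred = F @ z_up
        P_pred = F @ P_up @ F.T + Q
    return np.array(z_filt), np.array(P_filt), loglik
```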
• Summary: the Kalman filter produces
• the optimal forecasts z_{t+1|t} and y_{t+1|t} (optimal within the class of linear forecasts).
• We need some initialization assumptions.
• We need to know the parameters of the system, i.e., A, H, F, Q, R.
• Now, we need to find a way to estimate the parameters A, H, F, Q, R.
• By far, the most popular method is MLE.
• Aside: simulation methods allow getting away from the restrictive assumptions on ε_t.
2.2 Estimation of Kalman Filters (MLE)
• Suppose that z₁ and the shocks (w_t, v_t) are jointly normally distributed.
• Under such an assumption, we can make the very strong claim that the forecasts z_{t+1|t} and y_{t+1|t} are optimal among any functions of x_t, y_{t-1}, .... In other words, if we have normal errors, we cannot produce better forecasts using the past data than the Kalman forecasts!
• If the errors are normal, then all variables in the linear system have a normal distribution.
• More specifically, the distribution of y_t conditional on x_t and y_{t-1}, ... is normal, or
y_t|x_t, y_{t-1}, ... ~ N(A′x_t + H′z_{t|t-1}, H′P_{t|t-1}H + R)
• Therefore, we can specify the likelihood function of y_t|x_t, y_{t-1} as we did above:
f_{y_t|x_t,y_{t-1}} = (2π)^{-n/2} |H′P_{t|t-1}H + R|^{-1/2} exp[ −(1/2)(y_t − A′x_t − H′z_{t|t-1})′(H′P_{t|t-1}H + R)⁻¹(y_t − A′x_t − H′z_{t|t-1}) ]
• The problem is to maximize
max_{A,H,F,Q,R} Σ_{t=1}^T log f_{y_t|x_t,y_{t-1}}
• Words of wisdom:
• This maximization problem can easily become unmanageable, even using modern computers. The problem is that searching for a global max is very tricky.
• A possible solution is to impose as many restrictions as possible and then relax them one by one.
• A second solution is to write a model that gives theoretical restrictions.
• Recall that there is more than one state-space representation of an AR process. This implies that some of the parameters in the state-space system are not identified. In other words, more than one value of the parameters (different combinations) can give rise to the same likelihood function.
• Then, which likelihood do we choose?
• We have to impose restrictions so that we have an exactly identified problem.
2.3 Applications in Finance
• Anytime we have unobservable state variables:
• Filtering expected returns (Pastor and Stambaugh (JF, 2008))
• Filtering variance (Brandt and Kang (JFE, 2007))
• Interpolation of data: Bernanke and Kuttner (JME?)
• Time-varying parameters: time-varying betas (Ghysels (JF, 1998))
3 Kalman Smoother
• For purely forecasting purposes, we need
z_{t|t-1} = E(z_t|I_{t-1})
where I_{t-1} = {y_{t-1}, y_{t-2}, ..., y₁, x_{t-1}, ..., x₁} and the corresponding error P_{t|t-1} = E[(z_t − z_{t|t-1})²].
• But if we want to model a process (understand its properties), we might want to incorporate all the available information in I_T = {y_T, y_{T-1}, ..., y₁, x_T, ..., x₁}.
• In other words, we might want to estimate
z_{t|T} = E(z_t|I_T)
There is definitely a look-ahead bias here, but that is the point. We want to include all available information in order to get a better glimpse into the properties of z_t!
• Recall that from the KF we have the sequences {z_{t+1|t}}, {z_{t|t}}, {P_{t+1|t}}, {P_{t|t}}.
• Suppose someone tells you the correct value of z_{t+1} at time t. How can you improve upon the best forecast z_{t|t}? It turns out that we do the same updating as in step 3 of the KF:
E(z_t|z_{t+1}, I_t) = z_{t|t} + E[(z_t − z_{t|t})(z_{t+1} − z_{t+1|t})′] · {E[(z_{t+1} − z_{t+1|t})(z_{t+1} − z_{t+1|t})′]}⁻¹ (z_{t+1} − z_{t+1|t})
• We can write this a bit more intuitively as
E(z_t|z_{t+1}, I_t) = z_{t|t} + J_t(z_{t+1} − z_{t+1|t})
where
J_t = E[(z_t − z_{t|t})(z_{t+1} − z_{t+1|t})′] · {E[(z_{t+1} − z_{t+1|t})(z_{t+1} − z_{t+1|t})′]}⁻¹ = P_{t|t}F′P⁻¹_{t+1|t}
• Because the process is Markovian, E(z_t|z_{t+1}, I_t) = E(z_t|z_{t+1}, I_T). We can't do better than that! Hence,
E(z_t|z_{t+1}, I_T) = z_{t|t} + J_t(z_{t+1} − z_{t+1|t})
• Last step. We can show that
E(z_t|I_T) = z_{t|T} = z_{t|t} + J_t(z_{t+1|T} − z_{t+1|t})
• Hence, the KS algorithm is, after we obtain the KF sequences {z_{t+1|t}}, {z_{t|t}}, {P_{t+1|t}}, {P_{t|t}}:
1. Start at the end, z_{T|T}.
2. Compute J_{T-1} = P_{T-1|T-1}F′P⁻¹_{T|T-1}.
3. Compute z_{T-1|T} = z_{T-1|T-1} + J_{T-1}(z_{T|T} − z_{T|T-1}).
4. Use z_{T-1|T} to compute z_{T-2|T}, and so on.
5. We can compute the associated MSE as
P_{t|T} = P_{t|t} + J_t(P_{t+1|T} − P_{t+1|t})J′_t
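The backward pass can be sketched as follows (an illustrative implementation; the filtered and predicted arrays come from a Kalman-filter run, and J_t = P_{t|t}F′P⁻¹_{t+1|t}):

```python
import numpy as np

def kalman_smoother(z_filt, P_filt, z_pred, P_pred, F):
    """z_filt[t], P_filt[t] hold z_{t|t}, P_{t|t};
    z_pred[t], P_pred[t] hold z_{t+1|t}, P_{t+1|t}.
    Returns the full-sample estimates z_{t|T} and P_{t|T}."""
    n = len(z_filt)
    z_sm = z_filt.copy()                 # z_{T|T} starts the recursion
    P_sm = P_filt.copy()
    for t in range(n - 2, -1, -1):
        J = P_filt[t] @ F.T @ np.linalg.inv(P_pred[t])   # smoother gain J_t
        z_sm[t] = z_filt[t] + J @ (z_sm[t + 1] - z_pred[t])
        P_sm[t] = P_filt[t] + J @ (P_sm[t + 1] - P_pred[t]) @ J.T
    return z_sm, P_sm
```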
4 Time-Varying Parameters
• An example of a time-varying-parameter model:
r_{t+1} = α + β_t x_t + ε_{t+1}
β_{t+1} = β_t + v_{t+1}
• Q: Which equation is the observation equation and which is the state equation?
• Note that this does not fit within the KF setup:
y_t = A′x_t + H′z_t + w_t
z_t = F z_{t-1} + v_t
• We need the generalization
y_t = A(x_t) + H(x_t)′z_t + w_t
z_{t+1} = F(x_t)z_t + v_{t+1}
• Note that it is F(x_t), and not F(x_{t+1}), in the state equation!
• Now we have to assume (we didn't have to do it earlier!) that
(w_t ; v_{t+1}) | x_t, I_{t-1} ~ N( (0 ; 0), [R(x_t) 0 ; 0 Q(x_t)] )
• Before, we had linearity in all variables. Now we don't.
• Given the conditional normal assumption, we can show that
(z_t ; y_t) | x_t, I_{t-1} ~ N( (z_{t|t-1} ; A(x_t) + H(x_t)′z_{t|t-1}), V )
V = [ P_{t|t-1}  P_{t|t-1}H(x_t) ; H(x_t)′P_{t|t-1}  H(x_t)′P_{t|t-1}H(x_t) + R(x_t) ]
where {z_{t|t-1}}, {z_{t|t}}, {P_{t|t-1}}, {P_{t|t}} are obtained from the KF procedure above.
• Notice that, conditional on x_t, the time-varying parameters are fixed.
• Estimation is easy (MLE), given the assumption.
• TVP example:
r_t = λ_t β_t + w_t
β_{t+1} − β̄ = F(β_t − β̄) + v_{t+1}
We have a CAPM with time-varying βs in mind (λ_t being the observable factor return).
• If we assume that
(w_t ; v_{t+1}) | x_t, I_{t-1} ~ N( (0 ; 0), [σ² 0 ; 0 Q] )
then we are within the KF framework.
• Substituting the state variable z_t = (β_t − β̄) into the space equation, we can write
r_t = λ_t β̄ + λ_t z_t + w_t
• We can plug in the MLE estimator directly.
• Note: we can allow AR(p) dynamics in the state equation quite easily.
• Example: Wells, C., The Kalman Filter in Finance, Springer Netherlands.
• Example: Ludvigson and Ng (JFE, 2007)
m_{t+1} = a′F_t + b′Z_t + ε_{t+1}
VOL_{t+1} = c′F_t + d′Z_t + u_{t+1}
where VOL_{t+1} is the realized volatility in month t+1 (observable).
5 Brandt and Kang (JFE, 2004):
r_{t+1} = μ_t + σ_t u_{t+1}
(ln μ_t ; ln σ_t) = d + A (ln μ_{t-1} ; ln σ_{t-1}) + ε_t
The Delta Method
• We estimate y = βx + ε and obtain β̂, but we are interested in a function g(β), where g(·) is some non-linear function.
• Example: we have a forecast of the volatility, σ̂_t, and want to test its economic significance.
• Statistical measure of fit: MSE = E{(σ̂_t − σ_t)²}
• Economic measure of fit: C(σ̂_t, S_t, K, r, T), compared to C(σ_t, S_t, K, r, T), where C(·) is the BS call-option formula.
• We want to know whether C(σ̂_t, S_t, K, r, T) − C(σ_t, S_t, K, r, T) is economically and statistically different from zero.
• The Delta method
• If we have a consistent, asymptotically normal estimator
√T(θ̂ − θ) →d N(0, V)
and g(·) is differentiable, then
√T(g(θ̂) − g(θ)) →d N(0, D′VD)
D = ∂g/∂θ |_θ
• Sketch of the proof: from the mean-value theorem, we can write
g(θ̂) = g(θ) + ∂g′/∂θ |_{θ_M} (θ̂ − θ)
where θ_M lies between θ̂ and θ. Since θ̂ →p θ, then θ_M →p θ and ∂g′/∂θ |_{θ_M} →p ∂g′/∂θ |_θ (continuous mapping theorem).
Then, we can write
√T(g(θ̂) − g(θ)) = ∂g′/∂θ |_{θ_M} · √T(θ̂ − θ)
• ∂g′/∂θ |_{θ_M} →p ∂g′/∂θ |_θ
• √T(θ̂ − θ) →d N(0, V)
• Slutsky theorem: ∂g′/∂θ |_{θ_M} · √T(θ̂ − θ) →d [∂g′/∂θ |_θ] N(0, V)
• Or
√T(g(θ̂) − g(θ)) →d N(0, (∂g′/∂θ |_θ) V (∂g/∂θ |_θ))
• Example: we run the regression (s.e. in parentheses)
y_t = α + βx_t + ε_t = 0.1 + 1.1 x_t + ε_t
                      (0.04)  (0.3)
• A test of β = 1 yields t = (1.1 − 1)/0.3 = 0.33.
• We are interested in ln(β̂) and testing under the null ln(β) = 0. From the delta method, we know that
√T(ln(β̂) − ln(β)) →d N(0, D²V)
where D = 1/1.1 = 0.91 and V = 0.3² = 0.09, or
√T(0.095 − 0) →d N(0, 0.91² × 0.09)
and a test of ln(β) = 0 gives t = 0.095/0.273 ≈ 0.35, close to the t-statistic for β = 1.
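Redoing the arithmetic of this example (note that D²V = (1/1.1)² × 0.09 gives a delta-method standard error of about 0.273):

```python
import numpy as np

beta_hat, se = 1.1, 0.3

t_beta = (beta_hat - 1) / se          # direct t-test of beta = 1
D = 1 / beta_hat                      # dg/dbeta at beta_hat for g(beta) = ln(beta)
se_log = abs(D) * se                  # delta-method standard error of ln(beta_hat)
t_log = np.log(beta_hat) / se_log     # t-test of ln(beta) = 0
```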
6 Empirical Portfolio Choice: Mean-Variance Implementation
• The solution to the mean-variance problem
min_x var(r_{p,t+1}) = x′Σx  s.t.  E(r_p) = x′μ = μ*
is
x* = (μ*/(μ′Σ⁻¹μ)) Σ⁻¹μ = λΣ⁻¹μ
• Now we have to rely on econometrics to implement the solution.
• Two-step approach:
• Solve the model
• Estimate the parameters and plug them in!
• PLUG-IN APPROACH:
• We continue with the assumption that returns are i.i.d.
• Then we can estimate
μ̂ = (1/T) Σ_{t=1}^T r_{t+1}
Σ̂ = (1/(T − N − 2)) Σ_{t=1}^T (r_{t+1} − μ̂)(r_{t+1} − μ̂)′
• We plug the estimates into the optimal solution:
x̂* = (1/γ) Σ̂⁻¹μ̂
• Under the normality assumption, this estimator is unbiased, or
E(x̂*) = (1/γ) E(Σ̂⁻¹) E(μ̂)
• In the univariate case, we can show by the delta method that
Var(x̂*) = (1/γ²)(μ/σ²)² [ var(μ̂)/μ² + var(σ̂²)/σ⁴ ]
• Example
• Suppose we have 10 years of monthly data, or T = 120.
• Suppose we have a stock with μ = 0.06 and σ = 0.15.
• Suppose that γ = 5.
• Note that
x̂* = (1/γ)(μ̂/σ̂²) = 0.06/(5 × 0.15²) = 0.533
• Very close to the usual 60/40 advice of financial advisors!
• With i.i.d. returns, the standard errors of the mean and variance are
se(μ̂) = σ/√T = 0.15/√120 = 0.014
se(σ̂²) = √2·σ²/√T = √2 × 0.15²/√120 = 0.003
• Plugging all of these into the formula for Var(x̂*), we obtain a standard error of
se(x̂*) = 0.14
• We can test hypotheses as with every other parameter of interest.
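The example's numbers can be reproduced directly (the 0.14 reported above is the standard error of x̂*):

```python
import numpy as np

mu, sigma, gamma, T = 0.06, 0.15, 5, 120

x_star = mu / (gamma * sigma ** 2)      # plug-in weight, about 0.533

var_mu = sigma ** 2 / T                 # sampling variance of the sample mean
var_s2 = 2 * sigma ** 4 / T             # sampling variance of the sample variance
var_x = x_star ** 2 * (var_mu / mu ** 2 + var_s2 / sigma ** 4)
se_x = np.sqrt(var_x)                   # delta-method standard error of x*
```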
• Estimating Σ is very problematic:
• Many parameters to estimate.
• Suppose we have 500 assets in the portfolio. We have 125,250 unique elements to estimate.
• In general, for N assets, we have N(N + 1)/2 unique elements to estimate!
• We need Σ⁻¹: small estimation errors in Σ̂ result in a very different Σ̂⁻¹.
• Solution: shrink the matrix
Σ̂_s = δS + (1 − δ)Σ̂
where
δ ≈ (1/T)(A − B)/C
A = Σ_i Σ_j asy var(√T σ̂_{i,j})
B = Σ_i Σ_j asy cov(√T σ̂_{i,j}, √T s_{i,j})
C = Σ_i Σ_j (σ̂_{i,j} − s_{i,j})²
where S (with elements s_{i,j}) is often taken to be I. For more discussion, see Ledoit and Wolf (2003).
• We can also shrink the weights directly:
x_s = δx₀ + (1 − δ)x*
• This approach is often used in applied work.
• Problem with shrinkage: it is ad hoc. There is no economic justification for it or for δ.
• Bayesian framework
• Economic constraints [Jagannathan and Ma (JF, 2003)]
• Another solution: factor models. For stock i,
r_{i,t} = α_i + β_i f_{m,t} + ε_{i,t}
• We can take variances to show that
Σ_r = σ²_m ββ′ + Σ_ε
where β is the vector of betas and Σ_ε is a diagonal matrix with diagonal elements the variances of ε_{i,t}.
• Now the problem is reduced significantly!
• What about time variation in β and Σ?
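A sketch of the dimension reduction (the loadings and variances below are made-up numbers, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 500
beta = rng.normal(1.0, 0.3, size=N)        # hypothetical factor loadings
sigma_m2 = 0.05 ** 2                       # hypothetical market variance
idio = rng.uniform(0.01, 0.05, size=N)     # hypothetical idiosyncratic variances

# Sigma_r = sigma_m^2 * beta beta' + diagonal idiosyncratic matrix
Sigma_r = sigma_m2 * np.outer(beta, beta) + np.diag(idio)

n_unrestricted = N * (N + 1) // 2          # free elements in an unrestricted Sigma
n_factor = 2 * N + 1                       # betas, idio variances, sigma_m^2
```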
7 Wold Decomposition: Stationary Processes
• Q: Isn't the AR(1) (or ARMA(p,q)) model restrictive?
• No, because of the Wold decomposition result.
Wold's (1938) Theorem: any zero-mean covariance stationary process Y_t can be represented in the form
Y_t = Σ_{j=0}^∞ ψ_j ε_{t-j} + κ_t
where ψ₀ = 1 and Σ_{j=0}^∞ ψ²_j < ∞ (square summable). The term ε_t is white noise and represents the linear projection error of Y_t on lagged Y_t's:
ε_t = Y_t − E(Y_t|Y_{t-1}, Y_{t-2}, ...).
The term κ_t is uncorrelated with ε_{t-j} for any j and is a purely deterministic term.
• Can we estimate all the ψ_j in the Wold decomposition?
• The stationary (|φ| < 1) AR(1) model can be written as
Y_t = φY_{t-1} + ε_t
(1 − φL)Y_t = ε_t
Y_t = (1 − φL)⁻¹ε_t = Σ_{j=0}^∞ φ^j ε_{t-j}
or ψ_j = φ^j. This is the restriction for the AR(1) model.
• The stationary ARMA(1,1) model can be written as
Y_t = φY_{t-1} + ε_t + θε_{t-1}
(1 − φL)Y_t = (1 + θL)ε_t
Y_t = ε_t/(1 − φL) + θε_{t-1}/(1 − φL) = ε_t + Σ_{j=1}^∞ φ^{j-1}(φ + θ)ε_{t-j}
or ψ_j = φ^{j-1}(φ + θ).
• And so on.
• Another interesting process: fractional differencing
Y_t = (1 − L)^{-d} ε_t
• where d is a number between 0 and 0.5.
• It can be shown (Granger and Joyeux (1980), and Hosking (1981)) that
Y_t = Σ_{j=0}^∞ ψ_j ε_{t-j}
ψ_j = (1/j!)(d + j − 1)(d + j − 2)(d + j − 3)...(d + 1)d
ψ_j ≈ (j + 1)^{d-1} for large j
• Plot of ψ_j for the fractional model with d = 0.25 and for the AR(1) with φ = 0.5 and φ = 0.95:
[Figure: ψ_j plotted against lags j = 0, ..., 100 (vertical axis 0 to 1), with curves for d = 0.25, φ = 0.5, and φ = 0.95.]
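The ψ_j sequences behind this plot are easy to compute (a sketch; note that the large-j approximation in the text holds up to a constant of proportionality):

```python
import numpy as np

def frac_psi(d, J):
    """Wold coefficients of (1-L)^{-d}: psi_0 = 1 and
    psi_j = psi_{j-1} * (d + j - 1) / j, which is the recursive
    form of the product formula (1/j!)(d+j-1)...(d+1)d."""
    psi = np.empty(J)
    psi[0] = 1.0
    for j in range(1, J):
        psi[j] = psi[j - 1] * (d + j - 1) / j
    return psi

psi_frac = frac_psi(0.25, 100)
psi_ar = 0.5 ** np.arange(100)     # AR(1) with phi = 0.5: psi_j = phi^j

# hyperbolic vs geometric decay: far out in the lags, the
# fractional coefficients dominate even though both start at 1
```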
• There is a similar representation in the spectral domain.
Spectral Representation Theorem [e.g., Cramer and Leadbetter (1967)]: any covariance stationary process Y_t with absolutely summable autocovariances can be represented as
Y_t = μ + ∫₀^π [α(ω)cos(ωt) + δ(ω)sin(ωt)] dω
where α(·) and δ(·) are zero-mean random variables for any fixed frequency ω ∈ [0, π]. Also, for any frequencies 0 < ω₁ < ω₂ < ω₃ < ω₄ < π, ∫_{ω₁}^{ω₂} α(ω)dω is uncorrelated with ∫_{ω₃}^{ω₄} α(ω)dω, and ∫_{ω₁}^{ω₂} δ(ω)dω is uncorrelated with ∫_{ω₃}^{ω₄} δ(ω)dω.
• A different (but equivalent) way of looking at a time series.