Tests of Static Asset Pricing Models
• In general, asset pricing models quantify the tradeoff between risk and expected return.
– Need to both measure risk and relate it to the expected return on a risky asset.
• The most commonly used models are:
– CAPM
– APT
– FF three-factor model
Testable Implications
• These models have testable implications. For the CAPM, for example:
– Expected excess return of a risky asset is proportional to the covariance of its return and that of the market portfolio.
• Note, this tells us the measure of risk used and its relation to expected return.
– There are other restrictions that depend upon whether there exists a riskless asset.
Testable Implications
• For the APT,
– The expected excess return on a risky asset is linearly related to the covariance of its return with various risk factors.
– These risk factors are left unspecified by the theory and have been:
• Derived from the data (CR (1983), CK)
• Exogenously imposed (CRR (1985))
Plan
• Review the basic econometric methodology we will use to test these models.
• Review the CAPM.
• Test the CAPM.
– Traditional tests (FM (1972), BJS (1972), Ferson and Harvey)
– ML tests (Gibbons (1982), GRS (1989))
– GMM tests
• Factor models: APT and FF
– Curve fitting vs. ad-hoc theorizing
Econometric Methodology Review
• Maximum Likelihood Estimation
• The Wald Test
• The F Test
• The LM Test
• A specialization to linear models and linear restrictions
– A comparison of test statistics
Review of Maximum Likelihood Estimation
• Let $\{x_1, \ldots, x_T\}$ be a sample of T i.i.d. random variables.
– Call that vector x.
– Let x be continuously distributed with density $f(x|\theta)$.
– Where $\theta$ is the unknown parameter vector that determines the distribution.
The Likelihood Function
• The joint density for the independent random variables is given by:
$$f(x_1|\theta)\, f(x_2|\theta)\, f(x_3|\theta) \cdots f(x_T|\theta)$$
• This joint density is known as the likelihood function, $L(x|\theta)$:
$$L(x|\theta) = f(x_1|\theta)\, f(x_2|\theta)\, f(x_3|\theta) \cdots f(x_T|\theta)$$
• Can you write the joint density and $L(x|\theta)$ this way when dealing with time-dependent observations?
Independence
• You can't.
– The reason you can write the product $f(x_1|\theta)\, f(x_2|\theta)\, f(x_3|\theta) \cdots f(x_T|\theta)$ is because of the independence.
• If you have dependence, writing the joint density can be extremely complicated.
• See, e.g. Hamilton (1994) for a good discussion of switching regression models and the EM algorithm.
Idea Behind Maximum Likelihood Estimation
• Pick the parameter vector estimate, $\hat\theta$, that maximizes the likelihood, $L(x|\theta)$, of observing the particular vector of realizations, x.
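The idea can be illustrated with a minimal sketch, assuming an i.i.d. normal sample with known variance so that the only unknown parameter is the mean. The data values, the grid, and the function name `normal_loglik` are hypothetical choices made for illustration. The sketch evaluates the log-likelihood on a grid and picks the maximizer, which should land on the sample mean.

```python
import math

def normal_loglik(sample, mu, sigma2=1.0):
    """Log-likelihood of an i.i.d. N(mu, sigma2) sample (a sum of log densities)."""
    T = len(sample)
    ss = sum((x - mu) ** 2 for x in sample)
    return -0.5 * T * math.log(2.0 * math.pi * sigma2) - ss / (2.0 * sigma2)

# Hypothetical realizations, for illustration only.
x = [0.8, 1.9, 1.1, 2.4, 0.3]

# Crude grid search over candidate values of mu in [0, 3] (step 0.001).
grid = [i / 1000 for i in range(0, 3001)]
mu_hat = max(grid, key=lambda m: normal_loglik(x, m))

sample_mean = sum(x) / len(x)
print(mu_hat, sample_mean)
```

A grid search is used only to make the "pick the maximizer" step explicit; in practice one would use the closed form or a numerical optimizer.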
MLE Plusses and Minuses
• Plusses: Efficient estimation in terms of picking the estimator with the smallest covariance matrix.
– Question: are ML estimators necessarily unbiased?
• Minuses: Strong distributional assumptions make robustness a problem.
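On the unbiasedness question, a small simulation can be suggestive. The sketch below assumes i.i.d. standard normal draws and uses the ML variance estimator, which divides by T rather than T - 1; its expectation is (T - 1)/T times the true variance, so it is biased downward in small samples. The sample size, replication count, and seed are arbitrary illustrative choices.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

T = 5            # deliberately tiny sample so the bias is visible
reps = 20000     # number of simulated samples
true_var = 1.0

total = 0.0
for _ in range(reps):
    draw = [random.gauss(0.0, 1.0) for _ in range(T)]
    mean = sum(draw) / T
    # ML estimator of the variance: divide by T, not T - 1
    total += sum((v - mean) ** 2 for v in draw) / T

avg_ml_var = total / reps
theoretical_mean = (T - 1) / T * true_var  # expectation of the ML estimator
print(avg_ml_var, theoretical_mean)
```

The simulated average should sit near the theoretical expectation and visibly below the true variance of one.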
MLE Example: Normal Distributions where OLS assumptions are satisfied
• Sample y of size T is normally distributed with mean $X\beta$, where
– X is a T x K matrix of explanatory variables
– $\beta$ is a K x 1 vector of parameters
– The variance-covariance matrix of the errors from the true regression is $\sigma^2 I$, where
– I is a T x T identity matrix
The Likelihood Function
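• The likelihood function for the linear model with independent normally distributed errors is:
$$L(\beta, \sigma^2) = (2\pi\sigma^2)^{-T/2} \exp\left[-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right]$$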
The Log-Likelihood Function
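• With independent draws, it is easier to maximize the log-likelihood function, because products are replaced by sums. The log-likelihood is given by:
$$\ln L = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)$$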
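First-order Conditions:
$$\frac{\partial \ln L}{\partial \beta} = \frac{1}{\sigma^2}X'(y - X\beta) = 0 \quad\Rightarrow\quad \hat\beta = (X'X)^{-1}X'y$$
First-order Conditions: 2
$$\frac{\partial \ln L}{\partial \sigma^2} = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}(y - X\beta)'(y - X\beta) = 0 \quad\Rightarrow\quad \hat\sigma^2 = \frac{e'e}{T}, \quad e = y - X\hat\beta$$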
The Information Matrix
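• If $\theta$ is our parameter vector,
– $I(\theta)$ is the information matrix,
– which is minus the expectation of the matrix of second partial derivatives of the log-likelihood with respect to the parameters:
$$I(\theta) = -E\left[\frac{\partial^2 \ln L}{\partial\theta\,\partial\theta'}\right]$$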
The Information Matrix – Cont…
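• The MLE achieves the Cramer-Rao lower bound asymptotically, which means that the asymptotic variance of the estimators equals the inverse of the information matrix:
$$\text{Var}[\hat\beta, \hat\sigma^2] = [I(\beta, \sigma^2)]^{-1}$$
• Now, with $\varepsilon = y - X\beta$,
$$\frac{\partial^2 \ln L}{\partial\theta\,\partial\theta'} = \begin{bmatrix} -\dfrac{X'X}{\sigma^2} & -\dfrac{X'\varepsilon}{\sigma^4} \\ -\dfrac{\varepsilon'X}{\sigma^4} & \dfrac{T}{2\sigma^4} - \dfrac{\varepsilon'\varepsilon}{\sigma^6} \end{bmatrix}$$
• note, the off-diagonal elements are zero in expectation, since $E[X'\varepsilon] = 0$.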
The Information Matrix – Cont…
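• The negative of the expectation is:
$$I(\beta, \sigma^2) = \begin{bmatrix} X'X/\sigma^2 & 0 \\ 0' & T/(2\sigma^4) \end{bmatrix}$$
• The inverse of this is:
$$[I(\beta, \sigma^2)]^{-1} = \begin{bmatrix} \sigma^2 (X'X)^{-1} & 0 \\ 0' & 2\sigma^4/T \end{bmatrix}$$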
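Another way of Writing $I(\beta, \sigma^2)$
• For a vector, $\theta$, of parameters, $I(\theta)$, the information matrix, can be written in a second way:
$$I(\theta) = E\left[\left(\frac{\partial \ln L}{\partial\theta}\right)\left(\frac{\partial \ln L}{\partial\theta}\right)'\right]$$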
• This second form is more convenient for estimation, because it does not require estimating second derivatives.
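To make the two forms concrete, here is a minimal sketch under simplifying assumptions: an i.i.d. normal sample with known variance, so the mean is the only parameter and the information is a scalar. The sample size, seed, and variable names are illustrative. The second-derivative form gives T divided by the variance, while the first-derivative form is estimated by summing squared per-observation scores at the estimate.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

T = 2000
sigma2 = 1.0    # treated as known, to keep the example one-dimensional
x = [random.gauss(0.5, 1.0) for _ in range(T)]
mu_hat = sum(x) / T  # MLE of the mean

# Form 1: minus the second derivative of the log-likelihood.
# Each observation contributes -1/sigma2 to the second derivative.
info_hessian = T / sigma2

# Form 2: sum of squared first derivatives (scores) at the estimate.
# The score of observation t is (x_t - mu) / sigma2.
info_opg = sum(((xt - mu_hat) / sigma2) ** 2 for xt in x)

ratio = info_opg / info_hessian
print(info_hessian, info_opg, ratio)
```

The two estimates differ in any finite sample but should be close when T is large, which is why the first-derivative form is a usable substitute.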
Estimation
• The Likelihood Ratio Test
– Let $\theta$ be a vector of parameters to be estimated.
– Let H0 be a set of restrictions on these parameters.
– These restrictions could be linear or non-linear.
– Let $\hat\theta_U$ be the MLE of $\theta$ estimated without regard to constraints (the unrestricted model).
– Let $\hat\theta_R$ be the constrained MLE.
The Likelihood Ratio Test Statistic
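• If $\hat L_U(\hat\theta_U)$ and $\hat L_R(\hat\theta_R)$ are the likelihood functions evaluated at these two estimates, the likelihood ratio is given by:
$$\lambda = \frac{\hat L_R(\hat\theta_R)}{\hat L_U(\hat\theta_U)}$$
• Then, under H0 and in large samples, $-2\ln(\lambda) = -2[\ln(\hat L_R(\hat\theta_R)) - \ln(\hat L_U(\hat\theta_U))] \sim \chi^2$ with degrees of freedom equal to the number of restrictions imposed.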
Another Look at the LR Test
• Concentrated Log-Likelihood: Many problems can be formulated in terms of partitioning a parameter vector, $\theta$, into $\{\theta_1, \theta_2\}$ such that the solution to the optimization problem, $\hat\theta_2$, can be written as a function of $\hat\theta_1$, e.g.:
$$\hat\theta_2 = t(\hat\theta_1).$$
• Then, we can concentrate the log-likelihood function as: $F^*(\theta_1, \theta_2) = F(\theta_1, t(\theta_1)) \equiv F_c(\theta_1)$.
Why Do This?
• The unrestricted solution to
$$\max_{\theta_1} F_c(\theta_1)$$
• then provides the full solution to the optimization problem, since t is known.
• We now use this technique to find estimates for the classical linear regression model.
Example
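• The log-likelihood function (from CLM) with normal disturbances is given by:
$$\ln L = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)$$
• The solution to the likelihood equation for $\sigma^2$ implies that however we estimate $\beta$, the estimator for $\sigma^2$ will be:
$$\hat\sigma^2 = \frac{1}{T}(y - X\beta)'(y - X\beta)$$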
Ex: Concentrating the Likelihood Function
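• Inserting this back into the log-likelihood yields:
$$\ln(L_c) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln\left[\frac{1}{T}(y - X\beta)'(y - X\beta)\right] - \frac{T}{2}$$
• Because $(y - X\beta)'(y - X\beta)$ is just the sum of squared residuals from the regression ($e'e$) we can rewrite $\ln(L_c)$ as:
$$\ln(L_c) = -\frac{T}{2}\left[1 + \ln(2\pi) + \ln\left(\frac{1}{T}e'e\right)\right]$$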
Ex: Concentrating the Likelihood Function
• For the restricted model we obtain the restricted concentrated log-likelihood:
$$\ln(L_{cR}) = -\frac{T}{2}\left[1 + \ln(2\pi) + \ln\left(\frac{1}{T}e_R'e_R\right)\right]$$
• So, plugging these concentrated log-likelihoods into our definition of the LR test, we obtain:
$$LR = T\ln\left(\frac{e_R'e_R}{e'e}\right)$$
• Or, T times the log of the ratio of the restricted SSR to the unrestricted SSR, a nice intuition.
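The algebra above can be checked numerically. This is a minimal sketch: the sample size and the two sums of squared residuals are made-up numbers, and `concentrated_loglik` is a hypothetical helper name. It computes the LR statistic once as minus twice the difference of the concentrated log-likelihoods and once as T times the log of the SSR ratio.

```python
import math

def concentrated_loglik(ssr, T):
    """Concentrated normal log-likelihood as a function of the SSR."""
    return -0.5 * T * (1.0 + math.log(2.0 * math.pi) + math.log(ssr / T))

T = 60
ssr_unrestricted = 12.4  # hypothetical e'e
ssr_restricted = 13.9    # hypothetical e_R'e_R

lr_from_logliks = -2.0 * (
    concentrated_loglik(ssr_restricted, T)
    - concentrated_loglik(ssr_unrestricted, T)
)
lr_from_ssr = T * math.log(ssr_restricted / ssr_unrestricted)
print(lr_from_logliks, lr_from_ssr)
```

The constants 1 and ln(2π) cancel in the difference, which is why only the SSR ratio survives.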
Ex: OLS with Normal Errors
• True regression model:
$$y_t = \alpha + \beta x_t + \varepsilon_t$$
• The $\varepsilon_t$ are iid normal.
• Sample size is T.
• Restriction: $\beta = 1$.
Example – Cont…
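• The first-order conditions for the estimates $\hat\alpha$ and $\hat\beta$ simply reduce to the OLS normal equations:
$$\sum_{t=1}^{T}(y_t - \hat\alpha - \hat\beta x_t) = 0, \qquad \sum_{t=1}^{T}x_t(y_t - \hat\alpha - \hat\beta x_t) = 0$$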
Example – Cont…
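• Solving for $\hat\alpha$:
$$\hat\alpha = \bar y - \hat\beta \bar x$$
• Substituting into the FOC for $\beta$ yields:
$$\hat\beta = \frac{\sum_{t=1}^{T}(x_t - \bar x)(y_t - \bar y)}{\sum_{t=1}^{T}(x_t - \bar x)^2}$$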
Example – Cont…
• Solve for $\hat\sigma^2$ as before:
$$\hat\sigma^2 = \frac{1}{T}\sum_{t=1}^{T}(y_t - \hat\alpha - \hat\beta x_t)^2$$
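The closed-form estimates can be sketched in a few lines. The data below are hypothetical and chosen only for illustration. The sketch computes the three estimates and then evaluates the two normal equations at them, which should both be zero up to rounding.

```python
# Hypothetical data, for illustration only.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [0.9, 1.2, 2.1, 2.2, 3.4, 3.5]
T = len(x)

x_bar = sum(x) / T
y_bar = sum(y) / T

# Slope and intercept from the normal equations
s_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))
s_xx = sum((xt - x_bar) ** 2 for xt in x)
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# ML variance estimate: average squared residual (divides by T)
resid = [yt - alpha_hat - beta_hat * xt for xt, yt in zip(x, y)]
sigma2_hat = sum(e * e for e in resid) / T

# Both first-order conditions, evaluated at the estimates
foc_alpha = sum(resid)
foc_beta = sum(xt * e for xt, e in zip(x, resid))
print(alpha_hat, beta_hat, sigma2_hat)
```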
Example – Cont…
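• The restricted model is exactly the same, except that $\beta$ is constrained to be one, so that the normal equation reduces to:
$$\sum_{t=1}^{T}(y_t - \hat\alpha_R - x_t) = 0$$
and
$$\hat\alpha_R = \bar y - \bar x$$
One can then plug in to obtain $\hat\sigma_R^2$ and form the likelihood ratio, which is asymptotically distributed $\chi^2(1)$.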
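Putting the restricted and unrestricted fits together, the test might be carried out as follows. This is a sketch on hypothetical data, not output from any real sample. It uses the fact that a chi-square variable with one degree of freedom is a squared standard normal, so its upper-tail probability can be written with the complementary error function from the standard library.

```python
import math

# Hypothetical data, for illustration only.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [0.9, 1.2, 2.1, 2.2, 3.4, 3.5]
T = len(x)
x_bar = sum(x) / T
y_bar = sum(y) / T

# Unrestricted fit
s_xx = sum((xt - x_bar) ** 2 for xt in x)
beta_hat = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y)) / s_xx
alpha_hat = y_bar - beta_hat * x_bar
ssr_u = sum((yt - alpha_hat - beta_hat * xt) ** 2 for xt, yt in zip(x, y))

# Restricted fit: beta = 1, so only the intercept is estimated
alpha_r = y_bar - x_bar
ssr_r = sum((yt - alpha_r - xt) ** 2 for xt, yt in zip(x, y))

# LR = T * ln(sigma2_R / sigma2_U); the 1/T factors cancel in the ratio
lr = T * math.log(ssr_r / ssr_u)

# Upper tail of chi-square(1): P(Z^2 > lr) = erfc(sqrt(lr / 2))
p_value = math.erfc(math.sqrt(lr / 2.0))
print(lr, p_value)
```

With only six observations the chi-square approximation is rough at best; the tiny sample is used only to keep the arithmetic easy to follow.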
The Wald Test
• The problem with LR test: Need both restricted and unrestricted model estimates.
• One or the other could be hard to compute.
• The Wald test is an alternative that requires estimating the unrestricted model only.
• Suppose $y \sim N(X\beta, \Sigma)$, with a sample size of T, then:
$$(y - X\beta)'\Sigma^{-1}(y - X\beta) \sim \chi^2_T$$
The Wald Test – Cont…
• Under the null hypothesis that $E(y) = X\beta$, the quadratic form above has a $\chi^2$ distribution. If the hypothesis is false, the quadratic form will be larger, on average, than it would be if the null were true.
• In particular, it will be a non-central $\chi^2$ with the same degrees of freedom, which looks like a central $\chi^2$, but lies to the right.
• This is the basis for the test.
The Restricted Model
• Now, step back from the normal and let $\hat\theta$ be the parameter estimates from the unrestricted model.
• Let restrictions be given by
H0: $f(\theta) = 0$.
• If the restrictions are valid, then $\hat\theta$ should satisfy them, at least approximately.
• If not, $f(\hat\theta)$ should be farther from zero than would be explained by sampling error alone.
Formalism
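• The Wald statistic is
$$W = f(\hat\theta)'\,[\text{Var}(f(\hat\theta))]^{-1} f(\hat\theta)$$
• Under H0 in large samples, $W \sim \chi^2$ with d.f. equal to the number of restrictions. See Greene ch.9 for details.
• Lastly, to use the Wald test, we need to compute the variance term:
$$\text{Var}[f(\hat\theta)] = G\,\text{Var}[\hat\theta]\,G', \qquad G = \frac{\partial f(\hat\theta)}{\partial\hat\theta'}$$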
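A scalar case shows the mechanics. The sketch assumes a single parameter with a hypothetical estimate and estimated variance, and a hypothetical nonlinear restriction, theta squared equal to one. The variance of the restriction is approximated by the derivative of the restriction times the variance of the estimate times the derivative again (the delta method), and the p-value uses the chi-square(1) tail written with `math.erfc`.

```python
import math

theta_hat = 1.2       # hypothetical unrestricted estimate
var_theta_hat = 0.01  # hypothetical estimated variance of theta_hat

def f(theta):
    """Restriction under H0: f(theta) = theta^2 - 1 = 0 (illustrative)."""
    return theta ** 2 - 1.0

def grad_f(theta):
    """Derivative of the restriction with respect to theta."""
    return 2.0 * theta

G = grad_f(theta_hat)
var_f = G * var_theta_hat * G         # delta-method variance of f(theta_hat)
wald = f(theta_hat) ** 2 / var_f      # f' [Var f]^{-1} f in the scalar case
p_value = math.erfc(math.sqrt(wald / 2.0))  # chi-square(1) upper tail
print(wald, p_value)
```

Only the unrestricted estimate enters, which is the practical appeal of the Wald form.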
Restrictions on Slope Coefficients
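• If the restrictions are on slope coefficients of a linear regression, then:
$$\text{Var}[\hat\theta] = \text{Var}[\hat\beta] = s^2 (X'X)^{-1}$$
where
$$s^2 = \frac{e'e}{T - K} = \frac{T}{T - K}\hat\sigma^2$$
and K is the number of regressors.
• Then, we can write the Wald Statistic:
$$W = f(\hat\beta)'\left(G(\hat\beta)\,[s^2 (X'X)^{-1}]\,G(\hat\beta)'\right)^{-1} f(\hat\beta) \sim \chi^2[J]$$
where J is the number of restrictions and $G(\hat\beta)$ is the matrix of derivatives of $f$ with respect to $\beta'$, evaluated at $\hat\beta$.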
Linear Restrictions
H0: $R\beta - q = 0$
• For example, suppose there were three betas, $\beta_1$, $\beta_2$, and $\beta_3$. Let's look at three tests.
(1) $\beta_1 = 0$,
(2) $\beta_1 = \beta_2$,
(3) $\beta_1 = 0$ and $\beta_2 = 2$.
• Each row of R is a single linear restriction on the coefficient vector.
Writing R
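• Case 1: $R = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}$, $q = 0$
• Case 2: $R = \begin{bmatrix} 1 & -1 & 0 \end{bmatrix}$, $q = 0$
• Case 3: $R = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$, $q = \begin{bmatrix} 0 \\ 2 \end{bmatrix}$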
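The three cases can be written down directly as data. In this sketch the helper name `restriction_gap` and the coefficient vector are hypothetical. The helper evaluates the restriction matrix times the coefficient vector minus q, so each row of R produces one number, which is zero exactly when that restriction holds.

```python
def restriction_gap(R, q, beta):
    """Return R*beta - q as a list, one entry per restriction (row of R)."""
    return [
        sum(r_ij * b_j for r_ij, b_j in zip(row, beta)) - q_i
        for row, q_i in zip(R, q)
    ]

# (R, q) pairs for the three tests on (beta1, beta2, beta3)
cases = {
    "beta1 = 0": ([[1, 0, 0]], [0]),
    "beta1 = beta2": ([[1, -1, 0]], [0]),
    "beta1 = 0 and beta2 = 2": ([[1, 0, 0], [0, 1, 0]], [0, 2]),
}

beta = [0.0, 2.0, 5.0]  # hypothetical coefficient vector
gaps = {name: restriction_gap(R, q, beta) for name, (R, q) in cases.items()}
print(gaps)
```

For this particular vector the first and third hypotheses hold exactly and the second does not, since the first two coefficients differ.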
The Wald Statistic
• In general, the Wald statistic with J linear restrictions reduces to:
$$W = [R\hat\beta - q]'\,[s^2 R(X'X)^{-1}R']^{-1}\,[R\hat\beta - q]$$
with J d.f.
• We will use these tests extensively in our discussion of Chapters 5 and 6 of CLM.
The F Test
• A related way to test the validity of the J restrictions
$R\beta - q = 0$
• Recall that the F test can be written in terms of a comparison of the sum of squared residuals for the restricted and unrestricted models:
$$F(J, T - K) = \frac{(e_R'e_R - e'e)/J}{e'e/(T - K)}$$
• or
$$F(J, T - K) = \frac{[R\hat\beta - q]'\,[s^2 R(X'X)^{-1}R']^{-1}\,[R\hat\beta - q]}{J}$$
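That the two forms agree can be seen in the simplest case. The sketch assumes a regression with an intercept and one slope (K = 2), a single restriction that the slope equals one (J = 1), and hypothetical data. With an intercept in the model, the slope's diagonal element of the inverse cross-product matrix is one over the sum of squared deviations of x, so the quadratic form collapses to a squared t-ratio.

```python
# Hypothetical data, for illustration only.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [0.9, 1.2, 2.1, 2.2, 3.4, 3.5]
T, K, J = len(x), 2, 1  # intercept and slope; one restriction (beta = 1)

x_bar = sum(x) / T
y_bar = sum(y) / T
s_xx = sum((xt - x_bar) ** 2 for xt in x)
beta_hat = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y)) / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Unrestricted and restricted sums of squared residuals
ssr_u = sum((yt - alpha_hat - beta_hat * xt) ** 2 for xt, yt in zip(x, y))
alpha_r = y_bar - x_bar
ssr_r = sum((yt - alpha_r - xt) ** 2 for xt, yt in zip(x, y))

# Form 1: comparison of sums of squared residuals
f_from_ssr = ((ssr_r - ssr_u) / J) / (ssr_u / (T - K))

# Form 2: quadratic form in (R beta_hat - q) with R = [0, 1], q = 1
s2 = ssr_u / (T - K)
var_beta_hat = s2 / s_xx  # slope element of s^2 (X'X)^{-1}
f_from_quadratic_form = (beta_hat - 1.0) ** 2 / var_beta_hat / J
print(f_from_ssr, f_from_quadratic_form)
```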
Why Do We Care?
• We care because in a linear model with normally distributed disturbances, the F distribution of the test statistic derived above is exact under the null, not just a large-sample approximation.
– This will be important later because under normality, some of our cross-sectional CAPM tests will be of this form and,
– A sufficient condition for the (static) CAPM to be “correct” is for asset returns to be normally distributed.
The LM Test
• This is a test that involves computing only the restricted estimator.
– If the hypothesis is valid, at the value of the restricted estimator, the derivative of the log-likelihood function should be close to zero.
– We will next form the LM test with the J restrictions $f(\beta) = 0$.
The LM Test – Cont…
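$$\ln(L_{LM}) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}[(y - X\beta)'(y - X\beta)] + \lambda' f(\beta)$$
This is maximized by choice of $\hat\beta$, $\hat\lambda$, and $\hat\sigma^2$.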
First-order Conditions
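• $\dfrac{\partial \ln L_{LM}}{\partial\beta} = \dfrac{1}{\sigma^2}X'(y - X\beta) + G'\lambda = 0$, where $G = \partial f(\beta)/\partial\beta'$,
• $\dfrac{\partial \ln L_{LM}}{\partial\sigma^2} = -\dfrac{T}{2\sigma^2} + \dfrac{(y - X\beta)'(y - X\beta)}{2\sigma^4} = 0$,
• and $\dfrac{\partial \ln L_{LM}}{\partial\lambda} = f(\beta) = 0$.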
The LM Test – Cont…
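• The test then, is whether the Lagrange multipliers equal zero. When the restrictions are linear, the test statistic becomes (see Greene, chapter 7):
$$LM = [R\hat\beta - q]'\,[s_R^2 R(X'X)^{-1}R']^{-1}\,[R\hat\beta - q]$$
which has J d.f., where J is the number of restrictions and $s_R^2$ is the error-variance estimate from the restricted model.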
W, LR, LM, and F
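• We compare them for J linear restrictions in the linear model with K regressors. It can be shown that:
– $W = \dfrac{T}{T - K}\,JF,$
– $LR = T\ln\left[1 + \dfrac{1}{T - K}\,JF\right],$
– $LM = \dfrac{T}{(T - K)\,[1 + (1/(T - K))\,JF]}\,JF,$
– and that $W \geq LR \geq LM$, with strict inequalities whenever F > 0.
– These expressions hold when the error variance in W and LM is estimated by maximum likelihood ($e'e/T$ and $e_R'e_R/T$); with $s^2 = e'e/(T - K)$ in W, the Wald statistic equals JF exactly.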
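The ordering can be checked numerically. The sketch below takes hypothetical values for T, K, J, and F, computes the three statistics from the expressions above, and cross-checks them against their sum-of-squared-residuals forms with the error variance estimated by maximum likelihood. The unrestricted SSR is normalized to one, which is harmless because all three statistics depend only on the SSR ratio.

```python
import math

T, K, J = 60, 3, 2  # hypothetical sample size, regressors, restrictions
F = 2.5             # hypothetical F statistic

z = J * F / (T - K)  # equals (e_R'e_R - e'e) / e'e

W = T * z
LR = T * math.log(1.0 + z)
LM = T * z / (1.0 + z)

# Cross-check against SSR forms, normalizing the unrestricted SSR to 1
ssr_u = 1.0
ssr_r = ssr_u * (1.0 + z)
W_ssr = T * (ssr_r - ssr_u) / ssr_u   # quadratic form over e'e / T
LR_ssr = T * math.log(ssr_r / ssr_u)
LM_ssr = T * (ssr_r - ssr_u) / ssr_r  # quadratic form over e_R'e_R / T
print(W, LR, LM)
```

The ordering follows from the elementary inequalities z ≥ ln(1 + z) ≥ z / (1 + z) for z ≥ 0, so in finite samples the three tests can disagree about rejection even though they are asymptotically equivalent.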