Efficient Tests for Long-Run Predictability:
Do Long-Run Relations Convey Extra Information?
Natalia Sizova∗
January 10, 2015
Abstract
Short-run and long-run relations among time series can differ. In situations in which
short-run constraints and information biases obscure equilibrium relations among economic
variables, estimates of the long-run relations, which are free of such contaminations, become
the only basis for evaluating economic hypotheses. The common approach to estimating
long-run predictability has been long-horizon regressions. However, long-horizon regres-
sions are not designed to extract long-run information efficiently, and the lack of accuracy
often outweighs their robustness to short-run noise. This study suggests two methods for
replacing long-horizon regressions. The corresponding tests can be viewed as long-run ver-
sions of the Q-test by Campbell and Yogo (2006) and the nearly optimal test by Elliott,
Muller, and Watson (2014). We demonstrate the usefulness of long-run information in two
common empirical applications.
∗Department of Economics, Rice University, Houston, TX 77251, USA. Tel.: +1 (713) 348-5613; fax: +1 (713) 348-5278. Email: [email protected].
1 Introduction
We are often charged with the task of analyzing how a dependent variable, e.g., yt, responds
to shocks in the long run. For example, we might directly ask about the predictability in the
aggregated quantity yt+1 + ... + yt+H for some large horizon H or we might be interested in
measuring the effect on yt+H . Examples include tests for long-run monetary neutrality, e.g.,
Fisher and Seater (1993) and Newey and West (1994); tests of the links among exchange rates,
interest rates, and inflation, e.g., Meese and Rogoff (1988) and Mishkin (1990); and tests for
long-run predictability in equity returns, e.g., Fama and French (1988). Because it is the most
intuitive solution, it is not surprising that the usual approach to the task is to run regressions
of yt+1 + ...+ yt+H or yt+H on a set of explanatory variables. However, recent research indicates
that these long-horizon regressions provide biased and disappointingly inaccurate estimates. As
an illustration, consider the task of testing for predictability in stock market returns. The use
of long-horizon regressions for this application is motivated in part by the greater values of the
t-statistics for large horizons H. However, with valid confidence intervals,1 it has been shown
that the p-values of the tests remain roughly constant and even increase with H, e.g., Boudoukh,
Richardson, and Whitelaw (2008), Hjalmarsson (2011). The results of many prior long-run
predictability papers now come into question (e.g., Valkanov, 2003). Can the long-run relations
be estimated accurately enough to convey information that is not already evident from short-run
relations? Because long-horizon regressions do not provide the answer, the solution is to consider
more efficient estimation methods.
Several significant developments related to long-run predictability testing have recently been
reported in the literature. This predictability research focuses on the following model:
yt = βxxt−1 + εt,
xt = ρxt−1 + ut,
where (εt, ut) is a sequence of i.i.d. vectors and ρ is close to one. For this model, note that
the effect of xt−1 on yt+H remains substantial over many time periods H as long as βx ≠ 0.
Therefore, we say that yt is predictable by xt in the long run if βx ≠ 0. The methods that
have been recently developed to efficiently test the hypothesis H0 : βx = 0 include the Q-test
(bias-adjusted OLS t-test) by Campbell and Yogo (2006), the nearly optimal test by Elliott,
Muller, and Watson (2014), and the conditionally optimal test by Jansson and Moreira (2006).
All of these tests belong to the class of quasi-likelihood (QL)-based approaches derived under
the i.i.d. assumption on (εt, ut). However, these methods can be extended to the case of serially
correlated (εt, ut).
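As an illustration only, the predictive system above can be simulated directly. The sketch below is hypothetical (the function name, parameter values, and unit shock variances are my choices, not the paper's), with ρ = 1 + c/T and contemporaneously correlated Gaussian shocks:

```python
import numpy as np

def simulate_system(T=500, beta_x=0.05, c=-10.0, corr=-0.9, seed=0):
    # Illustrative sketch of the system y_t = beta_x * x_{t-1} + eps_t,
    # x_t = rho * x_{t-1} + u_t, with rho = 1 + c/T (near-unit root).
    rng = np.random.default_rng(seed)
    rho = 1.0 + c / T
    cov = [[1.0, corr], [corr, 1.0]]          # assumed unit shock variances
    shocks = rng.multivariate_normal([0.0, 0.0], cov, size=T)
    u, eps = shocks[:, 0], shocks[:, 1]
    x = np.zeros(T + 1)                        # x_0 = 0 as a simple initial condition
    for t in range(T):
        x[t + 1] = rho * x[t] + u[t]
    y = beta_x * x[:-1] + eps
    return x[:-1], y                           # regressor x_{t-1} and regressand y_t

x_lag, y = simulate_system()
```

A strongly negative `corr` mimics the endogeneity case discussed above, in which short-run and long-run predictability can differ.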
1Valid confidence intervals account for small-sample effects due to the persistence in explanatory variables and due to large horizons.
From the long-run predictability perspective, the most interesting extension to the above
model involves εt that is not only serially correlated but is also predictable by prior values of xt,
creating an endogeneity problem. In this case, the short-run predictability dyt/dxt−1 ≠ 0 arises
from both the βxxt−1 and εt terms. In contrast, the long-run predictability dyt+H/dxt−1 ≠ 0 for
large H is due only to βxxt−1. Therefore, in a model with a predictable εt, the correlation between
yt+1 and xt could be of a different magnitude, or even of a different sign, from the correlation
between yt+1 + ...+ yt+H and xt. For example, the presence of market momentum often distorts
the risk-return relationship between the stock returns and measures of the risk (see Hong and
Stein, 1999). The presence of error in the estimates of the payouts to shareholders distorts
the relationship between stock returns and payout ratios (see Boudoukh, Michaely, Richardson,
and Roberts, 2007). In the presence of a distortionary monetary policy, the carry trade results
in deviations in the exchange rate changes from the interest rate differentials (see Boudoukh,
Richardson, and Whitelaw, 2013).
When the shock εt is predicted by past values of xt, QL-based methods can be corrected.
However, such corrections necessarily embed an estimation of the predictability in the shock
εt. The resulting loss of efficiency can be substantial. Consider, for example, the case in which
βx and cov(εt, xt−1) are small and have opposite signs, so that dyt/dxt−1 ≈ 0. In this case,
the task of estimating the predictability of εt is on the same level of complexity as estimating
the predictability of yt itself. In contrast, this paper considers methods that are robust to the
short-run endogeneity. One example of such a method is long-horizon regressions with large H .
However, we offer a significantly more accurate alternative.
In this paper, we propose two methods that are designed to extract the maximum amount
of information about the long run and do not rely on the short-run information. Both of these
methods are motivated by the Neyman-Pearson lemma. Under a set of conditions that includes
Gaussianity, they yield the most powerful tests and achieve the same accuracy asymptotically
under more general assumptions. As a result, we obtain sizable efficiency gains in comparison
with long-horizon regressions.
The first method that we propose is referred to as the Local Whittle (LW) test. This test is
based on the maximization of the long-run portion of the Gaussian likelihood for each ρ. The
resulting procedure resembles, in many respects, the Q-test, but it is immune to violations of the
condition E(εt|xt−1, ...) = 0. The only assumption that is required for the validity of the LW test
is that the long-run behavior of (εt, ut) is close to that of independent observations (formally, see
Assumption A).
We examine the asymptotic behavior of the LW test under the assumption that ρ is local
to one, i.e., xt is nearly integrated; thus, we follow Campbell and Yogo (2006), Jansson and
Moreira (2006), Elliott, Muller, and Watson (2014) and many others. In these studies, it was
also reasonable to allow small values of ρ by, for example, setting a threshold below which
the standard methods are applied (see Elliott, Muller and Watson, 2014). In the case of the
long-run predictability, however, non-trivial results can arise only if xt is sufficiently persistent.
Therefore, the local-to-unity assumption is crucial. Note that when ρ is near one, the case of
βx ≠ β0 corresponds to the presence of near-cointegration between the series of yt − β0xt−1 and
xt−1. Therefore, it is not surprising that the LW test is found to be linked to the frequency-domain least-squares (FDLS) estimator by Robinson (1994).2 The FDLS was previously applied
to measure the (fractional) cointegration between (fractionally) integrated series (see Marinucci
and Robinson, 2003). It appears that the same estimator can be applied to nearly integrated
series. We establish, however, that the FDLS estimator is asymptotically biased, while the
estimator that is the basis of the LW test can be viewed as a bias-adjusted alternative to the
FDLS.
The asymptotic properties of the LW test are then compared with other tests for a set of
model parameters. As a benchmark, we use the performance of the Q-test. We find that,
under assumptions that are favorable to the Q-test, the LW test’s performance is close to this
benchmark. Moreover, when the long-run and short-run dynamics differ, the LW test outperforms
the Q-test. The performance of the long-horizon regressions is unimpressive in both cases.
The second method that we consider is the nearly optimal long-run predictability test. The
first test, LW, is based on the “long-run” likelihood ratio with known ρ, which is then replaced by
conservative estimates (adjusted Bonferroni bounds). The nearly optimal long-run predictability
test is also based on the “long-run” likelihood ratio but inherently treats ρ as a nuisance param-
eter. To incorporate the nuisance parameter within the Neyman-Pearson lemma, the likelihood
under H0 is replaced by an average over the possible values of ρ with the least favorable distri-
bution of the weights (see Elliott, Muller, and Watson, 2014). As expected, the nearly optimal
test is uniformly better than the LW test asymptotically.
In the empirical part of this paper, we evaluate the performance of the long-run predictability
tests for two long-standing empirical questions: the predictability of stock returns by payout
ratios and the validity of the uncovered interest rate parity. We suggest a new version of classical
long-run predictability tables that were used to present results over a range of increasing time
horizons. The alternative long-run predictability tables are obtained by reducing the number of
frequencies that are employed in the estimation. We demonstrate that the long-run estimates
do provide statistically different information from the short-run estimates. Although similar
(and substantially more dramatic) results were previously found with long-horizon regressions,
later these results were challenged and acknowledged to be misleading due to unaccounted-for
small-sample effects, e.g., Richardson and Stock (1989), Boudoukh, Richardson, and Whitelaw
(2008).
This study is organized as follows. Section 2 discusses the motivation for the LW test under
2Similar estimators have been considered in the unit-root literature. For example, Corbae, Ouliaris, and Phillips (2002) consider band spectral regressions. This paper naturally extends their results for spectral regressions at zero frequency to the case with ρ ≠ 1 and endogenous xt.
the Gaussian assumption. Section 3 derives asymptotic distributions under general conditions,
and Section 4 describes the construction of the LW test and compares the Q-test by Campbell
and Yogo (2006), the LW test, and the t-tests in simple and long-horizon regressions based on the
asymptotic local power functions. Section 5 presents the nearly optimal long-run predictability
test and compares its asymptotic performance with the LW test. Section 6 reports the results of
long-run predictability tests in two empirical applications. Section 7 presents the conclusions.
2 The Uniformly Most Powerful Test in the Gaussian Case When ρ is Known
Let yt denote the variable that we are forecasting, and let xt denote the explanatory variable.
We observe a bi-variate process (xt−1, yt) for t = 1, ..., T , whose dynamics can be represented by
the following system:
yt = µy + βx(xt−1 − µx) + εt,
xt = µx + ρ(xt−1 − µx) + ut.   (1)
Suppose that xt is very persistent and can be modeled as nearly integrated, i.e., ρ = ρT =
1 + c/T (see Elliott and Stock, 1994). Some constructs in this section require c ≠ 0, which can be
either positive or negative. The case of c = 0 is omitted for brevity. Furthermore, the asymptotic
results in the next section do not require c ≠ 0. As the initial condition, assume that x0 has a
distribution that does not depend on T .
Suppose that the random and non-degenerate3 mean-zero vector of innovations et = (ut, εt)
satisfies the following conditions from Phillips (1988, p. 1023):
Assumption A. For some γ > 2 and δ > 0, E||et||^{γ+δ} < ∞, and the strong mixing coefficients αi are such that ∑_{i=1}^{∞} αi^{1−2/γ} < ∞.
These conditions allow for heteroscedasticity and dependence over time. Under these condi-
tions, the functional central limit theorem holds: for the univariate case, see Herrndorf (1984),
and for the vector form, see Phillips and Durlauf (1986) and Phillips (1987).
The goal is to test a null hypothesis H0 : βx = β0. We start with a simple alternative
hypothesis H1 : βx = β1. To obtain optimality results, we rely on the following normality
assumption, which is relaxed in the derivation of asymptotic distributions:
Assumption B. Process et = (ut, εt), t = 1, ..., T is Gaussian.
3For asymptotic results, the relevant definition of nondegeneracy is that s^{u,ε}(0) is positive-definite, where s^{u,ε}(ω) is the spectrum of et. For optimality results, we require s^{u,ε}(ω) to be positive-definite with the determinant greater than m > 0 for all −π ≤ ω ≤ π, to invoke the results by Dzhaparidze (1986).
One can directly specify the most powerful (MP) test when all of the parameters in the model
are known except for βx. In this case, the MP test readily follows from the Neyman-Pearson
lemma (see Lehmann and Romano, 2005, Theorem 3.2.1) and is the likelihood ratio test. Note
that the assumption of the known nuisance parameters is especially restrictive with regard to the
parameters that cannot be consistently estimated, such as ρT.4 This assumption is relaxed when
constructing confidence intervals, but it is required for the derivation of the optimality results in
this section.
Campbell and Yogo (2006) derive their Q-test, which is a more powerful alternative to the
OLS t-test, under the assumption that (εt, ut) are i.i.d. normal and ρ is known. We, however,
are interested in a more general case, in which εt may depend on past values of ut. Therefore,
we allow for endogeneity, which leads to the differences in the long-run and short-run dynamics.
Our results can also be viewed as an extension to the case in which only the long-run dynamic
parameters are known.
We start from the decomposition that divides the likelihood function into the long-run and
short-run parts: this decomposition can be implemented in the frequency domain by using an
asymptotic equivalent to the Gaussian likelihood, which is known as the Whittle approximation
(see Dzhaparidze, 1986):
log LT = −T log 2π − (1/2) ∑_{j=1}^{T} [ log det(s^{x,y}(ωj)) + tr(s^{x,y}(ωj)^{−1} I^{x,y}_T(ωj)) ],

where s^{x,y}(ω) is the spectrum of the vector (xt−1, yt) and I^{x,y}_T(ω) is the corresponding sample periodogram, i.e., I^{x,y}_T(ω) = d(ω)d*(ω) with d(ω) ≡ (2πT)^{−1/2} ∑_{t=1}^{T} (xt−1 − µx, yt − µy)′ e^{−iωt}, a Fourier transformation of (xt−1 − µx, yt − µy), t = 1, .., T, and d*(ω) its conjugate transpose. The spectrum and the periodogram are both 2 × 2 matrices and are calculated at the natural frequencies ωj = 2πj/T, j = 1, 2, .., T. The operators det(·) and tr(·) denote determinants and traces of matrices, respectively.
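For concreteness, the 2 × 2 periodogram I^{x,y}_T(ωj) = d(ωj)d*(ωj) can be computed directly from this definition. The helper below is a hypothetical sketch (its name is mine, and sample means stand in for the unknown µx and µy, which is an assumption, not the paper's procedure):

```python
import numpy as np

def periodogram(x_lag, y, j):
    # 2x2 sample periodogram I_T(omega_j) = d(omega_j) d*(omega_j)
    # at the natural frequency omega_j = 2*pi*j/T.
    T = len(y)
    omega = 2.0 * np.pi * j / T
    t = np.arange(1, T + 1)
    # demean with sample means as stand-ins for mu_x, mu_y (assumption)
    z = np.column_stack([x_lag - x_lag.mean(), y - y.mean()])
    d = (z * np.exp(-1j * omega * t)[:, None]).sum(axis=0) / np.sqrt(2.0 * np.pi * T)
    return np.outer(d, d.conj())
```

By construction the returned matrix is Hermitian with nonnegative real diagonal, as a periodogram must be.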
Dzhaparidze (1986) proves the asymptotic equivalence of the Gaussian log-likelihood and
the Whittle approximation for stationary processes under conditions in which the sample size T
greatly exceeds the decay time for the autocorrelations. Because the nearly integrated processes
are close to non-stationary, they clearly violate this assumption. In Appendix B, we analyze the
difference between the true likelihood and the Whittle approximation for our model. First, we
note that we are interested only in the conditional distribution of {yt}_{t=1}^{T} given {xt−1}_{t=1}^{T}, which
is also Gaussian under the Whittle approximation. Second, we compare the conditional variances
under the two log-likelihoods and find that their difference is asymptotically small. Finally, we
compare the conditional expectations and find that their difference increases with T and results
in a non-vanishing divergence between the likelihoods.
4Formally, it is the parameter c in ρT = 1 + c/T that cannot be consistently estimated.
However, Dzhaparidze’s (1986) results can be appropriately extended to the case with ρT =
1 + c/T , as shown in Appendix B. To obtain a likelihood with an asymptotically equivalent
conditional part, we suggest adjusting the series Y = (y1 − µy, .., yT − µy)′ in the Whittle approximation as follows: Ȳ = Y + Re(δy)(x0 − xT), where δy is a complex-valued vector of size T × 1. For example, for the case with i.i.d. (ut, εt), the correction is δy = (0, ..., cov(εt, ut)/var(ut))′. If (ut, εt) are not i.i.d., then the correction term δy has a more complex form (see Appendix B)
but does not depend on βx and is, therefore, known by our assumptions. Furthermore, we will
conveniently avoid computing δy and replace it with a simpler vector that depends only on the
long-run variance of et and yields the same asymptotic distributions for the estimators and the
tests suggested in this paper.
Moving forward, we replace Y with Ȳ in the Whittle likelihood. We also change the summation limits from j = 1, .., T to j = ⌊T/2⌋ + 1 − T, .., ⌊T/2⌋, which will have no effect, because s^{x,y}(ω) is 2π-periodic. The Whittle approximation is, therefore, redefined as follows:
log LT = −T log 2π − (1/2) ∑_{j=⌊T/2⌋+1−T}^{⌊T/2⌋} [ log det(s^{x,y}(ωj)) + tr(s^{x,y}(ωj)^{−1} I^{x,y}_T(ωj)) ].
Because we are interested in an estimation that is immune to the short-run endogeneity, we now
assume that the relations between {yt}_{t=1}^{T} and {xt−1}_{t=1}^{T} are known only at low frequencies. In
other words, we know the spectra s^{x,y}(ωj) up to βx for j = 0, ±1, ..., ±q, and no information is
available about the spectral densities sx,y(ωj) for j = ±(q + 1), ...,±⌊T/2⌋. Therefore, the MP
test will depend only on the first terms of log LT that correspond to cycles with periodicities
greater than T/q:
−(1/2) ∑_{j=−q}^{q} log det(s^{x,y}(ωj)) − (1/2) ∑_{j=−q}^{q} tr(s^{x,y}(ωj)^{−1} I^{x,y}_T(ωj)).   (2)
The assumption that q is kept constant agrees with the long-horizon regression literature, in which
H (the horizon) is kept in constant proportion to the sample size. The goal of such assumptions
is to correctly capture the lack of observations in small samples. Alternatively, one can allow q to
increase to ∞ in such a way that 1/q + q/T → 0. However, such an assumption does not reflect
the small magnitudes of the q values that appear to be necessary to determine the long-run
dynamics in empirical applications; see Section 6.
For model (1), the spectrum of the vector (xt−1, yt), calculated at frequency ω, is the 2 × 2 matrix s^{x,y}(ω) with the entries

s^{x,y}_{xx}(ω) = s^{u,ε}_{uu}(ω)/|1 − ρe^{−iω}|²,
s^{x,y}_{xy}(ω) = βx s^{u,ε}_{uu}(ω)/|1 − ρe^{−iω}|² + s^{u,ε}_{uε}(ω) e^{−iω}/(1 − ρe^{−iω}),
s^{x,y}_{yx}(ω) = βx s^{u,ε}_{uu}(ω)/|1 − ρe^{−iω}|² + s^{u,ε}_{εu}(ω) e^{+iω}/(1 − ρe^{+iω}),
s^{x,y}_{yy}(ω) = βx² s^{u,ε}_{uu}(ω)/|1 − ρe^{−iω}|² + 2βx Re(e^{−iω} s^{u,ε}_{uε}(ω)/(1 − ρe^{−iω})) + s^{u,ε}_{εε}(ω),   (3)
where s^{u,ε}(ω) is the spectrum of et = (ut, εt) with the natural partition s^{u,ε}_{uu}(ω), s^{u,ε}_{uε}(ω), s^{u,ε}_{εu}(ω),
and s^{u,ε}_{εε}(ω). Similar notation will be used for the remaining spectra and periodogram matrices
in this paper.
We next perform a series of modifications to the likelihood (2) that remove its dependence
on the autocovariance structure of (ut, εt). Proposition 1 justifies these transitions. The first
modification is to the spectrum s^{x,y}(ω). Note that s^{x,y}(ω) in (3) depends on the autocovariance
of the shocks et = (ut, εt) through the spectrum of the shocks s^{u,ε}(ω). We replace s^{u,ε}(ω) when
ω ≈ 0 with its value at zero, s^{u,ε}(0), and denote the resulting spectrum of (xt−1, yt) by s̄^{x,y}(ω).
The second modification is to the series Ȳ. Note that the Ȳ-series correction δy also depends on
the autocovariance of the shocks (see Appendix B). We suggest replacing δy with another vector
δ̃y = (0, ..., 0, s^{u,ε}_{uε}(0)/s^{u,ε}_{uu}(0))′ that corresponds to the adjusted series Ỹ = Y + δ̃y(x0 − xT).
To compare the likelihoods for these two modifications, we note that the determinant
det(s^{x,y}(ωj)) = det(s^{u,ε}(ωj)) |1 − ρe^{−iωj}|^{−2} in (2) does not depend on βx. Therefore, the test
for βx will depend only on the remaining term F = ∑_{j=−q}^{q} tr(s^{x,y}(ωj)^{−1} I^{x,y}_T(ωj)). Similarly, we
define the component F̄ that corresponds to the first modification with the spectrum s̄^{x,y}(ω) and
the component F̃ that corresponds to the second modification with the periodogram Ĩ^{x,y}_T(ωj).
This final F̃ does not depend on the autocovariances of the shocks.
The next proposition states that F, F̄, and F̃ are all asymptotically equivalent when evaluated
in the vicinity of the true parameter βx.
Proposition 1. Under Assumption A, if the matrix s^{u,ε}(0) is positive-definite, then the components
F, F̄, and F̃, evaluated in the O(1/T)-neighborhoods of the true βx, are all (exactly) Op(1), and
the differences have stochastic orders of Op(1/T).
Under the likelihood implied by F̃, the Neyman-Pearson lemma yields the MP test of H0 :
βx = β0 against H1 : βx = β1, which rejects for small values of F̃(β1) − F̃(β0). This MP
test under F̃ is asymptotically equivalent to the MP test based on the long-run component of
the original Gaussian likelihood in the following sense. Consider the O(1/T)-neighborhoods of
the true parameter β0 and re-parameterize βx = β0 + b/T. If we reverse the corrections that
correspond to the transitions from the Gaussian likelihood to the Whittle approximation and from
F to F̃,5 then the MP rejection rule for this new likelihood is asymptotically equivalent to the
MP rejection rule based on F̃. The two rules take quadratic forms in b that have asymptotically
equivalent coefficients.
We next specify the uniformly most powerful (UMP) test of H0 : βx = β0 against one-sided
alternatives under F̃. Let s∆β be a constant equal to 1 if H1 : βx > β0, and equal to −1 if
H1 : βx < β0. Note that the part of F̃ that depends on βx is a quadratic function proportional
5Note that we do not reverse the transition from the Whittle approximation with all frequencies to the Whittle approximation with the first q frequencies. Therefore, some test power is lost due to the removal of the short-run information.
to

βx² s^{u,ε}_{uu}(0) ∑_{j=−q}^{q} I^{x,y}_{T,xx}(ωj) + 2βx ∑_{j=−q}^{q} ((e^{−iωj} − ρ) s^{u,ε}_{uε}(0) I^{x,y}_{T,xx}(ωj) − s^{u,ε}_{uu}(0) I^{x,y}_{T,xy}(ωj)).

As discussed in Jansson and Moreira (2006) and in Campbell and Yogo (2006), the derivation of the
UMP test is complicated by the form of this statistic, which presents a weighted sum of two sufficient
statistics. However, note that the distribution of I^{x,y}_{T,xx}(ω) does not depend on βx; therefore,
by the conditionality argument (see Basu (1977)), the UMP test can be derived based on the
conditional distribution of I^{x,y}_{T,xy} given I^{x,y}_{T,xx} (see Jansson and Moreira, 2006). The UMP test,
therefore, rejects for small values of
s∆β ∑_{j=−q}^{q} ((e^{−iωj} − ρ) s^{u,ε}_{uε}(0) I^{x,y}_{T,xx}(ωj) − s^{u,ε}_{uu}(0) I^{x,y}_{T,xy}(ωj)),
or, equivalently, for large values of

s∆β [∑_{j=−q}^{q} (I^{x,y}_{T,xy}(ωj) − β0 I^{x,y}_{T,xx}(ωj) − (e^{−iωj} − ρ)(s^{u,ε}_{uε}(0)/s^{u,ε}_{uu}(0)) I^{x,y}_{T,xx}(ωj))] / √(∑_{j=−q}^{q} I^{x,y}_{T,xx}(ωj)).
Finally, substituting the definition of Ỹ, we obtain the rejection rule

s∆β [∑_{j=−q}^{q} (I^{x,y}_{T,xy}(ωj) − β0 I^{x,y}_{T,xx}(ωj) − (I_{T,xx+}(ωj) − ρ I^{x,y}_{T,xx}(ωj))(s^{u,ε}_{uε}(0)/s^{u,ε}_{uu}(0)))] / √(∑_{j=−q}^{q} I^{x,y}_{T,xx}(ωj)) > K,   (4)

where K is a constant defined by the significance level of the test and the cross-periodogram
I_{T,xx+}(ω) is (2πT)^{−1} (∑_{t=1}^{T} (xt−1 − µx) exp(−iωt)) × (∑_{t=1}^{T} (xt − µx) exp(iωt)).
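As a sketch, the left-hand side of the rejection rule in (4) is straightforward to evaluate once the periodogram ordinates are in hand. The helper below is hypothetical: the arrays `I_xy`, `I_xx`, `I_xxp` are assumed to collect I^{x,y}_{T,xy}(ωj), I^{x,y}_{T,xx}(ωj), and I_{T,xx+}(ωj) over j = ±1, .., ±q, with ρ and the ratio s^{u,ε}_{uε}(0)/s^{u,ε}_{uu}(0) treated as known, as in this section:

```python
import numpy as np

def lw_test_stat(I_xy, I_xx, I_xxp, beta0, rho, ratio, s_db=1):
    # Left-hand side of rule (4); `ratio` stands in for s_ue(0)/s_uu(0)
    # and s_db for the sign constant s_{Delta beta}. All inputs assumed
    # precomputed (hypothetical interface, not the paper's code).
    num = np.sum(I_xy - beta0 * I_xx - (I_xxp - rho * I_xx) * ratio)
    return s_db * np.real(num) / np.sqrt(np.real(np.sum(I_xx)))
```

When I_xy = β0·I_xx and I_xxp = ρ·I_xx exactly, the statistic is zero, consistent with the centering of the rule under H0.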
There are obvious similarities between the above test and the Q-test developed by Campbell
and Yogo (2006). The Q-test is proportional to the sample covariance of the processes xt−1 and
vt = yt − β0xt−1 − (s^{u,ε}_{uε}(0)/s^{u,ε}_{uu}(0))(xt − ρxt−1) divided by the sample deviation of xt−1. The rejection rule
in (4) can be expressed as follows: s∆β (∑_{j=−q}^{q} I^{x,v}_{T,xv}(ωj)) (∑_{j=−q}^{q} I^{x,v}_{T,xx}(ωj))^{−1/2} > K. Because
the periodograms at low frequencies measure the long-run variances, the test proposed here
is proportional to the sample long-run covariance of the processes xt−1 and vt divided by the
long-run sample deviation of xt−1.
The maximum likelihood estimator (MLE) that corresponds to the likelihood with the principal component F̃ equals

β̃x,(µy) = [∑_{j=−q}^{q} (I^{x,y}_{T,yx}(ωj) − (I_{T,xx+}(ωj) − ρ I^{x,y}_{T,xx}(ωj)) r ψεε^{1/2}/ψuu^{1/2})] / [∑_{j=−q}^{q} I^{x,y}_{T,xx}(ωj)],

where ψuu = 2π s^{u,ε}_{uu}(0) is the long-run variance of ut, ψεε = 2π s^{u,ε}_{εε}(0) is the long-run variance
of εt, and r = s^{u,ε}_{uε}(0)/√(s^{u,ε}_{uu}(0) s^{u,ε}_{εε}(0)) is the long-run correlation between these two shocks. The
corresponding long-run conditional variance of εt is, therefore, ψεε|u = ψεε(1 − r²).
Finally, we account for the fact that µy is unknown. It appears that to account for the
unknown mean of the Y series, it is necessary only to remove the zero frequencies from the
likelihoods. The argument proceeds as follows. As in Jansson and Moreira (2006), we note that
the testing problem for βx is invariant under the location transformation of Y . Therefore, we
consider the conditional likelihood of the maximal invariant under this transformation, which is
∆Y = (y2−y1, y3−y1, ..., yT−y1). The likelihood of this invariant can be obtained by replacing µy
with µ̂y that maximizes the likelihood of all observations: µ̂y = (1/T) ∑_{t=1}^{T} yt − (s^{x,y}_{yx}(0)/s^{x,y}_{xx}(0)) ((1/T) ∑_{t=1}^{T} xt−1 −
µx). When we substitute this estimator into log LT, the only term that is affected is the one in
which ωj = ω0 = 0. For this term, the joint periodogram of X and Y becomes the periodogram
of X and (s^{x,y}_{yx}(0)/s^{x,y}_{xx}(0)) X. Importantly, the corresponding tr(s^{x,y}(0)^{−1} I^{x,y}_T(0)) becomes I^{x,x}_{T,xx}(0)/s^{x,y}_{xx}(0)
and, therefore, does not depend on βx, i.e., will not appear in the tests.
Therefore, to account for µy being unknown, the only necessary modification is to remove
the term with j = 0. Thus, the UMP test involves the ratio (β̃x − β0)/(∑_{j=±1,..,±q} I^{x,y}_{T,xx}(ωj))^{−1/2},
where β̃x is the MLE for the concentrated likelihood,

β̃x = [∑_{j=±1,..,±q} (I^{x,y}_{T,yx}(ωj) − (I_{T,xx+}(ωj) − ρ I^{x,y}_{T,xx}(ωj)) r ψεε^{1/2}/ψuu^{1/2})] / [∑_{j=±1,..,±q} I^{x,y}_{T,xx}(ωj)].
We will refer to this statistic as the Local Whittle (LW) estimator.
Note that the first component of this estimator is the FDLS estimator,

[∑_{j=±1,..,±q} I^{x,y}_{T,yx}(ωj)] / [∑_{j=±1,..,±q} I^{x,y}_{T,xx}(ωj)],

by Robinson (1994). The FDLS is the estimator of the co-movement between xt−1 and yt at
low frequencies. For a fractionally integrated xt, the FDLS consistently estimates the slope
βx. The same result holds for nearly integrated processes. The role of the second component,

−[∑_{j=±1,..,±q} (I_{T,xx+}(ωj) − ρ I^{x,y}_{T,xx}(ωj)) r ψεε^{1/2}/ψuu^{1/2}] [∑_{j=±1,..,±q} I^{x,y}_{T,xx}(ωj)]^{−1},

is to adjust for the asymptotic bias. In other words, the LW estimator is a bias-adjusted version of the
FDLS, akin to the estimator in the Q-test being a bias-adjusted version of the OLS.
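The decomposition into an FDLS ratio and a bias-adjustment term can be made concrete in a short sketch (the helper names are hypothetical, and the periodogram sums over j = ±1, .., ±q together with ρ, r, ψεε, ψuu are assumed to be precomputed inputs):

```python
import numpy as np

def fdls(I_yx, I_xx):
    # FDLS: ratio of low-frequency cross- to auto-periodogram sums
    return np.real(np.sum(I_yx)) / np.real(np.sum(I_xx))

def lw(I_yx, I_xx, I_xxp, rho, r, psi_ee, psi_uu):
    # LW sketch: FDLS numerator minus the bias-adjustment term,
    # divided by the same denominator (hypothetical interface).
    adj = (np.sum(I_xxp) - rho * np.sum(I_xx)) * r * np.sqrt(psi_ee / psi_uu)
    return np.real(np.sum(I_yx) - adj) / np.real(np.sum(I_xx))
```

When the adjustment term vanishes (e.g., I_{T,xx+}(ωj) = ρ I^{x,y}_{T,xx}(ωj) exactly), `lw` reduces to `fdls`, mirroring the statement that the LW estimator is a bias-adjusted FDLS.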
3 Asymptotic Properties Under the General Assumptions
The derivation of the asymptotic distributions is based on the results listed in Appendix A that
require Assumption A (and do not require Assumption B): the partial sums T^{−1/2} ∑_{t=1}^{⌊τT⌋} ut,
T^{−1/2} ∑_{t=1}^{⌊τT⌋} εt, and T^{−1/2} x_{⌊τT⌋} converge jointly to the processes √ψuu W^u(τ), √ψεε W^ε(τ), and
√ψuu Jc(τ), where the vector process (√ψuu W^u(τ), √ψεε W^ε(τ))′ is a Brownian motion with the
variance Ω equal to the long-run variance of the vector et, i.e., Ω = E et et′ + ∑_{j≠0} E et e′_{t−j}.
Process Jc(τ) is the Ornstein-Uhlenbeck (O-U) process, with the dynamics dJc(τ) = cJc(τ)dτ +
dW^u(τ), starting from zero. Theorem 1 derives the asymptotic distributions of the two frequency-based estimators defined in the previous section, namely, the LW and FDLS estimators.
Theorem 1. Let the vector (xt−1, yt) follow the dynamics defined in system (1), with ρT = 1 + c/T;
the initial x0 is a random variable whose distribution does not depend on T, and the shocks et =
(ut, εt)′ satisfy Assumption A. Define two estimators:

β̂^{FDLS}_x = [∑_{j=±1,..,±q} I^{x,y}_{T,yx}(ωj)] / [∑_{j=±1,..,±q} I^{x,y}_{T,xx}(ωj)]   (5)

and

β̃x = [∑_{j=±1,..,±q} (I^{x,y}_{T,yx}(ωj) − (I_{T,xx+}(ωj) − ρT I^{x,y}_{T,xx}(ωj)) r ψεε^{1/2}/ψuu^{1/2})] / [∑_{j=±1,..,±q} I^{x,y}_{T,xx}(ωj)].   (6)

For a fixed q ≥ 1, plim β̃x = βx and plim β̂^{FDLS}_x = βx and, as T → ∞, the following weak
convergence results hold:

T(β̂^{FDLS}_x − βx) ⇒ (r ψεε^{1/2}/ψuu^{1/2})(c + δq) + √(ψεε|u/ψuu) Z / √(∑_{j=±1,..,±q} |∫_{τ=0}^{1} Jc(τ) e^{−i2πjτ} dτ|²),   (7)

T(β̃x − βx) ⇒ √(ψεε|u/ψuu) Z / √(∑_{j=±1,..,±q} |∫_{τ=0}^{1} Jc(τ) e^{−i2πjτ} dτ|²),   (8)

where Z is a standard normal variable independent of the processes W^u(τ) and Jc(τ). The term
δq is a small random variable that is defined by the integral

δq = Jc(1) ∑_{j=±1,..,±q} (∫_{τ=0}^{1} Jc(τ) e^{−i2πjτ} dτ) / ∑_{j=±1,..,±q} |∫_{τ=0}^{1} Jc(τ) e^{−i2πjτ} dτ|².

Define the following LW statistic: LW(βx) = (β̃x − βx) √((ψεε|u/2π)^{−1} [∑_{j=±1,..,±q} I^{x,y}_{T,xx}(ωj)]). Then

LW(βx) ⇒ N(0, 1).   (9)
Analyzing the limits in (7) and (8), we see that the LW estimator removes two biases. The
first is the small-sample bias (r ψεε^{1/2}/ψuu^{1/2}) c, which is due to ρ ≠ 1. The second is the component (r ψεε^{1/2}/ψuu^{1/2}) δq.
This component arises from the difference between the sample periodogram of xt and the sample
cross-periodogram of xt and xt−1. These corrections are similar to those embedded in the Q-test
of Campbell and Yogo (2006). The Q-test is based on an estimator that is equal to the OLS
minus (r ψεε^{1/2}/ψuu^{1/2})(1 − ρ) and minus a stochastic bias due to the difference between the first sample
autocovariance and the sample variance of xt.
Finally, note that the UMP test from Section 2 coincides with the Z-test based on the t-
statistic (LW statistic) for βx. The asymptotically normal distribution of this test simplifies its
application as discussed in the next section.
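The limiting Ornstein-Uhlenbeck process Jc that appears in Theorem 1 can be approximated on [0, 1] by a simple Euler scheme. The sketch below is illustrative only; the step size, seed, and function name are arbitrary choices, not part of the paper's procedure:

```python
import numpy as np

def simulate_ou(c=-10.0, n=2000, seed=1):
    # Euler discretization of dJ_c = c * J_c dt + dW on [0, 1], J_c(0) = 0
    rng = np.random.default_rng(seed)
    dt = 1.0 / n
    dW = rng.normal(0.0, np.sqrt(dt), n)
    J = np.zeros(n + 1)
    for k in range(n):
        J[k + 1] = J[k] + c * J[k] * dt + dW[k]
    return J

J = simulate_ou()
```

Repeating such draws and evaluating the integrals ∫ Jc(τ) e^{−i2πjτ} dτ numerically is one way to tabulate the limit distributions in (7) and (8) by Monte Carlo.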
4 Comparison of the Tests When ρ is Unknown
The performance of the tests derived in Section 2 greatly depends on how effectively we can
bound the possible values of ρ, or, equivalently, c. Because c cannot be consistently estimated,
one typical approach is to construct conservative intervals based on a set of likely values of ρ. In
this section, we compare the performance of the Q-test, the LW test and long-horizon regressions
when the confidence intervals are constructed using Bonferroni bounds (see Cavanagh, Elliott,
and Stock, 1995).
The idea behind Bonferroni bounds is as follows. First, one constructs a (100 − α1)%
confidence interval for ρ, e.g., (ρ̲, ρ̄). Second, for each ρ in this interval, one determines a (100 − α2)% confidence interval for βx, e.g., (β̲x(ρ), β̄x(ρ)). Lastly, these two intervals are combined to
construct conservative (100 − α1 − α2)% intervals for βx as a set that contains (β̲x(ρ), β̄x(ρ)) for
all ρ ∈ (ρ̲, ρ̄). The coverage of such confidence intervals cannot be less than (100 − α1 − α2)% for any
value of c.
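The unadjusted construction described above amounts to a union bound over the ρ interval. As a minimal sketch (the function is hypothetical; `beta_interval` stands for the per-ρ interval (β̲x(ρ), β̄x(ρ)), evaluated on a grid of ρ values):

```python
def bonferroni_interval(beta_interval, rho_grid):
    # Smallest interval containing every per-rho beta interval over the
    # rho confidence set: the unadjusted Bonferroni bound (sketch).
    lows, highs = zip(*(beta_interval(r) for r in rho_grid))
    return min(lows), max(highs)
```

For instance, if each per-ρ interval were (ρ − 1, ρ + 1) over a grid covering [0.90, 1.0], the combined interval would be (−0.10, 2.0).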
However, the intervals constructed in this way do not attain the nominal significance level,
(α1 + α2)%, for any value of c, unless the distribution of the test does not depend on c. These
intervals are, therefore, overly conservative and can be further adjusted to increase the power of
the tests. We use one such approach, formally known as the adjusted Bonferroni method, to
construct equal-tailed confidence intervals for βx based on β̃x in (6).
4.1 Algorithm
Let α = 90%. The intervals to be determined must cover βx with a probability of at least 90%,
so that the probability of each tail does not exceed 5%. Cavanagh, Elliott, and Stock (1995)
explain how to generally construct adjusted Bonferroni bounds. The goal of this subsection is
to show how to apply this method to the LW estimator. We first summarize the approach and
then proceed to the exact algorithm applied in the empirical section of this paper.
First, note that the LW estimator in (6) can be thought of as a function of ρ, i.e., β̃x(ρ).
The asymptotic distribution of this estimator is mixed normal with standard deviation sβ =
[(ψεε|u/2π) / ∑_{j=±1,..,±q} I^{x,y}_{T,xx}(ωj)]^{1/2}. Without loss of generality, consider s^{u,ε}_{uε}(0) < 0 and, therefore,
r < 0. The adjusted Bonferroni bounds [β̲x, β̄x] will take the form β̲x = β̃x(ρU) − q_{100−α2/2} sβ,
β̄x = β̃x(ρL) + q_{100−α2/2} sβ, where q_{100−α2/2} is the (100 − α2/2)th percentile of the standard normal
distribution. For example, for α2 = 10%, the value of q_{100−α2/2} is 1.645.
Second, we determine the bounds [ρL, ρU] to replace the (100 − α1)% confidence interval for ρ in the unadjusted Bonferroni method. These bounds should satisfy the condition that βx falls in each of the tails with asymptotic probability of less than 5% for any value of ρ = 1 + c/T. In practice, the values of c used to verify this condition are put onto a grid that extends from −50 to 5. The solutions take the form ρL = 1 + cL/T and ρU = 1 + cU/T, where cU = cU(r, tρ) and cL = cL(r, tρ). The second argument, tρ, is a sample statistic that provides information about ρ. We follow Campbell and Yogo (2006) in selecting the DF-GLS statistic from Elliott, Rothenberg, and Stock (1996). Table 1 reports cU and cL for a set of values of tρ and r in the case with q = 5.
The steps of the following algorithm mirror those in the program provided by M. Yogo for the Q-test.6 We describe the steps for r < 0. One can always consider the explanatory variable −xt if r > 0.
STEP 1: Construction of the Dickey-Fuller GLS statistic tρ.
Determine the number of lags p required in the autoregressive model of xt using the BIC criterion. Define a = 1 − 7/T. Construct vectors Xa = (xp, xp+1 − a·xp, ..., xT − a·xT−1)′ and Za = (1, 1 − a, ..., 1 − a)′. Regress Xa on Za to obtain the slope estimate βa.
Regress (xp+1 − xp, ..., xT − xT−1)′ on (xp − βa, ..., xT−1 − βa)′, (xp − xp−1, ..., xT−1 − xT−2)′, (xp−1 − xp−2, ..., xT−2 − xT−3)′, ..., (x2 − x1, ..., xT−p+1 − xT−p)′. The OLS t-statistic for the first slope in this regression is the Dickey-Fuller GLS statistic tρ.
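Step 1 can be sketched numerically as follows. This is a simplified rendering of the GLS demeaning and the ADF-type regression; the helper dfgls_t and the simulated series are illustrative, not the paper's code:

```python
import numpy as np

def dfgls_t(x, p):
    # Sketch of Step 1 (constant case): GLS-demean x with a = 1 - 7/T,
    # then run an ADF-type regression with p lagged differences.
    T = len(x)
    a = 1.0 - 7.0 / T
    xa = np.concatenate(([x[0]], x[1:] - a * x[:-1]))      # quasi-differenced series
    za = np.concatenate(([1.0], np.full(T - 1, 1.0 - a)))  # quasi-differenced constant
    beta_a = (za @ xa) / (za @ za)                          # GLS estimate of the mean
    xd = x - beta_a                                         # GLS-demeaned series
    dx = np.diff(xd)
    y = dx[p:]
    cols = [xd[p:-1]] + [dx[p - k:len(dx) - k] for k in range(1, p + 1)]
    Z = np.column_stack(cols)
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ b
    s2 = resid @ resid / (len(y) - Z.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[0, 0])
    return b[0] / se                                        # t-stat on the lagged level

# Illustration on a simulated stationary AR(1) series with a fixed seed.
rng = np.random.default_rng(0)
e = rng.standard_normal(300)
x = np.empty(300)
x[0] = e[0]
for t in range(1, 300):
    x[t] = 0.9 * x[t - 1] + e[t]
t_stat = dfgls_t(x, p=1)
```

For a stationary series such as this AR(1), the statistic is negative, reflecting the mean reversion that the test is designed to detect.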
STEP 2: Estimation of r and ψεε|u.
Regress xt on xt−1 and a constant to obtain an estimate of ut. Regress yt on xt−1 and a constant to obtain an estimate of εt.
Fit a vector autoregression (VAR) to the estimated series (ut, εt). Suppose the corresponding polynomial in the lag operator is Ψ(L) and the variance of the VAR shocks is Ω^(S).
The 2 × 2 long-run variance matrix is Ω^(L) = (I − Ψ(1))^{−1} Ω^(S) (I − Ψ(1)′)^{−1}. The estimate of the long-run correlation between ut and εt is r = Ω^(L)_{(1,2)} (Ω^(L)_{(1,1)} Ω^(L)_{(2,2)})^{−1/2}, and the estimate of the conditional long-run variance of εt is ψεε|u = Ω^(L)_{(2,2)} (1 − r²).
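The Step-2 mapping from the fitted VAR to the long-run quantities Ω^(L), r, and ψεε|u can be sketched as follows; the inputs Psi1 and Omega_S below are hypothetical VAR estimates, not values from the paper:

```python
import numpy as np

def long_run_params(Psi1, Omega_S):
    # Step-2 mapping from VAR estimates to long-run quantities:
    #   Omega_L = (I - Psi(1))^{-1} Omega_S (I - Psi(1)')^{-1},
    # then the long-run correlation r and the conditional
    # long-run variance psi_{ee|u} = Omega_L[2,2] * (1 - r^2).
    I = np.eye(2)
    M = np.linalg.inv(I - Psi1)
    Omega_L = M @ Omega_S @ M.T
    r = Omega_L[0, 1] / np.sqrt(Omega_L[0, 0] * Omega_L[1, 1])
    psi_eps_given_u = Omega_L[1, 1] * (1.0 - r ** 2)
    return Omega_L, r, psi_eps_given_u

# Hypothetical VAR(1) coefficient sum Psi(1) and shock variance, for illustration.
Omega_L, r, psi = long_run_params(np.array([[0.5, 0.0], [0.0, 0.2]]),
                                  np.array([[1.0, -0.6], [-0.6, 1.0]]))
```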
6 We are grateful to Motohiro Yogo for making the code available at https://sites.google.com/site/motohiroyogo/.
STEP 3: Construction of the adjusted Bonferroni bounds.
Find the bounds cL and cU based on the values of tρ and r. Tables similar to Table 1 can be used. Define ρL = 1 + cL/T and ρU = 1 + cU/T. The adjusted Bonferroni bounds take the form:

βxL = βx(ρU) − 1.645 ( ψεε|u / ( 2π Σ_{j=±1,..,±q} I^{x,y}_{T,xx}(ωj) ) )^{1/2},    (10)

βxU = βx(ρL) + 1.645 ( ψεε|u / ( 2π Σ_{j=±1,..,±q} I^{x,y}_{T,xx}(ωj) ) )^{1/2}.
4.2 Comparison of the Asymptotic Power Functions
Having defined equal-tailed (100 − α)% confidence intervals, we can, for example, perform a test of size 0.5α of H0 : βx = β0 against H1 : βx > β0 using the rejection rule that the lower confidence bound for βx exceeds β0. Figure 1 shows a comparison of the asymptotic powers of the Q-test, the LW test, and the tests based on OLS t-statistics in simple and long-horizon regressions7 when α = 10% and β0 = 0. The adjusted Bonferroni method described in the previous subsection is applied to all of these estimators.8
First, we consider the case with E(εt|xt−1, xt−2, ...) = 0, i.e., without contamination by short-
run dynamics. In this case, xt−1 is exogenous, and the short-run and long-run relations of xt
and yt coincide. For the LW test, we select q = 10, which corresponds to removing cycles
with periodicities of less than T/10, i.e., less than 5 years in 50 years of data. For the long-
horizon regression, we choose a matching horizon H = T/10 and the number of lags for Newey-
West standard errors is L such that L ∼ H. Parameters ψεε and ψuu are set to 1. Therefore, the comparison of the tests depends on the persistence c, the correlation r, and the level of predictability b = βxT. As in Campbell and Yogo (2006),9 we select c = −2 and c = −20,
r = −0.95 and r = −0.75, and we look at a range of values for b.
As expected, the Q-test yields the highest power in rejecting H0 because the LW test ignores
the short-run information. Nevertheless, the LW test is as powerful as the Q-test for three of
the four calibrations, and it performs comparably for c = −20 and r = −0.75. The LW test
outperforms the OLS t-test for all four calibrations. Lastly, among the considered four methods,
7 We consider long-horizon regressions in which the aggregate quantity yt(H) = yt + ... + yt+H−1 is regressed on xt−1. For these regressions, Valkanov (2003) works out an asymptotic theory for ρT = 1 + c/T and H/T → λ, λ > 0. Valkanov (2003) derives the distribution of the t-tests with OLS standard errors. This paper considers t-statistics that are calculated with Newey-West standard errors with a number of lags L ≥ H − 1, which are more justifiable. The supplementary materials contain the details of the asymptotic behavior of t-statistics in long-horizon regressions.
8 Note that although the formulas for the simple OLS and long-horizon t-statistics do not depend on ρ, their asymptotic distributions will; see, for example, Hjalmarsson (2011).
9 To obtain comparable results, we use the program provided by M. Yogo on his website for the Q-test. In addition, we update this program to compute the LW test.
long-horizon regressions show the weakest results: the tests based on the long-horizon OLS
estimates often yield less than half the power of the other tests.
Figure 2 shows the results for the case in which the short-run relation between yt and xt−1 differs from the long-run relation, i.e., for E(εt|xt−1, xt−2, ...) ≠ 0. We consider a case in which the correlation of εt and xt−1 has the opposite sign to βx. Such an effect might, for example, occur in the presence of measurement error in xt. Let the measurement error constitute 0.25% of the long-run variance of εt, while keeping the long-run variance of (ut, εt) the same as in the previous exercise.
The power functions of the LW test and the long-horizon regressions do not change from the previous case because neither depends on the assumption E(εt|xt−1, xt−2, ...) = 0. However, the
asymptotic distributions of the Q-test and t-test rely on the exogeneity assumption, and therefore,
their performance is affected. It follows from Figure 2 that the LW test now outperforms the
Q-test uniformly. In fact, even the long-horizon regressions could slightly outperform the Q-test
for a small range of parameters. Note that we obtain this result with a mild deviation from the
exogeneity assumption.
One point to clarify is that the confidence intervals for the Q-test and t-test can be adjusted
to circumvent the effect of the endogeneity. However, this correction depends on the unknown
dynamics of et = (ut, εt), and the use of the estimated model is likely to affect the powers of these
methods. The frequency-domain method that is suggested here does not require adjustment: the
LW test is immune to short-run endogeneity. The same holds for the long-horizon regressions,
although as demonstrated in Figure 2, the long-horizon regressions suffer from a lack of accuracy
in determining the predictability.
5 Long-Run Nearly Optimal Test
Although the LW test is asymptotically UMP in the Gaussian model with known persistence parameter ρ, its efficiency can be lost once we apply the Bonferroni method to
remove the dependence on ρ. Elliott, Muller, and Watson (2014) suggest a different approach
to finding the tests with optimal properties. This approach is based on asymptotically least
favorable distributions (ALFDs). Their general method works for non-standard testing problems
in which nuisance parameters affect the asymptotic distributions under H0. In our case, the
nuisance parameter is c. Elliott, Muller, and Watson (2014) consider an application of their
ALFD test to predictability studies under the assumptions of Campbell and Yogo (2006). Here,
we extend their method to the study of the long-run predictability. We derive the corresponding
nearly optimal test and compare the performance of the LW test with the resulting power bound.
Briefly, the idea behind the ALFD test is that the optimal test is the Neyman-Pearson test for
a problem with a known “distribution” of the nuisance parameter (here, c), if this distribution
yields the minimum weighted average power (WAP). The existence of the ALFD often cannot
be proved and, even if it exists, the ALFD is unlikely to be known. Nevertheless, the numerical
method by Elliott, Muller, and Watson (2014) ensures an ε-optimality; that is, we can find a
test whose power is not more than ε below the WAP upper bound.
For the long-run version of the ALFD test here, we combine the information about the conditional distribution of ∆Y = (y2 − y1, ..., yT − y1) given X at low frequencies with the information about the parameter ρ contained in the distribution of X. Specifically, in the Gaussian likelihood log LT(∆Y, X) = log LT(∆Y|X) + log LT(X), we replace only the conditional distribution of ∆Y given X with the asymptotically equivalent Whittle approximation, log L̃T(Y, X) = log L̃T(∆Y|X) + log LT(X). Details regarding the derivation of the likelihood are given in Appendix E. Importantly, the conditional likelihood can be represented by the sum

log L̃T(∆Y|X) = Σ_{j=1}^{⌈(T−1)/2⌉} lj(dy(ωj)|X),

where lj(dy(ωj)|X) are conditional probability densities of the Fourier transformations of Y, dy(ωj) = (2πT)^{−1/2} Σ_{t=1}^{T} yt e^{−iωj t}. As before, we rely on the semi-parametric approach that leaves the densities lj(dy(ωj)|X) for q < j ≤ ⌈(T−1)/2⌉ unspecified. As follows from the derivations in the Appendix, the resulting conditional likelihood takes the form

log L̃T(∆Y|X) = log R(∆Y|X, s^{x,y}(ωj), q < j ≤ ⌈(T−1)/2⌉) − q log 2π
− (1/2) Σ_{j=±1,..,±q} [ log( det(s^{u,ε}(ωj)) / s^{u,ε}_{uu}(ωj) ) + |d̃y(ωj) − βx dx(ωj) − (e^{iωj} − ρ)(s^{u,ε}_{εu}(ωj)/s^{u,ε}_{uu}(ωj)) dx(ωj)|² / ( det(s^{u,ε}(ωj)) s^{u,ε}_{uu}(ωj)^{−1} ) ],

where R(∆Y|X, s^{x,y}(ωj), q < j ≤ ⌈(T − 1)/2⌉) is the remainder of the conditional likelihood that describes the short-run dynamics, and d̃y(ωj) is the Fourier transformation of Ỹ = Y + Re(δy)(x0 − xT), as introduced in Section 2.
The marginal distribution of X is derived by assuming normal i.i.d. ut in (1). Subsequently, however, the argument is made that the test based on this likelihood achieves the same asymptotic power in the more general case (Assumption A). By the assumptions, the distribution of x0 does not depend on ρ, and therefore, the marginal likelihood equals

log LT(X) = −((T − 1)/2) log 2π − ((T − 1)/2) log ψuu − (1/(2ψuu)) Σ_{t=1}^{T−1} (xt − µx − ρ(xt−1 − µx))²

up to the density of x0. The parameter ψuu is defined in Section 2 as the long-run variance of ut and coincides with the variance of ut in the i.i.d. case.
We obtain the joint likelihood log L̃T(Y, X) by adding the log of the conditional distribution to the log of the marginal distribution of X, as defined above. We express the original likelihood in the neighborhood of βx = β0 and ρ = 1 in terms of b′ and c, where βx = β0 + T^{−1} (ψεε|u/ψuu)^{1/2} b′.10 The tested hypothesis then becomes H0 : b′ = 0. Because the nearly optimal test depends on the likelihood ratios, the rejection rule requires only the part of the likelihood that is a function of b′ or c:

f(R|b′, c) = exp( b′Rβ + cRρ − (1/2)(b′ − cr(1 − r²)^{−1/2})² Rββ − (1/2) c² Rρρ ),

where the four sufficient statistics in the practical implementation will be replaced by the asymptotically equivalent R = (Rβ, Rρ, Rββ, Rρρ), defined as follows:
Rβ = (ψuu ψεε|u)^{−1/2} (1/T) Σ_{j=−q,..,q, j≠0} dx(ωj) ( dy(ωj) − β0 dx(ωj) − r (ψεε/ψuu)^{1/2} (dx+(ωj) − dx(ωj)) )*,

Rρ = (1/2) ( ψuu^{−1} T^{−1} (xT−1 − x0)² − 1 ) − r(1 − r²)^{−1/2} Rβ,

Rββ = ψuu^{−1} T^{−2} Σ_{j=−q,..,q, j≠0} dx(ωj) dx(ωj)*,

Rρρ = ψuu^{−1} T^{−2} Σ_{t=1}^{T−1} (xt−1 − x0)².
Denote the ALFD by Λ*(c) and the pre-specified weights for the WAP by F(b′, c), with the normalization ∫_{b′,c} dF(b′, c) = 1. Once the ALFD is found, the testing procedure is based on the following rejection rule:

RALFD(R, Λ*) = 1{ ( ∫_{b′,c} f(R|b′, c) dF(b′, c) ) / ( ∫_c f(R|0, c) dΛ*(c) ) > Kα },

in which 1{·} denotes the indicator function and the critical value Kα corresponds to the significance level α.
10 Note the difference between the parameterization of βx in this section (which follows the notation of Jansson and Moreira (2006) and Elliott, Muller, and Watson (2014)) and in Section 4 (which follows the notation of Campbell and Yogo (2006)): the relation between the localization parameters is b = b′ ((1 − r²) ψεε ψuu^{−1})^{1/2}.
The search for Λ*(c) and Kα is performed by using the asymptotic limits of R. The limit of R is referred to as R(b′, c) = (Rβ, Rρ, Rββ, Rρρ), with the elements
Rβ = (1/2π) Σ_{j=−q,..,q, j≠0} ( ∫_0^1 Jc(τ) e^{−2πijτ} dτ ) ( ∫_0^1 e^{−2πijτ} dW^z(τ) + (b′ − cr(1 − r²)^{−1/2}) ∫_0^1 Jc(τ) e^{−2πijτ} dτ )*,

Rρ = (1/2)( Jc(1)² − 1 ) − r(1 − r²)^{−1/2} Rβ,

Rββ = (1/2π) Σ_{j=−q,..,q, j≠0} | ∫_0^1 e^{−2πijτ} Jc(τ) dτ |²,

Rρρ = ∫_0^1 Jc(τ)² dτ,

where W^z(τ) is a standard Brownian motion that is independent of the O-U process Jc(τ).
We can find an approximation to the ALFD and the corresponding Kα. Suppose that we are interested in the one-sided α-size test with α = 5%. Without loss of generality, let b′ > 0 under the alternative. Elliott, Muller, and Watson (2014) suggest F(b′, c) that places equal weights on the points (b′i, ci), i = 1, .., 57, where b′i = 1.645 ((6 − 2ci)/(1 − r²))^{1/2} and ci = −0.0625(i − 1)². The approximate ALFD for the 5%-size test is also searched among mixtures of point masses ci. In Section 4, we used the interval [−50, 5] to validate the significance levels of the tests, i.e., we allowed for small positive values of c. For a fair comparison, we extend the set of ci to include ci = +0.0625(i − 1)², such that b′i = 1.645 ((6 − 2ci)/(1 − r²))^{1/2} is defined, i.e., ci < 3.
Elliott, Muller, and Watson (2014) solve for µ*i = log(λ*i Kα), where λ*i is the probability weight that Λ*(c) puts on ci. They note that λ*i > 0 only for those c = ci for which the asymptotic probability of a false test rejection, i.e., the probability of the event RALFD(R(0, ci), Λ*) = 1, is exactly α. Therefore, the ALFD Λ*(c) is the fixed point of the problem G(Λ) = Λ, Λ = (λ1, .., λM), where the jth element of G : [0, 1]^M → [0, 1]^M is

Gj(Λ) = ( λj + max(0, Pr(RALFD(R(0, cj), Λ) = 1) − α) ) / Σ_{i=1}^{M} ( λi + max(0, Pr(RALFD(R(0, ci), Λ) = 1) − α) ).

Details of the algorithm can be found in Elliott, Muller, and Watson (2014). Using this method with the given grid {ci}_{i=1}^{M}, we can obtain ε-ALFD tests (for different values of the correlation r) with ε ≤ 0.5%, i.e., tests whose weighted power is within 0.5% of the power bound.
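The fixed-point iteration implied by G(Λ) = Λ can be sketched schematically as follows. Here rej_prob is a toy stand-in for the Monte Carlo estimate of the null rejection probability Pr(RALFD(R(0, ci), Λ) = 1), so the resulting weights are purely illustrative:

```python
# Schematic fixed-point iteration for the ALFD weights Lambda = (lambda_1,..,lambda_M).
# rej_prob is a TOY stand-in for the simulated null rejection probability; in
# practice it would be estimated by Monte Carlo draws of the limiting statistics.

def rej_prob(i, lam):
    # Toy model: placing more weight on point i lowers its null rejection rate.
    return 0.10 - 0.08 * lam[i]

def alfd_weights(M, alpha=0.05, n_iter=200):
    lam = [1.0 / M] * M                 # start from uniform weights
    for _ in range(n_iter):
        g = [lam[j] + max(0.0, rej_prob(j, lam) - alpha) for j in range(M)]
        s = sum(g)
        lam = [v / s for v in g]        # renormalize so G maps into the simplex
    return lam

weights = alfd_weights(M=5)
```

The map raises the weight of any point whose rejection rate exceeds α and renormalizes, which is exactly the adjustment direction required of a least favorable distribution.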
Figure 3 compares the powers of the LW test and the nearly optimal test for the same
parameters as in Section 4. As follows from the graph, the LW test has the same power as the
nearly optimal test for c = −20 but is less accurate for c = −2. Therefore, an additional gain in
accuracy can be achieved by applying the long-run nearly optimal test. However, the LW and
the long-run nearly optimal tests yield the same qualitative results in the empirical applications
from the next section, and consequently, only the results for the LW test will be reported.
6 Applications
6.1 “Importance of Measuring Payout Yield”
Among the many stock return predictors, the price-dividend ratio stands out as one of the
most strongly supported by economic theory. Campbell and Shiller (1988) noted that the price-dividend ratio is related by an accounting identity to either the changes in the future discount rates or the changes in the future dividend growth. Because dividend changes are only weakly predictable (see Cochrane, 2008), price-dividend movements must be caused mainly by changes
in the expected returns. In asset pricing models, the price-dividend ratio is found to be related
to the expectations of future growth (Shiller, 1981, Bansal and Yaron, 2004) and premiums
(Bollerslev, Tauchen, and Zhou, 2009). Therefore, there are strong reasons to expect that the
price-dividend ratio predicts future returns.
With regard to the data, there are a variety of methods for calculating what would be a good
equivalent to the theoretical price-dividend ratio, such as the ratio between the company market
value and the total dividend paid during the preceding year or the ratio between an adjusted
market value of the company and an adjusted value of the dividends, such as an adjustment for
stock splits. Because some companies do not pay dividends or adopt different payout policies,
many have argued in favor of replacing the price-dividend ratio by the price-earnings ratio in
empirical work. Because all of these measures are quite persistent (e.g., the dividend yield has
the first autocorrelation of 0.86 at an annual frequency), they are good candidates for nearly
integrated modeling. One can also argue that the long-run components of all of these measures
should coincide and should have the same predictive ability for the future long-term returns.
Boudoukh, Michaely, Richardson, and Roberts (2007) discuss the implications of mismeasure-
ment of the total payout in return predictability regressions. They draw a distinction between the
dividends, total payouts (which are dividends adjusted for share repurchases), and net payouts
(which are dividends adjusted for share repurchases and equity issuances). They characterize
the problem that arises in the regressing of a stock return yt on a “wrong” payout ratio as a
measurement error problem, which is consistent with the assumptions that we made for the test
comparisons in Figure 2. The regressions of future returns on the current values of different
payout ratios present a perfect road test for the methods in this study because, plausibly, either all or none of the payout ratios predict future returns. The annual series are
defined as follows.
The unadjusted dividend yield is the logarithm of the ratio between the past dividends and the price of the stock (here, the value-weighted CRSP index). The data run from 1926 to 2010, with the price recorded at the end of the year and dividends aggregated over the preceding 12-month period. The total log payout yields (log payout ratios I and II) are two
versions of the yield series adjusted for common share repurchases. The net payout is based on
the sum of dividends and share repurchases minus equity issuances. The logarithmic net payout
series are defined as follows: log(0.1 + Net Payout). The adjusted payout series are available
for the span 1926 - 2003 from the website of Michael Roberts. Lastly, the log earnings yield
is the logarithm of the ratio between the earnings in the previous 12 months and the current
price calculated at the end of the year. Monthly earnings data on the S&P 500 over the period
1926 - 2010 are obtained from the website of Robert J. Shiller. For more information on the
construction of payout yields, see Boudoukh, Michaely, Richardson, and Roberts (2007).
Series yt is calculated as the monthly CRSP (value-weighted) excess returns aggregated to one
year. For each month, we subtract the risk-free rate from the continuously compounded CRSP
return. The risk-free rates are obtained from the website of Kenneth French. The resulting series
span the period 1926-2010.
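The construction of yt can be sketched as follows (hypothetical numbers, not CRSP data):

```python
def annual_excess_returns(monthly_log_ret, monthly_rf):
    # Subtract the risk-free rate from each continuously compounded monthly
    # return, then sum non-overlapping blocks of 12 months.
    excess = [r - rf for r, rf in zip(monthly_log_ret, monthly_rf)]
    return [sum(excess[i:i + 12]) for i in range(0, len(excess) - 11, 12)]

# Two years of hypothetical monthly returns (1%) and risk-free rates (0.2%).
y = annual_excess_returns([0.01] * 24, [0.002] * 24)
```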
Table 2 reports the 90% confidence intervals for βx in return regressions starting from the net
payout, for which the predictability evidence is the strongest, and ending with the dividend yield,
for which the link with the future returns is the weakest. All of the tests reject the hypothesis
of no predictability for the net payout ratio and payout yield I. Only the t-test fails to reject
H0 : βx = 0 for the payout yield II. Only the LW test establishes predictability by the earnings-price ratio and the dividend yield. For the dividend yield, rejecting H0 : βx = 0 requires the number of relevant frequencies in the LW estimator to be as low as q = 5.
Therefore, there is statistical evidence that all of the payout ratios considered by Boudoukh, Michaely, Richardson, and Roberts (2007) predict returns. This evidence is consistent with payouts I and II, the earnings-price ratio, and the dividend yield sharing the same long-run dynamics. The long-run component of these series predicts future returns.
6.2 Spot Exchange Rate: Uncovered Interest Rate Parity (UIP) and
Carry Trade
Let st denote the logarithm of the spot exchange rate between the currencies of countries “a” and “b”, i.e., the value of currency “b” in units of currency “a”. If iat is the nominal annual interest rate in country “a” and ibt is the nominal annual interest rate in country “b”, then the UIP states that the expected annual change in the spot rate should equal iat − ibt. The UIP follows from the forward parity Etst+1 = ft, where ft is the forward exchange rate, and from the covered interest rate parity ft − st = iat − ibt, which follows from the no-arbitrage condition.
Therefore, one can test the UIP by running the OLS regression
st+1 − st = β0 + βx(iat − ibt) + εt,
and testing H0 : βx = 1. Even if the forward parity does not hold, one would expect a positive
sign for βx, because the difference in the interest rates cannot be maintained in the long run
without eventual currency depreciation. The OLS results, however, yield a negative value of βx. Boudoukh, Richardson, and Whitelaw (2013) suggest a stylized model that explains the observed puzzle as the result of monetary policies and the carry-trade phenomenon, although they do not take a position on the reasons for the carry trade, whether rare currency crashes, time-varying premiums, or limited arbitrage. As a solution, Boudoukh, Richardson, and
Whitelaw (2013) suggest a different predictor in the UIP regression:
st+1 − st = β0 + βx(ifat−j,t,t+1 − if bt−j,t,t+1) + εt,
where ift−j,t,t+1 is the forward interest rate for the period [t, t + 1] that is set at time t − j, i.e., ift−j,t,t+1 = (j + 1)it−j,j+1 − j·it−j,j, where it−j,j and it−j,j+1 are continuously compounded j- and (j + 1)-period interest rates at time t − j. They found that the sign in the regressions reverts to positive for j = 2 − 4 years (t is in annual units). Unfortunately, the standard errors prove too
large for the results to be statistically significant. In this subsection, we reevaluate the result of
Boudoukh, Richardson, and Whitelaw (2013) by using the LW test.
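The forward-rate construction used above is a one-line computation; a minimal sketch with hypothetical rate values:

```python
def forward_rate(j, i_j, i_jp1):
    # Forward rate for [t, t+1] set at time t-j, implied by continuously
    # compounded j- and (j+1)-period interest rates: (j+1)*i_{j+1} - j*i_j.
    return (j + 1) * i_jp1 - j * i_j

# Hypothetical 4- and 5-year rates of 4.0% and 4.2%.
f = forward_rate(4, 0.040, 0.042)
```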
Boudoukh, Richardson, and Whitelaw (2013) work with annual data sampled monthly, i.e., data with overlapping observations. The methods considered in this paper are derived assuming no significant overlap in the observations, with the exception of the long-horizon regressions. Therefore, we test
the positive relation between the interest rate differential and the depreciation of the currency
by estimating a simpler regression model,
st+1/12 − st = β0 + βx(ifat−j,t,t+1 − if bt−j,t,t+1) + εt. (11)
That is, we seek to predict the exchange rate dynamics in the first month of the year that
corresponds to the forward rates ifat−j,t,t+1 and ifbt−j,t,t+1. The cointegration coefficient βx is now
expected to be on the order of 1/12 if the UIP holds.
Table 3 shows the results for the US dollar (USD)/British pound (GBP), USD/Deutsche Mark (DEM), and USD/Swiss franc (CHF) pairs. The data on interest rates are obtained
from Datastream. Forward interest rates are estimated from the yield curves that were derived
from the observations on the LIBOR rates with maturities between one and 12 months and swap
rates on LIBOR with maturities between two and five years. To construct the yields implied by the swap rates, we rely on linear interpolation for the missing yields that correspond to the coupon maturities, such as 18 months. Boudoukh, Richardson, and Whitelaw (2013) use the same data set but a less transparent method that is based on cubic interpolation. The data
are recorded on the last trading day of each month. The resulting sample starts in January 1979
and ends in July 2012. The data are then aligned in accordance with model (11).
Table 3 reports 90% confidence intervals and point estimates for βx in (11). As follows from
the table, forward rate differentials with j = 4 do positively correlate with future currency
depreciation. The evidence is mixed for j = 1. The Q-test and OLS t-test yield close results,
which is explained by the modest correlation of the residuals εt with the innovations to
ifat−j,t,t+1 − if bt−j,t,t+1. Neither method is informative about the sign of βx, because all of the
confidence intervals include both negative and positive values. Again, the strongest result is
from the LW test, with q = 5 frequencies. According to this test, the βx in the regressions for
USD/DEM and USD/CHF, and the forward rates with j = 4, are significantly positive. The estimates of βx in these regressions are 3.12 and 2.1 when aggregated to annual units by using small-sample corrected estimates of ρ: 0.98 for DEM and 0.96 for CHF. These
estimates are surprisingly close to the regression results of Boudoukh, Richardson, and Whitelaw
(2013, Table 2). However, our results are statistically significant.
Boudoukh, Richardson, and Whitelaw (2013) explain why the forward rates with the longest
horizons are cleaner measures of the future exchange rate changes. These forward rates contain
less of the second component, which is a deviation from purchasing power parity. Regarding the statistical properties of this omitted component, Gospodinov (2009) argues that the results in various empirical studies are consistent with the presence of a very persistent omitted variable, which is often referred to as the forward premium. It is not surprising, therefore, that the LW test fails to support the UIP for j = 1, because it is designed to remove only transient effects. To
summarize, even though the LW test cannot remove the omitted variable bias due to the forward
premium in the interest rate differentials, this test still offers an improvement with less-affected
measures of the interest rate differentials, such as the lagged forward interest rate differentials.
7 Conclusions
We suggest a new estimation method and the associated (Local Whittle) test, which serve the
same purpose as long-horizon regressions, to test for long-run predictability. This test provides
higher power in rejecting the no-predictability hypothesis. We demonstrate that this test is
similar to the Q-test in power and is immune to the short-run dynamics that can bias the
estimator that underlies the Q-test. The accuracy of the long-run predictability testing can be
further improved by using the new long-run nearly optimal test.
We evaluate the performance of the tests in two applications: a test for the predictability
in the stock returns by the payout ratios and a test for the predictability in the exchange rate
changes by the interest rate differentials. The confidence intervals based on the LW test are
usually close to those based on the Q-test and strengthen the predictability evidence in both
cases. For example, the LW test confirms the predictability of the returns by the dividend yield
in the 1926 - 2010 sample. The LW test also confirms the positive sign of the relationship
between the exchange rate changes and the past forward interest rate differentials. Therefore,
the long-run relations do carry extra information that is useful for studying economic relations
and that is accurate enough for performing formal statistical tests.
References
[1] Bansal, R. and A. Yaron, 2004, Risks For The Long Run: A Potential Resolution of Asset
Pricing Puzzles. Journal of Finance, 59, 1481 - 1509.
[2] Basu, Debabrata, 1977, On the Elimination of Nuisance Parameters. Journal of the American Statistical Association, 72(358), 355 - 366.
[3] Bollerslev, T., Tauchen, G. and H. Zhou, 2009, Expected Stock Returns and Variance Risk
Premia. Review of Financial Studies, 22, 4463 - 4492.
[4] Boudoukh, Jacob, Michaely, Roni, Richardson, Matthew, and Michael R. Roberts, 2007,
On the Importance of Measuring Payout Yield: Implications for Empirical Asset Pricing.
Journal of Finance, 62(2), 877 - 915.
[5] Boudoukh, Jacob, Richardson, Matthew, and Robert Whitelaw, 2008, The Myth of Long-
Horizon Predictability. Review of Financial Studies, 21(4), 1577 - 1605.
[6] Boudoukh, Jacob, Richardson, Matthew, and Robert Whitelaw, 2013, New Evidence on the
Forward Premium Puzzle. Working paper.
[7] Brillinger, David R., 1975, Time Series Analysis. Data Analysis and Theory. Holt, Rinehart
and Winston, New York.
[8] Campbell, John Y., and Robert J. Shiller, 1988, Stock Prices, Earnings, and Expected Divi-
dends. Journal of Finance, 43(3), 661 - 676.
[9] Campbell, John Y., and Motohiro Yogo, 2006, Efficient tests of stock return predictability.
Journal of Financial Economics, 81(1), 27 - 60.
[10] Cavanagh, Christopher L., Elliott, Graham, and James H. Stock, 1995, Inference in Models
with Nearly Integrated Regressors, Econometric Theory, 11(05), 1131 - 1147.
[11] Cochrane, J. H., 2008, The Dog That Did Not Bark: A Defense of Return Predictability.
Review of Financial Studies, 21, 1533 - 1575.
[12] Davis, Philip J., 1979, Circulant Matrices. Wiley, New York.
[13] Dzhaparidze, Kacha, 1986, Parameter Estimation and Hypothesis Testing in Spectral Anal-
ysis of Stationary Time Series. Springer, New York.
[14] Durlauf, Steven N., and Peter C. B. Phillips, 1986, Multiple Time Series Regression with
Integrated Processes. Review of Economic Studies, 53(4), 473 - 495.
[15] Gospodinov Nikolay, 2009, A New Look at the Forward Premium Puzzle, Journal of Finan-
cial Econometrics, 7 (3), 312 - 338.
[16] Elliott, Graham, Muller, Ulrich, and Mark Watson, 2014, Nearly Optimal Tests when a Nui-
sance Parameter is Present Under the Null Hypothesis. Working paper, Princeton University.
[17] Elliott, Graham, Rothenberg, Thomas J., and James H. Stock, 1996, Efficient Tests for an
Autoregressive Unit Root. Econometrica, 64, 813 - 836.
[18] Elliott, Graham, and James H. Stock, 1994. Inference in time series regression when the
order of integration of a regressor is unknown. Econometric Theory, 10, 672 - 700.
[19] Fama, E. F. and K. R. French, 1988, Dividend Yields and Expected Stock Returns. Journal
of Financial Economics, 22, 3 - 25.
[20] Fisher, Mark E., and John J. Seater, 1993, Long-Run Neutrality and Superneutrality in an
ARIMA Framework. American Economic Review, 83, 402 - 415.
[21] Hjalmarsson, Erik, 2011, New Methods for Inference in Long-Horizon Regressions, Journal
of Financial and Quantitative Analysis, 46 (3), 815-839.
[22] Ibragimov, Ildar A., and Yuri V. Linnik, 1971, Independent and Stationary Sequences of
Random Variables. Wolters-Noordhoff Publishing, Groningen.
[23] Ibragimov, Ildar A., and Yurii A. Rozanov, 1971, Gaussian Random Processes. Springer-
Verlag, New York.
[24] Jansson, Michael, and Marcelo Moreira, 2006, Optimal Inference in Regression Models with
Nearly Integrated Regressors. Econometrica, 74, 681 - 714.
[25] Hamilton, James D., 1994, Time Series Analysis. Princeton University Press, Princeton.
[26] Herrndorf, N., (1984), A Functional Central Limit Theorem for Weakly Dependent Se-
quences of Random Variables. The Annals of Probability, 12(1), 141 - 153.
[27] Hong, Harrison, and Jeremy C. Stein, 1999, A Unified Theory of Underreaction, Momentum
Trading, and Overreaction in Asset Markets. Journal of Finance, 54(6), 2143 - 2184.
[28] Lehmann, Erich L., and Joseph P. Romano, 2005, Testing Statistical Hypotheses. Springer,
New York.
[29] Mishkin, Frederic S., 1990, What Does the Term Structure of Interest Rates Tell Us about
Future Inflation? Journal of Monetary Economics, 70, 1064 - 1072.
[30] Phillips, Peter C. B., 1987, Towards a Unified Asymptotic Theory for Autoregression.
Biometrika, 74(3), 535 - 547.
[31] Phillips, Peter C. B., 1987, Asymptotic Expansions in Nonstationary Vector Autoregres-
sions. Econometric Theory, 3(1), 45 - 68.
[32] Phillips, Peter C. B., 1988, Regression Theory for Near-Integrated Time Series. Economet-
rica, 56(5), 1021 - 1043.
[33] Phillips, Peter C. B., and Bruce E. Hansen, 1990, Statistical Inference in Instrumental
Variables Regression with I(1) Processes. Review of Economic Studies, 57, 99 - 125.
[34] Richardson, Matthew, and James Stock, 1989, Drawing Inferences from Statistics Based on
Multi-Year Asset Returns. Journal of Financial Economics, 25, 323 - 348.
[35] Robinson, Peter M., 1994, Semiparametric Analysis of Long-Memory Time Series. Annals
of Statistics, 22, 515 - 539.
[36] Robinson, Peter M., and Domenico Marinucci, 2003, Semiparametric Frequency Domain
Analysis of Fractional Cointegration. Time Series with Long Memory, Robinson, P.M. (Ed.),
Oxford University Press, Oxford, 334 - 373.
[37] Stambaugh, Robert F., 1999, Predictive Regressions. Journal of Financial Economics, 54,
375 - 421.
[38] Valkanov, Rossen, 2003, Long-Horizon Regressions: Theoretical Results and Applications.
Journal of Financial Economics, 68, 201 - 232.
A Results for Reference
Most of the proofs in this paper rely on the results by Phillips (1987, 1988) and the related
statements gathered in this list. All of the following weak convergence results hold jointly.
Result 1. Under Assumption A, Phillips (1987, Lemma 1, and 1988, Lemma 3.1) proved that the process $\frac{1}{\sqrt{\psi_{uu}}}\frac{X_{\lfloor\tau T\rfloor}}{\sqrt{T}}$, $\tau\in[0,1]$, weakly converges to the O-U process $J_c(\tau)$ starting from $J_c(0)=0$. The O-U process follows the dynamics $dJ_c(\tau)=cJ_c(\tau)d\tau+dW^u_\tau$, where $W^u_\tau$ is a standard Brownian motion. By the functional central limit theorem (FCLT), $\frac{1}{\sqrt{\psi_{uu}}}\frac{\sum_{t=1}^{\lfloor\tau T\rfloor}u_t}{\sqrt{T}}\Rightarrow W^u_\tau$ and $\frac{1}{\sqrt{\psi_{\varepsilon\varepsilon}}}\frac{\sum_{t=1}^{\lfloor\tau T\rfloor}\varepsilon_t}{\sqrt{T}}\Rightarrow W^\varepsilon_\tau$, where the vector process $(W^u_\tau,W^\varepsilon_\tau)$ is a two-dimensional Brownian motion with correlation $r=s^{u,\varepsilon}_{u\varepsilon}(0)\left(s^{u,\varepsilon}_{\varepsilon\varepsilon}(0)\,s^{u,\varepsilon}_{uu}(0)\right)^{-1/2}$ and unit marginal variances. Note that the same result holds for the demeaned process, $\frac{1}{\sqrt{\psi_{uu}}}\frac{X_{\lfloor\tau T\rfloor}-\mu_x}{\sqrt{T}}\Rightarrow J_c(\tau)$, because $\mu_x/\sqrt{T}\to0$.
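The scaling in Result 1 is easy to illustrate by simulation. The sketch below is illustrative only: it assumes i.i.d. standard normal shocks (so $\psi_{uu}=1$) and an arbitrary value $c=-5$, generates a near-integrated AR(1), and checks that the path scaled by $\sqrt{T}$ stays bounded, as the O-U limit requires.

```python
import numpy as np

# Simulate a near-integrated AR(1): x_t = rho_T x_{t-1} + u_t with rho_T = 1 + c/T.
# Under Result 1, x_{floor(tau*T)} / sqrt(T) behaves like the O-U process J_c(tau).
rng = np.random.default_rng(0)
T, c = 10_000, -5.0
rho_T = 1.0 + c / T
u = rng.standard_normal(T)
x = np.empty(T)
x[0] = u[0]
for t in range(1, T):
    x[t] = rho_T * x[t - 1] + u[t]

# The scaled path is Op(1): its dispersion does not grow with T.
scaled = x / np.sqrt(T)
print(scaled.std())
```

For $c=-5$ the stationary variance of the limiting O-U process is $1/(2|c|)=0.1$, so the printed standard deviation should be well below 1.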
Result 2. From the continuous mapping theorem (CMT) and Result 1, it follows that the Fourier transformation of $X=(x_0,..,x_{T-1})$ calculated at frequency $\omega_{j_0}=2\pi j_0/T$, i.e., $d_x(\omega_{j_0})=\frac{1}{\sqrt{2\pi T}}\sum_{t=1}^T x_{t-1}e^{-i\omega_{j_0}t}$, has the following limit,
$$\frac{d_x(\omega_{j_0})}{T}=\frac{1}{\sqrt{2\pi}}\int_{\tau=0}^1\frac{x_{\lfloor\tau T\rfloor}}{\sqrt{T}}\,e^{-i\frac{2\pi j_0(\lfloor\tau T\rfloor+1)}{T}}d\tau\Rightarrow\sqrt{\frac{\psi_{uu}}{2\pi}}\int_{\tau=0}^1 J_c(\tau)e^{-i2\pi j_0\tau}d\tau,$$
where the convergence holds jointly across $d_x(\omega_{j_0})$, $j_0=1,..,q$.
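The Fourier transformations in Results 2-3 can be computed in $O(T\log T)$ time via the FFT. A minimal check, assuming only the normalization $1/\sqrt{2\pi T}$ used throughout this appendix and an illustrative random-walk $x$:

```python
import numpy as np

# d_x(omega_j) = (2*pi*T)**(-1/2) * sum_{t=1}^T x_{t-1} e^{-i omega_j t}, omega_j = 2*pi*j/T,
# computed two ways: as a direct sum and via the FFT.
rng = np.random.default_rng(1)
T = 512
x = np.cumsum(rng.standard_normal(T))          # x_0, ..., x_{T-1}

t = np.arange(1, T + 1)
j0 = 3
omega = 2 * np.pi * j0 / T
d_direct = (x * np.exp(-1j * omega * t)).sum() / np.sqrt(2 * np.pi * T)

# The sum over x_{t-1}, t = 1..T equals e^{-i omega_j} times the sum over x_s, s = 0..T-1,
# and the latter is exactly the j0-th FFT coefficient of x.
d_fft = np.exp(-1j * omega) * np.fft.fft(x)[j0] / np.sqrt(2 * np.pi * T)
print(np.allclose(d_direct, d_fft))            # True
```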
Result 3. Define $S_{\varepsilon,t}=\varepsilon_t+..+\varepsilon_1$. Then, the Fourier transformation of $\varepsilon_t$ has the following representation:
$$\frac{\sum_{t=1}^T\varepsilon_t e^{-i\omega_{j_0}t}}{\sqrt{2\pi T}}=\frac{T(e^{i\omega_{j_0}}-1)}{\sqrt{2\pi}}\sum_{t=1}^T\frac{S_{\varepsilon,t-1}}{\sqrt{T}}e^{-2\pi j_0i\frac{t}{T}}\frac{1}{T}+\frac{1}{\sqrt{2\pi}}\frac{S_{\varepsilon,T}}{\sqrt{T}}.$$
Because the FCLT holds for the partial sums $S_{\varepsilon,t}$ and $\lim_{T\to\infty}T(e^{i\omega_{j_0}}-1)=i2\pi j_0$, by applying the CMT we obtain
$$\frac{\sum_{t=1}^T\varepsilon_t e^{-i\omega_{j_0}t}}{\sqrt{2\pi T}}\Rightarrow j_0i\sqrt{2\pi\psi_{\varepsilon\varepsilon}}\int_{\tau=0}^1 W^\varepsilon(\tau)e^{-2\pi j_0i\tau}d\tau+\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{2\pi}}W^\varepsilon(1),$$
and, therefore,
$$\frac{\sum_{t=1}^T\varepsilon_t e^{-i\omega_{j_0}t}}{\sqrt{2\pi T}}\Rightarrow\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{2\pi}}\int_{\tau=0}^1 e^{-2\pi j_0i\tau}dW^\varepsilon(\tau),$$
where the convergence holds jointly across $d_\varepsilon(\omega_{j_0})$, $j_0=1,..,q$.
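The summation-by-parts representation in Result 3 is an exact finite-sample identity at Fourier frequencies, not only an asymptotic statement. A quick numerical check with illustrative i.i.d. shocks:

```python
import numpy as np

# Check of the identity behind Result 3:
# sum_t eps_t e^{-i w t} = T(e^{i w}-1) * (1/T) sum_t S_{t-1} e^{-i w t} + S_T,
# with S_t = eps_1 + ... + eps_t, S_0 = 0, and w = 2*pi*j0/T a Fourier frequency.
rng = np.random.default_rng(2)
T, j0 = 200, 4
w = 2 * np.pi * j0 / T
eps = rng.standard_normal(T)

S = np.concatenate(([0.0], np.cumsum(eps)))    # S_0, S_1, ..., S_T
t = np.arange(1, T + 1)

lhs = (eps * np.exp(-1j * w * t)).sum()
rhs = T * (np.exp(1j * w) - 1) * (S[:-1] * np.exp(-1j * w * t)).sum() / T + S[-1]
print(np.allclose(lhs, rhs))                   # True
```

The identity uses $e^{-i\omega_{j_0}T}=1$, so it holds exactly only at the Fourier frequencies.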
Result 4. For any two processes $(w_{1,t},w_{2,t})$ that satisfy the conditions for the bi-variate FCLT (e.g., Assumption A), the partial sums $\frac{1}{\sqrt{T}}\sum_{t=1}^{\lfloor\tau T\rfloor}w_{1,t}$ and $\frac{1}{\sqrt{T}}\sum_{t=1}^{\lfloor\tau T\rfloor}w_{2,t}$ converge jointly with the Fourier transformations $\frac{\sum_{t=1}^{T}w_{1,t}e^{-i\omega_{j_0}t}}{2\pi\sqrt{T}}$ and $\frac{\sum_{t=1}^{T}w_{2,t}e^{-i\omega_{j_0}t}}{2\pi\sqrt{T}}$ to the processes $W_{w_1}(\tau)$, $W_{w_2}(\tau)$, $\frac{1}{2\pi}\int_{\tau=0}^1 e^{-2\pi j_0i\tau}dW_{w_1}(\tau)$, and $\frac{1}{2\pi}\int_{\tau=0}^1 e^{-2\pi j_0i\tau}dW_{w_2}(\tau)$, respectively, where $W_w(\tau)=(W_{w_1}(\tau),W_{w_2}(\tau))'$ is a Brownian process with a variance equal to the long-run variance of $(w_{1,t},w_{2,t})$. The proof is obtained by using the representation for the Fourier transformations in Result 3.
B Asymptotic Equivalence of the Whittle Approximation
and the Gaussian Likelihood
Without loss of generality, let µx = 0, µy = 0, and the shocks εt and ut have unit variances. As
an illustration, consider first the case of (εt, ut), which are normal i.i.d. with the correlation r.
Conditional on $X=(x_0,..,x_{T-1})'$, the distribution of the vector $Y=(y_1,...,y_T)'$ is normal with the mean $E(Y|X)=\Pi X$,
$$\Pi=\beta_x I_T+r\begin{pmatrix}-\rho&1&0&\cdots&0\\0&-\rho&1&\cdots&0\\0&0&-\rho&\cdots&0\\\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&0&\cdots&0\end{pmatrix},$$
where $I_T$ is the $T\times T$ identity matrix. The conditional variance of $Y$ is the matrix $B=\operatorname{diag}(1-r^2,..,1-r^2,1)$. For the Whittle approximation, denote $\sum_{j=1}^T\operatorname{tr}\left(s^{x,y}(\omega_j)^{-1}I^{x,y}_T(\omega_j)\right)=(X'\,Y')\Omega^{-1}(X'\,Y')'$, where $\Omega$ is the variance-covariance matrix of the vector $(X'\,Y')'$ under the "Whittle likelihood". The matrix $\Omega$ also holds information about the conditional moments. Note that the matrix $\Omega^{-1}$ can be conformably partitioned into four blocks, $T\times T$ each, $A_{xx}$, $A_{xy}$, $A_{yx}$, and $A_{yy}$, where the elements of the blocks are as follows:
$$A_{xx}(t,s)=\sum_{j=1}^T\frac{s^{x,y}_{yy}(\omega_j)}{\det(s^{x,y}(\omega_j))}e^{-i\omega_j(t-s)}\frac{1}{2\pi T},\qquad A_{yy}(t,s)=\sum_{j=1}^T\frac{s^{x,y}_{xx}(\omega_j)}{\det(s^{x,y}(\omega_j))}e^{-i\omega_j(t-s)}\frac{1}{2\pi T},$$
$$A_{xy}(t,s)=-\operatorname{Re}\sum_{j=1}^T\frac{s^{x,y}_{yx}(\omega_j)}{\det(s^{x,y}(\omega_j))}e^{-i\omega_j(t-s)}\frac{1}{2\pi T},\qquad A_{yx}(t,s)=A_{xy}(s,t).$$
Let $\Omega_{xx}$, $\Omega_{xy}$, $\Omega_{yx}$, $\Omega_{yy}$ be the corresponding partition of the matrix $\Omega$. The conditional expectation of $Y$ under the Whittle likelihood is $\tilde E(Y|X)=\tilde\Pi X$, where $\tilde\Pi=\Omega_{yx}\Omega_{xx}^{-1}=-A_{yy}^{-1}A_{yx}$. Furthermore, $\widetilde{\operatorname{Var}}(Y|X)=\tilde B=\Omega_{yy}-\Omega_{yx}\Omega_{xx}^{-1}\Omega_{xy}=A_{yy}^{-1}$. Lastly, substitute the formula for the spectrum and obtain that $\tilde B=(1-r^2)I_T$, and
$$\tilde\Pi=\beta_x I_T+r\begin{pmatrix}-\rho&1&0&\cdots&0\\0&-\rho&1&\cdots&0\\0&0&-\rho&\cdots&0\\\vdots&\vdots&\vdots&\ddots&\vdots\\1&0&0&\cdots&-\rho\end{pmatrix}.$$
The difference between the Gaussian and Whittle log likelihoods will depend on the following term:
$$(Y-\tilde\Pi X)'\tilde B^{-1}(Y-\tilde\Pi X)-(Y-\Pi X)'B^{-1}(Y-\Pi X)=$$
$$2X'(\Pi-\tilde\Pi)'\tilde B^{-1}(Y-\Pi X)+X'(\Pi-\tilde\Pi)'\tilde B^{-1}(\Pi-\tilde\Pi)X-(Y-\Pi X)'(B^{-1}-\tilde B^{-1})(Y-\Pi X).$$
The last term is simply $\varepsilon_T^2\left(1-\frac{1}{1-r^2}\right)$ and is, therefore, $O_p(1)$. That is, the small difference between $B$ and $\tilde B$ does not result in an asymptotically significant difference between the likelihood and its approximation. However, for the remainder of the terms to be bounded in probability, it is necessary that $EX'(\Pi-\tilde\Pi)'\tilde B^{-1}(\Pi-\tilde\Pi)X$ be bounded. Note, however, that $X'(\Pi-\tilde\Pi)'\tilde B^{-1}(\Pi-\tilde\Pi)X=\frac{r^2}{1-r^2}(\rho x_{T-1}-x_0)^2$. For a nearly integrated $x_t$, as follows from Result 1 in Appendix A, this term is of the stochastic order $O_p(T)$. Therefore, the Whittle approximation to the conditional distribution of $Y$ is not asymptotically equivalent to the Gaussian likelihood.

To obtain an asymptotically equivalent approximation, consider the transformation $\tilde Y\equiv Y+(0,..,0,r(x_0-\rho x_{T-1}))'$. Then, $\tilde Y-\tilde\Pi X=Y-\Pi X$, and therefore, the Whittle likelihood calculated for $\tilde Y$ and $X$ is equivalent to the original Gaussian likelihood for $Y$ and $X$.[11] Alternatively, for a nearly integrated $x_t$, we can consider the following asymptotically equivalent transformation, $\tilde Y\equiv Y+(0,..,0,r(x_0-x_T))'$, because $x_T-\rho x_{T-1}\equiv u_T\sim O_p(1)$.

[11] Note that the difference between the time-domain and the modified frequency-domain log-likelihoods involves only the term $\varepsilon_T^2\left(1-\frac{1}{1-r^2}\right)$, which does not depend on the persistence parameter $c$ in $\rho=1+c/T$. Therefore, the convergence is uniform in $c$. Also, note that the Whittle likelihood is not defined for $\rho=1$, but the arguments in this Appendix can be extended to the case with $\rho=1$ by continuity. For example, note that $\tilde\Pi$ and $\tilde B$ in the conditional Whittle distribution of $Y$ are well-defined for $\rho=1$.
In the general case, when the vectors $(u_t,\varepsilon_t)$ are not i.i.d.,
$$\tilde B^{-1}=A_{yy}=\left[\sum_{j=1}^T\frac{s^{u,\varepsilon}_{uu}(\omega_j)}{\det(s^{u,\varepsilon}(\omega_j))}e^{-i\omega_j(t-s)}\frac{1}{2\pi T}\right]_{t,s},$$
and therefore, $A_{yy}$ does not depend on $\rho_T$. As follows from Dzhaparidze (1986), $(Y-E(Y|X))'(B^{-1}-\tilde B^{-1})(Y-E(Y|X))\sim O_p(1)$ for a constant $\rho<1$. Therefore, this relation also holds for any $\rho_T$. Furthermore,
$$-A_{yx}=\beta_xA_{yy}+\operatorname{Re}\left[\frac{1}{2\pi T}\sum_{j=1}^T\frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_j)}{\det(s^{u,\varepsilon}(\omega_j))}\left(e^{-i\omega_j(t-s+1)}-\rho e^{-i\omega_j(t-s)}\right)\right]_{t,s}.$$
Therefore, $\tilde\Pi X=\beta_xX+A_{yy}^{-1}\operatorname{Re}(\mathcal T_1)U+A_{yy}^{-1}\operatorname{Re}(\mathcal T_2)(x_0-x_T)$, where $U=(u_1,..,u_T)'$, $\mathcal T_1$ is a $T\times T$ matrix with the elements $\mathcal T_1(t,s)=\frac{1}{2\pi T}\sum_{j=1}^T\frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_j)}{\det(s^{u,\varepsilon}(\omega_j))}e^{-i\omega_j(t-s)}$, and $\mathcal T_2$ is a $T\times1$ vector with the elements $\mathcal T_2(t)=\frac{1}{2\pi T}\sum_{j=1}^T\frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_j)}{\det(s^{u,\varepsilon}(\omega_j))}e^{-i\omega_jt}$. Therefore, $\tilde E(Y|X)=\beta_xX-\tilde A_{\varepsilon\varepsilon}^{-1}\tilde A_{\varepsilon u}U+A_{yy}^{-1}\operatorname{Re}(\mathcal T_2)(x_0-x_T)$. The second term in the above expression is the expectation of $\mathcal E=(\varepsilon_1,...,\varepsilon_T)'$ conditional on $U$ implied by the Whittle approximation to the Gaussian likelihood for the observations $(u_t,\varepsilon_t)$, $t=1,..,T$. As follows from Dzhaparidze (1986), this second term converges to $E(\mathcal E|U)$, and $(E(\mathcal E|U)-\tilde E(\mathcal E|U))'\operatorname{Var}^{-1}(\mathcal E|U)(E(\mathcal E|U)-\tilde E(\mathcal E|U))=O(1)$. The difference between the Gaussian and Whittle log likelihoods is, therefore, $O_p(1)+(x_0-x_T)^2\operatorname{Re}(\mathcal T_2')A_{yy}^{-1}\operatorname{Re}(\mathcal T_2)+2(x_0-x_T)\operatorname{Re}(\mathcal T_2')(Y-\tilde\Pi X)$. We conclude that this Whittle approximation and the Gaussian likelihood are not equivalent. However, we can construct an equivalent Whittle approximation if we consider the transformed series $\tilde Y=Y+\operatorname{Re}(\delta_y)(x_0-x_T)$, where $\delta_y$ is a $T\times1$ complex vector equal to $A_{yy}^{-1}\mathcal T_2$.
The vector $\delta_y$, which is necessary to obtain the transformed series $\tilde Y$, depends on all of the parameters of the model for $(u_t,\varepsilon_t)$, including the parameters of the short-run dynamics. Therefore, it is useful to derive an alternative transformation $\bar Y$ in such a way that it gives the same asymptotic properties of the frequency-based estimators considered in this study, but depends only on the parameters of the long-run dynamics, i.e., only on $s^{u,\varepsilon}(0)$. We suggest $\bar Y\equiv Y+\left(0,..,0,\frac{s^{u,\varepsilon}_{u\varepsilon}(0)}{s^{u,\varepsilon}_{uu}(0)}(x_0-x_T)\right)'$ and prove that for a fixed $j_0$, the Fourier transformation of $\bar Y$ at frequency $\omega_{j_0}=\frac{2\pi j_0}{T}$ is asymptotically equivalent to the Fourier transformation of $\tilde Y$. The latter statement readily follows from the following lemma.

Lemma 1. Under Assumption A, $\sum_{t=1}^T e^{i\omega_{j_0}t}\operatorname{Re}(\delta_y(t))=\frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_{j_0})}{s^{u,\varepsilon}_{uu}(\omega_{j_0})}\to\frac{s^{u,\varepsilon}_{u\varepsilon}(0)}{s^{u,\varepsilon}_{uu}(0)}$.
Proof. The real part of $\sum_{t=1}^T e^{i\omega_{j_0}t}\operatorname{Re}(\delta_y(t))$ equals $\frac12\operatorname{Re}\left(\sum_{t=1}^T(e^{i\omega_{j_0}t}+e^{-i\omega_{j_0}t})\delta_y(t)\right)$. The imaginary part of $\sum_{t=1}^T e^{i\omega_{j_0}t}\operatorname{Re}(\delta_y(t))$ equals $\frac12\operatorname{Im}\left(\sum_{t=1}^T(e^{i\omega_{j_0}t}-e^{-i\omega_{j_0}t})\delta_y(t)\right)$. Therefore, consider
$$\sum_{t=1}^T e^{i\omega_{j_0}t}\delta_y(t)=\sum_{j=1}^T\frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_j)}{\det(s^{u,\varepsilon}(\omega_j))}\frac{\sum_{t=1}^T\sum_{s=1}^T A_{yy}^{-1}(t,s)e^{-i(\omega_js-\omega_{j_0}t)}}{2\pi T}.$$
The result for $\sum_{t=1}^T e^{-i\omega_{j_0}t}\delta_y(t)$ is derived analogously. Note that the matrix $A_{yy}^{-1}$ is circulant. That is, its elements allow for the following representation: $A_{yy}^{-1}(t,s)=\gamma(s-t)$, and for any $k\in[1-T,-1]$, $\gamma(k)=\gamma(T+k)$. This property of the matrix $A_{yy}^{-1}$ follows from the same property of $A_{yy}$, which is verified directly from the definition of $A_{yy}$.

Because the matrix $A_{yy}^{-1}$ is circulant, all of the sums $\sum_{t=1}^T\sum_{s=1}^T A_{yy}^{-1}(t,s)e^{-i(\omega s-\omega_0t)}$ are zeroes as long as $\omega\neq\omega_0+2\pi n$, $n=0,\pm1,\pm2,...$. Note that
$$\sum_{t=1}^T\sum_{s=1}^T\gamma(s-t)e^{-i(\omega s-\omega_0t)}=\sum_{t=1}^T\sum_{k=1-t}^{T-t}\gamma(k)e^{-i(\omega-\omega_0)t-i\omega k}=$$
$$\gamma(0)\underbrace{\sum_{t=1}^T e^{-i(\omega-\omega_0)t}}_{0}+\sum_{k=1}^{T-1}\gamma(k)e^{-i\omega k}\sum_{t=1}^{T-k}e^{-i(\omega-\omega_0)t}+\sum_{k=1-T}^{-1}\gamma(k)e^{-i\omega k}\sum_{t=1-k}^{T}e^{-i(\omega-\omega_0)t}=$$
$$=\frac{e^{-i(\omega-\omega_0)}}{1-e^{-i(\omega-\omega_0)}}\left(\sum_{k=1}^{T-1}\gamma(k)e^{-i\omega k}\left(1-e^{i(\omega-\omega_0)k}\right)+\sum_{k=1-T}^{-1}\gamma(k)e^{-i\omega k}\left(e^{i(\omega-\omega_0)k}-1\right)\right).$$
After replacing $\gamma(k)$ by $\gamma(k+T)$ in the second sum, we obtain the result $\sum_{t=1}^T\sum_{s=1}^T\gamma(s-t)e^{-i(\omega s-\omega_0t)}=0$. Therefore,
$$\sum_{t=1}^T e^{i\omega_{j_0}t}\delta_y(t)=\frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_{j_0})}{\det(s^{u,\varepsilon}(\omega_{j_0}))}\frac{\sum_{t=1}^T\sum_{s=1}^T A_{yy}^{-1}(t,s)e^{-i\omega_{j_0}(s-t)}}{2\pi T}.$$
Note also that
$$\sum_{t=1}^T\sum_{s=1}^T A_{yy}^{-1}(t,s)e^{-i\omega_{j_0}(s-t)}=\sum_{t=1}^T\sum_{s=1}^T\gamma(s-t)e^{-i\omega_{j_0}(s-t)}=T\sum_{k=0}^{T-1}\gamma(k)e^{-i\omega_{j_0}k}.$$
The latter sum $\lambda_{j_0}(A_{yy}^{-1})\equiv\sum_{k=0}^{T-1}\gamma(k)e^{-i\omega_{j_0}k}$ is the eigenvalue of $A_{yy}^{-1}$ that corresponds to the eigenvector $(1,e^{-i\omega_{j_0}},..,e^{-i\omega_{j_0}(T-1)})'$ (see Davis (1979)). Because $A_{yy}$ is also circulant, the same result holds:
$$\sum_{t=1}^T\sum_{s=1}^T A_{yy}(t,s)e^{-i\omega_{j_0}(s-t)}=T\lambda_{j_0}(A_{yy}).$$
From the known relation between the eigenvalues of a matrix and its inverse, $\lambda_{j_0}(A_{yy})=\lambda_{j_0}(A_{yy}^{-1})^{-1}$, we obtain
$$\sum_{t=1}^T e^{i\omega_{j_0}t}\delta_y(t)=\frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_{j_0})}{\det(s^{u,\varepsilon}(\omega_{j_0}))}\frac{T/2\pi}{\sum_{t=1}^T\sum_{s=1}^T A_{yy}(t,s)e^{-i\omega_{j_0}(s-t)}}.$$
Lastly, substituting the definition of $A_{yy}(t,s)$, we obtain
$$\sum_{t=1}^T\sum_{s=1}^T A_{yy}(t,s)e^{-i\omega_{j_0}(t-s)}=\sum_{t=1}^T\sum_{s=1}^T A_{yy}(t,s)e^{-i\omega_{j_0}(s-t)}=\frac{T}{2\pi}\frac{s^{u,\varepsilon}_{uu}(\omega_{j_0})}{\det(s^{u,\varepsilon}(\omega_{j_0}))},$$
and $\sum_{t=1}^T e^{i\omega_{j_0}t}\delta_y(t)=\frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_{j_0})}{s^{u,\varepsilon}_{uu}(\omega_{j_0})}$. The same result holds for $\sum_{t=1}^T e^{-i\omega_{j_0}t}\delta_y(t)$. Therefore, $\sum_{t=1}^T e^{i\omega_{j_0}t}\operatorname{Re}(\delta_y(t))=\frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_{j_0})}{s^{u,\varepsilon}_{uu}(\omega_{j_0})}$. The convergence to $\frac{s^{u,\varepsilon}_{u\varepsilon}(0)}{s^{u,\varepsilon}_{uu}(0)}$ follows because Assumption A implies the continuity of the spectrum (see Phillips (1988)).
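The two circulant-matrix facts used in this proof, the eigenvalue formula and the vanishing of the off-frequency bilinear form, can be checked numerically for a generic circulant matrix; the $\gamma$ sequence below is arbitrary and purely illustrative.

```python
import numpy as np

# For C(t,s) = gamma(s-t) with gamma periodic (gamma(k) = gamma(T+k) for k < 0),
# the vector (1, e^{-i w}, ..., e^{-i w(T-1)})' is an eigenvector with eigenvalue
# sum_{k=0}^{T-1} gamma(k) e^{-i w k}, and the bilinear form
# sum_{t,s} C(t,s) e^{-i(w s - w0 t)} vanishes for distinct Fourier frequencies.
rng = np.random.default_rng(3)
T = 16
gamma = rng.standard_normal(T)
C = np.array([[gamma[(s - t) % T] for s in range(T)] for t in range(T)])

w = 2 * np.pi * 3 / T
v = np.exp(-1j * w * np.arange(T))
lam = (gamma * np.exp(-1j * w * np.arange(T))).sum()
print(np.allclose(C @ v, lam * v))             # True: eigenpair

w0 = 2 * np.pi * 5 / T                         # a different Fourier frequency
form = sum(C[t, s] * np.exp(-1j * (w * (s + 1) - w0 * (t + 1)))
           for t in range(T) for s in range(T))
print(abs(form) < 1e-9)                        # True: off-frequency form is zero
```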
C Proof of Proposition 1

Proof. Express the principal components $F$, $\tilde F$, and $\bar F$ as follows:
$$F=\sum_{j=-q}^{q}\frac{|1-\rho e^{-i\omega_j}|^2}{\det(s^{u,\varepsilon}(\omega_j))}\left(s^{x,y}_{xx}(\omega_j)I^{x,y}_{T,yy}(\omega_j)+s^{x,y}_{yy}(\omega_j)I^{x,y}_{T,xx}(\omega_j)-2\operatorname{Re}\left(s^{x,y}_{xy}(\omega_j)I^{x,y}_{T,yx}(\omega_j)\right)\right),$$
$$\tilde F=\sum_{j=-q}^{q}\frac{|1-\rho e^{-i\omega_j}|^2}{\det(s^{u,\varepsilon}(0))}\left(s^{x,y}_{xx}(\omega_j)I^{x,\tilde y}_{T,\tilde y\tilde y}(\omega_j)+s^{x,y}_{yy}(\omega_j)I^{x,\tilde y}_{T,xx}(\omega_j)-2\operatorname{Re}\left(s^{x,y}_{xy}(\omega_j)I^{x,\tilde y}_{T,\tilde yx}(\omega_j)\right)\right),$$
$$\bar F=\sum_{j=-q}^{q}\frac{|1-\rho e^{-i\omega_j}|^2}{\det(s^{u,\varepsilon}(0))}\left(s^{x,y}_{xx}(\omega_j)I^{x,\bar y}_{T,\bar y\bar y}(\omega_j)+s^{x,y}_{yy}(\omega_j)I^{x,\bar y}_{T,xx}(\omega_j)-2\operatorname{Re}\left(s^{x,y}_{xy}(\omega_j)I^{x,\bar y}_{T,\bar yx}(\omega_j)\right)\right).$$
I. Under Assumption A, the limit $s^{u,\varepsilon}(\omega_j)\to s^{u,\varepsilon}(0)$ is finite and well-defined. Therefore, $\det(s^{u,\varepsilon}(\omega_j))\sim O(1)$ (but not $o(1)$, because $\det(s^{u,\varepsilon}(0))\neq0$). Consider now the difference $s^{u,\varepsilon}(\omega_j)-s^{u,\varepsilon}(0)$. Assumption A implies that the $\alpha_i$ coefficients are at most $O(i^{-1-\gamma(h)})$ for any $0<\gamma(h)<2/(\gamma-2)$. The strong mixing condition also implies the complete linear regularity of $e_t=(u_t,\varepsilon_t)$ (see Ibragimov and Rozanov, 1978) with linear regularity coefficients that satisfy the same condition as the strong mixing coefficients. Therefore, Theorem 8 (p. 181) of Ibragimov and Rozanov (1978) implies that the spectrum $s^{u,\varepsilon}(\omega)$ has at least one derivative. Therefore, $s^{u,\varepsilon}(\omega_j)-s^{u,\varepsilon}(0)\sim O(1/T)$.
II. Let $F_2(\omega_j)=s^{x,y}_{xx}(\omega_j)I^{x,y}_{T,yy}(\omega_j)+s^{x,y}_{yy}(\omega_j)I^{x,y}_{T,xx}(\omega_j)-2\operatorname{Re}\left(s^{x,y}_{xy}(\omega_j)I^{x,y}_{T,yx}(\omega_j)\right)$. Define $\tilde F_2(\omega_j)$ and $\bar F_2(\omega_j)$ analogously. To analyze the convergence behavior of the elements $F_2(\omega_j)$ in a neighborhood of the true parameters, it suffices to evaluate $F_2(\omega_j)$ at the true parameters. At the true value of $\beta_x$,
$$\tilde F_2(\omega_j)=\frac{s^{u,\varepsilon}_{uu}(\omega_j)}{|1-\rho_Te^{-i\omega_j}|^2}I^{x,\tilde\varepsilon}_{T,\tilde\varepsilon\tilde\varepsilon}(\omega_j)+s^{u,\varepsilon}_{\varepsilon\varepsilon}(\omega_j)I^{x,\tilde\varepsilon}_{T,xx}(\omega_j)-2\operatorname{Re}\frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_j)e^{-i\omega_j}}{1-\rho_Te^{-i\omega_j}}I^{x,\tilde\varepsilon}_{T,\tilde\varepsilon x}(\omega_j),$$
where the series $\tilde\varepsilon_t=\varepsilon_t+(x_0-x_T)\operatorname{Re}(\delta_y(t))$.

Consider the convergence of the periodograms (and the corresponding Fourier transformations). Define the vector of Fourier transformations, $d(\omega_j)=\frac{1}{\sqrt{2\pi T}}\sum_{t=1}^T(x_{t-1}\ \tilde\varepsilon_t)'e^{-i\omega_jt}$. By Result 2 in Appendix A, the first element of the vector $d(\omega_j)$ is $O_p(T)$. By Result 3 in Appendix A, the Fourier transformation of $\varepsilon_t$, $d_\varepsilon(\omega_j)$, is $O_p(1)$. The Fourier transformation of $\tilde\varepsilon_t$ is $d_{\tilde\varepsilon}(\omega_j)=d_\varepsilon(\omega_j)+\frac{x_0-x_T}{\sqrt{2\pi T}}\sum_{t}\operatorname{Re}(\delta_y(t))e^{-i\omega_jt}$. By Lemma 1, the sum $\sum_{t}\operatorname{Re}(\delta_y(t))e^{-i\omega_jt}\sim O(1)$. By Result 1 in Appendix A, $\frac{x_0-x_T}{\sqrt{T}}$ is $O_p(1)$. Therefore, the second element of $d(\omega_j)$ is $O_p(1)$. Therefore, the periodogram $I^{x,\tilde\varepsilon}_T(\omega_j)=d(\omega_j)d^*(\omega_j)$ consists of the elements of the following stochastic orders: $I^{x,\tilde\varepsilon}_{T,xx}\sim O_p(T^2)$, $I^{x,\tilde\varepsilon}_{T,\tilde\varepsilon\tilde\varepsilon}\sim O_p(1)$, and $I^{x,\tilde\varepsilon}_{T,\tilde\varepsilon x}\sim O_p(T)$.

Combining the results for the spectral densities and the periodogram, we obtain $F_2(\omega_j)\sim O_p(T^2)$ and $\tilde F_2(\omega_j)-F_2(\omega_j)\sim O_p(T)$.
III. Note that $\tilde F$ is a sum of elements $\frac{|1-\rho_Te^{-i\omega_j}|^2}{\det(s^{u,\varepsilon}(0))}\tilde F_2(\omega_j)$. From the results in II, it follows directly that these elements are of the stochastic order of $O_p(1)$. Therefore, $\tilde F\sim O_p(1)$. Similarly, $F\sim O_p(1)$. Furthermore,
$$\frac{|1-\rho_Te^{-i\omega_j}|^2}{\det(s^{u,\varepsilon}(\omega_j))}F_2(\omega_j)-\frac{|1-\rho_Te^{-i\omega_j}|^2}{\det(s^{u,\varepsilon}(0))}\tilde F_2(\omega_j)=\frac{|1-\rho_Te^{-i\omega_j}|^2\left(F_2(\omega_j)-\tilde F_2(\omega_j)\right)}{\det(s^{u,\varepsilon}(\omega_j))}+\frac{|1-\rho_Te^{-i\omega_j}|^2\tilde F_2(\omega_j)\left(\det(s^{u,\varepsilon}(0))-\det(s^{u,\varepsilon}(\omega_j))\right)}{\det(s^{u,\varepsilon}(0))\det(s^{u,\varepsilon}(\omega_j))}\sim O_p\left(\frac1T\right).$$
Therefore, $F-\tilde F\sim O_p\left(\frac1T\right)$.

IV. From Lemma 1 (and Result 1), it follows that $d_{\bar y}(\omega_j)-d_{\tilde y}(\omega_j)=\frac{x_0-x_T}{\sqrt{2\pi T}}\left(\frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_j)}{s^{u,\varepsilon}_{uu}(\omega_j)}-\frac{s^{u,\varepsilon}_{u\varepsilon}(0)}{s^{u,\varepsilon}_{uu}(0)}\right)\sim O_p(1/T)$. Therefore, $\bar F_2(\omega_j)-\tilde F_2(\omega_j)=O_p(T)$. Thus, $\bar F-\tilde F=O_p\left(\frac1T\right)$.
D Proof of Theorem 1

We first show that the result of Theorem 1 holds for $e_t$ that is i.i.d. normal, and then demonstrate that the asymptotic limits do not depend on the independence and Gaussianity assumptions.

Proof of Theorem 1: i.i.d. Gaussian Case. Decompose $\varepsilon_t=r\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{\psi_{uu}}}u_t+\sqrt{\psi_{\varepsilon\varepsilon|u}}z_t$. Note that by construction, $z_t$ is i.i.d. standard normal, independent of the observations of $x_t$. The related decomposition of $y_t$ is $y_t=\beta_xx_{t-1}+r\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{\psi_{uu}}}u_t+\sqrt{\psi_{\varepsilon\varepsilon|u}}z_t$ or, equivalently, $y_t=\beta_xx_{t-1}+r\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{\psi_{uu}}}(x_t-\rho x_{t-1})+\sqrt{\psi_{\varepsilon\varepsilon|u}}z_t$. Substituting this decomposition into the formula for $\hat\beta_x$ (6), we obtain
$$\hat\beta_x=\left(\sum_{\substack{j=-q\\j\neq0}}^{q}I^{x,y}_{T,xx}(\omega_j)\right)^{-1}\left(\sum_{\substack{j=-q\\j\neq0}}^{q}\beta_xI^{x,y}_{T,xx}(\omega_j)+r\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{\psi_{uu}}}\left(I_{T,xx^+}(\omega_j)-\rho I_{T,xx}(\omega_j)\right)+\sqrt{\psi_{\varepsilon\varepsilon|u}}\,I^{x,z}_{T,zx}(\omega_j)-\left(I_{T,xx^+}(\omega_j)-\rho_TI_{T,xx}(\omega_j)\right)r\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{\psi_{uu}}}\right),$$
where $I^{x,z}_{T,zx}(\omega_j)$ is the cross-periodogram of $z_t$ and $x_{t-1}$. Therefore, the estimation error is simply
$$\hat\beta_x-\beta_x=\sqrt{\psi_{\varepsilon\varepsilon|u}}\frac{\sum_{j=\pm1,..,\pm q}I^{x,z}_{T,zx}(\omega_j)}{\sum_{j=\pm1,..,\pm q}I^{x,y}_{T,xx}(\omega_j)}=\sqrt{\psi_{\varepsilon\varepsilon|u}}\operatorname{Re}\left(\frac{\sum_{j=1}^qI^{x,z}_{T,zx}(\omega_j)}{\sum_{j=1}^qI^{x,y}_{T,xx}(\omega_j)}\right).$$
Define a complex normal variable $w^z_j=\sum_{t=1}^Tz_te^{-i\omega_jt}$. Asymptotically, $w^z_j$, $j=1,..,q$, are i.i.d. $N^C(0,T)$ (using the notation of Brillinger, 1975), i.e., the real and imaginary parts of $w^z_j$ are jointly normal with the mean $(0,0)'$ and the variance matrix $\frac T2I_2$, where $I_2$ is a two-by-two identity matrix. Define also
$$c^x_j=\frac{\sum_{t=1}^Tx_{t-1}e^{i\omega_jt}}{\sum_{j'=1}^q\left|\sum_{t=1}^Tx_{t-1}e^{-i\omega_{j'}t}\right|^2}.$$
Therefore, $\hat\beta_x-\beta_x=\sqrt{\psi_{\varepsilon\varepsilon|u}}\operatorname{Re}\sum_{j=1}^qw^z_jc^x_j$, which is normal conditional on the observations of $x_t$. Conditional on $x_t$, the limit of the variance of $\operatorname{Re}\sum_{j=1}^qw^z_jc^x_j$ is $\frac T2\sum_{j=1}^q|c^x_j|^2$. Additionally, note that $\sum_{j=1}^q|c^x_j|^2=\left(\sum_{j=1}^q\left|\sum_{t=1}^Tx_{t-1}e^{-i\omega_jt}\right|^2\right)^{-1}$. That is,
$$\hat\beta_x-\beta_x=\sqrt{\psi_{\varepsilon\varepsilon|u}}\frac{\sqrt{T/2}\,Z_T}{\sqrt{\sum_{j=1}^q\left|\sum_{t=1}^Tx_{t-1}e^{-i\omega_jt}\right|^2}}\left(1+o^x_p(1)\right)=\sqrt{\psi_{\varepsilon\varepsilon|u}}\frac{\sqrt{T}\,Z_T}{\sqrt{\sum_{j=\pm1,..,\pm q}\left|\sum_{t=1}^Tx_{t-1}e^{-i\omega_jt}\right|^2}}\left(1+o^x_p(1)\right),$$
where $Z_T$ is a standard normal variable independent of the observations of $x_t$. The term $o^x_p(1)$, which is a function of $x_t$, $t=0,...,T-1$, appears in the above formula after we replace the true conditional variance of $\operatorname{Re}\sum_{j=1}^qw^z_jc^x_j$ with its approximation as $T\to\infty$. For example,
$$\operatorname{Var}(\operatorname{Re}w^z_jc^x_j)=|c^x_j|^2\frac T2\left(1+\frac{(\operatorname{Re}c^x_j)^2}{|c^x_j|^2}\frac{\sum_{t=1}^T\cos(2\omega_jt)}{T}-\frac{(\operatorname{Im}c^x_j)^2}{|c^x_j|^2}\frac{\sum_{t=1}^T\cos(2\omega_jt)}{T}+2\frac{\operatorname{Im}c^x_j\operatorname{Re}c^x_j}{|c^x_j|^2}\frac{\sum_{t=1}^T\sin(2\omega_jt)}{T}\right).$$
Because the sequences $\frac{(\operatorname{Re}c^x_j)^2}{|c^x_j|^2}$, $\frac{(\operatorname{Im}c^x_j)^2}{|c^x_j|^2}$, and $\frac{\operatorname{Im}c^x_j\operatorname{Re}c^x_j}{|c^x_j|^2}$ are all uniformly tight and the ratios $\frac{\sum_{t=1}^T\cos(2\omega_jt)}{T}$ and $\frac{\sum_{t=1}^T\sin(2\omega_jt)}{T}$ both converge to zero, then $\operatorname{Var}(\operatorname{Re}w^z_jc^x_j)=|c^x_j|^2\frac T2(1+o^x_p(1))$.
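The conditional-variance formula above is an exact identity for a fixed coefficient $c^x_j$, since $\operatorname{Var}(\operatorname{Re}\sum_tz_t\,c\,e^{-i\omega t})=\sum_t\operatorname{Re}(c\,e^{-i\omega t})^2$ for i.i.d. standard normal $z_t$. A direct check with an arbitrary complex coefficient:

```python
import numpy as np

# Var(Re sum_t z_t c e^{-i w t}) = sum_t Re(c e^{-i w t})^2 for z_t i.i.d. N(0,1);
# the text rewrites this as |c|^2 (T/2) (1 + trigonometric correction terms).
T, j = 48, 5
w = 2 * np.pi * j / T
c = 0.7 - 0.3j                                  # an arbitrary complex coefficient
t = np.arange(1, T + 1)

var_direct = (np.real(c * np.exp(-1j * w * t)) ** 2).sum()

a, b = c.real, c.imag
cos2 = np.cos(2 * w * t).sum() / T
sin2 = np.sin(2 * w * t).sum() / T
var_formula = abs(c) ** 2 * (T / 2) * (
    1 + (a ** 2 / abs(c) ** 2) * cos2
    - (b ** 2 / abs(c) ** 2) * cos2
    + 2 * (a * b / abs(c) ** 2) * sin2
)
print(np.allclose(var_direct, var_formula))     # True
```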
We conclude that
$$\frac{\hat\beta_x-\beta_x}{\sqrt{T\psi_{\varepsilon\varepsilon|u}}\left(\sum_{j=\pm1,..,\pm q}\left|\sum_{t=1}^Tx_{t-1}e^{-i\omega_jt}\right|^2\right)^{-1/2}}\Rightarrow Z,$$
where $Z$ is standard normal. We obtain the result (9) after replacing $\left|\sum_{t=1}^Tx_{t-1}e^{-i\omega_jt}\right|^2$ by $2\pi TI^{x,y}_{T,xx}(\omega_j)$.

To obtain (8), we use Result 1 to replace $\left|\sum_{t=1}^Tx_{t-1}e^{-i\omega_jt}\right|$ in the expression above with its asymptotic limit. Applying the CMT, we obtain the asymptotic distribution of $\hat\beta_x$,
$$T(\hat\beta_x-\beta_x)\Rightarrow\sqrt{\frac{\psi_{\varepsilon\varepsilon|u}}{\psi_{uu}}}\frac{Z}{\sqrt{\sum_{j=\pm1,..,\pm q}\left|\int_{\tau=0}^1J_c(\tau)e^{-i2\pi j\tau}d\tau\right|^2}}.\qquad(12)$$
To derive the asymptotic limit of the FDLS estimator, notice that
$$\hat\beta^{FDLS}_x=\hat\beta_x+r\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{\psi_{uu}}}\left(\frac{\sum_{j=\pm1,..,\pm q}\left(I_{T,xx^+}(\omega_j)-I^{x,y}_{T,xx}(\omega_j)\right)}{\sum_{j=\pm1,..,\pm q}I^{x,y}_{T,xx}(\omega_j)}+\frac{\sum_{j=\pm1,..,\pm q}I^{x,y}_{T,xx}(\omega_j)(1-\rho_T)}{\sum_{j=\pm1,..,\pm q}I^{x,y}_{T,xx}(\omega_j)}\right),$$
where the last term in the parentheses simplifies to $c/T$. Furthermore, $I_{T,xx^+}(\omega_j)=\frac{x_T-x_0}{\sqrt{2\pi T}}d_x(\omega_j)+e^{i\omega_j}I^{x,y}_{T,xx}(\omega_j)$. That is,
$$\frac{T(\hat\beta^{FDLS}_x-\hat\beta_x)}{r\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{\psi_{uu}}}}=T\frac{x_T-x_0}{\sqrt{2\pi T}}\frac{\sum_{j=\pm1,..,\pm q}d_x(\omega_j)}{\sum_{j=\pm1,..,\pm q}I^{x,y}_{T,xx}(\omega_j)}+T\frac{\sum_{j=\pm1,..,\pm q}I^{x,y}_{T,xx}(\omega_j)(e^{i\omega_j}-1)}{\sum_{j=\pm1,..,\pm q}I^{x,y}_{T,xx}(\omega_j)}+c.$$
The first term, $T\frac{x_T-x_0}{\sqrt{2\pi T}}\frac{\sum_{j=\pm1,..,\pm q}d_x(\omega_j)}{\sum_{j=\pm1,..,\pm q}I^{x,y}_{T,xx}(\omega_j)}$, converges to $\delta_q$ by the CMT and by Results 1-2 in Appendix A. The second term converges to zero because
$$T\frac{\sum_{j=\pm1,..,\pm q}I^{x,y}_{T,xx}(\omega_j)(e^{i\omega_j}-1)}{\sum_{j=\pm1,..,\pm q}I^{x,y}_{T,xx}(\omega_j)}=\frac{\sum_{j=\pm1,..,\pm q}I^{x,y}_{T,xx}(\omega_j)(\cos(\omega_j)-1)T}{\sum_{j=\pm1,..,\pm q}I^{x,y}_{T,xx}(\omega_j)},$$
and $(\cos(\omega_j)-1)T\sim O(1/T)$.

The limit (7) is obtained by summing the term $r\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{\psi_{uu}}}(\delta_q+c)$ and the limit of $T(\hat\beta_x-\beta_x)$.
We now demonstrate that the asymptotic distributions do not depend on the independence and Gaussian assumptions.

Proof of Theorem 1: General Case. Under Assumption A, we can still use Results 2-3 in Appendix A to obtain the following for the FDLS estimator:
$$T(\hat\beta^{FDLS}_x-\beta_x)=\frac{\sum_{j=\pm1,..,\pm q}\left(\frac{\sum_{t=1}^T\varepsilon_te^{-i\omega_jt}}{\sqrt T}\right)\left(\sum_{t=1}^T\frac{x_{t-1}}{\sqrt T}e^{i2\pi j\frac tT}\frac1T\right)}{\sum_{j=\pm1,..,\pm q}\left|\sum_{t=1}^T\frac{x_{t-1}}{\sqrt T}e^{-i2\pi j\frac tT}\frac1T\right|^2}\Rightarrow\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{\psi_{uu}}}\frac{\sum_{j=\pm1,..,\pm q}\left(\int_{\tau=0}^1e^{-2\pi ji\tau}dW^\varepsilon(\tau)\right)\left(\int_{\tau=0}^1J_c(\tau)e^{i2\pi j\tau}d\tau\right)}{\sum_{j=\pm1,..,\pm q}\left|\int_{\tau=0}^1J_c(\tau)e^{i2\pi j\tau}d\tau\right|^2}.$$
The above formula is valid for the case with the Gaussian i.i.d. $(u_t,\varepsilon_t)$ that satisfies Assumption A. Therefore, the right-hand side expression coincides with the limit in (7). The asymptotic convergence in (7), therefore, is also valid in this more general case. The result for the LW estimator is proved analogously.
E Joint Likelihood for the Long-Run Nearly Optimal Test

In this section, we derive the likelihood function that is used for the long-run nearly optimal test. For the vectors $\Delta Y=(y_2-y_1,..,y_T-y_1)$ and $X=(x_0,..,x_{T-1})$, divide the Gaussian likelihood $\log L_T(\Delta Y,X)=\log L_T(\Delta Y|X)+\log L_T(X)$ into the distribution of $X$ and the conditional density of $Y$. The conditional density is then replaced by the equivalent Whittle approximation, $\log\tilde L_T(\Delta Y,X)=\log\tilde L_T(\Delta Y|X)+\log L_T(X)$. The steps to obtain the Whittle approximation $\log\tilde L_T(\Delta Y|X)$ are given in Section 2.

Note that the Whittle approximation describes the distribution of the Fourier transformations $d_x(\omega_j)=\frac{1}{\sqrt{2\pi T}}\sum_{t=1}^Tx_{t-1}e^{-i\omega_jt}$ and $d_{\tilde y}(\omega_j)=\frac{1}{\sqrt{2\pi T}}\sum_{t=1}^T\tilde y_te^{-i\omega_jt}$, where the series $\tilde Y=Y+\operatorname{Re}(\delta_y)(x_0-x_T)$ is given in Section 2. With some abuse of notation, $\log\tilde L_T(\Delta Y,X)\equiv\log\tilde L_T(d_x,d_{\tilde y})=\sum_{j=1}^{T-1}f_j(d_x(\omega_j),d_{\tilde y}(\omega_j))$, where for $d(\omega_j)=(d_x(\omega_j),d_{\tilde y}(\omega_j))'$,
$$f_j(d_x(\omega_j),d_{\tilde y}(\omega_j))=-\log2\pi-\frac12\left(\log\det(s^{x,y}(\omega_j))+\operatorname{tr}\left(s^{x,y}(\omega_j)^{-1}d(\omega_j)d^*(\omega_j)\right)\right).$$
Assume for simplicity that $T$ is odd. For $1\le j\le(T-1)/2$, the above likelihood corresponds to a complex mean-zero normal variable $d(\omega_j)$ with $E(d(\omega_j)d^*(\omega_j))=s^{x,y}(\omega_j)$ and $E(d(\omega_j)d(\omega_j)')=0_{2\times2}$. Note that the implied density for each pair $d(\omega_j)$ is $l_j(d_x(\omega_j),d_{\tilde y}(\omega_j))\equiv f_j(d_x(\omega_j),d_{\tilde y}(\omega_j))+f_j(d_x(\omega_{T-j}),d_{\tilde y}(\omega_{T-j}))$ because the corresponding $d(\omega_{T-j})$ is just a complex conjugate.

Next, represent the Whittle likelihood as the likelihood of independent bivariate normal variables $d(\omega_j)$, $j=1,..,(T-1)/2$: $\log\tilde L_T(d_x,d_{\tilde y})=\sum_{j=1}^{(T-1)/2}l_j(d_x(\omega_j),d_{\tilde y}(\omega_j))$. It follows that the $d_{\tilde y}(\omega_j)$ conditional on $X$ are independent and complex normal, with moments that depend on $d_x(\omega_j)$ only. Note that the Fourier decomposition of $Y$, $d_y(\omega_j)$, is equal to $d_{\tilde y}(\omega_j)$ minus the Fourier decomposition of $\operatorname{Re}(\delta_y)(x_0-x_T)$, say $d_\delta(\omega_j)$. Therefore, the $d_y(\omega_j)$ conditional on $X$ are also independent across $j=1,..,(T-1)/2$ and complex normal, with the mean equal to the conditional mean of $d_{\tilde y}(\omega_j)$ minus $d_\delta(\omega_j)$ and the second moments equal to the second moments of $d_{\tilde y}(\omega_j)$.

For $j=1,..,(T-1)/2$, $d_{\tilde y}(\omega_j)$ is $N^C\!\left(\frac{s^{x,y}_{yx}(\omega_j)}{s^{x,y}_{xx}(\omega_j)}d_x(\omega_j),\,2\frac{\det(s^{x,y}(\omega_j))}{s^{x,y}_{xx}(\omega_j)}\right)$. In other words, its mean is $\frac{s^{x,y}_{yx}(\omega_j)}{s^{x,y}_{xx}(\omega_j)}d_x(\omega_j)$, and the imaginary and real parts are independent normal with variances of $\frac{\det(s^{x,y}(\omega_j))}{s^{x,y}_{xx}(\omega_j)}$ each.
Combining all of these results together, we obtain:
$$\log\tilde L_T(\Delta Y|X)=\sum_{j=1}^{T-1}\left(-\frac12\log2\pi-\frac12\log\frac{\det(s^{x,y}(\omega_j))}{s^{x,y}_{xx}(\omega_j)}-\frac{\left|d_{\tilde y}(\omega_j)-\frac{s^{x,y}_{yx}(\omega_j)}{s^{x,y}_{xx}(\omega_j)}d_x(\omega_j)\right|^2}{2\det(s^{x,y}(\omega_j))\,s^{x,y}_{xx}(\omega_j)^{-1}}\right).$$

Next, we divide this likelihood into the short-run and long-run parts and substitute the formula for $s^{x,y}(\omega_j)$,
$$\log\tilde L_T(\Delta Y|X)=\log R\left(\Delta Y|X,s^{x,y}(\omega_j),q<j\le(T-1)/2\right)-q\log2\pi-\frac12\sum_{\substack{j=-q\\j\neq0}}^{q}\left(\log\frac{\det(s^{u,\varepsilon}(\omega_j))}{s^{u,\varepsilon}_{uu}(\omega_j)}+\frac{\left|d_{\tilde y}(\omega_j)-\beta_xd_x(\omega_j)-(e^{i\omega_j}-\rho)\frac{s^{u,\varepsilon}_{\varepsilon u}(\omega_j)}{s^{u,\varepsilon}_{uu}(\omega_j)}d_x(\omega_j)\right|^2}{\det(s^{u,\varepsilon}(\omega_j))\,s^{u,\varepsilon}_{uu}(\omega_j)^{-1}}\right),$$
where $R(\Delta Y|X,s^{x,y}(\omega_j),q<j\le(T-1)/2)$ is the remainder of the conditional likelihood that describes the short-run dynamics.
The marginal distribution of $X$ is derived for the normal i.i.d. $u_t$ in (1). Assuming that the distribution of $x_0$ does not depend on $\rho$, the marginal likelihood is then
$$\log L_T(X)=-\frac{T-1}{2}\log2\pi-\frac{T-1}{2}\log\psi_{uu}-\frac12\sum_{t=1}^{T-1}\frac{(x_t-\mu_x-\rho(x_{t-1}-\mu_x))^2}{\psi_{uu}}$$
up to the density of $x_0$.
We obtain the joint likelihood $\log\tilde L_T(\Delta Y,X)$ by adding the log of the conditional distribution and of the marginal distribution of $X$ as defined above. Furthermore, we replace the series $\tilde Y$ with $\bar Y$, as explained in the main text, to obtain an asymptotically equivalent likelihood. Similarly, based on Proposition 1, we can further simplify the testing problem after replacing $s^{u,\varepsilon}(\omega_j)$ by $s^{u,\varepsilon}(0)$. The resulting likelihood to be used in the likelihood ratios is, therefore, defined as follows:
$$\log\bar L_T(\Delta Y,X|\beta_x,\rho)=\log\bar R\left(\Delta Y|X,s^{x,y}(\omega_j),q<j\le(T-1)/2\right)-q\log2\pi-\frac12\sum_{\substack{j=-q\\j\neq0}}^{q}\left(\log\psi_{\varepsilon\varepsilon|u}+\frac{\left|d_{\bar y}(\omega_j)-\beta_xd_x(\omega_j)-(d_{x^+}(\omega_j)-\rho d_x(\omega_j))\frac{r\psi_{\varepsilon\varepsilon}^{1/2}}{\psi_{uu}^{1/2}}\right|^2}{\psi_{\varepsilon\varepsilon|u}}\right)$$
$$-\frac{T-1}{2}\log2\pi-\frac{T-1}{2}\log\psi_{uu}-\frac12\sum_{t=1}^{T-1}\frac{(x_t-\mu_x-\rho(x_{t-1}-\mu_x))^2}{\psi_{uu}},$$
where $d_{x^+}(\omega_j)=\frac{1}{\sqrt{2\pi T}}\sum_{t=1}^Tx_te^{-i\omega_jt}$.
Following Jansson and Moreira (2006), we next derive the asymptotic distribution of the obtained likelihood in the neighborhood of the null hypothesis $H_0:\beta_x=\beta_0$. We re-parameterize $\beta_x=\beta_0+\frac1T\sqrt{\frac{\psi_{\varepsilon\varepsilon|u}}{\psi_{uu}}}b'$ and $\rho=1+c/T$, and we express the likelihood as follows,
$$\log\bar L_T(\Delta Y,X|b',c)=\log\bar L_T(\Delta Y,X|b'=0,c=0)+b'R_\beta+cR_\rho-\frac12\left(b'-\frac{r}{\sqrt{1-r^2}}c\right)^2R_{\beta\beta}-\frac12c^2R_{\rho\rho},$$
where $R_\beta=(\psi_{uu}\psi_{\varepsilon\varepsilon|u})^{-1/2}\frac1T\sum_{j=\pm1,..,\pm q}d_x(\omega_j)\left(d_{\bar y}(\omega_j)-\beta_0d_x(\omega_j)-\frac{r\psi_{\varepsilon\varepsilon}^{1/2}}{\psi_{uu}^{1/2}}(d_{x^+}(\omega_j)-d_x(\omega_j))\right)^*$, $R_\rho=\psi_{uu}^{-1}T^{-1}\sum_{t=1}^{T-1}(x_{t-1}-\mu_x)(x_t-x_{t-1})-r(1-r^2)^{-1/2}R_\beta$, $R_{\beta\beta}=\psi_{uu}^{-1}T^{-2}\sum_{j=\pm1,..,\pm q}d_x(\omega_j)d_x(\omega_j)^*$, and $R_{\rho\rho}=\psi_{uu}^{-1}T^{-2}\sum_{t=1}^{T-1}x_{t-1}^2$.
The asymptotic behavior of the likelihood in the vicinity of $\beta_0$ can then be obtained by plugging in the limits for $R_\beta$, $R_\rho$, $R_{\beta\beta}$, and $R_{\rho\rho}$. First,
$$R_\beta=(\psi_{uu}\psi_{\varepsilon\varepsilon|u})^{-1/2}\frac1T\sum_{\substack{j=-q\\j\neq0}}^{q}d_x(\omega_j)\left(b'\sqrt{\frac{\psi_{\varepsilon\varepsilon|u}}{\psi_{uu}}}\frac{d_x(\omega_j)}{T}+\sqrt{\psi_{\varepsilon\varepsilon|u}}\,d_z(\omega_j)-c\frac{r\psi_{\varepsilon\varepsilon}^{1/2}}{\psi_{uu}^{1/2}}\frac{d_x(\omega_j)}{T}\right)^*,$$
where $z_t\equiv(\psi_{\varepsilon\varepsilon|u})^{-1/2}\left(\varepsilon_t-\frac{r\psi_{\varepsilon\varepsilon}^{1/2}}{\psi_{uu}^{1/2}}u_t\right)$. Note that the long-run variance of $z_t$ is 1 and, therefore, as follows from Result 4 in Appendix A, its Fourier transformation has the limit $d_z(\omega_j)\Rightarrow\frac{1}{\sqrt{2\pi}}\int_{\tau=0}^1e^{-2\pi ij\tau}dW^z(\tau)$, where $W^z(\tau)$ is a Brownian process. As follows from Result 2 in Appendix A, the Fourier transformation of $x_t$ has the limit $\frac{d_x(\omega_j)}{T}\Rightarrow\sqrt{\frac{\psi_{uu}}{2\pi}}\int_{\tau=0}^1e^{-2\pi ij\tau}J_c(\tau)d\tau$. Furthermore, because the long-run covariance between $z_t$ and $u_t$ is zero, it follows from Results 2 and 4 in Appendix A that $J_c(\tau)$ and $W^z(\tau)$ are independent. Thus, we obtain the limit $R_\beta\Rightarrow\mathcal R_\beta$, where
$$\mathcal R_\beta=\frac{1}{2\pi}\sum_{\substack{j=-q\\j\neq0}}^{q}\left(\int_0^1J_c(\tau)e^{-2\pi ij\tau}d\tau\right)\left(\int_0^1e^{-2\pi ij\tau}dW^z(\tau)+\left(b'-\frac{cr}{\sqrt{1-r^2}}\right)\int_0^1J_c(\tau)e^{-2\pi ij\tau}d\tau\right)^*.$$
Second, combining the latter result and the result by Jansson and Moreira (2006, Lemma 3), we obtain $R_\rho\Rightarrow\mathcal R_\rho=\int_0^1J_c(\tau)dJ_c(\tau)-r(1-r^2)^{-1/2}\mathcal R_\beta$, and similarly, $R_{\beta\beta}\Rightarrow\mathcal R_{\beta\beta}=\frac{1}{2\pi}\sum_{j=\pm1,..,\pm q}\left|\int_{\tau=0}^1e^{-2\pi ij\tau}J_c(\tau)d\tau\right|^2$ and $R_{\rho\rho}\Rightarrow\mathcal R_{\rho\rho}=\int_0^1(J_c(\tau))^2d\tau$.
The asymptotic inference problem, thus, relies on the exponential function of the multivariate process $\mathcal R=(\mathcal R_\beta,\mathcal R_\rho,\mathcal R_{\beta\beta},\mathcal R_{\rho\rho})$ and the parameters $(b',c)$:
$$L(\mathcal R|b',c)\propto e^{\,b'\mathcal R_\beta+c\mathcal R_\rho-\frac12\left(b'-c\frac{r}{\sqrt{1-r^2}}\right)^2\mathcal R_{\beta\beta}-\frac12c^2\mathcal R_{\rho\rho}}.$$
Note that in calculating the likelihood ratio, the observations $R$ can be replaced by the asymptotically equivalent $\tilde R$,
$$\tilde R_\beta=(\psi_{uu}\psi_{\varepsilon\varepsilon|u})^{-1/2}\frac1T\sum_{\substack{j=-q\\j\neq0}}^{q}d_x(\omega_j)\left(d_{\bar y}(\omega_j)-\beta_0d_x(\omega_j)-\frac{r\psi_{\varepsilon\varepsilon}^{1/2}}{\psi_{uu}^{1/2}}(d_{x^+}(\omega_j)-d_x(\omega_j))\right)^*,$$
$$\tilde R_\rho=\psi_{uu}^{-1}T^{-1}\sum_{t=1}^{T-1}(x_{t-1}-x_0)(x_t-x_{t-1})-r(1-r^2)^{-1/2}\tilde R_\beta,$$
$$\tilde R_{\beta\beta}=\psi_{uu}^{-1}T^{-2}\sum_{|j|=1}^{q}d_x(\omega_j)d_x(\omega_j)^*,\qquad\tilde R_{\rho\rho}=\psi_{uu}^{-1}T^{-2}\sum_{t=1}^{T-1}(x_{t-1}-x_0)^2.$$
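A minimal sketch of computing the feasible statistics $\tilde R_\beta$, $\tilde R_{\beta\beta}$, and $\tilde R_{\rho\rho}$ from simulated data. All model inputs below (the values of $\psi_{uu}$, $\psi_{\varepsilon\varepsilon}$, $r$, $\beta_0$, $c$, the i.i.d. normal shocks, and the use of the untransformed $y$ in place of $\bar y$) are illustrative assumptions, not the paper's estimators:

```python
import numpy as np

# Illustrative-only sketch: (u_t, eps_t) i.i.d. normal with known psi's and r,
# beta_0 = 0, and y used in place of the transformed series y-bar.
rng = np.random.default_rng(4)
T, q = 400, 10
psi_uu, psi_ee, r, beta0, c = 1.0, 1.0, -0.75, 0.0, -5.0
rho_T = 1 + c / T

u = rng.standard_normal(T)
eps = r * u + np.sqrt(1 - r ** 2) * rng.standard_normal(T)
x = np.empty(T + 1)
x[0] = 0.0
for t in range(T):
    x[t + 1] = rho_T * x[t] + u[t]
y = beta0 * x[:-1] + eps
psi_ee_u = psi_ee * (1 - r ** 2)

t_idx = np.arange(1, T + 1)
def dft(series, jj):                 # (2*pi*T)**(-1/2) sum_t series_t e^{-i omega_j t}
    return (series * np.exp(-2j * np.pi * jj * t_idx / T)).sum() / np.sqrt(2 * np.pi * T)

js = [j for j in range(-q, q + 1) if j != 0]
R_beta = sum(dft(x[:-1], j) *
             np.conj(dft(y, j) - beta0 * dft(x[:-1], j)
                     - r * np.sqrt(psi_ee / psi_uu) * (dft(x[1:], j) - dft(x[:-1], j)))
             for j in js).real / (T * np.sqrt(psi_uu * psi_ee_u))
R_bb = sum(abs(dft(x[:-1], j)) ** 2 for j in js) / (psi_uu * T ** 2)
R_rr = ((x[:-1] - x[0]) ** 2).sum() / (psi_uu * T ** 2)
print(R_beta, R_bb, R_rr)
```

The sum over the symmetric set $j=\pm1,..,\pm q$ makes $\tilde R_\beta$ real for real-valued series, so taking the real part discards only numerical noise.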
If $\{u_t\}_{t=1}^T$ are not i.i.d. but satisfy Assumption A, then the statistics $\tilde R_\beta$, $\tilde R_{\beta\beta}$, and $\tilde R_{\rho\rho}$ still converge to $\mathcal R_\beta$, $\mathcal R_{\beta\beta}$, and $\mathcal R_{\rho\rho}$, respectively. For the limit of $\tilde R_\rho$ in the i.i.d. case, note that $\mathcal R_\rho=\frac12\left(J_c(1)^2-1\right)-r(1-r^2)^{-1/2}\mathcal R_\beta$. Therefore, we can use the following statistic,
$$\bar R_\rho=\frac12\left(\psi_{uu}^{-1}T^{-1}(x_{T-1}-x_0)^2-1\right)-r(1-r^2)^{-1/2}\tilde R_\beta,$$
which converges to $\mathcal R_\rho$ in the general case.
F Figures and Tables
[Figure 1 here: four panels of asymptotic local power plotted against b, for (c = -2, r = -0.95), (c = -2, r = -0.75), (c = -20, r = -0.95), and (c = -20, r = -0.75); each panel shows the Q-test, the t-test, the LW test, and the long-horizon t-test.]

Figure 1: Asymptotic local power functions for H0 : βx = 0. The data are generated according to model (1) with E(εt|ut−1, ...) = 0. The power functions depend on the persistence parameter c in ρT = 1 + c/T, the long-run correlation r between the shocks εt and ut, and the true value of βx = b/T. The long-run variances of εt and ut are set to 1. The Q-test is the test by Campbell and Yogo (2006). The t-test is the OLS t-test. The LW test is the test defined in Theorem 1. The long-horizon t-test is the OLS t-test with Newey-West standard errors (s.e.) for the regression of yt + ... + yt+H−1 on xt−1. The number of lags L for the Newey-West s.e. is such that L/H → 1. The horizon H is 1/10th of the sample size. Correspondingly, the number of frequencies q for the LW test is 10. The confidence intervals for all of the tests are constructed by using the adjusted Bonferroni bounds based on the DF-GLS statistic for ρT.
[Figure 2 here: four panels of asymptotic local power plotted against b, for (c = -2, r = -0.95), (c = -2, r = -0.75), (c = -20, r = -0.95), and (c = -20, r = -0.75); each panel shows the Q-test, the t-test, the LW test, and the long-horizon t-test.]

Figure 2: Asymptotic local power functions for H0 : βx = 0. The data are generated according to model (1) with E(εt|ut−1, ...) ≠ 0 due to a measurement error in xt. In particular, if mt is a measurement error, then ut is positively correlated with mt, and the error term εt = −βxmt−1 + ε̃t, where E(ε̃t|ut−1, ...) = 0. The parameters for the dynamics of ut, mt, and ε̃t are chosen in such a way that the long-run covariance matrix of et = (εt, ut) is the same as in Figure 1, and the variance of E(εt|ut−1, ...) is 0.25% of the variance of εt. The power functions depend on the persistence parameter c in ρT = 1 + c/T, the long-run correlation r between the shocks εt and ut, and the true value of βx = b/T. The Q-test is the test by Campbell and Yogo (2006). The t-test is the OLS t-test. The LW test is the test defined in Theorem 1. The long-horizon t-test is the OLS t-test with Newey-West standard errors (s.e.) for the regression of yt + ... + yt+H−1 on xt−1. The number of lags L for the Newey-West s.e. is such that L/H → 1. The horizon H is 1/10th of the sample size. Correspondingly, the number of frequencies q for the LW test is 10. The confidence intervals for all of the tests are constructed by using the adjusted Bonferroni bounds based on the DF-GLS statistic for ρT.
[Figure 3 here: four panels of asymptotic local power plotted against b, for (c = -2, r = -0.95), (c = -2, r = -0.75), (c = -20, r = -0.95), and (c = -20, r = -0.75); each panel shows the LW test and the long-run nearly optimal test.]

Figure 3: Asymptotic local power functions for H0 : βx = 0. The data are generated according to model (1). The power functions depend on the persistence parameter c in ρT = 1 + c/T, the long-run correlation r between the shocks εt and ut, and the true value of βx = b/T. The long-run variances of εt and ut are set to 1. The LW test is the test defined in Theorem 1. The nearly optimal long-run predictability test is described in Section 5. The number of frequencies q for the LW and the nearly optimal tests is 10.
Table 1: Conservative Bounds on c in the LW Test With q = 5 Frequencies
The table reports lower (cL) and upper (cU) bounds on c in model (1) with ρ = 1 + c/T that are used to construct conservative (Bonferroni-adjusted) equal-tailed 90% confidence intervals for βx based on the value of the Local Whittle estimator with q = 5 frequencies. To choose the row, calculate the Dickey-Fuller GLS statistic tρ; see Elliott, Rothenberg, and Stock (1996). Columns correspond to consistently estimated long-run correlations r between the shocks εt and ut.

tρ        r = −1.0          r = −0.9          r = −0.8          r = −0.7          r = −0.6
          cL       cU       cL       cU       cL       cU       cL       cU       cL       cU
−12.00  −257.26  −188.59  −248.20  −201.47  −247.82  −206.43  −245.24  −208.49  −245.69  −211.01
−11.50  −242.75  −175.15  −234.79  −187.81  −234.36  −192.34  −232.00  −194.73  −232.27  −197.12
−11.00  −228.24  −161.80  −221.39  −174.15  −220.90  −178.24  −218.75  −180.97  −218.84  −183.23
−10.50  −213.72  −148.78  −207.98  −160.43  −207.43  −164.25  −205.51  −167.25  −205.42  −169.34
−10.00  −199.21  −135.94  −194.57  −147.29  −193.97  −150.80  −192.27  −153.82  −191.99  −155.62
−9.75   −191.96  −129.65  −187.87  −140.75  −187.24  −144.24  −185.64  −146.97  −185.28  −148.91
−9.50   −184.70  −123.33  −181.17  −134.33  −180.51  −137.65  −179.02  −140.42  −178.57  −142.30
−9.25   −177.45  −117.13  −174.46  −127.98  −173.78  −131.12  −172.40  −133.90  −171.85  −135.78
−9.00   −170.19  −111.00  −167.68  −121.63  −166.74  −124.91  −165.31  −127.51  −164.74  −129.22
−8.75   −162.70  −104.97  −159.94  −115.26  −159.25  −118.62  −157.98  −121.06  −157.40  −122.86
−8.50   −155.49  −99.15   −152.88  −109.18  −152.20  −112.24  −150.83  −114.76  −150.20  −116.51
−8.25   −148.24  −93.39   −145.79  −103.09  −144.97  −106.14  −143.56  −108.58  −143.05  −110.27
−8.00   −140.94  −87.78   −138.58  −97.23   −138.00  −100.17  −136.66  −102.54  −136.10  −104.17
−7.75   −134.15  −82.26   −131.73  −91.35   −131.05  −94.31   −129.82  −96.57   −129.20  −98.13
−7.50   −126.92  −76.81   −124.86  −85.83   −124.25  −88.56   −123.08  −90.71   −122.51  −92.18
−7.25   −120.31  −71.61   −118.10  −80.09   −117.46  −82.86   −116.25  −85.05   −115.73  −86.60
−7.00   −113.70  −66.29   −111.33  −74.82   −110.67  −77.44   −109.58  −79.41   −109.10  −80.85
−6.75   −106.91  −61.44   −104.92  −69.48   −104.40  −72.09   −103.31  −74.06   −102.82  −75.47
−6.50   −100.45  −56.47   −98.58   −64.24   −98.02   −66.75   −96.96   −68.70   −96.51   −70.06
−6.25   −94.12   −51.61   −92.26   −59.30   −91.72   −61.75   −90.78   −63.51   −90.33   −64.83
−6.00   −87.99   −47.19   −86.27   −54.47   −85.75   −56.78   −84.78   −58.50   −84.36   −59.79
−5.75   −82.01   −42.73   −80.18   −49.78   −79.70   −51.91   −78.80   −53.78   −78.38   −55.05
−5.50   −76.19   −38.56   −74.58   −45.30   −74.14   −47.46   −73.22   −49.08   −72.79   −50.23
−5.25   −70.57   −34.48   −68.95   −40.97   −68.48   −43.00   −67.64   −44.54   −67.26   −45.70
−5.00   −65.00   −30.57   −63.48   −36.81   −63.03   −38.79   −62.26   −40.27   −61.89   −41.36
−4.75   −59.68   −26.92   −58.21   −32.85   −57.79   −34.70   −57.03   −36.12   −56.65   −37.16
−4.50   −54.58   −23.40   −53.22   −29.00   −52.84   −30.82   −52.09   −32.20   −51.75   −33.17
−4.25   −49.62   −20.04   −48.34   −25.44   −47.96   −27.10   −47.28   −28.44   −46.95   −29.36
−4.00   −44.93   −16.99   −43.68   −22.13   −43.33   −23.70   −42.71   −24.90   −42.41   −25.76
−3.75   −40.46   −14.20   −39.32   −18.93   −38.94   −20.40   −38.31   −21.58   −38.01   −22.40
−3.50   −36.05   −11.47   −34.99   −15.94   −34.69   −17.36   −34.15   −18.45   −33.87   −19.22
−3.25   −32.05   −8.94    −30.97   −13.25   −30.67   −14.57   −30.14   −15.53   −29.91   −16.25
−3.00   −28.20   −6.78    −27.25   −10.68   −26.98   −11.87   −26.49   −12.87   −26.26   −13.54
−2.75   −24.57   −4.80    −23.74   −8.40    −23.48   −9.53    −23.00   −10.38   −22.78   −11.01
−2.50   −21.14   −2.95    −20.32   −6.29    −20.09   −7.36    −19.70   −8.16    −19.50   −8.71
−2.25   −17.97   −1.42    −17.27   −4.47    −17.05   −5.42    −16.66   −6.15    −16.48   −6.66
−2.00   −15.08   −0.10    −14.41   −2.85    −14.22   −3.75    −13.89   −4.37    −13.74   −4.87
−1.75   −12.44   1.03     −11.85   −1.48    −11.70   −2.24    −11.39   −2.85    −11.24   −3.27
−1.50   −10.08   1.90     −9.54    −0.35    −9.39    −1.03    −9.12    −1.55    −9.00    −1.91
−1.25   −8.01    2.59     −7.50    0.58     −7.39    0.00     −7.15    −0.49    −7.02    −0.80
−1.00   −6.18    3.08     −5.76    1.30     −5.64    0.77     −5.42    0.35     −5.32    0.07
−0.75   −4.76    3.38     −4.33    1.78     −4.22    1.31     −4.00    0.96     −3.92    0.72
−0.50   −3.58    3.60     −3.20    2.08     −3.11    1.69     −2.94    1.36     −2.88    1.14
−0.25   −2.69    3.77     −2.38    2.33     −2.29    1.92     −2.15    1.64     −2.08    1.44
0.00    −2.00    3.91     −1.72    2.51     −1.65    2.12     −1.50    1.85     −1.45    1.67
0.25    −1.43    4.04     −1.20    2.67     −1.14    2.30     −1.03    2.02     −0.98    1.85
0.50    −0.99    4.16     −0.80    2.81     −0.74    2.44     −0.65    2.18     −0.60    2.01
1.00    −0.34    4.35     −0.18    3.05     −0.14    2.69     −0.05    2.44     −0.01    2.28
1.50    0.13     4.51     0.26     3.23     0.29     2.90     0.35     2.66     0.38     2.50
2.00    0.47     4.67     0.57     3.40     0.60     3.08     0.66     2.84     0.69     2.69
2.50    0.74     4.81     0.83     3.55     0.86     3.23     0.92     3.01     0.94     2.85
3.00    0.96     4.94     1.05     3.68     1.07     3.36     1.12     3.14     1.14     3.00
3.50    1.13     5.04     1.22     3.80     1.24     3.49     1.29     3.27     1.31     3.12
4.00    1.29     5.13     1.36     3.90     1.39     3.60     1.43     3.37     1.46     3.23
Table 1: Conservative Bounds on c in the LW Test With q = 5 Frequencies (continued)
The table reports lower (cL) and upper (cU) bounds on c in model (1) with ρ = 1 + c/T that are used to construct conservative (Bonferroni-adjusted) equal-tailed 90% confidence intervals for βx based on the value of the Local Whittle estimator with q = 5 frequencies. To choose the row, calculate the Dickey-Fuller GLS statistic tρ; see Elliott, Rothenberg, and Stock (1996). Columns correspond to consistently estimated long-run correlations r between the shocks εt and ut.
tρ       r = −0.5            r = −0.4            r = −0.3            r = −0.2            r = −0.1
            cL       cU        cL       cU         cL       cU        cL       cU         cL       cU
−12.00   −243.45 −213.70   −246.30 −214.75   −242.27 −216.72   −243.95 −218.79   −236.80 −222.30
−11.50   −230.23 −199.67   −232.39 −200.89   −228.74 −202.82   −229.72 −204.81   −223.26 −208.09
−11.00   −217.01 −185.64   −218.47 −187.04   −215.21 −188.91   −215.49 −190.82   −209.72 −193.87
−10.50   −203.79 −171.62   −204.55 −173.18   −201.69 −175.01   −201.26 −176.84   −196.18 −179.65
−10.00   −190.57 −157.80   −190.63 −159.26   −188.16 −160.99   −187.03 −162.73   −182.64 −165.48
 −9.75   −183.96 −151.05   −183.67 −152.68   −181.40 −154.41   −179.92 −155.94   −175.87 −158.49
 −9.50   −177.35 −144.31   −176.71 −145.84   −174.63 −147.53   −172.80 −149.14   −169.10 −151.71
 −9.25   −170.74 −137.72   −169.75 −139.22   −167.84 −140.88   −165.50 −142.47   −161.72 −144.81
 −9.00   −163.56 −131.16   −162.33 −132.60   −160.33 −134.25   −158.23 −135.78   −154.72 −138.12
 −8.75   −156.33 −124.80   −155.23 −126.20   −153.23 −127.72   −151.19 −129.20   −147.79 −131.54
 −8.50   −149.11 −118.45   −148.04 −119.76   −146.24 −121.26   −144.11 −122.71   −140.73 −125.01
 −8.25   −142.14 −112.05   −141.02 −113.40   −139.09 −114.89   −137.19 −116.30   −133.97 −118.61
 −8.00   −135.07 −105.89   −134.15 −107.20   −132.40 −108.67   −130.35 −110.05   −127.18 −112.21
 −7.75   −128.09  −99.92   −127.19 −101.18   −125.49 −102.58   −123.73 −103.86   −120.55 −105.96
 −7.50   −121.47  −93.92   −120.59  −95.14   −118.94  −96.53   −117.03  −97.82   −114.12  −99.87
 −7.25   −114.79  −88.22   −113.94  −89.43   −112.23  −90.70   −110.45  −91.89   −107.55  −93.90
 −7.00   −108.21  −82.43   −107.33  −83.58   −105.73  −84.90   −104.08  −86.21   −101.34  −88.19
 −6.75   −101.91  −77.00   −101.10  −78.07    −99.54  −79.26    −97.82  −80.43    −95.11  −82.29
 −6.50    −95.61  −71.57    −94.83  −72.65    −93.36  −73.82    −91.75  −75.01    −89.16  −76.79
 −6.25    −89.47  −66.27    −88.74  −67.30    −87.40  −68.47    −85.81  −69.58    −83.18  −71.36
 −6.00    −83.53  −61.23    −82.78  −62.23    −81.35  −63.28    −79.84  −64.28    −77.38  −65.99
 −5.75    −77.62  −56.29    −76.90  −57.17    −75.65  −58.18    −74.31  −59.24    −71.88  −60.92
 −5.50    −72.03  −51.44    −71.35  −52.35    −70.15  −53.49    −68.77  −54.50    −66.46  −55.97
 −5.25    −66.56  −46.94    −65.86  −47.84    −64.68  −48.78    −63.37  −49.67    −61.21  −51.13
 −5.00    −61.22  −42.49    −60.61  −43.29    −59.47  −44.20    −58.20  −45.14    −56.13  −46.58
 −4.75    −56.02  −38.29    −55.47  −39.08    −54.46  −39.97    −53.28  −40.81    −51.21  −42.12
 −4.50    −51.11  −34.19    −50.53  −34.93    −49.50  −35.80    −48.43  −36.63    −46.54  −37.91
 −4.25    −46.37  −30.36    −45.82  −31.07    −44.85  −31.90    −43.82  −32.68    −42.08  −33.84
 −4.00    −41.88  −26.70    −41.38  −27.39    −40.49  −28.15    −39.51  −28.87    −37.77  −30.00
 −3.75    −37.49  −23.29    −37.02  −23.91    −36.16  −24.61    −35.25  −25.31    −33.69  −26.39
 −3.50    −33.38  −20.05    −32.93  −20.65    −32.11  −21.31    −31.26  −21.95    −29.77  −22.97
 −3.25    −29.47  −17.03    −29.07  −17.60    −28.34  −18.24    −27.50  −18.84    −26.13  −19.78
 −3.00    −25.85  −14.25    −25.47  −14.76    −24.76  −15.34    −23.98  −15.89    −22.68  −16.79
 −2.75    −22.36  −11.67    −22.00  −12.14    −21.36  −12.69    −20.67  −13.21    −19.50  −14.00
 −2.50    −19.15   −9.29    −18.82   −9.77    −18.21  −10.25    −17.59  −10.74    −16.52  −11.47
 −2.25    −16.15   −7.23    −15.85   −7.65    −15.32   −8.08    −14.75   −8.52    −13.75   −9.17
 −2.00    −13.43   −5.35    −13.16   −5.70    −12.67   −6.13    −12.12   −6.52    −11.26   −7.11
 −1.75    −10.97   −3.73    −10.71   −4.03    −10.25   −4.38     −9.80   −4.79     −9.01   −5.31
 −1.50     −8.75   −2.31     −8.49   −2.60     −8.12   −2.92     −7.74   −3.24     −6.98   −3.72
 −1.25     −6.79   −1.15     −6.59   −1.41     −6.26   −1.67     −5.86   −1.96     −5.25   −2.37
 −1.00     −5.12   −0.23     −4.94   −0.46     −4.65   −0.69     −4.30   −0.93     −3.76   −1.29
 −0.75     −3.76    0.45     −3.61    0.27     −3.33    0.05     −3.03   −0.14     −2.56   −0.45
 −0.50     −2.74    0.92     −2.59    0.77     −2.33    0.57     −2.10    0.40     −1.67    0.13
 −0.25     −1.96    1.22     −1.83    1.08     −1.61    0.92     −1.39    0.77     −1.05    0.52
  0.00     −1.34    1.47     −1.24    1.32     −1.07    1.17     −0.89    1.03     −0.57    0.82
  0.25     −0.89    1.67     −0.81    1.53     −0.65    1.39     −0.48    1.25     −0.21    1.05
  0.50     −0.52    1.83     −0.44    1.72     −0.31    1.57     −0.16    1.44      0.08    1.25
  1.00      0.05    2.11      0.11    1.99      0.22    1.87      0.33    1.76      0.53    1.57
  1.50      0.43    2.33      0.49    2.23      0.58    2.10      0.69    1.99      0.87    1.83
  2.00      0.75    2.52      0.79    2.41      0.87    2.30      0.97    2.19      1.12    2.03
  2.50      0.99    2.69      1.03    2.58      1.10    2.47      1.19    2.36      1.34    2.20
  3.00      1.18    2.84      1.22    2.73      1.29    2.61      1.38    2.51      1.53    2.35
  3.50      1.35    2.97      1.38    2.86      1.46    2.75      1.54    2.64      1.67    2.49
  4.00      1.50    3.08      1.53    2.99      1.59    2.87      1.67    2.77      1.81    2.61
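The lookup procedure described in the note to Table 1 can be sketched as follows. This is a minimal illustration, not part of the paper's replication code: it reproduces only six (tρ, r) cells from Table 1 and selects the nearest tabulated pair, whereas in practice one would interpolate over the full table. The function names are hypothetical.

```python
# Illustrative lookup of conservative bounds (cL, cU) on c from Table 1,
# where rho = 1 + c/T. Keys are (t_rho, r); only a few cells of the table
# are reproduced here for the sketch.
BOUNDS = {
    (-3.00, -0.5): (-25.85, -14.25),
    (-3.00, -0.1): (-22.68, -16.79),
    (-2.00, -0.5): (-13.43, -5.35),
    (-2.00, -0.1): (-11.26, -7.11),
    ( 0.00, -0.5): ( -1.34,   1.47),
    ( 0.00, -0.1): ( -0.57,   0.82),
}

def c_bounds(t_rho, r):
    """Return (cL, cU) for the nearest tabulated (t_rho, r) pair."""
    key = min(BOUNDS, key=lambda k: abs(k[0] - t_rho) + abs(k[1] - r))
    return BOUNDS[key]

def rho_interval(t_rho, r, T):
    """Conservative bounds on rho = 1 + c/T implied by the bounds on c."""
    cL, cU = c_bounds(t_rho, r)
    return 1 + cL / T, 1 + cU / T

if __name__ == "__main__":
    # e.g., DF-GLS statistic of -2.0, long-run correlation -0.5, T = 100
    print(rho_interval(-2.0, -0.5, 100))
```

With tρ = −2.00, r = −0.5, and T = 100, the tabulated row gives c ∈ [−13.43, −5.35], i.e., ρ ∈ [0.8657, 0.9465].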
Table 2: Relation between Stock Index Returns and Payout Ratios
The table reports 90% confidence intervals for the slope βx in the model rt+1 = β0 + βx xt + εt, where rt+1 is the annual CRSP value-weighted excess return and xt is one of the measures of the payout to shareholders. Sample: annual observations from 1926 to 2003 (for payout ratios and net payout) and from 1926 to 2010 (for the dividend yield and earnings yield). The estimates for the Q-test and the LW test are the midpoints of the confidence intervals. The estimate for the t-test is Stambaugh's (1999) corrected OLS slope. Asterisks denote significance at the 5% level for the one-sided tests and at the 10% level for the two-sided tests.
xt−1 t-test Q-test (q = all) LW test (q = 10) LW test (q = 5)
Net Payout 0.44∗[ 0.29, 0.59] 0.48∗[ 0.32, 0.64] 0.44∗[ 0.25, 0.63] 0.38∗[ 0.15, 0.62]
Payout I 0.14∗[ 0.01, 0.24] 0.15∗[ 0.04, 0.27] 0.14∗[ 0.02, 0.26] 0.15∗[ 0.02, 0.27]
Payout II 0.11 [−0.01, 0.20] 0.11∗[ 0.01, 0.21] 0.10∗[ 0.00, 0.20] 0.11∗[ 0.01, 0.22]
Earning-Price 0.08 [−0.04, 0.22] 0.10 [−0.04, 0.25] 0.14 [−0.02, 0.30] 0.20∗[ 0.03, 0.37]
Dividend Yield 0.06 [−0.06, 0.15] 0.07 [−0.03, 0.17] 0.09 [−0.01, 0.20] 0.11∗[ 0.01, 0.22]
Table 3: Relation between Exchange Rate Changes and Forward Interest Rate Differentials
The table reports the estimates and 90% confidence intervals for the slope βx in the model s_{t+1/12} − s_t = β0 + βx(if^a_{t−j,t,t+1} − if^b_{t−j,t,t+1}) + εt, where st is the logarithm of the exchange rate, and if^a_{t−j,t,t+1} and if^b_{t−j,t,t+1} are continuously compounded annualized forward interest rates (domestic and foreign) set at time t − j for the period from t to t + 1. Denote Δif_{t−j,t,t+1} ≡ if^a_{t−j,t,t+1} − if^b_{t−j,t,t+1}. The observations are monthly. Time t is in years, i.e., 1/12 stands for one month. Sample: USD/GBP, USD/DEM, USD/CHF, Jan 1979 – Jul 2012. The estimates for the Q-test and the LW test are the midpoints of the confidence intervals. The estimate for the t-test is Stambaugh's (1999) corrected OLS slope. Asterisks denote significance at the 5% level for the one-sided tests and at the 10% level for the two-sided tests.
xt−1 t-test Q-test (q = all) LW test (q = 10) LW test (q = 5)
USD/GBP (LIBOR/swap rates)
Δif_{t−1,t,t+1}   0.06 [−0.04, 0.16]    0.06 [−0.04, 0.15]    0.06 [−0.06, 0.18]    0.04 [−0.13, 0.21]
Δif_{t−4,t,t+1}   0.09 [−0.11, 0.29]    0.11 [−0.09, 0.31]    0.16 [−0.10, 0.42]    0.22 [−0.09, 0.52]

USD/DEM (LIBOR/swap rates)
Δif_{t−1,t,t+1}   0.01 [−0.08, 0.09]    0.01 [−0.08, 0.10]    0.01 [−0.10, 0.11]   −0.02 [−0.14, 0.10]
Δif_{t−4,t,t+1}   0.16 [−0.01, 0.35]    0.16 [−0.01, 0.34]    0.23∗[ 0.04, 0.43]    0.29∗[ 0.07, 0.51]

USD/CHF (LIBOR/swap rates)
Δif_{t−1,t,t+1}   0.02 [−0.09, 0.09]   −0.01 [−0.10, 0.08]   −0.04 [−0.14, 0.07]   −0.07 [−0.19, 0.04]
Δif_{t−4,t,t+1}   0.13 [−0.04, 0.31]    0.13 [−0.04, 0.31]    0.16 [−0.03, 0.35]    0.21∗[ 0.00, 0.41]
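The regression underlying Table 3 can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's dataset or code: it builds the monthly log exchange-rate change and regresses it on a simulated forward interest-rate differential by ordinary least squares; the parameter values are hypothetical.

```python
# Sketch of the Table 3 regression on synthetic data:
#   s_{t+1/12} - s_t = beta0 + betax * Delta_if_{t-j,t,t+1} + eps
import numpy as np

rng = np.random.default_rng(0)
T = 400                                    # months (illustrative)
dif = rng.normal(0.0, 0.01, T)             # simulated Delta_if_{t-j,t,t+1}
beta0, betax = 0.001, 0.15                 # hypothetical true parameters
ds = beta0 + betax * dif + rng.normal(0.0, 0.03, T)  # s_{t+1/12} - s_t

# OLS estimate of (beta0, betax) via least squares
X = np.column_stack([np.ones(T), dif])
coef, *_ = np.linalg.lstsq(X, ds, rcond=None)
print(coef)  # [beta0_hat, betax_hat]
```

In the paper the t-test slope is additionally corrected for small-sample bias following Stambaugh (1999); the plain OLS slope above is only the starting point.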