
Efficient Tests for Long-Run Predictability:

Do Long-Run Relations Convey Extra Information?

Natalia Sizova∗

January 10, 2015

Abstract

Short-run and long-run relations among time series can differ. In situations in which short-run constraints and information biases obscure equilibrium relations among economic variables, estimates of the long-run relations, which are free of such contaminations, become the only basis for evaluating economic hypotheses. The common approach to estimating long-run predictability has been long-horizon regressions. However, long-horizon regressions are not designed to extract long-run information efficiently, and the lack of accuracy often outweighs their robustness to short-run noise. This study suggests two methods for replacing long-horizon regressions. The corresponding tests can be viewed as long-run versions of the Q-test by Campbell and Yogo (2006) and the nearly optimal test by Elliott, Müller, and Watson (2014). We demonstrate the usefulness of long-run information in two common empirical applications.

∗Department of Economics, Rice University, Houston, TX 77251, USA. Tel.: +1 (713) 348-5613; fax: +1 (713)348-5278. Email: [email protected].


1 Introduction

We are often charged with the task of analyzing how a dependent variable, e.g., yt, responds

to shocks in the long run. For example, we might directly ask about the predictability in the

aggregated quantity yt+1 + ... + yt+H for some large horizon H, or we might be interested in

measuring the effect on yt+H. Examples include tests for long-run monetary neutrality, e.g.,

Fisher and Seater (1993) and Newey and West (1994); tests of the links among exchange rates,

interest rates, and inflation, e.g., Meese and Rogoff (1988) and Mishkin (1990); and tests for

long-run predictability in equity returns, e.g., Fama and French (1988). Because it is the most

intuitive solution, it is not surprising that the usual approach to the task is to run regressions

of yt+1 + ...+ yt+H or yt+H on a set of explanatory variables. However, recent research indicates

that these long-horizon regressions provide biased and disappointingly inaccurate estimates. As

an illustration, consider the task of testing for predictability in stock market returns. The use

of long-horizon regressions for this application is motivated in part by the greater values of the

t-statistics for large horizons H. However, with valid confidence intervals,¹ it has been shown

that the p-values of the tests remain roughly constant and even increase with H, e.g., Boudoukh,

Richardson, and Whitelaw (2008), Hjalmarsson (2011). The results of many prior long-run

predictability papers now come into question (e.g., Valkanov, 2003). Can the long-run relations

be estimated accurately enough to convey information that is not already evident from short-run

relations? Because long-horizon regressions do not provide the answer, the solution is to consider

more efficient estimation methods.

Several significant developments related to long-run predictability testing have recently been

reported in the literature. This predictability research focuses on the following model:

$$
\begin{aligned}
y_t &= \beta_x x_{t-1} + \varepsilon_t, \\
x_t &= \rho x_{t-1} + u_t,
\end{aligned}
$$

where (εt, ut) is a sequence of i.i.d. vectors and ρ is close to one. For this model, note that the effect of xt−1 on yt+H remains substantial over many time periods H as long as βx ≠ 0. Therefore, we say that yt is predictable by xt in the long run if βx ≠ 0. The methods that have been recently developed to efficiently test the hypothesis H0 : βx = 0 include the Q-test (bias-adjusted OLS t-test) by Campbell and Yogo (2006), the nearly optimal test by Elliott, Müller, and Watson (2014), and the conditionally optimal test by Jansson and Moreira (2006).

All of these tests belong to the class of quasi-likelihood (QL)-based approaches derived under

the i.i.d. assumption on (εt, ut). However, these methods can be extended to the case of serially

correlated (εt, ut).
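As a rough illustration of this model and of the long-horizon regressions discussed above, the following Python sketch simulates the two-equation system and computes the OLS slope of the H-period sum of y on the lagged predictor. The function names, the unit shock variances, the zero initial value, and the parameter values mentioned afterwards are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def simulate_system(T, beta_x, rho, corr=0.0, seed=0):
    """Simulate y_t = beta_x*x_{t-1} + eps_t and x_t = rho*x_{t-1} + u_t for t = 1..T.

    Assumptions for illustration: i.i.d. Gaussian (u_t, eps_t) with unit
    variances and correlation `corr`, and initial value x_0 = 0.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, corr], [corr, 1.0]])
    shocks = rng.multivariate_normal(np.zeros(2), cov, size=T)
    u, eps = shocks[:, 0], shocks[:, 1]
    x = np.zeros(T + 1)                      # x[0] holds x_0
    for t in range(1, T + 1):
        x[t] = rho * x[t - 1] + u[t - 1]
    x_lag = x[:-1]                           # x_{t-1} aligned with y_t
    y = beta_x * x_lag + eps
    return x_lag, y

def long_horizon_slope(x_lag, y, H):
    """OLS slope (with intercept) of y_t + ... + y_{t+H-1} on x_{t-1}, H >= 1.

    Overlapping H-period sums are used, as is common in long-horizon regressions.
    """
    csum = np.concatenate(([0.0], np.cumsum(y)))
    y_sum = csum[H:] - csum[:-H]             # entry k is y[k] + ... + y[k+H-1]
    x_use = np.asarray(x_lag, dtype=float)[: len(y_sum)]
    xc = x_use - x_use.mean()
    return float(xc @ (y_sum - y_sum.mean()) / (xc @ xc))
```

For example, one might call `simulate_system(T=500, beta_x=0.05, rho=1 - 5/500, corr=-0.9)` and then compare `long_horizon_slope(x_lag, y, H)` for H = 1, 12, and 60. The slope alone says nothing about inference; valid confidence intervals require the corrections discussed in the text.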

¹ Valid confidence intervals account for small-sample effects due to the persistence in explanatory variables and due to large horizons.


From the long-run predictability perspective, the most interesting extension to the above

model involves εt that is not only serially correlated but is also predictable by prior values of xt,

creating an endogeneity problem. In this case, the short-run predictability dyt/dxt−1 ≠ 0 arises from both the βx xt−1 and εt terms. In contrast, the long-run predictability dyt+H/dxt−1 ≠ 0 for large H is due only to βx xt−1. Therefore, in a model with a predictable εt, the correlation between

yt+1 and xt could be of a different magnitude, or even of a different sign, from the correlation

between yt+1 + ...+ yt+H and xt. For example, the presence of market momentum often distorts

the risk-return relationship between the stock returns and measures of the risk (see Hong and

Stein, 1999). The presence of error in the estimates of the payouts to shareholders distorts

the relationship between stock returns and payout ratios (see Boudoukh, Michaely, Richardson,

and Roberts, 2007). In the presence of a distortional monetary policy, the carry trade results

in deviations in the exchange rate changes from the interest rate differentials (see Boudoukh,

Richardson, and Whitelaw, 2013).

When the shock εt is predicted by past values of xt, QL-based methods can be corrected.

However, such corrections necessarily embed an estimation of the predictability in the shock

εt. The resulting loss of efficiency can be substantial. Consider, for example, the case in which

βx and cov(εt, xt−1) are small and have opposite signs, so that dyt/dxt−1 ≈ 0. In this case,

the task of estimating the predictability of εt is on the same level of complexity as estimating

the predictability of yt itself. In contrast, this paper considers methods that are robust to the

short-run endogeneity. One example of such a method is long-horizon regressions with large H.

However, we offer a significantly more accurate alternative.

In this paper, we propose two methods that are designed to extract the maximum amount

of information about the long run and do not rely on the short-run information. Both of these

methods are motivated by the Neyman-Pearson lemma. Under a set of conditions that includes

Gaussianity, they yield the most powerful tests and achieve the same accuracy asymptotically

under more general assumptions. As a result, we obtain sizable efficiency gains in comparison

with long-horizon regressions.

The first method that we propose is referred to as the Local Whittle (LW) test. This test is

based on the maximization of the long-run portion of the Gaussian likelihood for each ρ. The

resulting procedure resembles, in many respects, the Q-test, but it is immune to violations of the

condition E(εt|xt−1, ...) = 0. The only assumption that is required for the validity of the LW test

is that the long-run behavior of (εt, ut) is close to that of independent observations (formally, see

Assumption A).

We examine the asymptotic behavior of the LW test under the assumption that ρ is local to one, i.e., xt is nearly integrated; thus, we follow Campbell and Yogo (2006), Jansson and Moreira (2006), Elliott, Müller, and Watson (2014), and many others. In these studies, it was also reasonable to allow small values of ρ by, for example, setting a threshold below which the standard methods are applied (see Elliott, Müller, and Watson, 2014). In the case of the


long-run predictability, however, non-trivial results can arise only if xt is sufficiently persistent.

Therefore, the local-to-unity assumption is crucial. Note that when ρ is near one, the case of βx ≠ β0 corresponds to the presence of near-cointegration between the series of yt − β0xt−1 and xt−1. Therefore, it is not surprising that the LW test is found to be linked to the frequency-domain least-squares (FDLS) estimator by Robinson (1994).² The FDLS was previously applied

to measure the (fractional) cointegration between (fractionally) integrated series (see Marinucci

and Robinson, 2003). It appears that the same estimator can be applied to nearly integrated

series. We establish, however, that the FDLS estimator is asymptotically biased, while the

estimator that is the basis of the LW test can be viewed as a bias-adjusted alternative to the

FDLS.

The asymptotic properties of the LW test are then compared with other tests for a set of

model parameters. As a benchmark, we use the performance of the Q-test. We find that,

under assumptions that are favorable to the Q-test, the LW test’s performance is close to this

benchmark. Moreover, when the long-run and short-run dynamics differ, the LW test outperforms

the Q-test. The performance of the long-horizon regressions is unimpressive in both cases.

The second method that we consider is the nearly optimal long-run predictability test. The

first test, LW, is based on the “long-run” likelihood ratio with known ρ, which is then replaced by

conservative estimates (adjusted Bonferroni bounds). The nearly optimal long-run predictability

test is also based on the “long-run” likelihood ratio but inherently treats ρ as a nuisance parameter. To incorporate the nuisance parameter within the Neyman-Pearson lemma, the likelihood under H0 is replaced by an average over the possible values of ρ with the least favorable distribution of the weights (see Elliott, Müller, and Watson, 2014). As expected, the nearly optimal

test is uniformly better than the LW test asymptotically.

In the empirical part of this paper, we evaluate the performance of the long-run predictability

tests for two long-standing empirical questions: the predictability of stock returns by payout

ratios and the validity of the uncovered interest rate parity. We suggest a new version of classical

long-run predictability tables that were used to present results over a range of increasing time

horizons. The alternative long-run predictability tables are obtained by reducing the number of

frequencies that are employed in the estimation. We demonstrate that the long-run estimates

do provide statistically different information from the short-run estimates. Although similar

(and substantially more dramatic) results were previously found with long-horizon regressions,

later these results were challenged and acknowledged to be misleading due to unaccounted for

small-sample effects, e.g., Richardson and Stock (1989), Boudoukh, Richardson, and Whitelaw

(2008).

This study is organized as follows. Section 2 discusses the motivation for the LW test under the Gaussian assumption. Section 3 derives asymptotic distributions under general conditions, and Section 4 describes the construction of the LW test and compares the Q-test by Campbell and Yogo (2006), the LW test, and the t-tests in simple and long-horizon regressions based on the asymptotic local power functions. Section 5 presents the nearly optimal long-run predictability test and compares its asymptotic performance with the LW test. Section 6 reports the results of long-run predictability tests in two empirical applications. Section 7 presents the conclusions.

² Similar estimators have been considered in the unit-root literature. For example, Corbae, Ouliaris, and Phillips (2002) consider band spectral regressions. This paper naturally extends their results for spectral regressions at zero frequency to the case with ρ ≠ 1 and endogenous xt.

2 The Uniformly Most Powerful Test in the Gaussian Case When ρ is Known

Let yt denote the variable that we are forecasting, and let xt denote the explanatory variable.

We observe a bi-variate process (xt−1, yt) for t = 1, ..., T , whose dynamics can be represented by

the following system:

$$
\begin{aligned}
y_t &= \mu_y + \beta_x (x_{t-1} - \mu_x) + \varepsilon_t, \\
x_t &= \mu_x + \rho (x_{t-1} - \mu_x) + u_t.
\end{aligned}
\tag{1}
$$

Suppose that xt is very persistent and can be modeled as nearly integrated, i.e., ρ = ρT = 1 + c/T (see Elliott and Stock, 1994). Some constructs in this section require c ≠ 0, which can be either positive or negative. The case of c = 0 is omitted for brevity. Furthermore, the asymptotic results in the next section do not require c ≠ 0. As the initial condition, assume that x0 has a

distribution that does not depend on T .

Suppose that the random and non-degenerate³ mean-zero vector of innovations et = (ut, εt)

satisfies the following conditions from Phillips (1988, p. 1023):

Assumption A. For some γ > 2 and δ > 0, $E\|e_t\|^{\gamma+\delta} < \infty$, and the strong mixing coefficients $\alpha_i$ are such that $\sum_{i=1}^{\infty} \alpha_i^{1-2/\gamma} < \infty$.

These conditions allow for heteroscedasticity and dependence over time. Under these condi-

tions, the functional central limit theorem holds: for the univariate case, see Herrndorf (1984),

and for the vector form, see Phillips and Durlauf (1986) and Phillips (1987).

The goal is to test a null hypothesis H0 : βx = β0. We start with a simple alternative

hypothesis H1 : βx = β1. To obtain optimality results, we rely on the following normality

assumption, which is relaxed in the derivation of asymptotic distributions:

Assumption B. Process et = (ut, εt), t = 1, ..., T is Gaussian.

³ For asymptotic results, the relevant definition of nondegeneracy is that $s^{u,\varepsilon}(0)$ is positive-definite, where $s^{u,\varepsilon}(\omega)$ is the spectrum of et. For optimality results, we require $s^{u,\varepsilon}(\omega)$ to be positive-definite with the determinant greater than m > 0 for all −π ≤ ω ≤ π, to invoke the results by Dzhaparidze (1986).


One can directly specify the most powerful (MP) test when all of the parameters in the model

are known except for βx. In this case, the MP test readily follows from the Neyman-Pearson

lemma (see Lehmann and Romano, 2005, Theorem 3.2.1) and is the likelihood ratio test. Note that the assumption of the known nuisance parameters is especially restrictive with regard to the parameters that cannot be consistently estimated, such as ρT.⁴ This assumption is relaxed when

constructing confidence intervals, but it is required for the derivation of the optimality results in

this section.

Campbell and Yogo (2006) derive their Q-test, which is a more powerful alternative to the

OLS t-test, under the assumption that (εt, ut) are i.i.d. normal and ρ is known. We, however,

are interested in a more general case, in which εt may depend on past values of ut. Therefore,

we allow for endogeneity, which leads to the differences in the long-run and short-run dynamics.

Our results can also be viewed as an extension to the case in which only the long-run dynamic

parameters are known.

We start from the decomposition that divides the likelihood function into the long-run and

short-run parts: this decomposition can be implemented in the frequency domain by using an

asymptotic equivalent to the Gaussian likelihood, which is known as the Whittle approximation

(see Dzhaparidze, 1986):

$$
\log L_T = -T \log 2\pi - \frac{1}{2} \sum_{j=1}^{T} \Big( \log\det\big(s^{x,y}(\omega_j)\big) + \mathrm{tr}\big(s^{x,y}(\omega_j)^{-1} I^{x,y}_{T}(\omega_j)\big) \Big),
$$

where $s^{x,y}(\omega)$ is the spectrum for the vector (xt−1, yt) and $I^{x,y}_{T}(\omega)$ is the corresponding sample periodogram, i.e., $I^{x,y}_{T}(\omega) = d(\omega) d^{*}(\omega)$ with $d(\omega) \equiv \frac{1}{\sqrt{2\pi T}} \sum_{t=1}^{T} (x_{t-1} - \mu_x,\, y_t - \mu_y)'\, e^{-i\omega t}$, a Fourier transformation of (xt−1 − µx, yt − µy), t = 1, .., T, and $d^{*}(\omega)$, its conjugate transpose. The spectrum and the periodogram are both 2 × 2 matrices and are calculated at the natural frequencies $\omega_j = 2\pi j/T$, j = 1, 2, .., T. The operators det(.) and tr(.) denote determinants and traces of matrices, respectively.
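The periodogram matrix defined above can be computed with a fast Fourier transform. The sketch below is one possible NumPy implementation under stated assumptions: the helper names `dft_natural` and `periodogram_matrix` are hypothetical, the means are passed in as known numbers, and frequencies are indexed j = 0, .., T − 1 (index 0 plays the role of j = T by 2π-periodicity).

```python
import numpy as np

def dft_natural(z):
    """d(w_j) = (2*pi*T)^(-1/2) * sum_{t=1}^T z_t * exp(-i*w_j*t), w_j = 2*pi*j/T, j = 0..T-1."""
    z = np.asarray(z, dtype=float)
    T = len(z)
    w = 2 * np.pi * np.arange(T) / T
    # np.fft.fft indexes time from 0; the factor exp(-i*w_j) shifts the index to t = 1..T.
    return np.fft.fft(z) * np.exp(-1j * w) / np.sqrt(2 * np.pi * T)

def periodogram_matrix(x_lag, y, mu_x=0.0, mu_y=0.0):
    """Return I with shape (T, 2, 2), where I[j] = d(w_j) d(w_j)^* for the vector (x_{t-1}, y_t).

    I[j, 0, 0] is the xx element, I[j, 1, 0] the yx element, I[j, 0, 1] the xy element.
    """
    d = np.stack([dft_natural(np.asarray(x_lag, dtype=float) - mu_x),
                  dft_natural(np.asarray(y, dtype=float) - mu_y)], axis=1)  # shape (T, 2)
    return d[:, :, None] * np.conj(d[:, None, :])
```

Because the periodogram is an outer product of the transform with its conjugate, the phase shift in `dft_natural` cancels in `I`; it matters only when transforms of differently dated series are combined.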

Dzhaparidze (1986) proves the asymptotic equivalence of the Gaussian log-likelihood and

the Whittle approximation for stationary processes under conditions in which the sample size T

greatly exceeds the decay time for the autocorrelations. Because the nearly integrated processes

are close to non-stationary, they clearly violate this assumption. In Appendix B, we analyze the

difference between the true likelihood and the Whittle approximation for our model. First, we

note that we are interested only in the conditional distribution of $\{y_t\}_{t=1}^{T}$ given $\{x_{t-1}\}_{t=1}^{T}$, which

is also Gaussian under the Whittle approximation. Second, we compare the conditional variances

under the two log-likelihoods and find that their difference is asymptotically small. Finally, we

compare the conditional expectations and find that their difference increases with T and results

in a non-vanishing divergence between the likelihoods.

⁴ Formally, it is the parameter c in ρT = 1 + c/T that cannot be consistently estimated.


However, Dzhaparidze’s (1986) results can be appropriately extended to the case with ρT =

1 + c/T , as shown in Appendix B. To obtain a likelihood with an asymptotically equivalent

conditional part, we suggest adjusting the series Y = (y1 − µy, .., yT − µy)′ in the Whittle approximation as follows: $\tilde{Y} = Y + \mathrm{Re}(\delta_y)(x_0 - x_T)$, where δy is a complex-valued vector of size T × 1. For example, for the case with i.i.d. (ut, εt), the correction includes δy = (0, ..., cov(εt, ut)/var(ut))′.

If (ut, εt) are not i.i.d., then the correction term δy has a more complex form (see Appendix B)

but does not depend on βx and is, therefore, known by our assumptions. Furthermore, we will

conveniently avoid computing δy and replace it with a simpler vector that depends only on the

long-run variance of et and yields the same asymptotic distributions for the estimators and the

tests suggested in this paper.

Moving forward, we replace Y with $\tilde{Y}$ in the Whittle likelihood. We also change the summation limits from j = 1, .., T to j = ⌊T/2⌋ + 1 − T, .., ⌊T/2⌋, which will have no effect, because $s^{x,y}(\omega)$ is 2π-periodic. The Whittle approximation is, therefore, redefined as follows:

$$
\log L_T = -T \log 2\pi - \frac{1}{2} \sum_{j=\lfloor T/2 \rfloor + 1 - T}^{\lfloor T/2 \rfloor} \Big( \log\det\big(s^{x,y}(\omega_j)\big) + \mathrm{tr}\big(s^{x,y}(\omega_j)^{-1} I^{x,y}_{T}(\omega_j)\big) \Big).
$$

Because we are interested in an estimation that is immune to the short-run endogeneity, we now

assume that the relations between $\{y_t\}_{t=1}^{T}$ and $\{x_{t-1}\}_{t=1}^{T}$ are known only at low frequencies. In

other words, we know the spectra sx,y(ωj) up to βx for j = 0,±1, ...,±q, and no information is

available about the spectral densities sx,y(ωj) for j = ±(q + 1), ...,±⌊T/2⌋. Therefore, the MP

test will depend only on the first terms of log LT that correspond to cycles with periodicities

greater than T/q:

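$$
-\frac{1}{2} \sum_{j=-q}^{q} \log\det\big(s^{x,y}(\omega_j)\big) - \frac{1}{2} \sum_{j=-q}^{q} \mathrm{tr}\big(s^{x,y}(\omega_j)^{-1} I^{x,y}_{T}(\omega_j)\big). \tag{2}
$$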

The assumption that q is kept constant agrees with the long-horizon regression literature in which

H (the horizon) is kept constant in proportion to the sample size. The goal of such assumptions

is to correctly capture the lack of observations in small samples. Alternatively, one can allow q to

increase to ∞ in such a way that 1/q + q/T → 0. However, such an assumption does not reflect

the small magnitudes of the q values that appear to be necessary to determine the long-run

dynamics in empirical applications, see Section 6.

For model (1), the spectrum of the vector (xt−1, yt) when calculated at frequency ω equals

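$$
s^{x,y}(\omega) =
\begin{pmatrix}
\dfrac{s^{u,\varepsilon}_{uu}(\omega)}{|1-\rho e^{-i\omega}|^{2}} &
\beta_x \dfrac{s^{u,\varepsilon}_{uu}(\omega)}{|1-\rho e^{-i\omega}|^{2}} + \dfrac{s^{u,\varepsilon}_{u\varepsilon}(\omega)\, e^{-i\omega}}{1-\rho e^{-i\omega}} \\[2ex]
\beta_x \dfrac{s^{u,\varepsilon}_{uu}(\omega)}{|1-\rho e^{-i\omega}|^{2}} + \dfrac{s^{u,\varepsilon}_{\varepsilon u}(\omega)\, e^{+i\omega}}{1-\rho e^{+i\omega}} &
\beta_x^{2} \dfrac{s^{u,\varepsilon}_{uu}(\omega)}{|1-\rho e^{-i\omega}|^{2}} + 2\beta_x \mathrm{Re}\!\left( \dfrac{e^{-i\omega}\, s^{u,\varepsilon}_{u\varepsilon}(\omega)}{1-\rho e^{-i\omega}} \right) + s^{u,\varepsilon}_{\varepsilon\varepsilon}(\omega)
\end{pmatrix},
\tag{3}
$$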


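where $s^{u,\varepsilon}(\omega)$ is the spectrum of et = (ut, εt) with the natural partition $s^{u,\varepsilon}_{uu}(\omega)$, $s^{u,\varepsilon}_{u\varepsilon}(\omega)$, $s^{u,\varepsilon}_{\varepsilon u}(\omega)$, and $s^{u,\varepsilon}_{\varepsilon\varepsilon}(\omega)$. Similar notation will be used for the remaining spectra and periodogram matrices in this paper.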

We next perform a series of modifications to the likelihood (2) that remove its dependence on the autocovariance structure of (ut, εt). Proposition 1 justifies these transitions. The first modification is to the spectrum $s^{x,y}(\omega)$. Note that $s^{x,y}(\omega)$ in (3) depends on the autocovariance of the shocks et = (ut, εt) through the spectrum of the shocks $s^{u,\varepsilon}(\omega)$. We replace $s^{u,\varepsilon}(\omega)$ when ω ≈ 0 with its value at zero, $s^{u,\varepsilon}(0)$, and denote the resulting spectrum of (xt−1, yt) by $\bar{s}^{x,y}(\omega)$. The second modification is to the series $\tilde{Y}$. Note that the $\tilde{Y}$-series correction δy also depends on the autocovariance of the shocks (see Appendix B). We suggest replacing δy with another vector $\bar{\delta}_y = \big(0, ..., 0, s^{u,\varepsilon}_{u\varepsilon}(0)/s^{u,\varepsilon}_{uu}(0)\big)'$ that corresponds to the adjusted series $\bar{Y} = Y + \bar{\delta}_y (x_0 - x_T)$.

To compare the likelihoods for these two modifications we note that the determinant $\det\big(s^{x,y}(\omega_j)\big) = \det\big(s^{u,\varepsilon}(\omega_j)\big)\, |1 - \rho e^{-i\omega_j}|^{-2}$ in (2) does not depend on βx. Therefore, the test for βx will depend only on the remaining term $F = \sum_{j=-q}^{q} \mathrm{tr}\big(s^{x,y}(\omega_j)^{-1} I^{x,y}_{T}(\omega_j)\big)$. Similarly, we define the component $\tilde{F}$ that corresponds to the first modification with the spectrum $\bar{s}^{x,y}(\omega)$ and the component $\bar{F}$ that corresponds to the second modification with the periodogram $\bar{I}^{x,y}_{T}(\omega_j)$ computed from the adjusted series $\bar{Y}$. This final $\bar{F}$ does not depend on the autocovariances of the shocks.

The next proposition states that $F$, $\tilde{F}$, and $\bar{F}$ are all asymptotically equivalent when evaluated in the vicinity of the true parameter βx.

Proposition 1. Under Assumption A, if matrix $s^{u,\varepsilon}(0)$ is positive-definite, then the components $F$, $\tilde{F}$, and $\bar{F}$, evaluated in the O(1/T)-neighborhoods of the true βx, are all (exactly) $O_p(1)$, and the differences have stochastic orders of $O_p(1/T)$.

Under the likelihood implied by $\bar{F}$, the Neyman-Pearson lemma yields the MP test of H0 : βx = β0 against H1 : βx = β1, which rejects for small values of $\bar{F}(\beta_1) - \bar{F}(\beta_0)$. This MP test under $\bar{F}$ is asymptotically equivalent to the MP test based on the long-run component of the original Gaussian likelihood in the following sense. Consider the O(1/T)-neighborhoods of the true parameter β0 and re-parameterize βx = β0 + b/T. If we reverse the corrections that correspond to transitions from the Gaussian likelihood to the Whittle approximation and from $F$ to $\bar{F}$,⁵ then the MP rejection rule for this new likelihood is asymptotically equivalent to the MP rejection rule based on $\bar{F}$. The two rules take quadratic forms in b that have asymptotically equivalent coefficients.

⁵ Note that we do not reverse the transition from the Whittle approximation with all frequencies to the Whittle approximation with the first q frequencies. Therefore, some test power is lost due to the removal of the short-run information.

We next specify the uniformly most powerful (UMP) test of H0 : βx = β0 against one-sided alternatives under $\bar{F}$. Let $s_{\Delta\beta}$ be a constant equal to 1 if H1 : βx > β0, and equal to −1 if H1 : βx < β0. Note that the part of $\bar{F}$ that depends on βx is a quadratic function proportional to

$$
\beta_x^{2}\, s^{u,\varepsilon}_{uu}(0) \sum_{j=-q}^{q} I^{x,y}_{T,xx}(\omega_j) + 2\beta_x \sum_{j=-q}^{q} \Big( (e^{-i\omega_j} - \rho)\, s^{u,\varepsilon}_{u\varepsilon}(0)\, I^{x,y}_{T,xx}(\omega_j) - s^{u,\varepsilon}_{uu}(0)\, \bar{I}^{x,y}_{T,xy}(\omega_j) \Big).
$$

As discussed in Jansson and Moreira (2006) and in Campbell and Yogo (2006), the derivation of the UMP test is complicated by the form of this statistic, which presents a weighted sum of two sufficient statistics. However, note that the distribution of $I^{x,y}_{T,xx}(\omega)$ does not depend on βx; therefore, by the conditionality argument (see Basu (1977)), the UMP test can be derived based on the conditional distribution of $\bar{I}^{x,y}_{T,xy}$ given $I^{x,y}_{T,xx}$ (see Jansson and Moreira, 2006). The UMP test, therefore, rejects for small values of

$$
s_{\Delta\beta} \sum_{j=-q}^{q} \Big( (e^{-i\omega_j} - \rho)\, s^{u,\varepsilon}_{u\varepsilon}(0)\, I^{x,y}_{T,xx}(\omega_j) - s^{u,\varepsilon}_{uu}(0)\, \bar{I}^{x,y}_{T,xy}(\omega_j) \Big),
$$

or, equivalently, for large values of

$$
s_{\Delta\beta}\, \frac{\sum_{j=-q}^{q} \Big( \bar{I}^{x,y}_{T,xy}(\omega_j) - \beta_0 I^{x,y}_{T,xx}(\omega_j) - (e^{-i\omega_j} - \rho)\, \dfrac{s^{u,\varepsilon}_{u\varepsilon}(0)}{s^{u,\varepsilon}_{uu}(0)}\, I^{x,y}_{T,xx}(\omega_j) \Big)}{\sqrt{\sum_{j=-q}^{q} I^{x,y}_{T,xx}(\omega_j)}}.
$$

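Finally, substituting the definition of $\bar{Y}$, we obtain the rejection rule,

$$
s_{\Delta\beta}\, \frac{\sum_{j=-q}^{q} \Big( I^{x,y}_{T,xy}(\omega_j) - \beta_0 I^{x,y}_{T,xx}(\omega_j) - \big(I_{T,xx+}(\omega_j) - \rho I^{x,y}_{T,xx}(\omega_j)\big) \dfrac{s^{u,\varepsilon}_{u\varepsilon}(0)}{s^{u,\varepsilon}_{uu}(0)} \Big)}{\sqrt{\sum_{j=-q}^{q} I^{x,y}_{T,xx}(\omega_j)}} > K, \tag{4}
$$

where K is a constant defined by the significance level of the test and the cross-periodogram $I_{T,xx+}(\omega)$ is $(2\pi T)^{-1} \Big(\sum_{t=1}^{T} (x_{t-1} - \mu_x) \exp(-i\omega t)\Big) \times \Big(\sum_{t=1}^{T} (x_t - \mu_x) \exp(i\omega t)\Big)$.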

There are obvious similarities between the above test and the Q-test developed by Campbell and Yogo (2006). The Q-test is proportional to the sample covariance of the processes xt−1 and $v_t = y_t - \beta_0 x_{t-1} - \frac{s^{u,\varepsilon}_{u\varepsilon}(0)}{s^{u,\varepsilon}_{uu}(0)} (x_t - \rho x_{t-1})$ divided by the sample standard deviation of xt−1. The rejection rule in (4) can be expressed as follows: $s_{\Delta\beta} \Big(\sum_{j=-q}^{q} I^{x,v}_{T,vx}(\omega_j)\Big) \Big(\sum_{j=-q}^{q} I^{x,v}_{T,xx}(\omega_j)\Big)^{-1/2} > K$. Because the periodograms at low frequencies measure the long-run variances, the test proposed here is proportional to the sample long-run covariance of the processes xt−1 and vt divided by the long-run sample standard deviation of xt−1.

The maximum likelihood estimator (MLE) that corresponds to the likelihood with the principal component $\bar{F}$ equals

$$
\hat{\beta}_{x,(\mu_y)} = \frac{\sum_{j=-q}^{q} \Big( I^{x,y}_{T,yx}(\omega_j) - \big(I_{T,xx+}(\omega_j) - \rho I^{x,y}_{T,xx}(\omega_j)\big) \dfrac{r \psi_{\varepsilon\varepsilon}^{1/2}}{\psi_{uu}^{1/2}} \Big)}{\sum_{j=-q}^{q} I^{x,y}_{T,xx}(\omega_j)},
$$

where $\psi_{uu} = 2\pi s^{u,\varepsilon}_{uu}(0)$ is the long-run variance of ut, $\psi_{\varepsilon\varepsilon} = 2\pi s^{u,\varepsilon}_{\varepsilon\varepsilon}(0)$ is the long-run variance of εt, and $r = s^{u,\varepsilon}_{u\varepsilon}(0) / \sqrt{s^{u,\varepsilon}_{uu}(0)\, s^{u,\varepsilon}_{\varepsilon\varepsilon}(0)}$ is the long-run correlation between these two shocks. The corresponding long-run conditional variance of εt is, therefore, $\psi_{\varepsilon\varepsilon|u} = \psi_{\varepsilon\varepsilon}(1 - r^{2})$.
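The long-run quantities just defined are simple functions of the 2 × 2 long-run covariance matrix of (ut, εt), which equals 2π times the spectrum of the shocks at frequency zero. A minimal sketch, assuming this matrix is already available (the function name and the dictionary keys are hypothetical):

```python
import numpy as np

def long_run_quantities(omega):
    """Map a 2x2 long-run covariance matrix of (u_t, eps_t) to the quantities used in the text.

    omega[0, 0] = psi_uu, omega[1, 1] = psi_ee, omega[0, 1] = long-run covariance of u and eps.
    """
    omega = np.asarray(omega, dtype=float)
    psi_uu, psi_ee, psi_ue = omega[0, 0], omega[1, 1], omega[0, 1]
    r = psi_ue / np.sqrt(psi_uu * psi_ee)      # long-run correlation
    psi_ee_given_u = psi_ee * (1.0 - r ** 2)   # long-run conditional variance of eps_t
    ratio = r * np.sqrt(psi_ee / psi_uu)       # equals s_ue(0) / s_uu(0), the bias-correction weight
    return {"psi_uu": psi_uu, "psi_ee": psi_ee, "r": r,
            "psi_ee_given_u": psi_ee_given_u, "ratio": ratio}
```

The `ratio` entry is the weight $r\psi_{\varepsilon\varepsilon}^{1/2}/\psi_{uu}^{1/2}$ that multiplies the correction term in the estimator above. In practice the long-run covariance matrix would have to be estimated, which this sketch does not attempt.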

Finally, we account for the fact that µy is unknown. It appears that to account for the unknown mean of the Y series, it is necessary only to remove the zero frequencies from the likelihoods. The argument proceeds as follows. As in Jansson and Moreira (2006), we note that the testing problem for βx is invariant under the location transformation of Y. Therefore, we consider the conditional likelihood of the maximal invariant under this transformation, which is ∆Y = (y2 − y1, y3 − y1, ..., yT − y1). The likelihood of this invariant can be obtained by replacing µy with $\hat{\mu}_y$ that maximizes the likelihood of all observations: $\hat{\mu}_y = \frac{1}{T} \sum_{t=1}^{T} y_t - \frac{s^{x,y}_{yx}(0)}{s^{x,y}_{xx}(0)} \Big( \frac{1}{T} \sum_{t=1}^{T} x_{t-1} - \mu_x \Big)$. When we substitute this estimator into $\log L_T$, the only term that is affected is the one in which ωj = ω0 = 0. For this term, the joint periodogram of X and Y becomes the periodogram of X and $\frac{s^{x,y}_{yx}(0)}{s^{x,y}_{xx}(0)} X$. Importantly, the corresponding $\mathrm{tr}\big(s^{x,y}(0)^{-1} I^{x,y}_{T}(0)\big)$ becomes $I^{x,y}_{T,xx}(0) / s^{x,y}_{xx}(0)$ and, therefore, does not depend on βx, i.e., will not appear in the tests.

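Therefore, to account for µy being unknown, the only necessary modification is to remove the term with j = 0. Thus, the UMP test involves the ratio $(\hat{\beta}_x - \beta_0) / \big(\sum_{j=\pm 1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)\big)^{-1/2}$, where $\hat{\beta}_x$ is the MLE for the concentrated likelihood,

$$
\hat{\beta}_x = \frac{\sum_{j=\pm 1,..,\pm q} \Big( I^{x,y}_{T,yx}(\omega_j) - \big(I_{T,xx+}(\omega_j) - \rho I^{x,y}_{T,xx}(\omega_j)\big) \dfrac{r \psi_{\varepsilon\varepsilon}^{1/2}}{\psi_{uu}^{1/2}} \Big)}{\sum_{j=\pm 1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)}.
$$

We will refer to this statistic as the Local Whittle (LW) estimator.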

Note that the first component of this estimator is the FDLS estimator,

$$
\frac{\sum_{j=\pm 1,..,\pm q} I^{x,y}_{T,yx}(\omega_j)}{\sum_{j=\pm 1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)},
$$

by Robinson (1994). The FDLS is the estimator of the co-movement between xt−1 and yt at low frequencies. For a fractionally integrated xt, the FDLS consistently estimates the slope βx. The same result holds for nearly integrated processes. The role of the second component, $-\Big(\sum_{j=\pm 1,..,\pm q} \big(I_{T,xx+}(\omega_j) - \rho I^{x,y}_{T,xx}(\omega_j)\big) \frac{r \psi_{\varepsilon\varepsilon}^{1/2}}{\psi_{uu}^{1/2}}\Big) \Big(\sum_{j=\pm 1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)\Big)^{-1}$, is to adjust for the asymptotic bias. In other words, the LW estimator is a bias-adjusted version of the FDLS, akin to the estimator in the Q-test being a bias-adjusted version of the OLS.
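The two estimators can be sketched in a few lines of NumPy. The code below is only an illustration under stated assumptions: ρ and the weight $r\psi_{\varepsilon\varepsilon}^{1/2}/\psi_{uu}^{1/2}$ (argument `ratio`) are treated as known, the function names are hypothetical, and q must be smaller than T/2. No demeaning is needed because constants have a zero Fourier transform at the nonzero natural frequencies.

```python
import numpy as np

def _dft(z):
    """(2*pi*T)^(-1/2) * sum_{t=1}^T z_t exp(-i*w_j*t) at w_j = 2*pi*j/T, j = 0..T-1."""
    T = len(z)
    w = 2 * np.pi * np.arange(T) / T
    return np.fft.fft(z) * np.exp(-1j * w) / np.sqrt(2 * np.pi * T)

def fdls_and_lw(x, y, q, rho, ratio):
    """Return (FDLS, LW) slope estimates from frequencies j = ±1, .., ±q.

    x has length T+1 (x_0, .., x_T) and y has length T (y_1, .., y_T).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d_lag, d_lead, d_y = _dft(x[:-1]), _dft(x[1:]), _dft(y)
    idx = np.r_[1:q + 1, -q:0]                      # j = 1..q and j = -q..-1 (negative indices wrap)
    s_xx = np.sum(np.abs(d_lag[idx]) ** 2)          # sum of I_xx
    # Sums over symmetric frequencies are real, so only the real part is kept.
    s_yx = np.sum(d_y[idx] * np.conj(d_lag[idx])).real          # sum of I_yx
    s_xx_plus = np.sum(d_lag[idx] * np.conj(d_lead[idx])).real  # sum of I_xx+
    fdls = s_yx / s_xx
    lw = (s_yx - (s_xx_plus - rho * s_xx) * ratio) / s_xx
    return fdls, lw
```

With `ratio = 0` the two estimates coincide, which matches the description of the LW estimator as the FDLS plus a bias adjustment. Feasible versions with unknown ρ are the subject of Section 4 and are not covered by this sketch.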

3 Asymptotic Properties Under the General Assumptions

The derivation of the asymptotic distributions is based on the results listed in Appendix A that require Assumption A (and do not require Assumption B): the partial sums $T^{-1/2} \sum_{t=1}^{\lfloor \tau T \rfloor} u_t$, $T^{-1/2} \sum_{t=1}^{\lfloor \tau T \rfloor} \varepsilon_t$, and $T^{-1/2} x_{\lfloor \tau T \rfloor}$ converge jointly to the processes $\sqrt{\psi_{uu}}\, W^{u}(\tau)$, $\sqrt{\psi_{\varepsilon\varepsilon}}\, W^{\varepsilon}(\tau)$, and $\sqrt{\psi_{uu}}\, J_c(\tau)$, where the vector process $\big(\sqrt{\psi_{uu}}\, W^{u}(\tau), \sqrt{\psi_{\varepsilon\varepsilon}}\, W^{\varepsilon}(\tau)\big)'$ is a Brownian motion with the variance Ω equal to the long-run variance of the vector et, i.e., $\Omega = E e_t e_t' + \sum_{j=-\infty,\, j \neq 0}^{+\infty} E e_t e_{t-j}'$. Process $J_c(\tau)$ is the Ornstein-Uhlenbeck (O-U) process, with the dynamics $dJ_c(\tau) = c J_c(\tau)\, d\tau + dW^{u}(\tau)$ starting from zero. Theorem 1 derives the asymptotic distributions of the two frequency-based estimators defined in the previous section, namely, the LW and FDLS estimators.
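The limiting objects in this section can be approximated by simulation. The following sketch discretizes the O-U process with an Euler scheme and evaluates the low-frequency functional $\sum_{j=\pm 1,..,\pm q} |\int_0^1 J_c(\tau) e^{-i2\pi j\tau} d\tau|^2$ by a Riemann sum; the last function draws from a mixed-normal limit of the form $\sqrt{\psi_{\varepsilon\varepsilon|u}/\psi_{uu}}\, Z$ divided by the square root of that functional. All function names, the grid size, and the number of draws are assumptions made for illustration.

```python
import numpy as np

def simulate_ou(c, n_steps, rng):
    """Euler scheme for dJ = c*J dtau + dW on [0, 1] with J(0) = 0; returns J at k/n_steps, k = 0..n_steps."""
    dt = 1.0 / n_steps
    dW = rng.standard_normal(n_steps) * np.sqrt(dt)
    J = np.zeros(n_steps + 1)
    for k in range(n_steps):
        J[k + 1] = J[k] + c * J[k] * dt + dW[k]
    return J

def low_freq_energy(J, q):
    """Left-endpoint Riemann approximation of sum over j = ±1..±q of |∫ J(τ) exp(-i 2π j τ) dτ|^2."""
    n = len(J) - 1
    tau = np.arange(n) / n
    js = np.r_[1:q + 1, -q:0]
    coeffs = np.array([np.mean(J[:-1] * np.exp(-2j * np.pi * j * tau)) for j in js])
    return float(np.sum(np.abs(coeffs) ** 2))

def draw_lw_limit(c, q, psi_ratio, n_draws=1000, n_steps=500, seed=0):
    """Monte Carlo draws of sqrt(psi_ratio) * Z / sqrt(energy), with Z standard normal, independent of J."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    for i in range(n_draws):
        J = simulate_ou(c, n_steps, rng)
        Z = rng.standard_normal()
        draws[i] = np.sqrt(psi_ratio) * Z / np.sqrt(low_freq_energy(J, q))
    return draws
```

Here `psi_ratio` stands for $\psi_{\varepsilon\varepsilon|u}/\psi_{uu}$. Quantiles of the simulated draws give a rough picture of how the dispersion of the scaled estimation error changes with c and q.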

Theorem 1. Let vector (xt−1, yt) follow the dynamics defined in system (1), with ρT = 1+ c/T ;

the initial x0 is a random variable whose distribution does not depend on T , and shocks et =

(ut, εt)′ satisfy Assumption A. Define two estimators:

βFDLSx =

∑j=±1,..,±q I

x,yT,yx(ωj)∑

j=±1,..,±q Ix,yT,xx(ωj)

(5)

and

βx =

∑j=±1,..,±q

(Ix,yT,yx(ωj)− (IT,xx+(ωj)− ρT IT,xx(ωj))

rψ1/2εε

ψ1/2uu

)

∑j=±1,..,±q I

x,yT,xx(ωj)

. (6)

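For a fixed $q \geq 1$, plim $\hat\beta_x = \beta_x$ and plim $\hat\beta^{FDLS}_x = \beta_x$ and, as $T \to \infty$, the following weak convergence results hold:

$$T(\hat\beta^{FDLS}_x - \beta_x) \Rightarrow \frac{r\psi_{\varepsilon\varepsilon}^{1/2}}{\psi_{uu}^{1/2}}(c + \delta_q) + \sqrt{\frac{\psi_{\varepsilon\varepsilon|u}}{\psi_{uu}}}\,\frac{Z}{\sqrt{\sum_{j=\pm1,..,\pm q}\left|\int_{\tau=0}^{1}J_c(\tau)e^{-i2\pi j\tau}d\tau\right|^2}}, \tag{7}$$

$$T(\hat\beta_x - \beta_x) \Rightarrow \sqrt{\frac{\psi_{\varepsilon\varepsilon|u}}{\psi_{uu}}}\,\frac{Z}{\sqrt{\sum_{j=\pm1,..,\pm q}\left|\int_{\tau=0}^{1}J_c(\tau)e^{-i2\pi j\tau}d\tau\right|^2}}, \tag{8}$$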

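where $Z$ is a standard normal variable independent of the processes $W^u(\tau)$ and $J_c(\tau)$. The term $\delta_q$ is a small random variable that is defined by the integral

$$\delta_q = J_c(1)\,\frac{\sum_{j=\pm1,..,\pm q}\left(\int_{\tau=0}^{1}J_c(\tau)e^{-i2\pi j\tau}d\tau\right)}{\sum_{j=\pm1,..,\pm q}\left|\int_{\tau=0}^{1}J_c(\tau)e^{-i2\pi j\tau}d\tau\right|^2}.$$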

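Define the following LW statistic: $LW(\beta_x) = (\hat\beta_x - \beta_x)\sqrt{(\psi_{\varepsilon\varepsilon|u}/2\pi)^{-1}\left[\sum_{j=\pm1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)\right]}$. Then

$$LW(\beta_x) \Rightarrow N(0, 1). \tag{9}$$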
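As a rough illustration of the ratio in (5), the sketch below computes an FDLS-type slope from raw discrete Fourier transforms in plain Python. It is a simplified stand-in rather than the estimator of Theorem 1: the periodograms $I^{x,y}_{T,\cdot}$ in (5) include the adjustments defined in Section 2, which are not reproduced here, and the function names `dft` and `fdls_slope` are hypothetical.

```python
import cmath
import math


def dft(series, j):
    """Discrete Fourier transform of `series` at the frequency 2*pi*j/T."""
    T = len(series)
    w = 2.0 * math.pi * j / T
    return sum(v * cmath.exp(-1j * w * (t + 1)) for t, v in enumerate(series))


def fdls_slope(y, x_lag, q):
    """Ratio of low-frequency cross-periodogram and periodogram sums.

    y[t] is the predicted variable and x_lag[t] the lagged predictor.
    Only the frequencies j = +-1, ..., +-q are used. The common scaling of
    the periodograms cancels in the ratio, so it is omitted.
    Assumption: no endpoint adjustment is applied (unlike the paper).
    """
    num = 0.0
    den = 0.0
    for j in list(range(-q, 0)) + list(range(1, q + 1)):
        dy, dx = dft(y, j), dft(x_lag, j)
        # Imaginary parts cancel across +j and -j, so only real parts matter.
        num += (dy * dx.conjugate()).real
        den += abs(dx) ** 2
    return num / den
```

With `y` exactly proportional to `x_lag`, the function returns the proportionality constant, which is a quick sanity check of the alignment of the two series.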

Analyzing the limits in (7) and (8), we see that the LW estimator removes two biases. The first is the small-sample bias $\frac{r\psi_{\varepsilon\varepsilon}^{1/2}}{\psi_{uu}^{1/2}}c$, which is due to $\rho \neq 1$. The second is the component $\frac{r\psi_{\varepsilon\varepsilon}^{1/2}}{\psi_{uu}^{1/2}}\delta_q$. This component arises from the difference between the sample periodogram of $x_t$ and the sample cross-periodogram of $x_t$ and $x_{t-1}$. These corrections are similar to those embedded in the Q-test of Campbell and Yogo (2006). The Q-test is based on an estimator that is equal to the OLS minus $\frac{r\psi_{\varepsilon\varepsilon}^{1/2}}{\psi_{uu}^{1/2}}(1 - \rho)$ and minus a stochastic bias due to the difference between the first sample autocovariance and the sample variance of $x_t$.

Finally, note that the UMP test from Section 2 coincides with the Z-test based on the t-

statistic (LW statistic) for βx. The asymptotically normal distribution of this test simplifies its

application as discussed in the next section.

4 Comparison of the Tests When ρ is Unknown

The performance of the tests derived in Section 2 greatly depends on how effectively we can

bound the possible values of ρ, or, equivalently, c. Because c cannot be consistently estimated,

one typical approach is to construct conservative intervals based on a set of likely values of ρ. In

this section, we compare the performance of the Q-test, the LW test and long-horizon regressions

when the confidence intervals are constructed using Bonferroni bounds (see Cavanagh, Elliott,

and Stock, 1995).

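The idea behind Bonferroni bounds is as follows. First, one constructs the $(100 - \alpha_1)\%$ confidence interval for $\rho$, e.g., $(\underline{\rho}, \overline{\rho})$. Second, for each $\rho$ in this interval, one determines $(100 - \alpha_2)\%$ confidence intervals for $\beta_x$, e.g., $(\underline{\beta}_x(\rho), \overline{\beta}_x(\rho))$. Lastly, these two intervals are combined to construct conservative $(100 - \alpha_1 - \alpha_2)\%$ intervals for $\beta_x$ as a set that contains $(\underline{\beta}_x(\rho), \overline{\beta}_x(\rho))$ for all $\rho \in (\underline{\rho}, \overline{\rho})$. The coverage of such confidence intervals cannot be less than $(100 - \alpha)\%$, where $\alpha = \alpha_1 + \alpha_2$, for any value of $c$.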

However, the intervals constructed in this way do not attain the nominal significance level,

(α1 + α2)%, for any value of c, unless the distribution of the test does not depend on c. These

intervals are, therefore, overly conservative and can be further adjusted to increase the power of

the tests. We use one such approach, formally known as the adjusted Bonferroni method, to construct equal-tailed confidence intervals for $\beta_x$ based on $\hat\beta_x$ in (6).
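The unadjusted Bonferroni construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `beta_interval` is a hypothetical user-supplied function that returns the $(100 - \alpha_2)\%$ interval for $\beta_x$ at a given $\rho$, and the confidence interval for $\rho$ is represented by a finite grid of values.

```python
def bonferroni_interval(rho_grid, beta_interval):
    """Smallest interval containing beta_interval(rho) for every rho on the grid.

    rho_grid      -- values of rho covering the (100 - alpha1)% interval for rho
    beta_interval -- hypothetical callable: rho -> (lower, upper) for beta_x
    """
    intervals = [beta_interval(rho) for rho in rho_grid]
    lower = min(lo for lo, _ in intervals)
    upper = max(hi for _, hi in intervals)
    return lower, upper


def toy_interval(rho):
    """Made-up interval whose center shifts linearly with rho."""
    center = 2.0 * (1.0 - rho)
    return center - 0.1, center + 0.1
```

For example, `bonferroni_interval([0.9, 0.95, 1.0], toy_interval)` returns the envelope of the three toy intervals. Because the envelope is taken over all admissible values of $\rho$, the result is conservative by construction, which is the property that the adjusted method then relaxes.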

4.1 Algorithm

Let α = 10%. The intervals to be determined must cover βx with a probability of at least 90%,

so that the probability of each tail does not exceed 5%. Cavanagh, Elliott, and Stock (1995)

explain how to generally construct adjusted Bonferroni bounds. The goal of this subsection is

to show how to apply this method to the LW estimator. We first summarize the approach and

then proceed to the exact algorithm applied in the empirical section of this paper.

First, note that the LW estimator in (6) can be thought of as a function of $\rho$, i.e., $\hat\beta_x(\rho)$. The asymptotic distribution of this estimator is mixed normal with standard deviation $s_\beta = \sqrt{\frac{\psi_{\varepsilon\varepsilon|u}}{2\pi}\left(\sum_{j=\pm1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)\right)^{-1}}$. Without loss of generality, consider $\psi_{\varepsilon u} < 0$ and, therefore, $r < 0$. The adjusted Bonferroni bounds $[\underline{\beta}_x, \overline{\beta}_x]$ will take the form $\underline{\beta}_x = \hat\beta_x(\rho_U) - q_{100-\alpha_2/2}s_\beta$, $\overline{\beta}_x = \hat\beta_x(\rho_L) + q_{100-\alpha_2/2}s_\beta$, where $q_{100-\alpha_2/2}$ is the $(100 - \alpha_2/2)$th percentile of the standard normal distribution. For example, for $\alpha_2 = 10\%$, the value of $q_{100-\alpha_2/2}$ is 1.645.

Second, we determine the bounds $[\rho_L, \rho_U]$ to replace the $(100 - \alpha_1)\%$ confidence interval for $\rho$ in the unadjusted Bonferroni method. These bounds should satisfy the condition that $\beta_x$ falls in each of the tails with asymptotic probability of less than 5% for any value of $\rho = 1 + c/T$. In practice, the values of $c$ used to verify this condition are put onto a grid that extends from −50 to 5. The solutions will take the form $\rho_L = 1 + c_L/T$ and $\rho_U = 1 + c_U/T$, where $c_U = c_U(r, t_\rho)$ and $c_L = c_L(r, t_\rho)$. The second parameter, $t_\rho$, is a sample statistic that provides information about $\rho$.

We follow Campbell and Yogo (2006) in selecting the DF-GLS statistic from Elliott, Rothenberg,

and Stock (1996). Table 1 reports cU and cL for a set of values of tρ and r in the case with q = 5.

The steps for the following algorithm mirror the ones in the program provided by M. Yogo for the Q-test$^6$. We describe the steps for r < 0. One can always consider the explanatory variable

−xt if r > 0.

STEP 1: Construction of the Dickey-Fuller GLS statistic tρ.

Determine the number of lags $p$ required in the autoregressive model of $x_t$ using the BIC criterion. Define $a = 1 - 7/T$. Construct vectors $X_a = (x_p, x_{p+1} - ax_p, ..., x_T - ax_{T-1})'$ and $Z_a = (1, 1 - a, ..., 1 - a)'$. Regress $X_a$ on $Z_a$ to obtain the slope estimate $\beta_a$.

Regress $(x_{p+1} - x_p, .., x_T - x_{T-1})'$ on $(x_p - \beta_a, ..., x_{T-1} - \beta_a)'$, $(x_p - x_{p-1}, .., x_{T-1} - x_{T-2})'$, $(x_{p-1} - x_{p-2}, .., x_{T-2} - x_{T-3})'$, ..., $(x_2 - x_1, ..., x_{T-p+1} - x_{T-p})'$. The OLS t-statistic for the first slope in this regression is the Dickey-Fuller GLS statistic $t_\rho$.
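The two regressions in Step 1 can be sketched as follows. This is a deliberately simplified version, not the program used in the paper: it omits the lagged differences (so no BIC lag selection is performed) and therefore starts the quasi-differenced vector at the first observation. The function name `dfgls_no_lags` is hypothetical.

```python
import math


def dfgls_no_lags(x):
    """DF-GLS t-statistic without lagged differences (a simplification)."""
    T = len(x)
    a = 1.0 - 7.0 / T
    # Quasi-differenced series X_a and deterministic term Z_a.
    xa = [x[0]] + [x[t] - a * x[t - 1] for t in range(1, T)]
    za = [1.0] + [1.0 - a] * (T - 1)
    # GLS estimate of the mean: OLS of X_a on Z_a without an intercept.
    beta_a = sum(z * v for z, v in zip(za, xa)) / sum(z * z for z in za)
    # Dickey-Fuller regression of the first difference on the demeaned lag.
    dx = [x[t] - x[t - 1] for t in range(1, T)]
    lag = [x[t - 1] - beta_a for t in range(1, T)]
    sxx = sum(v * v for v in lag)
    slope = sum(v * d for v, d in zip(lag, dx)) / sxx
    resid = [d - slope * v for v, d in zip(lag, dx)]
    s2 = sum(e * e for e in resid) / (len(dx) - 1)
    return slope / math.sqrt(s2 / sxx)
```

A strongly mean-reverting series produces a large negative statistic, while a highly persistent series produces a value closer to zero. Adding the lagged differences would require a multiple-regression routine in place of the single-regressor formulas used here.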

STEP 2: Estimation of r and ψεε|u.

Regress $x_t$ on $x_{t-1}$ and a constant to obtain an estimate of $u_t$. Regress $y_t$ on $x_{t-1}$ and a constant to obtain an estimate of $\varepsilon_t$.

Fit the vector autoregression (VAR) to the estimated series of $(u_t, \varepsilon_t)$. Suppose the corresponding polynomial in the lag operator is $\Psi(L)$ and the variance of the VAR shocks is $\Omega^{(S)}$.

The $2 \times 2$ long-run variance matrix is $\Omega^{(L)} = (I - \Psi(1))^{-1}\Omega^{(S)}(I - \Psi(1)')^{-1}$. The estimate of the long-run correlation between $u_t$ and $\varepsilon_t$ is $r = \Omega^{(L)}_{(1,2)}(\Omega^{(L)}_{(1,1)}\Omega^{(L)}_{(2,2)})^{-1/2}$, and the estimate of the conditional long-run variance of $\varepsilon_t$ is $\psi_{\varepsilon\varepsilon|u} = \Omega^{(L)}_{(2,2)}(1 - r^2)$.
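The last part of Step 2 is a small matrix computation, sketched below for the $2 \times 2$ case in plain Python. The inputs are assumed to be already estimated: `psi1` stands for $\Psi(1)$ (the sum of the fitted VAR coefficient matrices) and `omega_s` for $\Omega^{(S)}$. The VAR fitting itself is not shown, and all function names are hypothetical.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mul2(m, n):
    """Product of two 2x2 matrices."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose2(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]


def long_run_stats(psi1, omega_s):
    """Return (Omega_L, r, psi_cond) for the shock vector (u_t, eps_t)."""
    i_minus = [[1.0 - psi1[0][0], -psi1[0][1]],
               [-psi1[1][0], 1.0 - psi1[1][1]]]
    left = inv2(i_minus)
    # (I - Psi(1)')^{-1} equals the transpose of (I - Psi(1))^{-1}.
    omega_l = mul2(mul2(left, omega_s), transpose2(left))
    r = omega_l[0][1] / math.sqrt(omega_l[0][0] * omega_l[1][1])
    psi_cond = omega_l[1][1] * (1.0 - r * r)
    return omega_l, r, psi_cond
```

For instance, with `psi1 = [[0.5, 0.0], [0.0, 0.0]]` and `omega_s = [[1.0, -0.5], [-0.5, 1.0]]`, the long-run variance of $u_t$ is four times its short-run variance, while the long-run correlation remains −0.5.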

$^6$We are grateful to Motohiro Yogo for making the code available at https://sites.google.com/site/motohiroyogo/.


STEP 3: Construction of the adjusted Bonferroni bounds.

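Find bounds $c_L$ and $c_U$ based on the values of $t_\rho$ and $r$. Tables similar to Table 1 can be used. Define $\rho_L = 1 + c_L/T$ and $\rho_U = 1 + c_U/T$. The adjusted Bonferroni bounds take the form:

$$\underline{\beta}_x = \hat\beta_x(\rho_U) - 1.645\sqrt{\frac{\psi_{\varepsilon\varepsilon|u}}{2\pi\sum_{j=\pm1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)}}, \tag{10}$$

$$\overline{\beta}_x = \hat\beta_x(\rho_L) + 1.645\sqrt{\frac{\psi_{\varepsilon\varepsilon|u}}{2\pi\sum_{j=\pm1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)}}.$$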
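Given the ingredients from Steps 1 to 3, the bounds in (10) reduce to simple arithmetic. The sketch below assumes that the two estimator values $\hat\beta_x(\rho_U)$ and $\hat\beta_x(\rho_L)$, the estimate of $\psi_{\varepsilon\varepsilon|u}$, and the periodogram sum have already been computed. The function name and argument names are hypothetical.

```python
import math
from statistics import NormalDist


def adjusted_bonferroni_bounds(beta_at_rho_u, beta_at_rho_l, psi_cond,
                               periodogram_sum, alpha2=0.10):
    """Lower and upper bounds as in (10), for the case r < 0.

    beta_at_rho_u, beta_at_rho_l -- estimator (6) evaluated at rho_U and rho_L
    psi_cond                     -- estimate of psi_{eps eps | u}
    periodogram_sum              -- sum of I_xx over j = +-1, ..., +-q
    """
    z = NormalDist().inv_cdf(1.0 - alpha2 / 2.0)  # about 1.645 for alpha2 = 10%
    s_beta = math.sqrt(psi_cond / (2.0 * math.pi * periodogram_sum))
    return beta_at_rho_u - z * s_beta, beta_at_rho_l + z * s_beta
```

The percentile is computed from the standard normal distribution rather than hard-coded, so other values of $\alpha_2$ can be tried without consulting a table.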

4.2 Comparison of the Asymptotic Power Functions

Having defined equal-tailed $(100 - \alpha)\%$ confidence intervals, we can, for example, perform a $0.5\alpha\%$-size test of $H_0: \beta_x = \beta_0$ against $H_1: \beta_x > \beta_0$ using the rejection rule $\underline{\beta}_x > \beta_0$. Figure 1 shows a comparison of the asymptotic powers of the Q-test, the LW test, and the tests based on OLS t-statistics in simple and long-horizon regressions$^7$ when $\alpha = 10\%$ and $\beta_0 = 0$. The adjusted Bonferroni method described in the previous subsection is applied to all of these estimators$^8$.

First, we consider the case with E(εt|xt−1, xt−2, ...) = 0, i.e., without contamination by short-

run dynamics. In this case, xt−1 is exogenous, and the short-run and long-run relations of xt

and yt coincide. For the LW test, we select q = 10, which corresponds to removing cycles

with periodicities of less than T/10, i.e., less than 5 years in 50 years of data. For the long-horizon regression, we choose a matching horizon $H = T/10$, and the number of lags for Newey-West standard errors is $L$ such that $L \sim H$. Parameters $\psi_{\varepsilon\varepsilon}$ and $\psi_{uu}$ are set to 1. Therefore, the comparison of the tests depends on the persistence $c$, the correlation $r$, and the level of predictability $b = \beta_xT$. As in Campbell and Yogo (2006)$^9$, we select $c = -2$ and $c = -20$, $r = -0.95$ and $r = -0.75$, and we look at a range of values for $b$.

As expected, the Q-test yields the highest power in rejecting H0 because the LW test ignores

the short-run information. Nevertheless, the LW test is as powerful as the Q-test for three of

the four calibrations, and it performs comparably for c = −20 and r = −0.75. The LW test

outperforms the OLS t-test for all four calibrations. Lastly, among the four methods considered, long-horizon regressions show the weakest results: the tests based on the long-horizon OLS estimates often yield less than half the power of the other tests.

$^7$We consider long-horizon regressions in which the aggregate quantity $y_t(H) = y_t + ... + y_{t+H-1}$ is regressed on $x_{t-1}$. For these regressions, Valkanov (2003) works out an asymptotic theory for $\rho_T = 1 + c/T$ and $H/T \to \lambda$, $\lambda > 0$. Valkanov (2003) derives the distribution of the t-tests with OLS standard errors. This paper considers t-statistics that are calculated with Newey-West standard errors with a number of lags $L \geq H - 1$, which are more justifiable. The supplementary materials contain the details of the asymptotic behavior of t-statistics in long-horizon regressions.

$^8$Note that although the formulas for the simple OLS and long-horizon t-statistics do not depend on $\rho$, their asymptotic distributions will; see, for example, Hjalmarsson (2011).

$^9$To obtain comparable results, we use the program provided by M. Yogo on his website for the Q-test. In addition, we update this program to compute the LW test.
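For reference, the point estimate in the long-horizon benchmark, in which $y_t(H) = y_t + ... + y_{t+H-1}$ is regressed on $x_{t-1}$, can be sketched as below. Only the OLS slope is computed. The Newey-West standard errors used for the t-statistics in the comparison are not implemented here, and the function name is hypothetical.

```python
def long_horizon_slope(y, x_lag, horizon):
    """OLS slope (with intercept) of y_t + ... + y_{t+H-1} on x_{t-1}.

    y[t] is aligned with x_lag[t], i.e., x_lag[t] holds x_{t-1}.
    """
    n = len(y) - horizon + 1
    # Overlapping H-period sums of the dependent variable.
    agg = [sum(y[t:t + horizon]) for t in range(n)]
    x = x_lag[:n]
    mean_x = sum(x) / n
    mean_agg = sum(agg) / n
    sxy = sum((a - mean_x) * (b - mean_agg) for a, b in zip(x, agg))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx
```

Because consecutive aggregated observations share $H - 1$ terms, the regression errors are serially correlated by construction, which is why the choice of standard errors matters for the test and why only $T - H + 1$ observations remain.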

Figure 2 shows the results for the case in which the short-run relation between $y_t$ and $x_{t-1}$ differs from the long-run relation, i.e., for $E(\varepsilon_t|x_{t-1}, x_{t-2}, ...) \neq 0$. We consider a case in which the correlation of $\varepsilon_t$ and $x_{t-1}$ has the opposite sign to $\beta_x$. Such an effect might, for example, occur in the presence of measurement error in $x_t$. Let the measurement error constitute 0.25% of the long-run variance of $\varepsilon_t$, while keeping the long-run variance of $(u_t, \varepsilon_t)$ the same as in the previous exercise.

The power functions of the LW test and the long-horizon regressions do not change from the previous case because neither depends on the $E(\varepsilon_t|x_{t-1}, x_{t-2}, ...) = 0$ assumption. However, the

asymptotic distributions of the Q-test and t-test rely on the exogeneity assumption, and therefore,

their performance is affected. It follows from Figure 2 that the LW test now outperforms the

Q-test uniformly. In fact, even the long-horizon regressions could slightly outperform the Q-test

for a small range of parameters. Note that we obtain this result with a mild deviation from the

exogeneity assumption.

One point to clarify is that the confidence intervals for the Q-test and t-test can be adjusted

to circumvent the effect of the endogeneity. However, this correction depends on the unknown

dynamics of et = (ut, εt), and the use of the estimated model is likely to affect the powers of these

methods. The frequency-domain method that is suggested here does not require adjustment: the

LW test is immune to short-run endogeneity. The same holds for the long-horizon regressions,

although as demonstrated in Figure 2, the long-horizon regressions suffer from a lack of accuracy

in determining the predictability.

5 Long-Run Nearly Optimal Test

Although the LW test is the UMP (asymptotically) in the Gaussian model with the known

persistence parameter ρ, its efficiency can be lost once we apply the Bonferroni method to

remove the dependence on ρ. Elliott, Muller, and Watson (2014) suggest a different approach

to finding the tests with optimal properties. This approach is based on asymptotically least

favorable distributions (ALFDs). Their general method works for non-standard testing problems

in which nuisance parameters affect the asymptotic distributions under H0. In our case, the

nuisance parameter is c. Elliott, Muller, and Watson (2014) consider an application of their

ALFD test to predictability studies under the assumptions of Campbell and Yogo (2006). Here,

we extend their method to the study of the long-run predictability. We derive the corresponding

nearly optimal test and compare the performance of the LW test with the resulting power bound.

Briefly, the idea behind the ALFD test is that the optimal test is the Neyman-Pearson test for

a problem with a known “distribution” of the nuisance parameter (here, c), if this distribution


yields the minimum weighted average power (WAP). The existence of the ALFD often cannot

be proved and, even if it exists, the ALFD is unlikely to be known. Nevertheless, the numerical

method by Elliott, Muller, and Watson (2014) ensures an ε-optimality; that is, we can find a

test whose power is not more than ε below the WAP upper bound.

For the long-run version of the ALFD test here, we combine the information about the conditional distribution of $\Delta Y = (y_2 - y_1, .., y_T - y_1)$ given $X$ at low frequencies with the information about the parameter $\rho$ contained in the distribution of $X$. Specifically, in the Gaussian likelihood $\log L_T(\Delta Y, X) = \log L_T(\Delta Y|X) + \log L_T(X)$, we replace only the conditional distribution of $\Delta Y$ given $X$ with the asymptotically equivalent Whittle approximation, $\log\tilde L_T(Y, X) = \log\tilde L_T(\Delta Y|X) + \log L_T(X)$. Details of the derivation of the likelihood are given in Appendix E. Importantly, the conditional likelihood can be represented by the sum

$$\log\tilde L_T(\Delta Y|X) = \sum_{j=1}^{\lceil(T-1)/2\rceil} l_j(d_y(\omega_j)|X),$$

where $l_j(d_y(\omega_j)|X)$ are conditional probability densities of the Fourier transformations of $Y$, $d_y(\omega_j) = \sum_{t=1}^{T} y_te^{-i\omega_jt}$. As before, we rely on the semi-parametric approach that leaves densities $l_j(d_y(\omega_j)|X)$ for $q < j \leq \lceil(T-1)/2\rceil$ unspecified. As follows from the derivations in the Appendix, the resulting conditional likelihood takes the form

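$$\log\tilde L_T(\Delta Y|X) = \log R(\Delta Y|X, s^{x,y}(\omega_j), q < j \leq \lceil(T-1)/2\rceil) - q\log 2\pi - \frac{1}{2}\sum_{j=\pm1,..,\pm q}\log\frac{\det(s^{u,\varepsilon}(\omega_j))}{s^{u,\varepsilon}_{uu}(\omega_j)} - \frac{\left|d_y(\omega_j) - \beta_xd_x(\omega_j) - (e^{i\omega_j} - \rho)\frac{s^{u,\varepsilon}_{\varepsilon u}(\omega_j)}{s^{u,\varepsilon}_{uu}(\omega_j)}d_x(\omega_j)\right|^2}{\det(s^{u,\varepsilon}(\omega_j))\,s^{u,\varepsilon}_{uu}(\omega_j)^{-1}}$$

where $R(\Delta Y|X, s^{x,y}(\omega_j), q < j \leq \lceil(T-1)/2\rceil)$ is the remainder of the conditional likelihood that describes the short-run dynamics, and $d_y(\omega_j)$ is the Fourier transformation of $\tilde Y = Y + \mathrm{Re}(\delta_y)(x_0 - x_T)$, as introduced in Section 2.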

The marginal distribution of X is derived by assuming normal i.i.d. ut in (1). Subsequently,

however, the argument is made that the test based on this likelihood achieves the same asymptotic

power in the more general case (Assumption A). By the assumptions, the distribution of x0 does

not depend on ρ, and therefore, the marginal likelihood equals

$$\log L_T(X) = -\frac{T-1}{2}\log 2\pi - \frac{T-1}{2}\log\psi_{uu} - \frac{1}{2}\sum_{t=1}^{T-1}\frac{(x_t - \mu_x - \rho(x_{t-1} - \mu_x))^2}{\psi_{uu}}$$

up to the density of x0. The parameter ψuu is defined in Section 2 as the long-run variance of ut

and coincides with the variance of ut for the i.i.d. case.

We obtain the joint likelihood $\log\tilde L_T(Y, X)$ by adding the log of the conditional distribution to the log of the marginal distribution of $X$, as defined above. We express the original likelihood in the neighborhood of $\beta_x = \beta_0$ and $\rho = 1$ in terms of $b'$ and $c$, where $\beta_x = \beta_0 + \frac{1}{T}\frac{\psi_{\varepsilon\varepsilon|u}}{\psi_{uu}}b'$.$^{10}$ The tested hypothesis, then, becomes $H_0: b' = 0$. Because the nearly optimal test depends on the likelihood ratios, the rejection rule requires only the part of the likelihood that is a function of $b'$ or $c$: $f(R|b', c) = \exp\left(b'R_\beta + cR_\rho - \frac{1}{2}\left(b' - c\frac{r}{\sqrt{1-r^2}}\right)^2R_{\beta\beta} - \frac{1}{2}c^2R_{\rho\rho}\right)$, where the four sufficient statistics, $R$, in the practical implementation will be replaced by the asymptotically equivalent $\hat R = (\hat R_\beta, \hat R_\rho, \hat R_{\beta\beta}, \hat R_{\rho\rho})$, defined as follows:

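$$\hat R_\beta = (\psi_{uu}\psi_{\varepsilon\varepsilon|u})^{-1/2}\frac{1}{T}\sum_{j=-q,\,j\neq0}^{q}d_x(\omega_j)\left(d_y(\omega_j) - \beta_0d_x(\omega_j) - \frac{r\psi_{\varepsilon\varepsilon}^{1/2}}{\psi_{uu}^{1/2}}(d_{x^+}(\omega_j) - d_x(\omega_j))\right)^*,$$

$$\hat R_\rho = \frac{1}{2}\left(\psi_{uu}^{-1}T^{-1}(x_{T-1} - x_0)^2 - 1\right) - r(1-r^2)^{-1/2}\hat R_\beta,$$

$$\hat R_{\beta\beta} = \psi_{uu}^{-1}T^{-2}\sum_{j=-q,\,j\neq0}^{q}d_x(\omega_j)d_x(\omega_j)^*,$$

$$\hat R_{\rho\rho} = \psi_{uu}^{-1}T^{-2}\sum_{t=1}^{T-1}(x_{t-1} - x_0)^2.$$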

Denote the ALFD by $\Lambda^*(c)$ and the pre-specified weights for WAP by $F(b', c)$, with the normalization $\int_{b',c}dF(b', c) = 1$. Once the ALFD is found, the testing procedure is based on the following rejection rule:

$$R_{ALFD}(R, \Lambda^*) = \mathbf{1}\left[\frac{\int_{b',c}f(R|b', c)dF(b', c)}{\int_cf(R|0, c)d\Lambda^*(c)} > K_\alpha\right],$$

in which the critical value $K_\alpha$ corresponds to the significance level $\alpha$.
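When both $F$ and the candidate $\Lambda$ are discrete, the rejection rule is a ratio of two finite weighted sums of the function $f$ given above. The sketch below evaluates it for a supplied vector of statistics. It assumes equal weights on the alternative points and takes the null support, the null weights, and the critical value as given inputs. Finding these inputs is the hard part and is not shown. All names are hypothetical.

```python
import math


def log_f(R, b, c, r):
    """Log of f(R | b', c) for R = (R_beta, R_rho, R_betabeta, R_rhorho)."""
    r_b, r_rho, r_bb, r_rr = R
    shift = b - c * r / math.sqrt(1.0 - r * r)
    return b * r_b + c * r_rho - 0.5 * shift ** 2 * r_bb - 0.5 * c ** 2 * r_rr


def alfd_reject(R, r, alt_points, null_points, null_weights, k_alpha):
    """Return 1 if the weighted likelihood ratio exceeds k_alpha, else 0.

    alt_points   -- list of (b', c) pairs with equal weights (discrete F)
    null_points  -- support points c_i of a candidate Lambda
    null_weights -- probability masses on null_points (summing to one)
    """
    num = sum(math.exp(log_f(R, b, c, r)) for b, c in alt_points) / len(alt_points)
    den = sum(w * math.exp(log_f(R, 0.0, c, r))
              for c, w in zip(null_points, null_weights))
    return 1 if num > k_alpha * den else 0
```

For statistics with very large magnitudes, the exponentials may overflow. A more careful version would subtract a common maximum from the log terms before exponentiating.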

$^{10}$Note the difference between the parameterization of $\beta_x$ in this section (which follows the notation of Jansson and Moreira (2006) and Elliott, Muller, and Watson (2014)) and in Section 4 (which follows the notation of Campbell and Yogo (2006)): the relation between the localization parameters is $b = b'(1 - r^2)\psi_{\varepsilon\varepsilon}\psi_{uu}^{-1}$.

The search for $\Lambda^*(c)$ and $K_\alpha$ is performed by using the asymptotic limits of $\hat R$. The limit of $\hat R$ is referred to as $R(b', c) = (R_\beta, R_\rho, R_{\beta\beta}, R_{\rho\rho})$ with the elements

Rβ =1

$$\sum_{j=-q,\,j\neq0}^{q}\int_0^1J_c(\tau)e^{-2\pi ij\tau}d\tau\left(\int_0^1e^{-2\pi ij\tau}dW^z(\tau) + \left(b' - \frac{cr}{\sqrt{1-r^2}}\right)\int_0^1J_c(\tau)e^{-2\pi ij\tau}d\tau\right)^*,$$

$$R_\rho = \frac{1}{2}\left(J_c^2(1) - 1\right) - r(1-r^2)^{-1/2}R_\beta,$$

Rββ =1

$$\sum_{j=-q,\,j\neq0}^{q}\left|\int_0^1e^{-2\pi ij\tau}J_c(\tau)d\tau\right|^2,$$

$$R_{\rho\rho} = \int_0^1J_c(\tau)^2d\tau,$$

where $W^z(\tau)$ is a standard Brownian motion that is independent of the O-U process $J_c(\tau)$.

We can find an approximation for the ALFD and the corresponding $K_\alpha$. Suppose that we are interested in the one-sided $\alpha$-size test with $\alpha = 5\%$. Without loss of generality, let $b' > 0$ under the alternative. Elliott, Muller, and Watson (2014) suggest $F(b', c)$ that places equal weights on the points $(b'_i, c_i)$, $i = 1, .., 57$, where $b'_i = 1.645\sqrt{\frac{6-2c_i}{1-r^2}}$ and $c_i = -0.0625(i-1)^2$. The approximate ALFD for the 5%-size test is also searched among mixtures of point masses $c_i$. In Section 4, we used the interval $[-50, 5]$ to validate the significance levels of the tests, i.e., we allowed for small positive $c$ values. For a fair comparison, we extend the set of $c_i$ to include $c_i = +0.0625(i-1)^2$, such that $b'_i = 1.645\sqrt{\frac{6-2c_i}{1-r^2}}$ is defined, i.e., $c_i < 3$.
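The grid of weight points described above is easy to generate explicitly. The sketch below builds the pairs $(b'_i, c_i)$ for a given correlation $r$ under the stated formulas. The function name `wap_grid` and the flag `include_positive` are hypothetical conveniences, not part of the original procedure.

```python
import math


def wap_grid(r, n_points=57, include_positive=True):
    """Return the list of (b'_i, c_i) pairs that carry the WAP weights.

    c_i = -0.0625 * (i - 1)**2 for i = 1, ..., n_points; if requested, the
    mirrored positive values below 3 are appended, as in the extension above.
    """
    cs = [-0.0625 * (i - 1) ** 2 for i in range(1, n_points + 1)]
    if include_positive:
        cs += [0.0625 * (i - 1) ** 2 for i in range(2, n_points + 1)
               if 0.0625 * (i - 1) ** 2 < 3.0]
    return [(1.645 * math.sqrt((6.0 - 2.0 * c) / (1.0 - r * r)), c) for c in cs]
```

With the default 57 points, the negative part of the grid runs from 0 down to −196, and the extension adds the six positive values 0.0625, 0.25, ..., 2.25. The spacing is quadratic, so the grid is densest near the unit root, $c = 0$.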

Elliott, Muller, and Watson (2014) solve for $\mu^*_i = \log(\lambda^*_iK_\alpha)$, where $\lambda^*_i$ is the probability weight that $\Lambda^*(c)$ puts on $c_i$. They note that $\lambda^*_i > 0$ only for those $c = c_i$ for which the asymptotic probability of the false test rejection, i.e., the probability of the event $R_{ALFD}(R(0, c_i), \Lambda^*) = 1$, is exactly $\alpha$. Therefore, the ALFD $\Lambda^*(c)$ is the fixed point of the problem $G(\Lambda) = \Lambda$, $\Lambda = (\lambda_1, .., \lambda_M)$, where the $j$th element of $G: [0, 1]^M \to [0, 1]^M$ is

$$G_j(\Lambda) = \frac{\lambda_j + \max(0, \Pr(R_{ALFD}(R(0, c_j), \Lambda) = 1) - \alpha)}{\sum_{i=1}^{M}\left(\lambda_i + \max(0, \Pr(R_{ALFD}(R(0, c_i), \Lambda) = 1) - \alpha)\right)}.$$

Details of the algorithm can be found in Elliott, Muller, and Watson (2014). Using this method with the given grid $\{c_i\}_{i=1}^{M}$, we can obtain ε-ALFD tests (for different values of the correlation $r$) with $\varepsilon \leq 0.5\%$, i.e., tests whose weighted power is less than 0.5% below the power bound.
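One application of the map $G$ can be written compactly once the null rejection probabilities are available. In the sketch below these probabilities are simply passed in as numbers. In an actual search they depend on the current $\Lambda$ and would have to be recomputed, typically by simulation, before every update. The function name is hypothetical.

```python
def alfd_update(weights, null_rejection_probs, alpha):
    """One step Lambda -> G(Lambda) for point masses on a grid c_1, ..., c_M.

    weights              -- current probability masses lambda_i
    null_rejection_probs -- Pr(reject | b' = 0, c = c_i) under the current
                            weights (assumed to be computed elsewhere)
    alpha                -- nominal size of the test
    """
    raw = [w + max(0.0, p - alpha)
           for w, p in zip(weights, null_rejection_probs)]
    total = sum(raw)
    return [v / total for v in raw]
```

Mass is shifted toward grid points at which the test over-rejects. Once no rejection probability exceeds $\alpha$, the weights are left unchanged, which is the fixed-point property used to characterize the ALFD.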

Figure 3 compares the powers of the LW test and the nearly optimal test for the same

parameters as in Section 4. As follows from the graph, the LW test has the same power as the

nearly optimal test for c = −20 but is less accurate for c = −2. Therefore, an additional gain in

accuracy can be achieved by applying the long-run nearly optimal test. However, the LW and

the long-run nearly optimal tests yield the same qualitative results in the empirical applications

from the next section, and consequently, only the results for the LW test will be reported.


6 Applications

6.1 “Importance of Measuring Payout Yield”

Among the many stock return predictors, the price-dividend ratio stands out as one of the

most strongly supported by economic theory. Campbell and Shiller (1988) noted that the price-

dividend ratio is related by an accounting identity to either the changes in the future interest

rates or the changes in the future dividend growth. Because dividend changes are only weakly

predicted (see Cochrane, 2008), price-dividend movements must be caused mainly by changes

in the expected returns. In asset pricing models, the price-dividend ratio is found to be related

to the expectations of future growth (Shiller, 1981, Bansal and Yaron, 2004) and premiums

(Bollerslev, Tauchen, and Zhou, 2009). Therefore, there are strong reasons to expect that the

price-dividend ratio predicts future returns.

With regard to the data, there are a variety of methods for calculating what would be a good

equivalent to the theoretical price-dividend ratio, such as the ratio between the company market

value and the total dividend paid during the preceding year or the ratio between an adjusted

market value of the company and an adjusted value of the dividends, such as an adjustment for

stock splits. Because some companies do not pay dividends or adopt different payout policies,

many have argued in favor of replacing the price-dividend ratio by the price-earnings ratio in

empirical work. Because all of these measures are quite persistent (e.g., the dividend yield has

the first autocorrelation of 0.86 at an annual frequency), they are good candidates for nearly

integrated modeling. One can also argue that the long-run components of all of these measures

should coincide and should have the same predictive ability for the future long-term returns.

Boudoukh, Michaely, Richardson, and Roberts (2007) discuss the implications of mismeasure-

ment of the total payout in return predictability regressions. They draw a distinction between the

dividends, total payouts (which are dividends adjusted for share repurchases), and net payouts

(which are dividends adjusted for share repurchases and equity issuances). They characterize the problem that arises in regressing a stock return yt on a “wrong” payout ratio as a measurement error problem, which is consistent with the assumptions that we made for the test comparisons in Figure 2. The regressions of future returns on the current values of different payout ratios present a perfect road test for the methods in this study because it is plausible that either all or none of the payout ratios predict future returns. The annual series are

defined as follows.

The unadjusted dividend yield is the logarithm of the ratio between the price of the stock

(here, the value-weighted CRSP index) and the corresponding past dividends. The data run from 1926 to 2010, with the price recorded at the end of the year and dividends aggregated over the preceding 12-month period. The total log payout yields (log payout ratios I and II) are two versions of the yield series adjusted for common share repurchases. The net payout is based on the sum of dividends and share repurchases minus equity issuances. The logarithmic net payout series are defined as follows: log(0.1 + Net Payout). The adjusted payout series are available

for the span 1926 - 2003 from the website of Michael Roberts. Lastly, the log earnings yield

is the logarithm of the ratio between the earnings in the previous 12 months and the current

price calculated at the end of the year. Monthly earnings data on the S&P 500 over the period

1926 - 2010 are obtained from the website of Robert J. Shiller. For more information on the

construction of payout yields, see Boudoukh, Michaely, Richardson, and Roberts (2007).

Series yt is calculated as the monthly CRSP (value-weighted) excess returns aggregated to one

year. For each month, we subtract the risk-free rate from the continuously compounded CRSP

return. The risk-free rates are obtained from the website of Kenneth French. The resulting series

span the period 1926-2010.

Table 2 reports the 90% confidence intervals for βx in return regressions starting from the net

payout, for which the predictability evidence is the strongest, and ending with the dividend yield,

for which the link with the future returns is the weakest. All of the tests reject the hypothesis

of no predictability for the net payout ratio and payout yield I. Only the t-test fails to reject

H0 : βx = 0 for the payout yield II. Only the LW test proves the predictability by the earnings-

price ratio and the dividend yield. For the dividend yield, to reject H0 : βx = 0, the number of

relevant frequencies in the LW estimator should be as low as q = 5.

Therefore, there is statistical evidence that all of the payout ratios considered by Boudoukh,

Michaely, Richardson, and Roberts (2007) predict returns. This evidence is consistent with payouts I and II, the earnings-price ratio, and the dividend yield sharing the same long-run dynamics.

The long-run component of these series predicts future returns.

6.2 Spot Exchange Rate: Uncovered Interest Rate Parity (UIP) and Carry Trade

Denote $s_t$ as the logarithm of the spot exchange rate between the currencies of countries “a” and “b”, with the value of currency “b” in the units of currency “a”. If $i^a_t$ is the nominal annual interest rate in country “a” and $i^b_t$ is the nominal annual interest rate in country “b”, then the UIP states that the expected annual change in the spot rates should be equal to $i^a_t - i^b_t$. This UIP follows from the forward parity $E_ts_{t+1} = f_t$, where $f_t$ is the forward exchange rate, and from the covered interest rate parity $f_t - s_t = i^a_t - i^b_t$, which follows from the no-arbitrage condition. Therefore, one can test the UIP by running the OLS regression

$$s_{t+1} - s_t = \beta_0 + \beta_x(i^a_t - i^b_t) + \varepsilon_t,$$

and testing $H_0: \beta_x = 1$. Even if the forward parity does not hold, one would expect a positive sign for $\beta_x$, because the difference in the interest rates cannot be maintained in the long run without eventual currency depreciation. The OLS results, however, yield a negative value of $\beta_x$. Boudoukh, Richardson, and Whitelaw (2013) suggest a stylized model that explains the observed puzzle as the result of monetary policies and the carry-trade phenomenon, although they do not state a position regarding the reasons for the carry trade, whether rare currency crashes, time-varying premiums, and/or limited arbitrage. As a solution, Boudoukh, Richardson, and Whitelaw (2013) suggest a different predictor in the UIP regression:

$$s_{t+1} - s_t = \beta_0 + \beta_x(if^a_{t-j,t,t+1} - if^b_{t-j,t,t+1}) + \varepsilon_t,$$

where $if_{t-j,t,t+1}$ is the forward interest rate for the period $[t, t+1]$ that is set at time $t - j$, i.e., $if_{t-j,t,t+1} = (j+1)i_{t-j,j+1} - ji_{t-j,j}$, where $i_{t-j,j}$ and $i_{t-j,j+1}$ are continuously compounded $j$- and $(j+1)$-period interest rates at time $t - j$. They found that the sign in the regressions reverts to positive for $j$ = 2 to 4 years ($t$ is in annual units). Unfortunately, the standard errors prove too

large for the results to be statistically significant. In this subsection, we reevaluate the result of

Boudoukh, Richardson, and Whitelaw (2013) by using the LW test.
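The forward-rate formula used to build the predictor is a one-line computation. The sketch below applies it to continuously compounded yields and forms the cross-country differential. The function names and the numbers in the usage note are made up for illustration only.

```python
def forward_rate(yield_j, yield_j_plus_1, j):
    """Forward rate for [t, t+1] set at t - j from j- and (j+1)-period yields.

    Implements if = (j + 1) * i_{j+1} - j * i_j for continuously
    compounded annualized yields.
    """
    return (j + 1) * yield_j_plus_1 - j * yield_j


def forward_differential(yields_a, yields_b, j):
    """Difference of forward rates between countries a and b.

    yields_a, yields_b -- (i_j, i_{j+1}) pairs observed at time t - j
    """
    return forward_rate(yields_a[0], yields_a[1], j) - forward_rate(
        yields_b[0], yields_b[1], j)
```

For example, with a 4-year yield of 3% and a 5-year yield of 3.5%, `forward_rate(0.03, 0.035, 4)` gives a forward rate of about 5.5% for the fifth year.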

Boudoukh, Richardson, and Whitelaw (2013) work with annual data sampled monthly, i.e., data with overlaps. The methods considered in this paper are derived assuming no significant overlap in the observations, with the exception of the long-horizon regressions. Therefore, we test the positive relation between the interest rate differential and the depreciation of the currency by estimating a simpler regression model,

$$s_{t+1/12} - s_t = \beta_0 + \beta_x(if^a_{t-j,t,t+1} - if^b_{t-j,t,t+1}) + \varepsilon_t. \tag{11}$$

That is, we seek to predict the exchange rate dynamics in the first month of the year that corresponds to the forward rates $if^a_{t-j,t,t+1}$ and $if^b_{t-j,t,t+1}$. The cointegration coefficient $\beta_x$ is now expected to be on the order of 1/12 if the UIP holds.

Table 3 shows the results for the US dollar (USD)/British pound (GBP), USD/Deutsche Mark (DEM), and USD/Swiss franc (CHF) pairs. The data on interest rates are obtained

from Datastream. Forward interest rates are estimated from the yield curves that were derived

from the observations on the LIBOR rates with maturities between one and 12 months and swap

rates on LIBOR with maturities between two and five years. To construct the yields implied by

the swap rates, we rely on linear extrapolations for the missing yields that correspond to the

coupon maturities, such as 18 months. Boudoukh, Richardson, and Whitelaw (2013) use the

same data set but use a less transparent method that is based on cubic extrapolation. The data

are recorded on the last trading day of each month. The resulting sample starts in January 1979

and ends in July 2012. The data are then aligned in accordance with model (11).

Table 3 reports 90% confidence intervals and point estimates for βx in (11). As follows from

the table, forward rate differentials with j = 4 do positively correlate with future currency

depreciation. The evidence is mixed for j = 1. The Q-test and OLS t-test yield close results, which is explained by the modest correlation of the residuals $\varepsilon_t$ with the innovations to $if^a_{t-j,t,t+1} - if^b_{t-j,t,t+1}$. Neither method is informative about the sign of $\beta_x$, because all of the confidence intervals include both negative and positive values. Again, the strongest result is

from the LW test, with q = 5 frequencies. According to this test, the βx in the regressions for

USD/DEM and USD/CHF, and the forward rates with j = 4, are significantly positive. The estimates of βx in these regressions are 3.12 and 2.1 when aggregated to annual units by using small-sample corrected estimates of ρ: 0.98 for DEM and 0.96 for CHF. These

estimates are surprisingly close to the regression results of Boudoukh, Richardson, and Whitelaw

(2013, Table 2). However, our results are statistically significant.

Boudoukh, Richardson, and Whitelaw (2013) explain why the forward rates with the longest

horizons are cleaner measures of the future exchange rate changes. These forward rates contain

less of the second component, which is a deviation from the purchasing power parity. For the

statistical properties of this missing component, Gospodinov (2009) argues that the results in

various empirical studies are consistent with the presence of a very persistent omitted variable,

which is often referred to as the forward premium. It is not surprising, therefore, that the LW

test fails to support UIP for j = 1, because it is designed to remove only transient effects. To

summarize, even though the LW test cannot remove the omitted variable bias due to the forward

premium in the interest rate differentials, this test still offers an improvement with less-affected

measures of the interest rate differentials, such as the lagged forward interest rate differentials.

7 Conclusions

We suggest a new estimation method and the associated (Local Whittle) test, which serve the

same purpose as long-horizon regressions, to test for long-run predictability. This test provides

higher power in rejecting the no-predictability hypothesis than long-horizon regressions. We demonstrate that this test is

similar to the Q-test in power and is immune to the short-run dynamics that can bias the

estimator that underlies the Q-test. The accuracy of the long-run predictability testing can be

further improved by using the new long-run nearly optimal test.

We evaluate the performance of the tests in two applications: a test for the predictability

in the stock returns by the payout ratios and a test for the predictability in the exchange rate

changes by the interest rate differentials. The confidence intervals based on the LW test are

usually close to those based on the Q-test and strengthen the predictability evidence in both

cases. For example, the LW test confirms the predictability of the returns by the dividend yield

in the 1926 - 2010 sample. The LW test also confirms the positive sign of the relationship

between the exchange rate changes and the past forward interest rate differentials. Therefore,

the long-run relations do carry extra information that is useful for studying economic relations

and that is accurate enough for performing formal statistical tests.


References

[1] Bansal, R. and A. Yaron, 2004, Risks For The Long Run: A Potential Resolution of Asset

Pricing Puzzles. Journal of Finance, 59, 1481 - 1509.

[2] Basu, Debabrata, 1977, On the Elimination of Nuisance Parameters. Journal of the American

Statistical Association, 72(358), 355 - 366.

[3] Bollerslev, T., Tauchen, G. and H. Zhou, 2009, Expected Stock Returns and Variance Risk

Premia. Review of Financial Studies, 22, 4463 - 4492.

[4] Boudoukh, Jacob, Michaely, Roni, Richardson, Matthew, and Michael R. Roberts, 2007,

On the Importance of Measuring Payout Yield: Implications for Empirical Asset Pricing.

Journal of Finance, 62(2), 877 - 915.

[5] Boudoukh, Jacob, Richardson, Matthew, and Robert Whitelaw, 2008, The Myth of Long-

Horizon Predictability. Review of Financial Studies, 21(4), 1577 - 1605.

[6] Boudoukh, Jacob, Richardson, Matthew, and Robert Whitelaw, 2013, New Evidence on the

Forward Premium Puzzle. Working paper.

[7] Brillinger, David R., 1975, Time Series Analysis. Data Analysis and Theory. Holt, Rinehart

and Winston, New York.

[8] Campbell, John Y., and Robert J. Shiller, 1988, Stock Prices, Earnings, and Expected Divi-

dends. Journal of Finance, 43(3), 661 - 676.

[9] Campbell, John Y., and Motohiro Yogo, 2006, Efficient tests of stock return predictability.

Journal of Financial Economics, 81(1), 27 - 60.

[10] Cavanagh, Christopher L., Elliott, Graham, and James H. Stock, 1995, Inference in Models

with Nearly Integrated Regressors, Econometric Theory, 11(05), 1131 - 1147.

[11] Cochrane, J. H., 2008, The Dog That Did Not Bark: A Defense of Return Predictability.

Review of Financial Studies, 21, 1533 - 1575.

[12] Davis, Philip J., 1979, Circulant Matrices, Wiley, New York.

[13] Dzhaparidze, Kacha, 1986, Parameter Estimation and Hypothesis Testing in Spectral Anal-

ysis of Stationary Time Series. Springer, New York.

[14] Durlauf, Steven N., and Peter C. B. Phillips, 1986, Multiple Time Series Regression with

Integrated Processes. Review of Economic Studies, 53(4), 473 - 495.


[15] Gospodinov, Nikolay, 2009, A New Look at the Forward Premium Puzzle, Journal of Financial Econometrics, 7(3), 312 - 338.

[16] Elliott, Graham, Müller, Ulrich, and Mark Watson, 2014, Nearly Optimal Tests when a Nuisance Parameter is Present Under the Null Hypothesis. Working paper, Princeton University.

[17] Elliott, Graham, Rothenberg, Thomas J., and James H. Stock, 1996, Efficient Tests for an

Autoregressive Unit Root. Econometrica, 64, 813 - 836.

[18] Elliott, Graham, and James H. Stock, 1994. Inference in time series regression when the

order of integration of a regressor is unknown. Econometric Theory, 10, 672 - 700.

[19] Fama, E. F. and K. R. French, 1988, Dividend Yields and Expected Stock Returns. Journal

of Financial Economics, 22, 3 - 25.

[20] Fisher, Mark E., and John J. Seater, 1993, Long-Run Neutrality and Superneutrality in an

ARIMA Framework. American Economic Review, 83, 402 - 415.

[21] Hjalmarsson, Erik, 2011, New Methods for Inference in Long-Horizon Regressions, Journal

of Financial and Quantitative Analysis, 46 (3), 815-839.

[22] Ibragimov, Ildar A., and Yuri V. Linnik, 1971, Independent and Stationary Sequences of

Random Variables. Wolters-Noordhoff Publishing, Groningen.

[23] Ibragimov, Ildar A., and Yurii A. Rozanov, 1971, Gaussian Random Processes. Springer-

Verlag, New York.

[24] Jansson, Michael, and Marcelo Moreira, 2006, Optimal Inference in Regression Models with

Nearly Integrated Regressors. Econometrica, 74, 681 - 714.

[25] Hamilton, James D., 1994, Time Series Analysis. Princeton University Press, Princeton.

[26] Herrndorf, N., 1984, A Functional Central Limit Theorem for Weakly Dependent Sequences of Random Variables. The Annals of Probability, 12(1), 141 - 153.

[27] Hong, Harrison, and Jeremy C. Stein, 1999, A Unified Theory of Underreaction, Momentum

Trading, and Overreaction in Asset Markets. Journal of Finance, 54(6), 2143 - 2184.

[28] Lehmann, Erich L., and Joseph P. Romano, 2005, Testing Statistical Hypotheses. Springer,

New York.

[29] Mishkin, Frederic S., 1990, What Does the Term Structure of Interest Rates Tell Us about

Future Inflation? Journal of Monetary Economics 70, 1064 - 1072.


[30] Phillips, Peter C. B., 1987, Towards a Unified Asymptotic Theory for Autoregression.

Biometrika, 74(3), 535 - 547.

[31] Phillips, Peter C. B., 1987, Asymptotic Expansions in Nonstationary Vector Autoregres-

sions. Econometric Theory, 3(1), 45 - 68.

[32] Phillips, Peter C. B., 1988, Regression Theory for Near-Integrated Time Series. Economet-

rica, 56(5), 1021 - 1043.

[33] Phillips, Peter C. B., and Bruce E. Hansen, 1990, Statistical Inference in Instrumental

Variables Regression with I(1) Processes. Review of Economic Studies, 57, 99 - 125.

[34] Richardson, Matthew, and James Stock, 1989, Drawing Inferences from Statistics Based on

Multi-Year Asset Returns. Journal of Financial Economics, 25, 323 - 348.

[35] Robinson, Peter M., 1994, Semiparametric Analysis of Long-Memory Time Series. Annals

of Statistics, 22 , 515 - 539.

[36] Robinson, Peter M., and Domenico Marinucci, 2003, Semiparametric Frequency Domain

Analysis of Fractional Cointegration. Time Series with Long Memory, Robinson, P.M. (Ed.),

Oxford University Press, Oxford, 334 - 373.

[37] Stambaugh, Robert F., 1999, Predictive Regressions. Journal of Financial Economics, 54,

375 - 421.

[38] Valkanov, Rossen, 2003, Long-Horizon Regressions: Theoretical Results and Applications.

Journal of Financial Economics, 68, 201 - 232.

A Results for Reference

Most of the proofs in this paper rely on the results by Phillips (1987, 1988) and the related

statements gathered in this list. All of the following weak convergence results hold jointly.

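Result 1 Under Assumption A, Phillips (1987, Lemma 1, and 1988, Lemma 3.1) proved that the process $\frac{1}{\sqrt{\psi_{uu}}}\frac{X_{\lfloor \tau T\rfloor}}{\sqrt{T}}$, $\tau \in [0,1]$, weakly converges to the O-U process $J_c(\tau)$ starting from $J_c(0) = 0$. The O-U process follows the dynamics $dJ_c(\tau) = cJ_c(\tau)\,d\tau + dW^{u}_{\tau}$, where $W^{u}_{\tau}$ is a standard Brownian motion. By the functional central limit theorem (FCLT), $\frac{1}{\sqrt{\psi_{uu}}}\frac{u_{\lfloor \tau T\rfloor}}{\sqrt{T}} \Rightarrow W^{u}_{\tau}$ and $\frac{1}{\sqrt{\psi_{\varepsilon\varepsilon}}}\frac{\varepsilon_{\lfloor \tau T\rfloor}}{\sqrt{T}} \Rightarrow W^{\varepsilon}_{\tau}$, where the vector process $(W^{u}_{\tau}, W^{\varepsilon}_{\tau})$ is a two-dimensional Brownian motion with correlation $r = s^{u,\varepsilon}_{u\varepsilon}(0)\,\big(s^{u,\varepsilon}_{\varepsilon\varepsilon}(0)\,s^{u,\varepsilon}_{uu}(0)\big)^{-1/2}$ and unit marginal variances. Note that the same result holds for the demeaned process, $\frac{1}{\sqrt{\psi_{uu}}}\frac{(X_{\lfloor \tau T\rfloor}-\mu_x)}{\sqrt{T}} \Rightarrow J_c(\tau)$, because $\mu_x/\sqrt{T} \to 0$.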
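As a rough numerical illustration of Result 1, the sketch below simulates a nearly integrated AR(1) with i.i.d. standard normal shocks (so that ψuu = 1) and compares the variance of the scaled endpoint with the variance of an Ornstein-Uhlenbeck process at τ = 1, which is (e^{2c} − 1)/(2c). The function names, the sample sizes, and the choice c = −5 are illustrative assumptions, not part of the paper.

```python
import numpy as np

def simulate_scaled_endpoint(T, c, reps, seed=0):
    """Simulate x_T / sqrt(T) for x_t = rho * x_{t-1} + u_t with rho = 1 + c/T, x_0 = 0."""
    rng = np.random.default_rng(seed)
    rho = 1.0 + c / T
    x = np.zeros(reps)
    for _ in range(T):
        # One AR(1) step for all replications at once (unit-variance shocks).
        x = rho * x + rng.standard_normal(reps)
    return x / np.sqrt(T)

def ou_variance(c, tau=1.0):
    """Variance of J_c(tau) when dJ = c*J dtau + dW and J(0) = 0 (requires c != 0)."""
    return (np.exp(2.0 * c * tau) - 1.0) / (2.0 * c)

if __name__ == "__main__":
    z = simulate_scaled_endpoint(T=500, c=-5.0, reps=4000)
    print("simulated variance:", z.var())
    print("O-U variance      :", ou_variance(-5.0))
```

The two printed numbers should be close for moderate T; the remaining gap reflects Monte Carlo noise and the finite-sample approximation.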


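Result 2 From the continuous mapping theorem (CMT) and Result 1, it follows that the Fourier transformation of $X = (x_0, .., x_{T-1})$ calculated at frequency $\omega_{j_0} = 2\pi j_0/T$, i.e., $d_x(\omega_{j_0}) = \frac{1}{\sqrt{2\pi T}}\sum_{t=1}^{T} x_{t-1}e^{-i\omega_{j_0}t}$, has the following limit,

$$\frac{d_x(\omega_{j_0})}{T} = \frac{1}{\sqrt{2\pi}}\int_{\tau=0}^{1}\frac{x_{\lfloor \tau T\rfloor}}{\sqrt{T}}\,e^{-i\frac{2\pi j_0(\lfloor \tau T\rfloor+1)}{T}}\,d\tau \;\Rightarrow\; \sqrt{\frac{\psi_{uu}}{2\pi}}\int_{\tau=0}^{1} J_c(\tau)\,e^{-i2\pi j_0\tau}\,d\tau,$$

where the convergence holds jointly across $d_x(\omega_{j_0})$, $j_0 = 1, .., q$.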

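Result 3 Define $S_{\varepsilon,t} = \varepsilon_t + .. + \varepsilon_1$. Then, the Fourier transformation of $\varepsilon_t$ has the following representation:

$$\frac{\sum_{t=1}^{T}\varepsilon_t e^{-i\omega_{j_0}t}}{\sqrt{2\pi T}} = \frac{T(e^{i\omega_{j_0}}-1)}{\sqrt{2\pi}}\sum_{t=1}^{T}\frac{S_{\varepsilon,t-1}}{\sqrt{T}}\,e^{-2\pi j_0 i\frac{t}{T}}\,\frac{1}{T} + \frac{1}{\sqrt{2\pi}}\frac{S_{\varepsilon,T}}{\sqrt{T}}.$$

Because the FCLT holds for the partial sums $S_{\varepsilon,t}$ and $\lim_{T\to\infty} T(e^{i\omega_{j_0}} - 1) = i2\pi j_0$, by applying the CMT we obtain

$$\frac{\sum_{t=1}^{T}\varepsilon_t e^{-i\omega_{j_0}t}}{\sqrt{2\pi T}} \;\Rightarrow\; j_0 i\sqrt{2\pi\psi_{\varepsilon\varepsilon}}\int_{\tau=0}^{1} W^{y}(\tau)\,e^{-2\pi j_0 i\tau}\,d\tau + \sqrt{\frac{\psi_{\varepsilon\varepsilon}}{2\pi}}\,W^{y}(1),$$

and, therefore,

$$\frac{\sum_{t=1}^{T}\varepsilon_t e^{-i\omega_{j_0}t}}{\sqrt{2\pi T}} \;\Rightarrow\; \sqrt{\frac{\psi_{\varepsilon\varepsilon}}{2\pi}}\int_{\tau=0}^{1} e^{-2\pi j_0 i\tau}\,dW^{y}(\tau),$$

where the convergence holds jointly across $d_\varepsilon(\omega_{j_0})$, $j_0 = 1, .., q$.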
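The partial-sum representation in Result 3 is an exact finite-sample identity (summation by parts, using $e^{-i\omega_{j_0}T} = 1$ at Fourier frequencies), so it can be checked numerically. The sketch below is a minimal check under assumed names (`dft_direct`, `dft_via_partial_sums`); it is not code from the paper.

```python
import numpy as np

def dft_direct(eps, j0):
    """Normalized Fourier transformation of eps at omega = 2*pi*j0/T."""
    T = len(eps)
    t = np.arange(1, T + 1)
    omega = 2.0 * np.pi * j0 / T
    return np.sum(eps * np.exp(-1j * omega * t)) / np.sqrt(2.0 * np.pi * T)

def dft_via_partial_sums(eps, j0):
    """Same quantity, written through the partial sums S_t = eps_1 + ... + eps_t."""
    T = len(eps)
    t = np.arange(1, T + 1)
    omega = 2.0 * np.pi * j0 / T
    S = np.concatenate(([0.0], np.cumsum(eps)))  # S[k] = S_k, with S_0 = 0
    S_lag = S[:-1]                               # S_{t-1} for t = 1..T
    first = (T * (np.exp(1j * omega) - 1.0) / np.sqrt(2.0 * np.pi)
             * np.sum(S_lag / np.sqrt(T) * np.exp(-1j * omega * t)) / T)
    second = S[-1] / np.sqrt(2.0 * np.pi * T)
    return first + second

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    eps = rng.standard_normal(200)
    for j0 in (1, 2, 5):
        print(j0, dft_direct(eps, j0), dft_via_partial_sums(eps, j0))
```

The two functions should agree up to floating-point error for any real series and any integer j0.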

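Result 4 For any two processes, $(w_{1,t}, w_{2,t})$, which satisfy the conditions for the bi-variate FCLT (e.g., Assumption A), the partial sums $\frac{1}{\sqrt{T}}\sum_{t=1}^{\lfloor \tau T\rfloor} w_{1,t}$ and $\frac{1}{\sqrt{T}}\sum_{t=1}^{\lfloor \tau T\rfloor} w_{2,t}$ converge jointly with the Fourier transformations $\frac{\sum_{t=1}^{\lfloor \tau T\rfloor} w_{1,t}e^{-i\omega_{j_0}t}}{2\pi\sqrt{T}}$ and $\frac{\sum_{t=1}^{\lfloor \tau T\rfloor} w_{2,t}e^{-i\omega_{j_0}t}}{2\pi\sqrt{T}}$ to the processes $W_{w_1}(\tau)$, $W_{w_2}(\tau)$, $\frac{1}{2\pi}\int_{\tau=0}^{1} e^{-2\pi j_0 i\tau}\,dW_{w_1}(\tau)$, and $\frac{1}{2\pi}\int_{\tau=0}^{1} e^{-2\pi j_0 i\tau}\,dW_{w_2}(\tau)$, respectively, where $W_w(\tau) = (W_{w_1}(\tau), W_{w_2}(\tau))'$ is a Brownian process with a variance equal to the long-run variance of $(w_{1,t}, w_{2,t})$. The proof is obtained by using the representation for the Fourier transformations in Result 3.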

B Asymptotic Equivalence of the Whittle Approximation

and the Gaussian Likelihood

Without loss of generality, let µx = 0, µy = 0, and the shocks εt and ut have unit variances. As

an illustration, consider first the case of (εt, ut), which are normal i.i.d. with the correlation r.

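Conditional on $X = (x_0, .., x_{T-1})'$, the distribution of the vector $Y = (y_1, ..., y_T)'$ is normal with the mean $E(Y|X) = \Pi X$,

$$\Pi = \beta_x I_T + r\begin{pmatrix} -\rho & 1 & 0 & \cdots & 0 \\ 0 & -\rho & 1 & \cdots & 0 \\ 0 & 0 & -\rho & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix},$$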


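where $I_T$ is the $T \times T$ identity matrix. The conditional variance of $Y$ is the matrix $B = \mathrm{diag}(1 - r^2, .., 1 - r^2, 1)$. For the Whittle approximation, denote $\sum_{j=1}^{T}\mathrm{tr}\big(s^{x,y}(\omega_j)^{-1}I^{x,y}_{T}(\omega_j)\big) = (X'\,Y')\,\tilde\Omega^{-1}(X'\,Y')'$, where $\tilde\Omega$ is the variance-covariance of the vector $(X'\,Y')'$ under the “Whittle likelihood”. Matrix $\tilde\Omega$ also holds information about the conditional moments. Note that matrix $\tilde\Omega^{-1}$ can be conformably partitioned into four blocks, $T \times T$ each, $A_{xx}$, $A_{xy}$, $A_{yx}$, and $A_{yy}$, where the elements of the blocks are as follows:

$$A_{xx}(t,s) = \sum_{j=1}^{T}\frac{s^{x,y}_{yy}(\omega_j)}{\det(s^{x,y}(\omega_j))}\,e^{-i\omega_j(t-s)}\frac{1}{2\pi T}, \qquad A_{yy}(t,s) = \sum_{j=1}^{T}\frac{s^{x,y}_{xx}(\omega_j)}{\det(s^{x,y}(\omega_j))}\,e^{-i\omega_j(t-s)}\frac{1}{2\pi T},$$

$$A_{xy}(t,s) = -\mathrm{Re}\sum_{j=1}^{T}\frac{s^{x,y}_{yx}(\omega_j)}{\det(s^{x,y}(\omega_j))}\,e^{-i\omega_j(t-s)}\frac{1}{2\pi T}, \qquad A_{yx}(t,s) = A_{xy}(s,t).$$

Let $\tilde\Omega_{xx}$, $\tilde\Omega_{xy}$, $\tilde\Omega_{yx}$, $\tilde\Omega_{yy}$ be the corresponding partition of the matrix $\tilde\Omega$. The conditional expectation of $Y$ under the Whittle likelihood is $\tilde E(Y|X) = \tilde\Pi X$, where $\tilde\Pi = \tilde\Omega_{yx}\tilde\Omega_{xx}^{-1} = -A_{yy}^{-1}A_{yx}$. Furthermore, $\widetilde{\mathrm{Var}}(Y|X) = \tilde B = \tilde\Omega_{yy} - \tilde\Omega_{yx}\tilde\Omega_{xx}^{-1}\tilde\Omega_{xy} = A_{yy}^{-1}$. Lastly, substitute the formula for the spectrum and obtain that $\tilde B = (1 - r^2)I_T$, and

$$\tilde\Pi = \beta_x I_T + r\begin{pmatrix} -\rho & 1 & 0 & \cdots & 0 \\ 0 & -\rho & 1 & \cdots & 0 \\ 0 & 0 & -\rho & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & -\rho \end{pmatrix}.$$

The difference between the Gaussian and Whittle log likelihoods will depend on the following term:

$$(Y - \tilde\Pi X)'\tilde B^{-1}(Y - \tilde\Pi X) - (Y - \Pi X)'B^{-1}(Y - \Pi X) =$$

$$2X'(\Pi - \tilde\Pi)'\tilde B^{-1}(Y - \Pi X) + X'(\Pi - \tilde\Pi)'\tilde B^{-1}(\Pi - \tilde\Pi)X - (Y - \Pi X)'(B^{-1} - \tilde B^{-1})(Y - \Pi X).$$

The last term is simply $\varepsilon_T^2\big(1 - \frac{1}{1-r^2}\big)$ and is, therefore, $O_p(1)$. That is, the small difference between $B$ and $\tilde B$ does not result in an asymptotically significant difference between the likelihood and its approximation. However, for the remainder of the terms to be bounded in probability, it is necessary that $E\,X'(\Pi - \tilde\Pi)'\tilde B^{-1}(\Pi - \tilde\Pi)X$ be bounded. Note, however, that $X'(\Pi - \tilde\Pi)'\tilde B^{-1}(\Pi - \tilde\Pi)X = \frac{r^2}{1-r^2}(\rho x_{T-1} - x_0)^2$. For a nearly integrated $x_t$, as follows from Result 1 in Appendix A, this term is of the stochastic order $O_p(T)$. Therefore, the Whittle approximation to the conditional distribution of $Y$ is not asymptotically equivalent to the Gaussian likelihood.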
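The only difference between the Gaussian and Whittle conditional-mean matrices in the i.i.d. case is the last row: the Whittle version is circulant and wraps around to $x_0$. The sketch below builds both matrices and checks the closed form $\frac{r^2}{1-r^2}(\rho x_{T-1} - x_0)^2$ for the quadratic form discussed above. The function names and parameter values are illustrative assumptions.

```python
import numpy as np

def gaussian_pi(T, beta, rho, r):
    """Conditional-mean matrix under the Gaussian likelihood (last row of the r-part is zero)."""
    M = -rho * np.eye(T) + np.eye(T, k=1)
    M[-1, :] = 0.0  # u_T is not a function of X = (x_0, .., x_{T-1})
    return beta * np.eye(T) + r * M

def whittle_pi(T, beta, rho, r):
    """Conditional-mean matrix under the Whittle likelihood (circulant: last row wraps to x_0)."""
    M = -rho * np.eye(T) + np.eye(T, k=1)
    M[-1, 0] = 1.0
    return beta * np.eye(T) + r * M

def whittle_gap(x, beta, rho, r):
    """Quadratic form X'(Pi - Pi_w)' B_w^{-1} (Pi - Pi_w) X with B_w = (1 - r^2) I."""
    T = len(x)
    diff = (gaussian_pi(T, beta, rho, r) - whittle_pi(T, beta, rho, r)) @ x
    return diff @ diff / (1.0 - r ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    x = np.cumsum(rng.standard_normal(50))  # a persistent series x_0, .., x_{T-1}
    beta, rho, r = 0.2, 0.98, -0.6
    closed_form = r ** 2 / (1.0 - r ** 2) * (rho * x[-1] - x[0]) ** 2
    print(whittle_gap(x, beta, rho, r), closed_form)
```

Because the gap depends on $(\rho x_{T-1} - x_0)^2$, it grows with the sample length when $x_t$ is nearly integrated, which is the reason the unmodified Whittle approximation fails here.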

To obtain an asymptotically equivalent approximation, consider the transformation $\tilde Y \equiv Y + (0, .., r(x_0 - \rho x_{T-1}))'$. Then, $\tilde Y - \tilde\Pi X = Y - \Pi X$, and therefore, the Whittle likelihood calculated for $\tilde Y$ and $X$ is equivalent to the original Gaussian likelihood for $Y$ and $X$.$^{11}$ Alternatively, for a nearly integrated $x_t$, we can consider the following asymptotically equivalent transformation $\tilde Y \equiv Y + (0, .., 0, r(x_0 - x_T))'$ because $x_T - \rho x_{T-1} \equiv u_T \sim O_p(1)$.

$^{11}$ Note that the difference between the time-domain and the modified frequency-domain log-likelihoods involves only the term $\varepsilon_T^2\big(1 - \frac{1}{1-r^2}\big)$, which does not depend on the persistence parameter $c$ in $\rho = 1 + c/T$. Therefore, the convergence is uniform in $c$. Also, note that the Whittle likelihood is not defined for $\rho = 1$, but the arguments in this Appendix can be extended to the case with $\rho = 1$ by continuity. For example, note that $\tilde\Pi$ and $\tilde B$ in the conditional Whittle distribution of $Y$ are well-defined for $\rho = 1$.

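In the general case, when vectors $(u_t, \varepsilon_t)$ are not i.i.d.,

$$\tilde B^{-1} = A_{yy} = \left[\sum_{j=1}^{T}\frac{s^{u,\varepsilon}_{uu}(\omega_j)}{\det(s^{u,\varepsilon}(\omega_j))}\,e^{-i\omega_j(t-s)}\frac{1}{2\pi T}\right]_{t,s},$$

and therefore, $A_{yy}$ does not depend on $\rho_T$. As follows from Dzhaparidze (1986), $(Y - E(Y|X))'(B^{-1} - \tilde B^{-1})(Y - E(Y|X)) \sim O_p(1)$, for a constant $\rho < 1$. Therefore, this relation also holds for any $\rho_T$. Furthermore,

$$-A_{yx} = \beta_x A_{yy} + \mathrm{Re}\left[\frac{1}{2\pi T}\sum_{j=1}^{T}\frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_j)}{\det(s^{u,\varepsilon}(\omega_j))}\big(e^{-i\omega_j(t-s+1)} - \rho e^{-i\omega_j(t-s)}\big)\right]_{t,s}.$$

Therefore, $\tilde\Pi X = \beta_x X + A_{yy}^{-1}\mathrm{Re}(T_1)U + A_{yy}^{-1}\mathrm{Re}(T_2)(x_0 - x_T)$, where $U = (u_1, .., u_T)'$, $T_1$ is a $T \times T$ matrix with the elements $T_1(t,s) = \frac{1}{2\pi T}\sum_{j=1}^{T}\frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_j)}{\det(s^{u,\varepsilon}(\omega_j))}e^{-i\omega_j(t-s)}$, and $T_2$ is a $T \times 1$ vector with the elements $T_2(t) = \frac{1}{2\pi T}\sum_{j=1}^{T}\frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_j)}{\det(s^{u,\varepsilon}(\omega_j))}e^{-i\omega_j t}$. Therefore, $\tilde E(Y|X) = \beta_x X - A_{\varepsilon\varepsilon}^{-1}A_{\varepsilon u}U + A_{yy}^{-1}\mathrm{Re}(T_2)(x_0 - x_T)$. The second term in the above expression is the expectation of $\mathcal{E} = (\varepsilon_1, ..., \varepsilon_T)'$ conditional on $U$ implied by the Whittle approximation to the Gaussian likelihood for the observations $(u_t, \varepsilon_t)$, $t = 1, .., T$. As follows from Dzhaparidze (1986), this second term converges to $E(\mathcal{E}|U)$ and $(\tilde E(\mathcal{E}|U) - E(\mathcal{E}|U))'\,\mathrm{Var}^{-1}(\mathcal{E}|U)\,(\tilde E(\mathcal{E}|U) - E(\mathcal{E}|U)) = O(1)$. The difference between the Gaussian and Whittle log likelihoods is, therefore, $O_p(1) + (x_0 - x_T)^2\,\mathrm{Re}(T_2')A_{yy}^{-1}\mathrm{Re}(T_2) + 2(x_0 - x_T)\,\mathrm{Re}(T_2')(Y - \Pi X)$. We conclude that this Whittle approximation and the Gaussian likelihood are not equivalent. However, we can construct an equivalent Whittle approximation if we consider the transformed series $\tilde Y = Y + \mathrm{Re}(\delta_y)(x_0 - x_T)$, where $\delta_y$ is a $T \times 1$ complex vector equal to $A_{yy}^{-1}T_2$.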

The vector $\delta_y$, which is necessary to obtain the transformed series $\tilde Y$, depends on all of the parameters of the model for $(u_t, \varepsilon_t)$, including the parameters of the short-run dynamics. Therefore, it is useful to derive an alternative transformation $\hat Y$ in such a way that it gives the same asymptotic properties of the frequency-based estimators considered in this study, but depends only on the parameters of the long-run dynamics, i.e., only on $s^{u,\varepsilon}(0)$. We suggest $\hat Y \equiv Y + \big(0, .., \frac{s^{u,\varepsilon}_{u\varepsilon}(0)}{s^{u,\varepsilon}_{uu}(0)}(x_0 - x_T)\big)'$ and prove that for a fixed $j_0$, the Fourier transformation of $\hat Y$ at frequency $\omega_{j_0} = \frac{2\pi j_0}{T}$ is asymptotically equivalent to the Fourier transformation of $\tilde Y$. The latter statement readily follows from the following lemma.

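Lemma 1. Under Assumption A, $\sum_{t=1}^{T} e^{i\omega_{j_0}t}\,\mathrm{Re}(\delta_y(t)) = \frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_{j_0})}{s^{u,\varepsilon}_{uu}(\omega_{j_0})} \to \frac{s^{u,\varepsilon}_{u\varepsilon}(0)}{s^{u,\varepsilon}_{uu}(0)}$.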

Proof. The real part of $\sum_{t=1}^{T} e^{i\omega_{j_0}t}\,\mathrm{Re}(\delta_y(t))$ equals $\frac{1}{2}\mathrm{Re}\big(\sum_{t=1}^{T}(e^{i\omega_{j_0}t} + e^{-i\omega_{j_0}t})\delta_y(t)\big)$. The imaginary part of $\sum_{t=1}^{T} e^{i\omega_{j_0}t}\,\mathrm{Re}(\delta_y(t))$ equals $\frac{1}{2}\mathrm{Im}\big(\sum_{t=1}^{T}(e^{i\omega_{j_0}t} - e^{-i\omega_{j_0}t})\delta_y(t)\big)$. Therefore, consider

$$\sum_{t=1}^{T} e^{i\omega_{j_0}t}\delta_y(t) = \sum_{j=1}^{T}\frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_j)}{\det(s^{u,\varepsilon}(\omega_j))}\,\frac{\sum_{t=1}^{T}\sum_{s=1}^{T} A_{yy}^{-1}(t,s)\,e^{-i(\omega_j s - \omega_{j_0}t)}}{2\pi T}.$$

The result for $\sum_{t=1}^{T} e^{-i\omega_{j_0}t}\delta_y(t)$ is derived analogously. Note that the matrix $A_{yy}^{-1}$ is circulant. That is, its elements allow for the following representation: $A_{yy}^{-1}(t,s) = \gamma(s - t)$, and for any $k \in [1-T, -1]$, $\gamma(k) = \gamma(T + k)$. This property of the matrix $A_{yy}^{-1}$ follows from the same property of $A_{yy}$, which is verified directly from the definition of $A_{yy}$.

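Because matrix $A_{yy}^{-1}$ is circulant, all of the sums $\sum_{t=1}^{T}\sum_{s=1}^{T} A_{yy}^{-1}(t,s)\,e^{-i(\omega s - \omega_0 t)}$ are zeroes as long as $\omega \neq \omega_0 + 2\pi n$, $n = 0, \pm 1, \pm 2, ...$. Note that

$$\sum_{t=1}^{T}\sum_{s=1}^{T}\gamma(s-t)\,e^{-i(\omega s - \omega_0 t)} = \sum_{t=1}^{T}\sum_{k=t-T}^{t-1}\gamma(k)\,e^{-i(\omega - \omega_0)t + i\omega k} =$$

$$\gamma(0)\underbrace{\sum_{t=1}^{T} e^{-i(\omega - \omega_0)t}}_{0} + \sum_{k=1}^{T-1}\gamma(k)e^{i\omega k}\sum_{t=k+1}^{T} e^{-i(\omega - \omega_0)t} + \sum_{k=1-T}^{-1}\gamma(k)e^{i\omega k}\sum_{t=1}^{T+k} e^{-i(\omega - \omega_0)t} =$$

$$= \frac{e^{-i(\omega - \omega_0)}}{1 - e^{-i(\omega - \omega_0)}}\left(\sum_{k=1}^{T-1}\gamma(k)e^{i\omega k}\big(e^{-i(\omega - \omega_0)k} - 1\big) + \sum_{k=1-T}^{-1}\gamma(k)e^{i\omega k}\big(1 - e^{-i(\omega - \omega_0)k}\big)\right).$$

After replacing $\gamma(k)$ by $\gamma(k + T)$ in the second sum, we obtain the result $\sum_{t=1}^{T}\sum_{s=1}^{T}\gamma(s - t)\,e^{-i(\omega s - \omega_0 t)} = 0$. Therefore,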

$$\sum_{t=1}^{T} e^{i\omega_{j_0}t}\delta_y(t) = \frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_{j_0})}{\det(s^{u,\varepsilon}(\omega_{j_0}))}\,\frac{\sum_{t=1}^{T}\sum_{s=1}^{T} A_{yy}^{-1}(t,s)\,e^{-i\omega_{j_0}(s-t)}}{2\pi T}.$$

Note also that

$$\sum_{t=1}^{T}\sum_{s=1}^{T} A_{yy}^{-1}(t,s)\,e^{-i\omega_{j_0}(s-t)} = \sum_{t=1}^{T}\sum_{s=1}^{T}\gamma(s-t)\,e^{-i\omega_{j_0}(s-t)} = T\sum_{k=0}^{T-1}\gamma(k)\,e^{-i\omega_{j_0}k}.$$

The latter sum $\lambda_{j_0}(A_{yy}^{-1}) \equiv \sum_{k=0}^{T-1}\gamma(k)\,e^{-i\omega_{j_0}k}$ is the eigenvalue of $A_{yy}^{-1}$ that corresponds to the eigenvector $(1, e^{-i\omega_{j_0}}, .., e^{-i\omega_{j_0}(T-1)})'$ (see Davis (1979)). Because $A_{yy}$ is also circulant, the same result holds:

$$\sum_{t=1}^{T}\sum_{s=1}^{T} A_{yy}(t,s)\,e^{-i\omega_{j_0}(s-t)} = T\lambda_{j_0}(A_{yy}).$$

From the known relation between the eigenvalues of a matrix and its inverse, $\lambda_{j_0}(A_{yy}) = \lambda_{j_0}(A_{yy}^{-1})^{-1}$, we obtain

$$\sum_{t=1}^{T} e^{i\omega_{j_0}t}\delta_y(t) = \frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_{j_0})}{\det(s^{u,\varepsilon}(\omega_{j_0}))}\,\frac{T/2\pi}{\sum_{t=1}^{T}\sum_{s=1}^{T} A_{yy}(t,s)\,e^{-i\omega_{j_0}(s-t)}}.$$

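Lastly, substituting the definition of $A_{yy}(t,s)$, we obtain

$$\sum_{t=1}^{T}\sum_{s=1}^{T} A_{yy}(t,s)\,e^{-i\omega_{j_0}(t-s)} = \sum_{t=1}^{T}\sum_{s=1}^{T} A_{yy}(t,s)\,e^{-i\omega_{j_0}(s-t)} = \frac{T}{2\pi}\,\frac{s^{u,\varepsilon}_{uu}(\omega_{j_0})}{\det(s^{u,\varepsilon}(\omega_{j_0}))},$$

and $\sum_{t=1}^{T} e^{i\omega_{j_0}t}\delta_y(t) = \frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_{j_0})}{s^{u,\varepsilon}_{uu}(\omega_{j_0})}$. The same result holds for $\sum_{t=1}^{T} e^{-i\omega_{j_0}t}\delta_y(t)$. Therefore, $\sum_{t=1}^{T} e^{i\omega_{j_0}t}\,\mathrm{Re}(\delta_y(t)) = \frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_{j_0})}{s^{u,\varepsilon}_{uu}(\omega_{j_0})}$. The convergence to $\frac{s^{u,\varepsilon}_{u\varepsilon}(0)}{s^{u,\varepsilon}_{uu}(0)}$ follows because Assumption A implies the continuous spectrum (see Phillips (1988)).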
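The two circulant-matrix facts used in the proof can be checked numerically: the double sum vanishes when the two Fourier frequencies differ, and equals $T$ times the corresponding eigenvalue when they coincide. The sketch below uses an arbitrary random sequence for $\gamma$; the helper names are assumptions introduced for illustration only.

```python
import numpy as np

def circulant_from_gamma(gamma):
    """Build A with A(t, s) = gamma((s - t) mod T), i.e. gamma(k) = gamma(T + k) for k < 0."""
    T = len(gamma)
    idx = (np.arange(T)[None, :] - np.arange(T)[:, None]) % T
    return gamma[idx]

def double_sum(A, j, j0):
    """Compute sum_t sum_s A(t, s) * exp(-i * (w_j * s - w_j0 * t)) for t, s = 1..T."""
    T = A.shape[0]
    t = np.arange(1, T + 1)
    w, w0 = 2.0 * np.pi * j / T, 2.0 * np.pi * j0 / T
    return np.exp(1j * w0 * t) @ A @ np.exp(-1j * w * t)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    T = 16
    gamma = rng.standard_normal(T)
    A = circulant_from_gamma(gamma)
    eigenvalue = np.sum(gamma * np.exp(-1j * 2.0 * np.pi * 3 * np.arange(T) / T))
    print("different frequencies:", abs(double_sum(A, 2, 3)))
    print("same frequency       :", double_sum(A, 3, 3), T * eigenvalue)
```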

C Proof of Proposition 1

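Proof. Express the principal components $F$, $\tilde F$, and $\hat F$ as follows:

$$F = \sum_{j=-q}^{q}\frac{|1 - \rho e^{-i\omega_j}|^2}{\det(s^{u,\varepsilon}(\omega_j))}\Big(s^{x,y}_{xx}(\omega_j)I^{x,y}_{T,yy}(\omega_j) + s^{x,y}_{yy}(\omega_j)I^{x,y}_{T,xx}(\omega_j) - 2\mathrm{Re}\big(s^{x,y}_{xy}(\omega_j)I^{x,y}_{T,yx}(\omega_j)\big)\Big),$$

$$\tilde F = \sum_{j=-q}^{q}\frac{|1 - \rho e^{-i\omega_j}|^2}{\det(s^{u,\varepsilon}(0))}\Big(s^{x,y}_{xx}(\omega_j)I^{x,y}_{T,yy}(\omega_j) + s^{x,y}_{yy}(\omega_j)I^{x,y}_{T,xx}(\omega_j) - 2\mathrm{Re}\big(s^{x,y}_{xy}(\omega_j)I^{x,y}_{T,yx}(\omega_j)\big)\Big),$$

$$\hat F = \sum_{j=-q}^{q}\frac{|1 - \rho e^{-i\omega_j}|^2}{\det(s^{u,\varepsilon}(0))}\Big(s^{x,y}_{xx}(\omega_j)I^{x,y}_{T,yy}(\omega_j) + s^{x,y}_{yy}(\omega_j)I^{x,y}_{T,xx}(\omega_j) - 2\mathrm{Re}\big(s^{x,y}_{xy}(\omega_j)I^{x,y}_{T,yx}(\omega_j)\big)\Big).$$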

I. Under Assumption A, the limit $s^{u,\varepsilon}(\omega_j) \to s^{u,\varepsilon}(0)$ is finite and well-defined. Therefore, $\det(s^{u,\varepsilon}(\omega_j)) \sim O(1)$ (but not $o(1)$, because $\det(s^{u,\varepsilon}(0)) \neq 0$). Consider now the difference $s^{u,\varepsilon}(\omega_j) - s^{u,\varepsilon}(0)$. Assumption A implies that the $\alpha_i$ coefficients should be at most $O(i^{-1-\gamma(h)})$, for any $0 < \gamma(h) < 2/(\gamma - 2)$. The strong mixing condition also implies the complete linear regularity of $e_t = (u_t, \varepsilon_t)$ (see Ibragimov and Rozanov, 1978) with linear regularity coefficients that satisfy the same condition as the strong mixing coefficients. Therefore, Theorem 8 (p. 181) of Ibragimov and Rozanov (1978) implies that the spectrum $s^{u,\varepsilon}(\omega)$ has at least one derivative. Therefore, $s^{u,\varepsilon}(\omega_j) - s^{u,\varepsilon}(0) \sim O_p(1/T)$.

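II. Let $F_2(\omega_j) = s^{x,y}_{xx}(\omega_j)I^{x,y}_{T,yy}(\omega_j) + s^{x,y}_{yy}(\omega_j)I^{x,y}_{T,xx}(\omega_j) - 2\mathrm{Re}\big(s^{x,y}_{xy}(\omega_j)I^{x,y}_{T,yx}(\omega_j)\big)$. Define $\tilde F_2(\omega_j)$ and $\hat F_2(\omega_j)$ analogously. To analyze the convergence behavior of the elements $F_2(\omega_j)$ in a neighborhood of the true parameters, it suffices to evaluate $F_2(\omega_j)$ at the true parameters. At the true value of $\beta_x$,

$$F_2(\omega_j) = \frac{s^{u,\varepsilon}_{uu}(\omega_j)}{|1 - \rho_T e^{-i\omega_j}|^2}\,I^{x,\tilde\varepsilon}_{T,\tilde\varepsilon\tilde\varepsilon}(\omega_j) + s^{u,\varepsilon}_{\varepsilon\varepsilon}(\omega_j)\,I^{x,\tilde\varepsilon}_{T,xx}(\omega_j) - 2\mathrm{Re}\,\frac{s^{u,\varepsilon}_{u\varepsilon}\,e^{-i\omega_j}}{1 - \rho_T e^{-i\omega_j}}\,I^{x,\tilde\varepsilon}_{T,\tilde\varepsilon x}(\omega_j),$$

where the series $\tilde\varepsilon_t = \varepsilon_t + (x_0 - x_T)\,\mathrm{Re}(\delta_y(t))$.

Consider the convergence of the periodograms (and the corresponding Fourier transformations). Define the vector of Fourier transformations, $d(\omega_j) = \frac{1}{\sqrt{2\pi T}}\sum_{t=1}^{T}(x_{t-1}\;\tilde\varepsilon_t)' \times \exp(-i\omega_j t)$. By Result 2 in Appendix A, the first element of the vector $d(\omega_j)$ is $O_p(T)$. By Result 3 in Appendix A, the Fourier transformation of $\varepsilon_t$, $d_\varepsilon(\omega_j)$, is $O_p(1)$. The Fourier transformation of $\tilde\varepsilon_t$ is $d_{\tilde\varepsilon}(\omega_j) = d_\varepsilon(\omega_j) + \frac{x_0 - x_T}{\sqrt{T}}\frac{1}{\sqrt{2\pi}}\sum_{t=1}^{T}\mathrm{Re}(\delta_y(t))\,e^{-i\omega_j t}$. By Lemma 1, the sum $\sum_{t=1}^{T}\mathrm{Re}(\delta_y(t))\,e^{-i\omega_j t} \sim O(1)$. By Result 1 in Appendix A, $\frac{x_0 - x_T}{\sqrt{T}}$ is $O_p(1)$. Therefore, the second element of $d(\omega_j)$ is $O_p(1)$. Therefore, the periodogram $I^{x,\tilde\varepsilon}_{T}(\omega_j) = d(\omega_j)d^{*}(\omega_j)$ consists of the elements of the following stochastic orders: $I^{x,\tilde\varepsilon}_{T,xx} \sim O_p(T^2)$, $I^{x,\tilde\varepsilon}_{T,\tilde\varepsilon\tilde\varepsilon} \sim O_p(1)$, and $I^{x,\tilde\varepsilon}_{T,\tilde\varepsilon x} \sim O_p(T)$.

Combining the results for the spectral densities and the periodogram, we obtain $F_2(\omega_j) \sim O_p(T^2)$ and $F_2(\omega_j) - \tilde F_2(\omega_j) \sim O_p(T)$.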

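III. Note that $F$ is a sum of elements $\frac{|1 - \rho_T e^{-i\omega_j}|^2}{\det(s^{u,\varepsilon}(\omega_j))}F_2(\omega_j)$. From the results in II, it follows directly that these elements are of the stochastic order of $O_p(1)$. Therefore, $F \sim O_p(1)$. Similarly, $\tilde F \sim O_p(1)$. Furthermore,

$$\frac{|1 - \rho_T e^{-i\omega_j}|^2}{\det(s^{u,\varepsilon}(\omega_j))}F_2(\omega_j) - \frac{|1 - \rho_T e^{-i\omega_j}|^2}{\det(s^{u,\varepsilon}(0))}\tilde F_2(\omega_j) =$$

$$\frac{|1 - \rho_T e^{-i\omega_j}|^2\big(F_2(\omega_j) - \tilde F_2(\omega_j)\big)}{\det(s^{u,\varepsilon}(\omega_j))} + \frac{|1 - \rho_T e^{-i\omega_j}|^2\,\tilde F_2(\omega_j)\big(\det(s^{u,\varepsilon}(0)) - \det(s^{u,\varepsilon}(\omega_j))\big)}{\det(s^{u,\varepsilon}(0))\det(s^{u,\varepsilon}(\omega_j))} \sim O_p\Big(\frac{1}{T}\Big).$$

Therefore, $F - \tilde F \sim O_p\big(\frac{1}{T}\big)$.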

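IV. From Lemma 1 (and Result 1), it follows that $d_{\tilde y}(\omega_j) - d_{\hat y}(\omega_j) = \frac{x_0 - x_T}{\sqrt{2\pi T}}\Big(\frac{s^{u,\varepsilon}_{u\varepsilon}(\omega_j)}{s^{u,\varepsilon}_{uu}(\omega_j)} - \frac{s^{u,\varepsilon}_{u\varepsilon}(0)}{s^{u,\varepsilon}_{uu}(0)}\Big) \sim O_p(1/T)$. Therefore, $\tilde F_2(\omega_j) - \hat F_2(\omega_j) = O_p(T)$. Thus, $\tilde F - \hat F = O_p\big(\frac{1}{T}\big)$.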

D Proof of Theorem 1

We first show that the result of Theorem 1 holds for et that is i.i.d. normal, and then, demonstrate

that the asymptotic limits do not depend on the independence and Gaussian assumption.

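Proof of Theorem 1: i.i.d. Gaussian Case. Decompose $\varepsilon_t = r\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{\psi_{uu}}}\,u_t + \sqrt{\psi_{\varepsilon\varepsilon|u}}\,z_t$. Note that by construction, $z_t$ is i.i.d. standard normal, independent of the observations of $x_t$. The related decomposition of $y_t$ is $y_t = \beta_x x_{t-1} + r\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{\psi_{uu}}}\,u_t + \sqrt{\psi_{\varepsilon\varepsilon|u}}\,z_t$ or, equivalently, $y_t = \beta_x x_{t-1} + r\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{\psi_{uu}}}(x_t - \rho x_{t-1}) + \sqrt{\psi_{\varepsilon\varepsilon|u}}\,z_t$. Substituting this decomposition into the formula for $\hat\beta_x$ (6) we obtain

$$\hat\beta_x = \Big(\sum_{\substack{j=-q \\ j\neq 0}}^{q} I^{x,y}_{T,xx}(\omega_j)\Big)^{-1}\Big(\sum_{\substack{j=-q \\ j\neq 0}}^{q}\beta_x I^{x,y}_{T,xx}(\omega_j) + r\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{\psi_{uu}}}\big(I_{T,xx+}(\omega_j) - \rho I_{T,xx}(\omega_j)\big) + \sqrt{\psi_{\varepsilon\varepsilon|u}}\,I^{x,z}_{T,zx}(\omega_j) - \big(I_{T,xx+}(\omega_j) - \rho_T I_{T,xx}(\omega_j)\big)\,r\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{\psi_{uu}}}\Big),$$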


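where $I^{x,z}_{T,zx}(\omega_j)$ is the cross-periodogram of $z_t$ and $x_{t-1}$. Therefore, the estimation error is simply

$$\hat\beta_x - \beta_x = \sqrt{\psi_{\varepsilon\varepsilon|u}}\,\frac{\sum_{j=\pm 1,..,\pm q} I^{x,z}_{T,zx}(\omega_j)}{\sum_{j=\pm 1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)} = \sqrt{\psi_{\varepsilon\varepsilon|u}}\,\mathrm{Re}\left(\frac{\sum_{j=1}^{q} I^{x,z}_{T,zx}(\omega_j)}{\sum_{j=1}^{q} I^{x,y}_{T,xx}(\omega_j)}\right).$$

Define a complex normal variable $w^{z}_{j} = \sum_{t=1}^{T} z_t e^{-i\omega_j t}$. Asymptotically, $w^{z}_{j}$, $j = 1, .., q$, are i.i.d. $N_C(0, T)$ (using the notation of Brillinger, 1975), i.e., the real and imaginary parts of $w^{z}_{j}$ are jointly normal with the mean $(0, 0)'$ and the variance matrix $\frac{T}{2}I_2$, where $I_2$ is a two-by-two identity matrix. Define also

$$c^{x}_{j} = \frac{\sum_{t=1}^{T} x_{t-1}e^{i\omega_j t}}{\sum_{j'=1}^{q}\big|\sum_{t=1}^{T} x_{t-1}e^{-i\omega_{j'}t}\big|^2}.$$

Therefore, $\hat\beta_x - \beta_x = \sqrt{\psi_{\varepsilon\varepsilon|u}}\,\mathrm{Re}\sum_{j=1}^{q} w^{z}_{j}c^{x}_{j}$, which is normal conditional on the observations of $x_t$. Conditional on $x_t$, the limit of the variance of $\mathrm{Re}\sum_{j=1}^{q} w^{z}_{j}c^{x}_{j}$ is $\frac{T}{2}\sum_{j=1}^{q}|c^{x}_{j}|^2$. Additionally, note that $\sum_{j=1}^{q}|c^{x}_{j}|^2 = \big(\sum_{j=1}^{q}\big|\sum_{t=1}^{T} x_{t-1}e^{-i\omega_j t}\big|^2\big)^{-1}$. That is,

$$\hat\beta_x - \beta_x = \sqrt{\psi_{\varepsilon\varepsilon|u}}\,\frac{\sqrt{T/2}\,Z_T}{\sqrt{\sum_{j=1}^{q}\big|\sum_{t=1}^{T} x_{t-1}e^{-i\omega_j t}\big|^2}}\,\big(1 + o^{x}_{p}(1)\big) = \sqrt{\psi_{\varepsilon\varepsilon|u}}\,\frac{\sqrt{T}\,Z_T}{\sqrt{\sum_{j=\pm 1,..,\pm q}\big|\sum_{t=1}^{T} x_{t-1}e^{-i\omega_j t}\big|^2}}\,\big(1 + o^{x}_{p}(1)\big),$$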

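where $Z_T$ is a standard normal variable independent of the observations of $x_t$. The term $o^{x}_{p}(1)$, which is a function of $x_t$, $t = 0, ..., T - 1$, appears in the above formula after we replace the true conditional variance of $\mathrm{Re}\sum_{j=1}^{q} w^{z}_{j}c^{x}_{j}$ with its approximation as $T \to \infty$. For example,

$$\mathrm{Var}(\mathrm{Re}\,w^{z}_{j}c^{x}_{j}) = |c^{x}_{j}|^2\,\frac{T}{2}\left(1 + \frac{(\mathrm{Re}\,c^{x}_{j})^2}{|c^{x}_{j}|^2}\frac{\sum_{t=1}^{T}\cos(2\omega_j t)}{T} - \frac{(\mathrm{Im}\,c^{x}_{j})^2}{|c^{x}_{j}|^2}\frac{\sum_{t=1}^{T}\cos(2\omega_j t)}{T} + 2\,\frac{\mathrm{Im}\,c^{x}_{j}\,\mathrm{Re}\,c^{x}_{j}}{|c^{x}_{j}|^2}\frac{\sum_{t=1}^{T}\sin(2\omega_j t)}{T}\right).$$

Because the sequences $\frac{(\mathrm{Re}\,c^{x}_{j})^2}{|c^{x}_{j}|^2}$, $\frac{(\mathrm{Im}\,c^{x}_{j})^2}{|c^{x}_{j}|^2}$, and $\frac{\mathrm{Im}\,c^{x}_{j}\,\mathrm{Re}\,c^{x}_{j}}{|c^{x}_{j}|^2}$ are all uniformly tight and the ratios $\frac{\sum_{t=1}^{T}\cos(2\omega_j t)}{T}$ and $\frac{\sum_{t=1}^{T}\sin(2\omega_j t)}{T}$ both converge to zero, $\mathrm{Var}(\mathrm{Re}\,w^{z}_{j}c^{x}_{j}) = |c^{x}_{j}|^2\,\frac{T}{2}\,(1 + o^{x}_{p}(1))$.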

We conclude that

$$\frac{\hat\beta_x - \beta_x}{\sqrt{T\psi_{\varepsilon\varepsilon|u}}\,\Big(\sum_{j=\pm 1,..,\pm q}\big|\sum_{t=1}^{T} x_{t-1}e^{-i\omega_j t}\big|^2\Big)^{-1/2}} \Rightarrow Z,$$

where $Z$ is standard normal. We obtain the result (9) after replacing $\big|\sum_{t=1}^{T} x_{t-1}e^{-i\omega_j t}\big|^2$ by $2\pi T\,I^{x,y}_{T,xx}(\omega_j)$.

To obtain (8), we use Result 1 to replace $\big|\sum_{t=1}^{T} x_{t-1}e^{-i\omega_j t}\big|$ in the expression above with its asymptotic limit. Applying the CMT, we obtain the asymptotic distribution of $\hat\beta_x$,

$$T(\hat\beta_x - \beta_x) \Rightarrow \sqrt{\frac{\psi_{\varepsilon\varepsilon|u}}{\psi_{uu}}}\,\frac{Z}{\sqrt{\sum_{j=\pm 1,..,\pm q}\big|\int_{\tau=0}^{1} J_c(\tau)\,e^{-i2\pi j\tau}\,d\tau\big|^2}}. \qquad (12)$$

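To derive the asymptotic limit of the FDLS estimator notice that

$$\hat\beta^{FDLS}_x = \hat\beta_x + r\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{\psi_{uu}}}\left(\frac{\sum_{j=\pm 1,..,\pm q}\big(I_{T,xx+}(\omega_j) - I^{x,y}_{T,xx}(\omega_j)\big)}{\sum_{j=\pm 1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)} + \frac{\sum_{j=\pm 1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)(1 - \rho_T)}{\sum_{j=\pm 1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)}\right),$$

where the last term in the parenthesis simplifies to $c/T$. Furthermore, $I_{T,xx+}(\omega_j) = \frac{x_T - x_0}{\sqrt{2\pi T}}\,d_x(\omega_j) + e^{i\omega_j}I^{x,y}_{T,xx}(\omega_j)$. That is,

$$\frac{T(\hat\beta^{FDLS}_x - \hat\beta_x)}{r\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{\psi_{uu}}}} = T\,\frac{x_T - x_0}{\sqrt{2\pi T}}\,\frac{\sum_{j=\pm 1,..,\pm q} d_x(\omega_j)}{\sum_{j=\pm 1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)} + T\,\frac{\sum_{j=\pm 1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)(e^{i\omega_j} - 1)}{\sum_{j=\pm 1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)} + c.$$

The first term $T\,\frac{x_T - x_0}{\sqrt{2\pi T}}\,\frac{\sum_{j=\pm 1,..,\pm q} d_x(\omega_j)}{\sum_{j=\pm 1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)}$ converges to $\delta_q$ by the CMT and by Results 1-2 in Appendix A. The second term converges to zero because

$$T\,\frac{\sum_{j=\pm 1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)(e^{i\omega_j} - 1)}{\sum_{j=\pm 1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)} = \frac{\sum_{j=\pm 1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)(\cos(\omega_j) - 1)T}{\sum_{j=\pm 1,..,\pm q} I^{x,y}_{T,xx}(\omega_j)},$$

and $(\cos(\omega_j) - 1)T \sim O_p(1/T)$.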
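The order of the last term can be seen from the Taylor expansion $\cos(\omega_j) - 1 \approx -\omega_j^2/2$ with $\omega_j = 2\pi j/T$, which gives $T(\cos(\omega_j) - 1) \approx -2\pi^2 j^2/T$ for a fixed $j$. A small numerical sketch of this rate follows; the function name is an assumption introduced for illustration.

```python
import numpy as np

def scaled_cos_gap(T, j):
    """Return T * (cos(2*pi*j/T) - 1), the weight that multiplies each periodogram term."""
    return T * (np.cos(2.0 * np.pi * j / T) - 1.0)

if __name__ == "__main__":
    j = 5
    for T in (100, 1000, 10000):
        approx = -2.0 * np.pi ** 2 * j ** 2 / T  # leading Taylor term
        print(T, scaled_cos_gap(T, j), approx)
```

For a fixed band of frequencies the weights shrink at rate 1/T, so the weighted average of periodogram ordinates in the second term vanishes in the limit.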

The limit (7) is obtained by summing the term $r\sqrt{\frac{\psi_{\varepsilon\varepsilon}}{\psi_{uu}}}(\delta_q + c)$ and the limit of $T(\hat\beta_x - \beta_x)$.

We now demonstrate that the asymptotic distributions do not depend on the independence

and Gaussian assumptions.

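Proof of Theorem 1: General Case. Under Assumption A, we can still use Results 2-3 in Appendix A to obtain the following for the FDLS estimator:

$$T(\hat\beta^{FDLS}_x - \beta_x) = \frac{\sum_{j=\pm 1,..,\pm q}\Big(\frac{\sum_{t=1}^{T}\varepsilon_t e^{-i\omega_j t}}{\sqrt{T}}\Big)\Big(\sum_{t=1}^{T}\frac{x_{t-1}}{\sqrt{T}}\,e^{i2\pi j\frac{t}{T}}\frac{1}{T}\Big)}{\sum_{j=\pm 1,..,\pm q}\Big|\sum_{t=1}^{T}\frac{x_{t-1}}{\sqrt{T}}\,e^{-i2\pi j\frac{t}{T}}\frac{1}{T}\Big|^2} \Rightarrow \sqrt{\frac{\psi_{\varepsilon\varepsilon}}{\psi_{uu}}}\,\frac{\sum_{j=\pm 1,..,\pm q}\Big(\int_{\tau=0}^{1} e^{-2\pi j i\tau}\,dW^{\varepsilon}(\tau)\Big)\Big(\int_{\tau=0}^{1} J_c(\tau)\,e^{i2\pi j\tau}\,d\tau\Big)}{\sum_{j=\pm 1,..,\pm q}\Big|\int_{\tau=0}^{1} J_c(\tau)\,e^{i2\pi j\tau}\,d\tau\Big|^2}.$$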

The above formula is valid for the case with the Gaussian i.i.d. (ut, εt) that satisfies Assumption

A. Therefore, the right-hand side expression coincides with the limit in (7). The asymptotic


convergence in (7), therefore, is also valid in this more general case. The result for the LW

estimator is proved analogously.
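To make the structure of the band-limited estimator concrete, the sketch below computes a frequency-domain least squares estimate of the form $\sum_{j=\pm 1,..,\pm q} d_y(\omega_j)\overline{d_x(\omega_j)} \big/ \sum_{j=\pm 1,..,\pm q}|d_x(\omega_j)|^2$, which is the ratio that appears in the display for $T(\hat\beta^{FDLS}_x - \beta_x)$ above. This is only an illustration under that assumed form; the paper's own definitions of the FDLS and LW estimators are given in the main text, and the names `fourier_transform` and `band_ls` are hypothetical.

```python
import numpy as np

def fourier_transform(series, q):
    """Fourier transformations at omega_j = 2*pi*j/T for j = 1..q, normalized by sqrt(2*pi*T)."""
    T = len(series)
    t = np.arange(1, T + 1)
    omegas = 2.0 * np.pi * np.arange(1, q + 1) / T
    return np.exp(-1j * np.outer(omegas, t)) @ series / np.sqrt(2.0 * np.pi * T)

def band_ls(y, x_lag, q):
    """Least squares of y_t on x_{t-1} using only the q lowest nonzero Fourier frequencies."""
    dx = fourier_transform(x_lag, q)
    dy = fourier_transform(y, q)
    # For real series, summing over j = +-1..+-q equals twice the real part of the sum over j = 1..q.
    numerator = 2.0 * np.real(np.sum(dy * np.conj(dx)))
    denominator = 2.0 * np.sum(np.abs(dx) ** 2)
    return numerator / denominator

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    x = np.cumsum(rng.standard_normal(301))       # persistent regressor x_0, .., x_T
    x_lag = x[:-1]                                # x_{t-1}, t = 1..T
    y = 0.3 * x_lag + rng.standard_normal(300)    # y_t = beta * x_{t-1} + eps_t
    print(band_ls(y, x_lag, q=5))
```

Because frequency zero is excluded, the estimate is unaffected by adding a constant to either series.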

E Joint Likelihood for the Long-Run Nearly Optimal Test

In this section, we derive the likelihood function that is used for the long-run nearly optimal

test. For vectors $\Delta Y = (y_2 - y_1, .., y_T - y_1)$ and $X = (x_0, .., x_{T-1})$, divide the Gaussian likelihood $\log L_T(\Delta Y, X) = \log L_T(\Delta Y|X) + \log L_T(X)$ into the distribution of $X$ and the conditional density of $Y$. The conditional density is then replaced by the equivalent Whittle approximation, $\log\tilde L_T(\Delta Y, X) = \log\tilde L_T(\Delta Y|X) + \log L_T(X)$. The steps to obtain the Whittle approximation $\log\tilde L_T(\Delta Y|X)$ are given in Section 2.

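Note that the Whittle approximation describes the distribution of the Fourier transformations $d_x(\omega_j) = \sum_{t=1}^{T} x_{t-1}e^{-i\omega_j t}$ and $d_{\tilde y}(\omega_j) = \sum_{t=1}^{T}\tilde y_t e^{-i\omega_j t}$, where the series $\tilde Y = Y + \mathrm{Re}(\delta_y)(x_0 - x_T)$ is given in Section 2. With some abuse of notation, $\log\tilde L_T(\Delta Y, X) \equiv \log\tilde L_T(d_x, d_{\tilde y}) = \sum_{j=1}^{T-1} f_j(d_x(\omega_j), d_{\tilde y}(\omega_j))$, where for $d(\omega_j) = (d_x(\omega_j), d_{\tilde y}(\omega_j))$

$$f_j(d_x(\omega_j), d_{\tilde y}(\omega_j)) = -\log 2\pi - \frac{1}{2}\log\det(s^{x,y}(\omega_j)) + \mathrm{tr}\big(s^{x,y}(\omega_j)^{-1}d(\omega_j)d^{*}(\omega_j)\big).$$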

Assume for simplicity that $T$ is odd. For $1 \le j \le (T - 1)/2$, the above likelihood corresponds to a complex mean-zero normal variable $d(\omega_j)$ with $E(d(\omega_j)d^{*}(\omega_j)) = s^{x,y}(\omega_j)$ and $E(d(\omega_j)d(\omega_j)') = 0_{2\times 2}$. Note that the implied density for each pair $d(\omega_j)$ is $l_j(d_x(\omega_j), d_{\tilde y}(\omega_j)) \equiv f_j(d_x(\omega_j), d_{\tilde y}(\omega_j)) + f_j(d_x(\omega_{T-j}), d_{\tilde y}(\omega_{T-j}))$ because the corresponding $d(\omega_{T-j})$ is just a complex conjugate.
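The pairing of frequencies $\omega_j$ and $\omega_{T-j}$ rests on the conjugate symmetry of the Fourier transformation of a real series, $d(\omega_{T-j}) = \overline{d(\omega_j)}$. A minimal numerical check follows, with an assumed helper name `d` and an odd sample length as in the text.

```python
import numpy as np

def d(series, j):
    """Fourier transformation of a real series at omega_j = 2*pi*j/T, t = 1..T."""
    T = len(series)
    t = np.arange(1, T + 1)
    return np.sum(series * np.exp(-2j * np.pi * j * t / T)) / np.sqrt(2.0 * np.pi * T)

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    T = 15                      # odd, so j = 1..(T-1)/2 covers all distinct pairs
    x = rng.standard_normal(T)
    for j in range(1, (T - 1) // 2 + 1):
        print(j, d(x, T - j), np.conj(d(x, j)))
```

Only the frequencies $j = 1, .., (T-1)/2$ therefore carry distinct information, which is why the likelihood can be written over that half range.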

Next, represent the Whittle likelihood as the likelihood of independent bivariate normal variables $d(\omega_j)$, $j = 1, .., (T - 1)/2$, $\log\tilde L_T(d_x, d_{\tilde y}) = \sum_{j=1}^{(T-1)/2} l_j(d_x(\omega_j), d_{\tilde y}(\omega_j))$. It follows that the $d_{\tilde y}(\omega_j)$ conditional on $X$ are independent and are complex normal, with moments that depend on $d_x(\omega_j)$ only. Note that the Fourier decomposition of $Y$, $d_y(\omega_j)$, is equal to $d_{\tilde y}(\omega_j)$ minus the Fourier decomposition of $\mathrm{Re}(\delta_y)(x_0 - x_T)$, say $d_\delta(\omega_j)$. Therefore, the $d_y(\omega_j)$ conditional on $X$ are also independent across $j = 1, .., (T - 1)/2$ and are complex normal, with the mean equal to the conditional mean of $d_{\tilde y}(\omega_j)$ minus $d_\delta(\omega_j)$ and the second moments equal to the second moments of $d_{\tilde y}(\omega_j)$.

For $j = 1, .., (T - 1)/2$, $d_{\tilde y}(\omega_j)$ is $N_C\Big(\frac{s^{x,y}_{yx}(\omega_j)}{s^{x,y}_{xx}(\omega_j)}d_x(\omega_j),\; 2\,\frac{\det(s^{x,y}(\omega_j))}{s^{x,y}_{xx}(\omega_j)}\Big)$. In other words, its mean is $\frac{s^{x,y}_{yx}(\omega_j)}{s^{x,y}_{xx}(\omega_j)}d_x(\omega_j)$, and the imaginary and real parts are independent normal with variances of $\frac{\det(s^{x,y}(\omega_j))}{s^{x,y}_{xx}(\omega_j)}$ each.


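Combining all of these results together, we obtain:

$$\log\tilde L_T(\Delta Y|X) = \sum_{j=1}^{T-1} -\frac{1}{2}\log 2\pi - \frac{1}{2}\log\frac{\det(s^{x,y}(\omega_j))}{s^{x,y}_{xx}(\omega_j)} - \frac{\Big|d_y(\omega_j) - \frac{s^{x,y}_{yx}(\omega_j)}{s^{x,y}_{xx}(\omega_j)}d_x(\omega_j)\Big|^2}{2\det(s^{x,y}(\omega_j))\,s^{x,y}_{xx}(\omega_j)^{-1}}.$$

Next, we divide this likelihood into the short-run and long-run parts and substitute the formula for $s^{x,y}(\omega_j)$,

$$\log\tilde L_T(\Delta Y|X) = \log R(\Delta Y|X, s^{x,y}(\omega_j), q < j \le (T - 1)/2) - q\log 2\pi$$

$$-\frac{1}{2}\sum_{\substack{j=-q \\ j\neq 0}}^{q}\log\frac{\det(s^{u,\varepsilon}(\omega_j))}{s^{u,\varepsilon}_{uu}(\omega_j)} - \frac{\Big|d_y(\omega_j) - \beta_x d_x(\omega_j) - (e^{i\omega_j} - \rho)\frac{s^{u,\varepsilon}_{\varepsilon u}(\omega_j)}{s^{u,\varepsilon}_{uu}(\omega_j)}d_x(\omega_j)\Big|^2}{\det(s^{u,\varepsilon}(\omega_j))\,s^{u,\varepsilon}_{uu}(\omega_j)^{-1}},$$

where $R(\Delta Y|X, s^{x,y}(\omega_j), q < j \le (T - 1)/2)$ is the remainder of the conditional likelihood that describes the short-run dynamics.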

The marginal distribution of $X$ is derived for the normal i.i.d. $u_t$ in (1). Assuming that the distribution of $x_0$ does not depend on $\rho$, the marginal likelihood is then

$$\log L_T(X) = -\frac{T-1}{2}\log 2\pi - \frac{T-1}{2}\log\psi_{uu} - \frac{1}{2}\sum_{t=1}^{T-1}\frac{\big(x_t - \mu_x - \rho(x_{t-1} - \mu_x)\big)^2}{\psi_{uu}}$$

up to the density of $x_0$.
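As a rough numerical illustration, the marginal log-likelihood above can be evaluated directly in the time domain. The sketch below is a minimal version under stated assumptions: the function name `ar1_marginal_loglik` and the convention that the list `x` holds $x_0, \dots, x_{T-1}$ are choices made here for illustration, and the density of $x_0$ is omitted, as in the text.

```python
import math


def ar1_marginal_loglik(x, mu_x, rho, psi_uu):
    """Gaussian AR(1) log-likelihood of x_1..x_{T-1} given x_0.

    Assumes x = [x_0, ..., x_{T-1}] and i.i.d. normal innovations with
    variance psi_uu; the density of x_0 is left out.
    """
    n = len(x) - 1  # number of innovations, T - 1
    # Sum of squared one-step-ahead errors of the demeaned AR(1).
    ssr = sum((x[t] - mu_x - rho * (x[t - 1] - mu_x)) ** 2
              for t in range(1, len(x)))
    return (-0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * n * math.log(psi_uu)
            - 0.5 * ssr / psi_uu)
```

Only the last term depends on $\rho$, so for a fixed $\psi_{uu}$ the value is largest at the least-squares autoregressive coefficient.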

We obtain the joint likelihood log LT (Y,X) by adding the log of the conditional distribution

and of the marginal distribution of X as defined above. Furthermore, we replace the series Y with

Y , as explained in the main text, to obtain an asymptotically equivalent likelihood. Similarly,

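based on Proposition 1, we can further simplify the testing problem after replacing $s^{u,\varepsilon}(\omega_j)$ by $s^{u,\varepsilon}(0)$. The resulting likelihood to be used in the likelihood ratios is, therefore, defined as follows:

$$\log L_T(\Delta Y, X \mid \beta_x, \rho) = \log R\big(\Delta Y \mid X, s^{x,y}(\omega_j), q < j \le (T-1)/2\big) - q\log 2\pi - \frac{1}{2}\sum_{\substack{j=-q \\ j\neq 0}}^{q}\left[\log\psi_{\varepsilon\varepsilon|u} + \frac{\left|d_y(\omega_j) - \beta_x d_x(\omega_j) - \big(d_{x+}(\omega_j) - \rho\, d_x(\omega_j)\big)\frac{r\psi_{\varepsilon\varepsilon}^{1/2}}{\psi_{uu}^{1/2}}\right|^2}{\psi_{\varepsilon\varepsilon|u}}\right]$$

$$\qquad - \frac{T-1}{2}\log 2\pi - \frac{T-1}{2}\log\psi_{uu} - \frac{1}{2}\sum_{t=1}^{T-1}\frac{\big(x_t - \mu_x - \rho(x_{t-1} - \mu_x)\big)^2}{\psi_{uu}},$$

where $d_{x+}(\omega_j) = \frac{1}{\sqrt{2\pi T}}\sum_{t=1}^{T} x_t \exp(-i\omega_j t)$.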
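The transform $d_{x+}(\omega_j)$ defined above can be computed directly from its definition. The following is a minimal sketch, not the author's code: the function name `fourier_transform` is hypothetical, and the Fourier frequencies are assumed to be $\omega_j = 2\pi j/T$, which this section does not state explicitly.

```python
import cmath
import math


def fourier_transform(series, j):
    """(2*pi*T)^(-1/2) * sum_{t=1}^{T} series_t * exp(-i * omega_j * t).

    Assumes omega_j = 2*pi*j/T, with `series` holding the T observations
    indexed t = 1, ..., T.
    """
    T = len(series)
    omega_j = 2.0 * math.pi * j / T
    total = sum(value * cmath.exp(-1j * omega_j * t)
                for t, value in enumerate(series, start=1))
    return total / math.sqrt(2.0 * math.pi * T)
```

For a real-valued series the transform at $-j$ is the complex conjugate of the transform at $j$, which is why sums over $j = \pm 1, \dots, \pm q$ of terms such as $d_x(\omega_j) d_x(\omega_j)^*$ are real.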

Following Jansson and Moreira (2006), we next derive the asymptotic distribution of the obtained likelihood in the neighborhood of the null hypothesis $H_0: \beta_x = \beta_0$. We re-parameterize $\beta_x = \beta_0 + \frac{1}{T}\sqrt{\frac{\psi_{\varepsilon\varepsilon|u}}{\psi_{uu}}}\, b'$ and $\rho = 1 + c/T$, and we express the likelihood as follows:

$$\log L_T(\Delta Y, X \mid b', c) = \log L_T(Y, X \mid b' = 0, c = 0) + b' R_\beta + c R_\rho - \frac{1}{2}\left(b' - \frac{r}{\sqrt{1-r^2}}\, c\right)^2 R_{\beta\beta} - \frac{1}{2} c^2 R_{\rho\rho},$$

where

$$R_\beta = (\psi_{uu}\psi_{\varepsilon\varepsilon|u})^{-1/2}\,\frac{1}{T}\sum_{j=\pm 1,\dots,\pm q} d_x(\omega_j)\left(d_y(\omega_j) - \beta_0 d_x(\omega_j) - \frac{r\psi_{\varepsilon\varepsilon}^{1/2}}{\psi_{uu}^{1/2}}\big(d_{x+}(\omega_j) - d_x(\omega_j)\big)\right)^{*},$$

$$R_\rho = \psi_{uu}^{-1} T^{-1}\sum_{t=1}^{T-1}(x_{t-1} - \mu_x)(x_t - x_{t-1}) - r(1-r^2)^{-1/2} R_\beta,$$

$$R_{\beta\beta} = \psi_{uu}^{-1} T^{-2}\sum_{j=\pm 1,\dots,\pm q} d_x(\omega_j)\, d_x(\omega_j)^{*}, \qquad R_{\rho\rho} = \psi_{uu}^{-1} T^{-2}\sum_{t=1}^{T-1} x_{t-1}^2.$$

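The asymptotic behavior of the likelihood in the vicinity of $\beta_0$ can then be obtained by plugging in the limits for $R_\beta$, $R_\rho$, $R_{\beta\beta}$, and $R_{\rho\rho}$. First,

$$R_\beta = (\psi_{uu}\psi_{\varepsilon\varepsilon|u})^{-1/2}\,\frac{1}{T}\sum_{\substack{j=-q \\ j\neq 0}}^{q} d_x(\omega_j)\left(b'\sqrt{\frac{\psi_{\varepsilon\varepsilon|u}}{\psi_{uu}}}\,\frac{d_x(\omega_j)}{T} + \sqrt{\psi_{\varepsilon\varepsilon|u}}\, d_z(\omega_j) - c\,\frac{r\psi_{\varepsilon\varepsilon}^{1/2}}{\psi_{uu}^{1/2}}\,\frac{d_x(\omega_j)}{T}\right)^{*},$$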

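where $z_t \equiv (\psi_{\varepsilon\varepsilon|u})^{-1/2}\left(\varepsilon_t - \frac{r\psi_{\varepsilon\varepsilon}^{1/2}}{\psi_{uu}^{1/2}}\, u_t\right)$. Note that the long-run variance of $z_t$ is 1 and, therefore, as follows from Result 4 in Appendix A, its Fourier transformation has the limit $d_z(\omega_j) \Rightarrow \frac{1}{2\pi}\int_0^1 e^{-2\pi i j\tau}\, dW^z(\tau)$, where $W^z(\tau)$ is a Brownian process. As follows from Result 2 in Appendix A, the Fourier transformation of $x_t$ has the limit $\frac{d_x(\omega_j)}{T} \Rightarrow \sqrt{\psi_{uu}}\int_0^1 e^{-2\pi i j\tau} J_c(\tau)\, d\tau$.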

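Furthermore, because the long-run covariance between $z_t$ and $u_t$ is zero, it follows from Results 2 and 5 in Appendix A that $J_c(\tau)$ and $W^z(\tau)$ are independent. Thus, we obtain the limit $R_\beta \Rightarrow \mathcal{R}_\beta$, where

$$\mathcal{R}_\beta = \frac{1}{2\pi}\sum_{\substack{j=-q \\ j\neq 0}}^{q}\left(\int_0^1 J_c(\tau) e^{-2\pi i j\tau}\, d\tau\right)\left(\int_0^1 e^{-2\pi i j\tau}\, dW^z(\tau) + \left(b' - \frac{c\, r}{\sqrt{1-r^2}}\right)\int_0^1 J_c(\tau) e^{-2\pi i j\tau}\, d\tau\right)^{*}.$$

Second, combining the latter result and the result by Jansson and Moreira (2006, Lemma 3), we obtain

$$R_\rho \Rightarrow \mathcal{R}_\rho = \int_0^1 J_c(\tau)\, dJ_c(\tau) - r(1-r^2)^{-1/2}\,\mathcal{R}_\beta,$$

and similarly,

$$R_{\beta\beta} \Rightarrow \mathcal{R}_{\beta\beta} = \frac{1}{2\pi}\sum_{j=\pm 1,\dots,\pm q}\left|\int_0^1 e^{-2\pi i j\tau} J_c(\tau)\, d\tau\right|^2 \quad\text{and}\quad R_{\rho\rho} \Rightarrow \mathcal{R}_{\rho\rho} = \int_0^1 \big(J_c(\tau)\big)^2 d\tau.$$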
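The limits of $R_{\beta\beta}$ and $R_{\rho\rho}$ are functionals of $J_c$ alone and can be approximated by simulation. The sketch below is illustrative only: it assumes that $J_c$ is the Ornstein-Uhlenbeck process $dJ_c(\tau) = cJ_c(\tau)\,d\tau + dW(\tau)$ with $J_c(0) = 0$ (the usual local-to-unity convention, not restated in this appendix), uses a plain Euler scheme with left-endpoint Riemann sums, and all function names are hypothetical.

```python
import cmath
import math
import random


def ou_path(c, increments):
    """Euler path of dJ = c*J dtau + dW on [0, 1] with J(0) = 0 (assumed form of J_c)."""
    n = len(increments)
    path = [0.0]
    for dw in increments:
        path.append(path[-1] + c * path[-1] / n + dw)
    return path


def limit_functionals(path, q):
    """Riemann-sum approximations of the limits of R_bb and R_rr for one path."""
    n = len(path) - 1
    left = path[:-1]  # left endpoints of the n sub-intervals
    r_rho_rho = sum(v * v for v in left) / n
    r_beta_beta = 0.0
    for freq in range(1, q + 1):
        integral = sum(v * cmath.exp(-2j * math.pi * freq * k / n)
                       for k, v in enumerate(left)) / n
        # Frequencies +freq and -freq contribute the same squared modulus.
        r_beta_beta += 2.0 * abs(integral) ** 2
    return r_beta_beta / (2.0 * math.pi), r_rho_rho


def simulate(c, q, n=1000, seed=0):
    """One simulated draw of the two functionals with Gaussian increments."""
    rng = random.Random(seed)
    increments = [rng.gauss(0.0, math.sqrt(1.0 / n)) for _ in range(n)]
    return limit_functionals(ou_path(c, increments), q)
```

Repeating `simulate` over many seeds gives a Monte Carlo approximation to the joint distribution of the two functionals for given $c$ and $q$.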

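The asymptotic inference problem, thus, relies on the exponential function of the multivariate process $\mathcal{R} = (\mathcal{R}_\beta, \mathcal{R}_\rho, \mathcal{R}_{\beta\beta}, \mathcal{R}_{\rho\rho})$ and the parameters $(b', c)$:

$$L(\mathcal{R} \mid b', c) \propto \exp\left\{b'\mathcal{R}_\beta + c\,\mathcal{R}_\rho - \frac{1}{2}\left(b' - c\,\frac{r}{\sqrt{1-r^2}}\right)^2\mathcal{R}_{\beta\beta} - \frac{1}{2}c^2\mathcal{R}_{\rho\rho}\right\}.$$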
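The exponent of this likelihood is simple to evaluate once the four components are available. The sketch below is a minimal illustration; the function names and the dictionary layout for the four components are assumptions made here, not part of the paper.

```python
import math


def log_kernel(R, b, c, r):
    """Exponent of L(R | b', c).

    R is a dict with keys 'beta', 'rho', 'beta_beta', 'rho_rho' (assumed layout);
    r must lie strictly between -1 and 1.
    """
    shift = b - c * r / math.sqrt(1.0 - r * r)
    return (b * R["beta"] + c * R["rho"]
            - 0.5 * shift ** 2 * R["beta_beta"]
            - 0.5 * c ** 2 * R["rho_rho"])


def log_likelihood_ratio(R, b, c, r):
    """log L(R | b', c) - log L(R | 0, c): alternative b' against b' = 0 at the same c."""
    return log_kernel(R, b, c, r) - log_kernel(R, 0.0, c, r)
```

Because the exponent is quadratic in $b'$, for a fixed $c$ it is maximized at $b' = \mathcal{R}_\beta/\mathcal{R}_{\beta\beta} + c\,r/\sqrt{1-r^2}$, which follows from setting the derivative with respect to $b'$ to zero.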

Note that in calculating the likelihood ratio, the observations $R$ can be replaced by the asymptotically equivalent $\hat{R}$,

$$\hat{R}_\beta = (\psi_{uu}\psi_{\varepsilon\varepsilon|u})^{-1/2}\,\frac{1}{T}\sum_{\substack{j=-q \\ j\neq 0}}^{q} d_x(\omega_j)\left(d_y(\omega_j) - \beta_0 d_x(\omega_j) - \frac{r\psi_{\varepsilon\varepsilon}^{1/2}}{\psi_{uu}^{1/2}}\big(d_{x+}(\omega_j) - d_x(\omega_j)\big)\right)^{*},$$

$$\hat{R}_\rho = \psi_{uu}^{-1} T^{-1}\sum_{t=1}^{T-1}(x_{t-1} - x_0)(x_t - x_{t-1}) - r(1-r^2)^{-1/2}\hat{R}_\beta,$$

$$\hat{R}_{\beta\beta} = \psi_{uu}^{-1} T^{-2}\sum_{|j|=1}^{q} d_x(\omega_j)\, d_x(\omega_j)^{*},$$

$$\hat{R}_{\rho\rho} = \psi_{uu}^{-1} T^{-2}\sum_{t=1}^{T-1}(x_{t-1} - x_0)^2.$$

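If $\{u_t\}_{t=1}^{T}$ are not i.i.d. but satisfy Assumption A, then the statistics $\hat{R}_\beta$, $\hat{R}_{\beta\beta}$, and $\hat{R}_{\rho\rho}$ still converge to $\mathcal{R}_\beta$, $\mathcal{R}_{\beta\beta}$, and $\mathcal{R}_{\rho\rho}$, respectively. For the limit of $\hat{R}_\rho$ in the i.i.d. case, note that $\mathcal{R}_\rho = \frac{1}{2}\big(J_c(1)^2 - 1\big) - r(1-r^2)^{-1/2}\mathcal{R}_\beta$. Therefore, we can use the following statistic

$$\hat{R}_\rho = \frac{1}{2}\left(\psi_{uu}^{-1} T^{-1}(x_{T-1} - x_0)^2 - 1\right) - r(1-r^2)^{-1/2}\hat{R}_\beta,$$

which converges to $\mathcal{R}_\rho$ in the general case.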
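The four feasible statistics can be assembled from the data as sketched below. This is an illustration under several assumptions that the appendix does not spell out: the Fourier frequencies are $\omega_j = 2\pi j/T$; $d_x$ is computed from the lagged series $x_0, \dots, x_{T-1}$ and $d_{x+}$ from $x_1, \dots, x_T$; $\psi_{\varepsilon\varepsilon|u} = \psi_{\varepsilon\varepsilon}(1-r^2)$; and the long-run variances and $r$ are supplied by the user. The function names are hypothetical.

```python
import cmath
import math


def _dft(series, j):
    """(2*pi*T)^(-1/2) * sum_t series_t * exp(-i*omega_j*t), omega_j = 2*pi*j/T (assumed)."""
    T = len(series)
    omega_j = 2.0 * math.pi * j / T
    total = sum(v * cmath.exp(-1j * omega_j * t)
                for t, v in enumerate(series, start=1))
    return total / math.sqrt(2.0 * math.pi * T)


def feasible_statistics(x, y, beta0, psi_uu, psi_ee, r, q):
    """Feasible R statistics; x = [x_0, ..., x_T], y = [y_1, ..., y_T] (assumed layout)."""
    T = len(y)
    x_lag, x_lead = x[:-1], x[1:]
    psi_cond = psi_ee * (1.0 - r * r)      # assumed form of psi_{ee|u}
    scale = r * math.sqrt(psi_ee / psi_uu)
    r_beta = 0.0
    r_beta_beta = 0.0
    for j in list(range(-q, 0)) + list(range(1, q + 1)):
        dx, dy, dx_plus = _dft(x_lag, j), _dft(y, j), _dft(x_lead, j)
        resid = dy - beta0 * dx - scale * (dx_plus - dx)
        # Terms at +j and -j are complex conjugates, so only real parts survive.
        r_beta += (dx * resid.conjugate()).real
        r_beta_beta += abs(dx) ** 2
    r_beta /= T * math.sqrt(psi_uu * psi_cond)
    r_beta_beta /= psi_uu * T ** 2
    r_rho_rho = sum((x[t - 1] - x[0]) ** 2 for t in range(1, T)) / (psi_uu * T ** 2)
    # Version of R_rho based on the terminal value x_{T-1}.
    r_rho = (0.5 * ((x[T - 1] - x[0]) ** 2 / (psi_uu * T) - 1.0)
             - r / math.sqrt(1.0 - r * r) * r_beta)
    return {"beta": r_beta, "rho": r_rho,
            "beta_beta": r_beta_beta, "rho_rho": r_rho_rho}
```

The correlation `r` must lie strictly between $-1$ and $1$, since the formulas divide by $\sqrt{1-r^2}$.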

F Figures and Tables


[Figure 1: four panels, each plotting power functions (vertical axis from 0 to 1) against b. Panels: c = −2, r = −0.95 (b from 0 to 10); c = −2, r = −0.75 (b from 0 to 10); c = −20, r = −0.95 (b from 0 to 20); c = −20, r = −0.75 (b from 0 to 20). Legend: Q-test, t-test, LW-test, Long-horizon t-test.]

Figure 1: Asymptotic local power functions for $H_0: \beta_x = 0$. The data are generated according to model (1) with $E(\varepsilon_t \mid u_{t-1}, \dots) = 0$. The power functions depend on the persistence parameter $c$ in $\rho_T = 1 + c/T$, the long-run correlation $r$ between shocks $\varepsilon_t$ and $u_t$, and the true value of $\beta_x = b/T$. The long-run variances of $\varepsilon_t$ and $u_t$ are set to 1. The Q-test is the test by Campbell and Yogo (2006). The t-test is the OLS t-test. The LW test is the test defined in Theorem 1. The long-horizon t-test is the OLS t-test with Newey-West standard errors (s.e.) for the regression of $y_t + \dots + y_{t+H-1}$ on $x_{t-1}$. The number of lags $L$ for the Newey-West s.e. is such that $L/H \to 1$. The horizon $H$ is 1/10th the sample size. Correspondingly, the number of frequencies $q$ for the LW test is 10. The confidence intervals for all of the tests are constructed by using the adjusted Bonferroni bounds based on the DF-GLS statistic for $\rho_T$.


[Figure 2: four panels, each plotting power functions (vertical axis from 0 to 1) against b. Panels: c = −2, r = −0.95 (b from 0 to 10); c = −2, r = −0.75 (b from 0 to 10); c = −20, r = −0.95 (b from 0 to 20); c = −20, r = −0.75 (b from 0 to 20). Legend: Q-test, t-test, LW-test, Long-horizon t-test.]

Figure 2: Asymptotic local power functions for $H_0: \beta_x = 0$. The data are generated according to model (1) with $E(\varepsilon_t \mid u_{t-1}, \dots) \neq 0$ due to a measurement error in $x_t$. In particular, if $m_t$ is a measurement error, then $u_t$ is positively correlated with $m_t$, and the error term is $\varepsilon_t = -\beta_x m_{t-1} + \tilde{\varepsilon}_t$, where $E(\tilde{\varepsilon}_t \mid u_{t-1}, \dots) = 0$. The parameters for the dynamics of $u_t$, $m_t$, and $\tilde{\varepsilon}_t$ are chosen in such a way that the long-run covariance matrix of $e_t = (\varepsilon_t, u_t)$ is the same as in Figure 1, and the variance of $E(\varepsilon_t \mid u_{t-1}, \dots)$ is 0.25% of the variance of $\varepsilon_t$. The power functions depend on the persistence parameter $c$ in $\rho_T = 1 + c/T$, the long-run correlation $r$ between shocks $\varepsilon_t$ and $u_t$, and the true value of $\beta_x = b/T$. The Q-test is the test by Campbell and Yogo (2006). The t-test is the OLS t-test. The LW test is the test defined in Theorem 1. The long-horizon t-test is the OLS t-test with Newey-West standard errors (s.e.) for the regression of $y_t + \dots + y_{t+H-1}$ on $x_{t-1}$. The number of lags $L$ for the Newey-West s.e. is such that $L/H \to 1$. The horizon $H$ is 1/10th the sample size. Correspondingly, the number of frequencies $q$ for the LW test is 10. The confidence intervals for all of the tests are constructed by using adjusted Bonferroni bounds based on the DF-GLS statistic for $\rho_T$.


[Figure 3: four panels, each plotting power functions (vertical axis from 0 to 1) against b. Panels: c = −2, r = −0.95 (b from 0 to 10); c = −2, r = −0.75 (b from 0 to 10); c = −20, r = −0.95 (b from 0 to 20); c = −20, r = −0.75 (b from 0 to 20). Legend: LW test, Long-run nearly-optimal.]

Figure 3: Asymptotic local power functions for $H_0: \beta_x = 0$. The data are generated according to model (1). The power functions depend on the persistence parameter $c$ in $\rho_T = 1 + c/T$, the long-run correlation $r$ between shocks $\varepsilon_t$ and $u_t$, and the true value of $\beta_x = b/T$. The long-run variances of $\varepsilon_t$ and $u_t$ are set to 1. The LW test is the test defined in Theorem 1. The nearly optimal long-run predictability test is described in Section 5. The number of frequencies $q$ for the LW and the nearly optimal tests is 10.


Table 1: Conservative Bounds on c in the LW Test With q = 5 Frequencies

The table reports lower (cL) and upper (cU) bounds on c in model (1) with ρ = 1 + c/T that are used to construct conservative (Bonferroni adjusted) equal-tailed 90% confidence intervals for βx based on the value of the Local Whittle estimator with q = 5 frequencies. To choose the row, calculate the Dickey-Fuller GLS statistic tρ; see Elliott, Rothenberg, and Stock (1996). Columns correspond to consistently estimated long-run correlations r between shocks εt and ut.

          r = −1.0            r = −0.9            r = −0.8            r = −0.7            r = −0.6
tρ        cL       cU         cL       cU         cL       cU         cL       cU         cL       cU
−12.00   −257.26  −188.59    −248.20  −201.47    −247.82  −206.43    −245.24  −208.49    −245.69  −211.01
−11.50   −242.75  −175.15    −234.79  −187.81    −234.36  −192.34    −232.00  −194.73    −232.27  −197.12
−11.00   −228.24  −161.80    −221.39  −174.15    −220.90  −178.24    −218.75  −180.97    −218.84  −183.23
−10.50   −213.72  −148.78    −207.98  −160.43    −207.43  −164.25    −205.51  −167.25    −205.42  −169.34
−10.00   −199.21  −135.94    −194.57  −147.29    −193.97  −150.80    −192.27  −153.82    −191.99  −155.62
−9.75    −191.96  −129.65    −187.87  −140.75    −187.24  −144.24    −185.64  −146.97    −185.28  −148.91
−9.50    −184.70  −123.33    −181.17  −134.33    −180.51  −137.65    −179.02  −140.42    −178.57  −142.30
−9.25    −177.45  −117.13    −174.46  −127.98    −173.78  −131.12    −172.40  −133.90    −171.85  −135.78
−9.00    −170.19  −111.00    −167.68  −121.63    −166.74  −124.91    −165.31  −127.51    −164.74  −129.22
−8.75    −162.70  −104.97    −159.94  −115.26    −159.25  −118.62    −157.98  −121.06    −157.40  −122.86
−8.50    −155.49  −99.15     −152.88  −109.18    −152.20  −112.24    −150.83  −114.76    −150.20  −116.51
−8.25    −148.24  −93.39     −145.79  −103.09    −144.97  −106.14    −143.56  −108.58    −143.05  −110.27
−8.00    −140.94  −87.78     −138.58  −97.23     −138.00  −100.17    −136.66  −102.54    −136.10  −104.17
−7.75    −134.15  −82.26     −131.73  −91.35     −131.05  −94.31     −129.82  −96.57     −129.20  −98.13
−7.50    −126.92  −76.81     −124.86  −85.83     −124.25  −88.56     −123.08  −90.71     −122.51  −92.18
−7.25    −120.31  −71.61     −118.10  −80.09     −117.46  −82.86     −116.25  −85.05     −115.73  −86.60
−7.00    −113.70  −66.29     −111.33  −74.82     −110.67  −77.44     −109.58  −79.41     −109.10  −80.85
−6.75    −106.91  −61.44     −104.92  −69.48     −104.40  −72.09     −103.31  −74.06     −102.82  −75.47
−6.50    −100.45  −56.47     −98.58   −64.24     −98.02   −66.75     −96.96   −68.70     −96.51   −70.06
−6.25    −94.12   −51.61     −92.26   −59.30     −91.72   −61.75     −90.78   −63.51     −90.33   −64.83
−6.00    −87.99   −47.19     −86.27   −54.47     −85.75   −56.78     −84.78   −58.50     −84.36   −59.79
−5.75    −82.01   −42.73     −80.18   −49.78     −79.70   −51.91     −78.80   −53.78     −78.38   −55.05
−5.50    −76.19   −38.56     −74.58   −45.30     −74.14   −47.46     −73.22   −49.08     −72.79   −50.23
−5.25    −70.57   −34.48     −68.95   −40.97     −68.48   −43.00     −67.64   −44.54     −67.26   −45.70
−5.00    −65.00   −30.57     −63.48   −36.81     −63.03   −38.79     −62.26   −40.27     −61.89   −41.36
−4.75    −59.68   −26.92     −58.21   −32.85     −57.79   −34.70     −57.03   −36.12     −56.65   −37.16
−4.50    −54.58   −23.40     −53.22   −29.00     −52.84   −30.82     −52.09   −32.20     −51.75   −33.17
−4.25    −49.62   −20.04     −48.34   −25.44     −47.96   −27.10     −47.28   −28.44     −46.95   −29.36
−4.00    −44.93   −16.99     −43.68   −22.13     −43.33   −23.70     −42.71   −24.90     −42.41   −25.76
−3.75    −40.46   −14.20     −39.32   −18.93     −38.94   −20.40     −38.31   −21.58     −38.01   −22.40
−3.50    −36.05   −11.47     −34.99   −15.94     −34.69   −17.36     −34.15   −18.45     −33.87   −19.22
−3.25    −32.05   −8.94      −30.97   −13.25     −30.67   −14.57     −30.14   −15.53     −29.91   −16.25
−3.00    −28.20   −6.78      −27.25   −10.68     −26.98   −11.87     −26.49   −12.87     −26.26   −13.54
−2.75    −24.57   −4.80      −23.74   −8.40      −23.48   −9.53      −23.00   −10.38     −22.78   −11.01
−2.50    −21.14   −2.95      −20.32   −6.29      −20.09   −7.36      −19.70   −8.16      −19.50   −8.71
−2.25    −17.97   −1.42      −17.27   −4.47      −17.05   −5.42      −16.66   −6.15      −16.48   −6.66
−2.00    −15.08   −0.10      −14.41   −2.85      −14.22   −3.75      −13.89   −4.37      −13.74   −4.87
−1.75    −12.44   1.03       −11.85   −1.48      −11.70   −2.24      −11.39   −2.85      −11.24   −3.27
−1.50    −10.08   1.90       −9.54    −0.35      −9.39    −1.03      −9.12    −1.55      −9.00    −1.91
−1.25    −8.01    2.59       −7.50    0.58       −7.39    0.00       −7.15    −0.49      −7.02    −0.80
−1.00    −6.18    3.08       −5.76    1.30       −5.64    0.77       −5.42    0.35       −5.32    0.07
−0.75    −4.76    3.38       −4.33    1.78       −4.22    1.31       −4.00    0.96       −3.92    0.72
−0.50    −3.58    3.60       −3.20    2.08       −3.11    1.69       −2.94    1.36       −2.88    1.14
−0.25    −2.69    3.77       −2.38    2.33       −2.29    1.92       −2.15    1.64       −2.08    1.44
0.00     −2.00    3.91       −1.72    2.51       −1.65    2.12       −1.50    1.85       −1.45    1.67
0.25     −1.43    4.04       −1.20    2.67       −1.14    2.30       −1.03    2.02       −0.98    1.85
0.50     −0.99    4.16       −0.80    2.81       −0.74    2.44       −0.65    2.18       −0.60    2.01
1.00     −0.34    4.35       −0.18    3.05       −0.14    2.69       −0.05    2.44       −0.01    2.28
1.50     0.13     4.51       0.26     3.23       0.29     2.90       0.35     2.66       0.38     2.50
2.00     0.47     4.67       0.57     3.40       0.60     3.08       0.66     2.84       0.69     2.69
2.50     0.74     4.81       0.83     3.55       0.86     3.23       0.92     3.01       0.94     2.85
3.00     0.96     4.94       1.05     3.68       1.07     3.36       1.12     3.14       1.14     3.00
3.50     1.13     5.04       1.22     3.80       1.24     3.49       1.29     3.27       1.31     3.12
4.00     1.29     5.13       1.36     3.90       1.39     3.60       1.43     3.37       1.46     3.23


Table 1: Conservative Bounds on c in the LW Test With q = 5 Frequencies (continued)

          r = −0.5            r = −0.4            r = −0.3            r = −0.2            r = −0.1
tρ        cL       cU         cL       cU         cL       cU         cL       cU         cL       cU
−12.00   −243.45  −213.70    −246.30  −214.75    −242.27  −216.72    −243.95  −218.79    −236.80  −222.30
−11.50   −230.23  −199.67    −232.39  −200.89    −228.74  −202.82    −229.72  −204.81    −223.26  −208.09
−11.00   −217.01  −185.64    −218.47  −187.04    −215.21  −188.91    −215.49  −190.82    −209.72  −193.87
−10.50   −203.79  −171.62    −204.55  −173.18    −201.69  −175.01    −201.26  −176.84    −196.18  −179.65
−10.00   −190.57  −157.80    −190.63  −159.26    −188.16  −160.99    −187.03  −162.73    −182.64  −165.48
−9.75    −183.96  −151.05    −183.67  −152.68    −181.40  −154.41    −179.92  −155.94    −175.87  −158.49
−9.50    −177.35  −144.31    −176.71  −145.84    −174.63  −147.53    −172.80  −149.14    −169.10  −151.71
−9.25    −170.74  −137.72    −169.75  −139.22    −167.84  −140.88    −165.50  −142.47    −161.72  −144.81
−9.00    −163.56  −131.16    −162.33  −132.60    −160.33  −134.25    −158.23  −135.78    −154.72  −138.12
−8.75    −156.33  −124.80    −155.23  −126.20    −153.23  −127.72    −151.19  −129.20    −147.79  −131.54
−8.50    −149.11  −118.45    −148.04  −119.76    −146.24  −121.26    −144.11  −122.71    −140.73  −125.01
−8.25    −142.14  −112.05    −141.02  −113.40    −139.09  −114.89    −137.19  −116.30    −133.97  −118.61
−8.00    −135.07  −105.89    −134.15  −107.20    −132.40  −108.67    −130.35  −110.05    −127.18  −112.21
−7.75    −128.09  −99.92     −127.19  −101.18    −125.49  −102.58    −123.73  −103.86    −120.55  −105.96
−7.50    −121.47  −93.92     −120.59  −95.14     −118.94  −96.53     −117.03  −97.82     −114.12  −99.87
−7.25    −114.79  −88.22     −113.94  −89.43     −112.23  −90.70     −110.45  −91.89     −107.55  −93.90
−7.00    −108.21  −82.43     −107.33  −83.58     −105.73  −84.90     −104.08  −86.21     −101.34  −88.19
−6.75    −101.91  −77.00     −101.10  −78.07     −99.54   −79.26     −97.82   −80.43     −95.11   −82.29
−6.50    −95.61   −71.57     −94.83   −72.65     −93.36   −73.82     −91.75   −75.01     −89.16   −76.79
−6.25    −89.47   −66.27     −88.74   −67.30     −87.40   −68.47     −85.81   −69.58     −83.18   −71.36
−6.00    −83.53   −61.23     −82.78   −62.23     −81.35   −63.28     −79.84   −64.28     −77.38   −65.99
−5.75    −77.62   −56.29     −76.90   −57.17     −75.65   −58.18     −74.31   −59.24     −71.88   −60.92
−5.50    −72.03   −51.44     −71.35   −52.35     −70.15   −53.49     −68.77   −54.50     −66.46   −55.97
−5.25    −66.56   −46.94     −65.86   −47.84     −64.68   −48.78     −63.37   −49.67     −61.21   −51.13
−5.00    −61.22   −42.49     −60.61   −43.29     −59.47   −44.20     −58.20   −45.14     −56.13   −46.58
−4.75    −56.02   −38.29     −55.47   −39.08     −54.46   −39.97     −53.28   −40.81     −51.21   −42.12
−4.50    −51.11   −34.19     −50.53   −34.93     −49.50   −35.80     −48.43   −36.63     −46.54   −37.91
−4.25    −46.37   −30.36     −45.82   −31.07     −44.85   −31.90     −43.82   −32.68     −42.08   −33.84
−4.00    −41.88   −26.70     −41.38   −27.39     −40.49   −28.15     −39.51   −28.87     −37.77   −30.00
−3.75    −37.49   −23.29     −37.02   −23.91     −36.16   −24.61     −35.25   −25.31     −33.69   −26.39
−3.50    −33.38   −20.05     −32.93   −20.65     −32.11   −21.31     −31.26   −21.95     −29.77   −22.97
−3.25    −29.47   −17.03     −29.07   −17.60     −28.34   −18.24     −27.50   −18.84     −26.13   −19.78
−3.00    −25.85   −14.25     −25.47   −14.76     −24.76   −15.34     −23.98   −15.89     −22.68   −16.79
−2.75    −22.36   −11.67     −22.00   −12.14     −21.36   −12.69     −20.67   −13.21     −19.50   −14.00
−2.50    −19.15   −9.29      −18.82   −9.77      −18.21   −10.25     −17.59   −10.74     −16.52   −11.47
−2.25    −16.15   −7.23      −15.85   −7.65      −15.32   −8.08      −14.75   −8.52      −13.75   −9.17
−2.00    −13.43   −5.35      −13.16   −5.70      −12.67   −6.13      −12.12   −6.52      −11.26   −7.11
−1.75    −10.97   −3.73      −10.71   −4.03      −10.25   −4.38      −9.80    −4.79      −9.01    −5.31
−1.50    −8.75    −2.31      −8.49    −2.60      −8.12    −2.92      −7.74    −3.24      −6.98    −3.72
−1.25    −6.79    −1.15      −6.59    −1.41      −6.26    −1.67      −5.86    −1.96      −5.25    −2.37
−1.00    −5.12    −0.23      −4.94    −0.46      −4.65    −0.69      −4.30    −0.93      −3.76    −1.29
−0.75    −3.76    0.45       −3.61    0.27       −3.33    0.05       −3.03    −0.14      −2.56    −0.45
−0.50    −2.74    0.92       −2.59    0.77       −2.33    0.57       −2.10    0.40       −1.67    0.13
−0.25    −1.96    1.22       −1.83    1.08       −1.61    0.92       −1.39    0.77       −1.05    0.52
0.00     −1.34    1.47       −1.24    1.32       −1.07    1.17       −0.89    1.03       −0.57    0.82
0.25     −0.89    1.67       −0.81    1.53       −0.65    1.39       −0.48    1.25       −0.21    1.05
0.50     −0.52    1.83       −0.44    1.72       −0.31    1.57       −0.16    1.44       0.08     1.25
1.00     0.05     2.11       0.11     1.99       0.22     1.87       0.33     1.76       0.53     1.57
1.50     0.43     2.33       0.49     2.23       0.58     2.10       0.69     1.99       0.87     1.83
2.00     0.75     2.52       0.79     2.41       0.87     2.30       0.97     2.19       1.12     2.03
2.50     0.99     2.69       1.03     2.58       1.10     2.47       1.19     2.36       1.34     2.20
3.00     1.18     2.84       1.22     2.73       1.29     2.61       1.38     2.51       1.53     2.35
3.50     1.35     2.97       1.38     2.86       1.46     2.75       1.54     2.64       1.67     2.49
4.00     1.50     3.08       1.53     2.99       1.59     2.87       1.67     2.77       1.81     2.61
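Using Table 1 in practice amounts to a row lookup followed by the mapping ρ = 1 + c/T. The sketch below illustrates this with four rows copied from the r = −0.9 column. Selecting the nearest tabulated tρ is an assumption made here for illustration (the table notes only say to choose the row from the DF-GLS statistic), and the function names are hypothetical.

```python
# A few rows copied from Table 1 for r = -0.9: t_rho -> (c_L, c_U).
BOUNDS_R_MINUS_09 = {
    -2.50: (-20.32, -6.29),
    -2.00: (-14.41, -2.85),
    -1.50: (-9.54, -0.35),
    -1.00: (-5.76, 1.30),
}


def bounds_on_c(t_rho, table=BOUNDS_R_MINUS_09):
    """Return (c_L, c_U) from the tabulated row nearest to t_rho.

    The nearest-row rule is an illustrative assumption, not the paper's stated rule.
    """
    nearest = min(table, key=lambda row: abs(row - t_rho))
    return table[nearest]


def bounds_on_rho(t_rho, T, table=BOUNDS_R_MINUS_09):
    """Map the bounds on c into bounds on rho through rho = 1 + c / T."""
    c_low, c_up = bounds_on_c(t_rho, table)
    return 1.0 + c_low / T, 1.0 + c_up / T
```

For example, with a DF-GLS statistic of −1.9 and T = 100, the nearest row is tρ = −2.00, which gives bounds on ρ of roughly 0.856 and 0.972.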


Table 2: Relation between Stock Index Returns and Payout Ratios

The table reports 90% confidence intervals for the slope βx in the model r_{t+1} = β0 + βx x_t + ε_t, where r_{t+1} is the annual CRSP value-weighted excess return and x_t is one of the measures of the payout to shareholders. Sample: annual observations from 1926 to 2003 (for payout ratios and net payout) and from 1926 to 2010 for the dividend yield and earnings yield. The estimates for the Q-test and the LW test are the midpoints of the confidence intervals. The estimate for the t-test is Stambaugh's (1999) corrected OLS slope. Asterisks denote significance at the 5% level for the one-sided tests and at the 10% level for the two-sided tests.

x_{t−1}          t-test                 Q-test (q = all)       LW test (q = 10)       LW test (q = 5)
Net Payout       0.44* [0.29, 0.59]     0.48* [0.32, 0.64]     0.44* [0.25, 0.63]     0.38* [0.15, 0.62]
Payout I         0.14* [0.01, 0.24]     0.15* [0.04, 0.27]     0.14* [0.02, 0.26]     0.15* [0.02, 0.27]
Payout II        0.11  [−0.01, 0.20]    0.11* [0.01, 0.21]     0.10* [0.00, 0.20]     0.11* [0.01, 0.22]
Earning-Price    0.08  [−0.04, 0.22]    0.10  [−0.04, 0.25]    0.14  [−0.02, 0.30]    0.20* [0.03, 0.37]
Dividend Yield   0.06  [−0.06, 0.15]    0.07  [−0.03, 0.17]    0.09  [−0.01, 0.20]    0.11* [0.01, 0.22]

Table 3: Relation between Exchange Rate Changes and Forward Interest Rate Differentials

The table reports the estimates and 90% confidence intervals for the slope βx in the model s_{t+1/12} − s_t = β0 + βx (if^a_{t−j,t,t+1} − if^b_{t−j,t,t+1}) + ε_t, where s_t is the logarithm of the exchange rate, and if^a_{t−j,t,t+1} and if^b_{t−j,t,t+1} are continuously compounded annualized forward interest rates (domestic and foreign) set at time t − j from t to t + 1. Denote ∆if_{t−j,t,t+1} ≡ if^a_{t−j,t,t+1} − if^b_{t−j,t,t+1}. The observations are monthly. Time t is in years, i.e., 1/12 stands for one month. Sample: USD/GBP, USD/DEM, USD/CHF, January 1979 to July 2012. The estimates for the Q-test and the LW test are the midpoints of the confidence intervals. The estimate for the t-test is Stambaugh's (1999) corrected OLS slope. Asterisks denote significance at the 5% level for the one-sided tests and at the 10% level for the two-sided tests.

x_{t−1}            t-test                 Q-test (q = all)       LW test (q = 10)       LW test (q = 5)

USD/GBP (LIBOR/swap rates)
∆if_{t−1,t,t+1}    0.06  [−0.04, 0.16]    0.06  [−0.04, 0.15]    0.06  [−0.06, 0.18]    0.04  [−0.13, 0.21]
∆if_{t−4,t,t+1}    0.09  [−0.11, 0.29]    0.11  [−0.09, 0.31]    0.16  [−0.10, 0.42]    0.22  [−0.09, 0.52]

USD/DEM (LIBOR/swap rates)
∆if_{t−1,t,t+1}    0.01  [−0.08, 0.09]    0.01  [−0.08, 0.10]    0.01  [−0.10, 0.11]    −0.02 [−0.14, 0.10]
∆if_{t−4,t,t+1}    0.16  [−0.01, 0.35]    0.16  [−0.01, 0.34]    0.23* [0.04, 0.43]     0.29* [0.07, 0.51]

USD/CHF (LIBOR/swap rates)
∆if_{t−1,t,t+1}    0.02  [−0.09, 0.09]    −0.01 [−0.10, 0.08]    −0.04 [−0.14, 0.07]    −0.07 [−0.19, 0.04]
∆if_{t−4,t,t+1}    0.13  [−0.04, 0.31]    0.13  [−0.04, 0.31]    0.16  [−0.03, 0.35]    0.21* [0.00, 0.41]
