A New Approach To Time Varying Parameters in Vector Autoregressive Models∗
Florian Huber1, Gregor Kastner†1, and Martin Feldkircher‡2
1WU Vienna University of Economics and Business
2Oesterreichische Nationalbank (OeNB)
January 15, 2018
Abstract
We propose a flexible means of estimating vector autoregressions with time-varying
parameters (TVP-VARs) by introducing a latent threshold process that is driven by
the absolute size of parameter changes. This enables us to dynamically detect
whether a given regression coefficient is constant or time-varying. When applied
to a medium-scale macroeconomic US dataset our model yields precise density and
turning point predictions, especially during economic downturns, and provides new
insights on the changing effects of increases in short-term interest rates over time.
Keywords: Change point model, Threshold mixture innovations, Structural breaks, Shrinkage, Bayesian statistics, Monetary policy.
JEL Codes: C11, C32, C52, E42.
∗This paper is a substantially revised version of a paper circulated under the title “Should I stay or should I go? Bayesian inference in the threshold time-varying parameter model”, Department of Economics Working Paper Series 235, WU Vienna University of Economics and Business, Vienna.
†Corresponding author. Address: Welthandelsplatz 1, Building D4, 4th Floor, 1020 Wien, Austria. Phone: +43 1 31336-5593. Fax: +43 1 31336-905593. Email: [email protected].
‡The opinions expressed in this paper are those of the authors and do not necessarily reflect the official viewpoint of the Oesterreichische Nationalbank or the Eurosystem.
arXiv:1607.04532v4 [stat.ME] 12 Jan 2018
1 Introduction
In the last few years, economists in policy institutions and central banks were criticized
for their failure to foresee the recent financial crisis that engulfed the world economy
and led to a sharp drop in economic activity. Critics argued that economists failed
to predict the crisis because models commonly utilized at policy institutions back then
were too simplistic. For instance, the majority of forecasting models adopted were (and
possibly still are) linear and low dimensional. The former implies that the underlying
structural mechanisms and the volatility of economic shocks are assumed to remain
constant over time – a rather restrictive assumption. The latter implies that only limited
information is exploited, which may be detrimental to obtaining reliable predictions.
In light of this criticism, practitioners started to develop more complex models that
are capable of capturing salient features of time series commonly observed in macroe-
conomics and finance. Recent research (Stock and Watson, 1996; Cogley and Sargent,
2002; 2005; Primiceri, 2005; Sims and Zha, 2006) suggests that, at least for US data,
there is considerable evidence that the influence of certain variables appears to be time-
varying. This raises additional issues related to model specification and estimation. For
instance, do all regression parameters vary over time? Or is time variation just limited
to a specific subset of the parameter space? Moreover, as is the case with virtually any
modeling problem, the question whether a given variable should be included in the
model in the first place naturally arises. Apart from deciding whether parameters are
changing over time, the nature of the process that drives the dynamics of the coeffi-
cients also proves to be an important modeling decision.
In a recent contribution, Frühwirth-Schnatter and Wagner (2010) focus on model
specification issues within the general framework of state space models. Exploiting a
non-centered parametrization of the model allows them to rewrite the model in terms
of a constant parameter specification, effectively capturing the steady state of the pro-
cess along with deviations thereof. The non-centered parameterization is subsequently
used to search for appropriate model specifications, imposing shrinkage on the steady
state part and the corresponding deviations. Recent research aims to discriminate be-
tween inclusion/exclusion of elements of different variables and whether the associated
regression coefficient is constant or time-varying (Belmonte, Koop, and Korobilis, 2014;
Eisenstat, Chan, and Strachan, 2016; Koop and Korobilis, 2012; 2013; Kalli and Griffin,
2014). Another strand of the literature asks whether coefficients are constant or time-
varying by assuming that the innovation variance in the state equation is characterized
by a change point process (McCulloch and Tsay, 1993; Gerlach, Carter, and Kohn, 2000;
Koop, Leon-Gonzalez, and Strachan, 2009; Giordani and Kohn, 2012). However, the
main drawback of this modeling approach is the severe computational burden originat-
ing from the need to simulate additional latent states for each parameter. This renders
estimation of large dimensional models like vector autoregressions (VARs) unfeasible.
To circumvent such problems, Koop, Leon-Gonzalez, and Strachan (2009) estimate a
single Bernoulli random variable to discriminate between time constancy and parame-
ter variation for the autoregressive coefficients, the covariances, and the log-volatilities,
respectively. This assumption, however, implies that either all autoregressive parame-
ters change over a given time frame, or none of them. Along these lines, Maheu and
Song (forthcoming) allow for independent breaks in regression coefficients and the
volatility parameters. However, they show that their multivariate approach is inferior
to univariate change point models when out-of-sample forecasts are considered and
conclude that allowing for independent breaks in each series is important.
In the present paper, we introduce a method that can be applied to a highly param-
eterized VAR model by combining ideas from the literature of latent threshold models
(Neelon and Dunson, 2004; Nakajima and West, 2013a;b; Zhou, Nakajima, and West,
2014; Kimura and Nakajima, 2016) and mixture innovation models. Specifically, we in-
troduce a set of latent thresholds that controls the degree of time-variation separately
for each parameter and for each point in time. This is achieved by estimating variable-
specific thresholds that allow for movements in the autoregressive parameters if the
proposed change of the parameter is large enough. We show that this can be achieved
by assuming that the innovations of the state equation follow a threshold model that
discriminates between a situation where the innovation variance is large and a case
with an innovation variance set (very close) to zero. The proposed framework nests a
wide variety of competing models, most notably the standard time-varying parameter
model, a change-point model with an unknown number of regimes, mixtures between
different models, and finally the simple constant parameter model. To assess systemat-
ically, in a data-driven fashion, which predictors should be included in the model, we
impose a set of Normal-Gamma priors (Griffin and Brown, 2010) in the spirit of Bitto
and Frühwirth-Schnatter (2016) on the initial state of the system.
We illustrate the empirical merits of our approach by carrying out an extensive
forecasting exercise based on a medium-scale US dataset. Our proposed framework
is benchmarked against two constant parameter Bayesian VAR models with stochastic
volatility and hierarchical shrinkage priors. The findings indicate that the threshold
time-varying parameter VAR excels in crisis periods while being only slightly inferior
during “normal” periods in terms of one-step-ahead log predictive likelihoods. Consid-
ering turning point predictions for GDP growth, our model outperforms the constant
parameter benchmarks when upward turning points are considered while yielding sim-
ilar forecasts for downward turning points.
In the second part of the application, we provide evidence on the degree of time-
variation of the underlying causal mechanisms for the USA. Considering the determi-
nant of the time-varying variance-covariance matrix of the state innovations as a global
measure for the strength of parameter movements, we find that those movements reach
a maximum in the beginning of the 1980s while displaying only relatively modest move-
ments before and after that period. Consequently, we investigate the effects of a mon-
etary policy shock for the pre- and post-1980 periods separately. This exercise reveals
a considerable price puzzle in the 1960s which starts disappearing in the early 1980s.
Moreover, considering the most recent part of our sample period, we find evidence for
increased effectiveness of monetary policy. This is especially pronounced during the
aftermath of the global financial crisis in 2008/09.
The paper is structured as follows. Section 2 introduces the modeling approach, the
prior setup and the corresponding MCMC algorithm for posterior simulation. Section 3
illustrates the behavior of the model by showcasing scenarios with few, moderately
many, and many jumps in the state equation. In Section 4, we apply the model to a
medium-scale US macroeconomic dataset and assess its predictive capabilities against
a range of competing models in terms of density and turning point predictions. More-
over, we investigate during which periods VAR coefficients display the largest amount
of time-variation and consider the associated implications on dynamic responses with
respect to a monetary policy shock. Finally, Section 5 concludes.
2 Econometric framework
We begin by specifying a flexible model that is capable of discriminating between con-
stant and time-varying parameters at each point in time.
2.1 A threshold mixture innovation model
Consider the following dynamic regression model,
yt = x′tβt + ut, ut ∼ N (0, σ2t ), (2.1)
where xt is a K-dimensional vector of explanatory variables and βt = (β1t, . . . , βKt)′
a vector of regression coefficients. The error term ut is assumed to be independently
normally distributed with (potentially) time-varying variance. This model assumes that
the relationship between elements of xt and yt is not necessarily constant over time,
but changes subject to some law of motion for βt. Typically, researchers assume that
the jth element of βt (j = 1, . . . , K) follows a random walk process,
βjt = βj,t−1 + ejt, ejt ∼ N (0, ϑj), (2.2)
with ϑj denoting the innovation variance of the latent states. Eq. (2.2) implies that
parameters evolve gradually over time, ruling out abrupt changes. While being con-
ceptually flexible, in the presence of only a few breaks in the parameters, this model
generates spurious movements in the coefficients that could be detrimental for the em-
pirical performance of the model (D’Agostino, Gambetti, and Giannone, 2013).
Thus, we deviate from Eq. (2.2) by specifying the innovations of the state equation
ejt to be a mixture distribution. More concretely, let
ejt ∼ N (0, θjt), (2.3)
θjt = sjtϑj1 + (1− sjt)ϑj0, (2.4)
where sjt is an indicator variable with an unconditional Bernoulli distribution. This
mechanism is closely related to an absolutely continuous spike-and-slab prior where
the slab has variance ϑj1 and the spike has variance ϑj0 with ϑj1 ≫ ϑj0 (see, e.g.,
Malsiner-Walli and Wagner, 2011, for an excellent survey on this class of priors).
In the present framework we assume that the conditional distribution p(sjt | ∆βt) follows a threshold process,

sjt = 1 if |∆βjt| > dj,   and   sjt = 0 if |∆βjt| ≤ dj, (2.5)
where dj is a coefficient-specific threshold to be estimated and ∆βjt := βjt − βj,t−1.
Equations (2.4) and (2.5) state that if the absolute period-on-period change of βjt ex-
ceeds a threshold dj, we assume that the change in βjt is normally distributed with zero
mean and variance ϑj1. On the contrary, if the change in the parameter is too small, the
innovation variance is set close to zero, effectively implying that βjt ≈ βj,t−1, i.e., almost
no change from period (t− 1) to t.
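As an illustration, the state dynamics of Eqs. (2.2) to (2.5) can be simulated directly. The sketch below (not the authors' code) interprets the mechanism generatively: a proposed coefficient change is drawn from the slab and retained only if it clears the threshold; otherwise the innovation comes from the near-zero spike. The values for ϑj1, ϑj0, and dj are illustrative.

```python
import numpy as np

def simulate_ttvp_state(T, theta1, theta0, d, rng):
    """Simulate one coefficient under Eqs. (2.2)-(2.5): a proposed change
    drawn from the slab N(0, theta1) is kept when it exceeds the threshold
    d in absolute value (s_jt = 1); otherwise the innovation comes from
    the near-zero spike N(0, theta0), so the coefficient barely moves."""
    beta = np.zeros(T)
    s = np.zeros(T, dtype=int)
    for t in range(1, T):
        proposal = rng.normal(0.0, np.sqrt(theta1))
        if abs(proposal) > d:   # time variation: slab regime
            s[t] = 1
            beta[t] = beta[t - 1] + proposal
        else:                   # (almost) constant: spike regime
            beta[t] = beta[t - 1] + rng.normal(0.0, np.sqrt(theta0))
    return beta, s

rng = np.random.default_rng(1)
beta, s = simulate_ttvp_state(T=500, theta1=0.15**2, theta0=1e-8, d=0.1, rng=rng)
```

With ϑj0 ≈ 0, the resulting path is piecewise constant with occasional jumps, exactly the behavior the latent thresholds are designed to detect.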
This modeling approach provides a great deal of flexibility, nesting a plethora of sim-
pler model specifications. The interesting cases are characterized by situations where
sjt equals unity only for some t. For instance, it could be the case that parameters tend
to exhibit strong movements at given points in time but stay constant for the major-
ity of the time. An unrestricted time-varying parameter model would imply that the
parameters are gradually changing over time, depending on the innovation variance in
Eq. (2.2). Another prominent case would be a structural break model with an unknown
number of breaks (for a recent Bayesian exposition, see Koop and Potter, 2007).
The mixture innovation component in Eq. (2.4) implies that we discriminate be-
tween two regimes. The first regime assumes that changes in the parameters tend
to be large and important to predict yt whereas in the second regime, these changes
can be safely regarded as zero, thus effectively leading to a constant parameter model
over a given period of time. Compared to a standard mixture innovation model that
postulates sjt as a sequence of independent Bernoulli variables, our approach assumes
that regime shifts are governed by a (conditionally) deterministic law of motion. The
main advantage of our approach relative to mixture innovation models is that instead
of having to estimate a full sequence of sjt for all j, the threshold mixture innovation
model only relies on a single additional parameter per coefficient. This renders estima-
tion of high dimensional models such as vector autoregressions (VARs) feasible. The
additional computational burden turns out to be negligible relative to an unrestricted
TVP-VAR, see Section 2.4 for more information.
Our model is also closely related to the latent thresholding approach put forward in
Nakajima and West (2013a) within the time series context. While in their model latent
thresholding discriminates between the inclusion or exclusion of a given covariate at
time t, our model detects whether the associated regression coefficient is constant or
time-varying.
2.2 The threshold mixture innovation TTVP-VAR model
The model proposed in the previous subsection can be straightforwardly generalized to
the VAR case with stochastic volatility (SV) by assuming that yt is an m-dimensional
response vector. In this case, Eq. (2.1) becomes,
yt = x′tβt + ut, (2.6)
with x′t = IM ⊗ z′t, where zt = (y′t−1, . . . , y′t−P)′ includes the P lags of the endogenous
variables.1 The vector βt now contains the dynamic autoregressive coefficients with
dimension K = M²P, where each element follows the state equation given by Eqs. (2.2)
to (2.5). The vector of white noise shocks ut is distributed as
ut ∼ N (0m,Σt). (2.7)
Hereby, 0m denotes an m-variate zero vector and Σt = V tH tV′t is a time-varying
variance-covariance matrix. The matrix V t is a lower triangular matrix with unit di-
agonal and H t = diag(eh1t , . . . , ehmt). We assume that the logarithm of the variances
1In the empirical application, we also include an intercept term which we omit here for simplicity.
evolves according to
hit = µi + ρi(hi,t−1 − µi) + νit, for i = 1, . . . , m, (2.8)

with µi and ρi being equation-specific mean and persistence parameters and νit ∼ N(0, ζi) an equation-specific white noise error with variance ζi. For the covariances in
V t we impose the random walk state equation with error variances given by Eq. (2.4).
Conditional on the ordering of the variables it is straightforward to estimate the
TTVP model on an equation-by-equation basis, augmenting the ith equation with the
contemporaneous values of the preceding (i − 1) equations (for i > 1), leading to a
Cholesky-type decomposition of the variance-covariance matrix. Thus, the ith equation
(for i = 2, . . . ,m) is given by
yit = z̃′itβ̃it + uit. (2.9)

We let z̃it = (z′t, y1t, . . . , yi−1,t)′, and β̃it = (β′it, vi1,t, . . . , vi,i−1,t)′ is a vector of latent
states with dimension Ki = Mp + i − 1. Here, vij,t denotes the dynamic regression coefficient on the jth (for j < i) contemporaneous value showing up in the ith equation.
Note that for the first equation we have z̃1t = zt and β̃1t = β1t.

The law of motion of the jth element of β̃it reads

βij,t = βij,t−1 + √θij,t rt,  rt ∼ N(0, 1). (2.10)

Hereby, θij,t is defined similarly to Eq. (2.4). In what follows, it proves to be convenient
to stack the states of all equations in a K-dimensional vector βt = (β̃′1t, . . . , β̃′Mt)′ and
let βjt denote the jth element of βt.
While being clearly not order-invariant, this specific way of stating the model yields
two significant computational gains. First, the matrix operations involved in estimat-
ing the latent state vector become computationally less cumbersome. Second, we can
exploit parallel computing and estimate each equation simultaneously on a grid.
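For concreteness, the construction of the per-equation regressors in Eq. (2.9) can be sketched as follows. The helper below is hypothetical (not from the paper) and uses 0-indexed equations, so equation i is augmented with the i preceding contemporaneous values.

```python
import numpy as np

def equation_regressors(Y, i, P):
    """Regressors for equation i (0-indexed) of the Cholesky-style system
    in Eq. (2.9): P lags of all M variables plus the contemporaneous
    values of the preceding equations 0, ..., i-1. With 0-indexing the
    state dimension is M*P + i (the paper's K_i = Mp + i - 1)."""
    T, M = Y.shape
    lags = np.hstack([Y[P - p - 1:T - p - 1] for p in range(P)])  # lags 1..P
    contemp = Y[P:, :i]    # contemporaneous values of preceding equations
    return np.hstack([lags, contemp]), Y[P:, i]

Y = np.random.default_rng(0).normal(size=(50, 3))   # T = 50, M = 3
Z, y = equation_regressors(Y, i=2, P=2)             # K_i = 3*2 + 2 = 8
```

Since the equations are conditionally independent given the contemporaneous augmentation, each such regression can be handed to a separate worker, which is the parallelization mentioned above.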
2.3 Prior specification
Since our approach to estimation and inference is Bayesian, we have to specify suitable
prior distributions for all parameters of the model.
We impose a Normal-Gamma prior (Griffin and Brown, 2010) on each element of
βi0, the initial state of the ith equation,
β0i,j | τ²ij ∼ N(0, 2τ²ij/λ²i),   τ²ij ∼ G(ai, ai), (2.11)
for i = 1, . . . , m; j = 1, . . . , Ki. Hereby, λ²i and ai are hyperparameters and τ²ij denotes
an idiosyncratic scaling parameter that applies an individual degree of shrinkage to
each element of βi0. The hyperparameter λ²i serves as an equation-specific shrinkage
parameter that pulls all elements of βi0 belonging to the ith equation towards zero,
while the local shrinkage parameters τ²ij provide enough flexibility to also allow for
non-zero values of β0i,j in the presence of a tight global prior specification.
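A minimal sketch of drawing initial states from the hierarchical Normal-Gamma prior in Eq. (2.11); the hyperparameter values are illustrative (ai = 0.1 as used later in the paper, and λ²i is fixed here rather than drawn from its Gamma hyperprior).

```python
import numpy as np

def draw_ng_initial_states(K, a, lam2, rng):
    """Normal-Gamma prior of Eq. (2.11): tau2_j ~ G(a, a) (shape/rate) and
    beta0_j | tau2_j ~ N(0, 2*tau2_j/lam2). Small a concentrates tau2 near
    zero (heavy shrinkage) while the Gamma's tail still permits occasional
    large coefficients."""
    tau2 = rng.gamma(shape=a, scale=1.0 / a, size=K)  # rate a => scale 1/a
    beta0 = rng.normal(0.0, np.sqrt(2.0 * tau2 / lam2))
    return beta0, tau2

rng = np.random.default_rng(42)
beta0, tau2 = draw_ng_initial_states(K=1000, a=0.1, lam2=4.0, rng=rng)
```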
For the equation-specific scaling parameter λ²i we impose a Gamma prior, λ²i ∼ G(b0, b1), with b0 and b1 being hyperparameters chosen by the researcher. In typical
applications we specify b0 and b1 to render this prior effectively non-influential.
If the innovation variances of the observation equation are assumed to be constant
over time, we impose a Gamma prior on σ⁻²i with hyperparameters c0 and c1,
i.e., σ⁻²i ∼ G(c0, c1). By contrast, if stochastic volatility is introduced we follow Kastner
and Frühwirth-Schnatter (2014) and impose a normally distributed prior on µi with
mean zero and variance 100, a Beta prior on ρi with (ρi + 1)/2 ∼ B(aρ, bρ), and a Gamma
distributed prior ζi ∼ G(1/2, 1/(2Bζ)).
In principle, the spike variance ϑij,0 could be estimated from the data and a suitable
shrinkage prior could be employed to push ϑij,0 towards zero. However, we follow a
simpler approach and estimate the slab variance ϑij,1 only, while setting ϑij,0 = ξ × ϑ̂ij.
Here, ϑ̂ij denotes the variance of the OLS estimate, used for automatic scaling, which we
treat as a constant specified a priori. The multiplier ξ is set to a fixed constant close
to zero, effectively turning off any time-variation in the parameters in the lower regime.
As long as ϑij,0 is not chosen too large, the specific value of the spike variance proves
to be rather non-influential in the empirical applications that follow.
We use a Gamma distributed prior on the inverse of the innovation variances in the
state specification in Eq. (2.2), i.e., ϑ⁻¹ij,1 ∼ G(rij,0, rij,1) for i = 1, . . . , m; j = 1, . . . , Ki.2
Again, rij,0 and rij,1 denote scalar hyperparameters. This choice implies that we ar-
tificially bound ϑij,1 away from zero, implying that in the upper regime we do not
exert strong shrinkage. This is in contrast to a standard time-varying parameter model,
where this prior is usually set rather tight to control the degree of time variation in
the parameters (see, e.g., Primiceri, 2005). Note that in our model the degree of time
variation is governed by the thresholding mechanism instead.
2Of course, it would also be possible to use a (restricted) Gamma prior on ϑij,1 in the spirit of Frühwirth-Schnatter and Wagner (2010). However, we have encountered some issues with such a prior if the number of observations in the regime associated with sij,t = 1 is small. This stems from the fact that the corresponding conditional posterior distribution is generalized inverse Gaussian, a distribution that is heavy tailed and under certain conditions leads to excessively large draws of ϑij,1.
Finally, the prior specification of the baseline model is completed by imposing a
uniform distributed prior on the thresholds,
dij ∼ U(πij,0, πij,1) for j = 1, . . . , Ki. (2.12)
Here, πij,0 and πij,1 denote the boundaries of the prior that have to be specified carefully. In our examples, we use πij,0 = 0.1 × √ϑij,1 and πij,1 = 1.5 × √ϑij,1. This prior
bounds the thresholds away from zero, implying that a certain amount of shrinkage is
always imposed on the autoregressive coefficients. Setting πij,0 = 0 for all i, j would
also be a feasible option but we found in simulations that being slightly informative on
the presence of a threshold improves the empirical performance of the proposed model
markedly. It is worth noting that even under the assumption that πij,0 > 0, our framework performs well in simulations where the data is obtained from a non-thresholded
version of our model. This stems from the fact that in a situation where parameters are
expected to evolve smoothly over time, the average period-on-period change of βij,t is
small, implying that 0.1×√ϑij,1 is close to zero and the model effectively shrinks small
parameter movements to zero.
2.4 Posterior simulation
We sample from the joint posterior distribution of the model parameters by utilizing a
Markov chain Monte Carlo (MCMC) algorithm. Conditional on the thresholds dij, the
remaining parameters can be simulated in a straightforward fashion. After initializing
the parameters using suitable starting values we iterate between the following six steps.
1. We start with equation-by-equation simulation of the full history {βit}t=0,1,...,T
by means of a standard forward filtering backward sampling algorithm (Carter
and Kohn, 1994; Frühwirth-Schnatter, 1994) while conditioning on the remaining
parameters of the model.
2. The inverses of the innovation variances of Eq. (2.2), ϑ⁻¹ij,1 for i = 1, . . . , m; j = 1, . . . , Ki, have conditional density

p(ϑ⁻¹ij,1 | •) = p(ϑ⁻¹ij,1 | dij, β) ∝ p(β | ϑ⁻¹ij,1, dij) p(dij | ϑ⁻¹ij,1) p(ϑ⁻¹ij,1),

which turns out to be a Gamma distribution, i.e.,

ϑ⁻¹ij,1 | • ∼ G( rij,0 + Tij,1/2 + 1/2,  rij,1 + (1/2) ∑_{t=1}^{T} sij,t (βij,t − βij,t−1)² ), (2.13)

with Tij,1 = ∑_{t=1}^{T} sij,t denoting the number of time periods that feature time variation in the jth parameter of the ith equation.
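Step 2 reduces to drawing from a Gamma distribution whose shape collects the prior shape plus half the number of active periods (plus 1/2 contributed by the threshold prior, whose bounds scale with the square root of the slab variance), and whose rate adds half the sum of squared active increments. A sketch for a single coefficient with illustrative inputs:

```python
import numpy as np

def draw_slab_precision(beta, s, r0, r1, rng):
    """Gamma draw of the slab precision as in Eq. (2.13), single coefficient.
    Shape: r0 + T1/2 + 1/2, where T1 counts active periods; rate: r1 plus
    half the sum of squared increments in active periods."""
    d2 = np.diff(beta) ** 2
    T1 = s[1:].sum()
    shape = r0 + T1 / 2.0 + 0.5
    rate = r1 + (s[1:] * d2).sum() / 2.0
    return rng.gamma(shape, 1.0 / rate)   # numpy parametrizes by scale = 1/rate

rng = np.random.default_rng(0)
s = np.ones(501, dtype=int)                        # all periods active
beta = np.cumsum(rng.normal(0.0, 0.5, size=501))   # true precision 1/0.25 = 4
prec = draw_slab_precision(beta, s, r0=3.0, r1=0.03, rng=rng)
```

With 500 active increments, the posterior concentrates tightly around the true precision.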
3. Combining the Gamma prior on τ²ij with the Gaussian likelihood yields a Generalized Inverse Gaussian (GIG) distribution,

τ²ij | • ∼ GIG( ai − 1/2,  β²0i,j,  ai λ²i ), (2.14)

where the density of the GIG(κ, χ, ψ) distribution is proportional to

z^{κ−1} exp{ −(χ/z + ψz)/2 }. (2.15)

To sample from this distribution, we use the R package GIGrvg (Leydold and Hörmann, 2015), implementing the efficient rejection sampler proposed by Hörmann
and Leydold (2013).
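The GIG kernel in Eq. (2.15) is easy to evaluate on the log scale, which is all that is needed to verify or debug a sampler (actual sampling is delegated to GIGrvg in the paper). A small sketch:

```python
import numpy as np

def gig_log_kernel(z, kappa, chi, psi):
    """Unnormalized log-density of GIG(kappa, chi, psi), Eq. (2.15):
    p(z) proportional to z^(kappa-1) * exp(-(chi/z + psi*z)/2), z > 0."""
    z = np.asarray(z, dtype=float)
    return (kappa - 1.0) * np.log(z) - 0.5 * (chi / z + psi * z)

# for kappa = 1 the kernel peaks at z = sqrt(chi/psi); here sqrt(4/1) = 2
vals = gig_log_kernel(np.array([1.5, 2.0, 2.5]), kappa=1.0, chi=4.0, psi=1.0)
```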
4. The global shrinkage parameter λ²i is sampled from a Gamma distribution given by

λ²i | • ∼ G( b0 + aiKi,  b1 + (ai/2) ∑_{j=1}^{Ki} τ²ij ). (2.16)
5. We update the thresholds by applying Ki Griddy Gibbs steps (Ritter and Tanner, 1992) per equation. Due to the structure of the model, the conditional distribution of βij,1:T = (βij,1, . . . , βij,T)′ is

p(βij,1:T | dij, ϑij) ∝ ∏_{t=1}^{T} (2πθij,t)^{−1/2} exp{ −(βij,t − βij,t−1)² / (2θij,t) }. (2.17)
This expression can be straightforwardly combined with the prior in Eq. (2.12)
to evaluate the conditional posterior of dij at a given candidate point. The pro-
cedure is repeated over a fine grid of values that is determined by the prior and
an approximation to the inverse cumulative distribution function of the posterior
is constructed. Finally, this approximation is used to perform inverse transform
sampling.
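For a single coefficient, Step 5 can be sketched as follows: evaluate the random-walk likelihood of the increments at each grid point for dij (the indicator, and hence θij,t, is a deterministic function of the candidate threshold), normalize to a grid CDF, and invert it. Grid size and inputs are illustrative, not the paper's settings.

```python
import numpy as np

def griddy_gibbs_threshold(delta_beta, theta1, theta0, lo, hi, rng, n_grid=100):
    """Griddy Gibbs step for one threshold: for each candidate d on a grid
    over the prior support [lo, hi], s_t = 1{|delta_beta_t| > d} selects the
    slab or spike variance, and the log conditional posterior is the Gaussian
    random-walk likelihood of Eq. (2.17) (the uniform prior is flat).
    Sampling is by inverse transform from the normalized grid CDF."""
    grid = np.linspace(lo, hi, n_grid)
    logp = np.empty(n_grid)
    for g, d in enumerate(grid):
        var = np.where(np.abs(delta_beta) > d, theta1, theta0)
        logp[g] = np.sum(-0.5 * np.log(2 * np.pi * var)
                         - delta_beta ** 2 / (2 * var))
    p = np.exp(logp - logp.max())        # stabilize before normalizing
    cdf = np.cumsum(p) / p.sum()
    return grid[np.searchsorted(cdf, rng.uniform())]

# toy increments: 50 small moves of 0.005 and 10 jumps of 0.3
delta_beta = np.concatenate([np.full(50, 0.005), np.full(10, 0.3)])
d_draw = griddy_gibbs_threshold(delta_beta, theta1=0.15**2, theta0=0.01**2,
                                lo=0.001, hi=0.25, rng=np.random.default_rng(3))
```

The posterior puts essentially no mass on thresholds below the small moves, since assigning them the slab variance costs many log-likelihood points per observation.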
6. The coefficients of the log-volatility equation and the corresponding history of
the log-volatilities are sampled by means of the algorithm brought forward by
Kastner and Frühwirth-Schnatter (2014) which is efficiently implemented in the
R package stochvol (Kastner, 2016). Under homoscedasticity, σ⁻²i is simulated from

σ⁻²i | • ∼ G( c0 + T/2,  c1 + (1/2) ∑_{t=1}^{T} (yit − z′itβit)² ).
After obtaining an appropriate number of draws, we discard the first N as burn-in and
base our inference on the remaining draws from the joint posterior.
In comparison with standard TVP-VARs, Step (5) is the only additional MCMC step
needed to estimate the proposed TTVP model. Moreover, note that this update is com-
putationally cheap, increasing the amount of time needed to carry out the analysis
conducted in Section 4 by around five percent. For larger models (i.e., with m being
around 15) this step becomes slightly more intensive but, relative to the additional com-
putational burden introduced by applying the FFBS algorithm in Step (1), its costs are
still comparably small relative to the overall computation time needed. We found that
mixing and convergence properties of our proposed algorithm are similar to standard
Bayesian TVP-VAR estimators. In other words, the sampling of the thresholds does not
seem to substantially increase the autocorrelation of the MCMC draws. The TTVP al-
gorithm is bundled into the R package threshtvp which is available from the authors
upon request.
3 An illustrative example
In this section we illustrate our approach by means of a rather stylized example that
emphasizes how well the mixture innovation component for the state innovations per-
forms when used to approximate different data generating processes (DGPs).
For demonstration purposes it proves to be convenient to start with the following
simple DGP with K = 1:
yt = x1tβ1t + ut,  ut ∼ N(0, 0.01²),
β1t = β1,t−1 + e1t,  e1t ∼ N(0, s1t × 0.15²),

where s1t ∈ {0, 1} is chosen at random to yield paths which are characterized by many,
moderately many, and few breaks. Finally, independently for all t, we generate x1t ∼ U(−1, 1) and set β1,0 = 0.
Fig. 1 shows three possible realizations of β1t and the corresponding estimates ob-
tained from a standard TVP model and our TTVP model. To ease comparison between
the models we impose a similar prior setup for both models. Specifically, for σ−2 we
set c0 = 0.01 and c1 = 0.01, implying a rather vague prior. For the shrinkage part on
β1,0 we set λ² ∼ G(0.01, 0.01) and a1 = 0.1, effectively applying heavy shrinkage on
the initial state of the system. The prior on ϑ1 is specified as in Nakajima and West
(2013a), i.e., ϑ⁻¹1 ∼ G(3, 0.03). To complete the prior setup for the TTVP model we set
π1,0 = 0.1 × √ϑ1 and π1,1 = 1.5 × √ϑ1.
[Figure 1 about here: three simulated scenarios (many, moderately many, and few breaks), each with panels (a) posterior median and true value of β1 and (b) demeaned posterior distribution of β1, over 500 periods.]

Fig. 1: Left: Evolution of the actual state vector (dotted black) along with the posterior medians of the TVP model (dashed blue) and the TTVP model (solid red). The TTVP posterior moving probability is indicated by areas shaded in gray. Right: Demeaned posterior distribution of the TVP model (90% credible intervals in shaded blue) and the TTVP model (90% credible intervals in red).
The left panel of Fig. 1 displays the evolution of the posterior median of a standard
TVP model (in dashed blue) and of the TTVP model (in solid red) along with the actual
evolution of the state vector (in dotted black). In addition, the areas shaded in gray
depict the probability that a given coefficient moves over a certain time frame (hence-
forth labeled as posterior moving probability, PMP). The right panel shows de-meaned
90% credible intervals of the coefficients from the TVP model (blue shaded area) and
the TTVP model (solid red lines).
At least two interesting findings emerge. First, note that in all three cases, our ap-
proach detects parameter movements rather well, with the PMP reaching unity in virtu-
ally all time points that feature a structural break of the corresponding parameter. The
TVP model also tracks the actual movement of the states well but does so with much
more high frequency variation. This is a direct consequence of the inverted Gamma
prior on the state innovation variances that bounds ϑ1 artificially away from zero, irrespective of the information contained in the likelihood (see Frühwirth-Schnatter and
Wagner, 2010, for a general discussion of this issue).
Second, looking at the uncertainty surrounding the median estimate (right panel of
Fig. 1) reveals that our approach succeeds in shrinking the posterior variance. This is
due to the fact that in periods where the true value of β1t is constant, our model
correctly treats the coefficient at time t as constant, whereas the TVP model imposes
a certain amount of time variation. This generates additional uncertainty that inflates
the posterior variance, possibly leading to imprecise inference.
Thus, the TTVP model detects change points in the parameters in situations where
the actual number of breaks is small, moderate and large. In situations where the
DGP suggests that the actual threshold equals zero, our approach still captures most of
the medium to low frequency variation but shrinks small movements that might, in any
case, be less relevant for econometric inference.
4 Empirical application: Macroeconomic forecasting and structural change
4.1 Model specification and data
We use an extended version of the US macroeconomic data set employed in Smets and
Wouters (2007), Geweke and Amisano (2012) and Amisano and Geweke (2017). Data
are on a quarterly basis, span the period from 1947Q2 to 2014Q4, and comprise the
log differences of consumption, investment, real GDP, hours worked, consumer prices
and real wages. Last, and as a policy variable, we include the Federal Funds Rate (FFR)
in levels. In the next subsections we investigate structural breaks in macroeconomic
relationships by means of forecasting and impulse response analysis.
Following Primiceri (2005), we include p = 2 lags of the endogenous variables. The
prior setup is similar to the one adopted in the previous sections, except that now all
hyperparameters are equation-specific and feature an additional index i = 1, . . . ,m.
More specifically, for all applicable i and j, we use the following values for the hy-
perparameters. For the shrinkage part on the initial state of the system, we again
set λ2i ∼ G(0.01, 0.01) and ai = 0.1, and the prior on ϑij is specified to be informa-
tive with ϑ−1ij ∼ G(3, 0.03). For the parameters of the log-volatility equation we use
µi ∼ N(0, 10²), (ρi + 1)/2 ∼ B(25, 5), and ζi ∼ G(1/2, 1/2). The last ingredient missing is the
prior on the thresholds, where we set πij,0 = 0.1 × √ϑij,1 and πij,1 = 1.5 × √ϑij,1.
For the seven-variable VAR we draw 500 000 samples from the joint posterior and
discard the first 400 000 draws as burn-in. Finally, we use thinning such that inference
is based on 5000 draws out of 100 000 retained draws.
4.2 Forecasting evidence
We start with a simple forecasting exercise of one-step-ahead predictions. For that
purpose we use an expanding window and a hold-out sample of 100 quarters. Forecasts
are evaluated using log-predictive Bayes factors, which are defined as the difference of
log predictive scores (LPS) of a specification of interest and a benchmark model. The
log-predictive score is a widely used metric to measure density forecast accuracy (see
e.g., Geweke and Amisano, 2010).
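Given one-step-ahead predictive log-densities from two models, the log-predictive Bayes factor is simply the difference of their cumulative log predictive scores; a sketch with toy numbers:

```python
import numpy as np

def log_predictive_bayes_factor(logpred_a, logpred_b):
    """Log predictive Bayes factor of model A vs. model B: the difference
    of log predictive scores, LPS = sum_t log p(y_t | y_{1:t-1})."""
    return float(np.sum(logpred_a) - np.sum(logpred_b))

# toy example: model A is better by 0.5 nats per period over 100 quarters
lp_a = np.full(100, -1.0)
lp_b = np.full(100, -1.5)
bf = log_predictive_bayes_factor(lp_a, lp_b)   # positive favours model A
```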
As the benchmark model, we use a TVP-VAR with relatively little shrinkage. This
amounts to setting the thresholds equal to zero and specifying the prior ϑ⁻¹ij ∼ G(3, 0.03).
We moreover include two additional constant parameter competitors, namely a Minnesota-type VAR (Doan, Litterman, and Sims, 1984) and a Normal-Gamma (NG) VAR
(Huber and Feldkircher, 2017). All models feature stochastic volatility. In order to
assess the impact of different prior hyperparameters on ϑij and the impact of ξ, we
estimate the TTVP model over a grid of meaningful values.
Table 1 depicts the LPS differences between a given model and the benchmark
model. First, we see that all models outperform the no-shrinkage time-varying parameter VAR, as indicated by positive values of the log-predictive Bayes factors. Second,
constant parameter VARs with shrinkage turn out to be hard to beat. Especially the
hierarchical Minnesota prior does a good job with respect to one-quarter-ahead fore-
casts. For the TTVP model we see that forecast performance also varies with the prior
specification. More specifically, the results show that increasing ξ, which implies more
time variation in the lower regime a-priori, deteriorates the forecasting performance.
This is especially true if a large value for ξ is coupled with small values of rij,0 and rij,1
                  rij,0 = 3      rij,0 = 1.5    rij,0 = 0.001
                  rij,1 = 0.03   rij,1 = 1      rij,1 = 0.001
ξ = ξ1 = 10−6        169.06         168.75         169.16
ξ = ξ2 = 10−5        170.80         170.60         173.87
ξ = ξ3 = 10−4        170.95         172.31         158.44
ξ = ξ4 = 10−3        130.45         163.78         137.53

BVAR              NG: 173.77     Minnesota: 177.20

Table 1: Log predictive Bayes factors relative to a time-varying parameter VAR without shrinkage for different key parameters of the model. The final row refers to the log predictive Bayes factors of a BVAR equipped with a Normal-Gamma (NG) shrinkage prior and a hierarchical Minnesota prior. All models are estimated with stochastic volatility. Numbers greater than zero indicate that a given model outperforms the benchmark.
– the latter referring to the a priori belief of large swings of coefficients in the upper
regime of the model.
To investigate the predictive performance of the different model specifications fur-
ther, Fig. 2(a) shows the log predictive Bayes factors relative to the benchmark model
over time. The plot shows that the specifications with ξ1 and ξ2 excel during most
of the sample period, irrespective of the prior on ϑij. The constant parameter mod-
els, by contrast, dominate only during two very distinct periods of our sample, namely
at the beginning and at the end of the time span covered. In both periods, no se-
vere up or downswings in economic activity occur and the constant parameter models
with SV display excellent predictive capabilities. By contrast, during volatile periods –
such as the global financial crisis – our modeling approach seems to pay off in terms
of predictive accuracy. To investigate this in more detail, we focus on the forecasting
performance of the different model specifications during the period from 2006Q1 to
2010Q1 in Fig. 2(b). Here we see that TTVP specifications with ξj for j < 4 outperform
all remaining competitors. This additional, and more detailed, look at the forecasting
performance during turbulent times thus reveals that the TTVP model is a valuable
alternative to simpler models. Put differently, we observe that during more volatile
periods the TTVP model can severely outperform constant parameter models, while in
tranquil times its forecasts are never far off.
Next, we examine turning point forecasts, since the detection of structural breaks
might be a further useful application of the TTVP framework. We focus on turning
points in real GDP growth and follow Canova and Ciccarelli (2004) to label time point
(t + 1) a downward turning point – conditional on the information up to time t – if
St+1, the growth rate of real GDP at time (t + 1), satisfies that St−2 < St, St−1 < St,
[Figure omitted: cumulative log predictive Bayes factors over time for each TTVP specification (ξ1 to ξ4 combined with r0 = 3, r1 = 0.03; r0 = 1.5, r1 = 1; and r0 = 0.001, r1 = 0.001), the Minnesota prior, and the NG prior.]

(a) Full evaluation period (1995Q1 to 2014Q4). (b) Crisis period only (2006Q1 to 2010Q1).

Fig. 2: Log predictive Bayes factor relative to a TVP-VAR-SV model.
                          Downturns                               Upturns
              rij,0 = 3    rij,0 = 1.5   rij,0 = 0.001   rij,0 = 3    rij,0 = 1.5   rij,0 = 0.001
              rij,1 = 0.03 rij,1 = 1     rij,1 = 0.001   rij,1 = 0.03 rij,1 = 1     rij,1 = 0.001
ξ = ξ1 = 10−6    0.66         0.67          0.67            0.84         0.83          0.83
ξ = ξ2 = 10−5    0.66         0.66          0.67            0.83         0.83          0.81
ξ = ξ3 = 10−4    0.64         0.65          0.68            0.83         0.84          0.80
ξ = ξ4 = 10−3    0.87         0.67          0.78            0.81         0.83          0.78

BVAR          NG: 0.62     Minnesota: 0.62                 NG: 0.84     Minnesota: 0.93

Table 2: QPS scores relative to a time-varying parameter VAR without shrinkage for different key parameters of the model. The final row refers to the QPS scores of a BVAR equipped with a Normal-Gamma (NG) shrinkage prior and a hierarchical Minnesota prior. All models are estimated with stochastic volatility. Numbers below unity indicate that a given model outperforms the benchmark.
and St > St+1. Analogously, time point (t + 1) is labeled an upward turning point if St−2 > St, St−1 > St, and St < St+1. Equipped with these definitions, we can split the total number of turning points into upturns and downturns and compute the quadratic probability score (QPS) as an accuracy measure of upturn and downturn
probability forecasts. The results are provided in Table 2.
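A minimal sketch of this procedure, assuming one common form of the quadratic probability score, QPS = (2/T) Σ (p_t − d_t)², where p_t is the predicted turning-point probability and d_t the realized indicator (the growth series below is made up; the table reports QPS ratios relative to the benchmark):

```python
import numpy as np

def downward_turning_points(s):
    """Flag time t+1 as a downward turning point if S_{t-2} < S_t,
    S_{t-1} < S_t and S_t > S_{t+1} (Canova and Ciccarelli, 2004)."""
    s = np.asarray(s, dtype=float)
    flags = np.zeros(s.size, dtype=bool)
    for t in range(2, s.size - 1):
        flags[t + 1] = s[t - 2] < s[t] and s[t - 1] < s[t] and s[t] > s[t + 1]
    return flags

def qps(prob, event):
    """Quadratic probability score: 0 is a perfect forecast, 2 the worst."""
    prob = np.asarray(prob, dtype=float)
    event = np.asarray(event, dtype=float)
    return 2.0 * np.mean((prob - event) ** 2)

growth = [1.0, 1.5, 2.0, 1.0, 0.5, 1.2, 2.5, 1.8]
flags = downward_turning_points(growth)
# Local peaks at indices 2 and 6 flag indices 3 and 7 as downward turning points.
```

Upward turning points follow by reversing the inequalities.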
The picture that arises is similar to that of the density forecasting exercise: all vari-
ants of the TTVP model beat the no-shrinkage time-varying parameter VAR model.
Turning point forecasts deteriorate for larger values of ξ and especially so if they are
coupled with small choices for rij, yielding a relatively uninformative prior on ϑij and
consequently little shrinkage. Forecast gains relative to the benchmark model are more
sizable for downward than for upward turning points. In comparison to the two con-
stant parameter competitors, the TTVP model excels in predicting upward turning
points (for which there are more observations in the sample), while forecast perfor-
mance is slightly inferior for downward forecasts. Also note that for downward predic-
tions, penalizing time-variation seems to be essential and consequently the strongest
performance among TTVP specifications is achieved for small values of ξ. The opposite
is the case for upward turning points where reasonable predictions can be also achieved
with a rather loose prior.
4.3 Detecting structural breaks in US data
In this section we aim to have a closer and more systematic look at changes in the joint
dynamics of our seven variable TTVP-VAR model for the US economy. To that end, we
examine the posterior mean of the determinant of the time-varying variance-covariance matrix of the innovations in the state equation (Cogley and Sargent, 2005). For each draw of Ωit = diag(θi1,t, . . . , θiKi,t) we compute its log-determinant and subtract the mean across time. Large values of this measure point towards a pronounced degree of time-variation in the autoregressive coefficients of the corresponding equations. The
results are provided in Fig. 3 for each equation and the full system.
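Because Ωit is diagonal, its log-determinant is simply the sum of the log state-innovation variances. A hedged sketch of the measure (the variance draws below are simulated placeholders, not output of the actual sampler):

```python
import numpy as np

def time_variation_measure(theta):
    """theta: (T, K) array of state-innovation variances for one equation.
    Returns exp(log-determinant of the diagonal Omega_t, demeaned across time)."""
    log_det = np.sum(np.log(theta), axis=1)  # log det of a diagonal matrix
    return np.exp(log_det - log_det.mean())  # subtract time mean, exponentiate

# Simulated placeholder variances: 271 quarters, 15 coefficients per equation.
rng = np.random.default_rng(1)
theta = rng.uniform(0.01, 0.1, size=(271, 15))
measure = time_variation_measure(theta)
# Values well above one point towards pronounced time variation at that date.
```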
For all variables we see at least one prominent spike during the sample period, indicating a structural break. Most spikes in the determinant occur around 1980, when then-Fed chairman Paul Volcker sharply increased short-term interest rates to fight inflation. Other breaks relate to the dot-com bubble in the early 2000s (consumption), the
oil price crisis and stock market crash in the early 1970s (hours worked) and another
oil price related crisis in the early 1990s. Also, the transition from positive interest rates
to the zero lower bound in the midst of the global financial crisis is indicated by a spike
in the determinant. That we can relate spikes to historical episodes of financial and economic distress lends further confidence to the modeling approach. Among these
periods, the early 1980s seem to have constituted by far the most severe rupture for
the US economy.
4.4 Impulse responses to a monetary policy shock
In this section we examine the dynamic responses of a set of macroeconomic variables
to a contractionary monetary policy shock. The monetary policy shock is calibrated as a
100 basis point (bp) increase in the FFR and identified using a Cholesky ordering with
the variables appearing in exactly the same order as mentioned above. This ordering is
in the spirit of Christiano, Eichenbaum, and Evans (2005) and has been subsequently
used in the literature (see Coibion, 2012, for an excellent survey). Drawing on the
results of the previous section, we focus on two sub-sets of the sample, namely the pre-
Volcker period from 1947Q4 to 1979Q1 and the rest of the sample.3 The time-varying
impulse responses – as functions of horizons – are displayed in Fig. 4. Additionally, we
also include impulse responses for different horizons – as functions of time – over the
full sample period in Fig. 5 and Fig. 6.
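To fix notation: with a Cholesky ordering, the structural impact matrix is the lower-triangular Cholesky factor of the reduced-form error covariance, and the monetary policy shock is scaled so that the FFR rises by 100 bp on impact. A stylized two-variable sketch (all numbers are made up for illustration; the actual model is the seven-variable TTVP-VAR):

```python
import numpy as np

# Reduced-form error covariance of a toy bivariate system: [output growth, FFR],
# with the policy rate ordered last, as in the text.
sigma = np.array([[0.50, 0.10],
                  [0.10, 0.25]])
B0 = np.linalg.cholesky(sigma)     # lower-triangular structural impact matrix

# Scale the last structural shock so the FFR increases by one unit (100 bp).
shock = np.zeros(2)
shock[-1] = 1.0 / B0[-1, -1]
impact = B0 @ shock                # output does not react on impact

# Higher-horizon responses propagate through the autoregressive dynamics,
# here an illustrative VAR(1) coefficient matrix.
A = np.array([[0.5, -0.2],
              [0.0,  0.8]])
irf = [impact]
for _ in range(15):
    irf.append(A @ irf[-1])
```

By construction, variables ordered before the policy rate do not respond within the impact quarter, which is the identifying restriction behind the recursive ordering.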
In Fig. 4 we investigate whether the size and the shape of responses vary between
and within the two sub-samples. For that purpose we show median responses over the
first sample split in the top row and for the second part of the sample in the bottom row
of Fig. 4. Impulse responses that belong to the beginning of a sample split are depicted
in light yellow, those that belong to the end of the sample period in dark red. To fix
ideas, if the size of a response increases continuously over time we should see a smooth
darkening of the corresponding impulse from light yellow to dark red. Considering, for
instance, hours worked, this phenomenon can clearly be seen in the second sample
period from 1979Q2 to 2014Q4, where the median response changes gradually from
3The split into two sub-sets is conducted for interpretation purposes only. For estimation, the entiresample has been used.
[Figure omitted: posterior mean of the time-variation measure from 1950 to 2010 for each equation: (a) Consumption, (b) Investment, (c) Output, (d) Hours worked, (e) Inflation, (f) Real wages, (g) FFR, and (h) Overall.]

Fig. 3: Posterior mean of the determinant of the time-varying variance-covariance matrix of the innovations to the state equation from 1947Q2 to 2014Q4. Values are obtained by taking the exponential of the demeaned log-determinant across equations. Gray shaded areas refer to US recessions dated by the NBER business cycle dating committee.
[Figure omitted: posterior median impulse responses of Consumption, Investment, Output, Hours worked, Inflation, Real wages, and the FFR over horizons 0 to 15 quarters, shown separately for 1947Q4 to 1979Q1 (top row) and 1979Q2 to 2014Q4 (bottom row).]

Fig. 4: Posterior median impulse response functions over two sample splits, namely the pre-Volcker period (1947Q4 to 1979Q1) and the rest of the sample period (1979Q2 to 2014Q4). The coloring of the impulse responses refers to their timing: light yellow stands for the beginning of the sample split, dark red for the end of the sample split. For reference, 68% credible intervals over the average of the sample period are provided (dotted black lines).
[Figure omitted: time-varying responses of Consumption, Investment, Output, and Inflation from 1947 to 2010.]

Fig. 5: Posterior median responses to a +100 bp monetary policy shock, after 4 (top panels), 8 (middle panels), and 12 (bottom panels) quarters. Shaded areas correspond to 90% (dark red) and 68% (light red) credible sets.
[Figure omitted: time-varying responses of Hours worked, Real wages, and the Federal funds rate from 1947 to 2010.]

Fig. 6: Posterior median responses to a +100 bp monetary policy shock, after 4 (top panels), 8 (middle panels), and 12 (bottom panels) quarters. Shaded areas correspond to 90% (dark red) and 68% (light red) credible sets.
slightly negative to substantially negative. On the other hand, abrupt changes are also
clearly visible, see e.g., the drastic change of the inflation response from 1979Q1 (the
last quarter in the first sample) to 1979Q2 (the first quarter in the second sample),
dropping from substantially positive to just above zero within one quarter (see also
Fig. 5).
Considering the dynamic responses from different angles, we find three regularities worth emphasizing. The first concerns the overall effects of the monetary policy shock. Note that an unexpected rate increase dampens investment growth,
hours worked and consequently overall output growth for both sample splits. These
results are reasonable from an economic perspective. Also, estimated effects on output
growth and inflation are comparable to those of Baumeister and Benati (2013) who
use a TVP-VAR framework and US data. Responses of consumption growth tend to be
accompanied by wide credible sets. The same applies to inflation and real wages.
Second, we examine changes in responses over time for the first sub-period. One of
the variables that shows a great deal of variation in magnitudes is the response of infla-
tion. Here, effects become increasingly positive the further one moves from 1947Q4 to
1979Q1 and the shades of the responses turn continuously darker. These results imply
a severe “price puzzle”. While overall credible sets for the sub-sample are wide, positive
responses for inflation and thus the price puzzle are estimated over the period from the
mid-1960s to the beginning of the 1980s (see also Fig. 5). A similar picture arises when
looking at consumption growth. During the first sample split, effects become increas-
ingly more negative, but responses are only precisely estimated for the period from the
mid-1960s to the beginning of the 1980s. This might be explained by the fact that the
monetary policy driven increase in inflation spurs consumption since saving becomes
less attractive.
Third, we focus on the results over the more recent second sample split from 1979Q2
to 2014Q4. Paul Volcker's fight against inflation had some bearing on overall macroeconomic dynamics in the USA. With the onset of the 1980s, the aforementioned price puzzle starts to disappear (in the sense that effects are surrounded by wide credible sets and median responses become increasingly negative). There is also a great deal of time
variation evident in other responses, mostly becoming increasingly negative. Put differ-
ently, the effectiveness of monetary policy seems to be higher in the more recent sample
period than before. This can be seen in the effects on hours worked, investment growth and output growth. That the effects of a hypothetical monetary policy shock on output growth are particularly strong after the crisis corroborates findings of Baumeister and
Benati (2013) and Feldkircher and Huber (2016). The latter argue that this is related
to the zero lower bound period: after a prolonged period of unaltered interest rates, a
deviation from the (long-run) interest rate mean can exert considerable effects on the
macroeconomy.
5 Closing remarks
This paper puts forth a novel approach to estimate time-varying parameter models in
a Bayesian framework. We assume that the state innovations follow a threshold model where the threshold variable is the absolute period-on-period change of the corresponding state. If the (proposed) change is sufficiently large, the corresponding variance is set to a value greater than zero; otherwise, it is set close to zero, which implies that the state remains virtually constant from (t − 1) to t. Our
framework is capable of discriminating between a plethora of competing specifications,
most notably models that feature many, moderately many, and few structural breaks in
the regression parameters.
We also propose a generalization of our model to the VAR framework with stochastic
volatility. In an application to the US macroeconomy, we examine the usefulness of
the TTVP-VAR in terms of forecasting, turning point prediction, and structural impulse
response analysis. Our results show that the model yields precise forecasts, especially
so during more volatile times such as witnessed in 2008. For that period, the forecast
gain over simpler constant parameter models is particularly high. We then proceed by
investigating turning point predictions, and observe excellent performance of the TTVP
model, in particular for upturn predictions. Finally, we examine impulse responses to a
+100 basis point contractionary monetary policy shock, focusing on two sub-periods of
our sample span, the pre-Volcker period and the rest of the sample. Our results reveal
significant evidence for a severe price puzzle during episodes of the pre-Volcker period.
The positive effect of the rate increase on inflation disappears in the second half of our sample. Modeling changes in responses over the two sub-periods only, such as in a regime switching model, would however be too simplistic, as we also find considerable time variation within each sub-period. For example, we find increasing effectiveness of
monetary policy in terms of output, investment growth, and hours worked in the more
recent sub-period. This is especially true for the period after the global financial crisis
in which the Federal Funds Rate has been tied to zero. For that period, a hypothetical
deviation from the zero lower bound would create pronounced effects on the wider
macroeconomy.
6 Acknowledgments
We sincerely thank the participants of the WU Brown Bag Seminar of the Institute of
Statistics and Mathematics, the 3rd Vienna Workshop on High-Dimensional Time Series
in Macroeconomics and Finance 2017, the NBP Workshop on Forecasting 2017, and in
particular Sylvia Frühwirth-Schnatter, for many helpful comments and suggestions that
improved the paper significantly.
References
AMISANO, G., AND J. GEWEKE (2017): "Prediction using several macroeconomic models," Review of Economics and Statistics, 99(5), 912–925.

BAUMEISTER, C., AND L. BENATI (2013): "Unconventional Monetary Policy and the Great Recession: Estimating the Macroeconomic Effects of a Spread Compression at the Zero Lower Bound," International Journal of Central Banking, 9(2), 165–212.

BELMONTE, M. A., G. KOOP, AND D. KOROBILIS (2014): "Hierarchical Shrinkage in Time-Varying Parameter Models," Journal of Forecasting, 33(1), 80–94.

BITTO, A., AND S. FRÜHWIRTH-SCHNATTER (2016): "Achieving shrinkage in a time-varying parameter model framework," arXiv pre-print 1611.01310.

CANOVA, F., AND M. CICCARELLI (2004): "Forecasting and turning point predictions in a Bayesian panel VAR model," Journal of Econometrics, 120(2), 327–359.

CARTER, C. K., AND R. KOHN (1994): "On Gibbs sampling for state space models," Biometrika, 81(3), 541–553.

CHRISTIANO, L. J., M. EICHENBAUM, AND C. L. EVANS (2005): "Nominal Rigidities and the Dynamic Effects of a Shock to Monetary Policy," Journal of Political Economy, 113(1), 1–45.

COGLEY, T., AND T. J. SARGENT (2002): "Evolving post-World War II US inflation dynamics," in NBER Macroeconomics Annual 2001, Volume 16, pp. 331–388. MIT Press.

(2005): "Drifts and volatilities: monetary policies and outcomes in the post WWII US," Review of Economic Dynamics, 8(2), 262–302.

COIBION, O. (2012): "Are the Effects of Monetary Policy Shocks Big or Small?," American Economic Journal: Macroeconomics, 4(2), 1–32.

D'AGOSTINO, A., L. GAMBETTI, AND D. GIANNONE (2013): "Macroeconomic forecasting and structural change," Journal of Applied Econometrics, 28(1), 82–101.

DOAN, T. R., B. R. LITTERMAN, AND C. A. SIMS (1984): "Forecasting and Conditional Projection Using Realistic Prior Distributions," Econometric Reviews, 3, 1–100.

EISENSTAT, E., J. C. CHAN, AND R. W. STRACHAN (2016): "Stochastic model specification search for time-varying parameter VARs," Econometric Reviews, pp. 1–28.

FELDKIRCHER, M., AND F. HUBER (2016): "Unconventional US Monetary Policy: New Tools, Same Channels?," Working Papers 208, Oesterreichische Nationalbank (Austrian Central Bank).

FRÜHWIRTH-SCHNATTER, S. (1994): "Data augmentation and dynamic linear models," Journal of Time Series Analysis, 15(2), 183–202.

FRÜHWIRTH-SCHNATTER, S., AND H. WAGNER (2010): "Stochastic model specification search for Gaussian and partial non-Gaussian state space models," Journal of Econometrics, 154(1), 85–100.

GERLACH, R., C. CARTER, AND R. KOHN (2000): "Efficient Bayesian inference for dynamic mixture models," Journal of the American Statistical Association, 95(451), 819–828.

GEWEKE, J., AND G. AMISANO (2010): "Comparing and evaluating Bayesian predictive distributions of asset returns," International Journal of Forecasting, 26(2), 216–230.

(2012): "Prediction with misspecified models," The American Economic Review, 102(3), 482–486.

GIORDANI, P., AND R. KOHN (2012): "Efficient Bayesian inference for multiple change-point and mixture innovation models," Journal of Business & Economic Statistics.

GRIFFIN, J. E., AND P. J. BROWN (2010): "Inference with normal-gamma prior distributions in regression problems," Bayesian Analysis, 5(1), 171–188.

HÖRMANN, W., AND J. LEYDOLD (2013): "Generating generalized inverse Gaussian random variates," Statistics and Computing, 24(4), 1–11.

HUBER, F., AND M. FELDKIRCHER (2017): "Adaptive shrinkage in Bayesian vector autoregressive models," Journal of Business and Economic Statistics, forthcoming.

KALLI, M., AND J. E. GRIFFIN (2014): "Time-varying sparsity in dynamic regression models," Journal of Econometrics, 178(2), 779–793.

KASTNER, G. (2016): "Dealing with stochastic volatility in time series using the R package stochvol," Journal of Statistical Software, 69(5), 1–30.

KASTNER, G., AND S. FRÜHWIRTH-SCHNATTER (2014): "Ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC estimation of stochastic volatility models," Computational Statistics & Data Analysis, 76, 408–423.

KIMURA, T., AND J. NAKAJIMA (2016): "Identifying conventional and unconventional monetary policy shocks: a latent threshold approach," The BE Journal of Macroeconomics, 16(1), 277–300.

KOOP, G., AND D. KOROBILIS (2012): "Forecasting inflation using dynamic model averaging," International Economic Review, 53(3), 867–886.

(2013): "Large time-varying parameter VARs," Journal of Econometrics, 177(2), 185–198.

KOOP, G., R. LEON-GONZALEZ, AND R. W. STRACHAN (2009): "On the evolution of the monetary policy transmission mechanism," Journal of Economic Dynamics and Control, 33(4), 997–1017.

KOOP, G., AND S. M. POTTER (2007): "Estimation and forecasting in models with multiple breaks," The Review of Economic Studies, 74(3), 763–789.

LEYDOLD, J., AND W. HÖRMANN (2015): GIGrvg: Random variate generator for the GIG distribution. R package version 0.4.

MAHEU, J. M., AND Y. SONG (forthcoming): "An efficient Bayesian approach to multiple structural change in multivariate time series," Journal of Applied Econometrics.

MALSINER-WALLI, G., AND H. WAGNER (2011): "Comparing Spike and Slab Priors for Bayesian Variable Selection," Austrian Journal of Statistics, 40(4), 241–264.

MCCULLOCH, R. E., AND R. S. TSAY (1993): "Bayesian inference and prediction for mean and variance shifts in autoregressive time series," Journal of the American Statistical Association, 88(423), 968–978.

NAKAJIMA, J., AND M. WEST (2013a): "Bayesian analysis of latent threshold dynamic models," Journal of Business & Economic Statistics, 31(2), 151–164.

(2013b): "Dynamic factor volatility modeling: A Bayesian latent threshold approach," Journal of Financial Econometrics, 11(1), 116–153.

NEELON, B., AND D. B. DUNSON (2004): "Bayesian Isotonic Regression and Trend Analysis," Biometrics, 60(2), 398–406.

PRIMICERI, G. E. (2005): "Time varying structural vector autoregressions and monetary policy," The Review of Economic Studies, 72(3), 821–852.

RITTER, C., AND M. A. TANNER (1992): "Facilitating the Gibbs sampler: the Gibbs stopper and the griddy-Gibbs sampler," Journal of the American Statistical Association, 87(419), 861–868.

SIMS, C. A., AND T. ZHA (2006): "Were there regime switches in US monetary policy?," The American Economic Review, 96(1), 54–81.

SMETS, F., AND R. WOUTERS (2007): "Shocks and frictions in US business cycles: A Bayesian DSGE approach," The American Economic Review, 97(3), 586–606.

STOCK, J. H., AND M. W. WATSON (1996): "Evidence on structural instability in macroeconomic time series relations," Journal of Business & Economic Statistics, 14(1), 11–30.

ZHOU, X., J. NAKAJIMA, AND M. WEST (2014): "Bayesian forecasting and portfolio decisions using dynamic dependent sparse factor models," International Journal of Forecasting, 30(4), 963–980.